Hi,

From my experience so far of working with Hudi, I understand that Hudi is not designed to handle concurrent writes to the same dataset from two different writers, for example two instances of HoodieDeltaStreamer running simultaneously against the same dataset. I have seen such a case result in duplicate records on insert. Moreover, once duplicates are written, you cannot be sure which file a subsequent update will go to, since the record is now present in two different parquet files. Please correct me if I am wrong.
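For context, here is a minimal sketch of how such duplicates can be located. It works over simulated in-memory rows rather than actual parquet reads, but relies on the `_hoodie_record_key` and `_hoodie_file_name` metadata columns that Hudi writes into every record; the file names and keys below are made up for illustration.

```python
from collections import defaultdict

# Simulated rows from two parquet files in the same partition.
# In a real COW dataset these rows would be read from the parquet
# files; _hoodie_record_key and _hoodie_file_name are metadata
# columns Hudi adds to every record.
rows = [
    {"_hoodie_record_key": "id:1", "_hoodie_file_name": "fileA.parquet"},
    {"_hoodie_record_key": "id:2", "_hoodie_file_name": "fileA.parquet"},
    {"_hoodie_record_key": "id:2", "_hoodie_file_name": "fileB.parquet"},  # duplicate
    {"_hoodie_record_key": "id:3", "_hoodie_file_name": "fileB.parquet"},
]

# Map each record key to the set of files it appears in.
files_by_key = defaultdict(set)
for row in rows:
    files_by_key[row["_hoodie_record_key"]].add(row["_hoodie_file_name"])

# Keys present in more than one file are the problematic duplicates.
duplicates = {k: sorted(v) for k, v in files_by_key.items() if len(v) > 1}
print(duplicates)  # {'id:2': ['fileA.parquet', 'fileB.parquet']}
```

The same grouping can of course be expressed as a Spark SQL query over the dataset, grouping by `_hoodie_record_key` and collecting distinct `_hoodie_file_name` values.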
Having experienced this in a few Hudi datasets, I now want to delete one of the parquet files containing the duplicates in some partition of a COW type Hudi dataset. I want to know whether deleting a parquet file manually can have any repercussions, and if so, what the possible side effects are. Any leads will be highly appreciated.