Currently, Hudi only supports concurrency control between writing and compaction. But some scenarios need concurrency control between multiple writers, e.g. two Spark jobs with different data sources that need to write to the same Hudi table.
I have a two-step proposal:

1. First step: support write concurrency control across different partitions. Currently, when two clients write data to different partitions of the same table, they hit these errors:

   a. Rolling back commits fails.
   b. Instant versions already exist, e.g.:

      [2020-05-25 21:20:34,732] INFO Checking for file exists ?/tmp/HudiDLATestPartition/.hoodie/20200525212031.clean.inflight (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
      Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Failed to create file /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean
          at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:437)
          at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
          at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionCleanInflightToComplete(HoodieActiveTimeline.java:290)
          at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:183)
          at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:142)
          at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
          at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)

   c. The two clients' archiving steps conflict.
   d. A reading client fails with "Unable to infer schema for Parquet. It must be specified manually.;"

2. Second step: support insert/upsert/compaction concurrency control under different isolation levels, such as Serializable and WriteSerializable. Hudi could design a mechanism to check for conflicts in AbstractHoodieWriteClient.commit().

I created an issue: https://issues.apache.org/jira/browse/HUDI-944

Best Regards,
Wei Li
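To make the second step concrete, here is a minimal sketch of what an optimistic conflict check in the commit path could look like: at commit time, compare the file IDs our write touched against the file IDs touched by any commit that completed after we started, and abort on overlap (roughly WriteSerializable behavior). All class and method names below are hypothetical illustrations, not actual Hudi APIs.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a conflict check as proposed for
// AbstractHoodieWriteClient.commit(). Names are illustrative only.
public class ConflictCheckSketch {

    // A completed commit on the timeline: its instant time and the file IDs it wrote.
    static final class CommitMetadata {
        final String instantTime;
        final Set<String> touchedFileIds;

        CommitMetadata(String instantTime, Set<String> touchedFileIds) {
            this.instantTime = instantTime;
            this.touchedFileIds = touchedFileIds;
        }
    }

    /**
     * WriteSerializable-style check: our commit is allowed only if no commit
     * that completed after our start instant wrote to any file we also wrote.
     * Instant times sort lexicographically (yyyyMMddHHmmss), so String
     * comparison gives timeline order.
     */
    static boolean hasConflict(String ourStartInstant,
                               Set<String> ourFileIds,
                               List<CommitMetadata> completedCommits) {
        for (CommitMetadata other : completedCommits) {
            // Only commits that finished after we began can conflict with us.
            if (other.instantTime.compareTo(ourStartInstant) > 0
                    && !Collections.disjoint(ourFileIds, other.touchedFileIds)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<CommitMetadata> timeline = List.of(
                new CommitMetadata("20200525212031", Set.of("fileA", "fileB")));

        // A writer that started earlier and also wrote fileB overlaps -> conflict.
        System.out.println(hasConflict("20200525212000", Set.of("fileB"), timeline)); // true

        // A writer touching only a disjoint partition (fileC) -> no conflict.
        System.out.println(hasConflict("20200525212000", Set.of("fileC"), timeline)); // false
    }
}
```

Under this scheme, writers to disjoint partitions (step 1) never share file IDs, so they always pass the check; the same mechanism then generalizes to insert/upsert/compaction conflicts in step 2.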