Currently Hudi only supports concurrency control between writing and compaction. But some scenarios need concurrency control between writers, e.g. two Spark jobs with different data sources that need to write to the same Hudi table.

I have a proposal in two steps:

1. First step: support write concurrency control across different partitions.
Currently, when two clients write data to different partitions of the same
table, they hit these errors:

a. Rolling back commits failed

b. Instant version already exists, e.g.:
[2020-05-25 21:20:34,732] INFO Checking for file exists ?/tmp/HudiDLATestPartition/.hoodie/20200525212031.clean.inflight (org.apache.hudi.common.table.timeline.HoodieActiveTimeline)
Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Failed to create file /tmp/HudiDLATestPartition/.hoodie/20200525212031.clean
  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.createImmutableFileInPath(HoodieActiveTimeline.java:437)
  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:327)
  at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionCleanInflightToComplete(HoodieActiveTimeline.java:290)
  at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:183)
  at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:142)
  at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
  at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)

c. The two clients' archiving conflicts

d. A reading client hits "Unable to infer schema for Parquet. It must be
specified manually.;"
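Error (b) comes from two clients racing to create the same immutable timeline file under .hoodie/. A minimal sketch of that race (not Hudi code; file names and the helper are illustrative), mimicking what createImmutableFileInPath() enforces with an exclusive create:

```python
# Hypothetical sketch: two writers racing to create the same instant file
# under the table's metadata directory. An atomic create-if-absent
# (O_CREAT | O_EXCL) makes the second writer fail, which is the
# "Failed to create file .../20200525212031.clean" error above.
import os
import tempfile

def create_immutable_instant(meta_dir: str, instant: str) -> bool:
    """Atomically create an instant file; return False if it already exists."""
    path = os.path.join(meta_dir, instant)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        # A concurrent client already transitioned this instant.
        return False

meta_dir = tempfile.mkdtemp()
# Two clients that pick the same instant time collide:
assert create_immutable_instant(meta_dir, "20200525212031.clean")
assert not create_immutable_instant(meta_dir, "20200525212031.clean")
# Clients with distinct instant times do not:
assert create_immutable_instant(meta_dir, "20200525212032.clean")
```

So even writers touching disjoint partitions conflict today whenever they share timeline actions (clean, archive), which is why partition-disjoint writes alone are not enough without coordinating the timeline.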

2. Second step: support insert, upsert, and compaction concurrency control
under different isolation levels such as Serializable and WriteSerializable.

Hudi could design a mechanism to check for conflicts in
AbstractHoodieWriteClient.commit().
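The commit-time check could be optimistic: each writer records which file groups it touched, and commit() aborts if a commit that completed after this writer started overlaps those file groups. A minimal sketch under that assumption (the Timeline class and its methods are hypothetical, not Hudi's API); under a WriteSerializable-style level, concurrent writers to disjoint file groups both succeed:

```python
# Hypothetical sketch of an optimistic conflict check at commit time.
class Timeline:
    def __init__(self):
        self.completed = []  # list of (commit_instant, frozenset of files written)
        self.clock = 0       # monotonically increasing instant generator

    def begin(self) -> int:
        """Start a write; return its start instant."""
        self.clock += 1
        return self.clock

    def commit(self, start_instant: int, files_written: set) -> bool:
        """Commit iff no commit completed after start_instant overlaps our files."""
        for instant, files in self.completed:
            if instant > start_instant and files & files_written:
                return False  # conflicting concurrent write: abort
        self.clock += 1
        self.completed.append((self.clock, frozenset(files_written)))
        return True

tl = Timeline()
a, b = tl.begin(), tl.begin()
assert tl.commit(a, {"2020-05-25/fg1"})          # first writer commits
assert tl.commit(b, {"2020-05-26/fg2"})          # disjoint partition: ok
c, d = tl.begin(), tl.begin()
assert tl.commit(c, {"2020-05-25/fg1"})          # commits first
assert not tl.commit(d, {"2020-05-25/fg1"})      # same file group: conflict
```

This only sketches conflict detection; a real mechanism would also need a lock or atomic timeline transition so the check and the commit happen together.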

I created an issue: https://issues.apache.org/jira/browse/HUDI-944

Best Regards,
Wei Li.
