Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-16 Thread via GitHub


danny0405 merged PR #9850:
URL: https://github.com/apache/hudi/pull/9850


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-16 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1763821127

   
   ## CI report:
   
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * 1297401002c4712836c5c09f56868add98f6f847 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20339)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-15 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1763680880

   
   ## CI report:
   
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332)
 
   * 1297401002c4712836c5c09f56868add98f6f847 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20339)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-15 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1763675402

   
   ## CI report:
   
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332)
 
   * 1297401002c4712836c5c09f56868add98f6f847 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-15 Thread via GitHub


danny0405 commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1763620590

   
[6480.patch.zip](https://github.com/apache/hudi/files/12911422/6480.patch.zip)
   Thanks for the contribution, I have reviewed and created a patch, you need 
to rebase with latest master then apply the path.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1359177852


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java:
##
@@ -524,9 +524,15 @@ private void checkInstantState(HoodieInstant.State state, 
String instantStr) {
 }
 
 protected String lastCompleteInstant() {
-  return OptionsResolver.isMorTable(conf)
-  ? TestUtils.getLastDeltaCompleteInstant(basePath)
-  : TestUtils.getLastCompleteInstant(basePath, 
HoodieTimeline.COMMIT_ACTION);
+  if (OptionsResolver.isLocklessMultiWriter(this.conf)) {
+// If there are lockless multiple writer, fetch last complete instant 
of current writer from ckp metadata
+// because the writer started first might finish later, similarly, the 
writer started later might finish first.
+return this.ckpMetadata.lastCompleteInstant();
+  } else {

Review Comment:
   Does `this.ckpMetadata.isEmpty()` indicate a failover?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761799261

   
   ## CI report:
   
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761786081

   
   ## CI report:
   
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * a247acb83fce64f8f9b8e9f696a53a629534b4f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20331)
 
   * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761621340

   
   ## CI report:
   
   * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327)
 
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * a247acb83fce64f8f9b8e9f696a53a629534b4f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20331)
 
   * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761538586

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305)
 
   * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327)
 
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * a247acb83fce64f8f9b8e9f696a53a629534b4f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20331)
 
   * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761523198

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305)
 
   * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327)
 
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * a247acb83fce64f8f9b8e9f696a53a629534b4f6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20331)
 
   * f73c1b694ae0cb5557a109f28cfe25f87a264825 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761462595

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305)
 
   * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327)
 
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   * a247acb83fce64f8f9b8e9f696a53a629534b4f6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761447384

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305)
 
   * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327)
 
   * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


beyond1920 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358191937


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java:
##
@@ -524,9 +524,15 @@ private void checkInstantState(HoodieInstant.State state, 
String instantStr) {
 }
 
 protected String lastCompleteInstant() {
-  return OptionsResolver.isMorTable(conf)
-  ? TestUtils.getLastDeltaCompleteInstant(basePath)
-  : TestUtils.getLastCompleteInstant(basePath, 
HoodieTimeline.COMMIT_ACTION);
+  if (OptionsResolver.isLocklessMultiWriter(this.conf)) {
+// If there are lockless multiple writer, fetch last complete instant 
of current writer from ckp metadata
+// because the writer started first might finish later, similarly, the 
writer started later might finish first.
+return this.ckpMetadata.lastCompleteInstant();
+  } else {

Review Comment:
   In most case `return this.ckpMetadata.lastCompleteInstant();`  directly 
works fine.
   But the case `testSubtaskFails` would fail because failover happens, the ckp 
meta might be cleaned, fetch the instant from timeline.
   I add branch to handle this case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


beyond1920 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358145916


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java:
##
@@ -538,11 +539,20 @@ public void testWriteMultiWriterInvolved() throws 
Exception {
 .assertNextEvent()
 .checkpointComplete(1)
 .checkWrittenData(EXPECTED3, 1);
-// step to commit the 2nd txn, should throw exception
-// for concurrent modification of same fileGroups
-pipeline1.checkpoint(1)
-.assertNextEvent()
-.checkpointCompleteThrows(1, HoodieWriteConflictException.class, 
"Cannot resolve conflicts");
+// step to commit the 2nd txn
+pipeline1 = pipeline1
+.checkpoint(1)
+.assertNextEvent();
+if (OptionsResolver.isLocklessMultiWriter(conf)) {
+  // should success for concurrent modification of same fileGroups if 
using lockless multi writers

Review Comment:
   No, it's only be true if all the following condition is satisfied:
   * the table is MOR table
   * the index is simple bucket index
   * enable OPTIMISTIC_CONCURRENCY_CONTROL
   
   So it could be false for COW table or MOR table with other index type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358087824


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java:
##
@@ -524,9 +524,15 @@ private void checkInstantState(HoodieInstant.State state, 
String instantStr) {
 }
 
 protected String lastCompleteInstant() {
-  return OptionsResolver.isMorTable(conf)
-  ? TestUtils.getLastDeltaCompleteInstant(basePath)
-  : TestUtils.getLastCompleteInstant(basePath, 
HoodieTimeline.COMMIT_ACTION);
+  if (OptionsResolver.isLocklessMultiWriter(this.conf)) {
+// If there are lockless multiple writer, fetch last complete instant 
of current writer from ckp metadata
+// because the writer started first might finish later, similarly, the 
writer started later might finish first.
+return this.ckpMetadata.lastCompleteInstant();
+  } else {

Review Comment:
   Got it, maybe we can use `return this.ckpMetadata.lastCompleteInstant();` 
directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358086955


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteMergeOnReadWithCompact.java:
##
@@ -70,6 +86,80 @@ protected Map getMiniBatchExpected() {
 return expected;
   }
 
+  @Test
+  public void testLocklessMultiWriterWithPartialUpdatePayload() throws 
Exception {
+conf.setString(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), 
WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name());
+conf.setString(FlinkOptions.INDEX_TYPE, 
HoodieIndex.IndexType.BUCKET.name());
+conf.setString(FlinkOptions.PAYLOAD_CLASS_NAME, 
PartialUpdateAvroPayload.class.getName());
+// disable schedule compaction in writers
+conf.setBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED, false);
+conf.setBoolean(FlinkOptions.PRE_COMBINE, true);
+
+// start pipeline1 and insert record: [id1,par1,id1,Danny,null,1,par1], 
suspend the tx commit
+List dataset1 = Collections.singletonList(
+insertRow(
+StringData.fromString("id1"), StringData.fromString("Danny"), null,
+TimestampData.fromEpochMillis(1), StringData.fromString("par1")));
+TestHarness pipeline1 = preparePipeline(conf)
+.consume(dataset1)
+.assertEmptyDataFiles();
+
+// start pipeline2 and insert record: [id1,par1,id1,null,23,1,par1], 
suspend the tx commit
+Configuration conf2 = conf.clone();
+conf2.setString(FlinkOptions.WRITE_CLIENT_ID, "2");
+List dataset2 = Collections.singletonList(
+insertRow(
+StringData.fromString("id1"), null, 23,
+TimestampData.fromEpochMillis(2), StringData.fromString("par1")));
+TestHarness pipeline2 = preparePipeline(conf2)
+.consume(dataset2)
+.assertEmptyDataFiles();
+
+// step to commit the 1st txn

Review Comment:
   Can we add another case: we have both completed and inflight instants, then 
we schedule a compaction, and we validate the ro and rt view for correctness.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358070940


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/TransactionUtils.java:
##
@@ -67,7 +67,8 @@ public static Option 
resolveWriteConflictIfAny(
   Option lastCompletedTxnOwnerInstant,
   boolean reloadActiveTimeline,
   Set pendingInstants) throws HoodieWriteConflictException {
-if 
(config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()) {
+// Skip to resolve conflict if there are multiple writers working on the 
same MOR table with simple bucket index
+if 
(config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() && 
!config.isLocklessMultiWriter()) {
   // deal with pendingInstants

Review Comment:
   In `HoodieFlinkWriteClient.preTxn`, we can also skip the metadata collection 
if the NB-CC is enabled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358068254


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java:
##
@@ -538,11 +539,20 @@ public void testWriteMultiWriterInvolved() throws 
Exception {
 .assertNextEvent()
 .checkpointComplete(1)
 .checkWrittenData(EXPECTED3, 1);
-// step to commit the 2nd txn, should throw exception
-// for concurrent modification of same fileGroups
-pipeline1.checkpoint(1)
-.assertNextEvent()
-.checkpointCompleteThrows(1, HoodieWriteConflictException.class, 
"Cannot resolve conflicts");
+// step to commit the 2nd txn
+pipeline1 = pipeline1
+.checkpoint(1)
+.assertNextEvent();
+if (OptionsResolver.isLocklessMultiWriter(conf)) {
+  // should success for concurrent modification of same fileGroups if 
using lockless multi writers

Review Comment:
   It should be always true? `OptionsResolver.isLocklessMultiWriter(conf)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358061351


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##
@@ -360,6 +367,13 @@ public static boolean 
allowCommitOnEmptyBatch(Configuration conf) {
 return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false);
   }
 
+  /**
+   * Returns whether this is lockless multi-writer ingestion.
+   */
+  public static boolean isLocklessMultiWriter(Configuration config) {
+return isMorTable(config) && isSimpleBucketIndexType(config) && 
isOptimisticConcurrencyControl(config);

Review Comment:
   `isLocklessMultiWriter` -> `isNonBlockingConcurrencyControl`, also fix the 
comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358061351


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##
@@ -360,6 +367,13 @@ public static boolean 
allowCommitOnEmptyBatch(Configuration conf) {
 return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false);
   }
 
+  /**
+   * Returns whether this is lockless multi-writer ingestion.
+   */
+  public static boolean isLocklessMultiWriter(Configuration config) {
+return isMorTable(config) && isSimpleBucketIndexType(config) && 
isOptimisticConcurrencyControl(config);

Review Comment:
   `isLocklessMultiWriter` -> `isNonBlockingConcurrencyControl`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358057951


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##
@@ -2609,6 +2614,12 @@ public Integer getWritesFileIdEncoding() {
 return props.getInteger(WRITES_FILEID_ENCODING, 
HoodieMetadataPayload.RECORD_INDEX_FIELD_FILEID_ENCODING_UUID);
   }
 
+  public boolean isLocklessMultiWriter() {
+return getTableType().equals(HoodieTableType.MERGE_ON_READ)

Review Comment:
   `isLocklessMultiWriter` -> `isNonBlockingConcurrencyControl`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1358056689


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java:
##
@@ -540,9 +541,7 @@ public void testWriteMultiWriterInvolved() throws Exception 
{
 .checkWrittenData(EXPECTED3, 1);
 // step to commit the 2nd txn, should throw exception
 // for concurrent modification of same fileGroups
-pipeline1.checkpoint(1)
-.assertNextEvent()
-.checkpointCompleteThrows(1, HoodieWriteConflictException.class, 
"Cannot resolve conflicts");
+testConcurrentCommit(pipeline1);

Review Comment:
   Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761235101

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305)
 
   * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761221614

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305)
 
   * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


beyond1920 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1357983985


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java:
##
@@ -540,9 +541,7 @@ public void testWriteMultiWriterInvolved() throws Exception 
{
 .checkWrittenData(EXPECTED3, 1);
 // step to commit the 2nd txn, should throw exception
 // for concurrent modification of same fileGroups
-pipeline1.checkpoint(1)
-.assertNextEvent()
-.checkpointCompleteThrows(1, HoodieWriteConflictException.class, 
"Cannot resolve conflicts");
+testConcurrentCommit(pipeline1);

Review Comment:
   The behavior of existed test cases (`testWriteMultiWriterInvolved` and 
`testWriteMultiWriterPartialOverlapping`) change after support non-blocking 
multiple writers. So those two tests need to be updated.
   I also add some new tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-13 Thread via GitHub


beyond1920 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1357981634


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java:
##
@@ -524,9 +528,7 @@ private void checkInstantState(HoodieInstant.State state, 
String instantStr) {
 }
 
 protected String lastCompleteInstant() {
-  return OptionsResolver.isMorTable(conf)
-  ? TestUtils.getLastDeltaCompleteInstant(basePath)
-  : TestUtils.getLastCompleteInstant(basePath, 
HoodieTimeline.COMMIT_ACTION);
+  return this.ckpMetadata.lastCompleteInstant();

Review Comment:
   Updated.
   Add a special branch to handle lockless multiple writers.
   For multiple writes, the writer started first might finish later, similarly, 
the writer started later might finish first. So the last completed instant from 
ckp metadata might not be equals to the last completed instant from timeline.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-12 Thread via GitHub


beyond1920 commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1760672959

   @danny0405 Thanks for review. I would add more tests soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-12 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1357634569


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java:
##
@@ -540,9 +541,7 @@ public void testWriteMultiWriterInvolved() throws Exception 
{
 .checkWrittenData(EXPECTED3, 1);
 // step to commit the 2nd txn, should throw exception
 // for concurrent modification of same fileGroups
-pipeline1.checkpoint(1)
-.assertNextEvent()
-.checkpointCompleteThrows(1, HoodieWriteConflictException.class, 
"Cannot resolve conflicts");
+testConcurrentCommit(pipeline1);

Review Comment:
   Can we add new tests instead of ammending existing one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-12 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1357634270


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteMergeOnRead.java:
##
@@ -213,6 +213,14 @@ protected Map getMiniBatchExpected() {
 return expected;
   }
 
+  @Override
+  protected void testConcurrentCommit(TestHarness pipeline) throws Exception {
+pipeline.checkpoint(1)
+.assertNextEvent()

Review Comment:
   I'm confused by the method name, is it a concurrent commit? The pipeline 
only commits 1 commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-12 Thread via GitHub


danny0405 commented on code in PR #9850:
URL: https://github.com/apache/hudi/pull/9850#discussion_r1357633319


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java:
##
@@ -524,9 +528,7 @@ private void checkInstantState(HoodieInstant.State state, 
String instantStr) {
 }
 
 protected String lastCompleteInstant() {
-  return OptionsResolver.isMorTable(conf)
-  ? TestUtils.getLastDeltaCompleteInstant(basePath)
-  : TestUtils.getLastCompleteInstant(basePath, 
HoodieTimeline.COMMIT_ACTION);
+  return this.ckpMetadata.lastCompleteInstant();

Review Comment:
   Why we must fetch the instant from ckp metadata?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-12 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1759954394

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-12 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1759544663

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]

2023-10-12 Thread via GitHub


hudi-bot commented on PR #9850:
URL: https://github.com/apache/hudi/pull/9850#issuecomment-1759527066

   
   ## CI report:
   
   * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org