Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 merged PR #9850: URL: https://github.com/apache/hudi/pull/9850 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1763821127 ## CI report: * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * 1297401002c4712836c5c09f56868add98f6f847 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20339) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1763680880 ## CI report: * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332) * 1297401002c4712836c5c09f56868add98f6f847 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20339) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1763675402 ## CI report: * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332) * 1297401002c4712836c5c09f56868add98f6f847 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1763620590 [6480.patch.zip](https://github.com/apache/hudi/files/12911422/6480.patch.zip) Thanks for the contribution, I have reviewed and created a patch, you need to rebase with latest master then apply the path. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1359177852 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java: ## @@ -524,9 +524,15 @@ private void checkInstantState(HoodieInstant.State state, String instantStr) { } protected String lastCompleteInstant() { - return OptionsResolver.isMorTable(conf) - ? TestUtils.getLastDeltaCompleteInstant(basePath) - : TestUtils.getLastCompleteInstant(basePath, HoodieTimeline.COMMIT_ACTION); + if (OptionsResolver.isLocklessMultiWriter(this.conf)) { +// If there are lockless multiple writer, fetch last complete instant of current writer from ckp metadata +// because the writer started first might finish later, similarly, the writer started later might finish first. +return this.ckpMetadata.lastCompleteInstant(); + } else { Review Comment: Does `this.ckpMetadata.isEmpty()` indicate a failover? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761799261 ## CI report: * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761786081 ## CI report: * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * a247acb83fce64f8f9b8e9f696a53a629534b4f6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20331) * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761621340 ## CI report: * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327) * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * a247acb83fce64f8f9b8e9f696a53a629534b4f6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20331) * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761538586 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305) * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327) * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * a247acb83fce64f8f9b8e9f696a53a629534b4f6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20331) * f73c1b694ae0cb5557a109f28cfe25f87a264825 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20332) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761523198 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305) * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327) * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * a247acb83fce64f8f9b8e9f696a53a629534b4f6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20331) * f73c1b694ae0cb5557a109f28cfe25f87a264825 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761462595 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305) * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327) * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN * a247acb83fce64f8f9b8e9f696a53a629534b4f6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761447384 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305) * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327) * d827ee5a0f974aa38b4298f45d6a7d4f7e465b24 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
beyond1920 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358191937 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java: ## @@ -524,9 +524,15 @@ private void checkInstantState(HoodieInstant.State state, String instantStr) { } protected String lastCompleteInstant() { - return OptionsResolver.isMorTable(conf) - ? TestUtils.getLastDeltaCompleteInstant(basePath) - : TestUtils.getLastCompleteInstant(basePath, HoodieTimeline.COMMIT_ACTION); + if (OptionsResolver.isLocklessMultiWriter(this.conf)) { +// If there are lockless multiple writer, fetch last complete instant of current writer from ckp metadata +// because the writer started first might finish later, similarly, the writer started later might finish first. +return this.ckpMetadata.lastCompleteInstant(); + } else { Review Comment: In most case `return this.ckpMetadata.lastCompleteInstant();` directly works fine. But the case `testSubtaskFails` would fail because failover happens, the ckp meta might be cleaned, fetch the instant from timeline. I add branch to handle this case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
beyond1920 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358145916 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java: ## @@ -538,11 +539,20 @@ public void testWriteMultiWriterInvolved() throws Exception { .assertNextEvent() .checkpointComplete(1) .checkWrittenData(EXPECTED3, 1); -// step to commit the 2nd txn, should throw exception -// for concurrent modification of same fileGroups -pipeline1.checkpoint(1) -.assertNextEvent() -.checkpointCompleteThrows(1, HoodieWriteConflictException.class, "Cannot resolve conflicts"); +// step to commit the 2nd txn +pipeline1 = pipeline1 +.checkpoint(1) +.assertNextEvent(); +if (OptionsResolver.isLocklessMultiWriter(conf)) { + // should success for concurrent modification of same fileGroups if using lockless multi writers Review Comment: No, it's only be true if all the following condition is satisfied: * the table is MOR table * the index is simple bucket index * enable OPTIMISTIC_CONCURRENCY_CONTROL So it could be false for COW table or MOR table with other index type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358087824 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java: ## @@ -524,9 +524,15 @@ private void checkInstantState(HoodieInstant.State state, String instantStr) { } protected String lastCompleteInstant() { - return OptionsResolver.isMorTable(conf) - ? TestUtils.getLastDeltaCompleteInstant(basePath) - : TestUtils.getLastCompleteInstant(basePath, HoodieTimeline.COMMIT_ACTION); + if (OptionsResolver.isLocklessMultiWriter(this.conf)) { +// If there are lockless multiple writer, fetch last complete instant of current writer from ckp metadata +// because the writer started first might finish later, similarly, the writer started later might finish first. +return this.ckpMetadata.lastCompleteInstant(); + } else { Review Comment: Got it, maybe we can use `return this.ckpMetadata.lastCompleteInstant();` directly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358086955 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteMergeOnReadWithCompact.java: ## @@ -70,6 +86,80 @@ protected Map getMiniBatchExpected() { return expected; } + @Test + public void testLocklessMultiWriterWithPartialUpdatePayload() throws Exception { +conf.setString(HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key(), WriteConcurrencyMode.OPTIMISTIC_CONCURRENCY_CONTROL.name()); +conf.setString(FlinkOptions.INDEX_TYPE, HoodieIndex.IndexType.BUCKET.name()); +conf.setString(FlinkOptions.PAYLOAD_CLASS_NAME, PartialUpdateAvroPayload.class.getName()); +// disable schedule compaction in writers +conf.setBoolean(FlinkOptions.COMPACTION_SCHEDULE_ENABLED, false); +conf.setBoolean(FlinkOptions.PRE_COMBINE, true); + +// start pipeline1 and insert record: [id1,par1,id1,Danny,null,1,par1], suspend the tx commit +List dataset1 = Collections.singletonList( +insertRow( +StringData.fromString("id1"), StringData.fromString("Danny"), null, +TimestampData.fromEpochMillis(1), StringData.fromString("par1"))); +TestHarness pipeline1 = preparePipeline(conf) +.consume(dataset1) +.assertEmptyDataFiles(); + +// start pipeline2 and insert record: [id1,par1,id1,null,23,1,par1], suspend the tx commit +Configuration conf2 = conf.clone(); +conf2.setString(FlinkOptions.WRITE_CLIENT_ID, "2"); +List dataset2 = Collections.singletonList( +insertRow( +StringData.fromString("id1"), null, 23, +TimestampData.fromEpochMillis(2), StringData.fromString("par1"))); +TestHarness pipeline2 = preparePipeline(conf2) +.consume(dataset2) +.assertEmptyDataFiles(); + +// step to commit the 1st txn Review Comment: Can we add another case: we have both completed and inflight instants, then we schedule a compaction, and we validate the ro and rt view for correctness. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358070940 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/TransactionUtils.java: ## @@ -67,7 +67,8 @@ public static Option resolveWriteConflictIfAny( Option lastCompletedTxnOwnerInstant, boolean reloadActiveTimeline, Set pendingInstants) throws HoodieWriteConflictException { -if (config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl()) { +// Skip to resolve conflict if there are multiple writers working on the same MOR table with simple bucket index +if (config.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl() && !config.isLocklessMultiWriter()) { // deal with pendingInstants Review Comment: In `HoodieFlinkWriteClient.preTxn`, we can also skip the metadata collection if the NB-CC is enabled. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358068254 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java: ## @@ -538,11 +539,20 @@ public void testWriteMultiWriterInvolved() throws Exception { .assertNextEvent() .checkpointComplete(1) .checkWrittenData(EXPECTED3, 1); -// step to commit the 2nd txn, should throw exception -// for concurrent modification of same fileGroups -pipeline1.checkpoint(1) -.assertNextEvent() -.checkpointCompleteThrows(1, HoodieWriteConflictException.class, "Cannot resolve conflicts"); +// step to commit the 2nd txn +pipeline1 = pipeline1 +.checkpoint(1) +.assertNextEvent(); +if (OptionsResolver.isLocklessMultiWriter(conf)) { + // should success for concurrent modification of same fileGroups if using lockless multi writers Review Comment: It should be always true? `OptionsResolver.isLocklessMultiWriter(conf)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358061351 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java: ## @@ -360,6 +367,13 @@ public static boolean allowCommitOnEmptyBatch(Configuration conf) { return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false); } + /** + * Returns whether this is lockless multi-writer ingestion. + */ + public static boolean isLocklessMultiWriter(Configuration config) { +return isMorTable(config) && isSimpleBucketIndexType(config) && isOptimisticConcurrencyControl(config); Review Comment: `isLocklessMultiWriter` -> `isNonBlockingConcurrencyControl`, also fix the comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358061351 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java: ## @@ -360,6 +367,13 @@ public static boolean allowCommitOnEmptyBatch(Configuration conf) { return conf.getBoolean(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), false); } + /** + * Returns whether this is lockless multi-writer ingestion. + */ + public static boolean isLocklessMultiWriter(Configuration config) { +return isMorTable(config) && isSimpleBucketIndexType(config) && isOptimisticConcurrencyControl(config); Review Comment: `isLocklessMultiWriter` -> `isNonBlockingConcurrencyControl` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358057951 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -2609,6 +2614,12 @@ public Integer getWritesFileIdEncoding() { return props.getInteger(WRITES_FILEID_ENCODING, HoodieMetadataPayload.RECORD_INDEX_FIELD_FILEID_ENCODING_UUID); } + public boolean isLocklessMultiWriter() { +return getTableType().equals(HoodieTableType.MERGE_ON_READ) Review Comment: `isLocklessMultiWriter` -> `isNonBlockingConcurrencyControl` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1358056689 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java: ## @@ -540,9 +541,7 @@ public void testWriteMultiWriterInvolved() throws Exception { .checkWrittenData(EXPECTED3, 1); // step to commit the 2nd txn, should throw exception // for concurrent modification of same fileGroups -pipeline1.checkpoint(1) -.assertNextEvent() -.checkpointCompleteThrows(1, HoodieWriteConflictException.class, "Cannot resolve conflicts"); +testConcurrentCommit(pipeline1); Review Comment: Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761235101 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305) * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20327) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1761221614 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305) * d6a5091c80ac89d8cfe4527e1eb63c962bfb17df UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
beyond1920 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1357983985 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java: ## @@ -540,9 +541,7 @@ public void testWriteMultiWriterInvolved() throws Exception { .checkWrittenData(EXPECTED3, 1); // step to commit the 2nd txn, should throw exception // for concurrent modification of same fileGroups -pipeline1.checkpoint(1) -.assertNextEvent() -.checkpointCompleteThrows(1, HoodieWriteConflictException.class, "Cannot resolve conflicts"); +testConcurrentCommit(pipeline1); Review Comment: The behavior of existed test cases (`testWriteMultiWriterInvolved` and `testWriteMultiWriterPartialOverlapping`) change after support non-blocking multiple writers. So those two tests need to be updated. I also add some new tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
beyond1920 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1357981634 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java: ## @@ -524,9 +528,7 @@ private void checkInstantState(HoodieInstant.State state, String instantStr) { } protected String lastCompleteInstant() { - return OptionsResolver.isMorTable(conf) - ? TestUtils.getLastDeltaCompleteInstant(basePath) - : TestUtils.getLastCompleteInstant(basePath, HoodieTimeline.COMMIT_ACTION); + return this.ckpMetadata.lastCompleteInstant(); Review Comment: Updated. Add a special branch to handle lockless multiple writers. For multiple writes, the writer started first might finish later, similarly, the writer started later might finish first. So the last completed instant from ckp metadata might not be equals to the last completed instant from timeline. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
beyond1920 commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1760672959 @danny0405 Thanks for review. I would add more tests soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1357634569 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteCopyOnWrite.java: ## @@ -540,9 +541,7 @@ public void testWriteMultiWriterInvolved() throws Exception { .checkWrittenData(EXPECTED3, 1); // step to commit the 2nd txn, should throw exception // for concurrent modification of same fileGroups -pipeline1.checkpoint(1) -.assertNextEvent() -.checkpointCompleteThrows(1, HoodieWriteConflictException.class, "Cannot resolve conflicts"); +testConcurrentCommit(pipeline1); Review Comment: Can we add new tests instead of ammending existing one? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1357634270 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestWriteMergeOnRead.java: ## @@ -213,6 +213,14 @@ protected Map getMiniBatchExpected() { return expected; } + @Override + protected void testConcurrentCommit(TestHarness pipeline) throws Exception { +pipeline.checkpoint(1) +.assertNextEvent() Review Comment: I'm confused by the method name, is it a concurrent commit? The pipeline only commits 1 commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
danny0405 commented on code in PR #9850: URL: https://github.com/apache/hudi/pull/9850#discussion_r1357633319 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/utils/TestWriteBase.java: ## @@ -524,9 +528,7 @@ private void checkInstantState(HoodieInstant.State state, String instantStr) { } protected String lastCompleteInstant() { - return OptionsResolver.isMorTable(conf) - ? TestUtils.getLastDeltaCompleteInstant(basePath) - : TestUtils.getLastCompleteInstant(basePath, HoodieTimeline.COMMIT_ACTION); + return this.ckpMetadata.lastCompleteInstant(); Review Comment: Why we must fetch the instant from ckp metadata? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1759954394 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1759544663 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20305) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6480] Flink support non-blocking concurrency control [hudi]
hudi-bot commented on PR #9850: URL: https://github.com/apache/hudi/pull/9850#issuecomment-1759527066 ## CI report: * 72aebcc59f5ebebc64402dc8d1d9a491474b1dd0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org