[GitHub] [hudi] hudi-bot removed a comment on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload
hudi-bot removed a comment on pull request #4660: URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018227679

## CI report:

* 590944041ba967d5390e5cc3d9b937226b6705af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5405)

## Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload
hudi-bot commented on pull request #4660: URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018262142

## CI report:

* 590944041ba967d5390e5cc3d9b937226b6705af Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5405)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot removed a comment on pull request #4649: URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018227649

## CI report:

* c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
* d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot commented on pull request #4649: URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018258609

## CI report:

* c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
* d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5410)
[GitHub] [hudi] Guanpx commented on issue #4658: [SUPPORT] Data lose with Flink write COW insert table, Flink web UI show Records Received was different with HIVE count(1)
Guanpx commented on issue #4658: URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018254469

> So you use the `upsert` mode right ? And the hoodie table has a pk there ?

We use `insert` (append) mode and do not have a unique key. Will the data be deduplicated?
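For context on the question above: Hudi's `insert` operation appends records without looking them up in an index, so duplicates are generally not removed; key-based deduplication happens with `upsert` against a primary key. A hedged Flink SQL sketch of the upsert setup (the table name, path, and field names here are made up for illustration, not taken from this thread):

```sql
-- Hypothetical table: with a PRIMARY KEY and 'write.operation' = 'upsert',
-- records that share the same key are merged rather than duplicated.
-- With 'write.operation' = 'insert' and no key, rows are simply appended.
CREATE TABLE hudi_orders (
  order_id STRING,
  amount DOUBLE,
  ts TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_orders',
  'table.type' = 'COPY_ON_WRITE',
  'write.operation' = 'upsert'
);
```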
[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient
hudi-bot commented on pull request #3866: URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018248383

## CI report:

* 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
* 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
* 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
* 6ded004f02b3a5ca4b8314f66df59a1abc9bf5a3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5409)
[GitHub] [hudi] hudi-bot removed a comment on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient
hudi-bot removed a comment on pull request #3866: URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018246488

## CI report:

* 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
* 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
* 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
* 7e96f0f751a745f3a77bed4461099aee2c00f697 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5402)
* 6ded004f02b3a5ca4b8314f66df59a1abc9bf5a3 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient
hudi-bot removed a comment on pull request #3866: URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018216050

## CI report:

* 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
* 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
* 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
* 7e96f0f751a745f3a77bed4461099aee2c00f697 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5402)
[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient
hudi-bot commented on pull request #3866: URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018246488

## CI report:

* 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
* 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
* 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
* 7e96f0f751a745f3a77bed4461099aee2c00f697 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5402)
* 6ded004f02b3a5ca4b8314f66df59a1abc9bf5a3 UNKNOWN
[GitHub] [hudi] watermelon12138 commented on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …
watermelon12138 commented on pull request #4645: URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018234112

@nsivabalan OK, thank you very much. This is very good advice and I will try to land it.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4662: [HUDI-3293] Fixing default value for clustering small file config
hudi-bot removed a comment on pull request #4662: URL: https://github.com/apache/hudi/pull/4662#issuecomment-1018230462

## CI report:

* 789ecb457d2f5424674d512dd62d64480edc8c36 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4662: [HUDI-3293] Fixing default value for clustering small file config
hudi-bot commented on pull request #4662: URL: https://github.com/apache/hudi/pull/4662#issuecomment-1018232163

## CI report:

* 789ecb457d2f5424674d512dd62d64480edc8c36 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5408)
[GitHub] [hudi] nsivabalan commented on pull request #2903: [HUDI-1850][HUDI-3234] Fixing read of a empty table but with failed write
nsivabalan commented on pull request #2903: URL: https://github.com/apache/hudi/pull/2903#issuecomment-1018232403

@YannByron: Can you review the patch?
[GitHub] [hudi] hudi-bot commented on pull request #4661: [HUDI-3292] Enabling lazy read by default for log blocks during compaction
hudi-bot commented on pull request #4661: URL: https://github.com/apache/hudi/pull/4661#issuecomment-1018232140

## CI report:

* aa1156a61a9a6f5559597eda6231567bf55fde42 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5407)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4661: [HUDI-3292] Enabling lazy read by default for log blocks during compaction
hudi-bot removed a comment on pull request #4661: URL: https://github.com/apache/hudi/pull/4661#issuecomment-1018230440

## CI report:

* aa1156a61a9a6f5559597eda6231567bf55fde42 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type
hudi-bot commented on pull request #4659: URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018230413

## CI report:

* 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
* cc6512086b494976e154cf2db10597953d3c71d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5406)
[GitHub] [hudi] hudi-bot commented on pull request #4662: [HUDI-3293] Fixing default value for clustering small file config
hudi-bot commented on pull request #4662: URL: https://github.com/apache/hudi/pull/4662#issuecomment-1018230462

## CI report:

* 789ecb457d2f5424674d512dd62d64480edc8c36 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4661: [HUDI-3292] Enabling lazy read by default for log blocks during compaction
hudi-bot commented on pull request #4661: URL: https://github.com/apache/hudi/pull/4661#issuecomment-1018230440

## CI report:

* aa1156a61a9a6f5559597eda6231567bf55fde42 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type
hudi-bot removed a comment on pull request #4659: URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018229037

## CI report:

* 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
* cc6512086b494976e154cf2db10597953d3c71d4 UNKNOWN
[GitHub] [hudi] wxplovecc commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true
wxplovecc commented on a change in pull request #4654: URL: https://github.com/apache/hudi/pull/4654#discussion_r789394982

## File path: hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java

## @@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception {
   */
  private void waitForBootstrapReady(int taskID) {
    int taskNum = getRuntimeContext().getNumberOfParallelSubtasks();
+   int attemptNum = getRuntimeContext().getAttemptNumber();
    int readyTaskNum = 1;
    while (taskNum != readyTaskNum) {
      try {
-       readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction());
-       LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID);
+       readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction());
+       LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum);

Review comment: Yes, you are right: after a failover, the `updateGlobalAggregate` function returns the previous attempt's accumulator info.
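The fix discussed above scopes the shared aggregate's name by the task's attempt number, so a restarted attempt does not read the accumulator left behind by a failed earlier attempt. A minimal, self-contained sketch of that idea in plain Java (the in-memory map stands in for Flink's JobManager-side global aggregate state; the class, method, and name strings here are hypothetical, not Hudi's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class AttemptScopedAggregate {
    // Stand-in for the job-wide aggregate state: name -> number of
    // readiness reports received under that name.
    static final Map<String, Integer> STORE = new HashMap<>();

    // Mimics updateGlobalAggregate: records one readiness report under
    // `name` and returns how many reports that name has seen so far.
    // (taskId is kept only for signature parity with the real call.)
    static int updateGlobalAggregate(String name, int taskId) {
        return STORE.merge(name, 1, Integer::sum);
    }

    public static void main(String[] args) {
        String base = "bootstrap_agg_function";

        // Attempt 0: two subtasks report ready under the un-scoped name.
        updateGlobalAggregate(base, 0);
        int beforeFailover = updateGlobalAggregate(base, 1); // reaches 2: all ready

        // After failover, attempt 1 reuses the un-scoped name: the stale
        // accumulator inflates the count, so one report already looks like 3.
        int stale = updateGlobalAggregate(base, 0);

        // Scoping the name by attempt number starts a fresh accumulator.
        int fresh = updateGlobalAggregate(base + "_" + 1, 0);

        System.out.println(beforeFailover + " " + stale + " " + fresh); // 2 3 1
    }
}
```

The design point is simply that a job-wide accumulator keyed only by a constant name survives task restarts, so any per-attempt barrier built on it must key by something that changes across attempts.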
[GitHub] [hudi] hudi-bot commented on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type
hudi-bot commented on pull request #4659: URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018229037

## CI report:

* 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
* cc6512086b494976e154cf2db10597953d3c71d4 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type
hudi-bot removed a comment on pull request #4659: URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018227672

## CI report:

* 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
[jira] [Updated] (HUDI-3292) Enable lazy read of log blocks for compaction
[ https://issues.apache.org/jira/browse/HUDI-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3292:
---------------------------------
    Labels: pull-request-available  (was: )

> Enable lazy read of log blocks for compaction
> ---------------------------------------------
>
>                 Key: HUDI-3292
>                 URL: https://issues.apache.org/jira/browse/HUDI-3292
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: compaction
>            Reporter: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] nsivabalan opened a new pull request #4661: [HUDI-3292] Enabling lazy read by default for log blocks during compaction
nsivabalan opened a new pull request #4661: URL: https://github.com/apache/hudi/pull/4661

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] nsivabalan opened a new pull request #4662: [HUDI-3293] Fixing default value for clustering small file config
nsivabalan opened a new pull request #4662: URL: https://github.com/apache/hudi/pull/4662

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Updated] (HUDI-3293) Fix default value for clustering small file size
[ https://issues.apache.org/jira/browse/HUDI-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3293:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix default value for clustering small file size
> ------------------------------------------------
>
>                 Key: HUDI-3293
>                 URL: https://issues.apache.org/jira/browse/HUDI-3293
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: clustering
>            Reporter: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
[jira] [Updated] (HUDI-3293) Fix default value for clustering small file size
[ https://issues.apache.org/jira/browse/HUDI-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3293:
--------------------------------------
    Fix Version/s: 0.11.0

> Fix default value for clustering small file size
> ------------------------------------------------
>
>                 Key: HUDI-3293
>                 URL: https://issues.apache.org/jira/browse/HUDI-3293
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: clustering
>            Reporter: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.11.0
>
[jira] [Created] (HUDI-3293) Fix default value for clustering small file size
sivabalan narayanan created HUDI-3293:
--------------------------------------
             Summary: Fix default value for clustering small file size
                 Key: HUDI-3293
                 URL: https://issues.apache.org/jira/browse/HUDI-3293
             Project: Apache Hudi
          Issue Type: Task
          Components: clustering
            Reporter: sivabalan narayanan
[GitHub] [hudi] hudi-bot removed a comment on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload
hudi-bot removed a comment on pull request #4660: URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018226185

## CI report:

* 590944041ba967d5390e5cc3d9b937226b6705af UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot commented on pull request #4649: URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018227649

## CI report:

* c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
* d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload
hudi-bot commented on pull request #4660: URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018227679

## CI report:

* 590944041ba967d5390e5cc3d9b937226b6705af Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5405)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot removed a comment on pull request #4649: URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018226144

## CI report:

* b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
* c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
* d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type
hudi-bot commented on pull request #4659: URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018227672

## CI report:

* 1ec3d9b036d2a743243dd75556f6eb3492e0126f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5404)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type
hudi-bot removed a comment on pull request #4659: URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018226166

## CI report:

* 1ec3d9b036d2a743243dd75556f6eb3492e0126f UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type
hudi-bot commented on pull request #4659: URL: https://github.com/apache/hudi/pull/4659#issuecomment-1018226166

## CI report:

* 1ec3d9b036d2a743243dd75556f6eb3492e0126f UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload
hudi-bot commented on pull request #4660: URL: https://github.com/apache/hudi/pull/4660#issuecomment-1018226185

## CI report:

* 590944041ba967d5390e5cc3d9b937226b6705af UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot commented on pull request #4649: URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018226144

## CI report:

* b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
* c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
* d3dd5ae21bb4df56967d4d5eec18d9358f0f0cb9 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot removed a comment on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018224665

## CI report:

* b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
* c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
[jira] [Updated] (HUDI-3292) Enable lazy read of log blocks for compaction
[ https://issues.apache.org/jira/browse/HUDI-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3292:
--------------------------------------
    Fix Version/s: 0.11.0

> Enable lazy read of log blocks for compaction
> ---------------------------------------------
>
>          Key: HUDI-3292
>          URL: https://issues.apache.org/jira/browse/HUDI-3292
>      Project: Apache Hudi
>   Issue Type: Task
>   Components: compaction
>     Reporter: sivabalan narayanan
>     Priority: Major
>      Fix For: 0.11.0

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3292) Enable lazy read of log blocks for compaction
sivabalan narayanan created HUDI-3292:
--------------------------------------

     Summary: Enable lazy read of log blocks for compaction
         Key: HUDI-3292
         URL: https://issues.apache.org/jira/browse/HUDI-3292
     Project: Apache Hudi
  Issue Type: Task
  Components: compaction
    Reporter: sivabalan narayanan
[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot commented on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018224665

## CI report:

* b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
* c44a34bc4a46da4918493ca95967cf0fbddbfe70 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5403)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot removed a comment on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018223472

## CI report:

* b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
* c44a34bc4a46da4918493ca95967cf0fbddbfe70 UNKNOWN
[jira] [Updated] (HUDI-3291) Flip default record payload to DefaultHoodieRecordPayload
[ https://issues.apache.org/jira/browse/HUDI-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3291:
---------------------------------
    Labels: pull-request-available  (was: )

> Flip default record payload to DefaultHoodieRecordPayload
> ---------------------------------------------------------
>
>          Key: HUDI-3291
>          URL: https://issues.apache.org/jira/browse/HUDI-3291
>      Project: Apache Hudi
>   Issue Type: Task
>   Components: writer-core
>     Reporter: sivabalan narayanan
>     Priority: Major
>       Labels: pull-request-available
>      Fix For: 0.11.0
[jira] [Updated] (HUDI-3091) Make simple index the default hoodie.index.type
[ https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3091:
---------------------------------
    Labels: pull-request-available  (was: )

> Make simple index the default hoodie.index.type
> -----------------------------------------------
>
>          Key: HUDI-3091
>          URL: https://issues.apache.org/jira/browse/HUDI-3091
>      Project: Apache Hudi
>   Issue Type: New Feature
>   Components: index
>     Reporter: Vinoth Govindarajan
>     Assignee: sivabalan narayanan
>     Priority: Blocker
>       Labels: pull-request-available
>      Fix For: 0.11.0
>
> When performing upserts on derived datasets, we often run into OOM issues with the bloom filter, so we changed all of our dataset index types to simple to resolve the issue.
> Some of the tables were non-partitioned, and the bloom index is not the right choice for those.
> I'm proposing to make the simple index the default; on a case-by-case basis, folks can opt into the bloom filter for the additional performance gains it offers.
> I agree that performance will not be optimal, but for regular use cases the simple index may give sub-optimal read/write performance while never breaking any ingestion/derived jobs.
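[Editor's note] For readers tracking this proposal: the index type is controlled by the `hoodie.index.type` write config. A minimal sketch of pinning the choice explicitly in a job's Hudi properties, so that a flip of the project default would not silently change the job's behavior (the values shown are Hudi's documented index types):

```properties
# Pin the index type so the proposed default flip (BLOOM -> SIMPLE)
# does not silently change this job's behavior.
hoodie.index.type=SIMPLE

# For non-partitioned tables, the global variant is often the better fit:
# hoodie.index.type=GLOBAL_SIMPLE
```

Jobs that rely on the bloom index's performance characteristics would set `hoodie.index.type=BLOOM` the same way.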
[GitHub] [hudi] nsivabalan opened a new pull request #4660: [HUDI-3291] Flipping default record payload to DefaultHoodieRecordPayload
nsivabalan opened a new pull request #4660:
URL: https://github.com/apache/hudi/pull/4660

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*

- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*

- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] nsivabalan opened a new pull request #4659: [HUDI-3091] Making SIMPLE index as the default index type
nsivabalan opened a new pull request #4659:
URL: https://github.com/apache/hudi/pull/4659

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*

- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*

- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Updated] (HUDI-3291) Flip default record payload to DefaultHoodieRecordPayload
[ https://issues.apache.org/jira/browse/HUDI-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3291:
--------------------------------------
    Fix Version/s: 0.11.0

> Flip default record payload to DefaultHoodieRecordPayload
> ---------------------------------------------------------
>
>          Key: HUDI-3291
>          URL: https://issues.apache.org/jira/browse/HUDI-3291
>      Project: Apache Hudi
>   Issue Type: Task
>   Components: writer-core
>     Reporter: sivabalan narayanan
>     Priority: Major
>      Fix For: 0.11.0
[GitHub] [hudi] wxplovecc commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true
wxplovecc commented on a change in pull request #4654:
URL: https://github.com/apache/hudi/pull/4654#discussion_r789394982

## File path: hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java

@@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception {
    */
   private void waitForBootstrapReady(int taskID) {
     int taskNum = getRuntimeContext().getNumberOfParallelSubtasks();
+    int attemptNum = getRuntimeContext().getAttemptNumber();
     int readyTaskNum = 1;
     while (taskNum != readyTaskNum) {
       try {
-        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction());
-        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID);
+        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction());
+        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum);

Review comment: yes, you are right
[jira] [Created] (HUDI-3291) Flip default record payload to DefaultHoodieRecordPayload
sivabalan narayanan created HUDI-3291:
--------------------------------------

     Summary: Flip default record payload to DefaultHoodieRecordPayload
         Key: HUDI-3291
         URL: https://issues.apache.org/jira/browse/HUDI-3291
     Project: Apache Hudi
  Issue Type: Task
  Components: writer-core
    Reporter: sivabalan narayanan
[GitHub] [hudi] hudi-bot commented on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot commented on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1018223472

## CI report:

* b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
* c44a34bc4a46da4918493ca95967cf0fbddbfe70 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
hudi-bot removed a comment on pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#issuecomment-1017460532

## CI report:

* b9ae619a0beadc105fcec9466f5c29b97ff3af84 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5368) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5374)
[jira] [Created] (HUDI-3290) Make the .hoodie-partition-metadata an empty parquet file
Vinoth Govindarajan created HUDI-3290:
--------------------------------------

     Summary: Make the .hoodie-partition-metadata an empty parquet file
         Key: HUDI-3290
         URL: https://issues.apache.org/jira/browse/HUDI-3290
     Project: Apache Hudi
  Issue Type: New Feature
  Components: metadata
    Reporter: Vinoth Govindarajan
    Assignee: Vinoth Govindarajan

For the BigQuery and Snowflake integrations, we aren't able to create external tables when the partition folder contains the non-parquet file `.hoodie-partition-metadata`. I understand this is an important file for finding the .hoodie folder from within the partition folder, and the long-term solution is to get rid of this file. But as a short-term solution, if we can convert it to an empty parquet file and add the necessary depth information in the footer, it will pass the BigQuery/Snowflake external table validation and allow us to create an external parquet table on top of the Hudi folder structure.
[jira] [Closed] (HUDI-3278) Make Simple Index the default index type
[ https://issues.apache.org/jira/browse/HUDI-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan closed HUDI-3278.
-------------------------------------
    Resolution: Duplicate

> Make Simple Index the default index type
> ----------------------------------------
>
>          Key: HUDI-3278
>          URL: https://issues.apache.org/jira/browse/HUDI-3278
>      Project: Apache Hudi
>   Issue Type: Improvement
>     Reporter: Vinoth Chandar
>     Assignee: sivabalan narayanan
>     Priority: Blocker
>      Fix For: 0.11.0
[jira] [Updated] (HUDI-3091) Make simple index the default hoodie.index.type
[ https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3091:
--------------------------------------
    Priority: Blocker  (was: Major)

> Make simple index the default hoodie.index.type
> -----------------------------------------------
>
>          Key: HUDI-3091
>          URL: https://issues.apache.org/jira/browse/HUDI-3091
>      Project: Apache Hudi
>   Issue Type: New Feature
>   Components: index
>     Reporter: Vinoth Govindarajan
>     Assignee: sivabalan narayanan
>     Priority: Blocker
>
> When performing upserts on derived datasets, we often run into OOM issues with the bloom filter, so we changed all of our dataset index types to simple to resolve the issue.
> Some of the tables were non-partitioned, and the bloom index is not the right choice for those.
> I'm proposing to make the simple index the default; on a case-by-case basis, folks can opt into the bloom filter for the additional performance gains it offers.
> I agree that performance will not be optimal, but for regular use cases the simple index may give sub-optimal read/write performance while never breaking any ingestion/derived jobs.
[jira] [Assigned] (HUDI-3091) Make simple index the default hoodie.index.type
[ https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-3091:
-----------------------------------------
    Assignee: sivabalan narayanan

> Make simple index the default hoodie.index.type
> -----------------------------------------------
>
>          Key: HUDI-3091
>          URL: https://issues.apache.org/jira/browse/HUDI-3091
>      Project: Apache Hudi
>   Issue Type: New Feature
>   Components: index
>     Reporter: Vinoth Govindarajan
>     Assignee: sivabalan narayanan
>     Priority: Major
>
> When performing upserts on derived datasets, we often run into OOM issues with the bloom filter, so we changed all of our dataset index types to simple to resolve the issue.
> Some of the tables were non-partitioned, and the bloom index is not the right choice for those.
> I'm proposing to make the simple index the default; on a case-by-case basis, folks can opt into the bloom filter for the additional performance gains it offers.
> I agree that performance will not be optimal, but for regular use cases the simple index may give sub-optimal read/write performance while never breaking any ingestion/derived jobs.
[GitHub] [hudi] hudi-bot removed a comment on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient
hudi-bot removed a comment on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018214437

## CI report:

* 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
* 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
* 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
* c047f394e58415a14c5a4070627fd90a7d1106b6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2974)
* 7e96f0f751a745f3a77bed4461099aee2c00f697 UNKNOWN
[jira] [Updated] (HUDI-3091) Make simple index the default hoodie.index.type
[ https://issues.apache.org/jira/browse/HUDI-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-3091:
--------------------------------------
    Fix Version/s: 0.11.0

> Make simple index the default hoodie.index.type
> -----------------------------------------------
>
>          Key: HUDI-3091
>          URL: https://issues.apache.org/jira/browse/HUDI-3091
>      Project: Apache Hudi
>   Issue Type: New Feature
>   Components: index
>     Reporter: Vinoth Govindarajan
>     Assignee: sivabalan narayanan
>     Priority: Blocker
>      Fix For: 0.11.0
>
> When performing upserts on derived datasets, we often run into OOM issues with the bloom filter, so we changed all of our dataset index types to simple to resolve the issue.
> Some of the tables were non-partitioned, and the bloom index is not the right choice for those.
> I'm proposing to make the simple index the default; on a case-by-case basis, folks can opt into the bloom filter for the additional performance gains it offers.
> I agree that performance will not be optimal, but for regular use cases the simple index may give sub-optimal read/write performance while never breaking any ingestion/derived jobs.
[jira] [Closed] (HUDI-2978) Change default index type to Simple
[ https://issues.apache.org/jira/browse/HUDI-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan closed HUDI-2978.
-------------------------------------
    Resolution: Duplicate

> Change default index type to Simple
> -----------------------------------
>
>          Key: HUDI-2978
>          URL: https://issues.apache.org/jira/browse/HUDI-2978
>      Project: Apache Hudi
>   Issue Type: Task
>     Reporter: Manoj Govindassamy
>     Assignee: Manoj Govindassamy
>     Priority: Major
>       Labels: release-notes, sev:high
>      Fix For: 0.11.0
>
> Today the default index type is Bloom. For read-update-all workloads, the simple index is more performant than the Bloom index. Better to have Simple as the default index type and choose Bloom only where the workload calls for it.
[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient
hudi-bot commented on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018216050

## CI report:

* 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
* 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
* 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
* 7e96f0f751a745f3a77bed4461099aee2c00f697 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5402)
[GitHub] [hudi] danny0405 commented on issue #4658: [SUPPORT] Data lose with Flink write COW insert table, Flink web UI show Records Received was different with HIVE count(1)
danny0405 commented on issue #4658:
URL: https://github.com/apache/hudi/issues/4658#issuecomment-1018215386

So you are using `upsert` mode, right? And the Hoodie table has a primary key there?
[GitHub] [hudi] hudi-bot removed a comment on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient
hudi-bot removed a comment on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-961589833

## CI report:

* 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
* 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
* 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
* c047f394e58415a14c5a4070627fd90a7d1106b6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2974)
[GitHub] [hudi] hudi-bot commented on pull request #3866: [HUDI-1430] SparkDataFrameWriteClient
hudi-bot commented on pull request #3866:
URL: https://github.com/apache/hudi/pull/3866#issuecomment-1018214437

## CI report:

* 8144fcd5285a5f53f4a76c4327e0bb8c90b46c97 UNKNOWN
* 01cb7594fc6b49dcdde255269d43f4b97d5193ce UNKNOWN
* 7d3e9053f159b07c3266e4eef1dc0c17bb850b59 UNKNOWN
* c047f394e58415a14c5a4070627fd90a7d1106b6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2974)
* 7e96f0f751a745f3a77bed4461099aee2c00f697 UNKNOWN
[GitHub] [hudi] danny0405 commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true
danny0405 commented on a change in pull request #4654:
URL: https://github.com/apache/hudi/pull/4654#discussion_r789387075

## File path: hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java

@@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception {
    */
   private void waitForBootstrapReady(int taskID) {
     int taskNum = getRuntimeContext().getNumberOfParallelSubtasks();
+    int attemptNum = getRuntimeContext().getAttemptNumber();
     int readyTaskNum = 1;
     while (taskNum != readyTaskNum) {
       try {
-        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction());
-        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID);
+        readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction());
+        LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum);

Review comment: Only when the accumulator has received the bootstrap info from all tasks does `readyTaskNum` match and the wait return. Does that work for your case? Because the failover retry does not increase `readyTaskNum`, right?
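[Editor's note] The fix under discussion scopes the global aggregate to the current attempt by appending the attempt number to the aggregate name, so a restarted attempt counts ready tasks from zero instead of inheriting counts from the failed attempt. A minimal, Flink-free sketch of that counting scheme; the `BootstrapReadinessSketch` class and its map-backed accumulator are hypothetical stand-ins for Flink's `GlobalAggregateManager`:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BootstrapReadinessSketch {
    // Hypothetical stand-in for Flink's GlobalAggregateManager: one named
    // accumulator per aggregate name, holding the IDs of tasks that reported.
    private static final Map<String, Set<Integer>> AGGREGATES = new HashMap<>();

    // Mirrors the shape of updateGlobalAggregate(name, taskID, aggFunction):
    // record the task under the given name and return the ready-task count.
    public static int updateGlobalAggregate(String name, int taskId) {
        Set<Integer> ready = AGGREGATES.computeIfAbsent(name, k -> new HashSet<>());
        ready.add(taskId);
        return ready.size();
    }

    public static void main(String[] args) {
        // Attempt 0: task 0 reports ready, then the job fails over before task 1 does.
        updateGlobalAggregate("bootstrap_agg_0", 0);

        // Attempt 1: because the aggregate name includes the attempt number,
        // the restarted tasks count against a fresh accumulator rather than
        // one pre-populated by the failed attempt.
        int afterTask0 = updateGlobalAggregate("bootstrap_agg_1", 0);
        int afterTask1 = updateGlobalAggregate("bootstrap_agg_1", 1);
        System.out.println(afterTask0 + ", " + afterTask1); // prints "1, 2"
    }
}
```

Without the attempt suffix, the restarted attempt would reuse the attempt-0 accumulator and could observe a full count before every restarted task had actually reloaded its index records, which is the duplicate-record scenario this PR addresses.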
[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380198#comment-17380198 ]

sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:13 AM:
---------------------------------------------------------------------

-[High Priority] marker based rollback should be on-

{code:java}
public static final ConfigProperty ROLLBACK_USING_MARKERS = ConfigProperty
    .key("hoodie.rollback.using.markers")
    .defaultValue("false")
    .withDocumentation("Enables a more efficient mechanism for rollbacks based on the marker files generated "
        + "during the writes. Turned off by default.");
{code}

was (Author: vc):

[High Priority] marker based rollback should be on

{code:java}
public static final ConfigProperty ROLLBACK_USING_MARKERS = ConfigProperty
    .key("hoodie.rollback.using.markers")
    .defaultValue("false")
    .withDocumentation("Enables a more efficient mechanism for rollbacks based on the marker files generated "
        + "during the writes. Turned off by default.");
{code}

> Make performant out-of-box configs
> ----------------------------------
>
>          Key: HUDI-2151
>          URL: https://issues.apache.org/jira/browse/HUDI-2151
>      Project: Apache Hudi
>   Issue Type: Task
>   Components: Code Cleanup, docs, writer-core
>     Reporter: Vinoth Chandar
>     Assignee: sivabalan narayanan
>     Priority: Critical
>       Labels: pull-request-available
>      Fix For: 0.11.0
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> We have quite a few configs which deliver better performance or usability but are guarded by flags.
> This task is to identify them, change them, test them (functionally and for performance), and make them the default.
> We also need to ensure we capture all the backwards-compatibility issues that can arise.
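[Editor's note] The struck-through item in the comment above tracks flipping `hoodie.rollback.using.markers` (quoted there with a `false` default) to on. Until the default changes in a release, a job can opt in explicitly; a minimal sketch in Hudi write-config properties form:

```properties
# Opt in to marker-based rollback ahead of the default flip: rollbacks then
# use the marker files written during the commit instead of listing the table.
hoodie.rollback.using.markers=true
```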
[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380174#comment-17380174 ]

sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:11 AM:
---------------------------------------------------------------------

[High Priority] Need to ensure this is actually 1, going forward.

{code:java}
public static final ConfigProperty HOODIE_TABLE_VERSION_PROP = ConfigProperty
    .key("hoodie.table.version")
    .defaultValue(HoodieTableVersion.ZERO)
    .withDocumentation("");
{code}

Update: the default value above is not used anywhere; the default is picked up from HoodieTableVersion.current().

was (Author: vc):

[High Priority] Need to ensure this is actually 1, going forward.

{code:java}
public static final ConfigProperty HOODIE_TABLE_VERSION_PROP = ConfigProperty
    .key("hoodie.table.version")
    .defaultValue(HoodieTableVersion.ZERO)
    .withDocumentation("");
{code}

> Make performant out-of-box configs
> ----------------------------------
>
>          Key: HUDI-2151
>          URL: https://issues.apache.org/jira/browse/HUDI-2151
>      Project: Apache Hudi
>   Issue Type: Task
>   Components: Code Cleanup, docs, writer-core
>     Reporter: Vinoth Chandar
>     Assignee: sivabalan narayanan
>     Priority: Critical
>       Labels: pull-request-available
>      Fix For: 0.11.0
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> We have quite a few configs which deliver better performance or usability but are guarded by flags.
> This task is to identify them, change them, test them (functionally and for performance), and make them the default.
> We also need to ensure we capture all the backwards-compatibility issues that can arise.
[GitHub] [hudi] leesf commented on a change in pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
leesf commented on a change in pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#discussion_r789385383

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala

@@ -17,16 +17,15 @@
 package org.apache.hudi

+import org.apache.hadoop.fs.{GlobPattern, Path}

Review comment: please revert the import change
[GitHub] [hudi] leesf commented on a change in pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results
leesf commented on a change in pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#discussion_r789385270

## File path: hudi-common/src/main/java/org/apache/hudi/common/util/TableSchemaResolverUtils.java

@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.util;
+
+import org.apache.avro.Schema;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.TableSchemaResolver;
+
+public final class TableSchemaResolverUtils {

Review comment: we would rather avoid introducing a new util class; please put the util method into TableSchemaResolver instead.
[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377665#comment-17377665 ] sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:05 AM: - -[High Priority] Timeline layout version should now be 1- {code:java} public static final ConfigProperty TIMELINE_LAYOUT_VERSION = ConfigProperty .key("hoodie.timeline.layout.version"){code} was (Author: vc): [High Priority] Timeline layout version should now be 1 {code:java} public static final ConfigProperty TIMELINE_LAYOUT_VERSION = ConfigProperty .key("hoodie.timeline.layout.version"){code} > Make performant out-of-box configs > -- > > Key: HUDI-2151 > URL: https://issues.apache.org/jira/browse/HUDI-2151 > Project: Apache Hudi > Issue Type: Task > Components: Code Cleanup, docs, writer-core >Reporter: Vinoth Chandar >Assignee: sivabalan narayanan >Priority: Critical > Labels: pull-request-available > Fix For: 0.11.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > We have quite a few configs which deliver better performance or usability, > but guarded by flags. > This is to identify them, change them, test (functionally, perf) and make > them default > > Need to ensure we also capture all the backwards compatibility issues that > can arise -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377617#comment-17377617 ] sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:04 AM: - -Is file listing parallelism too high?- already set to 200 {code:java} public static final ConfigProperty FILE_LISTING_PARALLELISM_PROP = ConfigProperty .key("hoodie.file.listing.parallelism") .defaultValue(1500){code} was (Author: vc): Is file listing parallelism too high? {code:java} public static final ConfigProperty FILE_LISTING_PARALLELISM_PROP = ConfigProperty .key("hoodie.file.listing.parallelism") .defaultValue(1500){code}
[jira] [Comment Edited] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377663#comment-17377663 ] sivabalan narayanan edited comment on HUDI-2151 at 1/21/22, 6:04 AM: - -[High Priority] Rollback using markers- {code:java} public static final ConfigProperty ROLLBACK_USING_MARKERS = ConfigProperty .key("hoodie.rollback.using.markers") .defaultValue("false"){code} was (Author: vc): [High Priority] Rollback using markers {code:java} public static final ConfigProperty ROLLBACK_USING_MARKERS = ConfigProperty .key("hoodie.rollback.using.markers") .defaultValue("false"){code}
[GitHub] [hudi] wxplovecc commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true
wxplovecc commented on a change in pull request #4654: URL: https://github.com/apache/hudi/pull/4654#discussion_r789373079 ## File path: hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java ## @@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception { */ private void waitForBootstrapReady(int taskID) { int taskNum = getRuntimeContext().getNumberOfParallelSubtasks(); +int attemptNum = getRuntimeContext().getAttemptNumber(); int readyTaskNum = 1; while (taskNum != readyTaskNum) { try { -readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction()); -LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID); +readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction()); +LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum); Review comment: OK. Once a Flink job with index.bootstrap=true fails (e.g. a taskmanager is lost) and then restarts with the same GlobalAggregate name, it will reuse the `accumulators` in the JobMaster. As a result, the BootstrapOperator subtasks that are faster than the others will send records downstream without waiting for all bootstrap tasks to finish.
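The failure mode described in the review comment above can be sketched as a toy model. The sketch below is an assumption-laden simplification, not Flink's actual GlobalAggregateManager: `accumulators` and `updateGlobalAggregate` here are hypothetical stand-ins for the JobMaster's accumulator state, showing why keying the aggregate name by attempt number gives a restarted attempt a fresh ready-task count instead of the stale one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AggregateNameDemo {
    // Hypothetical stand-in for the accumulator map kept by the Flink JobMaster.
    static final Map<String, Integer> accumulators = new ConcurrentHashMap<>();

    // Simplified updateGlobalAggregate: each call registers one ready task and
    // returns the number of ready tasks seen so far under that aggregate name.
    static int updateGlobalAggregate(String name) {
        return accumulators.merge(name, 1, Integer::sum);
    }

    public static void main(String[] args) {
        String base = "bootstrap_agg_function";
        // Attempt 0: two subtasks report ready, so the count reaches 2.
        updateGlobalAggregate(base + "_0");
        System.out.println(updateGlobalAggregate(base + "_0")); // 2
        // After a restart, attempt 1 sees a clean accumulator: the count is 1,
        // not 3, so no subtask can wrongly conclude that bootstrap already finished.
        System.out.println(updateGlobalAggregate(base + "_1")); // 1
    }
}
```

Without the `"_" + attemptNum` suffix, the second attempt would keep incrementing the first attempt's counter, which is exactly the premature-release the PR fixes.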
[jira] [Commented] (HUDI-3242) Checkpoint 0 is ignored -Partial parquet file discovery after the first commit
[ https://issues.apache.org/jira/browse/HUDI-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479835#comment-17479835 ] Harsha Teja Kanna commented on HUDI-3242: - Input: monthly partitions partition=2021/01 file1_1 - timestamp1 file2_1 - timestamp2 file3_1 - timestamp3 partition=2021/02 file1_2 - timestamp1 file2_2 - timestamp2 file3_2 - timestamp3 Now I want to run Deltastreamer partition after partition to create the Hudi table > Checkpoint 0 is ignored -Partial parquet file discovery after the first commit > -- > > Key: HUDI-3242 > URL: https://issues.apache.org/jira/browse/HUDI-3242 > Project: Apache Hudi > Issue Type: Bug > Components: spark, writer-core >Affects Versions: 0.10.1 > Environment: AWS > EMR 6.4.0 > Spark 3.1.2 > Hudi - 0.10.1-rc >Reporter: Harsha Teja Kanna >Assignee: sivabalan narayanan >Priority: Critical > Labels: hudi-on-call, sev:critical, user-support-issues > Attachments: Screen Shot 2022-01-13 at 2.40.55 AM.png, Screen Shot > 2022-01-13 at 2.55.35 AM.png, Screen Shot 2022-01-20 at 1.36.48 PM.png > > Original Estimate: 3h > Remaining Estimate: 3h > > Hi, I am testing release branch 0.10.1 as I needed a few bug fixes from it. > However, I see that for a certain table, only partial discovery of files happens > after the initial commit of the table. > But if the second partition is given as input for the first commit, all the > files are getting discovered. > First partition : 2021/01 has 744 files and all of them are discovered > Second partition: 2021/02 has 762 files but only 72 are discovered. > Checkpoint is set to 0. > No errors in the logs.
> {code:java} > spark-submit \ > --master yarn \ > --deploy-mode cluster \ > --driver-cores 30 \ > --driver-memory 32g \ > --executor-cores 5 \ > --executor-memory 32g \ > --num-executors 120 \ > --jars > s3://bucket/apps/datalake/jars/unused-1.0.0.jar,s3://bucket/apps/datalake/jars/spark-avro_2.12-3.1.2.jar > \ > --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \ > --conf spark.serializer=org.apache.spark.serializer.KryoSerializer > s3://bucket/apps/datalake/jars/hudi-0.10.0/hudi-utilities-bundle_2.12-0.10.0.jar > \ > --table-type COPY_ON_WRITE \ > --source-ordering-field timestamp \ > --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \ > --target-base-path s3a://datalake-hudi/datastream/v1/sessions_by_date \ > --target-table sessions_by_date \ > --transformer-class > org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \ > --op INSERT \ > --checkpoint 0 \ > --hoodie-conf hoodie.clean.automatic=true \ > --hoodie-conf hoodie.cleaner.commits.retained=1 \ > --hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_COMMITS \ > --hoodie-conf hoodie.clustering.inline=false \ > --hoodie-conf hoodie.clustering.inline.max.commits=1 \ > --hoodie-conf > hoodie.clustering.plan.strategy.class=org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy > \ > --hoodie-conf hoodie.clustering.plan.strategy.max.num.groups=100 \ > --hoodie-conf hoodie.clustering.plan.strategy.small.file.limit=25000 \ > --hoodie-conf hoodie.clustering.plan.strategy.sort.columns=sid,id \ > --hoodie-conf hoodie.clustering.plan.strategy.target.file.max.bytes=268435456 > \ > --hoodie-conf hoodie.clustering.preserve.commit.metadata=true \ > --hoodie-conf hoodie.datasource.hive_sync.database=datalake-hudi \ > --hoodie-conf hoodie.datasource.hive_sync.enable=false \ > --hoodie-conf hoodie.datasource.hive_sync.ignore_exceptions=true \ > --hoodie-conf hoodie.datasource.hive_sync.mode=hms \ > --hoodie-conf > 
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.HiveStylePartitionValueExtractor > \ > --hoodie-conf hoodie.datasource.hive_sync.table=sessions_by_date \ > --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \ > --hoodie-conf hoodie.datasource.write.hive_style_partitioning=true \ > --hoodie-conf > hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator > \ > --hoodie-conf hoodie.datasource.write.operation=insert \ > --hoodie-conf hoodie.datasource.write.partitionpath.field=date:TIMESTAMP \ > --hoodie-conf hoodie.datasource.write.precombine.field=timestamp \ > --hoodie-conf hoodie.datasource.write.recordkey.field=id,qid,aid \ > --hoodie-conf > hoodie.deltastreamer.keygen.timebased.input.dateformat=/MM/dd \ > --hoodie-conf hoodie.deltastreamer.keygen.timebased.input.timezone=GMT \ > --hoodie-conf > hoodie.deltastreamer.keygen.timebased.output.dateformat=/MM/dd \ > --hoodie-conf hoodie.deltastreamer.keygen.timebased.output.timezone=GMT \ > --hoodie-conf > hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING \ > --hoodie-con
[jira] [Commented] (HUDI-3242) Checkpoint 0 is ignored -Partial parquet file discovery after the first commit
[ https://issues.apache.org/jira/browse/HUDI-3242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479833#comment-17479833 ] sivabalan narayanan commented on HUDI-3242: --- I don't understand this statement of yours "Also few of my unloaded datasets have non linear timestamps across partitions and I create the hudi table partition after partition and set checkpoint to 0." sorry. can you please clarify. I am trying to understand whats your intention to explicitly set checkpoint value to 0?
[GitHub] [hudi] YannByron commented on pull request #4644: [HUDI-3282] Fix delete exception for Spark SQL when sync Hive
YannByron commented on pull request #4644: URL: https://github.com/apache/hudi/pull/4644#issuecomment-1018191017 LGTM
[GitHub] [hudi] nsivabalan commented on a change in pull request #3929: [HUDI-1881] Make multi table delta streamer to use thread pool for table sync asynchronously.
nsivabalan commented on a change in pull request #3929: URL: https://github.com/apache/hudi/pull/3929#discussion_r789367919 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java ## @@ -378,16 +383,23 @@ private static String resetTarget(Config configuration, String database, String /** * Creates actual HoodieDeltaStreamer objects for every table/topic and does incremental sync. */ - public void sync() { -for (TableExecutionContext context : tableExecutionContexts) { - try { -new HoodieDeltaStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties())).sync(); -successTables.add(Helpers.getTableWithDatabase(context)); - } catch (Exception e) { -logger.error("error while running MultiTableDeltaStreamer for table: " + context.getTableName(), e); -failedTables.add(Helpers.getTableWithDatabase(context)); - } -} + public void sync() throws InterruptedException { +ExecutorService executorService = Executors.newFixedThreadPool(tableExecutionContexts.size()); +tableExecutionContexts.forEach(context -> { + executorService.execute(new Runnable() { +@Override +public void run() { + try { +new HoodieDeltaStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties())).sync(); +successTables.add(Helpers.getTableWithDatabase(context)); + } catch (Exception e) { +logger.error("error while running MultiTableDeltaStreamer for table: " + context.getTableName(), e); +failedTables.add(Helpers.getTableWithDatabase(context)); + } +} + }); +}); +executorService.shutdown(); Review comment: should we add awaitTermination here? We can't proceed with the next batch until all tables in the current batch are completed, right?
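The reviewer's point above — that `shutdown()` alone does not block — can be sketched generically. This is a minimal illustration of the shutdown-then-await pattern, not the actual HoodieMultiTableDeltaStreamer code; `runBatch` and the `Runnable` "table syncs" are hypothetical:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AwaitTerminationDemo {
    public static void runBatch(List<Runnable> tableSyncs) throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(tableSyncs.size());
        tableSyncs.forEach(executorService::execute);
        // shutdown() only stops the pool from accepting new tasks; it returns immediately.
        executorService.shutdown();
        // Block until every sync task in the current batch has finished, or time out.
        if (!executorService.awaitTermination(10, TimeUnit.MINUTES)) {
            executorService.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        runBatch(List.of(completed::incrementAndGet, completed::incrementAndGet));
        // Both "table syncs" are guaranteed complete at this point.
        System.out.println(completed.get()); // 2
    }
}
```

Without the `awaitTermination` call, the caller could reach the line after `runBatch` while sync tasks are still running, which is the race the review comment is flagging.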
[GitHub] [hudi] nsivabalan commented on a change in pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …
nsivabalan commented on a change in pull request #4645: URL: https://github.com/apache/hudi/pull/4645#discussion_r789365060 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java ## @@ -370,50 +441,124 @@ public static void main(String[] args) throws IOException { private static String resetTarget(Config configuration, String database, String tableName) { String basePathPrefix = configuration.basePathPrefix; basePathPrefix = basePathPrefix.charAt(basePathPrefix.length() - 1) == '/' ? basePathPrefix.substring(0, basePathPrefix.length() - 1) : basePathPrefix; -String targetBasePath = basePathPrefix + Constants.FILE_DELIMITER + database + Constants.FILE_DELIMITER + tableName; -configuration.targetTableName = database + Constants.DELIMITER + tableName; +String targetBasePath = basePathPrefix + Constants.PATH_SEPARATOR + database + Constants.PATH_SEPARATOR + tableName; +configuration.targetTableName = database + Constants.PATH_CUR_DIR + tableName; return targetBasePath; } /** * Creates actual HoodieDeltaStreamer objects for every table/topic and does incremental sync. */ public void sync() { +List hdsObjectList = new ArrayList<>(); + +// The sync function is not executed when multiple sources update the same target. Review comment: probably we can have a big if else blocks for single source vs multiple sources for one hudi table. would be easy to reason about and maintain. existing code will go into if block and new code for multiple source will go into else block. ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java ## @@ -370,50 +441,124 @@ public static void main(String[] args) throws IOException { private static String resetTarget(Config configuration, String database, String tableName) { String basePathPrefix = configuration.basePathPrefix; basePathPrefix = basePathPrefix.charAt(basePathPrefix.length() - 1) == '/' ? 
basePathPrefix.substring(0, basePathPrefix.length() - 1) : basePathPrefix; -String targetBasePath = basePathPrefix + Constants.FILE_DELIMITER + database + Constants.FILE_DELIMITER + tableName; -configuration.targetTableName = database + Constants.DELIMITER + tableName; +String targetBasePath = basePathPrefix + Constants.PATH_SEPARATOR + database + Constants.PATH_SEPARATOR + tableName; +configuration.targetTableName = database + Constants.PATH_CUR_DIR + tableName; return targetBasePath; } /** * Creates actual HoodieDeltaStreamer objects for every table/topic and does incremental sync. */ public void sync() { +List hdsObjectList = new ArrayList<>(); + +// The sync function is not executed when multiple sources update the same target. for (TableExecutionContext context : tableExecutionContexts) { try { -new HoodieDeltaStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties())).sync(); +HoodieDeltaStreamer hds = new HoodieDeltaStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties())); + +// Add object of HoodieDeltaStreamer temporarily to hdsObjectList when multiple sources update the same target. +if (!StringUtils.isNullOrEmpty(context.getProperties().getProperty(Constants.SOURCES_TO_BE_BOUND))) { + hdsObjectList.add(hds); + continue; +} + +hds.sync(); successTables.add(Helpers.getTableWithDatabase(context)); } catch (Exception e) { -logger.error("error while running MultiTableDeltaStreamer for table: " + context.getTableName(), e); +logger.error("Error while running MultiTableDeltaStreamer for table: " + context.getTableName(), e); failedTables.add(Helpers.getTableWithDatabase(context)); } } -logger.info("Ingestion was successful for topics: " + successTables); -if (!failedTables.isEmpty()) { - logger.info("Ingestion failed for topics: " + failedTables); +// If hdsObjectList is empty, it indicates that all source sync operations have been completed. In this case, directly return. 
+if (hdsObjectList.isEmpty()) { + logger.info("Ingestion was successful for topics: " + successTables); + if (!failedTables.isEmpty()) { +logger.info("Ingestion failed for topics: " + failedTables); + } + return; } + +// The sync function is executing here when multiple sources update the same target. +boolean isContinuousMode = hdsObjectList.get(0).cfg.continuousMode; Review comment: I guess we need to move this to L488 as ``` boolean isContinuousMode = hdsObjectList.get(i).cfg.continuousMode; ``` essentially we can't have continuous mode enabled for any tables right. ## File path: hudi-utilities/src/main/java/org/apache/hudi/ut
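The reviewer's first suggestion — one explicit branch per ingestion mode instead of interleaving the multi-source handling inside the single-source loop — can be sketched in miniature. Everything below is a hypothetical simplification (`TableContext`, the string "sync log") and not the real HoodieMultiTableDeltaStreamer types; it only shows the shape of the proposed if/else split:

```java
import java.util.ArrayList;
import java.util.List;

public class SyncRestructureDemo {
    // Hypothetical, stripped-down stand-in for TableExecutionContext.
    static class TableContext {
        final String name;
        final boolean multiSource; // multiple sources feed one target table
        TableContext(String name, boolean multiSource) {
            this.name = name;
            this.multiSource = multiSource;
        }
    }

    // One branch per mode: the existing single-source path stays untouched,
    // and the new multi-source path is collected and handled separately.
    static List<String> sync(List<TableContext> contexts) {
        List<String> log = new ArrayList<>();
        for (TableContext ctx : contexts) {
            if (ctx.multiSource) {
                // New code path: defer, so all sources for this target run together.
                log.add("deferred:" + ctx.name);
            } else {
                // Existing behavior: sync this table's streamer directly.
                log.add("synced:" + ctx.name);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(sync(List.of(
            new TableContext("t1", false),
            new TableContext("t2", true))));
    }
}
```

Keeping the two modes in separate branches, as the reviewer suggests, makes each path easier to reason about and keeps the per-table `continuousMode` check local to the branch that needs it.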
[GitHub] [hudi] Guanpx opened a new issue #4658: [SUPPORT] Data lose with Flink write COW insert table, Flink web UI show Records Received was different with HIVE count(1)
Guanpx opened a new issue #4658: URL: https://github.com/apache/hudi/issues/4658

**Describe the problem you faced**

Data loss when Flink writes a COW insert table: the Records Received count shown in the Flink web UI differs from HIVE count(1).

**To Reproduce**

Steps to reproduce the behavior:
1. Flink write and sync to hive
2. Observe that the Records Received count in the Flink web UI differs from the HIVE (Impala) count(1)

**Expected behavior**

![image](https://user-images.githubusercontent.com/29246713/150461634-237e705c-1bff-4183-bf8a-be7222b7d917.png)

**Environment Description**

* Hudi version : 0.10.0
* Flink version : 1.13.2
* Hive version : 2.1.1-cdh6
* Hadoop version : 3.0.0-cdh6
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

**Additional context**

Flink write config

```
'connector' = 'hudi',
'path' = 'hdfs://nameservice-ha/hudi/rds/event_log_origin',
'table.type' = 'COPY_ON_WRITE',
'hoodie.datasource.write.recordkey.field' = 'distinct_id',
'hive_sync.enable'='true',
'hive_sync.table'='hudi_event_log_origin',
'hive_sync.db'='default',
'hive_sync.mode' = 'hms',
'hive_sync.metastore.uris' = '',
'hive_sync.skip_ro_suffix' = 'true',
'hoodie.datasource.write.operation' = 'insert', -- append mode
'write.tasks' = '2',
'write.bucket_assign.tasks' = '2',
'write.insert.cluster' = 'true',
'write.ignore.failed' = 'false',
'clean.async.enabled' = 'true',
'clean.retain_commits' = '4',
'archive.min_commits' = '6',
'archive.max_commits' = '12',
'hoodie.cleaner.commits.retained' = '4',
'hoodie.keep.min.commits' = '5',
'hoodie.keep.max.commits' = '10'
```

**Stacktrace**

```Add the stacktrace of the error.```
[jira] [Closed] (HUDI-2563) Refactor CompactionTriggerStrategy.
[ https://issues.apache.org/jira/browse/HUDI-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] RocMarshal closed HUDI-2563. Resolution: Abandoned > Refactor CompactionTriggerStrategy. > --- > > Key: HUDI-2563 > URL: https://issues.apache.org/jira/browse/HUDI-2563 > Project: Apache Hudi > Issue Type: Improvement > Components: cli, compaction, writer-core >Reporter: RocMarshal >Assignee: RocMarshal >Priority: Minor > Labels: pull-request-available > > > # Replace conditional in ScheduleCompactionActionExecutor with polymorphism > of CompactionTriggerStrategy class.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication
hudi-bot removed a comment on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018129948 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350) * 0aa3cea08224b3a86843251ec43ffd5e22e086ed Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5401) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication
hudi-bot commented on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018160149 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * 0aa3cea08224b3a86843251ec43ffd5e22e086ed Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5401)
[GitHub] [hudi] VIKASPATID commented on issue #4635: [SUPPORT] Bulk write failing due to hudi timeline archive exception
VIKASPATID commented on issue #4635: URL: https://github.com/apache/hudi/issues/4635#issuecomment-1018159576 Bulk write runs without any failures with a single writer, but we want to write a bunch of files, so we need the multi-writer to decrease the total write time. Is there anything we are missing for the multi-writer, or any way to fix it? That's all we have in the stack trace.
[GitHub] [hudi] danny0405 commented on a change in pull request #4654: [HUDI-3286] duplicate records when flink task restart with index.bootstrap=true
danny0405 commented on a change in pull request #4654: URL: https://github.com/apache/hudi/pull/4654#discussion_r789339843 ## File path: hudi-flink/src/main/java/org/apache/hudi/sink/bootstrap/BootstrapOperator.java ## @@ -151,11 +151,12 @@ protected void preLoadIndexRecords() throws Exception { */ private void waitForBootstrapReady(int taskID) { int taskNum = getRuntimeContext().getNumberOfParallelSubtasks(); +int attemptNum = getRuntimeContext().getAttemptNumber(); int readyTaskNum = 1; while (taskNum != readyTaskNum) { try { -readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME, taskID, new BootstrapAggFunction()); -LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}.", taskID); +readyTaskNum = aggregateManager.updateGlobalAggregate(BootstrapAggFunction.NAME + "_" + attemptNum, taskID, new BootstrapAggFunction()); +LOG.info("Waiting for other bootstrap tasks to complete, taskId = {}, attemptNum = {}.", taskID, attemptNum); Review comment: Hello, can you explain why we need this change?
[GitHub] [hudi] cdmikechen commented on pull request #3391: [HUDI-83] Fix Timestamp type read by Hive
cdmikechen commented on pull request #3391: URL: https://github.com/apache/hudi/pull/3391#issuecomment-1018140612 @lucasmo You can try this PR, but it looks like there are some conflicts after I pushed this commit. I will resolve them later.
[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication
hudi-bot commented on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018129948 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350) * 0aa3cea08224b3a86843251ec43ffd5e22e086ed Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5401)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication
hudi-bot removed a comment on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018121606 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350) * 0aa3cea08224b3a86843251ec43ffd5e22e086ed UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication
hudi-bot commented on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1018121606 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350) * 0aa3cea08224b3a86843251ec43ffd5e22e086ed UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4559: [HUDI-3206][Stacked on 4556] Unify Hive's MOR implementations to avoid duplication
hudi-bot removed a comment on pull request #4559: URL: https://github.com/apache/hudi/pull/4559#issuecomment-1016940234 ## CI report: * 47970bd3a9cbbf2eb85b0a87f899256487efdffa UNKNOWN * b557e6b3a0fbd8bc07c29561b787a9cff259fe04 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5350)
[jira] [Commented] (HUDI-3221) Support querying a table as of a savepoint
[ https://issues.apache.org/jira/browse/HUDI-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479792#comment-17479792 ]

Forward Xu commented on HUDI-3221:
----------------------------------

hi [~fedsp] Thanks

> Support querying a table as of a savepoint
> ------------------------------------------
>
>                 Key: HUDI-3221
>                 URL: https://issues.apache.org/jira/browse/HUDI-3221
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: hive, reader-core, spark, writer-core
>            Reporter: Ethan Guo
>            Assignee: Forward Xu
>            Priority: Blocker
>              Labels: user-support-issues
>             Fix For: 0.11.0
>
> Right now point-in-time queries are limited to what's retained by the cleaner. If we fix this and expose it via SQL, that closes the gap. The DataFrame read path supports this option, but the SQL read path does not: https://hudi.apache.org/docs/quick-start-guide/#time-travel-query

-- This message was sent by Atlassian Jira (v8.20.1#820001)
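The DataFrame-path time travel the issue refers to is driven by a read option; the `as.of.instant` key below is taken from the Hudi time-travel quick start linked in the description, while the helper function and paths are purely illustrative:

```python
def time_travel_options(instant: str) -> dict:
    """Read options for a Hudi time-travel (point-in-time) query on the
    DataFrame path. `instant` can be a commit time such as '20220120100000'."""
    return {"as.of.instant": instant}

# Usage with Spark (sketch, not executed here; `spark` and `base_path`
# are assumed to exist):
# df = (spark.read.format("hudi")
#       .options(**time_travel_options("20220120100000"))
#       .load(base_path))
```

The gap HUDI-3221 tracks is that there is no equivalent of this option on the SQL read path.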
[hudi] branch asf-site updated: [MINOR] [DOCS] fix a typo in Spark quick start example (#4657)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 0368b03  [MINOR] [DOCS] fix a typo in Spark quick start example (#4657)
0368b03 is described below

commit 0368b038c34fac0e13a575c4ed3696914baaf6cc
Author: 董可伦
AuthorDate: Fri Jan 21 10:52:03 2022 +0800

    [MINOR] [DOCS] fix a typo in Spark quick start example (#4657)
---
 website/docs/quick-start-guide.md                          | 2 +-
 website/versioned_docs/version-0.10.0/quick-start-guide.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index 619f773..685492e 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -258,7 +258,7 @@ create table hudi_mor_tbl (
   ts bigint
 ) using hudi
 tblproperties (
-  type = 'cow',
+  type = 'mor',
   primaryKey = 'id',
   preCombineField = 'ts'
 );
diff --git a/website/versioned_docs/version-0.10.0/quick-start-guide.md b/website/versioned_docs/version-0.10.0/quick-start-guide.md
index 7550712..e3f3844 100644
--- a/website/versioned_docs/version-0.10.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.10.0/quick-start-guide.md
@@ -258,7 +258,7 @@ create table hudi_mor_tbl (
   ts bigint
 ) using hudi
 tblproperties (
-  type = 'cow',
+  type = 'mor',
   primaryKey = 'id',
   preCombineField = 'ts'
 );
[GitHub] [hudi] xushiyan merged pull request #4657: [MINOR] [DOCS] fix a typo in Spark quick start example
xushiyan merged pull request #4657: URL: https://github.com/apache/hudi/pull/4657
[jira] [Updated] (HUDI-2941) Show _hoodie_operation in spark sql results
[ https://issues.apache.org/jira/browse/HUDI-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2941: - Status: In Progress (was: Open) > Show _hoodie_operation in spark sql results > --- > > Key: HUDI-2941 > URL: https://issues.apache.org/jira/browse/HUDI-2941 > Project: Apache Hudi > Issue Type: Task > Components: spark-sql >Reporter: Raymond Xu >Assignee: Forward Xu >Priority: Critical > Labels: hudi-on-call, pull-request-available, sev:critical, > user-support-issues > Fix For: 0.11.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > Details in > [https://github.com/apache/hudi/issues/4160] >
[jira] [Updated] (HUDI-2941) Show _hoodie_operation in spark sql results
[ https://issues.apache.org/jira/browse/HUDI-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2941: - Epic Link: HUDI-1658
[jira] [Updated] (HUDI-2941) Show _hoodie_operation in spark sql results
[ https://issues.apache.org/jira/browse/HUDI-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2941: - Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/24)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …
hudi-bot removed a comment on pull request #4645: URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018114106 ## CI report: * ee9f2eaa28c5836977ea980a1d50b1d65ce342ef Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5380)
[GitHub] [hudi] hudi-bot commented on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …
hudi-bot commented on pull request #4645: URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018115438 ## CI report: * ee9f2eaa28c5836977ea980a1d50b1d65ce342ef Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5380)
[GitHub] [hudi] hudi-bot commented on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …
hudi-bot commented on pull request #4645: URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018114106 ## CI report: * ee9f2eaa28c5836977ea980a1d50b1d65ce342ef Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5380)
[GitHub] [hudi] dongkelun opened a new pull request #4657: [MINOR] fix typos
dongkelun opened a new pull request #4657:
URL: https://github.com/apache/hudi/pull/4657

   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

   ## What is the purpose of the pull request

   *(For example: This pull request adds quick-start document.)*

   ## Brief change log

   *(for example:)*
   - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

   ## Verify this pull request

   *(Please pick either of the following options)*

   This pull request is a trivial rework / code cleanup without any test coverage.

   *(or)*

   This pull request is already covered by existing tests, such as *(please describe tests)*.

   (or)

   This change added tests and can be verified as follows:

   *(example:)*
   - *Added integration tests for end-to-end.*
   - *Added HoodieClientWriteTest to verify the change.*
   - *Manually verified the change by running a job locally.*

   ## Committer checklist
   - [ ] Has a corresponding JIRA in PR title & commit
   - [ ] Commit message is descriptive of the change
   - [ ] CI is green
   - [ ] Necessary doc changes done or have another open PR
   - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …
hudi-bot removed a comment on pull request #4645: URL: https://github.com/apache/hudi/pull/4645#issuecomment-1017556536 ## CI report: * ee9f2eaa28c5836977ea980a1d50b1d65ce342ef Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5380)
[GitHub] [hudi] watermelon12138 commented on pull request #4645: [HUDI-3103] Enable MultiTableDeltaStreamer to update a single target …
watermelon12138 commented on pull request #4645: URL: https://github.com/apache/hudi/pull/4645#issuecomment-1018113861 @hudi-bot run azure re-run the last Azure build
[GitHub] [hudi] zhangyue19921010 commented on pull request #4643: [HUDI-3281][Performance]Tuning performance of getAllPartitionPaths API in FileSystemBackedTableMetadata
zhangyue19921010 commented on pull request #4643: URL: https://github.com/apache/hudi/pull/4643#issuecomment-1018100236 @nsivabalan Thanks a lot for your help!