[GitHub] [hudi] slfan1989 commented on pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.
slfan1989 commented on PR #8435: URL: https://github.com/apache/hudi/pull/8435#issuecomment-1504700781 @danny0405 Can you help review this pr? Thank you very much! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8299: [HUDI-5990]Avoid missing data during incremental queries
hudi-bot commented on PR #8299: URL: https://github.com/apache/hudi/pull/8299#issuecomment-1504676930
## CI report:
* 71963bcf055f63179dfdcc235478aff8487bd328 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15955) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15988)
* cbb28bbdfc0434564d7ddd363bb42405b9771ed1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16278)
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #8299: [HUDI-5990]Avoid missing data during incremental queries
hudi-bot commented on PR #8299: URL: https://github.com/apache/hudi/pull/8299#issuecomment-1504669117
## CI report:
* 71963bcf055f63179dfdcc235478aff8487bd328 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15955) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15988)
* cbb28bbdfc0434564d7ddd363bb42405b9771ed1 UNKNOWN
[GitHub] [hudi] Mulavar commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder
Mulavar commented on PR #8385: URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504603719
> @Mulavar : This requires a table version change also as we need to create .aux files when we downgrade to older version. Can you add relevant UpgradeDowngrade handlers for this.
@bvaradar thanks, done.
[GitHub] [hudi] hudi-bot commented on pull request #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads
hudi-bot commented on PR #8335: URL: https://github.com/apache/hudi/pull/8335#issuecomment-1504598719
## CI report:
* 9bcbb85e4b2bb803e03900b8f01c938833bb1185 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16108)
* 919882e2014728df9d3299fd239c250bf166608c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16277)
[GitHub] [hudi] hudi-bot commented on pull request #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads
hudi-bot commented on PR #8335: URL: https://github.com/apache/hudi/pull/8335#issuecomment-1504590323
## CI report:
* 9bcbb85e4b2bb803e03900b8f01c938833bb1185 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16108)
* 919882e2014728df9d3299fd239c250bf166608c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder
hudi-bot commented on PR #8385: URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504582669
## CI report:
* 19ff36f7635f289b752b46cf692014b22f4b9ab8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16254)
* 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
* 768ffaabf5934199e1afa1c0b6b37f9bb665b989 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16276)
[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords
danny0405 commented on code in PR #8300: URL: https://github.com/apache/hudi/pull/8300#discussion_r1163570948
## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
## @@ -61,13 +61,8 @@ public JavaRDD> repartitionRecords(JavaRDD> reco
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
     return records.sortBy(
         record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
+          Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
+          return FlatLists.ofComparableArray(columnValues);
Review Comment:
We should fix `JavaCustomColumnsSortPartitioner` too.
[GitHub] [hudi] danny0405 commented on a diff in pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords
danny0405 commented on code in PR #8300: URL: https://github.com/apache/hudi/pull/8300#discussion_r1163569337
## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java:
## @@ -61,13 +61,8 @@ public JavaRDD> repartitionRecords(JavaRDD> reco
     final boolean consistentLogicalTimestampEnabled = this.consistentLogicalTimestampEnabled;
     return records.sortBy(
         record -> {
-          Object recordValue = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
-          // null values are replaced with empty string for null_first order
-          if (recordValue == null) {
-            return StringUtils.EMPTY_STRING;
-          } else {
-            return StringUtils.objToString(recordValue);
-          }
+          Object[] columnValues = record.getColumnValues(schema.get(), sortColumns, consistentLogicalTimestampEnabled);
+          return FlatLists.ofComparableArray(columnValues);
Review Comment:
The default behavior is null_last. The original comment is wrong: it returned an empty string for nulls, and an empty string should always be smaller than non-empty strings.
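The ordering question in this review can be illustrated with a small standalone sketch. It assumes every non-null column value is `Comparable` and mutually comparable within a column; `NullOrderSketch`, `compareValues`, `rowComparator`, `sortRows` and `legacyKey` are hypothetical names made up for this example and are not Hudi APIs. It is not the `FlatLists.ofComparableArray` implementation, only one way to compare rows of column values with an explicit null position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullOrderSketch {

    // Compare two column values; nulls go first or last as requested.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static int compareValues(Object a, Object b, boolean nullsFirst) {
        if (a == null && b == null) {
            return 0;
        }
        if (a == null) {
            return nullsFirst ? -1 : 1;
        }
        if (b == null) {
            return nullsFirst ? 1 : -1;
        }
        return ((Comparable) a).compareTo(b);
    }

    // Lexicographic comparison of two rows of column values.
    public static Comparator<Object[]> rowComparator(boolean nullsFirst) {
        return (left, right) -> {
            int n = Math.min(left.length, right.length);
            for (int i = 0; i < n; i++) {
                int c = compareValues(left[i], right[i], nullsFirst);
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(left.length, right.length);
        };
    }

    // Returns a sorted copy; the input list is left untouched.
    public static List<Object[]> sortRows(List<Object[]> rows, boolean nullsFirst) {
        List<Object[]> copy = new ArrayList<>(rows);
        copy.sort(rowComparator(nullsFirst));
        return copy;
    }

    // The removed approach from the diff, reduced to a single value:
    // null becomes "" and everything else is compared as text.
    public static String legacyKey(Object value) {
        return value == null ? "" : String.valueOf(value);
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"b", 10});
        rows.add(new Object[] {null, 9});
        rows.add(new Object[] {"a", 9});
        for (Object[] row : sortRows(rows, false)) {
            System.out.println(Arrays.toString(row));
        }
        // With text keys, "" sorts before "a" and "10" sorts before "9".
        System.out.println(legacyKey(null).compareTo(legacyKey("a")) < 0);
        System.out.println(legacyKey(10).compareTo(legacyKey(9)) < 0);
    }
}
```

With a text key, a null value always lands ahead of any non-empty string, and numeric columns are ordered as text. Comparing the typed values column by column keeps the null position an explicit choice.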
[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder
hudi-bot commented on PR #8385: URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504537773
## CI report:
* 19ff36f7635f289b752b46cf692014b22f4b9ab8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16254)
* 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
* 768ffaabf5934199e1afa1c0b6b37f9bb665b989 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.
hudi-bot commented on PR #8435: URL: https://github.com/apache/hudi/pull/8435#issuecomment-1504529199
## CI report:
* fda3847c439a2d889bc29c6511ce26bdd922d13c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16274)
[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11
hudi-bot commented on PR #8429: URL: https://github.com/apache/hudi/pull/8429#issuecomment-1504529115
## CI report:
* 18f438577f444c75e8060a20b7fdf59e40e9ab7e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16262)
* acd347c5e6cd019ee98b7f1fa435b95153e71238 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16273)
[GitHub] [hudi] hudi-bot commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder
hudi-bot commented on PR #8385: URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504528832
## CI report:
* 19ff36f7635f289b752b46cf692014b22f4b9ab8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16254)
* 3874447e48c21cb336f28625e1682b8f229f623c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8338: [HUDI-5996] Verify the consistency of bucket num at job sta…
hudi-bot commented on PR #8338: URL: https://github.com/apache/hudi/pull/8338#issuecomment-1504528612
## CI report:
* fccdb147c249b08d856819e028986d76603828e9 UNKNOWN
* 7abfb144f1c76d65cf00115d5bfa9a82b1a28846 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16253)
* 86a201c099bbc2016d53d987aa481b779464c9c2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16272)
[GitHub] [hudi] danny0405 commented on pull request #2701: [HUDI 1623] New Hoodie Instant on disk format with end time and milliseconds granularity
danny0405 commented on PR #2701: URL: https://github.com/apache/hudi/pull/2701#issuecomment-1504527704
A valuable PR, especially for incremental-style use cases: incremental streaming read, incremental cleaning, incremental meta sync, etc.
[GitHub] [hudi] ad1happy2go commented on issue #7827: [SUPPORT]Errors are thrown when querying rt table of no deltalogs count(1)/count(*) by presto
ad1happy2go commented on issue #7827: URL: https://github.com/apache/hudi/issues/7827#issuecomment-1504522724
@silencily Can you please confirm whether upgrading to a newer Presto version fixed your issue?
[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11
hudi-bot commented on PR #8429: URL: https://github.com/apache/hudi/pull/8429#issuecomment-1504521437
## CI report:
* 18f438577f444c75e8060a20b7fdf59e40e9ab7e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16262)
* acd347c5e6cd019ee98b7f1fa435b95153e71238 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.
hudi-bot commented on PR #8435: URL: https://github.com/apache/hudi/pull/8435#issuecomment-1504521522
## CI report:
* fda3847c439a2d889bc29c6511ce26bdd922d13c UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8338: [HUDI-5996] Verify the consistency of bucket num at job sta…
hudi-bot commented on PR #8338: URL: https://github.com/apache/hudi/pull/8338#issuecomment-1504521063
## CI report:
* fccdb147c249b08d856819e028986d76603828e9 UNKNOWN
* 7abfb144f1c76d65cf00115d5bfa9a82b1a28846 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16253)
* 86a201c099bbc2016d53d987aa481b779464c9c2 UNKNOWN
[GitHub] [hudi] ad1happy2go commented on issue #8144: [SUPPORT]Unable to connect to an s3 hudi table
ad1happy2go commented on issue #8144: URL: https://github.com/apache/hudi/issues/8144#issuecomment-1504517165
@peter-mccabe Are you still facing this issue? If yes, can you share the complete stack trace of the error? Are you setting up the S3 keys properly in the Hadoop fs configuration to connect to S3 (props like fs.s3a.access.key)?
[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.
hudi-bot commented on PR #8434: URL: https://github.com/apache/hudi/pull/8434#issuecomment-1504513257
## CI report:
* 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16269)
[GitHub] [hudi] zhangyue19921010 commented on pull request #7143: [HUDI-5175] Improving FileIndex load performance in PARALLELISM mode
zhangyue19921010 commented on PR #7143: URL: https://github.com/apache/hudi/pull/7143#issuecomment-1504508775
Hey! @bvaradar Sorry for missing this PR, and I appreciate your attention and review. Sure, I will address this PR later this week. Thanks!
[jira] [Updated] (HUDI-6064) Improve JDBCExecutor#getTableSchema Use ColName
[ https://issues.apache.org/jira/browse/HUDI-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-6064:
Labels: pull-request-available (was: )
> Improve JDBCExecutor#getTableSchema Use ColName
> ---
>
> Key: HUDI-6064
> URL: https://issues.apache.org/jira/browse/HUDI-6064
> Project: Apache Hudi
> Issue Type: Improvement
> Components: hive
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
> Labels: pull-request-available
>
> JDBCExecutor#getTableSchema Use ColIndex, which is not conducive to code reading, use ColName instead of ColIndex.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] LinMingQiang commented on pull request #8338: [HUDI-5996] Verify the consistency of bucket num at job sta…
LinMingQiang commented on PR #8338: URL: https://github.com/apache/hudi/pull/8338#issuecomment-1504502481 add test cases.
[GitHub] [hudi] slfan1989 opened a new pull request, #8435: [HUDI-6064] Improve JDBCExecutor#getTableSchema Use ColName.
slfan1989 opened a new pull request, #8435: URL: https://github.com/apache/hudi/pull/8435
### Change Logs
JDBCExecutor#getTableSchema reads result columns by index (ColIndex), which makes the code hard to read; use column names (ColName) instead of ColIndex.
### Impact
none.
### Risk level (write none, low medium or high below)
none.
### Documentation Update
_Describe any necessary documentation update if there is any new feature, config, or user-facing change_
none.
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
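The change described in this PR can be sketched as follows. The thread does not show which columns `JDBCExecutor#getTableSchema` actually reads, so this sketch assumes the standard `DatabaseMetaData#getColumns` result layout, where position 4 is `COLUMN_NAME` and position 6 is `TYPE_NAME`. `ColumnNameSketch`, `readSchema`, `row` and `fakeResultSet` are hypothetical names; the fake `ResultSet` exists only so the example runs without a database or driver.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnNameSketch {

    // Reads (column name -> type name) pairs from a ResultSet shaped like the
    // one returned by DatabaseMetaData#getColumns, using labels, not indexes.
    public static Map<String, String> readSchema(ResultSet columns) throws SQLException {
        Map<String, String> schema = new LinkedHashMap<>();
        while (columns.next()) {
            // By index these reads would be getString(4) and getString(6).
            String name = columns.getString("COLUMN_NAME");
            String type = columns.getString("TYPE_NAME");
            schema.put(name, type);
        }
        return schema;
    }

    // Builds one fake metadata row (illustration only).
    public static Map<String, String> row(String name, String type) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("COLUMN_NAME", name);
        r.put("TYPE_NAME", type);
        return r;
    }

    // In-memory ResultSet supporting only next() and getString(String).
    public static ResultSet fakeResultSet(List<Map<String, String>> rows) {
        int[] cursor = {-1};
        return (ResultSet) Proxy.newProxyInstance(
            ColumnNameSketch.class.getClassLoader(),
            new Class<?>[] {ResultSet.class},
            (proxy, method, args) -> {
                if (method.getName().equals("next")) {
                    cursor[0]++;
                    return cursor[0] < rows.size();
                }
                if (method.getName().equals("getString") && args[0] instanceof String) {
                    return rows.get(cursor[0]).get((String) args[0]);
                }
                throw new UnsupportedOperationException(method.getName());
            });
    }

    public static void main(String[] args) throws SQLException {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("id", "int"));
        rows.add(row("name", "string"));
        System.out.println(readSchema(fakeResultSet(rows)));
    }
}
```

Reading by label keeps each call self-describing. Whether a given driver returns these labels for its metadata result set is worth checking against that driver.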
[jira] [Created] (HUDI-6064) Improve JDBCExecutor#getTableSchema Use ColName
Shilun Fan created HUDI-6064:
Summary: Improve JDBCExecutor#getTableSchema Use ColName
Key: HUDI-6064
URL: https://issues.apache.org/jira/browse/HUDI-6064
Project: Apache Hudi
Issue Type: Improvement
Components: hive
Reporter: Shilun Fan
Assignee: Shilun Fan
JDBCExecutor#getTableSchema Use ColIndex, which is not conducive to code reading, use ColName instead of ColIndex.
[jira] [Updated] (HUDI-6064) Improve JDBCExecutor#getTableSchema Use ColName
[ https://issues.apache.org/jira/browse/HUDI-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan updated HUDI-6064:
Status: In Progress (was: Open)
> Improve JDBCExecutor#getTableSchema Use ColName
> ---
>
> Key: HUDI-6064
> URL: https://issues.apache.org/jira/browse/HUDI-6064
> Project: Apache Hudi
> Issue Type: Improvement
> Components: hive
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
>
> JDBCExecutor#getTableSchema Use ColIndex, which is not conducive to code reading, use ColName instead of ColIndex.
[GitHub] [hudi] xccui commented on issue #8325: [SUPPORT] spark read hudi error: Unable to instantiate HFileBootstrapIndex
xccui commented on issue #8325: URL: https://github.com/apache/hudi/issues/8325#issuecomment-1504472901
Got some time today to take a closer look at the errors. `HFileBootstrapIndex` needs to access some remote data during initialization. There are likely some connection issues (e.g. the file system was closed or the connection was interrupted for some reason) causing the initialization to fail. It shouldn't be a compatibility problem. Maybe we could move some logic out of the constructor.
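One generic way to move work out of a constructor is lazy initialization. The sketch below is only an illustration of that idea and says nothing about how `HFileBootstrapIndex` is or should be written. `LazyIndexSketch` and `LazyValue` are hypothetical names, and the "remote read" is simulated by a plain `Supplier`.

```java
import java.util.function.Supplier;

public class LazyIndexSketch {

    // Holds a value that is computed on first use instead of at construction.
    // A failed load is not cached, so a later call can try again.
    public static final class LazyValue<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        public LazyValue(Supplier<T> loader) {
            // No I/O here: constructing the holder cannot fail on a bad connection.
            this.loader = loader;
        }

        public synchronized T get() {
            if (!loaded) {
                value = loader.get(); // may throw; 'loaded' then stays false
                loaded = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        LazyValue<String> indexInfo = new LazyValue<>(() -> {
            attempts[0]++;
            if (attempts[0] == 1) {
                // Simulated transient failure on the first remote read.
                throw new IllegalStateException("connection interrupted");
            }
            return "index-info";
        });
        try {
            indexInfo.get();
        } catch (IllegalStateException e) {
            System.out.println("first access failed: " + e.getMessage());
        }
        System.out.println(indexInfo.get());
        System.out.println("attempts: " + attempts[0]);
    }
}
```

With this shape, a class instantiated through reflection would no longer fail at instantiation time because of a transient I/O error. The error would surface at first use, where it can be reported or retried with more context.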
[GitHub] [hudi] hudi-bot commented on pull request #8433: [minor] avoid synchronized block in ReflectionUtils if key is present in cache
hudi-bot commented on PR #8433: URL: https://github.com/apache/hudi/pull/8433#issuecomment-1504463584
## CI report:
* 106eefb312139bfec944a5693e2e3608f0a11bd1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16268)
[GitHub] [hudi] bvaradar commented on a diff in pull request #8378: [HUDI-6031] fix bug: checkpoint lost after changing cow to mor
bvaradar commented on code in PR #8378: URL: https://github.com/apache/hudi/pull/8378#discussion_r1163506777
## hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java:
## @@ -650,16 +648,19 @@ private JavaRDD getTransformedRDD(Dataset rowDataset, boolea
   /**
    * Process previous commit metadata and checkpoint configs set by user to determine the checkpoint to resume from.
-   * @param commitTimelineOpt commit timeline of interest.
+   *
+   * @param commitsTimelineOpt commits timeline of interest, including .commit and .deltacommit.
    * @return the checkpoint to resume from if applicable.
    * @throws IOException
    */
-  private Option getCheckpointToResume(Option commitTimelineOpt) throws IOException {
+  private Option getCheckpointToResume(Option commitsTimelineOpt) throws IOException {
     Option resumeCheckpointStr = Option.empty();
-    Option lastCommit = commitTimelineOpt.get().lastInstant();
+    // try get checkpoint from commits(including commit and deltacommit)
+    // in COW migrating to MOR case, the first batch of the deltastreamer will lost the checkpoint from COW table, cause the dataloss
+    Option lastCommit = commitsTimelineOpt.get().lastInstant();
Review Comment:
For a MOR table, we need to read only .deltacommit files if there is at least one .deltacommit in the timeline. Otherwise, pick the latest .commit file. This is a safe approach. If there are no .deltacommit instants, then the table is either empty or has just been converted from COW to MOR. In this case, pick the latest .commit and read the checkpoint from there. So the pseudo-code is something like
```
// true when the timeline contains no .deltacommit instant
boolean hasNoDeltaCommit = commitsTimelineOpt.filter(instant -> instant.action.equals(HoodieTimeline.DELTA_COMMIT_ACTION)).empty();
if (isMOR && hasNoDeltaCommit) {
  commitsTimelineOpt = commitsTimelineOpt.filter(instant -> !instant.action.equals(HoodieTimeline.DELTA_COMMIT_ACTION));
}
/// Rest of the code
```
Let me know if you have questions.
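The selection rule described in this review comment can be sketched in a standalone form. This is a simplified model, not DeltaSync code: `CheckpointInstantSketch`, `Instant` and `instantToResumeFrom` are hypothetical names, the timeline is a plain list, and the action strings `commit` and `deltacommit` are local stand-ins for the timeline action constants. Instant timestamps are assumed to sort lexicographically.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CheckpointInstantSketch {

    public static final String COMMIT = "commit";
    public static final String DELTA_COMMIT = "deltacommit";

    // Hypothetical stand-in for a completed timeline instant.
    public static final class Instant {
        public final String timestamp;
        public final String action;

        public Instant(String timestamp, String action) {
            this.timestamp = timestamp;
            this.action = action;
        }
    }

    // MOR with at least one deltacommit: use the latest deltacommit.
    // Otherwise (COW, or MOR with no deltacommit yet): use the latest commit.
    public static Optional<Instant> instantToResumeFrom(List<Instant> timeline, boolean isMor) {
        boolean hasDeltaCommit = timeline.stream()
            .anyMatch(i -> DELTA_COMMIT.equals(i.action));
        String wanted = (isMor && hasDeltaCommit) ? DELTA_COMMIT : COMMIT;
        Instant latest = null;
        for (Instant i : timeline) {
            boolean newer = latest == null || i.timestamp.compareTo(latest.timestamp) > 0;
            if (wanted.equals(i.action) && newer) {
                latest = i;
            }
        }
        return Optional.ofNullable(latest);
    }

    public static void main(String[] args) {
        // A table just converted from COW to MOR: only .commit instants exist.
        List<Instant> converted = Arrays.asList(
            new Instant("001", COMMIT), new Instant("002", COMMIT));
        System.out.println(instantToResumeFrom(converted, true).get().timestamp);

        // A MOR table with deltacommits plus a later commit in the timeline.
        List<Instant> mor = Arrays.asList(
            new Instant("001", COMMIT), new Instant("002", DELTA_COMMIT),
            new Instant("003", COMMIT));
        System.out.println(instantToResumeFrom(mor, true).get().timestamp);
    }
}
```

In the first case the checkpoint would be read from the latest `.commit`, which is the COW-to-MOR migration case this PR targets. In the second case the later `.commit` is skipped in favor of the latest `.deltacommit`.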
[GitHub] [hudi] wuwenchi commented on pull request #7834: [HUDI-5690] Add simpleBucketPartitioner to support using the simple bucket index under bulkinsert
wuwenchi commented on PR #7834: URL: https://github.com/apache/hudi/pull/7834#issuecomment-1504421161
> @wuwenchi : Please ping me in this PR once you have addressed all comments and is ready for review.
@bvaradar All comments have now been addressed; it's ready for review. Thanks!
[GitHub] [hudi] codope commented on pull request #8424: [HUDI-6057] Fix deltastreamer shutdown when post write termination strategy enabled
codope commented on PR #8424: URL: https://github.com/apache/hudi/pull/8424#issuecomment-1504419143 @LiJie20190102 Have you checked if this fix solves your problem? https://github.com/apache/hudi/pull/8173
[GitHub] [hudi] bvaradar commented on pull request #7834: [HUDI-5690] Add simpleBucketPartitioner to support using the simple bucket index under bulkinsert
bvaradar commented on PR #7834: URL: https://github.com/apache/hudi/pull/7834#issuecomment-1504414285
@wuwenchi : Please ping me in this PR once you have addressed all comments and it is ready for review.
[GitHub] [hudi] bvaradar commented on pull request #7143: [HUDI-5175] Improving FileIndex load performance in PARALLELISM mode
bvaradar commented on PR #7143: URL: https://github.com/apache/hudi/pull/7143#issuecomment-1504411183
@zhangyue19921010 : Pinging to see if you can address the review comments?
[GitHub] [hudi] bvaradar commented on pull request #7913: Adding support for EPOCHMICROSECONDS in TimestampBasedAvroKeyGenerator
bvaradar commented on PR #7913: URL: https://github.com/apache/hudi/pull/7913#issuecomment-1504409552 @sydneybeal : Pinging to see if you can address the comments?
[hudi] branch master updated (8f014c033c3 -> 10040de05ad)
This is an automated email from the ASF dual-hosted git repository.
vbalaji pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git
from 8f014c033c3 [HUDI-6014] Remove unused import in hudi-spark (#8350)
add 10040de05ad [HUDI-5389] Remove Hudi Cli Duplicates Code. (#8360)
No new revisions were added by this update.
Summary of changes:
.../apache/hudi/cli/commands/RepairsCommand.java | 2 +-
.../org/apache/hudi/cli/commands/SparkMain.java | 4 +-
.../scala/org/apache/hudi/cli/DeDupeType.scala | 28 ---
.../scala/org/apache/hudi/cli/DedupeSparkJob.scala | 248 -
.../scala/org/apache/hudi/cli/SparkHelpers.scala | 147
.../org/apache/spark/sql/hudi/DedupeSparkJob.scala | 4 +-
6 files changed, 5 insertions(+), 428 deletions(-)
delete mode 100644 hudi-cli/src/main/scala/org/apache/hudi/cli/DeDupeType.scala
delete mode 100644 hudi-cli/src/main/scala/org/apache/hudi/cli/DedupeSparkJob.scala
delete mode 100644 hudi-cli/src/main/scala/org/apache/hudi/cli/SparkHelpers.scala
[GitHub] [hudi] bvaradar merged pull request #8360: [HUDI-5389] Remove Hudi Cli Duplicates Code.
bvaradar merged PR #8360: URL: https://github.com/apache/hudi/pull/8360
[GitHub] [hudi] bvaradar commented on pull request #7680: [HUDI-5548] spark sql show | update hudi's table properties
bvaradar commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1504392751 @XuQianJin-Stars: Can you let us know whether you will be able to look at the failing test, and also rebase, please?
[GitHub] [hudi] bvaradar commented on pull request #8385: [HUDI-6040]Stop writing and reading compaction plans from .aux folder
bvaradar commented on PR #8385: URL: https://github.com/apache/hudi/pull/8385#issuecomment-1504389808 @Mulavar: We need to create a higher version (6) and write upgrade/downgrade logic to handle the transition from the current version (5). You can look at https://github.com/apache/hudi/pull/6248 as an example of how to do this.
[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs
hudi-bot commented on PR #7881: URL: https://github.com/apache/hudi/pull/7881#issuecomment-1504379935
## CI report:
* c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
* a2a75f077cf831e05b5659eaf0990ebc4865622e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16267)
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[hudi] branch master updated: [HUDI-6014] Remove unused import in hudi-spark (#8350)
This is an automated email from the ASF dual-hosted git repository.
vbalaji pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/master by this push:
     new 8f014c033c3 [HUDI-6014] Remove unused import in hudi-spark (#8350)
8f014c033c3 is described below
commit 8f014c033c3b7332d91fa09fce0631e5a59600d7
Author: huangxiaoping <1754789...@qq.com>
AuthorDate: Wed Apr 12 09:16:38 2023 +0800
    [HUDI-6014] Remove unused import in hudi-spark (#8350)
---
 .../spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala    | 2 --
 .../apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala   | 1 -
 .../sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala     | 1 -
 .../src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala       | 3 ---
 4 files changed, 7 deletions(-)
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala
index c6c39b73989..97930432e4e 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ExportInstantsProcedure.scala
@@ -32,8 +32,6 @@ import org.apache.hudi.common.table.timeline.{HoodieInstant, HoodieTimeline, Tim
 import org.apache.hudi.exception.HoodieException
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.Row
-import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
 import java.io.File
 import java.util
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala
index ca8b3fc95bc..43d636b65ec 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/RunCleanProcedure.scala
@@ -22,7 +22,6 @@ import org.apache.hudi.client.SparkRDDWriteClient
 import org.apache.hudi.common.table.timeline.HoodieActiveTimeline
 import org.apache.hudi.common.util.JsonUtils
 import org.apache.hudi.config.HoodieCleanConfig
-import org.apache.hudi.table.action.clean.CleaningTriggerStrategy
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
diff --git a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala
index d75df07fc9d..e245159c849 100644
--- a/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala
+++ b/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/ShowTablePropertiesProcedure.scala
@@ -17,7 +17,6 @@
 package org.apache.spark.sql.hudi.command.procedures
-import org.apache.hudi.HoodieCLIUtils
 import org.apache.hudi.common.table.HoodieTableMetaClient
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{DataTypes, Metadata, StructField, StructType}
diff --git a/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala b/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala
index 9759356b720..c99f2b197a1 100644
--- a/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala
+++ b/hudi-spark-datasource/hudi-spark2/src/main/scala/org/apache/hudi/Spark2HoodieFileScanRDD.scala
@@ -18,12 +18,9 @@
 package org.apache.hudi
-import org.apache.hudi.HoodieUnsafeRDD
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.expressions.AttributeReference
 import org.apache.spark.sql.execution.datasources.{FilePartition, FileScanRDD, PartitionedFile}
-import org.apache.spark.sql.types.StructType
 class Spark2HoodieFileScanRDD(@transient private val sparkSession: SparkSession,
                               read: PartitionedFile => Iterator[InternalRow],
[GitHub] [hudi] bvaradar merged pull request #8350: [HUDI-6014] Remove unused import in hudi-spark
bvaradar merged PR #8350: URL: https://github.com/apache/hudi/pull/8350
[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.
hudi-bot commented on PR #8434: URL: https://github.com/apache/hudi/pull/8434#issuecomment-1504336034
## CI report:
* 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16269)
[GitHub] [hudi] hudi-bot commented on pull request #8433: [minor] avoid synchronized block in ReflectionUtils if key is present in cache
hudi-bot commented on PR #8433: URL: https://github.com/apache/hudi/pull/8433#issuecomment-1504335986
## CI report:
* 106eefb312139bfec944a5693e2e3608f0a11bd1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16268)
[GitHub] [hudi] hudi-bot commented on pull request #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.
hudi-bot commented on PR #8434: URL: https://github.com/apache/hudi/pull/8434#issuecomment-1504328434
## CI report:
* 321a9073e72e4c06cc8c93b6fb114b9ed6aecfbd UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8433: [minor] avoid synchronized block in ReflectionUtils if key is present in cache
hudi-bot commented on PR #8433: URL: https://github.com/apache/hudi/pull/8433#issuecomment-1504328379
## CI report:
* 106eefb312139bfec944a5693e2e3608f0a11bd1 UNKNOWN
[jira] [Updated] (HUDI-6063) Modify logging errors In JDBCExecutor
[ https://issues.apache.org/jira/browse/HUDI-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-6063:
    Labels: pull-request-available (was: )
> Modify logging errors In JDBCExecutor
>
> Key: HUDI-6063
> URL: https://issues.apache.org/jira/browse/HUDI-6063
> Project: Apache Hudi
> Issue Type: Bug
> Components: hive
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
> Labels: pull-request-available
>
> There is a logging error in JDBCExecutor. During the process of drop partitions, the log prints add partitions.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] slfan1989 opened a new pull request, #8434: [HUDI-6063] Modify logging errors In JDBCExecutor.
slfan1989 opened a new pull request, #8434: URL: https://github.com/apache/hudi/pull/8434
### Change Logs
There is a logging error in JDBCExecutor. During the process of drop partitions, the log prints add partitions.
### Impact
none.
### Risk level (write none, low medium or high below)
none.
### Documentation Update
none.
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Assigned] (HUDI-6063) Modify logging errors In JDBCExecutor
[ https://issues.apache.org/jira/browse/HUDI-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan reassigned HUDI-6063:
    Assignee: Shilun Fan
> Modify logging errors In JDBCExecutor
>
> Key: HUDI-6063
> URL: https://issues.apache.org/jira/browse/HUDI-6063
> Project: Apache Hudi
> Issue Type: Bug
> Components: hive
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
>
> There is a logging error in JDBCExecutor. During the process of drop partitions, the log prints add partitions.
[jira] [Updated] (HUDI-6063) Modify logging errors In JDBCExecutor
[ https://issues.apache.org/jira/browse/HUDI-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan updated HUDI-6063:
    Status: In Progress (was: Open)
> Modify logging errors In JDBCExecutor
>
> Key: HUDI-6063
> URL: https://issues.apache.org/jira/browse/HUDI-6063
> Project: Apache Hudi
> Issue Type: Bug
> Components: hive
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
>
> There is a logging error in JDBCExecutor. During the process of drop partitions, the log prints add partitions.
[jira] [Created] (HUDI-6063) Modify logging errors In JDBCExecutor
Shilun Fan created HUDI-6063:
    Summary: Modify logging errors In JDBCExecutor
    Key: HUDI-6063
    URL: https://issues.apache.org/jira/browse/HUDI-6063
    Project: Apache Hudi
    Issue Type: Bug
    Components: hive
    Reporter: Shilun Fan
There is a logging error in JDBCExecutor. During the process of drop partitions, the log prints add partitions.
[GitHub] [hudi] slfan1989 commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync
slfan1989 commented on code in PR #8388: URL: https://github.com/apache/hudi/pull/8388#discussion_r1163442102
## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java: ##
@@ -258,13 +258,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
       lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
     }
     LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
-    List writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
-    LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
-    // Sync the partitions if needed
-    // find dropped partitions, if any, in the latest commit
-    Set droppedPartitions = syncClient.getDroppedPartitionsSince(lastCommitTimeSynced);
-    boolean partitionsChanged = syncPartitions(tableName, writtenPartitionsSince, droppedPartitions);
+    boolean partitionsChanged;
+    if (!lastCommitTimeSynced.isPresent()
+        || syncClient.getActiveTimeline().isBeforeTimelineStarts(lastCommitTimeSynced.get())) {
+      // If the last commit time synced is before the start of the active timeline,
+      // the Hive sync falls back to list all partitions on storage, instead of
+      // reading active and archived timelines for written partitions.
+      LOG.info("Sync all partitions given the last commit time synced is empty or "
+          + "before the start of the active timeline. Listing all partitions in "
+          + config.getString(META_SYNC_BASE_PATH)
+          + ", file system: " + config.getHadoopFileSystem());
+      partitionsChanged = syncAllPartitions(tableName);
+    } else {
+      List writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
+      LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
Review Comment:
LOG.info("Storage partitions scan complete. Found {}.", writtenPartitionsSince.size());
[GitHub] [hudi] slfan1989 commented on a diff in pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync
slfan1989 commented on code in PR #8388: URL: https://github.com/apache/hudi/pull/8388#discussion_r1163441854
## hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java: ##
@@ -258,13 +258,28 @@ protected void syncHoodieTable(String tableName, boolean useRealtimeInputFormat,
       lastCommitTimeSynced = syncClient.getLastCommitTimeSynced(tableName);
     }
     LOG.info("Last commit time synced was found to be " + lastCommitTimeSynced.orElse("null"));
-    List writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
-    LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
-    // Sync the partitions if needed
-    // find dropped partitions, if any, in the latest commit
-    Set droppedPartitions = syncClient.getDroppedPartitionsSince(lastCommitTimeSynced);
-    boolean partitionsChanged = syncPartitions(tableName, writtenPartitionsSince, droppedPartitions);
+    boolean partitionsChanged;
+    if (!lastCommitTimeSynced.isPresent()
+        || syncClient.getActiveTimeline().isBeforeTimelineStarts(lastCommitTimeSynced.get())) {
+      // If the last commit time synced is before the start of the active timeline,
+      // the Hive sync falls back to list all partitions on storage, instead of
+      // reading active and archived timelines for written partitions.
+      LOG.info("Sync all partitions given the last commit time synced is empty or "
+          + "before the start of the active timeline. Listing all partitions in "
+          + config.getString(META_SYNC_BASE_PATH)
+          + ", file system: " + config.getHadoopFileSystem());
+      partitionsChanged = syncAllPartitions(tableName);
+    } else {
+      List writtenPartitionsSince = syncClient.getWrittenPartitionsSince(lastCommitTimeSynced);
+      LOG.info("Storage partitions scan complete. Found " + writtenPartitionsSince.size());
Review Comment:
Our logging has changed to slf4j; can we use {} placeholders here?
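The fallback rule described in the diff comments above can be sketched in isolation as follows. This is a minimal illustration, not the HiveSyncTool code: `SyncDecision`, `Mode`, and `choose` are hypothetical names, and the comparison assumes instant times are fixed-width timestamp strings, so that lexicographic order matches chronological order.

```java
import java.util.Optional;

// Hypothetical sketch of the "list all partitions" fallback decision.
final class SyncDecision {
  enum Mode { LIST_ALL_PARTITIONS, READ_TIMELINE }

  private SyncDecision() {}

  // firstActiveInstant stands in for the start of the active timeline (assumed input).
  static Mode choose(Optional<String> lastCommitTimeSynced, String firstActiveInstant) {
    // Nothing synced yet, or the last synced commit precedes the active timeline:
    // fall back to listing every partition on storage.
    if (!lastCommitTimeSynced.isPresent()
        || lastCommitTimeSynced.get().compareTo(firstActiveInstant) < 0) {
      return Mode.LIST_ALL_PARTITIONS;
    }
    // Otherwise the timeline still covers the gap, so written partitions can be read from it.
    return Mode.READ_TIMELINE;
  }
}
```

Calling `SyncDecision.choose(Optional.empty(), "20230412091638")` takes the fallback branch, as does any last-synced instant that sorts before the first active instant.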
[GitHub] [hudi] the-other-tim-brown opened a new pull request, #8433: [minor] avoid synchronized block in ReflectionUtils if key is present in cache
the-other-tim-brown opened a new pull request, #8433: URL: https://github.com/apache/hudi/pull/8433
### Change Logs
Avoids acquiring a lock to check whether a value is present in a cache to allow better performance when the value is already in the cache.
### Impact
This method is invoked on all rows in the DeltaStreamer when building the payload class. This should provide a minor improvement in execution time.
### Risk level (write none, low medium or high below)
none
### Documentation Update
NA
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
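The idea in the change log above, checking the cache without a lock and synchronizing only on a miss, might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual ReflectionUtils change: `ClassCache` and `lookup` are hypothetical names, and the cache is assumed to be a `ConcurrentHashMap` so that the unlocked read is safe.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a class-lookup cache; not Hudi's ReflectionUtils.
final class ClassCache {
  private static final Map<String, Class<?>> CACHE = new ConcurrentHashMap<>();

  private ClassCache() {}

  static Class<?> lookup(String className) {
    // Fast path: lock-free read that succeeds whenever the key is already cached.
    Class<?> cached = CACHE.get(className);
    if (cached != null) {
      return cached;
    }
    // Slow path: only cache misses contend on the lock.
    synchronized (CACHE) {
      // Re-check inside the lock in case another thread loaded the class meanwhile.
      cached = CACHE.get(className);
      if (cached == null) {
        try {
          cached = Class.forName(className);
        } catch (ClassNotFoundException e) {
          throw new IllegalArgumentException("Unable to load class " + className, e);
        }
        CACHE.put(className, cached);
      }
      return cached;
    }
  }
}
```

Because a per-row caller almost always hits the fast path after warm-up, the lock is taken only once per distinct class name in the common case.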
[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs
hudi-bot commented on PR #7881: URL: https://github.com/apache/hudi/pull/7881#issuecomment-1504268255
## CI report:
* c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
* 8fd9b3a58eb63e330b306ed70843e677dfbc4a2d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16223)
* a2a75f077cf831e05b5659eaf0990ebc4865622e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16267)
[GitHub] [hudi] hudi-bot commented on pull request #7881: [HUDI-5723] Automate and standardize enum configs
hudi-bot commented on PR #7881: URL: https://github.com/apache/hudi/pull/7881#issuecomment-1504260452
## CI report:
* c378a74c177a2f1a924609a44f0978ee347d272a UNKNOWN
* 8fd9b3a58eb63e330b306ed70843e677dfbc4a2d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16223)
* a2a75f077cf831e05b5659eaf0990ebc4865622e UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #8432: Fix NPE when upsert merger and null map or array
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1504253664
## CI report:
* f4502dad350e0dc84299dc0bd5889506420b0f49 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16266)
[jira] [Updated] (HUDI-5997) Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource
[ https://issues.apache.org/jira/browse/HUDI-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Léo Biscassi updated HUDI-5997:
    Status: In Progress (was: Open)
> Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource
>
> Key: HUDI-5997
> URL: https://issues.apache.org/jira/browse/HUDI-5997
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: Sagar Sumit
> Assignee: Léo Biscassi
> Priority: Major
> Fix For: 0.14.0
>
> See for more details
[jira] [Assigned] (HUDI-5997) Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource
[ https://issues.apache.org/jira/browse/HUDI-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Léo Biscassi reassigned HUDI-5997:
    Assignee: Léo Biscassi
> Support DFS Schema Provider with S3/GCS EventsHoodieIncrSource
>
> Key: HUDI-5997
> URL: https://issues.apache.org/jira/browse/HUDI-5997
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer
> Reporter: Sagar Sumit
> Assignee: Léo Biscassi
> Priority: Major
> Fix For: 0.14.0
>
> See for more details
[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs
jonvex commented on code in PR #7881: URL: https://github.com/apache/hudi/pull/7881#discussion_r1163391536
## hudi-common/src/test/java/org/apache/hudi/common/config/TestConfigProperty.java: ##
@@ -171,4 +171,28 @@ public void testAdvancedValue() {
     assertTrue(FAKE_BOOLEAN_CONFIG.markAdvanced().isAdvanced());
     assertTrue(FAKE_BOOLEAN_CONFIG_NO_DEFAULT.markAdvanced().isAdvanced());
   }
+
+  @EnumDescription("Test enum description.")
+  public enum TestEnum {
Review Comment:
That has to happen in the getter and/or the setter; it won't happen in ConfigProperty.
[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2
hudi-bot commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1504203496
## CI report:
* f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN
* 58edb8dbad6d6e4dd7455bcabc5e5f70369493ab Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16265)
[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs
jonvex commented on code in PR #7881: URL: https://github.com/apache/hudi/pull/7881#discussion_r1163375572
## hudi-common/src/main/java/org/apache/hudi/common/config/ConfigProperty.java: ##
@@ -139,6 +144,49 @@ public ConfigProperty withDocumentation(String doc) {
     return new ConfigProperty<>(key, defaultValue, docOnDefaultValue, doc, sinceVersion, deprecatedVersion, inferFunction, validValues, advanced, alternatives);
   }
+  public > ConfigProperty withDocumentation(Class e) {
+    return withDocumentation(e,"");
+  }
+
+  private > boolean isDefaultField(Class e, Field f) {
+    if (!hasDefaultValue()) {
+      return false;
+    }
+    if (defaultValue() instanceof String) {
+      return f.getName().equals(defaultValue());
+    }
+    return Enum.valueOf(e, f.getName()).equals(defaultValue());
+  }
+
+  public > ConfigProperty withDocumentation(Class e, String doc) {
Review Comment:
Why? Sometimes the config needs some extra explanation that the enum doesn't provide
[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs
jonvex commented on code in PR #7881: URL: https://github.com/apache/hudi/pull/7881#discussion_r1163352137
## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ##
@@ -168,23 +168,16 @@ public class HoodieWriteConfig extends HoodieConfig {
   public static final ConfigProperty WRITE_EXECUTOR_TYPE = ConfigProperty
       .key("hoodie.write.executor.type")
-      .defaultValue(SIMPLE.name())
-      .withValidValues(Arrays.stream(ExecutorType.values()).map(Enum::name).toArray(String[]::new))
-      .sinceVersion("0.13.0")
-      .withDocumentation("Set executor which orchestrates concurrent producers and consumers communicating through a message queue."
-          + "BOUNDED_IN_MEMORY: Use LinkedBlockingQueue as a bounded in-memory queue, this queue will use extra lock to balance producers and consumer"
-          + "DISRUPTOR: Use disruptor which a lock free message queue as inner message, this queue may gain better writing performance if lock was the bottleneck. "
-          + "SIMPLE(default): Executor with no inner message queue and no inner lock. Consuming and writing records from iterator directly. Compared with BIM and DISRUPTOR, "
-          + "this queue has no need for additional memory and cpu resources due to lock or multithreading, but also lost some benefits such as speed limit. "
-          + "Although DISRUPTOR is still experimental.");
+      .defaultValue(ExecutorType.SIMPLE.name())
+      .withDocumentation(ExecutorType.class)
+      .sinceVersion("0.13.0");
   public static final ConfigProperty KEYGENERATOR_TYPE = ConfigProperty
       .key("hoodie.datasource.write.keygenerator.type")
       .defaultValue(KeyGeneratorType.SIMPLE.name())
-      .withDocumentation("Easily configure one the built-in key generators, instead of specifying the key generator class."
-          + "Currently supports SIMPLE, COMPLEX, TIMESTAMP, CUSTOM, NON_PARTITION, GLOBAL_DELETE. "
-          + "**Note** This is being actively worked on. Please use "
-          + "`hoodie.datasource.write.keygenerator.class` instead.");
+      .withDocumentation(KeyGeneratorType.class,
Review Comment:
This seems correct to me
[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs
jonvex commented on code in PR #7881: URL: https://github.com/apache/hudi/pull/7881#discussion_r1163346107
## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java: ##
@@ -732,40 +712,17 @@ public String getValue() {
     }
   }
+  @EnumDescription("Clustering mode to use.")
   public enum ClusteringOperator {
-    /**
-     * only schedule the clustering plan
-     */
-    SCHEDULE("schedule"),
-
-    /**
-     * only execute then pending clustering plans
-     */
-    EXECUTE("execute"),
-
-    /**
-     * schedule cluster first, and execute all pending clustering plans
-     */
-    SCHEDULE_AND_EXECUTE("scheduleandexecute");
+    @EnumFieldDescription("Only schedule the clustering plan.")
+    SCHEDULE,
-    private static final Map VALUE_TO_ENUM_MAP =
-        TypeUtils.getValueToEnumMap(ClusteringOperator.class, e -> e.value);
+    @EnumFieldDescription("Only execute pending clustering plans.")
+    EXECUTE,
-    private final String value;
-
-    ClusteringOperator(String value) {
-      this.value = value;
-    }
-
-    @Nonnull
-    public static ClusteringOperator fromValue(String value) {
-      ClusteringOperator enumValue = VALUE_TO_ENUM_MAP.get(value);
-      if (enumValue == null) {
-        throw new HoodieException(String.format("Invalid value (%s)", value));
-      }
-      return enumValue;
-    }
+    @EnumFieldDescription("Schedule cluster first, and execute all pending clustering plans.")
+    SCHEDULE_AND_EXECUTE;
Review Comment:
Yeah, I reverted it. I mentioned it in the jira issue I created
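To make the annotation-driven documentation in this review thread concrete, here is a minimal sketch of how descriptions attached to an enum and its constants could be collected through reflection. The annotation and enum names mirror the diff, but they are redeclared locally as stand-ins; `EnumDocs.describe` and its output format are assumptions for illustration, not the PR's implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Local stand-ins for the annotations named in the diff (assumed shape: a single String value).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface EnumDescription { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface EnumFieldDescription { String value(); }

@EnumDescription("Clustering mode to use.")
enum ClusteringOperator {
  @EnumFieldDescription("Only schedule the clustering plan.")
  SCHEDULE,
  @EnumFieldDescription("Only execute pending clustering plans.")
  EXECUTE,
  @EnumFieldDescription("Schedule cluster first, and execute all pending clustering plans.")
  SCHEDULE_AND_EXECUTE
}

final class EnumDocs {
  private EnumDocs() {}

  // Builds one documentation string from the type-level and per-constant annotations.
  static <T extends Enum<T>> String describe(Class<T> enumClass) {
    StringBuilder doc = new StringBuilder();
    EnumDescription top = enumClass.getAnnotation(EnumDescription.class);
    if (top != null) {
      doc.append(top.value());
    }
    // Iterate constants in declaration order; each constant is a public static field.
    for (T constant : enumClass.getEnumConstants()) {
      try {
        Field field = enumClass.getField(constant.name());
        EnumFieldDescription d = field.getAnnotation(EnumFieldDescription.class);
        if (d != null) {
          doc.append(' ').append(constant.name()).append(": ").append(d.value());
        }
      } catch (NoSuchFieldException e) {
        throw new IllegalStateException(e);
      }
    }
    return doc.toString();
  }
}
```

Keeping the descriptions next to the constants means the valid values and their documentation cannot drift apart, which is the motivation the hand-written strings in the removed code illustrate.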
[GitHub] [hudi] hudi-bot commented on pull request #8432: Fix NPE when upsert merger and null map or array
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-1504104875 ## CI report: * f4502dad350e0dc84299dc0bd5889506420b0f49 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16266) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8430: [HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.
hudi-bot commented on PR #8430: URL: https://github.com/apache/hudi/pull/8430#issuecomment-1504104816 ## CI report: * d357330a200b9c5ad7f719d9985d40ef2e604d51 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16263) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-6062) Update LayoutOptimizationStrategy and ClusteringOperator to use standard enum notation
Jonathan Vexler created HUDI-6062:
-------------------------------------

             Summary: Update LayoutOptimizationStrategy and ClusteringOperator to use standard enum notation
                 Key: HUDI-6062
                 URL: https://issues.apache.org/jira/browse/HUDI-6062
             Project: Apache Hudi
          Issue Type: Improvement
          Components: clustering, code-quality, configs
            Reporter: Jonathan Vexler


ClusteringOperator and LayoutOptimizationStrategy have enums with values that are not capitalized snake case like every other config. We need to maintain backwards compatibility so we can't just change this.

To make this change, we need to have the old values be translated to the updated values so that if a user uses the old values, it will still work. For example if the hoodie.layout.optimize.strategy config is set to "z-order" we need to translate it to "ZORDER" and then use "ZORDER" internally. But the user could also set the config to "ZORDER" of course.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
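The translation described in HUDI-6062 could look roughly like the sketch below. It is written in Python for brevity and is not Hudi code: the `resolve_strategy` helper and the alias table are hypothetical, and only the "z-order" to "ZORDER" mapping comes from the ticket (the other members and aliases are assumptions for illustration).

```python
from enum import Enum


class LayoutOptimizationStrategy(Enum):
    # Standardized, upper-case names. Only ZORDER is named in the ticket;
    # LINEAR and HILBERT are assumed here for illustration.
    LINEAR = "LINEAR"
    ZORDER = "ZORDER"
    HILBERT = "HILBERT"


# Hypothetical table of legacy spellings and their standardized names.
_LEGACY_ALIASES = {
    "linear": "LINEAR",
    "z-order": "ZORDER",
    "hilbert": "HILBERT",
}


def resolve_strategy(raw: str) -> LayoutOptimizationStrategy:
    """Accept either a legacy value or a standardized name."""
    value = raw.strip()
    # Translate a legacy spelling first; otherwise treat the input as a name.
    canonical = _LEGACY_ALIASES.get(value.lower(), value.upper())
    try:
        return LayoutOptimizationStrategy[canonical]
    except KeyError:
        raise ValueError(f"Invalid value ({raw})") from None
```

With this shape, both `resolve_strategy("z-order")` and `resolve_strategy("ZORDER")` yield the same enum member, so existing user configs keep working while the internal representation uses only the standardized name.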
[jira] [Created] (HUDI-6061) NPE with nullable MapType and new hudi merger
nicolas paris created HUDI-6061:
-----------------------------------

             Summary: NPE with nullable MapType and new hudi merger
                 Key: HUDI-6061
                 URL: https://issues.apache.org/jira/browse/HUDI-6061
             Project: Apache Hudi
          Issue Type: Bug
          Components: core
            Reporter: nicolas paris
             Fix For: 0.13.1


In 0.13.0, when dealing with null map values during an upsert with the new hudi merger API, a null pointer exception is raised. AFAIK, it happens when the two MapTypes contain nulls in a different manner.

See [issue](https://github.com/apache/hudi/issues/8431) for details.

See [PR](https://github.com/apache/hudi/pull/8432) for details.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs
jonvex commented on code in PR #7881: URL: https://github.com/apache/hudi/pull/7881#discussion_r1163302008 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/bootstrap/BootstrapMode.java: ## @@ -18,18 +18,28 @@ package org.apache.hudi.client.bootstrap; +import org.apache.hudi.common.config.EnumDescription; +import org.apache.hudi.common.config.EnumFieldDescription; + /** * Identifies different types of bootstrap. */ +@EnumDescription("Bootstrap mode to apply for partition paths that match the regex set in `hoodie.bootstrap.mode.selector.regex`.") Review Comment: You don't need to use regex selector. It's also used in uniform selector -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8432: Fix NPE when upsert merger and null map or array
hudi-bot commented on PR #8432: URL: https://github.com/apache/hudi/pull/8432#issuecomment-150405 ## CI report: * f4502dad350e0dc84299dc0bd5889506420b0f49 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] parisni opened a new pull request, #8432: Fix NPE when upsert merger and null map or array
parisni opened a new pull request, #8432:
URL: https://github.com/apache/hudi/pull/8432

### Change Logs

Fixes #8431

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low medium or high below)

none

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11
hudi-bot commented on PR #8429: URL: https://github.com/apache/hudi/pull/8429#issuecomment-1504039432 ## CI report: * 18f438577f444c75e8060a20b7fdf59e40e9ab7e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16262) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] parisni opened a new issue, #8431: [SUPPORT] NPE with MapType and new hudi merger
parisni opened a new issue, #8431:
URL: https://github.com/apache/hudi/issues/8431

**Describe the problem you faced**

When dealing with null map values during an upsert with the new hudi merger API, a null pointer exception is raised. AFAIK, it happens when the two MapTypes contain nulls in a different manner.

**To Reproduce**

```python
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, MapType, ArrayType, TimestampType

tableName = 'test_hudi'
basePath = "/tmp/{tableName}".format(tableName=tableName)

data = [
    ("a", None, 1, 'b'),
]
schema = StructType(
    [
        StructField("event_id", StringType(), True),
        StructField(
            "mp", MapType(StringType(), ArrayType(TimestampType(), False), False)
        ),
        StructField("version", IntegerType(), True),
        StructField("event_date", StringType(), True),
    ]
)
df = (
    spark.createDataFrame(data=data, schema=schema)
)

#
# INIT THE TABLE WITH INSERT
#
hudi_options = {
    "hoodie.table.name": tableName,
    "hoodie.datasource.write.recordkey.field": "event_id",
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.datasource.write.table.name": tableName,
    "hoodie.datasource.write.operation": "insert",
    "hoodie.datasource.write.precombine.field": "version",
    "hoodie.upsert.shuffle.parallelism": 1,
    "hoodie.insert.shuffle.parallelism": 1,
    "hoodie.delete.shuffle.parallelism": 1,
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": tableName,
    "hoodie.datasource.hive_sync.mode": "jdbc",
    "hoodie.combine.before.insert":"true",
    "hoodie.datasource.hive_sync.enable": "false",
    "hoodie.datasource.hive_sync.partition_fields": "event_date",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    'hoodie.datasource.hive_sync.use_jdbc': False,
    "hoodie.merge.allow.duplicate.on.inserts":"true",
    "hoodie.metadata.enable": "true",
    #"hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
    "hoodie.payload.ordering.field": "version",
    "hoodie.payload.event.time.field": "version",
    "hoodie.datasource.write.record.merger.impls": "org.apache.hudi.HoodieSparkRecordMerger"
}
(df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath))
spark.read.format("hudi").load(basePath).printSchema()

data = [
    ("a", None, 1, 'b'),
]
schema = StructType(
    [
        StructField("event_id", StringType(), True),
        StructField(
            "mp", MapType(StringType(), ArrayType(TimestampType(), True), False)
        ),
        StructField("version", IntegerType(), True),
        StructField("event_date", StringType(), True),
    ]
)
df = (
    spark.createDataFrame(data=data, schema=schema)
)

#
# NOW UPSERT DATA WITH A DIFFERENT SCHEMA
#
hudi_options = {
    "hoodie.table.name": tableName,
    "hoodie.datasource.write.recordkey.field": "event_id",
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.datasource.write.table.name": tableName,
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.precombine.field": "version",
    "hoodie.upsert.shuffle.parallelism": 1,
    "hoodie.insert.shuffle.parallelism": 1,
    "hoodie.delete.shuffle.parallelism": 1,
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": tableName,
    "hoodie.datasource.hive_sync.mode": "jdbc",
    "hoodie.combine.before.insert":"true",
    "hoodie.datasource.hive_sync.enable": "false",
    "hoodie.datasource.hive_sync.partition_fields": "event_date",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    'hoodie.datasource.hive_sync.use_jdbc': False,
    "hoodie.merge.allow.duplicate.on.inserts":"true",
    "hoodie.metadata.enable": "true",
    "hoodie.payload.ordering.field": "version",
    "hoodie.payload.event.time.field": "version",
    "hoodie.datasource.write.record.merger.impls":
```
[GitHub] [hudi] CTTY commented on a diff in pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2
CTTY commented on code in PR #8082: URL: https://github.com/apache/hudi/pull/8082#discussion_r1163228965 ## hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark332PlusHoodieParquetFileFormat.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources.parquet + +import org.apache.hadoop.conf.Configuration +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.execution.datasources.PartitionedFile +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.types.StructType + +class Spark332PlusHoodieParquetFileFormat(override protected val shouldAppendPartitionValues: Boolean) extends Spark32PlusHoodieParquetFileFormat(shouldAppendPartitionValues) { Review Comment: With this class under `hudi-spark3.3.x`, Hudi won't be able to compile with Spark 3.3.1 anymore ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/HoodieSparkFileReaderFactory.java: ## @@ -33,6 +33,7 @@ protected HoodieFileReader newParquetFileReader(Configuration conf, Path path) { conf.setIfUnset(SQLConf.PARQUET_INT96_AS_TIMESTAMP().key(), SQLConf.PARQUET_INT96_AS_TIMESTAMP().defaultValueString()); conf.setIfUnset(SQLConf.CASE_SENSITIVE().key(), SQLConf.CASE_SENSITIVE().defaultValueString()); +conf.setIfUnset("spark.sql.legacy.parquet.nanosAsLong", "false"); Review Comment: nit: Can we add a comment to explain why we put a plain string here? ## hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark332PlusHoodieParquetFileFormat.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources.parquet + +import org.apache.hadoop.conf.Configuration +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.execution.datasources.PartitionedFile +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.types.StructType + +class Spark332PlusHoodieParquetFileFormat(override protected val shouldAppendPartitionValues: Boolean) extends Spark32PlusHoodieParquetFileFormat(shouldAppendPartitionValues) { + + override def buildReaderWithPartitionValues(sparkSession: SparkSession, + dataSchema: StructType, + partitionSchema: StructType, + requiredSchema: StructType, + filters: Seq[Filter], + options: Map[String, String], + hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = { +// Sets flags for `ParquetToSparkSchemaConverter` +hadoopConf.setBoolean(SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.key, sparkSession.sessionState.conf.legacyParquetNanosAsLong) Review Comment: Maybe use string here for property name would help build issues with Spark 3.3.1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to
[GitHub] [hudi] hudi-bot commented on pull request #8300: [HUDI-5991] Fix RDDCustomColumnsSortPartitioner's RepartitionRecords
hudi-bot commented on PR #8300: URL: https://github.com/apache/hudi/pull/8300#issuecomment-1503970614 ## CI report: * b7ab237090a715521e580113486849489d1bf00c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16260) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2
hudi-bot commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1503894762 ## CI report: * f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN * 46a2c22795b5e1be2bc74f92090cd1a496ea9a39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15754) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16264) * 58edb8dbad6d6e4dd7455bcabc5e5f70369493ab Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16265) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2
hudi-bot commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1503886479 ## CI report: * f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN * 46a2c22795b5e1be2bc74f92090cd1a496ea9a39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15754) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16264) * 58edb8dbad6d6e4dd7455bcabc5e5f70369493ab UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2
hudi-bot commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1503877790 ## CI report: * f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN * 46a2c22795b5e1be2bc74f92090cd1a496ea9a39 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15754) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16264) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8358: [HUDI-6017] Sort the results of Call help Procedure with no params
hudi-bot commented on PR #8358: URL: https://github.com/apache/hudi/pull/8358#issuecomment-1503878485 ## CI report: * 2c0e780e2dce3717fc3586417b5110dea2ca028c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16259) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11
nsivabalan commented on PR #8429: URL: https://github.com/apache/hudi/pull/8429#issuecomment-1503874676 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ad1happy2go commented on issue #8016: Inline Clustering : Clustering failed to write to files
ad1happy2go commented on issue #8016: URL: https://github.com/apache/hudi/issues/8016#issuecomment-1503856835 @raghavant-git Did you get a chance to test with those parameters? Are you still facing this issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jonvex commented on a diff in pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark
jonvex commented on code in PR #8303: URL: https://github.com/apache/hudi/pull/8303#discussion_r1163162168 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala: ## @@ -270,6 +271,21 @@ object DefaultSource { } } + private def resolveHoodieBootstrapRelation(sqlContext: SQLContext, + globPaths: Seq[Path], + userSchema: Option[StructType], + metaClient: HoodieTableMetaClient, + parameters: Map[String, String]): BaseRelation = { +val enableFileIndex = HoodieSparkConfUtils.getConfigValue(parameters, sqlContext.sparkSession.sessionState.conf, + ENABLE_HOODIE_FILE_INDEX.key, ENABLE_HOODIE_FILE_INDEX.defaultValue.toString).toBoolean +if (!enableFileIndex || globPaths.nonEmpty || parameters.getOrElse(HoodieBootstrapConfig.DATA_QUERIES_ONLY.key(), "true") != "true") { Review Comment: When I set a breakpoint here, userschema was null -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Madan16 commented on issue #8428: [SUPPORT]: When trying to UPSERT, Getting issues like : An error occurred while calling o168.save. org/apache/spark/sql/avro/SchemaConverters$ AND
Madan16 commented on issue #8428: URL: https://github.com/apache/hudi/issues/8428#issuecomment-1503853073 > @Madan16 I wanted to ask whether you have been using AWS Glue 3.0 from the start (including when the job was successful). > > My guess is that a version mismatch is somehow happening, which results in a ClassNotFound error for SchemaConverters, which is not present in older avro versions. @ad1happy2go: Yes, Glue 3.0 since the beginning. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ad1happy2go commented on issue #8017: [SUPPORT] Parquet file size is small after running deltastreamer in BULK_INSERT which results in large number of files under same partitioning
ad1happy2go commented on issue #8017: URL: https://github.com/apache/hudi/issues/8017#issuecomment-1503852321 @ROOBALJINDAL Are you still facing this issue? If yes, can you provide a reproducible script if possible? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] jonvex commented on a diff in pull request #7881: [HUDI-5723] Automate and standardize enum configs
jonvex commented on code in PR #7881: URL: https://github.com/apache/hudi/pull/7881#discussion_r1163153414 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -194,16 +187,18 @@ public class HoodieWriteConfig extends HoodieConfig { public static final ConfigProperty TIMELINE_LAYOUT_VERSION_NUM = ConfigProperty .key("hoodie.timeline.layout.version") - .defaultValue(Integer.toString(TimelineLayoutVersion.VERSION_1)) + .defaultValue(Integer.toString(TimelineLayoutVersion.CURR_VERSION)) + .withValidValues(Integer.toString(TimelineLayoutVersion.VERSION_0),Integer.toString(TimelineLayoutVersion.VERSION_1)) .sinceVersion("0.5.1") .withDocumentation("Controls the layout of the timeline. Version 0 relied on renames, Version 1 (default) models " + "the timeline as an immutable log relying only on atomic writes for object storage."); public static final ConfigProperty BASE_FILE_FORMAT = ConfigProperty .key("hoodie.table.base.file.format") .defaultValue(HoodieFileFormat.PARQUET) - .withAlternatives("hoodie.table.ro.file.format") - .withDocumentation("Base file format to store all the base file data."); + .withValidValues(HoodieFileFormat.PARQUET.name(), HoodieFileFormat.ORC.name(), HoodieFileFormat.HFILE.name()) + .withDocumentation(HoodieFileFormat.class, "File format to store all the base file data.") Review Comment: That one doesn't work because the enum has HOODIE_LOG as a value -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
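The review comment above notes that the base file format config cannot simply accept every `HoodieFileFormat` value, because the enum also contains `HOODIE_LOG`. A minimal Python sketch of that idea follows. It is an illustration only, not Hudi code: `parse_base_file_format` and `VALID_BASE_FILE_FORMATS` are hypothetical names, and the enum members are taken from the values visible in the diff and the comment.

```python
from enum import Enum


class HoodieFileFormat(Enum):
    PARQUET = "PARQUET"
    HOODIE_LOG = "HOODIE_LOG"
    HFILE = "HFILE"
    ORC = "ORC"


# Explicit allow-list for the base file format config, mirroring the
# withValidValues(...) call in the diff; HOODIE_LOG is deliberately absent.
VALID_BASE_FILE_FORMATS = (
    HoodieFileFormat.PARQUET,
    HoodieFileFormat.ORC,
    HoodieFileFormat.HFILE,
)


def parse_base_file_format(raw: str) -> HoodieFileFormat:
    """Parse a config value, rejecting enum members that are not base formats."""
    try:
        fmt = HoodieFileFormat[raw.strip().upper()]
    except KeyError:
        raise ValueError(f"Unknown file format: {raw}") from None
    if fmt not in VALID_BASE_FILE_FORMATS:
        raise ValueError(f"{fmt.name} is not a valid base file format")
    return fmt
```

The design point is that the set of valid config values is a subset of the enum, so it has to be listed explicitly rather than derived from the enum class.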
[GitHub] [hudi] ad1happy2go commented on issue #8428: [SUPPORT]: When trying to UPSERT, Getting issues like : An error occurred while calling o168.save. org/apache/spark/sql/avro/SchemaConverters$
ad1happy2go commented on issue #8428: URL: https://github.com/apache/hudi/issues/8428#issuecomment-1503843944 @Madan16 I wanted to ask whether you have been using AWS Glue 3.0 from the start (including when the job was successful). My guess is that a version mismatch is somehow happening, which results in a ClassNotFound error for SchemaConverters, which is not present in older avro versions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] rahil-c commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2
rahil-c commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1503840585 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8430: [HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.
hudi-bot commented on PR #8430: URL: https://github.com/apache/hudi/pull/8430#issuecomment-1503828676 ## CI report: * d357330a200b9c5ad7f719d9985d40ef2e604d51 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16263) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] zyclove commented on issue #8244: [SUPPORT] Is there any plan to support metadata management, index and table optimization services?
zyclove commented on issue #8244: URL: https://github.com/apache/hudi/issues/8244#issuecomment-1503828199 Do you have contact with the arctic project team? As the arctic project very much hopes to support hudi metadata management and data optimization. > @zyclove Do you need any other help as part of this ticket or can we close the same? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ad1happy2go commented on issue #8130: Spark java.util.NoSuchElementException: FileID * partition path p_c=CN_1 does not exist.
ad1happy2go commented on issue #8130: URL: https://github.com/apache/hudi/issues/8130#issuecomment-1503826578 @18511327133 I wasn't able to reproduce the issue. Can you provide an exact reproducible script with your datasets? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #8430: [HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.
hudi-bot commented on PR #8430: URL: https://github.com/apache/hudi/pull/8430#issuecomment-1503818107 ## CI report: * d357330a200b9c5ad7f719d9985d40ef2e604d51 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ad1happy2go commented on issue #8244: [SUPPORT] Is there any plan to support metadata management, index and table optimization services?
ad1happy2go commented on issue #8244: URL: https://github.com/apache/hudi/issues/8244#issuecomment-1503814032 @zyclove Do you need any other help as part of this ticket or can we close the same?
[GitHub] [hudi] ad1happy2go commented on issue #8236: [SUPPORT]Duplicate data in MOR table Hudi
ad1happy2go commented on issue #8236: URL: https://github.com/apache/hudi/issues/8236#issuecomment-1503813179 @xiagupqin Can you please let us know whether you hit this issue again, either with the fix applied or with metadata disabled?
[jira] [Updated] (HUDI-6060) Add config to backup instants before deletion during rollbacks
[ https://issues.apache.org/jira/browse/HUDI-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-6060:
    Labels: pull-request-available  (was: )

> Add config to backup instants before deletion during rollbacks
>
> Key: HUDI-6060
> URL: https://issues.apache.org/jira/browse/HUDI-6060
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Prashant Wason
> Assignee: Prashant Wason
> Priority: Minor
> Labels: pull-request-available
>
> When rollbacks / restores are performed, instants are deleted from the
> .hoodie folder. Keeping a copy of such instants is useful for debugging
> issues like the following:
> # Files left over without any commits
> # Bugs that leave files behind when a failed commit is rolled back
> The implementation provides a config (off by default) which, when enabled,
> backs up the instants to a backup directory before deletion.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] prashantwason opened a new pull request, #8430: [HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.
prashantwason opened a new pull request, #8430:
URL: https://github.com/apache/hudi/pull/8430

[HUDI-6060] Added a config to backup instants before deletion during rollbacks and restores.

### Change Logs

1. Added config to enable backing up instants
2. Added config for location of backup directory
3. Added code to back up the instants

### Impact

None. New feature is off by default.

### Risk level (write none, low medium or high below)

None

### Documentation Update

Config has the necessary docstring.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
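The behaviour described for HUDI-6060 (copy each instant to a backup directory before it is deleted, only when a flag is enabled) can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the pull request: the function name `delete_instants`, the `backup_enabled` flag, the `.instant_backup` directory name, and the sample file name `001.commit` are all assumptions made up for this example, and the actual Hudi config keys are not shown in this thread.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def delete_instants(
    timeline_dir: Path,
    instant_files: Iterable[str],
    backup_enabled: bool = False,  # hypothetical flag; off by default, as the PR describes
    backup_dir_name: str = ".instant_backup",  # hypothetical backup directory name
) -> List[Path]:
    """Delete instant files, optionally copying each one to a backup directory first.

    Returns the paths of the backup copies that were written.
    """
    backups: List[Path] = []
    backup_dir = timeline_dir / backup_dir_name
    for name in instant_files:
        src = timeline_dir / name
        if not src.exists():
            continue  # nothing to back up or delete for this name
        if backup_enabled:
            backup_dir.mkdir(parents=True, exist_ok=True)
            dst = backup_dir / name
            # Copy before deleting, so a failed copy never loses the instant.
            shutil.copy2(src, dst)
            backups.append(dst)
        src.unlink()
    return backups


# Small demonstration on a throwaway directory standing in for the .hoodie folder.
with tempfile.TemporaryDirectory() as tmp:
    timeline = Path(tmp) / ".hoodie"
    timeline.mkdir()
    (timeline / "001.commit").write_text("{}")
    saved = delete_instants(timeline, ["001.commit"], backup_enabled=True)
    print([p.name for p in saved], (timeline / "001.commit").exists())
```

Copying before unlinking is the design point here: if the copy raises, the original instant is still in place, which matches the debugging motivation given in the ticket.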
[GitHub] [hudi] hudi-bot commented on pull request #8355: [HUDI-6016] HoodieCLIUtils supports creating HoodieClient with non-default database
hudi-bot commented on PR #8355: URL: https://github.com/apache/hudi/pull/8355#issuecomment-1503798063 ## CI report: * 61a2efa806a9004721966f885f84d95f1b882dbd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16258)
[jira] [Created] (HUDI-6060) Add config to backup instants before deletion during rollbacks
Prashant Wason created HUDI-6060:

Summary: Add config to backup instants before deletion during rollbacks
Key: HUDI-6060
URL: https://issues.apache.org/jira/browse/HUDI-6060
Project: Apache Hudi
Issue Type: Improvement
Reporter: Prashant Wason
Assignee: Prashant Wason

When rollbacks / restores are performed, instants are deleted from the .hoodie folder. Keeping a copy of such instants is useful for debugging issues like the following:
# Files left over without any commits
# Bugs that leave files behind when a failed commit is rolled back
The implementation provides a config (off by default) which, when enabled, backs up the instants to a backup directory before deletion.
[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11
hudi-bot commented on PR #8429: URL: https://github.com/apache/hudi/pull/8429#issuecomment-1503736037 ## CI report: * 18f438577f444c75e8060a20b7fdf59e40e9ab7e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16262)
[GitHub] [hudi] hudi-bot commented on pull request #8429: [HUDI-5975] Release 0.12.3 rc2 prep apr11
hudi-bot commented on PR #8429: URL: https://github.com/apache/hudi/pull/8429#issuecomment-1503717961 ## CI report: * 18f438577f444c75e8060a20b7fdf59e40e9ab7e UNKNOWN
[GitHub] [hudi] Madan16 commented on issue #8428: [SUPPORT]: When trying to UPSERT, Getting issues like : An error occurred while calling o168.save. org/apache/spark/sql/avro/SchemaConverters$ AND
Madan16 commented on issue #8428: URL: https://github.com/apache/hudi/issues/8428#issuecomment-1503690930 > @Madan16 This looks like an Avro library mismatch issue. Since you say this has been running fine for 2 months, do you know whether any AWS library or other dependency was updated recently? @ad1happy2go: Sorry, but I could not understand your question. Can you please be more specific so that I can provide more details? Note: I am using this (PySpark) code in AWS Glue.