Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2088019252 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN * da5bbcce94223f796d6e40c2a20daeff43794993 Azure:

Re: [PR] [HUDI-4372] Enable matadata table by default for flink [hudi]

2024-04-30 Thread via GitHub
danny0405 commented on code in PR #11124: URL: https://github.com/apache/hudi/pull/11124#discussion_r1585900571 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -554,8 +554,7 @@ protected void postCommit(HoodieTable table,

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087986187 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN * 86f618e91d63ed5da3b16dbe5e71c00e5546e8cb Azure:

Re: [PR] [HUDI-4372] Enable matadata table by default for flink [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11124: URL: https://github.com/apache/hudi/pull/11124#issuecomment-2087986151 ## CI report: * 33909835f589e444771c8c9c6e5bdec15785e397 UNKNOWN * 13d4b2235ffd4671b6573996b0f7ac3052226ad0 Azure:

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2087985985 ## CI report: * 22f01c9e071a9f92747f4af966c9f63056c7216d UNKNOWN * de51f5efb052c32725b5eeb97773133d8c98498f Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087981370 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN * 86f618e91d63ed5da3b16dbe5e71c00e5546e8cb Azure:

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2087981153 ## CI report: * 22f01c9e071a9f92747f4af966c9f63056c7216d UNKNOWN * de51f5efb052c32725b5eeb97773133d8c98498f Azure:

Re: [I] [SUPPORT] The Hive run_sync_tool's Logged Command & The Actual Command Do Not Match [hudi]

2024-04-30 Thread via GitHub
ad1happy2go commented on issue #11029: URL: https://github.com/apache/hudi/issues/11029#issuecomment-2087977490 @samserpoosh Were you able to work on this PR? Do let us know. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] Recovering job from checkpoint, reporting NoSuchElementException and data exception [hudi]

2024-04-30 Thread via GitHub
ad1happy2go commented on issue #11023: URL: https://github.com/apache/hudi/issues/11023#issuecomment-2087976183 @jack1234smith Were you able to figure out the issue here? Please let us know in case you still need help. -- This is an automated message from the Apache Git Service. To

Re: [I] [SUPPORT]Data Loss Issue with Hudi Table After 3 Days of Continuous Writes [hudi]

2024-04-30 Thread via GitHub
ad1happy2go commented on issue #11016: URL: https://github.com/apache/hudi/issues/11016#issuecomment-2087975711 @juice411 Do you need any other help on this? Please let us know if you are good. Thanks. -- This is an automated message from the Apache Git Service. To respond to the

Re: [I] [SUPPORT] Spark job relying over Hudi are blocked after one or zero commit [hudi]

2024-04-30 Thread via GitHub
ad1happy2go commented on issue #11011: URL: https://github.com/apache/hudi/issues/11011#issuecomment-2087975030 @pontisa95 Were you able to get it resolved? If yes, please let us know the issue and resolution, or let us know in case you still need help here. -- This is an automated

Re: [I] [SUPPORT] can't retrieve original partition column value when exacting date with CustomKeyGenerator [hudi]

2024-04-30 Thread via GitHub
ad1happy2go commented on issue #11002: URL: https://github.com/apache/hudi/issues/11002#issuecomment-2087971292 @liangchen-datanerd That's a good suggestion. Created a tracking JIRA too - https://issues.apache.org/jira/browse/HUDI-7698 We can think of introducing the reader side

[jira] [Created] (HUDI-7698) Introduce config to Return the original partition value from parquet when using CustomKeyGenerator

2024-04-30 Thread Aditya Goenka (Jira)
Aditya Goenka created HUDI-7698: --- Summary: Introduce config to Return the original partition value from parquet when using CustomKeyGenerator Key: HUDI-7698 URL: https://issues.apache.org/jira/browse/HUDI-7698

[jira] [Created] (HUDI-7697) Add branch protection in GitHub

2024-04-30 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7697: --- Summary: Add branch protection in GitHub Key: HUDI-7697 URL: https://issues.apache.org/jira/browse/HUDI-7697 Project: Apache Hudi Issue Type: Improvement

[jira] [Updated] (HUDI-7697) Add branch protection in GitHub

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7697: Description: Only allow PR merging when all CI pass. > Add branch protection in GitHub >

[jira] [Assigned] (HUDI-7473) Rebalance CI

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7473: --- Assignee: Ethan Guo > Rebalance CI > > > Key: HUDI-7473 >

[jira] [Updated] (HUDI-7473) Rebalance CI

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7473: Epic Link: HUDI-4302 > Rebalance CI > > > Key: HUDI-7473 >

[jira] [Closed] (HUDI-7473) Rebalance CI

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7473. --- Resolution: Fixed > Rebalance CI > > > Key: HUDI-7473 > URL:

[jira] [Updated] (HUDI-7473) Rebalance CI

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7473: Fix Version/s: 0.15.0 1.0.0 > Rebalance CI > > > Key:

Re: [PR] [HUDI-4372] Enable matadata table by default for flink [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11124: URL: https://github.com/apache/hudi/pull/11124#issuecomment-2087946878 ## CI report: * 33909835f589e444771c8c9c6e5bdec15785e397 UNKNOWN * d18ce474faa16547a8969cd56f67dfed5b80891a Azure:

Re: [PR] [HUDI-4372] Enable matadata table by default for flink [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11124: URL: https://github.com/apache/hudi/pull/11124#issuecomment-2087942409 ## CI report: * 33909835f589e444771c8c9c6e5bdec15785e397 UNKNOWN * d18ce474faa16547a8969cd56f67dfed5b80891a Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087938312 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * b1598f5861c2b90da91ad33dc360533728ef7163 UNKNOWN * 86f618e91d63ed5da3b16dbe5e71c00e5546e8cb Azure:

[jira] [Updated] (HUDI-6712) Implement optimized keyed lookup on parquet files

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-6712: - Status: Open (was: Patch Available) > Implement optimized keyed lookup on parquet files >

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087911762 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * 07c2396a64d505633ac103cf2bcd4c6dc992fb81 Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087906539 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * 07c2396a64d505633ac103cf2bcd4c6dc992fb81 Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087901993 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * 07c2396a64d505633ac103cf2bcd4c6dc992fb81 Azure:

(hudi) branch master updated (a29fe277df8 -> f553ba25fe3)

2024-04-30 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from a29fe277df8 [HUDI-7694] Unify bijection-avro dependency version (#11132) add f553ba25fe3 [HUDI-7144] Build

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
codope merged PR #10352: URL: https://github.com/apache/hudi/pull/10352 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Updated] (HUDI-6700) Archiving should be time based, not this min-max and not per instant. Lets treat it like a log (Phase 2)

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-6700: - Status: Open (was: In Progress) > Archiving should be time based, not this min-max and not per

(hudi) branch asf-site updated: [DOCS] Add Daft read example (#11133)

2024-04-30 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 6c29efa83d9 [DOCS] Add Daft read example

Re: [PR] [DOCS] Add Daft read example [hudi]

2024-04-30 Thread via GitHub
xushiyan merged PR #11133: URL: https://github.com/apache/hudi/pull/11133 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[PR] [DOCS] Add Daft read example [hudi]

2024-04-30 Thread via GitHub
xushiyan opened a new pull request, #11133: URL: https://github.com/apache/hudi/pull/11133 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087861615 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * 07c2396a64d505633ac103cf2bcd4c6dc992fb81 Azure:

[jira] [Updated] (HUDI-4372) Enable matadata table by default for flink

2024-04-30 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-4372: - Sprint: Sprint 2023-04-26 > Enable matadata table by default for flink >

[jira] [Updated] (HUDI-4372) Enable matadata table by default for flink

2024-04-30 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-4372: - Status: In Progress (was: Reopened) > Enable matadata table by default for flink >

(hudi) branch master updated (f99b181a04e -> a29fe277df8)

2024-04-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from f99b181a04e [HUDI-7588] Replace hadoop Configuration with StorageConfiguration in meta client (#11071) add

Re: [PR] [HUDI-7694] Unify bijection-avro dependency version [hudi]

2024-04-30 Thread via GitHub
yihua merged PR #11132: URL: https://github.com/apache/hudi/pull/11132 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1585778411 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPartitionStatsIndexWithSql.scala: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1585777410 ## hudi-common/src/main/java/org/apache/hudi/common/util/BaseFileUtils.java: ## @@ -67,6 +70,61 @@ public static BaseFileUtils getInstance(HoodieFileFormat fileFormat)

[jira] [Created] (HUDI-7696) Consolidate convertFilesToPartitionStatsRecords and convertMetadataToPartitionStatsRecords

2024-04-30 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7696: - Summary: Consolidate convertFilesToPartitionStatsRecords and convertMetadataToPartitionStatsRecords Key: HUDI-7696 URL: https://issues.apache.org/jira/browse/HUDI-7696

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1585774319 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1872,4 +1883,175 @@ public HoodieRecord next() { } }; } + +

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
codope commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1585773581 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1872,4 +1883,175 @@ public HoodieRecord next() { } }; } + +

Re: [PR] [HUDI-7694] Unify bijection-avro dependency version [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11132: URL: https://github.com/apache/hudi/pull/11132#issuecomment-2087801564 ## CI report: * f5f72a318977302fc3828831c150f41690e2504c Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087795216 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * 695095976531b603d8d5712a8acc163eb1824f9b Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087789246 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * 695095976531b603d8d5712a8acc163eb1824f9b Azure:

Re: [PR] [HUDI-7694] Unify bijection-avro dependency version [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11132: URL: https://github.com/apache/hudi/pull/11132#issuecomment-2087783819 ## CI report: * f5f72a318977302fc3828831c150f41690e2504c Azure:

[jira] [Updated] (HUDI-7694) Unify bijection-avro dependency version

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7694: Status: Patch Available (was: In Progress) > Unify bijection-avro dependency version >

[jira] [Updated] (HUDI-7695) Add docs on Spark 3.5 and Scala 2.13

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7695: Status: In Progress (was: Open) > Add docs on Spark 3.5 and Scala 2.13 >

[jira] [Updated] (HUDI-7695) Add docs on Spark 3.5 and Scala 2.13

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7695: Fix Version/s: 0.15.0 1.0.0 > Add docs on Spark 3.5 and Scala 2.13 >

[jira] [Created] (HUDI-7695) Add docs on Spark 3.5 and Scala 2.13

2024-04-30 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7695: --- Summary: Add docs on Spark 3.5 and Scala 2.13 Key: HUDI-7695 URL: https://issues.apache.org/jira/browse/HUDI-7695 Project: Apache Hudi Issue Type: Improvement

[jira] [Assigned] (HUDI-7695) Add docs on Spark 3.5 and Scala 2.13

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7695: --- Assignee: Ethan Guo > Add docs on Spark 3.5 and Scala 2.13 > >

Re: [PR] [HUDI-7694] Unify bijection-avro dependency version [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11132: URL: https://github.com/apache/hudi/pull/11132#issuecomment-2087744917 ## CI report: * f5f72a318977302fc3828831c150f41690e2504c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7587] Make bundle dependencies for storage abstraction in correct order [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11131: URL: https://github.com/apache/hudi/pull/11131#issuecomment-2087738011 ## CI report: * 70e6f707c00ef7c84047c445a5c3be8b8aae2c75 Azure:

[jira] [Updated] (HUDI-7694) Unify bijection-avro dependency version

2024-04-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7694: - Labels: pull-request-available (was: ) > Unify bijection-avro dependency version >

[PR] [HUDI-7694] Unify bijection-avro dependency version [hudi]

2024-04-30 Thread via GitHub
yihua opened a new pull request, #11132: URL: https://github.com/apache/hudi/pull/11132 ### Change Logs This PR unifies `bijection-avro` dependency version in the repo and upgrades the dependency version in `hudi-integ-test-bundle` (there is no reason to use a different version).

[jira] [Assigned] (HUDI-7694) Unify bijection-avro dependency version

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7694: --- Assignee: Ethan Guo > Unify bijection-avro dependency version >

[jira] [Updated] (HUDI-7694) Unify bijection-avro dependency version

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7694: Fix Version/s: 0.15.0 1.0.0 > Unify bijection-avro dependency version >

[jira] [Updated] (HUDI-7694) Unify bijection-avro dependency version

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7694: Status: In Progress (was: Open) > Unify bijection-avro dependency version >

[jira] [Created] (HUDI-7694) Unify bijection-avro dependency version

2024-04-30 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7694: --- Summary: Unify bijection-avro dependency version Key: HUDI-7694 URL: https://issues.apache.org/jira/browse/HUDI-7694 Project: Apache Hudi Issue Type: Improvement

[jira] [Updated] (HUDI-7694) Unify bijection-avro dependency version

2024-04-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7694: Story Points: 0.5 > Unify bijection-avro dependency version > --- > >

Re: [PR] [HUDI-7587] Make bundle dependencies for storage abstraction in correct order [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11131: URL: https://github.com/apache/hudi/pull/11131#issuecomment-2087732209 ## CI report: * 70e6f707c00ef7c84047c445a5c3be8b8aae2c75 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

[jira] [Updated] (HUDI-7587) Move hadoop-dependent reader and writer implementation to hudi-hadoop-common module

2024-04-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7587: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Move hadoop-dependent

[PR] [HUDI-7587] Make bundle dependencies for storage abstraction in correct order [hudi]

2024-04-30 Thread via GitHub
jonvex opened a new pull request, #11131: URL: https://github.com/apache/hudi/pull/11131 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087675008 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * 695095976531b603d8d5712a8acc163eb1824f9b Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087657586 ## CI report: * edf2bf30a2ddbd48db9452f34b1ac716bd2ebe18 UNKNOWN * a8997fbab4049f052cbd1fe216a8cb5fe375c5d1 Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087518121 ## CI report: * 2ea5169f6e25c154748401c49ffd7d3177c50660 Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087489196 ## CI report: * 2ea5169f6e25c154748401c49ffd7d3177c50660 Azure:

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842466#comment-17842466 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 9:26 PM: --- h2. [WIP]

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842466#comment-17842466 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 9:01 PM: --- h2. [WIP]

[jira] [Updated] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1045: - Description: h4. We need to allow a writer w writing to file groups f1, f2, f3, concurrently

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 8:27 PM: --- h2.  [WIP]

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 8:27 PM: --- h2.  [WIP]

Re: [I] [SUPPORT] java.lang.OutOfMemoryError: Requested array size exceeds VM limit on data ingestion to COW table [hudi]

2024-04-30 Thread via GitHub
TarunMootala commented on issue #11122: URL: https://github.com/apache/hudi/issues/11122#issuecomment-2087065842 The `.hoodie/` folder is 350 MB and has 3435 files (this includes active and archival timelines). `.hoodie/archived/` is 327 MB and has 695 files (only archival timelines).

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2087010994 ## CI report: * 2ea5169f6e25c154748401c49ffd7d3177c50660 Azure:

Re: [PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11130: URL: https://github.com/apache/hudi/pull/11130#issuecomment-2086970922 ## CI report: * 2ea5169f6e25c154748401c49ffd7d3177c50660 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

[PR] [HUDI-6296] Add Scala 2.13 support for Spark 3.5 integration [hudi]

2024-04-30 Thread via GitHub
yihua opened a new pull request, #11130: URL: https://github.com/apache/hudi/pull/11130 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[jira] [Updated] (HUDI-6296) Add Scala 2.13 build profile to support scala 2.13

2024-04-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6296: - Labels: pull-request-available (was: ) > Add Scala 2.13 build profile to support scala 2.13 >

Re: [PR] [HUDI-7146] Implement secondary index [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11129: URL: https://github.com/apache/hudi/pull/11129#issuecomment-2086577705 ## CI report: * 0274004b842a332f57c1104de44e4e262ff2942d Azure:

Re: [I] [SUPPORT] Hudi MOR high latency on data availability [hudi]

2024-04-30 Thread via GitHub
sgcisco commented on issue #8: URL: https://github.com/apache/hudi/issues/8#issuecomment-2086534021 @ad1happy2go thanks for your reply. We tried `compact num.delta commits as 1` in one of the tests; for the other runs, and in what we are trying to use now, it is the default value, which is 5.

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 6:32 PM: --- h2.  [WIP]

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 6:32 PM: --- h2.  [WIP]

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 6:20 PM: --- h2.  [WIP]

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
yihua commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1585289704 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPartitionStatsIndexWithSql.scala: ## @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2086335812 ## CI report: * c98242b22fb2518c0cc93c037df558037030500f UNKNOWN * 9d1ac2a1bd9f2343174a0273437e7a240294eee4 Azure:

Re: [PR] [HUDI-7146] Implement secondary index [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11129: URL: https://github.com/apache/hudi/pull/11129#issuecomment-2086299511 ## CI report: * 0274004b842a332f57c1104de44e4e262ff2942d Azure:

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 5:56 PM: --- h2.  [WIP]

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842466#comment-17842466 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 5:56 PM: --- h2. [WIP]

[jira] [Commented] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842466#comment-17842466 ] Vinoth Chandar commented on HUDI-1045: -- [WIP] Approach 2 : Introduce pointer data blocks into storage

[jira] [Commented] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842465#comment-17842465 ] Vinoth Chandar commented on HUDI-1045: -- h3.  [WIP] Approach 1 :  Redistribute records from the

[jira] [Comment Edited] (HUDI-1045) Support updates during clustering

2024-04-30 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841372#comment-17841372 ] Vinoth Chandar edited comment on HUDI-1045 at 4/30/24 5:54 PM: --- At first it

Re: [PR] [HUDI-7146] Implement secondary index [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #11129: URL: https://github.com/apache/hudi/pull/11129#issuecomment-2086259091 ## CI report: * 0274004b842a332f57c1104de44e4e262ff2942d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2086257066 ## CI report: * 879e07c167692250636215e06e67b6c370496c03 Azure:

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
yihua commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1585254463 ## hudi-common/src/main/java/org/apache/hudi/common/util/BaseFileUtils.java: ## @@ -67,6 +70,61 @@ public static BaseFileUtils getInstance(HoodieFileFormat fileFormat) {

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
yihua commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1585250523 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1872,4 +1883,175 @@ public HoodieRecord next() { } }; } + +

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
yihua commented on code in PR #10352: URL: https://github.com/apache/hudi/pull/10352#discussion_r1585240628 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -1872,4 +1883,175 @@ public HoodieRecord next() { } }; } + +

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2086028226 ## CI report: * 879e07c167692250636215e06e67b6c370496c03 Azure:

[PR] [HUDI-7146] Implement secondary index [hudi]

2024-04-30 Thread via GitHub
codope opened a new pull request, #11129: URL: https://github.com/apache/hudi/pull/11129 ### Change Logs This PR is stacked on #11077. Main changes done here: - New index type added in `MetadataPartitionType` - Initialization of the new index in

Re: [PR] [HUDI-7567] Add schema evolution to the filegroup reader [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #10957: URL: https://github.com/apache/hudi/pull/10957#issuecomment-2086001550 ## CI report: * 879e07c167692250636215e06e67b6c370496c03 Azure:

[jira] [Created] (HUDI-7693) Allow Vectorized Reading for bootstrap in the new fg reader under some conditions

2024-04-30 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7693: - Summary: Allow Vectorized Reading for bootstrap in the new fg reader under some conditions Key: HUDI-7693 URL: https://issues.apache.org/jira/browse/HUDI-7693

Re: [PR] [HUDI-7144] Build storage partition stats index and use it for data skipping [hudi]

2024-04-30 Thread via GitHub
hudi-bot commented on PR #10352: URL: https://github.com/apache/hudi/pull/10352#issuecomment-2085972850 ## CI report: * f63dbe172cf8dec2603c266396fb7d31d5cb7f60 Azure:
