[GitHub] [hudi] SteNicholas closed pull request #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig
SteNicholas closed pull request #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig
URL: https://github.com/apache/hudi/pull/7928

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope opened a new pull request, #7929: [DOCS] [WIP] Add new sources to deltastreamer docs
codope opened a new pull request, #7929:
URL: https://github.com/apache/hudi/pull/7929

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low, medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change._

- _The config description must be updated if new configs are added or the default value of the configs is changed._
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig
hudi-bot commented on PR #7928:
URL: https://github.com/apache/hudi/pull/7928#issuecomment-1427480348

## CI report:

* 82b52107672f324918988ef7b9b914fe992202df UNKNOWN

Bot commands — @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
[ https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5772:
---------------------------------
    Labels: pull-request-available  (was: )

> Align Flink clustering configuration with HoodieClusteringConfig
> ----------------------------------------------------------------
>
>                 Key: HUDI-5772
>                 URL: https://issues.apache.org/jira/browse/HUDI-5772
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 0.13.1
>            Reporter: Nicholas Jiang
>            Assignee: Nicholas Jiang
>            Priority: Major
>              Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, the
> 'clustering.plan.strategy.cluster.begin.partition',
> 'clustering.plan.strategy.cluster.end.partition',
> 'clustering.plan.strategy.partition.regex.pattern' and
> 'clustering.plan.strategy.partition.selected' options do not align with the
> clustering configuration of HoodieClusteringConfig. FlinkOptions,
> FlinkClusteringConfig and FlinkStreamerConfig should align the Flink
> clustering configuration with HoodieClusteringConfig.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] SteNicholas opened a new pull request, #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig
SteNicholas opened a new pull request, #7928:
URL: https://github.com/apache/hudi/pull/7928

### Change Logs

In `FlinkOptions`, `FlinkClusteringConfig` and `FlinkStreamerConfig`, the `clustering.plan.strategy.cluster.begin.partition`, `clustering.plan.strategy.cluster.end.partition`, `clustering.plan.strategy.partition.regex.pattern` and `clustering.plan.strategy.partition.selected` options do not align with the clustering configuration of `HoodieClusteringConfig`. `FlinkOptions`, `FlinkClusteringConfig` and `FlinkStreamerConfig` should align the Flink clustering configuration with `HoodieClusteringConfig`.

### Impact

Aligns the Flink clustering configuration with `HoodieClusteringConfig` in `FlinkOptions`, `FlinkClusteringConfig` and `FlinkStreamerConfig`.

### Risk level (write none, low, medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change._

- _The config description must be updated if new configs are added or the default value of the configs is changed._
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [x] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915:
URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427469490

## CI report:

* 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN
* 52ff32a1bb04340505e309191c398d95a9c8f928 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15127)
[GitHub] [hudi] hudi-bot commented on pull request #6121: [HUDI-4406] Support Flink compaction/clustering write error resolvement to avoid data loss
hudi-bot commented on PR #6121:
URL: https://github.com/apache/hudi/pull/6121#issuecomment-1427466320

## CI report:

* 52b6f55e196007f993b0506d899c48bb80b36546 UNKNOWN
* 5dc463fcade7c5a495cca1437fca8230b01d0229 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15126)
[jira] [Created] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
Nicholas Jiang created HUDI-5772:
------------------------------------

             Summary: Align Flink clustering configuration with HoodieClusteringConfig
                 Key: HUDI-5772
                 URL: https://issues.apache.org/jira/browse/HUDI-5772
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink
    Affects Versions: 0.13.1
            Reporter: Nicholas Jiang
            Assignee: Nicholas Jiang

In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, the 'clustering.plan.strategy.cluster.begin.partition', 'clustering.plan.strategy.cluster.end.partition', 'clustering.plan.strategy.partition.regex.pattern' and 'clustering.plan.strategy.partition.selected' options do not align with the clustering configuration of HoodieClusteringConfig. FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig should align the Flink clustering configuration with HoodieClusteringConfig.
svn commit: r60068 - in /dev/hudi/hudi-0.13.0-rc3: ./ hudi-0.13.0-rc3.src.tgz hudi-0.13.0-rc3.src.tgz.asc hudi-0.13.0-rc3.src.tgz.sha512
Author: yihua
Date: Mon Feb 13 06:45:46 2023
New Revision: 60068

Log:
Add Apache Hudi 0.13.0 RC3 source release

Added:
    dev/hudi/hudi-0.13.0-rc3/
    dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz   (with props)
    dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.asc
    dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.sha512

Added: dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz
==============================================================================
Binary file - no diff available.

Propchange: dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.asc
==============================================================================
--- dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.asc (added)
+++ dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.asc Mon Feb 13 06:45:46 2023
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEEiIqTQeYA64VQqs1e+xt1BPf3cMkFAmPp1+cACgkQ+xt1BPf3
+cMn8dRAAm3le+qkP49Qnwi/t5qDWvgfUALXRH9KlUU9Efo4ChCHnuTBgmNmcjvJ/
+af2FBuxeMfg5GRbgm0bkHhYpx58CcjWPdi8zGLiL+ih5fBwqvbLZGVM/jpHtrmur
+dAoZX5Sq5MLtf8vigzAT9GfHD36g43dtWWBoYCGzfUBGi2ZETNnEAkbGF5M3lkxh
+1R9ysXk9u79Cm1UkC4HDDozDdj+U51XegyGYf+2QrGqCVeIZ69JrfF6vlIsr0Jl4
+Wj6T4ZURANjBhpA2n87r2DZhjCLobMgnQZiB1Va52U4Z6Ocu2s6Nc47nI+piLenF
+JFWj5YyFR+AzWqzTPRvj8U1CguD3bHkZfFS3ioOllkvtRh+BCGO8HXkgnmzVbv67
+RedUHBfTVdp/4PKWlg2dptLpSNzRwDFYjcyYP3yeMIQ7BfpOHPJ/Vdp/udM2+lRt
+h9+tAagSeU1nxVNxj7fgzQBVtcpsmHA0uRz1YzCco8jmSWNG7evtGU9vwYShIf0m
+LurVV3SexbK9iLhS2H2pNiuhAxvpEc3BqmaBA8KghdmjmrZmq13VSWuZiSDj8qtM
+v3S/F3J8ifVbIgbF5oXLiuZ++untmVrqnKDghYMPIy3/5GQ4XSG2ueSNG7Hz0PYV
+veoPUUcPs6aJeP2EqYYen9amSkn3fwC5bWMVBneosusdLpZLr0g=
+=n5Le
+-----END PGP SIGNATURE-----

Added: dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.sha512
==============================================================================
--- dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.sha512 (added)
+++ dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.sha512 Mon Feb 13 06:45:46 2023
@@ -0,0 +1 @@
+c725ce843c5800483b69098cda7a0f2380b0f8e502441335d4c613cf023202c93b6cdb4922983c029945f7af1dd1e2fa8bde58fcade87921fdff9f84f57d5559 hudi-0.13.0-rc3.src.tgz
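The release staging area above publishes a `.sha512` checksum (and a detached `.asc` PGP signature) next to the source tarball. As an illustration of the checksum half of release verification, here is a hedged Python sketch — the function names are my own, and real verification would additionally run `gpg --verify` against the `.asc` file:

```python
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-512 and return its hex digest."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_checksum_line(tarball_path, sha512_line):
    """Compare a local file against a '<digest>  <name>' line,
    the format used by the published *.sha512 files."""
    expected = sha512_line.split()[0].lower()
    return sha512_of(tarball_path) == expected
```

Against the real artifact this would be called with `hudi-0.13.0-rc3.src.tgz` and the single line of `hudi-0.13.0-rc3.src.tgz.sha512` shown above.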
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427424545

## CI report:

* 50480623485bb99353655f4c6df23a2462214f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15123)
* d8560fd11027818c5f2a218deeae3b68a6fa6420 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15130)
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918:
URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427419585

## CI report:

* dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15124)
* 0f35441097e274abe020127c5bd2a5f3d46e0b99 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15129)
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427419115

## CI report:

* 50480623485bb99353655f4c6df23a2462214f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15123)
* d8560fd11027818c5f2a218deeae3b68a6fa6420 UNKNOWN
[hudi] annotated tag release-0.13.0-rc3 updated (fe664886029 -> 91c28298a13)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to annotated tag release-0.13.0-rc3
in repository https://gitbox.apache.org/repos/asf/hudi.git

*** WARNING: tag release-0.13.0-rc3 was modified! ***

    from fe664886029 (commit)
      to 91c28298a13 (tag)
 tagging fe664886029657eb2c2c303be18aaf1c598a7181 (commit)
 replaces release-0.13.0-rc2
      by Y Ethan Guo
      on Sun Feb 12 22:24:16 2023 -0800

- Log -----------------------------------------------------------------
0.13.0
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEiIqTQeYA64VQqs1e+xt1BPf3cMkFAmPp15AACgkQ+xt1BPf3
cMllnA/+MQyKJAb9An3mmdor5jOQ9ObhkvMZVUASCHC00HkpWhRNXtKt48hXgZJ4
gzuWPI0/B5uze5JD1M9+gHXHhcvPrj2FctTMcHbFkwr1ZlMj2ulrDj1zyLR9wSqG
+8VU6w92GyURtHO9lmzCvplY1NeHr7SUOy9mIT3tsAVB9JLwLh2R0Rtd6iD3zhnq
8sDcvz0A7QJfPRzbKI3h9368FbtQM9z27+xEwaeGfqRyMDMfJ4VVUQCV79a9jVUi
5G77JsArrasjsqbAzmFkzAYC671hNNe615TA8WHExb6nzJFuMbzihajo4U2gz4K/
L2777N/DTKRLDSLcQzOinNe5kZXdAOgnDQBNlNZ/J6dvfNFU56gU9FNn3QaO9N5c
OVXT0C4yOvbh12iqnwo8wTOz4qMwauyATPqqo28liglIpNrXN0VuKFFl3ZizS7Sf
ykdD2XEDiUhzL1Rsclr9LI9pK0JUpem3SOT0mCBjOA9MUv+sp8E99XpnmnGmrUUQ
4YHpOlXYjPQFuMP0qIeKL8ThsUlbAERw9ccmVGR1ik0IHss5Cejn1r04IIIeHOhn
oWTlk6raua2J+1d6T+ZFWAZUgEk+QXeIX6LsnC/vgcKvCQ5lax3GrZA775WOciUs
9UiUt1gTmm6+bstKPfSrngnovX94X0Zs86gsUDE2RET4EyVcBY8=
=8GtY
-----END PGP SIGNATURE-----
-----------------------------------------------------------------------

No new revisions were added by this update.

Summary of changes:
[hudi] 05/05: Bumping release candidate number 3
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit fe664886029657eb2c2c303be18aaf1c598a7181 Author: Y Ethan Guo AuthorDate: Sun Feb 12 22:22:38 2023 -0800 Bumping release candidate number 3 --- docker/hoodie/hadoop/base/pom.xml| 2 +- docker/hoodie/hadoop/base_java11/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml| 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml| 2 +- docker/hoodie/hadoop/pom.xml | 2 +- docker/hoodie/hadoop/prestobase/pom.xml | 2 +- docker/hoodie/hadoop/spark_base/pom.xml | 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +- docker/hoodie/hadoop/sparkworker/pom.xml | 2 +- docker/hoodie/hadoop/trinobase/pom.xml | 2 +- docker/hoodie/hadoop/trinocoordinator/pom.xml| 2 +- docker/hoodie/hadoop/trinoworker/pom.xml | 2 +- hudi-aws/pom.xml | 4 ++-- hudi-cli/pom.xml | 2 +- hudi-client/hudi-client-common/pom.xml | 4 ++-- hudi-client/hudi-flink-client/pom.xml| 4 ++-- hudi-client/hudi-java-client/pom.xml | 4 ++-- hudi-client/hudi-spark-client/pom.xml| 4 ++-- hudi-client/pom.xml | 2 +- hudi-common/pom.xml | 2 +- hudi-examples/hudi-examples-common/pom.xml | 2 +- hudi-examples/hudi-examples-flink/pom.xml| 2 +- hudi-examples/hudi-examples-java/pom.xml | 2 +- hudi-examples/hudi-examples-spark/pom.xml| 2 +- hudi-examples/pom.xml| 2 +- hudi-flink-datasource/hudi-flink/pom.xml | 4 ++-- hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 4 ++-- hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 ++-- hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 ++-- hudi-flink-datasource/hudi-flink1.16.x/pom.xml | 4 ++-- hudi-flink-datasource/pom.xml| 4 ++-- hudi-gcp/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml | 2 +- hudi-kafka-connect/pom.xml | 4 ++-- 
hudi-platform-service/hudi-metaserver/hudi-metaserver-client/pom.xml | 2 +- hudi-platform-service/hudi-metaserver/hudi-metaserver-server/pom.xml | 2 +- hudi-platform-service/hudi-metaserver/pom.xml| 4 ++-- hudi-platform-service/pom.xml| 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark2/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark3.1.x/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark3.2.x/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark3.2plus-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark3.3.x/pom.xml| 4 ++-- hudi-spark-datasource/pom.xml| 2 +- hudi-sync/hudi-adb-sync/pom.xml | 2 +- hudi-sync/hudi-datahub-sync/pom.xml | 2 +- hudi-sync/hudi-hive-sync/pom.xml | 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml| 2 +- hudi-tests-common/pom.xml| 2 +- hudi-timeline-service/pom.xml| 2 +- hudi-utilities/pom.xml | 2 +-
[hudi] branch release-0.13.0 updated (820006e025a -> fe664886029)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git from 820006e025a [HUDI-5718] Unsupported Operation Exception for compaction (#7874) new 847e7a975bf [HUDI-5758] Restoring state of `HoodieKey` to make sure it's binary compatible w/ its state in 0.12 (#7917) new 7ccf6e67827 [HUDI-5768] Fix Spark Datasource read of metadata table (#7924) new d4106f35b4a [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921) new 4254fc9f482 [HUDI-5771] Improve deploy script of release artifacts (#7927) new fe664886029 Bumping release candidate number 3 The 5 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/base_java11/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml | 2 +- docker/hoodie/hadoop/prestobase/pom.xml| 2 +- docker/hoodie/hadoop/spark_base/pom.xml| 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml| 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +- docker/hoodie/hadoop/sparkworker/pom.xml | 2 +- docker/hoodie/hadoop/trinobase/pom.xml | 2 +- docker/hoodie/hadoop/trinocoordinator/pom.xml | 2 +- docker/hoodie/hadoop/trinoworker/pom.xml | 2 +- hudi-aws/pom.xml | 4 +- hudi-cli/pom.xml | 2 +- hudi-client/hudi-client-common/pom.xml | 4 +- .../hudi/client/BaseHoodieTableServiceClient.java | 48 + .../apache/hudi/client/BaseHoodieWriteClient.java | 13 +++ .../metadata/HoodieBackedTableMetadataWriter.java | 34 --- .../java/org/apache/hudi/table/HoodieTable.java| 38 +++- 
.../table/action/index/RunIndexActionExecutor.java | 5 +- hudi-client/hudi-flink-client/pom.xml | 4 +- .../FlinkHoodieBackedTableMetadataWriter.java | 21 +++- .../org/apache/hudi/table/HoodieFlinkTable.java| 12 ++- hudi-client/hudi-java-client/pom.xml | 4 +- hudi-client/hudi-spark-client/pom.xml | 4 +- .../SparkHoodieBackedTableMetadataWriter.java | 20 +++- .../org/apache/hudi/table/HoodieSparkTable.java| 10 +- .../apache/spark/HoodieSparkKryoRegistrar.scala| 25 - hudi-client/pom.xml| 2 +- hudi-common/pom.xml| 2 +- .../org/apache/hudi/common/model/DeleteRecord.java | 9 ++ .../org/apache/hudi/common/model/HoodieKey.java| 28 ++ .../common/table/log/block/HoodieDeleteBlock.java | 2 + .../hudi/metadata/HoodieBackedTableMetadata.java | 12 ++- .../hudi/metadata/HoodieTableMetadataUtil.java | 20 hudi-examples/hudi-examples-common/pom.xml | 2 +- hudi-examples/hudi-examples-flink/pom.xml | 2 +- hudi-examples/hudi-examples-java/pom.xml | 2 +- hudi-examples/hudi-examples-spark/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink-datasource/hudi-flink/pom.xml | 4 +- hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 4 +- hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 +- hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 +- hudi-flink-datasource/hudi-flink1.16.x/pom.xml | 4 +- hudi-flink-datasource/pom.xml | 4 +- hudi-gcp/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml| 2 +- hudi-kafka-connect/pom.xml | 4 +- .../hudi-metaserver/hudi-metaserver-client/pom.xml | 2 +- .../hudi-metaserver/hudi-metaserver-server/pom.xml | 2 +- hudi-platform-service/hudi-metaserver/pom.xml | 4 +- hudi-platform-service/pom.xml | 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml| 4 +- .../scala/org/apache/hudi/HoodieBaseRelation.scala | 5 +- hudi-spark-datasource/hudi-spark/pom.xml | 4 +- hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark2/pom.xml | 4 +- hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +- 
hudi-spark-datasource/hudi-spark3.1.x/pom.xml | 4 +- hudi-spark-datasource/hudi-spark3.2.x/pom.xml | 4 +-
[hudi] 03/05: [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit d4106f35b4aee53ea5cb1430288f397b37c81183 Author: Y Ethan Guo AuthorDate: Sun Feb 12 03:30:10 2023 -0800 [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921) Fixes two issues: - Makes the rollback of indexing delta commit lazy in the metadata table, otherwise, it would be cleaned up eagerly by other regular writes. - Uses a suffix (004) appending to the up-to-instant used by the async index to avoid collision with existing completed delta commit of the same instant time. --- .../hudi/client/BaseHoodieTableServiceClient.java | 48 + .../apache/hudi/client/BaseHoodieWriteClient.java | 13 +++ .../metadata/HoodieBackedTableMetadataWriter.java | 34 --- .../java/org/apache/hudi/table/HoodieTable.java| 38 +++- .../table/action/index/RunIndexActionExecutor.java | 5 +- .../FlinkHoodieBackedTableMetadataWriter.java | 21 +++- .../org/apache/hudi/table/HoodieFlinkTable.java| 12 ++- .../SparkHoodieBackedTableMetadataWriter.java | 20 +++- .../org/apache/hudi/table/HoodieSparkTable.java| 10 +- .../hudi/metadata/HoodieBackedTableMetadata.java | 12 ++- .../hudi/metadata/HoodieTableMetadataUtil.java | 20 .../apache/hudi/utilities/TestHoodieIndexer.java | 108 +++-- 12 files changed, 298 insertions(+), 43 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java index 390bc4b9714..301ed61bf4e 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java @@ -48,6 +48,7 @@ import org.apache.hudi.config.HoodieWriteConfig; import 
org.apache.hudi.exception.HoodieException; import org.apache.hudi.exception.HoodieIOException; import org.apache.hudi.exception.HoodieRollbackException; +import org.apache.hudi.metadata.HoodieTableMetadata; import org.apache.hudi.metadata.HoodieTableMetadataWriter; import org.apache.hudi.table.HoodieTable; import org.apache.hudi.table.action.HoodieWriteMetadata; @@ -71,6 +72,7 @@ import java.util.stream.Collectors; import java.util.stream.Stream; import static org.apache.hudi.common.util.ValidationUtils.checkArgument; +import static org.apache.hudi.metadata.HoodieTableMetadataUtil.isIndexingCommit; public abstract class BaseHoodieTableServiceClient extends BaseHoodieClient implements RunsTableService { @@ -659,8 +661,41 @@ public abstract class BaseHoodieTableServiceClient extends BaseHoodieClient i return infoMap; } + /** + * Rolls back the failed delta commits corresponding to the indexing action. + * Such delta commits are identified based on the suffix `METADATA_INDEXER_TIME_SUFFIX` ("004"). + * + * TODO(HUDI-5733): This should be cleaned up once the proper fix of rollbacks + * in the metadata table is landed. + * + * @return {@code true} if rollback happens; {@code false} otherwise. 
+ */ + protected boolean rollbackFailedIndexingCommits() { +HoodieTable table = createTable(config, hadoopConf); +List instantsToRollback = getFailedIndexingCommitsToRollback(table.getMetaClient()); +Map> pendingRollbacks = getPendingRollbackInfos(table.getMetaClient()); +instantsToRollback.forEach(entry -> pendingRollbacks.putIfAbsent(entry, Option.empty())); +rollbackFailedWrites(pendingRollbacks); +return !pendingRollbacks.isEmpty(); + } + + protected List getFailedIndexingCommitsToRollback(HoodieTableMetaClient metaClient) { +Stream inflightInstantsStream = metaClient.getCommitsTimeline() +.filter(instant -> !instant.isCompleted() +&& isIndexingCommit(instant.getTimestamp())) +.getInstantsAsStream(); +return inflightInstantsStream.filter(instant -> { + try { +return heartbeatClient.isHeartbeatExpired(instant.getTimestamp()); + } catch (IOException io) { +throw new HoodieException("Failed to check heartbeat for instant " + instant, io); + } +}).map(HoodieInstant::getTimestamp).collect(Collectors.toList()); + } + /** * Rollback all failed writes. + * * @return true if rollback was triggered. false otherwise. */ protected Boolean rollbackFailedWrites() { @@ -699,6 +734,19 @@ public abstract class BaseHoodieTableServiceClient extends BaseHoodieClient i Stream inflightInstantsStream = getInflightTimelineExcludeCompactionAndClustering(metaClient) .getReverseOrderedInstants(); if (cleaningPolicy.isEager()) { + // Metadata table uses eager cleaning policy, but
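The lazy-rollback logic in this commit hinges on two checks: an inflight delta commit is recognized as indexer-created by the `004` timestamp suffix, and it is only rolled back once the writer's heartbeat has expired. A simplified Python sketch of that selection follows — the `Instant` class and function names here are illustrative stand-ins, not Hudi's actual API:

```python
from dataclasses import dataclass
from typing import Callable, List

# Suffix the commit message describes: the async indexer appends "004" to the
# up-to instant time to avoid colliding with an existing completed delta
# commit of the same instant time.
METADATA_INDEXER_TIME_SUFFIX = "004"

@dataclass
class Instant:
    timestamp: str
    completed: bool

def is_indexing_commit(timestamp: str) -> bool:
    # Stand-in for HoodieTableMetadataUtil.isIndexingCommit
    return timestamp.endswith(METADATA_INDEXER_TIME_SUFFIX)

def failed_indexing_commits_to_rollback(
        instants: List[Instant],
        heartbeat_expired: Callable[[str], bool]) -> List[str]:
    """Select inflight indexing delta commits whose writer heartbeat has
    expired; only these are rolled back (lazily), so regular eager cleaning
    no longer removes in-progress indexing commits."""
    return [i.timestamp for i in instants
            if not i.completed
            and is_indexing_commit(i.timestamp)
            and heartbeat_expired(i.timestamp)]
```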
[hudi] 02/05: [HUDI-5768] Fix Spark Datasource read of metadata table (#7924)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 7ccf6e678278ceca592b8d95160bb0b17906928f Author: Y Ethan Guo AuthorDate: Sun Feb 12 03:25:51 2023 -0800 [HUDI-5768] Fix Spark Datasource read of metadata table (#7924) --- .../src/main/scala/org/apache/hudi/HoodieBaseRelation.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala index bf3d38b808d..8a730a8334b 100644 --- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala +++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala @@ -42,6 +42,7 @@ import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter import org.apache.hudi.internal.schema.utils.{InternalSchemaUtils, SerDeHelper} import org.apache.hudi.internal.schema.{HoodieSchemaException, InternalSchema} import org.apache.hudi.io.storage.HoodieAvroHFileReader +import org.apache.hudi.metadata.HoodieTableMetadata import org.apache.spark.execution.datasources.HoodieInMemoryFileIndex import org.apache.spark.internal.Logging import org.apache.spark.rdd.RDD @@ -59,7 +60,6 @@ import org.apache.spark.sql.{Row, SQLContext, SparkSession} import org.apache.spark.unsafe.types.UTF8String import java.net.URI -import java.util.Locale import scala.collection.JavaConverters._ import scala.util.control.NonFatal import scala.util.{Failure, Success, Try} @@ -292,7 +292,8 @@ abstract class HoodieBaseRelation(val sqlContext: SQLContext, * Determines whether relation's schema could be pruned by Spark's Optimizer */ def canPruneRelationSchema: Boolean = -(fileFormat.isInstanceOf[ParquetFileFormat] || fileFormat.isInstanceOf[OrcFileFormat]) 
&& +!HoodieTableMetadata.isMetadataTable(basePath.toString) && + (fileFormat.isInstanceOf[ParquetFileFormat] || fileFormat.isInstanceOf[OrcFileFormat]) && // NOTE: In case this relation has already been pruned there's no point in pruning it again prunedDataSchema.isEmpty && // TODO(HUDI-5421) internal schema doesn't support nested schema pruning currently
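The one-line fix above turns `canPruneRelationSchema` into a conjunction: pruning is allowed only when the relation is not the metadata table, the base file format is Parquet or ORC, and the schema has not already been pruned. Here is a hedged Python restatement of that predicate (simplified; it omits the internal-schema condition mentioned in the TODO):

```python
def can_prune_relation_schema(is_metadata_table: bool,
                              file_format: str,
                              already_pruned: bool) -> bool:
    """Mirrors the guard in HoodieBaseRelation.canPruneRelationSchema after
    HUDI-5768: reads of the metadata table must never prune the schema."""
    return (not is_metadata_table
            and file_format in ("parquet", "orc")
            and not already_pruned)
```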
[hudi] 01/05: [HUDI-5758] Restoring state of `HoodieKey` to make sure it's binary compatible w/ its state in 0.12 (#7917)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 847e7a975bfeb94956885cc252285f95afc4a843 Author: Alexey Kudinkin AuthorDate: Fri Feb 10 15:02:47 2023 -0800 [HUDI-5758] Restoring state of `HoodieKey` to make sure it's binary compatible w/ its state in 0.12 (#7917) RFC-46 modified `HoodieKey` to substantially optimize its serialized footprint (while using Kryo) by making it explicitly serializable by Kryo (inheriting form `KryoSerializable`, making it final). However, this broken its binary compatibility w/ the state as it was in 0.12.2. Unfortunately, this entailed that as this class is used in `DeleteRecord` w/in `HoodieDeleteBlock` that it also made impossible to read such blocks created by prior Hudi versions (more details in HUDI-5758). This PR restores previous state for `HoodieKey` to make sure it stays binary compatible w/ existing persisted `HoodieDeleteBlock` created by prior Hudi versions --- .../apache/spark/HoodieSparkKryoRegistrar.scala| 25 +-- .../org/apache/hudi/common/model/DeleteRecord.java | 9 +++ .../org/apache/hudi/common/model/HoodieKey.java| 28 -- .../common/table/log/block/HoodieDeleteBlock.java | 2 ++ 4 files changed, 44 insertions(+), 20 deletions(-) diff --git a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/HoodieSparkKryoRegistrar.scala b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/HoodieSparkKryoRegistrar.scala index 3894065d809..9d7fa3b784f 100644 --- a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/HoodieSparkKryoRegistrar.scala +++ b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/HoodieSparkKryoRegistrar.scala @@ -18,11 +18,12 @@ package org.apache.spark -import com.esotericsoftware.kryo.Kryo +import com.esotericsoftware.kryo.io.{Input, Output} +import com.esotericsoftware.kryo.{Kryo, Serializer} import 
com.esotericsoftware.kryo.serializers.JavaSerializer import org.apache.hudi.client.model.HoodieInternalRow import org.apache.hudi.common.config.SerializableConfiguration -import org.apache.hudi.common.model.HoodieSparkRecord +import org.apache.hudi.common.model.{HoodieKey, HoodieSparkRecord} import org.apache.hudi.common.util.HoodieCommonKryoRegistrar import org.apache.hudi.config.HoodieWriteConfig import org.apache.spark.serializer.KryoRegistrator @@ -44,12 +45,15 @@ import org.apache.spark.serializer.KryoRegistrator * */ class HoodieSparkKryoRegistrar extends HoodieCommonKryoRegistrar with KryoRegistrator { + override def registerClasses(kryo: Kryo): Unit = { /// // NOTE: DO NOT REORDER REGISTRATIONS /// super[HoodieCommonKryoRegistrar].registerClasses(kryo) +kryo.register(classOf[HoodieKey], new HoodieKeySerializer) + kryo.register(classOf[HoodieWriteConfig]) kryo.register(classOf[HoodieSparkRecord]) @@ -59,6 +63,23 @@ class HoodieSparkKryoRegistrar extends HoodieCommonKryoRegistrar with KryoRegist // we're relying on [[SerializableConfiguration]] wrapper to work it around kryo.register(classOf[SerializableConfiguration], new JavaSerializer()) } + + /** + * NOTE: This {@link Serializer} could deserialize instance of {@link HoodieKey} serialized + * by implicitly generated Kryo serializer (based on {@link com.esotericsoftware.kryo.serializers.FieldSerializer} + */ + class HoodieKeySerializer extends Serializer[HoodieKey] { +override def write(kryo: Kryo, output: Output, key: HoodieKey): Unit = { + output.writeString(key.getRecordKey) + output.writeString(key.getPartitionPath) +} + +override def read(kryo: Kryo, input: Input, klass: Class[HoodieKey]): HoodieKey = { + val recordKey = input.readString() + val partitionPath = input.readString() + new HoodieKey(recordKey, partitionPath) +} + } } object HoodieSparkKryoRegistrar { diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/DeleteRecord.java 
b/hudi-common/src/main/java/org/apache/hudi/common/model/DeleteRecord.java index 003b591c20c..296e95e8bfa 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/model/DeleteRecord.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/model/DeleteRecord.java @@ -28,6 +28,15 @@ import java.util.Objects; * we need to keep the ordering val to combine with the data records when merging, or the data loss * may occur if there are intermediate deletions for the inputs * (a new INSERT comes after a DELETE in one input batch). + * + * NOTE: PLEASE READ CAREFULLY BEFORE CHANGING + * + * This class is serialized (using Kryo) as part of {@code HoodieDeleteBlock} to
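The fix above replaces Kryo's implicitly generated `FieldSerializer` with a hand-written serializer that emits the two `HoodieKey` fields in a fixed order, so the byte layout no longer depends on how the class happens to be declared. The following Python sketch is illustrative only (it is not Hudi's actual Kryo wire format) and shows why an explicit field-order serializer keeps the format stable across class-definition changes:

```python
# Illustrative sketch only, NOT Hudi's real wire format: serialize the two
# HoodieKey fields (recordKey, partitionPath) in a fixed order as
# length-prefixed UTF-8 strings, so old readers and new readers agree.
import struct

def write_key(record_key, partition_path):
    out = b""
    for field in (record_key, partition_path):
        data = field.encode("utf-8")
        out += struct.pack(">I", len(data)) + data  # 4-byte big-endian length, then bytes
    return out

def read_key(buf):
    fields, offset = [], 0
    for _ in range(2):
        (length,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        fields.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return fields[0], fields[1]
```

Because the writer and reader agree on an explicit field order rather than on reflection over the class, adding or reordering members of the class later cannot silently change the persisted layout.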
[hudi] 04/05: [HUDI-5771] Improve deploy script of release artifacts (#7927)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 4254fc9f4829733c24d4c22c78ae855df7755798 Author: Y Ethan Guo AuthorDate: Sun Feb 12 22:14:31 2023 -0800 [HUDI-5771] Improve deploy script of release artifacts (#7927) The current scripts/release/deploy_staging_jars.sh took around 6 hours to upload all release artifacts to the Apache Nexus staging repository, which is too long. This commit cuts down the upload time by 70% to <2 hours, without changing the intended jars for uploads. --- scripts/release/deploy_staging_jars.sh | 74 -- 1 file changed, 34 insertions(+), 40 deletions(-) diff --git a/scripts/release/deploy_staging_jars.sh b/scripts/release/deploy_staging_jars.sh index 049e5ee7144..7d44e5ffa96 100755 --- a/scripts/release/deploy_staging_jars.sh +++ b/scripts/release/deploy_staging_jars.sh @@ -36,38 +36,41 @@ if [ "$#" -gt "1" ]; then exit 1 fi -BUNDLE_MODULES=$(find -s packaging -name 'hudi-*-bundle' -type d) -BUNDLE_MODULES_EXCLUDED="-${BUNDLE_MODULES//$'\n'/,-}" - declare -a ALL_VERSION_OPTS=( -# upload all module jars and bundle jars -"-Dscala-2.11 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED" -"-Dscala-2.12 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED" -"-Dscala-2.12 -Dspark3.3 -pl $BUNDLE_MODULES_EXCLUDED" -"-Dscala-2.12 -Dspark3.2 -pl $BUNDLE_MODULES_EXCLUDED" -"-Dscala-2.12 -Dspark3.1" # this profile goes last in this section to ensure bundles use avro 1.8 - -# spark bundles -"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am" +# Upload Spark specific modules and bundle jars +# For Spark 2.4, Scala 2.11: +# hudi-spark-common_2.11 +# hudi-spark_2.11 +# hudi-spark2_2.11 +# hudi-utilities_2.11 +# hudi-cli-bundle_2.11 +# hudi-spark2.4-bundle_2.11 +# hudi-utilities-bundle_2.11 +# hudi-utilities-slim-bundle_2.11 +"-Dscala-2.11 -Dspark2.4 -pl 
hudi-spark-datasource/hudi-spark-common,hudi-spark-datasource/hudi-spark2,hudi-spark-datasource/hudi-spark,hudi-utilities,packaging/hudi-spark-bundle,packaging/hudi-cli-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle -am" +# For Spark 2.4, Scala 2.12: +# hudi-spark2.4-bundle_2.12 "-Dscala-2.12 -Dspark2.4 -pl packaging/hudi-spark-bundle -am" -"-Dscala-2.12 -Dspark3.3 -pl packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am" -"-Dscala-2.12 -Dspark3.2 -pl packaging/hudi-spark-bundle -am" -"-Dscala-2.12 -Dspark3.1 -pl packaging/hudi-spark-bundle -am" - -# spark bundles (legacy) (not overwriting previous uploads as these jar names are unique) +# For Spark 3.2, Scala 2.12: +# hudi-spark3.2.x_2.12 +# hudi-spark3.2plus-common +# hudi-spark3.2-bundle_2.12 +"-Dscala-2.12 -Dspark3.2 -pl hudi-spark-datasource/hudi-spark3.2.x,hudi-spark-datasource/hudi-spark3.2plus-common,packaging/hudi-spark-bundle -am" +# For Spark 3.1, Scala 2.12: +# All other modules and bundles using avro 1.8 +"-Dscala-2.12 -Dspark3.1" +# For Spark 3.3, Scala 2.12: +# hudi-spark3.3.x_2.12 +# hudi-cli-bundle_2.12 +# hudi-spark3.3-bundle_2.12 +"-Dscala-2.12 -Dspark3.3 -pl hudi-spark-datasource/hudi-spark3.3.x,packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am" + +# Upload legacy Spark bundles (not overwriting previous uploads as these jar names are unique) "-Dscala-2.11 -Dspark2 -pl packaging/hudi-spark-bundle -am" # for legacy bundle name hudi-spark-bundle_2.11 "-Dscala-2.12 -Dspark2 -pl packaging/hudi-spark-bundle -am" # for legacy bundle name hudi-spark-bundle_2.12 "-Dscala-2.12 -Dspark3 -pl packaging/hudi-spark-bundle -am" # for legacy bundle name hudi-spark3-bundle_2.12 -# utilities bundles (legacy) (overwriting previous uploads) -"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-utilities-bundle -am" # hudi-utilities-bundle_2.11 is for spark 2.4 only -"-Dscala-2.12 -Dspark3.1 -pl packaging/hudi-utilities-bundle -am" # hudi-utilities-bundle_2.12 is for spark 3.1 only - 
-# utilities slim bundles -"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-utilities-slim-bundle -am" # hudi-utilities-slim-bundle_2.11 -"-Dscala-2.12 -Dspark3.1 -pl packaging/hudi-utilities-slim-bundle -am" # hudi-utilities-slim-bundle_2.12 - -# flink bundles (overwriting previous uploads) +# Upload Flink bundles (overwriting previous uploads) "-Dscala-2.12 -Dflink1.13 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle -am" "-Dscala-2.12 -Dflink1.14 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle -am" "-Dscala-2.12 -Dflink1.15 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle -am" @@ -105,20 +108,11 @@ COMMON_OPTIONS="-DdeployArtifacts=true -DskipTests -DretryFailedDeploymentCount= for v in "${ALL_VERSION_OPTS[@]}" do # TODO: consider cleaning all modules by listing directories instead of specifying profile - if [[ "$v" == *"$BUNDLE_MODULES_EXCLUDED" ]]; then -# When deploying jars with bundle exclusions, we still need to build the
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427414425 ## CI report: * dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15124) * 0f35441097e274abe020127c5bd2a5f3d46e0b99 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (e25381c6966 -> a932e482408)
yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from e25381c6966 [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921) add a932e482408 [HUDI-5771] Improve deploy script of release artifacts (#7927) No new revisions were added by this update. Summary of changes: scripts/release/deploy_staging_jars.sh | 74 -- 1 file changed, 34 insertions(+), 40 deletions(-)
[GitHub] [hudi] yihua merged pull request #7927: [HUDI-5771] Improve deploy script of release artifacts
yihua merged PR #7927: URL: https://github.com/apache/hudi/pull/7927
[GitHub] [hudi] yihua commented on a diff in pull request #7927: [HUDI-5771] Improve deploy script of release artifacts
yihua commented on code in PR #7927: URL: https://github.com/apache/hudi/pull/7927#discussion_r1104034135 ## scripts/release/deploy_staging_jars.sh: ## Review Comment: Yes, it is still uploaded. If you check the [staging_file_timestamp.txt](https://github.com/apache/hudi/files/10719051/staging_file_timestamp.txt), `hudi-spark2-common` is uploaded by the `-Dscala-2.12 -Dspark3.1` profile. I keep it the same for now.
[GitHub] [hudi] xushiyan commented on a diff in pull request #7927: [HUDI-5771] Improve deploy script of release artifacts
xushiyan commented on code in PR #7927: URL: https://github.com/apache/hudi/pull/7927#discussion_r1104030123 ## scripts/release/deploy_staging_jars.sh: ## Review Comment: there is a `hudi-spark2-common`, which is a placeholder module and empty. Though it won't affect things, it should still be added to keep consistent with the existing modules.
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
[ https://issues.apache.org/jira/browse/HUDI-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5771: - Labels: pull-request-available (was: ) > Improve deploy script of release artifacts > -- > > Key: HUDI-5771 > URL: https://issues.apache.org/jira/browse/HUDI-5771 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > Current script is inefficient as some artifacts are repeatedly uploaded which > wastes time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua opened a new pull request, #7927: [HUDI-5771] Improve deploy script of release artifacts
yihua opened a new pull request, #7927: URL: https://github.com/apache/hudi/pull/7927 ### Change Logs The current `scripts/release/deploy_staging_jars.sh` took around 6 hours to upload all release artifacts to the Apache Nexus staging repository, which is too long. After analyzing the upload sequence, there are repeated uploads of the same module that can be avoided. After carefully reviewing the deploy script and logs, I make the following changes to cut down the upload time by 70%, without changing the intended jars for uploads: - For each profile (e.g., `-Dscala-2.12 -Dspark3.2`), only make one mvn build - Remove overlapping build targets among different profiles - For Spark 2.4, Scala 2.11: `hudi-spark-common_2.11`, `hudi-spark_2.11`, `hudi-spark2_2.11`, `hudi-utilities_2.11`, `hudi-cli-bundle_2.11`, `hudi-spark2.4-bundle_2.11`, `hudi-utilities-bundle_2.11`, `hudi-utilities-slim-bundle_2.11` - For Spark 2.4, Scala 2.12: `hudi-spark2.4-bundle_2.12` - For Spark 3.2, Scala 2.12: `hudi-spark3.2.x_2.12`, `hudi-spark3.2plus-common`, `hudi-spark3.2-bundle_2.12` - For Spark 3.3, Scala 2.12: `hudi-spark3.3.x_2.12`, `hudi-cli-bundle_2.12`, `hudi-spark3.3-bundle_2.12` - For Spark 3.1, Scala 2.12: all other modules and bundles (`hudi-cli-bundle_2.12` is not overridden) Legacy Spark bundles and Flink bundles are not changed. Raw logs: - Summary of existing upload sequence: [deploy_sequence.txt](https://github.com/apache/hudi/files/10719044/deploy_sequence.txt) - Last modified times of uploaded artifacts for analyzing the relevant upload and profile: [staging_file_timestamp.txt](https://github.com/apache/hudi/files/10719051/staging_file_timestamp.txt) ### Impact Significantly reduces the time (by ~70%, from 6 hours to <2 hours) of uploading all release artifacts to the Apache Nexus staging repository. 
### Risk level low ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
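The core idea of the change described above is that each Maven invocation should build a disjoint set of modules, so no artifact is uploaded to the staging repository twice. A small Python sketch of that dedup check (module names abbreviated from the lists above; this is not the deploy script itself):

```python
# Sketch of the dedup reasoning behind the script change: given the modules
# each profile would upload, find any module that an earlier profile already
# uploaded. With disjoint lists this set is empty. Module lists are
# illustrative, taken from the PR description.
profiles = {
    "-Dscala-2.11 -Dspark2.4": {"hudi-spark-common_2.11", "hudi-spark2.4-bundle_2.11"},
    "-Dscala-2.12 -Dspark2.4": {"hudi-spark2.4-bundle_2.12"},
    "-Dscala-2.12 -Dspark3.2": {"hudi-spark3.2.x_2.12", "hudi-spark3.2-bundle_2.12"},
}

def repeated_uploads(profiles):
    seen, repeats = set(), set()
    for modules in profiles.values():  # profiles run in order
        repeats |= seen & modules      # modules an earlier profile already uploaded
        seen |= modules
    return repeats
```

In the old script the broad `-pl $BUNDLE_MODULES_EXCLUDED` invocations overlapped heavily, so `repeated_uploads` over those lists would be large; making the per-profile lists disjoint is what removes the wasted upload time.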
[GitHub] [hudi] nfarah86 commented on pull request #7926: updated hudi content
nfarah86 commented on PR #7926: URL: https://github.com/apache/hudi/pull/7926#issuecomment-1427375751 cc @bhasudha to review
[GitHub] [hudi] nfarah86 opened a new pull request, #7926: updated hudi content
nfarah86 opened a new pull request, #7926: URL: https://github.com/apache/hudi/pull/7926 ### Change Logs updated videos and blog content; blog image is null, but the file is added https://user-images.githubusercontent.com/5392555/218378737-f301ffb5-e41f-40fb-97a7-44d06c20d306.png https://user-images.githubusercontent.com/5392555/218378739-2d2eebd1-c6c6-4d83-8ecf-8072f4f8a186.png
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427368880 ## CI report: * dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15124)
[jira] [Assigned] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo reassigned HUDI-5771: Assignee: Ethan Guo
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo updated HUDI-5771: Description: Current script is inefficient as some artifacts are repeatedly uploaded which wastes time.
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo updated HUDI-5771: Story Points: 3
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo updated HUDI-5771: Fix Version/s: 0.13.0
[jira] [Created] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo created HUDI-5771: --- Summary: Improve deploy script of release artifacts Key: HUDI-5771 URL: https://issues.apache.org/jira/browse/HUDI-5771 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo updated HUDI-5771: Priority: Blocker (was: Major)
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427336585 ## CI report: * 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN * 6b3cafb7422b1cb3bfb49557327effc2b144dc58 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15125) * 52ff32a1bb04340505e309191c398d95a9c8f928 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15127)
[GitHub] [hudi] hudi-bot commented on pull request #6121: [HUDI-4406] Support Flink compaction/clustering write error resolvement to avoid data loss
hudi-bot commented on PR #6121: URL: https://github.com/apache/hudi/pull/6121#issuecomment-1427335554 ## CI report: * 52b6f55e196007f993b0506d899c48bb80b36546 UNKNOWN * c52a60118c2e7fba170ea1cea0c4105ff83c52f9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15089) * 5dc463fcade7c5a495cca1437fca8230b01d0229 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15126)
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427332991 ## CI report: * 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN * 7f2456c65f6d17280fd6abe3185edc5a7f4d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15080) * 6b3cafb7422b1cb3bfb49557327effc2b144dc58 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15125) * 52ff32a1bb04340505e309191c398d95a9c8f928 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6121: [HUDI-4406] Support Flink compaction/clustering write error resolvement to avoid data loss
hudi-bot commented on PR #6121: URL: https://github.com/apache/hudi/pull/6121#issuecomment-1427328065 ## CI report: * 52b6f55e196007f993b0506d899c48bb80b36546 UNKNOWN * c52a60118c2e7fba170ea1cea0c4105ff83c52f9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15089) * 5dc463fcade7c5a495cca1437fca8230b01d0229 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633: URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427322176 ## CI report: * 50480623485bb99353655f4c6df23a2462214f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15123)
[GitHub] [hudi] chenshzh commented on a diff in pull request #6121: [HUDI-4406] Support Flink compaction/clustering write error resolvement to avoid data loss
chenshzh commented on code in PR #6121: URL: https://github.com/apache/hudi/pull/6121#discussion_r1103982916 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java: ## @@ -119,7 +119,16 @@ private void commitIfNecessary(String instant, List event return; } -if (events.stream().anyMatch(ClusteringCommitEvent::isFailed)) { +// here we should take the write errors under consideration +// as some write errors might cause data loss when clustering +List statuses = events.stream() Review Comment: Agree that `isFailed` always indicates an execution failure that should be rolled back. So in the update we will judge whether to roll back on write status errors when the config `FlinkOptions.IGNORE_FAILED` is false. Please take a review.
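The commit decision being discussed can be sketched as follows. This is a hypothetical Python model, not Hudi's Flink sink code: the class and field names (`ClusteringCommitEvent`, `WriteStatus`, `total_error_records`, the `ignore_failed` flag) are illustrative stand-ins for the Java types mentioned in the review.

```python
# Hypothetical sketch of the clustering commit decision: roll back when any
# event failed outright, or when write statuses carry error records and the
# job is configured NOT to ignore failed writes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class WriteStatus:
    total_error_records: int = 0

@dataclass
class ClusteringCommitEvent:
    failed: bool = False
    write_statuses: List[WriteStatus] = field(default_factory=list)

def should_rollback(events, ignore_failed):
    if any(e.failed for e in events):
        return True   # execution failure: always roll back
    if ignore_failed:
        return False  # user opted to tolerate per-record write errors
    return any(s.total_error_records > 0
               for e in events for s in e.write_statuses)
```

The key distinction mirrored here is the one made in the review: an execution failure always forces a rollback, while data-level write errors only do so when the ignore-failed option is off.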
[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7891: [HUDI-5728] HoodieTimelineArchiver archives the latest instant before inflight replacecommit
zhuanshenbsj1 commented on code in PR #7891: URL: https://github.com/apache/hudi/pull/7891#discussion_r1103965694 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -473,6 +473,33 @@ private Stream getCommitInstantsToArchive() throws IOException { HoodieTimeline.compareTimestamps(s.getTimestamp(), LESSER_THAN, instantToRetain.getTimestamp())) .orElse(true) ); + + // When inline or async clustering is enabled, we need to ensure that there is a commit in the active timeline + // to check whether the file slice generated in pending clustering after archive isn't committed + // via {@code HoodieFileGroup#isFileSliceCommitted(slice)} + boolean isOldestPendingReplaceInstant = + oldestPendingCompactionAndReplaceInstant.map(instant -> + HoodieTimeline.REPLACE_COMMIT_ACTION.equals(instant.getAction())).orElse(false); + if (isOldestPendingReplaceInstant) { +List instantsToArchive = instantToArchiveStream.collect(Collectors.toList()); +Option latestInstantRetainForReplace = Option.fromJavaOptional( +instantsToArchive.stream() +.filter(s -> HoodieTimeline.compareTimestamps( +s.getTimestamp(), +LESSER_THAN, + oldestPendingCompactionAndReplaceInstant.get().getTimestamp())) +.reduce((i1, i2) -> i2)); +if (latestInstantRetainForReplace.isPresent()) { + LOG.info(String.format( + "Retaining the archived instant %s before the inflight replacecommit %s.", + latestInstantRetainForReplace.get().getTimestamp(), + oldestPendingCompactionAndReplaceInstant.get().getTimestamp())); +} +instantToArchiveStream = instantsToArchive.stream() +.filter(s -> latestInstantRetainForReplace.map(instant -> s.compareTo(instant) != 0) +.orElse(true)); + } + Review Comment: getOldestInstantToRetainForClustering(){ 1.get the first unclean clustering instant 2.get the previous commit of last inflight clustering instant 3.compare 1&2, return the earliest } -- This is an automated message from the Apache Git Service. 
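The three-step pseudocode in the review comment above can be made concrete. The sketch below is hypothetical and not Hudi's timeline API: instants are modeled as `(timestamp, action, state)` tuples with sortable timestamp strings, and the "cleaned" state is an illustrative simplification.

```python
# Runnable sketch of: 1) find the first un-cleaned clustering (replacecommit)
# instant, 2) find the commit immediately before the last inflight clustering
# instant, 3) return the earliest of the two candidates.
def oldest_instant_to_retain_for_clustering(timeline):
    """timeline: list of (timestamp, action, state), sorted by timestamp."""
    # Step 1: first clustering instant that has not been cleaned yet.
    first_uncleaned = next((ts for ts, action, state in timeline
                            if action == "replacecommit" and state != "cleaned"), None)
    # Step 2: the commit immediately preceding the last inflight clustering instant.
    inflight = [ts for ts, action, state in timeline
                if action == "replacecommit" and state == "inflight"]
    prev_commit = None
    if inflight:
        earlier = [ts for ts, action, _ in timeline
                   if action == "commit" and ts < inflight[-1]]
        prev_commit = earlier[-1] if earlier else None
    # Step 3: keep the earliest candidate (None if neither exists).
    candidates = [t for t in (first_uncleaned, prev_commit) if t is not None]
    return min(candidates) if candidates else None
```

Returning the earlier of the two candidates is what guarantees the archiver never archives past an instant that a pending clustering operation still needs for file-slice visibility checks.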
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427290880 ## CI report: * 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN * 7f2456c65f6d17280fd6abe3185edc5a7f4d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15080) * 6b3cafb7422b1cb3bfb49557327effc2b144dc58 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15125)
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427286081 ## CI report: * 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN * 7f2456c65f6d17280fd6abe3185edc5a7f4d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15080) * 6b3cafb7422b1cb3bfb49557327effc2b144dc58 UNKNOWN
[jira] [Commented] (HUDI-5770) Plan error when partition column is timestamp type and SQL query contains filter condition which contains partition
[ https://issues.apache.org/jira/browse/HUDI-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687687#comment-17687687 ] Jing Zhang commented on HUDI-5770: -- The cause of this bug is similar to [HUDI-4601|https://issues.apache.org/jira/browse/HUDI-4601]. If the partition column is of timestamp type, the partition path value is not the raw column value; it is a string converted from the real value. We need to handle the case where the partition column is of timestamp/date type when applying partition pruning. > Plan error when partition column is timestamp type and SQL query contains > filter condition which contains partition > --- > > Key: HUDI-5770 > URL: https://issues.apache.org/jira/browse/HUDI-5770 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Jing Zhang >Priority: Major > > If a hudi table is a partition table, and partition column is timestamp type. > When run a flink query which contain the filter conditions on partition > column, an error would be thrown out in the plan generating phase.
> {code:java} > java.time.format.DateTimeParseException: Text '1970010100' could not be > parsed at index 0 at > java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949) > at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) > at > org.apache.flink.table.utils.DateTimeUtils.parseTimestampData(DateTimeUtils.java:413) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionFieldValue(PartitionPruner.scala:182) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1(PartitionPruner.scala:157) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1$adapted(PartitionPruner.scala:155) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionToRow(PartitionPruner.scala:155) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1(PartitionPruner.scala:137) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1$adapted(PartitionPruner.scala:132) > at scala.collection.Iterator.foreach(Iterator.scala:937) > at scala.collection.Iterator.foreach$(Iterator.scala:937) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) > at scala.collection.IterableLike.foreach(IterableLike.scala:70) > at scala.collection.IterableLike.foreach$(IterableLike.scala:69) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.prunePartitions(PartitionPruner.scala:132) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner.prunePartitions(PartitionPruner.scala) > at > 
org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.lambda$onMatch$3(PushPartitionIntoTableSourceScanRule.java:163) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.readPartitionsAndPrune(PushPartitionIntoTableSourceScanRule.java:254) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.onMatch(PushPartitionIntoTableSourceScanRule.java:172) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:64) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:78) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.$anonfun$optimize$2(FlinkGroupProgram.scala:59) > at > scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:156) > at >
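The parse failure above comes from the partition path storing the timestamp in a compact `yyyyMMddHH` form (e.g. `1970010100`), which a parser expecting a standard timestamp layout cannot read. A minimal Python sketch of the mismatch — illustrative only, not Hudi or Flink code, and `parse_partition_timestamp` is a hypothetical helper:

```python
from datetime import datetime

# The partition path stores the timestamp formatted as 'yyyyMMddHH'
# (e.g. '1970010100'), not as a standard timestamp string.
partition_value = "1970010100"

def parse_partition_timestamp(value: str) -> datetime:
    """Parse a partition-path value written with a 'yyyyMMddHH' pattern."""
    return datetime.strptime(value, "%Y%m%d%H")

# Parsing with the partition-path pattern succeeds...
print(parse_partition_timestamp(partition_value))

# ...while parsing the same string with a standard timestamp pattern
# (what a generic planner-side parser expects) fails, mirroring the
# DateTimeParseException in the report above.
try:
    datetime.strptime(partition_value, "%Y-%m-%d %H:%M:%S")
except ValueError as e:
    print("parse failed:", e)
```

This is why partition pruning must convert the partition-path string back to the original value before comparing it against timestamp/date filter conditions.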
[jira] [Assigned] (HUDI-5770) Plan error when partition column is timestamp type and SQL query contains filter condition which contains partition
[ https://issues.apache.org/jira/browse/HUDI-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhang reassigned HUDI-5770: Assignee: Jing Zhang > Plan error when partition column is timestamp type and SQL query contains > filter condition which contains partition > --- > > Key: HUDI-5770 > URL: https://issues.apache.org/jira/browse/HUDI-5770 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Jing Zhang >Assignee: Jing Zhang >Priority: Major > > If a hudi table is a partition table, and partition column is timestamp type. > When run a flink query which contain the filter conditions on partition > column, an error would be thrown out in the plan generating phase. > {code:java} > java.time.format.DateTimeParseException: Text '1970010100' could not be > parsed at index 0 at > java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949) > at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) > at > org.apache.flink.table.utils.DateTimeUtils.parseTimestampData(DateTimeUtils.java:413) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionFieldValue(PartitionPruner.scala:182) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1(PartitionPruner.scala:157) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1$adapted(PartitionPruner.scala:155) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionToRow(PartitionPruner.scala:155) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1(PartitionPruner.scala:137) > at > 
org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1$adapted(PartitionPruner.scala:132) > at scala.collection.Iterator.foreach(Iterator.scala:937) > at scala.collection.Iterator.foreach$(Iterator.scala:937) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) > at scala.collection.IterableLike.foreach(IterableLike.scala:70) > at scala.collection.IterableLike.foreach$(IterableLike.scala:69) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.prunePartitions(PartitionPruner.scala:132) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner.prunePartitions(PartitionPruner.scala) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.lambda$onMatch$3(PushPartitionIntoTableSourceScanRule.java:163) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.readPartitionsAndPrune(PushPartitionIntoTableSourceScanRule.java:254) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.onMatch(PushPartitionIntoTableSourceScanRule.java:172) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:64) > at > 
org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:78) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.$anonfun$optimize$2(FlinkGroupProgram.scala:59) > at > scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:156) > at > scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:156) > at scala.collection.Iterator.foreach(Iterator.scala:937) > at scala.collection.Iterator.foreach$(Iterator.scala:937) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) > at scala.collection.IterableLike.foreach(IterableLike.scala:70) > at
[jira] [Created] (HUDI-5770) Plan error when partition column is timestamp type and SQL query contains filter condition which contains partition
Jing Zhang created HUDI-5770: Summary: Plan error when partition column is timestamp type and SQL query contains filter condition which contains partition Key: HUDI-5770 URL: https://issues.apache.org/jira/browse/HUDI-5770 Project: Apache Hudi Issue Type: Bug Components: flink-sql Reporter: Jing Zhang If a hudi table is a partition table, and partition column is timestamp type. When run a flink query which contain the filter conditions on partition column, an error would be thrown out in the plan generating phase. {code:java} java.time.format.DateTimeParseException: Text '1970010100' could not be parsed at index 0 at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949) at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) at org.apache.flink.table.utils.DateTimeUtils.parseTimestampData(DateTimeUtils.java:413) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionFieldValue(PartitionPruner.scala:182) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1(PartitionPruner.scala:157) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1$adapted(PartitionPruner.scala:155) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionToRow(PartitionPruner.scala:155) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1(PartitionPruner.scala:137) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1$adapted(PartitionPruner.scala:132) at scala.collection.Iterator.foreach(Iterator.scala:937) at scala.collection.Iterator.foreach$(Iterator.scala:937) at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at 
scala.collection.IterableLike.foreach(IterableLike.scala:70) at scala.collection.IterableLike.foreach$(IterableLike.scala:69) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.prunePartitions(PartitionPruner.scala:132) at org.apache.flink.table.planner.plan.utils.PartitionPruner.prunePartitions(PartitionPruner.scala) at org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.lambda$onMatch$3(PushPartitionIntoTableSourceScanRule.java:163) at org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.readPartitionsAndPrune(PushPartitionIntoTableSourceScanRule.java:254) at org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.onMatch(PushPartitionIntoTableSourceScanRule.java:172) at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) at org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:64) at org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:78) at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.$anonfun$optimize$2(FlinkGroupProgram.scala:59) at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:156) at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:156) at scala.collection.Iterator.foreach(Iterator.scala:937) 
at scala.collection.Iterator.foreach$(Iterator.scala:937) at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at scala.collection.IterableLike.foreach(IterableLike.scala:70) at scala.collection.IterableLike.foreach$(IterableLike.scala:69) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:156) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:154) at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104) at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.$anonfun$optimize$1(FlinkGroupProgram.scala:56) at
[GitHub] [hudi] qidian99 commented on a diff in pull request #7915: [HUDI-5759] Supports add column on mor table with log
qidian99 commented on code in PR #7915: URL: https://github.com/apache/hudi/pull/7915#discussion_r1103953817 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala: ## @@ -202,6 +202,13 @@ private[sql] object SchemaConverters { st.foreach { f => val fieldAvroType = toAvroType(f.dataType, f.nullable, f.name, childNameSpace) +val fieldBuilder = fieldsAssembler.name(f.name).`type`(fieldAvroType) Review Comment: ![image](https://user-images.githubusercontent.com/20527912/218361319-2783b730-ddea-4d7b-b7d4-ec225014e531.png) When `extractPartitionValuesFromPartitionPath` is turned on, the StructType schema and the Avro schema differ. `convertToAvroSchema` is missing the default value when the field is nullable, making the table unqueryable.
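The fix being discussed — attaching a default to nullable fields during StructType-to-Avro conversion — can be sketched as follows. This is a simplified illustration of the Avro schema shape only, not Hudi's actual `SchemaConverters` code; `struct_field_to_avro` is a hypothetical helper:

```python
import json

def struct_field_to_avro(name, avro_type, nullable):
    """Convert one field of a (simplified) StructType to an Avro field dict.

    For a nullable field, the Avro type becomes a union with "null" first,
    and a default of null is attached so readers can fill the field in when
    it is absent from older data files (e.g. after ALTER TABLE ADD COLUMN).
    """
    if nullable:
        return {"name": name, "type": ["null", avro_type], "default": None}
    return {"name": name, "type": avro_type}

schema = {
    "type": "record",
    "name": "example",
    "fields": [
        struct_field_to_avro("id", "long", nullable=False),
        struct_field_to_avro("new_col1", "string", nullable=True),
    ],
}
print(json.dumps(schema, indent=2))
```

Note that in Avro the default's type must match the first branch of the union, which is why `"null"` is listed first for nullable fields.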
[GitHub] [hudi] qidian99 commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
qidian99 commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427253388 Here's the stacktrace when I tried to add a column named `new_col1` in mor table: ``` Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2403) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2352) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2351) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2351) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1109) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1109) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1109) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2591) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2533) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:898) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2214) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2235) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2254) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2279) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:394) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:421) at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287) at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) at
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427235209 ## CI report: * dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15124)
[GitHub] [hudi] stream2000 commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
stream2000 commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427231106 @hudi-bot run azure
[GitHub] [hudi] zinking commented on issue #4457: [SUPPORT] Hudi archive stopped working
zinking commented on issue #4457: URL: https://github.com/apache/hudi/issues/4457#issuecomment-1427228551 @nsivabalan I observed the same thing here. Rollbacks on the timeline didn't get processed by the Flink engine: compaction is pending on the rollbacks, and marker cleaning is pending on the compactions, causing an extra-large timeline. In the Spark compaction process the rollbacks are processed, though; not sure whether Flink compaction should do the same.
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633: URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427204592 ## CI report: * 948c6823094e63b03adfb98b40f9c70c3edf3ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15108) * 50480623485bb99353655f4c6df23a2462214f7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15123)
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633: URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427201506 ## CI report: * 948c6823094e63b03adfb98b40f9c70c3edf3ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15108) * 50480623485bb99353655f4c6df23a2462214f7f UNKNOWN
[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
kazdy commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1427055835 CI failed twice due to timeouts.
[GitHub] [hudi] codope commented on a diff in pull request #7871: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s
codope commented on code in PR #7871: URL: https://github.com/apache/hudi/pull/7871#discussion_r1103803439 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalystExpressionUtils.scala: ## @@ -75,7 +81,7 @@ trait HoodieCatalystExpressionUtils { def unapplyCastExpression(expr: Expression): Option[(Expression, DataType, Option[String], Boolean)] } -object HoodieCatalystExpressionUtils { +object HoodieCatalystExpressionUtils extends SparkAdapterSupport { Review Comment: Why does it need to extend `SparkAdapterSupport`? Is there something that changes across Spark versions? ## hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java: ## @@ -69,6 +69,26 @@ public static boolean nonEmpty(Collection c) { return !isNullOrEmpty(c); } + /** + * Reduces the provided {@link Collection} using the provided {@code reducer} applied to + * every element of the collection as follows: + * + * {@code reduce(reduce(reduce(identity, e1), e2), ...)} + * + * @param c target collection to be reduced + * @param identity element for the reduction to start from + * @param reducer actual reducing operator + * + * @return result of the reduction of the collection using the reducing operator + */ + public static <T, U> U reduce(Collection<T> c, U identity, BiFunction<U, T, U> reducer) { +return c.stream() +.sequential() Review Comment: Does it have to be strictly sequential? I mean, the elements of the collection should be independent of each other. Is there any value in parameterizing this behavior, say by adding a boolean `shouldReduceParallelly`? 
## hudi-common/src/main/java/org/apache/hudi/internal/schema/action/TableChange.java: ## @@ -83,10 +83,16 @@ abstract class BaseColumnChange implements TableChange { protected final InternalSchema internalSchema; protected final Map id2parent; protected final Map> positionChangeMap = new HashMap<>(); +protected final boolean caseSensitive; BaseColumnChange(InternalSchema schema) { + this(schema, false); Review Comment: why default `caseSensitive` is false? ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala: ## @@ -28,97 +28,125 @@ import org.apache.hudi.config.HoodieWriteConfig.{AVRO_SCHEMA_VALIDATE_ENABLE, TB import org.apache.hudi.exception.HoodieException import org.apache.hudi.hive.HiveSyncConfigHolder import org.apache.hudi.sync.common.HoodieSyncConfig +import org.apache.hudi.util.JFunction.scalaFunction1Noop import org.apache.hudi.{AvroConversionUtils, DataSourceWriteOptions, HoodieSparkSqlWriter, SparkAdapterSupport} -import org.apache.spark.sql.HoodieCatalystExpressionUtils.MatchCast +import org.apache.spark.sql.HoodieCatalystExpressionUtils.{MatchCast, attributeEquals} import org.apache.spark.sql._ -import org.apache.spark.sql.catalyst.TableIdentifier -import org.apache.spark.sql.catalyst.analysis.Resolver import org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable -import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, BoundReference, Cast, EqualTo, Expression, Literal} +import org.apache.spark.sql.catalyst.expressions.BindReferences.bindReference +import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, BoundReference, EqualTo, Expression, Literal, NamedExpression, PredicateHelper} import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.hudi.HoodieSqlCommonUtils._ -import org.apache.spark.sql.hudi.HoodieSqlUtils.getMergeIntoTargetTableId +import 
org.apache.spark.sql.hudi.analysis.HoodieAnalysis.failAnalysis import org.apache.spark.sql.hudi.ProvidesHoodieConfig.combineOptions -import org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.CoercedAttributeReference +import org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.{CoercedAttributeReference, encodeAsBase64String, stripCasting, toStructType} import org.apache.spark.sql.hudi.command.payload.ExpressionPayload import org.apache.spark.sql.hudi.command.payload.ExpressionPayload._ import org.apache.spark.sql.hudi.ProvidesHoodieConfig -import org.apache.spark.sql.types.{BooleanType, StructType} +import org.apache.spark.sql.types.{BooleanType, StructField, StructType} import java.util.Base64 /** - * The Command for hoodie MergeIntoTable. - * The match on condition must contain the row key fields currently, so that we can use Hoodie - * Index to speed up the performance. + * Hudi's implementation of the {@code MERGE INTO} (MIT) Spark SQL statement. * - * The main algorithm: + * NOTE: That this implementation is restricted in a some aspects to accommodate for Hudi's crucial + * constraint (of requiring every record to bear unique primary-key): merging condition ([[mergeCondition]]) + * is currently can only (and must) reference
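On the review question of whether `reduce` must be strictly sequential: a left fold of the documented shape `reduce(reduce(reduce(identity, e1), e2), ...)` only matches a parallel (tree-shaped) reduction when the reducer is associative. A small Python sketch of the discrepancy, assuming a non-associative reducer such as subtraction:

```python
from functools import reduce

def sequential_reduce(items, identity, reducer):
    """Left fold: reducer(...reducer(reducer(identity, e1), e2)..., eN)."""
    return reduce(reducer, items, identity)

items = [1, 2, 3, 4]

# Subtraction is not associative, so the result depends on fold order:
# ((((0 - 1) - 2) - 3) - 4) = -10
left_fold = sequential_reduce(items, 0, lambda acc, x: acc - x)

# A parallel-style tree reduction combines two partial left folds with the
# same reducer: (0-1-2) = -3 and (0-3-4) = -7, then -3 - (-7) = 4.
tree = (0 - 1 - 2) - (0 - 3 - 4)

print(left_fold, tree)  # different results: -10 vs 4
```

So forcing `.sequential()` is a correctness guarantee for arbitrary (possibly non-associative) reducers, not just a performance choice; a parallel variant would additionally require an associative reducer and a compatible combiner.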
[GitHub] [hudi] hudi-bot commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
hudi-bot commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1427045421 ## CI report: * d75235c11b5619654d6399f397ecea13f874aec4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15111) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15120)
[GitHub] [hudi] menna224 commented on issue #4839: Hudi upsert doesnt trigger compaction for MOR
menna224 commented on issue #4839: URL: https://github.com/apache/hudi/issues/4839#issuecomment-1427045014 > Hello @shahiidiqbal, can you please provide a snippet of the code in which you write the stream directly? How did you pass the cleansing function to it?
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427020023 ## CI report: * dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119)
[jira] [Closed] (HUDI-5764) Allow lazy rollback for async indexer commit
[ https://issues.apache.org/jira/browse/HUDI-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5764. - Resolution: Fixed > Allow lazy rollback for async indexer commit > > > Key: HUDI-5764 > URL: https://issues.apache.org/jira/browse/HUDI-5764 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > This is to fix HUDI-5733, where the async indexer may fail due to an eager rollback > in the metadata table. > Temporary solution for 0.13.0 (a little more involved and not so clean a fix): > apply eager rollbacks only for regular delta commits; deduce delta commits > from HoodieIndexer and employ a lazy clean policy (based on heartbeats). -- This message was sent by Atlassian Jira (v8.20.10#820010)
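The heartbeat-based lazy rollback policy described in the ticket can be sketched roughly as follows. This is illustrative Python, not Hudi code; `should_rollback`, the expiry threshold, and the heartbeat map are all assumptions made for the sketch:

```python
import time

HEARTBEAT_EXPIRY_SECONDS = 120  # illustrative threshold, not a Hudi default

def should_rollback(instant, heartbeats, now=None):
    """Lazily decide whether a pending delta commit can be rolled back.

    An eager policy would roll back any pending instant immediately. The
    lazy policy only rolls back once the writer's heartbeat has expired,
    so a still-running async indexer is not rolled back out from under
    itself while its commit is in flight.
    """
    now = now if now is not None else time.time()
    last_beat = heartbeats.get(instant)
    if last_beat is None:
        return True  # no heartbeat at all: the writer is gone
    return (now - last_beat) > HEARTBEAT_EXPIRY_SECONDS

heartbeats = {"20230213T000000": time.time()}           # indexer still alive
print(should_rollback("20230213T000000", heartbeats))   # False: heartbeat fresh
print(should_rollback("20230101T000000", heartbeats))   # True: no heartbeat
```

The key design point is that liveness, not mere pendency, decides whether a pending instant is failed and safe to clean up.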
[GitHub] [hudi] codope merged pull request #7921: [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table
codope merged PR #7921: URL: https://github.com/apache/hudi/pull/7921
[hudi] branch master updated (1cb8ffe7264 -> e25381c6966)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 1cb8ffe7264 [HUDI-5768] Fix Spark Datasource read of metadata table (#7924)
     add e25381c6966 [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921)

No new revisions were added by this update.

Summary of changes:
 .../hudi/client/BaseHoodieTableServiceClient.java  |  48
 .../apache/hudi/client/BaseHoodieWriteClient.java  |  13
 .../metadata/HoodieBackedTableMetadataWriter.java  |  34
 .../java/org/apache/hudi/table/HoodieTable.java    |  38
 .../table/action/index/RunIndexActionExecutor.java |   5
 .../FlinkHoodieBackedTableMetadataWriter.java      |  21
 .../org/apache/hudi/table/HoodieFlinkTable.java    |  12
 .../SparkHoodieBackedTableMetadataWriter.java      |  20
 .../org/apache/hudi/table/HoodieSparkTable.java    |  10
 .../hudi/metadata/HoodieBackedTableMetadata.java   |  12
 .../hudi/metadata/HoodieTableMetadataUtil.java     |  20
 .../apache/hudi/utilities/TestHoodieIndexer.java   | 108
 12 files changed, 298 insertions(+), 43 deletions(-)
[jira] [Closed] (HUDI-5768) Fail to read metadata table in Spark Datasource
[ https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-5768.
Resolution: Fixed

> Fail to read metadata table in Spark Datasource
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.12.0, 0.12.1, 0.12.2
> Reporter: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class org.apache.hudi.common.model.HoodieFileFormat)
>   at org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
>   at
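The `scala.MatchError` above is an exhaustiveness gap: the base-file-format dispatch in `HoodieBaseRelation` covered the formats regular tables use but not HFILE, which the metadata table stores its data in. A minimal Java model of that failure shape, with illustrative names rather than Hudi's actual code:

```java
public class FileFormatDispatchSketch {
    enum FileFormat { PARQUET, ORC, HFILE }

    // Before the fix (sketch): the dispatch covers only the formats regular
    // tables use, so HFILE falls through and throws, like the MatchError above.
    static String readerForUnsafe(FileFormat f) {
        switch (f) {
            case PARQUET: return "parquet-reader";
            case ORC:     return "orc-reader";
            default: throw new IllegalStateException("unhandled format: " + f);
        }
    }

    // After the fix (sketch): the metadata table's HFILE format is handled explicitly.
    static String readerForSafe(FileFormat f) {
        switch (f) {
            case PARQUET: return "parquet-reader";
            case ORC:     return "orc-reader";
            case HFILE:   return "hfile-reader";
            default: throw new IllegalStateException("unhandled format: " + f);
        }
    }

    public static void main(String[] args) {
        try {
            readerForUnsafe(FileFormat.HFILE);
        } catch (IllegalStateException e) {
            System.out.println("unsafe: " + e.getMessage());
        }
        System.out.println("safe: " + readerForSafe(FileFormat.HFILE));
    }
}
```

The reader names and method names here are hypothetical; the actual change lives in `HoodieBaseRelation.scala` per the commit summary below.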
[hudi] branch master updated (3e31ca73828 -> 1cb8ffe7264)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 3e31ca73828 [MINOR] Remove unnecessary TestCallExpressions which are adapters for CallExpression (#7911)
     add 1cb8ffe7264 [HUDI-5768] Fix Spark Datasource read of metadata table (#7924)

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/hudi/HoodieBaseRelation.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
[GitHub] [hudi] codope merged pull request #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
codope merged PR #7924: URL: https://github.com/apache/hudi/pull/7924
[GitHub] [hudi] hudi-bot commented on pull request #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
hudi-bot commented on PR #7924: URL: https://github.com/apache/hudi/pull/7924#issuecomment-1427006591

## CI report:

* 1d00cbd70323708d204e00aca22c90d66d5c2297 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15118)
[GitHub] [hudi] hudi-bot commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
hudi-bot commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1427006577

## CI report:

* d75235c11b5619654d6399f397ecea13f874aec4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15111) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15120)
[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
kazdy commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1427002319

@hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #7921: [HUDI-5764][DO NOT MERGE] Roll back delta commits from `HoodieIndexer` lazily in metadata table
hudi-bot commented on PR #7921: URL: https://github.com/apache/hudi/pull/7921#issuecomment-1426994163

## CI report:

* 8d961453bb808b5f6273e68a455940f2f6014605 UNKNOWN
* 42a40ccdd6ee5d58b6aaf06cbc5af6bbd618dea2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15117)
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1426994144

## CI report:

* dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119)
[GitHub] [hudi] stream2000 commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
stream2000 commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1426986850

@hudi-bot run azure
[GitHub] [hudi] GallonREX opened a new issue, #7925: [SUPPORT]hudi 0.8 upgrade to hudi 0.12 report java.util.ConcurrentModificationException: Cannot resolve conflicts for overlapping writes
GallonREX opened a new issue, #7925: URL: https://github.com/apache/hudi/issues/7925

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Upgrading from Hudi 0.8 to Hudi 0.12. Upgrade steps: use the Hudi 0.12 program to write to the table created by the existing Hudi 0.8, relying on the automatic upgrade. After writing to the 0.8 table, **the Hudi 0.12 table cannot be written to by two writers at the same time.**

**To Reproduce**

Steps to reproduce the behavior:

1. Use Hudi 0.12 to write to an existing Hudi 0.8 table.
2. Scala code:

```scala
articleDataframe
  .write.format("org.apache.hudi")
  .option("hoodie.insert.shuffle.parallelism", "264")
  .option("hoodie.upsert.shuffle.parallelism", "264")
  .option("hoodie.cleaner.policy.failed.writes", "LAZY")
  .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
  .option("hoodie.write.lock.provider", "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider")
  .option("hoodie.write.lock.zookeeper.url", "10.1.4.1,10.1.4.2,10.1.4.3")
  .option("hoodie.write.lock.zookeeper.port", "2181")
  .option("hoodie.write.lock.zookeeper.lock_key", "zycg_article_data_day_test08limit")
  .option("hoodie.write.lock.zookeeper.base_path", "/hudi_data_zycg_article_data_day_test")
  .option(RECORDKEY_FIELD.key(), "doc_id")
  .option(PARTITIONPATH_FIELD.key(), "partionpath")
  .option(PRECOMBINE_FIELD.key(), "publish_time")
  .option(TBL_NAME.key(), "hudi_test_tb")
  .mode(Append)
  .save("hdfs://10.1.4.1:9000/data_center/hudidata/hudi_test_tb")
```

3. Spark submit (Hudi 0.12):

```shell
bin/spark-submit \
  --name hudi012_20220807 \
  --class com.honeycomb.hudi.hudiimport.spark.ZhongyunImportHudiRecovery \
  --master yarn --deploy-mode cluster \
  --executor-memory 10g --driver-memory 5g --executor-cores 2 --num-executors 20 \
  --queue default \
  --jars /data/sas01/opt/module/hudi-0.12.0/packaging/hudi-spark-bundle/target/hudi-spark2.4-bundle_2.11-0.12.0.jar \
  /data/sas01/crontabprogram2/zytongzhan2/honeycomb-hudi08-download-1.0-SNAPSHOT.jar
```

**Expected behavior**

After the Hudi 0.8 table is upgraded to Hudi 0.12, two writers should be able to write to the table at the same time without error.

**Environment Description**

* Hudi version : 0.8 -> 0.12
* Spark version : 2.4.5 (Scala 2.11.12)
* Hadoop version : 2.7.7
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no, application on YARN

**Additional context**

Using Hudi 0.12 to write to the existing Hudi 0.8 table.

**hudi 0.8 hoodie.properties**

```properties
hoodie.table.precombine.field=publish_time
hoodie.table.name=zycg_article_data_day_test08
hoodie.archivelog.folder=archived
hoodie.table.type=COPY_ON_WRITE
hoodie.table.version=1
hoodie.timeline.layout.version=1
```

**hudi 0.12 hoodie.properties**

```properties
hoodie.table.precombine.field=publish_time
hoodie.table.partition.fields=partionpath
hoodie.table.type=COPY_ON_WRITE
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.version=5
hoodie.table.metadata.partitions=files
hoodie.table.recordkey.fields=doc_id
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.name=zycg_article_data_day_test08
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.datasource.write.hive_style_partitioning=false
hoodie.table.checksum=3536879415
```

**Stacktrace**

```
23/02/11 23:01:42 INFO view.FileSystemViewManager: Creating remote first table view
23/02/11 23:01:42 INFO timeline.HoodieActiveTimeline: Loaded instants upto : Option{val=[20230211225240790__rollback__COMPLETED]}
23/02/11 23:01:42 INFO transaction.SimpleConcurrentFileWritesConflictResolutionStrategy: Found conflicting writes between first operation = {actionType=commit, instantTime=2023021122489, actionState=INFLIGHT'}, second operation = {actionType=commit, instantTime=20230211224251755, actionState=COMPLETED'}, intersecting file ids [29d8e24e-f5c5-43b5-a10e-2240cc51dda0-0, 3ac5d5f6-df53-4f81-848a-316ca38107b6-0, cb1f2488-d860-4d08-aa2a-134ba89558e3-0, ea157114-677d-4011-8c63-84af3b2526e5-0, f5301297-6e18-4166-8f56-a853b5d6485b-0, f124d9a9-f04e-4655-8a4b-c45fa357b38f-0, a4681446-fb69-4ebd-a121-13323fdb62a5-0, d48017c8-56cb-4172-a92a-5caf08d605a6-0, a6fc9c73-dc74-47ad-8085-ec63915b534b-0, 637d35d4-c492-4236-955b-ce3c515cf7ee-0,
```
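The conflict in the log above is Hudi's optimistic concurrency control at work: at commit time, `SimpleConcurrentFileWritesConflictResolutionStrategy` intersects the file IDs touched by two overlapping operations, and a non-empty intersection aborts the later writer with `ConcurrentModificationException`. A minimal sketch of that check, with illustrative method names and file IDs rather than Hudi's actual API:

```java
import java.util.HashSet;
import java.util.Set;

public class OccConflictSketch {

    // Two concurrent writes conflict when the sets of file IDs they touched
    // overlap; the later writer must then be aborted and retried.
    static boolean hasConflict(Set<String> firstWriterFileIds, Set<String> secondWriterFileIds) {
        Set<String> overlap = new HashSet<>(firstWriterFileIds);
        overlap.retainAll(secondWriterFileIds);
        return !overlap.isEmpty();
    }

    public static void main(String[] args) {
        Set<String> writer1 = Set.of("fileA-0", "fileB-0");
        Set<String> writer2 = Set.of("fileB-0", "fileC-0"); // touches fileB-0 too
        Set<String> writer3 = Set.of("fileD-0");            // disjoint file group

        System.out.println(hasConflict(writer1, writer2)); // overlapping file group: conflict
        System.out.println(hasConflict(writer1, writer3)); // disjoint file groups: both commits can succeed
    }
}
```

This is why OCC behaves well only when concurrent writers mostly touch disjoint file groups; two writers upserting the same keys, as in the reproduced scenario, will keep colliding.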
[GitHub] [hudi] rfyu commented on pull request #7672: [HUDI-5557]Avoid converting columns that are not indexed in CSI
rfyu commented on PR #7672: URL: https://github.com/apache/hudi/pull/7672#issuecomment-1426980873

@alexeykudinkin A test has been added. Could you please help to review?
[GitHub] [hudi] hudi-bot commented on pull request #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
hudi-bot commented on PR #7924: URL: https://github.com/apache/hudi/pull/7924#issuecomment-1426973433

## CI report:

* 1d00cbd70323708d204e00aca22c90d66d5c2297 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15118)
[GitHub] [hudi] hudi-bot commented on pull request #7921: [HUDI-5764][DO NOT MERGE] Roll back delta commits from `HoodieIndexer` lazily in metadata table
hudi-bot commented on PR #7921: URL: https://github.com/apache/hudi/pull/7921#issuecomment-1426973423

## CI report:

* 8d961453bb808b5f6273e68a455940f2f6014605 UNKNOWN
* 00a05691b0163c7bb8e39a0a15957f3b72cd71eb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15113)
* 42a40ccdd6ee5d58b6aaf06cbc5af6bbd618dea2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15117)
[GitHub] [hudi] hudi-bot commented on pull request #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
hudi-bot commented on PR #7924: URL: https://github.com/apache/hudi/pull/7924#issuecomment-1426972303

## CI report:

* 1d00cbd70323708d204e00aca22c90d66d5c2297 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7921: [HUDI-5764][DO NOT MERGE] Roll back delta commits from `HoodieIndexer` lazily in metadata table
hudi-bot commented on PR #7921: URL: https://github.com/apache/hudi/pull/7921#issuecomment-1426972288

## CI report:

* 8d961453bb808b5f6273e68a455940f2f6014605 UNKNOWN
* 00a05691b0163c7bb8e39a0a15957f3b72cd71eb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15113)
* 42a40ccdd6ee5d58b6aaf06cbc5af6bbd618dea2 UNKNOWN
[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource
[ https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5768:
Labels: pull-request-available (was: )

> Fail to read metadata table in Spark Datasource
[GitHub] [hudi] yihua opened a new pull request, #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
yihua opened a new pull request, #7924: URL: https://github.com/apache/hudi/pull/7924

### Change Logs

Fixes Spark Datasource read of metadata table in Spark 3.

### Impact

As above.

### Risk level

low

### Documentation Update

N/A

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1426969867

## CI report:

* dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115)