[GitHub] [hudi] SteNicholas closed pull request #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig
SteNicholas closed pull request #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig
URL: https://github.com/apache/hudi/pull/7928

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope opened a new pull request, #7929: [DOCS] [WIP] Add new sources to deltastreamer docs
codope opened a new pull request, #7929:
URL: https://github.com/apache/hudi/pull/7929

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

### Impact

_Describe any public API or user-facing feature change or any performance impact._

### Risk level (write none, low, medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change._

- _The config description must be updated if new configs are added or the default value of the configs is changed._
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig
hudi-bot commented on PR #7928:
URL: https://github.com/apache/hudi/pull/7928#issuecomment-1427480348

## CI report:

* 82b52107672f324918988ef7b9b914fe992202df UNKNOWN

Bot commands — @hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
[ https://issues.apache.org/jira/browse/HUDI-5772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5772:
---------------------------------
    Labels: pull-request-available  (was: )

> Align Flink clustering configuration with HoodieClusteringConfig
> ----------------------------------------------------------------
>
>                 Key: HUDI-5772
>                 URL: https://issues.apache.org/jira/browse/HUDI-5772
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>    Affects Versions: 0.13.1
>            Reporter: Nicholas Jiang
>            Assignee: Nicholas Jiang
>            Priority: Major
>              Labels: pull-request-available
>
> In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, the
> 'clustering.plan.strategy.cluster.begin.partition',
> 'clustering.plan.strategy.cluster.end.partition',
> 'clustering.plan.strategy.partition.regex.pattern' and
> 'clustering.plan.strategy.partition.selected' options do not align with the
> clustering configuration of HoodieClusteringConfig. FlinkOptions,
> FlinkClusteringConfig and FlinkStreamerConfig should align the Flink
> clustering configuration with HoodieClusteringConfig.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[GitHub] [hudi] SteNicholas opened a new pull request, #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig
SteNicholas opened a new pull request, #7928:
URL: https://github.com/apache/hudi/pull/7928

### Change Logs

In `FlinkOptions`, `FlinkClusteringConfig` and `FlinkStreamerConfig`, the `clustering.plan.strategy.cluster.begin.partition`, `clustering.plan.strategy.cluster.end.partition`, `clustering.plan.strategy.partition.regex.pattern` and `clustering.plan.strategy.partition.selected` options do not align with the clustering configuration of `HoodieClusteringConfig`. `FlinkOptions`, `FlinkClusteringConfig` and `FlinkStreamerConfig` should align the Flink clustering configuration with `HoodieClusteringConfig`.

### Impact

Aligns the Flink clustering configuration with `HoodieClusteringConfig` in `FlinkOptions`, `FlinkClusteringConfig` and `FlinkStreamerConfig`.

### Risk level (write none, low, medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change._

- _The config description must be updated if new configs are added or the default value of the configs is changed._
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [x] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915:
URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427469490

## CI report:

* 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN
* 52ff32a1bb04340505e309191c398d95a9c8f928 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15127)
[GitHub] [hudi] hudi-bot commented on pull request #6121: [HUDI-4406] Support Flink compaction/clustering write error resolvement to avoid data loss
hudi-bot commented on PR #6121:
URL: https://github.com/apache/hudi/pull/6121#issuecomment-1427466320

## CI report:

* 52b6f55e196007f993b0506d899c48bb80b36546 UNKNOWN
* 5dc463fcade7c5a495cca1437fca8230b01d0229 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15126)
[jira] [Created] (HUDI-5772) Align Flink clustering configuration with HoodieClusteringConfig
Nicholas Jiang created HUDI-5772:
------------------------------------

             Summary: Align Flink clustering configuration with HoodieClusteringConfig
                 Key: HUDI-5772
                 URL: https://issues.apache.org/jira/browse/HUDI-5772
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink
    Affects Versions: 0.13.1
            Reporter: Nicholas Jiang
            Assignee: Nicholas Jiang

In FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig, the 'clustering.plan.strategy.cluster.begin.partition', 'clustering.plan.strategy.cluster.end.partition', 'clustering.plan.strategy.partition.regex.pattern' and 'clustering.plan.strategy.partition.selected' options do not align with the clustering configuration of HoodieClusteringConfig. FlinkOptions, FlinkClusteringConfig and FlinkStreamerConfig should align the Flink clustering configuration with HoodieClusteringConfig.
svn commit: r60068 - in /dev/hudi/hudi-0.13.0-rc3: ./ hudi-0.13.0-rc3.src.tgz hudi-0.13.0-rc3.src.tgz.asc hudi-0.13.0-rc3.src.tgz.sha512
Author: yihua
Date: Mon Feb 13 06:45:46 2023
New Revision: 60068

Log:
Add Apache Hudi 0.13.0 RC3 source release

Added:
    dev/hudi/hudi-0.13.0-rc3/
    dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz   (with props)
    dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.asc
    dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.sha512

Added: dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz
==============================================================================
Binary file - no diff available.

Propchange: dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.asc
==============================================================================
--- dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.asc (added)
+++ dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.asc Mon Feb 13 06:45:46 2023
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEEiIqTQeYA64VQqs1e+xt1BPf3cMkFAmPp1+cACgkQ+xt1BPf3
+cMn8dRAAm3le+qkP49Qnwi/t5qDWvgfUALXRH9KlUU9Efo4ChCHnuTBgmNmcjvJ/
+af2FBuxeMfg5GRbgm0bkHhYpx58CcjWPdi8zGLiL+ih5fBwqvbLZGVM/jpHtrmur
+dAoZX5Sq5MLtf8vigzAT9GfHD36g43dtWWBoYCGzfUBGi2ZETNnEAkbGF5M3lkxh
+1R9ysXk9u79Cm1UkC4HDDozDdj+U51XegyGYf+2QrGqCVeIZ69JrfF6vlIsr0Jl4
+Wj6T4ZURANjBhpA2n87r2DZhjCLobMgnQZiB1Va52U4Z6Ocu2s6Nc47nI+piLenF
+JFWj5YyFR+AzWqzTPRvj8U1CguD3bHkZfFS3ioOllkvtRh+BCGO8HXkgnmzVbv67
+RedUHBfTVdp/4PKWlg2dptLpSNzRwDFYjcyYP3yeMIQ7BfpOHPJ/Vdp/udM2+lRt
+h9+tAagSeU1nxVNxj7fgzQBVtcpsmHA0uRz1YzCco8jmSWNG7evtGU9vwYShIf0m
+LurVV3SexbK9iLhS2H2pNiuhAxvpEc3BqmaBA8KghdmjmrZmq13VSWuZiSDj8qtM
+v3S/F3J8ifVbIgbF5oXLiuZ++untmVrqnKDghYMPIy3/5GQ4XSG2ueSNG7Hz0PYV
+veoPUUcPs6aJeP2EqYYen9amSkn3fwC5bWMVBneosusdLpZLr0g=
+=n5Le
+-----END PGP SIGNATURE-----

Added: dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.sha512
==============================================================================
--- dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.sha512 (added)
+++ dev/hudi/hudi-0.13.0-rc3/hudi-0.13.0-rc3.src.tgz.sha512 Mon Feb 13 06:45:46 2023
@@ -0,0 +1 @@
+c725ce843c5800483b69098cda7a0f2380b0f8e502441335d4c613cf023202c93b6cdb4922983c029945f7af1dd1e2fa8bde58fcade87921fdff9f84f57d5559 hudi-0.13.0-rc3.src.tgz
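The release staging area above publishes a `.sha512` checksum (and a detached `.asc` PGP signature) next to the source tarball. As an illustration of the checksum half of release verification, here is a hedged Python sketch — the function names are my own, and real verification would additionally run `gpg --verify` against the `.asc` file:

```python
import hashlib

def sha512_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-512 and return its hex digest."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_checksum_line(tarball_path, sha512_line):
    """Compare a local file against a '<digest>  <name>' line,
    the format used by the published *.sha512 files."""
    expected = sha512_line.split()[0].lower()
    return sha512_of(tarball_path) == expected
```

Against the real artifact this would be called with `hudi-0.13.0-rc3.src.tgz` and the single line of `hudi-0.13.0-rc3.src.tgz.sha512` shown above.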
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427424545

## CI report:

* 50480623485bb99353655f4c6df23a2462214f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15123)
* d8560fd11027818c5f2a218deeae3b68a6fa6420 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15130)
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918:
URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427419585

## CI report:

* dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15124)
* 0f35441097e274abe020127c5bd2a5f3d46e0b99 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15129)
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633:
URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427419115

## CI report:

* 50480623485bb99353655f4c6df23a2462214f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15123)
* d8560fd11027818c5f2a218deeae3b68a6fa6420 UNKNOWN
[hudi] annotated tag release-0.13.0-rc3 updated (fe664886029 -> 91c28298a13)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a change to annotated tag release-0.13.0-rc3
in repository https://gitbox.apache.org/repos/asf/hudi.git

*** WARNING: tag release-0.13.0-rc3 was modified! ***

    from fe664886029 (commit)
      to 91c28298a13 (tag)
 tagging fe664886029657eb2c2c303be18aaf1c598a7181 (commit)
 replaces release-0.13.0-rc2
      by Y Ethan Guo
      on Sun Feb 12 22:24:16 2023 -0800

- Log -----------------------------------------------------------------
0.13.0
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEiIqTQeYA64VQqs1e+xt1BPf3cMkFAmPp15AACgkQ+xt1BPf3
cMllnA/+MQyKJAb9An3mmdor5jOQ9ObhkvMZVUASCHC00HkpWhRNXtKt48hXgZJ4
gzuWPI0/B5uze5JD1M9+gHXHhcvPrj2FctTMcHbFkwr1ZlMj2ulrDj1zyLR9wSqG
+8VU6w92GyURtHO9lmzCvplY1NeHr7SUOy9mIT3tsAVB9JLwLh2R0Rtd6iD3zhnq
8sDcvz0A7QJfPRzbKI3h9368FbtQM9z27+xEwaeGfqRyMDMfJ4VVUQCV79a9jVUi
5G77JsArrasjsqbAzmFkzAYC671hNNe615TA8WHExb6nzJFuMbzihajo4U2gz4K/
L2777N/DTKRLDSLcQzOinNe5kZXdAOgnDQBNlNZ/J6dvfNFU56gU9FNn3QaO9N5c
OVXT0C4yOvbh12iqnwo8wTOz4qMwauyATPqqo28liglIpNrXN0VuKFFl3ZizS7Sf
ykdD2XEDiUhzL1Rsclr9LI9pK0JUpem3SOT0mCBjOA9MUv+sp8E99XpnmnGmrUUQ
4YHpOlXYjPQFuMP0qIeKL8ThsUlbAERw9ccmVGR1ik0IHss5Cejn1r04IIIeHOhn
oWTlk6raua2J+1d6T+ZFWAZUgEk+QXeIX6LsnC/vgcKvCQ5lax3GrZA775WOciUs
9UiUt1gTmm6+bstKPfSrngnovX94X0Zs86gsUDE2RET4EyVcBY8=
=8GtY
-----END PGP SIGNATURE-----
-----------------------------------------------------------------------

No new revisions were added by this update.

Summary of changes:
[hudi] 05/05: Bumping release candidate number 3
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit fe664886029657eb2c2c303be18aaf1c598a7181 Author: Y Ethan Guo AuthorDate: Sun Feb 12 22:22:38 2023 -0800 Bumping release candidate number 3 --- docker/hoodie/hadoop/base/pom.xml| 2 +- docker/hoodie/hadoop/base_java11/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml| 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml| 2 +- docker/hoodie/hadoop/pom.xml | 2 +- docker/hoodie/hadoop/prestobase/pom.xml | 2 +- docker/hoodie/hadoop/spark_base/pom.xml | 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +- docker/hoodie/hadoop/sparkworker/pom.xml | 2 +- docker/hoodie/hadoop/trinobase/pom.xml | 2 +- docker/hoodie/hadoop/trinocoordinator/pom.xml| 2 +- docker/hoodie/hadoop/trinoworker/pom.xml | 2 +- hudi-aws/pom.xml | 4 ++-- hudi-cli/pom.xml | 2 +- hudi-client/hudi-client-common/pom.xml | 4 ++-- hudi-client/hudi-flink-client/pom.xml| 4 ++-- hudi-client/hudi-java-client/pom.xml | 4 ++-- hudi-client/hudi-spark-client/pom.xml| 4 ++-- hudi-client/pom.xml | 2 +- hudi-common/pom.xml | 2 +- hudi-examples/hudi-examples-common/pom.xml | 2 +- hudi-examples/hudi-examples-flink/pom.xml| 2 +- hudi-examples/hudi-examples-java/pom.xml | 2 +- hudi-examples/hudi-examples-spark/pom.xml| 2 +- hudi-examples/pom.xml| 2 +- hudi-flink-datasource/hudi-flink/pom.xml | 4 ++-- hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 4 ++-- hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 ++-- hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 ++-- hudi-flink-datasource/hudi-flink1.16.x/pom.xml | 4 ++-- hudi-flink-datasource/pom.xml| 4 ++-- hudi-gcp/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml | 2 +- hudi-kafka-connect/pom.xml | 4 ++-- 
hudi-platform-service/hudi-metaserver/hudi-metaserver-client/pom.xml | 2 +- hudi-platform-service/hudi-metaserver/hudi-metaserver-server/pom.xml | 2 +- hudi-platform-service/hudi-metaserver/pom.xml| 4 ++-- hudi-platform-service/pom.xml| 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark/pom.xml | 4 ++-- hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark2/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark3.1.x/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark3.2.x/pom.xml| 4 ++-- hudi-spark-datasource/hudi-spark3.2plus-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark3.3.x/pom.xml| 4 ++-- hudi-spark-datasource/pom.xml| 2 +- hudi-sync/hudi-adb-sync/pom.xml | 2 +- hudi-sync/hudi-datahub-sync/pom.xml | 2 +- hudi-sync/hudi-hive-sync/pom.xml | 2 +- hudi-sync/hudi-sync-common/pom.xml | 2 +- hudi-sync/pom.xml| 2 +- hudi-tests-common/pom.xml| 2 +- hudi-timeline-service/pom.xml| 2 +- hudi-utilities/pom.xml | 2 +-
[hudi] branch release-0.13.0 updated (820006e025a -> fe664886029)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git from 820006e025a [HUDI-5718] Unsupported Operation Exception for compaction (#7874) new 847e7a975bf [HUDI-5758] Restoring state of `HoodieKey` to make sure it's binary compatible w/ its state in 0.12 (#7917) new 7ccf6e67827 [HUDI-5768] Fix Spark Datasource read of metadata table (#7924) new d4106f35b4a [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921) new 4254fc9f482 [HUDI-5771] Improve deploy script of release artifacts (#7927) new fe664886029 Bumping release candidate number 3 The 5 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: docker/hoodie/hadoop/base/pom.xml | 2 +- docker/hoodie/hadoop/base_java11/pom.xml | 2 +- docker/hoodie/hadoop/datanode/pom.xml | 2 +- docker/hoodie/hadoop/historyserver/pom.xml | 2 +- docker/hoodie/hadoop/hive_base/pom.xml | 2 +- docker/hoodie/hadoop/namenode/pom.xml | 2 +- docker/hoodie/hadoop/pom.xml | 2 +- docker/hoodie/hadoop/prestobase/pom.xml| 2 +- docker/hoodie/hadoop/spark_base/pom.xml| 2 +- docker/hoodie/hadoop/sparkadhoc/pom.xml| 2 +- docker/hoodie/hadoop/sparkmaster/pom.xml | 2 +- docker/hoodie/hadoop/sparkworker/pom.xml | 2 +- docker/hoodie/hadoop/trinobase/pom.xml | 2 +- docker/hoodie/hadoop/trinocoordinator/pom.xml | 2 +- docker/hoodie/hadoop/trinoworker/pom.xml | 2 +- hudi-aws/pom.xml | 4 +- hudi-cli/pom.xml | 2 +- hudi-client/hudi-client-common/pom.xml | 4 +- .../hudi/client/BaseHoodieTableServiceClient.java | 48 + .../apache/hudi/client/BaseHoodieWriteClient.java | 13 +++ .../metadata/HoodieBackedTableMetadataWriter.java | 34 --- .../java/org/apache/hudi/table/HoodieTable.java| 38 +++- 
.../table/action/index/RunIndexActionExecutor.java | 5 +- hudi-client/hudi-flink-client/pom.xml | 4 +- .../FlinkHoodieBackedTableMetadataWriter.java | 21 +++- .../org/apache/hudi/table/HoodieFlinkTable.java| 12 ++- hudi-client/hudi-java-client/pom.xml | 4 +- hudi-client/hudi-spark-client/pom.xml | 4 +- .../SparkHoodieBackedTableMetadataWriter.java | 20 +++- .../org/apache/hudi/table/HoodieSparkTable.java| 10 +- .../apache/spark/HoodieSparkKryoRegistrar.scala| 25 - hudi-client/pom.xml| 2 +- hudi-common/pom.xml| 2 +- .../org/apache/hudi/common/model/DeleteRecord.java | 9 ++ .../org/apache/hudi/common/model/HoodieKey.java| 28 ++ .../common/table/log/block/HoodieDeleteBlock.java | 2 + .../hudi/metadata/HoodieBackedTableMetadata.java | 12 ++- .../hudi/metadata/HoodieTableMetadataUtil.java | 20 hudi-examples/hudi-examples-common/pom.xml | 2 +- hudi-examples/hudi-examples-flink/pom.xml | 2 +- hudi-examples/hudi-examples-java/pom.xml | 2 +- hudi-examples/hudi-examples-spark/pom.xml | 2 +- hudi-examples/pom.xml | 2 +- hudi-flink-datasource/hudi-flink/pom.xml | 4 +- hudi-flink-datasource/hudi-flink1.13.x/pom.xml | 4 +- hudi-flink-datasource/hudi-flink1.14.x/pom.xml | 4 +- hudi-flink-datasource/hudi-flink1.15.x/pom.xml | 4 +- hudi-flink-datasource/hudi-flink1.16.x/pom.xml | 4 +- hudi-flink-datasource/pom.xml | 4 +- hudi-gcp/pom.xml | 2 +- hudi-hadoop-mr/pom.xml | 2 +- hudi-integ-test/pom.xml| 2 +- hudi-kafka-connect/pom.xml | 4 +- .../hudi-metaserver/hudi-metaserver-client/pom.xml | 2 +- .../hudi-metaserver/hudi-metaserver-server/pom.xml | 2 +- hudi-platform-service/hudi-metaserver/pom.xml | 4 +- hudi-platform-service/pom.xml | 2 +- hudi-spark-datasource/hudi-spark-common/pom.xml| 4 +- .../scala/org/apache/hudi/HoodieBaseRelation.scala | 5 +- hudi-spark-datasource/hudi-spark/pom.xml | 4 +- hudi-spark-datasource/hudi-spark2-common/pom.xml | 2 +- hudi-spark-datasource/hudi-spark2/pom.xml | 4 +- hudi-spark-datasource/hudi-spark3-common/pom.xml | 2 +- 
hudi-spark-datasource/hudi-spark3.1.x/pom.xml | 4 +- hudi-spark-datasource/hudi-spark3.2.x/pom.xml | 4 +-
[hudi] 03/05: [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit d4106f35b4aee53ea5cb1430288f397b37c81183 Author: Y Ethan Guo AuthorDate: Sun Feb 12 03:30:10 2023 -0800 [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921) Fixes two issues: - Makes the rollback of indexing delta commit lazy in the metadata table, otherwise, it would be cleaned up eagerly by other regular writes. - Uses a suffix (004) appending to the up-to-instant used by the async index to avoid collision with existing completed delta commit of the same instant time. --- .../hudi/client/BaseHoodieTableServiceClient.java | 48 + .../apache/hudi/client/BaseHoodieWriteClient.java | 13 +++ .../metadata/HoodieBackedTableMetadataWriter.java | 34 --- .../java/org/apache/hudi/table/HoodieTable.java| 38 +++- .../table/action/index/RunIndexActionExecutor.java | 5 +- .../FlinkHoodieBackedTableMetadataWriter.java | 21 +++- .../org/apache/hudi/table/HoodieFlinkTable.java| 12 ++- .../SparkHoodieBackedTableMetadataWriter.java | 20 +++- .../org/apache/hudi/table/HoodieSparkTable.java| 10 +- .../hudi/metadata/HoodieBackedTableMetadata.java | 12 ++- .../hudi/metadata/HoodieTableMetadataUtil.java | 20 .../apache/hudi/utilities/TestHoodieIndexer.java | 108 +++-- 12 files changed, 298 insertions(+), 43 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java index 390bc4b9714..301ed61bf4e 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java @@ -48,6 +48,7 @@ import org.apache.hudi.config.HoodieWriteConfig; import 
org.apache.hudi.exception.HoodieException; import org.apache.hudi.exception.HoodieIOException; import org.apache.hudi.exception.HoodieRollbackException; +import org.apache.hudi.metadata.HoodieTableMetadata; import org.apache.hudi.metadata.HoodieTableMetadataWriter; import org.apache.hudi.table.HoodieTable; import org.apache.hudi.table.action.HoodieWriteMetadata; @@ -71,6 +72,7 @@ import java.util.stream.Collectors; import java.util.stream.Stream; import static org.apache.hudi.common.util.ValidationUtils.checkArgument; +import static org.apache.hudi.metadata.HoodieTableMetadataUtil.isIndexingCommit; public abstract class BaseHoodieTableServiceClient extends BaseHoodieClient implements RunsTableService { @@ -659,8 +661,41 @@ public abstract class BaseHoodieTableServiceClient extends BaseHoodieClient i return infoMap; } + /** + * Rolls back the failed delta commits corresponding to the indexing action. + * Such delta commits are identified based on the suffix `METADATA_INDEXER_TIME_SUFFIX` ("004"). + * + * TODO(HUDI-5733): This should be cleaned up once the proper fix of rollbacks + * in the metadata table is landed. + * + * @return {@code true} if rollback happens; {@code false} otherwise. 
+ */ + protected boolean rollbackFailedIndexingCommits() { +HoodieTable table = createTable(config, hadoopConf); +List instantsToRollback = getFailedIndexingCommitsToRollback(table.getMetaClient()); +Map> pendingRollbacks = getPendingRollbackInfos(table.getMetaClient()); +instantsToRollback.forEach(entry -> pendingRollbacks.putIfAbsent(entry, Option.empty())); +rollbackFailedWrites(pendingRollbacks); +return !pendingRollbacks.isEmpty(); + } + + protected List getFailedIndexingCommitsToRollback(HoodieTableMetaClient metaClient) { +Stream inflightInstantsStream = metaClient.getCommitsTimeline() +.filter(instant -> !instant.isCompleted() +&& isIndexingCommit(instant.getTimestamp())) +.getInstantsAsStream(); +return inflightInstantsStream.filter(instant -> { + try { +return heartbeatClient.isHeartbeatExpired(instant.getTimestamp()); + } catch (IOException io) { +throw new HoodieException("Failed to check heartbeat for instant " + instant, io); + } +}).map(HoodieInstant::getTimestamp).collect(Collectors.toList()); + } + /** * Rollback all failed writes. + * * @return true if rollback was triggered. false otherwise. */ protected Boolean rollbackFailedWrites() { @@ -699,6 +734,19 @@ public abstract class BaseHoodieTableServiceClient extends BaseHoodieClient i Stream inflightInstantsStream = getInflightTimelineExcludeCompactionAndClustering(metaClient) .getReverseOrderedInstants(); if (cleaningPolicy.isEager()) { + // Metadata table uses eager cleaning policy, but
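The lazy-rollback logic in this commit hinges on two checks: an inflight delta commit is recognized as indexer-created by the `004` timestamp suffix, and it is only rolled back once the writer's heartbeat has expired. A simplified Python sketch of that selection follows — the `Instant` class and function names here are illustrative stand-ins, not Hudi's actual API:

```python
from dataclasses import dataclass
from typing import Callable, List

# Suffix the commit message describes: the async indexer appends "004" to the
# up-to instant time to avoid colliding with an existing completed delta
# commit of the same instant time.
METADATA_INDEXER_TIME_SUFFIX = "004"

@dataclass
class Instant:
    timestamp: str
    completed: bool

def is_indexing_commit(timestamp: str) -> bool:
    # Stand-in for HoodieTableMetadataUtil.isIndexingCommit
    return timestamp.endswith(METADATA_INDEXER_TIME_SUFFIX)

def failed_indexing_commits_to_rollback(
        instants: List[Instant],
        heartbeat_expired: Callable[[str], bool]) -> List[str]:
    """Select inflight indexing delta commits whose writer heartbeat has
    expired; only these are rolled back (lazily), so regular eager cleaning
    no longer removes in-progress indexing commits."""
    return [i.timestamp for i in instants
            if not i.completed
            and is_indexing_commit(i.timestamp)
            and heartbeat_expired(i.timestamp)]
```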
[hudi] 02/05: [HUDI-5768] Fix Spark Datasource read of metadata table (#7924)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 7ccf6e678278ceca592b8d95160bb0b17906928f Author: Y Ethan Guo AuthorDate: Sun Feb 12 03:25:51 2023 -0800 [HUDI-5768] Fix Spark Datasource read of metadata table (#7924) --- .../src/main/scala/org/apache/hudi/HoodieBaseRelation.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala index bf3d38b808d..8a730a8334b 100644 --- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala +++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala @@ -42,6 +42,7 @@ import org.apache.hudi.internal.schema.convert.AvroInternalSchemaConverter import org.apache.hudi.internal.schema.utils.{InternalSchemaUtils, SerDeHelper} import org.apache.hudi.internal.schema.{HoodieSchemaException, InternalSchema} import org.apache.hudi.io.storage.HoodieAvroHFileReader +import org.apache.hudi.metadata.HoodieTableMetadata import org.apache.spark.execution.datasources.HoodieInMemoryFileIndex import org.apache.spark.internal.Logging import org.apache.spark.rdd.RDD @@ -59,7 +60,6 @@ import org.apache.spark.sql.{Row, SQLContext, SparkSession} import org.apache.spark.unsafe.types.UTF8String import java.net.URI -import java.util.Locale import scala.collection.JavaConverters._ import scala.util.control.NonFatal import scala.util.{Failure, Success, Try} @@ -292,7 +292,8 @@ abstract class HoodieBaseRelation(val sqlContext: SQLContext, * Determines whether relation's schema could be pruned by Spark's Optimizer */ def canPruneRelationSchema: Boolean = -(fileFormat.isInstanceOf[ParquetFileFormat] || fileFormat.isInstanceOf[OrcFileFormat]) 
&& +!HoodieTableMetadata.isMetadataTable(basePath.toString) && + (fileFormat.isInstanceOf[ParquetFileFormat] || fileFormat.isInstanceOf[OrcFileFormat]) && // NOTE: In case this relation has already been pruned there's no point in pruning it again prunedDataSchema.isEmpty && // TODO(HUDI-5421) internal schema doesn't support nested schema pruning currently
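The one-line fix above turns `canPruneRelationSchema` into a conjunction: pruning is allowed only when the relation is not the metadata table, the base file format is Parquet or ORC, and the schema has not already been pruned. Here is a hedged Python restatement of that predicate (simplified; it omits the internal-schema condition mentioned in the TODO):

```python
def can_prune_relation_schema(is_metadata_table: bool,
                              file_format: str,
                              already_pruned: bool) -> bool:
    """Mirrors the guard in HoodieBaseRelation.canPruneRelationSchema after
    HUDI-5768: reads of the metadata table must never prune the schema."""
    return (not is_metadata_table
            and file_format in ("parquet", "orc")
            and not already_pruned)
```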
[hudi] 01/05: [HUDI-5758] Restoring state of `HoodieKey` to make sure it's binary compatible w/ its state in 0.12 (#7917)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 847e7a975bfeb94956885cc252285f95afc4a843 Author: Alexey Kudinkin AuthorDate: Fri Feb 10 15:02:47 2023 -0800 [HUDI-5758] Restoring state of `HoodieKey` to make sure it's binary compatible w/ its state in 0.12 (#7917) RFC-46 modified `HoodieKey` to substantially optimize its serialized footprint (while using Kryo) by making it explicitly serializable by Kryo (inheriting form `KryoSerializable`, making it final). However, this broken its binary compatibility w/ the state as it was in 0.12.2. Unfortunately, this entailed that as this class is used in `DeleteRecord` w/in `HoodieDeleteBlock` that it also made impossible to read such blocks created by prior Hudi versions (more details in HUDI-5758). This PR restores previous state for `HoodieKey` to make sure it stays binary compatible w/ existing persisted `HoodieDeleteBlock` created by prior Hudi versions --- .../apache/spark/HoodieSparkKryoRegistrar.scala| 25 +-- .../org/apache/hudi/common/model/DeleteRecord.java | 9 +++ .../org/apache/hudi/common/model/HoodieKey.java| 28 -- .../common/table/log/block/HoodieDeleteBlock.java | 2 ++ 4 files changed, 44 insertions(+), 20 deletions(-) diff --git a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/HoodieSparkKryoRegistrar.scala b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/HoodieSparkKryoRegistrar.scala index 3894065d809..9d7fa3b784f 100644 --- a/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/HoodieSparkKryoRegistrar.scala +++ b/hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/HoodieSparkKryoRegistrar.scala @@ -18,11 +18,12 @@ package org.apache.spark -import com.esotericsoftware.kryo.Kryo +import com.esotericsoftware.kryo.io.{Input, Output} +import com.esotericsoftware.kryo.{Kryo, Serializer} import 
com.esotericsoftware.kryo.serializers.JavaSerializer import org.apache.hudi.client.model.HoodieInternalRow import org.apache.hudi.common.config.SerializableConfiguration -import org.apache.hudi.common.model.HoodieSparkRecord +import org.apache.hudi.common.model.{HoodieKey, HoodieSparkRecord} import org.apache.hudi.common.util.HoodieCommonKryoRegistrar import org.apache.hudi.config.HoodieWriteConfig import org.apache.spark.serializer.KryoRegistrator @@ -44,12 +45,15 @@ import org.apache.spark.serializer.KryoRegistrator * */ class HoodieSparkKryoRegistrar extends HoodieCommonKryoRegistrar with KryoRegistrator { + override def registerClasses(kryo: Kryo): Unit = { /// // NOTE: DO NOT REORDER REGISTRATIONS /// super[HoodieCommonKryoRegistrar].registerClasses(kryo) +kryo.register(classOf[HoodieKey], new HoodieKeySerializer) + kryo.register(classOf[HoodieWriteConfig]) kryo.register(classOf[HoodieSparkRecord]) @@ -59,6 +63,23 @@ class HoodieSparkKryoRegistrar extends HoodieCommonKryoRegistrar with KryoRegist // we're relying on [[SerializableConfiguration]] wrapper to work it around kryo.register(classOf[SerializableConfiguration], new JavaSerializer()) } + + /** + * NOTE: This {@link Serializer} could deserialize instance of {@link HoodieKey} serialized + * by implicitly generated Kryo serializer (based on {@link com.esotericsoftware.kryo.serializers.FieldSerializer} + */ + class HoodieKeySerializer extends Serializer[HoodieKey] { +override def write(kryo: Kryo, output: Output, key: HoodieKey): Unit = { + output.writeString(key.getRecordKey) + output.writeString(key.getPartitionPath) +} + +override def read(kryo: Kryo, input: Input, klass: Class[HoodieKey]): HoodieKey = { + val recordKey = input.readString() + val partitionPath = input.readString() + new HoodieKey(recordKey, partitionPath) +} + } } object HoodieSparkKryoRegistrar { diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/DeleteRecord.java 
b/hudi-common/src/main/java/org/apache/hudi/common/model/DeleteRecord.java index 003b591c20c..296e95e8bfa 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/model/DeleteRecord.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/model/DeleteRecord.java @@ -28,6 +28,15 @@ import java.util.Objects; * we need to keep the ordering val to combine with the data records when merging, or the data loss * may occur if there are intermediate deletions for the inputs * (a new INSERT comes after a DELETE in one input batch). + * + * NOTE: PLEASE READ CAREFULLY BEFORE CHANGING + * + * This class is serialized (using Kryo) as part of {@code HoodieDeleteBlock} to
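The fix above replaces Kryo's implicitly generated `FieldSerializer` with a hand-written serializer that emits the two `HoodieKey` fields in a fixed order, so the byte layout no longer depends on how the class happens to be declared. The following Python sketch is illustrative only (it is not Hudi's actual Kryo wire format) and shows why an explicit field-order serializer keeps the format stable across class-definition changes:

```python
# Illustrative sketch only, NOT Hudi's real wire format: serialize the two
# HoodieKey fields (recordKey, partitionPath) in a fixed order as
# length-prefixed UTF-8 strings, so old readers and new readers agree.
import struct

def write_key(record_key, partition_path):
    out = b""
    for field in (record_key, partition_path):
        data = field.encode("utf-8")
        out += struct.pack(">I", len(data)) + data  # 4-byte big-endian length, then bytes
    return out

def read_key(buf):
    fields, offset = [], 0
    for _ in range(2):
        (length,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        fields.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return fields[0], fields[1]
```

Because the writer and reader agree on an explicit field order rather than on reflection over the class, adding or reordering members of the class later cannot silently change the persisted layout.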
[hudi] 04/05: [HUDI-5771] Improve deploy script of release artifacts (#7927)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch release-0.13.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 4254fc9f4829733c24d4c22c78ae855df7755798 Author: Y Ethan Guo AuthorDate: Sun Feb 12 22:14:31 2023 -0800 [HUDI-5771] Improve deploy script of release artifacts (#7927) The current scripts/release/deploy_staging_jars.sh took around 6 hours to upload all release artifacts to the Apache Nexus staging repository, which is too long. This commit cuts down the upload time by 70% to <2 hours, without changing the intended jars for uploads. --- scripts/release/deploy_staging_jars.sh | 74 -- 1 file changed, 34 insertions(+), 40 deletions(-) diff --git a/scripts/release/deploy_staging_jars.sh b/scripts/release/deploy_staging_jars.sh index 049e5ee7144..7d44e5ffa96 100755 --- a/scripts/release/deploy_staging_jars.sh +++ b/scripts/release/deploy_staging_jars.sh @@ -36,38 +36,41 @@ if [ "$#" -gt "1" ]; then exit 1 fi -BUNDLE_MODULES=$(find -s packaging -name 'hudi-*-bundle' -type d) -BUNDLE_MODULES_EXCLUDED="-${BUNDLE_MODULES//$'\n'/,-}" - declare -a ALL_VERSION_OPTS=( -# upload all module jars and bundle jars -"-Dscala-2.11 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED" -"-Dscala-2.12 -Dspark2.4 -pl $BUNDLE_MODULES_EXCLUDED" -"-Dscala-2.12 -Dspark3.3 -pl $BUNDLE_MODULES_EXCLUDED" -"-Dscala-2.12 -Dspark3.2 -pl $BUNDLE_MODULES_EXCLUDED" -"-Dscala-2.12 -Dspark3.1" # this profile goes last in this section to ensure bundles use avro 1.8 - -# spark bundles -"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am" +# Upload Spark specific modules and bundle jars +# For Spark 2.4, Scala 2.11: +# hudi-spark-common_2.11 +# hudi-spark_2.11 +# hudi-spark2_2.11 +# hudi-utilities_2.11 +# hudi-cli-bundle_2.11 +# hudi-spark2.4-bundle_2.11 +# hudi-utilities-bundle_2.11 +# hudi-utilities-slim-bundle_2.11 +"-Dscala-2.11 -Dspark2.4 -pl 
hudi-spark-datasource/hudi-spark-common,hudi-spark-datasource/hudi-spark2,hudi-spark-datasource/hudi-spark,hudi-utilities,packaging/hudi-spark-bundle,packaging/hudi-cli-bundle,packaging/hudi-utilities-bundle,packaging/hudi-utilities-slim-bundle -am" +# For Spark 2.4, Scala 2.12: +# hudi-spark2.4-bundle_2.12 "-Dscala-2.12 -Dspark2.4 -pl packaging/hudi-spark-bundle -am" -"-Dscala-2.12 -Dspark3.3 -pl packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am" -"-Dscala-2.12 -Dspark3.2 -pl packaging/hudi-spark-bundle -am" -"-Dscala-2.12 -Dspark3.1 -pl packaging/hudi-spark-bundle -am" - -# spark bundles (legacy) (not overwriting previous uploads as these jar names are unique) +# For Spark 3.2, Scala 2.12: +# hudi-spark3.2.x_2.12 +# hudi-spark3.2plus-common +# hudi-spark3.2-bundle_2.12 +"-Dscala-2.12 -Dspark3.2 -pl hudi-spark-datasource/hudi-spark3.2.x,hudi-spark-datasource/hudi-spark3.2plus-common,packaging/hudi-spark-bundle -am" +# For Spark 3.1, Scala 2.12: +# All other modules and bundles using avro 1.8 +"-Dscala-2.12 -Dspark3.1" +# For Spark 3.3, Scala 2.12: +# hudi-spark3.3.x_2.12 +# hudi-cli-bundle_2.12 +# hudi-spark3.3-bundle_2.12 +"-Dscala-2.12 -Dspark3.3 -pl hudi-spark-datasource/hudi-spark3.3.x,packaging/hudi-spark-bundle,packaging/hudi-cli-bundle -am" + +# Upload legacy Spark bundles (not overwriting previous uploads as these jar names are unique) "-Dscala-2.11 -Dspark2 -pl packaging/hudi-spark-bundle -am" # for legacy bundle name hudi-spark-bundle_2.11 "-Dscala-2.12 -Dspark2 -pl packaging/hudi-spark-bundle -am" # for legacy bundle name hudi-spark-bundle_2.12 "-Dscala-2.12 -Dspark3 -pl packaging/hudi-spark-bundle -am" # for legacy bundle name hudi-spark3-bundle_2.12 -# utilities bundles (legacy) (overwriting previous uploads) -"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-utilities-bundle -am" # hudi-utilities-bundle_2.11 is for spark 2.4 only -"-Dscala-2.12 -Dspark3.1 -pl packaging/hudi-utilities-bundle -am" # hudi-utilities-bundle_2.12 is for spark 3.1 only - 
-# utilities slim bundles -"-Dscala-2.11 -Dspark2.4 -pl packaging/hudi-utilities-slim-bundle -am" # hudi-utilities-slim-bundle_2.11 -"-Dscala-2.12 -Dspark3.1 -pl packaging/hudi-utilities-slim-bundle -am" # hudi-utilities-slim-bundle_2.12 - -# flink bundles (overwriting previous uploads) +# Upload Flink bundles (overwriting previous uploads) "-Dscala-2.12 -Dflink1.13 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle -am" "-Dscala-2.12 -Dflink1.14 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle -am" "-Dscala-2.12 -Dflink1.15 -Davro.version=1.10.0 -pl packaging/hudi-flink-bundle -am" @@ -105,20 +108,11 @@ COMMON_OPTIONS="-DdeployArtifacts=true -DskipTests -DretryFailedDeploymentCount= for v in "${ALL_VERSION_OPTS[@]}" do # TODO: consider cleaning all modules by listing directories instead of specifying profile - if [[ "$v" == *"$BUNDLE_MODULES_EXCLUDED" ]]; then -# When deploying jars with bundle exclusions, we still need to build the
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427414425 ## CI report: * dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15124) * 0f35441097e274abe020127c5bd2a5f3d46e0b99 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (e25381c6966 -> a932e482408)
yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from e25381c6966 [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921) add a932e482408 [HUDI-5771] Improve deploy script of release artifacts (#7927) No new revisions were added by this update. Summary of changes: scripts/release/deploy_staging_jars.sh | 74 -- 1 file changed, 34 insertions(+), 40 deletions(-)
[GitHub] [hudi] yihua merged pull request #7927: [HUDI-5771] Improve deploy script of release artifacts
yihua merged PR #7927: URL: https://github.com/apache/hudi/pull/7927
[GitHub] [hudi] yihua commented on a diff in pull request #7927: [HUDI-5771] Improve deploy script of release artifacts
yihua commented on code in PR #7927: URL: https://github.com/apache/hudi/pull/7927#discussion_r1104034135 ## scripts/release/deploy_staging_jars.sh: ## Review Comment: Yes, it is still uploaded. If you check the [staging_file_timestamp.txt](https://github.com/apache/hudi/files/10719051/staging_file_timestamp.txt), `hudi-spark2-common` is uploaded by the `-Dscala-2.12 -Dspark3.1` profile. I keep it the same for now.
[GitHub] [hudi] xushiyan commented on a diff in pull request #7927: [HUDI-5771] Improve deploy script of release artifacts
xushiyan commented on code in PR #7927: URL: https://github.com/apache/hudi/pull/7927#discussion_r1104030123 ## scripts/release/deploy_staging_jars.sh: ## Review Comment: there is a `hudi-spark2-common`, which is a placeholder module and empty. Though it won't affect things, it should still be added to keep consistent with the existing modules.
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
[ https://issues.apache.org/jira/browse/HUDI-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5771: - Labels: pull-request-available (was: ) > Improve deploy script of release artifacts > -- > > Key: HUDI-5771 > URL: https://issues.apache.org/jira/browse/HUDI-5771 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > Current script is inefficient as some artifacts are repeatedly uploaded which > wastes time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [hudi] yihua opened a new pull request, #7927: [HUDI-5771] Improve deploy script of release artifacts
yihua opened a new pull request, #7927: URL: https://github.com/apache/hudi/pull/7927 ### Change Logs The current `scripts/release/deploy_staging_jars.sh` took around 6 hours to upload all release artifacts to the Apache Nexus staging repository, which is too long. After analyzing the upload sequence, there are repeated uploads of the same module that can be avoided. After carefully reviewing the deploy script and logs, I make the following changes to cut down the upload time by 70%, without changing the intended jars for uploads: - For each profile (e.g., `-Dscala-2.12 -Dspark3.2`), only make one mvn build - Remove overlapping build targets among different profiles - For Spark 2.4, Scala 2.11: `hudi-spark-common_2.11`, `hudi-spark_2.11`, `hudi-spark2_2.11`, `hudi-utilities_2.11`, `hudi-cli-bundle_2.11`, `hudi-spark2.4-bundle_2.11`, `hudi-utilities-bundle_2.11`, `hudi-utilities-slim-bundle_2.11` - For Spark 2.4, Scala 2.12: `hudi-spark2.4-bundle_2.12` - For Spark 3.2, Scala 2.12: `hudi-spark3.2.x_2.12`, `hudi-spark3.2plus-common`, `hudi-spark3.2-bundle_2.12` - For Spark 3.3, Scala 2.12: `hudi-spark3.3.x_2.12`, `hudi-cli-bundle_2.12`, `hudi-spark3.3-bundle_2.12` - For Spark 3.1, Scala 2.12: all other modules and bundles (`hudi-cli-bundle_2.12` is not overridden) Legacy Spark bundles and Flink bundles are not changed. Raw logs: - Summary of existing upload sequence: [deploy_sequence.txt](https://github.com/apache/hudi/files/10719044/deploy_sequence.txt) - Last modified times of uploaded artifacts for analyzing the relevant upload and profile: [staging_file_timestamp.txt](https://github.com/apache/hudi/files/10719051/staging_file_timestamp.txt) ### Impact Significantly reduces the time (by ~70%, from 6 hours to <2 hours) of uploading all release artifacts to the Apache Nexus staging repository. 
### Risk level low ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
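The core idea of the change described above is that each Maven invocation should build a disjoint set of modules, so no artifact is uploaded to the staging repository twice. A small Python sketch of that dedup check (module names abbreviated from the lists above; this is not the deploy script itself):

```python
# Sketch of the dedup reasoning behind the script change: given the modules
# each profile would upload, find any module that an earlier profile already
# uploaded. With disjoint lists this set is empty. Module lists are
# illustrative, taken from the PR description.
profiles = {
    "-Dscala-2.11 -Dspark2.4": {"hudi-spark-common_2.11", "hudi-spark2.4-bundle_2.11"},
    "-Dscala-2.12 -Dspark2.4": {"hudi-spark2.4-bundle_2.12"},
    "-Dscala-2.12 -Dspark3.2": {"hudi-spark3.2.x_2.12", "hudi-spark3.2-bundle_2.12"},
}

def repeated_uploads(profiles):
    seen, repeats = set(), set()
    for modules in profiles.values():  # profiles run in order
        repeats |= seen & modules      # modules an earlier profile already uploaded
        seen |= modules
    return repeats
```

In the old script the broad `-pl $BUNDLE_MODULES_EXCLUDED` invocations overlapped heavily, so `repeated_uploads` over those lists would be large; making the per-profile lists disjoint is what removes the wasted upload time.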
[GitHub] [hudi] nfarah86 commented on pull request #7926: updated hudi content
nfarah86 commented on PR #7926: URL: https://github.com/apache/hudi/pull/7926#issuecomment-1427375751 cc @bhasudha to review
[GitHub] [hudi] nfarah86 opened a new pull request, #7926: updated hudi content
nfarah86 opened a new pull request, #7926: URL: https://github.com/apache/hudi/pull/7926 ### Change Logs updated videos and blog content; blog image is null, but the file is added https://user-images.githubusercontent.com/5392555/218378737-f301ffb5-e41f-40fb-97a7-44d06c20d306.png https://user-images.githubusercontent.com/5392555/218378739-2d2eebd1-c6c6-4d83-8ecf-8072f4f8a186.png
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427368880 ## CI report: * dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15124)
[jira] [Assigned] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo reassigned HUDI-5771: Assignee: Ethan Guo
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo updated HUDI-5771: Description: Current script is inefficient as some artifacts are repeatedly uploaded which wastes time.
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo updated HUDI-5771: Story Points: 3
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo updated HUDI-5771: Fix Version/s: 0.13.0
[jira] [Created] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo created HUDI-5771: --- Summary: Improve deploy script of release artifacts Key: HUDI-5771 URL: https://issues.apache.org/jira/browse/HUDI-5771 Project: Apache Hudi Issue Type: Improvement Reporter: Ethan Guo
[jira] [Updated] (HUDI-5771) Improve deploy script of release artifacts
Ethan Guo updated HUDI-5771: Priority: Blocker (was: Major)
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427336585 ## CI report: * 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN * 6b3cafb7422b1cb3bfb49557327effc2b144dc58 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15125) * 52ff32a1bb04340505e309191c398d95a9c8f928 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15127)
[GitHub] [hudi] hudi-bot commented on pull request #6121: [HUDI-4406] Support Flink compaction/clustering write error resolvement to avoid data loss
hudi-bot commented on PR #6121: URL: https://github.com/apache/hudi/pull/6121#issuecomment-1427335554 ## CI report: * 52b6f55e196007f993b0506d899c48bb80b36546 UNKNOWN * c52a60118c2e7fba170ea1cea0c4105ff83c52f9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15089) * 5dc463fcade7c5a495cca1437fca8230b01d0229 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15126)
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427332991 ## CI report: * 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN * 7f2456c65f6d17280fd6abe3185edc5a7f4d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15080) * 6b3cafb7422b1cb3bfb49557327effc2b144dc58 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15125) * 52ff32a1bb04340505e309191c398d95a9c8f928 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #6121: [HUDI-4406] Support Flink compaction/clustering write error resolvement to avoid data loss
hudi-bot commented on PR #6121: URL: https://github.com/apache/hudi/pull/6121#issuecomment-1427328065 ## CI report: * 52b6f55e196007f993b0506d899c48bb80b36546 UNKNOWN * c52a60118c2e7fba170ea1cea0c4105ff83c52f9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15089) * 5dc463fcade7c5a495cca1437fca8230b01d0229 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633: URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427322176 ## CI report: * 50480623485bb99353655f4c6df23a2462214f7f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15123)
[GitHub] [hudi] chenshzh commented on a diff in pull request #6121: [HUDI-4406] Support Flink compaction/clustering write error resolvement to avoid data loss
chenshzh commented on code in PR #6121: URL: https://github.com/apache/hudi/pull/6121#discussion_r1103982916 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringCommitSink.java: ## @@ -119,7 +119,16 @@ private void commitIfNecessary(String instant, List event return; } -if (events.stream().anyMatch(ClusteringCommitEvent::isFailed)) { +// here we should take the write errors under consideration +// as some write errors might cause data loss when clustering +List statuses = events.stream() Review Comment: Agree that `isFailed` always indicates an execution failure that should be rolled back. So in the update we will judge whether to roll back on write status errors when the config `FlinkOptions.IGNORE_FAILED` is false. Please take a review.
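The commit decision being discussed can be sketched as follows. This is a hypothetical Python model, not Hudi's Flink sink code: the class and field names (`ClusteringCommitEvent`, `WriteStatus`, `total_error_records`, the `ignore_failed` flag) are illustrative stand-ins for the Java types mentioned in the review.

```python
# Hypothetical sketch of the clustering commit decision: roll back when any
# event failed outright, or when write statuses carry error records and the
# job is configured NOT to ignore failed writes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class WriteStatus:
    total_error_records: int = 0

@dataclass
class ClusteringCommitEvent:
    failed: bool = False
    write_statuses: List[WriteStatus] = field(default_factory=list)

def should_rollback(events, ignore_failed):
    if any(e.failed for e in events):
        return True   # execution failure: always roll back
    if ignore_failed:
        return False  # user opted to tolerate per-record write errors
    return any(s.total_error_records > 0
               for e in events for s in e.write_statuses)
```

The key distinction mirrored here is the one made in the review: an execution failure always forces a rollback, while data-level write errors only do so when the ignore-failed option is off.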
[GitHub] [hudi] zhuanshenbsj1 commented on a diff in pull request #7891: [HUDI-5728] HoodieTimelineArchiver archives the latest instant before inflight replacecommit
zhuanshenbsj1 commented on code in PR #7891: URL: https://github.com/apache/hudi/pull/7891#discussion_r1103965694 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -473,6 +473,33 @@ private Stream getCommitInstantsToArchive() throws IOException { HoodieTimeline.compareTimestamps(s.getTimestamp(), LESSER_THAN, instantToRetain.getTimestamp())) .orElse(true) ); + + // When inline or async clustering is enabled, we need to ensure that there is a commit in the active timeline + // to check whether the file slice generated in pending clustering after archive isn't committed + // via {@code HoodieFileGroup#isFileSliceCommitted(slice)} + boolean isOldestPendingReplaceInstant = + oldestPendingCompactionAndReplaceInstant.map(instant -> + HoodieTimeline.REPLACE_COMMIT_ACTION.equals(instant.getAction())).orElse(false); + if (isOldestPendingReplaceInstant) { +List instantsToArchive = instantToArchiveStream.collect(Collectors.toList()); +Option latestInstantRetainForReplace = Option.fromJavaOptional( +instantsToArchive.stream() +.filter(s -> HoodieTimeline.compareTimestamps( +s.getTimestamp(), +LESSER_THAN, + oldestPendingCompactionAndReplaceInstant.get().getTimestamp())) +.reduce((i1, i2) -> i2)); +if (latestInstantRetainForReplace.isPresent()) { + LOG.info(String.format( + "Retaining the archived instant %s before the inflight replacecommit %s.", + latestInstantRetainForReplace.get().getTimestamp(), + oldestPendingCompactionAndReplaceInstant.get().getTimestamp())); +} +instantToArchiveStream = instantsToArchive.stream() +.filter(s -> latestInstantRetainForReplace.map(instant -> s.compareTo(instant) != 0) +.orElse(true)); + } + Review Comment: getOldestInstantToRetainForClustering(){ 1.get the first unclean clustering instant 2.get the previous commit of last inflight clustering instant 3.compare 1&2, return the earliest } -- This is an automated message from the Apache Git Service. 
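The three-step pseudocode in the review comment above can be made concrete. The sketch below is hypothetical and not Hudi's timeline API: instants are modeled as `(timestamp, action, state)` tuples with sortable timestamp strings, and the "cleaned" state is an illustrative simplification.

```python
# Runnable sketch of: 1) find the first un-cleaned clustering (replacecommit)
# instant, 2) find the commit immediately before the last inflight clustering
# instant, 3) return the earliest of the two candidates.
def oldest_instant_to_retain_for_clustering(timeline):
    """timeline: list of (timestamp, action, state), sorted by timestamp."""
    # Step 1: first clustering instant that has not been cleaned yet.
    first_uncleaned = next((ts for ts, action, state in timeline
                            if action == "replacecommit" and state != "cleaned"), None)
    # Step 2: the commit immediately preceding the last inflight clustering instant.
    inflight = [ts for ts, action, state in timeline
                if action == "replacecommit" and state == "inflight"]
    prev_commit = None
    if inflight:
        earlier = [ts for ts, action, _ in timeline
                   if action == "commit" and ts < inflight[-1]]
        prev_commit = earlier[-1] if earlier else None
    # Step 3: keep the earliest candidate (None if neither exists).
    candidates = [t for t in (first_uncleaned, prev_commit) if t is not None]
    return min(candidates) if candidates else None
```

Returning the earlier of the two candidates is what guarantees the archiver never archives past an instant that a pending clustering operation still needs for file-slice visibility checks.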
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427290880 ## CI report: * 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN * 7f2456c65f6d17280fd6abe3185edc5a7f4d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15080) * 6b3cafb7422b1cb3bfb49557327effc2b144dc58 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15125)
[GitHub] [hudi] hudi-bot commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
hudi-bot commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427286081 ## CI report: * 3609b742d773da98bd00e0a19b096ee6ede289b8 UNKNOWN * 7f2456c65f6d17280fd6abe3185edc5a7f4d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15080) * 6b3cafb7422b1cb3bfb49557327effc2b144dc58 UNKNOWN
[jira] [Commented] (HUDI-5770) Plan error when partition column is timestamp type and SQL query contains filter condition which contains partition
[ https://issues.apache.org/jira/browse/HUDI-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17687687#comment-17687687 ] Jing Zhang commented on HUDI-5770: -- The cause of this bug is similar to [HUDI-4601|https://issues.apache.org/jira/browse/HUDI-4601]. If the partition column is of timestamp type, the partition path value is not the raw column value; it is a string converted from the real value. We need to handle the case where the partition column is of timestamp/date type when applying partition pruning. > Plan error when partition column is timestamp type and SQL query contains > filter condition which contains partition > --- > > Key: HUDI-5770 > URL: https://issues.apache.org/jira/browse/HUDI-5770 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Jing Zhang >Priority: Major > > If a hudi table is a partition table, and partition column is timestamp type. > When run a flink query which contain the filter conditions on partition > column, an error would be thrown out in the plan generating phase.
> {code:java} > java.time.format.DateTimeParseException: Text '1970010100' could not be > parsed at index 0 at > java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949) > at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) > at > org.apache.flink.table.utils.DateTimeUtils.parseTimestampData(DateTimeUtils.java:413) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionFieldValue(PartitionPruner.scala:182) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1(PartitionPruner.scala:157) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1$adapted(PartitionPruner.scala:155) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionToRow(PartitionPruner.scala:155) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1(PartitionPruner.scala:137) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1$adapted(PartitionPruner.scala:132) > at scala.collection.Iterator.foreach(Iterator.scala:937) > at scala.collection.Iterator.foreach$(Iterator.scala:937) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) > at scala.collection.IterableLike.foreach(IterableLike.scala:70) > at scala.collection.IterableLike.foreach$(IterableLike.scala:69) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.prunePartitions(PartitionPruner.scala:132) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner.prunePartitions(PartitionPruner.scala) > at > 
org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.lambda$onMatch$3(PushPartitionIntoTableSourceScanRule.java:163) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.readPartitionsAndPrune(PushPartitionIntoTableSourceScanRule.java:254) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.onMatch(PushPartitionIntoTableSourceScanRule.java:172) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:64) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:78) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.$anonfun$optimize$2(FlinkGroupProgram.scala:59) > at > scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:156) > at >
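The parse failure above comes from the partition path storing the timestamp in a compact `yyyyMMddHH` form (e.g. `1970010100`), which a parser expecting a standard timestamp layout cannot read. A minimal Python sketch of the mismatch — illustrative only, not Hudi or Flink code, and `parse_partition_timestamp` is a hypothetical helper:

```python
from datetime import datetime

# The partition path stores the timestamp formatted as 'yyyyMMddHH'
# (e.g. '1970010100'), not as a standard timestamp string.
partition_value = "1970010100"

def parse_partition_timestamp(value: str) -> datetime:
    """Parse a partition-path value written with a 'yyyyMMddHH' pattern."""
    return datetime.strptime(value, "%Y%m%d%H")

# Parsing with the partition-path pattern succeeds...
print(parse_partition_timestamp(partition_value))

# ...while parsing the same string with a standard timestamp pattern
# (what a generic planner-side parser expects) fails, mirroring the
# DateTimeParseException in the report above.
try:
    datetime.strptime(partition_value, "%Y-%m-%d %H:%M:%S")
except ValueError as e:
    print("parse failed:", e)
```

This is why partition pruning must convert the partition-path string back to the original value before comparing it against timestamp/date filter conditions.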
[jira] [Assigned] (HUDI-5770) Plan error when partition column is timestamp type and SQL query contains filter condition which contains partition
[ https://issues.apache.org/jira/browse/HUDI-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhang reassigned HUDI-5770: Assignee: Jing Zhang > Plan error when partition column is timestamp type and SQL query contains > filter condition which contains partition > --- > > Key: HUDI-5770 > URL: https://issues.apache.org/jira/browse/HUDI-5770 > Project: Apache Hudi > Issue Type: Bug > Components: flink-sql >Reporter: Jing Zhang >Assignee: Jing Zhang >Priority: Major > > If a hudi table is a partition table, and partition column is timestamp type. > When run a flink query which contain the filter conditions on partition > column, an error would be thrown out in the plan generating phase. > {code:java} > java.time.format.DateTimeParseException: Text '1970010100' could not be > parsed at index 0 at > java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949) > at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) > at > org.apache.flink.table.utils.DateTimeUtils.parseTimestampData(DateTimeUtils.java:413) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionFieldValue(PartitionPruner.scala:182) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1(PartitionPruner.scala:157) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1$adapted(PartitionPruner.scala:155) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionToRow(PartitionPruner.scala:155) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1(PartitionPruner.scala:137) > at > 
org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1$adapted(PartitionPruner.scala:132) > at scala.collection.Iterator.foreach(Iterator.scala:937) > at scala.collection.Iterator.foreach$(Iterator.scala:937) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) > at scala.collection.IterableLike.foreach(IterableLike.scala:70) > at scala.collection.IterableLike.foreach$(IterableLike.scala:69) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner$.prunePartitions(PartitionPruner.scala:132) > at > org.apache.flink.table.planner.plan.utils.PartitionPruner.prunePartitions(PartitionPruner.scala) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.lambda$onMatch$3(PushPartitionIntoTableSourceScanRule.java:163) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.readPartitionsAndPrune(PushPartitionIntoTableSourceScanRule.java:254) > at > org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.onMatch(PushPartitionIntoTableSourceScanRule.java:172) > at > org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) > at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) > at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) > at > org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:64) > at > 
org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:78) > at > org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.$anonfun$optimize$2(FlinkGroupProgram.scala:59) > at > scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:156) > at > scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:156) > at scala.collection.Iterator.foreach(Iterator.scala:937) > at scala.collection.Iterator.foreach$(Iterator.scala:937) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) > at scala.collection.IterableLike.foreach(IterableLike.scala:70) > at
[jira] [Created] (HUDI-5770) Plan error when partition column is timestamp type and SQL query contains filter condition which contains partition
Jing Zhang created HUDI-5770: Summary: Plan error when partition column is timestamp type and SQL query contains filter condition which contains partition Key: HUDI-5770 URL: https://issues.apache.org/jira/browse/HUDI-5770 Project: Apache Hudi Issue Type: Bug Components: flink-sql Reporter: Jing Zhang If a hudi table is a partition table, and partition column is timestamp type. When run a flink query which contain the filter conditions on partition column, an error would be thrown out in the plan generating phase. {code:java} java.time.format.DateTimeParseException: Text '1970010100' could not be parsed at index 0 at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949) at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) at org.apache.flink.table.utils.DateTimeUtils.parseTimestampData(DateTimeUtils.java:413) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionFieldValue(PartitionPruner.scala:182) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1(PartitionPruner.scala:157) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$convertPartitionToRow$1$adapted(PartitionPruner.scala:155) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:194) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.convertPartitionToRow(PartitionPruner.scala:155) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1(PartitionPruner.scala:137) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.$anonfun$prunePartitions$1$adapted(PartitionPruner.scala:132) at scala.collection.Iterator.foreach(Iterator.scala:937) at scala.collection.Iterator.foreach$(Iterator.scala:937) at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at 
scala.collection.IterableLike.foreach(IterableLike.scala:70) at scala.collection.IterableLike.foreach$(IterableLike.scala:69) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at org.apache.flink.table.planner.plan.utils.PartitionPruner$.prunePartitions(PartitionPruner.scala:132) at org.apache.flink.table.planner.plan.utils.PartitionPruner.prunePartitions(PartitionPruner.scala) at org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.lambda$onMatch$3(PushPartitionIntoTableSourceScanRule.java:163) at org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.readPartitionsAndPrune(PushPartitionIntoTableSourceScanRule.java:254) at org.apache.flink.table.planner.plan.rules.logical.PushPartitionIntoTableSourceScanRule.onMatch(PushPartitionIntoTableSourceScanRule.java:172) at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333) at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542) at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) at org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243) at org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127) at org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) at org.apache.flink.table.planner.plan.optimize.program.FlinkHepProgram.optimize(FlinkHepProgram.scala:64) at org.apache.flink.table.planner.plan.optimize.program.FlinkHepRuleSetProgram.optimize(FlinkHepRuleSetProgram.scala:78) at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.$anonfun$optimize$2(FlinkGroupProgram.scala:59) at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:156) at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:156) at scala.collection.Iterator.foreach(Iterator.scala:937) 
at scala.collection.Iterator.foreach$(Iterator.scala:937) at scala.collection.AbstractIterator.foreach(Iterator.scala:1425) at scala.collection.IterableLike.foreach(IterableLike.scala:70) at scala.collection.IterableLike.foreach$(IterableLike.scala:69) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:156) at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:154) at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104) at org.apache.flink.table.planner.plan.optimize.program.FlinkGroupProgram.$anonfun$optimize$1(FlinkGroupProgram.scala:56) at
[GitHub] [hudi] qidian99 commented on a diff in pull request #7915: [HUDI-5759] Supports add column on mor table with log
qidian99 commented on code in PR #7915: URL: https://github.com/apache/hudi/pull/7915#discussion_r1103953817 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala: ## @@ -202,6 +202,13 @@ private[sql] object SchemaConverters { st.foreach { f => val fieldAvroType = toAvroType(f.dataType, f.nullable, f.name, childNameSpace) +val fieldBuilder = fieldsAssembler.name(f.name).`type`(fieldAvroType) Review Comment: ![image](https://user-images.githubusercontent.com/20527912/218361319-2783b730-ddea-4d7b-b7d4-ec225014e531.png) When `extractPartitionValuesFromPartitionPath` is turned on, the StructType schema and the Avro schema differ. `convertToAvroSchema` is missing the default value when the field is nullable, making the table unqueryable.
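The fix being discussed — attaching a default to nullable fields during StructType-to-Avro conversion — can be sketched as follows. This is a simplified illustration of the Avro schema shape only, not Hudi's actual `SchemaConverters` code; `struct_field_to_avro` is a hypothetical helper:

```python
import json

def struct_field_to_avro(name, avro_type, nullable):
    """Convert one field of a (simplified) StructType to an Avro field dict.

    For a nullable field, the Avro type becomes a union with "null" first,
    and a default of null is attached so readers can fill the field in when
    it is absent from older data files (e.g. after ALTER TABLE ADD COLUMN).
    """
    if nullable:
        return {"name": name, "type": ["null", avro_type], "default": None}
    return {"name": name, "type": avro_type}

schema = {
    "type": "record",
    "name": "example",
    "fields": [
        struct_field_to_avro("id", "long", nullable=False),
        struct_field_to_avro("new_col1", "string", nullable=True),
    ],
}
print(json.dumps(schema, indent=2))
```

Note that in Avro the default's type must match the first branch of the union, which is why `"null"` is listed first for nullable fields.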
[GitHub] [hudi] qidian99 commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log
qidian99 commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1427253388 Here's the stacktrace when I tried to add a column named `new_col1` in mor table: ``` Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2403) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2352) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2351) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2351) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1109) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1109) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1109) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2591) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2533) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:898) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2214) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2235) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2254) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2279) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:414) at org.apache.spark.rdd.RDD.collect(RDD.scala:1029) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:394) at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:421) at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:69) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287) at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) at
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427235209 ## CI report: * dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15124)
[GitHub] [hudi] stream2000 commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
stream2000 commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427231106 @hudi-bot run azure
[GitHub] [hudi] zinking commented on issue #4457: [SUPPORT] Hudi archive stopped working
zinking commented on issue #4457: URL: https://github.com/apache/hudi/issues/4457#issuecomment-1427228551 @nsivabalan I observed the same thing here. Rollbacks on the timeline didn't get processed by the Flink engine: compaction is pending on the rollbacks, and marker cleaning is pending on the compactions, causing an extra-large timeline. In the Spark compaction process the rollbacks are processed, though; not sure whether Flink compaction should do the same.
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633: URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427204592 ## CI report: * 948c6823094e63b03adfb98b40f9c70c3edf3ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15108) * 50480623485bb99353655f4c6df23a2462214f7f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15123)
[GitHub] [hudi] hudi-bot commented on pull request #7633: [HUDI-5737] Fix Deletes issued without any prior commits
hudi-bot commented on PR #7633: URL: https://github.com/apache/hudi/pull/7633#issuecomment-1427201506 ## CI report: * 948c6823094e63b03adfb98b40f9c70c3edf3ad2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15108) * 50480623485bb99353655f4c6df23a2462214f7f UNKNOWN
[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
kazdy commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1427055835 CI failed twice due to timeouts.
[GitHub] [hudi] codope commented on a diff in pull request #7871: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s
codope commented on code in PR #7871: URL: https://github.com/apache/hudi/pull/7871#discussion_r1103803439 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/HoodieCatalystExpressionUtils.scala: ## @@ -75,7 +81,7 @@ trait HoodieCatalystExpressionUtils { def unapplyCastExpression(expr: Expression): Option[(Expression, DataType, Option[String], Boolean)] } -object HoodieCatalystExpressionUtils { +object HoodieCatalystExpressionUtils extends SparkAdapterSupport { Review Comment: Why does it need to extend `SparkAdapterSupport`? Is there something that changes across Spark versions? ## hudi-common/src/main/java/org/apache/hudi/common/util/CollectionUtils.java: ## @@ -69,6 +69,26 @@ public static boolean nonEmpty(Collection c) { return !isNullOrEmpty(c); } + /** + * Reduces the provided {@link Collection} using the provided {@code reducer} applied to + * every element of the collection as follows: + * + * {@code reduce(reduce(reduce(identity, e1), e2), ...)} + * + * @param c target collection to be reduced + * @param identity element for the reduction to start from + * @param reducer actual reducing operator + * + * @return result of the reduction of the collection using the reducing operator + */ + public static <T, U> U reduce(Collection<T> c, U identity, BiFunction<U, T, U> reducer) { +return c.stream() +.sequential() Review Comment: Does it have to be strictly sequential? I mean, the elements of the collection should be independent of each other. Is there any value in parameterizing this behavior, say by adding a boolean `shouldReduceParallelly`? 
## hudi-common/src/main/java/org/apache/hudi/internal/schema/action/TableChange.java: ## @@ -83,10 +83,16 @@ abstract class BaseColumnChange implements TableChange { protected final InternalSchema internalSchema; protected final Map id2parent; protected final Map> positionChangeMap = new HashMap<>(); +protected final boolean caseSensitive; BaseColumnChange(InternalSchema schema) { + this(schema, false); Review Comment: why default `caseSensitive` is false? ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala: ## @@ -28,97 +28,125 @@ import org.apache.hudi.config.HoodieWriteConfig.{AVRO_SCHEMA_VALIDATE_ENABLE, TB import org.apache.hudi.exception.HoodieException import org.apache.hudi.hive.HiveSyncConfigHolder import org.apache.hudi.sync.common.HoodieSyncConfig +import org.apache.hudi.util.JFunction.scalaFunction1Noop import org.apache.hudi.{AvroConversionUtils, DataSourceWriteOptions, HoodieSparkSqlWriter, SparkAdapterSupport} -import org.apache.spark.sql.HoodieCatalystExpressionUtils.MatchCast +import org.apache.spark.sql.HoodieCatalystExpressionUtils.{MatchCast, attributeEquals} import org.apache.spark.sql._ -import org.apache.spark.sql.catalyst.TableIdentifier -import org.apache.spark.sql.catalyst.analysis.Resolver import org.apache.spark.sql.catalyst.catalog.HoodieCatalogTable -import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, BoundReference, Cast, EqualTo, Expression, Literal} +import org.apache.spark.sql.catalyst.expressions.BindReferences.bindReference +import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, BoundReference, EqualTo, Expression, Literal, NamedExpression, PredicateHelper} import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.hudi.HoodieSqlCommonUtils._ -import org.apache.spark.sql.hudi.HoodieSqlUtils.getMergeIntoTargetTableId +import 
org.apache.spark.sql.hudi.analysis.HoodieAnalysis.failAnalysis import org.apache.spark.sql.hudi.ProvidesHoodieConfig.combineOptions -import org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.CoercedAttributeReference +import org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.{CoercedAttributeReference, encodeAsBase64String, stripCasting, toStructType} import org.apache.spark.sql.hudi.command.payload.ExpressionPayload import org.apache.spark.sql.hudi.command.payload.ExpressionPayload._ import org.apache.spark.sql.hudi.ProvidesHoodieConfig -import org.apache.spark.sql.types.{BooleanType, StructType} +import org.apache.spark.sql.types.{BooleanType, StructField, StructType} import java.util.Base64 /** - * The Command for hoodie MergeIntoTable. - * The match on condition must contain the row key fields currently, so that we can use Hoodie - * Index to speed up the performance. + * Hudi's implementation of the {@code MERGE INTO} (MIT) Spark SQL statement. * - * The main algorithm: + * NOTE: That this implementation is restricted in a some aspects to accommodate for Hudi's crucial + * constraint (of requiring every record to bear unique primary-key): merging condition ([[mergeCondition]]) + * is currently can only (and must) reference
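On the review question of whether `reduce` must be strictly sequential: a left fold of the documented shape `reduce(reduce(reduce(identity, e1), e2), ...)` only matches a parallel (tree-shaped) reduction when the reducer is associative. A small Python sketch of the discrepancy, assuming a non-associative reducer such as subtraction:

```python
from functools import reduce

def sequential_reduce(items, identity, reducer):
    """Left fold: reducer(...reducer(reducer(identity, e1), e2)..., eN)."""
    return reduce(reducer, items, identity)

items = [1, 2, 3, 4]

# Subtraction is not associative, so the result depends on fold order:
# ((((0 - 1) - 2) - 3) - 4) = -10
left_fold = sequential_reduce(items, 0, lambda acc, x: acc - x)

# A parallel-style tree reduction combines two partial left folds with the
# same reducer: (0-1-2) = -3 and (0-3-4) = -7, then -3 - (-7) = 4.
tree = (0 - 1 - 2) - (0 - 3 - 4)

print(left_fold, tree)  # different results: -10 vs 4
```

So forcing `.sequential()` is a correctness guarantee for arbitrary (possibly non-associative) reducers, not just a performance choice; a parallel variant would additionally require an associative reducer and a compatible combiner.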
[GitHub] [hudi] hudi-bot commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
hudi-bot commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1427045421 ## CI report: * d75235c11b5619654d6399f397ecea13f874aec4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15111) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15120)
[GitHub] [hudi] menna224 commented on issue #4839: Hudi upsert doesnt trigger compaction for MOR
menna224 commented on issue #4839: URL: https://github.com/apache/hudi/issues/4839#issuecomment-1427045014 > Hello @shahiidiqbal, can you please provide a snippet of the code in which you write the stream directly? How did you pass the cleansing function to it?
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1427020023 ## CI report: * dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119)
[jira] [Closed] (HUDI-5764) Allow lazy rollback for async indexer commit
[ https://issues.apache.org/jira/browse/HUDI-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-5764. - Resolution: Fixed > Allow lazy rollback for async indexer commit > > > Key: HUDI-5764 > URL: https://issues.apache.org/jira/browse/HUDI-5764 > Project: Apache Hudi > Issue Type: Bug >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 0.13.0 > > > This is to fix HUDI-5733, where the async indexer may fail due to an eager rollback > in the metadata table. > Temporary solution for 0.13.0 (a little more involved and not so clean a fix): > apply eager rollbacks only for regular delta commits; deduce delta commits > from HoodieIndexer and employ a lazy clean policy (based on heartbeats). -- This message was sent by Atlassian Jira (v8.20.10#820010)
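The heartbeat-based lazy rollback policy described in the ticket can be sketched roughly as follows. This is illustrative Python, not Hudi code; `should_rollback`, the expiry threshold, and the heartbeat map are all assumptions made for the sketch:

```python
import time

HEARTBEAT_EXPIRY_SECONDS = 120  # illustrative threshold, not a Hudi default

def should_rollback(instant, heartbeats, now=None):
    """Lazily decide whether a pending delta commit can be rolled back.

    An eager policy would roll back any pending instant immediately. The
    lazy policy only rolls back once the writer's heartbeat has expired,
    so a still-running async indexer is not rolled back out from under
    itself while its commit is in flight.
    """
    now = now if now is not None else time.time()
    last_beat = heartbeats.get(instant)
    if last_beat is None:
        return True  # no heartbeat at all: the writer is gone
    return (now - last_beat) > HEARTBEAT_EXPIRY_SECONDS

heartbeats = {"20230213T000000": time.time()}           # indexer still alive
print(should_rollback("20230213T000000", heartbeats))   # False: heartbeat fresh
print(should_rollback("20230101T000000", heartbeats))   # True: no heartbeat
```

The key design point is that liveness, not mere pendency, decides whether a pending instant is failed and safe to clean up.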
[GitHub] [hudi] codope merged pull request #7921: [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table
codope merged PR #7921: URL: https://github.com/apache/hudi/pull/7921
[hudi] branch master updated (1cb8ffe7264 -> e25381c6966)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 1cb8ffe7264 [HUDI-5768] Fix Spark Datasource read of metadata table (#7924)
     add e25381c6966 [HUDI-5764] Rollback delta commits from `HoodieIndexer` lazily in metadata table (#7921)

No new revisions were added by this update.

Summary of changes:
 .../hudi/client/BaseHoodieTableServiceClient.java  |  48
 .../apache/hudi/client/BaseHoodieWriteClient.java  |  13
 .../metadata/HoodieBackedTableMetadataWriter.java  |  34
 .../java/org/apache/hudi/table/HoodieTable.java    |  38
 .../table/action/index/RunIndexActionExecutor.java |   5
 .../FlinkHoodieBackedTableMetadataWriter.java      |  21
 .../org/apache/hudi/table/HoodieFlinkTable.java    |  12
 .../SparkHoodieBackedTableMetadataWriter.java      |  20
 .../org/apache/hudi/table/HoodieSparkTable.java    |  10
 .../hudi/metadata/HoodieBackedTableMetadata.java   |  12
 .../hudi/metadata/HoodieTableMetadataUtil.java     |  20
 .../apache/hudi/utilities/TestHoodieIndexer.java   | 108
 12 files changed, 298 insertions(+), 43 deletions(-)
[jira] [Closed] (HUDI-5768) Fail to read metadata table in Spark Datasource
[ https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-5768.
Resolution: Fixed

> Fail to read metadata table in Spark Datasource
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.12.0, 0.12.1, 0.12.2
> Reporter: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class org.apache.hudi.common.model.HoodieFileFormat)
>   at org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
>   at
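The `scala.MatchError` above is an exhaustiveness gap: the base-file-format dispatch in `HoodieBaseRelation` covered the formats regular tables use but not HFILE, which the metadata table stores its data in. A minimal Java model of that failure shape, with illustrative names rather than Hudi's actual code:

```java
public class FileFormatDispatchSketch {
    enum FileFormat { PARQUET, ORC, HFILE }

    // Before the fix (sketch): the dispatch covers only the formats regular
    // tables use, so HFILE falls through and throws, like the MatchError above.
    static String readerForUnsafe(FileFormat f) {
        switch (f) {
            case PARQUET: return "parquet-reader";
            case ORC:     return "orc-reader";
            default: throw new IllegalStateException("unhandled format: " + f);
        }
    }

    // After the fix (sketch): the metadata table's HFILE format is handled explicitly.
    static String readerForSafe(FileFormat f) {
        switch (f) {
            case PARQUET: return "parquet-reader";
            case ORC:     return "orc-reader";
            case HFILE:   return "hfile-reader";
            default: throw new IllegalStateException("unhandled format: " + f);
        }
    }

    public static void main(String[] args) {
        try {
            readerForUnsafe(FileFormat.HFILE);
        } catch (IllegalStateException e) {
            System.out.println("unsafe: " + e.getMessage());
        }
        System.out.println("safe: " + readerForSafe(FileFormat.HFILE));
    }
}
```

The reader names and method names here are hypothetical; the actual change lives in `HoodieBaseRelation.scala` per the commit summary below.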
[hudi] branch master updated (3e31ca73828 -> 1cb8ffe7264)
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 3e31ca73828 [MINOR] Remove unnecessary TestCallExpressions which are adapters for CallExpression (#7911)
     add 1cb8ffe7264 [HUDI-5768] Fix Spark Datasource read of metadata table (#7924)

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/hudi/HoodieBaseRelation.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
[GitHub] [hudi] codope merged pull request #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
codope merged PR #7924: URL: https://github.com/apache/hudi/pull/7924
[GitHub] [hudi] hudi-bot commented on pull request #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
hudi-bot commented on PR #7924: URL: https://github.com/apache/hudi/pull/7924#issuecomment-1427006591

## CI report:

* 1d00cbd70323708d204e00aca22c90d66d5c2297 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15118)
[GitHub] [hudi] hudi-bot commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
hudi-bot commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1427006577

## CI report:

* d75235c11b5619654d6399f397ecea13f874aec4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15111) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15120)
[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8
kazdy commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1427002319

@hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #7921: [HUDI-5764][DO NOT MERGE] Roll back delta commits from `HoodieIndexer` lazily in metadata table
hudi-bot commented on PR #7921: URL: https://github.com/apache/hudi/pull/7921#issuecomment-1426994163

## CI report:

* 8d961453bb808b5f6273e68a455940f2f6014605 UNKNOWN
* 42a40ccdd6ee5d58b6aaf06cbc5af6bbd618dea2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15117)
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1426994144

## CI report:

* dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15119)
[GitHub] [hudi] stream2000 commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
stream2000 commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1426986850

@hudi-bot run azure
[GitHub] [hudi] GallonREX opened a new issue, #7925: [SUPPORT]hudi 0.8 upgrade to hudi 0.12 report java.util.ConcurrentModificationException: Cannot resolve conflicts for overlapping writes
GallonREX opened a new issue, #7925: URL: https://github.com/apache/hudi/issues/7925

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Upgrading from Hudi 0.8 to Hudi 0.12. Upgrade steps: use the Hudi 0.12 program to write to the table created by the existing Hudi 0.8, relying on the automatic upgrade. After writing to the 0.8 table, **the Hudi 0.12 table cannot be written to by two writers at the same time.**

**To Reproduce**

Steps to reproduce the behavior:

1. Use Hudi 0.12 to write to an existing Hudi 0.8 table.
2. Scala code:

```scala
articleDataframe
  .write.format("org.apache.hudi")
  .option("hoodie.insert.shuffle.parallelism", "264")
  .option("hoodie.upsert.shuffle.parallelism", "264")
  .option("hoodie.cleaner.policy.failed.writes", "LAZY")
  .option("hoodie.write.concurrency.mode", "optimistic_concurrency_control")
  .option("hoodie.write.lock.provider", "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider")
  .option("hoodie.write.lock.zookeeper.url", "10.1.4.1,10.1.4.2,10.1.4.3")
  .option("hoodie.write.lock.zookeeper.port", "2181")
  .option("hoodie.write.lock.zookeeper.lock_key", "zycg_article_data_day_test08limit")
  .option("hoodie.write.lock.zookeeper.base_path", "/hudi_data_zycg_article_data_day_test")
  .option(RECORDKEY_FIELD.key(), "doc_id")
  .option(PARTITIONPATH_FIELD.key(), "partionpath")
  .option(PRECOMBINE_FIELD.key(), "publish_time")
  .option(TBL_NAME.key(), "hudi_test_tb")
  .mode(Append)
  .save("hdfs://10.1.4.1:9000/data_center/hudidata/hudi_test_tb")
```

3. Spark submit (Hudi 0.12):

```shell
bin/spark-submit \
  --name hudi012_20220807 \
  --class com.honeycomb.hudi.hudiimport.spark.ZhongyunImportHudiRecovery \
  --master yarn --deploy-mode cluster \
  --executor-memory 10g --driver-memory 5g --executor-cores 2 --num-executors 20 \
  --queue default \
  --jars /data/sas01/opt/module/hudi-0.12.0/packaging/hudi-spark-bundle/target/hudi-spark2.4-bundle_2.11-0.12.0.jar \
  /data/sas01/crontabprogram2/zytongzhan2/honeycomb-hudi08-download-1.0-SNAPSHOT.jar
```

**Expected behavior**

After the Hudi 0.8 table is upgraded to Hudi 0.12, two writers should be able to write to the table at the same time without error.

**Environment Description**

* Hudi version : 0.8 -> 0.12
* Spark version : 2.4.5 (Scala 2.11.12)
* Hadoop version : 2.7.7
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no, application on YARN

**Additional context**

Using Hudi 0.12 to write to the existing Hudi 0.8 table.

**hudi 0.8 hoodie.properties**

```properties
hoodie.table.precombine.field=publish_time
hoodie.table.name=zycg_article_data_day_test08
hoodie.archivelog.folder=archived
hoodie.table.type=COPY_ON_WRITE
hoodie.table.version=1
hoodie.timeline.layout.version=1
```

**hudi 0.12 hoodie.properties**

```properties
hoodie.table.precombine.field=publish_time
hoodie.table.partition.fields=partionpath
hoodie.table.type=COPY_ON_WRITE
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.version=5
hoodie.table.metadata.partitions=files
hoodie.table.recordkey.fields=doc_id
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.name=zycg_article_data_day_test08
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.datasource.write.hive_style_partitioning=false
hoodie.table.checksum=3536879415
```

**Stacktrace**

```
23/02/11 23:01:42 INFO view.FileSystemViewManager: Creating remote first table view
23/02/11 23:01:42 INFO timeline.HoodieActiveTimeline: Loaded instants upto : Option{val=[20230211225240790__rollback__COMPLETED]}
23/02/11 23:01:42 INFO transaction.SimpleConcurrentFileWritesConflictResolutionStrategy: Found conflicting writes between first operation = {actionType=commit, instantTime=2023021122489, actionState=INFLIGHT'}, second operation = {actionType=commit, instantTime=20230211224251755, actionState=COMPLETED'}, intersecting file ids [29d8e24e-f5c5-43b5-a10e-2240cc51dda0-0, 3ac5d5f6-df53-4f81-848a-316ca38107b6-0, cb1f2488-d860-4d08-aa2a-134ba89558e3-0, ea157114-677d-4011-8c63-84af3b2526e5-0, f5301297-6e18-4166-8f56-a853b5d6485b-0, f124d9a9-f04e-4655-8a4b-c45fa357b38f-0, a4681446-fb69-4ebd-a121-13323fdb62a5-0, d48017c8-56cb-4172-a92a-5caf08d605a6-0, a6fc9c73-dc74-47ad-8085-ec63915b534b-0, 637d35d4-c492-4236-955b-ce3c515cf7ee-0,
```
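The conflict in the log above is Hudi's optimistic concurrency control at work: at commit time, `SimpleConcurrentFileWritesConflictResolutionStrategy` intersects the file IDs touched by two overlapping operations, and a non-empty intersection aborts the later writer with `ConcurrentModificationException`. A minimal sketch of that check, with illustrative method names and file IDs rather than Hudi's actual API:

```java
import java.util.HashSet;
import java.util.Set;

public class OccConflictSketch {

    // Two concurrent writes conflict when the sets of file IDs they touched
    // overlap; the later writer must then be aborted and retried.
    static boolean hasConflict(Set<String> firstWriterFileIds, Set<String> secondWriterFileIds) {
        Set<String> overlap = new HashSet<>(firstWriterFileIds);
        overlap.retainAll(secondWriterFileIds);
        return !overlap.isEmpty();
    }

    public static void main(String[] args) {
        Set<String> writer1 = Set.of("fileA-0", "fileB-0");
        Set<String> writer2 = Set.of("fileB-0", "fileC-0"); // touches fileB-0 too
        Set<String> writer3 = Set.of("fileD-0");            // disjoint file group

        System.out.println(hasConflict(writer1, writer2)); // overlapping file group: conflict
        System.out.println(hasConflict(writer1, writer3)); // disjoint file groups: both commits can succeed
    }
}
```

This is why OCC behaves well only when concurrent writers mostly touch disjoint file groups; two writers upserting the same keys, as in the reproduced scenario, will keep colliding.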
[GitHub] [hudi] rfyu commented on pull request #7672: [HUDI-5557]Avoid converting columns that are not indexed in CSI
rfyu commented on PR #7672: URL: https://github.com/apache/hudi/pull/7672#issuecomment-1426980873

@alexeykudinkin A test has been added. Could you please help to review?
[GitHub] [hudi] hudi-bot commented on pull request #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
hudi-bot commented on PR #7924: URL: https://github.com/apache/hudi/pull/7924#issuecomment-1426973433

## CI report:

* 1d00cbd70323708d204e00aca22c90d66d5c2297 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15118)
[GitHub] [hudi] hudi-bot commented on pull request #7921: [HUDI-5764][DO NOT MERGE] Roll back delta commits from `HoodieIndexer` lazily in metadata table
hudi-bot commented on PR #7921: URL: https://github.com/apache/hudi/pull/7921#issuecomment-1426973423

## CI report:

* 8d961453bb808b5f6273e68a455940f2f6014605 UNKNOWN
* 00a05691b0163c7bb8e39a0a15957f3b72cd71eb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15113)
* 42a40ccdd6ee5d58b6aaf06cbc5af6bbd618dea2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15117)
[GitHub] [hudi] hudi-bot commented on pull request #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
hudi-bot commented on PR #7924: URL: https://github.com/apache/hudi/pull/7924#issuecomment-1426972303

## CI report:

* 1d00cbd70323708d204e00aca22c90d66d5c2297 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #7921: [HUDI-5764][DO NOT MERGE] Roll back delta commits from `HoodieIndexer` lazily in metadata table
hudi-bot commented on PR #7921: URL: https://github.com/apache/hudi/pull/7921#issuecomment-1426972288

## CI report:

* 8d961453bb808b5f6273e68a455940f2f6014605 UNKNOWN
* 00a05691b0163c7bb8e39a0a15957f3b72cd71eb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15113)
* 42a40ccdd6ee5d58b6aaf06cbc5af6bbd618dea2 UNKNOWN
[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource
[ https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-5768:
Labels: pull-request-available (was: )

> Fail to read metadata table in Spark Datasource
[GitHub] [hudi] yihua opened a new pull request, #7924: [HUDI-5768] Fix Spark Datasource read of metadata table
yihua opened a new pull request, #7924: URL: https://github.com/apache/hudi/pull/7924

### Change Logs

Fixes Spark Datasource read of metadata table in Spark 3.

### Impact

As above.

### Risk level

low

### Documentation Update

N/A

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1426969867

## CI report:

* dc12ef61c3bfd5070b10a07ac9dc2b65fc15c606 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15115)