[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051787441


   
   ## CI report:
   
   * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341)
 
   * 6816a4b47b88108172b46fece160e4e078345687 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6345)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051783953


   
   ## CI report:
   
   * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341)
 
   * 6816a4b47b88108172b46fece160e4e078345687 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051595547


   
   ## CI report:
   
   * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051783953


   
   ## CI report:
   
   * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341)
 
   * 6816a4b47b88108172b46fece160e4e078345687 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051523638


   
   ## CI report:
   
   * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051780284


   
   ## CI report:
   
   * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6344)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051780334


   
   ## CI report:
   
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342)
 
   * 758d417cc8f02537d8174f19c904c062b0873646 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6343)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051759517


   
   ## CI report:
   
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342)
 
   * 758d417cc8f02537d8174f19c904c062b0873646 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] cuibo01 commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


cuibo01 commented on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051779248


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051759517


   
   ## CI report:
   
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342)
 
   * 758d417cc8f02537d8174f19c904c062b0873646 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051756102


   
   ## CI report:
   
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342)
 
   * 758d417cc8f02537d8174f19c904c062b0873646 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051704555


   
   ## CI report:
   
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051756102


   
   ## CI report:
   
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342)
 
   * 758d417cc8f02537d8174f19c904c062b0873646 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051704555


   
   ## CI report:
   
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6342)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051701135


   
   ## CI report:
   
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051701135


   
   ## CI report:
   
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   * 6ba1413ff8b09ec39ec823ae2e3816cd217df553 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051572873


   
   ## CI report:
   
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051595547


   
   ## CI report:
   
   * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051565083


   
   ## CI report:
   
   * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328)
 
   * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051572873


   
   ## CI report:
   
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051543590


   
   ## CI report:
   
   * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316)
 
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] XuQianJin-Stars commented on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL

2022-02-25 Thread GitBox


XuQianJin-Stars commented on pull request #4901:
URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051567205


   > @XuQianJin-Stars do you have some time to review this PR? Thanks. Maybe we can add this to [HUDI-3161](https://issues.apache.org/jira/browse/HUDI-3161).
   
   Sure, let me review this PR.






[GitHub] [hudi] stayrascal commented on a change in pull request #4724: [HUDI-2815] add partial overwrite payload to support partial overwrit…

2022-02-25 Thread GitBox


stayrascal commented on a change in pull request #4724:
URL: https://github.com/apache/hudi/pull/4724#discussion_r815265938



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java
##
@@ -58,6 +58,31 @@ default T preCombine(T oldValue, Properties properties) {
 return preCombine(oldValue);
   }
 
+  /**
+   *When more than one HoodieRecord have the same HoodieKey in the incoming batch, this function combines them before attempting to insert/upsert by taking in a property map.
+   *
+   * @param oldValue instance of the old {@link HoodieRecordPayload} to be combined with.
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @param schema Schema used for record
+   * @return the combined value
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+  default T preCombine(T oldValue, Properties properties, Schema schema) {

Review comment:
   BTW, thanks a lot for your time; I will ping you on Slack.








[GitHub] [hudi] stayrascal commented on a change in pull request #4724: [HUDI-2815] add partial overwrite payload to support partial overwrit…

2022-02-25 Thread GitBox


stayrascal commented on a change in pull request #4724:
URL: https://github.com/apache/hudi/pull/4724#discussion_r815265857



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java
##
@@ -58,6 +58,31 @@ default T preCombine(T oldValue, Properties properties) {
 return preCombine(oldValue);
   }
 
+  /**
+   *When more than one HoodieRecord have the same HoodieKey in the incoming batch, this function combines them before attempting to insert/upsert by taking in a property map.
+   *
+   * @param oldValue instance of the old {@link HoodieRecordPayload} to be combined with.
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @param schema Schema used for record
+   * @return the combined value
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+  default T preCombine(T oldValue, Properties properties, Schema schema) {

Review comment:
   Hi @alexeykudinkin, thanks a lot for your detailed clarification.
   1. Regarding the design of `preCombine`, I'm clear now. I'm sorry, I don't know the details of RFC-46, and I couldn't find a link to RFC-46 [here](https://cwiki.apache.org/confluence/display/HUDI/RFC+Process). Could you please share the link?
   2. Regarding the requirements for partial updates/overwrites, I have seen similar requirements from the community. In our case, we want to build a customer profile with multiple attributes, and these attributes might come from different systems. One system might provide only some of the attributes in an event/record, and two systems might send events/records with different attributes. We should not simply choose the most recent one; we need to merge them before writing to disk. Otherwise, we have to keep all the change logs and then start a new job to dedup and merge these attributes across the change logs. For example, say we have 10 attributes a1-a10 (all optional); source system A only has a1-a5, and source system B only has a6-a10. The result we expect is a final record that contains a1-a10, not only a1-a5 or only a6-a10. And because we might receive two events/records at the same time, they might land in the same batch, which is why we want to merge them before `combineAndGetUpdateValue`.
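
   The merge described in point 2 can be sketched roughly as follows. This is only an illustration under simplifying assumptions: records are modeled as plain maps rather than Avro records, a `null` value stands for "attribute not provided", and `PartialMergeSketch` and `mergePartial` are hypothetical names, not part of Hudi's API or of this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Hudi's payload implementation.
class PartialMergeSketch {

    /**
     * Merges two partial records that share the same key.
     * A non-null value in {@code newer} overwrites the one in {@code older};
     * attributes that {@code newer} leaves null or absent keep the older value.
     */
    static Map<String, Object> mergePartial(Map<String, Object> older,
                                            Map<String, Object> newer) {
        Map<String, Object> merged = new LinkedHashMap<>(older);
        for (Map.Entry<String, Object> entry : newer.entrySet()) {
            if (entry.getValue() != null) {
                merged.put(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Source system A only provides a1-a5, source system B only a6-a10.
        Map<String, Object> fromA = new LinkedHashMap<>();
        Map<String, Object> fromB = new LinkedHashMap<>();
        for (int i = 1; i <= 5; i++) {
            fromA.put("a" + i, "A" + i);
        }
        for (int i = 6; i <= 10; i++) {
            fromB.put("a" + i, "B" + i);
        }

        // The merged record carries all ten attributes.
        Map<String, Object> merged = mergePartial(fromA, fromB);
        System.out.println(merged.keySet());
    }
}
```

   With last-writer-wins semantics, the same two inputs would leave only a1-a5 or only a6-a10, which is the outcome the comment wants to avoid when both events arrive in the same batch.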








[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051565083


   
   ## CI report:
   
   * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328)
 
   * 875ec8b00cd379e669498fe7575503b192f0de5e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6341)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051564339


   
   ## CI report:
   
   * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328)
 
   * 875ec8b00cd379e669498fe7575503b192f0de5e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051564339


   
   ## CI report:
   
   * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328)
 
   * 875ec8b00cd379e669498fe7575503b192f0de5e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4913: [WIP][HUDI-1517] create marker file for every log file

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4913:
URL: https://github.com/apache/hudi/pull/4913#issuecomment-1051223344


   
   ## CI report:
   
   * ea1621d1d17e2c85fe9f69f6b39aaa08f61871d7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6328)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051543457


   
   ## CI report:
   
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6339)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051558049


   
   ## CI report:
   
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6339)
 
   
   




[GitHub] [hudi] xiarixiaoyao removed a comment on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL

2022-02-25 Thread GitBox


xiarixiaoyao removed a comment on pull request #4901:
URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051549434


   @XuQianJin-Stars do you have some time to review this PR? Thanks. Maybe we 
can add this to HUDI-3161.






[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051552922


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6338)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051495717


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6338)
 
   
   




[GitHub] [hudi] xiarixiaoyao commented on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL

2022-02-25 Thread GitBox


xiarixiaoyao commented on pull request #4901:
URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051550139


   @huberylee Thank you very much for contributing. Please fix the CI build.






[GitHub] [hudi] xiarixiaoyao commented on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL

2022-02-25 Thread GitBox


xiarixiaoyao commented on pull request #4901:
URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051549434


   @XuQianJin-Stars do you have some time to review this PR? Thanks. Maybe we 
can add this to HUDI-3161.






[GitHub] [hudi] xiarixiaoyao commented on pull request #4901: [HUDI-3445] Supporting Clustering Command Based on Call Procedure Command for Spark SQL

2022-02-25 Thread GitBox


xiarixiaoyao commented on pull request #4901:
URL: https://github.com/apache/hudi/pull/4901#issuecomment-1051549041


   @XuQianJin-Stars do you have some time to review this PR? Thanks. Maybe we 
can add this to HUDI-3161.






[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051543457


   
   ## CI report:
   
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6339)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051543590


   
   ## CI report:
   
   * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316)
 
   * e909b66fb05a4cdad405b144b041554f45664d3e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6340)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051499404


   
   ## CI report:
   
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051540581


   
   ## CI report:
   
   * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316)
 
   * e909b66fb05a4cdad405b144b041554f45664d3e UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1051540581


   
   ## CI report:
   
   * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316)
 
   * e909b66fb05a4cdad405b144b041554f45664d3e UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4910: [RFC-33] [HUDI-2429][Stacked on HUDI-2560] Support full Schema evolution for Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4910:
URL: https://github.com/apache/hudi/pull/4910#issuecomment-1050966417


   
   ## CI report:
   
   * 4e04e2076296e16e1b5b60b510ed85d536873a93 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6316)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051523638


   
   ## CI report:
   
   * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051437141


   
   ## CI report:
   
   * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310)
 
   * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4911: [HUDI-3460]Flink TM memory Optimization

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4911:
URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051516503


   
   ## CI report:
   
   * f275704b3dc2fbe99be692bc8c4d2cef383664f0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6336)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4911: [HUDI-3460]Flink TM memory Optimization

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4911:
URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051437185


   
   ## CI report:
   
   * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313)
 
   * f275704b3dc2fbe99be692bc8c4d2cef383664f0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6336)
 
   
   




[GitHub] [hudi] xushiyan commented on a change in pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


xushiyan commented on a change in pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#discussion_r815258027



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##
@@ -269,25 +300,25 @@ public static GenericRecord generateGenericRecord(String 
rowKey, String partitio
 rec.put("partition_path", partitionPath);
 rec.put("rider", riderName);
 rec.put("driver", driverName);
-rec.put("begin_lat", RAND.nextDouble());
-rec.put("begin_lon", RAND.nextDouble());
-rec.put("end_lat", RAND.nextDouble());
-rec.put("end_lon", RAND.nextDouble());
+rec.put("begin_lat", r.nextDouble());
+rec.put("begin_lon", r.nextDouble());
+rec.put("end_lat", r.nextDouble());
+rec.put("end_lon", r.nextDouble());
 if (isFlattened) {
-  rec.put("fare", RAND.nextDouble() * 100);
+  rec.put("fare", r.nextDouble() * 100);
   rec.put("currency", "USD");
 } else {
-  rec.put("distance_in_meters", RAND.nextInt());
-  rec.put("seconds_since_epoch", RAND.nextLong());
-  rec.put("weight", RAND.nextFloat());
+  rec.put("distance_in_meters", r.nextInt());
+  rec.put("seconds_since_epoch", r.nextLong());
+  rec.put("weight", r.nextFloat());
   byte[] bytes = "Canada".getBytes();
   rec.put("nation", ByteBuffer.wrap(bytes));
-  long currentTimeMillis = System.currentTimeMillis();
-  Date date = new Date(currentTimeMillis);
+  long randomMillis = genRandomTimeMillis(r);
+  Date date = new Date(randomMillis);

Review comment:
   It's probably LocalDate we need








[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051483851


   
   ## CI report:
   
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051499404


   
   ## CI report:
   
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   * 2c85afe1c7a84f3e7d55a8111bc6a6e9a0214c16 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051495717


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6338)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051413537


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333)
 
   
   




[GitHub] [hudi] alexeykudinkin commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


alexeykudinkin commented on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051494793


   @hudi-bot run azure






[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


alexeykudinkin commented on a change in pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#discussion_r815255941



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java
##
@@ -269,25 +300,25 @@ public static GenericRecord generateGenericRecord(String 
rowKey, String partitio
 rec.put("partition_path", partitionPath);
 rec.put("rider", riderName);
 rec.put("driver", driverName);
-rec.put("begin_lat", RAND.nextDouble());
-rec.put("begin_lon", RAND.nextDouble());
-rec.put("end_lat", RAND.nextDouble());
-rec.put("end_lon", RAND.nextDouble());
+rec.put("begin_lat", r.nextDouble());
+rec.put("begin_lon", r.nextDouble());
+rec.put("end_lat", r.nextDouble());
+rec.put("end_lon", r.nextDouble());
 if (isFlattened) {
-  rec.put("fare", RAND.nextDouble() * 100);
+  rec.put("fare", r.nextDouble() * 100);
   rec.put("currency", "USD");
 } else {
-  rec.put("distance_in_meters", RAND.nextInt());
-  rec.put("seconds_since_epoch", RAND.nextLong());
-  rec.put("weight", RAND.nextFloat());
+  rec.put("distance_in_meters", r.nextInt());
+  rec.put("seconds_since_epoch", r.nextLong());
+  rec.put("weight", r.nextFloat());
   byte[] bytes = "Canada".getBytes();
   rec.put("nation", ByteBuffer.wrap(bytes));
-  long currentTimeMillis = System.currentTimeMillis();
-  Date date = new Date(currentTimeMillis);
+  long randomMillis = genRandomTimeMillis(r);
+  Date date = new Date(randomMillis);

Review comment:
   Yeah, no problem. Which one are you referring to? There's no such thing 
as `DateTime` in java.time
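
   For illustration only, the exchange above could be sketched roughly as 
follows. This is a minimal sketch, not the code in the PR: the class name 
`SeededDateSketch`, the millisecond bounds, and the body of 
`genRandomTimeMillis` are assumptions (only the method name appears in the 
diff), and UTC is picked arbitrarily as the zone for the conversion. It shows 
that a seeded `java.util.Random` yields the same timestamp on every run, and 
that `java.time.LocalDate` can be derived from it without `java.util.Date`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Random;

final class SeededDateSketch {
  // Hypothetical bounds, chosen only for this sketch.
  private static final long MIN_MILLIS = 0L;             // 1970-01-01T00:00:00Z
  private static final long MAX_MILLIS = 4102444800000L; // 2100-01-01T00:00:00Z

  // Assumed implementation: the diff only shows the call site, not the body.
  // Scales nextDouble(), which is in [0, 1), into [MIN_MILLIS, MAX_MILLIS).
  static long genRandomTimeMillis(Random r) {
    return MIN_MILLIS + (long) (r.nextDouble() * (MAX_MILLIS - MIN_MILLIS));
  }

  // Converts epoch milliseconds to a calendar date in UTC.
  static LocalDate toLocalDate(long epochMillis) {
    return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
  }

  public static void main(String[] args) {
    // Two generators with the same seed produce the same sequence.
    Random first = new Random(42L);
    Random second = new Random(42L);
    LocalDate a = toLocalDate(genRandomTimeMillis(first));
    LocalDate b = toLocalDate(genRandomTimeMillis(second));
    System.out.println(a.equals(b)); // prints "true"
  }
}
```

   Passing the `Random` instance in as a parameter, as the diff does with `r`, 
keeps the seed under the caller's control, which is what makes the generated 
records reproducible across runs.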








[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4818: [HUDI-3396] Make sure `BaseFileOnlyViewRelation` only reads projected columns

2022-02-25 Thread GitBox


alexeykudinkin commented on a change in pull request #4818:
URL: https://github.com/apache/hudi/pull/4818#discussion_r815255066



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
##
@@ -348,6 +355,21 @@ protected HoodieWriteConfig getConfig(Boolean autoCommit, 
Boolean rollbackUsingM
 .withRollbackUsingMarkers(rollbackUsingMarkers);
   }
 
+  protected Dataset toDataset(List records) {

Review comment:
   Good call

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileScanRDD.scala
##
@@ -18,56 +18,37 @@
 
 package org.apache.hudi
 
-import org.apache.spark.{Partition, TaskContext}
-import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.execution.QueryExecutionException
-import org.apache.spark.sql.{Row, SparkSession}
 import org.apache.spark.sql.execution.datasources.{FilePartition, 
PartitionedFile, SchemaColumnConvertNotSupportedException}
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.{Partition, TaskContext}
 
 /**
- * Similar to [[org.apache.spark.sql.execution.datasources.FileScanRDD]].
- *
- * This class will extract the fields needed according to [[requiredColumns]] 
and
- * return iterator of [[org.apache.spark.sql.Row]] directly.
+ * TODO eval if we actually need it

Review comment:
   This will be cleaned up in a stacked PR

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileScanRDD.scala
##
@@ -93,17 +74,8 @@ class HoodieFileScanRDD(
 // Register an on-task-completion callback to close the input stream.
 context.addTaskCompletionListener[Unit](_ => iterator.close())
 
-// extract required columns from row
-val iterAfterExtract = HoodieDataSourceHelper.extractRequiredSchema(

Review comment:
   This utility itself will be cleaned up in a stacked PR

##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestParquetColumnProjection.scala
##
@@ -0,0 +1,293 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.avro.Schema
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, 
DefaultSource, HoodieBaseRelation, HoodieSparkUtils, HoodieUnsafeRDD}
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.testutils.{HadoopMapRedUtils, 
HoodieTestDataGenerator}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.NonpartitionedKeyGenerator
+import org.apache.hudi.testutils.SparkClientFunctionalTestHarness
+import org.apache.parquet.hadoop.util.counters.BenchmarkCounter
+import org.apache.spark.HoodieUnsafeRDDUtils
+import org.apache.spark.sql.{Dataset, Row, SaveMode}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{Tag, Test}
+
+import scala.collection.JavaConverters._
+
+@Tag("functional")
+class TestParquetColumnProjection extends SparkClientFunctionalTestHarness {
+
+  val defaultWriteOpts = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+"hoodie.bulkinsert.shuffle.parallelism" -> "2",
+"hoodie.delete.shuffle.parallelism" -> "1",
+DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+DataSourceWriteOptions.PRECOMBINE_FIELD.key -> "timestamp",
+HoodieWriteConfig.TBL_NAME.key -> "hoodie_test",
+HoodieMetadataConfig.ENABLE.key -> "true",
+// NOTE: It's critical that we use non-partitioned table, since the way we 
track amount of bytes read
+//   is not robust, and works most reliably only when we read just a 
single file. As such, making table
+//   non-partitioned makes it much more likely just a single file will 
be written
+DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> 

[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051481439


   
   ## CI report:
   
   * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251)
 
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051483851


   
   ## CI report:
   
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051481439


   
   ## CI report:
   
   * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251)
 
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   * 180ea55d8c08a4933202dbb3cd2cc87b06e0ef3d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051475251


   
   ## CI report:
   
   * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251)
 
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051472573


   
   ## CI report:
   
   * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251)
 
   * 19e5414e6de587e6c941e818e6961a96057b5e7f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051475251


   
   ## CI report:
   
   * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251)
 
   * 19e5414e6de587e6c941e818e6961a96057b5e7f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6337)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1049425839


   
   ## CI report:
   
   * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4866: [HUDI-3469] Refactor `HoodieTestDataGenerator` to provide for reproducible Builds

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4866:
URL: https://github.com/apache/hudi/pull/4866#issuecomment-1051472573


   
   ## CI report:
   
   * e6627d184210cb0949ed4f378a0e55fcff3823af Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6251)
 
   * 19e5414e6de587e6c941e818e6961a96057b5e7f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-347) Fix TestHoodieClientOnCopyOnWriteStorage Tests with modular private methods

2022-02-25 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498382#comment-17498382
 ] 

sivabalan narayanan commented on HUDI-347:
--

Yes, makes sense. Please go ahead.

> Fix TestHoodieClientOnCopyOnWriteStorage Tests with modular private methods
> ---
>
> Key: HUDI-347
> URL: https://issues.apache.org/jira/browse/HUDI-347
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing, writer-core
>Reporter: sivabalan narayanan
>Assignee: Rajesh
>Priority: Major
>  Labels: new-to-hudi
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-3409) Expose Timeline Server Metrics

2022-02-25 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498381#comment-17498381
 ] 

sivabalan narayanan commented on HUDI-3409:
---

Yes, sure, that makes sense.

> Expose Timeline Server Metrics
> --
>
> Key: HUDI-3409
> URL: https://issues.apache.org/jira/browse/HUDI-3409
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: timeline-server
>Reporter: DarAmani Swift
>Assignee: Rajesh
>Priority: Major
>  Labels: new-to-hudi
>
> Timeline server metrics are pushed to the local registry but never reach the 
> reporters. Exposing these metrics would greatly help in debugging latency 
> around async processes and timeline server syncs. 
> Metrics are already captured in the [Request Handler|https://github.com/apache/hudi/blob/master/hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java#L527-L531].
>  





[GitHub] [hudi] hudi-bot removed a comment on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051434239


   
   ## CI report:
   
   * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310)
 
   * 33493276318c24c12d2e78ed719b0cb794c8b656 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4911: [HUDI-3460]Flink TM memory Optimization

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4911:
URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051437185


   
   ## CI report:
   
   * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313)
 
   * f275704b3dc2fbe99be692bc8c4d2cef383664f0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6336)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051437141


   
   ## CI report:
   
   * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310)
 
   * 33493276318c24c12d2e78ed719b0cb794c8b656 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6335)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4911: [HUDI-3460]Flink TM memory Optimization

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4911:
URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051434289


   
   ## CI report:
   
   * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313)
 
   * f275704b3dc2fbe99be692bc8c4d2cef383664f0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4911: [HUDI-3460]Flink TM memory Optimization

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4911:
URL: https://github.com/apache/hudi/pull/4911#issuecomment-1050892174


   
   ## CI report:
   
   * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4911: [HUDI-3460]Flink TM memory Optimization

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4911:
URL: https://github.com/apache/hudi/pull/4911#issuecomment-1051434289


   
   ## CI report:
   
   * 17d293ebacd7125656795cc04815e5bd41a1666a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6313)
 
   * f275704b3dc2fbe99be692bc8c4d2cef383664f0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1050837761


   
   ## CI report:
   
   * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4909: [HUDI-3516][HUDI-FLINK]Optimize the memory of HoodieDataBlock for Flink

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4909:
URL: https://github.com/apache/hudi/pull/4909#issuecomment-1051434239


   
   ## CI report:
   
   * 8e1f10b144a87e8e262c19d1e408cdb4829c8fcf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6310)
 
   * 33493276318c24c12d2e78ed719b0cb794c8b656 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] allenxyang commented on a change in pull request #4679: [HUDI-3315] RFC-35 Part-1 Support bucket index in Flink writer

2022-02-25 Thread GitBox


allenxyang commented on a change in pull request #4679:
URL: https://github.com/apache/hudi/pull/4679#discussion_r814511291



##
File path: hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java
##
@@ -189,21 +191,32 @@
   }
 
   public static DataStream hoodieStreamWrite(Configuration conf, int defaultParallelism, DataStream dataStream) {
-WriteOperatorFactory operatorFactory = StreamWriteOperator.getFactory(conf);
-return dataStream
+if (OptionsResolver.isBucketIndexType(conf)) {

Review comment:
   After applying this patch, I get the following error, and I don't know why.
   
   Caused by: java.lang.RuntimeException: The timer service has not been initialized.
   at org.apache.flink.streaming.api.operators.AbstractStreamOperator.getInternalTimerService(AbstractStreamOperator.java:616)
   at org.apache.flink.streaming.api.operators.KeyedProcessOperator.open(KeyedProcessOperator.java:62)
   at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:428)
   at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:555)
   at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
   at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:545)
   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:585)
   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:765)
   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:580)








[GitHub] [hudi] hudi-bot commented on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051421561


   
   ## CI report:
   
   * 55db32bf4b6aa3796be90879815328ae376dd606 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6334)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051371368


   
   ## CI report:
   
   * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331)
 
   * 55db32bf4b6aa3796be90879815328ae376dd606 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6334)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051413537


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051369841


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255)
 
   * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] nsivabalan commented on a change in pull request #4818: [HUDI-3396] Make sure `BaseFileOnlyViewRelation` only reads projected columns

2022-02-25 Thread GitBox


nsivabalan commented on a change in pull request #4818:
URL: https://github.com/apache/hudi/pull/4818#discussion_r815229255



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkClientFunctionalTestHarness.java
##
@@ -348,6 +355,21 @@ protected HoodieWriteConfig getConfig(Boolean autoCommit, Boolean rollbackUsingM
 .withRollbackUsingMarkers(rollbackUsingMarkers);
   }
 
+  protected Dataset toDataset(List records) {

Review comment:
   Can we take the Avro schema as an argument? Maybe create another 
overloaded method that calls into this one with some default Avro schema.
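
   A minimal sketch of the overload pattern suggested above, assuming plain
Java types in place of Spark `Dataset` and Avro `Schema`. `OverloadSketch`,
`DEFAULT_SCHEMA` and the string-tagging body are hypothetical and are not
taken from the PR:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverloadSketch {
  // Hypothetical default; stands in for whatever default Avro schema the harness would use.
  static final String DEFAULT_SCHEMA = "default-schema";

  // Convenience overload: delegates to the schema-aware variant with the default schema.
  static List<String> toDataset(List<String> records) {
    return toDataset(records, DEFAULT_SCHEMA);
  }

  // Schema-aware variant: tags each record with the schema it was converted under.
  static List<String> toDataset(List<String> records, String schema) {
    List<String> out = new ArrayList<>();
    for (String record : records) {
      out.add(schema + ":" + record);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(toDataset(Arrays.asList("a", "b")));
    System.out.println(toDataset(Arrays.asList("a"), "custom-schema"));
  }
}
```

   Existing callers keep using the one-argument form, while tests that need a
specific schema call the two-argument form directly.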

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/BaseFileOnlyViewRelation.scala
##
@@ -89,18 +108,46 @@ class BaseFileOnlyViewRelation(
   inMemoryFileIndex.listFiles(partitionFilters, dataFilters)
 }
 
-val partitionFiles = partitionDirectories.flatMap { partition =>
+val partitions = partitionDirectories.flatMap { partition =>
   partition.files.flatMap { file =>
+// TODO move to adapter
+// TODO fix, currently assuming parquet as underlying format
 HoodieDataSourceHelper.splitFiles(
   sparkSession = sparkSession,
   file = file,
-  partitionValues = partition.values
+  // TODO clarify why this is required
+  partitionValues = InternalRow.empty

Review comment:
   I see why we are doing this. Do you think we can change 
   HoodieDataSourceHelper.splitFiles() so that it no longer takes the last 
argument and directly sets InternalRow.empty when creating the PartitionedFile? 
I guess this is the only usage/caller, so it should be safe to do.

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieDataSourceHelper.scala
##
@@ -33,43 +33,6 @@ import scala.collection.JavaConverters._
 
 object HoodieDataSourceHelper extends PredicateHelper {
 
-  /**
-   * Partition the given condition into two sequence of conjunctive predicates:
-   * - predicates that can be evaluated using metadata only.
-   * - other predicates.
-   */
-  def splitPartitionAndDataPredicates(
-  spark: SparkSession,
-  condition: Expression,
-  partitionColumns: Seq[String]): (Seq[Expression], Seq[Expression]) = {
-splitConjunctivePredicates(condition).partition(
-  isPredicateMetadataOnly(spark, _, partitionColumns))
-  }
-
-  /**
-   * Check if condition can be evaluated using only metadata. In Delta, this means the condition
-   * only references partition columns and involves no subquery.
-   */
-  def isPredicateMetadataOnly(

Review comment:
   I guess it was copied over from here.

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieUnsafeRDD.scala
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.{Partition, SparkContext, TaskContext}
+
+/**
+ * !!! PLEASE READ CAREFULLY !!!
+ *
+ * Base class for all of the custom low-overhead RDD implementations for Hudi.
+ *
+ * To keep memory allocation footprint as low as possible, each inheritor of this RDD base class
+ *
+ * 
+ *   1. Does NOT deserialize from [[InternalRow]] to [[Row]] (therefore only providing access to
+ *   Catalyst internal representations (often mutable) of the read row)
+ *
+ *   2. DOES NOT COPY UNDERLYING ROW OUT OF THE BOX, meaning that
+ *
+ *  a) access to this RDD is NOT thread-safe
+ *
+ *  b) iterating over it reference to a _mutable_ underlying instance (of [[InternalRow]]) is
+ *  returned, entailing that after [[Iterator#next()]] is invoked on the provided iterator,
+ *  previous reference becomes **invalid**. Therefore, you will have to copy underlying mutable
+ *  instance of [[InternalRow]] if you plan to access it after [[Iterator#next()]] is invoked (filling
+ *  it with the next row's payload)
+ *
+ *  c) due to item b) above, no operation other than the iteration will produce meaningful
+ *  results on it and 

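The row-reuse caveat in item b) of the Scaladoc above can be illustrated
without Spark. A minimal sketch, assuming an `int[]` buffer stands in for the
mutable `InternalRow`; `ReusedRowSketch` and its methods are hypothetical
names, not Hudi or Spark APIs:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReusedRowSketch {
  // Iterator that overwrites and returns the SAME buffer on every next() call,
  // mimicking a reader that reuses one mutable row instance.
  static Iterator<int[]> reusingIterator(final int[][] source) {
    final int[] buffer = new int[source.length == 0 ? 0 : source[0].length];
    return new Iterator<int[]>() {
      private int i = 0;

      @Override
      public boolean hasNext() {
        return i < source.length;
      }

      @Override
      public int[] next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        System.arraycopy(source[i++], 0, buffer, 0, buffer.length);
        return buffer;
      }
    };
  }

  // Collects rows; with copy == false every list element aliases the shared buffer.
  static List<int[]> collect(Iterator<int[]> it, boolean copy) {
    List<int[]> out = new ArrayList<>();
    while (it.hasNext()) {
      int[] row = it.next();
      out.add(copy ? row.clone() : row);
    }
    return out;
  }

  public static void main(String[] args) {
    int[][] rows = {{1, 2}, {3, 4}};
    // Without copying, both entries point at the buffer holding the last row.
    System.out.println(collect(reusingIterator(rows), false).get(0)[0]);
    // With copying, the first row is preserved.
    System.out.println(collect(reusingIterator(rows), true).get(0)[0]);
  }
}
```

Collecting without a copy leaves every element pointing at the last row read,
which is why a reference held across `next()` has to be copied first.
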
[GitHub] [hudi] hudi-bot removed a comment on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051367237


   
   ## CI report:
   
   * 402156b00dfb5d49593f80df9d18656e08fd3dcb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6332)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051405803


   
   ## CI report:
   
   * 402156b00dfb5d49593f80df9d18656e08fd3dcb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6332)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] danny0405 closed issue #4757: [SUPPORT] Support flink 1.14 in future

2022-02-25 Thread GitBox


danny0405 closed issue #4757:
URL: https://github.com/apache/hudi/issues/4757


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051369994


   
   ## CI report:
   
   * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331)
 
   * 55db32bf4b6aa3796be90879815328ae376dd606 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051371368


   
   ## CI report:
   
   * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331)
 
   * 55db32bf4b6aa3796be90879815328ae376dd606 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6334)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051333479


   
   ## CI report:
   
   * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051369994


   
   ## CI report:
   
   * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331)
 
   * 55db32bf4b6aa3796be90879815328ae376dd606 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051369841


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255)
 
   * 2f65bc5fe37942aa72721f19a483194da02ba912 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6333)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051359116


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255)
 
   * 2f65bc5fe37942aa72721f19a483194da02ba912 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051367237


   
   ## CI report:
   
   * 402156b00dfb5d49593f80df9d18656e08fd3dcb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6332)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051365831


   
   ## CI report:
   
   * 402156b00dfb5d49593f80df9d18656e08fd3dcb UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4915:
URL: https://github.com/apache/hudi/pull/4915#issuecomment-1051365831


   
   ## CI report:
   
   * 402156b00dfb5d49593f80df9d18656e08fd3dcb UNKNOWN
   
   




[GitHub] [hudi] zhedoubushishi opened a new pull request #4915: [WIP] Allow loading external configs while querying Hudi tables with Spark

2022-02-25 Thread GitBox


zhedoubushishi opened a new pull request #4915:
URL: https://github.com/apache/hudi/pull/4915


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot commented on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1051359116


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255)
 
   * 2f65bc5fe37942aa72721f19a483194da02ba912 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4739: [HUDI-3365] Make sure Metadata Table records are updated appropriately on HDFS

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4739:
URL: https://github.com/apache/hudi/pull/4739#issuecomment-1049487198


   
   ## CI report:
   
   * 11f1b688459ab9017ebde2a38d1645e0f59b50c3 UNKNOWN
   * c243f70d774b7ecb059dad4bb03870b2c2d4436b UNKNOWN
   * 2790e24a229e808602113c7ed80932b09e56c8fd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6255)
 
   
   




[jira] [Created] (HUDI-3519) Make sure every public Hudi Client Method invokes necessary prologue

2022-02-25 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-3519:
-

 Summary: Make sure every public Hudi Client Method invokes 
necessary prologue
 Key: HUDI-3519
 URL: https://issues.apache.org/jira/browse/HUDI-3519
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Alexey Kudinkin


Right now, only a handful of operations actually invoke the "prologue" method, 
which performs, for example:
 # Checks on whether the table needs to be upgraded
 # Bootstrapping of the MDT (if necessary)

as well as some other minor bookkeeping. As part of 
[https://github.com/apache/hudi/pull/4739], I had to address that and 
introduced a universal method, `initTable`, that serves as such a prologue.

However, while I've injected it into most major public methods of the Hudi 
Client's base class, we need to carefully and holistically review all remaining 
exposed *public* methods and make sure that all _public-facing_ operations 
(insert, upsert, commit, delete, rollback, clean, etc.) invoke the prologue 
properly.
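
One possible way to make the prologue hard to forget is to route every public operation through a single wrapper that runs it first. The sketch below is only an illustration under that assumption: `PrologueSketch`, `Client`, `withPrologue`, and the string-based call log are hypothetical, and `initTable` here is a stand-in that shares only its name with the method mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class PrologueSketch {

    /** Hypothetical client; not the actual Hudi client class. */
    static class Client {
        // Records the order of calls so the prologue's position is observable.
        final List<String> log = new ArrayList<>();

        // Stand-in prologue (assumption): a real one would check whether the
        // table needs an upgrade, bootstrap the MDT if necessary, and so on.
        private void initTable(String operation) {
            log.add("initTable:" + operation);
        }

        // Single entry point that guarantees the prologue runs before the body.
        private void withPrologue(String operation, Runnable body) {
            initTable(operation);
            body.run();
        }

        public void upsert(String record) {
            withPrologue("upsert", () -> log.add("upsert:" + record));
        }

        public void delete(String key) {
            withPrologue("delete", () -> log.add("delete:" + key));
        }
    }

    public static void main(String[] args) {
        Client client = new Client();
        client.upsert("r1");
        client.delete("k1");
        // Each operation in the log is preceded by its prologue entry.
        System.out.println(client.log);
    }
}
```

With this shape, a review of the public surface reduces to checking that each public method delegates to the wrapper.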



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051299304


   
   ## CI report:
   
   * b4cc5c73732106ad1528ef52b52e3dedfcc305d1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6325)
 
   * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051333479


   
   ## CI report:
   
   * 90895141bcecf4a6f966faf73b2c0fa609290281 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6331)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4468: [HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas

2022-02-25 Thread GitBox


hudi-bot commented on pull request #4468:
URL: https://github.com/apache/hudi/pull/4468#issuecomment-1051316141


   
   ## CI report:
   
   * 2461ceb7bb042694782128370df1e96f05a71391 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6330)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4468: [HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4468:
URL: https://github.com/apache/hudi/pull/4468#issuecomment-1051280091


   
   ## CI report:
   
   * 59331875dcbfb6d64a6cca6e2794c7be565bc97d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6329)
 
   * 2461ceb7bb042694782128370df1e96f05a71391 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6330)
 
   
   




[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4724: [HUDI-2815] add partial overwrite payload to support partial overwrit…

2022-02-25 Thread GitBox


alexeykudinkin commented on a change in pull request #4724:
URL: https://github.com/apache/hudi/pull/4724#discussion_r815187048



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java
##
@@ -58,6 +58,31 @@ default T preCombine(T oldValue, Properties properties) {
 return preCombine(oldValue);
   }
 
+  /**
+   *When more than one HoodieRecord have the same HoodieKey in the incoming batch, this function combines them before attempting to insert/upsert by taking in a property map.
+   *
+   * @param oldValue instance of the old {@link HoodieRecordPayload} to be combined with.
+   * @param properties Payload related properties. For example pass the ordering field(s) name to extract from value in storage.
+   * @param schema Schema used for record
+   * @return the combined value
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
+  default T preCombine(T oldValue, Properties properties, Schema schema) {

Review comment:
   Let me try to clarify a few things: 
   
   `preCombine` has a _very specific_ semantic: it de-duplicates by picking the 
"most recent" among the records in the batch. The expectation is always that, 
when handed 2 records, it **has to** return one of them; it cannot produce a 
new record. If we want to revisit this semantic, that is a far larger change 
that will surely require writing an RFC and a broader discussion regarding the 
merits of such a migration. Please also keep in mind that as of RFC-46 there's 
an effort underway to abstract the whole "record combination/merging" semantic 
out of the `RecordPayload` hierarchy into a standalone Combination/Merge Engine 
API.
   
   > First, from the description of preCombine method, it used for combining 
multiple records with same HoodieKey before attempting to insert/upsert to 
disk. The "combine multiple records" might not mean only choosing one of them, 
we also can combine & merged them to a new one, just depends on how the 
sub-class implement the preCombine logic(Please correct me if my understanding 
is wrong :) ). Yeah, it might be a little bit confused that we need Schema if 
we are trying to merged them.
   
   Please see my comment regarding the `preCombine` semantic above. I certainly 
agree with you that the name is confusing, but I've tried to clear up that 
confusion. Let me know if you have more questions about it.
   
   > Second, I checked when will we call preCombine method is trying to 
duplicate records with same HoodieKey before insert/update to disk, especially 
in Flink write case, even through the duplicated logic is choose the latest 
record, but we need to ensure that one HoodieKey should only contains one 
record before comparing to existing record and write to disk, otherwise, some 
records will missed. For example, in HoodieMergeHandle.init(fieId, 
newRecordsIter), it will convert the record iterator to a map and treat the 
recordKey as key. So we might not stop de-duping logics and merge them against 
what is on disk unless we change the logic here. And also we implement another 
class/method to handle the merge logic, and switch the existing de-duping logic 
from calling preCombine to new class/method, we have to add an condition to 
control whether should we call preCombine or not, I think it might not a good 
way. Instead, we should handle it in preCombine method by different implemented 
payload.
   
   You're bringing up good points; let's dive into them one by one. Currently 
we have 2 mechanisms:
   
   1. `preCombine`, which selects the "most recent" record among those having 
the same key within the batch
   2. `combineAndGetUpdateValue`, which combines the previous or "historical" 
record (on disk) with the new incoming one (all partial-merging semantics are 
currently implemented in this method)
   
   You rightfully mention that one of the current invariants is that the batch 
is de-duped at a certain level (because we have to maintain PK uniqueness on 
disk), and so we might need to shift that to accommodate the case that you 
have. And that's exactly what my question was: if you can elaborate on the 
use-case you have at hand that you're trying to solve with this PR, I would be 
able to better understand where you're coming from and what the best path 
forward is for us here.
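
To make the distinction between the two mechanisms concrete, here is a minimal, self-contained sketch. It does not use the Hudi API: `CombineSketch`, `Rec`, `pickLatest`, `dedupe`, and `mergeWithStored` are hypothetical names, `null` fields are assumed to mean "not set in this update", and the ordering value is assumed to be a plain `long`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombineSketch {

    /** Hypothetical stand-in for a record payload. */
    static final class Rec {
        final String key;
        final long orderingVal;
        final String name;  // null means "not set in this update" (assumption)
        final Integer age;  // null means "not set in this update" (assumption)

        Rec(String key, long orderingVal, String name, Integer age) {
            this.key = key;
            this.orderingVal = orderingVal;
            this.name = name;
            this.age = age;
        }
    }

    /** preCombine-style selection: returns one of its inputs, never a new record. */
    static Rec pickLatest(Rec current, Rec candidate) {
        return candidate.orderingVal >= current.orderingVal ? candidate : current;
    }

    /** De-duplicates a batch so that each key maps to exactly one record. */
    static Map<String, Rec> dedupe(List<Rec> batch) {
        Map<String, Rec> byKey = new LinkedHashMap<>();
        for (Rec r : batch) {
            byKey.merge(r.key, r, CombineSketch::pickLatest);
        }
        return byKey;
    }

    /** combineAndGetUpdateValue-style merge: may build a new record from stored + incoming. */
    static Rec mergeWithStored(Rec stored, Rec incoming) {
        return new Rec(
            stored.key,
            Math.max(stored.orderingVal, incoming.orderingVal),
            incoming.name != null ? incoming.name : stored.name,
            incoming.age != null ? incoming.age : stored.age);
    }

    public static void main(String[] args) {
        List<Rec> batch = Arrays.asList(
            new Rec("k1", 1L, "a", null),
            new Rec("k1", 2L, null, 30));
        Rec selected = dedupe(batch).get("k1");
        Rec merged = mergeWithStored(new Rec("k1", 0L, "old", 20), selected);
        System.out.println(merged.name + " " + merged.age);
    }
}
```

In this sketch, a field set only by the older record in the batch is dropped at the selection step, before the merge against the stored record ever sees it. That may be the kind of gap a partial-update use-case runs into, but whether it matches the scenario behind this PR is what the questions below are meant to establish.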
   
   The questions I'm looking for answers to are basically the following:
   
   1. What's the nature of your use-case? (domain, record types, frequency, 
size, etc.)
   2. Where are the requirements for partial updates coming from?
   
   and so on. I'm happy to set aside 30 minutes to talk in person about this, 
or to connect on Slack and discuss it there.





[GitHub] [hudi] hudi-bot removed a comment on pull request #4907: [WIP][CI Test Only][HUDI-1180] Upgrade HBase to 2.4.9

2022-02-25 Thread GitBox


hudi-bot removed a comment on pull request #4907:
URL: https://github.com/apache/hudi/pull/4907#issuecomment-1051297656


   
   ## CI report:
   
   * b4cc5c73732106ad1528ef52b52e3dedfcc305d1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6325)
 
   * 90895141bcecf4a6f966faf73b2c0fa609290281 UNKNOWN
   
   



