[GitHub] [hudi] hudi-bot removed a comment on pull request #4611: [HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4611:
URL: https://github.com/apache/hudi/pull/4611#issuecomment-1034558510


   
   ## CI report:
   
   * 5feee50ea8accfc643e1d9fd607e9e605cd97a40 UNKNOWN
   * 0abd457fb40ee1af67da5d283cd6c09e6e07dac9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5490)
 
   * e93af4603cc620726fc34a7d4aa482ee0acb7c41 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5865)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4611: [HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4611:
URL: https://github.com/apache/hudi/pull/4611#issuecomment-1034600357


   
   ## CI report:
   
   * 5feee50ea8accfc643e1d9fd607e9e605cd97a40 UNKNOWN
   * e93af4603cc620726fc34a7d4aa482ee0acb7c41 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5865)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4783: [HUDI-3333] fix that getNestedFieldVal breaks with Spark 3.2

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4783:
URL: https://github.com/apache/hudi/pull/4783#issuecomment-1034596256


   
   ## CI report:
   
   * f2fd0ecbb5066e5a1fa0ef243dbbade950e28230 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4783: [HUDI-3333] fix that getNestedFieldVal breaks with Spark 3.2

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4783:
URL: https://github.com/apache/hudi/pull/4783#issuecomment-1034598441


   
   ## CI report:
   
   * f2fd0ecbb5066e5a1fa0ef243dbbade950e28230 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5868)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4783: [HUDI-3333] fix that getNestedFieldVal breaks with Spark 3.2

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4783:
URL: https://github.com/apache/hudi/pull/4783#issuecomment-1034596256


   
   ## CI report:
   
   * f2fd0ecbb5066e5a1fa0ef243dbbade950e28230 UNKNOWN
   
   




[jira] [Updated] (HUDI-3333) getNestedFieldVal breaks with Spark 3.2

2022-02-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-:
-
Labels: pull-request-available  (was: )

> getNestedFieldVal breaks with Spark 3.2
> ---
>
> Key: HUDI-
> URL: https://issues.apache.org/jira/browse/HUDI-
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When `returnNullIfNotFound` is set to true, the method still throws an exception.
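
The contract described in the report can be illustrated with a minimal sketch. It is not Hudi's implementation: it walks plain nested `Map` values rather than Avro records, and the class `NestedFieldLookupSketch` and method `lookup` are hypothetical names chosen for illustration. The point is only that a missing path segment should yield `null` when the flag is true and an exception otherwise.

```java
import java.util.Map;

// Hypothetical stand-in for a nested-field accessor; not Hudi code.
final class NestedFieldLookupSketch {
    private NestedFieldLookupSketch() {}

    /**
     * Walks a dot-separated path through nested maps.
     * If a segment is missing, returns null when returnNullIfNotFound is true,
     * otherwise throws IllegalArgumentException.
     */
    static Object lookup(Map<String, Object> record, String path, boolean returnNullIfNotFound) {
        Object current = record;
        for (String part : path.split("\\.")) {
            boolean present = current instanceof Map && ((Map<?, ?>) current).containsKey(part);
            if (!present) {
                if (returnNullIfNotFound) {
                    return null; // honor the flag instead of throwing
                }
                throw new IllegalArgumentException("Field not found: " + part + " in path " + path);
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }
}
```

With a record `{a: {b: 42}}`, `lookup(record, "a.b", false)` yields 42, `lookup(record, "a.x", true)` yields `null`, and `lookup(record, "a.x", false)` throws.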



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] YannByron opened a new pull request #4783: [HUDI-3333] fix that getNestedFieldVal breaks with Spark 3.2

2022-02-09 Thread GitBox


YannByron opened a new pull request #4783:
URL: https://github.com/apache/hudi/pull/4783


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#issuecomment-1034543957


   
   ## CI report:
   
   * 1a7e348bfe3a36092e00177bb0440c05241871e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5834)
 
   * ccd1d89352a2f72feb381962718cc0c80920c041 UNKNOWN
   * d1b4a5ee6084e396ac4e74bb58c5303fe11f3b91 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5864)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


hudi-bot commented on pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#issuecomment-1034586681


   
   ## CI report:
   
   * ccd1d89352a2f72feb381962718cc0c80920c041 UNKNOWN
   * d1b4a5ee6084e396ac4e74bb58c5303fe11f3b91 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5864)
 
   
   




[GitHub] [hudi] cuibo01 commented on pull request #4699: [HUDI-3336][HUDI-FLINK] Support custom hadoop config options for flink

2022-02-09 Thread GitBox


cuibo01 commented on pull request #4699:
URL: https://github.com/apache/hudi/pull/4699#issuecomment-1034578682


   @danny0405 can u merge the PR? thx






[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034577123


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866)
 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034573420


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866)
 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034569940


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034573420


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866)
 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   
   




[GitHub] [hudi] danny0405 commented on issue #4221: [SUPPORT] hudi mor table has a lack of data

2022-02-09 Thread GitBox


danny0405 commented on issue #4221:
URL: https://github.com/apache/hudi/issues/4221#issuecomment-1034570360


   > can you explain why MoR possibly can have less without the patch
   
   The data is not lost; it was committed but not exposed by the reader view
before the patch.






[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034565025


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034569940


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4753: [HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4753:
URL: https://github.com/apache/hudi/pull/4753#issuecomment-1034566623


   
   ## CI report:
   
   * 9198c17de2ea5ca7331755b8df17c8fcbe66de68 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5817)
 
   * c94071e19feae0b56d3dbdd4329f947299b15221 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5867)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4753: [HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4753:
URL: https://github.com/apache/hudi/pull/4753#issuecomment-1034564935


   
   ## CI report:
   
   * 9198c17de2ea5ca7331755b8df17c8fcbe66de68 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5817)
 
   * c94071e19feae0b56d3dbdd4329f947299b15221 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034563213


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034565025


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4753: [HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4753:
URL: https://github.com/apache/hudi/pull/4753#issuecomment-1034564935


   
   ## CI report:
   
   * 9198c17de2ea5ca7331755b8df17c8fcbe66de68 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5817)
 
   * c94071e19feae0b56d3dbdd4329f947299b15221 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4753: [HUDI-3370] The files recorded in the commit may not match the actual ones for MOR Compaction

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4753:
URL: https://github.com/apache/hudi/pull/4753#issuecomment-1033361415


   
   ## CI report:
   
   * 9198c17de2ea5ca7331755b8df17c8fcbe66de68 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5817)
 
   
   




[GitHub] [hudi] yihua commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


yihua commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034563630


   CC @nsivabalan @manojpec @xushiyan @XuQianJin-Stars 






[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034563213


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034561491


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034561491


   
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   
   




[jira] [Updated] (HUDI-3398) Schema validation fails for metadata table base file

2022-02-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3398:
-
Labels: HUDI-bug pull-request-available  (was: HUDI-bug)

> Schema validation fails for metadata table base file
> 
>
> Key: HUDI-3398
> URL: https://issues.apache.org/jira/browse/HUDI-3398
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: writer-core
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: HUDI-bug, pull-request-available
> Fix For: 0.11.0
>
>
> Stacktrace:
> {code:java}
> java.lang.IllegalArgumentException: Unknown file format :file:/Users/ethan/Work/data/hudi/metadata_test_ds_mor_continuous_4/.hoodie/metadata/files/files-_0-93-815_20220208164926830001.hfile
>   at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:103)
>   at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:119)
>   at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:480)
>   at org.apache.hudi.common.table.TableSchemaResolver.<init>(TableSchemaResolver.java:65)
>   at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:682)
>   at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:698)
>   at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:173)
>   at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:154)
>   at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:663)
>   at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:675)
>   at org.apache.hudi.client.BaseHoodieWriteClient.lambda$writeTableMetadata$0(BaseHoodieWriteClient.java:273)
>   at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
>   at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:273)
>   at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:229)
>   at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:199)
>   at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:127)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:609)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:329)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:652)
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> full logs: https://gist.github.com/yihua/e00a1caddacbdc570b5b757049750f39





[GitHub] [hudi] hudi-bot commented on pull request #4611: [HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4611:
URL: https://github.com/apache/hudi/pull/4611#issuecomment-1034558510


   
   ## CI report:
   
   * 5feee50ea8accfc643e1d9fd607e9e605cd97a40 UNKNOWN
   * 0abd457fb40ee1af67da5d283cd6c09e6e07dac9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5490)
 
   * e93af4603cc620726fc34a7d4aa482ee0acb7c41 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5865)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4611: [HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4611:
URL: https://github.com/apache/hudi/pull/4611#issuecomment-1034547393


   
   ## CI report:
   
   * 5feee50ea8accfc643e1d9fd607e9e605cd97a40 UNKNOWN
   * 0abd457fb40ee1af67da5d283cd6c09e6e07dac9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5490)
 
   * e93af4603cc620726fc34a7d4aa482ee0acb7c41 UNKNOWN
   
   




[GitHub] [hudi] zhangyue19921010 opened a new pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

2022-02-09 Thread GitBox


zhangyue19921010 opened a new pull request #4782:
URL: https://github.com/apache/hudi/pull/4782


   Please look at https://issues.apache.org/jira/browse/HUDI-3398 for details.
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   The root cause is a bug introduced by https://github.com/apache/hudi/pull/4649:
   it does not handle HFile or ORC base files when checking `hasOperationField`.
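
To make the failure mode concrete, here is a minimal sketch of resolving a base file format from a file name so that HFile and ORC files are recognized instead of falling through to an "Unknown file format" error. It is an illustration under stated assumptions, not the change made in this pull request: the enum `BaseFileFormatSketch` and its `fromPath` method are hypothetical names, and matching on the `.parquet`, `.hfile` and `.orc` suffixes is an assumption about how a format could be detected.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Hudi.
enum BaseFileFormatSketch {
    PARQUET(".parquet"),
    HFILE(".hfile"),
    ORC(".orc");

    private final String extension;

    BaseFileFormatSketch(String extension) {
        this.extension = extension;
    }

    /** Maps a file path to a format by its suffix, rejecting anything else. */
    static BaseFileFormatSketch fromPath(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        for (BaseFileFormatSketch format : values()) {
            if (lower.endsWith(format.extension)) {
                return format;
            }
        }
        throw new IllegalArgumentException("Unknown file format :" + path);
    }
}
```

A caller could then switch on the returned value and pick a schema reader per format, rather than assuming every base file is Parquet.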
   
   Also adds a unit test (UT) for this patch.
   Without this patch, the UT fails:
   ```
   511  [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   6407 [main] WARN  org.apache.hudi.common.config.DFSPropertiesConfiguration  - Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   6439 [main] WARN  org.apache.hudi.common.config.DFSPropertiesConfiguration  - Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   7618 [main] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not found at path /var/folders/61/77xdhf3x0x9g3t_vdd1c9_nwr4wznp/T/hoodie_test_path480310162801304201/.hoodie/metadata
   24864 [main] WARN  org.apache.hudi.common.table.TableSchemaResolver  - Failed to read operation field from avro schema
   java.lang.IllegalArgumentException: Unknown file format :/var/folders/61/77xdhf3x0x9g3t_vdd1c9_nwr4wznp/T/hoodie_test_path480310162801304201/.hoodie/metadata/files/files-_0-90-85_20220210144836531001.hfile
       at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:103)
       at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:119)
       at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:480)
       at org.apache.hudi.common.table.TableSchemaResolver.<init>(TableSchemaResolver.java:65)
       at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:682)
       at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:698)
       at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:171)
       at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:154)
       at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:663)
       at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:675)
       at org.apache.hudi.client.BaseHoodieWriteClient.lambda$writeTableMetadata$0(BaseHoodieWriteClient.java:270)
       at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
       at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:270)
       at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:226)
       at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:197)
       at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
       at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:644)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:292)
       at org.apache.hudi.TestHoodieSparkSqlWriter.testTableSchemaResolver(TestHoodieSparkSqlWriter.scala:686)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
       at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
       at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
       at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
       at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
       at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
       at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
       at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
       at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationIntercepto

[GitHub] [hudi] hudi-bot removed a comment on pull request #4611: [HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4611:
URL: https://github.com/apache/hudi/pull/4611#issuecomment-1020807520


   
   ## CI report:
   
   * 5feee50ea8accfc643e1d9fd607e9e605cd97a40 UNKNOWN
   * 0abd457fb40ee1af67da5d283cd6c09e6e07dac9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5490)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4611: [HUDI-3254] Introduce HoodieCatalog to manage tables for Spark Datasource V2

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4611:
URL: https://github.com/apache/hudi/pull/4611#issuecomment-1034547393


   
   ## CI report:
   
   * 5feee50ea8accfc643e1d9fd607e9e605cd97a40 UNKNOWN
   * 0abd457fb40ee1af67da5d283cd6c09e6e07dac9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5490)
 
   * e93af4603cc620726fc34a7d4aa482ee0acb7c41 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] codope commented on a change in pull request #4746: [HUDI-3356][HUDI-3142] Metadata index initialization for bloom filters and column stats partitions

2022-02-09 Thread GitBox


codope commented on a change in pull request #4746:
URL: https://github.com/apache/hudi/pull/4746#discussion_r803335014



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##
@@ -843,28 +840,37 @@ protected void cleanIfNecessary(BaseHoodieWriteClient writeClient, String instan
   }
 
   /**
-   * This is invoked to bootstrap metadata table for a dataset. Bootstrap Commit has special handling mechanism due to its scale compared to
+   * This is invoked to initialize metadata table for a dataset. Bootstrap Commit has special handling mechanism due to its scale compared to
    * other regular commits.
-   *
    */
-  protected void bootstrapCommit(List partitionInfoList, String createInstantTime) {
-    List partitions = partitionInfoList.stream().map(p ->
-        p.getRelativePath().isEmpty() ? NON_PARTITIONED_NAME : p.getRelativePath()).collect(Collectors.toList());
-    final int totalFiles = partitionInfoList.stream().mapToInt(p -> p.getTotalFiles()).sum();
+  private void initialCommit(String createInstantTime) {

Review comment:
   +1 for extracting the actual commit part out of bootstrap (initialize 
   filegroup). This will help me with async indexing.
   How about adding partitionType as another parameter and making 
   `initialCommit` part of the `HoodieTableMetadataWriter` interface? That way we 
   can call it from the write client or any action executor if needed. It is not 
   necessary to do this in this PR; just a suggestion to consider.
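
   A rough sketch of what the suggestion above could look like. Everything in it is hypothetical: `MetadataWriterSketch`, `PartitionTypeSketch` and `RecordingMetadataWriter` are illustrative names, not Hudi classes, and the real `HoodieTableMetadataWriter` signature may well differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metadata partition type; not a Hudi enum.
enum PartitionTypeSketch { FILES, BLOOM_FILTERS, COLUMN_STATS }

// Hypothetical interface: exposes the initial commit so that a write client
// or an action executor could trigger it for one specific partition type.
interface MetadataWriterSketch {
  void initialCommit(String createInstantTime, PartitionTypeSketch partitionType);
}

// Toy implementation that only records the calls it receives.
final class RecordingMetadataWriter implements MetadataWriterSketch {
  final List<String> calls = new ArrayList<>();

  @Override
  public void initialCommit(String createInstantTime, PartitionTypeSketch partitionType) {
    calls.add(partitionType + "@" + createInstantTime);
  }
}
```

   With the method on the interface, a caller only needs a `MetadataWriterSketch` reference and never depends on the concrete writer.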








[GitHub] [hudi] codope commented on a change in pull request #4761: [HUDI-3356][HUDI-3142][HUDI-1492] Metadata column stats index - handling delta writes

2022-02-09 Thread GitBox


codope commented on a change in pull request #4761:
URL: https://github.com/apache/hudi/pull/4761#discussion_r803315237



##
File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
##
@@ -599,4 +600,34 @@ public static Object getRecordColumnValues(HoodieRecord> columnRangeMap) {
+    if (!(record instanceof GenericRecord)) {
+      throw new HoodieIOException("Record is not a generic type to get column range metadata!");
+    }
+
+    schema.getFields().forEach(field -> {
+      final String fieldVal = getNestedFieldValAsString((GenericRecord) record, field.name(), true, true);
+      final int fieldSize = fieldVal == null ? 0 : fieldVal.length();
+      final HoodieColumnRangeMetadata fieldRange = new HoodieColumnRangeMetadata<>(
+          filePath,
+          field.name(),
+          fieldVal,
+          fieldVal,
+          fieldVal == null ? 1 : 0,

Review comment:
   nit: better to declare `1` and `0` as meaningfully named constants to help 
   readers?
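
   As an illustration of the nit above, a minimal sketch with named constants. It assumes the `1`/`0` argument is a per-record null count, which the diff excerpt does not state; the class, method and constant names are hypothetical.

```java
// Hypothetical helper, not part of Hudi: gives names to the literals
// used in `fieldVal == null ? 1 : 0`.
final class NullCountSketch {
  private NullCountSketch() {}

  // Assumption: the ternary in the diff feeds a per-record null count.
  static final long NULL_COUNT_FOR_NULL_VALUE = 1L;
  static final long NULL_COUNT_FOR_NON_NULL_VALUE = 0L;

  // Returns the null count contributed by a single field value.
  static long nullCountFor(String fieldVal) {
    return fieldVal == null ? NULL_COUNT_FOR_NULL_VALUE : NULL_COUNT_FOR_NON_NULL_VALUE;
  }
}
```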

##
File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
##
@@ -599,4 +600,34 @@ public static Object getRecordColumnValues(HoodieRecord, 
HoodieColumnRangeMetadata, HoodieColumnRangeMetadata> 
COLUMN_RANGE_MERGE_FUNCTION =
+      (oldColumnRange, newColumnRange) -> {
+        ValidationUtils.checkArgument(oldColumnRange.getColumnName().equals(newColumnRange.getColumnName()));
+        ValidationUtils.checkArgument(oldColumnRange.getFilePath().equals(newColumnRange.getFilePath()));
+        return new HoodieColumnRangeMetadata<>(
+            newColumnRange.getFilePath(),
+            newColumnRange.getColumnName(),
+            (Comparable) Arrays.asList(oldColumnRange.getMinValue(), newColumnRange.getMinValue())

Review comment:
   nit: remove redundant cast to comparable?

##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java
##
@@ -874,45 +869,53 @@ public static HoodieTableFileSystemView getFileSystemView(HoodieTableMetaClient
     }
   }
 
-  private static List getLatestColumns(HoodieTableMetaClient datasetMetaClient) {
-    return getLatestColumns(datasetMetaClient, false);
+  private static List getColumnsToIndex(HoodieTableMetaClient datasetMetaClient) {
+    return getColumnsToIndex(datasetMetaClient, false);
   }
 
   public static Stream translateWriteStatToColumnStats(HoodieWriteStat writeStat,
                                                        HoodieTableMetaClient datasetMetaClient,
-                                                       List latestColumns) {
-    return getColumnStats(writeStat.getPartitionPath(), writeStat.getPath(), datasetMetaClient, latestColumns, false);
+                                                       List columnsToIndex) {
+    Option>> columnRangeMap = Option.empty();
+    if (writeStat instanceof HoodieDeltaWriteStat && ((HoodieDeltaWriteStat) writeStat).getRecordsStats().isPresent()) {
+      columnRangeMap = Option.of(((HoodieDeltaWriteStat) writeStat).getRecordsStats().get().getStats());
+    }
+    return getColumnStats(writeStat.getPartitionPath(), writeStat.getPath(), datasetMetaClient, columnsToIndex,
+        columnRangeMap, false);
 
   }
 
   private static Stream getColumnStats(final String partitionPath, final String filePathWithPartition,
                                        HoodieTableMetaClient datasetMetaClient,
-                                       List columns, boolean isDeleted) {
+                                       List columnsToIndex,
+                                       Option>> columnRangeMap,
+                                       boolean isDeleted) {
     final String partition = partitionPath.equals(EMPTY_PARTITION_NAME) ? NON_PARTITIONED_NAME : partitionPath;
     final int offset = partition.equals(NON_PARTITIONED_NAME) ? (filePathWithPartition.startsWith("/") ? 1 : 0)
         : partition.length() + 1;
     final String fileName = filePathWithPartition.substring(offset);
-    if (!FSUtils.isBaseFile(new Path(fileName))) {
-      return Stream.empty();
-    }
 
     if (filePathWithPartition.endsWith(HoodieFileFormat.PARQUET.getFileExtension())) {
       List> columnRangeMetadataList = new ArrayList<>();
       final Path fullFilePath = new Path(datasetMetaClient.getBasePath(), filePathWithPartition);
       if (!isDeleted) {
         try {
           columnRangeMetadataList = new ParquetUtils().readRangeFromParquetMetadata(
-              datasetMetaClient.getHadoopConf(), fullFilePath, columns);
+              datasetMetaClient.getHadoopConf(), fullFilePath, columnsToIndex);
         } catch (Exception e) {
           LOG.error("Failed to read column stats for " + fullFilePath, e);
         }
       } else {
         columnRangeMetadataList =
-            columns.stream().map(entry -

[GitHub] [hudi] hudi-bot commented on pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


hudi-bot commented on pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#issuecomment-1034543957


   
   ## CI report:
   
   * 1a7e348bfe3a36092e00177bb0440c05241871e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5834)
 
   * ccd1d89352a2f72feb381962718cc0c80920c041 UNKNOWN
   * d1b4a5ee6084e396ac4e74bb58c5303fe11f3b91 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5864)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#issuecomment-1034542280


   
   ## CI report:
   
   * 1a7e348bfe3a36092e00177bb0440c05241871e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5834)
 
   * ccd1d89352a2f72feb381962718cc0c80920c041 UNKNOWN
   * d1b4a5ee6084e396ac4e74bb58c5303fe11f3b91 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


hudi-bot commented on pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#issuecomment-1034542280


   
   ## CI report:
   
   * 1a7e348bfe3a36092e00177bb0440c05241871e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5834)
 
   * ccd1d89352a2f72feb381962718cc0c80920c041 UNKNOWN
   * d1b4a5ee6084e396ac4e74bb58c5303fe11f3b91 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#issuecomment-1034540522


   
   ## CI report:
   
   * 1a7e348bfe3a36092e00177bb0440c05241871e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5834)
 
   * ccd1d89352a2f72feb381962718cc0c80920c041 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


hudi-bot commented on pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#issuecomment-1034540522


   
   ## CI report:
   
   * 1a7e348bfe3a36092e00177bb0440c05241871e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5834)
 
   * ccd1d89352a2f72feb381962718cc0c80920c041 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#issuecomment-1033637808


   
   ## CI report:
   
   * 1a7e348bfe3a36092e00177bb0440c05241871e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5834)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] Gatsby-Lee commented on issue #4221: [SUPPORT] hudi mor table has a lack of data

2022-02-09 Thread GitBox


Gatsby-Lee commented on issue #4221:
URL: https://github.com/apache/hudi/issues/4221#issuecomment-1034538320


   @danny0405 
   
   Hi, I am using Hudi 0.9.
   I saw the bug fix you mentioned in 0.10.
   
   Since I am using the version provided by AWS Glue, I have to use Hudi 0.9 
   for now.
   Is there any workaround in Hudi 0.9 for the bug you fixed?
   
   If you have a chance, can you explain why a MoR table can end up with less 
   data without the patch?
   
   Thank you
   Gatsby






[GitHub] [hudi] nsivabalan commented on pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2022-02-09 Thread GitBox


nsivabalan commented on pull request #3646:
URL: https://github.com/apache/hudi/pull/3646#issuecomment-1034527787


   Thanks for addressing all the refactoring feedback. The source code changes 
   are now a lot smaller.






[GitHub] [hudi] nsivabalan commented on a change in pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2022-02-09 Thread GitBox


nsivabalan commented on a change in pull request #3646:
URL: https://github.com/apache/hudi/pull/3646#discussion_r803321267



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##
@@ -76,6 +76,11 @@
       .withDocumentation("Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits "
           + "(scheduled). This also directly translates into how much data retention the table supports for incremental queries.");
 
+  public static final ConfigProperty CLEANER_HOURS_RETAINED = ConfigProperty.key("hoodie.cleaner.hours.retained")
+      .defaultValue("24")
+      .withDocumentation("Number of hours for which commits need to be retained. This config provides a more flexible option as"
+          + "compared to number of commits retained for cleaning service");

Review comment:
   sg

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##
@@ -76,6 +76,11 @@
       .withDocumentation("Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits "
           + "(scheduled). This also directly translates into how much data retention the table supports for incremental queries.");
 
+  public static final ConfigProperty CLEANER_HOURS_RETAINED = ConfigProperty.key("hoodie.cleaner.hours.retained")
+      .defaultValue("24")
+      .withDocumentation("Number of hours for which commits need to be retained. This config provides a more flexible option as"
+          + "compared to number of commits retained for cleaning service");

Review comment:
   But can we also state this explicitly in the documentation? Something like 
   "this policy will clean up commits once more than the configured number of 
   hours has elapsed since their timestamps", or something along those lines. 
   I will let you make the call.
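
   The retention rule discussed in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: commit times are modelled as `java.time.Instant`, and `HoursRetainedSketch`, `earliestInstantToRetain` and `isEligibleForCleaning` are hypothetical names, not Hudi APIs. The rule that the latest file of each file group is always retained is not modelled here.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of an hours-based cleaning rule; not Hudi code.
final class HoursRetainedSketch {
  private HoursRetainedSketch() {}

  // Earliest commit time that still has to be retained.
  static Instant earliestInstantToRetain(Instant now, int hoursRetained) {
    return now.minus(Duration.ofHours(hoursRetained));
  }

  // A commit is eligible for cleaning only if it is strictly older than the
  // cutoff, i.e. more than `hoursRetained` hours have elapsed since it.
  static boolean isEligibleForCleaning(Instant commitTime, Instant now, int hoursRetained) {
    return commitTime.isBefore(earliestInstantToRetain(now, hoursRetained));
  }
}
```

   With `hoursRetained = 24`, a commit made 25 hours ago is eligible for cleaning, while one made 23 hours ago is kept.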

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##
@@ -330,6 +349,19 @@ public CleanPlanner(HoodieEngineContext context, HoodieTable hoodieT
     }
     return deletePaths;
   }
+
+  /**
+   * This method finds the files to be cleaned based on the number of hours. If {@code config.getCleanerHoursRetained()} is set to 5,
+   * all the files with commit time earlier than 5 hours will be removed. Also the latest file for any file group is retained.
+   * This policy gives much more flexibility to users for retaining data for running incremental queries as compared to
+   * KEEP_LATEST_COMMITS cleaning policy. The default number of hours is 5.
+   * @param partitionPath partition path to check
+   * @return list of files to clean
+   */
+  private List getFilesToCleanKeepingLatestHours(String partitionPath) {
+    int commitsToRetain = 0;

Review comment:
   Got it. We can pass 0 directly as the second argument; we don't need to 
   declare the variable.








[GitHub] [hudi] jintaoguan commented on a change in pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


jintaoguan commented on a change in pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#discussion_r803325113



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##
@@ -168,7 +169,17 @@ class IncrementalRelation(val sqlContext: SQLContext,
       } else {
         log.info("Additional Filters to be applied to incremental source are :" + filters.mkString("Array(", ", ", ")"))
 
-        var df: DataFrame = sqlContext.createDataFrame(sqlContext.sparkContext.emptyRDD[Row], usedSchema)
+        var prunedSchema = StructType(Seq())
+        if (!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {

Review comment:
   Ack.

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##
@@ -225,6 +236,9 @@ class IncrementalRelation(val sqlContext: SQLContext,
   }
 }
 
+    if (!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {

Review comment:
   Ack.








[GitHub] [hudi] jintaoguan commented on a change in pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


jintaoguan commented on a change in pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#discussion_r803324790



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##
@@ -157,27 +161,40 @@ class IncrementalRelation(val sqlContext: SQLContext,
     } else {
       log.info("Additional Filters to be applied to incremental source are :" + filters)
 
-      var df: DataFrame = sqlContext.createDataFrame(sqlContext.sparkContext.emptyRDD[Row], usedSchema)
+      var prunedSchema = StructType(Seq())
+      if (!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {
+        prunedSchema = prunedSchema.add(usedSchema(HoodieRecord.COMMIT_TIME_METADATA_FIELD))
+      }
+      requiredColumns.foreach(col => {
+        val field = usedSchema.find(_.name == col)
+        if (field.isDefined) {
+          prunedSchema = prunedSchema.add(field.get)
+        }
+      })
+      var df: DataFrame = sqlContext.createDataFrame(sqlContext.sparkContext.emptyRDD[Row], prunedSchema)
 
       if (metaBootstrapFileIdToFullPath.nonEmpty) {
         df = sqlContext.sparkSession.read
                .format("hudi")
-               .schema(usedSchema)
+               .schema(prunedSchema)
                .option(DataSourceReadOptions.READ_PATHS.key, filteredMetaBootstrapFullPaths.mkString(","))
                .load()
       }
 
       if (regularFileIdToFullPath.nonEmpty)
       {
         df = df.union(sqlContext.read.options(sOpts)
-                        .schema(usedSchema)
+                        .schema(prunedSchema)
                         .parquet(filteredRegularFullPaths.toList: _*)
                         .filter(String.format("%s >= '%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD,
                           commitsToReturn.head.getTimestamp))
                         .filter(String.format("%s <= '%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD,
                           commitsToReturn.last.getTimestamp)))
       }
 
+      if (!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {

Review comment:
   Because IncrementalRelation extends PrunedScan here, the SparkSQL engine 
passes the required columns to IncrementalRelation and it expects the returned 
RDD only to have the required columns.
   If we don't remove the non-required column here, it will cause schema 
mismatch errors. 
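
   The behaviour described in this comment can be sketched without Spark, using plain column-name lists. This is an illustration only: `PrunedScanSketch` and its methods are hypothetical, and the value `_hoodie_commit_time` for the commit time field is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the column pruning in the diff; not Hudi or Spark code.
final class PrunedScanSketch {
  private PrunedScanSketch() {}

  // Assumed value of HoodieRecord.COMMIT_TIME_METADATA_FIELD.
  static final String COMMIT_TIME_FIELD = "_hoodie_commit_time";

  // Columns to read: the commit time column (needed for the commit range
  // filters) plus every required column that exists in the table schema.
  static List<String> columnsToRead(List<String> tableSchema, List<String> requiredColumns) {
    List<String> pruned = new ArrayList<>();
    if (!requiredColumns.contains(COMMIT_TIME_FIELD)) {
      pruned.add(COMMIT_TIME_FIELD);
    }
    for (String col : requiredColumns) {
      if (tableSchema.contains(col)) {
        pruned.add(col);
      }
    }
    return pruned;
  }

  // Columns to hand back to the engine: drop the commit time column again
  // when it was read only for filtering and was not requested.
  static List<String> columnsToReturn(List<String> readColumns, List<String> requiredColumns) {
    List<String> result = new ArrayList<>(readColumns);
    if (!requiredColumns.contains(COMMIT_TIME_FIELD)) {
      result.remove(COMMIT_TIME_FIELD);
    }
    return result;
  }
}
```

   If the second step is skipped, the result carries one more column than the engine asked for, which is the schema mismatch mentioned above.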








[GitHub] [hudi] hudi-bot removed a comment on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034498529


   
   ## CI report:
   
   * d439934deae42a89dadb3f0e4b0427daff0ae866 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5753)
 
   * e7933bc365b3dc9a0a310f04be4a654512421491 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5863)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034523012


   
   ## CI report:
   
   * e7933bc365b3dc9a0a310f04be4a654512421491 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5863)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] todd5167 commented on pull request #4774: [HUDI-3401] fix NPE caused by incorrect beforeKeyGenClassName validation

2022-02-09 Thread GitBox


todd5167 commented on pull request #4774:
URL: https://github.com/apache/hudi/pull/4774#issuecomment-1034521580


   > @todd5167 LGTM. But if you show some codes or steps to indicate how to 
reproduce this , it will be nice.
   
   @YannByron To reproduce: execute an INSERT statement using Spark SQL with 
   the default KEYGEN_CLASS; the exception occurs when the RecordKey is generated. 
   
   e.g.: INSERT into `hudi_mor_rt`  partition(order_date = '20220209')  values (5,'todd',1,1,'x','x')
   
   






[GitHub] [hudi] nsivabalan commented on pull request #3646: [HUDI-349]: Added new cleaning policy based on number of hours

2022-02-09 Thread GitBox


nsivabalan commented on pull request #3646:
URL: https://github.com/apache/hudi/pull/3646#issuecomment-1034519000


   @pratyakshsharma: Can you check the CI failures? Also, is it already 
   rebased on the latest master?






[GitHub] [hudi] hudi-bot removed a comment on pull request #4709: [HUDI-3338] custom relation instead of HadoopFsRelation

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4709:
URL: https://github.com/apache/hudi/pull/4709#issuecomment-1034465858


   
   ## CI report:
   
   * 2f14cbdd761921dc1b29c01b1201f58cc1f98b5a UNKNOWN
   * dc2eb504cccb8c012692c5763610e632f7e92d06 UNKNOWN
   * 936166530526ae8b58bdd5417fcd6fbac8f02488 UNKNOWN
   * b9e1bd769208137666350f95d1fd20f3d449fb73 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5845)
 
   * b2f63a5a798b410bd473db3de691502ccfbd9881 UNKNOWN
   * aa1d91a2d3fc7f6a150b0cb8b55a27c72ac79c66 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5861)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4709: [HUDI-3338] custom relation instead of HadoopFsRelation

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4709:
URL: https://github.com/apache/hudi/pull/4709#issuecomment-1034517556


   
   ## CI report:
   
   * 2f14cbdd761921dc1b29c01b1201f58cc1f98b5a UNKNOWN
   * dc2eb504cccb8c012692c5763610e632f7e92d06 UNKNOWN
   * 936166530526ae8b58bdd5417fcd6fbac8f02488 UNKNOWN
   * b2f63a5a798b410bd473db3de691502ccfbd9881 UNKNOWN
   * aa1d91a2d3fc7f6a150b0cb8b55a27c72ac79c66 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5861)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] prashantwason commented on a change in pull request #3985: [HUDI-2754] Performance improvement for IncrementalRelation

2022-02-09 Thread GitBox


prashantwason commented on a change in pull request #3985:
URL: https://github.com/apache/hudi/pull/3985#discussion_r803313357



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##
@@ -225,6 +236,9 @@ class IncrementalRelation(val sqlContext: SQLContext,
   }
 }
 
+if 
(!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {

Review comment:
   Add a code comment here, e.g. "remove the COMMIT_TIME_METADATA_FIELD if 
not requested".

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##
@@ -157,27 +161,40 @@ class IncrementalRelation(val sqlContext: SQLContext,
 } else {
   log.info("Additional Filters to be applied to incremental source are :" 
+ filters)
 
-  var df: DataFrame = 
sqlContext.createDataFrame(sqlContext.sparkContext.emptyRDD[Row], usedSchema)
+  var prunedSchema = StructType(Seq())
+  if (!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {
+prunedSchema = 
prunedSchema.add(usedSchema(HoodieRecord.COMMIT_TIME_METADATA_FIELD))
+  }
+  requiredColumns.foreach(col => {
+val field = usedSchema.find(_.name == col)
+if (field.isDefined) {
+  prunedSchema = prunedSchema.add(field.get)
+}
+  })
+  var df: DataFrame = 
sqlContext.createDataFrame(sqlContext.sparkContext.emptyRDD[Row], prunedSchema)
 
   if (metaBootstrapFileIdToFullPath.nonEmpty) {
 df = sqlContext.sparkSession.read
.format("hudi")
-   .schema(usedSchema)
+   .schema(prunedSchema)
.option(DataSourceReadOptions.READ_PATHS.key, 
filteredMetaBootstrapFullPaths.mkString(","))
.load()
   }
 
   if (regularFileIdToFullPath.nonEmpty)
   {
 df = df.union(sqlContext.read.options(sOpts)
-.schema(usedSchema)
+.schema(prunedSchema)
 .parquet(filteredRegularFullPaths.toList: _*)
 .filter(String.format("%s >= '%s'", 
HoodieRecord.COMMIT_TIME_METADATA_FIELD,
   commitsToReturn.head.getTimestamp))
 .filter(String.format("%s <= '%s'", 
HoodieRecord.COMMIT_TIME_METADATA_FIELD,
   commitsToReturn.last.getTimestamp)))
   }
 
+  if (!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {

Review comment:
   After this change, COMMIT_TIME_METADATA_FIELD will no longer be returned 
in the results, right? I wonder whether anyone relies on it, in which case 
their queries would fail.
   
   Why is removing this column required? 

##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##
@@ -168,7 +169,17 @@ class IncrementalRelation(val sqlContext: SQLContext,
   } else {
 log.info("Additional Filters to be applied to incremental source are 
:" + filters.mkString("Array(", ", ", ")"))
 
-var df: DataFrame = 
sqlContext.createDataFrame(sqlContext.sparkContext.emptyRDD[Row], usedSchema)
+var prunedSchema = StructType(Seq())
+if 
(!requiredColumns.contains(HoodieRecord.COMMIT_TIME_METADATA_FIELD)) {

Review comment:
   Add a code comment here to explain why COMMIT_TIME_METADATA_FIELD is 
required in prunedSchema even if it is not requested by user and not present in 
requiredColumns (because it is required to match the incremental fetch).
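   The pruning idea discussed in these review comments can be sketched in 
plain Java, without Spark. This is only an illustration under assumptions: the 
class and method names (`PrunedSchemaSketch`, `pruneColumns`, `outputColumns`, 
`inCommitRange`) are hypothetical and are not part of the PR; columns are 
modeled as strings and a row as a map. The commit-time column is always read 
so that the commit-range filter can be applied, and it is dropped from the 
output only when the caller did not request it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, Spark-free sketch of the schema pruning discussed above.
final class PrunedSchemaSketch {
    // Value of HoodieRecord.COMMIT_TIME_METADATA_FIELD in Hudi.
    static final String COMMIT_TIME_FIELD = "_hoodie_commit_time";

    // Columns to read: the commit-time field (needed by the incremental
    // filter) plus every requested column that exists in the schema.
    static List<String> pruneColumns(List<String> schemaColumns, List<String> requiredColumns) {
        Set<String> pruned = new LinkedHashSet<>();
        if (!requiredColumns.contains(COMMIT_TIME_FIELD)) {
            pruned.add(COMMIT_TIME_FIELD);
        }
        for (String col : requiredColumns) {
            if (schemaColumns.contains(col)) {
                pruned.add(col);
            }
        }
        return new ArrayList<>(pruned);
    }

    // Incremental filter: keep rows whose commit time lies in [first, last].
    // Commit timestamps are compared as strings, as in the diff's filters.
    static boolean inCommitRange(Map<String, String> row, String first, String last) {
        String commitTime = row.get(COMMIT_TIME_FIELD);
        return commitTime != null
            && commitTime.compareTo(first) >= 0
            && commitTime.compareTo(last) <= 0;
    }

    // Columns to return: drop the commit-time field if it was not requested.
    static List<String> outputColumns(List<String> prunedColumns, List<String> requiredColumns) {
        List<String> out = new ArrayList<>(prunedColumns);
        if (!requiredColumns.contains(COMMIT_TIME_FIELD)) {
            out.remove(COMMIT_TIME_FIELD);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("_hoodie_commit_time", "_hoodie_record_key", "id", "name");
        List<String> required = Arrays.asList("id", "name");
        List<String> pruned = pruneColumns(schema, required);
        System.out.println(pruned);                          // [_hoodie_commit_time, id, name]
        System.out.println(outputColumns(pruned, required)); // [id, name]
    }
}
```

   A caller that does request the commit-time column gets it back unchanged; 
only callers that never asked for it see it removed after filtering.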








[GitHub] [hudi] prashantwason commented on a change in pull request #4212: [HUDI-2925] Fix duplicate cleaning of same files when unfinished clean operations are present.

2022-02-09 Thread GitBox


prashantwason commented on a change in pull request #4212:
URL: https://github.com/apache/hudi/pull/4212#discussion_r803309459



##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataFileSystemView.java
##
@@ -36,7 +36,7 @@
  */
 public class HoodieMetadataFileSystemView extends HoodieTableFileSystemView {
 
-  private final HoodieTableMetadata tableMetadata;
+  private HoodieTableMetadata tableMetadata;

Review comment:
   I didn't get why this change is required.








[GitHub] [hudi] prashantwason commented on a change in pull request #4212: [HUDI-2925] Fix duplicate cleaning of same files when unfinished clean operations are present.

2022-02-09 Thread GitBox


prashantwason commented on a change in pull request #4212:
URL: https://github.com/apache/hudi/pull/4212#discussion_r803309004



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##
@@ -1737,6 +1739,75 @@ public void testErrorCases() throws Exception {
 }
   }
 
+  /**
+   * Tests no more than 1 clean is scheduled/executed if 
HoodieCompactionConfig.allowMultipleCleanSchedule config is disabled.
+   */
+  @Test
+  public void testMultiClean() throws Exception {

Review comment:
   Is there a file that holds cleaner-specific tests? This does not seem to 
be a metadata-table-related test.








[GitHub] [hudi] prashantwason commented on a change in pull request #4212: [HUDI-2925] Fix duplicate cleaning of same files when unfinished clean operations are present.

2022-02-09 Thread GitBox


prashantwason commented on a change in pull request #4212:
URL: https://github.com/apache/hudi/pull/4212#discussion_r803308155



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##
@@ -721,21 +721,29 @@ public HoodieCleanMetadata clean(String cleanInstantTime, 
boolean skipLocking) t
* @param skipLocking if this is triggered by another parent transaction, 
locking can be skipped.
*/
   public HoodieCleanMetadata clean(String cleanInstantTime, boolean 
scheduleInline, boolean skipLocking) throws HoodieIOException {
-if (scheduleInline) {
-  scheduleTableServiceInternal(cleanInstantTime, Option.empty(), 
TableServiceType.CLEAN);
-}
 LOG.info("Cleaner started");
 final Timer.Context timerContext = metrics.getCleanCtx();
 LOG.info("Cleaned failed attempts if any");

Review comment:
   This log is not very useful and is printed every time. It should probably 
move into rollbackFailedWrites() and be emitted only if failed writes are 
actually found and cleaned.
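   A minimal sketch of that suggestion, assuming a simplified stand-in for 
rollbackFailedWrites(): the signature, the instant list, and the in-memory 
`LOG` list below are hypothetical and only illustrate the idea of logging when 
something was actually rolled back, not Hudi's real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in that logs only when failed writes were found.
final class RollbackLogSketch {
    // In-memory stand-in for a logger so the behaviour is easy to inspect.
    static final List<String> LOG = new ArrayList<>();

    // Pretends to roll back the given failed instants and returns how many
    // were handled. The log line is emitted only for a non-empty list.
    static int rollbackFailedWrites(List<String> failedInstants) {
        int rolledBack = failedInstants.size();
        if (rolledBack > 0) {
            LOG.add("Cleaned " + rolledBack + " failed attempt(s): " + failedInstants);
        }
        return rolledBack;
    }

    public static void main(String[] args) {
        rollbackFailedWrites(Collections.<String>emptyList()); // nothing logged
        rollbackFailedWrites(Arrays.asList("20220209101500")); // one log line
        System.out.println(LOG.size());                        // 1
    }
}
```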








[GitHub] [hudi] hudi-bot removed a comment on pull request #4781: [MINOR] Fix typos in Spark client related classes

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4781:
URL: https://github.com/apache/hudi/pull/4781#issuecomment-1034455169


   
   ## CI report:
   
   * a99182f1f0215bfa466b28e3b3313934e77bd0fe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5859)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4781: [MINOR] Fix typos in Spark client related classes

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4781:
URL: https://github.com/apache/hudi/pull/4781#issuecomment-1034502776


   
   ## CI report:
   
   * a99182f1f0215bfa466b28e3b3313934e77bd0fe Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5859)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] prashantwason commented on a change in pull request #4212: [HUDI-2925] Fix duplicate cleaning of same files when unfinished clean operations are present.

2022-02-09 Thread GitBox


prashantwason commented on a change in pull request #4212:
URL: https://github.com/apache/hudi/pull/4212#discussion_r803307671



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
##
@@ -721,21 +721,29 @@ public HoodieCleanMetadata clean(String cleanInstantTime, 
boolean skipLocking) t
* @param skipLocking if this is triggered by another parent transaction, 
locking can be skipped.
*/
   public HoodieCleanMetadata clean(String cleanInstantTime, boolean 
scheduleInline, boolean skipLocking) throws HoodieIOException {
-if (scheduleInline) {
-  scheduleTableServiceInternal(cleanInstantTime, Option.empty(), 
TableServiceType.CLEAN);
-}
 LOG.info("Cleaner started");

Review comment:
   Maybe move this log into the if-block at line 732 so that it is printed 
only when the cleaner is actually started.
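   The if-block at line 732 is not shown in this excerpt, so the following is 
only a guess at its shape. It assumes that a new clean is scheduled when 
inline scheduling is requested and either multiple clean schedules are 
allowed or no clean is pending. The names `maybeScheduleClean`, 
`allowMultipleCleans`, and `pendingCleans` are hypothetical. The "Cleaner 
started" message is logged inside the guard, as the comment suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a guarded clean schedule with the log inside the guard.
final class CleanScheduleSketch {
    // In-memory stand-in for a logger.
    static final List<String> LOG = new ArrayList<>();

    // Schedules a clean only if allowed; returns true when one was scheduled.
    // The condition is an assumption, not the PR's actual code.
    static boolean maybeScheduleClean(boolean scheduleInline,
                                      boolean allowMultipleCleans,
                                      List<String> pendingCleans,
                                      String cleanInstantTime) {
        if (scheduleInline && (allowMultipleCleans || pendingCleans.isEmpty())) {
            LOG.add("Cleaner started");          // printed only when a clean starts
            pendingCleans.add(cleanInstantTime); // record the newly scheduled clean
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> pending = new ArrayList<>();
        System.out.println(maybeScheduleClean(true, false, pending, "001")); // true
        System.out.println(maybeScheduleClean(true, false, pending, "002")); // false
        System.out.println(LOG.size());                                      // 1
    }
}
```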








[GitHub] [hudi] YannByron commented on a change in pull request #4709: [HUDI-3338] custom relation instead of HadoopFsRelation

2022-02-09 Thread GitBox


YannByron commented on a change in pull request #4709:
URL: https://github.com/apache/hudi/pull/4709#discussion_r803305661



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBootstrapRelation.scala
##
@@ -107,37 +107,35 @@ class HoodieBootstrapRelation(@transient val _sqlContext: 
SQLContext,
 })
 
 // Prepare readers for reading data file and skeleton files
-val dataReadFunction = new ParquetFileFormat()
-.buildReaderWithPartitionValues(
-  sparkSession = _sqlContext.sparkSession,
-  dataSchema = dataSchema,
-  partitionSchema = StructType(Seq.empty),
-  requiredSchema = requiredDataSchema,
-  filters = if (requiredSkeletonSchema.isEmpty) filters else Seq() ,
-  options = Map.empty,
-  hadoopConf = _sqlContext.sparkSession.sessionState.newHadoopConf()
-)
-
-val skeletonReadFunction = new ParquetFileFormat()
-  .buildReaderWithPartitionValues(
-sparkSession = _sqlContext.sparkSession,
-dataSchema = skeletonSchema,
-partitionSchema = StructType(Seq.empty),
-requiredSchema = requiredSkeletonSchema,
-filters = if (requiredDataSchema.isEmpty) filters else Seq(),
-options = Map.empty,
-hadoopConf = _sqlContext.sparkSession.sessionState.newHadoopConf()
-  )
-
-val regularReadFunction = new ParquetFileFormat()
-  .buildReaderWithPartitionValues(
-sparkSession = _sqlContext.sparkSession,
-dataSchema = fullSchema,
-partitionSchema = StructType(Seq.empty),
-requiredSchema = requiredColsSchema,
-filters = filters,
-options = Map.empty,
-hadoopConf = _sqlContext.sparkSession.sessionState.newHadoopConf())
+val dataReadFunction = HoodieDataSourceHelper.buildHoodieParquetReader(
+  sparkSession = _sqlContext.sparkSession,
+  dataSchema = dataSchema,
+  partitionSchema = StructType(Seq.empty),
+  requiredSchema = requiredDataSchema,
+  filters = if (requiredSkeletonSchema.isEmpty) filters else Seq() ,
+  options = Map.empty,

Review comment:
   I have changed this.








[GitHub] [hudi] hudi-bot removed a comment on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034490759


   
   ## CI report:
   
   * e7933bc365b3dc9a0a310f04be4a654512421491 UNKNOWN
   * d439934deae42a89dadb3f0e4b0427daff0ae866 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5753)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034498529


   
   ## CI report:
   
   * d439934deae42a89dadb3f0e4b0427daff0ae866 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5753)
 
   * e7933bc365b3dc9a0a310f04be4a654512421491 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5863)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034489508


   
   ## CI report:
   
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5862)
 
   * e7933bc365b3dc9a0a310f04be4a654512421491 UNKNOWN
   * d439934deae42a89dadb3f0e4b0427daff0ae866 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034490759


   
   ## CI report:
   
   * e7933bc365b3dc9a0a310f04be4a654512421491 UNKNOWN
   * d439934deae42a89dadb3f0e4b0427daff0ae866 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5753)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034475842


   
   ## CI report:
   
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5862)
 
   * e7933bc365b3dc9a0a310f04be4a654512421491 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034489508


   
   ## CI report:
   
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5862)
 
   * e7933bc365b3dc9a0a310f04be4a654512421491 UNKNOWN
   * d439934deae42a89dadb3f0e4b0427daff0ae866 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4709: [HUDI-3338] custom relation instead of HadoopFsRelation

2022-02-09 Thread GitBox


alexeykudinkin commented on a change in pull request #4709:
URL: https://github.com/apache/hudi/pull/4709#discussion_r803294484



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBootstrapRelation.scala
##
@@ -107,37 +107,35 @@ class HoodieBootstrapRelation(@transient val _sqlContext: 
SQLContext,
 })
 
 // Prepare readers for reading data file and skeleton files
-val dataReadFunction = new ParquetFileFormat()
-.buildReaderWithPartitionValues(
-  sparkSession = _sqlContext.sparkSession,
-  dataSchema = dataSchema,
-  partitionSchema = StructType(Seq.empty),
-  requiredSchema = requiredDataSchema,
-  filters = if (requiredSkeletonSchema.isEmpty) filters else Seq() ,
-  options = Map.empty,
-  hadoopConf = _sqlContext.sparkSession.sessionState.newHadoopConf()
-)
-
-val skeletonReadFunction = new ParquetFileFormat()
-  .buildReaderWithPartitionValues(
-sparkSession = _sqlContext.sparkSession,
-dataSchema = skeletonSchema,
-partitionSchema = StructType(Seq.empty),
-requiredSchema = requiredSkeletonSchema,
-filters = if (requiredDataSchema.isEmpty) filters else Seq(),
-options = Map.empty,
-hadoopConf = _sqlContext.sparkSession.sessionState.newHadoopConf()
-  )
-
-val regularReadFunction = new ParquetFileFormat()
-  .buildReaderWithPartitionValues(
-sparkSession = _sqlContext.sparkSession,
-dataSchema = fullSchema,
-partitionSchema = StructType(Seq.empty),
-requiredSchema = requiredColsSchema,
-filters = filters,
-options = Map.empty,
-hadoopConf = _sqlContext.sparkSession.sessionState.newHadoopConf())
+val dataReadFunction = HoodieDataSourceHelper.buildHoodieParquetReader(
+  sparkSession = _sqlContext.sparkSession,
+  dataSchema = dataSchema,
+  partitionSchema = StructType(Seq.empty),
+  requiredSchema = requiredDataSchema,
+  filters = if (requiredSkeletonSchema.isEmpty) filters else Seq() ,
+  options = Map.empty,

Review comment:
   I see no reason why it should not.








[GitHub] [hudi] nsivabalan commented on a change in pull request #4385: [HUDI-1436]: provided option to trigger clean every nth commit

2022-02-09 Thread GitBox


nsivabalan commented on a change in pull request #4385:
URL: https://github.com/apache/hudi/pull/4385#discussion_r803291165



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
##
@@ -58,8 +59,34 @@ public CleanPlanActionExecutor(HoodieEngineContext context,
 this.extraMetadata = extraMetadata;
   }
 
-  protected Option createCleanerPlan() {
-return execute();
+  private int getCommitInfo() {
+Option lastCleanInstant = 
table.getActiveTimeline().getCleanerTimeline().filterCompletedInstants().lastInstant();
+HoodieTimeline commitTimeline = 
table.getActiveTimeline().getCommitTimeline().filterCompletedInstants();

Review comment:
   Nope. The cleaner could clean log files too. 

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java
##
@@ -1148,7 +1167,9 @@ public void testKeepLatestCommits(boolean 
simulateFailureRetry, boolean enableIn
 .withIncrementalCleaningMode(enableIncrementalClean)
 
.withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.EAGER)
 .withCleanBootstrapBaseFileEnabled(enableBootstrapSourceClean)
-
.withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS).retainCommits(2).build())
+.withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
+.retainCommits(2)

Review comment:
   Yeah. Can we try to add more tests at the clean plan executor level? At a 
higher level, we can have just one set of cleaner tests, but for the different 
strategies it would be better to avoid adding more tests at the write client 
or table level. 
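   For reference, the "clean every nth commit" trigger that this PR is about 
can be sketched as below. It assumes that getCommitInfo() in the diff counts 
completed commits newer than the last completed clean; the excerpt does not 
show the full method, so the names `commitsSinceLastClean`, `shouldClean`, and 
`maxCommits` are hypothetical. Instants are modeled as timestamp strings 
compared lexicographically.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: trigger a clean once enough commits have accumulated.
final class CleanTriggerSketch {
    // Counts completed commits strictly newer than the last completed clean.
    // If no clean has completed yet, every completed commit counts.
    static int commitsSinceLastClean(List<String> completedCommits, Optional<String> lastClean) {
        if (!lastClean.isPresent()) {
            return completedCommits.size();
        }
        int count = 0;
        for (String commit : completedCommits) {
            if (commit.compareTo(lastClean.get()) > 0) {
                count++;
            }
        }
        return count;
    }

    // A clean is due once at least maxCommits commits have accumulated.
    static boolean shouldClean(List<String> completedCommits, Optional<String> lastClean, int maxCommits) {
        return commitsSinceLastClean(completedCommits, lastClean) >= maxCommits;
    }

    public static void main(String[] args) {
        List<String> commits = Arrays.asList("001", "002", "004", "005");
        System.out.println(commitsSinceLastClean(commits, Optional.of("003"))); // 2
        System.out.println(shouldClean(commits, Optional.of("003"), 3));        // false
        System.out.println(shouldClean(commits, Optional.<String>empty(), 3));  // true
    }
}
```

   A helper of this shape can be unit-tested at the clean plan executor level 
without going through the write client, which is the direction the comment 
above asks for.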








[GitHub] [hudi] hudi-bot removed a comment on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034474666


   
   ## CI report:
   
   * 8e6709532cb5073a9e7b0fc7f24b7ca131102e35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5843)
 
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5862)
 
   * e7933bc365b3dc9a0a310f04be4a654512421491 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034475842


   
   ## CI report:
   
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5862)
 
   * e7933bc365b3dc9a0a310f04be4a654512421491 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4714:
URL: https://github.com/apache/hudi/pull/4714#issuecomment-1034475793


   
   ## CI report:
   
   * 4ba5756a483f5f5fab6878e4dc055f4116377650 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5858)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4714: [HUDI-3204] fix problem that spark on TimestampKeyGenerator has no re…

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4714:
URL: https://github.com/apache/hudi/pull/4714#issuecomment-1034450979


   
   ## CI report:
   
   * 6103ba6800a244253f9e7150f2ee90dc8019df61 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5836)
 
   * 4ba5756a483f5f5fab6878e4dc055f4116377650 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5858)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034472044


   
   ## CI report:
   
   * 8e6709532cb5073a9e7b0fc7f24b7ca131102e35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5843)
 
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5862)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034474666


   
   ## CI report:
   
   * 8e6709532cb5073a9e7b0fc7f24b7ca131102e35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5843)
 
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5862)
 
   * e7933bc365b3dc9a0a310f04be4a654512421491 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034472044


   
   ## CI report:
   
   * 8e6709532cb5073a9e7b0fc7f24b7ca131102e35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5843)
 
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5862)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034470691


   
   ## CI report:
   
   * 8e6709532cb5073a9e7b0fc7f24b7ca131102e35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5843)
 
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-3200) File Index config affects partition fields shown in printSchema results

2022-02-09 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron updated HUDI-3200:
-
Status: In Progress  (was: Open)

> File Index config affects partition fields shown in printSchema results
> ---
>
> Key: HUDI-3200
> URL: https://issues.apache.org/jira/browse/HUDI-3200
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark, writer-core
>Reporter: Raymond Xu
>Priority: Critical
> Fix For: 0.11.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Discovered in HUDI-3065: disabling the file index config should not affect 
> the partition fields shown in printSchema. 
> It looks like, since 0.9.0:
> - file index = true: partition auto discovery is enabled
> - file index = false: partition auto discovery is disabled



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-3338) Use custom relation instead of HadoopFsRelation

2022-02-09 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron updated HUDI-3338:
-
Status: Patch Available  (was: In Progress)

> Use custom relation instead of HadoopFsRelation
> ---
>
> Key: HUDI-3338
> URL: https://issues.apache.org/jira/browse/HUDI-3338
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark, spark-sql
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> For HUDI-3204, a COW table, and a MOR table in read_optimized query mode, 
> should return the original `data_date` in 'yyyy-MM-dd' format, not 'yyyy/MM/dd'.
> The reason is that Hudi uses HadoopFsRelation for the snapshot 
> query mode of COW and the read_optimized query mode of MOR.
> Spark's HadoopFsRelation appends the partition value taken from the real 
> partition path. However, unlike a normal table, Hudi persists the 
> partition value in the parquet file. So we just need to read the partition 
> value from the parquet file rather than leave it to Spark.
> So we should stop using `HadoopFsRelation` and implement Hudi's own 
> `Relation` to handle this.
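
The mismatch described above can be illustrated outside Spark. The following is a minimal, hypothetical Java sketch: the class `PartitionPathSketch` and the method `toPartitionPath` are illustrative names, not Hudi APIs, and the two date formats are assumed to be `yyyy-MM-dd` (input) and `yyyy/MM/dd` (output), as the sample rows in HUDI-3204 suggest. It only shows that the value stored in the data file and the value derived from the partition path are different strings, so a filter written against one does not match the other.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-alone sketch; this is not Hudi's TimestampBasedKeyGenerator.
final class PartitionPathSketch {
    // Assumed formats, taken from the sample rows in HUDI-3204.
    private static final DateTimeFormatter INPUT = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    private static final DateTimeFormatter OUTPUT = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Derives the partition path string from the column value stored in the file.
    static String toPartitionPath(String dataDate) {
        return LocalDate.parse(dataDate, INPUT).format(OUTPUT);
    }

    public static void main(String[] args) {
        String stored = "2018-09-24";            // value persisted in the parquet file
        String path = toPartitionPath(stored);   // value a path-based relation would report
        System.out.println(stored + " -> " + path); // prints "2018-09-24 -> 2018/09/24"
        System.out.println(stored.equals(path));    // prints "false"
    }
}
```

Reading `data_date` from the file keeps the first form; taking it from the partition path yields the second, which is the behavior the ticket proposes to avoid.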





[GitHub] [hudi] hudi-bot commented on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1034470691


   
   ## CI report:
   
   * 8e6709532cb5073a9e7b0fc7f24b7ca131102e35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5843)
 
   * e25be0830925dfeefe5a3e2d8cbc663309d6cce7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2610) Fix Spark version info for hudi table CTAS from another hudi table

2022-02-09 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron updated HUDI-2610:
-
Status: In Progress  (was: Open)

> Fix Spark version info for hudi table CTAS from another hudi table
> --
>
> Key: HUDI-2610
> URL: https://issues.apache.org/jira/browse/HUDI-2610
> Project: Apache Hudi
>  Issue Type: Task
>  Components: spark
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> See details in the original issue
>  
> https://github.com/apache/hudi/issues/3662#issuecomment-938489457





[jira] [Updated] (HUDI-3338) Use custom relation instead of HadoopFsRelation

2022-02-09 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron updated HUDI-3338:
-
Status: In Progress  (was: Open)

> Use custom relation instead of HadoopFsRelation
> ---
>
> Key: HUDI-3338
> URL: https://issues.apache.org/jira/browse/HUDI-3338
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark, spark-sql
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> For HUDI-3204, a COW table, and a MOR table in read_optimized query mode, 
> should return the original `data_date` in 'yyyy-MM-dd' format, not 'yyyy/MM/dd'.
> The reason is that Hudi uses HadoopFsRelation for the snapshot 
> query mode of COW and the read_optimized query mode of MOR.
> Spark's HadoopFsRelation appends the partition value taken from the real 
> partition path. However, unlike a normal table, Hudi persists the 
> partition value in the parquet file. So we just need to read the partition 
> value from the parquet file rather than leave it to Spark.
> So we should stop using `HadoopFsRelation` and implement Hudi's own 
> `Relation` to handle this.





[jira] [Updated] (HUDI-3204) spark on TimestampBasedKeyGenerator has no result when query by partition column

2022-02-09 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron updated HUDI-3204:
-
Status: Patch Available  (was: In Progress)

> spark on TimestampBasedKeyGenerator has no result when query by partition 
> column
> 
>
> Key: HUDI-3204
> URL: https://issues.apache.org/jira/browse/HUDI-3204
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Critical
>  Labels: hudi-on-call, pull-request-available, sev:critical
> Fix For: 0.11.0
>
>   Original Estimate: 3h
>  Time Spent: 1h
>  Remaining Estimate: 1h
>
>  
> {code:java}
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
> import org.apache.hudi.hive.MultiPartKeysValueExtractor
> val df = Seq((1, "z3", 30, "v1", "2018-09-23"), (2, "z3", 35, "v1", "2018-09-24")).toDF("id", "name", "age", "ts", "data_date")
> // mor
> df.write.format("hudi").
> option(HoodieWriteConfig.TABLE_NAME, "issue_4417_mor").
> option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
> option("hoodie.datasource.write.recordkey.field", "id").
> option("hoodie.datasource.write.partitionpath.field", "data_date").
> option("hoodie.datasource.write.precombine.field", "ts").
> option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.TimestampBasedKeyGenerator").
> option("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING").
> option("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy/MM/dd").
> option("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00").
> option("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd").
> mode(org.apache.spark.sql.SaveMode.Append).
> save("file:///tmp/hudi/issue_4417_mor")
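> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|name|age| ts| data_date|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |  20220110172709324|20220110172709324...|                 2|            2018/09/24|703e56d3-badb-40b...|  2|  z3| 35| v1|2018-09-24|
> |  20220110172709324|20220110172709324...|                 1|            2018/09/23|58fde2b3-db0e-464...|  1|  z3| 30| v1|2018-09-23|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+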
> // cannot query any data
> spark.read.format("hudi").load("file:///tmp/hudi/issue_4417_mor").where("data_date = '2018-09-24'")
> // still cannot query any data
> spark.read.format("hudi").load("file:///tmp/hudi/issue_4417_mor").where("data_date = '2018/09/24'").show
> // cow
> df.write.format("hudi").
> option(HoodieWriteConfig.TABLE_NAME, "issue_4417_cow").
> option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
> option("hoodie.datasource.write.recordkey.field", "id").
> option("hoodie.datasource.write.partitionpath.field", "data_date").
> option("hoodie.datasource.write.precombine.field", "ts").
> option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.TimestampBasedKeyGenerator").
> option("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING").
> option("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy/MM/dd").
> option("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00").
> option("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd").
> mode(org.apache.spark.sql.SaveMode.Append).
> save("file:///tmp/hudi/issue_4417_cow") 
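> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|name|age| ts| data_date|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |  20220110172721896|20220110172721896...|                 2|            2018/09/24|81cc7819-a0d1-4e6...|  2|  z3| 35| v1|2018/09/24|
> |  20220110172721896|20220110172721896...|                 1|            2018/09/23|d428019b-a829-41a...|  1|  z3| 30| v1|2018/09/23|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+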
>  
> // can not query any data
> spark.read.format("hudi")

[GitHub] [hudi] hudi-bot removed a comment on pull request #4752: [WIP][HUDI-3088] Use Spark 3.2 as default Spark version

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4752:
URL: https://github.com/apache/hudi/pull/4752#issuecomment-1033878216


   
   ## CI report:
   
   * 8e6709532cb5073a9e7b0fc7f24b7ca131102e35 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5843)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Assigned] (HUDI-3333) getNestedFieldVal breaks with Spark 3.2

2022-02-09 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron reassigned HUDI-3333:


Assignee: Yann Byron

> getNestedFieldVal breaks with Spark 3.2
> ---
>
> Key: HUDI-3333
> URL: https://issues.apache.org/jira/browse/HUDI-3333
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Critical
> Fix For: 0.11.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When `returnNullIfNotFound` is set to true, the method still throws an exception. 
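
The contract the description implies can be sketched as follows. This is a hypothetical, Map-based stand-in written only for illustration: `NestedFieldSketch` is an assumed name, the dotted-path lookup is an assumption, and Hudi's real method operates on Avro records rather than on `java.util.Map`. The point is simply that with the flag set, a missing field should yield null instead of an exception.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Map-based stand-in; not Hudi's Avro-based implementation.
final class NestedFieldSketch {
    // Walks a dotted path such as "user.name" through nested maps.
    static Object getNestedFieldVal(Map<String, Object> record, String fieldPath,
                                    boolean returnNullIfNotFound) {
        Object current = record;
        for (String part : fieldPath.split("\\.")) {
            Object next = (current instanceof Map) ? ((Map<?, ?>) current).get(part) : null;
            if (next == null) {
                if (returnNullIfNotFound) {
                    return null; // expected behavior when the flag is true
                }
                throw new IllegalArgumentException("Field not found: " + fieldPath);
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> user = new HashMap<>();
        user.put("name", "z3");
        Map<String, Object> record = new HashMap<>();
        record.put("user", user);

        System.out.println(getNestedFieldVal(record, "user.name", true));    // prints "z3"
        System.out.println(getNestedFieldVal(record, "user.missing", true)); // prints "null"
    }
}
```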





[jira] [Updated] (HUDI-3333) getNestedFieldVal breaks with Spark 3.2

2022-02-09 Thread Yann Byron (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yann Byron updated HUDI-3333:
-
Status: In Progress  (was: Open)

> getNestedFieldVal breaks with Spark 3.2
> ---
>
> Key: HUDI-3333
> URL: https://issues.apache.org/jira/browse/HUDI-3333
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark
>Reporter: Raymond Xu
>Assignee: Yann Byron
>Priority: Critical
> Fix For: 0.11.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> When `returnNullIfNotFound` is set to true, the method still throws an exception. 





[GitHub] [hudi] hudi-bot removed a comment on pull request #4780: [WIP] Hudi 3088 test spark3

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4780:
URL: https://github.com/apache/hudi/pull/4780#issuecomment-1034443234


   
   ## CI report:
   
   * 1006f9fd504f6a9ea7f4aeb6527dbf8c279020de Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5857)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4780: [WIP] Hudi 3088 test spark3

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4780:
URL: https://github.com/apache/hudi/pull/4780#issuecomment-1034467141


   
   ## CI report:
   
   * 1006f9fd504f6a9ea7f4aeb6527dbf8c279020de Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5857)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4773: [HUDI-3400] Avoid throw exception when create hoodie table

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4773:
URL: https://github.com/apache/hudi/pull/4773#issuecomment-1034423840


   
   ## CI report:
   
   * 761821810bc67902553c37dcbfb40c61abd4e7c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5832)
 
   * 09217dbc34c15ccb5259cb4168ed9e221ac92bf1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5856)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4773: [HUDI-3400] Avoid throw exception when create hoodie table

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4773:
URL: https://github.com/apache/hudi/pull/4773#issuecomment-1034465942


   
   ## CI report:
   
   * 09217dbc34c15ccb5259cb4168ed9e221ac92bf1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5856)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4709: [HUDI-3338] custom relation instead of HadoopFsRelation

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4709:
URL: https://github.com/apache/hudi/pull/4709#issuecomment-1034462941


   
   ## CI report:
   
   * 2f14cbdd761921dc1b29c01b1201f58cc1f98b5a UNKNOWN
   * dc2eb504cccb8c012692c5763610e632f7e92d06 UNKNOWN
   * 936166530526ae8b58bdd5417fcd6fbac8f02488 UNKNOWN
   * b9e1bd769208137666350f95d1fd20f3d449fb73 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5845)
 
   * b2f63a5a798b410bd473db3de691502ccfbd9881 UNKNOWN
   * aa1d91a2d3fc7f6a150b0cb8b55a27c72ac79c66 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4709: [HUDI-3338] custom relation instead of HadoopFsRelation

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4709:
URL: https://github.com/apache/hudi/pull/4709#issuecomment-1034465858


   
   ## CI report:
   
   * 2f14cbdd761921dc1b29c01b1201f58cc1f98b5a UNKNOWN
   * dc2eb504cccb8c012692c5763610e632f7e92d06 UNKNOWN
   * 936166530526ae8b58bdd5417fcd6fbac8f02488 UNKNOWN
   * b9e1bd769208137666350f95d1fd20f3d449fb73 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5845)
 
   * b2f63a5a798b410bd473db3de691502ccfbd9881 UNKNOWN
   * aa1d91a2d3fc7f6a150b0cb8b55a27c72ac79c66 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5861)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4709: [HUDI-3338] custom relation instead of HadoopFsRelation

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4709:
URL: https://github.com/apache/hudi/pull/4709#issuecomment-1034462941


   
   ## CI report:
   
   * 2f14cbdd761921dc1b29c01b1201f58cc1f98b5a UNKNOWN
   * dc2eb504cccb8c012692c5763610e632f7e92d06 UNKNOWN
   * 936166530526ae8b58bdd5417fcd6fbac8f02488 UNKNOWN
   * b9e1bd769208137666350f95d1fd20f3d449fb73 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5845)
 
   * b2f63a5a798b410bd473db3de691502ccfbd9881 UNKNOWN
   * aa1d91a2d3fc7f6a150b0cb8b55a27c72ac79c66 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4709: [HUDI-3338] custom relation instead of HadoopFsRelation

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4709:
URL: https://github.com/apache/hudi/pull/4709#issuecomment-1034443132


   
   ## CI report:
   
   * 2f14cbdd761921dc1b29c01b1201f58cc1f98b5a UNKNOWN
   * dc2eb504cccb8c012692c5763610e632f7e92d06 UNKNOWN
   * 936166530526ae8b58bdd5417fcd6fbac8f02488 UNKNOWN
   * b9e1bd769208137666350f95d1fd20f3d449fb73 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5845)
 
   * b2f63a5a798b410bd473db3de691502ccfbd9881 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[hudi] branch master updated (464027e -> b3b4423)

2022-02-09 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 464027e  [HUDI-3239] Convert `BaseHoodieTableFileIndex` to Java (#4669)
 add b3b4423  [HUDI-3389] Bump flink version to 1.14.3 (#4776)

No new revisions were added by this update.

Summary of changes:
 hudi-client/hudi-flink-client/pom.xml  |  4 ++--
 hudi-flink/pom.xml |  8 
 .../hudi/sink/StreamWriteOperatorCoordinator.java  | 10 --
 .../RowDataToHoodieFunctionWithRateLimit.java  |  2 +-
 .../java/org/apache/hudi/sink/utils/Pipelines.java |  9 -
 .../org/apache/hudi/source/StreamReadOperator.java | 16 
 .../org/apache/hudi/sink/StreamWriteITCase.java|  4 ++--
 .../apache/hudi/sink/utils/CollectorOutput.java|  6 ++
 .../hudi/sink/utils/CompactFunctionWrapper.java|  6 ++
 .../sink/utils/MockStateInitializationContext.java |  7 +++
 .../sink/utils/MockStreamingRuntimeContext.java|  6 +++---
 .../apache/hudi/table/HoodieDataSourceITCase.java  | 22 --
 pom.xml|  2 +-
 13 files changed, 72 insertions(+), 30 deletions(-)


[GitHub] [hudi] danny0405 merged pull request #4776: [HUDI-3389] Bump flink version to 1.14.3

2022-02-09 Thread GitBox


danny0405 merged pull request #4776:
URL: https://github.com/apache/hudi/pull/4776


   






[GitHub] [hudi] yihua commented on a change in pull request #4669: [HUDI-3239] Convert `BaseHoodieTableFileIndex` to Java

2022-02-09 Thread GitBox


yihua commented on a change in pull request #4669:
URL: https://github.com/apache/hudi/pull/4669#discussion_r803272577



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieTableQueryType.java
##
@@ -30,7 +30,7 @@
  * 
  */
 public enum HoodieTableQueryType {
-  QUERY_TYPE_SNAPSHOT,
-  QUERY_TYPE_INCREMENTAL,
-  QUERY_TYPE_READ_OPTIMIZED
+  SNAPSHOT,

Review comment:
   @nsivabalan looks like we are good here.  These are totally new enums, 
not there in 0.10.0 release.








[GitHub] [hudi] hudi-bot commented on pull request #4781: [MINOR] Fix typos in Spark client related classes

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4781:
URL: https://github.com/apache/hudi/pull/4781#issuecomment-1034455169


   
   ## CI report:
   
   * a99182f1f0215bfa466b28e3b3313934e77bd0fe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5859)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4781: [MINOR] Fix typos in Spark client related classes

2022-02-09 Thread GitBox


hudi-bot removed a comment on pull request #4781:
URL: https://github.com/apache/hudi/pull/4781#issuecomment-1034453835


   
   ## CI report:
   
   * a99182f1f0215bfa466b28e3b3313934e77bd0fe UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4781: [MINOR] Fix typos in Spark client related classes

2022-02-09 Thread GitBox


hudi-bot commented on pull request #4781:
URL: https://github.com/apache/hudi/pull/4781#issuecomment-1034453835


   
   ## CI report:
   
   * a99182f1f0215bfa466b28e3b3313934e77bd0fe UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] yihua opened a new pull request #4781: [MINOR] Fix typos in Spark client related classes

2022-02-09 Thread GitBox


yihua opened a new pull request #4781:
URL: https://github.com/apache/hudi/pull/4781


   ## What is the purpose of the pull request
   
   This PR fixes a few typos in Spark client related classes.
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   





