[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395049#comment-17395049 ]

ASF GitHub Bot commented on HUDI-1468:

nsivabalan merged pull request #3419:
URL: https://github.com/apache/hudi/pull/3419

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> incremental read support with clustering
>
> Key: HUDI-1468
> URL: https://issues.apache.org/jira/browse/HUDI-1468
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Incremental Pull
> Affects Versions: 0.9.0
> Reporter: satish
> Assignee: liwei
> Priority: Blocker
> Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
> As part of clustering, metadata such as _hoodie_commit_time changes for
> records that are clustered. This is specific to the
> SparkBulkInsertBasedRunClusteringStrategy implementation. Figure out a way to
> carry the commit time over from the original record to support incremental queries.
> Also, incremental queries don't work with the 'replacecommit' action used by
> clustering (HUDI-1264). Change incremental queries to work with replacecommits
> created by clustering.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
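To make the first problem in the description concrete, here is a minimal toy model in Python. It is not Hudi code: `Record`, `cluster` and `incremental_pull` are hypothetical names, instant times are assumed to be fixed-width strings that compare lexicographically, and an incremental pull is modelled simply as "return the records whose commit time falls in (begin, end]". It shows why stamping clustered records with the clustering instant makes a consumer receive unchanged data again, while carrying the original commit time over does not.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical toy model; these names are not Hudi APIs.


@dataclass(frozen=True)
class Record:
    key: str
    commit_time: str  # plays the role of the _hoodie_commit_time meta field
    value: int


def cluster(records: List[Record], clustering_instant: str,
            preserve_commit_time: bool) -> List[Record]:
    """Rewrite records into a new, key-sorted layout, as clustering might.

    Without preservation, every rewritten record is stamped with the
    clustering instant, which is the behaviour HUDI-1468 describes.
    """
    rewritten = sorted(records, key=lambda r: r.key)
    if preserve_commit_time:
        return rewritten
    return [replace(r, commit_time=clustering_instant) for r in rewritten]


def incremental_pull(records: List[Record], begin: str, end: str) -> List[str]:
    """Return keys of records whose commit time falls in (begin, end]."""
    return sorted(r.key for r in records if begin < r.commit_time <= end)


if __name__ == "__main__":
    table = [Record("a", "001", 1), Record("b", "001", 2), Record("c", "002", 3)]
    # The consumer has already pulled everything up to instant "002";
    # clustering then runs at instant "003" without changing any values.
    stamped = cluster(table, "003", preserve_commit_time=False)
    preserved = cluster(table, "003", preserve_commit_time=True)
    print(incremental_pull(stamped, "002", "003"))    # ['a', 'b', 'c']
    print(incremental_pull(preserved, "002", "003"))  # []
```

Under these assumptions, a consumer that has already pulled up to instant 002 gets a, b and c again when clustering rewrites their commit time, and gets nothing when the original commit time is preserved.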
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395039#comment-17395039 ]

ASF GitHub Bot commented on HUDI-1468:

hudi-bot edited a comment on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-893980561

## CI report:

* 835c62fca2fad8622c358cfdb053d4cca86d1747 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1428) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1448)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395029#comment-17395029 ]

ASF GitHub Bot commented on HUDI-1468:

hudi-bot edited a comment on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-893980561

## CI report:

* 835c62fca2fad8622c358cfdb053d4cca86d1747 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1428) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1448)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17395025#comment-17395025 ]

ASF GitHub Bot commented on HUDI-1468:

nsivabalan commented on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-894576839

@hudi-bot run azure
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394608#comment-17394608 ]

ASF GitHub Bot commented on HUDI-1468:

hudi-bot edited a comment on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-893980561

## CI report:

* 835c62fca2fad8622c358cfdb053d4cca86d1747 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1428)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394576#comment-17394576 ]

ASF GitHub Bot commented on HUDI-1468:

hudi-bot edited a comment on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-893980561

## CI report:

* 9a72ddd7f012528a3e7c0c9441f760bfb2a18f3d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423)
* 835c62fca2fad8622c358cfdb053d4cca86d1747 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1428)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394574#comment-17394574 ]

ASF GitHub Bot commented on HUDI-1468:

hudi-bot edited a comment on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-893980561

## CI report:

* 9a72ddd7f012528a3e7c0c9441f760bfb2a18f3d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423)
* 835c62fca2fad8622c358cfdb053d4cca86d1747 UNKNOWN
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394513#comment-17394513 ]

ASF GitHub Bot commented on HUDI-1468:

hudi-bot edited a comment on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-893980561

## CI report:

* 9a72ddd7f012528a3e7c0c9441f760bfb2a18f3d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394507#comment-17394507 ]

ASF GitHub Bot commented on HUDI-1468:

vinothchandar commented on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-893993045

Closing in favor of #3419
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394508#comment-17394508 ]

ASF GitHub Bot commented on HUDI-1468:

vinothchandar closed pull request #3211:
URL: https://github.com/apache/hudi/pull/3211
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394496#comment-17394496 ]

ASF GitHub Bot commented on HUDI-1468:

hudi-bot edited a comment on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-893980561

## CI report:

* 9a72ddd7f012528a3e7c0c9441f760bfb2a18f3d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394492#comment-17394492 ]

ASF GitHub Bot commented on HUDI-1468:

hudi-bot commented on pull request #3419:
URL: https://github.com/apache/hudi/pull/3419#issuecomment-893980561

## CI report:

* 9a72ddd7f012528a3e7c0c9441f760bfb2a18f3d UNKNOWN
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394491#comment-17394491 ]

ASF GitHub Bot commented on HUDI-1468:

codope opened a new pull request #3419:
URL: https://github.com/apache/hudi/pull/3419

…metadata as part of clustering

This PR is a re-work of #3211 with some minor changes and tests.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393858#comment-17393858 ]

ASF GitHub Bot commented on HUDI-1468:

satishkotha commented on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-892786714

> @codope @satishkotha what's the next step here?
> Could I help somehow to get this moving along?

@codope is working on adding additional tests for this PR. He mentioned he opened https://github.com/codope/hudi/pull/3. I'll review that and merge it here sometime this week or early next week.
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391812#comment-17391812 ]

ASF GitHub Bot commented on HUDI-1468:

vinothchandar commented on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-891366331

@codope @satishkotha what's the next step here? Could I help somehow to get this moving along?
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377966#comment-17377966 ]

liwei commented on HUDI-1468:

[~vinoth] Hello, once [https://github.com/apache/hudi/pull/3139/files] lands, can this issue be closed?
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375529#comment-17375529 ]

ASF GitHub Bot commented on HUDI-1468:

codope commented on pull request #3211:
URL: https://github.com/apache/hudi/pull/3211#issuecomment-874070580

@satishkotha A couple of high-level questions:

* Would preserving the commit time be sufficient to support incremental reads? Won't we need incremental timeline support (#2388) as well?
* I see that a new `SparkAllowUpdateStrategy` has been added. How are we handling update conflicts?
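codope's first question can be illustrated with a second toy model. It is hypothetical and is not Hudi's incremental query implementation: it assumes an incremental pull chooses which files to scan from the instants in (begin, end], skips files that a replacecommit has replaced, and then filters records by their (preserved) commit time. Under those assumptions, preserving the commit time alone is not enough: once clustering has replaced the original files, the pull must also scan files written by later replacecommits, which is roughly the kind of timeline handling the question points at.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical toy model of a timeline; none of these names are Hudi APIs.


@dataclass(frozen=True)
class Instant:
    time: str                           # fixed-width instant time, compared as a string
    action: str                         # "commit" or "replacecommit" (clustering)
    files_written: Tuple[str, ...]
    files_replaced: Tuple[str, ...] = ()


# file name -> records it holds, as (record key, commit time) pairs
Storage = Dict[str, List[Tuple[str, str]]]


def incremental_pull(timeline: List[Instant], storage: Storage, begin: str,
                     end: str, follow_replacecommits: bool) -> List[str]:
    """Return keys of records whose commit time falls in (begin, end]."""
    files: Set[str] = set()
    for inst in timeline:
        if begin < inst.time <= end:
            files.update(inst.files_written)
        elif (follow_replacecommits and inst.action == "replacecommit"
              and inst.time > begin):
            # A later replacecommit may hold older records whose commit
            # time was preserved, so its files must be scanned as well.
            files.update(inst.files_written)
    # Read only the latest view: files replaced by clustering are skipped.
    replaced = {f for inst in timeline for f in inst.files_replaced}
    keys = []
    for name in files - replaced:
        for key, commit_time in storage.get(name, []):
            if begin < commit_time <= end:
                keys.append(key)
    return sorted(keys)


if __name__ == "__main__":
    timeline = [
        Instant("001", "commit", ("f1",)),
        Instant("002", "commit", ("f2",)),
        # Clustering rewrites f1 and f2 into f3, preserving commit times.
        Instant("003", "replacecommit", ("f3",), ("f1", "f2")),
    ]
    # f1 and f2 are assumed to have been cleaned already.
    storage = {"f3": [("a", "001"), ("b", "001"), ("c", "002")]}
    print(incremental_pull(timeline, storage, "000", "002", False))  # []
    print(incremental_pull(timeline, storage, "000", "002", True))   # ['a', 'b', 'c']
```

The scan of later replacecommits is restricted to those after `begin` because, with preserved commit times, a replacecommit at or before `begin` can only hold records committed at or before `begin`, which the commit-time filter would drop anyway.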
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373838#comment-17373838 ] ASF GitHub Bot commented on HUDI-1468:
codecov-commenter edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872810385
# [Codecov](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3211](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (56f4484) into [master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6eca06d) will **increase** coverage by `18.25%`.
> The diff coverage is `60.30%`.
```diff
@@             Coverage Diff              @@
##            master    #3211       +/-   ##
============================================
+ Coverage    47.51%   65.76%   +18.25%
+ Complexity    5429      796     -4633
============================================
  Files          922      101      -821
  Lines        40968     3529    -37439
  Branches      4105      351     -3754
============================================
- Hits         19464     2321    -17143
+ Misses       19780     1070    -18710
+ Partials      1724      138     -1586
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `65.76% <60.30%> (+31.18%)` | :arrow_up: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `?` | |
| huditimelineservice | `?` | |
| hudiutilities | `?` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...trategy/SparkRecentDaysClusteringPlanStrategy.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2NsdXN0ZXJpbmcvcGxhbi9zdHJhdGVneS9TcGFya1JlY2VudERheXNDbHVzdGVyaW5nUGxhblN0cmF0ZWd5LmphdmE=) | `100.00% <ø> (+24.39%)` | :arrow_up: |
| [...SparkSelectedPartitionsClusteringPlanStrategy.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2NsdXN0ZXJpbmcvcGxhbi9zdHJhdGVneS9TcGFya1NlbGVjdGVkUGFydGl0aW9uc0NsdXN0ZXJpbmdQbGFuU3RyYXRlZ3kuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [.../run/strategy/SingleSparkJobExecutionStrategy.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2NsdXN0ZXJpbmcvcnVuL3N0cmF0ZWd5L1NpbmdsZVNwYXJrSm9iRXhlY3V0aW9uU3RyYXRlZ3kuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...ring/update/strategy/SparkAllowUpdateStrategy.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1zcGFyay1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L2NsdXN0ZXJpbmcvdXBkYXRlL3N0cmF0ZWd5L1NwYXJrQWxsb3dVcGRhdGVTdHJhdGVneS5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...SparkInsertOver
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373726#comment-17373726 ] ASF GitHub Bot commented on HUDI-1468:
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166
## CI report:
* 56f44844fbb9f251f0840a556553f3862771d4fc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=661)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373725#comment-17373725 ] ASF GitHub Bot commented on HUDI-1468:
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166
## CI report:
* c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637)
* 56f44844fbb9f251f0840a556553f3862771d4fc UNKNOWN
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373361#comment-17373361 ] ASF GitHub Bot commented on HUDI-1468:
codecov-commenter edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872810385
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373356#comment-17373356 ] ASF GitHub Bot commented on HUDI-1468:
codecov-commenter edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872810385
# [Codecov](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3211](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c9c9a0d) into [master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6eca06d) will **decrease** coverage by `29.95%`.
> The diff coverage is `48.58%`.
```diff
@@             Coverage Diff              @@
##            master    #3211       +/-   ##
============================================
- Coverage    47.51%   17.55%   -29.96%
+ Complexity    5429      878     -4551
============================================
  Files          922      383      -539
  Lines        40968    15122    -25846
  Branches      4105     1297     -2808
============================================
- Hits         19464     2655    -16809
+ Misses       19780    12303     -7477
+ Partials      1724      164     -1560
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `20.91% <48.58%> (-13.67%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `9.31% <ø> (-48.72%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...org/apache/hudi/config/HoodieClusteringConfig.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVDbHVzdGVyaW5nQ29uZmlnLmphdmE=) | `0.00% <0.00%> (-71.28%)` | :arrow_down: |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-42.79%)` | :arrow_down: |
| [...n/java/org/apache/hudi/io/CreateHandleFactory.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2lvL0NyZWF0ZUhhbmRsZUZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...in/java/org/apache/hudi/io/HoodieCreateHandle.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2lvL0hvb2RpZUNyZWF0ZUhhbmRsZS5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...rg/apache/hudi/io/HoodieUnboundedCreateHandle.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=githu
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373324#comment-17373324 ] ASF GitHub Bot commented on HUDI-1468:
codecov-commenter commented on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872810385
# [Codecov](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#3211](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c9c9a0d) into [master](https://codecov.io/gh/apache/hudi/commit/6eca06d074520140d7bc67b48bd2b9a5b76f0a87?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6eca06d) will **decrease** coverage by `44.62%`.
> The diff coverage is `0.00%`.
```diff
@@             Coverage Diff              @@
##            master    #3211       +/-   ##
- Coverage    47.51%    2.88%   -44.63%
+ Complexity    5429       82     -5347
  Files          922      282      -640
  Lines        40968    11593    -29375
  Branches      4105      946     -3159
- Hits         19464      334    -19130
+ Misses       19780    11233     -8547
+ Partials      1724       26     -1698
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `?` | |
| hudiclient | `0.00% <0.00%> (-34.59%)` | :arrow_down: |
| hudicommon | `?` | |
| hudiflink | `?` | |
| hudihadoopmr | `?` | |
| hudisparkdatasource | `?` | |
| hudisync | `5.38% <ø> (-48.67%)` | :arrow_down: |
| huditimelineservice | `?` | |
| hudiutilities | `9.31% <ø> (-48.72%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/3211?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...org/apache/hudi/config/HoodieClusteringConfig.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVDbHVzdGVyaW5nQ29uZmlnLmphdmE=) | `0.00% <0.00%> (-71.28%)` | :arrow_down: |
| [...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2NvbmZpZy9Ib29kaWVXcml0ZUNvbmZpZy5qYXZh) | `0.00% <0.00%> (-42.79%)` | :arrow_down: |
| [...n/java/org/apache/hudi/io/CreateHandleFactory.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2lvL0NyZWF0ZUhhbmRsZUZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...in/java/org/apache/hudi/io/HoodieCreateHandle.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aHVkaS1jbGllbnQvaHVkaS1jbGllbnQtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2lvL0hvb2RpZUNyZWF0ZUhhbmRsZS5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...rg/apache/hudi/io/HoodieUnboundedCreateHandle.java](https://codecov.io/gh/apache/hudi/pull/3211/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&u
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373234#comment-17373234 ] ASF GitHub Bot commented on HUDI-1468:
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166
## CI report:
* c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373214#comment-17373214 ] ASF GitHub Bot commented on HUDI-1468:
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166
## CI report:
* ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
* c9c9a0d5343b65e690544dfcb85e71d915c455e1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=637)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373212#comment-17373212 ] ASF GitHub Bot commented on HUDI-1468:
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166
## CI report:
* ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
* c9c9a0d5343b65e690544dfcb85e71d915c455e1 UNKNOWN
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373143#comment-17373143 ] ASF GitHub Bot commented on HUDI-1468:
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166
## CI report:
* ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373124#comment-17373124 ] ASF GitHub Bot commented on HUDI-1468:
hudi-bot edited a comment on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166
## CI report:
* ab7bacb26d44f383e7f61ec81531b34011f1383b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=629)
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373122#comment-17373122 ] ASF GitHub Bot commented on HUDI-1468:
hudi-bot commented on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639166
## CI report:
* ab7bacb26d44f383e7f61ec81531b34011f1383b UNKNOWN
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373123#comment-17373123 ] ASF GitHub Bot commented on HUDI-1468: -- satishkotha commented on pull request #3211: URL: https://github.com/apache/hudi/pull/3211#issuecomment-872639220 @n3nash @vinothchandar this includes all my changes for supporting encryption-style use cases using the clustering framework. I still need to port some tests, but please take a look and add any comments.
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373121#comment-17373121 ] ASF GitHub Bot commented on HUDI-1468: -- satishkotha opened a new pull request #3211: URL: https://github.com/apache/hudi/pull/3211

## What is the purpose of the pull request

Support custom clustering strategies and preserve commit time to support incremental reads.

## Brief change log

* Introduce a new way of running clustering using SingleSparkJobExecutionStrategy, for use cases that don't need sorting.
* Push down more logic into clustering strategies to avoid RDD union.
* Make some performance improvements after running at large scale; avoid calling RDD collect multiple times.
* Preserve the Hoodie commit time (optional, for backward compatibility) while rewriting the data.

## Verify this pull request

This change added tests and can be verified as follows:

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
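The commit-time problem that PR #3211 addresses can be illustrated with a small, self-contained simulation. This is a hedged Python sketch, not Hudi code: `Record`, `cluster` and `incremental_read` are hypothetical names, each record carries only a key, a stand-in for the `_hoodie_commit_time` meta column and a value, and an incremental read is modeled (as an assumption) as selecting records whose commit time lies in the range (begin, end]. It shows that when clustering stamps rewritten records with its own instant, a consumer that has already read up to an earlier instant sees those unchanged records again, whereas preserving the original commit time avoids that.

```python
from dataclasses import dataclass, replace
from typing import List


# Hypothetical, simplified record: only a key, a stand-in for the
# _hoodie_commit_time meta column, and a payload value are modeled.
@dataclass(frozen=True)
class Record:
    key: str
    commit_time: str  # stands in for _hoodie_commit_time
    value: int


def cluster(records: List[Record], clustering_instant: str,
            preserve_commit_time: bool) -> List[Record]:
    """Rewrite records into 'new files' (modeled as a key-sorted copy).

    When preserve_commit_time is False, every rewritten record is stamped
    with the clustering instant, mimicking the behavior described in
    HUDI-1468. When True, the original commit time is carried over.
    """
    rewritten = sorted(records, key=lambda r: r.key)
    if preserve_commit_time:
        return rewritten
    return [replace(r, commit_time=clustering_instant) for r in rewritten]


def incremental_read(records: List[Record], begin: str, end: str) -> List[str]:
    """Return sorted keys whose commit time falls in (begin, end].

    Commit times are fixed-width timestamp strings, so plain string
    comparison orders them chronologically. This range semantics is an
    assumption of the sketch.
    """
    return sorted(r.key for r in records if begin < r.commit_time <= end)


if __name__ == "__main__":
    table = [Record("a", "20210701000000", 1), Record("b", "20210702000000", 2)]
    # A consumer has already processed everything up to 20210702000000;
    # clustering then runs at instant 20210703000000.
    for preserve in (False, True):
        clustered = cluster(table, "20210703000000", preserve)
        print(preserve, incremental_read(clustered, "20210702000000", "20210703000000"))
```

Run as a script, it prints `False ['a', 'b']` (unchanged records resurface in the incremental read) and then `True []`. The sketch deliberately ignores how `replacecommit` instants appear on the timeline, which the issue lists as a separate problem.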
[jira] [Commented] (HUDI-1468) incremental read support with clustering
[ https://issues.apache.org/jira/browse/HUDI-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371819#comment-17371819 ] Vinoth Chandar commented on HUDI-1468: -- [~309637554] do you plan to work on this? It would be good to have this in the next release.