[jira] [Resolved] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect
[ https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Govindarajan resolved HUDI-1695.
---------------------------------------
    Resolution: Fixed

> Deltastreamer HoodieIncrSource exception error messaging is incorrect
> ----------------------------------------------------------------------
>
>                 Key: HUDI-1695
>                 URL: https://issues.apache.org/jira/browse/HUDI-1695
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Vinoth Govindarajan
>            Assignee: Vinoth Govindarajan
>            Priority: Trivial
>              Labels: beginner, pull-request-available
>             Fix For: 0.8.0
>
> When you set your source_class as HoodieIncrSource and invoke DeltaStreamer
> without any checkpoint, it throws the following exception:
>
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin
> instant for incremental pull. For reading from latest committed instant set
> hoodie.deltastreamer.source.hoodie.read_latest_on_midding_ckpt to true
> {code}
>
> The error messaging is wrong and misleading; the correct parameter is:
>
> {code:java}
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt
> {code}
>
> Check out the correct parameter in this
> [file|https://github.com/apache/hudi/blob/release-0.7.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L78]
>
> The correct messaging should be:
>
> {code:java}
> User class threw exception: java.lang.IllegalArgumentException: Missing begin
> instant for incremental pull. For reading from latest committed instant set
> hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true
> {code}
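For context, the fix merged for this issue (PR #2679) simply corrects the property name embedded in the exception text. A minimal, self-contained sketch of the corrected guard; the class and method names here are illustrative, not Hudi's actual IncrSourceHelper internals:

{code:java}
import java.util.Optional;

public class IncrSourceCheckpointGuard {
  // The corrected property name reported in this issue.
  private static final String READ_LATEST_ON_MISSING_CKPT =
      "hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt";

  // Fail with the right property name when no begin instant (checkpoint) is available.
  public static void validateBeginInstant(Optional<String> beginInstant) {
    if (!beginInstant.isPresent()) {
      throw new IllegalArgumentException(
          "Missing begin instant for incremental pull. For reading from latest committed instant set "
              + READ_LATEST_ON_MISSING_CKPT + " to true");
    }
  }
}
{code}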
[jira] [Closed] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect
[ https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Govindarajan closed HUDI-1695.
-------------------------------------
[GitHub] [hudi] liujinhui1994 commented on a change in pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
liujinhui1994 commented on a change in pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#discussion_r594877281

## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieDeltaStreamerWrapper.java

## @@ -65,7 +65,7 @@ public void scheduleCompact() throws Exception {
     return upsert(WriteOperationType.UPSERT);
   }

-  public Pair>> fetchSource() throws Exception {
+  public Pair>, Pair> fetchSource() throws Exception {

Review comment: After your PR is done, shall I continue with the next PR? @nsivabalan
[jira] [Commented] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect
[ https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302235#comment-17302235 ]

Vinoth Govindarajan commented on HUDI-1695:
-------------------------------------------

PR has been merged.
[GitHub] [hudi] nsivabalan commented on a change in pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
nsivabalan commented on a change in pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#discussion_r594471195

## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/HoodieDeltaStreamerWrapper.java

## @@ -65,7 +65,7 @@ public void scheduleCompact() throws Exception {
     return upsert(WriteOperationType.UPSERT);
   }

-  public Pair>> fetchSource() throws Exception {
+  public Pair>, Pair> fetchSource() throws Exception {

Review comment: This is getting out of hand (two Pairs within a Pair); we can't keep adding more Pairs here. I am adding a class to hold the return value in one of my PRs. Let's see if we can rebase once the other PR lands.
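The holder class nsivabalan describes would look something along these lines (field names and types are purely illustrative, not the actual class from his PR; similar in spirit to the InputBatch type that hudi-utilities already has):

```java
import org.apache.spark.api.java.JavaRDD;

// One named type instead of nested Pairs; every name here is hypothetical.
public class FetchResult<T> {
  private final String checkpoint;     // resume point for the next run
  private final JavaRDD<T> batch;      // records fetched from the source
  private final Long sourceTimestamp;  // extra field the PR wants to thread through

  public FetchResult(String checkpoint, JavaRDD<T> batch, Long sourceTimestamp) {
    this.checkpoint = checkpoint;
    this.batch = batch;
    this.sourceTimestamp = sourceTimestamp;
  }

  public String getCheckpoint() { return checkpoint; }
  public JavaRDD<T> getBatch() { return batch; }
  public Long getSourceTimestamp() { return sourceTimestamp; }
}
```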
[hudi] branch master updated (3b36cb8 -> 16864ae)
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git.

 from 3b36cb8  [HUDI-1552] Improve performance of key lookups from base file in Metadata Table. (#2494)
  add 16864ae  [HUDI-1695] Fixed the error messaging (#2679)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [hudi] yanghua merged pull request #2679: [MINOR] Fixed the error messaging
yanghua merged pull request #2679:
URL: https://github.com/apache/hudi/pull/2679
[GitHub] [hudi] xiarixiaoyao commented on pull request #2673: [HUDI-1688] Uncache Rdd once write operation is complete
xiarixiaoyao commented on pull request #2673:
URL: https://github.com/apache/hudi/pull/2673#issuecomment-799905602

@nsivabalan yes. Due to company information-security restrictions, I cannot paste screenshots of the test results and the heap dump.

Before the fix (env: (executor 4 core, 8G) * 50):

- step 1: merge(df, 800, "hudikey", "testOOM", DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 616s
- step 2: merge(df, 800, "hudikey", "testOOM1", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 710s
- step 3: merge(df, 800, "hudikey", "testOOM2", DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 676s
- step 4: merge(df, 800, "hudikey", "testOOM3", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 1077s
- step 5: merge(df, 800, "hudikey", "testOOM4", DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 1154s
- step 6: merge(df, 800, "hudikey", "testOOM5", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 2055s (some executors OOM)

Analysis of the dump: we found that more than 90 percent of memory is consumed by cached RDDs.

After the fix:

- step 1: merge(df, 800, "hudikey", "testOOM", DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 632s
- step 2: merge(df, 800, "hudikey", "testOOM1", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 710s
- step 3: merge(df, 800, "hudikey", "testOOM2", DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 698s
- step 4: merge(df, 800, "hudikey", "testOOM3", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 723s
- step 5: merge(df, 800, "hudikey", "testOOM4", DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 616s
- step 6: merge(df, 800, "hudikey", "testOOM5", DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL, op = "insert"), time cost: 703s

One last point: when we cache RDDs, we should uncache them promptly once they are no longer used. Spark can uncache RDDs automatically, but that process is nondeterministic.
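The fix being tested follows the standard Spark pattern of pairing every persist() with an unpersist() once the work that needed the cached data has finished. A minimal standalone sketch of that pattern (not the actual BaseSparkCommitActionExecutor code):

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

public class CachedWriteSketch {
  public static <T> long runWrite(JavaRDD<T> inputRecords) {
    // Cache while the write needs repeated passes over the records.
    JavaRDD<T> tagged = inputRecords.persist(StorageLevel.MEMORY_AND_DISK_SER());
    try {
      return tagged.count(); // stand-in for the actual write/commit work
    } finally {
      // Release cached blocks deterministically instead of waiting for
      // Spark's automatic cleanup, which is nondeterministic.
      tagged.unpersist();
    }
  }
}
```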
[GitHub] [hudi] pengzhiwei2018 commented on issue #2656: HUDI insert operation is working same as upsert
pengzhiwei2018 commented on issue #2656:
URL: https://github.com/apache/hudi/issues/2656#issuecomment-799888195

Hi @shivabansal1046, no need to add an extra column; just set `DataSourceWriteOptions#KEYGENERATOR_CLASS_OPT_KEY` to `classOf[UuidKeyGenKeyGenerator]`, which generates a UUID key.
[jira] [Updated] (HUDI-1526) Translate the spark api partitionBy to hoodie.datasource.write.partitionpath.field
[ https://issues.apache.org/jira/browse/HUDI-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianghu Wang updated HUDI-1526:
-------------------------------
    Fix Version/s: 0.8.0

> Translate the spark api partitionBy to hoodie.datasource.write.partitionpath.field
> -----------------------------------------------------------------------------------
>
>                 Key: HUDI-1526
>                 URL: https://issues.apache.org/jira/browse/HUDI-1526
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Spark Integration
>            Reporter: teeyog
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.8.0
>
> Currently, if you want to set the partition of a Hudi table, you must configure it
> with the parameter hoodie.datasource.write.partitionpath.field; the Spark
> DataFrame API partitionBy does not take effect. We can automatically translate
> the partitionBy argument into Hudi's partition field.
> [https://github.com/apache/hudi/pull/2431|https://github.com/apache/hudi/pull/2431/commits/fa597aa31b5af5ceea651af32bc163911137552c]
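What the improvement enables, sketched below (other required Hudi write options, e.g. record key and precombine field, are omitted; the table paths and the "dt" column are made up):

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class PartitionByExample {
  public static void write(Dataset<Row> df) {
    // Before HUDI-1526: the partition column must go through a Hudi-specific option.
    df.write().format("hudi")
        .option("hoodie.datasource.write.partitionpath.field", "dt")
        .mode(SaveMode.Append)
        .save("/tmp/hudi/table_a");

    // After HUDI-1526: the standard Spark partitionBy(...) call is translated
    // into hoodie.datasource.write.partitionpath.field automatically.
    df.write().format("hudi")
        .partitionBy("dt")
        .mode(SaveMode.Append)
        .save("/tmp/hudi/table_b");
  }
}
{code}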
[jira] [Resolved] (HUDI-1526) Translate the spark api partitionBy to hoodie.datasource.write.partitionpath.field
[ https://issues.apache.org/jira/browse/HUDI-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianghu Wang resolved HUDI-1526.
--------------------------------
    Resolution: Resolved

Resolved via master branch: 26da4f546275e8ab6496537743efe73510cb723d
[GitHub] [hudi] satishkotha commented on a change in pull request #2678: Added support for replace commits in commit showpartitions, commit sh…
satishkotha commented on a change in pull request #2678:
URL: https://github.com/apache/hudi/pull/2678#discussion_r594753875

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java

## @@ -431,4 +442,20 @@ public String syncCommits(@CliOption(key = {"path"}, help = "Path of the table t
     return "Load sync state between " + HoodieCLI.getTableMetaClient().getTableConfig().getTableName()
         + " and " + HoodieCLI.syncTableMetadata.getTableConfig().getTableName();
   }

+  /*
+   Checks whether a commit or replacecommit action exists in the timeline.
+  */
+  private Option getCommitOrReplaceCommitInstant(HoodieTimeline timeline, String instantTime) {

Review comment: Consider changing the signature to return Option and deserializing the instant details inside this method. This would avoid the repetition of getting instant details in multiple places. You can also do additional validation; for example, for a replace commit, deserialize using the HoodieReplaceCommitMetadata class.

## File path: hudi-cli/src/test/java/org/apache/hudi/cli/testutils/HoodieTestReplaceCommitMetadatGenerator.java

## @@ -0,0 +1,74 @@
+package org.apache.hudi.cli.testutils;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hudi.common.model.HoodieReplaceCommitMetadata;
+import org.apache.hudi.common.model.HoodieWriteStat;
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
+import org.apache.hudi.common.testutils.FileCreateUtils;
+import org.apache.hudi.common.util.Option;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.UUID;
+
+import static org.apache.hudi.common.testutils.FileCreateUtils.baseFileName;
+import static org.apache.hudi.common.util.CollectionUtils.createImmutableList;
+
+public class HoodieTestReplaceCommitMetadatGenerator extends HoodieTestCommitMetadataGenerator {
+  public static void createReplaceCommitFileWithMetadata(String basePath, String commitTime, Configuration configuration,
+      Option writes, Option updates) throws Exception {
+    createReplaceCommitFileWithMetadata(basePath, commitTime, configuration, UUID.randomUUID().toString(),
+        UUID.randomUUID().toString(), writes, updates);
+  }
+
+  private static void createReplaceCommitFileWithMetadata(String basePath, String commitTime, Configuration configuration,
+      String fileId1, String fileId2, Option writes,
+      Option updates) throws Exception {
+    List commitFileNames = Arrays.asList(HoodieTimeline.makeCommitFileName(commitTime),

Review comment: Can we reuse a replace-commit generator from other places? HoodieTestTable, for example?

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java

## @@ -266,12 +267,15 @@ public String showCommitPartitions(
   HoodieActiveTimeline activeTimeline = HoodieCLI.getTableMetaClient().getActiveTimeline();
   HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants();

-  HoodieInstant commitInstant = new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, instantTime);
-  if (!timeline.containsInstant(commitInstant)) {
+  Option hoodieInstantOptional = getCommitOrReplaceCommitInstant(timeline, instantTime);
+  if (!hoodieInstantOptional.isPresent()) {
     return "Commit " + instantTime + " not found in Commits " + timeline;
   }
+
+  HoodieInstant hoodieInstant = hoodieInstantOptional.get();
+
-  HoodieCommitMetadata meta = HoodieCommitMetadata.fromBytes(activeTimeline.getInstantDetails(commitInstant).get(),
+  HoodieCommitMetadata meta = HoodieCommitMetadata.fromBytes(activeTimeline.getInstantDetails(hoodieInstant).get(),
       HoodieCommitMetadata.class);
   List rows = new ArrayList<>();
   for (Map.Entry> entry : meta.getPartitionToWriteStats().entrySet()) {

Review comment: It'd be nice to compute totalFilesReplaced and show it in the table. It could be 0 for regular commits.

## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CommitsCommand.java

## @@ -431,4 +442,20 @@ public String syncCommits(@CliOption(key = {"path"}, help = "Path of the table t
     return "Load sync state between " + HoodieCLI.getTableMetaClient().getTableConfig().getTableName()
         + " and " + HoodieCLI.syncTableMetadata.getTableConfig().getTableName();
   }

+  /*
+   Checks whether a commit or replacecommit action exists in the timeline.
+  */
+  private Option getCommitOrReplaceCommitInstant(HoodieTimeline timeline, String instantTime) {
+    HoodieInstant hoodieInstant = new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, instantTime);
+
+    if (!timeline.containsInstant(hoodieInstant)) {
+      hoodieInstant = new H
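A sketch of the helper under review, following satishkotha's first point in spirit (this is a hypothetical shape, not the PR's code; HoodieInstant, HoodieTimeline, and Option are Hudi's own types):

```java
import org.apache.hudi.common.table.timeline.HoodieInstant;
import org.apache.hudi.common.table.timeline.HoodieTimeline;
import org.apache.hudi.common.util.Option;

public class CommitLookupSketch {
  // Look for the instant first as a regular commit, then as a replacecommit.
  public static Option<HoodieInstant> getCommitOrReplaceCommitInstant(
      HoodieTimeline timeline, String instantTime) {
    HoodieInstant commit =
        new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, instantTime);
    if (timeline.containsInstant(commit)) {
      return Option.of(commit);
    }
    HoodieInstant replace =
        new HoodieInstant(false, HoodieTimeline.REPLACE_COMMIT_ACTION, instantTime);
    return timeline.containsInstant(replace) ? Option.of(replace) : Option.empty();
  }
}
```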
[GitHub] [hudi] codecov-io commented on pull request #2679: [MINOR] Fixed the error messaging
codecov-io commented on pull request #2679:
URL: https://github.com/apache/hudi/pull/2679#issuecomment-799814201

# Codecov Report

> Merging #2679 (818439b) into master (3b36cb8) will **increase** coverage by `17.44%`.
> The diff coverage is `n/a`.

```diff
@@              Coverage Diff              @@
##             master    #2679       +/-   ##
=============================================
+ Coverage     51.98%   69.43%   +17.44%
+ Complexity     3580      363     -3217
=============================================
  Files           466       53      -413
  Lines         22318     1963    -20355
  Branches       2377      235     -2142
=============================================
- Hits          11603     1363    -10240
+ Misses         9706      466     -9240
+ Partials       1009      134      -875
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `?` | `?` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...di/utilities/sources/helpers/IncrSourceHelper.java | `54.54% <ø> (ø)` | `4.00 <0.00> (ø)` |
| ...he/hudi/common/model/BootstrapBaseFileMapping.java | | |
| ...n/java/org/apache/hudi/internal/DefaultSource.java | | |
| ...java/org/apache/hudi/table/format/FormatUtils.java | | |
| ...a/org/apache/hudi/common/util/CompactionUtils.java | | |
| ...ache/hudi/hadoop/utils/HoodieInputFormatUtils.java | | |
| ...ache/hudi/common/table/timeline/TimelineUtils.java | | |
| .../hudi/common/util/collection/LazyFileIterable.java | | |
| ...ava/org/apache/hudi/cli/commands/TableCommand.java | | |
| ...org/apache/hudi/hadoop/HoodieHFileInputFormat.java | | |
| ... and 404 more | | |
[jira] [Updated] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect
[ https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-1695:
---------------------------------
    Labels: beginner pull-request-available  (was: beginner)
[GitHub] [hudi] vingov opened a new pull request #2679: [HUDI-1695] Fixed the error messaging
vingov opened a new pull request #2679:
URL: https://github.com/apache/hudi/pull/2679

## What is the purpose of the pull request

This pull request fixes the error messaging with the correct hoodie conf parameter.

## Brief change log

- *Updated the error messaging when using the HoodieIncrSource class*

## Verify this pull request

This pull request is a trivial rework / code cleanup without any test coverage.

## Committer checklist

- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Updated] (HUDI-1695) Deltastreamer HoodieIncrSource exception error messaging is incorrect
[ https://issues.apache.org/jira/browse/HUDI-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Govindarajan updated HUDI-1695:
--------------------------------------
    Summary: Deltastreamer HoodieIncrSource exception error messaging is incorrect  (was: Deltastream HoodieIncrSource exception error messaging is incorrect)
[jira] [Created] (HUDI-1695) Deltastream HoodieIncrSource exception error messaging is incorrect
Vinoth Govindarajan created HUDI-1695:
------------------------------------------

             Summary: Deltastream HoodieIncrSource exception error messaging is incorrect
                 Key: HUDI-1695
                 URL: https://issues.apache.org/jira/browse/HUDI-1695
             Project: Apache Hudi
          Issue Type: Bug
          Components: DeltaStreamer
            Reporter: Vinoth Govindarajan
            Assignee: Vinoth Govindarajan
             Fix For: 0.8.0

When you set your source_class as HoodieIncrSource and invoke DeltaStreamer without any checkpoint, it throws the following exception:

{code:java}
User class threw exception: java.lang.IllegalArgumentException: Missing begin instant for incremental pull. For reading from latest committed instant set hoodie.deltastreamer.source.hoodie.read_latest_on_midding_ckpt to true
{code}

The error messaging is wrong and misleading; the correct parameter is:

{code:java}
hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt
{code}

Check out the correct parameter in this [file|https://github.com/apache/hudi/blob/release-0.7.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HoodieIncrSource.java#L78]

The correct messaging should be:

{code:java}
User class threw exception: java.lang.IllegalArgumentException: Missing begin instant for incremental pull. For reading from latest committed instant set hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt to true
{code}
[GitHub] [hudi] prashantwason commented on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.
prashantwason commented on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-799740584

Looks good @vinothchandar
[GitHub] [hudi] prashantwason commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.
prashantwason commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r594671546

## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java

## @@ -147,82 +150,91 @@ private void initIfNeeded() {
     }
   }
   timings.add(timer.endTimer());
-  LOG.info(String.format("Metadata read for key %s took [open, baseFileRead, logMerge] %s ms", key, timings));
+  LOG.info(String.format("Metadata read for key %s took [baseFileRead, logMerge] %s ms", key, timings));
   return Option.ofNullable(hoodieRecord);
 } catch (IOException ioe) {
   throw new HoodieIOException("Error merging records from metadata table for key :" + key, ioe);
-} finally {

Review comment: Yep. Thanks for fixing it.
[GitHub] [hudi] vinothchandar merged pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.
vinothchandar merged pull request #2494:
URL: https://github.com/apache/hudi/pull/2494
[hudi] branch master updated (76bf2cc -> 3b36cb8)
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git.

 from 76bf2cc  [HUDI-1692] Bounded source for stream writer (#2674)
  add 3b36cb8  [HUDI-1552] Improve performance of key lookups from base file in Metadata Table. (#2494)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/MetadataCommand.java  |   6 +-
 .../java/org/apache/hudi/table/HoodieTable.java    |   3 +-
 .../hudi/metadata/TestHoodieBackedMetadata.java    |   1 -
 .../hudi/common/config/HoodieMetadataConfig.java   |  15 ---
 .../common/table/view/FileSystemViewManager.java   |   2 +-
 .../apache/hudi/io/storage/HoodieHFileReader.java  |  35 +++--
 .../hudi/metadata/HoodieBackedTableMetadata.java   | 142 +++++++++++-------
 .../apache/hudi/metadata/HoodieTableMetadata.java  |   7 +-
 8 files changed, 125 insertions(+), 86 deletions(-)
[GitHub] [hudi] vinothchandar commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.
vinothchandar commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r594656644

## File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java

## @@ -232,12 +246,13 @@ public long getTotalRecords() {
   }

   @Override
-  public void close() {
+  public synchronized void close() {
     try {
       reader.close();
       reader = null;
+      keyScanner = null;
     } catch (IOException e) {
-      e.printStackTrace();

Review comment: @prashantwason fixed this as well.
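The shape of the close() fix under discussion, as a dependency-free sketch (Hudi's actual HoodieHFileReader holds more state and wraps failures in HoodieIOException; a plain unchecked wrapper is used here):

```java
import java.io.Closeable;
import java.io.IOException;

public class ScannerBackedReader implements Closeable {
  private Closeable reader;  // stand-in for the underlying HFile reader
  private Object keyScanner; // stand-in for the cached key scanner

  @Override
  public synchronized void close() {
    try {
      if (reader != null) {
        reader.close();
      }
      reader = null;
      keyScanner = null; // drop the scanner so a closed reader cannot be reused
    } catch (IOException e) {
      // Surface the failure instead of swallowing it via printStackTrace().
      throw new RuntimeException("Error closing reader", e);
    }
  }
}
```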
[GitHub] [hudi] codecov-io edited a comment on pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.
codecov-io edited a comment on pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#issuecomment-767956391

# Codecov Report

> Merging #2494 (59b919a) into master (d8af24d) will **increase** coverage by `17.89%`.
> The diff coverage is `n/a`.

```diff
@@              Coverage Diff              @@
##             master    #2494       +/-   ##
=============================================
+ Coverage     51.53%   69.43%   +17.89%
+ Complexity     3491      363     -3128
=============================================
  Files           462       53      -409
  Lines         21881     1963    -19918
  Branches       2327      235     -2092
=============================================
- Hits          11277     1363     -9914
+ Misses         9624      466     -9158
+ Partials        980      134      -846
```

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| hudicli | `?` | `?` |
| hudiclient | `?` | `?` |
| hudicommon | `?` | `?` |
| hudiflink | `?` | `?` |
| hudihadoopmr | `?` | `?` |
| hudisparkdatasource | `?` | `?` |
| hudisync | `?` | `?` |
| huditimelineservice | `?` | `?` |
| hudiutilities | `69.43% <ø> (-0.06%)` | `0.00 <ø> (ø)` |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...ies/sources/helpers/DatePartitionPathSelector.java | `54.68% <0.00%> (-1.57%)` | `13.00% <0.00%> (ø%)` |
| ...e/hudi/common/util/queue/BoundedInMemoryQueue.java | | |
| ...udi/operator/partitioner/BucketAssignFunction.java | | |
| ...pache/hudi/operator/KeyedWriteProcessOperator.java | | |
| ...i/common/model/OverwriteWithLatestAvroPayload.java | | |
| ...til/jvm/HotSpotMemoryLayoutSpecification64bit.java | | |
| ...e/hudi/common/model/HoodieRollingStatMetadata.java | | |
| ...he/hudi/exception/HoodieNotSupportedException.java | | |
| ...udi/common/table/timeline/dto/ClusteringOpDTO.java | | |
| .../apache/hudi/common/model/ClusteringOperation.java | | |
| ... and 394 more | | |
[GitHub] [hudi] nsivabalan edited a comment on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
nsivabalan edited a comment on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-799571386

Thanks for your contribution; this is going to be useful to the community. A few high-level questions:

1. Why not leverage DeltaStreamerConfig.checkpoint to pass in a timestamp for the Kafka source? Or do we expect the format of that config to be "topic_name,partition_num:offset,partition_num:offset,...", and hence we need a new config for a timestamp-based checkpoint?
2. If yes to (1), did we think about parsing the checkpoint config, determining whether it is the above format or a timestamp, and proceeding from there? Just trying to avoid introducing new configs if possible.
3. Checkpointing in DeltaStreamer in general is getting too complicated. I definitely see a benefit in this patch, but is there a way we can abstract it out based on the source? The new config introduced as part of this PR is very specific to Kafka, so I'm trying to see if we can keep it abstracted away from DeltaStreamer if possible.
4. I see KafkaConsumer.offsetsForTimes() could return null for partitions with messages in the old format. So what's the expected behavior for such partitions? Do we resume from the earliest offset?

@n3nash @vinothchandar: open to hearing your thoughts, if any. One of my suggestions above could potentially add APIs to Source, hence CCing you.
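Regarding point 4, the Kafka API in question, sketched below (the fallback-to-earliest policy is illustrative, not what the PR necessarily does):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class TimestampOffsets {
  // Resolve a starting offset for a timestamp. Partitions with no message at or
  // after the timestamp (e.g. old message format) come back with a null
  // OffsetAndTimestamp, so a policy is needed for them; here we fall back to
  // the earliest offset.
  public static Map<TopicPartition, Long> startingOffsets(
      KafkaConsumer<?, ?> consumer, TopicPartition partition, long timestampMs) {
    Map<TopicPartition, Long> query = Collections.singletonMap(partition, timestampMs);
    Map<TopicPartition, OffsetAndTimestamp> byTime = consumer.offsetsForTimes(query);

    Map<TopicPartition, Long> result = new HashMap<>();
    OffsetAndTimestamp oat = byTime.get(partition);
    if (oat != null) {
      result.put(partition, oat.offset());
    } else {
      result.putAll(consumer.beginningOffsets(Collections.singletonList(partition)));
    }
    return result;
  }
}
```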
[GitHub] [hudi] jsbali opened a new pull request #2678: Added support for replace commits in commit showpartitions, commit sh…
jsbali opened a new pull request #2678:
URL: https://github.com/apache/hudi/pull/2678

## What is the purpose of the pull request

Add support for replace commits in hudi-cli (commit showpartitions, commit show_write_stats, commit showfiles).

## Brief change log

Currently hudi-cli doesn't support replace commits in the commit show* commands; this adds the foundation for that. This PR still doesn't support the extraMetadata of the replace commit, which will be added in subsequent PRs.

## Verify this pull request

This PR is one part of adding replace-commit support to hudi-cli.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] jsbali opened a new pull request #2677: Added tests to TestHoodieTimelineArchiveLog for the archival of compl…
jsbali opened a new pull request #2677:
URL: https://github.com/apache/hudi/pull/2677

## What is the purpose of the pull request

Adds tests to TestHoodieTimelineArchiveLog for the archival of completed clean and rollback actions.

## Brief change log

This pull request adds test cases to the TestHoodieTimelineArchiveLog class, specifically for the getCleanInstantsToArchive function.

## Verify this pull request

This change added tests.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 565dac6  Travis CI build asf-site
565dac6 is described below

commit 565dac65cff779d03f3a133314b26b2bb7b341aa
Author:     CI
AuthorDate: Mon Mar 15 16:12:28 2021 +0000

    Travis CI build asf-site
---
 content/activity.html                              |  24 ++
 .../blog/hudi-file-sizing/adding_new_files.png     | Bin 0 -> 44237 bytes
 .../bin_packing_existing_data_files.png            | Bin 0 -> 23955 bytes
 .../blog/hudi-file-sizing/initial_layout.png       | Bin 0 -> 34742 bytes
 content/assets/js/lunr/lunr-store.js               |   5 +
 content/blog.html                                  |  24 ++
 content/blog/hudi-file-sizing/index.html           | 331 +++++++++++++++++++++
 content/cn/activity.html                           |  24 ++
 content/sitemap.xml                                |   4 +
 9 files changed, 412 insertions(+)
[GitHub] [hudi] shivabansal1046 commented on issue #2656: HUDI insert operation is working same as upsert
shivabansal1046 commented on issue #2656:
URL: https://github.com/apache/hudi/issues/2656#issuecomment-799528507

Hi @pengzhiwei2018, are you suggesting adding an extra column that holds a generated key? Is this a workaround, or is this how it is meant to be done?
[GitHub] [hudi] vburenin commented on pull request #2619: [HUDI-1650] Custom avro kafka deserializer.
vburenin commented on pull request #2619:
URL: https://github.com/apache/hudi/pull/2619#issuecomment-799525731

@nsivabalan I am very strapped for time; I will only be able to get back to it next quarter. The overall change is trivial, so if you could continue, that would be great. As soon as this one is done, I will publish another PR for SchemaRegistryProvider.
[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-799512747

No problem.
[GitHub] [hudi] pengzhiwei2018 commented on issue #2656: HUDI insert operation is working same as upsert
pengzhiwei2018 commented on issue #2656:
URL: https://github.com/apache/hudi/issues/2656#issuecomment-799511860

Hi @shivabansal1046, currently Hudi performs the insert operation against the record key you have specified, so duplicate keys behave like updates. You can use a `UuidKeyGenKeyGenerator` as the key generator to work around this.

First, define a `UuidKeyGenKeyGenerator` class:

> class UuidKeyGenKeyGenerator(props: TypedProperties) extends ComplexKeyGenerator(props) {
>   override def getRecordKey(record: GenericRecord): String = {
>     UUID.randomUUID().toString
>   }
> }

Then set `DataSourceWriteOptions#KEYGENERATOR_CLASS_OPT_KEY` to `classOf[UuidKeyGenKeyGenerator]`. You can give it a try; hope it helps!
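How the suggested generator would be wired in, as a sketch (the option keys are Hudi's real config names; the `com.example` package, table name, and path are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class UuidKeyGenUsage {
  public static void write(Dataset<Row> df) {
    df.write().format("hudi")
        // Point Hudi at the custom generator; "com.example" is a placeholder package.
        .option("hoodie.datasource.write.keygenerator.class", "com.example.UuidKeyGenKeyGenerator")
        // With random keys every record is unique, so insert never degenerates into update.
        .option("hoodie.datasource.write.operation", "insert")
        .option("hoodie.table.name", "my_table") // illustrative table name
        .mode(SaveMode.Append)
        .save("/tmp/hudi/my_table");
  }
}
```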
[GitHub] [hudi] nsivabalan commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
nsivabalan commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-799511127

Hey folks, may I know the status of this PR? I see this could benefit others in the community as well. Do you think we can take it across the finish line by this weekend, so that we have it for the upcoming release?
[GitHub] [hudi] nsivabalan commented on pull request #2619: [HUDI-1650] Custom avro kafka deserializer.
nsivabalan commented on pull request #2619:
URL: https://github.com/apache/hudi/pull/2619#issuecomment-799507450

@vburenin: Did you get a chance to work on this PR? We would like to have this in before our next release. If you are strapped for time, let me know; I will try to squeeze in some time this week.
[hudi] branch asf-site updated: [HUDI-1563] Adding hudi file sizing/ small file management blog (#2612)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 601f54f  [HUDI-1563] Adding hudi file sizing/ small file management blog (#2612)
601f54f is described below

commit 601f54f1ea215281ede51125872d5c2455077dba
Author: Sivabalan Narayanan
AuthorDate: Mon Mar 15 11:18:57 2021 -0400

    [HUDI-1563] Adding hudi file sizing/ small file management blog (#2612)

    Co-authored-by: Vinoth Chandar
---
 docs/_posts/2021-03-01-hudi-file-sizing.md         |  85 +
 .../blog/hudi-file-sizing/adding_new_files.png     | Bin 0 -> 44237 bytes
 .../bin_packing_existing_data_files.png            | Bin 0 -> 23955 bytes
 .../blog/hudi-file-sizing/initial_layout.png       | Bin 0 -> 34742 bytes
 4 files changed, 85 insertions(+)

diff --git a/docs/_posts/2021-03-01-hudi-file-sizing.md b/docs/_posts/2021-03-01-hudi-file-sizing.md
new file mode 100644
index 000..c79ea80
--- /dev/null
+++ b/docs/_posts/2021-03-01-hudi-file-sizing.md
@@ -0,0 +1,85 @@
+---
+title: "Streaming Responsibly - How Apache Hudi maintains optimum sized files"
+excerpt: "Maintaining well-sized files can improve query performance significantly"
+author: shivnarayan
+category: blog
+---
+
+Apache Hudi is a data lake platform technology that provides several functionalities needed to build and manage data lakes.
+One such key feature is self-managed file sizing, so that users don't need to worry about manual table maintenance.
+Having a lot of small files makes it harder to achieve good query performance, because query engines have to
+open/read/close far too many files while planning and executing queries. Streaming data lake use-cases, however,
+inherently ingest smaller volumes of writes, which can result in lots of small files if no special handling is done.
+
+# During Write vs After Write
+
+Common approaches write very small files and later stitch them together. They solve the system scalability issues posed
+by small files, but might violate query SLAs by exposing small files to queries in the meantime. In fact, you can easily
+do this on a Hudi table by running a clustering operation, as detailed in a [previous blog](/blog/hudi-clustering-intro/).
+
+In this blog, we discuss the file sizing optimizations Hudi performs at the initial write time, so we don't have to
+effectively re-write all the data again just for file sizing. If you want both (a) self-managed file sizing and
+(b) to avoid exposing small files to queries, the automatic file sizing feature saves the day.
+
+Hudi has the ability to maintain a configured target file size when performing insert/upsert operations.
+(Note: the bulk_insert operation does not provide this functionality and is designed as a simpler replacement for
+normal `spark.write.parquet`.)
+
+## Configs
+
+For illustration purposes, we are going to consider only COPY_ON_WRITE tables.
+
+Configs of interest before we dive into the algorithm:
+
+- [Max file size](/docs/configurations.html#limitFileSize): Max size for a given data file. Hudi will try to maintain file sizes at this configured value.
+- [Soft file limit](/docs/configurations.html#compactionSmallFileSize): Max file size below which a given data file is considered a small file.
+- [Insert split size](/docs/configurations.html#insertSplitSize): Number of inserts grouped for a single partition. This value should match
+the number of records in a single file (which you can determine from the max file size and the per-record size).
+
+For instance, if your first config value is 120MB and the second config value is set to 100MB, any file whose size is < 100MB
+would be considered a small file.
+
+If you wish to turn off this feature, set the config value for the soft file limit to 0.
+
+## Example
+
+Let's say this is the layout of data files for a given partition.
+
+![Initial layout](/assets/images/blog/hudi-file-sizing/initial_layout.png)
+_Figure: Initial data file sizes for a given partition of interest_
+
+Let's assume the configured values for the max file size and the small file size limit are 120MB and 100MB. File_1's current
+size is 40MB, File_2's size is 80MB, File_3's size is 90MB, File_4's size is 130MB and File_5's size is 105MB. Let's see
+what happens when a new write comes in.
+
+**Step 1:** Assign updates to files. In this step, we look up the index to find the tagged locations, and records are
+assigned to their respective files. Note that we assume updates only increase the file size, which simply results
+in a bigger file. When updates lower the file size (by, say, nulling out lots of fields), a subsequent write will deem
+it a small file.
+
+**Step 2:** Determine small files for each partition path. The soft file limit config value is leveraged here to determine
+which files qualify as small files.
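To make the configs above concrete, here is a minimal spark-shell style sketch of setting them on a write. The table name, path, and field names are hypothetical, and the byte values simply mirror the 120MB/100MB example from the post:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder.appName("file-sizing-demo").getOrCreate()
import spark.implicits._

// Toy records; "id", "ts" and "dt" are made-up field names.
val df = Seq((1, 1615800000L, "2021/03/15"), (2, 1615800001L, "2021/03/15"))
  .toDF("id", "ts", "dt")

df.write.format("hudi").
  option("hoodie.table.name", "file_sizing_demo").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.datasource.write.partitionpath.field", "dt").
  // the two sizing knobs discussed above: 120MB max file size, 100MB soft limit
  option("hoodie.parquet.max.file.size", (120 * 1024 * 1024).toString).
  option("hoodie.parquet.small.file.limit", (100 * 1024 * 1024).toString).
  mode(SaveMode.Append).
  save("/tmp/hudi/file_sizing_demo")
```

Setting `hoodie.parquet.small.file.limit` to 0 disables the feature, per the post.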
[GitHub] [hudi] nsivabalan merged pull request #2612: [HUDI-1563] Adding hudi file sizing/ small file management blog
nsivabalan merged pull request #2612: URL: https://github.com/apache/hudi/pull/2612 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1688) hudi write should uncache rdd when the write operation is finished
[ https://issues.apache.org/jira/browse/HUDI-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1688: -- Labels: pull-request-available sev:critical user-support-issues (was: pull-request-available) > hudi write should uncache rdd when the write operation is finished > > > Key: HUDI-1688 > URL: https://issues.apache.org/jira/browse/HUDI-1688 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Affects Versions: 0.7.0 >Reporter: tao meng >Priority: Major > Labels: pull-request-available, sev:critical, user-support-issues > Fix For: 0.8.0 > > > Hudi improves write performance by caching the necessary RDDs; however, when the > write operation is finished, those cached RDDs are not uncached, which wastes lots of memory. > [https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L115] > https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L214 > In our environment: > step1: insert 100GB of data into a hudi table by spark (ok) > step2: insert another 100GB of data into the hudi table by spark again (OOM) -- This message was sent by Atlassian Jira (v8.3.4#803005)
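For context, Spark keeps persisted RDDs registered on the SparkContext until they are explicitly unpersisted. A minimal sketch of the kind of cleanup the ticket asks for — not the actual patch in PR #2673, and the tag-RDDs-by-name convention is an assumption of this example:

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: release RDDs that a finished Hudi write left cached.
// Spark exposes all persisted RDDs via getPersistentRDDs; filtering them
// by a name set at cache time is an assumed convention, not Hudi's code.
def uncacheWriteRdds(spark: SparkSession, writeTag: String): Unit = {
  spark.sparkContext.getPersistentRDDs.values
    .filter(rdd => rdd.name != null && rdd.name.contains(writeTag))
    .foreach(_.unpersist())
}
```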
[GitHub] [hudi] nsivabalan edited a comment on pull request #2673: [HUDI-1688] Uncache Rdd once write operation is complete
nsivabalan edited a comment on pull request #2673: URL: https://github.com/apache/hudi/pull/2673#issuecomment-799497945 @xiarixiaoyao : thanks for your contribution. Were you able to test out the fix in your env, i.e., that subsequent writes don't incur OOMs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #2673: [HUDI-1688] Uncache Rdd once write operation is complete
nsivabalan commented on pull request #2673: URL: https://github.com/apache/hudi/pull/2673#issuecomment-799497945 @xiarixiaoyao : Were you able to test out the fix in your env, i.e., that subsequent writes don't incur OOMs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
nsivabalan commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-799494981 @liujinhui1994 : Thanks for the contribution. There are 2 to 3 PRs with a similar goal. Did you happen to check out the existing ones before putting this up? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] shivabansal1046 commented on issue #2656: HUDI insert operation is working same as upsert
shivabansal1046 commented on issue #2656: URL: https://github.com/apache/hudi/issues/2656#issuecomment-799469658 Hi, below are the configs I am using:

.write
.format("org.apache.hudi").
options(getQuickstartWriteConfigs).
option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ").
option(OPERATION_OPT_KEY, "INSERT").
option(PRECOMBINE_FIELD_OPT_KEY, "last_update_time").
option(RECORDKEY_FIELD_OPT_KEY, "id").
option(PARTITIONPATH_FIELD_OPT_KEY, "creation_date").
option(TABLE_NAME, "my_hudi_table")
.mode(SaveMode.Append)
.save(args(1))

And to your other question: I already have the record in HUDI, and during another run it overwrites that record with a record having the same key. With the insert option I expect it to simply insert the new record without checking whether a record with the same key is present. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] wosow opened a new issue #2676: [SUPPORT] When I used 100,000 records to update 100 million records, the program gets stuck
wosow opened a new issue #2676: URL: https://github.com/apache/hudi/issues/2676

**Environment Description**
* Hudi version : 0.7.0/0.6.0
* Spark version : 2.4.4
* Hive version : 2.3.1
* Hadoop version : 2.7.5
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

When I used 100,000 records to update 100 million records, the program got stuck and could not make further progress. The table type used was MOR. The program execution diagram is as follows: ![image](https://user-images.githubusercontent.com/34565079/67633-48772800-85dc-11eb-9072-1f4f7a3a2c54.png)

hudi parameters are as follows:

TABLE_TYPE_OPT_KEY -> MOR_TABLE_TYPE_OPT_VAL,
// OPERATION_OPT_KEY -> WriteOperationType.UPSERT.value,
OPERATION_OPT_KEY -> "upsert",
RECORDKEY_FIELD_OPT_KEY -> pkCol,
PRECOMBINE_FIELD_OPT_KEY -> preCombineCol,
"hoodie.embed.timeline.server" -> "false",
"hoodie.cleaner.commits.retained" -> "1",
"hoodie.cleaner.fileversions.retained" -> "1",
"hoodie.cleaner.policy" -> HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name(),
"hoodie.keep.min.commits" -> "3",
"hoodie.keep.max.commits" -> "4",
"hoodie.compact.inline" -> "true",
"hoodie.compact.inline.max.delta.commits" -> "1",
// "hoodie.copyonwrite.record.size.estimate" -> String.valueOf(500),
PARTITIONPATH_FIELD_OPT_KEY -> "dt",
HIVE_PARTITION_FIELDS_OPT_KEY -> "dt",
HIVE_URL_OPT_KEY -> "jdbc:hive2:/0.0.0.0:1",
HIVE_USER_OPT_KEY -> "",
HIVE_PASS_OPT_KEY -> "",
HIVE_DATABASE_OPT_KEY -> hiveDatabaseName,
HIVE_TABLE_OPT_KEY -> hiveTableName,
HIVE_SYNC_ENABLED_OPT_KEY -> "true",
HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH -> "true",
HoodieWriteConfig.TABLE_NAME -> hiveTableName,
HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
HoodieIndexConfig.INDEX_TYPE_PROP -> HoodieIndex.IndexType.GLOBAL_BLOOM.name(),
"hoodie.insert.shuffle.parallelism" -> parallelism,
"hoodie.upsert.shuffle.parallelism" -> parallelism

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-1692) Bounded source for stream writer
[ https://issues.apache.org/jira/browse/HUDI-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1692. -- Resolution: Done 76bf2cc790edc0be10dfa2454c42687c38e7e5fc > Bounded source for stream writer > > > Key: HUDI-1692 > URL: https://issues.apache.org/jira/browse/HUDI-1692 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > > Supports bounded source such as VALUES for stream mode writer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[hudi] branch master updated (fc6c5f4 -> 76bf2cc)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git.

 from fc6c5f4  [HUDI-1684] Tweak hudi-flink-bundle module pom and reorganize the packages for hudi-flink module (#2669)
  add 76bf2cc  [HUDI-1692] Bounded source for stream writer (#2674)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/sink/StreamWriteFunction.java  | 18 +++--
 .../hudi/sink/StreamWriteOperatorCoordinator.java  | 26 +++
 .../hudi/sink/StreamWriteOperatorFactory.java      | 13 +---
 .../hudi/sink/event/BatchWriteSuccessEvent.java    | 79 +++---
 .../org/apache/hudi/table/HoodieTableFactory.java  |  2 +-
 .../org/apache/hudi/table/HoodieTableSink.java     |  6 +-
 .../sink/TestStreamWriteOperatorCoordinator.java   | 30 ++--
 .../sink/utils/StreamWriteFunctionWrapper.java     |  2 +-
 .../apache/hudi/table/HoodieDataSourceITCase.java  | 21 --
 9 files changed, 135 insertions(+), 62 deletions(-)
[GitHub] [hudi] yanghua merged pull request #2674: [HUDI-1692] Bounded source for stream writer
yanghua merged pull request #2674: URL: https://github.com/apache/hudi/pull/2674 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io commented on pull request #2650: [HUDI-1694] Preparation for Avro update
codecov-io commented on pull request #2650: URL: https://github.com/apache/hudi/pull/2650#issuecomment-799292031 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2650?src=pr&el=h1) Report > Merging [#2650](https://codecov.io/gh/apache/hudi/pull/2650?src=pr&el=desc) (1a5cb70) into [master](https://codecov.io/gh/apache/hudi/commit/899ae70fdb70c1511c099a64230fd91b2fe8d4ee?el=desc) (899ae70) will **increase** coverage by `0.39%`. > The diff coverage is `0.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2650/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2650?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2650 +/- ## + Coverage 51.58% 51.97% +0.39% - Complexity 3285 3579 +294 Files 446 466 +20 Lines 2040922275+1866 Branches 2116 2374 +258 + Hits 1052811578+1050 - Misses 9003 9689 +686 - Partials878 1008 +130 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (+0.14%)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.46% <0.00%> (+0.04%)` | `0.00 <0.00> (ø)` | | | hudiflink | `53.57% <ø> (+2.28%)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.44% <ø> (+0.28%)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `69.84% <ø> (+0.17%)` | `0.00 <ø> (ø)` | | | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.48% <ø> (+0.04%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2650?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | | | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `68.31% <0.00%> (-2.46%)` | `43.00% <0.00%> (-1.00%)` | | | [...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh) | `82.66% <0.00%> (-2.27%)` | `59.00% <0.00%> (ø%)` | | | [...che/hudi/common/table/log/HoodieLogFileReader.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGaWxlUmVhZGVyLmphdmE=) | `66.09% <0.00%> (-1.77%)` | `23.00% <0.00%> (+1.00%)` | :arrow_down: | | [.../hadoop/utils/HoodieRealtimeRecordReaderUtils.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3V0aWxzL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyVXRpbHMuamF2YQ==) | `71.79% <0.00%> (-1.25%)` | `30.00% <0.00%> (ø%)` | | | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.34% <0.00%> (-0.37%)` | `53.00% <0.00%> (+1.00%)` | :arrow_down: | | [...ies/sources/helpers/DatePartitionPathSelector.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9EYXRlUGFydGl0aW9uUGF0aFNlbGVjdG9yLmphdmE=) | `54.68% <0.00%> (-0.16%)` | `13.00% <0.00%> (ø%)` | | | [...src/main/java/org/apache/hudi/sink/CommitSink.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL0NvbW1pdFNpbmsuamF2YQ==) | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | | | [.../org/apache/hudi/util/RowDataToAvroConverters.java](https://codecov.io/gh/apache/hudi/pull/2650/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL1Jvd0RhdGFUb0F2cm9Db252ZXJ0ZXJzLmphdmE=) | `42.05% <0.00%> (ø)` | `8.00
[GitHub] [hudi] codecov-io edited a comment on pull request #2674: [HUDI-1692] Bounded source for stream writer
codecov-io edited a comment on pull request #2674: URL: https://github.com/apache/hudi/pull/2674#issuecomment-799291092 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=h1) Report > Merging [#2674](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=desc) (8419357) into [master](https://codecov.io/gh/apache/hudi/commit/fc6c5f4285098d18cd7f6e81785f59e68a3b6862?el=desc) (fc6c5f4) will **increase** coverage by `0.06%`. > The diff coverage is `83.33%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2674/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2674 +/- ## + Coverage 51.96% 52.03% +0.06% - Complexity 3579 3580 +1 Files 466 466 Lines 2227522294 +19 Branches 2374 2374 + Hits 1157611601 +25 + Misses 9690 9685 -5 + Partials 1009 1008 -1 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.46% <ø> (+0.01%)` | `0.00 <ø> (ø)` | | | hudiflink | `53.96% <83.33%> (+0.39%)` | `0.00 <4.00> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `69.84% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...g/apache/hudi/sink/StreamWriteOperatorFactory.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JGYWN0b3J5LmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | | | [...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh) | `69.37% <20.00%> (+0.23%)` | `32.00 <0.00> (ø)` | | | [...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==) | `14.28% <50.00%> (-2.39%)` | `2.00 <1.00> (ø)` | | | [...java/org/apache/hudi/sink/StreamWriteFunction.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlRnVuY3Rpb24uamF2YQ==) | `85.04% <90.90%> (+1.04%)` | `22.00 <0.00> (ø)` | | | [...apache/hudi/sink/event/BatchWriteSuccessEvent.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2V2ZW50L0JhdGNoV3JpdGVTdWNjZXNzRXZlbnQuamF2YQ==) | `92.30% <100.00%> (+6.59%)` | `9.00 <3.00> (+1.00)` | | | [...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==) | `76.92% <100.00%> (ø)` | `5.00 <0.00> (ø)` | | | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `79.68% <0.00%> (+1.56%)` | `26.00% <0.00%> (ø%)` | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io commented on pull request #2674: [HUDI-1692] Bounded source for stream writer
codecov-io commented on pull request #2674: URL: https://github.com/apache/hudi/pull/2674#issuecomment-799291092 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=h1) Report > Merging [#2674](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=desc) (8419357) into [master](https://codecov.io/gh/apache/hudi/commit/fc6c5f4285098d18cd7f6e81785f59e68a3b6862?el=desc) (fc6c5f4) will **decrease** coverage by `0.07%`. > The diff coverage is `83.33%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2674/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2674 +/- ## - Coverage 51.96% 51.89% -0.08% + Complexity 3579 3390 -189 Files 466 445 -21 Lines 2227520783-1492 Branches 2374 2229 -145 - Hits 1157610785 -791 + Misses 9690 9065 -625 + Partials 1009 933 -76 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.46% <ø> (+0.01%)` | `0.00 <ø> (ø)` | | | hudiflink | `53.96% <83.33%> (+0.39%)` | `0.00 <4.00> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `69.84% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2674?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...g/apache/hudi/sink/StreamWriteOperatorFactory.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JGYWN0b3J5LmphdmE=) | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | | | [...ache/hudi/sink/StreamWriteOperatorCoordinator.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlT3BlcmF0b3JDb29yZGluYXRvci5qYXZh) | `69.37% <20.00%> (+0.23%)` | `32.00 <0.00> (ø)` | | | [...in/java/org/apache/hudi/table/HoodieTableSink.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZVNpbmsuamF2YQ==) | `14.28% <50.00%> (-2.39%)` | `2.00 <1.00> (ø)` | | | [...java/org/apache/hudi/sink/StreamWriteFunction.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL1N0cmVhbVdyaXRlRnVuY3Rpb24uamF2YQ==) | `85.04% <90.90%> (+1.04%)` | `22.00 <0.00> (ø)` | | | [...apache/hudi/sink/event/BatchWriteSuccessEvent.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zaW5rL2V2ZW50L0JhdGNoV3JpdGVTdWNjZXNzRXZlbnQuamF2YQ==) | `92.30% <100.00%> (+6.59%)` | `9.00 <3.00> (+1.00)` | | | [...java/org/apache/hudi/table/HoodieTableFactory.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS90YWJsZS9Ib29kaWVUYWJsZUZhY3RvcnkuamF2YQ==) | `76.92% <100.00%> (ø)` | `5.00 <0.00> (ø)` | | | 
[.../org/apache/hudi/hive/HoodieHiveSyncException.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZVN5bmNFeGNlcHRpb24uamF2YQ==) | | | | | [...java/org/apache/hudi/hive/util/HiveSchemaUtil.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9IaXZlU2NoZW1hVXRpbC5qYXZh) | | | | | [...src/main/java/org/apache/hudi/dla/DLASyncTool.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0RMQVN5bmNUb29sLmphdmE=) | | | | | [...main/java/org/apache/hudi/dla/HoodieDLAClient.java](https://codecov.io/gh/apache/hudi/pull/2674/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0hvb2RpZURMQUNsaWVudC5qYXZh) | | | | | ... and [18 more](https://codecov.io/gh/apache/
[jira] [Created] (HUDI-1694) Preparation for Avro update
Sebastian Bernauer created HUDI-1694: Summary: Preparation for Avro update Key: HUDI-1694 URL: https://issues.apache.org/jira/browse/HUDI-1694 Project: Apache Hudi Issue Type: Task Components: Code Cleanup Reporter: Sebastian Bernauer We need to upgrade to at least Avro 1.9.x in production, so I tried upgrading the Avro version in the pom.xml of Hudi. Doing so, I noticed some problems: Upgrade to Avro 1.9.2: * Renamed method defaultValue to defaultVal * Moved NullNode.getInstance() to JsonProperties.NULL_VALUE * Avro complains about invalid schemas/default values in hudi-common/src/main/avro/ * The shaded guava libs from Avro have been removed Upgrade to Avro 1.10.1: * Some more stuff (not handled in this PR) Spark 3.2.0 (we currently use 3.1.1) will contain Avro 1.10.1 (https://issues.apache.org/jira/browse/SPARK-27733). In order to reduce the effort of switching to a newer Avro version in the future, I provided a patch that fixes the above-mentioned issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
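To illustrate the two Avro 1.9 renames listed above, a hedged Scala sketch — the field and schema names are invented for the example:

```scala
import java.util.Arrays
import org.apache.avro.{JsonProperties, Schema}

// A nullable string field whose default is null. Avro 1.9 takes
// JsonProperties.NULL_VALUE here, where Avro 1.8 code passed the
// (now removed) shaded Jackson NullNode.getInstance().
val union = Schema.createUnion(
  Arrays.asList(Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING)))
val field = new Schema.Field("exampleField", union, "example doc", JsonProperties.NULL_VALUE)

// Avro 1.9 renamed Schema.Field#defaultValue() to defaultVal().
val fieldDefault: AnyRef = field.defaultVal()
```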
[GitHub] [hudi] aditiwari01 opened a new issue #2675: [SUPPORT] Unable to query MOR table after schema evolution
aditiwari01 opened a new issue #2675: URL: https://github.com/apache/hudi/issues/2675 As per the HUDI confluence (https://cwiki.apache.org/confluence/display/HUDI/FAQ#FAQ-What'sHudi'sschemaevolutionstory), as long as the schema is backward compatible, hudi will support seamless reads/writes. However, when I try to add a new column to my MOR table, I can successfully keep writing, but I can only read in the read_optimised manner and not in the snapshot manner. The snapshot query fails with **org.apache.avro.AvroTypeException: missing required field newCol**. Attaching sample spark-shell commands to reproduce the issue on dummy data: [Hudi_sample_commands.txt](https://github.com/apache/hudi/files/6139970/Hudi_sample_commands.txt) With some debugging, the issue seems to be in: https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java#L127 When we try to deserialize the older payloads into the newer schema (with a nullable new column), it fails with the above error. I tried a workaround wherein, if (readerSchema != writerSchema), we read with the writerSchema and then convert the payload to the readerSchema. This approach is working fine for me in my POCs. However, since Hudi guarantees schema evolution, I would like to know whether I'm missing some config or whether this is a bug. And how does my workaround fit in, in case it is a bug? We have a use case where we do not want to be constrained to backward-compatible schema changes, and we see MOR as a viable fit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
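For reference, the standard Avro way to express that workaround is schema resolution at read time — a hedged sketch with made-up method and variable names, not the reporter's actual patch:

```scala
import java.io.ByteArrayInputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

// Deserialize bytes written with the old (writer) schema into the new
// (reader) schema; Avro fills in defaults for fields the writer lacked.
def readWithResolution(bytes: Array[Byte],
                       writerSchema: Schema,
                       readerSchema: Schema): GenericRecord = {
  val reader = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
  val decoder = DecoderFactory.get.binaryDecoder(new ByteArrayInputStream(bytes), null)
  reader.read(null, decoder)
}
```

Note this only fills in the new column if it carries a default value in the reader schema; the AvroTypeException in the report suggests the evolved column was added without one.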
[jira] [Updated] (HUDI-1692) Bounded source for stream writer
[ https://issues.apache.org/jira/browse/HUDI-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1692: - Labels: pull-request-available (was: ) > Bounded source for stream writer > > > Key: HUDI-1692 > URL: https://issues.apache.org/jira/browse/HUDI-1692 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > > Supports bounded source such as VALUES for stream mode writer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] danny0405 opened a new pull request #2674: [HUDI-1692] Bounded source for stream writer
danny0405 opened a new pull request #2674: URL: https://github.com/apache/hudi/pull/2674 Supports bounded source such as VALUES for stream mode writer. ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] sbernauer commented on a change in pull request #2650: Preparation for Avro update
sbernauer commented on a change in pull request #2650: URL: https://github.com/apache/hudi/pull/2650#discussion_r594155813 ## File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/simple/FlinkHoodieSimpleIndex.java ## @@ -135,8 +133,8 @@ public boolean isImplicitWithStorage() { context.map(latestBaseFiles, partitionPathBaseFile -> new HoodieKeyLocationFetchHandle<>(config, hoodieTable, partitionPathBaseFile), parallelism); Map recordLocations = new HashMap<>(); hoodieKeyLocationFetchHandles.stream() -.flatMap(handle -> Lists.newArrayList(handle.locations()).stream()) -.forEach(x -> x.forEach(y -> recordLocations.put(y.getKey(), y.getRight()))); +.flatMap(handle -> handle.locations()) Review comment: This was changed because of the removal of the used guava libs ;) ## File path: hudi-common/src/main/avro/HoodieRestoreMetadata.avsc ## @@ -38,7 +38,6 @@ /* overlaps with 'instantsToRollback' field. Adding this to track action type for all the instants being rolled back. */ { "name": "restoreInstantInfo", - "default": null, Review comment: The default value of null doesn't match a field with type array. Instead of removing the default value of `null`, I now changed it to an empty array `[]`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
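For context, Avro requires a field's default to match the field's declared type: an array-typed field cannot default to `null`, but it can default to an empty array. A simplified illustration of the corrected field — the items type shown here is invented; the real one in HoodieRestoreMetadata.avsc is more complex:

```json
{
  "name": "restoreInstantInfo",
  "type": {"type": "array", "items": "string"},
  "default": []
}
```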
[jira] [Updated] (HUDI-1693) Add document about HUDI Flink integration
[ https://issues.apache.org/jira/browse/HUDI-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-1693: - Summary: Add document about HUDI Flink integration (was: Bounded source for stream writer) > Add document about HUDI Flink integration > - > > Key: HUDI-1693 > URL: https://issues.apache.org/jira/browse/HUDI-1693 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Fix For: 0.8.0 > > > Supports bounded source such as VALUES for stream mode writer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1693) Bounded source for stream writer
Danny Chen created HUDI-1693: Summary: Bounded source for stream writer Key: HUDI-1693 URL: https://issues.apache.org/jira/browse/HUDI-1693 Project: Apache Hudi Issue Type: Sub-task Components: Flink Integration Reporter: Danny Chen Assignee: Danny Chen Fix For: 0.8.0 Supports bounded source such as VALUES for stream mode writer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1692) Bounded source for stream writer
Danny Chen created HUDI-1692: Summary: Bounded source for stream writer Key: HUDI-1692 URL: https://issues.apache.org/jira/browse/HUDI-1692 Project: Apache Hudi Issue Type: Sub-task Components: Flink Integration Reporter: Danny Chen Assignee: Danny Chen Fix For: 0.8.0 Supports bounded source such as VALUES for stream mode writer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #2673: [HUDI-1688] hudi write should uncache rdd when the write operation is finished
xiarixiaoyao edited a comment on pull request #2673: URL: https://github.com/apache/hudi/pull/2673#issuecomment-799059684 cc @garyli1019 @nsivabalan, could you help me review this PR? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Sugamber commented on issue #2637: [SUPPORT] - Partial Update : update few columns of a table
Sugamber commented on issue #2637: URL: https://github.com/apache/hudi/issues/2637#issuecomment-799211234 @nsivabalan Yes, I have added the jar file to both the driver and executor class paths. `spark-submit --jars /path/lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar,/path/hudi-support-jars/org.apache.avro_avro-1.8.2.jar,/path/hudi-support-jars/spark-avro_2.11-2.4.4.jar,/path/hudi-support-jars/hudi-spark-bundle_2.11-0.7.0.jar --master yarn --deploy-mode cluster --num-executors 2 --executor-cores 4 --executor-memory 8g --driver-memory=8g --queue=default --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.driver.extraClassPath=org.apache.avro_avro-1.8.2.jar:spark-avro_2.11-2.4.4.jar:hudi-spark-bundle_2.11-0.7.0.jar:/path/lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar --conf spark.executor.extraClassPath=org.apache.avro_avro-1.8.2.jar:spark-avro_2.11-2.4.4.jar:hudi-spark-bundle_2.11-0.7.0.jar:/path/lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar --files /path/hive-site.xml,/path/resources/hudiConf.conf --class com.app.workflows.RecordPartialUpdate lib/orders-poc-1.0.41-SNAPSHOT-shaded.jar` I'm able to find the class name in the jar using a Linux command: `find /path/orders-poc-1.0.41-SNAPSHOT-shaded.jar|xargs grep CustomRecordUpdate` which prints `Binary file /path/orders-poc-1.0.41-SNAPSHOT-shaded.jar matches`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-1684) Tweak hudi-flink-bundle module pom and re-organize the packages for hudi-flink module
[ https://issues.apache.org/jira/browse/HUDI-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1684. -- Resolution: Done fc6c5f4285098d18cd7f6e81785f59e68a3b6862 > Tweak hudi-flink-bundle module pom and re-organize the packages for > hudi-flink module > - > > Key: HUDI-1684 > URL: https://issues.apache.org/jira/browse/HUDI-1684 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > > - Add required dependencies for the hudi-flink-bundle module > - Some package reorganization of the hudi-flink module -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-1684) Tweak hudi-flink-bundle module pom and re-organize the packages for hudi-flink module
[ https://issues.apache.org/jira/browse/HUDI-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned HUDI-1684: -- Assignee: Danny Chen > Tweak hudi-flink-bundle module pom and re-organize the packages for > hudi-flink module > - > > Key: HUDI-1684 > URL: https://issues.apache.org/jira/browse/HUDI-1684 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > > - Add required dependencies for the hudi-flink-bundle module > - Some package reorganization of the hudi-flink module -- This message was sent by Atlassian Jira (v8.3.4#803005)
[hudi] branch master updated (e93c6a5 -> fc6c5f4)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from e93c6a5 [HUDI-1496] Fixing input stream detection of GCS FileSystem (#2500) add fc6c5f4 [HUDI-1684] Tweak hudi-flink-bundle module pom and reorganize the pacakges for hudi-flink module (#2669) No new revisions were added by this update. Summary of changes: .../{operator => configuration}/FlinkOptions.java | 6 +- .../hudi/schema/FilebasedSchemaProvider.java | 2 +- .../main/java/org/apache/hudi/sink/CommitSink.java | 1 - .../InstantGenerateOperator.java | 8 +- .../KeyedWriteProcessFunction.java | 6 +- .../KeyedWriteProcessOperator.java | 3 +- .../{operator => sink}/StreamWriteFunction.java| 5 +- .../{operator => sink}/StreamWriteOperator.java| 2 +- .../StreamWriteOperatorCoordinator.java| 5 +- .../StreamWriteOperatorFactory.java| 2 +- .../compact/CompactFunction.java | 2 +- .../compact/CompactionCommitEvent.java | 2 +- .../compact/CompactionCommitSink.java | 2 +- .../compact/CompactionPlanEvent.java | 2 +- .../compact/CompactionPlanOperator.java| 2 +- .../event/BatchWriteSuccessEvent.java | 6 +- .../partitioner/BucketAssignFunction.java | 4 +- .../partitioner/BucketAssigner.java| 2 +- .../partitioner/BucketAssigners.java | 4 +- .../partitioner/delta/DeltaBucketAssigner.java | 4 +- .../JsonStringToHoodieRecordMapFunction.java | 2 +- .../transform/RowDataToHoodieFunction.java | 4 +- .../StreamReadMonitoringFunction.java | 11 +- .../{operator => source}/StreamReadOperator.java | 6 +- .../apache/hudi/streamer/HoodieFlinkStreamer.java | 12 +- .../hudi/streamer/HoodieFlinkStreamerV2.java | 8 +- .../{factory => table}/HoodieTableFactory.java | 6 +- .../hudi/{sink => table}/HoodieTableSink.java | 20 +-- .../hudi/{source => table}/HoodieTableSource.java | 20 +-- .../{source => table}/format/FilePathUtils.java| 4 +- .../hudi/{source => table}/format/FormatUtils.java | 6 +- .../format/cow/AbstractColumnReader.java | 2 +- .../format/cow/CopyOnWriteInputFormat.java | 2 +- .../format/cow/Int64TimestampColumnReader.java | 2 +- .../format/cow/ParquetColumnarRowSplitReader.java | 6 +- .../format/cow/ParquetDecimalVector.java | 2 +- .../format/cow/ParquetSplitReaderUtil.java | 2 +- .../format/cow/RunLengthDecoder.java | 2 +- .../{source => table}/format/mor/InstantRange.java | 2 +- .../format/mor/MergeOnReadInputFormat.java | 14 +- .../format/mor/MergeOnReadInputSplit.java | 2 +- .../format/mor/MergeOnReadTableState.java | 2 +- .../apache/hudi/util/RowDataToAvroConverters.java | 9 +- .../java/org/apache/hudi/util/StreamerUtil.java| 2 +- .../org.apache.flink.table.factories.TableFactory | 2 +- .../hudi/{operator => sink}/StreamWriteITCase.java | 22 +-- .../TestStreamWriteOperatorCoordinator.java} | 8 +- .../{operator => sink}/TestWriteCopyOnWrite.java | 11 +- .../{operator => sink}/TestWriteMergeOnRead.java | 4 +- .../TestWriteMergeOnReadWithCompact.java | 3 +- .../partitioner/TestBucketAssigner.java| 4 +- .../TestJsonStringToHoodieRecordMapFunction.java | 2 +- .../utils/CompactFunctionWrapper.java | 14 +- .../utils/MockFunctionInitializationContext.java | 2 +- .../{operator => sink}/utils/MockMapState.java | 2 +- .../utils/MockOperatorStateStore.java | 2 +- .../utils/MockStreamingRuntimeContext.java | 2 +- .../utils/StreamWriteFunctionWrapper.java | 15 +- .../source/TestStreamReadMonitoringFunction.java | 9 +- .../apache/hudi/source/TestStreamReadOperator.java | 16 +- .../{source => table}/HoodieDataSourceITCase.java | 10 +- .../{factory 
=> table}/TestHoodieTableFactory.java | 8 +- .../{source => table}/TestHoodieTableSource.java | 10 +- .../{source => table}/format/TestInputFormat.java | 10 +- .../{operator => }/utils/TestConfigurations.java | 4 +- .../apache/hudi/{operator => }/utils/TestData.java | 5 +- .../test/java/org/apache/hudi/utils/TestUtils.java | 6 +- .../utils/factory/CollectSinkTableFactory.java | 2 +- .../utils/factory/ContinuousFileSourceFactory.java | 2 +- .../hudi/utils/source/ContinuousFileSource.java| 2 +- .../org.apache.flink.table.factories.TableFactory | 2 +- packaging/hudi-flink-bundle/pom.xml| 163 - 72 files changed, 357 insertions(+), 203 deletions(-) rename hudi-flink/src/main/java/org/apache/hudi/{operator => configuration}/Fli
[GitHub] [hudi] yanghua merged pull request #2669: [HUDI-1684] Tweak hudi-flink-bundle module pom and reorganize the pa…
yanghua merged pull request #2669: URL: https://github.com/apache/hudi/pull/2669 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] shenbinglife closed issue #2652: [SUPPORT] I have some questions for hudi clustering
shenbinglife closed issue #2652: URL: https://github.com/apache/hudi/issues/2652 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] shenbinglife commented on issue #2652: [SUPPORT] I have some questions for hudi clustering
shenbinglife commented on issue #2652: URL: https://github.com/apache/hudi/issues/2652#issuecomment-799201945 Thanks a lot This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-799193666 @n3nash @satishkotha @yanghua This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on pull request #2666: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #2666: URL: https://github.com/apache/hudi/pull/2666#issuecomment-799193044 The comments on PR https://github.com/apache/hudi/pull/1929 have been resolved here; please review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers
n3nash commented on pull request #2374: URL: https://github.com/apache/hudi/pull/2374#issuecomment-799186819 @vinothchandar Build succeeds locally and should pass on jenkins (will check tomorrow morning), ready for review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2673: [HUDI-1688] hudi write should uncache rdd when the write operation is finished
codecov-io edited a comment on pull request #2673: URL: https://github.com/apache/hudi/pull/2673#issuecomment-799061373 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=h1) Report > Merging [#2673](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=desc) (2ecca39) into [master](https://codecov.io/gh/apache/hudi/commit/e93c6a569310ce55c5a0fc0655328e7fd32a9da2?el=desc) (e93c6a5) will **increase** coverage by `0.01%`. > The diff coverage is `85.71%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2673/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2673 +/- ## + Coverage 51.99% 52.00% +0.01% Complexity 3580 3580 Files 466 466 Lines 2227522282 +7 Branches 2374 2375 +1 + Hits 1158111587 +6 - Misses 9686 9687 +1 Partials 1008 1008 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.49% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiflink | `53.57% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `69.91% <85.71%> (+0.07%)` | `0.00 <0.00> (ø)` | | | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `64.36% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh) | `52.15% <85.71%> (+0.79%)` | `0.00 <0.00> (ø)` | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2673: [HUDI-1688] hudi write should uncache rdd when the write operation is finished
codecov-io edited a comment on pull request #2673: URL: https://github.com/apache/hudi/pull/2673#issuecomment-799061373 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=h1) Report > Merging [#2673](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=desc) (2ecca39) into [master](https://codecov.io/gh/apache/hudi/commit/e93c6a569310ce55c5a0fc0655328e7fd32a9da2?el=desc) (e93c6a5) will **decrease** coverage by `0.13%`. > The diff coverage is `85.71%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2673/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master#2673 +/- ## - Coverage 51.99% 51.85% -0.14% + Complexity 3580 3390 -190 Files 466 445 -21 Lines 2227520771-1504 Branches 2374 2230 -144 - Hits 1158110771 -810 + Misses 9686 9067 -619 + Partials 1008 933 -75 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `37.01% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.49% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiflink | `53.57% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.44% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `69.91% <85.71%> (+0.07%)` | `0.00 <0.00> (ø)` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2673?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh) | `52.15% <85.71%> (+0.79%)` | `0.00 <0.00> (ø)` | | | [...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=) | | | | | [.../apache/hudi/timeline/service/TimelineService.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvVGltZWxpbmVTZXJ2aWNlLmphdmE=) | | | | | [...va/org/apache/hudi/hive/util/ColumnNameXLator.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db2x1bW5OYW1lWExhdG9yLmphdmE=) | | | | | [...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh) | | | | | [.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=) | | | | | 
[...udi/timeline/service/handlers/TimelineHandler.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvVGltZWxpbmVIYW5kbGVyLmphdmE=) | | | | | [...main/java/org/apache/hudi/hive/HiveSyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNDb25maWcuamF2YQ==) | | | | | [...c/main/java/org/apache/hudi/dla/DLASyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0RMQVN5bmNDb25maWcuamF2YQ==) | | | | | [...in/java/org/apache/hudi/hive/SchemaDifference.java](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2NoZW1hRGlmZmVyZW5jZS5qYXZh) | | | | | ... and [12 more](https://codecov.io/gh/apache/hudi/pull/2673/diff?src=pr&el=tree-more) | |
[GitHub] [hudi] liujinhui1994 commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table
liujinhui1994 commented on pull request #1929: URL: https://github.com/apache/hudi/pull/1929#issuecomment-799175366 @nsivabalan This PR has been reworked based on the comments; see https://github.com/apache/hudi/pull/2666. Please review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2669: [HUDI-1684] Tweak hudi-flink-bundle module pom and reorganize the pa…
codecov-io edited a comment on pull request #2669: URL: https://github.com/apache/hudi/pull/2669#issuecomment-797515929 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2669?src=pr&el=h1) Report > Merging [#2669](https://codecov.io/gh/apache/hudi/pull/2669?src=pr&el=desc) (50d4722) into [master](https://codecov.io/gh/apache/hudi/commit/20786ab8a2a1e7735ab846e92802fb9f4449adc9?el=desc) (20786ab) will **decrease** coverage by `42.48%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2669/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2669?src=pr&el=tree) ```diff @@ Coverage Diff @@ ## master #2669 +/- ## - Coverage 52.00% 9.52% -42.49% + Complexity 3579 48 -3531 Files 465 53 -412 Lines 222681963-20305 Branches 2375 235 -2140 - Hits 11581 187-11394 + Misses 96761763 -7913 + Partials 1011 13 -998 ``` | Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `9.52% <ø> (-60.02%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2669?src=pr&el=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2669/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm
[jira] [Created] (HUDI-1691) Enrich HDFS data importer
vinoyang created HUDI-1691: -- Summary: Enrich HDFS data importer Key: HUDI-1691 URL: https://issues.apache.org/jira/browse/HUDI-1691 Project: Apache Hudi Issue Type: Improvement Components: Utilities Reporter: vinoyang Currently, hudi has a utility class named {{HDFSParquetImporter}}, which is used to import a parquet dataset from HDFS as a hudi dataset. This class has a {{format}} config option; however, it's useless. We'd better enhance this importer or introduce other importers to support multiple HDFS input formats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
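As a purely hypothetical sketch of what the ticket proposes — none of these names exist in Hudi today — making the {{format}} option actually drive the read path might look like:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical only: dispatch on the format value instead of always
// reading parquet, so the importer can ingest other HDFS formats too.
object FormatReaders {
  def read(spark: SparkSession, format: String, srcPath: String): DataFrame =
    format.toLowerCase match {
      case "parquet" => spark.read.parquet(srcPath)
      case "orc"     => spark.read.orc(srcPath)
      // assumes the spark-avro package is on the classpath
      case "avro"    => spark.read.format("avro").load(srcPath)
      case other     => throw new IllegalArgumentException(s"Unsupported format: $other")
    }
}
```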