[GitHub] [hudi] codecov-io commented on pull request #2520: [HUDI-1446] Support skip bootstrapIndex's init in abstract fs view init
codecov-io commented on pull request #2520: URL: https://github.com/apache/hudi/pull/2520#issuecomment-787713017

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2520?src=pr=h1) Report
> Merging [#2520](https://codecov.io/gh/apache/hudi/pull/2520?src=pr=desc) (ef091c8) into [master](https://codecov.io/gh/apache/hudi/commit/0d8a4d0a56dcb35e499216c7bfab17a05716bc44?el=desc) (0d8a4d0) will **decrease** coverage by `40.90%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2520/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2520?src=pr=tree)

```diff
@@             Coverage Diff              @@
##           master   #2520       +/-   ##
==========================================
- Coverage   50.52%   9.61%   -40.91%
+ Complexity   3122      48     -3074
==========================================
  Files         430      53      -377
  Lines       19597    1944    -17653
  Branches     2008     235     -1773
==========================================
- Hits         9902     187     -9715
+ Misses       8886    1744     -7142
+ Partials      809      13      -796
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `9.61% <ø> (-59.82%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2520?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | | | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | | | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | | | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | | | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | | | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | | | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%>
[GitHub] [hudi] n3nash commented on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers
n3nash commented on pull request #2374: URL: https://github.com/apache/hudi/pull/2374#issuecomment-787681396 @vinothchandar Code is ready for review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] n3nash commented on a change in pull request #2611: [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat
n3nash commented on a change in pull request #2611: URL: https://github.com/apache/hudi/pull/2611#discussion_r584465066

## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java

```diff
@@ -62,6 +67,7 @@
   public static final String HOODIE_STOP_AT_COMPACTION_PATTERN = "hoodie.%s.ro.stop.at.compaction";
   public static final String INCREMENTAL_SCAN_MODE = "INCREMENTAL";
   public static final String SNAPSHOT_SCAN_MODE = "SNAPSHOT";
+  public static final String VALIDATE_SCAN_MODE = "VALIDATE"; // used for pre-commit validation
```

Review comment: @satishkotha On thinking about this a little deeper, I feel one should be able to "validate" in both modes, `SNAPSHOT` & `INCREMENTAL`. Essentially, what you want to do is a `SNAPSHOT @ commitTime` or an `Incremental from or @`, which is what time travel allows, while ensuring that we read only committed data. To keep the concepts this way, you may want to just have a flag, say `hoodie.%s.consume.uncommitted`, whose default value is false. When false, you always fall back to the `HoodieTableFileSystem` with the current behavior; when set to true, you do what you are currently doing in "VALIDATE" scan mode, for snapshot mode to start with (it's hard for me to reason about what an incremental validate would look like). What do you think?
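The flag-based fallback proposed above is straightforward to express in code. The sketch below is a hypothetical illustration only, not Hudi's actual implementation: a per-table boolean property that defaults to false, so readers keep the current committed-only behavior unless they explicitly opt in. The class and method names are invented; only the `hoodie.%s.consume.uncommitted` pattern comes from the comment.

```java
import java.util.Properties;

// Hypothetical sketch of the proposed flag; not Hudi's actual code.
public class UncommittedReadFlag {

    // Per-table property, following existing patterns like "hoodie.%s.ro.stop.at.compaction".
    public static final String CONSUME_UNCOMMITTED_PATTERN = "hoodie.%s.consume.uncommitted";

    // Defaults to false, so readers fall back to committed-only reads
    // unless the user explicitly opts in for this table.
    public static boolean shouldConsumeUncommitted(Properties props, String tableName) {
        String key = String.format(CONSUME_UNCOMMITTED_PATTERN, tableName);
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }
}
```

Keeping the default at false matches the comment's intent: existing queries see no behavior change, and only a pre-commit validation job would set the flag.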
[GitHub] [hudi] codecov-io edited a comment on pull request #2580: [HUDI 1623] Introduce start & end commit times to timeline
codecov-io edited a comment on pull request #2580: URL: https://github.com/apache/hudi/pull/2580#issuecomment-780218110

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=h1) Report
> Merging [#2580](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=desc) (326d233) into [master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc) (be257b5) will **increase** coverage by `0.28%`.
> The diff coverage is `73.80%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2580/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #2580      +/-   ##
==========================================
+ Coverage   51.27%   51.56%   +0.28%
- Complexity   3241     3295      +54
==========================================
  Files         438      446       +8
  Lines       20126    20368     +242
  Branches     2079     2106      +27
==========================================
+ Hits        10320    10502     +182
- Misses       8954     8997      +43
- Partials      852      869      +17
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `36.87% <50.00%> (ø)` | `0.00 <0.00> (ø)` | |
| hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudicommon | `51.31% <73.07%> (-0.05%)` | `0.00 <16.00> (ø)` | |
| hudiflink | `51.39% <ø> (+4.53%)` | `0.00 <ø> (ø)` | |
| hudihadoopmr | `33.16% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
| hudisparkdatasource | `69.71% <100.00%> (ø)` | `0.00 <2.00> (ø)` | |
| hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | |
| huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
| hudiutilities | `69.59% <ø> (+0.15%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh) | `81.57% <0.00%> (-3.36%)` | `59.00 <0.00> (ø)` | | | [...che/hudi/common/table/timeline/HoodieTimeline.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZVRpbWVsaW5lLmphdmE=) | `91.30% <ø> (ø)` | `44.00 <0.00> (ø)` | | | [...able/timeline/versioning/InstantTimeFormatter.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvSW5zdGFudFRpbWVGb3JtYXR0ZXIuamF2YQ==) | `38.46% <38.46%> (ø)` | `3.00 <3.00> (?)` | | | [...a/org/apache/hudi/cli/commands/CommitsCommand.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0NvbW1pdHNDb21tYW5kLmphdmE=) | `53.50% <50.00%> (ø)` | `15.00 <0.00> (ø)` | | | [...ache/hudi/common/table/timeline/HoodieInstant.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUluc3RhbnQuamF2YQ==) | `86.53% <82.00%> (-7.78%)` | `51.00 <12.00> (+6.00)` | :arrow_down: | | [...che/hudi/common/table/timeline/TimelineLayout.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTGF5b3V0LmphdmE=) | `95.00% <83.33%> (-5.00%)` | `3.00 <0.00> (ø)` | | | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `70.76% <100.00%> (ø)` | `44.00 <0.00> (ø)` | | | [...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=) | `70.45% <100.00%> (-0.33%)` | `42.00 <1.00> (-1.00)` | | | [...ble/timeline/versioning/TimelineLayoutVersion.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvVGltZWxpbmVMYXlvdXRWZXJzaW9uLmphdmE=) | `66.66% <100.00%> (+1.66%)` | `7.00 <0.00> (ø)` | | |
[GitHub] [hudi] codecov-io edited a comment on pull request #2580: [HUDI 1623] Introduce start & end commit times to timeline
codecov-io edited a comment on pull request #2580: URL: https://github.com/apache/hudi/pull/2580#issuecomment-780218110
[jira] [Assigned] (HUDI-1647) Supports snapshot read for Flink
[ https://issues.apache.org/jira/browse/HUDI-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen reassigned HUDI-1647: Assignee: Danny Chen

> Supports snapshot read for Flink
>
> Key: HUDI-1647
> URL: https://issues.apache.org/jira/browse/HUDI-1647
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Flink Integration
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
>
> Support snapshot read for Flink for both MOR and COW tables.
> - COW: the parquet files for the latest file group slices
> - MOR: the parquet base file + log files for the latest file group slices
> Also implements the SQL connectors for both sink and source.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-1647) Supports snapshot read for Flink
Danny Chen created HUDI-1647:

Summary: Supports snapshot read for Flink
Key: HUDI-1647
URL: https://issues.apache.org/jira/browse/HUDI-1647
Project: Apache Hudi
Issue Type: Sub-task
Components: Flink Integration
Reporter: Danny Chen

Support snapshot read for Flink for both MOR and COW tables.
- COW: the parquet files for the latest file group slices
- MOR: the parquet base file + log files for the latest file group slices
Also implements the SQL connectors for both sink and source.
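The COW/MOR distinction above determines which physical files a snapshot read has to scan for each file slice. A minimal illustrative sketch under that description (the class, method, and file names are invented, not Hudi's API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: which files a snapshot read scans for the latest file slice,
// per the HUDI-1647 description (COW = base parquet; MOR = base parquet + logs).
public class SnapshotReadPlan {

    public static List<String> filesToScan(boolean isMergeOnRead,
                                           String baseParquetFile,
                                           List<String> logFiles) {
        List<String> files = new ArrayList<>();
        files.add(baseParquetFile);     // COW: just the latest base parquet file
        if (isMergeOnRead) {
            files.addAll(logFiles);     // MOR: base file merged with its log files
        }
        return files;
    }
}
```

The point of the sketch is only that an MOR snapshot read is a superset of the COW case: the reader must merge log-file records into the base file on the fly.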
[jira] [Closed] (HUDI-1638) Some improvements to BucketAssignFunction
[ https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-1638. Fixed via master branch: 7a11de12764d8f68f296c6e68a22822318bfbefa

> Some improvements to BucketAssignFunction
>
> Key: HUDI-1638
> URL: https://issues.apache.org/jira/browse/HUDI-1638
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Flink Integration
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.8.0
>
> - The {{initializeState}} executes before {{open}}, thus the {{checkPartitionsLoaded}} may see null {{initialPartitionsToLoad}}
> - Only load the existing partitions
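The first bullet describes a Flink operator lifecycle pitfall: `initializeState()` runs before `open()`, so state restoration cannot rely on fields that `open()` assigns. Below is a toy, standalone model of the null-guard fix; it is plain Java, not the actual `BucketAssignFunction`, and all names besides `initializeState`/`open`/`checkPartitionsLoaded`/`initialPartitionsToLoad` are invented.

```java
import java.util.List;

// Toy model of the HUDI-1638 lifecycle bug: initializeState() runs before open(),
// so initialPartitionsToLoad is still null when state is first restored.
public class BucketAssignSketch {

    private List<String> initialPartitionsToLoad; // only assigned in open()
    private boolean partitionsChecked = false;

    public void initializeState() {
        // Null-guard: defer the check until open() has provided the partitions.
        if (initialPartitionsToLoad != null) {
            checkPartitionsLoaded();
        }
    }

    public void open(List<String> partitions) {
        this.initialPartitionsToLoad = partitions;
        checkPartitionsLoaded();
    }

    private void checkPartitionsLoaded() {
        partitionsChecked = true;
    }

    public boolean isChecked() {
        return partitionsChecked;
    }
}
```

Without the null check, calling `initializeState()` first (as Flink does) would dereference the unassigned field and fail.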
[jira] [Comment Edited] (HUDI-1638) Some improvements to BucketAssignFunction
[ https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292631#comment-17292631 ] Danny Chen edited comment on HUDI-1638 at 3/1/21, 5:51 AM: Fixed via master branch: 06dc7c7fd8a867a1e1da90f7dc19b0cc2da69bba was (Author: danny0405): Fixed via master branch: 7a11de12764d8f68f296c6e68a22822318bfbefa
[jira] [Updated] (HUDI-1638) Some improvements to BucketAssignFunction
[ https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-1638: Status: In Progress (was: Open)
[jira] [Resolved] (HUDI-1638) Some improvements to BucketAssignFunction
[ https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-1638. Resolution: Fixed
[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
liujinhui1994 commented on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-787662595 The current implementation is mainly in KafkaOffsetGen @wangxianghu
[GitHub] [hudi] liujinhui1994 closed pull request #2337: [HUDI-982] Flink support mor table
liujinhui1994 closed pull request #2337: URL: https://github.com/apache/hudi/pull/2337
[jira] [Assigned] (HUDI-1638) Some improvements to BucketAssignFunction
[ https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen reassigned HUDI-1638: Fix Version/s: 0.8.0 Assignee: Danny Chen
[GitHub] [hudi] nsivabalan commented on pull request #2612: [HUDI-1563] Adding hudi file sizing/ small file management blog
nsivabalan commented on pull request #2612: URL: https://github.com/apache/hudi/pull/2612#issuecomment-787658034 ![Screen Shot 2021-03-01 at 12 32 47 AM](https://user-images.githubusercontent.com/513218/109456334-9a3d8100-7a26-11eb-881e-5d1e2523185f.png) ![Screen Shot 2021-03-01 at 12 33 06 AM](https://user-images.githubusercontent.com/513218/109456342-a0336200-7a26-11eb-96a9-0210bba6bcfe.png) ![Screen Shot 2021-03-01 at 12 33 36 AM](https://user-images.githubusercontent.com/513218/109456348-a45f7f80-7a26-11eb-80ac-f4c947f33725.png) ![Screen Shot 2021-03-01 at 12 34 00 AM](https://user-images.githubusercontent.com/513218/109456358-a88b9d00-7a26-11eb-9395-5abc11b43886.png) ![Screen Shot 2021-03-01 at 12 34 10 AM](https://user-images.githubusercontent.com/513218/109456364-ad505100-7a26-11eb-9959-0ce62203f456.png)
[GitHub] [hudi] wangxianghu commented on pull request #2337: [HUDI-982] Flink support mor table
wangxianghu commented on pull request #2337: URL: https://github.com/apache/hudi/pull/2337#issuecomment-787657819 @liujinhui1994 It seems this pr is fixed by https://github.com/apache/hudi/commit/7a11de12764d8f68f296c6e68a22822318bfbefa ?
[jira] [Updated] (HUDI-1563) Documentation on small file handling
[ https://issues.apache.org/jira/browse/HUDI-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1563: Labels: pull-request-available user-support-issues (was: user-support-issues)

> Documentation on small file handling
>
> Key: HUDI-1563
> URL: https://issues.apache.org/jira/browse/HUDI-1563
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Docs
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available, user-support-issues
>
> Questions from slack:
> how does Hudi handle small files. What all config knobs one has to play around w/.
[GitHub] [hudi] nsivabalan opened a new pull request #2612: [HUDI-1563] Adding hudi file sizing/ small file management blog
nsivabalan opened a new pull request #2612: URL: https://github.com/apache/hudi/pull/2612

## What is the purpose of the pull request

*Adding hudi file sizing blog*

## Brief change log

- *Adding hudi file sizing blog*

## Verify this pull request

Built the site locally and verified.

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Assigned] (HUDI-1563) Documentation on small file handling
[ https://issues.apache.org/jira/browse/HUDI-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-1563: Assignee: sivabalan narayanan (was: Nishith Agarwal)
[GitHub] [hudi] codecov-io edited a comment on pull request #2580: [HUDI 1623] Introduce start & end commit times to timeline
codecov-io edited a comment on pull request #2580: URL: https://github.com/apache/hudi/pull/2580#issuecomment-780218110

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=h1) Report
> Merging [#2580](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=desc) (90d46f8) into [master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc) (be257b5) will **increase** coverage by `10.19%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2580/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree)

```diff
@@             Coverage Diff              @@
##           master    #2580       +/-   ##
===========================================
+ Coverage   51.27%   61.47%   +10.19%
+ Complexity   3241      324     -2917
===========================================
  Files         438       53      -385
  Lines       20126     1944    -18182
  Branches     2079      235     -1844
===========================================
- Hits        10320     1195     -9125
+ Misses       8954      625     -8329
+ Partials      852      124      -728
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `61.47% <ø> (-7.98%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | `0.00% <0.00%> (-28.00%)` | | | [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | `0.00% <0.00%> (-2.00%)` | | | [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-2.00%)` | | | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `41.86% <0.00%> (-22.68%)` | `27.00% <0.00%> (-6.00%)` | | | [...he/hudi/common/table/log/block/HoodieLogBlock.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVMb2dCbG9jay5qYXZh) | | | | | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=) | | | | | [.../hudi/common/table/view/FileSystemViewManager.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdNYW5hZ2VyLmphdmE=) | | | | | [...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==) | | | | | [.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh) | | | | | ... and [369 more](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree-more) | |
[GitHub] [hudi] codecov-io commented on pull request #2611: [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat
codecov-io commented on pull request #2611: URL: https://github.com/apache/hudi/pull/2611#issuecomment-787641189

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2611?src=pr=h1) Report
> Merging [#2611](https://codecov.io/gh/apache/hudi/pull/2611?src=pr=desc) (dc7874d) into [master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc) (be257b5) will **increase** coverage by `18.27%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2611/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2611?src=pr=tree)

```diff
@@             Coverage Diff              @@
##           master    #2611       +/-   ##
===========================================
+ Coverage   51.27%   69.54%   +18.27%
+ Complexity   3241      363     -2878
===========================================
  Files         438       53      -385
  Lines       20126     1944    -18182
  Branches     2079      235     -1844
===========================================
- Hits        10320     1352     -8968
+ Misses       8954      458     -8496
+ Partials      852      134      -718
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `69.54% <ø> (+0.10%)` | `0.00 <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2611?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...he/hudi/common/table/log/block/HoodieLogBlock.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVMb2dCbG9jay5qYXZh) | | | | | [...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=) | | | | | [...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==) | | | | | [.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh) | | | | | [...org/apache/hudi/common/bloom/BloomFilterUtils.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVXRpbHMuamF2YQ==) | | | | | [...org/apache/hudi/common/config/TypedProperties.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9UeXBlZFByb3BlcnRpZXMuamF2YQ==) | | | | | [...ava/org/apache/hudi/cli/commands/TableCommand.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1RhYmxlQ29tbWFuZC5qYXZh) | | | | | 
[...rg/apache/hudi/schema/FilebasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zY2hlbWEvRmlsZWJhc2VkU2NoZW1hUHJvdmlkZXIuamF2YQ==) | | | | | [...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0luY3JlbWVudGFsUmVsYXRpb24uc2NhbGE=) | | | | | [...in/java/org/apache/hudi/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zY2hlbWEvU2NoZW1hUHJvdmlkZXIuamF2YQ==) | | | | | ... and [371 more](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working
[ https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292615#comment-17292615 ] Volodymyr Burenin commented on HUDI-1063: - It would be nice to see the Hadoop configuration and GCS connector versions used. I currently suspect that the 'java.lang.NoSuchMethodError' has something to do with mixed-up dependencies; there are likely incompatible dependencies either in the Spark image or somewhere else. One of the latest GCS connectors is incompatible with Hudi due to its dependencies. > Save in Google Cloud Storage not working > > > Key: HUDI-1063 > URL: https://issues.apache.org/jira/browse/HUDI-1063 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: David Lacalle Castillo >Priority: Critical > Labels: sev:critical, user-support-issues > Fix For: 0.8.0 > > > I added to spark submit the following properties: > {{--packages > org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 > \ --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}} > Spark version 2.4.5 and Hadoop version 3.2.1 > > I am trying to save a Dataframe as follows in Google Cloud Storage as follows: > tableName = "forecasts" > basePath = "gs://hudi-datalake/" + tableName > hudi_options = { > 'hoodie.table.name': tableName, > 'hoodie.datasource.write.recordkey.field': 'uuid', > 'hoodie.datasource.write.partitionpath.field': 'partitionpath', > 'hoodie.datasource.write.table.name': tableName, > 'hoodie.datasource.write.operation': 'insert', > 'hoodie.datasource.write.precombine.field': 'ts', > 'hoodie.upsert.shuffle.parallelism': 2, > 'hoodie.insert.shuffle.parallelism': 2 > } > results = results.selectExpr( > "ds as date", > "store", > "item", > "y as sales", > "yhat as sales_predicted", > "yhat_upper as sales_predicted_upper", > "yhat_lower as sales_predicted_lower", > "training_date") > results.write.format("hudi"). \ > options(**hudi_options). 
\ > mode("overwrite"). \ > save(basePath) > I am getting the following error: > Py4JJavaError: An error occurred while calling o312.save. : > java.lang.NoSuchMethodError: > org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at > io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50) > at io.javalin.Javalin.<init>(Javalin.java:94) at > io.javalin.Javalin.create(Javalin.java:107) at > org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102) > at > org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74) > at > org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102) > at > org.apache.hudi.client.AbstractHoodieClient.<init>(AbstractHoodieClient.java:69) > at > org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:83) > at > org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:137) > at > org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:124) > at > org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:120) > at > org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) > at > org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at >
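For readers trying to reproduce the report above, the quoted snippet can be reassembled into a self-contained sketch. The table name, options, and GCS path are taken verbatim from the issue text; the `results` DataFrame and a Spark session with the hudi-spark and spark-avro bundles on the classpath are assumed to exist.

```python
# Reassembled from the quoted snippet in HUDI-1063; the `results`
# DataFrame and the Spark session are assumptions, not defined here.
table_name = "forecasts"
base_path = "gs://hudi-datalake/" + table_name

hudi_options = {
    "hoodie.table.name": table_name,
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.partitionpath.field": "partitionpath",
    "hoodie.datasource.write.table.name": table_name,
    "hoodie.datasource.write.operation": "insert",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.upsert.shuffle.parallelism": 2,
    "hoodie.insert.shuffle.parallelism": 2,
}

def write_forecasts(results):
    """Write the prepared DataFrame to Hudi on GCS. Only call this with a
    live Spark session; the NoSuchMethodError in the report is raised from
    inside this save() when jetty versions conflict on the classpath."""
    (results.write.format("hudi")
        .options(**hudi_options)
        .mode("overwrite")
        .save(base_path))
```

Note that the error surfaces when the embedded timeline server starts, which is why a jetty version pulled in by another dependency (e.g. the GCS connector) can break the save even though the write options themselves are fine.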
[jira] [Updated] (HUDI-1646) Allow support for pre-commit validation
[ https://issues.apache.org/jira/browse/HUDI-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1646: - Labels: pull-request-available (was: ) > Allow support for pre-commit validation > --- > > Key: HUDI-1646 > URL: https://issues.apache.org/jira/browse/HUDI-1646 > Project: Apache Hudi > Issue Type: New Feature >Reporter: satish >Assignee: satish >Priority: Critical > Labels: pull-request-available > Fix For: 0.8.0 > > > We have use cases where we want to support running hive/presto queries to > validate data on uncommitted data. If validation passes, we will promote the > commit. Otherwise, rollback the commit. > This is not possible today because ParquetInputFormat supports only reading > committed data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] satishkotha opened a new pull request #2611: [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat
satishkotha opened a new pull request #2611: URL: https://github.com/apache/hudi/pull/2611 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pull request *Add support for pre-commit validation hooks through hive/presto queries. We provide a mechanism to read uncommitted data through InputFormat. If validation passes, we will promote the commit; otherwise, we roll back the commit.* ## Brief change log * Add a new consume mode: validate. In this consume mode, we allow reading dirty data. Users can run hive/presto queries, validate the data, and then commit/abort. Note that users also need to explicitly specify the dirty commit time to use this API. (We can consider making this optional.) * This first version only adds support for COW tables and the parquet format. If the general approach looks good, I can extend support to other file formats/table types as a follow-up. ## Verify this pull request This change added tests. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
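The validate-then-commit/rollback flow the PR describes can be sketched as plain control flow. This is an illustrative sketch only: `run_validation_query`, `commit`, and `rollback` are hypothetical callbacks standing in for "run hive/presto queries over the dirty commit" and the Hudi commit/rollback actions, not real Hudi APIs.

```python
def validate_then_commit(instant_time, run_validation_query, commit, rollback):
    """Pre-commit validation loop sketched from the PR description:
    read uncommitted data for `instant_time`, validate it, and either
    promote or roll back the commit. All three callbacks are injected
    stand-ins, not Hudi APIs."""
    if run_validation_query(instant_time):
        commit(instant_time)       # validation passed: promote the commit
        return "committed"
    rollback(instant_time)         # validation failed: undo the dirty write
    return "rolled back"
```

The point of the new consume mode is that `run_validation_query` can be an ordinary hive/presto query, because the InputFormat is allowed to see the not-yet-committed files for the explicitly named dirty commit time.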
[jira] [Created] (HUDI-1646) Allow support for pre-commit validation
satish created HUDI-1646: Summary: Allow support for pre-commit validation Key: HUDI-1646 URL: https://issues.apache.org/jira/browse/HUDI-1646 Project: Apache Hudi Issue Type: New Feature Reporter: satish Assignee: satish Fix For: 0.8.0 We have use cases where we want to support running hive/presto queries to validate data on uncommitted data. If validation passes, we will promote the commit. Otherwise, rollback the commit. This is not possible today because ParquetInputFormat supports only reading committed data.
[GitHub] [hudi] satishkotha commented on pull request #2610: [HUDI-1644] Do not delete older rollback instants as part of rollback…
satishkotha commented on pull request #2610: URL: https://github.com/apache/hudi/pull/2610#issuecomment-787631465 > @satishkotha High level looks good to me, can we confirm that we have an equivalent test case on the archiving of rollback instants that simulates the same behavior of not leaving behind any rollback instants? @n3nash Looks like we don't have unit tests for either clean or rollback instants. I created https://issues.apache.org/jira/browse/HUDI-1645 as a follow-up.
[jira] [Created] (HUDI-1645) Add unit test to verify clean and rollback instants are archived correctly
satish created HUDI-1645: Summary: Add unit test to verify clean and rollback instants are archived correctly Key: HUDI-1645 URL: https://issues.apache.org/jira/browse/HUDI-1645 Project: Apache Hudi Issue Type: Bug Reporter: satish https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java The tests don't seem to cover clean/rollback instants. Add those instants and make sure they are archived correctly.
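The property such a test would assert is simple to state: after archiving, clean/rollback instants older than the retained window are off the active timeline, and nothing newer is. A schematic version of that split follows, with a hypothetical keep-latest-N retention rule; the real archival rules in `HoodieTimelineArchiveLog` are more involved.

```python
def split_archived(instants, keep_latest):
    """Partition a chronologically sorted list of instants into
    (archived, active), keeping only the newest `keep_latest` active.
    This keep-latest-N rule is an illustrative assumption, not Hudi's
    actual archival policy."""
    if keep_latest <= 0:
        return list(instants), []
    return list(instants[:-keep_latest]), list(instants[-keep_latest:])
```

A test in the spirit of HUDI-1645 would build a timeline containing clean and rollback instants, run archival, and then assert that the archived/active split matches the configured retention.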
[jira] [Closed] (HUDI-1632) Supports merge on read write mode for Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1632. -- Resolution: Implemented Implemented via master branch: 7a11de12764d8f68f296c6e68a22822318bfbefa > Supports merge on read write mode for Flink writer > -- > > Key: HUDI-1632 > URL: https://issues.apache.org/jira/browse/HUDI-1632 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > >
[jira] [Assigned] (HUDI-1632) Supports merge on read write mode for Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned HUDI-1632: -- Assignee: Danny Chen > Supports merge on read write mode for Flink writer > -- > > Key: HUDI-1632 > URL: https://issues.apache.org/jira/browse/HUDI-1632 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (HUDI-1632) Supports merge on read write mode for Flink writer
[ https://issues.apache.org/jira/browse/HUDI-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1632: --- Fix Version/s: 0.8.0 > Supports merge on read write mode for Flink writer > -- > > Key: HUDI-1632 > URL: https://issues.apache.org/jira/browse/HUDI-1632 > Project: Apache Hudi > Issue Type: Sub-task > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > >
[GitHub] [hudi] yanghua merged pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer
yanghua merged pull request #2593: URL: https://github.com/apache/hudi/pull/2593
[hudi] branch master updated (be257b5 -> 7a11de1)
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from be257b5 [Hudi-1583]: Fix bug that Hudi will skip remaining log files if there is logFile with zero size in logFileList when merge on read. (#2584) add 7a11de1 [HUDI-1632] Supports merge on read write mode for Flink writer (#2593) No new revisions were added by this update. Summary of changes: .../org/apache/hudi/io/HoodieAppendHandle.java | 48 --- .../java/org/apache/hudi/io/HoodieMergeHandle.java | 15 ++- .../apache/hudi/client/HoodieFlinkWriteClient.java | 121 ++--- .../hudi/index/state/FlinkInMemoryStateIndex.java | 10 -- ...actory.java => ExplicitWriteHandleFactory.java} | 6 +- .../java/org/apache/hudi/io/FlinkAppendHandle.java | 125 + .../java/org/apache/hudi/io/FlinkCreateHandle.java | 14 +- .../java/org/apache/hudi/io/FlinkMergeHandle.java | 28 +--- .../hudi/table/HoodieFlinkCopyOnWriteTable.java| 64 - .../hudi/table/HoodieFlinkMergeOnReadTable.java| 66 - .../org/apache/hudi/table/HoodieFlinkTable.java| 3 +- .../commit/BaseFlinkCommitActionExecutor.java | 29 ++-- .../hudi/table/action/commit/FlinkMergeHelper.java | 11 +- .../delta/BaseFlinkDeltaCommitActionExecutor.java | 65 + .../FlinkUpsertDeltaCommitActionExecutor.java} | 24 ++-- .../table/action/compact/FlinkCompactHelpers.java} | 26 ++-- .../FlinkScheduleCompactionActionExecutor.java}| 14 +- .../HoodieFlinkMergeOnReadTableCompactor.java} | 111 +++ .../org/apache/hudi/operator/FlinkOptions.java | 40 +- .../apache/hudi/operator/StreamWriteFunction.java | 8 +- .../operator/StreamWriteOperatorCoordinator.java | 19 +++ .../hudi/operator/compact/CompactFunction.java | 94 + .../operator/compact/CompactionCommitEvent.java| 62 + .../operator/compact/CompactionCommitSink.java | 150 + .../hudi/operator/compact/CompactionPlanEvent.java | 31 ++--- .../operator/compact/CompactionPlanOperator.java | 146 
.../operator/partitioner/BucketAssignFunction.java | 9 +- .../hudi/operator/partitioner/BucketAssigner.java | 4 +- .../hudi/operator/partitioner/BucketAssigners.java | 54 .../partitioner/delta/DeltaBucketAssigner.java | 62 +++-- .../java/org/apache/hudi/util/StreamerUtil.java| 6 + .../apache/hudi/operator/StreamWriteITCase.java| 83 ...FunctionTest.java => TestWriteCopyOnWrite.java} | 85 +++- .../apache/hudi/operator/TestWriteMergeOnRead.java | 96 + .../operator/TestWriteMergeOnReadWithCompact.java | 58 .../operator/utils/CompactFunctionWrapper.java | 142 +++ .../operator/utils/StreamWriteFunctionWrapper.java | 16 +++ .../org/apache/hudi/operator/utils/TestData.java | 96 + 38 files changed, 1734 insertions(+), 307 deletions(-) rename hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/{ExplicitCreateHandleFactory.java => ExplicitWriteHandleFactory.java} (87%) create mode 100644 hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkAppendHandle.java create mode 100644 hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/delta/BaseFlinkDeltaCommitActionExecutor.java copy hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/{FlinkUpsertCommitActionExecutor.java => delta/FlinkUpsertDeltaCommitActionExecutor.java} (66%) copy hudi-client/{hudi-spark-client/src/main/java/org/apache/hudi/table/action/compact/SparkCompactHelpers.java => hudi-flink-client/src/main/java/org/apache/hudi/table/action/compact/FlinkCompactHelpers.java} (75%) copy hudi-client/{hudi-spark-client/src/main/java/org/apache/hudi/table/action/compact/SparkScheduleCompactionActionExecutor.java => hudi-flink-client/src/main/java/org/apache/hudi/table/action/compact/FlinkScheduleCompactionActionExecutor.java} (92%) copy hudi-client/{hudi-spark-client/src/main/java/org/apache/hudi/table/action/compact/HoodieSparkMergeOnReadTableCompactor.java => 
hudi-flink-client/src/main/java/org/apache/hudi/table/action/compact/HoodieFlinkMergeOnReadTableCompactor.java} (72%) create mode 100644 hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactFunction.java create mode 100644 hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactionCommitEvent.java create mode 100644 hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactionCommitSink.java copy hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractCompactor.java => hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactionPlanEvent.java (52%) create mode 100644
[GitHub] [hudi] wangxianghu commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp
wangxianghu commented on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-787616261 > I will add the unit test, and then please review Hi @liujinhui1994, sorry for the delay. Can we keep all these changes in `KafkaOffsetGen`? That seems more elegant.
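Consuming from a specified timestamp ultimately means translating that timestamp into per-partition starting offsets, which Kafka exposes through `KafkaConsumer.offsetsForTimes`. A minimal stand-alone sketch of that translation follows; `lookup` and `end_offsets` are injected stand-ins (assumptions, not `KafkaOffsetGen` internals), so no broker is needed to run it.

```python
def offsets_for_timestamp(partitions, ts_ms, lookup, end_offsets):
    """Resolve per-partition starting offsets for a timestamp, the way
    offsetsForTimes is typically used: `lookup(p, ts_ms)` returns the
    earliest offset whose record timestamp is >= ts_ms, or None when no
    such record exists (timestamp is past the newest record), in which
    case we fall back to the partition's end offset."""
    starts = {}
    for p in partitions:
        off = lookup(p, ts_ms)
        starts[p] = off if off is not None else end_offsets[p]
    return starts
```

In the DeltaStreamer context, the resolved offsets would seed the first checkpoint, after which normal offset-based incremental consumption takes over.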
[GitHub] [hudi] n3nash commented on pull request #2610: [HUDI-1644] Do not delete older rollback instants as part of rollback…
n3nash commented on pull request #2610: URL: https://github.com/apache/hudi/pull/2610#issuecomment-787615017 @satishkotha High level looks good to me, can we confirm that we have an equivalent test case on the archiving of rollback instants that simulates the same behavior of not leaving behind any rollback instants?
[GitHub] [hudi] codecov-io commented on pull request #2610: [HUDI-1644] Do not delete older rollback instants as part of rollback…
codecov-io commented on pull request #2610: URL: https://github.com/apache/hudi/pull/2610#issuecomment-787602584 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2610?src=pr=h1) Report > Merging [#2610](https://codecov.io/gh/apache/hudi/pull/2610?src=pr=desc) (b108e6f) into [master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc) (be257b5) will **decrease** coverage by `0.00%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2610/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2610?src=pr=tree)
```diff
@@             Coverage Diff              @@
##             master    #2610      +/-  ##
============================================
- Coverage     51.27%   51.26%   -0.01%
+ Complexity     3241     3238       -3
============================================
  Files           438      438
  Lines         20126    20112      -14
  Branches       2079     2079
============================================
- Hits          10320    10311       -9
+ Misses         8954     8949       -5
  Partials        852      852
```
| Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `36.87% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudicommon | `51.30% <ø> (-0.05%)` | `0.00 <ø> (ø)` | | | hudiflink | `46.85% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisparkdatasource | `69.71% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | | | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | | | hudiutilities | `69.59% <ø> (+0.15%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2610?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2610/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==) | `47.80% <ø> (-1.52%)` | `57.00 <0.00> (-4.00)` | | | [...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2610/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==) | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | | | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2610/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `71.07% <0.00%> (+0.35%)` | `53.00% <0.00%> (+1.00%)` | | | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2610/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `65.69% <0.00%> (+1.16%)` | `33.00% <0.00%> (ø%)` | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codecov-io edited a comment on pull request #2580: [HUDI 1623] Introduce start & end commit times to timeline
codecov-io edited a comment on pull request #2580: URL: https://github.com/apache/hudi/pull/2580#issuecomment-780218110 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=h1) Report > Merging [#2580](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=desc) (1065c5c) into [master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc) (be257b5) will **increase** coverage by `10.24%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2580/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree)
```diff
@@              Coverage Diff              @@
##             master    #2580       +/-   ##
=============================================
+ Coverage     51.27%   61.52%   +10.24%
+ Complexity     3241      325     -2916
=============================================
  Files           438       53      -385
  Lines         20126     1944    -18182
  Branches       2079      235     -1844
=============================================
- Hits          10320     1196     -9124
+ Misses         8954      625     -8329
+ Partials        852      123      -729
```
| Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `61.52% <ø> (-7.93%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | `0.00% <0.00%> (-28.00%)` | | | [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | `0.00% <0.00%> (-2.00%)` | | | [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-2.00%)` | | | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `41.86% <0.00%> (-22.68%)` | `27.00% <0.00%> (-6.00%)` | | | [...g/apache/hudi/cli/utils/SparkTempViewProvider.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL1NwYXJrVGVtcFZpZXdQcm92aWRlci5qYXZh) | | | | | 
[...common/table/view/AbstractTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvQWJzdHJhY3RUYWJsZUZpbGVTeXN0ZW1WaWV3LmphdmE=) | | | | | [...e/hudi/common/util/collection/ImmutableTriple.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9JbW11dGFibGVUcmlwbGUuamF2YQ==) | | | | | [...sioning/clean/CleanMetadataV2MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YVYyTWlncmF0aW9uSGFuZGxlci5qYXZh) | | | | | [...va/org/apache/hudi/hive/util/ColumnNameXLator.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db2x1bW5OYW1lWExhdG9yLmphdmE=) | | | | | ... and [370 more](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree-more) | |
[GitHub] [hudi] codecov-io edited a comment on pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer
codecov-io edited a comment on pull request #2593: URL: https://github.com/apache/hudi/pull/2593#issuecomment-784220708 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2593?src=pr=h1) Report > Merging [#2593](https://codecov.io/gh/apache/hudi/pull/2593?src=pr=desc) (c22ac8d) into [master](https://codecov.io/gh/apache/hudi/commit/06dc7c7fd8a867a1e1da90f7dc19b0cc2da69bba?el=desc) (06dc7c7) will **increase** coverage by `18.37%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2593/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2593?src=pr=tree)
```diff
@@              Coverage Diff              @@
##             master    #2593       +/-   ##
=============================================
+ Coverage     51.22%   69.59%   +18.37%
+ Complexity     3230      364     -2866
=============================================
  Files           438       53      -385
  Lines         20093     1944    -18149
  Branches       2069      235     -1834
=============================================
- Hits          10292     1353     -8939
+ Misses         8954      458     -8496
+ Partials        847      133      -714
```
| Flag | Coverage Δ | Complexity Δ | | |---|---|---|---| | hudicli | `?` | `?` | | | hudiclient | `?` | `?` | | | hudicommon | `?` | `?` | | | hudiflink | `?` | `?` | | | hudihadoopmr | `?` | `?` | | | hudisparkdatasource | `?` | `?` | | | hudisync | `?` | `?` | | | huditimelineservice | `?` | `?` | | | hudiutilities | `69.59% <ø> (+0.08%)` | `0.00 <ø> (ø)` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2593?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...hudi/utilities/sources/helpers/KafkaOffsetGen.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9LYWZrYU9mZnNldEdlbi5qYXZh) | `85.84% <0.00%> (-2.94%)` | `20.00% <0.00%> (+4.00%)` | :arrow_down: | | [...he/hudi/common/table/log/block/HoodieLogBlock.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVMb2dCbG9jay5qYXZh) | | | | | [...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=) | | | | | [.../hudi/common/table/view/FileSystemViewManager.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdNYW5hZ2VyLmphdmE=) | | | | | [...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==) | | | | | [.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh) | | | | | [...org/apache/hudi/common/bloom/BloomFilterUtils.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVXRpbHMuamF2YQ==) | | | | | 
[...org/apache/hudi/common/config/TypedProperties.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9UeXBlZFByb3BlcnRpZXMuamF2YQ==) | | | | | [...ava/org/apache/hudi/cli/commands/TableCommand.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1RhYmxlQ29tbWFuZC5qYXZh) | | | | | [...rg/apache/hudi/schema/FilebasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zY2hlbWEvRmlsZWJhc2VkU2NoZW1hUHJvdmlkZXIuamF2YQ==) | | | | | ... and [371 more](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure
[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer
danny0405 commented on a change in pull request #2593: URL: https://github.com/apache/hudi/pull/2593#discussion_r584416641 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactEvent.java ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator.compact; + +import org.apache.hudi.common.model.CompactionOperation; + +import java.io.Serializable; + +/** + * Represents a compact command from the compaction plan task {@link CompactionPlanOperator}. + */ +public class CompactEvent implements Serializable { Review comment: > Thanks @danny0405 for the awesome work. Hard to catch up on the review since you are making progress too fast :) > Can't go into detail about this large PR too much until I get a chance to run this myself. Left some high-level comments. > One concern is about the test cases. I feel like Flink writer is not as well tested as Spark, so the reliability is a bit concerning for me when we officially release this feature. Any plan to add more test cases? Yes, we can add more test cases when more feature are introduced for Flink, such as `SQL connectors`, `INSERT OVERRIDE`, more kinds of key generators. 
[GitHub] [hudi] pengzhiwei2018 commented on issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet
pengzhiwei2018 commented on issue #2609: URL: https://github.com/apache/hudi/issues/2609#issuecomment-787593403 Hi @ramachandranms, are you querying a COW table with Presto? I have previously found an issue where a Presto query over a Hudi COW table is slower than over plain parquet. You can try this fix at: https://github.com/pengzhiwei2018/hudi/tree/dev_presto. Hope it can help you~
[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer
danny0405 commented on a change in pull request #2593: URL: https://github.com/apache/hudi/pull/2593#discussion_r584411150 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactCommitEvent.java ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.operator.compact; + +import org.apache.hudi.client.WriteStatus; + +import java.io.Serializable; +import java.util.List; + +/** + * Represents a commit event from the compaction task {@link CompactFunction}. + */ +public class CompactCommitEvent implements Serializable { + private static final long serialVersionUID = 1L; + + /** + * The compaction commit instant time. + */ + private final String instant; + /** + * The write statuses. + */ + private final List writeStatuses; + /** + * The compaction task identifier. 
+ */ + private final int taskID; + + public CompactCommitEvent(String instant, List<WriteStatus> writeStatuses, int taskID) { + this.instant = instant; + this.writeStatuses = writeStatuses; + this.taskID = taskID; + } + + public String getInstant() { + return instant; + } + + public List<WriteStatus> getWriteStatuses() { + return writeStatuses; + } + + public int getTaskID() { Review comment: Not used in the current code, but I would rather keep it in case of future usage.
[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer
danny0405 commented on a change in pull request #2593: URL: https://github.com/apache/hudi/pull/2593#discussion_r584410935 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java ## @@ -165,6 +165,42 @@ private FlinkOptions() { .defaultValue(128D) // 128MB .withDescription("Batch buffer size in MB to flush data into the underneath filesystem"); + // + // Compaction Options + // + + public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions + .key("compaction.async.enabled") + .booleanType() + .defaultValue(true) // default true for MOR write + .withDescription("Async Compaction, enabled by default for MOR"); + + public static final String NUM_COMMITS = "num_commits"; + public static final String TIME_ELAPSED = "time_elapsed"; + public static final String NUM_AND_TIME = "num_and_time"; + public static final String NUM_OR_TIME = "num_or_time"; + public static final ConfigOption<String> COMPACTION_TRIGGER_STRATEGY = ConfigOptions + .key("compaction.trigger.strategy") Review comment: No, the option key of `HoodieCompactionConfig` is too long and not very friendly to use as SQL options.
[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer
danny0405 commented on a change in pull request #2593: URL: https://github.com/apache/hudi/pull/2593#discussion_r584410723 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java ## @@ -165,6 +165,42 @@ private FlinkOptions() { .defaultValue(128D) // 128MB .withDescription("Batch buffer size in MB to flush data into the underneath filesystem"); + // + // Compaction Options + // + + public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions + .key("compaction.async.enabled") + .booleanType() + .defaultValue(true) // default true for MOR write + .withDescription("Async Compaction, enabled by default for MOR"); + + public static final String NUM_COMMITS = "num_commits"; Review comment: No, the enumeration comes from what HUDI core defines, see `CompactionTriggerStrategy`.
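The four constants quoted above (`num_commits`, `time_elapsed`, `num_and_time`, `num_or_time`) mirror the values of the `CompactionTriggerStrategy` enum that danny0405 references. As a hedged illustration of how a scheduler might combine a delta-commit-count threshold with an elapsed-time threshold (this is a self-contained sketch, not Hudi's actual scheduling code; the class and method names are hypothetical):

```java
// Sketch: how the four compaction trigger strategies could combine a
// commit-count condition with an elapsed-time condition. Illustrative only.
public class TriggerStrategySketch {
    public static boolean shouldCompact(String strategy, int deltaCommits, long elapsedSec,
                                        int maxDeltaCommits, long maxElapsedSec) {
        boolean byCount = deltaCommits >= maxDeltaCommits;   // enough delta commits accumulated
        boolean byTime = elapsedSec >= maxElapsedSec;        // enough time elapsed since last compaction
        switch (strategy) {
            case "num_commits":  return byCount;
            case "time_elapsed": return byTime;
            case "num_and_time": return byCount && byTime;
            case "num_or_time":  return byCount || byTime;
            default: throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }
}
```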
[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer
danny0405 commented on a change in pull request #2593: URL: https://github.com/apache/hudi/pull/2593#discussion_r584409961 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java ## @@ -165,6 +165,42 @@ private FlinkOptions() { .defaultValue(128D) // 128MB .withDescription("Batch buffer size in MB to flush data into the underneath filesystem"); + // + // Compaction Options + // + + public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions Review comment: Rename to `COMPACTION_ASYNC_ENABLED`.
[GitHub] [hudi] codecov-io edited a comment on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers
codecov-io edited a comment on pull request #2374: URL: https://github.com/apache/hudi/pull/2374#issuecomment-750782300 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=h1) Report > Merging [#2374](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=desc) (2d7d890) into [master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc) (be257b5) will **increase** coverage by `10.09%`. > The diff coverage is `0.00%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2374/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=tree)

```diff
@@             Coverage Diff              @@
##            master    #2374       +/-  ##
============================================
+ Coverage    51.27%   61.36%    +10.09%
+ Complexity    3241      324      -2917
============================================
  Files          438       53       -385
  Lines        20126     1944     -18182
  Branches      2079      235      -1844
============================================
- Hits         10320     1193      -9127
+ Misses        8954      627      -8327
+ Partials       852      124       -728
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `61.36% <0.00%> (-8.08%)` | `0.00 <0.00> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.71% <0.00%> (ø)` | `52.00 <0.00> (ø)` | | | [...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | | | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `5.17% <0.00%> (-83.63%)` | `0.00% <0.00%> (-28.00%)` | | | [...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh) | `0.00% <0.00%> (-72.23%)` | `0.00% <0.00%> (-2.00%)` | | | [...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh) | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-2.00%)` | | | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `40.69% <0.00%> (-23.84%)` | `27.00% <0.00%> (-6.00%)` | | | 
[...meline/versioning/clean/CleanMetadataMigrator.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YU1pZ3JhdG9yLmphdmE=) | | | | | [...he/hudi/common/table/timeline/dto/BaseFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9CYXNlRmlsZURUTy5qYXZh) | | | | | [...a/org/apache/hudi/avro/HoodieAvroWriteSupport.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvV3JpdGVTdXBwb3J0LmphdmE=) | | | | | [.../java/org/apache/hudi/common/metrics/Registry.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21ldHJpY3MvUmVnaXN0cnkuamF2YQ==) | | | | | ... and [375 more](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree-more) | |
[jira] [Updated] (HUDI-1644) Do not delete rollback instants in RollbackActionExecutor
[ https://issues.apache.org/jira/browse/HUDI-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1644: - Labels: pull-request-available (was: ) > Do not delete rollback instants in RollbackActionExecutor > - > > Key: HUDI-1644 > URL: https://issues.apache.org/jira/browse/HUDI-1644 > Project: Apache Hudi > Issue Type: Bug > Reporter: satish > Assignee: satish > Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > > We are trying to remove older rollback files here > https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java#L204 > But, this is not compatible with timeline layout version 1 because > rollback.inflight and requested continue to stay on. This causes problems for > RFC-15 metadata sync. > Archival takes care of removing these rollback files, so we don't need this > special logic in BaseRollbackActionExecutor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] satishkotha opened a new pull request #2610: [HUDI-1644] Do not delete older rollback instants as part of rollback…
satishkotha opened a new pull request #2610: URL: https://github.com/apache/hudi/pull/2610 ## What is the purpose of the pull request Archival can take care of removing old rollback instants cleanly ## Brief change log * rollback instants are cleaned up by the archival process. Trying to delete them during rollback is not needed. This code also has many bugs: a) it removes newer rollback instants instead of older ones; b) it doesn't clean up inflight/requested files, so it is not compatible with timeline layout version 1. ## Verify this pull request Only code deletions, existing tests cover the functionality ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Created] (HUDI-1644) Do not delete rollback instants in RollbackActionExecutor
satish created HUDI-1644: Summary: Do not delete rollback instants in RollbackActionExecutor Key: HUDI-1644 URL: https://issues.apache.org/jira/browse/HUDI-1644 Project: Apache Hudi Issue Type: Bug Reporter: satish Assignee: satish Fix For: 0.8.0 We are trying to remove older rollback files here https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java#L204 But, this is not compatible with timeline layout version 1 because rollback.inflight and requested continue to stay on. This causes problems for RFC-15 metadata sync. Archival takes care of removing these rollback files, so we don't need this special logic in BaseRollbackActionExecutor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (HUDI-1539) Bug in HoodieCombineRealtimeRecordReader returns wrong results
[ https://issues.apache.org/jira/browse/HUDI-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish closed HUDI-1539. > Bug in HoodieCombineRealtimeRecordReader returns wrong results > -- > > Key: HUDI-1539 > URL: https://issues.apache.org/jira/browse/HUDI-1539 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: satish >Assignee: satish >Priority: Critical > Labels: pull-request-available, sev:critical, user-support-issues > > https://github.com/apache/hudi/issues/2346#issuecomment-758591316 > in a rt table > the hive query has predicate push down > there are no less than 3 splits (thus no less than 3 recordReaders in > HoodieCombineRealtimeRecordReader), and the records satisfy the predicate are > in the split which is in a relatively back position of the List > 2 recordReaders in succession with this.currentRecordReader.next(key, value) > returns false, as the predicate push down has filtered the baseFile. > In step 4, it leads to HoodieCombineRealtimeRecordReader::next(NullWritable > key, ArrayWritable value) return false and the reader will stop read next. > So, records which satisfy the predicate are in the remanined recordReaders > but can not be read. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HUDI-1539) Bug in HoodieCombineRealtimeRecordReader returns wrong results
[ https://issues.apache.org/jira/browse/HUDI-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish resolved HUDI-1539. -- Resolution: Fixed > Bug in HoodieCombineRealtimeRecordReader returns wrong results > -- > > Key: HUDI-1539 > URL: https://issues.apache.org/jira/browse/HUDI-1539 > Project: Apache Hudi > Issue Type: Bug >Affects Versions: 0.8.0 >Reporter: satish >Assignee: satish >Priority: Critical > Labels: pull-request-available, sev:critical, user-support-issues > > https://github.com/apache/hudi/issues/2346#issuecomment-758591316 > in a rt table > the hive query has predicate push down > there are no less than 3 splits (thus no less than 3 recordReaders in > HoodieCombineRealtimeRecordReader), and the records satisfy the predicate are > in the split which is in a relatively back position of the List > 2 recordReaders in succession with this.currentRecordReader.next(key, value) > returns false, as the predicate push down has filtered the baseFile. > In step 4, it leads to HoodieCombineRealtimeRecordReader::next(NullWritable > key, ArrayWritable value) return false and the reader will stop read next. > So, records which satisfy the predicate are in the remanined recordReaders > but can not be read. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] teeyog commented on a change in pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory
teeyog commented on a change in pull request #2475: URL: https://github.com/apache/hudi/pull/2475#discussion_r584401119 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -84,6 +88,26 @@ class DefaultSource extends RelationProvider val tablePath = DataSourceUtils.getTablePath(fs, globPaths.toArray) log.info("Obtained hudi table path: " + tablePath) +if (path.nonEmpty) { + val _path = path.get.stripSuffix("/") + val pathTmp = new Path(_path).makeQualified(fs.getUri, fs.getWorkingDirectory) + // If the user specifies the table path, the data path is automatically inferred + if (pathTmp.toString.equals(tablePath)) { + val sparkEngineContext = new HoodieSparkEngineContext(sqlContext.sparkContext) + val fsBackedTableMetadata = + new FileSystemBackedTableMetadata(sparkEngineContext, new SerializableConfiguration(fs.getConf), tablePath, false) + val partitionPaths = fsBackedTableMetadata.getAllPartitionPaths Review comment: @lw309637554 Thank you for your review. The hudi table path can also be obtained through configuration instead of inference.
[GitHub] [hudi] vinothchandar commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.
vinothchandar commented on a change in pull request #2494: URL: https://github.com/apache/hudi/pull/2494#discussion_r584393784 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java ## @@ -112,13 +113,59 @@ private void initIfNeeded() { @Override protected Option<HoodieRecord<HoodieMetadataPayload>> getRecordByKeyFromMetadata(String key) { +// This function can be called in parallel through multiple threads. For each thread, we determine the thread-local +// versions of the baseFile and logRecord readers to use. +// - If reuse is enabled, we use the same readers and don't close them +// - if reuse is disabled, we open new readers in each thread and close them +HoodieFileReader localFileReader = null; +HoodieMetadataMergedLogRecordScanner localLogRecordScanner = null; +synchronized (this) { + if (!metadataConfig.enableReuse()) { +// reuse is disabled so always open new readers +try { + Pair<HoodieFileReader, HoodieMetadataMergedLogRecordScanner> readers = openReaders(); + localFileReader = readers.getKey(); + localLogRecordScanner = readers.getValue(); +} catch (IOException e) { + throw new HoodieIOException("Error opening readers", e); +} + } else if (baseFileReader == null && logRecordScanner == null) { +// reuse is enabled but we haven't opened the readers yet +try { + Pair<HoodieFileReader, HoodieMetadataMergedLogRecordScanner> readers = openReaders(); + localFileReader = readers.getKey(); + localLogRecordScanner = readers.getValue(); + // cache the readers + baseFileReader = localFileReader; + logRecordScanner = localLogRecordScanner; +} catch (IOException e) { + throw new HoodieIOException("Error opening readers", e); +} + } else { +// reuse the already open readers +ValidationUtils.checkState((baseFileReader != null || logRecordScanner != null), "Readers should already be open"); +localFileReader = baseFileReader; +localLogRecordScanner = logRecordScanner; + } +} Review comment: @prashantwason and I already synced on this. Will be catching up on reviews.
[GitHub] [hudi] nsivabalan commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.
nsivabalan commented on a change in pull request #2494: URL: https://github.com/apache/hudi/pull/2494#discussion_r584328039 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java ## @@ -112,13 +113,59 @@ private void initIfNeeded() { @Override protected Option> getRecordByKeyFromMetadata(String key) { +// This function can be called in parallel through multiple threads. For each thread, we determine the thread-local +// versions of the baseFile and logRecord readers to use. +// - If reuse is enabled, we use the same readers and dont close them +// - if reuse is disabled, we open new readers in each thread and close them +HoodieFileReader localFileReader = null; +HoodieMetadataMergedLogRecordScanner localLogRecordScanner = null; +synchronized (this) { + if (!metadataConfig.enableReuse()) { +// reuse is disabled so always open new readers +try { + Pair readers = openReaders(); + localFileReader = readers.getKey(); + localLogRecordScanner = readers.getValue(); +} catch (IOException e) { + throw new HoodieIOException("Error opening readers", e); +} + } else if (baseFileReader == null && logRecordScanner == null) { +// reuse is enabled but we haven't opened the readers yet +try { + Pair readers = openReaders(); + localFileReader = readers.getKey(); + localLogRecordScanner = readers.getValue(); + // cache the readers + baseFileReader = localFileReader; + logRecordScanner = localLogRecordScanner; +} catch (IOException e) { + throw new HoodieIOException("Error opening readers", e); +} + } else { +// reuse the already open readers +ValidationUtils.checkState((baseFileReader != null || logRecordScanner != null), "Readers should already be open"); +localFileReader = baseFileReader; +localLogRecordScanner = logRecordScanner; + } +} Review comment: @vinothchandar : Can we sync up on this sometime and get a closure. Would like to have this in before our next release. May be at the end of next week's sync meeting, we can discuss on this. 
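The reuse logic under discussion in this hunk boils down to a lazy-init-and-cache pattern guarded by a lock: when reuse is enabled, the readers are opened once and shared across threads; when it is disabled, every call opens (and later closes) its own readers. A minimal, self-contained sketch of that pattern, with a hypothetical `Reader` type and `openReader()` method standing in for the base-file reader / log record scanner pair:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the reuse-or-fresh reader pattern: with reuse enabled, open once
// under the lock and cache; otherwise open a new reader per call. Reader and
// openReader() are illustrative stand-ins, not Hudi classes.
public class ReaderReuseSketch {
    public static class Reader {}

    private final boolean reuse;
    private Reader cached;                                   // guarded by "this"
    public final AtomicInteger opens = new AtomicInteger();  // counts opens, for illustration

    public ReaderReuseSketch(boolean reuse) { this.reuse = reuse; }

    private Reader openReader() { opens.incrementAndGet(); return new Reader(); }

    public Reader acquire() {
        synchronized (this) {
            if (!reuse) {
                return openReader();   // fresh reader every call; caller must close it
            }
            if (cached == null) {
                cached = openReader(); // first call: open and cache
            }
            return cached;             // later calls share the cached reader
        }
    }
}
```

With reuse enabled, repeated `acquire()` calls hit the open path only once; with it disabled, every call pays the open cost.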
[GitHub] [hudi] codecov-io edited a comment on pull request #2577: [HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer
codecov-io edited a comment on pull request #2577: URL: https://github.com/apache/hudi/pull/2577#issuecomment-779312995 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2577?src=pr=h1) Report > Merging [#2577](https://codecov.io/gh/apache/hudi/pull/2577?src=pr=desc) (d5fb81f) into [master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc) (be257b5) will **increase** coverage by `18.16%`. > The diff coverage is `66.66%`. [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2577/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2577?src=pr=tree)

```diff
@@             Coverage Diff              @@
##            master    #2577       +/-  ##
============================================
+ Coverage    51.27%   69.44%    +18.16%
+ Complexity    3241      363      -2878
============================================
  Files          438       53       -385
  Lines        20126     1944     -18182
  Branches      2079      235      -1844
============================================
- Hits         10320     1350      -8970
+ Misses        8954      460      -8494
+ Partials       852      134       -718
```

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| hudicli | `?` | `?` | |
| hudiclient | `?` | `?` | |
| hudicommon | `?` | `?` | |
| hudiflink | `?` | `?` | |
| hudihadoopmr | `?` | `?` | |
| hudisparkdatasource | `?` | `?` | |
| hudisync | `?` | `?` | |
| huditimelineservice | `?` | `?` | |
| hudiutilities | `69.44% <66.66%> (ø)` | `0.00 <1.00> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2577?src=pr=tree) | Coverage Δ | Complexity Δ | | |---|---|---|---| | [...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==) | `78.39% <66.66%> (ø)` | `18.00 <1.00> (ø)` | | | [...udi/common/table/log/block/HoodieCorruptBlock.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVDb3JydXB0QmxvY2suamF2YQ==) | | | | | [.../apache/hudi/operator/InstantGenerateOperator.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9vcGVyYXRvci9JbnN0YW50R2VuZXJhdGVPcGVyYXRvci5qYXZh) | | | | | [...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh) | | | | | [.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh) | | | | | [.../apache/hudi/timeline/service/TimelineService.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvVGltZWxpbmVTZXJ2aWNlLmphdmE=) | | | | | [.../hive/SlashEncodedHourPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkSG91clBhcnRpdGlvblZhbHVlRXh0cmFjdG9yLmphdmE=) | | | | | 
[...pache/hudi/hadoop/config/HoodieRealtimeConfig.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2NvbmZpZy9Ib29kaWVSZWFsdGltZUNvbmZpZy5qYXZh) | | | | | [...a/org/apache/hudi/common/util/ReflectionUtils.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUmVmbGVjdGlvblV0aWxzLmphdmE=) | | | | | [...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==) | | | | | ... and [376 more](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree-more) | | This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For
[GitHub] [hudi] nsivabalan commented on a change in pull request #2577: [HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer
nsivabalan commented on a change in pull request #2577: URL: https://github.com/apache/hudi/pull/2577#discussion_r584311235 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java ## @@ -965,16 +969,30 @@ public void testDistributedTestDataSource() { assertEquals(1000, c); } - private static void prepareParquetDFSFiles(int numRecords) throws IOException { String path = PARQUET_SOURCE_ROOT + "/1.parquet"; + protected static void prepareParquetDFSFiles(int numRecords, String baseParquetPath) throws IOException { + prepareParquetDFSFiles(numRecords, baseParquetPath, "1.parquet"); + } + + protected static void prepareParquetDFSFiles(int numRecords, String baseParquetPath, String fileName) throws IOException { + String path = baseParquetPath + "/" + fileName; Review comment: Paths.get() returns a java.nio.file.Path, but we need a Hadoop fs Path here (lines 1003, 1006). Hence leaving it as is.
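The distinction nsivabalan points at can be shown with plain `java.nio`: `Paths.get()` produces a `java.nio.file.Path`, which is a different type from Hadoop's `org.apache.hadoop.fs.Path`, so where a Hadoop `Path` is expected, string concatenation (or an explicit conversion) is used instead. The helper names below are illustrative only:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustration: both approaches yield the same path string on POSIX systems,
// but Paths.get() returns a java.nio.file.Path, not a Hadoop fs Path.
public class PathSketch {
    public static String joinWithConcat(String base, String fileName) {
        return base + "/" + fileName;        // what the reviewed code does
    }

    public static String joinWithNio(String base, String fileName) {
        Path p = Paths.get(base, fileName);  // java.nio type, not org.apache.hadoop.fs.Path
        return p.toString();
    }
}
```

A Hadoop `Path` could still be built from either string via `new org.apache.hadoop.fs.Path(path)`, which is why plain concatenation suffices in the test helper.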
[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working
[ https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292433#comment-17292433 ]

sivabalan narayanan commented on HUDI-1063:
-------------------------------------------

[~vburenin]: Your help here is much appreciated.

> Save in Google Cloud Storage not working
> ----------------------------------------
>
>                 Key: HUDI-1063
>                 URL: https://issues.apache.org/jira/browse/HUDI-1063
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration
>            Reporter: David Lacalle Castillo
>            Priority: Critical
>              Labels: sev:critical, user-support-issues
>             Fix For: 0.8.0
>
> I added the following to spark-submit (Spark version 2.4.5 and Hadoop version 3.2.1):
> {{--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4 \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}}
>
> I am trying to save a DataFrame to Google Cloud Storage as follows:
> {code:python}
> tableName = "forecasts"
> basePath = "gs://hudi-datalake/" + tableName
> hudi_options = {
>     'hoodie.table.name': tableName,
>     'hoodie.datasource.write.recordkey.field': 'uuid',
>     'hoodie.datasource.write.partitionpath.field': 'partitionpath',
>     'hoodie.datasource.write.table.name': tableName,
>     'hoodie.datasource.write.operation': 'insert',
>     'hoodie.datasource.write.precombine.field': 'ts',
>     'hoodie.upsert.shuffle.parallelism': 2,
>     'hoodie.insert.shuffle.parallelism': 2
> }
> results = results.selectExpr(
>     "ds as date",
>     "store",
>     "item",
>     "y as sales",
>     "yhat as sales_predicted",
>     "yhat_upper as sales_predicted_upper",
>     "yhat_lower as sales_predicted_lower",
>     "training_date")
> results.write.format("hudi"). \
>     options(**hudi_options). \
>     mode("overwrite"). \
>     save(basePath)
> {code}
> I am getting the following error:
> {code}
> Py4JJavaError: An error occurred while calling o312.save.
> : java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
>     at io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
>     at io.javalin.Javalin.<init>(Javalin.java:94)
>     at io.javalin.Javalin.create(Javalin.java:107)
>     at org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
>     at org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
>     at org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
>     at org.apache.hudi.client.AbstractHoodieClient.<init>(AbstractHoodieClient.java:69)
>     at org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:83)
>     at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:137)
>     at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:124)
>     at org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:120)
>     at org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195)
>     at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135)
>     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
>     at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
>     at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>     at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>     at ...
> {code}
[GitHub] [hudi] lw309637554 commented on pull request #2136: [HUDI-37] Persist the HoodieIndex type in the hoodie.properties file
lw309637554 commented on pull request #2136: URL: https://github.com/apache/hudi/pull/2136#issuecomment-787468796

> @lw309637554 @vinothchandar: can you folks get this to completion? It has been open for a while. Would be nice to have this in. We might also add more documentation in the FAQ or somewhere as to which switches are compatible.

Sorry for my late reply. Now that the metadata table is ready, I will implement this using the metadata table.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] lw309637554 commented on pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types
lw309637554 commented on pull request #2160: URL: https://github.com/apache/hudi/pull/2160#issuecomment-787467468

> @lw309637554: Can you please check the feedback and address it? Would be nice to have this in.

Okay.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #2577: [HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer
nsivabalan commented on pull request #2577: URL: https://github.com/apache/hudi/pull/2577#issuecomment-787466857

@yanghua: addressed all comments and responded to one piece of your feedback. Feel free to check it out when you can.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] lw309637554 commented on a change in pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory
lw309637554 commented on a change in pull request #2475: URL: https://github.com/apache/hudi/pull/2475#discussion_r584307500

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala

```diff
@@ -84,6 +88,26 @@ class DefaultSource extends RelationProvider
     val tablePath = DataSourceUtils.getTablePath(fs, globPaths.toArray)
     log.info("Obtained hudi table path: " + tablePath)

+    if (path.nonEmpty) {
+      val _path = path.get.stripSuffix("/")
+      val pathTmp = new Path(_path).makeQualified(fs.getUri, fs.getWorkingDirectory)
+      // If the user specifies the table path, the data path is automatically inferred
+      if (pathTmp.toString.equals(tablePath)) {
+        val sparkEngineContext = new HoodieSparkEngineContext(sqlContext.sparkContext)
+        val fsBackedTableMetadata =
+          new FileSystemBackedTableMetadata(sparkEngineContext, new SerializableConfiguration(fs.getConf), tablePath, false)
+        val partitionPaths = fsBackedTableMetadata.getAllPartitionPaths
```

Review comment: @teeyog hello — partition inference currently calls getAllPartitionPaths against the metadata table. The partition field is set via `hoodie.datasource.write.partitionpath.field` when writing the Hudi table. Can we persist `hoodie.datasource.write.partitionpath.field` to the metadata table, so that reads just fetch that property instead of listing all the partition paths? cc @vinothchandar

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
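The proposal above — read a persisted property instead of listing every partition — can be sketched with plain `java.util.Properties`. This is a hypothetical illustration, not Hudi's actual reader path: whether this key ends up in the table's properties file is exactly what the comment is asking about, and the class and helper names here are made up.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class PartitionFieldLookup {

    // Hypothetical: if the writer persisted the partition field into the
    // table's properties file, readers could look it up directly instead of
    // performing a full partition-path listing.
    static String readPartitionField(Reader hoodieProps) {
        Properties props = new Properties();
        try {
            props.load(hoodieProps);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return props.getProperty("hoodie.datasource.write.partitionpath.field");
    }

    public static void main(String[] args) {
        String contents = "hoodie.table.name=forecasts\n"
            + "hoodie.datasource.write.partitionpath.field=partitionpath\n";
        // prints "partitionpath"
        System.out.println(readPartitionField(new StringReader(contents)));
    }
}
```

The win is that a property read is O(1) regardless of table size, whereas `getAllPartitionPaths` scales with the number of partitions.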
[GitHub] [hudi] garyli1019 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer
garyli1019 commented on a change in pull request #2593: URL: https://github.com/apache/hudi/pull/2593#discussion_r584291118

## File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java

```diff
@@ -165,6 +165,42 @@ private FlinkOptions() {
       .defaultValue(128D) // 128MB
       .withDescription("Batch buffer size in MB to flush data into the underneath filesystem");

+  // Compaction Options
+
+  public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions
```

Review comment: sounds like `ENABLE_ASYNC_COMPACTION`?

## File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java

```diff
+  public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions
+      .key("compaction.async.enabled")
+      .booleanType()
+      .defaultValue(true) // default true for MOR write
+      .withDescription("Async Compaction, enabled by default for MOR");
+
+  public static final String NUM_COMMITS = "num_commits";
```

Review comment: `num_commits` is too simple IMO, can we use a more informative name?

## File path: hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactCommitEvent.java

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hudi.operator.compact;

import org.apache.hudi.client.WriteStatus;

import java.io.Serializable;
import java.util.List;

/**
 * Represents a commit event from the compaction task {@link CompactFunction}.
 */
public class CompactCommitEvent implements Serializable {
```

Review comment: `CompactionCommitEvent`?

## File path: hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactCommitEvent.java

```java
// (Apache license header, package declaration, and imports as above)

public class CompactCommitEvent implements Serializable {
  private static final long serialVersionUID = 1L;

  /**
   * The compaction commit instant time.
   */
  private final String instant;
  /**
   * The write statuses.
   */
  private final List<WriteStatus> writeStatuses;
  /**
   * The compaction task identifier.
   */
  private final int taskID;

  public CompactCommitEvent(String instant, List<WriteStatus> writeStatuses, int taskID) {
    this.instant = instant;
    this.writeStatuses = writeStatuses;
    this.taskID = taskID;
  }

  public String getInstant() {
    return instant;
  }

  public List<WriteStatus> getWriteStatuses() {
    return writeStatuses;
  }

  public int getTaskID() {
```

Review comment: How are we gonna use this TaskID?

## File path: hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactEvent.java ## @@
[GitHub] [hudi] nsivabalan commented on a change in pull request #2596: [HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig
nsivabalan commented on a change in pull request #2596: URL: https://github.com/apache/hudi/pull/2596#discussion_r584292305

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java

```diff
@@ -258,4 +260,164 @@ public String getArchivelogFolder() {
   public Properties getProperties() {
     return props;
   }
+
+  public static PropertyBuilder propertyBuilder() {
```

Review comment: is there any fixes or simplification required for `HoodieTableConfig#createHoodieProperties(FileSystem fs, Path metadataFolder, Properties properties)`?

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java

```diff
+  public static PropertyBuilder propertyBuilder() {
+    return new PropertyBuilder();
+  }
+
+  public static class PropertyBuilder {
+    private HoodieTableType tableType;
+    private String tableName;
+    private String archiveLogFolder;
+    private String payloadClassName;
+    private Integer timelineLayoutVersion;
+    private String baseFileFormat;
+    private String preCombineField;
+    private String bootstrapIndexClass;
+    private String bootstrapBasePath;
+
+    private PropertyBuilder() {
+    }
+
+    public PropertyBuilder setTableType(HoodieTableType tableType) {
+      this.tableType = tableType;
+      return this;
+    }
+
+    public PropertyBuilder setTableType(String tableType) {
+      return setTableType(HoodieTableType.valueOf(tableType));
+    }
+
+    public PropertyBuilder setTableName(String tableName) {
+      this.tableName = tableName;
+      return this;
+    }
+
+    public PropertyBuilder setArchiveLogFolder(String archiveLogFolder) {
+      this.archiveLogFolder = archiveLogFolder;
+      return this;
+    }
+
+    public PropertyBuilder setPayloadClassName(String payloadClassName) {
+      this.payloadClassName = payloadClassName;
+      return this;
+    }
+
+    public PropertyBuilder setPayloadClass(Class payloadClass) {
+      return setPayloadClassName(payloadClass.getName());
+    }
+
+    public PropertyBuilder setTimelineLayoutVersion(Integer timelineLayoutVersion) {
+      this.timelineLayoutVersion = timelineLayoutVersion;
+      return this;
+    }
+
+    public PropertyBuilder setBaseFileFormat(String baseFileFormat) {
+      this.baseFileFormat = baseFileFormat;
+      return this;
+    }
+
+    public PropertyBuilder setPreCombineField(String preCombineField) {
+      this.preCombineField = preCombineField;
+      return this;
+    }
+
+    public PropertyBuilder setBootstrapIndexClass(String bootstrapIndexClass) {
+      this.bootstrapIndexClass = bootstrapIndexClass;
+      return this;
+    }
+
+    public PropertyBuilder setBootstrapBasePath(String bootstrapBasePath) {
+      this.bootstrapBasePath = bootstrapBasePath;
+      return this;
+    }
+
+    public PropertyBuilder fromMetaClient(HoodieTableMetaClient metaClient) {
```

Review comment: can you help point me to the place how/where this code exists prior to this patch? I checked HoodieTableMetaClient and couldn't find it.

## File path: hudi-examples/src/main/java/org/apache/hudi/examples/java/HoodieJavaWriteClientExample.java

```diff
@@ -72,8 +72,11 @@ public static void main(String[] args) throws Exception {
     Path path = new Path(tablePath);
     FileSystem fs = FSUtils.getFs(tablePath, hadoopConf);
     if (!fs.exists(path)) {
-      HoodieTableMetaClient.initTableType(hadoopConf, tablePath, HoodieTableType.valueOf(tableType),
-          tableName, HoodieAvroPayload.class.getName());
+      HoodieTableConfig.propertyBuilder()
+        .setTableType(tableType)
+        .setTableName(tableName)
+        .setPayloadClassName(HoodieAvroPayload.class.getName())
+        .initTable(hadoopConf, tablePath);
```

Review comment: this again makes me feel initTable() should move to HoodieTableMetaClient, because, as you can see here, we start by instantiating HoodieTableConfig.propertyBuilder(), but the final call results in a HoodieTableMetaClient instance, which does not sit well.

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java

```diff
@@ -258,4 +260,164 @@ public String getArchivelogFolder() {
   public Properties getProperties() {
     return props;
   }
+
+  public static PropertyBuilder propertyBuilder() {
+    return new PropertyBuilder();
+  }
+
+  public static class PropertyBuilder {
+    private HoodieTableType tableType;
+    private String tableName;
+    private String archiveLogFolder;
+    private String payloadClassName;
+    private Integer timelineLayoutVersion;
+    private String baseFileFormat;
+    private String preCombineField;
+    private String bootstrapIndexClass;
+    private String bootstrapBasePath;
+
+    private PropertyBuilder() {
+    }
+
+    public
```
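The fluent builder pattern under review can be illustrated with a self-contained sketch: a `PropertyBuilder` that collects table settings through chainable setters and emits a `java.util.Properties` standing in for `hoodie.properties`. The method names mirror the patch, but this is not the actual Hudi class, and the emitted property keys are illustrative.

```java
import java.util.Properties;

// Self-contained sketch of the fluent builder pattern discussed in the review.
public class TablePropertyBuilderDemo {

    public static class PropertyBuilder {
        private String tableType;
        private String tableName;
        private String payloadClassName;

        public PropertyBuilder setTableType(String tableType) {
            this.tableType = tableType;
            return this; // returning `this` is what makes the calls chainable
        }

        public PropertyBuilder setTableName(String tableName) {
            this.tableName = tableName;
            return this;
        }

        public PropertyBuilder setPayloadClassName(String payloadClassName) {
            this.payloadClassName = payloadClassName;
            return this;
        }

        // Emit only the properties that were actually set.
        public Properties build() {
            Properties props = new Properties();
            if (tableType != null) {
                props.setProperty("hoodie.table.type", tableType);
            }
            if (tableName != null) {
                props.setProperty("hoodie.table.name", tableName);
            }
            if (payloadClassName != null) {
                props.setProperty("hoodie.compaction.payload.class", payloadClassName);
            }
            return props;
        }
    }

    public static void main(String[] args) {
        Properties props = new PropertyBuilder()
            .setTableType("COPY_ON_WRITE")
            .setTableName("forecasts")
            .build();
        // prints "forecasts"
        System.out.println(props.getProperty("hoodie.table.name"));
    }
}
```

The reviewer's design concern maps onto this sketch directly: if `build()` (or `initTable()`) returns a type owned by a different class, the builder arguably belongs on that class instead.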