[GitHub] [hudi] codecov-io commented on pull request #2520: [HUDI-1446] Support skip bootstrapIndex's init in abstract fs view init

2021-02-28 Thread GitBox


codecov-io commented on pull request #2520:
URL: https://github.com/apache/hudi/pull/2520#issuecomment-787713017


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2520?src=pr=h1) Report
   > Merging 
[#2520](https://codecov.io/gh/apache/hudi/pull/2520?src=pr=desc) (ef091c8) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/0d8a4d0a56dcb35e499216c7bfab17a05716bc44?el=desc)
 (0d8a4d0) will **decrease** coverage by `40.90%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2520/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2520?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #2520       +/-   ##
   ============================================
   - Coverage     50.52%    9.61%   -40.91%
   + Complexity     3122       48     -3074
   ============================================
     Files           430       53      -377
     Lines         19597     1944    -17653
     Branches       2008      235     -1773
   ============================================
   - Hits           9902      187     -9715
   + Misses         8886     1744     -7142
   + Partials        809       13      -796
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.61% <ø> (-59.82%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2520?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2520/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] n3nash commented on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-28 Thread GitBox


n3nash commented on pull request #2374:
URL: https://github.com/apache/hudi/pull/2374#issuecomment-787681396


   @vinothchandar Code is ready for review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #2611: [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat

2021-02-28 Thread GitBox


n3nash commented on a change in pull request #2611:
URL: https://github.com/apache/hudi/pull/2611#discussion_r584465066



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java
##
@@ -62,6 +67,7 @@
   public static final String HOODIE_STOP_AT_COMPACTION_PATTERN = 
"hoodie.%s.ro.stop.at.compaction";
   public static final String INCREMENTAL_SCAN_MODE = "INCREMENTAL";
   public static final String SNAPSHOT_SCAN_MODE = "SNAPSHOT";
+  public static final String VALIDATE_SCAN_MODE = "VALIDATE"; //used for 
pre-commit validation

Review comment:
   @satishkotha On thinking about this a little deeper, I feel one should be able to do "validate" in both modes, `SNAPSHOT` & `INCREMENTAL`. Essentially, what you want is a `SNAPSHOT @ commitTime` or an `Incremental from or @`, which is what time travel allows, except that time travel ensures we read only committed data. To keep the concepts this way, you may want to just add a flag, say `hoodie.%s.consume.uncommitted`, defaulting to false: when false, you always fall back to the `HoodieTableFileSystem` with the current behavior; when true, you do what you are currently doing in "VALIDATE" scan mode, starting with snapshot mode (it's hard for me to reason about what an incremental validate would look like). What do you think?
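A minimal sketch of the proposed flag, assuming a `JobConf`-based reader setup. The property key mirrors the existing `hoodie.%s.consume.*` pattern in `HoodieHiveUtils`, but the key name, class, and wiring here are illustrative assumptions, not code from this PR:

```java
import org.apache.hadoop.mapred.JobConf;

public class ConsumeUncommittedFlagSketch {

  // Assumed property name, modeled on the hoodie.%s.consume.* pattern.
  private static final String CONSUME_UNCOMMITTED_PATTERN = "hoodie.%s.consume.uncommitted";

  static boolean consumeUncommitted(JobConf conf, String tableName) {
    // Defaults to false: keep the current committed-only read behavior.
    return conf.getBoolean(String.format(CONSUME_UNCOMMITTED_PATTERN, tableName), false);
  }

  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setBoolean(String.format(CONSUME_UNCOMMITTED_PATTERN, "trips"), true);
    if (consumeUncommitted(conf, "trips")) {
      // ... build the file-system view that also exposes in-flight commits ...
      System.out.println("validation read: include uncommitted file slices");
    } else {
      // ... current behavior: committed file slices only ...
      System.out.println("default read: committed file slices only");
    }
  }
}
```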





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #2611: [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat

2021-02-28 Thread GitBox


n3nash commented on a change in pull request #2611:
URL: https://github.com/apache/hudi/pull/2611#discussion_r584465066



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java
##
@@ -62,6 +67,7 @@
   public static final String HOODIE_STOP_AT_COMPACTION_PATTERN = 
"hoodie.%s.ro.stop.at.compaction";
   public static final String INCREMENTAL_SCAN_MODE = "INCREMENTAL";
   public static final String SNAPSHOT_SCAN_MODE = "SNAPSHOT";
+  public static final String VALIDATE_SCAN_MODE = "VALIDATE"; //used for 
pre-commit validation

Review comment:
   @satishkotha On thinking about this a little deeper, I feel one should be able to do "validate" in both modes, `SNAPSHOT` & `INCREMENTAL`. Essentially, what you want is a `SNAPSHOT @ commitTime` or an `Incremental from or @`, which is what time travel allows, except that time travel ensures we read only committed data. To keep the concepts this way, you may want to just add a flag, say `hoodie.%s.consume.uncommitted`, defaulting to false: when false, you always fall back to the `HoodieTableFileSystem` with the current behavior; when true, you do what you are currently doing in "VALIDATE" scan mode. What do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on a change in pull request #2611: [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat

2021-02-28 Thread GitBox


n3nash commented on a change in pull request #2611:
URL: https://github.com/apache/hudi/pull/2611#discussion_r584465066



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieHiveUtils.java
##
@@ -62,6 +67,7 @@
   public static final String HOODIE_STOP_AT_COMPACTION_PATTERN = 
"hoodie.%s.ro.stop.at.compaction";
   public static final String INCREMENTAL_SCAN_MODE = "INCREMENTAL";
   public static final String SNAPSHOT_SCAN_MODE = "SNAPSHOT";
+  public static final String VALIDATE_SCAN_MODE = "VALIDATE"; //used for 
pre-commit validation

Review comment:
   @satishkotha On thinking about this a little deeper, I feel one should be able to do "validate" in both modes, `SNAPSHOT` & `INCREMENTAL`. Essentially, what you want is a `SNAPSHOT @ commitTime`, which is what time travel allows, except that time travel ensures we read only committed data. To keep the concepts this way, you may want to just add a flag, say `hoodie.%s.consume.uncommitted`, defaulting to false: when false, you always fall back to the `HoodieTableFileSystem` with the current behavior; when true, you do what you are currently doing in "VALIDATE" scan mode. What do you think?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2580: [HUDI 1623] Introduce start & end commit times to timeline

2021-02-28 Thread GitBox


codecov-io edited a comment on pull request #2580:
URL: https://github.com/apache/hudi/pull/2580#issuecomment-780218110


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=h1) Report
   > Merging 
[#2580](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=desc) (326d233) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc)
 (be257b5) will **increase** coverage by `0.28%`.
   > The diff coverage is `73.80%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2580/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2580      +/-   ##
   ============================================
   + Coverage     51.27%   51.56%   +0.28%
   - Complexity     3241     3295      +54
   ============================================
     Files           438      446       +8
     Lines         20126    20368     +242
     Branches       2079     2106      +27
   ============================================
   + Hits          10320    10502     +182
   - Misses         8954     8997      +43
   - Partials        852      869      +17
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `36.87% <50.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.31% <73.07%> (-0.05%)` | `0.00 <16.00> (ø)` | |
   | hudiflink | `51.39% <ø> (+4.53%)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | hudisparkdatasource | `69.71% <100.00%> (ø)` | `0.00 <2.00> (ø)` | |
   | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.59% <ø> (+0.15%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `81.57% <0.00%> (-3.36%)` | `59.00 <0.00> (ø)` | |
   | 
[...che/hudi/common/table/timeline/HoodieTimeline.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | `91.30% <ø> (ø)` | `44.00 <0.00> (ø)` | |
   | 
[...able/timeline/versioning/InstantTimeFormatter.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvSW5zdGFudFRpbWVGb3JtYXR0ZXIuamF2YQ==)
 | `38.46% <38.46%> (ø)` | `3.00 <3.00> (?)` | |
   | 
[...a/org/apache/hudi/cli/commands/CommitsCommand.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0NvbW1pdHNDb21tYW5kLmphdmE=)
 | `53.50% <50.00%> (ø)` | `15.00 <0.00> (ø)` | |
   | 
[...ache/hudi/common/table/timeline/HoodieInstant.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUluc3RhbnQuamF2YQ==)
 | `86.53% <82.00%> (-7.78%)` | `51.00 <12.00> (+6.00)` | :arrow_down: |
   | 
[...che/hudi/common/table/timeline/TimelineLayout.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTGF5b3V0LmphdmE=)
 | `95.00% <83.33%> (-5.00%)` | `3.00 <0.00> (ø)` | |
   | 
[...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh)
 | `70.76% <100.00%> (ø)` | `44.00 <0.00> (ø)` | |
   | 
[...di/common/table/timeline/HoodieActiveTimeline.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZUFjdGl2ZVRpbWVsaW5lLmphdmE=)
 | `70.45% <100.00%> (-0.33%)` | `42.00 <1.00> (-1.00)` | |
   | 
[...ble/timeline/versioning/TimelineLayoutVersion.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvVGltZWxpbmVMYXlvdXRWZXJzaW9uLmphdmE=)
 | `66.66% <100.00%> (+1.66%)` | `7.00 <0.00> (ø)` | |
   | 

[GitHub] [hudi] codecov-io edited a comment on pull request #2580: [HUDI 1623] Introduce start & end commit times to timeline

2021-02-28 Thread GitBox


codecov-io edited a comment on pull request #2580:
URL: https://github.com/apache/hudi/pull/2580#issuecomment-780218110







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-1647) Supports snapshot read for Flink

2021-02-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-1647:


Assignee: Danny Chen

> Supports snapshot read for Flink
> 
>
> Key: HUDI-1647
> URL: https://issues.apache.org/jira/browse/HUDI-1647
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>
> Support snapshot read for Flink for both MOR and COW tables.
> - COW: the parquet files of the latest file group slices
> - MOR: the parquet base file + log files of the latest file group slices
> Also implements the SQL connectors for both sink and source.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1647) Supports snapshot read for Flink

2021-02-28 Thread Danny Chen (Jira)
Danny Chen created HUDI-1647:


 Summary: Supports snapshot read for Flink
 Key: HUDI-1647
 URL: https://issues.apache.org/jira/browse/HUDI-1647
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Flink Integration
Reporter: Danny Chen


Support snapshot read for Flink for both MOR and COW tables.

- COW: the parquet files of the latest file group slices
- MOR: the parquet base file + log files of the latest file group slices

Also implements the SQL connectors for both sink and source.
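For illustration, a self-contained sketch of the slice selection the COW/MOR layout above describes. `FileSlice` here is a stand-in type invented for this example, not Hudi's actual file slice abstraction or timeline handling:

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotReadSketch {

  // Hypothetical model of a file group slice: one parquet base file plus log files.
  static final class FileSlice {
    final String baseParquetFile;
    final List<String> logFiles;

    FileSlice(String baseParquetFile, List<String> logFiles) {
      this.baseParquetFile = baseParquetFile;
      this.logFiles = logFiles;
    }
  }

  // Snapshot read: COW reads only the latest base file; MOR also merges its log files.
  static List<String> filesToRead(FileSlice latestSlice, boolean mergeOnRead) {
    List<String> files = new ArrayList<>();
    files.add(latestSlice.baseParquetFile);
    if (mergeOnRead) {
      files.addAll(latestSlice.logFiles);
    }
    return files;
  }

  public static void main(String[] args) {
    FileSlice slice = new FileSlice("f1_001.parquet",
        List.of(".f1_001.log.1", ".f1_001.log.2"));
    System.out.println("COW: " + filesToRead(slice, false));
    System.out.println("MOR: " + filesToRead(slice, true));
  }
}
```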



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1638) Some improvements to BucketAssignFunction

2021-02-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-1638.


Fixed via master branch: 7a11de12764d8f68f296c6e68a22822318bfbefa

> Some improvements to BucketAssignFunction
> -
>
> Key: HUDI-1638
> URL: https://issues.apache.org/jira/browse/HUDI-1638
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> - {{initializeState}} executes before {{open}}; thus {{checkPartitionsLoaded}} may see a null {{initialPartitionsToLoad}}
> - Only load the existing partitions
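For readers unfamiliar with the Flink lifecycle ordering behind the first point: `initializeState()` runs before `open()`, so a field assigned only in `open()` is still null inside `initializeState()`. A minimal illustration of the trap and the guard (this is a sketch, not `BucketAssignFunction` itself):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

public class LifecycleOrderSketch extends RichMapFunction<String, String>
    implements CheckpointedFunction {

  private List<String> partitionsToLoad; // assigned in open()

  @Override
  public void initializeState(FunctionInitializationContext context) {
    // Called BEFORE open(): partitionsToLoad is still null here, which is
    // exactly the trap HUDI-1638 fixes. Guard instead of dereferencing.
    if (partitionsToLoad == null) {
      partitionsToLoad = new ArrayList<>();
    }
  }

  @Override
  public void open(Configuration parameters) {
    // Called after initializeState(); the safe place for this setup.
    partitionsToLoad.add("2021/02/28"); // only load existing partitions
  }

  @Override
  public void snapshotState(FunctionSnapshotContext context) {
    // Nothing to snapshot in this sketch.
  }

  @Override
  public String map(String value) {
    return value;
  }
}
```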



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-1638) Some improvements to BucketAssignFunction

2021-02-28 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292631#comment-17292631
 ] 

Danny Chen edited comment on HUDI-1638 at 3/1/21, 5:51 AM:
---

Fixed via master branch: 06dc7c7fd8a867a1e1da90f7dc19b0cc2da69bba


was (Author: danny0405):
Fixed via master branch: 7a11de12764d8f68f296c6e68a22822318bfbefa

> Some improvements to BucketAssignFunction
> -
>
> Key: HUDI-1638
> URL: https://issues.apache.org/jira/browse/HUDI-1638
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> - {{initializeState}} executes before {{open}}; thus {{checkPartitionsLoaded}} may see a null {{initialPartitionsToLoad}}
> - Only load the existing partitions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1638) Some improvements to BucketAssignFunction

2021-02-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-1638:
-
Status: In Progress  (was: Open)

> Some improvements to BucketAssignFunction
> -
>
> Key: HUDI-1638
> URL: https://issues.apache.org/jira/browse/HUDI-1638
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> - {{initializeState}} executes before {{open}}; thus {{checkPartitionsLoaded}} may see a null {{initialPartitionsToLoad}}
> - Only load the existing partitions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1638) Some improvements to BucketAssignFunction

2021-02-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-1638.
--
Resolution: Fixed

> Some improvements to BucketAssignFunction
> -
>
> Key: HUDI-1638
> URL: https://issues.apache.org/jira/browse/HUDI-1638
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> - {{initializeState}} executes before {{open}}; thus {{checkPartitionsLoaded}} may see a null {{initialPartitionsToLoad}}
> - Only load the existing partitions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] liujinhui1994 commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-02-28 Thread GitBox


liujinhui1994 commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-787662595


   The current implementation is mainly in KafkaOffsetGen @wangxianghu 
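For context, resolving offsets for a given timestamp is a plain Kafka consumer capability, which is why keeping the logic inside `KafkaOffsetGen` is natural. A hedged sketch of the underlying API call — the topic name, servers, and timestamp below are illustrative, and this is not the PR's actual code:

```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class OffsetsFromTimestampSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    long startTimestampMs = 1614470400000L; // e.g. 2021-02-28 00:00:00 UTC
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      // Ask every partition for the earliest offset at or after the timestamp.
      Map<TopicPartition, Long> request = consumer.partitionsFor("impressions").stream()
          .collect(Collectors.toMap(
              p -> new TopicPartition(p.topic(), p.partition()),
              p -> startTimestampMs));
      Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(request);
      // Partitions with no message at/after the timestamp map to null.
      offsets.forEach((tp, oat) ->
          System.out.println(tp + " -> " + (oat == null ? "none" : oat.offset())));
    }
  }
}
```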



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] liujinhui1994 closed pull request #2337: [HUDI-982] Flink support mor table

2021-02-28 Thread GitBox


liujinhui1994 closed pull request #2337:
URL: https://github.com/apache/hudi/pull/2337


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-1638) Some improvements to BucketAssignFunction

2021-02-28 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-1638:


Fix Version/s: 0.8.0
 Assignee: Danny Chen

> Some improvements to BucketAssignFunction
> -
>
> Key: HUDI-1638
> URL: https://issues.apache.org/jira/browse/HUDI-1638
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> - {{initializeState}} executes before {{open}}; thus {{checkPartitionsLoaded}} may see a null {{initialPartitionsToLoad}}
> - Only load the existing partitions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on pull request #2612: [HUDI-1563] Adding hudi file sizing/ small file management blog

2021-02-28 Thread GitBox


nsivabalan commented on pull request #2612:
URL: https://github.com/apache/hudi/pull/2612#issuecomment-787658034


   ![Screen Shot 2021-03-01 at 12 32 47 
AM](https://user-images.githubusercontent.com/513218/109456334-9a3d8100-7a26-11eb-881e-5d1e2523185f.png)
   
   ![Screen Shot 2021-03-01 at 12 33 06 
AM](https://user-images.githubusercontent.com/513218/109456342-a0336200-7a26-11eb-96a9-0210bba6bcfe.png)
   
   ![Screen Shot 2021-03-01 at 12 33 36 
AM](https://user-images.githubusercontent.com/513218/109456348-a45f7f80-7a26-11eb-80ac-f4c947f33725.png)
   
   ![Screen Shot 2021-03-01 at 12 34 00 
AM](https://user-images.githubusercontent.com/513218/109456358-a88b9d00-7a26-11eb-9395-5abc11b43886.png)
   
   ![Screen Shot 2021-03-01 at 12 34 10 
AM](https://user-images.githubusercontent.com/513218/109456364-ad505100-7a26-11eb-9959-0ce62203f456.png)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wangxianghu commented on pull request #2337: [HUDI-982] Flink support mor table

2021-02-28 Thread GitBox


wangxianghu commented on pull request #2337:
URL: https://github.com/apache/hudi/pull/2337#issuecomment-787657819


   @liujinhui1994 It seems this PR was fixed by 
https://github.com/apache/hudi/commit/7a11de12764d8f68f296c6e68a22822318bfbefa ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1563) Documentation on small file handling

2021-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1563:
-
Labels: pull-request-available user-support-issues  (was: 
user-support-issues)

> Documentation on small file handling
> 
>
> Key: HUDI-1563
> URL: https://issues.apache.org/jira/browse/HUDI-1563
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available, user-support-issues
>
> Questions from Slack:
> How does Hudi handle small files? What config knobs does one have to play around with?
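For reference, the knobs the requested doc would cover are typically passed as write options. A minimal sketch with illustrative values — the two keys below are standard Hudi file-sizing configs, but the sizes are assumptions for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class SmallFileConfigSketch {
  public static void main(String[] args) {
    Map<String, String> hudiOptions = new HashMap<>();
    // Base files under this size are considered "small" and receive new inserts.
    hudiOptions.put("hoodie.parquet.small.file.limit", String.valueOf(100 * 1024 * 1024));
    // Upper bound on the size of a base file produced by a write.
    hudiOptions.put("hoodie.parquet.max.file.size", String.valueOf(120 * 1024 * 1024));
    // These would be passed to df.write().format("hudi").options(hudiOptions)...
    hudiOptions.forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```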



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan opened a new pull request #2612: [HUDI-1563] Adding hudi file sizing/ small file management blog

2021-02-28 Thread GitBox


nsivabalan opened a new pull request #2612:
URL: https://github.com/apache/hudi/pull/2612


   ## What is the purpose of the pull request
   
   *Adding hudi file sizing blog*
   
   ## Brief change log
   
 - *Adding hudi file sizing blog*
   
   ## Verify this pull request
   
   Built the site locally and verified
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-1563) Documentation on small file handling

2021-02-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-1563:
-

Assignee: sivabalan narayanan  (was: Nishith Agarwal)

> Documentation on small file handling
> 
>
> Key: HUDI-1563
> URL: https://issues.apache.org/jira/browse/HUDI-1563
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: user-support-issues
>
> Questions from Slack:
> How does Hudi handle small files? What config knobs does one have to play around with?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-io edited a comment on pull request #2580: [HUDI 1623] Introduce start & end commit times to timeline

2021-02-28 Thread GitBox


codecov-io edited a comment on pull request #2580:
URL: https://github.com/apache/hudi/pull/2580#issuecomment-780218110


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=h1) Report
   > Merging 
[#2580](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=desc) (90d46f8) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc)
 (be257b5) will **increase** coverage by `10.19%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2580/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2580       +/-   ##
   =============================================
   + Coverage     51.27%   61.47%   +10.19%
   + Complexity     3241      324     -2917
   =============================================
     Files           438       53      -385
     Lines         20126     1944    -18182
     Branches       2079      235     -1844
   =============================================
   - Hits          10320     1195     -9125
   + Misses         8954      625     -8329
   + Partials        852      124      -728
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `61.47% <ø> (-7.98%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | `0.00% <0.00%> (-28.00%)` | |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `41.86% <0.00%> (-22.68%)` | `27.00% <0.00%> (-6.00%)` | |
   | 
[...he/hudi/common/table/log/block/HoodieLogBlock.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVMb2dCbG9jay5qYXZh)
 | | | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | | | |
   | 
[.../hudi/common/table/view/FileSystemViewManager.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdNYW5hZ2VyLmphdmE=)
 | | | |
   | 
[...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==)
 | | | |
   | 
[.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh)
 | | | |
   | ... and [369 
more](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree-more) | |
   



[GitHub] [hudi] codecov-io commented on pull request #2611: [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat

2021-02-28 Thread GitBox


codecov-io commented on pull request #2611:
URL: https://github.com/apache/hudi/pull/2611#issuecomment-787641189


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2611?src=pr=h1) Report
   > Merging 
[#2611](https://codecov.io/gh/apache/hudi/pull/2611?src=pr=desc) (dc7874d) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc)
 (be257b5) will **increase** coverage by `18.27%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2611/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2611?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2611       +/-   ##
   =============================================
   + Coverage     51.27%   69.54%   +18.27%
   + Complexity     3241      363     -2878
   =============================================
     Files           438       53      -385
     Lines         20126     1944    -18182
     Branches       2079      235     -1844
   =============================================
   - Hits          10320     1352     -8968
   + Misses         8954      458     -8496
   + Partials        852      134      -718
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.54% <ø> (+0.10%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2611?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...he/hudi/common/table/log/block/HoodieLogBlock.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVMb2dCbG9jay5qYXZh)
 | | | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | | | |
   | 
[...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==)
 | | | |
   | 
[.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh)
 | | | |
   | 
[...org/apache/hudi/common/bloom/BloomFilterUtils.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVXRpbHMuamF2YQ==)
 | | | |
   | 
[...org/apache/hudi/common/config/TypedProperties.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9UeXBlZFByb3BlcnRpZXMuamF2YQ==)
 | | | |
   | 
[...ava/org/apache/hudi/cli/commands/TableCommand.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1RhYmxlQ29tbWFuZC5qYXZh)
 | | | |
   | 
[...rg/apache/hudi/schema/FilebasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zY2hlbWEvRmlsZWJhc2VkU2NoZW1hUHJvdmlkZXIuamF2YQ==)
 | | | |
   | 
[...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0luY3JlbWVudGFsUmVsYXRpb24uc2NhbGE=)
 | | | |
   | 
[...in/java/org/apache/hudi/schema/SchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zY2hlbWEvU2NoZW1hUHJvdmlkZXIuamF2YQ==)
 | | | |
   | ... and [371 
more](https://codecov.io/gh/apache/hudi/pull/2611/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working

2021-02-28 Thread Volodymyr Burenin (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292615#comment-17292615
 ] 

Volodymyr Burenin commented on HUDI-1063:
-

It would be nice to see the Hadoop configuration and GCS connector versions used.

I currently suspect that the `java.lang.NoSuchMethodError` has something to do 
with mixed-up dependencies; there are likely some incompatible dependencies 
either in the Spark image or somewhere else. One of the latest GCS connectors is 
incompatible with Hudi because of its dependencies.

> Save in Google Cloud Storage not working
> 
>
> Key: HUDI-1063
> URL: https://issues.apache.org/jira/browse/HUDI-1063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: David Lacalle Castillo
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.8.0
>
>
> I added to spark submit the following properties: 
> {{--packages 
> org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4
>  \  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}}
> Spark version 2.4.5 and Hadoop version 3.2.1
>  
> I am trying to save a DataFrame to Google Cloud Storage as follows:
> tableName = "forecasts"
> basePath = "gs://hudi-datalake/" + tableName
> hudi_options = {
>  'hoodie.table.name': tableName,
>  'hoodie.datasource.write.recordkey.field': 'uuid',
>  'hoodie.datasource.write.partitionpath.field': 'partitionpath',
>  'hoodie.datasource.write.table.name': tableName,
>  'hoodie.datasource.write.operation': 'insert',
>  'hoodie.datasource.write.precombine.field': 'ts',
>  'hoodie.upsert.shuffle.parallelism': 2, 
>  'hoodie.insert.shuffle.parallelism': 2
> }
> results = results.selectExpr(
>  "ds as date",
>  "store",
>  "item",
>  "y as sales",
>  "yhat as sales_predicted",
>  "yhat_upper as sales_predicted_upper",
>  "yhat_lower as sales_predicted_lower",
>  "training_date")
> results.write.format("hudi"). \
>  options(**hudi_options). \
>  mode("overwrite"). \
>  save(basePath)
> I am getting the following error:
> Py4JJavaError: An error occurred while calling o312.save. : 
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at 
> io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
>  at io.javalin.Javalin.(Javalin.java:94) at 
> io.javalin.Javalin.create(Javalin.java:107) at 
> org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
>  at 
> org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.(AbstractHoodieClient.java:69)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.(AbstractHoodieWriteClient.java:83)
>  at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:137) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:124) 
> at 
> org.apache.hudi.client.HoodieWriteClient.(HoodieWriteClient.java:120) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> 

[jira] [Updated] (HUDI-1646) Allow support for pre-commit validation

2021-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1646:
-
Labels: pull-request-available  (was: )

> Allow support for pre-commit validation
> ---
>
> Key: HUDI-1646
> URL: https://issues.apache.org/jira/browse/HUDI-1646
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We have use cases where we want to support running hive/presto queries to 
> validate data on uncommitted data. If validation passes, we will promote the 
> commit. Otherwise, rollback the commit.
> This is not possible today because ParquetInputFormat supports only reading 
> committed data. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] satishkotha opened a new pull request #2611: [HUDI-1646] Provide mechanism to read uncommitted data through InputFormat

2021-02-28 Thread GitBox


satishkotha opened a new pull request #2611:
URL: https://github.com/apache/hudi/pull/2611


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *Add support for pre-commit validation hooks through hive/presto queries. We 
provide a mechanism to read uncommitted data through the InputFormat. If validation 
passes, we promote the commit; otherwise, we roll back the commit.*
   
   
   ## Brief change log
   
   * Add a new consume mode: validate. In this consume mode, we allow reading 
dirty data. Users can run hive/presto queries, validate the data, and then 
commit/abort. Note that users also need to explicitly specify the dirty commit time 
to use this API. (We can consider making this optional; see the hedged sketch below.)
   * In this first version, support is only added for COW tables and the parquet 
format. If the general approach looks good, I can extend support to other file 
formats/table types as a follow-up.
   
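A hedged sketch of how a query might request the new consume mode through job configuration. `hoodie.%s.consume.mode` is the existing pattern in `HoodieHiveUtils`; the commit-time key below is an assumption for illustration, not the PR's confirmed API:

```java
import org.apache.hadoop.mapred.JobConf;

public class ValidateScanModeSketch {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Existing HoodieHiveUtils pattern for selecting the scan mode.
    conf.set("hoodie.trips.consume.mode", "VALIDATE");
    // Assumed key: the PR notes the dirty commit time must be given explicitly.
    conf.set("hoodie.trips.consume.validate.commit", "20210228153000");
    // The configured InputFormat would then expose that commit's uncommitted
    // files to the validation query; commit or rollback follows the result.
    System.out.println(conf.get("hoodie.trips.consume.mode"));
  }
}
```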
   ## Verify this pull request
   
   This change added tests.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1646) Allow support for pre-commit validation

2021-02-28 Thread satish (Jira)
satish created HUDI-1646:


 Summary: Allow support for pre-commit validation
 Key: HUDI-1646
 URL: https://issues.apache.org/jira/browse/HUDI-1646
 Project: Apache Hudi
  Issue Type: New Feature
Reporter: satish
Assignee: satish
 Fix For: 0.8.0


We have use cases where we want to support running hive/presto queries to 
validate data on uncommitted data. If validation passes, we will promote the 
commit. Otherwise, rollback the commit.

This is not possible today because ParquetInputFormat supports only reading 
committed data. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] satishkotha commented on pull request #2610: [HUDI-1644] Do not delete older rollback instants as part of rollback…

2021-02-28 Thread GitBox


satishkotha commented on pull request #2610:
URL: https://github.com/apache/hudi/pull/2610#issuecomment-787631465


   > @satishkotha High level looks good to me. Can we confirm whether we have an 
equivalent test case on the archiving of rollback instants that simulates the 
same behavior of not leaving behind any rollback instants?
   
   @n3nash Looks like we don't have unit tests for either clean or rollback instants. 
I created https://issues.apache.org/jira/browse/HUDI-1645 as a follow-up.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1645) Add unit test to verify clean and rollback instants are archived correctly

2021-02-28 Thread satish (Jira)
satish created HUDI-1645:


 Summary: Add unit test to verify clean and rollback instants are 
archived correctly
 Key: HUDI-1645
 URL: https://issues.apache.org/jira/browse/HUDI-1645
 Project: Apache Hudi
  Issue Type: Bug
Reporter: satish


https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiveLog.java

The tests don't seem to cover clean/rollback instants. Add those instants and 
make sure they are archived correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1632) Supports merge on read write mode for Flink writer

2021-02-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1632.
--
Resolution: Implemented

Implemented via master branch: 7a11de12764d8f68f296c6e68a22822318bfbefa

> Supports merge on read write mode for Flink writer
> --
>
> Key: HUDI-1632
> URL: https://issues.apache.org/jira/browse/HUDI-1632
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1632) Supports merge on read write mode for Flink writer

2021-02-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-1632:
--

Assignee: Danny Chen

> Supports merge on read write mode for Flink writer
> --
>
> Key: HUDI-1632
> URL: https://issues.apache.org/jira/browse/HUDI-1632
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1632) Supports merge on read write mode for Flink writer

2021-02-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-1632:
---
Fix Version/s: 0.8.0

> Supports merge on read write mode for Flink writer
> --
>
> Key: HUDI-1632
> URL: https://issues.apache.org/jira/browse/HUDI-1632
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yanghua merged pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer

2021-02-28 Thread GitBox


yanghua merged pull request #2593:
URL: https://github.com/apache/hudi/pull/2593


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (be257b5 -> 7a11de1)

2021-02-28 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from be257b5  [Hudi-1583]: Fix bug that Hudi will skip remaining log files 
if there is logFile with zero size in logFileList when merge on read. (#2584)
 add 7a11de1  [HUDI-1632] Supports merge on read write mode for Flink 
writer (#2593)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/io/HoodieAppendHandle.java |  48 ---
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  15 ++-
 .../apache/hudi/client/HoodieFlinkWriteClient.java | 121 ++---
 .../hudi/index/state/FlinkInMemoryStateIndex.java  |  10 --
 ...actory.java => ExplicitWriteHandleFactory.java} |   6 +-
 .../java/org/apache/hudi/io/FlinkAppendHandle.java | 125 +
 .../java/org/apache/hudi/io/FlinkCreateHandle.java |  14 +-
 .../java/org/apache/hudi/io/FlinkMergeHandle.java  |  28 +---
 .../hudi/table/HoodieFlinkCopyOnWriteTable.java|  64 -
 .../hudi/table/HoodieFlinkMergeOnReadTable.java|  66 -
 .../org/apache/hudi/table/HoodieFlinkTable.java|   3 +-
 .../commit/BaseFlinkCommitActionExecutor.java  |  29 ++--
 .../hudi/table/action/commit/FlinkMergeHelper.java |  11 +-
 .../delta/BaseFlinkDeltaCommitActionExecutor.java  |  65 +
 .../FlinkUpsertDeltaCommitActionExecutor.java} |  24 ++--
 .../table/action/compact/FlinkCompactHelpers.java} |  26 ++--
 .../FlinkScheduleCompactionActionExecutor.java}|  14 +-
 .../HoodieFlinkMergeOnReadTableCompactor.java} | 111 +++
 .../org/apache/hudi/operator/FlinkOptions.java |  40 +-
 .../apache/hudi/operator/StreamWriteFunction.java  |   8 +-
 .../operator/StreamWriteOperatorCoordinator.java   |  19 +++
 .../hudi/operator/compact/CompactFunction.java |  94 +
 .../operator/compact/CompactionCommitEvent.java|  62 +
 .../operator/compact/CompactionCommitSink.java | 150 +
 .../hudi/operator/compact/CompactionPlanEvent.java |  31 ++---
 .../operator/compact/CompactionPlanOperator.java   | 146 
 .../operator/partitioner/BucketAssignFunction.java |   9 +-
 .../hudi/operator/partitioner/BucketAssigner.java  |   4 +-
 .../hudi/operator/partitioner/BucketAssigners.java |  54 
 .../partitioner/delta/DeltaBucketAssigner.java |  62 +++--
 .../java/org/apache/hudi/util/StreamerUtil.java|   6 +
 .../apache/hudi/operator/StreamWriteITCase.java|  83 
 ...FunctionTest.java => TestWriteCopyOnWrite.java} |  85 +++-
 .../apache/hudi/operator/TestWriteMergeOnRead.java |  96 +
 .../operator/TestWriteMergeOnReadWithCompact.java  |  58 
 .../operator/utils/CompactFunctionWrapper.java | 142 +++
 .../operator/utils/StreamWriteFunctionWrapper.java |  16 +++
 .../org/apache/hudi/operator/utils/TestData.java   |  96 +
 38 files changed, 1734 insertions(+), 307 deletions(-)
 rename 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/{ExplicitCreateHandleFactory.java
 => ExplicitWriteHandleFactory.java} (87%)
 create mode 100644 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/FlinkAppendHandle.java
 create mode 100644 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/delta/BaseFlinkDeltaCommitActionExecutor.java
 copy 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/{FlinkUpsertCommitActionExecutor.java
 => delta/FlinkUpsertDeltaCommitActionExecutor.java} (66%)
 copy 
hudi-client/{hudi-spark-client/src/main/java/org/apache/hudi/table/action/compact/SparkCompactHelpers.java
 => 
hudi-flink-client/src/main/java/org/apache/hudi/table/action/compact/FlinkCompactHelpers.java}
 (75%)
 copy 
hudi-client/{hudi-spark-client/src/main/java/org/apache/hudi/table/action/compact/SparkScheduleCompactionActionExecutor.java
 => 
hudi-flink-client/src/main/java/org/apache/hudi/table/action/compact/FlinkScheduleCompactionActionExecutor.java}
 (92%)
 copy 
hudi-client/{hudi-spark-client/src/main/java/org/apache/hudi/table/action/compact/HoodieSparkMergeOnReadTableCompactor.java
 => 
hudi-flink-client/src/main/java/org/apache/hudi/table/action/compact/HoodieFlinkMergeOnReadTableCompactor.java}
 (72%)
 create mode 100644 
hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactFunction.java
 create mode 100644 
hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactionCommitEvent.java
 create mode 100644 
hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactionCommitSink.java
 copy 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractCompactor.java
 => 
hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactionPlanEvent.java
 (52%)
 create mode 100644 

[GitHub] [hudi] wangxianghu commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-02-28 Thread GitBox


wangxianghu commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-787616261


   > I will add the unit test, and then please review
   
   Hi @liujinhui1994, sorry for the delay.
   Can we keep all these changes in `KafkaOffsetGen`? That seems more elegant.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] n3nash commented on pull request #2610: [HUDI-1644] Do not delete older rollback instants as part of rollback…

2021-02-28 Thread GitBox


n3nash commented on pull request #2610:
URL: https://github.com/apache/hudi/pull/2610#issuecomment-787615017


   @satishkotha High level looks good to me. Can we confirm whether we have an 
equivalent test case on the archiving of rollback instants that simulates the 
same behavior of not leaving behind any rollback instants?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2610: [HUDI-1644] Do not delete older rollback instants as part of rollback…

2021-02-28 Thread GitBox


codecov-io commented on pull request #2610:
URL: https://github.com/apache/hudi/pull/2610#issuecomment-787602584


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2610?src=pr=h1) Report
   > Merging 
[#2610](https://codecov.io/gh/apache/hudi/pull/2610?src=pr=desc) (b108e6f) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc)
 (be257b5) will **decrease** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2610/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2610?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2610      +/-   ##
   ============================================
   - Coverage     51.27%   51.26%   -0.01%
   + Complexity     3241     3238       -3
   ============================================
     Files           438      438
     Lines         20126    20112      -14
     Branches       2079     2079
   ============================================
   - Hits          10320    10311       -9
   + Misses         8954     8949       -5
     Partials        852      852
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `36.87% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.30% <ø> (-0.05%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `46.85% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.71% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `49.62% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.59% <ø> (+0.15%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2610?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...c/main/java/org/apache/hudi/common/fs/FSUtils.java](https://codecov.io/gh/apache/hudi/pull/2610/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0ZTVXRpbHMuamF2YQ==)
 | `47.80% <ø> (-1.52%)` | `57.00 <0.00> (-4.00)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2610/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2610/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `71.07% <0.00%> (+0.35%)` | `53.00% <0.00%> (+1.00%)` | |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2610/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `65.69% <0.00%> (+1.16%)` | `33.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2580: [HUDI 1623] Introduce start & end commit times to timeline

2021-02-28 Thread GitBox


codecov-io edited a comment on pull request #2580:
URL: https://github.com/apache/hudi/pull/2580#issuecomment-780218110


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=h1) Report
   > Merging 
[#2580](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=desc) (1065c5c) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc)
 (be257b5) will **increase** coverage by `10.24%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2580/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2580       +/-   ##
   =============================================
   + Coverage     51.27%   61.52%    +10.24%
   + Complexity     3241      325      -2916
   =============================================
     Files           438       53       -385
     Lines         20126     1944     -18182
     Branches       2079      235      -1844
   =============================================
   - Hits          10320     1196      -9124
   + Misses         8954      625      -8329
   + Partials        852      123       -729
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `61.52% <ø> (-7.93%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2580?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | `0.00% <0.00%> (-28.00%)` | |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `41.86% <0.00%> (-22.68%)` | `27.00% <0.00%> (-6.00%)` | |
   | 
[...g/apache/hudi/cli/utils/SparkTempViewProvider.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL1NwYXJrVGVtcFZpZXdQcm92aWRlci5qYXZh)
 | | | |
   | 
[...common/table/view/AbstractTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvQWJzdHJhY3RUYWJsZUZpbGVTeXN0ZW1WaWV3LmphdmE=)
 | | | |
   | 
[...e/hudi/common/util/collection/ImmutableTriple.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9JbW11dGFibGVUcmlwbGUuamF2YQ==)
 | | | |
   | 
[...sioning/clean/CleanMetadataV2MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YVYyTWlncmF0aW9uSGFuZGxlci5qYXZh)
 | | | |
   | 
[...va/org/apache/hudi/hive/util/ColumnNameXLator.java](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db2x1bW5OYW1lWExhdG9yLmphdmE=)
 | | | |
   | ... and [370 
more](https://codecov.io/gh/apache/hudi/pull/2580/diff?src=pr=tree-more) | |
   



[GitHub] [hudi] codecov-io edited a comment on pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer

2021-02-28 Thread GitBox


codecov-io edited a comment on pull request #2593:
URL: https://github.com/apache/hudi/pull/2593#issuecomment-784220708


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2593?src=pr=h1) Report
   > Merging 
[#2593](https://codecov.io/gh/apache/hudi/pull/2593?src=pr=desc) (c22ac8d) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/06dc7c7fd8a867a1e1da90f7dc19b0cc2da69bba?el=desc)
 (06dc7c7) will **increase** coverage by `18.37%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2593/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2593?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2593       +/-   ##
   =============================================
   + Coverage     51.22%   69.59%    +18.37%
   + Complexity     3230      364      -2866
   =============================================
     Files           438       53       -385
     Lines         20093     1944     -18149
     Branches       2069      235      -1834
   =============================================
   - Hits          10292     1353      -8939
   + Misses         8954      458      -8496
   + Partials        847      133       -714
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.59% <ø> (+0.08%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2593?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...hudi/utilities/sources/helpers/KafkaOffsetGen.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvaGVscGVycy9LYWZrYU9mZnNldEdlbi5qYXZh)
 | `85.84% <0.00%> (-2.94%)` | `20.00% <0.00%> (+4.00%)` | :arrow_down: |
   | 
[...he/hudi/common/table/log/block/HoodieLogBlock.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVMb2dCbG9jay5qYXZh)
 | | | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | | | |
   | 
[.../hudi/common/table/view/FileSystemViewManager.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdNYW5hZ2VyLmphdmE=)
 | | | |
   | 
[...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==)
 | | | |
   | 
[.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh)
 | | | |
   | 
[...org/apache/hudi/common/bloom/BloomFilterUtils.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVXRpbHMuamF2YQ==)
 | | | |
   | 
[...org/apache/hudi/common/config/TypedProperties.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9UeXBlZFByb3BlcnRpZXMuamF2YQ==)
 | | | |
   | 
[...ava/org/apache/hudi/cli/commands/TableCommand.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1RhYmxlQ29tbWFuZC5qYXZh)
 | | | |
   | 
[...rg/apache/hudi/schema/FilebasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9zY2hlbWEvRmlsZWJhc2VkU2NoZW1hUHJvdmlkZXIuamF2YQ==)
 | | | |
   | ... and [371 
more](https://codecov.io/gh/apache/hudi/pull/2593/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure 

[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer

2021-02-28 Thread GitBox


danny0405 commented on a change in pull request #2593:
URL: https://github.com/apache/hudi/pull/2593#discussion_r584416641



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactEvent.java
##
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.operator.compact;
+
+import org.apache.hudi.common.model.CompactionOperation;
+
+import java.io.Serializable;
+
+/**
+ * Represents a compact command from the compaction plan task {@link 
CompactionPlanOperator}.
+ */
+public class CompactEvent implements Serializable {

Review comment:
   > Thanks @danny0405 for the awesome work. Hard to catch up on the review 
since you are making progress too fast :)
   > Can't go into detail about this large PR too much until I get a chance to 
run this myself. Left some high-level comments.
   > One concern is about the test cases. I feel like Flink writer is not as 
well tested as Spark, so the reliability is a bit concerning for me when we 
officially release this feature. Any plan to add more test cases?
   
   Yes, we can add more test cases when more features are introduced for Flink, 
such as `SQL connectors`, `INSERT OVERWRITE`, and more kinds of key generators.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pengzhiwei2018 commented on issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-02-28 Thread GitBox


pengzhiwei2018 commented on issue #2609:
URL: https://github.com/apache/hudi/issues/2609#issuecomment-787593403


   Hi @ramachandranms, are you querying a COW table with Presto? I have previously 
found an issue where querying a Hudi COW table with Presto is slower than querying 
plain parquet. You can try this fix: https://github.com/pengzhiwei2018/hudi/tree/dev_presto. 
Hope it can help you~



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer

2021-02-28 Thread GitBox


danny0405 commented on a change in pull request #2593:
URL: https://github.com/apache/hudi/pull/2593#discussion_r584411150



##
File path: 
hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactCommitEvent.java
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.operator.compact;
+
+import org.apache.hudi.client.WriteStatus;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Represents a commit event from the compaction task {@link CompactFunction}.
+ */
+public class CompactCommitEvent implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  /**
+   * The compaction commit instant time.
+   */
+  private final String instant;
+  /**
+   * The write statuses.
+   */
+  private final List<WriteStatus> writeStatuses;
+  /**
+   * The compaction task identifier.
+   */
+  private final int taskID;
+
+  public CompactCommitEvent(String instant, List<WriteStatus> writeStatuses, int taskID) {
+    this.instant = instant;
+    this.writeStatuses = writeStatuses;
+    this.taskID = taskID;
+  }
+
+  public String getInstant() {
+    return instant;
+  }
+
+  public List<WriteStatus> getWriteStatuses() {
+    return writeStatuses;
+  }
+
+  public int getTaskID() {

Review comment:
   Not used in the current code, but I would rather keep it in case of future 
usage.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer

2021-02-28 Thread GitBox


danny0405 commented on a change in pull request #2593:
URL: https://github.com/apache/hudi/pull/2593#discussion_r584410935



##
File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java
##
@@ -165,6 +165,42 @@ private FlinkOptions() {
   .defaultValue(128D) // 128MB
   .withDescription("Batch buffer size in MB to flush data into the underneath filesystem");
 
+  // -------------------------------------------------------------------------
+  //  Compaction Options
+  // -------------------------------------------------------------------------
+
+  public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions
+      .key("compaction.async.enabled")
+      .booleanType()
+      .defaultValue(true) // default true for MOR write
+      .withDescription("Async Compaction, enabled by default for MOR");
+
+  public static final String NUM_COMMITS = "num_commits";
+  public static final String TIME_ELAPSED = "time_elapsed";
+  public static final String NUM_AND_TIME = "num_and_time";
+  public static final String NUM_OR_TIME = "num_or_time";
+  public static final ConfigOption<String> COMPACTION_TRIGGER_STRATEGY = ConfigOptions
+      .key("compaction.trigger.strategy")

Review comment:
   No, the option keys in `HoodieCompactionConfig` are too long and not very 
friendly to use as SQL options.
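
   (To illustrate why short keys help: a hedged example of setting these options on
a Flink `Configuration`, assuming `COMPACTION_TRIGGER_STRATEGY` is a
`ConfigOption<String>` as the surrounding diff suggests.)

   ```java
   import org.apache.flink.configuration.Configuration;

   final class CompactionOptionsExample { // illustrative only
     static Configuration compactionConf() {
       Configuration conf = new Configuration();
       conf.setBoolean(FlinkOptions.WRITE_ASYNC_COMPACTION, true); // compaction.async.enabled
       conf.setString(FlinkOptions.COMPACTION_TRIGGER_STRATEGY, FlinkOptions.NUM_OR_TIME);
       return conf;
     }
   }
   ```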





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer

2021-02-28 Thread GitBox


danny0405 commented on a change in pull request #2593:
URL: https://github.com/apache/hudi/pull/2593#discussion_r584410723



##
File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java
##
@@ -165,6 +165,42 @@ private FlinkOptions() {
   .defaultValue(128D) // 128MB
   .withDescription("Batch buffer size in MB to flush data into the underneath filesystem");
 
+  // -------------------------------------------------------------------------
+  //  Compaction Options
+  // -------------------------------------------------------------------------
+
+  public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions
+      .key("compaction.async.enabled")
+      .booleanType()
+      .defaultValue(true) // default true for MOR write
+      .withDescription("Async Compaction, enabled by default for MOR");
+
+  public static final String NUM_COMMITS = "num_commits";

Review comment:
   No, the enumeration comes from what HUDI core defines, see 
`CompactionTriggerStrategy`.
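
   (A hedged sketch of the mapping implied here, assuming the core enum
`CompactionTriggerStrategy` defines the same four values; the helper itself is
hypothetical.)

   ```java
   // Hypothetical translation from the short Flink option strings to the core enum.
   static CompactionTriggerStrategy toStrategy(String key) {
     switch (key) {
       case FlinkOptions.NUM_COMMITS:  return CompactionTriggerStrategy.NUM_COMMITS;
       case FlinkOptions.TIME_ELAPSED: return CompactionTriggerStrategy.TIME_ELAPSED;
       case FlinkOptions.NUM_AND_TIME: return CompactionTriggerStrategy.NUM_AND_TIME;
       case FlinkOptions.NUM_OR_TIME:  return CompactionTriggerStrategy.NUM_OR_TIME;
       default: throw new IllegalArgumentException("Unknown trigger strategy: " + key);
     }
   }
   ```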





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer

2021-02-28 Thread GitBox


danny0405 commented on a change in pull request #2593:
URL: https://github.com/apache/hudi/pull/2593#discussion_r584409961



##
File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java
##
@@ -165,6 +165,42 @@ private FlinkOptions() {
   .defaultValue(128D) // 128MB
   .withDescription("Batch buffer size in MB to flush data into the underneath filesystem");
 
+  // -------------------------------------------------------------------------
+  //  Compaction Options
+  // -------------------------------------------------------------------------
+
+  public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions

Review comment:
   Rename to `COMPACTION_ASYNC_ENABLED`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-28 Thread GitBox


codecov-io edited a comment on pull request #2374:
URL: https://github.com/apache/hudi/pull/2374#issuecomment-750782300


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=h1) Report
   > Merging 
[#2374](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=desc) (2d7d890) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc)
 (be257b5) will **increase** coverage by `10.09%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2374/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2374       +/-   ##
   =============================================
   + Coverage     51.27%   61.36%    +10.09%
   + Complexity     3241      324      -2917
   =============================================
     Files           438       53       -385
     Lines         20126     1944     -18182
     Branches       2079      235      -1844
   =============================================
   - Hits          10320     1193      -9127
   + Misses         8954      627      -8327
   + Partials        852      124       -728
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `61.36% <0.00%> (-8.08%)` | `0.00 <0.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.71% <0.00%> (ø)` | `52.00 <0.00> (ø)` | |
   | 
[...ies/exception/HoodieSnapshotExporterException.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVTbmFwc2hvdEV4cG9ydGVyRXhjZXB0aW9uLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `5.17% <0.00%> (-83.63%)` | `0.00% <0.00%> (-28.00%)` | |
   | 
[...hudi/utilities/schema/JdbcbasedSchemaProvider.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9KZGJjYmFzZWRTY2hlbWFQcm92aWRlci5qYXZh)
 | `0.00% <0.00%> (-72.23%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...he/hudi/utilities/transform/AWSDmsTransformer.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3RyYW5zZm9ybS9BV1NEbXNUcmFuc2Zvcm1lci5qYXZh)
 | `0.00% <0.00%> (-66.67%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `40.69% <0.00%> (-23.84%)` | `27.00% <0.00%> (-6.00%)` | |
   | 
[...meline/versioning/clean/CleanMetadataMigrator.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YU1pZ3JhdG9yLmphdmE=)
 | | | |
   | 
[...he/hudi/common/table/timeline/dto/BaseFileDTO.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9CYXNlRmlsZURUTy5qYXZh)
 | | | |
   | 
[...a/org/apache/hudi/avro/HoodieAvroWriteSupport.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvV3JpdGVTdXBwb3J0LmphdmE=)
 | | | |
   | 
[.../java/org/apache/hudi/common/metrics/Registry.java](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21ldHJpY3MvUmVnaXN0cnkuamF2YQ==)
 | | | |
   | ... and [375 
more](https://codecov.io/gh/apache/hudi/pull/2374/diff?src=pr=tree-more) | |
   



[jira] [Updated] (HUDI-1644) Do not delete rollback instants in RollbackActionExecutor

2021-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1644:
-
Labels: pull-request-available  (was: )

> Do not delete rollback instants in RollbackActionExecutor
> -
>
> Key: HUDI-1644
> URL: https://issues.apache.org/jira/browse/HUDI-1644
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: satish
>Assignee: satish
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> We are trying to remove older rollback files here 
> https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java#L204
> But, this is not compatible with timeline layout version 1 because 
> rollback.inflight and requested files continue to stay around. This causes 
> problems for RFC-15 metadata sync. 
> Archival takes care of removing these rollback files, so we don't need this 
> special logic in BaseRollbackActionExecutor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] satishkotha opened a new pull request #2610: [HUDI-1644] Do not delete older rollback instants as part of rollback…

2021-02-28 Thread GitBox


satishkotha opened a new pull request #2610:
URL: https://github.com/apache/hudi/pull/2610


   ## What is the purpose of the pull request
   
   Archival can take care of removing old rollback instants cleanly
   
   ## Brief change log
   
   * Rollback instants are cleaned up by the archival process, so trying to delete 
them during rollback is not needed. This code also has several bugs: a) it removes 
newer rollback instants instead of older ones; b) it doesn't clean up 
inflight/requested files, so it is not compatible with timeline layout version 1.
   
   ## Verify this pull request
   Only code deletions; existing tests cover the functionality.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1644) Do not delete rollback instants in RollbackActionExecutor

2021-02-28 Thread satish (Jira)
satish created HUDI-1644:


 Summary: Do not delete rollback instants in RollbackActionExecutor
 Key: HUDI-1644
 URL: https://issues.apache.org/jira/browse/HUDI-1644
 Project: Apache Hudi
  Issue Type: Bug
Reporter: satish
Assignee: satish
 Fix For: 0.8.0


We are trying to remove older rollback files here 
https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java#L204

But, this is not compatible with timeline layout version 1 because 
rollback.inflight and requested files continue to stay around. This causes problems for 
RFC-15 metadata sync. 

Archival takes care of removing these rollback files, so we don't need this 
special logic in BaseRollbackActionExecutor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-1539) Bug in HoodieCombineRealtimeRecordReader returns wrong results

2021-02-28 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish closed HUDI-1539.


> Bug in HoodieCombineRealtimeRecordReader returns wrong results
> --
>
> Key: HUDI-1539
> URL: https://issues.apache.org/jira/browse/HUDI-1539
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available, sev:critical, user-support-issues
>
> https://github.com/apache/hudi/issues/2346#issuecomment-758591316 
> In an rt table, the Hive query has predicate push-down, and there are at 
> least 3 splits (thus at least 3 recordReaders in 
> HoodieCombineRealtimeRecordReader); the records that satisfy the predicate 
> sit in a split near the back of the list.
> Two recordReaders in succession return false from 
> this.currentRecordReader.next(key, value), as predicate push-down has 
> filtered out the baseFile.
> In step 4, this leads to HoodieCombineRealtimeRecordReader::next(NullWritable 
> key, ArrayWritable value) returning false, and the reader stops reading. 
> So records that satisfy the predicate sit in the remaining recordReaders 
> but can never be read.
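
(To make the failure mode concrete: a combining reader has to advance past
exhausted delegate readers instead of stopping at the first false. A minimal
sketch of that loop, with hypothetical helper names, not the actual fix:)

```java
// Keep pulling from delegate readers until one yields a record or all are exhausted.
public boolean next(NullWritable key, ArrayWritable value) throws IOException {
  while (!currentRecordReader.next(key, value)) {
    if (!hasMoreReaders()) {
      return false; // every delegate is exhausted
    }
    // Skip over a reader emptied by predicate push-down and keep going.
    currentRecordReader = nextReader();
  }
  return true;
}
```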



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1539) Bug in HoodieCombineRealtimeRecordReader returns wrong results

2021-02-28 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish resolved HUDI-1539.
--
Resolution: Fixed

> Bug in HoodieCombineRealtimeRecordReader returns wrong results
> --
>
> Key: HUDI-1539
> URL: https://issues.apache.org/jira/browse/HUDI-1539
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available, sev:critical, user-support-issues
>
> https://github.com/apache/hudi/issues/2346#issuecomment-758591316 
> In an rt table, the Hive query has predicate push-down, and there are at 
> least 3 splits (thus at least 3 recordReaders in 
> HoodieCombineRealtimeRecordReader); the records that satisfy the predicate 
> sit in a split near the back of the list.
> Two recordReaders in succession return false from 
> this.currentRecordReader.next(key, value), as predicate push-down has 
> filtered out the baseFile.
> In step 4, this leads to HoodieCombineRealtimeRecordReader::next(NullWritable 
> key, ArrayWritable value) returning false, and the reader stops reading. 
> So records that satisfy the predicate sit in the remaining recordReaders 
> but can never be read.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] teeyog commented on a change in pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-02-28 Thread GitBox


teeyog commented on a change in pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#discussion_r584401119



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -84,6 +88,26 @@ class DefaultSource extends RelationProvider
 val tablePath = DataSourceUtils.getTablePath(fs, globPaths.toArray)
 log.info("Obtained hudi table path: " + tablePath)
 
+    if (path.nonEmpty) {
+      val _path = path.get.stripSuffix("/")
+      val pathTmp = new Path(_path).makeQualified(fs.getUri, fs.getWorkingDirectory)
+      // If the user specifies the table path, the data path is automatically inferred
+      if (pathTmp.toString.equals(tablePath)) {
+        val sparkEngineContext = new HoodieSparkEngineContext(sqlContext.sparkContext)
+        val fsBackedTableMetadata =
+          new FileSystemBackedTableMetadata(sparkEngineContext, new SerializableConfiguration(fs.getConf), tablePath, false)
+        val partitionPaths = fsBackedTableMetadata.getAllPartitionPaths

Review comment:
   @lw309637554 Thank you for your review. The Hudi table path mentioned earlier 
can also be obtained through configuration instead of inference.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-02-28 Thread GitBox


vinothchandar commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r584393784



##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##
@@ -112,13 +113,59 @@ private void initIfNeeded() {
 
   @Override
   protected Option<HoodieRecord<HoodieMetadataPayload>> getRecordByKeyFromMetadata(String key) {
+    // This function can be called in parallel through multiple threads. For each thread, we determine the thread-local
+    // versions of the baseFile and logRecord readers to use.
+    // - If reuse is enabled, we use the same readers and dont close them
+    // - if reuse is disabled, we open new readers in each thread and close them
+    HoodieFileReader localFileReader = null;
+    HoodieMetadataMergedLogRecordScanner localLogRecordScanner = null;
+    synchronized (this) {
+      if (!metadataConfig.enableReuse()) {
+        // reuse is disabled so always open new readers
+        try {
+          Pair<HoodieFileReader, HoodieMetadataMergedLogRecordScanner> readers = openReaders();
+          localFileReader = readers.getKey();
+          localLogRecordScanner = readers.getValue();
+        } catch (IOException e) {
+          throw new HoodieIOException("Error opening readers", e);
+        }
+      } else if (baseFileReader == null && logRecordScanner == null) {
+        // reuse is enabled but we haven't opened the readers yet
+        try {
+          Pair<HoodieFileReader, HoodieMetadataMergedLogRecordScanner> readers = openReaders();
+          localFileReader = readers.getKey();
+          localLogRecordScanner = readers.getValue();
+          // cache the readers
+          baseFileReader = localFileReader;
+          logRecordScanner = localLogRecordScanner;
+        } catch (IOException e) {
+          throw new HoodieIOException("Error opening readers", e);
+        }
+      } else {
+        // reuse the already open readers
+        ValidationUtils.checkState((baseFileReader != null || logRecordScanner != null), "Readers should already be open");
+        localFileReader = baseFileReader;
+        localLogRecordScanner = logRecordScanner;
+      }
+    }

Review comment:
   @prashantwason and I already synced on this. Will be catching up on 
reviews. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-02-28 Thread GitBox


nsivabalan commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r584328039



##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##
@@ -112,13 +113,59 @@ private void initIfNeeded() {
 
   @Override
   protected Option<HoodieRecord<HoodieMetadataPayload>> getRecordByKeyFromMetadata(String key) {
+    // This function can be called in parallel through multiple threads. For each thread, we determine the thread-local
+    // versions of the baseFile and logRecord readers to use.
+    // - If reuse is enabled, we use the same readers and dont close them
+    // - if reuse is disabled, we open new readers in each thread and close them
+    HoodieFileReader localFileReader = null;
+    HoodieMetadataMergedLogRecordScanner localLogRecordScanner = null;
+    synchronized (this) {
+      if (!metadataConfig.enableReuse()) {
+        // reuse is disabled so always open new readers
+        try {
+          Pair<HoodieFileReader, HoodieMetadataMergedLogRecordScanner> readers = openReaders();
+          localFileReader = readers.getKey();
+          localLogRecordScanner = readers.getValue();
+        } catch (IOException e) {
+          throw new HoodieIOException("Error opening readers", e);
+        }
+      } else if (baseFileReader == null && logRecordScanner == null) {
+        // reuse is enabled but we haven't opened the readers yet
+        try {
+          Pair<HoodieFileReader, HoodieMetadataMergedLogRecordScanner> readers = openReaders();
+          localFileReader = readers.getKey();
+          localLogRecordScanner = readers.getValue();
+          // cache the readers
+          baseFileReader = localFileReader;
+          logRecordScanner = localLogRecordScanner;
+        } catch (IOException e) {
+          throw new HoodieIOException("Error opening readers", e);
+        }
+      } else {
+        // reuse the already open readers
+        ValidationUtils.checkState((baseFileReader != null || logRecordScanner != null), "Readers should already be open");
+        localFileReader = baseFileReader;
+        localLogRecordScanner = logRecordScanner;
+      }
+    }

Review comment:
   @vinothchandar : Can we sync up on this sometime and get closure? Would like 
to have this in before our next release. Maybe at the end of next week's sync 
meeting we can discuss this. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2577: [HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer

2021-02-28 Thread GitBox


codecov-io edited a comment on pull request #2577:
URL: https://github.com/apache/hudi/pull/2577#issuecomment-779312995


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2577?src=pr=h1) Report
   > Merging 
[#2577](https://codecov.io/gh/apache/hudi/pull/2577?src=pr=desc) (d5fb81f) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/be257b58c689510a21529019a766b7a2bfc7ebe6?el=desc)
 (be257b5) will **increase** coverage by `18.16%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2577/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2577?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2577       +/-   ##
   =============================================
   + Coverage     51.27%   69.44%    +18.16%
   + Complexity     3241      363      -2878
   =============================================
     Files           438       53       -385
     Lines         20126     1944     -18182
     Branches       2079      235      -1844
   =============================================
   - Hits          10320     1350      -8970
   + Misses         8954      460      -8494
   + Partials        852      134       -718
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.44% <66.66%> (ø)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2577?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==)
 | `78.39% <66.66%> (ø)` | `18.00 <1.00> (ø)` | |
   | 
[...udi/common/table/log/block/HoodieCorruptBlock.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVDb3JydXB0QmxvY2suamF2YQ==)
 | | | |
   | 
[.../apache/hudi/operator/InstantGenerateOperator.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9vcGVyYXRvci9JbnN0YW50R2VuZXJhdGVPcGVyYXRvci5qYXZh)
 | | | |
   | 
[...i/common/table/view/HoodieTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvSG9vZGllVGFibGVGaWxlU3lzdGVtVmlldy5qYXZh)
 | | | |
   | 
[.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh)
 | | | |
   | 
[.../apache/hudi/timeline/service/TimelineService.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvVGltZWxpbmVTZXJ2aWNlLmphdmE=)
 | | | |
   | 
[.../hive/SlashEncodedHourPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkSG91clBhcnRpdGlvblZhbHVlRXh0cmFjdG9yLmphdmE=)
 | | | |
   | 
[...pache/hudi/hadoop/config/HoodieRealtimeConfig.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL2NvbmZpZy9Ib29kaWVSZWFsdGltZUNvbmZpZy5qYXZh)
 | | | |
   | 
[...a/org/apache/hudi/common/util/ReflectionUtils.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUmVmbGVjdGlvblV0aWxzLmphdmE=)
 | | | |
   | 
[...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==)
 | | | |
   | ... and [376 
more](https://codecov.io/gh/apache/hudi/pull/2577/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For 

[GitHub] [hudi] nsivabalan commented on a change in pull request #2577: [HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer

2021-02-28 Thread GitBox


nsivabalan commented on a change in pull request #2577:
URL: https://github.com/apache/hudi/pull/2577#discussion_r584311235



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -965,16 +969,30 @@ public void testDistributedTestDataSource() {
 assertEquals(1000, c);
   }
 
-  private static void prepareParquetDFSFiles(int numRecords) throws IOException {
-    String path = PARQUET_SOURCE_ROOT + "/1.parquet";
+  protected static void prepareParquetDFSFiles(int numRecords, String baseParquetPath) throws IOException {
+    prepareParquetDFSFiles(numRecords, baseParquetPath, "1.parquet");
+  }
+
+  protected static void prepareParquetDFSFiles(int numRecords, String baseParquetPath, String fileName) throws IOException {
+    String path = baseParquetPath + "/" + fileName;

Review comment:
   Paths.get() returns a java.nio.file.Path, but we need a Hadoop fs Path here 
(lines 1003, 1006). Hence leaving it as is. 
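
   (The distinction in one line each, for illustration; `basePath` and `fileName`
are placeholders.)

   ```java
   java.nio.file.Path nioPath = java.nio.file.Paths.get(basePath, fileName);             // JDK NIO path
   org.apache.hadoop.fs.Path fsPath = new org.apache.hadoop.fs.Path(basePath, fileName); // Hadoop FS path
   ```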





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working

2021-02-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292433#comment-17292433
 ] 

sivabalan narayanan commented on HUDI-1063:
---

[~vburenin]: Your help here is much appreciated. 

> Save in Google Cloud Storage not working
> 
>
> Key: HUDI-1063
> URL: https://issues.apache.org/jira/browse/HUDI-1063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: David Lacalle Castillo
>Priority: Critical
>  Labels: sev:critical, user-support-issues
> Fix For: 0.8.0
>
>
> I added to spark submit the following properties: 
> {{--packages 
> org.apache.hudi:hudi-spark-bundle_2.11:0.5.3,org.apache.spark:spark-avro_2.11:2.4.4
>  \  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'}}
> Spark version 2.4.5 and Hadoop version 3.2.1
>  
> I am trying to save a Dataframe to Google Cloud Storage as follows:
> tableName = "forecasts"
> basePath = "gs://hudi-datalake/" + tableName
> hudi_options = {
>  'hoodie.table.name': tableName,
>  'hoodie.datasource.write.recordkey.field': 'uuid',
>  'hoodie.datasource.write.partitionpath.field': 'partitionpath',
>  'hoodie.datasource.write.table.name': tableName,
>  'hoodie.datasource.write.operation': 'insert',
>  'hoodie.datasource.write.precombine.field': 'ts',
>  'hoodie.upsert.shuffle.parallelism': 2, 
>  'hoodie.insert.shuffle.parallelism': 2
> }
> results = results.selectExpr(
>  "ds as date",
>  "store",
>  "item",
>  "y as sales",
>  "yhat as sales_predicted",
>  "yhat_upper as sales_predicted_upper",
>  "yhat_lower as sales_predicted_lower",
>  "training_date")
> results.write.format("hudi"). \
>  options(**hudi_options). \
>  mode("overwrite"). \
>  save(basePath)
> I am getting the following error:
> Py4JJavaError: An error occurred while calling o312.save. : 
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V at 
> io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
>  at io.javalin.Javalin.<init>(Javalin.java:94) at 
> io.javalin.Javalin.create(Javalin.java:107) at 
> org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:102)
>  at 
> org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:74)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:102)
>  at 
> org.apache.hudi.client.AbstractHoodieClient.<init>(AbstractHoodieClient.java:69)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:83)
>  at 
> org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:137) 
> at 
> org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:124) 
> at 
> org.apache.hudi.client.HoodieWriteClient.<init>(HoodieWriteClient.java:120) 
> at 
> org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:195) 
> at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:135) 
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108) at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>  at 
> 

[GitHub] [hudi] lw309637554 commented on pull request #2136: [HUDI-37] Persist the HoodieIndex type in the hoodie.properties file

2021-02-28 Thread GitBox


lw309637554 commented on pull request #2136:
URL: https://github.com/apache/hudi/pull/2136#issuecomment-787468796


   > @lw309637554 @vinothchandar : can you folks get this to completion, it's 
been open for a while. Would be nice to have this in. We might also add more 
documentation in the FAQ or somewhere as to what switches are compatible.
   
   Sorry for being late. Now that the metadata table is ready, I will implement 
this using the metadata table.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] lw309637554 commented on pull request #2160: [HUDI-865] Improve Hive Syncing by directly translating avro schema to Hive types

2021-02-28 Thread GitBox


lw309637554 commented on pull request #2160:
URL: https://github.com/apache/hudi/pull/2160#issuecomment-787467468


   > @lw309637554 : Can you please check the feedback and address it. Would 
be nice to have this in.
   
   okay



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #2577: [HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer

2021-02-28 Thread GitBox


nsivabalan commented on pull request #2577:
URL: https://github.com/apache/hudi/pull/2577#issuecomment-787466857


   @yanghua : addressed all comments and have responded to one of your feedback 
items. Feel free to check it out when you can. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on a change in pull request #2577: [HUDI-1618] Fixing NPE with Parquet src in multi table delta streamer

2021-02-28 Thread GitBox


nsivabalan commented on a change in pull request #2577:
URL: https://github.com/apache/hudi/pull/2577#discussion_r584311235



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -965,16 +969,30 @@ public void testDistributedTestDataSource() {
 assertEquals(1000, c);
   }
 
-  private static void prepareParquetDFSFiles(int numRecords) throws IOException {
-    String path = PARQUET_SOURCE_ROOT + "/1.parquet";
+  protected static void prepareParquetDFSFiles(int numRecords, String baseParquetPath) throws IOException {
+    prepareParquetDFSFiles(numRecords, baseParquetPath, "1.parquet");
+  }
+
+  protected static void prepareParquetDFSFiles(int numRecords, String baseParquetPath, String fileName) throws IOException {
+    String path = baseParquetPath + "/" + fileName;

Review comment:
   Paths.get() returns a java.nio.file.Path, but we need a Hadoop fs Path here 
(line 980). Hence leaving it as is. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] lw309637554 commented on a change in pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-02-28 Thread GitBox


lw309637554 commented on a change in pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#discussion_r584307500



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##
@@ -84,6 +88,26 @@ class DefaultSource extends RelationProvider
 val tablePath = DataSourceUtils.getTablePath(fs, globPaths.toArray)
 log.info("Obtained hudi table path: " + tablePath)
 
+    if (path.nonEmpty) {
+      val _path = path.get.stripSuffix("/")
+      val pathTmp = new Path(_path).makeQualified(fs.getUri, fs.getWorkingDirectory)
+      // If the user specifies the table path, the data path is automatically inferred
+      if (pathTmp.toString.equals(tablePath)) {
+        val sparkEngineContext = new HoodieSparkEngineContext(sqlContext.sparkContext)
+        val fsBackedTableMetadata =
+          new FileSystemBackedTableMetadata(sparkEngineContext, new SerializableConfiguration(fs.getConf), tablePath, false)
+        val partitionPaths = fsBackedTableMetadata.getAllPartitionPaths

Review comment:
   @teeyog hello, right now we infer the partitions by getting all partition paths 
from the metadata table. The partition scheme is set via 
hoodie.datasource.write.partitionpath.field when writing the Hudi table. Can we 
persist hoodie.datasource.write.partitionpath.field to the metadata table? Then a 
read would just fetch that property instead of listing all partition paths. cc @vinothchandar 
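
   (A hedged sketch of the suggestion: read the write-time partition field back
from `hoodie.properties` instead of listing every partition. Whether this key is
actually persisted there is exactly the open question in this thread; `fs` is an
`org.apache.hadoop.fs.FileSystem`.)

   ```java
   // Load table properties and read the partition path field, if it were persisted.
   java.util.Properties props = new java.util.Properties();
   try (java.io.InputStream in =
       fs.open(new org.apache.hadoop.fs.Path(tablePath, ".hoodie/hoodie.properties"))) {
     props.load(in);
   }
   String partitionField = props.getProperty("hoodie.datasource.write.partitionpath.field");
   ```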





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 commented on a change in pull request #2593: [HUDI-1632] Supports merge on read write mode for Flink writer

2021-02-28 Thread GitBox


garyli1019 commented on a change in pull request #2593:
URL: https://github.com/apache/hudi/pull/2593#discussion_r584291118



##
File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java
##
@@ -165,6 +165,42 @@ private FlinkOptions() {
   .defaultValue(128D) // 128MB
   .withDescription("Batch buffer size in MB to flush data into the underneath filesystem");
 
+  // -------------------------------------------------------------------------
+  //  Compaction Options
+  // -------------------------------------------------------------------------
+
+  public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions

Review comment:
   sounds like `ENABLE_ASYNC_COMPACTION`?

##
File path: hudi-flink/src/main/java/org/apache/hudi/operator/FlinkOptions.java
##
@@ -165,6 +165,42 @@ private FlinkOptions() {
   .defaultValue(128D) // 128MB
   .withDescription("Batch buffer size in MB to flush data into the underneath filesystem");
 
+  // -------------------------------------------------------------------------
+  //  Compaction Options
+  // -------------------------------------------------------------------------
+
+  public static final ConfigOption<Boolean> WRITE_ASYNC_COMPACTION = ConfigOptions
+      .key("compaction.async.enabled")
+      .booleanType()
+      .defaultValue(true) // default true for MOR write
+      .withDescription("Async Compaction, enabled by default for MOR");
+
+  public static final String NUM_COMMITS = "num_commits";

Review comment:
   `num_commits` is too generic IMO; can we use a more informative name?
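   For reference, a hedged usage sketch of how the option above would be read via the standard Flink Configuration API (assumes the FlinkOptions class from this diff is on the classpath):

   ```java
   import org.apache.flink.configuration.Configuration;

   import org.apache.hudi.operator.FlinkOptions;

   public class CompactionConfigExample {
     public static void main(String[] args) {
       Configuration conf = new Configuration();
       // Nothing set explicitly, so this falls back to the option's default
       // (true, i.e. async compaction on for MOR writes).
       boolean asyncCompaction = conf.getBoolean(FlinkOptions.WRITE_ASYNC_COMPACTION);
       System.out.println("async compaction enabled: " + asyncCompaction);
     }
   }
   ```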

##
File path: hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactCommitEvent.java
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.operator.compact;
+
+import org.apache.hudi.client.WriteStatus;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Represents a commit event from the compaction task {@link CompactFunction}.
+ */
+public class CompactCommitEvent implements Serializable {

Review comment:
   `CompactionCommitEvent`?

##
File path: hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactCommitEvent.java
##
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.operator.compact;
+
+import org.apache.hudi.client.WriteStatus;
+
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Represents a commit event from the compaction task {@link CompactFunction}.
+ */
+public class CompactCommitEvent implements Serializable {
+  private static final long serialVersionUID = 1L;
+
+  /**
+   * The compaction commit instant time.
+   */
+  private final String instant;
+  /**
+   * The write statuses.
+   */
+  private final List<WriteStatus> writeStatuses;
+  /**
+   * The compaction task identifier.
+   */
+  private final int taskID;
+
+  public CompactCommitEvent(String instant, List<WriteStatus> writeStatuses, int taskID) {
+    this.instant = instant;
+    this.writeStatuses = writeStatuses;
+    this.taskID = taskID;
+  }
+
+  public String getInstant() {
+    return instant;
+  }
+
+  public List<WriteStatus> getWriteStatuses() {
+    return writeStatuses;
+  }
+
+  public int getTaskID() {

Review comment:
   How are we going to use this taskID?
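   One plausible answer, sketched under assumptions (not from the patch): the task ID would let a commit coordinator bucket events per compaction subtask and commit the instant only once every subtask has reported.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   import org.apache.hudi.operator.compact.CompactCommitEvent;

   // Hypothetical coordinator-side buffer keyed by the event's task ID.
   public class CommitEventBuffer {
     private final Map<Integer, CompactCommitEvent> events = new HashMap<>();
     private final int parallelism;

     public CommitEventBuffer(int parallelism) {
       this.parallelism = parallelism;
     }

     // Returns true once all compaction subtasks have reported for the instant.
     public boolean offer(CompactCommitEvent event) {
       events.put(event.getTaskID(), event);
       return events.size() == parallelism;
     }
   }
   ```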

##
File path: hudi-flink/src/main/java/org/apache/hudi/operator/compact/CompactEvent.java
##
@@ 

[GitHub] [hudi] nsivabalan commented on a change in pull request #2596: [HUDI-1636] Support Builder Pattern To Build Table Properties For HoodieTableConfig

2021-02-28 Thread GitBox


nsivabalan commented on a change in pull request #2596:
URL: https://github.com/apache/hudi/pull/2596#discussion_r584292305



##
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -258,4 +260,164 @@ public String getArchivelogFolder() {
   public Properties getProperties() {
 return props;
   }
+
+  public static PropertyBuilder propertyBuilder() {

Review comment:
   are there any fixes or simplifications required for HoodieTableConfig#createHoodieProperties(FileSystem fs, Path metadataFolder, Properties properties)?

##
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -258,4 +260,164 @@ public String getArchivelogFolder() {
   public Properties getProperties() {
 return props;
   }
+
+  public static PropertyBuilder propertyBuilder() {
+return new PropertyBuilder();
+  }
+
+  public static class PropertyBuilder {
+    private HoodieTableType tableType;
+    private String tableName;
+    private String archiveLogFolder;
+    private String payloadClassName;
+    private Integer timelineLayoutVersion;
+    private String baseFileFormat;
+    private String preCombineField;
+    private String bootstrapIndexClass;
+    private String bootstrapBasePath;
+
+    private PropertyBuilder() {
+
+    }
+
+    public PropertyBuilder setTableType(HoodieTableType tableType) {
+      this.tableType = tableType;
+      return this;
+    }
+
+    public PropertyBuilder setTableType(String tableType) {
+      return setTableType(HoodieTableType.valueOf(tableType));
+    }
+
+    public PropertyBuilder setTableName(String tableName) {
+      this.tableName = tableName;
+      return this;
+    }
+
+    public PropertyBuilder setArchiveLogFolder(String archiveLogFolder) {
+      this.archiveLogFolder = archiveLogFolder;
+      return this;
+    }
+
+    public PropertyBuilder setPayloadClassName(String payloadClassName) {
+      this.payloadClassName = payloadClassName;
+      return this;
+    }
+
+    public PropertyBuilder setPayloadClass(Class<?> payloadClass) {
+      return setPayloadClassName(payloadClass.getName());
+    }
+
+    public PropertyBuilder setTimelineLayoutVersion(Integer timelineLayoutVersion) {
+      this.timelineLayoutVersion = timelineLayoutVersion;
+      return this;
+    }
+
+    public PropertyBuilder setBaseFileFormat(String baseFileFormat) {
+      this.baseFileFormat = baseFileFormat;
+      return this;
+    }
+
+    public PropertyBuilder setPreCombineField(String preCombineField) {
+      this.preCombineField = preCombineField;
+      return this;
+    }
+
+    public PropertyBuilder setBootstrapIndexClass(String bootstrapIndexClass) {
+      this.bootstrapIndexClass = bootstrapIndexClass;
+      return this;
+    }
+
+    public PropertyBuilder setBootstrapBasePath(String bootstrapBasePath) {
+      this.bootstrapBasePath = bootstrapBasePath;
+      return this;
+    }
+
+public PropertyBuilder fromMetaClient(HoodieTableMetaClient metaClient) {

Review comment:
   can you point me to where this code existed prior to this patch? I checked HoodieTableMetaClient and couldn't find it.

##
File path: hudi-examples/src/main/java/org/apache/hudi/examples/java/HoodieJavaWriteClientExample.java
##
@@ -72,8 +72,11 @@ public static void main(String[] args) throws Exception {
 Path path = new Path(tablePath);
 FileSystem fs = FSUtils.getFs(tablePath, hadoopConf);
 if (!fs.exists(path)) {
-  HoodieTableMetaClient.initTableType(hadoopConf, tablePath, HoodieTableType.valueOf(tableType),
-  tableName, HoodieAvroPayload.class.getName());
+  HoodieTableConfig.propertyBuilder()
+.setTableType(tableType)
+.setTableName(tableName)
+.setPayloadClassName(HoodieAvroPayload.class.getName())
+.initTable(hadoopConf, tablePath);

Review comment:
   this again makes me feel initTable() should move to HoodieTableMetaClient, because, as you can see here, we start by instantiating HoodieTableConfig.propertyBuilder(), but the final call yields a HoodieTableMetaClient instance, which does not sit well.
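   Sketching the shape that suggestion implies (hypothetical API, not part of this patch; reuses the variables from the example above):

   ```java
   // Hypothetical: the builder hangs off HoodieTableMetaClient, so the fluent
   // chain starts and ends on the same class.
   HoodieTableMetaClient metaClient = HoodieTableMetaClient.withPropertyBuilder()
       .setTableType(tableType)
       .setTableName(tableName)
       .setPayloadClassName(HoodieAvroPayload.class.getName())
       .initTable(hadoopConf, tablePath);
   ```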

##
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -258,4 +260,164 @@ public String getArchivelogFolder() {
   public Properties getProperties() {
 return props;
   }
+
+  public static PropertyBuilder propertyBuilder() {
+return new PropertyBuilder();
+  }
+
+  public static class PropertyBuilder {
+    private HoodieTableType tableType;
+    private String tableName;
+    private String archiveLogFolder;
+    private String payloadClassName;
+    private Integer timelineLayoutVersion;
+    private String baseFileFormat;
+    private String preCombineField;
+    private String bootstrapIndexClass;
+    private String bootstrapBasePath;
+
+    private PropertyBuilder() {
+
+    }
+
+public