[GitHub] [hudi] teeyog commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-22 Thread GitBox


teeyog commented on a change in pull request #2431:
URL: https://github.com/apache/hudi/pull/2431#discussion_r563026619



##
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -181,16 +183,33 @@ object DataSourceWriteOptions {
   @Deprecated
   val DEFAULT_STORAGE_TYPE_OPT_VAL = COW_STORAGE_TYPE_OPT_VAL
 
-  def translateStorageTypeToTableType(optParams: Map[String, String]) : Map[String, String] = {
+  def translateOptParams(optParams: Map[String, String]): Map[String, String] = {
+    // translate StorageType to TableType
+    var newOptParams = optParams
     if (optParams.contains(STORAGE_TYPE_OPT_KEY) && !optParams.contains(TABLE_TYPE_OPT_KEY)) {
       log.warn(STORAGE_TYPE_OPT_KEY + " is deprecated and will be removed in a later release; Please use " + TABLE_TYPE_OPT_KEY)
-      optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
-    } else {
-      optParams
+      newOptParams = optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
     }
+    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD_OPT_KEY
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY) && !optParams.contains(PARTITIONPATH_FIELD_OPT_KEY)) {
+      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+        .map(SparkDataSourceUtils.decodePartitioningColumns)
+        .getOrElse(Nil)
+
+      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
+        DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+      val partitionPathField =
+        keyGeneratorClass match {
+          case "org.apache.hudi.keygen.CustomKeyGenerator" =>

Review comment:
   @wangxianghu All KeyGenerators are considered; only `CustomKeyGenerator` is special, since it requires the user to specify the partition fields in the form `field1:PartitionKeyType1,field2:PartitionKeyType2`.
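
   A minimal sketch of the translation being discussed, assuming the option keys from the diff above; the `toPartitionPathField` helper and the default `SIMPLE` type are illustrative, not part of the PR:
   
   ```scala
   // Hypothetical helper (not PR code): maps Spark's partitionBy columns to
   // hoodie.datasource.write.partitionpath.field per key generator class.
   def toPartitionPathField(keyGeneratorClass: String, partitionColumns: Seq[String]): String =
     keyGeneratorClass match {
       case "org.apache.hudi.keygen.CustomKeyGenerator" =>
         // CustomKeyGenerator expects "field:PartitionKeyType" pairs; SIMPLE is
         // assumed here, a TIMESTAMP field would need explicit user input.
         partitionColumns.map(col => s"$col:SIMPLE").mkString(",")
       case _ =>
         // every other key generator takes a plain comma-separated field list
         partitionColumns.mkString(",")
     }
   
   // e.g. toPartitionPathField("org.apache.hudi.keygen.CustomKeyGenerator", Seq("region", "ts"))
   //   == "region:SIMPLE,ts:SIMPLE"
   ```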





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wangxianghu commented on a change in pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-22 Thread GitBox


wangxianghu commented on a change in pull request #2431:
URL: https://github.com/apache/hudi/pull/2431#discussion_r563024824



##
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -181,16 +183,33 @@ object DataSourceWriteOptions {
   @Deprecated
   val DEFAULT_STORAGE_TYPE_OPT_VAL = COW_STORAGE_TYPE_OPT_VAL
 
-  def translateStorageTypeToTableType(optParams: Map[String, String]) : Map[String, String] = {
+  def translateOptParams(optParams: Map[String, String]): Map[String, String] = {
+    // translate StorageType to TableType
+    var newOptParams = optParams
     if (optParams.contains(STORAGE_TYPE_OPT_KEY) && !optParams.contains(TABLE_TYPE_OPT_KEY)) {
       log.warn(STORAGE_TYPE_OPT_KEY + " is deprecated and will be removed in a later release; Please use " + TABLE_TYPE_OPT_KEY)
-      optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
-    } else {
-      optParams
+      newOptParams = optParams ++ Map(TABLE_TYPE_OPT_KEY -> optParams(STORAGE_TYPE_OPT_KEY))
     }
+    // translate the api partitionBy of spark DataFrameWriter to PARTITIONPATH_FIELD_OPT_KEY
+    if (optParams.contains(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY) && !optParams.contains(PARTITIONPATH_FIELD_OPT_KEY)) {
+      val partitionColumns = optParams.get(SparkDataSourceUtils.PARTITIONING_COLUMNS_KEY)
+        .map(SparkDataSourceUtils.decodePartitioningColumns)
+        .getOrElse(Nil)
+
+      val keyGeneratorClass = optParams.getOrElse(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY,
+        DataSourceWriteOptions.DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+      val partitionPathField =
+        keyGeneratorClass match {
+          case "org.apache.hudi.keygen.CustomKeyGenerator" =>

Review comment:
   @teeyog please take `TimestampBasedKeyGenerator`, `CustomKeyGenerator` (configured with a timestamp partition path), `ComplexKeyGenerator`, etc. into consideration
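
   For reference, a hedged usage sketch of what the translation makes equivalent; the table name, fields, and path below are illustrative, not from the PR:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Sketch only: with the translation in this PR, partitionBy(...) should
   // behave like setting hoodie.datasource.write.partitionpath.field directly
   // (exact behavior per key generator is what this thread is discussing).
   object PartitionByTranslationExample {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().appName("hudi-partitionBy").master("local[1]").getOrCreate()
       import spark.implicits._
       val df = Seq(("id1", "us", "2021-01-22")).toDF("uuid", "region", "ts")
   
       df.write.format("hudi")
         .option("hoodie.table.name", "demo_tbl")
         .option("hoodie.datasource.write.recordkey.field", "uuid")
         .option("hoodie.datasource.write.precombine.field", "ts")
         .partitionBy("region") // forwarded to hoodie.datasource.write.partitionpath.field when unset
         .mode("overwrite")
         .save("file:///tmp/hudi_partitionby_demo")
   
       spark.stop()
     }
   }
   ```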





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2477: [MINOR] Use skipTests flag for skip.hudi-spark2.unit.tests property

2021-01-22 Thread GitBox


codecov-io edited a comment on pull request #2477:
URL: https://github.com/apache/hudi/pull/2477#issuecomment-765873084


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2477?src=pr=h1) Report
   > Merging 
[#2477](https://codecov.io/gh/apache/hudi/pull/2477?src=pr=desc) (dd2198f) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/048633da1a913a05252b1b5dea0b3d40d75c81b4?el=desc)
 (048633d) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2477/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2477?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2477      +/-   ##
   ============================================
     Coverage     50.17%    50.17%            
   - Complexity     3050      3051        +1  
   ============================================
     Files           419       419            
     Lines         18931     18931            
     Branches       1948      1948            
   ============================================
   + Hits           9498      9499        +1  
     Misses         8657      8657            
   + Partials        776       775        -1  
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiflink | `0.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `65.85% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.48% <ø> (+0.05%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2477?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.86% <0.00%> (+0.35%)` | `51.00% <0.00%> (+1.00%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2477: [MINOR] Use skipTests flag for skip.hudi-spark2.unit.tests property

2021-01-22 Thread GitBox


codecov-io commented on pull request #2477:
URL: https://github.com/apache/hudi/pull/2477#issuecomment-765873084


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2477?src=pr=h1) Report
   > Merging 
[#2477](https://codecov.io/gh/apache/hudi/pull/2477?src=pr=desc) (dd2198f) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/048633da1a913a05252b1b5dea0b3d40d75c81b4?el=desc)
 (048633d) will **decrease** coverage by `0.26%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2477/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2477?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2477      +/-   ##
   ============================================
   - Coverage     50.17%    49.91%    -0.27%  
   + Complexity     3050      2868      -182  
   ============================================
     Files           419       398       -21  
     Lines         18931     17497     -1434  
     Branches       1948      1814      -134  
   ============================================
   - Hits           9498      8733      -765  
   + Misses         8657      8055      -602  
   + Partials        776       709       -67  
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.47% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiflink | `0.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `65.85% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.48% <ø> (+0.05%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2477?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...c/main/java/org/apache/hudi/dla/DLASyncConfig.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0RMQVN5bmNDb25maWcuamF2YQ==)
 | | | |
   | 
[...c/main/java/org/apache/hudi/hive/HiveSyncTool.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZVN5bmNUb29sLmphdmE=)
 | | | |
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh)
 | | | |
   | 
[.../apache/hudi/hive/MultiPartKeysValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTXVsdGlQYXJ0S2V5c1ZhbHVlRXh0cmFjdG9yLmphdmE=)
 | | | |
   | 
[...src/main/java/org/apache/hudi/dla/DLASyncTool.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL0RMQVN5bmNUb29sLmphdmE=)
 | | | |
   | 
[...di/timeline/service/handlers/FileSliceHandler.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvRmlsZVNsaWNlSGFuZGxlci5qYXZh)
 | | | |
   | 
[.../src/main/java/org/apache/hudi/dla/util/Utils.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktZGxhLXN5bmMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZGxhL3V0aWwvVXRpbHMuamF2YQ==)
 | | | |
   | 
[...va/org/apache/hudi/hive/util/ColumnNameXLator.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvdXRpbC9Db2x1bW5OYW1lWExhdG9yLmphdmE=)
 | | | |
   | 
[...i/hive/SlashEncodedDayPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkRGF5UGFydGl0aW9uVmFsdWVFeHRyYWN0b3IuamF2YQ==)
 | | | |
   | 
[...e/hudi/timeline/service/FileSystemViewHandler.java](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvRmlsZVN5c3RlbVZpZXdIYW5kbGVyLmphdmE=)
 | | | |
   | ... and [12 
more](https://codecov.io/gh/apache/hudi/pull/2477/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and 

[GitHub] [hudi] xushiyan opened a new pull request #2477: [MINOR] Use skipTests flag for skip.hudi-spark2.unit.tests property

2021-01-22 Thread GitBox


xushiyan opened a new pull request #2477:
URL: https://github.com/apache/hudi/pull/2477


   To control the property with the master flag `skipTests`, which is used in `scripts/run_travis_tests.sh:25`.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


codecov-io edited a comment on pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#issuecomment-765495259


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=h1) Report
   > Merging 
[#2475](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=desc) (9c38d02) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc)
 (e302c6b) will **increase** coverage by `19.24%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2475/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2475       +/-   ##
   =============================================
   + Coverage     50.18%    69.43%    +19.24%  
   + Complexity     3050       357      -2693  
   =============================================
     Files           419        53       -366  
     Lines         18931      1930     -17001  
     Branches       1948       230      -1718  
   =============================================
   - Hits           9500      1340      -8160  
   + Misses         8656       456      -8200  
   + Partials        775       134       -641  
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==)
 | | | |
   | 
[...ava/org/apache/hudi/common/model/HoodieRecord.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlY29yZC5qYXZh)
 | | | |
   | 
[...metadata/HoodieMetadataMergedLogRecordScanner.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllTWV0YWRhdGFNZXJnZWRMb2dSZWNvcmRTY2FubmVyLmphdmE=)
 | | | |
   | 
[...g/apache/hudi/exception/HoodieRemoteException.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZVJlbW90ZUV4Y2VwdGlvbi5qYXZh)
 | | | |
   | 
[.../apache/hudi/common/table/log/HoodieLogFormat.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXQuamF2YQ==)
 | | | |
   | 
[...n/java/org/apache/hudi/cli/HoodieSplashScreen.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVNwbGFzaFNjcmVlbi5qYXZh)
 | | | |
   | 
[.../hudi/hadoop/realtime/HoodieRealtimeFileSplit.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lRmlsZVNwbGl0LmphdmE=)
 | | | |
   | 
[...org/apache/hudi/hadoop/realtime/RealtimeSplit.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL1JlYWx0aW1lU3BsaXQuamF2YQ==)
 | | | |
   | 
[...n/java/org/apache/hudi/common/HoodieCleanStat.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL0hvb2RpZUNsZWFuU3RhdC5qYXZh)
 | | | |
   | 
[...udi/common/table/log/block/HoodieCorruptBlock.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVDb3JydXB0QmxvY2suamF2YQ==)
 | | | |
   | ... and [355 
more](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


codecov-io edited a comment on pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#issuecomment-765495259


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=h1) Report
   > Merging 
[#2475](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=desc) (9c38d02) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc)
 (e302c6b) will **decrease** coverage by `40.49%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2475/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2475      +/-   ##
   ============================================
   - Coverage     50.18%     9.68%   -40.50%  
   + Complexity     3050        48     -3002  
   ============================================
     Files           419        53      -366  
     Lines         18931      1930    -17001  
     Branches       1948       230     -1718  
   ============================================
   - Hits           9500       187     -9313  
   + Misses         8656      1730     -6926  
   + Partials        775        13      -762  
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.75%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] wangxianghu commented on pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-22 Thread GitBox


wangxianghu commented on pull request #2431:
URL: https://github.com/apache/hudi/pull/2431#issuecomment-765865527


   > > Hi @teeyog, thanks for your contribution!
   > > can you add some tests to verify this change?
   > 
   > @wangxianghu Tests have been added
   
   Thanks @teeyog, will review soon



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-1543) Fix NPE using HoodieFlinkStreamer to etl data from kafka to hudi

2021-01-22 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-1543.
--
Resolution: Fixed

> Fix NPE using HoodieFlinkStreamer to etl data from kafka to hudi
> 
>
> Key: HUDI-1543
> URL: https://issues.apache.org/jira/browse/HUDI-1543
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>
> ```
> java.lang.NullPointerException: Keyed state can only be used on a 'keyed 
> stream', i.e., after a 'keyBy()' operation.
> at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:75)
> at 
> org.apache.flink.streaming.api.operators.StreamingRuntimeContext.checkPreconditionsAndGetKeyedStateStore(StreamingRuntimeContext.java:223)
> at 
> org.apache.flink.streaming.api.operators.StreamingRuntimeContext.getMapState(StreamingRuntimeContext.java:216)
> at 
> org.apache.hudi.index.state.FlinkInMemoryStateIndex.<init>(FlinkInMemoryStateIndex.java:58)
> at 
> org.apache.hudi.index.FlinkHoodieIndex.createIndex(FlinkHoodieIndex.java:61)
> at 
> org.apache.hudi.client.HoodieFlinkWriteClient.createIndex(HoodieFlinkWriteClient.java:75)
> at 
> org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:136)
> at 
> org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:120)
> at 
> org.apache.hudi.client.HoodieFlinkWriteClient.<init>(HoodieFlinkWriteClient.java:62)
> at 
> org.apache.hudi.operator.InstantGenerateOperator.open(InstantGenerateOperator.java:115)
> at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:291)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
> at java.lang.Thread.run(Thread.java:748)
> ```
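
A minimal Flink sketch (hypothetical class names, Scala API; not the Hudi fix itself) of the precondition behind this NPE: keyed state such as `MapState` only exists downstream of a `keyBy()`.

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._

// Keyed state may only be acquired on a keyed stream; doing this before a
// keyBy() fails with exactly the precondition error in the trace above.
class CountingMapper extends RichMapFunction[(String, Int), (String, Int)] {
  @transient private var seen: MapState[String, Integer] = _

  override def open(parameters: Configuration): Unit = {
    seen = getRuntimeContext.getMapState(
      new MapStateDescriptor[String, Integer]("seen", classOf[String], classOf[Integer]))
  }

  override def map(in: (String, Int)): (String, Int) = {
    val count = Option(seen.get(in._1)).map(_.intValue).getOrElse(0) + 1
    seen.put(in._1, count)
    (in._1, count)
  }
}

object KeyedStateExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromElements(("a", 1), ("b", 2), ("a", 3))
      .keyBy(_._1) // removing this keyBy reproduces the failure mode
      .map(new CountingMapper)
      .print()
    env.execute("keyed-state-example")
  }
}
```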



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-1453] Fix NPE using HoodieFlinkStreamer to etl data from kafka to hudi (#2474)

2021-01-22 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new e302c6b  [HUDI-1453] Fix NPE using HoodieFlinkStreamer to etl data 
from kafka to hudi (#2474)
e302c6b is described below

commit e302c6bc12c7eb764781898fdee8ee302ef4ec10
Author: wangxianghu 
AuthorDate: Sat Jan 23 10:27:40 2021 +0800

[HUDI-1453] Fix NPE using HoodieFlinkStreamer to etl data from kafka to 
hudi (#2474)
---
 .../src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java b/hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java
index 58cad4e..75c7668 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java
@@ -109,7 +109,7 @@ public class InstantGenerateOperator extends AbstractStreamOperator

[GitHub] [hudi] yanghua merged pull request #2474: [HUDI-1453] Fix NPE using HoodieFlinkStreamer to etl data from kafka to hudi

2021-01-22 Thread GitBox


yanghua merged pull request #2474:
URL: https://github.com/apache/hudi/pull/2474


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wangxianghu commented on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific parti…

2021-01-22 Thread GitBox


wangxianghu commented on pull request #2452:
URL: https://github.com/apache/hudi/pull/2452#issuecomment-765808446


   > can we just have this implemented as a replace of the partition, where all files are replaced by an empty list? The cleaner would automatically clean the partition that way. Love to keep all of our tooling flexible at the file level, working with existing actions and the timeline.
   
   I will give it a try.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1544) Add unit test against HoodieFlinkStreamer

2021-01-22 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu updated HUDI-1544:
--
Summary: Add unit test against HoodieFlinkStreamer  (was: Add 
HoodieFlinkStreamer unit test)

> Add unit test against HoodieFlinkStreamer
> -
>
> Key: HUDI-1544
> URL: https://issues.apache.org/jira/browse/HUDI-1544
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: wangxianghu
>Priority: Major
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1544) Add unit test against HoodieFlinkStreamer

2021-01-22 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu reassigned HUDI-1544:
-

Assignee: wangxianghu

> Add unit test against HoodieFlinkStreamer
> -
>
> Key: HUDI-1544
> URL: https://issues.apache.org/jira/browse/HUDI-1544
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] wangxianghu commented on pull request #2473: [HOTFIX] Revert upgrade flink verison to 1.12.0

2021-01-22 Thread GitBox


wangxianghu commented on pull request #2473:
URL: https://github.com/apache/hudi/pull/2473#issuecomment-765808074


   > Good catch. Let's file a JIRA to close the testing gap around this.
   > 
   > Please merge into master when ready. I will port to the 0.7.0 branch to do an RC2.
   
   filed here: https://issues.apache.org/jira/browse/HUDI-1544



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1544) Add HoodieFlinkStreamer unit test

2021-01-22 Thread wangxianghu (Jira)
wangxianghu created HUDI-1544:
-

 Summary: Add HoodieFlinkStreamer unit test
 Key: HUDI-1544
 URL: https://issues.apache.org/jira/browse/HUDI-1544
 Project: Apache Hudi
  Issue Type: Test
Reporter: wangxianghu
 Fix For: 0.8.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] wangxianghu commented on pull request #2473: [HOTFIX] Revert upgrade flink verison to 1.12.0

2021-01-22 Thread GitBox


wangxianghu commented on pull request #2473:
URL: https://github.com/apache/hudi/pull/2473#issuecomment-765805235


   > @wangxianghu have you verified that this fix makes the flink path happy? i.e., any more fixes to do?
   
   I tested it in our dev env; it is OK now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] rubenssoto commented on pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


rubenssoto commented on pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#issuecomment-765712586


   This is a great and important feature to make Hudi easier for non-heavy users.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific parti…

2021-01-22 Thread GitBox


vinothchandar commented on pull request #2452:
URL: https://github.com/apache/hudi/pull/2452#issuecomment-765711222


   can we just have this implemented as a replace of the partition, where all files are replaced by an empty list? The cleaner would automatically clean the partition that way. Love to keep all of our tooling flexible at the file level, working with existing actions and the timeline.
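
   A purely illustrative Scala sketch of that idea (hypothetical types, not Hudi APIs): model partition deletion as a replacecommit whose replacement file list is empty, so the cleaner later reclaims the old files.
   
   ```scala
   // Hypothetical model only: a replacecommit that swaps a partition's current
   // file groups for an empty list, leaving them all to be cleaned.
   final case class ReplaceCommit(
       partition: String,
       replacedFileIds: Seq[String], // file groups logically removed
       newFileIds: Seq[String])      // empty => partition ends up with no data
   
   def deletePartitionViaReplace(partition: String, currentFileIds: Seq[String]): ReplaceCommit =
     ReplaceCommit(partition, replacedFileIds = currentFileIds, newFileIds = Nil)
   
   // e.g. deletePartitionViaReplace("2021/01/22", Seq("fg-1", "fg-2"))
   //   => a commit replacing fg-1 and fg-2 with nothing
   ```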



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] adaniline-paytm commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-22 Thread GitBox


adaniline-paytm commented on issue #2285:
URL: https://github.com/apache/hudi/issues/2285#issuecomment-765700396


   I have the same sporadic issue, using the standard Spark 2.4.7 distribution and Hudi 0.6:
   ```
   $ ls -l /opt/spark-2.4.7-bin-without-hadoop/jars/parquet-*
/opt/spark-2.4.7-bin-without-hadoop/jars/parquet-column-1.10.1.jar
   /opt/spark-2.4.7-bin-without-hadoop/jars/parquet-common-1.10.1.jar
   /opt/spark-2.4.7-bin-without-hadoop/jars/parquet-encoding-1.10.1.jar
   /opt/spark-2.4.7-bin-without-hadoop/jars/parquet-format-2.4.0.jar
   /opt/spark-2.4.7-bin-without-hadoop/jars/parquet-hadoop-1.10.1.jar
   /opt/spark-2.4.7-bin-without-hadoop/jars/parquet-jackson-1.10.1.jar
   ```
   the only workaround we found is to disable VectorizedReader:
   ```
 rc.spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
   ```
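
   A hedged, self-contained sketch of the same workaround applied when the session is built (`rc` above appears to be the poster's own session wrapper; the table path below is illustrative):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   object DisableVectorizedReader {
     def main(args: Array[String]): Unit = {
       // Disabling the vectorized Parquet reader sidesteps the reported
       // exception on snapshot queries that race with compaction.
       val spark = SparkSession.builder()
         .appName("hudi-snapshot-read")
         .master("local[1]")
         .config("spark.sql.parquet.enableVectorizedReader", "false")
         .getOrCreate()
   
       spark.read.format("hudi").load("file:///tmp/hudi_table").show(10)
       spark.stop()
     }
   }
   ```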



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




svn commit: r45548 - in /dev/hudi/hudi-0.7.0-rc2: ./ hudi-0.7.0-rc2.src.tgz hudi-0.7.0-rc2.src.tgz.asc hudi-0.7.0-rc2.src.tgz.sha512

2021-01-22 Thread vinoth
Author: vinoth
Date: Fri Jan 22 20:33:42 2021
New Revision: 45548

Log:
Hudi 0.7.0 RC2

Added:
dev/hudi/hudi-0.7.0-rc2/
dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz   (with props)
dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz.asc
dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz.sha512

Added: dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz
==
Binary file - no diff available.

Propchange: dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz
--
svn:mime-type = application/octet-stream

Added: dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz.asc
==
--- dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz.asc (added)
+++ dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz.asc Fri Jan 22 20:33:42 2021
@@ -0,0 +1,11 @@
+-BEGIN PGP SIGNATURE-
+
+iQEzBAABCAAdFiEEfyo765IhgbBqyxqkX30J5YHSvLYFAmALJ6gACgkQX30J5YHS
+vLaBigf8DAGbcRyDmyrC+ydNfq6TJFr92h7NYl1CDplxUi0lt38QkNOsKwc8p4v1
+GOJoC5C0W+Y1HlLsBjMGgz7Xt/kZ/CpcFj4gMpOd0d5LA/W+Rqopcb8FpbyWld2w
+UjT7loXKMo1LqOD+Y05s9OoWaRl4Aj0wXO5fMtudUZgBYebhrdsnMr58mYJPcXe3
+AqiGSqP155jNlRW2X46/I4z6TKnNIZB4zEL4yTW7HpP/8p6U0E7zQak1kscuA+9t
+2vueMpmJlmEKfMWG/qYi/0M0AkNM6k/ZASzhy6Pe+Pg5snKvn8T7brxb/3BRByB9
+aGv3F7dNk1myUF1W9mSyPCL0Y0+2Ng==
+=KkY9
+-END PGP SIGNATURE-

Added: dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz.sha512
==
--- dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz.sha512 (added)
+++ dev/hudi/hudi-0.7.0-rc2/hudi-0.7.0-rc2.src.tgz.sha512 Fri Jan 22 20:33:42 
2021
@@ -0,0 +1 @@
+a07ed687a3f55843943f21e1b02e4e5d6e5a5b1bdb88aa890876b25f8b1965022710a2fd4ed83e6cffbe7f6d0f37c2312acd0387a57a014e2c5eae4427e2f102
  hudi-0.7.0-rc2.src.tgz




[GitHub] [hudi] vinothchandar commented on a change in pull request #2426: [HUDI-304] Configure spotless and java style

2021-01-22 Thread GitBox


vinothchandar commented on a change in pull request #2426:
URL: https://github.com/apache/hudi/pull/2426#discussion_r562873786



##
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##
@@ -18,36 +18,36 @@
 
 package org.apache.hudi.hive;
 
-import java.io.UnsupportedEncodingException;
-import java.net.URLDecoder;
-import java.nio.charset.StandardCharsets;
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.hive.metastore.api.Table;
-import org.apache.hadoop.hive.metastore.api.Database;
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.fs.StorageSchemes;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Option;
 import org.apache.hudi.common.util.ValidationUtils;
 import org.apache.hudi.hive.util.HiveSchemaUtil;
+import org.apache.hudi.sync.common.AbstractSyncHoodieClient;
 
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.api.Database;

Review comment:
   does this organization work consistently with `Code > Optimize Imports` in IntelliJ? That was the big sticking point last time.

##
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##
@@ -76,12 +76,18 @@
   private HiveConf configuration;
 
   public HoodieHiveClient(HiveSyncConfig cfg, HiveConf configuration, FileSystem fs) {
-    super(cfg.basePath, cfg.assumeDatePartitioning, cfg.useFileListingFromMetadata, cfg.verifyMetadataFileListing, fs);
+    super(
+        cfg.basePath,

Review comment:
   personally, not a great fan of using a full line for each method argument. Where is this style coming from?
   
   I would be happy to concede, but I'd like to understand what style we are picking here.
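
   For context, a small Scala sketch (illustrative names only) contrasting the two call-wrapping styles under discussion:
   
   ```scala
   object ArgWrappingStyles {
     // Hypothetical method, just to show the formatting difference.
     def connect(host: String, port: Int, user: String, timeoutMs: Long): Unit = ()
   
     def main(args: Array[String]): Unit = {
       // Style in the diff: one argument per line.
       connect(
         "localhost",
         8080,
         "hudi",
         30000L)
   
       // Packed style: arguments share lines, wrapped only at the column limit.
       connect("localhost", 8080, "hudi", 30000L)
     }
   }
   ```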

##
File path: pom.xml
##
@@ -198,34 +200,35 @@
   
 
   

[hudi] annotated tag release-0.7.0-rc2 updated (0473191 -> ee452e2)

2021-01-22 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to annotated tag release-0.7.0-rc2
in repository https://gitbox.apache.org/repos/asf/hudi.git.


*** WARNING: tag release-0.7.0-rc2 was modified! ***

from 0473191  (commit)
  to ee452e2  (tag)
 tagging 04731916a5dd782c1f1f248f85085245a4c7ee7d (commit)
 replaces release-0.7.0-rc1
  by Vinoth Chandar
  on Fri Jan 22 11:30:26 2021 -0800

- Log -
0.7.0
-BEGIN PGP SIGNATURE-

iQEzBAABCAAdFiEEfyo765IhgbBqyxqkX30J5YHSvLYFAmALJ9IACgkQX30J5YHS
vLbE0wf8DAKL47LNum2YBkiwt1FkdrEsObHB+AmXs6yMKKk+DPQ+Hx28StR4pU4a
DIOp4kMlvdImV3+SBmvyij4KMRxBtZSW2JPXSvSamPZEctf6vsHZfxKGihU0dT07
hu01i1WnvJ+y4aVITntY8xmldGBheO5iiBRED/tsA+IDjh13s4etdoLveTv5aPsK
ZxCmayT0hllUg9Fg0lpMum3VBATC78WKuNx/tm+ilWbN/XbQFyI/335X8R5xQH4l
m+i4lIMcvCkCSqe+R+fkv4UDv/lzZqPFAhpX2by930czXNIySjNb0G9PCXNZ7Jjf
iDq/kELqCR5CyL+4tVs/6fMtr7owkw==
=hJzl
-END PGP SIGNATURE-
---


No new revisions were added by this update.

Summary of changes:



[hudi] branch release-0.7.0 updated (6d58ace -> 0473191)

2021-01-22 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch release-0.7.0
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 6d58ace  [MINOR] Add License to test.properties
 new 7aeb3cc  [HOTFIX] Revert upgrade flink verison to 1.12.0 (#2473)
 new 0473191  Bumping release candidate number 2

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docker/hoodie/hadoop/base/pom.xml   | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml   | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml  | 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml  | 2 +-
 docker/hoodie/hadoop/namenode/pom.xml   | 2 +-
 docker/hoodie/hadoop/pom.xml| 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml| 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml| 2 +-
 hudi-cli/pom.xml| 2 +-
 hudi-client/hudi-client-common/pom.xml  | 4 ++--
 hudi-client/hudi-flink-client/pom.xml   | 4 ++--
 hudi-client/hudi-java-client/pom.xml| 4 ++--
 hudi-client/hudi-spark-client/pom.xml   | 4 ++--
 hudi-client/pom.xml | 2 +-
 hudi-common/pom.xml | 2 +-
 hudi-examples/pom.xml   | 2 +-
 hudi-flink/pom.xml  | 2 +-
 hudi-hadoop-mr/pom.xml  | 2 +-
 hudi-integ-test/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark2/pom.xml   | 4 ++--
 hudi-spark-datasource/hudi-spark3/pom.xml   | 4 ++--
 hudi-spark-datasource/pom.xml   | 2 +-
 hudi-sync/hudi-dla-sync/pom.xml | 2 +-
 hudi-sync/hudi-hive-sync/pom.xml| 2 +-
 hudi-sync/hudi-sync-common/pom.xml  | 2 +-
 hudi-sync/pom.xml   | 2 +-
 hudi-timeline-service/pom.xml   | 2 +-
 hudi-utilities/pom.xml  | 2 +-
 packaging/hudi-flink-bundle/pom.xml | 9 -
 packaging/hudi-hadoop-mr-bundle/pom.xml | 2 +-
 packaging/hudi-hive-sync-bundle/pom.xml | 2 +-
 packaging/hudi-integ-test-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml| 2 +-
 packaging/hudi-spark-bundle/pom.xml | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml   | 2 +-
 packaging/hudi-utilities-bundle/pom.xml | 2 +-
 pom.xml | 4 ++--
 42 files changed, 58 insertions(+), 51 deletions(-)



[hudi] 01/02: [HOTFIX] Revert upgrade flink verison to 1.12.0 (#2473)

2021-01-22 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch release-0.7.0
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 7aeb3ccff33c740c07456d7c9ec22e9d574e6650
Author: wangxianghu 
AuthorDate: Sat Jan 23 02:55:46 2021 +0800

[HOTFIX] Revert upgrade flink verison to 1.12.0 (#2473)
---
 packaging/hudi-flink-bundle/pom.xml | 7 +++
 pom.xml | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/packaging/hudi-flink-bundle/pom.xml b/packaging/hudi-flink-bundle/pom.xml
index 7358faa..3f3a5e6 100644
--- a/packaging/hudi-flink-bundle/pom.xml
+++ b/packaging/hudi-flink-bundle/pom.xml
@@ -104,6 +104,7 @@
                   <include>io.prometheus:simpleclient_common</include>
                   <include>com.yammer.metrics:metrics-core</include>
                   <include>org.apache.flink:flink-connector-kafka_${scala.binary.version}</include>
+                  <include>org.apache.flink:flink-connector-kafka-base_${scala.binary.version}</include>
                   <include>org.apache.kafka:kafka_${scala.binary.version}</include>
                   <include>com.101tec:zkclient</include>
                   <include>org.apache.kafka:kafka-clients</include>
@@ -189,6 +190,12 @@
       <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
       <scope>compile</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.flink</groupId>
+      <artifactId>flink-connector-kafka-base_${scala.binary.version}</artifactId>
+      <version>${flink.version}</version>
+      <scope>compile</scope>
+    </dependency>
   </dependencies>
diff --git a/pom.xml b/pom.xml
index b20ef08..8a7e24c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -106,7 +106,7 @@
 0.8.0
 4.4.1
 2.4.4
-1.12.0
+1.11.2
 2.4.4
 3.0.0
 1.8.2



[hudi] 02/02: Bumping release candidate number 2

2021-01-22 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch release-0.7.0
in repository https://gitbox.apache.org/repos/asf/hudi.git

commit 04731916a5dd782c1f1f248f85085245a4c7ee7d
Author: Vinoth Chandar 
AuthorDate: Fri Jan 22 11:23:55 2021 -0800

Bumping release candidate number 2
---
 docker/hoodie/hadoop/base/pom.xml   | 2 +-
 docker/hoodie/hadoop/datanode/pom.xml   | 2 +-
 docker/hoodie/hadoop/historyserver/pom.xml  | 2 +-
 docker/hoodie/hadoop/hive_base/pom.xml  | 2 +-
 docker/hoodie/hadoop/namenode/pom.xml   | 2 +-
 docker/hoodie/hadoop/pom.xml| 2 +-
 docker/hoodie/hadoop/prestobase/pom.xml | 2 +-
 docker/hoodie/hadoop/spark_base/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkadhoc/pom.xml | 2 +-
 docker/hoodie/hadoop/sparkmaster/pom.xml| 2 +-
 docker/hoodie/hadoop/sparkworker/pom.xml| 2 +-
 hudi-cli/pom.xml| 2 +-
 hudi-client/hudi-client-common/pom.xml  | 4 ++--
 hudi-client/hudi-flink-client/pom.xml   | 4 ++--
 hudi-client/hudi-java-client/pom.xml| 4 ++--
 hudi-client/hudi-spark-client/pom.xml   | 4 ++--
 hudi-client/pom.xml | 2 +-
 hudi-common/pom.xml | 2 +-
 hudi-examples/pom.xml   | 2 +-
 hudi-flink/pom.xml  | 2 +-
 hudi-hadoop-mr/pom.xml  | 2 +-
 hudi-integ-test/pom.xml | 2 +-
 hudi-spark-datasource/hudi-spark-common/pom.xml | 4 ++--
 hudi-spark-datasource/hudi-spark/pom.xml| 4 ++--
 hudi-spark-datasource/hudi-spark2/pom.xml   | 4 ++--
 hudi-spark-datasource/hudi-spark3/pom.xml   | 4 ++--
 hudi-spark-datasource/pom.xml   | 2 +-
 hudi-sync/hudi-dla-sync/pom.xml | 2 +-
 hudi-sync/hudi-hive-sync/pom.xml| 2 +-
 hudi-sync/hudi-sync-common/pom.xml  | 2 +-
 hudi-sync/pom.xml   | 2 +-
 hudi-timeline-service/pom.xml   | 2 +-
 hudi-utilities/pom.xml  | 2 +-
 packaging/hudi-flink-bundle/pom.xml | 2 +-
 packaging/hudi-hadoop-mr-bundle/pom.xml | 2 +-
 packaging/hudi-hive-sync-bundle/pom.xml | 2 +-
 packaging/hudi-integ-test-bundle/pom.xml| 2 +-
 packaging/hudi-presto-bundle/pom.xml| 2 +-
 packaging/hudi-spark-bundle/pom.xml | 2 +-
 packaging/hudi-timeline-server-bundle/pom.xml   | 2 +-
 packaging/hudi-utilities-bundle/pom.xml | 2 +-
 pom.xml | 2 +-
 42 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/docker/hoodie/hadoop/base/pom.xml b/docker/hoodie/hadoop/base/pom.xml
index 9fda7ee..27e4f4d 100644
--- a/docker/hoodie/hadoop/base/pom.xml
+++ b/docker/hoodie/hadoop/base/pom.xml
@@ -19,7 +19,7 @@
   <parent>
     <artifactId>hudi-hadoop-docker</artifactId>
     <groupId>org.apache.hudi</groupId>
-    <version>0.7.0-rc1</version>
+    <version>0.7.0-rc2</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <packaging>pom</packaging>
diff --git a/docker/hoodie/hadoop/datanode/pom.xml b/docker/hoodie/hadoop/datanode/pom.xml
index ee1d90b..9ec6f37 100644
--- a/docker/hoodie/hadoop/datanode/pom.xml
+++ b/docker/hoodie/hadoop/datanode/pom.xml
@@ -19,7 +19,7 @@
   <parent>
     <artifactId>hudi-hadoop-docker</artifactId>
     <groupId>org.apache.hudi</groupId>
-    <version>0.7.0-rc1</version>
+    <version>0.7.0-rc2</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <packaging>pom</packaging>
diff --git a/docker/hoodie/hadoop/historyserver/pom.xml b/docker/hoodie/hadoop/historyserver/pom.xml
index 6d2424f..db1442e 100644
--- a/docker/hoodie/hadoop/historyserver/pom.xml
+++ b/docker/hoodie/hadoop/historyserver/pom.xml
@@ -19,7 +19,7 @@
   <parent>
     <artifactId>hudi-hadoop-docker</artifactId>
     <groupId>org.apache.hudi</groupId>
-    <version>0.7.0-rc1</version>
+    <version>0.7.0-rc2</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <packaging>pom</packaging>
diff --git a/docker/hoodie/hadoop/hive_base/pom.xml b/docker/hoodie/hadoop/hive_base/pom.xml
index 5757082..3765068 100644
--- a/docker/hoodie/hadoop/hive_base/pom.xml
+++ b/docker/hoodie/hadoop/hive_base/pom.xml
@@ -19,7 +19,7 @@
   <parent>
     <artifactId>hudi-hadoop-docker</artifactId>
     <groupId>org.apache.hudi</groupId>
-    <version>0.7.0-rc1</version>
+    <version>0.7.0-rc2</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <packaging>pom</packaging>
diff --git a/docker/hoodie/hadoop/namenode/pom.xml b/docker/hoodie/hadoop/namenode/pom.xml
index 01ee9f4..da10900 100644
--- a/docker/hoodie/hadoop/namenode/pom.xml
+++ b/docker/hoodie/hadoop/namenode/pom.xml
@@ -19,7 +19,7 @@
   <parent>
     <artifactId>hudi-hadoop-docker</artifactId>
     <groupId>org.apache.hudi</groupId>
-    <version>0.7.0-rc1</version>
+    <version>0.7.0-rc2</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <packaging>pom</packaging>
diff --git a/docker/hoodie/hadoop/pom.xml b/docker/hoodie/hadoop/pom.xml
index 60e9d13..bb5b7c6 100644
--- a/docker/hoodie/hadoop/pom.xml
+++ b/docker/hoodie/hadoop/pom.xml
@@ -19,7 +19,7 @@
   <parent>
     <artifactId>hudi</artifactId>
     <groupId>org.apache.hudi</groupId>
-    <version>0.7.0-rc1</version>
+    <version>0.7.0-rc2</version>
     <relativePath>../../../pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
diff --git a/docker/hoodie/hadoop/prestobase/pom.xml b/docker/hoodie/hadoop/prestobase/pom.xml
index df3c3c5..3d3fcbf 100644
--- a/docker/hoodie/hadoop/prestobase/pom.xml
+++ b/docker/hoodie/hadoop/prestobase/pom.xml
@@ -20,7 +20,7 @@
   <parent>
     <artifactId>hudi-hadoop-docker</artifactId>
     <groupId>org.apache.hudi</groupId>
-    <version>0.7.0-rc1</version>
+

[hudi] branch master updated: [HOTFIX] Revert upgrade flink verison to 1.12.0 (#2473)

2021-01-22 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new d3ea0f9  [HOTFIX] Revert upgrade flink verison to 1.12.0 (#2473)
d3ea0f9 is described below

commit d3ea0f957e231029c5c21ac479039159aa499cb4
Author: wangxianghu 
AuthorDate: Sat Jan 23 02:55:46 2021 +0800

[HOTFIX] Revert upgrade flink verison to 1.12.0 (#2473)
---
 packaging/hudi-flink-bundle/pom.xml | 7 +++
 pom.xml | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/packaging/hudi-flink-bundle/pom.xml b/packaging/hudi-flink-bundle/pom.xml
index 076ee43..b8a8cd1 100644
--- a/packaging/hudi-flink-bundle/pom.xml
+++ b/packaging/hudi-flink-bundle/pom.xml
@@ -104,6 +104,7 @@
                   <include>io.prometheus:simpleclient_common</include>
                   <include>com.yammer.metrics:metrics-core</include>
                   <include>org.apache.flink:flink-connector-kafka_${scala.binary.version}</include>
+                  <include>org.apache.flink:flink-connector-kafka-base_${scala.binary.version}</include>
                   <include>org.apache.kafka:kafka_${scala.binary.version}</include>
                   <include>com.101tec:zkclient</include>
                   <include>org.apache.kafka:kafka-clients</include>
@@ -189,6 +190,12 @@
       <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
       <scope>compile</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.flink</groupId>
+      <artifactId>flink-connector-kafka-base_${scala.binary.version}</artifactId>
+      <version>${flink.version}</version>
+      <scope>compile</scope>
+    </dependency>
   </dependencies>
diff --git a/pom.xml b/pom.xml
index 4f8b153..d4e7b6b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -106,7 +106,7 @@
 0.8.0
 4.4.1
 ${spark2.version}
-1.12.0
+1.11.2
 2.4.4
 3.0.0
 1.8.2



[GitHub] [hudi] vinothchandar merged pull request #2473: [HOTFIX] Revert upgrade flink verison to 1.12.0

2021-01-22 Thread GitBox


vinothchandar merged pull request #2473:
URL: https://github.com/apache/hudi/pull/2473


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name

2021-01-22 Thread GitBox


codecov-io edited a comment on pull request #2476:
URL: https://github.com/apache/hudi/pull/2476#issuecomment-765617591


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2476?src=pr=h1) Report
   > Merging 
[#2476](https://codecov.io/gh/apache/hudi/pull/2476?src=pr=desc) (d504a97) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/048633da1a913a05252b1b5dea0b3d40d75c81b4?el=desc)
 (048633d) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2476/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2476?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2476      +/-   ##
   ============================================
   + Coverage     50.17%    50.18%    +0.01%  
   + Complexity     3050      3049        -1  
   ============================================
     Files           419       419            
     Lines         18931     18930        -1  
     Branches       1948      1947        -1  
   ============================================
   + Hits           9498      9500        +2  
   + Misses         8657      8656        -1  
   + Partials        776       774        -2  
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.49% <ø> (+0.02%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `0.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `65.85% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.46% <100.00%> (+0.03%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2476?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `64.53% <100.00%> (+0.37%)` | `32.00 <1.00> (-1.00)` | :arrow_up: |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `79.68% <0.00%> (+1.56%)` | `26.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name

2021-01-22 Thread GitBox


codecov-io commented on pull request #2476:
URL: https://github.com/apache/hudi/pull/2476#issuecomment-765617591


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2476?src=pr=h1) Report
   > Merging 
[#2476](https://codecov.io/gh/apache/hudi/pull/2476?src=pr=desc) (d504a97) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/048633da1a913a05252b1b5dea0b3d40d75c81b4?el=desc)
 (048633d) will **decrease** coverage by `0.32%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2476/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2476?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2476      +/-   ##
   ============================================
   - Coverage     50.17%    49.84%    -0.33%  
   + Complexity     3050      2989       -61  
   ============================================
     Files           419       413        -6  
     Lines         18931     18545      -386  
     Branches       1948      1930       -18  
   ============================================
   - Hits           9498      9244      -254  
   + Misses         8657      8541      -116  
   + Partials        776       760       -16  
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.49% <ø> (+0.02%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `0.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `65.85% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.46% <100.00%> (+0.03%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2476?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=)
 | `64.53% <100.00%> (+0.37%)` | `32.00 <1.00> (-1.00)` | :arrow_up: |
   | 
[...udi/timeline/service/handlers/BaseFileHandler.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvQmFzZUZpbGVIYW5kbGVyLmphdmE=)
 | | | |
   | 
[...apache/hudi/timeline/service/handlers/Handler.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvSGFuZGxlci5qYXZh)
 | | | |
   | 
[.../apache/hudi/timeline/service/TimelineService.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvVGltZWxpbmVTZXJ2aWNlLmphdmE=)
 | | | |
   | 
[...udi/timeline/service/handlers/TimelineHandler.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvVGltZWxpbmVIYW5kbGVyLmphdmE=)
 | | | |
   | 
[...e/hudi/timeline/service/FileSystemViewHandler.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvRmlsZVN5c3RlbVZpZXdIYW5kbGVyLmphdmE=)
 | | | |
   | 
[...di/timeline/service/handlers/FileSliceHandler.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvRmlsZVNsaWNlSGFuZGxlci5qYXZh)
 | | | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2476/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `79.68% <0.00%> (+1.56%)` | `26.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-219) Tabify hudi docker demo page

2021-01-22 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270328#comment-17270328
 ] 

Vinoth Chandar commented on HUDI-219:
-

Also, I would not treat this as a support issue per se; this seems like a clear 
enhancement to what's already out there. 

> Tabify hudi docker demo page
> 
>
> Key: HUDI-219
> URL: https://issues.apache.org/jira/browse/HUDI-219
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-219) Tabify hudi docker demo page

2021-01-22 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270298#comment-17270298
 ] 

Vinoth Chandar commented on HUDI-219:
-

From a quick check, it seems not. 
https://hudi.apache.org/docs/docker_demo.html 

[~shivnarayan] it might be easier in some cases to just take a quick peek 
yourself :) 

> Tabify hudi docker demo page
> 
>
> Key: HUDI-219
> URL: https://issues.apache.org/jira/browse/HUDI-219
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on pull request #2473: [HOTFIX] Revert upgrade flink verison to 1.12.0

2021-01-22 Thread GitBox


vinothchandar commented on pull request #2473:
URL: https://github.com/apache/hudi/pull/2473#issuecomment-765561509


   @wangxianghu have you verified that this fix makes the Flink path happy, i.e., 
are there any more fixes to do?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vburenin commented on pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name

2021-01-22 Thread GitBox


vburenin commented on pull request #2476:
URL: https://github.com/apache/hudi/pull/2476#issuecomment-765508771


   @yanghua This is a new PR, based on a master branch that has not diverged.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vburenin opened a new pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name

2021-01-22 Thread GitBox


vburenin opened a new pull request #2476:
URL: https://github.com/apache/hudi/pull/2476


   ## What is the purpose of the pull request
   UtilHelpers.createSource had a hardcoded way of checking which
   constructor signature should be used to instantiate a class, keyed on the
   class name. This made it impossible to override those hardcoded classes
   outside of Hudi and use the overrides instead, even though their
   constructor signatures remain the same.
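   
   As an illustration only (not the actual Hudi code), here is a minimal sketch
   of the constructor-probing approach: the `Source` interface and the two
   candidate signatures below are assumptions made up for this example, not
   Hudi APIs.
   
   ```java
   import java.lang.reflect.Constructor;
   import java.util.Properties;
   
   public class ReflectiveFactory {
   
     /** Hypothetical base type that pluggable implementations extend. */
     public interface Source {}
   
     public static Source createSource(String className, Properties props, Object context)
         throws Exception {
       Class<?> clazz = Class.forName(className);
       try {
         // Probe the richer signature first...
         Constructor<?> ctor = clazz.getConstructor(Properties.class, Object.class);
         return (Source) ctor.newInstance(props, context);
       } catch (NoSuchMethodException e) {
         // ...then fall back to the simpler one. Any class exposing either
         // signature now works, including classes defined outside of Hudi.
         Constructor<?> ctor = clazz.getConstructor(Properties.class);
         return (Source) ctor.newInstance(props);
       }
     }
   }
   ```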
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vburenin commented on pull request #2450: [HUDI-1538] Try to init class trying different signatures instead of checking its name.

2021-01-22 Thread GitBox


vburenin commented on pull request #2450:
URL: https://github.com/apache/hudi/pull/2450#issuecomment-765504810


   I messed up the master branch with a different PR; I will close this one 
and start over.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-1308) Issues found during testing RFC-15

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-1308.
--
Fix Version/s: (was: 0.8.0)
   0.7.0
   Resolution: Fixed

> Issues found during testing RFC-15
> --
>
> Key: HUDI-1308
> URL: https://issues.apache.org/jira/browse/HUDI-1308
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
> Fix For: 0.7.0
>
>
> This is an umbrella ticket containing all the issues found during testing 
> RFC-15.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1310) Corruption Block Handling too slow in S3

2021-01-22 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270234#comment-17270234
 ] 

Vinoth Chandar commented on HUDI-1310:
--

[~shivnarayan] this is a good one to tackle head on

> Corruption Block Handling too slow in S3
> 
>
> Key: HUDI-1310
> URL: https://issues.apache.org/jira/browse/HUDI-1310
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Performance, Writer Core
>Reporter: Balaji Varadarajan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.8.0
>
>
> The logic to figure out next valid starting block offset is too slow when run 
> in S3. 
> I have bolded the log message that takes a long time to appear. 
>  
>  
> 36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
> 36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
> *44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in* 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045
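
To make the reported slowness concrete, here is an illustrative sketch (not
Hudi's actual reader) of the recovery step: after a corrupt block is detected,
scan forward for the next block's magic bytes. The MAGIC value is a
placeholder. When each read() can turn into a remote S3 range request, a
byte-at-a-time scan pays a round trip per probe, which is why buffering the
scan into large chunks speeds this phase up on object stores.

```java
import java.io.IOException;
import java.io.InputStream;

public class MagicScanner {
  // Placeholder marker; the real log format defines its own magic bytes.
  private static final byte[] MAGIC = {'#', 'H', 'U', 'D', 'I'};

  /** Returns the stream offset where MAGIC next occurs, or -1 at EOF. */
  public static long findNextBlockStart(InputStream in) throws IOException {
    long pos = 0;
    int matched = 0;
    int b;
    while ((b = in.read()) != -1) { // one probe per byte: cheap locally, slow on S3
      if (b == (MAGIC[matched] & 0xFF)) {
        matched++;
        if (matched == MAGIC.length) {
          return pos - MAGIC.length + 1; // offset of the first magic byte
        }
      } else {
        matched = (b == (MAGIC[0] & 0xFF)) ? 1 : 0;
      }
      pos++;
    }
    return -1;
  }
}
```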



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1310) Corruption Block Handling too slow in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1310:
-
Parent: (was: HUDI-1308)
Issue Type: Bug  (was: Sub-task)

> Corruption Block Handling too slow in S3
> 
>
> Key: HUDI-1310
> URL: https://issues.apache.org/jira/browse/HUDI-1310
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Performance, Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> The logic to figure out next valid starting block offset is too slow when run 
> in S3. 
> I have bolded the log message that takes a long time to appear. 
>  
>  
> 36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
> 36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
> *44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in* 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1310) Corruption Block Handling too slow in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1310:
-
Status: Open  (was: New)

> Corruption Block Handling too slow in S3
> 
>
> Key: HUDI-1310
> URL: https://issues.apache.org/jira/browse/HUDI-1310
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Performance, Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> The logic to figure out next valid starting block offset is too slow when run 
> in S3. 
> I have bolded the log message that takes a long time to appear. 
>  
>  
> 36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
> 36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
> *44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in* 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1310) Corruption Block Handling too slow in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-1310:


Assignee: sivabalan narayanan

> Corruption Block Handling too slow in S3
> 
>
> Key: HUDI-1310
> URL: https://issues.apache.org/jira/browse/HUDI-1310
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Performance, Writer Core
>Reporter: Balaji Varadarajan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.8.0
>
>
> The logic to figure out next valid starting block offset is too slow when run 
> in S3. 
> I have bolded the log message that takes a long time to appear. 
>  
>  
> 36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
> 36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
> *44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in* 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1310) Corruption Block Handling too slow in S3

2021-01-22 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270231#comment-17270231
 ] 

Vinoth Chandar commented on HUDI-1310:
--

HUDI-1532 should have fixed this in 0.7.0. Just keeping it open to verify again.

> Corruption Block Handling too slow in S3
> 
>
> Key: HUDI-1310
> URL: https://issues.apache.org/jira/browse/HUDI-1310
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Performance, Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> The logic to figure out next valid starting block offset is too slow when run 
> in S3. 
> I have bolded the log message that takes a long time to appear. 
>  
>  
> 36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
> 36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
> *44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in* 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1310) Corruption Block Handling too slow in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1310:
-
Fix Version/s: (was: 0.7.0)
   0.8.0

> Corruption Block Handling too slow in S3
> 
>
> Key: HUDI-1310
> URL: https://issues.apache.org/jira/browse/HUDI-1310
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> The logic to figure out next valid starting block offset is too slow when run 
> in S3. 
> I have bolded the log message that takes a long time to appear. 
>  
>  
> 36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
> 36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
> *44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in* 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1310) Corruption Block Handling too slow in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1310:
-
Fix Version/s: 0.7.0

> Corruption Block Handling too slow in S3
> 
>
> Key: HUDI-1310
> URL: https://issues.apache.org/jira/browse/HUDI-1310
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.7.0
>
>
> The logic to figure out next valid starting block offset is too slow when run 
> in S3. 
> I have bolded the log message that takes a long time to appear. 
>  
>  
> 36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
> 36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
> *44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in* 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1310) Corruption Block Handling too slow in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1310:
-
Component/s: Performance

> Corruption Block Handling too slow in S3
> 
>
> Key: HUDI-1310
> URL: https://issues.apache.org/jira/browse/HUDI-1310
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Performance, Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> The logic to figure out next valid starting block offset is too slow when run 
> in S3. 
> I have bolded the log message that takes a long time to appear. 
>  
>  
> 36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
> 36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile\{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
> 36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
> *44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in* 
> HoodieLogFile\{pathStr='s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
> 44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a://x/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1497) Timeout Exception during getFileStatus()

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-1497.
--
Fix Version/s: 0.7.0
   Resolution: Fixed

This was due to the leaking readers/writers. Fixed in 0.7.0.

> Timeout Exception during getFileStatus() 
> -
>
> Key: HUDI-1497
> URL: https://issues.apache.org/jira/browse/HUDI-1497
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.7.0
>
>
> Seeing this happen when running the RFC-15 branch in long-running mode. There 
> could be a resource leak, as I am seeing this consistently after every 1- or 
> 2-hour run. The log below shows it happening while accessing the bootstrap 
> index, but I am seeing it in getFileStatus() for other files too.
>  
>  
> Caused by: java.io.InterruptedIOException: getFileStatus on 
> s3://robinhood-encrypted-hudi-data-cove/dummy/balaji/sickle/public/client_ledger_clientledgerbalance/test_v4/.hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile:
>  com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
> waiting for connection from poolCaused by: java.io.InterruptedIOException: 
> getFileStatus on 
> s3://robinhood-encrypted-hudi-data-cove/dummy/balaji/sickle/public/client_ledger_clientledgerbalance/test_v4/.hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile:
>  com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
> waiting for connection from pool at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:141) at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:117) at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1859)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1823)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1763) 
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1627) at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2500) at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.exists(HoodieWrapperFileSystem.java:549)
>  at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.(HFileBootstrapIndex.java:102)
>  ... 33 moreCaused by: com.amazonaws.SdkClientException: Unable to execute 
> HTTP request: Timeout waiting for connection from pool at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1113)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1063)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4229) at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4176) at 
> com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1253)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1053)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1841)
>  ... 39 more
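
As a minimal sketch of the leak pattern described above (assuming a Hadoop
FileSystem over S3A; the path is a placeholder): every open stream holds a
pooled HTTP connection, so streams that are never closed eventually exhaust
the pool, and later calls such as getFileStatus() time out waiting for a
connection.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReaderLeakSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/example.hfile"); // placeholder path

    // Leaky pattern: the stream (and its pooled connection) is never released.
    FSDataInputStream leaked = fs.open(path);
    leaked.read();

    // Fixed pattern: try-with-resources returns the connection to the pool
    // even if read() throws.
    try (FSDataInputStream in = fs.open(path)) {
      in.read();
    }
  }
}
```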



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1311) Writes creating/updating large number of files seeing errors when deleting marker files in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1311:
-
Status: Open  (was: New)

> Writes creating/updating large number of files seeing errors when deleting 
> marker files in S3
> -
>
> Key: HUDI-1311
> URL: https://issues.apache.org/jira/browse/HUDI-1311
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Don't have the exception trace handy. Will add it when I run into this next 
> time. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1311) Writes creating/updating large number of files seeing errors when deleting marker files in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-1311.
--
Resolution: Fixed

> Writes creating/updating large number of files seeing errors when deleting 
> marker files in S3
> -
>
> Key: HUDI-1311
> URL: https://issues.apache.org/jira/browse/HUDI-1311
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Major
>
> Don't have the exception trace handy. Will add it when I run into this next 
> time. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1311) Writes creating/updating large number of files seeing errors when deleting marker files in S3

2021-01-22 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-1311:


Assignee: Vinoth Chandar

> Writes creating/updating large number of files seeing errors when deleting 
> marker files in S3
> -
>
> Key: HUDI-1311
> URL: https://issues.apache.org/jira/browse/HUDI-1311
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Major
>
> Don't have the exception trace handy. Will add it when I run into this next 
> time. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1311) Writes creating/updating large number of files seeing errors when deleting marker files in S3

2021-01-22 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270229#comment-17270229
 ] 

Vinoth Chandar commented on HUDI-1311:
--

I noticed something similar. This was related to the leak as well; it should 
be fixed now.

> Writes creating/updating large number of files seeing errors when deleting 
> marker files in S3
> -
>
> Key: HUDI-1311
> URL: https://issues.apache.org/jira/browse/HUDI-1311
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Don't have the exception trace handy. Will add it when I run into this next 
> time. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-io commented on pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


codecov-io commented on pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#issuecomment-765495259


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=h1) Report
   > Merging 
[#2475](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=desc) (8d073b6) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/048633da1a913a05252b1b5dea0b3d40d75c81b4?el=desc)
 (048633d) will **increase** coverage by `19.25%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2475/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=tree)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2475   +/-   ##
   =
   + Coverage 50.17%   69.43%   +19.25% 
   + Complexity 3050  357 -2693 
   =
 Files   419   53  -366 
 Lines 18931 1930-17001 
 Branches   1948  230 -1718 
   =
   - Hits   9498 1340 -8158 
   + Misses 8657  456 -8201 
   + Partials776  134  -642 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2475?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/metadata/BaseTableMetadata.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvQmFzZVRhYmxlTWV0YWRhdGEuamF2YQ==)
 | | | |
   | 
[.../apache/hudi/common/model/HoodieRecordPayload.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVJlY29yZFBheWxvYWQuamF2YQ==)
 | | | |
   | 
[...metadata/HoodieMetadataMergedLogRecordScanner.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0YWRhdGEvSG9vZGllTWV0YWRhdGFNZXJnZWRMb2dSZWNvcmRTY2FubmVyLmphdmE=)
 | | | |
   | 
[...ain/java/org/apache/hudi/cli/utils/CommitUtil.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL0NvbW1pdFV0aWwuamF2YQ==)
 | | | |
   | 
[...che/hudi/common/table/timeline/HoodieTimeline.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | | | |
   | 
[...i/common/table/log/block/HoodieHFileDataBlock.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVIRmlsZURhdGFCbG9jay5qYXZh)
 | | | |
   | 
[...e/hudi/common/engine/LocalTaskContextSupplier.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9Mb2NhbFRhc2tDb250ZXh0U3VwcGxpZXIuamF2YQ==)
 | | | |
   | 
[...ava/org/apache/hudi/payload/AWSDmsAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvcGF5bG9hZC9BV1NEbXNBdnJvUGF5bG9hZC5qYXZh)
 | | | |
   | 
[...rg/apache/hudi/cli/commands/SavepointsCommand.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL1NhdmVwb2ludHNDb21tYW5kLmphdmE=)
 | | | |
   | 
[...ache/hudi/common/table/timeline/TimelineUtils.java](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lVXRpbHMuamF2YQ==)
 | | | |
   | ... and [355 
more](https://codecov.io/gh/apache/hudi/pull/2475/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2473: [HOTFIX] Revert upgrade flink verison to 1.12.0

2021-01-22 Thread GitBox


vinothchandar commented on pull request #2473:
URL: https://github.com/apache/hudi/pull/2473#issuecomment-765494536


   I will time out in an hour and merge/land this myself to do RC2.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2473: [HOTFIX] Revert upgrade flink verison to 1.12.0

2021-01-22 Thread GitBox


vinothchandar commented on pull request #2473:
URL: https://github.com/apache/hudi/pull/2473#issuecomment-765492708


   Good catch. Let's file a JIRA to close the testing gap around this?
   
   Please merge into master when ready. I will port it to the 0.7.0 branch to 
do an RC2.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua closed pull request #2472: [hotfix] Add back dependency org.apache.flink:flink-connector-kafka-b…

2021-01-22 Thread GitBox


yanghua closed pull request #2472:
URL: https://github.com/apache/hudi/pull/2472


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #2472: [hotfix] Add back dependency org.apache.flink:flink-connector-kafka-b…

2021-01-22 Thread GitBox


yanghua commented on pull request #2472:
URL: https://github.com/apache/hudi/pull/2472#issuecomment-765487956


   @danny0405 I am closing this PR now because @wangxianghu fixed it via PR 
#2473. Please feel free to reopen it if you have any concerns.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] yanghua commented on pull request #2472: [hotfix] Add back dependency org.apache.flink:flink-connector-kafka-b…

2021-01-22 Thread GitBox


yanghua commented on pull request #2472:
URL: https://github.com/apache/hudi/pull/2472#issuecomment-765482271


   @wangxianghu Can we close this PR now?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] teeyog closed pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


teeyog closed pull request #2447:
URL: https://github.com/apache/hudi/pull/2447


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] teeyog opened a new pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


teeyog opened a new pull request #2475:
URL: https://github.com/apache/hudi/pull/2475


   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   To read a Hudi table you currently need to specify a glob path, not just 
the tablePath of the table: the glob is determined by the partition directory 
structure, and different keyGenerators produce different structures. A 
single-level partition layout needs path=```.../table/*/*``` and a two-level 
layout needs path=```.../table/*/*/*```, so it is troublesome to make the user 
figure out the data path. With this change the user only needs to specify the 
tablePath: ```.../table```
   
   At the same time, once the table can be read with path=```.../table```, 
querying the Hudi table with Spark SQL becomes more convenient: you only need 
to add the table property ```spark.sql.sources.provider=hudi``` to the Hive 
table metadata, and the Hive table is automatically treated as a Hudi table.
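   
   As a hypothetical usage sketch (the paths and the `hudi` format short name
   are assumptions for illustration), this is the difference for the reader:
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;
   
   public class ReadHudiTable {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder().appName("read-hudi").getOrCreate();
   
       // Before: the glob must match the table's partition depth.
       Dataset<Row> oneLevel = spark.read().format("hudi").load("/data/table/*/*");
       Dataset<Row> twoLevel = spark.read().format("hudi").load("/data/table/*/*/*");
   
       // With this change: the table path alone is enough; the data directories
       // are inferred.
       Dataset<Row> inferred = spark.read().format("hudi").load("/data/table");
   
       inferred.show();
     }
   }
   ```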
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


codecov-io edited a comment on pull request #2447:
URL: https://github.com/apache/hudi/pull/2447#issuecomment-760949326


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=h1) Report
   > Merging 
[#2447](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=desc) (ebf2c70) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/a38612b10f6ae04644519270f9b5eb631a77c148?el=desc)
 (a38612b) will **increase** coverage by `18.73%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2447/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=tree)
   
   ```diff
   @@  Coverage Diff  @@
   ## master#2447   +/-   ##
   =
   + Coverage 50.69%   69.43%   +18.73% 
   + Complexity 3059  357 -2702 
   =
 Files   419   53  -366 
 Lines 18810 1930-16880 
 Branches   1924  230 -1694 
   =
   - Hits   9535 1340 -8195 
   + Misses 8498  456 -8042 
   + Partials777  134  -643 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <100.00%> (-0.06%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `88.79% <100.00%> (ø)` | `28.00 <1.00> (ø)` | |
   | 
[...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=)
 | `70.50% <0.00%> (-0.36%)` | `50.00% <0.00%> (-1.00%)` | |
   | 
[...i/common/table/view/FileSystemViewStorageType.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvRmlsZVN5c3RlbVZpZXdTdG9yYWdlVHlwZS5qYXZh)
 | | | |
   | 
[...apache/hudi/common/engine/HoodieEngineContext.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9Ib29kaWVFbmdpbmVDb250ZXh0LmphdmE=)
 | | | |
   | 
[...che/hudi/common/table/log/HoodieLogFileReader.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGaWxlUmVhZGVyLmphdmE=)
 | | | |
   | 
[...in/java/org/apache/hudi/hive/HoodieHiveClient.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZUNsaWVudC5qYXZh)
 | | | |
   | 
[.../java/org/apache/hudi/common/util/CommitUtils.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQ29tbWl0VXRpbHMuamF2YQ==)
 | | | |
   | 
[...e/hudi/common/table/log/block/HoodieDataBlock.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEYXRhQmxvY2suamF2YQ==)
 | | | |
   | 
[...udi/common/table/timeline/dto/FSPermissionDTO.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GU1Blcm1pc3Npb25EVE8uamF2YQ==)
 | | | |
   | 
[...org/apache/hudi/common/util/collection/Triple.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9UcmlwbGUuamF2YQ==)
 | | | |
   | ... and [335 
more](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] leesf commented on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


leesf commented on pull request #2447:
URL: https://github.com/apache/hudi/pull/2447#issuecomment-765453573


   @teeyog it contains unrelated commits from master; you could use `git rebase 
-i master` to clean it up.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-22 Thread GitBox


codecov-io edited a comment on pull request #2447:
URL: https://github.com/apache/hudi/pull/2447#issuecomment-760949326


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=h1) Report
   > Merging 
[#2447](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=desc) (ebf2c70) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/a38612b10f6ae04644519270f9b5eb631a77c148?el=desc)
 (a38612b) will **decrease** coverage by `41.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2447/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=tree)
   
   ```diff
   @@ Coverage Diff  @@
   ## master   #2447   +/-   ##
   
   - Coverage 50.69%   9.68%   -41.01% 
   + Complexity 3059  48 -3011 
   
 Files   419  53  -366 
 Lines 188101930-16880 
 Branches   1924 230 -1694 
   
   - Hits   9535 187 -9348 
   + Misses 84981730 -6768 
   + Partials777  13  -764 
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <100.00%> (-59.80%)` | `0.00 <1.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `83.62% <100.00%> (-5.18%)` | `28.00 <1.00> (ø)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2447/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` 

[GitHub] [hudi] leesf commented on pull request #2473: [HOTFIX] Revert upgrade flink verison to 1.12.0

2021-01-22 Thread GitBox


leesf commented on pull request #2473:
URL: https://github.com/apache/hudi/pull/2473#issuecomment-765448614


   @yanghua @danny0405 Please also take a pass here, and this patch should be 
merged into 0.7.0. cc @vinothchandar 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-258) Hive Query engine not supporting join queries between RT and RO tables

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-258:
-
Labels: bug-bash-0.6.0 help-requested user-support-issues  (was: 
bug-bash-0.6.0 help-requested)

> Hive Query engine not supporting join queries between RT and RO tables
> --
>
> Key: HUDI-258
> URL: https://issues.apache.org/jira/browse/HUDI-258
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: bug-bash-0.6.0, help-requested, user-support-issues
>
> Description : 
> [https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]
>  
> Root Cause: Hive tracks getSplits() calls by dataset basePath and does not 
> take the InputFormatClass into account, so getSplits() is called only once. In 
> the case of RO and RT tables, both have the same dataset base-path but differ 
> in the InputFormatClass. Due to this, the Hive join query returns incorrect 
> results.
>  
> =
> The result of the demo is very strange
> (Step 6(a))
>  
> {{ select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
> stock_ticks_mor_rt where  symbol = 'GOOG';
>  select `_hoodie_commit_time`, symbol, ts, volume, open, close  from 
> stock_ticks_mor where  symbol = 'GOOG';}}
> return the same results as in the demo
> BUT!
>  
> {{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b  
> on a.key=b.key where a.ts != b.ts
> ...
> ++---+---+--+
> | a.key  | a.ts  | b.ts  |
> ++---+---+--+
> ++---+---+--+}}
>  
> {{0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from 
> stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 
> 'GOOG_2018-08-31 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: 
> /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
> 2019-07-18 09:13:20 Starting to launch local task to process map join;  
> maximum memory = 477626368
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> 2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into 
> file: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
> 2019-07-18 09:13:21 Uploaded 1 File to: 
> file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
>  (317 bytes)
> 2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
> +-+--+--+--+
> |a.key| a.ts | b.ts |
> +-+--+--+--+
> | GOOG_2018-08-31 10  | 2018-08-31 10:29:00  | 2018-08-31 10:29:00  |
> +-+--+--+--+
> 1 row selected (7.207 seconds)
> 0: jdbc:hive2://hiveserver:1> select a.key,a.ts,b.ts from stock_ticks_mor 
> a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 
> 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. spark, 
> tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: 
> 
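A purely illustrative sketch of the fix direction implied by the root cause above: if the splits cache were keyed by (basePath, InputFormatClass) instead of basePath alone, the RT and RO tables would no longer collide. The class below is hypothetical Java, not actual Hive internals.

{code:java}
import java.util.Objects;

// Hypothetical composite cache key: two tables that share a base path but
// use different InputFormat classes map to distinct entries.
public final class SplitsCacheKey {
  private final String basePath;
  private final Class<?> inputFormatClass;

  public SplitsCacheKey(String basePath, Class<?> inputFormatClass) {
    this.basePath = basePath;
    this.inputFormatClass = inputFormatClass;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SplitsCacheKey)) {
      return false;
    }
    SplitsCacheKey that = (SplitsCacheKey) o;
    return basePath.equals(that.basePath)
        && inputFormatClass.equals(that.inputFormatClass);
  }

  @Override
  public int hashCode() {
    return Objects.hash(basePath, inputFormatClass);
  }
}
{code}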

[jira] [Updated] (HUDI-219) Tabify hudi docker demo page

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-219:
-
Labels: user-support-issues  (was: )

> Tabify hudi docker demo page
> 
>
> Key: HUDI-219
> URL: https://issues.apache.org/jira/browse/HUDI-219
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>  Labels: user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-219) Tabify hudi docker demo page

2021-01-22 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270180#comment-17270180
 ] 

sivabalan narayanan commented on HUDI-219:
--

[~bhasudha]: Is this already taken care of, or is it still to be done? 

> Tabify hudi docker demo page
> 
>
> Key: HUDI-219
> URL: https://issues.apache.org/jira/browse/HUDI-219
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Bhavani Sudha Saktheeswaran
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-274) Consolidate all scripts under top level scripts directory

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-274:
-
Labels: starter user-support-issues  (was: starter)

> Consolidate all scripts under top level scripts directory
> -
>
> Key: HUDI-274
> URL: https://issues.apache.org/jira/browse/HUDI-274
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
>  Labels: starter, user-support-issues
>
> Before we do this, let us revisit one more time whether this is ideal. It has 
> pros/cons. Moving everything to one place makes the scripts easy to find, but 
> each script would then have to assume the inter-directory structure. Also, each 
> sub-module would no longer be entirely self-contained, as its scripts would 
> live elsewhere.
> This came up in a code-review discussion : 
> https://github.com/apache/incubator-hudi/pull/918#discussion_r327904862
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-619) Investigate and implement mechanism to have hive/presto/sparksql queries avoid stitching and return null values for hoodie columns

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-619:
-
Labels: user-support-issues  (was: )

> Investigate and implement mechanism to have hive/presto/sparksql queries 
> avoid stitching and return null values for hoodie columns 
> ---
>
> Key: HUDI-619
> URL: https://issues.apache.org/jira/browse/HUDI-619
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Presto Integration, Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> This idea is suggested by Vinoth during RFC review. This ticket is to track 
> the feasibility and implementation of it. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-849) Turn on incremental Syncing by default for DeltaStreamer and spark streaming cases

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-849:
-
Labels: user-support-issues  (was: )

> Turn on incremental Syncing by default for DeltaStreamer and spark streaming 
> cases
> --
>
> Key: HUDI-849
> URL: https://issues.apache.org/jira/browse/HUDI-849
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer, Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-984) Support Hive 1.x out of box

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-984:
-
Labels: user-support-issues  (was: )

> Support Hive 1.x out of box
> ---
>
> Key: HUDI-984
> URL: https://issues.apache.org/jira/browse/HUDI-984
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> With 0.5.0, Hudi uses Hive 2.x as a compile-time dependency and works with 
> Hive 2.x servers out of the box.
> We need similar support for Hive 1.x as it is still being used:
> 1. Hive 1.x servers can run queries against Hudi tables
> 2. Hive Sync must happen successfully between Hudi tables and Hive 1.x 
> services
>  
> Important Note: Hive 1.x has 2 classes of versions:
>  # pre 1.2.0
>  # 1.2.0 and later
> We had earlier found out that those 2 classes are unfortunately not compatible 
> with each other. The CDH version of Hive used to be pre 1.2.0. We need to 
> look at the feasibility, cost and impact of supporting one or more of these 
> classes.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-865) Improve Hive Syncing by directly translating avro schema to Hive types

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-865:
-
Labels: pull-request-available starter user-support-issues  (was: 
pull-request-available starter)

> Improve Hive Syncing by directly translating avro schema to Hive types
> --
>
> Key: HUDI-865
> URL: https://issues.apache.org/jira/browse/HUDI-865
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
>  Labels: pull-request-available, starter, user-support-issues
>
> With the current code in master and the proposed improvements in 
> [https://github.com/apache/incubator-hudi/pull/1559,|https://github.com/apache/incubator-hudi/pull/1559]
> Hive Sync integration resorts to the following chain of translations to derive 
> the table schema:
>  Avro schema to Parquet schema to Hive schema
> We need to implement logic to skip the extra hop through the Parquet schema 
> when generating the Hive schema. 
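For illustration, a minimal sketch of the direct translation using only the Avro Schema API; the type mapping is intentionally partial (decimals, logical types, enums and fixed types would need dedicated handling), and the class name is made up:

{code:java}
import org.apache.avro.Schema;

public class AvroToHiveTypeSketch {

  // Translate an Avro schema straight to a Hive DDL type string,
  // without the intermediate Parquet schema hop.
  public static String toHiveType(Schema schema) {
    switch (schema.getType()) {
      case BOOLEAN: return "boolean";
      case INT:     return "int";
      case LONG:    return "bigint";
      case FLOAT:   return "float";
      case DOUBLE:  return "double";
      case STRING:  return "string";
      case BYTES:   return "binary";
      case ARRAY:   return "array<" + toHiveType(schema.getElementType()) + ">";
      case MAP:     return "map<string," + toHiveType(schema.getValueType()) + ">";
      case UNION:
        // Unwrap the common ["null", T] pattern used for nullable fields.
        for (Schema member : schema.getTypes()) {
          if (member.getType() != Schema.Type.NULL) {
            return toHiveType(member);
          }
        }
        throw new IllegalArgumentException("Union of only null: " + schema);
      case RECORD:
        StringBuilder struct = new StringBuilder("struct<");
        boolean first = true;
        for (Schema.Field field : schema.getFields()) {
          if (!first) {
            struct.append(",");
          }
          struct.append(field.name()).append(":").append(toHiveType(field.schema()));
          first = false;
        }
        return struct.append(">").toString();
      default:
        throw new IllegalArgumentException("Unhandled Avro type: " + schema.getType());
    }
  }
}
{code}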



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1070) Direct write from spark to Parquet when doing Upserts, Inserts and Deletes

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1070:
--
Labels: user-support-issues  (was: )

> Direct write from spark to Parquet when doing Upserts, Inserts and Deletes 
> ---
>
> Key: HUDI-1070
> URL: https://issues.apache.org/jira/browse/HUDI-1070
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Common Core, Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
>
> After we land the  support for direct write to parquet for bulk-insert 
> operations, we need to follow up with similar support for other write 
> operations such as insert, upsert and deletes.
> From API perspective, we need to expose Row in HoodieRecord so that we can 
> support custom merges. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1111) Highlight Hudi guarantees in documentation section of website

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-:
--
Labels: user-support-issues  (was: )

> Highlight Hudi guarantees in documentation section of website 
> --
>
> Key: HUDI-
> URL: https://issues.apache.org/jira/browse/HUDI-
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> [https://github.com/apache/hudi/issues/1795]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1116) Support time travel using timestamp type

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1116:
--
Labels: user-support-issues  (was: )

> Support time travel using timestamp type
> 
>
> Key: HUDI-1116
> URL: https://issues.apache.org/jira/browse/HUDI-1116
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
>  Labels: user-support-issues
>
>  
> Currently, we use commit time to mimic time-travel queries. We need the 
> ability to handle time-travel with a proper timestamp provided.
> For e.g.:
> {{spark.read.format("hudi").option("timestampAsOf", 
> "2019-01-01").load("/path/to/my/table")}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1117:
--
Labels: user-support-issues  (was: )

> Add tdunning json library to spark and utilities bundle
> ---
>
> Key: HUDI-1117
> URL: https://issues.apache.org/jira/browse/HUDI-1117
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Exception during Hive Sync:
> ```
> An error occurred while calling o175.save.\n: java.lang.NoClassDefFoundError: 
> org/json/JSONException\n\tat 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:10847)\n\tat
>  
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10047)\n\tat
>  
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10128)\n\tat
>  
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:209)\n\tat
>  
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)\n\tat
>  org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)\n\tat 
> org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)\n\tat 
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)\n\tat 
> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)\n\tat 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)\n\tat 
> org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)\n\tat 
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:515)\n\tat
>  
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:498)\n\tat
>  
> org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:488)\n\tat
>  
> org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:273)\n\tat
>  org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:146)\n\tat
> ```
> This is from using hudi-spark-bundle. 
> [https://github.com/apache/hudi/issues/1787]
> The JSONException class comes from 
> https://mvnrepository.com/artifact/org.json/json. There is a licensing issue, 
> and hence it is not part of the Hudi bundle packages. The underlying issue is 
> due to Hive 1.x vs 2.x (see 
> https://issues.apache.org/jira/browse/HUDI-150?jql=text%20~%20%22org.json%22%20and%20project%20%3D%20%22Apache%20Hudi%22%20)
> Spark's Hive integration still brings in Hive 1.x jars, which depend on 
> org.json. I believe this was provided in the user's environment, and hence we 
> have not seen folks complaining about this issue.
> Even though this is not a Hudi issue per se, let me check a jar with a 
> compatible license: https://mvnrepository.com/artifact/com.tdunning/json/1.8 
> and, if it works, we will add it to the 0.6 bundles after discussing with the 
> community. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-73) Support vanilla Avro Kafka Source in HoodieDeltaStreamer

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-73:

Labels: pull-request-available user-support-issues  (was: 
pull-request-available)

> Support vanilla Avro Kafka Source in HoodieDeltaStreamer
> 
>
> Key: HUDI-73
> URL: https://issues.apache.org/jira/browse/HUDI-73
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.8.0
>
>
> Context : [https://github.com/uber/hudi/issues/597]
> Currently, Avro Kafka Source expects the installation to use Confluent 
> version with SchemaRegistry server running. We need to support the Kafka 
> installations which do not use Schema Registry by allowing 
> FileBasedSchemaProvider to be integrated to AvroKafkaSource.
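As a hedged sketch of what "no schema registry" amounts to: with a schema loaded from a file, decoding a plain Avro Kafka value needs only the vanilla Avro APIs. This assumes the payload is raw Avro binary without the Confluent wire-format prefix, and the class and variable names below are made up:

{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class FileSchemaAvroDecoder {

  // Decode one Kafka value using a schema read from a file rather than
  // a Schema Registry lookup.
  public static GenericRecord decode(byte[] kafkaValue, File schemaFile) throws IOException {
    Schema schema = new Schema.Parser().parse(schemaFile);
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(kafkaValue, null);
    return reader.read(null, decoder);
  }
}
{code}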



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-74) Improve compaction support in HoodieDeltaStreamer & CLI

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-74:

Labels: user-support-issues  (was: )

> Improve compaction support in HoodieDeltaStreamer & CLI
> ---
>
> Key: HUDI-74
> URL: https://issues.apache.org/jira/browse/HUDI-74
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: CLI, DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
>
> Currently, the only way to safely schedule and execute a compaction which 
> preserves checkpoints is through inline compaction. But this is disabled 
> by default for HoodieDeltaStreamer. 
>  
> Also, the other option to schedule a compaction is through the Hudi CLI. We 
> need to support an option to copy the last delta-instant's extra-metadata to 
> make this a viable option for working with DeltaStreamer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-96) Use Command line options instead of positional arguments when launching spark applications from various CLI commands

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-96?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-96:

Labels: newbie pull-request-available user-support-issues  (was: newbie 
pull-request-available)

> Use Command line options instead of positional arguments when launching spark 
> applications from various CLI commands
> 
>
> Key: HUDI-96
> URL: https://issues.apache.org/jira/browse/HUDI-96
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: CLI, newbie
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Minor
>  Labels: newbie, pull-request-available, user-support-issues
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hoodie CLI commands like compaction/rollback/repair/savepoints/parquet-import 
> rely on launching a spark application to perform their operations (look at 
> SparkMain.java). 
> SparkMain (look at SparkMain.main()) relies on positional arguments for 
> passing various CLI options. Instead, we should define proper CLI options in 
> SparkMain and use them (via JCommander) to improve readability and avoid 
> accidental errors at call sites. For e.g., see 
> com.uber.hoodie.utilities.HoodieCompactor
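A rough illustration of the suggested change with JCommander; the option names and config fields are invented for the example:

{code:java}
import com.beust.jcommander.JCommander;
import com.beust.jcommander.Parameter;

public class SparkMainSketch {

  public static class Config {
    @Parameter(names = {"--base-path"}, description = "Base path of the Hudi table", required = true)
    public String basePath;

    @Parameter(names = {"--instant-time"}, description = "Compaction instant to execute")
    public String instantTime;

    @Parameter(names = {"--parallelism"}, description = "Parallelism for the spark job")
    public int parallelism = 1;
  }

  public static void main(String[] args) {
    Config cfg = new Config();
    // Named options replace args[0], args[1], ... at every call site.
    JCommander.newBuilder().addObject(cfg).build().parse(args);
    System.out.println("Compacting " + cfg.basePath + " at instant " + cfg.instantTime);
  }
}
{code}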



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-110) Better defaults for Partition extractor for Spark DataSOurce and DeltaStreamer

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-110:
-
Labels: user-support-issues  (was: )

> Better defaults for Partition extractor for Spark DataSOurce and DeltaStreamer
> --
>
> Key: HUDI-110
> URL: https://issues.apache.org/jira/browse/HUDI-110
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer, Spark Integration, Usability
>Reporter: Balaji Varadarajan
>Priority: Minor
>  Labels: user-support-issues
>
> Currently,
> SlashEncodedDayPartitionValueExtractor is the default being used. This is not 
> a common format outside Uber.
>  
> Also, the Spark DataSource API provides a partitionBy clause which has not 
> been integrated for the Hudi Data Source. We need to investigate how we can 
> leverage the partitionBy clause for partitioning.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-157) Allow more than 1 compaction to be run concurrently in deltastreamer after MOR Incremental read is fully supported

2021-01-22 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270169#comment-17270169
 ] 

sivabalan narayanan commented on HUDI-157:
--

[~vbalaji]: Is this something user-reported, or more of an enhancement? Also, do 
we need to tag this with the "user-support-issues" label? 

> Allow more than 1 compaction to be run concurrently in deltastreamer after 
> MOR Incremental read is fully supported
> --
>
> Key: HUDI-157
> URL: https://issues.apache.org/jira/browse/HUDI-157
> Project: Apache Hudi
>  Issue Type: Task
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: help-wanted
>
> Only 1 compaction is run by deltastreamer. Once incremental MOR  is 
> supported, we can allow concurrent compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-226) Hudi Website - Provide links to documentation corresponding to older release versions

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-226:
-
Labels: user-support-issues  (was: )

> Hudi Website - Provide links to documentation corresponding to older release 
> versions
> -
>
> Key: HUDI-226
> URL: https://issues.apache.org/jira/browse/HUDI-226
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Docs, docs-chinese, newbie
>Reporter: Balaji Varadarajan
>Assignee: Ethan Guo
>Priority: Major
>  Labels: user-support-issues
>
> While this may be too difficult to do retroactively for previous versions, 
> we need to support this for Apache releases. 
> See the Flink website (e.g. [https://flink.apache.org/]); you will see a link 
> to the 1.9 version docs: 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.9/]
> For older releases, 0.4.6 and 0.4.7, we have created git tags 
> *hoodie-site-0.4.6* and *hoodie-site-0.4.7*. 
> *You can check out the tags and read README.md to access and run the website 
> locally.*
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1129) AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1129:
--
Labels: pull-request-available user-support-issues  (was: 
pull-request-available)

> AvroConversionUtils unable to handle avro to row transformation when passing 
> evolved schema 
> 
>
> Key: HUDI-1129
> URL: https://issues.apache.org/jira/browse/HUDI-1129
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: pull-request-available, user-support-issues
>
> Unit test to repro : 
> [https://github.com/apache/hudi/pull/1844/files#diff-2c3763c5782af9c3cbc02e2935211587R476]
> Context in : 
> [https://github.com/apache/hudi/issues/1845#issuecomment-665180775] (issue 2)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1128) DeltaStreamer not handling avro records written with older schema

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1128:
--
Labels: schema-evolution user-support-issues  (was: schema-evolution)

> DeltaStreamer not handling avro records written with older schema
> -
>
> Key: HUDI-1128
> URL: https://issues.apache.org/jira/browse/HUDI-1128
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: schema-evolution, user-support-issues
> Fix For: 0.8.0
>
>
> Context:  [https://github.com/apache/hudi/issues/1845]
> Look at issue 1 of 
> [https://github.com/apache/hudi/issues/1845#issuecomment-665180775]
> When deserializing bytes to avro in OverwriteWithLatestAvroPayload, we are 
> passing latest schema which is failing when the original record was written 
> with older schema
>  
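Avro can only resolve an evolved schema when it is given both schemas; a minimal sketch of the deserialization path described above (the class and variable names are placeholders):

{code:java}
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class EvolvedSchemaDecode {

  // Passing only the latest schema fails on bytes written with an older one;
  // passing the writer schema alongside the reader schema lets Avro apply
  // its schema-resolution rules.
  public static GenericRecord decode(byte[] payloadBytes, Schema writerSchema, Schema readerSchema)
      throws IOException {
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema, readerSchema);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(payloadBytes, null);
    return reader.read(null, decoder);
  }
}
{code}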



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1201) HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset when commit files do not have checkpoint

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1201:
--
Labels: user-support-issues  (was: )

> HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset 
> when commit files do not have checkpoint
> -
>
> Key: HUDI-1201
> URL: https://issues.apache.org/jira/browse/HUDI-1201
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Trevorzhang
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> [https://github.com/apache/hudi/issues/1985]
>  
> It would be easier for the user to just tell deltastreamer to read from the 
> earliest offset instead of implementing -initial-checkpoint-provider or 
> passing raw kafka checkpoints when the table was initially bootstrapped 
> through spark.write().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1210) Update doc to clarify that start timestamp is exclusive for incremental queries

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1210:
--
Labels: user-support-issues  (was: )

> Update doc to clarify that start timestamp is exclusive for incremental 
> queries
> ---
>
> Key: HUDI-1210
> URL: https://issues.apache.org/jira/browse/HUDI-1210
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Incremental Pull
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> [https://github.com/apache/hudi/issues/1973#issuecomment-675087028]
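For reference, a sketch that makes the exclusive-start semantics concrete; the option keys below are the ones commonly used around this release line, so treat them as an assumption to verify against the docs:

{code:java}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IncrementalReadExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("incr-read").getOrCreate();
    // Rows whose _hoodie_commit_time equals the begin instant are NOT
    // returned; only commits strictly after it are.
    Dataset<Row> changes = spark.read()
        .format("hudi")
        .option("hoodie.datasource.query.type", "incremental")
        .option("hoodie.datasource.read.begin.instanttime", "20210101000000") // exclusive
        .load("/path/to/table");
    changes.show(false);
  }
}
{code}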



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1212) GDPR: Support deletions of records on all versions of Hudi dataset

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1212:
--
Labels: user-support-issues  (was: )

> GDPR: Support deletions of records on  all versions of Hudi dataset
> ---
>
> Key: HUDI-1212
> URL: https://issues.apache.org/jira/browse/HUDI-1212
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Incremental Pull, Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Incremental Pull should also stop returning records from the historical 
> dataset when we delete them from the latest snapshot.
>  
> Context from Mailing list email :
>  
> Hello,
> I am Siva's colleague and I am working on the problem below as well.
> I would like to describe what we are trying to achieve with Hudi as well as 
> our current way of working and our GDPR and "Right To Be Forgotten " 
> compliance policies.
> Our requirements :
> - We wish to apply a strict interpretation of the RTBF.  In other words, when 
> we remove a person's data, it should be throughout the historical data and 
> not just the latest snapshot.
> - We wish to use Hudi to reduce our storage requirements using upserts and 
> don't want to have duplicates between commits.
> - We wish to retain history for persons who have not requested to be 
> forgotten and therefore we do not want to delete commit files from the 
> history as some have proposed.
> We have tried a couple of solutions, but so far without success :
> - replay the data omitting the data of the persons who have requested to be 
> forgotten.  We wanted to manipulate the commit times to rebuild the history.
> We found that we couldn't manipulate the commit times and retain the history.
> - replay the data omitting the data of the persons who have requested to be 
> forgotten, but writing to a date-based partition folder using the 
> "partitionpath" parameter.
> We found that commits using upserts between the partitionpath folders, do not 
> ignore data that is unchanged between 2 commit dates as when using the 
> default commit file system, so we will not save on our storage or speed up 
> our  processing using this technique.
> So basically we would like to find a way to apply a strict RTBF, GDPR, 
> maintain history and time-travel (large history) and save storage space using 
> Hudi.
> Can anyone see a way to achieve this?
> Kind Regards,
> David Rosalia
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1214) Need ability to set deltastreamer checkpoints when doing Spark datasource writes

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1214:
--
Labels: user-support-issues  (was: )

> Need ability to set deltastreamer checkpoints when doing Spark datasource 
> writes
> 
>
> Key: HUDI-1214
> URL: https://issues.apache.org/jira/browse/HUDI-1214
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Trevorzhang
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Such support is needed  for bootstrapping cases when users use spark write to 
> do initial bootstrap and then subsequently use deltastreamer.
> DeltaStreamer manages checkpoints inside hoodie commit files and expects 
> checkpoints in previously committed metadata. Users are expected to pass 
> checkpoint or initial checkpoint provider when performing bootstrap through 
> deltastreamer. Such support is not present when doing bootstrap using Spark 
> Datasource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1214) Need ability to set deltastreamer checkpoints when doing Spark datasource writes

2021-01-22 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270166#comment-17270166
 ] 

sivabalan narayanan commented on HUDI-1214:
---

[~vbalaji]: Is this a duplicate of 
https://issues.apache.org/jira/browse/HUDI-1280 ? 

> Need ability to set deltastreamer checkpoints when doing Spark datasource 
> writes
> 
>
> Key: HUDI-1214
> URL: https://issues.apache.org/jira/browse/HUDI-1214
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Trevorzhang
>Priority: Major
> Fix For: 0.8.0
>
>
> Such support is needed  for bootstrapping cases when users use spark write to 
> do initial bootstrap and then subsequently use deltastreamer.
> DeltaStreamer manages checkpoints inside hoodie commit files and expects 
> checkpoints in previously committed metadata. Users are expected to pass 
> checkpoint or initial checkpoint provider when performing bootstrap through 
> deltastreamer. Such support is not present when doing bootstrap using Spark 
> Datasource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1235) Default value of KeyGenerator configuration is wrongly documented

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1235:
--
Labels: user-support-issues  (was: )

> Default value of KeyGenerator configuration is wrongly documented 
> --
>
> Key: HUDI-1235
> URL: https://issues.apache.org/jira/browse/HUDI-1235
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs, docs-chinese
>Reporter: Balaji Varadarajan
>Assignee: vinoyang
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Property: {{hoodie.datasource.write.keygenerator.class}}, Default: 
> {{org.apache.hudi.SimpleKeyGenerator}}
> Should be {{org.apache.hudi.keygen.SimpleKeyGenerator}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1271) Add utility scripts to perform Restores

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1271:
--
Labels: user-support-issues  (was: )

> Add utility scripts to perform Restores
> ---
>
> Key: HUDI-1271
> URL: https://issues.apache.org/jira/browse/HUDI-1271
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: CLI, Utilities
>Reporter: Balaji Varadarajan
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> We need to expose commands for performing restores.
> We have similar scripts for the cleaner: 
> org.apache.hudi.utilities.HoodieCleaner
> We need to add something similar for restores.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1272) Add utility scripts to manage Savepoints

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1272:
--
Labels: user-support-issues  (was: )

> Add utility scripts to manage Savepoints
> 
>
> Key: HUDI-1272
> URL: https://issues.apache.org/jira/browse/HUDI-1272
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: CLI, Utilities
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> We need to expose commands for managing savepoints.
> We have similar scripts for the cleaner: 
> org.apache.hudi.utilities.HoodieCleaner
> We need to add something similar for savepoints.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1273) Add documentation for performing savepoints and restores

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1273:
--
Labels: user-support-issues  (was: )

> Add documentation for performing savepoints and restores
> 
>
> Key: HUDI-1273
> URL: https://issues.apache.org/jira/browse/HUDI-1273
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Context: [https://github.com/apache/hudi/issues/2072]
> Users need documentation on how to perform restores and  savepoints. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1277) [DOC] Need documentation explaining how to write custom record payload class

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1277:
--
Labels: newbie user-support-issues  (was: newbie)

> [DOC] Need documentation explaining how to write custom record payload class
> 
>
> Key: HUDI-1277
> URL: https://issues.apache.org/jira/browse/HUDI-1277
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Docs, docs-chinese
>Reporter: Balaji Varadarajan
>Assignee: vinoyang
>Priority: Major
>  Labels: newbie, user-support-issues
> Fix For: 0.8.0
>
>
> Context : 
> https://lists.apache.org/thread.html/rd5d805d29c2f704d8ff2729457d27bca42e890bc01fc8e5e1f1943e3%40%3Cdev.hudi.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-1309) Listing Metadata unreadable in S3 as the log block is deemed corrupted

2021-01-22 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270154#comment-17270154
 ] 

sivabalan narayanan edited comment on HUDI-1309 at 1/22/21, 2:17 PM:
-

[~vbalaji]: Is this a user-reported issue? I couldn't find any issue linked. 


was (Author: shivnarayan):
[~vbalaji]: is this a user reported issue? I couldn't find any issue linked. If 
not, do we need to tag this as user-support-issues? 

> Listing Metadata unreadable in S3 as the log block is deemed corrupted
> --
>
> Key: HUDI-1309
> URL: https://issues.apache.org/jira/browse/HUDI-1309
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
>
> When running metadata list-partitions CLI command, I am seeing the below 
> messages and the partition list is empty. Was expecting 10K partitions.
>  
> {code:java}
>  36589 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Scanning 
> log file 
> HoodieLogFile{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0}
>  36590 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Found corrupted block 
> in file 
> HoodieLogFile{pathStr='s3a://robinhood-encrypted-hudi-data-cove/dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} with block size(3723305) running past EOF
>  36684 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Log 
> HoodieLogFile{pathStr='s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} has a corrupted block at 14
>  44515 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.HoodieLogFileReader - Next available block 
> in 
> HoodieLogFile{pathStr='s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045',
>  fileLen=0} starts at 3723319
>  44566 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - Found a 
> corrupt block in 
> s3a:///dev_hudi_tables/balaji_varadarajan/benchmark_1M_10K_partitions/.hoodie/metadata/metadata_partition/.f02585bd-bb02-43f6-8bc8-cec71df87d1e-0_00.log.1_0-23-206045
>  44567 [Spring Shell] INFO 
> org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner - M{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-1308) Issues found during testing RFC-15

2021-01-22 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270156#comment-17270156
 ] 

sivabalan narayanan edited comment on HUDI-1308 at 1/22/21, 2:17 PM:
-

[~vinoth] [~vbalaji]: Can we close this ticket, or is there pending work to be 
done? 


was (Author: shivnarayan):
[~vinoth] [~vbalaji]: Can we close this ticket or is there pending work to be 
done. Also, do we need to tag "user-support-issues" to this? 

> Issues found during testing RFC-15
> --
>
> Key: HUDI-1308
> URL: https://issues.apache.org/jira/browse/HUDI-1308
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
> Fix For: 0.8.0
>
>
> THis is an umbrella ticket containing all the issues found during testing 
> RFC-15



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1278) Need a generic payload class which can skip late arriving data based on specific fields

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1278:
--
Labels: user-support-issues  (was: )

> Need a generic payload class which can skip late arriving data based on 
> specific fields
> ---
>
> Key: HUDI-1278
> URL: https://issues.apache.org/jira/browse/HUDI-1278
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer, Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: shenh062326
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Context : 
> [https://lists.apache.org/thread.html/rd5d805d29c2f704d8ff2729457d27bca42e890bc01fc8e5e1f1943e3%40%3Cdev.hudi.apache.org%3E]
> We need to implement a Payload class (like OverwriteWithLatestAvroPayload) 
> which will skip late arriving data.
> Notes:
>  # combineAndGetUpdateValue() would need work
>  # The ordering needs to be specified based on 1 or more fields and should be 
> configurable.
>  
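A minimal sketch of such a payload, assuming a single ordering field named "ts"; the class name, the hard-coded field, and the base-class wiring are illustrative only (per note 2, the real implementation should make the fields configurable):

{code:java}
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload;
import org.apache.hudi.common.util.Option;

public class SkipLateArrivalsPayload extends OverwriteWithLatestAvroPayload {

  private static final String ORDERING_FIELD = "ts"; // assumed; should be configurable

  public SkipLateArrivalsPayload(GenericRecord record, Comparable orderingVal) {
    super(record, orderingVal);
  }

  @Override
  public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema)
      throws IOException {
    Option<IndexedRecord> incomingOpt = getInsertValue(schema);
    if (!incomingOpt.isPresent()) {
      return Option.empty();
    }
    GenericRecord incoming = (GenericRecord) incomingOpt.get();
    Comparable storedOrdering = (Comparable) ((GenericRecord) currentValue).get(ORDERING_FIELD);
    Comparable incomingOrdering = (Comparable) incoming.get(ORDERING_FIELD);
    if (incomingOrdering.compareTo(storedOrdering) < 0) {
      // The incoming record is older than what is stored: late arrival, keep current.
      return Option.of(currentValue);
    }
    return Option.of(incoming);
  }
}
{code}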



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1363) Provide Option to drop columns after they are used to generate partition or record keys

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1363:
--
Labels:   (was: user-support-issues)

> Provide Option to drop columns after they are used to generate partition or 
> record keys
> ---
>
> Key: HUDI-1363
> URL: https://issues.apache.org/jira/browse/HUDI-1363
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
> Fix For: 0.8.0
>
>
> Context: https://github.com/apache/hudi/issues/2213



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1497) Timeout Exception during getFileStatus()

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1497:
--
Labels:   (was: user-support-issues)

> Timeout Exception during getFileStatus() 
> -
>
> Key: HUDI-1497
> URL: https://issues.apache.org/jira/browse/HUDI-1497
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
>
> Seeing this happening when running the RFC-15 branch in long-running mode. 
> There could be a resource leak, as I am seeing this consistently after every 
> 1- or 2-hour run. The log below shows it happening while accessing the 
> bootstrap index, but I am seeing it in getFileStatus() for other files too.
>  
>  
> Caused by: java.io.InterruptedIOException: getFileStatus on 
> s3://robinhood-encrypted-hudi-data-cove/dummy/balaji/sickle/public/client_ledger_clientledgerbalance/test_v4/.hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile:
>  com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
> waiting for connection from poolCaused by: java.io.InterruptedIOException: 
> getFileStatus on 
> s3://robinhood-encrypted-hudi-data-cove/dummy/balaji/sickle/public/client_ledger_clientledgerbalance/test_v4/.hoodie/.aux/.bootstrap/.partitions/-----0_1-0-1_01.hfile:
>  com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
> waiting for connection from pool at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:141) at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:117) at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1859)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1823)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1763) 
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1627) at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2500) at 
> org.apache.hudi.common.fs.HoodieWrapperFileSystem.exists(HoodieWrapperFileSystem.java:549)
>  at 
> org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.(HFileBootstrapIndex.java:102)
>  ... 33 moreCaused by: com.amazonaws.SdkClientException: Unable to execute 
> HTTP request: Timeout waiting for connection from pool at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1113)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1063)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
>  at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4229) at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4176) at 
> com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1253)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1053)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1841)
>  ... 39 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1381) Schedule compaction based on time elapsed

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1381:
--
Labels: pull-request-available  (was: pull-request-available 
user-support-issues)

> Schedule compaction based on time elapsed 
> --
>
> Key: HUDI-1381
> URL: https://issues.apache.org/jira/browse/HUDI-1381
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Balaji Varadarajan
>Assignee: karl wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> GH : [https://github.com/apache/hudi/issues/2229]
> It would be helpful to introduce configuration to schedule compaction based 
> on time elapsed since last scheduled compaction.
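The trigger itself is small once the wall-clock time of the last scheduled compaction is known; a hedged sketch (how lastCompactionTime is obtained from the timeline is elided):

{code:java}
import java.time.Duration;
import java.time.Instant;

public class TimeBasedCompactionTrigger {

  // True when at least maxDelta has elapsed since the last scheduled compaction.
  public static boolean shouldScheduleCompaction(Instant lastCompactionTime, Duration maxDelta) {
    return Duration.between(lastCompactionTime, Instant.now()).compareTo(maxDelta) >= 0;
  }
}
{code}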



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1452) RocksDB FileSystemView throwing NotSerializableError when embedded timeline server is turned off

2021-01-22 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1452:
--
Labels:   (was: user-support-issues)

> RocksDB FileSystemView throwing NotSerializableError when embedded timeline 
> server is turned off
> 
>
> Key: HUDI-1452
> URL: https://issues.apache.org/jira/browse/HUDI-1452
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Sreeram Ramji
>Priority: Major
>
> [https://github.com/apache/hudi/issues/2321]
>  
> We need to make RocksDBFileSystemView lazily initializable so that it would 
> work seamlessly when run in an executor.
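The usual cure for this class of failure is the transient + lazy-open pattern: keep only the serializable base path, mark the store handle transient, and (re)open it on first use in whichever JVM the view lands in. A sketch with a placeholder handle type (the real field would be the RocksDB accessor):

{code:java}
import java.io.Serializable;

public class LazyRocksDbViewSketch implements Serializable {

  private final String rocksDbBasePath;      // serializable state
  private transient volatile Object rocksDb; // placeholder for the real, non-serializable handle

  public LazyRocksDbViewSketch(String rocksDbBasePath) {
    this.rocksDbBasePath = rocksDbBasePath;
  }

  // Double-checked lazy open: survives a ser/de round trip because the
  // handle is simply reopened after deserialization on the executor.
  private Object getOrOpen() {
    if (rocksDb == null) {
      synchronized (this) {
        if (rocksDb == null) {
          rocksDb = openStore(rocksDbBasePath);
        }
      }
    }
    return rocksDb;
  }

  private static Object openStore(String path) {
    // Open/initialize the store here; elided in this sketch.
    return new Object();
  }
}
{code}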



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

