[GitHub] [hudi] liujinhui1994 commented on issue #2162: [SUPPORT] Deltastreamer transform cannot add fields

2020-10-11 Thread GitBox


liujinhui1994 commented on issue #2162:
URL: https://github.com/apache/hudi/issues/2162#issuecomment-706911463


   @bhasudha can you help me? Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ashishmgofficial commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-11 Thread GitBox


ashishmgofficial commented on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-706904989


   @bvaradar The JSON I provided is the output of the kafkacat utility, which outputs JSON. In our process the Kafka key is a String and the value is Avro. The differing schema comes from the inline data types in kafkacat's JSON output, which Spark reads as-is.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2167)

2020-10-11 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c5e10d6  [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2167)
c5e10d6 is described below

commit c5e10d668f9366f29bdf7721f7efe4140782527b
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sun Oct 11 23:39:10 2020 -0700

[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable (#2167)

Remove APIs in `HoodieTestUtils`
- `createCommitFiles`
- `createDataFile`
- `createNewLogFile`
- `createCompactionRequest`

Migrated usages in `TestCleaner#testPendingCompactions`.

Also improved some API names in `HoodieTestTable`.
---
 .../hudi/cli/integ/ITTestRepairsCommand.java   |   2 +-
 .../java/org/apache/hudi/table/TestCleaner.java| 150 +++--
 .../rollback/TestMarkerBasedRollbackStrategy.java  |   8 +-
 .../hudi/common/testutils/HoodieTestTable.java |  31 -
 .../hudi/common/testutils/HoodieTestUtils.java |  75 ---
 .../hudi/hadoop/TestHoodieROTablePathFilter.java   |   2 +-
 6 files changed, 112 insertions(+), 156 deletions(-)

diff --git a/hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestRepairsCommand.java b/hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestRepairsCommand.java
index f277e33..133dcb0 100644
--- a/hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestRepairsCommand.java
+++ b/hudi-cli/src/test/java/org/apache/hudi/cli/integ/ITTestRepairsCommand.java
@@ -87,7 +87,7 @@ public class ITTestRepairsCommand extends AbstractShellIntegrationTest {
     testTable.addCommit("20160401010101")
         .withInserts(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH, "1", hoodieRecords1)
         .withInserts(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH, "2", hoodieRecords2)
-        .withLogFile(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH);
+        .getFileIdWithLogFile(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH);

     testTable.withInserts(HoodieTestDataGenerator.DEFAULT_SECOND_PARTITION_PATH, "4", hoodieRecords1)
         .withInserts(HoodieTestDataGenerator.DEFAULT_THIRD_PARTITION_PATH, "6", hoodieRecords1);
diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java
index 152a981..00f1ea0 100644
--- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java
+++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java
@@ -51,7 +51,6 @@ import org.apache.hudi.common.table.timeline.versioning.clean.CleanMetadataMigra
 import org.apache.hudi.common.table.timeline.versioning.clean.CleanPlanMigrator;
 import org.apache.hudi.common.table.timeline.versioning.clean.CleanPlanV1MigrationHandler;
 import org.apache.hudi.common.table.view.TableFileSystemView;
-import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
 import org.apache.hudi.common.testutils.HoodieTestTable;
 import org.apache.hudi.common.testutils.HoodieTestUtils;
 import org.apache.hudi.common.util.CleanerUtils;
@@ -155,7 +154,7 @@ public class TestCleaner extends HoodieClientTestBase {
 assertTrue(table.getCompletedCleanTimeline().empty());

 HoodieIndex index = SparkHoodieIndex.createIndex(cfg);
-List taggedRecords = ((JavaRDD)index.tagLocation(jsc.parallelize(records, 1), context, table)).collect();
+List taggedRecords = ((JavaRDD) index.tagLocation(jsc.parallelize(records, 1), context, table)).collect();
 checkTaggedRecords(taggedRecords, newCommitTime);
   }

@@ -550,7 +549,7 @@ public class TestCleaner extends HoodieClientTestBase {
 Map partitionAndFileId002 = testTable.addCommit("02")
 .withBaseFilesInPartition(p0, file1P0C0)
 .withBaseFilesInPartition(p1, file1P1C0)
-.withBaseFilesInPartitions(p0, p1);
+.getFileIdsWithBaseFilesInPartitions(p0, p1);

 List hoodieCleanStatsTwo = runCleaner(config, 1);
 // enableBootstrapSourceClean would delete the bootstrap base file as the same time
@@ -592,7 +591,7 @@ public class TestCleaner extends HoodieClientTestBase {
 // make next commit, with 2 updates to existing files, and 1 insert
 String file3P0C2 = testTable.addCommit("03")
 .withBaseFilesInPartition(p0, file1P0C0, file2P0C1)
-.withBaseFilesInPartitions(p0).get(p0);
+.getFileIdsWithBaseFilesInPartitions(p0).get(p0);
 List hoodieCleanStatsThree = runCleaner(config, 3);
 assertEquals(2,
 getCleanStat(hoodieCleanStatsThree, p0)
@@ -625,7 +624,7 @@ public class TestCleaner extends HoodieClientTestBase {
 String p0 = "2020/01/01";

 // Make 3 files, one base file and 2 log files associate
[GitHub] [hudi] yanghua merged pull request #2167: [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable

2020-10-11 Thread GitBox


yanghua merged pull request #2167:
URL: https://github.com/apache/hudi/pull/2167


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] aniejo commented on pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-10-11 Thread GitBox


aniejo commented on pull request #1760:
URL: https://github.com/apache/hudi/pull/1760#issuecomment-706900383


   @vinothchandar @bschell appreciate any guidance regarding this error; I am using COW hudi options on Spark 3.0:
   pyspark --packages org.apache.hudi:hudi-spark-bundle_2.12:0.6.1-SNAPSHOT,org.apache.spark:spark-avro_2.12:3.0.1
 
   
   df.write.format("hudi"). \
 options(**hudi_options). \
 mode("overwrite"). \
 save(basePath)
 
   On writing the df as hudi parquet, it throws the below error:
   
   to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, 6049cd42243a, executor driver): java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
   at org.apache.hudi.AvroConversionHelper$.createConverterToAvro(AvroConversionHelper.scala:344)
   at org.apache.hudi.AvroConversionUtils$$anonfun$2.apply(AvroConversionUtils.scala:50)
   at org.apache.hudi.AvroConversionUtils$$anonfun$2.apply(AvroConversionUtils.scala:47)
   at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:837)
   at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:837)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   at org.apache.spark.scheduler.Task.run(Task.scala:127)
   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   
   Driver stacktrace:
   at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
   at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
   at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
   at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
   at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
   at scala.Option.foreach(Option.scala:407)
   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
   at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
   at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1423)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
   at org.apache.spark.rdd.RDD.take(RDD.scala:1396)
   at org.apache.spark.rdd.RDD.$anonfun$isEmpty$1(RDD.scala:1531)
   at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
   at org.apache.spark.rdd.RDD.isEmpty(

[GitHub] [hudi] lw309637554 edited a comment on pull request #2173: [HUDI-1339] delete useless import in hudi-spark module

2020-10-11 Thread GitBox


lw309637554 edited a comment on pull request #2173:
URL: https://github.com/apache/hudi/pull/2173#issuecomment-706890444


   > LGTM. looks like we should prioritize checkstyle for scala.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] lw309637554 commented on pull request #2173: [HUDI-1339] delete useless import in hudi-spark module

2020-10-11 Thread GitBox


lw309637554 commented on pull request #2173:
URL: https://github.com/apache/hudi/pull/2173#issuecomment-706890444


   > LGTM. looks like we should prioritize checkstyle for scala.
   
   Yeah. When I enabled checkstyle for Scala first, the Maven style-check build found 501 errors in the hudi-spark module:
   
   error file=/Users/liwei/work-space/dla/opensource/incubator-hudi/hudi-spark/src/test/scala/org/apache/hudi/TestAvroConversionHelper.scala message=There should at least one a single empty line separating groups 3rdParty and spark. line=26 column=0
   error file=/Users/liwei/work-space/dla/opensource/incubator-hudi/hudi-spark/src/test/scala/org/apache/hudi/TestAvroConversionHelper.scala message=org.scalatest. should be in group 3rdParty, not spark. line=27 column=0
   
   Processed 23 file(s)
   Found 501 errors
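
   To illustrate the two violations quoted above, a conforming import block would keep org.scalatest in the 3rdParty group and separate the groups with a single empty line. A sketch (group membership per the project's scalastyle config is an assumption):

   // 3rdParty group
   import org.scalatest.Assertions

   // spark group, separated from the previous group by one empty line
   import org.apache.spark.sql.SparkSession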



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] KarthickAN commented on issue #2154: [SUPPORT] Throwing org.apache.spark.shuffle.FetchFailedException consistently

2020-10-11 Thread GitBox


KarthickAN commented on issue #2154:
URL: https://github.com/apache/hudi/issues/2154#issuecomment-706861733


   @vinothchandar Thanks vinoth. Yes, with the increased memory it had no issues.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (b58daf2 -> c0472d3)

2020-10-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from b58daf2  [MINOR] remove unused generics type (#2163)
 add c0472d3  [HUDI-1184] Fix the support of hbase index partition path 
change (#1978)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/config/HoodieHBaseIndexConfig.java | 18 +
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  4 ++
 .../hudi/index/hbase/SparkHoodieHBaseIndex.java| 76 +-
 .../apache/hudi/index/hbase/TestHBaseIndex.java| 67 ++-
 4 files changed, 133 insertions(+), 32 deletions(-)
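
Judging from the config classes touched above, the fix is presumably toggled through an HBase-index write option. A Scala sketch (the option key name is an assumption taken from the HoodieHBaseIndexConfig changes; verify against that class):

import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch only: table name, path, and the update-partition-path key are
// assumptions for illustration.
def upsertWithHBaseIndex(df: DataFrame, basePath: String): Unit = {
  df.write.format("hudi")
    .option("hoodie.table.name", "my_table")                    // hypothetical table name
    .option("hoodie.index.type", "HBASE")
    .option("hoodie.hbase.index.update.partition.path", "true") // assumed key added by this fix
    .mode(SaveMode.Append)
    .save(basePath)
}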



[GitHub] [hudi] vinothchandar merged pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

2020-10-11 Thread GitBox


vinothchandar merged pull request #1978:
URL: https://github.com/apache/hudi/pull/1978


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

2020-10-11 Thread GitBox


codecov-io edited a comment on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-706814593


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=h1) Report
   > Merging [#1978](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/585ce0094d6527bab988f7657b4e84d12274ee28?el=desc) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1978/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1978      +/-   ##
   ============================================
     Coverage     53.60%   53.60%
   - Complexity     2846     2847       +1
   ============================================
     Files           359      359
     Lines         16548    16546       -2
     Branches       1780     1780
   ============================================
     Hits           8870     8870
   + Misses         6920     6917       -3
   - Partials        758      759       +1
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.71% <ø> (+0.01%)` | `1794.00 <ø> (+1.00)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (ø)` | `304.00 <ø> (ø)` | |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `69.98% <ø> (ø)` | `325.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../main/java/org/apache/hudi/common/util/Option.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT3B0aW9uLmphdmE=) | `66.66% <0.00%> (-3.61%)` | `23.00% <0.00%> (+1.00%)` | :arrow_down: |
   | [...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==) | `76.19% <0.00%> (ø)` | `20.00% <0.00%> (ø%)` | |
   | [...rg/apache/hudi/common/util/SerializationUtils.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU2VyaWFsaXphdGlvblV0aWxzLmphdmE=) | `88.00% <0.00%> (ø)` | `3.00% <0.00%> (ø%)` | |
   | [...e/hudi/exception/SchemaCompatabilityException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGFiaWxpdHlFeGNlcHRpb24uamF2YQ==) | | | |
   | [...e/hudi/exception/SchemaCompatibilityException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGliaWxpdHlFeGNlcHRpb24uamF2YQ==) | `33.33% <0.00%> (ø)` | `1.00% <0.00%> (?%)` | |
   | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | `50.00% <0.00%> (+16.66%)` | `2.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

2020-10-11 Thread GitBox


codecov-io commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-706814593


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=h1) Report
   > Merging [#1978](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/585ce0094d6527bab988f7657b4e84d12274ee28?el=desc) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1978/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1978      +/-   ##
   ============================================
     Coverage     53.60%   53.60%
   - Complexity     2846     2847       +1
   ============================================
     Files           359      359
     Lines         16548    16546       -2
     Branches       1780     1780
   ============================================
     Hits           8870     8870
   + Misses         6920     6917       -3
   - Partials        758      759       +1
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.71% <ø> (+0.01%)` | `1794.00 <ø> (+1.00)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (ø)` | `304.00 <ø> (ø)` | |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `69.98% <ø> (ø)` | `325.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../main/java/org/apache/hudi/common/util/Option.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT3B0aW9uLmphdmE=) | `66.66% <0.00%> (-3.61%)` | `23.00% <0.00%> (+1.00%)` | :arrow_down: |
   | [...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==) | `76.19% <0.00%> (ø)` | `20.00% <0.00%> (ø%)` | |
   | [...rg/apache/hudi/common/util/SerializationUtils.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU2VyaWFsaXphdGlvblV0aWxzLmphdmE=) | `88.00% <0.00%> (ø)` | `3.00% <0.00%> (ø%)` | |
   | [...e/hudi/exception/SchemaCompatabilityException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGFiaWxpdHlFeGNlcHRpb24uamF2YQ==) | | | |
   | [...e/hudi/exception/SchemaCompatibilityException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGliaWxpdHlFeGNlcHRpb24uamF2YQ==) | `33.33% <0.00%> (ø)` | `1.00% <0.00%> (?%)` | |
   | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | `50.00% <0.00%> (+16.66%)` | `2.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] lw309637554 commented on pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution

2020-10-11 Thread GitBox


lw309637554 commented on pull request #2127:
URL: https://github.com/apache/hudi/pull/2127#issuecomment-706813674


   > lagging a bit. Will take a pass today and circle back.
   
   Thanks, please help to review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [MINOR] remove unused generics type (#2163)

2020-10-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b58daf2  [MINOR] remove unused generics type (#2163)
b58daf2 is described below

commit b58daf29ba7f0100d16deb793757d93d073c9a03
Author: dugenkui 
AuthorDate: Mon Oct 12 09:38:42 2020 +0800

[MINOR] remove unused generics type (#2163)
---
 .../apache/hudi/table/action/commit/AbstractMergeHelper.java   |  2 +-
 .../org/apache/hudi/table/action/commit/SparkMergeHelper.java  |  2 +-
 .../org/apache/hudi/io/storage/HoodieFileReaderFactory.java| 10 +++---
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/AbstractMergeHelper.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/AbstractMergeHelper.java
index 1bbffad..8c92b00 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/AbstractMergeHelper.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/AbstractMergeHelper.java
@@ -94,7 +94,7 @@ public abstract class AbstractMergeHelper
-    HoodieFileReader bootstrapReader = HoodieFileReaderFactory.getFileReader(bootstrapFileConfig, externalFilePath);
+    HoodieFileReader bootstrapReader = HoodieFileReaderFactory.getFileReader(bootstrapFileConfig, externalFilePath);
     Schema bootstrapReadSchema;
     if (externalSchemaTransformation) {
       bootstrapReadSchema = bootstrapReader.getSchema();
diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java
index 697b5ac..2d130e3 100644
--- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java
+++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkMergeHelper.java
@@ -76,7 +76,7 @@ public class SparkMergeHelper extends AbstractMer
     }

     BoundedInMemoryExecutor wrapper = null;
-    HoodieFileReader reader = HoodieFileReaderFactory.getFileReader(cfgForHoodieFile, mergeHandle.getOldFilePath());
+    HoodieFileReader reader = HoodieFileReaderFactory.getFileReader(cfgForHoodieFile, mergeHandle.getOldFilePath());
     try {
       final Iterator readerIterator;
       if (baseFile.getBootstrapBaseFile().isPresent()) {
diff --git a/hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileReaderFactory.java b/hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileReaderFactory.java
index 3c97b36..ff559c5 100644
--- a/hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileReaderFactory.java
+++ b/hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileReaderFactory.java
@@ -19,7 +19,6 @@
 package org.apache.hudi.io.storage;

 import org.apache.hudi.common.fs.FSUtils;
-import org.apache.hudi.common.model.HoodieRecordPayload;

 import org.apache.avro.generic.IndexedRecord;
 import org.apache.hadoop.conf.Configuration;
@@ -33,8 +32,7 @@ import static org.apache.hudi.common.model.HoodieFileFormat.HFILE;

 public class HoodieFileReaderFactory {

-  public static  HoodieFileReader getFileReader(
-      Configuration conf, Path path) throws IOException {
+  public static  HoodieFileReader getFileReader(Configuration conf, Path path) throws IOException {
     final String extension = FSUtils.getFileExtension(path.toString());
     if (PARQUET.getFileExtension().equals(extension)) {
       return newParquetFileReader(conf, path);
@@ -46,13 +44,11 @@ public class HoodieFileReaderFactory {
     throw new UnsupportedOperationException(extension + " format not supported yet.");
   }

-  private static  HoodieFileReader newParquetFileReader(
-      Configuration conf, Path path) throws IOException {
+  private static  HoodieFileReader newParquetFileReader(Configuration conf, Path path) {
     return new HoodieParquetReader<>(conf, path);
   }

-  private static  HoodieFileReader newHFileFileReader(
-      Configuration conf, Path path) throws IOException {
+  private static  HoodieFileReader newHFileFileReader(Configuration conf, Path path) throws IOException {
     CacheConfig cacheConfig = new CacheConfig(conf);
     return new HoodieHFileReader<>(conf, path, cacheConfig);
   }
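
For orientation, a small Scala sketch of the factory call sites shown above; the file path is a placeholder, and depending on the exact post-change signature an explicit type parameter may need to be supplied:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hudi.io.storage.HoodieFileReaderFactory

// Sketch only: the factory dispatches on the file extension
// (.parquet vs .hfile), as in the getFileReader switch above.
val conf = new Configuration()
val reader = HoodieFileReaderFactory.getFileReader(conf, new Path("/tmp/hudi/2020/01/01/some-file.parquet"))
val schema = reader.getSchema // e.g. what AbstractMergeHelper reads when externalSchemaTransformation is set
reader.close()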



[GitHub] [hudi] vinothchandar merged pull request #2163: [MINOR] Remove unused generics type

2020-10-11 Thread GitBox


vinothchandar merged pull request #2163:
URL: https://github.com/apache/hudi/pull/2163


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (032bc3b -> 2126f13)

2020-10-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 032bc3b  [MINOR] NPE Optimization for Option (#2158)
 add 2126f13  [HUDI-791]  Replace null by Option in Delta Streamer (#2171)

No new revisions were added by this update.

Summary of changes:
 .../deltastreamer/HoodieDeltaStreamer.java | 57 ++
 .../HoodieMultiTableDeltaStreamer.java |  3 +-
 .../functional/TestHoodieDeltaStreamer.java|  2 +-
 3 files changed, 28 insertions(+), 34 deletions(-)



[GitHub] [hudi] vinothchandar merged pull request #2171: [HUDI-791] Replace null by Option in Delta Streamer

2020-10-11 Thread GitBox


vinothchandar merged pull request #2171:
URL: https://github.com/apache/hudi/pull/2171


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

2020-10-11 Thread GitBox


vinothchandar commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-706806558







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: Travis CI build asf-site

2020-10-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 17c7e95  Travis CI build asf-site
17c7e95 is described below

commit 17c7e95b43a81c1b80b0343724567a191953
Author: CI 
AuthorDate: Mon Oct 12 00:56:58 2020 +

Travis CI build asf-site
---
 content/contributing.html | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/content/contributing.html b/content/contributing.html
index 73fa4bc..7d0bbce 100644
--- a/content/contributing.html
+++ b/content/contributing.html
@@ -470,6 +470,36 @@ an open source license https://www.apache.org/legal/resolved.html#crite
   
 
 
+Tests
+
+
+  Categories
+
+  unit - testing basic functionality at the class level, potentially using mocks. Expected to finish quicker
+  functional - brings up the services needed and runs test without mocking
+  integration - runs subset of functional tests, on a full fledged enviroment with dockerized services
+
+  
+  Prepare Test Data
+
+  Many unit and functional test cases require a Hudi dataset to be prepared beforehand. HoodieTestTable and HoodieWriteableTestTable are dedicated test utility classes for this purpose. Use them whenever appropriate, and add new APIs to them when needed.
+  When add new APIs in the test utility classes, overload APIs with variety of arguments to do more heavy-liftings for callers.
+  In most scenarios, you won't need to use FileCreateUtils directly.
+  If test cases require interaction with actual HoodieRecords, use HoodieWriteableTestTable (and HoodieTestDataGenerator probably). Otherwise, HoodieTestTable that manipulates empty files shall serve the purpose.
+
+  
+  Strive for Readability
+
+  Avoid writing flow controls for different assertion cases. Split to a new test case when appropriate.
+  Use plain for-loop to avoid try-catch in lambda block. Declare exceptions is okay.
+  Use static import for constants and static helper methods to avoid lengthy code.
+  Avoid reusing local variable names. Create new variables generously.
+  Keep helper methods local to the test class until it becomes obviously generic and re-useable. When that happens, move the helper method to the right utility class. For example, Assertions contains common assert helpers, and SchemaTestUtil is for schema related helpers.
+  Avoid putting new helpers in HoodieTestUtils and HoodieClientTestUtils, which are named too generic. Eventually, all test helpers shall be categorized properly.
+
+  
+
+
 Reviewing Code/RFCs
 
 



[hudi] branch master updated: [MINOR] NPE Optimization for Option (#2158)

2020-10-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 032bc3b  [MINOR] NPE Optimization for Option (#2158)
032bc3b is described below

commit 032bc3b08fe61ff23c8a1002d78a8197893d4f89
Author: dugenkui 
AuthorDate: Mon Oct 12 08:55:41 2020 +0800

[MINOR] NPE Optimization for Option (#2158)
---
 hudi-common/src/main/java/org/apache/hudi/common/util/Option.java | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/util/Option.java b/hudi-common/src/main/java/org/apache/hudi/common/util/Option.java
index a67b6ab..42d6057 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/util/Option.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/util/Option.java
@@ -98,6 +98,9 @@ public final class Option implements Serializable {
   }
 
   public  Option map(Function mapper) {
+if (null == mapper) {
+  throw new NullPointerException("mapper should not be null");
+}
 if (!isPresent()) {
   return empty();
 } else {
@@ -140,6 +143,8 @@ public final class Option implements Serializable {
 
   @Override
   public String toString() {
-return "Option{val=" + val + '}';
+return val != null
+? "Option{val=" + val + "}"
+: "Optional.empty";
   }
 }
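
To illustrate the changed behavior, a small sketch against org.apache.hudi.common.util.Option in Scala; Scala 2.12 SAM conversion to java.util.function.Function is assumed:

import org.apache.hudi.common.util.{Option => HOption}

object OptionDemo extends App {
  println(HOption.of("hudi").map[String]((s: String) => s.toUpperCase)) // Option{val=HUDI}
  println(HOption.empty[String]())                                      // Optional.empty, via the new toString
  // HOption.of("x").map(null) now fails fast with
  // NullPointerException("mapper should not be null") instead of failing later.
}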



[GitHub] [hudi] vinothchandar merged pull request #2158: [MINOR]Optimization for Option

2020-10-11 Thread GitBox


vinothchandar merged pull request #2158:
URL: https://github.com/apache/hudi/pull/2158


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: [HUDI-1034][DOCS] Add code guidelines for writing tests (#2169)

2020-10-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new f550175  [HUDI-1034][DOCS] Add code guidelines for writing tests 
(#2169)
f550175 is described below

commit f550175e6abd56552a1b3dce96ea22a0ae1b7d08
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Sun Oct 11 17:54:50 2020 -0700

[HUDI-1034][DOCS] Add code guidelines for writing tests (#2169)
---
 docs/_pages/contributing.md | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/docs/_pages/contributing.md b/docs/_pages/contributing.md
index bb30a84..7db925a 100644
--- a/docs/_pages/contributing.md
+++ b/docs/_pages/contributing.md
@@ -184,6 +184,24 @@ of how we want to evolve our code in the future.
 - Any changes to methods annotated with `PublicAPIMethod` or classes annotated with `PublicAPIClass` require upfront discussion and potentially an RFC.
 - Any non-backwards compatible changes similarly need upfront discussion and the functionality needs to implement an upgrade-downgrade path.
 
+ Tests
+
+- **Categories**
+- unit - testing basic functionality at the class level, potentially using mocks. Expected to finish quicker
+- functional - brings up the services needed and runs test without mocking
+- integration - runs subset of functional tests, on a full fledged enviroment with dockerized services
+- **Prepare Test Data**
+- Many unit and functional test cases require a Hudi dataset to be prepared beforehand. `HoodieTestTable` and `HoodieWriteableTestTable` are dedicated test utility classes for this purpose. Use them whenever appropriate, and add new APIs to them when needed.
+- When add new APIs in the test utility classes, overload APIs with variety of arguments to do more heavy-liftings for callers.
+- In most scenarios, you won't need to use `FileCreateUtils` directly.
+- If test cases require interaction with actual `HoodieRecord`s, use `HoodieWriteableTestTable` (and `HoodieTestDataGenerator` probably). Otherwise, `HoodieTestTable` that manipulates empty files shall serve the purpose.
+- **Strive for Readability**
+- Avoid writing flow controls for different assertion cases. Split to a new test case when appropriate.
+- Use plain for-loop to avoid try-catch in lambda block. Declare exceptions is okay.
+- Use static import for constants and static helper methods to avoid lengthy code.
+- Avoid reusing local variable names. Create new variables generously.
+- Keep helper methods local to the test class until it becomes obviously generic and re-useable. When that happens, move the helper method to the right utility class. For example, `Assertions` contains common assert helpers, and `SchemaTestUtil` is for schema related helpers.
+- Avoid putting new helpers in `HoodieTestUtils` and `HoodieClientTestUtils`, which are named too generic. Eventually, all test helpers shall be categorized properly.
 
 ### Reviewing Code/RFCs
 



[GitHub] [hudi] vinothchandar merged pull request #2169: [HUDI-1034][DOCS] Add code guidelines for writing tests

2020-10-11 Thread GitBox


vinothchandar merged pull request #2169:
URL: https://github.com/apache/hudi/pull/2169


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2147: [HUDI-1289] Remove shading pattern for hbase dependencies in hudi-spark-bundle

2020-10-11 Thread GitBox


vinothchandar commented on pull request #2147:
URL: https://github.com/apache/hudi/pull/2147#issuecomment-706799665


   @rmpifer if you can confirm the above, we can land this. otherwise LGTM 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [MINOR] Fix typo and others (#2164)

2020-10-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new d4d4c8c  [MINOR] Fix typo and others (#2164)
d4d4c8c is described below

commit d4d4c8c8994d4da988005b7930b797213aed4303
Author: dugenkui 
AuthorDate: Mon Oct 12 08:52:44 2020 +0800

[MINOR] Fix typo and others (#2164)


* remove HoodieSerializationException that will never be throw
* remove unused method, make HoodieException more readable
* fix typo
---
 .../src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java   | 4 ++--
 .../org/apache/hudi/common/table/log/HoodieLogFileReader.java | 2 +-
 .../main/java/org/apache/hudi/common/util/SerializationUtils.java | 3 ---
 .../src/main/java/org/apache/hudi/exception/HoodieException.java  | 8 
 ...patabilityException.java => SchemaCompatibilityException.java} | 8 
 .../src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java   | 4 ++--
 6 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java b/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
index 422d75c..a7517a4 100644
--- a/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
+++ b/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
@@ -23,7 +23,7 @@ import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.exception.HoodieException;
 import org.apache.hudi.exception.HoodieIOException;
-import org.apache.hudi.exception.SchemaCompatabilityException;
+import org.apache.hudi.exception.SchemaCompatibilityException;

 import org.apache.avro.Conversions.DecimalConversion;
 import org.apache.avro.JsonProperties;
@@ -321,7 +321,7 @@ public class HoodieAvroUtils {
   }
 }
 if (!GenericData.get().validate(newSchema, newRecord)) {
-  throw new SchemaCompatabilityException(
+  throw new SchemaCompatibilityException(
   "Unable to validate the rewritten record " + record + " against schema " + newSchema);
 }
 return newRecord;
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
index 5d2e185..27884ec 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java
@@ -319,7 +319,7 @@ public class HoodieLogFileReader implements HoodieLogFormat.Reader {
   boolean hasMagic = hasNextMagic();
   if (!hasMagic) {
 throw new CorruptedLogFileException(
-logFile + "could not be read. Did not find the magic bytes at the start of the block");
+logFile + " could not be read. Did not find the magic bytes at the start of the block");
   }
   return hasMagic;
 } catch (EOFException e) {
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/util/SerializationUtils.java b/hudi-common/src/main/java/org/apache/hudi/common/util/SerializationUtils.java
index 9d075bb..9041db5 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/util/SerializationUtils.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/util/SerializationUtils.java
@@ -18,8 +18,6 @@

 package org.apache.hudi.common.util;

-import org.apache.hudi.exception.HoodieSerializationException;
-
 import com.esotericsoftware.kryo.Kryo;
 import com.esotericsoftware.kryo.io.Input;
 import com.esotericsoftware.kryo.io.Output;
@@ -72,7 +70,6 @@ public class SerializationUtils {
 * @param objectData the serialized object, must not be null
 * @return the deserialized object
 * @throws IllegalArgumentException if {@code objectData} is {@code null}
-   * @throws HoodieSerializationException (runtime) if the serialization fails
 */
   public static  T deserialize(final byte[] objectData) {
 if (objectData == null) {
diff --git a/hudi-common/src/main/java/org/apache/hudi/exception/HoodieException.java b/hudi-common/src/main/java/org/apache/hudi/exception/HoodieException.java
index 2b86dc6..58adc0e 100644
--- a/hudi-common/src/main/java/org/apache/hudi/exception/HoodieException.java
+++ b/hudi-common/src/main/java/org/apache/hudi/exception/HoodieException.java
@@ -47,12 +47,4 @@ public class HoodieException extends RuntimeException implements Serializable {
 super(t);
   }

-  protected static String format(String message, Object... args) {
-String[] argStrings = new String[args.length];
-for (int i = 0; i < args.length; i += 1) {
-  argStrings[i] = String.valueOf(args[i]);
-}
-return String.format(String.valueOf(message), (Object[]) argStrings);
-  }
-
 }
diff --git a/hudi-common/src/main/ja

[GitHub] [hudi] vinothchandar merged pull request #2164: [MINOR] Fix typo and others

2020-10-11 Thread GitBox


vinothchandar merged pull request #2164:
URL: https://github.com/apache/hudi/pull/2164


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #1946: [HUDI-1176]Support log4j2 config

2020-10-11 Thread GitBox


vinothchandar commented on pull request #1946:
URL: https://github.com/apache/hudi/pull/1946#issuecomment-706792241


   I am a bit confused about this PR.
   I checked out log4j 2, and it seems to have some nice advantages. Should we just move to log4j 2?
   https://logging.apache.org/log4j/2.x/
   https://logging.apache.org/log4j/2.x/manual/messages.html 
   https://logging.apache.org/log4j/2.x/manual/api.html#LambdaSupport
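
   For instance, the lambda support linked above lets message construction be deferred until the level check passes. A sketch with log4j-api 2.x on the classpath; Scala 2.12 SAM conversion to log4j's Supplier is assumed:

   import org.apache.logging.log4j.LogManager
   import org.apache.logging.log4j.util.Supplier

   object Log4j2Demo extends App {
     private val logger = LogManager.getLogger(getClass)

     // Stand-in for a computation too costly for the hot path.
     private def expensiveSummary(): String = (1 to 1000).map(_.toString).mkString(",")

     // The Supplier-based overload evaluates expensiveSummary() only when
     // DEBUG is actually enabled.
     val msg: Supplier[String] = () => s"state: ${expensiveSummary()}"
     logger.debug(msg)
   }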
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-10-11 Thread GitBox


vinothchandar commented on pull request #1760:
URL: https://github.com/apache/hudi/pull/1760#issuecomment-706791396


   @aniejo there are some known issues since some spark APIs have changed in 3. 
   
   @bschell any updates for us? This is being requested heavily, love to do 
this sooner if possible. are we blocked on something? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (585ce00 -> 86db4da)

2020-10-11 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 585ce00  [HUDI-1301] use spark INCREMENTAL mode query hudi dataset support schema version. (#2125)
 add 86db4da  [HUDI-1339] delete useless import in hudi-spark module (#2173)

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala| 2 +-
 .../src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala| 4 ++--
 .../src/main/scala/org/apache/hudi/HoodieStreamingSink.scala | 4 ++--
 .../src/main/scala/org/apache/hudi/IncrementalRelation.scala | 9 +
 4 files changed, 6 insertions(+), 13 deletions(-)



[GitHub] [hudi] vinothchandar merged pull request #2173: [HUDI-1339] delete useless import in hudi-spark module

2020-10-11 Thread GitBox


vinothchandar merged pull request #2173:
URL: https://github.com/apache/hudi/pull/2173


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2020-10-11 Thread GitBox


bvaradar commented on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-706764303


   @ashishmgofficial : It looks like the JSON data and the Avro schema are not matching correctly. When I read the file through Spark directly (please see below), I get a different schema than the one you provided. This is because Debezium is configured to write in "JSON_SCHEMA" mode, which I think is the default. This has both data and schema inlined and is inefficient in space.
   
   Since you are actually managing Avro schemas, can you configure Debezium to write Avro records directly rather than JSON? In my experiments (with a custom schema), I saw an 8x speedup in Debezium by changing the format from json_schema to Avro. If you still want to write as JSON, disable the inline schema by setting the below Debezium configs to false:
   key.converter.schemas.enable
  value.converter.schemas.enable
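
   In Kafka Connect terms, these map to converter settings along these lines (a sketch of connector/worker properties; the stock JsonConverter class is assumed):

   # Sketch: disable the inline JSON schema envelope for key and value.
   key.converter=org.apache.kafka.connect.json.JsonConverter
   value.converter=org.apache.kafka.connect.json.JsonConverter
   key.converter.schemas.enable=false
   value.converter.schemas.enable=false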
   
   
   ==
   
   scala> val df = spark.read.json("file:///var/hoodie/ws/docker/inp.json")
   df: org.apache.spark.sql.DataFrame = [after: struct, flag: struct ... 5 more fields>>, before: string ... 4 more fields]
   
   scala> df.printSchema()
   root
    |-- after: struct (nullable = true)
    |    |-- Value: struct (nullable = true)
    |    |    |-- case_individual_id: struct (nullable = true)
    |    |    |    |-- int: long (nullable = true)
    |    |    |-- flag: struct (nullable = true)
    |    |    |    |-- string: string (nullable = true)
    |    |    |-- inc_id: long (nullable = true)
    |    |    |-- last_modified_ts: long (nullable = true)
    |    |    |-- violation_code: struct (nullable = true)
    |    |    |    |-- string: string (nullable = true)
    |    |    |-- violation_desc: struct (nullable = true)
    |    |    |    |-- string: string (nullable = true)
    |    |    |-- year: struct (nullable = true)
    |    |    |    |-- int: long (nullable = true)
    |-- before: string (nullable = true)
    |-- op: string (nullable = true)
    |-- source: struct (nullable = true)
    |    |-- connector: string (nullable = true)
    |    |-- db: string (nullable = true)
    |    |-- lsn: struct (nullable = true)
    |    |    |-- long: long (nullable = true)
    |    |-- name: string (nullable = true)
    |    |-- schema: string (nullable = true)
    |    |-- snapshot: struct (nullable = true)
    |    |    |-- string: string (nullable = true)
    |    |-- table: string (nullable = true)
    |    |-- ts_ms: long (nullable = true)
    |    |-- txId: struct (nullable = true)
    |    |    |-- long: long (nullable = true)
    |    |-- version: string (nullable = true)
    |    |-- xmin: string (nullable = true)
    |-- transaction: string (nullable = true)
    |-- ts_ms: struct (nullable = true)
    |    |-- long: long (nullable = true)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-1340) Not able to query real time table when rows contains nested elements

2020-10-11 Thread Bharat Dighe (Jira)
Bharat Dighe created HUDI-1340:
--

 Summary: Not able to query real time table when rows contains nested elements
 Key: HUDI-1340
 URL: https://issues.apache.org/jira/browse/HUDI-1340
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Bharat Dighe
 Attachments: create_avro.py, user.avsc, users1.avro, users2.avro, users3.avro, users4.avro, users5.avro

AVRO schema: Attached

Script to generate sample data: attached

Sample data attached

==

the schema as nested elements, here is the output from hive
{code:java}
  CREATE EXTERNAL TABLE `users_mor_rt`( 
 `_hoodie_commit_time` string, 
 `_hoodie_commit_seqno` string, 
 `_hoodie_record_key` string, 
 `_hoodie_partition_path` string, 
 `_hoodie_file_name` string, 
 `name` string, 
 `userid` int, 
 `datehired` string, 
 `meta` struct, 
 `experience` struct>>) 
 PARTITIONED BY ( 
 `role` string) 
 ROW FORMAT SERDE 
 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
 STORED AS INPUTFORMAT 
 'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' 
 OUTPUTFORMAT 
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
 LOCATION 
 'hdfs://namenode:8020/tmp/hudi_repair_order_mor' 
 TBLPROPERTIES ( 
 'last_commit_time_sync'='20201011190954', 
 'transient_lastDdlTime'='1602442906')
{code}
scala code:
{code:java}
import java.io.File

import org.apache.hudi.QuickstartUtils._
import org.apache.spark.sql.SaveMode._
import org.apache.avro.Schema
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._


val tableName = "users_mor"
//  val basePath = "hdfs:///tmp/hudi_repair_order_mor"
val basePath = "hdfs:///tmp/hudi_repair_order_mor"

//  Insert Data

/// local not hdfs !!!
//val schema = new Schema.Parser().parse(new File("/var/hoodie/ws/docker/demo/data/user/user.avsc"))


def updateHudi(num: String, op: String) = {
  val path = "hdfs:///var/demo/data/user/users" + num + ".avro"
  println(path)

  val avdf2 = new org.apache.spark.sql.SQLContext(sc).read.format("avro").
    // option("avroSchema", schema.toString).
    load(path)
  avdf2.select("name").show(false)

  avdf2.write.format("hudi").
    options(getQuickstartWriteConfigs).
    option(OPERATION_OPT_KEY, op).
    option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ"). // default:COPY_ON_WRITE, MERGE_ON_READ
    option(KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator").
    option(PRECOMBINE_FIELD_OPT_KEY, "meta.ingestTime").   // dedup
    option(RECORDKEY_FIELD_OPT_KEY, "userId").   // key
    option(PARTITIONPATH_FIELD_OPT_KEY, "role").
    option(TABLE_NAME, tableName).
    option("hoodie.compact.inline", false).
    option(HIVE_STYLE_PARTITIONING_OPT_KEY, "true").
    option(HIVE_SYNC_ENABLED_OPT_KEY, "true").
    option(HIVE_TABLE_OPT_KEY, tableName).
    option(HIVE_USER_OPT_KEY, "hive").
    option(HIVE_PASS_OPT_KEY, "hive").
    option(HIVE_URL_OPT_KEY, "jdbc:hive2://hiveserver:1").
    option(HIVE_PARTITION_FIELDS_OPT_KEY, "role").
    option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, "org.apache.hudi.hive.MultiPartKeysValueExtractor").
    option("hoodie.datasource.hive_sync.assume_date_partitioning", "false").
    mode(Append).
    save(basePath)

  spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, experience.companies[0] from " + tableName + "_rt").show()
  spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, _hoodie_commit_seqno from " + tableName + "_ro").show()
}


updateHudi("1", "bulkinsert")
updateHudi("2", "upsert")
updateHudi("3", "upsert")
updateHudi("4", "upsert")
{code}

If nested fields are not included, it works fine
{code}
scala> spark.sql("select name from users_mor_rt");
res19: org.apache.spark.sql.DataFrame = [name: string]

scala> spark.sql("select name from users_mor_rt").show();
+---------+
|     name|
+---------+
|    engg3|
|engg1_new|
|engg2_new|
|     mgr1|
|     mgr2|
|  devops1|
|  devops2|
+---------+
{code}

But it fails when I include the nested field 'experience'
{code}
scala> spark.sql("select name, experience from users_mor_rt").show();

20/10/11 19:53:58 ERROR executor.Executor: Exception in task 0.0 in stage 147.0 
(TID 153)
java.lang.UnsupportedOperationException: Cannot inspect 
org.apache.hadoop.io.Text
at 
org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:152)
at 
org.apache.spark.sql.hive.HiveInspectors$$anonfun$4$$anonfun$apply$7.apply(HiveInspectors.scala:688)
at 
org.apache.spark.sql.hive.HiveInspectors$$anonfun$unwrapperFor$41$$anonfun$apply$8.apply(HiveInspectors.scala:692)
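
For comparison while the serde path is debugged, a minimal sketch of reading the same MOR snapshot through the Spark datasource instead of the Hive serde (assumes a Hudi build whose datasource supports MOR snapshot queries; option names from DataSourceReadOptions):
{code:java}
import org.apache.hudi.DataSourceReadOptions._

// Let Hudi's own relation merge base and log files instead of the
// realtime Hive serde; "/*" globs the role partitions under basePath.
val snapshotDF = spark.read.format("hudi").
  option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_SNAPSHOT_OPT_VAL).
  load(basePath + "/*")

snapshotDF.select("name", "experience").show(false)
{code}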

[jira] [Updated] (HUDI-1340) Not able to query real time table when rows contains nested elements

2020-10-11 Thread Bharat Dighe (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Dighe updated HUDI-1340:
---
Attachment: users5.avro
users4.avro
users3.avro
users2.avro
users1.avro

> Not able to query real time table when rows contains nested elements
> 
>
> Key: HUDI-1340
> URL: https://issues.apache.org/jira/browse/HUDI-1340
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Bharat Dighe
>Priority: Major
> Attachments: create_avro.py, user.avsc, users1.avro, users2.avro, 
> users3.avro, users4.avro, users5.avro
>
>
> AVRO schema: Attached
> Script to generate sample data: attached
> Sample data attached
> ==
> the schema has nested elements; here is the output from Hive
> {code:java}
>   CREATE EXTERNAL TABLE `users_mor_rt`( 
>  `_hoodie_commit_time` string, 
>  `_hoodie_commit_seqno` string, 
>  `_hoodie_record_key` string, 
>  `_hoodie_partition_path` string, 
>  `_hoodie_file_name` string, 
>  `name` string, 
>  `userid` int, 
>  `datehired` string, 
>  `meta` struct, 
>  `experience` 
> struct>>) 
>  PARTITIONED BY ( 
>  `role` string) 
>  ROW FORMAT SERDE 
>  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
>  STORED AS INPUTFORMAT 
>  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' 
>  OUTPUTFORMAT 
>  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  LOCATION 
>  'hdfs://namenode:8020/tmp/hudi_repair_order_mor' 
>  TBLPROPERTIES ( 
>  'last_commit_time_sync'='20201011190954', 
>  'transient_lastDdlTime'='1602442906')
> {code}
> Scala code:
> {code:java}
> import java.io.File
> import org.apache.hudi.QuickstartUtils._
> import org.apache.spark.sql.SaveMode._
> import org.apache.avro.Schema
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "users_mor"
> //  val basePath = "hdfs:///tmp/hudi_repair_order_mor"
> val basePath = "hdfs:///tmp/hudi_repair_order_mor"
> //  Insert Data
> /// local not hdfs !!!
> //val schema = new Schema.Parser().parse(new 
> File("/var/hoodie/ws/docker/demo/data/user/user.avsc"))
> def updateHudi( num:String, op:String) = {
> val path = "hdfs:///var/demo/data/user/users" + num + ".avro"
> println( path );
> val avdf2 =  new org.apache.spark.sql.SQLContext(sc).read.format("avro").
> // option("avroSchema", schema.toString).
> load(path)
> avdf2.select("name").show(false)
> avdf2.write.format("hudi").
> options(getQuickstartWriteConfigs).
> option(OPERATION_OPT_KEY,op).
> option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ"). // 
> default:COPY_ON_WRITE, MERGE_ON_READ
> option(KEYGENERATOR_CLASS_OPT_KEY, 
> "org.apache.hudi.keygen.ComplexKeyGenerator").
> option(PRECOMBINE_FIELD_OPT_KEY, "meta.ingestTime").   // dedup
> option(RECORDKEY_FIELD_OPT_KEY, "userId").   // key
> option(PARTITIONPATH_FIELD_OPT_KEY, "role").
> option(TABLE_NAME, tableName).
> option("hoodie.compact.inline", false).
> option(HIVE_STYLE_PARTITIONING_OPT_KEY, "true").
> option(HIVE_SYNC_ENABLED_OPT_KEY, "true").
> option(HIVE_TABLE_OPT_KEY, tableName).
> option(HIVE_USER_OPT_KEY, "hive").
> option(HIVE_PASS_OPT_KEY, "hive").
> option(HIVE_URL_OPT_KEY, "jdbc:hive2://hiveserver:1").
> option(HIVE_PARTITION_FIELDS_OPT_KEY, "role").
> option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
> "org.apache.hudi.hive.MultiPartKeysValueExtractor").
> option("hoodie.datasource.hive_sync.assume_date_partitioning", 
> "false").
> mode(Append).
> save(basePath)
> spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, 
> _hoodie_partition_path, experience.companies[0] from " + tableName + 
> "_rt").show()
> spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, 
> _hoodie_partition_path, _hoodie_commit_seqno from " + tableName + 
> "_ro").show()
> }
> updateHudi("1", "bulkinsert")
> updateHudi("2", "upsert")
> updateHudi("3", "upsert")
> updateHudi("4", "upsert")
> {code}
> If nested fields are not included, it works fine
> {code}
> scala> spark.sql("select name from users_mor_rt");
> res19: org.apache.spark.sql.DataFrame = [name: string]
> scala> spark.sql("select name from users_mor_rt").show();
> +---------+
> |     name|
> +---------+
> |    engg3|
> |engg1_new|
> |engg2_new|
> |     mgr1|
> |     mgr2|
> |  devops1|
> |  devops2|
> +---------+
> {code}
> But it fails when I include the nested field 'experience'
> {code}
> scala> spark.sql("select name, experience from users_mor_rt").show();
> 2

[jira] [Updated] (HUDI-1340) Not able to query real time table when rows contains nested elements

2020-10-11 Thread Bharat Dighe (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Dighe updated HUDI-1340:
---
Attachment: user.avsc
create_avro.py

> Not able to query real time table when rows contains nested elements
> 
>
> Key: HUDI-1340
> URL: https://issues.apache.org/jira/browse/HUDI-1340
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Bharat Dighe
>Priority: Major
> Attachments: create_avro.py, user.avsc, users1.avro, users2.avro, 
> users3.avro, users4.avro, users5.avro
>
>
> AVRO schema: Attached
> Script to generate sample data: attached
> Sample data attached
> ==
> the schema has nested elements; here is the output from Hive
> {code:java}
>   CREATE EXTERNAL TABLE `users_mor_rt`( 
>  `_hoodie_commit_time` string, 
>  `_hoodie_commit_seqno` string, 
>  `_hoodie_record_key` string, 
>  `_hoodie_partition_path` string, 
>  `_hoodie_file_name` string, 
>  `name` string, 
>  `userid` int, 
>  `datehired` string, 
>  `meta` struct, 
>  `experience` 
> struct>>) 
>  PARTITIONED BY ( 
>  `role` string) 
>  ROW FORMAT SERDE 
>  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
>  STORED AS INPUTFORMAT 
>  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' 
>  OUTPUTFORMAT 
>  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
>  LOCATION 
>  'hdfs://namenode:8020/tmp/hudi_repair_order_mor' 
>  TBLPROPERTIES ( 
>  'last_commit_time_sync'='20201011190954', 
>  'transient_lastDdlTime'='1602442906')
> {code}
> Scala code:
> {code:java}
> import java.io.File
> import org.apache.hudi.QuickstartUtils._
> import org.apache.spark.sql.SaveMode._
> import org.apache.avro.Schema
> import org.apache.hudi.DataSourceReadOptions._
> import org.apache.hudi.DataSourceWriteOptions._
> import org.apache.hudi.config.HoodieWriteConfig._
> val tableName = "users_mor"
> //  val basePath = "hdfs:///tmp/hudi_repair_order_mor"
> val basePath = "hdfs:///tmp/hudi_repair_order_mor"
> //  Insert Data
> /// local not hdfs !!!
> //val schema = new Schema.Parser().parse(new 
> File("/var/hoodie/ws/docker/demo/data/user/user.avsc"))
> def updateHudi( num:String, op:String) = {
> val path = "hdfs:///var/demo/data/user/users" + num + ".avro"
> println( path );
> val avdf2 =  new org.apache.spark.sql.SQLContext(sc).read.format("avro").
> // option("avroSchema", schema.toString).
> load(path)
> avdf2.select("name").show(false)
> avdf2.write.format("hudi").
> options(getQuickstartWriteConfigs).
> option(OPERATION_OPT_KEY,op).
> option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ"). // 
> default:COPY_ON_WRITE, MERGE_ON_READ
> option(KEYGENERATOR_CLASS_OPT_KEY, 
> "org.apache.hudi.keygen.ComplexKeyGenerator").
> option(PRECOMBINE_FIELD_OPT_KEY, "meta.ingestTime").   // dedup
> option(RECORDKEY_FIELD_OPT_KEY, "userId").   // key
> option(PARTITIONPATH_FIELD_OPT_KEY, "role").
> option(TABLE_NAME, tableName).
> option("hoodie.compact.inline", false).
> option(HIVE_STYLE_PARTITIONING_OPT_KEY, "true").
> option(HIVE_SYNC_ENABLED_OPT_KEY, "true").
> option(HIVE_TABLE_OPT_KEY, tableName).
> option(HIVE_USER_OPT_KEY, "hive").
> option(HIVE_PASS_OPT_KEY, "hive").
> option(HIVE_URL_OPT_KEY, "jdbc:hive2://hiveserver:1").
> option(HIVE_PARTITION_FIELDS_OPT_KEY, "role").
> option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
> "org.apache.hudi.hive.MultiPartKeysValueExtractor").
> option("hoodie.datasource.hive_sync.assume_date_partitioning", 
> "false").
> mode(Append).
> save(basePath)
> spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, 
> _hoodie_partition_path, experience.companies[0] from " + tableName + 
> "_rt").show()
> spark.sql("select name, _hoodie_commit_time, _hoodie_record_key, 
> _hoodie_partition_path, _hoodie_commit_seqno from " + tableName + 
> "_ro").show()
> }
> updateHudi("1", "bulkinsert")
> updateHudi("2", "upsert")
> updateHudi("3", "upsert")
> updateHudi("4", "upsert")
> {code}
> If nested fields are not included, it works fine
> {code}
> scala> spark.sql("select name from users_mor_rt");
> res19: org.apache.spark.sql.DataFrame = [name: string]
> scala> spark.sql("select name from users_mor_rt").show();
> +---------+
> |     name|
> +---------+
> |    engg3|
> |engg1_new|
> |engg2_new|
> |     mgr1|
> |     mgr2|
> |  devops1|
> |  devops2|
> +---------+
> {code}
> But it fails when I include the nested field 'experience'
> {code}
> scala> spark.sql("select name, experience from users_mor_rt").show();
> 20/10/11 19:53:58 ERROR executor.Executor: Exception in task 0.0 in stage 
> 147.0 (

[jira] [Issue Comment Deleted] (HUDI-102) Beeline/Hive Client - select * on real-time views fails with schema related errors for tables with deep-nested schema #439

2020-10-11 Thread Bharat Dighe (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharat Dighe updated HUDI-102:
--
Comment: was deleted

(was: I am able to reproduce this.

scala> spark.sql("select * from users_mor_rt");

res10: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 9 more fields]

scala> spark.sql("select * from users_mor_rt").show();

20/10/11 19:38:01 WARN hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

20/10/11 19:38:01 ERROR executor.Executor: Exception in task 0.0 in stage 106.0 (TID 102)

java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.Text

  at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:152))

> Beeline/Hive Client - select * on real-time views fails with schema related 
> errors for tables with deep-nested schema #439
> --
>
> Key: HUDI-102
> URL: https://issues.apache.org/jira/browse/HUDI-102
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Priority: Major
>  Labels: help-wanted
>
> https://github.com/apache/incubator-hudi/issues/439



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-102) Beeline/Hive Client - select * on real-time views fails with schema related errors for tables with deep-nested schema #439

2020-10-11 Thread Bharat Dighe (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212020#comment-17212020
 ] 

Bharat Dighe commented on HUDI-102:
---

I am able to reproduce this.

scala> spark.sql("select * from users_mor_rt");

res10: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 9 more fields]

scala> spark.sql("select * from users_mor_rt").show();

20/10/11 19:38:01 WARN hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

20/10/11 19:38:01 ERROR executor.Executor: Exception in task 0.0 in stage 106.0 (TID 102)

java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.Text

  at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:152)

> Beeline/Hive Client - select * on real-time views fails with schema related 
> errors for tables with deep-nested schema #439
> --
>
> Key: HUDI-102
> URL: https://issues.apache.org/jira/browse/HUDI-102
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Priority: Major
>  Labels: help-wanted
>
> https://github.com/apache/incubator-hudi/issues/439



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-io edited a comment on pull request #2171: [HUDI-791] Replace null by Option in Delta Streamer

2020-10-11 Thread GitBox


codecov-io edited a comment on pull request #2171:
URL: https://github.com/apache/hudi/pull/2171#issuecomment-706733188


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2171?src=pr&el=h1) Report
   > Merging 
[#2171](https://codecov.io/gh/apache/hudi/pull/2171?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/585ce0094d6527bab988f7657b4e84d12274ee28?el=desc)
 will **increase** coverage by `0.00%`.
   > The diff coverage is `91.66%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2171/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2171?src=pr&el=tree)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2171      +/-   ##
    =============================================
      Coverage     53.60%    53.60%            
      Complexity     2846      2846            
    =============================================
      Files           359       359            
      Lines         16548     16547      -1    
      Branches       1780      1780            
    =============================================
      Hits           8870      8870            
    + Misses         6920      6919      -1    
      Partials        758       758            
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.69% <ø> (-0.02%)` | `1793.00 <ø> (ø)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (ø)` | `304.00 <ø> (ø)` | |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `70.07% <91.66%> (+0.09%)` | `325.00 <6.00> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2171?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `69.54% <91.30%> (+0.69%)` | `18.00 <6.00> (ø)` | |
   | 
[...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==)
 | `78.39% <100.00%> (ø)` | `18.00 <0.00> (ø)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `77.67% <0.00%> (-0.90%)` | `22.00% <0.00%> (ø%)` | |
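   
   For context on the pattern the PR title refers to, a generic sketch (hypothetical `fetchCheckpoint` helper; Hudi's `org.apache.hudi.common.util.Option` is assumed to mirror `java.util.Optional`'s `ofNullable`/`orElse`):
   
   ```scala
   import org.apache.hudi.common.util.Option
   
   // Hypothetical lookup standing in for any call that may return null.
   def fetchCheckpoint(): String = null
   
   // Before: absence travels as null and every caller must remember to check.
   // val resumeFrom: String = fetchCheckpoint()
   
   // After: absence is explicit in the type and handled once at the call site.
   val resumeFrom: Option[String] = Option.ofNullable(fetchCheckpoint())
   val effective: String = resumeFrom.orElse("no checkpoint")
   ```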
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2173: [HUDI-1339] delete useless import in hudi-spark module

2020-10-11 Thread GitBox


codecov-io edited a comment on pull request #2173:
URL: https://github.com/apache/hudi/pull/2173#issuecomment-706736204


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2173?src=pr&el=h1) Report
   > Merging 
[#2173](https://codecov.io/gh/apache/hudi/pull/2173?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/585ce0094d6527bab988f7657b4e84d12274ee28?el=desc)
 will **not change** coverage.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2173/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2173?src=pr&el=tree)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2173      +/-   ##
    =============================================
      Coverage     53.60%    53.60%            
      Complexity     2846      2846            
    =============================================
      Files           359       359            
      Lines         16548     16548            
      Branches       1780      1780            
    =============================================
      Hits           8870      8870            
      Misses         6920      6920            
      Partials        758       758            
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.70% <ø> (ø)` | `1793.00 <ø> (ø)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (ø)` | `304.00 <ø> (ø)` | |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `69.98% <ø> (ø)` | `325.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2173?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala](https://codecov.io/gh/apache/hudi/pull/2173/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllTWVyZ2VPblJlYWRSREQuc2NhbGE=)
 | `82.19% <ø> (ø)` | `10.00 <0.00> (ø)` | |
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2173/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3BhcmtTcWxXcml0ZXIuc2NhbGE=)
 | `51.14% <ø> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...in/scala/org/apache/hudi/HoodieStreamingSink.scala](https://codecov.io/gh/apache/hudi/pull/2173/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3RyZWFtaW5nU2luay5zY2FsYQ==)
 | `24.00% <ø> (ø)` | `10.00 <0.00> (ø)` | |
   | 
[...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2173/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==)
 | `76.19% <ø> (ø)` | `20.00 <0.00> (ø)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar closed issue #2145: [SUPPORT] IOException when querying Hudi data with Hive using LIMIT clause

2020-10-11 Thread GitBox


bvaradar closed issue #2145:
URL: https://github.com/apache/hudi/issues/2145


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2171: [HUDI-791] Replace null by Option in Delta Streamer

2020-10-11 Thread GitBox


codecov-io commented on pull request #2171:
URL: https://github.com/apache/hudi/pull/2171#issuecomment-706733188


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2171?src=pr&el=h1) Report
   > Merging 
[#2171](https://codecov.io/gh/apache/hudi/pull/2171?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/585ce0094d6527bab988f7657b4e84d12274ee28?el=desc)
 will **decrease** coverage by `9.45%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2171/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2171?src=pr&el=tree)
   
   ```diff
    @@              Coverage Diff               @@
    ##             master    #2171       +/-   ##
    ==============================================
    - Coverage     53.60%    44.14%     -9.46%   
    + Complexity     2846      2215       -631   
    ==============================================
      Files           359       313        -46   
      Lines         16548     14100      -2448   
      Branches       1780      1451       -329   
    ==============================================
    - Hits           8870      6225      -2645   
    - Misses         6920      7403       +483   
    + Partials        758       472       -286   
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.69% <ø> (-0.02%)` | `1793.00 <ø> (ø)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `?` | `?` | |
   | #huditimelineservice | `?` | `?` | |
   | #hudiutilities | `10.46% <0.00%> (-59.52%)` | `48.00 <0.00> (-277.00)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2171?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `0.00% <0.00%> (-68.86%)` | `0.00 <0.00> (-18.00)` | |
   | 
[...s/deltastreamer/HoodieMultiTableDeltaStreamer.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllTXVsdGlUYWJsZURlbHRhU3RyZWFtZXIuamF2YQ==)
 | `0.00% <0.00%> (-78.40%)` | `0.00 <0.00> (-18.00)` | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2171/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21

[GitHub] [hudi] bvaradar commented on issue #2153: [SUPPORT] Failed to delete key: /.hoodie/.temp/20201006182950

2020-10-11 Thread GitBox


bvaradar commented on issue #2153:
URL: https://github.com/apache/hudi/issues/2153#issuecomment-706733081


   Thanks



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar closed issue #2154: [SUPPORT] Throwing org.apache.spark.shuffle.FetchFailedException consistently

2020-10-11 Thread GitBox


bvaradar closed issue #2154:
URL: https://github.com/apache/hudi/issues/2154


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] bvaradar closed issue #2153: [SUPPORT] Failed to delete key: /.hoodie/.temp/20201006182950

2020-10-11 Thread GitBox


bvaradar closed issue #2153:
URL: https://github.com/apache/hudi/issues/2153


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1339) delete useless import in hudi-spark module

2020-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1339:
-
Labels: pull-request-available  (was: )

> delete useless import in hudi-spark module
> --
>
> Key: HUDI-1339
> URL: https://issues.apache.org/jira/browse/HUDI-1339
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] lw309637554 opened a new pull request #2173: [HUDI-1339] delete useless import in hudi-spark module

2020-10-11 Thread GitBox


lw309637554 opened a new pull request #2173:
URL: https://github.com/apache/hudi/pull/2173


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   delete useless import in hudi-spark module
   
   ## Brief change log
   
 - *Removed unused imports in the hudi-spark module*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1339) delete useless import in hudi-spark module

2020-10-11 Thread liwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liwei updated HUDI-1339:

Summary: delete useless import in hudi-spark module  (was: delete useless import in HoodieSparkSqlWriter)

> delete useless import in hudi-spark module
> --
>
> Key: HUDI-1339
> URL: https://issues.apache.org/jira/browse/HUDI-1339
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1339) delete useless import in HoodieSparkSqlWriter

2020-10-11 Thread liwei (Jira)
liwei created HUDI-1339:
---

 Summary: delete useless import in HoodieSparkSqlWriter
 Key: HUDI-1339
 URL: https://issues.apache.org/jira/browse/HUDI-1339
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Spark Integration
Reporter: liwei
Assignee: liwei






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-io edited a comment on pull request #2172: [HUDI-1338] Adding Delete support to test suite framework

2020-10-11 Thread GitBox


codecov-io edited a comment on pull request #2172:
URL: https://github.com/apache/hudi/pull/2172#issuecomment-706720825


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2172?src=pr&el=h1) Report
   > Merging 
[#2172](https://codecov.io/gh/apache/hudi/pull/2172?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/788d236c443eb4ced819f9305ed8e0460b5984b7?el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2172/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2172?src=pr&el=tree)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2172      +/-   ##
    =============================================
    - Coverage     53.61%    53.60%    -0.02%   
    - Complexity     2845      2846        +1   
    =============================================
      Files           359       359            
      Lines         16535     16548       +13   
      Branches       1777      1780        +3   
    =============================================
    + Hits           8866      8870        +4   
    - Misses         6912      6920        +8   
    - Partials        757       758        +1   
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.70% <ø> (-0.04%)` | `1793.00 <ø> (ø)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (-0.03%)` | `304.00 <ø> (+1.00)` | :arrow_down: |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `69.98% <ø> (ø)` | `325.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2172?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2172/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==)
 | `76.19% <0.00%> (-2.30%)` | `20.00% <0.00%> (+1.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/common/model/HoodieTableType.java](https://codecov.io/gh/apache/hudi/pull/2172/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVRhYmxlVHlwZS5qYXZh)
 | `100.00% <0.00%> (ø)` | `1.00% <0.00%> (ø%)` | |
   | 
[.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2172/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/common/model/OverwriteWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2172/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZVdpdGhMYXRlc3RBdnJvUGF5bG9hZC5qYXZh)
 | `64.70% <0.00%> (ø)` | `10.00% <0.00%> (ø%)` | |
   | 
[...del/OverwriteNonDefaultsWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2172/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZU5vbkRlZmF1bHRzV2l0aExhdGVzdEF2cm9QYXlsb2FkLmphdmE=)
 | `78.94% <0.00%> (ø)` | `5.00% <0.00%> (ø%)` | |
   | 
[...main/scala/org/apache/hudi/DataSourceOptions.scala](https://codecov.io/gh/apache/hudi/pull/2172/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvRGF0YVNvdXJjZU9wdGlvbnMuc2NhbGE=)
 | `94.82% <0.00%> (+0.09%)` | `0.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1338) Adding Delete support to test suite framework

2020-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1338:
-
Labels: pull-request-available  (was: )

> Adding Delete support to test suite framework
> -
>
> Key: HUDI-1338
> URL: https://issues.apache.org/jira/browse/HUDI-1338
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Add delete support to test suite framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan opened a new pull request #2172: [HUDI-1338] Adding Delete support to test suite framework

2020-10-11 Thread GitBox


nsivabalan opened a new pull request #2172:
URL: https://github.com/apache/hudi/pull/2172


   ## What is the purpose of the pull request
   
   Adding Delete support to test suite framework
   
   ## Brief change log
   
 - Adding DeleteNode to assist in issuing deletes to hudi in the integ test suite framework (a sketch of the underlying delete operation follows)
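   
   A minimal sketch of the delete operation such a node exercises, via the datasource write path (quickstart-style field names `uuid`/`ts`/`partitionpath` and the `basePath`/`tableName` values are assumptions for illustration, not the test suite DSL itself):
   
   ```scala
   import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.spark.sql.SaveMode
   
   // Pick some existing records; a delete only needs their keys.
   val toDelete = spark.read.format("hudi").load(basePath + "/*").limit(10)
   
   toDelete.write.format("hudi").
     option(OPERATION_OPT_KEY, DELETE_OPERATION_OPT_VAL). // "delete"
     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
     option("hoodie.table.name", tableName).
     mode(SaveMode.Append).
     save(basePath)
   ```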
   
   ## Verify this pull request
   
   Tested using the docker setup with complex-dag-cow.yaml
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-1331) Improving Hudi test suite framework to support proper validation and long running tests

2020-10-11 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-1331:
-

Assignee: sivabalan narayanan

> Improving Hudi test suite framework to support proper validation and long 
> running tests
> ---
>
> Key: HUDI-1331
> URL: https://issues.apache.org/jira/browse/HUDI-1331
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Improve hudi test suite framework to support proper validation and long 
> running tests. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-1338) Adding Delete support to test suite framework

2020-10-11 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-1338:
-

Assignee: sivabalan narayanan

> Adding Delete support to test suite framework
> -
>
> Key: HUDI-1338
> URL: https://issues.apache.org/jira/browse/HUDI-1338
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> Add delete support to test suite framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1338) Adding Delete support to test suite framework

2020-10-11 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1338:
-

 Summary: Adding Delete support to test suite framework
 Key: HUDI-1338
 URL: https://issues.apache.org/jira/browse/HUDI-1338
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Testing
Reporter: sivabalan narayanan


Add delete support to test suite framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-io edited a comment on pull request #2170: [MINOR] Make AbstractHoodieClient as a concrete class

2020-10-11 Thread GitBox


codecov-io edited a comment on pull request #2170:
URL: https://github.com/apache/hudi/pull/2170#issuecomment-706699194


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2170?src=pr&el=h1) Report
   > Merging 
[#2170](https://codecov.io/gh/apache/hudi/pull/2170?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/1d1d91d444b6af2b24b17d94068512a930877a98?el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2170/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2170?src=pr&el=tree)
   
   ```diff
    @@              Coverage Diff              @@
    ##             master    #2170      +/-   ##
    =============================================
    - Coverage     53.61%    53.60%    -0.02%   
    - Complexity     2845      2846        +1   
    =============================================
      Files           359       359            
      Lines         16535     16548       +13   
      Branches       1777      1780        +3   
    =============================================
    + Hits           8866      8870        +4   
    - Misses         6912      6920        +8   
    - Partials        757       758        +1   
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.70% <ø> (-0.04%)` | `1793.00 <ø> (ø)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (-0.03%)` | `304.00 <ø> (+1.00)` | :arrow_down: |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `69.98% <ø> (ø)` | `325.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2170?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==)
 | `76.19% <0.00%> (-2.30%)` | `20.00% <0.00%> (+1.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/common/model/HoodieTableType.java](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVRhYmxlVHlwZS5qYXZh)
 | `100.00% <0.00%> (ø)` | `1.00% <0.00%> (ø%)` | |
   | 
[.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/common/model/OverwriteWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZVdpdGhMYXRlc3RBdnJvUGF5bG9hZC5qYXZh)
 | `64.70% <0.00%> (ø)` | `10.00% <0.00%> (ø%)` | |
   | 
[...del/OverwriteNonDefaultsWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZU5vbkRlZmF1bHRzV2l0aExhdGVzdEF2cm9QYXlsb2FkLmphdmE=)
 | `78.94% <0.00%> (ø)` | `5.00% <0.00%> (ø%)` | |
   | 
[...main/scala/org/apache/hudi/DataSourceOptions.scala](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvRGF0YVNvdXJjZU9wdGlvbnMuc2NhbGE=)
 | `94.82% <0.00%> (+0.09%)` | `0.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2170: [MINOR] Make AbstractHoodieClient as a concrete class

2020-10-11 Thread GitBox


codecov-io commented on pull request #2170:
URL: https://github.com/apache/hudi/pull/2170#issuecomment-706699194


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2170?src=pr&el=h1) Report
   > Merging 
[#2170](https://codecov.io/gh/apache/hudi/pull/2170?src=pr&el=desc) into 
[master](https://codecov.io/gh/apache/hudi/commit/1d1d91d444b6af2b24b17d94068512a930877a98?el=desc)
 will **decrease** coverage by `0.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2170/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2170?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2170      +/-   ##
   ============================================
   - Coverage     53.61%   53.60%   -0.02%
   - Complexity     2845     2846       +1
   ============================================
     Files           359      359
     Lines         16535    16548      +13
     Branches       1777     1780       +3
   ============================================
   + Hits           8866     8870       +4
   - Misses         6912     6920       +8
   - Partials        757      758       +1
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.70% <ø> (-0.04%)` | `1793.00 <ø> (ø)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (-0.03%)` | `304.00 <ø> (+1.00)` | :arrow_down: |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `69.98% <ø> (ø)` | `325.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2170?src=pr&el=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==)
 | `76.19% <0.00%> (-2.30%)` | `20.00% <0.00%> (+1.00%)` | :arrow_down: |
   | 
[.../org/apache/hudi/common/model/HoodieTableType.java](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVRhYmxlVHlwZS5qYXZh)
 | `100.00% <0.00%> (ø)` | `1.00% <0.00%> (ø%)` | |
   | 
[.../apache/hudi/common/table/TableSchemaResolver.java](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL1RhYmxlU2NoZW1hUmVzb2x2ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/common/model/OverwriteWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZVdpdGhMYXRlc3RBdnJvUGF5bG9hZC5qYXZh)
 | `64.70% <0.00%> (ø)` | `10.00% <0.00%> (ø%)` | |
   | 
[...del/OverwriteNonDefaultsWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZU5vbkRlZmF1bHRzV2l0aExhdGVzdEF2cm9QYXlsb2FkLmphdmE=)
 | `78.94% <0.00%> (ø)` | `5.00% <0.00%> (ø%)` | |
   | 
[...main/scala/org/apache/hudi/DataSourceOptions.scala](https://codecov.io/gh/apache/hudi/pull/2170/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvRGF0YVNvdXJjZU9wdGlvbnMuc2NhbGE=)
 | `94.82% <0.00%> (+0.09%)` | `0.00% <0.00%> (ø%)` | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-791) Replace null by Option in Delta Streamer

2020-10-11 Thread liwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liwei updated HUDI-791:
---
Status: Patch Available  (was: In Progress)

> Replace null by Option in Delta Streamer
> 
>
> Key: HUDI-791
> URL: https://issues.apache.org/jira/browse/HUDI-791
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer, newbie
>Reporter: Yanjia Gary Li
>Assignee: liwei
>Priority: Minor
>  Labels: pull-request-available
>
> There are a lot of nulls in Delta Streamer. It would be great if we could
> replace those nulls with Option.
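
A minimal sketch of the pattern, assuming Hudi's org.apache.hudi.common.util.Option (which mirrors java.util.Optional); the method and field names below are hypothetical illustrations, not code from the actual patch:

{code:java}
import org.apache.hudi.common.util.Option;

public class NullToOptionSketch {

  // Before: a nullable reference forces a null check at every call site.
  static String transformerOrDefaultLegacy(String transformerClassName) {
    return transformerClassName == null ? "identity" : transformerClassName;
  }

  // After: absence is explicit in the type and handled in one place.
  static String transformerOrDefault(Option<String> transformerClassName) {
    return transformerClassName.orElse("identity");
  }

  public static void main(String[] args) {
    System.out.println(transformerOrDefault(Option.empty()));             // identity
    System.out.println(transformerOrDefault(Option.of("MyTransformer"))); // MyTransformer
  }
}
{code}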



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-791) Replace null by Option in Delta Streamer

2020-10-11 Thread liwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liwei updated HUDI-791:
---
Status: Open  (was: New)

> Replace null by Option in Delta Streamer
> 
>
> Key: HUDI-791
> URL: https://issues.apache.org/jira/browse/HUDI-791
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer, newbie
>Reporter: Yanjia Gary Li
>Assignee: liwei
>Priority: Minor
>  Labels: pull-request-available
>
> There are a lot of nulls in Delta Streamer. It would be great if we could
> replace those nulls with Option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-791) Replace null by Option in Delta Streamer

2020-10-11 Thread liwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liwei updated HUDI-791:
---
Status: In Progress  (was: Open)

> Replace null by Option in Delta Streamer
> 
>
> Key: HUDI-791
> URL: https://issues.apache.org/jira/browse/HUDI-791
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer, newbie
>Reporter: Yanjia Gary Li
>Assignee: liwei
>Priority: Minor
>  Labels: pull-request-available
>
> There are a lot of nulls in Delta Streamer. It would be great if we could
> replace those nulls with Option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-791) Replace null by Option in Delta Streamer

2020-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-791:

Labels: pull-request-available  (was: )

> Replace null by Option in Delta Streamer
> 
>
> Key: HUDI-791
> URL: https://issues.apache.org/jira/browse/HUDI-791
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer, newbie
>Reporter: Yanjia Gary Li
>Assignee: liwei
>Priority: Minor
>  Labels: pull-request-available
>
> There are a lot of nulls in Delta Streamer. It would be great if we could
> replace those nulls with Option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] lw309637554 opened a new pull request #2171: [HUDI-791] Replace null by Option in Delta Streamer

2020-10-11 Thread GitBox


lw309637554 opened a new pull request #2171:
URL: https://github.com/apache/hudi/pull/2171


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Replace null with Option in Delta Streamer, making the code cleaner and
more robust.
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-892) RealtimeParquetInputFormat should skip adding projection columns if there are no log files

2020-10-11 Thread liwei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211893#comment-17211893
 ] 

liwei commented on HUDI-892:


[~vinoth] hi, why should RealtimeParquetInputFormat skip adding projection 
columns if there are no log files? Is there any background on this?

> RealtimeParquetInputFormat should skip adding projection columns if there are 
> no log files
> --
>
> Key: HUDI-892
> URL: https://issues.apache.org/jira/browse/HUDI-892
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Performance
>Reporter: Vinoth Chandar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] dugenkui03 opened a new pull request #2170: [MINOR] Make AbstractHoodieClient as a concrete class

2020-10-11 Thread GitBox


dugenkui03 opened a new pull request #2170:
URL: https://github.com/apache/hudi/pull/2170


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-1337) Introduce FlinkInMemoryHashIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu reassigned HUDI-1337:
-

Assignee: liujinhui  (was: wangxianghu)

> Introduce FlinkInMemoryHashIndex to hudi-flink-client
> -
>
> Key: HUDI-1337
> URL: https://issues.apache.org/jira/browse/HUDI-1337
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: wangxianghu
>Assignee: liujinhui
>Priority: Major
> Fix For: 0.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1337) Introduce FlinkInMemoryHashIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)
wangxianghu created HUDI-1337:
-

 Summary: Introduce FlinkInMemoryHashIndex to hudi-flink-client
 Key: HUDI-1337
 URL: https://issues.apache.org/jira/browse/HUDI-1337
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: wangxianghu
Assignee: wangxianghu
 Fix For: 0.7.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1336) Introduce FlinkHoodieGlobalSimpleIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)
wangxianghu created HUDI-1336:
-

 Summary: Introduce FlinkHoodieGlobalSimpleIndex to 
hudi-flink-client
 Key: HUDI-1336
 URL: https://issues.apache.org/jira/browse/HUDI-1336
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: wangxianghu
Assignee: wangxianghu
 Fix For: 0.7.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1335) Introduce FlinkHoodieSimpleIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)
wangxianghu created HUDI-1335:
-

 Summary: Introduce FlinkHoodieSimpleIndex to hudi-flink-client
 Key: HUDI-1335
 URL: https://issues.apache.org/jira/browse/HUDI-1335
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: wangxianghu
Assignee: wangxianghu
 Fix For: 0.7.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1334) Introduce FlinkHoodieHBaseIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)
wangxianghu created HUDI-1334:
-

 Summary: Introduce FlinkHoodieHBaseIndex to hudi-flink-client
 Key: HUDI-1334
 URL: https://issues.apache.org/jira/browse/HUDI-1334
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: wangxianghu
Assignee: wangxianghu
 Fix For: 0.7.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1333) Introduce FlinkHoodieGlobalBloomIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)
wangxianghu created HUDI-1333:
-

 Summary: Introduce FlinkHoodieGlobalBloomIndex to hudi-flink-client
 Key: HUDI-1333
 URL: https://issues.apache.org/jira/browse/HUDI-1333
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: wangxianghu
Assignee: wangxianghu
 Fix For: 0.7.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-1332) Introduce FlinkHoodieBloomIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)
wangxianghu created HUDI-1332:
-

 Summary: Introduce FlinkHoodieBloomIndex to hudi-flink-client
 Key: HUDI-1332
 URL: https://issues.apache.org/jira/browse/HUDI-1332
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: wangxianghu
Assignee: wangxianghu
 Fix For: 0.7.0


A Flink implementation of the bloom index.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-910) Introduce FlinkHoodieIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu updated HUDI-910:
-
Description: Abstract implementation of HoodieIndex for Flink  (was: Add 
index implementation to hudi-flink-client)

> Introduce FlinkHoodieIndex to hudi-flink-client
> 
>
> Key: HUDI-910
> URL: https://issues.apache.org/jira/browse/HUDI-910
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
>
> Abstract implementation of HoodieIndex for Flink
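
A purely hypothetical sketch of what such an engine-specific abstraction could look like; the type names and signature below are invented for illustration and do not reflect Hudi's actual HoodieIndex API:

{code:java}
import java.util.List;

// Invented engine-agnostic contract, parameterized by the engine's
// record-collection type (e.g. an RDD for Spark, a List for Flink).
abstract class HoodieIndexSketch<RECORDS> {
  public abstract RECORDS tagLocation(RECORDS records);
}

// Invented Flink-side abstraction: concrete Flink indexes (bloom, simple,
// global, HBase, in-memory) would extend this and operate on plain Lists.
abstract class FlinkHoodieIndexSketch<T> extends HoodieIndexSketch<List<T>> {
}
{code}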



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-910) Introduce FlinkHoodieIndex to hudi-flink-client

2020-10-11 Thread wangxianghu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangxianghu updated HUDI-910:
-
Summary: Introduce FlinkHoodieIndex to hudi-flink-client  (was: Add index 
implementation to hudi-flink-client)

> Introduce FlinkHoodieIndex to hudi-flink-client
> 
>
> Key: HUDI-910
> URL: https://issues.apache.org/jira/browse/HUDI-910
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: wangxianghu
>Assignee: wangxianghu
>Priority: Major
>  Labels: pull-request-available
>
> Add index implementation to hudi-flink-client



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1034) Document info about test structure and guide

2020-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1034:
-
Labels: documentation pull-request-available  (was: documentation)

> Document info about test structure and guide
> 
>
> Key: HUDI-1034
> URL: https://issues.apache.org/jira/browse/HUDI-1034
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Priority: Minor
>  Labels: documentation, pull-request-available
>
> Create a test guide section in the contribution guide to lay out the test 
> structure and other tips for writing tests
>  
> Quote [~vinothchandar]
> unit - testing basic functionality at the class level, potentially using 
> mocks. Expected to finish quicker
> functional - brings up the services needed and runs tests without mocking
> integration - runs a subset of functional tests, on a full-fledged environment 
> with dockerized services
>  
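
A minimal sketch of how such categories could be encoded, assuming JUnit 5 @Tag annotations; the class and tag names are hypothetical illustrations, not Hudi's actual test setup:

{code:java}
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical example: tag tests by category so the build can run fast
// "unit" tests on every change and heavier "functional" tests separately.
class ExampleCategorizedTest {

  @Test
  @Tag("unit")
  void unitLevelCheck() {
    // class-level behavior, mocks allowed, expected to finish quickly
    assertEquals(4, 2 + 2);
  }

  @Test
  @Tag("functional")
  void functionalLevelCheck() {
    // would bring up the needed services (no mocking) before asserting
  }
}
{code}

A build tool can then select categories by tag, e.g. Maven Surefire's <groups> configuration with the JUnit 5 platform.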



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xushiyan opened a new pull request #2169: [HUDI-1034][DOCS] Add code guidelines for writing tests

2020-10-11 Thread GitBox


xushiyan opened a new pull request #2169:
URL: https://github.com/apache/hudi/pull/2169


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1330) handle prefix filtering at directory level

2020-10-11 Thread liwei (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211843#comment-17211843
 ] 

liwei commented on HUDI-1330:
-

good job :D
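
A minimal sketch of the directory-level prefix filtering that the issue quoted below proposes, assuming Hadoop's Path API; this is an illustration, not Hudi's actual DFSPathSelector code:

{code:java}
import org.apache.hadoop.fs.Path;

public class PrefixFilterSketch {

  // Ignore a path if ANY component, not just the final file name,
  // starts with '.' or '_'.
  static boolean shouldIgnore(Path path) {
    for (Path p = path; p != null; p = p.getParent()) {
      String name = p.getName();
      if (name.startsWith(".") || name.startsWith("_")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(shouldIgnore(new Path("/path/.file")));      // true
    System.out.println(shouldIgnore(new Path("/path/.path/file"))); // true, caught at the directory level
    System.out.println(shouldIgnore(new Path("/path/file")));       // false
  }
}
{code}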

> handle prefix filtering at directory level
> --
>
> Key: HUDI-1330
> URL: https://issues.apache.org/jira/browse/HUDI-1330
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Utilities
>Reporter: Vu Ho
>Priority: Major
>  Labels: pull-request-available
>
> The current DFSPathSelector only ignores the prefixes (_, .) at the file level, 
> while files under intermediate directories are still considered.
> E.g. when reading from a Spark Structured Streaming source, which very often 
> contains a .checkpoint directory, all metadata files should be ignored. 
> This is not the case currently. E.g.
> {code:java}
> /path/.file <-- skipped
> /path/.path/file <-- still being read
>  {code}
> {code:java}
> $SPARK_HOME/bin/spark-submit --class 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer 
> ~/.m2/repository/org/apache/hudi/hudi-utilities-bundle_2.12/0.6.1-SNAPSHOT/hudi-utilities-bundle_2.12-0.6.1-SNAPSHOT.jar
>  --target-base-path 'file:///tmp/hoodie/output/cow' --target-table hoodie_cow 
> --table-type COPY_ON_WRITE --props 'dfs-source.properties' --source-class 
> org.apache.hudi.utilities.sources.ParquetDFSSource  --source-ordering-field 
> ts --op UPSERT --continuous --min-sync-interval-seconds 30 {code}
>  configs:
> {code:java}
> hoodie.upsert.shuffle.parallelism=2
> hoodie.insert.shuffle.parallelism=2
> hoodie.delete.shuffle.parallelism=2
> hoodie.bulkinsert.shuffle.parallelism=2
> hoodie.datasource.write.recordkey.field=id
> hoodie.datasource.write.partitionpath.field=dt
> # DFS Source
> hoodie.deltastreamer.source.dfs.root=file:///tmp/hoodie/input {code}
> Stacktrace: 
> {code:java}
> Driver stacktrace:Driver stacktrace: at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1925)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1913)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1912)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1912) 
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:948)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:948)
>  at scala.Option.foreach(Option.scala:257) at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:948)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2146)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2095)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2084)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2101) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2126) at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:990) at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>  at org.apache.spark.rdd.RDD.withScope(RDD.scala:385) at 
> org.apache.spark.rdd.RDD.collect(RDD.scala:989) at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:635)
>  at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:241)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:194)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:194)
>  at scala.Option.orElse(Option.scala:289) at 
> org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:193)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242) 
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230) at 
> org.apache.spark.sql.DataFrameReader.parqu