[jira] [Comment Edited] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)


[ https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056437#comment-17056437 ]

Jasmine Omeke edited comment on HUDI-690 at 3/10/20, 9:42 PM:
--

pinging to triage 

[~vbalaji]


was (Author: jomeke):
 

[~vbalaji]

> filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR 
> tables
> 
>
> Key: HUDI-690
> URL: https://issues.apache.org/jira/browse/HUDI-690
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Jasmine Omeke
>Priority: Major
>

[jira] [Commented] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)


[ https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056437#comment-17056437 ]

Jasmine Omeke commented on HUDI-690:


 

[~vbalaji]


[jira] [Created] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)
Jasmine Omeke created HUDI-690:
--

 Summary: filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
 Key: HUDI-690
 URL: https://issues.apache.org/jira/browse/HUDI-690
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: Jasmine Omeke


Hi. I encountered an error while using the HoodieSnapshotCopier class to make a backup of merge-on-read (MOR) tables:
[https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]
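A minimal sketch of what the error log appears to show. It assumes, based on the issue title and the first stack frame (`CompactionUtils.getAllPendingCompactionOperations`), that the copier gives the file-system view a commits timeline narrowed with `filterCompletedInstants()`, while the pending-compaction check still reads every requested or inflight compaction plan on the table. `PendingCompactionModel`, `Instant`, `State` and both methods are hypothetical stand-ins, not Hudi's API. Only the instant times, partition and file ID come from the log; the instant actions and states are assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified timeline model for illustration only; this is NOT Hudi's API.
final class PendingCompactionModel {

  enum State { REQUESTED, INFLIGHT, COMPLETED }

  // One timeline instant: action name, state and instant time.
  static final class Instant {
    final String action;
    final State state;
    final String time;

    Instant(String action, State state, String time) {
      this.action = action;
      this.state = state;
      this.time = time;
    }
  }

  // Stand-in for filterCompletedInstants(): keeps only COMPLETED instants.
  static List<Instant> filterCompleted(List<Instant> timeline) {
    return timeline.stream()
        .filter(i -> i.state == State.COMPLETED)
        .collect(Collectors.toList());
  }

  // Stand-in for the pending-compaction check. It scans the full timeline it is
  // handed, so narrowing the view's timeline elsewhere does not affect it.
  // plans maps a compaction instant time to the file group IDs in its plan.
  static Map<String, List<String>> fileGroupsWithMultiplePendingCompactions(
      List<Instant> fullTimeline, Map<String, List<String>> plans) {
    Map<String, List<String>> pendingByFileGroup = new LinkedHashMap<>();
    for (Instant i : fullTimeline) {
      if (!"compaction".equals(i.action) || i.state == State.COMPLETED) {
        continue; // only requested or inflight compactions count as pending
      }
      for (String fileGroup : plans.getOrDefault(i.time, Collections.<String>emptyList())) {
        pendingByFileGroup.computeIfAbsent(fileGroup, k -> new ArrayList<>()).add(i.time);
      }
    }
    // Keep only file groups claimed by more than one pending compaction.
    pendingByFileGroup.values().removeIf(times -> times.size() < 2);
    return pendingByFileGroup;
  }

  public static void main(String[] args) {
    // Partition and file ID from the log; the "partition/fileId" key format is an assumption.
    String fileGroup = "created_at_month=2020-03/7104bb0b-20f6-4dec-981b-c11bf20ade4a-0";
    // Instant times from the log; actions and states are assumed for illustration.
    List<Instant> timeline = Arrays.asList(
        new Instant("commit", State.COMPLETED, "20200308180755"),
        new Instant("compaction", State.INFLIGHT, "20200308213934"),
        new Instant("compaction", State.REQUESTED, "20200309011643"));
    Map<String, List<String>> plans = new LinkedHashMap<>();
    plans.put("20200308213934", Collections.singletonList(fileGroup));
    plans.put("20200309011643", Collections.singletonList(fileGroup));

    System.out.println("completed instants visible: " + filterCompleted(timeline).size());
    System.out.println("conflicts: " + fileGroupsWithMultiplePendingCompactions(timeline, plans));
  }
}
```

In this model the completed-only view contains a single instant, yet the check still flags the file group under both pending compaction times, which mirrors the condition behind the `IllegalStateException` reported in this issue.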

 

The error:

 
{code:java}
20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from /.hoodie/hoodie.properties
20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from 
20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants java.util.stream.ReferencePipeline$Head@77f7352a
20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) with ID 2
20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 1)
20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 32831, None)
20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) with ID 4
20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has registered (new total is 2)
Exception in thread "main" java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='created_at_month=2020-03', fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending compactions. Instants:
(20200309011643,{"baseInstantTime": "20200308213934", "deltaFilePaths":
[".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496",
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.2_3-761601-172985464",
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657",
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-177872977",
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"],
"dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet",
"fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03",
"metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0,
"TOTAL_IO_WRITE_MB": 512.0, "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}),
(20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths":
[".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865",
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423",
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814",
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.4_3-727192-165430450",
".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"],
"dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_20200308180755.parquet",
"fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03",
"metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 44197.0,
"TOTAL_IO_WRITE_MB": 511.0, "TOTAL_IO_MB": 1023.0, "TOTAL_LOG_FILE_SIZE": 44197.0}})
	at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at