[jira] [Comment Edited] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
[ https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056437#comment-17056437 ]

Jasmine Omeke edited comment on HUDI-690 at 3/10/20, 9:42 PM:
--

pinging to triage [~vbalaji]

was (Author: jomeke):
[~vbalaji]

> filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
>
> Key: HUDI-690
> URL: https://issues.apache.org/jira/browse/HUDI-690
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Reporter: Jasmine Omeke
> Priority: Major
[jira] [Commented] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
[ https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056437#comment-17056437 ]

Jasmine Omeke commented on HUDI-690:

[~vbalaji]
[jira] [Created] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
Jasmine Omeke created HUDI-690:
--

Summary: filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables
Key: HUDI-690
URL: https://issues.apache.org/jira/browse/HUDI-690
Project: Apache Hudi (incubating)
Issue Type: Improvement
Reporter: Jasmine Omeke

Hi. I encountered an error while using the HoodieSnapshotCopier class to back up merge-on-read (MOR) tables:

[https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]

The error:

{code:java}
20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from /.hoodie/hoodie.properties
20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from
20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants java.util.stream.ReferencePipeline$Head@77f7352a
20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) with ID 2
20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 1)
20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 32831, None)
20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) with ID 4
20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has registered (new total is 2)
Exception in thread "main" java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='created_at_month=2020-03', fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending compactions.
Instants: (20200309011643,{"baseInstantTime": "20200308213934", "deltaFilePaths": [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.2_3-761601-172985464", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-177872977", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"], "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, "TOTAL_IO_WRITE_MB": 512.0, "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}),
(20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.4_3-727192-165430450", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"], "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_20200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 44197.0, "TOTAL_IO_WRITE_MB": 511.0, "TOTAL_IO_MB": 1023.0, "TOTAL_LOG_FILE_SIZE": 44197.0}})
	at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at
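
The exception is thrown inside CompactionUtils.getAllPendingCompactionOperations (CompactionUtils.java:161 in the trace). The message suggests how that method works. It appears to collect the operations from every pending compaction plan, key them by file group, and reject any file group claimed by more than one plan. In this log, file group 7104bb0b-20f6-4dec-981b-c11bf20ade4a-0 is claimed by two compaction instants, 20200309011643 and 20200308213934. The newer plan also lists 20200308213934 as its baseInstantTime.

The sketch below is a minimal, dependency-free illustration of that kind of check, using the two plans from the log. It is not Hudi's implementation. The class PendingCompactionCheck, the types FileGroupId and PendingOp, and the method indexPendingCompactions are hypothetical. The real method works on Hudi's own types and formats its message differently.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not Hudi's code: all names below are invented for illustration.
final class PendingCompactionCheck {

  /** Stand-in for a file group key: partition path plus file id. */
  static final class FileGroupId {
    final String partitionPath;
    final String fileId;

    FileGroupId(String partitionPath, String fileId) {
      this.partitionPath = partitionPath;
      this.fileId = fileId;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof FileGroupId)) return false;
      FileGroupId other = (FileGroupId) o;
      return partitionPath.equals(other.partitionPath) && fileId.equals(other.fileId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(partitionPath, fileId);
    }

    @Override
    public String toString() {
      return "FileGroupId{partitionPath='" + partitionPath + "', fileId='" + fileId + "'}";
    }
  }

  /** One operation from a pending compaction plan: its compaction instant and the base instant it rewrites. */
  static final class PendingOp {
    final String compactionInstant;
    final String baseInstantTime;
    final FileGroupId fileGroup;

    PendingOp(String compactionInstant, String baseInstantTime, FileGroupId fileGroup) {
      this.compactionInstant = compactionInstant;
      this.baseInstantTime = baseInstantTime;
      this.fileGroup = fileGroup;
    }
  }

  /** Indexes operations by file group, rejecting any file group claimed by two pending plans. */
  static Map<FileGroupId, PendingOp> indexPendingCompactions(List<PendingOp> ops) {
    Map<FileGroupId, PendingOp> byFileGroup = new HashMap<>();
    for (PendingOp op : ops) {
      // putIfAbsent returns the previously stored operation, or null if the key was free.
      PendingOp existing = byFileGroup.putIfAbsent(op.fileGroup, op);
      if (existing != null) {
        throw new IllegalStateException("File group " + op.fileGroup
            + " has more than 1 pending compaction. Instants: "
            + op.compactionInstant + ", " + existing.compactionInstant);
      }
    }
    return byFileGroup;
  }

  public static void main(String[] args) {
    FileGroupId fg = new FileGroupId("created_at_month=2020-03", "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0");
    List<PendingOp> ops = new ArrayList<>();
    // The two plans reported in the log for the same file group.
    ops.add(new PendingOp("20200309011643", "20200308213934", fg));
    ops.add(new PendingOp("20200308213934", "20200308180755", fg));
    try {
      indexPendingCompactions(ops);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}

Under these assumptions, running main prints an IllegalStateException message that names both instants. A HashMap keyed on the (partitionPath, fileId) pair enforces the "at most one pending plan per file group" rule directly. With putIfAbsent, each operation needs only a single map lookup.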