[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yue Zhang updated HUDI-5442:
    Fix Version/s: 0.14.0 (was: 0.13.1)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.14.0
>
> Currently, HiveHoodieTableFileIndex hard-codes shouldListLazily to false,
> so only eager listing is used. This leads to scanning all table partitions in
> the file index, regardless of the queryPaths provided (for the Trino Hive
> connector, only one partition is passed in).
> {code:java}
> public HiveHoodieTableFileIndex(HoodieEngineContext engineContext,
>                                 HoodieTableMetaClient metaClient,
>                                 TypedProperties configProperties,
>                                 HoodieTableQueryType queryType,
>                                 List queryPaths,
>                                 Option specifiedQueryInstant,
>                                 boolean shouldIncludePendingCommits
> ) {
>   super(engineContext,
>       metaClient,
>       configProperties,
>       queryType,
>       queryPaths,
>       specifiedQueryInstant,
>       shouldIncludePendingCommits,
>       true,
>       new NoopCache(),
>       false); // shouldListLazily, hard-coded to false (eager listing)
> } {code}
> After flipping it to true for testing, the following exception is thrown.
> {code:java}
> io.trino.spi.TrinoException: Failed to parse partition column values from the
> partition-path: likely non-encoded slashes being used in partition column's
> values. You can try to work this around by switching listing mode to eager
>     at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:284)
>     at io.trino.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
>     at io.trino.$gen.Trino_39220221217_092723_2.run(Unknown Source)
>     at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: org.apache.hudi.exception.HoodieException: Failed to parse
> partition column values from the partition-path: likely non-encoded slashes
> being used in partition column's values. You can try to work this around by
> switching listing mode to eager
>     at org.apache.hudi.BaseHoodieTableFileIndex.parsePartitionColumnValues(BaseHoodieTableFileIndex.java:317)
>     at org.apache.hudi.BaseHoodieTableFileIndex.lambda$listPartitionPaths$6(BaseHoodieTableFileIndex.java:288)
>     at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
>     at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
>     at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
>     at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>     at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
>     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
>     at org.apache.hudi.BaseHoodieTableFileIndex.listPartitionPaths(BaseHoodieTableFileIndex.java:291)
>     at org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:205)
>     at org.apache.hudi.BaseHoodieTableFileIndex.getAllInputFileSlices(BaseHoodieTableFileIndex.java:216)
>     at org.apache.hudi.hadoop.HiveHoodieTableFileIndex.listFileSlices(HiveHoodieTableFileIndex.java:71)
>     at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatusForSnapshotMode(HoodieCopyOnWriteTableInputFormat.java:263)
>     at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatus(HoodieCopyOnWriteTableInputFormat.java:158)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
>     at org.apache.hudi.hadoop.HoodieParquetInputFormatBase.getSplits(HoodieParquetInputFormatBase.java:68)
>     at io.trino.plugin.hive.BackgroundHiveSplitLoader.lambda$loadPartition$2(BackgroundHiveSplitLoader.java:493)
>     at io.trino.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
>     at io.trino.plugin.hive.HdfsEnvironment.doAs(HdfsEn
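
The difference between eager and lazy listing described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not Hudi code: the class ListingSketch, the methods listEagerly and listLazily, and the sample partition paths are all hypothetical, and a real file index works against storage or the metadata table rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a file index; none of these names come from Hudi.
final class ListingSketch {

    // Eager mode: every partition of the table is returned,
    // no matter which query paths the engine asked for.
    static List<String> listEagerly(List<String> allPartitions, List<String> queryPaths) {
        return new ArrayList<>(allPartitions);
    }

    // Lazy mode: only partitions equal to, or nested under, a query path are returned.
    static List<String> listLazily(List<String> allPartitions, List<String> queryPaths) {
        List<String> result = new ArrayList<>();
        for (String partition : allPartitions) {
            for (String queryPath : queryPaths) {
                if (partition.equals(queryPath) || partition.startsWith(queryPath + "/")) {
                    result.add(partition);
                    break; // avoid adding the same partition twice
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("2022/12/15", "2022/12/16", "2022/12/17");
        // Mirrors the Trino Hive connector case from the issue: one partition per call.
        List<String> queryPaths = Arrays.asList("2022/12/17");

        System.out.println("eager: " + listEagerly(partitions, queryPaths));
        System.out.println("lazy:  " + listLazily(partitions, queryPaths));
    }
}
```

With a single query path, the eager variant still touches every partition, while the lazy variant touches only the requested one. That is the cost the issue is about when each split-loading call passes in one partition.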
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5442:
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14, Sprint 2023-02-28, Sprint 2023-03-14 (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14, Sprint 2023-02-28)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.13.1
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5442:
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14, Sprint 2023-02-28 (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.13.1
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5442:
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.13.1
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5442:
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31 (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-5442:
    Story Points: 2 (was: 1)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5442:
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5442:
-----------------------------
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2  (was: 0.13.0 Final Sprint)

> Fix HiveHoodieTableFileIndex to use lazy listing
> ------------------------------------------------
>
>                 Key: HUDI-5442
>                 URL: https://issues.apache.org/jira/browse/HUDI-5442
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: reader-core, trino-presto
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> Currently, HiveHoodieTableFileIndex hard-codes the shouldListLazily to false,
> using eager listing only. This leads to scanning all table partitions in the
> file index, regardless of the queryPaths provided (for Trino Hive connector,
> only one partition is passed in).
> {code:java}
> public HiveHoodieTableFileIndex(HoodieEngineContext engineContext,
>                                 HoodieTableMetaClient metaClient,
>                                 TypedProperties configProperties,
>                                 HoodieTableQueryType queryType,
>                                 List queryPaths,
>                                 Option specifiedQueryInstant,
>                                 boolean shouldIncludePendingCommits
> ) {
>   super(engineContext,
>       metaClient,
>       configProperties,
>       queryType,
>       queryPaths,
>       specifiedQueryInstant,
>       shouldIncludePendingCommits,
>       true,
>       new NoopCache(),
>       false);
> } {code}
> After flipping it to true for testing, the following exception is thrown.
> {code:java}
> io.trino.spi.TrinoException: Failed to parse partition column values from the
> partition-path: likely non-encoded slashes being used in partition column's
> values. You can try to work this around by switching listing mode to eager
>     at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:284)
>     at io.trino.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
>     at io.trino.$gen.Trino_39220221217_092723_2.run(Unknown Source)
>     at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>     at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: org.apache.hudi.exception.HoodieException: Failed to parse
> partition column values from the partition-path: likely non-encoded slashes
> being used in partition column's values. You can try to work this around by
> switching listing mode to eager
>     at org.apache.hudi.BaseHoodieTableFileIndex.parsePartitionColumnValues(BaseHoodieTableFileIndex.java:317)
>     at org.apache.hudi.BaseHoodieTableFileIndex.lambda$listPartitionPaths$6(BaseHoodieTableFileIndex.java:288)
>     at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
>     at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
>     at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
>     at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>     at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
>     at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
>     at org.apache.hudi.BaseHoodieTableFileIndex.listPartitionPaths(BaseHoodieTableFileIndex.java:291)
>     at org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:205)
>     at org.apache.hudi.BaseHoodieTableFileIndex.getAllInputFileSlices(BaseHoodieTableFileIndex.java:216)
>     at org.apache.hudi.hadoop.HiveHoodieTableFileIndex.listFileSlices(HiveHoodieTableFileIndex.java:71)
>     at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatusForSnapshotMode(HoodieCopyOnWriteTableInputFormat.java:263)
>     at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatus(HoodieCopyOnWriteTableInputFormat.java:158)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
>     at org.apache.hudi.hadoop.HoodieParquetInputFormatBase.getSplits(HoodieParquetInputFormatBase.java:68)
>     at io.trino.plugin.hive.BackgroundHiveSplitLoader.lambda$loadPartition$2(BackgroundHiveSplitLoader.java:493)
>     at io.trino.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
>     at io.trino.plugin.hive.Hdfs
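The difference between eager and lazy listing described in the issue can be illustrated with a small, self-contained sketch. This is not the Hudi implementation: the class LazyListingSketch and the method listPartitions are hypothetical names, partitions are modelled as plain strings, and the only thing borrowed from the issue is the idea of a shouldListLazily flag and a list of queryPaths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only; names and behaviour are assumptions, not Hudi's API.
public class LazyListingSketch {

    // Returns the partitions a file index would list under each mode.
    static List<String> listPartitions(List<String> allTablePartitions,
                                       List<String> queryPaths,
                                       boolean shouldListLazily) {
        if (!shouldListLazily) {
            // Eager mode: every partition of the table is listed,
            // regardless of which paths the query asked for.
            return new ArrayList<>(allTablePartitions);
        }
        // Lazy mode: only partitions matching the query paths are listed.
        List<String> listed = new ArrayList<>();
        for (String partition : allTablePartitions) {
            if (queryPaths.contains(partition)) {
                listed.add(partition);
            }
        }
        return listed;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("2022/12/15", "2022/12/16", "2022/12/17");
        // The Trino Hive connector passes in a single partition per call.
        List<String> query = Arrays.asList("2022/12/17");
        System.out.println("eager listed: " + listPartitions(all, query, false).size());
        System.out.println("lazy listed: " + listPartitions(all, query, true).size());
    }
}
```

With one query path out of three partitions, the eager branch lists all three and the lazy branch lists one, which is the cost difference the issue is concerned with.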
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Story Points: 1 (was: 2) > Fix HiveHoodieTableFileIndex to use lazy listing > > > Key: HUDI-5442 > URL: https://issues.apache.org/jira/browse/HUDI-5442 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core, trino-presto >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Critical > Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5442: - Priority: Critical (was: Blocker) > Fix HiveHoodieTableFileIndex to use lazy listing > > > Key: HUDI-5442 > URL: https://issues.apache.org/jira/browse/HUDI-5442 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core, trino-presto >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Critical > Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Status: In Progress (was: Open) > Fix HiveHoodieTableFileIndex to use lazy listing > > > Key: HUDI-5442 > URL: https://issues.apache.org/jira/browse/HUDI-5442 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core, trino-presto >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Status: Open (was: Patch Available) > Fix HiveHoodieTableFileIndex to use lazy listing > > > Key: HUDI-5442 > URL: https://issues.apache.org/jira/browse/HUDI-5442 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core, trino-presto >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Reviewers: Raymond Xu > Fix HiveHoodieTableFileIndex to use lazy listing > > > Key: HUDI-5442 > URL: https://issues.apache.org/jira/browse/HUDI-5442 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core, trino-presto >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5442: - Status: In Progress (was: Open) > Fix HiveHoodieTableFileIndex to use lazy listing > > > Key: HUDI-5442 > URL: https://issues.apache.org/jira/browse/HUDI-5442 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core, trino-presto >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5442: - Status: Patch Available (was: In Progress) > Fix HiveHoodieTableFileIndex to use lazy listing > > > Key: HUDI-5442 > URL: https://issues.apache.org/jira/browse/HUDI-5442 > Project: Apache Hudi > Issue Type: Bug > Components: reader-core, trino-presto >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Blocker > Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Kudinkin updated HUDI-5442:
    Story Points: 2 (was: 5)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-5442:
    Description:

Currently, HiveHoodieTableFileIndex hard-codes shouldListLazily to false, using eager listing only. This leads to scanning all table partitions in the file index, regardless of the queryPaths provided (for the Trino Hive connector, only one partition is passed in).

{code:java}
public HiveHoodieTableFileIndex(HoodieEngineContext engineContext,
                                HoodieTableMetaClient metaClient,
                                TypedProperties configProperties,
                                HoodieTableQueryType queryType,
                                List queryPaths,
                                Option specifiedQueryInstant,
                                boolean shouldIncludePendingCommits
) {
  super(engineContext,
      metaClient,
      configProperties,
      queryType,
      queryPaths,
      specifiedQueryInstant,
      shouldIncludePendingCommits,
      true,
      new NoopCache(),
      false); // shouldListLazily, hard-coded to false
}
{code}

After flipping it to true for testing, the following exception is thrown.

{code:java}
io.trino.spi.TrinoException: Failed to parse partition column values from the partition-path: likely non-encoded slashes being used in partition column's values. You can try to work this around by switching listing mode to eager
    at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:284)
    at io.trino.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
    at io.trino.$gen.Trino_39220221217_092723_2.run(Unknown Source)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.hudi.exception.HoodieException: Failed to parse partition column values from the partition-path: likely non-encoded slashes being used in partition column's values. You can try to work this around by switching listing mode to eager
    at org.apache.hudi.BaseHoodieTableFileIndex.parsePartitionColumnValues(BaseHoodieTableFileIndex.java:317)
    at org.apache.hudi.BaseHoodieTableFileIndex.lambda$listPartitionPaths$6(BaseHoodieTableFileIndex.java:288)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
    at org.apache.hudi.BaseHoodieTableFileIndex.listPartitionPaths(BaseHoodieTableFileIndex.java:291)
    at org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:205)
    at org.apache.hudi.BaseHoodieTableFileIndex.getAllInputFileSlices(BaseHoodieTableFileIndex.java:216)
    at org.apache.hudi.hadoop.HiveHoodieTableFileIndex.listFileSlices(HiveHoodieTableFileIndex.java:71)
    at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatusForSnapshotMode(HoodieCopyOnWriteTableInputFormat.java:263)
    at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatus(HoodieCopyOnWriteTableInputFormat.java:158)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
    at org.apache.hudi.hadoop.HoodieParquetInputFormatBase.getSplits(HoodieParquetInputFormatBase.java:68)
    at io.trino.plugin.hive.BackgroundHiveSplitLoader.lambda$loadPartition$2(BackgroundHiveSplitLoader.java:493)
    at io.trino.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
    at io.trino.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:97)
    at io.trino.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:493)
    at io.trino.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:353)
    at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:277)
    ... 6 more
{code}

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: r
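The cost difference between the two listing modes described in this issue can be sketched in isolation. This is only an illustration under stated assumptions, not Hudi's implementation: the class `PartitionListingSketch`, its `listPartitions` method, the visit counter, and the `dt=...` partition names are all hypothetical and exist only for this example. It assumes eager listing walks every partition of the table before filtering, while lazy listing looks up only the partitions named in the query paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Hudi.
class PartitionListingSketch {
    // Stand-in for every partition path present in the table.
    private final Set<String> tablePartitions;
    // Number of partitions touched by the most recent listing call.
    private int partitionsVisited;

    PartitionListingSketch(List<String> tablePartitions) {
        this.tablePartitions = new LinkedHashSet<>(tablePartitions);
    }

    List<String> listPartitions(List<String> queryPaths, boolean shouldListLazily) {
        partitionsVisited = 0;
        List<String> matched = new ArrayList<>();
        if (shouldListLazily) {
            // Lazy: only the partitions named by the query are looked up.
            for (String queryPath : queryPaths) {
                partitionsVisited++;
                if (tablePartitions.contains(queryPath)) {
                    matched.add(queryPath);
                }
            }
        } else {
            // Eager: every partition of the table is visited, then filtered.
            for (String partition : tablePartitions) {
                partitionsVisited++;
                if (queryPaths.contains(partition)) {
                    matched.add(partition);
                }
            }
        }
        return matched;
    }

    int getPartitionsVisited() {
        return partitionsVisited;
    }

    public static void main(String[] args) {
        PartitionListingSketch index = new PartitionListingSketch(
            Arrays.asList("dt=2022-12-15", "dt=2022-12-16", "dt=2022-12-17"));
        // Mirrors the Trino Hive connector case: a single partition is passed in.
        List<String> queryPaths = Arrays.asList("dt=2022-12-17");

        System.out.println(index.listPartitions(queryPaths, false)
            + " visited=" + index.getPartitionsVisited());
        System.out.println(index.listPartitions(queryPaths, true)
            + " visited=" + index.getPartitionsVisited());
    }
}
```

With a single query path, both calls return the same partition, but the eager call visits all three partitions of the toy table and the lazy call visits one. On a table with many partitions, that gap is the overhead this issue is about.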
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-5442:
    Priority: Critical (was: Major)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.13.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-5442:
    Sprint: 0.13.0 Final Sprint

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-5442:
    Priority: Blocker (was: Critical)

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-5442:
    Story Points: 5

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-5442:
    Component/s: reader-core, trino-presto

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Components: reader-core, trino-presto
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Fix For: 0.13.0
[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-5442:
    Fix Version/s: 0.13.0

> Fix HiveHoodieTableFileIndex to use lazy listing
>
> Key: HUDI-5442
> URL: https://issues.apache.org/jira/browse/HUDI-5442
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Priority: Major
> Fix For: 0.13.0