[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-10-04 Thread Raymond Xu (Jira)


[ https://issues.apache.org/jira/browse/HUDI-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-4952:
-
Sprint: 2022/09/19, 2022/10/04  (was: 2022/09/19)

> Reading from metadata table could fail when there are no completed commits
> --
>
> Key: HUDI-4952
> URL: https://issues.apache.org/jira/browse/HUDI-4952
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> When the metadata table is just getting initialized but the first commit is not yet
> fully complete, reading from the metadata table can fail with the stack trace below.
>  
> {code:java}
> 22/08/20 02:56:58 ERROR client.RemoteDriver: Failed to run client job 39d720db-b15d-4823-b8b1-54398b143d6e
> org.apache.hudi.exception.HoodieException: Error fetching partition paths from metadata table
> at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:315)
> at org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:176)
> at org.apache.hudi.BaseHoodieTableFileIndex.loadPartitionPathFiles(BaseHoodieTableFileIndex.java:219)
> at org.apache.hudi.BaseHoodieTableFileIndex.doRefresh(BaseHoodieTableFileIndex.java:264)
> at org.apache.hudi.BaseHoodieTableFileIndex.<init>(BaseHoodieTableFileIndex.java:139)
> at org.apache.hudi.hadoop.HiveHoodieTableFileIndex.<init>(HiveHoodieTableFileIndex.java:49)
> at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatusForSnapshotMode(HoodieCopyOnWriteTableInputFormat.java:234)
> at org.apache.hudi.hadoop.HoodieCopyOnWriteTableInputFormat.listStatus(HoodieCopyOnWriteTableInputFormat.java:141)
> at org.apache.hudi.hadoop.HoodieParquetInputFormatBase.listStatus(HoodieParquetInputFormatBase.java:90)
> at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.listStatus(HoodieCombineHiveInputFormat.java:889)
> at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
> at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:76)
> at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getSplits(HoodieCombineHiveInputFormat.java:942)
> at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getCombineSplits(HoodieCombineHiveInputFormat.java:241)
> at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getSplits(HoodieCombineHiveInputFormat.java:363)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
> at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:267)
> at org.apache.spark.api.java.JavaRDDLike$class.getNumPartitions(JavaRDDLike.scala:65)
> at org.apache.spark.api.java.AbstractJavaRDDLike.getNumPartitions(JavaRDDLike.scala:45)
> at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:252)
> at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:179)
> at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:130)
> at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:355)
> at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:400)
> at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:365)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve list of partition from metadata
> at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:113)
> at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:313)
> ... 32 more
> Caused by: java.util.NoSuchElementException: No value present in Option
> at org.apache.hudi.common.util.Option.get(Option.java:89)
> at org.apache.hudi.metadata.HoodieTableMetadataUtil.getPartitionFileSlices(HoodieTableMetadataUtil.java:1057)
> at org.apache.hudi.metadata.HoodieTableMetadataUtil.getPartitionLatestMergedFileSlices(HoodieTableMetadataUtil.java:1001)
> at org.apache.hudi.metadata.HoodieBackedTableMetadata.getPartitionFileSliceToKeysMapping(HoodieBackedTableMetadata.java:377)
> {code}
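>
> Per the innermost cause in the trace, the failure comes from an unguarded Option.get() in HoodieTableMetadataUtil.getPartitionFileSlices: before any commit has fully completed, the lookup for the latest completed instant on the metadata timeline comes back empty. Below is a minimal, self-contained sketch of that failure mode and a guarded alternative; the nested Option class is a hypothetical stand-in modeled on org.apache.hudi.common.util.Option, not the actual Hudi code.
> {code:java}
> import java.util.NoSuchElementException;
>
> public class EmptyOptionRepro {
>   // Hypothetical stand-in modeled on org.apache.hudi.common.util.Option.
>   static final class Option<T> {
>     private final T val;
>     private Option(T val) { this.val = val; }
>     static <T> Option<T> empty() { return new Option<>(null); }
>     boolean isPresent() { return val != null; }
>     T get() {
>       if (val == null) {
>         throw new NoSuchElementException("No value present in Option");
>       }
>       return val;
>     }
>   }
>
>   public static void main(String[] args) {
>     // Metadata table initialized, but no commit completed yet:
>     // the "latest completed instant" lookup is empty.
>     Option<String> latestInstant = Option.empty();
>
>     try {
>       latestInstant.get(); // unguarded get(), as in the reported trace
>     } catch (NoSuchElementException e) {
>       System.out.println("Reader fails: " + e.getMessage());
>     }
>
>     // Guarded access: check presence and fall back instead of throwing.
>     if (latestInstant.isPresent()) {
>       System.out.println("Read file slices as of " + latestInstant.get());
>     } else {
>       System.out.println("No completed commit yet; fall back to direct file listing");
>     }
>   }
> }
> {code}
> A fix along these lines would treat "no completed commit yet" as "metadata table not ready" and fall back to file-system listing rather than throwing.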

[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-10-04 Thread Raymond Xu (Jira)


Raymond Xu updated HUDI-4952:
-
Status: Patch Available  (was: In Progress)

[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-10-04 Thread sivabalan narayanan (Jira)


sivabalan narayanan updated HUDI-4952:
--
Story Points: 2

[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-10-01 Thread Zhaojing Yu (Jira)


Zhaojing Yu updated HUDI-4952:
--
Fix Version/s: 0.12.2  (was: 0.12.1)

[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-09-30 Thread Raymond Xu (Jira)


Raymond Xu updated HUDI-4952:
-
Status: In Progress  (was: Open)

[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-09-29 Thread ASF GitHub Bot (Jira)


ASF GitHub Bot updated HUDI-4952:
-
Labels: pull-request-available  (was: )

[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-09-29 Thread sivabalan narayanan (Jira)


sivabalan narayanan updated HUDI-4952:
--
Sprint: 2022/09/19

[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-09-29 Thread sivabalan narayanan (Jira)


sivabalan narayanan updated HUDI-4952:
--
Priority: Blocker  (was: Major)

[jira] [Updated] (HUDI-4952) Reading from metadata table could fail when there are no completed commits

2022-09-29 Thread sivabalan narayanan (Jira)


sivabalan narayanan updated HUDI-4952:
--
Fix Version/s: 0.12.1