[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-05-22 Thread Yue Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-5768:

Fix Version/s: 0.13.1

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Affects Versions: 0.12.0, 0.12.1, 0.12.2
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.13.1, 0.12.3
>
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = 
> spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class 
> org.apache.hudi.common.model.HoodieFileFormat)
>   at 
> org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at 
> org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at 
> 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-03-29 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-5768:
--
Fix Version/s: 0.12.3

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Affects Versions: 0.12.0, 0.12.1, 0.12.2
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0, 0.12.3
>
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = 
> spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class 
> org.apache.hudi.common.model.HoodieFileFormat)
>   at 
> org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at 
> org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at 
> 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-03-06 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5768:
-
Component/s: metadata

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metadata
>Affects Versions: 0.12.0, 0.12.1, 0.12.2
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = 
> spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class 
> org.apache.hudi.common.model.HoodieFileFormat)
>   at 
> org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at 
> org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at 
> 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-03-02 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Epic Link: HUDI-1292

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.12.1, 0.12.2
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = 
> spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class 
> org.apache.hudi.common.model.HoodieFileFormat)
>   at 
> org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at 
> org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
>  

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Story Points: 1

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.12.1, 0.12.2
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = 
> spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class 
> org.apache.hudi.common.model.HoodieFileFormat)
>   at 
> org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at 
> org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
>   at 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-13 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Sprint: Sprint 2023-01-31

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.12.1, 0.12.2
>Reporter: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = 
> spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class 
> org.apache.hudi.common.model.HoodieFileFormat)
>   at 
> org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at 
> org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
>   at 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5768:
-
Labels: pull-request-available  (was: )

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.12.1, 0.12.2
>Reporter: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
> {code:java}
> scala> val df = 
> spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class 
> org.apache.hudi.common.model.HoodieFileFormat)
>   at 
> org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at 
> org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
>   at 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Description: 
Using Hudi 0.13.0 and Spark 3.3.0, reading a table created by 0.13.0:
{code:java}
scala> val df = 
spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
scala> df.count
scala.MatchError: HFILE (of class org.apache.hudi.common.model.HoodieFileFormat)
  at 
org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
  at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
  at 
org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
  at org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
  at 
org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
  at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
  at 
org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:91)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
  at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
  at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
  at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Description: 
Using Hudi 0.13.0 and Spark 3.3, reading a table created by 0.13.0:
{code:java}
scala> val df = 
spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
scala> df.count
scala.MatchError: HFILE (of class org.apache.hudi.common.model.HoodieFileFormat)
  at 
org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
  at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
  at 
org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
  at org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
  at 
org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
  at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
  at 
org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:91)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
  at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
  at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
  at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Affects Version/s: 0.12.2
   0.12.1
   0.12.0

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.12.1, 0.12.2
>Reporter: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> Using Hudi 0.13.0 and Spark 3.3, reading a table created by 0.13.0:
> {code:java}
> scala> val df = 
> spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
> scala> df.count
> scala.MatchError: HFILE (of class 
> org.apache.hudi.common.model.HoodieFileFormat)
>   at 
> org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
>   at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
>   at 
> org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
>   at 
> org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:91)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
>   at 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Description: 
Using Hudi 0.13.0 and Spark 3.3, reading a table created by 0.13.0:
{code:java}
scala> val df = 
spark.read.format("hudi").load("/Users/ethan/Work/tmp/20230127-test-cli-bundle/hudi_trips_cow_backup/.hoodie/metadata")
scala> df.count
scala.MatchError: HFILE (of class org.apache.hudi.common.model.HoodieFileFormat)
  at 
org.apache.hudi.HoodieBaseRelation.x$2$lzycompute(HoodieBaseRelation.scala:216)
  at org.apache.hudi.HoodieBaseRelation.x$2(HoodieBaseRelation.scala:215)
  at 
org.apache.hudi.HoodieBaseRelation.fileFormat$lzycompute(HoodieBaseRelation.scala:215)
  at org.apache.hudi.HoodieBaseRelation.fileFormat(HoodieBaseRelation.scala:215)
  at 
org.apache.hudi.HoodieBaseRelation.canPruneRelationSchema(HoodieBaseRelation.scala:295)
  at 
org.apache.hudi.BaseMergeOnReadSnapshotRelation.canPruneRelationSchema(MergeOnReadSnapshotRelation.scala:102)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:56)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning$$anonfun$apply0$1.applyOrElse(Spark33NestedSchemaPruning.scala:50)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
  at 
org.apache.spark.sql.catalyst.plans.logical.Aggregate.mapChildren(basicLogicalOperators.scala:976)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply0(Spark33NestedSchemaPruning.scala:50)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:44)
  at 
org.apache.spark.sql.execution.datasources.Spark33NestedSchemaPruning.apply(Spark33NestedSchemaPruning.scala:39)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
  at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
  at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
  at scala.collection.immutable.List.foldLeft(List.scala:91)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
  at scala.collection.immutable.List.foreach(List.scala:431)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
  at 
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
  at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
  at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at 

[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Fix Version/s: 0.13.0

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5768) Fail to read metadata table in Spark Datasource

2023-02-11 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5768:

Priority: Blocker  (was: Major)

> Fail to read metadata table in Spark Datasource
> ---
>
> Key: HUDI-5768
> URL: https://issues.apache.org/jira/browse/HUDI-5768
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)