[ https://issues.apache.org/jira/browse/HUDI-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-6729:
---------------------------------
    Labels: pull-request-available
        (was: )

> Fix get partition values from path for non-string type partition column
> -----------------------------------------------------------------------
>
>                 Key: HUDI-6729
>                 URL: https://issues.apache.org/jira/browse/HUDI-6729
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: hudi-utilities
>            Reporter: Wechar
>            Priority: Major
>              Labels: pull-request-available
>
> When we enable {{hoodie.datasource.read.extract.partition.values.from.path}}
> to read partition values from the partition path instead of from the data file,
> an exception is thrown if the partition column is not of string type:
> {code:bash}
> Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer
> 	at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:103)
> 	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getInt(rows.scala:41)
> 	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getInt$(rows.scala:41)
> 	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getInt(rows.scala:195)
> 	at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:97)
> 	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:245)
> 	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:264)
> 	at org.apache.spark.sql.execution.datasources.parquet.Spark32LegacyHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32LegacyHoodieParquetFileFormat.scala:314)
> 	at org.apache.hudi.HoodieDataSourceHelper$.$anonfun$buildHoodieParquetReader$1(HoodieDataSourceHelper.scala:67)
> 	at org.apache.hudi.HoodieBaseRelation.$anonfun$createBaseFileReader$2(HoodieBaseRelation.scala:602)
> 	at org.apache.hudi.HoodieBaseRelation$BaseFileReader.apply(HoodieBaseRelation.scala:680)
> 	at org.apache.hudi.HoodieBaseRelation$.$anonfun$projectReader$1(HoodieBaseRelation.scala:706)
> 	at org.apache.hudi.HoodieBaseRelation$.$anonfun$projectReader$2(HoodieBaseRelation.scala:711)
> 	at org.apache.hudi.HoodieBaseRelation$BaseFileReader.apply(HoodieBaseRelation.scala:680)
> 	at org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:96)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
> 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
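The ClassCastException above arises because values extracted from the partition path are still raw strings (UTF8String) when Spark's vectorized reader expects the column's declared type (here, Integer). A minimal, Hudi-independent Java sketch of the kind of conversion that is missing; `parsePartitionPath` and `convert` are hypothetical helpers for illustration only, not Hudi or Spark APIs (the actual fix would cast using the table schema's partition data types):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionValueCast {

    // Hypothetical helper: parse Hive-style partition path segments
    // ("year=2023/month=8") into raw string values.
    static Map<String, String> parsePartitionPath(String path) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            String[] kv = segment.split("=", 2);
            if (kv.length == 2) {
                values.put(kv[0], kv[1]);
            }
        }
        return values;
    }

    // Convert the raw string to the partition column's declared type.
    // Passing the string through unchanged is what triggers the
    // ClassCastException when the reader later calls getInt().
    static Object convert(String raw, Class<?> targetType) {
        if (targetType == Integer.class) {
            return Integer.parseInt(raw);
        }
        if (targetType == Long.class) {
            return Long.parseLong(raw);
        }
        return raw; // string partition columns need no conversion
    }

    public static void main(String[] args) {
        Map<String, String> raw = parsePartitionPath("year=2023/month=8");
        // With the conversion, the value carries the declared type.
        Object year = convert(raw.get("year"), Integer.class);
        System.out.println(year.getClass().getSimpleName() + ":" + year); // prints Integer:2023
    }
}
```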