If I have a column stored in a Parquet file as INT and I create a table over it with the same column but change its type from INT to BIGINT, then in Spark 2.0 it shows this error:

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 259.0 failed 4 times, most recent failure: Lost task 0.3 in stage 259.0 (TID 22958, slave2): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainLongDictionary
        at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:48)
        at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getInt(OnHeapColumnVector.java:233)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

So I think this error still happens in Spark 2.0.
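To make the failure concrete, here is a minimal sketch of what I mean (the /tmp/int_col.parquet path and the SparkSession setup are just placeholders for illustration, not from the actual job). Writing the column as INT and then reading it back with a BIGINT schema hits the same cast problem; reading with the file's own schema and casting afterwards avoids it:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val spark = SparkSession.builder().appName("int-vs-bigint-repro").getOrCreate()
import spark.implicits._

// Write a column whose Parquet physical type is INT32.
Seq(1, 2, 3).toDF("time").write.parquet("/tmp/int_col.parquet")  // hypothetical path

// Reading it back while declaring the column as BIGINT (LongType) does not
// widen the stored INT32 values; the scan fails at runtime with errors like
// the ones in this thread.
val mismatched = spark.read
  .schema(StructType(Seq(StructField("time", LongType))))
  .parquet("/tmp/int_col.parquet")
// mismatched.show()  // throws when the Parquet pages are actually decoded

// Workaround: read with the schema the file was written with, then cast.
val widened = spark.read
  .parquet("/tmp/int_col.parquet")
  .withColumn("time", $"time".cast(LongType))
widened.printSchema()  // time is now LongType

From the two stack traces it looks like both readers decode Parquet pages using the physical type recorded in the file rather than widening to the declared table type, which would explain why 1.6.1 and 2.0 fail in the same way.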
> On Aug 1, 2016, at 9:21 AM, Chanh Le <giaosu...@gmail.com> wrote:
>
> Sorry, my bad, I ran it in Spark 1.6.1, but what about this error?
> Why can't Int be cast to Long?
>
> Thanks.
>
>> On Aug 1, 2016, at 2:44 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>> Are you sure you are running Spark 2.0?
>>
>> In your stack trace I see SqlNewHadoopRDD, which was removed in #12354
>> <https://github.com/apache/spark/pull/12354>.
>>
>> On Sun, Jul 31, 2016 at 2:12 AM, Chanh Le <giaosu...@gmail.com> wrote:
>> Hi everyone,
>> Why can't MutableInt be cast to MutableLong?
>> It's really weird, and it seems Spark 2.0 has a lot of errors with the Parquet format.
>>
>> org.apache.spark.sql.catalyst.expressions.MutableInt cannot be cast to
>> org.apache.spark.sql.catalyst.expressions.MutableLong
>>
>> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in file
>> file:/data/etl-report/parquet/AD_COOKIE_REPORT/time=2016-07-25-16/network_id=31713/part-r-00000-9adbef89-f2f4-4836-a50c-a2e7b381d558.snappy.parquet
>>         at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
>>         at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
>>         at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.hasNext(SqlNewHadoopRDD.scala:194)
>>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>         at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
>>         at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableInt cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableLong
>>         at org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.setLong(SpecificMutableRow.scala:295)
>>         at org.apache.spark.sql.execution.datasources.parquet.CatalystRowConverter$RowUpdater.setLong(CatalystRowConverter.scala:161)
>>         at org.apache.spark.sql.execution.datasources.parquet.CatalystPrimitiveConverter.addLong(CatalystRowConverter.scala:85)
>>         at org.apache.parquet.column.impl.ColumnReaderImpl$2$4.writeValue(ColumnReaderImpl.java:269)
>>         at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:365)
>>         at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:405)
>>         at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:209)
>>         ... 20 more