[ https://issues.apache.org/jira/browse/SPARK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-3231:
-----------------------------
    Component/s:     (was: Spark Core)
                 SQL

> select on a table in parquet format containing smallint as a field type does not work
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-3231
>                 URL: https://issues.apache.org/jira/browse/SPARK-3231
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.0
>         Environment: The table is created through Hive-0.13.
> SparkSql 1.1 is used.
>            Reporter: chirag aggarwal
>
> A table is created through hive. This table has a field of type smallint. The format of the table is parquet.
> select on this table works perfectly on hive shell.
> But, when the select is run on this table from spark-sql, then the query fails.
> Steps to reproduce the issue:
> --------------------------------------
> hive> create table abct (a smallint, b int) row format delimited fields terminated by '|' stored as textfile;
> A text file is stored in hdfs for this table.
> hive> create table abc (a smallint, b int) stored as parquet;
> hive> insert overwrite table abc select * from abct;
> hive> select * from abc;
> 2     1
> 2     2
> 2     3
> spark-sql> select * from abc;
> 10:08:46 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 33.0 (TID 2340) had a not serializable result: org.apache.hadoop.io.IntWritable
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1158)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1147)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1146)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1146)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:685)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1364)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> But, if the type of this table is now changed to int, then spark-sql gives the correct results.
> hive> alter table abc change a a int;
> spark-sql> select * from abc;
> 2     1
> 2     2
> 2     3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
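
A note on the error in the trace: org.apache.hadoop.io.IntWritable implements Hadoop's Writable interface but not java.io.Serializable, so a row that still carries one cannot pass through Java serialization. The sketch below only illustrates that mechanism, on the assumption that the task result is written with Java serialization; it is not Spark's code and not the fix for this issue. IntBox is a hypothetical stand-in for IntWritable (so the example needs no Hadoop dependency), and SerializationSketch, canSerialize and unwrap are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

class SerializationSketch {

    // Hypothetical stand-in for org.apache.hadoop.io.IntWritable: like the
    // real class, it does NOT implement java.io.Serializable.
    static final class IntBox {
        private final int value;
        IntBox(int value) { this.value = value; }
        int get() { return value; }
    }

    // Returns true if Java serialization accepts the row and all its elements.
    static boolean canSerialize(Object[] row) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(row);
            return true;
        } catch (NotSerializableException e) {
            // Raised when any element of the array is not Serializable.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies the row, replacing each IntBox with a plain java.lang.Integer.
    static Object[] unwrap(Object[] row) {
        Object[] out = new Object[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = (row[i] instanceof IntBox)
                ? Integer.valueOf(((IntBox) row[i]).get())
                : row[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // First column still wrapped (as with the smallint field), second plain.
        Object[] leaked = { new IntBox(2), 1 };
        System.out.println(canSerialize(leaked));         // prints false
        System.out.println(canSerialize(unwrap(leaked))); // prints true
    }
}
```

This would be consistent with the report that the query succeeds once column a is declared as int, although the trace alone does not show where the wrapped value is left in the row.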