Peter Vary created HIVE-24504:
---------------------------------

             Summary: VectorFileSinkArrowOperator does not serialize complex types correctly
                 Key: HIVE-24504
                 URL: https://issues.apache.org/jira/browse/HIVE-24504
             Project: Hive
          Issue Type: Bug
          Components: llap
            Reporter: Peter Vary
            Assignee: Peter Vary
When the table has complex types and the result has 0 records, the VectorFileSinkArrowOperator serializes only the primitive types correctly. For complex types only the main type is set, which causes issues for clients trying to read the data. We got the following HWC exception:
{code:java}
Previous exception in task: Unsupported data type: Null
	org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
	org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
	org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
	org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
	org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135)
	com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
	com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
	org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
	org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown Source)
	org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
	org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
	org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
	org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	org.apache.spark.scheduler.Task.run(Task.scala:109)
	org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	java.lang.Thread.run(Thread.java:745)
	at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
	at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117)
	at org.apache.spark.scheduler.Task.run(Task.scala:119)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
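For reference, a minimal reproduction sketch (untested here; the table and column names are hypothetical): create a table with a complex-typed column and read it through HWC with a query that matches zero rows, so the Arrow batch for the complex column carries only the outer type.

{code:sql}
-- Hypothetical repro: table with a complex (array) column
CREATE TABLE complex_test (id INT, tags ARRAY<STRING>) STORED AS ORC;

-- Query returning 0 rows via HWC: the Arrow field for 'tags' gets only
-- the outer List type set, and the client fails with
-- "Unsupported data type: Null" when mapping the child field
SELECT * FROM complex_test WHERE id < 0;
{code}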