[ https://issues.apache.org/jira/browse/SPARK-22320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212581#comment-16212581 ]
Dongjoon Hyun commented on SPARK-22320:
---------------------------------------

I checked that 2.2.0 and 2.1.2 have the same problem, and that 2.0.2 fails with the following exception.

{code}
scala> df.write.mode("overwrite").orc("/tmp/o3")
17/10/20 05:34:42 ERROR Utils: Aborting task
java.lang.ClassCastException: org.apache.spark.ml.linalg.VectorUDT cannot be cast to org.apache.spark.sql.types.StructType
	at org.apache.spark.sql.hive.HiveInspectors$class.wrap(HiveInspectors.scala:558)
{code}

> ORC should support VectorUDT/MatrixUDT
> --------------------------------------
>
>                 Key: SPARK-22320
>                 URL: https://issues.apache.org/jira/browse/SPARK-22320
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.2, 2.2.0
>            Reporter: zhengruifeng
>
> When I save a DataFrame containing vectors in ORC format and read it back,
> the schema is changed: the vector column comes back as a plain struct.
> {code}
> scala> import org.apache.spark.ml.linalg._
> import org.apache.spark.ml.linalg._
> scala> val data = Seq((1,Vectors.dense(1.0,2.0)), (2,Vectors.sparse(8, Array(4), Array(1.0))))
> data: Seq[(Int, org.apache.spark.ml.linalg.Vector)] = List((1,[1.0,2.0]), (2,(8,[4],[1.0])))
> scala> val df = data.toDF("i", "vec")
> df: org.apache.spark.sql.DataFrame = [i: int, vec: vector]
> scala> df.schema
> res0: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,false), StructField(vec,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true))
> scala> df.write.orc("/tmp/123")
> scala> val df2 = spark.sqlContext.read.orc("/tmp/123")
> df2: org.apache.spark.sql.DataFrame = [i: int, vec: struct<type: tinyint, size: int ... 2 more fields>]
> scala> df2.schema
> res3: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,true), StructField(vec,StructType(StructField(type,ByteType,true), StructField(size,IntegerType,true), StructField(indices,ArrayType(IntegerType,true),true), StructField(values,ArrayType(DoubleType,true),true)),true))
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)