Francisco Orchard created SPARK-25687:
-----------------------------------------

             Summary: A dataset can store a column as sequence of Vectors but not directly vectors
                 Key: SPARK-25687
                 URL: https://issues.apache.org/jira/browse/SPARK-25687
             Project: Spark
          Issue Type: Bug
          Components: ML, SQL
    Affects Versions: 2.3.1
            Reporter: Francisco Orchard


A dataset can store an array of vectors but not a vector. This is inconsistent.

To reproduce:

{
    import org.apache.spark.sql.Row
    import org.apache.spark.ml.linalg.{Vectors, DenseVector, Vector}
    import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
    import org.apache.spark.sql.types._
    import spark.implicits._

    val rdd = sc.parallelize(Seq(Row(Seq(Vectors.dense(Array(1.0, 2.0)).toSparse))))
    val arrayOfVectorsDS = spark.createDataFrame(rowRDD = rdd, schema = new StructType(Array(StructField(name = "value", dataType = ArrayType(elementType = VectorType))))).as[Seq[Vector]]
//    val vectorsDS = arrayOfVectorsDS.flatMap(a => a)
    .show
}

If the line before ".show" is uncommented, this code fails with the well-known
error: Unable to find encoder for type stored in a Dataset. Primitive
types (Int, String, etc) and Product types (case classes) are supported by
importing spark.implicits._ Support for serializing other types will be added
in future releases.
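
A possible workaround, not part of the original report, is to wrap each vector in a Product type so that the Product encoder from spark.implicits._ applies. The sketch below is a minimal illustration under stated assumptions: the object name VectorWrapperSketch, the local SparkSession, and the use of toDS() in place of createDataFrame are all chosen here for illustration, and the result is a Dataset[Tuple1[Vector]] rather than the Dataset[Vector] the report asks for.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical object name, used only for this illustration.
object VectorWrapperSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to demonstrate the encoder behaviour.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("vector-wrapper-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same data as the reproduction: one row holding a sequence with one sparse vector.
    val v: Vector = Vectors.dense(Array(1.0, 2.0)).toSparse
    val arrayOfVectorsDS: Dataset[Seq[Vector]] = Seq(Seq(v)).toDS()

    // Wrapping each Vector in Tuple1 makes the element type a Product,
    // so no implicit Encoder[Vector] has to be found.
    val vectorsDS: Dataset[Tuple1[Vector]] =
      arrayOfVectorsDS.flatMap(a => a.map(Tuple1(_)))

    vectorsDS.show()
    spark.stop()
  }
}
```

This sidesteps the missing encoder rather than fixing it: the inconsistency described above remains, and callers have to unwrap the tuple (for example with "_._1") to get the vector back.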

 


