[ https://issues.apache.org/jira/browse/SPARK-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418526#comment-15418526 ]
胡振宇 commented on SPARK-14850:
-----------------------------

I tried to run your code on Spark 1.6.1, but I found that "toDF" cannot be used in this example. Here is my code:

object Example {
  def main(args: Array[String]) {
    case class Test(num: Int, vector: Vector)
    val conf = new SparkConf().setAppName("Example")
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    val temp = sqlContext.sparkContext
      .parallelize(0 until 1e4.toInt, 1)
      .map(i => Test(i, Vectors.dense(Array.fill(1e6.toInt)(1.0))))
      .toDF() // toDF cannot be used at this step
  }
}

sc.parallelize(0 until 1e4.toInt, 1).map { i =>
  (i, Vectors.dense(Array.fill(1e6.toInt)(1.0)))
}.toDF.rdd.count()

Even when I use the SparkContext directly, as in the snippet above, toDF still cannot be used. Do you have a solution for running the example on Spark 1.6.1? Thank you.

> VectorUDT/MatrixUDT should take primitive arrays without boxing
> ---------------------------------------------------------------
>
>                 Key: SPARK-14850
>                 URL: https://issues.apache.org/jira/browse/SPARK-14850
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SQL
>    Affects Versions: 1.5.2, 1.6.1, 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Wenchen Fan
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> In SPARK-9390, we switched to using GenericArrayData to store indices and
> values in vector/matrix UDTs. However, GenericArrayData is not specialized
> for primitive types. This might hurt MLlib performance badly. We should
> consider either specializing GenericArrayData or using a different container.
> cc: [~cloud_fan] [~yhuai]
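
The boxing concern raised in the issue description can be illustrated with a small sketch in plain Scala. The two wrapper classes below are hypothetical stand-ins, not Spark classes: BoxedArrayData only mimics a container backed by Array[Any], and DoubleArrayData mimics one backed by a primitive Array[Double]. The sketch shows where per-element boxing appears. It is not a description of how Spark implements either container.

```scala
// Hypothetical stand-ins (assumed names, not part of Spark). They only mimic
// the storage choice discussed in the issue: Array[Any] vs. Array[Double].
final class BoxedArrayData(val values: Array[Any]) {
  // Every read has to unbox a java.lang.Double back into a primitive.
  def getDouble(i: Int): Double = values(i).asInstanceOf[Double]
}

final class DoubleArrayData(val values: Array[Double]) {
  // Direct read from primitive storage, no boxing involved.
  def getDouble(i: Int): Double = values(i)
}

object BoxingSketch {
  def main(args: Array[String]): Unit = {
    val raw: Array[Double] = Array.fill(5)(1.0)

    // Copying into Array[Any] allocates one boxed object per element.
    val boxedValues: Array[Any] = raw.map(d => (d: Any))
    val boxed = new BoxedArrayData(boxedValues)

    // Wrapping the primitive array reuses the existing storage without a copy.
    val primitive = new DoubleArrayData(raw)

    // The generic container holds boxed objects, the other a primitive array.
    println(boxed.values(0).getClass.getName)
    println(primitive.values.getClass.getSimpleName)
    println(boxed.getDouble(0) + primitive.getDouble(0))
  }
}
```

With vectors of 1e6 elements, as in the example from the comment, the Array[Any] route would create a million small objects per vector, which is the kind of overhead the issue asks to avoid.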