[ https://issues.apache.org/jira/browse/SPARK-11497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992923#comment-14992923 ]
Joseph K. Bradley commented on SPARK-11497:
-------------------------------------------

Can you please paste code which reproduces this problem here? Also, what version of Spark were you using?

> PySpark RowMatrix Constructor Has Type Erasure Issue
> ----------------------------------------------------
>
>                 Key: SPARK-11497
>                 URL: https://issues.apache.org/jira/browse/SPARK-11497
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>            Reporter: Mike Dusenberry
>
> Implementing tallSkinnyQR in SPARK-9656 uncovered a bug with our PySpark RowMatrix constructor. As discussed on the dev list [here|http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html], there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark. Although we are attempting to construct a RowMatrix from an RDD[Vector] in [PythonMLlibAPI|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115], the Vector type is erased, resulting in an RDD[Object]. Thus, when calling Scala's tallSkinnyQR from PySpark, we get a Java ClassCastException in which an Object cannot be cast to a Spark Vector.
>
> As noted in the aforementioned dev list thread, this issue was also encountered with DecisionTrees, and the fix involved an explicit retag of the RDD with a Vector type. Thus, this PR will apply that fix to the createRowMatrix helper function in PythonMLlibAPI.
>
> IndexedRowMatrix and CoordinateMatrix do not appear to have this issue, likely due to their related helper functions in PythonMLlibAPI creating the RDDs explicitly from DataFrames with pattern matching, thus preserving the types.
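
For readers following the description, here is a minimal sketch of the erasure and retag mechanism in plain Java, with no Spark dependency. It is not the Spark code and not a reproduction of the reported failure. RetagSketch, Vec, TaggedSeq and collectFails are hypothetical names invented for illustration: TaggedSeq stands in for an RDD, and its Class token plays the role that a class tag plays on a real RDD. The assumption being illustrated is that a collection tagged as Object builds an Object[] at runtime, so handing it to code that expects a Vec[] fails with a ClassCastException, while retagging with the concrete type avoids that.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

public class RetagSketch {

    // Hypothetical stand-in for an MLlib Vector.
    static final class Vec {
        final double[] values;
        Vec(double... values) { this.values = values; }
    }

    // Hypothetical stand-in for an RDD: the elements plus a runtime class
    // token that plays the role of the RDD's class tag.
    static final class TaggedSeq<T> {
        private final List<T> elems;
        private final Class<?> tag;

        TaggedSeq(List<T> elems, Class<?> tag) {
            this.elems = elems;
            this.tag = tag;
        }

        // The runtime class of the array comes from the tag, not from T,
        // because T is erased at runtime.
        @SuppressWarnings("unchecked")
        T[] collect() {
            T[] out = (T[]) Array.newInstance(tag, elems.size());
            return elems.toArray(out);
        }

        // Same elements, but carrying the concrete element class.
        TaggedSeq<T> retag(Class<T> cls) {
            return new TaggedSeq<>(elems, cls);
        }
    }

    // Returns true if collecting into a Vec[] throws ClassCastException.
    static boolean collectFails(TaggedSeq<Vec> seq) {
        try {
            Vec[] rows = seq.collect(); // the compiler inserts a cast to Vec[] here
            return rows == null;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Vec> data = Arrays.asList(new Vec(1.0, 2.0), new Vec(3.0, 4.0));

        // Tagged as Object, as if the element type had been lost on the way in.
        TaggedSeq<Vec> erased = new TaggedSeq<>(data, Object.class);
        // Explicitly retagged with the concrete element type.
        TaggedSeq<Vec> retagged = erased.retag(Vec.class);

        System.out.println("erased tag fails:   " + collectFails(erased));
        System.out.println("retagged tag fails: " + collectFails(retagged));
    }
}
```

The elements are the same in both cases; only the runtime tag differs. That matches the shape of the fix described in the issue, where the RDD is explicitly retagged with the Vector type before the RowMatrix is built.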