[ https://issues.apache.org/jira/browse/SPARK-30154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-30154:
----------------------------------
    Description:
If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame into dense arrays, an efficient approach is to do the conversion in the JVM. However, this requires the PySpark user to write Scala code and register it as a UDF. Often this is infeasible for a pure Python project.

What we can do is predefine those converters in Scala and expose them in PySpark, e.g.:

{code}
from pyspark.ml.functions import vector_to_dense_array
from pyspark.sql.functions import col

df.select(vector_to_dense_array(col("features")))
{code}

cc: [~weichenxu123]

  was:
If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame into dense arrays, an efficient method is to do the conversion in the JVM. However, this requires the PySpark user to write Scala code and register it as a UDF. Often this is infeasible for a pure Python project.

What we can do is predefine those converters in Scala and expose them in PySpark, e.g.:

{code}
from pyspark.ml.functions import vector_to_dense_array
from pyspark.sql.functions import col

df.select(vector_to_dense_array(col("features")))
{code}

cc: [~weichenxu123]


> PySpark UDF to convert MLlib vectors to dense arrays
> ----------------------------------------------------
>
>                 Key: SPARK-30154
>                 URL: https://issues.apache.org/jira/browse/SPARK-30154
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib, PySpark
>    Affects Versions: 3.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>
> If a PySpark user wants to convert MLlib sparse/dense vectors in a DataFrame
> into dense arrays, an efficient approach is to do the conversion in the JVM.
> However, this requires the PySpark user to write Scala code and register it
> as a UDF. Often this is infeasible for a pure Python project.
> What we can do is predefine those converters in Scala and expose them in
> PySpark, e.g.:
> {code}
> from pyspark.ml.functions import vector_to_dense_array
> from pyspark.sql.functions import col
> df.select(vector_to_dense_array(col("features")))
> {code}
> cc: [~weichenxu123]
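
For readers unfamiliar with the conversion this issue refers to, the following is a minimal plain-Python sketch of what a sparse-to-dense converter computes. It assumes the usual MLlib sparse layout of (size, indices, values). The function name sparse_to_dense is hypothetical and is not part of PySpark or of the proposal above. Running logic like this row by row in a Python UDF is the slower path that the proposed JVM-side converter would avoid.

```python
from typing import List, Sequence


def sparse_to_dense(size: int,
                    indices: Sequence[int],
                    values: Sequence[float]) -> List[float]:
    """Expand a sparse vector given as (size, indices, values) into a dense list.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")

    # Start from all zeros, then fill in the explicitly stored entries.
    dense = [0.0] * size
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        dense[i] = float(v)
    return dense


if __name__ == "__main__":
    # A length-4 vector with non-zero entries at positions 0 and 3.
    print(sparse_to_dense(4, [0, 3], [1.0, 5.5]))
```

A dense vector needs no such expansion, since its values are already stored position by position; only the sparse case involves filling in the implicit zeros.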