MatrixUDT and VectorUDT in Spark ML

Li Jin Fri, 23 Mar 2018 07:55:12 -0700

Hi All,

I came across the two types MatrixUDT and VectorUDT in Spark ML while
doing feature extraction and preprocessing with PySpark. However, when
I tried basic operations on them, such as vector multiplication and
matrix multiplication, I had to fall back to a Python UDF.


It seems to me that it would be very useful to have built-in operators on
these types, just as first-class Spark SQL types have, e.g.,

df.withColumn('v', df.matrix_column * df.vector_column)

What are other people's thoughts on this?

Li
