RE: Python UDFs

2016-01-28 Thread Stefan Panayotov
...@msn.com spanayo...@outlook.com spanayo...@comcast.net
> Date: Wed, 27 Jan 2016 15:03:06 -0800
> Subject: Re: Python UDFs
> From: ja...@odersky.com
> To: spanayo...@msn.com
> CC: user@spark.apache.org
>
> Have you checked:
>
> - the mllib doc for python
> https://s

Re: Python UDFs

2016-01-27 Thread Jakob Odersky
Have you checked:

- the mllib doc for python
  https://spark.apache.org/docs/1.6.0/api/python/pyspark.mllib.html#pyspark.mllib.linalg.DenseVector
- the udf doc
  https://spark.apache.org/docs/1.6.0/api/python/pyspark.sql.html#pyspark.sql.functions.udf

You should be fine in returning a DenseVector
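
For illustration only, here is a minimal sketch of what returning a DenseVector from a Python UDF might look like against the Spark 1.6 API linked above. The body of the original Scala UDF is not shown in this thread, so `pack_pair` is a hypothetical stand-in that simply collects the two inputs; `make_vector_udf` is likewise an invented helper name, not part of Spark.

```python
def pack_pair(a, b):
    """Hypothetical stand-in for the UDF body: collect two numbers as floats."""
    return [float(a), float(b)]


def make_vector_udf():
    """Wrap pack_pair as a Spark SQL UDF that returns an MLlib vector.

    Assumes PySpark 1.6 and an already running SparkContext.
    """
    # Imported lazily so pack_pair stays usable without a Spark install.
    from pyspark.sql.functions import udf
    from pyspark.mllib.linalg import Vectors, VectorUDT

    # VectorUDT declares the return type, so Spark SQL knows the
    # resulting column holds MLlib vectors (here, a DenseVector).
    return udf(lambda a, b: Vectors.dense(pack_pair(a, b)), VectorUDT())
```

Assuming a DataFrame `df` with numeric columns named `a` and `b` (both names are made up here), the UDF could then be applied with something like `df.withColumn("vec", make_vector_udf()(df["a"], df["b"]))`.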

Python UDFs

2016-01-27 Thread Stefan Panayotov
Hi,

I have defined a UDF in Scala like this:

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
    import org.apache.spark.mllib.linalg.DenseVector

    val determineVector = udf((a: Double, b: Double) => {
      val data: