Re: Possible SPIP to improve matrix and vector column type support

2018-05-12 Thread Leif Walsh
I filed an SPIP for this at https://issues.apache.org/jira/browse/SPARK-24258. Let’s discuss!

On Wed, Apr 18, 2018 at 23:33 Leif Walsh wrote:
> I agree we should reuse as much as possible. For PySpark, I think the
> obvious choices of Breeze and numpy arrays already made

Re: Possible SPIP to improve matrix and vector column type support

2018-04-18 Thread Leif Walsh
I agree we should reuse as much as possible. For PySpark, I think the obvious choices already made, Breeze and numpy arrays, make a lot of sense. I’m not sure about the other language bindings and would defer to others. I was under the impression that UDTs were gone and (probably?) not coming

Re: Possible SPIP to improve matrix and vector column type support

2018-04-18 Thread Joseph Bradley
Thanks for the thoughts! We've gone back and forth quite a bit about local linear algebra support in Spark. For reference, there have been some discussions here:
https://issues.apache.org/jira/browse/SPARK-6442
https://issues.apache.org/jira/browse/SPARK-16365

Possible SPIP to improve matrix and vector column type support

2018-04-11 Thread Leif Walsh
Hi all, I’ve been playing around with the Vector and Matrix UDTs in pyspark.ml and I’ve found myself wanting more. One minor issue is that, with Arrow serialization enabled, these types don’t serialize properly in Python UDF calls or in toPandas. There’s a natural representation for