Re: PySpark, numpy arrays and binary data

2014-08-07 Thread Rok Roskar
Thanks for the quick answer! numpy arrays can only support basic types, so we cannot use them during collect() by default. Sure, but if you knew that a numpy array went in on one end, you could safely use it on the other end, no? Perhaps it would require an extension of the RDD class and

Re: PySpark, numpy arrays and binary data

2014-08-07 Thread Davies Liu
On Thu, Aug 7, 2014 at 12:06 AM, Rok Roskar rokros...@gmail.com wrote: Sure, but if you knew that a numpy array went in on one end, you could safely use it on the other end, no? Perhaps it would require an extension of the RDD class and overriding the collect() method. Could you give a short
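
The idea raised in this exchange (a numpy array goes in on one end and comes back as a numpy array on the other) can be illustrated without Spark at all. The following is only a rough sketch, not what PySpark does internally: pack_array and unpack_array are hypothetical helper names, and the assumption is that an array is reduced to basic Python types (a dtype string, a shape tuple, raw bytes) before transport and rebuilt afterwards.

```python
import numpy as np


def pack_array(arr):
    """Reduce an array to basic Python types (hypothetical helper, not a Spark API)."""
    arr = np.ascontiguousarray(arr)  # make sure the memory layout is C-contiguous
    return (arr.dtype.str, arr.shape, arr.tobytes())


def unpack_array(packed):
    """Rebuild the array from the (dtype, shape, bytes) triple made by pack_array."""
    dtype, shape, payload = packed
    # frombuffer gives a read-only view of the bytes; copy() makes it writable
    return np.frombuffer(payload, dtype=dtype).reshape(shape).copy()


original = np.arange(6, dtype=np.float64).reshape(2, 3)
restored = unpack_array(pack_array(original))
```

In a Spark job one could imagine applying something like pack_array in a map step before collect() and unpack_array on the collected results, but that wiring is an assumption here and is not taken from the thread.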

PySpark, numpy arrays and binary data

2014-08-06 Thread Rok Roskar
Hello, I'm interested in getting started with Spark to scale our scientific analysis package (http://pynbody.github.io) to larger data sets. The package is written in Python and makes heavy use of numpy/scipy and related frameworks. I've got a couple of questions that I have not been able to

Re: PySpark, numpy arrays and binary data

2014-08-06 Thread Davies Liu
numpy arrays can only support basic types, so we cannot use them during collect() by default. Could you give a short example of how numpy arrays are used in your project? On Wed, Aug 6, 2014 at 8:41 AM, Rok Roskar rokros...@gmail.com wrote: Hello, I'm interested in getting started with Spark