Re: PySpark API on top of Apache Arrow

2018-05-26 Thread Jules Damji
3/benchmarking-apache-spark-on-a-single-node-machine.html >> >> regards, >> >> 2018-05-23 22:30 GMT+02:00 Corey Nolet <cjno...@gmail.com>: >>> Please forgive me if this question has been asked already. >>> >>> I'm working in Python with Arrow+Pla

Re: PySpark API on top of Apache Arrow

2018-05-26 Thread Corey Nolet
g- > apache-spark-on-a-single-node-machine.html > > regards, > > 2018-05-23 22:30 GMT+02:00 Corey Nolet <cjno...@gmail.com>: > >> Please forgive me if this question has been asked already. >> >> I'm working in Python with Arrow+Plasma+Pandas Dataframes. I'

Re: PySpark API on top of Apache Arrow

2018-05-26 Thread Nicolas Paris
8-05-23 22:30 GMT+02:00 Corey Nolet <cjno...@gmail.com>: > Please forgive me if this question has been asked already. > > I'm working in Python with Arrow+Plasma+Pandas Dataframes. I'm curious if > anyone knows of any efforts to implement the PySpark API on top of Apache >

PySpark API on top of Apache Arrow

2018-05-23 Thread Corey Nolet
Please forgive me if this question has been asked already. I'm working in Python with Arrow+Plasma+Pandas Dataframes. I'm curious if anyone knows of any efforts to implement the PySpark API on top of Apache Arrow directly. In my case, I'm doing data science on a machine with 288 cores and 1TB