Hi Corey, I'm not familiar with Arrow or Plasma. However, I recently read an article about running Spark on a standalone machine (your case). It sounds like you could benefit from PySpark "as-is":
https://databricks.com/blog/2018/05/03/benchmarking-apache-spark-on-a-single-node-machine.html

Regards,

2018-05-23 22:30 GMT+02:00 Corey Nolet <cjno...@gmail.com>:
> Please forgive me if this question has been asked already.
>
> I'm working in Python with Arrow+Plasma+Pandas DataFrames. I'm curious if
> anyone knows of any efforts to implement the PySpark API on top of Apache
> Arrow directly. In my case, I'm doing data science on a machine with 288
> cores and 1TB of RAM.
>
> It would make life much easier if I were able to use the flexibility of the
> PySpark API (rather than being tied to the operations in Pandas). It
> seems like an implementation would be fairly straightforward using the
> Plasma server and object_ids.
>
> If you have not heard of an effort underway to accomplish this, are there
> any reasons why it would be a bad idea?
>
> Thanks!