Re: Best strategy for Pandas -> Spark

2015-06-02 Thread Olivier Girardot

Thanks for the answer, I'm currently doing exactly that. I'll try to sum up the usual Pandas <=> Spark DataFrame caveats soon.

Regards,
Olivier.

On Tue, Jun 2, 2015 at 02:38, Davies Liu dav...@databricks.com wrote:
The second one sounds reasonable, I think.

On Thu, Apr 30, 2015 at 1:42 AM,

Re: Best strategy for Pandas -> Spark

2015-06-01 Thread Davies Liu

The second one sounds reasonable, I think.

On Thu, Apr 30, 2015 at 1:42 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote:
Hi everyone,
Let's assume I have a complex workflow of more than 10 datasources as input -> 20 computations (some creating intermediary datasets and some merging