Re: Best strategy for Pandas - Spark

2015-06-02 Thread Olivier Girardot
Thanks for the answer, I'm currently doing exactly that. I'll try to sum up the usual Pandas = Spark DataFrame caveats soon. Regards, Olivier. On Tue, Jun 2, 2015 at 02:38, Davies Liu dav...@databricks.com wrote: The second one sounds reasonable, I think. On Thu, Apr 30, 2015 at 1:42 AM,

Re: Best strategy for Pandas - Spark

2015-06-01 Thread Davies Liu
The second one sounds reasonable, I think. On Thu, Apr 30, 2015 at 1:42 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Hi everyone, Let's assume I have a complex workflow of more than 10 datasources as input - 20 computations (some creating intermediary datasets and some merging

Best strategy for Pandas - Spark

2015-04-30 Thread Olivier Girardot
Hi everyone, Let's assume I have a complex workflow of more than 10 datasources as input - 20 computations (some creating intermediary datasets and some merging everything for the final computation) - some taking on average 1 minute to complete and some taking more than 30 minutes. What would be