Thanks for the answer, I'm currently doing exactly that.
I'll try to sum up the usual Pandas / Spark DataFrame caveats soon.
Regards,
Olivier.
On Tue, Jun 2, 2015 at 02:38, Davies Liu dav...@databricks.com wrote:
The second one sounds reasonable, I think.
On Thu, Apr 30, 2015 at 1:42 AM, Olivier Girardot
o.girar...@lateral-thoughts.com wrote:
Hi everyone,
Let's assume I have a complex workflow with more than 10 data sources as input
and 20 computations (some creating intermediary datasets and some merging