PySpark: breakdown application execution time and fine-tuning the application

2015-11-10 Thread saluc
Hello, I am using PySpark to develop my big-data application. I have the impression that most of my application's execution time is spent on infrastructure overhead (distributing the code and the data across the cluster, IPC between the Python processes and the JVM) rather than on the computation itself.
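One rough way to check this impression is to time only the user function inside each partition and compare that against the wall-clock time of the whole job. The sketch below is a minimal illustration under stated assumptions, not an established PySpark facility: `timed_partition` and `overhead_fraction` are hypothetical helper names, and the `slots` parameter assumes you know roughly how many tasks ran in parallel.

```python
import time


def timed_partition(func):
    """Wrap a per-record function so it can be passed to RDD.mapPartitions.

    The wrapper applies func to every record of one partition and then yields
    a single (record_count, compute_seconds, results) tuple for that partition.
    Hypothetical helper, not part of PySpark.
    """
    def run(records):
        results = []
        count = 0
        compute_seconds = 0.0
        for record in records:
            start = time.perf_counter()
            results.append(func(record))
            compute_seconds += time.perf_counter() - start
            count += 1
        yield (count, compute_seconds, results)
    return run


def overhead_fraction(wall_seconds, compute_seconds, slots=1):
    """Estimate the share of available time not spent inside the user function.

    compute_seconds is the sum over all partitions; slots is the assumed
    number of tasks running in parallel. The result is clamped at 0.0.
    """
    if wall_seconds <= 0 or slots <= 0:
        raise ValueError("wall_seconds and slots must be positive")
    return max(0.0, 1.0 - compute_seconds / (wall_seconds * slots))


if __name__ == "__main__":
    # Local stand-in for a single partition, so the sketch runs without Spark.
    wall_start = time.perf_counter()
    stats = list(timed_partition(lambda x: x * x)(iter(range(1000))))
    wall = time.perf_counter() - wall_start
    total_compute = sum(seconds for _, seconds, _ in stats)
    print("estimated overhead fraction:", overhead_fraction(wall, total_compute))
```

On a cluster, the same wrapper would be used as `rdd.mapPartitions(timed_partition(my_func)).collect()`, with `my_func` standing in for the real per-record computation. The estimate is coarse: it lumps serialization, scheduling, and shuffle time together as overhead. For a finer view, PySpark's built-in worker profiler (the `spark.python.profile` setting together with `sc.show_profiles()`) and the stage timings in the Spark web UI may be more informative.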

PySpark: breakdown application execution time and fine-tuning

2015-10-17 Thread saluc