Hello,

I am using PySpark to develop a big-data application. I have the impression
that most of my application's execution time is spent on infrastructure
(distributing the code and the data across the cluster, IPC between the
Python processes and the JVM) rather than on the computation itself. In
particular, I would like to measure the time spent in the IPC between the
Python processes and the JVM.

Is there a way to break down the execution time so as to see in detail how
much time is actually spent in the different phases of the execution? In
other words, is there some kind of detailed profiling of the execution time
that would give me more information for fine-tuning the application?
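
For reference, the closest thing I am aware of is PySpark's built-in Python
worker profiler (the spark.python.profile option together with
sc.show_profiles()). Below is a minimal sketch of how I understand it would
be used; the job and the lambdas are just placeholders, and my understanding
is that this only profiles the Python-worker side, not the JVM side or the
IPC itself, which is why I am asking whether something finer-grained exists.

from pyspark import SparkConf, SparkContext

# Enable the built-in cProfile-based profiler for the Python workers.
conf = (SparkConf()
        .setAppName("profiling-sketch")
        .set("spark.python.profile", "true"))
sc = SparkContext(conf=conf)

# Placeholder job: the lambdas run inside the Python worker processes.
rdd = sc.parallelize(range(1000000))
total = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)

# Print the accumulated cProfile statistics per RDD to stdout
# (sc.dump_profiles(path) writes them to a directory instead).
# Time spent in pyspark/serializers.py frames may give a rough hint
# of the serialization cost on the Python side.
sc.show_profiles()

sc.stop()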

Thank you very much for your help and support,
Luca


