Hello,
I am using PySpark to develop my big-data application. I have the impression
that most of my application's execution time is spent on infrastructure
overhead (distributing the code and the data across the cluster, and IPC
between the Python worker processes and the JVM) rather than on the
computation itself.
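
For concreteness, here is a minimal sketch (the app name and data size are
illustrative, not from my real job) of the kind of measurement that makes this
overhead visible: a plain Python UDF round-trips every row through
serialization and IPC with the Python workers, while the equivalent built-in
expression runs entirely inside the JVM.

```python
import time
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import LongType

spark = SparkSession.builder.appName("udf-overhead-check").getOrCreate()

df = spark.range(10_000_000)  # one numeric column named "id"

# Python UDF: each row is serialized to a Python worker and back to the JVM.
py_inc = F.udf(lambda x: x + 1, LongType())

start = time.perf_counter()
df.select(py_inc("id").alias("v")).agg(F.sum("v")).collect()
print("Python UDF: %.2fs" % (time.perf_counter() - start))

# Built-in expression: the same computation, but it never leaves the JVM.
start = time.perf_counter()
df.select((F.col("id") + 1).alias("v")).agg(F.sum("v")).collect()
print("Built-in:   %.2fs" % (time.perf_counter() - start))
```

If the gap between the two timings is large, that would seem to confirm that
the Python/JVM round trip, not the arithmetic, dominates the runtime.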