Hello All,

We have a requirement to run PySpark in standalone cluster mode and to
reference Python libraries (egg/wheel) that are not local but are placed in
distributed storage such as HDFS. From the code it looks like neither of
these cases is supported.
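
For concreteness, the invocation we would like to work looks roughly like
this (the master URL, application path, and package names below are
hypothetical placeholders):

    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --py-files hdfs:///libs/deps.egg,hdfs:///libs/deps-0.1-py3-none-any.whl \
      hdfs:///apps/my_app.py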

Questions are:


  1.  Why is PySpark supported only in standalone client mode?
  2.  Why does --py-files support only local files and not files stored in
remote stores?

We would like to update the Spark code to support these scenarios, but first
we want to be aware of any technical difficulties the community has faced
while trying to support them.

Thanks, Arijit