Hello All, We have a requirement to run PySpark in standalone cluster mode and also to reference Python libraries (egg/wheel) that are not local but are placed in distributed storage such as HDFS. From the code it looks like neither case is supported.
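For concreteness, the kind of invocation we want to support would look something like this (the master URL, paths, and file names below are hypothetical, and this is the combination that currently fails):

```shell
# Desired: submit a PySpark app in standalone *cluster* mode,
# pulling both the app and its dependencies from HDFS.
# (master URL, paths, and file names are hypothetical)
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --py-files hdfs:///libs/mylib-0.1-py3-none-any.whl \
  hdfs:///apps/main.py
```

Today this is rejected because standalone mode only accepts Python apps in client deploy mode, and --py-files expects paths on the local filesystem of the submitting machine.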
Our questions are:

1. Why is PySpark supported only in standalone client mode?
2. Why does --py-files support only local files and not files stored in remote stores?

We would like to update the Spark code to support these scenarios, but first we want to be aware of any technical difficulties the community has faced while trying to support them.

Thanks,
Arijit