I've never tried to run a standalone cluster alongside Hadoop, but why not
run Spark as a YARN application? That way it can absolutely (and in fact
preferably) use the distributed file system.
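For example, something along these lines should work (a rough sketch,
assuming a YARN cluster backed by HDFS; deps.zip and my_job.py are
hypothetical names for the bundled Python dependencies and the driver
script):

    # Submit the PySpark job to YARN in cluster mode. I believe --py-files
    # can reference an hdfs:// path here, so the dependency bundle does not
    # have to live on the submitting machine (the path below is made up).
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --py-files hdfs:///libs/deps.zip \
      my_job.py

Worth double-checking against the Spark version you're on, but on YARN the
remote files should get localized to the executors for you.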

On Fri, Nov 9, 2018 at 5:04 PM, Arijit Tarafdar <arij...@live.com> wrote:

> Hello All,
>
>
>
> We have a requirement to run PySpark in standalone cluster mode and also
> reference python libraries (egg/wheel) which are not local but placed in a
> distributed storage like HDFS. From the code it looks like none of cases
> are supported.
>
>
>
> Questions are:
>
>
>
>    1. Why is PySpark supported only in standalone client mode?
>    2. Why does --py-files only support local files and not files stored
>    in remote stores?
>
>
>
> We would like to update the Spark code to support these scenarios, but we
> just want to be aware of any technical difficulties the community has faced
> while trying to support them.
>
>
>
> Thanks, Arijit
>
