Hey Andrew,

Thanks for the response. Is this the issue you're referring to:
https://issues.apache.org/jira/browse/SPARK-5162? (The duplicate linked
there has an associated patch.)

Just to confirm my understanding: with that patch, Python jobs can be
submitted to YARN in cluster mode, a node in the cluster will act as the
driver, and so a mismatch between the Python version on the submission
client and the one on the cluster shouldn't be an issue?
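
If so, I'm picturing a submission from a thin client looking roughly like
this (my_job.py is just a placeholder for the application script):

    spark-submit --master yarn-cluster my_job.py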

Thanks,
Chris

On Tue, Jan 20, 2015 at 10:34 AM, Andrew Or <and...@databricks.com> wrote:

> Hi Chris,
>
> Short answer is no, not yet.
>
> Longer answer is that PySpark only supports client mode, which means your
> driver runs on the same machine as your submission client. As a corollary,
> your submission client must currently depend on all of Spark and
> its dependencies. There is a patch that supports this for *cluster* mode
> (as opposed to client mode), which would be the first step towards what you
> want.
>
> -Andrew
>
> 2015-01-20 8:36 GMT-08:00 Chris Beavers <cbeav...@trifacta.com>:
>
> Hey all,
>>
>> Is there any notion of a lightweight python client for submitting jobs to
>> a Spark cluster remotely? If I essentially install Spark on the client
>> machine, and that machine has the same OS, same version of Python, etc.,
>> then I'm able to communicate with the cluster just fine. But if Python
>> versions differ slightly, then I start to see a lot of opaque errors that
>> often bubble up as EOFExceptions. Furthermore, this just seems like a very
>> heavyweight way to set up a client.
>>
>> Does anyone have any suggestions for setting up a thin pyspark client on
>> a node which doesn't necessarily conform to the homogeneity of the target
>> Spark cluster?
>>
>> Best,
>> Chris
>>
>
>
