haiyangsea created SPARK-19469:
----------------------------------

             Summary: PySpark should allow driver process on different machine
                 Key: SPARK-19469
                 URL: https://issues.apache.org/jira/browse/SPARK-19469
             Project: Spark
          Issue Type: Wish
          Components: PySpark
    Affects Versions: 1.6.3
            Reporter: haiyangsea


In my scenario, there is a resident Spark driver process (created in YARN 
cluster mode), and all PySpark Python clients connect to this resident driver 
process and share its SparkContext. The Python client process also runs other 
applications that have their own environment requirements, so I want to 
separate the PySpark Python client process from the driver process.

There are two main limitations in PySpark (sketched in the code below):

1. The *parallelize* method uses a local file to transfer data between the 
Java process and the Python process.

2. PySpark hard-codes *localhost* when transferring task results and 
intermediate data between the Java process and the Python process.

Is there any way to achieve my goal?



