Ravi created SPARK-25309:
----------------------------

             Summary: scikit-learn-like automatic Pipeline parallelization in Spark
                 Key: SPARK-25309
                 URL: https://issues.apache.org/jira/browse/SPARK-25309
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 2.3.1
            Reporter: Ravi


SPARK-19357 and SPARK-21911 have helped parallelize Pipelines in Spark.
However, instead of setting the parallelism parameter on the CrossValidator by
hand, it would be good to have something like n_jobs=-1 (as in scikit-learn),
where the Pipeline DAG is automatically parallelized and scheduled based on the
resources allocated to the Spark session, so that the user does not have to
pick an integer value for this parameter.
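
To make the request concrete, here is a minimal sketch of how an n_jobs-style
value might be mapped onto the existing integer parallelism parameter. The
helper name resolve_parallelism and its available_slots argument are
hypothetical and not part of Spark; the handling of negative values follows
the joblib convention used by scikit-learn (-1 means all slots, -2 means all
but one).

```python
def resolve_parallelism(n_jobs, available_slots):
    """Map a scikit-learn style n_jobs value to a positive thread count.

    Hypothetical helper, not a Spark API. `available_slots` is assumed to be
    the number of task slots the session can use, supplied by the caller.
    """
    if available_slots < 1:
        raise ValueError("available_slots must be at least 1")
    if n_jobs == 0:
        # joblib also rejects n_jobs == 0
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs > 0:
        # An explicit positive value is passed through unchanged.
        return n_jobs
    # Negative values count back from the available slots:
    # -1 -> all slots, -2 -> all but one, never fewer than 1.
    return max(available_slots + 1 + n_jobs, 1)


if __name__ == "__main__":
    for n_jobs in (-1, -2, 4):
        print(n_jobs, "->", resolve_parallelism(n_jobs, available_slots=8))
```

With PySpark, the result could then be passed to the existing parameter, for
example CrossValidator(..., parallelism=resolve_parallelism(-1,
spark.sparkContext.defaultParallelism)). Using defaultParallelism as the slot
count is an assumption made for this sketch; the issue itself does not say how
available resources should be measured.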



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
