Joseph K. Bradley created SPARK-24632:
-----------------------------------------

             Summary: Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence
                 Key: SPARK-24632
                 URL: https://issues.apache.org/jira/browse/SPARK-24632
             Project: Spark
          Issue Type: Improvement
          Components: ML, PySpark
    Affects Versions: 2.4.0
            Reporter: Joseph K. Bradley


This is a follow-up for [SPARK-17025], which allowed users to implement Python 
PipelineStages in 3rd-party libraries, include them in Pipelines, and use 
Pipeline persistence.  This task is to make it easier for 3rd-party libraries 
to have PipelineStages written in Java and then to use pyspark.ml abstractions 
to create wrappers around those Java classes.  This is currently possible, 
except that users hit bugs around persistence.

One fix we'll need is an overridable method for converting between Python 
class paths and fully qualified Java class names. See 
https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284
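
To illustrate the kind of hook this would need, here is a minimal sketch in plain Python (no pyspark import). It assumes the default mapping simply swaps the "pyspark" package prefix for "org.apache.spark", which cannot work for a class that lives in a 3rd-party module. The names default_java_class_name, ThirdPartyWrapper, java_class_name, and com.example.ml.MyScaler are hypothetical and are not part of the pyspark API.

```python
def default_java_class_name(py_module: str, py_class: str) -> str:
    """Assumed default convention: replace the 'pyspark' prefix with 'org.apache.spark'."""
    return py_module.replace("pyspark", "org.apache.spark", 1) + "." + py_class


class ThirdPartyWrapper:
    """Hypothetical base class for a Python wrapper around a Java PipelineStage."""

    @classmethod
    def java_class_name(cls) -> str:
        # Overridable hook (hypothetical name): by default, derive the Java
        # class name from the Python module path and class name.
        return default_java_class_name(cls.__module__, cls.__name__)


class MyScaler(ThirdPartyWrapper):
    """Hypothetical 3rd-party stage whose Java class lives outside org.apache.spark."""

    @classmethod
    def java_class_name(cls) -> str:
        # The library author states the Java class name explicitly.
        return "com.example.ml.MyScaler"


if __name__ == "__main__":
    # A built-in style path maps onto the Spark package.
    print(default_java_class_name("pyspark.ml.feature", "Binarizer"))
    # A 3rd-party path is left unchanged by the default, hence the need for an override.
    print(default_java_class_name("mylib.feature", "MyScaler"))
    print(MyScaler.java_class_name())
```

With a hook like this, persistence code could ask the wrapper class for its Java counterpart instead of assuming the pyspark-to-org.apache.spark naming convention.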

One unusual requirement for this task is writing unit tests that exercise a 
custom PipelineStage defined outside of the pyspark package.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

