[jira] [Updated] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence

2020-03-16 Thread Dongjoon Hyun (Jira)


[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24632:
--
Affects Version/s: 3.1.0  (was: 3.0.0)

> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers 
> for persistence
> --
>
> Key: SPARK-24632
> URL: https://issues.apache.org/jira/browse/SPARK-24632
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 3.1.0
> Reporter: Joseph K. Bradley
> Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement 
> Python PipelineStages in 3rd-party libraries, include them in Pipelines, and 
> use Pipeline persistence.  This task is to make it easier for 3rd-party 
> libraries to have PipelineStages written in Java and then to use pyspark.ml 
> abstractions to create wrappers around those Java classes.  This is currently 
> possible, except that users hit bugs around persistence.
> I spent a bit of time thinking about this and wrote up thoughts and a proposal in the 
> doc linked below.  Summary of proposal:
> Require that 3rd-party libraries with Java classes with Python wrappers 
> implement a trait which provides the corresponding Python classpath in some 
> field:
> {code}
> trait PythonWrappable {
>   def pythonClassPath: String = …
> }
> class MyJavaType extends PythonWrappable
> {code}
> This will not be required for MLlib wrappers, which we can handle specially.
> One issue for this task will be that we may have trouble writing unit tests.  
> They would ideally test a Java class + Python wrapper class pair sitting 
> outside of pyspark.
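
To make the proposal above concrete from the Python side: if a Java stage reports the location of its wrapper as a dotted path, a loader could resolve it roughly as follows. This is only a sketch under assumptions. `load_python_wrapper` is a hypothetical helper, not an existing pyspark function, and a standard-library class stands in for a real wrapper so the snippet runs without Spark.

```python
import importlib


def load_python_wrapper(python_class_path):
    """Resolve a dotted path such as 'package.module.ClassName' to a class.

    Hypothetical helper for illustration; not part of pyspark.
    """
    module_name, _, class_name = python_class_path.rpartition(".")
    if not module_name:
        raise ValueError(
            "expected 'module.ClassName', got %r" % python_class_path)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for the string a Java stage would expose via pythonClassPath.
# A stdlib class is used so the sketch does not depend on Spark.
wrapper_cls = load_python_wrapper("collections.OrderedDict")
```

With something along these lines, a 3rd-party wrapper would not need to live under the pyspark package for the reader to find it; the path string alone would identify it.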



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence

2019-07-16 Thread Dongjoon Hyun (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24632:
--
Affects Version/s: 3.0.0  (was: 2.4.0)

> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers 
> for persistence
> --
>
> Key: SPARK-24632
> URL: https://issues.apache.org/jira/browse/SPARK-24632
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 3.0.0
> Reporter: Joseph K. Bradley
> Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement 
> Python PipelineStages in 3rd-party libraries, include them in Pipelines, and 
> use Pipeline persistence.  This task is to make it easier for 3rd-party 
> libraries to have PipelineStages written in Java and then to use pyspark.ml 
> abstractions to create wrappers around those Java classes.  This is currently 
> possible, except that users hit bugs around persistence.
> I spent a bit of time thinking about this and wrote up thoughts and a proposal in the 
> doc linked below.  Summary of proposal:
> Require that 3rd-party libraries with Java classes with Python wrappers 
> implement a trait which provides the corresponding Python classpath in some 
> field:
> {code}
> trait PythonWrappable {
>   def pythonClassPath: String = …
> }
> class MyJavaType extends PythonWrappable
> {code}
> This will not be required for MLlib wrappers, which we can handle specially.
> One issue for this task will be that we may have trouble writing unit tests.  
> They would ideally test a Java class + Python wrapper class pair sitting 
> outside of pyspark.






[jira] [Updated] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence

2018-06-25 Thread Joseph K. Bradley (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-24632:
--
Description: 
This is a follow-up for [SPARK-17025], which allowed users to implement Python 
PipelineStages in 3rd-party libraries, include them in Pipelines, and use 
Pipeline persistence.  This task is to make it easier for 3rd-party libraries 
to have PipelineStages written in Java and then to use pyspark.ml abstractions 
to create wrappers around those Java classes.  This is currently possible, 
except that users hit bugs around persistence.

I spent a bit of time thinking about this and wrote up thoughts and a proposal in the 
doc linked below.  Summary of proposal:

Require that 3rd-party libraries with Java classes with Python wrappers 
implement a trait which provides the corresponding Python classpath in some 
field:
{code}
trait PythonWrappable {
  def pythonClassPath: String = …
}
class MyJavaType extends PythonWrappable
{code}
This will not be required for MLlib wrappers, which we can handle specially.

One issue for this task will be that we may have trouble writing unit tests.  
They would ideally test a Java class + Python wrapper class pair sitting 
outside of pyspark.

  was:
This is a follow-up for [SPARK-17025], which allowed users to implement Python 
PipelineStages in 3rd-party libraries, include them in Pipelines, and use 
Pipeline persistence.  This task is to make it easier for 3rd-party libraries 
to have PipelineStages written in Java and then to use pyspark.ml abstractions 
to create wrappers around those Java classes.  This is currently possible, 
except that users hit bugs around persistence.

Some fixes we'll need include:
* an overridable method for converting between Python and Java classpaths. See 
https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284
* https://github.com/apache/spark/blob/4e7d8678a3d9b12797d07f5497e0ed9e471428dd/python/pyspark/ml/pipeline.py#L378

One unusual thing for this task will be to write unit tests which test a custom 
PipelineStage written outside of the pyspark package.
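
The "overridable method" idea in the earlier description could look roughly like the sketch below. It assumes the default mapping is a plain prefix substitution from `pyspark` to `org.apache.spark`; that is an assumption for illustration, and the linked util.py line remains the authority on the actual behavior. `ClassPathMixin`, `java_class_path`, `BuiltinStage`, `MyStage`, and `com.example.ml` are all hypothetical names.

```python
class ClassPathMixin(object):
    """Hypothetical mixin for illustration; not part of pyspark."""

    @classmethod
    def java_class_path(cls):
        # Assumed default convention: swap the 'pyspark' package prefix
        # for 'org.apache.spark' and keep the class name unchanged.
        java_package = cls.__module__.replace("pyspark", "org.apache.spark")
        return java_package + "." + cls.__name__


class BuiltinStage(ClassPathMixin):
    # Pretend this class ships inside pyspark so the default rule applies.
    __module__ = "pyspark.ml.feature"


class MyStage(ClassPathMixin):
    """Hypothetical 3rd-party wrapper whose Java class lives elsewhere."""

    @classmethod
    def java_class_path(cls):
        # Override: the default substitution cannot find this package.
        return "com.example.ml.MyStage"


builtin_path = BuiltinStage.java_class_path()
custom_path = MyStage.java_class_path()
```

The point of the override is that a wrapper outside the pyspark package has no `pyspark` prefix to substitute, so it needs a way to name its Java counterpart explicitly.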


> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers 
> for persistence
> --
>
> Key: SPARK-24632
> URL: https://issues.apache.org/jira/browse/SPARK-24632
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 2.4.0
> Reporter: Joseph K. Bradley
> Assignee: Joseph K. Bradley
> Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement 
> Python PipelineStages in 3rd-party libraries, include them in Pipelines, and 
> use Pipeline persistence.  This task is to make it easier for 3rd-party 
> libraries to have PipelineStages written in Java and then to use pyspark.ml 
> abstractions to create wrappers around those Java classes.  This is currently 
> possible, except that users hit bugs around persistence.
> I spent a bit of time thinking about this and wrote up thoughts and a proposal in the 
> doc linked below.  Summary of proposal:
> Require that 3rd-party libraries with Java classes with Python wrappers 
> implement a trait which provides the corresponding Python classpath in some 
> field:
> {code}
> trait PythonWrappable {
>   def pythonClassPath: String = …
> }
> class MyJavaType extends PythonWrappable
> {code}
> This will not be required for MLlib wrappers, which we can handle specially.
> One issue for this task will be that we may have trouble writing unit tests.  
> They would ideally test a Java class + Python wrapper class pair sitting 
> outside of pyspark.






[jira] [Updated] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence

2018-06-22 Thread Joseph K. Bradley (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-24632:
--
Description: 
This is a follow-up for [SPARK-17025], which allowed users to implement Python 
PipelineStages in 3rd-party libraries, include them in Pipelines, and use 
Pipeline persistence.  This task is to make it easier for 3rd-party libraries 
to have PipelineStages written in Java and then to use pyspark.ml abstractions 
to create wrappers around those Java classes.  This is currently possible, 
except that users hit bugs around persistence.

Some fixes we'll need include:
* an overridable method for converting between Python and Java classpaths. See 
https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284
* https://github.com/apache/spark/blob/4e7d8678a3d9b12797d07f5497e0ed9e471428dd/python/pyspark/ml/pipeline.py#L378

One unusual thing for this task will be to write unit tests which test a custom 
PipelineStage written outside of the pyspark package.

  was:
This is a follow-up for [SPARK-17025], which allowed users to implement Python 
PipelineStages in 3rd-party libraries, include them in Pipelines, and use 
Pipeline persistence.  This task is to make it easier for 3rd-party libraries 
to have PipelineStages written in Java and then to use pyspark.ml abstractions 
to create wrappers around those Java classes.  This is currently possible, 
except that users hit bugs around persistence.

One fix we'll need is an overridable method for converting between Python and 
Java classpaths. See 
https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284

One unusual thing for this task will be to write unit tests which test a custom 
PipelineStage written outside of the pyspark package.


> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers 
> for persistence
> --
>
> Key: SPARK-24632
> URL: https://issues.apache.org/jira/browse/SPARK-24632
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 2.4.0
> Reporter: Joseph K. Bradley
> Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement 
> Python PipelineStages in 3rd-party libraries, include them in Pipelines, and 
> use Pipeline persistence.  This task is to make it easier for 3rd-party 
> libraries to have PipelineStages written in Java and then to use pyspark.ml 
> abstractions to create wrappers around those Java classes.  This is currently 
> possible, except that users hit bugs around persistence.
> Some fixes we'll need include:
> * an overridable method for converting between Python and Java classpaths. 
> See 
> https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284
> * https://github.com/apache/spark/blob/4e7d8678a3d9b12797d07f5497e0ed9e471428dd/python/pyspark/ml/pipeline.py#L378
> One unusual thing for this task will be to write unit tests which test a 
> custom PipelineStage written outside of the pyspark package.


