[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544703#comment-16544703 ]
Bryan Cutler commented on SPARK-24632:
--------------------------------------

Hi [~josephkb], would you mind clarifying why there needs to be an additional trait in Scala pointing to Python class paths, instead of a way to override the line
{code:java}
stage_name = java_stage.getClass().getName().replace("org.apache.spark", "pyspark")
{code}
in wrapper.py? Ideally the Scala classes should not be aware of the Python side, and when loading, the Python estimators/models should be able to create the Java object and wrap it, as long as the line above produces the correct class prefix. Thanks!

> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24632
>                 URL: https://issues.apache.org/jira/browse/SPARK-24632
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Major
>
> This is a follow-up to [SPARK-17025], which allowed users to implement Python PipelineStages in 3rd-party libraries, include them in Pipelines, and use Pipeline persistence. This task is to make it easier for 3rd-party libraries to have PipelineStages written in Java and then to use pyspark.ml abstractions to create wrappers around those Java classes. This is currently possible, except that users hit bugs around persistence.
> I spent a bit of time thinking about this and wrote up thoughts and a proposal in the doc linked below. Summary of the proposal:
> Require that 3rd-party libraries whose Java classes have Python wrappers implement a trait which provides the corresponding Python classpath in some field:
> {code}
> trait PythonWrappable {
>   def pythonClassPath: String = …
> }
> MyJavaType extends PythonWrappable
> {code}
> This will not be required for MLlib wrappers, which we can handle specially.
> One issue for this task will be that we may have trouble writing unit tests. They would ideally test a Java class + Python wrapper class pair sitting outside of pyspark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
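To illustrate the alternative Bryan raises, here is a minimal sketch of the class-name mapping in question. This is not the actual wrapper.py code: the helper names (`java_to_python_class_name`, `CLASS_PREFIX_OVERRIDES`) and the override registry are hypothetical, introduced only to show how a 3rd-party library could supply its own Java-to-Python prefix mapping without any Scala-side trait.

```python
# Default mapping used by pyspark.ml's wrapper.py: swap the Java
# package prefix for the Python one in the fully qualified class name.
def default_mapping(java_class_name):
    return java_class_name.replace("org.apache.spark", "pyspark")

# Hypothetical registry (not in Spark) letting a 3rd-party library
# declare its own Java-package -> Python-module prefix.
CLASS_PREFIX_OVERRIDES = {
    "com.example.mllibext": "example_ml",
}

def java_to_python_class_name(java_class_name):
    # Check 3rd-party overrides first; fall back to the MLlib default.
    for java_prefix, py_prefix in CLASS_PREFIX_OVERRIDES.items():
        if java_class_name.startswith(java_prefix):
            return java_class_name.replace(java_prefix, py_prefix, 1)
    return default_mapping(java_class_name)
```

Under this scheme, loading `com.example.mllibext.MyTransformer` would resolve to the Python class `example_ml.MyTransformer`, while built-in Spark classes keep the existing `org.apache.spark` -> `pyspark` substitution.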