Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11266#issuecomment-194540106
  
    My thoughts on the pros/cons of having Python's Pipeline be a wrapper for 
the Java Pipeline:
    
    Pros:
    * Less code duplication.  This would have an even higher impact for 
CrossValidator, which is annoying to implement twice.
    Cons:
    * This would break the code of anyone who has written a PipelineStage in 
Python.  We have not explicitly supported this so far, but I think it's 
something we should support eventually.
    
    I'd propose:
    * Short-term (for 2.0): We do not make Pipeline into a Java wrapper.  But 
we implement save/load by transferring the stages to Java (as you did in this 
PR).
    * Long-term: We can eventually consider better ways to support Python users 
who wish to write their own PipelineStages in Python.  If useful for combining 
code paths, we could also consider making Pipeline into a Java wrapper, as long 
as we come up with a good way to have a Java wrapper for a PipelineStage 
defined in Python (like a Python UDF).
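
    To make the short-term option concrete, here is a rough, Spark-free sketch. 
All class and method names (JavaBackedStage, PythonOnlyStage, to_java, 
from_java, dumps, loads) are hypothetical stand-ins, not the PySpark API, and a 
plain dict plays the role of the JVM object. Pipeline stays a Python class; 
only persistence transfers the stages to the "Java" side, so a stage written 
purely in Python still works in the pipeline but cannot be saved yet.

```python
import json


class JavaBackedStage:
    """Hypothetical stage that has a JVM counterpart it can be transferred to."""

    def __init__(self, name, params):
        self.name = name
        self.params = params

    def to_java(self):
        # Stand-in for handing the stage to the JVM: a plain dict in this sketch.
        return {"class": self.name, "params": self.params}

    @classmethod
    def from_java(cls, java_stage):
        # Rebuild the Python-side stage from its "Java" representation.
        return cls(java_stage["class"], java_stage["params"])


class PythonOnlyStage:
    """Hypothetical user-defined stage written purely in Python (no JVM twin)."""

    def __init__(self, func):
        self.func = func


class Pipeline:
    """Remains a Python class; only save/load goes through the 'Java' side."""

    def __init__(self, stages):
        self.stages = stages

    def dumps(self):
        java_stages = []
        for stage in self.stages:
            if not hasattr(stage, "to_java"):
                # A pure-Python stage cannot be transferred, so saving fails,
                # but the pipeline itself can still hold and use the stage.
                raise ValueError(
                    "cannot save stage without a Java counterpart: %r" % stage
                )
            java_stages.append(stage.to_java())
        return json.dumps(java_stages)

    @classmethod
    def loads(cls, text):
        return cls([JavaBackedStage.from_java(s) for s in json.loads(text)])


pipeline = Pipeline([JavaBackedStage("Tokenizer", {"inputCol": "text"})])
restored = Pipeline.loads(pipeline.dumps())
```

    Making Pipeline itself a Java wrapper would move the failure from save time 
to construction time for any PythonOnlyStage, which is the "con" above.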



