GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/3637#issuecomment-66368865
  
    @srowen  @Lewuathe  Continuing the above inline discussion...
    
    Question: Should the typed interface be public?
    
    New proposal: Hide the typed interface of Estimators.  Leave the typed 
interface of Transformers exposed.
    
    Argument:
    * The typed interface loses metadata which SchemaRDD can (but does not yet) 
store.
      * E.g., for Classifiers, it is good to know the number of classes to 
predict, which features are categorical, and the number of categories for each 
categorical feature.  The current typed train() methods do not have this info.  
To pass it in, we would need either (a) extra parameters in train(), which 
would give Classifiers a different signature from other Estimators' train() 
methods, or (b) extra embedded parameters in Classifiers, which would be 
ignored when using the fit(SchemaRDD) interface.  Neither option sounds good to 
me.
      * We could use a typed interface with stronger typing for features, but 
that would still not cover metadata like the number of classes or categories.
      * This metadata is important for training, but it is not important for 
testing.  We would just need to make sure that Vectors passed to predict() 
methods have the same feature order as was used for training.
    * I would guess the typed interface would be most useful for Models.  This 
is based on my assumptions that:
      * Models will be kept for longer and might have predict() methods called 
multiple times, including on individual instances, and
      * Models might need typed APIs for efficiency if used in production.
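
To make the proposal concrete, here is a rough sketch of the split in plain 
Scala.  Every name in it (FeatureMetadata, LabeledRow, Dataset, MajorityModel, 
MajorityClassifier) is a hypothetical stand-in made up for illustration, not 
the API in this PR; Dataset plays the role of a SchemaRDD that already stores 
metadata, which the real one does not do yet.

```scala
// Hypothetical stand-ins only; none of these are the actual spark.ml types.

// Metadata that a schema-carrying dataset could store next to the rows.
final case class FeatureMetadata(
    numClasses: Int,
    categoricalArity: Map[Int, Int]) // feature index -> number of categories

final case class LabeledRow(label: Double, features: Seq[Double])

// Stand-in for a SchemaRDD that keeps metadata together with the data.
final case class Dataset(rows: Seq[LabeledRow], metadata: FeatureMetadata)

// Model (Transformer) side: the typed predict() stays public.
trait Model {
  def predict(features: Seq[Double]): Double
}

// Estimator side: only fit(dataset) is public; the typed train() is hidden,
// so metadata always arrives with the data and every Estimator keeps the
// same public signature.
trait Estimator[M <: Model] {
  def fit(dataset: Dataset): M = train(dataset.rows, dataset.metadata)
  protected def train(rows: Seq[LabeledRow], metadata: FeatureMetadata): M
}

// Toy model: always predicts the most frequent training label.
final class MajorityModel(val majority: Double, val numClasses: Int) extends Model {
  override def predict(features: Seq[Double]): Double = majority
}

final class MajorityClassifier extends Estimator[MajorityModel] {
  override protected def train(
      rows: Seq[LabeledRow],
      metadata: FeatureMetadata): MajorityModel = {
    require(rows.nonEmpty, "training data must not be empty")
    val majority = rows.groupBy(_.label).maxBy(_._2.size)._1
    new MajorityModel(majority, metadata.numClasses)
  }
}
```

With this shape, callers train through fit(dataset) and then call the typed 
predict() on the returned Model as often as needed, including on single 
instances.  The caller is still responsible for passing features in the same 
order as was used for training.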
    
    What do you think?

