[GitHub] [spark] zero323 commented on issue #27241: [SPARK-30533][ML][PYSPARK] Add classes to represent Java Regressors and RegressionModels

2020-01-17 Thread GitBox
zero323 commented on issue #27241: [SPARK-30533][ML][PYSPARK] Add classes to 
represent Java Regressors and RegressionModels
URL: https://github.com/apache/spark/pull/27241#issuecomment-575855169
 
 
   Thanks @huaxingao @srowen @zhengruifeng 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zero323 commented on issue #27241: [SPARK-30533][ML][PYSPARK] Add classes to represent Java Regressors and RegressionModels

2020-01-17 Thread GitBox
zero323 commented on issue #27241: [SPARK-30533][ML][PYSPARK] Add classes to 
represent Java Regressors and RegressionModels
URL: https://github.com/apache/spark/pull/27241#issuecomment-575582976
 
 
   > The Regressor class is basically empty and I am not sure if we should add 
another layer of abstraction, so I chose not to add Regressor/Regressor class 
on python side when I did #27168. But I know you can argue that we need change 
python too to keep the parity between scala and python. I am OK either way.
   
   I am actually less interested in parity (as it is right now, it provides 
little or no value to the end user, inflates the `pyspark.ml` codebase, and 
actually increases the effort required to maintain the whole thing) than in 
practical value. As I argued in the discussion around 
https://github.com/apache/spark/pull/25776#issuecomment-533488999, the ability 
to distinguish between types of predictors is fundamental for building complex 
ML workflows, and the current API is not sufficient for that (`Classifiers` and 
non-classifier `Predictors` usually have the same API and produce identical 
output schemas).
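   
   A minimal sketch of why even an empty class carries practical value. All 
class and function names below are hypothetical stand-ins written in plain 
Python; they are assumptions for illustration, not the actual `pyspark.ml` 
classes or the ones added in this PR. The marker classes add no methods, yet 
they let calling code tell predictor kinds apart:

```python
from abc import ABC


class Predictor(ABC):
    """Hypothetical stand-in for a generic predictor base class."""


class Regressor(Predictor):
    """Marker class: adds no methods, only type information."""


class Classifier(Predictor):
    """Marker class: adds no methods, only type information."""


class MyLinearRegression(Regressor):
    """Hypothetical concrete regressor."""


class MyLogisticRegression(Classifier):
    """Hypothetical concrete classifier."""


def split_by_kind(stages):
    """Group stages by predictor kind using isinstance checks."""
    groups = {"regressors": [], "classifiers": [], "other": []}
    for stage in stages:
        if isinstance(stage, Regressor):
            groups["regressors"].append(stage)
        elif isinstance(stage, Classifier):
            groups["classifiers"].append(stage)
        else:
            groups["other"].append(stage)
    return groups
```

   Without the marker classes, both concrete types would only be 
distinguishable as `Predictor` instances, so a helper like `split_by_kind` 
would have to fall back on name matching or other heuristics.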
   
   > the Regressor class is basically empty
   
   I am afraid that is, for better or worse, an argument you can make against 
a significant chunk of the API.
   





[GitHub] [spark] zero323 commented on issue #27241: [SPARK-30533][ML][PYSPARK] Add classes to represent Java Regressors and RegressionModels

2020-01-16 Thread GitBox
zero323 commented on issue #27241: [SPARK-30533][ML][PYSPARK] Add classes to 
represent Java Regressors and RegressionModels
URL: https://github.com/apache/spark/pull/27241#issuecomment-575426906
 
 
   > Is this more for symmetry than anything else? I don't have a strong 
opinion. I'd defer to @huaxingao and @zhengruifeng who have been watching this 
much more closely.
   
   To some extent. In general, it is useful to be able to distinguish between 
`Regressors`, `Classifiers` and other types of `Params` (more about this 
[here](https://issues.apache.org/jira/browse/SPARK-29212)). Additionally, we 
cannot really get Scala parity without these.

