[GitHub] [spark] zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON

2019-09-22 Thread GitBox
URL: https://github.com/apache/spark/pull/25776#issuecomment-533975864
 
 
   @zero323 
   In my opinion, this PR will not directly affect your own implementation.
   As for your proposal, I personally think it is reasonable, although 
PySpark ML is mostly there to wrap the Scala side. We may open another 
ticket for it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON

2019-09-19 Thread GitBox
URL: https://github.com/apache/spark/pull/25776#issuecomment-533393082
 
 
   @zero323 Hi, how do the common classes newly added in this PR affect end 
users who implement their own hierarchy? Could you please provide a use case?





[GitHub] [spark] zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON

2019-09-18 Thread GitBox
URL: https://github.com/apache/spark/pull/25776#issuecomment-532566063
 
 
   @srowen Most models in PySpark do not have any setters/getters (one 
exception is OneVsRest), and no model has a prediction function.
   
   A main complaint about PySpark ML that I have heard from the users of JD's 
big data platform is that they cannot set the input/output column names of 
models. It is inconvenient to rename columns to avoid column conflicts.
   Suppose we deal with a classification task in an interactive mode (like 
Jupyter). We have trained some classification models with default column 
names, we evaluate them one by one, and then want to ensemble some of the good 
models. Now we must rename the `predictionCol` of some models' output after 
transformation, since all models use the same column name. Otherwise, we need 
to re-train them with modified column names. Similar cases arise easily when we 
deal with a DataFrame with tens of columns and try several algorithms. So we 
want column setters like those on the Scala side.
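
   A toy sketch of the conflict described above, in plain Python rather than Spark. `ToyModel`, `set_prediction_col`, and the `score` field are hypothetical stand-ins invented for illustration; they are not PySpark classes or methods, and the real error raised by Spark may differ.

```python
class ToyModel:
    """Hypothetical stand-in for a fitted classifier; not a PySpark class."""

    def __init__(self, threshold, prediction_col="prediction"):
        self.threshold = threshold
        self.prediction_col = prediction_col

    def set_prediction_col(self, value):
        # Setter on the fitted model: no re-training needed to change the name.
        self.prediction_col = value
        return self

    def transform(self, rows):
        # Append one output column per row, refusing to overwrite an existing one.
        out = []
        for row in rows:
            if self.prediction_col in row:
                raise ValueError(f"Column {self.prediction_col} already exists.")
            new_row = dict(row)
            new_row[self.prediction_col] = 1.0 if row["score"] >= self.threshold else 0.0
            out.append(new_row)
        return out


rows = [{"score": 0.2}, {"score": 0.6}, {"score": 0.9}]
model_a = ToyModel(threshold=0.5)
model_b = ToyModel(threshold=0.8)

# Both models were "trained" with the default output column name.
scored = model_a.transform(rows)

# Chaining the second model on the first model's output hits the name clash.
try:
    model_b.transform(scored)
    conflict = False
except ValueError:
    conflict = True

# With a setter on the model, the clash is resolved without renaming columns
# in the data or re-training.
model_b.set_prediction_col("prediction_b")
ensemble_input = model_b.transform(scored)
```

   Without the setter, the only options in this sketch would be to rename the column in `scored` before calling `model_b.transform`, or to rebuild `model_b` with a different column name, which mirrors the two workarounds mentioned above.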
   
   The goal is to keep the Python side in sync with the Scala side. It has two 
benefits: 1. the codebase will be easier to maintain, since changes on the Scala 
side are easy to sync to the Python side; 2. function parity, since methods like 
the models' getters are still missing on the Python side.
   I have tried to divide the goal into several subtasks in 
[SPARK-28958](https://issues.apache.org/jira/browse/SPARK-28958); after this PR 
we need to resolve the others.

