[GitHub] [spark] zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
URL: https://github.com/apache/spark/pull/25776#issuecomment-533975864

@zero323 In my opinion, this PR will not directly affect your own implementation. As for your proposal, I personally think it is reasonable, although PySpark ML exists mostly to wrap the Scala side. We may open another ticket for it.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
URL: https://github.com/apache/spark/pull/25776#issuecomment-533393082

@zero323 Hi, how do the common classes newly added in this PR affect end users who implement their own hierarchy? Could you please provide a use case?
[GitHub] [spark] zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
URL: https://github.com/apache/spark/pull/25776#issuecomment-532566063

@srowen Most models in PySpark do not have any setters or getters (one exception is OneVsRest), and no model has a prediction function.

A main complaint about PySpark ML that I have heard from users of JD's big data platform is that they cannot set the input/output column names of models. It is inconvenient to rename columns just to avoid column conflicts.

Suppose we deal with a classification task in an interactive mode (like Jupyter). We have trained some classification models with the default column names, we evaluate them one by one, and then we want to ensemble some of the good models. Now we must rename the `predictionCol` output of some models after transformation, since all models use the same column name. Otherwise, we need to re-train them with modified column names. Similar cases arise easily when we deal with a DataFrame with tens of columns and try several algorithms. So we want column setters like those on the Scala side.

The goal is to bring the Python side in sync with the Scala side. It has two benefits:

1. The codebase becomes easier to maintain: when we change the Scala side, it is easy to sync the change to the Python side.
2. Function parity: methods such as the models' getters are still missing on the Python side.

I have tried to divide the goal into several subtasks in [SPARK-28958](https://issues.apache.org/jira/browse/SPARK-28958); after this PR we need to resolve the others.
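The column conflict described in the comment above can be illustrated with a small pure-Python sketch. It does not use Spark: `ToyModel` and its `setPredictionCol` method are hypothetical stand-ins for a fitted classifier and a Scala-style column setter, rows are plain dicts, and raising on a duplicate output column is an assumption of the sketch rather than a statement about PySpark's exact behaviour.

```python
class ToyModel:
    """Hypothetical stand-in for a fitted classifier; not a PySpark class."""

    def __init__(self, threshold, predictionCol="prediction"):
        self.threshold = threshold
        self.predictionCol = predictionCol

    def setPredictionCol(self, name):
        # Scala-style setter on the fitted model: no re-training needed.
        self.predictionCol = name
        return self

    def transform(self, rows):
        out = []
        for row in rows:
            # Assumption of this sketch: an existing output column is an error.
            if self.predictionCol in row:
                raise ValueError(
                    "column %r already exists" % self.predictionCol
                )
            new_row = dict(row)
            new_row[self.predictionCol] = (
                1.0 if row["feature"] > self.threshold else 0.0
            )
            out.append(new_row)
        return out


rows = [{"feature": 0.2}, {"feature": 0.7}]
model_a = ToyModel(threshold=0.5)
model_b = ToyModel(threshold=0.1)

# Both models write to the default "prediction" column, so chaining them clashes.
try:
    model_b.transform(model_a.transform(rows))
    clash = False
except ValueError:
    clash = True

# With a setter on the model, each output gets its own column.
model_a.setPredictionCol("pred_a")
model_b.setPredictionCol("pred_b")
combined = model_b.transform(model_a.transform(rows))
```

Without the setter, the only ways out in this toy are to rename the column between the two `transform` calls or to rebuild each model with a different column name, which mirrors the two workarounds mentioned in the comment.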