[jira] [Assigned] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps
[ https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33592:

    Assignee: Weichen Xu  (was: Apache Spark)

> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -----------------------------------------------------------------
>
>                 Key: SPARK-33592
>                 URL: https://issues.apache.org/jira/browse/SPARK-33592
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Weichen Xu
>            Assignee: Weichen Xu
>            Priority: Major
>
> Two typical cases to reproduce it:
>
> (1)
> {code:python}
> from pyspark.ml import Pipeline
> from pyspark.ml.classification import LogisticRegression
> from pyspark.ml.evaluation import MulticlassClassificationEvaluator
> from pyspark.ml.feature import HashingTF, Tokenizer
> from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
>
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> # The grid tunes params that belong to stages nested inside the pipeline.
> paramGrid = ParamGridBuilder() \
>     .addGrid(hashingTF.numFeatures, [10, 100]) \
>     .addGrid(lr.maxIter, [100, 200]) \
>     .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>                            estimatorParamMaps=paramGrid,
>                            evaluator=MulticlassClassificationEvaluator())
> # tvsPath is any writable path; the report does not specify one.
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Checking `loadedTvs.getEstimatorParamMaps()` then shows that the tuning params
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
>
> (2)
> {code:python}
> from pyspark.ml.classification import LogisticRegression, OneVsRest
> from pyspark.ml.evaluation import MulticlassClassificationEvaluator
> from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit
>
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> # The grid tunes a param of the classifier wrapped by OneVsRest.
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid,
>                            evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Checking `loadedTvs.getEstimatorParamMaps()` then shows that the tuning
> param `lr.maxIter` is lost.
>
> Both CrossValidator and TrainValidationSplit in PySpark have this issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
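Both reproductions end with a manual inspection of `loadedTvs.getEstimatorParamMaps()`. A minimal sketch of how that check might be automated follows. The helper names `normalize_param_maps` and `lost_params` are hypothetical and not part of PySpark; the sketch only assumes that each key in a param map exposes `parent` (the owning stage's uid) and `name` attributes, as `pyspark.ml.param.Param` does. A namedtuple stands in for `Param` so the snippet runs without a Spark session, and the uids shown are made up for illustration.

```python
from collections import namedtuple
from itertools import zip_longest


def normalize_param_maps(param_maps):
    """Turn a list of {param: value} dicts into comparable, order-stable tuples.

    Each entry becomes (owner uid, param name, repr(value)) so that maps can be
    compared across a save/load round trip, where Param objects are new instances.
    """
    return [
        tuple(sorted((p.parent, p.name, repr(v)) for p, v in pm.items()))
        for pm in param_maps
    ]


def lost_params(original_maps, loaded_maps):
    """For each grid point, list the entries present before saving but absent after loading."""
    before = normalize_param_maps(original_maps)
    after = normalize_param_maps(loaded_maps)
    return [
        sorted(set(b) - set(a))
        for b, a in zip_longest(before, after, fillvalue=())
    ]


# Stand-in for pyspark.ml.param.Param (assumption: only .parent and .name are needed).
FakeParam = namedtuple("FakeParam", ["parent", "name"])
num_features = FakeParam("HashingTF_1", "numFeatures")   # hypothetical uid
max_iter = FakeParam("LogisticRegression_1", "maxIter")  # hypothetical uid

original = [{num_features: 10, max_iter: 100}, {num_features: 100, max_iter: 200}]
loaded = [{}, {}]  # simulates a reload in which every tuning param was dropped

missing = lost_params(original, loaded)
for i, entries in enumerate(missing):
    print(i, entries)
```

With a real session, the same check would be `lost_params(tvs.getEstimatorParamMaps(), loadedTvs.getEstimatorParamMaps())`; any non-empty inner list indicates a grid point whose params did not survive the round trip.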
[jira] [Assigned] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps
Apache Spark reassigned SPARK-33592:

    Assignee: Apache Spark  (was: Weichen Xu)
[jira] [Assigned] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps
Weichen Xu reassigned SPARK-33592:

    Assignee: Weichen Xu