[jira] [Issue Comment Deleted] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-27621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anca Sarb updated SPARK-27621:
--
    Comment: was deleted

(was: I've created a PR with the fix here [https://github.com/apache/spark/pull/24509])

> Calling transform() method on a LinearRegressionModel throws NoSuchElementException
> ---
>
>                 Key: SPARK-27621
>                 URL: https://issues.apache.org/jira/browse/SPARK-27621
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2
>            Reporter: Anca Sarb
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When transform(...) method is called on a LinearRegressionModel created directly with the coefficients and intercepts, the following exception is encountered.
> {code:java}
> java.util.NoSuchElementException: Failed to find a default value for loss
>     at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
>     at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
>     at scala.Option.getOrElse(Option.scala:121)
>     at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
>     at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
>     at org.apache.spark.ml.param.Params$class.$(params.scala:786)
>     at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
>     at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
>     at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
>     at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
>     at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
>     at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
>     at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>     at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>     at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
>     at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
>     at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
>     at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
> {code}
> This is because validateAndTransformSchema() is called both during training and scoring phases, but the checks against the training related params like loss should really be performed during training phase only, I think, please correct me if I'm missing anything.
> This issue was first reported for mleap ([combust/mleap#455|https://github.com/combust/mleap/issues/455]) because basically when we serialize the Spark transformers for mleap, we only serialize the params that are relevant for scoring. We do have the option to de-serialize the serialized transformers back into Spark for scoring again, but in that case, we no longer have all the training params.
> Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-27621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anca Sarb updated SPARK-27621:
--
    Description:
When transform(...) method is called on a LinearRegressionModel created directly with the coefficients and intercepts, the following exception is encountered.
{code:java}
java.util.NoSuchElementException: Failed to find a default value for loss
    at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
    at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
    at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
    at org.apache.spark.ml.param.Params$class.$(params.scala:786)
    at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
    at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
    at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
    at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
    at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
    at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
    at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
{code}
This is because validateAndTransformSchema() is called both during training and scoring phases, but the checks against the training related params like loss should really be performed during training phase only, I think, please correct me if I'm missing anything.

This issue was first reported for mleap ([combust/mleap#455|https://github.com/combust/mleap/issues/455]) because basically when we serialize the Spark transformers for mleap, we only serialize the params that are relevant for scoring. We do have the option to de-serialize the serialized transformers back into Spark for scoring again, but in that case, we no longer have all the training params.

Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]

  was:
When transform(...) method is called on a LinearRegressionModel created directly with the coefficients and intercepts, the following exception is encountered.
{code:java}
java.util.NoSuchElementException: Failed to find a default value for loss
    at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
    at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
    at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
    at org.apache.spark.ml.param.Params$class.$(params.scala:786)
    at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
    at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
    at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
    at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
    at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
    at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
    at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
{code}
This is because validateAndTransformSchema() is called both during training and scoring phases, but the checks against the training related params like loss should really be performed during training phase only, I think, please correct me if I'm missing anything :)

This issue was first reported for mleap ([combust/mleap#455|https://github.com/combust/mleap/issues/455]) because basically when we serialize the Spark transformers for mleap, we only serialize the params that are relevant for scoring. We do have the option to de-serialize the serialized transformers back into Spark for scoring again, but in that case, we no longer have all the
[jira] [Commented] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-27621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831499#comment-16831499 ]

Anca Sarb commented on SPARK-27621:
---

I've created a PR with the fix here [https://github.com/apache/spark/pull/24509]
[jira] [Created] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException
Anca Sarb created SPARK-27621:
-

             Summary: Calling transform() method on a LinearRegressionModel throws NoSuchElementException
                 Key: SPARK-27621
                 URL: https://issues.apache.org/jira/browse/SPARK-27621
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.4.2, 2.4.1, 2.4.0, 2.3.3, 2.3.2, 2.3.1, 2.3.0, 2.3.4
            Reporter: Anca Sarb

When transform(...) method is called on a LinearRegressionModel created directly with the coefficients and intercepts, the following exception is encountered.
{code:java}
java.util.NoSuchElementException: Failed to find a default value for loss
    at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
    at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
    at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
    at org.apache.spark.ml.param.Params$class.$(params.scala:786)
    at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
    at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
    at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
    at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
    at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
    at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
    at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
    at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
    at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
    at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
    at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
{code}
This is because validateAndTransformSchema() is called both during training and scoring phases, but the checks against the training related params like loss should really be performed during training phase only, I think, please correct me if I'm missing anything :)

This issue was first reported for mleap ([combust/mleap#455|https://github.com/combust/mleap/issues/455]) because basically when we serialize the Spark transformers for mleap, we only serialize the params that are relevant for scoring. We do have the option to de-serialize the serialized transformers back into Spark for scoring again, but in that case, we no longer have all the training params.

Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]
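
The failure mode described in this issue can be illustrated without Spark. What follows is only a minimal sketch: SketchModel, trained(), fromCoefficients(), validateAlways() and validate(boolean) are hypothetical names made up for illustration and are not Spark classes or methods. The sketch assumes a param store that throws when a param has no default, which is what the stack trace shows for loss, and contrasts an unconditional check with one that only runs while fitting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a model's param store; none of these names are Spark's. */
final class SketchModel {
    private final Map<String, String> defaults = new HashMap<>();

    /** Model produced by training: training-related params carry defaults. */
    static SketchModel trained() {
        SketchModel m = new SketchModel();
        m.defaults.put("loss", "squaredError");
        return m;
    }

    /** Model built directly from coefficients and intercept: no training params. */
    static SketchModel fromCoefficients() {
        return new SketchModel();
    }

    /** Return the default for a param, or throw as in the reported stack trace. */
    String getOrDefault(String name) {
        String value = defaults.get(name);
        if (value == null) {
            throw new NoSuchElementException("Failed to find a default value for " + name);
        }
        return value;
    }

    /** Reported behaviour: the loss check runs for training and scoring alike. */
    void validateAlways() {
        getOrDefault("loss");
    }

    /** Suggested direction: only read training-related params while fitting. */
    void validate(boolean fitting) {
        if (fitting) {
            getOrDefault("loss");
        }
    }

    public static void main(String[] args) {
        SketchModel scoringOnly = SketchModel.fromCoefficients();
        try {
            scoringOnly.validateAlways();
        } catch (NoSuchElementException e) {
            System.out.println("unconditional check: " + e.getMessage());
        }
        scoringOnly.validate(false); // does not touch "loss", so it does not throw
        System.out.println("guarded check: scoring passes");
    }
}
```

In this sketch a model built from coefficients fails the unconditional check but passes the guarded one when it is not fitting, while a trained model passes both. Whether the linked PR takes this exact shape is not claimed here.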