[jira] [Issue Comment Deleted] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException

2019-05-02 Thread Anca Sarb (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-27621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anca Sarb updated SPARK-27621:
--
Comment: was deleted

(was: I've created a PR with the fix here 
[https://github.com/apache/spark/pull/24509])

> Calling transform() method on a LinearRegressionModel throws 
> NoSuchElementException
> ---
>
> Key: SPARK-27621
> URL: https://issues.apache.org/jira/browse/SPARK-27621
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2
>Reporter: Anca Sarb
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When the transform(...) method is called on a LinearRegressionModel 
> constructed directly from coefficients and an intercept, the following 
> exception is thrown.
> {code:java}
> java.util.NoSuchElementException: Failed to find a default value for loss
>   at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
>   at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
>   at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
>   at org.apache.spark.ml.param.Params$class.$(params.scala:786)
>   at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
>   at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
>   at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
>   at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
>   at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
>   at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
>   at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>   at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>   at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
>   at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
>   at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
> {code}
> This is because validateAndTransformSchema() is called during both the 
> training and scoring phases, but the checks against training-related params 
> like loss should, I think, be performed only during the training phase; 
> please correct me if I'm missing anything.
> This issue was first reported for mleap 
> ([combust/mleap#455|https://github.com/combust/mleap/issues/455]): when we 
> serialize Spark transformers for mleap, we serialize only the params that 
> are relevant for scoring. We do have the option to deserialize the 
> serialized transformers back into Spark for scoring again, but in that case 
> we no longer have all the training params.
> Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]
>  





[jira] [Updated] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException

2019-05-02 Thread Anca Sarb (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-27621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anca Sarb updated SPARK-27621:
--
Description: 
When the transform(...) method is called on a LinearRegressionModel 
constructed directly from coefficients and an intercept, the following 
exception is thrown.
{code:java}
java.util.NoSuchElementException: Failed to find a default value for loss
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
  at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
  at org.apache.spark.ml.param.Params$class.$(params.scala:786)
  at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
  at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
  at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
  at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
  at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
  at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
  at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
  at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
  at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
{code}
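For illustration, a minimal repro sketch (not the exact test from the PR): the 
LinearRegressionModel constructor is private[ml], so this assumes the snippet 
is compiled under the org.apache.spark.ml.regression package, and the uid and 
toy dataset are made up.
{code:scala}
package org.apache.spark.ml.regression

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object TransformRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()

    // Build the model directly from coefficients and an intercept, skipping
    // fit(), so training-only params such as `loss` are never set.
    val model = new LinearRegressionModel("repro-uid", Vectors.dense(1.0, 2.0), 0.5)

    val df = spark.createDataFrame(Seq(
      (1.0, Vectors.dense(0.1, 0.2))
    )).toDF("label", "features")

    // transformSchema() reads $(loss) and throws:
    // java.util.NoSuchElementException: Failed to find a default value for loss
    model.transform(df).show()
  }
}
{code}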
This is because validateAndTransformSchema() is called during both the training 
and scoring phases, but the checks against training-related params like loss 
should, I think, be performed only during the training phase; please correct me 
if I'm missing anything.
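As a hedged sketch of the direction such a fix could take (the actual change is 
in the PR linked below and may differ in detail): validateAndTransformSchema() 
already receives a `fitting` flag, so the training-only check can be guarded by 
it. This assumes the override sits inside LinearRegressionParams, where `loss`, 
`solver`, Huber, and Normal are in scope.
{code:scala}
// Sketch only: run the training-only check solely when fitting = true, so
// scoring a model built directly from coefficients never reads `loss`.
override protected def validateAndTransformSchema(
    schema: StructType,
    fitting: Boolean,
    featuresDataType: DataType): StructType = {
  if (fitting) {
    if ($(loss) == Huber && $(solver) == Normal) {
      throw new IllegalArgumentException("LinearRegression with huber loss " +
        "doesn't support normal solver, please change solver to auto or l-bfgs.")
    }
  }
  super.validateAndTransformSchema(schema, fitting, featuresDataType)
}
{code}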

This issue was first reported for mleap 
([combust/mleap#455|https://github.com/combust/mleap/issues/455]): when we 
serialize Spark transformers for mleap, we serialize only the params that are 
relevant for scoring. We do have the option to deserialize the serialized 
transformers back into Spark for scoring again, but in that case we no longer 
have all the training params.

Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]

 

  was:
When the transform(...) method is called on a LinearRegressionModel 
constructed directly from coefficients and an intercept, the following 
exception is thrown.
{code:java}
java.util.NoSuchElementException: Failed to find a default value for loss
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
  at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
  at org.apache.spark.ml.param.Params$class.$(params.scala:786)
  at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
  at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
  at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
  at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
  at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
  at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
  at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
  at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
  at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
{code}
This is because validateAndTransformSchema() is called during both the training 
and scoring phases, but the checks against training-related params like loss 
should, I think, be performed only during the training phase; please correct me 
if I'm missing anything :)

This issue was first reported for mleap 
([combust/mleap#455|https://github.com/combust/mleap/issues/455]): when we 
serialize Spark transformers for mleap, we serialize only the params that are 
relevant for scoring. We do have the option to deserialize the serialized 
transformers back into Spark for scoring again, but in that case we no longer 
have all the training params.

Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]

[jira] [Commented] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException

2019-05-02 Thread Anca Sarb (JIRA)


 [ https://issues.apache.org/jira/browse/SPARK-27621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831499#comment-16831499 ]

Anca Sarb commented on SPARK-27621:
---

I've created a PR with the fix here [https://github.com/apache/spark/pull/24509]

> Calling transform() method on a LinearRegressionModel throws 
> NoSuchElementException
> ---
>
> Key: SPARK-27621
> URL: https://issues.apache.org/jira/browse/SPARK-27621
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2
>Reporter: Anca Sarb
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When the transform(...) method is called on a LinearRegressionModel 
> constructed directly from coefficients and an intercept, the following 
> exception is thrown.
> {code:java}
> java.util.NoSuchElementException: Failed to find a default value for loss
>   at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
>   at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
>   at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
>   at org.apache.spark.ml.param.Params$class.$(params.scala:786)
>   at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
>   at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
>   at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
>   at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
>   at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
>   at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
>   at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>   at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>   at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
>   at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
>   at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
> {code}
> This is because validateAndTransformSchema() is called during both the 
> training and scoring phases, but the checks against training-related params 
> like loss should, I think, be performed only during the training phase; 
> please correct me if I'm missing anything :)
> This issue was first reported for mleap 
> ([combust/mleap#455|https://github.com/combust/mleap/issues/455]): when we 
> serialize Spark transformers for mleap, we serialize only the params that 
> are relevant for scoring. We do have the option to deserialize the 
> serialized transformers back into Spark for scoring again, but in that case 
> we no longer have all the training params.
> Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]
>  





[jira] [Created] (SPARK-27621) Calling transform() method on a LinearRegressionModel throws NoSuchElementException

2019-05-02 Thread Anca Sarb (JIRA)
Anca Sarb created SPARK-27621:
-

 Summary: Calling transform() method on a LinearRegressionModel 
throws NoSuchElementException
 Key: SPARK-27621
 URL: https://issues.apache.org/jira/browse/SPARK-27621
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2
Reporter: Anca Sarb


When the transform(...) method is called on a LinearRegressionModel 
constructed directly from coefficients and an intercept, the following 
exception is thrown.
{code:java}
java.util.NoSuchElementException: Failed to find a default value for loss
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:780)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:779)
  at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
  at org.apache.spark.ml.param.Params$class.$(params.scala:786)
  at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
  at org.apache.spark.ml.regression.LinearRegressionParams$class.validateAndTransformSchema(LinearRegression.scala:111)
  at org.apache.spark.ml.regression.LinearRegressionModel.validateAndTransformSchema(LinearRegression.scala:637)
  at org.apache.spark.ml.PredictionModel.transformSchema(Predictor.scala:192)
  at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
  at org.apache.spark.ml.PipelineModel$$anonfun$transformSchema$5.apply(Pipeline.scala:311)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
  at org.apache.spark.ml.PipelineModel.transformSchema(Pipeline.scala:311)
  at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
  at org.apache.spark.ml.PipelineModel.transform(Pipeline.scala:305)
{code}
This is because validateAndTransformSchema() is called during both the training 
and scoring phases, but the checks against training-related params like loss 
should, I think, be performed only during the training phase; please correct me 
if I'm missing anything :)

This issue was first reported for mleap 
([combust/mleap#455|https://github.com/combust/mleap/issues/455]): when we 
serialize Spark transformers for mleap, we serialize only the params that are 
relevant for scoring. We do have the option to deserialize the serialized 
transformers back into Spark for scoring again, but in that case we no longer 
have all the training params.

Test to reproduce in PR: [https://github.com/apache/spark/pull/24509]

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org