[jira] [Commented] (SPARK-14246) vars not updated after Scala script reload

2016-03-29 Thread Jim Powers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216734#comment-15216734
 ] 

Jim Powers commented on SPARK-14246:


It appears that this is a Scala problem and not a Spark one.

{noformat}
scala -Yrepl-class-based
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74).
Type in expressions for evaluation. Or try :help.

scala> :load Fail.scala
Loading Fail.scala...
X: Serializable{def getArray(n: Int): Array[Double]; def multiplySum(x: 
Double,v: Seq[Double]): Double} = $anon$1@5ddf0d24

scala> val a = X.getArray(100)
warning: there was one feature warning; re-run with -feature for details
a: Array[Double] = Array(0.6445967025789217, 0.8356433165638456, 
0.9050186287112574, 0.06344554850936357, 0.008363070900988756, 
0.6593626537474886, 0.7424265307039932, 0.5035629973234215, 0.5510831160354674, 
0.37366654438205593, 0.33299751842582703, 0.3800633883472283, 
0.4153963387304084, 0.2752468331316783, 0.8699452196820426, 
0.31938530945559984, 0.7990568815957724, 0.6875841747139724, 
0.31949965197609675, 0.026911873556428878, 0.2616536698127736, 
0.5580118021155783, 0.28345994848845435, 0.1773433165304532, 
0.2549417030032525, 0.9777692443616465, 0.6296846343712603, 0.8589339648876033, 
0.7020098253707141, 0.8518829567943531, 0.41154622619731374, 
0.1075129613308311, 0.8499252434316056, 0.3841876768177086, 0.137415801614582, 
0.27030938222499756, 0.5511585560527115, 0.26252884257087217, ...
scala> X = null
X: Serializable{def getArray(n: Int): Array[Double]; def multiplySum(x: 
Double,v: Seq[Double]): Double} = null

scala> :load Fail.scala
Loading Fail.scala...
X: Serializable{def getArray(n: Int): Array[Double]; def multiplySum(x: 
Double,v: Seq[Double]): Double} = $anon$1@52c8295b

scala> X
res0: Serializable{def getArray(n: Int): Array[Double]; def multiplySum(x: 
Double,v: Seq[Double]): Double} = null
{noformat}

So it appears that any anonymous class with two or more members may exhibit this 
problem in the presence of {{-Yrepl-class-based}}.
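
For reference, here is a minimal sketch of what Fail.scala plausibly contains, 
inferred from the structural type the REPL prints above; the attachment itself 
is not reproduced in this thread, so the member bodies are assumptions:

{noformat}
// Hypothetical reconstruction of Fail.scala: a var bound to an anonymous
// Serializable subclass with two members. Calling members through the
// inferred structural type is a reflective call, which is what produces
// the feature warning seen in the transcript.
import scala.util.Random

var X = new Serializable {
  def getArray(n: Int): Array[Double] = Array.fill(n)(Random.nextDouble())
  def multiplySum(x: Double, v: Seq[Double]): Double = x * v.sum
}
{noformat}

Loading this twice, with {{X = null}} in between, reproduces the stale-null 
read shown in the transcript above.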

Closing this issue.

> vars not updated after Scala script reload
> --
>
> Key: SPARK-14246
> URL: https://issues.apache.org/jira/browse/SPARK-14246
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.6.0, 1.6.1, 2.0.0
>Reporter: Jim Powers
> Attachments: Fail.scala, Null.scala, reproduce_transient_npe.scala
>
>
> Attached are two scripts. The problem only exhibits itself with Spark 1.6.0, 
> 1.6.1, and 2.0.0 using Scala 2.11; Scala 2.10 does not exhibit this problem. 
> With the regular Scala 2.11(.7) REPL:
> {noformat}
> scala> :load reproduce_transient_npe.scala
> Loading reproduce_transient_npe.scala...
> X: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = 
> $anon$1@4149c063
> scala> X
> res0: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = 
> $anon$1@4149c063
> scala> val a = X.getArray(10)
> warning: there was one feature warning; re-run with -feature for details
> a: Array[Double] = Array(0.1701063617079236, 0.17570862034857437, 
> 0.6065851472098629, 0.4683069994589304, 0.35194859652378363, 
> 0.04033043823203897, 0.11917887149548367, 0.540367871104426, 
> 0.18544859881040276, 0.7236380062803334)
> scala> X = null
> X: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = null
> scala> :load reproduce_transient_npe.scala
> Loading reproduce_transient_npe.scala...
> X: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = 
> $anon$1@5860f3d7
> scala> X
> res1: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = 
> $anon$1@5860f3d7
> {noformat}
> However, from within the Spark shell (Spark 1.6.0, Scala 2.11.7):
> {noformat}
> scala> :load reproduce_transient_npe.scala
> Loading reproduce_transient_npe.scala...
> X: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = 
> $anon$1@750e2d33
> scala> val a = X.getArray(100)
> warning: there was one feature warning; re-run with -feature for details
> a: Array[Double] = Array(0.6330055191546612, 0.017865502179445936, 
> 0.6334775064489349, 0.9053929733525056, 0.7648311134918273, 
> 0.5423177955113584, 0.5164344368587143, 0.420054677669768, 
> 0.7842112717076851, 0.2098345684721057, 0.7925640951404774, 
> 0.5604706596425998, 0.8104403239147542, 0.7567005967624031, 
> 0.5221119883682028, 0.15766763970350484, 0.18693986227881698, 
> 0.7475065360095031, 0.7766720862129398, 0.7844069968816826, 
> 0.27481855935245014, 0.8498855383673198, 0.7496017097461324, 
> 0.448373036252237, 0.7372969840779748, 0.26381835654323815, 
> 0.7919478212349927, 0.773136240932345, 0.7441046289586369, 
> 0.8774372628866844, 0.567152428053003, 0.7256375989728348, 0.654839959050646, 
> 0.858953671296855, 0.47581286359760067, 0.039760801375546495, 
> 0.7764165909218748, 0.6882803110041462, 0.8660302...
> scala> X = null
> X: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = null
> scala> X
> res0: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = null
> scala> :load reproduce_transient_npe.scala
> Loading reproduce_transient_npe.scala...
> X: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = 
> $anon$1@48da64f2
> scala> X
> res1: Serializable{val cf: Double; def getArray(n: Int): Array[Double]; def 
> multiplySum(x: Double,v: org.apache.spark.rdd.RDD[Double]): Double} = null
> {noformat}
> However, if the script being loaded does not refer to an {{RDD}}, then the 
> reload seems to work fine:
> {noformat}
> scala> :load Null.scala
> Loading Null.scala...
> X: Serializable{def getArray(n: Int): Array[Double]} = $anon$1@987a0bb
> scala> val a = X.getArray(100)
> warning: there was one feature warning; re-run with -feature for details
> a: Array[Double] = ...
> {noformat}
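
For reference, a minimal sketch of what reproduce_transient_npe.scala plausibly 
contains, inferred from the structural type printed in the description; the 
attachment is not reproduced here, so the member bodies and the value of 
{{cf}} are assumptions:

{noformat}
// Hypothetical reconstruction of reproduce_transient_npe.scala: the same
// shape as the Fail.scala sketch above, but one member's signature mentions
// an RDD.
import org.apache.spark.rdd.RDD
import scala.util.Random

var X = new Serializable {
  val cf: Double = 1.0  // assumed value
  def getArray(n: Int): Array[Double] = Array.fill(n)(Random.nextDouble())
  def multiplySum(x: Double, v: RDD[Double]): Double = x * cf * v.sum()
}
{noformat}

Null.scala presumably drops {{cf}} and the RDD-typed member, leaving only 
{{getArray}}, which matches the single-member type shown at the end of the 
quoted session.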

[jira] [Commented] (SPARK-14246) vars not updated after Scala script reload

2016-03-29 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216737#comment-15216737
 ] 

Sean Owen commented on SPARK-14246:
---

Huh, that is strange. Yes, it does look like something at least unexpected, and 
it is coming from the Scala shell. I also have no idea why this would happen, 
unless somehow the existence of two declarations for this class is confusing 
the classloaders in the second instance.
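
To make that hypothesis concrete, here is a conceptual sketch; it is an 
assumption about how {{-Yrepl-class-based}} wraps REPL lines, not actual 
compiler source. Each line is compiled into its own wrapper class, and later 
lines reach earlier results through instances of those wrappers, so a reload 
that re-declares {{X}} leaves two slots that different call sites may resolve 
differently:

{noformat}
// Conceptual sketch only: names and wiring are invented for illustration.
class Read1 { var X: AnyRef = new AnyRef }   // first :load declares X
class Read2(r1: Read1) { r1.X = null }       // X = null writes the old slot
class Read3 { var X: AnyRef = new AnyRef }   // second :load re-declares X
class Read4(r1: Read1) {
  val res = r1.X  // a call site still bound to Read1 reads null, not Read3.X
}
{noformat}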

[jira] [Commented] (SPARK-14246) vars not updated after Scala script reload

2016-03-29 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216619#comment-15216619
 ] 

Sean Owen commented on SPARK-14246:
---

I suspect this ends up being some classloader issue, but I can't immediately 
see how. The Spark shell is the Scala shell in 2.11 for most intents and 
purposes, so I doubt there's a Spark-specific issue there per se if the Scala 
shell does what you expect.
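
One quick way to isolate it is to run the standalone 2.11 REPL with 
{{-Yrepl-class-based}}, the class-based wrapper flag that spark-shell enables 
on Scala 2.11, and replay the same steps with a Spark-free script (the first 
comment above does exactly this with Fail.scala):

{noformat}
$ scala -Yrepl-class-based
scala> :load Fail.scala
scala> X = null
scala> :load Fail.scala
scala> X   // printing null here points at the Scala REPL, not Spark
{noformat}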
