Re: BUG: when running as "extends App", closures don't capture variables

2014-10-30 Thread Sean Owen
Very coincidentally, I ran into something equally puzzling yesterday: something
was bizarrely null when it couldn't have been, in a Spark program that extends
App. I also changed it to use main() and it works fine, so there is definitely
some issue here. If nobody files a JIRA before I get home, I'll do it.
On Oct 29, 2014 11:20 PM, "Michael Albert" wrote:

> Greetings!
>
> This might be a documentation issue as opposed to a coding issue, in that
> perhaps the correct answer is "don't do that", but as this is not obvious,
> I am writing.
>
> The following code produces output most would not expect:
>
> package misc
>
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
>
> object DemoBug extends App {
>     val conf = new SparkConf()
>     val sc = new SparkContext(conf)
>
>     val rdd = sc.parallelize(List("A","B","C","D"))
>     val str1 = "A"
>
>     val rslt1 = rdd.filter(x => { x != "A" }).count
>     val rslt2 = rdd.filter(x => { str1 != null && x != "A" }).count
>
>     println("DemoBug: rslt1 = " + rslt1 + " rslt2 = " + rslt2)
> }
>
> This produces the output:
> DemoBug: rslt1 = 3 rslt2 = 0
>
> Compiled with sbt:
> libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
> Run on an EC2 EMR instance with a recent image (hadoop 2.4.0, spark 1.1.0)
>
> If instead there is a proper "main()", it works as expected.
>
> Thank you.
>
> Sincerely,
>  Mike
>


Re: BUG: when running as "extends App", closures don't capture variables

2014-10-29 Thread Matei Zaharia
Good catch! If you'd like, you can send a pull request changing the files in
docs/ to do this (see
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark),
otherwise maybe open an issue on https://issues.apache.org/jira/browse/SPARK
so we can track it.

Matei

> On Oct 29, 2014, at 3:16 PM, Michael Albert wrote:
> 
> Greetings!
> 
> This might be a documentation issue as opposed to a coding issue, in that 
> perhaps the correct answer is "don't do that", but as this is not obvious, I 
> am writing.
> 
> The following code produces output most would not expect:
> 
> package misc
> 
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
> 
> object DemoBug extends App {
>     val conf = new SparkConf()
>     val sc = new SparkContext(conf)
> 
>     val rdd = sc.parallelize(List("A","B","C","D"))
>     val str1 = "A"
> 
>     val rslt1 = rdd.filter(x => { x != "A" }).count
>     val rslt2 = rdd.filter(x => { str1 != null && x != "A" }).count
> 
>     println("DemoBug: rslt1 = " + rslt1 + " rslt2 = " + rslt2)
> }
> 
> This produces the output:
> DemoBug: rslt1 = 3 rslt2 = 0
> 
> Compiled with sbt:
> libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
> Run on an EC2 EMR instance with a recent image (hadoop 2.4.0, spark 1.1.0)
> 
> If instead there is a proper "main()", it works as expected.
> 
> Thank you.
> 
> Sincerely,
>  Mike



BUG: when running as "extends App", closures don't capture variables

2014-10-29 Thread Michael Albert
Greetings!
This might be a documentation issue as opposed to a coding issue, in that 
perhaps the correct answer is "don't do that", but as this is not obvious, I am 
writing.
The following code produces output most would not expect:
package misc
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object DemoBug extends App {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(List("A","B","C","D"))
    val str1 = "A"

    val rslt1 = rdd.filter(x => { x != "A" }).count
    val rslt2 = rdd.filter(x => { str1 != null && x != "A" }).count

    println("DemoBug: rslt1 = " + rslt1 + " rslt2 = " + rslt2)
}

This produces the output:
DemoBug: rslt1 = 3 rslt2 = 0

Compiled with sbt:
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
Run on an EC2 EMR instance with a recent image (hadoop 2.4.0, spark 1.1.0)
If instead there is a proper "main()", it works as expected.
Thank you.
Sincerely,
 Mike
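The behavior reported in this thread is consistent with scala.App's use of DelayedInit: vals in the body of an object that extends App are compiled into a deferred initializer that only runs when main() is invoked, so any code that reads those fields beforehand (such as a task closure serialized to an executor) sees their default values, i.e. null for references. A minimal, Spark-free sketch of that mechanism, assuming Scala 2.x (the object names here are hypothetical):

```scala
// Inner extends App, so its body is deferred via DelayedInit:
// the field backing `str1` stays null until Inner.main runs.
object Inner extends App {
  val str1 = "A"
}

object DelayedInitDemo {
  def main(args: Array[String]): Unit = {
    // Reading the field before Inner.main has run (Scala 2.x):
    println(Inner.str1)      // null: the deferred body has not executed
    Inner.main(Array.empty)  // runs the deferred initializer
    println(Inner.str1)      // "A"
  }
}
```

This is why switching to an ordinary `def main(args: Array[String])` fixes the Spark program: with a plain main method the vals become local variables (or normally initialized fields), so the closure captures their actual values.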