Re: Scala examples for Spark do not work as written in documentation
Hey, sorry to reanimate this thread, but just a quick question: why do the examples (on http://spark.apache.org/examples.html) use spark for the SparkContext reference? This is minor, but it seems like it could be a little confusing for people who want to run them in the shell and need to change spark to sc. (I noticed because this was a speed bump for a colleague who is trying out Spark.)

thanks,
wb
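For anyone who hits the same speed bump, a minimal sketch of what the Pi example looks like in spark-shell, assuming a user-chosen NUM_SAMPLES (the value below is only illustrative); the shell pre-creates the context and binds it to sc, so spark in the website example becomes sc:

// spark-shell already provides sc, so no context needs to be constructed.
val NUM_SAMPLES = 100000  // illustrative value, not from the website
val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)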
Re: Scala examples for Spark do not work as written in documentation
Those are pretty old - but I think the reason Matei did that was to make it less confusing for brand-new users. spark is actually a valid identifier because it's just a variable name (val spark = new SparkContext()), but I agree this could be confusing for users who want to drop into the shell.
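To make that concrete, a minimal standalone sketch (the app name and master below are placeholders, not anything from the website) in which spark is just an ordinary val holding the context:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Examples").setMaster("local[*]")
val spark = new SparkContext(conf)  // "spark" is an ordinary variable name here
val total = spark.parallelize(1 to 100).reduce(_ + _)  // used the same way sc is in the shell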
Re: Scala examples for Spark do not work as written in documentation
I fixed the bug, but I kept the parameter i instead of _ since that (1) keeps it more parallel to the Python and Java versions, which also use functions with a named variable, and (2) doesn't require readers to know this particular use of the _ syntax in Scala.

Thanks for catching this, Glenn.

Andy
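For readers unfamiliar with that use of _: in this position it is an ignored lambda parameter, and the two forms below are equivalent (a small sketch, assuming the shell's sc):

sc.parallelize(1 to 4).map { _ => Math.random() }  // underscore: the parameter is ignored
sc.parallelize(1 to 4).map { i => Math.random() }  // named parameter, same behavior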
Re: Scala examples for Spark do not work as written in documentation
Thanks for pointing it out. We should update the website to fix the code.

val count = spark.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

On Fri, May 16, 2014 at 9:41 AM, GlennStrycker glenn.stryc...@gmail.com wrote:

On the webpage http://spark.apache.org/examples.html, there is an example written as

val count = spark.parallelize(1 to NUM_SAMPLES).map(i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
).reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

This does not execute in Spark, which gives me an error:

<console>:2: error: illegal start of simple expression
       val x = Math.random()
       ^

If I rewrite the query slightly, adding in {}, it works:

val count = spark.parallelize(1 to 1).map(i => {
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}).reduce(_ + _)
println("Pi is roughly " + 4.0 * count / 1.0)

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: Scala examples for Spark do not work as written in documentation
Sorry, looks like an extra line got inserted in there. One more try:

val count = spark.parallelize(1 to NUM_SAMPLES).map { _ =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
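To spell out why the braces matter (a small sketch, not from the original thread): a lambda whose body contains multiple statements needs a block, so map(i => val x = ...) is a syntax error, while either of these parses (assuming the shell's sc):

val viaBraces = sc.parallelize(1 to 10).map { i =>
  val doubled = i * 2   // multiple statements are fine inside a block
  doubled + 1
}
val viaParens = sc.parallelize(1 to 10).map(i => {
  val doubled = i * 2
  doubled + 1
})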
Re: Scala examples for Spark do not work as written in documentation
Why does the reduce function only work on sums of keys of the same type and not support other functional forms? I am having trouble in another example where, instead of 1s and 0s, the output of the map function is something like A=(1,2) and B=(3,4). I need a reduce function that can return something complicated based on reduce( (A,B) => (arbitrary fcn1 of A and B, arbitrary fcn2 of A and B) ), but I am only getting reduce( (A,B) => (arbitrary fcn1 of A, arbitrary fcn2 of A) ).

See http://apache-spark-developers-list.1001551.n3.nabble.com/reduce-only-removes-duplicates-cannot-be-arbitrary-function-td6606.html

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-examples-for-Spark-do-not-work-as-written-in-documentation-tp6593p6607.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
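For what it's worth, reduce does accept arbitrary combining functions, as long as both arguments and the result have the RDD's element type (and the function should be associative and commutative, since Spark applies it in an arbitrary order across partitions). A minimal sketch with pairs, assuming the shell's sc:

val pairs = sc.parallelize(Seq((1, 2), (3, 4), (5, 6)))
// Combine both arguments however you like, e.g. sum of the first elements and
// product of the second; the result is again an (Int, Int).
val combined = pairs.reduce((a, b) => (a._1 + b._1, a._2 * b._2))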