[jira] [Updated] (SPARK-1242) Add aggregate to python API
[ https://issues.apache.org/jira/browse/SPARK-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1242: - Assignee: Holden Karau Add aggregate to python API --- Key: SPARK-1242 URL: https://issues.apache.org/jira/browse/SPARK-1242 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 0.9.0 Reporter: Holden Karau Assignee: Holden Karau Priority: Trivial Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1626) Update Spark YARN docs to use spark-submit
Patrick Wendell created SPARK-1626: -- Summary: Update Spark YARN docs to use spark-submit Key: SPARK-1626 URL: https://issues.apache.org/jira/browse/SPARK-1626 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1625) Ensure all legacy YARN options are supported with spark-submit
[ https://issues.apache.org/jira/browse/SPARK-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1625: --- Component/s: YARN Ensure all legacy YARN options are supported with spark-submit -- Key: SPARK-1625 URL: https://issues.apache.org/jira/browse/SPARK-1625 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1621) Update Chill to 0.3.6
[ https://issues.apache.org/jira/browse/SPARK-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1621: --- Issue Type: Dependency upgrade (was: Improvement) Update Chill to 0.3.6 - Key: SPARK-1621 URL: https://issues.apache.org/jira/browse/SPARK-1621 Project: Spark Issue Type: Dependency upgrade Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Minor Fix For: 1.0.0 It registers more Scala classes, including things like Ranges that we had to register manually before. See https://github.com/twitter/chill/releases for Chill's change log. -- This message was sent by Atlassian JIRA (v6.2#6252)
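For context, this is roughly the kind of manual Kryo registration the upgrade removes the need for; a hedged sketch (the exact classes Spark had to register by hand are assumed here):
{code}
import com.esotericsoftware.kryo.Kryo

// Before Chill 0.3.6, common Scala collection classes such as ranges had to be
// registered manually for Kryo serialization:
def registerScalaClasses(kryo: Kryo): Unit = {
  kryo.register(classOf[Range])
  kryo.register(classOf[scala.collection.immutable.Range.Inclusive])
}
{code}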
[jira] [Resolved] (SPARK-1582) Job cancellation does not interrupt threads
[ https://issues.apache.org/jira/browse/SPARK-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1582. Resolution: Fixed Fix Version/s: 1.0.0 Job cancellation does not interrupt threads --- Key: SPARK-1582 URL: https://issues.apache.org/jira/browse/SPARK-1582 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 0.9.1 Reporter: Aaron Davidson Assignee: Aaron Davidson Fix For: 1.0.0 Cancelling Spark jobs is limited because executors that are blocked are not interrupted. In effect, the cancellation will succeed and the job will no longer be running, but executor threads may still be tied up with the cancelled job and unable to do further work until they complete. This is particularly problematic in the case of deadlock or unlimited/long timeouts. It would be useful if cancelling a job would call Thread.interrupt() in order to interrupt blocking in most situations, such as Object monitors or IO. The one caveat is [HDFS-1208|https://issues.apache.org/jira/browse/HDFS-1208], where HDFS's DFSClient will not only swallow InterruptedException but may reinterpret it as an IOException, causing HDFS to mark a node as permanently failed. Thus, this feature must be optional and probably off by default. -- This message was sent by Atlassian JIRA (v6.2#6252)
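A minimal, self-contained illustration (not Spark's implementation) of the mechanism being proposed: a blocked thread only becomes available again once Thread.interrupt() is called on it.
{code}
// A thread blocked in sleep/wait/IO stays tied up until interrupted:
val worker = new Thread(new Runnable {
  override def run(): Unit = {
    try {
      Thread.sleep(Long.MaxValue) // stands in for a blocked executor task
    } catch {
      case _: InterruptedException =>
        println("task interrupted; thread is free for other work")
    }
  }
})
worker.start()
worker.interrupt() // what job cancellation would optionally trigger
{code}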
[jira] [Resolved] (SPARK-1590) Recommend to use FindBugs
[ https://issues.apache.org/jira/browse/SPARK-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1590. Resolution: Not a Problem Assignee: Shixiong Zhu [~zsxwing] I'm marking this as not an issue now, but if you feel we should reconsider adding this to the build please feel free to reopen it. Recommend to use FindBugs - Key: SPARK-1590 URL: https://issues.apache.org/jira/browse/SPARK-1590 Project: Spark Issue Type: Question Components: Build Reporter: Shixiong Zhu Assignee: Shixiong Zhu Priority: Minor Attachments: findbugs.png FindBugs is an open source program created by Bill Pugh and David Hovemeyer which looks for bugs in Java code. It uses static analysis to identify hundreds of different potential types of errors in Java programs. Although Spark is a Scala project, FindBugs is still helpful. For example, I used it to find SPARK-1583 and SPARK-1589. However, the disadvantage is that the report generated by FindBugs usually contains many false alarms for a Scala project. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1590) Recommend to use FindBugs
[ https://issues.apache.org/jira/browse/SPARK-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980735#comment-13980735 ] Patrick Wendell commented on SPARK-1590: I agree with this course of action. [~zsxwing] the issues you've been reporting are super helpful. I think the way to go here is to periodically look through these and contribute fixes. [~srowen] are you doing any custom configurations in IntelliJ to enable these? Or do you just go with the defaults? Recommend to use FindBugs - Key: SPARK-1590 URL: https://issues.apache.org/jira/browse/SPARK-1590 Project: Spark Issue Type: Question Components: Build Reporter: Shixiong Zhu Priority: Minor Attachments: findbugs.png FindBugs is an open source program created by Bill Pugh and David Hovemeyer which looks for bugs in Java code. It uses static analysis to identify hundreds of different potential types of errors in Java programs. Although Spark is a Scala project, FindBugs is still helpful. For example, I used it to find SPARK-1583 and SPARK-1589. However, the disadvantage is that the report generated by FindBugs usually contains many false alarms for a Scala project. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1604) Couldn't run spark-submit with yarn cluster mode when using deps jar
[ https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980740#comment-13980740 ] Patrick Wendell commented on SPARK-1604: I'm not sure why you're including both assemble-deps and the examples jar. The examples jar includes all of Spark and its dependencies. I've noted in SPARK-1565 that we should probably mark Spark as provided in the examples jar so it doesn't embed Spark: https://issues.apache.org/jira/browse/SPARK-1565 Couldn't run spark-submit with yarn cluster mode when using deps jar Key: SPARK-1604 URL: https://issues.apache.org/jira/browse/SPARK-1604 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.0 Reporter: Kan Zhang SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar ./bin/spark-submit ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.2#6252)
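As a side note on the SPARK-1565 suggestion, marking Spark as provided in the examples module would look roughly like this in sbt (coordinates illustrative, not the actual build change):
{code}
// Hypothetical sketch: depend on Spark for compilation but do not embed it
// in the assembled examples jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT" % "provided"
{code}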
[jira] [Resolved] (SPARK-1607) Remove use of octal literals, deprecated in Scala 2.10 / removed in 2.11
[ https://issues.apache.org/jira/browse/SPARK-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1607. -- Resolution: Fixed Fix Version/s: 1.0.0 Remove use of octal literals, deprecated in Scala 2.10 / removed in 2.11 Key: SPARK-1607 URL: https://issues.apache.org/jira/browse/SPARK-1607 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 0.9.1 Reporter: Sean Owen Priority: Minor Labels: literal, octal, scala, yarn Fix For: 1.0.0 Octal literals like 0700 are deprecated in Scala 2.10, generating a warning. They have been removed entirely in 2.11. See https://issues.scala-lang.org/browse/SI-7618 This change simply replaces two uses of octals with hex literals, which seemed the next-best representation since they express a bit mask (file permissions in particular). -- This message was sent by Atlassian JIRA (v6.2#6252)
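For reference, the equivalence between the removed octal form and its hex replacement (the value shown here is illustrative; the patch may use different constants):
{code}
// 0700 (octal, owner rwx) is deprecated in Scala 2.10 and removed in 2.11.
// The same permission bit mask written as a hex literal:
val ownerRwx = 0x1C0 // decimal 448, formerly written 0700
{code}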
[jira] [Resolved] (SPARK-1611) Incorrect initialization order in AppendOnlyMap
[ https://issues.apache.org/jira/browse/SPARK-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-1611. - Resolution: Fixed Fix Version/s: 1.0.0 Incorrect initialization order in AppendOnlyMap --- Key: SPARK-1611 URL: https://issues.apache.org/jira/browse/SPARK-1611 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Shixiong Zhu Assignee: Shixiong Zhu Priority: Minor Labels: easyfix Fix For: 1.0.0 The initialization order of growThreshold and LOAD_FACTOR is incorrect. growThreshold will be initialized to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
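A simplified illustration of the pitfall (not the actual AppendOnlyMap source): Scala vals initialize in declaration order, so a val whose initializer reads a later-declared val sees that field's default value.
{code}
class Before {
  val growThreshold = (LOAD_FACTOR * 64).toInt // LOAD_FACTOR is still 0.0 here
  val LOAD_FACTOR = 0.7
}
class After {
  val LOAD_FACTOR = 0.7                        // declare dependencies first
  val growThreshold = (LOAD_FACTOR * 64).toInt // 44, as intended
}
println(new Before().growThreshold) // 0
println(new After().growThreshold)  // 44
{code}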
[jira] [Commented] (SPARK-993) Don't reuse Writable objects in SequenceFile by default
[ https://issues.apache.org/jira/browse/SPARK-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980747#comment-13980747 ] Arun Ramakrishnan commented on SPARK-993: - How does one reproduce this issue? I tried a few things in the Spark shell locally {noformat}
import java.io.File
import com.google.common.io.Files
import org.apache.hadoop.io._

val tempDir = Files.createTempDir()
val outputDir = new File(tempDir, "output").getAbsolutePath
val num = 100
val nums = sc.makeRDD(1 to num).map(x => ("a" * x, x))
nums.saveAsSequenceFile(outputDir)

val output = sc.sequenceFile[String,Int](outputDir)
assert(output.collect().toSet.size == num)

val t = sc.sequenceFile(outputDir, classOf[Text], classOf[IntWritable])
assert( t.map { case (k,v) => (k.toString, v.get) }.collect().toSet.size == num )
{noformat} But the asserts seem to be fine. Don't reuse Writable objects in SequenceFile by default --- Key: SPARK-993 URL: https://issues.apache.org/jira/browse/SPARK-993 Project: Spark Issue Type: Improvement Reporter: Matei Zaharia Labels: Starter Right now we reuse them as an optimization, which leads to weird results when you call collect() on a file with distinct items. We should instead make that behavior optional through a flag. -- This message was sent by Atlassian JIRA (v6.2#6252)
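The reuse the issue describes shows up when collecting the raw Writables instead of copying values out; a hedged sketch building on the repro above:
{code}
// Hadoop RecordReaders reuse the same Text/IntWritable instances, so collecting
// the raw objects (in local mode) can yield many references to one mutated record:
val raw = sc.sequenceFile(outputDir, classOf[Text], classOf[IntWritable])
val reused = raw.collect()                                            // may show duplicated values
val copied = raw.map { case (k, v) => (k.toString, v.get) }.collect() // safe: copies values out
{code}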
[jira] [Comment Edited] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980756#comment-13980756 ] Piotr Kołaczkowski edited comment on SPARK-1199 at 4/25/14 7:26 AM: +1 to fixing this. We're affected as well. Classes defined in the shell are inner classes, and therefore cannot be easily instantiated by reflection. They need an additional reference to the outer object, which is non-trivial to obtain (is it obtainable at all without modifying Spark?). {noformat}
scala> class Test
defined class Test

scala> new Test
res5: Test = $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$Test@4f755864

// good, so there is a default constructor and we can call it through reflection?
// not so fast...

scala> classOf[Test].getConstructor()
java.lang.NoSuchMethodException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$Test.<init>()
...

scala> classOf[Test].getConstructors()(0)
res7: java.lang.reflect.Constructor[_] = public $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$Test($iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC)
{noformat} The workaround does not work for us. was (Author: pkolaczk): +1 to fixing this. We're affected as well. Classes defined in the shell are inner classes, and therefore cannot be easily instantiated by reflection. They need an additional reference to the outer object, which is non-trivial to obtain (is it obtainable at all without modifying Spark?). {noformat}
scala> class Test
defined class Test

scala> new Test
res5: Test = $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$Test@4f755864

// good, so there is a default constructor and we can call it through reflection?
// not so fast...

scala> classOf[Test].getConstructor()
java.lang.NoSuchMethodException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$Test.<init>()
...

scala> classOf[Test].getConstructors()(0)
res7: java.lang.reflect.Constructor[_] = public $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$Test($iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC)
{noformat} Type mismatch in Spark shell when using case class defined in shell --- Key: SPARK-1199 URL: https://issues.apache.org/jira/browse/SPARK-1199 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Andrew Kerr Priority: Critical Fix For: 1.1.0 Define a class in the shell: {code}
case class TestClass(a:String)
{code} and an RDD {code}
val data = sc.parallelize(Seq("a")).map(TestClass(_))
{code} define a function on it and map over the RDD {code}
def itemFunc(a:TestClass):TestClass = a
data.map(itemFunc)
{code} Error: {code}
<console>:19: error: type mismatch;
 found   : TestClass => TestClass
 required: TestClass => ?
       data.map(itemFunc)
{code} Similarly with a mapPartitions: {code}
def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a
data.mapPartitions(partitionFunc)
{code} {code}
<console>:19: error: type mismatch;
 found   : Iterator[TestClass] => Iterator[TestClass]
 required: Iterator[TestClass] => Iterator[?]
Error occurred in an application involving default arguments.
       data.mapPartitions(partitionFunc)
{code} The behavior is the same whether in local mode or on a cluster. This isn't specific to RDDs. A Scala collection in the Spark shell has the same problem. {code}
scala> Seq(TestClass("foo")).map(itemFunc)
<console>:15: error: type mismatch;
 found   : TestClass => TestClass
 required: TestClass => ?
       Seq(TestClass("foo")).map(itemFunc)
       ^
{code} When run in the Scala console (not the Spark shell) there are no type mismatch errors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1628) Missing hashCode methods in Partitioner subclasses
Shixiong Zhu created SPARK-1628: --- Summary: Missing hashCode methods in Partitioner subclasses Key: SPARK-1628 URL: https://issues.apache.org/jira/browse/SPARK-1628 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Shixiong Zhu Assignee: Shixiong Zhu Priority: Minor `hashCode` is not overridden in HashPartitioner, RangePartitioner, PythonPartitioner and PageRankUtils.CustomPartitioner. A class should override hashCode() whenever it overrides equals(). -- This message was sent by Atlassian JIRA (v6.2#6252)
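A minimal sketch of the fix for one of the listed classes (simplified; the real HashPartitioner extends Partitioner and has more members):
{code}
class HashPartitioner(val partitions: Int) {
  override def equals(other: Any): Boolean = other match {
    case h: HashPartitioner => h.partitions == partitions
    case _ => false
  }
  // Required for consistency: equal partitioners must have equal hash codes.
  override def hashCode: Int = partitions
}
{code}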
[jira] [Assigned] (SPARK-1597) Add a version of reduceByKey that takes the Partitioner as a second argument
[ https://issues.apache.org/jira/browse/SPARK-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandeep Singh reassigned SPARK-1597: Assignee: Sandeep Singh Add a version of reduceByKey that takes the Partitioner as a second argument Key: SPARK-1597 URL: https://issues.apache.org/jira/browse/SPARK-1597 Project: Spark Issue Type: Bug Reporter: Matei Zaharia Assignee: Sandeep Singh Priority: Blocker Most of our shuffle methods can take a Partitioner or a number of partitions as a second argument, but for some reason reduceByKey takes the Partitioner as a *first* argument: http://spark.apache.org/docs/0.9.1/api/core/#org.apache.spark.rdd.PairRDDFunctions. We should deprecate that version and add one where the Partitioner is the second argument. -- This message was sent by Atlassian JIRA (v6.2#6252)
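A sketch of what the proposed overload might look like, simply delegating to the existing Partitioner-first method (signature assumed, not the merged change):
{code}
// As it might appear inside PairRDDFunctions[K, V]: Partitioner as the *second*
// argument, matching the convention of the other shuffle methods.
def reduceByKey(func: (V, V) => V, partitioner: Partitioner): RDD[(K, V)] = {
  reduceByKey(partitioner, func)
}
{code}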
[jira] [Updated] (SPARK-1604) Couldn't run spark-submit with yarn cluster mode when built with assemble-deps
[ https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated SPARK-1604: - Summary: Couldn't run spark-submit with yarn cluster mode when built with assemble-deps (was: Couldn't run spark-submit with yarn cluster mode when built with assemble-ceps) Couldn't run spark-submit with yarn cluster mode when built with assemble-deps -- Key: SPARK-1604 URL: https://issues.apache.org/jira/browse/SPARK-1604 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.0 Reporter: Kan Zhang SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar ./bin/spark-submit ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL Exception in thread main java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1604) Couldn't run spark-submit with yarn cluster mode when built with assemble-deps
[ https://issues.apache.org/jira/browse/SPARK-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated SPARK-1604: - Description: {code} SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar ./bin/spark-submit ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL Exception in thread main java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} was: SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar ./bin/spark-submit ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL Exception in thread main java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Couldn't run spark-submit with yarn cluster mode when built with assemble-deps -- Key: SPARK-1604 URL: https://issues.apache.org/jira/browse/SPARK-1604 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.0.0 Reporter: Kan Zhang {code} SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar ./bin/spark-submit ./examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar --master yarn --deploy-mode cluster --class org.apache.spark.examples.sql.JavaSparkSQL Exception in thread main java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.Client at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:234) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:47) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1629) Spark Core missing commons-lang dependence
witgo created SPARK-1629: Summary: Spark Core missing commons-lang dependence Key: SPARK-1629 URL: https://issues.apache.org/jira/browse/SPARK-1629 Project: Spark Issue Type: Bug Reporter: witgo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1629) Spark Core missing commons-lang dependence
[ https://issues.apache.org/jira/browse/SPARK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13980878#comment-13980878 ] Sean Owen commented on SPARK-1629: -- I don't see any usage of Commons Lang in the whole project? Tachyon uses commons-lang3 but it also brings it in as a dependency. Spark Core missing commons-lang dependence --- Key: SPARK-1629 URL: https://issues.apache.org/jira/browse/SPARK-1629 Project: Spark Issue Type: Bug Reporter: witgo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1629) Spark Core missing commons-lang dependence
[ https://issues.apache.org/jira/browse/SPARK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980886#comment-13980886 ] witgo edited comment on SPARK-1629 at 4/25/14 11:06 AM: Hi Sean Owen, see [Utils.scala|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L33] was (Author: witgo): Hi Sean Owen see [Utils.scala|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L33] Spark Core missing commons-lang dependence --- Key: SPARK-1629 URL: https://issues.apache.org/jira/browse/SPARK-1629 Project: Spark Issue Type: Bug Reporter: witgo -- This message was sent by Atlassian JIRA (v6.2#6252)
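If core really does use Commons Lang directly (as the Utils.scala link suggests), the fix would be declaring the dependency explicitly rather than relying on a transitive copy; a hedged sbt sketch (coordinates assumed, not the actual build change):
{code}
// Hypothetical: make the dependency explicit in the core module.
libraryDependencies += "commons-lang" % "commons-lang" % "2.6"
{code}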
[jira] [Commented] (SPARK-1478) Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915
[ https://issues.apache.org/jira/browse/SPARK-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980962#comment-13980962 ] Ted Malaska commented on SPARK-1478: SPARK-1584 is done and so is PR #300. So finally we are ready for this JIRA. I will start development today. Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 --- Key: SPARK-1478 URL: https://issues.apache.org/jira/browse/SPARK-1478 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Ted Malaska Assignee: Ted Malaska Priority: Minor FLUME-1915 added support for compression over the wire from the Avro sink to the Avro source. I would like to add this functionality to the FlumeReceiver. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1467) Make StorageLevel.apply() factory methods experimental
[ https://issues.apache.org/jira/browse/SPARK-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981031#comment-13981031 ] Sandeep Singh commented on SPARK-1467: -- https://github.com/apache/spark/pull/551 Make StorageLevel.apply() factory methods experimental -- Key: SPARK-1467 URL: https://issues.apache.org/jira/browse/SPARK-1467 Project: Spark Issue Type: Bug Components: Documentation Reporter: Matei Zaharia Assignee: Sandeep Singh Fix For: 1.0.0 We may want to evolve these in the future to add things like SSDs, so let's mark them as experimental for now. Long-term the right solution might be some kind of builder. The stable API should be the existing StorageLevel constants. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981187#comment-13981187 ] Andrew Kerr commented on SPARK-1199: I have something of a workaround: {code}
object MyTypes {
  case class TestClass(a:Int)
}

object MyLogic {
  import MyTypes._
  def fn(b:TestClass) = TestClass(b.a * 2)
  val result = Seq(TestClass(1)).map(fn)
}

MyLogic.result // Seq[MyTypes.TestClass] = List(TestClass(2))
{code} Still can't access TestClass outside an object. Type mismatch in Spark shell when using case class defined in shell --- Key: SPARK-1199 URL: https://issues.apache.org/jira/browse/SPARK-1199 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Andrew Kerr Priority: Critical Fix For: 1.1.0 Define a class in the shell: {code}
case class TestClass(a:String)
{code} and an RDD {code}
val data = sc.parallelize(Seq("a")).map(TestClass(_))
{code} define a function on it and map over the RDD {code}
def itemFunc(a:TestClass):TestClass = a
data.map(itemFunc)
{code} Error: {code}
<console>:19: error: type mismatch;
 found   : TestClass => TestClass
 required: TestClass => ?
       data.map(itemFunc)
{code} Similarly with a mapPartitions: {code}
def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a
data.mapPartitions(partitionFunc)
{code} {code}
<console>:19: error: type mismatch;
 found   : Iterator[TestClass] => Iterator[TestClass]
 required: Iterator[TestClass] => Iterator[?]
Error occurred in an application involving default arguments.
       data.mapPartitions(partitionFunc)
{code} The behavior is the same whether in local mode or on a cluster. This isn't specific to RDDs. A Scala collection in the Spark shell has the same problem. {code}
scala> Seq(TestClass("foo")).map(itemFunc)
<console>:15: error: type mismatch;
 found   : TestClass => TestClass
 required: TestClass => ?
       Seq(TestClass("foo")).map(itemFunc)
       ^
{code} When run in the Scala console (not the Spark shell) there are no type mismatch errors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981187#comment-13981187 ] Andrew Kerr edited comment on SPARK-1199 at 4/25/14 4:44 PM: - I have something of a workaround: {code}
object MyTypes {
  case class TestClass(a:Int)
}

object MyLogic {
  import MyTypes._
  def fn(b:TestClass) = TestClass(b.a * 2)
  val result = Seq(TestClass(1)).map(fn)
}

MyLogic.result // Seq[MyTypes.TestClass] = List(TestClass(2))
{code} Still can't access TestClass outside an object. was (Author: andrewkerr): I have something of a workaround: {code}
object MyTypes {
  case class TestClass(a:Int)
}

object MyLogic {
  import MyClasses._
  def fn(b:TestClass) = TestClass(b.a * 2)
  val result = Seq(TestClass(1)).map(fn)
}

MyLogic.result // Seq{MyTypes.TestClass] = List(TestClass(2))
{code} Still can't access TestClass outside an object. Type mismatch in Spark shell when using case class defined in shell --- Key: SPARK-1199 URL: https://issues.apache.org/jira/browse/SPARK-1199 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Andrew Kerr Priority: Critical Fix For: 1.1.0 Define a class in the shell: {code}
case class TestClass(a:String)
{code} and an RDD {code}
val data = sc.parallelize(Seq("a")).map(TestClass(_))
{code} define a function on it and map over the RDD {code}
def itemFunc(a:TestClass):TestClass = a
data.map(itemFunc)
{code} Error: {code}
<console>:19: error: type mismatch;
 found   : TestClass => TestClass
 required: TestClass => ?
       data.map(itemFunc)
{code} Similarly with a mapPartitions: {code}
def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a
data.mapPartitions(partitionFunc)
{code} {code}
<console>:19: error: type mismatch;
 found   : Iterator[TestClass] => Iterator[TestClass]
 required: Iterator[TestClass] => Iterator[?]
Error occurred in an application involving default arguments.
       data.mapPartitions(partitionFunc)
{code} The behavior is the same whether in local mode or on a cluster. This isn't specific to RDDs. A Scala collection in the Spark shell has the same problem. {code}
scala> Seq(TestClass("foo")).map(itemFunc)
<console>:15: error: type mismatch;
 found   : TestClass => TestClass
 required: TestClass => ?
       Seq(TestClass("foo")).map(itemFunc)
       ^
{code} When run in the Scala console (not the Spark shell) there are no type mismatch errors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1630) PythonRDDs don't handle nulls gracefully
Kalpit Shah created SPARK-1630: -- Summary: PythonRDDs don't handle nulls gracefully Key: SPARK-1630 URL: https://issues.apache.org/jira/browse/SPARK-1630 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 0.9.0, 0.9.1 Reporter: Kalpit Shah Fix For: 1.0.0 If PythonRDDs receive a null element in iterators, they currently NPE. It would be better to log a DEBUG message and skip writing null elements. Here are the two stack traces: 14/04/22 03:44:19 ERROR executor.Executor: Uncaught exception in thread Thread[stdin writer for python,5,main] java.lang.NullPointerException at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:267) at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:88) - Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.writeToFile. : java.lang.NullPointerException at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:273) at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$2.apply(PythonRDD.scala:247) at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$2.apply(PythonRDD.scala:246) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:246) at org.apache.spark.api.python.PythonRDD$.writeToFile(PythonRDD.scala:285) at org.apache.spark.api.python.PythonRDD$.writeToFile(PythonRDD.scala:280) at org.apache.spark.api.python.PythonRDD.writeToFile(PythonRDD.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.2#6252)
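A minimal, self-contained sketch (not the actual PythonRDD code) of the proposed behavior: skip null elements with a debug message instead of NPE-ing on write.
{code}
import java.io.DataOutputStream

def writeBytesSkippingNulls(iter: Iterator[Array[Byte]], out: DataOutputStream): Unit = {
  iter.foreach {
    case null =>
      // writeIteratorToStream currently throws here; skipping is the proposal
      println("DEBUG: skipping null element")
    case bytes =>
      out.writeInt(bytes.length)
      out.write(bytes)
  }
}
{code}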
[jira] [Commented] (SPARK-1576) Passing of JAVA_OPTS to YARN on command line
[ https://issues.apache.org/jira/browse/SPARK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981313#comment-13981313 ] Mridul Muralidharan commented on SPARK-1576: There is a misunderstanding here - it is to pass SPARK_JAVA_OPTS, not JAVA_OPTS. Directly passing JAVA_OPTS has been removed. Passing of JAVA_OPTS to YARN on command line Key: SPARK-1576 URL: https://issues.apache.org/jira/browse/SPARK-1576 Project: Spark Issue Type: Improvement Affects Versions: 0.9.0, 1.0.0, 0.9.1 Reporter: Nishkam Ravi Fix For: 0.9.0, 1.0.0, 0.9.1 Attachments: SPARK-1576.patch JAVA_OPTS can be passed by using either env variables (i.e., SPARK_JAVA_OPTS) or as config vars (after Patrick's recent change). It would be good to allow the user to pass them on the command line as well to restrict scope to a single application invocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1586) Fix issues with spark development under windows
[ https://issues.apache.org/jira/browse/SPARK-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981321#comment-13981321 ] Mridul Muralidharan commented on SPARK-1586: The immediate issues are fixed, though there are more Hive tests failing due to path-related issues. PR: https://github.com/apache/spark/pull/505 Fix issues with spark development under windows --- Key: SPARK-1586 URL: https://issues.apache.org/jira/browse/SPARK-1586 Project: Spark Issue Type: Bug Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1587) Fix thread leak in spark
[ https://issues.apache.org/jira/browse/SPARK-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-1587. Resolution: Fixed Fixed, https://github.com/apache/spark/pull/504 Fix thread leak in spark Key: SPARK-1587 URL: https://issues.apache.org/jira/browse/SPARK-1587 Project: Spark Issue Type: Bug Reporter: Mridul Muralidharan Assignee: Mridul Muralidharan SparkContext.stop does not cause all threads to exit. When running tests via scalatest (which keeps reusing the same vm), over time, this causes too many threads to be created causing tests to fail due to inability to create more threads. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1626) Update Spark YARN docs to use spark-submit
[ https://issues.apache.org/jira/browse/SPARK-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981325#comment-13981325 ] Thomas Graves commented on SPARK-1626: -- This is a duplicate of SPARK-1492. Update Spark YARN docs to use spark-submit -- Key: SPARK-1626 URL: https://issues.apache.org/jira/browse/SPARK-1626 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1631) App name set in SparkConf (not in JVM properties) not respected by Yarn backend
[ https://issues.apache.org/jira/browse/SPARK-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981337#comment-13981337 ] Marcelo Vanzin commented on SPARK-1631: --- PR: https://github.com/apache/spark/pull/539 App name set in SparkConf (not in JVM properties) not respected by Yarn backend --- Key: SPARK-1631 URL: https://issues.apache.org/jira/browse/SPARK-1631 Project: Spark Issue Type: Bug Components: YARN Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin When you submit an application that sets its name using a SparkContext constructor or SparkConf.setAppName(), the Yarn app name is not set and the app shows up as "Spark" in the RM UI. That's because YarnClientSchedulerBackend only looks at the system properties for the app name, instead of looking at the app's config. e.g., the app initializes like this: {code}
val sc = new SparkContext(new SparkConf().setAppName("Blah"))
{code} Start the app like this: {noformat}
./bin/spark-submit --master yarn --deploy-mode client blah blah blah
{noformat} And the app name in the RM UI does not reflect the code. -- This message was sent by Atlassian JIRA (v6.2#6252)
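A hedged sketch of the likely shape of the fix (method names from SparkConf; the actual PR may differ): prefer the application's SparkConf over JVM system properties when naming the YARN app.
{code}
import org.apache.spark.SparkConf

def yarnAppName(conf: SparkConf): String =
  conf.getOption("spark.app.name")
    .orElse(sys.props.get("spark.app.name"))
    .getOrElse("Spark") // current fallback shown in the RM UI
{code}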
[jira] [Resolved] (SPARK-1621) Update Chill to 0.3.6
[ https://issues.apache.org/jira/browse/SPARK-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1621. -- Resolution: Fixed Update Chill to 0.3.6 - Key: SPARK-1621 URL: https://issues.apache.org/jira/browse/SPARK-1621 Project: Spark Issue Type: Dependency upgrade Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Minor Fix For: 1.0.0 It registers more Scala classes, including things like Ranges that we had to register manually before. See https://github.com/twitter/chill/releases for Chill's change log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1630) PythonRDDs don't handle nulls gracefully
[ https://issues.apache.org/jira/browse/SPARK-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981458#comment-13981458 ] Kalpit Shah commented on SPARK-1630: https://github.com/apache/spark/pull/554 PythonRDDs don't handle nulls gracefully Key: SPARK-1630 URL: https://issues.apache.org/jira/browse/SPARK-1630 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 0.9.0, 0.9.1 Reporter: Kalpit Shah Fix For: 1.0.0 Original Estimate: 2h Remaining Estimate: 2h If PythonRDDs receive a null element in iterators, they currently NPE. It would be better to log a DEBUG message and skip writing null elements. Here are the two stack traces: 14/04/22 03:44:19 ERROR executor.Executor: Uncaught exception in thread Thread[stdin writer for python,5,main] java.lang.NullPointerException at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:267) at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:88) - Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.writeToFile. : java.lang.NullPointerException at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:273) at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$2.apply(PythonRDD.scala:247) at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$2.apply(PythonRDD.scala:246) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:246) at org.apache.spark.api.python.PythonRDD$.writeToFile(PythonRDD.scala:285) at org.apache.spark.api.python.PythonRDD$.writeToFile(PythonRDD.scala:280) at org.apache.spark.api.python.PythonRDD.writeToFile(PythonRDD.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:744) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1576) Passing of JAVA_OPTS to YARN on command line
[ https://issues.apache.org/jira/browse/SPARK-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981473#comment-13981473 ] Nishkam Ravi commented on SPARK-1576: - We have been using SPARK_JAVA_OPTS and JAVA_OPTS interchangeably. SPARK_JAVA_OPTS are JAVA_OPTS :) Passing of JAVA_OPTS to YARN on command line Key: SPARK-1576 URL: https://issues.apache.org/jira/browse/SPARK-1576 Project: Spark Issue Type: Improvement Affects Versions: 0.9.0, 1.0.0, 0.9.1 Reporter: Nishkam Ravi Fix For: 0.9.0, 1.0.0, 0.9.1 Attachments: SPARK-1576.patch JAVA_OPTS can be passed by using either env variables (i.e., SPARK_JAVA_OPTS) or as config vars (after Patrick's recent change). It would be good to allow the user to pass them on the command line as well to restrict scope to a single application invocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (SPARK-1395) Cannot launch jobs on Yarn cluster with local: scheme in SPARK_JAR
[ https://issues.apache.org/jira/browse/SPARK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reopened SPARK-1395: --- This is broken again. Cannot launch jobs on Yarn cluster with local: scheme in SPARK_JAR Key: SPARK-1395 URL: https://issues.apache.org/jira/browse/SPARK-1395 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Fix For: 1.0.0 If you define SPARK_JAR and friends to use local: URIs, you cannot submit a job on a Yarn cluster. e.g., I have: SPARK_JAR=local:/tmp/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-cdh5.0.0.jar SPARK_YARN_APP_JAR=local:/tmp/spark-examples-assembly-1.0.0-SNAPSHOT.jar And running SparkPi using bin/run-example yields this: 14/04/02 13:23:33 INFO yarn.Client: Preparing Local resources Exception in thread main java.io.IOException: No FileSystem for scheme: local at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.spark.deploy.yarn.ClientBase$class.org$apache$spark$deploy$yarn$ClientBase$$copyRemoteFile(ClientBase.scala:156) at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$3.apply(ClientBase.scala:217) at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$3.apply(ClientBase.scala:212) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:212) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:41) at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:76) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:81) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:129) at org.apache.spark.SparkContext.init(SparkContext.scala:226) at org.apache.spark.SparkContext.init(SparkContext.scala:96) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1633) Various examples for Scala and Java custom receiver, etc.
Tathagata Das created SPARK-1633: Summary: Various examples for Scala and Java custom receiver, etc. Key: SPARK-1633 URL: https://issues.apache.org/jira/browse/SPARK-1633 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1600) flaky test case in streaming.CheckpointSuite
[ https://issues.apache.org/jira/browse/SPARK-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981552#comment-13981552 ] Tathagata Das commented on SPARK-1600: -- Will try to address this after the 1.0 release. flaky test case in streaming.CheckpointSuite Key: SPARK-1600 URL: https://issues.apache.org/jira/browse/SPARK-1600 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 0.9.0, 1.0.0, 0.9.1 Reporter: Nan Zhu The test case "recovery with file input stream" sometimes fails when Jenkins is very busy with an unrelated change. I have hit it 3 times, and I have also seen it in other places; the latest example is in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14397/ where the modification is just in YARN-related files. I once reported it in the dev mail list: http://apache-spark-developers-list.1001551.n3.nabble.com/a-weird-test-case-in-Streaming-td6116.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1409) Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite
[ https://issues.apache.org/jira/browse/SPARK-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981555#comment-13981555 ] Tathagata Das commented on SPARK-1409: -- The test seems to run fine when launched from IntelliJ IDEA but not from sbt. I am not sure why yet. It has something to do with Akka's Props (actually the Typesafe Config it uses) not being serializable under certain conditions. I am afraid there may be two versions of Props/Config in the classpath (when running from sbt), though I haven't figured out how. The serialization error causes the test to fail every time. Some change in the last couple of months resulted in this side effect. Comparing with the Spark 0.9 branch (where the tests run fine), there is no difference in the Akka / Typesafe Config versions. The only difference I saw is in the version of sbt - 0.12.1 for branch 0.9, 0.13.2 for master. So, still no solution; bumping this to post-1.0. Flaky Test: actor input stream test in org.apache.spark.streaming.InputStreamsSuite - Key: SPARK-1409 URL: https://issues.apache.org/jira/browse/SPARK-1409 Project: Spark Issue Type: Bug Components: Streaming Reporter: Michael Armbrust Assignee: Tathagata Das Here are just a few cases: https://travis-ci.org/apache/spark/jobs/22151827 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13709/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1400) Spark Streaming's received data is not cleaned up from BlockManagers when not needed any more
[ https://issues.apache.org/jira/browse/SPARK-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das closed SPARK-1400. Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/SPARK-1592 Spark Streaming's received data is not cleaned up from BlockManagers when not needed any more - Key: SPARK-1400 URL: https://issues.apache.org/jira/browse/SPARK-1400 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 0.9.0 Reporter: Tathagata Das Spark Streaming generates BlockRDDs with the data received over the network. These data blocks are not automatically cleared; they are only evicted from memory based on LRU, which slows down processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1632) Avoid boxing in ExternalAppendOnlyMap compares
[ https://issues.apache.org/jira/browse/SPARK-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1632: -- Summary: Avoid boxing in ExternalAppendOnlyMap compares (was: Avoid boxing in ExternalAppendOnlyMap.KCComparator) Avoid boxing in ExternalAppendOnlyMap compares -- Key: SPARK-1632 URL: https://issues.apache.org/jira/browse/SPARK-1632 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Hitting an OOME in ExternalAppendOnlyMap.KCComparator while boxing an int. I don't know if this is the root cause, but the boxing is also avoidable. Code: {code} def compare(kc1: (K, C), kc2: (K, C)): Int = { kc1._1.hashCode().compareTo(kc2._1.hashCode()) } {code} Error: {code} java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.Integer.valueOf(Integer.java:642) at scala.Predef$.int2Integer(Predef.scala:370) at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:432) at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:430) at org.apache.spark.util.collection.AppendOnlyMap$$anon$3.compare(AppendOnlyMap.scala:271) at java.util.TimSort.mergeLo(TimSort.java:687) at java.util.TimSort.mergeAt(TimSort.java:483) at java.util.TimSort.mergeCollapse(TimSort.java:410) at java.util.TimSort.sort(TimSort.java:214) at java.util.Arrays.sort(Arrays.java:727) at org.apache.spark.util.collection.AppendOnlyMap.destructiveSortedIterator(AppendOnlyMap.scala:274) at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:188) at org.apache.spark.util.collection.ExternalAppendOnlyMap.insert(ExternalAppendOnlyMap.scala:141) at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95) at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471) at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102) at org.apache.spark.scheduler.Task.run(Task.scala:53) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
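A minimal sketch of an allocation-free comparison (the merged fix may differ): compare the primitive hash codes directly instead of boxing them for Integer.compareTo.
{code}
import java.util.Comparator

class KCComparator[K, C] extends Comparator[(K, C)] {
  override def compare(kc1: (K, C), kc2: (K, C)): Int = {
    val h1 = kc1._1.hashCode()
    val h2 = kc2._1.hashCode()
    if (h1 < h2) -1 else if (h1 == h2) 0 else 1 // no Integer allocation
  }
}
{code}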
[jira] [Updated] (SPARK-1634) Java API docs contain test cases
[ https://issues.apache.org/jira/browse/SPARK-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-1634: - Summary: Java API docs contain test cases (was: JavaDoc contains test cases) Java API docs contain test cases Key: SPARK-1634 URL: https://issues.apache.org/jira/browse/SPARK-1634 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Xiangrui Meng Priority: Blocker The generated Java API docs contains all test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1634) Java API docs contain test cases
[ https://issues.apache.org/jira/browse/SPARK-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-1634: - Description: The generated Java API docs contain all test cases. (was: The generated Java API docs contains all test cases.) Java API docs contain test cases Key: SPARK-1634 URL: https://issues.apache.org/jira/browse/SPARK-1634 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Xiangrui Meng Priority: Blocker The generated Java API docs contain all test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1635) Java API docs do not show annotation.
Xiangrui Meng created SPARK-1635: Summary: Java API docs do not show annotation. Key: SPARK-1635 URL: https://issues.apache.org/jira/browse/SPARK-1635 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Xiangrui Meng The generated Java API docs do not show the Developer/Experimental annotations, although the :: Developer/Experimental :: marker text from the doc comments does appear in the generated docs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1478) Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915
[ https://issues.apache.org/jira/browse/SPARK-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981583#comment-13981583 ] Ted Malaska commented on SPARK-1478: I re-cloned the code and ran a test. There is a bug in the current GitHub branch. I'm looking into it now. Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 --- Key: SPARK-1478 URL: https://issues.apache.org/jira/browse/SPARK-1478 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Ted Malaska Assignee: Ted Malaska Priority: Minor FLUME-1915 added support for compression over the wire from the Avro sink to the Avro source. I would like to add this functionality to the FlumeReceiver. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-703) KafkaWordCount example crashes with java.lang.ArrayIndexOutOfBoundsException in CheckpointRDD.scala
[ https://issues.apache.org/jira/browse/SPARK-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das closed SPARK-703. --- Resolution: Not a Problem KafkaWordCount example crashes with java.lang.ArrayIndexOutOfBoundsException in CheckpointRDD.scala --- Key: SPARK-703 URL: https://issues.apache.org/jira/browse/SPARK-703 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.7.0 Reporter: Craig A. Vanderborgh This is a bad Spark Streaming bug. The KafkaWordCount example can be used to demonstrate the problem. After a few iterations (batches), the test crashes with this stack trace during the checkpointing attempt: 3/02/22 15:26:54 INFO streaming.JobManager: Total delay: 0.02100 s for job 12 (execution: 0.01300 s) 13/02/22 15:26:54 INFO rdd.CoGroupedRDD: Adding one-to-one dependency with MappedValuesRDD[87] at apply at TraversableLike.scala:239 13/02/22 15:26:54 INFO rdd.CoGroupedRDD: Adding one-to-one dependency with MapPartitionsRDD[56] at apply at TraversableLike.scala:239 13/02/22 15:26:54 INFO rdd.CoGroupedRDD: Adding one-to-one dependency with MapPartitionsRDD[99] at apply at TraversableLike.scala:239 13/02/22 15:26:54 ERROR streaming.JobManager: Running streaming job 13 @ 1361572014000 ms failed java.lang.ArrayIndexOutOfBoundsException: 0 at spark.rdd.CheckpointRDD.getPartitions(CheckpointRDD.scala:27) at spark.RDD.partitions(RDD.scala:166) at spark.RDD.partitions(RDD.scala:166) at spark.rdd.CoGroupedRDD$$anonfun$getPartitions$1$$anonfun$apply$mcVI$sp$1.apply(CoGroupedRDD.scala:71) at spark.rdd.CoGroupedRDD$$anonfun$getPartitions$1$$anonfun$apply$mcVI$sp$1.apply(CoGroupedRDD.scala:65) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.map(TraversableLike.scala:233) at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47) at spark.rdd.CoGroupedRDD.getPartitions(CoGroupedRDD.scala:63) at spark.RDD.partitions(RDD.scala:166) at spark.MappedValuesRDD.getPartitions(PairRDDFunctions.scala:655) at spark.RDD.partitions(RDD.scala:166) at spark.rdd.CoGroupedRDD$$anonfun$getPartitions$1$$anonfun$apply$mcVI$sp$1.apply(CoGroupedRDD.scala:71) at spark.rdd.CoGroupedRDD$$anonfun$getPartitions$1$$anonfun$apply$mcVI$sp$1.apply(CoGroupedRDD.scala:65) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.map(TraversableLike.scala:233) at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47) at spark.rdd.CoGroupedRDD.getPartitions(CoGroupedRDD.scala:63) at spark.RDD.partitions(RDD.scala:166) at spark.MappedValuesRDD.getPartitions(PairRDDFunctions.scala:655) at spark.RDD.partitions(RDD.scala:166) at spark.RDD.take(RDD.scala:550) at spark.streaming.DStream$$anonfun$foreachFunc$2$1.apply(DStream.scala:522) at spark.streaming.DStream$$anonfun$foreachFunc$2$1.apply(DStream.scala:521) at spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:22) at 
spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:21) at spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:21) at spark.streaming.Job.run(Job.scala:10) at spark.streaming.JobManager$JobHandler.run(JobManager.scala:15) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) The only way I can get this test to work on a cluster is to disable checkpointing and to use reduceByKey() instead of reduceByKeyAndWindow(). Also the test works when run using local as the master. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-703) KafkaWordCount example crashes with java.lang.ArrayIndexOutOfBoundsException in CheckpointRDD.scala
[ https://issues.apache.org/jira/browse/SPARK-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981581#comment-13981581 ] Tathagata Das commented on SPARK-703: - Yes, that observation is correct. This is something we have to make clearer in the documentation. Closing this JIRA for now. KafkaWordCount example crashes with java.lang.ArrayIndexOutOfBoundsException in CheckpointRDD.scala --- Key: SPARK-703 URL: https://issues.apache.org/jira/browse/SPARK-703 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.7.0 Reporter: Craig A. Vanderborgh -- This message was sent by Atlassian JIRA (v6.2#6252)
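The failure mode above is consistent with a checkpoint directory that is not visible to every node in the cluster: CheckpointRDD.getPartitions finds no partition files, and indexing into the empty list throws ArrayIndexOutOfBoundsException: 0. A minimal sketch of the usage the documentation should call out (the 1.0-era API is used for illustration; the master URL and HDFS path are made up):
{code}
// Illustrative sketch only; host names and paths are invented.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext("spark://master:7077", "KafkaWordCount", Seconds(2))

// Window operations such as reduceByKeyAndWindow require checkpointing.
// On a cluster, the checkpoint directory must be on a filesystem that every
// node can read and write (e.g. HDFS); a node-local path only works with a
// local master, which matches the reporter's observation.
ssc.checkpoint("hdfs://namenode:8020/spark/checkpoints")
{code}
This also explains why the example runs fine with a local master: there the driver and the executors share the same local filesystem.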
[jira] [Commented] (SPARK-1478) Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915
[ https://issues.apache.org/jira/browse/SPARK-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981601#comment-13981601 ] Ted Malaska commented on SPARK-1478: Never mind, it is working now. Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 --- Key: SPARK-1478 URL: https://issues.apache.org/jira/browse/SPARK-1478 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Ted Malaska Assignee: Ted Malaska Priority: Minor FLUME-1915 added support for compression over the wire from avro sink to avro source. I would like to add this functionality to the FlumeReceiver. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1636) Move main methods to examples
Xiangrui Meng created SPARK-1636: Summary: Move main methods to examples Key: SPARK-1636 URL: https://issues.apache.org/jira/browse/SPARK-1636 Project: Spark Issue Type: Sub-task Components: MLlib Reporter: Xiangrui Meng Assignee: Xiangrui Meng Move the main methods to examples and make them compatible with spark-submit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1637) Clean up examples for 1.0
Matei Zaharia created SPARK-1637: Summary: Clean up examples for 1.0 Key: SPARK-1637 URL: https://issues.apache.org/jira/browse/SPARK-1637 Project: Spark Issue Type: Improvement Components: Examples Reporter: Matei Zaharia Priority: Critical Fix For: 1.0.0 - Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - Move Python examples into examples/src/main/python - Update docs to reflect these changes -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1598) Mark main methods experimental
[ https://issues.apache.org/jira/browse/SPARK-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-1598. -- Resolution: Duplicate We will move main methods to examples instead. Mark main methods experimental -- Key: SPARK-1598 URL: https://issues.apache.org/jira/browse/SPARK-1598 Project: Spark Issue Type: Sub-task Components: MLlib Affects Versions: 1.0.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng We should treat the parameters in the main methods as part of our APIs. They are not quite consistent at this time, so we should mark them experimental and look for a unified solution in the next sprint. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1638) Executors fail to come up if spark.executor.extraJavaOptions is set
Kalpit Shah created SPARK-1638: -- Summary: Executors fail to come up if spark.executor.extraJavaOptions is set Key: SPARK-1638 URL: https://issues.apache.org/jira/browse/SPARK-1638 Project: Spark Issue Type: Bug Components: Deploy, EC2 Environment: Bring up a cluster in EC2 using spark-ec2 scripts Reporter: Kalpit Shah Fix For: 1.0.0 If you try to launch a PySpark shell with spark.executor.extraJavaOptions set to -XX:+UseCompressedOops -XX:+UseCompressedStrings -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps, the executors never come up on any of the workers. I see the following error in log file : Spark Executor Command: /usr/lib/jvm/java/bin/java -cp /root/c3/lib/*::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar: -XX:+UseCompressedOops -XX:+UseCompressedStrings -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xms13312M -Xmx13312M org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://spark@HOSTNAME:45429/user/CoarseGrainedScheduler 7 HOSTNAME 4 akka.tcp://sparkWorker@HOSTNAME:39727/user/Worker app-20140423224526- Unrecognized VM option 'UseCompressedOops -XX:+UseCompressedStrings -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps' Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
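Judging from the executor command in the log, the entire value of spark.executor.extraJavaOptions is handed to the JVM as a single argument, so everything after the first flag is read as part of one option name. A sketch of the behavior the launcher needs (illustrative only, not Spark's actual deploy code):
{code}
// Hypothetical illustration: tokenize the raw property value into separate
// JVM arguments before appending it to the executor command line.
val extraJavaOpts = "-XX:+UseCompressedOops -XX:+UseCompressedStrings -verbose:gc" // example value

// Naive whitespace split; a real implementation would also honor quoting.
val jvmArgs: Seq[String] = extraJavaOpts.split("\\s+").filter(_.nonEmpty).toSeq

// The command should become Seq("java", "-cp", classpath) ++ jvmArgs ++ ...,
// rather than passing extraJavaOpts as one array element, which yields
// "Unrecognized VM option 'UseCompressedOops -XX:+UseCompressedStrings ...'".
{code}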
[jira] [Created] (SPARK-1640) In yarn-client mode, pass preferred node locations to AM
Sandy Ryza created SPARK-1640: - Summary: In yarn-client mode, pass preferred node locations to AM Key: SPARK-1640 URL: https://issues.apache.org/jira/browse/SPARK-1640 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 0.9.0 Reporter: Sandy Ryza In yarn-cluster mode, if the user passes preferred node location data to the SparkContext, the AM requests containers based on that data. In yarn-client mode, it would be good to do this as well. This requires some way of passing this data from the client process to the AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1641) Spark submit warning tells the user to use spark-submit
[ https://issues.apache.org/jira/browse/SPARK-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-1641: - Description: $ bin/spark-submit ... Spark assembly has been built with Hive, including Datanucleus jars on classpath WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with --master yarn This is printed in org.apache.spark.deploy.yarn.Client. was: $ bin/spark-submit ... Spark assembly has been built with Hive, including Datanucleus jars on classpath WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with --master yarn Spark submit warning tells the user to use spark-submit --- Key: SPARK-1641 URL: https://issues.apache.org/jira/browse/SPARK-1641 Project: Spark Issue Type: Improvement Reporter: Andrew Or Priority: Minor $ bin/spark-submit ... Spark assembly has been built with Hive, including Datanucleus jars on classpath WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with --master yarn This is printed in org.apache.spark.deploy.yarn.Client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-711) Spark Streaming 0.7.0: ArrayIndexOutOfBoundsException in KafkaWordCount Example
[ https://issues.apache.org/jira/browse/SPARK-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das closed SPARK-711. --- Resolution: Duplicate Duplicate issue: https://issues.apache.org/jira/browse/SPARK-703 Spark Streaming 0.7.0: ArrayIndexOutOfBoundsException in KafkaWordCount Example --- Key: SPARK-711 URL: https://issues.apache.org/jira/browse/SPARK-711 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 0.7.0 Reporter: Craig A. Vanderborgh The unmodified KafkaWordCount example is crashing when run under Mesos. It works fine when the master is local. The KafkaWordCount job works for about 5 iterations, then the exceptions start. This problem is related to windowing. Here is the stack trace: 13/03/07 15:43:46 ERROR streaming.JobManager: Running streaming job 5 @ 1362696226000 ms failed java.lang.ArrayIndexOutOfBoundsException: 0 at spark.rdd.CoGroupedRDD$$anonfun$getPartitions$1$$anonfun$apply$mcVI$sp$1.apply(CoGroupedRDD.scala:71) at spark.rdd.CoGroupedRDD$$anonfun$getPartitions$1$$anonfun$apply$mcVI$sp$1.apply(CoGroupedRDD.scala:65) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:233) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.map(TraversableLike.scala:233) at scala.collection.mutable.ArrayBuffer.map(ArrayBuffer.scala:47) at spark.rdd.CoGroupedRDD.getPartitions(CoGroupedRDD.scala:63) at spark.RDD.partitions(RDD.scala:168) at spark.MappedValuesRDD.getPartitions(PairRDDFunctions.scala:646) at spark.RDD.partitions(RDD.scala:168) at spark.RDD.take(RDD.scala:579) at spark.streaming.DStream$$anonfun$foreachFunc$2$1.apply(DStream.scala:495) at spark.streaming.DStream$$anonfun$foreachFunc$2$1.apply(DStream.scala:494) at spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:22) at spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:21) at spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:21) at spark.streaming.Job.run(Job.scala:10) at spark.streaming.JobManager$JobHandler.run(JobManager.scala:17) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Please let me know if I can help or provide additional information. Craig -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-944) Give example of writing to HBase from Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981693#comment-13981693 ] Tathagata Das commented on SPARK-944: - Do you have a working example now that you would like to contribute? Give example of writing to HBase from Spark Streaming - Key: SPARK-944 URL: https://issues.apache.org/jira/browse/SPARK-944 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Patrick Wendell Assignee: Patrick Cogan Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-944) Give example of writing to HBase from Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981693#comment-13981693 ] Tathagata Das edited comment on SPARK-944 at 4/25/14 10:10 PM: --- [~kanwal] Do you have a working example now that you would like to contribute? was (Author: tdas): Do you have a working example now that you would like to contribute? Give example of writing to HBase from Spark Streaming - Key: SPARK-944 URL: https://issues.apache.org/jira/browse/SPARK-944 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Patrick Wendell Assignee: Patrick Cogan Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
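Pending a contributed example, one plausible shape for it is sketched below. This assumes the classic pre-1.0 HBase client API, and the table name "wordcounts" plus column family "cf" are invented for illustration:
{code}
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.streaming.dstream.DStream

def saveToHBase(wordCounts: DStream[(String, Long)]): Unit = {
  wordCounts.foreachRDD { rdd =>
    // Open the (non-serializable) HBase connection per partition, on the
    // executors, rather than on the driver.
    rdd.foreachPartition { records =>
      val table = new HTable(HBaseConfiguration.create(), "wordcounts")
      records.foreach { case (word, count) =>
        val put = new Put(Bytes.toBytes(word))
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(count))
        table.put(put)
      }
      table.close() // flushes buffered puts
    }
  }
}
{code}
Creating the HTable inside foreachPartition matters because HBase connections cannot be serialized and shipped from the driver to the executors.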
[jira] [Updated] (SPARK-1639) Some tidying of Spark on YARN ApplicationMaster and ExecutorLauncher
[ https://issues.apache.org/jira/browse/SPARK-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1639: -- Summary: Some tidying of Spark on YARN ApplicationMaster and ExecutorLauncher (was: Some tidying of Spark on YARN code) Some tidying of Spark on YARN ApplicationMaster and ExecutorLauncher Key: SPARK-1639 URL: https://issues.apache.org/jira/browse/SPARK-1639 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 0.9.0 Reporter: Sandy Ryza Assignee: Sandy Ryza I found a few places where we can consolidate duplicate methods, fix typos, add comments, and make what's going on more clear. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1637) Clean up examples for 1.0
[ https://issues.apache.org/jira/browse/SPARK-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1637: - Description: - Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - Move Python examples into examples/src/main/python - Update docs to reflect these changes - Clarify that the hand-written K-means and logistic regression examples are for demo purposes, but in reality you might want to use MLlib (we will add examples for these using MLlib too) was: - Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - Move Python examples into examples/src/main/python - Update docs to reflect these changes Clean up examples for 1.0 - Key: SPARK-1637 URL: https://issues.apache.org/jira/browse/SPARK-1637 Project: Spark Issue Type: Improvement Components: Examples Reporter: Matei Zaharia Priority: Critical Fix For: 1.0.0 - Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib) - Move Python examples into examples/src/main/python - Update docs to reflect these changes - Clarify that the hand-written K-means and logistic regression examples are for demo purposes, but in reality you might want to use MLlib (we will add examples for these using MLlib too) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1235) DAGScheduler ignores exceptions thrown in handleTaskCompletion
[ https://issues.apache.org/jira/browse/SPARK-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1235: - Affects Version/s: (was: 1.0.0) DAGScheduler ignores exceptions thrown in handleTaskCompletion -- Key: SPARK-1235 URL: https://issues.apache.org/jira/browse/SPARK-1235 Project: Spark Issue Type: Bug Affects Versions: 0.9.0, 0.9.1 Reporter: Kay Ousterhout Assignee: Nan Zhu Priority: Blocker Fix For: 1.0.0 If an exception gets thrown in the handleTaskCompletion method, the method exits, but the exception is caught somewhere (not clear where) and the DAGScheduler keeps running. Jobs hang as a result -- because not all of the task completion code gets run. This was first reported by Brad Miller on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-pyspark-crash-on-mesos-td2256.html and this behavior seems to have changed since 0.8 (when, based on Brad's description, it sounds like an exception in handleTaskCompletion would cause the DAGScheduler to crash), suggesting that this may be related to the Scala 2.10.3 upgrade. To reproduce this problem, add throw new Exception("foo") anywhere in handleTaskCompletion and run any job locally. The job will hang and you can see the exception get printed in the logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1235) DAGScheduler ignores exceptions thrown in handleTaskCompletion
[ https://issues.apache.org/jira/browse/SPARK-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1235. -- Resolution: Fixed Fix Version/s: 1.0.0 Resolved in https://github.com/apache/spark/pull/186. DAGScheduler ignores exceptions thrown in handleTaskCompletion -- Key: SPARK-1235 URL: https://issues.apache.org/jira/browse/SPARK-1235 Project: Spark Issue Type: Bug Affects Versions: 0.9.0, 0.9.1 Reporter: Kay Ousterhout Assignee: Nan Zhu Priority: Blocker Fix For: 1.0.0 If an exception gets thrown in the handleTaskCompletion method, the method exits, but the exception is caught somewhere (not clear where) and the DAGScheduler keeps running. Jobs hang as a result -- because not all of the task completion code gets run. This was first reported by Brad Miller on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-pyspark-crash-on-mesos-td2256.html and this behavior seems to have changed since 0.8 (when, based on Brad's description, it sounds like an exception in handleTaskCompletion would cause the DAGScheduler to crash), suggesting that this may be related to the Scala 2.10.3 upgrade. To reproduce this problem, add throw new Exception("foo") anywhere in handleTaskCompletion and run any job locally. The job will hang and you can see the exception get printed in the logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1620) Uncaught exception from Akka scheduler
[ https://issues.apache.org/jira/browse/SPARK-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamstra closed SPARK-1620. --- Resolution: Invalid Uncaught exception from Akka scheduler -- Key: SPARK-1620 URL: https://issues.apache.org/jira/browse/SPARK-1620 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Mark Hamstra Priority: Blocker I've been looking at this one in the context of a BlockManagerMaster that OOMs and doesn't respond to heartBeat(), but I suspect that there may be problems elsewhere where we use Akka's scheduler. The basic nature of the problem is that we are expecting exceptions thrown from a scheduled function to be caught in the thread where _ActorSystem_.scheduler.schedule() or scheduleOnce() has been called. In fact, the scheduled function runs on its own thread, so any exceptions that it throws are not caught in the thread that called schedule() -- e.g., unanswered BlockManager heartBeats (scheduled in BlockManager#initialize) that end up throwing exceptions in BlockManagerMaster#askDriverWithReply do not cause those exceptions to be handled by the Executor thread's UncaughtExceptionHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
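The pitfall is easy to reproduce outside Spark. In the sketch below (a standalone illustration, not Spark code), the scheduled body runs on a dispatcher thread, so neither the surrounding try/catch nor the calling thread's UncaughtExceptionHandler ever sees the exception:
{code}
import scala.concurrent.duration._
import akka.actor.ActorSystem

val system = ActorSystem("demo")
import system.dispatcher // implicit ExecutionContext for the scheduler

try {
  system.scheduler.scheduleOnce(1.second) {
    // Thrown on a scheduler thread; swallowed by the dispatcher,
    // never caught by the block below.
    throw new RuntimeException("e.g. an unanswered heartBeat")
  }
} catch {
  case e: Exception => println("never reached for the scheduled body")
}
{code}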
[jira] [Resolved] (SPARK-1632) Avoid boxing in ExternalAppendOnlyMap compares
[ https://issues.apache.org/jira/browse/SPARK-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1632. Resolution: Fixed Fix Version/s: 1.0.0 Avoid boxing in ExternalAppendOnlyMap compares -- Key: SPARK-1632 URL: https://issues.apache.org/jira/browse/SPARK-1632 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 1.0.0 Hitting an OOME in ExternalAppendOnlyMap.KCComparator while boxing an int. I don't know if this is the root cause, but the boxing is also avoidable. Code: {code} def compare(kc1: (K, C), kc2: (K, C)): Int = { kc1._1.hashCode().compareTo(kc2._1.hashCode()) } {code} Error: {code} java.lang.OutOfMemoryError: GC overhead limit exceeded at java.lang.Integer.valueOf(Integer.java:642) at scala.Predef$.int2Integer(Predef.scala:370) at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:432) at org.apache.spark.util.collection.ExternalAppendOnlyMap$KCComparator.compare(ExternalAppendOnlyMap.scala:430) at org.apache.spark.util.collection.AppendOnlyMap$$anon$3.compare(AppendOnlyMap.scala:271) at java.util.TimSort.mergeLo(TimSort.java:687) at java.util.TimSort.mergeAt(TimSort.java:483) at java.util.TimSort.mergeCollapse(TimSort.java:410) at java.util.TimSort.sort(TimSort.java:214) at java.util.Arrays.sort(Arrays.java:727) at org.apache.spark.util.collection.AppendOnlyMap.destructiveSortedIterator(AppendOnlyMap.scala:274) at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:188) at org.apache.spark.util.collection.ExternalAppendOnlyMap.insert(ExternalAppendOnlyMap.scala:141) at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95) at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471) at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102) at org.apache.spark.scheduler.Task.run(Task.scala:53) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
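For reference, a boxing-free variant of the quoted comparator is straightforward (a sketch, not necessarily the exact change that was merged): compare the raw Int hash codes directly instead of boxing them to call Integer.compareTo.
{code}
// Avoids Integer.valueOf on every comparison during the sort.
def compare[K, C](kc1: (K, C), kc2: (K, C)): Int = {
  val hash1 = kc1._1.hashCode()
  val hash2 = kc2._1.hashCode()
  if (hash1 < hash2) -1 else if (hash1 == hash2) 0 else 1
}
{code}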
[jira] [Commented] (SPARK-1299) making comments of RDD.doCheckpoint consistent with its usage
[ https://issues.apache.org/jira/browse/SPARK-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981819#comment-13981819 ] Nan Zhu commented on SPARK-1299: addressed in https://github.com/apache/spark/pull/186 making comments of RDD.doCheckpoint consistent with its usage - Key: SPARK-1299 URL: https://issues.apache.org/jira/browse/SPARK-1299 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Priority: Trivial Fix For: 1.0.0 another trivial thing I found occasionally: the comment on the function says /** * Performs the checkpointing of this RDD by saving this. It is called by the DAGScheduler * after a job using this RDD has completed (therefore the RDD has been materialized and * potentially stored in memory). doCheckpoint() is called recursively on the parent RDDs. */ but actually this function is called in SparkContext.runJob. We can either change the comment or call it in DAGScheduler; I personally prefer the latter, as this call seems like an auto-checkpoint and is better placed in a non-user-facing component -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1299) making comments of RDD.doCheckpoint consistent with its usage
[ https://issues.apache.org/jira/browse/SPARK-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu resolved SPARK-1299. Resolution: Fixed Fix Version/s: 1.0.0 making comments of RDD.doCheckpoint consistent with its usage - Key: SPARK-1299 URL: https://issues.apache.org/jira/browse/SPARK-1299 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Priority: Trivial Fix For: 1.0.0 another trivial thing I found occasionally: the comment on the function says /** * Performs the checkpointing of this RDD by saving this. It is called by the DAGScheduler * after a job using this RDD has completed (therefore the RDD has been materialized and * potentially stored in memory). doCheckpoint() is called recursively on the parent RDDs. */ but actually this function is called in SparkContext.runJob. We can either change the comment or call it in DAGScheduler; I personally prefer the latter, as this call seems like an auto-checkpoint and is better placed in a non-user-facing component -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1298) Remove duplicate partition id checking
[ https://issues.apache.org/jira/browse/SPARK-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu resolved SPARK-1298. Resolution: Fixed Fix Version/s: 1.0.0 Remove duplicate partition id checking -- Key: SPARK-1298 URL: https://issues.apache.org/jira/browse/SPARK-1298 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Nan Zhu Assignee: Nan Zhu Priority: Minor Fix For: 1.0.0 In the current implementation, we check whether partitionIDs make sense in SparkContext.runJob() https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L896 However, immediately following, in DAGScheduler (calling path SparkContext.runJob - DAGScheduler.runJob - DAGScheduler.submitJob), we check it again, (just missing a 0 condition), https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L432 I propose to remove the SparkContext one and check it in DAGScheduler (which makes more sense, from my view) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1356) [STREAMING] Annotate developer and experimental API's
[ https://issues.apache.org/jira/browse/SPARK-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-1356. -- Resolution: Fixed Assignee: Tathagata Das [STREAMING] Annotate developer and experimental API's - Key: SPARK-1356 URL: https://issues.apache.org/jira/browse/SPARK-1356 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Blocker Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1634) Java API docs contain test cases
[ https://issues.apache.org/jira/browse/SPARK-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng closed SPARK-1634. Resolution: Not a Problem Fix Version/s: 1.0.0 Assignee: Xiangrui Meng Re-tried with `sbt/sbt clean`. The generated docs for test cases were gone. Java API docs contain test cases Key: SPARK-1634 URL: https://issues.apache.org/jira/browse/SPARK-1634 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.0.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.0 The generated Java API docs contain all test cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1558) [streaming] Update receiver information to match it with code
[ https://issues.apache.org/jira/browse/SPARK-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das closed SPARK-1558. Resolution: Invalid Irrelevant after underlying code changes. [streaming] Update receiver information to match it with code - Key: SPARK-1558 URL: https://issues.apache.org/jira/browse/SPARK-1558 Project: Spark Issue Type: Sub-task Components: Documentation, Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (SPARK-1606) spark-submit needs `--arg` for every application parameter
[ https://issues.apache.org/jira/browse/SPARK-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reassigned SPARK-1606: -- Assignee: Patrick Wendell spark-submit needs `--arg` for every application parameter -- Key: SPARK-1606 URL: https://issues.apache.org/jira/browse/SPARK-1606 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Xiangrui Meng Assignee: Patrick Wendell If the application has a few parameters, the spark-submit command looks like the following: {code} spark-submit --master yarn-cluster --class main.Class --arg --numPartitions --arg 8 --arg --kryo --arg true {code} It is a little bit hard to read and modify. Maybe it is okay to treat all arguments after `main.Class` as application parameters. {code} spark-submit --master yarn-cluster --class main.Class --numPartitions 8 --kryo true {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1606) spark-submit needs `--arg` for every application parameter
[ https://issues.apache.org/jira/browse/SPARK-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981851#comment-13981851 ] Patrick Wendell commented on SPARK-1606: I submitted a PR here that adds the following syntax. {code} ./bin/spark-submit [options] user.jar [user options] {code} spark-submit needs `--arg` for every application parameter -- Key: SPARK-1606 URL: https://issues.apache.org/jira/browse/SPARK-1606 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Xiangrui Meng Assignee: Patrick Wendell Priority: Blocker If the application has a few parameters, the spark-submit command looks like the following: {code} spark-submit --master yarn-cluster --class main.Class --arg --numPartitions --arg 8 --arg --kryo --arg true {code} It is a little bit hard to read and modify. Maybe it is okay to treat all arguments after `main.Class` as application parameters. {code} spark-submit --master yarn-cluster --class main.Class --numPartitions 8 --kryo true {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1606) spark-submit needs `--arg` for every application parameter
[ https://issues.apache.org/jira/browse/SPARK-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981851#comment-13981851 ] Patrick Wendell edited comment on SPARK-1606 at 4/26/14 2:26 AM: - I submitted a PR here that adds the following syntax. {code} ./bin/spark-submit [options] user.jar [user options] {code} https://github.com/apache/spark/pull/563 was (Author: pwendell): I submitted a PR here that adds the following syntax. {code} ./bin/spark-submit [options] user.jar [user options] {code} spark-submit needs `--arg` for every application parameter -- Key: SPARK-1606 URL: https://issues.apache.org/jira/browse/SPARK-1606 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Xiangrui Meng Assignee: Patrick Wendell Priority: Blocker If the application has a few parameters, the spark-submit command looks like the following: {code} spark-submit --master yarn-cluster --class main.Class --arg --numPartitions --arg 8 --arg --kryo --arg true {code} It is a little bit hard to read and modify. Maybe it is okay to treat all arguments after `main.Class` as application parameters. {code} spark-submit --master yarn-cluster --class main.Class --numPartitions 8 --kryo true {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1478) Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915
[ https://issues.apache.org/jira/browse/SPARK-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981855#comment-13981855 ] Ted Malaska commented on SPARK-1478: No worries, that error was caused by me. Still learning Scala. It was the difference between using a lazy val and a var. I have all three test cases working now and I will do one last review before submitting it tomorrow. Now there is also one more odd thing going on that I haven't figured out yet. Sometimes (seemingly randomly) my tests will fail with the following exception: [info] - flume input stream *** FAILED *** (10 seconds, 332 milliseconds) [info] 0 did not equal 1 (FlumeStreamSuite.scala:104) [info] org.scalatest.exceptions.TestFailedException: Then I will rerun the test with no code changes and they will succeed. It feels very much like a race condition. Note I found this so odd that I did a fresh git clone and tested the latest branch, and I was also able to get the exception. I will look into this tomorrow. I would assume at this point that something is odd in my environment until I find evidence of it being anything else. Thank you again for the help. Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 --- Key: SPARK-1478 URL: https://issues.apache.org/jira/browse/SPARK-1478 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Ted Malaska Assignee: Ted Malaska Priority: Minor FLUME-1915 added support for compression over the wire from avro sink to avro source. I would like to add this functionality to the FlumeReceiver. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1478) Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915
[ https://issues.apache.org/jira/browse/SPARK-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981856#comment-13981856 ] Tathagata Das commented on SPARK-1478: -- Haha, yeah, lazy vals are super useful in difficult situations but can lead to difficult situations themselves if you're not careful. :) I am not sure where the flakiness is coming from, but that really needs to be solved. Flakiness can really be a major headache in our automated tests in Jenkins, etc. I am suffering from flakiness myself in two PRs. :( Let me know how I can help with this. Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 --- Key: SPARK-1478 URL: https://issues.apache.org/jira/browse/SPARK-1478 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Ted Malaska Assignee: Ted Malaska Priority: Minor FLUME-1915 added support for compression over the wire from avro sink to avro source. I would like to add this functionality to the FlumeReceiver. -- This message was sent by Atlassian JIRA (v6.2#6252)
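A common way to tame this kind of timing flakiness is to poll for the expected condition until a deadline rather than asserting once after a fixed sleep. The sketch below is a generic helper, not the actual FlumeStreamSuite fix; receivedEvents and expectedEvents in the usage comment are hypothetical names:
{code}
// Re-evaluates the condition until it holds or the deadline passes.
def eventually(timeoutMs: Long, intervalMs: Long)(condition: => Boolean): Boolean = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var ok = condition
  while (!ok && System.currentTimeMillis() < deadline) {
    Thread.sleep(intervalMs)
    ok = condition
  }
  ok
}

// e.g. assert(eventually(10000, 100) { receivedEvents.size == expectedEvents.size })
{code}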