[jira] [Commented] (SPARK-1764) EOF reached before Python server acknowledged
[ https://issues.apache.org/jira/browse/SPARK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995484#comment-13995484 ] Bernardo Gomez Palacio commented on SPARK-1764: --- We just ran `sc.parallelize(range(100)).map(lambda n: n * 2).collect()` on a Mesos 0.18.1 cluster with the latest Spark and it worked. Could you confirm the Spark and Mesos versions you are using (if using master, please include the sha/commit hash). EOF reached before Python server acknowledged - Key: SPARK-1764 URL: https://issues.apache.org/jira/browse/SPARK-1764 Project: Spark Issue Type: Bug Components: Mesos, PySpark Affects Versions: 1.0.0 Reporter: Bouke van der Bijl Priority: Blocker Labels: mesos, pyspark I'm getting EOF reached before Python server acknowledged while using PySpark on Mesos. The error manifests itself in multiple ways. One is: 14/05/08 18:10:40 ERROR DAGSchedulerActorSupervisor: eventProcesserActor failed due to the error EOF reached before Python server acknowledged; shutting down SparkContext And the other has a full stacktrace: 14/05/08 18:03:06 ERROR OneForOneStrategy: EOF reached before Python server acknowledged org.apache.spark.SparkException: EOF reached before Python server acknowledged at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:416) at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:387) at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:71) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:279) at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:277) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.Accumulators$.add(Accumulators.scala:277) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:818) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1204) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) This error causes the SparkContext to shut down. I have not been able to reliably reproduce this bug; it seems to happen randomly, but if you run enough tasks on a SparkContext it'll happen eventually -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1780) Non-existent SPARK_DAEMON_OPTS is referred to in a few places
[ https://issues.apache.org/jira/browse/SPARK-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995902#comment-13995902 ] Andrew Or commented on SPARK-1780: -- https://github.com/apache/spark/pull/751 Non-existent SPARK_DAEMON_OPTS is referred to in a few places - Key: SPARK-1780 URL: https://issues.apache.org/jira/browse/SPARK-1780 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Reporter: Andrew Or Fix For: 1.0.0 SparkConf.scala and spark-env.sh refer to a non-existent SPARK_DAEMON_OPTS. What they really mean is SPARK_DAEMON_JAVA_OPTS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1161) Add saveAsObjectFile and SparkContext.objectFile in Python
[ https://issues.apache.org/jira/browse/SPARK-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996081#comment-13996081 ] Kan Zhang commented on SPARK-1161: -- PR: https://github.com/apache/spark/pull/755 Add saveAsObjectFile and SparkContext.objectFile in Python -- Key: SPARK-1161 URL: https://issues.apache.org/jira/browse/SPARK-1161 Project: Spark Issue Type: New Feature Components: PySpark Reporter: Matei Zaharia Assignee: Kan Zhang It can use pickling for serialization and a SequenceFile on disk similar to the JVM versions of these. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1802) Audit dependency graph when Spark is built with -Phive
[ https://issues.apache.org/jira/browse/SPARK-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995815#comment-13995815 ] Sean Owen commented on SPARK-1802: -- I looked further into just what might go wrong by including hive-exec into the assembly, since it includes its dependencies directly (i.e. Maven can't manage around it.) Attached is a full dump of the conflicts. The ones that are potential issues appear to be the following, and one looks like it could be a deal-breaker -- protobuf -- since it's neither forwards nor backwards compatible. That is, I recommend testing this assembly with an older Hadoop that needs 2.4.1 and see if it croaks. The rest might be worked around but need some additional mojo to make sure the right version wins in the packaging. Certainly having hive-exec in the build is making me queasy!
[WARNING] hive-exec-0.12.0.jar, libthrift-0.9.0.jar define 153 overlapping classes: HBase includes libthrift-0.8.0, but it's in examples, and so I figure this is ignorable.
[WARNING] hive-exec-0.12.0.jar, commons-lang-2.4.jar define 2 overlapping classes: Probably ignorable, but we have to make sure commons-lang-3.3.2 'wins' in the build.
[WARNING] hive-exec-0.12.0.jar, jackson-core-asl-1.9.11.jar define 117 overlapping classes:
[WARNING] hive-exec-0.12.0.jar, jackson-mapper-asl-1.8.8.jar define 432 overlapping classes: I believe these are ignorable. (Not sure why the Jackson versions are mismatched; another todo.)
[WARNING] hive-exec-0.12.0.jar, guava-14.0.1.jar define 1087 overlapping classes: Should be OK. Hive uses 11.0.2 like Hadoop; the build is already taking that particular risk. We need 14.0.1 to win.
[WARNING] hive-exec-0.12.0.jar, protobuf-java-2.4.1.jar define 204 overlapping classes: Oof. Hive has protobuf 2.5.0. This has got to be a problem for older Hadoop builds?
Audit dependency graph when Spark is built with -Phive -- Key: SPARK-1802 URL: https://issues.apache.org/jira/browse/SPARK-1802 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Sean Owen Priority: Blocker Fix For: 1.0.0 I'd like to have the binary release for 1.0 include Hive support. Since this isn't enabled by default in the build I don't think it's as well tested, so we should dig around a bit and decide if we need to e.g. add any excludes.
{code}
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > without_hive.txt
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -Phive -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > with_hive.txt
$ diff without_hive.txt with_hive.txt
antlr-2.7.7.jar antlr-3.4.jar antlr-runtime-3.4.jar
10,14d6 avro-1.7.4.jar avro-ipc-1.7.4.jar avro-ipc-1.7.4-tests.jar avro-mapred-1.7.4.jar bonecp-0.7.1.RELEASE.jar
22d13 commons-cli-1.2.jar
25d15 commons-compress-1.4.1.jar
33,34d22 commons-logging-1.1.1.jar commons-logging-api-1.0.4.jar
38d25 commons-pool-1.5.4.jar
46,49d32 datanucleus-api-jdo-3.2.1.jar datanucleus-core-3.2.2.jar datanucleus-rdbms-3.2.1.jar derby-10.4.2.0.jar
53,57d35 hive-common-0.12.0.jar hive-exec-0.12.0.jar hive-metastore-0.12.0.jar hive-serde-0.12.0.jar hive-shims-0.12.0.jar
60,61d37 httpclient-4.1.3.jar httpcore-4.1.3.jar
68d43 JavaEWAH-0.3.2.jar
73d47 javolution-5.5.1.jar
76d49 jdo-api-3.0.1.jar
78d50 jetty-6.1.26.jar
87d58 jetty-util-6.1.26.jar
93d63 json-20090211.jar
98d67 jta-1.1.jar
103,104d71 libfb303-0.9.0.jar libthrift-0.9.0.jar
112d78 mockito-all-1.8.5.jar
136d101 servlet-api-2.5-20081211.jar
139d103 snappy-0.2.jar
144d107 spark-hive_2.10-1.0.0.jar
151d113 ST4-4.0.4.jar
153d114 stringtemplate-3.2.1.jar
156d116 velocity-1.7.jar
158d117 xz-1.0.jar
{code}
Some initial investigation suggests we may need to take some precaution surrounding (a) jetty and (b) servlet-api. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1753) PySpark on YARN does not work on assembly jar built on Red Hat based OS
[ https://issues.apache.org/jira/browse/SPARK-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1753. Resolution: Fixed Fix Version/s: (was: 1.0.1) 1.0.0 Issue resolved by pull request 701 [https://github.com/apache/spark/pull/701] PySpark on YARN does not work on assembly jar built on Red Hat based OS --- Key: SPARK-1753 URL: https://issues.apache.org/jira/browse/SPARK-1753 Project: Spark Issue Type: Bug Components: PySpark, Spark Core Affects Versions: 1.0.0 Reporter: Andrew Or Fix For: 1.0.0 If the jar is built on a Red Hat based OS, the additional python files included in the jar cannot be accessed. This means PySpark doesn't work on YARN because in this mode it relies on the python files within this jar. I have confirmed that my Java, Scala, and maven versions are all exactly the same on my CentOS environment and on my local OSX environment, and the former does not work. Thomas Graves also struggled with the same problem. Until a fix is found, we should at the very least document this peculiarity. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1756) Add missing description to spark-env.sh.template
[ https://issues.apache.org/jira/browse/SPARK-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992535#comment-13992535 ] Guoqiang Li commented on SPARK-1756: [PR 646|https://github.com/apache/spark/pull/646] Add missing description to spark-env.sh.template Key: SPARK-1756 URL: https://issues.apache.org/jira/browse/SPARK-1756 Project: Spark Issue Type: Sub-task Components: Spark Core, YARN Reporter: Guoqiang Li Assignee: Guoqiang Li Fix For: 1.0.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1817) RDD zip erroneous when partitions do not divide RDD count
Michael Malak created SPARK-1817: Summary: RDD zip erroneous when partitions do not divide RDD count Key: SPARK-1817 URL: https://issues.apache.org/jira/browse/SPARK-1817 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: Michael Malak Example: scala> sc.parallelize(1L to 2L,4).zip(sc.parallelize(11 to 12,4)).collect res1: Array[(Long, Int)] = Array((2,11)) But more generally, it's whenever the number of partitions does not evenly divide the total number of elements in the RDD. See https://groups.google.com/forum/#!msg/spark-users/demrmjHFnoc/Ek3ijiXHr2MJ -- This message was sent by Atlassian JIRA (v6.2#6252)
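To make the failure concrete, a minimal sketch (assuming a spark-shell session where sc is the REPL's SparkContext):
{code}
// Two 2-element RDDs sliced across 4 partitions, so some partitions are empty.
val left  = sc.parallelize(1L to 2L, 4)
val right = sc.parallelize(11 to 12, 4)

// Expected: Array((1,11), (2,12))
// Observed in this report: Array((2,11))
val zipped = left.zip(right).collect()
{code}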
[jira] [Commented] (SPARK-1616) input file not found issue
[ https://issues.apache.org/jira/browse/SPARK-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994044#comment-13994044 ] Marcelo Vanzin commented on SPARK-1616: --- Hi Prasad, This doesn't really sound like a bug, but a mismatch between your expectations and Spark's. When you tell a Spark job to read data from a file, Spark expects the file to be available to all the workers. This can be achieved in several ways:
* Using a distributed file system such as HDFS
* Using a networked file system such as NFS
* Using Spark's file distribution mechanism, which will copy the file to the workers for you (e.g. spark-submit's --files argument if you run 1.0; see the sketch below)
* Manually copying the file like you did
But Spark will not automatically copy data to worker nodes on your behalf. input file not found issue --- Key: SPARK-1616 URL: https://issues.apache.org/jira/browse/SPARK-1616 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 0.9.0 Environment: Linux 2.6.18-348.3.1.el5 Reporter: prasad potipireddi -- This message was sent by Atlassian JIRA (v6.2#6252)
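As an illustration of the file-distribution option above, a hedged sketch using the programmatic API (SparkContext.addFile ships a driver-local file to each worker, and SparkFiles.get resolves the local copy there; the file path is a placeholder):
{code}
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

val sc = new SparkContext(new SparkConf().setAppName("file-distribution-sketch"))

// Ship a driver-local file to every worker node.
sc.addFile("/path/on/driver/input.txt") // placeholder path

// On the workers, resolve the shipped copy by file name and read it.
val lineCounts = sc.parallelize(1 to 4, 4).mapPartitions { _ =>
  Iterator(scala.io.Source.fromFile(SparkFiles.get("input.txt")).getLines().size)
}.collect()
{code}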
[jira] [Commented] (SPARK-1818) Freshen Mesos docs
[ https://issues.apache.org/jira/browse/SPARK-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996146#comment-13996146 ] Andrew Ash commented on SPARK-1818: --- https://github.com/apache/spark/pull/756 Freshen Mesos docs -- Key: SPARK-1818 URL: https://issues.apache.org/jira/browse/SPARK-1818 Project: Spark Issue Type: Documentation Components: Documentation, Mesos Affects Versions: 1.0.0 Reporter: Andrew Ash They haven't been updated since 0.6.0 and encourage compiling both Mesos and Spark from scratch. Include mention of the precompiled binary versions of both projects available and otherwise generally freshen the documentation for Mesos newcomers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1780) Non-existent SPARK_DAEMON_OPTS is referred to in a few places
[ https://issues.apache.org/jira/browse/SPARK-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1780. Resolution: Fixed Assignee: Andrew Or (was: Patrick Wendell) Non-existent SPARK_DAEMON_OPTS is referred to in a few places - Key: SPARK-1780 URL: https://issues.apache.org/jira/browse/SPARK-1780 Project: Spark Issue Type: Bug Affects Versions: 0.9.1 Reporter: Andrew Or Assignee: Andrew Or Fix For: 1.0.0 SparkConf.scala and spark-env.sh refer to a non-existent SPARK_DAEMON_OPTS. What they really mean is SPARK_DAEMON_JAVA_OPTS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1802) Audit dependency graph when Spark is built with -Phive
[ https://issues.apache.org/jira/browse/SPARK-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996178#comment-13996178 ] Patrick Wendell commented on SPARK-1802: This protobuf thing is very troubling. The options here are pretty limited since they publish this assembly jar. I see a few:
1. Publish a Hive 0.12 that uses our shaded protobuf 2.4.1 (we already published a shaded version of protobuf 2.4.1). I actually have this working in a local build of Hive 0.12, but I haven't tried to push it to sonatype yet: https://github.com/pwendell/hive/commits/branch-0.12-shaded-protobuf
2. Upgrade our use of Hive to 0.13 (which bumps to protobuf 2.5.0) and only support Spark SQL with Hadoop 2+ - that is, versions of Hadoop that have also bumped to protobuf 2.5.0. I'm not sure how big of an effort that would be in terms of the code changes between 0.12 and 0.13. Spark didn't recompile trivially. I can talk to Michael Armbrust tomorrow morning about this.
One thing I don't totally understand is how Hive itself deals with this conflict. For instance, if someone wants to run Hive 0.12 with Hadoop 2, presumably both the Hive protobuf 2.4.1 and the HDFS client protobuf 2.5.0 will be in the JVM at the same time... I'm not sure how they are isolated from each other. HDP 2.1, for instance, seems to have both (http://hortonworks.com/hdp/whats-new/)
Audit dependency graph when Spark is built with -Phive -- Key: SPARK-1802 URL: https://issues.apache.org/jira/browse/SPARK-1802 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Sean Owen Priority: Blocker Fix For: 1.0.0 Attachments: hive-exec-jar-problems.txt I'd like to have the binary release for 1.0 include Hive support. Since this isn't enabled by default in the build I don't think it's as well tested, so we should dig around a bit and decide if we need to e.g. add any excludes. {code}
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > without_hive.txt
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -Phive -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > with_hive.txt
$ diff without_hive.txt with_hive.txt
antlr-2.7.7.jar antlr-3.4.jar antlr-runtime-3.4.jar
10,14d6 avro-1.7.4.jar avro-ipc-1.7.4.jar avro-ipc-1.7.4-tests.jar avro-mapred-1.7.4.jar bonecp-0.7.1.RELEASE.jar
22d13 commons-cli-1.2.jar
25d15 commons-compress-1.4.1.jar
33,34d22 commons-logging-1.1.1.jar commons-logging-api-1.0.4.jar
38d25 commons-pool-1.5.4.jar
46,49d32 datanucleus-api-jdo-3.2.1.jar datanucleus-core-3.2.2.jar datanucleus-rdbms-3.2.1.jar derby-10.4.2.0.jar
53,57d35 hive-common-0.12.0.jar hive-exec-0.12.0.jar hive-metastore-0.12.0.jar hive-serde-0.12.0.jar hive-shims-0.12.0.jar
60,61d37 httpclient-4.1.3.jar httpcore-4.1.3.jar
68d43 JavaEWAH-0.3.2.jar
73d47 javolution-5.5.1.jar
76d49 jdo-api-3.0.1.jar
78d50 jetty-6.1.26.jar
87d58 jetty-util-6.1.26.jar
93d63 json-20090211.jar
98d67 jta-1.1.jar
103,104d71 libfb303-0.9.0.jar libthrift-0.9.0.jar
112d78 mockito-all-1.8.5.jar
136d101 servlet-api-2.5-20081211.jar
139d103 snappy-0.2.jar
144d107 spark-hive_2.10-1.0.0.jar
151d113 ST4-4.0.4.jar
153d114 stringtemplate-3.2.1.jar
156d116 velocity-1.7.jar
158d117 xz-1.0.jar
{code}
Some initial investigation suggests we may need to take some precaution surrounding (a) jetty and (b) servlet-api. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1802) Audit dependency graph when Spark is built with -Phive
[ https://issues.apache.org/jira/browse/SPARK-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996178#comment-13996178 ] Patrick Wendell edited comment on SPARK-1802 at 5/13/14 8:18 AM: - This protobuf thing is very troubling. The options here are pretty limited since they publish this assembly jar. I see a few:
1. Publish a Hive 0.12 that uses our shaded protobuf 2.4.1 (we already published a shaded version of protobuf 2.4.1). I actually have this working in a local build of Hive 0.12, but I haven't tried to push it to sonatype yet: https://github.com/pwendell/hive/commits/branch-0.12-shaded-protobuf
2. Upgrade our use of Hive to 0.13 (which bumps to protobuf 2.5.0) and only support Spark SQL with Hadoop 2+ - that is, versions of Hadoop that have also bumped to protobuf 2.5.0. I'm not sure how big of an effort that would be in terms of the code changes between 0.12 and 0.13. Spark didn't recompile trivially. I can talk to Michael Armbrust tomorrow morning about this.
One thing I don't totally understand is how Hive itself deals with this conflict. For instance, if someone wants to run Hive 0.12 with Hadoop 2, presumably both the Hive protobuf 2.4.1 and the HDFS client protobuf 2.5.0 will be in the JVM at the same time... I'm not sure how they are isolated from each other. HDP 2.1, for instance, seems to have both (http://hortonworks.com/hdp/whats-new/)
was (Author: pwendell): This protobuf thing is very troubling. The options here are pretty limited since they publish this assembly jar. I see a few:
1. Publish a Hive 0.12 that users our shaded protobuf 2.4.1 (we already published a shaded version of protobuf 2.4.1). I actually have this working in a local build of Hive 0.12, but I haven't tried to push it to sonatype yet: https://github.com/pwendell/hive/commits/branch-0.12-shaded-protobuf
2. Upgrade our use of Hive to 0.13 (which bumps to protobuf 2.5.0) and only support Spark SQL with Hadoop 2+ - that is, versions of Hadoop that have also bumped to protobuf 2.5.0. I'm not sure how big of an effort that would be in terms of the code changes between 0.12 and 0.13. Spark didn't recompile trivially. I can talk to Michael Armbrust tomorrow morning about this.
One thing I don't totally understand is how Hive itself deals with this conflict. For instance, if someone wants to run Hive 0.12 with Hadoop 2, presumably both the Hive protobuf 2.4.1 and the HDFS client protobuf 2.5.0 will be in the JVM at the same time... I'm not sure how they are isolated from each other. HDP 2.1, for instance, seems to have both (http://hortonworks.com/hdp/whats-new/)
Audit dependency graph when Spark is built with -Phive -- Key: SPARK-1802 URL: https://issues.apache.org/jira/browse/SPARK-1802 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Sean Owen Priority: Blocker Fix For: 1.0.0 Attachments: hive-exec-jar-problems.txt I'd like to have the binary release for 1.0 include Hive support. Since this isn't enabled by default in the build I don't think it's as well tested, so we should dig around a bit and decide if we need to e.g. add any excludes.
{code}
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > without_hive.txt
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -Phive -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > with_hive.txt
$ diff without_hive.txt with_hive.txt
antlr-2.7.7.jar antlr-3.4.jar antlr-runtime-3.4.jar
10,14d6 avro-1.7.4.jar avro-ipc-1.7.4.jar avro-ipc-1.7.4-tests.jar avro-mapred-1.7.4.jar bonecp-0.7.1.RELEASE.jar
22d13 commons-cli-1.2.jar
25d15 commons-compress-1.4.1.jar
33,34d22 commons-logging-1.1.1.jar commons-logging-api-1.0.4.jar
38d25 commons-pool-1.5.4.jar
46,49d32 datanucleus-api-jdo-3.2.1.jar datanucleus-core-3.2.2.jar datanucleus-rdbms-3.2.1.jar derby-10.4.2.0.jar
53,57d35 hive-common-0.12.0.jar hive-exec-0.12.0.jar hive-metastore-0.12.0.jar hive-serde-0.12.0.jar hive-shims-0.12.0.jar
60,61d37 httpclient-4.1.3.jar httpcore-4.1.3.jar
68d43 JavaEWAH-0.3.2.jar
73d47 javolution-5.5.1.jar
76d49 jdo-api-3.0.1.jar
78d50 jetty-6.1.26.jar
87d58 jetty-util-6.1.26.jar
93d63 json-20090211.jar
98d67 jta-1.1.jar
103,104d71 libfb303-0.9.0.jar libthrift-0.9.0.jar
112d78 mockito-all-1.8.5.jar
136d101 servlet-api-2.5-20081211.jar
139d103 snappy-0.2.jar
144d107 spark-hive_2.10-1.0.0.jar
151d113 ST4-4.0.4.jar
153d114 stringtemplate-3.2.1.jar
156d116 velocity-1.7.jar
158d117 xz-1.0.jar
{code}
Some initial investigation
[jira] [Created] (SPARK-1819) Fix GetField.nullable.
Takuya Ueshin created SPARK-1819: Summary: Fix GetField.nullable. Key: SPARK-1819 URL: https://issues.apache.org/jira/browse/SPARK-1819 Project: Spark Issue Type: Bug Components: SQL Reporter: Takuya Ueshin {{GetField.nullable}} should be {{true}} not only when {{field.nullable}} is {{true}} but also when {{child.nullable}} is {{true}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
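In code terms, the proposed rule is simple; here is a minimal self-contained model of it (illustrative only, outside Catalyst; the real change lives in GetField):
{code}
// A struct-field extraction can yield null if either the struct value
// itself (child) or the field inside it may be null.
case class FieldRef(childNullable: Boolean, fieldNullable: Boolean) {
  def nullable: Boolean = childNullable || fieldNullable
}

assert(FieldRef(childNullable = true, fieldNullable = false).nullable)
assert(FieldRef(childNullable = false, fieldNullable = true).nullable)
{code}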
[jira] [Resolved] (SPARK-1802) Audit dependency graph when Spark is built with -Phive
[ https://issues.apache.org/jira/browse/SPARK-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1802. Resolution: Fixed Issue resolved by pull request 744 [https://github.com/apache/spark/pull/744] Audit dependency graph when Spark is built with -Phive -- Key: SPARK-1802 URL: https://issues.apache.org/jira/browse/SPARK-1802 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Priority: Blocker Fix For: 1.0.0 I'd like to have the binary release for 1.0 include Hive support. Since this isn't enabled by default in the build I don't think it's as well tested, so we should dig around a bit and decide if we need to e.g. add any excludes. {code}
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > without_hive.txt
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -Phive -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > with_hive.txt
$ diff without_hive.txt with_hive.txt
antlr-2.7.7.jar antlr-3.4.jar antlr-runtime-3.4.jar
10,14d6 avro-1.7.4.jar avro-ipc-1.7.4.jar avro-ipc-1.7.4-tests.jar avro-mapred-1.7.4.jar bonecp-0.7.1.RELEASE.jar
22d13 commons-cli-1.2.jar
25d15 commons-compress-1.4.1.jar
33,34d22 commons-logging-1.1.1.jar commons-logging-api-1.0.4.jar
38d25 commons-pool-1.5.4.jar
46,49d32 datanucleus-api-jdo-3.2.1.jar datanucleus-core-3.2.2.jar datanucleus-rdbms-3.2.1.jar derby-10.4.2.0.jar
53,57d35 hive-common-0.12.0.jar hive-exec-0.12.0.jar hive-metastore-0.12.0.jar hive-serde-0.12.0.jar hive-shims-0.12.0.jar
60,61d37 httpclient-4.1.3.jar httpcore-4.1.3.jar
68d43 JavaEWAH-0.3.2.jar
73d47 javolution-5.5.1.jar
76d49 jdo-api-3.0.1.jar
78d50 jetty-6.1.26.jar
87d58 jetty-util-6.1.26.jar
93d63 json-20090211.jar
98d67 jta-1.1.jar
103,104d71 libfb303-0.9.0.jar libthrift-0.9.0.jar
112d78 mockito-all-1.8.5.jar
136d101 servlet-api-2.5-20081211.jar
139d103 snappy-0.2.jar
144d107 spark-hive_2.10-1.0.0.jar
151d113 ST4-4.0.4.jar
153d114 stringtemplate-3.2.1.jar
156d116 velocity-1.7.jar
158d117 xz-1.0.jar
{code}
Some initial investigation suggests we may need to take some precaution surrounding (a) jetty and (b) servlet-api. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1760) mvn -Dsuites=* test throws a ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993493#comment-13993493 ] Sean Owen commented on SPARK-1760: -- If `wildcardSuites` lets you invoke specific suites across the whole project, then that sounds like an ideal solution. If it works then I'd propose that as a small doc change? mvn -Dsuites=* test throws a ClassNotFoundException -- Key: SPARK-1760 URL: https://issues.apache.org/jira/browse/SPARK-1760 Project: Spark Issue Type: Bug Reporter: Guoqiang Li Assignee: Guoqiang Li Fix For: 1.0.0 {{mvn -Dhadoop.version=0.23.9 -Phadoop-0.23 -Dsuites=org.apache.spark.repl.ReplSuite test}} = {code} *** RUN ABORTED *** java.lang.ClassNotFoundException: org.apache.spark.repl.ReplSuite at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1470) at org.scalatest.tools.Runner$$anonfun$21.apply(Runner.scala:1469) at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264) at scala.collection.immutable.List.foreach(List.scala:318) ... {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
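If that pans out, the documented invocation would presumably look like this (hedged: -DwildcardSuites is the scalatest-maven-plugin property being referred to):
{code}
mvn -Dhadoop.version=0.23.9 -Phadoop-0.23 -DwildcardSuites=org.apache.spark.repl.ReplSuite test
{code}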
[jira] [Comment Edited] (SPARK-1802) Audit dependency graph when Spark is built with -Phive
[ https://issues.apache.org/jira/browse/SPARK-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995815#comment-13995815 ] Sean Owen edited comment on SPARK-1802 at 5/13/14 11:27 AM: (Edited to fix comment about protobuf versions) I looked further into just what might go wrong by including hive-exec into the assembly, since it includes its dependencies directly (i.e. Maven can't manage around it.) Attached is a full dump of the conflicts. The ones that are potential issues appear to be the following, and one looks like it could be a deal-breaker -- protobuf -- since it's neither forwards nor backwards compatible. That is, I recommend testing this assembly with a *newer* Hadoop that needs 2.5 and see if it croaks. The rest might be worked around but need some additional mojo to make sure the right version wins in the packaging. Certainly having hive-exec in the build is making me queasy!
[WARNING] hive-exec-0.12.0.jar, libthrift-0.9.0.jar define 153 overlapping classes: HBase includes libthrift-0.8.0, but it's in examples, and so I figure this is ignorable.
[WARNING] hive-exec-0.12.0.jar, commons-lang-2.4.jar define 2 overlapping classes: Probably ignorable, but we have to make sure commons-lang-3.3.2 'wins' in the build.
[WARNING] hive-exec-0.12.0.jar, jackson-core-asl-1.9.11.jar define 117 overlapping classes:
[WARNING] hive-exec-0.12.0.jar, jackson-mapper-asl-1.8.8.jar define 432 overlapping classes: I believe these are ignorable. (Not sure why the Jackson versions are mismatched; another todo.)
[WARNING] hive-exec-0.12.0.jar, guava-14.0.1.jar define 1087 overlapping classes: Should be OK. Hive uses 11.0.2 like Hadoop; the build is already taking that particular risk. We need 14.0.1 to win.
[WARNING] hive-exec-0.12.0.jar, protobuf-java-2.4.1.jar define 204 overlapping classes: Oof. Hive has protobuf *2.4.1*. This has got to be a problem for newer Hadoop builds? (Edited to fix comment about protobuf versions)
was (Author: srowen): I looked further into just what might go wrong by including hive-exec into the assembly, since it includes its dependencies directly (i.e. Maven can't manage around it.) Attached is a full dump of the conflicts. The ones that are potential issues appear to be the following, and one looks like it could be a deal-breaker -- protobuf -- since it's neither forwards nor backwards compatible. That is, I recommend testing this assembly with an older Hadoop that needs 2.4.1 and see if it croaks. The rest might be worked around but need some additional mojo to make sure the right version wins in the packaging. Certainly having hive-exec in the build is making me queasy!
[WARNING] hive-exec-0.12.0.jar, libthrift-0.9.0.jar define 153 overlapping classes: HBase includes libthrift-0.8.0, but it's in examples, and so I figure this is ignorable.
[WARNING] hive-exec-0.12.0.jar, commons-lang-2.4.jar define 2 overlapping classes: Probably ignorable, but we have to make sure commons-lang-3.3.2 'wins' in the build.
[WARNING] hive-exec-0.12.0.jar, jackson-core-asl-1.9.11.jar define 117 overlapping classes:
[WARNING] hive-exec-0.12.0.jar, jackson-mapper-asl-1.8.8.jar define 432 overlapping classes: I believe these are ignorable. (Not sure why the Jackson versions are mismatched; another todo.)
[WARNING] hive-exec-0.12.0.jar, guava-14.0.1.jar define 1087 overlapping classes: Should be OK. Hive uses 11.0.2 like Hadoop; the build is already taking that particular risk. We need 14.0.1 to win.
[WARNING] hive-exec-0.12.0.jar, protobuf-java-2.4.1.jar define 204 overlapping classes: Oof. Hive has protobuf 2.5.0. This has got to be a problem for older Hadoop builds?
Audit dependency graph when Spark is built with -Phive -- Key: SPARK-1802 URL: https://issues.apache.org/jira/browse/SPARK-1802 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Sean Owen Priority: Blocker Fix For: 1.0.0 Attachments: hive-exec-jar-problems.txt I'd like to have the binary release for 1.0 include Hive support. Since this isn't enabled by default in the build I don't think it's as well tested, so we should dig around a bit and decide if we need to e.g. add any excludes. {code}
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > without_hive.txt
$ mvn install -Phive -DskipTests
$ mvn dependency:build-classpath -Phive -pl assembly | grep -v INFO | tr ':' '\n' | awk '{ FS="/"; print ( $(NF) ); }' | sort > with_hive.txt
$ diff without_hive.txt with_hive.txt
antlr-2.7.7.jar antlr-3.4.jar antlr-runtime-3.4.jar
10,14d6 avro-1.7.4.jar
[jira] [Commented] (SPARK-1819) Fix GetField.nullable.
[ https://issues.apache.org/jira/browse/SPARK-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996201#comment-13996201 ] Takuya Ueshin commented on SPARK-1819: -- Pull-requested: https://github.com/apache/spark/pull/757 Fix GetField.nullable. -- Key: SPARK-1819 URL: https://issues.apache.org/jira/browse/SPARK-1819 Project: Spark Issue Type: Bug Components: SQL Reporter: Takuya Ueshin {{GetField.nullable}} should be {{true}} not only when {{field.nullable}} is {{true}} but also when {{child.nullable}} is {{true}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1680) Clean up use of setExecutorEnvs in SparkConf
[ https://issues.apache.org/jira/browse/SPARK-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1680: --- Fix Version/s: (was: 1.0.0) 1.1.0 Clean up use of setExecutorEnvs in SparkConf - Key: SPARK-1680 URL: https://issues.apache.org/jira/browse/SPARK-1680 Project: Spark Issue Type: Sub-task Components: Spark Core Reporter: Patrick Wendell Priority: Blocker Fix For: 1.1.0 We should make this consistent between YARN and Standalone. Basically, YARN mode should just use the executorEnvs from the Spark conf and not need SPARK_YARN_USER_ENV. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1809) Mesos backend doesn't respect HADOOP_CONF_DIR
Andrew Ash created SPARK-1809: - Summary: Mesos backend doesn't respect HADOOP_CONF_DIR Key: SPARK-1809 URL: https://issues.apache.org/jira/browse/SPARK-1809 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Reporter: Andrew Ash In order to use HDFS paths without the server component, standalone mode reads spark-env.sh and scans the HADOOP_CONF_DIR to open core-site.xml and get the fs.default.name parameter. This lets you use HDFS paths like: - hdfs:///tmp/myfile.txt instead of - hdfs://myserver.mydomain.com:8020/tmp/myfile.txt However as of recent 1.0.0 pre-release (hash 756c96) I had to specify HDFS paths with the full server even though I have HADOOP_CONF_DIR still set in spark-env.sh. The HDFS, Spark, and Mesos nodes are all co-located and non-domain HDFS paths work fine when using the standalone mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
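For reference, a hedged sketch of the lookup standalone mode effectively performs (assumes hadoop-common on the classpath and HADOOP_CONF_DIR set; error handling omitted):
{code}
import java.io.File
import org.apache.hadoop.conf.Configuration

// Load core-site.xml from HADOOP_CONF_DIR and read the default filesystem;
// this is what lets "hdfs:///tmp/myfile.txt" resolve without a server name.
val confDir = sys.env("HADOOP_CONF_DIR")
val hadoopConf = new Configuration(false)
hadoopConf.addResource(new File(confDir, "core-site.xml").toURI.toURL)

val defaultFs = hadoopConf.get("fs.default.name") // e.g. hdfs://myserver.mydomain.com:8020
{code}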
[jira] [Created] (SPARK-1813) Add a utility to SparkConf that makes using Kryo really easy
Sandy Ryza created SPARK-1813: - Summary: Add a utility to SparkConf that makes using Kryo really easy Key: SPARK-1813 URL: https://issues.apache.org/jira/browse/SPARK-1813 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Sandy Ryza It would be nice to have a method in SparkConf that makes it really easy to use Kryo and register a set of classes. Using Kryo currently requires all this: {code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
{code} It would be nice if it just required this: {code}
SparkConf.setKryo(Array(classOf[MyFirstClass], classOf[MySecond]))
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
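One possible shape for the helper, sketched as user-side code rather than the eventual SparkConf API (the method and the conf key are invented for illustration; a real implementation would also need a registrator that reads the class list on the executors):
{code}
import org.apache.spark.SparkConf

// Hypothetical convenience: enable Kryo and record the classes to register
// in one call. The key name below is this sketch's invention.
def setKryo(conf: SparkConf, classes: Class[_]*): SparkConf =
  conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.classesToRegister", classes.map(_.getName).mkString(","))

val conf = setKryo(new SparkConf().setAppName("kryo-sketch"), classOf[String], classOf[Array[Int]])
{code}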
[jira] [Commented] (SPARK-1813) Add a utility to SparkConf that makes using Kryo really easy
[ https://issues.apache.org/jira/browse/SPARK-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996390#comment-13996390 ] Mridul Muralidharan commented on SPARK-1813: Writing a KryoRegistrator is the only requirement - the rest is done as part of initialization anyway. Registering classes with Kryo is non-trivial except for degenerate cases: for example, we have classes that have to use Java read/writeObject serialization, classes that support Kryo serialization, classes that support Java's external serialization, generated classes, etc. And we would need a registrator ... of course, it could be argued this is a corner case, though I don't think so. Add a utility to SparkConf that makes using Kryo really easy Key: SPARK-1813 URL: https://issues.apache.org/jira/browse/SPARK-1813 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Sandy Ryza It would be nice to have a method in SparkConf that makes it really easy to use Kryo and register a set of classes. Using Kryo currently requires all this: {code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
{code} It would be nice if it just required this: {code}
SparkConf.setKryo(Array(classOf[MyFirstClass], classOf[MySecond]))
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1808) bin/pyspark does not load default configuration properties
Andrew Or created SPARK-1808: Summary: bin/pyspark does not load default configuration properties Key: SPARK-1808 URL: https://issues.apache.org/jira/browse/SPARK-1808 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Andrew Or Fix For: 1.0.1 ... because it doesn't go through spark-submit. Either we make it go through spark-submit (hard), or we extract the default-configuration loading logic and set those properties for the JVM that launches the py4j GatewayServer (easier). -- This message was sent by Atlassian JIRA (v6.2#6252)
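A rough sketch of the easier route (parse the defaults file into system properties before the gateway JVM starts; the file location and the "key whitespace value" format are assumptions modeled on spark-submit's behavior):
{code}
import scala.io.Source

// Read "key value" pairs, skipping blanks and comments, so the JVM hosting
// the py4j GatewayServer sees the same defaults spark-submit would apply.
def loadDefaults(path: String): Map[String, String] =
  Source.fromFile(path).getLines()
    .map(_.trim)
    .filter(l => l.nonEmpty && !l.startsWith("#"))
    .flatMap { l =>
      l.split("\\s+", 2) match {
        case Array(k, v) => Some(k -> v)
        case _           => None
      }
    }.toMap

// Assumes SPARK_HOME is set.
loadDefaults(sys.env("SPARK_HOME") + "/conf/spark-defaults.conf")
  .foreach { case (k, v) => System.setProperty(k, v) }
{code}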
[jira] [Created] (SPARK-1810) The spark tar ball does not unzip into a separate folder when un-tarred.
Manikandan Narayanaswamy created SPARK-1810: --- Summary: The spark tar ball does not unzip into a separate folder when un-tarred. Key: SPARK-1810 URL: https://issues.apache.org/jira/browse/SPARK-1810 Project: Spark Issue Type: Bug Components: Build Affects Versions: 0.9.0 Environment: All environments Reporter: Manikandan Narayanaswamy Priority: Minor All other Hadoop components, when extracted, are contained within a newly created folder, but this is not the case for Spark. The Spark tar decompresses all of its files into the current working directory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-1821) Document History Server
[ https://issues.apache.org/jira/browse/SPARK-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu closed SPARK-1821. -- Resolution: Implemented sorry, missed some documents Document History Server --- Key: SPARK-1821 URL: https://issues.apache.org/jira/browse/SPARK-1821 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.0.0 Reporter: Nan Zhu In 1.0, there is a new component, history server, which is not mentioned in http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/ I think we'd better add the missing document -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1813) Add a utility to SparkConf that makes using Kryo really easy
[ https://issues.apache.org/jira/browse/SPARK-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1813: -- Description: It would be nice to have a method in SparkConf that makes it really easy to turn on Kryo serialization and register a set of classes. Using Kryo currently requires all this: {code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
{code} It would be nice if it just required this: {code}
SparkConf.setKryo(Array(classOf[MyFirstClass, classOf[MySecond]))
{code} was: It would be nice to have a method in SparkConf that makes it really easy to use Kryo and register a set of classes. without defining you Using Kryo currently requires all this: {code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
{code} It would be nice if it just required this: {code}
SparkConf.setKryo(Array(classOf[MyFirstClass, classOf[MySecond]))
{code} Add a utility to SparkConf that makes using Kryo really easy Key: SPARK-1813 URL: https://issues.apache.org/jira/browse/SPARK-1813 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Sandy Ryza It would be nice to have a method in SparkConf that makes it really easy to turn on Kryo serialization and register a set of classes. Using Kryo currently requires all this: {code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
{code} It would be nice if it just required this: {code}
SparkConf.setKryo(Array(classOf[MyFirstClass, classOf[MySecond]))
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1769) Executor loss can cause race condition in Pool
[ https://issues.apache.org/jira/browse/SPARK-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson updated SPARK-1769: -- Assignee: Andrew Or (was: Aaron Davidson) Executor loss can cause race condition in Pool -- Key: SPARK-1769 URL: https://issues.apache.org/jira/browse/SPARK-1769 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Aaron Davidson Assignee: Andrew Or Loss of executors (in this case due to OOMs) exposes a race condition in Pool.scala, evident from this stack trace: {code} 14/05/08 22:41:48 ERROR OneForOneStrategy: java.lang.NullPointerException at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87) at org.apache.spark.scheduler.TaskSchedulerImpl.removeExecutor(TaskSchedulerImpl.scala:412) at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:385) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.removeExecutor(CoarseGrainedSchedulerBackend.scala:160) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:123) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} Note that the line of code that throws this exception is here: {code} schedulableQueue.foreach(_.executorLost(executorId, host)) {code} By the stack trace, it's not schedulableQueue that is null, but an element therein. As far as I could tell, we never add a null element to this queue. Rather, I could see that removeSchedulable() and executorLost() were called at about the same time (via log messages), and suspect that since this ArrayBuffer is in no way synchronized, that we iterate through the list while it's in an incomplete state. -- This message was sent by Atlassian JIRA (v6.2#6252)
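One plausible mitigation, sketched here rather than taken from the eventual fix: back the pool with a concurrent collection, whose weakly consistent iterator tolerates concurrent removals. Schedulable below is a stand-in for Spark's scheduler type.
{code}
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.JavaConverters._

trait Schedulable { // stand-in for org.apache.spark.scheduler.Schedulable
  def executorLost(executorId: String, host: String): Unit
}

// A ConcurrentLinkedQueue iterator is weakly consistent, so a racing
// removeSchedulable no longer exposes a half-updated buffer to foreach.
val schedulableQueue = new ConcurrentLinkedQueue[Schedulable]()

def executorLost(executorId: String, host: String): Unit =
  schedulableQueue.asScala.foreach(_.executorLost(executorId, host))
{code}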
[jira] [Assigned] (SPARK-1769) Executor loss can cause race condition in Pool
[ https://issues.apache.org/jira/browse/SPARK-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson reassigned SPARK-1769: - Assignee: Aaron Davidson Executor loss can cause race condition in Pool -- Key: SPARK-1769 URL: https://issues.apache.org/jira/browse/SPARK-1769 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Aaron Davidson Assignee: Aaron Davidson Loss of executors (in this case due to OOMs) exposes a race condition in Pool.scala, evident from this stack trace: {code} 14/05/08 22:41:48 ERROR OneForOneStrategy: java.lang.NullPointerException at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87) at org.apache.spark.scheduler.TaskSchedulerImpl.removeExecutor(TaskSchedulerImpl.scala:412) at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:385) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.removeExecutor(CoarseGrainedSchedulerBackend.scala:160) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:123) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} Note that the line of code that throws this exception is here: {code} schedulableQueue.foreach(_.executorLost(executorId, host)) {code} By the stack trace, it's not schedulableQueue that is null, but an element therein. As far as I could tell, we never add a null element to this queue. Rather, I could see that removeSchedulable() and executorLost() were called at about the same time (via log messages), and suspect that since this ArrayBuffer is in no way synchronized, that we iterate through the list while it's in an incomplete state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1817) RDD zip erroneous when partitions do not divide RDD count
[ https://issues.apache.org/jira/browse/SPARK-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996818#comment-13996818 ] Kan Zhang commented on SPARK-1817: -- PR: https://github.com/apache/spark/pull/760 RDD zip erroneous when partitions do not divide RDD count - Key: SPARK-1817 URL: https://issues.apache.org/jira/browse/SPARK-1817 Project: Spark Issue Type: Bug Affects Versions: 0.9.0 Reporter: Michael Malak Assignee: Kan Zhang Priority: Blocker Example: scala> sc.parallelize(1L to 2L,4).zip(sc.parallelize(11 to 12,4)).collect res1: Array[(Long, Int)] = Array((2,11)) But more generally, it's whenever the number of partitions does not evenly divide the total number of elements in the RDD. See https://groups.google.com/forum/#!msg/spark-users/demrmjHFnoc/Ek3ijiXHr2MJ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1821) Document History Server
Nan Zhu created SPARK-1821: -- Summary: Document History Server Key: SPARK-1821 URL: https://issues.apache.org/jira/browse/SPARK-1821 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.0.0 Reporter: Nan Zhu In 1.0, there is a new component in the standalone mode, history server, which is not mentioned in http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/spark-standalone.html I think we'd better add the missing document -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1821) Document History Server
[ https://issues.apache.org/jira/browse/SPARK-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nan Zhu updated SPARK-1821: --- Issue Type: Improvement (was: Bug) Document History Server --- Key: SPARK-1821 URL: https://issues.apache.org/jira/browse/SPARK-1821 Project: Spark Issue Type: Improvement Components: Deploy Affects Versions: 1.0.0 Reporter: Nan Zhu In 1.0, there is a new component in the standalone mode, history server, which is not mentioned in http://people.apache.org/~pwendell/spark-1.0.0-rc3-docs/spark-standalone.html I think we'd better add the missing document -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1813) Add a utility to SparkConf that makes using Kryo really easy
[ https://issues.apache.org/jira/browse/SPARK-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1813: -- Description: It would be nice to have a method in SparkConf that makes it really easy to turn on Kryo serialization and register a set of classes. Using Kryo currently requires all this: {code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
{code} It would be nice if it just required this: {code}
SparkConf.setKryo(Array(classOf[MyClass1], classOf[MyClass2]))
{code} was: It would be nice to have a method in SparkConf that makes it really easy to turn on Kryo serialization and register a set of classes. Using Kryo currently requires all this: {code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
{code} It would be nice if it just required this: {code}
SparkConf.setKryo(Array(classOf[MyFirstClass, classOf[MySecond]))
{code} Add a utility to SparkConf that makes using Kryo really easy Key: SPARK-1813 URL: https://issues.apache.org/jira/browse/SPARK-1813 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Sandy Ryza It would be nice to have a method in SparkConf that makes it really easy to turn on Kryo serialization and register a set of classes. Using Kryo currently requires all this: {code}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "mypackage.MyRegistrator")
val sc = new SparkContext(conf)
{code} It would be nice if it just required this: {code}
SparkConf.setKryo(Array(classOf[MyClass1], classOf[MyClass2]))
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1789) Multiple versions of Netty dependencies cause FlumeStreamSuite failure
[ https://issues.apache.org/jira/browse/SPARK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995560#comment-13995560 ] William Benton commented on SPARK-1789: --- Sean, we're currently building against Akka 2.3.0 in Fedora (it's a trivial source patch against 0.9.1; I haven't investigated the delta against 1.0 yet). Are there reasons why Akka 2.3.0 is a bad idea for Spark in general? If not, I'm happy to file a JIRA for updating the dependency and contribute my patch upstream. Multiple versions of Netty dependencies cause FlumeStreamSuite failure -- Key: SPARK-1789 URL: https://issues.apache.org/jira/browse/SPARK-1789 Project: Spark Issue Type: Bug Components: Build Affects Versions: 0.9.1 Reporter: Sean Owen Assignee: Sean Owen Labels: flume, netty, test Fix For: 1.0.0 TL;DR is there is a bit of JAR hell trouble with Netty, that can be mostly resolved and will resolve a test failure. I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamingSuite, and have for a short while (is it just me?) velvia notes: I have found a workaround. If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in a newer version of Jetty. There are at least 3 versions of Netty in play in the build:
- the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
- the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
- but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue. The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final. But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Downgrading to 3.6.6.Final across the board made some Spark code not compile. If the build *keeps* io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation. So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
- Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
- Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
- Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
- Update SBT build accordingly
A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible. -- This message was sent by Atlassian JIRA (v6.2#6252)
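In SBT terms, the io.netty:netty exclude described above looks roughly like this (a hedged sketch; flume-ng-sdk is the Flume artifact the streaming-flume module depends on):
{code}
// Strip the transitive io.netty:netty (3.4.0.Final) that Flume 1.4.0 brings
// in, so that Spark's own io.netty:netty-all 4.0.17.Final wins.
libraryDependencies += "org.apache.flume" % "flume-ng-sdk" % "1.4.0" excludeAll (
  ExclusionRule(organization = "io.netty", name = "netty")
)
{code}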
[jira] [Updated] (SPARK-1817) RDD zip erroneous when partitions do not divide RDD count
[ https://issues.apache.org/jira/browse/SPARK-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated SPARK-1817: - Affects Version/s: 1.0.0 RDD zip erroneous when partitions do not divide RDD count - Key: SPARK-1817 URL: https://issues.apache.org/jira/browse/SPARK-1817 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Michael Malak Assignee: Kan Zhang Priority: Blocker Example:
{code}
scala> sc.parallelize(1L to 2L, 4).zip(sc.parallelize(11 to 12, 4)).collect
res1: Array[(Long, Int)] = Array((2,11))
{code}
More generally, the result is wrong whenever the number of partitions does not evenly divide the total number of elements in the RDD. See https://groups.google.com/forum/#!msg/spark-users/demrmjHFnoc/Ek3ijiXHr2MJ -- This message was sent by Atlassian JIRA (v6.2#6252)
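For reference, a minimal check of what a correct zip should return here (assumes a SparkContext named sc is in scope, as in the REPL session above):
{code}
val zipped = sc.parallelize(1L to 2L, 4).zip(sc.parallelize(11 to 12, 4)).collect()
// zip pairs elements positionally and must preserve every element exactly once:
assert(zipped.toSeq == Seq((1L, 11), (2L, 12)))
{code}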
[jira] [Resolved] (SPARK-1708) Add ClassTag parameter on accumulator and broadcast methods
[ https://issues.apache.org/jira/browse/SPARK-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1708. Resolution: Fixed Fix Version/s: (was: 1.0.0) 1.1.0 Issue resolved by pull request 700 [https://github.com/apache/spark/pull/700] Add ClassTag parameter on accumulator and broadcast methods --- Key: SPARK-1708 URL: https://issues.apache.org/jira/browse/SPARK-1708 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Assignee: Matei Zaharia Priority: Blocker Fix For: 1.1.0 ClassTags will be needed by some serializers, such as a Scala Pickling-based one, to come up with efficient serialization. We need to add them on Broadcast and probably also Accumulator and Accumulable. Since we're freezing the public API in 1.0, we have to do this before the release. -- This message was sent by Atlassian JIRA (v6.2#6252)
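A small self-contained illustration (not Spark source) of why the ClassTag matters: with a ClassTag in scope, a serializer can see the runtime element class instead of an erased Object, and pick a codec accordingly:
{code}
import scala.reflect.ClassTag

// A serializer could branch on ct.runtimeClass to choose an efficient codec.
def describe[T](value: T)(implicit ct: ClassTag[T]): String =
  s"serializing a ${ct.runtimeClass.getName}"

describe(Array(1, 2, 3))  // "serializing a [I" -- a primitive int array, visible despite erasure
{code}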
[jira] [Commented] (SPARK-1283) Create spark-contrib repo for 1.0
[ https://issues.apache.org/jira/browse/SPARK-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992974#comment-13992974 ] Evan Chan commented on SPARK-1283: -- ping. Any more comments? Objections to creating a landing page for contrib projects in the Spark docs? Create spark-contrib repo for 1.0 - Key: SPARK-1283 URL: https://issues.apache.org/jira/browse/SPARK-1283 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 1.0.0 Reporter: Evan Chan Fix For: 1.0.0 Let's create a spark-contrib repo to host community projects for the Spark ecosystem that don't quite belong in core but are very important nevertheless. It would be linked to from the official Spark documentation and web site, and help provide visibility for community projects. Some questions:
- Who should host this repo, and where should it be hosted?
- GitHub would be a strong preference from a usability standpoint
- There is talk that Apache might have some facility for this
- Contents: should it simply be links? Git submodules?
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-571) Forbid return statements when cleaning closures
[ https://issues.apache.org/jira/browse/SPARK-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-571. --- Resolution: Fixed Forbid return statements when cleaning closures --- Key: SPARK-571 URL: https://issues.apache.org/jira/browse/SPARK-571 Project: Spark Issue Type: Improvement Reporter: tjhunter Assignee: William Benton Fix For: 1.1.0 By mistake, I wrote some code like this:
{code}
object Foo {
  def main() {
    val sc = new SparkContext(...)
    sc.parallelize(0 to 10, 10).map({
      ...
      return 1
      ...
    }).collect
  }
}
{code}
This compiles fine and actually runs using the local scheduler. However, using the Mesos scheduler throws a NotSerializableException in the CollectTask. I agree the result of the program above should be undefined, or it should be an error. Would it be possible to have more explicit messages? -- This message was sent by Atlassian JIRA (v6.2#6252)
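A minimal runnable sketch of the failure mode (the class name and master are illustrative): Scala compiles a return inside a closure as a non-local return from the enclosing method, capturing a marker object that is not serializable, which matches the NotSerializableException above:
{code}
import org.apache.spark.SparkContext

object ReturnDemo {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[2]", "return-demo")
    val doubled = sc.parallelize(0 to 10, 10).map { i =>
      // Non-local return: compiles and runs on the local scheduler, but per
      // the issue it fails to serialize for a cluster scheduler such as Mesos.
      if (i < 0) return
      i * 2
    }.collect()
    println(doubled.mkString(","))
    sc.stop()
  }
}
{code}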
[jira] [Updated] (SPARK-1736) spark-submit on Windows
[ https://issues.apache.org/jira/browse/SPARK-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1736: --- Assignee: Andrew Or spark-submit on Windows --- Key: SPARK-1736 URL: https://issues.apache.org/jira/browse/SPARK-1736 Project: Spark Issue Type: Improvement Components: Windows Reporter: Matei Zaharia Assignee: Andrew Or Priority: Blocker Fix For: 1.0.0 - spark-submit needs a Windows version (shouldn't be too hard, it's just launching a Java process) - spark-shell.cmd needs to run through spark-submit like it does on Unix -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1753) PySpark on YARN does not work on assembly jar built on Red Hat based OS
[ https://issues.apache.org/jira/browse/SPARK-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1753: --- Assignee: Andrew Or PySpark on YARN does not work on assembly jar built on Red Hat based OS --- Key: SPARK-1753 URL: https://issues.apache.org/jira/browse/SPARK-1753 Project: Spark Issue Type: Bug Components: PySpark, Spark Core Affects Versions: 1.0.0 Reporter: Andrew Or Assignee: Andrew Or Fix For: 1.0.0 If the jar is built on a Red Hat based OS, the additional Python files included in the jar cannot be accessed. This means PySpark doesn't work on YARN, because in this mode it relies on the Python files within this jar. I have confirmed that my Java, Scala, and Maven versions are all exactly the same in my CentOS environment and in my local OS X environment, and the former does not work. Thomas Graves also struggled with the same problem. Until a fix is found, we should at the very least document this peculiarity. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1793) Heavily duplicated test setup code in SVMSuite
[ https://issues.apache.org/jira/browse/SPARK-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Tulloch resolved SPARK-1793. --- Resolution: Fixed Fix Version/s: 1.0.0 Heavily duplicated test setup code in SVMSuite -- Key: SPARK-1793 URL: https://issues.apache.org/jira/browse/SPARK-1793 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Andrew Tulloch Priority: Minor Fix For: 1.0.0 Refactor the code to remove the repeated initialization of test/validation RDDs in every test. -- This message was sent by Atlassian JIRA (v6.2#6252)
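One hedged shape for that refactor (ScalaTest; the data generator here is a hypothetical stand-in for the suite's existing helper): build the shared RDDs once in beforeAll and reuse them in every test:
{code}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.scalatest.{BeforeAndAfterAll, FunSuite}
import scala.util.Random

class SVMSuiteSketch extends FunSuite with BeforeAndAfterAll {
  @transient private var sc: SparkContext = _
  private var validationData: RDD[LabeledPoint] = _

  // Hypothetical stand-in for the suite's real input generator.
  private def generateInput(n: Int, seed: Int): Seq[LabeledPoint] = {
    val rnd = new Random(seed)
    Seq.fill(n)(LabeledPoint(rnd.nextInt(2), Vectors.dense(rnd.nextGaussian(), rnd.nextGaussian())))
  }

  override def beforeAll() {
    sc = new SparkContext("local", "SVMSuiteSketch")
    validationData = sc.parallelize(generateInput(100, seed = 17), 2).cache()
  }

  override def afterAll() {
    sc.stop()
  }

  test("validation data is built once and shared across tests") {
    assert(validationData.count() === 100)
  }
}
{code}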
[jira] [Updated] (SPARK-1823) ExternalAppendOnlyMap can still OOM if one key is very large
[ https://issues.apache.org/jira/browse/SPARK-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-1823: - Affects Version/s: 1.0.0 ExternalAppendOnlyMap can still OOM if one key is very large Key: SPARK-1823 URL: https://issues.apache.org/jira/browse/SPARK-1823 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Andrew Or Fix For: 1.1.0 If the values for one key do not collectively fit into memory, then the map will still OOM when you merge the spilled contents back in. This is especially a problem for PySpark, since we hash the keys (Python objects) before a shuffle, and there are only so many distinct hash values, so there could potentially be many collisions. -- This message was sent by Atlassian JIRA (v6.2#6252)
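An illustrative sketch (not Spark's actual merge code) of why a single huge key still OOMs: merging spilled contents back in materializes every value for the current key at once:
{code}
import scala.collection.mutable.ArrayBuffer

// Illustrative only: combine all spilled values for `key` into one buffer.
def mergeKey[K, V](spills: Seq[Iterator[(K, V)]], key: K): Seq[V] = {
  val combined = ArrayBuffer[V]()
  for (spill <- spills; (k, v) <- spill if k == key)
    combined += v  // every value for this key is held in memory at once
  combined
}
{code}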
[jira] [Resolved] (SPARK-1816) LiveListenerBus dies if a listener throws an exception
[ https://issues.apache.org/jira/browse/SPARK-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1816. Resolution: Fixed Fix Version/s: 1.0.0 Issue resolved by pull request 759 [https://github.com/apache/spark/pull/759] LiveListenerBus dies if a listener throws an exception -- Key: SPARK-1816 URL: https://issues.apache.org/jira/browse/SPARK-1816 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Aaron Davidson Assignee: Andrew Or Priority: Critical Fix For: 1.0.0 The exception isn't even printed. -- This message was sent by Atlassian JIRA (v6.2#6252)
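A hedged, self-contained sketch of the usual remedy (not Spark's actual code): catch and log per listener, so one throwing listener neither kills the bus thread nor fails silently:
{code}
trait Listener { def onEvent(event: Any): Unit }

def postToAll(listeners: Seq[Listener], event: Any): Unit =
  listeners.foreach { l =>
    try l.onEvent(event)
    catch { case e: Exception =>
      // Log the failure instead of letting it escape the bus thread unprinted.
      System.err.println(s"Listener ${l.getClass.getName} threw: $e")
    }
  }
{code}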
[jira] [Commented] (SPARK-1769) Executor loss can cause race condition in Pool
[ https://issues.apache.org/jira/browse/SPARK-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997079#comment-13997079 ] Andrew Or commented on SPARK-1769: -- https://github.com/apache/spark/pull/762 Executor loss can cause race condition in Pool -- Key: SPARK-1769 URL: https://issues.apache.org/jira/browse/SPARK-1769 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Aaron Davidson Assignee: Andrew Or Loss of executors (in this case due to OOMs) exposes a race condition in Pool.scala, evident from this stack trace:
{code}
14/05/08 22:41:48 ERROR OneForOneStrategy: java.lang.NullPointerException
  at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
  at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87)
  at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
  at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87)
  at org.apache.spark.scheduler.TaskSchedulerImpl.removeExecutor(TaskSchedulerImpl.scala:412)
  at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:385)
  at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.removeExecutor(CoarseGrainedSchedulerBackend.scala:160)
  at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123)
  at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:123)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}
Note that the line of code that throws this exception is here:
{code}
schedulableQueue.foreach(_.executorLost(executorId, host))
{code}
By the stack trace, it's not schedulableQueue that is null, but an element therein. As far as I could tell, we never add a null element to this queue. Rather, I could see that removeSchedulable() and executorLost() were called at about the same time (via log messages), and suspect that, since this ArrayBuffer is in no way synchronized, we iterate through the list while it's in an incomplete state. -- This message was sent by Atlassian JIRA (v6.2#6252)
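A hedged sketch of one way to close the race (the actual fix is in the PR above; this just names the technique): replace the unsynchronized ArrayBuffer with a concurrent collection whose iteration tolerates concurrent add/remove:
{code}
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.JavaConverters._

object PoolSketch {
  // Assumed minimal shape of Schedulable for this sketch.
  trait Schedulable { def executorLost(executorId: String, host: String): Unit }

  val schedulableQueue = new ConcurrentLinkedQueue[Schedulable]()

  def executorLost(executorId: String, host: String): Unit =
    // Iteration over a ConcurrentLinkedQueue is weakly consistent: it never
    // observes the queue in a half-updated state mid-traversal.
    schedulableQueue.asScala.foreach(_.executorLost(executorId, host))
}
{code}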
[jira] [Updated] (SPARK-1824) Python examples still take in master
[ https://issues.apache.org/jira/browse/SPARK-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-1824: - Affects Version/s: 1.0.0 Python examples still take in master -- Key: SPARK-1824 URL: https://issues.apache.org/jira/browse/SPARK-1824 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Andrew Or A recent commit https://github.com/apache/spark/commit/44dd57fb66bb676d753ad8d9757f9f4c03364113 changed existing Spark examples in Scala and Java such that they no longer take in master as an argument. We forgot to do the same for Python. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-1769) Executor loss can cause race condition in Pool
Aaron Davidson created SPARK-1769: - Summary: Executor loss can cause race condition in Pool Key: SPARK-1769 URL: https://issues.apache.org/jira/browse/SPARK-1769 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Aaron Davidson Loss of executors (in this case due to OOMs) exposes a race condition in Pool.scala, evident from this stack trace:
{code}
14/05/08 22:41:48 ERROR OneForOneStrategy: java.lang.NullPointerException
  at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
  at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87)
  at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
  at org.apache.spark.scheduler.Pool$$anonfun$executorLost$1.apply(Pool.scala:87)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.Pool.executorLost(Pool.scala:87)
  at org.apache.spark.scheduler.TaskSchedulerImpl.removeExecutor(TaskSchedulerImpl.scala:412)
  at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:385)
  at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.removeExecutor(CoarseGrainedSchedulerBackend.scala:160)
  at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123)
  at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1$$anonfun$applyOrElse$5.apply(CoarseGrainedSchedulerBackend.scala:123)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:123)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
  at akka.actor.ActorCell.invoke(ActorCell.scala:456)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
  at akka.dispatch.Mailbox.run(Mailbox.scala:219)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}
Note that the line of code that throws this exception is here:
{code}
schedulableQueue.foreach(_.executorLost(executorId, host))
{code}
By the stack trace, it's not schedulableQueue that is null, but an element therein. As far as I could tell, we never add a null element to this queue. Rather, I could see that removeSchedulable() and executorLost() were called at about the same time (via log messages), and suspect that, since this ArrayBuffer is in no way synchronized, we iterate through the list while it's in an incomplete state. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1283) Create spark-contrib repo for 1.0
[ https://issues.apache.org/jira/browse/SPARK-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993076#comment-13993076 ] Patrick Wendell commented on SPARK-1283: [~velvia] Yeah - do you want to submit a PR? This seems like a good idea to me. Create spark-contrib repo for 1.0 - Key: SPARK-1283 URL: https://issues.apache.org/jira/browse/SPARK-1283 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 1.0.0 Reporter: Evan Chan Fix For: 1.0.0 Let's create a spark-contrib repo to host community projects for the Spark ecosystem that don't quite belong in core but are very important nevertheless. It would be linked to from the official Spark documentation and web site, and help provide visibility for community projects. Some questions:
- Who should host this repo, and where should it be hosted?
- GitHub would be a strong preference from a usability standpoint
- There is talk that Apache might have some facility for this
- Contents: should it simply be links? Git submodules?
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1825) Windows Spark fails to work with Linux YARN
[ https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Taeyun Kim updated SPARK-1825: -- Fix Version/s: 1.0.0 Windows Spark fails to work with Linux YARN --- Key: SPARK-1825 URL: https://issues.apache.org/jira/browse/SPARK-1825 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Taeyun Kim Fix For: 1.0.0 Windows Spark fails to work with Linux YARN. This is a cross-platform problem. On the YARN side, Hadoop 2.4.0 resolved the issue as follows: https://issues.apache.org/jira/browse/YARN-1824 But the Spark YARN module does not incorporate the new YARN API yet, so the problem persists for Spark. First, the following source files should be changed:
- /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
- /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
The changes are as follows (see the sketch after this entry):
- Replace .$() with .$$()
- Replace File.pathSeparator for Environment.CLASSPATH.name with ApplicationConstants.CLASS_PATH_SEPARATOR (import org.apache.hadoop.yarn.api.ApplicationConstants is required for this)
Unless the above are applied, launch_container.sh will contain invalid shell script statements (since they will contain Windows-specific separators), and the job will fail. The following symptoms should also be fixed (I could not find the relevant source code):
- The SPARK_HOME environment variable is copied straight into launch_container.sh. It should be changed to the path format for the server OS, or, better, a separate environment variable or configuration variable should be created.
- The '%HADOOP_MAPRED_HOME%' string still exists in launch_container.sh after the above change is applied. Maybe I missed a few lines.
I'm not sure whether this is all, since I'm new to both Spark and YARN. -- This message was sent by Atlassian JIRA (v6.2#6252)
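A hedged sketch of the described replacements (the $$() and CLASS_PATH_SEPARATOR names follow YARN-1824, as cited above; the val names are illustrative): both defer variable expansion and separator choice to the OS of the node that actually runs the container:
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment

object YarnPathSketch {
  // Before: Environment.JAVA_HOME.$() and File.pathSeparator resolve on the
  // client, which breaks when the client and the cluster run different OSes.
  val javaCommand = Environment.JAVA_HOME.$$() + "/bin/java"
  val classPathSeparator = ApplicationConstants.CLASS_PATH_SEPARATOR
}
{code}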