[jira] [Commented] (SPARK-1392) Local spark-shell Runs Out of Memory With Default Settings
[ https://issues.apache.org/jira/browse/SPARK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039946#comment-14039946 ] Patrick Wendell commented on SPARK-1392: I mentioned this on the pull request, but I think this was an instance of SPARK-1777. I'm running some tests locally on the pull request there to determine whether that was the case. Local spark-shell Runs Out of Memory With Default Settings -- Key: SPARK-1392 URL: https://issues.apache.org/jira/browse/SPARK-1392 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Environment: OS X 10.9.2, Java 1.7.0_51, Scala 2.10.3 Reporter: Pat McDonough Using the spark-0.9.0 Hadoop2 binary from the project download page, running the spark-shell locally in the out-of-the-box configuration, and attempting to cache all the attached data, Spark OOMs with: java.lang.OutOfMemoryError: GC overhead limit exceeded. You can work around the issue by either decreasing spark.storage.memoryFraction or increasing SPARK_MEM. -- This message was sent by Atlassian JIRA (v6.2#6252)
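For context, the workaround amounts to something like the following in the Spark 0.9-era configuration model. This is a minimal sketch, assuming the system-property mechanism of that release; the fraction value is illustrative, not a tuned recommendation.
{code}
// Hedged workaround sketch for Spark 0.9: shrink the cache fraction before
// the SparkContext is created (the default was 0.6; 0.5 is illustrative).
System.setProperty("spark.storage.memoryFraction", "0.5")
// Alternatively, export a larger SPARK_MEM before launching spark-shell.
val sc = new org.apache.spark.SparkContext("local", "oom-repro")
{code}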
[jira] [Commented] (SPARK-1392) Local spark-shell Runs Out of Memory With Default Settings
[ https://issues.apache.org/jira/browse/SPARK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039953#comment-14039953 ] Patrick Wendell commented on SPARK-1392: Okay great, I confirmed this is fixed by SPARK-1777. I tested as follows: {code} SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_HIVE=true sbt/sbt clean assembly/assembly sc.textFile("/tmp/wiki_links").cache.count {code} The wiki_links file was downloaded and extracted from here: This worked with the proposed patch but failed with the default build. Local spark-shell Runs Out of Memory With Default Settings -- Key: SPARK-1392 URL: https://issues.apache.org/jira/browse/SPARK-1392 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Environment: OS X 10.9.2, Java 1.7.0_51, Scala 2.10.3 Reporter: Pat McDonough Using the spark-0.9.0 Hadoop2 binary from the project download page, running the spark-shell locally in the out-of-the-box configuration, and attempting to cache all the attached data, Spark OOMs with: java.lang.OutOfMemoryError: GC overhead limit exceeded. You can work around the issue by either decreasing spark.storage.memoryFraction or increasing SPARK_MEM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1392) Local spark-shell Runs Out of Memory With Default Settings
[ https://issues.apache.org/jira/browse/SPARK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039953#comment-14039953 ] Patrick Wendell edited comment on SPARK-1392 at 6/21/14 9:15 PM: - Okay great, I confirmed this is fixed by SPARK-1777. I tested as follows: {code} SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_HIVE=true sbt/sbt clean assembly/assembly sc.textFile("/tmp/wiki_links").cache.count {code} The wiki_links file was downloaded and extracted from here: https://drive.google.com/file/d/0BwrkCxCycBCyTmlWYXp0MmdEakk/edit?usp=sharing This worked with the proposed patch but failed with the default build. was (Author: pwendell): Okay great, I confirmed this is fixed by SPARK-1777. I tested as follows: {code} SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_HIVE=true sbt/sbt clean assembly/assembly sc.textFile("/tmp/wiki_links").cache.count {code} The wiki_links file was downloaded and extracted from here: This worked with the proposed patch but failed with the default build. Local spark-shell Runs Out of Memory With Default Settings -- Key: SPARK-1392 URL: https://issues.apache.org/jira/browse/SPARK-1392 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Environment: OS X 10.9.2, Java 1.7.0_51, Scala 2.10.3 Reporter: Pat McDonough Using the spark-0.9.0 Hadoop2 binary from the project download page, running the spark-shell locally in the out-of-the-box configuration, and attempting to cache all the attached data, Spark OOMs with: java.lang.OutOfMemoryError: GC overhead limit exceeded. You can work around the issue by either decreasing spark.storage.memoryFraction or increasing SPARK_MEM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1392) Local spark-shell Runs Out of Memory With Default Settings
[ https://issues.apache.org/jira/browse/SPARK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1392. Resolution: Duplicate Local spark-shell Runs Out of Memory With Default Settings -- Key: SPARK-1392 URL: https://issues.apache.org/jira/browse/SPARK-1392 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Environment: OS X 10.9.2, Java 1.7.0_51, Scala 2.10.3 Reporter: Pat McDonough Using the spark-0.9.0 Hadoop2 binary from the project download page, running the spark-shell locally in the out-of-the-box configuration, and attempting to cache all the attached data, Spark OOMs with: java.lang.OutOfMemoryError: GC overhead limit exceeded. You can work around the issue by either decreasing spark.storage.memoryFraction or increasing SPARK_MEM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1996) Remove use of special Maven repo for Akka
[ https://issues.apache.org/jira/browse/SPARK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1996. Resolution: Fixed Fix Version/s: (was: 1.0.1) 1.1.0 Fixed via: https://github.com/apache/spark/pull/1170/files Remove use of special Maven repo for Akka - Key: SPARK-1996 URL: https://issues.apache.org/jira/browse/SPARK-1996 Project: Spark Issue Type: Improvement Components: Documentation, Spark Core Reporter: Matei Zaharia Assignee: Sean Owen Fix For: 1.1.0 According to http://doc.akka.io/docs/akka/2.3.3/intro/getting-started.html Akka is now published to Maven Central, so our documentation and POM files don't need to use the old Akka repo. It will be one less step for users to worry about. -- This message was sent by Atlassian JIRA (v6.2#6252)
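As a hedged illustration of what the change means for users (the dependency coordinates come from the linked Akka docs; the commented-out resolver line is the one being removed, not part of the patch itself):
{code}
// Illustrative sbt fragment: Akka 2.3.x is published to Maven Central, so the
// dedicated Akka resolver can simply be dropped from build definitions.
// resolvers += "Akka Repository" at "http://repo.akka.io/releases/"  // no longer needed
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.3.3"
{code}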
[jira] [Created] (SPARK-2230) Improvements to Jenkins QA Harness
Patrick Wendell created SPARK-2230: -- Summary: Improvements to Jenkins QA Harness Key: SPARK-2230 URL: https://issues.apache.org/jira/browse/SPARK-2230 Project: Spark Issue Type: Umbrella Components: Project Infra Reporter: Patrick Wendell An umbrella for some improvements I'd like to do. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2231) dev/run-tests should include YARN and use a recent Hadoop version
Patrick Wendell created SPARK-2231: -- Summary: dev/run-tests should include YARN and use a recent Hadoop version Key: SPARK-2231 URL: https://issues.apache.org/jira/browse/SPARK-2231 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2232) Fix Jenkins tests in Maven
Patrick Wendell created SPARK-2232: -- Summary: Fix Jenkins tests in Maven Key: SPARK-2232 URL: https://issues.apache.org/jira/browse/SPARK-2232 Project: Spark Issue Type: Sub-task Reporter: Patrick Wendell It appears Maven tests are failing under the newer Hadoop configurations. We need to go through and make sure all the Spark master build configurations are passing. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Master%20Matrix/ -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1804) Mark 0.9.1 as released in JIRA
[ https://issues.apache.org/jira/browse/SPARK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1804. Resolution: Fixed Mark 0.9.1 as released in JIRA -- Key: SPARK-1804 URL: https://issues.apache.org/jira/browse/SPARK-1804 Project: Spark Issue Type: Task Components: Documentation, Project Infra Affects Versions: 0.9.1 Reporter: Stevo Slavic Priority: Trivial 0.9.1 has been released but is labeled as unreleased in the SPARK JIRA project. Please have it marked as released. Also, please document that step in the release process. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1803) Rename test resources to be compatible with Windows FS
[ https://issues.apache.org/jira/browse/SPARK-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1803. Resolution: Fixed Resolved via: https://github.com/apache/spark/pull/739 Rename test resources to be compatible with Windows FS -- Key: SPARK-1803 URL: https://issues.apache.org/jira/browse/SPARK-1803 Project: Spark Issue Type: Task Components: Windows Affects Versions: 0.9.1 Reporter: Stevo Slavic Priority: Trivial {{git clone}} of the master branch and then {{git status}} on Windows reports untracked files: {noformat} # Untracked files: # (use "git add <file>..." to include in what will be committed) # # sql/hive/src/test/resources/golden/Column pruning # sql/hive/src/test/resources/golden/Partition pruning # sql/hive/src/test/resources/golden/Partiton pruning {noformat} The actual issue is that several files under the {{sql/hive/src/test/resources/golden}} directory have a colon in their names, which is an invalid character in file names on Windows. Please have these files renamed to Windows-compatible file names. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-721) Fix remaining deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-721. --- Resolution: Fixed Assignee: (was: Gary Struthers) Fix remaining deprecation warnings -- Key: SPARK-721 URL: https://issues.apache.org/jira/browse/SPARK-721 Project: Spark Issue Type: Improvement Affects Versions: 0.7.1 Reporter: Josh Rosen Priority: Minor Labels: Starter The recent patch to re-enable deprecation warnings fixed many of them, but there are still a few left; it would be nice to fix them. For example, here's one in RDDSuite: {code} [warn] /Users/joshrosen/Documents/spark/spark/core/src/test/scala/spark/RDDSuite.scala:32: method mapPartitionsWithSplit in class RDD is deprecated: use mapPartitionsWithIndex [warn] val partitionSumsWithSplit = nums.mapPartitionsWithSplit { [warn] ^ [warn] one warning found {code} Also, it looks like Scala 2.9 added a second deprecatedSince parameter to @deprecated. We didn't fill this in, which causes some additional warnings: {code} [warn] /Users/joshrosen/Documents/spark/spark/core/src/main/scala/spark/RDD.scala:370: @deprecated now takes two arguments; see the scaladoc. [warn] @deprecated("use mapPartitionsWithIndex") [warn] ^ [warn] one warning found {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2233) make-distribution script should list the git hash in the RELEASE file
Patrick Wendell created SPARK-2233: -- Summary: make-distribution script should list the git hash in the RELEASE file Key: SPARK-2233 URL: https://issues.apache.org/jira/browse/SPARK-2233 Project: Spark Issue Type: Bug Reporter: Patrick Wendell If someone is creating a distribution and also has a version of Spark that has a .git folder in it, we should list the current git hash and put that in the RELEASE file. -- This message was sent by Atlassian JIRA (v6.2#6252)
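The real script is bash, but the information being captured is simple; a hedged Scala sketch of the idea (the output line format is hypothetical):
{code}
import scala.sys.process._

// Illustrative: capture the current commit so it can be written into RELEASE.
// Requires running inside a checkout that still has its .git folder.
val gitHash = Seq("git", "rev-parse", "HEAD").!!.trim
println(s"Built from git revision $gitHash") // hypothetical RELEASE file line
{code}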
[jira] [Updated] (SPARK-2233) make-distribution script should list the git hash in the RELEASE file
[ https://issues.apache.org/jira/browse/SPARK-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2233: --- Issue Type: Improvement (was: Bug) make-distribution script should list the git hash in the RELEASE file - Key: SPARK-2233 URL: https://issues.apache.org/jira/browse/SPARK-2233 Project: Spark Issue Type: Improvement Components: Project Infra Reporter: Patrick Wendell Priority: Minor Labels: starter If someone is creating a distribution and also has a version of Spark that has a .git folder in it, we should list the current git hash and put that in the RELEASE file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2231) dev/run-tests should include YARN and use a recent Hadoop version
[ https://issues.apache.org/jira/browse/SPARK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2231. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1175 [https://github.com/apache/spark/pull/1175] dev/run-tests should include YARN and use a recent Hadoop version - Key: SPARK-2231 URL: https://issues.apache.org/jira/browse/SPARK-2231 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: Patrick Wendell Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2034) KafkaInputDStream doesn't close resources and may prevent JVM shutdown
[ https://issues.apache.org/jira/browse/SPARK-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2034: --- Assignee: Sean Owen KafkaInputDStream doesn't close resources and may prevent JVM shutdown -- Key: SPARK-2034 URL: https://issues.apache.org/jira/browse/SPARK-2034 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.0 Reporter: Sean Owen Assignee: Sean Owen Fix For: 1.0.1, 1.1.0 Tobias noted today on the mailing list: {quote} I am trying to use Spark Streaming with Kafka, which works like a charm -- except for shutdown. When I run my program with sbt run-main, sbt will never exit, because there are two non-daemon threads left that don't die. I created a minimal example at https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-kafkadoesntshutdown-scala. It starts a StreamingContext and does nothing more than connecting to a Kafka server and printing what it receives. Using the `future { ... }` construct, I shut down the StreamingContext after some seconds and then print the difference between the threads at start time and at end time. The output can be found at https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-output1. There are a number of threads remaining that will prevent sbt from exiting. When I replace `KafkaUtils.createStream(...)` with a call that does exactly the same, except that it calls `consumerConnector.shutdown()` in `KafkaReceiver.onStop()` (which it should, IMO), the output is as shown at https://gist.github.com/tgpfeiffer/b1e765064e983449c6b6#file-output2. Does anyone have *any* idea what is going on here and why the program doesn't shut down properly? The behavior is the same with both kafka 0.8.0 and 0.8.1.1, by the way. {quote} Something similar was noted last year: http://mail-archives.apache.org/mod_mbox/spark-dev/201309.mbox/%3c1380220041.2428.yahoomail...@web160804.mail.bf1.yahoo.com%3E KafkaInputDStream doesn't close ConsumerConnector in onStop(), and does not close the Executor it creates. The latter leaves non-daemon threads and can prevent the JVM from shutting down even if streaming is closed properly. -- This message was sent by Atlassian JIRA (v6.2#6252)
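A hedged sketch of the cleanup being requested (names and structure assumed; simplified relative to the real KafkaReceiver):
{code}
import java.util.concurrent.ExecutorService
import kafka.consumer.ConsumerConnector

// Hypothetical onStop() cleanup: close the Kafka connector and the thread
// pool so no non-daemon threads outlive the StreamingContext.
def stopReceiver(connector: ConsumerConnector, pool: ExecutorService): Unit = {
  connector.shutdown()  // releases Kafka's consumer threads
  pool.shutdownNow()    // stops the executor's non-daemon worker threads
}
{code}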
[jira] [Updated] (SPARK-2156) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks
[ https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2156: --- Target Version/s: 0.9.2, 1.0.1, 1.1.0 (was: 0.9.2, 1.0.1) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks -- Key: SPARK-2156 URL: https://issues.apache.org/jira/browse/SPARK-2156 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.1, 1.0.0 Environment: AWS EC2 1 master 2 slaves with the instance type of r3.2xlarge Reporter: Chen Jin Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.1 Original Estimate: 504h Remaining Estimate: 504h I have done some experiments when the frameSize is around 10MB. 1) spark.akka.frameSize = 10 If one of the partition sizes is very close to 10MB, say 9.97MB, the execution blocks without any exception or warning. The worker finishes the task and sends the serialized result, then throws an exception saying the Hadoop IPC client connection stopped (visible after changing the logging to debug level). However, the master never receives the results and the program just hangs. But if the sizes of all the partitions are less than some number between 9.96MB and 9.97MB, the program works fine. 2) spark.akka.frameSize = 9 When the partition size is just a little bit smaller than 9MB, it fails as well. This bug behavior is not exactly what SPARK-1112 is about. -- This message was sent by Atlassian JIRA (v6.2#6252)
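For reference, a minimal sketch of the reported setup (the frame size is from the report; the master URL and app name are illustrative assumptions, not the reporter's actual test):
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative reproduction: a 10 MB frame with task results just under that
// limit is the condition reported to hang.
val conf = new SparkConf()
  .setMaster("spark://master:7077") // hypothetical cluster URL
  .setAppName("frameSize-repro")
  .set("spark.akka.frameSize", "10") // MB; the default in the affected versions
val sc = new SparkContext(conf)
{code}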
[jira] [Updated] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1112: --- Fix Version/s: 1.0.1 When spark.akka.frameSize > 10, task results bigger than 10MiB block execution -- Key: SPARK-1112 URL: https://issues.apache.org/jira/browse/SPARK-1112 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Guillaume Pitel Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.1 When I set the spark.akka.frameSize to something over 10, the messages sent from the executors to the driver completely block the execution if the message is bigger than 10MiB and smaller than the frameSize (if it's above the frameSize, it's ok). The workaround is to set the spark.akka.frameSize to 10. In this case, since 0.8.1, the blockManager deals with the data to be sent. It seems slower than Akka direct messages, though. The configuration seems to be correctly read (see actorSystemConfig.txt), so I don't see where the 10MiB could come from. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2156) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks
[ https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2156: --- Fix Version/s: 1.0.1 When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks -- Key: SPARK-2156 URL: https://issues.apache.org/jira/browse/SPARK-2156 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.1, 1.0.0 Environment: AWS EC2 1 master 2 slaves with the instance type of r3.2xlarge Reporter: Chen Jin Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.1 Original Estimate: 504h Remaining Estimate: 504h I have done some experiments when the frameSize is around 10MB. 1) spark.akka.frameSize = 10 If one of the partition sizes is very close to 10MB, say 9.97MB, the execution blocks without any exception or warning. The worker finishes the task and sends the serialized result, then throws an exception saying the Hadoop IPC client connection stopped (visible after changing the logging to debug level). However, the master never receives the results and the program just hangs. But if the sizes of all the partitions are less than some number between 9.96MB and 9.97MB, the program works fine. 2) spark.akka.frameSize = 9 When the partition size is just a little bit smaller than 9MB, it fails as well. This bug behavior is not exactly what SPARK-1112 is about. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1112: --- Target Version/s: 0.9.2, 1.0.1, 1.1.0 (was: 0.9.2, 1.0.1) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution -- Key: SPARK-1112 URL: https://issues.apache.org/jira/browse/SPARK-1112 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Guillaume Pitel Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.1 When I set the spark.akka.frameSize to something over 10, the messages sent from the executors to the driver completely block the execution if the message is bigger than 10MiB and smaller than the frameSize (if it's above the frameSize, it's ok). The workaround is to set the spark.akka.frameSize to 10. In this case, since 0.8.1, the blockManager deals with the data to be sent. It seems slower than Akka direct messages, though. The configuration seems to be correctly read (see actorSystemConfig.txt), so I don't see where the 10MiB could come from. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2156) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks
[ https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040360#comment-14040360 ] Patrick Wendell commented on SPARK-2156: This is fixed in the 1.0 branch via: https://github.com/apache/spark/pull/1172 When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks -- Key: SPARK-2156 URL: https://issues.apache.org/jira/browse/SPARK-2156 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.1, 1.0.0 Environment: AWS EC2 1 master 2 slaves with the instance type of r3.2xlarge Reporter: Chen Jin Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.1 Original Estimate: 504h Remaining Estimate: 504h I have done some experiments when the frameSize is around 10MB. 1) spark.akka.frameSize = 10 If one of the partition sizes is very close to 10MB, say 9.97MB, the execution blocks without any exception or warning. The worker finishes the task and sends the serialized result, then throws an exception saying the Hadoop IPC client connection stopped (visible after changing the logging to debug level). However, the master never receives the results and the program just hangs. But if the sizes of all the partitions are less than some number between 9.96MB and 9.97MB, the program works fine. 2) spark.akka.frameSize = 9 When the partition size is just a little bit smaller than 9MB, it fails as well. This bug behavior is not exactly what SPARK-1112 is about. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2241) EC2 script should handle quoted arguments correctly
[ https://issues.apache.org/jira/browse/SPARK-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2241: --- Description: We should pass quoted arguments correctly to the underlying ec2 script in spark-ec2 (was: We should pass quoted arguments correctly to the underlying ec2 in spark-ec2) EC2 script should handle quoted arguments correctly --- Key: SPARK-2241 URL: https://issues.apache.org/jira/browse/SPARK-2241 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 0.9.1, 1.0.0 Reporter: Patrick Wendell We should pass quoted arguments correctly to the underlying ec2 script in spark-ec2 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2241) EC2 script should handle quoted arguments correctly
Patrick Wendell created SPARK-2241: -- Summary: EC2 script should handle quoted arguments correctly Key: SPARK-2241 URL: https://issues.apache.org/jira/browse/SPARK-2241 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 1.0.0, 0.9.1 Reporter: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2241) EC2 script should handle quoted arguments correctly
[ https://issues.apache.org/jira/browse/SPARK-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2241: --- Description: We should pass quoted arguments correctly to the underlying ec2 in spark-ec2 EC2 script should handle quoted arguments correctly --- Key: SPARK-2241 URL: https://issues.apache.org/jira/browse/SPARK-2241 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 0.9.1, 1.0.0 Reporter: Patrick Wendell We should pass quoted arguments correctly to the underlying ec2 in spark-ec2 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2241) EC2 script should handle quoted arguments correctly
[ https://issues.apache.org/jira/browse/SPARK-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2241. Resolution: Fixed Fix Version/s: 1.1.0 0.9.2 1.0.1 Issue resolved by pull request 1169 [https://github.com/apache/spark/pull/1169] EC2 script should handle quoted arguments correctly --- Key: SPARK-2241 URL: https://issues.apache.org/jira/browse/SPARK-2241 Project: Spark Issue Type: Bug Components: EC2 Affects Versions: 0.9.1, 1.0.0 Reporter: Patrick Wendell Fix For: 1.0.1, 0.9.2, 1.1.0 We should pass quoted arguments correctly to the underlying ec2 script in spark-ec2 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2166) Enumerating instances to be terminated before prompting the users to continue.
[ https://issues.apache.org/jira/browse/SPARK-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2166. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 270 [https://github.com/apache/spark/pull/270] Enumerating instances to be terminated before prompting the users to continue. -- Key: SPARK-2166 URL: https://issues.apache.org/jira/browse/SPARK-2166 Project: Spark Issue Type: Improvement Components: EC2 Affects Versions: 0.9.0, 1.0.0 Reporter: Jean-Martin Archer Assignee: Jean-Martin Archer Priority: Minor Fix For: 1.1.0 Original Estimate: 0h Remaining Estimate: 0h When destroying a cluster, the user will be prompted for confirmation without first showing which instances will be terminated. Pull Request: https://github.com/apache/spark/pull/270#issuecomment-46341975 This pull request will list the EC2 instances before destroying the cluster. This was added because it can be scary to destroy EC2 instances without knowing which ones will be affected. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2228) onStageSubmitted is not properly called so NoSuchElement will be thrown in onStageCompleted
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2228: --- Target Version/s: 1.0.1, 1.1.0 onStageSubmitted is not properly called so NoSuchElement will be thrown in onStageCompleted - Key: SPARK-2228 URL: https://issues.apache.org/jira/browse/SPARK-2228 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Baoxu Shi We are using `saveAsObjectFile` and `objectFile` to cut off lineage during iterative computing, but after several hundred iterations there will be a `NoSuchElementsError`. We checked the code and located the problem in `org.apache.spark.ui.jobs.JobProgressListener`. When `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, but it does exist in the other HashMaps. So we think `onStageSubmitted` is not properly called: `Spark` did add a stage but failed to send the message to the listeners. When the `finish` message is sent to the listeners, the error occurs. This problem will cause a huge number of `active stages` to show in `SparkUI`, which is really annoying. But it may not affect the final result, according to the result of my testing code. I'm willing to help solve this problem; any idea which part I should change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it looks fine to me. FYI, here is the test code that can reproduce the problem. I do not know how to put code here with highlighting, so I put the code on gist to keep the issue clean. https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd -- This message was sent by Atlassian JIRA (v6.2#6252)
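For reference, a sketch of the lineage-cutting pattern the reporter describes (the helper name, path, and element type are illustrative assumptions):
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical lineage-cutting helper: writing the RDD out and reading it
// back gives the next iteration a fresh, short lineage, at the cost of a
// round trip through storage.
def cutLineage(sc: SparkContext, rdd: RDD[(Long, Double)], iter: Int): RDD[(Long, Double)] = {
  val path = s"/tmp/lineage_cut_$iter" // hypothetical checkpoint location
  rdd.saveAsObjectFile(path)
  sc.objectFile[(Long, Double)](path)
}
{code}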
[jira] [Updated] (SPARK-2228) onStageSubmitted is not properly called so NoSuchElement will be thrown in onStageCompleted
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2228: --- Affects Version/s: (was: 1.1.0) onStageSubmitted is not properly called so NoSuchElement will be thrown in onStageCompleted - Key: SPARK-2228 URL: https://issues.apache.org/jira/browse/SPARK-2228 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Baoxu Shi We are using `saveAsObjectFile` and `objectFile` to cut off lineage during iterative computing, but after several hundred iterations there will be a `NoSuchElementsError`. We checked the code and located the problem in `org.apache.spark.ui.jobs.JobProgressListener`. When `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, but it does exist in the other HashMaps. So we think `onStageSubmitted` is not properly called: `Spark` did add a stage but failed to send the message to the listeners. When the `finish` message is sent to the listeners, the error occurs. This problem will cause a huge number of `active stages` to show in `SparkUI`, which is really annoying. But it may not affect the final result, according to the result of my testing code. I'm willing to help solve this problem; any idea which part I should change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it looks fine to me. FYI, here is the test code that can reproduce the problem. I do not know how to put code here with highlighting, so I put the code on gist to keep the issue clean. https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2118) If tools jar is not present, MIMA build should exit with an exception
[ https://issues.apache.org/jira/browse/SPARK-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2118. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1068 [https://github.com/apache/spark/pull/1068] If tools jar is not present, MIMA build should exit with an exception - Key: SPARK-2118 URL: https://issues.apache.org/jira/browse/SPARK-2118 Project: Spark Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Prashant Sharma Fix For: 1.1.0 Right now dev/mima will just produce a bunch of warnings since generating the excludes fails. If the tools jar is not present, it should tell the user to run sbt/sbt assembly and exit nonzero. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1768) History Server enhancements
[ https://issues.apache.org/jira/browse/SPARK-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1768. Resolution: Fixed Issue resolved by pull request 718 [https://github.com/apache/spark/pull/718] History Server enhancements --- Key: SPARK-1768 URL: https://issues.apache.org/jira/browse/SPARK-1768 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0 Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Fix For: 1.1.0 The history server currently has some limitations; the one that currently concerns me the most is that it limits the number of applications it will show, to avoid having to hold all applications in memory. It would be better if the code were smarter and able to show any application available in the history storage. Also, thinking forward a little bit (I'm thinking SPARK-1537), it would be nice to separate the serving logic from the logic to access app log data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
[ https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2204: --- Target Version/s: 1.0.1, 1.1.0 Scheduler for Mesos in fine-grained mode launches tasks on wrong executors -- Key: SPARK-2204 URL: https://issues.apache.org/jira/browse/SPARK-2204 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Reporter: Sebastien Rainville Priority: Blocker MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning task lists in the same order as the offers it was passed, but in the current implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid assigning the tasks always to the same executors. The result is that the tasks are launched on the wrong executors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2264) CachedTableSuite SQL Tests are Failing
Patrick Wendell created SPARK-2264: -- Summary: CachedTableSuite SQL Tests are Failing Key: SPARK-2264 URL: https://issues.apache.org/jira/browse/SPARK-2264 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Assignee: Michael Armbrust Priority: Blocker {code} [info] CachedTableSuite: [info] - read from cached table and uncache *** FAILED *** [info] java.lang.RuntimeException: Table Not Found: testData [info] at scala.sys.package$.error(package.scala:27) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:64) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:64) [info] at scala.Option.getOrElse(Option.scala:120) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:64) [info] at org.apache.spark.sql.SQLContext.table(SQLContext.scala:185) [info] at org.apache.spark.sql.CachedTableSuite$$anonfun$1.apply$mcV$sp(CachedTableSuite.scala:43) [info] at org.apache.spark.sql.CachedTableSuite$$anonfun$1.apply(CachedTableSuite.scala:27) [info] at org.apache.spark.sql.CachedTableSuite$$anonfun$1.apply(CachedTableSuite.scala:27) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22) [info] ... [info] - correct error on uncache of non-cached table *** FAILED *** [info] Expected exception java.lang.IllegalArgumentException to be thrown, but java.lang.RuntimeException was thrown. (CachedTableSuite.scala:55) [info] - SELECT Star Cached Table *** FAILED *** [info] java.lang.RuntimeException: Table Not Found: testData [info] at scala.sys.package$.error(package.scala:27) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:64) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:64) [info] at scala.Option.getOrElse(Option.scala:120) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:64) [info] at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$1.applyOrElse(Analyzer.scala:67) [info] at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$1.applyOrElse(Analyzer.scala:65) [info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165) [info] at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183) [info] at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) [info] ... [info] - Self-join cached *** FAILED *** [info] java.lang.RuntimeException: Table Not Found: testData [info] at scala.sys.package$.error(package.scala:27) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:64) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:64) [info] at scala.Option.getOrElse(Option.scala:120) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:64) [info] at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$1.applyOrElse(Analyzer.scala:67) [info] at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$1.applyOrElse(Analyzer.scala:65) [info] at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165) [info] at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:183) [info] at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) [info] ... 
[info] - 'CACHE TABLE' and 'UNCACHE TABLE' SQL statement *** FAILED *** [info] java.lang.RuntimeException: Table Not Found: testData [info] at scala.sys.package$.error(package.scala:27) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:64) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:64) [info] at scala.Option.getOrElse(Option.scala:120) [info] at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:64) [info] at org.apache.spark.sql.SQLContext.cacheTable(SQLContext.scala:189) [info] at org.apache.spark.sql.execution.CacheCommand.sideEffectResult$lzycompute(commands.scala:110) [info] at org.apache.spark.sql.execution.CacheCommand.sideEffectResult(commands.scala:108) [info] at org.apache.spark.sql.execution.CacheCommand.execute(commands.scala:118) [info] at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:322) [info] ... {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2269) Clean up and add unit tests for resourceOffers in MesosSchedulerBackend
Patrick Wendell created SPARK-2269: -- Summary: Clean up and add unit tests for resourceOffers in MesosSchedulerBackend Key: SPARK-2269 URL: https://issues.apache.org/jira/browse/SPARK-2269 Project: Spark Issue Type: Bug Components: Mesos Reporter: Patrick Wendell This function could be simplified a bit. We could re-write it without offerableIndices or creating the mesosTasks array as large as the offer list. There is a lot of logic around making sure you get the correct index into mesosTasks and offers, really we should just build mesosTasks directly from the offers we get back. To associate the tasks we are launching with the offers we can just create a hashMap from the slaveId to the original offer. The basic logic of the function is that you take the mesos offers, convert them to spark offers, then convert the results back. One thing we should check is whether Mesos guarantees that it won't give two offers for the same worker. That would make things much more complicated. -- This message was sent by Atlassian JIRA (v6.2#6252)
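A hedged sketch of the association step suggested above, using the Mesos protobuf API (the helper name is an assumption, not the eventual patch):
{code}
import org.apache.mesos.Protos.Offer
import scala.collection.JavaConverters._

// Illustrative: key each Mesos offer by its slave ID so launched tasks can be
// matched back to the offer they came from, without index bookkeeping.
def offersBySlaveId(offers: java.util.List[Offer]): Map[String, Offer] =
  offers.asScala.map(o => o.getSlaveId.getValue -> o).toMap
{code}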
[jira] [Commented] (SPARK-2156) When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks
[ https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042962#comment-14042962 ] Patrick Wendell commented on SPARK-2156: Fixed in 1.1.0 via: https://github.com/apache/spark/pull/1132 When the size of serialized results for one partition is slightly smaller than 10MB (the default akka.frameSize), the execution blocks -- Key: SPARK-2156 URL: https://issues.apache.org/jira/browse/SPARK-2156 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.1, 1.0.0 Environment: AWS EC2 1 master 2 slaves with the instance type of r3.2xlarge Reporter: Chen Jin Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.1, 1.1.0 Original Estimate: 504h Remaining Estimate: 504h I have done some experiments when the frameSize is around 10MB. 1) spark.akka.frameSize = 10 If one of the partition sizes is very close to 10MB, say 9.97MB, the execution blocks without any exception or warning. The worker finishes the task and sends the serialized result, then throws an exception saying the Hadoop IPC client connection stopped (visible after changing the logging to debug level). However, the master never receives the results and the program just hangs. But if the sizes of all the partitions are less than some number between 9.96MB and 9.97MB, the program works fine. 2) spark.akka.frameSize = 9 When the partition size is just a little bit smaller than 9MB, it fails as well. This bug behavior is not exactly what SPARK-1112 is about. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2248) spark.default.parallelism does not apply in local mode
[ https://issues.apache.org/jira/browse/SPARK-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2248. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1194 [https://github.com/apache/spark/pull/1194] spark.default.parallelism does not apply in local mode -- Key: SPARK-2248 URL: https://issues.apache.org/jira/browse/SPARK-2248 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Matei Zaharia Assignee: Guoqiang Li Priority: Trivial Labels: Starter Fix For: 1.1.0 LocalBackend.defaultParallelism ignores the spark.default.parallelism property, unlike the other SchedulerBackends. We should make it respect this setting for consistency. -- This message was sent by Atlassian JIRA (v6.2#6252)
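A small illustration of the consistency point (values illustrative; with the fix, local mode should honor the property the way the cluster backends already do):
{code}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("parallelism-check")
  .set("spark.default.parallelism", "8")
val sc = new SparkContext(conf)
assert(sc.defaultParallelism == 8) // previously LocalBackend ignored the setting
{code}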
[jira] [Commented] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043738#comment-14043738 ] Patrick Wendell commented on SPARK-1112: [~reachbach] If you are running in standalone mode, it might work if you go to every node in your cluster and add the following to spark-env.sh: {code} SPARK_JAVA_OPTS="-Dspark.akka.frameSize=XXX" {code} However, this workaround will only work if every job in your cluster is using the same frame size (XXX). The main recommendation is to upgrade to 1.0.1. We are very conservative about what we merge into maintenance branches, so we recommend users upgrade immediately once we release them. When spark.akka.frameSize > 10, task results bigger than 10MiB block execution -- Key: SPARK-1112 URL: https://issues.apache.org/jira/browse/SPARK-1112 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Guillaume Pitel Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.1, 1.1.0 When I set the spark.akka.frameSize to something over 10, the messages sent from the executors to the driver completely block the execution if the message is bigger than 10MiB and smaller than the frameSize (if it's above the frameSize, it's ok). The workaround is to set the spark.akka.frameSize to 10. In this case, since 0.8.1, the blockManager deals with the data to be sent. It seems slower than Akka direct messages, though. The configuration seems to be correctly read (see actorSystemConfig.txt), so I don't see where the 10MiB could come from. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-1112: This is not resolved yet because it needs to be backported into 0.9. When spark.akka.frameSize > 10, task results bigger than 10MiB block execution -- Key: SPARK-1112 URL: https://issues.apache.org/jira/browse/SPARK-1112 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0, 1.0.0 Reporter: Guillaume Pitel Assignee: Xiangrui Meng Priority: Blocker Fix For: 1.0.1, 1.1.0 When I set the spark.akka.frameSize to something over 10, the messages sent from the executors to the driver completely block the execution if the message is bigger than 10MiB and smaller than the frameSize (if it's above the frameSize, it's ok). The workaround is to set the spark.akka.frameSize to 10. In this case, since 0.8.1, the blockManager deals with the data to be sent. It seems slower than Akka direct messages, though. The configuration seems to be correctly read (see actorSystemConfig.txt), so I don't see where the 10MiB could come from. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2269) Clean up and add unit tests for resourceOffers in MesosSchedulerBackend
[ https://issues.apache.org/jira/browse/SPARK-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2269: --- Description: This function could be simplified a bit. We could re-write it without offerableIndices or creating the mesosTasks array as large as the offer list. There is a lot of logic around making sure you get the correct index into mesosTasks and offers, really we should just build mesosTasks directly from the offers we get back. To associate the tasks we are launching with the offers we can just create a hashMap from the slaveId to the original offer. The basic logic of the function is that you take the mesos offers, convert them to spark offers, then convert the results back. One reason I think it might be designed as it is now is to deal with the case where Mesos gives multiple offers for a single slave. I checked directly with the Mesos team and they said this won't ever happen, you'll get at most one offer per mesos slave within a set of offers. was: This function could be simplified a bit. We could re-write it without offerableIndices or creating the mesosTasks array as large as the offer list. There is a lot of logic around making sure you get the correct index into mesosTasks and offers, really we should just build mesosTasks directly from the offers we get back. To associate the tasks we are launching with the offers we can just create a hashMap from the slaveId to the original offer. The basic logic of the function is that you take the mesos offers, convert them to spark offers, then convert the results back. One thing we should check is whether Mesos guarantees that it won't give two offers for the same worker. That would make things much more complicated. Clean up and add unit tests for resourceOffers in MesosSchedulerBackend --- Key: SPARK-2269 URL: https://issues.apache.org/jira/browse/SPARK-2269 Project: Spark Issue Type: Bug Components: Mesos Reporter: Patrick Wendell This function could be simplified a bit. We could re-write it without offerableIndices or creating the mesosTasks array as large as the offer list. There is a lot of logic around making sure you get the correct index into mesosTasks and offers, really we should just build mesosTasks directly from the offers we get back. To associate the tasks we are launching with the offers we can just create a hashMap from the slaveId to the original offer. The basic logic of the function is that you take the mesos offers, convert them to spark offers, then convert the results back. One reason I think it might be designed as it is now is to deal with the case where Mesos gives multiple offers for a single slave. I checked directly with the Mesos team and they said this won't ever happen, you'll get at most one offer per mesos slave within a set of offers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2270) Kryo cannot serialize results returned by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo is used)
[ https://issues.apache.org/jira/browse/SPARK-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2270. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Issue resolved by pull request 1206 [https://github.com/apache/spark/pull/1206] Kryo cannot serialize results returned by asJavaIterable (and thus groupBy/cogroup are broken in Java APIs when Kryo is used) - Key: SPARK-2270 URL: https://issues.apache.org/jira/browse/SPARK-2270 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Reynold Xin Assignee: Reynold Xin Priority: Critical Fix For: 1.0.1, 1.1.0 The combination of the Kryo serializer and the Java API can lead to the following exception in groupBy/groupByKey/cogroup: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Exception while deserializing and fetching task: java.lang.UnsupportedOperationException org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033) org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017) org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015) scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015) org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633) org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633) scala.Option.foreach(Option.scala:236) org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633) org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207) akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) akka.actor.ActorCell.invoke(ActorCell.scala:456) akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) akka.dispatch.Mailbox.run(Mailbox.scala:219) akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} or {code} 14/06/24 16:38:09 ERROR TaskResultGetter: Exception while getting task result java.lang.UnsupportedOperationException at java.util.AbstractCollection.add(AbstractCollection.java:260) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109) at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at carbonite.serializer$mk_collection_reader$fn__50.invoke(serializer.clj:57) at clojure.lang.Var.invoke(Var.java:383) at carbonite.ClojureVecSerializer.read(ClojureVecSerializer.java:17) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:338) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293) at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729) at
org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:144) at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79) at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:480) at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:316) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:68) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1213) at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at
[jira] [Resolved] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
[ https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2204. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Issue resolved by pull request 1140 [https://github.com/apache/spark/pull/1140] Scheduler for Mesos in fine-grained mode launches tasks on wrong executors -- Key: SPARK-2204 URL: https://issues.apache.org/jira/browse/SPARK-2204 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Reporter: Sebastien Rainville Priority: Blocker Fix For: 1.0.1, 1.1.0 MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning task lists in the same order as the offers it was passed, but in the current implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid assigning the tasks always to the same executors. The result is that the tasks are launched on the wrong executors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-1912) Compression memory issue during reduce
[ https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1912: --- Fix Version/s: 0.9.2 Compression memory issue during reduce -- Key: SPARK-1912 URL: https://issues.apache.org/jira/browse/SPARK-1912 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Wenchen Fan Assignee: Wenchen Fan Fix For: 0.9.2, 1.0.1, 1.1.0 When we need to read a compressed block, we will first create a compression stream instance (LZF or Snappy) and use it to wrap that block. Let's say a reducer task needs to read 1000 local shuffle blocks: it will first prepare to read those 1000 blocks, which means creating 1000 compression stream instances to wrap them. But the initialization of a compression instance allocates some memory, and when we have many compression instances at the same time, it is a problem. Actually the reducer reads the shuffle blocks one by one, so why do we create all the compression instances up front? Can we do it lazily, creating the compression instance for a block when it is first read? -- This message was sent by Atlassian JIRA (v6.2#6252)
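A minimal sketch of the lazy-initialization idea (the wrapper is hypothetical and simplified; not the actual patch):
{code}
import java.io.InputStream
import org.apache.spark.io.CompressionCodec

// Hypothetical wrapper: the compression stream (and its buffers) is only
// allocated when the block is actually read, not when the read is prepared.
class LazyCompressedBlock(raw: () => InputStream, codec: CompressionCodec) {
  lazy val stream: InputStream = codec.compressedInputStream(raw())
}
{code}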
[jira] [Updated] (SPARK-1912) Compression memory issue during reduce
[ https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1912: --- Fix Version/s: 1.0.1 Compression memory issue during reduce -- Key: SPARK-1912 URL: https://issues.apache.org/jira/browse/SPARK-1912 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Wenchen Fan Assignee: Wenchen Fan Fix For: 0.9.2, 1.0.1, 1.1.0 When we need to read a compressed block, we will first create a compression stream instance (LZF or Snappy) and use it to wrap that block. Let's say a reducer task needs to read 1000 local shuffle blocks: it will first prepare to read those 1000 blocks, which means creating 1000 compression stream instances to wrap them. But the initialization of a compression instance allocates some memory, and when we have many compression instances at the same time, it is a problem. Actually the reducer reads the shuffle blocks one by one, so why do we create all the compression instances up front? Can we do it lazily, creating the compression instance for a block when it is first read? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2279) JavaSparkContext should allow creation of EmptyRDD
[ https://issues.apache.org/jira/browse/SPARK-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044063#comment-14044063 ] Patrick Wendell commented on SPARK-2279: I think `EmptyRDD` is a mostly internal class. Can you just parallelize an empty collection? JavaSparkContext should allow creation of EmptyRDD -- Key: SPARK-2279 URL: https://issues.apache.org/jira/browse/SPARK-2279 Project: Spark Issue Type: New Feature Components: Java API Affects Versions: 1.0.0 Reporter: Hans Uhlig The Scala implementation currently supports creation of an EmptyRDD. Java does not. -- This message was sent by Atlassian JIRA (v6.2#6252)
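A sketch of the suggested alternative, written in Scala against the Java API (the helper name and element type are illustrative):
{code}
import java.util.Collections
import org.apache.spark.api.java.{JavaRDD, JavaSparkContext}

// Illustrative workaround: an empty JavaRDD via parallelize, without needing
// a public EmptyRDD in the Java API.
def emptyStringRDD(jsc: JavaSparkContext): JavaRDD[String] =
  jsc.parallelize(Collections.emptyList[String]())
{code}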
[jira] [Commented] (SPARK-2266) Log page on Worker UI displays Some(app-id)
[ https://issues.apache.org/jira/browse/SPARK-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044070#comment-14044070 ] Patrick Wendell commented on SPARK-2266: Resolved by: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=9aa603296c285e1acf4bde64583f203008ba3e91 Log page on Worker UI displays Some(app-id) - Key: SPARK-2266 URL: https://issues.apache.org/jira/browse/SPARK-2266 Project: Spark Issue Type: Bug Affects Versions: 1.1.0 Reporter: Andrew Or Priority: Minor Fix For: 1.0.1, 1.1.0 Attachments: Screen Shot 2014-06-24 at 5.07.54 PM.png Oops. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2266) Log page on Worker UI displays Some(app-id)
[ https://issues.apache.org/jira/browse/SPARK-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2266. Resolution: Fixed Fix Version/s: 1.0.1 Log page on Worker UI displays Some(app-id) - Key: SPARK-2266 URL: https://issues.apache.org/jira/browse/SPARK-2266 Project: Spark Issue Type: Bug Affects Versions: 1.1.0 Reporter: Andrew Or Priority: Minor Fix For: 1.0.1, 1.1.0 Attachments: Screen Shot 2014-06-24 at 5.07.54 PM.png Oops. -- This message was sent by Atlassian JIRA (v6.2#6252)
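For context, the class of bug behind a UI label like Some(app-id) is typically an Option interpolated directly into a string instead of being unwrapped first. A guess at the pattern, not the actual commit; the app id string is made up:
{code}
// Interpolating an Option renders its wrapper; unwrap it first.
val appId: Option[String] = Some("app-20140624170754")
println(s"Log page for $appId")                         // Log page for Some(app-20140624170754)
println(s"Log page for ${appId.getOrElse("unknown")}")  // Log page for app-20140624170754
{code}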
[jira] [Commented] (SPARK-985) Support Job Cancellation on Mesos Scheduler
[ https://issues.apache.org/jira/browse/SPARK-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044079#comment-14044079 ] Patrick Wendell commented on SPARK-985: --- Some more notes on this from a related thread: Task killing is not supported in fine-grained mode on Mesos because, in that mode, we use Mesos's built-in support for all of the control plane messages relating to tasks. So we'll have to figure out how to support killing tasks in that model. There are two questions: one is who actually sends the kill message to the executor, and the other is how we tell Mesos that the cores which were in use by the task have been freed. In the course of normal operation that's handled by the Mesos launchTask and sendStatusUpdate interfaces. Support Job Cancellation on Mesos Scheduler --- Key: SPARK-985 URL: https://issues.apache.org/jira/browse/SPARK-985 Project: Spark Issue Type: Improvement Components: Mesos Affects Versions: 0.9.0 Reporter: Josh Rosen https://github.com/apache/incubator-spark/pull/29 added job cancellation but may still need support for Mesos scheduler backends: Quote: {quote} This looks good except that MesosSchedulerBackend isn't yet calling Mesos's killTask. Do you want to add that too or are you planning to push it till later? I don't think it's a huge change. {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
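A sketch of what wiring up the missing kill path might look like, using the Mesos driver's killTask call. This is illustrative only, not the actual patch; the Spark-to-Mesos task-id mapping shown is an assumption:
{code}
// Sketch: forwarding a Spark task kill to Mesos. Mesos later reports
// TASK_KILLED via statusUpdate(), at which point the task's cores are
// freed and re-offered to the framework.
import org.apache.mesos.Protos.TaskID
import org.apache.mesos.SchedulerDriver

def killSparkTask(driver: SchedulerDriver, sparkTaskId: Long): Unit = {
  val mesosTaskId = TaskID.newBuilder().setValue(sparkTaskId.toString).build()
  driver.killTask(mesosTaskId)
}
{code}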
[jira] [Updated] (SPARK-985) Support Job Cancellation on Mesos Scheduler
[ https://issues.apache.org/jira/browse/SPARK-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-985: -- Component/s: Mesos Support Job Cancellation on Mesos Scheduler --- Key: SPARK-985 URL: https://issues.apache.org/jira/browse/SPARK-985 Project: Spark Issue Type: Improvement Components: Mesos Affects Versions: 0.9.0 Reporter: Josh Rosen https://github.com/apache/incubator-spark/pull/29 added job cancellation but may still need support for Mesos scheduler backends: Quote: {quote} This looks good except that MesosSchedulerBackend isn't yet calling Mesos's killTask. Do you want to add that too or are you planning to push it till later? I don't think it's a huge change. {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
[ https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2204: --- Assignee: Sebastien Rainville Scheduler for Mesos in fine-grained mode launches tasks on wrong executors -- Key: SPARK-2204 URL: https://issues.apache.org/jira/browse/SPARK-2204 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 1.0.0 Reporter: Sebastien Rainville Assignee: Sebastien Rainville Priority: Blocker Fix For: 1.0.1, 1.1.0 MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) assumes that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) returns task lists in the same order as the offers it was passed, but in the current implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid always assigning tasks to the same executors. The result is that tasks are launched on the wrong executors. -- This message was sent by Atlassian JIRA (v6.2#6252)
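A toy illustration of the ordering mismatch described in SPARK-2204. The executor names and task lists here are made up; the real code paths are MesosSchedulerBackend.resourceOffers and TaskSchedulerImpl.resourceOffers:
{code}
// If the scheduler shuffles the offers internally but the backend matches
// the returned task lists back to its own offer list by position, tasks
// end up paired with the wrong executors.
import scala.util.Random

val offers = Seq("exec-A", "exec-B", "exec-C")
val shuffled = Random.shuffle(offers)
val taskLists = shuffled.map(e => s"tasks-built-for-$e") // ordered like `shuffled`

offers.zip(taskLists).foreach { case (executor, tasks) =>
  println(s"launching $tasks on $executor") // usually a mismatch
}
{code}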
[jira] [Resolved] (SPARK-1749) DAGScheduler supervisor strategy broken with Mesos
[ https://issues.apache.org/jira/browse/SPARK-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1749. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Issue resolved by pull request 1219 [https://github.com/apache/spark/pull/1219] DAGScheduler supervisor strategy broken with Mesos -- Key: SPARK-1749 URL: https://issues.apache.org/jira/browse/SPARK-1749 Project: Spark Issue Type: Bug Components: Mesos, Spark Core Affects Versions: 1.0.0 Reporter: Bouke van der Bijl Assignee: Mark Hamstra Priority: Blocker Labels: mesos, scheduler, scheduling Fix For: 1.0.1, 1.1.0 Any bad Python code will trigger this bug; for example, `sc.parallelize(range(100)).map(lambda n: undefined_variable * 2).collect()` will raise an `undefined_variable is not defined` error, which causes Spark to try to kill the task, resulting in the following stack trace: java.lang.UnsupportedOperationException at org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32) at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:184) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:182) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:182) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:182) at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:175) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:175) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1058) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1045) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1045) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1045) at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:998) at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:499) at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:499) at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:499) at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:499) at org.apache.spark.scheduler.DAGSchedulerActorSupervisor$$anonfun$2.applyOrElse(DAGScheduler.scala:1151) at org.apache.spark.scheduler.DAGSchedulerActorSupervisor$$anonfun$2.applyOrElse(DAGScheduler.scala:1147) at akka.actor.SupervisorStrategy.handleFailure(FaultHandling.scala:295) at akka.actor.dungeon.FaultHandling$class.handleFailure(FaultHandling.scala:253) at akka.actor.ActorCell.handleFailure(ActorCell.scala:338) at
akka.actor.ActorCell.invokeAll$1(ActorCell.scala:423) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262) at akka.dispatch.Mailbox.run(Mailbox.scala:218) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) This is because killTask isn't implemented for the MesosSchedulerBackend. I assume this isn't pyspark-specific, as there will be other instances where you might want to kill the task -- This message was sent by Atlassian JIRA
[jira] [Resolved] (SPARK-2251) MLLib Naive Bayes Example SparkException: Can only zip RDDs with same number of elements in each partition
[ https://issues.apache.org/jira/browse/SPARK-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2251. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1229 [https://github.com/apache/spark/pull/1229] MLLib Naive Bayes Example SparkException: Can only zip RDDs with same number of elements in each partition -- Key: SPARK-2251 URL: https://issues.apache.org/jira/browse/SPARK-2251 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.0.0 Environment: OS: Fedora Linux Spark Version: 1.0.0. Git clone from the Spark Repository Reporter: Jun Xie Assignee: Xiangrui Meng Priority: Minor Labels: Naive-Bayes Fix For: 1.0.1, 1.1.0 I followed the exact code from the Naive Bayes example (http://spark.apache.org/docs/latest/mllib-naive-bayes.html) of MLlib. When I executed the final command: val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count() it complained "Can only zip RDDs with same number of elements in each partition". I got the following exception: {code} 14/06/23 19:39:23 INFO SparkContext: Starting job: count at console:31 14/06/23 19:39:23 INFO DAGScheduler: Got job 3 (count at console:31) with 2 output partitions (allowLocal=false) 14/06/23 19:39:23 INFO DAGScheduler: Final stage: Stage 4(count at console:31) 14/06/23 19:39:23 INFO DAGScheduler: Parents of final stage: List() 14/06/23 19:39:23 INFO DAGScheduler: Missing parents: List() 14/06/23 19:39:23 INFO DAGScheduler: Submitting Stage 4 (FilteredRDD[14] at filter at console:31), which has no missing parents 14/06/23 19:39:23 INFO DAGScheduler: Submitting 2 missing tasks from Stage 4 (FilteredRDD[14] at filter at console:31) 14/06/23 19:39:23 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks 14/06/23 19:39:23 INFO TaskSetManager: Starting task 4.0:0 as TID 8 on executor localhost: localhost (PROCESS_LOCAL) 14/06/23 19:39:23 INFO TaskSetManager: Serialized task 4.0:0 as 3410 bytes in 0 ms 14/06/23 19:39:23 INFO TaskSetManager: Starting task 4.0:1 as TID 9 on executor localhost: localhost (PROCESS_LOCAL) 14/06/23 19:39:23 INFO TaskSetManager: Serialized task 4.0:1 as 3410 bytes in 1 ms 14/06/23 19:39:23 INFO Executor: Running task ID 8 14/06/23 19:39:23 INFO Executor: Running task ID 9 14/06/23 19:39:23 INFO BlockManager: Found block broadcast_0 locally 14/06/23 19:39:23 INFO BlockManager: Found block broadcast_0 locally 14/06/23 19:39:23 INFO HadoopRDD: Input split: file:/home/jun/open_source/spark/mllib/data/sample_naive_bayes_data.txt:0+24 14/06/23 19:39:23 INFO HadoopRDD: Input split: file:/home/jun/open_source/spark/mllib/data/sample_naive_bayes_data.txt:24+24 14/06/23 19:39:23 INFO HadoopRDD: Input split: file:/home/jun/open_source/spark/mllib/data/sample_naive_bayes_data.txt:0+24 14/06/23 19:39:23 INFO HadoopRDD: Input split: file:/home/jun/open_source/spark/mllib/data/sample_naive_bayes_data.txt:24+24 14/06/23 19:39:23 ERROR Executor: Exception in task ID 9 org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition at org.apache.spark.rdd.RDD$$anonfun$zip$1$$anon$1.hasNext(RDD.scala:663) at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1067) at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:858) at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:858) at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1079) at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1079) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) at org.apache.spark.scheduler.Task.run(Task.scala:51) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) 14/06/23 19:39:23 ERROR Executor: Exception in task ID 8 org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition at org.apache.spark.rdd.RDD$$anonfun$zip$1$$anon$1.hasNext(RDD.scala:663) at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1067) at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:858) at
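For reference, the zip contract that this exception enforces can be seen with a toy example; a sketch, assuming a live SparkContext named sc:
{code}
// RDD.zip requires the same number of partitions and the same number of
// elements per partition in both RDDs.
val a = sc.parallelize(1 to 6, 2)                            // partitions of 3 and 3
val b = sc.parallelize(Seq("u", "v", "w", "x", "y", "z"), 2) // partitions of 3 and 3
a.zip(b).count()                                             // fine

val c = sc.parallelize(1 to 5, 2)                            // partitions of 2 and 3
// a.zip(c).count()  // throws SparkException: "Can only zip RDDs with same
//                   // number of elements in each partition"
{code}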
[jira] [Commented] (SPARK-2279) JavaSparkContext should allow creation of EmptyRDD
[ https://issues.apache.org/jira/browse/SPARK-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045517#comment-14045517 ] Patrick Wendell commented on SPARK-2279: Ah I see - I thought you meant the EmptyRDD class, not the emptyRDD() method (which I forgot we even had!). It definitely makes sense to include the latter in the Java API. JavaSparkContext should allow creation of EmptyRDD -- Key: SPARK-2279 URL: https://issues.apache.org/jira/browse/SPARK-2279 Project: Spark Issue Type: New Feature Components: Java API Affects Versions: 1.0.0 Reporter: Hans Uhlig The Scala implementation currently supports creation of an EmptyRDD. Java does not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2181) The keys for sorting the columns of Executor page in SparkUI are incorrect
[ https://issues.apache.org/jira/browse/SPARK-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2181. Resolution: Fixed Fix Version/s: 1.0.2 1.1.0 The keys for sorting the columns of Executor page in SparkUI are incorrect -- Key: SPARK-2181 URL: https://issues.apache.org/jira/browse/SPARK-2181 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Shuo Xiang Assignee: Guoqiang Li Priority: Minor Fix For: 1.1.0, 1.0.2 Under the Executor page of SparkUI, each column is sorted alphabetically (after clicking). However, it should be sorted by the value, not the string. -- This message was sent by Atlassian JIRA (v6.2#6252)
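The symptom in SPARK-2181 is the classic lexicographic-versus-numeric sort problem; a quick illustration (the rendered strings are made up):
{code}
// Sorting the rendered strings puts "9" after "100"; the sort key should
// be the underlying numeric value instead.
val shown = Seq("9 ms", "10 ms", "100 ms")
println(shown.sorted)                               // List(10 ms, 100 ms, 9 ms)
println(shown.sortBy(_.takeWhile(_.isDigit).toInt)) // List(9 ms, 10 ms, 100 ms)
{code}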
[jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045564#comment-14045564 ] Patrick Wendell commented on SPARK-2228: I ran your reproduction locally. What I found was that it just generates events more quickly than the listener can process, so that was triggering all of the subsequent errors: {code} $ cat job-log.txt |grep ERROR | head -n 10 14/06/26 22:41:02 ERROR scheduler.LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up withthe rate at which tasks are being started by the scheduler. 14/06/26 22:42:01 ERROR scheduler.LiveListenerBus: Listener JobProgressListener threw an exception 14/06/26 22:42:01 ERROR scheduler.LiveListenerBus: Listener JobProgressListener threw an exception 14/06/26 22:42:01 ERROR scheduler.LiveListenerBus: Listener JobProgressListener threw an exception 14/06/26 22:42:01 ERROR scheduler.LiveListenerBus: Listener JobProgressListener threw an exception 14/06/26 22:42:01 ERROR scheduler.LiveListenerBus: Listener JobProgressListener threw an exception 14/06/26 22:42:01 ERROR scheduler.LiveListenerBus: Listener JobProgressListener threw an exception {code} If someone submits a job that creates thousands of stages in a few seconds, this can happen. But I haven't seen it happen in a real production job that does actual nontrivial work inside of the stage. We could consider an alternative design that applies back pressure instead of dropping events. onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted - Key: SPARK-2228 URL: https://issues.apache.org/jira/browse/SPARK-2228 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Baoxu Shi We are using `saveAsObjectFile` and `objectFile` to cut off lineage during iterative computing, but after several hundred iterations, a `NoSuchElementsError` is thrown. We checked the code and located the problem in `org.apache.spark.ui.jobs.JobProgressListener`. When `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, but it does exist in other HashMaps. So we think `onStageSubmitted` is not properly called: Spark did add a stage but failed to send the message to listeners, and the error occurs when the `finish` message is sent to listeners. This problem causes a huge number of active stages to show in the SparkUI, which is really annoying, but it may not affect the final result, according to my testing code. I'm willing to help solve this problem; any idea which part I should change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it looks fine to me. FYI, here is the test code that reproduces the problem. I do not know how to put highlighted code here, so I put it on gist to keep the issue clean: https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd -- This message was sent by Atlassian JIRA (v6.2#6252)
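The two policies being weighed here (dropping versus back pressure) can be contrasted with a plain bounded queue; a sketch using java.util.concurrent directly, not Spark's actual LiveListenerBus code:
{code}
import java.util.concurrent.LinkedBlockingQueue

val queue = new LinkedBlockingQueue[String](10000)

// Dropping (current behavior): offer() returns false when the queue is
// full and the event is lost, leaving listeners with an inconsistent
// view of which stages started and finished.
def postDropping(event: String): Unit =
  if (!queue.offer(event)) println("Dropping SparkListenerEvent ...")

// Back pressure (alternative): put() blocks the posting thread -- here,
// the scheduler -- until the listener drains the queue, trading
// scheduler throughput for listener consistency.
def postWithBackPressure(event: String): Unit = queue.put(event)
{code}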
[jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045626#comment-14045626 ] Patrick Wendell commented on SPARK-2228: [~rxin] unfortunately I think it's more complicated because the inconsistency can happen in both directions. We can miss an event for a stage finishing or we can miss an event for the stage starting. That means we either try to finish a missing stage (and get an NPE), or we have a straggler stage that looks like it never ended in the UI. onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted - Key: SPARK-2228 URL: https://issues.apache.org/jira/browse/SPARK-2228 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Baoxu Shi We are using `saveAsObjectFile` and `objectFile` to cut off lineage during iterative computing, but after several hundred iterations, a `NoSuchElementsError` is thrown. We checked the code and located the problem in `org.apache.spark.ui.jobs.JobProgressListener`. When `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, but it does exist in other HashMaps. So we think `onStageSubmitted` is not properly called: Spark did add a stage but failed to send the message to listeners, and the error occurs when the `finish` message is sent to listeners. This problem causes a huge number of active stages to show in the SparkUI, which is really annoying, but it may not affect the final result, according to my testing code. I'm willing to help solve this problem; any idea which part I should change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it looks fine to me. FYI, here is the test code that reproduces the problem. I do not know how to put highlighted code here, so I put it on gist to keep the issue clean: https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2291) Update EC2 scripts to use instance storage on m3 instance types
[ https://issues.apache.org/jira/browse/SPARK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2291. Resolution: Duplicate This was already fixed by this PR: https://github.com/apache/spark/pull/1156 Update EC2 scripts to use instance storage on m3 instance types --- Key: SPARK-2291 URL: https://issues.apache.org/jira/browse/SPARK-2291 Project: Spark Issue Type: Improvement Components: EC2 Affects Versions: 0.9.0, 0.9.1, 1.0.0 Reporter: Alessandro Andrioni [On January 21|https://aws.amazon.com/about-aws/whats-new/2014/01/21/announcing-new-amazon-ec2-m3-instance-sizes-and-lower-prices-for-amazon-s3-and-amazon-ebs/], Amazon added SSD-backed instance storage for m3 instances, and also added two new types: m3.medium and m3.large. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2292) NullPointerException in JavaPairRDD.mapToPair
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046445#comment-14046445 ] Patrick Wendell edited comment on SPARK-2292 at 6/27/14 9:54 PM: - Unfortunately I also can't reproduce this issue. I tried the example job from [~mkim], but I had to generate my own CSV file because none was provided. And I found that there was no exception. Looking at the code I don't see an obvious cause for this, so it would be nice to have a reliable reproduction. was (Author: pwendell): Unfortunately I also can't reproduce this issue. I tried the example job from [~mkim], but I had to generate my own CSV file because none was provided. NullPointerException in JavaPairRDD.mapToPair - Key: SPARK-2292 URL: https://issues.apache.org/jira/browse/SPARK-2292 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Environment: Spark 1.0.0, standalone with the master and a single slave running on Ubuntu on a laptop. 4G of memory and 8 cores were available to the executor. Reporter: Bharath Ravi Kumar Priority: Critical Correction: Invoking JavaPairRDD.mapToPair results in an NPE: {noformat} 14/06/26 21:05:35 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException java.lang.NullPointerException at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750) at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.Task.run(Task.scala:51) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {noformat} This occurs only after migrating to the 1.0.0 API. The details of the code and the data file used to test are included in this gist: https://gist.github.com/reachbach/d8977c8eb5f71f889301 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2307) SparkUI Storage page cached statuses incorrect
[ https://issues.apache.org/jira/browse/SPARK-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2307: --- Assignee: Andrew Or SparkUI Storage page cached statuses incorrect -- Key: SPARK-2307 URL: https://issues.apache.org/jira/browse/SPARK-2307 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Affects Versions: 1.1.0 Reporter: Andrew Or Assignee: Andrew Or Fix For: 1.0.1, 1.1.0 Attachments: Screen Shot 2014-06-27 at 11.09.54 AM.png See attached: the executor has 512MB, but somehow it has cached (279 + 27 + 279 + 27) = 612MB? (The correct answer is 279MB). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2307) SparkUI Storage page cached statuses incorrect
[ https://issues.apache.org/jira/browse/SPARK-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2307. Resolution: Fixed Fix Version/s: 1.0.1 Issue resolved by pull request 1249 [https://github.com/apache/spark/pull/1249] SparkUI Storage page cached statuses incorrect -- Key: SPARK-2307 URL: https://issues.apache.org/jira/browse/SPARK-2307 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.0.1, 1.1.0 Attachments: Screen Shot 2014-06-27 at 11.09.54 AM.png See attached: the executor has 512MB, but somehow it has cached (279 + 27 + 279 + 27) = 612MB? (The correct answer is 279MB). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2259) Spark submit documentation for --deploy-mode is highly misleading
[ https://issues.apache.org/jira/browse/SPARK-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2259. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Issue resolved by pull request 1200 [https://github.com/apache/spark/pull/1200] Spark submit documentation for --deploy-mode is highly misleading - Key: SPARK-2259 URL: https://issues.apache.org/jira/browse/SPARK-2259 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 1.1.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical Fix For: 1.0.1, 1.1.0 There are a few issues: 1. Client mode does not necessarily mean the driver program must be launched outside of the cluster. 2. For standalone clusters, only client mode is currently supported. This was the case even before 1.0. Currently, the docs tell the user to use cluster deploy mode when "deploying your driver program within the cluster", which is also true for standalone-client mode. In short, the docs encourage the user to use standalone-cluster, an unsupported mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2243) Support multiple SparkContexts in the same JVM
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2243: --- Summary: Support multiple SparkContexts in the same JVM (was: Using several Spark Contexts) Support multiple SparkContexts in the same JVM -- Key: SPARK-2243 URL: https://issues.apache.org/jira/browse/SPARK-2243 Project: Spark Issue Type: New Feature Components: Block Manager, Spark Core Affects Versions: 1.0.0 Reporter: Miguel Angel Fernandez Diaz We're developing a platform where we create several Spark contexts for carrying out different calculations. Is there any restriction when using several Spark contexts? We have two contexts, one for Spark calculations and another one for Spark Streaming jobs. The next error arises when we first execute a Spark calculation and, once the execution is finished, a Spark Streaming job is launched: {code} 14/06/23 16:40:08 ERROR executor.Executor: Exception in task ID 0 java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624) at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40) at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63) at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139) at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0) 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Loss was due to 
java.io.FileNotFoundException java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624) at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at
[jira] [Commented] (SPARK-2243) Support multiple SparkContexts in the same JVM
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046574#comment-14046574 ] Patrick Wendell commented on SPARK-2243: This is not supported - but it's something we could support in the future. Note though that you can use the same SparkContext for a streaming program and your own calculations. This is actually better in some ways because they can share data. Support multiple SparkContexts in the same JVM -- Key: SPARK-2243 URL: https://issues.apache.org/jira/browse/SPARK-2243 Project: Spark Issue Type: New Feature Components: Block Manager, Spark Core Affects Versions: 1.0.0 Reporter: Miguel Angel Fernandez Diaz We're developing a platform where we create several Spark contexts for carrying out different calculations. Is there any restriction when using several Spark contexts? We have two contexts, one for Spark calculations and another one for Spark Streaming jobs. The next error arises when we first execute a Spark calculation and, once the execution is finished, a Spark Streaming job is launched: {code} 14/06/23 16:40:08 ERROR executor.Executor: Exception in task ID 0 java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624) at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40) at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63) at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139) at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40) at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193) at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0) 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Loss was due to java.io.FileNotFoundException java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624) at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156) at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at
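On the suggestion in SPARK-2243 of sharing one SparkContext between streaming and batch work: a minimal sketch, assuming a hypothetical input path and a socket source rather than the reporter's actual platform:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sc = new SparkContext(
  new SparkConf().setAppName("shared-context").setMaster("local[2]"))
// Build the StreamingContext on top of the same SparkContext...
val ssc = new StreamingContext(sc, Seconds(10))

// ...so batch RDDs (e.g. a cached reference set) are visible to streaming code.
val reference = sc.textFile("/data/reference").cache()
ssc.socketTextStream("localhost", 9999)
  .transform(batch => batch.intersection(reference))
  .print()

ssc.start()
ssc.awaitTermination()
{code}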
[jira] [Commented] (SPARK-2111) pyspark errors when SPARK_PRINT_LAUNCH_COMMAND=1
[ https://issues.apache.org/jira/browse/SPARK-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046586#comment-14046586 ] Patrick Wendell commented on SPARK-2111: I was thinking that SPARK-2313 might be a better general solution to this. pyspark errors when SPARK_PRINT_LAUNCH_COMMAND=1 Key: SPARK-2111 URL: https://issues.apache.org/jira/browse/SPARK-2111 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.0.0 Reporter: Thomas Graves If you set SPARK_PRINT_LAUNCH_COMMAND=1 to see what java command is being used to launch Spark and then try to run pyspark, it errors out with a very unhelpful error message: Traceback (most recent call last): File /homes/tgraves/test/hadoop2/y-spark-git/python/pyspark/shell.py, line 43, in module sc = SparkContext(appName=PySparkShell, pyFiles=add_files) File /homes/tgraves/test/hadoop2/y-spark-git/python/pyspark/context.py, line 94, in __init__ SparkContext._ensure_initialized(self, gateway=gateway) File /homes/tgraves/test/hadoop2/y-spark-git/python/pyspark/context.py, line 184, in _ensure_initialized SparkContext._gateway = gateway or launch_gateway() File /homes/tgraves/test/hadoop2/y-spark-git/python/pyspark/java_gateway.py, line 51, in launch_gateway gateway_port = int(proc.stdout.readline()) ValueError: invalid literal for int() with base 10: 'Spark Command: /home/gs/java/jdk/bin/java -cp :/home/gs/hadoop/current/share/hadoop/common/hadoop-gpl-compression.jar:/home/gs/hadoop/current/share/hadoop/hdfs/lib/YahooDNSToSwitchMapping-0.2.14020207' -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2313) PySpark should accept port via a command line argument rather than STDIN
Patrick Wendell created SPARK-2313: -- Summary: PySpark should accept port via a command line argument rather than STDIN Key: SPARK-2313 URL: https://issues.apache.org/jira/browse/SPARK-2313 Project: Spark Issue Type: Bug Components: PySpark Reporter: Patrick Wendell Relying on stdin is a brittle mechanism and has broken several times in the past. From what I can tell this is used only to bootstrap worker.py one time. It would be strictly simpler to just pass it as a command line argument. -- This message was sent by Atlassian JIRA (v6.2#6252)
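The contrast between the two bootstrap mechanisms, in the abstract: the real change would be in PySpark's gateway/worker startup; this generic sketch just shows why argv is more robust than stdin.
{code}
object PortBootstrap {
  def main(args: Array[String]): Unit = {
    // Brittle: read the port from stdin. Anything else written to the
    // stream first (a launch-command echo, a deprecation warning)
    // corrupts the handshake -- exactly the failures seen in SPARK-2111
    // and SPARK-2109.
    // val port = scala.io.StdIn.readLine().trim.toInt

    // Robust: take the port as a command line argument instead.
    val port = args(0).toInt
    println(s"connecting back on port $port")
  }
}
{code}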
[jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046940#comment-14046940 ] Patrick Wendell commented on SPARK-2228: So I dug into this more and profiled it to confirm. The issue is that we do a bunch of inefficient operations in the storage listener. For instance, I noticed we spend almost all the time doing a big Scala groupBy on the entire list of persisted blocks: {code} at java.lang.Integer.valueOf(Integer.java:642) at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70) at org.apache.spark.storage.StorageUtils$$anonfun$9.apply(StorageUtils.scala:82) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:328) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327) at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327) at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105) at org.apache.spark.storage.StorageUtils$.rddInfoFromStorageStatus(StorageUtils.scala:82) at org.apache.spark.ui.storage.StorageListener.updateRDDInfo(StorageTab.scala:56) at org.apache.spark.ui.storage.StorageListener.onTaskEnd(StorageTab.scala:67) - locked 0xa27ebe30 (a org.apache.spark.ui.storage.StorageListener) {code} Resizing this buffer won't help the underlying issue at all; it will just defer the failure. onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted - Key: SPARK-2228 URL: https://issues.apache.org/jira/browse/SPARK-2228 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Baoxu Shi We are using `saveAsObjectFile` and `objectFile` to cut off lineage during iterative computing, but after several hundred iterations, a `NoSuchElementsError` is thrown. We checked the code and located the problem in `org.apache.spark.ui.jobs.JobProgressListener`. When `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, but it does exist in other HashMaps. So we think `onStageSubmitted` is not properly called: Spark did add a stage but failed to send the message to listeners, and the error occurs when the `finish` message is sent to listeners. This problem causes a huge number of active stages to show in the SparkUI, which is really annoying, but it may not affect the final result, according to my testing code. I'm willing to help solve this problem; any idea which part I should change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it looks fine to me. FYI, here is the test code that reproduces the problem. I do not know how to put highlighted code here, so I put it on gist to keep the issue clean: https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2316) StorageStatusListener should avoid O(blocks) operations
Patrick Wendell created SPARK-2316: -- Summary: StorageStatusListener should avoid O(blocks) operations Key: SPARK-2316 URL: https://issues.apache.org/jira/browse/SPARK-2316 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Affects Versions: 1.0.0 Reporter: Patrick Wendell Assignee: Andrew Or In cases where jobs frequently cause dropped blocks, the storage status listener can become a bottleneck. This is slow for a few reasons: one being that we use Scala collection operations, the other being that we perform operations that are O(number of blocks). I think using a few indices here could make this much faster. {code} at java.lang.Integer.valueOf(Integer.java:642) at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:70) at org.apache.spark.storage.StorageUtils$$anonfun$9.apply(StorageUtils.scala:82) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:328) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327) at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) at scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327) at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105) at org.apache.spark.storage.StorageUtils$.rddInfoFromStorageStatus(StorageUtils.scala:82) at org.apache.spark.ui.storage.StorageListener.updateRDDInfo(StorageTab.scala:56) at org.apache.spark.ui.storage.StorageListener.onTaskEnd(StorageTab.scala:67) - locked 0xa27ebe30 (a org.apache.spark.ui.storage.StorageListener) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
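The "few indices" idea might look roughly like this: keep a per-RDD map of blocks that is updated incrementally, so per-RDD info becomes a lookup instead of a groupBy over every persisted block. A sketch with made-up names, not the eventual fix:
{code}
import scala.collection.mutable

case class BlockKey(rddId: Int, split: Int)

// Index maintained incrementally: O(blocks touched by an event) per event,
// not O(total persisted blocks).
val blocksByRdd = mutable.HashMap.empty[Int, mutable.Set[BlockKey]]

def onBlockAdded(b: BlockKey): Unit =
  blocksByRdd.getOrElseUpdate(b.rddId, mutable.Set.empty[BlockKey]) += b

def onBlockDropped(b: BlockKey): Unit =
  blocksByRdd.get(b.rddId).foreach(_ -= b)

// Per-RDD info is now a lookup, not a groupBy over all persisted blocks.
def blocksFor(rddId: Int): Set[BlockKey] =
  blocksByRdd.get(rddId).map(_.toSet).getOrElse(Set.empty)
{code}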
[jira] [Commented] (SPARK-2228) onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted
[ https://issues.apache.org/jira/browse/SPARK-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046943#comment-14046943 ] Patrick Wendell commented on SPARK-2228: I've created SPARK-2316 to deal with the underlying issue here. The fix in this pull request might also alleviate this issue, since it removes dropped blocks from the set that is considered by the UI: https://github.com/apache/spark/pull/1255 onStageSubmitted does not properly called so NoSuchElement will be thrown in onStageCompleted - Key: SPARK-2228 URL: https://issues.apache.org/jira/browse/SPARK-2228 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Baoxu Shi We are using `saveAsObjectFile` and `objectFile` to cut off lineage during iterative computing, but after several hundred iterations, a `NoSuchElementsError` is thrown. We checked the code and located the problem in `org.apache.spark.ui.jobs.JobProgressListener`. When `onStageCompleted` is called, the `stageId` cannot be found in `stageIdToPool`, but it does exist in other HashMaps. So we think `onStageSubmitted` is not properly called: Spark did add a stage but failed to send the message to listeners, and the error occurs when the `finish` message is sent to listeners. This problem causes a huge number of active stages to show in the SparkUI, which is really annoying, but it may not affect the final result, according to my testing code. I'm willing to help solve this problem; any idea which part I should change? I assume `org.apache.spark.scheduler.SparkListenerBus` has something to do with it, but it looks fine to me. FYI, here is the test code that reproduces the problem. I do not know how to put highlighted code here, so I put it on gist to keep the issue clean: https://gist.github.com/bxshi/b5c0fe0ae089c75a39bd -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2292) NullPointerException in JavaPairRDD.mapToPair
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046946#comment-14046946 ] Patrick Wendell commented on SPARK-2292: [~aash] With your example code I was able to narrow this down (slightly). I think there is something subtle going on here at the byte code level. Your example links against the spark-1.0.0 binaries in Maven. 1. If I ran your example on a downloaded Spark 1.0.0 cluster (I just went and downloaded the Spark binaries), it worked fine. 2. If I ran your example on a local Spark cluster that I compiled myself with SBT, even with the 1.0.0 tag, it didn't work. I'm wondering if this is something similar to SPARK-2075. In general, it would be good if people used a spark-submit binary that was compiled at the same time as their cluster to submit jobs. Otherwise, there can be issues where a closure is created using an internal class name that is different from that on the cluster. NullPointerException in JavaPairRDD.mapToPair - Key: SPARK-2292 URL: https://issues.apache.org/jira/browse/SPARK-2292 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Environment: Spark 1.0.0, standalone with the master and a single slave running on Ubuntu on a laptop. 4G of memory and 8 cores were available to the executor. Reporter: Bharath Ravi Kumar Attachments: SPARK-2292-aash-repro.tar.gz Correction: Invoking JavaPairRDD.mapToPair results in an NPE: {noformat} 14/06/26 21:05:35 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException java.lang.NullPointerException at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750) at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.Task.run(Task.scala:51) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {noformat} This occurs only after migrating to the 1.0.0 API. The details of the code and the data file used to test are included in this gist: https://gist.github.com/reachbach/d8977c8eb5f71f889301 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2233) make-distribution script should list the git hash in the RELEASE file
[ https://issues.apache.org/jira/browse/SPARK-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2233: --- Assignee: Guillaume Ballet make-distribution script should list the git hash in the RELEASE file - Key: SPARK-2233 URL: https://issues.apache.org/jira/browse/SPARK-2233 Project: Spark Issue Type: Improvement Components: Project Infra Reporter: Patrick Wendell Assignee: Guillaume Ballet Priority: Minor Labels: starter Fix For: 1.1.0 If someone is creating a distribution and also has a version of Spark that has a .git folder in it, we should list the current git hash and put that in the RELEASE file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2233) make-distribution script should list the git hash in the RELEASE file
[ https://issues.apache.org/jira/browse/SPARK-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2233. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1216 [https://github.com/apache/spark/pull/1216] make-distribution script should list the git hash in the RELEASE file - Key: SPARK-2233 URL: https://issues.apache.org/jira/browse/SPARK-2233 Project: Spark Issue Type: Improvement Components: Project Infra Reporter: Patrick Wendell Priority: Minor Labels: starter Fix For: 1.1.0 If someone is creating a distribution and also has a version of Spark that has a .git folder in it, we should list the current git hash and put that in the RELEASE file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2292) NullPointerException in JavaPairRDD.mapToPair
[ https://issues.apache.org/jira/browse/SPARK-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047039#comment-14047039 ] Patrick Wendell commented on SPARK-2292: It would still be good to debug and fix this issue because there are definitely users who want to bring their own version of Spark. But a workaround in the short term is to use spark-submit, or otherwise inject the same Spark jars that are present on the cluster into the classpath when you submit your app. NullPointerException in JavaPairRDD.mapToPair - Key: SPARK-2292 URL: https://issues.apache.org/jira/browse/SPARK-2292 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Environment: Spark 1.0.0, standalone with the master and a single slave running on Ubuntu on a laptop. 4G of memory and 8 cores were available to the executor. Reporter: Bharath Ravi Kumar Attachments: SPARK-2292-aash-repro.tar.gz Correction: Invoking JavaPairRDD.mapToPair results in an NPE: {noformat} 14/06/26 21:05:35 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException java.lang.NullPointerException at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750) at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.Task.run(Task.scala:51) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) {noformat} This occurs only after migrating to the 1.0.0 API. The details of the code and the data file used to test are included in this gist: https://gist.github.com/reachbach/d8977c8eb5f71f889301 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2003) SparkContext(SparkConf) doesn't work in pyspark
[ https://issues.apache.org/jira/browse/SPARK-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2003. Resolution: Won't Fix SparkContext(SparkConf) doesn't work in pyspark --- Key: SPARK-2003 URL: https://issues.apache.org/jira/browse/SPARK-2003 Project: Spark Issue Type: Bug Components: Documentation, PySpark Affects Versions: 1.0.0 Reporter: Diana Carroll Fix For: 1.0.1, 1.1.0 Using SparkConf with SparkContext as described in the Programming Guide does NOT work in Python: conf = SparkConf().setAppName("blah") sc = SparkContext(conf) When I tried, I got AttributeError: 'SparkConf' object has no attribute '_get_object_id' [This equivalent code in Scala works fine: val conf = new SparkConf().setAppName("blah") val sc = new SparkContext(conf)] I think this is because there's no equivalent for the Scala constructor SparkContext(SparkConf). Workaround: if I explicitly set the conf parameter in the Python call, it does work: sconf = SparkConf().setAppName("blah") sc = SparkContext(conf=sconf) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2003) SparkContext(SparkConf) doesn't work in pyspark
[ https://issues.apache.org/jira/browse/SPARK-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049456#comment-14049456 ] Patrick Wendell commented on SPARK-2003: If I understand correctly, [~dcarr...@cloudera.com] is asking us to change this API to make it more consistent with other languages. I don't see a way of doing this without breaking the existing behavior for old users (which we can't do). In Python, it's not possible to overload constructors in the same way as in Java because it's not strongly typed. I'd guess this is why Matei didn't change it when he refactored the constructor to take a configuration. For that reason I'm going to close this as Won't Fix - but if there is indeed a backwards-compatible way to do this, please feel free to re-open it with a proposal. SparkContext(SparkConf) doesn't work in pyspark --- Key: SPARK-2003 URL: https://issues.apache.org/jira/browse/SPARK-2003 Project: Spark Issue Type: Bug Components: Documentation, PySpark Affects Versions: 1.0.0 Reporter: Diana Carroll Fix For: 1.0.1, 1.1.0 Using SparkConf with SparkContext as described in the Programming Guide does NOT work in Python: conf = SparkConf().setAppName("blah") sc = SparkContext(conf) When I tried, I got AttributeError: 'SparkConf' object has no attribute '_get_object_id' [This equivalent code in Scala works fine: val conf = new SparkConf().setAppName("blah") val sc = new SparkContext(conf)] I think this is because there's no equivalent for the Scala constructor SparkContext(SparkConf). Workaround: if I explicitly set the conf parameter in the Python call, it does work: sconf = SparkConf().setAppName("blah") sc = SparkContext(conf=sconf) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1981) Add AWS Kinesis streaming support
[ https://issues.apache.org/jira/browse/SPARK-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049560#comment-14049560 ] Patrick Wendell commented on SPARK-1981: Assigned! Add AWS Kinesis streaming support - Key: SPARK-1981 URL: https://issues.apache.org/jira/browse/SPARK-1981 Project: Spark Issue Type: New Feature Components: Streaming Reporter: Chris Fregly Assignee: Chris Fregly Add AWS Kinesis support to Spark Streaming. Initial discussion occurred here: https://github.com/apache/spark/pull/223 I discussed this with Parviz from AWS recently and we agreed that I would take this over. Look for a new PR that takes into account all the feedback from the earlier PR, including a Spark-1.0-compliant implementation, AWS-license-aware build support, tests, comments, and style guide compliance. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2109) Setting SPARK_MEM for bin/pyspark does not work.
[ https://issues.apache.org/jira/browse/SPARK-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2109. Resolution: Fixed Fixed in master and 1.0 via https://github.com/apache/spark/pull/1050/files Setting SPARK_MEM for bin/pyspark does not work. - Key: SPARK-2109 URL: https://issues.apache.org/jira/browse/SPARK-2109 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Prashant Sharma Assignee: Prashant Sharma Priority: Critical Fix For: 1.0.1, 1.1.0 prashant@sc:~/work/spark$ SPARK_MEM=10G bin/pyspark Python 2.7.6 (default, Mar 22 2014, 22:59:56) [GCC 4.8.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. Traceback (most recent call last): File "/home/prashant/work/spark/python/pyspark/shell.py", line 43, in <module> sc = SparkContext(appName="PySparkShell", pyFiles=add_files) File "/home/prashant/work/spark/python/pyspark/context.py", line 94, in __init__ SparkContext._ensure_initialized(self, gateway=gateway) File "/home/prashant/work/spark/python/pyspark/context.py", line 190, in _ensure_initialized SparkContext._gateway = gateway or launch_gateway() File "/home/prashant/work/spark/python/pyspark/java_gateway.py", line 51, in launch_gateway gateway_port = int(proc.stdout.readline()) ValueError: invalid literal for int() with base 10: 'Warning: SPARK_MEM is deprecated, please use a more specific config option\n' -- This message was sent by Atlassian JIRA (v6.2#6252)
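A minimal sketch of the failure mode in the traceback above (not Spark's actual java_gateway.py code): launch_gateway() expects the first line of the launcher JVM's stdout to be the gateway port, so any warning printed before it breaks the int() parse:
{code}
import subprocess

# Simulate a launcher whose first stdout line is a warning, not a port number.
proc = subprocess.Popen(
    ["echo", "Warning: SPARK_MEM is deprecated, please use a more specific config option"],
    stdout=subprocess.PIPE)
first_line = proc.stdout.readline()
try:
    gateway_port = int(first_line)
except ValueError as e:
    print("gateway launch would fail here:", e)
{code}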
[jira] [Resolved] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2350. Resolution: Fixed Fix Version/s: 1.0.1 Issue resolved by pull request 1289 [https://github.com/apache/spark/pull/1289] Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Fix For: 1.0.1, 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Here is the culprit from Master.scala (L487 as of the creation of this JIRA, commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c). {code} for (driver <- waitingDrivers) { if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { launchDriver(worker, driver) waitingDrivers -= driver } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
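The pitfall generalizes beyond Scala; here is a minimal Python sketch of the same remove-while-iterating bug and the iterate-over-a-snapshot fix (names are illustrative and not taken from pull request 1289):
{code}
waiting_drivers = ["driver-1", "driver-2", "driver-3"]

# Buggy: removing during iteration skips elements ("driver-2" survives).
for d in waiting_drivers:
    waiting_drivers.remove(d)
print(waiting_drivers)  # ['driver-2']

# Safe: iterate over a snapshot and mutate the original list.
waiting_drivers = ["driver-1", "driver-2", "driver-3"]
for d in list(waiting_drivers):
    waiting_drivers.remove(d)
print(waiting_drivers)  # []
{code}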
[jira] [Updated] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2350: --- Assignee: Aaron Davidson Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Assignee: Aaron Davidson Fix For: 1.0.1, 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Here is the culprit from Master.scala (L487 as of the creation of this JIRA, commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c). {code} for (driver <- waitingDrivers) { if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { launchDriver(worker, driver) waitingDrivers -= driver } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2307) SparkUI Storage page cached statuses incorrect
[ https://issues.apache.org/jira/browse/SPARK-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14052183#comment-14052183 ] Patrick Wendell commented on SPARK-2307: There was a follow-up patch: https://github.com/apache/spark/pull/1255 SparkUI Storage page cached statuses incorrect -- Key: SPARK-2307 URL: https://issues.apache.org/jira/browse/SPARK-2307 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Affects Versions: 1.1.0 Reporter: Andrew Or Assignee: Andrew Or Fix For: 1.0.1, 1.1.0 Attachments: Screen Shot 2014-06-27 at 11.09.54 AM.png See attached: the executor has 512MB, but somehow it has cached (279 + 27 + 279 + 27) = 612MB? (The correct answer is 279MB). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2350) Master throws NPE
[ https://issues.apache.org/jira/browse/SPARK-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2350: --- Fix Version/s: 0.9.2 Master throws NPE - Key: SPARK-2350 URL: https://issues.apache.org/jira/browse/SPARK-2350 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Andrew Or Assignee: Aaron Davidson Fix For: 0.9.2, 1.0.1, 1.1.0 ... if we launch a driver and there are more waiting drivers to be launched. This is because we remove from a list while iterating through it. Here is the culprit from Master.scala (L487 as of the creation of this JIRA, commit bc7041a42dfa84312492ea8cae6fdeaeac4f6d1c). {code} for (driver <- waitingDrivers) { if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { launchDriver(worker, driver) waitingDrivers -= driver } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2282) PySpark crashes if too many tasks complete quickly
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2282. Resolution: Fixed Fix Version/s: 1.0.0 1.0.1 0.9.2 PySpark crashes if too many tasks complete quickly -- Key: SPARK-2282 URL: https://issues.apache.org/jira/browse/SPARK-2282 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 0.9.1, 1.0.0, 1.0.1 Reporter: Aaron Davidson Assignee: Aaron Davidson Fix For: 0.9.2, 1.0.1, 1.0.0 Upon every task completion, PythonAccumulatorParam constructs a new socket to the Accumulator server running inside the pyspark daemon. This can cause a buildup of used ephemeral ports from sockets in the TIME_WAIT termination stage, which will cause the SparkContext to crash if too many tasks complete too quickly. We ran into this bug with 17k tasks completing in 15 seconds. This bug can be fixed outside of Spark by ensuring these properties are set (on a Linux server): echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle or by adding the SO_REUSEADDR option to the socket creation within Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
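A minimal sketch of the SO_REUSEADDR mitigation mentioned above, in Python rather than the Java socket code inside Spark (the loopback address and port 0 are placeholders):
{code}
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow binding even while old connections to the port linger in TIME_WAIT.
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
print("accumulator-style server listening on port", server.getsockname()[1])
server.close()
{code}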
[jira] [Updated] (SPARK-2282) PySpark crashes if too many tasks complete quickly
[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2282: --- Affects Version/s: 0.9.1 PySpark crashes if too many tasks complete quickly -- Key: SPARK-2282 URL: https://issues.apache.org/jira/browse/SPARK-2282 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 0.9.1, 1.0.0, 1.0.1 Reporter: Aaron Davidson Assignee: Aaron Davidson Fix For: 0.9.2, 1.0.0, 1.0.1 Upon every task completion, PythonAccumulatorParam constructs a new socket to the Accumulator server running inside the pyspark daemon. This can cause a buildup of used ephemeral ports from sockets in the TIME_WAIT termination stage, which will cause the SparkContext to crash if too many tasks complete too quickly. We ran into this bug with 17k tasks completing in 15 seconds. This bug can be fixed outside of Spark by ensuring these properties are set (on a Linux server): echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle or by adding the SO_REUSEADDR option to the socket creation within Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1199. Resolution: Fixed Fix Version/s: 1.1.0 1.0.1 Resolved via: https://github.com/apache/spark/pull/1179 Type mismatch in Spark shell when using case class defined in shell --- Key: SPARK-1199 URL: https://issues.apache.org/jira/browse/SPARK-1199 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 0.9.0 Reporter: Andrew Kerr Assignee: Prashant Sharma Priority: Blocker Fix For: 1.0.1, 1.1.0 Define a class in the shell: {code} case class TestClass(a:String) {code} and an RDD {code} val data = sc.parallelize(Seq("a")).map(TestClass(_)) {code} define a function on it and map over the RDD {code} def itemFunc(a:TestClass):TestClass = a data.map(itemFunc) {code} Error: {code} <console>:19: error: type mismatch; found : TestClass => TestClass required: TestClass => ? data.map(itemFunc) {code} Similarly with a mapPartitions: {code} def partitionFunc(a:Iterator[TestClass]):Iterator[TestClass] = a data.mapPartitions(partitionFunc) {code} {code} <console>:19: error: type mismatch; found : Iterator[TestClass] => Iterator[TestClass] required: Iterator[TestClass] => Iterator[?] Error occurred in an application involving default arguments. data.mapPartitions(partitionFunc) {code} The behavior is the same whether in local mode or on a cluster. This isn't specific to RDDs. A Scala collection in the Spark shell has the same problem. {code} scala> Seq(TestClass("foo")).map(itemFunc) <console>:15: error: type mismatch; found : TestClass => TestClass required: TestClass => ? Seq(TestClass("foo")).map(itemFunc) ^ {code} When run in the Scala console (not the Spark shell) there are no type mismatch errors. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2380) Support displaying accumulator contents in the web UI
Patrick Wendell created SPARK-2380: -- Summary: Support displaying accumulator contents in the web UI Key: SPARK-2380 URL: https://issues.apache.org/jira/browse/SPARK-2380 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Patrick Wendell Assignee: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2290) Do not send SPARK_HOME from workers to executors
[ https://issues.apache.org/jira/browse/SPARK-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2290: --- Issue Type: Improvement (was: Bug) Do not send SPARK_HOME from workers to executors Key: SPARK-2290 URL: https://issues.apache.org/jira/browse/SPARK-2290 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: YanTang Zhai Priority: Minor The client path is /data/home/spark/test/spark-1.0.0 while the worker deploy path is /data/home/spark/spark-1.0.0, which is different from the client path. Then an application is launched using ./bin/spark-submit --class JobTaskJoin --master spark://172.25.38.244:7077 --executor-memory 128M ../jobtaskjoin_2.10-1.0.0.jar. However, the application fails because an exception occurs: java.io.IOException: Cannot run program "/data/home/spark/test/spark-1.0.0-bin-0.20.2-cdh3u3/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:759) at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:72) at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37) at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:109) at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:124) at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:58) Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021) ... 6 more Therefore, I think the worker should not use appDesc.sparkHome when handling LaunchExecutor; instead, the worker could use its own sparkHome directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2290) Do not send SPARK_HOME from workers to executors
[ https://issues.apache.org/jira/browse/SPARK-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2290: --- Priority: Major (was: Minor) Do not send SPARK_HOME from workers to executors Key: SPARK-2290 URL: https://issues.apache.org/jira/browse/SPARK-2290 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: YanTang Zhai Assignee: Patrick Wendell The client path is /data/home/spark/test/spark-1.0.0 while the worker deploy path is /data/home/spark/spark-1.0.0, which is different from the client path. Then an application is launched using ./bin/spark-submit --class JobTaskJoin --master spark://172.25.38.244:7077 --executor-memory 128M ../jobtaskjoin_2.10-1.0.0.jar. However, the application fails because an exception occurs: java.io.IOException: Cannot run program "/data/home/spark/test/spark-1.0.0-bin-0.20.2-cdh3u3/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:759) at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:72) at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37) at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:109) at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:124) at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:58) Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021) ... 6 more Therefore, I think the worker should not use appDesc.sparkHome when handling LaunchExecutor; instead, the worker could use its own sparkHome directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2290) Do not send SPARK_HOME from workers to executors
[ https://issues.apache.org/jira/browse/SPARK-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2290: --- Assignee: Patrick Wendell Do not send SPARK_HOME from workers to executors Key: SPARK-2290 URL: https://issues.apache.org/jira/browse/SPARK-2290 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: YanTang Zhai Assignee: Patrick Wendell Priority: Minor The client path is /data/home/spark/test/spark-1.0.0 while the worker deploy path is /data/home/spark/spark-1.0.0, which is different from the client path. Then an application is launched using ./bin/spark-submit --class JobTaskJoin --master spark://172.25.38.244:7077 --executor-memory 128M ../jobtaskjoin_2.10-1.0.0.jar. However, the application fails because an exception occurs: java.io.IOException: Cannot run program "/data/home/spark/test/spark-1.0.0-bin-0.20.2-cdh3u3/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:759) at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:72) at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37) at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:109) at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:124) at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:58) Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021) ... 6 more Therefore, I think the worker should not use appDesc.sparkHome when handling LaunchExecutor; instead, the worker could use its own sparkHome directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2290) Do not send SPARK_HOME from workers to executors
[ https://issues.apache.org/jira/browse/SPARK-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2290: --- Summary: Do not send SPARK_HOME from workers to executors (was: Worker should directly use its own sparkHome instead of appDesc.sparkHome when LaunchExecutor) Do not send SPARK_HOME from workers to executors Key: SPARK-2290 URL: https://issues.apache.org/jira/browse/SPARK-2290 Project: Spark Issue Type: Bug Components: Spark Core Reporter: YanTang Zhai Priority: Minor The client path is /data/home/spark/test/spark-1.0.0 while the worker deploy path is /data/home/spark/spark-1.0.0, which is different from the client path. Then an application is launched using ./bin/spark-submit --class JobTaskJoin --master spark://172.25.38.244:7077 --executor-memory 128M ../jobtaskjoin_2.10-1.0.0.jar. However, the application fails because an exception occurs: java.io.IOException: Cannot run program "/data/home/spark/test/spark-1.0.0-bin-0.20.2-cdh3u3/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:759) at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:72) at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37) at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:109) at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:124) at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:58) Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021) ... 6 more Therefore, I think the worker should not use appDesc.sparkHome when handling LaunchExecutor; instead, the worker could use its own sparkHome directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2290) Do not send SPARK_HOME from workers to executors
[ https://issues.apache.org/jira/browse/SPARK-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054633#comment-14054633 ] Patrick Wendell commented on SPARK-2290: I updated the description here. It is indeed pretty strange that we ship this to the cluster when it's always available there anyway (since the Worker has its own sparkHome). So we should just remove it. Do not send SPARK_HOME from workers to executors Key: SPARK-2290 URL: https://issues.apache.org/jira/browse/SPARK-2290 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: YanTang Zhai Assignee: Patrick Wendell The client path is /data/home/spark/test/spark-1.0.0 while the worker deploy path is /data/home/spark/spark-1.0.0, which is different from the client path. Then an application is launched using ./bin/spark-submit --class JobTaskJoin --master spark://172.25.38.244:7077 --executor-memory 128M ../jobtaskjoin_2.10-1.0.0.jar. However, the application fails because an exception occurs: java.io.IOException: Cannot run program "/data/home/spark/test/spark-1.0.0-bin-0.20.2-cdh3u3/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:759) at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:72) at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37) at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:109) at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:124) at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:58) Caused by: java.io.IOException: error=2, No such file or directory at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021) ... 6 more Therefore, I think the worker should not use appDesc.sparkHome when handling LaunchExecutor; instead, the worker could use its own sparkHome directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2348) In Windows, having an environment variable named 'classpath' gives an error
[ https://issues.apache.org/jira/browse/SPARK-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2348: --- Assignee: Chirag Todarka In Windows, having an environment variable named 'classpath' gives an error --- Key: SPARK-2348 URL: https://issues.apache.org/jira/browse/SPARK-2348 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Environment: Windows 7 Enterprise Reporter: Chirag Todarka Assignee: Chirag Todarka Operating System: Windows 7 Enterprise If an environment variable named 'classpath' is set, then starting 'spark-shell' gives the error below: mydir\spark\bin>spark-shell Failed to initialize compiler: object scala.runtime in compiler mirror not found. ** Note that as of 2.8 scala does not assume use of the java classpath. ** For the old behavior pass -usejavacp to scala, or if using a Settings ** object programatically, settings.usejavacp.value = true. 14/07/02 14:22:06 WARN SparkILoop$SparkILoopInterpreter: Warning: compiler accessed before init set up. Assuming no postInit code. Failed to initialize compiler: object scala.runtime in compiler mirror not found. ** Note that as of 2.8 scala does not assume use of the java classpath. ** For the old behavior pass -usejavacp to scala, or if using a Settings ** object programatically, settings.usejavacp.value = true. Exception in thread "main" java.lang.AssertionError: assertion failed: null at scala.Predef$.assert(Predef.scala:179) at org.apache.spark.repl.SparkIMain.initializeSynchronous(SparkIMain.scala:202) at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:929) at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884) at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2379) stopReceiver in dead loop causes StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2379: --- Component/s: Streaming stopReceiver in dead loop causes StackOverflowError --- Key: SPARK-2379 URL: https://issues.apache.org/jira/browse/SPARK-2379 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.0 Reporter: sunshangchun In streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisor.scala, stop will call stopReceiver, and stopReceiver will call stop if an exception occurs, which makes a dead loop. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2409) Make SQLConf thread safe
[ https://issues.apache.org/jira/browse/SPARK-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2409: --- Component/s: SQL Make SQLConf thread safe Key: SPARK-2409 URL: https://issues.apache.org/jira/browse/SPARK-2409 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2201) Improve FlumeInputDStream's stability and make it scalable
[ https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2201: --- Component/s: Streaming Improve FlumeInputDStream's stability and make it scalable -- Key: SPARK-2201 URL: https://issues.apache.org/jira/browse/SPARK-2201 Project: Spark Issue Type: Improvement Components: Streaming Reporter: sunshangchun Currently: FlumeUtils.createStream(ssc, "localhost", port). This means that only one Flume receiver can work with FlumeInputDStream, so the solution is not scalable. I use ZooKeeper to solve this problem. Spark Flume receivers register themselves to a ZooKeeper path when started, and a Flume agent gets the physical hosts and pushes events to them. Some work needs to be done here: 1. Receivers create temporary nodes in ZooKeeper; listeners just watch those temporary nodes. 2. When Spark FlumeReceivers start, they acquire a physical host (localhost's IP and an idle port) and register themselves with ZooKeeper. 3. A new Flume sink: in its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner (see the sketch below). -- This message was sent by Atlassian JIRA (v6.2#6252)
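A hypothetical sketch of the proposed sink-side behavior described in item 3 above, assuming the receiver endpoints have already been read from ZooKeeper (the hosts, ports, and function names are invented for illustration):
{code}
from itertools import cycle

# Endpoints the receivers would have registered under the ZooKeeper path.
receivers = [("10.0.0.1", 41414), ("10.0.0.2", 41414)]
round_robin = cycle(receivers)

def append_events(batch):
    # Push each batch to the next receiver in round-robin order.
    host, port = next(round_robin)
    print("pushing %d events to %s:%d" % (len(batch), host, port))

append_events(["event-1", "event-2"])
append_events(["event-3"])
{code}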
[jira] [Updated] (SPARK-2414) Remove jquery
[ https://issues.apache.org/jira/browse/SPARK-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2414: --- Component/s: Web UI Remove jquery - Key: SPARK-2414 URL: https://issues.apache.org/jira/browse/SPARK-2414 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Reynold Xin Assignee: Reynold Xin Priority: Minor SPARK-2384 introduces jQuery for tooltip display. We can probably just create a very simple JavaScript tooltip instead of pulling in jQuery. https://github.com/apache/spark/pull/1314 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2381) streaming receiver crashed, but it seems nothing happened
[ https://issues.apache.org/jira/browse/SPARK-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2381: --- Component/s: Streaming streaming receiver crashed, but it seems nothing happened - Key: SPARK-2381 URL: https://issues.apache.org/jira/browse/SPARK-2381 Project: Spark Issue Type: Bug Components: Streaming Reporter: sunshangchun When we submit a streaming job and the receivers don't start normally, the application should stop itself. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2017) web ui stage page becomes unresponsive when the number of tasks is large
[ https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2017: --- Component/s: Web UI web ui stage page becomes unresponsive when the number of tasks is large Key: SPARK-2017 URL: https://issues.apache.org/jira/browse/SPARK-2017 Project: Spark Issue Type: Sub-task Components: Web UI Reporter: Reynold Xin Labels: starter {code} sc.parallelize(1 to 1000000, 1000000).count() {code} The above code creates one million tasks to be executed. The stage detail web ui page takes forever to load (if it ever completes). There are again a few different alternatives: 0. Limit the number of tasks we show. 1. Pagination (see the sketch below). 2. By default only show the aggregate metrics and failed tasks, and hide the successful ones. -- This message was sent by Atlassian JIRA (v6.2#6252)
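A minimal sketch of alternative 1 (pagination): render only a fixed-size page of task rows instead of all one million (the page size and row contents are placeholders, not from the Spark UI code):
{code}
tasks = ["task-%d" % i for i in range(1000000)]  # stand-ins for task rows

def page_of_tasks(page, page_size=100):
    # Slice out just the rows for the requested page.
    start = page * page_size
    return tasks[start:start + page_size]

rows = page_of_tasks(page=3)
print("rendering %d rows, first: %s" % (len(rows), rows[0]))
{code}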
[jira] [Updated] (SPARK-2345) ForEachDStream should have an option of running the foreachfunc on Spark
[ https://issues.apache.org/jira/browse/SPARK-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2345: --- Component/s: Streaming ForEachDStream should have an option of running the foreachfunc on Spark Key: SPARK-2345 URL: https://issues.apache.org/jira/browse/SPARK-2345 Project: Spark Issue Type: Bug Components: Streaming Reporter: Hari Shreedharan Today the generated Job simply calls the foreachfunc but does not run it on Spark itself using the sparkContext.runJob method. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2338) Jenkins Spark-Master-Maven-with-YARN builds failing due to test misconfiguration
[ https://issues.apache.org/jira/browse/SPARK-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2338. Resolution: Fixed Assignee: Pete MacKinnon Thanks a ton for getting to the bottom of this. I was super confused why the tests were so messed up even though this seems totally obvious in retrospect. I went ahead and updated the build configuration. There are some failing tests in MLlib in Maven; I'll try to track those down as well to get this all green. Jenkins Spark-Master-Maven-with-YARN builds failing due to test misconfiguration Key: SPARK-2338 URL: https://issues.apache.org/jira/browse/SPARK-2338 Project: Spark Issue Type: Bug Components: Build, Project Infra, YARN Affects Versions: 1.0.0 Environment: https://amplab.cs.berkeley.edu/jenkins Reporter: Pete MacKinnon Assignee: Pete MacKinnon Labels: hadoop2, jenkins, maven, protobuf, yarn https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/hadoop.version=2.2.0,label=centos/ https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/hadoop.version=2.3.0,label=centos/ These builds are currently failing due to the builder configuration being incomplete. After building, they specify the test command as: {noformat} /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn -Dhadoop.version=2.3.0 -Dlabel=centos test -Pyarn -Phive {noformat} However, it is not enough to specify the hadoop.version; the tests should instead be run using the hadoop-2.2 and hadoop-2.3 profiles respectively. For example: {noformat} /home/jenkins/tools/hudson.tasks.Maven_MavenInstallation/Maven_3.0.5/bin/mvn -Phadoop-2.2 -Dlabel=centos test -Pyarn -Phive {noformat} These profiles will not only set the appropriate hadoop.version but also set the version of protobuf-java required by yarn (2.5.0). Without the correct profile set, the test run fails at: {noformat} *** RUN ABORTED *** java.lang.VerifyError: class org.apache.hadoop.yarn.proto.YarnProtos$LocalResourceProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; {noformat} since it is getting the default version of protobuf-java (2.4.1) which has the old incompatible version of getUnknownFields. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2416) Allow richer reporting of unit test results
Patrick Wendell created SPARK-2416: -- Summary: Allow richer reporting of unit test results Key: SPARK-2416 URL: https://issues.apache.org/jira/browse/SPARK-2416 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: Patrick Wendell The built-in Jenkins integration is pretty bad. It's very confusing to users whether tests have passed or failed, and we can't easily customize the message. With some small scripting around the Github API we can do much better than this. -- This message was sent by Atlassian JIRA (v6.2#6252)
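A hypothetical sketch of the "small scripting around the Github API" idea: posting a custom pass/fail message as a commit status on a pull request's head commit (the token, SHA, and context string below are placeholders, not values from this issue):
{code}
import requests

TOKEN = "..."   # a GitHub API token with permission to set commit statuses
REPO = "apache/spark"
SHA = "abc123"  # head commit of the pull request under test

requests.post(
    "https://api.github.com/repos/%s/statuses/%s" % (REPO, SHA),
    headers={"Authorization": "token %s" % TOKEN},
    json={"state": "success",
          "description": "All automated tests passed",
          "context": "jenkins/spark-tests"})
{code}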
[jira] [Updated] (SPARK-2152) Error computing rightNodeAgg in the decision tree algorithm in Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2152: --- Assignee: Jon Sondag Error computing rightNodeAgg in the decision tree algorithm in Spark MLlib Key: SPARK-2152 URL: https://issues.apache.org/jira/browse/SPARK-2152 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Environment: Windows 7, 32-bit, 3GB RAM Reporter: caoli Assignee: Jon Sondag Labels: features Fix For: 1.0.1, 1.1.0 Original Estimate: 4h Remaining Estimate: 4h In the function extractLeftRightNodeAggregates(), the binData index used when computing rightNodeAgg is wrong. In the DecisionTree.scala file, around line 980: rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) = binData(shift + (2 * (numBins - 2 - splitIndex))) + rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex)) The index computed by binData(shift + (2 * (numBins - 2 - splitIndex))) is wrong, so the result of rightNodeAgg includes repeated data about bins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2152) Error computing rightNodeAgg in the decision tree algorithm in Spark MLlib
[ https://issues.apache.org/jira/browse/SPARK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056157#comment-14056157 ] Patrick Wendell commented on SPARK-2152: FYI, this caused some new test failures; I created SPARK-2417 to track it. Error computing rightNodeAgg in the decision tree algorithm in Spark MLlib Key: SPARK-2152 URL: https://issues.apache.org/jira/browse/SPARK-2152 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Environment: Windows 7, 32-bit, 3GB RAM Reporter: caoli Assignee: Jon Sondag Labels: features Fix For: 1.0.1, 1.1.0 Original Estimate: 4h Remaining Estimate: 4h In the function extractLeftRightNodeAggregates(), the binData index used when computing rightNodeAgg is wrong. In the DecisionTree.scala file, around line 980: rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) = binData(shift + (2 * (numBins - 2 - splitIndex))) + rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex)) The index computed by binData(shift + (2 * (numBins - 2 - splitIndex))) is wrong, so the result of rightNodeAgg includes repeated data about bins. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2417) Decision tree tests fail in maven build
Patrick Wendell created SPARK-2417: -- Summary: Decision tree tests fail in maven build Key: SPARK-2417 URL: https://issues.apache.org/jira/browse/SPARK-2417 Project: Spark Issue Type: Bug Components: MLlib Reporter: Patrick Wendell Assignee: Xiangrui Meng After SPARK-2152 was merged, these tests started failing in Jenkins: {code} - classification stump with all categorical variables *** FAILED *** org.scalatest.exceptions.TestFailedException was thrown. (DecisionTreeSuite.scala:257) - regression stump with all categorical variables *** FAILED *** org.scalatest.exceptions.TestFailedException was thrown. (DecisionTreeSuite.scala:284) {code} https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/97/hadoop.version=1.0.4,label=centos/console -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2417) Decision tree tests are failing
[ https://issues.apache.org/jira/browse/SPARK-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2417: --- Summary: Decision tree tests are failing (was: Decision tree tests fail in maven build) Decision tree tests are failing --- Key: SPARK-2417 URL: https://issues.apache.org/jira/browse/SPARK-2417 Project: Spark Issue Type: Bug Components: MLlib Reporter: Patrick Wendell Assignee: Xiangrui Meng After SPARK-2152 was merged, these tests started failing in Jenkins: {code} - classification stump with all categorical variables *** FAILED *** org.scalatest.exceptions.TestFailedException was thrown. (DecisionTreeSuite.scala:257) - regression stump with all categorical variables *** FAILED *** org.scalatest.exceptions.TestFailedException was thrown. (DecisionTreeSuite.scala:284) {code} https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/97/hadoop.version=1.0.4,label=centos/console -- This message was sent by Atlassian JIRA (v6.2#6252)