[jira] [Updated] (SPARK-5735) Replace uses of EasyMock with Mockito
[ https://issues.apache.org/jira/browse/SPARK-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5735: --- Assignee: Josh Rosen Replace uses of EasyMock with Mockito - Key: SPARK-5735 URL: https://issues.apache.org/jira/browse/SPARK-5735 Project: Spark Issue Type: Improvement Components: Tests Reporter: Patrick Wendell Assignee: Josh Rosen There are a few reasons we should drop EasyMock. First, we should have a single mocking framework in our tests in general to keep things consistent. Second, EasyMock has caused us some dependency pain in our tests due to objenesis. We aren't totally sure, but we suspect such conflicts might be causing non-deterministic test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
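For readers unfamiliar with the target framework, here is a minimal sketch of the Mockito style the migration moves to; the BlockFetcher trait and the values are hypothetical, invented for illustration, and are not from the actual patch:

{code}
import org.mockito.Mockito.{mock, verify, when}

// Hypothetical collaborator to be mocked.
trait BlockFetcher {
  def fetch(blockId: String): Array[Byte]
}

object MockitoSketch {
  def main(args: Array[String]): Unit = {
    // Stub behavior directly; there are no EasyMock-style record/replay phases.
    val fetcher = mock(classOf[BlockFetcher])
    when(fetcher.fetch("block-1")).thenReturn(Array[Byte](1, 2, 3))

    assert(fetcher.fetch("block-1").length == 3)

    // Verify the interaction after the fact.
    verify(fetcher).fetch("block-1")
  }
}
{code}

Consolidating on one framework also means a single set of transitive dependencies to keep consistent (Mockito itself uses objenesis, but only one copy of it).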
Re: Mail to u...@spark.apache.org failing
Ah - we should update it to suggest mailing the dev@ list (and if there is enough traffic maybe do something else). I'm happy to add you if you can give an organization name, URL, a list of which Spark components you are using, and a short description of your use case.

On Mon, Feb 9, 2015 at 9:00 PM, Meethu Mathew meethu.mat...@flytxt.com wrote: Hi, The mail id given in https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark seems to be failing. Can anyone tell me how to get added to the Powered By Spark list? -- Regards, *Meethu*

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: New Metrics Sink class not packaged in spark-assembly jar
Hi Judy,

If you have added source files in the sink/ source folder, they should appear in the assembly jar when you build. One thing I noticed is that you are looking inside the /dist folder. That only gets populated if you run make-distribution. The normal development process is just to do mvn package and then look at the assembly jar that is contained in core/target.

- Patrick

On Mon, Feb 9, 2015 at 10:02 PM, Judy Nash judyn...@exchange.microsoft.com wrote: Hello, Working on SPARK-5708 https://issues.apache.org/jira/browse/SPARK-5708 - Add Slf4jSink to Spark Metrics Sink. Wrote a new Slf4jSink class (see patch attached), but the new class is not packaged as part of the spark-assembly jar. Do I need to update the build config somewhere to have this packaged? Current packaged class: Thought I must have missed something basic but can't figure out why. Thanks! Judy

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
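For context, a rough sketch of what a metrics sink along these lines can look like, built on the Codahale Slf4jReporter. This is a simplified illustration, not necessarily the attached patch; the constructor shape of Spark's Sink implementations and the 10-second default poll period are assumptions here:

{code}
package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{MetricRegistry, Slf4jReporter}
import org.slf4j.LoggerFactory

private[spark] class Slf4jSink(
    val property: Properties,
    val registry: MetricRegistry) extends Sink {

  // Assumed default: poll every 10 seconds unless configured otherwise.
  val pollPeriod: Int =
    Option(property.getProperty("period")).map(_.toInt).getOrElse(10)

  // Route all registered metrics to an SLF4J logger.
  val reporter: Slf4jReporter = Slf4jReporter.forRegistry(registry)
    .outputTo(LoggerFactory.getLogger(classOf[Slf4jSink]))
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .convertRatesTo(TimeUnit.SECONDS)
    .build()

  override def start(): Unit = reporter.start(pollPeriod, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}
{code}

A sink like this would then typically be enabled via conf/metrics.properties, e.g. *.sink.slf4j.class=org.apache.spark.metrics.sink.Slf4jSink.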
[jira] [Resolved] (SPARK-1142) Allow adding jars on app submission, outside of code
[ https://issues.apache.org/jira/browse/SPARK-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1142. Resolution: Not a Problem Allow adding jars on app submission, outside of code Key: SPARK-1142 URL: https://issues.apache.org/jira/browse/SPARK-1142 Project: Spark Issue Type: Improvement Components: Spark Submit Affects Versions: 0.9.0 Reporter: Sandy Pérez González Assignee: Sandy Ryza yarn-standalone mode supports an option that allows adding jars that will be distributed on the cluster with job submission. Providing similar functionality for other app submission modes will allow the spark-app script proposed in SPARK-1126 to support an add-jars option that works for every submit mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5195) When a Hive table is queried with an alias, the cached data loses effectiveness.
[ https://issues.apache.org/jira/browse/SPARK-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5195: --- Assignee: yixiaohua When a Hive table is queried with an alias, the cached data loses effectiveness. Key: SPARK-5195 URL: https://issues.apache.org/jira/browse/SPARK-5195 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: yixiaohua Assignee: yixiaohua Fix For: 1.3.0 Override MetastoreRelation's sameResult method to compare only the database name and table name. Previously, after running "cache table t1; select count(*) from t1;" the data is read from memory, but the query below instead reads from HDFS: "select count(*) from t1 t;". Cached data is keyed by the logical plan and compared with sameResult, so when the table is used with an alias its logical plan is not the same as the logical plan without the alias; hence the sameResult method is modified to compare only the database name and table name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
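A sketch of the change the description outlines, i.e. restricting the equality check inside MetastoreRelation to the database and table name so that an alias on the scan does not defeat the cache lookup (simplified; the actual patch may differ):

{code}
// Inside MetastoreRelation: treat two relations as producing the same result
// whenever they refer to the same Hive table, regardless of any alias.
override def sameResult(plan: LogicalPlan): Boolean = plan match {
  case other: MetastoreRelation =>
    other.databaseName == databaseName && other.tableName == tableName
  case _ => false
}
{code}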
[jira] [Created] (SPARK-5690) Flaky test:
Patrick Wendell created SPARK-5690: -- Summary: Flaky test: Key: SPARK-5690 URL: https://issues.apache.org/jira/browse/SPARK-5690 Project: Spark Issue Type: Bug Components: Tests Reporter: Patrick Wendell Assignee: Andrew Or https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/1647/testReport/junit/org.apache.spark.deploy.rest/StandaloneRestSubmitSuite/simple_submit_until_completion/ {code} org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.simple submit until completion Failing for the past 1 build (Since Failed#1647 ) Took 30 sec. Error Message Driver driver-20150209035158- did not finish within 30 seconds. Stacktrace sbt.ForkMain$ForkError: Driver driver-20150209035158- did not finish within 30 seconds. at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$class.fail(Assertions.scala:1328) at org.scalatest.FunSuite.fail(FunSuite.scala:1555) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.org$apache$spark$deploy$rest$StandaloneRestSubmitSuite$$waitUntilFinished(StandaloneRestSubmitSuite.scala:152) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite$$anonfun$1.apply$mcV$sp(StandaloneRestSubmitSuite.scala:57) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite$$anonfun$1.apply(StandaloneRestSubmitSuite.scala:52) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite$$anonfun$1.apply(StandaloneRestSubmitSuite.scala:52) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(StandaloneRestSubmitSuite.scala:41) at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.runTest(StandaloneRestSubmitSuite.scala:41) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.org$scalatest$BeforeAndAfterAll$$super$run(StandaloneRestSubmitSuite.scala:41) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.run(StandaloneRestSubmitSuite.scala:41) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute
[jira] [Updated] (SPARK-5690) Flaky test: org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.simple submit until completion
[ https://issues.apache.org/jira/browse/SPARK-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5690: --- Summary: Flaky test: org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.simple submit until completion (was: Flaky test: ) Flaky test: org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.simple submit until completion - Key: SPARK-5690 URL: https://issues.apache.org/jira/browse/SPARK-5690 Project: Spark Issue Type: Bug Components: Tests Reporter: Patrick Wendell Assignee: Andrew Or Labels: flaky-test https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/1647/testReport/junit/org.apache.spark.deploy.rest/StandaloneRestSubmitSuite/simple_submit_until_completion/ {code} org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.simple submit until completion Failing for the past 1 build (Since Failed#1647 ) Took 30 sec. Error Message Driver driver-20150209035158- did not finish within 30 seconds. Stacktrace sbt.ForkMain$ForkError: Driver driver-20150209035158- did not finish within 30 seconds. at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$class.fail(Assertions.scala:1328) at org.scalatest.FunSuite.fail(FunSuite.scala:1555) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.org$apache$spark$deploy$rest$StandaloneRestSubmitSuite$$waitUntilFinished(StandaloneRestSubmitSuite.scala:152) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite$$anonfun$1.apply$mcV$sp(StandaloneRestSubmitSuite.scala:57) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite$$anonfun$1.apply(StandaloneRestSubmitSuite.scala:52) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite$$anonfun$1.apply(StandaloneRestSubmitSuite.scala:52) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(StandaloneRestSubmitSuite.scala:41) at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.runTest(StandaloneRestSubmitSuite.scala:41) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at 
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.deploy.rest.StandaloneRestSubmitSuite.org$scalatest$BeforeAndAfterAll$$super$run(StandaloneRestSubmitSuite.scala:41
[jira] [Updated] (SPARK-5689) Document what can be run in different YARN modes
[ https://issues.apache.org/jira/browse/SPARK-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5689: --- Issue Type: Documentation (was: Improvement) Document what can be run in different YARN modes Key: SPARK-5689 URL: https://issues.apache.org/jira/browse/SPARK-5689 Project: Spark Issue Type: Documentation Components: YARN Affects Versions: 1.1.0 Reporter: Thomas Graves We should document what can be run in the different YARN modes. For instance, the interactive shell only works in yarn-client mode; more recently, with https://github.com/apache/spark/pull/3976, users can run Python scripts in cluster mode; etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
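A couple of illustrative invocations for the modes discussed above (Spark 1.x flag spelling; treat these as examples to verify against the docs rather than as authoritative):

{code}
// Interactive shell: client mode only.
//   spark-shell --master yarn-client
//
// Python script in cluster mode (enabled by the PR linked above):
//   spark-submit --master yarn-cluster my_script.py
{code}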
[jira] [Commented] (SPARK-1142) Allow adding jars on app submission, outside of code
[ https://issues.apache.org/jira/browse/SPARK-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312608#comment-14312608 ] Patrick Wendell commented on SPARK-1142: This already exists - you can use the --jars flag to spark-submit or set 'spark.jars' manually. Allow adding jars on app submission, outside of code Key: SPARK-1142 URL: https://issues.apache.org/jira/browse/SPARK-1142 Project: Spark Issue Type: Improvement Components: Spark Submit Affects Versions: 0.9.0 Reporter: Sandy Pérez González Assignee: Sandy Ryza yarn-standalone mode supports an option that allows adding jars that will be distributed on the cluster with job submission. Providing similar functionality for other app submission modes will allow the spark-app script proposed in SPARK-1126 to support an add-jars option that works for every submit mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
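To illustrate the two options named in the comment (the jar paths and class name are placeholders):

{code}
// 1) On the command line:
//      spark-submit --jars /path/to/dep1.jar,/path/to/dep2.jar --class example.Main app.jar
//
// 2) Programmatically, by setting 'spark.jars' before the context is created
//    (the master is supplied by spark-submit in this sketch):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("jars-example")
  .set("spark.jars", "/path/to/dep1.jar,/path/to/dep2.jar")
val sc = new SparkContext(conf)
{code}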
Re: Keep or remove Debian packaging in Spark?
I have wondered whether we should sort of deprecate it more officially, since otherwise I think people have the reasonable expectation based on the current code that Spark intends to support complete Debian packaging as part of the upstream build. Having something that's sort-of maintained, but where no one is helping review and merge patches on it or make it fully functional - IMO that doesn't benefit us or our users. There are a bunch of other projects that are specifically devoted to packaging, so it seems like there is a clear separation of concerns here.

On Mon, Feb 9, 2015 at 7:31 AM, Mark Hamstra m...@clearstorydata.com wrote: "it sounds like nobody intends these to be used to actually deploy Spark" I wouldn't go quite that far. What we have now can serve as useful input to a deployment tool like Chef, but the user is then going to need to add some customization or configuration within the context of that tooling to get Spark installed just the way they want. So it is not so much that the current Debian packaging can't be used as that it has never really been intended to be a completely finished product that a newcomer could, for example, use to install Spark completely and quickly to Ubuntu and have a fully-functional environment in which they could then run all of the examples, tutorials, etc. Getting to that level of packaging (and maintenance) is something that I'm not sure we want to do, since that is a better fit with Bigtop and the efforts of Cloudera, Hortonworks, MapR, etc. to distribute Spark.

On Mon, Feb 9, 2015 at 2:41 AM, Sean Owen so...@cloudera.com wrote: This is a straw poll to assess whether there is support to keep and fix, or remove, the Debian packaging-related config in Spark. I see several oldish outstanding JIRAs relating to problems in the packaging: https://issues.apache.org/jira/browse/SPARK-1799 https://issues.apache.org/jira/browse/SPARK-2614 https://issues.apache.org/jira/browse/SPARK-3624 https://issues.apache.org/jira/browse/SPARK-4436 (and a similar idea about making RPMs) https://issues.apache.org/jira/browse/SPARK-665 The original motivation seems related to Chef: https://issues.apache.org/jira/browse/SPARK-2614?focusedCommentId=14070908&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14070908 Mark's recent comments cast some doubt on whether it is essential: https://github.com/apache/spark/pull/4277#issuecomment-72114226 and in recent conversations I didn't hear dissent to the idea of removing this. Is this still useful enough to fix up? All else equal I'd like to start to walk back some of the complexity of the build, but I don't know how all-else-equal it is. Certainly, it sounds like nobody intends these to be used to actually deploy Spark. I don't doubt it's useful to someone, but can they maintain the packaging logic elsewhere?

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Resolved] (SPARK-2892) Socket Receiver does not stop when streaming context is stopped
[ https://issues.apache.org/jira/browse/SPARK-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2892. Resolution: Fixed Fix Version/s: 1.2.1 I believe this is fixed by SPARK-5035, so I'm closing this. Socket Receiver does not stop when streaming context is stopped --- Key: SPARK-2892 URL: https://issues.apache.org/jira/browse/SPARK-2892 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.0.2 Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical Fix For: 1.2.1 Running NetworkWordCount with {quote} ssc.start(); Thread.sleep(1); ssc.stop(stopSparkContext = false); Thread.sleep(6) {quote} gives the following error {quote}
14/08/06 18:37:13 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 10047 ms on localhost (1/1)
14/08/06 18:37:13 INFO DAGScheduler: Stage 0 (runJob at ReceiverTracker.scala:275) finished in 10.056 s
14/08/06 18:37:13 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/08/06 18:37:13 INFO SparkContext: Job finished: runJob at ReceiverTracker.scala:275, took 10.179263 s
14/08/06 18:37:13 INFO ReceiverTracker: All of the receivers have been terminated
14/08/06 18:37:13 WARN ReceiverTracker: All of the receivers have not deregistered, Map(0 -> ReceiverInfo(0,SocketReceiver-0,null,false,localhost,Stopped by driver,))
14/08/06 18:37:13 INFO ReceiverTracker: ReceiverTracker stopped
14/08/06 18:37:13 INFO JobGenerator: Stopping JobGenerator immediately
14/08/06 18:37:13 INFO RecurringTimer: Stopped timer for JobGenerator after time 1407375433000
14/08/06 18:37:13 INFO JobGenerator: Stopped JobGenerator
14/08/06 18:37:13 INFO JobScheduler: Stopped JobScheduler
14/08/06 18:37:13 INFO StreamingContext: StreamingContext stopped successfully
14/08/06 18:37:43 INFO SocketReceiver: Stopped receiving
14/08/06 18:37:43 INFO SocketReceiver: Closed socket to localhost:
{quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: multi-line comment style
Clearly there isn't a strictly optimal commenting format (pros and cons of both '//' and '/*'). My thought is that for consistency we should just choose one and put it in the style guide.

On Mon, Feb 9, 2015 at 12:25 PM, Xiangrui Meng men...@gmail.com wrote: Btw, I think allowing `/* ... */` without the leading `*` in lines is also useful. Check this line: https://github.com/apache/spark/pull/4259/files#diff-e9dcb3b5f3de77fc31b3aff7831110eaR55, where we put the R commands that can reproduce the test result. It is easier if we write in the following style:
~~~
/*
Using the following R code to load the data and train the model using glmnet package.
library(glmnet)
data <- read.csv(path, header=FALSE, stringsAsFactors=FALSE)
features <- as.matrix(data.frame(as.numeric(data$V2), as.numeric(data$V3)))
label <- as.numeric(data$V1)
weights <- coef(glmnet(features, label, family="gaussian", alpha = 0, lambda = 0))
*/
~~~
So people can copy-paste the R commands directly. Xiangrui

On Mon, Feb 9, 2015 at 12:18 PM, Xiangrui Meng men...@gmail.com wrote: I like the `/* .. */` style more, because it is easier for IDEs to recognize it as a block comment. If you press enter in the comment block with the `//` style, IDEs won't add `//` for you. -Xiangrui

On Wed, Feb 4, 2015 at 2:15 PM, Reynold Xin r...@databricks.com wrote: We should update the style doc to reflect what we have in most places (which I think is //).

On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: FWIW I like the multi-line // over /* */ from a purely style standpoint. The Google Java style guide[1] has some comment about code formatting tools working better with /* */, but there don't seem to be any strong arguments for one over the other that I can find. Thanks Shivaram [1] https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style

On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell pwend...@gmail.com wrote: Personally I have no opinion, but agree it would be nice to standardize. - Patrick

On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com wrote: One thing Marcelo pointed out to me is that the // style does not interfere with commenting out blocks of code with /* */, which is a small good thing. I am also accustomed to // style for multiline, and reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */ style inline always looks a little funny to me.

On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout kayousterh...@gmail.com wrote: Hi all, The Spark Style Guide https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide says multi-line comments should be formatted as:
/*
 * This is a
 * very
 * long comment.
 */
But in my experience, we almost always use // for multi-line comments:
// This is a
// very
// long comment.
Here are some examples:
- Recent commit by Reynold, king of style: https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58
- RDD.scala: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361
- DAGScheduler.scala: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281
Any objections to me updating the style guide to reflect this? As with other style issues, I think consistency here is helpful (and formatting multi-line comments as // does nicely visually distinguish code comments from doc comments).
-Kay - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[ANNOUNCE] Apache Spark 1.2.1 Released
Hi All,

I've just posted the 1.2.1 maintenance release of Apache Spark. We recommend all 1.2.0 users upgrade to this release, as this release includes stability fixes across all components of Spark.

- Download this release: http://spark.apache.org/downloads.html
- View the release notes: http://spark.apache.org/releases/spark-release-1-2-1.html
- Full list of JIRA issues resolved in this release: http://s.apache.org/Mpn

Thanks to everyone who helped work on this release!

- Patrick

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-4423) Improve foreach() documentation to avoid confusion between local- and cluster-mode behavior
[ https://issues.apache.org/jira/browse/SPARK-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313030#comment-14313030 ] Patrick Wendell commented on SPARK-4423: [~joshrosen] Is this specific to foreach? Isn't this true of map() or other operators as well? Improve foreach() documentation to avoid confusion between local- and cluster-mode behavior --- Key: SPARK-4423 URL: https://issues.apache.org/jira/browse/SPARK-4423 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Josh Rosen Assignee: Ilya Ganelin {{foreach}} seems to be a common source of confusion for new users: in {{local}} mode, {{foreach}} can be used to update local variables on the driver, but programs that do this will not work properly when executed on clusters, since the {{foreach}} will update per-executor variables (note that this _will_ work correctly for accumulators, but not for other types of mutable objects). Similarly, I've seen users become confused when {{.foreach(println)}} doesn't print to the driver's standard output. At a minimum, we should improve the documentation to warn users against unsafe uses of {{foreach}} that won't work properly when transitioning from local mode to a real cluster. We might also consider changes to local mode so that its behavior more closely matches the cluster modes; this will require some discussion, though, since any change of behavior here would technically be a user-visible backwards-incompatible change (I don't think that we made any explicit guarantees about the current local-mode behavior, but someone might be relying on the current implicit behavior). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
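A short illustration of the pitfall described above (a hypothetical snippet; it works as written in local mode but is silently wrong on a cluster):

{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("foreach-pitfall").setMaster("local"))
val rdd = sc.parallelize(1 to 100)

// Broken on a cluster: 'counter' is serialized into the closure, so each
// executor increments its own copy and the driver's variable never changes.
var counter = 0
rdd.foreach(x => counter += x)

// Correct: an accumulator is aggregated back to the driver (Spark 1.x API).
val acc = sc.accumulator(0)
rdd.foreach(x => acc += x)
println(acc.value) // 5050
{code}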
[jira] [Commented] (SPARK-5696) HiveThriftServer2Suite fails because of extra log4j.properties in the driver classpath
[ https://issues.apache.org/jira/browse/SPARK-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313005#comment-14313005 ] Patrick Wendell commented on SPARK-5696: Wow - this must have been a substantial effort to figure out what caused this. Sorry I didn't anticipate this when signing off on that patch. HiveThriftServer2Suite fails because of extra log4j.properties in the driver classpath -- Key: SPARK-5696 URL: https://issues.apache.org/jira/browse/SPARK-5696 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Cheng Lian Assignee: Cheng Lian Labels: flaky-test PR #2982 added the {{--driver-class-path}} flag to {{HiveThriftServer2Suite}} so that it passes when the {{hadoop-provided}} profile is used. However, {{lib_managed/jars/jets3t-0.9.2.jar}} in the classpath has a log4j.properties in it, which sets the root logger level to ERROR. This makes {{HiveThriftServer2Suite}} fail because it starts new processes and checks for log output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5647) Output metrics do not show up for older hadoop versions (< 2.5)
[ https://issues.apache.org/jira/browse/SPARK-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313020#comment-14313020 ] Patrick Wendell commented on SPARK-5647: Isn't it just possible to get the file path in the case of file output format, and then read the size of that file? The main challenge I see is how quickly that size becomes visible to the HDFS client. In general I think it's worth doing because a lot of people still use older versions of the Spark HDFS client, for instance people based on AWS who primarily read from S3 and don't keep up to date with the newest Hadoop APIs. Output metrics do not show up for older hadoop versions (< 2.5) --- Key: SPARK-5647 URL: https://issues.apache.org/jira/browse/SPARK-5647 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Kostas Sakellis Need to add output metrics for hadoop < 2.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
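A rough sketch of the idea in the comment (the function name is illustrative, not the eventual patch): after a file-based write, stat the output path and record its length as bytes written.

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Returns the current size of a written file. On some file systems (e.g. S3)
// the reported size may lag the write, which is the visibility concern above.
def outputBytes(pathStr: String, conf: Configuration): Long = {
  val path = new Path(pathStr)
  val fs = path.getFileSystem(conf)
  fs.getFileStatus(path).getLen
}
{code}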
[jira] [Updated] (SPARK-5647) Output metrics do not show up for older hadoop versions (< 2.5)
[ https://issues.apache.org/jira/browse/SPARK-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5647: --- Target Version/s: 1.4.0 Output metrics do not show up for older hadoop versions (< 2.5) --- Key: SPARK-5647 URL: https://issues.apache.org/jira/browse/SPARK-5647 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Kostas Sakellis Priority: Critical Need to add output metrics for hadoop < 2.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Improving metadata in Spark JIRA
I think we already have a YARN component. https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20YARN I don't think JIRA allows it to be mandatory, but if it does, that would be useful.

On Sat, Feb 7, 2015 at 5:08 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: By the way, isn't it possible to make the Component field mandatory when people open new issues? Shouldn't we do that? Btw Patrick, don't we need a YARN component? I think our JIRA components should roughly match the components on the PR dashboard. Nick

On Fri Feb 06 2015 at 12:25:52 PM Patrick Wendell pwend...@gmail.com wrote: Per Nick's suggestion I added two components: 1. Spark Submit 2. Spark Scheduler I figured I would just add these since if we decide later we don't want them, we can simply merge them into Spark Core.

On Fri, Feb 6, 2015 at 11:53 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Do we need some new components to be added to the JIRA project? Like: - scheduler - YARN - spark-submit - ...? Nick

On Fri Feb 06 2015 at 10:50:41 AM Nicholas Chammas nicholas.cham...@gmail.com wrote: +9000 on cleaning up JIRA. Thank you Sean for laying out some specific things to tackle. I will assist with this. Regarding email, I think Sandy is right. I only get JIRA email for issues I'm watching. Nick

On Fri Feb 06 2015 at 9:52:58 AM Sandy Ryza sandy.r...@cloudera.com wrote: JIRA updates don't go to this list, they go to iss...@spark.apache.org. I don't think many are signed up for that list, and those that are probably have a flood of emails anyway. So I'd definitely be in favor of any JIRA cleanup that you're up for. -Sandy

On Fri, Feb 6, 2015 at 6:45 AM, Sean Owen so...@cloudera.com wrote: I've wasted no time in wielding the commit bit to complete a number of small, uncontroversial changes. I wouldn't commit anything that didn't already appear to have review, consensus and little risk, but please let me know if anything looked a little too bold, so I can calibrate. Anyway, I'd like to continue some small house-cleaning by improving the state of JIRA's metadata, in order to let it give us a little clearer view on what's happening in the project:
a. Add Component to every (open) issue that's missing one
b. Review all Critical / Blocker issues to de-escalate ones that seem obviously neither
c. Correct open issues that list a Fix version that has already been released
d. Close all issues Resolved for a release that has already been released
The problem with doing so is that it will create a tremendous amount of email to the list, like, several hundred. It's possible to make bulk changes and suppress e-mail though, which could be done for all but b. Better to suppress the emails when making such changes? or just not bother on some of these?

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC3)
This vote passes with 5 +1 votes (3 binding) and no 0 or -1 votes.

+1 Votes:
Krishna Sankar
Sean Owen*
Chip Senkbeil
Matei Zaharia*
Patrick Wendell*

0 Votes: (none)

-1 Votes: (none)

On Fri, Feb 6, 2015 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote: I'll add a +1 as well.

On Fri, Feb 6, 2015 at 2:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tested on Mac OS X. Matei

On Feb 2, 2015, at 8:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc3 (commit b6eaf77): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-1.2.1-rc3/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1065/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-1.2.1-rc3-docs/ Changes from rc2: A single patch fixing a windows issue. Please vote on releasing this package as Apache Spark 1.2.1! The vote is open until Friday, February 06, at 05:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.2.1 [ ] -1 Do not release this package because ... For a list of fixes in this release, see http://s.apache.org/Mpn. To learn more about Apache Spark, please see http://spark.apache.org/

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.2.1 (RC3)
I'll add a +1 as well. On Fri, Feb 6, 2015 at 2:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tested on Mac OS X. Matei On Feb 2, 2015, at 8:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc3 (commit b6eaf77): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-1.2.1-rc3/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1065/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-1.2.1-rc3-docs/ Changes from rc2: A single patch fixing a windows issue. Please vote on releasing this package as Apache Spark 1.2.1! The vote is open until Friday, February 06, at 05:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.2.1 [ ] -1 Do not release this package because ... For a list of fixes in this release, see http://s.apache.org/Mpn. To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Unit tests
Hey All,

The tests are in a not-amazing state right now due to a few compounding factors:
1. We've merged a large volume of patches recently.
2. The load on jenkins has been relatively high, exposing races and other behavior not seen at lower load.

For those not familiar, the main issue is flaky (non-deterministic) test failures. Right now I'm trying to prioritize keeping the PullRequestBuilder in good shape since it will block development if it is down.

For other tests, let's try to keep filing JIRAs when we see issues and use the flaky-test label (see http://bit.ly/1yRif9S). I may contact people regarding specific tests. This is a very high priority to get in good shape. This kind of thing is no one's fault but just the result of a lot of concurrent development, and everyone needs to pitch in to get back in a good place.

- Patrick

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-761) Print a nicer error message when incompatible Spark binaries try to talk
[ https://issues.apache.org/jira/browse/SPARK-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311475#comment-14311475 ] Patrick Wendell commented on SPARK-761: --- I think the main thing to catch would be Akka. I.e. try connecting different versions and seeing what happens as an exploratory step. For instance, if akka has a standard exception which says you had an incompatible message type, we can wrap that and give an outer exception explaining that the spark version is likely wrong. So maybe we can see if someone wants to explore this a bit as a starter task. Print a nicer error message when incompatible Spark binaries try to talk Key: SPARK-761 URL: https://issues.apache.org/jira/browse/SPARK-761 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Minor Labels: starter Not sure what component this falls under, or if this is still an issue. Patrick Wendell / Matei Zaharia? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
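A sketch of the wrapping idea; the exception type caught here is a guess, and the exploratory step suggested above is exactly to find out what Akka really throws when versions are incompatible:

{code}
// Hypothetical helper: wrap a low-level deserialization failure in a
// friendlier error that points at a likely Spark version mismatch.
def handleRemoteMessage(deserialize: () => Any): Any =
  try {
    deserialize()
  } catch {
    case e: java.io.InvalidClassException =>
      throw new RuntimeException(
        "Received an incompatible message from a remote endpoint; the " +
        "remote process is likely running a different Spark version.", e)
  }
{code}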
[jira] [Updated] (SPARK-4687) SparkContext#addFile doesn't keep file folder information
[ https://issues.apache.org/jira/browse/SPARK-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4687: --- Component/s: Spark Core SparkContext#addFile doesn't keep file folder information - Key: SPARK-4687 URL: https://issues.apache.org/jira/browse/SPARK-4687 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Jimmy Xiang Assignee: Sandy Ryza Fix For: 1.3.0, 1.4.0 Files added with SparkContext#addFile are loaded with Utils#fetchFile before a task starts. However, Utils#fetchFile puts all files under the Spark root on the worker node. We should have an option to keep the folder information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5299) Is http://www.apache.org/dist/spark/KEYS out of date?
[ https://issues.apache.org/jira/browse/SPARK-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5299: --- Component/s: (was: Deploy) Build Is http://www.apache.org/dist/spark/KEYS out of date? - Key: SPARK-5299 URL: https://issues.apache.org/jira/browse/SPARK-5299 Project: Spark Issue Type: Question Components: Build Reporter: David Shaw Assignee: Patrick Wendell The keys contained in http://www.apache.org/dist/spark/KEYS do not appear to match the keys used to sign the releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3033) [Hive] java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal
[ https://issues.apache.org/jira/browse/SPARK-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3033: --- Component/s: (was: Spark Core) [Hive] java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal Key: SPARK-3033 URL: https://issues.apache.org/jira/browse/SPARK-3033 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.0.2 Reporter: pengyanhong run a complex HiveQL via yarn-cluster, got error as below: {quote} 14/08/14 15:05:24 WARN org.apache.spark.Logging$class.logWarning(Logging.scala:70): Loss was due to java.lang.ClassCastException java.lang.ClassCastException: java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaHiveDecimalObjectInspector.getPrimitiveJavaObject(JavaHiveDecimalObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveDecimal(PrimitiveObjectInspectorUtils.java:1022) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$HiveDecimalConverter.convert(PrimitiveObjectInspectorConverter.java:306) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ReturnObjectInspectorResolver.convertIfNecessary(GenericUDFUtils.java:179) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFIf.evaluate(GenericUDFIf.java:82) at org.apache.spark.sql.hive.HiveGenericUdf.eval(hiveUdfs.scala:276) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:84) at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:62) at org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:51) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$4.apply(joins.scala:309) at org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$4.apply(joins.scala:303) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:571) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111) at org.apache.spark.scheduler.Task.run(Task.scala:51) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-761) Print a nicer error message when incompatible Spark binaries try to talk
[ https://issues.apache.org/jira/browse/SPARK-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-761: -- Labels: starter (was: ) Print a nicer error message when incompatible Spark binaries try to talk Key: SPARK-761 URL: https://issues.apache.org/jira/browse/SPARK-761 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Minor Labels: starter Not sure what component this falls under, or if this is still an issue. Patrick Wendell / Matei Zaharia? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-761) Print a nicer error message when incompatible Spark binaries try to talk
[ https://issues.apache.org/jira/browse/SPARK-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-761: -- Description: As a starter task, it would be good to audit the current behavior for different client - server pairs with respect to how exceptions occur. (was: Not sure what component this falls under, or if this is still an issue. Patrick Wendell / Matei Zaharia?) Print a nicer error message when incompatible Spark binaries try to talk Key: SPARK-761 URL: https://issues.apache.org/jira/browse/SPARK-761 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Minor Labels: starter As a starter task, it would be good to audit the current behavior for different client - server pairs with respect to how exceptions occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-761) Print a nicer error message when incompatible Spark binaries try to talk
[ https://issues.apache.org/jira/browse/SPARK-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311490#comment-14311490 ] Patrick Wendell commented on SPARK-761: --- [~aash] right now we don't explicitly encode the spark version anywhere in the RPC. The best possible thing is to give an explicit version number like you said, but we don't have the plumbing to do that at the moment and IMO that's worth punting until we decide to standardize the RPC format. Print a nicer error message when incompatible Spark binaries try to talk Key: SPARK-761 URL: https://issues.apache.org/jira/browse/SPARK-761 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Minor Labels: starter As a starter task, it would be good to audit the current behavior for different client - server pairs with respect to how exceptions occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5659) Flaky Test: org.apache.spark.streaming.ReceiverSuite.block
[ https://issues.apache.org/jira/browse/SPARK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5659: --- Component/s: Tests Flaky Test: org.apache.spark.streaming.ReceiverSuite.block -- Key: SPARK-5659 URL: https://issues.apache.org/jira/browse/SPARK-5659 Project: Spark Issue Type: Bug Components: Streaming, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Critical Labels: flaky-test {code} Error Message recordedBlocks.drop(1).dropRight(1).forall(((block: scala.collection.mutable.ArrayBuffer[Int]) => block.size.>=(minExpectedMessagesPerBlock).&&(block.size.<=(maxExpectedMessagesPerBlock)))) was false # records in received blocks = [11,10,10,10,10,10,10,10,10,10,10,4,16,10,10,10,10,10,10,10], not between 7 and 11 Stacktrace sbt.ForkMain$ForkError: recordedBlocks.drop(1).dropRight(1).forall(((block: scala.collection.mutable.ArrayBuffer[Int]) => block.size.>=(minExpectedMessagesPerBlock).&&(block.size.<=(maxExpectedMessagesPerBlock)))) was false # records in received blocks = [11,10,10,10,10,10,10,10,10,10,10,4,16,10,10,10,10,10,10,10], not between 7 and 11 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply$mcV$sp(ReceiverSuite.scala:200) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply(ReceiverSuite.scala:158) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply(ReceiverSuite.scala:158) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.ReceiverSuite.org$scalatest$BeforeAndAfter$$super$runTest(ReceiverSuite.scala:39) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.ReceiverSuite.runTest(ReceiverSuite.scala:39) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at 
org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.ReceiverSuite.org$scalatest$BeforeAndAfter$$super$run(ReceiverSuite.scala:39) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.ReceiverSuite.run(ReceiverSuite.scala:39) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at sbt.ForkMain$Run$2.call(ForkMain.java:294
[jira] [Created] (SPARK-5679) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method
Patrick Wendell created SPARK-5679: -- Summary: Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and .input metrics with mixed read method Key: SPARK-5679 URL: https://issues.apache.org/jira/browse/SPARK-5679 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Kostas Sakellis Priority: Blocker Please audit these and see if there are any assumptions with respect to File IO that might not hold in all cases. I'm happy to help if you can't find anything. These both failed in the same run: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-SBT/38/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/#showFailuresLink {code} org.apache.spark.metrics.InputOutputMetricsSuite.input metrics with mixed read method Failing for the past 13 builds (Since Failed#26 ) Took 48 sec. Error Message 2030 did not equal 6496 Stacktrace sbt.ForkMain$ForkError: 2030 did not equal 6496 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply$mcV$sp(InputOutputMetricsSuite.scala:135) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.metrics.InputOutputMetricsSuite.runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfterAll$$super$run(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfter$$super$run(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.metrics.InputOutputMetricsSuite.run(InputOutputMetricsSuite.scala:46
[jira] [Updated] (SPARK-5679) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method
[ https://issues.apache.org/jira/browse/SPARK-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5679: --- Description: Please audit these and see if there are any assumptions with respect to File IO that might not hold in all cases. I'm happy to help if you can't find anything. These both failed in the same run: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-SBT/38/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/#showFailuresLink {code} org.apache.spark.metrics.InputOutputMetricsSuite.input metrics with mixed read method Failing for the past 13 builds (Since Failed#26 ) Took 48 sec. Error Message 2030 did not equal 6496 Stacktrace sbt.ForkMain$ForkError: 2030 did not equal 6496 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply$mcV$sp(InputOutputMetricsSuite.scala:135) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.metrics.InputOutputMetricsSuite.runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at 
org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfterAll$$super$run(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfter$$super$run(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.metrics.InputOutputMetricsSuite.run(InputOutputMetricsSuite.scala:46) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at sbt.ForkMain$Run$2.call(ForkMain.java:294) at sbt.ForkMain$Run$2.call(ForkMain.java:284) at java.util.concurrent.FutureTask.run(FutureTask.java:262
[jira] [Updated] (SPARK-5679) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method
[ https://issues.apache.org/jira/browse/SPARK-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5679: --- Summary: Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method (was: Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and .input metrics with mixed read method ) Flaky tests in InputOutputMetricsSuite: input metrics with interleaved reads and input metrics with mixed read method -- Key: SPARK-5679 URL: https://issues.apache.org/jira/browse/SPARK-5679 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Kostas Sakellis Priority: Blocker Please audit these and see if there are any assumptions with respect to File IO that might not hold in all cases. I'm happy to help if you can't find anything. These both failed in the same run: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-SBT/38/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/#showFailuresLink {code} org.apache.spark.metrics.InputOutputMetricsSuite.input metrics with mixed read method Failing for the past 13 builds (Since Failed#26 ) Took 48 sec. Error Message 2030 did not equal 6496 Stacktrace sbt.ForkMain$ForkError: 2030 did not equal 6496 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply$mcV$sp(InputOutputMetricsSuite.scala:135) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.apache.spark.metrics.InputOutputMetricsSuite$$anonfun$9.apply(InputOutputMetricsSuite.scala:113) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfter$$super$runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.metrics.InputOutputMetricsSuite.runTest(InputOutputMetricsSuite.scala:46) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.metrics.InputOutputMetricsSuite.org$scalatest$BeforeAndAfterAll$$super$run(InputOutputMetricsSuite.scala:46) at org.scalatest.BeforeAndAfterAll
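As a starting point for the audit requested above, one pattern that avoids baking a single read method's byte count into the test is to derive the expected value from the file itself. A minimal sketch, with hypothetical helper and variable names (this is not the actual suite code):
{code}
import java.io.File

// Hypothetical check: compare the recorded metric against the on-disk size
// instead of a hard-coded constant, so buffered vs. unbuffered reads and
// split-boundary effects don't make the assertion flaky.
def assertInputMetric(bytesRead: Long, inputFile: File): Unit = {
  val expected = inputFile.length()
  assert(bytesRead == expected,
    s"bytes read ($bytesRead) did not equal file size ($expected)")
}
{code}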
[jira] [Updated] (SPARK-4896) Don't redundantly copy executor dependencies in Utils.fetchFile
[ https://issues.apache.org/jira/browse/SPARK-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4896: --- Component/s: Spark Core Don't redundantly copy executor dependencies in Utils.fetchFile --- Key: SPARK-4896 URL: https://issues.apache.org/jira/browse/SPARK-4896 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Josh Rosen Assignee: Ryan Williams Fix For: 1.3.0, 1.1.2, 1.2.1 This JIRA is spun off from a comment by [~rdub] on SPARK-3967, quoted here: {quote} I've been debugging this issue as well and I think I've found an issue in {{org.apache.spark.util.Utils}} that is contributing to / causing the problem: {{Files.move}} on [line 390|https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L390] is called even if {{targetFile}} exists and {{tempFile}} and {{targetFile}} are equal. The check on [line 379|https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L379] seems to imply the desire to skip a redundant overwrite if the file is already there and has the contents that it should have. Gating the {{Files.move}} call on a further {{if (!targetFile.exists)}} fixes the issue for me; attached is a patch of the change. In practice all of my executors that hit this code path are finding every dependency JAR to already exist and be exactly equal to what they need it to be, meaning they were all needlessly overwriting all of their dependency JARs, and now are all basically no-op-ing in {{Utils.fetchFile}}; I've not determined who/what is putting the JARs there, why the issue only crops up in {{yarn-cluster}} mode (or {{--master yarn --deploy-mode cluster}}), etc., but it seems like either way this patch is probably desirable. {quote} I'm spinning this off into its own JIRA so that we can track the merging of https://github.com/apache/spark/pull/2848 separately (since we have multiple PRs that contribute to fixing the original issue). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
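A minimal sketch of the gating described in the quote above, using Guava's file utilities; the method name and surrounding structure are illustrative, not the actual Utils.fetchFile code:
{code}
import java.io.File
import com.google.common.io.Files

// Skip the redundant overwrite when the target already holds identical bytes.
def moveIfNeeded(tempFile: File, targetFile: File): Unit = {
  if (targetFile.exists() && Files.equal(tempFile, targetFile)) {
    tempFile.delete()  // dependency is already in place; no-op
  } else {
    Files.move(tempFile, targetFile)
  }
}
{code}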
[jira] [Updated] (SPARK-5355) SparkConf is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5355: --- Component/s: Spark Core SparkConf is not thread-safe Key: SPARK-5355 URL: https://issues.apache.org/jira/browse/SPARK-5355 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0, 1.3.0 Reporter: Davies Liu Assignee: Davies Liu Priority: Blocker Fix For: 1.3.0, 1.2.1 SparkConf is not thread-safe, but it is accessed by many threads. getAll() could return only part of the configs if another thread is modifying the conf at the same time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
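A minimal sketch of one way to make concurrent access safe, assuming a hypothetical settings map rather than the actual SparkConf internals:
{code}
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

class ThreadSafeConf {
  // ConcurrentHashMap never exposes a partially updated view, so getAll
  // cannot observe a torn read while another thread is writing.
  private val settings = new ConcurrentHashMap[String, String]()

  def set(key: String, value: String): this.type = {
    settings.put(key, value)
    this
  }

  def getAll: Array[(String, String)] =
    settings.entrySet().asScala.map(e => (e.getKey, e.getValue)).toArray
}
{code}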
[jira] [Updated] (SPARK-5289) Backport publishing of repl, yarn into branch-1.2
[ https://issues.apache.org/jira/browse/SPARK-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5289: --- Component/s: Build Backport publishing of repl, yarn into branch-1.2 - Key: SPARK-5289 URL: https://issues.apache.org/jira/browse/SPARK-5289 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.2.1 In SPARK-3452 we did some clean-up of published artifacts that turned out to adversely affect some users. This has been mostly patched up in master via SPARK-4925 (hive-thriftserver), which was backported. For the repl and yarn modules, they were fixed in SPARK-4048 as part of a larger change that only went into master. Those pieces should be backported to Spark 1.2 to allow publishing in a 1.2.1 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5254) Update the user guide to make clear that spark.mllib is not being deprecated
[ https://issues.apache.org/jira/browse/SPARK-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5254: --- Component/s: MLlib Update the user guide to make clear that spark.mllib is not being deprecated Key: SPARK-5254 URL: https://issues.apache.org/jira/browse/SPARK-5254 Project: Spark Issue Type: Documentation Components: MLlib Reporter: Xiangrui Meng Assignee: Xiangrui Meng Fix For: 1.3.0, 1.2.1 The current statement in the user guide may deliver a confusing message to users. spark.ml contains high-level APIs for building ML pipelines, but that doesn't mean spark.mllib is being deprecated. First of all, the pipeline API is in its alpha stage and we need to see more use cases from the community to stabilize it, which may take several releases. Secondly, the components in spark.ml are simple wrappers over spark.mllib implementations. Neither the APIs nor the implementations from spark.mllib are being deprecated. We expect users to use the spark.ml pipeline APIs to build their ML pipelines, but we will keep supporting and adding features to spark.mllib. For example, there are many features in review at https://spark-prs.appspot.com/#mllib. So users should be comfortable using spark.mllib features and can expect more to come. The user guide needs to be updated to make this message clear. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5308) MD5 / SHA1 hash format doesn't match standard Maven output
[ https://issues.apache.org/jira/browse/SPARK-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5308: --- Fix Version/s: (was: 1.2.1) MD5 / SHA1 hash format doesn't match standard Maven output -- Key: SPARK-5308 URL: https://issues.apache.org/jira/browse/SPARK-5308 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.2.0 Reporter: Kuldeep Assignee: Sean Owen Priority: Minor Fix For: 1.3.0 https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.2.0/spark-core_2.10-1.2.0.pom.md5 The above does not look like a proper md5, which causes failures in some build tools like Leiningen. https://github.com/technomancy/leiningen/issues/1802 Compare this with the 1.1.0 release: https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom.md5 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
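For comparison, a Maven-style .md5 file holds just the lowercase hex digest of the artifact. A minimal sketch of producing that format in Scala (illustrative only, not Spark's release tooling):
{code}
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

// Emit the bare hex digest, the format Maven and tools like Leiningen expect.
def md5Hex(path: String): String =
  MessageDigest.getInstance("MD5")
    .digest(Files.readAllBytes(Paths.get(path)))
    .map(b => f"${b & 0xff}%02x")
    .mkString
{code}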
[jira] [Updated] (SPARK-5308) MD5 / SHA1 hash format doesn't match standard Maven output
[ https://issues.apache.org/jira/browse/SPARK-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5308: --- Fix Version/s: 1.2.1 MD5 / SHA1 hash format doesn't match standard Maven output -- Key: SPARK-5308 URL: https://issues.apache.org/jira/browse/SPARK-5308 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.2.0 Reporter: Kuldeep Assignee: Sean Owen Priority: Minor Fix For: 1.3.0, 1.2.1 https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.2.0/spark-core_2.10-1.2.0.pom.md5 The above does not look like a proper md5, which causes failures in some build tools like Leiningen. https://github.com/technomancy/leiningen/issues/1802 Compare this with the 1.1.0 release: https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom.md5 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5524: --- Component/s: Spark Core Remove messy dependencies to log4j -- Key: SPARK-5524 URL: https://issues.apache.org/jira/browse/SPARK-5524 Project: Spark Issue Type: Task Components: Spark Core Reporter: Jacek Lewandowski There are some tickets regarding loosening the dependency on Log4j; however, some classes still use the following scheme: {code} if (Logger.getLogger(classOf[SomeClass]).getLevel == null) { Logger.getLogger(classOf[SomeClass]).setLevel(someLevel) } {code} This doesn't look good, and it makes it difficult to track why some logs are missing when you use Log4j and why they flood when you use something else, like Logback. There is a Logging class which checks whether we use Log4j or not. Why not delegate all such invocations to the Logging class, which could handle them properly, perhaps supporting more logging implementations? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
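A minimal sketch of the delegation idea, with a hypothetical helper object; the backend check mirrors a common slf4j pattern, but this is not the actual Logging trait:
{code}
import org.apache.log4j.{Level, Logger}

object LogLevels {
  // Only touch log4j when it is actually the active slf4j backend.
  private def usingLog4j12: Boolean =
    org.slf4j.LoggerFactory.getILoggerFactory.getClass.getName ==
      "org.slf4j.impl.Log4jLoggerFactory"

  // Apply a default level only if log4j is in use and the logger is
  // unconfigured, so logback (or any other backend) is never affected.
  def setDefaultLevel(clazz: Class[_], level: Level): Unit = {
    if (usingLog4j12) {
      val logger = Logger.getLogger(clazz)
      if (logger.getLevel == null) logger.setLevel(level)
    }
  }
}
{code}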
[jira] [Commented] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1430#comment-1430 ] Patrick Wendell commented on SPARK-5524: [~nchammas] I don't think this is related to the build, so I've changed the component. Remove messy dependencies to log4j -- Key: SPARK-5524 URL: https://issues.apache.org/jira/browse/SPARK-5524 Project: Spark Issue Type: Task Components: Spark Core Reporter: Jacek Lewandowski There are some tickets regarding loosening the dependency on Log4j; however, some classes still use the following scheme: {code} if (Logger.getLogger(classOf[SomeClass]).getLevel == null) { Logger.getLogger(classOf[SomeClass]).setLevel(someLevel) } {code} This doesn't look good, and it makes it difficult to track why some logs are missing when you use Log4j and why they flood when you use something else, like Logback. There is a Logging class which checks whether we use Log4j or not. Why not delegate all such invocations to the Logging class, which could handle them properly, perhaps supporting more logging implementations? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5524) Remove messy dependencies to log4j
[ https://issues.apache.org/jira/browse/SPARK-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5524: --- Component/s: (was: Build) Remove messy dependencies to log4j -- Key: SPARK-5524 URL: https://issues.apache.org/jira/browse/SPARK-5524 Project: Spark Issue Type: Task Components: Spark Core Reporter: Jacek Lewandowski There are some tickets regarding loosening the dependency on Log4j; however, some classes still use the following scheme: {code} if (Logger.getLogger(classOf[SomeClass]).getLevel == null) { Logger.getLogger(classOf[SomeClass]).setLevel(someLevel) } {code} This doesn't look good, and it makes it difficult to track why some logs are missing when you use Log4j and why they flood when you use something else, like Logback. There is a Logging class which checks whether we use Log4j or not. Why not delegate all such invocations to the Logging class, which could handle them properly, perhaps supporting more logging implementations? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309784#comment-14309784 ] Patrick Wendell commented on SPARK-5388: On DELETE, I'll defer to you guys, have zero strong feelings either way. Provide a stable application submission gateway in standalone cluster mode -- Key: SPARK-5388 URL: https://issues.apache.org/jira/browse/SPARK-5388 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Blocker Attachments: stable-spark-submit-in-standalone-mode-2-4-15.pdf The existing submission gateway in standalone mode is not compatible across Spark versions. If you have a newer version of Spark submitting to an older version of the standalone Master, it is currently not guaranteed to work. The goal is to provide a stable REST interface to replace this channel. For more detail, please see the most recent design doc attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Improving metadata in Spark JIRA
Per Nick's suggestion I added two components: 1. Spark Submit 2. Spark Scheduler I figured I would just add these since if we decide later we don't want them, we can simply merge them into Spark Core. On Fri, Feb 6, 2015 at 11:53 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Do we need some new components to be added to the JIRA project? Like: - scheduler - YARN - spark-submit - ...? Nick On Fri Feb 06 2015 at 10:50:41 AM Nicholas Chammas nicholas.cham...@gmail.com wrote: +9000 on cleaning up JIRA. Thank you Sean for laying out some specific things to tackle. I will assist with this. Regarding email, I think Sandy is right. I only get JIRA email for issues I'm watching. Nick On Fri Feb 06 2015 at 9:52:58 AM Sandy Ryza sandy.r...@cloudera.com wrote: JIRA updates don't go to this list, they go to iss...@spark.apache.org. I don't think many are signed up for that list, and those that are probably have a flood of emails anyway. So I'd definitely be in favor of any JIRA cleanup that you're up for. -Sandy On Fri, Feb 6, 2015 at 6:45 AM, Sean Owen so...@cloudera.com wrote: I've wasted no time in wielding the commit bit to complete a number of small, uncontroversial changes. I wouldn't commit anything that didn't already appear to have review, consensus and little risk, but please let me know if anything looked a little too bold, so I can calibrate. Anyway, I'd like to continue some small house-cleaning by improving the state of JIRA's metadata, in order to let it give us a little clearer view on what's happening in the project: a. Add Component to every (open) issue that's missing one b. Review all Critical / Blocker issues to de-escalate ones that seem obviously neither c. Correct open issues that list a Fix version that has already been released d. Close all issues Resolved for a release that has already been released The problem with doing so is that it will create a tremendous amount of email to the list, like, several hundred. It's possible to make bulk changes and suppress e-mail though, which could be done for all but b. Better to suppress the emails when making such changes? or just not bother on some of these? - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309825#comment-14309825 ] Patrick Wendell commented on SPARK-5388: On the boolean and numeric values: I don't mind one way or the other how they are handled programmatically (since we are not exposing this). However, it does seem weird that the wire protocol defines these as string types. I looked at a few other APIs (GitHub, Twitter, etc.) and they all use proper boolean types. So I'd definitely recommend setting them as proper types in the JSON, and if that's easiest to do by making them nullable Boolean and Long values, that seems like a good approach. Provide a stable application submission gateway in standalone cluster mode -- Key: SPARK-5388 URL: https://issues.apache.org/jira/browse/SPARK-5388 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Blocker Attachments: stable-spark-submit-in-standalone-mode-2-4-15.pdf The existing submission gateway in standalone mode is not compatible across Spark versions. If you have a newer version of Spark submitting to an older version of the standalone Master, it is currently not guaranteed to work. The goal is to provide a stable REST interface to replace this channel. For more detail, please see the most recent design doc attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
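A minimal sketch of the nullable-wrapper idea, with hypothetical field names rather than the actual protocol messages: boxed java.lang types can be null when a field is absent, while still serializing as real JSON booleans and numbers instead of strings:
{code}
// Illustrative request fields only; not the real submission protocol.
class SubmitRequestFields {
  var appName: String = null
  var supervise: java.lang.Boolean = null    // absent => null, else true/false
  var driverMemoryMb: java.lang.Long = null  // absent => null, else a number
}
{code}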
[jira] [Updated] (SPARK-4874) Report number of records read/written in a task
[ https://issues.apache.org/jira/browse/SPARK-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4874: --- Component/s: Web UI Spark Core Report number of records read/written in a task --- Key: SPARK-4874 URL: https://issues.apache.org/jira/browse/SPARK-4874 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Kostas Sakellis Assignee: Kostas Sakellis Fix For: 1.3.0 This metric will help us find key skew using the WebUI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-4874) Report number of records read/written in a task
[ https://issues.apache.org/jira/browse/SPARK-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4874. Resolution: Fixed Fix Version/s: 1.3.0 Target Version/s: 1.3.0 Report number of records read/written in a task --- Key: SPARK-4874 URL: https://issues.apache.org/jira/browse/SPARK-4874 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Kostas Sakellis Assignee: Kostas Sakellis Fix For: 1.3.0 This metric will help us find key skew using the WebUI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5659) Flaky Test: org.apache.spark.streaming.ReceiverSuite.block
[ https://issues.apache.org/jira/browse/SPARK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5659: --- Labels: flaky-test (was: ) Flaky Test: org.apache.spark.streaming.ReceiverSuite.block -- Key: SPARK-5659 URL: https://issues.apache.org/jira/browse/SPARK-5659 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Critical Labels: flaky-test {code} Error Message recordedBlocks.drop(1).dropRight(1).forall(((block: scala.collection.mutable.ArrayBuffer[Int]) => block.size.>=(minExpectedMessagesPerBlock).&&(block.size.<=(maxExpectedMessagesPerBlock)))) was false # records in received blocks = [11,10,10,10,10,10,10,10,10,10,10,4,16,10,10,10,10,10,10,10], not between 7 and 11 Stacktrace sbt.ForkMain$ForkError: recordedBlocks.drop(1).dropRight(1).forall(((block: scala.collection.mutable.ArrayBuffer[Int]) => block.size.>=(minExpectedMessagesPerBlock).&&(block.size.<=(maxExpectedMessagesPerBlock)))) was false # records in received blocks = [11,10,10,10,10,10,10,10,10,10,10,4,16,10,10,10,10,10,10,10], not between 7 and 11 at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply$mcV$sp(ReceiverSuite.scala:200) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply(ReceiverSuite.scala:158) at org.apache.spark.streaming.ReceiverSuite$$anonfun$3.apply(ReceiverSuite.scala:158) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.ReceiverSuite.org$scalatest$BeforeAndAfter$$super$runTest(ReceiverSuite.scala:39) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.ReceiverSuite.runTest(ReceiverSuite.scala:39) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at 
org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.ReceiverSuite.org$scalatest$BeforeAndAfter$$super$run(ReceiverSuite.scala:39) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.ReceiverSuite.run(ReceiverSuite.scala:39) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at sbt.ForkMain$Run$2.call(ForkMain.java:294
[jira] [Resolved] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5388. Resolution: Fixed Fix Version/s: 1.3.0 Provide a stable application submission gateway in standalone cluster mode -- Key: SPARK-5388 URL: https://issues.apache.org/jira/browse/SPARK-5388 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Blocker Fix For: 1.3.0 Attachments: stable-spark-submit-in-standalone-mode-2-4-15.pdf The existing submission gateway in standalone mode is not compatible across Spark versions. If you have a newer version of Spark submitting to an older version of the standalone Master, it is currently not guaranteed to work. The goal is to provide a stable REST interface to replace this channel. For more detail, please see the most recent design doc attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5662) Flaky test: org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.multi topic stream
[ https://issues.apache.org/jira/browse/SPARK-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5662: --- Priority: Critical (was: Major) Flaky test: org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.multi topic stream -- Key: SPARK-5662 URL: https://issues.apache.org/jira/browse/SPARK-5662 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.3.0 Reporter: Patrick Wendell Priority: Critical {code} sbt.ForkMain$ForkError: java.net.ConnectException: Connection refused at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:319) at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:319) at scala.util.Either.fold(Either.scala:97) at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:318) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite$$anonfun$3.apply$mcV$sp(KafkaDirectStreamSuite.scala:66) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite$$anonfun$3.apply(KafkaDirectStreamSuite.scala:59) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite$$anonfun$3.apply(KafkaDirectStreamSuite.scala:59) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(KafkaDirectStreamSuite.scala:32) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.runTest(KafkaDirectStreamSuite.scala:32) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at 
org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.org$scalatest$BeforeAndAfter$$super$run(KafkaDirectStreamSuite.scala:32) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.run(KafkaDirectStreamSuite.scala:32) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at sbt.ForkMain$Run$2.call(ForkMain.java:294) at sbt.ForkMain$Run$2.call(ForkMain.java:284) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745
[jira] [Created] (SPARK-5662) Flaky test: org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.multi topic stream
Patrick Wendell created SPARK-5662: -- Summary: Flaky test: org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.multi topic stream Key: SPARK-5662 URL: https://issues.apache.org/jira/browse/SPARK-5662 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.3.0 Reporter: Patrick Wendell {code} sbt.ForkMain$ForkError: java.net.ConnectException: Connection refused at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:319) at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:319) at scala.util.Either.fold(Either.scala:97) at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:318) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite$$anonfun$3.apply$mcV$sp(KafkaDirectStreamSuite.scala:66) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite$$anonfun$3.apply(KafkaDirectStreamSuite.scala:59) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite$$anonfun$3.apply(KafkaDirectStreamSuite.scala:59) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.org$scalatest$BeforeAndAfter$$super$runTest(KafkaDirectStreamSuite.scala:32) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.runTest(KafkaDirectStreamSuite.scala:32) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.org$scalatest$BeforeAndAfter$$super$run(KafkaDirectStreamSuite.scala:32) at 
org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.streaming.kafka.KafkaDirectStreamSuite.run(KafkaDirectStreamSuite.scala:32) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at sbt.ForkMain$Run$2.call(ForkMain.java:294) at sbt.ForkMain$Run$2.call(ForkMain.java:284) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-SBT/1628/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/testReport/junit/org.apache.spark.streaming.kafka/KafkaDirectStreamSuite/multi_topic_stream
[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308635#comment-14308635 ] Patrick Wendell commented on SPARK-5388: I think it's reasonable to use DELETE per [~tigerquoll]'s suggestion. It's not a perfect match with DELETE semantics, but I think it's fine to use it if it's not too much work. I also think calling it maxProtocolVersion is a good idea if those are indeed the semantics. For security, yeah the killing is the same as it is in the current mode, which is that there is no security. One thing we could do if there is user demand is add a flag that globally disables killing, but let's see if users request this first. Provide a stable application submission gateway in standalone cluster mode -- Key: SPARK-5388 URL: https://issues.apache.org/jira/browse/SPARK-5388 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Blocker Attachments: stable-spark-submit-in-standalone-mode-2-4-15.pdf The existing submission gateway in standalone mode is not compatible across Spark versions. If you have a newer version of Spark submitting to an older version of the standalone Master, it is currently not guaranteed to work. The goal is to provide a stable REST interface to replace this channel. For more detail, please see the most recent design doc attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5557) spark-shell failed to start
[ https://issues.apache.org/jira/browse/SPARK-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308346#comment-14308346 ] Patrick Wendell commented on SPARK-5557: I can send a fix for this shortly. It also works fine if you build with Hadoop 2 support. spark-shell failed to start --- Key: SPARK-5557 URL: https://issues.apache.org/jira/browse/SPARK-5557 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Guoqiang Li Priority: Blocker the log: {noformat} 5/02/03 19:06:39 INFO spark.HttpServer: Starting HTTP Server Exception in thread main java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:75) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1774) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1765) at org.apache.spark.HttpServer.start(HttpServer.scala:62) at org.apache.spark.repl.SparkIMain.init(SparkIMain.scala:130) at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.init(SparkILoop.scala:185) at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:214) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:946) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1039) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:403) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: javax.servlet.http.HttpServletResponse at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 25 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
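Until the fix lands, one likely local workaround is to put the servlet API back on the classpath yourself; a sketch in sbt form, using the standard javax coordinates (these are not taken from Spark's build):
{code}
// build.sbt: restore the servlet 3.x API that went missing after the shading
libraryDependencies += "javax.servlet" % "javax.servlet-api" % "3.0.1"
{code}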
Re: PSA: Maven supports parallel builds
I've done this in the past, but back when I wasn't using Zinc it didn't make a big difference. It's worth doing this in our Jenkins environment though. - Patrick On Thu, Feb 5, 2015 at 4:52 PM, Dirceu Semighini Filho dirceu.semigh...@gmail.com wrote: Thanks Nicholas, I didn't know this. 2015-02-05 22:16 GMT-02:00 Nicholas Chammas nicholas.cham...@gmail.com: Y'all may already know this, but I haven't seen it mentioned anywhere in our docs or on here, and it's a pretty easy win. Maven supports parallel builds https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3 with the -T command line option. For example: ./build/mvn -T 1C -Dhadoop.version=1.2.1 -DskipTests clean package This will have Maven use 1 thread per core on your machine to build Spark. On my little MacBook Air, this cuts the build time from 14 minutes to 10.5 minutes. A machine with more cores should see a bigger improvement. Note though that the docs mark this as experimental, so I wouldn't change our reference build to use this. But it should be useful, for example, in Jenkins or when working locally. Nick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-5607) NullPointerException in objenesis
[ https://issues.apache.org/jira/browse/SPARK-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308423#comment-14308423 ] Patrick Wendell commented on SPARK-5607: This may have actually caused more of an issue than it solved :(. Lots of cascading failures in Spark SQL recently NullPointerException in objenesis - Key: SPARK-5607 URL: https://issues.apache.org/jira/browse/SPARK-5607 Project: Spark Issue Type: Bug Reporter: Reynold Xin Assignee: Patrick Wendell Fix For: 1.3.0 Tests are sometimes failing with the following exception. The problem might be that Kryo is using a different version of objenesis from Mockito. {code} [info] - Process succeeds instantly *** FAILED *** (107 milliseconds) [info] java.lang.NullPointerException: [info] at org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52) [info] at org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90) [info] at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) [info] at org.mockito.internal.creation.jmock.ClassImposterizer.createProxy(ClassImposterizer.java:111) [info] at org.mockito.internal.creation.jmock.ClassImposterizer.imposterise(ClassImposterizer.java:51) [info] at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:52) [info] at org.mockito.internal.MockitoCore.mock(MockitoCore.java:41) [info] at org.mockito.Mockito.mock(Mockito.java:1014) [info] at org.mockito.Mockito.mock(Mockito.java:909) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply$mcV$sp(DriverRunnerTest.scala:50) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply(DriverRunnerTest.scala:47) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply(DriverRunnerTest.scala:47) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.scalatest.Suite$class.withFixture(Suite.scala:1122) [info] at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at 
org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuite.run(FunSuite.scala:1555) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info
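One way to test the version-conflict theory above is to pin objenesis to a single version on the test classpath; a sketch in sbt form (the version shown is illustrative, not taken from Spark's build):
{code}
// Force Kryo and Mockito to resolve the same objenesis at test time.
dependencyOverrides += "org.objenesis" % "objenesis" % "1.2"
{code}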
[jira] [Updated] (SPARK-5557) Servlet API classes now missing after jetty shading
[ https://issues.apache.org/jira/browse/SPARK-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5557: --- Summary: Servlet API classes now missing after jetty shading (was: spark-shell failed to start) Servlet API classes now missing after jetty shading --- Key: SPARK-5557 URL: https://issues.apache.org/jira/browse/SPARK-5557 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Guoqiang Li Priority: Blocker the log: {noformat} 5/02/03 19:06:39 INFO spark.HttpServer: Starting HTTP Server Exception in thread main java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:75) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1774) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1765) at org.apache.spark.HttpServer.start(HttpServer.scala:62) at org.apache.spark.repl.SparkIMain.init(SparkIMain.scala:130) at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.init(SparkILoop.scala:185) at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:214) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:946) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1039) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:403) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: javax.servlet.http.HttpServletResponse at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 25 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5557) Servlet API classes now missing after jetty shading
[ https://issues.apache.org/jira/browse/SPARK-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308391#comment-14308391 ] Patrick Wendell edited comment on SPARK-5557 at 2/6/15 1:07 AM: I'm sorry that this affected so many people for so long. It is not acceptable to have the master build not working for this many hours. Unfortunately our tests do not catch this for some reason. was (Author: pwendell): I'm sorry that this affected so many people for so long. It is not acceptable to have the master build not working for this any hours. Unfortunately our tests do not catch this for some reason. Servlet API classes now missing after jetty shading --- Key: SPARK-5557 URL: https://issues.apache.org/jira/browse/SPARK-5557 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Guoqiang Li Priority: Blocker the log: {noformat} 5/02/03 19:06:39 INFO spark.HttpServer: Starting HTTP Server Exception in thread main java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:75) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1774) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1765) at org.apache.spark.HttpServer.start(HttpServer.scala:62) at org.apache.spark.repl.SparkIMain.init(SparkIMain.scala:130) at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.init(SparkILoop.scala:185) at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:214) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:946) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1039) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:403) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: javax.servlet.http.HttpServletResponse at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 
25 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5557) Servlet API classes now missing after jetty shading
[ https://issues.apache.org/jira/browse/SPARK-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308391#comment-14308391 ] Patrick Wendell commented on SPARK-5557: I'm sorry that this affected so many people for so long. It is not acceptable to have the master build not working for this many hours. Unfortunately our tests do not catch this for some reason. Servlet API classes now missing after jetty shading --- Key: SPARK-5557 URL: https://issues.apache.org/jira/browse/SPARK-5557 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Guoqiang Li Priority: Blocker the log: {noformat} 5/02/03 19:06:39 INFO spark.HttpServer: Starting HTTP Server Exception in thread main java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:75) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1774) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1765) at org.apache.spark.HttpServer.start(HttpServer.scala:62) at org.apache.spark.repl.SparkIMain.init(SparkIMain.scala:130) at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.init(SparkILoop.scala:185) at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:214) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:946) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1039) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:403) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: javax.servlet.http.HttpServletResponse at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 25 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5557) Servlet API classes now missing after jetty shading
[ https://issues.apache.org/jira/browse/SPARK-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5557. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Patrick Wendell Servlet API classes now missing after jetty shading --- Key: SPARK-5557 URL: https://issues.apache.org/jira/browse/SPARK-5557 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Guoqiang Li Assignee: Patrick Wendell Priority: Blocker Fix For: 1.3.0 the log: {noformat} 5/02/03 19:06:39 INFO spark.HttpServer: Starting HTTP Server Exception in thread main java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse at org.apache.spark.HttpServer.org$apache$spark$HttpServer$$doStart(HttpServer.scala:75) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.HttpServer$$anonfun$1.apply(HttpServer.scala:62) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1774) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1765) at org.apache.spark.HttpServer.start(HttpServer.scala:62) at org.apache.spark.repl.SparkIMain.init(SparkIMain.scala:130) at org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.init(SparkILoop.scala:185) at org.apache.spark.repl.SparkILoop.createInterpreter(SparkILoop.scala:214) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:946) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:942) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:942) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1039) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:403) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: javax.servlet.http.HttpServletResponse at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 25 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5626) Spurious test failures due to NullPointerException in EasyMock test code
[ https://issues.apache.org/jira/browse/SPARK-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308482#comment-14308482 ] Patrick Wendell commented on SPARK-5626: [~joshrosen] I may have caused this by merging SPARK-5607 Spurious test failures due to NullPointerException in EasyMock test code Key: SPARK-5626 URL: https://issues.apache.org/jira/browse/SPARK-5626 Project: Spark Issue Type: Bug Affects Versions: 1.3.0 Reporter: Josh Rosen Labels: flaky-test Attachments: consoleText.txt I've seen a few cases where a test failure will trigger a cascade of spurious failures when instantiating test suites that use EasyMock. Here's a sample symptom: {code} [info] CacheManagerSuite: [info] Exception encountered when attempting to run a suite with class name: org.apache.spark.CacheManagerSuite *** ABORTED *** (137 milliseconds) [info] java.lang.NullPointerException: [info] at org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52) [info] at org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90) [info] at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) [info] at org.objenesis.ObjenesisHelper.newInstance(ObjenesisHelper.java:43) [info] at org.easymock.internal.ObjenesisClassInstantiator.newInstance(ObjenesisClassInstantiator.java:26) [info] at org.easymock.internal.ClassProxyFactory.createProxy(ClassProxyFactory.java:219) [info] at org.easymock.internal.MocksControl.createMock(MocksControl.java:59) [info] at org.easymock.EasyMock.createMock(EasyMock.java:103) [info] at org.scalatest.mock.EasyMockSugar$class.mock(EasyMockSugar.scala:267) [info] at org.apache.spark.CacheManagerSuite.mock(CacheManagerSuite.scala:28) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply$mcV$sp(CacheManagerSuite.scala:40) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38) [info] at org.apache.spark.CacheManagerSuite$$anonfun$1.apply(CacheManagerSuite.scala:38) [info] at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:195) [info] at org.apache.spark.CacheManagerSuite.runTest(CacheManagerSuite.scala:28) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.apache.spark.CacheManagerSuite.org$scalatest$BeforeAndAfter$$super$run(CacheManagerSuite.scala:28) [info] at 
org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) [info] at org.apache.spark.CacheManagerSuite.run(CacheManagerSuite.scala:28) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [info] at java.lang.Thread.run(Thread.java:745) {code} This is from https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26852/consoleFull. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
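[Editor's note] For reference, the Mockito-based stubbing that suites like this were migrated toward looks roughly as follows. This is a minimal, self-contained sketch; {{KeyValueStore}} and the suite name are made up for illustration and are not Spark code:
{code}
import org.mockito.Mockito.{mock, when}
import org.scalatest.FunSuite

// Hypothetical service, used only to illustrate the API shape.
trait KeyValueStore {
  def get(key: String): Option[String]
}

class MockitoExampleSuite extends FunSuite {
  test("stubbing with Mockito instead of EasyMock") {
    // Mockito, like EasyMock, instantiates class proxies via objenesis,
    // so standardizing on one framework avoids mixed objenesis versions
    // of the kind implicated in the NullPointerException above.
    val store = mock(classOf[KeyValueStore])
    when(store.get("spark")).thenReturn(Some("1.3.0"))
    assert(store.get("spark") === Some("1.3.0"))
  }
}
{code}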
[jira] [Reopened] (SPARK-5607) NullPointerException in objenesis
[ https://issues.apache.org/jira/browse/SPARK-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-5607: NullPointerException in objenesis - Key: SPARK-5607 URL: https://issues.apache.org/jira/browse/SPARK-5607 Project: Spark Issue Type: Bug Reporter: Reynold Xin Assignee: Patrick Wendell Fix For: 1.3.0 Tests are sometimes failing with the following exception. The problem might be that Kryo is using a different version of objenesis from Mockito. {code} [info] - Process succeeds instantly *** FAILED *** (107 milliseconds) [info] java.lang.NullPointerException: [info] at org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52) [info] at org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90) [info] at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) [info] at org.mockito.internal.creation.jmock.ClassImposterizer.createProxy(ClassImposterizer.java:111) [info] at org.mockito.internal.creation.jmock.ClassImposterizer.imposterise(ClassImposterizer.java:51) [info] at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:52) [info] at org.mockito.internal.MockitoCore.mock(MockitoCore.java:41) [info] at org.mockito.Mockito.mock(Mockito.java:1014) [info] at org.mockito.Mockito.mock(Mockito.java:909) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply$mcV$sp(DriverRunnerTest.scala:50) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply(DriverRunnerTest.scala:47) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply(DriverRunnerTest.scala:47) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.scalatest.Suite$class.withFixture(Suite.scala:1122) [info] at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuite.run(FunSuite.scala:1555) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [info] at java.lang.Thread.run(Thread.java:745) {code} More information: Kryo depends on objenesis 1.2
[jira] [Commented] (SPARK-5607) NullPointerException in objenesis
[ https://issues.apache.org/jira/browse/SPARK-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308493#comment-14308493 ] Patrick Wendell commented on SPARK-5607: I've reverted my patch since it may have caused more harm than good. NullPointerException in objenesis - Key: SPARK-5607 URL: https://issues.apache.org/jira/browse/SPARK-5607 Project: Spark Issue Type: Bug Reporter: Reynold Xin Assignee: Patrick Wendell Fix For: 1.3.0 Tests are sometimes failing with the following exception. The problem might be that Kryo is using a different version of objenesis from Mockito. {code} [info] - Process succeeds instantly *** FAILED *** (107 milliseconds) [info] java.lang.NullPointerException: [info] at org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52) [info] at org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90) [info] at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) [info] at org.mockito.internal.creation.jmock.ClassImposterizer.createProxy(ClassImposterizer.java:111) [info] at org.mockito.internal.creation.jmock.ClassImposterizer.imposterise(ClassImposterizer.java:51) [info] at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:52) [info] at org.mockito.internal.MockitoCore.mock(MockitoCore.java:41) [info] at org.mockito.Mockito.mock(Mockito.java:1014) [info] at org.mockito.Mockito.mock(Mockito.java:909) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply$mcV$sp(DriverRunnerTest.scala:50) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply(DriverRunnerTest.scala:47) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply(DriverRunnerTest.scala:47) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.scalatest.Suite$class.withFixture(Suite.scala:1122) [info] at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at 
org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuite.run(FunSuite.scala:1555) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run
[jira] [Resolved] (SPARK-5474) curl should support URL redirection in build/mvn
[ https://issues.apache.org/jira/browse/SPARK-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5474. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Guoqiang Li curl should support URL redirection in build/mvn Key: SPARK-5474 URL: https://issues.apache.org/jira/browse/SPARK-5474 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 1.3.0 Reporter: Guoqiang Li Assignee: Guoqiang Li Fix For: 1.3.0 {{http://archive.apache.org/dist/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz}} sometimes return 3xx -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5594) SparkException: Failed to get broadcast (TorrentBroadcast)
[ https://issues.apache.org/jira/browse/SPARK-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5594: --- Priority: Critical (was: Major) SparkException: Failed to get broadcast (TorrentBroadcast) -- Key: SPARK-5594 URL: https://issues.apache.org/jira/browse/SPARK-5594 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: John Sandiford Priority: Critical I am uncertain whether this is a bug, however I am getting the error below when running on a cluster (works locally), and have no idea what is causing it, or where to look for more information. Any help is appreciated. Others appear to experience the same issue, but I have not found any solutions online. Please note that this only happens with certain code and is repeatable, all my other spark jobs work fine. ERROR TaskSetManager: Task 3 in stage 6.0 failed 4 times; aborting job Exception in thread main org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 6.0 failed 4 times, most recent failure: Lost task 3.3 in stage 6.0 (TID 24, removed): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of broadcast_6 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of broadcast_6 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1008) ... 
11 more Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420
[jira] [Commented] (SPARK-5594) SparkException: Failed to get broadcast (TorrentBroadcast)
[ https://issues.apache.org/jira/browse/SPARK-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305657#comment-14305657 ] Patrick Wendell commented on SPARK-5594: I've seen this occasionally in unit tests also. I think we need better exception logging in this code path to explain exactly why it is failing. SparkException: Failed to get broadcast (TorrentBroadcast) -- Key: SPARK-5594 URL: https://issues.apache.org/jira/browse/SPARK-5594 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: John Sandiford I am uncertain whether this is a bug, however I am getting the error below when running on a cluster (works locally), and have no idea what is causing it, or where to look for more information. Any help is appreciated. Others appear to experience the same issue, but I have not found any solutions online. Please note that this only happens with certain code and is repeatable, all my other spark jobs work fine. ERROR TaskSetManager: Task 3 in stage 6.0 failed 4 times; aborting job Exception in thread main org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 6.0 failed 4 times, most recent failure: Lost task 3.3 in stage 6.0 (TID 24, removed): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of broadcast_6 at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011) at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164) at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64) at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87) at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of broadcast_6 at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119) at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1008) ... 
11 more Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696
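[Editor's note] The "better exception logging" asked for above would amount to wrapping the low-level fetch failure with the identifiers needed to diagnose it. A rough sketch of the idea; the object and method names are invented, and this is not the actual TorrentBroadcast code:
{code}
import org.apache.spark.SparkException

object BroadcastFetchSketch {
  // Stand-in for the real piece fetch, which can fail for many reasons.
  private def fetchPiece(broadcastId: Long, pieceId: Int): Array[Byte] =
    throw new java.io.IOException("piece not found on any block manager")

  def readPieceWithContext(broadcastId: Long, pieceId: Int): Array[Byte] =
    try {
      fetchPiece(broadcastId, pieceId)
    } catch {
      case e: Exception =>
        // Surface which block failed and the likely causes, instead of
        // the bare "Failed to get broadcast_X" seen in the report above.
        throw new SparkException(
          s"Failed to get broadcast_${broadcastId}_piece$pieceId; the " +
          "executor holding it may have been lost, or the broadcast may " +
          "have been destroyed on the driver", e)
    }
}
{code}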
Re: 1.2.1-rc3 - Avro input format for Hadoop 2 broken/fix?
Hi Markus, That most likely won't be included in 1.2.1, because the release votes have already started, and at that point we don't hold the release except for major regression issues from 1.2.0. However, if this goes through we can backport it into the 1.2 branch and it will end up in a future maintenance release, or you can just build Spark from that branch as soon as it's in there. - Patrick On Wed, Feb 4, 2015 at 7:30 AM, M. Dale medal...@yahoo.com.invalid wrote: SPARK-3039 Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API was reopened and prevents v.1.2.1-rc3 from using the Avro input format for Hadoop 2 API/instances (it includes the hadoop1 avro-mapred library files). What are the chances of getting the fix outlined here (https://github.com/medale/spark/compare/apache:v1.2.1-rc3...avro-hadoop2-v1.2.1-rc2) included in 1.2.1? My apologies, I do not know how to generate a pull request against a tag version. I did add pull request https://github.com/apache/spark/pull/4315 for the current 1.3.0-SNAPSHOT master on this issue. Even though the 1.3.0 build already does not include avro-mapred in the Spark assembly jar, this minor change improves dependency convergence. Thanks, Markus - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: multi-line comment style
Personally I have no opinion, but agree it would be nice to standardize. - Patrick On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com wrote: One thing Marcelo pointed out to me is that the // style does not interfere with commenting out blocks of code with /* */, which is a small good thing. I am also accustomed to // style for multiline, and reserve /** */ for javadoc / scaladoc. Meaning, seeing the /* */ style inline always looks a little funny to me. On Wed, Feb 4, 2015 at 3:53 PM, Kay Ousterhout kayousterh...@gmail.com wrote: Hi all, The Spark Style Guide https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide says multi-line comments should formatted as: /* * This is a * very * long comment. */ But in my experience, we almost always use // for multi-line comments: // This is a // very // long comment. Here are some examples: - Recent commit by Reynold, king of style: https://github.com/apache/spark/commit/bebf4c42bef3e75d31ffce9bfdb331c16f34ddb1#diff-d616b5496d1a9f648864f4ab0db5a026R58 - RDD.scala: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L361 - DAGScheduler.scala: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L281 Any objections to me updating the style guide to reflect this? As with other style issues, I think consistency here is helpful (and formatting multi-line comments as // does nicely visually distinguish code comments from doc comments). -Kay - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
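[Editor's note] For readers outside the thread, the two styles being compared look like this (an illustrative snippet, not taken from the Spark codebase):
{code}
object CommentStyles {
  // This is a
  // very
  // long comment, in the // style the thread says dominates in practice.
  def parse(line: String): Array[String] = line.split(",")

  /*
   * The same comment in the block style the wiki currently prescribes.
   * As noted above, commenting out a region of code that already
   * contains a block comment is where this style can get awkward.
   */
  def size(fields: Array[String]): Int = fields.length
}
{code}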
[jira] [Updated] (SPARK-5586) Automatically provide sqlContext in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5586: --- Priority: Blocker (was: Critical) Automatically provide sqlContext in Spark shell --- Key: SPARK-5586 URL: https://issues.apache.org/jira/browse/SPARK-5586 Project: Spark Issue Type: Improvement Components: Spark Shell, SQL Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker A simple patch, but we should create a sqlContext (and, if supported by the build, a Hive context) in the Spark shell when it's created, and import the DSL. We can just call it sqlContext. This would save us so much time writing code examples :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5586) Automatically provide sqlContext in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5586: --- Assignee: (was: Patrick Wendell) Automatically provide sqlContext in Spark shell --- Key: SPARK-5586 URL: https://issues.apache.org/jira/browse/SPARK-5586 Project: Spark Issue Type: Improvement Components: Spark Shell, SQL Reporter: Patrick Wendell Priority: Blocker A simple patch, but we should create a sqlContext (and, if supported by the build, a Hive context) in the Spark shell when it's created, and import the DSL. We can just call it sqlContext. This would save us so much time writing code examples :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
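[Editor's note] What this change would save users from typing at the start of every shell session is essentially the following (a sketch against the 1.2/1.3-era API; the exact imports may differ in the final patch):
{code}
// Today, by hand, with `sc` already provided by the shell:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._  // bring the SQL DSL / implicit conversions into scope
{code}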
[jira] [Resolved] (SPARK-5411) Allow SparkListeners to be specified in SparkConf and loaded when creating SparkContext
[ https://issues.apache.org/jira/browse/SPARK-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5411. Resolution: Fixed Fix Version/s: 1.3.0 Allow SparkListeners to be specified in SparkConf and loaded when creating SparkContext --- Key: SPARK-5411 URL: https://issues.apache.org/jira/browse/SPARK-5411 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Josh Rosen Assignee: Josh Rosen Fix For: 1.3.0 It would be nice if there was a mechanism to allow SparkListeners to be registered through SparkConf settings. This would allow monitoring frameworks to be easily injected into Spark programs without having to modify those programs' code. I propose to introduce a new configuration option, {{spark.extraListeners}}, that allows SparkListeners to be specified in SparkConf and registered before the SparkContext is created. Here is the proposed documentation for the new option: {quote} A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called. If no valid constructor can be found, the SparkContext creation will fail with an exception. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
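[Editor's note] A sketch of a listener the new option could load, following the constructor rules quoted in the description above; the class name and what it counts are made up for illustration:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class TaskCountListener(conf: SparkConf) extends SparkListener {
  // Preferred by the loader: the single-SparkConf-argument constructor.
  private var tasksEnded = 0L

  // Fallback used when no SparkConf-argument constructor is present.
  def this() = this(new SparkConf)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasksEnded += 1
  }

  def count: Long = tasksEnded
}
{code}
It would then be registered with something like {{spark.extraListeners=com.example.TaskCountListener}} in the application's SparkConf or via {{--conf}} on spark-submit.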
[jira] [Resolved] (SPARK-5607) NullPointerException in objenesis
[ https://issues.apache.org/jira/browse/SPARK-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5607. Resolution: Fixed Fix Version/s: 1.3.0 Assignee: Patrick Wendell Target Version/s: 1.3.0 I've merged a patch attempting to fix this. Let's re-open this if we see it again NullPointerException in objenesis - Key: SPARK-5607 URL: https://issues.apache.org/jira/browse/SPARK-5607 Project: Spark Issue Type: Bug Reporter: Reynold Xin Assignee: Patrick Wendell Fix For: 1.3.0 Tests are sometimes failing with the following exception. The problem might be that Kryo is using a different version of objenesis from Mockito. {code} [info] - Process succeeds instantly *** FAILED *** (107 milliseconds) [info] java.lang.NullPointerException: [info] at org.objenesis.strategy.StdInstantiatorStrategy.newInstantiatorOf(StdInstantiatorStrategy.java:52) [info] at org.objenesis.ObjenesisBase.getInstantiatorOf(ObjenesisBase.java:90) [info] at org.objenesis.ObjenesisBase.newInstance(ObjenesisBase.java:73) [info] at org.mockito.internal.creation.jmock.ClassImposterizer.createProxy(ClassImposterizer.java:111) [info] at org.mockito.internal.creation.jmock.ClassImposterizer.imposterise(ClassImposterizer.java:51) [info] at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:52) [info] at org.mockito.internal.MockitoCore.mock(MockitoCore.java:41) [info] at org.mockito.Mockito.mock(Mockito.java:1014) [info] at org.mockito.Mockito.mock(Mockito.java:909) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply$mcV$sp(DriverRunnerTest.scala:50) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply(DriverRunnerTest.scala:47) [info] at org.apache.spark.deploy.worker.DriverRunnerTest$$anonfun$1.apply(DriverRunnerTest.scala:47) [info] at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) [info] at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) [info] at org.scalatest.Suite$class.withFixture(Suite.scala:1122) [info] at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) [info] at org.scalatest.FunSuite.runTest(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) [info] at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) [info] at scala.collection.immutable.List.foreach(List.scala:318) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) [info] at 
org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) [info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) [info] at org.scalatest.Suite$class.run(Suite.scala:1424) [info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:545) [info] at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) [info] at org.scalatest.FunSuite.run(FunSuite.scala:1555) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:294) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:284) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:262) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java
[jira] [Updated] (SPARK-5585) Flaky test: Python regression
[ https://issues.apache.org/jira/browse/SPARK-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5585: --- Labels: flaky-test (was: ) Flaky test: Python regression - Key: SPARK-5585 URL: https://issues.apache.org/jira/browse/SPARK-5585 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Davies Liu Priority: Critical Labels: flaky-test Hey [~davies] any chance you can take a look at this? The master build is having random python failures fairly often. Not quite sure what is going on: {code} 0inputs+128outputs (0major+13320minor)pagefaults 0swaps Run mllib tests ... Running test: pyspark/mllib/classification.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.43user 0.12system 0:14.85elapsed 3%CPU (0avgtext+0avgdata 94272maxresident)k 0inputs+280outputs (0major+12627minor)pagefaults 0swaps Running test: pyspark/mllib/clustering.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.35user 0.11system 0:12.63elapsed 3%CPU (0avgtext+0avgdata 93568maxresident)k 0inputs+88outputs (0major+12532minor)pagefaults 0swaps Running test: pyspark/mllib/feature.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.28user 0.08system 0:05.73elapsed 6%CPU (0avgtext+0avgdata 93424maxresident)k 0inputs+32outputs (0major+12548minor)pagefaults 0swaps Running test: pyspark/mllib/linalg.py 0.16user 0.05system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 89888maxresident)k 0inputs+0outputs (0major+8099minor)pagefaults 0swaps Running test: pyspark/mllib/rand.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.25user 0.08system 0:05.42elapsed 6%CPU (0avgtext+0avgdata 87872maxresident)k 0inputs+0outputs (0major+11849minor)pagefaults 0swaps Running test: pyspark/mllib/recommendation.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.32user 0.09system 0:11.42elapsed 3%CPU (0avgtext+0avgdata 94256maxresident)k 0inputs+32outputs (0major+11797minor)pagefaults 0swaps Running test: pyspark/mllib/regression.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.53user 0.17system 0:23.53elapsed 3%CPU (0avgtext+0avgdata 99600maxresident)k 0inputs+48outputs (0major+12402minor)pagefaults 0swaps Running test: pyspark/mllib/stat/_statistics.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.29user 0.09system 0:08.03elapsed 4%CPU (0avgtext+0avgdata 92656maxresident)k 0inputs+48outputs (0major+12508minor)pagefaults 0swaps Running test: pyspark/mllib/tree.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.57user 0.16system 0:25.30elapsed 2%CPU (0avgtext+0avgdata 94400maxresident)k 0inputs+144outputs (0major+12600minor)pagefaults 0swaps Running test: pyspark/mllib/util.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.20user 0.06system 0:08.08elapsed 3%CPU (0avgtext+0avgdata 92768maxresident)k 0inputs+56outputs 
(0major+12474minor)pagefaults 0swaps Running test: pyspark/mllib/tests.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath .F/usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) ./usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see
[jira] [Created] (SPARK-5585) Flaky test: Python regression
Patrick Wendell created SPARK-5585: -- Summary: Flaky test: Python regression Key: SPARK-5585 URL: https://issues.apache.org/jira/browse/SPARK-5585 Project: Spark Issue Type: Bug Components: MLlib Reporter: Patrick Wendell Assignee: Davies Liu Hey [~davies] any chance you can take a look at this? The master build is having random python failures fairly often. Not quite sure what is going on: {code} 0inputs+128outputs (0major+13320minor)pagefaults 0swaps Run mllib tests ... Running test: pyspark/mllib/classification.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.43user 0.12system 0:14.85elapsed 3%CPU (0avgtext+0avgdata 94272maxresident)k 0inputs+280outputs (0major+12627minor)pagefaults 0swaps Running test: pyspark/mllib/clustering.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.35user 0.11system 0:12.63elapsed 3%CPU (0avgtext+0avgdata 93568maxresident)k 0inputs+88outputs (0major+12532minor)pagefaults 0swaps Running test: pyspark/mllib/feature.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.28user 0.08system 0:05.73elapsed 6%CPU (0avgtext+0avgdata 93424maxresident)k 0inputs+32outputs (0major+12548minor)pagefaults 0swaps Running test: pyspark/mllib/linalg.py 0.16user 0.05system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 89888maxresident)k 0inputs+0outputs (0major+8099minor)pagefaults 0swaps Running test: pyspark/mllib/rand.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.25user 0.08system 0:05.42elapsed 6%CPU (0avgtext+0avgdata 87872maxresident)k 0inputs+0outputs (0major+11849minor)pagefaults 0swaps Running test: pyspark/mllib/recommendation.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.32user 0.09system 0:11.42elapsed 3%CPU (0avgtext+0avgdata 94256maxresident)k 0inputs+32outputs (0major+11797minor)pagefaults 0swaps Running test: pyspark/mllib/regression.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.53user 0.17system 0:23.53elapsed 3%CPU (0avgtext+0avgdata 99600maxresident)k 0inputs+48outputs (0major+12402minor)pagefaults 0swaps Running test: pyspark/mllib/stat/_statistics.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.29user 0.09system 0:08.03elapsed 4%CPU (0avgtext+0avgdata 92656maxresident)k 0inputs+48outputs (0major+12508minor)pagefaults 0swaps Running test: pyspark/mllib/tree.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.57user 0.16system 0:25.30elapsed 2%CPU (0avgtext+0avgdata 94400maxresident)k 0inputs+144outputs (0major+12600minor)pagefaults 0swaps Running test: pyspark/mllib/util.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.20user 0.06system 0:08.08elapsed 3%CPU (0avgtext+0avgdata 92768maxresident)k 0inputs+56outputs (0major+12474minor)pagefaults 0swaps Running test: pyspark/mllib/tests.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 
.F/usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) ./usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) ./usr/lib64/python2.6/site-packages/numpy/lib
[jira] [Updated] (SPARK-5585) Flaky test: Python regression
[ https://issues.apache.org/jira/browse/SPARK-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5585: --- Affects Version/s: 1.3.0 Flaky test: Python regression - Key: SPARK-5585 URL: https://issues.apache.org/jira/browse/SPARK-5585 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Davies Liu Priority: Critical Labels: flaky-test Hey [~davies] any chance you can take a look at this? The master build is having random python failures fairly often. Not quite sure what is going on: {code} 0inputs+128outputs (0major+13320minor)pagefaults 0swaps Run mllib tests ... Running test: pyspark/mllib/classification.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.43user 0.12system 0:14.85elapsed 3%CPU (0avgtext+0avgdata 94272maxresident)k 0inputs+280outputs (0major+12627minor)pagefaults 0swaps Running test: pyspark/mllib/clustering.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.35user 0.11system 0:12.63elapsed 3%CPU (0avgtext+0avgdata 93568maxresident)k 0inputs+88outputs (0major+12532minor)pagefaults 0swaps Running test: pyspark/mllib/feature.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.28user 0.08system 0:05.73elapsed 6%CPU (0avgtext+0avgdata 93424maxresident)k 0inputs+32outputs (0major+12548minor)pagefaults 0swaps Running test: pyspark/mllib/linalg.py 0.16user 0.05system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 89888maxresident)k 0inputs+0outputs (0major+8099minor)pagefaults 0swaps Running test: pyspark/mllib/rand.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.25user 0.08system 0:05.42elapsed 6%CPU (0avgtext+0avgdata 87872maxresident)k 0inputs+0outputs (0major+11849minor)pagefaults 0swaps Running test: pyspark/mllib/recommendation.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.32user 0.09system 0:11.42elapsed 3%CPU (0avgtext+0avgdata 94256maxresident)k 0inputs+32outputs (0major+11797minor)pagefaults 0swaps Running test: pyspark/mllib/regression.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.53user 0.17system 0:23.53elapsed 3%CPU (0avgtext+0avgdata 99600maxresident)k 0inputs+48outputs (0major+12402minor)pagefaults 0swaps Running test: pyspark/mllib/stat/_statistics.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.29user 0.09system 0:08.03elapsed 4%CPU (0avgtext+0avgdata 92656maxresident)k 0inputs+48outputs (0major+12508minor)pagefaults 0swaps Running test: pyspark/mllib/tree.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.57user 0.16system 0:25.30elapsed 2%CPU (0avgtext+0avgdata 94400maxresident)k 0inputs+144outputs (0major+12600minor)pagefaults 0swaps Running test: pyspark/mllib/util.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.20user 0.06system 0:08.08elapsed 3%CPU (0avgtext+0avgdata 92768maxresident)k 0inputs+56outputs (0major+12474minor)pagefaults 
0swaps Running test: pyspark/mllib/tests.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath .F/usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) ./usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see
[jira] [Updated] (SPARK-5585) Flaky test: Python regression
[ https://issues.apache.org/jira/browse/SPARK-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5585: --- Priority: Critical (was: Major) Flaky test: Python regression - Key: SPARK-5585 URL: https://issues.apache.org/jira/browse/SPARK-5585 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Davies Liu Priority: Critical Labels: flaky-test Hey [~davies] any chance you can take a look at this? The master build is having random python failures fairly often. Not quite sure what is going on: {code} 0inputs+128outputs (0major+13320minor)pagefaults 0swaps Run mllib tests ... Running test: pyspark/mllib/classification.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.43user 0.12system 0:14.85elapsed 3%CPU (0avgtext+0avgdata 94272maxresident)k 0inputs+280outputs (0major+12627minor)pagefaults 0swaps Running test: pyspark/mllib/clustering.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.35user 0.11system 0:12.63elapsed 3%CPU (0avgtext+0avgdata 93568maxresident)k 0inputs+88outputs (0major+12532minor)pagefaults 0swaps Running test: pyspark/mllib/feature.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.28user 0.08system 0:05.73elapsed 6%CPU (0avgtext+0avgdata 93424maxresident)k 0inputs+32outputs (0major+12548minor)pagefaults 0swaps Running test: pyspark/mllib/linalg.py 0.16user 0.05system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 89888maxresident)k 0inputs+0outputs (0major+8099minor)pagefaults 0swaps Running test: pyspark/mllib/rand.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.25user 0.08system 0:05.42elapsed 6%CPU (0avgtext+0avgdata 87872maxresident)k 0inputs+0outputs (0major+11849minor)pagefaults 0swaps Running test: pyspark/mllib/recommendation.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.32user 0.09system 0:11.42elapsed 3%CPU (0avgtext+0avgdata 94256maxresident)k 0inputs+32outputs (0major+11797minor)pagefaults 0swaps Running test: pyspark/mllib/regression.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.53user 0.17system 0:23.53elapsed 3%CPU (0avgtext+0avgdata 99600maxresident)k 0inputs+48outputs (0major+12402minor)pagefaults 0swaps Running test: pyspark/mllib/stat/_statistics.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.29user 0.09system 0:08.03elapsed 4%CPU (0avgtext+0avgdata 92656maxresident)k 0inputs+48outputs (0major+12508minor)pagefaults 0swaps Running test: pyspark/mllib/tree.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.57user 0.16system 0:25.30elapsed 2%CPU (0avgtext+0avgdata 94400maxresident)k 0inputs+144outputs (0major+12600minor)pagefaults 0swaps Running test: pyspark/mllib/util.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath 0.20user 0.06system 0:08.08elapsed 3%CPU (0avgtext+0avgdata 92768maxresident)k 0inputs+56outputs 
(0major+12474minor)pagefaults 0swaps Running test: pyspark/mllib/tests.py tput: No value for $TERM and no -T specified Spark assembly has been built with Hive, including Datanucleus jars on classpath .F/usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) ./usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`. VisibleDeprecationWarning) /usr/lib64/python2.6/site-packages/numpy/core/fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see
[jira] [Updated] (SPARK-5341) Support maven coordinates in spark-shell and spark-submit
[ https://issues.apache.org/jira/browse/SPARK-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5341: --- Assignee: Burak Yavuz Support maven coordinates in spark-shell and spark-submit - Key: SPARK-5341 URL: https://issues.apache.org/jira/browse/SPARK-5341 Project: Spark Issue Type: New Feature Components: Deploy, Spark Shell Reporter: Burak Yavuz Assignee: Burak Yavuz Priority: Critical Fix For: 1.3.0 This feature will allow users to provide the maven coordinates of jars they wish to use in their spark application. Coordinates can be a comma-delimited list and be supplied like: ```spark-submit --maven org.apache.example.a,org.apache.example.b``` This feature will also be added to spark-shell (where it is more critical to have this feature) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5341) Support maven coordinates in spark-shell and spark-submit
[ https://issues.apache.org/jira/browse/SPARK-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5341. Resolution: Fixed Fix Version/s: 1.3.0 Support maven coordinates in spark-shell and spark-submit - Key: SPARK-5341 URL: https://issues.apache.org/jira/browse/SPARK-5341 Project: Spark Issue Type: New Feature Components: Deploy, Spark Shell Reporter: Burak Yavuz Assignee: Burak Yavuz Priority: Critical Fix For: 1.3.0 This feature will allow users to provide the maven coordinates of jars they wish to use in their spark application. Coordinates can be a comma-delimited list and be supplied like: ```spark-submit --maven org.apache.example.a,org.apache.example.b``` This feature will also be added to spark-shell (where it is more critical to have this feature) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5586) Automatically provide sqlContext in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5586: --- Priority: Critical (was: Major) Automatically provide sqlContext in Spark shell --- Key: SPARK-5586 URL: https://issues.apache.org/jira/browse/SPARK-5586 Project: Spark Issue Type: Improvement Components: Spark Shell, SQL Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Critical A simple patch, but we should create a sqlContext (and, if supported by the build, a Hive context) in the Spark shell when it's created, and import the DSL. We can just call it sqlContext. This would save us so much time writing code examples :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
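For illustration, a minimal sketch of what the proposed shell initialization could look like, assuming sc is the shell's already-created SparkContext (the actual patch wires this into the REPL startup, and the exact imports are a design choice):
{code}
import org.apache.spark.sql.SQLContext

// Sketch only: create the context once at shell startup.
val sqlContext = new SQLContext(sc)
// If the build has Hive support, a HiveContext could be substituted:
// val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext._ // pull the SQL DSL and implicit conversions into scope
{code}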
[jira] [Created] (SPARK-5586) Automatically provide sqlContext in Spark shell
Patrick Wendell created SPARK-5586: -- Summary: Automatically provide sqlContext in Spark shell Key: SPARK-5586 URL: https://issues.apache.org/jira/browse/SPARK-5586 Project: Spark Issue Type: Improvement Components: Spark Shell, SQL Reporter: Patrick Wendell Assignee: Patrick Wendell Fix For: 1.3.0 A simple patch, but we should create a sqlContext (and, if supported by the build, a Hive context) in the Spark shell when it's created, and import the DSL. We can just call it sqlContext. This would save us so much time writing code examples :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5586) Automatically provide sqlContext in Spark shell
[ https://issues.apache.org/jira/browse/SPARK-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5586: --- Fix Version/s: (was: 1.3.0) Automatically provide sqlContext in Spark shell --- Key: SPARK-5586 URL: https://issues.apache.org/jira/browse/SPARK-5586 Project: Spark Issue Type: Improvement Components: Spark Shell, SQL Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Critical A simple patch, but we should create a sqlContext (and, if supported by the build, a Hive context) in the Spark shell when it's created, and import the DSL. We can just call it sqlContext. This would save us so much time writing code examples :P -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5140) Two RDDs which are scheduled concurrently should be able to wait on parent in all cases
[ https://issues.apache.org/jira/browse/SPARK-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5140: --- Fix Version/s: (was: 1.2.1) (was: 1.3.0) Two RDDs which are scheduled concurrently should be able to wait on parent in all cases --- Key: SPARK-5140 URL: https://issues.apache.org/jira/browse/SPARK-5140 Project: Spark Issue Type: New Feature Reporter: Corey J. Nolet Labels: features Not sure if this would change too much of the internals to be included in 1.2.1, but it would be very helpful if it could be. This ticket is from a discussion between myself and [~ilikerps]. Here's the result of some testing that [~ilikerps] did: bq. I did some testing as well, and it turns out the "wait for the other guy to finish caching" logic is on a per-task basis, and it only works on tasks that happen to be executing on the same machine. bq. Once a partition is cached, we will schedule tasks that touch that partition on that executor. The problem here, though, is that the cache is in progress, and so the tasks are still scheduled randomly (or with whatever locality the data source has), so tasks which end up on different machines will not see that the cache is already in progress. Here was my test, by the way:
{code}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent._
import scala.concurrent.duration._

val rdd = sc.parallelize(0 until 8).map(i => { Thread.sleep(10000); i }).cache()
val futures = (0 until 4).map { _ => Future { rdd.count } }
Await.result(Future.sequence(futures), 120.second)
{code}
bq. Note that I run the future 4 times in parallel. I found that the first run has all tasks take 10 seconds. The second has about 50% of its tasks take 10 seconds, and the rest just wait for the first stage to finish. The last two runs have no tasks that take 10 seconds; all wait for the first two stages to finish. What we want is the ability to fire off a job and have the DAG figure out that two RDDs depend on the same parent so that when the children are scheduled concurrently, the first one to start will activate the parent and both will wait on the parent. When the parent is done, they will both be able to finish their work concurrently. We are trying to use this pattern by having the parent cache results. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5569) Checkpoints cannot reference classes defined outside of Spark's assembly
Patrick Wendell created SPARK-5569: -- Summary: Checkpoints cannot reference classes defined outside of Spark's assembly Key: SPARK-5569 URL: https://issues.apache.org/jira/browse/SPARK-5569 Project: Spark Issue Type: Bug Components: Streaming Reporter: Patrick Wendell Not sure if this is a bug or a feature, but it's not obvious, so wanted to create a JIRA to make sure we document this behavior. First documented by Cody Koeninger: https://gist.github.com/koeninger/561a61482cd1b5b3600c {code} 15/01/12 16:07:07 INFO CheckpointReader: Attempting to load checkpoint from file file:/var/tmp/cp/checkpoint-142110041.bk 15/01/12 16:07:07 WARN CheckpointReader: Error reading checkpoint from file file:/var/tmp/cp/checkpoint-142110041.bk java.io.IOException: java.lang.ClassNotFoundException: org.apache.spark.rdd.kafka.KafkaRDDPartition at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1043) at org.apache.spark.streaming.dstream.DStreamCheckpointData.readObject(DStreamCheckpointData.scala:146) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500) at org.apache.spark.streaming.DStreamGraph$$anonfun$readObject$1.apply$mcV$sp(DStreamGraph.scala:180) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1040) at org.apache.spark.streaming.DStreamGraph.readObject(DStreamGraph.scala:176) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) at org.apache.spark.streaming.CheckpointReader$$anonfun$read$2.apply(Checkpoint.scala:251) at org.apache.spark.streaming.CheckpointReader$$anonfun$read$2.apply(Checkpoint.scala:239) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) at org.apache.spark.streaming.CheckpointReader$.read(Checkpoint.scala:239) at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:552) at example.CheckpointedExample$.main(CheckpointedExample.scala:34) at example.CheckpointedExample.main(CheckpointedExample.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57
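For context, a minimal sketch of the recovery path that hits this error; checkpointDir and createContext are placeholders, and the key point is that getOrCreate deserializes the checkpointed DStream graph with the driver's classloader:
{code}
import org.apache.spark.streaming.StreamingContext

val checkpointDir = "file:/var/tmp/cp" // matches the directory in the log above
def createContext(): StreamingContext = ??? // placeholder: builds a fresh context on first run

// Every class referenced by the checkpointed graph (e.g. the custom
// KafkaRDDPartition in the stack trace) must be on the assembly/system
// classpath for this deserialization to succeed.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
{code}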
[ANNOUNCE] branch-1.3 has been cut
Hey All, Just wanted to announce that we've cut the 1.3 branch which will become the 1.3 release after community testing. There are still some features that will go in (in higher level libraries, and some stragglers in spark core), but overall this indicates the end of major feature development for Spark 1.3 and a transition into testing. Within a few days I'll cut a snapshot package release for this so that people can begin testing. https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-1.3 - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302999#comment-14302999 ] Patrick Wendell commented on SPARK-4550: The doc alludes to having to (at some point) deal with comparing serialized objects. In the future one approach would be to restrict this only to SchemaRDDs, where we can have more control over the serialized format. This is effectively what Flink and other systems do (they basically only have SchemaRDDs). In sort-based shuffle, store map outputs in serialized form --- Key: SPARK-4550 URL: https://issues.apache.org/jira/browse/SPARK-4550 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza Priority: Critical Attachments: SPARK-4550-design-v1.pdf One drawback with sort-based shuffle compared to hash-based shuffle is that it ends up storing many more java objects in memory. If Spark could store map outputs in serialized form, it could * spill less often because the serialized form is more compact * reduce GC pressure This will only work when the serialized representations of objects are independent from each other and occupy contiguous segments of memory. E.g. when Kryo reference tracking is left on, objects may contain pointers to objects farther back in the stream, which means that the sort can't relocate objects without corrupting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
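As a concrete illustration of the reference-tracking constraint mentioned in the description, this is how an application disables it today; both keys are existing Spark settings, though whether the serialized-shuffle work would require this is exactly the open question above:
{code}
import org.apache.spark.SparkConf

// With reference tracking off, Kryo never emits back-references to earlier
// objects in the stream, so each record's bytes are self-contained and a
// sort can relocate them without corruption.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.referenceTracking", "false")
{code}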
[jira] [Commented] (SPARK-5388) Provide a stable application submission gateway in standalone cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304010#comment-14304010 ] Patrick Wendell commented on SPARK-5388: The intention for this is really just to take the single RPC that was using Akka and add a stable version of it that we are okay supporting long term. It doesn't preclude moving to Avro or some other RPC as a general thing we use across all of Spark. However, that design choice was intentionally excluded from this decision given all the complexities you bring up. As for doing some basic message dispatching on our own: there is only a small amount of very straightforward code involved. Adopting Avro would be overkill for this. In the current implementation the client and server exchange Spark versions, so this is the basis of reasoning about version changes - maybe it wasn't in the design doc. In terms of evolvability, the way you do this is that you only add new functionality over time, and you never remove fields from messages. This is similar to the API contract of the history logs with the history server. So the idea is that newer clients would implement a superset of messages and fields as older ones. Adding v1 seems like a good idea in case this evolves into something public or more well specified over time. It would just be good to define precisely what it means to advance that version identifier. That all matters a lot more if we want it to be something others interact with. Provide a stable application submission gateway in standalone cluster mode -- Key: SPARK-5388 URL: https://issues.apache.org/jira/browse/SPARK-5388 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Blocker Attachments: Stable Spark Standalone Submission.pdf The existing submission gateway in standalone mode is not compatible across Spark versions. If you have a newer version of Spark submitting to an older version of the standalone Master, it is currently not guaranteed to work. The goal is to provide a stable REST interface to replace this channel. The first cut implementation will target standalone cluster mode because there are very few messages exchanged. The design, however, should be general enough to potentially support this for other cluster managers too. Note that this is not necessarily required in YARN because we already use YARN's stable interface to submit applications there. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
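A minimal sketch of the additive evolution rule described in the comment; these are not the actual protocol classes, and the field names are invented:
{code}
// Fields are only ever added, never removed, and every post-v1 field is
// optional so a newer server still accepts an older client's message.
case class CreateSubmissionRequest(
  clientSparkVersion: String,             // exchanged so each side can reason about versions
  appResource: String,                    // present since v1
  mainClass: String,                      // present since v1
  addedLaterField: Option[String] = None  // hypothetical later addition; absent in v1 payloads
)
{code}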
Re: Spark Master Maven with YARN build is broken
It's my fault, I'm sending a hot fix now. On Mon, Feb 2, 2015 at 1:44 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ Is this is a known issue? It seems to have been broken since last night. Here's a snippet from the build output of one of the builds https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/1308/console : [error] bad symbolic reference. A signature in WebUI.class refers to term eclipse [error] in package org which is not available. [error] It may be completely missing from the current classpath, or the version on [error] the classpath might be incompatible with the version used when compiling WebUI.class. [error] bad symbolic reference. A signature in WebUI.class refers to term jetty [error] in value org.eclipse which is not available. [error] It may be completely missing from the current classpath, or the version on [error] the classpath might be incompatible with the version used when compiling WebUI.class. [error] [error] while compiling: /home/jenkins/workspace/Spark-Master-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala [error] during phase: erasure [error] library version: version 2.10.4 [error] compiler version: version 2.10.4 Nick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-3778) newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn
[ https://issues.apache.org/jira/browse/SPARK-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302290#comment-14302290 ] Patrick Wendell commented on SPARK-3778: /cc [~hshreedharan] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn - Key: SPARK-3778 URL: https://issues.apache.org/jira/browse/SPARK-3778 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Thomas Graves Assignee: Thomas Graves Priority: Blocker The newAPIHadoopRDD routine doesn't properly add the credentials to the conf to be able to access secure hdfs. Note that newAPIHadoopFile does handle these because the org.apache.hadoop.mapreduce.Job automatically adds it for you. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3778) newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn
[ https://issues.apache.org/jira/browse/SPARK-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3778: --- Priority: Blocker (was: Critical) newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn - Key: SPARK-3778 URL: https://issues.apache.org/jira/browse/SPARK-3778 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.1.0 Reporter: Thomas Graves Assignee: Thomas Graves Priority: Blocker The newAPIHadoopRDD routine doesn't properly add the credentials to the conf to be able to access secure hdfs. Note that newAPIHadoopFile does handle these because the org.apache.hadoop.mapreduce.Job automatically adds it for you. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4550: --- Target Version/s: 1.4.0 In sort-based shuffle, store map outputs in serialized form --- Key: SPARK-4550 URL: https://issues.apache.org/jira/browse/SPARK-4550 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza Attachments: SPARK-4550-design-v1.pdf One drawback with sort-based shuffle compared to hash-based shuffle is that it ends up storing many more java objects in memory. If Spark could store map outputs in serialized form, it could * spill less often because the serialized form is more compact * reduce GC pressure This will only work when the serialized representations of objects are independent from each other and occupy contiguous segments of memory. E.g. when Kryo reference tracking is left on, objects may contain pointers to objects farther back in the stream, which means that the sort can't relocate objects without corrupting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4550: --- Priority: Critical (was: Major) In sort-based shuffle, store map outputs in serialized form --- Key: SPARK-4550 URL: https://issues.apache.org/jira/browse/SPARK-4550 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza Priority: Critical Attachments: SPARK-4550-design-v1.pdf One drawback with sort-based shuffle compared to hash-based shuffle is that it ends up storing many more java objects in memory. If Spark could store map outputs in serialized form, it could * spill less often because the serialized form is more compact * reduce GC pressure This will only work when the serialized representations of objects are independent from each other and occupy contiguous segments of memory. E.g. when Kryo reference tracking is left on, objects may contain pointers to objects farther back in the stream, which means that the sort can't relocate objects without corrupting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5195) When a Hive table is queried with an alias, the cached data loses effectiveness
[ https://issues.apache.org/jira/browse/SPARK-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5195: --- Fix Version/s: (was: 1.2.1) When a Hive table is queried with an alias, the cached data loses effectiveness Key: SPARK-5195 URL: https://issues.apache.org/jira/browse/SPARK-5195 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0 Reporter: yixiaohua Fix For: 1.3.0 Override MetastoreRelation's sameResult method to compare only the database name and table name. Previously, after running "cache table t1", the query "select count(*) from t1" reads data from memory, but "select count(*) from t1 t" does not; it reads from HDFS instead. Cached data is keyed by the logical plan and matched with sameResult, so when the table is queried with an alias its logical plan is not the same as the plan without the alias. The fix is therefore to make sameResult compare only the database name and table name. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
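The symptom is easy to reproduce from a shell session; a sketch, where sqlContext stands for whichever SQLContext/HiveContext the application uses:
{code}
sqlContext.sql("CACHE TABLE t1")
sqlContext.sql("SELECT COUNT(*) FROM t1")   // answered from the in-memory cache
sqlContext.sql("SELECT COUNT(*) FROM t1 t") // the alias yields a different logical
                                            // plan, sameResult misses, and the
                                            // query scans HDFS instead
{code}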
[jira] [Updated] (SPARK-4508) Native Date type for SQL92 Date
[ https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4508: --- Fix Version/s: (was: 1.3.0) Native Date type for SQL92 Date --- Key: SPARK-4508 URL: https://issues.apache.org/jira/browse/SPARK-4508 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Adrian Wang Assignee: Adrian Wang Store daysSinceEpoch as an Int(4 bytes), instead of using java.sql.Date(8 bytes as Long) in catalyst row. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
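A sketch of the representation change this sub-task describes, ignoring the timezone handling a real implementation must get right; the function names are illustrative:
{code}
import java.sql.Date
import java.util.concurrent.TimeUnit

// A 4-byte Int of days since the Unix epoch replaces the 8-byte millisecond
// Long inside java.sql.Date.
def toDaysSinceEpoch(d: Date): Int =
  TimeUnit.MILLISECONDS.toDays(d.getTime).toInt

def fromDaysSinceEpoch(days: Int): Date =
  new Date(TimeUnit.DAYS.toMillis(days.toLong))
{code}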
[jira] [Created] (SPARK-5541) Allow running Maven or SBT in the Spark build
Patrick Wendell created SPARK-5541: -- Summary: Allow running Maven or SBT in the Spark build Key: SPARK-5541 URL: https://issues.apache.org/jira/browse/SPARK-5541 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Nicholas Chammas It would be nice if we had a hook for the spark test scripts to run with Maven in addition to running with SBT. Right now it is difficult for us to test pull requests in maven and we get master build breaks because of it. A simple first step is to modify run-tests to allow building with maven. Then we can add a second PRB that invokes this maven build. I would just add an env var called SPARK_BUILD_TOOL that can be set to sbt or mvn. And make sure the associated logic works in either case. If we don't want to have the fancy SQL only stuff in Maven, that's fine too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
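Expressed in Scala purely for illustration (run-tests itself is a shell script, so the real change would be bash), the proposed SPARK_BUILD_TOOL dispatch might look like:
{code}
// Hypothetical sketch; the command names are illustrative, not the actual
// contents of run-tests.
val buildTool = sys.env.getOrElse("SPARK_BUILD_TOOL", "sbt")
val buildCommand = buildTool match {
  case "mvn" => Seq("mvn", "-DskipTests", "package")
  case _     => Seq("sbt/sbt", "package")
}
{code}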
[jira] [Updated] (SPARK-5541) Allow running Maven or SBT in run-tests
[ https://issues.apache.org/jira/browse/SPARK-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5541: --- Summary: Allow running Maven or SBT in run-tests (was: Allow running Maven or SBT in the Spark build) Allow running Maven or SBT in run-tests --- Key: SPARK-5541 URL: https://issues.apache.org/jira/browse/SPARK-5541 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Nicholas Chammas It would be nice if we had a hook for the spark test scripts to run with Maven in addition to running with SBT. Right now it is difficult for us to test pull requests in maven and we get master build breaks because of it. A simple first step is to modify run-tests to allow building with maven. Then we can add a second PRB that invokes this maven build. I would just add an env var called SPARK_BUILD_TOOL that can be set to sbt or mvn. And make sure the associated logic works in either case. If we don't want to have the fancy SQL only stuff in Maven, that's fine too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-4508) Native Date type for SQL92 Date
[ https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-4508: This has caused several date-related test failures in the master and pull request builds, so I'm reverting it: https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26560/testReport/org.apache.spark.sql/ScalaReflectionRelationSuite/query_case_class_RDD/ Native Date type for SQL92 Date --- Key: SPARK-4508 URL: https://issues.apache.org/jira/browse/SPARK-4508 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Adrian Wang Assignee: Adrian Wang Fix For: 1.3.0 Store daysSinceEpoch as an Int(4 bytes), instead of using java.sql.Date(8 bytes as Long) in catalyst row. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302326#comment-14302326 ] Patrick Wendell commented on SPARK-4550: Yeah, this is a good idea. I don't see why we don't serialize these immediately. In sort-based shuffle, store map outputs in serialized form --- Key: SPARK-4550 URL: https://issues.apache.org/jira/browse/SPARK-4550 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 1.2.0 Reporter: Sandy Ryza Priority: Critical Attachments: SPARK-4550-design-v1.pdf One drawback with sort-based shuffle compared to hash-based shuffle is that it ends up storing many more java objects in memory. If Spark could store map outputs in serialized form, it could * spill less often because the serialized form is more compact * reduce GC pressure This will only work when the serialized representations of objects are independent from each other and occupy contiguous segments of memory. E.g. when Kryo reference tracking is left on, objects may contain pointers to objects farther back in the stream, which means that the sort can't relocate objects without corrupting them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5542) Decouple publishing, packaging, and tagging in release script
Patrick Wendell created SPARK-5542: -- Summary: Decouple publishing, packaging, and tagging in release script Key: SPARK-5542 URL: https://issues.apache.org/jira/browse/SPARK-5542 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Our release script should make it easy to do these separately. I.e. it should be possible to publish a release from a tag that we already cut. This would help with things such as publishing nightly releases (SPARK-1517). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Temporary jenkins issue
Hey All, I made a change to the Jenkins configuration that caused most builds to fail (attempting to enable a new plugin), I've reverted the change effective about 10 minutes ago. If you've seen recent build failures like below, this was caused by that change. Sorry about that. ERROR: Publisher com.google.jenkins.flakyTestHandler.plugin.JUnitFlakyResultArchiver aborted due to exception java.lang.NoSuchMethodError: hudson.model.AbstractBuild.getTestResultAction()Lhudson/tasks/test/AbstractTestResultAction; at com.google.jenkins.flakyTestHandler.plugin.FlakyTestResultAction.init(FlakyTestResultAction.java:78) at com.google.jenkins.flakyTestHandler.plugin.JUnitFlakyResultArchiver.perform(JUnitFlakyResultArchiver.java:89) at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:770) at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:734) at hudson.model.Build$BuildExecution.post2(Build.java:183) at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:683) at hudson.model.Run.execute(Run.java:1784) at hudson.matrix.MatrixRun.run(MatrixRun.java:146) at hudson.model.ResourceController.execute(ResourceController.java:89) at hudson.model.Executor.run(Executor.java:240) - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Resolved] (SPARK-5542) Decouple publishing, packaging, and tagging in release script
[ https://issues.apache.org/jira/browse/SPARK-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5542. Resolution: Fixed Fix Version/s: 1.3.0 Decouple publishing, packaging, and tagging in release script - Key: SPARK-5542 URL: https://issues.apache.org/jira/browse/SPARK-5542 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell Fix For: 1.3.0 Our release script should make it easy to do these separately. I.e. it should be possible to publish a release from a tag that we already cut. This would help with things such as publishing nightly releases (SPARK-1517). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5548) Flaky test: org.apache.spark.util.AkkaUtilsSuite.remote fetch ssl on - untrusted server
Patrick Wendell created SPARK-5548: -- Summary: Flaky test: org.apache.spark.util.AkkaUtilsSuite.remote fetch ssl on - untrusted server Key: SPARK-5548 URL: https://issues.apache.org/jira/browse/SPARK-5548 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0 Reporter: Patrick Wendell Assignee: Jacek Lewandowski {code} sbt.ForkMain$ForkError: Expected exception java.util.concurrent.TimeoutException to be thrown, but akka.actor.ActorNotFound was thrown. at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:496) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) at org.scalatest.Assertions$class.intercept(Assertions.scala:1004) at org.scalatest.FunSuite.intercept(FunSuite.scala:1555) at org.apache.spark.util.AkkaUtilsSuite$$anonfun$8.apply$mcV$sp(AkkaUtilsSuite.scala:373) at org.apache.spark.util.AkkaUtilsSuite$$anonfun$8.apply(AkkaUtilsSuite.scala:349) at org.apache.spark.util.AkkaUtilsSuite$$anonfun$8.apply(AkkaUtilsSuite.scala:349) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.util.AkkaUtilsSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(AkkaUtilsSuite.scala:37) at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) at org.apache.spark.util.AkkaUtilsSuite.runTest(AkkaUtilsSuite.scala:37) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.util.AkkaUtilsSuite.org$scalatest$BeforeAndAfterAll$$super$run(AkkaUtilsSuite.scala:37) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at 
org.apache.spark.util.AkkaUtilsSuite.run(AkkaUtilsSuite.scala:37) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at sbt.ForkMain$Run$2.call(ForkMain.java:294) at sbt.ForkMain$Run$2.call(ForkMain.java:284) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: sbt.ForkMain$ForkError: Actor not found for: ActorSelection[Anchor(akka.ssl.tcp://spark@localhost:41417/), Path(/user/MapOutputTracker)] at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65
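For readers unfamiliar with the assertion involved, a minimal standalone example of the ScalaTest pattern this test relies on; the test body is a placeholder:
{code}
import java.util.concurrent.TimeoutException
import org.scalatest.FunSuite

class InterceptExampleSuite extends FunSuite {
  test("remote fetch should time out") {
    // intercept passes only if exactly this exception type (or a subtype) is
    // thrown; any other exception (such as akka.actor.ActorNotFound above)
    // fails the test, which is how the flakiness surfaces.
    intercept[TimeoutException] {
      throw new TimeoutException("placeholder for the untrusted-server fetch")
    }
  }
}
{code}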
[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC2)
This is cancelled in favor of RC3. On Mon, Feb 2, 2015 at 8:50 PM, Patrick Wendell pwend...@gmail.com wrote: The windows issue reported only affects actually running Spark on Windows (not job submission). However, I agree it's worth cutting a new RC. I'm going to cancel this vote and propose RC3 with a single additional patch. Let's try to vote that through so we can ship Spark 1.2.1. - Patrick On Sat, Jan 31, 2015 at 7:36 PM, Matei Zaharia matei.zaha...@gmail.com wrote: This looks like a pretty serious problem, thanks! Glad people are testing on Windows. Matei On Jan 31, 2015, at 11:57 AM, MartinWeindel martin.wein...@gmail.com wrote: FYI: Spark 1.2.1rc2 does not work on Windows! On creating a Spark context you get the following log output on my Windows machine: INFO org.apache.spark.SparkEnv:59 - Registering BlockManagerMaster ERROR org.apache.spark.util.Utils:75 - Failed to create local root dir in C:\Users\mweindel\AppData\Local\Temp\. Ignoring this directory. ERROR org.apache.spark.storage.DiskBlockManager:75 - Failed to create any local dir. I have already located the cause. A newly added function chmod700() in org.apache.spark.util.Utils uses functionality which only works on a Unix file system. See also pull request [https://github.com/apache/spark/pull/4299] for my suggestion on how to resolve the issue. Best regards, Martin Weindel -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-2-1-RC2-tp10317p10370.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org