[jira] [Updated] (SPARK-7820) Java8-tests suite compile error under SBT
[ https://issues.apache.org/jira/browse/SPARK-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7820: Priority: Critical (was: Blocker)
Java8-tests suite compile error under SBT
Key: SPARK-7820 URL: https://issues.apache.org/jira/browse/SPARK-7820 Project: Spark Issue Type: Bug Components: Build, Streaming Affects Versions: 1.4.0 Reporter: Saisai Shao Priority: Critical
Many compilation errors are shown when the Java 8 test suite is enabled in SBT:
{{JAVA_HOME=/usr/java/jdk1.8.0_45 ./sbt/sbt -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Pjava8-tests}}
{code}
[error] /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:43: error: cannot find symbol
[error] public class Java8APISuite extends LocalJavaStreamingContext implements Serializable {
[error] ^
[error] symbol: class LocalJavaStreamingContext
[error] /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55: error: cannot find symbol
[error] JavaDStream<String> stream = JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
[error] ^
[error] symbol: variable ssc
[error] location: class Java8APISuite
[error] /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55: error: cannot find symbol
[error] JavaDStream<String> stream = JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
[error] ^
[error] symbol: variable JavaTestUtils
[error] location: class Java8APISuite
[error] /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:57: error: cannot find symbol
[error] JavaTestUtils.attachTestOutputStream(letterCount);
[error] ^
[error] symbol: variable JavaTestUtils
[error] location: class Java8APISuite
[error] /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58: error: cannot find symbol
[error] List<List<Integer>> result = JavaTestUtils.runStreams(ssc, 2, 2);
[error] ^
[error] symbol: variable ssc
[error] location: class Java8APISuite
[error] /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58: error: cannot find symbol
[error] List<List<Integer>> result = JavaTestUtils.runStreams(ssc, 2, 2);
[error] ^
[error] symbol: variable JavaTestUtils
[error] location: class Java8APISuite
[error] /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:73: error: cannot find symbol
[error] JavaDStream<String> stream = JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
[error] ^
[error] symbol: variable ssc
[error] location: class Java8APISuite
{code}
The class {{Java8APISuite}} relies on {{LocalJavaStreamingContext}}, which lives in the streaming test jar. This works for the Maven build, since Maven generates the test jar, but the SBT test compile fails because SBT does not generate a test jar by default.
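One way this kind of problem is usually addressed in SBT is to let the java8-tests project depend on the streaming module's test configuration directly, which is roughly what the Maven test jar provides. A minimal sketch, with illustrative project names (Spark's actual SparkBuild.scala is organised differently):
{code}
// build.sbt-style sketch; "streaming" and "java8Tests" are illustrative names.
lazy val streaming = project in file("streaming")

// Depending on streaming's test configuration makes LocalJavaStreamingContext
// and JavaTestUtils visible to java8-tests without a published test jar.
lazy val java8Tests = (project in file("extras/java8-tests"))
  .dependsOn(streaming % "compile->compile;test->test")
{code}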
[jira] [Commented] (SPARK-7820) Java8-tests suite compile error under SBT
[ https://issues.apache.org/jira/browse/SPARK-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557581#comment-14557581 ] Patrick Wendell commented on SPARK-7820: Since this only affects tests I'm de-escalating it, but I'd like to see it fixed as well before 1.4.0 ships if possible.
Java8-tests suite compile error under SBT
Key: SPARK-7820 URL: https://issues.apache.org/jira/browse/SPARK-7820 Project: Spark Issue Type: Bug Components: Build, Streaming Affects Versions: 1.4.0 Reporter: Saisai Shao Priority: Critical
[jira] [Commented] (SPARK-7287) Flaky test: o.a.s.deploy.SparkSubmitSuite --packages
[ https://issues.apache.org/jira/browse/SPARK-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557610#comment-14557610 ] Patrick Wendell commented on SPARK-7287: [~brkyvz] I am going to disable this test again, it is still failing even after SPARK-7224: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.4-Maven-pre-YARN/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/235/testReport/junit/org.apache.spark.deploy/SparkSubmitSuite/includes_jars_passed_in_through___packages/ Flaky test: o.a.s.deploy.SparkSubmitSuite --packages Key: SPARK-7287 URL: https://issues.apache.org/jira/browse/SPARK-7287 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Burak Yavuz Priority: Critical Labels: flaky-test Error message was not helpful (did not complete within 60 seconds or something). Observed only in master: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/2239/ https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/2238/ https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2163/ ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: spark packages
Yes - spark packages can include non-ASF licenses.
On Sat, May 23, 2015 at 6:16 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, Is it possible to add GPL/LGPL code to spark packages, or must it be licensed under Apache as well? I want to expose Professor Tim Davis's LGPL library for sparse algebra and the ECOS GPL library through the package. Thanks. Deb
[jira] [Updated] (SPARK-7807) High-Availability:: SparkHadoopUtil.scala should support hadoopConfiguration.addResource()
[ https://issues.apache.org/jira/browse/SPARK-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7807: Component/s: Spark Core
High-Availability:: SparkHadoopUtil.scala should support hadoopConfiguration.addResource()
Key: SPARK-7807 URL: https://issues.apache.org/jira/browse/SPARK-7807 Project: Spark Issue Type: Improvement Components: Spark Core Environment: running Spark against a remote-Hadoop HA cluster Reporter: Norman He Priority: Trivial Labels: easyfix
Ease of use with the spark.hadoop.url. prefix: 1) users can supply SparkConf entries with the prefix spark.hadoop.url., like spark.hadoop.url.core-site and spark.hadoop.url.hdfs-site.
Line 97: the code below
conf.getAll.foreach { case (key, value) =>
  if (key.startsWith("spark.hadoop.")) {
    hadoopConf.set(key.substring("spark.hadoop.".length), value)
  }
}
should be able to change to the new version:
conf.getAll.foreach { case (key, value) =>
  if (key.startsWith("spark.hadoop.")) {
    if (key.startsWith("spark.hadoop.url.")) {
      hadoopConf.addResource(new URL(value))
    } else {
      hadoopConf.set(key.substring("spark.hadoop.".length), value)
    }
  }
}
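Hypothetical usage if the proposed prefix were adopted; the keys and URLs below are placeholders for illustration, not an existing Spark feature:
{code}
import org.apache.spark.SparkConf

// Point the Hadoop configuration at the remote HA cluster's config files by
// URL instead of copying them onto every node (hypothetical keys and URLs).
val conf = new SparkConf()
  .set("spark.hadoop.url.core-site", "http://conf-host/ha/core-site.xml")
  .set("spark.hadoop.url.hdfs-site", "http://conf-host/ha/hdfs-site.xml")
{code}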
Re: [VOTE] Release Apache Spark 1.4.0 (RC1)
Thanks Andrew, the doc issue should be fixed in RC2 (if not, please chime in!). R was missing in the build environment. - Patrick
On Fri, May 22, 2015 at 3:33 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: Thanks for catching this. I'll check with Patrick to see why the R API docs are not getting included.
On Fri, May 22, 2015 at 2:44 PM, Andrew Psaltis psaltis.and...@gmail.com wrote: All, Should all the docs work from http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ ? If so the R API docs 404.
On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! ...
[jira] [Created] (SPARK-7805) Move SQLTestUtils.scala from src/main
Patrick Wendell created SPARK-7805: Summary: Move SQLTestUtils.scala from src/main Key: SPARK-7805 URL: https://issues.apache.org/jira/browse/SPARK-7805 Project: Spark Issue Type: Bug Components: SQL Reporter: Patrick Wendell Assignee: Yin Huai Priority: Critical
These trigger binary compatibility issues when changed. In general we shouldn't be putting test code in src/main. If it's needed by multiple modules, IIRC we have a way to do that (look elsewhere in Spark).
[jira] [Updated] (SPARK-7771) Dynamic allocation: lower timeouts further
[ https://issues.apache.org/jira/browse/SPARK-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7771: Issue Type: Improvement (was: Bug)
Dynamic allocation: lower timeouts further
Key: SPARK-7771 URL: https://issues.apache.org/jira/browse/SPARK-7771 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Andrew Or
While testing, I found that the existing timeouts of 5s to add and 600s to remove are still too high for many workloads.
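For context, a minimal sketch of lowering these per job, assuming the 1.3/1.4-era configuration keys and that time-string values are accepted; the concrete values here are only illustrative:
{code}
import org.apache.spark.SparkConf

// Request executors sooner than the 5s default and release idle executors
// sooner than the 600s default (values chosen for illustration only).
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
{code}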
[jira] [Updated] (SPARK-7801) Upgrade master versions to Spark 1.5.0
[ https://issues.apache.org/jira/browse/SPARK-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7801: --- Issue Type: Improvement (was: Bug) Upgrade master versions to Spark 1.5.0 -- Key: SPARK-7801 URL: https://issues.apache.org/jira/browse/SPARK-7801 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7803) Update MIMA for Spark 1.5.0
Patrick Wendell created SPARK-7803: -- Summary: Update MIMA for Spark 1.5.0 Key: SPARK-7803 URL: https://issues.apache.org/jira/browse/SPARK-7803 Project: Spark Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell We should do this after we publish 1.4 binaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-7802) Update pom versioning to 1.5.0
[ https://issues.apache.org/jira/browse/SPARK-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell deleted SPARK-7802: --- Update pom versioning to 1.5.0 -- Key: SPARK-7802 URL: https://issues.apache.org/jira/browse/SPARK-7802 Project: Spark Issue Type: Sub-task Reporter: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-7803) Update MIMA for Spark 1.5.0
[ https://issues.apache.org/jira/browse/SPARK-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell deleted SPARK-7803: --- Update MIMA for Spark 1.5.0 --- Key: SPARK-7803 URL: https://issues.apache.org/jira/browse/SPARK-7803 Project: Spark Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Patrick Wendell We should do this after we publish 1.4 binaries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7801) Upgrade master versions to Spark 1.5.0
Patrick Wendell created SPARK-7801: -- Summary: Upgrade master versions to Spark 1.5.0 Key: SPARK-7801 URL: https://issues.apache.org/jira/browse/SPARK-7801 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7802) Update pom versioning to 1.5.0
Patrick Wendell created SPARK-7802: -- Summary: Update pom versioning to 1.5.0 Key: SPARK-7802 URL: https://issues.apache.org/jira/browse/SPARK-7802 Project: Spark Issue Type: Sub-task Reporter: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7320) Add rollup and cube support to DataFrame DSL
[ https://issues.apache.org/jira/browse/SPARK-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553073#comment-14553073 ] Patrick Wendell commented on SPARK-7320: Hey [~liancheng] and [~chenghao] - I reverted pull request 6257 because it broke all of our maven builds. The issue is that it's not safe to rely on the test suite constructor to create the table used in the tests. Separately, I noticed this JIRA was not closed when the patch was merged. In cases like this where the patch only addresses part of the JIRA, it is better to just create a sub task or split the task into two different JIRA's. I.e. every pull request (ideally) is associated with exactly one JIRA. Otherwise, it's difficult to track things like when we revert a patch. Add rollup and cube support to DataFrame DSL Key: SPARK-7320 URL: https://issues.apache.org/jira/browse/SPARK-7320 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Cheng Hao Labels: starter We should add two functions to GroupedData in order to support rollup and cube for the DataFrame DSL. {code} def rollup(): GroupedData def cube(): GroupedData {code} These two should return new GroupedData with the appropriate state set so when we run an Aggregate, we translate the underlying logical operator into Rollup or Cube. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
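A sketch of how the proposed DSL might read once the two GroupedData methods exist, given some existing DataFrame {{df}} — hypothetical usage only, since the ticket just defines the signatures and the final API may differ:
{code}
import org.apache.spark.sql.functions.sum

// Subtotal sales at every prefix of (country, city) with rollup, or over
// every combination of the grouping columns with cube (proposed API).
df.groupBy("country", "city").rollup().agg(sum("sales"))
df.groupBy("country", "city").cube().agg(sum("sales"))
{code}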
[jira] [Updated] (SPARK-7389) Tachyon integration improvement
[ https://issues.apache.org/jira/browse/SPARK-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7389: Assignee: shimingfei
Tachyon integration improvement
Key: SPARK-7389 URL: https://issues.apache.org/jira/browse/SPARK-7389 Project: Spark Issue Type: Improvement Components: Block Manager Reporter: shimingfei Assignee: shimingfei Fix For: 1.5.0
Two main changes:
1. Add two functions, putValues and getValues, to ExternalBlockManager, because the implementation may not rely on putBytes and getBytes.
2. Improve the Tachyon integration. Currently, when putting data into Tachyon, Spark first serializes all the data in one partition into a ByteBuffer and then writes it into Tachyon, which uses a lot of memory and increases GC overhead. When getting data from Tachyon, getValues depends on getBytes, which also reads all the data into an on-heap byte array, again resulting in heavy memory usage. This PR changes the approach of the two functions, making them read/write data as a stream to reduce memory usage. In our testing, when the data size is huge, this patch reduces GC time by about 30% and full GC time by about 70%, and reduces total execution time by about 10%.
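A rough sketch of the shape such an interface could take; the trait name, block-id type, and return types here are assumptions for illustration, not the signatures actually merged:
{code}
// Illustrative only: stream-oriented variants alongside the byte-oriented
// putBytes/getBytes, so a partition never has to be materialised as one
// large ByteBuffer.
trait ExternalBlockStoreSketch {
  def putValues(blockId: String, values: Iterator[Any]): Unit
  def getValues(blockId: String): Option[Iterator[Any]]
}
{code}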
[jira] [Resolved] (SPARK-7389) Tachyon integration improvement
[ https://issues.apache.org/jira/browse/SPARK-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7389. Resolution: Fixed Fix Version/s: 1.5.0
Tachyon integration improvement
Key: SPARK-7389 URL: https://issues.apache.org/jira/browse/SPARK-7389 Project: Spark Issue Type: Improvement Components: Block Manager Reporter: shimingfei Assignee: shimingfei Fix For: 1.5.0
[jira] [Updated] (SPARK-7719) Java 6 code in UnsafeShuffleWriterSuite
[ https://issues.apache.org/jira/browse/SPARK-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7719: --- Description: This was causing a compile failure because emptyIterator() is not exposed in some versions of Java 6. I lost the exact compile error along the way in the console, but it's just a simple visibility issue. https://github.com/apache/spark/commit/9ebb44f8abb1a13f045eed60190954db904ffef7 I've removed the test code for now, but we probably want to use something from Guava instead for this: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterators.html was: This was causing a compile failure because emptyIterator() is not exposed in some versions of Java 6. I lost the exact compile error along the way in the console, but it's just a simple visibility issue. https://github.com/apache/spark/commit/9ebb44f8abb1a13f045eed60190954db904ffef7 Java 6 code in UnsafeShuffleWriterSuite --- Key: SPARK-7719 URL: https://issues.apache.org/jira/browse/SPARK-7719 Project: Spark Issue Type: Bug Components: Spark Core, Tests Reporter: Patrick Wendell Assignee: Josh Rosen Priority: Critical This was causing a compile failure because emptyIterator() is not exposed in some versions of Java 6. I lost the exact compile error along the way in the console, but it's just a simple visibility issue. https://github.com/apache/spark/commit/9ebb44f8abb1a13f045eed60190954db904ffef7 I've removed the test code for now, but we probably want to use something from Guava instead for this: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterators.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
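The Guava alternative mentioned above, as a small sketch: java.util.Collections.emptyIterator() only exists from Java 7, whereas Guava's Iterators provides an equivalent that also works on Java 6.
{code}
import com.google.common.collect.Iterators

// Empty iterator that also compiles and runs on Java 6.
val empty: java.util.Iterator[String] = Iterators.emptyIterator[String]()
{code}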
[jira] [Updated] (SPARK-7722) Style checks do not run for Kinesis on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7722: --- Component/s: Project Infra Style checks do not run for Kinesis on Jenkins -- Key: SPARK-7722 URL: https://issues.apache.org/jira/browse/SPARK-7722 Project: Spark Issue Type: Bug Components: Project Infra, Streaming Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Critical This caused the release build to fail late in the game. We should make sure jenkins is proactively checking it: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commitdiff;h=23cf897112624ece19a3b5e5394cdf71b9c3c8b3;hp=9ebb44f8abb1a13f045eed60190954db904ffef7 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7719) Java 6 code in UnsafeShuffleWriterSuite
Patrick Wendell created SPARK-7719: -- Summary: Java 6 code in UnsafeShuffleWriterSuite Key: SPARK-7719 URL: https://issues.apache.org/jira/browse/SPARK-7719 Project: Spark Issue Type: Bug Components: Spark Core, Tests Reporter: Patrick Wendell Assignee: Josh Rosen Priority: Critical This was causing a compile failure because emptyIterator() is not exposed in some versions of Java 6. I lost the exact compile error along the way in the console, but it's just a simple visibility issue. https://github.com/apache/spark/commit/9ebb44f8abb1a13f045eed60190954db904ffef7 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7722) Style checks do not run for Kinesis on Jenkins
Patrick Wendell created SPARK-7722: -- Summary: Style checks do not run for Kinesis on Jenkins Key: SPARK-7722 URL: https://issues.apache.org/jira/browse/SPARK-7722 Project: Spark Issue Type: Bug Components: Streaming Reporter: Patrick Wendell Assignee: Tathagata Das Priority: Critical This caused the release build to fail late in the game. We should make sure jenkins is proactively checking it: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commitdiff;h=23cf897112624ece19a3b5e5394cdf71b9c3c8b3;hp=9ebb44f8abb1a13f045eed60190954db904ffef7 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2--[2.11.3 or higher]
Patrick Wendell created SPARK-7726: Summary: Maven Install Breaks When Upgrading Scala 2.11.2--[2.11.3 or higher] Key: SPARK-7726 URL: https://issues.apache.org/jira/browse/SPARK-7726 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Priority: Blocker
This one took a long time to track down. The Maven install phase is part of our release process. It runs the scala:doc target to generate doc jars. Between Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in a way that breaks our build. In both cases, it returned an error (there has been a long-running error here that we've always ignored); however, in 2.11.3 that error became fatal and failed the entire build process. The upgrade occurred in SPARK-7092. Here is a simple reproduction:
{code}
./dev/change-version-to-2.11.sh
mvn clean install -pl network/common -pl network/shuffle -DskipTests -Dscala-2.11
{code}
This command exits successfully when Spark is at Scala 2.11.2 and fails with 2.11.3 or higher. In either case an error is printed:
{code}
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 ---
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type
protected Type type() { return Type.UPLOAD_BLOCK; }
^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type
protected Type type() { return Type.STREAM_HANDLE; }
^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type
protected Type type() { return Type.REGISTER_EXECUTOR; }
^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type
protected Type type() { return Type.OPEN_BLOCKS; }
^
model contains 22 documentable templates
four errors found
{code}
Ideally we'd just dig in and fix this error. Unfortunately it's a very confusing error and I have no idea why it is appearing. I'd propose reverting SPARK-7092 in the meantime.
[jira] [Resolved] (SPARK-7670) Failure when building with scala 2.11 (after 1.3.1
[ https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7670. Resolution: Duplicate Fix Version/s: SPARK-7726 Failure when building with scala 2.11 (after 1.3.1 -- Key: SPARK-7670 URL: https://issues.apache.org/jira/browse/SPARK-7670 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.4.0 Reporter: Fernando Ruben Otero Fix For: SPARK-7726 Attachments: Dockerfile When trying to build spark with scala 2.11 on revision c64ff8036cc6bc7c87743f4c751d7fe91c2e366a (the one on master when I'm submitting this issue) I'm getting export MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -Dhadoop.version=2.6.0 -DskipTests clean install ... ... ... [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 --- /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type protected Type type() { return Type.UPLOAD_BLOCK; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type protected Type type() { return Type.STREAM_HANDLE; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type protected Type type() { return Type.REGISTER_EXECUTOR; } ^ /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type protected Type type() { return Type.OPEN_BLOCKS; } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7092) Update spark scala version to 2.11.6
[ https://issues.apache.org/jira/browse/SPARK-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550124#comment-14550124 ] Patrick Wendell commented on SPARK-7092: This is reopened because it caused SPARK-7726 Update spark scala version to 2.11.6 Key: SPARK-7092 URL: https://issues.apache.org/jira/browse/SPARK-7092 Project: Spark Issue Type: Improvement Components: Spark Core, Spark Shell Affects Versions: 1.4.0 Reporter: Prashant Sharma Assignee: Prashant Sharma Priority: Minor Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7092) Update spark scala version to 2.11.6
[ https://issues.apache.org/jira/browse/SPARK-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7092: Update spark scala version to 2.11.6 Key: SPARK-7092 URL: https://issues.apache.org/jira/browse/SPARK-7092 Project: Spark Issue Type: Improvement Components: Spark Core, Spark Shell Affects Versions: 1.4.0 Reporter: Prashant Sharma Assignee: Prashant Sharma Priority: Minor Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.0 (RC1)
Hi all, I've created another release repository where the release is identified with the version 1.4.0-rc1: https://repository.apache.org/content/repositories/orgapachespark-1093/
On Tue, May 19, 2015 at 5:36 PM, Krishna Sankar ksanka...@gmail.com wrote: Quick tests from my side - looks OK. The results are the same or very similar to 1.3.1. Will add dataframes et al in future tests. +1 (non-binding, of course)
1. Compiled OSX 10.10 (Yosemite) OK Total time: 17:42 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests
2. Tested pyspark, MLlib - running as well as comparing results with 1.3.1
2.1. statistics (min, max, mean, Pearson, Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK Center And Scale OK
2.5. RDD operations OK State of the Union Texts - MapReduce, Filter, sortByKey (word count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK Model evaluation/optimization (rank, numIter, lambda) with itertools OK
Cheers k/
On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! ...
Re: [VOTE] Release Apache Spark 1.4.0 (RC1)
Punya, Let me see if I can publish these under rc1 as well. In the future this will all be automated but currently it's a somewhat manual task. - Patrick
On Tue, May 19, 2015 at 9:32 AM, Punyashloka Biswal punya.bis...@gmail.com wrote: When publishing future RCs to the staging repository, would it be possible to use a version number that includes the rc1 designation? In the current setup, when I run a build against the artifacts at https://repository.apache.org/content/repositories/orgapachespark-1092/org/apache/spark/spark-core_2.10/1.4.0/, my local Maven cache will get polluted with things that claim to be 1.4.0 but aren't. It would be preferable for the version number to be 1.4.0-rc1 instead. Thanks! Punya
On Tue, May 19, 2015 at 12:20 PM Sean Owen so...@cloudera.com wrote: Before I vote, I wanted to point out there are still 9 Blockers for 1.4.0. I'd like to use this status to really mean must happen before the release. Many of these may be already fixed, or aren't really blockers -- can just be updated accordingly. I bet at least one will require further work if it's really meant for 1.4, so all this means is there is likely to be another RC. We should still kick the tires on RC1. (I also assume we should be extra conservative about what is merged into 1.4 at this point.)
SPARK-6784 SQL Clean up all the inbound/outbound conversions for DateType Adrian Wang
SPARK-6811 SparkR Building binary R packages for SparkR Shivaram Venkataraman
SPARK-6941 SQL Provide a better error message to explain that tables created from RDDs are immutable
SPARK-7158 SQL collect and take return different results
SPARK-7478 SQL Add a SQLContext.getOrCreate to maintain a singleton instance of SQLContext Tathagata Das
SPARK-7616 SQL Overwriting a partitioned parquet table corrupt data Cheng Lian
SPARK-7654 SQL DataFrameReader and DataFrameWriter for input/output API Reynold Xin
SPARK-7662 SQL Exception of multi-attribute generator anlysis in projection
SPARK-7713 SQL Use shared broadcast hadoop conf for partitioned table scan. Yin Huai
On Tue, May 19, 2015 at 5:10 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! ...
branch-1.4 merge etiquette
Hey All, Since we are now voting, please tread very carefully with branch-1.4 merges. For instance, bug fixes that don't represent regressions from 1.3.X probably shouldn't be merged unless they are extremely simple and well reviewed. As usual, mature/core components (e.g. Spark core) are more sensitive than newer/edge ones (e.g. DataFrames). I'm happy to provide guidance to people if they are on the fence about patches. Ultimately this ends up being a matter of judgement and assessing the risk of specific patches. Just ping me on GitHub. - Patrick
[VOTE] Release Apache Spark 1.4.0 (RC1)
Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc1 (commit 777a081): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-1.4.0-rc1/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1092/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.0! The vote is open until Friday, May 22, at 17:03 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.0 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ == How can I help test this release? == If you are a Spark user, you can help us test this release by taking a Spark 1.3 workload and running on this release candidate, then reporting any regressions. == What justifies a -1 vote for this release? == This vote is happening towards the end of the 1.4 QA period, so -1 votes should only occur for significant regressions from 1.3.1. Bugs already present in 1.3.X, minor regressions, or bugs related to new features will not block this release. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Updated] (SPARK-7743) Upgrade parquet dependency
[ https://issues.apache.org/jira/browse/SPARK-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7743: Component/s: SQL
Upgrade parquet dependency
Key: SPARK-7743 URL: https://issues.apache.org/jira/browse/SPARK-7743 Project: Spark Issue Type: Bug Components: SQL Reporter: Thomas Omans
There are many outstanding issues with the parquet format that have been resolved between the version Spark depends on (1.6.0rc3 as of Spark 1.3.1) and the most recent parquet release (1.6.0). These include missing support for schema migration when using parquet with avro, missing summary metadata in the parquet footers (causing null pointer exceptions when reading), and many others. See https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-160 for the full list of fixes.
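For anyone who wants to try the released parquet-mr 1.6.0 in a local build before Spark's own upgrade lands, a build sketch; the com.twitter coordinates are an assumption based on the pre-Apache packaging of parquet-mr 1.6.0 and should be verified against your dependency tree:
{code}
// build.sbt sketch: force the released 1.6.0 artifacts over the 1.6.0rc3
// that Spark 1.3.1 pulls in (coordinates assumed, not confirmed here).
dependencyOverrides ++= Set(
  "com.twitter" % "parquet-column" % "1.6.0",
  "com.twitter" % "parquet-hadoop" % "1.6.0"
)
{code}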
Re: [VOTE] Release Apache Spark 1.4.0 (RC1)
A couple of other process things:
1. Please *keep voting* (+1/-1) on this thread even if we find some issues, until we cut RC2. This lets us pipeline the QA.
2. The SQL team owes a JIRA clean-up (forthcoming shortly)... there are still a few Blockers that aren't.
On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! ...
[jira] [Updated] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2--[2.11.3 or higher]
[ https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7726: Assignee: Iulian Dragos
Maven Install Breaks When Upgrading Scala 2.11.2--[2.11.3 or higher]
Key: SPARK-7726 URL: https://issues.apache.org/jira/browse/SPARK-7726 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Iulian Dragos Priority: Blocker
[jira] [Resolved] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2--[2.11.3 or higher]
[ https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7726. Resolution: Fixed Fix Version/s: 1.4.0
Maven Install Breaks When Upgrading Scala 2.11.2--[2.11.3 or higher]
Key: SPARK-7726 URL: https://issues.apache.org/jira/browse/SPARK-7726 Project: Spark Issue Type: Bug Components: Build Reporter: Patrick Wendell Assignee: Iulian Dragos Priority: Blocker Fix For: 1.4.0
[jira] [Resolved] (SPARK-7092) Update spark scala version to 2.11.6
[ https://issues.apache.org/jira/browse/SPARK-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7092. Resolution: Fixed
Okay, this was re-merged in the SPARK-7726 fix: https://github.com/apache/spark/commit/ee012e0ed61fbf5bb819b7489a3a23a03c878f4d
Update spark scala version to 2.11.6
Key: SPARK-7092 URL: https://issues.apache.org/jira/browse/SPARK-7092 Project: Spark Issue Type: Improvement Components: Spark Core, Spark Shell Affects Versions: 1.4.0 Reporter: Prashant Sharma Assignee: Prashant Sharma Priority: Minor Fix For: 1.4.0
[jira] [Updated] (SPARK-7677) Enable Kafka In Scala 2.11 Build
[ https://issues.apache.org/jira/browse/SPARK-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7677: --- Description: Now that we upgraded Kafka in SPARK-2808 we can enable it in the Scala 2.11 build. Enable Kafka In Scala 2.11 Build Key: SPARK-7677 URL: https://issues.apache.org/jira/browse/SPARK-7677 Project: Spark Issue Type: Sub-task Components: Build Reporter: Patrick Wendell Assignee: Iulian Dragos Now that we upgraded Kafka in SPARK-2808 we can enable it in the Scala 2.11 build. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7672) Number format exception with spark.kryoserializer.buffer.mb
[ https://issues.apache.org/jira/browse/SPARK-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7672: --- Priority: Critical (was: Major) Number format exception with spark.kryoserializer.buffer.mb --- Key: SPARK-7672 URL: https://issues.apache.org/jira/browse/SPARK-7672 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Nishkam Ravi Priority: Critical With spark.kryoserializer.buffer.mb 1000 : Exception in thread main java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m. Fractional values are not supported. Input was: 100.0 at org.apache.spark.network.util.JavaUtils.parseByteString(JavaUtils.java:238) at org.apache.spark.network.util.JavaUtils.byteStringAsKb(JavaUtils.java:259) at org.apache.spark.util.Utils$.byteStringAsKb(Utils.scala:1037) at org.apache.spark.SparkConf.getSizeAsKb(SparkConf.scala:245) at org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:53) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:269) at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7672) Number format exception with spark.kryoserializer.buffer.mb
[ https://issues.apache.org/jira/browse/SPARK-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7672: --- Component/s: Spark Core Number format exception with spark.kryoserializer.buffer.mb --- Key: SPARK-7672 URL: https://issues.apache.org/jira/browse/SPARK-7672 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Nishkam Ravi With spark.kryoserializer.buffer.mb 1000 : Exception in thread main java.lang.NumberFormatException: Size must be specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m. Fractional values are not supported. Input was: 100.0 at org.apache.spark.network.util.JavaUtils.parseByteString(JavaUtils.java:238) at org.apache.spark.network.util.JavaUtils.byteStringAsKb(JavaUtils.java:259) at org.apache.spark.util.Utils$.byteStringAsKb(Utils.scala:1037) at org.apache.spark.SparkConf.getSizeAsKb(SparkConf.scala:245) at org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:53) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:269) at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
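A possible workaround sketch for affected users (not necessarily the fix adopted in this JIRA): set the newer size-string form of the key, which the parser in the stack trace accepts directly, instead of the deprecated *.mb variant.
{code}
import org.apache.spark.SparkConf

// Rough equivalent of spark.kryoserializer.buffer.mb=1000, expressed in the
// size-string syntax the error message asks for.
val conf = new SparkConf()
  .set("spark.kryoserializer.buffer", "1000m")
  .set("spark.kryoserializer.buffer.max", "2g")
{code}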
[jira] [Updated] (SPARK-7284) Update streaming documentation for Spark 1.4.0 release
[ https://issues.apache.org/jira/browse/SPARK-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7284: Priority: Critical (was: Blocker)
Update streaming documentation for Spark 1.4.0 release
Key: SPARK-7284 URL: https://issues.apache.org/jira/browse/SPARK-7284 Project: Spark Issue Type: Improvement Components: Documentation, Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical
Things to update (continuously updated list):
- Python API for Kafka Direct
- Pointers to the new Streaming UI
- Update Kafka version to 0.8.2.1
- Add ref to RDD.foreachPartitionWithIndex (if merged)
[jira] [Created] (SPARK-7677) Enable Kafka In Scala 2.11 Build
Patrick Wendell created SPARK-7677: -- Summary: Enable Kafka In Scala 2.11 Build Key: SPARK-7677 URL: https://issues.apache.org/jira/browse/SPARK-7677 Project: Spark Issue Type: Sub-task Reporter: Patrick Wendell Assignee: Iulian Dragos -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6811) Building binary R packages for SparkR
[ https://issues.apache.org/jira/browse/SPARK-6811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6811: --- Assignee: Shivaram Venkataraman Building binary R packages for SparkR - Key: SPARK-6811 URL: https://issues.apache.org/jira/browse/SPARK-6811 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Shivaram Venkataraman Assignee: Shivaram Venkataraman Priority: Blocker We should figure out how to distribute binary packages for SparkR as a part of the release process. R packages for Windows might need to be built separately and we could offer a separate download link for Windows users. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver
[ https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7563: --- Fix Version/s: 1.4.0 OutputCommitCoordinator.stop() should only be executed in driver Key: SPARK-7563 URL: https://issues.apache.org/jira/browse/SPARK-7563 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo) Spark 1.3.1 Release Reporter: Hailong Wen Priority: Critical Fix For: 1.4.0 I am from IBM Platform Symphony team and we are integrating Spark 1.3.1 with EGO (a resource management product). In EGO we uses fine-grained dynamic allocation policy, and each Executor will exit after its tasks are all done. When testing *spark-shell*, we find that when executor of first job exit, it will stop OutputCommitCoordinator, which result in all future jobs failing. Details are as follows: We got the following error in executor when submitting job in *spark-shell* the second time (the first job submission is successful): {noformat} 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator Exception in thread main akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), Path(/user/OutputCommitCoordinator)] at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65) at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74) at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {noformat} And in 
driver side, we see a log message telling that the OutputCommitCoordinator is stopped after the first submission: {noformat} 15/05/11 04:01:23 INFO spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped! {noformat} We examine the code of OutputCommitCoordinator, and find that executor will reuse the ref of driver's OutputCommitCoordinatorActor. So when an executor exits, it will eventually call SparkEnv.stop(): {noformat} private[spark] def stop() { isStopped = true pythonWorkers.foreach { case(key, worker) = worker.stop() } Option(httpFileServer).foreach(_.stop()) mapOutputTracker.stop() shuffleManager.stop() broadcastManager.stop() blockManager.stop() blockManager.master.stop() metricsSystem.stop() outputCommitCoordinator.stop
[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver
[ https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7563: --- Target Version/s: 1.3.2, 1.4.0 OutputCommitCoordinator.stop() should only be executed in driver Key: SPARK-7563 URL: https://issues.apache.org/jira/browse/SPARK-7563 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo) Spark 1.3.1 Release Reporter: Hailong Wen Priority: Critical Fix For: 1.4.0 I am from IBM Platform Symphony team and we are integrating Spark 1.3.1 with EGO (a resource management product). In EGO we uses fine-grained dynamic allocation policy, and each Executor will exit after its tasks are all done. When testing *spark-shell*, we find that when executor of first job exit, it will stop OutputCommitCoordinator, which result in all future jobs failing. Details are as follows: We got the following error in executor when submitting job in *spark-shell* the second time (the first job submission is successful): {noformat} 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator Exception in thread main akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), Path(/user/OutputCommitCoordinator)] at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65) at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74) at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {noformat} 
And in driver side, we see a log message telling that the OutputCommitCoordinator is stopped after the first submission: {noformat} 15/05/11 04:01:23 INFO spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped! {noformat} We examine the code of OutputCommitCoordinator, and find that executor will reuse the ref of driver's OutputCommitCoordinatorActor. So when an executor exits, it will eventually call SparkEnv.stop(): {noformat} private[spark] def stop() { isStopped = true pythonWorkers.foreach { case(key, worker) = worker.stop() } Option(httpFileServer).foreach(_.stop()) mapOutputTracker.stop() shuffleManager.stop() broadcastManager.stop() blockManager.stop() blockManager.master.stop() metricsSystem.stop() outputCommitCoordinator.stop
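A minimal sketch of the direction the issue title points at, stopping the driver-owned coordinator only from the driver. The names (ToyEnv, ToyCoordinator, the isDriver flag) are invented for illustration; this is not Spark's actual SparkEnv code.
{code}
// Toy illustration only: an exiting executor must not stop a component that
// the driver still needs for later jobs.
class ToyCoordinator {
  def stop(): Unit = println("coordinator stopped")
}

class ToyEnv(isDriver: Boolean, coordinator: ToyCoordinator) {
  private var stopped = false

  def stop(): Unit = {
    stopped = true
    // ... stop executor-local components (block manager, shuffle, etc.) ...
    if (isDriver) {
      coordinator.stop() // driver-only shutdown, per the issue title
    }
  }
}
{code}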
[jira] [Resolved] (SPARK-7677) Enable Kafka In Scala 2.11 Build
[ https://issues.apache.org/jira/browse/SPARK-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7677. Resolution: Fixed Fix Version/s: 1.4.0 Fixed by pull request: https://github.com/apache/spark/pull/6149 Enable Kafka In Scala 2.11 Build Key: SPARK-7677 URL: https://issues.apache.org/jira/browse/SPARK-7677 Project: Spark Issue Type: Sub-task Components: Build Reporter: Patrick Wendell Assignee: Iulian Dragos Fix For: 1.4.0 Now that we upgraded Kafka in SPARK-2808 we can enable it in the Scala 2.11 build. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7355) FlakyTest - o.a.s.DriverSuite
[ https://issues.apache.org/jira/browse/SPARK-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7355: --- Priority: Critical (was: Blocker) FlakyTest - o.a.s.DriverSuite - Key: SPARK-7355 URL: https://issues.apache.org/jira/browse/SPARK-7355 Project: Spark Issue Type: Test Components: Spark Core, Tests Reporter: Tathagata Das Assignee: Andrew Or Priority: Critical Labels: flaky-test -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7644) Ensure all scoped RDD operations are tested and cleaned
[ https://issues.apache.org/jira/browse/SPARK-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7644: --- Priority: Critical (was: Blocker) Ensure all scoped RDD operations are tested and cleaned --- Key: SPARK-7644 URL: https://issues.apache.org/jira/browse/SPARK-7644 Project: Spark Issue Type: Bug Components: Spark Core, SQL, Streaming Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical If all goes well, this will be a Won't Fix. Before releasing we should make sure all operations wrapped in `RDDOperationScope.withScope` are actually tested and enclosed closures are actually cleaned. This is because a big change went into `ClosureCleaner` and wrapping methods in closures may change whether they are serializable. TL;DR we should run all the wrapped operations to make sure we don't run into java.lang.NotSerializableException. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
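To make the "run all the wrapped operations" point concrete, a throwaway local check of roughly this shape surfaces a latent java.lang.NotSerializableException, because executing an action forces the closure to be cleaned and serialized. The sample below is an assumption about how such a check could look; the real coverage work would need to exercise every operation wrapped in RDDOperationScope.withScope, not just these two.
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Smoke-check sketch: running an action ships the cleaned closures to
// executors, so a serialization problem fails here instead of at release time.
object ScopedOpsSmokeCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("scoped-ops-check"))
    try {
      val n = sc.parallelize(1 to 1000).map(_ * 2).filter(_ % 3 == 0).count()
      println(s"count = $n")
    } finally {
      sc.stop()
    }
  }
}
{code}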
[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546394#comment-14546394 ] Patrick Wendell commented on SPARK-2883: Since this is a feature, I'm going to drop it down to critical priority, as we'll start the release candidates soon. However, I think it's fine to slip this in between RCs because it's purely additive, so IMO it's very likely this will make it into Spark 1.4. Spark Support for ORCFile format Key: SPARK-2883 URL: https://issues.apache.org/jira/browse/SPARK-2883 Project: Spark Issue Type: Bug Components: Input/Output, SQL Reporter: Zhan Zhang Priority: Blocker Attachments: 2014-09-12 07.05.24 pm Spark UI.png, 2014-09-12 07.07.19 pm jobtracker.png, orc.diff Verify the support of OrcInputFormat in Spark, fix issues if any exist, and add documentation of its usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2883) Spark Support for ORCFile format
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2883: --- Priority: Critical (was: Blocker) Spark Support for ORCFile format Key: SPARK-2883 URL: https://issues.apache.org/jira/browse/SPARK-2883 Project: Spark Issue Type: Bug Components: Input/Output, SQL Reporter: Zhan Zhang Priority: Critical Attachments: 2014-09-12 07.05.24 pm Spark UI.png, 2014-09-12 07.07.19 pm jobtracker.png, orc.diff Verify the support of OrcInputFormat in Spark, fix issues if any exist, and add documentation of its usage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver
[ https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546468#comment-14546468 ] Patrick Wendell commented on SPARK-7563: I pulled the fix into 1.4.0, but not yet 1.3.2 (didn't feel comfortable doing the backport). OutputCommitCoordinator.stop() should only be executed in driver Key: SPARK-7563 URL: https://issues.apache.org/jira/browse/SPARK-7563 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo) Spark 1.3.1 Release Reporter: Hailong Wen Priority: Critical Fix For: 1.4.0 I am from IBM Platform Symphony team and we are integrating Spark 1.3.1 with EGO (a resource management product). In EGO we uses fine-grained dynamic allocation policy, and each Executor will exit after its tasks are all done. When testing *spark-shell*, we find that when executor of first job exit, it will stop OutputCommitCoordinator, which result in all future jobs failing. Details are as follows: We got the following error in executor when submitting job in *spark-shell* the second time (the first job submission is successful): {noformat} 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator Exception in thread main akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), Path(/user/OutputCommitCoordinator)] at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65) at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74) at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {noformat} And in driver side, we see a log message telling that the OutputCommitCoordinator is stopped after the first submission: {noformat} 15/05/11 04:01:23 INFO spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped! {noformat} We examine the code of OutputCommitCoordinator, and find that executor will reuse the ref of driver's OutputCommitCoordinatorActor. So when an executor exits, it will eventually call SparkEnv.stop(): {noformat} private[spark] def stop() { isStopped = true pythonWorkers.foreach { case(key, worker) = worker.stop() } Option(httpFileServer).foreach(_.stop()) mapOutputTracker.stop() shuffleManager.stop() broadcastManager.stop() blockManager.stop
[jira] [Resolved] (SPARK-5920) Use a BufferedInputStream to read local shuffle data
[ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5920. Resolution: Won't Fix Per the discussion on this PR I am resolving this as won't fix. https://github.com/apache/spark/pull/4878 [~kayousterhout] please feel free to re-open if I misinterpreted. Use a BufferedInputStream to read local shuffle data Key: SPARK-5920 URL: https://issues.apache.org/jira/browse/SPARK-5920 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 1.2.1, 1.3.0 Reporter: Kay Ousterhout Assignee: Kay Ousterhout Priority: Blocker When reading local shuffle data, Spark doesn't currently buffer the local reads into larger chunks, which can lead to terrible disk performance if many tasks are concurrently reading local data from the same disk. We should use a BufferedInputStream to mitigate this problem; we can lazily create the input stream to avoid allocating a bunch of in-memory buffers at the same time for tasks that read shuffle data from a large number of local blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
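A minimal sketch of the buffering idea described in the ticket. This is not Spark's shuffle reader; the class and parameter names are invented. The point is that the BufferedInputStream turns many small reads into fewer, larger ones, and the lazy val defers buffer allocation until a block is actually read, which is the "lazily create the input stream" part above.
{code}
import java.io.{BufferedInputStream, FileInputStream, InputStream}

// Sketch only: buffered, lazily-created local reads.
class LazyBufferedBlockReader(path: String, bufferSize: Int = 64 * 1024) {
  private lazy val in: InputStream =
    new BufferedInputStream(new FileInputStream(path), bufferSize)

  def read(dst: Array[Byte]): Int = in.read(dst)
  def close(): Unit = in.close()
}
{code}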
[jira] [Updated] (SPARK-7532) Make StreamingContext.start() idempotent
[ https://issues.apache.org/jira/browse/SPARK-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7532: --- Fix Version/s: 1.4.0 Make StreamingContext.start() idempotent Key: SPARK-7532 URL: https://issues.apache.org/jira/browse/SPARK-7532 Project: Spark Issue Type: Bug Components: Streaming Reporter: Tathagata Das Assignee: Tathagata Das Priority: Blocker Fix For: 1.4.0 Currently, calling StreamingContext.start() throws an error when the context is already started. This is inconsistent with StreamingContext.stop(), which is idempotent: calling stop() on a stopped context is a no-op. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
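A toy model of the requested semantics, not the actual StreamingContext implementation: a second start() on an already-started context becomes a no-op, mirroring the already-idempotent stop().
{code}
// Toy model of the desired behavior only.
class ToyStreamingContext {
  private var state: String = "initialized"

  def start(): Unit = synchronized {
    state match {
      case "initialized" => state = "started" // really start scheduler/receivers
      case "started"     => ()                // no-op instead of throwing
      case _             => throw new IllegalStateException("context already stopped")
    }
  }

  def stop(): Unit = synchronized { state = "stopped" } // idempotent, as today
}
{code}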
[jira] [Updated] (SPARK-7228) SparkR public API for 1.4 release
[ https://issues.apache.org/jira/browse/SPARK-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7228: --- Fix Version/s: 1.4.0 SparkR public API for 1.4 release - Key: SPARK-7228 URL: https://issues.apache.org/jira/browse/SPARK-7228 Project: Spark Issue Type: Umbrella Components: SparkR Affects Versions: 1.4.0 Reporter: Shivaram Venkataraman Assignee: Shivaram Venkataraman Priority: Blocker Fix For: 1.4.0 This is an umbrella ticket to track the public APIs and documentation to be released as part of SparkR in the 1.4 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Tentative due dates for Spark 1.3.2 release
Hi Niranda, Maintenance releases are not done on a predetermined schedule but instead according to which fixes show up and their severity. Since we just did a 1.3.1 release, I'm not sure I see 1.3.2 on the immediate horizon. However, the maintenance releases are simply builds at the head of the respective release branches (in this case branch-1.3). They never introduce new APIs. If you have a particular bug fix you are waiting for, you can always build Spark off of that branch. - Patrick On Fri, May 15, 2015 at 12:46 AM, Niranda Perera niranda.per...@gmail.com wrote: Hi, May I know the tentative release dates for spark 1.3.2? rgds -- Niranda - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Adding/Using More Resolution Types on JIRA
If there is no further feedback on this I will ask ASF Infra to add the new fields Out of Scope and Inactive. - Patrick On Tue, May 12, 2015 at 9:02 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I tend to find that any large project has a lot of walking dead JIRAs, and pretending they are simply Open causes problems. Any state is better for these, so I favor this. Agreed. Inactive: A way to clear out inactive/dead JIRA's without indicating a decision has been made one way or the other. This is a good idea, and perhaps the process of closing JIRAs as Inactive can be automated. If nothing about a JIRA has changed in 12 months or more (e.g. current oldest open Spark issue; dates to Aug 2013: SPARK-867), perhaps a bot can mark it as such for us. (Here's a list of stale issues). This doesn't mean the issue is invalid or won't be addressed, but it gets it out of the Open queue, which ideally should be a high churn queue (e.g. stuff doesn't stay in there forever). Nick On Tue, May 12, 2015 at 4:49 AM Sean Owen so...@cloudera.com wrote: I tend to find that any large project has a lot of walking dead JIRAs, and pretending they are simply Open causes problems. Any state is better for these, so I favor this. The possible objection is that this will squash or hide useful issues, but in practice we have the opposite problem. Resolved issues are still searchable by default, and, people aren't shy about opening duplicates anyway. At least the semantics Later do not discourage a diligent searcher from considering commenting on and reopening such an archived JIRA. Patrick this could piggy back on INFRA-9513. As a corollary I would welcome deciding that Target Version should be used more narrowly to mean 'I really mean to help resolve this for the indicated version'. Setting it to a future version just to mean Later should instead turn into resolving the JIRA. Last: if JIRAs are regularly ice-boxed this way, I think it should trigger some reflection. Why are these JIRAs going nowhere? For completely normal reasons or does it mean too many TODOs are filed and forgotten? That's no comment on the current state, just something to watch. So: yes I like the idea. On May 12, 2015 8:50 AM, Patrick Wendell pwend...@gmail.com wrote: In Spark we sometimes close issues as something other than Fixed, and this is an important part of maintaining our JIRA. The current resolution types we use are the following: Won't Fix - bug fix or (more often) feature we don't want to add Invalid - issue is underspecified or not appropriate for a JIRA issue Duplicate - duplicate of another JIRA Cannot Reproduce - bug that could not be reproduced Not A Problem - issue purports to represent a bug, but does not I would like to propose adding a few new resolutions. This will require modifying the ASF JIRA, but infra said they are open to proposals as long as they are considered of broad interest. My issue with the current set of resolutions are that Won't Fix is a big catch all we use for many different things. Most often it's used for things that aren't even bugs even though it has Fix in the name. 
I'm proposing adding: Inactive - A feature or bug that has had no activity from users or developers in a long time Out of Scope - A feature proposal that is not in scope given the projects goals Later - A feature not on the immediate roadmap, but potentially of interest longer term (this one already exists, I'm just proposing to start using it) I am in no way proposing changes to the decision making model around JIRA's, notably that it is consensus based and that all resolutions are considered tentative and fully reversible. The benefits I see of this change would be the following: 1. Inactive: A way to clear out inactive/dead JIRA's without indicating a decision has been made one way or the other. 2. Out of Scope: It more clearly explains closing out-of-scope features than the generic Won't Fix. Also makes it more clear to future contributors what is considered in scope for Spark. 3. Later: A way to signal that issues aren't targeted for a near term version. This would help avoid the mess we have now of like 200+ issues targeted at each version and target version being a very bad indicator of actual roadmap. An alternative on this one is to have a version called Later or Parking Lot but not close the issues. Any thoughts on this? - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Recent Spark test failures
The PR builder currently builds against Hadoop 2.3. - Patrick On Fri, May 15, 2015 at 11:40 AM, Marcelo Vanzin van...@cloudera.com wrote: Funny thing, since I asked this question in a PR a few minutes ago... Ignoring the rotation suggestion for a second, can the PR builder at least cover hadoop 2.2? That's the actual version used to create the official Spark artifacts for maven, and the oldest version Spark supports for YARN.. Kinda the same argument as the why do we build with java 7 when we support java 6 discussion we had recently. On Fri, May 15, 2015 at 11:34 AM, Ted Yu yuzhih...@gmail.com wrote: bq. would be prohibitive to build all configurations for every push Agreed. Can PR builder rotate testing against hadoop 2.3, 2.4, 2.6 and 2.7 (each test run still uses one hadoop profile) ? This way we would have some coverage for each of the major hadoop releases. Cheers On Fri, May 15, 2015 at 10:30 AM, Sean Owen so...@cloudera.com wrote: You all are looking only at the pull request builder. It just does one build to sanity-check a pull request, since that already takes 2 hours and would be prohibitive to build all configurations for every push. There is a different set of Jenkins jobs that periodically tests master against a lot more configurations, including Hadoop 2.4. On Fri, May 15, 2015 at 6:02 PM, Frederick R Reiss frre...@us.ibm.com wrote: The PR builder seems to be building against Hadoop 2.3. In the log for the most recent successful build ( https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32805/consoleFull ) I see: = Building Spark = [info] Compile with Hive 0.13.1 [info] Building Spark with these arguments: -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver ... = Running Spark unit tests = [info] Running Spark tests with these arguments: -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl test Is anyone testing individual pull requests against Hadoop 2.4 or 2.6 before the code is declared clean? Fred [image: Inactive hide details for Ted Yu ---05/15/2015 09:29:09 AM---Jenkins build against hadoop 2.4 has been unstable recently: https]Ted Yu ---05/15/2015 09:29:09 AM---Jenkins build against hadoop 2.4 has been unstable recently: https://amplab.cs.berkeley.edu/jenkins/ From: Ted Yu yuzhih...@gmail.com To: Andrew Or and...@databricks.com Cc: dev@spark.apache.org dev@spark.apache.org Date: 05/15/2015 09:29 AM Subject: Re: Recent Spark test failures -- Jenkins build against hadoop 2.4 has been unstable recently: *https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/* https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ I haven't found the test which hung / failed in recent Jenkins builds. But PR builder has several green builds lately: *https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/* https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/ Maybe PR builder doesn't build against hadoop 2.4 ? Cheers On Mon, May 11, 2015 at 1:11 PM, Ted Yu *yuzhih...@gmail.com* yuzhih...@gmail.com wrote: Makes sense. Having high determinism in these tests would make Jenkins build stable. On Mon, May 11, 2015 at 1:08 PM, Andrew Or *and...@databricks.com* and...@databricks.com wrote: Hi Ted, Yes, those two options can be useful, but in general I think the standard to set is that tests should never fail. 
It's actually the worst if tests fail sometimes but not others, because we can't reproduce them deterministically. Using -M and -A actually tolerates flaky tests to a certain extent, and I would prefer to instead increase the determinism in these tests. -Andrew 2015-05-08 17:56 GMT-07:00 Ted Yu *yuzhih...@gmail.com* yuzhih...@gmail.com: Andrew: Do you think the -M and -A options described here can be used in test runs ? *http://scalatest.org/user_guide/using_the_runner* http://scalatest.org/user_guide/using_the_runner Cheers On Wed, May 6, 2015 at 5:41 PM, Andrew Or *and...@databricks.com* and...@databricks.com wrote: Dear all, I'm sure you have all noticed that the Spark tests have been fairly unstable recently. I wanted to share a tool that I use to track which tests have been failing most often in order to prioritize fixing
[jira] [Updated] (SPARK-5632) not able to resolve dot('.') in field name
[ https://issues.apache.org/jira/browse/SPARK-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5632: --- Fix Version/s: 1.4.0 not able to resolve dot('.') in field name -- Key: SPARK-5632 URL: https://issues.apache.org/jira/browse/SPARK-5632 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 1.2.0, 1.3.0 Environment: Spark cluster: EC2 m1.small + Spark 1.2.0 Cassandra cluster: EC2 m3.xlarge + Cassandra 2.1.2 Reporter: Lishu Liu Priority: Blocker Fix For: 1.4.0 My cassandra table task_trace has a field sm.result which contains dot in the name. So SQL tried to look up sm instead of full name 'sm.result'. Here is my code: {code} scala import org.apache.spark.sql.cassandra.CassandraSQLContext scala val cc = new CassandraSQLContext(sc) scala val task_trace = cc.jsonFile(/task_trace.json) scala task_trace.registerTempTable(task_trace) scala cc.setKeyspace(cerberus_data_v4) scala val res = cc.sql(SELECT received_datetime, task_body.cerberus_id, task_body.sm.result FROM task_trace WHERE task_id = 'fff7304e-9984-4b45-b10c-0423a96745ce') res: org.apache.spark.sql.SchemaRDD = SchemaRDD[57] at RDD at SchemaRDD.scala:108 == Query Plan == == Physical Plan == java.lang.RuntimeException: No such struct field sm in cerberus_batch_id, cerberus_id, couponId, coupon_code, created, description, domain, expires, message_id, neverShowAfter, neverShowBefore, offerTitle, screenshots, sm.result, sm.task, startDate, task_id, url, uuid, validationDateTime, validity {code} The full schema look like this: {code} scala task_trace.printSchema() root \|-- received_datetime: long (nullable = true) \|-- task_body: struct (nullable = true) \|\|-- cerberus_batch_id: string (nullable = true) \|\|-- cerberus_id: string (nullable = true) \|\|-- couponId: integer (nullable = true) \|\|-- coupon_code: string (nullable = true) \|\|-- created: string (nullable = true) \|\|-- description: string (nullable = true) \|\|-- domain: string (nullable = true) \|\|-- expires: string (nullable = true) \|\|-- message_id: string (nullable = true) \|\|-- neverShowAfter: string (nullable = true) \|\|-- neverShowBefore: string (nullable = true) \|\|-- offerTitle: string (nullable = true) \|\|-- screenshots: array (nullable = true) \|\|\|-- element: string (containsNull = false) \|\|-- sm.result: struct (nullable = true) \|\|\|-- cerberus_batch_id: string (nullable = true) \|\|\|-- cerberus_id: string (nullable = true) \|\|\|-- code: string (nullable = true) \|\|\|-- couponId: integer (nullable = true) \|\|\|-- created: string (nullable = true) \|\|\|-- description: string (nullable = true) \|\|\|-- domain: string (nullable = true) \|\|\|-- expires: string (nullable = true) \|\|\|-- message_id: string (nullable = true) \|\|\|-- neverShowAfter: string (nullable = true) \|\|\|-- neverShowBefore: string (nullable = true) \|\|\|-- offerTitle: string (nullable = true) \|\|\|-- result: struct (nullable = true) \|\|\|\|-- post: struct (nullable = true) \|\|\|\|\|-- alchemy_out_of_stock: struct (nullable = true) \|\|\|\|\|\|-- ci: double (nullable = true) \|\|\|\|\|\|-- value: boolean (nullable = true) \|\|\|\|\|-- meta: struct (nullable = true) \|\|\|\|\|\|-- None_tx_value: array (nullable = true) \|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|\|-- exceptions: array (nullable = true) \|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|\|-- no_input_value: array (nullable = true) \|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|\|-- 
not_mapped: array (nullable = true) \|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|\|-- not_transformed: array (nullable = true) \|\|\|\|\|\|\|-- element: array (containsNull = false) \|\|\|\|\|\|\|\|-- element: string (containsNull = false) \|\|\|\|\|-- now_price_checkout: struct (nullable = true) \|\|\|\|\|\|-- ci: double (nullable = true) \|\|\|\|\|\|-- value: double (nullable = true) \|\|\|\|\|-- shipping_price: struct (nullable = true) \|\|\|\|\|\|-- ci: double
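For readers hitting the same error, one workaround sometimes suggested for dotted column names is to backtick-quote the field so the analyzer treats sm.result as a single column rather than a struct access on sm. This is an illustration only, reusing the cc and task_trace setup from the report; whether it works depends on the Spark SQL version, and it is not the fix tracked by this issue.
{code}
// Workaround sketch (version-dependent): backtick-quote the dotted field name.
val res = cc.sql(
  """SELECT received_datetime,
    |       task_body.cerberus_id,
    |       task_body.`sm.result`
    |FROM task_trace
    |WHERE task_id = 'fff7304e-9984-4b45-b10c-0423a96745ce'""".stripMargin)
{code}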
Re: Recent Spark test failures
Sorry premature send: The PR builder currently builds against Hadoop 2.3 https://github.com/apache/spark/blob/master/dev/run-tests#L54 We can set this to whatever we want. 2.2 might make sense since it's the default in our published artifacts. - Patrick On Fri, May 15, 2015 at 11:53 AM, Patrick Wendell pwend...@gmail.com wrote: The PR builder currently builds against Hadoop 2.3. - Patrick On Fri, May 15, 2015 at 11:40 AM, Marcelo Vanzin van...@cloudera.com wrote: Funny thing, since I asked this question in a PR a few minutes ago... Ignoring the rotation suggestion for a second, can the PR builder at least cover hadoop 2.2? That's the actual version used to create the official Spark artifacts for maven, and the oldest version Spark supports for YARN.. Kinda the same argument as the why do we build with java 7 when we support java 6 discussion we had recently. On Fri, May 15, 2015 at 11:34 AM, Ted Yu yuzhih...@gmail.com wrote: bq. would be prohibitive to build all configurations for every push Agreed. Can PR builder rotate testing against hadoop 2.3, 2.4, 2.6 and 2.7 (each test run still uses one hadoop profile) ? This way we would have some coverage for each of the major hadoop releases. Cheers On Fri, May 15, 2015 at 10:30 AM, Sean Owen so...@cloudera.com wrote: You all are looking only at the pull request builder. It just does one build to sanity-check a pull request, since that already takes 2 hours and would be prohibitive to build all configurations for every push. There is a different set of Jenkins jobs that periodically tests master against a lot more configurations, including Hadoop 2.4. On Fri, May 15, 2015 at 6:02 PM, Frederick R Reiss frre...@us.ibm.com wrote: The PR builder seems to be building against Hadoop 2.3. In the log for the most recent successful build ( https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32805/consoleFull ) I see: = Building Spark = [info] Compile with Hive 0.13.1 [info] Building Spark with these arguments: -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver ... = Running Spark unit tests = [info] Running Spark tests with these arguments: -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl test Is anyone testing individual pull requests against Hadoop 2.4 or 2.6 before the code is declared clean? Fred [image: Inactive hide details for Ted Yu ---05/15/2015 09:29:09 AM---Jenkins build against hadoop 2.4 has been unstable recently: https]Ted Yu ---05/15/2015 09:29:09 AM---Jenkins build against hadoop 2.4 has been unstable recently: https://amplab.cs.berkeley.edu/jenkins/ From: Ted Yu yuzhih...@gmail.com To: Andrew Or and...@databricks.com Cc: dev@spark.apache.org dev@spark.apache.org Date: 05/15/2015 09:29 AM Subject: Re: Recent Spark test failures -- Jenkins build against hadoop 2.4 has been unstable recently: *https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/* https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ I haven't found the test which hung / failed in recent Jenkins builds. But PR builder has several green builds lately: *https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/* https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/ Maybe PR builder doesn't build against hadoop 2.4 ? Cheers On Mon, May 11, 2015 at 1:11 PM, Ted Yu *yuzhih...@gmail.com* yuzhih...@gmail.com wrote: Makes sense. 
Having high determinism in these tests would make Jenkins build stable. On Mon, May 11, 2015 at 1:08 PM, Andrew Or *and...@databricks.com* and...@databricks.com wrote: Hi Ted, Yes, those two options can be useful, but in general I think the standard to set is that tests should never fail. It's actually the worst if tests fail sometimes but not others, because we can't reproduce them deterministically. Using -M and -A actually tolerates flaky tests to a certain extent, and I would prefer to instead increase the determinism in these tests. -Andrew 2015-05-08 17:56 GMT-07:00 Ted Yu *yuzhih...@gmail.com* yuzhih...@gmail.com: Andrew: Do you think the -M and -A options described here can be used in test runs ? *http://scalatest.org/user_guide/using_the_runner* http://scalatest.org/user_guide/using_the_runner Cheers On Wed, May 6, 2015 at 5:41 PM, Andrew
[jira] [Updated] (SPARK-6595) DataFrame self joins with MetastoreRelations fail
[ https://issues.apache.org/jira/browse/SPARK-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6595: --- Fix Version/s: 1.4.0 1.3.2 DataFrame self joins with MetastoreRelations fail - Key: SPARK-6595 URL: https://issues.apache.org/jira/browse/SPARK-6595 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: Michael Armbrust Assignee: Michael Armbrust Priority: Blocker Fix For: 1.3.2, 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544118#comment-14544118 ] Patrick Wendell commented on SPARK-4128: Thanks for bringing this back up [~srowen]. When you removed this I reached out but we discussed offline and concluded that in IDEA 14 maybe it wasn't necessary (because IIRC you had got it working without making these changes). But maybe it is still needed. Create instructions on fully building Spark in Intellij --- Key: SPARK-4128 URL: https://issues.apache.org/jira/browse/SPARK-4128 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Fix For: 1.2.0 With some of our more complicated modules, I'm not sure whether Intellij correctly understands all source locations. Also, we might require specifying some profiles for the build to work directly. We should document clearly how to start with vanilla Spark master and get the entire thing building in Intellij. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: How to link code pull request with JIRA ID?
Yeah, I wrote the original script and I intentionally made it easy for other projects to use (you'll just need to tweak some variables at the top). You just need somewhere to run it... we were using a Jenkins cluster to run it every 5 minutes. BTW - I looked and there is one instance where it hard-codes the string SPARK-, but that should be easy to change. I'm happy to review a patch that makes that prefix a variable. https://github.com/apache/spark/blob/master/dev/github_jira_sync.py#L71 - Patrick On Thu, May 14, 2015 at 8:45 AM, Josh Rosen rosenvi...@gmail.com wrote: Spark PRs didn't always handle the JIRA linking. We used to rely on a Jenkins job that ran https://github.com/apache/spark/blob/master/dev/github_jira_sync.py. We switched this over to Spark PRs at a time when the Jenkins GitHub Pull Request Builder plugin was having flakiness issues, but as far as I know that old script should still work. On Wed, May 13, 2015 at 9:40 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: There's no magic to it. We're doing the same, except Josh automated it in the PR dashboard he created. https://spark-prs.appspot.com/ Nick On Wed, May 13, 2015 at 6:20 PM Markus Weimer mar...@weimo.de wrote: Hi, how did you set this up? Over in the REEF incubation project, we painstakingly create the forwards- and backwards links despite having the IDs in the PR descriptions... Thanks! Markus On 2015-05-13 11:56, Ted Yu wrote: Subproject tag should follow SPARK JIRA number. e.g. [SPARK-5277][SQL] ... Cheers On Wed, May 13, 2015 at 11:50 AM, Stephen Boesch java...@gmail.com wrote: following up from Nicholas, it is [SPARK-12345] Your PR description where 12345 is the jira number. One thing I tend to forget is when/where to include the subproject tag e.g. [MLLIB] 2015-05-13 11:11 GMT-07:00 Nicholas Chammas nicholas.cham...@gmail.com : That happens automatically when you open a PR with the JIRA key in the PR title. On Wed, May 13, 2015 at 2:10 PM Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, I am new to open source contribution and trying to understand the process, starting from pulling code to uploading a patch. I have managed to pull code from GitHub. In JIRA I saw that each JIRA issue is connected with a pull request. I would like to know how people attach pull request details to a JIRA issue. Thanks, Chandrash3khar Kotekar Mobile - +91 8600011455 - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Resolved] (SPARK-7297) Make timeline more discoverable
[ https://issues.apache.org/jira/browse/SPARK-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7297. Resolution: Fixed Make timeline more discoverable --- Key: SPARK-7297 URL: https://issues.apache.org/jira/browse/SPARK-7297 Project: Spark Issue Type: Sub-task Components: Web UI Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker Currently there is a small drop down triangle. I showed this to many people and they said they couldn't easily find it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver
[ https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544290#comment-14544290 ] Patrick Wendell commented on SPARK-7563: /cc [~joshrosen] I think this is caused by the output committer change you worked on. Probably just a corner case here when executors die in the spark shell. OutputCommitCoordinator.stop() should only be executed in driver Key: SPARK-7563 URL: https://issues.apache.org/jira/browse/SPARK-7563 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo) Spark 1.3.1 Release Reporter: Hailong Wen I am from IBM Platform Symphony team and we are integrating Spark 1.3.1 with EGO (a resource management product). In EGO we uses fine-grained dynamic allocation policy, and each Executor will exit after its tasks are all done. When testing *spark-shell*, we find that when executor of first job exit, it will stop OutputCommitCoordinator, which result in all future jobs failing. Details are as follows: We got the following error in executor when submitting job in *spark-shell* the second time (the first job submission is successful): {noformat} 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator Exception in thread main akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), Path(/user/OutputCommitCoordinator)] at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65) at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59) at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74) at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110) at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73) at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267) at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89) at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {noformat} And in driver side, we see a log message telling that the OutputCommitCoordinator is stopped after the first submission: {noformat} 15/05/11 04:01:23 INFO spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped! {noformat} We examine the code of OutputCommitCoordinator, and find that executor will reuse the ref of driver's OutputCommitCoordinatorActor. So when an executor exits, it will eventually call SparkEnv.stop(): {noformat} private[spark] def stop() { isStopped = true pythonWorkers.foreach { case(key, worker) = worker.stop() } Option(httpFileServer).foreach(_.stop()) mapOutputTracker.stop() shuffleManager.stop() broadcastManager.stop() blockManager.stop
[jira] [Updated] (SPARK-7063) Update lz4 for Java 7 to avoid: when lz4 compression is used, it causes core dump
[ https://issues.apache.org/jira/browse/SPARK-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7063: --- Target Version/s: 1.5.0 (was: 2+) Update lz4 for Java 7 to avoid: when lz4 compression is used, it causes core dump - Key: SPARK-7063 URL: https://issues.apache.org/jira/browse/SPARK-7063 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Environment: IBM JDK Reporter: Jenny MA Priority: Minor this issue is initially noticed by using IBM JDK, below please find the stack track of this issue, caused by violating the rule in critical section. #0 0x00314340f3cb in raise () from /service/pmrs/45638/20/lib64/libpthread.so.0 #1 0x7f795b0323be in j9dump_create () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9prt27.so #2 0x7f795a88ba2a in doSystemDump () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so #3 0x7f795b0405d5 in j9sig_protect () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9prt27.so #4 0x7f795a88a1fd in runDumpFunction () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so #5 0x7f795a88dbab in runDumpAgent () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so #6 0x7f795a8a1c49 in triggerDumpAgents () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so #7 0x7f795a4518fe in doTracePoint () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9trc27.so #8 0x7f795a45210e in j9Trace () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9trc27.so #9 0x7f79590e46e1 in MM_StandardAccessBarrier::jniReleasePrimitiveArrayCritical(J9VMThread*, _jarray*, void*, int) () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9gc27.so #10 0x7f7938bc397c in Java_net_jpountz_lz4_LZ4JNI_LZ4_1compress_1limitedOutput () from /service/pmrs/45638/20/tmp/liblz4-java7155003924599399415.so #11 0x7f795b707149 in VMprJavaSendNative () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9vm27.so #12 0x in ?? () this is an issue introduced by a bug in net.jpountz.lz4.lz4-1.2.0.jar, and fixed in 1.3.0 version. Sun JDK /Open JDK doesn't complain this issue, but this issue will trigger assertion failure when IBM JDK is used. here is the link to the fix https://github.com/jpountz/lz4-java/commit/07229aa2f788229ab4f50379308297f428e3d2d2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
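Since the report pins the crash to lz4-java 1.2.0 and the linked commit ships in 1.3.0, the requested change is essentially a dependency bump. In sbt form it would look roughly like the line below; the exact build file and version to target are assumptions, and the real change would go through Spark's Maven poms.
{code}
// Dependency-bump sketch only; coordinates as named in the report.
libraryDependencies += "net.jpountz.lz4" % "lz4" % "1.3.0"
{code}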
[jira] [Commented] (SPARK-7063) Update lz4 for Java 7 to avoid: when lz4 compression is used, it causes core dump
[ https://issues.apache.org/jira/browse/SPARK-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544359#comment-14544359 ] Patrick Wendell commented on SPARK-7063: [~srowen] so I think maybe we can pull this into master now, given that we'll drop 1.6 in Spark 1.5 (?) Update lz4 for Java 7 to avoid: when lz4 compression is used, it causes core dump - Key: SPARK-7063 URL: https://issues.apache.org/jira/browse/SPARK-7063 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Environment: IBM JDK Reporter: Jenny MA Priority: Minor this issue is initially noticed by using IBM JDK, below please find the stack track of this issue, caused by violating the rule in critical section. #0 0x00314340f3cb in raise () from /service/pmrs/45638/20/lib64/libpthread.so.0 #1 0x7f795b0323be in j9dump_create () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9prt27.so #2 0x7f795a88ba2a in doSystemDump () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so #3 0x7f795b0405d5 in j9sig_protect () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9prt27.so #4 0x7f795a88a1fd in runDumpFunction () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so #5 0x7f795a88dbab in runDumpAgent () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so #6 0x7f795a8a1c49 in triggerDumpAgents () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so #7 0x7f795a4518fe in doTracePoint () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9trc27.so #8 0x7f795a45210e in j9Trace () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9trc27.so #9 0x7f79590e46e1 in MM_StandardAccessBarrier::jniReleasePrimitiveArrayCritical(J9VMThread*, _jarray*, void*, int) () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9gc27.so #10 0x7f7938bc397c in Java_net_jpountz_lz4_LZ4JNI_LZ4_1compress_1limitedOutput () from /service/pmrs/45638/20/tmp/liblz4-java7155003924599399415.so #11 0x7f795b707149 in VMprJavaSendNative () from /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9vm27.so #12 0x in ?? () this is an issue introduced by a bug in net.jpountz.lz4.lz4-1.2.0.jar, and fixed in 1.3.0 version. Sun JDK /Open JDK doesn't complain this issue, but this issue will trigger assertion failure when IBM JDK is used. here is the link to the fix https://github.com/jpountz/lz4-java/commit/07229aa2f788229ab4f50379308297f428e3d2d2 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7622) Test Jira
Patrick Wendell created SPARK-7622: -- Summary: Test Jira Key: SPARK-7622 URL: https://issues.apache.org/jira/browse/SPARK-7622 Project: Spark Issue Type: Bug Reporter: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7622) Test Jira
[ https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7622: Test Jira - Key: SPARK-7622 URL: https://issues.apache.org/jira/browse/SPARK-7622 Project: Spark Issue Type: Bug Reporter: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-7622) Test Jira
[ https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell closed SPARK-7622. -- Resolution: Invalid Test Jira - Key: SPARK-7622 URL: https://issues.apache.org/jira/browse/SPARK-7622 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [IMPORTANT] Committers please update merge script
Hi All - unfortunately the fix introduced another bug, which is that fixVersion was not updated properly. I've updated the script and had one other person test it. So committers please pull from master again thanks! - Patrick On Tue, May 12, 2015 at 6:25 PM, Patrick Wendell pwend...@gmail.com wrote: Due to an ASF infrastructure change (bug?) [1] the default JIRA resolution status has switched to Pending Closed. I've made a change to our merge script to coerce the correct status of Fixed when resolving [2]. Please upgrade the merge script to master. I've manually corrected JIRA's that were closed with the incorrect status. Let me know if you have any issues. [1] https://issues.apache.org/jira/browse/INFRA-9646 [2] https://github.com/apache/spark/commit/1b9e434b6c19f23a01e9875a3c1966cd03ce8e2d - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Resolved] (SPARK-7622) Test Jira
[ https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7622. Resolution: Invalid Test Jira - Key: SPARK-7622 URL: https://issues.apache.org/jira/browse/SPARK-7622 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7622) Test Jira
[ https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7622: Test Jira - Key: SPARK-7622 URL: https://issues.apache.org/jira/browse/SPARK-7622 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7531) Install GPG on Jenkins machines
[ https://issues.apache.org/jira/browse/SPARK-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7531: --- Fix Version/s: 1.4.0 Install GPG on Jenkins machines --- Key: SPARK-7531 URL: https://issues.apache.org/jira/browse/SPARK-7531 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: shane knapp Fix For: 1.4.0 This one is also required for us to cut regular snapshot releases from Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Change for submitting to yarn in 1.3.1
Hey Chester, Thanks for sending this. It's very helpful to have this list. The reason we made the Client API private was that it was never intended to be used by third parties programmatically and we don't intend to support it in its current form as a stable API. We thought the fact that it was for internal use would be obvious since it accepts arguments as a string array of CL args. It was always intended for command line use, and the stable API was the command line. When we migrated the Launcher library we figured we covered most of the use cases on the off chance someone was using the Client. It appears we regressed one feature, which was a clean way to get the app ID. The items you list here (2-6) all seem like new feature requests rather than a regression caused by us making that API private. I think the way to move forward is for someone to design a proper long-term stable API for the things you mentioned here. That could be done, for example, by extending the Launcher library. Marcelo would be a natural fit to help with this effort since he was heavily involved in both YARN support and the launcher, so I'm curious to hear his opinion on how best to move forward. I do see how apps that run Spark would benefit from having a control plane for querying status, both on YARN and elsewhere. - Patrick On Wed, May 13, 2015 at 5:44 AM, Chester At Work ches...@alpinenow.com wrote: Patrick There are several things we need, some of them already mentioned in the mailing list before. I haven't looked at the SparkLauncher code, but here are a few things we need, from our perspective, for the Spark Yarn Client: 1) The Client should not be private (unless an alternative is provided) so we can call it directly. 2) We need a way to stop a running yarn app programmatically (the PR is already submitted). 3) Before we start the spark job, we should have a callback to the application which provides the yarn container capacity (number of cores and max memory), so the spark program will not set values beyond the maximums (PR submitted). 4) The callback could be in the form of yarn app listeners, which are invoked on yarn status changes (start, in progress, failure, complete, etc.), so the application can react to these events (in the PR). 5) The yarn client passes arguments to the spark program via the main program's arguments; we have experienced problems when passing a very large argument due to the length limit. For example, we serialize the arguments as JSON, encode them, and then parse them back out of the arguments. For wide-column datasets we will run into the limit, so an alternative way of passing additional, larger arguments is needed. We are experimenting with passing the args via an established akka messaging channel. 6) The spark yarn client in yarn-cluster mode right now is essentially a batch job with no communication once it is launched. We need to establish a communication channel so that logs, errors, status updates, progress bars, execution stages, etc. can be displayed on the application side. We added an akka communication channel for this (working on a PR). Combined with the other items in this list, we are able to redirect print and error statements to the application log (outside of the hadoop cluster) and show a Spark-UI-equivalent progress bar via a spark listener. We can show yarn progress via a yarn app listener before spark starts, and status can be updated during job execution. We are also experimenting with long-running jobs that issue additional spark commands and interactions via this channel. 
Chester Sent from my iPad On May 12, 2015, at 20:54, Patrick Wendell pwend...@gmail.com wrote: Hey Kevin and Ron, So is the main shortcoming of the launcher library the inability to get an app ID back from YARN? Or are there other issues here that fundamentally regress things for you? It seems like adding a way to get back the appID would be a reasonable addition to the launcher. - Patrick On Tue, May 12, 2015 at 12:51 PM, Marcelo Vanzin van...@cloudera.com wrote: On Tue, May 12, 2015 at 11:34 AM, Kevin Markey kevin.mar...@oracle.com wrote: I understand that SparkLauncher was supposed to address these issues, but it really doesn't. Yarn already provides indirection and an arm's length transaction for starting Spark on a cluster. The launcher introduces yet another layer of indirection and dissociates the Yarn Client from the application that launches it. Well, not fully. The launcher was supposed to solve how to launch a Spark app programmatically, but in the first version nothing was added to actually gather information about the running app. It's also limited in the way it works because of Spark's limitations (one context per JVM, etc). Still, adding things like this is something that is definitely in the scope for the launcher library; information such as app id can be useful for the code launching the app, not just in yarn mode. We just
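To make the discussion concrete, here is a rough sketch of what programmatic submission through the launcher library looks like in Spark 1.3/1.4. The jar path, main class, and configuration values are made-up examples; the point is that the API only returns a plain java.lang.Process, so there is no handle for retrieving the YARN application ID or status, which is the gap discussed above.
{code}
import org.apache.spark.launcher.SparkLauncher

// Hypothetical app jar and main class.
val process: Process = new SparkLauncher()
  .setAppResource("/path/to/my-app.jar")
  .setMainClass("com.example.MyApp")
  .setMaster("yarn-cluster")
  .setConf("spark.executor.memory", "2g")
  .addAppArgs("arg1", "arg2")
  .launch()

// All we can do with the result is wait for the child process to exit;
// there is no way to ask it for the YARN app ID or job progress.
val exitCode = process.waitFor()
{code}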
[jira] [Updated] (SPARK-7622) Test Jira
[ https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7622: --- Fix Version/s: (was: 1.6.0) Test Jira - Key: SPARK-7622 URL: https://issues.apache.org/jira/browse/SPARK-7622 Project: Spark Issue Type: Bug Reporter: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7622) Test Jira
[ https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7622: Test Jira - Key: SPARK-7622 URL: https://issues.apache.org/jira/browse/SPARK-7622 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6568) spark-shell.cmd --jars option does not accept the jar that has space in its path
[ https://issues.apache.org/jira/browse/SPARK-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6568: --- Fix Version/s: 1.4.0 spark-shell.cmd --jars option does not accept the jar that has space in its path Key: SPARK-6568 URL: https://issues.apache.org/jira/browse/SPARK-6568 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.3.0 Environment: Windows 8.1 Reporter: Masayoshi TSUZUKI Assignee: Masayoshi TSUZUKI Fix For: 1.4.0 spark-shell.cmd --jars option does not accept the jar that has space in its path. The path of a jar sometimes contains spaces on Windows. {code} bin\spark-shell.cmd --jars C:\Program Files\some\jar1.jar {code} This gets: {code} Exception in thread main java.net.URISyntaxException: Illegal character in path at index 10: C:/Program Files/some/jar1.jar {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
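As an illustration (this is not the actual fix in the PR), the exception comes from treating the raw path as a URI; converting through java.io.File first percent-encodes the space, assuming a Windows-style path like the one above:
{code}
import java.io.File
import java.net.URI

val path = "C:/Program Files/some/jar1.jar"

// Passing the raw path straight to URI fails because a space is an illegal
// character in a URI -- this reproduces the error above:
// new URI(path)  // java.net.URISyntaxException: Illegal character in path at index 10

// Converting through File first escapes the space (output shown for Windows).
val uri: URI = new File(path).toURI
println(uri)  // file:/C:/Program%20Files/some/jar1.jar
{code}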
[jira] [Reopened] (SPARK-7561) Install Junit Attachment Plugin on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7561: Install Junit Attachment Plugin on Jenkins -- Key: SPARK-7561 URL: https://issues.apache.org/jira/browse/SPARK-7561 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: shane knapp Fix For: 1.4.0 As part of SPARK-7560 I'd like to just attach the test output file to the Jenkins build. This is nicer than requiring someone have an SSH login to the master node. Currently we gzip the logs, copy it to the master, and then delete them on the worker. https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L132 Instead I think we can just gzip them and then have the attachment plugin add them to the build. But it would require installing this plug-in to see if we can get it working. [~shaneknapp] not sure how willing you are to install plug-ins on Jenkins, but this one would be awesome if it's doable and we can get it working. https://wiki.jenkins-ci.org/display/JENKINS/JUnit+Attachments+Plugin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7526) Specify ip of RBackend, MonitorServer and RRDD Socket server
[ https://issues.apache.org/jira/browse/SPARK-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7526: --- Fix Version/s: 1.4.0 Specify ip of RBackend, MonitorServer and RRDD Socket server Key: SPARK-7526 URL: https://issues.apache.org/jira/browse/SPARK-7526 Project: Spark Issue Type: Improvement Components: SparkR Reporter: Weizhong Assignee: Weizhong Priority: Minor Fix For: 1.4.0 These R processes are only used to communicate with the JVM process locally, so binding to localhost is more reasonable than a wildcard IP. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
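As a generic illustration of the point (this is not the SparkR code itself), binding a server socket to the loopback address rather than the wildcard address keeps it unreachable from other hosts:
{code}
import java.net.{InetAddress, ServerSocket}

// Wildcard bind: listens on all interfaces, so other machines could connect.
val wildcard = new ServerSocket(0)

// Loopback-only bind: only processes on the same host (such as the local JVM
// and R processes) can connect, which is all these servers need.
val loopbackOnly = new ServerSocket(0, 50, InetAddress.getByName("localhost"))

wildcard.close()
loopbackOnly.close()
{code}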
[jira] [Updated] (SPARK-7303) push down project if possible when the child is sort
[ https://issues.apache.org/jira/browse/SPARK-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7303: --- Fix Version/s: 1.4.0 push down project if possible when the child is sort Key: SPARK-7303 URL: https://issues.apache.org/jira/browse/SPARK-7303 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.3.1 Reporter: Fei Wang Fix For: 1.4.0 Optimize the case of `project(_, sort)` , a example is: `select key from (select * from testData order by key) t` optimize it from ``` == Parsed Logical Plan == 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Project [key#0] Sort [key#0 ASC], true LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Project [key#0] Sort [key#0 ASC], true Exchange (RangePartitioning [key#0 ASC], 5), [] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] ``` to ``` == Parsed Logical Plan == 'Project ['key] 'Subquery t 'Sort ['key ASC], true 'Project [*] 'UnresolvedRelation [testData], None == Analyzed Logical Plan == Project [key#0] Subquery t Sort [key#0 ASC], true Project [key#0,value#1] Subquery testData LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Optimized Logical Plan == Sort [key#0 ASC], true Project [key#0] LogicalRDD [key#0,value#1], MapPartitionsRDD[1] == Physical Plan == Sort [key#0 ASC], true Exchange (RangePartitioning [key#0 ASC], 5), [] Project [key#0] PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
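For reference, a short sketch of how the plans quoted above can be inspected from a Spark shell, assuming a SQLContext and a registered temp table named testData with key and value columns:
{code}
// Assumes `sqlContext` and a temp table "testData" with columns (key, value).
val df = sqlContext.sql(
  "select key from (select * from testData order by key) t")

// Prints the logical, analyzed, optimized, and physical plans, i.e. the
// output quoted in the description above.
println(df.queryExecution.logical)
println(df.queryExecution.analyzed)
println(df.queryExecution.optimizedPlan)
println(df.queryExecution.executedPlan)
{code}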
[jira] [Updated] (SPARK-7601) Support Insert into JDBC Datasource
[ https://issues.apache.org/jira/browse/SPARK-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7601: --- Fix Version/s: 1.4.0 Support Insert into JDBC Datasource --- Key: SPARK-7601 URL: https://issues.apache.org/jira/browse/SPARK-7601 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.1 Reporter: Venkata Ramana G Fix For: 1.4.0 Support Insert into JDBCDataSource. Following are usage examples {code} sqlContext.sql( s |CREATE TEMPORARY TABLE testram1 |USING org.apache.spark.sql.jdbc |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver') .stripMargin.replaceAll(\n, )) sqlContext.sql(insert into table testram1 select * from testsrc).show {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
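As a possible companion to the SQL example above, the DataFrame side of appending to a JDBC table in Spark 1.3/1.4 looks roughly like this; the URL, table names, and the assumption that insertIntoJDBC is the right entry point here are mine, not from the JIRA:
{code}
// Hypothetical example: append the rows of a DataFrame to a JDBC table,
// assuming `sqlContext`, a target table "testram1", and a temp table "testsrc".
val url = "jdbc:h2:mem:testdb;user=xx;password=xx"
val df = sqlContext.sql("select * from testsrc")
df.insertIntoJDBC(url, "testram1", false)  // overwrite = false, i.e. append
{code}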
[jira] [Updated] (SPARK-7482) Rename some DataFrame API methods in SparkR to match their counterparts in Scala
[ https://issues.apache.org/jira/browse/SPARK-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7482: --- Fix Version/s: 1.4.0 Rename some DataFrame API methods in SparkR to match their counterparts in Scala Key: SPARK-7482 URL: https://issues.apache.org/jira/browse/SPARK-7482 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 1.4.0 Reporter: Sun Rui Assignee: Sun Rui Priority: Critical Fix For: 1.4.0 This is a re-consideration of how to solve name conflicts. Previously, we renamed API names from the Scala API if there was a name conflict with base or other commonly-used packages. However, from a long-term perspective, this is not good for API stability, because we can't predict name conflicts; for example, what if in the future a name added to the base package conflicts with an API in SparkR? So the better policy is to keep API names the same as Scala's without worrying about name conflicts. When users use SparkR, they should load SparkR as the last package, so that all of its API names are effective. Users can explicitly use :: to refer to hidden names from other packages. More discussion can be found at https://issues.apache.org/jira/browse/SPARK-6812 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7589) Make Input Rate in the Streaming page consistent with other pages
[ https://issues.apache.org/jira/browse/SPARK-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7589: --- Component/s: Streaming Make Input Rate in the Streaming page consistent with other pages --- Key: SPARK-7589 URL: https://issues.apache.org/jira/browse/SPARK-7589 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Shixiong Zhu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7597) Make default doc build avoid search engine indexing
Patrick Wendell created SPARK-7597: -- Summary: Make default doc build avoid search engine indexing Key: SPARK-7597 URL: https://issues.apache.org/jira/browse/SPARK-7597 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: Patrick Wendell By default we should add the necessary headers to avoid indexing. This will keep randomly hosted personal docs, for instance nightly doc builds, from getting indexed. We should gate this behind the PRODUCTION flag. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Adding/Using More Resolution Types on JIRA
In Spark we sometimes close issues as something other than Fixed, and this is an important part of maintaining our JIRA. The current resolution types we use are the following: Won't Fix - bug fix or (more often) feature we don't want to add Invalid - issue is underspecified or not appropriate for a JIRA issue Duplicate - duplicate of another JIRA Cannot Reproduce - bug that could not be reproduced Not A Problem - issue purports to represent a bug, but does not I would like to propose adding a few new resolutions. This will require modifying the ASF JIRA, but infra said they are open to proposals as long as they are considered of broad interest. My issue with the current set of resolutions is that Won't Fix is a big catch-all we use for many different things. Most often it's used for things that aren't even bugs, even though it has Fix in the name. I'm proposing adding: Inactive - A feature or bug that has had no activity from users or developers in a long time Out of Scope - A feature proposal that is not in scope given the project's goals Later - A feature not on the immediate roadmap, but potentially of interest longer term (this one already exists, I'm just proposing to start using it) I am in no way proposing changes to the decision-making model around JIRAs, notably that it is consensus-based and that all resolutions are considered tentative and fully reversible. The benefits I see of this change would be the following: 1. Inactive: A way to clear out inactive/dead JIRAs without indicating that a decision has been made one way or the other. 2. Out of Scope: It more clearly explains closing out-of-scope features than the generic Won't Fix. It also makes it clearer to future contributors what is considered in scope for Spark. 3. Later: A way to signal that issues aren't targeted for a near-term version. This would help avoid the mess we have now of 200+ issues targeted at each version, with target version being a very bad indicator of the actual roadmap. An alternative on this one is to have a version called Later or Parking Lot but not close the issues. Any thoughts on this? - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Created] (SPARK-7561) Install Junit Attachment Plugin on Jenkins
Patrick Wendell created SPARK-7561: -- Summary: Install Junit Attachment Plugin on Jenkins Key: SPARK-7561 URL: https://issues.apache.org/jira/browse/SPARK-7561 Project: Spark Issue Type: Sub-task Components: Project Infra Reporter: Patrick Wendell Assignee: shane knapp As part of SPARK-7560 I'd like to just attach the test output file to the Jenkins build. This is nicer than requiring someone have an SSH login to the master node. Currently we gzip the logs, copy it to the master, and then delete them on the worker. https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L132 Instead I think we can just gzip them and then have the attachment plugin add them to the build. But it would require installing this plug-in to see if we can get it working. [~shaneknapp] not sure how willing you are to install plug-ins on Jenkins, but this one would be awesome if it's doable and we can get it working. https://wiki.jenkins-ci.org/display/JENKINS/JUnit+Attachments+Plugin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7558) Log test name when starting and finishing each test
Patrick Wendell created SPARK-7558: -- Summary: Log test name when starting and finishing each test Key: SPARK-7558 URL: https://issues.apache.org/jira/browse/SPARK-7558 Project: Spark Issue Type: Improvement Components: Tests Reporter: Patrick Wendell Assignee: Andrew Or Right now it's really tough to interpret testing output because logs for different tests are interspersed in the same unit-tests.log file. This makes it particularly hard to diagnose flaky tests. This would be much easier if we logged the test name before and after every test (e.g. Starting test XX, Finished test XX). Then you could get right to the logs. I think one way to do this might be to create a custom test fixture that logs the test class name and then mix that into every test suite /cc [~joshrosen] for his superb knowledge of Scalatest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
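One possible shape for such a fixture, sketched with ScalaTest's stackable trait pattern; the trait name and log format are made up, and this is only one way it could be implemented:
{code}
import org.scalatest.{Outcome, Suite, SuiteMixin}
import org.slf4j.LoggerFactory

// Hypothetical mix-in: logs a marker before and after every test so entries
// in unit-tests.log can be attributed to the test that produced them.
trait LogTestName extends SuiteMixin { this: Suite =>
  private val log = LoggerFactory.getLogger(this.getClass)

  abstract override def withFixture(test: NoArgTest): Outcome = {
    val testName = s"${this.getClass.getSimpleName}: '${test.name}'"
    log.info(s"===== Starting test $testName =====")
    try {
      super.withFixture(test)
    } finally {
      log.info(s"===== Finished test $testName =====")
    }
  }
}

// Usage: class MySuite extends FunSuite with LogTestName { ... }
{code}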
[jira] [Created] (SPARK-7560) Make flaky tests easier to debug
Patrick Wendell created SPARK-7560: -- Summary: Make flaky tests easier to debug Key: SPARK-7560 URL: https://issues.apache.org/jira/browse/SPARK-7560 Project: Spark Issue Type: New Feature Components: Project Infra, Tests Reporter: Patrick Wendell Right now it's really hard for people to even get the logs from a flakey test. Once you get the logs, it's very difficult to figure out what logs are associated with what tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7558) Log test name when starting and finishing each test
[ https://issues.apache.org/jira/browse/SPARK-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7558: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-7560 Log test name when starting and finishing each test --- Key: SPARK-7558 URL: https://issues.apache.org/jira/browse/SPARK-7558 Project: Spark Issue Type: Sub-task Components: Tests Reporter: Patrick Wendell Assignee: Andrew Or Right now it's really tough to interpret testing output because logs for different tests are interspersed in the same unit-tests.log file. This makes it particularly hard to diagnose flaky tests. This would be much easier if we logged the test name before and after every test (e.g. Starting test XX, Finished test XX). Then you could get right to the logs. I think one way to do this might be to create a custom test fixture that logs the test class name and then mix that into every test suite /cc [~joshrosen] for his superb knowledge of Scalatest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7590) Test Issue to Debug JIRA Problem
Patrick Wendell created SPARK-7590: -- Summary: Test Issue to Debug JIRA Problem Key: SPARK-7590 URL: https://issues.apache.org/jira/browse/SPARK-7590 Project: Spark Issue Type: Bug Reporter: Patrick Wendell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7590) Test Issue to Debug JIRA Problem
[ https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7590. Resolution: Fixed Issue resolved by pull request 5426 [https://github.com/apache/spark/pull/5426] Test Issue to Debug JIRA Problem Key: SPARK-7590 URL: https://issues.apache.org/jira/browse/SPARK-7590 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7590) Test Issue to Debug JIRA Problem
[ https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7590: Test Issue to Debug JIRA Problem Key: SPARK-7590 URL: https://issues.apache.org/jira/browse/SPARK-7590 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7592) Resolution set to Pending Closed when using PR merge script
Patrick Wendell created SPARK-7592: -- Summary: Resolution set to Pending Closed when using PR merge script Key: SPARK-7592 URL: https://issues.apache.org/jira/browse/SPARK-7592 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker I noticed this was happening. The issue is that the behavior of the ASF JIRA silently changed. Now when the Resolve Issue transition occurs, the default resolution is Pending Closed. We used to count on the default behavior being to set the resolution as Fixed. The solution is to explicitly set the resolution as Fixed and not count on default behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[IMPORTANT] Committers please update merge script
Due to an ASF infrastructure change (bug?) [1] the default JIRA resolution status has switched to Pending Closed. I've made a change to our merge script to coerce the correct status of Fixed when resolving [2]. Please upgrade the merge script to master. I've manually corrected JIRA's that were closed with the incorrect status. Let me know if you have any issues. [1] https://issues.apache.org/jira/browse/INFRA-9646 [2] https://github.com/apache/spark/commit/1b9e434b6c19f23a01e9875a3c1966cd03ce8e2d - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Reopened] (SPARK-7590) Test Issue to Debug JIRA Problem
[ https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7590: Test Issue to Debug JIRA Problem Key: SPARK-7590 URL: https://issues.apache.org/jira/browse/SPARK-7590 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7590) Test Issue to Debug JIRA Problem
[ https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7590: Test Issue to Debug JIRA Problem Key: SPARK-7590 URL: https://issues.apache.org/jira/browse/SPARK-7590 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7590) Test Issue to Debug JIRA Problem
[ https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7590. Resolution: Pending Closed Issue resolved by pull request 5426 [https://github.com/apache/spark/pull/5426] Test Issue to Debug JIRA Problem Key: SPARK-7590 URL: https://issues.apache.org/jira/browse/SPARK-7590 Project: Spark Issue Type: Bug Reporter: Patrick Wendell Fix For: 1.6.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-5182) Partitioning support for tables created by the data source API
[ https://issues.apache.org/jira/browse/SPARK-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-5182: Partitioning support for tables created by the data source API -- Key: SPARK-5182 URL: https://issues.apache.org/jira/browse/SPARK-5182 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Assignee: Cheng Lian Priority: Blocker Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-6876) DataFrame.na.replace value support for Python
[ https://issues.apache.org/jira/browse/SPARK-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-6876. Resolution: Fixed DataFrame.na.replace value support for Python - Key: SPARK-6876 URL: https://issues.apache.org/jira/browse/SPARK-6876 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Adrian Wang Fix For: 1.4.0 Scala/Java support is in. We should provide the Python version, similar to what Pandas supports. http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.replace.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7435) Make DataFrame.show() consistent with that of Scala and pySpark
[ https://issues.apache.org/jira/browse/SPARK-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7435: Make DataFrame.show() consistent with that of Scala and pySpark --- Key: SPARK-7435 URL: https://issues.apache.org/jira/browse/SPARK-7435 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 1.4.0 Reporter: Sun Rui Assignee: Rekha Joshi Priority: Critical Fix For: 1.4.0 Currently in SparkR, DataFrame has two methods show() and showDF(). show() prints the DataFrame column names and types and showDF() prints the first numRows rows of a DataFrame. In Scala and pySpark, show() is used to print rows of a DataFrame. We'd better keep the API consistent unless there is some important reason not to, so I propose to interchange the names (show() and showDF()) in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-7435) Make DataFrame.show() consistent with that of Scala and pySpark
[ https://issues.apache.org/jira/browse/SPARK-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-7435. Resolution: Fixed Make DataFrame.show() consistent with that of Scala and pySpark --- Key: SPARK-7435 URL: https://issues.apache.org/jira/browse/SPARK-7435 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 1.4.0 Reporter: Sun Rui Assignee: Rekha Joshi Priority: Critical Fix For: 1.4.0 Currently in SparkR, DataFrame has two methods show() and showDF(). show() prints the DataFrame column names and types and showDF() prints the first numRows rows of a DataFrame. In Scala and pySpark, show() is used to print rows of a DataFrame. We'd better keep the API consistent unless there is some important reason not to, so I propose to interchange the names (show() and showDF()) in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5182) Partitioning support for tables created by the data source API
[ https://issues.apache.org/jira/browse/SPARK-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5182. Resolution: Fixed Partitioning support for tables created by the data source API -- Key: SPARK-5182 URL: https://issues.apache.org/jira/browse/SPARK-5182 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Yin Huai Assignee: Cheng Lian Priority: Blocker Fix For: 1.4.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-7534) Fix the Stage table when a stage is missing
[ https://issues.apache.org/jira/browse/SPARK-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell reopened SPARK-7534: Fix the Stage table when a stage is missing --- Key: SPARK-7534 URL: https://issues.apache.org/jira/browse/SPARK-7534 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Shixiong Zhu Assignee: Shixiong Zhu Priority: Minor Fix For: 1.4.0 Just improved the Stage table when a stage is missing. Please see the screenshots in https://github.com/apache/spark/pull/6061 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org