[jira] [Updated] (SPARK-7820) Java8-tests suite compile error under SBT

2015-05-23 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7820:
---
Priority: Critical  (was: Blocker)

 Java8-tests suite compile error under SBT
 -

 Key: SPARK-7820
 URL: https://issues.apache.org/jira/browse/SPARK-7820
 Project: Spark
  Issue Type: Bug
  Components: Build, Streaming
Affects Versions: 1.4.0
Reporter: Saisai Shao
Priority: Critical

 Lots of compilation errors are shown when the Java 8 test suite is enabled in SBT:
 {{JAVA_HOME=/usr/java/jdk1.8.0_45 ./sbt/sbt -Pyarn -Phadoop-2.4 
 -Dhadoop.version=2.6.0 -Pjava8-tests}}
 {code}
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:43:
  error: cannot find symbol
 [error] public class Java8APISuite extends LocalJavaStreamingContext 
 implements Serializable {
 [error]^
 [error]   symbol: class LocalJavaStreamingContext
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55:
  error: cannot find symbol
 [error] JavaDStream<String> stream = 
 JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
 [error]  ^
 [error]   symbol:   variable ssc
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55:
  error: cannot find symbol
 [error] JavaDStream<String> stream = 
 JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
 [error]  ^
 [error]   symbol:   variable JavaTestUtils
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:57:
  error: cannot find symbol
 [error] JavaTestUtils.attachTestOutputStream(letterCount);
 [error] ^
 [error]   symbol:   variable JavaTestUtils
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58:
  error: cannot find symbol
 [error] List<List<Integer>> result = JavaTestUtils.runStreams(ssc, 2, 2);
 [error]   ^
 [error]   symbol:   variable ssc
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58:
  error: cannot find symbol
 [error] List<List<Integer>> result = JavaTestUtils.runStreams(ssc, 2, 2);
 [error]  ^
 [error]   symbol:   variable JavaTestUtils
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:73:
  error: cannot find symbol
 [error] JavaDStream<String> stream = 
 JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
 [error]  ^
 [error]   symbol:   variable ssc
 [error]   location: class Java8APISuite
 {code}
 The class {{Java8APISuite}} relies on {{LocalJavaStreamingContext}}, which 
 lives in the streaming test jar. Maven compilation is fine, since Maven 
 generates the test jar, but SBT test compilation fails because SBT does not 
 generate test jars by default.
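 A minimal sketch of one SBT-side approach, assuming the module names below 
 (streaming, java8-tests) and that the build can declare a test->test 
 dependency; this is illustrative, not the actual SparkBuild change:
 {code}
 // Hypothetical sbt fragment: put streaming's test classes
 // (LocalJavaStreamingContext, JavaTestUtils) on java8-tests' test classpath
 // without needing a published test jar.
 lazy val streaming = project in file("streaming")

 lazy val java8Tests = (project in file("extras/java8-tests"))
   .dependsOn(streaming % "test->test")
 {code}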



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7820) Java8-tests suite compile error under SBT

2015-05-23 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557581#comment-14557581
 ] 

Patrick Wendell commented on SPARK-7820:


Since this only affects tests I'm de-escalating it, but I'd like to see it 
fixed as well before 1.4.0 ships if possible.


 Java8-tests suite compile error under SBT
 -

 Key: SPARK-7820
 URL: https://issues.apache.org/jira/browse/SPARK-7820
 Project: Spark
  Issue Type: Bug
  Components: Build, Streaming
Affects Versions: 1.4.0
Reporter: Saisai Shao
Priority: Critical

 Lots of compilation errors are shown when the Java 8 test suite is enabled in SBT:
 {{JAVA_HOME=/usr/java/jdk1.8.0_45 ./sbt/sbt -Pyarn -Phadoop-2.4 
 -Dhadoop.version=2.6.0 -Pjava8-tests}}
 {code}
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:43:
  error: cannot find symbol
 [error] public class Java8APISuite extends LocalJavaStreamingContext 
 implements Serializable {
 [error]^
 [error]   symbol: class LocalJavaStreamingContext
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55:
  error: cannot find symbol
 [error] JavaDStream<String> stream = 
 JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
 [error]  ^
 [error]   symbol:   variable ssc
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:55:
  error: cannot find symbol
 [error] JavaDStream<String> stream = 
 JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
 [error]  ^
 [error]   symbol:   variable JavaTestUtils
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:57:
  error: cannot find symbol
 [error] JavaTestUtils.attachTestOutputStream(letterCount);
 [error] ^
 [error]   symbol:   variable JavaTestUtils
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58:
  error: cannot find symbol
 [error] List<List<Integer>> result = JavaTestUtils.runStreams(ssc, 2, 2);
 [error]   ^
 [error]   symbol:   variable ssc
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:58:
  error: cannot find symbol
 [error] List<List<Integer>> result = JavaTestUtils.runStreams(ssc, 2, 2);
 [error]  ^
 [error]   symbol:   variable JavaTestUtils
 [error]   location: class Java8APISuite
 [error] 
 /mnt/data/project/apache-spark/extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java:73:
  error: cannot find symbol
 [error] JavaDStream<String> stream = 
 JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
 [error]  ^
 [error]   symbol:   variable ssc
 [error]   location: class Java8APISuite
 {code}
 The class {{Java8APISuite}} relies on {{LocalJavaStreamingContext}}, which 
 lives in the streaming test jar. Maven compilation is fine, since Maven 
 generates the test jar, but SBT test compilation fails because SBT does not 
 generate test jars by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7287) Flaky test: o.a.s.deploy.SparkSubmitSuite --packages

2015-05-23 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557610#comment-14557610
 ] 

Patrick Wendell commented on SPARK-7287:


[~brkyvz] I am going to disable this test again; it is still failing even after 
SPARK-7224:

https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.4-Maven-pre-YARN/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/235/testReport/junit/org.apache.spark.deploy/SparkSubmitSuite/includes_jars_passed_in_through___packages/

 Flaky test: o.a.s.deploy.SparkSubmitSuite --packages
 

 Key: SPARK-7287
 URL: https://issues.apache.org/jira/browse/SPARK-7287
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Burak Yavuz
Priority: Critical
  Labels: flaky-test

 Error message was not helpful (did not complete within 60 seconds or 
 something).
 Observed only in master:
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/2239/
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/2238/
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2163/
 ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: spark packages

2015-05-23 Thread Patrick Wendell
Yes - Spark packages can include non-ASF licenses.

On Sat, May 23, 2015 at 6:16 PM, Debasish Das debasish.da...@gmail.com wrote:
 Hi,

 Is it possible to add GPL/LGPL code to Spark packages, or must it be licensed
 under Apache as well?

 I want to expose Professor Tim Davis's LGPL library for sparse algebra and
 the ECOS GPL library through the package.

 Thanks.
 Deb

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-7807) High-Availability:: SparkHadoopUtil.scala should support hadoopConfiguration.addResource()

2015-05-22 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7807:
---
Component/s: Spark Core

 High-Availability:: SparkHadoopUtil.scala should support  
 hadoopConfiguration.addResource()
 --

 Key: SPARK-7807
 URL: https://issues.apache.org/jira/browse/SPARK-7807
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
 Environment: running Spark against a remote Hadoop HA cluster. Ease of 
 use with the spark.hadoop.url. prefix.
 1) users can supply SparkConf entries with the spark.hadoop.url. prefix, like 
 spark.hadoop.url.core-site 
 and spark.hadoop.url.hdfs-site 
Reporter: Norman He
Priority: Trivial
  Labels: easyfix

 Line 97: the block below should be able to change from
 conf.getAll.foreach { case (key, value) =>
   if (key.startsWith("spark.hadoop.")) {
     hadoopConf.set(key.substring("spark.hadoop.".length), value)
   }
 }
 to the new version:
 conf.getAll.foreach { case (key, value) =>
   if (key.startsWith("spark.hadoop.")) {
     if (key.startsWith("spark.hadoop.url.")) {
       hadoopConf.addResource(new URL(value))
     } else {
       hadoopConf.set(key.substring("spark.hadoop.".length), value)
     }
   }
 }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Patrick Wendell
Thanks Andrew, the doc issue should be fixed in RC2 (if not, please
chime in!). R was missing in the build environment.

- Patrick

On Fri, May 22, 2015 at 3:33 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 Thanks for catching this. I'll check with Patrick to see why the R API docs
 are not getting included.

 On Fri, May 22, 2015 at 2:44 PM, Andrew Psaltis psaltis.and...@gmail.com
 wrote:

 All,
 Should all the docs work from
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ ? If so the R API
 docs 404.


 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found
 at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org






-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Created] (SPARK-7805) Move SQLTestUtils.scala from src/main

2015-05-21 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7805:
--

 Summary: Move SQLTestUtils.scala from src/main
 Key: SPARK-7805
 URL: https://issues.apache.org/jira/browse/SPARK-7805
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Patrick Wendell
Assignee: Yin Huai
Priority: Critical


These trigger binary compatibility issues when changed. In general we shouldn't 
be putting test code in src/main. If it's needed by multiple modules, IIRC we 
have a way to do that (look elsewhere in Spark).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7771) Dynamic allocation: lower timeouts further

2015-05-21 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7771:
---
Issue Type: Improvement  (was: Bug)

 Dynamic allocation: lower timeouts further
 --

 Key: SPARK-7771
 URL: https://issues.apache.org/jira/browse/SPARK-7771
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, YARN
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Andrew Or

 While testing, I found that the existing timeouts of 5s to add and 600s to 
 remove are still too high for many workloads.
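 A hedged illustration of lowering the two timeouts discussed above (the key 
 names are the dynamic-allocation settings; the lower values are example 
 choices, not the proposed defaults):
 {code}
 // Example SparkConf tuning for faster executor add/remove:
 val conf = new org.apache.spark.SparkConf()
   .set("spark.dynamicAllocation.enabled", "true")
   // request executors sooner once tasks back up (default discussed: 5s)
   .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
   // release idle executors sooner (default discussed: 600s)
   .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
 {code}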



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7801) Upgrade master versions to Spark 1.5.0

2015-05-21 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7801:
---
Issue Type: Improvement  (was: Bug)

 Upgrade master versions to Spark 1.5.0
 --

 Key: SPARK-7801
 URL: https://issues.apache.org/jira/browse/SPARK-7801
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7803) Update MIMA for Spark 1.5.0

2015-05-21 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7803:
--

 Summary: Update MIMA for Spark 1.5.0
 Key: SPARK-7803
 URL: https://issues.apache.org/jira/browse/SPARK-7803
 Project: Spark
  Issue Type: Sub-task
Reporter: Patrick Wendell
Assignee: Patrick Wendell


We should do this after we publish 1.4 binaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-7802) Update pom versioning to 1.5.0

2015-05-21 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell deleted SPARK-7802:
---


 Update pom versioning to 1.5.0
 --

 Key: SPARK-7802
 URL: https://issues.apache.org/jira/browse/SPARK-7802
 Project: Spark
  Issue Type: Sub-task
Reporter: Patrick Wendell





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-7803) Update MIMA for Spark 1.5.0

2015-05-21 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell deleted SPARK-7803:
---


 Update MIMA for Spark 1.5.0
 ---

 Key: SPARK-7803
 URL: https://issues.apache.org/jira/browse/SPARK-7803
 Project: Spark
  Issue Type: Sub-task
Reporter: Patrick Wendell
Assignee: Patrick Wendell

 We should do this after we publish 1.4 binaries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7801) Upgrade master versions to Spark 1.5.0

2015-05-21 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7801:
--

 Summary: Upgrade master versions to Spark 1.5.0
 Key: SPARK-7801
 URL: https://issues.apache.org/jira/browse/SPARK-7801
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7802) Update pom versioning to 1.5.0

2015-05-21 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7802:
--

 Summary: Update pom versioning to 1.5.0
 Key: SPARK-7802
 URL: https://issues.apache.org/jira/browse/SPARK-7802
 Project: Spark
  Issue Type: Sub-task
Reporter: Patrick Wendell






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7320) Add rollup and cube support to DataFrame DSL

2015-05-20 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553073#comment-14553073
 ] 

Patrick Wendell commented on SPARK-7320:


Hey [~liancheng] and [~chenghao] - I reverted pull request 6257 because it 
broke all of our maven builds. The issue is that it's not safe to rely on the 
test suite constructor to create the table used in the tests.

Separately, I noticed this JIRA was not closed when the patch was merged. In 
cases like this, where the patch only addresses part of the JIRA, it is better 
to just create a sub-task or split the task into two different JIRAs, i.e. 
every pull request (ideally) is associated with exactly one JIRA. Otherwise, 
it's difficult to track things like when we revert a patch.

 Add rollup and cube support to DataFrame DSL
 

 Key: SPARK-7320
 URL: https://issues.apache.org/jira/browse/SPARK-7320
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Cheng Hao
  Labels: starter

 We should add two functions to GroupedData in order to support rollup and 
 cube for the DataFrame DSL.
 {code}
 def rollup(): GroupedData
 def cube(): GroupedData
 {code}
 These two should return new GroupedData with the appropriate state set so 
 when we run an Aggregate, we translate the underlying logical operator into 
 Rollup or Cube.
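 A hedged sketch of how the DSL might be used once these land (the DataFrame 
 {{df}}, its columns, and the aggregates are examples; the merged signatures 
 may differ from the zero-argument forms listed above):
 {code}
 // Hypothetical usage of the proposed GroupedData-returning methods:
 import org.apache.spark.sql.functions._

 // Aggregate over all rollup combinations of (dept, group).
 df.rollup(col("dept"), col("group")).agg(sum(col("salary")))

 // Cube aggregates over every subset of the grouping columns.
 df.cube(col("dept"), col("group")).agg(avg(col("salary")))
 {code}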



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7389) Tachyon integration improvement

2015-05-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7389:
---
Assignee: shimingfei

 Tachyon integration improvement
 ---

 Key: SPARK-7389
 URL: https://issues.apache.org/jira/browse/SPARK-7389
 Project: Spark
  Issue Type: Improvement
  Components: Block Manager
Reporter: shimingfei
Assignee: shimingfei
 Fix For: 1.5.0


 Two main changes:
 1. Add two functions, putValues and getValues, to ExternalBlockManager, 
 because an implementation may not want to rely on putBytes and getBytes.
 2. Improve the Tachyon integration.
 Currently, when putting data into Tachyon, Spark first serializes all data in 
 a partition into a ByteBuffer and then writes it to Tachyon, which uses a lot 
 of memory and increases GC overhead.
 When getting data from Tachyon, getValues depends on getBytes, which also 
 reads all data into an on-heap byte array, again causing high memory usage.
 This PR changes the approach of the two functions, making them read/write 
 data as streams to reduce memory usage.
 In our testing, with large data sizes, this patch reduces GC time by about 
 30% and full GC time by about 70%, and total execution time by about 10%.
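 A hedged sketch of what the stream-oriented hooks might look like (the trait 
 and method names follow this description; the parameter types are assumptions, 
 not the merged signatures):
 {code}
 import org.apache.spark.storage.BlockId

 // Illustrative only: value-level hooks so an external store (e.g. Tachyon)
 // can stream data instead of materializing a ByteBuffer per partition.
 trait ExternalBlockManager {
   // Write an iterator of values as a stream.
   def putValues(blockId: BlockId, values: Iterator[_]): Unit
   // Read values back lazily instead of building an on-heap byte array.
   def getValues(blockId: BlockId): Option[Iterator[_]]
 }
 {code}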



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7389) Tachyon integration improvement

2015-05-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7389.

   Resolution: Fixed
Fix Version/s: 1.5.0

 Tachyon integration improvement
 ---

 Key: SPARK-7389
 URL: https://issues.apache.org/jira/browse/SPARK-7389
 Project: Spark
  Issue Type: Improvement
  Components: Block Manager
Reporter: shimingfei
Assignee: shimingfei
 Fix For: 1.5.0


 Two main changes:
 1. Add two functions, putValues and getValues, to ExternalBlockManager, 
 because an implementation may not want to rely on putBytes and getBytes.
 2. Improve the Tachyon integration.
 Currently, when putting data into Tachyon, Spark first serializes all data in 
 a partition into a ByteBuffer and then writes it to Tachyon, which uses a lot 
 of memory and increases GC overhead.
 When getting data from Tachyon, getValues depends on getBytes, which also 
 reads all data into an on-heap byte array, again causing high memory usage.
 This PR changes the approach of the two functions, making them read/write 
 data as streams to reduce memory usage.
 In our testing, with large data sizes, this patch reduces GC time by about 
 30% and full GC time by about 70%, and total execution time by about 10%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7719) Java 6 code in UnsafeShuffleWriterSuite

2015-05-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7719:
---
Description: 
This was causing a compile failure because emptyIterator() is not exposed in 
some versions of Java 6. I lost the exact compile error along the way in the 
console, but it's just a simple visibility issue.

https://github.com/apache/spark/commit/9ebb44f8abb1a13f045eed60190954db904ffef7

I've removed the test code for now, but we probably want to use something from 
Guava instead for this:
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterators.html

  was:
This was causing a compile failure because emptyIterator() is not exposed in 
some versions of Java 6. I lost the exact compile error along the way in the 
console, but it's just a simple visibility issue.

https://github.com/apache/spark/commit/9ebb44f8abb1a13f045eed60190954db904ffef7


 Java 6 code in UnsafeShuffleWriterSuite
 ---

 Key: SPARK-7719
 URL: https://issues.apache.org/jira/browse/SPARK-7719
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Tests
Reporter: Patrick Wendell
Assignee: Josh Rosen
Priority: Critical

 This was causing a compile failure because emptyIterator() is not exposed in 
 some versions of Java 6. I lost the exact compile error along the way in the 
 console, but it's just a simple visibility issue.
 https://github.com/apache/spark/commit/9ebb44f8abb1a13f045eed60190954db904ffef7
 I've removed the test code for now, but we probably want to use something 
 from Guava instead for this:
 http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/Iterators.html
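 A hedged sketch of the Guava-based alternative (shown in Scala for brevity; 
 the suite itself is Java, so this is illustrative only):
 {code}
 import com.google.common.collect.Iterators

 // Guava's typed empty iterator, usable regardless of which JDK
 // emptyIterator() variants are visible on a given Java 6 build:
 val empty: java.util.Iterator[String] = Iterators.emptyIterator[String]()
 {code}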



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7722) Style checks do not run for Kinesis on Jenkins

2015-05-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7722:
---
Component/s: Project Infra

 Style checks do not run for Kinesis on Jenkins
 --

 Key: SPARK-7722
 URL: https://issues.apache.org/jira/browse/SPARK-7722
 Project: Spark
  Issue Type: Bug
  Components: Project Infra, Streaming
Reporter: Patrick Wendell
Assignee: Tathagata Das
Priority: Critical

 This caused the release build to fail late in the game. We should make sure 
 jenkins is proactively checking it:
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commitdiff;h=23cf897112624ece19a3b5e5394cdf71b9c3c8b3;hp=9ebb44f8abb1a13f045eed60190954db904ffef7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7719) Java 6 code in UnsafeShuffleWriterSuite

2015-05-19 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7719:
--

 Summary: Java 6 code in UnsafeShuffleWriterSuite
 Key: SPARK-7719
 URL: https://issues.apache.org/jira/browse/SPARK-7719
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Tests
Reporter: Patrick Wendell
Assignee: Josh Rosen
Priority: Critical


This was causing a compile failure because emptyIterator() is not exposed in 
some versions of Java 6. I lost the exact compile error along the way in the 
console, but it's just a simple visibility issue.

https://github.com/apache/spark/commit/9ebb44f8abb1a13f045eed60190954db904ffef7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7722) Style checks do not run for Kinesis on Jenkins

2015-05-19 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7722:
--

 Summary: Style checks do not run for Kinesis on Jenkins
 Key: SPARK-7722
 URL: https://issues.apache.org/jira/browse/SPARK-7722
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Patrick Wendell
Assignee: Tathagata Das
Priority: Critical


This caused the release build to fail late in the game. We should make sure 
jenkins is proactively checking it:

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commitdiff;h=23cf897112624ece19a3b5e5394cdf71b9c3c8b3;hp=9ebb44f8abb1a13f045eed60190954db904ffef7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]

2015-05-19 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7726:
--

 Summary: Maven Install Breaks When Upgrading Scala 
2.11.2-->[2.11.3 or higher]
 Key: SPARK-7726
 URL: https://issues.apache.org/jira/browse/SPARK-7726
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Priority: Blocker


This one took a long time to track down. The Maven install phase is part of our 
release process. It runs the scala:doc target to generate doc jars. Between 
Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in a way 
that breaks our build. In both cases, it returned an error (there has been a 
long-running error here that we've always ignored); however, in 2.11.3 that 
error became fatal and failed the entire build process. The upgrade occurred in 
SPARK-7092. Here is a simple reproduction:

{code}
./dev/change-version-to-2.11.sh
mvn clean install -pl network/common -pl network/shuffle -DskipTests 
-Dscala-2.11
{code} 

This command exits successfully when Spark is at Scala 2.11.2 and fails with 2.11.3 
or higher. In either case an error is printed:

{code}
[INFO] 
[INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ 
spark-network-shuffle_2.11 ---
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56:
 error: not found: type Type
  protected Type type() { return Type.UPLOAD_BLOCK; }
^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37:
 error: not found: type Type
  protected Type type() { return Type.STREAM_HANDLE; }
^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44:
 error: not found: type Type
  protected Type type() { return Type.REGISTER_EXECUTOR; }
^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40:
 error: not found: type Type
  protected Type type() { return Type.OPEN_BLOCKS; }
^
model contains 22 documentable templates
four errors found
{code}

Ideally we'd just dig in and fix this error. Unfortunately it's a very 
confusing error and I have no idea why it is appearing. I'd propose reverting 
SPARK-7092 in the mean time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7670) Failure when building with Scala 2.11 (after 1.3.1)

2015-05-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7670.

   Resolution: Duplicate
Fix Version/s: SPARK-7726

 Failure when building with Scala 2.11 (after 1.3.1)
 --

 Key: SPARK-7670
 URL: https://issues.apache.org/jira/browse/SPARK-7670
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 1.4.0
Reporter: Fernando Ruben Otero
 Fix For: SPARK-7726

 Attachments: Dockerfile


 When trying to build spark with scala 2.11 on revision 
 c64ff8036cc6bc7c87743f4c751d7fe91c2e366a  (the one on master when I'm 
 submitting this issue) I'm getting 
  export MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M 
  -XX:ReservedCodeCacheSize=512m
  dev/change-version-to-2.11.sh
  mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -Dhadoop.version=2.6.0 -DskipTests 
  clean install
 ...
 ...
 ...
 [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ 
 spark-network-shuffle_2.11 ---
 /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56:
  error: not found: type Type
   protected Type type() { return Type.UPLOAD_BLOCK; }
 ^
 /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37:
  error: not found: type Type
   protected Type type() { return Type.STREAM_HANDLE; }
 ^
 /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44:
  error: not found: type Type
   protected Type type() { return Type.REGISTER_EXECUTOR; }
 ^
 /Users/ZeoS/dev/bigdata/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40:
  error: not found: type Type
   protected Type type() { return Type.OPEN_BLOCKS; }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7092) Update spark scala version to 2.11.6

2015-05-19 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550124#comment-14550124
 ] 

Patrick Wendell commented on SPARK-7092:


This was reopened because it caused SPARK-7726.

 Update spark scala version to 2.11.6
 

 Key: SPARK-7092
 URL: https://issues.apache.org/jira/browse/SPARK-7092
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Spark Shell
Affects Versions: 1.4.0
Reporter: Prashant Sharma
Assignee: Prashant Sharma
Priority: Minor
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7092) Update spark scala version to 2.11.6

2015-05-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7092:


 Update spark scala version to 2.11.6
 

 Key: SPARK-7092
 URL: https://issues.apache.org/jira/browse/SPARK-7092
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Spark Shell
Affects Versions: 1.4.0
Reporter: Prashant Sharma
Assignee: Prashant Sharma
Priority: Minor
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
Hi all,

I've created another release repository where the release is
identified with the version 1.4.0-rc1:

https://repository.apache.org/content/repositories/orgapachespark-1093/

On Tue, May 19, 2015 at 5:36 PM, Krishna Sankar ksanka...@gmail.com wrote:
 Quick tests from my side - looks OK. The results are the same or very similar to
 1.3.1. Will add DataFrames et al in future tests.

 +1 (non-binding, of course)

 1. Compiled OSX 10.10 (Yosemite) OK Total time: 17:42 min
  mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
 -Dhadoop.version=2.6.0 -Phive -DskipTests
 2. Tested PySpark, MLlib - running as well as comparing results with 1.3.1
 2.1. statistics (min,max,mean,Pearson,Spearman) OK
 2.2. Linear/Ridge/Lasso Regression OK
 2.3. Decision Tree, Naive Bayes OK
 2.4. KMeans OK
Center And Scale OK
 2.5. RDD operations OK
   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
Model evaluation/optimization (rank, numIter, lambda) with itertools
 OK

 Cheers
 k/

 On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
Punya,

Let me see if I can publish these under rc1 as well. In the future
this will all be automated, but currently it's a somewhat manual task.

- Patrick

On Tue, May 19, 2015 at 9:32 AM, Punyashloka Biswal
punya.bis...@gmail.com wrote:
 When publishing future RCs to the staging repository, would it be possible
 to use a version number that includes the rc1 designation? In the current
 setup, when I run a build against the artifacts at
 https://repository.apache.org/content/repositories/orgapachespark-1092/org/apache/spark/spark-core_2.10/1.4.0/,
 my local Maven cache will get polluted with things that claim to be 1.4.0
 but aren't. It would be preferable for the version number to be 1.4.0-rc1
 instead.

 Thanks!
 Punya


 On Tue, May 19, 2015 at 12:20 PM Sean Owen so...@cloudera.com wrote:

 Before I vote, I wanted to point out there are still 9 Blockers for 1.4.0.
 I'd like to use this status to really mean "must happen before the release".
 Many of these may be already fixed, or aren't really blockers -- can just be
 updated accordingly.

 I bet at least one will require further work if it's really meant for 1.4,
 so all this means is there is likely to be another RC. We should still kick
 the tires on RC1.

 (I also assume we should be extra conservative about what is merged into
 1.4 at this point.)


 SPARK-6784 SQL Clean up all the inbound/outbound conversions for DateType
 Adrian Wang

 SPARK-6811 SparkR Building binary R packages for SparkR Shivaram
 Venkataraman

 SPARK-6941 SQL Provide a better error message to explain that tables
 created from RDDs are immutable
 SPARK-7158 SQL collect and take return different results
 SPARK-7478 SQL Add a SQLContext.getOrCreate to maintain a singleton
 instance of SQLContext Tathagata Das

 SPARK-7616 SQL Overwriting a partitioned parquet table corrupt data Cheng
 Lian

 SPARK-7654 SQL DataFrameReader and DataFrameWriter for input/output API
 Reynold Xin

 SPARK-7662 SQL Exception of multi-attribute generator analysis in
 projection

 SPARK-7713 SQL Use shared broadcast hadoop conf for partitioned table
 scan. Yin Huai


 On Tue, May 19, 2015 at 5:10 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



branch-1.4 merge etiquette

2015-05-19 Thread Patrick Wendell
Hey All,

Since we are now voting, please tread very carefully with branch-1.4 merges.

For instance, bug fixes that don't represent regressions from 1.3.X
probably shouldn't be merged unless they are extremely simple
and well reviewed.

As usual mature/core components (e.g. Spark core) are more sensitive
than newer/edge ones (e.g. Dataframes).

I'm happy to provide guidance to people if they are on the fence about
patches. Ultimately this ends up being a matter of judgement and
assessing risk of specific patches. Just ping me on github.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc1 (commit 777a081):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1092/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.4.0!

The vote is open until Friday, May 22, at 17:03 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload and running on this release candidate,
then reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions from 1.3.1.
Bugs already present in 1.3.X, minor regressions, or bugs related
to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-7743) Upgrade parquet dependency

2015-05-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7743:
---
Component/s: SQL

 Upgrade parquet dependency
 --

 Key: SPARK-7743
 URL: https://issues.apache.org/jira/browse/SPARK-7743
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Thomas Omans

 There are many outstanding issues with the Parquet format that have been 
 resolved between the version Spark depends on (1.6.0rc3 as of Spark 
 1.3.1) and the most recent Parquet release (1.6.0).
 Some of these include not supporting schema migration when using 
 Parquet with Avro, and not supporting summary metadata in the Parquet 
 footers, which causes null pointer exceptions when reading, among many others.
 See https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-160 
 for the full list of fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
A couple of other process things:

1. Please *keep voting* (+1/-1) on this thread even if we find some
issues, until we cut RC2. This lets us pipeline the QA.
2. The SQL team owes a JIRA clean-up (forthcoming shortly)... there
are still a few Blockers that aren't really blockers.


On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]

2015-05-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7726:
---
Assignee: Iulian Dragos

 Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]
 -

 Key: SPARK-7726
 URL: https://issues.apache.org/jira/browse/SPARK-7726
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Assignee: Iulian Dragos
Priority: Blocker

 This one took a long time to track down. The Maven install phase is part of 
 our release process. It runs the scala:doc target to generate doc jars. 
 Between Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in 
 a way that breaks our build. In both cases, it returned an error (there has 
 been a long-running error here that we've always ignored); however, in 2.11.3 
 that error became fatal and failed the entire build process. The upgrade 
 occurred in SPARK-7092. Here is a simple reproduction:
 {code}
 ./dev/change-version-to-2.11.sh
 mvn clean install -pl network/common -pl network/shuffle -DskipTests 
 -Dscala-2.11
 {code} 
 This command exits successfully when Spark is at Scala 2.11.2 and fails with 
 2.11.3 or higher. In either case an error is printed:
 {code}
 [INFO] 
 [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ 
 spark-network-shuffle_2.11 ---
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56:
  error: not found: type Type
   protected Type type() { return Type.UPLOAD_BLOCK; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37:
  error: not found: type Type
   protected Type type() { return Type.STREAM_HANDLE; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44:
  error: not found: type Type
   protected Type type() { return Type.REGISTER_EXECUTOR; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40:
  error: not found: type Type
   protected Type type() { return Type.OPEN_BLOCKS; }
 ^
 model contains 22 documentable templates
 four errors found
 {code}
 Ideally we'd just dig in and fix this error. Unfortunately it's a very 
 confusing error and I have no idea why it is appearing. I'd propose reverting 
 SPARK-7092 in the mean time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]

2015-05-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7726.

   Resolution: Fixed
Fix Version/s: 1.4.0

 Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]
 -

 Key: SPARK-7726
 URL: https://issues.apache.org/jira/browse/SPARK-7726
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Assignee: Iulian Dragos
Priority: Blocker
 Fix For: 1.4.0


 This one took a long time to track down. The Maven install phase is part of 
 our release process. It runs the scala:doc target to generate doc jars. 
 Between Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in 
 a way that breaks our build. In both cases, it returned an error (there has 
 been a long-running error here that we've always ignored); however, in 2.11.3 
 that error became fatal and failed the entire build process. The upgrade 
 occurred in SPARK-7092. Here is a simple reproduction:
 {code}
 ./dev/change-version-to-2.11.sh
 mvn clean install -pl network/common -pl network/shuffle -DskipTests 
 -Dscala-2.11
 {code} 
 This command exits successfully when Spark is at Scala 2.11.2 and fails with 
 2.11.3 or higher. In either case an error is printed:
 {code}
 [INFO] 
 [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ 
 spark-network-shuffle_2.11 ---
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56:
  error: not found: type Type
   protected Type type() { return Type.UPLOAD_BLOCK; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37:
  error: not found: type Type
   protected Type type() { return Type.STREAM_HANDLE; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44:
  error: not found: type Type
   protected Type type() { return Type.REGISTER_EXECUTOR; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40:
  error: not found: type Type
   protected Type type() { return Type.OPEN_BLOCKS; }
 ^
 model contains 22 documentable templates
 four errors found
 {code}
 Ideally we'd just dig in and fix this error. Unfortunately it's a very 
 confusing error and I have no idea why it is appearing. I'd propose reverting 
 SPARK-7092 in the mean time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7092) Update spark scala version to 2.11.6

2015-05-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7092.

Resolution: Fixed

Okay this was re merged in the SPARK-7726 fix:

https://github.com/apache/spark/commit/ee012e0ed61fbf5bb819b7489a3a23a03c878f4d

 Update spark scala version to 2.11.6
 

 Key: SPARK-7092
 URL: https://issues.apache.org/jira/browse/SPARK-7092
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Spark Shell
Affects Versions: 1.4.0
Reporter: Prashant Sharma
Assignee: Prashant Sharma
Priority: Minor
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7677) Enable Kafka In Scala 2.11 Build

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7677:
---
Description: Now that we upgraded Kafka in SPARK-2808 we can enable it in 
the Scala 2.11 build.

 Enable Kafka In Scala 2.11 Build
 

 Key: SPARK-7677
 URL: https://issues.apache.org/jira/browse/SPARK-7677
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Reporter: Patrick Wendell
Assignee: Iulian Dragos

 Now that we upgraded Kafka in SPARK-2808 we can enable it in the Scala 2.11 
 build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7672) Number format exception with spark.kryoserializer.buffer.mb

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7672:
---
Priority: Critical  (was: Major)

 Number format exception with spark.kryoserializer.buffer.mb
 ---

 Key: SPARK-7672
 URL: https://issues.apache.org/jira/browse/SPARK-7672
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Nishkam Ravi
Priority: Critical

 With spark.kryoserializer.buffer.mb  1000 : 
 Exception in thread main java.lang.NumberFormatException: Size must be 
 specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), 
 tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.
 Fractional values are not supported. Input was: 100.0
 at 
 org.apache.spark.network.util.JavaUtils.parseByteString(JavaUtils.java:238)
 at 
 org.apache.spark.network.util.JavaUtils.byteStringAsKb(JavaUtils.java:259)
 at org.apache.spark.util.Utils$.byteStringAsKb(Utils.scala:1037)
 at org.apache.spark.SparkConf.getSizeAsKb(SparkConf.scala:245)
 at 
 org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:53)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:269)
 at 
 org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283)
 at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
 at 
 org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
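 A minimal sketch of a workaround, assuming the suffix-based keys spark.kryoserializer.buffer and 
 spark.kryoserializer.buffer.max available in 1.4.0; the deprecated .mb key is what triggers the 
 fractional translation above, so this is illustrative only:
 {code}
 // Sketch only: avoid the deprecated spark.kryoserializer.buffer.mb key and pass
 // an explicit size suffix, so no fractional megabyte string is ever parsed.
 import org.apache.spark.SparkConf

 val conf = new SparkConf()
   .setAppName("kryo-buffer-example")  // illustrative name
   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
   .set("spark.kryoserializer.buffer", "1500m")      // > 1000 MB, written with a suffix
   .set("spark.kryoserializer.buffer.max", "2000m")  // must be at least the buffer size
 {code}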



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7672) Number format exception with spark.kryoserializer.buffer.mb

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7672:
---
Component/s: Spark Core

 Number format exception with spark.kryoserializer.buffer.mb
 ---

 Key: SPARK-7672
 URL: https://issues.apache.org/jira/browse/SPARK-7672
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Nishkam Ravi

 With spark.kryoserializer.buffer.mb > 1000 : 
 Exception in thread "main" java.lang.NumberFormatException: Size must be 
 specified as bytes (b), kibibytes (k), mebibytes (m), gibibytes (g), 
 tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.
 Fractional values are not supported. Input was: 100.0
 at 
 org.apache.spark.network.util.JavaUtils.parseByteString(JavaUtils.java:238)
 at 
 org.apache.spark.network.util.JavaUtils.byteStringAsKb(JavaUtils.java:259)
 at org.apache.spark.util.Utils$.byteStringAsKb(Utils.scala:1037)
 at org.apache.spark.SparkConf.getSizeAsKb(SparkConf.scala:245)
 at 
 org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:53)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:269)
 at 
 org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280)
 at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283)
 at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
 at 
 org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7284) Update streaming documentation for Spark 1.4.0 release

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7284:
---
Priority: Critical  (was: Blocker)

 Update streaming documentation for Spark 1.4.0 release
 --

 Key: SPARK-7284
 URL: https://issues.apache.org/jira/browse/SPARK-7284
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Critical

 Things to update (continuously updated list)
 - Python API for Kafka Direct
 - Pointers to the new Streaming UI
 - Update Kafka version to 0.8.2.1
 - Add ref to RDD.foreachPartitionWithIndex (if merged)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7677) Enable Kafka In Scala 2.11 Build

2015-05-15 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7677:
--

 Summary: Enable Kafka In Scala 2.11 Build
 Key: SPARK-7677
 URL: https://issues.apache.org/jira/browse/SPARK-7677
 Project: Spark
  Issue Type: Sub-task
Reporter: Patrick Wendell
Assignee: Iulian Dragos






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6811) Building binary R packages for SparkR

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6811:
---
Assignee: Shivaram Venkataraman

 Building binary R packages for SparkR
 -

 Key: SPARK-6811
 URL: https://issues.apache.org/jira/browse/SPARK-6811
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Shivaram Venkataraman
Assignee: Shivaram Venkataraman
Priority: Blocker

 We should figure out how to distribute binary packages for SparkR as a part 
 of the release process. R packages for Windows might need to be built 
 separately and we could offer a separate download link for Windows users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7563:
---
Fix Version/s: 1.4.0

 OutputCommitCoordinator.stop() should only be executed in driver
 

 Key: SPARK-7563
 URL: https://issues.apache.org/jira/browse/SPARK-7563
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo)
 Spark 1.3.1 Release
Reporter: Hailong Wen
Priority: Critical
 Fix For: 1.4.0


 I am from the IBM Platform Symphony team and we are integrating Spark 1.3.1 with 
 EGO (a resource management product).
 In EGO we use a fine-grained dynamic allocation policy, and each Executor will 
 exit after its tasks are all done. When testing *spark-shell*, we find that 
 when an executor of the first job exits, it will stop the OutputCommitCoordinator, which 
 results in all future jobs failing. Details are as follows:
 We got the following error in the executor when submitting a job in *spark-shell* 
 the second time (the first job submission was successful):
 {noformat}
 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to 
 OutputCommitCoordinator: 
 akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator
 Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: 
 ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), 
 Path(/user/OutputCommitCoordinator)]
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
 at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
 at 
 akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
 at 
 scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
 at 
 scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
 at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
 at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
 at 
 akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
 at akka.dispatch.Mailbox.run(Mailbox.scala:220)
 at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 {noformat}
 And on the driver side, we see a log message indicating that the 
 OutputCommitCoordinator is stopped after the first submission:
 {noformat}
 15/05/11 04:01:23 INFO 
 spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: 
 OutputCommitCoordinator stopped!
 {noformat}
 We examined the code of OutputCommitCoordinator and found that the executor will 
 reuse the ref of the driver's OutputCommitCoordinatorActor. So when an executor 
 exits, it will eventually call SparkEnv.stop():
 {noformat}
   private[spark] def stop() {
 isStopped = true
 pythonWorkers.foreach { case(key, worker) => worker.stop() }
 Option(httpFileServer).foreach(_.stop())
 mapOutputTracker.stop()
 shuffleManager.stop()
 broadcastManager.stop()
 blockManager.stop()
 blockManager.master.stop()
 metricsSystem.stop()
 outputCommitCoordinator.stop

[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7563:
---
Target Version/s: 1.3.2, 1.4.0

 OutputCommitCoordinator.stop() should only be executed in driver
 

 Key: SPARK-7563
 URL: https://issues.apache.org/jira/browse/SPARK-7563
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo)
 Spark 1.3.1 Release
Reporter: Hailong Wen
Priority: Critical
 Fix For: 1.4.0


 I am from the IBM Platform Symphony team and we are integrating Spark 1.3.1 with 
 EGO (a resource management product).
 In EGO we use a fine-grained dynamic allocation policy, and each Executor will 
 exit after its tasks are all done. When testing *spark-shell*, we find that 
 when an executor of the first job exits, it will stop the OutputCommitCoordinator, which 
 results in all future jobs failing. Details are as follows:
 We got the following error in the executor when submitting a job in *spark-shell* 
 the second time (the first job submission was successful):
 {noformat}
 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to 
 OutputCommitCoordinator: 
 akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator
 Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: 
 ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), 
 Path(/user/OutputCommitCoordinator)]
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
 at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
 at 
 akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
 at 
 scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
 at 
 scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
 at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
 at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
 at 
 akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
 at akka.dispatch.Mailbox.run(Mailbox.scala:220)
 at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 {noformat}
 And on the driver side, we see a log message indicating that the 
 OutputCommitCoordinator is stopped after the first submission:
 {noformat}
 15/05/11 04:01:23 INFO 
 spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: 
 OutputCommitCoordinator stopped!
 {noformat}
 We examined the code of OutputCommitCoordinator and found that the executor will 
 reuse the ref of the driver's OutputCommitCoordinatorActor. So when an executor 
 exits, it will eventually call SparkEnv.stop():
 {noformat}
   private[spark] def stop() {
 isStopped = true
 pythonWorkers.foreach { case(key, worker) => worker.stop() }
 Option(httpFileServer).foreach(_.stop())
 mapOutputTracker.stop()
 shuffleManager.stop()
 broadcastManager.stop()
 blockManager.stop()
 blockManager.master.stop()
 metricsSystem.stop()
 outputCommitCoordinator.stop

[jira] [Resolved] (SPARK-7677) Enable Kafka In Scala 2.11 Build

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7677.

   Resolution: Fixed
Fix Version/s: 1.4.0

Fixed by pull request:
https://github.com/apache/spark/pull/6149

 Enable Kafka In Scala 2.11 Build
 

 Key: SPARK-7677
 URL: https://issues.apache.org/jira/browse/SPARK-7677
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Reporter: Patrick Wendell
Assignee: Iulian Dragos
 Fix For: 1.4.0


 Now that we upgraded Kafka in SPARK-2808 we can enable it in the Scala 2.11 
 build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7355) FlakyTest - o.a.s.DriverSuite

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7355:
---
Priority: Critical  (was: Blocker)

 FlakyTest - o.a.s.DriverSuite
 -

 Key: SPARK-7355
 URL: https://issues.apache.org/jira/browse/SPARK-7355
 Project: Spark
  Issue Type: Test
  Components: Spark Core, Tests
Reporter: Tathagata Das
Assignee: Andrew Or
Priority: Critical
  Labels: flaky-test





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7644) Ensure all scoped RDD operations are tested and cleaned

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7644:
---
Priority: Critical  (was: Blocker)

 Ensure all scoped RDD operations are tested and cleaned
 ---

 Key: SPARK-7644
 URL: https://issues.apache.org/jira/browse/SPARK-7644
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL, Streaming
Affects Versions: 1.4.0
Reporter: Andrew Or
Assignee: Andrew Or
Priority: Critical

 If all goes well, this will be a Won't Fix. Before releasing we should make 
 sure all operations wrapped in `RDDOperationScope.withScope` are actually 
 tested and enclosed closures are actually cleaned. This is because a big 
 change went into `ClosureCleaner` and wrapping methods in closures may change 
 whether they are serializable.
 TL;DR we should run all the wrapped operations to make sure we don't run into 
 java.lang.NotSerializableException.
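 As a tiny illustration of the kind of smoke test this implies (a sketch under the assumption 
 that simply forcing an action is enough to exercise closure cleaning for a wrapped operation; 
 it is not the actual test plan):
 {code}
 // Sketch: run a scoped RDD pipeline end-to-end so the closures get cleaned and
 // serialized; a regression would surface as java.lang.NotSerializableException.
 import org.apache.spark.{SparkConf, SparkContext}

 val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("scope-smoke"))
 try {
   val n = sc.parallelize(1 to 100)
     .map(_ * 2)        // wrapped in RDDOperationScope.withScope in 1.4
     .filter(_ % 3 == 0)
     .reduce(_ + _)
   assert(n > 0)
 } finally {
   sc.stop()
 }
 {code}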



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2015-05-15 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546394#comment-14546394
 ] 

Patrick Wendell commented on SPARK-2883:


Since this is a feature, I'm going to drop it down to critical priority, as 
we'll start the release candidates soon. However, I think it's fine to slip 
this in between RCs because it's purely additive, so IMO it's very likely this 
will make it into Spark 1.4.

 Spark Support for ORCFile format
 

 Key: SPARK-2883
 URL: https://issues.apache.org/jira/browse/SPARK-2883
 Project: Spark
  Issue Type: Bug
  Components: Input/Output, SQL
Reporter: Zhan Zhang
Priority: Blocker
 Attachments: 2014-09-12 07.05.24 pm Spark UI.png, 2014-09-12 07.07.19 
 pm jobtracker.png, orc.diff


 Verify the support of OrcInputFormat in Spark, fix issues if any exist, and add 
 documentation of its usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2883) Spark Support for ORCFile format

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2883:
---
Priority: Critical  (was: Blocker)

 Spark Support for ORCFile format
 

 Key: SPARK-2883
 URL: https://issues.apache.org/jira/browse/SPARK-2883
 Project: Spark
  Issue Type: Bug
  Components: Input/Output, SQL
Reporter: Zhan Zhang
Priority: Critical
 Attachments: 2014-09-12 07.05.24 pm Spark UI.png, 2014-09-12 07.07.19 
 pm jobtracker.png, orc.diff


 Verify the support of OrcInputFormat in Spark, fix issues if any exist, and add 
 documentation of its usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-15 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546468#comment-14546468
 ] 

Patrick Wendell commented on SPARK-7563:


I pulled the fix into 1.4.0, but not yet 1.3.2 (didn't feel comfortable doing 
the backport).
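
For readers skimming the thread: the quoted description below shows executors calling
SparkEnv.stop() and thereby stopping the driver's shared OutputCommitCoordinator. A minimal
sketch of the driver-only guard the summary asks for (illustrative only, not the merged patch;
the stub types and the isDriver flag are assumptions):

{code}
// Sketch only: stop the coordinator on the driver, never from an exiting executor.
class CoordinatorStub { def stop(): Unit = println("coordinator stopped") }

class EnvSketch(isDriver: Boolean, coordinator: CoordinatorStub) {
  def stop(): Unit = {
    // ... other per-process components would be stopped here ...
    if (isDriver) {          // executors skip this, so the shared coordinator survives
      coordinator.stop()
    }
  }
}

// An executor-side env built with isDriver = false leaves the coordinator alone.
new EnvSketch(isDriver = false, new CoordinatorStub).stop()
{code}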

 OutputCommitCoordinator.stop() should only be executed in driver
 

 Key: SPARK-7563
 URL: https://issues.apache.org/jira/browse/SPARK-7563
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo)
 Spark 1.3.1 Release
Reporter: Hailong Wen
Priority: Critical
 Fix For: 1.4.0


 I am from the IBM Platform Symphony team and we are integrating Spark 1.3.1 with 
 EGO (a resource management product).
 In EGO we use a fine-grained dynamic allocation policy, and each Executor will 
 exit after its tasks are all done. When testing *spark-shell*, we find that 
 when an executor of the first job exits, it will stop the OutputCommitCoordinator, which 
 results in all future jobs failing. Details are as follows:
 We got the following error in the executor when submitting a job in *spark-shell* 
 the second time (the first job submission was successful):
 {noformat}
 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to 
 OutputCommitCoordinator: 
 akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator
 Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: 
 ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), 
 Path(/user/OutputCommitCoordinator)]
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
 at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
 at 
 akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
 at 
 scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
 at 
 scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
 at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
 at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
 at 
 akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
 at akka.dispatch.Mailbox.run(Mailbox.scala:220)
 at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 {noformat}
 And on the driver side, we see a log message indicating that the 
 OutputCommitCoordinator is stopped after the first submission:
 {noformat}
 15/05/11 04:01:23 INFO 
 spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: 
 OutputCommitCoordinator stopped!
 {noformat}
 We examined the code of OutputCommitCoordinator and found that the executor will 
 reuse the ref of the driver's OutputCommitCoordinatorActor. So when an executor 
 exits, it will eventually call SparkEnv.stop():
 {noformat}
   private[spark] def stop() {
 isStopped = true
 pythonWorkers.foreach { case(key, worker) => worker.stop() }
 Option(httpFileServer).foreach(_.stop())
 mapOutputTracker.stop()
 shuffleManager.stop()
 broadcastManager.stop()
 blockManager.stop

[jira] [Resolved] (SPARK-5920) Use a BufferedInputStream to read local shuffle data

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5920.

Resolution: Won't Fix

Per the discussion on this PR I am resolving this as won't fix.

https://github.com/apache/spark/pull/4878

[~kayousterhout] please feel free to re-open if I misinterpreted.

 Use a BufferedInputStream to read local shuffle data
 

 Key: SPARK-5920
 URL: https://issues.apache.org/jira/browse/SPARK-5920
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 1.2.1, 1.3.0
Reporter: Kay Ousterhout
Assignee: Kay Ousterhout
Priority: Blocker

 When reading local shuffle data, Spark doesn't currently buffer the local 
 reads into larger chunks, which can lead to terrible disk performance if many 
 tasks are concurrently reading local data from the same disk.  We should use 
 a BufferedInputStream to mitigate this problem; we can lazily create the 
 input stream to avoid allocating a bunch of in-memory buffers at the same 
 time for tasks that read shuffle data from a large number of local blocks.
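 A minimal sketch of the lazily created buffered reader described above (illustrative only; the 
 class and parameter names are assumptions, not Spark's shuffle API):
 {code}
 import java.io.{BufferedInputStream, FileInputStream, InputStream}

 // Sketch: the BufferedInputStream (and its buffer) is only allocated when the
 // task actually starts reading the local block, not when the handle is created.
 class LazyLocalBlockReader(path: String, bufferSize: Int = 64 * 1024) {
   private var opened = false
   lazy val stream: InputStream = {
     opened = true
     new BufferedInputStream(new FileInputStream(path), bufferSize)
   }
   def close(): Unit = if (opened) stream.close()
 }
 {code}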



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7532) Make StreamingContext.start() idempotent

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7532:
---
Fix Version/s: 1.4.0

 Make StreamingContext.start() idempotent
 

 Key: SPARK-7532
 URL: https://issues.apache.org/jira/browse/SPARK-7532
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Blocker
 Fix For: 1.4.0


 Currently, calling StreamingContext.start() throws an error when the context is 
 already started. This is inconsistent with StreamingContext.stop(), 
 which is idempotent; that is, calling stop() on a stopped context is a no-op. 
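 A minimal sketch of what idempotent start semantics could look like (illustrative only, not the 
 actual StreamingContext code; the state names are assumptions), mirroring the no-op behaviour 
 that stop() already has:
 {code}
 // Sketch: track the context state so a second start() is a no-op instead of an error.
 object ContextState extends Enumeration { val Initialized, Active, Stopped = Value }

 class StreamingContextSketch {
   @volatile private var state = ContextState.Initialized

   def start(): Unit = synchronized {
     state match {
       case ContextState.Initialized => state = ContextState.Active // really start here
       case ContextState.Active      => // already started: do nothing
       case ContextState.Stopped     => throw new IllegalStateException("context already stopped")
     }
   }
 }
 {code}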



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7228) SparkR public API for 1.4 release

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7228:
---
Fix Version/s: 1.4.0

 SparkR public API for 1.4 release
 -

 Key: SPARK-7228
 URL: https://issues.apache.org/jira/browse/SPARK-7228
 Project: Spark
  Issue Type: Umbrella
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Shivaram Venkataraman
Assignee: Shivaram Venkataraman
Priority: Blocker
 Fix For: 1.4.0


 This is an umbrella ticket to track the public APIs and documentation to be 
 released as a part of SparkR in the 1.4 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Tentative due dates for Spark 1.3.2 release

2015-05-15 Thread Patrick Wendell
Hi Niranda,

Maintenance releases are not done on a predetermined schedule but
instead according to which fixes show up and their severity. Since we
just did a 1.3.1 release I'm not sure I see 1.3.2 on the immediate
horizon.

However, the maintenance releases are simply builds at the head of the
respective release branches (in this case branch-1.3). They never
introduce new APIs. If you have a particular bug fix you are waiting
for, you can always build Spark off of that branch.

- Patrick

On Fri, May 15, 2015 at 12:46 AM, Niranda Perera
niranda.per...@gmail.com wrote:
 Hi,

 May I know the tentative release dates for spark 1.3.2?

 rgds

 --
 Niranda

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Adding/Using More Resolution Types on JIRA

2015-05-15 Thread Patrick Wendell
If there is no further feedback on this I will ask ASF Infra to add
the new fields Out of Scope and Inactive.

- Patrick

On Tue, May 12, 2015 at 9:02 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 I tend to find that any large project has a lot of walking dead JIRAs, and
 pretending they are simply Open causes problems. Any state is better for
 these, so I favor this.

 Agreed.

 Inactive: A way to clear out inactive/dead JIRA's without
 indicating a decision has been made one way or the other.

 This is a good idea, and perhaps the process of closing JIRAs as Inactive
 can be automated. If nothing about a JIRA has changed in 12 months or more
 (e.g. current oldest open Spark issue; dates to Aug 2013: SPARK-867),
 perhaps a bot can mark it as such for us. (Here's a list of stale issues).

 This doesn't mean the issue is invalid or won't be addressed, but it gets it
 out of the Open queue, which ideally should be a high churn queue (e.g.
 stuff doesn't stay in there forever).

 Nick


 On Tue, May 12, 2015 at 4:49 AM Sean Owen so...@cloudera.com wrote:

 I tend to find that any large project has a lot of walking dead JIRAs, and
 pretending they are simply Open causes problems. Any state is better for
 these, so I favor this.

 The possible objection is that this will squash or hide useful issues, but
 in practice we have the opposite problem. Resolved issues are still
 searchable by default, and, people aren't shy about opening duplicates
 anyway. At least the semantics of Later do not discourage a diligent searcher
 from considering commenting on and reopening such an archived JIRA.

 Patrick this could piggy back on INFRA-9513.

 As a corollary I would welcome deciding that Target Version should be used
 more narrowly to mean 'I really mean to help resolve this for the
 indicated
 version'. Setting it to a future version just to mean Later should instead
 turn into resolving the JIRA.

 Last: if JIRAs are regularly ice-boxed this way, I think it should trigger
 some reflection. Why are these JIRAs going nowhere? For completely normal
 reasons or does it mean too many TODOs are filed and forgotten? That's no
 comment on the current state, just something to watch.

 So: yes I like the idea.
 On May 12, 2015 8:50 AM, Patrick Wendell pwend...@gmail.com wrote:

  In Spark we sometimes close issues as something other than Fixed,
  and this is an important part of maintaining our JIRA.
 
  The current resolution types we use are the following:
 
  Won't Fix - bug fix or (more often) feature we don't want to add
  Invalid - issue is underspecified or not appropriate for a JIRA issue
  Duplicate - duplicate of another JIRA
  Cannot Reproduce - bug that could not be reproduced
  Not A Problem - issue purports to represent a bug, but does not
 
  I would like to propose adding a few new resolutions. This will
  require modifying the ASF JIRA, but infra said they are open to
  proposals as long as they are considered of broad interest.
 
   My issue with the current set of resolutions is that Won't Fix is a
   big catch-all we use for many different things. Most often it's used
   for things that aren't even bugs, even though it has Fix in the name.
  I'm proposing adding:
 
  Inactive - A feature or bug that has had no activity from users or
  developers in a long time
   Out of Scope - A feature proposal that is not in scope given the
   project's goals
  Later - A feature not on the immediate roadmap, but potentially of
  interest longer term (this one already exists, I'm just proposing to
  start using it)
 
  I am in no way proposing changes to the decision making model around
  JIRA's, notably that it is consensus based and that all resolutions
  are considered tentative and fully reversible.
 
  The benefits I see of this change would be the following:
  1. Inactive: A way to clear out inactive/dead JIRA's without
  indicating a decision has been made one way or the other.
  2. Out of Scope: It more clearly explains closing out-of-scope
  features than the generic Won't Fix. Also makes it more clear to
  future contributors what is considered in scope for Spark.
  3. Later: A way to signal that issues aren't targeted for a near term
  version. This would help avoid the mess we have now of like 200+
  issues targeted at each version and target version being a very bad
  indicator of actual roadmap. An alternative on this one is to have a
  version called Later or Parking Lot but not close the issues.
 
  Any thoughts on this?
 
  - Patrick
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Recent Spark test failures

2015-05-15 Thread Patrick Wendell
The PR builder currently builds against Hadoop 2.3.

- Patrick

On Fri, May 15, 2015 at 11:40 AM, Marcelo Vanzin van...@cloudera.com
wrote:

 Funny thing, since I asked this question in a PR a few minutes ago...

 Ignoring the rotation suggestion for a second, can the PR builder at least
 cover hadoop 2.2? That's the actual version used to create the official
 Spark artifacts for maven, and the oldest version Spark supports for YARN.

 Kinda the same argument as the "why do we build with java 7 when we
 support java 6" discussion we had recently.


 On Fri, May 15, 2015 at 11:34 AM, Ted Yu yuzhih...@gmail.com wrote:

 bq. would be prohibitive to build all configurations for every push

 Agreed.

 Can the PR builder rotate testing against hadoop 2.3, 2.4, 2.6 and 2.7 (each
 test run still uses one hadoop profile)?

 This way we would have some coverage for each of the major hadoop
 releases.

 Cheers

 On Fri, May 15, 2015 at 10:30 AM, Sean Owen so...@cloudera.com wrote:

 You all are looking only at the pull request builder. It just does one
 build to sanity-check a pull request, since that already takes 2 hours and
 would be prohibitive to build all configurations for every push. There is a
 different set of Jenkins jobs that periodically tests master against a lot
 more configurations, including Hadoop 2.4.

 On Fri, May 15, 2015 at 6:02 PM, Frederick R Reiss frre...@us.ibm.com
 wrote:

 The PR builder seems to be building against Hadoop 2.3. In the log for
 the most recent successful build (
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32805/consoleFull
 ) I see:


 =
 Building Spark

 =
 [info] Compile with Hive 0.13.1
 [info] Building Spark with these arguments: -Pyarn -Phadoop-2.3
 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver
 ...

 =
 Running Spark unit tests

 =
 [info] Running Spark tests with these arguments: -Pyarn -Phadoop-2.3
 -Dhadoop.version=2.3.0 -Pkinesis-asl test

 Is anyone testing individual pull requests against Hadoop 2.4 or 2.6
 before the code is declared clean?

 Fred


 From: Ted Yu yuzhih...@gmail.com
 To: Andrew Or and...@databricks.com
 Cc: dev@spark.apache.org dev@spark.apache.org
 Date: 05/15/2015 09:29 AM
 Subject: Re: Recent Spark test failures
 --



 Jenkins build against hadoop 2.4 has been unstable recently:

 https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/

 I haven't found the test which hung / failed in recent Jenkins builds.

 But PR builder has several green builds lately:
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/

 Maybe the PR builder doesn't build against hadoop 2.4?

 Cheers

 On Mon, May 11, 2015 at 1:11 PM, Ted Yu *yuzhih...@gmail.com*
 yuzhih...@gmail.com wrote:

Makes sense.

Having high determinism in these tests would make Jenkins build
stable.


On Mon, May 11, 2015 at 1:08 PM, Andrew Or *and...@databricks.com*
and...@databricks.com wrote:
   Hi Ted,

   Yes, those two options can be useful, but in general I think the
   standard to set is that tests should never fail. It's actually the 
 worst if
   tests fail sometimes but not others, because we can't reproduce them
   deterministically. Using -M and -A actually tolerates flaky tests to 
 a
   certain extent, and I would prefer to instead increase the 
 determinism in
   these tests.

   -Andrew

   2015-05-08 17:56 GMT-07:00 Ted Yu *yuzhih...@gmail.com*
   yuzhih...@gmail.com:
   Andrew:
  Do you think the -M and -A options described here can be used
  in test runs ?
   http://scalatest.org/user_guide/using_the_runner

  Cheers

  On Wed, May 6, 2015 at 5:41 PM, Andrew Or 
  *and...@databricks.com* and...@databricks.com wrote:
 Dear all,

 I'm sure you have all noticed that the Spark tests have
 been fairly
 unstable recently. I wanted to share a tool that I use to
 track which tests
 have been failing most often in order to prioritize fixing
   

[jira] [Updated] (SPARK-5632) not able to resolve dot('.') in field name

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5632:
---
Fix Version/s: 1.4.0

 not able to resolve dot('.') in field name
 --

 Key: SPARK-5632
 URL: https://issues.apache.org/jira/browse/SPARK-5632
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 1.2.0, 1.3.0
 Environment: Spark cluster: EC2 m1.small + Spark 1.2.0
 Cassandra cluster: EC2 m3.xlarge + Cassandra 2.1.2
Reporter: Lishu Liu
Priority: Blocker
 Fix For: 1.4.0


 My Cassandra table task_trace has a field sm.result which contains a dot in the 
 name, so SQL tried to look up sm instead of the full name 'sm.result'. 
 Here is my code: 
 {code}
 scala> import org.apache.spark.sql.cassandra.CassandraSQLContext
 scala> val cc = new CassandraSQLContext(sc)
 scala> val task_trace = cc.jsonFile("/task_trace.json")
 scala> task_trace.registerTempTable("task_trace")
 scala> cc.setKeyspace("cerberus_data_v4")
 scala> val res = cc.sql("SELECT received_datetime, task_body.cerberus_id, 
 task_body.sm.result FROM task_trace WHERE task_id = 
 'fff7304e-9984-4b45-b10c-0423a96745ce'")
 res: org.apache.spark.sql.SchemaRDD = 
 SchemaRDD[57] at RDD at SchemaRDD.scala:108
 == Query Plan ==
 == Physical Plan ==
 java.lang.RuntimeException: No such struct field sm in cerberus_batch_id, 
 cerberus_id, couponId, coupon_code, created, description, domain, expires, 
 message_id, neverShowAfter, neverShowBefore, offerTitle, screenshots, 
 sm.result, sm.task, startDate, task_id, url, uuid, validationDateTime, 
 validity
 {code}
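 A common workaround for this class of problem is to backtick-quote the field so the analyzer 
 treats sm.result as a single name rather than a nested path; whether that resolves correctly 
 across the Spark and connector versions involved is exactly what this ticket tracks, so treat 
 the following as a sketch only:
 {code}
 scala> // Sketch only: quote the dotted field name with backticks.
 scala> val res = cc.sql("SELECT received_datetime, task_body.cerberus_id, " +
      |   "task_body.`sm.result` FROM task_trace " +
      |   "WHERE task_id = 'fff7304e-9984-4b45-b10c-0423a96745ce'")
 {code}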
 The full schema looks like this:
 {code}
 scala> task_trace.printSchema()
 root
  \|-- received_datetime: long (nullable = true)
  \|-- task_body: struct (nullable = true)
  \|\|-- cerberus_batch_id: string (nullable = true)
  \|\|-- cerberus_id: string (nullable = true)
  \|\|-- couponId: integer (nullable = true)
  \|\|-- coupon_code: string (nullable = true)
  \|\|-- created: string (nullable = true)
  \|\|-- description: string (nullable = true)
  \|\|-- domain: string (nullable = true)
  \|\|-- expires: string (nullable = true)
  \|\|-- message_id: string (nullable = true)
  \|\|-- neverShowAfter: string (nullable = true)
  \|\|-- neverShowBefore: string (nullable = true)
  \|\|-- offerTitle: string (nullable = true)
  \|\|-- screenshots: array (nullable = true)
  \|\|\|-- element: string (containsNull = false)
  \|\|-- sm.result: struct (nullable = true)
  \|\|\|-- cerberus_batch_id: string (nullable = true)
  \|\|\|-- cerberus_id: string (nullable = true)
  \|\|\|-- code: string (nullable = true)
  \|\|\|-- couponId: integer (nullable = true)
  \|\|\|-- created: string (nullable = true)
  \|\|\|-- description: string (nullable = true)
  \|\|\|-- domain: string (nullable = true)
  \|\|\|-- expires: string (nullable = true)
  \|\|\|-- message_id: string (nullable = true)
  \|\|\|-- neverShowAfter: string (nullable = true)
  \|\|\|-- neverShowBefore: string (nullable = true)
  \|\|\|-- offerTitle: string (nullable = true)
  \|\|\|-- result: struct (nullable = true)
  \|\|\|\|-- post: struct (nullable = true)
  \|\|\|\|\|-- alchemy_out_of_stock: struct (nullable = true)
  \|\|\|\|\|\|-- ci: double (nullable = true)
  \|\|\|\|\|\|-- value: boolean (nullable = true)
  \|\|\|\|\|-- meta: struct (nullable = true)
  \|\|\|\|\|\|-- None_tx_value: array (nullable = true)
  \|\|\|\|\|\|\|-- element: string (containsNull = 
 false)
  \|\|\|\|\|\|-- exceptions: array (nullable = true)
  \|\|\|\|\|\|\|-- element: string (containsNull = 
 false)
  \|\|\|\|\|\|-- no_input_value: array (nullable = true)
  \|\|\|\|\|\|\|-- element: string (containsNull = 
 false)
  \|\|\|\|\|\|-- not_mapped: array (nullable = true)
  \|\|\|\|\|\|\|-- element: string (containsNull = 
 false)
  \|\|\|\|\|\|-- not_transformed: array (nullable = true)
  \|\|\|\|\|\|\|-- element: array (containsNull = 
 false)
  \|\|\|\|\|\|\|\|-- element: string (containsNull 
 = false)
  \|\|\|\|\|-- now_price_checkout: struct (nullable = true)
  \|\|\|\|\|\|-- ci: double (nullable = true)
  \|\|\|\|\|\|-- value: double (nullable = true)
  \|\|\|\|\|-- shipping_price: struct (nullable = true)
  \|\|\|\|\|\|-- ci: double

Re: Recent Spark test failures

2015-05-15 Thread Patrick Wendell
Sorry premature send:

The PR builder currently builds against Hadoop 2.3
https://github.com/apache/spark/blob/master/dev/run-tests#L54

We can set this to whatever we want. 2.2 might make sense since it's the
default in our published artifacts.

- Patrick

On Fri, May 15, 2015 at 11:53 AM, Patrick Wendell pwend...@gmail.com
wrote:

 The PR builder currently builds against Hadoop 2.3.

 - Patrick

 On Fri, May 15, 2015 at 11:40 AM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Funny thing, since I asked this question in a PR a few minutes ago...

 Ignoring the rotation suggestion for a second, can the PR builder at
 least cover hadoop 2.2? That's the actual version used to create the
 official Spark artifacts for maven, and the oldest version Spark supports
 for YARN.

  Kinda the same argument as the "why do we build with java 7 when we
  support java 6" discussion we had recently.


 On Fri, May 15, 2015 at 11:34 AM, Ted Yu yuzhih...@gmail.com wrote:

 bq. would be prohibitive to build all configurations for every push

 Agreed.

 Can the PR builder rotate testing against hadoop 2.3, 2.4, 2.6 and 2.7 (each
 test run still uses one hadoop profile)?

 This way we would have some coverage for each of the major hadoop
 releases.

 Cheers

 On Fri, May 15, 2015 at 10:30 AM, Sean Owen so...@cloudera.com wrote:

 You all are looking only at the pull request builder. It just does one
 build to sanity-check a pull request, since that already takes 2 hours and
 would be prohibitive to build all configurations for every push. There is a
 different set of Jenkins jobs that periodically tests master against a lot
 more configurations, including Hadoop 2.4.

 On Fri, May 15, 2015 at 6:02 PM, Frederick R Reiss frre...@us.ibm.com
 wrote:

 The PR builder seems to be building against Hadoop 2.3. In the log for
 the most recent successful build (
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32805/consoleFull
 ) I see:


 =
 Building Spark

 =
 [info] Compile with Hive 0.13.1
 [info] Building Spark with these arguments: -Pyarn -Phadoop-2.3
 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver
 ...

 =
 Running Spark unit tests

 =
 [info] Running Spark tests with these arguments: -Pyarn -Phadoop-2.3
 -Dhadoop.version=2.3.0 -Pkinesis-asl test

 Is anyone testing individual pull requests against Hadoop 2.4 or 2.6
 before the code is declared clean?

 Fred


 From: Ted Yu yuzhih...@gmail.com
 To: Andrew Or and...@databricks.com
 Cc: dev@spark.apache.org dev@spark.apache.org
 Date: 05/15/2015 09:29 AM
 Subject: Re: Recent Spark test failures
 --



 Jenkins build against hadoop 2.4 has been unstable recently:

 https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/

 I haven't found the test which hung / failed in recent Jenkins builds.

 But PR builder has several green builds lately:
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/

 Maybe the PR builder doesn't build against hadoop 2.4?

 Cheers

 On Mon, May 11, 2015 at 1:11 PM, Ted Yu *yuzhih...@gmail.com*
 yuzhih...@gmail.com wrote:

Makes sense.

Having high determinism in these tests would make Jenkins build
stable.


On Mon, May 11, 2015 at 1:08 PM, Andrew Or *and...@databricks.com*
and...@databricks.com wrote:
   Hi Ted,

   Yes, those two options can be useful, but in general I think
   the standard to set is that tests should never fail. It's actually 
 the
   worst if tests fail sometimes but not others, because we can't 
 reproduce
   them deterministically. Using -M and -A actually tolerates flaky 
 tests to a
   certain extent, and I would prefer to instead increase the 
 determinism in
   these tests.

   -Andrew

   2015-05-08 17:56 GMT-07:00 Ted Yu *yuzhih...@gmail.com*
   yuzhih...@gmail.com:
   Andrew:
  Do you think the -M and -A options described here can be
  used in test runs ?
   http://scalatest.org/user_guide/using_the_runner

  Cheers

  On Wed, May 6, 2015 at 5:41 PM, Andrew

[jira] [Updated] (SPARK-6595) DataFrame self joins with MetastoreRelations fail

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6595:
---
Fix Version/s: 1.4.0
   1.3.2

 DataFrame self joins with MetastoreRelations fail
 -

 Key: SPARK-6595
 URL: https://issues.apache.org/jira/browse/SPARK-6595
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Blocker
 Fix For: 1.3.2, 1.4.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4128) Create instructions on fully building Spark in Intellij

2015-05-14 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544118#comment-14544118
 ] 

Patrick Wendell commented on SPARK-4128:


Thanks for bringing this back up [~srowen]. When you removed this I reached out, 
but we discussed offline and concluded that in IDEA 14 maybe it wasn't 
necessary (because IIRC you had gotten it working without making these changes). 
But maybe it is still needed.

 Create instructions on fully building Spark in Intellij
 ---

 Key: SPARK-4128
 URL: https://issues.apache.org/jira/browse/SPARK-4128
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.2.0


 With some of our more complicated modules, I'm not sure whether Intellij 
 correctly understands all source locations. Also, we might require specifying 
 some profiles for the build to work directly. We should document clearly how 
 to start with vanilla Spark master and get the entire thing building in 
 Intellij.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: How to link code pull request with JIRA ID?

2015-05-14 Thread Patrick Wendell
Yeah I wrote the original script and I intentionally made it easy for
other projects to use (you'll just need to tweak some variables at the
top). You just need somewhere to run it... we were using a jenkins
cluster to run it every 5 minutes.

BTW - I looked and there is one instance where it hard-codes the
string "SPARK-", but that should be easy to change. I'm happy to
review a patch that makes that prefix a variable.

https://github.com/apache/spark/blob/master/dev/github_jira_sync.py#L71

- Patrick

On Thu, May 14, 2015 at 8:45 AM, Josh Rosen rosenvi...@gmail.com wrote:
 Spark PRs didn't always used to handle the JIRA linking.  We used to rely
 on a Jenkins job that ran
 https://github.com/apache/spark/blob/master/dev/github_jira_sync.py.  We
 switched this over to Spark PRs at a time when the Jenkins GitHub Pull
 Request Builder plugin was having flakiness issues, but as far as I know
 that old script should still work.

 On Wed, May 13, 2015 at 9:40 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 There's no magic to it. We're doing the same, except Josh automated it in
 the PR dashboard he created.

 https://spark-prs.appspot.com/

 Nick

 On Wed, May 13, 2015 at 6:20 PM Markus Weimer mar...@weimo.de wrote:

  Hi,
 
  how did you set this up? Over in the REEF incubation project, we
  painstakingly create the forwards- and backwards links despite having
  the IDs in the PR descriptions...
 
  Thanks!
 
  Markus
 
 
  On 2015-05-13 11:56, Ted Yu wrote:
   Subproject tag should follow SPARK JIRA number.
   e.g.
  
   [SPARK-5277][SQL] ...
  
   Cheers
  
   On Wed, May 13, 2015 at 11:50 AM, Stephen Boesch java...@gmail.com
  wrote:
  
   following up from Nicholas, it is
  
   [SPARK-12345] Your PR description
  
   where 12345 is the jira number.
  
  
   One thing I tend to forget is when/where to include the subproject tag
  e.g.
[MLLIB]
  
  
   2015-05-13 11:11 GMT-07:00 Nicholas Chammas 
 nicholas.cham...@gmail.com
  :
  
   That happens automatically when you open a PR with the JIRA key in
 the
  PR
   title.
  
   On Wed, May 13, 2015 at 2:10 PM Chandrashekhar Kotekar 
   shekhar.kote...@gmail.com wrote:
  
   Hi,
  
   I am new to open source contribution and trying to understand the
   process
   starting from pulling code to uploading patch.
  
   I have managed to pull code from GitHub. In JIRA I saw that each
 JIRA
   issue
   is connected with pull request. I would like to know how do people
   attach
   pull request details to JIRA issue?
  
   Thanks,
   Chandrash3khar Kotekar
   Mobile - +91 8600011455
  
  
  
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-7297) Make timeline more discoverable

2015-05-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7297.

Resolution: Fixed

 Make timeline more discoverable
 ---

 Key: SPARK-7297
 URL: https://issues.apache.org/jira/browse/SPARK-7297
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker

 Currently there is a small drop down triangle. I showed this to many people 
 and they said they couldn't easily find it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-14 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544290#comment-14544290
 ] 

Patrick Wendell commented on SPARK-7563:


/cc [~joshrosen] I think this is caused by the output committer change you 
worked on. Probably just a corner case here when executors die in the spark 
shell.

 OutputCommitCoordinator.stop() should only be executed in driver
 

 Key: SPARK-7563
 URL: https://issues.apache.org/jira/browse/SPARK-7563
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo)
 Spark 1.3.1 Release
Reporter: Hailong Wen

 I am from the IBM Platform Symphony team and we are integrating Spark 1.3.1 with 
 EGO (a resource management product).
 In EGO we use a fine-grained dynamic allocation policy, and each Executor will 
 exit after its tasks are all done. When testing *spark-shell*, we find that 
 when an executor of the first job exits, it will stop the OutputCommitCoordinator, which 
 results in all future jobs failing. Details are as follows:
 We got the following error in the executor when submitting a job in *spark-shell* 
 the second time (the first job submission was successful):
 {noformat}
 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to 
 OutputCommitCoordinator: 
 akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator
 Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: 
 ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), 
 Path(/user/OutputCommitCoordinator)]
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
 at 
 akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
 at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
 at 
 scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
 at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
 at 
 akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
 at 
 akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
 at 
 scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
 at 
 scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
 at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
 at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
 at 
 akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
 at akka.dispatch.Mailbox.run(Mailbox.scala:220)
 at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 {noformat}
 And on the driver side, we see a log message indicating that the 
 OutputCommitCoordinator is stopped after the first submission:
 {noformat}
 15/05/11 04:01:23 INFO 
 spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: 
 OutputCommitCoordinator stopped!
 {noformat}
 We examined the code of OutputCommitCoordinator and found that the executor will 
 reuse the ref of the driver's OutputCommitCoordinatorActor. So when an executor 
 exits, it will eventually call SparkEnv.stop():
 {noformat}
   private[spark] def stop() {
 isStopped = true
 pythonWorkers.foreach { case(key, worker) => worker.stop() }
 Option(httpFileServer).foreach(_.stop())
 mapOutputTracker.stop()
 shuffleManager.stop()
 broadcastManager.stop()
 blockManager.stop
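// What follows is a minimal, self-contained sketch of the failure mode described above,
// using hypothetical names (Coordinator, Env) rather than Spark's actual classes: a
// coordinator shared by reference should only be stopped by its owner (the driver);
// an executor that merely holds a reference to it must skip the stop call.
class Coordinator {
  @volatile private var stopped = false
  def canCommit(taskId: Int): Boolean = { require(!stopped, "coordinator stopped"); true }
  def stop(): Unit = stopped = true
}

class Env(coordinator: Coordinator, isDriver: Boolean) {
  def stop(): Unit = {
    // ... stop the other per-process components here ...
    if (isDriver) coordinator.stop()  // guard: executors do not stop the shared coordinator
  }
}

object Demo extends App {
  val shared      = new Coordinator
  val driverEnv   = new Env(shared, isDriver = true)
  val executorEnv = new Env(shared, isDriver = false)
  executorEnv.stop()            // an executor exits: the shared coordinator survives
  println(shared.canCommit(1))  // later jobs can still ask for commit permission
  driverEnv.stop()              // only the driver's shutdown stops it for real
}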

[jira] [Updated] (SPARK-7063) Update lz4 for Java 7 to avoid: when lz4 compression is used, it causes core dump

2015-05-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7063:
---
Target Version/s: 1.5.0  (was: 2+)

 Update lz4 for Java 7 to avoid: when lz4 compression is used, it causes core 
 dump
 -

 Key: SPARK-7063
 URL: https://issues.apache.org/jira/browse/SPARK-7063
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
 Environment: IBM JDK
Reporter: Jenny MA
Priority: Minor

 This issue was initially noticed when using the IBM JDK. Below please find the stack 
 trace of the issue, which is caused by violating the rules of a JNI critical section. 
 #0 0x00314340f3cb in raise () from 
 /service/pmrs/45638/20/lib64/libpthread.so.0
 #1 0x7f795b0323be in j9dump_create () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9prt27.so
 #2 0x7f795a88ba2a in doSystemDump () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so
 #3 0x7f795b0405d5 in j9sig_protect () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9prt27.so
 #4 0x7f795a88a1fd in runDumpFunction () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so
 #5 0x7f795a88dbab in runDumpAgent () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so
 #6 0x7f795a8a1c49 in triggerDumpAgents () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so
 #7 0x7f795a4518fe in doTracePoint () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9trc27.so
 #8 0x7f795a45210e in j9Trace () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9trc27.so
 #9 0x7f79590e46e1 in 
 MM_StandardAccessBarrier::jniReleasePrimitiveArrayCritical(J9VMThread*, 
 _jarray*, void*, int) ()
 from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9gc27.so
 #10 0x7f7938bc397c in 
 Java_net_jpountz_lz4_LZ4JNI_LZ4_1compress_1limitedOutput () from 
 /service/pmrs/45638/20/tmp/liblz4-java7155003924599399415.so
 #11 0x7f795b707149 in VMprJavaSendNative () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9vm27.so
 #12 0x in ?? ()
 This issue is introduced by a bug in net.jpountz.lz4.lz4-1.2.0.jar and fixed in 
 version 1.3.0. The Sun JDK / OpenJDK does not complain about it, but it triggers an 
 assertion failure when the IBM JDK is used. Here is the link to the fix: 
 https://github.com/jpountz/lz4-java/commit/07229aa2f788229ab4f50379308297f428e3d2d2
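 As a stopgap until the bundled dependency is upgraded, a build that hits this crash could 
 pin the fixed upstream artifact itself. A minimal sketch, assuming an sbt build (the 
 coordinates below are the upstream lz4-java artifact, not anything Spark-specific):
 {code}
 // build.sbt (sketch): force the lz4-java release that contains the critical-section fix
 dependencyOverrides += "net.jpountz.lz4" % "lz4" % "1.3.0"
 {code}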
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7063) Update lz4 for Java 7 to avoid: when lz4 compression is used, it causes core dump

2015-05-14 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544359#comment-14544359
 ] 

Patrick Wendell commented on SPARK-7063:


[~srowen] so I think maybe we can pull this into master now, given that we'll 
drop Java 1.6 in Spark 1.5 (?)

 Update lz4 for Java 7 to avoid: when lz4 compression is used, it causes core 
 dump
 -

 Key: SPARK-7063
 URL: https://issues.apache.org/jira/browse/SPARK-7063
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
 Environment: IBM JDK
Reporter: Jenny MA
Priority: Minor

 This issue was initially noticed when using the IBM JDK. Below please find the stack 
 trace of the issue, which is caused by violating the rules of a JNI critical section. 
 #0 0x00314340f3cb in raise () from 
 /service/pmrs/45638/20/lib64/libpthread.so.0
 #1 0x7f795b0323be in j9dump_create () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9prt27.so
 #2 0x7f795a88ba2a in doSystemDump () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so
 #3 0x7f795b0405d5 in j9sig_protect () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9prt27.so
 #4 0x7f795a88a1fd in runDumpFunction () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so
 #5 0x7f795a88dbab in runDumpAgent () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so
 #6 0x7f795a8a1c49 in triggerDumpAgents () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9dmp27.so
 #7 0x7f795a4518fe in doTracePoint () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9trc27.so
 #8 0x7f795a45210e in j9Trace () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9trc27.so
 #9 0x7f79590e46e1 in 
 MM_StandardAccessBarrier::jniReleasePrimitiveArrayCritical(J9VMThread*, 
 _jarray*, void*, int) ()
 from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9gc27.so
 #10 0x7f7938bc397c in 
 Java_net_jpountz_lz4_LZ4JNI_LZ4_1compress_1limitedOutput () from 
 /service/pmrs/45638/20/tmp/liblz4-java7155003924599399415.so
 #11 0x7f795b707149 in VMprJavaSendNative () from 
 /service/pmrs/45638/20/opt/ibm/biginsights/jdk/jre/lib/amd64/compressedrefs/libj9vm27.so
 #12 0x in ?? ()
 This issue is introduced by a bug in net.jpountz.lz4.lz4-1.2.0.jar and fixed in 
 version 1.3.0. The Sun JDK / OpenJDK does not complain about it, but it triggers an 
 assertion failure when the IBM JDK is used. Here is the link to the fix: 
 https://github.com/jpountz/lz4-java/commit/07229aa2f788229ab4f50379308297f428e3d2d2
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7622) Test Jira

2015-05-13 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7622:
--

 Summary: Test Jira
 Key: SPARK-7622
 URL: https://issues.apache.org/jira/browse/SPARK-7622
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7622) Test Jira

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7622:


 Test Jira
 -

 Key: SPARK-7622
 URL: https://issues.apache.org/jira/browse/SPARK-7622
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7622) Test Jira

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell closed SPARK-7622.
--
Resolution: Invalid

 Test Jira
 -

 Key: SPARK-7622
 URL: https://issues.apache.org/jira/browse/SPARK-7622
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [IMPORTANT] Committers please update merge script

2015-05-13 Thread Patrick Wendell
Hi All - unfortunately the fix introduced another bug, which is that
fixVersion was not updated properly. I've updated the script and had
one other person test it.

So committers, please pull from master again. Thanks!

- Patrick

On Tue, May 12, 2015 at 6:25 PM, Patrick Wendell pwend...@gmail.com wrote:
 Due to an ASF infrastructure change (bug?) [1] the default JIRA
 resolution status has switched to Pending Closed. I've made a change
 to our merge script to coerce the correct status of Fixed when
 resolving [2]. Please upgrade the merge script to master.

 I've manually corrected JIRA's that were closed with the incorrect
 status. Let me know if you have any issues.

 [1] https://issues.apache.org/jira/browse/INFRA-9646

 [2] 
 https://github.com/apache/spark/commit/1b9e434b6c19f23a01e9875a3c1966cd03ce8e2d

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-7622) Test Jira

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7622.

Resolution: Invalid

 Test Jira
 -

 Key: SPARK-7622
 URL: https://issues.apache.org/jira/browse/SPARK-7622
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.6.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7622) Test Jira

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7622:


 Test Jira
 -

 Key: SPARK-7622
 URL: https://issues.apache.org/jira/browse/SPARK-7622
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.6.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7531) Install GPG on Jenkins machines

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7531:
---
Fix Version/s: 1.4.0

 Install GPG on Jenkins machines
 ---

 Key: SPARK-7531
 URL: https://issues.apache.org/jira/browse/SPARK-7531
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: Patrick Wendell
Assignee: shane knapp
 Fix For: 1.4.0


 This one is also required for us to cut regular snapshot releases from 
 Jenkins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Change for submitting to yarn in 1.3.1

2015-05-13 Thread Patrick Wendell
Hey Chester,

Thanks for sending this. It's very helpful to have this list.

The reason we made the Client API private was that it was never
intended to be used by third parties programmatically and we don't
intend to support it in its current form as a stable API. We thought
the fact that it was for internal use would be obvious since it
accepts arguments as a string array of CL args. It was always intended
for command line use and the stable API was the command line.

When we migrated to the Launcher library we figured we had covered most of
the use cases, on the off chance someone was using the Client. It
appears we regressed one feature, which was a clean way to get the app
ID.

The items you list here 2-6 all seem like new feature requests rather
than a regression caused by us making that API private.

I think the way to move forward is for someone to design a proper
long-term stable API for the things you mentioned here. That could be
done, for instance, by extending the Launcher library. Marcelo would be a
natural fit to help with this effort since he was heavily involved in both
YARN support and the launcher. So I'm curious to hear his opinion on
how best to move forward.

I do see how apps that run Spark would benefit from having a control
plane for querying status, both on YARN and elsewhere.

- Patrick

On Wed, May 13, 2015 at 5:44 AM, Chester At Work ches...@alpinenow.com wrote:
 Patrick
  There are several things we need, some of them already mentioned in the 
 mailing list before.

 I haven't looked at the SparkLauncher code, but here are a few things we need 
 from our perspective for the Spark YARN Client.

  1) Client should not be private (unless an alternative is provided), so we 
 can call it directly.
  2) We need a way to stop the running YARN app programmatically (the PR 
 is already submitted).
  3) Before we start the Spark job, we should have a callback to the 
 application that provides the YARN container capacity (number of cores 
 and max memory), so the Spark program will not set values beyond the maximums (PR 
 submitted).
  4) The callback could be in the form of YARN app listeners, which fire 
 based on YARN status changes (start, in progress, failure, complete, etc.); the 
 application can react to these events (covered in the PR).

  5) The YARN client passes arguments to the Spark program as main-program 
 arguments, and we have experienced problems when passing a very large argument due to 
 the length limit. For example, we serialize the argument to JSON, encode it, and 
 then parse it back as an argument; for datasets with wide columns we run 
 into the limit. Therefore, an alternative way of passing additional, larger 
 arguments is needed. We are experimenting with passing the args via an 
 established akka messaging channel.

  6) The Spark YARN client in yarn-cluster mode right now is essentially a 
 batch job with no communication once it is launched. We need to establish a 
 communication channel so that logs, errors, status updates, progress bars, 
 execution stages, etc. can be displayed on the application side. We added an 
 akka communication channel for this (working on the PR).

 Combined with other items in this list, we are able to redirect print 
 and error statements to the application log (outside of the Hadoop cluster) and show a 
 Spark-UI-equivalent progress bar via a Spark listener. We can show YARN 
 progress via a YARN app listener before Spark has started, and status can be 
 updated during job execution.

 We are also experimenting with long-running jobs with additional Spark 
 commands and interactions via this channel.


  Chester









 Sent from my iPad

 On May 12, 2015, at 20:54, Patrick Wendell pwend...@gmail.com wrote:

 Hey Kevin and Ron,

 So is the main shortcoming of the launcher library the inability to
 get an app ID back from YARN? Or are there other issues here that
 fundamentally regress things for you.

 It seems like adding a way to get back the appID would be a reasonable
 addition to the launcher.

 - Patrick

 On Tue, May 12, 2015 at 12:51 PM, Marcelo Vanzin van...@cloudera.com wrote:
 On Tue, May 12, 2015 at 11:34 AM, Kevin Markey kevin.mar...@oracle.com
 wrote:

 I understand that SparkLauncher was supposed to address these issues, but
 it really doesn't.  Yarn already provides indirection and an arm's length
 transaction for starting Spark on a cluster. The launcher introduces yet
 another layer of indirection and dissociates the Yarn Client from the
 application that launches it.


 Well, not fully. The launcher was supposed to solve how to launch a Spark
 app programmatically, but in the first version nothing was added to
 actually gather information about the running app. It's also limited in the
 way it works because of Spark's limitations (one context per JVM, etc).

 Still, adding things like this is something that is definitely in scope
 for the launcher library; information such as the app id can be useful for the
 code launching the app, not just in yarn mode. We just

[jira] [Updated] (SPARK-7622) Test Jira

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7622:
---
Fix Version/s: (was: 1.6.0)

 Test Jira
 -

 Key: SPARK-7622
 URL: https://issues.apache.org/jira/browse/SPARK-7622
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7622) Test Jira

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7622:


 Test Jira
 -

 Key: SPARK-7622
 URL: https://issues.apache.org/jira/browse/SPARK-7622
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6568) spark-shell.cmd --jars option does not accept the jar that has space in its path

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6568:
---
Fix Version/s: 1.4.0

 spark-shell.cmd --jars option does not accept the jar that has space in its 
 path
 

 Key: SPARK-6568
 URL: https://issues.apache.org/jira/browse/SPARK-6568
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.3.0
 Environment: Windows 8.1
Reporter: Masayoshi TSUZUKI
Assignee: Masayoshi TSUZUKI
 Fix For: 1.4.0


 spark-shell.cmd --jars option does not accept a jar that has a space in its 
 path.
 The path of a jar sometimes contains spaces on Windows.
 {code}
 bin\spark-shell.cmd --jars "C:\Program Files\some\jar1.jar"
 {code}
 this gets
 {code}
 Exception in thread "main" java.net.URISyntaxException: Illegal character in 
 path at index 10: C:/Program Files/some/jar1.jar
 {code}
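 The underlying problem can be reproduced with plain JVM APIs: a path containing a space is 
 not a legal URI, but converting it through java.nio.file percent-encodes the space. A 
 minimal sketch (not Spark code; the path below is made up):
 {code}
 import java.net.URI
 import java.nio.file.Paths

 object UriDemo extends App {
   val raw = "/opt/spark libs/jar1.jar"   // a path containing a space
   // new URI("file://" + raw)            // would throw URISyntaxException: Illegal character in path
   println(Paths.get(raw).toUri)          // file:///opt/spark%20libs/jar1.jar
 }
 {code}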



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7561) Install Junit Attachment Plugin on Jenkins

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7561:


 Install Junit Attachment Plugin on Jenkins
 --

 Key: SPARK-7561
 URL: https://issues.apache.org/jira/browse/SPARK-7561
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: Patrick Wendell
Assignee: shane knapp
 Fix For: 1.4.0


 As part of SPARK-7560 I'd like to just attach the test output file to the 
 Jenkins build. This is nicer than requiring someone to have an SSH login to the 
 master node.
 Currently we gzip the logs, copy them to the master, and then delete them on 
 the worker.
 https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L132
 Instead I think we can just gzip them and then have the attachment plugin add 
 them to the build. But it would require installing this plug-in to see if we 
 can get it working.
 [~shaneknapp] not sure how willing you are to install plug-ins on Jenkins, 
 but this one would be awesome if it's doable and we can get it working.
 https://wiki.jenkins-ci.org/display/JENKINS/JUnit+Attachments+Plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7526) Specify ip of RBackend, MonitorServer and RRDD Socket server

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7526:
---
Fix Version/s: 1.4.0

 Specify ip of RBackend, MonitorServer and RRDD Socket server
 

 Key: SPARK-7526
 URL: https://issues.apache.org/jira/browse/SPARK-7526
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Reporter: Weizhong
Assignee: Weizhong
Priority: Minor
 Fix For: 1.4.0


 These R processes are only used to communicate with the JVM process locally, so 
 binding to localhost is more reasonable than a wildcard IP.
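 A minimal sketch of the difference using plain java.net APIs (names and ports here are 
 hypothetical): binding to the loopback address keeps a server socket reachable only from 
 the local machine, while the default constructor binds to the wildcard address and is 
 reachable from other hosts:
 {code}
 import java.net.{InetAddress, ServerSocket}

 object BindDemo extends App {
   // Loopback-only: other machines cannot connect to this socket.
   val loopbackOnly  = new ServerSocket(0, 50, InetAddress.getLoopbackAddress)
   // Wildcard bind: listens on all interfaces, reachable from the network.
   val allInterfaces = new ServerSocket(0)
   println(s"loopback: ${loopbackOnly.getInetAddress}, wildcard: ${allInterfaces.getInetAddress}")
   loopbackOnly.close()
   allInterfaces.close()
 }
 {code}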



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7303) push down project if possible when the child is sort

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7303:
---
Fix Version/s: 1.4.0

 push down project if possible when the child is sort
 

 Key: SPARK-7303
 URL: https://issues.apache.org/jira/browse/SPARK-7303
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Fei Wang
 Fix For: 1.4.0


 Optimize the case of `project(_, sort)`. An example is:
 `select key from (select * from testData order by key) t`
 Optimize it from
 ```
 == Parsed Logical Plan ==
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
'Project [*]
 'UnresolvedRelation [testData], None
 == Analyzed Logical Plan ==
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
Project [key#0,value#1]
 Subquery testData
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
 == Optimized Logical Plan ==
 Project [key#0]
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Physical Plan ==
 Project [key#0]
  Sort [key#0 ASC], true
   Exchange (RangePartitioning [key#0 ASC], 5), []
PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 ```
 to 
 ```
 == Parsed Logical Plan ==
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
'Project [*]
 'UnresolvedRelation [testData], None
 == Analyzed Logical Plan ==
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
Project [key#0,value#1]
 Subquery testData
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Optimized Logical Plan ==
 Sort [key#0 ASC], true
  Project [key#0]
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 == Physical Plan ==
 Sort [key#0 ASC], true
  Exchange (RangePartitioning [key#0 ASC], 5), []
   Project [key#0]
PhysicalRDD [key#0,value#1], MapPartitionsRDD[1] 
 ```
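 A toy illustration of the intended effect with plain Scala collections (not Catalyst 
 code): when the sort only needs columns that the projection keeps, projecting first 
 gives the same answer while sorting narrower rows:
 {code}
 object PushdownDemo extends App {
   case class Row(key: Int, value: String)
   val data = Seq(Row(3, "c"), Row(1, "a"), Row(2, "b"))

   val projectAfterSort  = data.sortBy(_.key).map(_.key)   // project(sort(data))
   val projectBeforeSort = data.map(_.key).sorted          // sort(project(data))

   assert(projectAfterSort == projectBeforeSort)           // same result, less data sorted
   println(projectBeforeSort)                              // List(1, 2, 3)
 }
 {code}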



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7601) Support Insert into JDBC Datasource

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7601:
---
Fix Version/s: 1.4.0

 Support Insert into JDBC Datasource
 ---

 Key: SPARK-7601
 URL: https://issues.apache.org/jira/browse/SPARK-7601
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Venkata Ramana G
 Fix For: 1.4.0


 Support Insert into the JDBC data source. Following is a usage example:
 {code}
 sqlContext.sql(
   s"""
     |CREATE TEMPORARY TABLE testram1
     |USING org.apache.spark.sql.jdbc
     |OPTIONS (url '$url', dbtable 'testram1', user 'xx', password 'xx', driver 'com.h2.Driver')
   """.stripMargin.replaceAll("\n", " "))
 sqlContext.sql("insert into table testram1 select * from testsrc").show
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7482) Rename some DataFrame API methods in SparkR to match their counterparts in Scala

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7482:
---
Fix Version/s: 1.4.0

 Rename some DataFrame API methods in SparkR to match their counterparts in 
 Scala
 

 Key: SPARK-7482
 URL: https://issues.apache.org/jira/browse/SPARK-7482
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Sun Rui
Assignee: Sun Rui
Priority: Critical
 Fix For: 1.4.0


 This is a re-consideration of how to solve name conflicts. Previously, we renamed 
 API names from the Scala API if there was a name conflict with base or other 
 commonly-used packages. However, from a long-term perspective, this is not good 
 for API stability, because we can't predict name conflicts: for example, what if 
 in the future a name added to the base package conflicts with an API in SparkR? 
 So the better policy is to keep API names the same as Scala's without worrying 
 about name conflicts. When users use SparkR, they should load SparkR as the last 
 package, so that all of its API names are effective. Users can explicitly use :: to 
 refer to hidden names from other packages.
 More discussion can be found at 
 https://issues.apache.org/jira/browse/SPARK-6812



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7589) Make Input Rate in the Streaming page consistent with other pages

2015-05-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7589:
---
Component/s: Streaming

 Make Input Rate in the Streaming page consistent with other pages
 ---

 Key: SPARK-7589
 URL: https://issues.apache.org/jira/browse/SPARK-7589
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Reporter: Shixiong Zhu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7597) Make default doc build avoid search engine indexing

2015-05-13 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7597:
--

 Summary: Make default doc build avoid search engine indexing
 Key: SPARK-7597
 URL: https://issues.apache.org/jira/browse/SPARK-7597
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: Patrick Wendell
Assignee: Patrick Wendell


By default we should add the necessary headers to avoid indexing. This will 
help prevent random personally hosted docs from getting indexed, for instance, nightly 
doc builds. We should gate this behind the PRODUCTION flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Adding/Using More Resolution Types on JIRA

2015-05-12 Thread Patrick Wendell
In Spark we sometimes close issues as something other than Fixed,
and this is an important part of maintaining our JIRA.

The current resolution types we use are the following:

Won't Fix - bug fix or (more often) feature we don't want to add
Invalid - issue is underspecified or not appropriate for a JIRA issue
Duplicate - duplicate of another JIRA
Cannot Reproduce - bug that could not be reproduced
Not A Problem - issue purports to represent a bug, but does not

I would like to propose adding a few new resolutions. This will
require modifying the ASF JIRA, but infra said they are open to
proposals as long as they are considered of broad interest.

My issue with the current set of resolutions is that Won't Fix is a
big catch-all we use for many different things. Most often it's used
for things that aren't even bugs, even though it has Fix in the name.
I'm proposing adding:

Inactive - A feature or bug that has had no activity from users or
developers in a long time
Out of Scope - A feature proposal that is not in scope given the projects goals
Later - A feature not on the immediate roadmap, but potentially of
interest longer term (this one already exists, I'm just proposing to
start using it)

I am in no way proposing changes to the decision making model around
JIRA's, notably that it is consensus based and that all resolutions
are considered tentative and fully reversible.

The benefits I see of this change would be the following:
1. Inactive: A way to clear out inactive/dead JIRA's without
indicating a decision has been made one way or the other.
2. Out of Scope: It more clearly explains closing out-of-scope
features than the generic Won't Fix. Also makes it more clear to
future contributors what is considered in scope for Spark.
3. Later: A way to signal that issues aren't targeted for a near term
version. This would help avoid the mess we have now of like 200+
issues targeted at each version and target version being a very bad
indicator of actual roadmap. An alternative on this one is to have a
version called Later or Parking Lot but not close the issues.

Any thoughts on this?

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Created] (SPARK-7561) Install Junit Attachment Plugin on Jenkins

2015-05-12 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7561:
--

 Summary: Install Junit Attachment Plugin on Jenkins
 Key: SPARK-7561
 URL: https://issues.apache.org/jira/browse/SPARK-7561
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: Patrick Wendell
Assignee: shane knapp


As part of SPARK-7560 I'd like to just attach the test output file to the 
Jenkins build. This is nicer than requiring someone to have an SSH login to the 
master node.

Currently we gzip the logs, copy them to the master, and then delete them on the 
worker.
https://github.com/apache/spark/blob/master/dev/run-tests-jenkins#L132

Instead I think we can just gzip them and then have the attachment plugin add 
them to the build. But it would require installing this plug-in to see if we 
can get it working.

[~shaneknapp] not sure how willing you are to install plug-ins on Jenkins, but 
this one would be awesome if it's doable and we can get it working.

https://wiki.jenkins-ci.org/display/JENKINS/JUnit+Attachments+Plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7558) Log test name when starting and finishing each test

2015-05-12 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7558:
--

 Summary: Log test name when starting and finishing each test
 Key: SPARK-7558
 URL: https://issues.apache.org/jira/browse/SPARK-7558
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Reporter: Patrick Wendell
Assignee: Andrew Or


Right now it's really tough to interpret testing output because logs for 
different tests are interspersed in the same unit-tests.log file. This makes it 
particularly hard to diagnose flaky tests. This would be much easier if we 
logged the test name before and after every test (e.g. Starting test XX, 
Finished test XX). Then you could get right to the logs.

I think one way to do this might be to create a custom test fixture that logs 
the test class name and then mix that into every test suite /cc [~joshrosen] 
for his superb knowledge of Scalatest.
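 A minimal sketch of what such a fixture could look like (hypothetical trait name, not the 
 actual change), using ScalaTest's withFixture hook so every test logs its name at start 
 and finish:
 {code}
 import org.scalatest.{FunSuite, Outcome}

 // Hypothetical mix-in: logs the suite and test name around each test body.
 trait LogTestName extends FunSuite {
   override def withFixture(test: NoArgTest): Outcome = {
     val name = s"${getClass.getSimpleName}.${test.name}"
     println(s"Starting test $name")
     try super.withFixture(test)
     finally println(s"Finished test $name")
   }
 }

 class ExampleSuite extends LogTestName {
   test("addition works") { assert(1 + 1 == 2) }
 }
 {code}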



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7560) Make flaky tests easier to debug

2015-05-12 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7560:
--

 Summary: Make flaky tests easier to debug
 Key: SPARK-7560
 URL: https://issues.apache.org/jira/browse/SPARK-7560
 Project: Spark
  Issue Type: New Feature
  Components: Project Infra, Tests
Reporter: Patrick Wendell


Right now it's really hard for people to even get the logs from a flaky test. 
Once you get the logs, it's very difficult to figure out what logs are 
associated with what tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7558) Log test name when starting and finishing each test

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7558:
---
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-7560

 Log test name when starting and finishing each test
 ---

 Key: SPARK-7558
 URL: https://issues.apache.org/jira/browse/SPARK-7558
 Project: Spark
  Issue Type: Sub-task
  Components: Tests
Reporter: Patrick Wendell
Assignee: Andrew Or

 Right now it's really tough to interpret testing output because logs for 
 different tests are interspersed in the same unit-tests.log file. This makes 
 it particularly hard to diagnose flaky tests. This would be much easier if we 
 logged the test name before and after every test (e.g. Starting test XX, 
 Finished test XX). Then you could get right to the logs.
 I think one way to do this might be to create a custom test fixture that logs 
 the test class name and then mix that into every test suite /cc [~joshrosen] 
 for his superb knowledge of Scalatest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7590) Test Issue to Debug JIRA Problem

2015-05-12 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7590:
--

 Summary: Test Issue to Debug JIRA Problem
 Key: SPARK-7590
 URL: https://issues.apache.org/jira/browse/SPARK-7590
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7590) Test Issue to Debug JIRA Problem

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7590.

Resolution: Fixed

Issue resolved by pull request 5426
[https://github.com/apache/spark/pull/5426]

 Test Issue to Debug JIRA Problem
 

 Key: SPARK-7590
 URL: https://issues.apache.org/jira/browse/SPARK-7590
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.6.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7590) Test Issue to Debug JIRA Problem

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7590:


 Test Issue to Debug JIRA Problem
 

 Key: SPARK-7590
 URL: https://issues.apache.org/jira/browse/SPARK-7590
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.6.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7592) Resolution set to Pending Closed when using PR merge script

2015-05-12 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-7592:
--

 Summary: Resolution set to Pending Closed when using PR merge 
script
 Key: SPARK-7592
 URL: https://issues.apache.org/jira/browse/SPARK-7592
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker


I noticed this was happening. The issue is that the behavior of the ASF JIRA 
silently changed. Now when the Resolve Issue transition occurs, the default 
resolution is Pending Closed. We used to count on the default behavior being 
to set the resolution as Fixed.

The solution is to explicitly set the resolution as Fixed and not count on 
default behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[IMPORTANT] Committers please update merge script

2015-05-12 Thread Patrick Wendell
Due to an ASF infrastructure change (bug?) [1] the default JIRA
resolution status has switched to Pending Closed. I've made a change
to our merge script to coerce the correct status of Fixed when
resolving [2]. Please upgrade the merge script to master.

I've manually corrected JIRA's that were closed with the incorrect
status. Let me know if you have any issues.

[1] https://issues.apache.org/jira/browse/INFRA-9646

[2] 
https://github.com/apache/spark/commit/1b9e434b6c19f23a01e9875a3c1966cd03ce8e2d

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Reopened] (SPARK-7590) Test Issue to Debug JIRA Problem

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7590:


 Test Issue to Debug JIRA Problem
 

 Key: SPARK-7590
 URL: https://issues.apache.org/jira/browse/SPARK-7590
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.6.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7590) Test Issue to Debug JIRA Problem

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7590:


 Test Issue to Debug JIRA Problem
 

 Key: SPARK-7590
 URL: https://issues.apache.org/jira/browse/SPARK-7590
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.6.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7590) Test Issue to Debug JIRA Problem

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7590.

Resolution: Pending Closed

Issue resolved by pull request 5426
[https://github.com/apache/spark/pull/5426]

 Test Issue to Debug JIRA Problem
 

 Key: SPARK-7590
 URL: https://issues.apache.org/jira/browse/SPARK-7590
 Project: Spark
  Issue Type: Bug
Reporter: Patrick Wendell
 Fix For: 1.6.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-5182) Partitioning support for tables created by the data source API

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-5182:


 Partitioning support for tables created by the data source API
 --

 Key: SPARK-5182
 URL: https://issues.apache.org/jira/browse/SPARK-5182
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yin Huai
Assignee: Cheng Lian
Priority: Blocker
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6876) DataFrame.na.replace value support for Python

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-6876.

Resolution: Fixed

 DataFrame.na.replace value support for Python
 -

 Key: SPARK-6876
 URL: https://issues.apache.org/jira/browse/SPARK-6876
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Adrian Wang
 Fix For: 1.4.0


 Scala/Java support is in. We should provide the Python version, similar to 
 what Pandas supports.
 http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.replace.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7435) Make DataFrame.show() consistent with that of Scala and pySpark

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7435:


 Make DataFrame.show() consistent with that of Scala and pySpark
 ---

 Key: SPARK-7435
 URL: https://issues.apache.org/jira/browse/SPARK-7435
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Sun Rui
Assignee: Rekha Joshi
Priority: Critical
 Fix For: 1.4.0


 Currently in SparkR, DataFrame has two methods show() and showDF(). show() 
 prints the DataFrame column names and types and showDF() prints the first 
 numRows rows of a DataFrame.
 In Scala and pySpark, show() is used to print rows of a DataFrame. 
 We'd better keep the API consistent unless there is some important reason not to. So 
 we propose to interchange the names (show() and showDF()) in SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7435) Make DataFrame.show() consistent with that of Scala and pySpark

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-7435.

Resolution: Fixed

 Make DataFrame.show() consistent with that of Scala and pySpark
 ---

 Key: SPARK-7435
 URL: https://issues.apache.org/jira/browse/SPARK-7435
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 1.4.0
Reporter: Sun Rui
Assignee: Rekha Joshi
Priority: Critical
 Fix For: 1.4.0


 Currently in SparkR, DataFrame has two methods show() and showDF(). show() 
 prints the DataFrame column names and types and showDF() prints the first 
 numRows rows of a DataFrame.
 In Scala and pySpark, show() is used to print rows of a DataFrame. 
 We'd better keep the API consistent unless there is some important reason not to. So 
 we propose to interchange the names (show() and showDF()) in SparkR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5182) Partitioning support for tables created by the data source API

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5182.

Resolution: Fixed

 Partitioning support for tables created by the data source API
 --

 Key: SPARK-5182
 URL: https://issues.apache.org/jira/browse/SPARK-5182
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Yin Huai
Assignee: Cheng Lian
Priority: Blocker
 Fix For: 1.4.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-7534) Fix the Stage table when a stage is missing

2015-05-12 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-7534:


 Fix the Stage table when a stage is missing
 ---

 Key: SPARK-7534
 URL: https://issues.apache.org/jira/browse/SPARK-7534
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Web UI
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu
Priority: Minor
 Fix For: 1.4.0


 Just improved the Stage table when a stage is missing.
 Please see the screenshots in https://github.com/apache/spark/pull/6061



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


