[jira] [Commented] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]

2015-08-10 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680885#comment-14680885
 ] 

Patrick Wendell commented on SPARK-7726:


[~srowen] [~dragos] This is cropping up again when trying to create a release 
candidate for Spark 1.5:

https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Release-All-Java7/26/console

 Maven Install Breaks When Upgrading Scala 2.11.2-->[2.11.3 or higher]
 -

 Key: SPARK-7726
 URL: https://issues.apache.org/jira/browse/SPARK-7726
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Assignee: Iulian Dragos
Priority: Blocker
 Fix For: 1.4.0


 This one took a long time to track down. The Maven install phase is part of 
 our release process. It runs the scala:doc target to generate doc jars. 
 Between Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in 
 a way that breaks our build. In both cases it returned an error (there has 
 been a long-running error here that we've always ignored); however, in 2.11.3 
 that error became fatal and failed the entire build process. The upgrade 
 occurred in SPARK-7092. Here is a simple reproduction:
 {code}
 ./dev/change-version-to-2.11.sh
 mvn clean install -pl network/common -pl network/shuffle -DskipTests 
 -Dscala-2.11
 {code} 
 This command exits successfully when Spark is at Scala 2.11.2 and fails with 
 2.11.3 or higher. In either case an error is printed:
 {code}
 [INFO] 
 [INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ 
 spark-network-shuffle_2.11 ---
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56:
  error: not found: type Type
   protected Type type() { return Type.UPLOAD_BLOCK; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37:
  error: not found: type Type
   protected Type type() { return Type.STREAM_HANDLE; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44:
  error: not found: type Type
   protected Type type() { return Type.REGISTER_EXECUTOR; }
 ^
 /Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40:
  error: not found: type Type
   protected Type type() { return Type.OPEN_BLOCKS; }
 ^
 model contains 22 documentable templates
 four errors found
 {code}
 Ideally we'd just dig in and fix this error. Unfortunately it's a very 
 confusing error and I have no idea why it is appearing. I'd propose reverting 
 SPARK-7092 in the meantime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds

2015-08-06 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660796#comment-14660796
 ] 

Patrick Wendell commented on SPARK-1517:


Hey Ryan,

IIRC - the Apache snapshot repository won't let us publish binaries that do not 
have SNAPSHOT in the version number. The reason is that it expects to see 
timestamped snapshots so that its garbage collection mechanism can work. We could 
look at adding sha1 hashes before SNAPSHOT, but I think there is some chance 
this would break their cleanup.

In terms of posting more binaries - I can look at whether Databricks or 
Berkeley might be able to donate S3 resources for this, but it would have to be 
clearly maintained by those organizations and not branded as official Apache 
releases or anything like that.

 Publish nightly snapshots of documentation, maven artifacts, and binary builds
 --

 Key: SPARK-1517
 URL: https://issues.apache.org/jira/browse/SPARK-1517
 Project: Spark
  Issue Type: Improvement
  Components: Build, Project Infra
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Critical

 Should be pretty easy to do with Jenkins. The only thing I can think of that 
 would be tricky is to set up credentials so that jenkins can publish this 
 stuff somewhere on apache infra.
 Ideally we don't want to have to put a private key on every jenkins box 
 (since they are otherwise pretty stateless). One idea is to encrypt these 
 credentials with a passphrase and post them somewhere publicly visible. Then 
 the jenkins build can download the credentials provided we set a passphrase 
 in an environment variable in jenkins. There may be simpler solutions as well.
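 As a rough sketch of that idea (the tool choice, file names, and the CREDS_PASSPHRASE 
 variable below are assumptions for illustration, not what the build actually uses):
 {code}
 # One-time, on a trusted machine: encrypt the credentials with a passphrase and
 # post the encrypted file somewhere publicly visible.
 gpg --symmetric --cipher-algo AES256 --output credentials.tar.gpg credentials.tar

 # In the Jenkins job: decrypt using the passphrase from an environment variable.
 gpg --batch --yes --passphrase "$CREDS_PASSPHRASE" \
     --output credentials.tar --decrypt credentials.tar.gpg
 {code}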



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds

2015-08-06 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660420#comment-14660420
 ] 

Patrick Wendell commented on SPARK-1517:


Hey Ryan,

For the maven snapshot releases - unfortunately we are constrained by maven's 
own SNAPSHOT version format, which doesn't allow encoding anything other than 
the timestamp. It's just not supported in their SNAPSHOT mechanism. However, 
one thing we could look at is whether we can align the timestamp with the time of 
the actual spark commit, rather than the time of publication of the SNAPSHOT 
release. I'm not sure if maven lets you provide a custom timestamp when 
publishing. If we had that feature users could look at the Spark commit log and 
do some manual association.
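For reference, the snapshot repository itself rewrites SNAPSHOT into a timestamp 
plus build number when an artifact is deployed, so a published file ends up named 
roughly like this (the name and date are illustrative):
{code}
spark-core_2.10-1.5.0-20150806.031500-12.jar   # 1.5.0-SNAPSHOT deployed 2015-08-06, build 12
{code}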

For the binaries, the reason why the same commit appears multiple times is that 
we do the build every four hours and always publish the latest one even if it's 
a duplicate. However, this could be modified pretty easily to just avoid 
double-publishing the same commit if there hasn't been any code change. Maybe 
create a JIRA for this?

In terms of how many older versions are available, the scripts we use for this 
have a tunable retention window. Right now I'm only keeping the last 4 builds; 
we could probably extend it to something like 10 builds. However, at some point 
I'm likely to run out of space in my ASF user account. Since the binaries are 
quite large, I don't think it's feasible to keep all past builds, at least not 
using ASF infrastructure. We have 3000 commits in a typical Spark release, and it's 
a few gigs for each binary build.

 Publish nightly snapshots of documentation, maven artifacts, and binary builds
 --

 Key: SPARK-1517
 URL: https://issues.apache.org/jira/browse/SPARK-1517
 Project: Spark
  Issue Type: Improvement
  Components: Build, Project Infra
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Critical

 Should be pretty easy to do with Jenkins. The only thing I can think of that 
 would be tricky is to set up credentials so that jenkins can publish this 
 stuff somewhere on apache infra.
 Ideally we don't want to have to put a private key on every jenkins box 
 (since they are otherwise pretty stateless). One idea is to encrypt these 
 credentials with a passphrase and post them somewhere publicly visible. Then 
 the jenkins build can download the credentials provided we set a passphrase 
 in an environment variable in jenkins. There may be simpler solutions as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Avoiding unnecessary build changes until tests are in better shape

2015-08-05 Thread Patrick Wendell
Hey All,

Was wondering if people would be willing to avoid merging build
changes until we have put the tests in better shape. The reason is
that build changes are the most likely to cause downstream issues with
the test matrix and it's very difficult to reverse engineer which
patches caused which problems when the tests are not in a stable
state. For instance, the updates to Hive 1.2.1 caused cascading
failures that have lasted several days now, and in the meantime a few
other build-related patches were also merged - as these pile up it
gets harder for us to have confidence that those other patches didn't
introduce problems.

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: How to help for 1.5 release?

2015-08-04 Thread Patrick Wendell
Hey Meihua,

If you are a user of Spark, one thing that is really helpful is to run
Spark 1.5 on your workload and report any issues, performance
regressions, etc.

- Patrick

On Mon, Aug 3, 2015 at 11:49 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
 I think you can start from here
 https://issues.apache.org/jira/browse/SPARK/fixforversion/12332078/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel

 Thanks
 Best Regards

 On Tue, Aug 4, 2015 at 12:02 PM, Meihua Wu rotationsymmetr...@gmail.com
 wrote:

 I think the team is preparing for the 1.5 release. Anything to help with
 the QA, testing etc?

 Thanks,

 MW



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: PSA: Maven 3.3.3 now required to build

2015-08-03 Thread Patrick Wendell
Yeah the best bet is to use ./build/mvn --force (otherwise we'll still
use your system maven).
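
For example, a typical invocation would look something like this (the goals and
-DskipTests flag are just one common combination):
{code}
# --force makes the wrapper download and use its own Maven rather than the system one
./build/mvn --force -DskipTests clean package
{code}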

- Patrick

On Mon, Aug 3, 2015 at 1:26 PM, Sean Owen so...@cloudera.com wrote:
 That statement is true for Spark 1.4.x. But you've reminded me that I
 failed to update this doc for 1.5, to say Maven 3.3.3 is required.
 Patch coming up.

 On Mon, Aug 3, 2015 at 9:12 PM, Guru Medasani gdm...@gmail.com wrote:
 Thanks Sean. Reason I asked this is, in Building Spark documentation of
 1.4.1, I still see this.

 https://spark.apache.org/docs/latest/building-spark.html

 Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+.

 But I noticed the following warnings from the build of Spark version
 1.5.0-snapshot. So I was wondering if the changes you mentioned relate to
 newer versions of Spark or for 1.4.1 version as well.

 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion
 failed with message:
 Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.

 Guru Medasani
 gdm...@gmail.com

 On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:

 Using ./build/mvn should always be fine. Your local mvn is fine too if
 it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users
 on OS X out there will have, by the way.

 On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:

 Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT
 this morning.

 [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion
 failed with message:
 Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.

 Should we be using maven 3.3.3 locally or build/mvn starting from Spark
 1.4.1 or Spark version 1.5?

 Guru Medasani
 gdm...@gmail.com



 On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:

 If you use build/mvn or are already using Maven 3.3.3 locally (i.e.
 via brew on OS X), then this won't affect you, but I wanted to call
 attention to https://github.com/apache/spark/pull/7852 which makes
 Maven 3.3.3 the minimum required to build Spark. This heads off
 problems from some behavior differences that Patrick and I observed
 between 3.3 and 3.2 last week, on top of the dependency reduced POM
 glitch from the 1.4.1 release window.

 Again all you need to do is use build/mvn if you don't already have
 the latest Maven installed and all will be well.

 Sean

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Created] (SPARK-9547) Allow testing pull requests with different Hadoop versions

2015-08-02 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-9547:
--

 Summary: Allow testing pull requests with different Hadoop versions
 Key: SPARK-9547
 URL: https://issues.apache.org/jira/browse/SPARK-9547
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell


Similar to SPARK-9545 we should allow testing different Hadoop profiles in the 
PRB.
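
For context, selecting a Hadoop profile in a local build looks roughly like this 
(the profile and version values are only examples):
{code}
build/mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests package
{code}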



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it

2015-08-02 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-9545:
---
Issue Type: Improvement  (was: Bug)

 Run Maven tests in pull request builder if title has [maven-test] in it
 -

 Key: SPARK-9545
 URL: https://issues.apache.org/jira/browse/SPARK-9545
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell

 We have infrastructure now in the build tooling for running maven tests, but 
 it's not actually used anywhere. With a very minor change we can support 
 running maven tests if the pull request title has maven-test in it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it

2015-08-02 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-9545:
--

 Summary: Run Maven tests in pull request builder if title has 
[maven-test] in it
 Key: SPARK-9545
 URL: https://issues.apache.org/jira/browse/SPARK-9545
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell


We have infrastructure now in the build tooling for running maven tests, but 
it's not actually used anywhere. With a very minor change we can support 
running maven tests if the pull request title has maven-test in it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-08-01 Thread Patrick Wendell
Hey All,

I got it up and running - it was a newly surfaced bug in the build scripts.

- Patrick

On Wed, Jul 29, 2015 at 6:05 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
 Hey Patrick,

 Any update on this front please?

 Thanks,
 Bharath

 On Fri, Jul 24, 2015 at 8:38 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Bharath,

 There was actually an incompatible change to the build process that
 broke several of the Jenkins builds. This should be patched up in the
 next day or two and nightly builds will resume.

 - Patrick

 On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar
 reachb...@gmail.com wrote:
  I noticed the last (1.5) build has a timestamp of 16th July. Have
  nightly
  builds been discontinued since then?
 
  Thanks,
  Bharath
 
  On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Hi All,
 
  This week I got around to setting up nightly builds for Spark on
  Jenkins. I'd like feedback on these and if it's going well I can merge
  the relevant automation scripts into Spark mainline and document it on
  the website. Right now I'm doing:
 
  1. SNAPSHOT's of Spark master and release branches published to ASF
  Maven snapshot repo:
 
 
 
  https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
 
  These are usable by adding this repository in your build and using a
  snapshot version (e.g. 1.3.2-SNAPSHOT).
 
  2. Nightly binary package builds and doc builds of master and release
  versions.
 
  http://people.apache.org/~pwendell/spark-nightly/
 
  These build 4 times per day and are tagged based on commits.
 
  If anyone has feedback on these please let me know.
 
  Thanks!
  - Patrick
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Should spark-ec2 get its own repo?

2015-07-31 Thread Patrick Wendell
Hey All,

I've mostly kept quiet since I am not very active in maintaining this
code anymore. However, it is a bit odd that the project is
split-brained with a lot of the code being on github and some in the
Spark repo.

If the consensus is to migrate everything to github, that seems okay
with me. I would vouch for having user continuity, for instance still
have a shim ec2/spark-ec2 script that could perhaps just download
and unpack the real script from github.

- Patrick

On Fri, Jul 31, 2015 at 2:13 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 Yes - It is still in progress, but I have just not gotten time to get to
 this. I think getting the repo moved from mesos to amplab in the codebase by
 1.5 should be possible.

 Thanks
 Shivaram

 On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:

 PS is this still in progress? it feels like something that would be
 good to do before 1.5.0, if it's going to happen soon.

 On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
 shiva...@eecs.berkeley.edu wrote:
  Yeah I'll send a note to the mesos dev list just to make sure they are
  informed.
 
  Shivaram
 
  On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:
 
  I agree it's worth informing Mesos devs and checking that there are no
  big objections. I presume Shivaram is plugged in enough to Mesos that
  there won't be any surprises there, and that the project would also
  agree with moving this Spark-specific bit out. they may also want to
  leave a pointer to the new location in the mesos repo of course.
 
  I don't think it is something that requires a formal vote. It's not a
  question of ownership -- neither Apache nor the project PMC owns the
  code. I don't think it's different from retiring or removing any other
  code.
 
 
 
 
 
  On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan mri...@gmail.com
  wrote:
   If I am not wrong, since the code was hosted within mesos project
    repo, I assume (at least part of it) is owned by mesos project and so
   its PMC ?
  
   - Mridul
  
   On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman
   shiva...@eecs.berkeley.edu wrote:
   There is technically no PMC for the spark-ec2 project (I guess we
   are
   kind
   of establishing one right now). I haven't heard anything from the
   Spark
   PMC
   on the dev list that might suggest a need for a vote so far. I will
   send
   another round of email notification to the dev list when we have a
   JIRA
   / PR
   that actually moves the scripts (right now the only thing that
   changed
   is
   the location of some scripts in mesos/ to amplab/).
  
   Thanks
   Shivaram
  
 
 



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Data source aliasing

2015-07-30 Thread Patrick Wendell
Yeah this could make sense - allowing data sources to register a short
name. What mechanism did you have in mind? To use the jar service loader?

The only issue is that there could be conflicts since many of these are
third party packages. If the same name were registered twice I'm not sure
what the best behavior would be. Ideally in my mind if the same shortname
were registered twice we'd force the user to use a fully qualified name and
say the short name is ambiguous.
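
To sketch the idea (all names below are made up for illustration; this is not an
existing Spark API):
{code}
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Hypothetical SPI: each data source package ships an implementation of this trait
// plus a META-INF/services/DataSourceShortName file naming it.
trait DataSourceShortName {
  def shortName: String       // e.g. "avro"
  def providerClass: String   // fully qualified implementation class
}

object DataSourceRegistry {
  private val providers = ServiceLoader.load(classOf[DataSourceShortName]).asScala.toSeq

  // Unique short names resolve to their provider, unknown names are treated as
  // fully qualified class names, and duplicates are rejected so the user has to
  // spell out the full class name.
  def lookup(name: String): String = providers.filter(_.shortName == name) match {
    case Seq(only) => only.providerClass
    case Seq()     => name
    case _         => sys.error(s"Data source short name '$name' is ambiguous; use the fully qualified class name")
  }
}
{code}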

Patrick
On Jul 30, 2015 9:44 AM, Joseph Batchik josephbatc...@gmail.com wrote:

 Hi all,

 There are now starting to be a lot of data source packages for Spark. An
 annoyance I see is that I have to type in the full class name like:

 sqlContext.read.format("com.databricks.spark.avro").load(path).

 Spark internally has formats such as "parquet" and "jdbc" registered and
 it would be nice to be able just to type in "avro", "redshift", etc. as
 well. Would it be a good idea to use something like a service loader to
 allow data sources defined in other packages to register themselves with
 Spark? I think that this would make it easier for end users. I would be
 interested in adding this, please let me know what you guys think.

 - Joe





[jira] [Resolved] (SPARK-9423) Why do every other spark committer keep suggesting to use spark-submit script

2015-07-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-9423.

Resolution: Invalid

 Why do every other spark committer keep suggesting to use spark-submit script
 ---

 Key: SPARK-9423
 URL: https://issues.apache.org/jira/browse/SPARK-9423
 Project: Spark
  Issue Type: Question
  Components: Deploy
Affects Versions: 1.3.1
Reporter: nirav patel

 I see that on spark forum and stackoverflow people keep suggesting to use 
 spark-submit.sh script as a way (only way) to launch spark jobs? Are we still 
 living in application server monolithic world where I need to run startup.sh 
 ? What if spark application is long running context that serves multiple 
 requests? What if user just don't want to use script? They want to embed 
 spark as a service in their application. 
 Please STOP suggesting user to use spark-submit script as an alternative. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9423) Why do every other spark committer keep suggesting to use spark-submit script

2015-07-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645495#comment-14645495
 ] 

Patrick Wendell commented on SPARK-9423:


This is not a valid issue for JIRA (we use JIRA for project bugs and feature 
tracking). Please send an email to the spark-users list. Thanks.

 Why do every other spark committer keep suggesting to use spark-submit script
 ---

 Key: SPARK-9423
 URL: https://issues.apache.org/jira/browse/SPARK-9423
 Project: Spark
  Issue Type: Question
  Components: Deploy
Affects Versions: 1.3.1
Reporter: nirav patel

 I see that on spark forum and stackoverflow people keep suggesting to use 
 spark-submit.sh script as a way (only way) to launch spark jobs? Are we still 
 living in application server monolithic world where I need to run startup.sh 
 ? What if spark application is long running context that serves multiple 
 requests? What if user just don't want to use script? They want to embed 
 spark as a service in their application. 
 Please STOP suggesting user to use spark-submit script as an alternative. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: ReceiverTrackerSuite failing in master build

2015-07-28 Thread Patrick Wendell
Thanks ted for pointing this out. CC to Ryan and TD

On Tue, Jul 28, 2015 at 8:25 AM, Ted Yu yuzhih...@gmail.com wrote:
 Hi,
 I noticed that ReceiverTrackerSuite is failing in master Jenkins build for
 both hadoop profiles.

 The failure seems to start with:
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3104/

 FYI

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Protocol for build breaks

2015-07-25 Thread Patrick Wendell
Hi All,

If there is a build break (i.e. a compile issue or consistently
failing test) that somehow makes it into master, the best protocol is:

1. Revert the offending patch.
2. File a JIRA and assign it to the committer of the offending patch.
The JIRA should contain links to broken builds.

It's not worth spending any time trying to figure out how to fix it,
or blocking on tracking down the commit author. This is because every
hour that we have the PRB broken is a major cost in terms of developer
productivity.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Created] (SPARK-9304) Improve backwards compatibility of SPARK-8401

2015-07-24 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-9304:
--

 Summary: Improve backwards compatibility of SPARK-8401
 Key: SPARK-9304
 URL: https://issues.apache.org/jira/browse/SPARK-9304
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Patrick Wendell
Assignee: Michael Allman
Priority: Critical


In SPARK-8401 a backwards incompatible change was made to the scala 2.11 build 
process. It would be good to add scripts with the older names to avoid breaking 
compatibility for harnesses or other automated builds that build for Scala 
2.11. These can just be one-line shell scripts with a comment explaining they are 
there for backwards compatibility purposes.

/cc [~srowen]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [ANNOUNCE] Nightly maven and package builds for Spark

2015-07-24 Thread Patrick Wendell
Hey Bharath,

There was actually an incompatible change to the build process that
broke several of the Jenkins builds. This should be patched up in the
next day or two and nightly builds will resume.

- Patrick

On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar
reachb...@gmail.com wrote:
 I noticed the last (1.5) build has a timestamp of 16th July. Have nightly
 builds been discontinued since then?

 Thanks,
 Bharath

 On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hi All,

 This week I got around to setting up nightly builds for Spark on
 Jenkins. I'd like feedback on these and if it's going well I can merge
 the relevant automation scripts into Spark mainline and document it on
 the website. Right now I'm doing:

 1. SNAPSHOT's of Spark master and release branches published to ASF
 Maven snapshot repo:


 https://repository.apache.org/content/repositories/snapshots/org/apache/spark/

 These are usable by adding this repository in your build and using a
 snapshot version (e.g. 1.3.2-SNAPSHOT).

 2. Nightly binary package builds and doc builds of master and release
 versions.

 http://people.apache.org/~pwendell/spark-nightly/

 These build 4 times per day and are tagged based on commits.
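
  For the Maven snapshots in (1), consuming them from an sbt build looks roughly
  like this (the artifact and version are just examples):
  {code}
  resolvers += "Apache Snapshots" at "https://repository.apache.org/content/repositories/snapshots/"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.2-SNAPSHOT"
  {code}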

 If anyone has feedback on these please let me know.

 Thanks!
 - Patrick

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Policy around backporting bug fixes

2015-07-24 Thread Patrick Wendell
Hi All,

A few times I've been asked about backporting - when to backport fix
patches and when not to. Since I have managed this for many of the
past releases, I wanted to point out the way I have been thinking
about it. If we have some consensus I can put it on the wiki.

The trade off when backporting is you get to deliver the fix to people
running older versions (great!), but you risk introducing new or even
worse bugs in maintenance releases (bad!). The decision point is when
you have a bug fix and it's not clear whether it is worth backporting.

I think the following facets are important to consider:
(a) Backports are an extremely valuable service to the community and
should be considered for any bug fix.
(b) Introducing a new bug in a maintenance release must be avoided at
all costs. It over time would erode confidence in our release process.
(c) Distributions or advanced users can always backport risky patches
on their own, if they see fit.

For me, the consequence of these is that we should backport in the
following situations:
- Both the bug and the fix are well understood and isolated. Code
being modified is well tested.
- The bug being addressed is high priority to the community.
- The backported fix does not vary widely from the master branch fix.

We tend to avoid backports in the converse situations:
- The bug or fix are not well understood. For instance, it relates to
interactions between complex components or third party libraries (e.g.
Hadoop libraries). The code is not well tested outside of the
immediate bug being fixed.
- The bug is not clearly a high priority for the community.
- The backported fix is widely different from the master branch fix.

These are clearly subjective criteria, but ones worth considering. I
am always happy to help advise people on specific patches if they want
a sounding board to understand whether it makes sense to backport.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-8703) Add CountVectorizer as a ml transformer to convert document to words count vector

2015-07-24 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8703:
---
Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-8521

 Add CountVectorizer as a ml transformer to convert document to words count 
 vector
 -

 Key: SPARK-8703
 URL: https://issues.apache.org/jira/browse/SPARK-8703
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Reporter: yuhao yang
Assignee: yuhao yang
 Fix For: 1.5.0

   Original Estimate: 24h
  Remaining Estimate: 24h

 Converts a text document to a sparse vector of token counts. Similar to 
 http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
 I can further add an estimator to extract vocabulary from corpus if that's 
 appropriate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8564) Add the Python API for Kinesis

2015-07-23 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8564:
---
Target Version/s: 1.5.0

 Add the Python API for Kinesis
 --

 Key: SPARK-8564
 URL: https://issues.apache.org/jira/browse/SPARK-8564
 Project: Spark
  Issue Type: New Feature
  Components: Streaming
Reporter: Shixiong Zhu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: KinesisStreamSuite failing in master branch

2015-07-19 Thread Patrick Wendell
I think we should just revert this patch on all affected branches. No
reason to leave the builds broken until a fix is in place.

- Patrick

On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen rosenvi...@gmail.com wrote:
 Yep, I emailed TD about it; I think that we may need to make a change to the
 pull request builder to fix this.  Pending that, we could just revert the
 commit that added this.

 On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu yuzhih...@gmail.com wrote:

 Hi,
 I noticed that KinesisStreamSuite fails for both hadoop profiles in master
 Jenkins builds.

 From
 https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console
 :

 KinesisStreamSuite:
 *** RUN ABORTED ***
   java.lang.AssertionError: assertion failed: Kinesis test not enabled,
 should not attempt to get AWS credentials
   at scala.Predef$.assert(Predef.scala:179)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:189)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient$lzycompute(KinesisTestUtils.scala:59)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient(KinesisTestUtils.scala:58)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:121)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:157)
   at
 org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:78)
   at
 org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:45)
   at
 org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
   at
 org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:33)


 FYI



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Foundation policy on releases and Spark nightly builds

2015-07-19 Thread Patrick Wendell
Sean B.,

Thank you for giving a thorough reply. I will work with Sean O. and
see what we can change to make us more in line with the stated policy.

I did some research and it appears that some time between October [1]
and December [2] 2006, this page was modified to include stricter
policy surrounding nightly builds. Actually, the original version of
the policy page encouraged projects to post nightly builds for the
benefit of all developers, just as we have been doing.

If you detect frustration from the Spark community, it's because this
type of situation occurs with some regularity. In this case:

(a) A policy exists from ~10 years ago, presumably because some
project back then had problematic release management practices and so
a policy needed to be created to solve a problem.
(b) The policy is outdated now, and no one is 100% sure why it was
created (likely many of the people are no longer involved in the ASF
who helped craft it).
(c) The steps for how to change it are unclear and there isn't clear
ownership of the policy document.

I think it's unavoidable given the decentralized organization
structure of the ASF, but I just want to be up front about our
perspective and why you might sense some frustration.

[1] 
https://web.archive.org/web/20061020220358/http://www.apache.org/dev/release.html
[2] 
https://web.archive.org/web/20061231050046/http://www.apache.org/dev/release.html

- Patrick

On Tue, Jul 14, 2015 at 10:09 AM, Sean Busbey bus...@cloudera.com wrote:
 Responses inline, with some liberties on ordering.

 On Sun, Jul 12, 2015 at 10:32 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Hey Sean B,

 Would you mind outlining for me how we go about changing this policy -
 I think it's outdated and doesn't make much sense. Ideally I'd like to
 propose a vote to modify the text slightly such that our current
 behavior is seen as complaint. Specifically:




 - Who has the authority to change this document?


 It's foundation level policy, so I'd presume the board needs to. Since it's
 part of our legal position, it might be owned by the legal affairs
 committee[1]. That would mean they could update it without a board
 resolution. (legal-discuss@ could tell you for sure).


 - What concrete steps can I take to change the policy?


 The Legal Affairs Committee is reachable either through their mailing
 list[2] or their issue tracker[3].

 Please be sure to read the entire original document; it explains the
 rationale that has gone into it. You'll need to address the matters raised
 there.



 - You keep mentioning the incubator@ list, why is this the place for
 such policy to be discussed or decided on?



 It can't be decided on the general@incubator list, but there are already
 several relevant parties discussing the matter there. You certainly don't
 *need* to join that conversation, but the participants there have overlap
 with the folks who can ultimately decide the issue. Thus, it may help avoid
 having to repeat things.



 - What is the reasonable amount of time frame in which the policy
 change is likely to be decided?


 I am neither a participant on legal affairs nor the board, so I have no
 idea.


 We've had a few times people from the various parts of the ASF come
 and say we are in violation of a policy. And sometimes other ASF
 people come and then get in a fight on our mailing list, and there is



 Please keep in mind that you are also ASF people, as is the entire Spark
 community (users and all)[4]. Phrasing things in terms of "us" and "them" by
 drawing a distinction on "[they] get in a fight on our mailing list" is not
 helpful.



 back and forth, and it turns out there isn't so much a widely
 followed policy as a doc somewhere that is really old and not actually
 universally followed. It's difficult for us in such situations to know
 how to proceed and how much autonomy we as a PMC have to make
 decisions about our own project.


 Understanding and abiding by ASF legal obligations and policies is the job
 of each project PMC as a part of their formation by the board[5]. If anyone
 in your community has questions about what the project can or can not do
 then it's the job of the PMC to find out proactively (rather than take an "ask
 for forgiveness" approach). Where the existing documentation is unclear or
 where you think it might be out of date, you can often get guidance from
 general@incubator (since it contains a large number of members and folks
 from across foundation projects) or comdev[6] (since their charter includes
 explaining ASF policy). If those resources prove insufficient matters can be
 brought up with either legal-discuss@ or board@.

 If you find out of date documentation that is not ASF policy, you can have
 it removed by notifying the appropriate group (i.e. legal-discuss, comdev,
 or whomever is hosting it).


 [1]: http://apache.org/legal/
 [2]: http://www.apache.org/foundation/mailinglists.html#foundation-legal
 [3]: https://issues.apache.org/jira/browse/LEGAL

Re: Foundation policy on releases and Spark nightly builds

2015-07-19 Thread Patrick Wendell
Hey Sean,

One other thing I'd be okay doing is moving the main text about
nightly builds to the wiki and just have a header called "Nightly
builds" at the end of the downloads page that says "For developers,
Spark maintains nightly builds. More information is available on the
[Spark developer Wiki](link)." I think this would preserve
discoverability while also placing the information on the wiki, which
seems to be the main ask of the policy.

- Patrick

On Sun, Jul 19, 2015 at 2:32 AM, Sean Owen so...@cloudera.com wrote:
 I am going to make an edit to the download page on the web site to
 start, as that much seems uncontroversial. Proposed change:

 Reorder sections to put developer-oriented sections at the bottom,
 including the info on nightly builds:
   Download Spark
   Link with Spark
   All Releases
   Spark Source Code Management
   Nightly Builds

 Change text to emphasize the audience:

 Packages are built regularly off of Spark’s master branch and release
 branches. These provide *Spark developers* access to the bleeding-edge
 of Spark master or the most recent fixes not yet incorporated into a
 maintenance release. *They should not be used by anyone except Spark
 developers, and may be unstable or have serious bugs. End users should
 only use official releases above. Please subscribe to
 dev@spark.apache.org if you are a Spark developer to be aware of
 issues in nightly builds.* Spark nightly packages are available at:

 On Thu, Jul 16, 2015 at 8:21 AM, Sean Owen so...@cloudera.com wrote:
 To move this forward, I think one of two things needs to happen:

 1. Move this guidance to the wiki. Seems that people gathered here
 believe that resolves the issue. Done.

 2. Put disclaimers on the current downloads page. This may resolve the
 issue, but then we bring it up on the right mailing list for
 discussion. It may end up at #1, or may end in a tweak to the policy.

 I can drive either one. Votes on how to proceed?


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-18 Thread Patrick Wendell
+1 from me too

On Sat, Jul 18, 2015 at 3:32 AM, Ted Yu yuzhih...@gmail.com wrote:
 +1 to removing commit messages.



 On Jul 18, 2015, at 1:35 AM, Sean Owen so...@cloudera.com wrote:

 +1 to removing them. Sometimes there are 50+ commits because people
 have been merging from master into their branch rather than rebasing.

 On Sat, Jul 18, 2015 at 8:48 AM, Reynold Xin r...@databricks.com wrote:
 I took a look at the commit messages in git log -- it looks like the
 individual commit messages are not that useful to include, but do make the
 commit messages more verbose. They are usually just a bunch of extremely
 concise descriptions of bug fixes, merges, etc:

cb3f12d [xxx] add whitespace
6d874a6 [xxx] support pyspark for yarn-client

89b01f5 [yyy] Update the unit test to add more cases
275d252 [yyy] Address the comments
7cc146d [yyy] Address the comments
2624723 [yyy] Fix rebase conflict
45befaa [yyy] Update the unit test
bbc1c9c [yyy] Fix checkpointing doesn't retain driver port issue


 Anybody against removing those from the merge script so the log looks
 cleaner? If nobody feels strongly about this, we can just create a JIRA to
 remove them, and only keep the author names.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Slight API incompatibility caused by SPARK-4072

2015-07-15 Thread Patrick Wendell
One related note here is that we have a Java version of this that is
an abstract class - in the doc it says that it exists more or less to
allow for binary compatibility (it says it's for Java users, but
really Scala could use this also):

https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/JavaSparkListener.java#L23

I think it might be reasonable that the Scala trait provides only
source compatibility and the Java class provides binary compatibility.
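
To make the failure mode concrete, here is a made-up illustration (the names are
invented; this is not the actual SparkListener code):
{code}
// --- listener library, version 1 (what the application was compiled against) ---
trait Listener {
  def onEvent(event: String): Unit
}

// --- application jar, compiled against version 1 ---
class MyListener extends Listener {
  override def onEvent(event: String): Unit = println(event)
}

// --- listener library, version 2 adds a method with a default body ---
// trait Listener {
//   def onEvent(event: String): Unit
//   def onOtherEvent(event: String): Unit = { }   // new in v2
// }
//
// Recompiling MyListener against v2 works (source compatible), but running the old
// MyListener bytecode against the v2 jar and calling onOtherEvent throws
// java.lang.AbstractMethodError, because the default body is only mixed into
// implementations when they are recompiled.
{code}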

- Patrick

On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote:
 Hey all,

 Just noticed this when some of our tests started to fail. SPARK-4072 added a
 new method to the SparkListener trait, and even though it has a default
 implementation, it doesn't seem like that applies retroactively.

 Namely, if you have an existing, compiled app that has an implementation of
 SparkListener, that app won't work on 1.5 without a recompile. You'll get
 something like this:

 java.lang.AbstractMethodError
   at
 org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:62)
   at
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at 
 org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
   at
 org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
   at
 org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1235)
   at
 org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)


 Now I know that SparkListener is marked as @DeveloperApi, but is this
 something we should care about? Seems like adding methods to traits is just
 as backwards-incompatible as adding new methods to Java interfaces.


 --
 Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Slight API incompatibility caused by SPARK-4072

2015-07-15 Thread Patrick Wendell
Actually the java one is a concrete class.

On Wed, Jul 15, 2015 at 12:14 PM, Patrick Wendell pwend...@gmail.com wrote:
 One related note here is that we have a Java version of this that is
 an abstract class - in the doc it says that it exists more or less to
 allow for binary compatibility (it says it's for Java users, but
 really Scala could use this also):

 https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/JavaSparkListener.java#L23

 I think it might be reasonable that the Scala trait provides only
 source compatibility and the Java class provides binary compatibility.

 - Patrick

 On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote:
 Hey all,

 Just noticed this when some of our tests started to fail. SPARK-4072 added a
 new method to the SparkListener trait, and even though it has a default
 implementation, it doesn't seem like that applies retroactively.

 Namely, if you have an existing, compiled app that has an implementation of
 SparkListener, that app won't work on 1.5 without a recompile. You'll get
 something like this:

 java.lang.AbstractMethodError
   at
 org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:62)
   at
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at
 org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
   at 
 org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
   at
 org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
   at
 org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1235)
   at
 org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)


 Now I know that SparkListener is marked as @DeveloperApi, but is this
 something we should care about? Seems like adding methods to traits is just
 as backwards-incompatible as adding new methods to Java interfaces.


 --
 Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Announcing Spark 1.4.1!

2015-07-15 Thread Patrick Wendell
Hi All,

I'm happy to announce the Spark 1.4.1 maintenance release.
We recommend all users on the 1.4 branch upgrade to
this release, which contains several important bug fixes.

Download Spark 1.4.1 - http://spark.apache.org/downloads.html
Release notes - http://spark.apache.org/releases/spark-release-1-4-1.html
Comprehensive list of fixes - http://s.apache.org/spark-1.4.1

Thanks to the 85 developers who worked on this release!

Please contact me directly for errata in the release notes.

- Patrick

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Announcing Spark 1.4.1!

2015-07-15 Thread Patrick Wendell
Hi All,

I'm happy to announce the Spark 1.4.1 maintenance release.
We recommend all users on the 1.4 branch upgrade to
this release, which contains several important bug fixes.

Download Spark 1.4.1 - http://spark.apache.org/downloads.html
Release notes - http://spark.apache.org/releases/spark-release-1-4-1.html
Comprehensive list of fixes - http://s.apache.org/spark-1.4.1

Thanks to the 85 developers who worked on this release!

Please contact me directly for errata in the release notes.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-7920) Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7920:
---
Labels:   (was: spark.tc)

 Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).
 

 Key: SPARK-7920
 URL: https://issues.apache.org/jira/browse/SPARK-7920
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.3.1, 1.4.0
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.4.0


 The MLlib ChiSqSelector class is not serializable, and so the example in the 
 ChiSqSelector documentation fails.  Also, that example is missing the import 
 of ChiSqSelector.  ChiSqSelector should just extend Serializable.
 Steps:
 1. Locate the MLlib ChiSqSelector documentation example.
 2. Fix the example by adding an import statement for ChiSqSelector.
 3. Attempt to run - notice that it will fail due to ChiSqSelector not being 
 serializable. 
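 For reference, step 2 boils down to something like this (the argument value is only 
 an example):
 {code}
 // The import the documentation example was missing:
 import org.apache.spark.mllib.feature.ChiSqSelector

 // Select the 50 most informative features (illustrative value).
 val selector = new ChiSqSelector(50)
 {code}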



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8927) Doc format wrong for some config descriptions

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8927:
---
Labels:   (was: spark.tc)

 Doc format wrong for some config descriptions
 -

 Key: SPARK-8927
 URL: https://issues.apache.org/jira/browse/SPARK-8927
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.4.0
Reporter: Jon Alter
Assignee: Jon Alter
Priority: Trivial
 Fix For: 1.4.2, 1.5.0


 In the docs, a couple descriptions of configuration (under "Network") are not 
 inside <td></td> and are being displayed immediately under the section title 
 instead of in their row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7985) Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7985:
---
Labels:   (was: spark.tc)

 Remove fittingParamMap references. Update ML Doc Estimator, Transformer, 
 and Param examples.
 

 Key: SPARK-7985
 URL: https://issues.apache.org/jira/browse/SPARK-7985
 Project: Spark
  Issue Type: Bug
  Components: Documentation, ML
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.4.0


 Update ML Doc's Estimator, Transformer, and Param Scala & Java examples to 
 use model.extractParamMap instead of model.fittingParamMap, which no longer 
 exists.  Remove all other references to fittingParamMap throughout Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7969) Drop method on Dataframes should handle Column

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7969:
---
Labels:   (was: spark.tc)

 Drop method on Dataframes should handle Column
 --

 Key: SPARK-7969
 URL: https://issues.apache.org/jira/browse/SPARK-7969
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 1.4.0
Reporter: Olivier Girardot
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.4.1, 1.5.0


 For now the drop method available on Dataframe since Spark 1.4.0 only accepts 
 a column name (as a string); it should also accept a Column as input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7830) ML doc cleanup: logreg, classification link

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7830:
---
Labels:   (was: spark.tc)

 ML doc cleanup: logreg, classification link
 ---

 Key: SPARK-7830
 URL: https://issues.apache.org/jira/browse/SPARK-7830
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, MLlib
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Trivial
 Fix For: 1.4.0


 Add logistic regression to the list of Multiclass Classification Supported 
 Methods in the MLlib Classification and Regression documentation, and fix 
 related broken link.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8343) Improve the Spark Streaming Guides

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8343:
---
Labels:   (was: spark.tc)

 Improve the Spark Streaming Guides
 --

 Key: SPARK-8343
 URL: https://issues.apache.org/jira/browse/SPARK-8343
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, Streaming
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.4.1, 1.5.0


 Improve the Spark Streaming Guides by fixing broken links, rewording 
 confusing sections, fixing typos, adding missing words, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7977) Disallow println

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7977:
---
Labels: starter  (was: spark.tc starter)

 Disallow println
 

 Key: SPARK-7977
 URL: https://issues.apache.org/jira/browse/SPARK-7977
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Reporter: Reynold Xin
Assignee: Jon Alter
  Labels: starter
 Fix For: 1.5.0


 Very often we see pull requests that added println from debugging, but the 
 author forgot to remove it before code review.
 We can use the regex checker to disallow println. For legitimate use of 
 println, we can then disable the rule where they are used.
 Add to scalastyle-config.xml file:
 {code}
   <check customId="println" level="error"
          class="org.scalastyle.scalariform.TokenChecker" enabled="true">
     <parameters><parameter name="regex">^println$</parameter></parameters>
     <customMessage><![CDATA[Are you sure you want to println? If yes, wrap
 the code block with
   // scalastyle:off println
   println(...)
   // scalastyle:on println]]></customMessage>
   </check>
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8570) Improve MLlib Local Matrix Documentation.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8570:
---
Labels:   (was: spark.tc)

 Improve MLlib Local Matrix Documentation.
 -

 Key: SPARK-8570
 URL: https://issues.apache.org/jira/browse/SPARK-8570
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, MLlib
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.5.0


 Update the MLlib Data Types Local Matrix documentation as follows:
 -Include information on sparse matrices.
 -Add sparse matrix examples to the existing Scala and Java examples.
 -Add Python examples for both dense and sparse matrices (currently no Python 
 examples exist for the Local Matrix section).
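 For illustration, a minimal Scala sketch of the two local matrix constructors the examples would cover (values are arbitrary; the sparse matrix is in CSC form):
 {code}
 import org.apache.spark.mllib.linalg.{Matrices, Matrix}

 // Dense 3x2 matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)), stored column-major
 val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

 // Sparse 3x2 matrix in CSC form: column pointers, row indices, non-zero values
 val sm: Matrix = Matrices.sparse(3, 2, Array(0, 1, 3), Array(0, 2, 1), Array(9.0, 6.0, 8.0))
 {code}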



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7883) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7883:
---
Labels:   (was: spark.tc)

 Fixing broken trainImplicit example in MLlib Collaborative Filtering 
 documentation.
 ---

 Key: SPARK-7883
 URL: https://issues.apache.org/jira/browse/SPARK-7883
 Project: Spark
  Issue Type: Bug
  Components: Documentation, MLlib
Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.0
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Trivial
 Fix For: 1.0.3, 1.1.2, 1.2.3, 1.3.2, 1.4.0


 The trainImplicit Scala example near the end of the MLlib Collaborative 
 Filtering documentation refers to an ALS.trainImplicit function signature 
 that does not exist.  Rather than add an extra function, let's just fix the 
 example.
 Currently, the example refers to a function that would have the following 
 signature: 
 def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, alpha: 
 Double) : MatrixFactorizationModel
 Instead, let's change the example to refer to this function, which does exist 
 (notice the addition of the lambda parameter):
 def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: 
 Double, alpha: Double) : MatrixFactorizationModel
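 For illustration, a hedged sketch of a call matching the existing signature (the ratings data and parameter values are arbitrary, and an existing SparkContext sc is assumed):
 {code}
 import org.apache.spark.mllib.recommendation.{ALS, Rating}

 // Implicit-feedback ratings (e.g. view counts); values are placeholders
 val ratings = sc.parallelize(Seq(Rating(1, 10, 3.0), Rating(1, 20, 1.0), Rating(2, 10, 5.0)))

 // Note the lambda argument, which the current doc example omits
 val model = ALS.trainImplicit(ratings, rank = 10, iterations = 10, lambda = 0.01, alpha = 1.0)
 {code}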



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7426) spark.ml AttributeFactory.fromStructField should allow other NumericTypes

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7426:
---
Labels:   (was: spark.tc)

 spark.ml AttributeFactory.fromStructField should allow other NumericTypes
 -

 Key: SPARK-7426
 URL: https://issues.apache.org/jira/browse/SPARK-7426
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Joseph K. Bradley
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.5.0


 It currently only supports DoubleType, but it should support others, at least 
 for fromStructField (importing into ML attribute format, rather than 
 exporting).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8639:
---
Labels:   (was: spark.tc)

 Instructions for executing jekyll in docs/README.md could be slightly more 
 clear, typo in docs/api.md
 -

 Key: SPARK-8639
 URL: https://issues.apache.org/jira/browse/SPARK-8639
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Rosstin Murphy
Assignee: Rosstin Murphy
Priority: Trivial
 Fix For: 1.4.1, 1.5.0


 In docs/README.md, the text states around line 31
 Execute 'jekyll' from the 'docs/' directory. Compiling the site with Jekyll 
 will create a directory called '_site' containing index.html as well as the 
 rest of the compiled files.
 It might be more clear if we said
 Execute 'jekyll build' from the 'docs/' directory to compile the site. 
 Compiling the site with Jekyll will create a directory called '_site' 
 containing index.html as well as the rest of the compiled files.
 In docs/api.md: Here you can API docs for Spark and its submodules.
 should be something like: Here you can read API docs for Spark and its 
 submodules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7357) Improving HBaseTest example

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7357:
---
Labels:   (was: spark.tc)

 Improving HBaseTest example
 ---

 Key: SPARK-7357
 URL: https://issues.apache.org/jira/browse/SPARK-7357
 Project: Spark
  Issue Type: Improvement
  Components: Examples
Affects Versions: 1.3.1
Reporter: Jihong MA
Assignee: Jihong MA
Priority: Minor
 Fix For: 1.5.0

   Original Estimate: 2m
  Remaining Estimate: 2m

  Minor improvement to the HBaseTest example: when HBase-related configurations, 
  e.g. the zookeeper quorum, zookeeper client port, or zookeeper.znode.parent, are 
  not set to the default (localhost:2181), the connection to zookeeper might hang, as 
  shown in the following stack:
 15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, 
 connectString=xxx.xxx.xxx:2181 sessionTimeout=9 
 watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to 
 server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown 
 error)
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to 
 xxx.xxx.xxx/9.30.94.121:2181, initiating session
 15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete 
 on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, 
 negotiated timeout = 4
 15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper 
 is null
  This is because hbase-site.xml is not placed on the Spark classpath. 
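  As a hedged sketch (hostnames and values below are placeholders), the example could set the relevant ZooKeeper properties explicitly when hbase-site.xml is not on the classpath:
  {code}
  import org.apache.hadoop.hbase.HBaseConfiguration

  val conf = HBaseConfiguration.create()
  // Only needed when hbase-site.xml is absent from the classpath; values are examples
  conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com")
  conf.set("hbase.zookeeper.property.clientPort", "2181")
  conf.set("zookeeper.znode.parent", "/hbase")
  {code}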



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8746:
---
Labels: documentation test  (was: documentation spark.tc test)

 Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)
 --

 Key: SPARK-8746
 URL: https://issues.apache.org/jira/browse/SPARK-8746
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Christian Kadner
Assignee: Christian Kadner
Priority: Trivial
  Labels: documentation, test
 Fix For: 1.4.1, 1.5.0

   Original Estimate: 1h
  Remaining Estimate: 1h

 The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) 
 describes how to generate golden answer files for new hive comparison test 
 cases. However the download link for the Hive 0.13.1 jars points to 
 https://hive.apache.org/downloads.html but none of the linked mirror sites 
 still has the 0.13.1 version.
 We need to update the link to 
 https://archive.apache.org/dist/hive/hive-0.13.1/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6485:
---
Labels:   (was: spark.tc)

 Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
 --

 Key: SPARK-6485
 URL: https://issues.apache.org/jira/browse/SPARK-6485
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Xiangrui Meng

 We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in 
 PySpark. Internally, we can use DataFrames for serialization.
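 For reference, a minimal sketch of the existing Scala API that the proposed PySpark wrappers would mirror (assumes an existing SparkContext sc; values are arbitrary):
 {code}
 import org.apache.spark.mllib.linalg.Vectors
 import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRowMatrix, MatrixEntry, RowMatrix}

 // RowMatrix: an RDD of rows without meaningful row indices
 val rowMat = new RowMatrix(sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0))))

 // CoordinateMatrix: an RDD of (i, j, value) entries, convertible to the other forms
 val coordMat = new CoordinateMatrix(sc.parallelize(Seq(MatrixEntry(0, 0, 1.0), MatrixEntry(1, 1, 4.0))))
 val indexedMat: IndexedRowMatrix = coordMat.toIndexedRowMatrix()
 {code}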



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7744) Distributed matrix section in MLlib Data Types documentation should be reordered.

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7744:
---
Labels:   (was: spark.tc)

 Distributed matrix section in MLlib Data Types documentation should be 
 reordered.
 -

 Key: SPARK-7744
 URL: https://issues.apache.org/jira/browse/SPARK-7744
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, MLlib
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
 Fix For: 1.3.2, 1.4.0


 The documentation for BlockMatrix should come after RowMatrix, 
 IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter 
 three types, and RowMatrix is considered the basic distributed matrix.  
 This will improve comprehensibility of the Distributed matrix section, 
 especially for the new reader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6785:
---
Labels:   (was: spark.tc)

 DateUtils can not handle date before 1970/01/01 correctly
 -

 Key: SPARK-6785
 URL: https://issues.apache.org/jira/browse/SPARK-6785
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Davies Liu
Assignee: Christian Kadner
 Fix For: 1.5.0


 {code}
 scala> val d = new Date(100)
 d: java.sql.Date = 1969-12-31
 scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
 res1: java.sql.Date = 1970-01-01
 {code}
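 A minimal sketch of the likely failure mode (an assumption for illustration, not the actual DateUtils code): converting epoch millis to a day count with plain integer division truncates toward zero, so instants before 1970-01-01 land on the wrong day.
 {code}
 val millisPerDay = 86400000L
 val t = -1000L                        // 1 second before the epoch, i.e. 1969-12-31 UTC

 val truncatedDays = t / millisPerDay  // 0  -> converts back to 1970-01-01 (wrong)
 val flooredDays =                     // -1 -> converts back to 1969-12-31 (right)
   if (t < 0 && t % millisPerDay != 0) t / millisPerDay - 1 else t / millisPerDay
 {code}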



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5562) LDA should handle empty documents

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5562:
---
Labels: starter  (was: spark.tc starter)

 LDA should handle empty documents
 -

 Key: SPARK-5562
 URL: https://issues.apache.org/jira/browse/SPARK-5562
 Project: Spark
  Issue Type: Test
  Components: MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
Assignee: Alok Singh
Priority: Minor
  Labels: starter
 Fix For: 1.5.0

   Original Estimate: 96h
  Remaining Estimate: 96h

 Latent Dirichlet Allocation (LDA) could easily be given empty documents when 
 people select a small vocabulary.  We should check to make sure it is robust 
 to empty documents.
 This will hopefully take the form of a unit test, but may require modifying 
 the LDA implementation.
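 As a rough sketch of what such a check could look like (MLlib 1.3+ API; an existing SparkContext sc and arbitrary values are assumed):
 {code}
 import org.apache.spark.mllib.clustering.LDA
 import org.apache.spark.mllib.linalg.Vectors

 val docs = sc.parallelize(Seq(
   (0L, Vectors.sparse(5, Array.empty[Int], Array.empty[Double])), // empty document
   (1L, Vectors.dense(1.0, 0.0, 2.0, 0.0, 0.0)),
   (2L, Vectors.dense(0.0, 3.0, 0.0, 1.0, 0.0))
 ))
 // The test would assert that training neither throws nor produces NaN topic weights
 val model = new LDA().setK(2).setMaxIterations(5).run(docs)
 {code}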



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7265:
---
Labels:   (was: spark.tc)

 Improving documentation for Spark SQL Hive support 
 ---

 Key: SPARK-7265
 URL: https://issues.apache.org/jira/browse/SPARK-7265
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.3.1
Reporter: Jihong MA
Assignee: Jihong MA
Priority: Trivial
 Fix For: 1.5.0


 Miscellaneous documentation improvements for Spark SQL Hive support and YARN 
 cluster deployment. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs

2015-07-14 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-2859:
---
Labels:   (was: spark.tc)

 Update url of Kryo project in related docs
 --

 Key: SPARK-2859
 URL: https://issues.apache.org/jira/browse/SPARK-2859
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Reporter: Guancheng Chen
Assignee: Guancheng Chen
Priority: Trivial
 Fix For: 1.0.3, 1.1.0


 Kryo project has been migrated from googlecode to github, hence we need to 
 update its URL in related docs such as tuning.md.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2015-07-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1403.

  Resolution: Fixed
Target Version/s:   (was: 1.5.0)

Hey All,

This issue should remain fixed. [~mandoskippy] I think you are just running 
into a different issue that is also in some way related to classloading.

Can you open a new JIRA for your issue, paste in the stack trace and give as 
much information as possible without the environment? Thanks!

 Spark on Mesos does not set Thread's context class loader
 -

 Key: SPARK-1403
 URL: https://issues.apache.org/jira/browse/SPARK-1403
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0, 1.3.0, 1.4.0
 Environment: ubuntu 12.04 on vagrant
Reporter: Bharath Bhushan
Priority: Blocker
 Fix For: 1.0.0


 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark 
 executor on mesos slave throws a  java.lang.ClassNotFoundException for 
 org.apache.spark.serializer.JavaSerializer.
 The lengthy discussion is here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513
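 As a hedged illustration of the class-loading issue (not the actual patch): code that resolves classes by name on an executor thread fails if that thread's context class loader cannot see Spark's classes, so the fix is to set it before deserialization.
 {code}
 // Point the current thread at a loader that can see Spark's classes (sketch only)
 val sparkLoader = classOf[org.apache.spark.serializer.JavaSerializer].getClassLoader
 Thread.currentThread().setContextClassLoader(sparkLoader)

 // Name-based lookups such as this one then succeed instead of throwing
 // java.lang.ClassNotFoundException
 val cls = Class.forName("org.apache.spark.serializer.JavaSerializer", true,
   Thread.currentThread().getContextClassLoader)
 {code}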



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-1403) Spark on Mesos does not set Thread's context class loader

2015-07-13 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625739#comment-14625739
 ] 

Patrick Wendell edited comment on SPARK-1403 at 7/14/15 2:59 AM:
-

Hey All,

This issue should remain fixed. [~mandoskippy] I think you are just running 
into a different issue that is also in some way related to classloading.

Can you open a new JIRA for your issue, paste in the stack trace and give as 
much information as possible about the environment? Thanks!


was (Author: pwendell):
Hey All,

This issue should remain fixed. [~mandoskippy] I think you are just running 
into a different issue that is also in some way related to classloading.

Can you open a new JIRA for your issue, paste in the stack trace and give as 
much information as possible without the environment? Thanks!

 Spark on Mesos does not set Thread's context class loader
 -

 Key: SPARK-1403
 URL: https://issues.apache.org/jira/browse/SPARK-1403
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0, 1.3.0, 1.4.0
 Environment: ubuntu 12.04 on vagrant
Reporter: Bharath Bhushan
Priority: Blocker
 Fix For: 1.0.0


 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark 
 executor on mesos slave throws a  java.lang.ClassNotFoundException for 
 org.apache.spark.serializer.JavaSerializer.
 The lengthy discussion is here: 
 http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-13 Thread Patrick Wendell
This vote passes with 14 +1 (7 binding) votes and no 0 or -1 votes.

+1 (14):
Patrick Wendell
Reynold Xin
Sean Owen
Burak Yavuz
Mark Hamstra
Michael Armbrust
Andrew Or
York, Brennon
Krishna Sankar
Luciano Resende
Holden Karau
Tom Graves
Denny Lee
Sean McNamara

- Patrick

On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1125/
 [published as version: 1.4.1-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1126/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Sunday, July 12, at 06:55 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Foundation policy on releases and Spark nightly builds

2015-07-12 Thread Patrick Wendell
Thanks Sean O. I was thinking something like "NOTE: Nightly builds are
meant for development and testing purposes. They do not go through
Apache's release auditing process and are not official releases."

- Patrick

On Sun, Jul 12, 2015 at 3:39 PM, Sean Owen so...@cloudera.com wrote:
 (This sounds pretty good to me. Mark it developers-only, not formally
 tested by the community, etc.)

 On Sun, Jul 12, 2015 at 7:50 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Sean B.,

 Thanks for bringing this to our attention. I think putting them on the
 developer wiki would substantially decrease visibility in a way that
 is not beneficial to the project - this feature was specifically
 requested by developers from other projects that integrate with Spark.

 If the concern underlying that policy is that snapshot builds could be
 misconstrued as formal releases, I think it would work to put a very
 clear disclaimer explaining the difference directly adjacent to the
 link. That's arguably more explicit than just moving the same text to
 a different page.

 The formal policy asks us not to include links that encourage
 non-developers to download the builds. Stating clearly that the
 audience for those links is developers, in my interpretation that
 would satisfy the letter and spirit of this policy.

 - Patrick


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-12 Thread Patrick Wendell
I think we can close this vote soon. Any addition votes/testing would
be much appreciated!

On Fri, Jul 10, 2015 at 11:30 AM, Sean McNamara
sean.mcnam...@webtrends.com wrote:
 +1

 Sean

 On Jul 8, 2015, at 11:55 PM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1125/
 [published as version: 1.4.1-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1126/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Sunday, July 12, at 06:55 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2015-07-12 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624086#comment-14624086
 ] 

Patrick Wendell commented on SPARK-2089:


Yeah - we can open it again later if someone who maintains this code wants 
to work on this feature. I just want this JIRA to reflect the current 
status (i.e. there hasn't been any action in Spark for 5 versions), which is 
that it is not actively being fixed, and to make sure the documentation correctly 
reflects what we have now, to discourage the use of a feature that does not 
work.

 With YARN, preferredNodeLocalityData isn't honored 
 ---

 Key: SPARK-2089
 URL: https://issues.apache.org/jira/browse/SPARK-2089
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical

 When running in YARN cluster mode, apps can pass preferred locality data when 
 constructing a Spark context that will dictate where to request executor 
 containers.
 This is currently broken because of a race condition.  The Spark-YARN code 
 runs the user class and waits for it to start up a SparkContext.  During its 
 initialization, the SparkContext will create a YarnClusterScheduler, which 
 notifies a monitor in the Spark-YARN code that .  The Spark-Yarn code then 
 immediately fetches the preferredNodeLocationData from the SparkContext and 
 uses it to start requesting containers.
 But in the SparkContext constructor that takes the preferredNodeLocationData, 
 setting preferredNodeLocationData comes after the rest of the initialization, 
 so, if the Spark-YARN code comes around quickly enough after being notified, 
 the data that's fetched is the empty, unset version.  This occurred during all 
 of my runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Foundation policy on releases and Spark nightly builds

2015-07-12 Thread Patrick Wendell
Hey Sean B.,

Thanks for bringing this to our attention. I think putting them on the
developer wiki would substantially decrease visibility in a way that
is not beneficial to the project - this feature was specifically
requested by developers from other projects that integrate with Spark.

If the concern underlying that policy is that snapshot builds could be
misconstrued as formal releases, I think it would work to put a very
clear disclaimer explaining the difference directly adjacent to the
link. That's arguably more explicit than just moving the same text to
a different page.

The formal policy asks us not to include links that encourage
non-developers to download the builds. Stating clearly that the
audience for those links is developers, in my interpretation that
would satisfy the letter and spirit of this policy.

- Patrick

On Sat, Jul 11, 2015 at 11:53 AM, Sean Owen so...@cloudera.com wrote:
 From a developer perspective, I also find it surprising to hear that
 nightly builds should be hidden from non-developer end users. In an
 age of Github, what on earth is the problem with distributing the
 content of master? However I do understand why this exists.

 To the extent the ASF provides any value, it is at least a legal
 framework for defining what it means for you and I to give software to
 a bunch of other people. Software artifacts released according to an
 ASF process becomes something the ASF can take responsibility for as
 an entity. Nightly builds are not. It might matter to the committers
 if, say, somebody commits a serious data loss bug. You don't want to
 be on the hook individually for putting that into end-user hands.

 More practically, I think this exists to prevent some projects from
 lazily depending on unofficial nightly builds as pseudo-releases for
 long periods of time. End users may come to perceive them as official
 sanctioned releases when they aren't. That's not the case here of
 course.

 I think nightlies aren't for end-users anyway, and I think developers
 who care would know how to get nightlies anyway. There's little cost
 to moving this info to the wiki, so I'd do it.

 On Sat, Jul 11, 2015 at 4:29 PM, Reynold Xin r...@databricks.com wrote:
 I don't get this rule. It is arbitrary, and does not seem like something
 that should be enforced at the foundation level. By this reasoning, are we
 not allowed to list source code management on the project public page as
 well?

 The download page clearly states the nightly builds are bleeding-edge.

 Note that technically we did not violate any rules, since the ones we showed
 were not nightly builds by the foundation's definition: "Nightly Builds
 are simply built from the Subversion trunk, usually once a day." Spark
 nightly artifacts were built from git, not svn trunk. :)  (joking).



 On Sat, Jul 11, 2015 at 7:44 AM, Sean Busbey bus...@cloudera.com wrote:

 That would be great.

 A note on that page that it's meant for the use of folks working on the
 project with a link to your get involved howto would be nice additional
 context.

 --
 Sean

 On Jul 11, 2015 6:18 AM, Sean Owen so...@cloudera.com wrote:

 I suggest we move this info to the developer wiki, to keep it out from
 the place all and users look for downloads. What do you think about
 that Sean B?

 On Sat, Jul 11, 2015 at 5:34 AM, Sean Busbey bus...@cloudera.com wrote:
  Hi Folks!
 
  I noticed that Spark website's download page lists nightly builds and
  instructions for accessing SNAPSHOT maven artifacts[1]. The ASF policy
  on
  releases expressly forbids this kind of publishing outside of the
  dev@spark
  community[2].
 
  If you'd like to discuss having the policy updated (including expanding
  the
  definition of in the development community), please contribute to the
  discussion on general@incubator[3] after removing the offending items.
 
  [1]:
  http://spark.apache.org/downloads.html#nightly-packages-and-artifacts
  [2]: http://www.apache.org/dev/release.html#what
  [3]: http://s.apache.org/XFP
 
  --
  Sean



 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



SparkHub: a new community site for Apache Spark

2015-07-10 Thread Patrick Wendell
Hi All,

Today, I'm happy to announce SparkHub
(http://sparkhub.databricks.com), a service for the Apache Spark
community to easily find the most relevant Spark resources on the web.

SparkHub is a curated list of Spark news, videos and talks, package
releases, upcoming events around the world, and a Spark Meetup
directory to help you find a meetup close to you.

We will continue to expand the site in the coming months and add more
content. I hope SparkHub can help you find Spark related information
faster and more easily than is currently possible. Everything is
sourced from the Spark community, and we welcome input from you as
well!

- Patrick

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[jira] [Created] (SPARK-8957) Backport Hive 1.X support to Branch 1.4

2015-07-09 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-8957:
--

 Summary: Backport Hive 1.X support to Branch 1.4
 Key: SPARK-8957
 URL: https://issues.apache.org/jira/browse/SPARK-8957
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Patrick Wendell
Assignee: Michael Armbrust


We almost never do feature backports. But I think it would be really useful to 
backport support for newer Hive versions to the 1.4 branch, for the following 
reasons:

1. It blocks a large number of users from using Hive support.
2. It's a relatively small set of patches, since most of the heavy lifting 
was done in Spark 1.4.0's classloader refactoring.
3. Some distributions have already done this, with success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8957) Backport Hive 1.X support to Branch 1.4

2015-07-09 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8957:
---
Priority: Critical  (was: Major)

 Backport Hive 1.X support to Branch 1.4
 ---

 Key: SPARK-8957
 URL: https://issues.apache.org/jira/browse/SPARK-8957
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Patrick Wendell
Assignee: Michael Armbrust
Priority: Critical

 We almost never do feature backports. But I think it would be really useful 
 to backport support for newer Hive versions to the 1.4 branch, for the 
 following reasons:
 1. It blocks a large number of users from using Hive support.
 2. It's a relatively small set of patches, since most of the heavy lifting 
 was done in Spark 1.4.0's classloader refactoring.
 3. Some distributions have already done this, with success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-09 Thread Patrick Wendell
+1

On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1125/
 [published as version: 1.4.1-rc4]
 https://repository.apache.org/content/repositories/orgapachespark-1126/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Sunday, July 12, at 06:55 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2015-07-09 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620051#comment-14620051
 ] 

Patrick Wendell commented on SPARK-2089:


Yeah - I think let's get SPARK-4352 merged and then just close this as won't 
fix, and add a JIRA to document that it doesn't work. This hasn't worked since 
before Spark 1.0, and SPARK-5352 is just a strictly better solution than this. 

 With YARN, preferredNodeLocalityData isn't honored 
 ---

 Key: SPARK-2089
 URL: https://issues.apache.org/jira/browse/SPARK-2089
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical

 When running in YARN cluster mode, apps can pass preferred locality data when 
 constructing a Spark context that will dictate where to request executor 
 containers.
 This is currently broken because of a race condition.  The Spark-YARN code 
 runs the user class and waits for it to start up a SparkContext.  During its 
 initialization, the SparkContext will create a YarnClusterScheduler, which 
 notifies a monitor in the Spark-YARN code that .  The Spark-Yarn code then 
 immediately fetches the preferredNodeLocationData from the SparkContext and 
 uses it to start requesting containers.
 But in the SparkContext constructor that takes the preferredNodeLocationData, 
 setting preferredNodeLocationData comes after the rest of the initialization, 
 so, if the Spark-YARN code comes around quickly enough after being notified, 
 the data that's fetched is the empty, unset version.  This occurred during all 
 of my runs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8949) Remove references to preferredNodeLocalityData in javadoc and print warning when used

2015-07-09 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-8949:
--

 Summary: Remove references to preferredNodeLocalityData in javadoc 
and print warning when used
 Key: SPARK-8949
 URL: https://issues.apache.org/jira/browse/SPARK-8949
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, YARN
Reporter: Patrick Wendell
Priority: Blocker


The SparkContext constructor that takes preferredNodeLocalityData has not 
worked since before Spark 1.0. Also, the feature in SPARK-4352 is strictly 
better than a correct implementation of that feature.

We should remove any documentation references to that feature and print a 
warning when it is used saying it doesn't work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC3)

2015-07-08 Thread Patrick Wendell
Yeah - we can fix the docs separately from the release.

- Patrick

On Wed, Jul 8, 2015 at 10:03 AM, Mark Hamstra m...@clearstorydata.com wrote:
 HiveSparkSubmitSuite is fine for me, but I do see the same issue with
 DataFrameStatSuite -- OSX 10.10.4, java 1.7.0_75, -Phive -Phive-thriftserver -Phadoop-2.4 -Pyarn


 On Wed, Jul 8, 2015 at 4:18 AM, Sean Owen so...@cloudera.com wrote:

 The POM issue is resolved and the build succeeds. The license and sigs
 still work. The tests pass for me with -Pyarn -Phadoop-2.6, with the
 following two exceptions. Is anyone else seeing these? this is
 consistent on Ubuntu 14 with Java 7/8:

 DataFrameStatSuite:
 ...
 - special crosstab elements (., '', null, ``) *** FAILED ***
   java.lang.NullPointerException:
   at
 org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$4.apply(StatFunctions.scala:131)
   at
 org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$4.apply(StatFunctions.scala:121)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
   at scala.collection.immutable.Map$Map4.foreach(Map.scala:181)
   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
   at
 org.apache.spark.sql.execution.stat.StatFunctions$.crossTabulate(StatFunctions.scala:121)
   at
 org.apache.spark.sql.DataFrameStatFunctions.crosstab(DataFrameStatFunctions.scala:94)
   at
 org.apache.spark.sql.DataFrameStatSuite$$anonfun$5.apply$mcV$sp(DataFrameStatSuite.scala:97)
   ...

 HiveSparkSubmitSuite:
 - SPARK-8368: includes jars passed in through --jars *** FAILED ***
   Process returned with exit code 1. See the log4j logs for more
 detail. (HiveSparkSubmitSuite.scala:92)
 - SPARK-8020: set sql conf in spark conf *** FAILED ***
   Process returned with exit code 1. See the log4j logs for more
 detail. (HiveSparkSubmitSuite.scala:92)
 - SPARK-8489: MissingRequirementError during reflection *** FAILED ***
   Process returned with exit code 1. See the log4j logs for more
 detail. (HiveSparkSubmitSuite.scala:92)

 On Tue, Jul 7, 2015 at 8:06 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on releasing the following candidate as Apache Spark version
  1.4.1!
 
  This release fixes a handful of known issues in Spark 1.4.0, listed
  here:
  http://s.apache.org/spark-1.4.1
 
  The tag to be voted on is v1.4.1-rc3 (commit 3e8ae38):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  3e8ae38944f13895daf328555c1ad22cd590b089
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.1]
  https://repository.apache.org/content/repositories/orgapachespark-1123/
  [published as version: 1.4.1-rc3]
  https://repository.apache.org/content/repositories/orgapachespark-1124/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.1!
 
  The vote is open until Friday, July 10, at 20:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf

2015-07-08 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619681#comment-14619681
 ] 

Patrick Wendell edited comment on SPARK-8768 at 7/9/15 1:04 AM:


So it turns out that build/mvn still uses the system maven even if it downloads 
the newer version (this was the original design). Is it possible that is why 
it's breaking?

It might be nice to modify that script to have a flag like --force that will 
always use the downloaded maven.


was (Author: pwendell):
So it turns out that build/mvn still uses the system maven even if it downloads 
the newer version (this was the original design). Is it possible that is why 
it's breaking?

 SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in 
 Akka Protobuf
 -

 Key: SPARK-8768
 URL: https://issues.apache.org/jira/browse/SPARK-8768
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
Reporter: Josh Rosen
Priority: Blocker

 The end-to-end SparkSubmitSuite tests (launch simple application with 
 spark-submit, include jars passed in through --jars, and include jars 
 passed in through --packages) are currently failing for the pre-YARN Hadoop 
 builds.
 I managed to reproduce one of the Jenkins failures locally:
 {code}
 build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver 
 -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite 
 -Dtest=none
 {code}
 Here's the output from unit-tests.log:
 {code}
 = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple 
 application with spark-submit' =
 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Class path contains multiple SLF4J bindings.
 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Found binding in 
 [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Found binding in 
 [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 
 1.5.0-SNAPSHOT
 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: 
 joshrosen
 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: 
 joshrosen
 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: 
 authentication disabled; ui acls disabled; users with view permissions: 
 Set(joshrosen); users with modify permissions: Set(joshrosen)
 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started
 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from 
 thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down 
 ActorSystem [sparkDriver]
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils: java.lang.VerifyError: class 
 akka.remote.WireFormats$AkkaControlMessage overrides final method 
 getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.lang.ClassLoader.defineClass1(Native Method)
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449

[jira] [Commented] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf

2015-07-08 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619687#comment-14619687
 ] 

Patrick Wendell commented on SPARK-8768:


I created SPARK-8933 to track improvements to our maven script.

 SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in 
 Akka Protobuf
 -

 Key: SPARK-8768
 URL: https://issues.apache.org/jira/browse/SPARK-8768
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
Reporter: Josh Rosen
Priority: Blocker

 The end-to-end SparkSubmitSuite tests (launch simple application with 
 spark-submit, include jars passed in through --jars, and include jars 
 passed in through --packages) are currently failing for the pre-YARN Hadoop 
 builds.
 I managed to reproduce one of the Jenkins failures locally:
 {code}
 build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver 
 -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite 
 -Dtest=none
 {code}
 Here's the output from unit-tests.log:
 {code}
 = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple 
 application with spark-submit' =
 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Class path contains multiple SLF4J bindings.
 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Found binding in 
 [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Found binding in 
 [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 
 1.5.0-SNAPSHOT
 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: 
 joshrosen
 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: 
 joshrosen
 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: 
 authentication disabled; ui acls disabled; users with view permissions: 
 Set(joshrosen); users with modify permissions: Set(joshrosen)
 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started
 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from 
 thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down 
 ActorSystem [sparkDriver]
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils: java.lang.VerifyError: class 
 akka.remote.WireFormats$AkkaControlMessage overrides final method 
 getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.lang.ClassLoader.defineClass1(Native Method)
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 15/07/01 13:40:00.010 redirect

[jira] [Created] (SPARK-8933) Provide a --force flag to build/mvn that always uses downloaded maven

2015-07-08 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-8933:
--

 Summary: Provide a --force flag to build/mvn that always uses 
downloaded maven
 Key: SPARK-8933
 URL: https://issues.apache.org/jira/browse/SPARK-8933
 Project: Spark
  Issue Type: Improvement
  Components: Build
Reporter: Patrick Wendell
Assignee: Brennon York


I noticed the other day that build/mvn will still use the system maven if a mvn 
binary is installed. I think this was intentional, to support using just zinc 
along with the system maven (and to match the semantics of sbt/sbt). It would be 
nice to have a flag that forces it to use the downloaded maven. I was 
thinking it could have a --force flag, and then it could swallow that flag and 
not pass it on to maven.

This is useful in some cases like our test runners, where we want to ensure that a 
specific version of maven is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC3)

2015-07-08 Thread Patrick Wendell
Hey All,

The issue that Josh pointed out is not just a test failure, it's an
issue with an important bug fix that was not correctly back-ported
into the 1.4 branch. Unfortunately the overall state of the 1.4 branch
tests on Jenkins was not in great shape so this was missed earlier on.

Given that this is fixed now, I have prepared another RC and am
leaning towards restarting the vote. If anyone feels strongly one way
or the other let me know, otherwise I'll restart it in a few hours. I
figured since this will likely finalize over the weekend anyways, it's
not so bad to wait 1 additional day in order to get that fix.

- Patrick

On Wed, Jul 8, 2015 at 12:00 PM, Josh Rosen rosenvi...@gmail.com wrote:
 I've filed https://issues.apache.org/jira/browse/SPARK-8903 to fix the
 DataFrameStatSuite test failure. The problem turned out to be caused by a
 mistake made while resolving a merge-conflict when backporting that patch to
 branch-1.4.

 I've submitted https://github.com/apache/spark/pull/7295 to fix this issue.

 On Wed, Jul 8, 2015 at 11:30 AM, Sean Owen so...@cloudera.com wrote:

 I see, but shouldn't this test not be run when Hive isn't in the build?

 On Wed, Jul 8, 2015 at 7:13 PM, Andrew Or and...@databricks.com wrote:
  @Sean You actually need to run HiveSparkSubmitSuite with `-Phive` and
  `-Phive-thriftserver`. The MissingRequirementsError is just complaining
  that
  it can't find the right classes. The other one (DataFrameStatSuite) is a
  little more concerning.
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Commented] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf

2015-07-08 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619681#comment-14619681
 ] 

Patrick Wendell commented on SPARK-8768:


So it turns out that build/mvn still uses the system maven even if it downloads 
the newer version (this was the original design). Is it possible that is why 
it's breaking?

 SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in 
 Akka Protobuf
 -

 Key: SPARK-8768
 URL: https://issues.apache.org/jira/browse/SPARK-8768
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 1.5.0
Reporter: Josh Rosen
Priority: Blocker

 The end-to-end SparkSubmitSuite tests (launch simple application with 
 spark-submit, include jars passed in through --jars, and include jars 
 passed in through --packages) are currently failing for the pre-YARN Hadoop 
 builds.
 I managed to reproduce one of the Jenkins failures locally:
 {code}
 build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver 
 -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite 
 -Dtest=none
 {code}
 Here's the output from unit-tests.log:
 {code}
 = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple 
 application with spark-submit' =
 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Class path contains multiple SLF4J bindings.
 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Found binding in 
 [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Found binding in 
 [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO 
 Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 
 1.5.0-SNAPSHOT
 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: 
 joshrosen
 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: 
 joshrosen
 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: 
 authentication disabled; ui acls disabled; users with view permissions: 
 Set(joshrosen); users with modify permissions: Set(joshrosen)
 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started
 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from 
 thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down 
 ActorSystem [sparkDriver]
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils: java.lang.VerifyError: class 
 akka.remote.WireFormats$AkkaControlMessage overrides final method 
 getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.lang.ClassLoader.defineClass1(Native Method)
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO 
 Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark

[VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-08 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.1!

This release fixes a handful of known issues in Spark 1.4.0, listed here:
http://s.apache.org/spark-1.4.1

The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.1]
https://repository.apache.org/content/repositories/orgapachespark-1125/
[published as version: 1.4.1-rc4]
https://repository.apache.org/content/repositories/orgapachespark-1126/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/

Please vote on releasing this package as Apache Spark 1.4.1!

The vote is open until Sunday, July 12, at 06:55 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC3)

2015-07-08 Thread Patrick Wendell
This vote is cancelled in favor of RC4.

- Patrick

On Tue, Jul 7, 2015 at 12:06 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc3 (commit 3e8ae38):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 3e8ae38944f13895daf328555c1ad22cd590b089

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1123/
 [published as version: 1.4.1-rc3]
 https://repository.apache.org/content/repositories/orgapachespark-1124/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Friday, July 10, at 20:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-07 Thread Patrick Wendell
Hey All,

This vote is cancelled in favor of RC3.

- Patrick

On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 07b95c7adf88f0662b7ab1c47e302ff5e6859606

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1120/
 [published as version: 1.4.1-rc2]
 https://repository.apache.org/content/repositories/orgapachespark-1121/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Monday, July 06, at 22:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-6805) ML Pipeline API in SparkR

2015-07-07 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-6805:
---
Priority: Critical  (was: Major)

 ML Pipeline API in SparkR
 -

 Key: SPARK-6805
 URL: https://issues.apache.org/jira/browse/SPARK-6805
 Project: Spark
  Issue Type: Umbrella
  Components: ML, SparkR
Reporter: Xiangrui Meng
Priority: Critical

 SparkR was merged. So let's have this umbrella JIRA for the ML pipeline API 
 in SparkR. The implementation should be similar to the pipeline API 
 implementation in Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Can not build master

2015-07-04 Thread Patrick Wendell
Hi Tomo,

For now you can do that as a workaround. We are working on a fix for
this in the master branch but it may take a couple of days since the
issue is fairly complicated.
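
A minimal sketch of that interim workaround, assuming a standalone Maven 3.2.5
install (the download URL follows the standard Apache archive layout; adjust
paths to taste):

  # Grab Maven 3.2.5 and build with it instead of 3.3.x, which currently
  # loops on "Dependency-reduced POM written at: ...".
  curl -LO https://archive.apache.org/dist/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz
  tar xzf apache-maven-3.2.5-bin.tar.gz
  ./apache-maven-3.2.5/bin/mvn -DskipTests clean package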

- Patrick

On Sat, Jul 4, 2015 at 7:00 AM, tomo cocoa cocoatom...@gmail.com wrote:
 Hi all,

 I have the same error, and it seems to depend on the Maven version.

 I tried building Spark using Maven with several versions on Jenkins.

 + Output of
 /Users/tomohiko/.jenkins/tools/hudson.tasks.Maven_MavenInstallation/mvn-3.3.3/bin/mvn
 -version:

 Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06;
 2015-04-22T20:57:37+09:00)
 Maven home:
 /Users/tomohiko/.jenkins/tools/hudson.tasks.Maven_MavenInstallation/mvn-3.3.3
 Java version: 1.8.0, vendor: Oracle Corporation
 Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac

 + Jenkins Configuration:
 Jenkins project type: Maven Project
 Goals and options: -Phadoop-2.6 -DskipTests clean package

 + Maven versions and results:
 3.3.3 - infinite loop
 3.3.1 - infinite loop
 3.2.5 - SUCCESS


 So do we prefer to build Spark with Maven 3.2.5?


 On 4 July 2015 at 12:28, Andrew Or and...@databricks.com wrote:

 Thanks, I just tried it with 3.3.3 and I was able to reproduce it as well.

 2015-07-03 18:51 GMT-07:00 Tarek Auel tarek.a...@gmail.com:

 That's mine

 Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06;
 2015-04-22T04:57:37-07:00)

 Maven home: /usr/local/Cellar/maven/3.3.3/libexec

 Java version: 1.8.0_45, vendor: Oracle Corporation

 Java home:
 /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/jre

 Default locale: en_US, platform encoding: UTF-8

 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac


 On Fri, Jul 3, 2015 at 6:32 PM Ted Yu yuzhih...@gmail.com wrote:

 Here is mine:

 Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c;
 2015-03-13T13:10:27-07:00)
 Maven home: /home/hbase/apache-maven-3.3.1
 Java version: 1.8.0_45, vendor: Oracle Corporation
 Java home: /home/hbase/jdk1.8.0_45/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: linux, version: 2.6.32-504.el6.x86_64, arch: amd64,
 family: unix

 On Fri, Jul 3, 2015 at 6:05 PM, Andrew Or and...@databricks.com wrote:

 @Tarek and Ted, what maven versions are you using?

 2015-07-03 17:35 GMT-07:00 Krishna Sankar ksanka...@gmail.com:

 Patrick,
I assume an RC3 will be out for folks like me to test the
 distribution. As usual, I will run the tests when you have a new
 distribution.
 Cheers
 k/

 On Fri, Jul 3, 2015 at 4:38 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Patch that added test-jar dependencies:
 https://github.com/apache/spark/commit/bfe74b34

 Patch that originally disabled dependency reduced poms:

 https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724

 Patch that reverted the disabling of dependency reduced poms:

 https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e

 On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Okay I did some forensics with Sean Owen. Some things about this
  bug:
 
  1. The underlying cause is that we added some code to make the
  tests
  of sub modules depend on the core tests. For unknown reasons this
  causes Spark to hit MSHADE-148 for *some* combinations of build
  profiles.
 
  2. MSHADE-148 can be worked around by disabling building of
  dependency reduced poms because then the buggy code path is
  circumvented. Andrew Or did this in a patch on the 1.4 branch.
  However, that is not a tenable option for us because our
  *published*
  pom files require dependency reduction to substitute in the scala
  version correctly for the poms published to maven central.
 
  3. As a result, Andrew Or reverted his patch recently, causing some
  package builds to start failing again (but publishing works now).
 
  4. The reason this is not detected in our test harness or release
  build is that it is sensitive to the profiles enabled. The
  combination
  of profiles we enable in the test harness and release builds do not
  trigger this bug.
 
  The best path I see forward right now is to do the following:
 
  1. Disable creation of dependency reduced poms by default (this
  doesn't matter for people doing a package build) so typical users
  won't have this bug.
 
  2. Add a profile that re-enables that setting.
 
  3. Use the above profile when publishing release artifacts to maven
  central.
 
  4. Hope that we don't hit this bug for publishing.
 
  - Patrick
 
  On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com
  wrote:
  Doesn't change anything for me.
 
  On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell
  pwend...@gmail.com wrote:
 
  Can you try using the built in maven build/mvn...? All of our
  builds
  are passing on Jenkins so I wonder if it's a maven version issue:
 
  https

Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Patrick Wendell
Hm - what if you do a fresh git checkout (just to make sure you don't
have an older Maven version downloaded)? It also might be that this
really is an issue even with Maven 3.3.3. I'm just not sure why it's
not reflected in our continuous integration or the build of the
release packages themselves:

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

It could be that it's dependent on which modules are enabled.

On Fri, Jul 3, 2015 at 3:46 PM, Robin East robin.e...@xense.co.uk wrote:
 which got me thinking:

 build/mvn -version
 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M;
 support was removed in 8.0
 Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c;
 2015-03-13T20:10:27+00:00)
 Maven home: /usr/local/Cellar/maven/3.3.1/libexec
 Java version: 1.8.0_40, vendor: Oracle Corporation
 Java home:
 /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: mac os x, version: 10.10.2, arch: x86_64, family: mac

 Seems to be using 3.3.1

 On 3 Jul 2015, at 23:44, Robin East robin.e...@xense.co.uk wrote:

 I used the following build command:

 build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
 package

 this also gave the ‘Dependency-reduced POM’ loop

 Robin

 On 3 Jul 2015, at 23:41, Patrick Wendell pwend...@gmail.com wrote:

 What if you use the built-in maven (i.e. build/mvn). It might be that
 we require a newer version of maven than you have. The release itself
 is built with maven 3.3.3:

 https://github.com/apache/spark/blob/master/build/mvn#L72

 - Patrick

 On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote:

 Yep, happens to me as well. Build loops.
 Cheers
 k/

 On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote:


 Patrick:
 I used the following command:
 ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean
 package

 The build doesn't seem to stop.
 Here is tail of build output:

 [INFO] Dependency-reduced POM written at:
 /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml

 Here is part of the stack trace for the build process:

 http://pastebin.com/xL2Y0QMU

 FYI

 On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com
 wrote:


 Please vote on releasing the following candidate as Apache Spark version
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 07b95c7adf88f0662b7ab1c47e302ff5e6859606

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1120/
 [published as version: 1.4.1-rc2]
 https://repository.apache.org/content/repositories/orgapachespark-1121/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Monday, July 06, at 22:00 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Patrick Wendell
Let's continue the discussion on the other thread relating to the master build.

On Fri, Jul 3, 2015 at 4:13 PM, Patrick Wendell pwend...@gmail.com wrote:
 Thanks - it appears this is just a legitimate issue with the build,
 affecting all versions of Maven.

 On Fri, Jul 3, 2015 at 4:02 PM, Krishna Sankar ksanka...@gmail.com wrote:
 I have 3.3.3
 USS-Defiant:NW ksankar$ mvn -version
 Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06;
 2015-04-22T04:57:37-07:00)
 Maven home: /usr/local/apache-maven-3.3.3
 Java version: 1.7.0_60, vendor: Oracle Corporation
 Java home:
 /Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac

 Let me nuke it and reinstall maven.

 Cheers
 k/

 On Fri, Jul 3, 2015 at 3:41 PM, Patrick Wendell pwend...@gmail.com wrote:

 What if you use the built-in maven (i.e. build/mvn). It might be that
 we require a newer version of maven than you have. The release itself
 is built with maven 3.3.3:

 https://github.com/apache/spark/blob/master/build/mvn#L72

 - Patrick

 On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com
 wrote:
  Yep, happens to me as well. Build loops.
  Cheers
  k/
 
  On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote:
 
  Patrick:
  I used the following command:
  ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive
  clean
  package
 
  The build doesn't seem to stop.
  Here is tail of build output:
 
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
 
  Here is part of the stack trace for the build process:
 
  http://pastebin.com/xL2Y0QMU
 
  FYI
 
  On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Please vote on releasing the following candidate as Apache Spark
  version
  1.4.1!
 
  This release fixes a handful of known issues in Spark 1.4.0, listed
  here:
  http://s.apache.org/spark-1.4.1
 
  The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  07b95c7adf88f0662b7ab1c47e302ff5e6859606
 
  The release files, including signatures, digests, etc. can be found
  at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.1]
 
  https://repository.apache.org/content/repositories/orgapachespark-1120/
  [published as version: 1.4.1-rc2]
 
  https://repository.apache.org/content/repositories/orgapachespark-1121/
 
  The documentation corresponding to this release can be found at:
 
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.1!
 
  The vote is open until Monday, July 06, at 22:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Can not build master

2015-07-03 Thread Patrick Wendell
Okay I did some forensics with Sean Owen. Some things about this bug:

1. The underlying cause is that we added some code to make the tests
of sub modules depend on the core tests. For unknown reasons this
causes Spark to hit MSHADE-148 for *some* combinations of build
profiles.

2. MSHADE-148 can be worked around by disabling building of
dependency reduced poms because then the buggy code path is
circumvented. Andrew Or did this in a patch on the 1.4 branch.
However, that is not a tenable option for us because our *published*
pom files require dependency reduction to substitute in the scala
version correctly for the poms published to maven central.

3. As a result, Andrew Or reverted his patch recently, causing some
package builds to start failing again (but publishing works now).

4. The reason this is not detected in our test harness or release
build is that it is sensitive to the profiles enabled. The combination
of profiles we enable in the test harness and release builds do not
trigger this bug.

The best path I see forward right now is to do the following:

1. Disable creation of dependency reduced poms by default (this
doesn't matter for people doing a package build) so typical users
won't have this bug.

2. Add a profile that re-enables that setting.

3. Use the above profile when publishing release artifacts to maven central.

4. Hope that we don't hit this bug for publishing.
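
A rough shell-level sketch of how steps 1-3 above could look from the command
line; the profile name below is purely illustrative and does not exist in the
build yet:

  # Step 1: typical developer/package builds leave dependency-reduced poms
  # disabled, so the shade plugin no longer loops.
  build/mvn -DskipTests clean package

  # Steps 2-3: release publishing turns them back on via a dedicated profile
  # (name illustrative) so the published poms carry the substituted Scala
  # version.
  mvn -DskipTests -Prelease-dependency-reduced-poms clean deploy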

- Patrick

On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote:
 Doesn't change anything for me.

 On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote:

 Can you try using the built in maven build/mvn...? All of our builds
 are passing on Jenkins so I wonder if it's a maven version issue:

 https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

 - Patrick

 On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote:
  Please take a look at SPARK-8781
  (https://github.com/apache/spark/pull/7193)
 
  Cheers
 
  On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote:
 
  I found a solution, there might be a better one.
 
  https://github.com/apache/spark/pull/7217
 
  On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk
  wrote:
 
  Yes me too
 
  On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote:
 
  This is what I got (the last line was repeated non-stop):
 
  [INFO] Replacing original artifact with shaded artifact.
  [INFO] Replacing
  /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar
  with
 
  /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark/bagel/dependency-reduced-pom.xml
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark/bagel/dependency-reduced-pom.xml
 
  On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com
  wrote:
 
  Hi all,
 
  I am trying to build the master, but it gets stuck and prints
 
  [INFO] Dependency-reduced POM written at:
  /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml
 
  build command:  mvn -DskipTests clean package
 
  Do others have the same issue?
 
  Regards,
  Tarek
 
 
 
 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Can not build master

2015-07-03 Thread Patrick Wendell
Patch that added test-jar dependencies:
https://github.com/apache/spark/commit/bfe74b34

Patch that originally disabled dependency reduced poms:
https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724

Patch that reverted the disabling of dependency reduced poms:
https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e

On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com wrote:
 Okay I did some forensics with Sean Owen. Some things about this bug:

 1. The underlying cause is that we added some code to make the tests
 of sub modules depend on the core tests. For unknown reasons this
 causes Spark to hit MSHADE-148 for *some* combinations of build
 profiles.

 2. MSHADE-148 can be worked around by disabling building of
 dependency reduced poms because then the buggy code path is
 circumvented. Andrew Or did this in a patch on the 1.4 branch.
 However, that is not a tenable option for us because our *published*
 pom files require dependency reduction to substitute in the scala
 version correctly for the poms published to maven central.

 3. As a result, Andrew Or reverted his patch recently, causing some
 package builds to start failing again (but publishing works now).

 4. The reason this is not detected in our test harness or release
 build is that it is sensitive to the profiles enabled. The combination
 of profiles we enable in the test harness and release builds do not
 trigger this bug.

 The best path I see forward right now is to do the following:

 1. Disable creation of dependency reduced poms by default (this
 doesn't matter for people doing a package build) so typical users
 won't have this bug.

 2. Add a profile that re-enables that setting.

 3. Use the above profile when publishing release artifacts to maven central.

 4. Hope that we don't hit this bug for publishing.

 - Patrick

 On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote:
 Doesn't change anything for me.

 On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote:

 Can you try using the built in maven build/mvn...? All of our builds
 are passing on Jenkins so I wonder if it's a maven version issue:

 https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

 - Patrick

 On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote:
  Please take a look at SPARK-8781
  (https://github.com/apache/spark/pull/7193)
 
  Cheers
 
  On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote:
 
  I found a solution, there might be a better one.
 
  https://github.com/apache/spark/pull/7217
 
  On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk
  wrote:
 
  Yes me too
 
  On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote:
 
  This is what I got (the last line was repeated non-stop):
 
  [INFO] Replacing original artifact with shaded artifact.
  [INFO] Replacing
  /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar
  with
 
  /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark/bagel/dependency-reduced-pom.xml
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark/bagel/dependency-reduced-pom.xml
 
  On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com
  wrote:
 
  Hi all,
 
  I am trying to build the master, but it gets stuck and prints
 
  [INFO] Dependency-reduced POM written at:
  /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml
 
  build command:  mvn -DskipTests clean package
 
  Do others have the same issue?
 
  Regards,
  Tarek
 
 
 
 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Can not build master

2015-07-03 Thread Patrick Wendell
Can you try using the built in maven build/mvn...? All of our builds
are passing on Jenkins so I wonder if it's a maven version issue:

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/

- Patrick

On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote:
 Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193)

 Cheers

 On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote:

 I found a solution, there might be a better one.

 https://github.com/apache/spark/pull/7217

 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote:

 Yes me too

 On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote:

 This is what I got (the last line was repeated non-stop):

 [INFO] Replacing original artifact with shaded artifact.
 [INFO] Replacing
 /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with
 /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar
 [INFO] Dependency-reduced POM written at:
 /home/hbase/spark/bagel/dependency-reduced-pom.xml
 [INFO] Dependency-reduced POM written at:
 /home/hbase/spark/bagel/dependency-reduced-pom.xml

 On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote:

 Hi all,

 I am trying to build the master, but it gets stuck and prints

 [INFO] Dependency-reduced POM written at:
 /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml

 build command:  mvn -DskipTests clean package

 Do others have the same issue?

 Regards,
 Tarek





-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Patrick Wendell
Thanks - it appears this is just a legitimate issue with the build,
affecting all versions of Maven.

On Fri, Jul 3, 2015 at 4:02 PM, Krishna Sankar ksanka...@gmail.com wrote:
 I have 3.3.3
 USS-Defiant:NW ksankar$ mvn -version
 Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06;
 2015-04-22T04:57:37-07:00)
 Maven home: /usr/local/apache-maven-3.3.3
 Java version: 1.7.0_60, vendor: Oracle Corporation
 Java home:
 /Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac

 Let me nuke it and reinstall maven.

 Cheers
 k/

 On Fri, Jul 3, 2015 at 3:41 PM, Patrick Wendell pwend...@gmail.com wrote:

 What if you use the built-in maven (i.e. build/mvn). It might be that
 we require a newer version of maven than you have. The release itself
 is built with maven 3.3.3:

 https://github.com/apache/spark/blob/master/build/mvn#L72

 - Patrick

 On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com
 wrote:
  Yep, happens to me as well. Build loops.
  Cheers
  k/
 
  On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote:
 
  Patrick:
  I used the following command:
  ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive
  clean
  package
 
  The build doesn't seem to stop.
  Here is tail of build output:
 
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
  [INFO] Dependency-reduced POM written at:
  /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml
 
  Here is part of the stack trace for the build process:
 
  http://pastebin.com/xL2Y0QMU
 
  FYI
 
  On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Please vote on releasing the following candidate as Apache Spark
  version
  1.4.1!
 
  This release fixes a handful of known issues in Spark 1.4.0, listed
  here:
  http://s.apache.org/spark-1.4.1
 
  The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  07b95c7adf88f0662b7ab1c47e302ff5e6859606
 
  The release files, including signatures, digests, etc. can be found
  at:
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  [published as version: 1.4.1]
 
  https://repository.apache.org/content/repositories/orgapachespark-1120/
  [published as version: 1.4.1-rc2]
 
  https://repository.apache.org/content/repositories/orgapachespark-1121/
 
  The documentation corresponding to this release can be found at:
 
  http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/
 
  Please vote on releasing this package as Apache Spark 1.4.1!
 
  The vote is open until Monday, July 06, at 22:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.4.1
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.4.1

2015-07-03 Thread Patrick Wendell
This vote is cancelled in favor of RC2. Thanks very much to Sean Owen
for triaging an important bug associated with RC1.

I took a look at the branch-1.4 contents and I think it's safe to cut
RC2 from the head of that branch (i.e. no very high-risk patches that I
could see). JIRA management around the time of the RC voting is an
interesting topic; Sean, I like your most recent proposal. Maybe we can
put that on the wiki or start a DISCUSS thread to cover that topic.

On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 60e08e50751fe3929156de956d62faea79f5b801

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1118/
 [published as version: 1.4.1-rc1]
 https://repository.apache.org/content/repositories/orgapachespark-1119/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Saturday, June 27, at 06:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[VOTE] Release Apache Spark 1.4.1 (RC2)

2015-07-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.1!

This release fixes a handful of known issues in Spark 1.4.0, listed here:
http://s.apache.org/spark-1.4.1

The tag to be voted on is v1.4.1-rc2 (commit 07b95c7):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
07b95c7adf88f0662b7ab1c47e302ff5e6859606

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.1]
https://repository.apache.org/content/repositories/orgapachespark-1120/
[published as version: 1.4.1-rc2]
https://repository.apache.org/content/repositories/orgapachespark-1121/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/

Please vote on releasing this package as Apache Spark 1.4.1!

The vote is open until Monday, July 06, at 22:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-8649) Mapr repository is not defined properly

2015-06-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-8649.

   Resolution: Fixed
Fix Version/s: 1.5.0

 Mapr repository is not defined properly
 ---

 Key: SPARK-8649
 URL: https://issues.apache.org/jira/browse/SPARK-8649
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Ashok Kumar
Priority: Trivial
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1

2015-06-28 Thread Patrick Wendell
Hey Krishna - this is still the current release candidate.

- Patrick

On Sun, Jun 28, 2015 at 12:14 PM, Krishna Sankar ksanka...@gmail.com wrote:
 Patrick,
Haven't seen any replies on test results. I will byte ;o) - Should I test
 this version or is another one in the wings?
 Cheers
 k/

 On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 60e08e50751fe3929156de956d62faea79f5b801

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1118/
 [published as version: 1.4.1-rc1]
 https://repository.apache.org/content/repositories/orgapachespark-1119/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Saturday, June 27, at 06:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Commented] (SPARK-8667) Improve Spark UI behavior at scale

2015-06-27 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604000#comment-14604000
 ] 

Patrick Wendell commented on SPARK-8667:


Thanks Sean. I looked for a while for an older JIRA on this, but couldn't find 
it. This is definitely a dup of SPARK-2015.

 Improve Spark UI behavior at scale
 --

 Key: SPARK-8667
 URL: https://issues.apache.org/jira/browse/SPARK-8667
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Reporter: Patrick Wendell
Assignee: Shixiong Zhu

 This is a parent ticket and we can create child tickets when solving specific 
 issues. The main problem I would like to solve is the fact that the Spark UI 
 has issues at very large scale.
 The worst issue is when there is a stage page with more than a few thousand 
 tasks. In this case:
 1. The page itself is very slow to load and becomes unresponsive with a huge 
 number of tasks.
 2. The Scala XML output can become so large that it crashes the driver 
 program due to OOM for a page with a huge number of tasks.
 I am not sure if (1) is caused by javascript slowness, or maybe just the raw 
 amount of data sent over the wire. If it is the latter, it might be possible 
 to add compression to the HTTP payload to help improve load time.
 It would be nice to reproduce+investigate these issues further and create 
 specific sub tasks to improve them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8667) Improve Spark UI behavior at scale

2015-06-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-8667.

Resolution: Duplicate

 Improve Spark UI behavior at scale
 --

 Key: SPARK-8667
 URL: https://issues.apache.org/jira/browse/SPARK-8667
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Reporter: Patrick Wendell
Assignee: Shixiong Zhu

 This is a parent ticket and we can create child tickets when solving specific 
 issues. The main problem I would like to solve is the fact that the Spark UI 
 has issues at very large scale.
 The worst issue is when there is a stage page with more than a few thousand 
 tasks. In this case:
 1. The page itself is very slow to load and becomes unresponsive with a huge 
 number of tasks.
 2. The Scala XML output can become so large that it crashes the driver 
 program due to OOM for a page with a huge number of tasks.
 I am not sure if (1) is caused by javascript slowness, or maybe just the raw 
 amount of data sent over the wire. If it is the latter, it might be possible 
 to add compression to the HTTP payload to help improve load time.
 It would be nice to reproduce+investigate these issues further and create 
 specific sub tasks to improve them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1

2015-06-26 Thread Patrick Wendell
Hey Tom - no one voted on this yet, so I need to keep it open until
people vote. But I'm not aware of specific things we are waiting for.
Anyone else?

- Patrick

On Fri, Jun 26, 2015 at 7:10 AM, Tom Graves tgraves...@yahoo.com wrote:
 So is this open for vote then or are we waiting on other things?

 Tom



 On Thursday, June 25, 2015 10:32 AM, Andrew Ash and...@andrewash.com
 wrote:


 I would guess that many tickets targeted at 1.4.1 were set that way during
 the tail end of the 1.4.0 voting process as people realized they wouldn't
 make the .0 release in time.  In that case, they were likely aiming for a
 1.4.x release, not necessarily 1.4.1 specifically.  Maybe creating a 1.4.x
 target in Jira in addition to 1.4.0, 1.4.1, 1.4.2, etc would make it more
 clear that these tickets are targeted at some 1.4 update release rather
 than specifically the 1.4.1 update.

 On Thu, Jun 25, 2015 at 5:38 AM, Sean Owen so...@cloudera.com wrote:

 That makes sense to me -- there's an urgent fix to get out. I missed
 that part. Not that it really matters but was that expressed
 elsewhere?

 I know we tend to start the RC process even when a few more changes
 are still in progress, to get a first wave or two of testing done
 early, knowing that the RC won't be the final one. It makes sense for
 some issues for X to be open when an RC is cut, if they are actually
 truly intended for X.

 44 seems like a lot, and I don't think it's good practice just because
 that's how it's happened before. It looks like half of them weren't
 actually important for 1.4.x as we're now down to 21. I don't disagree
 with the idea that only most of the issues targeted for version X
 will be in version X; the target expresses a stretch goal. Given the
 fast pace of change that's probably the only practical view.

 I think we're just missing a step then: before RC of X, ask people to
 review and update the target of JIRAs for X? In this case, it was a
 good point to untarget stuff from 1.4.x entirely; I suspect everything
 else should then be targeted at 1.4.2 by default with the exception of
 a handful that people really do intend to work in for 1.4.1 before its
 final release.

 I know it sounds like pencil-pushing, but it's a cheap way to bring
 some additional focus to release planning. RC time has felt like a
 last-call to *begin* changes ad-hoc when it would go faster if it were
 more intentional and constrained. Meaning faster RCs, meaning getting
 back to a 3-month release cycle or less, and meaning less rush to push
 stuff into a .0 release and less frequent need for a maintenance .1
 version.

 So what happens if all 1.4.1-targeted JIRAs are targeted to 1.4.2?
 would that miss something that is definitely being worked on for
 1.4.1?

 On Wed, Jun 24, 2015 at 6:56 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Sean,

 This is being shipped now because there is a severe bug in 1.4.0 that
 can cause data corruption for Parquet users.

 There are no blockers targeted for 1.4.1 - so I don't see that JIRA is
 inconsistent with shipping a release now. As for the goal of having every
 single targeted JIRA cleared by the time we start voting, I don't think
 there is broad consensus and cultural adoption of that principle yet.
 So I do not take it as a signal that this release is premature (the
 story has been the same for every previous release we've ever done).

 The fact that we hit 90/124 of issues targeted at this release means
 we are targeting such that we get around 70% of issues merged. That
 actually doesn't seem so bad to me since there is some uncertainty in
 the process. B

 - Patrick

 On Wed, Jun 24, 2015 at 1:54 AM, Sean Owen so...@cloudera.com wrote:
 There are 44 issues still targeted for 1.4.1. None are Blockers; 12
 are Critical. ~80% were opened and/or set by committers. Compare with
 90 issues resolved for 1.4.1.

 I'm concerned that committers are targeting lots more for a release
 even in the short term than realistically can go in. On its face, it
 suggests that an RC is premature. Why is 1.4.1 being put forth for
 release now? It seems like people are saying they want a fair bit more
 time to work on 1.4.1.

 I suspect that in fact people would rather untarget / slip (again)
 these JIRAs, but it calls into question again how the targeting is
 consistently off by this much.

 What unresolved JIRAs targeted for 1.4.1 are *really* still open for
 1.4.1? like, what would go badly if all 32 non-Critical JIRAs were
 untargeted now? is the reality that there are a handful of items to
 get in before the final release, and those are hopefully the ~12
 critical ones? How about some review of that before we ask people to
 seriously test these bits?

 On Wed, Jun 24, 2015 at 8:37 AM, Patrick Wendell pwend...@gmail.com
 wrote:
 Please vote on releasing the following candidate as Apache Spark version
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed
 here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted

[jira] [Created] (SPARK-8667) Improve Spark UI behavior at scale

2015-06-26 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-8667:
--

 Summary: Improve Spark UI behavior at scale
 Key: SPARK-8667
 URL: https://issues.apache.org/jira/browse/SPARK-8667
 Project: Spark
  Issue Type: Improvement
Reporter: Patrick Wendell
Assignee: Shixiong Zhu


This is a parent ticket and we can create child tickets when solving specific 
issues. The main problem I would like to solve is the fact that the Spark UI 
has issues at very large scale.

The worst issue is when there is a stage page with more than a few thousand 
tasks. In this case:
1. The page itself is very slow to load and becomes unresponsive with a huge 
number of tasks.
2. The Scala XML output can become so large that it crashes the driver program 
due to OOM for a page with a huge number of tasks.

I am not sure if (1) is caused by javascript slowness, or maybe just the raw 
amount of data sent over the wire. If it is the latter, it might be possible to 
add compression to the HTTP payload to help improve load time.

It would be nice to reproduce+investigate these issues further and create 
specific sub tasks to improve them.
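
As a starting point for the investigation, a rough way to size up a large stage
page from the shell (localhost:4040 is the default driver UI address; the stage
id and attempt below are illustrative):

{code}
# Raw bytes the stage page currently ships over the wire.
curl -s "http://localhost:4040/stages/stage/?id=1&attempt=0" | wc -c

# Rough estimate of what HTTP compression would buy on the same payload.
curl -s "http://localhost:4040/stages/stage/?id=1&attempt=0" | gzip -c | wc -c
{code}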




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8667) Improve Spark UI behavior at scale

2015-06-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8667:
---
Component/s: Web UI

 Improve Spark UI behavior at scale
 --

 Key: SPARK-8667
 URL: https://issues.apache.org/jira/browse/SPARK-8667
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Reporter: Patrick Wendell
Assignee: Shixiong Zhu

 This is a parent ticket and we can create child tickets when solving specific 
 issues. The main problem I would like to solve is the fact that the Spark UI 
 has issues at very large scale.
 The worst issue is when there is a stage page with more than a few thousand 
 tasks. In this case:
 1. The page itself is very slow to load and becomes unresponsive with a huge 
 number of tasks.
 2. The Scala XML output can become so large that it crashes the driver 
 program due to OOM for a page with a huge number of tasks.
 I am not sure if (1) is caused by javascript slowness, or maybe just the raw 
 amount of data sent over the wire. If it is the latter, it might be possible 
 to add compression to the HTTP payload to help improve load time.
 It would be nice to reproduce+investigate these issues further and create 
 specific sub tasks to improve them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.1

2015-06-24 Thread Patrick Wendell
Hey Sean,

This is being shipped now because there is a severe bug in 1.4.0 that
can cause data corruption for Parquet users.

There are no blockers targeted for 1.4.1 - so I don't see that JIRA is
inconsistent with shipping a release now. As for the goal of having every
single targeted JIRA cleared by the time we start voting, I don't think
there is broad consensus and cultural adoption of that principle yet.
So I do not take it as a signal that this release is premature (the
story has been the same for every previous release we've ever done).

The fact that we hit 90/124 of issues targeted at this release means
we are targeting such that we get around 70% of issues merged. That
actually doesn't seem so bad to me since there is some uncertainty in
the process. B

- Patrick

On Wed, Jun 24, 2015 at 1:54 AM, Sean Owen so...@cloudera.com wrote:
 There are 44 issues still targeted for 1.4.1. None are Blockers; 12
 are Critical. ~80% were opened and/or set by committers. Compare with
 90 issues resolved for 1.4.1.

 I'm concerned that committers are targeting lots more for a release
 even in the short term than realistically can go in. On its face, it
 suggests that an RC is premature. Why is 1.4.1 being put forth for
 release now? It seems like people are saying they want a fair bit more
 time to work on 1.4.1.

 I suspect that in fact people would rather untarget / slip (again)
 these JIRAs, but it calls into question again how the targeting is
 consistently off by this much.

 What unresolved JIRAs targeted for 1.4.1 are *really* still open for
 1.4.1? like, what would go badly if all 32 non-Critical JIRAs were
 untargeted now? is the reality that there are a handful of items to
 get in before the final release, and those are hopefully the ~12
 critical ones? How about some review of that before we ask people to
 seriously test these bits?

 On Wed, Jun 24, 2015 at 8:37 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.1!

 This release fixes a handful of known issues in Spark 1.4.0, listed here:
 http://s.apache.org/spark-1.4.1

 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 60e08e50751fe3929156de956d62faea79f5b801

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.1]
 https://repository.apache.org/content/repositories/orgapachespark-1118/
 [published as version: 1.4.1-rc1]
 https://repository.apache.org/content/repositories/orgapachespark-1119/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.1!

 The vote is open until Saturday, June 27, at 06:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.1
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[VOTE] Release Apache Spark 1.4.1

2015-06-23 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.1!

This release fixes a handful of known issues in Spark 1.4.0, listed here:
http://s.apache.org/spark-1.4.1

The tag to be voted on is v1.4.1-rc1 (commit 60e08e5):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
60e08e50751fe3929156de956d62faea79f5b801

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.1]
https://repository.apache.org/content/repositories/orgapachespark-1118/
[published as version: 1.4.1-rc1]
https://repository.apache.org/content/repositories/orgapachespark-1119/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/

Please vote on releasing this package as Apache Spark 1.4.1!

The vote is open until Saturday, June 27, at 06:32 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3

2015-06-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8494:
---
Assignee: (was: Patrick Wendell)

 ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
 ---

 Key: SPARK-8494
 URL: https://issues.apache.org/jira/browse/SPARK-8494
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: PJ Fanning
 Attachments: spark-test-case.zip


 I found a similar issue to SPARK-1923 but with Scala 2.10.4.
 I used the Test.scala from SPARK-1923 but used the libraryDependencies from a 
 build.sbt that I am working on.
 If I remove the spray 1.3.3 jars, the test case passes but has a 
 ClassNotFoundException otherwise.
 I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT.
 Application:
 {code}
 import org.apache.spark.SparkConf
 import org.apache.spark.SparkContext
 object Test {
   def main(args: Array[String]): Unit = {
     val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
     val sc = new SparkContext(conf)
     sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
     sc.stop()
   }
 }
 {code}
 Exception:
 {code}
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 
 failed 1 times, most recent failure: Exception failure in TID 1 on host 
 localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
 java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 java.security.AccessController.doPrivileged(Native Method)
 java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 java.lang.Class.forName0(Native Method)
 java.lang.Class.forName(Class.java:270)
 
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
 
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
 java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
 {code}
 {code}
 name := "spark-test-case"
 version := "1.0"
 scalaVersion := "2.10.4"
 resolvers += "spray repo" at "http://repo.spray.io"
 resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"
 val akkaVersion = "2.3.11"
 val sprayVersion = "1.3.3"
 libraryDependencies ++= Seq(
   "com.h2database"    %  "h2"              % "1.4.187",
   "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
   "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
   "ch.qos.logback"    %  "logback-classic" % "1.0.13",
   "io.spray"          %% "spray-can"       % sprayVersion,
   "io.spray"          %% "spray-routing"   % sprayVersion,
   "io.spray"          %% "spray-json"      % "1.3.1",
   "com.databricks"    %% "spark-csv"       % "1.0.3",
   "org.specs2"        %% "specs2"          % "2.4.17"     % "test",
   "org.specs2"        %% "specs2-junit"    % "2.4.17"     % "test",
   "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
   "com.typesafe.akka" %% "akka-testkit"    % akkaVersion  % "test",
   "junit"             %  "junit"           % "4.12"       % "test"
 )
 scalacOptions ++= Seq(
   "-unchecked",
   "-deprecation",
   "-Xlint",
   "-Ywarn-dead-code",
   "-language:_",
   "-target:jvm-1.7",
   "-encoding", "UTF-8"
 )
 testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
 {code}
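 A commonly suggested workaround for this class of sbt classloader issue (offered here 
 only as an untested suggestion, not a confirmed fix for this ticket) is to run the 
 application and tests in a forked JVM, so Spark's deserialization does not resolve 
 classes through sbt's launcher classloader:
 {code}
 // build.sbt additions -- a hedged suggestion, not a verified fix for this test case
 fork := true
 fork in Test := true
 // alternatively, submit the job against the assembly jar via spark-submit
 {code}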



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7292) Provide operator to truncate lineage without persisting RDD's

2015-06-19 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7292:
---
Assignee: Andrew Or

 Provide operator to truncate lineage without persisting RDD's
 -

 Key: SPARK-7292
 URL: https://issues.apache.org/jira/browse/SPARK-7292
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Reporter: Patrick Wendell
Assignee: Andrew Or

 Checkpointing exists in Spark to truncate a lineage chain. I've heard 
 requests from some users to allow truncation of lineage in a way that is 
 cheap and doesn't serialize and persist the RDD. This is possible if the 
 user is willing to forgo fault tolerance for that RDD (for instance, for 
 shorter-running jobs or ones that use a small number of machines). It's 
 pretty easy to allow this so we should look into it for Spark 1.5.
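 For illustration, a minimal sketch of how such an operator could be used from user 
 code; it assumes the RDD.localCheckpoint() API that later Spark releases added for 
 cheap lineage truncation without replicated, fault-tolerant storage:
 {code}
 import org.apache.spark.{SparkConf, SparkContext}

 // Sketch only: assumes a Spark version where RDD.localCheckpoint() is available.
 val sc = new SparkContext(
   new SparkConf().setMaster("local[4]").setAppName("LineageTruncation"))

 val base = sc.parallelize(1 to 1000000, 100)
 val derived = base.map(_ * 2).filter(_ % 3 == 0)

 // Truncate the lineage cheaply: the RDD is materialized through the caching layer
 // (no replicated, fault-tolerant storage), so it cannot be recomputed after failure,
 // but long lineage chains stop growing.
 derived.localCheckpoint()
 derived.count()  // an action triggers the local checkpoint

 sc.stop()
 {code}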



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8416) Thread dump page should highlight Spark executor threads

2015-06-18 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592411#comment-14592411
 ] 

Patrick Wendell commented on SPARK-8416:


It would also be nice to put those threads first in the list.

 Thread dump page should highlight Spark executor threads
 

 Key: SPARK-8416
 URL: https://issues.apache.org/jira/browse/SPARK-8416
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Josh Rosen

 On the Spark thread dump page, it's hard to pick out executor threads from 
 other system threads.  The UI should employ some color coding or highlighting 
 to make this more apparent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8434) Add a pretty parameter to show

2015-06-18 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8434:
---
Component/s: SQL

 Add a pretty parameter to show
 

 Key: SPARK-8434
 URL: https://issues.apache.org/jira/browse/SPARK-8434
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Shixiong Zhu

 Sometimes the user may want to show the complete content of cells, such as 
 the output of sql("set -v").
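 As a rough illustration of the requested behavior (hedged: at the time of this ticket 
 the parameter does not exist; later Spark releases expose it as a truncate flag on 
 DataFrame.show):
 {code}
 // Sketch: print up to 100 rows without truncating long cell values,
 // e.g. the full output of sql("set -v"). Assumes df.show(numRows, truncate).
 val df = sqlContext.sql("set -v")   // sqlContext as provided by spark-shell
 df.show(100, false)
 {code}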



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8450) PySpark write.parquet raises Unsupported datatype DecimalType()

2015-06-18 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8450:
---
Component/s: SQL
 PySpark

 PySpark write.parquet raises Unsupported datatype DecimalType()
 ---

 Key: SPARK-8450
 URL: https://issues.apache.org/jira/browse/SPARK-8450
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
 Environment: Spark 1.4.0 on Debian
Reporter: Peter Hoffmann

 I'm getting an exception when I try to save a DataFrame with a DecimalType as 
 a parquet file
 Minimal Example:
 from decimal import Decimal
 from pyspark.sql import SQLContext
 from pyspark.sql.types import *
 sqlContext = SQLContext(sc)
 schema = StructType([
 StructField('id', LongType()),
 StructField('value', DecimalType())])
 rdd = sc.parallelize([[1, Decimal(0.5)],[2, Decimal(2.9)]])
 df = sqlContext.createDataFrame(rdd, schema)
 df.write.parquet("hdfs://srv:9000/user/ph/decimal.parquet", 'overwrite')
 Stack Trace
 ---
 Py4JJavaError Traceback (most recent call last)
 <ipython-input-19-a77dac8de5f3> in <module>()
 ----> 1 sr.write.parquet("hdfs://srv:9000/user/ph/decimal.parquet", 
 'overwrite')
 /home/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/readwriter.pyc in 
 parquet(self, path, mode)
 367 :param mode: one of `append`, `overwrite`, `error`, `ignore` 
 (default: error)
 368 
 --> 369 return self._jwrite.mode(mode).parquet(path)
 370 
 371 @since(1.4)
 /home/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py
  in __call__(self, *args)
 536 answer = self.gateway_client.send_command(command)
 537 return_value = get_return_value(answer, self.gateway_client,
 --> 538 self.target_id, self.name)
 539 
 540 for temp_arg in temp_args:
 /home/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py
  in get_return_value(answer, gateway_client, target_id, name)
 298 raise Py4JJavaError(
 299 'An error occurred while calling {0}{1}{2}.\n'.
 --> 300 format(target_id, '.', name), value)
 301 else:
 302 raise Py4JError(
 Py4JJavaError: An error occurred while calling o361.parquet.
 : org.apache.spark.SparkException: Job aborted.
   at 
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:138)
   at 
 org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:114)
   at 
 org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
   at 
 org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
   at 
 org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
   at 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
   at 
 org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
   at 
 org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
   at 
 org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
   at 
 org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
   at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:332)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
   at 
 org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:281)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
   at py4j.Gateway.invoke(Gateway.java:259)
   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
   at py4j.commands.CallCommand.execute(CallCommand.java:79)
   at py4j.GatewayConnection.run(GatewayConnection.java:207)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
 Task 158 in stage 35.0 failed 4 times, most recent failure: Lost task 158.3 
 in stage 35.0 (TID 2736, 10.2.160.14

[jira] [Updated] (SPARK-8427) Incorrect ACL checking for partitioned table in Spark SQL-1.4

2015-06-18 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-8427:
---
Priority: Critical  (was: Blocker)

 Incorrect ACL checking for partitioned table in Spark SQL-1.4
 -

 Key: SPARK-8427
 URL: https://issues.apache.org/jira/browse/SPARK-8427
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: CentOS 6 & OS X 10.9.5, Hive-0.13.1, Spark-1.4, Hadoop 
 2.6.0
Reporter: Karthik Subramanian
Priority: Critical
  Labels: security

 Problem Statement:
 While querying a partitioned table using Spark SQL (version 1.4.0), an 
 access denied exception is observed on partitions the user doesn’t belong 
 to (user permissions are controlled using HDFS ACLs). The same query works 
 correctly in Hive.
 Use case: multitenancy
 Consider a table containing multiple customers, each customer with 
 multiple facilities. The table is partitioned by customer and facility. A 
 user belonging to one facility will not have access to other facilities. This is 
 enforced using HDFS ACLs on the corresponding directories. When querying the 
 table as ‘user1’ belonging to ‘facility1’ and ‘customer1’ on a particular 
 partition (using a ‘where’ clause), only access to the corresponding directory 
 should be verified, not the entire table. 
 The above use case works as expected when using the Hive client, versions 0.13.1 
 and 1.1.0. 
 The query used: select count(*) from customertable where customer=‘customer1’ 
 and facility=‘facility1’
 Below is the exception received in Spark-shell:
 org.apache.hadoop.security.AccessControlException: Permission denied: 
 user=user1, access=READ_EXECUTE, 
 inode="/data/customertable/customer=customer2/facility=facility2":root:supergroup:drwxrwx---:group::r-x,group:facility2:rwx
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkAccessAcl(FSPermissionChecker.java:351)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:253)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6419)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4954)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4915)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:826)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:612)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1971)
   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755

[jira] [Updated] (SPARK-5787) Protect JVM from some not-important exceptions

2015-06-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5787:
---
Target Version/s: 1.5.0  (was: 1.4.0)

 Protect JVM from some not-important exceptions
 --

 Key: SPARK-5787
 URL: https://issues.apache.org/jira/browse/SPARK-5787
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Davies Liu
Priority: Critical

 Any uncaught exception will shut down the executor JVM, so we should catch 
 those exceptions that do not seriously harm the executor (i.e., the executor 
 is still functional).
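 A minimal sketch of the kind of guard being requested (illustrative only; the real 
 change would live inside the executor's task runner, and the helper name here is 
 made up):
 {code}
 import scala.util.control.NonFatal

 // Illustrative wrapper: log and swallow non-fatal exceptions instead of letting
 // them reach the uncaught-exception handler, which would shut down the executor JVM.
 // Fatal errors (e.g. OutOfMemoryError) still propagate.
 def runProtected(log: String => Unit)(block: => Unit): Unit = {
   try {
     block
   } catch {
     case NonFatal(e) =>
       log(s"Ignoring non-fatal exception: ${e.getMessage}")
   }
 }
 {code}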



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7448) Implement custom byte array serializer for use in PySpark shuffle

2015-06-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7448:
---
Target Version/s: 1.5.0  (was: 1.4.0)

 Implement custom byte array serializer for use in PySpark shuffle
 

 Key: SPARK-7448
 URL: https://issues.apache.org/jira/browse/SPARK-7448
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Shuffle
Reporter: Josh Rosen
Priority: Minor

 PySpark's shuffle typically shuffles Java RDDs that contain byte arrays. We 
 should implement a custom Serializer for use in these shuffles.  This will 
 allow us to take advantage of shuffle optimizations like SPARK-7311 for 
 PySpark without requiring users to change the default serializer to 
 KryoSerializer (this is useful for JobServer-type applications).
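 For illustration, a standalone sketch of the length-prefixed framing such a serializer 
 would use (it deliberately does not extend Spark's internal Serializer interface, and 
 the names are hypothetical):
 {code}
 import java.io.{DataInputStream, DataOutputStream, InputStream, OutputStream}

 // Hypothetical helper: each byte-array record is written as a 4-byte length followed
 // by the raw bytes, avoiding generic Java serialization overhead for PySpark's
 // Array[Byte] records. A real implementation would wrap this framing in Spark's
 // serialization/deserialization stream classes.
 object ByteArrayFraming {
   def write(out: OutputStream, record: Array[Byte]): Unit = {
     val dos = new DataOutputStream(out)
     dos.writeInt(record.length)
     dos.write(record)
   }

   def read(in: InputStream): Array[Byte] = {
     val dis = new DataInputStream(in)
     val length = dis.readInt()
     val buf = new Array[Byte](length)
     dis.readFully(buf)
     buf
   }
 }
 {code}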



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7078) Cache-aware binary processing in-memory sort

2015-06-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7078:
---
Target Version/s: 1.5.0  (was: 1.4.0)

 Cache-aware binary processing in-memory sort
 

 Key: SPARK-7078
 URL: https://issues.apache.org/jira/browse/SPARK-7078
 Project: Spark
  Issue Type: New Feature
  Components: Shuffle
Reporter: Reynold Xin
Assignee: Josh Rosen

 A cache-friendly sort algorithm that can be used eventually for:
 * sort-merge join
 * shuffle
 See the old alpha sort paper: 
 http://research.microsoft.com/pubs/68249/alphasort.doc
 Note that state-of-the-art for sorting has improved quite a bit, but we can 
 easily optimize the sorting algorithm itself later.
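 As an illustration of the core AlphaSort idea (a toy sketch with made-up names, not 
 Spark's internal API): pack a sort-key prefix and a record index into a single long, 
 sort the dense primitive array for cache locality, and dereference records only at 
 the end:
 {code}
 // Pack a 32-bit key prefix (high bits) and a 32-bit record index (low bits) into one
 // Long so the sort scans a dense primitive array instead of chasing object pointers.
 // A real implementation would fall back to a full key comparison when prefixes tie.
 def cacheAwareSort(records: Array[Array[Byte]],
                    prefixOf: Array[Byte] => Int): Array[Array[Byte]] = {
   val packed = new Array[Long](records.length)
   var i = 0
   while (i < records.length) {
     packed(i) = (prefixOf(records(i)).toLong << 32) | (i.toLong & 0xFFFFFFFFL)
     i += 1
   }
   java.util.Arrays.sort(packed)                      // sorts by prefix, then index
   packed.map(p => records((p & 0xFFFFFFFFL).toInt))  // dereference once, at the end
 }
 {code}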



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7041) Avoid writing empty files in BypassMergeSortShuffleWriter

2015-06-17 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7041:
---
Target Version/s: 1.5.0  (was: 1.4.0)

 Avoid writing empty files in BypassMergeSortShuffleWriter
 -

 Key: SPARK-7041
 URL: https://issues.apache.org/jira/browse/SPARK-7041
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Reporter: Josh Rosen
Assignee: Josh Rosen

 In BypassMergeSortShuffleWriter, we may end up opening disk writer files for 
 empty partitions; this occurs because we manually call {{open()}} after 
 creating the writer, causing serialization and compression input streams to 
 be created; these streams may write headers to the output stream, resulting 
 in non-zero-length files being created for partitions that contain no 
 records.  This is unnecessary, though, since the disk object writer will 
 automatically open itself when the first write is performed.  Removing this 
 eager {{open()}} call and rewriting the consumers to cope with the 
 non-existence of empty files results in a large performance benefit for 
 certain sparse workloads when using sort-based shuffle.
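 A minimal sketch of the lazy-open pattern described above (illustrative names, not the 
 actual BypassMergeSortShuffleWriter code): defer opening the per-partition file until 
 the first record arrives, so empty partitions never produce a file at all:
 {code}
 import java.io.{BufferedOutputStream, File, FileOutputStream, OutputStream}

 // Hypothetical per-partition writer: the file (and any compression/serialization
 // headers) is only created on the first write, so partitions with no records
 // leave no header-only files behind.
 final class LazyPartitionWriter(file: File) {
   private var out: OutputStream = null

   def write(bytes: Array[Byte]): Unit = {
     if (out == null) {                       // open lazily on the first record
       out = new BufferedOutputStream(new FileOutputStream(file))
     }
     out.write(bytes)
   }

   /** Returns None if nothing was ever written, so consumers can skip the partition. */
   def close(): Option[File] = {
     if (out == null) None
     else { out.close(); Some(file) }
   }
 }
 {code}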



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6393) Extra RPC to the AM during killExecutor invocation

2015-06-17 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590426#comment-14590426
 ] 

Patrick Wendell commented on SPARK-6393:


[~sandyryza] I'm un-targeting this. If you are planning on working on this for 
a specific version, feel free to retarget.

 Extra RPC to the AM during killExecutor invocation
 --

 Key: SPARK-6393
 URL: https://issues.apache.org/jira/browse/SPARK-6393
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, YARN
Affects Versions: 1.3.1
Reporter: Sandy Ryza

 This was introduced by SPARK-6325



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


