[jira] [Commented] (SPARK-7726) Maven Install Breaks When Upgrading Scala 2.11.2 -> 2.11.3 or higher
[ https://issues.apache.org/jira/browse/SPARK-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680885#comment-14680885 ]

Patrick Wendell commented on SPARK-7726:

[~srowen] [~dragos] This is cropping up again when trying to create a release candidate for Spark 1.5: https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Release-All-Java7/26/console

Maven Install Breaks When Upgrading Scala 2.11.2 -> 2.11.3 or higher

Key: SPARK-7726
URL: https://issues.apache.org/jira/browse/SPARK-7726
Project: Spark
Issue Type: Bug
Components: Build
Reporter: Patrick Wendell
Assignee: Iulian Dragos
Priority: Blocker
Fix For: 1.4.0

This one took a long time to track down. The Maven install phase is part of our release process. It runs the scala:doc target to generate doc jars. Between Scala 2.11.2 and Scala 2.11.3, the behavior of this plugin changed in a way that breaks our build. In both cases it reported an error (a long-standing error we've always ignored); however, in 2.11.3 that error became fatal and failed the entire build process. The upgrade occurred in SPARK-7092. Here is a simple reproduction:

{code}
./dev/change-version-to-2.11.sh
mvn clean install -pl network/common -pl network/shuffle -DskipTests -Dscala-2.11
{code}

This command exits successfully when Spark is at Scala 2.11.2 and fails with 2.11.3 or higher. In either case an error is printed:

{code}
[INFO]
[INFO] --- scala-maven-plugin:3.2.0:doc-jar (attach-scaladocs) @ spark-network-shuffle_2.11 ---
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/UploadBlock.java:56: error: not found: type Type
  protected Type type() { return Type.UPLOAD_BLOCK; }
            ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/StreamHandle.java:37: error: not found: type Type
  protected Type type() { return Type.STREAM_HANDLE; }
            ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/RegisterExecutor.java:44: error: not found: type Type
  protected Type type() { return Type.REGISTER_EXECUTOR; }
            ^
/Users/pwendell/Documents/spark/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java:40: error: not found: type Type
  protected Type type() { return Type.OPEN_BLOCKS; }
            ^
model contains 22 documentable templates
four errors found
{code}

Ideally we'd just dig in and fix this error. Unfortunately it's a very confusing error and I have no idea why it is appearing. I'd propose reverting SPARK-7092 in the meantime.
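A revert along the lines proposed above would presumably just pin the 2.11 build back to Scala 2.11.2 in the parent pom (a minimal sketch, assuming the version is controlled by the scala.version / scala.binary.version properties that the scala-2.11 profile sets):

{code}
<!-- sketch: pin the scala-2.11 profile back to 2.11.2 -->
<properties>
  <scala.version>2.11.2</scala.version>
  <scala.binary.version>2.11</scala.binary.version>
</properties>
{code}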
[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660796#comment-14660796 ]

Patrick Wendell commented on SPARK-1517:

Hey Ryan,

IIRC, the Apache snapshot repository won't let us publish binaries that do not have SNAPSHOT in the version number. The reason is that it expects to see timestamped snapshots so its garbage-collection mechanism can work. We could look at adding sha1 hashes before SNAPSHOT, but I think there is some chance this would break their cleanup.

In terms of posting more binaries: I can look at whether Databricks or Berkeley might be able to donate S3 resources for this, but it would have to be clearly maintained by those organizations and not branded as official Apache releases or anything like that.

Publish nightly snapshots of documentation, maven artifacts, and binary builds

Key: SPARK-1517
URL: https://issues.apache.org/jira/browse/SPARK-1517
Project: Spark
Issue Type: Improvement
Components: Build, Project Infra
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Critical

Should be pretty easy to do with Jenkins. The only thing I can think of that would be tricky is setting up credentials so that Jenkins can publish this stuff somewhere on Apache infra. Ideally we don't want to have to put a private key on every Jenkins box (since they are otherwise pretty stateless). One idea is to encrypt these credentials with a passphrase and post them somewhere publicly visible. Then the Jenkins build can download the credentials, provided we set a passphrase in an environment variable in Jenkins. There may be simpler solutions as well.
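The encrypt-and-post idea in the description could look roughly like this (a sketch only, not what the build actually does; the file names and the PASSPHRASE variable are hypothetical):

{code}
# One time, on a trusted machine: bundle and encrypt the publishing credentials
tar cf credentials.tar publish_key publish_key.pub
openssl enc -aes-256-cbc -salt -in credentials.tar -out credentials.tar.enc
# ...then post credentials.tar.enc somewhere publicly visible.

# In the Jenkins job, with PASSPHRASE set as a build environment variable:
openssl enc -d -aes-256-cbc -pass env:PASSPHRASE \
  -in credentials.tar.enc -out credentials.tar
tar xf credentials.tar
{code}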
[jira] [Commented] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds
[ https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660420#comment-14660420 ]

Patrick Wendell commented on SPARK-1517:

Hey Ryan,

For the maven snapshot releases: unfortunately we are constrained by Maven's own SNAPSHOT version format, which doesn't allow encoding anything other than the timestamp. It's just not supported in their SNAPSHOT mechanism. However, one thing we could look at is whether we can align the timestamp with the time of the actual Spark commit, rather than the time of publication of the SNAPSHOT release. I'm not sure if Maven lets you provide a custom timestamp when publishing. If we had that feature, users could look at the Spark commit log and do some manual association.

For the binaries, the reason the same commit appears multiple times is that we do the build every four hours and always publish the latest one, even if it's a duplicate. However, this could be modified pretty easily to avoid double-publishing the same commit if there hasn't been any code change. Maybe create a JIRA for this?

In terms of how many older versions are available: the scripts we use for this have a tunable retention window. Right now I'm only keeping the last 4 builds; we could probably extend it to something like 10 builds. However, at some point I'm likely to run out of space in my ASF user account. Since the binaries are quite large, I don't think it's feasible, at least using ASF infrastructure, to keep all past builds. We have 3000 commits in a typical Spark release, and it's a few gigs for each binary build.

Publish nightly snapshots of documentation, maven artifacts, and binary builds

Key: SPARK-1517
URL: https://issues.apache.org/jira/browse/SPARK-1517
Project: Spark
Issue Type: Improvement
Components: Build, Project Infra
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Critical

Should be pretty easy to do with Jenkins. The only thing I can think of that would be tricky is setting up credentials so that Jenkins can publish this stuff somewhere on Apache infra. Ideally we don't want to have to put a private key on every Jenkins box (since they are otherwise pretty stateless). One idea is to encrypt these credentials with a passphrase and post them somewhere publicly visible. Then the Jenkins build can download the credentials, provided we set a passphrase in an environment variable in Jenkins. There may be simpler solutions as well.
Avoiding unnecessary build changes until tests are in better shape
Hey All,

Was wondering if people would be willing to avoid merging build changes until we have put the tests in better shape. The reason is that build changes are the most likely to cause downstream issues with the test matrix, and it's very difficult to reverse-engineer which patches caused which problems when the tests are not in a stable state. For instance, the update to Hive 1.2.1 caused cascading failures that have lasted several days now, and in the meantime a few other build-related patches were also merged. As these pile up, it gets harder for us to have confidence that those other patches didn't introduce problems.

https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/

- Patrick
Re: How to help for 1.5 release?
Hey Meihua,

If you are a user of Spark, one thing that is really helpful is to run Spark 1.5 on your workload and report any issues, performance regressions, etc.

- Patrick

On Mon, Aug 3, 2015 at 11:49 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
> I think you can start from here: https://issues.apache.org/jira/browse/SPARK/fixforversion/12332078/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-summary-panel
> Thanks
> Best Regards
>
> On Tue, Aug 4, 2015 at 12:02 PM, Meihua Wu rotationsymmetr...@gmail.com wrote:
>> I think the team is preparing for the 1.5 release. Anything to help with the QA, testing etc?
>> Thanks,
>> MW
Re: PSA: Maven 3.3.3 now required to build
Yeah, the best bet is to use ./build/mvn --force (otherwise we'll still use your system Maven).

- Patrick

On Mon, Aug 3, 2015 at 1:26 PM, Sean Owen so...@cloudera.com wrote:
> That statement is true for Spark 1.4.x. But you've reminded me that I failed to update this doc for 1.5 to say Maven 3.3.3 is required. Patch coming up.
>
> On Mon, Aug 3, 2015 at 9:12 PM, Guru Medasani gdm...@gmail.com wrote:
>> Thanks Sean. The reason I asked is that in the Building Spark documentation for 1.4.1, I still see this: https://spark.apache.org/docs/latest/building-spark.html
>> "Building Spark using Maven requires Maven 3.0.4 or newer and Java 6+."
>> But I noticed the following warnings from the build of Spark version 1.5.0-SNAPSHOT, so I was wondering if the changes you mentioned relate to newer versions of Spark or to 1.4.1 as well.
>> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.
>> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed with message: Detected JDK Version: 1.6.0-36 is not in the allowed range 1.7.
>> Guru Medasani gdm...@gmail.com
>>
>> On Aug 3, 2015, at 2:38 PM, Sean Owen so...@cloudera.com wrote:
>>> Using ./build/mvn should always be fine. Your local mvn is fine too if it's 3.3.3 or later (3.3.3 is the latest). That's what any brew users on OS X out there will have, by the way.
>>>
>>> On Mon, Aug 3, 2015 at 8:37 PM, Guru Medasani gdm...@gmail.com wrote:
>>>> Thanks Sean. I noticed this one while building Spark version 1.5.0-SNAPSHOT this morning.
>>>> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message: Detected Maven Version: 3.2.5 is not in the allowed range 3.3.3.
>>>> Should we be using Maven 3.3.3 locally, or build/mvn, starting from Spark 1.4.1 or Spark 1.5?
>>>> Guru Medasani gdm...@gmail.com
>>>>
>>>> On Aug 3, 2015, at 1:01 PM, Sean Owen so...@cloudera.com wrote:
>>>>> If you use build/mvn or are already using Maven 3.3.3 locally (i.e. via brew on OS X), then this won't affect you, but I wanted to call attention to https://github.com/apache/spark/pull/7852 which makes Maven 3.3.3 the minimum required to build Spark. This heads off problems from some behavior differences that Patrick and I observed between 3.3 and 3.2 last week, on top of the "dependency reduced POM" glitch from the 1.4.1 release window.
>>>>> Again, all you need to do is use build/mvn if you don't already have the latest Maven installed and all will be well.
>>>>> Sean
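For reference, invoking the bundled Maven that Patrick mentions at the top of this thread looks like this (a sketch; --force is the flag described above, and the goals shown are just a typical build):

{code}
# Use Spark's bundled Maven (3.3.3+) even when an older system mvn is installed
./build/mvn --force -DskipTests clean package
{code}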
[jira] [Created] (SPARK-9547) Allow testing pull requests with different Hadoop versions
Patrick Wendell created SPARK-9547:

Summary: Allow testing pull requests with different Hadoop versions
Key: SPARK-9547
URL: https://issues.apache.org/jira/browse/SPARK-9547
Project: Spark
Issue Type: Improvement
Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell

Similar to SPARK-9545, we should allow testing different Hadoop profiles in the PRB.
[jira] [Updated] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it
[ https://issues.apache.org/jira/browse/SPARK-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-9545:
Issue Type: Improvement  (was: Bug)

Run Maven tests in pull request builder if title has [maven-test] in it

Key: SPARK-9545
URL: https://issues.apache.org/jira/browse/SPARK-9545
Project: Spark
Issue Type: Improvement
Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell

We have infrastructure now in the build tooling for running Maven tests, but it's not actually used anywhere. With a very minor change we can support running Maven tests if the pull request title has [maven-test] in it.
[jira] [Created] (SPARK-9545) Run Maven tests in pull request builder if title has [maven-test] in it
Patrick Wendell created SPARK-9545:

Summary: Run Maven tests in pull request builder if title has [maven-test] in it
Key: SPARK-9545
URL: https://issues.apache.org/jira/browse/SPARK-9545
Project: Spark
Issue Type: Bug
Components: Build
Reporter: Patrick Wendell
Assignee: Patrick Wendell

We have infrastructure now in the build tooling for running Maven tests, but it's not actually used anywhere. With a very minor change we can support running Maven tests if the pull request title has [maven-test] in it.
Re: [ANNOUNCE] Nightly maven and package builds for Spark
Hey All,

I got it up and running. It was a newly surfaced bug in the build scripts.

- Patrick

On Wed, Jul 29, 2015 at 6:05 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
> Hey Patrick,
> Any update on this front please?
> Thanks,
> Bharath
>
> On Fri, Jul 24, 2015 at 8:38 PM, Patrick Wendell pwend...@gmail.com wrote:
>> Hey Bharath,
>> There was actually an incompatible change to the build process that broke several of the Jenkins builds. This should be patched up in the next day or two and nightly builds will resume.
>> - Patrick
>>
>> On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
>>> I noticed the last (1.5) build has a timestamp of 16th July. Have nightly builds been discontinued since then?
>>> Thanks,
>>> Bharath
>>>
>>> On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell pwend...@gmail.com wrote:
>>>> Hi All,
>>>> This week I got around to setting up nightly builds for Spark on Jenkins. I'd like feedback on these, and if it's going well I can merge the relevant automation scripts into Spark mainline and document it on the website. Right now I'm doing:
>>>> 1. SNAPSHOTs of Spark master and release branches published to the ASF Maven snapshot repo: https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
>>>> These are usable by adding this repository in your build and using a snapshot version (e.g. 1.3.2-SNAPSHOT).
>>>> 2. Nightly binary package builds and doc builds of master and release versions: http://people.apache.org/~pwendell/spark-nightly/
>>>> These build 4 times per day and are tagged based on commits.
>>>> If anyone has feedback on these please let me know. Thanks!
>>>> - Patrick
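Consuming the SNAPSHOT artifacts described in the announcement amounts to adding the repository and a snapshot dependency to your pom (a sketch; the artifact and version shown are illustrative):

{code}
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.2-SNAPSHOT</version>
  </dependency>
</dependencies>
{code}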
Re: Should spark-ec2 get its own repo?
Hey All,

I've mostly kept quiet since I am not very active in maintaining this code anymore. However, it is a bit odd that the project is split-brained, with a lot of the code being on GitHub and some in the Spark repo. If the consensus is to migrate everything to GitHub, that seems okay with me. I would vouch for user continuity, for instance still having a shim ec2/spark-ec2 script that could perhaps just download and unpack the real script from GitHub (see the sketch after this thread).

- Patrick

On Fri, Jul 31, 2015 at 2:13 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:
> Yes, it is still in progress, but I have just not gotten time to get to this. I think getting the repo moved from mesos to amplab in the codebase by 1.5 should be possible.
> Thanks
> Shivaram
>
> On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:
>> PS: is this still in progress? It feels like something that would be good to do before 1.5.0, if it's going to happen soon.
>>
>> On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:
>>> Yeah, I'll send a note to the mesos dev list just to make sure they are informed.
>>> Shivaram
>>>
>>> On Tue, Jul 21, 2015 at 11:47 AM, Sean Owen so...@cloudera.com wrote:
>>>> I agree it's worth informing Mesos devs and checking that there are no big objections. I presume Shivaram is plugged in enough to Mesos that there won't be any surprises there, and that the project would also agree with moving this Spark-specific bit out. They may also want to leave a pointer to the new location in the mesos repo, of course. I don't think it is something that requires a formal vote. It's not a question of ownership -- neither Apache nor the project PMC owns the code. I don't think it's different from retiring or removing any other code.
>>>>
>>>> On Tue, Jul 21, 2015 at 7:03 PM, Mridul Muralidharan mri...@gmail.com wrote:
>>>>> If I am not wrong, since the code was hosted within the mesos project repo, I assume (at least part of it) is owned by the mesos project and so its PMC?
>>>>> - Mridul
>>>>>
>>>>> On Tue, Jul 21, 2015 at 9:22 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote:
>>>>>> There is technically no PMC for the spark-ec2 project (I guess we are kind of establishing one right now). I haven't heard anything from the Spark PMC on the dev list that might suggest a need for a vote so far. I will send another round of email notification to the dev list when we have a JIRA / PR that actually moves the scripts (right now the only thing that changed is the location of some scripts from mesos/ to amplab/).
>>>>>> Thanks
>>>>>> Shivaram
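Such a continuity shim could be very small (purely a sketch; the GitHub archive URL and branch are placeholders, not the real location):

{code}
#!/usr/bin/env bash
# ec2/spark-ec2: backwards-compatibility shim.
# Downloads the relocated spark-ec2 scripts and delegates to them.
set -e
curl -sL https://github.com/amplab/spark-ec2/archive/master.tar.gz | tar xz
exec spark-ec2-master/spark-ec2 "$@"
{code}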
Re: Data source aliasing
Yeah, this could make sense: allowing data sources to register a short name. What mechanism did you have in mind? Using the JAR service loader? The only issue is that there could be conflicts, since many of these are third-party packages. If the same name were registered twice, I'm not sure what the best behavior would be. Ideally, in my mind, if the same short name were registered twice we'd force the user to use a fully qualified name and say the short name is ambiguous.

Patrick

On Jul 30, 2015 9:44 AM, Joseph Batchik josephbatc...@gmail.com wrote:
> Hi all,
> There are now starting to be a lot of data source packages for Spark. An annoyance I see is that I have to type in the full class name, like: sqlContext.read.format("com.databricks.spark.avro").load(path). Spark internally has formats such as "parquet" and "jdbc" registered, and it would be nice to be able to just type in "avro", "redshift", etc. as well. Would it be a good idea to use something like a service loader to allow data sources defined in other packages to register themselves with Spark? I think that this would make it easier for end users. I would be interested in adding this; please let me know what you guys think.
> - Joe
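A service-loader approach might look roughly like this (a sketch only; the DataSourceRegister trait and lookup helper here are illustrative, not an existing Spark API):

{code}
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Hypothetical SPI: a package implements this trait and lists the
// implementation class in META-INF/services so ServiceLoader can find it.
trait DataSourceRegister {
  def shortName(): String
}

def lookupDataSource(name: String): DataSourceRegister = {
  val matches = ServiceLoader.load(classOf[DataSourceRegister])
    .asScala.filter(_.shortName().equalsIgnoreCase(name)).toList
  matches match {
    case provider :: Nil => provider
    case Nil => sys.error(s"No data source registered for short name '$name'")
    // Two packages claimed the same short name: refuse to guess and make
    // the user spell out the fully qualified class name instead.
    case _ => sys.error(s"Short name '$name' is ambiguous; use the fully qualified class name")
  }
}
{code}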
[jira] [Resolved] (SPARK-9423) Why does every other Spark committer keep suggesting to use the spark-submit script
[ https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-9423.
Resolution: Invalid

Why does every other Spark committer keep suggesting to use the spark-submit script

Key: SPARK-9423
URL: https://issues.apache.org/jira/browse/SPARK-9423
Project: Spark
Issue Type: Question
Components: Deploy
Affects Versions: 1.3.1
Reporter: nirav patel

I see that on the Spark forum and Stack Overflow people keep suggesting the spark-submit.sh script as a way (the only way) to launch Spark jobs. Are we still living in the application-server monolithic world where I need to run startup.sh? What if the Spark application is a long-running context that serves multiple requests? What if the user just doesn't want to use a script? They want to embed Spark as a service in their application. Please STOP suggesting that users use the spark-submit script as the only alternative.
[jira] [Commented] (SPARK-9423) Why does every other Spark committer keep suggesting to use the spark-submit script
[ https://issues.apache.org/jira/browse/SPARK-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645495#comment-14645495 ]

Patrick Wendell commented on SPARK-9423:

This is not a valid issue for JIRA (we use JIRA for project bugs and feature tracking). Please send an email to the spark-users list. Thanks.

Why does every other Spark committer keep suggesting to use the spark-submit script

Key: SPARK-9423
URL: https://issues.apache.org/jira/browse/SPARK-9423
Project: Spark
Issue Type: Question
Components: Deploy
Affects Versions: 1.3.1
Reporter: nirav patel

I see that on the Spark forum and Stack Overflow people keep suggesting the spark-submit.sh script as a way (the only way) to launch Spark jobs. Are we still living in the application-server monolithic world where I need to run startup.sh? What if the Spark application is a long-running context that serves multiple requests? What if the user just doesn't want to use a script? They want to embed Spark as a service in their application. Please STOP suggesting that users use the spark-submit script as the only alternative.
Re: ReceiverTrackerSuite failing in master build
Thanks, Ted, for pointing this out. CC'ing Ryan and TD.

On Tue, Jul 28, 2015 at 8:25 AM, Ted Yu yuzhih...@gmail.com wrote:
> Hi,
> I noticed that ReceiverTrackerSuite is failing in the master Jenkins build for both Hadoop profiles. The failure seems to start with: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/3104/
> FYI
Protocol for build breaks
Hi All,

If there is a build break (i.e. a compile issue or consistently failing test) that somehow makes it into master, the best protocol is:

1. Revert the offending patch.
2. File a JIRA and assign it to the committer of the offending patch. The JIRA should contain links to the broken builds.

It's not worth spending any time trying to figure out how to fix it, or blocking on tracking down the commit author. This is because every hour that the PRB is broken is a major cost in terms of developer productivity.

- Patrick
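Mechanically, step 1 is just the standard git revert (a sketch; the commit hash and remote name are placeholders and may differ in your setup):

{code}
# Revert the offending commit on master and push the revert commit
git revert --no-edit <offending-sha>
git push apache master
{code}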
[jira] [Created] (SPARK-9304) Improve backwards compatibility of SPARK-8401
Patrick Wendell created SPARK-9304:

Summary: Improve backwards compatibility of SPARK-8401
Key: SPARK-9304
URL: https://issues.apache.org/jira/browse/SPARK-9304
Project: Spark
Issue Type: Improvement
Components: Build
Reporter: Patrick Wendell
Assignee: Michael Allman
Priority: Critical

In SPARK-8401 a backwards-incompatible change was made to the Scala 2.11 build process. It would be good to add scripts with the older names to avoid breaking compatibility for harnesses or other automated builds that build for Scala 2.11. They can just be one-line shell scripts with a comment explaining that they exist for backwards-compatibility purposes.

/cc [~srowen]
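One such shim might look like this (a sketch, assuming SPARK-8401 replaced dev/change-version-to-2.11.sh with a parameterized script; the delegate's name is illustrative):

{code}
#!/usr/bin/env bash
# dev/change-version-to-2.11.sh
# Kept for backwards compatibility; delegates to the renamed script.
exec "$(dirname "$0")"/change-scala-version.sh 2.11
{code}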
Re: [ANNOUNCE] Nightly maven and package builds for Spark
Hey Bharath,

There was actually an incompatible change to the build process that broke several of the Jenkins builds. This should be patched up in the next day or two and nightly builds will resume.

- Patrick

On Fri, Jul 24, 2015 at 12:51 AM, Bharath Ravi Kumar reachb...@gmail.com wrote:
> I noticed the last (1.5) build has a timestamp of 16th July. Have nightly builds been discontinued since then?
> Thanks,
> Bharath
>
> On Sun, May 24, 2015 at 1:11 PM, Patrick Wendell pwend...@gmail.com wrote:
>> Hi All,
>> This week I got around to setting up nightly builds for Spark on Jenkins. I'd like feedback on these, and if it's going well I can merge the relevant automation scripts into Spark mainline and document it on the website. Right now I'm doing:
>> 1. SNAPSHOTs of Spark master and release branches published to the ASF Maven snapshot repo: https://repository.apache.org/content/repositories/snapshots/org/apache/spark/
>> These are usable by adding this repository in your build and using a snapshot version (e.g. 1.3.2-SNAPSHOT).
>> 2. Nightly binary package builds and doc builds of master and release versions: http://people.apache.org/~pwendell/spark-nightly/
>> These build 4 times per day and are tagged based on commits.
>> If anyone has feedback on these please let me know. Thanks!
>> - Patrick
Policy around backporting bug fixes
Hi All,

A few times I've been asked about backporting and when to backport and not backport fix patches. Since I have managed this for many of the past releases, I wanted to explain how I have been thinking about it. If we have some consensus, I can put it on the wiki.

The trade-off when backporting is that you get to deliver the fix to people running older versions (great!), but you risk introducing new or even worse bugs in maintenance releases (bad!). The decision point is when you have a bug fix and it's not clear whether it is worth backporting. I think the following facets are important to consider:

(a) Backports are an extremely valuable service to the community and should be considered for any bug fix.
(b) Introducing a new bug in a maintenance release must be avoided at all costs. Over time it would erode confidence in our release process.
(c) Distributions or advanced users can always backport risky patches on their own, if they see fit.

For me, the consequence of these is that we should backport in the following situations:

- Both the bug and the fix are well understood and isolated. Code being modified is well tested.
- The bug being addressed is high priority to the community.
- The backported fix does not vary widely from the master branch fix.

We tend to avoid backports in the converse situations:

- The bug or fix is not well understood. For instance, it relates to interactions between complex components or third-party libraries (e.g. Hadoop libraries). The code is not well tested outside of the immediate bug being fixed.
- The bug is not clearly a high priority for the community.
- The backported fix is widely different from the master branch fix.

These are clearly subjective criteria, but ones worth considering. I am always happy to help advise people on specific patches if they want a sounding board to understand whether it makes sense to backport.

- Patrick
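For reference, the mechanics of a backport are a cherry-pick onto the maintenance branch (a sketch; the branch, remote, and hash are placeholders):

{code}
# Apply a fix from master to a maintenance branch; -x records the source commit
git checkout branch-1.4
git cherry-pick -x <fix-sha>
git push apache branch-1.4
{code}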
[jira] [Updated] (SPARK-8703) Add CountVectorizer as a ml transformer to convert document to words count vector
[ https://issues.apache.org/jira/browse/SPARK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8703:
Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-8521

Add CountVectorizer as a ml transformer to convert document to words count vector

Key: SPARK-8703
URL: https://issues.apache.org/jira/browse/SPARK-8703
Project: Spark
Issue Type: Sub-task
Components: ML
Reporter: yuhao yang
Assignee: yuhao yang
Fix For: 1.5.0
Original Estimate: 24h
Remaining Estimate: 24h

Converts a text document to a sparse vector of token counts, similar to http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html. I can further add an estimator to extract the vocabulary from a corpus, if that's appropriate.
[jira] [Updated] (SPARK-8564) Add the Python API for Kinesis
[ https://issues.apache.org/jira/browse/SPARK-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8564:
Target Version/s: 1.5.0

Add the Python API for Kinesis

Key: SPARK-8564
URL: https://issues.apache.org/jira/browse/SPARK-8564
Project: Spark
Issue Type: New Feature
Components: Streaming
Reporter: Shixiong Zhu
Re: KinesisStreamSuite failing in master branch
I think we should just revert this patch on all affected branches. No reason to leave the builds broken until a fix is in place.

- Patrick

On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen rosenvi...@gmail.com wrote:
> Yep, I emailed TD about it; I think that we may need to make a change to the pull request builder to fix this. Pending that, we could just revert the commit that added this.
>
> On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu yuzhih...@gmail.com wrote:
>> Hi,
>> I noticed that KinesisStreamSuite fails for both Hadoop profiles in master Jenkins builds. From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console :
>>
>> KinesisStreamSuite:
>> *** RUN ABORTED ***
>>   java.lang.AssertionError: assertion failed: Kinesis test not enabled, should not attempt to get AWS credentials
>>   at scala.Predef$.assert(Predef.scala:179)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getAWSCredentials(KinesisTestUtils.scala:189)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient$lzycompute(KinesisTestUtils.scala:59)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.org$apache$spark$streaming$kinesis$KinesisTestUtils$$kinesisClient(KinesisTestUtils.scala:58)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.describeStream(KinesisTestUtils.scala:121)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.findNonExistentStreamName(KinesisTestUtils.scala:157)
>>   at org.apache.spark.streaming.kinesis.KinesisTestUtils.createStream(KinesisTestUtils.scala:78)
>>   at org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:45)
>>   at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
>>   at org.apache.spark.streaming.kinesis.KinesisStreamSuite.beforeAll(KinesisStreamSuite.scala:33)
>>
>> FYI
Re: Foundation policy on releases and Spark nightly builds
Sean B.,

Thank you for giving a thorough reply. I will work with Sean O. and see what we can change to make us more in line with the stated policy.

I did some research, and it appears that some time between October [1] and December [2] 2006, this page was modified to include stricter policy surrounding nightly builds. Actually, the original version of the policy page encouraged projects to post nightly builds for the benefit of all developers, just as we have been doing.

If you detect frustration from the Spark community, it's because this type of situation occurs with some regularity. In this case:

(a) A policy exists from ~10 years ago, presumably because some project back then had problematic release-management practices and a policy needed to be created to solve that problem.
(b) The policy is outdated now, and no one is 100% sure why it was created (likely many of the people who helped craft it are no longer involved in the ASF).
(c) The steps for how to change it are unclear, and there isn't clear ownership of the policy document.

I think it's unavoidable given the decentralized organization structure of the ASF, but I just want to be up front about our perspective and why you might sense some frustration.

[1] https://web.archive.org/web/20061020220358/http://www.apache.org/dev/release.html
[2] https://web.archive.org/web/20061231050046/http://www.apache.org/dev/release.html

- Patrick

On Tue, Jul 14, 2015 at 10:09 AM, Sean Busbey bus...@cloudera.com wrote:
> Responses inline, with some liberties on ordering.
>
> On Sun, Jul 12, 2015 at 10:32 PM, Patrick Wendell pwend...@gmail.com wrote:
>> Hey Sean B,
>> Would you mind outlining for me how we go about changing this policy? I think it's outdated and doesn't make much sense. Ideally I'd like to propose a vote to modify the text slightly such that our current behavior is seen as compliant. Specifically:
>> - Who has the authority to change this document?
>
> It's foundation-level policy, so I'd presume the board needs to. Since it's part of our legal position, it might be owned by the legal affairs committee[1]. That would mean they could update it without a board resolution. (legal-discuss@ could tell you for sure.)
>
>> - What concrete steps can I take to change the policy?
>
> The Legal Affairs Committee is reachable either through their mailing list[2] or their issue tracker[3]. Please be sure to read the entire original document; it explains the rationale that has gone into it. You'll need to address the matters raised there.
>
>> - You keep mentioning the incubator@ list; why is this the place for such policy to be discussed or decided on?
>
> It can't be decided on the general@incubator list, but there are already several relevant parties discussing the matter there. You certainly don't *need* to join that conversation, but the participants there have overlap with the folks who can ultimately decide the issue. Thus, it may help avoid having to repeat things.
>
>> - What is a reasonable time frame in which the policy change is likely to be decided?
>
> I am neither a participant on legal affairs nor the board, so I have no idea.
>
>> We've had a few times people from various parts of the ASF come and say we are in violation of a policy. And sometimes other ASF people come and then get in a fight on our mailing list, and there is back and forth, and it turns out there isn't so much a widely followed policy as a doc somewhere that is really old and not actually universally followed. It's difficult for us in such situations to know how to proceed and how much autonomy we as a PMC have to make decisions about our own project.
>
> Please keep in mind that you are also ASF people, as is the entire Spark community (users and all)[4]. Phrasing things in terms of us and them by drawing a distinction on "[they] get in a fight on our mailing list" is not helpful.
>
> Understanding and abiding by ASF legal obligations and policies is the job of each project PMC as a part of their formation by the board[5]. If anyone in your community has questions about what the project can or can not do, then it's the job of the PMC to find out proactively (rather than take an ask-for-forgiveness approach). Where the existing documentation is unclear or where you think it might be out of date, you can often get guidance from general@incubator (since it contains a large number of members and folks from across foundation projects) or comdev[6] (since their charter includes explaining ASF policy). If those resources prove insufficient, matters can be brought up with either legal-discuss@ or board@. If you find out-of-date documentation that is not ASF policy, you can have it removed by notifying the appropriate group (i.e. legal-discuss, comdev, or whomever is hosting it).
>
> [1]: http://apache.org/legal/
> [2]: http://www.apache.org/foundation/mailinglists.html#foundation-legal
> [3]: https://issues.apache.org/jira/browse/LEGAL
Re: Foundation policy on releases and Spark nightly builds
Hey Sean,

One other thing I'd be okay doing is moving the main text about nightly builds to the wiki and just having a header called "Nightly builds" at the end of the downloads page that says: "For developers, Spark maintains nightly builds. More information is available on the [Spark developer Wiki](link)." I think this would preserve discoverability while also placing the information on the wiki, which seems to be the main ask of the policy.

- Patrick

On Sun, Jul 19, 2015 at 2:32 AM, Sean Owen so...@cloudera.com wrote:
> I am going to make an edit to the download page on the web site to start, as that much seems uncontroversial. Proposed change:
>
> Reorder sections to put developer-oriented sections at the bottom, including the info on nightly builds:
> Download Spark
> Link with Spark
> All Releases
> Spark Source Code Management
> Nightly Builds
>
> Change text to emphasize the audience:
> "Packages are built regularly off of Spark's master branch and release branches. These provide *Spark developers* access to the bleeding-edge of Spark master or the most recent fixes not yet incorporated into a maintenance release. *They should not be used by anyone except Spark developers, and may be unstable or have serious bugs. End users should only use official releases above. Please subscribe to dev@spark.apache.org if you are a Spark developer to be aware of issues in nightly builds.* Spark nightly packages are available at:"
>
> On Thu, Jul 16, 2015 at 8:21 AM, Sean Owen so...@cloudera.com wrote:
>> To move this forward, I think one of two things needs to happen:
>> 1. Move this guidance to the wiki. It seems that people gathered here believe that resolves the issue. Done.
>> 2. Put disclaimers on the current downloads page. This may resolve the issue, but then we bring it up on the right mailing list for discussion. It may end up at #1, or may end in a tweak to the policy.
>> I can drive either one. Votes on how to proceed?
Re: [discuss] Removing individual commit messages from the squash commit message
+1 from me too

On Sat, Jul 18, 2015 at 3:32 AM, Ted Yu yuzhih...@gmail.com wrote:
> +1 to removing commit messages.
>
> On Jul 18, 2015, at 1:35 AM, Sean Owen so...@cloudera.com wrote:
>> +1 to removing them. Sometimes there are 50+ commits because people have been merging from master into their branch rather than rebasing.
>>
>> On Sat, Jul 18, 2015 at 8:48 AM, Reynold Xin r...@databricks.com wrote:
>>> I took a look at the commit messages in git log -- it looks like the individual commit messages are not that useful to include, but they do make the merge commit messages more verbose. They are usually just a bunch of extremely concise descriptions of bug fixes, merges, etc.:
>>>
>>> cb3f12d [xxx] add whitespace
>>> 6d874a6 [xxx] support pyspark for yarn-client
>>> 89b01f5 [yyy] Update the unit test to add more cases
>>> 275d252 [yyy] Address the comments
>>> 7cc146d [yyy] Address the comments
>>> 2624723 [yyy] Fix rebase conflict
>>> 45befaa [yyy] Update the unit test
>>> bbc1c9c [yyy] Fix checkpointing doesn't retain driver port issue
>>>
>>> Anybody against removing those from the merge script so the log looks cleaner? If nobody feels strongly about this, we can just create a JIRA to remove them and only keep the author names.
Re: Slight API incompatibility caused by SPARK-4072
One related note here is that we have a Java version of this that is an abstract class; the doc says it exists more or less to allow for binary compatibility (it says it's for Java users, but really Scala could use this also): https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/JavaSparkListener.java#L23

I think it might be reasonable that the Scala trait provides only source compatibility and the Java class provides binary compatibility.

- Patrick

On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote:
> Hey all,
> Just noticed this when some of our tests started to fail. SPARK-4072 added a new method to the SparkListener trait, and even though it has a default implementation, it doesn't seem like that applies retroactively. Namely, if you have an existing, compiled app that has an implementation of SparkListener, that app won't work on 1.5 without a recompile. You'll get something like this:
>
> java.lang.AbstractMethodError
>   at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:62)
>   at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>   at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>   at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
>   at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
>   at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
>   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1235)
>   at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
>
> Now I know that SparkListener is marked as @DeveloperApi, but is this something we should care about? Seems like adding methods to traits is just as backwards-incompatible as adding new methods to Java interfaces.
> -- Marcelo
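The failure mode Marcelo describes can be illustrated with a toy trait (a sketch of the Scala 2.x compilation model, not Spark code):

{code}
// v1 of the trait, which the app was originally compiled against:
//   trait Listener { def onEventA(): Unit = {} }

// v2 adds a method with a default body:
trait Listener {
  def onEventA(): Unit = {}
  def onEventB(): Unit = {}  // new in v2
}

// A class file produced against v1 contains a compiler-generated forwarder
// for onEventA but none for onEventB. When v2 code invokes onEventB on that
// stale bytecode, the JVM throws AbstractMethodError -- the same hazard as
// adding a method to a Java interface.
class AppListener extends Listener {
  override def onEventA(): Unit = println("event A")
}
{code}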
Re: Slight API incompatibility caused by SPARK-4072
Actually the Java one is a concrete class.

On Wed, Jul 15, 2015 at 12:14 PM, Patrick Wendell pwend...@gmail.com wrote:
> One related note here is that we have a Java version of this that is an abstract class; the doc says it exists more or less to allow for binary compatibility (it says it's for Java users, but really Scala could use this also): https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/JavaSparkListener.java#L23
> I think it might be reasonable that the Scala trait provides only source compatibility and the Java class provides binary compatibility.
> - Patrick
>
> On Wed, Jul 15, 2015 at 11:47 AM, Marcelo Vanzin van...@cloudera.com wrote:
>> Hey all,
>> Just noticed this when some of our tests started to fail. SPARK-4072 added a new method to the SparkListener trait, and even though it has a default implementation, it doesn't seem like that applies retroactively. Namely, if you have an existing, compiled app that has an implementation of SparkListener, that app won't work on 1.5 without a recompile. You'll get something like this:
>>
>> java.lang.AbstractMethodError
>>   at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:62)
>>   at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>>   at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
>>   at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:56)
>>   at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
>>   at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:79)
>>   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1235)
>>   at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
>>
>> Now I know that SparkListener is marked as @DeveloperApi, but is this something we should care about? Seems like adding methods to traits is just as backwards-incompatible as adding new methods to Java interfaces.
>> -- Marcelo
Announcing Spark 1.4.1!
Hi All,

I'm happy to announce the Spark 1.4.1 maintenance release. We recommend all users on the 1.4 branch upgrade to this release, which contains several important bug fixes.

Download Spark 1.4.1: http://spark.apache.org/downloads.html
Release notes: http://spark.apache.org/releases/spark-release-1-4-1.html
Comprehensive list of fixes: http://s.apache.org/spark-1.4.1

Thanks to the 85 developers who worked on this release! Please contact me directly for errata in the release notes.

- Patrick
Announcing Spark 1.4.1!
Hi All,

I'm happy to announce the Spark 1.4.1 maintenance release. We recommend all users on the 1.4 branch upgrade to this release, which contains several important bug fixes.

Download Spark 1.4.1: http://spark.apache.org/downloads.html
Release notes: http://spark.apache.org/releases/spark-release-1-4-1.html
Comprehensive list of fixes: http://s.apache.org/spark-1.4.1

Thanks to the 85 developers who worked on this release! Please contact me directly for errata in the release notes.

- Patrick
[jira] [Updated] (SPARK-7920) Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example)
[ https://issues.apache.org/jira/browse/SPARK-7920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7920:
Labels:  (was: spark.tc)

Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example)

Key: SPARK-7920
URL: https://issues.apache.org/jira/browse/SPARK-7920
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 1.3.1, 1.4.0
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.4.0

The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector. ChiSqSelector should just extend Serializable.

Steps:
1. Locate the MLlib ChiSqSelector documentation example.
2. Fix the example by adding an import statement for ChiSqSelector.
3. Attempt to run; notice that it will fail due to ChiSqSelector not being serializable.
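For reference, the fixed example would be along these lines (a sketch; discretizedData is an assumed RDD[LabeledPoint] from earlier in the doc example; the key points are the added import and that the fitted selector must be serializable to be used inside the map closure):

{code}
import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.regression.LabeledPoint

val selector = new ChiSqSelector(numTopFeatures = 50)
val transformer = selector.fit(discretizedData)
// transform runs inside a closure shipped to executors, hence the
// requirement that the fitted model be Serializable.
val filteredData = discretizedData.map { lp =>
  LabeledPoint(lp.label, transformer.transform(lp.features))
}
{code}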
[jira] [Updated] (SPARK-8927) Doc format wrong for some config descriptions
[ https://issues.apache.org/jira/browse/SPARK-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8927:
Labels:  (was: spark.tc)

Doc format wrong for some config descriptions

Key: SPARK-8927
URL: https://issues.apache.org/jira/browse/SPARK-8927
Project: Spark
Issue Type: Documentation
Components: Documentation
Affects Versions: 1.4.0
Reporter: Jon Alter
Assignee: Jon Alter
Priority: Trivial
Fix For: 1.4.2, 1.5.0

In the docs, a couple of descriptions of configuration (under Network) are not inside <td></td> tags and are being displayed immediately under the section title instead of in their row.
[jira] [Updated] (SPARK-7985) Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples.
[ https://issues.apache.org/jira/browse/SPARK-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7985:
Labels:  (was: spark.tc)

Remove fittingParamMap references. Update ML Doc Estimator, Transformer, and Param examples.

Key: SPARK-7985
URL: https://issues.apache.org/jira/browse/SPARK-7985
Project: Spark
Issue Type: Bug
Components: Documentation, ML
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.4.0

Update the ML Doc's Estimator, Transformer, and Param Scala & Java examples to use model.extractParamMap instead of model.fittingParamMap, which no longer exists. Remove all other references to fittingParamMap throughout Spark.
[jira] [Updated] (SPARK-7969) Drop method on Dataframes should handle Column
[ https://issues.apache.org/jira/browse/SPARK-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7969:
Labels:  (was: spark.tc)

Drop method on Dataframes should handle Column

Key: SPARK-7969
URL: https://issues.apache.org/jira/browse/SPARK-7969
Project: Spark
Issue Type: Improvement
Components: PySpark, SQL
Affects Versions: 1.4.0
Reporter: Olivier Girardot
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.4.1, 1.5.0

Currently, the drop method available on DataFrame since Spark 1.4.0 only accepts a column name (as a string); it should also accept a Column as input.
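The requested overload in use (a sketch; df stands for any DataFrame):

{code}
// Existing behavior: drop by column name
val df2 = df.drop("colName")
// Requested: drop by Column expression
val df3 = df.drop(df("colName"))
{code}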
[jira] [Updated] (SPARK-7830) ML doc cleanup: logreg, classification link
[ https://issues.apache.org/jira/browse/SPARK-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7830:
Labels:  (was: spark.tc)

ML doc cleanup: logreg, classification link

Key: SPARK-7830
URL: https://issues.apache.org/jira/browse/SPARK-7830
Project: Spark
Issue Type: Improvement
Components: Documentation, MLlib
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Trivial
Fix For: 1.4.0

Add logistic regression to the list of Multiclass Classification Supported Methods in the MLlib Classification and Regression documentation, and fix the related broken link.
[jira] [Updated] (SPARK-8343) Improve the Spark Streaming Guides
[ https://issues.apache.org/jira/browse/SPARK-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8343:
Labels:  (was: spark.tc)

Improve the Spark Streaming Guides

Key: SPARK-8343
URL: https://issues.apache.org/jira/browse/SPARK-8343
Project: Spark
Issue Type: Improvement
Components: Documentation, Streaming
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.4.1, 1.5.0

Improve the Spark Streaming Guides by fixing broken links, rewording confusing sections, fixing typos, adding missing words, etc.
[jira] [Updated] (SPARK-7977) Disallow println
[ https://issues.apache.org/jira/browse/SPARK-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7977:
Labels: starter  (was: spark.tc starter)

Disallow println

Key: SPARK-7977
URL: https://issues.apache.org/jira/browse/SPARK-7977
Project: Spark
Issue Type: Sub-task
Components: Project Infra
Reporter: Reynold Xin
Assignee: Jon Alter
Labels: starter
Fix For: 1.5.0

Very often we see pull requests that added println for debugging, but the author forgot to remove it before code review. We can use the regex checker to disallow println. For legitimate uses of println, we can then disable the rule where they are used.

Add to the scalastyle-config.xml file:

{code}
<check customId="println" level="error" class="org.scalastyle.scalariform.TokenChecker" enabled="true">
  <parameters><parameter name="regex">^println$</parameter></parameters>
  <customMessage><![CDATA[Are you sure you want to println? If yes, wrap the code block with
// scalastyle:off println
println(...)
// scalastyle:on println]]></customMessage>
</check>
{code}
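Whitelisting a legitimate use then looks like this (directly following the custom message above; the printed string is illustrative):

{code}
// scalastyle:off println
println("Usage: SparkPi [slices]")  // intentional console output
// scalastyle:on println
{code}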
[jira] [Updated] (SPARK-8570) Improve MLlib Local Matrix Documentation.
[ https://issues.apache.org/jira/browse/SPARK-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-8570:
Labels:  (was: spark.tc)

Improve MLlib Local Matrix Documentation.

Key: SPARK-8570
URL: https://issues.apache.org/jira/browse/SPARK-8570
Project: Spark
Issue Type: Improvement
Components: Documentation, MLlib
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.5.0

Update the MLlib Data Types Local Matrix documentation as follows:
- Include information on sparse matrices.
- Add sparse matrix examples to the existing Scala and Java examples.
- Add Python examples for both dense and sparse matrices (currently no Python examples exist for the Local Matrix section).
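The sort of Scala example being requested might look like this (a sketch using MLlib's existing Matrices factory methods; the values are illustrative):

{code}
import org.apache.spark.mllib.linalg.{Matrix, Matrices}

// Dense 3x2 matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0)), stored column-major
val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))

// Sparse 3x2 matrix ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0)) in CSC format;
// arguments are (numRows, numCols, colPtrs, rowIndices, values)
val sm: Matrix = Matrices.sparse(3, 2, Array(0, 1, 3), Array(0, 2, 1), Array(9.0, 6.0, 8.0))
{code}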
[jira] [Updated] (SPARK-7883) Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.
[ https://issues.apache.org/jira/browse/SPARK-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7883:
Labels:  (was: spark.tc)

Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation.

Key: SPARK-7883
URL: https://issues.apache.org/jira/browse/SPARK-7883
Project: Spark
Issue Type: Bug
Components: Documentation, MLlib
Affects Versions: 1.0.2, 1.1.1, 1.2.2, 1.3.1, 1.4.0
Reporter: Mike Dusenberry
Assignee: Mike Dusenberry
Priority: Trivial
Fix For: 1.0.3, 1.1.2, 1.2.3, 1.3.2, 1.4.0

The trainImplicit Scala example near the end of the MLlib Collaborative Filtering documentation refers to an ALS.trainImplicit function signature that does not exist. Rather than add an extra function, let's just fix the example. Currently, the example refers to a function that would have the following signature:

{code}
def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, alpha: Double): MatrixFactorizationModel
{code}

Instead, let's change the example to refer to this function, which does exist (notice the addition of the lambda parameter):

{code}
def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: Double, alpha: Double): MatrixFactorizationModel
{code}
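The corrected example call would then be along these lines (illustrative hyperparameter values; ratings is the RDD[Rating] from the doc example):

{code}
// Matches the five-argument signature above
val model = ALS.trainImplicit(ratings, rank = 10, iterations = 10, lambda = 0.01, alpha = 0.01)
{code}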
[jira] [Updated] (SPARK-7426) spark.ml AttributeFactory.fromStructField should allow other NumericTypes
[ https://issues.apache.org/jira/browse/SPARK-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-7426:
Labels:  (was: spark.tc)

spark.ml AttributeFactory.fromStructField should allow other NumericTypes

Key: SPARK-7426
URL: https://issues.apache.org/jira/browse/SPARK-7426
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: Joseph K. Bradley
Assignee: Mike Dusenberry
Priority: Minor
Fix For: 1.5.0

It currently only supports DoubleType, but it should support others, at least for fromStructField (importing into the ML attribute format, rather than exporting).
[jira] [Updated] (SPARK-8639) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md
[ https://issues.apache.org/jira/browse/SPARK-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8639: --- Labels: (was: spark.tc) Instructions for executing jekyll in docs/README.md could be slightly more clear, typo in docs/api.md - Key: SPARK-8639 URL: https://issues.apache.org/jira/browse/SPARK-8639 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Rosstin Murphy Assignee: Rosstin Murphy Priority: Trivial Fix For: 1.4.1, 1.5.0 In docs/README.md, the text around line 31 states: "Execute 'jekyll' from the 'docs/' directory. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files." It might be clearer if we said: "Execute 'jekyll build' from the 'docs/' directory to compile the site. Compiling the site with Jekyll will create a directory called '_site' containing index.html as well as the rest of the compiled files." In docs/api.md, "Here you can API docs for Spark and its submodules." should be something like: "Here you can read API docs for Spark and its submodules." -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7357) Improving HBaseTest example
[ https://issues.apache.org/jira/browse/SPARK-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7357: --- Labels: (was: spark.tc) Improving HBaseTest example --- Key: SPARK-7357 URL: https://issues.apache.org/jira/browse/SPARK-7357 Project: Spark Issue Type: Improvement Components: Examples Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Minor Fix For: 1.5.0 Original Estimate: 2m Remaining Estimate: 2m Minor improvement to the HBaseTest example: when HBase-related configurations (e.g. the zookeeper quorum, zookeeper client port, or zookeeper.znode.parent) are not set to the default (localhost:2181), the connection to ZooKeeper can hang, as shown in the following log:
{code}
15/03/26 18:31:20 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxx.xxx.xxx:2181 sessionTimeout=9 watcher=hconnection-0x322a4437, quorum=xxx.xxx.xxx:2181, baseZNode=/hbase
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Opening socket connection to server 9.30.94.121:2181. Will not attempt to authenticate using SASL (unknown error)
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Socket connection established to xxx.xxx.xxx/9.30.94.121:2181, initiating session
15/03/26 18:31:21 INFO zookeeper.ClientCnxn: Session establishment complete on server xxx.xxx.xxx/9.30.94.121:2181, sessionid = 0x14c53cd311e004b, negotiated timeout = 4
15/03/26 18:31:21 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
{code}
This happens because hbase-site.xml is not on the Spark classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
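A hedged sketch of the kind of explicit configuration that avoids the hang when hbase-site.xml is absent (hostnames and values are illustrative):
{code}
import org.apache.hadoop.hbase.HBaseConfiguration

// Set the ZooKeeper connection details explicitly rather than relying on
// hbase-site.xml being discovered on the classpath.
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
hbaseConf.set("zookeeper.znode.parent", "/hbase")
{code}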
[jira] [Updated] (SPARK-8746) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest)
[ https://issues.apache.org/jira/browse/SPARK-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8746: --- Labels: documentation test (was: documentation spark.tc test) Need to update download link for Hive 0.13.1 jars (HiveComparisonTest) -- Key: SPARK-8746 URL: https://issues.apache.org/jira/browse/SPARK-8746 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Christian Kadner Assignee: Christian Kadner Priority: Trivial Labels: documentation, test Fix For: 1.4.1, 1.5.0 Original Estimate: 1h Remaining Estimate: 1h The Spark SQL documentation (https://github.com/apache/spark/tree/master/sql) describes how to generate golden answer files for new hive comparison test cases. However the download link for the Hive 0.13.1 jars points to https://hive.apache.org/downloads.html but none of the linked mirror sites still has the 0.13.1 version. We need to update the link to https://archive.apache.org/dist/hive/hive-0.13.1/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6485) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark
[ https://issues.apache.org/jira/browse/SPARK-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6485: --- Labels: (was: spark.tc) Add CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark -- Key: SPARK-6485 URL: https://issues.apache.org/jira/browse/SPARK-6485 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Xiangrui Meng We should add APIs for CoordinateMatrix/RowMatrix/IndexedRowMatrix in PySpark. Internally, we can use DataFrames for serialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
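For context, a short sketch of the existing Scala distributed-matrix APIs that the proposed PySpark wrappers would mirror (sc is an assumed SparkContext; the data is illustrative):
{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRow, IndexedRowMatrix, MatrixEntry, RowMatrix}

// A RowMatrix is an RDD of rows without meaningful row indices.
val rowMat = new RowMatrix(sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0))))

// An IndexedRowMatrix attaches a long row index to each row.
val idxMat = new IndexedRowMatrix(sc.parallelize(Seq(IndexedRow(0L, Vectors.dense(1.0, 2.0)))))

// A CoordinateMatrix is an RDD of (i, j, value) entries.
val coordMat = new CoordinateMatrix(sc.parallelize(Seq(MatrixEntry(0, 0, 1.0), MatrixEntry(1, 1, 2.0))))
{code}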
[jira] [Updated] (SPARK-7744) Distributed matrix section in MLlib Data Types documentation should be reordered.
[ https://issues.apache.org/jira/browse/SPARK-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7744: --- Labels: (was: spark.tc) Distributed matrix section in MLlib Data Types documentation should be reordered. - Key: SPARK-7744 URL: https://issues.apache.org/jira/browse/SPARK-7744 Project: Spark Issue Type: Improvement Components: Documentation, MLlib Reporter: Mike Dusenberry Assignee: Mike Dusenberry Priority: Minor Fix For: 1.3.2, 1.4.0 The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the basic distributed matrix. This will improve the comprehensibility of the Distributed matrix section, especially for new readers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-6785) DateUtils can not handle date before 1970/01/01 correctly
[ https://issues.apache.org/jira/browse/SPARK-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6785: --- Labels: (was: spark.tc) DateUtils can not handle date before 1970/01/01 correctly - Key: SPARK-6785 URL: https://issues.apache.org/jira/browse/SPARK-6785 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Christian Kadner Fix For: 1.5.0
{code}
scala> val d = new Date(100)
d: java.sql.Date = 1969-12-31

scala> DateUtils.toJavaDate(DateUtils.fromJavaDate(d))
res1: java.sql.Date = 1970-01-01
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
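One plausible mechanism for this off-by-one, offered as an assumption rather than the confirmed root cause: converting milliseconds to days with truncating integer division rounds pre-epoch (negative) values toward zero, so a flooring division is required:
{code}
val millisPerDay = 24L * 60 * 60 * 1000

// -1 ms is 1969-12-31 23:59:59.999 UTC, i.e. day -1, not day 0.
def daysTruncating(millis: Long): Long = millis / millisPerDay              // -1 / 86400000 == 0  (wrong)
def daysFlooring(millis: Long): Long = Math.floorDiv(millis, millisPerDay)  // floorDiv(-1, 86400000) == -1 (right)
{code}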
[jira] [Updated] (SPARK-5562) LDA should handle empty documents
[ https://issues.apache.org/jira/browse/SPARK-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5562: --- Labels: starter (was: spark.tc starter) LDA should handle empty documents - Key: SPARK-5562 URL: https://issues.apache.org/jira/browse/SPARK-5562 Project: Spark Issue Type: Test Components: MLlib Affects Versions: 1.3.0 Reporter: Joseph K. Bradley Assignee: Alok Singh Priority: Minor Labels: starter Fix For: 1.5.0 Original Estimate: 96h Remaining Estimate: 96h Latent Dirichlet Allocation (LDA) could easily be given empty documents when people select a small vocabulary. We should check to make sure it is robust to empty documents. This will hopefully take the form of a unit test, but may require modifying the LDA implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
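A sketch of what such a unit test could look like (names and values are assumed; sc is a test SparkContext):
{code}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

val vocabSize = 3
val docs = sc.parallelize(Seq(
  (0L, Vectors.sparse(vocabSize, Seq())),   // empty document
  (1L, Vectors.dense(1.0, 0.0, 2.0)),
  (2L, Vectors.dense(0.0, 3.0, 1.0))
))
// Training should not throw on the empty document.
val model = new LDA().setK(2).run(docs)
assert(model.vocabSize == vocabSize)
{code}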
[jira] [Updated] (SPARK-7265) Improving documentation for Spark SQL Hive support
[ https://issues.apache.org/jira/browse/SPARK-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7265: --- Labels: (was: spark.tc) Improving documentation for Spark SQL Hive support --- Key: SPARK-7265 URL: https://issues.apache.org/jira/browse/SPARK-7265 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 1.3.1 Reporter: Jihong MA Assignee: Jihong MA Priority: Trivial Fix For: 1.5.0 Miscellaneous documentation improvements for Spark SQL Hive support and YARN cluster deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-2859) Update url of Kryo project in related docs
[ https://issues.apache.org/jira/browse/SPARK-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2859: --- Labels: (was: spark.tc) Update url of Kryo project in related docs -- Key: SPARK-2859 URL: https://issues.apache.org/jira/browse/SPARK-2859 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Guancheng Chen Assignee: Guancheng Chen Priority: Trivial Fix For: 1.0.3, 1.1.0 Kryo project has been migrated from googlecode to github, hence we need to update its URL in related docs such as tuning.md. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1403) Spark on Mesos does not set Thread's context class loader
[ https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1403. Resolution: Fixed Target Version/s: (was: 1.5.0) Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible about the environment? Thanks! Spark on Mesos does not set Thread's context class loader - Key: SPARK-1403 URL: https://issues.apache.org/jira/browse/SPARK-1403 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.3.0, 1.4.0 Environment: ubuntu 12.04 on vagrant Reporter: Bharath Bhushan Priority: Blocker Fix For: 1.0.0 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark executor on mesos slave throws a java.lang.ClassNotFoundException for org.apache.spark.serializer.JavaSerializer. The lengthy discussion is here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
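For background, the usual remedy for this class of executor-side ClassNotFoundException is a one-liner; a hedged sketch, not necessarily the exact patch that fixed this issue:
{code}
// Point the executor thread's context class loader at the loader that
// loaded the Spark classes, before any deserialization happens.
Thread.currentThread().setContextClassLoader(getClass.getClassLoader)
{code}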
[jira] [Comment Edited] (SPARK-1403) Spark on Mesos does not set Thread's context class loader
[ https://issues.apache.org/jira/browse/SPARK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625739#comment-14625739 ] Patrick Wendell edited comment on SPARK-1403 at 7/14/15 2:59 AM: - Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible about the environment? Thanks! was (Author: pwendell): Hey All, This issue should remain fixed. [~mandoskippy] I think you are just running into a different issue that is also in some way related to classloading. Can you open a new JIRA for your issue, paste in the stack trace and give as much information as possible without the environment? Thanks! Spark on Mesos does not set Thread's context class loader - Key: SPARK-1403 URL: https://issues.apache.org/jira/browse/SPARK-1403 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0, 1.3.0, 1.4.0 Environment: ubuntu 12.04 on vagrant Reporter: Bharath Bhushan Priority: Blocker Fix For: 1.0.0 I can run spark 0.9.0 on mesos but not spark 1.0.0. This is because the spark executor on mesos slave throws a java.lang.ClassNotFoundException for org.apache.spark.serializer.JavaSerializer. The lengthy discussion is here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html#a3513 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC4)
This vote passes with 14 +1 (7 binding) votes and no 0 or -1 votes. +1 (14): Patrick Wendell Reynold Xin Sean Owen Burak Yavuz Mark Hamstra Michael Armbrust Andrew Or York, Brennon Krishna Sankar Luciano Resende Holden Karau Tom Graves Denny Lee Sean McNamara - Patrick On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1125/ [published as version: 1.4.1-rc4] https://repository.apache.org/content/repositories/orgapachespark-1126/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Sunday, July 12, at 06:55 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Foundation policy on releases and Spark nightly builds
Thanks Sean O. I was thinking something like NOTE: Nightly builds are meant for development and testing purposes. They do not go through Apache's release auditing process and are not official releases. - Patrick On Sun, Jul 12, 2015 at 3:39 PM, Sean Owen so...@cloudera.com wrote: (This sounds pretty good to me. Mark it developers-only, not formally tested by the community, etc.) On Sun, Jul 12, 2015 at 7:50 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean B., Thanks for bringing this to our attention. I think putting them on the developer wiki would substantially decrease visibility in a way that is not beneficial to the project - this feature was specifically requested by developers from other projects that integrate with Spark. If the concern underlying that policy is that snapshot builds could be misconstrued as formal releases, I think it would work to put a very clear disclaimer explaining the difference directly adjacent to the link. That's arguably more explicit than just moving the same text to a different page. The formal policy asks us not to include links that encourage non-developers to download the builds. Stating clearly that the audience for those links is developers, in my interpretation that would satisfy the letter and spirit of this policy. - Patrick - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC4)
I think we can close this vote soon. Any addition votes/testing would be much appreciated! On Fri, Jul 10, 2015 at 11:30 AM, Sean McNamara sean.mcnam...@webtrends.com wrote: +1 Sean On Jul 8, 2015, at 11:55 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1125/ [published as version: 1.4.1-rc4] https://repository.apache.org/content/repositories/orgapachespark-1126/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Sunday, July 12, at 06:55 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624086#comment-14624086 ] Patrick Wendell commented on SPARK-2089: Yeah - we can open it again later if someone who maintains this code wants to work on this feature. I just want this JIRA to reflect the current status (i.e. for 5 versions there hasn't been any action in Spark), which is that it is not actively being fixed, and to make sure the documentation correctly reflects what we have now, to discourage the use of a feature that does not work. With YARN, preferredNodeLocalityData isn't honored --- Key: SPARK-2089 URL: https://issues.apache.org/jira/browse/SPARK-2089 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical When running in YARN cluster mode, apps can pass preferred locality data when constructing a Spark context that will dictate where to request executor containers. This is currently broken because of a race condition. The Spark-YARN code runs the user class and waits for it to start up a SparkContext. During its initialization, the SparkContext will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code that . The Spark-Yarn code then immediately fetches the preferredNodeLocationData from the SparkContext and uses it to start requesting containers. But in the SparkContext constructor that takes the preferredNodeLocationData, setting preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN code comes around quickly enough after being notified, the data that's fetched is the empty unset version. This occurred during all of my runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Foundation policy on releases and Spark nightly builds
Hey Sean B., Thanks for bringing this to our attention. I think putting them on the developer wiki would substantially decrease visibility in a way that is not beneficial to the project - this feature was specifically requested by developers from other projects that integrate with Spark. If the concern underlying that policy is that snapshot builds could be misconstrued as formal releases, I think it would work to put a very clear disclaimer explaining the difference directly adjacent to the link. That's arguably more explicit than just moving the same text to a different page. The formal policy asks us not to include links that encourage non-developers to download the builds. Stating clearly that the audience for those links is developers, in my interpretation that would satisfy the letter and spirit of this policy. - Patrick On Sat, Jul 11, 2015 at 11:53 AM, Sean Owen so...@cloudera.com wrote: From a developer perspective, I also find it surprising to hear that nightly builds should be hidden from non-developer end users. In an age of Github, what on earth is the problem with distributing the content of master? However I do understand why this exists. To the extent the ASF provides any value, it is at least a legal framework for defining what it means for you and I to give software to a bunch of other people. Software artifacts released according to an ASF process becomes something the ASF can take responsibility for as an entity. Nightly builds are not. It might matter to the committers if, say, somebody commits a serious data loss bug. You don't want to be on the hook individually for putting that into end-user hands. More practically, I think this exists to prevent some projects from lazily depending on unofficial nightly builds as pseudo-releases for long periods of time. End users may come to perceive them as official sanctioned releases when they aren't. That's not the case here of course. I think nightlies aren't for end-users anyway, and I think developers who care would know how to get nightlies anyway. There's little cost to moving this info to the wiki, so I'd do it. On Sat, Jul 11, 2015 at 4:29 PM, Reynold Xin r...@databricks.com wrote: I don't get this rule. It is arbitrary, and does not seem like something that should be enforced at the foundation level. By this reasoning, are we not allowed to list source code management on the project public page as well? The download page clearly states the nightly builds are bleeding-edge. Note that technically we did not violate any rules, since the ones we showed were not nightly builds by the foundation's definition: Nightly Builds are simply built from the Subversion trunk, usually once a day.. Spark nightly artifacts were built from git, not svn trunk. :) (joking). On Sat, Jul 11, 2015 at 7:44 AM, Sean Busbey bus...@cloudera.com wrote: That would be great. A note on that page that it's meant for the use of folks working on the project with a link to your get involved howto would be nice additional context. -- Sean On Jul 11, 2015 6:18 AM, Sean Owen so...@cloudera.com wrote: I suggest we move this info to the developer wiki, to keep it out from the place all and users look for downloads. What do you think about that Sean B? On Sat, Jul 11, 2015 at 5:34 AM, Sean Busbey bus...@cloudera.com wrote: Hi Folks! I noticed that Spark website's download page lists nightly builds and instructions for accessing SNAPSHOT maven artifacts[1]. 
The ASF policy on releases expressly forbids this kind of publishing outside of the dev@spark community[2]. If you'd like to discuss having the policy updated (including expanding the definition of in the development community), please contribute to the discussion on general@incubator[3] after removing the offending items. [1]: http://spark.apache.org/downloads.html#nightly-packages-and-artifacts [2]: http://www.apache.org/dev/release.html#what [3]: http://s.apache.org/XFP -- Sean - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
SparkHub: a new community site for Apache Spark
Hi All, Today, I'm happy to announce SparkHub (http://sparkhub.databricks.com), a service for the Apache Spark community to easily find the most relevant Spark resources on the web. SparkHub is a curated list of Spark news, videos and talks, package releases, upcoming events around the world, and a Spark Meetup directory to help you find a meetup close to you. We will continue to expand the site in the coming months and add more content. I hope SparkHub can help you find Spark related information faster and more easily than is currently possible. Everything is sourced from the Spark community, and we welcome input from you as well! - Patrick - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
[jira] [Created] (SPARK-8957) Backport Hive 1.X support to Branch 1.4
Patrick Wendell created SPARK-8957: -- Summary: Backport Hive 1.X support to Branch 1.4 Key: SPARK-8957 URL: https://issues.apache.org/jira/browse/SPARK-8957 Project: Spark Issue Type: Improvement Components: SQL Reporter: Patrick Wendell Assignee: Michael Armbrust We almost never do feature backports. But I think it would be really useful to backport support for newer Hive versions to the 1.4 branch, for the following reasons: 1. It blocks a large number of users from using Hive support. 2. It's a relatively small set of patches, since most of the heavy lifting was done in Spark 1.4.0's classloader refactoring. 3. Some distributions have already done this, with success. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8957) Backport Hive 1.X support to Branch 1.4
[ https://issues.apache.org/jira/browse/SPARK-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8957: --- Priority: Critical (was: Major) Backport Hive 1.X support to Branch 1.4 --- Key: SPARK-8957 URL: https://issues.apache.org/jira/browse/SPARK-8957 Project: Spark Issue Type: Improvement Components: SQL Reporter: Patrick Wendell Assignee: Michael Armbrust Priority: Critical We almost never do feature backports. But I think it would be really useful to backport support for newer Hive versions to the 1.4 branch, for the following reasons: 1. It blocks a large number of users from using Hive support. 2. It's a relatively small set of patches, since most of the heavy lifting was done in Spark 1.4.0's classloader refactoring. 3. Some distributions have already done this, with success. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC4)
+1 On Wed, Jul 8, 2015 at 10:55 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1125/ [published as version: 1.4.1-rc4] https://repository.apache.org/content/repositories/orgapachespark-1126/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Sunday, July 12, at 06:55 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored
[ https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620051#comment-14620051 ] Patrick Wendell commented on SPARK-2089: Yeah - I think let's get SPARK-4352 merged and then just close this as won't fix and add a JIRA to document its non-working-ness. This hasn't worked since before Spark 1.0, and SPARK-4352 is just a strictly better solution than this. With YARN, preferredNodeLocalityData isn't honored --- Key: SPARK-2089 URL: https://issues.apache.org/jira/browse/SPARK-2089 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.0.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical When running in YARN cluster mode, apps can pass preferred locality data when constructing a Spark context that will dictate where to request executor containers. This is currently broken because of a race condition. The Spark-YARN code runs the user class and waits for it to start up a SparkContext. During its initialization, the SparkContext will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code that . The Spark-Yarn code then immediately fetches the preferredNodeLocationData from the SparkContext and uses it to start requesting containers. But in the SparkContext constructor that takes the preferredNodeLocationData, setting preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN code comes around quickly enough after being notified, the data that's fetched is the empty unset version. This occurred during all of my runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8949) Remove references to preferredNodeLocalityData in javadoc and print warning when used
Patrick Wendell created SPARK-8949: -- Summary: Remove references to preferredNodeLocalityData in javadoc and print warning when used Key: SPARK-8949 URL: https://issues.apache.org/jira/browse/SPARK-8949 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Reporter: Patrick Wendell Priority: Blocker The SparkContext constructor that takes preferredNodeLocalityData has not worked since before Spark 1.0. Also, the feature in SPARK-4352 is strictly better than a correct implementation of that feature. We should remove any documentation references to that feature and print a warning when it is used saying it doesn't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
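A hedged sketch of the proposed warning (wording and placement are assumed; logWarning comes from Spark's Logging trait):
{code}
// Inside the deprecated SparkContext constructor path:
if (preferredNodeLocationData.nonEmpty) {
  logWarning("Passing preferredNodeLocationData to SparkContext has no effect. " +
    "See SPARK-8949 and the replacement feature in SPARK-4352.")
}
{code}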
Re: [VOTE] Release Apache Spark 1.4.1 (RC3)
Yeah - we can fix the docs separately from the release. - Patrick On Wed, Jul 8, 2015 at 10:03 AM, Mark Hamstra m...@clearstorydata.com wrote: HiveSparkSubmitSuite is fine for me, but I do see the same issue with DataFrameStatSuite -- OSX 10.10.4, java 1.7.0_75, -Phive -Phive-thriftserver -Phadoop-2.4 -Pyarn On Wed, Jul 8, 2015 at 4:18 AM, Sean Owen so...@cloudera.com wrote: The POM issue is resolved and the build succeeds. The license and sigs still work. The tests pass for me with -Pyarn -Phadoop-2.6, with the following two exceptions. Is anyone else seeing these? this is consistent on Ubuntu 14 with Java 7/8: DataFrameStatSuite: ... - special crosstab elements (., '', null, ``) *** FAILED *** java.lang.NullPointerException: at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$4.apply(StatFunctions.scala:131) at org.apache.spark.sql.execution.stat.StatFunctions$$anonfun$4.apply(StatFunctions.scala:121) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.Map$Map4.foreach(Map.scala:181) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.execution.stat.StatFunctions$.crossTabulate(StatFunctions.scala:121) at org.apache.spark.sql.DataFrameStatFunctions.crosstab(DataFrameStatFunctions.scala:94) at org.apache.spark.sql.DataFrameStatSuite$$anonfun$5.apply$mcV$sp(DataFrameStatSuite.scala:97) ... HiveSparkSubmitSuite: - SPARK-8368: includes jars passed in through --jars *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (HiveSparkSubmitSuite.scala:92) - SPARK-8020: set sql conf in spark conf *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (HiveSparkSubmitSuite.scala:92) - SPARK-8489: MissingRequirementError during reflection *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (HiveSparkSubmitSuite.scala:92) On Tue, Jul 7, 2015 at 8:06 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc3 (commit 3e8ae38): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 3e8ae38944f13895daf328555c1ad22cd590b089 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1123/ [published as version: 1.4.1-rc3] https://repository.apache.org/content/repositories/orgapachespark-1124/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Friday, July 10, at 20:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... 
To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf
[ https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619681#comment-14619681 ] Patrick Wendell edited comment on SPARK-8768 at 7/9/15 1:04 AM: So it turns out that build/mvn still uses the system maven even if it downloads the newer version (this was the original design). Is it possible that is why it's breaking? It might be nice to modify that script to have a flag like --force that will always use the downloaded maven. was (Author: pwendell): So it turns out that build/mvn still uses the system maven even if it downloads the newer version (this was the original design). Is it possible that is why it's breaking? SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf - Key: SPARK-8768 URL: https://issues.apache.org/jira/browse/SPARK-8768 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Reporter: Josh Rosen Priority: Blocker The end-to-end SparkSubmitSuite tests (launch simple application with spark-submit, include jars passed in through --jars, and include jars passed in through --packages) are currently failing for the pre-YARN Hadoop builds. I managed to reproduce one of the Jenkins failures locally: {code} build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite -Dtest=none {code} Here's the output from unit-tests.log: {code} = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple application with spark-submit' = 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Class path contains multiple SLF4J bindings. 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joshrosen); users with modify permissions: Set(joshrosen) 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver] 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass1(Native Method) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449
[jira] [Commented] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf
[ https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619687#comment-14619687 ] Patrick Wendell commented on SPARK-8768: I created SPARK-8933 to track improvements to our maven script. SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf - Key: SPARK-8768 URL: https://issues.apache.org/jira/browse/SPARK-8768 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Reporter: Josh Rosen Priority: Blocker The end-to-end SparkSubmitSuite tests (launch simple application with spark-submit, include jars passed in through --jars, and include jars passed in through --packages) are currently failing for the pre-YARN Hadoop builds. I managed to reproduce one of the Jenkins failures locally: {code} build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite -Dtest=none {code} Here's the output from unit-tests.log: {code} = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple application with spark-submit' = 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Class path contains multiple SLF4J bindings. 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joshrosen); users with modify permissions: Set(joshrosen) 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver] 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass1(Native Method) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:355) 15/07/01 13:40:00.010 redirect
[jira] [Created] (SPARK-8933) Provide a --force flag to build/mvn that always uses downloaded maven
Patrick Wendell created SPARK-8933: -- Summary: Provide a --force flag to build/mvn that always uses downloaded maven Key: SPARK-8933 URL: https://issues.apache.org/jira/browse/SPARK-8933 Project: Spark Issue Type: Improvement Components: Build Reporter: Patrick Wendell Assignee: Brennon York I noticed the other day that build/mvn will still use the system maven if an mvn binary is installed. I think this was intentional, to support just using zinc while keeping the system maven (and to match the semantics of sbt/sbt). It would be nice to have a flag that forces it to use the downloaded maven. I was thinking it could have a --force flag, and then it could swallow that flag and not pass it on to maven. This is useful in some cases like our test runners, where we want to ensure that a specific version of maven is used. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC3)
Hey All, The issue that Josh pointed out is not just a test failure, it's an issue with an important bug fix that was not correctly back-ported into the 1.4 branch. Unfortunately the overall state of the 1.4 branch tests on Jenkins was not in great shape so this was missed earlier on. Given that this is fixed now, I have prepared another RC and am leaning towards restarting the vote. If anyone feels strongly one way or the other let me know, otherwise I'll restart it in a few hours. I figured since this will likely finalize over the weekend anyways, it's not so bad to wait 1 additional day in order to get that fix. - Patrick On Wed, Jul 8, 2015 at 12:00 PM, Josh Rosen rosenvi...@gmail.com wrote: I've filed https://issues.apache.org/jira/browse/SPARK-8903 to fix the DataFrameStatSuite test failure. The problem turned out to be caused by a mistake made while resolving a merge-conflict when backporting that patch to branch-1.4. I've submitted https://github.com/apache/spark/pull/7295 to fix this issue. On Wed, Jul 8, 2015 at 11:30 AM, Sean Owen so...@cloudera.com wrote: I see, but shouldn't this test not be run when Hive isn't in the build? On Wed, Jul 8, 2015 at 7:13 PM, Andrew Or and...@databricks.com wrote: @Sean You actually need to run HiveSparkSubmitSuite with `-Phive` and `-Phive-thriftserver`. The MissingRequirementsError is just complaining that it can't find the right classes. The other one (DataFrameStatSuite) is a little more concerning. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-8768) SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf
[ https://issues.apache.org/jira/browse/SPARK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619681#comment-14619681 ] Patrick Wendell commented on SPARK-8768: So it turns out that build/mvn still uses the system maven even if it downloads the newer version (this was the original design). Is it possible that is why it's breaking? SparkSubmitSuite fails on Hadoop 1.x builds due to java.lang.VerifyError in Akka Protobuf - Key: SPARK-8768 URL: https://issues.apache.org/jira/browse/SPARK-8768 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 1.5.0 Reporter: Josh Rosen Priority: Blocker The end-to-end SparkSubmitSuite tests (launch simple application with spark-submit, include jars passed in through --jars, and include jars passed in through --packages) are currently failing for the pre-YARN Hadoop builds. I managed to reproduce one of the Jenkins failures locally: {code} build/mvn -Phadoop-1 -Dhadoop.version=1.2.1 -Phive -Phive-thriftserver -Pkinesis-asl test -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite -Dtest=none {code} Here's the output from unit-tests.log: {code} = TEST OUTPUT FOR o.a.s.deploy.SparkSubmitSuite: 'launch simple application with spark-submit' = 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Class path contains multiple SLF4J bindings. 15/07/01 13:39:58.964 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/Documents/spark-2/assembly/target/scala-2.10/spark-assembly-1.5.0-SNAPSHOT-hadoop1.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Found binding in [jar:file:/Users/joshrosen/.m2/repository/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] 15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
15/07/01 13:39:58.965 redirect stderr for command ./bin/spark-submit INFO Utils: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 15/07/01 13:39:58.966 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:58 INFO SparkContext: Running Spark version 1.5.0-SNAPSHOT 15/07/01 13:39:59.334 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing view acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: Changing modify acls to: joshrosen 15/07/01 13:39:59.335 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joshrosen); users with modify permissions: Set(joshrosen) 15/07/01 13:39:59.898 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Slf4jLogger: Slf4jLogger started 15/07/01 13:39:59.934 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:39:59 INFO Remoting: Starting remoting 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: 15/07/01 13:40:00 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-5] shutting down ActorSystem [sparkDriver] 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils: java.lang.VerifyError: class akka.remote.WireFormats$AkkaControlMessage overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass1(Native Method) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.lang.ClassLoader.defineClass(ClassLoader.java:800) 15/07/01 13:40:00.009 redirect stderr for command ./bin/spark-submit INFO Utils:at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader.access$100(URLClassLoader.java:71) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark-submit INFO Utils:at java.net.URLClassLoader$1.run(URLClassLoader.java:361) 15/07/01 13:40:00.010 redirect stderr for command ./bin/spark
[VOTE] Release Apache Spark 1.4.1 (RC4)
Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc4 (commit dbaa5c2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= dbaa5c294eb565f84d7032e387e4b8c1a56e4cd2 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1125/ [published as version: 1.4.1-rc4] https://repository.apache.org/content/repositories/orgapachespark-1126/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc4-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Sunday, July 12, at 06:55 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC3)
This vote is cancelled in favor of RC4. - Patrick On Tue, Jul 7, 2015 at 12:06 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc3 (commit 3e8ae38): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 3e8ae38944f13895daf328555c1ad22cd590b089 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1123/ [published as version: 1.4.1-rc3] https://repository.apache.org/content/repositories/orgapachespark-1124/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc3-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Friday, July 10, at 20:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.4.1 (RC2)
Hey All, This vote is cancelled in favor of RC3. - Patrick On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= 07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Updated] (SPARK-6805) ML Pipeline API in SparkR
[ https://issues.apache.org/jira/browse/SPARK-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-6805: --- Priority: Critical (was: Major) ML Pipeline API in SparkR - Key: SPARK-6805 URL: https://issues.apache.org/jira/browse/SPARK-6805 Project: Spark Issue Type: Umbrella Components: ML, SparkR Reporter: Xiangrui Meng Priority: Critical SparkR was merged. So let's have this umbrella JIRA for the ML pipeline API in SparkR. The implementation should be similar to the pipeline API implementation in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Can not build master
Hi Tomo, For now you can do that as a workaround. We are working on a fix for this in the master branch but it may take a couple of days since the issue is fairly complicated. - Patrick On Sat, Jul 4, 2015 at 7:00 AM, tomo cocoa cocoatom...@gmail.com wrote: Hi all, I have the same error, and it seems to depend on the Maven version. I tried building Spark with several Maven versions on Jenkins. + Output of /Users/tomohiko/.jenkins/tools/hudson.tasks.Maven_MavenInstallation/mvn-3.3.3/bin/mvn -version: Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T20:57:37+09:00) Maven home: /Users/tomohiko/.jenkins/tools/hudson.tasks.Maven_MavenInstallation/mvn-3.3.3 Java version: 1.8.0, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac + Jenkins Configuration: Jenkins project type: Maven Project Goals and options: -Phadoop-2.6 -DskipTests clean package + Maven versions and results: 3.3.3 - infinite loop 3.3.1 - infinite loop 3.2.5 - SUCCESS So should we prefer to build Spark with Maven 3.2.5? On 4 July 2015 at 12:28, Andrew Or and...@databricks.com wrote: Thanks, I just tried it with 3.3.3 and I was able to reproduce it as well. 2015-07-03 18:51 GMT-07:00 Tarek Auel tarek.a...@gmail.com: That's mine Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T04:57:37-07:00) Maven home: /usr/local/Cellar/maven/3.3.3/libexec Java version: 1.8.0_45, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac On Fri, Jul 3, 2015 at 6:32 PM Ted Yu yuzhih...@gmail.com wrote: Here is mine: Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c; 2015-03-13T13:10:27-07:00) Maven home: /home/hbase/apache-maven-3.3.1 Java version: 1.8.0_45, vendor: Oracle Corporation Java home: /home/hbase/jdk1.8.0_45/jre Default locale: en_US, platform encoding: UTF-8 OS name: linux, version: 2.6.32-504.el6.x86_64, arch: amd64, family: unix On Fri, Jul 3, 2015 at 6:05 PM, Andrew Or and...@databricks.com wrote: @Tarek and Ted, what maven versions are you using? 2015-07-03 17:35 GMT-07:00 Krishna Sankar ksanka...@gmail.com: Patrick, I assume an RC3 will be out for folks like me to test the distribution. As usual, I will run the tests when you have a new distribution. Cheers k/ On Fri, Jul 3, 2015 at 4:38 PM, Patrick Wendell pwend...@gmail.com wrote: Patch that added test-jar dependencies: https://github.com/apache/spark/commit/bfe74b34 Patch that originally disabled dependency reduced poms: https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724 Patch that reverted the disabling of dependency reduced poms: https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com wrote: Okay I did some forensics with Sean Owen. Some things about this bug: 1. The underlying cause is that we added some code to make the tests of sub modules depend on the core tests. For unknown reasons this causes Spark to hit MSHADE-148 for *some* combinations of build profiles. 2. MSHADE-148 can be worked around by disabling building of dependency reduced poms because then the buggy code path is circumvented. Andrew Or did this in a patch on the 1.4 branch. 
However, that is not a tenable option for us because our *published* pom files require dependency reduction to substitute in the scala version correctly for the poms published to maven central. 3. As a result, Andrew Or reverted his patch recently, causing some package builds to start failing again (but publishing works now). 4. The reason this is not detected in our test harness or release build is that it is sensitive to the profiles enabled. The combination of profiles we enable in the test harness and release builds do not trigger this bug. The best path I see forward right now is to do the following: 1. Disable creation of dependency reduced poms by default (this doesn't matter for people doing a package build) so typical users won't have this bug. 2. Add a profile that re-enables that setting. 3. Use the above profile when publishing release artifacts to maven central. 4. Hope that we don't hit this bug for publishing. - Patrick On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote: Doesn't change anything for me. On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote: Can you try using the built in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https
Re: [VOTE] Release Apache Spark 1.4.1 (RC2)
Hm - what if you do a fresh git checkout (just to make sure you don't have an older maven version downloaded)? It also might be that this really is an issue even with Maven 3.3.3. I'm just not sure why it's not reflected in our continuous integration or the build of the release packages themselves: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ It could be that it's dependent on which modules are enabled. On Fri, Jul 3, 2015 at 3:46 PM, Robin East robin.e...@xense.co.uk wrote: which got me thinking: build/mvn -version Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0 Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c; 2015-03-13T20:10:27+00:00) Maven home: /usr/local/Cellar/maven/3.3.1/libexec Java version: 1.8.0_40, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.2, arch: x86_64, family: mac Seems to be using 3.3.1 On 3 Jul 2015, at 23:44, Robin East robin.e...@xense.co.uk wrote: I used the following build command: build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package this also gave the ‘Dependency-reduced POM’ loop Robin On 3 Jul 2015, at 23:41, Patrick Wendell pwend...@gmail.com wrote: What if you use the built-in maven (i.e. build/mvn). It might be that we require a newer version of maven than you have. The release itself is built with maven 3.3.3: https://github.com/apache/spark/blob/master/build/mvn#L72 - Patrick On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote: Yep, happens to me as well. Build loops. Cheers k/ On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote: Patrick: I used the following command: ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package The build doesn't seem to stop. Here is the tail of the build output: [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml Here is part of the stack trace for the build process: http://pastebin.com/xL2Y0QMU FYI On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. 
[ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC2)
Let's continue the discussion on the other thread relating to the master build. On Fri, Jul 3, 2015 at 4:13 PM, Patrick Wendell pwend...@gmail.com wrote: Thanks - it appears this is just a legitimate issue with the build, affecting all versions of Maven. On Fri, Jul 3, 2015 at 4:02 PM, Krishna Sankar ksanka...@gmail.com wrote: I have 3.3.3 USS-Defiant:NW ksankar$ mvn -version Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T04:57:37-07:00) Maven home: /usr/local/apache-maven-3.3.3 Java version: 1.7.0_60, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac Let me nuke it and reinstall maven. Cheers k/ On Fri, Jul 3, 2015 at 3:41 PM, Patrick Wendell pwend...@gmail.com wrote: What if you use the built-in maven (i.e. build/mvn). It might be that we require a newer version of maven than you have. The release itself is built with maven 3.3.3: https://github.com/apache/spark/blob/master/build/mvn#L72 - Patrick On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote: Yep, happens to me as well. Build loops. Cheers k/ On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote: Patrick: I used the following command: ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package The build doesn't seem to stop. Here is the tail of the build output: [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml Here is part of the stack trace for the build process: http://pastebin.com/xL2Y0QMU FYI On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Can not build master
Okay I did some forensics with Sean Owen. Some things about this bug: 1. The underlying cause is that we added some code to make the tests of sub modules depend on the core tests. For unknown reasons this causes Spark to hit MSHADE-148 for *some* combinations of build profiles. 2. MSHADE-148 can be worked around by disabling building of dependency reduced poms because then the buggy code path is circumvented. Andrew Or did this in a patch on the 1.4 branch. However, that is not a tenable option for us because our *published* pom files require dependency reduction to substitute in the scala version correctly for the poms published to maven central. 3. As a result, Andrew Or reverted his patch recently, causing some package builds to start failing again (but publishing works now). 4. The reason this is not detected in our test harness or release build is that it is sensitive to the profiles enabled. The combination of profiles we enable in the test harness and release builds do not trigger this bug. The best path I see forward right now is to do the following: 1. Disable creation of dependency reduced poms by default (this doesn't matter for people doing a package build) so typical users won't have this bug. 2. Add a profile that re-enables that setting. 3. Use the above profile when publishing release artifacts to maven central. 4. Hope that we don't hit this bug for publishing. - Patrick On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote: Doesn't change anything for me. On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote: Can you try using the built-in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ - Patrick On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193) Cheers On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote: I found a solution; there might be a better one. https://github.com/apache/spark/pull/7217 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote: Yes me too On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote: This is what I got (the last line was repeated non-stop): [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote: Hi all, I am trying to build master, but it gets stuck and prints [INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml build command: mvn -DskipTests clean package Do others have the same issue? Regards, Tarek - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
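For readers following along, steps (1) and (2) of that plan map onto a standard maven-shade-plugin switch, createDependencyReducedPom, which can be driven by a property that a profile flips back on. A minimal sketch of the shape of such a change (the property name and profile id here are illustrative, not the ones actually used in Spark's pom):
{code}
<properties>
  <!-- Off by default so plain package builds avoid the MSHADE-148 loop. -->
  <create.dependency.reduced.poms>false</create.dependency.reduced.poms>
</properties>
...
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <createDependencyReducedPom>${create.dependency.reduced.poms}</createDependencyReducedPom>
  </configuration>
</plugin>
...
<profiles>
  <!-- Enabled only when publishing release artifacts to Maven Central,
       where dependency reduction is needed to substitute the Scala version. -->
  <profile>
    <id>release-publishing</id>
    <properties>
      <create.dependency.reduced.poms>true</create.dependency.reduced.poms>
    </properties>
  </profile>
</profiles>
{code}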
Re: Can not build master
Patch that added test-jar dependencies: https://github.com/apache/spark/commit/bfe74b34 Patch that originally disabled dependency reduced poms: https://github.com/apache/spark/commit/984ad60147c933f2d5a2040c87ae687c14eb1724 Patch that reverted the disabling of dependency reduced poms: https://github.com/apache/spark/commit/bc51bcaea734fe64a90d007559e76f5ceebfea9e On Fri, Jul 3, 2015 at 4:36 PM, Patrick Wendell pwend...@gmail.com wrote: Okay I did some forensics with Sean Owen. Some things about this bug: 1. The underlying cause is that we added some code to make the tests of sub modules depend on the core tests. For unknown reasons this causes Spark to hit MSHADE-148 for *some* combinations of build profiles. 2. MSHADE-148 can be worked around by disabling building of dependency reduced poms because then the buggy code path is circumvented. Andrew Or did this in a patch on the 1.4 branch. However, that is not a tenable option for us because our *published* pom files require dependency reduction to substitute in the scala version correctly for the poms published to maven central. 3. As a result, Andrew Or reverted his patch recently, causing some package builds to start failing again (but publishing works now). 4. The reason this is not detected in our test harness or release build is that it is sensitive to the profiles enabled. The combination of profiles we enable in the test harness and release builds do not trigger this bug. The best path I see forward right now is to do the following: 1. Disable creation of dependency reduced poms by default (this doesn't matter for people doing a package build) so typical users won't have this bug. 2. Add a profile that re-enables that setting. 3. Use the above profile when publishing release artifacts to maven central. 4. Hope that we don't hit this bug for publishing. - Patrick On Fri, Jul 3, 2015 at 3:51 PM, Tarek Auel tarek.a...@gmail.com wrote: Doesn't change anything for me. On Fri, Jul 3, 2015 at 3:45 PM Patrick Wendell pwend...@gmail.com wrote: Can you try using the built-in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ - Patrick On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193) Cheers On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote: I found a solution; there might be a better one. https://github.com/apache/spark/pull/7217 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote: Yes me too On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote: This is what I got (the last line was repeated non-stop): [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote: Hi all, I am trying to build master, but it gets stuck and prints [INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml build command: mvn -DskipTests clean package Do others have the same issue? 
Regards, Tarek - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: Can not build master
Can you try using the built-in maven build/mvn...? All of our builds are passing on Jenkins so I wonder if it's a maven version issue: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Compile/ - Patrick On Fri, Jul 3, 2015 at 3:14 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at SPARK-8781 (https://github.com/apache/spark/pull/7193) Cheers On Fri, Jul 3, 2015 at 3:05 PM, Tarek Auel tarek.a...@gmail.com wrote: I found a solution; there might be a better one. https://github.com/apache/spark/pull/7217 On Fri, Jul 3, 2015 at 2:28 PM Robin East robin.e...@xense.co.uk wrote: Yes me too On 3 Jul 2015, at 22:21, Ted Yu yuzhih...@gmail.com wrote: This is what I got (the last line was repeated non-stop): [INFO] Replacing original artifact with shaded artifact. [INFO] Replacing /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT.jar with /home/hbase/spark/bagel/target/spark-bagel_2.10-1.5.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark/bagel/dependency-reduced-pom.xml On Fri, Jul 3, 2015 at 1:13 PM, Tarek Auel tarek.a...@gmail.com wrote: Hi all, I am trying to build master, but it gets stuck and prints [INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml build command: mvn -DskipTests clean package Do others have the same issue? Regards, Tarek - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1 (RC2)
Thanks - it appears this is just a legitimate issue with the build, affecting all versions of Maven. On Fri, Jul 3, 2015 at 4:02 PM, Krishna Sankar ksanka...@gmail.com wrote: I have 3.3.3 USS-Defiant:NW ksankar$ mvn -version Apache Maven 3.3.3 (7994120775791599e205a5524ec3e0dfe41d4a06; 2015-04-22T04:57:37-07:00) Maven home: /usr/local/apache-maven-3.3.3 Java version: 1.7.0_60, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_60.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.10.3, arch: x86_64, family: mac Let me nuke it and reinstall maven. Cheers k/ On Fri, Jul 3, 2015 at 3:41 PM, Patrick Wendell pwend...@gmail.com wrote: What if you use the built-in maven (i.e. build/mvn). It might be that we require a newer version of maven than you have. The release itself is built with maven 3.3.3: https://github.com/apache/spark/blob/master/build/mvn#L72 - Patrick On Fri, Jul 3, 2015 at 3:19 PM, Krishna Sankar ksanka...@gmail.com wrote: Yep, happens to me as well. Build loops. Cheers k/ On Fri, Jul 3, 2015 at 2:40 PM, Ted Yu yuzhih...@gmail.com wrote: Patrick: I used the following command: ~/apache-maven-3.3.1/bin/mvn -DskipTests -Phadoop-2.4 -Pyarn -Phive clean package The build doesn't seem to stop. Here is the tail of the build output: [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /home/hbase/spark-1.4.1/bagel/dependency-reduced-pom.xml Here is part of the stack trace for the build process: http://pastebin.com/xL2Y0QMU FYI On Fri, Jul 3, 2015 at 1:15 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[RESULT] [VOTE] Release Apache Spark 1.4.1
This vote is cancelled in favor of RC2. Thanks very much to Sean Owen for triaging an important bug associated with RC1. I took a look at the branch-1.4 contents and I think it's safe to cut RC2 from the head of that branch (i.e. no very-high-risk patches that I could see). JIRA management around the time of the RC voting is an interesting topic; Sean, I like your most recent proposal. Maybe we can put that on the wiki or start a DISCUSS thread to cover that topic. On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[VOTE] Release Apache Spark 1.4.1 (RC2)
Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc2 (commit 07b95c7): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=07b95c7adf88f0662b7ab1c47e302ff5e6859606 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1120/ [published as version: 1.4.1-rc2] https://repository.apache.org/content/repositories/orgapachespark-1121/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc2-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Monday, July 06, at 22:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Resolved] (SPARK-8649) Mapr repository is not defined properly
[ https://issues.apache.org/jira/browse/SPARK-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-8649. Resolution: Fixed Fix Version/s: 1.5.0 Mapr repository is not defined properly --- Key: SPARK-8649 URL: https://issues.apache.org/jira/browse/SPARK-8649 Project: Spark Issue Type: Bug Components: Build Reporter: Ashok Kumar Priority: Trivial Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Hey Krishna - this is still the current release candidate. - Patrick On Sun, Jun 28, 2015 at 12:14 PM, Krishna Sankar ksanka...@gmail.com wrote: Patrick, Haven't seen any replies on test results. I will byte ;o) - Should I test this version or is another one in the wings? Cheers k/ On Tue, Jun 23, 2015 at 10:37 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-8667) Improve Spark UI behavior at scale
[ https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14604000#comment-14604000 ] Patrick Wendell commented on SPARK-8667: Thanks Sean. I looked for a while for an older JIRA on this, but couldn't find it. This is definitely a dup of SPARK-2015. Improve Spark UI behavior at scale -- Key: SPARK-8667 URL: https://issues.apache.org/jira/browse/SPARK-8667 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Patrick Wendell Assignee: Shixiong Zhu This is a parent ticket and we can create child tickets when solving specific issues. The main problem I would like to solve is the fact that the Spark UI has issues at very large scale. The worst issue is when there is a stage page with more than a few thousand tasks. In this case: 1. The page itself is very slow to load and becomes unresponsive with huge number of tasks. 2. The Scala XML output can become so large that it crashes the driver program due to OOM for a page with a huge number of tasks. I am not sure if (1) is caused by javascript slowness, or maybe just the raw amount of data sent over the wire. If it is the latter, it might be possible to add compression to the HTTP payload to help improve load time. It would be nice to reproduce+investigate these issues further and create specific sub tasks to improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8667) Improve Spark UI behavior at scale
[ https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-8667. Resolution: Duplicate Improve Spark UI behavior at scale -- Key: SPARK-8667 URL: https://issues.apache.org/jira/browse/SPARK-8667 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Patrick Wendell Assignee: Shixiong Zhu This is a parent ticket and we can create child tickets when solving specific issues. The main problem I would like to solve is the fact that the Spark UI has issues at very large scale. The worst issue is when there is a stage page with more than a few thousand tasks. In this case: 1. The page itself is very slow to load and becomes unresponsive with huge number of tasks. 2. The Scala XML output can become so large that it crashes the driver program due to OOM for a page with a huge number of tasks. I am not sure if (1) is caused by javascript slowness, or maybe just the raw amount of data sent over the wire. If it is the latter, it might be possible to add compression to the HTTP payload to help improve load time. It would be nice to reproduce+investigate these issues further and create specific sub tasks to improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Hey Tom - no one voted on this yet, so I need to keep it open until people vote. But I'm not aware of specific things we are waiting for. Anyone else? - Patrick On Fri, Jun 26, 2015 at 7:10 AM, Tom Graves tgraves...@yahoo.com wrote: So is this open for vote then or are we waiting on other things? Tom On Thursday, June 25, 2015 10:32 AM, Andrew Ash and...@andrewash.com wrote: I would guess that many tickets targeted at 1.4.1 were set that way during the tail end of the 1.4.0 voting process as people realized they wouldn't make the .0 release in time. In that case, they were likely aiming for a 1.4.x release, not necessarily 1.4.1 specifically. Maybe creating a 1.4.x target in Jira in addition to 1.4.0, 1.4.1, 1.4.2, etc would make it more clear that these tickets are targeted at some 1.4 update release rather than specifically the 1.4.1 update. On Thu, Jun 25, 2015 at 5:38 AM, Sean Owen so...@cloudera.com wrote: That makes sense to me -- there's an urgent fix to get out. I missed that part. Not that it really matters but was that expressed elsewhere? I know we tend to start the RC process even when a few more changes are still in progress, to get a first wave or two of testing done early, knowing that the RC won't be the final one. It makes sense for some issues for X to be open when an RC is cut, if they are actually truly intended for X. 44 seems like a lot, and I don't think it's good practice just because that's how it's happened before. It looks like half of them weren't actually important for 1.4.x as we're now down to 21. I don't disagree with the idea that only most of the issues targeted for version X will be in version X; the target expresses a stretch goal. Given the fast pace of change that's probably the only practical view. I think we're just missing a step then: before RC of X, ask people to review and update the target of JIRAs for X? In this case, it was a good point to untarget stuff from 1.4.x entirely; I suspect everything else should then be targeted at 1.4.2 by default with the exception of a handful that people really do intend to get in for 1.4.1 before its final release. I know it sounds like pencil-pushing, but it's a cheap way to bring some additional focus to release planning. RC time has felt like a last-call to *begin* changes ad-hoc when it would go faster if it were more intentional and constrained. Meaning faster RCs, meaning getting back to a 3-month release cycle or less, and meaning less rush to push stuff into a .0 release and less frequent need for a maintenance .1 version. So what happens if all 1.4.1-targeted JIRAs are targeted to 1.4.2? Would that miss something that is definitely being worked on for 1.4.1? On Wed, Jun 24, 2015 at 6:56 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, This is being shipped now because there is a severe bug in 1.4.0 that can cause data corruption for Parquet users. There are no blockers targeted for 1.4.1 - so I don't see that JIRA is inconsistent with shipping a release now. As for the goal of having every single targeted JIRA cleared by the time we start voting - I don't think there is broad consensus on, or cultural adoption of, that principle yet. So I do not take it as a signal this release is premature (the story has been the same for every previous release we've ever done). The fact that we hit 90/124 of issues targeted at this release means we are targeting such that we get around 70% of issues merged. That actually doesn't seem so bad to me since there is some uncertainty in the process. 
- Patrick On Wed, Jun 24, 2015 at 1:54 AM, Sean Owen so...@cloudera.com wrote: There are 44 issues still targeted for 1.4.1. None are Blockers; 12 are Critical. ~80% were opened and/or set by committers. Compare with 90 issues resolved for 1.4.1. I'm concerned that committers are targeting lots more for a release even in the short term than realistically can go in. On its face, it suggests that an RC is premature. Why is 1.4.1 being put forth for release now? It seems like people are saying they want a fair bit more time to work on 1.4.1. I suspect that in fact people would rather untarget / slip (again) these JIRAs, but it calls into question again how the targeting is consistently off by this much. What unresolved JIRAs targeted for 1.4.1 are *really* still open for 1.4.1? like, what would go badly if all 32 non-Critical JIRAs were untargeted now? is the reality that there are a handful of items to get in before the final release, and those are hopefully the ~12 critical ones? How about some review of that before we ask people to seriously test these bits? On Wed, Jun 24, 2015 at 8:37 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted
[jira] [Created] (SPARK-8667) Improve Spark UI behavior at scale
Patrick Wendell created SPARK-8667: -- Summary: Improve Spark UI behavior at scale Key: SPARK-8667 URL: https://issues.apache.org/jira/browse/SPARK-8667 Project: Spark Issue Type: Improvement Reporter: Patrick Wendell Assignee: Shixiong Zhu This is a parent ticket and we can create child tickets when solving specific issues. The main problem I would like to solve is the fact that the Spark UI has issues at very large scale. The worst issue is when there is a stage page with more than a few thousand tasks. In this case: 1. The page itself is very slow to load and becomes unresponsive with huge number of tasks. 2. The Scala XML output can become so large that it crashes the driver program due to OOM for a page with a huge number of tasks. I am not sure if (1) is caused by javascript slowness, or maybe just the raw amount of data sent over the wire. If it is the latter, it might be possible to add compression to the HTTP payload to help improve load time. It would be nice to reproduce+investigate these issues further and create specific sub tasks to improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
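On point (1), if the raw payload size turns out to be the culprit, compression is cheap to experiment with: the Spark UI is served by Jetty, and Jetty can gzip responses by wrapping the root handler. A minimal standalone sketch, assuming a Jetty version that ships org.eclipse.jetty.server.handler.gzip.GzipHandler (the class has moved packages across Jetty releases) and with DefaultHandler standing in for the UI's real handler wiring:
{code}
import org.eclipse.jetty.server.Server
import org.eclipse.jetty.server.handler.DefaultHandler
import org.eclipse.jetty.server.handler.gzip.GzipHandler

object GzipUiSketch {
  def main(args: Array[String]): Unit = {
    val server = new Server(8080)
    // Compress large text responses (e.g. a stage page with thousands of
    // task rows) before they go over the wire; binary content is left alone.
    val gzip = new GzipHandler()
    gzip.setIncludedMimeTypes("text/html", "application/json")
    gzip.setHandler(new DefaultHandler()) // stand-in for the UI's handlers
    server.setHandler(gzip)
    server.start()
    server.join()
  }
}
{code}
Note this would only help the over-the-wire cost; it would not address the rendering or OOM issues described in (2).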
[jira] [Updated] (SPARK-8667) Improve Spark UI behavior at scale
[ https://issues.apache.org/jira/browse/SPARK-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8667: --- Component/s: Web UI Improve Spark UI behavior at scale -- Key: SPARK-8667 URL: https://issues.apache.org/jira/browse/SPARK-8667 Project: Spark Issue Type: Improvement Components: Web UI Reporter: Patrick Wendell Assignee: Shixiong Zhu This is a parent ticket and we can create child tickets when solving specific issues. The main problem I would like to solve is the fact that the Spark UI has issues at very large scale. The worst issue is when there is a stage page with more than a few thousand tasks. In this case: 1. The page itself is very slow to load and becomes unresponsive with huge number of tasks. 2. The Scala XML output can become so large that it crashes the driver program due to OOM for a page with a huge number of tasks. I am not sure if (1) is caused by javascript slowness, or maybe just the raw amount of data sent over the wire. If it is the latter, it might be possible to add compression to the HTTP payload to help improve load time. It would be nice to reproduce+investigate these issues further and create specific sub tasks to improve them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.1
Hey Sean, This is being shipped now because there is a severe bug in 1.4.0 that can cause data corruption for Parquet users. There are no blockers targeted for 1.4.1 - so I don't see that JIRA is inconsistent with shipping a release now. As for the goal of having every single targeted JIRA cleared by the time we start voting - I don't think there is broad consensus on, or cultural adoption of, that principle yet. So I do not take it as a signal this release is premature (the story has been the same for every previous release we've ever done). The fact that we hit 90/124 of issues targeted at this release means we are targeting such that we get around 70% of issues merged. That actually doesn't seem so bad to me since there is some uncertainty in the process. - Patrick On Wed, Jun 24, 2015 at 1:54 AM, Sean Owen so...@cloudera.com wrote: There are 44 issues still targeted for 1.4.1. None are Blockers; 12 are Critical. ~80% were opened and/or set by committers. Compare with 90 issues resolved for 1.4.1. I'm concerned that committers are targeting lots more for a release even in the short term than realistically can go in. On its face, it suggests that an RC is premature. Why is 1.4.1 being put forth for release now? It seems like people are saying they want a fair bit more time to work on 1.4.1. I suspect that in fact people would rather untarget / slip (again) these JIRAs, but it calls into question again how the targeting is consistently off by this much. What unresolved JIRAs targeted for 1.4.1 are *really* still open for 1.4.1? like, what would go badly if all 32 non-Critical JIRAs were untargeted now? is the reality that there are a handful of items to get in before the final release, and those are hopefully the ~12 critical ones? How about some review of that before we ask people to seriously test these bits? On Wed, Jun 24, 2015 at 8:37 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[VOTE] Release Apache Spark 1.4.1
Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be voted on is v1.4.1-rc1 (commit 60e08e5): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=60e08e50751fe3929156de956d62faea79f5b801 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.1] https://repository.apache.org/content/repositories/orgapachespark-1118/ [published as version: 1.4.1-rc1] https://repository.apache.org/content/repositories/orgapachespark-1119/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.1-rc1-docs/ Please vote on releasing this package as Apache Spark 1.4.1! The vote is open until Saturday, June 27, at 06:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.1 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8494: --- Assignee: (was: Patrick Wendell) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Attachments: spark-test-case.zip I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a ClassNotFoundException otherwise. I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT. Application:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count
    sc.stop()
  }
}
{code}
Exception:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range
  java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  java.security.AccessController.doPrivileged(Native Method)
  java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  java.lang.Class.forName0(Native Method)
  java.lang.Class.forName(Class.java:270)
  org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
  java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
  java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
{code}
{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion = "2.3.11"
val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database" % "h2" % "1.4.187",
  "com.typesafe.akka" %% "akka-actor" % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j" % akkaVersion,
  "ch.qos.logback" % "logback-classic" % "1.0.13",
  "io.spray" %% "spray-can" % sprayVersion,
  "io.spray" %% "spray-routing" % sprayVersion,
  "io.spray" %% "spray-json" % "1.3.1",
  "com.databricks" %% "spark-csv" % "1.0.3",
  "org.specs2" %% "specs2" % "2.4.17" % "test",
  "org.specs2" %% "specs2-junit" % "2.4.17" % "test",
  "io.spray" %% "spray-testkit" % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit" % akkaVersion % "test",
  "junit" % "junit" % "4.12" % "test"
)

scalacOptions ++= Seq(
  "-unchecked",
  "-deprecation",
  "-Xlint",
  "-Ywarn-dead-code",
  "-language:_",
  "-target:jvm-1.7",
  "-encoding", "UTF-8"
)

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7292) Provide operator to truncate lineage without persisting RDD's
[ https://issues.apache.org/jira/browse/SPARK-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7292: --- Assignee: Andrew Or Provide operator to truncate lineage without persisting RDD's - Key: SPARK-7292 URL: https://issues.apache.org/jira/browse/SPARK-7292 Project: Spark Issue Type: New Feature Components: Spark Core Reporter: Patrick Wendell Assignee: Andrew Or Checkpointing exists in Spark to truncate a lineage chain. I've heard requests from some users to allow truncation of lineage in a way that is cheap and doesn't serialize and persist the RDD. This is possible if the user is willing to forgo fault tolerance for that RDD (for instance, for shorter running jobs or ones that use a small number of machines). It's pretty easy to allow this so we should look into it for Spark 1.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
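A usage sketch of what such an operator looks like from user code; the name localCheckpoint matches the RDD API that eventually shipped in Spark 1.5 for this ticket, and the surrounding job is purely illustrative:
{code}
import org.apache.spark.{SparkConf, SparkContext}

object LineageTruncationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("lineage-truncation"))

    // Iterative jobs like this build up long lineage chains.
    var rdd = sc.parallelize(1 to 1000, 10)
    for (_ <- 1 to 100) rdd = rdd.map(_ + 1)

    // Classic checkpoint() would serialize the RDD to a reliable store.
    // The cheap variant truncates lineage using executor-local storage,
    // trading fault tolerance for speed:
    rdd.localCheckpoint()
    rdd.count() // materializes the RDD, after which the chain is pruned

    sc.stop()
  }
}
{code}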
[jira] [Commented] (SPARK-8416) Thread dump page should highlight Spark executor threads
[ https://issues.apache.org/jira/browse/SPARK-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592411#comment-14592411 ] Patrick Wendell commented on SPARK-8416: It would also be nice to put those threads first in the list. Thread dump page should highlight Spark executor threads Key: SPARK-8416 URL: https://issues.apache.org/jira/browse/SPARK-8416 Project: Spark Issue Type: Bug Components: Web UI Reporter: Josh Rosen On the Spark thread dump page, it's hard to pick out executor threads from other system threads. The UI should employ some color coding or highlighting to make this more apparent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8434) Add a pretty parameter to show
[ https://issues.apache.org/jira/browse/SPARK-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8434: --- Component/s: SQL Add a pretty parameter to show Key: SPARK-8434 URL: https://issues.apache.org/jira/browse/SPARK-8434 Project: Spark Issue Type: Bug Components: SQL Reporter: Shixiong Zhu Sometimes the user may want to show the complete content of cells, such as sql("set -v") -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
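Concretely, the behavior being asked for is roughly the truncate flag that show later grew. A hedged sketch (the two-argument show(numRows, truncate) overload is how this surfaced in later Spark releases, not necessarily under the "pretty" name proposed here), assuming a Spark shell where sqlContext is predefined:
{code}
// Default rendering truncates long cell values to 20 characters.
sqlContext.sql("set -v").show()

// With truncation disabled, the complete content of each cell is printed.
sqlContext.sql("set -v").show(100, false)
{code}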
[jira] [Updated] (SPARK-8450) PySpark write.parquet raises Unsupported datatype DecimalType()
[ https://issues.apache.org/jira/browse/SPARK-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8450: --- Component/s: SQL PySpark PySpark write.parquet raises Unsupported datatype DecimalType() --- Key: SPARK-8450 URL: https://issues.apache.org/jira/browse/SPARK-8450 Project: Spark Issue Type: Bug Components: PySpark, SQL Environment: Spark 1.4.0 on Debian Reporter: Peter Hoffmann I'm getting an Exception when I try to save a DataFrame with a DecimalType as a Parquet file. Minimal Example:
{code}
from decimal import Decimal
from pyspark.sql import SQLContext
from pyspark.sql.types import *

sqlContext = SQLContext(sc)
schema = StructType([
    StructField('id', LongType()),
    StructField('value', DecimalType())])
rdd = sc.parallelize([[1, Decimal(0.5)], [2, Decimal(2.9)]])
df = sqlContext.createDataFrame(rdd, schema)
df.write.parquet("hdfs://srv:9000/user/ph/decimal.parquet", 'overwrite')
{code}
Stack Trace:
{code}
---
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-19-a77dac8de5f3> in <module>()
----> 1 sr.write.parquet("hdfs://srv:9000/user/ph/decimal.parquet", 'overwrite')

/home/spark/spark-1.4.0-bin-hadoop2.6/python/pyspark/sql/readwriter.pyc in parquet(self, path, mode)
    367         :param mode: one of `append`, `overwrite`, `error`, `ignore` (default: error)
    368
--> 369         return self._jwrite.mode(mode).parquet(path)
    370
    371     @since(1.4)

/home/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538             self.target_id, self.name)
    539
    540         for temp_arg in temp_args:

/home/spark/spark-1.4.0-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    298             raise Py4JJavaError(
    299                 'An error occurred while calling {0}{1}{2}.\n'.
--> 300                 format(target_id, '.', name), value)
    301         else:
    302             raise Py4JError(

Py4JJavaError: An error occurred while calling o361.parquet.
: org.apache.spark.SparkException: Job aborted.
{code}
{code}
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:138)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:114)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:332)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:281)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 158 in stage 35.0 failed 4 times, most recent failure: Lost task 158.3 in stage 35.0 (TID 2736, 10.2.160.14
{code}
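A workaround worth noting for readers hitting this (hedged; whether it applies depends on the Spark version): the Parquet writer in this era only handled fixed-precision decimals, so declaring an explicit precision and scale instead of the unlimited-precision DecimalType() avoids the error. A sketch in Scala, with the precision and scale values chosen purely for illustration (the PySpark spelling, DecimalType(10, 2), is analogous):
{code}
import org.apache.spark.sql.types._

// Fixed precision/scale instead of unlimited-precision DecimalType():
// Parquet can encode this as a fixed-length decimal.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("value", DecimalType(10, 2))))
{code}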
[jira] [Updated] (SPARK-8427) Incorrect ACL checking for partitioned table in Spark SQL-1.4
[ https://issues.apache.org/jira/browse/SPARK-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8427: --- Priority: Critical (was: Blocker) Incorrect ACL checking for partitioned table in Spark SQL-1.4 - Key: SPARK-8427 URL: https://issues.apache.org/jira/browse/SPARK-8427 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Environment: CentOS 6 OS X 10.9.5, Hive-0.13.1, Spark-1.4, Hadoop 2.6.0 Reporter: Karthik Subramanian Priority: Critical Labels: security Problem statement: When querying a partitioned table using Spark SQL (version 1.4.0), an access-denied exception is raised on partitions the user doesn't belong to (user permissions are controlled using HDFS ACLs). The same query works correctly in Hive. Use case: to support multitenancy, consider a table containing multiple customers, each with multiple facilities, partitioned by customer and facility. A user belonging to one facility must not have access to other facilities; this is enforced with HDFS ACLs on the corresponding directories. When querying the table as 'user1' (belonging to 'facility1' and 'customer1') with a 'where' clause restricting the query to that partition, only access to the corresponding directory should be verified, not the entire table. The above use case works as expected with the Hive client, versions 0.13.1 and 1.1.0. The query used:
{code}
select count(*) from customertable where customer='customer1' and facility='facility1'
{code}
Below is the exception received in spark-shell:
{code}
org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=READ_EXECUTE, inode="/data/customertable/customer=customer2/facility=facility2":root:supergroup:drwxrwx---:group::r-x,group:facility2:rwx
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkAccessAcl(FSPermissionChecker.java:351)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:253)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6419)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4954)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4915)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:826)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:612)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1971)
	at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
	at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755
{code}
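For anyone reproducing this from PySpark rather than spark-shell, a minimal sketch of the reporter's scenario (assuming a Hive-enabled Spark build and the partitioned customertable described above):
{code}
# Repro sketch: the 'where' clause restricts the query to one partition, so in
# principle only that partition's directory should be listed and ACL-checked.
from pyspark.sql import HiveContext

hiveContext = HiveContext(sc)  # assumes an existing SparkContext `sc`
count = hiveContext.sql(
    "select count(*) from customertable "
    "where customer='customer1' and facility='facility1'").collect()
{code}
Per the report, Spark 1.4 appears to list all partition directories up front, which is what trips the ACL on the customer2/facility2 directory even though the query never touches it.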
[jira] [Updated] (SPARK-5787) Protect JVM from some not-important exceptions
[ https://issues.apache.org/jira/browse/SPARK-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5787: --- Target Version/s: 1.5.0 (was: 1.4.0) Protect JVM from some not-important exceptions -- Key: SPARK-5787 URL: https://issues.apache.org/jira/browse/SPARK-5787 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Davies Liu Priority: Critical Any uncaught exception will shut down the executor JVM, so we should catch those exceptions that do not seriously hurt the executor (i.e., the executor is still functional). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
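To make the proposal concrete, an illustrative sketch of the intended behavior (not Spark's actual code; the names here are hypothetical):
{code}
import logging

log = logging.getLogger("executor")

# Errors that genuinely compromise the process should still propagate.
FATAL = (MemoryError, SystemExit, KeyboardInterrupt)

def run_task_safely(task):
    try:
        return task()
    except FATAL:
        raise  # fatal: let the process die
    except Exception as e:
        # non-fatal: the executor is still functional, so log and carry on
        log.warning("Task failed but executor is still functional: %s", e)
        return None
{code}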
[jira] [Updated] (SPARK-7448) Implement custom byte array serializer for use in PySpark shuffle
[ https://issues.apache.org/jira/browse/SPARK-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7448: --- Target Version/s: 1.5.0 (was: 1.4.0) Implement custom byte array serializer for use in PySpark shuffle Key: SPARK-7448 URL: https://issues.apache.org/jira/browse/SPARK-7448 Project: Spark Issue Type: Improvement Components: PySpark, Shuffle Reporter: Josh Rosen Priority: Minor PySpark's shuffle typically shuffles Java RDDs that contain byte arrays. We should implement a custom Serializer for use in these shuffles. This will allow us to take advantage of shuffle optimizations like SPARK-7311 for PySpark without requiring users to change the default serializer to KryoSerializer (avoiding that requirement is useful for JobServer-type applications). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
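For context, the global workaround this issue wants to make unnecessary is switching the default serializer via the standard spark.serializer config key; setting it application-wide is what breaks JobServer-type shared deployments:
{code}
from pyspark import SparkConf, SparkContext

# Current workaround: make Kryo the default serializer so byte-array shuffles
# hit the optimized path. A dedicated byte-array serializer would avoid
# changing this application-wide setting.
conf = (SparkConf()
        .setAppName("kryo-workaround")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
sc = SparkContext(conf=conf)
{code}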
[jira] [Updated] (SPARK-7078) Cache-aware binary processing in-memory sort
[ https://issues.apache.org/jira/browse/SPARK-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7078: --- Target Version/s: 1.5.0 (was: 1.4.0) Cache-aware binary processing in-memory sort Key: SPARK-7078 URL: https://issues.apache.org/jira/browse/SPARK-7078 Project: Spark Issue Type: New Feature Components: Shuffle Reporter: Reynold Xin Assignee: Josh Rosen A cache-friendly sort algorithm that can eventually be used for: * sort-merge join * shuffle See the original AlphaSort paper: http://research.microsoft.com/pubs/68249/alphasort.doc Note that the state of the art for sorting has improved quite a bit since then, but we can easily optimize the sorting algorithm itself later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
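A toy illustration of the cache-friendly idea from the AlphaSort paper (not Spark's implementation): sort compact (key-prefix, pointer) pairs instead of full records, so most comparisons touch a small, cache-resident array.
{code}
# Toy sketch: records stay in place; we sort small (prefix, index) pairs.
records = [b"charlie record", b"alpha record", b"bravo record"]

# An 8-byte key prefix fits in a machine word and decides most comparisons.
pairs = [(rec[:8], i) for i, rec in enumerate(records)]
pairs.sort()

# A real sorter would fall back to full-record comparison on prefix ties.
sorted_records = [records[i] for _, i in pairs]
{code}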
[jira] [Updated] (SPARK-7041) Avoid writing empty files in BypassMergeSortShuffleWriter
[ https://issues.apache.org/jira/browse/SPARK-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-7041: --- Target Version/s: 1.5.0 (was: 1.4.0) Avoid writing empty files in BypassMergeSortShuffleWriter - Key: SPARK-7041 URL: https://issues.apache.org/jira/browse/SPARK-7041 Project: Spark Issue Type: Improvement Components: Shuffle Reporter: Josh Rosen Assignee: Josh Rosen In BypassMergeSortShuffleWriter, we may end up opening disk writer files for empty partitions. This occurs because we manually call {{open()}} after creating the writer, causing serialization and compression output streams to be created; these streams may write headers to the output stream, resulting in non-zero-length files for partitions that contain no records. This is unnecessary, since the disk object writer will automatically open itself when the first write is performed. Removing this eager {{open()}} call and rewriting the consumers to cope with the absence of files for empty partitions results in a large performance benefit for certain sparse workloads when using sort-based shuffle. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
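The lazy-open pattern the fix relies on, sketched in miniature (illustrative only, not Spark's DiskBlockObjectWriter; the header bytes stand in for the serialization/compression headers mentioned above):
{code}
class LazyWriter(object):
    """Opens the underlying file only on the first write, so partitions that
    receive no records never produce header-only files on disk."""

    def __init__(self, path):
        self.path = path
        self._file = None  # no file (and no headers) until the first write

    def write(self, record):
        if self._file is None:
            self._file = open(self.path, "wb")
            self._file.write(b"HDR")  # stand-in for stream headers
        self._file.write(record)

    def close(self):
        if self._file is not None:
            self._file.close()
{code}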
[jira] [Commented] (SPARK-6393) Extra RPC to the AM during killExecutor invocation
[ https://issues.apache.org/jira/browse/SPARK-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590426#comment-14590426 ] Patrick Wendell commented on SPARK-6393: [~sandyryza] I'm un-targeting this. If you are planning on working on this for a specific version, feel free to retarget. Extra RPC to the AM during killExecutor invocation -- Key: SPARK-6393 URL: https://issues.apache.org/jira/browse/SPARK-6393 Project: Spark Issue Type: Improvement Components: Spark Core, YARN Affects Versions: 1.3.1 Reporter: Sandy Ryza This was introduced by SPARK-6325 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org