Re: Spark Master Maven with YARN build is broken

2015-02-02 Thread Patrick Wendell
It's my fault, I'm sending a hot fix now. On Mon, Feb 2, 2015 at 1:44 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/ Is this is a known issue? It seems to have

Temporary jenkins issue

2015-02-02 Thread Patrick Wendell
Hey All, I made a change to the Jenkins configuration that caused most builds to fail (attempting to enable a new plugin), I've reverted the change effective about 10 minutes ago. If you've seen recent build failures like below, this was caused by that change. Sorry about that. ERROR:

[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-02-02 Thread Patrick Wendell
This is cancelled in favor of RC2. On Mon, Feb 2, 2015 at 8:50 PM, Patrick Wendell pwend...@gmail.com wrote: The windows issue reported only affects actually running Spark on Windows (not job submission). However, I agree it's worth cutting a new RC. I'm going to cancel this vote and propose

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-02-02 Thread Patrick Wendell
The windows issue reported only affects actually running Spark on Windows (not job submission). However, I agree it's worth cutting a new RC. I'm going to cancel this vote and propose RC3 with a single additional patch. Let's try to vote that through so we can ship Spark 1.2.1. - Patrick On Sat,

[VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-02 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc3 (commit b6eaf77): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97 The release files, including signatures, digests, etc.

Re: Questions about Spark standalone resource scheduler

2015-02-02 Thread Patrick Wendell
Hey Jerry, I think standalone mode will still add more features over time, but the goal isn't really for it to become equivalent to what Mesos/YARN are today. Or at least, I doubt Spark Standalone will ever attempt to manage _other_ frameworks outside of Spark and become a general purpose

[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-28 Thread Patrick Wendell
And Scale OK Fixed : org.apache.spark.SparkException in zip ! 2.5. rdd operations OK State of the Union Texts - MapReduce, Filter,sortByKey (word count) 2.6. recommendation OK Cheers k/ On Mon, Jan 26, 2015 at 11:02 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
://issues.apache.org/jira/browse/SPARK-5144 Thanks, Aniket On Wed Jan 28 2015 at 15:39:43 Patrick Wendell [via Apache Spark Developers List] ml-node+s1001551n1031...@n3.nabble.com wrote: Minor typo in the above e-mail - the tag is named v1.2.1-rc2 (not v1.2.1-rc1). On Wed, Jan 28, 2015 at 2:06 AM

[VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit b77f876): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b77f87673d1f9f03d4c83cf583158227c551359b The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
Minor typo in the above e-mail - the tag is named v1.2.1-rc2 (not v1.2.1-rc1). On Wed, Jan 28, 2015 at 2:06 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit b77f876

Re: spark akka fork : is the source anywhere?

2015-01-28 Thread Patrick Wendell
It's maintained here: https://github.com/pwendell/akka/tree/2.2.3-shaded-proto Over time, this is something that would be great to get rid of, per rxin On Wed, Jan 28, 2015 at 3:33 PM, Reynold Xin r...@databricks.com wrote: Hopefully problems like this will go away entirely in the next couple

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
, Sean On Jan 27, 2015, at 12:04 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
, at 11:35 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, Right now we don't publish every 2.11 binary to avoid combinatorial explosion of the number of build artifacts we publish (there are other parameters such as whether hive is included, etc). We can revisit this in future feature

Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
create: configuration createChecksumtrue/createChecksum /configuration As for the key issue, I think it's just a matter of uploading the new key in both places. We should all of course test the release anyway. On Tue, Jan 27, 2015 at 5:55 PM, Patrick Wendell pwend...@gmail.com wrote: Hey

Friendly reminder/request to help with reviews!

2015-01-27 Thread Patrick Wendell
Hey All, Just a reminder, as always around release time we have a very large volume of patches show up near the deadline. One thing that can help us maximize the number of patches we get in is to have community involvement in performing code reviews. And in particular, doing a thorough review

[VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-26 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1! The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec The release files, including signatures, digests, etc.

Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
, this will at least serve as an up-to-date list of packages/versions they should try to install locally in whatever environment they have. - Patrick On Wed, Jan 21, 2015 at 5:42 AM, Will Benton wi...@redhat.com wrote: - Original Message - From: Patrick Wendell pwend...@gmail.com To: Sean

Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
If the goal is a reproducible test environment then I think that is what Jenkins is. Granted you can only ask it for a test. But presumably you get the same result if you start from the same VM image as Jenkins and run the same steps. But the issue is when users can't reproduce Jenkins

Re: Standardized Spark dev environment

2015-01-20 Thread Patrick Wendell
To respond to the original suggestion by Nick. I always thought it would be useful to have a Docker image on which we run the tests and build releases, so that we could have a consistent environment that other packagers or people trying to exhaustively run Spark tests could replicate (or at least

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
The wiki does not seem to be operational ATM, but I will do this when it is back up. On Mon, Jan 19, 2015 at 12:00 PM, Patrick Wendell pwend...@gmail.com wrote: Okay - so given all this I was going to put the following on the wiki tentatively: ## Reviewing Code Community code review

Re: Semantics of LGTM

2015-01-19 Thread Patrick Wendell
the latter unless qualified in some other way. I don't have any opinion on the specific characters, but I agree with Aaron that it would be nice to have some sort of abbreviation for both the strong and weak forms of approval. -Sandy On Jan 17, 2015, at 7:25 PM, Patrick Wendell

Re: Bouncing Mails

2015-01-17 Thread Patrick Wendell
Akhil, Those are handled by ASF infrastructure, not anyone in the Spark project. So this list is not the appropriate place to ask for help. - Patrick On Sat, Jan 17, 2015 at 12:56 AM, Akhil Das ak...@sigmoidanalytics.com wrote: My mails to the mailing list are getting rejected, have opened a

Semantics of LGTM

2015-01-17 Thread Patrick Wendell
Hey All, Just wanted to ping about a minor issue - but one that ends up having consequence given Spark's volume of reviews and commits. As much as possible, I think that we should try and gear towards Google Style LGTM on reviews. What I mean by this is that LGTM has the following semantics: I

Re: Job priority

2015-01-11 Thread Patrick Wendell
Priority scheduling isn't something we've supported in Spark and we've opted to support FIFO and Fair scheduling and asked users to try and fit these to the needs of their applications. In practice from what I've seen of priority schedulers, such as the linux CPU scheduler, is that strict

Re: Spark development with IntelliJ

2015-01-08 Thread Patrick Wendell
Nick - yes. Do you mind moving it? I should have put it in the Contributing to Spark page. On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Side question: Should this section

Re: Spark development with IntelliJ

2015-01-08 Thread Patrick Wendell
Actually I went ahead and did it. On Thu, Jan 8, 2015 at 10:25 PM, Patrick Wendell pwend...@gmail.com wrote: Nick - yes. Do you mind moving it? I should have put it in the Contributing to Spark page. On Thu, Jan 8, 2015 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Side

Re: Spark UI history job duration is wrong

2015-01-05 Thread Patrick Wendell
Thanks for reporting this - it definitely sounds like a bug. Please open a JIRA for it. My guess is that we define the start or end time of the job based on the current time instead of looking at data encoded in the underlying event stream. That would cause it to not work properly when loading

Re: Spark driver main thread hanging after SQL insert

2015-01-02 Thread Patrick Wendell
Hi Alessandro, Can you create a JIRA for this rather than reporting it on the dev list? That's where we track issues like this. Thanks!. - Patrick On Wed, Dec 31, 2014 at 8:48 PM, Alessandro Baretta alexbare...@gmail.com wrote: Here's what the console shows: 15/01/01 01:12:29 INFO

ANNOUNCE: New build script ./build/mvn

2014-12-27 Thread Patrick Wendell
Hi All, A consistent piece of feedback from Spark developers has been that the Maven build is very slow. Typesafe provides a tool called Zinc which improves Scala complication speed substantially with Maven, but is difficult to install and configure, especially for platforms other than Mac OS.

Re: Problems with large dataset using collect() and broadcast()

2014-12-24 Thread Patrick Wendell
Hi Will, When you call collect() the item you are collecting needs to fit in memory on the driver. Is it possible your driver program does not have enough memory? - Patrick On Wed, Dec 24, 2014 at 9:34 PM, Will Yang era.ye...@gmail.com wrote: Hi all, In my occasion, I have a huge

Re: Question on saveAsTextFile with overwrite option

2014-12-24 Thread Patrick Wendell
: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Thursday, December 25, 2014 3:22 PM To: Shao, Saisai Cc: u...@spark.apache.org; dev@spark.apache.org Subject: Re: Question on saveAsTextFile with overwrite option Is it sufficient to set spark.hadoop.validateOutputSpecs to false? http

Re: Use mvn to build Spark 1.2.0 failed

2014-12-22 Thread Patrick Wendell
I also couldn't reproduce this issued. On Mon, Dec 22, 2014 at 2:24 AM, Sean Owen so...@cloudera.com wrote: I just tried the exact same command and do not see any error. Maybe you can make sure you're starting from a clean extraction of the distro, and check your environment. I'm on OSX, Maven

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Xiangrui asked me to report that it's back and running :) On Mon, Dec 22, 2014 at 3:21 PM, peng pc...@uowmail.edu.au wrote: Me 2 :) On 12/22/2014 06:14 PM, Andrew Ash wrote: Hi Xiangrui, That link is currently returning a 503 Over Quota error message. Would you mind pinging back out

Re: More general submitJob API

2014-12-22 Thread Patrick Wendell
A SparkContext is thread safe, so you can just have different threads that create their own RDD's and do actions, etc. - Patrick On Mon, Dec 22, 2014 at 4:15 PM, Alessandro Baretta alexbare...@gmail.com wrote: Andrew, Thanks, yes, this is what I wanted: basically just to start multiple jobs

Re: Announcing Spark Packages

2014-12-22 Thread Patrick Wendell
Hey Nick, I think Hitesh was just trying to be helpful and point out the policy - not necessarily saying there was an issue. We've taken a close look at this and I think we're in good shape her vis-a-vis this policy. - Patrick On Mon, Dec 22, 2014 at 5:29 PM, Nicholas Chammas

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
missing we should add. - Patrick On Mon, Dec 22, 2014 at 6:17 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Does this include contributions made against the spark-ec2 repo? On Wed Dec 17 2014 at 12:29:19 AM Patrick Wendell pwend...@gmail.com wrote: Hey All, Due to the very high

Re: [ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-22 Thread Patrick Wendell
s/Josh/Nick/ - sorry! On Mon, Dec 22, 2014 at 10:52 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Josh, We don't explicitly track contributions to spark-ec2 in the Apache Spark release notes. The main reason is that usually updates to spark-ec2 include a corresponding update to spark so

Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! This release brings operational and performance improvements in Spark

Re: Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
to different commits in https://github.com/apache/spark/releases Best Regards, Shixiong Zhu 2014-12-19 16:52 GMT+08:00 Patrick Wendell pwend...@gmail.com: I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest

Re: Which committers care about Kafka?

2014-12-18 Thread Patrick Wendell
Hey Cody, Thanks for reaching out with this. The lead on streaming is TD - he is traveling this week though so I can respond a bit. To the high level point of whether Kafka is important - it definitely is. Something like 80% of Spark Streaming deployments (anecdotally) ingest data from Kafka.

Re: [RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-18 Thread Patrick Wendell
Update: An Apache infrastructure issue prevented me from pushing this last night. The issue was resolved today and I should be able to push the final release artifacts tonight. On Tue, Dec 16, 2014 at 9:20 PM, Patrick Wendell pwend...@gmail.com wrote: This vote has PASSED with 12 +1 votes (8

[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-16 Thread Patrick Wendell
This vote has PASSED with 12 +1 votes (8 binding) and no 0 or -1 votes: +1: Matei Zaharia* Madhu Siddalingaiah Reynold Xin* Sandy Ryza Josh Rozen* Mark Hamstra* Denny Lee Tom Graves* GuiQiang Li Nick Pentreath* Sean McNamara* Patrick Wendell* 0: -1: I'll finalize and package this release

Re: [VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-16 Thread Patrick Wendell
...@databricks.com wrote: +1 Tested on OS X. On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc2 (commit a428c446e2): https://git-wip-us.apache.org/repos/asf?p

[ANNOUNCE] Requiring JIRA for inclusion in release credits

2014-12-16 Thread Patrick Wendell
Hey All, Due to the very high volume of contributions, we're switching to an automated process for generating release credits. This process relies on JIRA for categorizing contributions, so it's not possible for us to provide credits in the case where users submit pull requests with no associated

Re: Test failures after Jenkins upgrade

2014-12-15 Thread Patrick Wendell
/apache/spark/pull/3701 We might be close to fixing this via one of those PRs, so maybe we should try using one of those instead? On December 15, 2014 at 10:51:46 AM, Patrick Wendell (pwend...@gmail.com) wrote: Hey All, It appears that a single test suite is failing after the jenkins upgrade

Re: zinc invocation examples

2014-12-12 Thread Patrick Wendell
) and would be great to get your initial read on it. Per this thread I need to add in the -scala-home call to zinc, but its close to ready for a PR. On 12/5/14, 2:10 PM, Patrick Wendell pwend...@gmail.com wrote: One thing I created a JIRA for a while back was to have a similar script to sbt/sbt

Re: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-10 Thread Patrick Wendell
Hi Andrew, It looks like somehow you are including jars from the upstream Apache Hive 0.13 project on your classpath. For Spark 1.2 Hive 0.13 support, we had to modify Hive to use a different version of Kryo that was compatible with Spark's Kryo version.

[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-10 Thread Patrick Wendell
This vote is closed in favor of RC2. On Fri, Dec 5, 2014 at 2:02 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Thanks all for the continued testing! The issue I mentioned earlier SPARK-4498 was fixed earlier this week (hat tip to Mark Hamstra who contributed to fix

[VOTE] Release Apache Spark 1.2.0 (RC2)

2014-12-10 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc2 (commit a428c446e2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e The release files, including signatures, digests, etc.

Re: Is Apache JIRA down?

2014-12-10 Thread Patrick Wendell
I believe many apache services are/were down due to an outage. On Wed, Dec 10, 2014 at 5:24 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Nevermind, seems to be back up now. On Wed Dec 10 2014 at 7:46:30 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: For example:

Re: Is this a little bug in BlockTransferMessage ?

2014-12-09 Thread Patrick Wendell
Hey Nick, Thanks for bringing this up. I believe these Java tests are running in the sbt build right now, the issue is that this particular bug was flagged by the triggering of a runtime Java assert (not a normal Junit test assertion) and those are not enabled in our sbt tests. It would be good

Re: zinc invocation examples

2014-12-05 Thread Patrick Wendell
One thing I created a JIRA for a while back was to have a similar script to sbt/sbt that transparently downloads Zinc, Scala, and Maven in a subdirectory of Spark and sets it up correctly. I.e. build/mvn. Outside of brew for MacOS there aren't good Zinc packages, and it's a pain to figure out how

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-05 Thread Patrick Wendell
the web UI. On Sat, Nov 29, 2014 at 2:16 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h

Re: Ooyala Spark JobServer

2014-12-04 Thread Patrick Wendell
Hey Jun, The Ooyala server is being maintained by it's original author (Evan Chan) here: https://github.com/spark-jobserver/spark-jobserver This is likely to stay as a standalone project for now, since it builds directly on Spark's public API's. - Patrick On Wed, Dec 3, 2014 at 9:02 PM, Jun

Re: Exception adding resource files in latest Spark

2014-12-04 Thread Patrick Wendell
Thanks for flagging this. I reverted the relevant YARN fix in Spark 1.2 release. We can try to debug this in master. On Thu, Dec 4, 2014 at 9:51 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: I created a ticket for this: https://issues.apache.org/jira/browse/SPARK-4757 Jianshi On Fri,

Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Patrick Wendell
Also a note on this for committers - it's possible to re-word the title during merging, by just running git commit -a --amend before you push the PR. - Patrick On Tue, Dec 2, 2014 at 12:50 PM, Mridul Muralidharan mri...@gmail.com wrote: I second that ! Would also be great if the JIRA was

Re: Spurious test failures, testing best practices

2014-12-02 Thread Patrick Wendell
Hey Ryan, What if you run a single mvn install to install all libraries locally - then can you mvn compile -pl core? I think this may be the only way to make it work. - Patrick On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams ryan.blake.willi...@gmail.com wrote: Following on Mark's Maven

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-12-01 Thread Patrick Wendell
additional logging in order to help with debugging. I just wanted to give an early head's up about this issue and to get more eyes on it in case anyone else has run into it or wants to help with debugging. - Josh On November 28, 2014 at 9:18:09 PM, Patrick Wendell (pwend...@gmail.com

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
for the Contributing to Spark https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark guide. Nick On Sun Nov 30 2014 at 6:50:55 PM Patrick Wendell pwend...@gmail.com wrote: Hey Ryan, A few more things here. You should feel free to send patches to Jenkins to test them, since

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
larger than for snapshot docs). However we could just add /snapshot and publish docs there. - Patrick On Sun, Nov 30, 2014 at 6:15 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Ryan, The existing JIRA also covers publishing nightly docs: https://issues.apache.org/jira/browse/SPARK-1517

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-29 Thread Patrick Wendell
Patrick, unfortunately you got some of the text here wrong, saying 1.1.0 instead of 1.2.0. Not sure it will matter since there can well be another RC after testing, but we should be careful. Matei On Nov 28, 2014, at 9:16 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing

Re: Trouble testing after updating to latest master

2014-11-29 Thread Patrick Wendell
Thanks for reporting this. One thing to try is to just do a git clean to make sure you have a totally clean working space (git clean -fdx will blow away any differences you have from the repo, of course only do that if you don't have other files around). Can you reproduce this if you just run

[VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Patrick Wendell
+1 (binding). Don't see any evidence of regressions at this point. The issue reported by Hector was not related to this rlease. On Sun, Nov 23, 2014 at 9:50 AM, Debasish Das debasish.da...@gmail.com wrote: -1 from me...same FetchFailed issue as what Hector saw... I am running Netflix dataset

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-23 Thread Patrick Wendell
Hey Stephen, Thanks for bringing this up. Technically when we call a release vote it needs to be on the exact commit that will be the final release. However, one thing I've thought of doing for a while would be to publish the maven artifacts using a version tag with $VERSION-rcX even if the

Re: Notes on writing complex spark applications

2014-11-23 Thread Patrick Wendell
Hey Evan, It might be nice to merge this into existing documentation. In particular, a lot of this could serve to update the current tuning section and programming guides. It could also work to paste this wholesale as a reference for Spark users, but in that case it's less likely to get updated

Re: How spark and hive integrate in long term?

2014-11-22 Thread Patrick Wendell
There are two distinct topics when it comes to hive integration. Part of the 1.3 roadmap will likely be better defining the plan for Hive integration as Hive adds future versions. 1. Ability to interact with Hive metastore's from different versions == I.e. if a user has a metastore, can Spark SQL

Re: Apache infra github sync down

2014-11-22 Thread Patrick Wendell
Hi All, Unfortunately this went back down again. I've opened a new JIRA to track it: https://issues.apache.org/jira/browse/INFRA-8688 - Patrick On Tue, Nov 18, 2014 at 10:24 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, The Apache--github mirroring is not working right now

Automated github closing of issues is not working

2014-11-21 Thread Patrick Wendell
After we merge pull requests in Spark they are closed via a special message we put in each commit description (Closes #XXX). This feature stopped working around 21 hours ago causing already-merged pull requests to display as open. I've contacted Github support with the issue. No word from them

Spark development with IntelliJ

2014-11-20 Thread Patrick Wendell
Hi All, I noticed people sometimes struggle to get Spark set up in IntelliJ. I'd like to maintain comprehensive instructions on our Wiki to make this seamless for future developers. Due to some nuances of our build, getting to the point where you can build + test every module from within the IDE

Build break

2014-11-19 Thread Patrick Wendell
Hey All, Just a heads up. I merged this patch last night which caused the Spark build to break: https://github.com/apache/spark/commit/397d3aae5bde96b01b4968dde048b6898bb6c914 The patch itself was fine and previously had passed on Jenkins. The issue was that other intermediate changes merged

Re: [VOTE] Release Apache Spark 1.1.1 (RC1)

2014-11-17 Thread Patrick Wendell
Hey Kevin, If you are upgrading from 1.0.X to 1.1.X checkout the upgrade notes here [1] - it could be that default changes caused a regression for your workload. Do you still see a regression if you restore the configuration changes? It's great to hear specifically about issues like this, so

Re: mvn or sbt for studying and developing Spark?

2014-11-16 Thread Patrick Wendell
Neither is strictly optimal which is why we ended up supporting both. Our reference build for packaging is Maven so you are less likely to run into unexpected dependency issues, etc. Many developers use sbt as well. It's somewhat religion and the best thing might be to try both and see which you

Re: Has anyone else observed this build break?

2014-11-15 Thread Patrick Wendell
Server VM (build 24.60-b09, mixed mode) Let me see if the problem can be solved upstream in HBase hbase-annotations module. Cheers On Fri, Nov 14, 2014 at 12:32 PM, Patrick Wendell pwend...@gmail.com wrote: I think in this case we can probably just drop that dependency, so there is a simpler

Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
A recent patch broke clean builds for me, I am trying to see how widespread this issue is and whether we need to revert the patch. The error I've seen is this when building the examples project: spark-examples_2.10: Could not resolve dependencies for project

Re: Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
A work around for this fix is identified here: http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html However, if this affects more users I'd prefer to just fix it properly in our build. On Fri, Nov 14, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com wrote

Re: Has anyone else observed this build break?

2014-11-14 Thread Patrick Wendell
this can fix it? Thanks, Hari On Fri, Nov 14, 2014 at 12:21 PM, Patrick Wendell pwend...@gmail.com wrote: A work around for this fix is identified here: http://dbknickerbocker.blogspot.com/2013/04/simple-fix-to-missing-toolsjar-in-jdk.html However, if this affects more users I'd prefer

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
, IntelliJ will temporarily think things like the Kafka module are being removed. Say 'no' when it asks if you want to remove them. - Can we go straight to Scala 2.11.4? On Wed, Nov 12, 2014 at 5:47 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, I've just merged a patch that adds

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
scrapco...@gmail.com wrote: One thing we can do it is print a helpful error and break. I don't know about how this can be done, but since now I can write groovy inside maven build so we have more control. (Yay!!) Prashant Sharma On Thu, Nov 13, 2014 at 12:05 PM, Patrick Wendell pwend

Re: [NOTICE] [BUILD] Minor changes to Spark's build

2014-11-12 Thread Patrick Wendell
: Currently there are no mandatory profiles required to build Spark. I.e. mvn package just works. It seems sad that we would need to break this. On Wed, Nov 12, 2014 at 10:59 PM, Patrick Wendell pwend...@gmail.com wrote: I think printing an error that says -Pscala-2.10 must be enabled

Re: JIRA + PR backlog

2014-11-11 Thread Patrick Wendell
I wonder if we should be linking to that dashboard somewhere from our official docs or the wiki... On Tue, Nov 11, 2014 at 12:23 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, kudos to Josh for putting that together. On Tue, Nov 11, 2014 at 3:26 AM, Yu Ishikawa

[NOTICE] [BUILD] Minor changes to Spark's build

2014-11-11 Thread Patrick Wendell
Hey All, I've just merged a patch that adds support for Scala 2.11 which will have some minor implications for the build. These are due to the complexities of supporting two versions of Scala in a single project. 1. The JDBC server will now require a special flag to build -Phive-thriftserver on

Re: Should new YARN shuffle service work with yarn-alpha?

2014-11-08 Thread Patrick Wendell
refactoring needed? Either to support YARN alpha as a separate shuffle module, or sever this dependency? Of course this goes away when yarn-alpha goes away too. On Sat, Nov 8, 2014 at 7:45 AM, Patrick Wendell pwend...@gmail.com wrote: I bet it doesn't work. +1 on isolating it's inclusion

Re: Should new YARN shuffle service work with yarn-alpha?

2014-11-08 Thread Patrick Wendell
. That makes yarn-alpha work. I'll run tests and open a quick JIRA / PR for the change. On Sat, Nov 8, 2014 at 8:23 AM, Patrick Wendell pwend...@gmail.com wrote: This second error is something else. Maybe you are excluding network-shuffle instead of spark-network-yarn

Re: Should new YARN shuffle service work with yarn-alpha?

2014-11-07 Thread Patrick Wendell
I bet it doesn't work. +1 on isolating it's inclusion to only the newer YARN API's. - Patrick On Fri, Nov 7, 2014 at 11:43 PM, Sean Owen so...@cloudera.com wrote: I noticed that this doesn't compile: mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package [error]

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
I think new committers might or might not be maintainers (it would depend on the PMC vote). I don't think it would affect what you could merge, you can merge in any part of the source tree, you just need to get sign off if you want to touch a public API or make major architectural changes. Most

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
Hey Greg, Regarding subversion - I think the reference is to partial vs full committers here: https://subversion.apache.org/docs/community-guide/roles.html - Patrick On Thu, Nov 6, 2014 at 4:18 PM, Greg Stein gst...@gmail.com wrote: -1 (non-binding) This is an idea that runs COMPLETELY

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Patrick Wendell
In fact, if you look at the subversion commiter list, the majority of people here have commit access only for particular areas of the project: http://svn.apache.org/repos/asf/subversion/trunk/COMMITTERS On Thu, Nov 6, 2014 at 4:26 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Greg

Re: [VOTE] Designating maintainers for some Spark components

2014-11-05 Thread Patrick Wendell
I'm a +1 on this as well, I think it will be a useful model as we scale the project in the future and recognizes some informal process we have now. To respond to Sandy's comment: for changes that fall in between the component boundaries or are straightforward, my understanding of this model is

branch-1.2 has been cut

2014-11-03 Thread Patrick Wendell
Hi All, I've just cut the release branch for Spark 1.2, consistent with then end of the scheduled feature window for the release. New commits to master will need to be explicitly merged into branch-1.2 in order to be in the release. This begins the transition into a QA period for Spark 1.2, with

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
Does this happen if you clean and recompile? I've seen failures on and off, but haven't been able to find one that I could reproduce from a clean build such that we could hand it to the scala team. - Patrick On Sun, Nov 2, 2014 at 7:25 PM, Imran Rashid im...@therashids.com wrote: I'm finding

Re: sbt scala compiler crashes on spark-sql

2014-11-02 Thread Patrick Wendell
versa. A clean rebuild can always solve this. On Mon, Nov 3, 2014 at 11:28 AM, Patrick Wendell pwend...@gmail.com wrote: Does this happen if you clean and recompile? I've seen failures on and off, but haven't been able to find one that I could reproduce from a clean build such that we

Changes to Spark's networking subsystem

2014-11-01 Thread Patrick Wendell
== Short version == A recent commit replaces Spark's networking subsystem with one based on Netty rather than raw sockets. Users running off of master can disable this change by setting spark.shuffle.blockTransferService=nio. We will be testing with this during the QA period for Spark 1.2. The new

Re: Surprising Spark SQL benchmark

2014-10-31 Thread Patrick Wendell
Hey Nick, Unfortunately Citus Data didn't contact any of the Spark or Spark SQL developers when running this. It is really easy to make one system look better than others when you are running a benchmark yourself because tuning and sizing can lead to a 10X performance improvement. This benchmark

Re: How to run tests properly?

2014-10-30 Thread Patrick Wendell
packaging would only build the jar and place it in the target folder. How does that affect the tests? If tests depend on the assembly a mvn install would be more sensible to me. Probably I misunderstand the maven build life-cycle. Thanks, Niklas On 29.10.2014 19:01, Patrick Wendell wrote: One

Re: How to run tests properly?

2014-10-29 Thread Patrick Wendell
One thing is you need to do a maven package before you run tests. The local-cluster tests depend on Spark already being packaged. - Patrick On Wed, Oct 29, 2014 at 10:02 AM, Niklas Wilcke 1wil...@informatik.uni-hamburg.de wrote: Hi Sean, thanks for your reply. The tests still don't work. I

Re: Support Hive 0.13 .1 in Spark SQL

2014-10-28 Thread Patrick Wendell
Hey Cheng, Right now we aren't using stable API's to communicate with the Hive Metastore. We didn't want to drop support for Hive 0.12 so right now we are using a shim layer to support compiling for 0.12 and 0.13. This is very costly to maintain. If Hive has a stable meta-data API for talking to

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional instances of this due to supporting scala 2.11

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
. Anyone who knows how to handle this - a quick note here would be appreciated. 2014-10-28 20:20 GMT-07:00 Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
Patrick Wendell pwend...@gmail.com: Hey Stephen, In some cases in the maven build we now have pluggable source directories based on profiles using the maven build helper plug-in. This is necessary to support cross building against different Hive versions, and there will be additional

<    1   2   3   4   5   6   >