Re: Tentative due dates for Spark 1.3.2 release

2015-05-15 Thread Patrick Wendell
Hi Niranda, Maintenance releases are not done on a predetermined schedule but instead according to which fixes show up and their severity. Since we just did a 1.3.1 release I'm not sure I see 1.3.2 on the immediate horizon. However, the maintenance releases are simply builds at the head of the

Re: Adding/Using More Resolution Types on JIRA

2015-05-15 Thread Patrick Wendell
it mean too many TODOs are filed and forgotten? That's no comment on the current state, just something to watch. So: yes I like the idea. On May 12, 2015 8:50 AM, Patrick Wendell pwend...@gmail.com wrote: In Spark we sometimes close issues as something other than Fixed

Re: Recent Spark test failures

2015-05-15 Thread Patrick Wendell
The PR builder currently builds against Hadoop 2.3. - Patrick On Fri, May 15, 2015 at 11:40 AM, Marcelo Vanzin van...@cloudera.com wrote: Funny thing, since I asked this question in a PR a few minutes ago... Ignoring the rotation suggestion for a second, can the PR builder at least cover

Re: Recent Spark test failures

2015-05-15 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: The PR builder currently builds against Hadoop 2.3. - Patrick On Fri, May 15, 2015 at 11:40 AM, Marcelo Vanzin van...@cloudera.com wrote: Funny thing, since I asked this question in a PR a few minutes ago... Ignoring the rotation suggestion for a second

Re: How to link code pull request with JIRA ID?

2015-05-14 Thread Patrick Wendell
Yeah I wrote the original script and I intentionally made it easy for other projects to use (you'll just need to tweak some variables at the top). You just need somewhere to run it... we were using a jenkins cluster to run it every 5 minutes. BTW - I looked and there is one instance where it hard

Re: [IMPORTANT] Committers please update merge script

2015-05-13 Thread Patrick Wendell
Hi All - unfortunately the fix introduced another bug, which is that fixVersion was not updated properly. I've updated the script and had one other person test it. So committers please pull from master again thanks! - Patrick On Tue, May 12, 2015 at 6:25 PM, Patrick Wendell pwend...@gmail.com

Re: Change for submitting to yarn in 1.3.1

2015-05-13 Thread Patrick Wendell
Sent from my iPad On May 12, 2015, at 20:54, Patrick Wendell pwend...@gmail.com wrote: Hey Kevin and Ron, So is the main shortcoming of the launcher library the inability to get an app ID back from YARN? Or are there other issues here that fundamentally regress things for you

Adding/Using More Resolution Types on JIRA

2015-05-12 Thread Patrick Wendell
In Spark we sometimes close issues as something other than Fixed, and this is an important part of maintaining our JIRA. The current resolution types we use are the following: Won't Fix - bug fix or (more often) feature we don't want to add Invalid - issue is underspecified or not appropriate

[IMPORTANT] Committers please update merge script

2015-05-12 Thread Patrick Wendell
Due to an ASF infrastructure change (bug?) [1] the default JIRA resolution status has switched to Pending Closed. I've made a change to our merge script to coerce the correct status of Fixed when resolving [2]. Please upgrade the merge script to master. I've manually corrected JIRA's that were

Re: [discuss] ending support for Java 6?

2015-05-05 Thread Patrick Wendell
If there is broad consensus here to drop Java 1.6 in Spark 1.5, should we do an ANNOUNCE to user and dev? On Mon, May 4, 2015 at 7:24 PM, shane knapp skn...@berkeley.edu wrote: sgtm On Mon, May 4, 2015 at 11:23 AM, Patrick Wendell pwend...@gmail.com wrote: If we just set JAVA_HOME in dev/run

Pull request builder errors (taking Jenkins worker 3 offline)

2015-05-05 Thread Patrick Wendell
For unknown reasons, pull requests on Jenkins worker 3 have been failing with an exception[1]. After trying to fix this by clearing the ivy and maven caches on the node, I've given up and simply blacklisted that worker. [error] oro#oro;2.0.8!oro.jar origin location must be absolute:

[ANNOUNCE] Spark branch-1.4

2015-05-04 Thread Patrick Wendell
Hi Devs, Just an announcement that I've cut Spark's branch 1.4 to form the basis of the 1.4 release. Other than a few stragglers, this represents the end of active feature development for Spark 1.4. Per usual, if committers are merging any features, please be in touch so I can help coordinate.

Re: [discuss] ending support for Java 6?

2015-05-04 Thread Patrick Wendell
If we just set JAVA_HOME in dev/run-test-jenkins, I think it should work. On Mon, May 4, 2015 at 7:20 PM, shane knapp skn...@berkeley.edu wrote: ...and now the workers all have java6 installed. https://issues.apache.org/jira/browse/SPARK-1437 sadly, the built-in jenkins jdk management

Thanking Test Partners

2015-05-04 Thread Patrick Wendell
Hey All, Community testing during the QA window is an important part of the release cycle in Spark. It helps us deliver higher quality releases by vetting out issues not covered by our unit tests. I was thinking that from now on, it would be nice to recognize the organizations that donate time

Re: What is the location in the source code of the computation of the elements in a map transformation?

2015-05-02 Thread Patrick Wendell
Maybe I can help a bit. What happens when you call .map(my func) is that you create a MapPartitionsRDD that has a reference to that closure in its compute() function. When a job is run (jobs are run as the result of RDD actions):
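
The first half of that description (map() only records the closure, and an action later drives compute()) can be sketched with a toy model. This is not Spark's code: ToyRDD, ToyMapPartitionsRDD and their methods are hypothetical names invented for illustration, and real RDDs add scheduling, serialization and much more.

```python
from typing import Callable, Iterator, List


class ToyRDD:
    """Toy stand-in for an RDD: a list of partitions, each a list of items."""

    def __init__(self, partitions: List[list]):
        self.partitions = partitions

    def num_partitions(self) -> int:
        return len(self.partitions)

    def compute(self, index: int) -> Iterator:
        # The base RDD simply reads its stored partition.
        return iter(self.partitions[index])

    def map(self, func: Callable) -> "ToyMapPartitionsRDD":
        # map() does no work yet; it only wraps the closure in a new RDD.
        return ToyMapPartitionsRDD(self, lambda it: (func(x) for x in it))

    def collect(self) -> list:
        # An action: only now is compute() called, once per partition.
        out = []
        for i in range(self.num_partitions()):
            out.extend(self.compute(i))
        return out


class ToyMapPartitionsRDD(ToyRDD):
    """Holds a reference to the parent RDD and to the user's closure."""

    def __init__(self, parent: ToyRDD, partition_func: Callable[[Iterator], Iterator]):
        self.parent = parent
        self.partition_func = partition_func

    def num_partitions(self) -> int:
        return self.parent.num_partitions()

    def compute(self, index: int) -> Iterator:
        # Apply the stored closure to the parent's iterator for this partition.
        return self.partition_func(self.parent.compute(index))
```

In this sketch, ToyRDD([[1, 2], [3]]).map(f) returns immediately without calling f; f runs only when collect() walks the partitions.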

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Patrick Wendell
I'd also support this. In general, I think it's good that we try to have Spark support different versions of things (Hadoop, Hive, etc). But at some point you need to weigh the costs of doing so against the number of users affected. In the case of Java 6, we are seeing increasing cost from this.

Re: Mima test failure in the master branch?

2015-04-30 Thread Patrick Wendell
I reverted the patch that I think was causing this: SPARK-5213 Thanks On Thu, Apr 30, 2015 at 7:59 PM, zhazhan zzh...@hortonworks.com wrote: Any PR open for this? -- View this message in context:

Re: WebUI shows poor locality when task scheduling

2015-04-26 Thread Patrick Wendell
Hi Eric - please direct this to the user@ list. This list is for development of Spark itself. On Sun, Apr 26, 2015 at 1:12 AM, eric wong win19...@gmail.com wrote: Hi developers, I have sent to user mail list but no response... When running an experimental KMeans job for experiment, the

Re: Design docs: consolidation and discoverability

2015-04-26 Thread Patrick Wendell
are on Google Docs. Perhaps Apache should consider opening up parts of the wiki to a larger group, to better serve this use case. Punya On Fri, Apr 24, 2015 at 5:01 PM Patrick Wendell pwend...@gmail.com wrote: Using our ASF git repository as a working area for design docs, it seems potentially

Reminder about Spark 1.4.0 deadline of May 1st

2015-04-25 Thread Patrick Wendell
Hey All, Just a friendly reminder that May 1st is the feature freeze for Spark 1.4, meaning major outstanding changes will need to land in the next week. After May 1st we'll package a release for testing and then go into the normal triage process where bugs are prioritized and some smaller

Re: Contributing Documentation Changes

2015-04-25 Thread Patrick Wendell
It is true that in the past we've posted community tutorials on the site. Spark has grown a lot since then and it might be a better fit at this point to curate community tutorials on the wiki (something like the powered by page) and link to them from the documentation website. The documentation

Re: Should we let everyone set Assignee?

2015-04-24 Thread Patrick Wendell
It's a bit of a digression - but Steve's suggestion that we have a mailing list for new issues is a great idea and we can do it easily. We could have new-issues@s.a.o or something (we already have issues@s.a.o). - Patrick On Fri, Apr 24, 2015 at 9:50 AM, Ted Yu yuzhih...@gmail.com wrote: bq.

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Patrick Wendell
Using our ASF git repository as a working area for design docs, it seems potentially concerning to me. It's difficult process wise because all commits need to go through committers and also, we'd pollute our git history a lot with random incremental design updates. The git history is used a lot

Re: Should we let everyone set Assignee?

2015-04-22 Thread Patrick Wendell
One overarching issue is that it's pretty unclear what Assigned to X in JIRA means from a process perspective. Personally I actually feel it's better for this to be more historical - i.e. who ended up submitting a patch for this feature that was merged - rather than creating an exclusive

Re: Should we let everyone set Assignee?

2015-04-22 Thread Patrick Wendell
: Agreed. The Spark project and community that Vinod describes do not resemble the ones with which I am familiar. On Wed, Apr 22, 2015 at 1:20 PM, Patrick Wendell pwend...@gmail.com wrote: Hi Vinod, Thanks for your thoughts - However, I do not agree with your sentiment

Re: Should we let everyone set Assignee?

2015-04-22 Thread Patrick Wendell
at Apache. +Vinod On Apr 22, 2015, at 12:32 PM, Patrick Wendell pwend...@gmail.com wrote: One overarching issue is that it's pretty unclear what Assigned to X in JIRA means from a process perspective. Personally I actually feel it's better for this to be more historical - i.e. who ended up

Re: Spark 1.2.2 prebuilt release for Hadoop 2.4 didn't get deployed

2015-04-21 Thread Patrick Wendell
Good catch Olivier - I'll take care of it. Tracking this on SPARK-7027. On Tue, Apr 21, 2015 at 6:06 AM, Olivier Girardot o.girar...@lateral-thoughts.com wrote: Hi everyone, It seems that some of the Spark 1.2.2 prebuilt versions (I tested mainly for Hadoop 2.4 and later) didn't get deployed on

Re: Is spark-ec2 for production use?

2015-04-21 Thread Patrick Wendell
It could be a good idea to document this a bit. The original goals were to give people an easy way to get started with Spark and also to provide a consistent environment for our own experiments and benchmarking of Spark at the AMPLab. Over time I've noticed a huge amount of scope increase in terms

Announcing Spark 1.3.1 and 1.2.2

2015-04-17 Thread Patrick Wendell
Hi All, I'm happy to announce the Spark 1.3.1 and 1.2.2 maintenance releases. We recommend all users on the 1.3 and 1.2 Spark branches upgrade to these releases, which contain several important bug fixes. Download Spark 1.3.1 or 1.2.2: http://spark.apache.org/downloads.html Release notes:

[RESULT] [VOTE] Release Apache Spark 1.2.2

2015-04-16 Thread Patrick Wendell
on OS X +1 Sean On Apr 14, 2015, at 10:59 PM, Patrick Wendell pwend...@gmail.com wrote: I'd like to close this vote to coincide with the 1.3.1 release, however, it would be great to have more people test this release first. I'll leave it open for a bit longer and see if others can give

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-14 Thread Patrick Wendell
I'd like to close this vote to coincide with the 1.3.1 release, however, it would be great to have more people test this release first. I'll leave it open for a bit longer and see if others can give a +1. On Tue, Apr 14, 2015 at 9:55 PM, Patrick Wendell pwend...@gmail.com wrote: +1 from me ass

Re: [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-14 Thread Patrick Wendell
+1 from myself as well On Mon, Apr 13, 2015 at 8:35 PM, GuoQiang Li wi...@qq.com wrote: +1 (non-binding) -- Original -- From: Patrick Wendell;pwend...@gmail.com; Date: Sat, Apr 11, 2015 02:05 PM To: dev@spark.apache.orgdev@spark.apache.org; Subject

[RESULT] [VOTE] Release Apache Spark 1.3.1 (RC3)

2015-04-14 Thread Patrick Wendell
This vote passes with 10 +1 votes (5 binding) and no 0 or -1 votes. +1: Sean Owen* Reynold Xin* Krishna Sankar Denny Lee Mark Hamstra* Sean McNamara* Sree V Marcelo Vanzin GuoQiang Li Patrick Wendell* 0: -1: I will work on packaging this release in the next 48 hours. - Patrick

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-14 Thread Patrick Wendell
,1/14/15 SPARK-4888,Spark EC2 doesn't mount local disks for i2.8xlarge instances,,Open,1/27/15 SPARK-4879,Missing output partitions after job completes with speculative execution,Josh Rosen,Open,3/5/15 SPARK-4568,Publish release candidates under $VERSION-RCX instead of $VERSION,Patrick Wendell

Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-08 Thread Patrick Wendell
spark on yarn against hadoop 2.6. Tom On Wednesday, April 8, 2015 6:15 AM, Sean Owen so...@cloudera.com wrote: Still a +1 from me; same result (except that now of course the UISeleniumSuite test does not fail) On Wed, Apr 8, 2015 at 1:46 AM, Patrick Wendell pwend...@gmail.com

Re: [VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-08 Thread Patrick Wendell
:30 Patrick Wendell pwend...@gmail.com wrote: Hey Denny, I believe the 2.4 bits are there. The 2.6 bits I had done specially (we haven't merged that into our upstream build script). I'll do it again now for RC2. - Patrick On Wed, Apr 8, 2015 at 1:53 PM, Timothy Chen tnac...@gmail.com wrote

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Patrick Wendell
) Ran standalone and yarn tests on the hadoop-2.6 tarball, with and without the external shuffle service in yarn mode. On Sat, Apr 4, 2015 at 5:09 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.1! The tag to be voted

[RESULT] [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Patrick Wendell
to 1.3.x. - Josh Sent from my phone On Apr 7, 2015, at 4:13 PM, Patrick Wendell pwend...@gmail.com wrote: Hey All, Today SPARK-6737 came to my attention. This is a bug that causes a memory leak for any long running program that repeatedly saves data out to a Hadoop FileSystem

[VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-07 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.1! The tag to be voted on is v1.3.1-rc2 (commit 7c4473a): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=7c4473aa5a7f5de0323394aaedeefbf9738e8eb5 The list of fixes present in this release can be found

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-06 Thread Patrick Wendell
,,Open,3/24/15 SPARK-5098,Number of running tasks become negative after tasks lost,,Open,1/14/15 SPARK-4925,Publish Spark SQL hive-thriftserver maven artifact,Patrick Wendell,Reopened,3/23/15 SPARK-4922,Support dynamic allocation for coarse-grained Mesos,,Open,3/31/15 SPARK-4888,Spark EC2

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
as possible. On Mon, Apr 6, 2015 at 7:31 PM Patrick Wendell pwend...@gmail.com wrote: What if you don't run zinc? I.e. just download maven and run that mvn package It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb sp...@mjhb.com wrote: Similar

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
attempt. Trying to build as clean as possible. On Mon, Apr 6, 2015 at 7:31 PM Patrick Wendell pwend...@gmail.com wrote: What if you don't run zinc? I.e. just download maven and run that mvn package It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
What if you don't run zinc? I.e. just download maven and run that mvn package It might take longer, but I wonder if it will work. On Mon, Apr 6, 2015 at 10:26 PM, mjhb sp...@mjhb.com wrote: Similar problem on 1.2 branch: [ERROR] Failed to execute goal on project spark-core_2.11: Could not

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
The only thing that can persist outside of Spark is if there is still a live Zinc process. We took care to make sure this was a generally stateless mechanism. Both the 1.2.X and 1.3.X releases are built with Scala 2.11 for packaging purposes. And these have been built as recently as in the last

Re: 1.3 Build Error with Scala-2.11

2015-04-06 Thread Patrick Wendell
Hmm.. Make sure you are building with the right flags. I think you need to pass -Dscala-2.11 to maven. Take a look at the upstream docs - on my phone now so can't easily access. On Apr 7, 2015 1:01 AM, mjhb sp...@mjhb.com wrote: I even deleted my local maven repository (.m2) but still stuck

Re: Unit test logs in Jenkins?

2015-04-01 Thread Patrick Wendell
Hey Marcelo, Great question. Right now, some of the more active developers have an account that allows them to log into this cluster to inspect logs (we copy the logs from each run to a node on that cluster). The infrastructure is maintained by the AMPLab. I will put you in touch with someone

Re: hadoop input/output format advanced control

2015-03-25 Thread Patrick Wendell
and pullreq when i have some time. On Wed, Mar 25, 2015 at 1:23 AM, Patrick Wendell pwend...@gmail.com wrote: I see - if you look, in the saving functions we have the option for the user to pass an arbitrary Configuration. https://github.com/apache/spark/blob/master/core/src/main/scala/org

Re: hadoop input/output format advanced control

2015-03-25 Thread Patrick Wendell
Great - that's even easier. Maybe we could have a simple example in the doc. On Wed, Mar 25, 2015 at 7:06 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Regarding Patrick's question, you can just do new Configuration(oldConf) to get a cloned Configuration object and add any new properties to it.

Re: Any guidance on when to back port and how far?

2015-03-24 Thread Patrick Wendell
My philosophy has been basically what you suggested, Sean. One thing you didn't mention though is if a bug fix seems complicated, I will think very hard before back-porting it. This is because fixes can introduce their own new bugs, in some cases worse than the original issue. It's really bad to

Re: hadoop input/output format advanced control

2015-03-24 Thread Patrick Wendell
Yeah - to Nick's point, I think the way to do this is to pass in a custom conf when you create a Hadoop RDD (that's AFAIK why the conf field is there). Is there anything you can't do with that feature? On Tue, Mar 24, 2015 at 11:50 AM, Nick Pentreath nick.pentre...@gmail.com wrote: Imran, on

Experience using binary packages on various Hadoop distros

2015-03-24 Thread Patrick Wendell
Hey All, For a while we've published binary packages with different Hadoop clients pre-bundled. We currently have three interfaces to a Hadoop cluster (a) the HDFS client (b) the YARN client (c) the Hive client. Because (a) and (b) are supposed to be backwards compatible interfaces. My working

Re: enum-like types in Spark

2015-03-23 Thread Patrick Wendell
If the official solution from the Scala community is to use Java enums, then it seems strange they aren't generated in scaladoc? Maybe we can just fix that w/ Typesafe's help and then we can use them. On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen so...@cloudera.com wrote: Yeah the fully realized #4,

Re: enum-like types in Spark

2015-03-16 Thread Patrick Wendell
{ private[this] case object _MemoryOnly extends StorageLevel final val MemoryOnly: StorageLevel = _MemoryOnly private[this] case object _DiskOnly extends StorageLevel final val DiskOnly: StorageLevel = _DiskOnly } On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell pwend...@gmail.com wrote: I

Re: Wrong version on the Spark documentation page

2015-03-15 Thread Patrick Wendell
Cheng - what if you hold shift+refresh? For me the /latest link correctly points to 1.3.0 On Sun, Mar 15, 2015 at 10:40 AM, Cheng Lian lian.cs@gmail.com wrote: It's still marked as 1.2.1 here http://spark.apache.org/docs/latest/ But this page is updated (1.3.0)

[ANNOUNCE] Announcing Spark 1.3!

2015-03-13 Thread Patrick Wendell
Hi All, I'm happy to announce the availability of Spark 1.3.0! Spark 1.3.0 is the fourth release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! Visit the release notes [1] to read about the new features, or

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Patrick Wendell
on yarn on hadoop 2.6 in cluster and client mode. Tom On Thursday, March 5, 2015 8:53 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4): https://git-wip

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-08 Thread Patrick Wendell
We probably want to revisit the way we do binaries in general for 1.4+. IMO, something worth forking a separate thread for. I've been hesitating to add new binaries because people (understandably) complain if you ever stop packaging older ones, but on the other hand the ASF has complained that we

Re: Block Transfer Service encryption support

2015-03-08 Thread Patrick Wendell
I think that yes, longer term we want to have encryption of all communicated data. However Jeff, can you open a JIRA to discuss the design before opening a pull request (it's fine to link to a WIP branch if you'd like)? I'd like to better understand the performance and operational complexity of

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-08 Thread Patrick Wendell
. There the vendors can add the latest downloads - for example when 1.4 is released, HDP can build a release of HDP Spark 1.4 bundle. Cheers k/ On Sun, Mar 8, 2015 at 2:11 PM, Patrick Wendell pwend...@gmail.com wrote: We probably want to revisit the way we do binaries in general for 1.4+. IMO, something

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
Hey Sean, SPARK-5310 Update SQL programming guide for 1.3 SPARK-5183 Document data source API SPARK-6128 Update Spark Streaming Guide for Spark 1.3 For these, the issue is that they are documentation JIRA's, which don't need to be timed exactly with the release vote, since we can update the

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
affects a subset of build profiles. On Fri, Mar 6, 2015 at 6:43 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, SPARK-5310 Update SQL programming guide for 1.3 SPARK-5183 Document data source API SPARK-6128 Update Spark Streaming Guide for Spark 1.3 For these, the issue

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
, Mar 6, 2015 at 9:17 PM, Patrick Wendell pwend...@gmail.com wrote: Sean, The docs are distributed and consumed in a fundamentally different way than Spark code itself. So we've always considered the deadline for doc changes to be when the release is finally posted. If there are small

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-06 Thread Patrick Wendell
I'll kick it off with a +1. On Thu, Mar 5, 2015 at 6:52 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 4aaf48d4): https://git-wip-us.apache.org/repos/asf?p=spark.git

Re: enum-like types in Spark

2015-03-05 Thread Patrick Wendell
= _MemoryOnly private[this] case object _DiskOnly extends StorageLevel final val DiskOnly: StorageLevel = _DiskOnly } On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell pwend...@gmail.com wrote: I like #4 as well and agree with Aaron's suggestion. - Patrick On Wed, Mar 4, 2015

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-04 Thread Patrick Wendell
consider https://issues.apache.org/jira/browse/SPARK-6144 a serious regression from 1.2 (since it affects existing addFile() functionality if the URL is hdfs:...). Will test other parts separately. On Tue, Mar 3, 2015 at 8:19 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Patrick Wendell
Hey Mingyu, I think it's broken out separately so we can record the time taken to serialize the result. Once we serialize it once, the second serialization should be really simple since it's just wrapping something that has already been turned into a byte buffer. Do you see a specific issue

Re: enum-like types in Spark

2015-03-04 Thread Patrick Wendell
I like #4 as well and agree with Aaron's suggestion. - Patrick On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson ilike...@gmail.com wrote: I'm cool with #4 as well, but make sure we dictate that the values should be defined within an object with the same name as the enumeration (like we do for

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Patrick Wendell
for the serialized task result shouldn't account for the majority of memory footprint anyways, I'm okay with leaving it as is, then. Thanks, Mingyu On 3/4/15, 5:07 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Mingyu, I think it's broken out separately so we can record the time taken

[VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 3af2687): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3af26870e5163438868c4eb2df88380a533bb232 The release files, including signatures, digests, etc.

[RESULT] [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-03-03 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: Hey All, Just a quick update on this thread. Issues have continued to trickle in. Not all of them are blocker level but enough to warrant another RC: I've been keeping the JIRA dashboard up and running with the latest status (sorry, long link): https

Re: spark-ec2 default to Hadoop 2

2015-03-01 Thread Patrick Wendell
Yeah calling it Hadoop 2 was a very bad naming choice (of mine!), this was back when CDH4 was the only real distribution available with some of the newer Hadoop API's and packaging. I think to not surprise people using this, it's best to keep v1 as the default. Overall, we try not to change

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Patrick Wendell
This has been around for multiple versions of Spark, so I am a bit surprised to see it not working in your build. - Patrick On Wed, Feb 25, 2015 at 9:41 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Cody, What build command are you using? In any case, we can actually comment out

Re: UnusedStubClass in 1.3.0-rc1

2015-02-25 Thread Patrick Wendell
Hey Cody, What build command are you using? In any case, we can actually comment out the unused thing now in the root pom.xml. It existed just to ensure that at least one dependency was listed in the shade plugin configuration (otherwise, some work we do that requires the shade plugin does not

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-25 Thread Patrick Wendell
:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) any ideas on this? Tom On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Patrick Wendell
So actually, the list of blockers on JIRA is a bit outdated. These days I won't cut RC1 unless there are no known issues that I'm aware of that would actually block the release (that's what the snapshot ones are for). I'm going to clean those up and push others to do so also. The main issues I'm

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: So actually, the list of blockers on JIRA is a bit outdated. These days I won't cut RC1 unless there are no known issues that I'm aware of that would actually block the release (that's what the snapshot ones are for). I'm going to clean those up and push

Merging code into branch 1.3

2015-02-18 Thread Patrick Wendell
Hey Committers, Now that Spark 1.3 rc1 is cut, please restrict branch-1.3 merges to the following: 1. Fixes for issues blocking the 1.3 release (i.e. 1.2.X regressions) 2. Documentation and tests. 3. Fixes for non-blocker issues that are surgical, low-risk, and/or outside of the core. If there

[VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-18 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2 The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-18 Thread Patrick Wendell
UISeleniumSuite: *** RUN ABORTED *** java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal ... This is a newer test suite. There is something flaky about it, we should definitely fix it, IMO it's not a blocker though. Patrick this link gives a 404:

Re: [Performance] Possible regression in rdd.take()?

2015-02-18 Thread Patrick Wendell
I believe the heuristic governing the way that take() decides to fetch partitions changed between these versions. It could be that in certain cases the new heuristic is worse, but it might be good to just look at the source code and see, for your number of elements taken and number of partitions,
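
The kind of heuristic being discussed can be sketched generically. The message does not say what either version's heuristic actually is, so everything specific below is an assumption made for illustration: the function name, starting with a single partition, and the scale_up_factor of 4 are hypothetical, and this is not Spark's implementation. The point is only that how fast the scan widens determines how many partitions (and how many rounds) a given take(n) costs.

```python
def take(partitions, n, scale_up_factor=4):
    """Return the first n elements, scanning partitions in growing batches.

    Also returns how many partitions were scanned and how many rounds
    (batches) were needed, so different growth factors can be compared.
    """
    results = []
    scanned = 0      # number of partitions looked at so far
    batches = 0      # number of rounds launched
    batch_size = 1   # assumed starting point: look at one partition first
    while len(results) < n and scanned < len(partitions):
        # Every partition in the batch counts as scanned, even if the
        # earlier ones in the same batch already supplied enough elements.
        for part in partitions[scanned:scanned + batch_size]:
            results.extend(part[: n - len(results)])
        scanned = min(scanned + batch_size, len(partitions))
        batches += 1
        batch_size *= scale_up_factor  # widen the next round
    return results, scanned, batches
```

Running this with your own partition sizes and n, and with a few values of scale_up_factor, shows the trade-off: a small factor means more rounds, a large one means touching more partitions than needed.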

Re: Replacing Jetty with TomCat

2015-02-17 Thread Patrick Wendell
Hey Niranda, It seems to me a lot of effort to support multiple libraries inside of Spark like this, so I'm not sure that's a great solution. If you are building an application that embeds Spark, is it not possible for you to continue to use Jetty for Spark's internal servers and use tomcat for

Re: driver fail-over in Spark streaming 1.2.0

2015-02-12 Thread Patrick Wendell
It will create and connect to new executors. The executors are mostly stateless, so the program can resume with new executors. On Wed, Feb 11, 2015 at 11:24 PM, lin kurtt@gmail.com wrote: Hi, all In Spark Streaming 1.2.0, when the driver fails and a new driver starts with the most updated

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-12 Thread Patrick Wendell
The map will start with a capacity of 64, but will grow to accommodate new data. Are you using the groupBy operator in Spark or are you using Spark SQL's group by? This usually happens if you are grouping or aggregating in a way that doesn't sufficiently condense the data created from each input
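
To make the "starts at 64 and grows" behaviour concrete, here is a toy open-addressing map that doubles when it fills up. It is a generic sketch, not the AppendOnlyMap source: the class name, the 0.7 load factor and the doubling policy are assumptions chosen for illustration. Only the initial capacity of 64 comes from the message above.

```python
class GrowableMap:
    """Toy hash map with linear probing that doubles its table when it fills up."""

    def __init__(self, initial_capacity=64, load_factor=0.7):
        self.capacity = initial_capacity
        self.load_factor = load_factor       # assumed growth threshold
        self.size = 0
        self.slots = [None] * self.capacity  # each slot is None or (key, value)

    def _find(self, slots, key):
        # Linear probing: walk forward until we hit the key or an empty slot.
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % len(slots)
        return i

    def update(self, key, value):
        i = self._find(self.slots, key)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (key, value)
        if self.size > self.capacity * self.load_factor:
            self._grow()

    def get(self, key, default=None):
        entry = self.slots[self._find(self.slots, key)]
        return default if entry is None else entry[1]

    def _grow(self):
        # Double the table and re-insert every existing entry.
        new_slots = [None] * (self.capacity * 2)
        for entry in self.slots:
            if entry is not None:
                new_slots[self._find(new_slots, entry[0])] = entry
        self.slots = new_slots
        self.capacity *= 2
```

If an aggregation keeps updating the same few keys, the table stays small; if almost every input record introduces a new key, size tracks the input and the table keeps doubling, which is the situation the message describes.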

Re: How to track issues that must wait for Spark 2.x in JIRA?

2015-02-12 Thread Patrick Wendell
Yeah my preference is also having a more open ended 2+ for issues that are clearly desirable but blocked by compatibility concerns. What I would really want to avoid is major feature proposals sitting around in our JIRA and tagged under some 2.X version. IMO JIRA isn't the place for thoughts about

[ANNOUNCE] Spark 1.3.0 Snapshot 1

2015-02-11 Thread Patrick Wendell
Hey All, I've posted Spark 1.3.0 snapshot 1. At this point the 1.3 branch is ready for community testing and we are strictly merging fixes and documentation across all components. The release files, including signatures, digests, etc can be found at:

Re: Powered by Spark: Concur

2015-02-10 Thread Patrick Wendell
Thanks Paolo - I've fixed it. On Mon, Feb 9, 2015 at 11:10 PM, Paolo Platter paolo.plat...@agilelab.it wrote: Hi, I checked the powered by wiki too and Agile Labs should be Agile Lab. The link is wrong too, it should be www.agilelab.it. The description is correct. Thanks a lot Paolo

Re: Mail to u...@spark.apache.org failing

2015-02-09 Thread Patrick Wendell
Ah - we should update it to suggest mailing the dev@ list (and if there is enough traffic maybe do something else). I'm happy to add you if you can give an organization name, URL, a list of which Spark components you are using, and a short description of your use case.. On Mon, Feb 9, 2015 at

Re: New Metrics Sink class not packaged in spark-assembly jar

2015-02-09 Thread Patrick Wendell
Hi Judy, If you have added source files in the sink/ source folder, they should appear in the assembly jar when you build. One thing I noticed is that you are looking inside the /dist folder. That only gets populated if you run make-distribution. The normal development process is just to do mvn

Re: Keep or remove Debian packaging in Spark?

2015-02-09 Thread Patrick Wendell
I have wondered whether we should sort of deprecate it more officially, since otherwise I think people have the reasonable expectation based on the current code that Spark intends to support complete Debian packaging as part of the upstream build. Having something that's sort-of maintained but no

Re: multi-line comment style

2015-02-09 Thread Patrick Wendell
can find Thanks Shivaram [1] https://google-styleguide.googlecode.com/svn/trunk/javaguide.html#s4.8.6.1-block-comment-style On Wed, Feb 4, 2015 at 2:05 PM, Patrick Wendell pwend...@gmail.com wrote: Personally I have no opinion, but agree it would be nice to standardize. - Patrick

[ANNOUNCE] Apache Spark 1.2.1 Released

2015-02-09 Thread Patrick Wendell
Hi All, I've just posted the 1.2.1 maintenance release of Apache Spark. We recommend all 1.2.0 users upgrade to this release, as this release includes stability fixes across all components of Spark. - Download this release: http://spark.apache.org/downloads.html - View the release notes:

Re: Improving metadata in Spark JIRA

2015-02-08 Thread Patrick Wendell
:52 PM Patrick Wendell pwend...@gmail.com wrote: Per Nick's suggestion I added two components: 1. Spark Submit 2. Spark Scheduler I figured I would just add these since if we decide later we don't want them, we can simply merge them into Spark Core. On Fri, Feb 6, 2015 at 11:53 AM

[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-08 Thread Patrick Wendell
This vote passes with 5 +1 votes (3 binding) and no 0 or -1 votes. +1 Votes: Krishna Sankar Sean Owen* Chip Senkbeil Matei Zaharia* Patrick Wendell* 0 Votes: (none) -1 Votes: (none) On Fri, Feb 6, 2015 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote: I'll add a +1 as well. On Fri, Feb

Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-08 Thread Patrick Wendell
I'll add a +1 as well. On Fri, Feb 6, 2015 at 2:38 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tested on Mac OS X. Matei On Feb 2, 2015, at 8:57 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.1

Unit tests

2015-02-08 Thread Patrick Wendell
Hey All, The tests are in a not-amazing state right now due to a few compounding factors: 1. We've merged a large volume of patches recently. 2. The load on jenkins has been relatively high, exposing races and other behavior not seen at lower load. For those not familiar, the main issue is

Re: Improving metadata in Spark JIRA

2015-02-06 Thread Patrick Wendell
Per Nick's suggestion I added two components: 1. Spark Submit 2. Spark Scheduler I figured I would just add these since if we decide later we don't want them, we can simply merge them into Spark Core. On Fri, Feb 6, 2015 at 11:53 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Do we

Re: PSA: Maven supports parallel builds

2015-02-05 Thread Patrick Wendell
I've done this in the past, but back when I wasn't using Zinc it didn't make a big difference. It's worth doing this in our jenkins environment though. - Patrick On Thu, Feb 5, 2015 at 4:52 PM, Dirceu Semighini Filho dirceu.semigh...@gmail.com wrote: Thanks Nicholas, I didn't know this.

Re: 1.2.1-rc3 - Avro input format for Hadoop 2 broken/fix?

2015-02-04 Thread Patrick Wendell
Hi Markus, That won't be included in 1.2.1 most likely because the release votes have already started, and at that point we don't hold the release except for major regression issues from 1.2.0. However, if this goes through we can backport it into the 1.2 branch and it will end up in a future

Re: multi-line comment style

2015-02-04 Thread Patrick Wendell
Personally I have no opinion, but agree it would be nice to standardize. - Patrick On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com wrote: One thing Marcelo pointed out to me is that the // style does not interfere with commenting out blocks of code with /* */, which is a small

[ANNOUNCE] branch-1.3 has been cut

2015-02-03 Thread Patrick Wendell
Hey All, Just wanted to announce that we've cut the 1.3 branch which will become the 1.3 release after community testing. There are still some features that will go in (in higher level libraries, and some stragglers in spark core), but overall this indicates the end of major feature development
