Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
to do this? On Tue, Oct 28, 2014 at 9:57 PM, Patrick Wendell pwend...@gmail.com wrote: I just started a totally fresh IntelliJ project importing from our root pom. I used all the default options and I added hadoop-2.4, hive, hive-0.13.1 profiles. I was able to run spark core tests from within

Re: HiveShim not found when building in Intellij

2014-10-28 Thread Patrick Wendell
Oops - I actually should have added v0.13.0 (i.e. to match whatever I did in the profile). On Tue, Oct 28, 2014 at 10:05 PM, Patrick Wendell pwend...@gmail.com wrote: Cheng - to make it recognize the new HiveShim for 0.12 I had to click on spark-hive under packages in the left pane, then go

Re: your weekly git timeout update! TL;DR: i'm now almost certain we're not hitting rate limits.

2014-10-24 Thread Patrick Wendell
Thanks for the update Shane. As a point of process, for things like this where we're debugging specific issues - can we use JIRA instead of notifying everyone on the spark-dev list? I'd prefer if ops/infra announcements on the dev list are restricted to things that are widely applicable to

Re: Moving PR Builder to mvn

2014-10-24 Thread Patrick Wendell
compilation to be much better than before I had it running. Is the sbt build still faster (sorry, long time since I did a build with sbt). Thanks, Hari On Fri, Oct 24, 2014 at 1:46 PM, Patrick Wendell pwend...@gmail.com wrote: Overall I think this would be a good idea. The main blocker

Re: scalastyle annoys me a little bit

2014-10-23 Thread Patrick Wendell
Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you point out. I was sort of mixed on that when it was proposed for this exact reason. It's just annoying to developers. In terms of changing the global limit, this is more religion than anything

Spark 1.2 feature freeze on November 1

2014-10-23 Thread Patrick Wendell
Hey All, Just a reminder that as planned [1] we'll go into a feature freeze on November 1. On that date I'll cut a 1.2 release branch and make the up-or-down call on any patches that go into that branch, along with individual committers. It is common for us to receive a very large volume of

Re: Which part of the code deals with communication?

2014-10-22 Thread Patrick Wendell
The best documentation about communication interfaces is the SecurityManager doc written by Tom Graves. With this as a starting point I'd recommend digging through the code for each component.

Re: something wrong with Jenkins or something untested merged?

2014-10-21 Thread Patrick Wendell
at 5:16 PM, Patrick Wendell pwend...@gmail.com wrote: The failure is in the Kinesis component, can you reproduce this if you build with -Pkinesis-asl? - Patrick On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu

Re: Get attempt number in a closure

2014-10-20 Thread Patrick Wendell
There is a deeper issue here which is AFAIK we don't even store a notion of attempt inside of Spark, we just use a new taskId with the same index. On Mon, Oct 20, 2014 at 12:38 PM, Yin Huai huaiyin@gmail.com wrote: Yeah, seems we need to pass the attempt id to executors through
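A toy illustration of the point above, assuming the scheduler only hands out a fresh task ID per launch and reuses the partition index on retry. Under that assumption an attempt number can be derived by counting launches per index. The names `TaskTracker` and `launch` are hypothetical and not part of Spark; this is a sketch, not how Spark tracks tasks.

```python
from collections import defaultdict
from itertools import count


class TaskTracker:
    """Toy model: every launch gets a fresh task ID; retries reuse the index."""

    def __init__(self):
        self._next_task_id = count()        # monotonically increasing task IDs
        self._launches = defaultdict(int)   # partition index -> launches so far

    def launch(self, index):
        """Launch (or relaunch) the task for `index`; return (task_id, attempt)."""
        task_id = next(self._next_task_id)
        attempt = self._launches[index]     # 0 on first launch, 1 on first retry
        self._launches[index] += 1
        return task_id, attempt


tracker = TaskTracker()
first = tracker.launch(index=3)   # first launch of partition 3
other = tracker.launch(index=7)   # unrelated partition
retry = tracker.launch(index=3)   # retry: new task ID, same index
```

The retry gets a new task ID but the same index, so the attempt number has to be computed and passed along explicitly if code running in a closure is to see it.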

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
The failure is in the Kinesis component, can you reproduce this if you build with -Pkinesis-asl? - Patrick On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote: hmm, strange. i'll take a look. On Mon, Oct 20, 2014 at 5:11 PM, Nan Zhu zhunanmcg...@gmail.com wrote: yes, I

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
that! shane On Mon, Oct 20, 2014 at 5:16 PM, Patrick Wendell pwend...@gmail.com wrote: The failure is in the Kinesis component, can you reproduce this if you build with -Pkinesis-asl? - Patrick On Mon, Oct 20, 2014 at 5:08 PM, shane knapp skn...@berkeley.edu wrote: hmm, strange. i'll take

Re: something wrong with Jenkins or something untested merged?

2014-10-20 Thread Patrick Wendell
I created an issue to fix this: https://issues.apache.org/jira/browse/SPARK-4021 On Mon, Oct 20, 2014 at 5:32 PM, Patrick Wendell pwend...@gmail.com wrote: Thanks Shane - we should fix the source code issues in the Kinesis code that made stricter Java compilers reject it. - Patrick On Mon

Re: Scalastyle improvements / large code reformatting

2014-10-13 Thread Patrick Wendell
to worry about. Instead, what you can do is as you edit a file when you're working on a feature, fix up style issues you see. Or, as Josh suggested, some way to make this apply only to new files would help. Matei On Oct 12, 2014, at 10:16 PM, Patrick Wendell pwend...@gmail.com wrote

Re: Scalastyle improvements / large code reformatting

2014-10-12 Thread Patrick Wendell
Another big problem with these patches is that they make it almost impossible to backport changes to older branches cleanly (there is close to a 100% chance of a merge conflict). One proposal is to do this: 1. We only consider new style rules at the end of a release cycle, when there is the

Re: Unneeded branches/tags

2014-10-07 Thread Patrick Wendell
Actually - weirdly - we can delete old tags and it works with the mirroring. Nick if you put together a list of un-needed tags I can delete them. On Tue, Oct 7, 2014 at 6:27 PM, Reynold Xin r...@databricks.com wrote: Those branches are no longer active. However, I don't think we can delete

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-04 Thread Patrick Wendell
Hey All, Just a couple notes. I recently posted a shell script for creating the AMI's from a clean Amazon Linux AMI. https://github.com/mesos/spark-ec2/blob/v3/create_image.sh I think I will update the AMI's soon to get the most recent security updates. For spark-ec2's purpose this is probably

Re: Extending Scala style checks

2014-10-01 Thread Patrick Wendell
Hey Nick, We can always take built-in rules. Back when we added this Prashant Sharma actually did some great work that lets us write our own style rules in cases where rules don't exist. You can see some existing rules here:

Re: do MIMA checking before all test cases start?

2014-09-25 Thread Patrick Wendell
, 2014 at 12:04 AM, Patrick Wendell wrote: Have you considered running the mima checks locally? We prefer people not use Jenkins for very frequent checks since it takes resources away from other people trying to run tests. On Wed, Sep 24, 2014 at 6:44 PM, Nan Zhu zhunanmcg...@gmail.com

Re: do MIMA checking before all test cases start?

2014-09-24 Thread Patrick Wendell
Have you considered running the mima checks locally? We prefer people not use Jenkins for very frequent checks since it takes resources away from other people trying to run tests. On Wed, Sep 24, 2014 at 6:44 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all It seems that, currently, Jenkins

Re: hash vs sort shuffle

2014-09-22 Thread Patrick Wendell
Hey Cody, In terms of Spark 1.1.1 - we wouldn't change a default value in a spot release. Changing this to default is slotted for 1.2.0: https://issues.apache.org/jira/browse/SPARK-3280 - Patrick On Mon, Sep 22, 2014 at 9:08 AM, Cody Koeninger c...@koeninger.org wrote: Unfortunately we were

Re: BlockManager issues

2014-09-21 Thread Patrick Wendell
Hey the numbers you mentioned don't quite line up - did you mean PR 2711? On Sun, Sep 21, 2014 at 8:45 PM, Reynold Xin r...@databricks.com wrote: It seems like you just need to raise the ulimit? On Sun, Sep 21, 2014 at 8:41 PM, Nishkam Ravi nr...@cloudera.com wrote: Recently upgraded to

Re: BlockManager issues

2014-09-21 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: Hey the numbers you mentioned don't quite line up - did you mean PR 2711? On Sun, Sep 21, 2014 at 8:45 PM, Reynold Xin r...@databricks.com wrote: It seems like you just need to raise the ulimit? On Sun, Sep 21, 2014 at 8:41 PM, Nishkam Ravi nr

Re: greeting from new member and jira 3489

2014-09-16 Thread Patrick Wendell
Hi Mohit, Welcome to the Spark community! We normally look at feature proposals using GitHub pull requests; mind submitting one? The contribution process is covered here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Tue, Sep 16, 2014 at 9:16 PM, Mohit Jaggi

Tests and Test Infrastructure

2014-09-14 Thread Patrick Wendell
Hey All, Wanted to send a quick update about test infrastructure. With the number of contributors we have and the rate of development, maintaining a well-oiled test infra is really important. Every time a flaky test fails a legitimate pull request, it wastes developer time and effort. 1. Master

Re: Use Case of mutable RDD - any ideas around will help.

2014-09-12 Thread Patrick Wendell
[moving to user@] This would typically be accomplished with a union() operation. You can't mutate an RDD in-place, but you can create a new RDD with a union() which is an inexpensive operator. On Fri, Sep 12, 2014 at 5:28 AM, Archit Thakur archit279tha...@gmail.com wrote: Hi, We have a use
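A toy model of the idea above, written in plain Python rather than against the Spark API. `ToyRDD` is a hypothetical class: it only illustrates that a union can produce a new dataset by reusing the existing partitions, leaving both inputs untouched. It is a sketch under that assumption, not Spark's implementation.

```python
class ToyRDD:
    """Toy immutable dataset: a tuple of partitions, each a tuple of records."""

    def __init__(self, partitions):
        self._partitions = tuple(tuple(p) for p in partitions)

    def union(self, other):
        # Build a new dataset that reuses both partition tuples;
        # neither input is modified and no records are copied.
        new = ToyRDD(())
        new._partitions = self._partitions + other._partitions
        return new

    def collect(self):
        return [record for part in self._partitions for record in part]


base = ToyRDD([[1, 2], [3]])
updates = ToyRDD([[4, 5]])
combined = base.union(updates)   # "append" by creating a new dataset
```

After the union, `base` still holds only its original records, and `combined` is the dataset to use going forward.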

Re: Adding abstraction in MLlib

2014-09-12 Thread Patrick Wendell
We typically post design docs on JIRA's before major work starts. For instance, pretty sure SPARK-1856 will have a design doc posted shortly. On Fri, Sep 12, 2014 at 12:10 PM, Erik Erlandson e...@redhat.com wrote: Are interface designs being captured anywhere as documents that the community

Re: [RESULT] [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-11 Thread Patrick Wendell
count. Next time we can pipeline this work to avoid a delay. I did cut the v1.1.0 tag today. We should be able to do the full announce tomorrow. Thanks, Patrick On Sun, Sep 7, 2014 at 5:50 PM, Patrick Wendell pwend...@gmail.com wrote: This vote passes with 8 binding +1 votes and no -1 votes. I'll

Announcing Spark 1.1.0!

2014-09-11 Thread Patrick Wendell
I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 171 developers! This release brings operational and performance improvements in Spark core including a new

Re: parquet predicate / projection pushdown into unionAll

2014-09-09 Thread Patrick Wendell
I think what Michael means is people often use this to read existing partitioned Parquet tables that are defined in a Hive metastore rather than data generated directly from within Spark and then reading it back as a table. I'd expect the latter case to become more common, but for now most users

Re: [mllib] Add multiplying large scale matrices

2014-09-05 Thread Patrick Wendell
Hey There, I believe this is on the roadmap for the next release (1.2). But Xiangrui can comment on this. - Patrick On Fri, Sep 5, 2014 at 9:18 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: Hi Evan, That sounds interesting. Here is the ticket which I created.

Re: amplab jenkins is down

2014-09-04 Thread Patrick Wendell
Hm yeah it seems that it hasn't been polling since 3:45. On Thu, Sep 4, 2014 at 4:21 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: It appears that our main man is having trouble hearing new requests. Do we need some smelling salts? On Thu, Sep 4, 2014 at 5:49 PM, shane knapp

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-03 Thread Patrick Wendell
I'm cancelling this release in favor of RC4. Happy voting! On Tue, Sep 2, 2014 at 9:55 PM, Patrick Wendell pwend...@gmail.com wrote: Thanks everyone for voting on this. Two minor issues (one a blocker) were found that warrant cutting a new RC. For those who voted +1 on this release

[VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc4 (commit 2f9b2bd): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=2f9b2bd7844ee8393dc9c319f4fefedf95f5e460 The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
I'll kick it off with a +1 On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc4 (commit 2f9b2bd): https://git-wip-us.apache.org/repos/asf?p=spark.git

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
Hey Nick, Yeah we'll put those in the release notes. On Wed, Sep 3, 2014 at 7:23 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell pwend...@gmail.com wrote: == What default changes should I be aware of? == 1. The default value

Re: memory size for caching RDD

2014-09-03 Thread Patrick Wendell
Changing this is not supported; it is immutable, similar to other Spark configuration settings. On Wed, Sep 3, 2014 at 8:13 PM, 牛兆捷 nzjem...@gmail.com wrote: Dear all: Spark uses memory to cache RDD and the memory size is specified by spark.storage.memoryFraction. Once the Executor starts,

Re: hey spark developers! intro from shane knapp, devops engineer @ AMPLab

2014-09-02 Thread Patrick Wendell
Hey Shane, Thanks for your work so far and I'm really happy to see investment in this infrastructure. This is a key productivity tool for us and something we'd love to expand over time to improve the development process of Spark. - Patrick On Tue, Sep 2, 2014 at 10:47 AM, Nicholas Chammas

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-02 Thread Patrick Wendell
not appear to be serious. On Sun, Aug 31, 2014 at 5:14 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: -1: I believe I've found a regression from 1.0.2. The report is captured in SPARK-. On Sat, Aug 30, 2014 at 6:07 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote

Re: Run the Big Data Benchmark for new releases

2014-09-01 Thread Patrick Wendell
Yeah, this wasn't detected in our performance tests. We even have a test in PySpark that I would have thought might catch this (it just schedules a bunch of really small tasks, similar to the regression case). https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51

Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-31 Thread Patrick Wendell
For my part I'm +1 on this, though Sean it would be great separately to fix the test environment. For those who voted on rc2, this is almost identical, so feel free to +1 unless you think there are issues with the two minor bug fixes. On Sun, Aug 31, 2014 at 10:18 AM, Sean Owen

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-30 Thread Patrick Wendell
Thanks to Nick Chammas and Cheng Lian who pointed out two issues with the release candidate. I'll cancel this in favor of RC3. On Fri, Aug 29, 2014 at 1:33 PM, Jeremy Freeman freeman.jer...@gmail.com wrote: +1. Validated several custom analysis pipelines on a private cluster in standalone mode.

[VOTE] Release Apache Spark 1.1.0 (RC3)

2014-08-30 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc3 (commit b2d0493b): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b2d0493b223c5f98a593bb6d7372706cc02bebad The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
, 2014 at 3:14 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc2 (commit 711aebb3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
to just think about over time... (oh and PS for my part I think it's a good thing that CDH4 binaries were removed. I wasn't arguing for resurrecting them) On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, The reason there are no longer CDH-specific builds

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
, Patrick Wendell pwend...@gmail.com wrote: Hey Sean, The reason there are no longer CDH-specific builds is that all newer versions of CDH and HDP work with builds for the upstream Hadoop projects. I dropped CDH4 in favor of a newer Hadoop version (2.4) and the Hadoop-without-Hive (also

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
formatting and typographical errors in the SQL docs that I've fixed in this PR. Dunno if we want to roll that into the release. On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com wrote: Okay I'll plan to add cdh4 binary as well for the final release! --- sent from my

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
the default Spark version in spark-ec2 be updated for this release? Nick On Fri, Aug 29, 2014 at 12:55 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Nicholas, Thanks for this, we can merge in doc changes outside of the actual release timeline, so we'll make sure to loop those

Re: Compile error with XML elements

2014-08-29 Thread Patrick Wendell
In some cases IntelliJ's Scala compiler can't compile valid Scala source files. Hopefully they fix (or have fixed) this in a newer version. - Patrick On Fri, Aug 29, 2014 at 11:38 AM, Yi Tian tianyi.asiai...@gmail.com wrote: Hi, Devl! I got the same problem. You can try to upgrade your

[VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc1 (commit f0718324): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f07183249b74dd857069028bf7d570b35f265585 The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Patrick Wendell
: Is SPARK-3277 applicable to 1.1 ? If yes, until it is fixed, I am -1 on the release (I am on break, so can't verify or help fix, sorry). Regards Mridul On 28-Aug-2014 9:33 pm, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark

Re: [VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Patrick Wendell
Okay I'm cancelling this vote in favor of RC2. On Thu, Aug 28, 2014 at 3:27 PM, Mridul Muralidharan mri...@gmail.com wrote: Thanks for being on top of this Patrick ! And apologies for not being able to help more. Regards, Mridul On Aug 29, 2014 1:30 AM, Patrick Wendell pwend...@gmail.com

[VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc2 (commit 711aebb3): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327 The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-28 Thread Patrick Wendell
I'll kick off the vote with a +1. On Thu, Aug 28, 2014 at 7:14 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc2 (commit 711aebb3): https://git-wip-us.apache.org/repos/asf?p

Re: Handling stale PRs

2014-08-27 Thread Patrick Wendell
Hey Nishkam, To some extent we already have this process - many community members help review patches and some earn a reputation where committers will take an LGTM from them seriously. I'd be interested in seeing if any other projects recognize people who do this. - Patrick On Wed, Aug 27,

Re: Handling stale PRs

2014-08-26 Thread Patrick Wendell
Hey Nicholas, Thanks for bringing this up. There are a few dimensions to this... one is that it's actually procedurally difficult for us to close pull requests. I've proposed several different solutions to ASF infra to streamline the process, but thus far they haven't been open to any of my

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
One other idea - when things freeze up, try to run jstack on the spark shell process and on the executors and attach the results. It could be that somehow you are encountering a deadlock somewhere. On Mon, Aug 25, 2014 at 1:26 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Was the original

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Patrick Wendell
opening that JIRA issue with them. Nick On Mon, Aug 11, 2014 at 12:52 PM, Patrick Wendell pwend...@gmail.com wrote: I spent some time on this and I'm not sure either of these is an option, unfortunately. We typically can't use custom JIRA plug-in's because this JIRA is controlled

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
Hey Amnon, So just to make sure I understand - you also saw the same issue with 1.0.2? Just asking because whether or not this regresses the 1.0.2 behavior is important for our own bug tracking. - Patrick On Mon, Aug 25, 2014 at 10:22 PM, Amnon Khen amnon...@gmail.com wrote: There were no

[SNAPSHOT] Snapshot2 of Spark 1.1 has been posted

2014-08-21 Thread Patrick Wendell
Hi All, I've packaged and published a snapshot release of Spark 1.1 for testing. This is very close to RC1 and we are distributing it for testing. Please test this and report any issues on this thread. The tag of this release is v1.1.0-snapshot1 (commit e1535ad3):

Re: [SNAPSHOT] Snapshot2 of Spark 1.1 has been posted

2014-08-21 Thread Patrick Wendell
The docs for this release are also available here: http://people.apache.org/~pwendell/spark-1.1.0-snapshot2-docs/ On Thu, Aug 21, 2014 at 1:12 AM, Patrick Wendell pwend...@gmail.com wrote: Hi All, I've packaged and published a snapshot release of Spark 1.1 for testing. This is very close

Tests failing

2014-08-15 Thread Patrick Wendell
Hi All, I noticed that all PR tests run overnight had failed due to timeouts. The patch that updates the netty shuffle I believe somehow inflated the build time significantly. That patch had been tested, but one change was made before it was merged that was not tested. I've reverted the patch

Re: Tests failing

2014-08-15 Thread Patrick Wendell
. Is there any way we can fix that? On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com wrote: Hi All, I noticed that all PR tests run overnight had failed due to timeouts. The patch that updates the netty shuffle I believe somehow inflated the build time significantly. That patch

Re: Tests failing

2014-08-15 Thread Patrick Wendell
run. Okie doke. Perhaps then I'll wrap the run-tests step as you suggest and limit it to 100 minutes or something, and cleanly report if it times out. Sound good? On Fri, Aug 15, 2014 at 4:43 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Nicholas, Yeah so Jenkins has its own timeout
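The wrapping idea above (run the test step under a time limit and report a timeout cleanly) can be sketched as follows. This is a minimal sketch, not the script used by the project: `run_with_timeout` is a hypothetical helper, and the commands are stand-ins for the real test runner.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run `cmd`; return "passed", "failed", or "timed out"."""
    try:
        result = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return "timed out"
    return "passed" if result.returncode == 0 else "failed"


# Stand-in commands; a real wrapper would pass the test script
# and a limit such as 100 * 60 seconds.
quick = run_with_timeout([sys.executable, "-c", "pass"], timeout_seconds=30)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout_seconds=1
)
```

Returning a distinct "timed out" status lets the caller post a clear message instead of leaving the build to be killed by an outer timeout with no explanation.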

Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-14 Thread Patrick Wendell
I commented on the bug. For driver mode, you'll need to get the corresponding version of spark-submit for Spark 1.0.2. On Thu, Aug 14, 2014 at 3:43 PM, Gary Malouf malouf.g...@gmail.com wrote: To be clear, is it 'compiled' against 1.0.2 or it packaged with it? On Thu, Aug 14, 2014 at 6:39

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-11 Thread Patrick Wendell
://marketplace-cdn.atlassian.com/files/images/com.atlassian.jira.plugins.jira-bitbucket-connector-plugin/86ff1a21-44fb-4227-aa4f-44c77aec2c97.png that might be nice to have for heavy JIRA users. Nick On Sun, Jul 20, 2014 at 12:50 PM, Patrick Wendell pwend...@gmail.com wrote: Yeah it needs

Re: Unit tests in 5 minutes

2014-08-08 Thread Patrick Wendell
I dug around this a bit a while ago, I think if someone sat down and profiled the tests it's likely we could find some things to optimize. In particular, there may be overheads in starting up a local spark context that could be minimized and speed up all the tests. Also, there are some tests

Re: Unit tests in 5 minutes

2014-08-08 Thread Patrick Wendell
machine while also running another Spark shell, I've noticed that the test logs fill up with errors when the web UI attempts to bind to the default port, fails, and tries a higher one. - Josh On August 8, 2014 at 11:54:24 AM, Patrick Wendell (pwend...@gmail.com) wrote: I dug around this a bit
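The bind-fail-and-retry behaviour described above can be illustrated with a small socket sketch. `bind_with_retry` is a hypothetical helper written for this illustration and is not Spark's code; it assumes the simple policy of trying the next higher port after each failure.

```python
import socket


def bind_with_retry(start_port, max_retries=16, host="127.0.0.1"):
    """Bind to start_port, or to the next free port above it; return (socket, port)."""
    for offset in range(max_retries + 1):
        port = start_port + offset
        if port > 65535:
            break                      # ran past the valid port range
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock, port
        except OSError:
            sock.close()               # port taken: close and try the next one
    raise OSError(f"no free port found starting at {start_port}")


# Occupy an OS-chosen port, then ask for the same one to force a retry.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
taken_port = holder.getsockname()[1]
second, second_port = bind_with_retry(taken_port)
```

Each failed bind is an expected event in this scheme, which is why logging it as an error makes test logs noisy when two instances run on the same machine.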

[SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-07 Thread Patrick Wendell
Hi All, I've packaged and published a snapshot release of Spark 1.1 for testing. This is being distributed to the community for QA and preview purposes. It is not yet an official RC for voting. Going forward, we'll do preview releases like this for testing ahead of official votes. The tag of

Re: [SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-07 Thread Patrick Wendell
Minor correction: the encoded URL in the staging repo link was wrong. The correct repo is: https://repository.apache.org/content/repositories/orgapachespark-1025/ On Wed, Aug 6, 2014 at 11:23 PM, Patrick Wendell pwend...@gmail.com wrote: Hi All, I've packaged and published a snapshot release

Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Patrick Wendell
In the past I've found if I do a jstack when running some tests, it sits forever inside of a hostname resolution step or something. I never narrowed it down, though. - Patrick On Thu, Aug 7, 2014 at 10:45 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Thanks. let me check this hypothesis (i

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Patrick Wendell
it doesn't seem immediately related to that ticket. Should I file a bug report? On Thu, Jul 31, 2014 at 1:33 AM, Patrick Wendell pwend...@gmail.com wrote: The third issue may be related to this: https://issues.apache.org/jira/browse/SPARK-2022 We can take a look

Re: Fine-Grained Scheduler on Yarn

2014-08-07 Thread Patrick Wendell
The current YARN is equivalent to what is called fine grained mode in Mesos. The scheduling of tasks happens totally inside of the Spark driver. On Thu, Aug 7, 2014 at 7:50 PM, Jun Feng Liu liuj...@cn.ibm.com wrote: Any one know the answer? Best Regards *Jun Feng Liu* IBM China Systems

Re: Fine-Grained Scheduler on Yarn

2014-08-07 Thread Patrick Wendell
scheduling at the granularity of individual cores. On Thu, Aug 7, 2014 at 9:43 PM, Patrick Wendell pwend...@gmail.com wrote: The current YARN is equivalent to what is called fine grained mode in Mesos. The scheduling of tasks happens totally inside of the Spark driver. On Thu, Aug 7, 2014 at 7:50

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
:15 AM, Patrick Wendell pwend...@gmail.com wrote: For hortonworks, I believe it should work to just link against the corresponding upstream version. I.e. just set the Hadoop version to 2.4.0 Does that work? - Patrick On Mon, Aug 4, 2014 at 12:13 AM, Ron's Yahoo! zlgonza

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
... Thanks, Ron On Aug 4, 2014, at 10:01 AM, Ron's Yahoo! zlgonza...@yahoo.com wrote: That failed since it defaulted the versions for yarn and hadoop I'll give it a try with just 2.4.0 for both yarn and hadoop... Thanks, Ron On Aug 4, 2014, at 9:44 AM, Patrick Wendell pwend...@gmail.com

Re: -1s on pull requests?

2014-08-03 Thread Patrick Wendell
1. Include the commit hash in the tests have started/completed messages, so that it's clear what code exactly is/has been tested for each test cycle. Great idea - I think this is easy to do given the current architecture. We already have access to the commit ID in the same script

Re: Low Level Kafka Consumer for Spark

2014-08-03 Thread Patrick Wendell
I'll let TD chime in on this one, but I'm guessing this would be a welcome addition. It's great to see community effort on adding new streams/receivers, adding a Java API for receivers was something we did specifically to allow this :) - Patrick On Sat, Aug 2, 2014 at 10:09 AM, Dibyendu

Re: Scala 2.11 external dependencies

2014-08-03 Thread Patrick Wendell
Hey Anand, Thanks for looking into this - it's great to see momentum towards Scala 2.11 and I'd love it if this landed in Spark 1.2. For the external dependencies, it would be good to create a sub-task of SPARK-1812 to track our efforts encouraging other projects to upgrade. In certain cases (e.g.

ASF JIRA is down for maintenance

2014-08-02 Thread Patrick Wendell
Please don't let this prevent you from merging patches, just keep a list and we can update the JIRA later. - Patrick

branch-1.1 of Spark has been cut

2014-08-02 Thread Patrick Wendell
Hey All, I'm happy to announce branch-1.1 of Spark [1] - this branch will eventually become the 1.1 release. Committers: new patches will need to be explicitly back-ported into this branch in order to appear in the 1.1 release. Thanks so much to all the committers and contributors who were

Re: replacement for SPARK_JAVA_OPTS

2014-07-31 Thread Patrick Wendell
org.apache.spark.repl.Main On Wed, Jul 30, 2014 at 3:43 PM, Patrick Wendell pwend...@gmail.com wrote: Cody - in your example you are using the '=' character, but in our documentation and tests we use a whitespace to separate the key and value in the defaults file. docs: http://spark.apache.org/docs/latest

Re: replacement for SPARK_JAVA_OPTS

2014-07-31 Thread Patrick Wendell
The third issue may be related to this: https://issues.apache.org/jira/browse/SPARK-2022 We can take a look at this during the bug fix period for the 1.1 release next week. If we come up with a fix we can backport it into the 1.0 branch also. On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell

Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Patrick Wendell
Cody - in your example you are using the '=' character, but in our documentation and tests we use a whitespace to separate the key and value in the defaults file. docs: http://spark.apache.org/docs/latest/configuration.html spark.driver.extraJavaOptions -Dfoo.bar.baz=23 I'm not sure if the java
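To make the whitespace convention above concrete, here is a small sketch of a parser that splits each line on the first run of whitespace only, so an '=' inside the value is kept as part of the value. `parse_defaults` is a hypothetical helper and is not Spark's loader; the sample content mirrors the example line quoted above.

```python
def parse_defaults(text):
    """Parse 'key<whitespace>value' lines; blank lines and '#' comments are skipped."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)    # split on the first run of whitespace only
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        conf[key] = value
    return conf


sample = """
# hypothetical defaults file content
spark.driver.extraJavaOptions -Dfoo.bar.baz=23
"""
conf = parse_defaults(sample)
```

With this convention the whole of `-Dfoo.bar.baz=23` is the value, and the '=' belongs to the Java option rather than to the key/value syntax of the file.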

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
It would be great if the hive team can fix that issue. If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote: Talked with Owen offline. He confirmed that as of 0.13,

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
in the main asf hive repo github mirror. On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote: It would be great if the hive team can fix that issue. If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick On Mon

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
-core.jar Cheers On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote: It would be great if the hive team can fix that issue. If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick On Mon, Jul 28, 2014

Re: 'Proper' Build Tool

2014-07-28 Thread Patrick Wendell
Yeah for packagers we officially recommend using maven. Spark's dependency graph is very complicated and Maven and SBT use different conflict resolution strategies, so we've opted to officially support Maven. SBT is still around though and it's used more often by day-to-day developers. - Patrick

Github mirroring is running behind

2014-07-28 Thread Patrick Wendell
https://issues.apache.org/jira/browse/INFRA-8116 Just a heads up, the github mirroring is running behind. You can follow that JIRA to keep up to date on the fix. In the meantime you can use the Apache git itself: https://git-wip-us.apache.org/repos/asf/spark.git Some people have reported

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Patrick Wendell
SPARK-2677 is a long standing issue and not a regression. Also, as far as I can see there is no patch for it or clear understanding of the cause. This type of bug does not warrant holding a release. If we fix SPARK-2677 we can just make another release with the fix. On Sun, Jul 27, 2014 at 10:47

branch-1.1 will be cut on Friday

2014-07-27 Thread Patrick Wendell
Hey All, Just a heads up, we'll cut branch-1.1 on this Friday, August 1st. Once the release branch is cut we'll start community QA and go into the normal triage process for merging patches into that branch. For Spark core, we'll be conservative in merging things past the freeze date (e.g. high

Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-27 Thread Patrick Wendell
release version number to individual classes. Using newer hadoop releases would encourage pushing bug fixes / new features upstream. Ultimately Spark code would become cleaner. Cheers On Sun, Jul 27, 2014 at 8:52 AM, Patrick Wendell pwend...@gmail.com wrote: Ted - technically I think you

Re: new JDBC server test cases seems failed ?

2014-07-27 Thread Patrick Wendell
I'm going to revert it again - Cheng can you try to look into this? Thanks. On Sun, Jul 27, 2014 at 6:06 PM, Nan Zhu zhunanmcg...@gmail.com wrote: it's 20 minutes ago https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17259/consoleFull -- Nan Zhu On Sunday, July 27, 2014

Re: Can I translate the documentations of Spark in Japanese?

2014-07-27 Thread Patrick Wendell
Hey Yu, I think we could definitely put a pointer to documentation in other languages that is hosted somewhere else, but since we are not in a position to maintain this, I'm not sure we could merge it into the mainline Spark codebase. I'd be interested to know what other projects do about this

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-25 Thread Patrick Wendell
The most important issue in this release is actually an amendment to an earlier fix. The original fix caused a deadlock which was a regression from 1.0.0-1.0.1: Issue: https://issues.apache.org/jira/browse/SPARK-1097 1.0.1 Fix: https://github.com/apache/spark/pull/1273/files (had a deadlock)

Re: Suggestion for SPARK-1825

2014-07-25 Thread Patrick Wendell
Yeah I agree reflection is the best solution. Whenever we do reflection we should clearly document in the code which YARN API version corresponds to which code path. I'm guessing since YARN is adding new features... we'll just have to do this over time. - Patrick On Fri, Jul 25, 2014 at 3:35 PM,
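
A rough sketch of the pattern described above: probe for a method by reflection and note in a comment which API version each code path corresponds to. No YARN API is used here. String.isBlank() from the JDK is only a stand-in for a method that exists in newer releases but not older ones, and the class name ReflectiveCompat is hypothetical.

```java
import java.lang.reflect.Method;

public class ReflectiveCompat {
    // Returns true if `s` is empty or contains only whitespace.
    // Stand-in example: String.isBlank() plays the role of a "newer API" method.
    static boolean isBlank(String s) {
        try {
            // Code path A: Java 11 and later, where String.isBlank() exists.
            Method m = String.class.getMethod("isBlank");
            return (Boolean) m.invoke(s);
        } catch (ReflectiveOperationException e) {
            // Code path B: Java 10 and earlier, where the method is missing.
            // trim() only strips characters <= U+0020, so this is an
            // approximation for non-ASCII whitespace.
            return s.trim().isEmpty();
        }
    }

    public static void main(String[] args) {
        System.out.println(isBlank("   "));
        System.out.println(isBlank(" a "));
    }
}
```

Keeping the version note next to each branch makes it easier to delete the fallback once the older version is no longer supported.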

Re: Dynamic variables in Spark

2014-07-22 Thread Patrick Wendell
Shivaram, You should take a look at this patch which adds support for naming accumulators - this is likely to get merged in soon. I actually started this patch by supporting named TaskMetrics similar to what you have there, but then I realized there is too much semantic overlap with accumulators,

SPARK-1199 has been reverted in branch-1.0

2014-07-21 Thread Patrick Wendell
Just a note - there was a fix in branch-1.0 (and Spark 1.0.1) that introduced a new bug worse than the original one. https://issues.apache.org/jira/browse/SPARK-1199 The original bug was an issue with case classes defined in the repl. The fix caused a separate bug which broke most compound

Re: -1s on pull requests?

2014-07-21 Thread Patrick Wendell
I've always operated under the assumption that if a committer makes a comment on a PR, and that's not addressed, that should block the PR from being merged (even without a specific -1). I don't know of any cases where this has intentionally been violated, but I do think this happens accidentally

Re: Pull requests will be automatically linked to JIRA when submitted

2014-07-20 Thread Patrick Wendell
/49e472744951d875627d78b0d6e93cd139232929 - Patrick On Sun, Jul 20, 2014 at 8:06 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: That's pretty neat. How does it work? Do we just need to put the issue ID (e.g. SPARK-1234) anywhere in the pull request? Nick On Sat, Jul 19, 2014 at 11:10 PM, Patrick Wendell pwend

Pull requests will be automatically linked to JIRA when submitted

2014-07-19 Thread Patrick Wendell
Just a small note, today I committed a tool that will automatically mirror pull requests to JIRA issues, so contributors will no longer have to manually post a pull request on the JIRA when they make one. It will create a link on the JIRA and also make a comment to trigger an e-mail to people
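
As an illustration only, one way such a tool could find the issues to link is by pattern-matching IDs like SPARK-1234 in the pull request text. How the actual tool does it is not stated in this message, so the matching rule and the names below (JiraIdFinder, findIssueIds) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JiraIdFinder {
    // ASSUMPTION: issue IDs look like "SPARK-" followed by digits.
    private static final Pattern ISSUE_ID = Pattern.compile("\\bSPARK-\\d+\\b");

    // Collect every issue ID found in the text, in order of appearance.
    static List<String> findIssueIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = ISSUE_ID.matcher(text);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(findIssueIds("[SPARK-1234] Fix foo (see SPARK-99)"));
    }
}
```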
