Re: Welcoming two new committers

2016-02-08 Thread Corey Nolet
Congrats guys! On Mon, Feb 8, 2016 at 12:23 PM, Ted Yu wrote: > Congratulations, Herman and Wenchen. > > On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia > wrote: > >> Hi all, >> >> The PMC has recently added two new Spark committers -- Herman van

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread Corey Nolet
David, Thank you very much for announcing this! It looks like it could be very useful. Would you mind providing a link to the github? On Tue, Jan 12, 2016 at 10:03 AM, David wrote: > Hi all, > > I'd like to share news of the recent release of a new Spark

Re: Forecasting Library For Apache Spark

2015-09-21 Thread Corey Nolet
Mohamed, Have you checked out the Spark Timeseries [1] project? Non-seasonal ARIMA was added to this recently and seasonal ARIMA should be following shortly. [1] https://github.com/cloudera/spark-timeseries On Mon, Sep 21, 2015 at 7:47 AM, Mohamed Baddar wrote: >

Re: MongoDB and Spark

2015-09-11 Thread Corey Nolet
Unfortunately, MongoDB does not directly expose its locality via its client API so the problem with trying to schedule Spark tasks against it is that the tasks themselves cannot be scheduled locally on nodes containing query results- which means you can only assume most results will be sent over

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Corey Nolet
+1 (non-binding) - Verified signatures - Built on Mac OS X and Fedora 21. On Mon, Mar 9, 2015 at 11:01 PM, Krishna Sankar ksanka...@gmail.com wrote: Excellent, Thanks Xiangrui. The mystery is solved. Cheers k/ On Mon, Mar 9, 2015 at 3:30 PM, Xiangrui Meng men...@gmail.com wrote:

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Corey Nolet
This vote was supposed to close on Saturday but it looks like no PMCs voted (other than the implicit vote from Patrick). Was there a discussion offline to cut an RC2? Was the vote extended? On Mon, Feb 23, 2015 at 6:59 AM, Robin East robin.e...@xense.co.uk wrote: Running ec2 launch scripts

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Corey Nolet
SPARK-5183 SPARK-5180 Document data source API SPARK-3650 Triangle Count handles reverse edges incorrectly SPARK-3511 Create a RELEASE-NOTES.txt file in the repo On Mon, Feb 23, 2015 at 1:55 PM, Corey Nolet cjno...@gmail.com wrote: This vote was supposed to close on Saturday but it looks

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Corey Nolet
+1 (non-binding) - Verified signatures using [1] - Built on MacOSX Yosemite - Built on Fedora 21 Each build was run with a Hadoop-2.4 version and the yarn, hive, and hive-thriftserver profiles. I am having trouble getting all the tests passing on a single run on both machines but we have this

Re: Replacing Jetty with TomCat

2015-02-17 Thread Corey Nolet
Niranda, I'm not sure if I'd say Spark's use of Jetty to expose its UI monitoring layer constitutes a use of two web servers in a single product. Hadoop uses Jetty as well as do many other applications today that need embedded http layers for serving up their monitoring UI to users. This is

Re: Welcoming three new committers

2015-02-03 Thread Corey Nolet
Congrats guys! On Tue, Feb 3, 2015 at 7:01 PM, Evan Chan velvia.git...@gmail.com wrote: Congrats everyone!!! On Tue, Feb 3, 2015 at 3:17 PM, Timothy Chen tnac...@gmail.com wrote: Congrats all! Tim On Feb 4, 2015, at 7:10 AM, Pritish Nawlakhe prit...@nirvana-international.com

Re: Spark SQL API changes and stabilization

2015-01-15 Thread Corey Nolet
Reynold, One thing I'd like worked into the public portion of the API is the JSON inference logic that creates a Set[(String, StructType)] out of Map[String,Any]. SPARK-5260 addresses this so that I can use Accumulators to infer my schema instead of forcing a map/reduce phase to occur on an RDD
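
The idea in the message above, collecting a set of (field, type) pairs per record and merging the sets as records stream by, can be sketched roughly as follows. This is a plain-Python illustration only, not Spark's implementation and not the code from SPARK-5260: the names type_name, keys_with_types and merge are hypothetical, and the string type labels stand in for Spark's real type classes. The point it tries to show is that the merge step is a set union, which is associative and commutative, so it could in principle run inside an accumulator rather than a separate map/reduce pass.

```python
from functools import reduce


def type_name(value):
    """Map a Python value to a simplified type label.

    The labels are hypothetical stand-ins, not Spark's DataType classes.
    """
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, list):
        return "array"
    if value is None:
        return "null"
    return "unknown"


def keys_with_types(record, prefix=""):
    """Return the set of (field path, type label) pairs found in one record."""
    pairs = set()
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        pairs.add((path, type_name(value)))
        if isinstance(value, dict):
            # Recurse into nested objects so their fields appear as dotted paths.
            pairs |= keys_with_types(value, path)
    return pairs


def merge(acc, pairs):
    """Combine two partial schemas; set union is order-independent."""
    return acc | pairs


# Hypothetical sample records, standing in for parsed JSON documents.
records = [
    {"id": 1, "name": "a"},
    {"id": 2, "tags": ["x"], "geo": {"lat": 1.5}},
]

schema = reduce(merge, (keys_with_types(r) for r in records), set())
```

Because merge is order-independent, folding the records in any order, or in parallel partitions that are combined afterwards, yields the same set of pairs.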

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Corey Nolet
I was actually about to post this myself- I have a complex join that could benefit from something like a GroupComparator vs having to do multiple groupBy operations. This is probably the wrong thread for a full discussion on this but I didn't see a JIRA ticket for this or anything similar- any

Spark Hadoop 2.5.1

2014-11-14 Thread Corey Nolet
I noticed Spark 1.2.0-SNAPSHOT still has 2.4.x in the pom. Since 2.5.x is the current stable Hadoop 2.x, would it make sense for us to update the poms?

Re: Spark Hadoop 2.5.1

2014-11-14 Thread Corey Nolet
specialization needed beyond that. The profile sets hadoop.version to 2.4.0 by default, but this can be overridden. On Fri, Nov 14, 2014 at 3:43 PM, Corey Nolet cjno...@gmail.com wrote: I noticed Spark 1.2.0-SNAPSHOT still has 2.4.x in the pom. Since 2.5.x is the current stable Hadoop 2.x

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
+1 (non-binding) [for original process proposal] Greg, the first time I've seen the word ownership on this thread is in your message. The first time the word lead has appeared in this thread is in your message as well. I don't think that was the intent. The PMC and Committers have a

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
PMC [1] is responsible for oversight and does not designate partial or full committer. There are projects where all committers become PMC and others where PMC is reserved for committers with the most merit (and willingness to take on the responsibility of project oversight, releases, etc...).

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
I'm actually going to change my non-binding to +0 for the proposal as-is. I overlooked some parts of the original proposal that, when reading over them again, do not sit well with me. "one of the maintainers needs to sign off on each patch to the component", as Greg has pointed out, does seem to

Re: Raise Java dependency from 6 to 7

2014-10-19 Thread Corey Nolet
A concrete plan and a definite version upon which the upgrade would be applied sounds like it would benefit the community. If you plan far enough out (as Hadoop has done) and give the community enough notice, I can't see it being a problem as they would have ample time to upgrade. On Sat, Oct