Re: The latest master branch didn't compile with -Phive?

2015-07-10 Thread Ted Yu
Compilation on the master branch has been fixed, thanks to Cheng Lian. On Thu, Jul 9, 2015 at 8:50 AM, Josh Rosen rosenvi...@gmail.com wrote: Jenkins runs compile-only builds for Maven as an early warning system for this type of issue; you can see from

Model parallelism with RDD

2015-07-10 Thread Ulanov, Alexander
Hi, I am interested in how scalable model parallelism can be within Spark. Suppose the model contains N weights of type Double and N is so large that it does not fit into the memory of a single node. So, we can store the model in RDD[Double] within several nodes. To train the model, one needs

language-independent RDD Spark core code?

2015-07-10 Thread Vasili I. Galchin
I am looking at R side, but curious what the RDD core side looks like. Not sure which directory to look inside. ?? Thanks, Vasili

Re: language-independent RDD Spark core code?

2015-07-10 Thread Vasili I. Galchin
think I found this RDD code On Fri, Jul 10, 2015 at 7:00 PM, Vasili I. Galchin vigalc...@gmail.com wrote: I am looking at R side, but curious what the RDD core side looks like. Not sure which directory to look inside. ?? Thanks, Vasili

Re: Model parallelism with RDD

2015-07-10 Thread Shivaram Venkataraman
I think you need to do `newRDD.cache()` and `newRDD.count` before you do `oldRDD.unpersist(true)`. Otherwise it might be recomputing all the previous iterations each time. Thanks Shivaram On Fri, Jul 10, 2015 at 7:44 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, I am interested
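To illustrate why the order matters, here is a toy, pure-Python model of lazy lineage. It is not Spark code: `LazyDataset`, `train`, and the three mode names are hypothetical, and `cache()`, `count()` and `unpersist()` only mimic the roles of the RDD methods mentioned above. It counts how many transformations run over several training iterations under each ordering.

```python
class LazyDataset:
    """Hypothetical stand-in for an RDD: a lazy node in a lineage chain."""

    compute_calls = 0  # counts how many times any transform has run

    def __init__(self, parent=None, fn=None, initial=None):
        self.parent = parent
        self.fn = fn
        self.initial = initial
        self.should_cache = False
        self.materialized = False
        self.cached_value = None

    def cache(self):
        # Like RDD.cache(): only marks the dataset, nothing is computed yet.
        self.should_cache = True
        return self

    def unpersist(self):
        # Drop any stored value; later access must recompute from lineage.
        self.should_cache = False
        self.materialized = False
        self.cached_value = None

    def compute(self):
        if self.materialized:
            return self.cached_value
        if self.parent is None:
            value = self.initial
        else:
            value = self.fn(self.parent.compute())  # walks the lineage
            LazyDataset.compute_calls += 1
        if self.should_cache:
            self.cached_value = value
            self.materialized = True
        return value

    def count(self):
        # An "action": forces evaluation (and caching, if marked).
        return len(self.compute())


def train(iterations, mode):
    """mode: 'no_cache', 'unpersist_first', or 'materialize_first'."""
    LazyDataset.compute_calls = 0
    current = LazyDataset(initial=[0.0] * 4)
    for _ in range(iterations):
        new = LazyDataset(current, lambda ws: [w + 1.0 for w in ws])
        if mode == "materialize_first":
            new.cache()
            new.count()          # materialize while the parent is still cached
            current.unpersist()  # safe: new no longer needs its parent
        elif mode == "unpersist_first":
            new.cache()
            current.unpersist()  # parent dropped before new is computed...
            new.count()          # ...so the whole lineage is recomputed
        else:
            new.count()          # action on an un-cached dataset
        current = new
    calls = LazyDataset.compute_calls  # transforms run inside the loop
    return current.compute(), calls


for mode in ("no_cache", "unpersist_first", "materialize_first"):
    weights, calls = train(5, mode)
    print(mode, weights, calls)
```

With five iterations, `no_cache` and `unpersist_first` both run 15 transformations (1 + 2 + ... + 5), while `materialize_first` runs only 5, because each new dataset is stored before its parent is released.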

Foundation policy on releases and Spark nightly builds

2015-07-10 Thread Sean Busbey
Hi Folks! I noticed that Spark website's download page lists nightly builds and instructions for accessing SNAPSHOT maven artifacts[1]. The ASF policy on releases expressly forbids this kind of publishing outside of the dev@spark community[2]. If you'd like to discuss having the policy updated

Re: Model parallelism with RDD

2015-07-10 Thread Shivaram Venkataraman
Yeah I can see that being the case -- caching implies creating objects that will be stored in memory. So there is a trade-off between storing data in memory but having to garbage collect it later vs. recomputing the data. Shivaram On Fri, Jul 10, 2015 at 9:49 PM, Ulanov, Alexander
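A rough count makes the recompute side of this trade-off concrete. It assumes, as a simplification, that each iteration applies one transformation of unit cost and runs one action, and that without caching iteration k re-runs all k transformations in its lineage:

```latex
C_{\text{recompute}}(K) = \sum_{k=1}^{K} k = \frac{K(K+1)}{2},
\qquad
C_{\text{cached}}(K) = \sum_{k=1}^{K} 1 = K
```

For K = 10 that is 55 versus 10 transformation runs. The cached path instead pays in memory: with the cache, count, then unpersist ordering, at most two iterations' copies are held at once, and every dropped copy must later be garbage collected.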

Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-10 Thread Sean McNamara
+1 Sean On Jul 8, 2015, at 11:55 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to be

Re: [VOTE] Release Apache Spark 1.4.1 (RC4)

2015-07-10 Thread Tom Graves
+1 Tom On Thursday, July 9, 2015 12:55 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.1! This release fixes a handful of known issues in Spark 1.4.0, listed here: http://s.apache.org/spark-1.4.1 The tag to

Re: Questions about Fault tolerance of Spark

2015-07-10 Thread MIKE HYNES
Gentle bump on this topic; how to test the fault tolerance and previous benchmark results are both things we are interested in as well. Mike. Original message: From: 牛兆捷 nzjem...@gmail.com, Date: 07-09-2015 04:19 (GMT-05:00), To: dev@spark.apache.org,

Re: PySpark vs R

2015-07-10 Thread Shivaram Venkataraman
The R and Python implementations differ in how they communicate with the JVM so there is no invariant there per se. Thanks Shivaram On Thu, Jul 9, 2015 at 10:40 PM, Vasili I. Galchin vigalc...@gmail.com wrote: Hello, Just trying to get up to speed ( a week .. pls be patient with me).