Re: Force Partitioner to use entire entry of PairRDD as key

2016-02-22 Thread Jay Luan
Thank you, that helps a lot.

On Mon, Feb 22, 2016 at 6:01 PM, Takeshi Yamamuro <linguin@gmail.com> wrote:
> You're correct, reduceByKey is just an example.
>
> On Tue, Feb 23, 2016 at 10:57 AM, Jay Luan <jaylu...@gmail.com> wrote:
>
>> Could you elaborate on

Re: Force Partitioner to use entire entry of PairRDD as key

2016-02-22 Thread Jay Luan
Could you elaborate on how this would work? From what I can tell, this maps each entry to a tuple whose first element is the original (key, value) pair and whose second element is always 0. The hash then varies widely, because we now hash something like ((1,4), 0) and ((1,3), 0) instead of just the key 1, so this mapping would create more even partitions. Why
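
For readers following the thread, the effect described above can be sketched in plain Python without Spark. This is only an illustration under stated assumptions: `partition_of` and `NUM_PARTITIONS` are hypothetical names, and Python's built-in `hash` stands in for whatever hash the real partitioner applies, so the exact partition numbers will not match what Spark produces.

```python
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Hash-partition a key: hash modulo the partition count.

    Python's built-in hash is a stand-in here, not Spark's actual hash.
    """
    return hash(key) % num_partitions


# A skewed pair dataset: every record shares the key 1.
pairs = [(1, v) for v in range(100)]

# Partitioning on the key alone sends every record to the same partition.
by_key = Counter(partition_of(k) for k, _ in pairs)

# Re-keying each record as ((k, v), 0) makes the whole entry the key,
# so the hash now depends on the value as well as the original key.
rekeyed = [((k, v), 0) for k, v in pairs]
by_entry = Counter(partition_of(key) for key, _ in rekeyed)

print("records per partition, keyed by k:     ", dict(by_key))
print("records per partition, keyed by (k, v):", dict(by_entry))
```

With the original key, all 100 records fall into a single partition. After re-keying, the records spread over several partitions because the value takes part in the hash. How even the spread is depends on the hash function and the data.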

RE: [MLLIB] Best way to extract RandomForest decision splits

2016-02-10 Thread Jay Luan
Thanks for the reply. I'd like to export the decision splits for each tree to an external file that is read elsewhere without using Spark. As far as I know, saving a model to a path writes a set of binary files that can only be loaded back into Spark. Is this correct?

On Feb 10, 2016 7:21 PM,

Re: How to run two operations on the same RDD simultaneously

2015-11-25 Thread Jay Luan
Ah, thank you so much, this is perfect

On Fri, Nov 20, 2015 at 3:48 PM, Ali Tajeldin EDU wrote:
> You can try to use an Accumulator (
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.Accumulator)
> to keep count in map1. Note that the final