Re: all values for a key must fit in memory

2014-05-25 Thread Nilesh
this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/all-values-for-a-key-must-fit-in-memory-tp6342p6791.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: all values for a key must fit in memory

2014-05-25 Thread Andrew Ash
for this for the meantime? I'm out of ideas. Thanks, Nilesh

Re: all values for a key must fit in memory

2014-05-25 Thread Nilesh
of which thankfully works with 0.9.1 too, no new API changes there. Cheers, Nilesh

Re: all values for a key must fit in memory

2014-05-25 Thread Nilesh
OK for me here, though it might turn out to be slow. Cheers, Nilesh PS: Can't wait for 1.0! ^_^ Looks like it's up to RC10 now.

Re: all values for a key must fit in memory

2014-04-21 Thread Sandy Ryza
Thanks Matei and Mridul - was basically wondering whether we would be able to change the shuffle to accommodate this after 1.0, and from your answers it sounds like we can. On Mon, Apr 21, 2014 at 12:31 AM, Mridul Muralidharan mri...@gmail.com wrote: As Matei mentioned, the Values is now an

Re: all values for a key must fit in memory

2014-04-20 Thread Mridul Muralidharan
An iterator does not imply data has to be memory resident. Think of merge sort output as an iterator (disk backed). Tom is actually planning to work on something similar with me, hopefully this month or next. Regards, Mridul On Sun, Apr 20, 2014 at 11:46 PM, Sandy Ryza
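Mridul's point can be illustrated with a small, self-contained Scala sketch. This is not Spark code. It assumes sorted runs have already been spilled to temporary files. A k-way merge then exposes the combined output as an ordinary `Iterator` while holding only one buffered head element per run in memory. The object and its helpers (`DiskBackedMerge`, `spillRun`, `readRun`, `mergeSorted`) are hypothetical names chosen for illustration.

```scala
import java.io.{File, PrintWriter}
import scala.collection.mutable
import scala.io.Source

// Hypothetical illustration, not part of Spark's API.
object DiskBackedMerge {
  // Write one already-sorted run to a temp file, one value per line.
  def spillRun(sortedRun: Seq[Int]): File = {
    val f = File.createTempFile("run-", ".txt")
    f.deleteOnExit()
    val out = new PrintWriter(f)
    try sortedRun.foreach(v => out.println(v)) finally out.close()
    f
  }

  // Lazily read a spilled run back; only the current line is in memory.
  // The file handle is closed once the run is exhausted.
  def readRun(f: File): Iterator[Int] = {
    val src = Source.fromFile(f)
    val lines = src.getLines()
    new Iterator[Int] {
      private var open = true
      def hasNext: Boolean = open && {
        val more = lines.hasNext
        if (!more) { src.close(); open = false }
        more
      }
      def next(): Int =
        if (hasNext) lines.next().toInt
        else throw new NoSuchElementException("run exhausted")
    }
  }

  // k-way merge of sorted iterators. The heap holds one iterator per run,
  // ordered by its current head (reversed, since PriorityQueue is a max-heap).
  def mergeSorted(runs: Seq[Iterator[Int]]): Iterator[Int] = {
    val ord = Ordering.by[BufferedIterator[Int], Int](_.head).reverse
    val heap = mutable.PriorityQueue.empty[BufferedIterator[Int]](ord)
    heap ++= runs.map(_.buffered).filter(_.hasNext)
    new Iterator[Int] {
      def hasNext: Boolean = heap.nonEmpty
      def next(): Int = {
        val it = heap.dequeue()       // run with the smallest head
        val v = it.next()
        if (it.hasNext) heap.enqueue(it) // re-insert with its new head
        v
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val files = Seq(Seq(1, 4, 9), Seq(2, 3, 10), Seq(5, 6, 7, 8)).map(spillRun)
    val merged = mergeSorted(files.map(readRun))
    println(merged.mkString(",")) // 1,2,3,4,5,6,7,8,9,10
  }
}
```

The priority queue is ordered on each run's current head, so memory use grows with the number of runs rather than with the total number of records.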

Re: all values for a key must fit in memory

2014-04-20 Thread Sandy Ryza
The issue isn't that the Iterator[P] can't be disk-backed. It's that, with a groupBy, each P is a (Key, Values) tuple, and the entire tuple is read into memory at once. The ShuffledRDD is agnostic to what goes inside P. On Sun, Apr 20, 2014 at 11:36 AM, Mridul Muralidharan
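Sandy's distinction can be sketched in plain Scala. This is again not Spark's `ShuffledRDD` code. It assumes the shuffle output arrives as an iterator of `(key, value)` pairs already grouped by key, such as the output of a disk-backed merge. A groupBy-style iterator must buffer every value for a key before it can emit the `(Key, Values)` tuple. An aggregation that folds values as they stream past keeps only one accumulator. The names `GroupVsStream`, `groupSorted`, and `foldSorted` are hypothetical.

```scala
// Hypothetical illustration, not part of Spark's API.
// Both functions assume the input is already grouped (e.g. sorted) by key.
object GroupVsStream {
  // groupBy-style: each emitted element is a (key, values) pair, so every
  // value for the current key is buffered in memory before it is returned.
  def groupSorted[K, V](sorted: Iterator[(K, V)]): Iterator[(K, Vector[V])] = {
    val in = sorted.buffered
    new Iterator[(K, Vector[V])] {
      def hasNext: Boolean = in.hasNext
      def next(): (K, Vector[V]) = {
        val key = in.head._1
        val values = Vector.newBuilder[V]
        while (in.hasNext && in.head._1 == key) values += in.next()._2
        (key, values.result())
      }
    }
  }

  // Streaming alternative: fold each key's values as they arrive, so only
  // the running accumulator (not the whole value list) is held per key.
  def foldSorted[K, V, A](sorted: Iterator[(K, V)], zero: A)(op: (A, V) => A): Iterator[(K, A)] = {
    val in = sorted.buffered
    new Iterator[(K, A)] {
      def hasNext: Boolean = in.hasNext
      def next(): (K, A) = {
        val key = in.head._1
        var acc = zero
        while (in.hasNext && in.head._1 == key) acc = op(acc, in.next()._2)
        (key, acc)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq("a" -> 1, "a" -> 2, "a" -> 3, "b" -> 10)
    println(groupSorted(pairs.iterator).toList)          // List((a,Vector(1, 2, 3)), (b,Vector(10)))
    println(foldSorted(pairs.iterator, 0)(_ + _).toList) // List((a,6), (b,10))
  }
}
```

Both iterators read their input lazily. They differ only in what each emitted element contains, which is why a disk-backed `Iterator[P]` does not by itself bound memory when each `P` holds a whole group.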