subject:"Advantage of using cache\(\)"

Re: Advantage of using cache()

2014-08-23 Thread Patrick Wendell

shuffle data to disk. So the only difference between the caching and non-caching versions is: .map { case (x, (n, i)) => (x, n) } - Thanks, Nieyuan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Advantage-of-using-cache-tp12480p12634.html Sent from the Apache

Re: Advantage of using cache()

2014-08-22 Thread Nieyuan

Because map-reduce tasks like join will save shuffle data to disk. So the only difference between the caching and non-caching versions is: .map { case (x, (n, i)) => (x, n) } - Thanks, Nieyuan -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Advantage-of-using

Re: Advantage of using cache()

2014-08-21 Thread Grzegorz Białek

Hi, thank you for your response. I removed the issues you mentioned. Now I read the RDDs from files, the whole rdd is cached, I don't use random, and rdd1 and rdd2 are identical. The RDDs that are joined contain 100k entries and the result contains 10m entries. After the join, rdd1 and rdd2 also contain 10m entries. Here

Advantage of using cache()

2014-08-20 Thread Grzegorz Białek

Hi, I tried to write a small program which shows that using cache() can speed up execution, but the results with and without cache were similar. Could you help me with this issue? I tried to compute an rdd and use it later in two places, and I thought that in the second usage this rdd would be recomputed, but it isn't:

Re: Advantage of using cache()

2014-08-20 Thread Patrick Wendell

Your rdd2 and rdd3 differ in two ways so it's hard to track the exact effect of caching. In rdd3, in addition to the fact that rdd will be cached, you are also doing a bunch of extra random number generation. So it will be hard to isolate the effect of caching. On Wed, Aug 20, 2014 at 7:48 AM,


5 matches
