Hi Kang,
You raise a good point. Spark does not automatically cache all your RDDs.
Why? Simply because the application may create many RDDs, and not all of
them are to be reused. After all, there is only so much memory available to
each executor, and caching an RDD adds some overhead, especially
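Caching is therefore opt-in, per RDD. A minimal spark-shell sketch (standard RDD API; MEMORY_ONLY is chosen here just for illustration):

```scala
import org.apache.spark.storage.StorageLevel

val data = sc.parallelize(0 to 50)      // sc is the shell's SparkContext
data.persist(StorageLevel.MEMORY_ONLY)  // mark for caching; nothing is computed yet
data.count()                            // first action computes and caches the partitions
data.count()                            // later actions read the cached partitions
data.unpersist()                        // release the memory when done
```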
Sent: Friday, June 27, 2014 10:08 AM
To: user
Subject: Re: About StorageLevel
Thank you Andrew, that's very helpful.
I still have some doubts about a simple trial: I opened a spark shell in
local mode and typed in

val r = sc.parallelize(0 to 50)
val r2 = r.keyBy(x => x).groupByKey(10)
and then I invoked count on r2 more than once. The later jobs skip the shuffle
stage; it behaves as if
persist(StorageLevel.DISK_ONLY) were called implicitly?
Regards,
Kang Liu
From: Liu, Raymond
Date: 2014-06-27 11:02
To: user@spark.apache.org
Subject: RE: About StorageLevel
I think there is a shuffle stage involved. And the future count jobs will
depend on the first
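[Editor's sketch of the effect being described, assuming the usual shuffle behavior: groupByKey's map output is written to the executors' local disks, and the scheduler reuses those shuffle files on later jobs instead of recomputing the map stage:]

```scala
val pairs = sc.parallelize(0 to 50).keyBy(x => x)
val grouped = pairs.groupByKey(10)  // groupByKey introduces a shuffle
grouped.count()  // job 1: runs the map stage and writes shuffle files to local disk
grouped.count()  // job 2: existing shuffle output is found, so the map stage is skipped
```

This is why the second count looks like an implicit DISK_ONLY persist even though no storage level was set.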