Re: Why cached RDD is recomputed again?

shahab Wed, 18 Feb 2015 07:59:53 -0800

Thanks Sean, but I don't think the problem is that the RDD doesn't fit into
memory, because:
1- I can see in the UI that 100% of the RDD is cached (moreover, the RDD is
quite small, 100 MB, while the worker has 1.5 GB)
2- I also tried MEMORY_AND_DISK, but it made absolutely no difference!
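
For reference, the two checks above could be reproduced programmatically roughly as follows. This is only a minimal sketch, not the actual job: the RDD contents, the name "demo", the app name, and the local master are all placeholders assumed for illustration. It persists with MEMORY_AND_DISK, forces materialization, and then prints how many partitions Spark reports as cached.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheCheck {
  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for illustration only.
    val conf = new SparkConf().setAppName("cache-check").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical stand-in for the real RDD from the job.
    val rdd = sc.parallelize(1 to 1000000, 8).map(_ * 2).setName("demo")

    // Partitions that do not fit in memory are written to disk instead of dropped.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    rdd.count() // the first action materializes the cache

    // The storage level actually attached to the RDD.
    println(rdd.getStorageLevel.description)

    // Cached vs. total partitions, similar to what the Storage tab of the UI shows.
    sc.getRDDStorageInfo.filter(_.id == rdd.id).foreach { info =>
      println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} partitions cached")
    }

    // The lineage of the RDD, to see which parent stages a later action depends on.
    println(rdd.toDebugString)

    sc.stop()
  }
}
```

If the cached-partition count is lower than the total, some partitions would have to be recomputed on the next action; if the two counts match, the recomputation probably comes from somewhere else in the lineage.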


Probably I have messed up somewhere else!
Do you have any other ideas about where I should look for the cause?

best,
/Shahab

On Wed, Feb 18, 2015 at 4:22 PM, Sean Owen <so...@cloudera.com> wrote:

> The most likely explanation is that you wanted to put all the
> partitions in memory and they don't all fit. Unless you asked to
> persist to memory or disk, some partitions will simply not be cached.
>
> Consider using MEMORY_AND_DISK persistence.
>
> This can also happen if blocks were lost due to node failure.
>
> On Wed, Feb 18, 2015 at 3:19 PM, shahab <shahab.mok...@gmail.com> wrote:
> > Hi,
> >
> > I have a cached RDD (I can see in the UI that it is cached), but when I use
> > this RDD, I can see that it is partially recomputed. It is
> > "partially" because I can see in the UI that some tasks are skipped (have a
> > look at the attached figure).
> >
> > Now the question is: what causes a cached RDD to be recomputed,
> > and why are some tasks skipped and some not?
> >
> > best,
> > /Shahab
