Thanks Sean, but I don't think that running out of memory is the cause, because:
1- I can see in the UI that 100% of the RDD is cached (moreover, the RDD is quite small, 100 MB, while the worker has 1.5 GB).
2- I also tried MEMORY_AND_DISK, and it made absolutely no difference!
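As a rough sanity check on point 1, here is a back-of-envelope estimate in Python. It assumes the Spark 1.x defaults (`spark.storage.memoryFraction` = 0.6 and the internal `spark.storage.safetyFraction` = 0.9), that the 1.5 GB is the executor heap, and the Spark tuning guide's rule of thumb that deserialized Java objects can take 2-5x the space of the raw data. The function names are hypothetical.

```python
# Assumed Spark 1.x defaults; check your version's configuration docs.
STORAGE_MEMORY_FRACTION = 0.6   # spark.storage.memoryFraction
STORAGE_SAFETY_FRACTION = 0.9   # spark.storage.safetyFraction (internal)


def storage_budget_mb(heap_mb: float) -> float:
    """Approximate memory available for cached blocks on one executor (hypothetical helper)."""
    return heap_mb * STORAGE_MEMORY_FRACTION * STORAGE_SAFETY_FRACTION


def fits_in_storage(data_mb: float, heap_mb: float, expansion: float) -> bool:
    """True if the data, inflated by an assumed object-expansion factor, fits the budget."""
    return data_mb * expansion <= storage_budget_mb(heap_mb)


if __name__ == "__main__":
    budget = storage_budget_mb(1536)          # 1.5 GB heap, assumed
    print(f"storage budget: {budget:.1f} MB")  # storage budget: 829.4 MB
    # Even at the pessimistic end of the 2-5x rule of thumb, 100 MB fits.
    print(fits_in_storage(100, 1536, 5.0))     # True
```

Under these assumptions a 100 MB RDD should fit comfortably, which is consistent with the UI reporting it as fully cached.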
Probably I have messed up somewhere else! Do you have any other idea where I should look for the cause?

best,
/Shahab

On Wed, Feb 18, 2015 at 4:22 PM, Sean Owen <so...@cloudera.com> wrote:

> The most likely explanation is that you wanted to put all the
> partitions in memory and they don't all fit. Unless you asked to
> persist to memory or disk, some partitions will simply not be cached.
>
> Consider using MEMORY_AND_DISK persistence.
>
> This can also happen if blocks were lost due to node failure.
>
> On Wed, Feb 18, 2015 at 3:19 PM, shahab <shahab.mok...@gmail.com> wrote:
> > Hi,
> >
> > I have a cached RDD (I can see in the UI that it is cached), but when I
> > use this RDD, I can see that it is partially recomputed. It is
> > "partially" because I can see in the UI that some tasks are skipped
> > (have a look at the attached figure).
> >
> > Now my questions are: 1) what causes a cached RDD to be recomputed?
> > and 2) why are some tasks skipped and others not?
> >
> > best,
> > /Shahab
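To make Sean's explanation concrete, here is a toy model in Python. It is not Spark's actual BlockManager or caching code; `ToyBlockStore`, `get_or_compute` and `run_job` are hypothetical names. A partition found in the store is reused, while a missing one is recomputed from its lineage and then offered to the store. Under MEMORY_ONLY a partition that does not fit is simply not cached, so it is recomputed on every job; under MEMORY_AND_DISK it is spilled to disk instead. Unlike a real executor, the toy never evicts blocks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MEMORY_ONLY = "MEMORY_ONLY"
MEMORY_AND_DISK = "MEMORY_AND_DISK"


@dataclass
class ToyBlockStore:
    """Hypothetical stand-in for one executor's cache; sizes are in arbitrary units."""
    memory_capacity: int
    level: str = MEMORY_ONLY
    memory: Dict[int, list] = field(default_factory=dict)
    disk: Dict[int, list] = field(default_factory=dict)
    used: int = 0

    def put(self, pid: int, data: list) -> None:
        size = len(data)
        if self.used + size <= self.memory_capacity:
            self.memory[pid] = data
            self.used += size
        elif self.level == MEMORY_AND_DISK:
            self.disk[pid] = data  # spill to disk instead of dropping
        # MEMORY_ONLY: the partition is silently not cached.

    def get(self, pid: int) -> Optional[list]:
        if pid in self.memory:
            return self.memory[pid]
        return self.disk.get(pid)


def get_or_compute(store: ToyBlockStore, pid: int,
                   compute: Callable[[int], list], recomputed: List[int]) -> list:
    """Reuse a cached partition, or recompute it and try to cache it."""
    cached = store.get(pid)
    if cached is not None:
        return cached
    recomputed.append(pid)
    data = compute(pid)
    store.put(pid, data)
    return data


def run_job(store: ToyBlockStore, num_partitions: int,
            compute: Callable[[int], list]) -> List[int]:
    """Touch every partition once; return the ids that had to be recomputed."""
    recomputed: List[int] = []
    for pid in range(num_partitions):
        get_or_compute(store, pid, compute, recomputed)
    return recomputed


def make_partition(pid: int) -> list:
    return list(range(pid * 10, pid * 10 + 10))  # every partition has size 10


if __name__ == "__main__":
    mem_only = ToyBlockStore(memory_capacity=25)
    print(run_job(mem_only, 4, make_partition))  # [0, 1, 2, 3]
    print(run_job(mem_only, 4, make_partition))  # [2, 3]: they never fit
    spill = ToyBlockStore(memory_capacity=25, level=MEMORY_AND_DISK)
    print(run_job(spill, 4, make_partition))     # [0, 1, 2, 3]
    print(run_job(spill, 4, make_partition))     # []
```

In this model, switching to MEMORY_AND_DISK removes the repeated recomputation only when capacity was the cause; if, as Shahab reports, it makes no difference, the cause is likely elsewhere (for example lost blocks, as Sean mentions).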