Re: Caching small Rdd's take really long time and Spark seems frozen

2018-08-24 Thread Sonal Goyal
Without knowing more about your application, it is hard to say. Maybe it runs faster in local mode because there is no shuffling, etc.? The Spark UI would be your best bet for finding out which stage is slowing things down. On Fri 24 Aug, 2018, 3:26 PM Guillermo Ortiz, wrote: > Another test I just

Re: Caching small Rdd's take really long time and Spark seems frozen

2018-08-24 Thread Guillermo Ortiz
Another test I just did was to run with local[X], and the problem doesn't happen there. Could it be a communication problem? 2018-08-23 22:43 GMT+02:00 Guillermo Ortiz : > it's a complex DAG before the point I cache the RDD, they are some joins, > filter and maps before caching data, but most of the times it

Re: Caching small Rdd's take really long time and Spark seems frozen

2018-08-23 Thread Guillermo Ortiz
It's a complex DAG before the point where I cache the RDD: there are some joins, filters and maps before the data is cached, but most of the time it takes almost no time. I could understand it if it took the same time every time to process or cache the data. Besides, it seems random and they

Re: Caching small Rdd's take really long time and Spark seems frozen

2018-08-23 Thread Sonal Goyal
How are these small RDDs created? Could the blockage be in computing them rather than in caching them? Thanks, Sonal Nube Technologies On Thu, Aug 23, 2018 at 6:38 PM, Guillermo Ortiz wrote: > I use spark with caching with

Caching small Rdd's take really long time and Spark seems frozen

2018-08-23 Thread Guillermo Ortiz
I use Spark with caching via the persist method. I have several RDDs that I cache, but some of them are pretty small (about 300 KB). Most of the time it works well and the whole job usually takes 1 s, but sometimes it takes about 40 s to store 300 KB in the cache. If I go to the SparkUI->Cache, I can see