I use Spark caching via the persist method. I cache several RDDs, some of which are pretty small (about 300 KB). Most of the time this works well and the whole job lasts about 1 s, but sometimes it takes about 40 s to store 300 KB in the cache.
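
Roughly, the caching pattern looks like this (a minimal sketch; the RDD contents and names are placeholders, not the real job):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Minimal sketch of the caching pattern; the data is a stand-in
    // for the real ~300 KB RDDs.
    val sc = new SparkContext(new SparkConf().setAppName("cache-example"))

    val smallRdd = sc.parallelize(1 to 10000).map(i => (i, i.toString))
    smallRdd.persist(StorageLevel.MEMORY_ONLY) // only marks the RDD for caching
    smallRdd.count()                           // first action materializes the cache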

If I go to the Spark UI -> Cache page, I can see the cached percentage increase up to 83% (250 KB) and then stall for a while. If I check the event timeline in the Spark UI, I can see that when this happens there is one node whose tasks take a very long time. It can be any node in the cluster; it's not always the same one.
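
Roughly the same information the Cache/Storage tab shows can also be read from the driver with sc.getRDDStorageInfo (a developer API); a minimal sketch, using the sc from the snippet above:

    // Print how much of each persisted RDD is actually cached, and where.
    sc.getRDDStorageInfo.foreach { info =>
      val pct = 100.0 * info.numCachedPartitions / info.numPartitions
      println(f"RDD ${info.id} (${info.name}): $pct%.0f%% cached, " +
        f"${info.memSize} B in memory, ${info.diskSize} B on disk")
    }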

In the Spark executor logs I can see that it takes about 40 s to store 3.7 KB when this problem occurs:

    INFO  2018-08-23 12:46:58 Logging.scala:54 - org.apache.spark.storage.BlockManager: Found block rdd_1705_23 locally
    INFO  2018-08-23 12:47:38 Logging.scala:54 - org.apache.spark.storage.memory.MemoryStore: Block rdd_1692_7 stored as bytes in memory (estimated size 3.7 KB, free 1048.0 MB)
    INFO  2018-08-23 12:47:38 Logging.scala:54 - org.apache.spark.storage.BlockManager: Found block rdd_1692_7 locally

I have tried MEMORY_ONLY, MEMORY_AND_DISK_SER and so on with the same results. I have checked disk I/O (although with MEMORY_ONLY I guess that shouldn't matter) and I can't see any problem. This happens randomly, but in roughly 25% of the jobs.
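
For completeness, switching levels between runs looks roughly like this (same placeholder RDD as above; Spark does not allow changing the storage level of an already-persisted RDD, so it has to be unpersisted first):

    // Drop the cached blocks, then re-persist with a serialized level.
    smallRdd.unpersist(blocking = true)
    smallRdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
    smallRdd.count() // re-materializes the cache under the new level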

Any idea what could be happening?
