Hi All,

My Spark Streaming jobs are filling up the disk within a short amount of
time (under 10 minutes). I have 10 GB of disk space, and it fills up at the
SPARK_LOCAL_DIRS location, which in my case is set to
/usr/local/spark/temp.

There are a lot of files like input-0-1489072623600, each somewhere
between 3 MB and 8 MB.

I have a batch interval of 1 second, but some batches are taking around
35 seconds, and I am not entirely sure why. In the driver UI I can see the
time is spent in a transformation where I use a window interval of 1 minute
and a slide interval of 1 second; the line the driver UI points to is
rdd.collect(). I believe I am doing a very basic transformation (a
simplified sketch is below).
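
Roughly, the relevant part of the job looks like this (a simplified
sketch; the socket source, host/port, and names are placeholders for the
real input):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// 1-second batch interval, as described above
val conf = new SparkConf().setAppName("WindowedStream")
val ssc = new StreamingContext(conf, Seconds(1))

// Placeholder source; the real job uses a receiver-based input stream,
// which I assume is what writes the input-0-* block files
val lines = ssc.socketTextStream("localhost", 9999)

// 1-minute window, sliding every 1 second
val windowed = lines.window(Minutes(1), Seconds(1))

windowed.foreachRDD { rdd =>
  // The driver UI points at this collect() as the slow step
  val batch = rdd.collect()
  // ... very basic transformation on the collected records ...
}

ssc.start()
ssc.awaitTermination()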

Eventually, after about 10 minutes, jobs start failing with "No space left
on device" errors.

Any ideas?

I am using Spark 2.1.0 on a standalone cluster.

Thanks,
