Hi Abhi - are you running on Mesos perchance?

If so, then with Spark < 1.6 you will be hitting https://issues.apache.org/jira/browse/SPARK-10975
With Spark >= 1.6:
https://issues.apache.org/jira/browse/SPARK-12430
and also be aware of:
https://issues.apache.org/jira/browse/SPARK-12583

On 25/01/2016 07:14, Abhishek Anand wrote:
Hi All,

How long are the shuffle files and data files stored in the block manager folder on the workers?

I have a Spark Streaming job with a window duration of 2 hours and a slide interval of 15 minutes.

When I execute the following command in my block manager path

find . -type f -cmin +150 -name "shuffle*" -exec ls {} \;

I see a lot of files, which means they are not getting cleared, although I was expecting them to be cleaned up once they were older than the window.

As a result, the total size keeps increasing and uses up space on the disk.
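
To put a number on that growth, the find command above can be mirrored in a small script that also totals the bytes held by old shuffle files. This is only a rough sketch: the function names (stale_shuffle_files, summarize) are made up for illustration, the 150-minute threshold and the "shuffle*" pattern are taken from the find command, and the age comparison only approximates how find rounds -cmin values.

```python
import fnmatch
import os
import stat
import time


def stale_shuffle_files(root, min_age_minutes=150, pattern="shuffle*", now=None):
    """Yield (path, size_bytes, age_minutes) for regular files under `root`
    whose name matches `pattern` and whose ctime is older than the threshold.

    Hypothetical helper, roughly equivalent to:
        find . -type f -cmin +150 -name "shuffle*"
    """
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished between listing and stat
            if not stat.S_ISREG(st.st_mode):
                continue  # skip anything that is not a regular file
            age_minutes = (now - st.st_ctime) / 60.0
            if age_minutes > min_age_minutes:
                yield path, st.st_size, age_minutes


def summarize(root, **kwargs):
    """Return (file_count, total_bytes) for the stale shuffle files under `root`."""
    files = list(stale_shuffle_files(root, **kwargs))
    return len(files), sum(size for _path, size, _age in files)


if __name__ == "__main__":
    import sys

    # Pass the block manager path as the first argument; defaults to ".".
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    count, total = summarize(root)
    print(f"{count} stale shuffle files, {total / 1e6:.1f} MB")
```

Running it periodically from the block manager path and comparing the totals should show whether the old files are ever removed or only accumulate.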

Please suggest how to get rid of these files, and help me understand this behaviour.



Thanks !!!
Abhi

--
*Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com>
_____________________________________________________
Office: 3rd Floor, The Angel Office, 2 Angel Square, London, EC1V 1NY
Phone #: +44 777-377-8251
Skype: abridgett |@adrianbridgett <http://twitter.com/adrianbridgett>| LinkedIn link <https://uk.linkedin.com/in/abridgett>
_____________________________________________________
