[ 
https://issues.apache.org/jira/browse/SPARK-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692692#comment-14692692
 ] 

Pat Ferrel commented on SPARK-4796:
-----------------------------------

Why is this marked resolved? Spark does indeed leave a lot of files behind, and 
unless you go looking you'd never know. It sounds like the only safe way to 
remove them is to shut down Spark and delete them.

I only skimmed the issue, so sorry if I missed something. 15 GB on the MBP and 
counting :-)


> Spark does not remove temp files
> --------------------------------
>
>                 Key: SPARK-4796
>                 URL: https://issues.apache.org/jira/browse/SPARK-4796
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 1.1.0
>         Environment: I'm running Spark on Mesos and the Mesos slaves are 
> docker containers. Spark 1.1.0, elasticsearch spark 2.1.0-Beta3, mesos 0.20.0, 
> docker 1.2.0.
>            Reporter: Ian Babrou
>
> I started a job that cannot fit into memory and got "no space left on 
> device". That was fair, because the docker containers only have 10 GB of disk 
> space and some is already taken by the OS.
> But then I found that when the job failed it didn't release any disk space, 
> leaving the container without any free disk space.
> Then I decided to check whether Spark removes temp files at all, because many 
> mesos slaves had /tmp/spark-local-* directories. Apparently some garbage 
> stays behind after a Spark task finishes. I attached strace to a running job:
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/12/temp_8a73fcc2-4baa-499a-8add-0161f918de8a")
>  = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/31/temp_47efd04b-d427-4139-8f48-3d5d421e9be4")
>  = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/15/temp_619a46dc-40de-43f1-a844-4db146a607c6")
>  = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/05/temp_d97d90a7-8bc1-4742-ba9b-41d74ea73c36"
>  <unfinished ...>
> [pid 30212] <... unlink resumed> )      = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/36/temp_a2deb806-714a-457a-90c8-5d9f3247a5d7")
>  = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/04/temp_afd558f1-2fd0-48d7-bc65-07b5f4455b22")
>  = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/32/temp_a7add910-8dc3-482c-baf5-09d5a187c62a"
>  <unfinished ...>
> [pid 30212] <... unlink resumed> )      = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/21/temp_485612f0-527f-47b0-bb8b-6016f3b9ec19")
>  = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/12/temp_bb2b4e06-a9dd-408e-8395-f6c5f4e2d52f")
>  = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/1e/temp_825293c6-9d3b-4451-9cb8-91e2abe5a19d"
>  <unfinished ...>
> [pid 30212] <... unlink resumed> )      = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/15/temp_43fbb94c-9163-4aa7-ab83-e7693b9f21fc")
>  = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/3d/temp_37f3629c-1b09-4907-b599-61b7df94b898"
>  <unfinished ...>
> [pid 30212] <... unlink resumed> )      = 0
> [pid 30212] 
> unlink("/tmp/spark-local-20141209091330-48b5/35/temp_d18f49f6-1fb1-4c01-a694-0ee0a72294c0")
>  = 0
> And after the job is finished, some files are still there:
> /tmp/spark-local-20141209091330-48b5/
> /tmp/spark-local-20141209091330-48b5/11
> /tmp/spark-local-20141209091330-48b5/11/shuffle_0_1_4
> /tmp/spark-local-20141209091330-48b5/32
> /tmp/spark-local-20141209091330-48b5/04
> /tmp/spark-local-20141209091330-48b5/05
> /tmp/spark-local-20141209091330-48b5/0f
> /tmp/spark-local-20141209091330-48b5/0f/shuffle_0_1_2
> /tmp/spark-local-20141209091330-48b5/3d
> /tmp/spark-local-20141209091330-48b5/0e
> /tmp/spark-local-20141209091330-48b5/0e/shuffle_0_1_1
> /tmp/spark-local-20141209091330-48b5/15
> /tmp/spark-local-20141209091330-48b5/0d
> /tmp/spark-local-20141209091330-48b5/0d/shuffle_0_1_0
> /tmp/spark-local-20141209091330-48b5/36
> /tmp/spark-local-20141209091330-48b5/31
> /tmp/spark-local-20141209091330-48b5/12
> /tmp/spark-local-20141209091330-48b5/21
> /tmp/spark-local-20141209091330-48b5/10
> /tmp/spark-local-20141209091330-48b5/10/shuffle_0_1_3
> /tmp/spark-local-20141209091330-48b5/1e
> /tmp/spark-local-20141209091330-48b5/35
> If I look at my mesos slaves, there are mostly "shuffle" files; the overall 
> picture for a single node:
> root@web338:~# find /tmp/spark-local-20141* -type f | fgrep shuffle | wc -l
> 781
> root@web338:~# find /tmp/spark-local-20141* -type f | fgrep -v shuffle | wc -l
> 10
> root@web338:~# find /tmp/spark-local-20141* -type f | fgrep -v shuffle
> /tmp/spark-local-20141119144512-67c4/2d/temp_9056f380-3edb-48d6-a7df-d4896f1e1cc3
> /tmp/spark-local-20141119144512-67c4/3d/temp_e005659b-eddf-4a34-947f-4f63fcddf111
> /tmp/spark-local-20141119144512-67c4/16/temp_71eba702-36b4-4e1a-aebc-20d2080f1705
> /tmp/spark-local-20141119144512-67c4/0d/temp_8037b9db-2d8a-4786-a554-a8cad922bf5e
> /tmp/spark-local-20141119144512-67c4/24/temp_f0e4cc43-6cc9-42a7-882d-f8a031fa4dc3
> /tmp/spark-local-20141119144512-67c4/29/temp_a8bbe2cb-f590-4b71-8ef8-9c0324beddc7
> /tmp/spark-local-20141119144512-67c4/3a/temp_9fc08519-f23a-40ac-a3fd-e58df6871460
> /tmp/spark-local-20141119144512-67c4/1e/temp_d66668ab-2999-48af-a136-84cfd6f5f6cb
> /tmp/spark-local-20141205110922-f78e/0a/temp_7409add5-e6ff-46e5-ae3f-6a4c7b2ddf8f
> /tmp/spark-local-20141205111026-0b53/01/temp_72024c94-7512-4692-8bd1-ef2417143d8c
> Conclusions:
> 1. Shuffle files should be removed, but they stay.
> 2. Temp files should always be removed, but they stay.
> Maybe we should unlink temp and shuffle files immediately after creation, so 
> they are removed even if Spark fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
