Interesting. TD, can you please throw some light on why this is and point
to the relevant code in the Spark repo? It would help in better understanding
the things that can affect a long-running streaming job.
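
For reference, below is a rough, untested sketch of what TD's suggestion
(periodically triggering a GC on the driver) might look like in a Scala driver
program; the executor name and the 10-minute interval are only illustrative.
As far as I understand, the driver-side ContextCleaner in the Spark repo is
what actually removes the shuffle files once the corresponding driver objects
are garbage collected.

    import java.util.concurrent.{Executors, TimeUnit}

    // Periodically hint a GC on the driver so that unreferenced RDD/shuffle objects
    // get collected, which in turn lets Spark clean up their shuffle files on the workers.
    val gcExecutor = Executors.newSingleThreadScheduledExecutor()
    gcExecutor.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = System.gc()  // only a hint to the JVM, not a guarantee
    }, 10, 10, TimeUnit.MINUTES)              // every 10 minutes, as suggested below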
On Aug 21, 2015 1:44 PM, "Tathagata Das" <t...@databricks.com> wrote:

> Could you periodically (say, every 10 mins) run System.gc() on the driver?
> The cleaning up of shuffles is tied to garbage collection.
>
>
> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma <sharmagaura...@gmail.com>
> wrote:
>
>> Hi All,
>>
>>
>> I have a 24x7 running streaming process, which runs on 2-hour windowed
>> data.
>>
>> The issue I am facing is that my worker machines are running OUT OF DISK space.
>>
>> I checked that the SHUFFLE FILES are not getting cleaned up.
>>
>>
>> /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438-87a2-b0f2becfac6f/blockmgr-c905b93b-c817-4124-a774-be1e706768c1//00/shuffle_2739_5_0.data
>>
>> Ultimately the machines run out of disk space.
>>
>>
>> I read about the *spark.cleaner.ttl* config param, which, as far as I can
>> understand from the documentation, cleans up all the metadata beyond the
>> time limit.
>>
>> I went through https://issues.apache.org/jira/browse/SPARK-5836;
>> it says resolved, but there is no code commit.
>>
>> Can anyone please throw some light on this issue?
>>
>>
>>
>
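
Regarding the *spark.cleaner.ttl* parameter mentioned in the quoted mail, a
minimal, illustrative sketch of setting it is below. The app name and TTL
value are made up; note that this TTL-based cleanup also forgets persisted
RDDs and other metadata older than the limit, so it can break jobs that still
need that data.

    import org.apache.spark.SparkConf

    // spark.cleaner.ttl takes a duration in seconds; metadata (and persisted RDDs)
    // older than this is forgotten, so pick a value comfortably larger than the
    // longest window/retention the job needs.
    val conf = new SparkConf()
      .setAppName("two-hour-window-streaming")  // illustrative name
      .set("spark.cleaner.ttl", "14400")        // e.g. 4 hours, well above the 2-hour window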
