[
https://issues.apache.org/jira/browse/TEZ-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774811#comment-17774811
]
Mudit Sharma commented on TEZ-4518:
-----------------------------------
[~rajesh.balamohan], correct, the right value can depend on the application.
At the cluster level, though, we want the limit set to a high number purely as
a guardrail. Since the sort buffer size is roughly constant, capping the number
of spill files also caps the amount of data a task can write, and in turn the
amount of spill data on a node. When needed, both the sort buffer size and this
limit can be configured at the job level.
Does that make sense?
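To make the guardrail arithmetic concrete, here is a minimal sketch. It is not
the code from the PR: the class, method, and parameter names are hypothetical,
and it assumes each spill file is at most about one sort buffer in size.

```java
import java.io.IOException;

/** Hypothetical sketch of a per-task spill-file guardrail; not taken from the PR. */
class SpillGuardrail {
    private final int spillFileLimit; // hypothetical per-task limit on spill files
    private int spillCount = 0;

    SpillGuardrail(int spillFileLimit) {
        this.spillFileLimit = spillFileLimit;
    }

    /** Call once per spill file created; fails the task once the limit is exceeded. */
    void onSpill() throws IOException {
        spillCount++;
        if (spillCount > spillFileLimit) {
            throw new IOException("Spill file count " + spillCount
                + " exceeded limit " + spillFileLimit);
        }
    }

    /**
     * Rough upper bound on spill bytes a task can write, assuming each spill
     * is no larger than the sort buffer (an assumption, not a guarantee).
     */
    static long approxMaxSpillBytes(int sortBufferMb, int spillFileLimit) {
        return (long) sortBufferMb * 1024L * 1024L * spillFileLimit;
    }
}
```

Under that assumption, a 1024 MB sort buffer with a limit of 100 spill files
bounds a task at roughly 100 GiB of spill data, which is the kind of
cluster-level ceiling meant above.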
> Limit number of spill files getting created
> -------------------------------------------
>
> Key: TEZ-4518
> URL: https://issues.apache.org/jira/browse/TEZ-4518
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Mudit Sharma
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Hi,
>
> We have been facing issues where the disks on many of our cluster nodes fill
> up because rogue applications create a lot of spill data.
> We would like to fail the application if it writes more than a threshold
> number of spill files.
> Please let us know whether any such capability is already supported.
>
> If it is not, we propose supporting it via a config. We have opened a PR for
> this:
> https://github.com/apache/tez/pull/312
> Please let us know your thoughts on it.