[ https://issues.apache.org/jira/browse/PIG-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-3288:
-------------------------------

    Attachment: PIG-3288.patch

Attached is a patch that implements the following:
* A new property called {{pig.exec.hdfs.files.max.limit}}.
* When this property is set, MRLauncher periodically monitors a counter ({{CREATED_FILES_COUNTER}}).
* Since the number of files created by a mapper/reducer is RecordWriter-specific, each storage is responsible for incrementing the counter appropriately. As a reference example, I added code that increments the counter in {{PigLineRecordWriter}} for PigStorage.

> Kill jobs if the number of output files is over a configurable limit
> --------------------------------------------------------------------
>
>                 Key: PIG-3288
>                 URL: https://issues.apache.org/jira/browse/PIG-3288
>             Project: Pig
>          Issue Type: Wish
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12
>
>         Attachments: PIG-3288.patch
>
>
> I ran into a situation where a Pig job tried to create too many files on
> HDFS and overloaded the NameNode. To prevent such events, it would be nice
> if we could set an upper limit on the number of files that a Pig job can
> create.
> In fact, Hive has a property called "hive.exec.max.created.files". The idea
> is that each mapper/reducer increments a counter every time it creates a
> file. Then, MRLauncher periodically checks whether the number of files
> created so far has exceeded the upper limit. If so, we kill the running
> jobs and exit.
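The mechanism described above can be sketched as follows. This is a simplified, self-contained model, not Pig's actual code: the property name {{pig.exec.hdfs.files.max.limit}} and the counter name come from the patch, but the class, the AtomicLong stand-in for the Hadoop counter, and the method names here are hypothetical illustrations of the polling logic.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of the created-files limit check. In the real patch,
 * the counter is a Hadoop MapReduce counter (CREATED_FILES_COUNTER) that each
 * storage's RecordWriter increments, and MRLauncher polls it while jobs run.
 * Here an AtomicLong stands in for the Hadoop counter so the logic is
 * self-contained.
 */
public class CreatedFilesMonitor {
    private final long maxFiles;           // -1 means the check is disabled
    private final AtomicLong createdFiles; // stand-in for the Hadoop counter

    public CreatedFilesMonitor(Properties conf, AtomicLong counter) {
        // Read the limit from the new property; default to disabled.
        this.maxFiles = Long.parseLong(
                conf.getProperty("pig.exec.hdfs.files.max.limit", "-1"));
        this.createdFiles = counter;
    }

    /** Each storage's RecordWriter would call this when it opens a new file. */
    public void recordFileCreated() {
        createdFiles.incrementAndGet();
    }

    /** MRLauncher would call this periodically; true means "kill the jobs". */
    public boolean overLimit() {
        return maxFiles >= 0 && createdFiles.get() > maxFiles;
    }
}
```

The key design point, as in Hive's {{hive.exec.max.created.files}}, is that the tasks only increment the counter; the decision to kill jobs is made centrally by the launcher, which sees the aggregated count across all mappers and reducers.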