[ https://issues.apache.org/jira/browse/PIG-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687624#comment-13687624 ]

Aniket Mokashi commented on PIG-3288:
-------------------------------------

The implementation is generic enough that the counter need not count the number 
of files; it can count arbitrary metrics and kill the job if the limit is 
exceeded. Should we rename the property from "pig.exec.created.files.max.limit" 
to something else?
Also, in the StoreFunc, you are relying on the fact that for each new file the 
StoreFunc is re-initialized as a new object. Is that guaranteed behavior?
                
> Kill jobs if the number of output files is over a configurable limit
> --------------------------------------------------------------------
>
>                 Key: PIG-3288
>                 URL: https://issues.apache.org/jira/browse/PIG-3288
>             Project: Pig
>          Issue Type: Wish
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12
>
>         Attachments: PIG-3288-2.patch, PIG-3288-3.patch, PIG-3288-4.patch, 
> PIG-3288.patch
>
>
> I ran into a situation where a Pig job tried to create too many files on HDFS 
> and overloaded the NameNode. To prevent such events, it would be nice if we 
> could set an upper limit on the number of files that a Pig job can create.
> In fact, Hive has a property called "hive.exec.max.created.files". The idea 
> is that each mapper/reducer increments a counter every time it creates a 
> file. Then, MRLauncher periodically checks whether the number of files 
> created so far has exceeded the upper limit. If so, we kill the running jobs 
> and exit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
