[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115120#comment-13115120 ]

Dmitriy V. Ryaboy commented on PIG-2208:
----------------------------------------

It'd be nice if we could flip that bit automatically during runtime, but I suppose that requires changes to MR code. Ok, you have my begrudging +1 :)

> Restrict number of PIG generated Hadoop counters
> ------------------------------------------------
>
>                 Key: PIG-2208
>                 URL: https://issues.apache.org/jira/browse/PIG-2208
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.9.1
>
>         Attachments: PIG-2208.patch
>
>
> Pig 0.8 implemented Hadoop counters to track the number of records read for
> each input and the number of records written for each output (PIG-1389 &
> PIG-1299). On the other hand, Hadoop imposes a limit on per-job counters
> (MAPREDUCE-1943), and jobs will fail if the counters exceed the limit.
> Therefore we need a way to cap the number of PIG generated counters.
> Here are the two options:
> 1. Add an integer property (e.g., pig.counter.limit) to the pig property file
> (e.g., 20). If the number of inputs of a job exceeds this number, the input
> counters are disabled. Similarly, if the number of outputs of a job exceeds
> this number, the output counters are disabled.
> 2. Add a boolean property (e.g., pig.disable.counters) to the pig property
> file (default: false). If this property is set to true, then the PIG
> generated counters are disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
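The two options in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in PIG-2208.patch: the class CounterPolicy and its method are hypothetical, the property names are the examples given in the description (a later comment in the thread spells the boolean one "pig.disable.counter"), and the sketch combines both options although the issue presents them as alternatives.

```java
import java.util.Properties;

// Hypothetical helper, not Pig's actual implementation: decides whether the
// per-input (or per-output) record counters should be created for a job.
final class CounterPolicy {
    // Property names are the examples from the issue description (assumed).
    static final String LIMIT_KEY = "pig.counter.limit";
    static final String DISABLE_KEY = "pig.disable.counters";

    private final boolean disabled;
    private final int limit;

    CounterPolicy(Properties props) {
        // Option 2: boolean switch, default false.
        this.disabled = Boolean.parseBoolean(props.getProperty(DISABLE_KEY, "false"));
        // Option 1: integer cap; 20 is the example value from the description.
        this.limit = Integer.parseInt(props.getProperty(LIMIT_KEY, "20"));
    }

    // Counters stay on unless they are switched off globally or the number
    // of inputs (or outputs) of the job exceeds the configured limit.
    boolean countersEnabled(int numInputsOrOutputs) {
        return !disabled && numInputsOrOutputs <= limit;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(LIMIT_KEY, "2");
        CounterPolicy policy = new CounterPolicy(props);
        System.out.println("2 inputs: " + policy.countersEnabled(2));
        System.out.println("3 inputs: " + policy.countersEnabled(3));
    }
}
```

Inputs and outputs would be checked separately, so a job with many inputs and a single output could still keep its output counter.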
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115040#comment-13115040 ]

Daniel Dai commented on PIG-2208:
---------------------------------

I cannot agree more on this. Users will have to run a handicapped Pig in some cases, which is not good. However, I cannot find a workaround easier than the proposed option.

"Local aggregation" is a good addition to this approach. When counters are disabled, the user can at least check the log. But counters are handier than logs, so keeping counters makes sense.

"with [no]counters" also makes sense when the user wants finer control. But an overall control should still be there in case the user doesn't want to change the script.

I will add a comment in pig.properties to explain the "pig.disable.counter" option.
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115003#comment-13115003 ]

Daniel Dai commented on PIG-2208:
---------------------------------

Thanks, Dmitriy. We can do something in addition in this case, but I just don't see any downside to providing a way to disable input counters. What do you think?
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115010#comment-13115010 ]

Dmitriy V. Ryaboy commented on PIG-2208:
----------------------------------------

I just think that we keep relying on the crutch of providing a user-configurable option for things that shouldn't be issues in the first place. That's not a long-term strategy, and it causes user confusion.
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114963#comment-13114963 ]

Dmitriy V. Ryaboy commented on PIG-2208:
----------------------------------------

I still don't like it, but it sounds like I am in the minority. Can you add the new properties with default values and a bit of docs to pig.properties?
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114925#comment-13114925 ]

Daniel Dai commented on PIG-2208:
---------------------------------

It seems that no matter what else we want to do, option 2 is a good addition. I am going to commit the patch. Any objections?
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089330#comment-13089330 ]

Richard Ding commented on PIG-2208:
-----------------------------------

It only logs once per job in the front end, so that the user is informed that the multi-input (or multi-output) counters are disabled. In the back end the counters are simply disabled without logging.
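One way to get the "log once per job" behavior described in this comment is to remember which jobs have already been reported. The sketch below is an assumption about how that could look, not the patch itself; the class OncePerJobNotice and the job id strings are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical front-end helper: remembers which jobs have already been
// reported so the "counters disabled" notice appears once per job.
final class OncePerJobNotice {
    private final Set<String> notified = ConcurrentHashMap.newKeySet();

    // Returns true only the first time a given job id is seen.
    boolean shouldLog(String jobId) {
        return notified.add(jobId);
    }

    public static void main(String[] args) {
        OncePerJobNotice notice = new OncePerJobNotice();
        String[] events = {"job_1", "job_1", "job_2"};
        for (String jobId : events) {
            if (notice.shouldLog(jobId)) {
                System.out.println("Multi-input counters disabled for " + jobId);
            }
        }
    }
}
```

Nothing comparable is needed in the back end, since the comment states that tasks simply skip the counters without logging.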
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083882#comment-13083882 ]

Dmitriy V. Ryaboy commented on PIG-2208:
----------------------------------------

This is just trading one issue for another. If we use too many counters, the job is killed by limits. If we don't, we spam the logs and the tasks are killed for using too much local disk.

We should at least do local aggregation: keep counters local to the task (a simple map), and log what we would otherwise put in counters.
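The local aggregation suggested here could look something like the sketch below: counts are kept in a per-task map and written out once as a short summary, instead of being reported as Hadoop counters or logged per record. The class LocalCounters, its methods, and the counter names are hypothetical, and the summary format is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical task-local aggregator: a simple map from counter name to count.
final class LocalCounters {
    // TreeMap keeps the summary in a stable, sorted order.
    private final Map<String, Long> counts = new TreeMap<>();

    // Add delta to the named counter, creating it on first use.
    void increment(String name, long delta) {
        counts.merge(name, delta, Long::sum);
    }

    long get(String name) {
        return counts.getOrDefault(name, 0L);
    }

    // One "name=value" line per counter, intended to be logged once
    // when the task finishes.
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LocalCounters counters = new LocalCounters();
        counters.increment("input_records_a", 1);
        counters.increment("input_records_a", 1);
        counters.increment("output_records_b", 1);
        System.out.print(counters.summary());
    }
}
```

Because each task writes a bounded number of lines, this avoids both the per-job counter limit and unbounded log growth on local disk, at the cost of totals no longer being summed across tasks automatically.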
[jira] [Commented] (PIG-2208) Restrict number of PIG generated Hadoop counters
[ https://issues.apache.org/jira/browse/PIG-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081990#comment-13081990 ]

Dmitriy V. Ryaboy commented on PIG-2208:
----------------------------------------

Can I propose that, in addition to the above, we augment the grammar of store and load funcs with a "with [no]counters" clause? Sometimes you know which relations you care about and which you do not.