[
https://issues.apache.org/jira/browse/PIG-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552261#comment-13552261
]
Jarek Jarcec Cecho commented on PIG-3002:
-----------------------------------------
Hi Bill,
again, thank you for your comment, I appreciate the input. Please don't get me
wrong, I'll be more than happy to change my patch to include your suggestions.
However, I would first like to make sure that we're on the same page.
I've dug into the Hadoop source code: the call getCounter(Enum) is defined in
[1] and is simply forwarded to findCounter(Enum), defined in AbstractCounters
[2]. That method first checks whether the counter already exists in the
"cache" object; if so, the cached object is returned without any exception
being raised. If the object is not present, it calls findCounter(String,
String), which will eventually create the counter and throw a
CountersExceededException if the configured limit is exceeded.
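To make the flow concrete, here is a simplified sketch of the lookup logic as I
read it (an approximation of the code in [2]; field and method names are
approximate, not a verbatim copy):
{code:java}
// Simplified sketch of AbstractCounters.findCounter(Enum); see [2] for the
// real implementation, the names here are approximations.
public synchronized Counter findCounter(Enum<?> key) {
  Counter counter = cache.get(key);   // hit: returned as-is, no limit check
  if (counter == null) {
    // miss: delegate to findCounter(String, String), which creates the
    // counter and may throw CountersExceededException once the limit is hit
    counter = findCounter(key.getDeclaringClass().getName(), key.name());
    cache.put(key, counter);
  }
  return counter;
}
{code}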
Let's assume, for example, that we have the following counters, with the
maximum counter limit set to 2:
* A => 1
* B => 2
If we then call getCounter(Enum) with the values A, B and C, we will get the
following (a short code sketch follows the list):
* A => 1
* B => 2
* C => CountersExceededException, because C is not in the "cache" map, so the
code will try to create a new counter and fail on the configured limit.
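In code, the example above would look roughly like this (purely hypothetical:
MyCounters is an assumed enum and only A and B were created by the job):
{code:java}
import org.apache.hadoop.mapred.Counters;

public class CounterLimitExample {
  // Assumed enum for illustration; only A and B exist in the Counters object.
  enum MyCounters { A, B, C }

  static void illustrate(Counters counters) {
    long a = counters.getCounter(MyCounters.A); // 1 -- A is already cached
    long b = counters.getCounter(MyCounters.B); // 2 -- B is already cached
    long c = counters.getCounter(MyCounters.C); // throws CountersExceededException:
                                                // C would have to be created, but the
                                                // limit of 2 is already reached
  }
}
{code}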
With this example, I'm trying to explain that getCounter(Enum) won't throw
CountersExceededException for all subsequent calls once Counters is full, but
only for counters that do not yet exist. I'm also trying to show that
computeWarningAggregate by itself won't affect the generated aggregates.
You're definitely right that the newly introduced function getCounterValue()
has a single caller and that this is not ideal. I tried to put the code
directly inside computeWarningAggregate(), but I hit a compilation error
because CountersExceededException is not available in Hadoop 1.0. That's why
I moved the code into the shim layer.
I've defined getCounterValue() to return a long instead of a Counter object in
order to narrow its purpose to just getting the long value of the counter, not
the counter object itself. In that case I believe swallowing the exception is
reasonable, because it means the counter does not exist and its value is
therefore 0. Maybe a descriptive javadoc on getCounterValue() would help here?
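For illustration, the idea behind the shim method is roughly the following (a
minimal sketch only; the exact name, signature and placement in the patch may
differ, and this variant assumes a Hadoop version that defines
CountersExceededException):
{code:java}
import org.apache.hadoop.mapred.Counters;

public class HadoopShims {  // class name is illustrative only
  // Sketch of the getCounterValue() idea: return the plain long value and
  // treat a counter that cannot be created any more as 0.
  public static long getCounterValue(Counters counters, Enum<?> key) {
    try {
      return counters.getCounter(key);   // value of an already existing counter
    } catch (Counters.CountersExceededException e) {
      // The counter does not exist and cannot be created because the counter
      // limit has been reached, so its value is effectively 0.
      return 0L;
    }
  }
}
{code}
(Presumably the Hadoop 1.0 side of the shim just returns
counters.getCounter(key) directly, since the exception class is not available
there.)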
Jarcec
Links:
1: https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Counters.java#L516
2: https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/AbstractCounters.java#L163
> Pig client should handle CountersExceededException
> --------------------------------------------------
>
> Key: PIG-3002
> URL: https://issues.apache.org/jira/browse/PIG-3002
> Project: Pig
> Issue Type: Bug
> Reporter: Bill Graham
> Assignee: Jarek Jarcec Cecho
> Labels: newbie, simple
> Attachments: PIG-3002.patch
>
>
> Running a pig job that uses more than 120 counters will succeed, but a grunt
> exception will occur when trying to output counter info to the console. This
> exception should be caught and handled with friendly messaging:
> {noformat}
> org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected
> error during execution.
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1275)
> at
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
> at org.apache.pig.PigServer.execute(PigServer.java:1239)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:333)
> at
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:136)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:197)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> at org.apache.pig.Main.run(Main.java:604)
> at org.apache.pig.Main.main(Main.java:154)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.hadoop.mapred.Counters$CountersExceededException:
> Error: Exceeded limits on number of counters - Counters=120 Limit=120
> at
> org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:312)
> at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:431)
> at org.apache.hadoop.mapred.Counters.getCounter(Counters.java:495)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.computeWarningAggregate(MapReduceLauncher.java:707)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:442)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
> {noformat}