[ https://issues.apache.org/jira/browse/PIG-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885700#action_12885700 ]

Richard Ding commented on PIG-1389:
-----------------------------------

New patch to address above comments.

bq. 2. In PigRecordReader, initialization of Counters should be done in 
initialize() instead of getCurrentValue() that will avoid branching for every 
call of getCurrentValue.

The problem here is that the MapReduce framework calls RecordReader.initialize 
before Mapper.setup, which is where Pig sets up the counter factory 
(PigStatusReporter). Therefore the counters cannot be initialized inside the 
RecordReader.initialize method.
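A minimal sketch of the lifecycle constraint described above. The class and field names (Reporter, Reader, "input.records") are hypothetical stand-ins for PigStatusReporter and PigRecordReader, not the actual Pig code; the point is that initialize() runs before the reporter exists, so the counter lookup has to be deferred into getCurrentValue with a per-call branch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for PigStatusReporter: hands out named counters.
class Reporter {
    private final Map<String, AtomicLong> counters = new HashMap<>();
    AtomicLong getCounter(String name) {
        return counters.computeIfAbsent(name, k -> new AtomicLong());
    }
}

// Hypothetical stand-in for PigRecordReader.
class Reader {
    static Reporter reporter;   // installed by Mapper.setup, which the
                                // framework calls AFTER initialize()
    private AtomicLong counter; // resolved lazily on the first record

    void initialize() {
        // Cannot touch the counter here: 'reporter' is still null,
        // because Mapper.setup has not run yet.
    }

    String getCurrentValue(String record) {
        // The lazy one-time lookup forces a branch on every call,
        // which is exactly the trade-off discussed in the comment.
        if (counter == null && reporter != null) {
            counter = reporter.getCounter("input.records");
        }
        if (counter != null) {
            counter.incrementAndGet();
        }
        return record;
    }
}
```

Moving the lookup into initialize() would simply find a null reporter, which is why the patch keeps it in getCurrentValue.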

> Implement Pig counter to track number of rows for each input files 
> -------------------------------------------------------------------
>
>                 Key: PIG-1389
>                 URL: https://issues.apache.org/jira/browse/PIG-1389
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1389.patch, PIG-1389.patch, PIG-1389_1.patch, 
> PIG-1389_2.patch
>
>
> A MR job generated by Pig can have not only multiple outputs (in the case of 
> multiquery) but also multiple inputs (in the case of join or cogroup). In 
> both cases, the existing Hadoop counters (e.g. MAP_INPUT_RECORDS, 
> REDUCE_OUTPUT_RECORDS) cannot count the number of records for a given input 
> or output.  PIG-1299 addressed the case of multiple outputs.  We need to add 
> new counters for jobs with multiple inputs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
