[ 
https://issues.apache.org/jira/browse/PIG-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111798#comment-15111798
 ] 

liyunzhang_intel commented on PIG-4784:
---------------------------------------

In mr mode, there is a Hadoop API
(https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/Task.Counter.html)
that provides MAP_INPUT_RECORDS and REDUCE_OUTPUT_RECORDS. But with multiple
inputs or outputs, there is no Hadoop API that reports MAP_INPUT_RECORDS and
REDUCE_OUTPUT_RECORDS for each file.
When there are multiple inputs in mr mode, Pig increments its own counter each
time it reads a record of an input file
(https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java#L148).
When there are multiple outputs in mr mode, Pig increments its own counter each
time it gets a result from POStore
(https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStore.java#L170).
So in mr mode, "pig.disable.counter" only applies to the multiple-input and
multiple-output cases.

In spark mode, there is no Spark API that reports the number of input and
output records, even for a single input and output, so we implemented our own
counters in PIG-4655 and PIG-4634. As a result, when pig.disable.counter is
true in spark mode, the counters are disabled and the reported number of input
and output records is always -1, whether there is a single input/output or
several.
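
As a rough illustration of the behavior described above, here is a minimal
sketch. It is not the actual Pig code: the class RecordCounterSketch, its
methods, and the file names are hypothetical and exist only to show the "-1
when disabled" semantics for per-file record counts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-file record counter; not a Pig class. */
class RecordCounterSketch {
    /** Value reported when counting is disabled (assumed to mirror the -1 above). */
    static final long UNKNOWN = -1L;

    private final boolean disabled;
    private final Map<String, Long> counts = new HashMap<>();

    /** @param disabled stands in for pig.disable.counter=true */
    RecordCounterSketch(boolean disabled) {
        this.disabled = disabled;
    }

    /** Called once per record read from (or stored to) the given file. */
    void recordSeen(String file) {
        if (disabled) {
            return; // counting is skipped entirely when disabled
        }
        counts.merge(file, 1L, Long::sum);
    }

    /** Returns the record count for the file, or -1 if counters are disabled. */
    long getCount(String file) {
        if (disabled) {
            return UNKNOWN;
        }
        return counts.getOrDefault(file, 0L);
    }
}
```

With counting enabled, each file gets its own count; with it disabled, every
query returns -1 no matter how many records were processed.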

> Enable "pig.disable.counter" for spark engine
> ---------------------------------------------
>
>                 Key: PIG-4784
>                 URL: https://issues.apache.org/jira/browse/PIG-4784
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4784.patch
>
>
> When you set pig.disable.counter to "true" in conf/pig.properties, the 
> counters that track the number of input and output records are disabled. 
> The following unit tests are designed to test this, but they currently fail:
> org.apache.pig.test.TestPigRunner#testDisablePigCounters
> org.apache.pig.test.TestPigRunner#testDisablePigCounters2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
