[ 
https://issues.apache.org/jira/browse/PIG-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659317#comment-14659317
 ] 

kexianda commented on PIG-4645:
-------------------------------

Implement a Hadoop-like Counter, using Spark Accumulator.

SparkCounter is a wrapper class around a Spark Accumulator. Counters are organised in 
groups (SparkCounters/SparkCounterGroup), just like Hadoop Counters. We can use 
SparkCounter to implement Pig's customized counters, such as counters for 
IO statistics. 
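
To make the group/counter layout concrete, here is a minimal sketch in plain Java. It is not the code in PIG-4645.patch: every class name below (LongAccumulatorStandIn, SketchCounter, SketchCounterGroup, SketchCounters) and the group/counter names in main() are hypothetical, and an AtomicLong stands in for the Spark Accumulator so the example runs without Spark on the classpath.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a Spark accumulator of long values.
// The real patch wraps a Spark Accumulator; this only mimics add/read.
class LongAccumulatorStandIn {
    private final AtomicLong value = new AtomicLong();
    void add(long delta) { value.addAndGet(delta); }
    long value() { return value.get(); }
}

// One named counter backed by one accumulator (assumed shape).
class SketchCounter {
    private final String name;
    private final LongAccumulatorStandIn acc = new LongAccumulatorStandIn();
    SketchCounter(String name) { this.name = name; }
    String getName() { return name; }
    void increment(long delta) { acc.add(delta); }
    long getValue() { return acc.value(); }
}

// A named group of counters, analogous to a Hadoop counter group.
class SketchCounterGroup {
    private final String groupName;
    private final Map<String, SketchCounter> counters = new TreeMap<>();
    SketchCounterGroup(String groupName) { this.groupName = groupName; }
    String getGroupName() { return groupName; }
    SketchCounter getOrCreate(String counterName) {
        return counters.computeIfAbsent(counterName, SketchCounter::new);
    }
    long getValue(String counterName) {
        SketchCounter c = counters.get(counterName);
        return c == null ? 0L : c.getValue();
    }
}

// Top-level registry: group name -> group -> counter.
class SketchCounters {
    private final Map<String, SketchCounterGroup> groups = new TreeMap<>();
    void increment(String group, String counter, long delta) {
        groups.computeIfAbsent(group, SketchCounterGroup::new)
              .getOrCreate(counter)
              .increment(delta);
    }
    // Unknown groups or counters read as 0 rather than throwing.
    long getValue(String group, String counter) {
        SketchCounterGroup g = groups.get(group);
        return g == null ? 0L : g.getValue(counter);
    }
}

class SketchCountersDemo {
    public static void main(String[] args) {
        SketchCounters counters = new SketchCounters();
        // "IO_STATS" and "RECORDS_READ" are made-up names for illustration.
        counters.increment("IO_STATS", "RECORDS_READ", 2);
        counters.increment("IO_STATS", "RECORDS_READ", 3);
        System.out.println(counters.getValue("IO_STATS", "RECORDS_READ")); // prints 5
    }
}
```

Looking counters up by (group, counter) name keeps the calling code close to how Hadoop counters are usually addressed, which is the similarity the comment describes.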

Note: SparkCounter is slightly different from a Hadoop counter. If users update a 
SparkCounter in transformations like map(), they should be aware that each 
task's update may be applied more than once if tasks or job stages are 
re-executed.
Please refer to SPARK-732 (https://issues.apache.org/jira/browse/SPARK-732).
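
The re-execution caveat can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: it does not use Spark, the class RetryDoubleCountSketch and its methods are hypothetical, an AtomicLong stands in for the accumulator, and "re-execution" is modelled simply by running the same task over the same partition twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation of a counter updated inside a map()-style transformation.
class RetryDoubleCountSketch {
    // Stand-in for an accumulator-backed counter.
    private final AtomicLong recordsSeen = new AtomicLong();

    // Simulates one task attempt: transforms each record and, as a side
    // effect, bumps the counter once per record.
    List<Integer> runTask(List<Integer> partition) {
        List<Integer> out = new ArrayList<>();
        for (int x : partition) {
            recordsSeen.incrementAndGet(); // update made inside the transformation
            out.add(x * 2);
        }
        return out;
    }

    long getRecordsSeen() { return recordsSeen.get(); }

    public static void main(String[] args) {
        RetryDoubleCountSketch job = new RetryDoubleCountSketch();
        List<Integer> partition = Arrays.asList(1, 2, 3);

        job.runTask(partition); // first attempt
        job.runTask(partition); // simulated re-execution of the same task

        // The partition holds 3 records, but the counter was updated by both attempts.
        System.out.println(job.getRecordsSeen()); // prints 6
    }
}
```

The transformed output is the same for both attempts, but the counter ends up at twice the number of records, which is the kind of over-counting the note warns about.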

Hi [~mohitsabharwal] & [~xuefuz], would you please help review the 
code (PIG-4645.patch)? Right now, this patch contains only the common part, and 
there is no test case for it. I will use it to implement Pig's customized 
counters and add test cases in separate JIRA issues.

> Support hadoop-like Counter using spark accumulator
> ---------------------------------------------------
>
>                 Key: PIG-4645
>                 URL: https://issues.apache.org/jira/browse/PIG-4645
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: kexianda
>            Assignee: kexianda
>             Fix For: spark-branch
>
>         Attachments: PIG-4645.patch
>
>
> Pig collects input/output statistics via Counters in MR/Tez mode; we need 
> to support this using Spark accumulators. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
