[
https://issues.apache.org/jira/browse/PIG-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659317#comment-14659317
]
kexianda commented on PIG-4645:
-------------------------------
This patch implements a Hadoop-like counter using Spark accumulators.
SparkCounter is a wrapper class around a Spark Accumulator. Counters are organised in
groups (SparkCounters/SparkCounterGroup), just like Hadoop counters. We can use
SparkCounter to implement Pig's custom counters, such as counters for
IO statistics.
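To make the grouping concrete, here is a minimal plain-Java sketch of a SparkCounters -> SparkCounterGroup -> SparkCounter hierarchy. Only the three class names come from the comment above. The methods (`increment`, `getValue`, `getGroup`, `getCounter`) and the I/O counter names are hypothetical, and an `AtomicLong` stands in for the Spark Accumulator so the example runs without Spark on the classpath. It is not the code in PIG-4645.patch.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the counter hierarchy described in the comment.
// ASSUMPTION: AtomicLong replaces the Spark Accumulator that the real SparkCounter wraps.
public class CounterGroupsSketch {

    /** One named counter; the AtomicLong is a stand-in for a Spark Accumulator. */
    static final class SparkCounter {
        private final String name;
        private final AtomicLong value = new AtomicLong();

        SparkCounter(String name) { this.name = name; }

        void increment(long delta) { value.addAndGet(delta); }
        long getValue() { return value.get(); }
        String getName() { return name; }
    }

    /** A named group of counters, analogous to a Hadoop CounterGroup. */
    static final class SparkCounterGroup {
        private final String groupName;
        private final Map<String, SparkCounter> counters = new TreeMap<>();

        SparkCounterGroup(String groupName) { this.groupName = groupName; }

        String getGroupName() { return groupName; }

        // Creates the counter lazily on first access, like Hadoop's findCounter.
        SparkCounter getCounter(String counterName) {
            return counters.computeIfAbsent(counterName, SparkCounter::new);
        }
    }

    /** Top-level container of groups, analogous to Hadoop Counters. */
    static final class SparkCounters {
        private final Map<String, SparkCounterGroup> groups = new TreeMap<>();

        SparkCounterGroup getGroup(String groupName) {
            return groups.computeIfAbsent(groupName, SparkCounterGroup::new);
        }

        void increment(String groupName, String counterName, long delta) {
            getGroup(groupName).getCounter(counterName).increment(delta);
        }

        long getValue(String groupName, String counterName) {
            return getGroup(groupName).getCounter(counterName).getValue();
        }
    }

    public static void main(String[] args) {
        SparkCounters counters = new SparkCounters();
        // Hypothetical group/counter names for IO statistics.
        counters.increment("PigInputStats", "RECORDS_READ", 100);
        counters.increment("PigInputStats", "RECORDS_READ", 50);
        counters.increment("PigOutputStats", "RECORDS_WRITTEN", 120);
        System.out.println(counters.getValue("PigInputStats", "RECORDS_READ"));     // 150
        System.out.println(counters.getValue("PigOutputStats", "RECORDS_WRITTEN")); // 120
    }
}
```

Keying groups and counters by name mirrors how Hadoop code looks up a counter by (group, name), which is what would let Pig's existing statistics code address counters the same way in Spark mode.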
Note: SparkCounter is slightly different from a Hadoop counter. If users update
a SparkCounter in transformations like map(), they should be aware that each
task’s update may be applied more than once if tasks or job stages are
re-executed.
Please refer to SPARK-732 (https://issues.apache.org/jira/browse/SPARK-732).
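The caveat can be illustrated with a deliberately simplified, hypothetical model (this is not how Spark is implemented). A task that increments a shared counter as a side effect of processing each record adds its updates again every time it is re-run, so the total reflects work performed rather than distinct records. For contrast, the second method keeps a local tally per attempt and merges only one attempt's tally, which loosely mirrors Spark's documented guarantee that updates made inside actions are applied once per task. The method names and the "last attempt wins" rule are assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Simplified, hypothetical model of accumulator over-counting on task re-execution.
public class RetryOvercountSketch {

    /** Stand-in for a map() transformation that bumps a counter for every record. */
    static void runTask(List<String> partition, LongConsumer counterUpdate) {
        for (String record : partition) {
            counterUpdate.accept(1L); // side effect: one increment per record processed
        }
    }

    /** Every attempt writes straight into the shared counter, so re-runs are counted too. */
    static long countWithReexecution(List<String> partition, int attempts) {
        AtomicLong accumulator = new AtomicLong();
        for (int i = 0; i < attempts; i++) {
            runTask(partition, accumulator::addAndGet);
        }
        return accumulator.get();
    }

    /**
     * Each attempt tallies locally; only one attempt's tally is kept (ASSUMPTION: the
     * last attempt is treated as the successful one), so re-runs do not inflate it.
     */
    static long countOncePerTask(List<String> partition, int attempts) {
        long committedTally = 0;
        for (int i = 0; i < attempts; i++) {
            AtomicLong local = new AtomicLong();
            runTask(partition, local::addAndGet);
            committedTally = local.get(); // replaces, rather than adds to, earlier tallies
        }
        return committedTally;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "b", "c");
        System.out.println(countWithReexecution(partition, 1)); // 3
        System.out.println(countWithReexecution(partition, 2)); // 6: task re-executed once
        System.out.println(countOncePerTask(partition, 2));     // 3
    }
}
```

In practice this means a counter that is updated inside a transformation, for example for IO statistics, should be read as "at least" the true value whenever tasks may have been retried.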
Hi [~mohitsabharwal] & [~xuefuz], would you please help review the
code (PIG-4645.patch)? Right now, this patch contains only the common part, and there are
no test cases for it yet. I will use it to implement Pig's custom
counters and add test cases in a separate JIRA.
> Support hadoop-like Counter using spark accumulator
> ---------------------------------------------------
>
> Key: PIG-4645
> URL: https://issues.apache.org/jira/browse/PIG-4645
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: kexianda
> Assignee: kexianda
> Fix For: spark-branch
>
> Attachments: PIG-4645.patch
>
>
> Pig collects input/output statistics via counters in MR/Tez mode; we need
> to support this using Spark accumulators.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)