[
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917821#action_12917821
]
Luke Lu commented on MAPREDUCE-901:
-----------------------------------
The latest patch already handles the JobCounter and TaskCounter optimization
transparently (via the generic FrameworkCounterGroup), but it doesn't address
file system counter optimization yet. Using concrete fs enums (hdfs, s3,
etc.) as in the previous patches is too brittle: the whole mapreduce package
would need to be recompiled and released for every new distributed filesystem
implementation. That defeats the purpose of having a filesystem interface,
where we can already query for (fs scheme, stats) tuples.
HADOOP-4188 tried to address the issue, but the treatment is incomplete. The
Task#getFileSystemCounters helper method is package-private and quite awkward
to use: it requires explicit array indexing, e.g.
getFileSystemCounters(scheme)[0] to get <SCHEME>_BYTES_READ (e.g.
HDFS_BYTES_READ) for use with the generic counter interface. This also makes
decoupled localization of file system counter display names impossible.
I propose that we add a file system counter API to the Counters framework.
Something like:
{code}
Counter getFileSystemCounter(String scheme, FileSystemCounter key);
{code}
where FileSystemCounter is an enum class:
{code}
public enum FileSystemCounter {
  BYTES_READ,
  BYTES_WRITTEN
  // etc.
}
{code}
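To make the shape of the proposed lookup concrete, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: SketchCounter and FileSystemCounterSketch are hypothetical stand-ins, and deriving the counter name as <SCHEME>_<KEY> is an assumption based on the HDFS_BYTES_READ example above.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for illustration; these are NOT the Hadoop classes.
enum FileSystemCounter { BYTES_READ, BYTES_WRITTEN }

class SketchCounter {
    private final String name;
    private long value;

    SketchCounter(String name) { this.name = name; }
    String getName() { return name; }
    long getValue() { return value; }
    void increment(long delta) { value += delta; }
}

class FileSystemCounterSketch {
    // scheme -> (key -> counter); entries are created lazily on first lookup,
    // so a new filesystem scheme needs no new enum constant and no recompile.
    private final Map<String, Map<FileSystemCounter, SketchCounter>> counters =
        new TreeMap<>();

    SketchCounter getFileSystemCounter(String scheme, FileSystemCounter key) {
        // Assumption: schemes are case-insensitive and names follow
        // the <SCHEME>_<KEY> pattern, e.g. HDFS_BYTES_READ.
        String s = scheme.toUpperCase(Locale.ROOT);
        return counters
            .computeIfAbsent(s,
                k -> new EnumMap<FileSystemCounter, SketchCounter>(FileSystemCounter.class))
            .computeIfAbsent(key, k -> new SketchCounter(s + "_" + k.name()));
    }

    public static void main(String[] args) {
        FileSystemCounterSketch group = new FileSystemCounterSketch();
        SketchCounter c = group.getFileSystemCounter("hdfs", FileSystemCounter.BYTES_READ);
        c.increment(4096);
        System.out.println(c.getName() + "=" + c.getValue()); // prints HDFS_BYTES_READ=4096
    }
}
```

The caller names the scheme and the key directly, so there is no array indexing, and the display name could be resolved from the enum key alone, independent of the scheme.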
We can take advantage of this interface to create a file system counter group
that is stored in memory and serialized more efficiently (say, as
(<scheme>, vint(BYTES_READ), vint(BYTES_WRITTEN), ...) tuples).
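As a rough illustration of the tuple layout, the sketch below writes each scheme followed by one variable-length integer per enum constant, in ordinal order. Everything here is an assumption for illustration: FsCounterCodecSketch is a hypothetical class, the varint is a plain unsigned LEB128-style encoding rather than Hadoop's own vint format, and the leading tuple count and writeUTF scheme encoding are just one possible framing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical codec for illustration; not part of Hadoop.
class FsCounterCodecSketch {
    enum FileSystemCounter { BYTES_READ, BYTES_WRITTEN }

    // Unsigned LEB128-style varint: 7 payload bits per byte,
    // high bit set when more bytes follow.
    static void writeVarLong(DataOutputStream out, long v) throws IOException {
        while ((v & ~0x7FL) != 0) {
            out.writeByte((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.writeByte((int) v);
    }

    static long readVarLong(DataInputStream in) throws IOException {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Layout: count, then per scheme: (scheme, vint per enum constant).
    // Each long[] is indexed by FileSystemCounter.ordinal().
    static byte[] serialize(Map<String, long[]> valuesByScheme) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeVarLong(out, valuesByScheme.size());
            for (Map.Entry<String, long[]> e : valuesByScheme.entrySet()) {
                out.writeUTF(e.getKey());
                for (FileSystemCounter key : FileSystemCounter.values()) {
                    writeVarLong(out, e.getValue()[key.ordinal()]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Map<String, long[]> deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Map<String, long[]> result = new TreeMap<>();
            int count = (int) readVarLong(in);
            for (int i = 0; i < count; i++) {
                String scheme = in.readUTF();
                long[] values = new long[FileSystemCounter.values().length];
                for (int j = 0; j < values.length; j++) {
                    values[j] = readVarLong(in);
                }
                result.put(scheme, values);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, long[]> values = new TreeMap<>();
        values.put("hdfs", new long[] {300L, 5L});
        // 1 (count) + 6 (UTF "hdfs") + 2 (300) + 1 (5) = 10 bytes
        System.out.println(serialize(values).length + " bytes");
    }
}
```

Because the counter names are implied by the enum order, only the scheme string and the values go over the wire; no per-counter name strings are serialized.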
Thoughts?
> Move Framework Counters into a TaskMetric structure
> ---------------------------------------------------
>
> Key: MAPREDUCE-901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: task
> Affects Versions: 0.21.0
> Reporter: Owen O'Malley
> Assignee: Luke Lu
> Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java,
> MAPREDUCE-901.patch, MAPREDUCE-901.patch, mr-901-trunk-v1.patch
>
>
> I think we should move all of the Counters that the framework updates into a
> single class called TaskMetrics. TaskMetrics would have specific fields for
> each of the metrics like input records, input bytes, output records, etc.
> It would both reduce the serialized size of the heartbeats (by shrinking the
> Counters down to just the user's counters) and decrease the latency for
> updates to the JobTracker (since Counters are sent at most 1/minute instead
> of 1/heartbeat).