[ https://issues.apache.org/jira/browse/HADOOP-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Zeyliger updated HADOOP-5469:
------------------------------------
Description: Implement a "/metrics" URL on the HTTP server of Hadoop
daemons, to expose metrics data to users via their web browsers, in plain-text
and JSON. (was: the original, longer description, which is reproduced verbatim
in the comment below)
I've been schooled that descriptions ought to be short and comments lengthy,
so I've shortened the description. The original description follows.
I'd like to be able to query Hadoop's metrics via HTTP, e.g., by going to
"/metrics" on any Hadoop daemon that has an HttpServer. My motivation is
pretty simple--if you're running on a lot of machines, tracking down the
relevant metrics files is pretty time-consuming; this would be a useful
debugging utility. I'd also like the output to be parseable, so I could write
a quick web app to query the metrics dynamically.
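For concreteness, here's a minimal sketch of such a client, assuming a
JobTracker web UI on its default port 50030 (the host name is made up):
{noformat}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class MetricsPoller {
  public static void main(String[] args) throws Exception {
    // Hypothetical host; any Hadoop daemon running an HttpServer would do.
    URL url = new URL("http://jobtracker.example.com:50030/metrics");
    BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), "UTF-8"));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // one context/record/metric per line
      }
    } finally {
      in.close();
    }
  }
}
{noformat}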
This is similar in spirit to, but different from, just using JMX. (See also
HADOOP-4756.) JMX requires a client, and, more annoyingly, JMX requires
setting up authentication. If you just disable authentication, someone can do
Bad Things, and if you enable it, you have to worry about yet another password.
This approach is also more complete--JMX requires separate instrumentation,
so, for example, the JobTracker's metrics aren't exposed via JMX.
To get the discussion going, I've attached a patch. I had to add a method to
ContextFactory to get all the active MetricsContexts, implement a do-little
MetricsContext that simply inherits from AbstractMetricsContext, add a method
to MetricsContext to get all the records, expose copy methods for the maps in
OutputRecord, and implement a simple servlet. I also ended up hoisting the
common period-setting code out of the individual MetricsContexts; I'm open to
reverting that if it muddies the patch significantly.
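To make the shape of that concrete, here's a minimal sketch of the servlet.
It is not the attached patch itself: getAllContexts(), getAllRecords(),
getTagsCopy(), and getMetricsCopy() are my stand-in names for the methods
described above, so treat them as assumptions. The idea is to register this
under "/metrics" on each daemon's existing HttpServer.
{noformat}
// A minimal sketch, not the attached patch. The getAllContexts(),
// getAllRecords(), getTagsCopy(), and getMetricsCopy() accessors are
// assumed names for the methods the patch description mentions.
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Collection;
import java.util.Map;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.metrics.ContextFactory;
import org.apache.hadoop.metrics.MetricsContext;
import org.apache.hadoop.metrics.spi.OutputRecord;

public class MetricsServlet extends HttpServlet {
  @Override
  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws IOException {
    response.setContentType("text/plain");
    PrintWriter out = response.getWriter();
    // Walk every active context ("jvm", "mapred", "rpc", ...).
    for (MetricsContext context : ContextFactory.getFactory().getAllContexts()) {
      out.println(context.getContextName());
      // Each context holds named records ("metrics", "job", ...).
      for (Map.Entry<String, Collection<OutputRecord>> entry :
          context.getAllRecords().entrySet()) {
        out.println("  " + entry.getKey());
        for (OutputRecord record : entry.getValue()) {
          // Print the record's tags, then its metric values.
          out.println("    " + record.getTagsCopy());
          for (Map.Entry<String, Number> metric :
              record.getMetricsCopy().entrySet()) {
            out.println("      " + metric.getKey() + "=" + metric.getValue());
          }
        }
      }
    }
  }
}
{noformat}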
I'd love to hear your suggestions. There's a bug in the JSON representation,
and there's some gross type-handling.
The patch is missing tests; I wanted to post it to gather feedback before I
got too far. Tests are forthcoming.
Here's sample output from a JobTracker while it was running a "pi" job:
{noformat}
jvm
  metrics
    {hostName=doorstop.local, processName=JobTracker, sessionId=}
      gcCount=22
      gcTimeMillis=68
      logError=0
      logFatal=0
      logInfo=52
      logWarn=0
      memHeapCommittedM=7.4375
      memHeapUsedM=4.2150116
      memNonHeapCommittedM=23.1875
      memNonHeapUsedM=18.438614
      threadsBlocked=0
      threadsNew=0
      threadsRunnable=7
      threadsTerminated=0
      threadsTimedWaiting=8
      threadsWaiting=15
mapred
  job
    {counter=Map input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=2.0
    {counter=Map output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Data-local map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Map input bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=48.0
    {counter=FILE_BYTES_WRITTEN, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=148.0
    {counter=Combine output records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=0.0
    {counter=Launched map tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=HDFS_BYTES_READ, group=FileSystemCounters, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=236.0
    {counter=Map output bytes, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=64.0
    {counter=Launched reduce tasks, group=Job Counters , hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=1.0
    {counter=Spilled Records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=4.0
    {counter=Combine input records, group=Map-Reduce Framework, hostName=doorstop.local, jobId=job_200903101702_0001, jobName=test-mini-mr, sessionId=, user=philip}
      value=0.0
  jobtracker
    {hostName=doorstop.local, sessionId=}
      jobs_completed=0
      jobs_submitted=1
      maps_completed=2
      maps_launched=5
      reduces_completed=0
      reduces_launched=1
rpc
  metrics
    {hostName=doorstop.local, port=50030}
      NumOpenConnections=2
      RpcProcessingTime_avg_time=0
      RpcProcessingTime_num_ops=84
      RpcQueueTime_avg_time=1
      RpcQueueTime_num_ops=84
      callQueueLen=0
      getBuildVersion_avg_time=0
      getBuildVersion_num_ops=1
      getJobProfile_avg_time=0
      getJobProfile_num_ops=17
      getJobStatus_avg_time=0
      getJobStatus_num_ops=32
      getNewJobId_avg_time=0
      getNewJobId_num_ops=1
      getProtocolVersion_avg_time=0
      getProtocolVersion_num_ops=2
      getSystemDir_avg_time=0
      getSystemDir_num_ops=2
      getTaskCompletionEvents_avg_time=0
      getTaskCompletionEvents_num_ops=19
      heartbeat_avg_time=5
      heartbeat_num_ops=9
      submitJob_avg_time=0
      submitJob_num_ops=1
{noformat}
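As for the JSON flavor: the exact schema isn't settled (and, as mentioned,
the current representation has a bug), but one plausible shape for the jvm
record above, mirroring the same context/record/tags/metrics nesting and
trimmed to a few values for brevity, would be:
{noformat}
{"jvm": {"metrics": [
  {"tags": {"hostName": "doorstop.local", "processName": "JobTracker", "sessionId": ""},
   "values": {"gcCount": 22, "gcTimeMillis": 68, "memHeapUsedM": 4.2150116,
              "threadsRunnable": 7, "threadsWaiting": 15}}
]}}
{noformat}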
> Exposing Hadoop metrics via HTTP
> --------------------------------
>
> Key: HADOOP-5469
> URL: https://issues.apache.org/jira/browse/HADOOP-5469
> Project: Hadoop Core
> Issue Type: New Feature
> Components: metrics
> Reporter: Philip Zeyliger
> Attachments: HADOOP-5469.patch
>
>
> Implement a "/metrics" URL on the HTTP server of Hadoop daemons, to expose
> metrics data to users via their web browsers, in plain-text and JSON.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.