[ 
https://issues.apache.org/jira/browse/HDFS-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122049#comment-16122049
 ] 

Konstantin Shvachko edited comment on HDFS-9153 at 8/10/17 6:35 PM:
--------------------------------------------------------------------

Hey guys, I think the metric you introduced is absolutely deceiving and has 
nothing to do with the throughput the benchmark is intended to measure.
"Test exec time" is the running time of the job, which includes the compute 
overhead: scheduling, cleanup, and retries if there were failed maps.
What we want to benchmark is the average throughput of the actual data 
transfers on HDFS. If you look at the implementation, you will see that it 
measures the time of the transfers only.
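
To make the difference concrete, here is a minimal sketch in Python (not the 
TestDFSIO code). The names TaskResult, transfer_throughput and 
job_level_throughput and all of the numbers are made up for illustration. It 
assumes the benchmark divides the total MB by the summed per-task transfer 
time.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-map-task record; not a TestDFSIO type."""
    mbytes: float      # data moved by this task, in MB
    io_seconds: float  # time spent in the transfer itself


def transfer_throughput(tasks):
    """Total MB divided by the summed per-task transfer time."""
    total_mb = sum(t.mbytes for t in tasks)
    total_io_seconds = sum(t.io_seconds for t in tasks)
    return total_mb / total_io_seconds


def job_level_throughput(tasks, job_exec_seconds):
    """Total MB divided by the wall-clock running time of the whole job."""
    total_mb = sum(t.mbytes for t in tasks)
    return total_mb / job_exec_seconds


# Four illustrative tasks, each moving 1000 MB in 50 s of transfer time.
tasks = [TaskResult(mbytes=1000.0, io_seconds=50.0) for _ in range(4)]

per_transfer = transfer_throughput(tasks)       # 4000 MB / 200 s = 20.0 MB/s
quick_job = job_level_throughput(tasks, 60.0)   # little scheduling overhead
slow_job = job_level_throughput(tasks, 120.0)   # same I/O, plus delays/retries

print(per_transfer, quick_job, slow_job)
```

The transfers are identical in both runs, so the per-transfer number stays 
the same, while the job-level number halves when scheduling delays or retries 
double the running time of the job.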

The formatting changes are fine, but I think "Total Throughput" should be 
removed.
The bug reported in MAPREDUCE-6931 makes it invalid, and even if that is fixed 
the metric is still deceiving.

-Also, DFSIO issues should be filed on HDFS jira. Then you should expect more 
prompt response.-
_Sorry last part was for the other jira. Please ignore._


> Pretty-format the output for DFSIO
> ----------------------------------
>
>                 Key: HDFS-9153
>                 URL: https://issues.apache.org/jira/browse/HDFS-9153
>             Project: Hadoop HDFS
>          Issue Type: Test
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 2.8.0, 3.0.0-alpha1
>
>         Attachments: HDFS-9153-v1.patch
>
>
> Looking at the following DFSIO output, I was surprised that the test 
> throughput was only {{17}} MB/s, which doesn't make sense for a real cluster. 
> Maybe it is used for another purpose? For users, it may make more sense to 
> report a throughput of 1610 MB/s (1228800/763), calculated as 
> *Total MBytes processed / Test exec time*.
> {noformat}
> 15/09/28 11:42:23 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
> 15/09/28 11:42:23 INFO fs.TestDFSIO:            Date & time: Mon Sep 28 11:42:23 CST 2015
> 15/09/28 11:42:23 INFO fs.TestDFSIO:        Number of files: 100
> 15/09/28 11:42:23 INFO fs.TestDFSIO: Total MBytes processed: 1228800.0
> 15/09/28 11:42:23 INFO fs.TestDFSIO:      Throughput mb/sec: 17.457387239456878
> 15/09/28 11:42:23 INFO fs.TestDFSIO: Average IO rate mb/sec: 17.57563018798828
> 15/09/28 11:42:23 INFO fs.TestDFSIO:  IO rate std deviation: 1.7076328985378455
> 15/09/28 11:42:23 INFO fs.TestDFSIO:     Test exec time sec: 762.697
> 15/09/28 11:42:23 INFO fs.TestDFSIO:     Test exec time sec: 762.697
> 15/09/28 11:42:23 INFO fs.TestDFSIO: 
> {noformat}
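
The figures in the output above can be checked with a few lines of Python. 
This is only a sketch: it assumes that "Throughput mb/sec" is the total MB 
divided by the summed per-file transfer time, which is an assumption about 
the implementation and is not stated in the log itself. The variable names 
are illustrative.

```python
# Values copied from the DFSIO output above.
total_mb = 1228800.0
num_files = 100
reported_throughput = 17.457387239456878  # "Throughput mb/sec"
exec_time_sec = 762.697                   # "Test exec time sec"

# Under the stated assumption, the reported throughput implies this much
# transfer time summed over all files (about 70,388 s).
implied_io_seconds = total_mb / reported_throughput

# Spread over 100 files, that is about 704 s of transfer time per file.
avg_io_seconds_per_file = implied_io_seconds / num_files

# The job-level figure from the description: total MB over wall-clock
# time (about 1611 MB/s with the unrounded exec time).
job_level_mb_per_sec = total_mb / exec_time_sec

print(implied_io_seconds, avg_io_seconds_per_file, job_level_mb_per_sec)
```

Under that assumption, the 17 MB/s figure is a per-transfer rate, while the 
job-level figure describes all 100 files being written concurrently.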


