[jira] [Commented] (MAPREDUCE-6931) Fix TestDFSIO "Total Throughput" calculation

Konstantin Shvachko (JIRA) Thu, 10 Aug 2017 14:18:34 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122346#comment-16122346 ]


Konstantin Shvachko commented on MAPREDUCE-6931:
------------------------------------------------

[~dennishuo] I agree that the "Total Throughput" metric depends heavily on how you
run the job. That is exactly what makes it a MapReduce metric rather than an
HDFS one. One can go to the YARN UI and divide the HDFS bytes written by the job
time for any job, but that does not measure the HDFS write operation itself.
I think we should just remove it.
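The point that the wall-clock figure depends on how the job is run can be illustrated with a small hypothetical sketch. It assumes idealized conditions: equal-length tasks, tasks scheduled in full waves of `slots` concurrent maps, and `time` in the issue's "Throughput mb/sec" formula being the summed task time. All class names, method names and numbers are made up for the example. The per-task figure stays the same regardless of slot count, while bytes divided by job wall-clock time grows with concurrency.

```java
/**
 * Hypothetical illustration (not TestDFSIO code): the same set of map tasks
 * yields the same per-task throughput regardless of cluster slots, while the
 * wall-clock "Total Throughput" changes with how many tasks run at once.
 */
public class ThroughputVsSlots {
  static final long MEGA = 1024L * 1024L;

  /** Per-task throughput: total bytes over summed task time (ms), in MB/s. */
  static double perTaskMBps(long totalBytes, long summedTaskMillis) {
    return totalBytes * 1000.0 / (summedTaskMillis * (double) MEGA);
  }

  /** Wall-clock throughput: total bytes over job execution time (ms), in MB/s. */
  static double wallClockMBps(long totalBytes, long jobMillis) {
    return totalBytes * 1000.0 / (jobMillis * (double) MEGA);
  }

  /** Idealized job time (assumption): ceil(tasks / slots) waves of equal length. */
  static long idealJobMillis(int tasks, int slots, long taskMillis) {
    long waves = (tasks + slots - 1) / slots;
    return waves * taskMillis;
  }

  public static void main(String[] args) {
    int tasks = 16;
    long bytesPerTask = 1024L * MEGA; // 1 GB per file (assumed)
    long taskMillis = 10_000L;        // each task takes 10 s (assumed)
    long totalBytes = tasks * bytesPerTask;

    for (int slots : new int[] {4, 16}) {
      long jobMillis = idealJobMillis(tasks, slots, taskMillis);
      System.out.printf("slots=%d per-task=%.2f MB/s wall-clock=%.2f MB/s%n",
          slots,
          perTaskMBps(totalBytes, tasks * taskMillis),
          wallClockMBps(totalBytes, jobMillis));
    }
  }
}
```

With these numbers the per-task figure is 102.4 MB/s in both runs, while the wall-clock figure is 409.6 MB/s with 4 slots and 1638.4 MB/s with 16, even though HDFS did the same work per task.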

> Fix TestDFSIO "Total Throughput" calculation
> --------------------------------------------
>
>                 Key: MAPREDUCE-6931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: benchmarks, test
>    Affects Versions: 2.8.0
>            Reporter: Dennis Huo
>            Priority: Trivial
>
> The new "Total Throughput" line added in 
> https://issues.apache.org/jira/browse/HDFS-9153 is currently calculated as 
> {{toMB(size) / ((float)execTime)}} and claims to be in units of "MB/s", but 
> {{execTime}} is in milliseconds; thus, the reported number is 1/1000 of the 
> actual value:
> {code:java}
>     String resultLines[] = {
>         "----- TestDFSIO ----- : " + testType,
>         "            Date & time: " + new Date(System.currentTimeMillis()),
>         "        Number of files: " + tasks,
>         " Total MBytes processed: " + df.format(toMB(size)),
>         "      Throughput mb/sec: " + df.format(size * 1000.0 / (time * MEGA)),
>         "Total Throughput mb/sec: " + df.format(toMB(size) / ((float)execTime)),
>         " Average IO rate mb/sec: " + df.format(med),
>         "  IO rate std deviation: " + df.format(stdDev),
>         "     Test exec time sec: " + df.format((float)execTime / 1000),
>         "" };
> {code}
> The different calculated fields can also use toMB and a shared 
> milliseconds-to-seconds conversion to make it easier to keep units consistent.
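The consistent-units approach suggested in the description can be sketched as follows. This is a minimal sketch, not the committed fix. `msToSecs` is a hypothetical name for the shared milliseconds-to-seconds conversion, `toMB` is a stand-in assumed to divide by 2^20 (as TestDFSIO's `MEGA` suggests), and only the size and throughput lines are built. The key change is that "Total Throughput" divides by `msToSecs(execTime)` instead of the raw millisecond value.

```java
import java.text.DecimalFormat;

public class DfsioUnits {
  static final long MEGA = 1024L * 1024L;

  /** Assumed stand-in for TestDFSIO's toMB: bytes divided by 2^20. */
  static float toMB(long bytes) {
    return ((float) bytes) / MEGA;
  }

  /** Hypothetical shared helper: milliseconds to seconds. */
  static float msToSecs(long millis) {
    return millis / 1000.0f;
  }

  /** Builds the size/throughput lines with every rate expressed in MB per second. */
  static String[] summarize(long size, long time, long execTime) {
    DecimalFormat df = new DecimalFormat("#.##");
    return new String[] {
        " Total MBytes processed: " + df.format(toMB(size)),
        "      Throughput mb/sec: " + df.format(toMB(size) / msToSecs(time)),
        // Previously toMB(size) / ((float) execTime), i.e. MB per millisecond.
        "Total Throughput mb/sec: " + df.format(toMB(size) / msToSecs(execTime)),
        "     Test exec time sec: " + df.format(msToSecs(execTime)),
    };
  }

  public static void main(String[] args) {
    // Example numbers (assumed): 16 GB written, 160 s of summed task time,
    // 40 s of job wall-clock time.
    long size = 16L * 1024 * MEGA;
    for (String line : summarize(size, 160_000L, 40_000L)) {
      System.out.println(line);
    }
  }
}
```

Routing every field through the same two helpers means a rate can only be computed as MB divided by seconds, which is the kind of unit mix-up the issue reports.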



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
