I am benchmarking my 16-node cluster (all nodes in one rack) with TestDFSIO on Hadoop 1.0.4. For simplicity, I turned off speculative task execution and set the maximum number of map and reduce tasks per node to 1.
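For completeness, the settings I changed were along these lines in mapred-site.xml (these are the standard Hadoop 1.x property names; the values shown are the ones described above):

```xml
<!-- mapred-site.xml (sketch): disable speculative execution and limit
     each TaskTracker to one map slot and one reduce slot -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```

The benchmark itself was run with TestDFSIO's -write and -read modes, varying -nrFiles while keeping -fileSize fixed at 5 GB per file.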
With a replication factor of 2, writing one 5 GB file takes twice as long as reading one. That result seems to make sense: with two replicas, a write pushes twice as many bytes through the cluster as a read does. However, as I scale the number of 5 GB files from 1 up to 64, the gap steadily closes, and at 64 files reading takes just as long as writing. What could cause read performance to degrade faster than write performance as the number of files increases?

The full results are below:

  number of 5 GB files    write time / read time
           1                      2.02
           2                      1.87
           4                      1.73
           8                      1.54
          16                      1.37
          32                      1.29
          64                      1.01

Thank you,
Eitan