Daemeon - Indeed, I neglected to mention that I clear the caches across my
cluster before running the read benchmark. My expectation was that the
results would be roughly proportional to disk I/O, since replicated writes
perform twice the disk I/O of reads. I've verified the I/O with iostat.
However, as I mentioned earlier, read and write throughput converge as the
number of files in the workload increases, even though the ratio of write
I/O to read I/O stays constant.

Andrew - I've verified that the network is not the bottleneck (all of the
links are 10Gb). As I describe below, I suspect that the lack of data
locality causes the slowdown, because a given node can end up serving
multiple remote block reads at once.

I'd like to confirm my understanding of writes and reads:

Write pipelining allows a node to write its own data, replicate data, and
receive replicated data in parallel. If node A is writing its own data
while also receiving replicated data from node B, node B does not wait for
node A to finish writing B's replica to disk; rather, node B can begin
writing its next local block immediately. Pipelining is therefore what
keeps replicated writes performing well.
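
Here is a toy sketch (not HDFS code) of how I picture that hand-off: the
upstream node pushes each replication packet onto a bounded queue and
immediately moves on, while the downstream node drains the queue and does
the slow disk write on its own thread. The queue size and sleep times are
made-up numbers.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy sketch of the pipelining idea above (not HDFS code): the upstream
// node hands each replication packet to a bounded queue and moves on to
// its next block; the downstream node drains the queue and does the slow
// disk write on its own thread. Queue size and sleep times are made up.
public class PipelineSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> replicationQueue = new ArrayBlockingQueue<>(16);

        // Downstream replica: writes each replicated packet to "disk"
        // (simulated here by a sleep).
        Thread downstream = new Thread(() -> {
            try {
                while (true) {
                    String packet = replicationQueue.take();
                    if (packet.equals("EOF")) {
                        break;
                    }
                    Thread.sleep(50); // simulated disk write of the replica
                    System.out.println("downstream wrote " + packet);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        downstream.start();

        // Upstream writer: sends each packet for replication and immediately
        // continues with its next local block instead of waiting for the
        // downstream disk write to finish.
        for (int block = 0; block < 3; block++) {
            String packet = "block-" + block;
            replicationQueue.put(packet);
            System.out.println("upstream sent " + packet + " and moved on");
            Thread.sleep(10); // simulated (faster) local write
        }
        replicationQueue.put("EOF");
        downstream.join();
    }
}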

In contrast, suppose node A is already reading a block. If node A receives
an additional read request from node B, A takes longer to serve the block
to B because of A's pre-existing read. Since node B waits longer for the
block from A, node B is delayed before it can read the next block in the
file. Multiple read requests landing on the same node are a consequence of
TestDFSIO having no built-in data locality. As a result, as the number of
concurrent tasks across the cluster increases, the wait time for reads
increases.
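
To illustrate the effect I am describing, here is a rough sketch of how the
wait for a single block grows when one node's disk bandwidth is shared
among several concurrent remote reads. The 128 MB block size and 100 MB/s
disk throughput are assumptions, not measurements from my cluster:

// Rough sketch of the contention argument above: if one datanode's disk
// throughput is shared evenly among k concurrent remote block reads, each
// reader's wait for a single block grows roughly linearly with k. The
// block size and disk throughput below are assumptions, not measurements.
public class ReadContentionSketch {
    public static void main(String[] args) {
        double blockMB = 128.0;   // assumed HDFS block size
        double diskMBps = 100.0;  // assumed sustained disk throughput of the serving node

        for (int readers = 1; readers <= 8; readers *= 2) {
            double perReaderMBps = diskMBps / readers;
            double secondsPerBlock = blockMB / perReaderMBps;
            System.out.printf("%d concurrent reads -> ~%.1f s to serve one block%n",
                    readers, secondsPerBlock);
        }
    }
}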

Is my understanding of these read and write mechanisms correct?

Thank you,
Eitan
