I strongly suggest benchmarking a modern version of Hadoop rather than Hadoop 1.x. The native CRC work from HDFS-3528 greatly reduces CPU consumption on the read path. I wrote about some other read path optimizations in Hadoop 2.x here:
http://www.club.cc.cmu.edu/~cmccabe/d/2014.04_ApacheCon_HDFS_read_path_optimization_presentation.pdf

I agree with Andrew that TeraGen and TeraValidate are probably a better choice for you. Look for the bottleneck in your system.

best,
Colin

On Wed, Nov 5, 2014 at 4:10 PM, Eitan Rosenfeld <eita...@gmail.com> wrote:
> Daemeon - Indeed, I neglected to mention that I am clearing the caches
> throughout my cluster before running the read benchmark. My expectation
> was to ideally get results that were proportionate to disk I/O, given
> that replicated writes perform twice the disk I/O relative to reads. I've
> verified the I/O with iostat. However, as I mentioned earlier, reads and
> writes converge as the number of files in the workload increases, despite
> the constant ratio of write I/O to read I/O.
>
> Andrew - I've verified that the network is not the bottleneck. (All of the
> links are 10Gb). As you'll see, I suspect that the lack of data-locality
> causes the slowdown because a given node can be responsible for
> serving multiple remote block reads all at once.
>
> I hope my understanding of writes and reads can be confirmed:
>
> Write pipelining allows a node to write, replicate, and receive replicated
> data in parallel. If node A is writing its own data while receiving
> replicated data from node B, node B does not wait for node A to finish
> writing B's replicated data to disk. Rather, node B can begin writing its
> next local block immediately. Thus, pipelining helps replicated writes
> have good performance.
>
> In contrast, let's assume node A is currently reading a block. If node A
> receives an additional read request from node B, A will take longer to
> serve the block to B because of A's pre-existing read. Because node B
> waits longer for the block to be served from A, there is a delay on node B
> before it attempts to read the next block in the file. Multiple read
> requests from different nodes are a consequence of having no built-in
> data locality with TestDFSIO. Finally, as the number of concurrent tasks
> throughout the cluster increases, the wait time for reads increases.
>
> Is my understanding of these read and write mechanisms correct?
>
> Thank you,
> Eitan
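
The read-side contention described in the quoted message can be sketched with a toy model. This is only an illustration, not a description of how HDFS schedules I/O: it assumes a disk's bandwidth is split evenly among concurrent readers and that a client reads the blocks of a file one after another. The function names and the numbers (128 MB blocks, 100 MB/s disks, 8 blocks per file) are hypothetical, not taken from the benchmark in this thread.

```python
def block_read_seconds(block_mb, disk_mb_per_s, concurrent_readers):
    """Time for one reader to finish a block when the disk is shared evenly.

    Assumption: each of the concurrent readers gets an equal share of the
    disk bandwidth. Seek overhead and network transfer are ignored.
    """
    if concurrent_readers < 1:
        raise ValueError("need at least one reader")
    per_reader_bandwidth = disk_mb_per_s / concurrent_readers
    return block_mb / per_reader_bandwidth


def file_read_seconds(blocks, block_mb, disk_mb_per_s, concurrent_readers):
    """A client reads blocks sequentially, so per-block delays add up."""
    return blocks * block_read_seconds(block_mb, disk_mb_per_s, concurrent_readers)


if __name__ == "__main__":
    # Hypothetical numbers: 8 blocks of 128 MB each, 100 MB/s per disk.
    for readers in (1, 2, 4):
        seconds = file_read_seconds(8, 128, 100, readers)
        print(f"{readers} concurrent reader(s) per disk: {seconds:.2f} s per file")
```

Under these assumptions the time per file grows linearly with the number of readers that land on the same disk. On spinning disks, seeks between interleaved streams would likely make the real slowdown worse than this model suggests.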