OK, here's something perhaps even stranger. I removed the "seek"
portion from my timings, so I was timing only the "read" instead of
the "seek + read" as in the first case. I also turned the read-ahead
down to 1 byte (i.e., off).
The jump *always* occurs at exactly 128KB.
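For reference, the distinction is just whether the seek falls inside the timed region. A minimal sketch of what I mean (against a local temp file as a stand-in for the FUSE mount; the function name and demo sizes are mine, not the original script):

```python
import os
import tempfile
import time

def timed_read_only(f, offset, size):
    """Seek first (untimed), then time only the read() call."""
    f.seek(offset)                  # excluded from the timing
    t0 = time.perf_counter()
    data = f.read(size)             # the only timed operation
    elapsed = time.perf_counter() - t0
    return elapsed, len(data)

# Demo against a small local file (stand-in for the FUSE mount).
with tempfile.NamedTemporaryFile(delete=False) as tf:
    tf.write(os.urandom(1 << 20))   # 1 MB of data
    path = tf.name

with open(path, "rb") as f:
    elapsed, nbytes = timed_read_only(f, 4096, 128 * 1024)

os.unlink(path)
```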
I'm a bit befuddled. I know we say that HDFS is optimized for large,
sequential reads, not random reads, but it seems to be one bug-fix
away from being a good general-purpose system. Heck if I can find
what's causing the issue, though...
Brian
On Apr 12, 2009, at 8:53 PM, Brian Bockelman wrote:
Hey all,
I was doing some research on the I/O patterns of our applications,
and I noticed the attached pattern. In case the mail server strips
out attachments, I also uploaded the graphs:
http://t2.unl.edu/store/Hadoop_64KB_ra.png
http://t2.unl.edu/store/Hadoop_1024KB_ra.png
These graphs were taken using the FUSE mounts of Hadoop; the first
with a 64KB read-ahead and the second with a 1MB read-ahead. The test
randomly 'seek'ed within a 2GB file and timed reads of increasing
size, advancing in 4KB increments, with 20 reads performed per read
size. Each blue dot is the read time of one experiment; the red dot
is the median read time for that read size. The graphs show the
absolute read time.
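A sketch of the measurement loop as described above (run against a local file rather than the FUSE mount; `STEP`, `NUM_TRIALS`, and the offset logic are my assumptions about the details, not the original script):

```python
import os
import random
import statistics
import time

STEP = 4 * 1024      # read sizes advance in 4KB increments
NUM_TRIALS = 20      # 20 timed reads per read size

def benchmark(path, max_read_size):
    """For each read size, seek to random offsets and record the
    median 'seek + read' time across NUM_TRIALS trials."""
    file_size = os.path.getsize(path)
    medians = {}
    with open(path, "rb") as f:
        for read_size in range(STEP, max_read_size + 1, STEP):
            times = []
            for _ in range(NUM_TRIALS):
                offset = random.randrange(0, file_size - read_size)
                f.seek(offset)
                t0 = time.perf_counter()
                f.read(read_size)
                times.append(time.perf_counter() - t0)
            medians[read_size] = statistics.median(times)
    return medians
```

Plotting each trial (blue) and the per-size median (red) against read size gives graphs of the shape attached.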
There's very interesting behavior here: there seems to be a change in
behavior around reads of size 800KB. The read times go down
significantly when you perform *larger* reads. I thought this was
just an artifact of the 64KB read-ahead I had set in FUSE, so I upped
the read-ahead significantly, to 1MB. In that case, the difference
between the small and large read sizes is *very* pronounced. If it
were an artifact of FUSE, I'd expect the point where the change
occurs to be a function of the read-ahead size.
Anyone out there who knows the code have any ideas? What could I be
doing wrong?
Brian