OK, here's something perhaps even stranger. I removed the "seek"
portion from my timings, so I was timing only the "read" instead of
the "seek + read" as in the first case. I also turned the read-ahead
down to 1 byte (i.e., off).
The jump *always* occurs at exactly 128KB.
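For reference, the distinction is just whether the seek falls inside the timed region. A minimal sketch of what I mean (against a local temp file as a stand-in for the FUSE mount; the function name and demo sizes are mine, not the original script):

```python
import os
import tempfile
import time

def timed_read_only(f, offset, size):
    """Seek first (untimed), then time only the read() call."""
    f.seek(offset)                  # excluded from the timing
    t0 = time.perf_counter()
    data = f.read(size)             # the only timed operation
    elapsed = time.perf_counter() - t0
    return elapsed, len(data)

# Demo against a small local file (stand-in for the FUSE mount).
with tempfile.NamedTemporaryFile(delete=False) as tf:
    tf.write(os.urandom(1 << 20))   # 1 MB of data
    path = tf.name

with open(path, "rb") as f:
    elapsed, nbytes = timed_read_only(f, 4096, 128 * 1024)

os.unlink(path)
```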
I'm a bit befuddled. I know we say that HDFS is optimized for large,
sequential reads, not random reads, but it seems to be one bug-fix
away from being a good general-purpose system. Heck if I can find
what's causing the issue, though...
Brian
On Apr 12, 2009, at 8:53 PM, Brian Bockelman wrote:
Hey all,
I was doing some research on the I/O patterns of our applications,
and I noticed the attached pattern. In case the mail server strips
out attachments, I also uploaded the graphs:
http://t2.unl.edu/store/Hadoop_64KB_ra.png
http://t2.unl.edu/store/Hadoop_1024KB_ra.png
These graphs were taken using the FUSE mounts of Hadoop; the first
with a 64KB read-ahead and the second with a 1MB read-ahead. The test
randomly 'seek'ed within a 2GB file and timed reads of increasing
size, advancing in 4KB increments, with 20 reads performed per read
size. Each blue dot is the read time of one experiment; the red dot
is the median read time for that read size. The graphs show the
absolute read time.
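A sketch of the measurement loop as described above (run against a local file rather than the FUSE mount; `STEP`, `NUM_TRIALS`, and the offset logic are my assumptions about the details, not the original script):

```python
import os
import random
import statistics
import time

STEP = 4 * 1024      # read sizes advance in 4KB increments
NUM_TRIALS = 20      # 20 timed reads per read size

def benchmark(path, max_read_size):
    """For each read size, seek to random offsets and record the
    median 'seek + read' time across NUM_TRIALS trials."""
    file_size = os.path.getsize(path)
    medians = {}
    with open(path, "rb") as f:
        for read_size in range(STEP, max_read_size + 1, STEP):
            times = []
            for _ in range(NUM_TRIALS):
                offset = random.randrange(0, file_size - read_size)
                f.seek(offset)
                t0 = time.perf_counter()
                f.read(read_size)
                times.append(time.perf_counter() - t0)
            medians[read_size] = statistics.median(times)
    return medians
```

Plotting each trial (blue) and the per-size median (red) against read size gives graphs of the shape attached.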
There's very interesting behavior here: there seems to be a change in
behavior around reads of size 800KB. The read times go down
significantly when you perform *larger* reads. I thought this was
just an artifact of the 64KB read-ahead I had set in FUSE, so I upped
the read-ahead significantly, to 1MB. In that case, the difference
between the small and large read sizes is *very* pronounced. If it
were an artifact of FUSE, I'd expect the point where the change
occurs to be a function of the read-ahead size.
Anyone out there who knows the code have any ideas? What could I be
doing wrong?
Brian