Hi Da,

It's not immediately clear to me that the size of the benefit outweighs the costs. The two cases where one normally thinks about direct I/O are:

1) The usage scenario is a cache anti-pattern. This will be true for some Hadoop use cases (MapReduce), not true for some others.
   - http://www.jeffshafer.com/publications/papers/shafer_ispass10.pdf

2) The application manages its own cache. Not applicable here.

The Atom processors you mention below will just exacerbate (1) due to their small cache size.
Since it should help the MapReduce case, there's probably an overall benefit to the community. However, what is the cost in terms of code complexity? Here's an LKML post from Linus mentioning all the nasty parts of doing O_DIRECT:

http://lkml.org/lkml/2007/1/11/129

Other choice quotes from Linus:

"""
The right way to do it is to just not use O_DIRECT.

The whole notion of "direct IO" is totally braindamaged. Just say no.

    This is your brain: O
    This is your brain on O_DIRECT: .

    Any questions?

I should have fought back harder. There really is no valid reason for EVER using O_DIRECT. You need a buffer whatever IO you do, and it might as well be the page cache. There are better ways to control the page cache than play games and think that a page cache isn't necessary.

So don't use O_DIRECT. Use things like madvise() and posix_fadvise() instead.
"""

It sounds like trying to implement direct I/O is the way of bugs, data loss, and pain for Hadoop. However, the OpenJDK list has a few opinions of its own (that's where I found the above quotes):

http://markmail.org/message/rmty2xsl45p7klbt

Using properly aligned NIO is going to get you most of the way (guess what Hadoop does already!). The other thing to try is calling posix_fadvise() through JNI. In fact, it appears Lucene has considered this:

http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html

In their case, fadvise *didn't* work well, but for a reason that shouldn't hold for MapReduce.

All in all, doing this specialization without hurting the general case is going to be tough.

Brian

On Jan 3, 2011, at 12:46 PM, Da Zheng wrote:

> Hello,
>
> I don't know which mailing list is better for this question, so I'd like to
> forward my questions to this mailing list.
>
> If no one is thinking of doing direct I/O in Hadoop, I will do it myself. I
> have located the code, but the thing is that I'm not familiar with the
> environment for compiling Hadoop.
> I can use jposix, but I don't know how to
> integrate it into Hadoop (jposix uses JNI). Any instructions for doing that?
>
> Thank you,
> Da
>
>
> -------- Original Message --------
> Subject: Hadoop use direct I/O in Linux?
> Date: Sun, 02 Jan 2011 15:01:18 -0500
> From: Da Zheng <zhengda1...@gmail.com>
> To: common-u...@hadoop.apache.org
>
>
> Hello,
>
> Direct I/O can make a huge performance difference, especially when Atom
> processors are used, but as far as I know, Hadoop doesn't enable direct I/O
> in Linux. Does anyone know of any unofficial versions developed to use
> direct I/O?
>
> I googled it and found that FUSE provides an option for direct I/O. If I use
> FUSE-DFS and enable direct I/O, will I get what I want? I.e., when I write
> data to HDFS, is the data written to the disk directly (no caching by any
> file system)? Or does this direct I/O option only let me bypass the caching
> in FUSE, while the data is still cached by the underlying FS?
>
> Best,
> Da