On Mon, Jan 3, 2011 at 7:15 PM, Brian Bockelman <bbock...@cse.unl.edu>wrote:
> The I/O pattern isn't truly random. To convert from physicist terms to CS > terms, the application is iterating through the rows of a column-oriented > store, reading out somewhere between 1 and 10% of the columns. The twist is > that the columns are compressed, meaning the size of a set of rows on disk > is variable. > We're getting pretty far off topic here, but this is an interesting problem. It *sounds* to me like a "compressed bitmap index" problem, possibly with bloom filters for joins (basically what HBase/Cassandra/Hypertable get in to, or in a less distributed case: MonetDB). Is that on the money? > This prevents any sort of OS page cache stride detection from helping - > the OS sees everything as random. > It seems though like if you organized the data a certain way, the OS page cache could help. > However, the application also has an index of where each row is located, > meaning if it knows the active set of columns, it can predict the reads the > client will perform and do a read-ahead. > Yes, this is the kind of advantage where O_DIRECT might help, although I'd hope in this kind of circumstance the OS buffer cache would mostly give up anyway and just give as much of the available RAM as possible to the app. In that case memory mapped files with a thread doing a bit of read ahead would seem like not that much slower than using O_DIRECT. That said, I have to wonder how often this problem devolves in to a straight forward column scan. I mean, with a 1-10% hit rate, you need SSD seek times for it to make sense to seek to specific records vs. just scanning through the whole column, or to put it another way: "disk is the new tape". ;-) > Some days, it does feel like "building a better TCP using UDP". However, > we got a 3x performance improvement by building it (and multiplying by > 10-15k cores for just our LHC experiment, that's real money!), so it's a > particular monstrosity we are stuck with. It sure sounds like a problem better suited to C++ than Java though. What benefits do you yield from doing all this with a JVM? -- Chris