I spent a bit of time today testing Melanie's v11, except with read_stream.c v13, on Linux, ext4, and 3000 IOPS cloud storage. I think I now know roughly what's going on. Here are some numbers, using your random table from above and a simple SELECT * FROM t WHERE a < 100 OR a = 123456. I'll keep parallelism out of this for now. These are milliseconds:
 eic   unpatched   patched
   0        4172      9572
   1       30846     10376
   2       18435      5562
   4       18980      3503
   8       18980      2680
  16       18976      3233

So with eic=0, unpatched wins. The reason is that Linux readahead wakes up and scans the table at 150MB/s, because there are enough clusters of blocks to trigger it. But the patched version doesn't look so sequential to the kernel, because I/O combining has merged the sequential accesses into larger requests... At eic=1, unpatched completely collapses. I'm not sure why exactly.

Once you go above eic=1, Linux seems to get out of the way and just do what we asked it to do: iostat shows exactly 3000 IOPS, an average read size of exactly 8KB, and therefore 24MB/sec of throughput, with the queue depth sitting right at the requested level, e.g. around 7.9 for eic=8. Meanwhile the patched version eats that for breakfast, because it issues wider requests, averaging around 27KB.

It seems more informative to look at the absolute numbers rather than the A/B ratios, because then you can see that the numbers themselves are already completely nuts, a sort of interference pattern from interaction with kernel heuristics. On the other hand, this might be a pretty unusual data distribution: people who store random numbers or hashes or whatever probably don't really search for ranges of them (unless they're trying to mine bitcoins in SQL). I dunno. Maybe we need more realistic tests, or maybe we're just discovering all the things that are bad about the pre-existing code.
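In case it helps anyone repeat this, here's a rough sketch of the recipe. The table definition below is only a stand-in, since the real one is in the earlier message, and the row count and value range are guesses; the point is just an indexed column of random integers, so the query above ends up doing scattered heap fetches via a bitmap scan, which is what effective_io_concurrency drives.

-- Stand-in for the "random table" from the earlier message; treat the
-- row count and value range as guesses.
CREATE TABLE t AS
  SELECT (random() * 1000000)::int AS a
  FROM generate_series(1, 100000000);
CREATE INDEX ON t (a);
VACUUM ANALYZE t;

-- For each effective_io_concurrency value tested (0, 1, 2, 4, 8, 16),
-- with parallelism kept out of the picture:
SET max_parallel_workers_per_gather = 0;
SET effective_io_concurrency = 8;

-- Then time the query, e.g. with psql's \timing.
SELECT * FROM t WHERE a < 100 OR a = 123456;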