On Mon, Oct 15, 2018 at 12:16 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > On Sun, 14 Oct 2018 at 06:19, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> > Cache-oblivious hash joins cause a lot of TLB and cache misses.
> > ...
> > (There is another class of cache-aware hash join algorithms that partition
> > carefully up front to avoid them; that's not us.)
>
> Just out of curiosity, can you please elaborate more on this part (with
> references)? I'm thinking about this topic for a while, and I'm wondering, if
> by another class you mean something like this [1], then even if it's not us
> today, are there any issues that prevent from experimenting in this area?
Hmm, I didn't mean the term-of-art "cache-oblivious" (Wikipedia definition: "an algorithm designed to take advantage of a CPU cache without having the size of the cache ... as an explicit parameter"); I meant not "cache-conscious" (we don't do anything at all to reduce cache misses, though obviously we could add prefetching to improve on that).

The distinction I'm making is between "no partition" hash join (what we have), where you don't have to do any work up front but you pay for a lot of cache misses during building and probing, and "partition" hash join, notably "radix join" (what MonetDB has?), where a partition phase before the build phase aims to break the data up into enough partitions that the hash tables fit better in cache, making the later phases faster. There seems to be some disagreement about which is best: passing through the data first is expensive, but so are cache misses on every probe, and there are claims that the winner depends on skew and tuple size.

Here's some of the stuff I read/watched about this subject:

https://15721.courses.cs.cmu.edu/spring2018/schedule.html#apr-04-2018

Add to that http://www.adms-conf.org/2017/camera-ready/Analyzing_In_Memory_Hash_Joins__Granularity_Matters.pdf

Skimming very quickly through the paper you posted: yeah, I mean exactly that stuff. Specifically I was thinking of the radix join mentioned in background section 2.3. (I see also that the same authors wrote a paper "Cache-Oblivious Hash Joins", which appears to describe a refinement of radix join that doesn't need to be parameterised for cache size.)

Sure, we could always consider these ideas. I'm not personally working on that; to me it looked very hard to build, for a feature so uncertain to produce better results!
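To make the "partition" idea concrete, here is a rough sketch in plain Python of the radix-join shape described above (an illustration only, not MonetDB's or anyone's actual implementation; the partition count and key function are made up). Both inputs are partitioned by the low bits of the hash before building, so matching keys always land in the same partition and each per-partition hash table can stay small enough to be cache-resident during build and probe:

```python
NUM_PARTITIONS = 8          # hypothetical; real systems derive this from cache geometry
MASK = NUM_PARTITIONS - 1   # low bits of the hash select the partition

def radix_join(build_rows, probe_rows, key):
    # Partition phase: one extra pass over both inputs up front.
    build_parts = [[] for _ in range(NUM_PARTITIONS)]
    probe_parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in build_rows:
        build_parts[hash(key(row)) & MASK].append(row)
    for row in probe_rows:
        probe_parts[hash(key(row)) & MASK].append(row)

    # Build + probe phase: rows with equal keys hash identically, so they
    # fall into the same partition and each small table is joined independently.
    results = []
    for b_part, p_part in zip(build_parts, probe_parts):
        table = {}
        for row in b_part:
            table.setdefault(key(row), []).append(row)
        for row in p_part:
            for match in table.get(key(row), []):
                results.append((match, row))
    return results
```

The "no partition" variant is just the build + probe phase run once over the whole input with a single big table; it skips the extra pass but probes a table that may not fit in cache.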
(Note: we do of course have some kind of partitioning, called "batching", when work_mem runs out, but it's not a kind of partitioning that cares about reducing cache misses, so if I understood correctly it's still "no partition" as far as this discussion goes.)

-- 
Thomas Munro
http://www.enterprisedb.com