On Thu, Jun 16, 2022 at 11:57 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> I've attached an updated version patch that changes the configure
> script. I'm still studying how to support AVX2 on msvc build. Also,
> added more regression tests.

Thanks for the update, I will take a closer look at the patch in the near future, possibly next week. For now, though, I'd like to question why we even need to use 32-byte registers in the first place. For one, the paper referenced has 16-pointer nodes, but none for 32 (the next level is 48 and uses a different method to find the index of the next pointer). Andres' prototype has 32-pointer nodes, but in a quick read of his patch a couple of weeks ago I don't recall a reason mentioned for it.

Even if 32-pointer nodes are better from a memory perspective, I imagine it should be possible to use two SSE2 registers to find the index. It'd be locally slightly more complex, but not much. It might not even cost much more in cycles, since AVX2 would require indirecting through a function pointer. It's much more convenient if we don't need a runtime check.

There are also thermal and power disadvantages when using AVX2 in some workloads. I'm not sure that's the case here, but if it is, we'd better be getting something in return.

One more thing in general: In an earlier version, I noticed that Andres used the slab allocator and documented why. The last version of your patch that I saw had the same allocator, but not the "why". Especially in early stages of review, we want to document design decisions so they are clearer to the reader.

--
John Naylor
EDB: http://www.enterprisedb.com
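
For illustration only, the "two SSE2 registers" idea mentioned above might look roughly like the sketch below. The struct layout and the names `Node32` and `node32_search_eq` are hypothetical and not taken from either patch; the sketch assumes an x86 target with SSE2 and a GCC/Clang-style `__builtin_ctz` (MSVC would need a different bit-scan routine).

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>          /* SSE2 intrinsics */

/* Hypothetical 32-entry node; the layout is an assumption for this sketch. */
typedef struct Node32
{
    uint8_t     count;          /* number of valid entries in chunks[] */
    uint8_t     chunks[32];     /* one key byte per child pointer */
} Node32;

/*
 * Return the index of 'key' within the first node->count entries of
 * node->chunks, or -1 if it is not present.
 */
static int
node32_search_eq(const Node32 *node, uint8_t key)
{
    __m128i     needle = _mm_set1_epi8((char) key);
    __m128i     lo = _mm_loadu_si128((const __m128i *) &node->chunks[0]);
    __m128i     hi = _mm_loadu_si128((const __m128i *) &node->chunks[16]);
    uint32_t    mask_lo;
    uint32_t    mask_hi;
    uint32_t    bitfield;

    assert(node->count <= 32);

    /* Compare each 16-byte half; movemask yields one bit per byte lane. */
    mask_lo = (uint32_t) _mm_movemask_epi8(_mm_cmpeq_epi8(needle, lo));
    mask_hi = (uint32_t) _mm_movemask_epi8(_mm_cmpeq_epi8(needle, hi));

    /* Stitch the two 16-bit results into one 32-bit match bitfield. */
    bitfield = mask_lo | (mask_hi << 16);

    /* Discard matches that fall in unused slots. */
    if (node->count < 32)
        bitfield &= (UINT32_C(1) << node->count) - 1;

    if (bitfield == 0)
        return -1;

    /* The lowest set bit is the index of the first match. */
    return __builtin_ctz(bitfield);
}
```

Since SSE2 is part of the x86-64 baseline, a routine along these lines could be called directly, with no runtime CPU check and no function pointer.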