On Mon, Dec 4, 2023 at 4:16 AM Jeff Davis <pg...@j-davis.com> wrote:
> I'm trying to follow the distinctions you're making between dynahash
> and simplehash -- are you saying it's easier to do incremental hashing
> with dynahash, and if so, why?

That's a good thing to clear up. This thread has taken simplehash as a
starting point from the very beginning. It initially showed no
improvement, and then we identified problems with the hashing and
equality computations. The latter seem like independently committable
improvements, so I'm curious if they help on their own, even if we
still need to switch to simplehash as a last step to meet your
performance goals.

> If I understood what Andres was saying, the exposed hash state would be
> useful for writing a hash function like guc_name_hash().

From my point of view, it would at least be useful for C-strings,
where we don't have the length available up front.

Aside from that, we have multiple places that compute full 32-bit
hashes on multiple individual values, and then combine them in various
ad-hoc ways. It could be worth exploring whether an incremental
interface would be better in those places on a case-by-case basis.

(If Andres had something else in mind, I'll let him address that.)

> But whether we
> use simplehash or dynahash is a separate question, right?

Right, the table implementation should treat the hash function as a
black box. Think of the incremental API as lower-level building blocks
for building hash functions.

> Also, while the |= 0x20 is a nice trick for lowercasing, did we decide
> that it's better than my approach in patch 0004 here:
>
> https://www.postgresql.org/message-id/27a7a289d5b8f42e1b1e79b1bcaeef3a40583bd2.ca...@j-davis.com
>
> which optimizes exact hits (most GUC names are already folded) before
> trying case folding?

Note there were two aspects there: hashing and equality. I demonstrated in

https://www.postgresql.org/message-id/CANWCAZbQ30O9j-bEZ_1zVCyKPpSjwbE4u19cSDDBJ%3DTYrHvPig%40mail.gmail.com

... in v4-0003 that the equality function can be optimized for
already-folded names (and in fact it measured almost the same) using
far less code.
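
To make the "building blocks" idea above more concrete, here is a rough
sketch of an incremental interface. Everything in it is hypothetical:
the demo_* names are made up for illustration and do not come from any
patch in this thread, and FNV-1a is only a stand-in for whatever hash
function would actually be used. The point is just the shape: init,
accumulate as bytes become available (so a C-string needs no strlen()
up front), then finalize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exposed hash state. FNV-1a is a placeholder algorithm. */
typedef struct demo_hash_state
{
	uint32_t	h;
} demo_hash_state;

static inline void
demo_hash_init(demo_hash_state *hs)
{
	hs->h = UINT32_C(2166136261);	/* FNV-1a 32-bit offset basis */
}

static inline void
demo_hash_accum_byte(demo_hash_state *hs, unsigned char c)
{
	hs->h ^= c;
	hs->h *= UINT32_C(16777619);	/* FNV-1a 32-bit prime */
}

/*
 * Accumulate a NUL-terminated string without knowing its length up
 * front. Returns the length as a by-product.
 */
static inline size_t
demo_hash_accum_cstring(demo_hash_state *hs, const char *s)
{
	const char *p = s;

	assert(s != NULL);
	while (*p)
		demo_hash_accum_byte(hs, (unsigned char) *p++);
	return (size_t) (p - s);
}

/* Feed a fixed-width value into the same state, low byte first. */
static inline void
demo_hash_accum_uint32(demo_hash_state *hs, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		demo_hash_accum_byte(hs, (unsigned char) (v >> (8 * i)));
}

static inline uint32_t
demo_hash_final(const demo_hash_state *hs)
{
	return hs->h;
}

/*
 * A case-insensitive name hash in the spirit of guc_name_hash(),
 * built from the pieces above. ASCII uppercase letters are folded
 * with |= 0x20 while accumulating.
 */
static inline uint32_t
demo_name_hash(const char *name)
{
	demo_hash_state hs;

	demo_hash_init(&hs);
	for (const char *p = name; *p; p++)
	{
		unsigned char c = (unsigned char) *p;

		if (c >= 'A' && c <= 'Z')
			c |= 0x20;
		demo_hash_accum_byte(&hs, c);
	}
	return demo_hash_final(&hs);
}
```

With something like this, a caller that today hashes several values
separately and combines the results by hand could instead feed each
value into one state and finalize once. Whether that's actually better
would still need to be measured in each place.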