On Fri, Jun 9, 2023 at 6:58 PM Joel Jacobson <j...@compiler.org> wrote:
> On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote: > > Would you be interested in helping with / working on some of that? I > > don't have immediate need for this stuff, so it's not very high on my > > TODO list. > > Sure, I'm willing to help! > > I've attached a patch that works on some of the items on your list, > including some additions to the README.md. > > There were a bunch of places where `maxelements / 8` caused bugs, > that had to be changed to do proper integer ceiling division: > > - values = (int32 *) (set->data + set->maxelements / 8); > + values = (int32 *) (set->data + (set->maxelements + 7) / 8); > > Side note: I wonder if it would be good to add CEIL_DIV and FLOOR_DIV > macros > to the PostgreSQL source code in general, since it's easy to make this > mistake, > and quite verbose/error-prone to write it out manually everywhere. > Such macros could simplify code in e.g. numeric.c. > > > There's a bunch of stuff that needs to be improved to make this properly > > usable, like: > > > > 1) better hash table implementation > TODO > > > 2) input/output functions > I've attempted to implement these. > I thought comma separated values wrapped around curly braces felt as the > most natural format, > example: > SELECT '{1,2,3}'::hashset; > > > 3) support for other types (now it only works with int32) > TODO > > > 4) I wonder if this might be done as an array-like polymorphic type. > That would be nice! > I guess the work-around would be to store the actual value of non-int type > in a lookup table, and then hash the int-based primary key in such table. > > Do you think later implementing polymorphic type support would > mean a more or less complete rewrite, or can we carry on with int32-support > and add it later on? > > > 5) more efficient storage format, with versioning etc. > TODO > > > 6) regression tests > I've added some regression tests. > > > Right. IMHO the query language is a separate thing, you still need to > > evaluate the query somehow - which is where hashset applies. > > Good point, I fully agree. > > /Joel Hi, I am quite new about C..... The following function I have 3 questions. 1. 7691,4201, I assume they are just random prime ints? 2. I don't get the last return set, even the return type should be bool. 3. I don't understand 13 in hash = (hash + 13) % set->maxelements; static bool hashset_contains_element(hashset_t *set, int32 value) { int byte; int bit; uint32 hash; char *bitmap; int32 *values; hash = ((uint32) value * 7691 + 4201) % set->maxelements; bitmap = set->data; values = (int32 *) (set->data + set->maxelements / 8); while (true) { byte = (hash / 8); bit = (hash % 8); /* found an empty slot, value is not there */ if ((bitmap[byte] & (0x01 << bit)) == 0) return false; /* is it the same value? */ if (values[hash] == value) return true; /* move to the next element */ hash = (hash + 13) % set->maxelements; } return set; }