On Fri, Jun 9, 2023 at 6:58 PM Joel Jacobson <j...@compiler.org> wrote:

> On Thu, Jun 8, 2023, at 12:19, Tomas Vondra wrote:
> > Would you be interested in helping with / working on some of that? I
> > don't have immediate need for this stuff, so it's not very high on my
> > TODO list.
>
> Sure, I'm willing to help!
>
> I've attached a patch that works on some of the items on your list,
> including some additions to the README.md.
>
> There were a bunch of places where `maxelements / 8` caused bugs,
> that had to be changed to do proper integer ceiling division:
>
> -       values = (int32 *) (set->data + set->maxelements / 8);
> +       values = (int32 *) (set->data + (set->maxelements + 7) / 8);
>
> Side note: I wonder if it would be good to add CEIL_DIV and FLOOR_DIV
> macros
> to the PostgreSQL source code in general, since it's easy to make this
> mistake,
> and quite verbose/error-prone to write it out manually everywhere.
> Such macros could simplify code in e.g. numeric.c.
>
> > There's a bunch of stuff that needs to be improved to make this properly
> > usable, like:
> >
> > 1) better hash table implementation
> TODO
>
> > 2) input/output functions
> I've attempted to implement these.
> I thought comma separated values wrapped around curly braces felt as the
> most natural format,
> example:
> SELECT '{1,2,3}'::hashset;
>
> > 3) support for other types (now it only works with int32)
> TODO
>
> > 4) I wonder if this might be done as an array-like polymorphic type.
> That would be nice!
> I guess the work-around would be to store the actual value of non-int type
> in a lookup table, and then hash the int-based primary key in such table.
>
> Do you think later implementing polymorphic type support would
> mean a more or less complete rewrite, or can we carry on with int32-support
> and add it later on?
>
> > 5) more efficient storage format, with versioning etc.
> TODO
>
> > 6) regression tests
> I've added some regression tests.
>
> > Right. IMHO the query language is a separate thing, you still need to
> > evaluate the query somehow - which is where hashset applies.
>
> Good point, I fully agree.
>
> /Joel




Hi, I am quite new about C.....
The following function I have 3 questions.
1. 7691,4201, I assume they are just random prime ints?
2. I don't get the last return set, even the return type should be bool.
3. I don't understand 13 in hash = (hash + 13) % set->maxelements;

static bool
hashset_contains_element(hashset_t *set, int32 value)
{
        int             byte;
        int             bit;
        uint32  hash;
        char   *bitmap;
        int32  *values;

        hash = ((uint32) value * 7691 + 4201) % set->maxelements;

        bitmap = set->data;
        values = (int32 *) (set->data + set->maxelements / 8);

        while (true)
        {
                byte = (hash / 8);
                bit = (hash % 8);

                /* found an empty slot, value is not there */
                if ((bitmap[byte] & (0x01 << bit)) == 0)
                        return false;

                /* is it the same value? */
                if (values[hash] == value)
                        return true;

                /* move to the next element */
                hash = (hash + 13) % set->maxelements;
        }

        return set;
}

Reply via email to