On Thu, Jun 15, 2023, at 06:29, jian he wrote: > I am not sure the following results are correct. > with cte as ( > select hashset(x) as x > ,hashset_capacity(hashset(x)) > ,hashset_count(hashset(x)) > from generate_series(1,10) g(x)) > select * > ,'|' as delim > , hashset_add(x,11111::int) > ,hashset_capacity(hashset_add(x,11111::int)) > ,hashset_count(hashset_add(x,11111::int)) > from cte \gx > > > results: > -[ RECORD 1 ]----+----------------------------- > x | {8,1,10,3,9,4,6,2,11111,5,7} > hashset_capacity | 64 > hashset_count | 10 > delim | | > hashset_add | {8,1,10,3,9,4,6,2,11111,5,7} > hashset_capacity | 64 > hashset_count | 11
Nice catch, you found a bug! Fixed in attached patch: --- Ensure hashset_add and hashset_merge operate on copied data Previously, the hashset_add() and hashset_merge() functions were modifying the original hashset in-place. This was leading to unexpected results because the original data in the hashset was being altered. This commit introduces the macro PG_GETARG_INT4HASHSET_COPY(), ensuring a copy of the hashset is created and modified, leaving the original hashset untouched. This adjustment ensures hashset_add() and hashset_merge() operate correctly on the copied hashset and prevent modification of the original data. A new regression test file `reported_bugs.sql` has been added to validate the proper functionality of these changes. Future reported bugs and their corresponding tests will also be added to this file. --- I wonder if this function: static int4hashset_t * int4hashset_copy(int4hashset_t *src) { return src; } ...that was previously named hashset_copy(), should be implemented to actually copy the struct, instead of just returning the input? It is being used by int4hashset_agg_combine() like this: /* copy the hashset into the right long-lived memory context */ oldcontext = MemoryContextSwitchTo(aggcontext); src = int4hashset_copy(src); MemoryContextSwitchTo(oldcontext); /Joel
hashset-0.0.1-da84659.patch
Description: Binary data