... BTW, could you explain exactly how that "Fill histogram by hashtab" loop works? It's way too magic for my taste, and does in fact have bugs in the currently submitted patch. I've reworked it to this:
/* Fill histogram by hashtab. */ delta = analyzed_rows - 1; count_item_index = 0; frac = sorted_count_items[0]->frequency * (num_hist - 1); for (i = 0; i < num_hist; i++) { while (frac <= 0) { count_item_index++; Assert(count_item_index < count_items_count); frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1); } hist[i] = sorted_count_items[count_item_index]->count; frac -= delta; } Assert(count_item_index == count_items_count - 1); The asserts don't fire in any test case I've tried, which seems to indicate that it *does* work in the sense that the first histogram entry is always the smallest count and the last histogram entry is always the largest one. But it's extremely unclear why it manages to stop exactly at the last count_items array entry, or for that matter why it's generating a representative histogram at all. I'm suspicious that the "-1" bits represent off-by-one bugs. I also don't especially like the fact that "frac" is capable of overflowing (since worst case frequency is 300 * 10000 and worst case num_hist is 10000, with the current limits on statistics_target). We could work around that by doing the frac arithmetic in int64, but I wonder whether that couldn't be avoided. In any case, first I'd like an explanation why this code works at all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers