On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > 1. I'm still unhappy about the loop that fills the count histogram, > as I noted earlier today. It at least needs a decent comment and some > overflow protection, and I'm not entirely convinced that it doesn't have > more bugs than the overflow issue. >
Attached patch is focused on fixing this. The "frac" variable overflow is evaded by making it int64. I hope comments is clarifying something. In general this loop copies behaviour of histogram constructing loop of compute_scalar_stats function. But instead of values array we've array of unique DEC and it's frequency. ------ With best regards, Alexander Korotkov.
*** a/src/backend/utils/adt/array_typanalyze.c --- b/src/backend/utils/adt/array_typanalyze.c *************** *** 581,587 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, DECountItem **sorted_count_items; int count_item_index; int delta; ! int frac; float4 *hist; /* num_hist must be at least 2 for the loop below to work */ --- 581,587 ---- DECountItem **sorted_count_items; int count_item_index; int delta; ! int64 frac; float4 *hist; /* num_hist must be at least 2 for the loop below to work */ *************** *** 612,633 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc, hist[num_hist] = (double) element_no / (double) nonnull_cnt; /* ! * Construct the histogram. ! * ! * XXX this needs work: frac could overflow, and it's not clear ! * how or why the code works. Even if it does work, it needs ! * documented. */ delta = analyzed_rows - 1; count_item_index = 0; ! frac = sorted_count_items[0]->frequency * (num_hist - 1); for (i = 0; i < num_hist; i++) { while (frac <= 0) { count_item_index++; Assert(count_item_index < count_items_count); ! frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1); } hist[i] = sorted_count_items[count_item_index]->count; frac -= delta; --- 612,642 ---- hist[num_hist] = (double) element_no / (double) nonnull_cnt; /* ! * Construct the histogram of DECs. The object of this loop is to ! * copy the max and min DECs and evenly-spaced DECs in between ! * ("space" here is number of arrays corresponding to DEC). If we ! * imagine ordered array of DECs where each input array have a ! * corresponding DEC item, i'th value of histogram will be ! * DECs[i * (analyzed_rows - 1) / (num_hist - 1)]. But instead ! * of such array we've sorted_count_items which holds unique DEC ! * values with their frequencies. We can imagine "frac" variable as ! * an (index in DECs corresponding to next sorted_count_items ! * element - index in DECs corresponding to last histogram value) * ! * (num_hist - 1). In this case negative fraction leads us to ! * iterate over sorted_count_items. */ delta = analyzed_rows - 1; count_item_index = 0; ! frac = (int64)sorted_count_items[0]->frequency * ! (int64)(num_hist - 1); for (i = 0; i < num_hist; i++) { while (frac <= 0) { count_item_index++; Assert(count_item_index < count_items_count); ! frac += (int64)sorted_count_items[count_item_index]->frequency * ! (int64)(num_hist - 1); } hist[i] = sorted_count_items[count_item_index]->count; frac -= delta;
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers