David Rowley <dgrowle...@gmail.com> writes:
> Because there's been quite a few of these, and this report is yet
> another one, I wonder if it's time to try and stamp these out at the
> source rather than where the row counts are being used.
I'm on board with trying to get rid of NaN rowcount estimates more
centrally.  I do not think it is a good idea to try to wire in a
prohibition against zero rowcounts.  That is actually the correct thing
in assorted scenarios --- one example recently under discussion was
ModifyTable without RETURNING, and another is where we can prove that a
restriction clause is constant-false.  At some point I think we are
going to want to deal honestly with those cases instead of sweeping
them under the rug.  So I'm disinclined to remove zero defenses that
we'll just have to put back someday.

I think converting Inf to DBL_MAX, in hopes of avoiding creation of
NaNs later, is fine.  (Note that applying rint() to that is quite
useless --- in every floating-point system, values bigger than
2^number-of-mantissa-bits are certainly integral.)

I'm not sure why you propose to map NaN to one.  Wouldn't mapping it to
Inf (and thence to DBL_MAX) make at least as much sense?  Probably more
in fact.  We know that unwarranted one-row estimates are absolute death
to our chances of picking a well-chosen plan.

> I toyed around with the attached patch, but I'm still not that excited
> about the clamping of infinite values to DBL_MAX. The test case I
> showed above with generate_Series(1,379) still ends up with NaN cost
> estimates due to costing a sort with DBL_MAX rows. When I was writing
> the patch, I had it in my head that the costs per row will always be
> lower than 1.

Yeah, that is a good point.  Maybe instead of clamping to DBL_MAX, we
should clamp rowcounts to something that provides some headroom for
multiplication by per-row costs.  A max rowcount of say 1e100 should
serve fine, while still being comfortably more than any non-insane
estimate.  So now I'm imagining something like

#define MAXIMUM_ROWCOUNT 1e100

clamp_row_est(double nrows)
{
    /* Get rid of NaN, Inf, and impossibly large row counts */
    if (isnan(nrows) || nrows >= MAXIMUM_ROWCOUNT)
        nrows = MAXIMUM_ROWCOUNT;
    else
        ... existing logic ...
Perhaps we should also have some sort of clamp for path cost estimates,
at least to prevent them from being NaNs, which is going to confuse
add_path terribly.

			regards, tom lane