Hi, Thanks for the detailed explaination. I misunderstood the code (more honest speaking, din't look so close there). Then I looked it closer.
At Wed, 08 Jul 2015 03:03:16 +0200, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote in <559c76d4.2030...@2ndquadrant.com> > FWIW this was a stupid bug in update_match_bitmap_histogram(), which > initially handled only AND clauses, and thus assumed the "match" of a > bucket can only decrease. But for OR clauses this is exactly the > opposite (we assume no buckets match and add buckets matching at least > one of the clauses). > > With this fixed, the estimates look like this: > > IMHO pretty accurate estimates - no issue with OR clauses. Ok, I understood the diferrence between what I thought and what you say. The code is actually concious of OR clause but is looks somewhat confused. Currently choosing mv stats in clauselist_selectivity can be outlined as following, 1. find_stats finds candidate mv stats containing *all* attributes appeared in the whole clauses regardless of and/or exprs by walking whole the clause tree. Perhaps this is the measure to early bailout. 2.1. Within every disjunction elements, collect mv-related attributes while checking whether the all leaf nodes (binop or ifnull) are compatible by (eventually) walking whole the clause tree. 2.2. Check if all the collected attribute are contained in mv-stats columns. 3. Finally, clauseset_mv_selectivity_histogram() (and others). This funciton applies every ExprOp onto every attribute in every histogram backes and (tries to) make the boolean operation of the result bitmaps. I have some comments on the implement and I also try to find the solution for them. 1. The flow above looks doing very similiar thins repeatedly. 2. I believe what the current code does can be simplified. 3. As you mentioned in comments, some additional infrastructure needed. After all, I think what we should do after this are as follows, as the first step. - Add the means to judge the selectivity operator(?) by other than oprrest of the op of ExprOp. (You missed neqsel already) I suppose one solution for this is adding oprmvstats taking 'm', 'h' and 'f' and their combinations. Or for the convenience, it would be a fixed-length string like this. oprname | oprmvstats = | 'mhf' <> | 'mhf' < | 'mh-' > | 'mh-' >= | 'mh-' <= | 'mh-' This would make the code in clause_is_mv_compatible like this. > oprmvstats = get_mvstatsset(expr->opno); /* bitwise representation */ > if (oprmvstats & types) > { > *attnums = bms_add_member(*attnums, var->varattno); > return true; > } > return false; - Current design just manage to work but it is too complicated and hardly have affinity with the existing estimation framework. I proposed separation of finding stats phase and calculation phase, but I would like to propose transforming RestrictInfo(and finding mvstat) phase and running the transformed RestrictInfo phase after looking close to the patch. I think transforing RestrictInfo makes the situnation better. Since it nedds different information, maybe it is better to have new struct, say, RestrictInfoForEstimate (boo!). Then provide mvstatssel() to use in the new struct. The rough looking of the code would be like below. clauselist_selectivity() { ... RestrictInfoForEstmate *esclause = transformClauseListForEstimation(root, clauses, varRelid); ... return clause_selectivity(esclause): } clause_selectivity(RestrictInfoForEstmate *esclause) { if (IsA(clause, RestrictInfo))... if (IsA(clause, RestrictInfoForEstimate)) { RestrictInfoForEstimate *ecl = (RestrictInfoForEstimate*) clause; if (ecl->selfunc) { sx = ecl->selfunc(root, ecl); } } if (IsA(clause, Var))... } transformClauseListForEstimation(...) { ... relid = collect_mvstats_info(root, clause, &attlist); if (!relid) return; if (get_mvstats_hook) mvstats = (*get_mvstats_hoook) (root, relid, attset); else mvstats = find_mv_stats(root, relid, attset)) } ... > I've pushed this to github [1] but I need to do some additional > fixes. I also had to remove some optimizations while fixing this, and > will have to reimplement those. > > That's not to say that the handling of OR-clauses is perfectly > correct. After looking at clauselist_selectivity_or(), I believe it's > a bit broken and will need a bunch of fixes, as explained in the > FIXMEs I pushed to github. > > [1] https://github.com/tvondra/postgres/tree/mvstats I don't see whether it is doable or not, and I suppose you're unwilling to change the big picture, so I will consider the idea and will show you the result, if it turns out to be possible and promising. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers