Hi, Thanks for the detailed explaination. I misunderstood the
code (more honest speaking, din't look so close there). Then I
looked it closer.


At Wed, 08 Jul 2015 03:03:16 +0200, Tomas Vondra <tomas.von...@2ndquadrant.com> 
wrote in <559c76d4.2030...@2ndquadrant.com>
> FWIW this was a stupid bug in update_match_bitmap_histogram(), which
> initially handled only AND clauses, and thus assumed the "match" of a
> bucket can only decrease. But for OR clauses this is exactly the
> opposite (we assume no buckets match and add buckets matching at least
> one of the clauses).
> 
> With this fixed, the estimates look like this:
> 

> IMHO pretty accurate estimates - no issue with OR clauses.

Ok, I understood the diferrence between what I thought and what
you say. The code is actually concious of OR clause but is looks
somewhat confused.

Currently choosing mv stats in clauselist_selectivity can be
outlined as following,

1. find_stats finds candidate mv stats containing *all*
   attributes appeared in the whole clauses regardless of and/or
   exprs by walking whole the clause tree.

   Perhaps this is the measure to early bailout.

2.1. Within every disjunction elements, collect mv-related
   attributes while checking whether the all leaf nodes (binop or
   ifnull) are compatible by (eventually) walking whole the
   clause tree.

2.2. Check if all the collected attribute are contained in
   mv-stats columns.

3. Finally, clauseset_mv_selectivity_histogram() (and others).

   This funciton applies every ExprOp onto every attribute in
   every histogram backes and (tries to) make the boolean
   operation of the result bitmaps.

I have some comments on the implement and I also try to find the
solution for them.


1. The flow above looks doing  very similiar thins repeatedly.

2. I believe what the current code does can be simplified.

3. As you mentioned in comments, some additional infrastructure
   needed.

After all, I think what we should do after this are as follows,
as the first step.

- Add the means to judge the selectivity operator(?) by other
  than oprrest of the op of ExprOp. (You missed neqsel already)

  I suppose one solution for this is adding oprmvstats taking
  'm', 'h' and 'f' and their combinations. Or for the
  convenience, it would be a fixed-length string like this.

  oprname | oprmvstats
  =       | 'mhf'
  <>      | 'mhf'
  <       | 'mh-'
  >       | 'mh-'
  >=      | 'mh-'
  <=      | 'mh-'

  This would make the code in clause_is_mv_compatible like this.

  > oprmvstats = get_mvstatsset(expr->opno); /* bitwise representation */
  > if (oprmvstats & types)
  > {
  >    *attnums = bms_add_member(*attnums, var->varattno);
  >    return true;
  > }
  > return false;

- Current design just manage to work but it is too complicated
  and hardly have affinity with the existing estimation
  framework. I proposed separation of finding stats phase and
  calculation phase, but I would like to propose transforming
  RestrictInfo(and finding mvstat) phase and running the
  transformed RestrictInfo phase after looking close to the
  patch.

  I think transforing RestrictInfo makes the situnation
  better. Since it nedds different information, maybe it is
  better to have new struct, say, RestrictInfoForEstimate
  (boo!). Then provide mvstatssel() to use in the new struct.
  The rough looking of the code would be like below. 

  clauselist_selectivity()
  {
    ...
    RestrictInfoForEstmate *esclause =
      transformClauseListForEstimation(root, clauses, varRelid);
    ...

    return clause_selectivity(esclause):
  }

  clause_selectivity(RestrictInfoForEstmate *esclause)
  {
    if (IsA(clause, RestrictInfo))...
    if (IsA(clause, RestrictInfoForEstimate))
    {
       RestrictInfoForEstimate *ecl = (RestrictInfoForEstimate*) clause;
       if (ecl->selfunc)
       {
          sx = ecl->selfunc(root, ecl);
       }
    }
    if (IsA(clause, Var))...
  }

  
  transformClauseListForEstimation(...)
  {
    ...

    relid = collect_mvstats_info(root, clause, &attlist);
    if (!relid) return;
    if (get_mvstats_hook)
         mvstats = (*get_mvstats_hoook) (root, relid, attset);
    else
         mvstats = find_mv_stats(root, relid, attset))
  }
  ...

> I've pushed this to github [1] but I need to do some additional
> fixes. I also had to remove some optimizations while fixing this, and
> will have to reimplement those.
> 
> That's not to say that the handling of OR-clauses is perfectly
> correct. After looking at clauselist_selectivity_or(), I believe it's
> a bit broken and will need a bunch of fixes, as explained in the
> FIXMEs I pushed to github.
> 
> [1] https://github.com/tvondra/postgres/tree/mvstats

I don't see whether it is doable or not, and I suppose you're
unwilling to change the big picture, so I will consider the idea
and will show you the result, if it turns out to be possible and
promising.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to