Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan Wed, 13 May 2020 08:49:18 -0700

On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <[email protected]>
wrote:

> On Fri, May 8, 2020 at 7:27 AM Andy Fan <[email protected]> wrote:
>
> >> +            else if (inner_is_onerow)
> >> +            {
> >> +                /* Since rows in innerrel can't be duplicated AND if
> >> innerrel is onerow,
> >> +                 * the join result will be onerow also as well. Note:
> >> onerow implies
> >> +                 * multi_nullvals = false.
> >>
> >> I don't understand this comment. Why is there only one row in the other
> >> relation which can join to this row?
> >
> >
> > I guess you may miss the onerow special case if I understand correctly.
> > inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".
> > innerrel can't be duplicated means:  t1.y = t2.pk;  so the finally
> result is onerow
> > as well.  One of the overall query is  SELECT .. FROM t1, t2 where t2.y
> = t2.pk;
> >
> >
> > I explained more about onerow in the v7 README.unqiuekey document, just
> copy
> > it here.
> For some reason this mail remained in my drafts without being sent.
> Sending it now. Sorry.
>
> My impression about the one row stuff, is that there is too much
> special casing around it. We should somehow structure the UniqueKey
> data so that one row unique keys come naturally rather than special
> cased. E.g every column in such a case is unique in the result so
> create as many UniqueKeys are the number of columns

This is the beginning state of the UniqueKey,  later David suggested
this as an optimization[1], I buy-in the idea and later I found it mean
more than the original one [2], so I think onerow is needed actually.

> or create one
> unique key with no column as you have done but handle it more
> gracefully rather than spreading it all over the place.

I think this is what I do now, but it is possible that I spread it more than
necessary, if so, please let me know.  I maintained the README.uniquekey
carefully since v7 and improved a lot in v8, it may be a good place to check
it.

>
Also, the amount of code that these patches changes seems to be much
> larger than the feature's worth arguably. But it indicates that we are
> modifying/adding more code than necessary. Some of that code can be
> merged into existing code which does similar things as I have pointed
> out in my previous comment.
>
>
I have reused the code select_mergejoin_clause rather than maintaining my
own copies in v8. Thanks a lot about that suggestion.  This happened mainly
because I didn't read enough code.  I will go through more to see if I have
similar
issues.

> Thanks for working on the expanded scope of the initial feature you
> proposed. But it makes the feature more useful, I think.
>
> That's mainly because your suggestions are always insightful which makes me
willing to continue to work on it, so thank you all!

===
In fact, I was hesitated that how to reply an email when I send an new
version
of the patch.  One idea is I should explain clear what is the difference
between Vn
and Vn-1.  The other one is not many people read the Vn-1, so I would like
to keep
the email simplified and keep the README clearly to save the reviewer's
time.
Actually there are more changes in v8 than I stated above. for example:
for the
UniqueKey on baserelation,  we will reduce the expr from the UniqueKey if
the
expr is a const.   Unique on (A, B).   query is SELECT b FROM t WHERE a =
1;
in v7, the UniqueKey is (a, b).  In v8, the UniqueKey is (b) only.  But
since most
people still not start to read it, so I add such  information to README
rather than
echo the same in email thread.  I will try more to understand how to
communicate more
smooth.  But any suggestion on this part is appreciated.

Best Regards
Andy Fan

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Reply via email to