Re: [HACKERS] Use unique index for longer pathkeys.

Kyotaro HORIGUCHI Mon, 14 Jul 2014 03:34:18 -0700

Thank you for reviewing, the revised patch will come later.

At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote 
in <caa4ek1+6b6wjwf51ozmrl+mkfh8xup9j-pehqvor8se7swy...@mail.gmail.com>
> On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> >
> > Hello, This is the continuation from the last CF.
> >
> > This patch intends to make PG to use index for longer pathkeys
> > than index columns when,
> >
> >  - The index is a unique index.
> >  - All index columns are NOT NULL.
> >  - The index column list is a subset of query_pathkeys.
> >
> > ====
> >
> > This patch consists of mainly three parts.
(snip)
> 
> I have spent some time on this patch and would like to share my
> findings as below with you.
> 
> 1.
> It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
> this patch is not getting set properly.  Please refer below test:


Yeah, probably you omitted the second condition above.

> create table t (a int, b int, c int, d text);

The following definition instead will make this work. NULLs
cannot be unique.

| create table t (a int not null, b int not null, c int, d text);

> create unique index i_t_pkey on t(a, b);
> insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
> 100000) a);
> analyze;
> explain (costs off, analyze on) select distinct * from t;
...
> 2.
> typedef struct IndexOptInfo
>   bool predOK; /* true if predicate matches query */
>   bool unique; /* true if a unique index */
>   bool immediate; /* is uniqueness enforced immediately? */
> + bool full_ordered; /* don't key columns duplicate? */
> 
> It seems you have forgotten to update out function _outIndexOptInfo().

Mmm, it's surely what I didn't intended to make:( , but the
member actually works in collect_common_primary_pathkeys.

The correct (desirable) code should use (allnotnull && unique)
instead of (full_ordered). I'll fix this in the comming version
later but they are equivelant in functionality. It's not a
mistake of forgetting update out (so broken), but branching from
wrong point (so it works contrary to the expectation):)

> 3.
> + root->distinct_pathkeys =
> + shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide,
> +   true);
> +
> + root->query_pathkeys =
> + shorten_pathkeys_following(root,root->query_pathkeys,pk_guide,
> +   true);
> 
> Add a space after each parameter in function call.

Thank you for pointing out.

> 4.
> +PathKeysComparison
> +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)
> 
> a. This function return 4 different type of values (PATHKEYS_EQUAL,
>    PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
>    however the caller just checks for PATHKEYS_EQUAL, isn't it better to
> just
>    return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.

Hmm.. I agree that it is practically removable, but I think it is
necessary from the view of similarity to the function with the
similar name. Perhaps I want another name which don't suggest the
similarity to compare_pathekeys(), say, bool
pathkeys_are_equal_ignoring_strategy() to change the rerurn
values set.

> b. Another idea could be that instead of writting separate function,
>    pass an additional parameter to compare_pathkeys() to distinguish
>    for ignore strategy case.

It sounds reasonable. According to my faint memory, the reason
why the function is separated from the original one is, perhaps,
simplly a editorial reason.

> c. In any case, I think it is good to add comments on top of newly
>    introduced function (compare_pathkeys_ignoring_strategy) or extend
>    the comments of compare_pathkeys() if you want to add a new parameter
>    to it.

Ouch, thank you for pointing out. I'll add comment if it remains
in the next patch.

> > Finally, the new patch has grown up to twice in size.. Although
> > it seems a bit large, I think that side-effects are clearly
> > avoided.
> 
> I am bit worried about the extra cycles added by this patch as compare
> to your previous patch; example
> During trimming of paths, it will build index paths (build_index_pathkeys())
> which it will anyway do again during creation of index paths:
> create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()

I felt the same thing. On stupid measure to reduce cycles would
be caching created pathkeys in IndexOptInfo. I'll try in the next
patch.

> I am not sure if this has direct impact, however I am suspecting trimming
> logic can add quite a few instructions even for cases when it is not
> beneficial.
> At this moment, I also don't have better idea, so lets proceed with review
> of this
> patch unless there is any other better way to achieve it.

I suppose there's some shortcut which can exclude the cases where
obviously there's no use of pathkeys trimming, for example, using
only pathkey lengths. I'll seek for it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Use unique index for longer pathkeys.

Reply via email to