31.01.2016 11:04, David Rowley:
On 27 January 2016 at 03:35, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:
including_columns_3.0 is the latest version of patch.
And changes regarding the previous version are attached in a separate patch.
Just to ease the review and debug.
Hi,

I've made another pass over the patch. There's still a couple of
things that I think need to be looked at.
Thank you again.
I just write here to say that I do not disappear and I do remember about the issue. But I'm very very busy this week. I'll send an updated patch next week as soon as possible.

Do we need the "b (included)" here? The key is (a) = (1). Having
irrelevant details might be confusing.

postgres=# create table a (a int not null, b int not null);
CREATE TABLE
postgres=# create unique index on a (a) including(b);
CREATE INDEX
postgres=# insert into a values(1,1);
INSERT 0 1
postgres=# insert into a values(1,1);
ERROR:  duplicate key value violates unique constraint "a_a_b_idx"
DETAIL:  Key (a, b (included))=(1, 1) already exists.
I thought that it could be strange if user inserts two values and then sees only one of them in error message. But now I see that you're right. I'll also look at the same functional in other DBs and fix it.

In index_reform_tuple() I find it a bit scary that you change the
TupleDesc's number of attributes then set it back again once you're
finished reforming the shortened tuple.
Maybe it would be better to modify index_form_tuple() to accept a new
argument with a number of attributes, then you can just Assert that
this number is never higher than the number of attributes in the
TupleDesc.
Good point.
I agree that this function is a bit strange. I have to set tupdesc->nattrs to support compatibility with index_form_tuple(). I didn't want to add neither a new field to tupledesc nor a new parameter to index_form_tuple(), because they are used widely.
I'm also not that keen on index_reform_tuple() in general. I wonder if
there's a way we can just keep the Datum/isnull arrays a bit longer,
and only form the tuple when needed. I've not looked into this in
detail, but it does look like reforming the tuple is not going to be
cheap.
It is used in splits, for example. There is no datum array, we just move tuple key from a child page to a parent page or something like that.
And according to INCLUDING algorithm we need to truncate nonkey attributes.
If we do need to keep this function, I think a better name might be
index_trim_tuple() and I don't think you need to pass the original
length. It might make sense to Assert() that the trim length is
smaller than the tuple size

As regards the performance, I don't think that it's a big problem here.
Do you suggest to do it in a following way memcpy(oldtup, newtup, newtuplength)?
I will
in gistrescan() there is some code:

for (attno = 1; attno <= natts; attno++)
{
TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
   scan->indexRelation->rd_opcintype[attno - 1],
   -1, 0);
}

Going by RelationInitIndexAccessInfo() rd_opcintype[] is allocated to
be sized by the number of key columns, but this loop goes over the
number of attribute columns.
Perhaps this is not a big problem since GIST does not support
INCLUDING columns, but it does seem wrong still.

GiST doesn't support INCLUDING clause, so natts and nkeyatts are always equal. I don't see any problem here. And I think that it's an extra work to this patch. Maybe I or someone else would add this feature to other access methods later.
Which brings me to the fact that I've spent a bit of time trying to
look for places where you've forgotten to change natts to nkeyatts. I
did find this one, but I don't have much confidence that there's not
lots more places that have been forgotten. Apart from this one, how
confident are you that you've found all the places? I'm getting
towards being happy with the code that I see that's been changed, but
I'm hesitant to mark as "Ready for committer" due to not being all
that comfortable that all the code that needs to be updated has been
updated. I'm not quite sure of a good way to find all these places.
I found all mentions of natts and other related variables with grep, and replaced (or expand) them with nkeyatts where it was necessary.
As mentioned before, I didn't change other AMs.
I strongly agree that any changes related to btree require thorough inspection, so I'll recheck it again. But I'm almost sure that it's okay.

I wondering if hacking the code so that each btree index which is
created with > 1 column puts all but the first column into the
INCLUDING columns, then run the regression tests to see if there are
any crashes. I'm really not that sure of how else to increase the
confidence levels on this. Do you have ideas?

Do I understand correctly that you suggest to replace all multicolumn indexes with (1key column) + included? I don't think it's a good idea. INCLUDING clause brings some disadvantages. For example, included columns must be filtered after the search, while key columns could be used in scan key directly. I already mentioned this in test example:

explain analyze select c1, c2 from tbl where c1<10000 and c3<20;
If columns' opclasses are used, new query plan uses them in Index Cond: ((c1 < 10000) AND (c3 < 20)) Otherwise, new query can not use included column in Index Cond and uses filter instead:
Index Cond: (c1 < 10000)
Filter: (c3 < 20)
Rows Removed by Filter: 9993
It slows down the query significantly.

And besides that, we still want to have multicolumn unique indexes. CREATE UNIQUE INDEX on tbl (a, b, c) INCLUDING (d);

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to