Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
On Mar 31, 2017 2:17 PM, "Peter Geoghegan"wrote: On Fri, Mar 31, 2017 at 2:11 PM, Tom Lane wrote: >> The patch does actually store truncated/key-only tuples in the hi keys / >> non-leaf-nodes, which don't need the "included" columns. > > Hm. Since index tuples lack any means of indicating the actual number > of columns included (ie there's no equivalent of the natts field that > exists in HeapTupleHeaders), I think that this is an unreasonably > dangerous design. It'd be better to store nulls for the missing > fields. That would force a null bitmap to be included whether or > not there were nulls in the key columns, but if we're only doing it > once per page that doesn't seem like much cost. We're doing it once per page for the leaf page high key, but that's used as the downlink in the second phase of a B-Tree page split -- it's directly copied. So, including a NULL bitmap would make Anastasia's patch significantly less useful, since that now has to be in every internal page, and might imply that you end up with less fan-in much of the time (e.g., int4 attribute is truncated from the end relative to leaf IndexTuple format). BTW, what about the 1/3 of a page restriction on tuple size? I think that truncation has to strictly guarantee that the resulting tuple will be no larger than the original. Otherwise, you get into trouble here. But, that's still not my main objection to using the null bitmap. -- Peter Geoghegan (Sent from my phone)
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
I don't see why you'd necessarily need to know where in the index the tuple came from under my proposal. Besides, as I already pointed out, we already hard code minus infinity handling within internal pages during tuple comparisons. Anastasia's patch could modify nbtree in exactly the same way as suffix truncation would. I see no conflict there. Quite the opposite, in fact. -- Peter Geoghegan (Sent from my phone)
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
Peter Geogheganwrites: > On Fri, Mar 31, 2017 at 2:32 PM, Tom Lane wrote: >> ... I also think we're setting up a situation where tools >> like amcheck and pg_filedump are going to be unable to cope, because >> figuring out what a particular tuple contains is going to require context >> they haven't got. It just seems way too dangerous. > Why wouldn't they have the context? Up to now, we have always had the property that you could decode an index tuple given only the tuple and its relation's tupdesc (and likewise for heap tuples). This patch breaks that: now you need to know where in the index the tuple came from. That's a property that I think we will regret losing. No doubt we can make it work for some value of "work", but there's going to be broken tools, and other opportunity costs down the road. > There is a free bit within IndexTupleData.t_info that could indicate > that this is what happened. Yeah, I was just looking at that, but it's the only bit we've got left. Don't think we can use it both to deal with Anastasia's patch and your suffix truncation idea. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
On Fri, Mar 31, 2017 at 3:03 PM, Tom Lanewrote: > The reason it works fine for heap tuples is that heap tuple headers > explicitly record the number of attributes in the tuple. Index > tuples don't. Per my previous mail, I think we can change things so that Index tuples effectively record that in all relevant cases. i.e., in what I called separator key index tuples -- high key tuples in leaf pages, and all internal page index tuples (high keys and downlinks/internal items). We already store a minus infinity downlink on internal pages, which doesn't bother diagnostic tools at all, despite being its own special case without real Datum values. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
Robert Haaswrites: > On Fri, Mar 31, 2017 at 2:32 PM, Tom Lane wrote: >> It just seems way too dangerous. > So, we end up with heap tuples with different numbers of attributes > all the time, whenever you add a column. It works fine - on decoding, > the additional columns will be treated as null (unless somebody > commits Serge Rielau's patch, which regrettably nobody has gotten > around to reviewing). Why is this case different? The reason it works fine for heap tuples is that heap tuple headers explicitly record the number of attributes in the tuple. Index tuples don't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
I wrote: > Andres Freundwrites: >> It'd be useful for FieldStore - we'd not have to error out anymore if >> the number of columns changes (which I previously had "solved" by using >> MaxHeapAttributeNumber sized values/nulls array). > Ah, I see. But in that case you're not really truncating the tuple > --- what you want is to be allowed to pass undersized datum/nulls > arrays and have the missing columns be taken as nulls. No, scratch that: you can't use an "extended" heap_form_tuple to solve that problem, because it's not only the form-tuple end but the deform-tuple end that needs fullsize arrays. Otherwise, we'd be losing trailing-column values in the deform-and-reform cycle. So I still don't see that there's a valid application for this feature. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
On Fri, Mar 31, 2017 at 2:32 PM, Tom Lanewrote: > I think you are failing to get the point. I am not on about whether > we need a few bytes more or less, I am on about whether the patch > even works. As an example, it's quite unclear what the width of > such an index tuple's nulls bitmap will be if it exists, and there > certainly will be cases where it must exist because of nulls in the > keys columns. I also think we're setting up a situation where tools > like amcheck and pg_filedump are going to be unable to cope, because > figuring out what a particular tuple contains is going to require context > they haven't got. It just seems way too dangerous. Why wouldn't they have the context? I think that we can use the page offset for internal items to indicate the number of attributes that were truncated in each case. That field is currently unused in all relevant cases (for "separator" IndexTuples). This wouldn't be the first time we put a magic value into an item pointer offset. That detail would be redundant for Anastasia's patch, but we can imagine a future in which that isn't the case. There is a free bit within IndexTupleData.t_info that could indicate that this is what happened. I am not going to comment on how dangerous any of this may or may not be just yet, but FWIW I think it would be worth using that bit to allow suffix truncation to work. Suffix truncation is a widely used technique, that Anastasia's patch happens to be a special case of. It's a technique that is almost as old as B-Trees themselves. The use of a dedicated bit probably wouldn't be necessary, but perhaps it makes things safer for the patch. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
On Fri, Mar 31, 2017 at 2:32 PM, Tom Lanewrote: > Peter Geoghegan writes: >> On Fri, Mar 31, 2017 at 2:11 PM, Tom Lane wrote: >>> Hm. Since index tuples lack any means of indicating the actual number >>> of columns included (ie there's no equivalent of the natts field that >>> exists in HeapTupleHeaders), I think that this is an unreasonably >>> dangerous design. It'd be better to store nulls for the missing >>> fields. That would force a null bitmap to be included whether or >>> not there were nulls in the key columns, but if we're only doing it >>> once per page that doesn't seem like much cost. > >> We're doing it once per page for the leaf page high key, but that's >> used as the downlink in the second phase of a B-Tree page split -- >> it's directly copied. So, including a NULL bitmap would make >> Anastasia's patch significantly less useful, > > I think you are failing to get the point. I am not on about whether > we need a few bytes more or less, I am on about whether the patch > even works. As an example, it's quite unclear what the width of > such an index tuple's nulls bitmap will be if it exists, and there > certainly will be cases where it must exist because of nulls in the > keys columns. I also think we're setting up a situation where tools > like amcheck and pg_filedump are going to be unable to cope, because > figuring out what a particular tuple contains is going to require context > they haven't got. It just seems way too dangerous. So, we end up with heap tuples with different numbers of attributes all the time, whenever you add a column. It works fine - on decoding, the additional columns will be treated as null (unless somebody commits Serge Rielau's patch, which regrettably nobody has gotten around to reviewing). Why is this case different? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
Peter Geogheganwrites: > On Fri, Mar 31, 2017 at 2:11 PM, Tom Lane wrote: >> Hm. Since index tuples lack any means of indicating the actual number >> of columns included (ie there's no equivalent of the natts field that >> exists in HeapTupleHeaders), I think that this is an unreasonably >> dangerous design. It'd be better to store nulls for the missing >> fields. That would force a null bitmap to be included whether or >> not there were nulls in the key columns, but if we're only doing it >> once per page that doesn't seem like much cost. > We're doing it once per page for the leaf page high key, but that's > used as the downlink in the second phase of a B-Tree page split -- > it's directly copied. So, including a NULL bitmap would make > Anastasia's patch significantly less useful, I think you are failing to get the point. I am not on about whether we need a few bytes more or less, I am on about whether the patch even works. As an example, it's quite unclear what the width of such an index tuple's nulls bitmap will be if it exists, and there certainly will be cases where it must exist because of nulls in the keys columns. I also think we're setting up a situation where tools like amcheck and pg_filedump are going to be unable to cope, because figuring out what a particular tuple contains is going to require context they haven't got. It just seems way too dangerous. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
On Fri, Mar 31, 2017 at 2:11 PM, Tom Lanewrote: >> The patch does actually store truncated/key-only tuples in the hi keys / >> non-leaf-nodes, which don't need the "included" columns. > > Hm. Since index tuples lack any means of indicating the actual number > of columns included (ie there's no equivalent of the natts field that > exists in HeapTupleHeaders), I think that this is an unreasonably > dangerous design. It'd be better to store nulls for the missing > fields. That would force a null bitmap to be included whether or > not there were nulls in the key columns, but if we're only doing it > once per page that doesn't seem like much cost. We're doing it once per page for the leaf page high key, but that's used as the downlink in the second phase of a B-Tree page split -- it's directly copied. So, including a NULL bitmap would make Anastasia's patch significantly less useful, since that now has to be in every internal page, and might imply that you end up with less fan-in much of the time (e.g., int4 attribute is truncated from the end relative to leaf IndexTuple format). I've implemented a rough prototype of suffix truncation, that seems to work well enough, and keeps amcheck happy, and I base my remarks on the experience of writing that prototype. Using the NULL bitmap this way was the first thing I tried. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
Robert Haaswrites: > You might want to have a read through that patch. I think your > opinion would be helpful in determining how close that patch is to > being ready to commit (same for WARM). Well, now that we have an extra week, maybe I'll find the time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
Andres Freundwrites: > On 2017-03-31 13:44:52 -0400, Tom Lane wrote: >> Andres Freund writes: >>> The covering indexes patch [1] really needs a version of >>> heap_form_tuple/index_form_tuple that allows to specify the number of >>> columns in the to-be generated tuple. >> I was thinking about that this morning, and wondering why exactly it >> would need such a thing. Certainly we're not going to store such a >> truncated tuple in the index, so I assume that the purpose here is >> just to pass around the tuple internally for some reason. > The patch does actually store truncated/key-only tuples in the hi keys / > non-leaf-nodes, which don't need the "included" columns. Hm. Since index tuples lack any means of indicating the actual number of columns included (ie there's no equivalent of the natts field that exists in HeapTupleHeaders), I think that this is an unreasonably dangerous design. It'd be better to store nulls for the missing fields. That would force a null bitmap to be included whether or not there were nulls in the key columns, but if we're only doing it once per page that doesn't seem like much cost. >> Um, I didn't notice anyplace where that would have helped, certainly >> not on the "form tuple" side. Tuples that don't meet their own tupdesc >> don't seem to have wide application to me. > It'd be useful for FieldStore - we'd not have to error out anymore if > the number of columns changes (which I previously had "solved" by using > MaxHeapAttributeNumber sized values/nulls array). Ah, I see. But in that case you're not really truncating the tuple --- what you want is to be allowed to pass undersized datum/nulls arrays and have the missing columns be taken as nulls. In short, I'd be okay with an extension that allows shortened input arrays, but I think it needs to produce fully valid tuples with nulls in the unsupplied columns. (In the heap case we could indeed achieve that effect by storing the smaller natts value in the header, but not in the index case.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
On 2017-03-31 13:44:52 -0400, Tom Lane wrote: > Andres Freundwrites: > > The covering indexes patch [1] really needs a version of > > heap_form_tuple/index_form_tuple that allows to specify the number of > > columns in the to-be generated tuple. > > I was thinking about that this morning, and wondering why exactly it > would need such a thing. Certainly we're not going to store such a > truncated tuple in the index, so I assume that the purpose here is > just to pass around the tuple internally for some reason. The patch does actually store truncated/key-only tuples in the hi keys / non-leaf-nodes, which don't need the "included" columns. > > Previously the faster expression > > evaluation stuff could also have benefited form the same for both > > forming and deforming tuples. > > Um, I didn't notice anyplace where that would have helped, certainly > not on the "form tuple" side. Tuples that don't meet their own tupdesc > don't seem to have wide application to me. It'd be useful for FieldStore - we'd not have to error out anymore if the number of columns changes (which I previously had "solved" by using MaxHeapAttributeNumber sized values/nulls array). Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
On Fri, Mar 31, 2017 at 1:44 PM, Tom Lanewrote: > Andres Freund writes: >> The covering indexes patch [1] really needs a version of >> heap_form_tuple/index_form_tuple that allows to specify the number of >> columns in the to-be generated tuple. > > I was thinking about that this morning, and wondering why exactly it > would need such a thing. Certainly we're not going to store such a > truncated tuple in the index, ... The patch stores it in the index as a high key. You might want to have a read through that patch. I think your opinion would be helpful in determining how close that patch is to being ready to commit (same for WARM). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
Andres Freundwrites: > The covering indexes patch [1] really needs a version of > heap_form_tuple/index_form_tuple that allows to specify the number of > columns in the to-be generated tuple. I was thinking about that this morning, and wondering why exactly it would need such a thing. Certainly we're not going to store such a truncated tuple in the index, so I assume that the purpose here is just to pass around the tuple internally for some reason. What's wrong with just generating the full tuple, perhaps with nulls for the extra columns if you're concerned about memory space? > Previously the faster expression > evaluation stuff could also have benefited form the same for both > forming and deforming tuples. Um, I didn't notice anyplace where that would have helped, certainly not on the "form tuple" side. Tuples that don't meet their own tupdesc don't seem to have wide application to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple
On Fri, Mar 31, 2017 at 1:24 PM, Andres Freundwrote: > The covering indexes patch [1] really needs a version of > heap_form_tuple/index_form_tuple that allows to specify the number of > columns in the to-be generated tuple. Previously the faster expression > evaluation stuff could also have benefited form the same for both > forming and deforming tuples. > > It's obviously trivial to add, codewise. > > I think for index_form_tuple we can just add a parameter, but for > heap_form_tuple/heap_deform_tuple that'd probably break a number of > external users. How about adding _ex variants that allow to specify > that? I think our usual practice is _extended rather than just _ex, but no objections otherwise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Allow to specify #columns in heap/index_form_tuple
Hi, The covering indexes patch [1] really needs a version of heap_form_tuple/index_form_tuple that allows to specify the number of columns in the to-be generated tuple. Previously the faster expression evaluation stuff could also have benefited form the same for both forming and deforming tuples. It's obviously trivial to add, codewise. I think for index_form_tuple we can just add a parameter, but for heap_form_tuple/heap_deform_tuple that'd probably break a number of external users. How about adding _ex variants that allow to specify that? Regards, Andres [1] http://archives.postgresql.org/message-id/56168952.4010101%40postgrespro.ru -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers