Re: split TOAST support out of postgres.h
On 10.01.23 09:48, Peter Eisentraut wrote: On 10.01.23 08:39, Noah Misch wrote: On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote: On 30.12.22 17:50, Tom Lane wrote: Peter Eisentraut writes: On 28.12.22 16:07, Tom Lane wrote: I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included so widely, I doubt it is buying very much in terms of reducing header footprint. How bad is it to do #2? See this incremental patch set. Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. I think that bears out my feeling that fmgr.h wasn't a great location: I count 117 #includes of that, many of which are in .h files themselves so that many more .c files would be required to read them. committed SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension breakage en masse. I would revert this. Well, that was sort of my thinking, but people seemed to like this. I'm happy to consider alternatives. Given the subsequent discussion, I'll keep it as is for now but consider it a semi-open item. It's easy to change.
Re: split TOAST support out of postgres.h
On Wed, Jan 11, 2023 at 1:14 AM Noah Misch wrote: > If the patch had just made postgres.h include varatt.h, like it does elog.h, > I'd consider that change a nonnegative. Grouping things is nice, even if it > makes compilation a bit slower. That also covers your frontend use case. How > about that? I'm not direly opposed to that, but I'm also unconvinced that having the varatt.h stuff is important enough relative to other things to justify giving it a privileged place forever. > I agree fixing any one extension is easy enough. Thinking back to the > htup_details.h refactor, I found the aggregate pain unreasonable relative to > alleged benefits, even though each individual extension wasn't too bad. > SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62). What annoyed me about that refactoring is that, in most cases where I had been including htup.h, I had to change it to include htup_details.h. In the main source tree, too, we've got 31 places that include access/htup.h and 241 that include access/htup_details.h. It's hard to argue that it was worth splitting the file given those numbers, and in fact I think that change was a mistake. But that doesn't mean every such change is a mistake. -- Robert Haas EDB: http://www.enterprisedb.com
Re: split TOAST support out of postgres.h
On Tue, Jan 10, 2023 at 12:00:46PM -0500, Robert Haas wrote: > On Tue, Jan 10, 2023 at 9:46 AM Tom Lane wrote: > > Now, there is a fair question whether splitting this code out of > > postgres.h is worth any trouble at all. TBH my initial reaction > > had been "no". But once we found that only 40-ish backend files > > need to read this new header, I became a "yes" vote because it > > seems clear that there will be a total-compilation-time benefit. A time claim with no benchmarks is a red flag. I've chosen to run one: export CCACHE_DISABLE=1 change=d952373a987bad331c0e499463159dd142ced1ef for commit in $change $change^; do echo === git checkout $commit git checkout $commit for n in `seq 1 200`; do make -j20 clean; env time make -j20; done done Results: commit median mean count d952373a987bad331c0e499463159dd142ced1ef 49.35 49.37 200 d952373a987bad331c0e499463159dd142ced1ef^ 49.33 49.36 200 That is to say, the patch made the build a bit slower, not faster. That's with GCC 4.8.5 (RHEL 7). I likely should have interleaved the run types, but in any case the speed win didn't show up. > I wasn't totally about this, either, but I think on balance it's > probably a smart thing to do. I always found it kind of weird that we > put that stuff in postgres.h. It seems to privilege the TOAST > mechanism to an undue degree; what's the argument, for example, that > TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS() > or LWLockAcquire or HeapTuple? It felt to me like we'd just decided > that one subsystem gets to go into the main header file and everybody > else just had to have their own headers. > > One thing that's particularly awkward about that is that if you want > to write some front-end code that knows about how varlenas are stored > on disk, it was very awkward with the old structure. You're not > supposed to include "postgres.h" into frontend code, but if the stuff > you need is defined there then what else can you do? So I generally > think that anything that's in postgres.h should have a strong claim to > be not only widely-needed in the backend, but also never needed at all > in any other executable. If the patch had just made postgres.h include varatt.h, like it does elog.h, I'd consider that change a nonnegative. Grouping things is nice, even if it makes compilation a bit slower. That also covers your frontend use case. How about that? I agree fixing any one extension is easy enough. Thinking back to the htup_details.h refactor, I found the aggregate pain unreasonable relative to alleged benefits, even though each individual extension wasn't too bad. SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).
Re: split TOAST support out of postgres.h
On Tue, Jan 10, 2023 at 9:46 AM Tom Lane wrote: > Now, there is a fair question whether splitting this code out of > postgres.h is worth any trouble at all. TBH my initial reaction > had been "no". But once we found that only 40-ish backend files > need to read this new header, I became a "yes" vote because it > seems clear that there will be a total-compilation-time benefit. I wasn't totally about this, either, but I think on balance it's probably a smart thing to do. I always found it kind of weird that we put that stuff in postgres.h. It seems to privilege the TOAST mechanism to an undue degree; what's the argument, for example, that TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS() or LWLockAcquire or HeapTuple? It felt to me like we'd just decided that one subsystem gets to go into the main header file and everybody else just had to have their own headers. One thing that's particularly awkward about that is that if you want to write some front-end code that knows about how varlenas are stored on disk, it was very awkward with the old structure. You're not supposed to include "postgres.h" into frontend code, but if the stuff you need is defined there then what else can you do? So I generally think that anything that's in postgres.h should have a strong claim to be not only widely-needed in the backend, but also never needed at all in any other executable. -- Robert Haas EDB: http://www.enterprisedb.com
Re: split TOAST support out of postgres.h
Robert Haas writes: > On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut > wrote: >>> SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension >>> breakage en masse. I would revert this. >> Well, that was sort of my thinking, but people seemed to like this. I'm >> happy to consider alternatives. > I don't think it would be very nice to do something like this in a > minor release. But in a new major release, I think it's fine. I've > been on the hook to maintain extensions in the face of these kinds of > changes at various times over the years, and it's never taken me much > time. Yeah, that was my thinking. We could never do any header refactoring at all if the standard is "will some extension author need to add a #if". In practice, we make bigger adjustments than this all the time, both in header layout and in individual function APIs. Now, there is a fair question whether splitting this code out of postgres.h is worth any trouble at all. TBH my initial reaction had been "no". But once we found that only 40-ish backend files need to read this new header, I became a "yes" vote because it seems clear that there will be a total-compilation-time benefit. regards, tom lane
Re: split TOAST support out of postgres.h
On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut wrote: > >>> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. > >>> I think that bears out my feeling that fmgr.h wasn't a great location: > >>> I count 117 #includes of that, many of which are in .h files themselves > >>> so that many more .c files would be required to read them. > >> > >> committed > > > > SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension > > breakage en masse. I would revert this. > > Well, that was sort of my thinking, but people seemed to like this. I'm > happy to consider alternatives. I don't think that the number of extensions that get broken is really the right metric. It's not fantastic to break large numbers of extensions, of course, but if the solution is merely to add an #if PG_VERSION_NUM >= whatever #include "newstuff" #endif then I don't think it's really an issue. If an extension doesn't have an author who can do at least that much updating when a new PG release comes out, then it's basically unmaintained, and I just don't feel that bad about breaking unmaintained extensions now and then, even annually. Of course, if we go and remove something that's very widely used and for which there's no simple workaround, that sucks. Say, removing LWLocks entirely. But we don't usually do that sort of thing unless there's a good reason and significant benefits. I don't think it would be very nice to do something like this in a minor release. But in a new major release, I think it's fine. I've been on the hook to maintain extensions in the face of these kinds of changes at various times over the years, and it's never taken me much time. -- Robert Haas EDB: http://www.enterprisedb.com
Re: split TOAST support out of postgres.h
On 10.01.23 08:39, Noah Misch wrote: On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote: On 30.12.22 17:50, Tom Lane wrote: Peter Eisentraut writes: On 28.12.22 16:07, Tom Lane wrote: I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included so widely, I doubt it is buying very much in terms of reducing header footprint. How bad is it to do #2? See this incremental patch set. Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. I think that bears out my feeling that fmgr.h wasn't a great location: I count 117 #includes of that, many of which are in .h files themselves so that many more .c files would be required to read them. committed SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension breakage en masse. I would revert this. Well, that was sort of my thinking, but people seemed to like this. I'm happy to consider alternatives.
Re: split TOAST support out of postgres.h
On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote: > On 30.12.22 17:50, Tom Lane wrote: > >Peter Eisentraut writes: > >>On 28.12.22 16:07, Tom Lane wrote: > >>>I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included > >>>so widely, I doubt it is buying very much in terms of reducing header > >>>footprint. How bad is it to do #2? > > > >>See this incremental patch set. > > > >Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. > >I think that bears out my feeling that fmgr.h wasn't a great location: > >I count 117 #includes of that, many of which are in .h files themselves > >so that many more .c files would be required to read them. > > committed SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension breakage en masse. I would revert this.
Re: split TOAST support out of postgres.h
On 30.12.22 17:50, Tom Lane wrote: Peter Eisentraut writes: On 28.12.22 16:07, Tom Lane wrote: I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included so widely, I doubt it is buying very much in terms of reducing header footprint. How bad is it to do #2? See this incremental patch set. Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. I think that bears out my feeling that fmgr.h wasn't a great location: I count 117 #includes of that, many of which are in .h files themselves so that many more .c files would be required to read them. committed
Re: split TOAST support out of postgres.h
On 2022-12-30 Fr 11:50, Tom Lane wrote: > Peter Eisentraut writes: >> On 28.12.22 16:07, Tom Lane wrote: >>> I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included >>> so widely, I doubt it is buying very much in terms of reducing header >>> footprint. How bad is it to do #2? >> See this incremental patch set. > Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. > I think that bears out my feeling that fmgr.h wasn't a great location: > I count 117 #includes of that, many of which are in .h files themselves > so that many more .c files would be required to read them. > > (You did check that this passes cpluspluscheck/headerscheck, right?) > >> It seems like maybe there is some intermediate abstraction that a lot of >> these places should be using that we haven't thought of yet. > Hmm. Perhaps, but I think I'm content with this version of the patch. Looked good to me too. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: split TOAST support out of postgres.h
Peter Eisentraut writes: > On 28.12.22 16:07, Tom Lane wrote: >> I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included >> so widely, I doubt it is buying very much in terms of reducing header >> footprint. How bad is it to do #2? > See this incremental patch set. Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. I think that bears out my feeling that fmgr.h wasn't a great location: I count 117 #includes of that, many of which are in .h files themselves so that many more .c files would be required to read them. (You did check that this passes cpluspluscheck/headerscheck, right?) > It seems like maybe there is some intermediate abstraction that a lot of > these places should be using that we haven't thought of yet. Hmm. Perhaps, but I think I'm content with this version of the patch. regards, tom lane
Re: split TOAST support out of postgres.h
On 28.12.22 16:07, Tom Lane wrote: Peter Eisentraut writes: ... Then we could either 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That way we clean up the files a bit but don't change any external interfaces. 2) Just let everyone who needs it include the new file. 3) Compromise: You can avoid most "damage" by having fmgr.h include varatt.h. That satisfies most data types and extension code. That way, there are only a few places that need an explicit include of varatt.h. I went with the last option in my patch. I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included so widely, I doubt it is buying very much in terms of reducing header footprint. How bad is it to do #2? See this incremental patch set. It seems like maybe there is some intermediate abstraction that a lot of these places should be using that we haven't thought of yet.From 04a1b66c9950991fb8dab141888c85ddd2d1a67c Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 28 Dec 2022 13:30:05 +0100 Subject: [PATCH v2 1/2] New header varatt.h split off from postgres.h This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code. This new header file is included by fmgr.h, so most data type and extension code won't need any changes. --- contrib/cube/cubedata.h | 2 + contrib/pg_trgm/trgm_regexp.c | 1 + src/backend/access/table/toast_helper.c | 1 + src/backend/libpq/pqformat.c| 1 + src/include/access/htup_details.h | 1 + src/include/fmgr.h | 2 + src/include/meson.build | 1 + src/include/postgres.h | 358 +-- src/include/utils/expandeddatum.h | 2 + src/include/{postgres.h => varatt.h}| 579 +--- 10 files changed, 21 insertions(+), 927 deletions(-) copy src/include/{postgres.h => varatt.h} (53%) diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h index 96fa41a04e..d8df478059 100644 --- a/contrib/cube/cubedata.h +++ b/contrib/cube/cubedata.h @@ -1,5 +1,7 @@ /* contrib/cube/cubedata.h */ +#include "fmgr.h" + /* * This limit is pretty arbitrary, but don't make it so large that you * risk overflow in sizing calculations. diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c index 58d32ba946..08eecd269b 100644 --- a/contrib/pg_trgm/trgm_regexp.c +++ b/contrib/pg_trgm/trgm_regexp.c @@ -196,6 +196,7 @@ #include "tsearch/ts_locale.h" #include "utils/hsearch.h" #include "utils/memutils.h" +#include "varatt.h" /* * Uncomment (or use -DTRGM_REGEXP_DEBUG) to print debug info, diff --git a/src/backend/access/table/toast_helper.c b/src/backend/access/table/toast_helper.c index 74ba2189f0..a4889d7642 100644 --- a/src/backend/access/table/toast_helper.c +++ b/src/backend/access/table/toast_helper.c @@ -19,6 +19,7 @@ #include "access/toast_helper.h" #include "access/toast_internals.h" #include "catalog/pg_type_d.h" +#include "varatt.h" /* diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c index 9c24df3360..65a913e893 100644 --- a/src/backend/libpq/pqformat.c +++ b/src/backend/libpq/pqformat.c @@ -77,6 +77,7 @@ #include "libpq/pqformat.h" #include "mb/pg_wchar.h" #include "port/pg_bswap.h" +#include "varatt.h" /* diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h index c1af814e8d..c7b505e33f 100644 --- a/src/include/access/htup_details.h +++ b/src/include/access/htup_details.h @@ -19,6 +19,7 @@ #include "access/tupdesc.h" #include "access/tupmacs.h" #include "storage/bufpage.h" +#include "varatt.h" /* * MaxTupleAttributeNumber limits the number of (user) columns in a tuple. diff --git a/src/include/fmgr.h b/src/include/fmgr.h index 972afe3aff..0ce0090360 100644 --- a/src/include/fmgr.h +++ b/src/include/fmgr.h @@ -18,6 +18,8 @@ #ifndef FMGR_H #define FMGR_H +#include "varatt.h" + /* We don't want to include primnodes.h here, so make some stub references */ typedef struct Node *fmNodePtr; typedef struct Aggref *fmAggrefPtr; diff --git a/src/include/meson.build b/src/include/meson.build index b4820049c8..63b444bc99 100644 --- a/src/include/meson.build +++ b/src/include/meson.build @@ -113,6 +113,7 @@ install_headers( 'postgres.h', 'postgres_ext.h', 'postgres_fe.h', + 'varatt.h', 'windowapi.h', pg_config_ext, pg_config_os, diff --git a/src/include/postgres.h b/src/include/postgres.h index 54730dfb38..a915f2b97f 100644 --- a/src/include/postgres.h +++ b/src/include/postgres.h @@ -23,9 +23,8 @@ * * section description * --- - * 1) variable-length datatypes (TOAST support) - * 2) Datum type + support functions - * 3)
Re: split TOAST support out of postgres.h
Matthias van de Meent writes: > On Thu, 29 Dec 2022 at 18:16, Andres Freund wrote: >> We have a bunch of toast*.h files already. The new header should pretty much >> only contain the types, given how widely the header is going to be >> included. So maybe toast_type.h? > My 2 cents: I don't think that toast_anything.h is appropriate, > because even though the varatt infrastructure does enable > externally-stored oversized attributes (which is the essence of > TOAST), this is not the only (or primary) use of the type. +1 ... varatt.h sounded fine to me. I'd suggest varlena.h except we have one already. regards, tom lane
Re: split TOAST support out of postgres.h
On Thu, 29 Dec 2022 at 18:16, Andres Freund wrote: > > Hi, > > On 2022-12-28 09:07:12 -0500, Isaac Morland wrote: > > This is a bit of a bikeshed suggestion, but I'm wondering if you considered > > calling it toast.h? Only because the word is so distinctive within > > Postgres; everybody knows exactly to what it refers. > > We have a bunch of toast*.h files already. The new header should pretty much > only contain the types, given how widely the header is going to be > included. So maybe toast_type.h? My 2 cents: I don't think that toast_anything.h is appropriate, because even though the varatt infrastructure does enable externally-stored oversized attributes (which is the essence of TOAST), this is not the only (or primary) use of the type. Example: Indexes do not (can not?) support toasted values, but generally do support variable length attributes that would be pulled in with varatt.h. I don't see why we'd call the headers of variable-length attributes after one small - but not insignifcant - use case. Kind regards, Matthias van de Meent
Re: split TOAST support out of postgres.h
Hi, On 2022-12-28 09:07:12 -0500, Isaac Morland wrote: > This is a bit of a bikeshed suggestion, but I'm wondering if you considered > calling it toast.h? Only because the word is so distinctive within > Postgres; everybody knows exactly to what it refers. We have a bunch of toast*.h files already. The new header should pretty much only contain the types, given how widely the header is going to be included. So maybe toast_type.h? Greetings, Andres Freund
Re: split TOAST support out of postgres.h
Hi, I've thought about this while working on Pluggable TOAST and 64-bit TOAST value ID myself. Agree with #2 to seem the best of the above. There are not so many places where a new header should be included. On Wed, Dec 28, 2022 at 6:08 PM Tom Lane wrote: > Peter Eisentraut writes: > > ... Then we could either > > > 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That > > way we clean up the files a bit but don't change any external interfaces. > > > 2) Just let everyone who needs it include the new file. > > > 3) Compromise: You can avoid most "damage" by having fmgr.h include > > varatt.h. That satisfies most data types and extension code. That way, > > there are only a few places that need an explicit include of varatt.h. > > > I went with the last option in my patch. > > I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included > so widely, I doubt it is buying very much in terms of reducing header > footprint. How bad is it to do #2? > > regards, tom lane > > > -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/
Re: split TOAST support out of postgres.h
Peter Eisentraut writes: > ... Then we could either > 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That > way we clean up the files a bit but don't change any external interfaces. > 2) Just let everyone who needs it include the new file. > 3) Compromise: You can avoid most "damage" by having fmgr.h include > varatt.h. That satisfies most data types and extension code. That way, > there are only a few places that need an explicit include of varatt.h. > I went with the last option in my patch. I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included so widely, I doubt it is buying very much in terms of reducing header footprint. How bad is it to do #2? regards, tom lane
Re: split TOAST support out of postgres.h
On Wed, 28 Dec 2022 at 08:07, Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > Most backend code doesn't actually need the variable-length data types > support (TOAST support) in postgres.h. So I figured we could try to put > it into a separate header file. That makes postgres.h more manageable, > and it avoids including a bunch of complicated unused stuff everywhere. > I picked "varatt.h" as the name. Then we could either > […] > I went with the last option in my patch. > > Thoughts? This is a bit of a bikeshed suggestion, but I'm wondering if you considered calling it toast.h? Only because the word is so distinctive within Postgres; everybody knows exactly to what it refers. I definitely agree with the principle of organizing and splitting up the header files. Personally, I don't mind importing a bunch of headers if I'm using a bunch of subsystems so I would be OK with needing to import this new header if I need it.