Re: split TOAST support out of postgres.h

2023-01-12 Thread Peter Eisentraut

On 10.01.23 09:48, Peter Eisentraut wrote:

On 10.01.23 08:39, Noah Misch wrote:

On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:

On 30.12.22 17:50, Tom Lane wrote:

Peter Eisentraut  writes:

On 28.12.22 16:07, Tom Lane wrote:
I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is 
included

so widely, I doubt it is buying very much in terms of reducing header
footprint.  How bad is it to do #2?



See this incremental patch set.


Wow, 41 files requiring varatt.h is a lot fewer than I would have 
guessed.

I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.


committed


SET_VARSIZE alone appears in 74 pgxn distributions, so I predict 
extension

breakage en masse.  I would revert this.


Well, that was sort of my thinking, but people seemed to like this.  I'm 
happy to consider alternatives.


Given the subsequent discussion, I'll keep it as is for now but consider 
it a semi-open item.  It's easy to change.






Re: split TOAST support out of postgres.h

2023-01-12 Thread Robert Haas
On Wed, Jan 11, 2023 at 1:14 AM Noah Misch  wrote:
> If the patch had just made postgres.h include varatt.h, like it does elog.h,
> I'd consider that change a nonnegative.  Grouping things is nice, even if it
> makes compilation a bit slower.  That also covers your frontend use case.  How
> about that?

I'm not direly opposed to that, but I'm also unconvinced that having
the varatt.h stuff is important enough relative to other things to
justify giving it a privileged place forever.

> I agree fixing any one extension is easy enough.  Thinking back to the
> htup_details.h refactor, I found the aggregate pain unreasonable relative to
> alleged benefits, even though each individual extension wasn't too bad.
> SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).

What annoyed me about that refactoring is that, in most cases where I
had been including htup.h, I had to change it to include
htup_details.h. In the main source tree, too, we've got 31 places that
include access/htup.h and 241 that include access/htup_details.h. It's
hard to argue that it was worth splitting the file given those
numbers, and in fact I think that change was a mistake. But that
doesn't mean every such change is a mistake.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: split TOAST support out of postgres.h

2023-01-10 Thread Noah Misch
On Tue, Jan 10, 2023 at 12:00:46PM -0500, Robert Haas wrote:
> On Tue, Jan 10, 2023 at 9:46 AM Tom Lane  wrote:
> > Now, there is a fair question whether splitting this code out of
> > postgres.h is worth any trouble at all.  TBH my initial reaction
> > had been "no".  But once we found that only 40-ish backend files
> > need to read this new header, I became a "yes" vote because it
> > seems clear that there will be a total-compilation-time benefit.

A time claim with no benchmarks is a red flag.  I've chosen to run one:

export CCACHE_DISABLE=1
change=d952373a987bad331c0e499463159dd142ced1ef
for commit in $change $change^; do
echo === git checkout $commit
git checkout $commit
for n in `seq 1 200`; do make -j20 clean; env time make -j20; done
done

Results:

commit  median mean count
d952373a987bad331c0e499463159dd142ced1ef 49.35 49.37 200
d952373a987bad331c0e499463159dd142ced1ef^ 49.33 49.36 200

That is to say, the patch made the build a bit slower, not faster.  That's
with GCC 4.8.5 (RHEL 7).  I likely should have interleaved the run types, but
in any case the speed win didn't show up.

> I wasn't totally about this, either, but I think on balance it's
> probably a smart thing to do. I always found it kind of weird that we
> put that stuff in postgres.h. It seems to privilege the TOAST
> mechanism to an undue degree; what's the argument, for example, that
> TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS()
> or LWLockAcquire or HeapTuple? It felt to me like we'd just decided
> that one subsystem gets to go into the main header file and everybody
> else just had to have their own headers.
> 
> One thing that's particularly awkward about that is that if you want
> to write some front-end code that knows about how varlenas are stored
> on disk, it was very awkward with the old structure. You're not
> supposed to include "postgres.h" into frontend code, but if the stuff
> you need is defined there then what else can you do? So I generally
> think that anything that's in postgres.h should have a strong claim to
> be not only widely-needed in the backend, but also never needed at all
> in any other executable.

If the patch had just made postgres.h include varatt.h, like it does elog.h,
I'd consider that change a nonnegative.  Grouping things is nice, even if it
makes compilation a bit slower.  That also covers your frontend use case.  How
about that?

I agree fixing any one extension is easy enough.  Thinking back to the
htup_details.h refactor, I found the aggregate pain unreasonable relative to
alleged benefits, even though each individual extension wasn't too bad.
SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).




Re: split TOAST support out of postgres.h

2023-01-10 Thread Robert Haas
On Tue, Jan 10, 2023 at 9:46 AM Tom Lane  wrote:
> Now, there is a fair question whether splitting this code out of
> postgres.h is worth any trouble at all.  TBH my initial reaction
> had been "no".  But once we found that only 40-ish backend files
> need to read this new header, I became a "yes" vote because it
> seems clear that there will be a total-compilation-time benefit.

I wasn't totally about this, either, but I think on balance it's
probably a smart thing to do. I always found it kind of weird that we
put that stuff in postgres.h. It seems to privilege the TOAST
mechanism to an undue degree; what's the argument, for example, that
TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS()
or LWLockAcquire or HeapTuple? It felt to me like we'd just decided
that one subsystem gets to go into the main header file and everybody
else just had to have their own headers.

One thing that's particularly awkward about that is that if you want
to write some front-end code that knows about how varlenas are stored
on disk, it was very awkward with the old structure. You're not
supposed to include "postgres.h" into frontend code, but if the stuff
you need is defined there then what else can you do? So I generally
think that anything that's in postgres.h should have a strong claim to
be not only widely-needed in the backend, but also never needed at all
in any other executable.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: split TOAST support out of postgres.h

2023-01-10 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut
>  wrote:
>>> SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
>>> breakage en masse.  I would revert this.

>> Well, that was sort of my thinking, but people seemed to like this.  I'm
>> happy to consider alternatives.

> I don't think it would be very nice to do something like this in a
> minor release. But in a new major release, I think it's fine. I've
> been on the hook to maintain extensions in the face of these kinds of
> changes at various times over the years, and it's never taken me much
> time.

Yeah, that was my thinking.  We could never do any header refactoring
at all if the standard is "will some extension author need to add a #if".
In practice, we make bigger adjustments than this all the time,
both in header layout and in individual function APIs.

Now, there is a fair question whether splitting this code out of
postgres.h is worth any trouble at all.  TBH my initial reaction
had been "no".  But once we found that only 40-ish backend files
need to read this new header, I became a "yes" vote because it
seems clear that there will be a total-compilation-time benefit.

regards, tom lane




Re: split TOAST support out of postgres.h

2023-01-10 Thread Robert Haas
On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut
 wrote:
> >>> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> >>> I think that bears out my feeling that fmgr.h wasn't a great location:
> >>> I count 117 #includes of that, many of which are in .h files themselves
> >>> so that many more .c files would be required to read them.
> >>
> >> committed
> >
> > SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
> > breakage en masse.  I would revert this.
>
> Well, that was sort of my thinking, but people seemed to like this.  I'm
> happy to consider alternatives.

I don't think that the number of extensions that get broken is really
the right metric. It's not fantastic to break large numbers of
extensions, of course, but if the solution is merely to add an #if
PG_VERSION_NUM >= whatever #include "newstuff" #endif then I don't
think it's really an issue. If an extension doesn't have an author who
can do at least that much updating when a new PG release comes out,
then it's basically unmaintained, and I just don't feel that bad about
breaking unmaintained extensions now and then, even annually.

Of course, if we go and remove something that's very widely used and
for which there's no simple workaround, that sucks. Say, removing
LWLocks entirely. But we don't usually do that sort of thing unless
there's a good reason and significant benefits.

I don't think it would be very nice to do something like this in a
minor release. But in a new major release, I think it's fine. I've
been on the hook to maintain extensions in the face of these kinds of
changes at various times over the years, and it's never taken me much
time.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: split TOAST support out of postgres.h

2023-01-10 Thread Peter Eisentraut

On 10.01.23 08:39, Noah Misch wrote:

On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:

On 30.12.22 17:50, Tom Lane wrote:

Peter Eisentraut  writes:

On 28.12.22 16:07, Tom Lane wrote:

I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint.  How bad is it to do #2?



See this incremental patch set.


Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.


committed


SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse.  I would revert this.


Well, that was sort of my thinking, but people seemed to like this.  I'm 
happy to consider alternatives.






Re: split TOAST support out of postgres.h

2023-01-09 Thread Noah Misch
On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:
> On 30.12.22 17:50, Tom Lane wrote:
> >Peter Eisentraut  writes:
> >>On 28.12.22 16:07, Tom Lane wrote:
> >>>I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
> >>>so widely, I doubt it is buying very much in terms of reducing header
> >>>footprint.  How bad is it to do #2?
> >
> >>See this incremental patch set.
> >
> >Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> >I think that bears out my feeling that fmgr.h wasn't a great location:
> >I count 117 #includes of that, many of which are in .h files themselves
> >so that many more .c files would be required to read them.
> 
> committed

SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse.  I would revert this.




Re: split TOAST support out of postgres.h

2023-01-09 Thread Peter Eisentraut

On 30.12.22 17:50, Tom Lane wrote:

Peter Eisentraut  writes:

On 28.12.22 16:07, Tom Lane wrote:

I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint.  How bad is it to do #2?



See this incremental patch set.


Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.


committed





Re: split TOAST support out of postgres.h

2022-12-31 Thread Andrew Dunstan


On 2022-12-30 Fr 11:50, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 28.12.22 16:07, Tom Lane wrote:
>>> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
>>> so widely, I doubt it is buying very much in terms of reducing header
>>> footprint.  How bad is it to do #2?
>> See this incremental patch set.
> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> I think that bears out my feeling that fmgr.h wasn't a great location:
> I count 117 #includes of that, many of which are in .h files themselves
> so that many more .c files would be required to read them.
>
> (You did check that this passes cpluspluscheck/headerscheck, right?)
>
>> It seems like maybe there is some intermediate abstraction that a lot of 
>> these places should be using that we haven't thought of yet.
> Hmm.  Perhaps, but I think I'm content with this version of the patch.


Looked good to me too.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: split TOAST support out of postgres.h

2022-12-30 Thread Tom Lane
Peter Eisentraut  writes:
> On 28.12.22 16:07, Tom Lane wrote:
>> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
>> so widely, I doubt it is buying very much in terms of reducing header
>> footprint.  How bad is it to do #2?

> See this incremental patch set.

Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.

(You did check that this passes cpluspluscheck/headerscheck, right?)

> It seems like maybe there is some intermediate abstraction that a lot of 
> these places should be using that we haven't thought of yet.

Hmm.  Perhaps, but I think I'm content with this version of the patch.

regards, tom lane




Re: split TOAST support out of postgres.h

2022-12-30 Thread Peter Eisentraut

On 28.12.22 16:07, Tom Lane wrote:

Peter Eisentraut  writes:

... Then we could either



1) Include varatt.h in postgres.h, similar to elog.h and palloc.h.  That
way we clean up the files a bit but don't change any external interfaces.



2) Just let everyone who needs it include the new file.



3) Compromise: You can avoid most "damage" by having fmgr.h include
varatt.h.  That satisfies most data types and extension code.  That way,
there are only a few places that need an explicit include of varatt.h.



I went with the last option in my patch.


I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint.  How bad is it to do #2?


See this incremental patch set.

It seems like maybe there is some intermediate abstraction that a lot of 
these places should be using that we haven't thought of yet.From 04a1b66c9950991fb8dab141888c85ddd2d1a67c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 28 Dec 2022 13:30:05 +0100
Subject: [PATCH v2 1/2] New header varatt.h split off from postgres.h

This new header contains all the variable-length data types support
(TOAST support) from postgres.h, which isn't needed by large parts of
the backend code.

This new header file is included by fmgr.h, so most data type and
extension code won't need any changes.
---
 contrib/cube/cubedata.h |   2 +
 contrib/pg_trgm/trgm_regexp.c   |   1 +
 src/backend/access/table/toast_helper.c |   1 +
 src/backend/libpq/pqformat.c|   1 +
 src/include/access/htup_details.h   |   1 +
 src/include/fmgr.h  |   2 +
 src/include/meson.build |   1 +
 src/include/postgres.h  | 358 +--
 src/include/utils/expandeddatum.h   |   2 +
 src/include/{postgres.h => varatt.h}| 579 +---
 10 files changed, 21 insertions(+), 927 deletions(-)
 copy src/include/{postgres.h => varatt.h} (53%)

diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index 96fa41a04e..d8df478059 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -1,5 +1,7 @@
 /* contrib/cube/cubedata.h */
 
+#include "fmgr.h"
+
 /*
  * This limit is pretty arbitrary, but don't make it so large that you
  * risk overflow in sizing calculations.
diff --git a/contrib/pg_trgm/trgm_regexp.c b/contrib/pg_trgm/trgm_regexp.c
index 58d32ba946..08eecd269b 100644
--- a/contrib/pg_trgm/trgm_regexp.c
+++ b/contrib/pg_trgm/trgm_regexp.c
@@ -196,6 +196,7 @@
 #include "tsearch/ts_locale.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "varatt.h"
 
 /*
  * Uncomment (or use -DTRGM_REGEXP_DEBUG) to print debug info,
diff --git a/src/backend/access/table/toast_helper.c 
b/src/backend/access/table/toast_helper.c
index 74ba2189f0..a4889d7642 100644
--- a/src/backend/access/table/toast_helper.c
+++ b/src/backend/access/table/toast_helper.c
@@ -19,6 +19,7 @@
 #include "access/toast_helper.h"
 #include "access/toast_internals.h"
 #include "catalog/pg_type_d.h"
+#include "varatt.h"
 
 
 /*
diff --git a/src/backend/libpq/pqformat.c b/src/backend/libpq/pqformat.c
index 9c24df3360..65a913e893 100644
--- a/src/backend/libpq/pqformat.c
+++ b/src/backend/libpq/pqformat.c
@@ -77,6 +77,7 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "port/pg_bswap.h"
+#include "varatt.h"
 
 
 /* 
diff --git a/src/include/access/htup_details.h 
b/src/include/access/htup_details.h
index c1af814e8d..c7b505e33f 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -19,6 +19,7 @@
 #include "access/tupdesc.h"
 #include "access/tupmacs.h"
 #include "storage/bufpage.h"
+#include "varatt.h"
 
 /*
  * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index 972afe3aff..0ce0090360 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -18,6 +18,8 @@
 #ifndef FMGR_H
 #define FMGR_H
 
+#include "varatt.h"
+
 /* We don't want to include primnodes.h here, so make some stub references */
 typedef struct Node *fmNodePtr;
 typedef struct Aggref *fmAggrefPtr;
diff --git a/src/include/meson.build b/src/include/meson.build
index b4820049c8..63b444bc99 100644
--- a/src/include/meson.build
+++ b/src/include/meson.build
@@ -113,6 +113,7 @@ install_headers(
   'postgres.h',
   'postgres_ext.h',
   'postgres_fe.h',
+  'varatt.h',
   'windowapi.h',
   pg_config_ext,
   pg_config_os,
diff --git a/src/include/postgres.h b/src/include/postgres.h
index 54730dfb38..a915f2b97f 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -23,9 +23,8 @@
  *
  *   section   description
  *   ---   
- * 1)  variable-length datatypes (TOAST support)
- * 2)  Datum type + support functions
- * 3)   

Re: split TOAST support out of postgres.h

2022-12-29 Thread Tom Lane
Matthias van de Meent  writes:
> On Thu, 29 Dec 2022 at 18:16, Andres Freund  wrote:
>> We have a bunch of toast*.h files already. The new header should pretty much
>> only contain the types, given how widely the header is going to be
>> included. So maybe toast_type.h?

> My 2 cents: I don't think that toast_anything.h is appropriate,
> because even though the varatt infrastructure does enable
> externally-stored oversized attributes (which is the essence of
> TOAST), this is not the only (or primary) use of the type.

+1 ... varatt.h sounded fine to me.  I'd suggest varlena.h except
we have one already.

regards, tom lane




Re: split TOAST support out of postgres.h

2022-12-29 Thread Matthias van de Meent
On Thu, 29 Dec 2022 at 18:16, Andres Freund  wrote:
>
> Hi,
>
> On 2022-12-28 09:07:12 -0500, Isaac Morland wrote:
> > This is a bit of a bikeshed suggestion, but I'm wondering if you considered
> > calling it toast.h? Only because the word is so distinctive within
> > Postgres; everybody knows exactly to what it refers.
>
> We have a bunch of toast*.h files already. The new header should pretty much
> only contain the types, given how widely the header is going to be
> included. So maybe toast_type.h?

My 2 cents: I don't think that toast_anything.h is appropriate,
because even though the varatt infrastructure does enable
externally-stored oversized attributes (which is the essence of
TOAST), this is not the only (or primary) use of the type.

Example: Indexes do not (can not?) support toasted values, but
generally do support variable length attributes that would be pulled
in with varatt.h. I don't see why we'd call the headers of
variable-length attributes after one small - but not insignifcant -
use case.

Kind regards,

Matthias van de Meent




Re: split TOAST support out of postgres.h

2022-12-29 Thread Andres Freund
Hi,

On 2022-12-28 09:07:12 -0500, Isaac Morland wrote:
> This is a bit of a bikeshed suggestion, but I'm wondering if you considered
> calling it toast.h? Only because the word is so distinctive within
> Postgres; everybody knows exactly to what it refers.

We have a bunch of toast*.h files already. The new header should pretty much
only contain the types, given how widely the header is going to be
included. So maybe toast_type.h?

Greetings,

Andres Freund




Re: split TOAST support out of postgres.h

2022-12-28 Thread Nikita Malakhov
Hi,

I've thought about this while working on Pluggable TOAST and 64-bit
TOAST value ID myself. Agree with #2 to seem the best of the above.
There are not so many places where a new header should be included.

On Wed, Dec 28, 2022 at 6:08 PM Tom Lane  wrote:

> Peter Eisentraut  writes:
> > ... Then we could either
>
> > 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h.  That
> > way we clean up the files a bit but don't change any external interfaces.
>
> > 2) Just let everyone who needs it include the new file.
>
> > 3) Compromise: You can avoid most "damage" by having fmgr.h include
> > varatt.h.  That satisfies most data types and extension code.  That way,
> > there are only a few places that need an explicit include of varatt.h.
>
> > I went with the last option in my patch.
>
> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
> so widely, I doubt it is buying very much in terms of reducing header
> footprint.  How bad is it to do #2?
>
> regards, tom lane
>
>
>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: split TOAST support out of postgres.h

2022-12-28 Thread Tom Lane
Peter Eisentraut  writes:
> ... Then we could either

> 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h.  That 
> way we clean up the files a bit but don't change any external interfaces.

> 2) Just let everyone who needs it include the new file.

> 3) Compromise: You can avoid most "damage" by having fmgr.h include 
> varatt.h.  That satisfies most data types and extension code.  That way, 
> there are only a few places that need an explicit include of varatt.h.

> I went with the last option in my patch.

I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint.  How bad is it to do #2?

regards, tom lane




Re: split TOAST support out of postgres.h

2022-12-28 Thread Isaac Morland
On Wed, 28 Dec 2022 at 08:07, Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> Most backend code doesn't actually need the variable-length data types
> support (TOAST support) in postgres.h.  So I figured we could try to put
> it into a separate header file.  That makes postgres.h more manageable,
> and it avoids including a bunch of complicated unused stuff everywhere.
> I picked "varatt.h" as the name.  Then we could either
>
[…]

> I went with the last option in my patch.
>
> Thoughts?


This is a bit of a bikeshed suggestion, but I'm wondering if you considered
calling it toast.h? Only because the word is so distinctive within
Postgres; everybody knows exactly to what it refers.

I definitely agree with the principle of organizing and splitting up the
header files. Personally, I don't mind importing a bunch of headers if I'm
using a bunch of subsystems so I would be OK with needing to import this
new header if I need it.