Re: Trying to understand a failed upgrade in AWS RDS

2023-05-19 Thread Julien Rouhaud
On Sat, 20 May 2023, 05:56 Mike Lissner, 
wrote:

>
> I'm still trying to understand what went wrong though. Putting a finer
> point on my question: Does pg_upgrade mess up disabled subscriptions?
>

Yes, whether they're disabled or not. As far as I know it's impossible to
reliably pg_upgrade a node that has subscriptions and eventually resume
logical replication.

It's possible to make it work with some effort in some basic
configurations and/or if no changes happen on the publications, but it's
up to you to find out whether your specific scenario can work, as it's not
documented. It's also impossible to check whether some incompatible events
happened on any of the publisher nodes, so you have to make sure that you
have total control and knowledge of all the activity that happens on the
publisher nodes during the upgrade.

>


Profiling a function call

2023-05-19 Thread Tiffany Thang
Hi,
I have a function that has been executing for a long time and not returning
any results. Wait event=NULL so it seems like it is still executing and not
waiting on any specific resources. Is there a way to profile the function
call to get an idea of what it is currently executing within the function?
All I could find in pg_stat_activity is the function is running. I want to
know what query/transaction within the function is running. Is there a way
to obtain that information?

PostgreSQL 15.2 on Linux.

Thanks.

Tiff
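
One low-tech way to get some visibility (a sketch, not from the thread, assuming the function is written in PL/pgSQL and can be edited) is to emit timestamped progress markers and/or enable function-level statistics:

-- Hypothetical PL/pgSQL function instrumented with progress markers.
CREATE OR REPLACE FUNCTION long_running()
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE '% starting step 1', clock_timestamp();
    PERFORM pg_sleep(1);   -- stand-in for the real work
    RAISE NOTICE '% starting step 2', clock_timestamp();
    PERFORM pg_sleep(1);
END;
$$;

-- With track_functions = 'all', per-function call counts and timings accumulate here:
SELECT funcname, calls, total_time, self_time
FROM pg_stat_user_functions;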


Re: Trying to understand a failed upgrade in AWS RDS

2023-05-19 Thread Mike Lissner
Thanks for the suggestions. I think in the future I'll do something like
this rather than try to re-use existing subscriptions.

I'm still trying to understand what went wrong though. Putting a finer
point on my question: Does pg_upgrade mess up disabled subscriptions?

On Fri, May 19, 2023 at 1:55 PM Elterman, Michael 
wrote:

> Please, use the following runbook.
> 1. Disable the subscription to pg10.
> 2. Disable Application Users on Publisher.
> 3. Drop all replication slots on Publisher (The upgrade can not be
> executed if there are any replication slots)
> 4. Run RDS's upgrade (which runs pg_upgrade).
> 5. Recreate replication slots with the same names.
> 6. Enable Application Users on Publisher.
> 7. Re-Enable the subscriptions to the newly upgraded server.
> Good luck
>
> On Fri, May 19, 2023 at 11:49 AM Mike Lissner <
> mliss...@michaeljaylissner.com> wrote:
>
>> I also am realizing belatedly that my solution of dropping the subscriber
>> probably won't work anyway, since I'd lose the changes on the publisher for
>> the duration of the upgrade. Maybe I could drop the subscription while
>> keeping the slot on the publisher, and then create a new subscription after
>> the upgrade using that slot and copy_data=False? Getting wonky.
>>
>> On Fri, May 19, 2023 at 8:17 AM Mike Lissner <
>> mliss...@michaeljaylissner.com> wrote:
>>
>>> Hi all,
>>>
>>> In AWS RDS, we are using logical replication between a postgresql 14
>>> publisher and a postgresql 10 subscriber. The subscriber is rather old, so
>>> yesterday I tried to update it using AWS's built in upgrade tool (which
>>> uses pg_upgrade behind the scenes).
>>>
>>> I did a pretty thorough test run before beginning, but the live run went
>>> pretty poorly. My process was:
>>>
>>> 1. Disable the subscription to pg10.
>>> 2. Run RDS's upgrade (which runs pg_upgrade).
>>> 3. Re-Enable the subscription to the newly upgraded server.
>>>
>>> The idea was that the publisher could still be live and collect changes,
>>> and then on step 3, those changes would flush to the newly upgraded server.
>>>
>>> When I hit step three, things went awry. From what I can tell, it seems
>>> like pg_upgrade might have wiped out the LSN location of the subscriber,
>>> because I was getting many messages in the logs saying:
>>>
>>> 2023-05-19 01:01:09 UTC:100.20.224.120(56536):django@courtlistener:[29669]:STATEMENT: CREATE_REPLICATION_SLOT "pg_18278_sync_86449755_7234675743763347169" LOGICAL pgoutput USE_SNAPSHOT
>>> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: replication slot "pg_18278_sync_16561_7234675743763347169" does not exist
>>> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:STATEMENT: DROP_REPLICATION_SLOT pg_18278_sync_16561_7234675743763347169 WAIT
>>> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: all replication slots are in use
>>> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:HINT: Free one or increase max_replication_slots.
>>>
>>> I followed those instructions, and upped max_replication_slots to 200.
>>> That fixed that error, but then I had errors about COPY commands failing,
>>> and looking in the publisher I saw about 150 slots like:
>>>
>>> select * from pg_replication_slots ;
>>>  slot_name                                  |  plugin  | slot_type | datoid |   database    | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
>>> --------------------------------------------+----------+-----------+--------+---------------+-----------+--------+------------+------+--------------+--------------+---------------------+------------+---------------+-----------
>>>  pg_18278_sync_86449408_7234675743763347169 | pgoutput | logical   |  16428 | courtlistener | f         | t      |       6906 |      |    859962500 | EA5/954A9F18 |                     | reserved   |               | f
>>>  pg_18278_sync_20492279_7234675743763347169 | pgoutput | logical   |  16428 | courtlistener | f         | f      |            |      |    859962448 | EA5/9548EDF0 | EA5/9548EE28        | reserved   |               | f
>>>  pg_18278_sync_16940_7234675743763347169    | pgoutput | logical   |  16428 | courtlistener | f         | f      |            |      |    859962448 | EA5/9548EE60 | EA5/9548EE98        | reserved   |               | f
>>>
>>>
>>> So this looks like it's trying to sync all of the existing tables all
>>> over again when I re-enabled the subscription.
>>>
>>> Does that make sense? In the future, I'll DROP the subscription and then
>>> create a new one with copy_data=False, but this was a real gotcha.
>>>
>>> Anybody know what's going on here?
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>


Re: Trying to understand a failed upgrade in AWS RDS

2023-05-19 Thread Elterman, Michael
Please, use the following runbook.
1. Disable the subscription to pg10.
2. Disable Application Users on Publisher.
3. Drop all replication slots on Publisher (the upgrade cannot be executed
if there are any replication slots)
4. Run RDS's upgrade (which runs pg_upgrade).
5. Recreate replication slots with the same names.
6. Enable Application Users on Publisher.
7. Re-Enable the subscriptions to the newly upgraded server.
Good luck
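
A rough SQL sketch of those steps (subscription, slot and publication names here are hypothetical; which node each command runs on follows the list above):

-- Step 1, on the subscriber:
ALTER SUBSCRIPTION my_sub DISABLE;

-- Step 3, on the publisher:
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- Step 5, on the publisher, after the upgrade:
SELECT pg_create_logical_replication_slot('my_sub', 'pgoutput');

-- Step 7, on the subscriber:
ALTER SUBSCRIPTION my_sub ENABLE;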

On Fri, May 19, 2023 at 11:49 AM Mike Lissner <
mliss...@michaeljaylissner.com> wrote:

> I also am realizing belatedly that my solution of dropping the subscriber
> probably won't work anyway, since I'd lose the changes on the publisher for
> the duration of the upgrade. Maybe I could drop the subscription while
> keeping the slot on the publisher, and then create a new subscription after
> the upgrade using that slot and copy_data=False? Getting wonky.
>
> On Fri, May 19, 2023 at 8:17 AM Mike Lissner <
> mliss...@michaeljaylissner.com> wrote:
>
>> Hi all,
>>
>> In AWS RDS, we are using logical replication between a postgresql 14
>> publisher and a postgresql 10 subscriber. The subscriber is rather old, so
>> yesterday I tried to update it using AWS's built in upgrade tool (which
>> uses pg_upgrade behind the scenes).
>>
>> I did a pretty thorough test run before beginning, but the live run went
>> pretty poorly. My process was:
>>
>> 1. Disable the subscription to pg10.
>> 2. Run RDS's upgrade (which runs pg_upgrade).
>> 3. Re-Enable the subscription to the newly upgraded server.
>>
>> The idea was that the publisher could still be live and collect changes,
>> and then on step 3, those changes would flush to the newly upgraded server.
>>
>> When I hit step three, things went awry. From what I can tell, it seems
>> like pg_upgrade might have wiped out the LSN location of the subscriber,
>> because I was getting many messages in the logs saying:
>>
>> 2023-05-19 01:01:09 UTC:100.20.224.120(56536):django@courtlistener:[29669]:STATEMENT: CREATE_REPLICATION_SLOT "pg_18278_sync_86449755_7234675743763347169" LOGICAL pgoutput USE_SNAPSHOT
>> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: replication slot "pg_18278_sync_16561_7234675743763347169" does not exist
>> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:STATEMENT: DROP_REPLICATION_SLOT pg_18278_sync_16561_7234675743763347169 WAIT
>> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: all replication slots are in use
>> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:HINT: Free one or increase max_replication_slots.
>>
>> I followed those instructions, and upped max_replication_slots to 200.
>> That fixed that error, but then I had errors about COPY commands failing,
>> and looking in the publisher I saw about 150 slots like:
>>
>> select * from pg_replication_slots ;
>>  slot_name                                  |  plugin  | slot_type | datoid |   database    | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
>> --------------------------------------------+----------+-----------+--------+---------------+-----------+--------+------------+------+--------------+--------------+---------------------+------------+---------------+-----------
>>  pg_18278_sync_86449408_7234675743763347169 | pgoutput | logical   |  16428 | courtlistener | f         | t      |       6906 |      |    859962500 | EA5/954A9F18 |                     | reserved   |               | f
>>  pg_18278_sync_20492279_7234675743763347169 | pgoutput | logical   |  16428 | courtlistener | f         | f      |            |      |    859962448 | EA5/9548EDF0 | EA5/9548EE28        | reserved   |               | f
>>  pg_18278_sync_16940_7234675743763347169    | pgoutput | logical   |  16428 | courtlistener | f         | f      |            |      |    859962448 | EA5/9548EE60 | EA5/9548EE98        | reserved   |               | f
>>
>>
>> So this looks like it's trying to sync all of the existing tables all
>> over again when I re-enabled the subscription.
>>
>> Does that make sense? In the future, I'll DROP the subscription and then
>> create a new one with copy_data=False, but this was a real gotcha.
>>
>> Anybody know what's going on here?
>>
>> Thanks,
>>
>> Mike
>>
>


Re: Unexpected cross-database vacuum impact with hot_standby_feedback=on

2023-05-19 Thread Laurenz Albe
On Thu, 2023-05-18 at 17:34 +0100, Owen Stephens wrote:
> We are seeing that vacuum is prevented from cleaning dead tuples by an open
> transaction in a different database (where both connections are made against 
> the
> primary server) when hot_standby_feedback = on but not when it is off. Is this
> cross-database interaction an expected effect of enabling 
> hot_standby_feedback,
> even if the connections interact only with the primary not the replica?

Yes, that's what I would expect.  There is only one "backend_xmin" in
"pg_stat_replication", which corresponds to the snapshot held by the oldest
query in any database on the standby server.
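
For reference, that value is visible on the primary with something like:

-- One row per walsender; backend_xmin is the xmin reported back by the standby
-- when hot_standby_feedback is on.
SELECT application_name, backend_xmin, state
FROM pg_stat_replication;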

Yours,
Laurenz Albe




Re: Modeling combinations (options and dependencies)

2023-05-19 Thread Peter J. Holzer
On 2023-05-18 19:21:23 +, eacil wrote:
> -
> DATA
> -
> 
> -You have objects that are linked to many tables describing their
> properties, such as an object__tag table.
> -These objects are sold in releases.
> -Thing is, objects can be made of objects. I call them superobjects
> when they assume that container role (as objects, they can be part of
> other superobjects). Multiple releases can be linked to one
> superobject. Yes, it's a parts scenario.
> -Objects can be shuffled in all sorts of unpredictable ways so it's
> useless to think that if one object is contained inside some
> superobject A, if that superobject A will be contained inside another
> superobject B, superobject B will inherit all objects from superobject
> A. In fact, my attempt at a solution will instantiate objects to link
> them to superobjects.
> -In a superobject, multiple objects can be grouped together into a set
> of options. One option can be made of multiple objects (majority of
> the time, just one). You can choose how many options minimum and
> maximum (aka a range) you can choose from a set (majority of the time,
> just one). It's possible to have a NONE choice aka you don't choose
> any option.
> -Any object can have 0 (common), 1 (common), or more (very rare)
> dependencies.
> 
> That's pretty much all.

Not sure if I followed that completely, but it certainly looks like a
quite complicated system of constraints.

> CREATE TABLE pairing (
> superobject_id integer NOT NULL,
> instance_a integer NOT NULL,
> instance_b integer NOT NULL,
> CONSTRAINT pairing__pk PRIMARY KEY (superobject_id,instance_a,instance_b)
> );

So if I understand this correctly, you are computing all the
combinations somewhere outside the database and then filling in the
pairings table with all valid pairs? Doesn't that have the same problem
of possibly generating an exponential number of combinations?

I'm also not sure if this is sufficient. If (A,B),(A,C),(A,D),(B,C),
(B,D),(C,D) are all valid combinations within superobject S, does that really
mean that (A,B,C,D) is a valid combination? I suspect you still have to
validate the results.

If you have to validate the results anyway, maybe you can radically
simplify the filter: Just add one row for each object which can possibly
appear in each super-object (or even: 1 row for each super-object with
an array of objects). Then find all super-objects which can contain all
the objects you are looking for and finally filter those in your
application. (Although I wonder how fast that validation is: That also
looks like it could potentially have exponential runtime)
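
A sketch of that simplified filter (table and column names made up here), using an array column plus the containment operator so the coarse search stays a single indexable query:

-- One row per super-object, listing every object it can possibly contain.
CREATE TABLE superobject_contents (
    superobject_id integer PRIMARY KEY,
    object_ids     integer[] NOT NULL
);

-- GIN index so @> (contains) searches don't have to scan the whole table.
CREATE INDEX ON superobject_contents USING gin (object_ids);

-- Candidate super-objects that can contain both object 42 and object 99;
-- the application still validates the option-set / dependency rules afterwards.
SELECT superobject_id
FROM superobject_contents
WHERE object_ids @> ARRAY[42, 99];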

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

2023-05-19 Thread Peter J. Holzer
On 2023-05-19 16:55:00 +0200, Victor Nordam Suadicani wrote:
> Thanks for the perspective :)
> 
> > If you need SQL, you need to design for SQL for the get go. Not shoehorn 
> > your
> >  Rust data model into SQL.
> 
> Sometimes the data in the domain really does fit a sum type and then a sum 
> type
> is the right tool to use (whether you use Rust or Haskell or whatever
> language). Trying to shoehorn your domain data model into a data format that
> doesn't fit isn't the way to go either. I feel like it's a deficiency in SQL
> that there is no support for sum types. I would guess this is influenced by 
> the
> fact that SQL was developed in a time when there were no major programming
> languages with sum type support either.

That may have been a reason. It is worth noting, however, that (at least
some versions of) entity-relationship diagrams do have graphical
symbols for "exactly/at most one of these attributes must be set". This
is very similar to the example you gave, where each of the three
variants had a name (and very different to e.g. TypeScript, where you
can just define a variable to be of type «string | number | Date»).

The way this is implemented is quite straightforward, as Dominique
wrote: You have several nullable columns and a constraint which ensures
that only one is not null.

For example

create table alternatetest (
    id serial primary key,
    name text,
    i int, f float, t text, -- these three form a union/sum type
    check (
        (i is not null)::int + (f is not null)::int + (t is not null)::int = 1
    )
);

The space overhead is very small (in PostgreSQL it's a single bit per
nullable column).

An ORM for Rust should be able to read the three columns and stuff them
into a single variable of sum type. I don't know Rust well enough to say
whether the other direction is possible automatically (does the compiler
and/or the run-time system keep track of which variant is currently
valid?), but even if it isn't, surely the application must have that
information and be able to pass it into the ORM.

If you aren't using an ORM but using SQL directly (as I prefer to do as
I find that ORMs are just another leaky abstraction layer which makes
stuff harder instead of easier) then you have to code that yourself, but
you can probably centralize that somewhere and the rest of your code
will be blissfully unaware.

(Of course you can stuff those values in a single column of JSONB type.
But I don't think this is better.)

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Trying to understand a failed upgrade in AWS RDS

2023-05-19 Thread Mike Lissner
I also am realizing belatedly that my solution of dropping the subscriber
probably won't work anyway, since I'd lose the changes on the publisher for
the duration of the upgrade. Maybe I could drop the subscription while
keeping the slot on the publisher, and then create a new subscription after
the upgrade using that slot and copy_data=False? Getting wonky.

On Fri, May 19, 2023 at 8:17 AM Mike Lissner 
wrote:

> Hi all,
>
> In AWS RDS, we are using logical replication between a postgresql 14
> publisher and a postgresql 10 subscriber. The subscriber is rather old, so
> yesterday I tried to update it using AWS's built in upgrade tool (which
> uses pg_upgrade behind the scenes).
>
> I did a pretty thorough test run before beginning, but the live run went
> pretty poorly. My process was:
>
> 1. Disable the subscription to pg10.
> 2. Run RDS's upgrade (which runs pg_upgrade).
> 3. Re-Enable the subscription to the newly upgraded server.
>
> The idea was that the publisher could still be live and collect changes,
> and then on step 3, those changes would flush to the newly upgraded server.
>
> When I hit step three, things went awry. From what I can tell, it seems
> like pg_upgrade might have wiped out the LSN location of the subscriber,
> because I was getting many messages in the logs saying:
>
> 2023-05-19 01:01:09 UTC:100.20.224.120(56536):django@courtlistener:[29669]:STATEMENT: CREATE_REPLICATION_SLOT "pg_18278_sync_86449755_7234675743763347169" LOGICAL pgoutput USE_SNAPSHOT
> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: replication slot "pg_18278_sync_16561_7234675743763347169" does not exist
> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:STATEMENT: DROP_REPLICATION_SLOT pg_18278_sync_16561_7234675743763347169 WAIT
> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: all replication slots are in use
> 2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:HINT: Free one or increase max_replication_slots.
>
> I followed those instructions, and upped max_replication_slots to 200.
> That fixed that error, but then I had errors about COPY commands failing,
> and looking in the publisher I saw about 150 slots like:
>
> select * from pg_replication_slots ;
>  slot_name                                  |  plugin  | slot_type | datoid |   database    | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
> --------------------------------------------+----------+-----------+--------+---------------+-----------+--------+------------+------+--------------+--------------+---------------------+------------+---------------+-----------
>  pg_18278_sync_86449408_7234675743763347169 | pgoutput | logical   |  16428 | courtlistener | f         | t      |       6906 |      |    859962500 | EA5/954A9F18 |                     | reserved   |               | f
>  pg_18278_sync_20492279_7234675743763347169 | pgoutput | logical   |  16428 | courtlistener | f         | f      |            |      |    859962448 | EA5/9548EDF0 | EA5/9548EE28        | reserved   |               | f
>  pg_18278_sync_16940_7234675743763347169    | pgoutput | logical   |  16428 | courtlistener | f         | f      |            |      |    859962448 | EA5/9548EE60 | EA5/9548EE98        | reserved   |               | f
>
>
> So this looks like it's trying to sync all of the existing tables all over
> again when I re-enabled the subscription.
>
> Does that make sense? In the future, I'll DROP the subscription and then
> create a new one with copy_data=False, but this was a real gotcha.
>
> Anybody know what's going on here?
>
> Thanks,
>
> Mike
>


Trying to understand a failed upgrade in AWS RDS

2023-05-19 Thread Mike Lissner
Hi all,

In AWS RDS, we are using logical replication between a postgresql 14
publisher and a postgresql 10 subscriber. The subscriber is rather old, so
yesterday I tried to update it using AWS's built in upgrade tool (which
uses pg_upgrade behind the scenes).

I did a pretty thorough test run before beginning, but the live run went
pretty poorly. My process was:

1. Disable the subscription to pg10.
2. Run RDS's upgrade (which runs pg_upgrade).
3. Re-Enable the subscription to the newly upgraded server.

The idea was that the publisher could still be live and collect changes,
and then on step 3, those changes would flush to the newly upgraded server.

When I hit step three, things went awry. From what I can tell, it seems
like pg_upgrade might have wiped out the LSN location of the subscriber,
because I was getting many messages in the logs saying:

2023-05-19 01:01:09 UTC:100.20.224.120(56536):django@courtlistener:[29669]:STATEMENT: CREATE_REPLICATION_SLOT "pg_18278_sync_86449755_7234675743763347169" LOGICAL pgoutput USE_SNAPSHOT
2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: replication slot "pg_18278_sync_16561_7234675743763347169" does not exist
2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:STATEMENT: DROP_REPLICATION_SLOT pg_18278_sync_16561_7234675743763347169 WAIT
2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:ERROR: all replication slots are in use
2023-05-19 01:01:09 UTC:100.20.224.120(56550):django@courtlistener:[29670]:HINT: Free one or increase max_replication_slots.

I followed those instructions, and upped max_replication_slots to 200. That
fixed that error, but then I had errors about COPY commands failing, and
looking in the publisher I saw about 150 slots like:

select * from pg_replication_slots ;
 slot_name                                  |  plugin  | slot_type | datoid |   database    | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------------------------------------+----------+-----------+--------+---------------+-----------+--------+------------+------+--------------+--------------+---------------------+------------+---------------+-----------
 pg_18278_sync_86449408_7234675743763347169 | pgoutput | logical   |  16428 | courtlistener | f         | t      |       6906 |      |    859962500 | EA5/954A9F18 |                     | reserved   |               | f
 pg_18278_sync_20492279_7234675743763347169 | pgoutput | logical   |  16428 | courtlistener | f         | f      |            |      |    859962448 | EA5/9548EDF0 | EA5/9548EE28        | reserved   |               | f
 pg_18278_sync_16940_7234675743763347169    | pgoutput | logical   |  16428 | courtlistener | f         | f      |            |      |    859962448 | EA5/9548EE60 | EA5/9548EE98        | reserved   |               | f


So this looks like it's trying to sync all of the existing tables all over
again when I re-enabled the subscription.

Does that make sense? In the future, I'll DROP the subscription and then
create a new one with copy_data=False, but this was a real gotcha.
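
For what it's worth, that recreation could look roughly like this (subscription, publication and connection strings below are placeholders), keeping the existing slot on the publisher so changes made during the upgrade aren't lost:

-- On the subscriber, detach the slot before dropping so it survives on the publisher:
ALTER SUBSCRIPTION my_sub DISABLE;
ALTER SUBSCRIPTION my_sub SET (slot_name = NONE);
DROP SUBSCRIPTION my_sub;

-- After the upgrade, reattach to the existing slot without re-copying the tables:
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher dbname=courtlistener user=replication_user'
    PUBLICATION my_pub
    WITH (copy_data = false, create_slot = false, slot_name = 'my_sub');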

Anybody know what's going on here?

Thanks,

Mike


Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

2023-05-19 Thread Victor Nordam Suadicani
On Fri, 19 May 2023 at 12:44, Dominique Devienne 
wrote:

> On Thu, May 18, 2023 at 2:28 PM Victor Nordam Suadicani <
> v.n.suadic...@gmail.com> wrote:
>
>> Is there any nice way to handle sum types (aka tagged unions) in a
>> PostgreSQL database? [...]
>>
> A third method would be to save all fields of all variants into a single
>> table, with all fields being nullable.
>>
> So you'd have a nullable text field, nullable integer and nullable double
>> precision field.
>>
>
> Yes, we do that. That's IMHO the only sane way to do it.
> And if some of those alternatives are FKs (relations), that's the only
> choice.
>
> You'd then need an additional tag field to indicate which variant of the
>> union is used
>>
>
> No, you don't need it. That's implicit from the NULL'ability of the
> alternative columns.
> If you want, you can have it as a generated column, thus read-only.
> Worse, having it as an explicit column would make it denormalized, and
> possibly out of sync.
>
>
>> and you'd have to write check constraints for each variant to ensure that
>> all the fields in that variant are not null and all the fields not in that
>> variant *are* null.
>>
>
> Yes indeed.
>
>
>> This *almost* works, but has two major problems:
>>
>> 1. It wastes space. In Rust, an enum is only as big as its largest
>> variant. Using this method, a table row would be as big as the sum of all
>> the variants.
>>
>
> Not really, or not to a point it matters that much.
> I don't know about the actual physical bytes on disk for PostgreSQL, but
> as an example in SQLite,
> all columns have *at least* 1 "header" byte per value, and NULL values
> (and 0 and 1) have no "payload" bytes.
> In PostgreSQL (which is more "rigidly typed" as DRH would say :)) you may
> waste space for primitive types,
> but not for text and bytea, which is where it really matters IMHO.
>
>
>> 2. Querying the data is very cumbersome, [...].
>>
>
> Sure, it's cumbersome. But I don't get your point here. NULL handling is
> part of SQL.
> And sum-types (variants) implemented via exclusive NULL'ness is just a
> special case.
> You "dispatch" to the proper column on writes. You read all alternatives
> and assign the one (if any) NOT NULL to the variant.
>
>
>> Both of these problems get bigger and bigger as you add more variants -
>> it doesn't scale well.
>>
>
> ORMs cannot magically resolve the impedence mismatch between SQL and
> OO-based or sum-type based type systems a la Rust (and co).
> If you need SQL, you need to design for SQL for the get go. Not shoehorn
> your Rust data model into SQL.
>
> My $0.02.
>

Thanks for the perspective :)

> If you need SQL, you need to design for SQL for the get go. Not shoehorn
your Rust data model into SQL.

Sometimes the data in the domain really does fit a sum type and then a sum
type is the right tool to use (whether you use Rust or Haskell or whatever
language). Trying to shoehorn your domain data model into a data format
that doesn't fit isn't the way to go either. I feel like it's a deficiency
in SQL that there is no support for sum types. I would guess this is
influenced by the fact that SQL was developed in a time when there were no
major programming languages with sum type support either.

But really it's not that I "need" SQL per se, it's just that SQL databases
are the most developed and used at this time. Do you know of any other
production-grade databases that actually support sum types in a better way
than SQL? I'd be very curious cause I haven't really found any.


Re: a simple-minded question about updating

2023-05-19 Thread Adrian Klaver

On 5/19/23 07:25, Martin Mueller wrote:
I currently work with Postgres 13. I forgot to mention that. From which 
I gather that around version 18 it would be time to upgrade. But in the 
interim I’d be OK.


Right?


Strictly speaking yes, though the longer you wait the greater the number 
of changes you will have to deal with. It is a good idea to read the 
Release Notes:


https://www.postgresql.org/docs/current/release.html

for each new release to see how far away the new code is getting from 
your existing version. Breaking changes are restricted (with some 
exceptions) to the X.0 release, where X is the new major version. So you can 
save some time by just looking at the notes for the .0 releases.



--
Adrian Klaver
adrian.kla...@aklaver.com








Re: a simple-minded question about updating

2023-05-19 Thread Martin Mueller
I currently work with Postgres 13. I forgot to mention that. From which I 
gather that around version 18 it would be time to upgrade. But in the interim 
I’d be OK.
Right?

From: Adrian Klaver 
Date: Thursday, May 18, 2023 at 11:21 PM
To: Martin Mueller , 
pgsql-general@lists.postgresql.org 
Subject: Re: a simple-minded question about updating
On 5/18/23 21:08, Martin Mueller wrote:
> I work with Postgres and wonder whether for my purposes there is a
> good-enough reason to update one of these days.

Since you have not mentioned the Postgres version you are on now, there
is really no definitive way to answer this.

Though as a rule, keeping up with minor updates to whatever major version
you are running is a good idea. It should be noted that at some point, ~5 years
from a version's initial release, the minor updates will stop. That is
when community support ends. Then the answer to any problem you have
will start with: you should upgrade.

>
> Martin Mueller
>
> Professor emeritus of English and Classiccs
>
> Northwestern University
>

--
Adrian Klaver
adrian.kla...@aklaver.com


Re: Records, Types, and Arrays

2023-05-19 Thread David G. Johnston
The convention here is to inline or, at worst, bottom post, trimming
irrelevant context.

On Friday, May 19, 2023, Raymond Brinzer  wrote:

>
> I'm guessing that row() isn't really a function, then?  And even so,
> assuming this is the important difference, how is the ability to change row
> structure on the fly making the cast possible?  In what way would the query
> calling get_row() be critical?
>

Row() is pure syntax. It distinguishes (col) vs. row(col) where the first
is just a column in parentheses and the second is a composite with one
column member.
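
A quick illustration of the difference (the first expression is just a parenthesized integer, the second a one-column composite):

SELECT pg_typeof((2)) AS parenthesized,  -- integer
       pg_typeof(row(2)) AS composite;   -- record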

 David J.


How to connect with PostgreSQL Database with SSL using Certificates and Key from client Eclipse in Java

2023-05-19 Thread sujay kadam
Hi Team.



I am trying to connect to a PostgreSQL database with SSL enabled on server
10.30.32.186, port 6432, from a client using the Java code below.

I am using certificates ([server-cert.pem, server-key.pem, ca.cert] and
[postgresql.crt, postgresql.pk8, root.crt]).

Please suggest whether there is a specific certificate and key file format
that Java understands.


package com.ssl;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DBConnect {

private final String url = "jdbc:postgresql://
10.30.32.186:6432/postgres?sslmode=require=/root/.postgresql/postgresql.crt=/root/.postgresql/postgresql.pk8=/root/.postgresql/root.crt=postgress
";

private final String user = "postgres";
private final String password = "postgres123";

/**
 * Connect to the PostgreSQL database
 *
 * @return a Connection object
 */
public Connection connect() {
Connection conn = null;
try {
conn = DriverManager.getConnection(url, user, password);
System.out.println("Connected to the PostgreSQL server
successfully.");
} catch (SQLException e) {
System.out.println(e.getMessage());
}

return conn;
}

public static void main(String[] args) {

DBConnect db = new DBConnect();
db.connect();

}

}

Gives Error -

SSL error: -1



Code NO 2 -

package SSL_Enablement;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class PostgresSSLConnection {
public static void main(String[] args) {
Connection conn = null;
try {
// Set SSL properties
Properties props = new Properties();
props.setProperty("user", "postgres");
props.setProperty("password", "postgres123");
props.setProperty("ssl", "true");
props.setProperty("https.protocols", "TLSv1.2");
props.setProperty("sslmode", "Verify-CA");
props.setProperty("sslcert",
"/root/.postgresql/server-cert.pem");
props.setProperty("sslkey", "/root/.postgresql/server-key.pem");
props.setProperty("sslrootcert", "/root/.postgresql/ca.cert");

// Initialize SSL context
Class.forName("org.postgresql.Driver");
String url = "jdbc:postgresql://10.30.32.186:6432/postgres";
conn = DriverManager.getConnection(url, props);
System.out.println("Connected DB using SSL");
// Use the connection...
// ...

} catch (SQLException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} finally {
try {
if (conn != null) {
conn.close();
}
} catch (SQLException e) {
e.printStackTrace();
}
}
}
}

Gives Error -

org.postgresql.util.PSQLException: Could not read SSL key file
/root/.postgresql/server-key.pem.
 at org.postgresql.ssl.LazyKeyManager.getPrivateKey(LazyKeyManager.java:284)
 at
sun.security.ssl.AbstractKeyManagerWrapper.getPrivateKey(SSLContextImpl.java:1552)
 at
sun.security.ssl.X509Authentication$X509PossessionGenerator.createClientPossession(X509Authentication.java:220)
 at
sun.security.ssl.X509Authentication$X509PossessionGenerator.createPossession(X509Authentication.java:175)
 at
sun.security.ssl.X509Authentication.createPossession(X509Authentication.java:88)
 at
sun.security.ssl.CertificateMessage$T13CertificateProducer.choosePossession(CertificateMessage.java:1080)
 at
sun.security.ssl.CertificateMessage$T13CertificateProducer.onProduceCertificate(CertificateMessage.java:1101)
 at
sun.security.ssl.CertificateMessage$T13CertificateProducer.produce(CertificateMessage.java:958)
 at sun.security.ssl.SSLHandshake.produce(SSLHandshake.java:421)
 at
sun.security.ssl.Finished$T13FinishedConsumer.onConsumeFinished(Finished.java:989)
 at sun.security.ssl.Finished$T13FinishedConsumer.consume(Finished.java:852)
 at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377)
 at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444)
 at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:422)
 at sun.security.ssl.TransportContext.dispatch(TransportContext.java:182)
 at sun.security.ssl.SSLTransport.decode(SSLTransport.java:152)
 at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1397)
 at
sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1305)
 at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:440)
 at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:41)
 at
org.postgresql.core.v3.ConnectionFactoryImpl.enableSSL(ConnectionFactoryImpl.java:584)
 at
org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:168)
 at

Re: Is there a good way to handle sum types (or tagged unions) in PostgreSQL?

2023-05-19 Thread Dominique Devienne
On Thu, May 18, 2023 at 2:28 PM Victor Nordam Suadicani <
v.n.suadic...@gmail.com> wrote:

> Is there any nice way to handle sum types (aka tagged unions) in a
> PostgreSQL database? [...]
>
A third method would be to save all fields of all variants into a single
> table, with all fields being nullable.
>
So you'd have a nullable text field, nullable integer and nullable double
> precision field.
>

Yes, we do that. That's IMHO the only sane way to do it.
And if some of those alternatives are FKs (relations), that's the only
choice.

You'd then need an additional tag field to indicate which variant of the
> union is used
>

No, you don't need it. That's implicit from the NULL'ability of the
alternative columns.
If you want, you can have it as a generated column, thus read-only.
Worse, having it as an explicit column would make it denormalized, and
possibly out of sync.
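
A sketch of that generated tag (column names here are illustrative, in the spirit of the i/f/t example elsewhere in the thread):

CREATE TABLE sumtype_example (
    i int,
    f float8,
    t text,
    -- read-only tag derived from whichever alternative is set; it cannot drift out of sync
    tag text GENERATED ALWAYS AS (
        CASE
            WHEN i IS NOT NULL THEN 'int'
            WHEN f IS NOT NULL THEN 'float'
            WHEN t IS NOT NULL THEN 'text'
        END
    ) STORED,
    CHECK ((i IS NOT NULL)::int + (f IS NOT NULL)::int + (t IS NOT NULL)::int = 1)
);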


> and you'd have to write check constraints for each variant to ensure that
> all the fields in that variant are not null and all the fields not in that
> variant *are* null.
>

Yes indeed.


> This *almost* works, but has two major problems:
>
> 1. It wastes space. In Rust, an enum is only as big as its largest
> variant. Using this method, a table row would be as big as the sum of all
> the variants.
>

Not really, or not to a point it matters that much.
I don't know about the actual physical bytes on disk for PostgreSQL, but as
an example in SQLite,
all columns have *at least* 1 "header" byte per value, and NULL values (and
0 and 1) have no "payload" bytes.
In PostgreSQL (which is more "rigidly typed" as DRH would say :)) you may
waste space for primitive types,
but not for text and bytea, which is where it really matters IMHO.


> 2. Querying the data is very cumbersome, [...].
>

Sure, it's cumbersome. But I don't get your point here. NULL handling is
part of SQL.
And sum-types (variants) implemented via exclusive NULL'ness is just a
special case.
You "dispatch" to the proper column on writes. You read all alternatives
and assign the one (if any) NOT NULL to the variant.


> Both of these problems get bigger and bigger as you add more variants - it
> doesn't scale well.
>

ORMs cannot magically resolve the impedence mismatch between SQL and
OO-based or sum-type based type systems a la Rust (and co).
If you need SQL, you need to design for SQL for the get go. Not shoehorn
your Rust data model into SQL.

My $0.02.


Modeling combinations (options and dependencies)

2023-05-19 Thread eacil
Hello. I have been struggling for the longest time with data I can't fit into a 
satisfying model. I have thought about giving up on this project many times...
Context: it's for a web server, so speed is of the essence. It's an amateur 
project, so expect a cheap server. I am also more a novice than a seasoned user, so 
please have mercy on me.

Here we go, I hope I will be able to correctly explain my problem because 
chatgpt failed spectacularly to suggest anything even remotely worthwhile (yes, 
pretty desperate move, I know, but I am a desperate man).

-
DATA
-

-You have objects that are linked to many tables describing their properties, 
such as an object__tag table.
-These objects are sold in releases.
-Thing is, objects can be made of objects. I call them superobjects when they 
assume that container role (as objects, they can be part of other 
superobjects). Multiple releases can be linked to one superobject. Yes, it's a 
parts scenario.
-Objects can be shuffled in all sorts of unpredictable ways so it's useless to 
think that if one object is contained inside some superobject A, if that 
superobject A will be contained inside another superobject B, superobject B 
will inherit all objects from superobject A. In fact, my attempt at a solution 
will instantiate objects to link them to superobjects.
-In a superobject, multiple objects can be grouped together into a set of 
options. One option can be made of multiple objects (majority of the time, just 
one). You can choose how many options minimum and maximum (aka a range) you can 
choose from a set (majority of the time, just one). It's possible to have a 
NONE choice aka you don't choose any option.
-Any object can have 0 (common), 1 (common), or more (very rare) dependencies.

That's pretty much all.
It looks like a hierarchy but it is not really one. It looks like a graph but 
probably not one either.
It's made of multiple concepts that, alone, are difficult enough already, but 
now they are all grouped together and it's going well over my head.
In the end, I think it's about combinations with a bit of hierarchy to make 
everything more painful...

Here is a simple "tree" representation of that data. Think of "( )" as radio 
buttons. Empty linebreaks are separating sets of options. X indicates mandatory 
nodes (unless their parent is not chosen). "(n-m)" represents the min-max range 
for a set of options.

(x) Object 0 (it's the superobject)

(1-1) Object A1
(1-1) Object A2
(1-1) Object A3

(x) Object B1

(1-1) Object C1
│ (x) Object D1
│
│ (0-1) Object E1
│ (0-1) Object E2 + Object E3
│
│ (1-2) Object F1
│ (1-2) Object F2 {requires A2}
│
(1-1) Object C2
│ (x) Object G1
│
│ (x) Object E1

As you can see, a user can choose between three options A, E (with the second 
option being made of two objects and the third option making this set 
optional), two options C, F. If they choose object C2, they get objects G1 and 
E1 by default.
Choosing object F2 forces you to have both objects C1 and A2. It's ugly to 
represent constraints like that, but it's impossible to draw them properly in 2D.

The reason why I said that it wasn't really a hierarchy is that you have three 
groups at "depth 1": object 0, object B, and objects C with their children. 
These three groups are all equal to each other, and their positions could be 
swapped with each other. The same goes for the children of objects C: groups D, E, 
and F could be swapped with each other.

When you buy a release, you have to go from the top to the bottom and go 
through every branch until you reach its end.
Here, you could buy objects 0, A2, B1, C1, D1, E2, E3, F2.
Or 0, A1, B1, C1, D1, F1.
Or 0, A3, B1, C2, G1, E1.

Superobjects don't have much depth to begin with, depth 6 max.
Sets of options per superobject won't go probably above 10, but options 
themselves can go pretty high like 50.
To tell you the truth, most objects don't have dependencies. The average superobject is 
made of a bunch of mandatory objects and a few sets of options with few options each 
(half of them probably being whether you want the object or not).
I know it's vague, but the data is unpredictable with lots of exceptions, and 
that's why I need flexibility and can't hardcode anything.
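
A minimal relational sketch of the description above (all table and column names are hypothetical; the min-max range lives on the option set, dependencies on the instantiated object):

CREATE TABLE superobject (
    superobject_id serial PRIMARY KEY
);

-- A set of options inside a super-object, with its min-max choice range.
CREATE TABLE option_set (
    option_set_id  serial PRIMARY KEY,
    superobject_id integer NOT NULL REFERENCES superobject,
    min_choices    integer NOT NULL DEFAULT 1,
    max_choices    integer NOT NULL DEFAULT 1,
    CHECK (min_choices <= max_choices)
);

-- One option within a set; an option can bundle one or more object instances.
CREATE TABLE option_choice (
    option_id     serial PRIMARY KEY,
    option_set_id integer NOT NULL REFERENCES option_set
);

-- Objects instantiated inside a super-object, optionally attached to an option.
CREATE TABLE object_instance (
    instance_id    serial PRIMARY KEY,
    superobject_id integer NOT NULL REFERENCES superobject,
    object_id      integer NOT NULL,                 -- FK to the objects table
    option_id      integer REFERENCES option_choice  -- NULL = mandatory, not part of a set
);

-- "Choosing X requires Y" between instances of the same super-object.
CREATE TABLE instance_dependency (
    instance_id          integer NOT NULL REFERENCES object_instance,
    required_instance_id integer NOT NULL REFERENCES object_instance,
    PRIMARY KEY (instance_id, required_instance_id)
);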

-
PURPOSE
-

I don't really care at the moment about how to render the tree above, as it 
doesn't appear to be a big challenge. What I care about is building a performant 
search engine to find superobjects (and the releases above them) given columns of 
objects (subqueries) selected by joining objects with their related tables 
such as object__tag. You have to find superobjects you can buy, while 
respecting all those annoying constraints. So, if you search for a superobject 
containing E2 and G1, you cannot find superobject 0 because they are on different 
branches! Same, if you search for A1 and A2, you cannot find superobject 0 
because they are in the same set of 

Unexpected cross-database vacuum impact with hot_standby_feedback=on

2023-05-19 Thread Owen Stephens
Hi,

We are seeing that vacuum is prevented from cleaning dead tuples by an open
transaction in a different database (where both connections are made
against the
primary server) when hot_standby_feedback = on but not when it is off. Is
this
cross-database interaction an expected effect of enabling
hot_standby_feedback,
even if the connections interact only with the primary not the replica? I
haven't managed to find anything in the documentation describing this
effect if
so.

To reproduce, consider a PG 14.7 setup with a primary server that has a
replica
with hot_standby_feedback enabled. Create two databases, with a table
containing
some rows in each. Then, open a `psql` session against each database.

In one, open a transaction, and in the other, delete the rows from the
table and
attempt to vacuum:

```
second_example_db=# BEGIN;
BEGIN
second_example_db=*# SELECT txid_current();
 txid_current
--
  770
(1 row)
second_example_db=*#
```

```
first_example_db=# DELETE FROM first_table;
DELETE 2
first_example_db=# VACUUM VERBOSE first_table;
INFO:  vacuuming "public.first_table"
INFO:  table "first_table": found 0 removable, 2 nonremovable row versions
in 1 out of 1 pages
DETAIL:  2 dead row versions cannot be removed yet, oldest xmin: 770
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.
VACUUM
first_example_db=#
```

Notice that the oldest xmin is reported as that of the transaction in a
different database. If I COMMIT/ROLLBACK the transaction in
`second_example_db`,
then after a short while, the same VACUUM command succeeds:

```
...
INFO:  table "first_table": found 2 removable, 0 nonremovable row versions
in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 772
...
```

If I recreate the hot-standby replica but with hot_standby_feedback = off,
then
under the same reproduction, vacuum is able to remove the dead rows despite
the
open transaction in a different database, as expected.

Is anyone able to shed any light on this behaviour and whether or not it is
intentional?

Thanks,
Owen.


Re: PostgreSQL 13 - Logical Replication - ERROR: could not receive data from WAL stream: SSL SYSCALL error: EOF detected

2023-05-19 Thread FOUTE K . Jaurès
On Fri, 19 May 2023 at 05:02, Kyotaro Horiguchi  wrote:

> At Thu, 18 May 2023 21:00:08 +0100, FOUTE K. Jaurès 
> wrote in
> > Hello everyone,
> >
> > I have a replication between PostgreSQL 12 to 13 in the production
> system.
> > Using Ubuntu 18.04 LTS
> > We have this error today.
> >
> >
> >
> > 2023-05-18 18:34:04.374 WAT [117322] ERROR:  could not receive data from WAL stream: SSL SYSCALL error: EOF detected
> > 2023-05-18 18:34:04.381 WAT [118393] LOG:  logical replication apply worker for subscription "inov_transactionnal_table_bertoua_sub_00" has started
> > 2023-05-18 18:34:04.423 WAT [1039] LOG:  background worker "logical replication worker" (PID 117322) exited with exit code 1
> > 2023-05-18 18:47:51.485 WAT [66836] postgres@inov_henrietfreres_v71_00 LOG:  could not receive data from client: Connection timed out.
> >
> > Any idea how to solve this?
>
> According to the message, the SSL-encrypted replication connection got
> disconnected unexpectedly. I suppose it is due to the death of the
> upstream server or some reason outside of PostgreSQL. It seems like
> the issue persisted for a few minutes after that. Other than the
> server's death, I would suspect network hardware problems or changes
> to the firewall or the OS networking setup. I think it would be a good
> idea to check for those first.
>
Hello @Kyotaro Horiguchi 
Thx for feedback. BTW I didn't have any network issues.

>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>


-- 
Jaurès FOUTE


Re: Records, Types, and Arrays

2023-05-19 Thread Raymond Brinzer
On Fri, May 19, 2023 at 2:58 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Thursday, May 18, 2023, Raymond Brinzer  wrote:
>
>> scratch=# select row(2,3)::test_type;
>>
>
> Unknown typed value, immediately converted to a known concrete instance of
> test_type. It is never actually resolved as record.
>
> All of the others must concretely be resolved to record to escape their
> query level, and if you then try to cast the concrete record to some other
> concrete type a cast needs to exist.
>

I see.  That suggests, oddly, that pg_typeof() is changing the thing it's
observing, because row(2,3) was not a record but an 'unknown typed value'
before pg_typeof() was called on it.  Good to know.

CREATE CAST seems not to like working with pseudo-types.  Neither the
source nor the target can be 'anyarray' or 'record'.  So that seems out.

On the other hand, the pseudo-type doc says, 'Functions coded in C (whether
built-in or dynamically loaded) can be declared to accept or return any of
these pseudo data types.'  I'm assuming that an 'unknown typed value'
corresponds to the 'unknown' pseudo-type.  So it seems like a C function
which took an anyarray as its parameter and returned a value of type
'unknown' would bypass the need for a specifically defined cast.  (Emphasis
on "seems".)
-- 
Ray Brinzer


Re: Records, Types, and Arrays

2023-05-19 Thread Pavel Stehule
Hi

On Fri, 19 May 2023 at 8:59, Raymond Brinzer  wrote:

> Sorry, I should have noted this as well:
>
> "One should also realize that when a PL/pgSQL function is declared to
> return type record, this is not quite the same concept as a record
> variable, even though such a function might use a record variable to hold
> its result. In both cases the actual row structure is unknown when the
> function is written, but for a function returning record the actual
> structure is determined when the calling query is parsed, whereas a record
> variable can change its row structure on-the-fly."
>
> I'm guessing that row() isn't really a function, then?  And even so,
> assuming this is the important difference, how is the ability to change row
> structure on the fly making the cast possible?  In what way would the query
> calling get_row() be critical?
>

plpgsql cannot work well with data that is too dynamic. If you need more dynamic
data, then using jsonb is probably the best idea now.
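
For example, jsonb can be mapped back onto a concrete composite type at the edges (using the test_type from this thread):

SELECT jsonb_populate_record(NULL::test_type, '{"a": 2, "b": 3}');  -- (2,3)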

Regards

Pavel


> On Fri, May 19, 2023 at 2:48 AM Raymond Brinzer 
> wrote:
>
>> On a problem which came up while trying to implement a solution, perhaps
>> someone could explain this:
>>
>> scratch=# create type test_type as (a int, b int);
>> CREATE TYPE
>> scratch=# create function get_row() returns record as $$ select row(2,3);
>> $$ language sql;
>> CREATE FUNCTION
>> scratch=# select get_row();
>>  get_row
>> -
>>  (2,3)
>> (1 row)
>>
>> scratch=# select pg_typeof( get_row() );
>>  pg_typeof
>> ---
>>  record
>> (1 row)
>>
>> scratch=# select pg_typeof( row(2,3) );
>>  pg_typeof
>> ---
>>  record
>> (1 row)
>>
>> scratch=# select row(2,3)::test_type;
>>   row
>> ---
>>  (2,3)
>> (1 row)
>>
>> scratch=# select get_row()::test_type;
>> ERROR:  cannot cast type record to test_type
>> LINE 1: select get_row()::test_type;
>>
>> If row(2,3) and get_row() are both of type record, and the records have
>> the same values, why can one be cast to test_type, and the other not?
>>
>> On Fri, May 19, 2023 at 1:07 AM Raymond Brinzer 
>> wrote:
>>
>>> Greetings, all.
>>>
>>> It's been a down-the-rabbit-hole day for me.  It all started out with a
>>> simple problem.  I have defined a composite type.  There are functions
>>> which return arrays whose values would be suitable to the type I defined.
>>> How do I turn arrays into composite typed values?
>>>
>>> Conceptually, this is straightforward.  Any given array can be mapped to
>>> a corresponding record with the same elements, so this expression would
>>> make sense:
>>>
>>> ARRAY[1,2,3]::RECORD
>>>
>>> If the result happens to be a valid instance of my_type, you might say:
>>>
>>> ARRAY[1,2,3]::RECORD::my_type
>>>
>>> Or, ideally, just:
>>>
>>> ARRAY[1,2,3]::my_type
>>>
>>> It seems to be a rather long way from the idea to the implementation,
>>> however.  A helpful soul from the IRC channel did manage to make this
>>> happen in a single expression:
>>>
>>> (format('(%s)', array_to_string(the_array, ','))::my_type).*
>>>
>>> While I'm happy to have it, that's ugly even by SQL's syntactic
>>> yardstick.  So, I figured I'd see about hiding it behind a function and a
>>> custom cast.  These efforts have not been successful, for reasons I'll
>>> probably share in a subsequent email, as the details would distract from
>>> the point of this one.
>>>
>>> Getting to that point... we have these three kinds of things:
>>>
>>> * Arrays
>>> * Composite Values / Records
>>> * Typed Composite Values (instances of composite types)
>>>
>>> (Note on the second:  while section 8.16.2 of the documentation talks
>>> about constructing "composite values", pg_typeof() reports these to be of
>>> the "record" pseudo-type.  To (hopefully) avoid confusion, I'm going to
>>> exclusively say "record" here.)
>>>
>>> Here's the thing about these:  in the abstract, they're mostly the
>>> same.  A record is simply an ordered multiset.  If you ignore
>>> implementation, syntax, and whatnot, you could say that arrays are the
>>> subset of records where all the members are of the same type.  Objects of
>>> composite type can be considered records with an additional feature:  each
>>> member has a name.
>>>
>>> It seems to me, then, that:
>>>
>>> 1) Switching between these things should be dead easy; and
>>> 2) One should be able to treat them as similarly as their actual
>>> differences allow.
>>>
>>> On the first point (speaking of arrays and composite types generically),
>>> there are six possible casts.  One of these already works, when members are
>>> compatible:
>>>
>>> record::composite_type
>>>
>>> (Mostly, anyway; I did run into a kink with it, which I'll explain when
>>> I discuss what I've tried.)
>>>
>>> These casts would always be valid:
>>>
>>> array::record
>>> composite_type::record
>>>
>>> These would be valid where the member sets are compatible:
>>>
>>> array::composite_type
>>> record::array
>>> composite_type::array
>>>
>>> It seems like having all six casts available would be 

Re: Records, Types, and Arrays

2023-05-19 Thread Raymond Brinzer
Sorry, I should have noted this as well:

"One should also realize that when a PL/pgSQL function is declared to
return type record, this is not quite the same concept as a record
variable, even though such a function might use a record variable to hold
its result. In both cases the actual row structure is unknown when the
function is written, but for a function returning record the actual
structure is determined when the calling query is parsed, whereas a record
variable can change its row structure on-the-fly."

I'm guessing that row() isn't really a function, then?  And even so,
assuming this is the important difference, how is the ability to change row
structure on the fly making the cast possible?  In what way would the query
calling get_row() be critical?

On Fri, May 19, 2023 at 2:48 AM Raymond Brinzer 
wrote:

> On a problem which came up while trying to implement a solution, perhaps
> someone could explain this:
>
> scratch=# create type test_type as (a int, b int);
> CREATE TYPE
> scratch=# create function get_row() returns record as $$ select row(2,3);
> $$ language sql;
> CREATE FUNCTION
> scratch=# select get_row();
>  get_row
> -
>  (2,3)
> (1 row)
>
> scratch=# select pg_typeof( get_row() );
>  pg_typeof
> ---
>  record
> (1 row)
>
> scratch=# select pg_typeof( row(2,3) );
>  pg_typeof
> ---
>  record
> (1 row)
>
> scratch=# select row(2,3)::test_type;
>   row
> ---
>  (2,3)
> (1 row)
>
> scratch=# select get_row()::test_type;
> ERROR:  cannot cast type record to test_type
> LINE 1: select get_row()::test_type;
>
> If row(2,3) and get_row() are both of type record, and the records have
> the same values, why can one be cast to test_type, and the other not?
>
> On Fri, May 19, 2023 at 1:07 AM Raymond Brinzer 
> wrote:
>
>> Greetings, all.
>>
>> It's been a down-the-rabbit-hole day for me.  It all started out with a
>> simple problem.  I have defined a composite type.  There are functions
>> which return arrays whose values would be suitable to the type I defined.
>> How do I turn arrays into composite typed values?
>>
>> Conceptually, this is straightforward.  Any given array can be mapped to
>> a corresponding record with the same elements, so this expression would
>> make sense:
>>
>> ARRAY[1,2,3]::RECORD
>>
>> If the result happens to be a valid instance of my_type, you might say:
>>
>> ARRAY[1,2,3]::RECORD::my_type
>>
>> Or, ideally, just:
>>
>> ARRAY[1,2,3]::my_type
>>
>> It seems to be a rather long way from the idea to the implementation,
>> however.  A helpful soul from the IRC channel did manage to make this
>> happen in a single expression:
>>
>> (format('(%s)', array_to_string(the_array, ','))::my_type).*
>>
>> While I'm happy to have it, that's ugly even by SQL's syntactic
>> yardstick.  So, I figured I'd see about hiding it behind a function and a
>> custom cast.  These efforts have not been successful, for reasons I'll
>> probably share in a subsequent email, as the details would distract from
>> the point of this one.
>>
>> Getting to that point... we have these three kinds of things:
>>
>> * Arrays
>> * Composite Values / Records
>> * Typed Composite Values (instances of composite types)
>>
>> (Note on the second:  while section 8.16.2 of the documentation talks
>> about constructing "composite values", pg_typeof() reports these to be of
>> the "record" pseudo-type.  To (hopefully) avoid confusion, I'm going to
>> exclusively say "record" here.)
>>
>> Here's the thing about these:  in the abstract, they're mostly the same.
>> A record is simply an ordered multiset.  If you ignore implementation,
>> syntax, and whatnot, you could say that arrays are the subset of records
>> where all the members are of the same type.  Objects of composite type can
>> be considered records with an additional feature:  each member has a name.
>>
>> It seems to me, then, that:
>>
>> 1) Switching between these things should be dead easy; and
>> 2) One should be able to treat them as similarly as their actual
>> differences allow.
>>
>> On the first point (speaking of arrays and composite types generically),
>> there are six possible casts.  One of these already works, when members are
>> compatible:
>>
>> record::composite_type
>>
>> (Mostly, anyway; I did run into a kink with it, which I'll explain when I
>> discuss what I've tried.)
>>
>> These casts would always be valid:
>>
>> array::record
>> composite_type::record
>>
>> These would be valid where the member sets are compatible:
>>
>> array::composite_type
>> record::array
>> composite_type::array
>>
>> It seems like having all six casts available would be very handy.  But
>> (here's point 2) to the extent that you don't have to bother switching
>> between them at all, so much the better.  For instance:
>>
>> (ARRAY[5,6,7])[1]
>> (ROW(5,6,7))[1]
>> (ROW(5,6,7)::my_type)[1]
>>
>> all make perfect sense.  It would be lovely to be able to treat these
>> types interchangeably where appropriate.  It seems to me (having 

Re: Records, Types, and Arrays

2023-05-19 Thread David G. Johnston
On Thursday, May 18, 2023, Raymond Brinzer  wrote:

>
> scratch=# select row(2,3)::test_type;
>

It is an unknown-typed value, immediately converted to a known, concrete
instance of test_type; it is never actually resolved as record.

All of the others must be concretely resolved to record in order to escape
their query level, and if you then try to cast that concrete record to some
other concrete type, a cast needs to exist.
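
For instance (a sketch, with get_row_typed as a made-up name and test_type /
get_row as defined in the question), you can either declare the return type
up front so callers never see an anonymous record, or resolve the record
with a column definition list in FROM:

create function get_row_typed() returns test_type as $$
  select row(2,3)::test_type;
$$ language sql;

select get_row_typed();                      -- (2,3), already of type test_type
select (get_row_typed()).a;                  -- 2

select * from get_row() as r(a int, b int);  -- resolves the record's column types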

David J.


Re: Records, Types, and Arrays

2023-05-19 Thread Raymond Brinzer
Here's a problem that came up while trying to implement a solution; perhaps
someone could explain it:

scratch=# create type test_type as (a int, b int);
CREATE TYPE
scratch=# create function get_row() returns record as $$ select row(2,3);
$$ language sql;
CREATE FUNCTION
scratch=# select get_row();
 get_row
-
 (2,3)
(1 row)

scratch=# select pg_typeof( get_row() );
 pg_typeof
---
 record
(1 row)

scratch=# select pg_typeof( row(2,3) );
 pg_typeof
---
 record
(1 row)

scratch=# select row(2,3)::test_type;
  row
---
 (2,3)
(1 row)

scratch=# select get_row()::test_type;
ERROR:  cannot cast type record to test_type
LINE 1: select get_row()::test_type;

If row(2,3) and get_row() are both of type record, and the records have the
same values, why can one be cast to test_type, and the other not?

On Fri, May 19, 2023 at 1:07 AM Raymond Brinzer 
wrote:

> Greetings, all.
>
> It's been a down-the-rabbit-hole day for me.  It all started out with a
> simple problem.  I have defined a composite type.  There are functions
> which return arrays whose values would be suitable to the type I defined.
> How do I turn arrays into composite typed values?
>
> Conceptually, this is straightforward.  Any given array can be mapped to a
> corresponding record with the same elements, so this expression would make
> sense:
>
> ARRAY[1,2,3]::RECORD
>
> If the result happens to be a valid instance of my_type, you might say:
>
> ARRAY[1,2,3]::RECORD::my_type
>
> Or, ideally, just:
>
> ARRAY[1,2,3]::my_type
>
> It seems to be a rather long way from the idea to the implementation,
> however.  A helpful soul from the IRC channel did manage to make this
> happen in a single expression:
>
> (format('(%s)', array_to_string(the_array, ','))::my_type).*
>
> While I'm happy to have it, that's ugly even by SQL's syntactic
> yardstick.  So, I figured I'd see about hiding it behind a function and a
> custom cast.  These efforts have not been successful, for reasons I'll
> probably share in a subsequent email, as the details would distract from
> the point of this one.
>
> Getting to that point... we have these three kinds of things:
>
> * Arrays
> * Composite Values / Records
> * Typed Composite Values (instances of composite types)
>
> (Note on the second:  while section 8.16.2 of the documentation talks
> about constructing "composite values", pg_typeof() reports these to be of
> the "record" pseudo-type.  To (hopefully) avoid confusion, I'm going to
> exclusively say "record" here.)
>
> Here's the thing about these:  in the abstract, they're mostly the same.
> A record is simply an ordered multiset.  If you ignore implementation,
> syntax, and whatnot, you could say that arrays are the subset of records
> where all the members are of the same type.  Objects of composite type can
> be considered records with an additional feature:  each member has a name.
>
> It seems to me, then, that:
>
> 1) Switching between these things should be dead easy; and
> 2) One should be able to treat them as similarly as their actual
> differences allow.
>
> On the first point (speaking of arrays and composite types generically),
> there are six possible casts.  One of these already works, when members are
> compatible:
>
> record::composite_type
>
> (Mostly, anyway; I did run into a kink with it, which I'll explain when I
> discuss what I've tried.)
>
> These casts would always be valid:
>
> array::record
> composite_type::record
>
> These would be valid where the member sets are compatible:
>
> array::composite_type
> record::array
> composite_type::array
>
> It seems like having all six casts available would be very handy.  But
> (here's point 2) to the extent that you don't have to bother switching
> between them at all, so much the better.  For instance:
>
> (ARRAY[5,6,7])[1]
> (ROW(5,6,7))[1]
> (ROW(5,6,7)::my_type)[1]
>
> all make perfect sense.  It would be lovely to be able to treat these
> types interchangeably where appropriate.  It seems to me (having failed to
> imagine a counterexample) that any operation you could apply to an array
> should be applicable to a record, and any operation you could apply to a
> record should be applicable to an instance of a composite type.
>
> While the second point is rather far-reaching and idealistic, the first
> seems well-defined and reasonably easy.
>
> If you've taken the time to read all this, thank you.  If you take the
> idea seriously, or have practical suggestions, thank you even more.  If you
> correct me on something important... well, I owe much of what I know to
> people like you, so please accept my deepest gratitude.
>
> --
> Yours,
>
> Ray Brinzer
>


-- 
Ray Brinzer


Re: pg_stats.avg_width

2023-05-19 Thread Maciek Sakrejda
Thanks, that makes sense. It was going to be my third guess, but it
seemed pretty wide for a TOAST pointer. Reviewing what goes in there,
though, it's reasonable.

I assume this means that for un-TOASTed but compressed data, it counts the
compressed size.
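
A quick way to check that (sketch only, with a throwaway table; repeat()
produces a value that compresses well enough to stay inline but compressed)
is to compare avg_width against the logical length:

create table avg_width_demo (t text);
insert into avg_width_demo
  select repeat('x', 10000) from generate_series(1, 100);
analyze avg_width_demo;

select avg_width from pg_stats
 where tablename = 'avg_width_demo' and attname = 't';
-- should be small: the compressed, stored size

select avg(pg_column_size(t))::int as stored_size,
       avg(octet_length(t))::int   as logical_size
  from avg_width_demo;
-- stored_size should roughly match avg_width; logical_size is 10000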

Would a doc patch clarifying this (and possibly linking to the
relevant TOAST docs [1]) be welcome? The current wording is pretty
vague. Something like

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 7c09ab3000..2814ac8007 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7466,7 +7466,9 @@ SCRAM-SHA-256$iteration count:
stawidth int4
   
   
-   The average stored width, in bytes, of nonnull entries
+   The average stored width, in bytes, of nonnull entries. For compressed
+   entries, counts the compressed size; for TOASTed data, the size of the
+   TOAST pointer (see TOAST).
   
  

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index bb1a418450..62184fe32b 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3680,7 +3680,9 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
avg_width int4
   
   
-   Average width in bytes of column's entries
+   Average width in bytes of column's entries. For compressed entries,
+   counts the compressed size; for TOASTed data, the size of the TOAST
+   pointer (see TOAST).
   
  

(not sure if this should be <xref> or <link>).

Thanks,
Maciek

[1]: https://www.postgresql.org/docs/current/storage-toast.html




Re: Records, Types, and Arrays

2023-05-19 Thread Raymond Brinzer
On Fri, May 19, 2023 at 1:42 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Thu, May 18, 2023 at 10:06 PM Raymond Brinzer 
> wrote:
>
>> How do I turn arrays into composite typed values?
>>
>
> Using just SQL syntax and no string munging:
>
> (array_val[1]::col1_type, array_val[2]::col2_type)::composite_type
>

Assuming one wants to write a specific solution, rather than a general one,
sure.  And when you want to deal with an unnamed array returned from a
function?  Well, you can throw *that* in a CTE to give it a name, or
perform some other such contortion.  The aggregate load of having to phrase
such simple ideas in complicated ways really isn't good.
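
Concretely, the kind of phrasing I mean (with a hypothetical some_func()
returning int[] and a three-column my_type):

with r as (select some_func() as arr)
select (arr[1], arr[2], arr[3])::my_type from r;

-- or, skipping the CTE by repeating the call once per element:
select ((some_func())[1], (some_func())[2], (some_func())[3])::my_type;
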
>
>
> None of what you are saying is likely to ever see the light of day.  If
> you want to learn the SQL-way, it might be easier to just forget about your
> idealized equivalency between composite types and array containers.
>

The problem with "easier" is that addressing the thing directly in front of
you is always easier in the immediate sense than actually attacking the
problem itself.  It also dooms you to the (after)life of Sisyphus, always
rolling the same rock up the same hill.


> ARRAY[...] is a constructor, its output is an array container.  You can
> either type the elements within the constructor or leave them untyped and
> put a syntactical-sugar cast on the result.
>
> ARRAY['2023-01-01'::date,'2023-02-01'::date]
> ARRAY['2023-01-01','2023-02-01']::date[]
>
> While you've said that having all this stuff would "be quite handy" that
> isn't obvious to me.  It is even less obvious that any benefit would likely
> be small compared to the effort to make all of this actually work.
>

Well, making one small part of it work would be a boon to me.  Is a simple,
generic cast from an array to a record really rocket science?  I can't
imagine why that would be.
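
To be clear about what I'm comparing against: the closest thing I know of
today is a per-type cast, along the lines of this sketch (my_type and the
helper function are hypothetical names); as far as I can tell it works, but
it has to be repeated for every composite type and every element type:

create type my_type as (a int, b int, c int);

create function int_array_to_my_type(arr int[]) returns my_type
  language sql as $$ select (arr[1], arr[2], arr[3])::my_type $$;

create cast (int[] as my_type) with function int_array_to_my_type(int[]);

select array[1,3,5]::my_type;   -- (1,3,5)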


> Even if I could write: composite_type[1] instead of composite_type.column1
> I don't know why I'd want to give up the expressiveness of writing the
> column name.
>

Naturally, you wouldn't give up the ability to do that.  You'd merely gain
the ability to do it another way.

-- 
Ray Brinzer