Re: effects of nullifying bytea column on storage

2022-05-11 Thread David G. Johnston
On Wednesday, May 11, 2022, David Gauthier  wrote:

> Doesn't vacuum run automatically (or can it be set to run automatically)?
>
>

https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM
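
Autovacuum is on by default. A minimal check, plus a hypothetical example of
per-table tuning (the table name is made up):

show autovacuum;  -- 'on' out of the box

alter table my_table set (autovacuum_vacuum_scale_factor = 0.05);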

David J.


Re: effects of nullifying bytea column on storage

2022-05-11 Thread David Gauthier
Doesn't vacuum run automatically (or can it be set to run automatically)?


On Wed, May 11, 2022 at 8:05 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wednesday, May 11, 2022, David Gauthier  wrote:
>
>> Hi:
>> psql (11.5, server 11.3) on linux
>>
>> I have a table with a bytea column which, of course, contains binary
>> data.  After 60 days, I no longer need the binary data but want to retain
>> the rest of the record.  Of course it's easy to just update the bytea
>> column to null for the older records.  But I can almost imagine this record
>> on disk with a big "hole" in the middle where the bytea data used to be.
>> Is there a PG daemon (the vacuum?) that will "heal the hole" in time?
>>
>>
> The freshly written record will not have a hole - both because
> variable-width fields take up basically only their actual data space and
> because null is treated specially in the record format.
>
> For the old row, yes, you should read up on the how and why of the vacuum
> command.
>
> David J.
>
>


Re: effects of nullifying bytea column on storage

2022-05-11 Thread David G. Johnston
On Wednesday, May 11, 2022, David Gauthier  wrote:

> Hi:
> psql (11.5, server 11.3) on linux
>
> I have a table with a bytea column which, of course, contains binary
> data.  After 60 days, I no longer need the binary data but want to retain
> the rest of the record.  Of course it's easy to just update the bytea
> column to null for the older records.  But I can almost imagine this record
> on disk with a big "hole" in the middle where the bytea data used to be.
> Is there a PG daemon (the vacuum?) that will "heal the hole" in time?
>
>
The freshly written record will not have a hole - both because variable-width
fields take up basically only their actual data space and because null is
treated specially in the record format.

For the old row, yes, you should read up on the how and why of the vacuum
command.
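
For example, a minimal sketch with made-up table and column names:

update my_table
set payload = null  -- the bytea column
where created_at < now() - interval '60 days';

-- A plain VACUUM marks the dead row versions' space as reusable within the
-- table's files; VACUUM FULL would rewrite the table to shrink it on disk.
vacuum my_table;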

David J.


effects of nullifying bytea column on storage

2022-05-11 Thread David Gauthier
Hi:
psql (11.5, server 11.3) on linux

I have a table with a bytea column which, of course, contains binary data.
After 60 days, I no longer need the binary data but want to retain the rest
of the record.  Of course it's easy to just update the bytea column to null
for the older records.  But I can almost imagine this record on disk with a
big "hole" in the middle where the bytea data used to be.  Is there a PG
daemon (the vacuum?) that will "heal the hole" in time?

Thanks !


Re: consistent postgresql snapshot

2022-05-11 Thread Ron

On 5/11/22 10:41, Zwettler Markus (OIZ) wrote:


PostgreSQL12 running on CentOS7 + ext4.

Would it be safe to do a "fsfreeze --freeze" + disk snapshot + "fsfreeze 
--unfreeze" of the PGDATA filesystem to get a consistent snapshot?


I am wondering because the PostgreSQL default block size is 8k while the ext4 
default block size is 4k, resulting in two filesystem blocks per database block.




If data/base and pg_wal are on the same filesystem, then it might work.

Be sure to select pg_start_backup('something informative'); beforehand, and 
select pg_stop_backup(); afterwards.


For example:
psql -c "select pg_start_backup('something informative');"
fsfreeze --freeze /mountpoint/of/pgdata
(take the disk snapshot)
fsfreeze --unfreeze /mountpoint/of/pgdata
psql -c "select pg_stop_backup();"

--
Angular momentum makes the world go 'round.

Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Oops. I made a copy-and-paste error when going from my test env. to email and 
>> left out the "deferral" that I'd intended. For completeness, here's the 
>> test that I meant:
>> 
>> create constraint trigger trg
>> after insert on t1
>> for each row
>> execute function trg_fn();
>> 
>> create constraint trigger trg
>> after insert on t2
>> initially deferred
>> for each row
>> execute function trg_fn();
>> 
>> It adds the "initially deferred" decoration to the "create constraint 
>> trigger" statement. This is (still) the result:
> 
> You only added it to the uninteresting trigger on t2.  It's the t1 trigger 
> where I'd expect the behavior to change.  I'm assuming your test does both 
> (not in a position to test it myself at the moment).

Damn. I'm horrified that, despite my best intentions, I still managed to make a 
typo. How embarrassing… With the correction in place, I now get this output:

INFO:  trg fired. new.v = 10, n = 8
INFO:  trg fired. new.v = 20, n = 8
INFO:  trg fired. new.v = 30, n = 8
INFO:  trg fired. new.v = 40, n = 8
INFO:  trg fired. new.v = 50, n = 8
INFO:  trg fired. new.v = 60, n = 8
INFO:  trg fired. new.v = 70, n = 8
INFO:  trg fired. new.v = 80, n = 8

This is exactly what you predicted. I'm delighted (finally) to see this outcome.

>> [What I wrote here was rubbish, given that my test code was not what I 
>> claimed it was.]
> 
> [David's response here is now moot.]
>  
>> With respect to « having to keep around a working set of what are the 
>> changed records » I think that the complexity that you envisaged is avoided 
>> by the (emergent) rule that an AFTER EACH STATEMENT trigger cannot see "old" 
>> and "new" values. In other words, all you can sensibly do in its function is 
>> ordinary SQL that sees the current state at the moment it fires.
>> 
>> To my surprise, it *is* legal to write code that accesses "old" and "new" 
>> values. But, because many rows can be affected by a single statement, and 
>> the trigger fires just once, the meanings of "old" and "new" are undefined. 
>> I've seen that, in any test that I do, both are always set to NULL (which 
>> seems reasonable).
> 
> I was thinking more about transition tables - though I admit it's not a 
> complete thought given their opt-in nature.

Ah… I hadn't considered transition tables. However, they don't seem to be 
valuable for a constraint trigger. So your concern could be removed at a stroke 
by adding a semantic rule to the account of the CREATE TRIGGER syntax 
specification thus:

« The SQL that creates a statement-level AFTER constraint trigger may not 
specify using transition tables. »

Might this device give me hope?

Finally, it seems that a non-deferrable constraint trigger has no value—except 
in that using the keyword CONSTRAINT is a nice bit of self-documentation. Did I 
miss something? Is this trigger flavor valuable for a use-case that I haven't 
spotted?



Re: Deferred constraint trigger semantics

2022-05-11 Thread David G. Johnston
On Wed, May 11, 2022 at 3:43 PM Bryn Llewellyn  wrote:

>
> Oops. I made a copy-and-paste error when going from my test env. to email and
> left out the "deferral" that I'd intended. For completeness, here's the
> test that I meant:
>
> create constraint trigger trg
> after insert on t1
> for each row
> execute function trg_fn();
>
> create constraint trigger trg
> after insert on t2
> initially deferred
> for each row
> execute function trg_fn();
>
> It adds the "initially deferred" decoration to the "create constraint
> trigger" statement. This is (still) the result:
>

You only added it to the uninteresting trigger on t2.  It's the t1 trigger
where I'd expect the behavior to change.  I'm assuming your test does both
(not in a position to test it myself at the moment).


> Even though both inserts have completed by commit time, only the trigger
> firing caused by the second statement sees the final state that obtains the
> moment before commit. The first statement sees only the state after it
> finishes and before the second statement executes. You said « I suspect
> [that both statements will see the final state] if you actually perform
> deferral ». My test shows that this is not the case.
>
> *Did I misunderstand you? Or does this result surprise you? If it does, do
> you think that this is a bug*?
>

It both surprises me and fails to surprise me at the same time.  I have no
opinion on whether the observed behavior constitutes a bug or not.  I'd
lean toward not - just maybe a chance to improve the documentation.


>
> With respect to « having to keep around a working set of what are the
> changed records » I think that the complexity that you envisaged is avoided
> by the (emergent) rule that an AFTER EACH STATEMENT trigger cannot see
> "old" and "new" values. In other words, all you can sensibly do in its
> function is ordinary SQL that sees the current state at the moment it fires.
>
> To my surprise, it *is* legal to write code that accesses "old" and "new"
> values. But, because many rows can be affected by a single statement, and
> the trigger fires just once, the meanings of "old" and "new" are undefined.
> I've seen that, in any test that I do, both are always set to NULL (which
> seems reasonable).
>

I was thinking more about transition tables - though I admit it's not a
complete thought given their opt-in nature.

David J.


Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> laurenz.a...@cybertec.at wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> …Internet Search turned up this 2019 post by Laurenz Albe—but nothing else 
>> at all.
>> 
>> https://www.cybertec-postgresql.com/en/triggers-to-enforce-constraints 
>> 
>> 
>> (This is why I CC'd you, Laurenz.)
> 
> So I guess I should answer.

Thanks for replying to my original post with this subject line, Laurenz. Sorry 
to bring up notions that you wrote about three years ago. I judged that, 
because I referred to those notions, it would be polite to copy you—especially 
because I interpreted (maybe incorrectly) what you had written.

> About the starting paragraph of your mail: Constraint triggers are a 
> syntactic leftover from the way that triggers are implemented in PostgreSQL. 
> There is different syntax now, but it was decided to leave constraint 
> triggers, since they may have some use.

If constraint triggers are, in the hands of an informed application programmer, 
to have some sound uses, then the semantics must be clearly defined. And you do 
say that they are supported. David Johnston argues that, as long as you read the 
right snippets from various parts of the doc and synthesize their joint 
meaning, then the semantics are defined. Yes, David, I can accept that—with a 
loud caveat about the answer to my (a) or (b) question below.

I re-read the penultimate paragraph in Laurenz's post:

«
By making the trigger INITIALLY DEFERRED, we tell PostgreSQL to check the 
condition at COMMIT time.
»

I have always understood that (in Postgres and any respectable RDBMS) commits 
in a multi-session environment are always strictly serialized—irrespective of 
the transaction's isolation level. Am I correct to assume this is the case for 
Postgres? I took "at COMMIT time" to mean "as part of the strictly serialized 
operations that implement a session's COMMIT". But I see (now) that you argue 
that this is not the case, thus:

«
This will reduce the window for the race condition a little, but the problem is 
still there. If concurrent transactions run the trigger function at the same 
time, they won’t see each other’s modifications.
»

I take what you say in your post to mean that each session executes its 
deferred constraint check (by extension, not just for constraint triggers but 
for all deferred constraint cases) momentarily *before* COMMIT so that the 
effect is only to reduce the duration of the race condition window rather than 
to eliminate it.

So it all depends on a lawyerly reading of the wording "at COMMIT time". The 
current CREATE TABLE doc says this:

«
If the constraint is INITIALLY DEFERRED, it is checked only at the end of the 
transaction.
»

The wording "at the end of the transaction" is not precise enough to 
adjudicate—and so the key question remains: Is a deferred constraint checked:

(a) as part of the strictly serialized operations that implement a session's 
COMMIT?

or

(b) momentarily *before* COMMIT and not within the serialized COMMIT execution?

So… (asking the wider audience) is the answer (a) or (b)? And if it's (b), why? 
After all, (b) brings the race condition risk. Is (a) simply not feasible?

> [Lots of ruminations and wandering thoughts]
> 
> Sorry, that was too much for me to comment on - that would require a 
> mid-sized article.

Oh… I'm sorry to hear that I rambled and lost clarity. I find it impossible to 
say what I want to, striving for precision, without sacrificing brevity. I 
always find that I can improve my wording with successive reads. But life is 
short and I have, eventually, just to publish and be damned.

>> Is my entire concept (and Laurenz's too) fundamentally flawed? Specifically, 
>> is querying a trigger's base table in a "for each row" trigger fundamentally 
>> unsound and not supported?
> 
> My post claims that constraint triggers alone are *not* a sufficient solution 
> to validate constraints - you need additional locking or SERIALIZABLE 
> isolation to make that work reliably.

This doesn't seem to be what you wrote. These two headings

> Solving the problem with “optimistic locking” (which you explain means using 
> SERIALIZABLE)

and

> What about these “constraint triggers”?

read as if they are orthogonal schemes where the constraint trigger approach 
does not rely on SERIALIZABLE.

As I reason it, if you use the SERIALIZABLE approach, then an ordinary 
immediate AFTER EACH STATEMENT trigger will work fine—precisely because of how 
that isolation level is defined. So here, a deferred constraint trigger isn't 
needed and brings no value.
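
To make that concrete, here's a minimal sketch (with made-up names) of the 
SERIALIZABLE approach: an ordinary statement-level trigger re-checks the rule, 
every writing session uses SERIALIZABLE, and one of any two conflicting 
sessions fails with a serialization error (SQLSTATE 40001) and must retry:

create table s(k serial primary key, v int not null);

create function check_sum()
  returns trigger
  language plpgsql
as $body$
begin
  -- Under SERIALIZABLE, two overlapping transactions cannot both pass this
  -- read-then-commit check; one of them will be aborted with 40001.
  if (select coalesce(sum(v), 0) from s) > 100 then
    raise exception 'sum of s.v must not exceed 100';
  end if;
  return null;
end;
$body$;

create trigger s_check
after insert or update on s
for each statement
execute function check_sum();

-- Each writing session then does:
--   begin isolation level serializable;
--   insert into s(v) values (42);
--   commit;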

This implies that if a deferred constraint trigger is to have any utility, it 
must be safe to use it (as I tested it) at the READ COMMITTED level. I do see 
that, though I appear to be testing this, I cannot do a reliable test because I 
cannot, in application code, open up, and exploit, a race condition window 
after COMMIT has been issued. (I *am* able 

Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:

Thanks for the point-by-point reply, David.

>> ...makes no mention of what you might expect to see in an AFTER EACH ROW 
>> trigger.
> 
> ...the absence of a comment declaring a guarantee of order means that, like 
> the comment for the row-level BEFORE trigger, the row ordering of row-level 
> AFTER triggers is not guaranteed (even if one cannot produce a counter-example in 
> today's codebase).

Got it!

>> ...unless anybody contradicts me.
> 
> Caveat emptor...? I wouldn't be surprised if doing so is technically 
> possible in all cases - as to whether a particular algorithm is sound, to 
> some extent, isn't something we try to predict. We do try to describe all the 
> known interactions though - and let the user work within what those mean for 
> them.

Got it again!

>> ...implies that there's no such unpredictability in the AFTER EACH ROW cases.
> 
> I would not read it that way. In general, absence of mention of 
> predictability like this means there is none - that some other sentence goes 
> into more detail doesn't change that.

OK.

>> But there has to be a subtle caveat here for the deferred constraint trigger 
>> when the txn changes two or more tables, all of which participate in the 
>> query that the trigger function issues… The "raise info" output below 
>> illustrates my point (n changes from 5 to 8).
> 
> I'm failing to see the deferral aspect of that example. First statement 
> finishes, sees the 5 inserts, next statement finishes, sees 3 more inserts. 
> Not, both statements finish, triggers fire, triggers see all 8 inserts (which 
> I suspect they will if you actually perform deferral).

Oops. I made a copy-and-paste error when going from my test env. to email and 
left out the "deferral" that I'd intended. For completeness, here's the test 
that I meant:

create table t1(k serial primary key, v int not null);
create table t2(k serial primary key, v int not null);

create function trg_fn()
  returns trigger
  language plpgsql
as $body$
declare
  n int not null := 0;
begin
  n := (select count(*) from t1) + (select count(*) from t2);
  raise info 'trg fired. new.v = %, n = %', new.v, n;
  return new;
end;
$body$;

create constraint trigger trg
after insert on t1
for each row
execute function trg_fn();

create constraint trigger trg
after insert on t2
initially deferred
for each row
execute function trg_fn();

set default_transaction_isolation = 'read committed';
do $body$
begin
  insert into t1(v)
  values (10), (20), (30), (40), (50);

  insert into t2(v)
  values (60), (70), (80);
end;
$body$;

It adds the "initially deferred" decoration to the "create constraint trigger" 
statement. This is (still) the result:

INFO:  trg fired. new.v = 10, n = 5
INFO:  trg fired. new.v = 20, n = 5
INFO:  trg fired. new.v = 30, n = 5
INFO:  trg fired. new.v = 40, n = 5
INFO:  trg fired. new.v = 50, n = 5
INFO:  trg fired. new.v = 60, n = 8
INFO:  trg fired. new.v = 70, n = 8
INFO:  trg fired. new.v = 80, n = 8

Even though both inserts have completed by commit time, only the trigger firing 
caused by the second statement sees the final state that obtains the moment 
before commit. The first statement sees only the state after it finishes and 
before the second statement executes. You said « I suspect [that both 
statements will see the final state] if you actually perform deferral ». My 
test shows that this is not the case.

Did I misunderstand you? Or does this result surprise you? If it does, do you 
think that this is a bug?

> ...Your test case for the deferred constraint, that supposedly allows for the 
> insertion of invalid data per the specification of the constraint trigger, 
> isn't something I've worked through yet; and as written reads like a bug 
> report.

It's not a report of a PG bug. Rather, it shows how an application programmer 
might write a bug in *their* code. When "set constraints all immediate" is 
used, it opens a race condition window between its execution and the commit. 
I'd speculated on that earlier. So I felt that I should show a self-contained 
demo of this possible trap for the application programmer. As long as "set 
constraints all immediate" is not used, the demo shows proper behavior. Of 
course, it moves the constraint violation error to commit time—and this means 
that PL/pgSQL code cannot handle it (as discussed at length elsewhere).
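
Here's a minimal sketch of that in-line handling (made-up values; it assumes a 
deferred constraint trigger on t2 whose function raises an error for negative v):

do $body$
begin
  insert into t2(v) values (-1);
  -- Fire the pending deferred trigger now, inside the transaction, so that
  -- any violation is raised here - where it can be handled - not at commit.
  set constraints all immediate;
exception
  when others then
    raise info 'violation handled in-line: %', sqlerrm;
end;
$body$;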

I appreciate that using "set constraints all immediate" is useful in many 
scenarios to allow handling the violation error in PL/pgSQL. (Thanks again for 
that tip, David.) For example, and as I reason it, the famous "mandatory 
one-to-one relationship" use-case is safe when you use this technique. This 
use-case needs mutual FK constraints between the two tables. But you have to 
insert one of the two rows (call it "main") that are so related before the 
other (call it "extra"). And so the mutual FK constraints cannot be satisfied 
until both new rows are in place. 
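
Here's a minimal sketch of that use-case (with made-up names; note that only 
one of the two mutual FK constraints needs to be deferrable):

create table main(k int primary key);
create table extra(k int primary key);

alter table main add constraint main_needs_extra
  foreign key (k) references extra(k)
  deferrable initially deferred;

alter table extra add constraint extra_needs_main
  foreign key (k) references main(k);

begin;
insert into main(k) values (1);   -- deferred FK: no error yet
insert into extra(k) values (1);  -- immediate FK: satisfied, main(1) exists
set constraints all immediate;    -- surface any violation here, catchably
commit;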

consistent postgresql snapshot

2022-05-11 Thread Zwettler Markus (OIZ)
PostgreSQL12 running on CentOS7 + ext4.

Would it be safe to do a "fsfreeze --freeze" + disk snapshot + "fsfreeze 
--unfreeze" of the PGDATA filesystem to get a consistent snapshot?

I am wondering because the PostgreSQL default block size is 8k while the ext4 
default block size is 4k, resulting in two filesystem blocks per database block.

Thanks, Markus



Re: PLPGSQL - extra column existence in trigger

2022-05-11 Thread hubert depesz lubaczewski
On Sat, May 07, 2022 at 07:41:44AM -0700, David G. Johnston wrote:
> No.  I’d probably approach this by generically converting the NEW record to
> json and working with that.  Non-existent object keys return null when
> accessed.

One note - in my tests working with hstore was significantly faster than
json.

It could have changed since I wrote it, but you might want to check it
out:
https://www.depesz.com/2021/04/21/getting-value-from-dynamic-column-in-pl-pgsql-triggers/
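
For reference, a minimal sketch of both variants (made-up function and column 
names; the hstore variant assumes the extension is installed):

create extension if not exists hstore;

create function maybe_col_fn()
  returns trigger
  language plpgsql
as $body$
begin
  -- A missing key yields NULL in both representations, so this works whether
  -- or not the firing table actually has an "optional_col" column.
  raise info 'json: %, hstore: %',
    to_jsonb(new) ->> 'optional_col',
    hstore(new) -> 'optional_col';
  return new;
end;
$body$;

-- Attach with, e.g.:
--   create trigger maybe_col before insert on some_table
--   for each row execute function maybe_col_fn();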

depesz




Re: Deferred constraint trigger semantics

2022-05-11 Thread Laurenz Albe
On Tue, 2022-05-10 at 17:46 -0700, Bryn Llewellyn wrote:
> I looked at the sections "CREATE TRIGGER" and "Chapter 39. Triggers" in the 
> Current PG doc.
> But I failed to find any information about the semantics of the deferred 
> constraint trigger
> or about the use cases that motivated this feature. Nor could I find any code 
> examples.
> Internet Search turned up this 2019 post by Laurenz Albe—but nothing else 
> at all.
> 
> https://www.cybertec-postgresql.com/en/triggers-to-enforce-constraints/
> 
> (This is why I CC'd you, Laurenz.)

So I guess I should answer.

About the starting paragraph of your mail: Constraint triggers are a syntactic 
leftover
from the way that triggers are implemented in PostgreSQL.  There is different 
syntax now,
but it was decided to leave constraint triggers, since they may have some use.

> [Lots of ruminations and wandering thoughts]

Sorry, that was too much for me to comment on - that would require a mid-sized
article.

> Is my entire concept (and Laurenz's too) fundamentally flawed? Specifically, 
> is
> querying a trigger's base table in a "for each row" trigger fundamentally 
> unsound
> and not supported? (In Oracle Database, it causes the notorious "mutating 
> table"
> runtime error.)

My post claims that constraint triggers alone are *not* a sufficient solution to
validate constraints - you need additional locking or SERIALIZABLE isolation to
make that work reliably.
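
For illustration, a minimal sketch of the locking variant (with made-up names): 
the trigger function takes a self-conflicting table lock before validating, so 
concurrent writers serialize and each check sees the other's committed rows, 
even under READ COMMITTED:

create table tab(k serial primary key, v int not null);

create function check_no_dup_v()
  returns trigger
  language plpgsql
as $body$
begin
  -- SHARE ROW EXCLUSIVE conflicts with itself and with ROW EXCLUSIVE (the
  -- lock every INSERT/UPDATE/DELETE takes), so concurrent writers block
  -- until this transaction ends; their checks then run against a fresh
  -- snapshot that includes our committed rows.
  lock table tab in share row exclusive mode;
  if exists (select 1 from tab group by v having count(*) > 1) then
    raise exception 'duplicate value of tab.v';
  end if;
  return null;
end;
$body$;

create trigger tab_check
after insert or update on tab
for each row
execute function check_no_dup_v();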

That does not mean that using constraint triggers is unsound or unsupported,
and the fact that Oracle's implementation of transaction isolation is somewhat
shoddy has little impact on that.

Yours,
Laurenz Albe