RE: Insert data if it is not existing

2018-05-23 Thread Steven Winfield
INSERT … ON CONFLICT DO UPDATE … is probably what you want, perhaps using a 
specified unique index/constraint:
https://www.postgresql.org/docs/10/static/sql-insert.html#SQL-ON-CONFLICT
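
Something like this minimal sketch, assuming a hypothetical table “account” keyed on id (names are illustrative, not from your schema):

CREATE TABLE account (id integer PRIMARY KEY, email text, visits integer NOT NULL DEFAULT 0);

-- insert only if the row doesn't already exist
INSERT INTO account (id, email) VALUES (1, 'j@example.com')
ON CONFLICT (id) DO NOTHING;

-- or a true upsert: update the existing row on conflict
INSERT INTO account (id, email) VALUES (1, 'j@example.com')
ON CONFLICT (id) DO UPDATE SET visits = account.visits + 1;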

Steve.

From: tango ward [mailto:tangowar...@gmail.com]
Sent: 23 May 2018 10:04
To: pgsql-generallists.postgresql.org
Subject: Insert data if it is not existing


Hi,

I just want to ask if it's possible to insert data only if it doesn't exist 
yet. I was able to play around with UPSERT before, but that was when there was 
an error for duplicate data. In my scenario, no error message is showing.


Any suggestion?


Thanks,
J



RE: Insert data if it is not existing

2018-05-23 Thread Steven Winfield
From the docs:
“ON CONFLICT can be used to specify an alternative action to raising a unique 
constraint or exclusion constraint violation error.”

So if the INSERT part succeeds then the ON CONFLICT part is never executed.
If the INSERT fails due to a violation of the constraint you specified (or 
the one implied) then the ON CONFLICT part is executed instead. An UPDATE here 
can raise further errors, of course.
If the INSERT fails for a different reason then the ON CONFLICT part is not 
executed.
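
To illustrate the three cases with a toy (hypothetical) table:

CREATE TABLE t (id integer PRIMARY KEY, val integer NOT NULL);

INSERT INTO t VALUES (1, 10)
ON CONFLICT (id) DO NOTHING;  -- inserts; the ON CONFLICT part never runs

INSERT INTO t VALUES (1, 20)
ON CONFLICT (id) DO NOTHING;  -- t_pkey is violated, so DO NOTHING runs instead

INSERT INTO t VALUES (2, NULL)
ON CONFLICT (id) DO NOTHING;  -- still raises an error: a NOT NULL violation is not a conflict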

Steve.

From: tango ward [mailto:tangowar...@gmail.com]
Sent: 23 May 2018 10:46
To: Steven Winfield
Cc: pgsql-generallists.postgresql.org
Subject: Re: Insert data if it is not existing

thanks for the response Steven.

Will ON CONFLICT DO UPDATE/NOTHING if there's no error?

On Wed, May 23, 2018 at 5:43 PM, Steven Winfield 
<steven.winfi...@cantabcapital.com> wrote:
INSERT … ON CONFLICT DO UPDATE … is probably what you want, perhaps using a 
specified unique index/constraint:
https://www.postgresql.org/docs/10/static/sql-insert.html#SQL-ON-CONFLICT

Steve.


From: tango ward [mailto:tangowar...@gmail.com]
Sent: 23 May 2018 10:04
To: pgsql-generallists.postgresql.org
Subject: Insert data if it is not existing


Hi,

I just want to ask if it's possible to insert data only if it doesn't exist 
yet. I was able to play around with UPSERT before, but that was when there was 
an error for duplicate data. In my scenario, no error message is showing.


Any suggestion?


Thanks,
J






RE: Index Gin Creation is taking long time..

2018-06-28 Thread Steven Winfield



From: Anto Aravinth [mailto:anto.aravinth@gmail.com]
Sent: 28 June 2018 11:34
To: pgsql-generallists.postgresql.org
Subject: Index Gin Creation is taking long time..

Hello,

I'm trying to create an index:


create index search_idx on so2 using gin (to_tsvector('english',posts));

Looks like it's been running for at least 8 hours :( In total I have 47M 
records in so2. Not sure why it's taking so long.

Any idea or tips to debug while the index creation is going on?

Thanks,
Anto.



· Try increasing maintenance_work_mem and restarting the index build (see 
the sketch below)? https://www.postgresql.org/docs/10/static/gin-tips.html

· “top -p <pid>” can show you whether you are CPU- or I/O-bound.

· “perf top -p <pid>” on the backend that is doing the index creation 
is a good way to see where it is spending time.
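
For the first point, something like this (the '2GB' value is illustrative - 
size it to your hardware):

SET maintenance_work_mem = '2GB';  -- session-local; only affects this backend
CREATE INDEX search_idx ON so2 USING gin (to_tsvector('english', posts));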

Good luck!

Steven.



RE: JSONB arrays

2018-06-28 Thread Steven Winfield
  From: Victor Noagbodji [mailto:vnoagbo...@amplify-nation.com]
  Sent: 28 June 2018 11:28
  To: pgsql-general
  Subject: JSONB arrays

  Hey people,

  I ended up with the following to check (or, similarly, to return) the 
intersection of two lists of values on a JSONB object:

  array_length(
  array(
  select jsonb_array_elements_text(col1)
  intersect
  select jsonb_array_elements_text(col2)
  ), 1) > 0

  Is there a better way?

  Thanks


Try the @> operator:

select '[1, 2, 3, 4]'::jsonb @> '[4, 2]'::jsonb
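
Note that @> is a containment test - it asks whether *all* elements of the 
right-hand array are present. If what you want is “any overlap” (a non-empty 
intersection) and your elements are strings, the ?| operator may be closer to 
what you described:

select '["a", "b", "c"]'::jsonb ?| array['c', 'x'];  -- true: at least one element in common

(?| only matches string elements, so numeric values like the example above 
would still need the unnest-and-intersect approach, or a cast.)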


Steve.



RE: Extremely slow autovacuum:vacuum to prevent wraparound

2018-07-11 Thread Steven Winfield

From: Rijo Roy [mailto:rjo_...@yahoo.com]
Sent: 11 July 2018 14:30
To: srkris...@yahoo.com
Cc: pgsql-gene...@postgresql.org
Subject: Re: Extremely slow autovacuum:vacuum to prevent wraparound

No, I have 3 sessions of autovacuum against 3 tables, each doing a vacuum to 
prevent wraparound as it hit the autovacuum_freeze_max_age limit of 2 billion.
I also have a vacuumdb session, invoked by me on these tables, which is 
currently paused. So I want to know whether I can kill the autovacuum sessions, 
which have been running for 5 days and are extremely slow, and just run a 
manual vacuum against these tables instead.


Is your autovacuum_freeze_max_age really set to 2 billion? The default value is 
200 million. Setting it that high and disabling autovacuum isn’t just silly - 
it borders on sabotage!
I’ve used pg_cancel_backend() (NOT pg_terminate_backend()) on autovacuum jobs 
before without issue.
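
If it helps, the wraparound workers can be found and cancelled with something 
like this (a sketch; the backend_type column needs v10+):

SELECT pid, query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
-- the query column should read 'autovacuum: VACUUM ... (to prevent wraparound)'

SELECT pg_cancel_backend(12345);  -- substitute a pid from the query above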

Good luck!

Steve.



RE: Extremely slow autovacuum:vacuum to prevent wraparound

2018-07-11 Thread Steven Winfield
From: Rijo Roy [mailto:rjo_...@yahoo.com]
Sent: 11 July 2018 15:14
To: Steven Winfield
Cc: pgsql-gene...@postgresql.org
Subject: RE: Extremely slow autovacuum:vacuum to prevent wraparound

Hi Steve,

Apologies, I misread it as 2 billion; it is, as you said, set to 200 million, 
which is the default value for autovacuum_freeze_max_age.
I just wanted to confirm whether there would be any impact if I cancel or 
terminate the backend for the existing autovacuum: VACUUM table_name (to 
prevent wraparound). I have initiated a normal vacuum manually but it is stuck 
on the table that the autovacuum is processing.

Thanks,
Rijo Roy


Stack Overflow confirms my memories about killing autovacuum - 
pg_cancel_backend() should be fine.
https://stackoverflow.com/questions/18048842/danger-in-killing-autovacuum-vacuum-queries-to-prevent-wraparound

Steve.


RE: Advice on machine specs for growth

2018-09-20 Thread Steven Winfield
> Disable memory overcommit and set swappiness to 0 on database servers.

Just to be clear, are you suggesting (on Linux) setting overcommit_memory = 2 
but leaving overcommit_ratio alone, which defaults to 50%?
I’ve tried setting them to 2 and 0% respectively in the past and it didn’t end 
well…

Also I’ve read, and now use, swappiness = 1, which is supposed to disable 
swapping entirely except when not swapping would cause OOM.
Any thoughts on that?

Cheers,
Steve.



RE: Incremental aggregate/rollup strategy advice

2019-07-08 Thread Steven Winfield
There are a couple of extensions that might help you:

PipelineDB[1]: its “Continuous Views” could be useful. A big caveat here is 
that PipelineDB’s future is uncertain[2], though.
I haven’t used it myself, but it looks like you could put triggers onto your 
existing tables to insert data into PipelineDB “Streams”, then build whatever 
continuous views are needed around those streams.

The other is TimescaleDB[3], which has “Continuous Aggregates”, but they are 
fairly new and currently have some limitations.

Steve.

[1] https://www.pipelinedb.com/
[2] https://www.pipelinedb.com/blog/pipelinedb-is-joining-confluent
[3] https://www.timescale.com/





RE: How to run a task continuously in the background

2019-07-11 Thread Steven Winfield
pg_cron, perhaps?
https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/

I _think_ it ticks all three of your boxes.
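
Usage is plain SQL - a sketch (the scheduled command and job id are made up):

CREATE EXTENSION pg_cron;

-- run a command every minute; returns a job id
SELECT cron.schedule('* * * * *', $$SELECT my_background_task()$$);

-- stop it again
SELECT cron.unschedule(1);  -- 1 = the job id returned above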


Steve.





RE: Preventing in-session 'set role' commands

2019-08-06 Thread Steven Winfield
Maybe check out the set_user extension:
https://github.com/pgaudit/set_user

Steve.


Row locks, SKIP LOCKED, and transactions

2019-12-17 Thread Steven Winfield
Hi all,

I'm seeing some unexpected behaviour with SELECT ... FOR UPDATE SKIP LOCKED, 
and am finding it tricky to boil down to a simple repro case, as there's 
almost certainly a race condition somewhere (more later). So I thought I would 
ask if what I'm doing is unsupported (or just plain wrong!) before expending 
more effort in reproducing it.

I'm running v11.5, RHEL 7.7.

I have two tables, job and result:
CREATE TABLE job (id integer PRIMARY KEY);
CREATE TABLE result (id integer PRIMARY KEY);
(obviously the real tables have more columns, but that's not too important here)

Something populates the job table with IDs. 
A job is done if its id appears in the result table. 
I would like to have multiple worker processes working on jobs.


I thought I could achieve this with each worker doing the following:

BEGIN;

SELECT id 
FROM job
WHERE NOT EXISTS (SELECT 1 FROM result WHERE result.id = job.id)
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- worker process does some work for the selected ID here

INSERT INTO result (id) VALUES (the_id_from_above);

COMMIT;


However, even with just two worker processes, I quickly found that one worker 
process would be assigned a job id that had *very* recently been completed by 
the other worker.

Some more potentially useful information:
* The LockRows node of the plan for the SELECT query above doesn't receive any 
tuples until about a second after the query begins executing
* If worker 2 begins querying for a new job id half a second before worker 1 
commits then worker 2 will pick up the job id that worker 1 has just finished 
with.
* I observe this even if I crank up the transaction isolation level to 
repeatable read and serializable.


I'm wondering if row locks are not obeying the same transactional semantics as 
row data, as a potential explanation for the above behaviour is as follows 
(W1/2 = Worker 1/2):

W1: BEGIN;
W1: SELECT ...
W1: (SELECT returns id=1. W1 now has job(id=1) locked.)
W1: INSERT INTO result (id) VALUES (1)

W2: BEGIN;
W2: SELECT ... 

W1: COMMIT; job(id=1) is now unlocked.

W2: (SELECT returns id=1: W1 had not committed when the SELECT started, so 
result(id=1) is not visible, but LockRows found that job(id=1) was not locked. 
W2 now has job(id=1) locked.)


...i.e. W2's SELECT could not see the row that W1 INSERTed (because W2's BEGIN 
occurs and W2's SELECT begins before W1's commit), but W2's SELECT *could* see 
the removal of W1's row lock. 


Perhaps this is a misuse of the locking system, since I'm locking a row "FOR 
UPDATE" but not actually updating it, but as row locks are released at the end 
of a transaction (according to the docs) then my expectation was for the 
unlocking and the visibility of newly committed rows to be atomic.
I've tried FOR NO KEY UPDATE too, without luck.

If I'm doing something forbidden (and the docs say so) then I'd be grateful if 
someone could point that out!

Best,
Steven.





RE: Row locks, SKIP LOCKED, and transactions

2019-12-18 Thread Steven Winfield
>> * I observe this even if I crank up the transaction isolation level to 
>> repeatable read and serializable.
>> 
>> 
>> I'm wondering if row locks are not obeying the same transactional semantics 
>> as row data, 


>Gotta believe it is this:
>
>https://www.postgresql.org/docs/11/transaction-iso.html#XACT-READ-COMMITTED
>
>"UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands ..."
>
>If I read correctly, transactions can see the effects of other 
>transactions that commit during their lifetime.

Thanks. I had a look at those docs when I first encountered the issue (if it 
can be called that), which prompted me to try repeatable read and serializable 
isolation levels, but to no avail. I couldn't find anything specifically 
mentioning the visibility of row locks at different isolation levels.





RE: Row locks, SKIP LOCKED, and transactions

2019-12-18 Thread Steven Winfield
> Huh.  SERIALIZABLE shouldn't allow two transactions to see no result row
> for a given ID and then insert a result row for that ID.  One of those
> transactions should have to roll back, because otherwise it'd be
> incompatible with both serial orderings of the two transactions.

Sorry for the misunderstanding - I wasn't suggesting that.
Even at the serializable level, W2 can see a row that is unlocked by W1's 
commit despite W2's snapshot being taken before W1 commits.
Carrying on my example, W2 would indeed fail to insert a result(id=1) row.

> Conceptually, the thing you really need to lock for this to work is the
> result row that isn't there yet, so that some overlapping transaction
> doesn't try to lock the same absent thing.  Unfortunately, our system for
> locking things that aren't there isn't there either.
> Some articles on serializability talk about "materialising the conflict",
> which means locking some other surrogate thing that "covers" a gap you are
> interested in.  You might think the job row would do the trick, but since
> we don't recheck the condition (that is, recheck that there is no
> corresponding result because you don't update the job row), no cigar. 

I like the concept of "materialising the conflict", that’s a useful way of 
thinking about it - thanks.

> You could also use plain old pg_try_advisory_xact_lock(id), because it just
> locks integers, and they're always there.

Yeah, I tried this, and might give it another go. A naïve attempt failed for a 
similar reason.

> 
> SERIALIZABLE deals with that type of magic internally (it locks gaps in
> key ranges by predicate-locking a physical btree or hash page that you'd
> need to write on to insert a row with a matching key, which is how it
> discovers a conflict between one transaction that went looking for key=42
> but didn't find it and another that later writes key=42), but, as
> mentioned, SERIALIZABLE doesn't really allow concurrency with this
> workload, and you specified that you wanted concurrency with SKIP LOCKED
> (but I think you'd have the same problem without it; SKIP LOCKED just gets
> you the wrong answer faster).
> 
> There are various ways you could deal with this, but I'd probably go for a
> simple scheme where you only have to consult a single row to know if you
> can claim it.  You could still put the results into a separate table, but
> use job.state to find work, and set it to DONE when you insert the result.
> It may also be possible to add no new columns but do a dummy update to the
> job row to get the join qual rechecked, but I'm not sure if that'd work.
> Another reason to add a state column to the job table is so that you can
> put a conditional index on it so you can find jobs to be done very
> quickly, if you're not planning to remove the ones that are done.

Thanks. I rejected the idea of doing a dummy update to the locked row as I 
wanted to avoid too much extra WAL - the real table originally had quite a few 
more columns than the toy example, but it's much slimmer now so this could be a 
viable option.
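
For reference, the single-row-consultation scheme suggested above might look 
like this (a sketch; the state column and its values are hypothetical):

ALTER TABLE job ADD COLUMN state text NOT NULL DEFAULT 'PENDING';
CREATE INDEX job_pending_idx ON job (id) WHERE state = 'PENDING';  -- the conditional index

BEGIN;

SELECT id FROM job
WHERE state = 'PENDING'
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... do the work ...

INSERT INTO result (id) VALUES (the_id_from_above);
UPDATE job SET state = 'DONE' WHERE id = the_id_from_above;

COMMIT;

Under READ COMMITTED the WHERE clause on the locked job row is re-evaluated 
against its latest version once the lock is acquired, so a job that another 
worker has just marked DONE should no longer be claimable - which is exactly 
the single-row check described above.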




RE: Row locks, SKIP LOCKED, and transactions

2019-12-19 Thread Steven Winfield
> (Or you could use serializable mode, but that feels like using a hammer to 
> swat a fly.)

Do you mean the serializable transaction isolation level? Because that doesn't 
work either. Here (finally) is a tiny repro case. You'll need 2 psql sessions 
(S1, S2):

S1: CREATE TABLE t (id integer);
S1: INSERT INTO t VALUES (1);
S1: BEGIN;
S1: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
S1: SELECT id FROM t FOR UPDATE;

(So now there is a single, globally visible row that S1 has a lock on)

S2: BEGIN;
S2: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
S2: SELECT id FROM t;  -- returns 1 row, as expected
S2: SELECT id FROM t FOR UPDATE SKIP LOCKED; -- returns 0 rows, as expected

S1: ROLLBACK;  -- S1's row lock is released

S2: SELECT id FROM t FOR UPDATE SKIP LOCKED; -- returns 1 row

...i.e. the row appears unlocked to S2 despite its transaction's snapshot being 
taken before the lock was released.


I'm going to use the suggestions made by you and others previously in this 
thread, so (for me at least) this is now just academic, but I'm still 
interested to know if the above behaviour is expected, and if I should have 
been able to deduce it from the docs. The best I could find is:

https://www.postgresql.org/docs/11/sql-select.html
"With SKIP LOCKED, any selected rows that cannot be immediately locked are 
skipped. Skipping locked rows provides an inconsistent view of the data, so 
this is not suitable for general purpose work, but can be used to avoid lock 
contention with multiple consumers accessing a queue-like table."

Thanks for your (and everyone else's) help,

Steve.





RE: why select count(*) consumes wal logs

2018-11-07 Thread Steven Winfield
As long as you don’t have page checksums turned on, you can prevent this by 
turning off wal_log_hints.
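
i.e. something like this (note wal_log_hints needs a server restart to change):

SHOW wal_log_hints;
ALTER SYSTEM SET wal_log_hints = off;
-- then restart the server and re-check with SHOW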

Steve.





RE: why select count(*) consumes wal logs

2018-11-07 Thread Steven Winfield
> How to find out which parameter were used with initdb

pg_controldata -D <data-directory> | grep sum
...should give you something like:
Data page checksum version:   0
...and 0 means off.

Similarly, from SQL:
select data_page_checksum_version from pg_control_init()

Steve.





RE: Questions about btree_gin vs btree_gist for low cardinality columns

2019-06-03 Thread Steven Winfield
> In the case of a single column with a small set of distinct values over a 
> large set of rows, how would a Bloom filter be preferable to, say, a GIN 
> index on an integer value? 

I don't think it would - it's probably better suited to the multi-column case 
you described previously.

> I have to say, this is actually a good reminder in my case. We've got a lot 
> of small-distinct-values-big-rows columns. For example, "server_id", 
> "company_id", "facility_id", and so on. Only a handful of parent keys with 
> many millions of related rows. Perhaps it would be conceivable to use a Bloom 
> index to do quick lookups on combinations of such values within the same 
> table. I haven't tried Bloom indexes in Postgres, this might be worth some 
> experimenting.

Yes, this is more like the use case I was thinking of.
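
A minimal sketch of what I had in mind, with made-up table/column names:

CREATE EXTENSION IF NOT EXISTS bloom;

-- one small index covering several equality-searchable columns
CREATE INDEX facts_bloom_idx ON facts USING bloom (server_id, company_id, facility_id);

-- can then help queries like:
-- SELECT * FROM facts WHERE company_id = 7 AND facility_id = 42;

(Note that bloom indexes only support equality, and matches are rechecked 
against the heap since the filter can produce false positives.)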

Steve.


A little RULE help?

2018-01-25 Thread Steven Winfield
Hi all,

I'm trying to create a temporal table, from which data is never really deleted 
but each row "exists" for only a certain period of time, and a view of this 
table showing the currently "live" rows.
I would also like the view to accept INSERT, UPDATE, and DELETEs, and perform 
the relevant operations on the parent table.
I have got everything working by using RULEs on the view, except for UPDATES, 
where every attempt has failed - my UPDATE rule seems to have the same effect 
on the table as a DELETE, but no rows are returned when using a RETURNING 
clause. I can't see what I'm doing wrong.
I could use a TRIGGER instead (as shown below) but I'd rather not incur the 
extra overhead if there is a RULE out there that works.
I'm running v10.1, but this example should be compatible with at least v9.6.

Any help would be greatly appreciated!

Thanks,

Steve.



Here is some code to set up the example

CREATE EXTENSION IF NOT EXISTS btree_gist;

DROP TABLE IF EXISTS rule_test CASCADE;

CREATE TABLE rule_test (
id serial PRIMARY KEY,
tt tstzrange NOT NULL,
foo integer NOT NULL,
bar double precision NOT NULL,
EXCLUDE USING gist (tt WITH &&, foo WITH =)
);

CREATE TABLE rule_test_view ( LIKE rule_test );

CREATE RULE "_RETURN" AS ON SELECT TO rule_test_view DO INSTEAD
SELECT * FROM rule_test WHERE tt @> CURRENT_TIMESTAMP;

CREATE RULE rule_test_insert AS ON INSERT TO rule_test_view DO INSTEAD
INSERT INTO rule_test (tt, foo, bar) VALUES 
(tstzrange(CURRENT_TIMESTAMP, NULL, '[)'), NEW.foo, NEW.bar) RETURNING *;

CREATE RULE rule_test_delete AS ON DELETE TO rule_test_view DO INSTEAD
UPDATE rule_test SET tt = tstzrange(lower(tt), CURRENT_TIMESTAMP, '[)') 
WHERE id = OLD.id RETURNING *;

-- What I'd like to be able to do here is this:

-- CREATE RULE rule_test_update AS ON UPDATE TO rule_test_view DO INSTEAD (
--     UPDATE rule_test SET tt = tstzrange(lower(tt), CURRENT_TIMESTAMP, '[)') WHERE id = OLD.id;
--     INSERT INTO rule_test (tt, foo, bar) VALUES (tstzrange(CURRENT_TIMESTAMP, NULL, '[)'), NEW.foo, NEW.bar) RETURNING *;
-- );

-- or, using the rules already defined, this:

-- CREATE RULE rule_test_update AS ON UPDATE TO rule_test_view DO INSTEAD (
--     DELETE FROM rule_test_view WHERE id = OLD.id;
--     INSERT INTO rule_test_view (foo, bar) VALUES (NEW.foo, NEW.bar) RETURNING *;
-- );

-- but I can only get the desired effect using this trigger:


CREATE OR REPLACE FUNCTION rule_test_update ()
RETURNS trigger
AS
$$
BEGIN
DELETE FROM rule_test_view WHERE id = OLD.id;
INSERT INTO rule_test_view (foo, bar) VALUES (NEW.foo, NEW.bar) RETURNING tt INTO NEW.tt;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER rule_test_update_trig INSTEAD OF UPDATE ON rule_test_view FOR 
EACH ROW EXECUTE PROCEDURE rule_test_update();




...And here is some code to test DML on the view



-- ###

-- 1. Insert some data, and check that RETURNING works
INSERT INTO rule_test_view (foo, bar) VALUES (1, 1.0), (2, 2.0), (3, 3.0) 
RETURNING *;

-- 2. Check that tt has been populated for each row
TABLE rule_test;

-- 3. Check that all rows are visible in the view
TABLE rule_test_view;

-- 4. "Delete" one row, and check that RETURNING works
DELETE FROM rule_test_view WHERE id = 1 RETURNING *;

-- 5. Check that the row still exists in the table, but the upper limit of tt is now set
TABLE rule_test;

-- 6. Check that the "deleted" row is no longer visible in the view
TABLE rule_test_view;

-- 7. "Update" one row, and check that RETURNING works
UPDATE rule_test_view SET bar = 3.141 WHERE foo = 3 RETURNING *;

-- 8. Check that the old version still exists in the table, but the upper limit of tt is now set,
--    and a new version now exists, holding the new value
TABLE rule_test;

-- 9. Check that the old version of the row is no longer visible in the view
TABLE rule_test_view;



RE: A little RULE help?

2018-01-25 Thread Steven Winfield
> Don't use RULEs. HTH.

Why not?


RE: A little RULE help?

2018-01-25 Thread Steven Winfield
Maybe this?:

https://www.postgresql.org/docs/10/static/sql-createrule.html

"There is a catch if you try to use conditional rules for complex view
updates: there must be an unconditional INSTEAD rule for each action you
wish to allow on the view. If the rule is conditional, or is not
INSTEAD, then the system will still reject attempts to perform the
update action, because it thinks it might end up trying to perform the
action on the dummy table of the view in some cases. If you want to
handle all the useful cases in conditional rules, add an unconditional
DO INSTEAD NOTHING rule to ensure that the system understands it will
never be called on to update the dummy table. Then make the conditional
rules non-INSTEAD; in the cases where they are applied, they add to the
default INSTEAD NOTHING action. (This method does not currently work to
support RETURNING queries, however.)"

Thanks, I saw that - but none of my rules are conditional, and they are all 
INSTEAD rules, so I didn't think that was the problem.

FYI, I gave up on RULEs a while back. Triggers are a lot easier to
figure out and maintain.

These rules (including the non-working UPDATE rule) seemed fairly small and 
quite readable, so I didn't think maintenance would be a problem.
Debugging this is a bit harder, however...

Steve.


RE: A little RULE help?

2018-01-26 Thread Steven Winfield
On 01/25/2018 03:14 PM, Steven Winfield wrote:
>
> https://www.postgresql.org/docs/10/static/rules-triggers.html
>
> “For the things that can be implemented by both, which is best depends
> on the usage of the database. A trigger is fired once for each
> affected row. A rule modifies the query or generates an additional
> query. So if many rows are affected in one statement, a rule issuing
> one extra command is likely to be faster than a trigger that is called
> for every single row and must re-determine what to do many times.
> However, the trigger approach is conceptually far simpler than the
> rule approach, and is easier for novices to get right.”
>

Well like I said, it may be better for what you are doing. I am not sure,
but I can say from personal implementation experience that for old
school partitioning (i.e. everything before 10), triggers were so much
faster than rules that the general rule was, "don't use rules". That may
have changed since that experience.

JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own. *


There have been comments on- and off-list about rules generally being slower 
than triggers, which seemed counterintuitive (for my case at least), so I’ve 
done some timings.
(Also note that I haven’t done any partitioning here, new- or old-style - a few 
people have mentioned RULEs in relation to partitioning.)
These compare the INSERT and DELETE rules against the equivalent INSTEAD OF 
triggers. Best of three runs each time; times are in seconds.

rows            10^4   10^5   10^6
insert rule      0.9   15.0    179
insert trigger   1.3   19.7    224
delete rule      1.8   22.8    282
delete trigger   2.3   28.0    331

…so the rules are about 20% faster than the triggers. Significant, but not 
game-changing.
Note that this is on quite close to a “real life” table too - there is the 
maintenance of the primary key index and the gist index that supports the 
exclude constraint in all those timings, so a table without those would have 
shown a bigger disparity between the two methods.
This makes sense - the RULEs just add one query rewrite before the statement is 
planned and executed, whereas the TRIGGERs' function has to be executed once 
per row.

Back to my original attempt at writing an UPDATE rule…

CREATE RULE rule_test_update AS ON UPDATE TO rule_test_view DO INSTEAD (
UPDATE rule_test SET tt = tstzrange(lower(tt), CURRENT_TIMESTAMP, '[)') 
WHERE id = OLD.id;
INSERT INTO rule_test (tt, foo, bar) VALUES 
(tstzrange(CURRENT_TIMESTAMP, NULL, '[)'), NEW.foo, NEW.bar) RETURNING *;
);

…I wondered whether the pseudo relations NEW and OLD were somehow being 
modified by the first command (the UPDATE), such that the second command 
(INSERT) could not function properly. That would fit with what I observe, but 
I’m not sure how I go about proving or fixing it.

Best,
Steven.


Many Backends stuck in wait event IPC/ParallelFinish

2018-01-30 Thread Steven Winfield
Hi,

We just had an incident on one of our non-production databases where 14 
unrelated queries were all hung in wait event IPC / ParallelFinish. We had 
systematically called pg_cancel/terminate_backend on all other backends except 
these (and the autovacuum process mentioned below) to make sure there wasn't 
some other resource that they were deadlocked on.

We attached gdb to a number of the backends, and found their backtraces to look 
like this:

#0  0x7f9ea3e77903 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x0077cb5e in WaitEventSetWait ()
#2  0x0077d149 in WaitLatch ()
#3  0x004f1d75 in WaitForParallelWorkersToFinish ()
#4  0x006294e7 in ExecParallelFinish ()
#5  0x0063a57d in ExecShutdownGather ()
...
#6  0x00629978 in ExecShutdownNode ()   <-- Then zero or more of
#7  0x00676c01 in planstate_tree_walker ()  <-- this pair
...
#10 0x00629925 in ExecShutdownNode ()
#11 0x0062494e in standard_ExecutorRun ()
#12 0x7f9e99d73f5d in pgss_ExecutorRun () from 
/remote/install/sw/external/20180117-4-64/lib/pg_stat_statements.so
#13 0x007a5c24 in PortalRunSelect ()
#14 0x007a7316 in PortalRun ()
#15 0x007a2b49 in exec_simple_query ()
#16 0x007a4157 in PostgresMain ()
#17 0x0047926f in ServerLoop ()
#18 0x007200cc in PostmasterMain ()
#19 0x0047af97 in main ()

We also sent one of the backends a SIGABRT, so we have a core dump to play 
with. The only other backend running at the time was an autovacuum process, 
which may also have been hung - it didn't have a wait event in 
pg_stat_activity, but I didn't get a chance to strace it or attach gdb as the 
database restarted itself after we sent the SIGABRT.

The host is running Postgres v10.1 on RHEL7.4.

Any ideas what could have caused this, or what we could do to investigate this 
further?

Thanks,
Steve.


RE: Working with JSONB data having node lists

2018-01-31 Thread Steven Winfield
If you can do this one without unnesting the downloads array I do not know how. 
Having done that: (WHERE dlarray->'pubid' = '123')

I think it’s doable like this:

select * from contacts where data @> '{"downloads":[{"pubid":123}]}'::jsonb

…which would be aided by a gin index on ‘data’ using jsonb_path_ops:

CREATE INDEX idx_data_path_ops on contacts USING gin (data jsonb_path_ops);
https://www.postgresql.org/docs/10/static/datatype-json.html#JSON-INDEXING

Best,
Steve.





RE: A little RULE help?

2018-01-31 Thread Steven Winfield
No. It's because your view's WHERE clause is being added to your INSERT
and of course nothing matches, so nothing gets inserted.

> That would fit with what I observe, but I’m not sure how I go about
> proving
EXPLAIN

Thanks. You’re right - I should have thought to use EXPLAIN before posting here.

Don't use RULEs.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr    PostgreSQL : Expertise, Formation et Support


Repeating this as if I should already know it is less helpful, though. It’s 
not as if the docs tell users to steer clear of RULEs.
If there is something I don’t understand, such as RULEs, I prefer to experiment 
with it and ask questions if there is something I still don’t follow. I thought 
this was the correct forum for those questions.