[GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
Why does the UPDATE ... SET ... FROM form choose a more poorly performing plan than the UPDATE ... SET = (SELECT ...) form? It seems to me that it is the same join. I'm using 9.3.5.

CREATE TABLE orig (
    key1   VARCHAR(11) PRIMARY KEY,
    time1  TIME
);

INSERT INTO orig (key1, time1)
    SELECT a::TEXT,
           (((random()*100)::INT % 24)::TEXT || ':' ||
            ((random()*100)::INT % 60)::TEXT)::TIME
      FROM generate_series(800, 8000200) a;

CREATE INDEX odx ON orig(key1);

CREATE TABLE second (LIKE orig);

INSERT INTO second (key1)
    SELECT (800+(((random()*100)::INT) % 100))::TEXT
      FROM generate_series(1,40);

EXPLAIN ANALYZE UPDATE second SET time1 = orig.time1 FROM orig
    WHERE second.key1 = orig.key1;

                                   QUERY PLAN
 Update on second  (cost=69461.02..106082.02 rows=40 width=32) (actual time=16033.023..16033.023 rows=0 loops=1)
   ->  Hash Join  (cost=69461.02..106082.02 rows=40 width=32) (actual time=7698.445..12992.039 rows=40 loops=1)
         Hash Cond: ((second.key1)::text = (orig.key1)::text)
         ->  Seq Scan on second  (cost=0.00..12627.00 rows=40 width=18) (actual time=49.820..791.397 rows=40 loops=1)
         ->  Hash  (cost=31765.01..31765.01 rows=201 width=26) (actual time=7648.540..7648.540 rows=201 loops=1)
               Buckets: 4096  Batches: 128  Memory Usage: 717kB
               ->  Seq Scan on orig  (cost=0.00..31765.01 rows=201 width=26) (actual time=0.014..3655.844 rows=201 loops=1)
 Total runtime: 16033.193 ms
(8 rows)

UPDATE second SET time1 = NULL;

EXPLAIN ANALYZE UPDATE second SET time1 =
    (SELECT orig.time1 FROM orig, second WHERE orig.key1 = second.key1 LIMIT 1);

                                   QUERY PLAN
 Update on second  (cost=3.60..19078.19 rows=1279959 width=18) (actual time=4642.453..4642.453 rows=0 loops=1)
   InitPlan 1 (returns $1)
     ->  Limit  (cost=0.43..3.60 rows=1 width=8) (actual time=2.611..2.613 rows=1 loops=1)
           ->  Nested Loop  (cost=0.43..4056331.83 rows=1279959 width=8) (actual time=2.606..2.606 rows=1 loops=1)
                 ->  Seq Scan on second second_1  (cost=0.00..19074.59 rows=1279959 width=12) (actual time=2.487..2.487 rows=1 loops=1)
                 ->  Index Scan using odx on orig  (cost=0.43..3.14 rows=1 width=20) (actual time=0.098..0.098 rows=1 loops=1)
                       Index Cond: ((key1)::text = (second_1.key1)::text)
   ->  Seq Scan on second  (cost=0.00..19074.59 rows=1279959 width=18) (actual time=6.420..817.739 rows=40 loops=1)
 Total runtime: 4642.561 ms
(9 rows)

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] COPY TO returning empty result with parallel ALTER TABLE
Hi all, we experienced what seems to be a bug in the COPY TO implementation. When a table is being rewritten by an ALTER TABLE statement, a parallel COPY TO returns an empty result.

Consider the following table data:

CREATE TABLE test (id INTEGER NOT NULL, PRIMARY KEY (id));
INSERT INTO test (id) SELECT generate_series(1, 1000);

One session does:

ALTER TABLE test ADD COLUMN dummy BOOLEAN NOT NULL DEFAULT FALSE;

This acquires an exclusive lock on the table. Another session now performs, in parallel:

COPY test TO STDOUT;

This blocks on the exclusive lock held by the ALTER statement. When the ALTER statement is finished, the COPY statement returns with an empty result. The same goes for COPY (SELECT ...) TO, whereas the same SELECT executed without COPY blocks and then returns the correct result, as expected.

This is my analysis of the issue: the backend opens the rewritten data files, but it ignores their complete content, which indicates the data is being ignored because of XIDs. For direct SELECT statements, the top-level query parsing acquires locks on the involved tables and creates a new snapshot, but for COPY statements the parsing and locking are done later, in the COPY code. After locking the tables in the COPY code, the data is read with an old snapshot, effectively ignoring all data from the rewritten table.

I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot after acquiring the necessary locks on the source tables, so that it sees any modifications committed by other backends.

While I think this is the correct solution, another solution (or optimization) would be to have ALTER TABLE, which holds the highest lock level, set the XID of rewritten tuples to FrozenXID, as no other backend should access the table before the ALTER TABLE is committed.
Sven

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b83576..fe2d157 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1344,6 +1344,13 @@ BeginCopy(bool is_from,
 					(errcode(ERRCODE_UNDEFINED_COLUMN),
 					 errmsg("table \"%s\" does not have OIDs",
 							RelationGetRelationName(cstate->rel))));
+
+		/*
+		 * Use a new snapshot to ensure this query sees
+		 * results of any previously executed queries.
+		 */
+		if (!is_from)
+			PushActiveSnapshot(GetTransactionSnapshot());
 	}
 	else
 	{
@@ -1394,11 +1401,10 @@ BeginCopy(bool is_from,
 		plan = planner(query, 0, NULL);
 
 		/*
-		 * Use a snapshot with an updated command ID to ensure this query sees
+		 * Use a new snapshot to ensure this query sees
 		 * results of any previously executed queries.
 		 */
-		PushCopiedSnapshot(GetActiveSnapshot());
-		UpdateActiveSnapshotCommandId();
+		PushActiveSnapshot(GetTransactionSnapshot());
 
 		/* Create dest receiver for COPY OUT */
 		dest = CreateDestReceiver(DestCopyOut);
@@ -1741,9 +1747,11 @@ EndCopyTo(CopyState cstate)
 		ExecutorFinish(cstate->queryDesc);
 		ExecutorEnd(cstate->queryDesc);
 		FreeQueryDesc(cstate->queryDesc);
-		PopActiveSnapshot();
 	}
 
+	/* Discard snapshot */
+	PopActiveSnapshot();
+
 	/* Clean up storage */
 	EndCopy(cstate);
 }
[GENERAL] STABLE vs. IMMUTABLE w.r.t. indexes
It seems Postgres 9.3 does not realize that it can collapse the result of a STABLE function when attempting to match against an index for a single query. I am running into a problem with a full text index, where my filter conditions include a function that returns the user's language code. If the function result were substituted directly, the filter condition would match an index built for that particular language, but the only way I can get the function to collapse down is to call it IMMUTABLE. The function pulls a GUC value and that's all it does. Is it safe to mark it IMMUTABLE? I noticed that if I updated the GUC variable and ran the query again, it worked as I would hope, with the new value of the function substituted. So it seems it would be safe, but I'd like to verify. I'd also like to know why it wouldn't work if the function was STABLE: Since Postgres should know that it's not going to change over the course of the query, couldn't it substitute the value as well? More details can be provided on request. Thanks. Moshe Jacobson Principal Architect, Nead Werx Inc. http://www.neadwerx.com 2323 Cumberland Parkway · Suite 201 · Atlanta, GA 30339 Quality is not an act, it is a habit. -- Aristotle
[GENERAL] Testing on Power 8 systems
Hi, I have an opportunity to test PostgreSQL and the applications I care about on an IBM Power 8 system tomorrow afternoon (Nov 4th, CET time). While my primary interest lies in the applications, I was wondering if people here would have advice or interest about useful tests or benchmarks to perform. Apparently, a wide range of Linux distributions (most probably in VMs) will be available, and I have no doubt that installation will be easy. Didn't find many ppc64 or ppc64el binary packages online, though: yum.postgresql.org and apt.postgresql.org seem to only provide x86*, though there's a ppc64el port of Debian… Regards,
Re: [GENERAL] Testing on Power 8 systems
On 11/3/2014 9:45 AM, Georges Racinet wrote: I have an opportunity to test PostgreSQL and the applications I care about on an IBM Power 8 system tomorrow afternoon (Nov 4th, CET time). While my primary interest lies in the applications, I was wondering if people here would have advice or interest about useful tests or benchmarks to perform. Apparently, a wide range of Linux distributions (most probably in VMs) will be available, and I have no doubt that installation will be easy. Didn't find many ppc64 or ppc64el binary packages online, though: yum.postgresql.org and apt.postgresql.org seem to only provide x86*, there's a ppc64el port of Debian…

Postgres built just fine for me on AIX 6 on a Power6 server using IBM XLC. I've not run Linux on Power; had no need for it. -- john r pierce 37N 122W somewhere on the middle of the left coast
Re: [GENERAL] Testing on Power 8 systems
Hi, On Mon, 2014-11-03 at 18:45 +0100, Georges Racinet wrote: Didn't find many ppc64 or ppc64el binary packages online, though: yum.postgresql.org

For the archives: We will support Power 7 and Power 8 systems on RHEL 6 and RHEL 7 in the upcoming weeks. Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: [GENERAL] Testing on Power 8 systems
On 11/03/2014 07:27 PM, Devrim Gündüz wrote: On Mon, 2014-11-03 at 18:45 +0100, Georges Racinet wrote: Didn't find many ppc64 or ppc64el binary packages online, though: yum.postgresql.org For the archives: We will support Power 7 and Power 8 systems on RHEL 6 and RHEL 7 in the upcoming weeks.

Thanks for the tip!
Re: [GENERAL] Testing on Power 8 systems
On 11/3/2014 10:27 AM, Devrim Gündüz wrote: Hi, On Mon, 2014-11-03 at 18:45 +0100, Georges Racinet wrote: Didn't find many ppc64 or ppc64el binary packages online, though: yum.postgresql.org For the archives: We will support Power 7 and Power 8 systems on RHEL 6 and RHEL 7 in the upcoming weeks.

oooh, did someone donate a build host? cool! What about AIX? -- john r pierce 37N 122W somewhere on the middle of the left coast
Re: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
-Original Message- From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of p...@cmicdo.com Sent: Monday, November 03, 2014 11:34 AM To: pgsql-general@postgresql.org Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

[quoted original question, DDL, and EXPLAIN ANALYZE output trimmed]

These 2 queries are not the same. The first query updates rows in the second table with the orig.time1 values based on a key1 column match. The second query finds the first possible match (based on the key1 column) and assigns the orig.time1 value from that matched row to every record in the second table.

Regards, Igor Neyman
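Igor's distinction can be demonstrated with any engine that supports scalar subqueries in UPDATE. Below is a self-contained sketch using SQLite through Python's sqlite3 module; the table names follow the example, but the data is invented and SQLite stands in for PostgreSQL purely so the snippet runs anywhere:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orig (key1 TEXT PRIMARY KEY, time1 TEXT)")
cur.execute("CREATE TABLE second (key1 TEXT, time1 TEXT)")
cur.executemany("INSERT INTO orig VALUES (?, ?)",
                [("a", "01:00"), ("b", "02:00"), ("c", "03:00")])
cur.executemany("INSERT INTO second (key1) VALUES (?)",
                [("a",), ("b",), ("c",)])

# Correlated scalar subquery: each row of "second" gets the time1 of its
# own matching "orig" row (same effect as UPDATE ... FROM with a join).
cur.execute("""UPDATE second
                  SET time1 = (SELECT o.time1 FROM orig o
                                WHERE o.key1 = second.key1)""")
correlated = cur.execute(
    "SELECT key1, time1 FROM second ORDER BY key1").fetchall()
print(correlated)  # -> [('a', '01:00'), ('b', '02:00'), ('c', '03:00')]

# Uncorrelated subquery with LIMIT 1 (the second EXPLAIN's InitPlan):
# "second" is re-scanned inside the subquery, so there is no per-row
# correlation and every row receives the same single value.
cur.execute("UPDATE second SET time1 = NULL")
cur.execute("""UPDATE second
                  SET time1 = (SELECT o.time1 FROM orig o, second s
                                WHERE o.key1 = s.key1 LIMIT 1)""")
uncorrelated = cur.execute("SELECT DISTINCT time1 FROM second").fetchall()
print(len(uncorrelated))  # -> 1 (one value copied to all rows)
```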
Re: [GENERAL] STABLE vs. IMMUTABLE w.r.t. indexes
Moshe Jacobson mo...@neadwerx.com writes: The function pulls a GUC value and that's all it does. Is it safe to mark it IMMUTABLE? No; such a function is by definition mutable. I noticed that if I updated the GUC variable and ran the query again, it worked as I would hope, with the new value of the function substituted. So it seems it would be safe, but I'd like to verify. You might chance to get away with that as long as you never ever use the function in a view or prepared query (including inside a plpgsql function). But it seems likely to bite you eventually. I'd also like to know why it wouldn't work if the function was STABLE: Since Postgres should know that it's not going to change over the course of the query, couldn't it substitute the value as well? You have not shown us the context, but I suspect you are wishing that the planner would assume that the function's result can't change between planning and execution. Unfortunately, it can. regards, tom lane
Re: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)
On Mon, 11/3/14, Igor Neyman iney...@perceptron.com wrote:

[quoted original question, DDL, and EXPLAIN ANALYZE output trimmed]

These 2 queries are not the same. The first query updates rows in the second table with the orig.time1 values based on a key1 column match. The second query finds the first possible match (based on the key1 column) and assigns the orig.time1 value from the matched row to every record in the second table. Regards, Igor Neyman

I see that now. I was trying to reproduce something from work from memory and got tripped up on a subtlety of UPDATE ... SELECT.
The query I ran at work was like this:

EXPLAIN ANALYZE UPDATE second se
    SET time1 = (SELECT time1 FROM orig WHERE orig.key1 = se.key1);

                                   QUERY PLAN
------------------------------------------------------------------------------
 Update on second se  (cost=0.00..3390627.00 rows=40 width=18) (actual time=18698.795..18698.795 rows=0 loops=1)
   ->  Seq Scan on second se  (cost=0.00..3390627.00 rows=40 width=18) (actual time=7.558..16694.600 rows=40 loops=1)
         SubPlan 1
           ->  Index Scan using odx on orig  (cost=0.43..8.45 rows=1 width=8) (actual time=0.033..0.035 rows=1 loops=40)
                 Index Cond: ((key1)::text = (se.key1)::text)
 Total runtime: 18698.865 ms
(6 rows)

This does correctly match and update all of the second table entries. The plan actually runs longer than the UPDATE ... FROM, which squares with a comment in the fine manual. Thanks! PJ
Re: [GENERAL] Basic question regarding insert
Thank you John. That perfectly answered my question. Regards Anil On Sat, Nov 1, 2014 at 2:43 AM, John R Pierce pie...@hogranch.com wrote: On 10/31/2014 3:24 AM, Anil Menon wrote: I have a very basic question on inserts - I tried to get a good authoritative answer but could not really find one to my satisfaction in the usual places. TL;DR: can an (after) insert trigger be run in parallel? The trigger is run in the context of the connection and transaction that invoked it. The original INSERT doesn't return until any and all triggers are processed. OTHER connections can do concurrent inserts to the same tables, as long as you're not using explicit table locks. -- john r pierce 37N 122W somewhere on the middle of the left coast
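John's two points (the trigger runs synchronously inside the invoking transaction, and the INSERT does not return until the trigger has finished) can be sketched with an AFTER INSERT trigger. This uses SQLite via Python's sqlite3 module as a stand-in, since the synchronous, same-transaction behaviour is the same there:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER)")
cur.execute("CREATE TABLE audit (id INTEGER)")
cur.execute("""CREATE TRIGGER t_after_insert AFTER INSERT ON t
               BEGIN
                   INSERT INTO audit (id) VALUES (NEW.id);
               END""")

# The INSERT fires the trigger before it returns; both writes sit in the
# same still-open transaction (the sqlite3 module has not committed yet).
cur.execute("INSERT INTO t (id) VALUES (1)")
audit_rows = cur.execute("SELECT id FROM audit").fetchall()
print(audit_rows)  # -> [(1,)]

# Rolling back the transaction undoes the trigger's write too.
conn.rollback()
after_rollback = cur.execute("SELECT count(*) FROM audit").fetchone()[0]
print(after_rollback)  # -> 0
```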
[GENERAL] Help with PostgreSQL 9.4 to expand jsonb int array into table with row numbers
Trying to wrap my head around PostgreSQL 9.4 jsonb and would like some help figuring out how to do the following. Given the following example jsonb:

'{"name1" : value1, "name2" : value2, "name3" : [int1, int2, int3]}'::jsonb AS table1.column1

Wanted: return the "name3" array only, as a table with a return signature of

TABLE(var_name varchar, var_value int, var_row_num int)

So the resulting data would look like this:

('name3', int1, 1)
('name3', int2, 2)
('name3', int3, 3)

Assume the array could be any length except zero and 'name3' is guaranteed to exist. Also posted on stackoverflow: http://stackoverflow.com/questions/26691725/postgresql-9-4-expand-jsonb-int-array-into-table-with-row-numbers Thanks, Neil
[GENERAL] Dynomite from Netflix - Making Non-Distributed Databases, Distributed
Hello all, I've come across Dynomite this evening. http://techblog.netflix.com/2014/11/introducing-dynomite.html It already has some interesting features and I'm sure there'll be some momentum behind it. It's too soon to say, but I can definitely see something neat being built on top of Postgres. P.S.: It's written in C and Apache licensed. -- Arthur Silva
[GENERAL] Is it possible to set a timeout for optimization in PostgreSQL?
Hi all, In PostgreSQL is there a timeout when the optimizer stops the optimization process and returns the least expensive plan it has found so far? Is it possible to change its value? Thanks, Zhan
Re: [GENERAL] Is it possible to set a timeout for optimization in PostgreSQL?
Zhan Li zhanl...@gmail.com wrote on 4 November 2014 at 03:01: Hi all, In PostgreSQL is there a timeout when the optimizer stops the optimization process and returns the least expensive plan it has found so far? Is it possible to change its value? Thanks, Zhan

Please read http://www.postgresql.org/docs/current/static/geqo-pg-intro.html.
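For the archives: the page above covers the genetic query optimizer (GEQO). The standard planner searches the plan space exhaustively and has no timeout; only when a query joins many relations does PostgreSQL fall back to GEQO's heuristic search. The relevant settings, with their stock defaults (a sketch; verify the defaults against the documentation for your version):

```
# postgresql.conf (each can also be changed per-session with SET)
geqo = on            # enable genetic query optimization
geqo_threshold = 12  # use GEQO when a query has at least this many FROM items
geqo_effort = 5      # 1..10: trade planning time against plan quality
```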
[GENERAL] Re: Help with PostgreSQL 9.4 to expand jsonb int array into table with row numbers
Neil Tiffin-3 wrote: [original question trimmed]

Not syntax checked but...

SELECT 'name3', int_text::integer AS int, int_ord
  FROM ( VALUES (...) ) src (column1),
       LATERAL ROWS FROM ( jsonb_array_elements_text(column1->'name3') )
       WITH ORDINALITY jae (int_text, int_ord)

(Note: for a jsonb column you want the jsonb_* variant, and the _text form so the value can be cast to integer.) Both WITH ORDINALITY and jsonb are introduced in 9.4; it is possible to make this work in all supported versions of PostgreSQL through the liberal use of CTEs (WITH) as well as, possibly, the generate_series() function. David J.
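As a sanity check of the target shape, here is what the expansion should produce, mimicked in plain Python with json and enumerate (illustrative only: the sample integers are invented, and this obviously doesn't exercise 9.4's WITH ORDINALITY itself):

```python
import json

doc = json.loads('{"name1": 1, "name2": 2, "name3": [10, 20, 30]}')

# Expand the "name3" array into (var_name, var_value, var_row_num) rows,
# with 1-based row numbers as WITH ORDINALITY would provide.
rows = [("name3", value, ordinal)
        for ordinal, value in enumerate(doc["name3"], start=1)]
print(rows)  # -> [('name3', 10, 1), ('name3', 20, 2), ('name3', 30, 3)]
```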