[GENERAL] Curious why planner can't handle NOT IN
I have an `account` table with 5.3M rows, with primary key `id` of type `text` (and 600+ columns if that matters). I'm trying to create a `newaccount` table with the same schema but 600k newly imported rows, then insert all the old rows for which `id` isn't already in the set of newly updated rows. = create table newaccount as select * from account limit 0; = \copy newaccount from stdin with (format csv) [...copy 600k rows in...] = analyze newaccount; The most obvious query doesn't work so hot due to repeated execution of the subplan: = explain insert into newaccount select * from account where id not in (select id from account); QUERY PLAN - Insert on newaccount (cost=0.00..458800400224.89 rows=2985976 width=5366) - Seq Scan on account (cost=0.00..458800400224.89 rows=2985976 width=5366) Filter: (NOT (SubPlan 1)) SubPlan 1 - Materialize (cost=0.00..153276.00 rows=15 width=32) - Seq Scan on newaccount (cost=0.00..151500.00 rows=15 width=32) (6 rows) This works fine, though: = explain insert into newaccount select * from account where id in (select id from account except select id from newaccount); QUERY PLAN Insert on newaccount (cost=3749026.24..14034547.77 rows=2985976 width=5366) - Hash Join (cost=3749026.24..14034547.77 rows=2985976 width=5366) Hash Cond: (account.id = ANY_subquery.id) - Seq Scan on account (cost=0.00..2263744.51 rows=5971951 width=5366) - Hash (cost=3633552.85..3633552.85 rows=5971951 width=32) - Subquery Scan on ANY_subquery (cost=3543223.59..3633552.85 rows=5971951 width=32) - SetOp Except (cost=3543223.59..3573833.34 rows=5971951 width=19) - Sort (cost=3543223.59..3558528.47 rows=6121951 width=19) Sort Key: *SELECT* 1.id - Append (cost=0.00..2476464.02 rows=6121951 width=19) - Subquery Scan on *SELECT* 1 (cost=0.00..2323464.02 rows=5971951 width=19) - Seq Scan on account (cost=0.00..2263744.51 rows=5971951 width=19) - Subquery Scan on *SELECT* 2 (cost=0.00..153000.00 rows=15 width=32) - Seq Scan on newaccount (cost=0.00..151500.00 rows=15 width=32) (14 rows) This is all in PG 9.1. This isn't a big deal as there's a straightforward workaround, but I am curious what happened here. Googling turns up various mentions of NOT IN with poor plans that involve subplans. Then again I have read mention of hash anti-join which seems appropriate here(?), but that wasn't used here (nor was the latter join used, though for whatever reason it looks like a more complex/deeper plan tree than I had expected, so maybe it was out of the plan generator's reach?). E.g. the following mentions cranking up work_mem, but I probably can't crank up work_mem to meet the requirements of this example, and even if it doesn't fit in memory, it'd be nice for the planner to not degenerate to a pathological plan and still execute this join efficiently while spilling to and from disk. http://stackoverflow.com/questions/7125291/postgresql-not-in-versus-except-performance-difference-edited-2 Thanks! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] SPI_execute_with_args call
Hello, colleagues ! I have to write random number generator state into database table Table structure is table rand_state { id serial not null primary key, state_rand bytea }; In C-function I do size_t nr_ins = strlen (insert into rand_state (state_rand) values ($1);); char * r_sql = (char *) palloc (nr_ins + 1); strncpy (r_sql, insert into rand_state (state_rand) values ($1);, nr_ins); Oid * oids = (Oid *)palloc (sizeof (Oid)); Datum * val = PointerGetDatum (randBuf); *oids = BYTEAOID; const char * nulls = NULL; int rins = SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1); randBuf is a void * pointer that contains random number generator state, when I try to execute SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1); I receive error The connection to the server was lost. Attempting reset: Failed. Could you give some work examples for SPI_execute_with_args because I didn't find them in documentation. Thanks a lot. -- Best regards, Sincerely yours, Yuriy Rusinov.
Re: [GENERAL] SPI_execute_with_args call
On 03/05/13 21:19, Yuriy Rusinov wrote: Hello, colleagues ! I have to write random number generator state into database table Table structure is table rand_state { id serial not null primary key, state_rand bytea }; In C-function I do size_t nr_ins = strlen (insert into rand_state (state_rand) values ($1);); char * r_sql = (char *) palloc (nr_ins + 1); strncpy (r_sql, insert into rand_state (state_rand) values ($1);, nr_ins); Oid * oids = (Oid *)palloc (sizeof (Oid)); Datum * val = PointerGetDatum (randBuf); *oids = BYTEAOID; const char * nulls = NULL; int rins = SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1); randBuf is a void * pointer that contains random number generator state, when I try to execute SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1); I receive error The connection to the server was lost. Attempting reset: Failed. Could you give some work examples for SPI_execute_with_args because I didn't find them in documentation. Thanks a lot. -- Best regards, Sincerely yours, Yuriy Rusinov. I can't answer your question. However, I can say that PRIMARY KEY implies NOT NULL (also an UNIQUE index), so you don't need to explicitly add NOT NULL when you are specifying PRIMARY KEY! Cheers, Gavin
Re: [GENERAL] SPI_execute_with_args call
I'm sorry ! But if I commented SPI_execute_with_args call, then all others works without bugs. On Fri, May 3, 2013 at 2:31 PM, Gavin Flower gavinflo...@archidevsys.co.nzwrote: On 03/05/13 21:19, Yuriy Rusinov wrote: Hello, colleagues ! I have to write random number generator state into database table Table structure is table rand_state { id serial not null primary key, state_rand bytea }; In C-function I do size_t nr_ins = strlen (insert into rand_state (state_rand) values ($1);); char * r_sql = (char *) palloc (nr_ins + 1); strncpy (r_sql, insert into rand_state (state_rand) values ($1);, nr_ins); Oid * oids = (Oid *)palloc (sizeof (Oid)); Datum * val = PointerGetDatum (randBuf); *oids = BYTEAOID; const char * nulls = NULL; int rins = SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1); randBuf is a void * pointer that contains random number generator state, when I try to execute SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1); I receive error The connection to the server was lost. Attempting reset: Failed. Could you give some work examples for SPI_execute_with_args because I didn't find them in documentation. Thanks a lot. -- Best regards, Sincerely yours, Yuriy Rusinov. I can't answer your question. However, I can say that PRIMARY KEY implies NOT NULL (also an UNIQUE index), so you don't need to explicitly add NOT NULL when you are specifying PRIMARY KEY! Cheers, Gavin -- Best regards, Sincerely yours, Yuriy Rusinov.
Re: [GENERAL] Curious why planner can't handle NOT IN
Yang Zhang yanghates...@gmail.com writes: The most obvious query doesn't work so hot due to repeated execution of the subplan: = explain insert into newaccount select * from account where id not in (select id from account); Yeah. Try using a NOT EXISTS instead. This isn't a big deal as there's a straightforward workaround, but I am curious what happened here. Googling turns up various mentions of NOT IN with poor plans that involve subplans. Then again I have read mention of hash anti-join which seems appropriate here(?), but that wasn't used here The trouble with NOT IN is that it's not exactly the same as an antijoin, because of the spec-mandated bizarre behavior for NULLs. It's very difficult to optimize it to any extent without producing wrong answers. NOT EXISTS avoids that problem. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SPI_execute_with_args call
Yuriy Rusinov yrusi...@gmail.com writes: In C-function I do size_t nr_ins = strlen (insert into rand_state (state_rand) values ($1);); char * r_sql = (char *) palloc (nr_ins + 1); strncpy (r_sql, insert into rand_state (state_rand) values ($1);, nr_ins); This is a hard, error-prone, and ultimately incorrect way to do pstrdup() --- you're not ensuring that the new string is null-terminated. Datum * val = PointerGetDatum (randBuf); Didn't your compiler give you a warning about that? PointerGetDatum produces a Datum, not a pointer to a Datum. You'd need something more like Datum val[1]; val[0] = PointerGetDatum (randBuf); This is assuming that randBuf is even of the right format to be a bytea value, which is unclear from your extract. const char * nulls = NULL; And that's just wrong. Personally I'd just pass NULL to SPI_execute_with_args since you don't have any null values to pass, but if you don't want to do that you'd need something more like char nulls[1]; nulls[0] = ' '; (hmm, it looks like the SPI documentation leaves something to be desired here --- the SPI_execute_with_args page, at least, isn't explaining the convention for elements of the nulls[] array) Could you give some work examples for SPI_execute_with_args because I didn't find them in documentation. A quick grep says there's a usage in src/pl/plpgsql/src/pl_exec.c regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Curious why planner can't handle NOT IN
On May 3, 2013 12:15 PM, Yang Zhang yanghates...@gmail.com wrote: I have an `account` table with 5.3M rows, with primary key `id` of type `text` (and 600+ columns if that matters). I'm trying to create a `newaccount` table with the same schema but 600k newly imported rows, then insert all the old rows for which `id` isn't already in the set of newly updated rows. = create table newaccount as select * from account limit 0; = \copy newaccount from stdin with (format csv) [...copy 600k rows in...] = analyze newaccount; The most obvious query doesn't work so hot due to repeated execution of the subplan: = explain insert into newaccount select * from account where id not in (select id from account); QUERY PLAN - Insert on newaccount (cost=0.00..458800400224.89 rows=2985976 width=5366) - Seq Scan on account (cost=0.00..458800400224.89 rows=2985976 width=5366) Filter: (NOT (SubPlan 1)) SubPlan 1 - Materialize (cost=0.00..153276.00 rows=15 width=32) - Seq Scan on newaccount (cost=0.00..151500.00 rows=15 width=32) (6 rows) This works fine, though: = explain insert into newaccount select * from account where id in (select id from account except select id from newaccount); QUERY PLAN Insert on newaccount (cost=3749026.24..14034547.77 rows=2985976 width=5366) - Hash Join (cost=3749026.24..14034547.77 rows=2985976 width=5366) Hash Cond: (account.id = ANY_subquery.id) - Seq Scan on account (cost=0.00..2263744.51 rows=5971951 width=5366) - Hash (cost=3633552.85..3633552.85 rows=5971951 width=32) - Subquery Scan on ANY_subquery (cost=3543223.59..3633552.85 rows=5971951 width=32) - SetOp Except (cost=3543223.59..3573833.34 rows=5971951 width=19) - Sort (cost=3543223.59..3558528.47 rows=6121951 width=19) Sort Key: *SELECT* 1.id - Append (cost=0.00..2476464.02 rows=6121951 width=19) - Subquery Scan on *SELECT* 1 (cost=0.00..2323464.02 rows=5971951 width=19) - Seq Scan on account (cost=0.00..2263744.51 rows=5971951 width=19) - Subquery Scan on *SELECT* 2 (cost=0.00..153000.00 rows=15 width=32) - Seq Scan on newaccount (cost=0.00..151500.00 rows=15 width=32) (14 rows) This is all in PG 9.1. This isn't a big deal as there's a straightforward workaround, but I am curious what happened here. Googling turns up various mentions of NOT IN with poor plans that involve subplans. Then again I have read mention of hash anti-join which seems appropriate here(?), but that wasn't used here (nor was the latter join used, though for whatever reason it looks like a more complex/deeper plan tree than I had expected, so maybe it was out of the plan generator's reach?). E.g. the following mentions cranking up work_mem, but I probably can't crank up work_mem to meet the requirements of this example, and even if it doesn't fit in memory, it'd be nice for the planner to not degenerate to a pathological plan and still execute this join efficiently while spilling to and from disk. http://stackoverflow.com/questions/7125291/postgresql-not-in-versus-except-performance-difference-edited-2 Thanks! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SPI_execute_with_args call
Thanks a lot, I have corrected and bug was fixed. On Fri, May 3, 2013 at 6:42 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yuriy Rusinov yrusi...@gmail.com writes: In C-function I do size_t nr_ins = strlen (insert into rand_state (state_rand) values ($1);); char * r_sql = (char *) palloc (nr_ins + 1); strncpy (r_sql, insert into rand_state (state_rand) values ($1);, nr_ins); This is a hard, error-prone, and ultimately incorrect way to do pstrdup() --- you're not ensuring that the new string is null-terminated. Datum * val = PointerGetDatum (randBuf); Didn't your compiler give you a warning about that? PointerGetDatum produces a Datum, not a pointer to a Datum. You'd need something more like Datum val[1]; val[0] = PointerGetDatum (randBuf); This is assuming that randBuf is even of the right format to be a bytea value, which is unclear from your extract. const char * nulls = NULL; And that's just wrong. Personally I'd just pass NULL to SPI_execute_with_args since you don't have any null values to pass, but if you don't want to do that you'd need something more like char nulls[1]; nulls[0] = ' '; (hmm, it looks like the SPI documentation leaves something to be desired here --- the SPI_execute_with_args page, at least, isn't explaining the convention for elements of the nulls[] array) Could you give some work examples for SPI_execute_with_args because I didn't find them in documentation. A quick grep says there's a usage in src/pl/plpgsql/src/pl_exec.c regards, tom lane -- Best regards, Sincerely yours, Yuriy Rusinov.
[GENERAL] How to INSERT INTO one table from another table, WHERE
I am trying to insert data from 2 columns in tableB (colX and colY) into the same two columns of tableB, with a join like where clause. Is this possible? For example: INSERT INTO tableA (colX, colY) (SELECT colX, colY FROM tableB WHERE tableA.blockname = tableB.block_name AND tableA.timestamp = tableB.timestamp) ; -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to INSERT INTO one table from another table, WHERE
-Original Message- From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general- ow...@postgresql.org] On Behalf Of Kirk Wythers Sent: Friday, May 03, 2013 1:51 PM To: POSTGRES Subject: [GENERAL] How to INSERT INTO one table from another table, WHERE I am trying to insert data from 2 columns in tableB (colX and colY) into the same two columns of tableB, with a join like where clause. Is this possible? For example: INSERT INTO tableA (colX, colY) (SELECT colX, colY FROM tableB WHERE tableA.blockname = tableB.block_name AND tableA.timestamp = tableB.timestamp) ; If it's not the whole record but just some columns, you UPDATE them not INSERT: UPDATE tableA A SET colX = B.colx, colY = B.colY FROM table B B WHERE A. blockname = B.block_name AND A.timestamp = B.timestamp; Note the use of aliases (A, B). b.t.w. timestamp - isn't a good choice for column name, being a data type it's on the list of reserved words. Regards, Igor Neyman -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Unlogged indexes
Guessing the answer's no, but is there any way to construct indexes such that I can safely put them on (faster) volatile storage? (Just to be clear, I'm asking about indexes for *logged* tables.) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Unlogged indexes
On 3 May 2013 21:06, Yang Zhang yanghates...@gmail.com wrote: Guessing the answer's no, but is there any way to construct indexes such that I can safely put them on (faster) volatile storage? (Just to be clear, I'm asking about indexes for *logged* tables.) Yes: CREATE INDEX ... TABLESPACE tablespacename; ALTER INDEX ... SET TABLESPACE tablespacename; Although there's a disparity between your email subject and main text. Indexes for logged tables are always logged. If you want an unlogged index you can only create it for an unlogged table. And putting indexes on a separate tablespace is probably not as advantageous as you're thinking. Might be worth testing. -- Thom -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Unlogged indexes
Yeah, I know that indexes for unlogged tables are unlogged. I was just wondering if you could do this for logged tables. (Safely, such that on crash recovery WAL replay won't throw up, these can be omitted from base backups, etc.) On Fri, May 3, 2013 at 1:46 PM, Thom Brown t...@linux.com wrote: On 3 May 2013 21:06, Yang Zhang yanghates...@gmail.com wrote: Guessing the answer's no, but is there any way to construct indexes such that I can safely put them on (faster) volatile storage? (Just to be clear, I'm asking about indexes for *logged* tables.) Yes: CREATE INDEX ... TABLESPACE tablespacename; ALTER INDEX ... SET TABLESPACE tablespacename; Although there's a disparity between your email subject and main text. Indexes for logged tables are always logged. If you want an unlogged index you can only create it for an unlogged table. And putting indexes on a separate tablespace is probably not as advantageous as you're thinking. Might be worth testing. -- Thom -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How to monitor recovery on Windows?
This weekend, we're planning to move a production cluster to a new server. Rather than a full dump-and-restore, I want to use a base backup from last night, have the server replay the WAL files up to the time we shutdown the old server, and go from there. Both servers are Windows. I've never done a recovery from WAL files before, but it sounds simple enough. One thing I'm unsure of though: how can I monitor the progress of the recovery? Will the WAL files themselves change or disappear? Will progress reports end up in the PostgreSQL logs? Something else entirely? Thanks, -- Doug Gorley | d...@gorley.ca
Re: [GENERAL] Simple, free PG GUI/query tool wanted
You can use free tool - Valentina Studio http://www.valentina-db.com/en/valentina-studio-overview 14 Feb 2013 in the 5.0 version added support of PostgreSQL, as well as mySQL/mariaDB, SQLite. It is FREE. Works on Mac, Win and Linux. Includes not only db management but powerfull reports that work again on 3 OS. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Simple-free-PG-GUI-query-tool-wanted-tp3305055p5754182.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general