[GENERAL] Curious why planner can't handle NOT IN

2013-05-03 Thread Yang Zhang
I have an `account` table with 5.3M rows, with primary key `id` of
type `text` (and 600+ columns if that matters).

I'm trying to create a `newaccount` table with the same schema but
600k newly imported rows, then insert all the old rows for which `id`
isn't already in the set of newly updated rows.

= create table newaccount as select * from account limit 0;

= \copy newaccount from stdin with (format csv)
[...copy 600k rows in...]

= analyze newaccount;

The most obvious query doesn't work so hot due to repeated execution
of the subplan:

= explain insert into newaccount select * from account where id not
in (select id from account);

   QUERY PLAN
-
 Insert on newaccount  (cost=0.00..458800400224.89 rows=2985976 width=5366)
   -  Seq Scan on account  (cost=0.00..458800400224.89 rows=2985976 width=5366)
 Filter: (NOT (SubPlan 1))
 SubPlan 1
   -  Materialize  (cost=0.00..153276.00 rows=15 width=32)
 -  Seq Scan on newaccount  (cost=0.00..151500.00
rows=15 width=32)
(6 rows)

This works fine, though:

= explain insert into newaccount select * from account where id in
(select id from account except select id from newaccount);

   QUERY PLAN

 Insert on newaccount  (cost=3749026.24..14034547.77 rows=2985976 width=5366)
   -  Hash Join  (cost=3749026.24..14034547.77 rows=2985976 width=5366)
 Hash Cond: (account.id = ANY_subquery.id)
 -  Seq Scan on account  (cost=0.00..2263744.51 rows=5971951
width=5366)
 -  Hash  (cost=3633552.85..3633552.85 rows=5971951 width=32)
   -  Subquery Scan on ANY_subquery
(cost=3543223.59..3633552.85 rows=5971951 width=32)
 -  SetOp Except  (cost=3543223.59..3573833.34
rows=5971951 width=19)
   -  Sort  (cost=3543223.59..3558528.47
rows=6121951 width=19)
 Sort Key: *SELECT* 1.id
 -  Append  (cost=0.00..2476464.02
rows=6121951 width=19)
   -  Subquery Scan on *SELECT*
1  (cost=0.00..2323464.02 rows=5971951 width=19)
 -  Seq Scan on account
(cost=0.00..2263744.51 rows=5971951 width=19)
   -  Subquery Scan on *SELECT*
2  (cost=0.00..153000.00 rows=15 width=32)
 -  Seq Scan on
newaccount  (cost=0.00..151500.00 rows=15 width=32)
(14 rows)

This is all in PG 9.1.

This isn't a big deal as there's a straightforward workaround, but I
am curious what happened here.  Googling turns up various mentions of
NOT IN with poor plans that involve subplans.  Then again I have
read mention of hash anti-join which seems appropriate here(?), but
that wasn't used here (nor was the latter join used, though for
whatever reason it looks like a more complex/deeper plan tree than I
had expected, so maybe it was out of the plan generator's reach?).

E.g. the following mentions cranking up work_mem, but I probably can't
crank up work_mem to meet the requirements of this example, and even
if it doesn't fit in memory, it'd be nice for the planner to not
degenerate to a pathological plan and still execute this join
efficiently while spilling to and from disk.

http://stackoverflow.com/questions/7125291/postgresql-not-in-versus-except-performance-difference-edited-2

Thanks!


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] SPI_execute_with_args call

2013-05-03 Thread Yuriy Rusinov
Hello, colleagues !

I have to write random number generator state into database table
Table structure is
table rand_state
{
id serial not null primary key,
state_rand bytea
};

In C-function I do
size_t nr_ins = strlen (insert into rand_state (state_rand) values ($1););
char * r_sql = (char *) palloc (nr_ins + 1);
strncpy (r_sql, insert into rand_state (state_rand) values ($1);, nr_ins);

Oid * oids = (Oid *)palloc (sizeof (Oid));
Datum * val = PointerGetDatum (randBuf);
*oids = BYTEAOID;
const char * nulls = NULL;
int rins = SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1);

randBuf is a void * pointer that contains random number generator state,
when I try to execute SPI_execute_with_args (r_sql, 1, oids, val, nulls,
false, 1); I receive error
The connection to the server was lost. Attempting reset: Failed.
Could you give some work examples for SPI_execute_with_args because I
didn't find them in documentation.

Thanks a lot.

-- 
Best regards,
Sincerely yours,
Yuriy Rusinov.


Re: [GENERAL] SPI_execute_with_args call

2013-05-03 Thread Gavin Flower

On 03/05/13 21:19, Yuriy Rusinov wrote:

Hello, colleagues !

I have to write random number generator state into database table
Table structure is
table rand_state
{
id serial not null primary key,
state_rand bytea
};

In C-function I do
size_t nr_ins = strlen (insert into rand_state (state_rand) values 
($1););

char * r_sql = (char *) palloc (nr_ins + 1);
strncpy (r_sql, insert into rand_state (state_rand) values ($1);, 
nr_ins);


Oid * oids = (Oid *)palloc (sizeof (Oid));
Datum * val = PointerGetDatum (randBuf);
*oids = BYTEAOID;
const char * nulls = NULL;
int rins = SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1);

randBuf is a void * pointer that contains random number generator state,
when I try to execute SPI_execute_with_args (r_sql, 1, oids, val, 
nulls, false, 1); I receive error

The connection to the server was lost. Attempting reset: Failed.
Could you give some work examples for SPI_execute_with_args because I 
didn't find them in documentation.


Thanks a lot.

--
Best regards,
Sincerely yours,
Yuriy Rusinov.

I can't answer your question.

However, I can say that PRIMARY KEY implies NOT NULL (also an UNIQUE 
index), so you don't need to explicitly add NOT NULL when you are 
specifying PRIMARY KEY!



Cheers,
Gavin


Re: [GENERAL] SPI_execute_with_args call

2013-05-03 Thread Yuriy Rusinov
I'm sorry !

But if I commented SPI_execute_with_args call, then all others works
without bugs.


On Fri, May 3, 2013 at 2:31 PM, Gavin Flower
gavinflo...@archidevsys.co.nzwrote:

  On 03/05/13 21:19, Yuriy Rusinov wrote:

 Hello, colleagues !

  I have to write random number generator state into database table
 Table structure is
 table rand_state
 {
 id serial not null primary key,
 state_rand bytea
 };

  In C-function I do
 size_t nr_ins = strlen (insert into rand_state (state_rand) values
 ($1););
  char * r_sql = (char *) palloc (nr_ins + 1);
 strncpy (r_sql, insert into rand_state (state_rand) values ($1);,
 nr_ins);

  Oid * oids = (Oid *)palloc (sizeof (Oid));
 Datum * val = PointerGetDatum (randBuf);
 *oids = BYTEAOID;
 const char * nulls = NULL;
 int rins = SPI_execute_with_args (r_sql, 1, oids, val, nulls, false, 1);

  randBuf is a void * pointer that contains random number generator state,
 when I try to execute SPI_execute_with_args (r_sql, 1, oids, val, nulls,
 false, 1); I receive error
 The connection to the server was lost. Attempting reset: Failed.
  Could you give some work examples for SPI_execute_with_args because I
 didn't find them in documentation.

  Thanks a lot.

  --
 Best regards,
 Sincerely yours,
 Yuriy Rusinov.

 I can't answer your question.

 However, I can say that PRIMARY KEY implies NOT NULL (also an UNIQUE
 index), so you don't need to explicitly add NOT NULL when you are specifying
 PRIMARY KEY!


 Cheers,
 Gavin




-- 
Best regards,
Sincerely yours,
Yuriy Rusinov.


Re: [GENERAL] Curious why planner can't handle NOT IN

2013-05-03 Thread Tom Lane
Yang Zhang yanghates...@gmail.com writes:
 The most obvious query doesn't work so hot due to repeated execution
 of the subplan:

 = explain insert into newaccount select * from account where id not
 in (select id from account);

Yeah.  Try using a NOT EXISTS instead.

 This isn't a big deal as there's a straightforward workaround, but I
 am curious what happened here.  Googling turns up various mentions of
 NOT IN with poor plans that involve subplans.  Then again I have
 read mention of hash anti-join which seems appropriate here(?), but
 that wasn't used here

The trouble with NOT IN is that it's not exactly the same as an
antijoin, because of the spec-mandated bizarre behavior for NULLs.
It's very difficult to optimize it to any extent without producing
wrong answers.  NOT EXISTS avoids that problem.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SPI_execute_with_args call

2013-05-03 Thread Tom Lane
Yuriy Rusinov yrusi...@gmail.com writes:
 In C-function I do
 size_t nr_ins = strlen (insert into rand_state (state_rand) values ($1););
 char * r_sql = (char *) palloc (nr_ins + 1);
 strncpy (r_sql, insert into rand_state (state_rand) values ($1);, nr_ins);

This is a hard, error-prone, and ultimately incorrect way to do
pstrdup() --- you're not ensuring that the new string is
null-terminated.

 Datum * val = PointerGetDatum (randBuf);

Didn't your compiler give you a warning about that?  PointerGetDatum
produces a Datum, not a pointer to a Datum.  You'd need something more
like

Datum val[1];
val[0] = PointerGetDatum (randBuf);

This is assuming that randBuf is even of the right format to be a bytea
value, which is unclear from your extract.

 const char * nulls = NULL;

And that's just wrong.  Personally I'd just pass NULL to
SPI_execute_with_args since you don't have any null values to pass, but
if you don't want to do that you'd need something more like

char nulls[1];
nulls[0] = ' ';

(hmm, it looks like the SPI documentation leaves something to be desired
here --- the SPI_execute_with_args page, at least, isn't explaining the
convention for elements of the nulls[] array)

 Could you give some work examples for SPI_execute_with_args because I
 didn't find them in documentation.

A quick grep says there's a usage in src/pl/plpgsql/src/pl_exec.c

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Curious why planner can't handle NOT IN

2013-05-03 Thread pradeep singh
On May 3, 2013 12:15 PM, Yang Zhang yanghates...@gmail.com wrote:

 I have an `account` table with 5.3M rows, with primary key `id` of
 type `text` (and 600+ columns if that matters).

 I'm trying to create a `newaccount` table with the same schema but
 600k newly imported rows, then insert all the old rows for which `id`
 isn't already in the set of newly updated rows.

 = create table newaccount as select * from account limit 0;

 = \copy newaccount from stdin with (format csv)
 [...copy 600k rows in...]

 = analyze newaccount;

 The most obvious query doesn't work so hot due to repeated execution
 of the subplan:

 = explain insert into newaccount select * from account where id not
 in (select id from account);

QUERY PLAN

 -
  Insert on newaccount  (cost=0.00..458800400224.89 rows=2985976 width=5366)
-  Seq Scan on account  (cost=0.00..458800400224.89 rows=2985976
 width=5366)
  Filter: (NOT (SubPlan 1))
  SubPlan 1
-  Materialize  (cost=0.00..153276.00 rows=15 width=32)
  -  Seq Scan on newaccount  (cost=0.00..151500.00
 rows=15 width=32)
 (6 rows)

 This works fine, though:

 = explain insert into newaccount select * from account where id in
 (select id from account except select id from newaccount);

QUERY PLAN

 
  Insert on newaccount  (cost=3749026.24..14034547.77 rows=2985976
 width=5366)
-  Hash Join  (cost=3749026.24..14034547.77 rows=2985976 width=5366)
  Hash Cond: (account.id = ANY_subquery.id)
  -  Seq Scan on account  (cost=0.00..2263744.51 rows=5971951
 width=5366)
  -  Hash  (cost=3633552.85..3633552.85 rows=5971951 width=32)
-  Subquery Scan on ANY_subquery
 (cost=3543223.59..3633552.85 rows=5971951 width=32)
  -  SetOp Except  (cost=3543223.59..3573833.34
 rows=5971951 width=19)
-  Sort  (cost=3543223.59..3558528.47
 rows=6121951 width=19)
  Sort Key: *SELECT* 1.id
  -  Append  (cost=0.00..2476464.02
 rows=6121951 width=19)
-  Subquery Scan on *SELECT*
 1  (cost=0.00..2323464.02 rows=5971951 width=19)
  -  Seq Scan on account
 (cost=0.00..2263744.51 rows=5971951 width=19)
-  Subquery Scan on *SELECT*
 2  (cost=0.00..153000.00 rows=15 width=32)
  -  Seq Scan on
 newaccount  (cost=0.00..151500.00 rows=15 width=32)
 (14 rows)

 This is all in PG 9.1.

 This isn't a big deal as there's a straightforward workaround, but I
 am curious what happened here.  Googling turns up various mentions of
 NOT IN with poor plans that involve subplans.  Then again I have
 read mention of hash anti-join which seems appropriate here(?), but
 that wasn't used here (nor was the latter join used, though for
 whatever reason it looks like a more complex/deeper plan tree than I
 had expected, so maybe it was out of the plan generator's reach?).

 E.g. the following mentions cranking up work_mem, but I probably can't
 crank up work_mem to meet the requirements of this example, and even
 if it doesn't fit in memory, it'd be nice for the planner to not
 degenerate to a pathological plan and still execute this join
 efficiently while spilling to and from disk.


 http://stackoverflow.com/questions/7125291/postgresql-not-in-versus-except-performance-difference-edited-2

 Thanks!


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] SPI_execute_with_args call

2013-05-03 Thread Yuriy Rusinov
Thanks a lot, I have corrected and bug was fixed.


On Fri, May 3, 2013 at 6:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Yuriy Rusinov yrusi...@gmail.com writes:
  In C-function I do
  size_t nr_ins = strlen (insert into rand_state (state_rand) values
 ($1););
  char * r_sql = (char *) palloc (nr_ins + 1);
  strncpy (r_sql, insert into rand_state (state_rand) values ($1);,
 nr_ins);

 This is a hard, error-prone, and ultimately incorrect way to do
 pstrdup() --- you're not ensuring that the new string is
 null-terminated.

  Datum * val = PointerGetDatum (randBuf);

 Didn't your compiler give you a warning about that?  PointerGetDatum
 produces a Datum, not a pointer to a Datum.  You'd need something more
 like

 Datum val[1];
 val[0] = PointerGetDatum (randBuf);

 This is assuming that randBuf is even of the right format to be a bytea
 value, which is unclear from your extract.

  const char * nulls = NULL;

 And that's just wrong.  Personally I'd just pass NULL to
 SPI_execute_with_args since you don't have any null values to pass, but
 if you don't want to do that you'd need something more like

 char nulls[1];
 nulls[0] = ' ';

 (hmm, it looks like the SPI documentation leaves something to be desired
 here --- the SPI_execute_with_args page, at least, isn't explaining the
 convention for elements of the nulls[] array)

  Could you give some work examples for SPI_execute_with_args because I
  didn't find them in documentation.

 A quick grep says there's a usage in src/pl/plpgsql/src/pl_exec.c

 regards, tom lane




-- 
Best regards,
Sincerely yours,
Yuriy Rusinov.


[GENERAL] How to INSERT INTO one table from another table, WHERE

2013-05-03 Thread Kirk Wythers
I am trying to insert data from 2 columns in tableB (colX and colY) into the 
same two columns of tableB, with a join like where clause. Is this possible? 

For example:

INSERT INTO tableA (colX, colY) 
(SELECT colX, colY
FROM tableB
WHERE
tableA.blockname = tableB.block_name
AND tableA.timestamp = tableB.timestamp)
;




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to INSERT INTO one table from another table, WHERE

2013-05-03 Thread Igor Neyman
 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of Kirk Wythers
 Sent: Friday, May 03, 2013 1:51 PM
 To: POSTGRES
 Subject: [GENERAL] How to INSERT INTO one table from another table,
 WHERE
 
 I am trying to insert data from 2 columns in tableB (colX and colY)
 into the same two columns of tableB, with a join like where clause. Is
 this possible?
 
 For example:
 
 INSERT INTO tableA (colX, colY)
 (SELECT colX, colY
 FROM tableB
 WHERE
   tableA.blockname = tableB.block_name
   AND tableA.timestamp = tableB.timestamp) ;
 

If it's not the whole record but just some columns, you UPDATE them not INSERT:

UPDATE tableA A 
SET colX = B.colx, colY = B.colY
FROM table B B
WHERE A. blockname = B.block_name
AND A.timestamp = B.timestamp;

Note the use of aliases (A, B).

b.t.w. timestamp - isn't a good choice for column name, being a data type 
it's on the list of reserved words.

Regards,
Igor Neyman


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Unlogged indexes

2013-05-03 Thread Yang Zhang
Guessing the answer's no, but is there any way to construct indexes
such that I can safely put them on (faster) volatile storage? (Just to
be clear, I'm asking about indexes for *logged* tables.)


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Unlogged indexes

2013-05-03 Thread Thom Brown
On 3 May 2013 21:06, Yang Zhang yanghates...@gmail.com wrote:
 Guessing the answer's no, but is there any way to construct indexes
 such that I can safely put them on (faster) volatile storage? (Just to
 be clear, I'm asking about indexes for *logged* tables.)

Yes:

CREATE INDEX ... TABLESPACE tablespacename;
ALTER INDEX ... SET TABLESPACE tablespacename;

Although there's a disparity between your email subject and main text.
 Indexes for logged tables are always logged.  If you want an unlogged
index you can only create it for an unlogged table.

And putting indexes on a separate tablespace is probably not as
advantageous as you're thinking.  Might be worth testing.

--
Thom


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Unlogged indexes

2013-05-03 Thread Yang Zhang
Yeah, I know that indexes for unlogged tables are unlogged.  I was
just wondering if you could do this for logged tables.  (Safely, such
that on crash recovery WAL replay won't throw up, these can be omitted
from base backups, etc.)

On Fri, May 3, 2013 at 1:46 PM, Thom Brown t...@linux.com wrote:
 On 3 May 2013 21:06, Yang Zhang yanghates...@gmail.com wrote:
 Guessing the answer's no, but is there any way to construct indexes
 such that I can safely put them on (faster) volatile storage? (Just to
 be clear, I'm asking about indexes for *logged* tables.)

 Yes:

 CREATE INDEX ... TABLESPACE tablespacename;
 ALTER INDEX ... SET TABLESPACE tablespacename;

 Although there's a disparity between your email subject and main text.
  Indexes for logged tables are always logged.  If you want an unlogged
 index you can only create it for an unlogged table.

 And putting indexes on a separate tablespace is probably not as
 advantageous as you're thinking.  Might be worth testing.

 --
 Thom


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] How to monitor recovery on Windows?

2013-05-03 Thread Doug Gorley
This weekend, we're planning to move a production cluster to a new server.
 Rather than a full dump-and-restore, I want to use a base backup from last
night, have the server replay the WAL files up to the time we shutdown the
old server, and go from there.  Both servers are Windows.

I've never done a recovery from WAL files before, but it sounds simple
enough.  One thing I'm unsure of though: how can I monitor the progress of
the recovery?  Will the WAL files themselves change or disappear?  Will
progress reports end up in the PostgreSQL logs?  Something else entirely?

Thanks,
-- 
Doug Gorley | d...@gorley.ca


Re: [GENERAL] Simple, free PG GUI/query tool wanted

2013-05-03 Thread mark_r
You can use free tool - Valentina Studio
http://www.valentina-db.com/en/valentina-studio-overview 14 Feb 2013 in the
5.0 version added support of PostgreSQL, as well as mySQL/mariaDB, SQLite.
It is FREE. Works on Mac, Win and Linux. Includes not only db management but
powerfull reports that work again on 3 OS.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Simple-free-PG-GUI-query-tool-wanted-tp3305055p5754182.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general