Re: [HACKERS] info about patch: using parametrised query in psql

2009-12-28 Thread Pavel Stehule
2009/12/28 Robert Haas robertmh...@gmail.com:
 On Fri, Dec 25, 2009 at 3:10 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 25, 2009 at 2:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Works for me.  One small problem discussed upthread is that there
 currently doesn't appear to be a libpq function that does
 ident-quoting.  I'm thinking that we will need to add one to make this
 work - is that going to be a problem?

 The rules for ident-quoting are simple and haven't changed over the
 years, so we don't really *need* a libpq function for it.  OTOH you
 could argue it's inconsistent that we have one and not the other.

 Yeah.  Plus it seems like a useful thing to have, anyway.

 I'm thinking that since we're
 just adding a function it won't force an uncomfortable major-version
 bump on libpq.

 Yeah, we have taken the position in the past that adding new functions
 doesn't require a soname bump.

 Good.

 So it seems we have agreement on a new direction for this work.  We
 will not add the \pexec option Pavel proposed as part of this patch;
 instead, we will consider a patch that makes :'foo' and :"foo" do
 literal and identifier quoting of the corresponding value.  Based on
 this, I am marking the existing patch as Returned with Feedback, since
 what is needed here will amount to a totally new base of code, and we
 can consider the revised patch, if any, for whichever CommitFest is
 open at the time that patch is submitted.
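
For reference, the ident-quoting rule discussed above is simply: wrap the
name in double quotes and double any embedded double quote. A minimal
sketch in C (the function name is illustrative only, not the eventual
libpq API; the caller frees the result):

    #include <stdlib.h>
    #include <string.h>

    /* Quote an identifier: ab"c becomes "ab""c". */
    char *
    quote_identifier_sketch(const char *ident)
    {
        size_t      len = strlen(ident);
        char       *result = malloc(2 * len + 3);   /* worst case */
        char       *dst = result;
        const char *src;

        if (result == NULL)
            return NULL;
        *dst++ = '"';
        for (src = ident; *src; src++)
        {
            if (*src == '"')
                *dst++ = '"';       /* double embedded quotes */
            *dst++ = *src;
        }
        *dst++ = '"';
        *dst = '\0';
        return result;
    }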

ok

Pavel


 Thanks,

 ...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] bugfix for int2vectorin

2009-12-28 Thread Robert Haas
On Wed, Dec 2, 2009 at 2:59 PM, Caleb Welton cwel...@greenplum.com wrote:
 New patch attached:

 1. Does not add a new error message (though pg_atoi's error message is a
 little goofy looking).
 2. Handles int2 overflow cases.
 3. oidvectorin does NOT suffer from the same problems as int2vectorin,
 someone already fixed it.

 As for the use-case I'm not completely sure... I'm not an end-user, I'm just
 responding to a bug report.

 My stance here is that returning an error (even a bad error) on trying to
 convert data in is better than
 doing something wrong with bogus input.  In the first case a user
 scratches their head, maybe
 files a bug report, you tell them the correct syntax and they go on.  In the
 second case they input
 a bunch of data and then start complaining about data corruption, loss of
 data, etc. and the
 support case is 100x worse.

 The amount of code we are talking about here is less than 5 lines...

I have scrutinized the latest version of this patch and I feel that it
is a modest improvement on the status quo and that there is really no
downside.  Absent strong objections, I will commit it later this week.
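
For context, the overflow handling at issue is essentially a range check
on each parsed element. A standalone sketch of the idea (illustrative
only; the committed fix reuses pg_atoi(), which already reports
out-of-range input):

    #include <errno.h>
    #include <limits.h>
    #include <stdlib.h>

    /* Parse one int2vector element, rejecting values outside int2 range. */
    static int
    parse_int2_element(const char *str, short *out, char **end)
    {
        long    val;

        errno = 0;
        val = strtol(str, end, 10);
        if (errno == ERANGE || val < SHRT_MIN || val > SHRT_MAX)
            return -1;          /* the backend would ereport(ERROR) here */
        *out = (short) val;
        return 0;
    }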

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Application name patch - v3

2009-12-28 Thread Dave Page
On Sun, Dec 27, 2009 at 11:15 PM, Guillaume Lelarge
guilla...@lelarge.info wrote:
 On 13/11/2009 12:11, Dave Page wrote:
 [...]
 What about pg_dump/psql setting fallback_application_name?

 Per Tom, I'm waiting on the possible new array-based libpq connect API
 which will make a conversion of those utilities from PQsetdbLogin a
 lot cleaner than moving to PQconnectdb (and all the ugly connection
 string building that would require).


 Is it still to be done? I don't see psql or pg_dump set an application
 name on alpha 3. There are also pg_restore, vacuumdb, reindexdb, etc.

Yes, still waiting on the new API.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small locking bugs in hs

2009-12-28 Thread Simon Riggs
On Mon, 2009-12-28 at 01:22 +0100, Andres Freund wrote:

 Btw, don't take my questions as criticism or such.

I didn't take them that way. Your questions and bug reports are welcome.

It was important that HS was released in Alpha so that we can shake out
bugs, issues and concerns early enough to get as many of them fixed in
this release as possible. It is also important that I fix remaining
issues in priority order.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Boszormenyi Zoltan
Hi,

attached is a small patch that makes it possible for clients
to receive row count for SELECT ... INTO ... and CREATE TABLE ... AS ...

Comments?

Best regards,
Zoltán Böszörményi

-- 
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

diff -dcrpN pgsql.4.1/src/backend/tcop/pquery.c pgsql.6/src/backend/tcop/pquery.c
*** pgsql.4.1/src/backend/tcop/pquery.c	2009-12-15 10:15:05.0 +0100
--- pgsql.6/src/backend/tcop/pquery.c	2009-12-22 12:02:55.0 +0100
*** ProcessQuery(PlannedStmt *plan,
*** 205,211 ****
  		switch (queryDesc->operation)
  		{
  			case CMD_SELECT:
! 				strcpy(completionTag, "SELECT");
  				break;
  			case CMD_INSERT:
  				if (queryDesc->estate->es_processed == 1)
--- 205,212 ----
  		switch (queryDesc->operation)
  		{
  			case CMD_SELECT:
! 				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "SELECT %u", queryDesc->estate->es_processed);
  				break;
  			case CMD_INSERT:
  				if (queryDesc->estate->es_processed == 1)
diff -dcrpN pgsql.4.1/src/interfaces/libpq/fe-exec.c pgsql.6/src/interfaces/libpq/fe-exec.c
*** pgsql.4.1/src/interfaces/libpq/fe-exec.c	2009-08-07 13:06:30.0 +0200
--- pgsql.6/src/interfaces/libpq/fe-exec.c	2009-12-22 11:56:06.0 +0100
*** PQcmdTuples(PGresult *res)
*** 2753,2758 ****
--- 2753,2759 ----
  		p++;
  	}
  	else if (strncmp(res->cmdStatus, "DELETE ", 7) == 0 ||
+ 			 strncmp(res->cmdStatus, "SELECT ", 7) == 0 ||
  			 strncmp(res->cmdStatus, "UPDATE ", 7) == 0)
  		p = res->cmdStatus + 7;
  	else if (strncmp(res->cmdStatus, "FETCH ", 6) == 0)
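
With the patch applied, a libpq client would see the count like this (a
sketch only; the table names are made up, and an unpatched server still
yields an empty string from PQcmdTuples() for SELECT tags):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Sketch: read the new row count after SELECT ... INTO. */
    static void
    report_rowcount(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "SELECT * INTO tmp_copy FROM src");

        if (PQresultStatus(res) == PGRES_COMMAND_OK)
            printf("tag: %s, rows: %s\n",
                   PQcmdStatus(res),    /* e.g. "SELECT 42" */
                   PQcmdTuples(res));   /* e.g. "42" */
        PQclear(res);
    }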

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Application name patch - v3

2009-12-28 Thread Guillaume Lelarge
On 28/12/2009 10:07, Dave Page wrote:
 On Sun, Dec 27, 2009 at 11:15 PM, Guillaume Lelarge
 guilla...@lelarge.info wrote:
 On 13/11/2009 12:11, Dave Page wrote:
 [...]
 What about pg_dump/psql setting fallback_application_name?

 Per Tom, I'm waiting on the possible new array-based libpq connect API
 which will make a conversion of those utilities from PQsetdbLogin a
 lot cleaner than moving to PQconnectdb (and all the ugly connection
 string building that would require).


 Is it still to be done? I don't see psql or pg_dump set an application
 name on alpha 3. There are also pg_restore, vacuumdb, reindexdb, etc.
 
 Yes, still waiting on the new API.
 

Is there something I can do to make this move forward?


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Pavel Stehule
2009/12/28 Boszormenyi Zoltan z...@cybertec.at:
 Hi,

 attached is a small patch that makes it possible for clients
 to receive row count for SELECT ... INTO ... and CREATE TABLE ... AS ...

 Comments?


good idea

+1

Pavel

 Best regards,
 Zoltán Böszörményi

 --
 Bible has answers for everything. Proof:
 "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
 than these cometh of evil." (Matthew 5:37) - basics of digital technology.
 "May your kingdom come" - superficial description of plate tectonics

 --
 Zoltán Böszörményi
 Cybertec Schönig & Schönig GmbH
 http://www.postgresql.at/



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Hans-Juergen Schoenig

hello ...

just as background info: this will have some positive side effects on 
embedded C programs which should be portable.

informix, for instance, will also return a row count on those commands.

   regards,

  hans



Pavel Stehule wrote:

2009/12/28 Boszormenyi Zoltan z...@cybertec.at:
  

Hi,

attached is a small patch that makes it possible for clients
to receive row count for SELECT ... INTO ... and CREATE TABLE ... AS ...

Comments?




good idea

+1

Pavel

  

Best regards,
Zoltán Böszörményi

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers





  



--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Boszormenyi Zoltan
Hans-Juergen Schoenig wrote:
 hello ...

 just as background info: this will have some positive side effects
 on embedded C programs which should be portable.

Not just embedded C programs, every driver that's
based on libpq and uses PQcmdTuples() will
automatically see the benefit.

 informix, for instance, will also return a row count on those commands.

regards,

   hans



 Pavel Stehule wrote:
 2009/12/28 Boszormenyi Zoltan z...@cybertec.at:
  
 Hi,

 attached is a small patch that makes it possible for clients
 to receive row count for SELECT ... INTO ... and CREATE TABLE ... AS
 ...

 Comments?

 

 good idea

 +1

 Pavel

  
 Best regards,
 Zoltán Böszörményi

 -- 
 Bible has answers for everything. Proof:
 "But let your communication be, Yea, yea; Nay, nay: for whatsoever
 is more
 than these cometh of evil." (Matthew 5:37) - basics of digital
 technology.
 "May your kingdom come" - superficial description of plate tectonics

 --
 Zoltán Böszörményi
 Cybertec Schönig & Schönig GmbH
 http://www.postgresql.at/



 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


 

   




-- 
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Streaming Rep - 2-phase backups and reducing time to full replication

2009-12-28 Thread Simon Riggs
On Fri, 2009-12-25 at 14:33 +0900, Fujii Masao wrote:
 On Thu, Dec 24, 2009 at 6:03 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I see it would work like this: Add a new option to recovery.conf,
  perhaps two_phase_backup = on. Startup creates a file called
  backup_in_progress then waits. When second phase of backup is complete
  (7b), delete the file and then Startup process will continue. Very few
  lines of code to make this work.
 
 Where do you think the WAL files shipped before doing (7b) are stored?
 If it's pg_xlog, the disk full failure would occur in the standby. If
 it's an archive, restore_command would have to be supplied the same as
 my idea.

Yes, agreed.

I am still concerned about the interactions at the start of replication.
It isn't clear how these things work exactly and as a result, I feel we
may be missing some better ideas.

Two points concern me
* When would sync rep be able to start?
* How do we avoid sending WAL twice?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] join ordering via Simulated Annealing

2009-12-28 Thread Jan Urbański
Andres Freund wrote:
 On Wednesday 23 December 2009 02:23:55 Jan Urbański wrote:

 Lastly, I'm lacking good testcases 

 If you want to see some queries which are rather hard to plan with random 
 search you can look at
 http://archives.postgresql.org/message-
 id/200907091700.43411.and...@anarazel.de
 which Tom analyzed and improved here http://archives.postgresql.org/message-
 id/17807.1247932...@sss.pgh.pa.us

Thanks, these look like good testing candidates, not least because they
trigger assertion errors with my code :( I'll report back when they're
fixed...

 Robert had another example in 
 603c8f070911271205r4d4534edt1cebcb76ff506...@mail.gmail.com that might be 
 interesting.

Yes, I remembered this one, will try to put them through the mill as soon
as I fix my code.

Cheers,
Jan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updateMinRecoveryPoint bug?

2009-12-28 Thread Heikki Linnakangas
Fujii Masao wrote:
 In UpdateMinRecoveryPoint() and XLogNeedsFlush(), updateMinRecoveryPoint
 is used for us to short-circuit future checks only during a crash recovery.
 But it doesn't seem to play its role in a crash recovery that follows an
 archive recovery. Because such crash recovery always starts from *valid*
 minRecoveryPoint, i.e., updateMinRecoveryPoint is never set to FALSE.

Hmm, so the problem is:

1. Restore from archive. End archive recovery, creating a new
checkpoint. But we still write the old minRecoveryPoint value to pg_control

2. Crash. minRecoveryPoint is still set in crash recovery

Yeah, that should be fixed. Otherwise we will merrily start up after
crash, even if we don't reach the end of WAL. Although that shouldn't
happen, it's a bit disconcerting.

 How about always resetting ControlFile->minRecoveryPoint to {0, 0} at the
 beginning of a crash recovery, to fix the bug?

Yeah, that would work. I think it would be better to clear it in
CreateCheckPoint(), though, when we set the pointer to the new
checkpoint. That includes the shutdown checkpoint created at the end of
archive recovery. minRecoveryPoint should never be set during normal
operation, after all.

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6882,6 +6882,7 @@ CreateCheckPoint(int flags)
 	ControlFile->checkPoint = ProcLastRecPtr;
 	ControlFile->checkPointCopy = checkPoint;
 	ControlFile->time = (pg_time_t) time(NULL);
+	MemSet(&ControlFile->minRecoveryPoint, 0, sizeof(XLogRecPtr));
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing pg_migrator limitations

2009-12-28 Thread Bruce Momjian
Robert Haas wrote:
 On Sun, Dec 27, 2009 at 2:16 PM, Robert Haas robertmh...@gmail.com wrote:
  On Sun, Dec 27, 2009 at 9:53 AM, Bruce Momjian br...@momjian.us wrote:
  Bruce Momjian wrote:
  There are several pg_migrator limitations that appeared late in the 8.4
  development cycle and were impossible to fix at that point.  I would
  like to fix them for Postgres 8.5:
 
          o  a user-defined composite data type
          o  a user-defined array data type
          o  a user-defined enum data type
 
  FYI, these pg_migrator restrictions are now gone when migrating to PG
  8.5, even _from_ PG 8.3.
 
  Wow, cool.  That seems like a good step forward.
 
 It appears that the pg_migrator README needs a bit of revision to make
 it more clear which limitations apply to migration between which
 versions.  In particular, the current wording suggests that NONE of
 the limitations apply to 8.3 -> 8.5 migrations, which is not the case
 - e.g. we haven't done anything about the need to rebuild certain
 types of indices.

Very true. I have just made a new pg_migrator release with an updated
README file.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing pg_migrator limitations

2009-12-28 Thread Robert Haas
On Mon, Dec 28, 2009 at 10:48 AM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 On Sun, Dec 27, 2009 at 2:16 PM, Robert Haas robertmh...@gmail.com wrote:
  On Sun, Dec 27, 2009 at 9:53 AM, Bruce Momjian br...@momjian.us wrote:
  Bruce Momjian wrote:
  There are several pg_migrator limitations that appeared late in the 8.4
   development cycle and were impossible to fix at that point.  I would
   like to fix them for Postgres 8.5:
  
           o  a user-defined composite data type
           o  a user-defined array data type
           o  a user-defined enum data type
  
   FYI, these pg_migrator restrictions are now gone when migrating to PG
   8.5, even _from_ PG 8.3.
  
   Wow, cool.  That seems like a good step forward.

 It appears that the pg_migrator README needs a bit of revision to make
 it more clear which limitations apply to migration between which
 versions.  In particular, the current wording suggests that NONE of
  the limitations apply to 8.3 -> 8.5 migrations, which is not the case
 - e.g. we haven't done anything about the need to rebuild certain
 types of indices.

 Very true. I have just made a new pg_migrator release with an updated
 README file.

Ah, cool.  So this seems to imply that a migration from 8.4 to 8.5
should be clear sailing.  Is that correct?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Application name patch - v3

2009-12-28 Thread Tom Lane
Guillaume Lelarge guilla...@lelarge.info writes:
 On 28/12/2009 10:07, Dave Page wrote:
 Yes, still waiting on the new API.

 Is there something I can do to make this move forward?

I think we were stalled on the question of whether to use one array
or two parallel arrays.  Do you want to try coding up a sample usage
of each possibility so we can see which one seems more useful?
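
To illustrate the two shapes under discussion (both signatures below are
assumptions for the sake of comparison, not a settled API):

    #include <libpq-fe.h>

    /* Option A: one array of keyword/value pairs, NULL-terminated. */
    typedef struct
    {
        const char *keyword;
        const char *value;
    } PQconninfoPair;

    extern PGconn *PQconnectdbPairs(const PQconninfoPair *opts);

    /* Option B: two parallel NULL-terminated arrays. */
    extern PGconn *PQconnectdbParams(const char *const *keywords,
                                     const char *const *values);

    /* Sample usage of option B: */
    static PGconn *
    connect_sample(void)
    {
        const char *keywords[] = {"dbname", "fallback_application_name", NULL};
        const char *values[]   = {"postgres", "pg_dump", NULL};

        return PQconnectdbParams(keywords, values);
    }

Option B avoids exporting a new struct type but makes it easier for the
two arrays to get out of step; option A keeps each pair together at the
cost of a new exported type.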

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Tom Lane
Boszormenyi Zoltan z...@cybertec.at writes:
 Hans-Juergen Schoenig wrote:
 just as background info: this will have some positive side effects
 on embedded C programs which should be portable.

 Not just embedded C programs, every driver that's
 based on libpq and uses PQcmdTuples() will
 automatically see the benefit.

And, by the same token, the scope for possibly breaking clients is nearly
unlimited ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Peter Eisentraut
On mån, 2009-12-28 at 11:08 -0500, Tom Lane wrote:
 Boszormenyi Zoltan z...@cybertec.at writes:
  Hans-Juergen Schoenig wrote:
  just as background info: this will have some positive side effects
  on embedded C programs which should be portable.
 
  Not just embedded C programs, every driver that's
  based on libpq and uses PQcmdTuples() will
  automatically see the benefit.
 
 And, by the same token, the scope for possibly breaking clients is nearly
 unlimited ...

Why is that?  Are there programs out there that expect PQcmdTuples() to
return something that is *not* the tuple count for these commands and
will violently misbehave otherwise?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updateMinRecoveryPoint bug?

2009-12-28 Thread Simon Riggs
On Mon, 2009-12-28 at 14:40 +0200, Heikki Linnakangas wrote:
 Fujii Masao wrote:

  How about always resetting ControlFile->minRecoveryPoint to {0, 0} at the
  beginning of a crash recovery, to fix the bug?
 
 Yeah, that would work. I think it would be better to clear it in
 CreateCheckPoint(), though, when we set the pointer to the new
 checkpoint. That includes the shutdown checkpoint created at the end of
 archive recovery. minRecoveryPoint should never be set during normal
 operation, after all.

Sounds better.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On mån, 2009-12-28 at 11:08 -0500, Tom Lane wrote:
 And, by the same token, the scope for possibly breaking clients is nearly
 unlimited ...

 Why is that?  Are there programs out there that expect PQcmdTuples() to
 return something that is *not* the tuple count for these commands and
 will violently misbehave otherwise?

It's more the possibility of doing strcmp(tag, "SELECT") on the command
tag that worries me.  Describing the API change here as being limited
to PQcmdTuples misses the point rather completely: this is a protocol
change, and could break both clients and non-libpq driver libraries.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing pg_migrator limitations

2009-12-28 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, Dec 28, 2009 at 10:48 AM, Bruce Momjian br...@momjian.us wrote:
  Robert Haas wrote:
  On Sun, Dec 27, 2009 at 2:16 PM, Robert Haas robertmh...@gmail.com wrote:
   On Sun, Dec 27, 2009 at 9:53 AM, Bruce Momjian br...@momjian.us wrote:
   Bruce Momjian wrote:
   There are several pg_migrator limitations that appeared late in the 8.4
    development cycle and were impossible to fix at that point.  I would
    like to fix them for Postgres 8.5:
   
            o  a user-defined composite data type
            o  a user-defined array data type
            o  a user-defined enum data type
   
    FYI, these pg_migrator restrictions are now gone when migrating to PG
    8.5, even _from_ PG 8.3.
   
    Wow, cool.  That seems like a good step forward.
  
   It appears that the pg_migrator README needs a bit of revision to make
   it more clear which limitations apply to migration between which
   versions.  In particular, the current wording suggests that NONE of
   the limitations apply to 8.3 -> 8.5 migrations, which is not the case
  - e.g. we haven't done anything about the need to rebuild certain
  types of indices.
 
  Very true. I have just made a new pg_migrator release with an updated
  README file.
 
 Ah, cool.  So this seems to imply that a migration from 8.4 to 8.5
 should be clear sailing.  Is that correct?

Yes, so far.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New VACUUM FULL

2009-12-28 Thread Robert Haas
On Tue, Dec 22, 2009 at 7:29 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2009-12-22 at 19:45 +0900, Takahiro Itagaki wrote:

 I used VACUUM FULL because we were discussing dropping VFI completely,
 but I won't replace the behavior if hot-standby can support VFI.

 HS can't support VFI now, by definition. We agreed to spend the time
 getting rid of VFI, which working on this with you is part of.

 If we can just skip the index rebuild, I think that's all the additional
 code changes we need. I'll improve the docs as I review-to-commit.

So, what is the roadmap for getting this done?  It seems like to get
rid of VFI completely, we would need to implement something like what
Tom described here:

http://archives.postgresql.org/pgsql-hackers/2009-09/msg00249.php

I'm not sure whether the current patch is a good intermediate step
towards that ultimate goal, or whether events have overtaken it.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql 8.4 \c repeats version banner

2009-12-28 Thread Peter Eisentraut
In 8.3, running \c from a file prints something like

You are now connected to database "postgres".

In 8.4 it prints

psql (8.4.1)
You are now connected to database "postgres".

Is it intentional/sensible to repeat the startup banner every time the
connection changes, or was this unintentionally introduced while the
startup banner was reshuffled in 8.4?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Tom Lane
Boszormenyi Zoltan z...@cybertec.at writes:
 attached is a small patch that makes it possible for clients
 to receive row count for SELECT ... INTO ... and CREATE TABLE ... AS ...

 Comments?

This doesn't look tremendously well thought out to me.

1. As given, the patch changes the result not only for SELECT INTO but
for any SELECT executed in PORTAL_MULTI_QUERY context (consider SELECTs
added by rules for example).  It seems like a pretty bad idea for the
result of a statement to depend on context.

2. In the past we have regretted it when we made the same command tag
sometimes have numbers attached and sometimes not (note the hack at
the bottom of PortalRunMulti).  It doesn't seem like terribly good
design to do that here.  On the other hand, always attaching a count
to SELECT tags would greatly increase the risk of breaking clients.


I'm not at all convinced that this is so useful as to justify taking
any compatibility risks for.  People who really need that count can
get it easily enough by breaking the command into a CREATE followed
by INSERT/SELECT.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql 8.4 \c repeats version banner

2009-12-28 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 In 8.3, running \c from a file prints something like
 You are now connected to database "postgres".

 In 8.4 it prints

 psql (8.4.1)
 You are now connected to database "postgres".

 Is it intentional/sensible to repeat the startup banner every time the
 connection changes, or was this unintentionally introduced while the
 startup banner was reshuffled in 8.4?

I think the argument was that if you are connecting to a new server, the
server version number could change, and so it is useful to repeat that
line to have a place to display the possible version mismatch indicator.
Maybe we could suppress it if the third and fourth arguments are
omitted, but I'm not sure it's worth the trouble.

The original design didn't have the third and fourth arguments, hence no
possibility of server change.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Serializable implementation

2009-12-28 Thread Kevin Grittner
For perspective:  Since our current application framework is about
ten years old now, the Wisconsin State Courts System has put
together a Long-Term Application Development Group to review all
aspects of our development and production runtime environments.  We
started by reviewing various aspect of these environments and
addressing the question: Ignoring for the moment how we would get
there, where would we most like to be four years from now?
 
One of the elements on the list was our database environment.  We're
very happy with PostgreSQL except for one thing -- the lack of
support for serializable transaction isolation means that no
integrity rules can be relied upon except those implemented within
the database engine itself (such as foreign key constraints).  Given
the complexity of the schema, the number of programmers, and the
number of queries -- any attempt to address the problem by looking
at particular interactions between particular queries and using
explicit locking would be hard to distinguish from completely
useless.
 
Serializable transaction isolation is attractive for shops with
active development by many programmers against a complex schema
because it guarantees data integrity with very little staff time --
if a transaction can be shown to always do the right thing when it
is run alone (before or after any other transaction), it will always
do the right thing in any mix of concurrent serializable
transactions.  Period.
 
I'm going to try to ignore any suggestions that no other database
product has implemented this or that it's not useful.  While it
carries a performance cost which makes it untenable in some
situations, and isn't needed when you have only a handful of
programmers developing against only 100 tables, it has its place
and is supported in every major database product I know *except* for
PostgreSQL.
 
We would like to relax our portability mandate with regard to the
database layer, and use more PostgreSQL-specific features, but are
reluctant to do so while PostgreSQL remains virtually alone in
lacking support for actual serializable transactions.  (Other MVCC
databases, like Oracle and recent versions of Microsoft SQL Server
provide snapshot isolation, but also support serializable
isolation.)  Given the benefits of using the PostgreSQL-specific
features, it appears that we might be ahead by implementing support
for serializable transaction isolation so that we can relax
portability requirements and thereby avoid developing, in our
framework, our own (portable) implementation of features available
in native PostgreSQL.
 
Between staff and contractors we have over 20 programmers working
here.  We would probably draw on that base for some of the work. 
Given the nature of the work, we might be able to find a CS grad
student somewhere who would be interested in contributing.  If there
were any contractors with PostgreSQL development experience
registered with the state's procurement agency[1], we would be very
likely to draw on them in the effort.
 
I would like to start new threads for any technical discussions --
what I want to discuss here is how to approach this in a way that
avoids the SE-PostgreSQL doldrums.  I know from some responses to
previous discussions of the issue, both on and off list, that there
are other PostgreSQL users who want this, so at a minimum we would
make our patch set available to others; but it would certainly be a
benefit to both us (in avoiding re-basing the patch for every major
release, as well as having extra eyes on the code), and to others
(in having it just work on installation, without applying a
third-party patch set) if we can do this right.
 
One thing which strikes me is that there are at least three fairly
well documented techniques for implementing true serializable
transaction isolation under MVCC:
 
(1)  Strict 2 Phase Locking (S2PL).  This is the most widely used,
by far, and probably best understood by developers.  It has the
disadvantage of creating so much blocking under some loads that it
isn't always feasible.
 
(2)  Optimistic Concurrency Control (OCC).  This generates almost no
overhead in some DBMSs under some read-mostly loads (although I'm
not sure whether that would be true in PostgreSQL), but degenerates
worse than S2PL under many loads with higher levels of write
contention.
 
(3)  Serializable Snapshot Isolation (SSI).  This relies mostly on
snapshot isolation techniques, with detection of possible conflicts
with low overhead.  It is a very new technique, appearing in the
literature less than two years ago, and only implemented so far in
two prototypes: Oracle Berkeley-DB and InnoDB.  Under many loads
performance is very close to Snapshot Isolation (SI), but the
dynamics aren't too well known yet, and there may be some lurking
pessimal cases not yet identified.  Serialization errors are higher
than in SI or S2PL, so it should only be used where the software is
prepared to deal with serialization errors in a universal 

Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Boszormenyi Zoltan
Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
   
 On mån, 2009-12-28 at 11:08 -0500, Tom Lane wrote:
 
 And, by the same token, the scope for possibly breaking clients is nearly
 unlimited ...
   

   
 Why is that?  Are there programs out there that expect PQcmdTuples() to
 return something that is *not* the tuple count for these commands and
 will violently misbehave otherwise?
 

 It's more the possibility of doing strcmp(tag, "SELECT") on the command
   

Actually it's strncmp(tag, "SELECT ", 7), so when you mix an old server
with new clients or a new server with old clients, it will just work as
before, i.e. return "".

 tag that worries me.  Describing the API change here as being limited
 to PQcmdTuples misses the point rather completely: this is a protocol
 change, and could break both clients and non-libpq driver libraries.

   regards, tom lane

   


-- 
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ExecTidReScan exprCtxt

2009-12-28 Thread Robert Haas
In ExecTidReScan, we have the following:

/* If we are being passed an outer tuple, save it for runtime
key calc */
if (exprCtxt != NULL)
node->ss.ps.ps_ExprContext->ecxt_outertuple =
exprCtxt->ecxt_outertuple;

Is this dead code?  I have been pouring through all the callers of
ExecReScan() and AFAICT the only place where we use an exprCtxt that
is neither pushed down from a higher-level executor node nor NULL is
in ExecNestLoop().  So I think that the only reason why we would need
this if we supported a nestloop with an inner tidscan.  But I don't
think we do, nor do I see an obvious application for such a thing.  A
romp through CVS history shows we have had some variant of this code
in ExecTidReScan since tid-scans were originally added, which makes me
suspicious that there was some point to this at least at one time, but
I haven't been able to figure out what it is/was.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecTidReScan exprCtxt

2009-12-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 In ExecTidReScan, we have the following:
 /* If we are being passed an outer tuple, save it for runtime
 key calc */
 if (exprCtxt != NULL)
  node->ss.ps.ps_ExprContext->ecxt_outertuple =
  exprCtxt->ecxt_outertuple;

 Is this dead code?

I don't think it's reachable at the moment, but we do have interest
in allowing joins using the TID value.  See for example
http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php
http://archives.postgresql.org/pgsql-performance/2007-04/msg00231.php
So I wouldn't be in favor of removing it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Tom Lane
Boszormenyi Zoltan z...@cybertec.at writes:
 Tom Lane wrote:
  It's more the possibility of doing strcmp(tag, "SELECT") on the command

  Actually it's strncmp(tag, "SELECT ", 7), so when you mix an old server
  with new clients or a new server with old clients, it will just work as
  before, i.e. return "".

Are you deliberately ignoring the point?  We have no way to know whether
there is any client-side code that's doing a simple check for SELECT
command tag, but it's certainly possible.  The fact that it wouldn't be
hard to fix does not mean that it wouldn't be broken.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Admission Control Policy

2009-12-28 Thread Kevin Grittner
This paper has a brief but interesting discussion of Admission
Control in section 2.4:
 
Architecture of a Database System. (Joseph M. Hellerstein, Michael
Stonebraker and James Hamilton). Foundations and Trends in Databases
1(2).
 
http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
 
They describe a two-tier approach, where the first tier is already
effectively implemented in PostgreSQL with the max_connections and
superuser_reserved_connections GUCs.  The second tier is implemented
to run after a plan is chosen, and may postpone execution of a query
(or reduce the resources it is allowed) if starting it at that time
might overload available resources.  I think that implementing
something like this could potentially help with several types of
problems.
 
We often see posts from people who have more active connections than
is efficient.  We could, for example, have a policy which queues
query requests which are *not* from a superuser and not part of a
transaction which has acquired a snapshot or any locks, if the
number of active transactions is above a certain threshold.  Proper
configuration of a policy like this might change the performance
graph to stay relatively steady past the knee rather than
degrading.
 
We occasionally see posts where people have exhausted available
RAM and suffered a severe performance hit or a crash, due to an
excessively high setting of work_mem or maintenance_work_mem.
A good policy might warn and reduce the setting or reschedule
execution to keep things from getting too out of hand.
 
A good policy might also reduce conflicts between transactions,
making stricter transaction isolation less painful.  While this
observation motivated me to think about it, it seems potentially
useful on its own.
 
It might perhaps make sense to provide some hook to allow custom
policies to supplement or override a simple default policy.
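
As a strawman only, such a hook point might look like this (every name
here is invented; nothing like it exists today):

    #include "postgres.h"
    #include "nodes/plannodes.h"

    /* Possible decisions an admission policy could return. */
    typedef enum
    {
        ADMIT_RUN,      /* execute immediately */
        ADMIT_QUEUE,    /* postpone until resources free up */
        ADMIT_REJECT    /* refuse outright */
    } AdmissionDecision;

    /* Called after planning, before execution, so a custom policy can
     * look at estimated cost/memory and current system activity. */
    typedef AdmissionDecision (*admission_policy_hook_type) (PlannedStmt *stmt);

    extern admission_policy_hook_type admission_policy_hook;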
 
Thoughts?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Port for Microsoft Services for Unix (SFU) or SUA

2009-12-28 Thread Josh Rovero
Has anyone built postgresql (or just libpq.a) on Windows SFU/SUA?
Would prefer to not reinvent any wheels

We have a number of Unix/Linux applications that are also compiled
under Microsoft Windows SFU 3.5.  We need to have a SFU-compatible
libpq.a, not a complete install.  The regular Unix/Linux or
Win32 postgresql database runs fine, these apps just need to connect.  

The system has a gnu build environment using the utilites
from Interop Systems at http://www.suacommunity.com/SUA.aspx.
The complete toolset has gcc, gmake, etc.

The OS reports itself (uname -a) as:
Interix zonda 3.5 SP-8.0.1969.1 x86 AMD_x86_Family15_Model72_Stepping2

I have run into and fixed a few issues, like INADDR_LOOPBACK not being
defined in the Interix /usr/include/netinet/in.h, adding dummy 
entries for src/include/port/interix.h, src/template/interix,
src/backend/port/dynloader/interix.[c|h], etc.

./configure runs, and gcc compiles things fine up until
the first ar, where I see "ar: illegal option -- g".

Thanks in advance,
-- 
P.J. Josh Rovero                Vice President       Sonalysts, Inc.
Email:  rov...@sonalysts.com    www.sonalysts.com    215 Parkway North
Work:   (860)326-3671                                Waterford, CT 06385



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Application name patch - v3

2009-12-28 Thread Guillaume Lelarge
On 28/12/2009 17:06, Tom Lane wrote:
 Guillaume Lelarge guilla...@lelarge.info writes:
 Le 28/12/2009 10:07, Dave Page a écrit :
 Yes, still waiting on the new API.
 
 Is there something I can do to make this move forward?
 
 I think we were stalled on the question of whether to use one array
 or two parallel arrays.  Do you want to try coding up a sample usage
 of each possibility so we can see which one seems more useful?
 

I'm interested in working on this. But I don't find the thread that talks
about this. I feel pretty dumb, but I re-read every mail on the "Application
name patch - v2", "Application name patch - v3", and "Application name
patch - v4" threads. I also re-read the "Client application name"
thread. The only mail I see that relates to the new API is the one from
Dave (the one I answered today).

So, can someone point me to the thread that deals with this new
array-based libpq connect API? or can someone explain it to me?

Thanks.


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Dimitri Fontaine
Hi,

On 28 Dec 2009, at 21:33, Kevin Grittner wrote:
 We often see posts from people who have more active connections than
 is efficient.

How would your proposal better solve the problem than using pgbouncer?

<mad proposal time>
I'd be in favor of considering how to get pgbouncer into -core, and now that we
have Hot Standby maybe implement a mode in which, as soon as a real XID is
needed, or maybe upon receiving a start transaction read write command, the
connection is handed transparently to the master.
</mad proposal time>

Regards,
-- 
dim
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Andres Freund
On Monday 28 December 2009 22:39:06 Dimitri Fontaine wrote:
 Hi,
 
 On 28 Dec 2009, at 21:33, Kevin Grittner wrote:
  We often see posts from people who have more active connections than
  is efficient.
 
 How would your proposal better solve the problem than using pgbouncer?
 
 <mad proposal time>
 I'd be in favor of considering how to get pgbouncer into -core, and now
  that we have Hot Standby maybe implement a mode in which, as soon as a
  real XID is needed, or maybe upon receiving a start transaction read write
  command, the connection is handed transparently to the master. </mad proposal time>
That's not as easy as it sounds - the master may not have all data needed by 
the snapshot on the slave anymore.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Dimitri Fontaine
On 28 Dec 2009, at 22:46, Andres Freund wrote:
 <mad proposal time>
 I'd be in favor of considering how to get pgbouncer into -core, and now
 that we have Hot Standby maybe implement a mode in which, as soon as a
 real XID is needed, or maybe upon receiving a start transaction read write
 command, the connection is handed transparently to the master. </mad proposal time>

 That's not as easy as it sounds - the master may not have all data needed by 
 the snapshot on the slave anymore.

I suppose that if it was easy some patch would already be around for next 
commit fest? :)

Seriously, your point is why I'd be tempted to only consider getting to the 
master at transaction starting time. That is before any snapshot is taken.

Regards,
-- 
dim
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Application name patch - v3

2009-12-28 Thread Tom Lane
Guillaume Lelarge guilla...@lelarge.info writes:
 On 28/12/2009 17:06, Tom Lane wrote:
 I think we were stalled on the question of whether to use one array
 or two parallel arrays.  Do you want to try coding up a sample usage
 of each possibility so we can see which one seems more useful?

 I'm interested in working on this. But I don't find the thread that talks
 about this.

Try here
http://archives.postgresql.org/message-id/4aae8ccf.9070...@esilo.com

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Kevin Grittner
Dimitri Fontaine dfonta...@hi-media.com wrote:
 On 28 Dec 2009, at 21:33, Kevin Grittner wrote:
 We often see posts from people who have more active connections
 than is efficient.
 
 How would your proposal better solve the problem than using
 pgbouncer?
 
With my current knowledge of pgbouncer I can't answer that
definitively; but *if* pgbouncer, when configured for transaction
pooling, can queue new transaction requests until a connection is
free, then the differences would be:
 
(1)  According to pgbouncer documentation, transaction pooling is "a
hack as it breaks application expectations of backend connection.
You can use it only when application cooperates with such usage by
not using features that can break."  This would not be an issue with
an ACP.
 
(2)  For the active connection aspect of the policy, you could let
through superuser requests while other requests were queuing.
 
(3)  With the ACP, the statements would be parsed and optimized
before queuing, so they would be ready to execute as soon as a
connection was freed.
 
(4)  Other factors than active connection count could be applied,
like expected memory consumption, or more esoteric metrics.
 
In favor of pgbouncer (or other connection poolers) they don't
require the overhead of a process and connection for each idle
connection, so I would recommend a connection pooler even with an
ACP.  They cover overlapping ground, but I see them as more
complementary than competing.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Dimitri Fontaine
On 28 Dec 2009, at 22:59, Kevin Grittner wrote:
 With my current knowledge of pgbouncer I can't answer that
 definitively; but *if* pgbouncer, when configured for transaction
 pooling, can queue new transaction requests until a connection is
 free, then the differences would be:

It does that, yes. You setup a pool, which is per database/user, and when 
there's no more server side connection in the pool, the clients are held in 
cl_waiting state.

 (1)  According to pgbouncer documentation, transaction pooling is "a
 hack as it breaks application expectations of backend connection.
 You can use it only when application cooperates with such usage by
 not using features that can break."  This would not be an issue with
 an ACP.

That's why there's both transaction and session pooling. The benefit of session 
pooling is to avoid forking backends, reusing them instead, and you still get 
the pooling control.

 (2)  For the active connection aspect of the policy, you could let
 through superuser requests while other requests were queuing.

superuser is another user and gets its own pool, I'm not sure if you can size 
it differently though (yet). It's possible to trick a little by defining 
another (virtual) database where you force the user in the connection string to 
the server, then tell your application to use this special database.

 (3)  With the ACP, the statements would be parsed and optimized
 before queuing, so they would be ready to execute as soon as a
 connection was freed.

There's a pgfoundry project called preprepare, which can be used along with 
pgbouncer to get this effect. If you use 8.4, you can even get the effect 
without pgbouncer.

  http://preprepare.projects.postgresql.org/README.html

 (4)  Other factors than active connection count could be applied,
 like expected memory consumption, or more esoteric metrics.

All you can put in connection strings or per-role settings can be used to trick 
a virtual database and have it pre-set, but that means different pools (they 
accumulate, now) and different connection strings for the application. The only 
advantage is that it works with released and proven code! (except for 
preprepare... well I've been told it's running in production somewhere)

 In favor of pgbouncer (or other connection poolers) they don't
 require the overhead of a process and connection for each idle
 connection, so I would recommend a connection pooler even with an
 ACP.  They cover overlapping ground, but I see them as more
 complementary than competing.

Yeah, just trying to understand what you're proposing in terms of what I 
already know :)
-- 
dim


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Kevin Grittner
Dimitri Fontaine dfonta...@hi-media.com wrote:
 
 That's why there's both transaction and session pooling. The
 benefit of session pooling is to avoid forking backends, reusing
 them instead, and you still get the pooling control.
 
So the application would need to open and close a pgbouncer
connection for each database transaction in order to share the
backend properly?
 
 (4)  Other factors than active connection count could be applied,
 like expected memory consumption, or more esoteric metrics.
 
 All you can put in connection strings or per-role setting can be
 used to trick a virtual database and have it pre-set, but that
 means different pools (they accumulate, now) and different
 connection strings for the application. 
 
Well, I don't know that you can very accurately predict a plan or
what its memory usage would be.  Trying to work out all permutations
in advance and send each query to the right pool doesn't seem
workable on a large scale.
 
If we had a pooler bundled into the backend and defaulted to a
halfway reasonable configuration, it's possible that implementing an
active connection limit as the second-tier ACP would be covering close
enough to the same ground as to be redundant.  I'm not quite
convinced, however, that your proposed use of pgbouncer for this,
given the multiple pools which would need to be configured and the
possible application awareness and cooperation with policy would be
better than a fairly simple ACP.  It seems a bit like driving nails
with a wrench.  I like wrenches, I use them to turn things, but I
don't like using them to drive nails when I can help it.  :-)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Stats for inheritance trees

2009-12-28 Thread Tom Lane
Following up on the discussion here
http://archives.postgresql.org/message-id/4b3875c602250002d...@gw.wicourts.gov
I'd like to propose making the following changes that would allow saner
planning for queries involving inheritance:

1. Currently the primary key of pg_statistic is (starelid, staattnum)
indicating the table and column the stats entry is for.  I propose
adding a bool stainherit to the pkey.  false means the stats entry
is for just that table column, ie, the traditional interpretation.
true means the stats entry covers that column and all its inheritance
children.  Such entries could be used directly by the planner in cases
where it currently punts and delivers a default estimate.

2. When ANALYZE is invoked on a table that has inheritance children,
it will perform its normal duties for just that table (creating or
updating entries with stainherit = false) and then perform a second
scan that covers that table and all its children.  This will be used
to create or update entries with stainherit = true.  It might be
possible to avoid scanning the parent table itself twice, but I won't
contort the code too much to avoid that, since in most practical
applications the parent is empty or small anyway.

3. Ideally autovacuum would know enough to perform ANALYZEs on
inheritance parents after enough churn has occurred in their child
table(s).  I am not entirely clear about a good way to do that.
We could have it just directly force an ANALYZE on parent(s) of any
table it has chosen to ANALYZE, but that might be overkill --- in
particular leading to excess ANALYZEs when several children receive
a lot of updates.

Even without a really smart solution to #3, this would be a big step
forward for inheritance queries.
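
To make point 1 concrete, the planner-side lookup might become a
three-key search along these lines (the cache name, three-key form, and
helper here are assumptions, shown only to illustrate the key change):

    #include "postgres.h"
    #include "utils/syscache.h"

    /* Sketch: fetch stats for (rel, attr), inherited or not. */
    static HeapTuple
    get_att_stats(Oid relid, AttrNumber attnum, bool inh)
    {
        return SearchSysCache3(STATRELATTINH,
                               ObjectIdGetDatum(relid),
                               Int16GetDatum(attnum),
                               BoolGetDatum(inh));  /* false = table only,
                                                     * true = whole tree */
    }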

BTW, while at it I'm inclined to add a non-unique index on
pg_inherits.inhparent, so that find_inheritance_children won't have to
seqscan pg_inherits anymore.  It's surprising people haven't complained
about that before.  The code says

 * XXX might be a good idea to create an index on pg_inherits' inhparent
 * field, so that we can use an indexscan instead of sequential scan here.
 * However, in typical databases pg_inherits won't have enough entries to
 * justify an indexscan...

but we've long since learned that people stress databases in odd ways.
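
The declaration would presumably follow the existing catalog-index
conventions, roughly like this sketch (the OID is a placeholder and the
macro form is an assumption):

    /* in src/include/catalog/indexing.h, roughly: */
    DECLARE_INDEX(pg_inherits_parent_index, 2187,
                  on pg_inherits using btree(inhparent oid_ops));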

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Saturday 12 December 2009 21:38:41 Andres Freund wrote:
 On Saturday 12 December 2009 21:36:27 Michael Clemmons wrote:
  If ppl think its worth it I'll create a ticket
 Thanks, no need. I will post a patch tomorrow or so.
Well. It was a long day...

Anyway.
In this patch I delay the fsync done in copy_file and simply do a second pass 
over the directory in copy_dir and fsync everything in that pass.
Including the directory - which was not done before and actually might be 
necessary in some cases.
I added a posix_fadvise(..., FADV_DONTNEED) to make it more likely that the 
copied file reaches storage before the fsync. Without it, the speed benefits were 
quite a bit smaller and essentially random (which seems sensible).

This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s on my 
laptop.  Still slower than with fsync off (~0.25s) but quite a worthy 
improvement.

The benefits are obviously bigger if the template database includes anything 
added.
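
In outline, the second pass amounts to something like this simplified,
single-level sketch (error handling and recursion elided; the actual
patch also issues the posix_fadvise() hint right after each copy so
writeback can start before this pass):

    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Sketch: fsync every file in 'dir', then the directory itself,
     * so both file contents and the new directory entries are durable. */
    static void
    fsync_dir_contents(const char *dir)
    {
        DIR           *d = opendir(dir);
        struct dirent *de;
        char           path[4096];
        int            fd;

        if (d == NULL)
            return;
        while ((de = readdir(d)) != NULL)
        {
            if (de->d_name[0] == '.')
                continue;
            snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
            if ((fd = open(path, O_RDONLY)) >= 0)
            {
                fsync(fd);
                close(fd);
            }
        }
        closedir(d);
        if ((fd = open(dir, O_RDONLY)) >= 0)
        {
            fsync(fd);          /* the directory itself */
            close(fd);
        }
    }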


Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Dimitri Fontaine
On 28 Dec 2009, at 23:35, Kevin Grittner wrote:
 So the application would need to open and close a pgbouncer
 connection for each database transaction in order to share the
 backend properly?

No, in session pooling you get the same backend connection for the entire 
pgbouncer connection, it's a 1-1 mapping.

 Well, I don't know that you can very accurately predict a plan or
 what its memory usage would be.  Trying to work out all permutations
 in advance and send each query to the right pool doesn't seem
 workable on a large scale.

True. I was just trying to see what components we already have, while you're 
explaining what's missing: teamwork? :)

 If we had a pooler bundled into the backend and defaulted to a
 halfway reasonable configuration, it's possible that implementing an
 active connection limit the second tier ACP would be covering close
 enough to the same ground as to be redundant.  I'm not quite
 convinced, however, that your proposed use of pgbouncer for this,
 given the multiple pools which would need to be configured and the
 possible application awareness and cooperation with policy would be
 better than a fairly simple ACP.  It seems a bit like driving nails
 with a wrench.  I like wrenches, I use them to turn things, but I
 don't like using them to drive nails when I can help it.  :-)

Hehe, pushing what we already have to their limits is often a nice way to 
describe what we want but still don't have... I think...
-- 
dim


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Kevin Grittner
Dimitri Fontaine dfonta...@hi-media.com wrote:
 On 28 Dec 2009, at 22:59, Kevin Grittner wrote:
 
 (3)  With the ACP, the statements would be parsed and optimized
 before queuing, so they would be ready to execute as soon as a
 connection was freed.
 
 There's a pgfoundry project called preprepare, which can be used
 along with pgbouncer to get this effect. If you use 8.4, you can
 even get the effect without pgbouncer.
 
   http://preprepare.projects.postgresql.org/README.html
 
I just reviewed the documentation for preprepare -- I can see a use
case for that, but I really don't think it has a huge overlap with
my point.  The parsing and planning mentioned in my point 3 would
apply to any query -- ad hoc, generated by an ORM, etc.  The
preprepare project seems to be a way to create persistent prepared
statements which are automatically materialized upon connection.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Dimitri Fontaine
On 28 Dec 2009, at 23:56, Kevin Grittner wrote:
  http://preprepare.projects.postgresql.org/README.html
 
 I just reviewed the documentation for preprepare -- I can see a use
 case for that, but I really don't think it has a huge overlap with
 my point.  The parsing and planning mentioned in my point 3 would
 apply to any query -- ad hoc, generated by an ORM, etc.  The
 preprepare project seems to be a way to create persistent prepared
 statements which are automatically materialized upon connection.

Just that, right.
-- 
dim
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Monday 28 December 2009 23:54:51 Andres Freund wrote:
 On Saturday 12 December 2009 21:38:41 Andres Freund wrote:
  On Saturday 12 December 2009 21:36:27 Michael Clemmons wrote:
   If ppl think its worth it I'll create a ticket
 
  Thanks, no need. I will post a patch tomorrow or so.
 
 Well. It was a long day...
 
 Anyway.
 In this patch I delay the fsync done in copy_file and simply do a second
  pass over the directory in copydir and fsync everything in that pass.
 Including the directory - which was not done before and actually might be
 necessary in some cases.
 I added a posix_fadvise(..., FADV_DONTNEED) to make it more likely that the
 copied file reaches storage before the fsync. Without it the speed benefits
  were quite a bit smaller and essentially random (which seems sensible).
 
 This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s on
  my laptop.  Still slower than with fsync off (~0.25) but quite a worthy
  improvement.
 
 The benefits are obviously bigger if the template database includes
  anything added.
Obviously the patch would be helpful.

Andres
From bd80748883d1328a71607a447677b0bfb1f54ab0 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 28 Dec 2009 23:43:57 +0100
Subject: [PATCH] Delay fsyncing files during copying in CREATE DATABASE - this
 dramatically speeds up CREATE DATABASE on non-battery-backed
 rotational storage.
 Additionally fsync() the directory to ensure all metadata reaches
 storage.

---
 src/port/copydir.c |   58 +--
 1 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/src/port/copydir.c b/src/port/copydir.c
index a70477e..cde3dc7 100644
*** a/src/port/copydir.c
--- b/src/port/copydir.c
***
*** 37,42 
--- 37,43 
  
  
  static void copy_file(char *fromfile, char *tofile);
+ static void fsync_fname(char *fname);
  
  
  /*
*** copydir(char *fromdir, char *todir, bool
*** 64,69 
--- 65,73 
  (errcode_for_file_access(),
   errmsg("could not open directory \"%s\": %m", fromdir)));
  
+ 	/*
+ 	 * Copy all the files
+ 	 */
  	while ((xlde = ReadDir(xldir, fromdir)) != NULL)
  	{
  		struct stat fst;
*** copydir(char *fromdir, char *todir, bool
*** 89,96 
  		else if (S_ISREG(fst.st_mode))
  			copy_file(fromfile, tofile);
  	}
- 
  	FreeDir(xldir);
  }
  
  /*
--- 93,120 
  		else if (S_ISREG(fst.st_mode))
  			copy_file(fromfile, tofile);
  	}
  	FreeDir(xldir);
+ 
+ 	/*
+ 	 * Be paranoid here and fsync all files to ensure we catch problems.
+ 	 */
+ 	xldir = AllocateDir(fromdir);
+ 	if (xldir == NULL)
+ 		ereport(ERROR,
+ (errcode_for_file_access(),
+  errmsg("could not open directory \"%s\": %m", fromdir)));
+ 
+ 	while ((xlde = ReadDir(xldir, fromdir)) != NULL)
+ 	{
+ 		struct stat fst;
+ 
+ 		if (strcmp(xlde->d_name, ".") == 0 ||
+ 			strcmp(xlde->d_name, "..") == 0)
+ 			continue;
+ 
+ 		snprintf(tofile, MAXPGPATH, "%s/%s", todir, xlde->d_name);
+ 		fsync_fname(tofile);
+ 	}
  }
  
  /*
*** copy_file(char *fromfile, char *tofile)
*** 150,162 
  	}
  
  	/*
! 	 * Be paranoid here to ensure we catch problems.
  	 */
! 	if (pg_fsync(dstfd) != 0)
! 		ereport(ERROR,
! (errcode_for_file_access(),
!  errmsg("could not fsync file \"%s\": %m", tofile)));
! 
  	if (close(dstfd))
  		ereport(ERROR,
  (errcode_for_file_access(),
--- 174,185 
  	}
  
  	/*
! 	 * We tell the kernel here to write the data back in order to make
! 	 * the later fsync cheaper.
  	 */
! #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
! 	posix_fadvise(dstfd, 0, 0, POSIX_FADV_DONTNEED);
! #endif
  	if (close(dstfd))
  		ereport(ERROR,
  (errcode_for_file_access(),
*** copy_file(char *fromfile, char *tofile)
*** 166,168 
--- 189,212 
  
  	pfree(buffer);
  }
+ 
+ /*
+  * fsync a file
+  */
+ static void
+ fsync_fname(char *fname)
+ {
+ 	int	fd = BasicOpenFile(fname, O_RDWR | PG_BINARY,
+ 		  S_IRUSR | S_IWUSR);
+ 
+ 	if (fd < 0)
+ 		ereport(ERROR,
+ (errcode_for_file_access(),
+  errmsg("could not open file \"%s\": %m", fname)));
+ 
+ 	if (pg_fsync(fd) != 0)
+ 		ereport(ERROR,
+ (errcode_for_file_access(),
+  errmsg("could not fsync file \"%s\": %m", fname)));
+ 	close(fd);
+ }
-- 
1.6.5.12.gd65df24


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Application name patch - v3

2009-12-28 Thread Guillaume Lelarge
On 28/12/2009 22:59, Tom Lane wrote:
 Guillaume Lelarge guilla...@lelarge.info writes:
 On 28/12/2009 17:06, Tom Lane wrote:
 I think we were stalled on the question of whether to use one array
 or two parallel arrays.  Do you want to try coding up a sample usage
 of each possibility so we can see which one seems more useful?
 
 I'm interested in working on this. But I can't find the thread that talks
 about this.
 
 Try here
 http://archives.postgresql.org/message-id/4aae8ccf.9070...@esilo.com
 

Thanks. I've read all of the "new version of PQconnectdb" and "Determining
client_encoding from client locale" threads. I think I understand the
goal. Still need to re-read this one
(http://archives.postgresql.org/message-id/6222.1253734...@sss.pgh.pa.us) and
completely understand it (will probably need to look at the code, at
least the PQconnectdb one). But I'm definitely working on this.


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s on my
 laptop.  Still slower than with fsync off (~0.25) but quite a worthy 
 improvement.

I can't help wondering whether that's real or some kind of
platform-specific artifact.  I get numbers more like 3.5s (fsync off)
vs 4.5s (fsync on) on a machine where I believe the disks aren't lying
about write-complete.  It makes sense that an fsync at the end would be
a little bit faster, because it would give the kernel some additional
freedom in scheduling the required I/O, but it isn't cutting the total
I/O required at all.  So I find it really hard to believe a 10x speedup.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Kevin Grittner
Dimitri Fontaine dfonta...@hi-media.com wrote:
 
 No, in session pooling you get the same backend connection for the
 entire pgbouncer connection, it's a 1-1 mapping.
 
Right -- so it doesn't allow more logical connections than that, with
a limit on how many are active at any one time, *unless* the clients
cooperate by closing their connections between transactions --
effectively requiring a client yield to accomplish what an ACP
could do without special client cooperation.
 
 Well, I don't know that you can very accurately predict a plan or
 what its memory usage would be.  Trying to work out all
 permutations in advance and send each query to the right pool
 doesn't seem workable on a large scale.
 
 True. I was just trying to see what components we already have,
 while you're explaining what's missing: teamwork? :)
 
It would take a lot more than teamwork to accurately predict those
things.  Particularly in an environment with a large number of
dynamically generated queries.
 
 pushing what we already have to their limits is often a nice way
 to describe what we want but still don't have...
 
Sure, and I'm a big fan of building things from proven smaller
pieces where possible.  Like with Linux utilities (grep, sed, awk,
find, xargs). I just think that in this case a connection pool is
complementary and doesn't fit into the solution to these particular
problems very well.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable implementation

2009-12-28 Thread Jeff Davis
On Mon, 2009-12-28 at 11:54 -0600, Kevin Grittner wrote:
 Serializable transaction isolation is attractive for shops with
 active development by many programmers against a complex schema
 because it guarantees data integrity with very little staff time --

I would like to see true serializability in some form as well.

 Given that each of these would be the best choice for some
 transaction mixes, it might make sense to allow some mapping of the
 four ANSI transaction isolation levels to strategies for
 implementation.  At the risk of generating some backlash on this
 particular point, it might allow a Read Committed implementation
 which avoids some of the current anomalies, as a user-selectable
 alternative to the current implementation.  As a hypothetical
 example, one might map the ANSI Read Uncommitted mode to what
 PostgreSQL currently calls Read Committed, Read Committed to a
 get-a-new-snapshot strategy, Repeatable Read to SI, and Serializable
 to SSI.  (Why do I feel like now would be a good time to duck?)

I like the idea of moving toward using serializable for true
serializability, and repeatable read for snapshot isolation (perhaps
with a compatibility GUC existing for a while to get the old behavior).

However, I don't know what you mean by a "get-a-new-snapshot" strategy or
how it is different from the current read committed behavior. We
obviously want to be careful changing the default isolation level's
behavior.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable implementation

2009-12-28 Thread Florian G. Pflug

On 28.12.09 18:54, Kevin Grittner wrote:

To give some idea of the scope of development, Michael Cahill added
SSI to InnoDB by modifying 250 lines of code and adding 450 lines of
 code; however, InnoDB already had the S2PL option and the prototype
 implementation isn't as sophisticated as I feel is necessary for
real production use (particularly regarding the granularity of SIREAD
locks).  I'm assuming it would take more to reach real production
quality in PostgreSQL.  My SWAG would be to multiply by two or
three.


I believe the hard part of implementing true serializability is not the
actual SSI or S2PL algorithm, but rather the necessary predicate locking
strategy.

So I think checking how InnoDB tackles that and how much of its code is
involved might give a more realistic estimate of the effort required.

best regards,
Florian Pflug





Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 00:06:28 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s
  on my laptop.  Still slower than with fsync off (~0.25) but quite a
  worthy improvement.
 I can't help wondering whether that's real or some kind of
 platform-specific artifact.  I get numbers more like 3.5s (fsync off)
 vs 4.5s (fsync on) on a machine where I believe the disks aren't lying
 about write-complete.  It makes sense that an fsync at the end would be
 a little bit faster, because it would give the kernel some additional
 freedom in scheduling the required I/O, but it isn't cutting the total
 I/O required at all.  So I find it really hard to believe a 10x speedup.
Well, a template database is about 5.5MB big here - that shouldn't take too 
long when written near-sequentially?
As I said, the real benefit only occurred after adding posix_fadvise(.., 
FADV_DONTNEED), which is somewhat plausible, because e.g. the directory entries 
don't need to get scheduled for every file and because the kernel can reorder a 
whole directory nearly sequentially. Without the advice the kernel doesn't 
know in time that it should write that data back, and it won't do it for 5 
seconds by default on Linux or such...

I looked at the strace output - it looks sensible timewise to me. If you're 
interested I can give you the output of that.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 00:06:28 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  This speeds up CREATE DATABASE from ~9 seconds to something around 0.8s
  on my laptop.  Still slower than with fsync off (~0.25) but quite a
  worthy improvement.
 
 I can't help wondering whether that's real or some kind of
 platform-specific artifact.  I get numbers more like 3.5s (fsync off)
 vs 4.5s (fsync on) on a machine where I believe the disks aren't lying
 about write-complete.  It makes sense that an fsync at the end would be
 a little bit faster, because it would give the kernel some additional
 freedom in scheduling the required I/O, but it isn't cutting the total
 I/O required at all.  So I find it really hard to believe a 10x speedup.
I only comfortably have access to two smaller machines without BBU from here 
(being in the Hacker Jeopardy at the CCC congress ;-)) and both show this 
behaviour. I guess it's somewhat filesystem dependent.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] special cases of true serializability

2009-12-28 Thread Jeff Davis
I was thinking about true serializability, and started thinking that
there are a lot of special cases where true serializability can be
achieved without modification.

For instance, the following is problematic:

  BEGIN;
  SELECT count(*) FROM mytable;
  ...

because future inserts/updates/deletes may be based on that count, so we
essentially would have to lock out writes to mytable.

However,

  BEGIN READ ONLY;
  SELECT count(*) FROM mytable;
  ...

is fine, because snapshot isolation already gives true serializability.

And inserts (that don't read any tables) are always safe as well.

Also, in snapshot isolation mode, simple updates and deletes that have a
simple predicate involving a unique key also give true serializability.

It seems like the kinds of transactions that fall outside of these
special cases are exactly the kind of transactions where serializability
is most important -- reading an unknown number of tuples and then
writing. People executing transactions like that are probably willing to
pay some performance cost to ensure correct results.

How hard would it be to reliably detect the transactions for which
snapshot isolation already means true serializability?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable implementation

2009-12-28 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 I don't know what you mean by a "get-a-new-snapshot" strategy or
 how it is different from the current read committed behavior.
 
Our current Read Committed level, if it blocks on a competing UPDATE
or DELETE, can provide a view of data which is based on a mix of
snapshots, causing various anomalies.  (See the recent thread titled
"determine snapshot after obtaining locks for first statement" that
veered off into a discussion of this issue.)
 
Basically, if an UPDATE or DELETE blocks on a competing UPDATE or
DELETE (within a READ COMMITTED database transaction), it will, once
it is free to proceed, find the modified version of the row on which
it was blocking.  So that will be committed data as of a later
transaction ID than other data we may be viewing, which would be
based on the snapshot from the start of the statement.  As described
in the thread, that causes a target row not to be visible unless it
is present in both snapshots, and may cause joined tables or
subquery data to be out of sync with data from target rows.
 
To avoid these anomalies PostgreSQL would need to roll back the
affected statement, get a new snapshot, and start over.  This would
apparently be rather messy to implement, and would probably be a bit
slower.  I, for one, would willingly pay that price for better data
integrity -- if it affected me.  Since we do all data modifications
in serializable transactions (or as close as we can come to that),
it doesn't actually affect me directly.  I just thought that others
might be interested.  Well, that and the possibility that if this
*was* implemented, it might allow for a nice optimization to reduce
serialization conflicts in true serializable mode.   :-)
 
 We obviously want to be careful changing the default isolation
 level's behavior.
 
Of course.  I've been sort of assuming that we would default to
maintaining current behavior, although that may be overly
conservative.  I certainly would not want to eliminate either of the
existing isolation mode implementations, as they are likely to be
useful performance/integrity trade-offs for a significant percentage
of the user base.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] special cases of true serializability

2009-12-28 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 How hard would it be to reliably detect the transactions for which
 snapshot isolation already means true serializability?
 
To answer that question, there's really no substitute for reading
this:
 
http://hdl.handle.net/2123/5353
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Robert Haas
On Mon, Dec 28, 2009 at 3:33 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 They describe a two-tier approach, where the first tier is already
 effectively implemented in PostgreSQL with the max_connections and
 superuser_reserved_connections GUCs.  The second tier is implemented
 to run after a plan is chosen, and may postpone execution of a query
 (or reduce the resources it is allowed) if starting it at that time
 might overload available resources.

It seems like it might be helpful, before tackling what you're talking
about here, to have some better tools for controlling resource
utilization.  Right now, the tools we have are pretty crude.  You can't
even nice/ionice a certain backend without risking priority inversion,
and there's no sensible way to limit the amount of working
memory per query, only per query node.

http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 It seems like it might be helpful, before tackling what you're
 talking about here, to have some better tools for controlling
 resource utilization.  Right now, the tools we have are pretty
 crude.  You can't even nice/ionice a certain backend without
 risking priority inversion, and there's no sensible way to limit
 the amount of working memory per query, only per query node.
 
 http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php
 
I will review and consider.  Thanks.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ExecTidReScan exprCtxt

2009-12-28 Thread Robert Haas
On Mon, Dec 28, 2009 at 2:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 In ExecTidReScan, we have the following:
         /* If we are being passed an outer tuple, save it for runtime key calc */
         if (exprCtxt != NULL)
                 node->ss.ps.ps_ExprContext->ecxt_outertuple =
                         exprCtxt->ecxt_outertuple;

 Is this dead code?

 I don't think it's reachable at the moment, but we do have interest
 in allowing joins using the TID value.  See for example
 http://archives.postgresql.org/pgsql-hackers/2009-01/msg01746.php
 http://archives.postgresql.org/pgsql-performance/2007-04/msg00231.php
 So I wouldn't be in favor of removing it.

Hmm.  If you're joining a table to itself on CTID, it seems that you
would normally be able to optimize away the join completely.  We've
had some previous requests to do that when the join is on the primary
key, and the CTID is an even more clear-cut case.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable implementation

2009-12-28 Thread Greg Stark
On Mon, Dec 28, 2009 at 11:20 PM, Florian G. Pflug f...@phlo.org wrote:
 I believe the hard part of implementing true serializability is not the
 actual SSI or S2PL algorithm, but rather the necessary predicate locking
 strategy.

 So I think checking how InnoDB tackles that and how much of its code is
 involved might give a more realistic estimate of the effort required.

We've been over this a few times. The reason MySQL and Sybase/MSSQL
can tackle this is because they have very few access paths which are
closely tied with their index structures.  Postgres supports many
different access paths and indexes are plugin-replacable modules with
a limited interface which can't specify or make detailed assumptions
about the pattern in which data is accessed.

If you read a record in a query then where would you place the
annotation to prevent me from performing a HOT update against that
record? How would you stop me from deleting that same record using an
access plan that doesn't use the same index as you used to find it? If
you scan a partial index where would you put the lock to prevent me
from updating one of those records to no longer fit the partial index
where clause? What if you do a reverse index scan instead of a forward
scan? If you scan the table for the one or two records matching
several indexes using bitmap index scans where would you put the
locks? In GIN, GIST, and hash indexes are you sure there's any
reasonable place to put the locks, especially given GIN's buffered
updates?

This is the hard part. Talking about the details of the conflict
resolution algorithms and the benefits of serializable mode to your
development methodology is all fantasy as long as you don't have an
approach for actually detecting the conflicts in the
first place without rewriting many, and large, parts of the
system.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Greg Stark
On Mon, Dec 28, 2009 at 10:54 PM, Andres Freund and...@anarazel.de wrote:
 fsync everything in that pass.
 Including the directory - which was not done before and actually might be
 necessary in some cases.

Er. Yes. At least on ext4 this is pretty important. I wish it weren't,
but it doesn't look like we're going to convince the ext4 developers
they're crazy any day soon and it would really suck for a database
created from a template to have files in it go missing.
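
(For reference, a minimal standalone POSIX sketch of the issue; path names
are placeholders.  Making a newly created file durable takes an fsync of
the file *and* of its containing directory, because the directory entry
lives in the directory, not in the file's inode:)

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int		fd = open("somedir/newfile", O_WRONLY | O_CREAT, 0600);
	int		dirfd = open("somedir", O_RDONLY);

	if (fd < 0 || dirfd < 0)
	{
		perror("open");
		return 1;
	}
	/* ... write the file's contents into fd here ... */
	if (fsync(fd) != 0)			/* flushes the file's data and inode */
		perror("fsync file");
	if (fsync(dirfd) != 0)		/* flushes the directory entry itself */
		perror("fsync directory");
	close(fd);
	close(dirfd);
	return 0;
}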

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 01:27:29 Greg Stark wrote:
 On Mon, Dec 28, 2009 at 10:54 PM, Andres Freund and...@anarazel.de wrote:
  fsync everything in that pass.
  Including the directory - which was not done before and actually might be
  necessary in some cases.
 
 Er. Yes. At least on ext4 this is pretty important. I wish it weren't,
 but it doesn't look like we're going to convince the ext4 developers
 they're crazy any day soon and it would really suck for a database
 created from a template to have files in it go missin.
Actually it was necessary on ext3 as well - the window to hit the problem just 
was much smaller, wasn't it?

Actually that part should possibly get backported.


Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] parse tree to XML format

2009-12-28 Thread matt
Is there some way to export the postgresql query parse tree in XML format? I 
can not locate the API/Tool etc to do that...

thanks
-Matt


  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parse tree to XML format

2009-12-28 Thread Andres Freund
On Monday 28 December 2009 22:30:44 matt wrote:
 Is there some way to export the postgresql query parse tree in XML format?
  I can not locate the API/Tool etc to do that...
That's more of a -general question.

There is no such possibility in 8.4 - the not yet released 8.5 contains such a 
possibility.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parse tree to XML format

2009-12-28 Thread Robert Haas
On Mon, Dec 28, 2009 at 7:32 PM, Andres Freund and...@anarazel.de wrote:
 On Monday 28 December 2009 22:30:44 matt wrote:
 Is there some way to export the postgresql query parse tree in XML format?
  I can not locate the API/Tool etc to do that...
 Thats more of a -general question.

 There is no such possibility in 8.4 - the not yet released 8.5 contains such a
 possibility.

Well, you can export the plan as XML using EXPLAIN (FORMAT XML), but
that's not the same thing as the query parse-tree.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Admission Control Policy

2009-12-28 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 The second tier is implemented to run after a plan is chosen, and
 may postpone execution of a query (or reduce the resources it is
 allowed) if starting it at that time might overload available
 resources.
 
 It seems like it might be helpful, before tackling what you're
 talking about here, to have some better tools for controlling
 resource utilization.  Right now, the tools we have are pretty
 crude.  You can't even nice/ionice a certain backend without
 risking priority inversion, and there's no sensible way to limit
 the amount of working memory per query, only per query node.
 
 http://archives.postgresql.org/pgsql-hackers/2009-10/msg00125.php
 
I see your point, but it seems largely orthogonal:
 
(1)  These issues wouldn't preclude a very simple but still useful
ACP which just limits the active connection count.  This is really
what I most want, and would solve a problem frequently reported on
the lists.
 
(2)  If the ACP had a hook to allow plugging new policies, it would
support development and testing of the types of measurement and
control you describe, not hinder it.
 
(3)  You could get some useful benefit from an ACP which just
postponed queries when a memory-heavy plan was ready and a lot of
memory was already reserved by executing queries anticipated to be
memory-heavy.  That is, you wouldn't need to solve the harder
problem of *limiting* memory usage to get benefit from being able to
roughly *estimate* memory usage.
 
Frankly, solving the problems you reference might be more work than
implementing true serializable transactions.  (At least *I'm*
clueless about how to solve the memory allocation problems, and feel
relatively confident about how to deal with serializable
transactions.)  I'm interested in ACPs because even the simplest
implementation could reduce the number of serialization errors in
some environments, improving performance in serializable isolation
level.  If doing that is a first step in helping to solve the
problems you describe, I'll be happy to have helped.  I don't think
our shop can afford to tackle everything you reference there,
however.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parse tree to XML format

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 01:35:25 Robert Haas wrote:
 On Mon, Dec 28, 2009 at 7:32 PM, Andres Freund and...@anarazel.de wrote:
  On Monday 28 December 2009 22:30:44 matt wrote:
  Is there some way to export the postgresql query parse tree in XML
  format? I can not locate the API/Tool etc to do that...
 
  Thats more of a -general question.
 
  There is no such possibility in 8.4 - the not yet released 8.5 contains
  such a possibility.
 
 Well, you can export the plan as XML using EXPLAIN (FORMAT XML), but
 that's not the same thing as the query parse-tree.
Uh. Err. Sorry.

You can play around with debug_print_parse, but that's many things and 
definitely not XML.

Matt, what are you trying to achieve?

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable implementation

2009-12-28 Thread Kevin Grittner
Greg Stark gsst...@mit.edu wrote:
 
 Talking about the details of the conflict resolution algorithms
 and the benefits of serializable mode to your development
 methodology is all fantasy as long as you don't have any
 approaches to solve actually being able to detect the conflicts in
 the first place without rewriting many parts and large parts of
 the system.
 
Agreed.  I was hoping to keep the discussion on *this* thread
entirely off of technical issues like that.  I'll start specific
threads on specific technical issues as needed.  This will all turn
into a horribly opaque jumble if we mix it all up on one thread.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PERFORM] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 01:30:17 da...@lang.hm wrote:
 On Tue, 29 Dec 2009, Greg Stark wrote:
  On Mon, Dec 28, 2009 at 10:54 PM, Andres Freund and...@anarazel.de wrote:
  fsync everything in that pass.
  Including the directory - which was not done before and actually might
  be necessary in some cases.
 
  Er. Yes. At least on ext4 this is pretty important. I wish it weren't,
  but it doesn't look like we're going to convince the ext4 developers
  they're crazy any day soon and it would really suck for a database
  created from a template to have files in it go missing.
 
 actually, as I understand it you need to do this on all filesystems except
 ext3, and on ext3 fsync is horribly slow because it writes out
 _everything_ that's pending, not just stuff related to the file you do the
 fsync on.
I don't think it's all filesystems (ext2 should not be affected...), but 
generally you're right. At least jfs and xfs are affected as well.

It's btw not necessarily nearly-safe and slow on ext3 as well (data=writeback).

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PERFORM] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread david

On Tue, 29 Dec 2009, Greg Stark wrote:


 On Mon, Dec 28, 2009 at 10:54 PM, Andres Freund and...@anarazel.de wrote:
  fsync everything in that pass.
  Including the directory - which was not done before and actually might be
  necessary in some cases.

 Er. Yes. At least on ext4 this is pretty important. I wish it weren't,
 but it doesn't look like we're going to convince the ext4 developers
 they're crazy any day soon and it would really suck for a database
 created from a template to have files in it go missing.


actually, as I understand it you need to do this on all filesystems except 
ext3, and on ext3 fsync is horribly slow because it writes out 
_everything_ that's pending, not just stuff related to the file you do the 
fsync on.


David Lang

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PERFORM] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread david

On Tue, 29 Dec 2009, Andres Freund wrote:


 On Tuesday 29 December 2009 01:30:17 da...@lang.hm wrote:
  On Tue, 29 Dec 2009, Greg Stark wrote:
   On Mon, Dec 28, 2009 at 10:54 PM, Andres Freund and...@anarazel.de wrote:
    fsync everything in that pass.
    Including the directory - which was not done before and actually might
    be necessary in some cases.

   Er. Yes. At least on ext4 this is pretty important. I wish it weren't,
   but it doesn't look like we're going to convince the ext4 developers
   they're crazy any day soon and it would really suck for a database
   created from a template to have files in it go missing.

  actually, as I understand it you need to do this on all filesystems except
  ext3, and on ext3 fsync is horribly slow because it writes out
  _everything_ that's pending, not just stuff related to the file you do the
  fsync on.

 I don't think it's all filesystems (ext2 should not be affected...), but
 generally you're right. At least jfs and xfs are affected as well.


ext2 definitely needs the fsync on the directory as well as the file 
(well, if the file metadata, like the size, changed)



Its btw not necessarily nearly-safe and slow on ext3 as well (data=writeback).


no, then it's just unsafe and slow ;-)

David Lang

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PERFORM] [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Greg Smith

Andres Freund wrote:
As I said, the real benefit only occurred after adding posix_fadvise(.., 
FADV_DONTNEED), which is somewhat plausible, because e.g. the directory entries 
don't need to get scheduled for every file and because the kernel can reorder a 
whole directory nearly sequentially. Without the advice the kernel doesn't 
know in time that it should write that data back, and it won't do it for 5 
seconds by default on Linux or such...
  
I know they just fiddled with the logic in the last release, but for 
most of the Linux kernels out there now pdflush wakes up every 5 seconds 
by default.  But typically it only worries about writing things that 
have been in the queue for 30 seconds or more until you've filled quite 
a bit of memory, so that's also an interesting number.  I tried to 
document the main tunables here and describe how they fit together at 
http://www.westnet.com/~gsmith/content/linux-pdflush.htm


It would be interesting to graph the Dirty and Writeback figures in 
/proc/meminfo over time with and without this patch in place.  That 
should make it obvious what the kernel is doing differently in the two 
cases.
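
(A minimal hedged sketch of such a sampler, assuming the usual Linux
/proc/meminfo format; redirect the output to a file and graph it later:)

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	/* print the Dirty: and Writeback: lines once per second */
	for (;;)
	{
		char	line[128];
		FILE   *f = fopen("/proc/meminfo", "r");

		if (f == NULL)
			return 1;
		while (fgets(line, sizeof(line), f) != NULL)
		{
			if (strncmp(line, "Dirty:", 6) == 0 ||
				strncmp(line, "Writeback:", 10) == 0)
				fputs(line, stdout);
		}
		fclose(f);
		fflush(stdout);
		sleep(1);
	}
}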


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] exec_execute_message crash

2009-12-28 Thread Tatsuo Ishii
While inspecting a complaint from a pgpool user, I found that
PostgreSQL crashes with the following stack trace:

#0  0x0826436a in list_length (l=0xaabe4e28)
at ../../../src/include/nodes/pg_list.h:94
#1  0x08262168 in IsTransactionStmtList (parseTrees=0xaabe4e28)
at postgres.c:2429
#2  0x0826132e in exec_execute_message (portal_name=0x857bab0 "", max_rows=0)
at postgres.c:1824
#3  0x08263b2a in PostgresMain (argc=4, argv=0x84f6c28,
username=0x84f6b08 "t-ishii") at postgres.c:3671
#4  0x0823299e in BackendRun (port=0x8511e68) at postmaster.c:3449
#5  0x08231f78 in BackendStartup (port=0x8511e68) at postmaster.c:3063
#6  0x0822f90a in ServerLoop () at postmaster.c:1387
#7  0x0822f131 in PostmasterMain (argc=3, argv=0x84f4bf8) at postmaster.c:1040
#8  0x081c6217 in main (argc=3, argv=0x84f4bf8) at main.c:188

This happens with the following extended-protocol command sequence:

parse
bind
describe
execute
normally done
parse invalid SQL thus abort a transaction
bind (error)
describe (error)
execute (crash)

exec_execute_message crashes here:

/* Does the portal contain a transaction command? */
is_xact_command = IsTransactionStmtList(portal->stmts);

Looking into portal:

$5 = {name = 0x85727bc "", prepStmtName = 0x0, heap = 0x8596798, resowner = 0x0,
  cleanup = 0, createSubid = 1,
  sourceText = 0x859ac78 "SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM,  
ct.relname AS TABLE_NAME,  a.attname AS COLUMN_NAME,  a.attnum AS KEY_SEQ,  
ci.relname AS PK_NAME  FROM pg_catalog.pg_namespace n, pg_catalog.pg_c...",
  commandTag = 0x84682aa "SELECT", stmts = 0xaabe4e28, cplan = 0x0,
  portalParams = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 4,
  status = PORTAL_READY, queryDesc = 0x0, tupDesc = 0x85db060,
  formats = 0x859b0c8, holdStore = 0x0, holdContext = 0x0, atStart = 1 '\001',
  atEnd = 1 '\001', posOverflow = 0 '\0', portalPos = 0,
  creation_time = 315313855337710, visible = 1 '\001'}

Problem is, stmts points to invalid memory address:

(gdb) p *portal->stmts
Cannot access memory at address 0xaabe4e28

It seems the source of the problem is that exec_execute_message tries to
execute an unnamed portal which references an unnamed statement that has
already gone away.

Please note that without pgpool the backend does not crash. This is
because the JDBC driver does not do execute() if a prior parse, bind,
etc. failed, I think.

The crash happens on PostgreSQL 8.3.8, 8.3.9 and 8.4.2.

Any thought?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-28 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 It seems the source of the problem is that exec_execute_message tries to
 execute an unnamed portal which references an unnamed statement that has
 already gone away.

Could we see an actual test case?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-28 Thread Tatsuo Ishii
 Tatsuo Ishii is...@postgresql.org writes:
  It seems the source of the problem is that exec_execute_message tries to
  execute an unnamed portal which references an unnamed statement that has
  already gone away.
 
 Could we see an actual test case?

If you don't mind using pgpool, it would be possible. If not, I need
to write a small program which handles the frontend/backend protocol
directly. What shall I do?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PERFORM] [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 01:46:21 Greg Smith wrote:
 Andres Freund wrote:
  As I said, the real benefit only occurred after adding posix_fadvise(..,
  FADV_DONTNEED), which is somewhat plausible, because e.g. the directory
  entries don't need to get scheduled for every file and because the kernel
  can reorder a whole directory nearly sequentially. Without the advice
  the kernel doesn't know in time that it should write that data back and
  it won't do it for 5 seconds by default on Linux or such...
 It would be interesting to graph the Dirty and Writeback figures in
 /proc/meminfo over time with and without this patch in place.  That
 should make it obvious what the kernel is doing differently in the two
 cases.
I did some analysis using blktrace (useful tool btw) and the results show that
the I/O pattern is *significantly* different.

For one, with the direct fsyncing nearly no hardware queueing is used, and for
the other nearly no requests are merged on the software side.

Short stats:

OLD:

Total (8,0):
 Reads Queued:   2,8KiB  Writes Queued:7854,29672KiB
 Read Dispatches:2,8KiB  Write Dispatches: 1926,29672KiB
 Reads Requeued: 0   Writes Requeued: 0
 Reads Completed:2,8KiB  Writes Completed: 2362,29672KiB
 Read Merges:0,0KiB  Write Merges: 5492,21968KiB
 PC Reads Queued:0,0KiB  PC Writes Queued:0,0KiB
 PC Read Disp.:436,0KiB  PC Write Disp.:  0,0KiB
 PC Reads Req.:  0   PC Writes Req.:  0
 PC Reads Compl.:0   PC Writes Compl.: 2362
 IO unplugs:  2395   Timer unplugs: 557


New:

Total (8,0):
 Reads Queued:   0,0KiB  Writes Queued:1716, 5960KiB
 Read Dispatches:0,0KiB  Write Dispatches:  324, 5960KiB
 Reads Requeued: 0   Writes Requeued: 0
 Reads Completed:0,0KiB  Writes Completed:  550, 5960KiB
 Read Merges:0,0KiB  Write Merges: 1166, 4664KiB
 PC Reads Queued:0,0KiB  PC Writes Queued:0,0KiB
 PC Read Disp.:226,0KiB  PC Write Disp.:  0,0KiB
 PC Reads Req.:  0   PC Writes Req.:  0
 PC Reads Compl.:0   PC Writes Compl.:  550
 IO unplugs:   503   Timer unplugs:  30


Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-28 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 Could we see an actual test case?

  If you don't mind using pgpool, it would be possible. If not, I need
  to write a small program which handles the frontend/backend protocol
  directly. What shall I do?

Hm, can't you get libpq to do it?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-28 Thread Tatsuo Ishii
  If you don't mind using pgpool, it would be possible. If not, I need
  to write a small program which handles the frontend/backend protocol
  directly. What shall I do?
 
 Hm, can't you get libpq to do it?

That depends on how intelligent libpq is :-) Let me try...

Another idea is a packet recorder, which could record packets from
pgpool to PostgreSQL and replay them. I don't remember at present, but
I vaguely recall something like that exists.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PERFORM] [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Michael Clemmons
Andres,
Great job.  Looking through the emails and thinking about why this works, I
think this patch should significantly speed up 8.4 on almost any file
system (obviously some more than others), unless the system has significantly
reduced memory or a slow single core. On a Celeron with 256MB of memory I suspect
it'll crash out or just hit swap and be a worse bottleneck.  Anyone
have something like this to test on?
-Michael

On Mon, Dec 28, 2009 at 9:05 PM, Andres Freund and...@anarazel.de wrote:

 On Tuesday 29 December 2009 01:46:21 Greg Smith wrote:
  Andres Freund wrote:
   As I said, the real benefit only occurred after adding posix_fadvise(..,
   FADV_DONTNEED), which is somewhat plausible, because e.g. the directory
   entries don't need to get scheduled for every file and because the kernel
   can reorder a whole directory nearly sequentially. Without the advice
   the kernel doesn't know in time that it should write that data back and
   it won't do it for 5 seconds by default on Linux or such...
  It would be interesting to graph the Dirty and Writeback figures in
  /proc/meminfo over time with and without this patch in place.  That
  should make it obvious what the kernel is doing differently in the two
  cases.
 I did some analysis using blktrace (useful tool btw) and the results show
 that the I/O pattern is *significantly* different.

 For one, with the direct fsyncing nearly no hardware queueing is used, and
 for the other nearly no requests are merged on the software side.

 Short stats:

 OLD:

 Total (8,0):
  Reads Queued:   2,8KiB  Writes Queued:7854,29672KiB
  Read Dispatches:2,8KiB  Write Dispatches: 1926,29672KiB
  Reads Requeued: 0   Writes Requeued: 0
  Reads Completed:2,8KiB  Writes Completed: 2362,29672KiB
  Read Merges:0,0KiB  Write Merges: 5492,21968KiB
  PC Reads Queued:0,0KiB  PC Writes Queued:0,0KiB
  PC Read Disp.:436,0KiB  PC Write Disp.:  0,0KiB
  PC Reads Req.:  0   PC Writes Req.:  0
  PC Reads Compl.:0   PC Writes Compl.: 2362
  IO unplugs:  2395   Timer unplugs: 557


 New:

 Total (8,0):
  Reads Queued:   0,0KiB  Writes Queued:1716, 5960KiB
  Read Dispatches:0,0KiB  Write Dispatches:  324, 5960KiB
  Reads Requeued: 0   Writes Requeued: 0
  Reads Completed:0,0KiB  Writes Completed:  550, 5960KiB
  Read Merges:0,0KiB  Write Merges: 1166, 4664KiB
  PC Reads Queued:0,0KiB  PC Writes Queued:0,0KiB
  PC Read Disp.:226,0KiB  PC Write Disp.:  0,0KiB
  PC Reads Req.:  0   PC Writes Req.:  0
  PC Reads Compl.:0   PC Writes Compl.:  550
  IO unplugs:   503   Timer unplugs:  30


 Andres



Re: [PERFORM] [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 03:53:12 Michael Clemmons wrote:
 Andres,
 Great job.  Looking through the emails and thinking about why this works, I
 think this patch should significantly speed up 8.4 on almost any file
 system (obviously some more than others), unless the system has significantly
 reduced memory or a slow single core. On a Celeron with 256MB of memory I
  suspect it'll crash out or just hit swap and be a worse bottleneck.
  Anyone have something like this to test on?
Why should it crash? The kernel should just block on writing and write out the 
dirty memory before continuing?
Pg is not caching anything here...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PERFORM] [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Michael Clemmons
Maybe not crash out, but consider this situation:
N=0
while(N >= 0):
CREATE DATABASE new_db_N;
Since the fsync is the part which takes the memory and time but is happening
in the background, won't the fsyncs pile up in the background faster than they
can be run, filling up the memory and stack?
This is very likely a mistake on my part about how postgres/processes
actually work.
-Michael

On Mon, Dec 28, 2009 at 9:55 PM, Andres Freund and...@anarazel.de wrote:

 On Tuesday 29 December 2009 03:53:12 Michael Clemmons wrote:
  Andres,
  Great job.  Looking through the emails and thinking about why this works, I
  think this patch should significantly speed up 8.4 on almost any file
  system (obviously some more than others), unless the system has
  significantly reduced memory or a slow single core. On a Celeron with
  256MB of memory I suspect it'll crash out or just hit swap and be a worse
  bottleneck.  Anyone have something like this to test on?
 Why should it crash? The kernel should just block on writing and write out
 the dirty memory before continuing?
 Pg is not caching anything here...

 Andres



Re: [PERFORM] [HACKERS] Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2009-12-28 Thread Andres Freund
On Tuesday 29 December 2009 04:04:06 Michael Clemmons wrote:
 Maybe not crash out, but consider this situation:
 N=0
 while(N >= 0):
 CREATE DATABASE new_db_N;
 Since the fsync is the part which takes the memory and time but is
  happening in the background, won't the fsyncs pile up in the background
  faster than they can be run, filling up the memory and stack?
 This is very likely a mistake on my part about how postgres/processes
The difference should not be visible outside the CREATE DATABASE ... at all.
Currently the process, simplified, works like:


for file in source directory:
copy_file(source/file, target/file);
fsync(target/file);


I changed it to:

-
for file in source directory:
copy_file(source/file, target/file);

/* please dear kernel, write this out, but don't block */
posix_fadvise(target/file, FADV_DONTNEED); 

for file in source directory:
fsync(target/file);
-

If at any point in time there is not enough cache available to cache anything,
copy_file() will just have to wait for the kernel to write out the data.
fsync() does not use memory itself.
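
(The same pattern as a standalone POSIX C sketch, with the actual copying
and all error handling elided; the file list is a placeholder, the
copy_all_then_fsync name is made up for illustration, and
POSIX_FADV_DONTNEED support is assumed:)

#include <fcntl.h>
#include <unistd.h>

static void
copy_all_then_fsync(const char *files[], int nfiles)
{
	int		i;

	/* pass 1: copy each file, then ask the kernel to start writeback */
	for (i = 0; i < nfiles; i++)
	{
		int		fd = open(files[i], O_WRONLY | O_CREAT, 0600);

		/* ... copy the source file's contents into fd ... */
#if defined(POSIX_FADV_DONTNEED)
		posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
		close(fd);
	}

	/* pass 2: fsync everything; much of the data is already in flight */
	for (i = 0; i < nfiles; i++)
	{
		int		fd = open(files[i], O_RDWR);

		(void) fsync(fd);
		close(fd);
	}
}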

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Serializable implementation

2009-12-28 Thread Bruce Momjian
Kevin Grittner wrote:
 Jeff Davis pg...@j-davis.com wrote:
  
  I don't know what you mean by a "get-a-new-snapshot" strategy or
  how it is different from the current read committed behavior.
  
 Our current Read Committed level, if it blocks on a competing UPDATE
 or DELETE, can provide a view of data which is based on a mix of
 snapshots, causing various anomalies.  (See the recent thread titled
 "determine snapshot after obtaining locks for first statement" that
 veered off into a discussion of this issue.)
  
 Basically, if an UPDATE or DELETE blocks on a competing UPDATE or
 DELETE (within a READ COMMITTED database transaction), it will, once
 it is free to proceed, find the modified version of the row on which
 it was blocking.  So that will be committed data as of a later
 transaction ID than other data we may be viewing, which would be
 based on the snapshot from the start of the statement.  As described
 in the thread, that causes a target row not to be visible unless it
 is present in both snapshots, and may cause joined tables or
 subquery data to be out of sync with data from target rows.

FYI, this behavior is documented:


http://www.postgresql.org/docs/8.4/static/transaction-iso.html#XACT-READ-COMMITTED

I know Kevin knows this but others might find the docs help clarify the
issue.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-28 Thread Tatsuo Ishii
  Hm, can't you get libpq to do it?
 
 That depends on how intelligent libpq is :-) Let me try...
 
 Another idea is a packet recorder, which could record packets from
 pgpool to PostgreSQL and replay them. I don't remember at present, but
 I vaguely recall something like that exists.

It seems we can't get libpq to do it. libpq does not provide a
function which can issue a bind alone. In my understanding
PQexecPrepared does bind + execute.
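
(For illustration, a minimal libpq fragment -- the conninfo string is a
placeholder and error checking is elided.  The finest-grained steps libpq
exposes are PQprepare() for the Parse and PQexecPrepared() for the
Bind + Execute, with no way to send a Bind on its own:)

#include <libpq-fe.h>

void
demo(void)
{
	PGconn	   *conn = PQconnectdb("dbname=test");
	const char *values[1] = {"1"};
	PGresult   *res;

	res = PQprepare(conn, "s1", "SELECT $1::int", 1, NULL);		/* Parse */
	PQclear(res);

	/* Bind + Execute happen together here; there is no separate bind call */
	res = PQexecPrepared(conn, "s1", 1, values, NULL, NULL, 0);
	PQclear(res);
	PQfinish(conn);
}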
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crash

2009-12-28 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 Hm, can't you get libpq to do it?

 It seems we can't get libpq to do it. libpq does not provide a
 function which can issue a bind alone. In my understanding
 PQexecPrepared does bind + execute.

The event sequence you mentioned had bind followed by execute, so
I'm not seeing the problem.

(In any case, some kind of quick lobotomy in libpq would be easier
than writing a standalone test program, no?)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crush

2009-12-28 Thread Tatsuo Ishii
 (In any case, some kind of quick lobotomy in libpq would be easier
 than writing a standalone test program, no?)

Sounds like a nice idea.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crush

2009-12-28 Thread Kris Jurka



On Tue, 29 Dec 2009, Tatsuo Ishii wrote:


parse
bind
describe
execute
normally done
parse invalid SQL, thus aborting the transaction
bind (error)
describe (error)
execute (crash)

Please note that without pgpool the backend does not crash. This is
because the JDBC driver does not do execute() if the prior parse, bind
etc. failed, I think.


The JDBC driver will fire away parse, bind, and execute all at once before 
a sync, to avoid network roundtrips, so your assumption of what's going on 
here without pgpool doesn't seem accurate.  Attached is a test case that 
tries to duplicate what you've described and it errors out normally. 
Below is the JDBC driver's protocol level logging.


21:41:39.407 (1)  FE=> Parse(stmt=S_1,query="BEGIN",oids={})
21:41:39.407 (1)  FE=> Bind(stmt=S_1,portal=null)
21:41:39.407 (1)  FE=> Execute(portal=null,limit=0)
21:41:39.408 (1)  FE=> Parse(stmt=null,query="SELECT $1 ",oids={23})
21:41:39.408 (1)  FE=> Bind(stmt=null,portal=null,$1=<'1'>)
21:41:39.408 (1)  FE=> Describe(portal=null)
21:41:39.408 (1)  FE=> Execute(portal=null,limit=0)
21:41:39.408 (1)  FE=> Parse(stmt=null,query=" SELECT SELECT $1 ",oids={23})
21:41:39.408 (1)  FE=> Bind(stmt=null,portal=null,$1=<'2'>)
21:41:39.409 (1)  FE=> Describe(portal=null)
21:41:39.409 (1)  FE=> Execute(portal=null,limit=0)
21:41:39.409 (1)  FE=> Sync
21:41:39.443 (1)  <=BE ParseComplete [S_1]
21:41:39.443 (1)  <=BE BindComplete [null]
21:41:39.443 (1)  <=BE CommandStatus(BEGIN)
21:41:39.443 (1)  <=BE ParseComplete [null]
21:41:39.443 (1)  <=BE BindComplete [null]
21:41:39.444 (1)  <=BE RowDescription(1)
21:41:39.444 (1)  <=BE DataRow
21:41:39.444 (1)  <=BE CommandStatus(SELECT)
21:41:39.454 (1)  <=BE ErrorMessage(ERROR: syntax error at or near "SELECT"
  Position: 9)

So this shows everything working as expected.  Perhaps enabling this 
logging on your JDBC client would show more clearly what it is trying to 
do.


Kris Jurka

import java.sql.*;

public class Crash {

    public static void main(String args[]) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn =
            DriverManager.getConnection("jdbc:postgresql://localhost:5844/jurka?loglevel=2", "jurka", "");
        conn.setAutoCommit(false);

        // The second statement, "SELECT SELECT ?", is deliberately
        // invalid, so its parse/bind/execute fail after the first
        // statement has succeeded.
        PreparedStatement pstmt = conn.prepareStatement("SELECT ? ; SELECT SELECT ? ");
        pstmt.setInt(1, 1);
        pstmt.setInt(2, 2);
        pstmt.execute();
    }
}


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exec_execute_message crush

2009-12-28 Thread Tatsuo Ishii
  parse
  bind
  describe
  execute
  normally done
  parse invalid SQL, thus aborting the transaction
  bind (error)
  describe (error)
  execute (crash)
 
  Please note that without pgpool the backend does not crash. This is
  because the JDBC driver does not do execute() if the prior parse, bind
  etc. failed, I think.
 
 The JDBC driver will fire away parse, bind, and execute all at once before 
 a sync, to avoid network roundtrips, so your assumption of what's going on 
 here without pgpool doesn't seem accurate.  Attached is a test case that 
 tries to duplicate what you've described and it errors out normally. 
 Below is the JDBC driver's protocol level logging.
 
 21:41:39.407 (1)  FE=> Parse(stmt=S_1,query="BEGIN",oids={})
 21:41:39.407 (1)  FE=> Bind(stmt=S_1,portal=null)
 21:41:39.407 (1)  FE=> Execute(portal=null,limit=0)
 21:41:39.408 (1)  FE=> Parse(stmt=null,query="SELECT $1 ",oids={23})
 21:41:39.408 (1)  FE=> Bind(stmt=null,portal=null,$1=<'1'>)
 21:41:39.408 (1)  FE=> Describe(portal=null)
 21:41:39.408 (1)  FE=> Execute(portal=null,limit=0)
 21:41:39.408 (1)  FE=> Parse(stmt=null,query=" SELECT SELECT $1 ",oids={23})
 21:41:39.408 (1)  FE=> Bind(stmt=null,portal=null,$1=<'2'>)
 21:41:39.409 (1)  FE=> Describe(portal=null)
 21:41:39.409 (1)  FE=> Execute(portal=null,limit=0)
 21:41:39.409 (1)  FE=> Sync
 21:41:39.443 (1)  <=BE ParseComplete [S_1]
 21:41:39.443 (1)  <=BE BindComplete [null]
 21:41:39.443 (1)  <=BE CommandStatus(BEGIN)
 21:41:39.443 (1)  <=BE ParseComplete [null]
 21:41:39.443 (1)  <=BE BindComplete [null]
 21:41:39.444 (1)  <=BE RowDescription(1)
 21:41:39.444 (1)  <=BE DataRow
 21:41:39.444 (1)  <=BE CommandStatus(SELECT)
 21:41:39.454 (1)  <=BE ErrorMessage(ERROR: syntax error at or near "SELECT"
   Position: 9)
 
 So this shows everything working as expected.  Perhaps enabling this 
 logging on your JDBC client would show more clearly what it is trying to 
 do.

Thanks for the clarification. I will look more closely into the packet
exchange between pgpool and PostgreSQL.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Boszormenyi Zoltan
Tom Lane wrote:
 Boszormenyi Zoltan z...@cybertec.at writes:
   
 Tom Lane írta:
 
 It's more the possibility of doing strcmp(tag, "SELECT") on the command
 tag.

   
 Actually it's strncmp(tag, "SELECT ", 7), so when you mix an old server
 with new clients or a new server with an old client, it will just work as
 before, i.e. return "".
 

 Are you deliberately ignoring the point?

No, I just thought you were commenting on my patch's details...

   We have no way to know whether
 there is any client-side code that's doing a simple check for the "SELECT"
 command tag, but it's certainly possible.  The fact that it wouldn't be
 hard to fix does not mean that it wouldn't be broken.

   regards, tom lane

   


-- 
Bible has answers for everything. Proof:
But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil. (Matthew 5:37) - basics of digital technology.
May your kingdom come - superficial description of plate tectonics

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Provide rowcount for utility SELECTs

2009-12-28 Thread Boszormenyi Zoltan
Tom Lane wrote:
 Boszormenyi Zoltan z...@cybertec.at writes:
   
 attached is a small patch that makes it possible for clients
 to receive row count for SELECT ... INTO ... and CREATE TABLE ... AS ...
 

   
 Comments?
 

 This doesn't look tremendously well thought out to me.

 1. As given, the patch changes the result not only for SELECT INTO but
 for any SELECT executed in PORTAL_MULTI_QUERY context (consider SELECTs
 added by rules for example).  It seems like a pretty bad idea for the
 result of a statement to depend on context.

 2. In the past we have regretted it when we made the same command tag
 sometimes have numbers attached and sometimes not (note the hack at
 the bottom of PortalRunMulti).  It doesn't seem like terribly good
 design to do that here.  On the other hand, always attaching a count
 to SELECT tags would greatly increase the risk of breaking clients.
   

Okay, how about introducing a new "SELECTINTO N" command tag, then?
It's also a protocol change, but at least it can fall into the very last
else branch anywhere, and hence has a high chance of being ignored or
handled the same way as other tags that don't return a row count.
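
A hypothetical client-side helper illustrates why such a tag would tend
to degrade gracefully: exact-match checks like strcmp(tag, "SELECT")
simply ignore it, while generic trailing-count parsing picks the N up
for free. A sketch (tag_rowcount is a made-up name, not a libpq
function):

-
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns the trailing row count of a command tag such as
 * "UPDATE 3", "INSERT 0 5", or a hypothetical "SELECTINTO 5";
 * returns -1 for tags without one, e.g. "CREATE TABLE". */
static long
tag_rowcount(const char *tag)
{
    const char *p = strrchr(tag, ' ');

    if (p == NULL || !isdigit((unsigned char) p[1]))
        return -1;
    return strtol(p + 1, NULL, 10);
}
-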

 I'm not at all convinced that this is so useful as to justify taking
 any compatibility risks for.  People who really need that count can
 get it easily enough by breaking the command into a CREATE followed
 by INSERT/SELECT.
   

Yes, and every WITH RECURSIVE statement can also be broken up
manually. It's simply shorter and has a chance of being a little
more resource-efficient:
- one parsing/planning phase instead of two on the server side
- one error check in the app instead of two
- PostgreSQL already has the infrastructure to return the rowcount

Best regards,
Zoltán Böszörményi

-- 
Bible has answers for everything. Proof:
But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil. (Matthew 5:37) - basics of digital technology.
May your kingdom come - superficial description of plate tectonics

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers