Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
On 1/28/08, Greg Sabino Mullane [EMAIL PROTECTED] wrote:

I am not thrilled about moving _some_ of pgcrypto into the backend --- pgcrypto right now seems well designed and if we pull part of it out it seems it will be less clear than what we have now. Perhaps we just need to document that md5() isn't for general use and some function in pgcrypto should be used instead?

I don't think docs will help much. md5() has already broken the policy of keeping all crypto in an external module, so I'm now thinking we should just bite the bullet and add digest() to core. Extracting that part from pgcrypto is a no-brainer: the hashes have a very thin wrapper around them, and the only thing they share with the rest of pgcrypto is error handling, which the hashes use the least. The only thing that needs minor thought is that it would be nice to access the hashes from the external module - that means exporting the find_digest() function to it. Which should be no problem.

I think looking at this as putting some of pg_crypto into core is looking at it the wrong way. We are never going to put the whole thing into core given the current state of cryptography laws, but obviously the current status of giving users md5() and nothing else is not ideal. What we're looking for is a middle ground. It seems to me we've narrowed it down to two questions:

1) Do sha1(), or other hashing algorithms, risk running afoul of cryptography regulations? I'm 100% sure that sha1() itself is not a problem (it's even a PHP builtin, and good luck finding a box these days without that monstrosity installed). I highly doubt any of the rest (SHA*, HMAC, etc.) are a problem either: we're doing a one-way hash, not encrypting data. But common sense and cryptography have seldom been seen together since the start of the cold war, so I'll hold my final judgement.

2) Which ones do we include? Putting in sha1() seems a no-brainer, but as Joe points out, why not add all the rest in at the same time?

Considering we have DES and MD5 already, how can SHA2 be a problem when SHA1 isn't?

-- marko
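For reference, the functions being discussed are essentially what contrib/pgcrypto's digest() already exposes today; a rough sketch of the behaviour an in-core digest() would provide, assuming pgcrypto is installed for the second and third calls:

  -- built in today: md5() and nothing else
  SELECT md5('some data');
  -- pgcrypto's digest(), i.e. what a core digest() would look like:
  SELECT encode(digest('some data', 'sha1'), 'hex');
  SELECT encode(digest('some data', 'sha256'), 'hex');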
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
* Bruce Momjian: I am not thrilled about moving _some_ of pgcrypto into the backend --- pgcrypto right now seems well designed and if we pull part of it out it seems it will be less clear than what we have now. Perhaps we just need to document that md5() isn't for general use and some function in pgcrypto should be used instead? Yes, that would probably help those folks doing checklist-based security audits. -- Florian Weimer[EMAIL PROTECTED] BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Jeff Davis wrote:
On Mon, 2008-01-28 at 23:13 +, Heikki Linnakangas wrote:
It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4

It doesn't seem like there is any reason for the estimate to get confused, but it apparently does. I loaded a test table with a similar distribution to your example, and it shows a correlation of about -0.5, but it should be as good as something near -1 or +1. I am not a statistics expert, but it seems like a better measurement would be: "what is the chance that, if the tuples are close together in index order, the corresponding heap tuples are close together?" The answer to that question in your example is "very likely", so there would be no problem. Is there a reason we don't do this?

It has been discussed before, but no one has come up with a good measurement for that.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
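To make the effect concrete, one way to reproduce the kind of measurement Jeff describes (table and column names are made up, the exact correlation value will vary, and CREATE TABLE AS ... UNION ALL does not formally guarantee row order, though in practice it comes out in this order):

  -- a table that is perfectly clustered except for one wrap-around point
  CREATE TABLE wrapped AS
      SELECT x FROM generate_series(50001, 100000) x
      UNION ALL
      SELECT x FROM generate_series(1, 50000) x;
  ANALYZE wrapped;
  -- the statistic the planner actually uses ends up well away from -1/+1
  SELECT tablename, attname, correlation FROM pg_stats WHERE tablename = 'wrapped';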
[HACKERS] GSSAPI and V2 protocol
Is it possible to authenticate using GSSAPI over the V2 protocol? Is there any documentation on the message formats for V2? Kris Jurka ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] How to use VB6 for store image to postgresql?
Dear All, Have you ever stored an image in PostgreSQL using VB6? Could you give me an example? Thank you, Premsun NETsolutions Asia Limited +66 (2) 237 7247
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4 I do think the guc to turn it off is useful, only I don't understand the reasoning that pg_dump needs it to maintain the basic clustered property. Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Andreas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Well clusteredness is used or could be used for a few different heuristics, not all of which this would be quite as well satisfied as readahead. But for the most common application, namely trying to figure out whether index probes for sequential ids will be sequential i/o or random i/o you're right. Currently the statistic we use to estimate this is the correlation of the column value with the physical location on disk. That's not a perfect metric for estimating how much random i/o would be needed to scan the table in index order though. It would be great if Postgres picked up a serious statistics geek who could pipe up in discussions like this with how about using the Euler-Jacobian Centroid or some such thing. If you have any suggestions of what metric to use and how to calculate the info we need from it that would be great. One suggestion from a long way back was scanning the index and counting how many times the item pointer moves backward to an earlier block. That would still require a full index scan though. And it doesn't help for columns which aren't indexed though I'm not sure we need this info for columns which aren't indexed. It's also not clear how to interpolate from that the amount of random access a given query would perform. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] RFC: array_agg() per SQL:200n
On Monday, 28 January 2008, Neil Conway wrote: To parse the ORDER BY clause, we'd need to special-case array_agg() in the grammar

The ORDER BY clause would also be used in XMLAGG, so we should try to parse this in a generalized way. -- Peter Eisentraut http://developer.postgresql.org/~petere/
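For reference, the syntax in question - not something the server accepts at this point, shown purely to illustrate why a generalized ORDER-BY-inside-an-aggregate parse rule is wanted, with placeholder table names:

  -- proposed SQL:200n aggregate ordering, illustration only
  SELECT array_agg(name ORDER BY name) FROM employees;
  SELECT xmlagg(x ORDER BY y) FROM t;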
Re: [HACKERS] Bogus cleanup code in GSSAPI/SSPI patch
On Mon, Jan 28, 2008 at 09:11:31PM -0500, Tom Lane wrote: Hi Magnus, Would you take a look at the patch I just committed in fe-connect.c? I found out today that PQreset() wasn't working on a GSSAPI connection, because closePGconn hadn't been patched to clear out the GSSAPI state (resulting in duplicate GSS authentication request failure). I think I fixed that but it wouldn't be bad for someone who knows that code better to double-check. Also, I can't test the ENABLE_SSPI case, so that needs to be looked at. The test case I've been using is to select 1; in psql, then kill -9 the backend from another window, then select 1; again. psql should recover and reconnect successfully. Looks good from here. As for the SSPI case, it actually recovered just fine with SSPI before this patch - from how I read it there was a memory leak, though. And I can confirm that it still works after this patch. //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
+1. If we go with 'enable_sync_seqscans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do some fine tuning to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at it from this perspective, IMHO we should go with the number (0.25) instead of the boolean.

Surely the risk-of-needing-to-deprecate argument applies ten times more strongly to a number than a boolean.

Yes, I would expect the tuning to be more system than user specific. So imho a boolean userset would couple well with a tuning guc, that may usefully not be userset (if we later discover a need for tuning at all). So +1 for the bool.

synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. To me the latter somehow suggests influence on the whole cluster; probably not worth further discussion though, so if someone says no, ok.

Andreas
Re: [HACKERS] autonomous transactions
Simon Riggs wrote:
On Tue, 2008-01-22 at 20:53 +0100, Pavel Stehule wrote:
Agreed. I think Pavel Stehule was doing some experiments with them, I don't know if he got anywhere.

I did only first research. Any hack is possible - you can stack the current transaction, but a real implementation needs work similar to nested transactions :( and it is too low level for me. And some code cleaning is necessary. There are global variables. And there is a most important question about data visibility - is the autonomous transaction independent of the main transaction (isolation)? You have to think about deadlocks, about referential integrity, etc. This task isn't simple.

Yes, I think autonomous transactions should be on the TODO. They're useful for:
- error logging
- auditing
- creating new partitions automatically
Plus I think we'd be able to improve the code for CREATE INDEX under HOT, and probably a few other wrinkly bits of code.

Added to TODO:

* Add anonymous transactions

http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] autonomous transactions
Bruce Momjian wrote: Plus I think we'd be able to improve the code for CREATE INDEX under HOT, and probably a few other wrinkly bits of code. Added to TODO: * Add anonymous transactions http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php Sorry, updated to Add _autonomous_ transactions. (The one time I don't cut/paste and I get it wrong.) -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Large pgstat.stat file causes I/O storm
On Tue, 29 Jan 2008, Cristian Gafton wrote:

I have a ~150GB sized server, containing two databases that are active in mostly read mode. I have noticed lately that the global/pgstat.stat file is somewhere around 1MB freshly after a restart, but at some point it balloons to 74MB in size for no apparent reason, after a few hours of uptime. Needless to say, having the stats collector dump 74MB of stuff on disk on its every loop takes a big bite out of the I/O capabilities of this box.

Of course, leaving out the most important thing - this is postgresql 8.2.6 on x86_64.

Looking at all the other replicas I have of this database (but which are under a more lightweight read load), the pgstat.stat file again is rather small in size. Am I right to assume that a 74MB pgstat.stat file is not normal - and what might have caused it?

Cristian -- Cristian Gafton rPath, Inc.
[HACKERS] Large pgstat.stat file causes I/O storm
Hello all, I have a ~150GB sized server, containing two databases that are active in mostly read mode. I have noticed lately that the global/pgstat.stat file is somewhere around 1MB freshly after a restart, but at some point it balloons to 74MB in size for no apparent reason, after a few hours of uptime. Needless to say, having the stats collector dump 74MB of stuff on disk on its every loop takes a big bite out of the I/O capabilities of this box.

Looking at all the other replicas I have of this database (but which are under a more lightweight read load), the pgstat.stat file again is rather small in size. Am I right to assume that a 74MB pgstat.stat file is not normal - and what might have caused it?

Thanks, Cristian -- Cristian Gafton rPath, Inc.
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Tue, Jan 29, 2008 at 10:40:40AM +0100, Zeugswetter Andreas ADI SD wrote:
It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4
I do think the guc to turn it off is useful, only I don't understand the reasoning that pg_dump needs it to maintain the basic clustered property. Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle? Surely our stats collector should recognize such a table as perfectly clustered. Does it not? We are talking about one breakage in the readahead logic here; this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window.
Andreas

Andreas, I agree with your logic. If the process that PostgreSQL uses to determine how clustered a table is breaks with such a layout, we may need to see what should be changed to make it work. Having had pg_dump cause a database to grind to a halt, I would definitely like the option of using the synchronized scans even for clustered tables.

Ken
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Euler Taveira de Oliveira [EMAIL PROTECTED] writes:
+1. If we go with 'enable_sync_seqscans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do some fine tuning to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at it from this perspective, IMHO we should go with the number (0.25) instead of the boolean.

Surely the risk-of-needing-to-deprecate argument applies ten times more strongly to a number than a boolean.

regards, tom lane
Re: [HACKERS] How to use VB6 for store image to postgresql?
I guess the fastest way is to:
- create a column of type text, e.g. create table foo( myimage text );
- read the contents of your image from file and encode it in base64 using: http://www.vbforums.com/attachment.php?s=42957f48bac95dd18ca8bffcf7578dcc&attachmentid=49287&d=1152543402
- save the base64 encoded string in the database

I hope this helps. Regards, Gevik.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Premsun Choltanwanich
Sent: Tuesday, January 29, 2008 10:28 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] How to use VB6 for store image to postgresql?

Dear All, Have you ever stored an image in PostgreSQL using VB6? Could you give me an example? Thank you, Premsun NETsolutions Asia Limited +66 (2) 237 7247 http://www.nsasia.co.th
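On the database side, that approach boils down to something like the sketch below (the table and column names are only examples; since the data is base64 text, an ordinary text column and a parameterised INSERT from VB6/ADO is all that is needed):

  -- table holding the encoded image
  CREATE TABLE foo (
      id      serial PRIMARY KEY,
      myimage text
  );
  -- the client sends the already base64-encoded string as a normal parameter
  INSERT INTO foo (myimage) VALUES ('iVBORw0KGgoAAAANSUhEUg...');
  -- if the raw bytes are ever needed on the server side:
  SELECT decode(myimage, 'base64') FROM foo WHERE id = 1;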
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Tue, 2008-01-29 at 10:55 +, Gregory Stark wrote: Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Well clusteredness is used or could be used for a few different heuristics, not all of which this would be quite as well satisfied as readahead. But for Can you give an example? Treating a file as a circular structure does not impose any significant cost that I can see. It would be great if Postgres picked up a serious statistics geek who could pipe up in discussions like this with how about using the Euler-Jacobian Centroid or some such thing. If you have any suggestions of what metric to use and how to calculate the info we need from it that would be great. Agreed. One suggestion from a long way back was scanning the index and counting how many times the item pointer moves backward to an earlier block. That would An interesting metric. As you say, we really need a statistician to definitively say what the correct metrics are, and what kind of sampling we need to make good estimates. still require a full index scan though. And it doesn't help for columns which aren't indexed though I'm not sure we need this info for columns which aren't indexed. It's also not clear how to interpolate from that the amount of random access a given query would perform. I don't think clusteredness has any meaning at all in postgres for an unindexed column. I suppose a table could be clustered without an index, but currently there's no way to do that in postgresql. Regards, Jeff Davis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Simon Riggs wrote:
And if you have a partitioned table with partitions inconveniently sized? You'd need to *reduce* shared_buffers specifically to get synch scans and BAS to kick in. Or increase partition size. Both of which reduce the impact of the benefits we've added. I don't think the argument that "a table is smaller than shared buffers, therefore it is already in shared buffers" holds true in all cases. I/O does matter.

+1. If we go with 'enable_sync_seqscans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do some fine tuning to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at it from this perspective, IMHO we should go with the number (0.25) instead of the boolean.

-- Euler Taveira de Oliveira http://www.timbira.com/
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Tue, Jan 29, 2008 at 1:09 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? If geqo kicks in, we're already there, aren't we? Isn't an application which counts on the order of result rows without specifying ORDER BY fundamentally broken? -Kevin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Large pgstat.stat file causes I/O storm
On Tue, 29 Jan 2008, Tom Lane wrote: (Pokes around in the code...) I think the problem here is that the only active mechanism for flushing dead stats-table entries is pgstat_vacuum_tabstat(), which is invoked by a VACUUM command or an autovacuum. Once-a-day VACUUM isn't gonna cut it for you under those circumstances. What you might do is just issue a VACUUM on some otherwise-uninteresting small table, once an hour or however often you need to keep the stats file bloat to a reasonable level. I just ran a vacuumdb -a on the box - the pgstat file is still 90MB in size. If vacuum is supposed to clean up the cruft from pgstat, then I don't know if we're looking at the right cruft - I kind of expected the pgstat file to go down in size and the I/O storm to subside, but that is not happening after vacuum. I will try to instrument the application to record the oids of the temp tables it creates and investigate from that angle, but in the meantime is there any way to reset the stats collector altogether? Could this be a corrupt stat file that gets read and written right back on every loop without any sort of validation? Thanks, Cristian -- Cristian Gafton rPath, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Transition functions for SUM(::int2), SUM(::int4), SUM(::int8)
Caleb Welton [EMAIL PROTECTED] writes: Forgive me if I'm being dense, but could you explain why int4_sum(bigint,int) must not be strict, but int4_avg_accum(bytea, int) can be strict? The former is used with a NULL initial value, the latter isn't. Read the fine print in the CREATE AGGREGATE man page about behavior with strict transition functions. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
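A rough illustration of that fine print (all names below are made up; the point is only that, because the transition type differs from the input type and the initial state is null, the transition function has to tolerate a null state and therefore cannot be strict):

  CREATE FUNCTION my_int4_sum(int8, int4) RETURNS int8 AS $$
      -- ignore null inputs, and cope with the initial NULL running total
      SELECT CASE WHEN $2 IS NULL THEN $1
                  ELSE coalesce($1, 0) + $2 END;
  $$ LANGUAGE sql;          -- deliberately NOT declared STRICT

  CREATE AGGREGATE my_sum (int4) (
      sfunc = my_int4_sum,
      stype = int8
      -- no initcond: the state starts out NULL, and since int8 is not the
      -- input type, the "first non-null input becomes the state" shortcut
      -- for strict transition functions cannot apply
  );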
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Kevin Grittner [EMAIL PROTECTED] writes:
Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next?
If geqo kicks in, we're already there, aren't we?

Yup, and that's one of the reasons we have a way to turn geqo off. (geqo is actually a good precedent for this --- notice that it has an on/off switch that's separate from its tuning knobs.)

Isn't an application which counts on the order of result rows without specifying ORDER BY fundamentally broken?

No doubt, but if it's always worked before, people are going to be unhappy anyway. Also, it's not just ordering that's at stake. Try

regression=# create table foo as select x from generate_series(1,100) x;
SELECT
regression=# select * from foo limit 1;
   x
-------
     1
     2
     3
     4
regression=# select * from foo limit 1;
   x
-------
  7233
  7234
  7235
  7236
regression=# select * from foo limit 1;
   x
-------
 14465
 14466
 14467
 14468

Now admittedly we've never promised LIMIT without ORDER BY to be well-defined either, but not everybody reads the fine print. This case is particularly nasty because at smaller LIMIT values the result *is* consistent, so you might never notice the problem while testing.

regards, tom lane
Re: [HACKERS] Large pgstat.stat file causes I/O storm
On Tue, 29 Jan 2008, Tom Lane wrote:
Cristian Gafton [EMAIL PROTECTED] writes: Autovacuum is disabled, since the database is mostly read only. There is a vacuumdb -a -z running nightly on the box. However, the application that queries it does a lot of work with temporary tables - would those bloat the stats at all?
Conceivably, if you mean a lot of short-lived tables rather than a lot of operations on a few tables. However, I'd think that would result in a steady accumulation of stats entries, not a sudden jump as you seemed to describe.

We are churning through a bunch of short-lived temp tables. Since I reported the problem, the pgstat file is now sitting at 85M, yet the pg_stat* tables barely have any entries in them:

                         count(*)
 pg_stats                   298
 pg_statistic               298
 pg_stat_all_indexes         76
 pg_stat_all_tables          76
 pg_statio_all_tables        56
 pg_statio_all_indexes       76

Is there a way to inspect the pgstat file and see what's in it that it is taking all this space? (it's not the space that bothers me, it's the fact that the statistics collector has to dump 85MB of stuff once a second to disk...)

Thanks, Cristian -- Cristian Gafton rPath, Inc.
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. BTW, so far as the rest of the thread goes, I'm not necessarily opposed to exposing the switchover threshold as a tunable. But I think it needs more thought to design than we can give it in time for 8.3 (because of the interaction with the buffer access strategy stuff). Also I don't like having pg_dump manipulating a tuning parameter. I don't see anything wrong with having both an on/off feature switch and a tunable in future releases. The feature switch can be justified on grounds of backwards compatibility quite independently of whether pg_dump uses it. Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
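In practice the feature switch would be used like any other boolean GUC; a sketch, using the plural spelling favored above (the exact name was still under discussion in this thread):

  -- per session, e.g. what pg_dump would issue before dumping table data
  SET synchronize_seqscans = off;
  -- or for the whole cluster, in postgresql.conf:
  --   synchronize_seqscans = off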
Re: [HACKERS] Large pgstat.stat file causes I/O storm
Cristian Gafton [EMAIL PROTECTED] writes: Autovacuum is disabled, since the database is mostly read only. There is a vacuumdb -a -z running nightly on the box. However, the application that queries it does a lot of work with temporary tables - would those bloat the stats at all? Conceivably, if you mean a lot of short-lived tables rather than a lot of operations on a few tables. However, I'd think that would result in a steady accumulation of stats entries, not a sudden jump as you seemed to describe. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Large pgstat.stat file causes I/O storm
Cristian Gafton [EMAIL PROTECTED] writes: On Tue, 29 Jan 2008, Cristian Gafton wrote: I have a ~150GB sized server, containing two databases that are active in mostly read mode. I have noticed lately that the global/pgstat.stat file is somewhere around 1MB freshly after a restart, but at some point it baloons to 74MB in size for no apparent reason, after a few hours of uptime. Needless to say, having the stats collector dump 74MB of stuff on disk on its every loop takes a big bite of the I/O capabilities of this box. Of course, leaving out the most important thing - this is postgresql 8.2.6 on x86_64 Hmm ... do you have autovacuum enabled? If not, what's the vacuuming policy on that box? I'm wondering if this is triggered by something deciding to vacuum or analyze a bunch of otherwise-unused tables, and thereby causing stats entries to be created for those tables. You could investigate by comparing the contents of the stats views before and after the file balloons. I would expect to see a lot more rows, and the key is exactly what non-null activity is recorded in the extra rows. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Large pgstat.stat file causes I/O storm
On Tue, 29 Jan 2008, Tom Lane wrote: On Tue, 29 Jan 2008, Cristian Gafton wrote: I have a ~150GB sized server, containing two databases that are active in mostly read mode. I have noticed lately that the global/pgstat.stat file is somewhere around 1MB freshly after a restart, but at some point it baloons to 74MB in size for no apparent reason, after a few hours of uptime. Needless to say, having the stats collector dump 74MB of stuff on disk on its every loop takes a big bite of the I/O capabilities of this box. Of course, leaving out the most important thing - this is postgresql 8.2.6 on x86_64 Hmm ... do you have autovacuum enabled? If not, what's the vacuuming policy on that box? I'm wondering if this is triggered by something deciding to vacuum or analyze a bunch of otherwise-unused tables, and thereby causing stats entries to be created for those tables. Autovacuum is disabled, since the database is mostly read only. There is a vacuumdb -a -z running nightly on the box. However, the application that queries it does a lot of work with temporary tables - would those bloat the stats at all? You could investigate by comparing the contents of the stats views before and after the file balloons. I would expect to see a lot more rows, and the key is exactly what non-null activity is recorded in the extra rows. Any one of the stats views in particular? Currently all of the stats_* flags are set to on. Thanks, Cristian -- Cristian Gafton rPath, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Large pgstat.stat file causes I/O storm
Cristian Gafton [EMAIL PROTECTED] writes: We are churning through a bunch of short-lived temp tables. I think that's probably the root of the problem ... Since I reported the problem, the pgstat file is now sitting at 85M, yet the pg_stat* tables barely have any entries in them: count(*) pg_stats 298 pg_statistic 298 pg_stat_all_indexes 76 pg_stat_all_tables76 pg_statio_all_tables 56 pg_statio_all_indexes 76 Those views are joins against pg_class, so only tables that have live pg_class rows can possibly show up there. You could try remembering the OIDs of some temp tables and probing the underlying pg_stat_get_xxx() functions to see if there are stats-table entries for them. (Pokes around in the code...) I think the problem here is that the only active mechanism for flushing dead stats-table entries is pgstat_vacuum_tabstat(), which is invoked by a VACUUM command or an autovacuum. Once-a-day VACUUM isn't gonna cut it for you under those circumstances. What you might do is just issue a VACUUM on some otherwise-uninteresting small table, once an hour or however often you need to keep the stats file bloat to a reasonable level. There is a pgstat_drop_relation() function to tell the stats collector to drop a single table entry, but it's not being called from anyplace. We probably ought to try a bit harder to make that work. The problem is described here: 2007-07-08 18:23 tgl * src/: backend/postmaster/pgstat.c, backend/storage/smgr/smgr.c, include/pgstat.h (REL8_1_STABLE), backend/postmaster/pgstat.c, backend/storage/smgr/smgr.c, include/pgstat.h (REL8_2_STABLE), backend/postmaster/pgstat.c, backend/storage/smgr/smgr.c, include/pgstat.h: Remove the pgstat_drop_relation() call from smgr_internal_unlink(), because we don't know at that point which relation OID to tell pgstat to forget. The code was passing the relfilenode, which is incorrect, and could possibly cause some other relation's stats to be zeroed out. While we could try to clean this up, it seems much simpler and more reliable to let the next invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it worked before I introduced the buggy code into 8.1.3 and later :-(. Problem noticed by Itagaki Takahiro, fix is per subsequent discussion. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
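Until that is improved, the suggested workaround amounts to scheduling something like the following from cron or a similar scheduler, hourly or as often as needed (the table name is just a placeholder, since it is the VACUUM command itself that triggers pgstat_vacuum_tabstat() and flushes the dead entries):

  VACUUM some_small_table;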
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? Won't even autovacuum analyze cause this too if the new stats changes the plan? Given that the table is unchanging, that's moderately unlikely to happen (especially for select * from foo ;-)) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Transition functions for SUM(::int2), SUM(::int4), SUM(::int8)
Thanks Tom, Forgive me if I'm being dense, but could you explain why int4_sum(bigint,int) must not be strict, but int4_avg_accum(bytea, int) can be strict? It seems that both of them have transition datatypes that are different from the input. -Caleb On 1/28/08 7:31 PM, Tom Lane [EMAIL PROTECTED] wrote: Caleb Welton [EMAIL PROTECTED] writes: Is there any reason that int2_sum, int4_sum, and int8_sum are not marked as being strict? They wouldn't work otherwise, because the transition datatypes aren't the same as the inputs. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Tom Lane wrote: Kevin Grittner [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? Won't even autovacuum analyze cause this too if the new stats changes the plan? ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Large pgstat.stat file causes I/O storm
Cristian Gafton [EMAIL PROTECTED] writes: I just ran a vacuumdb -a on the box - the pgstat file is still 90MB in size. If vacuum is supposed to clean up the cruft from pgstat, then I don't know if we're looking at the right cruft - I kind of expected the pgstat file to go down in size and the I/O storm to subside, but that is not happening after vacuum.

Hmph ... I did a simple test here involving creating a lot of temp tables, and indeed it made the stats file bigger, but the size went right down again after vacuuming. Is it possible that the vacuumdb failed to connect to the particular database in which the temp tables are coming and going?

I will try to instrument the application to record the oids of the temp tables it creates and investigate from that angle, but in the meantime is there any way to reset the stats collector altogether? Could this be a corrupt stat file that gets read and written right back on every loop without any sort of validation?

There's stats_reset_on_server_start (sp?), and I think 8.2 also has a stats-reset function. But what might be more interesting is to pull the file-reading function out of pgstat.c and dump out the stats file in readable form to see what the heck is in there. (If you decide to try resetting the stats, I'd suggest saving a copy of the stats file first for possible analysis later.) I have the beginnings of such a program laying about, which I'll attach --- note that it was last used for 8.1 and might require some tweaks for 8.2, and that you'd need to flesh it out a lot if you want details about individual entries instead of just a count.

regards, tom lane

/*
 * dumpstat --- simple standalone program to read and analyze a PG stats
 * file.  Based on pgstat_read_statsfile() from 8.1 sources.
 *
 * Currently works with either 8.0 or 8.1 formats depending on which
 * headers it is compiled against.
 */
#include "postgres.h"
#include "pgstat.h"

int
main(int argc, char **argv)
{
	PgStat_StatDBEntry dbbuf;
	PgStat_StatTabEntry tabbuf;
	PgStat_StatBeEntry beentry;
	FILE	   *fpin;
	int32		format_id;
	int			maxbackends = 0;
	int			havebackends = 0;
	int			havedbs = 0;
	int			havetabs = 0;

	/*
	 * Try to open the status file. If it doesn't exist, the backends simply
	 * return zero for anything and the collector simply starts from scratch
	 * with empty counters.
	 */
	if ((fpin = fopen(argv[1], "rb")) == NULL)
	{
		perror(argv[1]);
		return 1;
	}

	/*
	 * Verify it's of the expected format.
	 */
#ifdef PGSTAT_FILE_FORMAT_ID
	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
		format_id != PGSTAT_FILE_FORMAT_ID)
	{
		fprintf(stderr, "corrupted pgstat.stat file\n");
		goto done;
	}
#endif

	/*
	 * We found an existing collector stats file. Read it and put all the
	 * hashtable entries into place.
	 */
	for (;;)
	{
		switch (fgetc(fpin))
		{
				/*
				 * 'D'	A PgStat_StatDBEntry struct describing a database
				 * follows. Subsequently, zero to many 'T' entries will
				 * follow until a 'd' is encountered.
				 */
			case 'D':
				if (fread(&dbbuf, 1, sizeof(dbbuf), fpin) != sizeof(dbbuf))
				{
					fprintf(stderr, "corrupted pgstat.stat file\n");
					goto done;
				}
				havedbs++;
				break;

				/*
				 * 'd'	End of this database.
				 */
			case 'd':
				break;

				/*
				 * 'T'	A PgStat_StatTabEntry follows.
				 */
			case 'T':
				if (fread(&tabbuf, 1, sizeof(tabbuf), fpin) != sizeof(tabbuf))
				{
					fprintf(stderr, "corrupted pgstat.stat file\n");
					goto done;
				}
				havetabs++;
				break;

				/*
				 * 'M'	The maximum number of backends to expect follows.
				 */
			case 'M':
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Jeff Davis wrote:
Well clusteredness is used or could be used for a few different heuristics, not all of which this would be quite as well satisfied as readahead. But for
Can you give an example? Treating a file as a circular structure does not impose any significant cost that I can see.

(Pure speculation follows... if you prefer facts, skip this noise)

The data used to create pg_stats.correlation is involved in estimating the cost of an index scan. It could also be used in estimating the cost of a sequential scan, if the query includes a limit. Consider:

  select * from huge_table_clustered_by_A where A < most_As limit 1000

If the correlation for A is close to 1, a sequential scan should be cheaper than an index scan. (If the query also included an order by clause, the sequential scan would have to read the entire table to ensure it had found the top 1000, instead of any old 1000 returned in order.) If A is a circular structure, you would have to know where it started, and include this info in the dump/restore (or lose A's correlation).

Stephen Denne.
[HACKERS] Win32: Building with Longhorn SDK
When building with MSVC 2005 (Express) and the Longhorn Platform SDK (version 6.0a), it seems that IPPROTO_IPV6 is only defined when _WIN32_WINNT >= 0x0501. This results in a compiler error when trying to build pqcomm.c (line 389). According to [1], building for Windows 2000 (that is, _WIN32_WINNT == 0x0500) is no longer supported with the Longhorn version of the Platform SDK. This isn't a huge problem (we can just require the use of prior versions of the SDK), but I thought I'd mention it for the archives. -Neil

[1] http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1960499&SiteID=1
[HACKERS] Opinions about wording of error messages for bug #3883?
In the recent discussion of bug #3883 we decided that for safety's sake, TRUNCATE, CLUSTER, and REINDEX ought to error out if there are any active scans on the table (or index in the case of REINDEX). This is essentially the same as the test currently applied by ALTER TABLE, which uses this code:

static void
CheckTableNotInUse(Relation rel)
{
	int			expected_refcnt;

	expected_refcnt = rel->rd_isnailed ? 2 : 1;
	if (rel->rd_refcnt != expected_refcnt)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("relation \"%s\" is being used by active queries in this session",
						RelationGetRelationName(rel))));

	if (AfterTriggerPendingOnRel(RelationGetRelid(rel)))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("cannot alter table \"%s\" because it has pending trigger events",
						RelationGetRelationName(rel))));
}

I would like to export this routine and have it be used by all four commands, instead of duplicating this logic everywhere. However, that brings up the question of whether the error messages are generic enough for all four commands; and if not, how we want them to read. I'm tempted to rephrase both messages along the line of "cannot %s \"%s\" because ..." where the first %s is replaced by a SQL command name, viz ALTER TABLE, CLUSTER, etc. I'm not sure how nice this is for translation though. Also, with 8.3 release being so close, it's likely that any change would not get reflected into translations before release. I don't think that's a showstopper because these messages should hardly ever be seen by normal users anyway; but maybe it's a consideration.

Comments, better ideas?

regards, tom lane
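For context, the kind of same-session conflict these checks are meant to reject looks roughly like the following (a sketch only; whether this is the exact scenario from bug #3883 isn't restated here, but an open cursor is the simplest way to hold an active scan on the table):

  BEGIN;
  DECLARE c CURSOR FOR SELECT * FROM t;
  FETCH 1 FROM c;
  TRUNCATE t;   -- with the proposed check this fails with an
                -- "object in use" error instead of misbehaving
  COMMIT;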
Re: [HACKERS] Opinions about wording of error messages for bug #3883?
Tom Lane wrote:
I would like to export this routine and have it be used by all four commands, instead of duplicating this logic everywhere. However, that brings up the question of whether the error messages are generic enough for all four commands; and if not, how we want them to read. I'm tempted to rephrase both messages along the line of "cannot %s \"%s\" because ..." where the first %s is replaced by a SQL command name, viz ALTER TABLE, CLUSTER, etc. I'm not sure how nice this is for translation though.

I suggest "cannot execute \"%s\" on \"%s\" because ..."

Also, with 8.3 release being so close, it's likely that any change would not get reflected into translations before release. I don't think that's a showstopper because these messages should hardly ever be seen by normal users anyway; but maybe it's a consideration.

I wouldn't worry about that at this point. We didn't declare a string freeze anyway ... It will likely be fixed in 8.3.1 for translations where it matters anyway, if the translator is not able to do it for 8.3. (That's currently only fr, de and es -- currently even tr is a bit behind.)

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Opinions about wording of error messages for bug #3883?
Alvaro Herrera [EMAIL PROTECTED] writes:
Tom Lane wrote: I'm tempted to rephrase both messages along the line of "cannot %s \"%s\" because ..." where the first %s is replaced by a SQL command name, viz ALTER TABLE, CLUSTER, etc. I'm not sure how nice this is for translation though.
I suggest "cannot execute \"%s\" on \"%s\" because ..."

Hmm, why not just "cannot execute %s \"%s\" because ..." ?

regards, tom lane
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Jan 29, 2008 8:09 PM, Tom Lane [EMAIL PROTECTED] wrote: Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. +1. The plural is important IMHO. BTW, so far as the rest of the thread goes, I'm not necessarily opposed to exposing the switchover threshold as a tunable. But I think it needs more thought to design than we can give it in time for 8.3 (because of the interaction with the buffer access strategy stuff). +1. The current patch is simple and so far in the cycle, I really think we should keep it that way. The feature switch can be justified on grounds of backwards compatibility quite independently of whether pg_dump uses it. Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? As I stated earlier, I don't really like this argument (we already broke badly designed applications a few times in the past) but we really need a way to guarantee that the execution of a query is stable and doesn't depend on external factors. And the original problem was to guarantee that pg_dump builds a dump as identical as possible to the existing data by ignoring external factors. It's now the case with your patch. The fact that it allows us not to break existing applications relying too much on physical ordering is a nice side effect though :). -- Guillaume ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 8.3RC1 on windows missing descriptive Event handle names
I said... On Windows XP, using Process Explorer with the lower pane showing Handles, not all postgres.exe processes are including an Event type with a description of what the process is doing. I've had difficulty reproducing this, but I now suspect that it is only happening when running both v8.2 and v8.3rc1 at once, and I think it is the second started that is missing the process descriptions. Regards, Stephen Denne.
[HACKERS] Possible future performance improvement: sort updates/deletes by ctid
We've had a couple of discussions recently revolving around the inefficiency of using hashjoin/hashaggregation output to update a target table, because of the resulting very random access pattern. I believe this same mechanism is underlying the slowness of Stephen Denne's alternate query described here: http://archives.postgresql.org/pgsql-performance/2008-01/msg00227.php

I made up the attached doubtless-oversimplified test case to model what he was seeing. It's cut down about 4x from the table size he describes, but the UPDATE still takes forever --- I gave up waiting after 2 hours, when it had deleted about a fifth of its hashjoin temp files, suggesting that the total runtime would be about 10 hours.

A brute force idea for fixing this is to sort the intended update or delete operations of an UPDATE/DELETE command according to the target table's ctid, which is available for free anyway since the executor top level must have it to perform the operation. I made up an even more brute force patch (also attached) that forces that to happen for every UPDATE or DELETE --- obviously we'd not want that for real, it's just for crude performance testing. With that patch, I got the results

                                   QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=6075623.03..6085623.05 rows=408 width=618) (actual time=2078726.637..3371944.124 rows=400 loops=1)
   Sort Key: df.ctid
   Sort Method:  external merge  Disk: 2478992kB
   ->  Hash Join  (cost=123330.50..1207292.72 rows=408 width=618) (actual time=20186.510..721120.455 rows=400 loops=1)
         Hash Cond: (df.document_id = d.id)
         ->  Seq Scan on document_file df  (cost=0.00..373334.08 rows=408 width=614) (actual time=11.775..439993.807 rows=400 loops=1)
         ->  Hash  (cost=57702.00..57702.00 rows=4000200 width=8) (actual time=19575.885..19575.885 rows=400 loops=1)
               ->  Seq Scan on document d  (cost=0.00..57702.00 rows=4000200 width=8) (actual time=0.039..14335.615 rows=400 loops=1)
 Total runtime: 3684037.097 ms

or just over an hour runtime --- still not exactly speedy, but it certainly compares favorably to the estimated 10 hours for unsorted updates. This is with default shared_buffers (32MB) and work_mem (1MB); a more aggressive work_mem would have meant fewer hash batches and fewer sort runs and hence better performance in both cases, but with the majority of the runtime going into the sort step here, I think that the sorted update would benefit much more.

Nowhere near a workable patch of course, but seems like food for thought.
regards, tom lane

drop table if exists document;
drop table if exists document_file;
create table document (document_type_id int, id int primary key);
create table document_file (document_type_id int, document_id int primary key, filler char(600));
insert into document_file select x,x,'z' from generate_series(1,400) x;
insert into document select x,x from generate_series(1,400) x;
analyze document_file;
analyze document;
set enable_mergejoin = false;
explain analyze UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;

Index: src/backend/optimizer/prep/preptlist.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/prep/preptlist.c,v
retrieving revision 1.88
diff -c -r1.88 preptlist.c
*** src/backend/optimizer/prep/preptlist.c	1 Jan 2008 19:45:50 -0000	1.88
--- src/backend/optimizer/prep/preptlist.c	30 Jan 2008 03:06:30 -0000
***************
*** 32,37 ****
--- 32,38 ----
  #include "optimizer/var.h"
  #include "parser/analyze.h"
  #include "parser/parsetree.h"
+ #include "parser/parse_clause.h"
  #include "parser/parse_coerce.h"
***************
*** 103,108 ****
--- 104,120 ----
  	tlist = list_copy(tlist);
  	tlist = lappend(tlist, tle);
+ 
+ 	/*
+ 	 * Force the query result to be sorted by CTID, for better update
+ 	 * speed.  (Note: we expect parse->sortClause to be NIL here,
+ 	 * but this code will do no harm if it's not.)
+ 	 */
+ 	parse->sortClause = addTargetToSortList(NULL, tle,
+ 											parse->sortClause, tlist,
+ 											SORTBY_DEFAULT,
+ 											SORTBY_NULLS_DEFAULT,
+ 											NIL, false);
Re: [pgtranslation-translators] [HACKERS] Opinions about wording of error messages for bug #3883?
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: I suggest cannot execute \%s\ on \%s\ because ... Hmm, why not just cannot execute %s \%s\ because ... ? Hmm, yeah, that seems fine too. Thinking more about it, from the POV of the translator probably the three forms are the same because he has all the elements to construct the phrase however he sees fit. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Possible future performance improvement: sort updates/deletes by ctid
From: Tom Lane [mailto:[EMAIL PROTECTED]
doubtless-oversimplified
It looks equivalent.

With that patch, I got the results ... ->  Hash Join (cost=123330.50..1207292.72 rows=408 width=618) (actual time=20186.510..721120.455 rows=400 loops=1)

The plan from here is equivalent to the query plan that I had. In an update query, does the actual time = 721120 mean that after 12 minutes it had completed figuring out what to update, and what to?

This is with default shared_buffers (32MB) and work_mem (1MB);

I had tried a few larger settings, and though I had fewer temp files created, they still took longer than I was willing to wait to process. I did figure out that contention with the background writer or checkpoint processing probably wasn't a large contributor.

How hard is it to match, recognise the potential benefit, and rewrite the query from

UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;

to

UPDATE ONLY document_file AS df SET document_type_id = (SELECT d.document_type_id FROM document AS d WHERE d.id = document_id);

which is several orders of magnitude faster for me?

Stephen Denne.
Re: [HACKERS] Possible future performance improvement: sort updates/deletes by ctid
Stephen Denne [EMAIL PROTECTED] writes: How hard is it to match, recognise potential benefit, and rewrite the query from UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id; to UPDATE ONLY document_file AS df SET document_type_id = (SELECT d.document_type_id FROM document AS d WHERE d.id = document_id); Which is several orders of magnitude faster for me. At the planner level that would be entirely the wrong way to go about it, because that's forcing the equivalent of a nestloop join, which is very unlikely to be faster for the numbers of rows that we're talking about here. The reason it looks faster to you is that the benefits of updating the document_file rows in ctid order outweigh the costs of the dumb join strategy ... but what we want to achieve here is to have both benefits, or at least to give the planner the opportunity to make a cost-driven decision about what to do. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Possible future performance improvement: sort updates/deletes by ctid
At the planner level that would be entirely the wrong way to go about it, because that's forcing the equivalent of a nestloop join, which is very unlikely to be faster for the numbers of rows that we're talking about here. The reason it looks faster to you is that the benefits of updating the document_file rows in ctid order outweigh the costs of the dumb join strategy ... but what we want to achieve here is to have both benefits, or at least to give the planner the opportunity to make a cost-driven decision about what to do.

Ok. Here are some more data points, using a smaller table, v8.2.6:

 Seq Scan on document_file df  (cost=0.00..208480.85 rows=25101 width=662) (actual time=0.239..773.834 rows=25149 loops=1)
   SubPlan
     ->  Index Scan using pk_document_id on document d  (cost=0.00..8.27 rows=1 width=4) (actual time=0.011..0.015 rows=1 loops=25149)
           Index Cond: (id = $0)
 Total runtime: 4492.363 ms

vs

 Hash Join  (cost=1048.85..6539.32 rows=25149 width=666) (actual time=575.079..1408.363 rows=25149 loops=1)
   Hash Cond: (df.document_id = d.id)
   ->  Seq Scan on document_file df  (cost=0.00..4987.49 rows=25149 width=662) (actual time=60.724..824.195 rows=25149 loops=1)
   ->  Hash  (cost=734.49..734.49 rows=25149 width=8) (actual time=40.271..40.271 rows=25149 loops=1)
         ->  Seq Scan on document d  (cost=0.00..734.49 rows=25149 width=8) (actual time=0.055..22.559 rows=25149 loops=1)
 Total runtime: 34961.504 ms

These are fairly repeatable for me after doing a vacuum full analyze of the two tables. Have I simply not tuned postgres so that it knows it has everything on a single old IDE drive, not split over a few sets of raided SSD drives, hence random_page_cost should perhaps be larger than 4.0? Would that make the second estimate larger than the first estimate?

Stephen Denne.
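One low-risk way to test that hypothesis is a per-session experiment along these lines (the value 8.0 is arbitrary, chosen only to see whether the relative estimates move in the expected direction):

  SET random_page_cost = 8.0;
  EXPLAIN ANALYZE
  UPDATE ONLY document_file AS df
     SET document_type_id = d.document_type_id
    FROM document AS d
   WHERE d.id = document_id;
  RESET random_page_cost;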
[HACKERS] Will PostgreSQL get ported to CUDA?
http://www.scientificcomputing.com/ShowPR~PUBCODE~030~ACCT~300100~ISSUE~0801~RELTYPE~HPCC~PRODCODE~~PRODLETT~C.html http://www.nvidia.com/object/cuda_learn.html http://www.nvidia.com/object/cuda_get.html