Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
On 1/28/08, Greg Sabino Mullane [EMAIL PROTECTED] wrote:

I am not thrilled about moving _some_ of pgcrypto into the backend --- pgcrypto right now seems well designed and if we pull part of it out it seems it will be less clear than what we have now. Perhaps we just need to document that md5() isn't for general use and some function in pgcrypto should be used instead?

I don't think docs will help much. md5() has already broken the policy of keeping all crypto in an external module, so I'm now thinking we should just bite the bullet and add digest() to core. Extracting that part from pgcrypto is a no-brainer: the hashes have a very thin wrapper around them, and the only thing they share with the rest of pgcrypto is error handling, which the hashes use the least. The only thing that needs minor thought is that it would be nice to access the hashes from the external module - that means exporting the find_digest() function to it. Which should be no problem.

I think looking at this as putting some of pg_crypto into core is looking at it the wrong way. We are never going to put the whole thing into core given the current state of cryptography laws, but obviously the current status of giving users md5() and nothing else is not ideal. What we're looking for is a middle ground. It seems to me we've narrowed it down to two questions:

1) Do sha1(), or other hashing algorithms, risk running afoul of cryptography regulations? I'm 100% sure that sha1() itself is not a problem (it's even a PHP builtin, and good luck finding a box these days without that monstrosity installed). I highly doubt any of the rest (SHA*, HMAC, etc.) are a problem either: we're doing a one-way hash, not encrypting data. But common sense and cryptography have seldom been seen together since the start of the cold war, so I'll hold my final judgement.

2) Which ones do we include? Putting in sha1() seems a no-brainer, but as Joe points out, why not add all the rest in at the same time?

Considering we have DES and MD5 already, how can SHA2 be a problem when SHA1 isn't?

-- marko
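For reference, the functions being discussed are essentially what contrib/pgcrypto's digest() already exposes today; a rough sketch of the behaviour an in-core digest() would provide, assuming pgcrypto is installed for the second and third calls:

  -- built in today: md5() and nothing else
  SELECT md5('some data');
  -- pgcrypto's digest(), i.e. what a core digest() would look like:
  SELECT encode(digest('some data', 'sha1'), 'hex');
  SELECT encode(digest('some data', 'sha256'), 'hex');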
Re: [HACKERS] [GENERAL] SHA1 on postgres 8.3
* Bruce Momjian: I am not thrilled about moving _some_ of pgcrypto into the backend --- pgcrypto right now seems well designed and if we pull part of it out it seems it will be less clear than what we have now. Perhaps we just need to document that md5() isn't for general use and some function in pgcrypto should be used instead? Yes, that would probably help those folks doing checklist-based security audits. -- Florian Weimer[EMAIL PROTECTED] BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Jeff Davis wrote:
On Mon, 2008-01-28 at 23:13 +, Heikki Linnakangas wrote:
It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4

It doesn't seem like there is any reason for the estimate to get confused, but it apparently does. I loaded a test table with a similar distribution to your example, and it shows a correlation of about -0.5, but it should be as good as something near -1 or +1. I am not a statistics expert, but it seems like a better measurement would be: "what is the chance that, if the tuples are close together in index order, the corresponding heap tuples are close together?" The answer to that question in your example is "very likely", so there would be no problem. Is there a reason we don't do this?

It has been discussed before, but no one has come up with a good measurement for that.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
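To make the effect concrete, one way to reproduce the kind of measurement Jeff describes (table and column names are made up, the exact correlation value will vary, and CREATE TABLE AS ... UNION ALL does not formally guarantee row order, though in practice it comes out in this order):

  -- a table that is perfectly clustered except for one wrap-around point
  CREATE TABLE wrapped AS
      SELECT x FROM generate_series(50001, 100000) x
      UNION ALL
      SELECT x FROM generate_series(1, 50000) x;
  ANALYZE wrapped;
  -- the statistic the planner actually uses ends up well away from -1/+1
  SELECT tablename, attname, correlation FROM pg_stats WHERE tablename = 'wrapped';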
[HACKERS] GSSAPI and V2 protocol
Is it possible to authenticate using GSSAPI over the V2 protocol? Is there any documentation on the message formats for V2? Kris Jurka ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] How to use VB6 for store image to postgresql?
Dear All, Have you ever stored an image in PostgreSQL using VB6? Could you give me an example? Thank you, Premsun NETsolutions Asia Limited +66 (2) 237 7247
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4 I do think the guc to turn it off is useful, only I don't understand the reasoning that pg_dump needs it to maintain the basic clustered property. Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Andreas ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Well clusteredness is used or could be used for a few different heuristics, not all of which this would be quite as well satisfied as readahead. But for the most common application, namely trying to figure out whether index probes for sequential ids will be sequential i/o or random i/o you're right. Currently the statistic we use to estimate this is the correlation of the column value with the physical location on disk. That's not a perfect metric for estimating how much random i/o would be needed to scan the table in index order though. It would be great if Postgres picked up a serious statistics geek who could pipe up in discussions like this with how about using the Euler-Jacobian Centroid or some such thing. If you have any suggestions of what metric to use and how to calculate the info we need from it that would be great. One suggestion from a long way back was scanning the index and counting how many times the item pointer moves backward to an earlier block. That would still require a full index scan though. And it doesn't help for columns which aren't indexed though I'm not sure we need this info for columns which aren't indexed. It's also not clear how to interpolate from that the amount of random access a given query would perform. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] RFC: array_agg() per SQL:200n
On Monday, 28 January 2008, Neil Conway wrote: To parse the ORDER BY clause, we'd need to special-case array_agg() in the grammar

The ORDER BY clause would also be used in XMLAGG, so we should try to parse this in a generalized way. -- Peter Eisentraut http://developer.postgresql.org/~petere/
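For reference, the syntax in question - not something the server accepts at this point, shown purely to illustrate why a generalized ORDER-BY-inside-an-aggregate parse rule is wanted, with placeholder table names:

  -- proposed SQL:200n aggregate ordering, illustration only
  SELECT array_agg(name ORDER BY name) FROM employees;
  SELECT xmlagg(x ORDER BY y) FROM t;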
Re: [HACKERS] Bogus cleanup code in GSSAPI/SSPI patch
On Mon, Jan 28, 2008 at 09:11:31PM -0500, Tom Lane wrote: Hi Magnus, Would you take a look at the patch I just committed in fe-connect.c? I found out today that PQreset() wasn't working on a GSSAPI connection, because closePGconn hadn't been patched to clear out the GSSAPI state (resulting in duplicate GSS authentication request failure). I think I fixed that but it wouldn't be bad for someone who knows that code better to double-check. Also, I can't test the ENABLE_SSPI case, so that needs to be looked at. The test case I've been using is to select 1; in psql, then kill -9 the backend from another window, then select 1; again. psql should recover and reconnect successfully. Looks good from here. As for the SSPI case, it actually recovered just fine with SSPI before this patch - from how I read it there was a memory leak, though. And I can confirm that it still works after this patch. //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
+1. If we go with 'enable_sync_seqscans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do some fine tuning to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at it from this perspective, IMHO we should go with the number (0.25) instead of the boolean.

Surely the risk-of-needing-to-deprecate argument applies ten times more strongly to a number than a boolean.

Yes, I would expect the tuning to be more system than user specific. So imho a boolean userset would couple well with a tuning guc, that may usefully not be userset (if we later discover a need for tuning at all). So +1 for the bool.

synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. To me the latter somehow suggests influence on the whole cluster; probably not worth further discussion though, so if someone says no, ok.

Andreas
Re: [HACKERS] autonomous transactions
Simon Riggs wrote:
On Tue, 2008-01-22 at 20:53 +0100, Pavel Stehule wrote:
Agreed. I think Pavel Stehule was doing some experiments with them, I don't know if he got anywhere.

I did only first research. Any hack is possible - you can stack the current transaction, but a real implementation needs work similar to nested transactions :( and it is too low level for me. And some code cleaning is necessary. There are global variables. And there is a most important question about data visibility - is the autonomous transaction independent of the main transaction (isolation)? You have to think about deadlocks, about referential integrity, etc. This task isn't simple.

Yes, I think autonomous transactions should be on the TODO. They're useful for:
- error logging
- auditing
- creating new partitions automatically
Plus I think we'd be able to improve the code for CREATE INDEX under HOT, and probably a few other wrinkly bits of code.

Added to TODO:

* Add anonymous transactions

http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] autonomous transactions
Bruce Momjian wrote: Plus I think we'd be able to improve the code for CREATE INDEX under HOT, and probably a few other wrinkly bits of code. Added to TODO: * Add anonymous transactions http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php Sorry, updated to Add _autonomous_ transactions. (The one time I don't cut/paste and I get it wrong.) -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Large pgstat.stat file causes I/O storm
On Tue, 29 Jan 2008, Cristian Gafton wrote:

I have a ~150GB sized server, containing two databases that are active in mostly read mode. I have noticed lately that the global/pgstat.stat file is somewhere around 1MB freshly after a restart, but at some point it balloons to 74MB in size for no apparent reason, after a few hours of uptime. Needless to say, having the stats collector dump 74MB of stuff on disk on its every loop takes a big bite out of the I/O capabilities of this box.

Of course, leaving out the most important thing - this is postgresql 8.2.6 on x86_64.

Looking at all the other replicas I have of this database (but which are under a more lightweight read load), the pgstat.stat file again is rather small in size. Am I right to assume that a 74MB pgstat.stat file is not normal - and what might have caused it?

Cristian -- Cristian Gafton rPath, Inc.
[HACKERS] Large pgstat.stat file causes I/O storm
Hello all, I have a ~150GB sized server, containing two databases that are active in mostly read mode. I have noticed lately that the global/pgstat.stat file is somewhere around 1MB freshly after a restart, but at some point it balloons to 74MB in size for no apparent reason, after a few hours of uptime. Needless to say, having the stats collector dump 74MB of stuff on disk on its every loop takes a big bite out of the I/O capabilities of this box.

Looking at all the other replicas I have of this database (but which are under a more lightweight read load), the pgstat.stat file again is rather small in size. Am I right to assume that a 74MB pgstat.stat file is not normal - and what might have caused it?

Thanks, Cristian -- Cristian Gafton rPath, Inc.
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Tue, Jan 29, 2008 at 10:40:40AM +0100, Zeugswetter Andreas ADI SD wrote:
It's a good point that we don't want pg_dump to screw up the cluster order, but that's the only use case I've seen this far for disabling sync scans. Even that wouldn't matter much if our estimate for clusteredness didn't get screwed up by a table that looks like this: 5 6 7 8 9 1 2 3 4
I do think the guc to turn it off is useful, only I don't understand the reasoning that pg_dump needs it to maintain the basic clustered property. Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle? Surely our stats collector should recognize such a table as perfectly clustered. Does it not? We are talking about one breakage in the readahead logic here; this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window.
Andreas

Andreas, I agree with your logic. If the process that PostgreSQL uses to determine how clustered a table is breaks with such a layout, we may need to see what should be changed to make it work. Having had pg_dump cause a database to grind to a halt, I would definitely like the option of using the synchronized scans even for clustered tables.

Ken
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Euler Taveira de Oliveira [EMAIL PROTECTED] writes:
+1. If we go with 'enable_sync_seqscans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do some fine tuning to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at it from this perspective, IMHO we should go with the number (0.25) instead of the boolean.

Surely the risk-of-needing-to-deprecate argument applies ten times more strongly to a number than a boolean.

regards, tom lane
Re: [HACKERS] How to use VB6 for store image to postgresql?
I guess the fastest way is to:
- create a column of type text, e.g. create table foo( myimage text );
- read the contents of your image from file and encode it in base64 using: http://www.vbforums.com/attachment.php?s=42957f48bac95dd18ca8bffcf7578dcc&attachmentid=49287&d=1152543402
- save the base64 encoded string in the database

I hope this helps. Regards, Gevik.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Premsun Choltanwanich
Sent: Tuesday, January 29, 2008 10:28 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] How to use VB6 for store image to postgresql?

Dear All, Have you ever stored an image in PostgreSQL using VB6? Could you give me an example? Thank you, Premsun NETsolutions Asia Limited +66 (2) 237 7247 http://www.nsasia.co.th
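On the database side, that approach boils down to something like the sketch below (the table and column names are only examples; since the data is base64 text, an ordinary text column and a parameterised INSERT from VB6/ADO is all that is needed):

  -- table holding the encoded image
  CREATE TABLE foo (
      id      serial PRIMARY KEY,
      myimage text
  );
  -- the client sends the already base64-encoded string as a normal parameter
  INSERT INTO foo (myimage) VALUES ('iVBORw0KGgoAAAANSUhEUg...');
  -- if the raw bytes are ever needed on the server side:
  SELECT decode(myimage, 'base64') FROM foo WHERE id = 1;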
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Tue, 2008-01-29 at 10:55 +, Gregory Stark wrote: Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: Sorry, but I don't grok this at all. Why the heck would we care if we have 2 parts of the table perfectly clustered, because we started in the middle ? Surely our stats collector should recognize such a table as perfectly clustered. Does it not ? We are talking about one breakage in the readahead logic here, this should only bring the clustered property from 100% to some 99.99% depending on table size vs readahead window. Well clusteredness is used or could be used for a few different heuristics, not all of which this would be quite as well satisfied as readahead. But for Can you give an example? Treating a file as a circular structure does not impose any significant cost that I can see. It would be great if Postgres picked up a serious statistics geek who could pipe up in discussions like this with how about using the Euler-Jacobian Centroid or some such thing. If you have any suggestions of what metric to use and how to calculate the info we need from it that would be great. Agreed. One suggestion from a long way back was scanning the index and counting how many times the item pointer moves backward to an earlier block. That would An interesting metric. As you say, we really need a statistician to definitively say what the correct metrics are, and what kind of sampling we need to make good estimates. still require a full index scan though. And it doesn't help for columns which aren't indexed though I'm not sure we need this info for columns which aren't indexed. It's also not clear how to interpolate from that the amount of random access a given query would perform. I don't think clusteredness has any meaning at all in postgres for an unindexed column. I suppose a table could be clustered without an index, but currently there's no way to do that in postgresql. Regards, Jeff Davis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Simon Riggs wrote:
And if you have a partitioned table with partitions inconveniently sized? You'd need to *reduce* shared_buffers specifically to get synch scans and BAS to kick in. Or increase partition size. Both of which reduce the impact of the benefits we've added. I don't think the argument that "a table is smaller than shared buffers, therefore it is already in shared buffers" holds true in all cases. I/O does matter.

+1. If we go with 'enable_sync_seqscans' for 8.3, and in a future release cycle we do test the cases Simon described above and we agree we need to do some fine tuning to benefit from this feature, we will need to deprecate 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). Looking at it from this perspective, IMHO we should go with the number (0.25) instead of the boolean.

-- Euler Taveira de Oliveira http://www.timbira.com/
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Tue, Jan 29, 2008 at 1:09 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? If geqo kicks in, we're already there, aren't we? Isn't an application which counts on the order of result rows without specifying ORDER BY fundamentally broken? -Kevin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Large pgstat.stat file causes I/O storm
On Tue, 29 Jan 2008, Tom Lane wrote: (Pokes around in the code...) I think the problem here is that the only active mechanism for flushing dead stats-table entries is pgstat_vacuum_tabstat(), which is invoked by a VACUUM command or an autovacuum. Once-a-day VACUUM isn't gonna cut it for you under those circumstances. What you might do is just issue a VACUUM on some otherwise-uninteresting small table, once an hour or however often you need to keep the stats file bloat to a reasonable level. I just ran a vacuumdb -a on the box - the pgstat file is still 90MB in size. If vacuum is supposed to clean up the cruft from pgstat, then I don't know if we're looking at the right cruft - I kind of expected the pgstat file to go down in size and the I/O storm to subside, but that is not happening after vacuum. I will try to instrument the application to record the oids of the temp tables it creates and investigate from that angle, but in the meantime is there any way to reset the stats collector altogether? Could this be a corrupt stat file that gets read and written right back on every loop without any sort of validation? Thanks, Cristian -- Cristian Gafton rPath, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Transition functions for SUM(::int2), SUM(::int4), SUM(::int8)
Caleb Welton [EMAIL PROTECTED] writes: Forgive me if I'm being dense, but could you explain why int4_sum(bigint,int) must not be strict, but int4_avg_accum(bytea, int) can be strict? The former is used with a NULL initial value, the latter isn't. Read the fine print in the CREATE AGGREGATE man page about behavior with strict transition functions. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
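A rough illustration of that fine print (all names below are made up; the point is only that, because the transition type differs from the input type and the initial state is null, the transition function has to tolerate a null state and therefore cannot be strict):

  CREATE FUNCTION my_int4_sum(int8, int4) RETURNS int8 AS $$
      -- ignore null inputs, and cope with the initial NULL running total
      SELECT CASE WHEN $2 IS NULL THEN $1
                  ELSE coalesce($1, 0) + $2 END;
  $$ LANGUAGE sql;          -- deliberately NOT declared STRICT

  CREATE AGGREGATE my_sum (int4) (
      sfunc = my_int4_sum,
      stype = int8
      -- no initcond: the state starts out NULL, and since int8 is not the
      -- input type, the "first non-null input becomes the state" shortcut
      -- for strict transition functions cannot apply
  );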
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Kevin Grittner [EMAIL PROTECTED] writes:
Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next?
If geqo kicks in, we're already there, aren't we?

Yup, and that's one of the reasons we have a way to turn geqo off. (geqo is actually a good precedent for this --- notice that it has an on/off switch that's separate from its tuning knobs.)

Isn't an application which counts on the order of result rows without specifying ORDER BY fundamentally broken?

No doubt, but if it's always worked before, people are going to be unhappy anyway. Also, it's not just ordering that's at stake. Try

regression=# create table foo as select x from generate_series(1,100) x;
SELECT
regression=# select * from foo limit 1;
   x
-------
     1
     2
     3
     4
regression=# select * from foo limit 1;
   x
-------
  7233
  7234
  7235
  7236
regression=# select * from foo limit 1;
   x
-------
 14465
 14466
 14467
 14468

Now admittedly we've never promised LIMIT without ORDER BY to be well-defined either, but not everybody reads the fine print. This case is particularly nasty because at smaller LIMIT values the result *is* consistent, so you might never notice the problem while testing.

regards, tom lane
Re: [HACKERS] Large pgstat.stat file causes I/O storm
On Tue, 29 Jan 2008, Tom Lane wrote:
Cristian Gafton [EMAIL PROTECTED] writes: Autovacuum is disabled, since the database is mostly read only. There is a vacuumdb -a -z running nightly on the box. However, the application that queries it does a lot of work with temporary tables - would those bloat the stats at all?
Conceivably, if you mean a lot of short-lived tables rather than a lot of operations on a few tables. However, I'd think that would result in a steady accumulation of stats entries, not a sudden jump as you seemed to describe.

We are churning through a bunch of short-lived temp tables. Since I reported the problem, the pgstat file is now sitting at 85M, yet the pg_stat* tables barely have any entries in them:

                         count(*)
 pg_stats                   298
 pg_statistic               298
 pg_stat_all_indexes         76
 pg_stat_all_tables          76
 pg_statio_all_tables        56
 pg_statio_all_indexes       76

Is there a way to inspect the pgstat file and see what's in it that it is taking all this space? (it's not the space that bothers me, it's the fact that the statistics collector has to dump 85MB of stuff once a second to disk...)

Thanks, Cristian -- Cristian Gafton rPath, Inc.
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. BTW, so far as the rest of the thread goes, I'm not necessarily opposed to exposing the switchover threshold as a tunable. But I think it needs more thought to design than we can give it in time for 8.3 (because of the interaction with the buffer access strategy stuff). Also I don't like having pg_dump manipulating a tuning parameter. I don't see anything wrong with having both an on/off feature switch and a tunable in future releases. The feature switch can be justified on grounds of backwards compatibility quite independently of whether pg_dump uses it. Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
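In practice the feature switch would be used like any other boolean GUC; a sketch, using the plural spelling favored above (the exact name was still under discussion in this thread):

  -- per session, e.g. what pg_dump would issue before dumping table data
  SET synchronize_seqscans = off;
  -- or for the whole cluster, in postgresql.conf:
  --   synchronize_seqscans = off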
Re: [HACKERS] Large pgstat.stat file causes I/O storm
Cristian Gafton [EMAIL PROTECTED] writes: Autovacuum is disabled, since the database is mostly read only. There is a vacuumdb -a -z running nightly on the box. However, the application that queries it does a lot of work with temporary tables - would those bloat the stats at all? Conceivably, if you mean a lot of short-lived tables rather than a lot of operations on a few tables. However, I'd think that would result in a steady accumulation of stats entries, not a sudden jump as you seemed to describe. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Large pgstat.stat file causes I/O storm
Cristian Gafton [EMAIL PROTECTED] writes: On Tue, 29 Jan 2008, Cristian Gafton wrote: I have a ~150GB sized server, containing two databases that are active in mostly read mode. I have noticed lately that the global/pgstat.stat file is somewhere around 1MB freshly after a restart, but at some point it baloons to 74MB in size for no apparent reason, after a few hours of uptime. Needless to say, having the stats collector dump 74MB of stuff on disk on its every loop takes a big bite of the I/O capabilities of this box. Of course, leaving out the most important thing - this is postgresql 8.2.6 on x86_64 Hmm ... do you have autovacuum enabled? If not, what's the vacuuming policy on that box? I'm wondering if this is triggered by something deciding to vacuum or analyze a bunch of otherwise-unused tables, and thereby causing stats entries to be created for those tables. You could investigate by comparing the contents of the stats views before and after the file balloons. I would expect to see a lot more rows, and the key is exactly what non-null activity is recorded in the extra rows. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Large pgstat.stat file causes I/O storm
On Tue, 29 Jan 2008, Tom Lane wrote: On Tue, 29 Jan 2008, Cristian Gafton wrote: I have a ~150GB sized server, containing two databases that are active in mostly read mode. I have noticed lately that the global/pgstat.stat file is somewhere around 1MB freshly after a restart, but at some point it baloons to 74MB in size for no apparent reason, after a few hours of uptime. Needless to say, having the stats collector dump 74MB of stuff on disk on its every loop takes a big bite of the I/O capabilities of this box. Of course, leaving out the most important thing - this is postgresql 8.2.6 on x86_64 Hmm ... do you have autovacuum enabled? If not, what's the vacuuming policy on that box? I'm wondering if this is triggered by something deciding to vacuum or analyze a bunch of otherwise-unused tables, and thereby causing stats entries to be created for those tables. Autovacuum is disabled, since the database is mostly read only. There is a vacuumdb -a -z running nightly on the box. However, the application that queries it does a lot of work with temporary tables - would those bloat the stats at all? You could investigate by comparing the contents of the stats views before and after the file balloons. I would expect to see a lot more rows, and the key is exactly what non-null activity is recorded in the extra rows. Any one of the stats views in particular? Currently all of the stats_* flags are set to on. Thanks, Cristian -- Cristian Gafton rPath, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Large pgstat.stat file causes I/O storm
Cristian Gafton [EMAIL PROTECTED] writes: We are churning through a bunch of short-lived temp tables. I think that's probably the root of the problem ... Since I reported the problem, the pgstat file is now sitting at 85M, yet the pg_stat* tables barely have any entries in them: count(*) pg_stats 298 pg_statistic 298 pg_stat_all_indexes 76 pg_stat_all_tables76 pg_statio_all_tables 56 pg_statio_all_indexes 76 Those views are joins against pg_class, so only tables that have live pg_class rows can possibly show up there. You could try remembering the OIDs of some temp tables and probing the underlying pg_stat_get_xxx() functions to see if there are stats-table entries for them. (Pokes around in the code...) I think the problem here is that the only active mechanism for flushing dead stats-table entries is pgstat_vacuum_tabstat(), which is invoked by a VACUUM command or an autovacuum. Once-a-day VACUUM isn't gonna cut it for you under those circumstances. What you might do is just issue a VACUUM on some otherwise-uninteresting small table, once an hour or however often you need to keep the stats file bloat to a reasonable level. There is a pgstat_drop_relation() function to tell the stats collector to drop a single table entry, but it's not being called from anyplace. We probably ought to try a bit harder to make that work. The problem is described here: 2007-07-08 18:23 tgl * src/: backend/postmaster/pgstat.c, backend/storage/smgr/smgr.c, include/pgstat.h (REL8_1_STABLE), backend/postmaster/pgstat.c, backend/storage/smgr/smgr.c, include/pgstat.h (REL8_2_STABLE), backend/postmaster/pgstat.c, backend/storage/smgr/smgr.c, include/pgstat.h: Remove the pgstat_drop_relation() call from smgr_internal_unlink(), because we don't know at that point which relation OID to tell pgstat to forget. The code was passing the relfilenode, which is incorrect, and could possibly cause some other relation's stats to be zeroed out. While we could try to clean this up, it seems much simpler and more reliable to let the next invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it worked before I introduced the buggy code into 8.1.3 and later :-(. Problem noticed by Itagaki Takahiro, fix is per subsequent discussion. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
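Until that is improved, the suggested workaround amounts to scheduling something like the following from cron or a similar scheduler, hourly or as often as needed (the table name is just a placeholder, since it is the VACUUM command itself that triggers pgstat_vacuum_tabstat() and flushes the dead entries):

  VACUUM some_small_table;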
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? Won't even autovacuum analyze cause this too if the new stats changes the plan? Given that the table is unchanging, that's moderately unlikely to happen (especially for select * from foo ;-)) regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Transition functions for SUM(::int2), SUM(::int4), SUM(::int8)
Thanks Tom, Forgive me if I'm being dense, but could you explain why int4_sum(bigint,int) must not be strict, but int4_avg_accum(bytea, int) can be strict? It seems that both of them have transition datatypes that are different from the input. -Caleb On 1/28/08 7:31 PM, Tom Lane [EMAIL PROTECTED] wrote: Caleb Welton [EMAIL PROTECTED] writes: Is there any reason that int2_sum, int4_sum, and int8_sum are not marked as being strict? They wouldn't work otherwise, because the transition datatypes aren't the same as the inputs. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Tom Lane wrote: Kevin Grittner [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote: Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? Won't even autovacuum analyze cause this too if the new stats changes the plan? ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Large pgstat.stat file causes I/O storm
Cristian Gafton [EMAIL PROTECTED] writes: I just ran a vacuumdb -a on the box - the pgstat file is still 90MB in size. If vacuum is supposed to clean up the cruft from pgstat, then I don't know if we're looking at the right cruft - I kind of expected the pgstat file to go down in size and the I/O storm to subside, but that is not happening after vacuum.

Hmph ... I did a simple test here involving creating a lot of temp tables, and indeed it made the stats file bigger, but the size went right down again after vacuuming. Is it possible that the vacuumdb failed to connect to the particular database in which the temp tables are coming and going?

I will try to instrument the application to record the oids of the temp tables it creates and investigate from that angle, but in the meantime is there any way to reset the stats collector altogether? Could this be a corrupt stat file that gets read and written right back on every loop without any sort of validation?

There's stats_reset_on_server_start (sp?), and I think 8.2 also has a stats-reset function. But what might be more interesting is to pull the file-reading function out of pgstat.c and dump out the stats file in readable form to see what the heck is in there. (If you decide to try resetting the stats, I'd suggest saving a copy of the stats file first for possible analysis later.) I have the beginnings of such a program laying about, which I'll attach --- note that it was last used for 8.1 and might require some tweaks for 8.2, and that you'd need to flesh it out a lot if you want details about individual entries instead of just a count.

regards, tom lane

/*
 * dumpstat --- simple standalone program to read and analyze a PG stats
 * file.  Based on pgstat_read_statsfile() from 8.1 sources.
 *
 * Currently works with either 8.0 or 8.1 formats depending on which
 * headers it is compiled against.
 */
#include "postgres.h"
#include "pgstat.h"

int
main(int argc, char **argv)
{
	PgStat_StatDBEntry dbbuf;
	PgStat_StatTabEntry tabbuf;
	PgStat_StatBeEntry beentry;
	FILE	   *fpin;
	int32		format_id;
	int			maxbackends = 0;
	int			havebackends = 0;
	int			havedbs = 0;
	int			havetabs = 0;

	/*
	 * Try to open the status file. If it doesn't exist, the backends simply
	 * return zero for anything and the collector simply starts from scratch
	 * with empty counters.
	 */
	if ((fpin = fopen(argv[1], "rb")) == NULL)
	{
		perror(argv[1]);
		return 1;
	}

	/*
	 * Verify it's of the expected format.
	 */
#ifdef PGSTAT_FILE_FORMAT_ID
	if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) ||
		format_id != PGSTAT_FILE_FORMAT_ID)
	{
		fprintf(stderr, "corrupted pgstat.stat file\n");
		goto done;
	}
#endif

	/*
	 * We found an existing collector stats file. Read it and put all the
	 * hashtable entries into place.
	 */
	for (;;)
	{
		switch (fgetc(fpin))
		{
				/*
				 * 'D'	A PgStat_StatDBEntry struct describing a database
				 * follows. Subsequently, zero to many 'T' entries will
				 * follow until a 'd' is encountered.
				 */
			case 'D':
				if (fread(&dbbuf, 1, sizeof(dbbuf), fpin) != sizeof(dbbuf))
				{
					fprintf(stderr, "corrupted pgstat.stat file\n");
					goto done;
				}
				havedbs++;
				break;

				/*
				 * 'd'	End of this database.
				 */
			case 'd':
				break;

				/*
				 * 'T'	A PgStat_StatTabEntry follows.
				 */
			case 'T':
				if (fread(&tabbuf, 1, sizeof(tabbuf), fpin) != sizeof(tabbuf))
				{
					fprintf(stderr, "corrupted pgstat.stat file\n");
					goto done;
				}
				havetabs++;
				break;

				/*
				 * 'M'	The maximum number of backends to expect follows.
				 */
			case 'M':
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
Jeff Davis wrote:
Well clusteredness is used or could be used for a few different heuristics, not all of which this would be quite as well satisfied as readahead. But for
Can you give an example? Treating a file as a circular structure does not impose any significant cost that I can see.

(Pure speculation follows... if you prefer facts, skip this noise)

The data used to create pg_stats.correlation is involved in estimating the cost of an index scan. It could also be used in estimating the cost of a sequential scan, if the query includes a limit. Consider:

  select * from huge_table_clustered_by_A where A < most_As limit 1000

If the correlation for A is close to 1, a sequential scan should be cheaper than an index scan. (If the query also included an order by clause, the sequential scan would have to read the entire table to ensure it had found the top 1000, instead of any old 1000 returned in order.) If A is a circular structure, you would have to know where it started, and include this info in the dump/restore (or lose A's correlation).

Stephen Denne.
[HACKERS] Win32: Building with Longhorn SDK
When building with MSVC 2005 (Express) and the Longhorn Platform SDK (version 6.0a), it seems that IPPROTO_IPV6 is only defined when _WIN32_WINNT >= 0x0501. This results in a compiler error when trying to build pqcomm.c (line 389). According to [1], building for Windows 2000 (that is, _WIN32_WINNT == 0x0500) is no longer supported with the Longhorn version of the Platform SDK. This isn't a huge problem (we can just require the use of prior versions of the SDK), but I thought I'd mention it for the archives. -Neil

[1] http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1960499&SiteID=1
[HACKERS] Opinions about wording of error messages for bug #3883?
In the recent discussion of bug #3883 we decided that for safety's sake, TRUNCATE, CLUSTER, and REINDEX ought to error out if there are any active scans on the table (or index in the case of REINDEX). This is essentially the same as the test currently applied by ALTER TABLE, which uses this code:

static void
CheckTableNotInUse(Relation rel)
{
	int			expected_refcnt;

	expected_refcnt = rel->rd_isnailed ? 2 : 1;
	if (rel->rd_refcnt != expected_refcnt)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("relation \"%s\" is being used by active queries in this session",
						RelationGetRelationName(rel))));

	if (AfterTriggerPendingOnRel(RelationGetRelid(rel)))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_IN_USE),
				 errmsg("cannot alter table \"%s\" because it has pending trigger events",
						RelationGetRelationName(rel))));
}

I would like to export this routine and have it be used by all four commands, instead of duplicating this logic everywhere. However, that brings up the question of whether the error messages are generic enough for all four commands; and if not, how we want them to read. I'm tempted to rephrase both messages along the line of "cannot %s \"%s\" because ..." where the first %s is replaced by a SQL command name, viz ALTER TABLE, CLUSTER, etc. I'm not sure how nice this is for translation though. Also, with 8.3 release being so close, it's likely that any change would not get reflected into translations before release. I don't think that's a showstopper because these messages should hardly ever be seen by normal users anyway; but maybe it's a consideration.

Comments, better ideas?

regards, tom lane
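For context, the kind of same-session conflict these checks are meant to reject looks roughly like the following (a sketch only; whether this is the exact scenario from bug #3883 isn't restated here, but an open cursor is the simplest way to hold an active scan on the table):

  BEGIN;
  DECLARE c CURSOR FOR SELECT * FROM t;
  FETCH 1 FROM c;
  TRUNCATE t;   -- with the proposed check this fails with an
                -- "object in use" error instead of misbehaving
  COMMIT;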
Re: [HACKERS] Opinions about wording of error messages for bug #3883?
Tom Lane wrote:
I would like to export this routine and have it be used by all four commands, instead of duplicating this logic everywhere. However, that brings up the question of whether the error messages are generic enough for all four commands; and if not, how we want them to read. I'm tempted to rephrase both messages along the line of "cannot %s \"%s\" because ..." where the first %s is replaced by a SQL command name, viz ALTER TABLE, CLUSTER, etc. I'm not sure how nice this is for translation though.

I suggest "cannot execute \"%s\" on \"%s\" because ..."

Also, with 8.3 release being so close, it's likely that any change would not get reflected into translations before release. I don't think that's a showstopper because these messages should hardly ever be seen by normal users anyway; but maybe it's a consideration.

I wouldn't worry about that at this point. We didn't declare a string freeze anyway ... It will likely be fixed in 8.3.1 for translations where it matters anyway, if the translator is not able to do it for 8.3. (That's currently only fr, de and es -- currently even tr is a bit behind.)

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Opinions about wording of error messages for bug #3883?
Alvaro Herrera [EMAIL PROTECTED] writes:
Tom Lane wrote: I'm tempted to rephrase both messages along the line of "cannot %s \"%s\" because ..." where the first %s is replaced by a SQL command name, viz ALTER TABLE, CLUSTER, etc. I'm not sure how nice this is for translation though.
I suggest "cannot execute \"%s\" on \"%s\" because ..."

Hmm, why not just "cannot execute %s \"%s\" because ..." ?

regards, tom lane
Re: [HACKERS] [PATCHES] Proposed patch: synchronized_scanning GUC variable
On Jan 29, 2008 8:09 PM, Tom Lane [EMAIL PROTECTED] wrote: Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: synchronize[d]_seqscan sounds a bit better in my ears than the plural synchronize_seqscans. The plural seems better to me; there's no such thing as a solitary synchronized scan, no? The whole point of the feature is to affect the behavior of multiple scans. +1. The plural is important IMHO. BTW, so far as the rest of the thread goes, I'm not necessarily opposed to exposing the switchover threshold as a tunable. But I think it needs more thought to design than we can give it in time for 8.3 (because of the interaction with the buffer access strategy stuff). +1. The current patch is simple and so far in the cycle, I really think we should keep it that way. The feature switch can be justified on grounds of backwards compatibility quite independently of whether pg_dump uses it. Or is someone prepared to argue that there are no applications out there that will be broken if the same query, against the same unchanging table, yields different results from one trial to the next? As I stated earlier, I don't really like this argument (we already broke badly designed applications a few times in the past) but we really need a way to guarantee that the execution of a query is stable and doesn't depend on external factors. And the original problem was to guarantee that pg_dump builds a dump as identical as possible to the existing data by ignoring external factors. It's now the case with your patch. The fact that it allows us not to break existing applications relying too much on physical ordering is a nice side effect though :). -- Guillaume ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 8.3RC1 on windows missing descriptive Event handle names
I said... On Windows XP, using Process Explorer with the lower pane showing Handles, not all postgres.exe processes are including an Event type with a description of what the process is doing. I've had difficulty reproducing this, but I now suspect that it is only happening when running both v8.2 and v8.3rc1 at once, and I think it is the second started that is missing the process descriptions. Regards, Stephen Denne.
[HACKERS] Possible future performance improvement: sort updates/deletes by ctid
We've had a couple of discussions recently revolving around the inefficiency of using hashjoin/hashaggregation output to update a target table, because of the resulting very random access pattern. I believe this same mechanism is underlying the slowness of Stephen Denne's alternate query described here: http://archives.postgresql.org/pgsql-performance/2008-01/msg00227.php

I made up the attached doubtless-oversimplified test case to model what he was seeing. It's cut down about 4x from the table size he describes, but the UPDATE still takes forever --- I gave up waiting after 2 hours, when it had deleted about a fifth of its hashjoin temp files, suggesting that the total runtime would be about 10 hours.

A brute force idea for fixing this is to sort the intended update or delete operations of an UPDATE/DELETE command according to the target table's ctid, which is available for free anyway since the executor top level must have it to perform the operation. I made up an even more brute force patch (also attached) that forces that to happen for every UPDATE or DELETE --- obviously we'd not want that for real, it's just for crude performance testing. With that patch, I got the results

                                   QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=6075623.03..6085623.05 rows=408 width=618) (actual time=2078726.637..3371944.124 rows=400 loops=1)
   Sort Key: df.ctid
   Sort Method:  external merge  Disk: 2478992kB
   ->  Hash Join  (cost=123330.50..1207292.72 rows=408 width=618) (actual time=20186.510..721120.455 rows=400 loops=1)
         Hash Cond: (df.document_id = d.id)
         ->  Seq Scan on document_file df  (cost=0.00..373334.08 rows=408 width=614) (actual time=11.775..439993.807 rows=400 loops=1)
         ->  Hash  (cost=57702.00..57702.00 rows=4000200 width=8) (actual time=19575.885..19575.885 rows=400 loops=1)
               ->  Seq Scan on document d  (cost=0.00..57702.00 rows=4000200 width=8) (actual time=0.039..14335.615 rows=400 loops=1)
 Total runtime: 3684037.097 ms

or just over an hour runtime --- still not exactly speedy, but it certainly compares favorably to the estimated 10 hours for unsorted updates. This is with default shared_buffers (32MB) and work_mem (1MB); a more aggressive work_mem would have meant fewer hash batches and fewer sort runs and hence better performance in both cases, but with the majority of the runtime going into the sort step here, I think that the sorted update would benefit much more.

Nowhere near a workable patch of course, but seems like food for thought.
regards, tom lane

drop table if exists document;
drop table if exists document_file;
create table document (document_type_id int, id int primary key);
create table document_file (document_type_id int, document_id int primary key, filler char(600));
insert into document_file select x,x,'z' from generate_series(1,400) x;
insert into document select x,x from generate_series(1,400) x;
analyze document_file;
analyze document;
set enable_mergejoin = false;
explain analyze UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;

Index: src/backend/optimizer/prep/preptlist.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/prep/preptlist.c,v
retrieving revision 1.88
diff -c -r1.88 preptlist.c
*** src/backend/optimizer/prep/preptlist.c	1 Jan 2008 19:45:50 -0000	1.88
--- src/backend/optimizer/prep/preptlist.c	30 Jan 2008 03:06:30 -0000
***************
*** 32,37 ****
--- 32,38 ----
  #include "optimizer/var.h"
  #include "parser/analyze.h"
  #include "parser/parsetree.h"
+ #include "parser/parse_clause.h"
  #include "parser/parse_coerce.h"
***************
*** 103,108 ****
--- 104,120 ----
  	tlist = list_copy(tlist);
  	tlist = lappend(tlist, tle);
+ 
+ 	/*
+ 	 * Force the query result to be sorted by CTID, for better update
+ 	 * speed.  (Note: we expect parse->sortClause to be NIL here,
+ 	 * but this code will do no harm if it's not.)
+ 	 */
+ 	parse->sortClause = addTargetToSortList(NULL, tle,
+ 											parse->sortClause, tlist,
+ 											SORTBY_DEFAULT,
+ 											SORTBY_NULLS_DEFAULT,
+ 											NIL, false);
Re: [pgtranslation-translators] [HACKERS] Opinions about wording of error messages for bug #3883?
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: I suggest cannot execute \%s\ on \%s\ because ... Hmm, why not just cannot execute %s \%s\ because ... ? Hmm, yeah, that seems fine too. Thinking more about it, from the POV of the translator probably the three forms are the same because he has all the elements to construct the phrase however he sees fit. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Possible future performance improvement: sort updates/deletes by ctid
From: Tom Lane [mailto:[EMAIL PROTECTED]
doubtless-oversimplified
It looks equivalent.

With that patch, I got the results ... ->  Hash Join (cost=123330.50..1207292.72 rows=408 width=618) (actual time=20186.510..721120.455 rows=400 loops=1)

The plan from here is equivalent to the query plan that I had. In an update query, does the actual time = 721120 mean that after 12 minutes it had completed figuring out what to update, and what to?

This is with default shared_buffers (32MB) and work_mem (1MB);

I had tried a few larger settings, and though I had fewer temp files created, they still took longer than I was willing to wait to process. I did figure out that contention with the background writer or checkpoint processing probably wasn't a large contributor.

How hard is it to match, recognise the potential benefit, and rewrite the query from

UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;

to

UPDATE ONLY document_file AS df SET document_type_id = (SELECT d.document_type_id FROM document AS d WHERE d.id = document_id);

which is several orders of magnitude faster for me?

Stephen Denne.
Re: [HACKERS] Possible future performance improvement: sort updates/deletes by ctid
Stephen Denne [EMAIL PROTECTED] writes: How hard is it to match, recognise potential benefit, and rewrite the query from UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id; to UPDATE ONLY document_file AS df SET document_type_id = (SELECT d.document_type_id FROM document AS d WHERE d.id = document_id); Which is several orders of magnitude faster for me. At the planner level that would be entirely the wrong way to go about it, because that's forcing the equivalent of a nestloop join, which is very unlikely to be faster for the numbers of rows that we're talking about here. The reason it looks faster to you is that the benefits of updating the document_file rows in ctid order outweigh the costs of the dumb join strategy ... but what we want to achieve here is to have both benefits, or at least to give the planner the opportunity to make a cost-driven decision about what to do. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Possible future performance improvement: sort updates/deletes by ctid
At the planner level that would be entirely the wrong way to go about it, because that's forcing the equivalent of a nestloop join, which is very unlikely to be faster for the numbers of rows that we're talking about here. The reason it looks faster to you is that the benefits of updating the document_file rows in ctid order outweigh the costs of the dumb join strategy ... but what we want to achieve here is to have both benefits, or at least to give the planner the opportunity to make a cost-driven decision about what to do.

Ok. Here are some more data points, using a smaller table, v8.2.6:

 Seq Scan on document_file df  (cost=0.00..208480.85 rows=25101 width=662) (actual time=0.239..773.834 rows=25149 loops=1)
   SubPlan
     ->  Index Scan using pk_document_id on document d  (cost=0.00..8.27 rows=1 width=4) (actual time=0.011..0.015 rows=1 loops=25149)
           Index Cond: (id = $0)
 Total runtime: 4492.363 ms

vs

 Hash Join  (cost=1048.85..6539.32 rows=25149 width=666) (actual time=575.079..1408.363 rows=25149 loops=1)
   Hash Cond: (df.document_id = d.id)
   ->  Seq Scan on document_file df  (cost=0.00..4987.49 rows=25149 width=662) (actual time=60.724..824.195 rows=25149 loops=1)
   ->  Hash  (cost=734.49..734.49 rows=25149 width=8) (actual time=40.271..40.271 rows=25149 loops=1)
         ->  Seq Scan on document d  (cost=0.00..734.49 rows=25149 width=8) (actual time=0.055..22.559 rows=25149 loops=1)
 Total runtime: 34961.504 ms

These are fairly repeatable for me after doing a vacuum full analyze of the two tables. Have I simply not tuned postgres so that it knows it has everything on a single old IDE drive, not split over a few sets of raided SSD drives, hence random_page_cost should perhaps be larger than 4.0? Would that make the second estimate larger than the first estimate?

Stephen Denne.
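One low-risk way to test that hypothesis is a per-session experiment along these lines (the value 8.0 is arbitrary, chosen only to see whether the relative estimates move in the expected direction):

  SET random_page_cost = 8.0;
  EXPLAIN ANALYZE
  UPDATE ONLY document_file AS df
     SET document_type_id = d.document_type_id
    FROM document AS d
   WHERE d.id = document_id;
  RESET random_page_cost;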
[HACKERS] Will PostgreSQL get ported to CUDA?
http://www.scientificcomputing.com/ShowPR~PUBCODE~030~ACCT~300100~ISSUE~0801~RELTYPE~HPCC~PRODCODE~~PRODLETT~C.html http://www.nvidia.com/object/cuda_learn.html http://www.nvidia.com/object/cuda_get.html