Re: [HACKERS] pg_stop_backup wait bug fix
Fujii Masao wrote:
>>> On Wed, Dec 3, 2008 at 5:13 AM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>>>> Agreed, should use XLByteToPrevSeg. But I wonder if we can just
>>>> replace the current XLByteToSeg call with XLByteToPrevSeg? That would
>>>> offset the return value of the function by one byte as well, as well
>>>> as the value printed to the backup history file. In fact, I think the
>>>> original patch got that wrong; it would return the location of the
>>>> *beginning* of the last xlog file.
>>>
>>> You're right. As you say, the value (stopxlogfilename) printed to the
>>> backup history file is wrong. But since the value is fortunately not
>>> used, no trouble has come up. So I think we can just replace them.
>>
>> Changing the return value doesn't seem like a good idea. If nothing
>> else, it would be complicated to explain what it returns. I committed a
>> patch that changes the waiting behavior, but not the return value or
>> what's written into the backup label file. I also noticed that the 2nd
>> BackupHistoryFileName call in that function is useless; the
>> histfilepath variable is already filled in earlier.
>
> Somewhat confusingly, BackupHistoryFileName is called only once. Isn't
> the 1st call (which you probably meant) BackupHistoryFilePath?

Ouch, you're right. That's subtle.

> In order to prevent confusion, should we add a new local variable
> (histfilename) for the backup history file name?

Agreed. I included that in the patch.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
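The distinction at issue above can be sketched in a few lines. This is a simplified illustration, not the real xlog_internal.h macros: the actual XLByteToSeg/XLByteToPrevSeg operate on an XLogRecPtr struct, while here a flat 64-bit byte position stands in for it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16 MB WAL segment size. */
#define XLOG_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Segment that contains this byte position (analogous to XLByteToSeg). */
static uint64_t
byte_to_seg(uint64_t pos)
{
    return pos / XLOG_SEG_SIZE;
}

/*
 * Segment that contains the byte *before* this position (analogous to
 * XLByteToPrevSeg). At an exact segment boundary, such as a backup stop
 * location, this names the previous segment -- the last one actually
 * written -- rather than the next, still-empty one.
 */
static uint64_t
byte_to_prev_seg(uint64_t pos)
{
    return (pos - 1) / XLOG_SEG_SIZE;
}
```

The two only disagree exactly on segment boundaries, which is why waiting for the segment computed by the first variant can wait for a file that will never be archived.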
Re: [HACKERS] Transactions and temp tables
Emmanuel Cecchet wrote:
> I think that the Assert in is_temp_rel(Oid) in tablecmds.c should be
> replaced by:
>
>     if (on_commits == NULL)
>         return false;
>
> As the use case below shows, a regular table can be created and hold a
> LOCKTAG_RELATION lock that will trigger the call to is_temp_rel in
> is_preparable_locktag. The assert will break if no temp table was
> accessed.

Yes, you're right.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Transactions and temp tables
Emmanuel Cecchet wrote:
> There is a problem with temp tables with ON COMMIT DELETE ROWS that are
> created inside a transaction. Take the
> 2pc_on_delete_rows_transaction.sql test case and change the creation
> statement: instead of
>
>     create temp table foo(x int) on commit delete rows;
>
> try
>
>     create temp table foo(x serial primary key) on commit delete rows;
>
> The test will fail. It looks like the onCommit field is not properly
> updated when serial or primary key is used in that context. I did not
> figure out why.

A serial column uses a sequence behind the scenes. Hmm. Seems like we would need to treat sequences and indexes the same as tables with ON COMMIT DELETE ROWS, i.e. release the locks early and don't error out.

All in all, this is getting pretty messy. My patch felt a bit hackish to begin with, and having to add special cases for sequences and indexes would make it even more so. And what about temporary views? I'm starting to feel that instead of special-casing temp relations, we need to move in the opposite direction and make temp relations more like regular relations. Unfortunately, that's not going to happen in the 8.4 timeframe :-(. Let's try the other approach in 8.5.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] pg_stop_backup wait bug fix
Hi,

I have looked at the patch, and it looks OK to me. BTW, I am not too familiar with this area of the code, so I am not in a position to argue about the patch :-). I haven't found an easy way to test it.

On Wed, Dec 3, 2008 at 1:24 PM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> [quoted discussion of XLByteToPrevSeg and the histfilename variable
> trimmed; see the message above]
>
> Agreed. I included that in the patch.

-- 
Ibrar Ahmed
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] GIN improvements
On 3 Dec 2008, at 06:57 AM, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>>> If *that* is a use case we're interested in, the incoming tuples could
>>> be accumulated in backend-private memory, and inserted into the index
>>> at commit. That would be a lot simpler, with no need to worry about
>>> concurrent inserts or vacuums.
>>
>> Doesn't work --- the index would yield wrong answers for later queries
>> in the same transaction.
>
> Queries would still need to check the backend-private list.

More to the point -- at least if I'm guessing right about Tom's thoughts -- queries would still have to check the heap. That is, the backend-private list would just be a proxy for buffered *index* tuples.

If we do this, though, it would be really nice to do it at a higher level than the indexam. If we could do it for any indexam that provides a kind of bulk insert method, that would be great. I'm just not sure how to support all the indexable operators for the various indexams on the local buffered list.

Incidentally, buffering btree index inserts was originally Heikki's idea.
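The scheme under discussion -- buffer insertions in backend-local memory, make lookups consult the buffer as well as the index proper, and bulk-merge at commit -- can be shown in miniature. This is a purely conceptual toy (hypothetical names, a sorted array standing in for a btree), not PostgreSQL code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_CAP 64

/* A toy "index": a sorted array of keys, plus a backend-private
 * buffer of keys inserted but not yet merged into the index. */
typedef struct
{
    int indexed[1024];
    int nindexed;
    int buffered[BUFFER_CAP];
    int nbuffered;
} ToyIndex;

static int
cmp_int(const void *a, const void *b)
{
    return (*(const int *) a > *(const int *) b) -
           (*(const int *) a < *(const int *) b);
}

/* Insertion only touches the private buffer. */
static void
toy_insert(ToyIndex *ix, int key)
{
    assert(ix->nbuffered < BUFFER_CAP);
    ix->buffered[ix->nbuffered++] = key;
}

/* Lookups must check both the index proper and the private buffer;
 * otherwise later queries in the same transaction would miss rows
 * they themselves inserted -- the problem Tom points out. */
static int
toy_exists(const ToyIndex *ix, int key)
{
    if (bsearch(&key, ix->indexed, ix->nindexed, sizeof(int), cmp_int))
        return 1;
    for (int i = 0; i < ix->nbuffered; i++)
        if (ix->buffered[i] == key)
            return 1;
    return 0;
}

/* At commit, flush the buffer into the index in one bulk step. */
static void
toy_flush(ToyIndex *ix)
{
    memcpy(ix->indexed + ix->nindexed, ix->buffered,
           ix->nbuffered * sizeof(int));
    ix->nindexed += ix->nbuffered;
    ix->nbuffered = 0;
    qsort(ix->indexed, ix->nindexed, sizeof(int), cmp_int);
}
```

The hard part the thread identifies is exactly the `toy_exists` step: for a real indexam, "check the buffered list" would have to support every indexable operator, not just equality.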
Re: [HACKERS] pg_stat_all_tables vs NULLs
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> I've noticed that pg_stat_all_tables returns NULL for idx_scan and
>> idx_tup_fetch if there are no indexes present on a table. Is this
>> actually intended, or is that something that should be fixed?
>
> Hmm. I suspect it's an implementation artifact rather than something
> that was consciously chosen, but on reflection it doesn't seem like a
> bad thing. If we just COALESCE'd it to zero (which I assume is what you
> have in mind) then there would be no distinction in the view between
> "you have no indexes" and "there are indexes, but they aren't being
> used".

But does it make sense to look for that information in pg_stat_*_tables, really? If you want to know whether an index exists for a table, you'd normally look in the system catalogs, not the statistics views, I think.

> I'd vote to leave it alone, I think.

I can go for that as well. I'd then say "let's document it instead", but the stats views documentation is very short on what actually goes into the fields. I guess we could just add "(NULL if no indexes are present)" to it?

In the long term it might be worthwhile to rewrite that section of the docs to focus more on the stats views (giving each its own section with more information about it than just a list of fields) and less on the underlying implementation functions. But that's for a different day ;-)

//Magnus
Re: [HACKERS] maintenance memory vs autovac
Tom Lane wrote:
> Magnus Hagander <[EMAIL PROTECTED]> writes:
>> Greg Stark wrote:
>>> One concern I have about this is people asking "how come when I run
>>> vacuum manually it takes x minutes, but when autovacuum runs it, it
>>> takes 5x minutes?"
>>
>> As long as the default is the same, people would get at least an
>> initial clue that it might have something to do with them changing a
>> configuration parameter...
>
> It seems like mostly a confusion-generator to me. Is there any actual
> evidence that autovac should use a different maintenance_work_mem than
> other processes?

The use-case that made me think of that is one with lots of autovac workers in a system with lots of small tables in different databases.

Turns out I read the documentation for autovac wrong. I understood it as: if I wanted it to look at 1000 databases at once, I needed autovac_workers at 1000. Talked a bit off-list with Alvaro and realized that's not what it is, but that the documentation is a bit unclear on that - will work on fixing it.

Which means there's probably no real use-case for lots of autovac workers that each need only a little maintenance_work_mem, in which case such an extra parameter would be unnecessary.

//Magnus
Re: [HACKERS] maintenance memory vs autovac
Guillaume Smet wrote:
> On Wed, Dec 3, 2008 at 2:00 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> It seems like mostly a confusion-generator to me. Is there any actual
>> evidence that autovac should use a different maintenance_work_mem than
>> other processes?
>
> IMHO, the point is that we were used to considering
> maintenance_work_mem as a "one process at a time" thing. Even if that's
> not really true, we usually didn't run maintenance tasks concurrently.
> The autovacuum workers change that and make it the default behaviour
> (as we can have 3 * maintenance_work_mem by default).

It's still one per process; it's just that autovac uses more than one process. It's probably worthwhile to add a note about the effects of autovacuum around the documentation of maintenance_work_mem, though.

//Magnus
Re: [HACKERS] Erroring out on parser conflicts
On 3 Dec 2008, at 03:32 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> FYI, this is going to make it hard for developers to test CVS changes
>> until they get their grammar cleaned up; perhaps add a comment on how
>> to disable the check?
>
> Well, the point is that their grammar changes are broken if that check
> fails, so I'm not sure what the value of testing a known-incorrect
> grammar might be. It wouldn't necessarily act the same after being
> fixed.

Well, surely the C code the parser invokes will behave the same. A lot of C hackers are not bison grammar hackers, and even many of us former bison grammar hackers are way rusty. There have been a number of times when someone has posted an otherwise working patch with a grammar conflict you fixed.

Bruce, surely nobody would object if you posted a patch to add a comment. People would of course quibble with the wording, but that's just par for the course. Perhaps something like: "Postgres has a policy of maintaining zero parser conflicts. If you disable this for testing, make sure you re-enable it and eliminate any conflicts, or post to -hackers asking for advice."

I'm not sure where to put a comment pointing them to the %expect line, though. What does the error look like if they violate it?
Re: [HACKERS] Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
Hiroshi Inoue wrote:
>> I think the thing is that as long as the encodings are compatible
>> (latin1 with different names, for example) it worked fine. In any case,
>> I think the problem is that gettext is looking at a setting that is not
>> what we are looking at. Particularly with the 8.4 changes to allow
>> per-database locale settings, this has got to be fixed in a bulletproof
>> way.
>
> Attached is a new patch to apply bind_textdomain_codeset() to most
> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL and
> PG_EUC_JIS_2004. EUC-JP may be OK for EUC_JIS_2004. Unfortunately, it's
> hard for Saito-san and me to check encodings other than EUC-JP.

In principle this looks good, I think, but I'm a bit worried about the lack of testing. I can do some testing under LATIN1, which is what we use in Sweden (I just need to get gettext working *at all* in my dev environment again - I've somehow managed to break it), and perhaps we can find someone to do a test in an eastern-European locale to get some more data points?

Can you outline the steps one needs to go through to show the problem, so we can confirm it's fixed in these locales?

//Magnus
Re: [HACKERS] maintenance memory vs autovac
On Wed, Dec 3, 2008 at 10:49 AM, Magnus Hagander <[EMAIL PROTECTED]> wrote:
>> The autovacuum workers change that and make it a default behaviour (as
>> we can have 3*maintenance_work_mem by default).
>
> It's still one per process; it's just that autovac uses more than one
> process.

I agree. What I implied is that by default you have 3 autovacuum workers, so the behaviour has changed, even if it didn't change in a technical sense.

> It's probably worthwhile to add a note about the effects of autovacuum
> around the documentation of maintenance_work_mem, though.

+1. A lot of people set maintenance_work_mem quite high because of the old behaviour.

-- 
Guillaume
Re: [HACKERS] maintenance memory vs autovac
Guillaume Smet <[EMAIL PROTECTED]> writes:
> On Wed, Dec 3, 2008 at 10:49 AM, Magnus Hagander <[EMAIL PROTECTED]> wrote:
>> It's probably worthwhile to add a note about the effects of autovacuum
>> around the documentation of maintenance_work_mem, though.
>
> +1. A lot of people set maintenance_work_mem quite high because of the
> old behaviour.

The high-level view is that an admin will expect to be able to allocate all the memory on his machine (at least all the memory he expects Postgres to use) as something like:

    shared_buffers and sundry shared mem
  + max_connections * work_mem
  + maintenance_work_mem
  + filesystem cache

(Yes, max_connections isn't quite right there, but that's the general idea.)

If you have 1G of RAM and allocate 200M of shared buffers, 1M of work_mem of which you don't expect more than a hundred concurrent allocations, and want about half your RAM set aside for filesystem cache, you would be quite reasonable to expect to have about 256M to play with for maintenance_work_mem -- which in my experience is a nice value (lower than that is noticeably slower, and greater has little effect on sorting data sets I've seen).

But if you set things up that way, you could end up with three autovacuum daemons running with 256M allocated each on a 1G machine. That's pretty frightening, especially with 200M of shared buffers.

We definitely need at the very least a prominent warning in the maintenance_work_mem documentation. Users can always raise it for manually run commands if they're sure they're only running one at a time.

But all of this isn't a new issue, is it? I thought we've had multiple autovacuum workers since 8.3. Have there been any complaints?

-- 
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
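The budget above reduces to simple arithmetic. A minimal sketch with the numbers from Greg's example (the struct and function names are mine, invented for illustration; the formula is the same rough approximation, not an exact accounting of PostgreSQL's allocations):

```c
#include <assert.h>

/* Rough per-server memory budget, in megabytes. */
typedef struct
{
    int shared_buffers_mb;
    int work_mem_mb;
    int expected_concurrent_sorts;  /* stand-in for max_connections */
    int maintenance_work_mem_mb;
    int autovacuum_max_workers;
} PgMemBudget;

/* Worst case with manual maintenance only: at most one maintenance
 * command allocating maintenance_work_mem at a time. */
static int
budget_manual_mb(const PgMemBudget *b)
{
    return b->shared_buffers_mb
         + b->work_mem_mb * b->expected_concurrent_sorts
         + b->maintenance_work_mem_mb;
}

/* Worst case once autovacuum kicks in: every worker can allocate
 * up to maintenance_work_mem on its own. */
static int
budget_autovac_mb(const PgMemBudget *b)
{
    return b->shared_buffers_mb
         + b->work_mem_mb * b->expected_concurrent_sorts
         + b->maintenance_work_mem_mb * b->autovacuum_max_workers;
}
```

With Greg's values (200M shared buffers, 100 concurrent 1M sorts, 256M maintenance_work_mem, 3 workers) the manual-only budget is 556M, but the autovacuum worst case is 1068M -- already past the whole 1G machine before counting the filesystem cache.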
Re: [HACKERS] maintenance memory vs autovac
Gregory Stark wrote:
> We definitely need at the very least a prominent warning in the
> maintenance_work_mem documentation. Users can always raise it for
> manually run commands if they're sure they're only running one at a
> time.

Yeah.

> But all of this isn't a new issue, is it? I thought we've had multiple
> autovacuum workers since 8.3. Have there been any complaints?

Yes, that's why I brought it up. I haven't seen complaints on-list, but have heard a couple from customers off-list. Not necessarily so much complaints as "what does this mean?", but questions nevertheless.

//Magnus
[HACKERS] snapshot leak and core dump with serializable transactions
The following test flashes a "snapshot leak" warning and subsequently dumps core. Though this looks very similar to another bug report, this is a different issue.

    postgres=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN
    postgres=# SAVEPOINT A;
    SAVEPOINT
    postgres=# SELECT count(*) from pg_class;
     count
    -------
       227
    (1 row)

    postgres=# RELEASE SAVEPOINT A;
    WARNING:  Snapshot reference leak: Snapshot 0x9e3e4d4 still referenced
    RELEASE
    postgres=# SELECT count(*) from pg_class;
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !

I looked at this briefly, and ISTM that there are a couple of issues here:

1. Since SAVEPOINT A is the first statement in the transaction, a subtransaction is started and CurrentResourceOwner is set to the resource owner of the subtransaction. Later, when the serializable snapshot is taken, it is recorded in the subtransaction's resource owner. Obviously, when the subtransaction commits, it complains about the snapshot leak because the serializable snapshot is not yet unregistered. So I tried to ensure that the serializable snapshot is always recorded in the TopTransactionResourceOwner. That solved the above issue, but there is still a core dump when the top transaction is committed. That leads to the second issue.

2. In CommitTransaction(), I think we should call AtEOXact_Snapshot *before* releasing the resource owners. Otherwise, ResourceOwnerReleaseInternal complains about the snapshot leak and then forcefully unregisters the snapshot. Later, when AtEOXact_Snapshot is called, it again tries to unregister the serializable snapshot and an assertion fails.

The attached patch fixes these issues.
Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /repositories/postgreshome/cvs/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.269
diff -c -p -r1.269 xact.c
*** src/backend/access/transam/xact.c	19 Nov 2008 10:34:50 -0000	1.269
--- src/backend/access/transam/xact.c	3 Dec 2008 12:47:35 -0000
*************** CommitTransaction(void)
*** 1685,1690 ****
--- 1685,1691 ----
  	smgrDoPendingDeletes(true);
  
  	AtEOXact_MultiXact();
+ 	AtEOXact_Snapshot(true);
  
  	ResourceOwnerRelease(TopTransactionResourceOwner,
  						 RESOURCE_RELEASE_LOCKS,
*************** CommitTransaction(void)
*** 1706,1712 ****
  	AtEOXact_ComboCid();
  	AtEOXact_HashTables(true);
  	AtEOXact_PgStat(true);
- 	AtEOXact_Snapshot(true);
  	pgstat_report_xact_timestamp(0);
  
  	CurrentResourceOwner = NULL;
--- 1707,1712 ----
Index: src/backend/utils/time/snapmgr.c
===================================================================
RCS file: /repositories/postgreshome/cvs/pgsql/src/backend/utils/time/snapmgr.c,v
retrieving revision 1.7
diff -c -p -r1.7 snapmgr.c
*** src/backend/utils/time/snapmgr.c	25 Nov 2008 20:28:29 -0000	1.7
--- src/backend/utils/time/snapmgr.c	3 Dec 2008 12:47:36 -0000
*************** GetTransactionSnapshot(void)
*** 136,142 ****
--- 136,145 ----
  	 */
  	if (IsXactIsoLevelSerializable)
  	{
+ 		ResourceOwner oldowner = CurrentResourceOwner;
+ 
+ 		CurrentResourceOwner = TopTransactionResourceOwner;
  		CurrentSnapshot = RegisterSnapshot(CurrentSnapshot);
+ 		CurrentResourceOwner = oldowner;
  		registered_serializable = true;
  	}
*************** AtEOXact_Snapshot(bool isCommit)
*** 480,486 ****
--- 483,494 ----
  	 * refcount to the serializable snapshot.
  	 */
  	if (registered_serializable)
+ 	{
+ 		ResourceOwner oldowner = CurrentResourceOwner;
+ 
+ 		CurrentResourceOwner = TopTransactionResourceOwner;
  		UnregisterSnapshot(CurrentSnapshot);
+ 		CurrentResourceOwner = oldowner;
+ 	}
  
  	if (RegisteredSnapshots != 0)
  		elog(WARNING, "%d registered snapshots seem to remain after cleanup",
Re: [HACKERS] maintenance memory vs autovac
Guillaume Smet wrote:
> On Wed, Dec 3, 2008 at 10:49 AM, Magnus Hagander <[EMAIL PROTECTED]> wrote:
>>> The autovacuum workers change that and make it a default behaviour
>>> (as we can have 3*maintenance_work_mem by default).
>>
>> It's still one per process, it's just that autovac uses more than one
>> process.
>
> I agree. What I implied is that by default you have 3 autovacuum
> workers so the behaviour has changed, even if it didn't change in a
> technical way.
>
>> It's probably worthwhile to add a note about the effects of autovacuum
>> around the documentation of maintenance_work_mem, though.
>
> +1 A lot of people set maintenance_work_mem quite high because of the
> old behaviour.

How about something as simple as this?

//Magnus

*** doc/src/sgml/config.sgml
--- doc/src/sgml/config.sgml
***************
*** 881,886 ****
--- 881,891 ----
          than <varname>work_mem</varname>.  Larger settings might improve
          performance for vacuuming and for restoring database dumps.
         </para>
+        <para>
+         Note that when autovacuum runs, up to
+         <xref linkend="guc-autovacuum-max-workers"> times this memory
+         may be allocated, so be careful not to set the default value
+         too high.
+        </para>
       </listitem>
      </varlistentry>
Re: [HACKERS] Sync Rep: First Thoughts on Code
Hi,

On Tue, Dec 2, 2008 at 10:09 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Tue, 2008-12-02 at 21:37 +0900, Fujii Masao wrote:
>> Thanks for taking many hours to review the code!!
>>
>> On Mon, Dec 1, 2008 at 8:42 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
>>> Can you confirm that all the docs on the Wiki page are up to date?
>>> There are a few minor discrepancies that make me think it isn't.
>>
>> Documentation is ongoing. Sorry for my slow progress. BTW, I'm going
>> to add and change the sgml files listed on the wiki:
>> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Documentation_Plan
>
> I'm patient, I know it takes time. Happy to spend hours on the review,
> but I want to do that knowing I agree with the higher level features
> and architecture first.

Since I thought that a figure would be more intelligible to some people than my poor English, I illustrated the architecture first:
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Detailed_Design

Are there any other parts which should be illustrated for review?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
[HACKERS] pg_get_keywords descriptions
    = select distinct catcode, catdesc from pg_get_keywords();
     catcode |        catdesc
    ---------+-----------------------
     C       | Column name
     T       | Type or function name
     R       | Reserved
     U       | Unreserved

I find the descriptions of C and T quite confusing. For example, saying that "authorization" is a "Type or function name" (T) is somewhat bogus. In http://developer.postgresql.org/pgdocs/postgres/sql-keywords-appendix.html, the terms

    C = non-reserved (cannot be function or type)
    T = reserved (can be function or type)

are used. Should we use these here as well (possibly adding "name")?
Re: [HACKERS] snapshot leak and core dump with serializable transactions
Pavan Deolasee wrote:
> 2. In CommitTransaction(), I think we should call AtEOXact_Snapshot
> *before* releasing the resource owners. Otherwise,
> ResourceOwnerReleaseInternal complains about snapshot leak and then
> forcefully unregisters the snapshot. Later when AtEOXact_Snapshot is
> called, it again tries to unregister the serializable snapshot and
> assertion fails.

Hmm, I've been wondering if we can get away with not having AtEOXact_Snapshot at all.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] maintenance memory vs autovac
Magnus Hagander wrote:
> Tom Lane wrote:
>> It seems like mostly a confusion-generator to me. Is there any actual
>> evidence that autovac should use a different maintenance_work_mem than
>> other processes?
>
> The use-case that made me think of that is one with lots of autovac
> workers in a system with lots of small tables in different databases.

Another thing to consider here is that lazy vacuum will scale down its memory usage depending on table size.

> Turns out I read the documentation for autovac wrong. I understood that
> if I wanted it to look at 1000 databases at once, I needed
> autovac_workers at 1000. Talked a bit offlist with Alvaro and realized
> that's not what it is, but that the documentation is a bit unclear on
> that - will work on fixing that.

Yeah, Rob Treat has also asked me twice about this, so it's probably worth rewriting.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] cvs head initdb hangs on unixware
On Tue, 2 Dec 2008, Heikki Linnakangas wrote:

> [EMAIL PROTECTED] wrote:
>> Stack trace for p1, program postmaster:  [translated from French]
>>
>>     *[0] fsm_rebuild_page(presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97]
>>      [1] fsm_search_avail(presumed: 0x2, 0x6, 0x1) [0x81e68d9]
>>      [2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 0x8047416, 0xb4) [0x81e6385]
>>      [3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00]
>>      [4] RelationGetBufferForTuple(presumed: 0x84b2250, 0xb4, 0) [0x8099b59]
>>      [5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042]
>>      [6] simple_heap_insert(presumed: 0x84b2250, 0x853a338, 0x853a310) [0x8097297]
>>      [7] InsertOneTuple(presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210]
>>      [8] boot_yyparse(presumed: 0x, 0x3, 0x8047ab8) [0x80c822b]
>>      [9] BootstrapModeMain(presumed: 0x66, 0x8454600, 0x4) [0x80ca233]
>>      [10] AuxiliaryProcessMain(0x4, 0x8047ab4) [0x80cab3b]
>>      [11] main(0x4, 0x8047ab4, 0x8047ac8) [0x8177dce]
>>      [12] _start() [0x807ff96]
>>
>> seems interesting! We've had problems already with the UnixWare
>> optimizer; hope this one is fixable!
>
> Looking at fsm_rebuild_page, I wonder if the compiler is treating int
> as an unsigned integer? That would cause an infinite loop.

No, a simple printf of nodeno shows it starting at 4096, going all the way down to 0, then starting back at 4096... I wonder if the leftchild/rightchild definitions have something to do with it.

-- 
Olivier PRENANT          Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges        +33-5-61-50-97-01 (Fax)
31190 AUTERIVE               +33-6-07-63-80-64 (GSM)
FRANCE                   Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)
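Heikki's suspicion -- a descending loop counter being treated as unsigned -- is worth spelling out, because it produces exactly an infinite loop. A minimal sketch (a hypothetical simplification of a descending node walk, not the actual fsm_rebuild_page code):

```c
#include <assert.h>

/* With a signed counter, 'nodeno >= 0' eventually becomes false and
 * the loop terminates after start+1 iterations. */
static int
count_descend_signed(int start)
{
    int iterations = 0;

    for (int nodeno = start; nodeno >= 0; nodeno--)
        iterations++;
    return iterations;
}

/* The buggy variant made explicit: for an unsigned counter the test
 * 'nodeno >= 0' is always true, so decrementing past 0 wraps around
 * to UINT_MAX and the loop never exits on its own. A cap is used here
 * only so the demonstration terminates. */
static int
count_descend_unsigned(unsigned start, int cap)
{
    int iterations = 0;

    for (unsigned nodeno = start; nodeno >= 0; nodeno--)
    {
        if (++iterations >= cap)
            break;              /* would otherwise run forever */
    }
    return iterations;
}
```

This matches the observed symptom: the counter runs 4096 down to 0, then reappears near the top of the range instead of the loop ending.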
Re: [HACKERS] Visibility map, partial vacuums
Heikki Linnakangas wrote:
> Here's an updated version, with a lot of smaller cleanups, and using
> relcache invalidation to notify other backends when the visibility map
> fork is extended. I already committed the change to FSM to do the same.
> I'm feeling quite satisfied to commit this patch early next week.

Committed.

I haven't done any doc changes for this yet. I think a short section in the database internal storage chapter is probably in order, and the fact that plain VACUUM skips pages should be mentioned somewhere. I'll skim through references to vacuum and see what needs to be changed.

Hmm. It just occurred to me that I think this circumvented the anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM FREEZE does that already, but it's unnecessarily aggressive in freezing.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
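The wraparound hazard follows directly from the skipping rule: if any page was skipped, the vacuum cannot know the oldest XID on the pages it did not see, so it must not advance relfrozenxid. A toy sketch of that invariant (hypothetical structures, nothing like the actual PostgreSQL implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct
{
    bool all_visible;           /* visibility map bit for this page */
} Page;

typedef struct
{
    TransactionId relfrozenxid;
} Relation;

/*
 * Vacuum the relation, skipping all-visible pages unless scan_all is
 * set (as an anti-wraparound vacuum must). Returns the number of pages
 * actually scanned. relfrozenxid may only advance when no page was
 * skipped, because a skipped page might still hold an older unfrozen
 * XID.
 */
static int
toy_vacuum(Relation *rel, Page *pages, int npages,
           bool scan_all, TransactionId freeze_limit)
{
    int  scanned = 0;
    bool skipped = false;

    for (int i = 0; i < npages; i++)
    {
        if (pages[i].all_visible && !scan_all)
        {
            skipped = true;     /* partial vacuum: leave the page alone */
            continue;
        }
        scanned++;              /* prune and freeze tuples here */
    }

    if (!skipped && freeze_limit > rel->relfrozenxid)
        rel->relfrozenxid = freeze_limit;
    return scanned;
}
```

A partial vacuum over a table with some all-visible pages scans less but leaves relfrozenxid untouched; only a full scan may advance it, which is exactly why the skipping has to be disabled for anti-wraparound runs.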
Re: [HACKERS] Re: [BUGS] libpq does not manage SSL callbacks properly when other libraries are involved.
Bruce Momjian wrote: Bruce Momjian wrote: Thanks for the review, Magnus. I have adjusted the patch to use the same mutex every time the counter is accessed, and adjusted the pqsecure_destroy() call to properly decrement in the right place. Also, I renamed the libpq global destroy function to be clearer (the function is not exported). Here is an updated version of the patch to match CVS HEAD. I've updated it to match what's CVS HEAD now, and made some minor modifications. Renamed destroySSL() to make it consistent with initializeSSL(). Added and changed some comments. ssldiff.patch contains my changes against Bruce's patch. I also removed the #ifdef NOT_USED parts. They are in CVS history if we need them, and they're trivial things anyway, so I think this is much cleaner. With this, it looks fine to me. Especially since we've seen some testing from the PHP folks already. //Magnus

*** src/backend/libpq/be-secure.c
--- src/backend/libpq/be-secure.c
***************
*** 88,94 ****
  static DH *tmp_dh_cb(SSL *s, int is_export, int keylength);
  static int verify_cb(int, X509_STORE_CTX *);
  static void info_cb(const SSL *ssl, int type, int args);
  static void initialize_SSL(void);
- static void destroy_SSL(void);
  static int open_server_SSL(Port *);
  static void close_SSL(Port *);
  static const char *SSLerrmessage(void);
--- 88,93 ----
***************
*** 193,209 ****
  secure_initialize(void)
  }

  /*
-  * Destroy global context
-  */
- void
- secure_destroy(void)
- {
- #ifdef USE_SSL
- 	destroy_SSL();
- #endif
- }
-
- /*
   * Indicate if we have loaded the root CA store to verify certificates
   */
  bool
--- 192,197 ----
***************
*** 844,862 ****
  initialize_SSL(void)
  }

  /*
-  * Destroy global SSL context.
-  */
- static void
- destroy_SSL(void)
- {
- 	if (SSL_context)
- 	{
- 		SSL_CTX_free(SSL_context);
- 		SSL_context = NULL;
- 	}
- }
-
- /*
   * Attempt to negotiate SSL connection.
   */
  static int
--- 832,837 ----
*** src/interfaces/libpq/fe-secure.c
--- src/interfaces/libpq/fe-secure.c
***************
*** 44,49 ****
--- 44,50 ----
  #endif
  #include <arpa/inet.h>
  #endif
+ #include <sys/stat.h>

  #ifdef ENABLE_THREAD_SAFETY
***************
*** 89,108 ****
  static bool verify_peer_name_matches_certificate(PGconn *);
  static int verify_cb(int ok, X509_STORE_CTX *ctx);
  static int client_cert_cb(SSL *, X509 **, EVP_PKEY **);
  static int init_ssl_system(PGconn *conn);
  static int initialize_SSL(PGconn *);
! static void destroy_SSL(void);
  static PostgresPollingStatusType open_client_SSL(PGconn *);
  static void close_SSL(PGconn *);
  static char *SSLerrmessage(void);
  static void SSLerrfree(char *buf);
- #endif
- #ifdef USE_SSL
  static bool pq_initssllib = true;
- static SSL_CTX *SSL_context = NULL;
  #endif

  /*
   * Macros to handle disabling and then restoring the state of SIGPIPE handling.
   * Note that DISABLE_SIGPIPE() must appear at the start of a block.
--- 90,121 ----
  static int verify_cb(int ok, X509_STORE_CTX *ctx);
  static int client_cert_cb(SSL *, X509 **, EVP_PKEY **);
  static int init_ssl_system(PGconn *conn);
+ static void destroy_ssl_system(void);
  static int initialize_SSL(PGconn *);
! static void destroySSL(void);
  static PostgresPollingStatusType open_client_SSL(PGconn *);
  static void close_SSL(PGconn *);
  static char *SSLerrmessage(void);
  static void SSLerrfree(char *buf);
  static bool pq_initssllib = true;
  static SSL_CTX *SSL_context = NULL;
+
+ #ifdef ENABLE_THREAD_SAFETY
+ static int ssl_open_connections = 0;
+
+ #ifndef WIN32
+ static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
+ #else
+ static pthread_mutex_t ssl_config_mutex = NULL;
+ static long win32_ssl_create_mutex = 0;
  #endif
+ #endif	/* ENABLE_THREAD_SAFETY */
+
+ #endif	/* SSL */
+
+
  /*
   * Macros to handle disabling and then restoring the state of SIGPIPE handling.
   * Note that DISABLE_SIGPIPE() must appear at the start of a block.
***************
*** 186,192 ****
  void
  pqsecure_destroy(void)
  {
  #ifdef USE_SSL
! 	destroy_SSL();
  #endif
  }
--- 199,205 ----
  pqsecure_destroy(void)
  {
  #ifdef USE_SSL
! 	destroySSL();
  #endif
  }
***************
*** 734,739 ****
  client_cert_cb(SSL *ssl, X509 **x509, EVP_PKEY **pkey)
--- 747,755 ----
  }

  #ifdef ENABLE_THREAD_SAFETY
+ /*
+  * Callback functions for OpenSSL internal locking
+  */
  static unsigned long
  pq_threadidcallback(void)
***************
*** 765,818 ****
  pq_lockingcallback(int mode, int n, const char *file, int line)
  #endif	/* ENABLE_THREAD_SAFETY */

  /*
!  * Also see similar code in fe-connect.c, default_threadlock()
   */
  static int
  init_ssl_system(PGconn *conn)
  {
  #ifdef ENABLE_THREAD_SAFETY
! #ifndef WIN32
! 	static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
! #else
! 	static pthread_mutex_t init_mutex = NULL;
! 	static long mutex_initlock = 0;
!
! 	if (init_mutex == NULL)
  	{
! 		while
Re: [HACKERS] Visibility map, partial vacuums
Heikki Linnakangas [EMAIL PROTECTED] writes: Hmm. It just occurred to me that I think this circumvented the anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM FREEZE does that already, but it's unnecessarily aggressive in freezing. Having seen how the anti-wraparound vacuums work in the field I think merely replacing it with a regular vacuum which covers the whole table will not actually work well. What will happen is that, because nothing else is advancing the relfrozenxid, the age of the relfrozenxid for all tables will advance until they all hit autovacuum_max_freeze_age. Quite often all the tables were created around the same time so they will all hit autovacuum_max_freeze_age at the same time. So a database which was operating fine and receiving regular vacuums at a reasonable pace will suddenly be hit by vacuums for every table all at the same time, 3 at a time. If you don't have vacuum_cost_delay set that will cause a major issue. Even if you do have vacuum_cost_delay set it will prevent the small busy tables from getting vacuumed regularly due to the backlog in anti-wraparound vacuums. Worse, vacuum will set the freeze_xid to nearly the same value for all of the tables. So it will all happen again in another 100M transactions. And again in another 100M transactions, and again... I think there are several things which need to happen here. 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just means unnecessary full table vacuums long before they accomplish anything. 2) Include a factor which spreads out the anti-wraparound freezes in the autovacuum launcher. Some ideas: . we could implicitly add random(vacuum_freeze_min_age) to the autovacuum_max_freeze_age. That would spread them out evenly over 100M transactions. . 
we could check if another anti-wraparound vacuum is still running and implicitly add a vacuum_freeze_min_age penalty to the autovacuum_max_freeze_age for each running anti-wraparound vacuum. That would spread them out without introducing non-determinism, which seems better. . we could leave autovacuum_max_freeze_age and instead pick a semi-random vacuum_freeze_min_age. This would mean the first set of anti-wraparound vacuums would still be synchronized but subsequent ones might be spread out somewhat. There's not as much room to randomize this though, and it would affect how much i/o vacuum did, which makes it seem less palatable to me. 3) I also think we need to put a clamp on the vacuum_cost_delay. Too many people are setting it to unreasonably high values, which results in their vacuums never completing. Actually I think what we should do is junk all the existing parameters and replace them with a vacuum_nice_level or vacuum_bandwidth_cap from which we calculate the cost_limit and hide all the other parameters as internal parameters. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
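Greg's first idea above, implicitly adding random(vacuum_freeze_min_age) to autovacuum_max_freeze_age, can be sketched in a toy simulation. The GUC names are real, but the code below is purely illustrative, not PostgreSQL code:

```python
import random

# Default GUC values under discussion in the thread (transactions).
AUTOVACUUM_MAX_FREEZE_AGE = 200_000_000
VACUUM_FREEZE_MIN_AGE = 100_000_000

def freeze_trigger_points(n_tables, jitter=True, seed=0):
    """Return the relfrozenxid age at which each table would trigger an
    anti-wraparound vacuum, with or without the proposed random offset."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_tables):
        offset = rng.randrange(VACUUM_FREEZE_MIN_AGE) if jitter else 0
        points.append(AUTOVACUUM_MAX_FREEZE_AGE + offset)
    return points

# Without jitter, tables created together all hit the wall at once;
# with jitter, their freezes spread over an extra 100M-transaction window.
synchronized = freeze_trigger_points(10, jitter=False)
spread = freeze_trigger_points(10, jitter=True)
```

The point of the sketch is only that the jittered trigger points fall uniformly in [autovacuum_max_freeze_age, autovacuum_max_freeze_age + vacuum_freeze_min_age), instead of all coinciding.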
Re: [HACKERS] Visibility map, partial vacuums
Heikki Linnakangas wrote: Hmm. It just occurred to me that I think this circumvented the anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM FREEZE does that already, but it's unnecessarily aggressive in freezing. Heh :-) Yes, this should be handled sanely, without having to invoke FREEZE. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Visibility map, partial vacuums
Gregory Stark wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Hmm. It just occurred to me that I think this circumvented the anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM FREEZE does that already, but it's unnecessarily aggressive in freezing. Having seen how the anti-wraparound vacuums work in the field I think merely replacing it with a regular vacuum which covers the whole table will not actually work well. What will happen is that, because nothing else is advancing the relfrozenxid, the age of the relfrozenxid for all tables will advance until they all hit autovacuum_max_freeze_age. Quite often all the tables were created around the same time so they will all hit autovacuum_max_freeze_age at the same time. So a database which was operating fine and receiving regular vacuums at a reasonable pace will suddenly be hit by vacuums for every table all at the same time, 3 at a time. If you don't have vacuum_cost_delay set that will cause a major issue. Even if you do have vacuum_cost_delay set it will prevent the small busy tables from getting vacuumed regularly due to the backlog in anti-wraparound vacuums. Worse, vacuum will set the freeze_xid to nearly the same value for all of the tables. So it will all happen again in another 100M transactions. And again in another 100M transactions, and again... I think there are several things which need to happen here. 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just means unnecessary full table vacuums long before they accomplish anything. 2) Include a factor which spreads out the anti-wraparound freezes in the autovacuum launcher. Some ideas: . we could implicitly add random(vacuum_freeze_min_age) to the autovacuum_max_freeze_age. That would spread them out evenly over 100M transactions. . 
we could check if another anti-wraparound vacuum is still running and implicitly add a vacuum_freeze_min_age penalty to the autovacuum_max_freeze_age for each running anti-wraparound vacuum. That would spread them out without introducing non-determinism, which seems better. . we could leave autovacuum_max_freeze_age and instead pick a semi-random vacuum_freeze_min_age. This would mean the first set of anti-wraparound vacuums would still be synchronized but subsequent ones might be spread out somewhat. There's not as much room to randomize this though, and it would affect how much i/o vacuum did, which makes it seem less palatable to me. How about a way to say that only one (or a config parameter for n) of the autovac workers can be used for anti-wraparound vacuum? Then the other slots would still be available for the small-but-frequently-updated tables. 3) I also think we need to put a clamp on the vacuum_cost_delay. Too many people are setting it to unreasonably high values, which results in their vacuums never completing. Actually I think what we should do is junk all the existing parameters and replace them with a vacuum_nice_level or vacuum_bandwidth_cap from which we calculate the cost_limit and hide all the other parameters as internal parameters. It would certainly be helpful if it were just a single parameter - the arbitrariness of the parameters there now makes them pretty hard to set properly - or at least easy to set wrong. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] snapshot leak and core dump with serializable transactions
Pavan Deolasee [EMAIL PROTECTED] writes: 2. In CommitTransaction(), I think we should call AtEOXact_Snapshot *before* releasing the resource owners. That's absolutely wrong. It'll complain about whatever snapshots the owners still hold. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] tuplestore potential performance problem
While attacking this issue (*1), I found that a tuplestore that has spilled to file has a potential performance problem. The performance problem introduced by Heikki's new approach was caused by BufFile's frequent flushing in cases where you put a new row into the tuplestore, read a middle row of it, then put another row again, and so on. When the tuplestore switches its internal mode from TSS_WRITEFILE to TSS_READFILE, the underlying BufFile seeks to the read pointer and flushes out its dirty buffer if the read pointer is not near the write pointer. Switching from reading back to writing also forfeits the benefit of the OS disk cache. None of this is critical in TSS_INMEM. So I decided to keep writing until the end once the tuplestore has switched from memory mode to file mode, rather than switching between reading and writing randomly; that almost recovers the earlier performance. I am not sure, but I am afraid that nodeCtescan also uses similar logic. Doesn't CTE have a problem for large data sets? Regards, *1: http://archives.postgresql.org/pgsql-hackers/2008-12/msg00077.php -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplestore potential performance problem
Hitoshi Harada [EMAIL PROTECTED] writes: While attacking this issue(*1), I found that tuplestore that is on the file status has potential performance problem. The performance problem introduced by Heikki's new approach was caused by BufFile's frequent flush out in such cases like you put a new row into it and read middle row of it then put another row again, and so on. When tuplestore switches its internal mode from TSS_WRITEFILE to TSS_READFILE, underlying BufFile seeks to read pointer and flushes out its dirty buffer if the reading pointer is not near the writing pointer. Also, reading to writing switch avoids OS disk cache benefit. This is not critical in TSS_INMEM. So I decided to keep writing until finish if the tuplestore gets in file mode from memory mode rather than switching reading and writing randomly, which recovers the earlier performance almost. I am not sure but am afraid that the nodeCtescan also uses similar logic. Doesn't CTE have any problem for large data set? If this means a lot of contortion/complication in the upper-level code, seems like it'd be better to address the performance issue within tuplestore/buffile. We could keep separate buffers for write and read perhaps. But do you have real evidence of a performance problem? I'd sort of expect the kernel's disk cache to mitigate this pretty well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
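Tom's suggestion of separate read and write buffers can be illustrated with a toy model that only counts the flushes forced by read/write switches. This is not BufFile code; it just sketches why the interleaved put/get pattern Hitoshi describes is expensive with a single shared buffer:

```python
class SingleBufferFile:
    """Toy model of a buffered file with ONE shared buffer: every switch
    between writing and reading forces a dirty-buffer flush plus a seek
    (the pattern described for BufFile). We count flushes instead of
    doing any real I/O."""
    def __init__(self):
        self.mode = None
        self.flushes = 0
    def write_row(self):
        if self.mode == 'read':
            self.flushes += 1   # switch back to writing: flush + seek
        self.mode = 'write'
    def read_row(self):
        if self.mode == 'write':
            self.flushes += 1   # switch to reading: flush dirty buffer
        self.mode = 'read'

class SplitBufferFile(SingleBufferFile):
    """With separate read and write buffers (Tom's suggestion), a mode
    switch no longer forces a flush."""
    def write_row(self):
        self.mode = 'write'
    def read_row(self):
        self.mode = 'read'

def interleave(f, n):
    """Put a row, read a middle row, put another row, ... n times."""
    for _ in range(n):
        f.write_row()
        f.read_row()
    return f.flushes
```

With the single shared buffer, n interleaved put/get pairs cost 2n-1 flushes; with split buffers they cost none, which is the behavior the upper-level workaround tries to approximate.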
Re: [HACKERS] Sync Rep: First Thoughts on Code
On Wed, 2008-12-03 at 21:37 +0900, Fujii Masao wrote: Since I thought that the figure was more intelligible for some people than my poor English, I illustrated the architecture first. http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Detailed_Design Are there any other parts which should be illustrated for review? Those are very useful, thanks. Some questions to check my understanding (expected answers in brackets) * Diagram on p.2 has two Archives. We have just one (yes) * We send data continuously, whether or not we are in sync/async? (yes) So the only difference between sync/async is whether we wait when we flush the commit? (yes) * If we have synchronous_commit = off do we ignore synchronous_replication = on (yes) * If two transactions commit almost simultaneously and one is sync and the other async then only the sync backend will wait? (Yes) Do we definitely need the archiver to move the files written by walreceiver to archive and then move them back out again? Seems like we can streamline that part in many (all?) cases. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] GIN improvements
Greg Stark [EMAIL PROTECTED] writes: If we do this though it would be really nice to do it at a higher level than the indexam. If we could do it for any indexam that provides a kind of bulk insert method that would be great. I'm just not sure how to support all the indexable operators for the various indexams on the local buffered list. In principle, just return all those TIDs marked lossy, please recheck. This is a bit brute-force but I'm not sure any useful optimization is possible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplestore potential performance problem
I don't have hard evidence; I reasoned it out. No strace was done, so it may not be caused by the flushing, but this commit makes performance quite a bit better, back to roughly the earlier patch's performance: around 44 sec, down from around 76 sec. Oh, I mean, 116 sec to 44 sec. -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplestore potential performance problem
2008/12/3 Tom Lane [EMAIL PROTECTED]: If this means a lot of contortion/complication in the upper-level code, seems like it'd be better to address the performance issue within tuplestore/buffile. We could keep separate buffers for write and read perhaps. But do you have real evidence of a performance problem? I'd sort of expect the kernel's disk cache to mitigate this pretty well. regards, tom lane I don't have hard evidence; I reasoned it out. No strace was done, so it may not be caused by the flushing, but this commit makes performance quite a bit better, back to roughly the earlier patch's performance: around 44 sec, down from around 76 sec. http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=commitdiff;h=87d9b8ac5dca9fae5f3ac4f3218d8fb4eca8b5b0;hp=f1976a9d002b20006ac31ca85db27abcf56e9ea2 where pos = -1 means spool all rows until the end. The earlier approach buffered the whole table, and Heikki's newer approach buffered row by row while reading. The newest one buffers row by row while reading as long as the tuplestore is in memory, and once it has spilled to file it holds all the remaining tuples until reading finishes: something like a hybrid method. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] GIN improvements
Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: If we do this though it would be really nice to do it at a higher level than the indexam. If we could do it for any indexam that provides a kind of bulk insert method that would be great. I'm just not sure how to support all the indexable operators for the various indexams on the local buffered list. In principle, just return all those TIDs marked lossy, please recheck. This is a bit brute-force but I'm not sure any useful optimization is possible. You could flush the local buffer to the index whenever the index is queried. Not sure if it's better than returning them for recheck, though. This wouldn't work for unique indexes, BTW, but that's not a problem for GIN. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
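The two alternatives on the table, flushing the local buffer whenever the index is queried versus returning the buffered TIDs as lossy matches for recheck, can be sketched with a toy in-memory index. A plain dict stands in for the real index here; this is an illustration of the idea, not the indexam API:

```python
class BufferedIndex:
    """Toy sketch of a local insert buffer in front of an index, with the
    two scan strategies discussed above. Not GIN code."""
    def __init__(self):
        self.index = {}      # key -> set of TIDs (stands in for the real index)
        self.pending = []    # locally buffered (key, tid) insertions

    def insert(self, key, tid):
        self.pending.append((key, tid))   # cheap: no index descent yet

    def flush(self):
        # Apply all buffered insertions to the main index in bulk.
        for key, tid in self.pending:
            self.index.setdefault(key, set()).add(tid)
        self.pending.clear()

    def scan_flush_first(self, key):
        # Option (a): empty the buffer before scanning; result is exact.
        self.flush()
        return self.index.get(key, set()), set()

    def scan_lossy(self, key):
        # Option (b): exact matches from the index, plus every buffered TID
        # returned as "lossy, please recheck" regardless of operator.
        exact = self.index.get(key, set())
        lossy = {tid for _, tid in self.pending}
        return exact, lossy
```

The lossy variant shows why no operator support is needed for the local list: every buffered TID is returned for recheck. As Heikki notes, neither variant would work for unique indexes, but that is not a problem for GIN.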
Re: [HACKERS] Visibility map, partial vacuums
Gregory Stark wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Hmm. It just occurred to me that I think this circumvented the anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM FREEZE does that already, but it's unnecessarily aggressive in freezing. FWIW, it seems the omission is actually the other way 'round. Autovacuum always forces a full-scanning vacuum, making the visibility map useless for autovacuum. This obviously needs to be fixed. What will happen is that, because nothing else is advancing the relfrozenxid, the age of the relfrozenxid for all tables will advance until they all hit autovacuum_max_freeze_age. Quite often all the tables were created around the same time so they will all hit autovacuum_max_freeze_age at the same time. So a database which was operating fine and receiving regular vacuums at a reasonable pace will suddenly be hit by vacuums for every table all at the same time, 3 at a time. If you don't have vacuum_cost_delay set that will cause a major issue. Even if you do have vacuum_cost_delay set it will prevent the small busy tables from getting vacuumed regularly due to the backlog in anti-wraparound vacuums. Worse, vacuum will set the freeze_xid to nearly the same value for all of the tables. So it will all happen again in another 100M transactions. And again in another 100M transactions, and again... But we already have that problem, don't we? When you initially load your database, all tuples will have the same xmin, and all tables will have more or less the same relfrozenxid. I guess you can argue that it becomes more obvious if vacuums are otherwise cheaper, but I don't think the visibility map makes that much difference to suddenly make this issue urgent. Agreed that it would be nice to do something about it, though. I think there are several things which need to happen here. 
1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just means unnecessary full table vacuums long before they accomplish anything. It allows you to truncate clog. If I did my math right, 200M transactions amounts to ~50MB of clog. Perhaps we should still raise it, disk space is cheap after all. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
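A quick check of Heikki's arithmetic: clog stores two commit-status bits per transaction, i.e. four transactions per byte, so 200M transactions come to roughly 50MB. A minimal sketch:

```python
# clog uses 2 status bits per transaction => 4 transactions per byte.
CLOG_BITS_PER_XACT = 2
XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT   # 4

def clog_bytes(n_xacts):
    """Approximate clog size needed to cover n_xacts transactions."""
    return n_xacts // XACTS_PER_BYTE

# autovacuum_max_freeze_age = 200M transactions -> ~50MB of clog
mb = clog_bytes(200_000_000) / 1e6
```

Raising autovacuum_max_freeze_age to 800M would therefore scale the retained clog to roughly 200MB by the same arithmetic.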
Re: [HACKERS] Visibility map, partial vacuums
Heikki Linnakangas [EMAIL PROTECTED] writes: Gregory Stark wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Hmm. It just occurred to me that I think this circumvented the anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM FREEZE does that already, but it's unnecessarily aggressive in freezing. FWIW, it seems the omission is actually the other way 'round. Autovacuum always forces a full-scanning vacuum, making the visibility map useless for autovacuum. This obviously needs to be fixed. How does it do that? Is there some option in the VacStmt to control this? Do we just need a syntax to set that option? How easy is it to tell what percentage of the table needs to be vacuumed? If it's 50% perhaps it would make sense to scan the whole table? (Hm. Not really if it's a contiguous 50% though...) Another idea: Perhaps each page of the visibility map should have a frozenxid (or multiple frozenxids?). Then if an individual page of the visibility map is old we could force scanning all the heap pages covered by that map page and update it. I'm not sure we can do that safely though without locking issues -- or is it ok because it's vacuum doing the updating? Worse, vacuum will set the freeze_xid to nearly the same value for all of the tables. So it will all happen again in another 100M transactions. And again in another 100M transactions, and again... But we already have that problem, don't we? When you initially load your database, all tuples will have the same xmin, and all tables will have more or less the same relfrozenxid. I guess you can argue that it becomes more obvious if vacuums are otherwise cheaper, but I don't think the visibility map makes that much difference to suddenly make this issue urgent. 
We already have that problem but it only bites in a specific case: if you have no other vacuums being triggered by the regular dead tuple scale factor. The normal case is intended to be that autovacuum triggers much more frequently than every 100M transactions to reduce bloat. However in practice this specific case does seem to arise rather alarmingly easy. Most databases do have some large tables which are never deleted from or updated. Also, the default scale factor of 20% is actually quite easy to never reach if your tables are also growing quickly -- effectively moving the goalposts further out as fast as the updates and deletes bloat the table. The visibility map essentially widens this specific use case to cover *all* tables. Since the relfrozenxid would never get advanced by regular vacuums the only time it would get advanced is when they all hit the 200M wall simultaneously. Agreed that it would be nice to do something about it, though. I think there are several things which need to happen here. 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just means unnecessary full table vacuums long before they accomplish anything. It allows you to truncate clog. If I did my math right, 200M transactions amounts to ~50MB of clog. Perhaps we should still raise it, disk space is cheap after all. Ah. Hm. Then perhaps this belongs in the realm of the config generator people are working on. They'll need a dial to say how much disk space you expect your database to take in addition to how much memory your machine has available. 50M is nothing for a 1TB database but it's kind of silly to have to keep hundreds of megs of clogs on a 1MB database. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cvs head initdb hangs on unixware
[EMAIL PROTECTED] wrote: Looking at fsm_rebuild_page, I wonder if the compiler is treating int as an unsigned integer? That would cause an infinite loop. No, a simple printf of nodeno shows it starting at 4096, going all the way down to 0, then starting back at 4096... I wonder if the leftchild/rightchild definitions have something to do with it... With probably no relevance at all, I notice that this routine is declared extern, although it is apparently only referenced in its own file. Don't we have a tool that checks that? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cvs head initdb hangs on unixware
[EMAIL PROTECTED] wrote: On Tue, 2 Dec 2008, Heikki Linnakangas wrote: Date: Tue, 02 Dec 2008 20:47:19 +0200 From: Heikki Linnakangas [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Zdenek Kotala [EMAIL PROTECTED], pgsql-hackers list pgsql-hackers@postgresql.org Subject: Re: [HACKERS] cvs head initdb hangs on unixware [EMAIL PROTECTED] wrote: Stack trace for p1, program postmaster *[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97] [1] fsm_search_avail( presumed: 0x2, 0x6, 0x1) [0x81e68d9] [2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 0x8047416, 0xb4) [0x81e6385] [3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00] [4] RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59] [5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042] [6] simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310) [0x8097297] [7] InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210] [8] boot_yyparse( presumed: 0x, 0x3, 0x8047ab8) [0x80c822b] [9] BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4) [0x80ca233] [10] AuxiliaryProcessMain(0x4, 0x8047ab4) [0x80cab3b] [11] main(0x4, 0x8047ab4, 0x8047ac8) [0x8177dce] [12] _start() [0x807ff96] seems interesting! We've had problems already with the UnixWare optimizer; hope this one is fixable! Looking at fsm_rebuild_page, I wonder if the compiler is treating int as an unsigned integer? That would cause an infinite loop. No, a simple printf of nodeno shows it starting at 4096, going all the way down to 0, then starting back at 4096... Hmm, it's probably looping in fsm_search_avail then. In a fresh cluster, there shouldn't be any broken FSM pages that need rebuilding. I'd like to see what the FSM page in question looks like. Could you try to run initdb with the -d -n options? I bet you'll get an infinite number of lines like: DEBUG: fixing corrupt FSM block 1, relation 123/456/789 Could you zip up the FSM file of that relation (a file called e.g. 789_fsm), and send it over?
Or the whole data directory, it shouldn't be that big. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Looks like I need to add Python 2.5+Linux to my testing set. I did not expect that the UNIX distributions of Python 2.5 would ship with wintypes.py at all. I think I can fix this on the spot though. On line 40, you'll find this bit: except ImportError: Change that to the following: except ImportError,ValueError: That didn't work, same error message. And it should pass that point. If it doesn't, you can try the completely general: except: That worked. The settings that this initially spit out (I guess it defaults to mixed mode) didn't look too sane to me, because as discussed elsewhere on this thread 50 is not a reasonable value for default_statistics_target for my installation. It also wanted to set constraint_exclusion to on, which I'm pretty confident is useless. Then I tried -T web and got what seemed like a more reasonable set of values. But I wasn't sure I needed that many connections, so I added -c 150 to see how much difference that made. Kaboom! $ ./pgtune -i ~postgres/data/postgresql.conf -T web -c 150 Traceback (most recent call last): File "./pgtune", line 463, in <module> wizardTune(config,options,settings) File "./pgtune", line 403, in wizardTune 'web':mem/con, 'oltp':mem/con,'dw':mem/con/2, TypeError: unsupported operand type(s) for /: 'int' and 'str' I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm a skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable. I'm also a bit surprised that there doesn't seem to be anything here that depends on the size of the database, even order-of-magnitude. It seems like the right value for checkpoint_segments, at least, might depend on that. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
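The TypeError in the traceback above is a classic optparse pitfall: option values arrive as strings unless the option is declared with type="int", so mem/con blows up when con is the raw "-c 150" string. A minimal sketch of the failure and the fix follows; the names mem and con mirror the traceback, but the real pgtune internals may differ:

```python
def shared_memory_per_connection(mem, con):
    """Per-connection memory share; converts con in case it arrived as a
    raw option string (the conversion the traceback suggests is missing)."""
    con = int(con)          # raises ValueError if not numeric, by design
    return mem // con

# What the script effectively did: an int divided by the unconverted string.
try:
    16 * 1024 // "150"
    crashed = False
except TypeError:
    crashed = True
```

Declaring the option as add_option('-c', type='int') would avoid the explicit cast. Incidentally, the earlier advice except ImportError,ValueError: does not catch both exceptions in Python 2: it catches ImportError and binds it to the name ValueError. Catching both requires except (ImportError, ValueError):, which is likely why the first edit didn't help.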
Re: [HACKERS] Visibility map, partial vacuums
Gregory Stark wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Gregory Stark wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Hmm. It just occurred to me that I think this circumvented the anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM FREEZE does that already, but it's unnecessarily aggressive in freezing. FWIW, it seems the omission is actually the other way 'round. Autovacuum always forces a full-scanning vacuum, making the visibility map useless for autovacuum. This obviously needs to be fixed. How does it do that? Is there some option in the VacStmt to control this? Do we just need a syntax to set that option? The way it works now is that if VacuumStmt->freeze_min_age is not -1 (which means use the default), the visibility map is not used and the whole table is scanned. Autovacuum always sets freeze_min_age, so it's never using the visibility map. Attached is a patch I'm considering to fix that. How easy is it to tell what percentage of the table needs to be vacuumed? If it's 50% perhaps it would make sense to scan the whole table? (Hm. Not really if it's a contiguous 50% though...) Hmm. You could scan the visibility map to see how much you could skip by using it. You could account for contiguity. Another idea: Perhaps each page of the visibility map should have a frozenxid (or multiple frozenxids?). Then if an individual page of the visibility map is old we could force scanning all the heap pages covered by that map page and update it. I'm not sure we can do that safely though without locking issues -- or is it ok because it's vacuum doing the updating? We discussed that a while ago: http://archives.postgresql.org/message-id/[EMAIL PROTECTED] Tom was concerned about making the visibility map not just a hint but critical data. Rightly so.
This is certainly 8.5 stuff; perhaps it would be more palatable after we get the index-only-scans working using the visibility map, since the map would be critical data anyway.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index fd2429a..3e3cb9d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -171,10 +171,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	vacrelstats->hasindex = (nindexes > 0);
 
 	/* Should we use the visibility map or scan all pages? */
-	if (vacstmt->freeze_min_age != -1)
-		scan_all = true;
-	else
-		scan_all = false;
+	scan_all = vacstmt->scan_all;
 
 	/* Do the vacuuming */
 	lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index eb7ab4d..2781f6e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2771,6 +2771,7 @@ _copyVacuumStmt(VacuumStmt *from)
 	COPY_SCALAR_FIELD(analyze);
 	COPY_SCALAR_FIELD(verbose);
 	COPY_SCALAR_FIELD(freeze_min_age);
+	COPY_SCALAR_FIELD(scan_all);
 	COPY_NODE_FIELD(relation);
 	COPY_NODE_FIELD(va_cols);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index d4c57bb..86a032f 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1436,6 +1436,7 @@ _equalVacuumStmt(VacuumStmt *a, VacuumStmt *b)
 	COMPARE_SCALAR_FIELD(analyze);
 	COMPARE_SCALAR_FIELD(verbose);
 	COMPARE_SCALAR_FIELD(freeze_min_age);
+	COMPARE_SCALAR_FIELD(scan_all);
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_NODE_FIELD(va_cols);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 85f4616..1aab75c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -5837,6 +5837,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 					n->analyze = false;
 					n->full = $2;
 					n->freeze_min_age = $3 ? 0 : -1;
+					n->scan_all = $3;
 					n->verbose = $4;
 					n->relation = NULL;
 					n->va_cols = NIL;
@@ -5849,6 +5850,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 					n->analyze = false;
 					n->full = $2;
 					n->freeze_min_age = $3 ? 0 : -1;
+					n->scan_all = $3;
 					n->verbose = $4;
 					n->relation = $5;
 					n->va_cols = NIL;
@@ -5860,6 +5862,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 					n->vacuum = true;
 					n->full = $2;
 					n->freeze_min_age = $3 ? 0 : -1;
+					n->scan_all = $3;
 					n->verbose |= $4;
 					$$ = (Node *)n;
 				}
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 8d8947f..2c68779 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2649,6 +2649,7 @@ autovacuum_do_vac_analyze(autovac_table *tab,
 	vacstmt.full = false;
 	vacstmt.analyze = tab->at_doanalyze;
 	vacstmt.freeze_min_age = tab->at_freeze_min_age;
+	vacstmt.scan_all = tab->at_wraparound;
 	vacstmt.verbose = false;
 	vacstmt.relation = NULL;
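The behavioral change in the patch above is small but easy to misread, so here is a toy Python model of it (hypothetical function names, not PostgreSQL source): before the patch, any explicitly-set freeze_min_age forced a full-table scan, and since autovacuum always sets one, autovacuum could never use the visibility map; after the patch, the full scan is requested through an explicit scan_all flag that autovacuum sets only for anti-wraparound work.

```python
def scan_all_before(freeze_min_age):
    # Old behavior: -1 means "use the default"; anything else (and
    # autovacuum always passed something else) forced a full scan.
    return freeze_min_age != -1

def scan_all_after(is_vacuum_freeze, for_wraparound):
    # New behavior: only VACUUM FREEZE and an anti-wraparound autovacuum
    # scan everything; plain vacuums may skip pages via the visibility map.
    return is_vacuum_freeze or for_wraparound
```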
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 2008-12-03 at 13:30 -0500, Robert Haas wrote:

Looks like I need to add Python 2.5+Linux to my testing set. I did not expect that the UNIX distributions of Python 2.5 would ship with wintypes.py at all. I think I can fix this on the spot though. On line 40, you'll find this bit: except ImportError: Change that to the following: except ImportError,ValueError:

That didn't work, same error message.

And it should pass that point. If it doesn't, you can try the completely general: except:

That worked. The settings that this initially spit out (I guess it defaults to mixed mode) didn't look too sane to me, because, as discussed elsewhere on this thread, 50 is not a reasonable value for default_statistics_target for my installation. It also wanted to set constraint_exclusion to on, which I'm pretty confident is useless. Then I tried -T web and got what seemed like a more reasonable set of values. But I wasn't sure I needed that many connections, so I added -c 150 to see how much difference that made. Kaboom!

$ ./pgtune -i ~postgres/data/postgresql.conf -T web -c 150
Traceback (most recent call last):
  File "./pgtune", line 463, in <module>
    wizardTune(config,options,settings)
  File "./pgtune", line 403, in wizardTune
    'web':mem/con, 'oltp':mem/con,'dw':mem/con/2,
TypeError: unsupported operand type(s) for /: 'int' and 'str'

I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable.

Why?

I'm also a bit surprised that there doesn't seem to be anything here that depends on the size of the database, even order-of-magnitude. It seems like the right value for checkpoint_segments, at least, might depend on that.

What does checkpoint_segments have to do with the size of the database?

Joshua D. Drake

...Robert

-- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
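The TypeError in the traceback above is a classic optparse pitfall: option values arrive as strings unless a type is declared, so mem/con ends up dividing an int by the string "150". A minimal sketch of the fix follows (hypothetical names; pgtune's actual code differs):

```python
from optparse import OptionParser

def parse_connections(argv):
    # type="int" makes optparse coerce "-c 150" to the integer 150,
    # instead of leaving it as the string "150"
    parser = OptionParser()
    parser.add_option("-c", "--connections", dest="connections", type="int")
    options, _args = parser.parse_args(argv)
    return options.connections

mem = 1024 * 1024  # total memory in kB, purely for illustration
con = parse_connections(["-c", "150"])
per_connection = mem // con  # arithmetic now works: both operands are ints
```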
Re: [HACKERS] Transactions and temp tables
I would really like to have support for temp tables, at least for the case where the table is created and dropped in the same transaction. But I guess that the other limitations on indexes, sequences and views would still hold, right?

manu

Heikki Linnakangas wrote: Emmanuel Cecchet wrote: There is a problem with temp tables with on delete rows that are created inside a transaction. Take the 2pc_on_delete_rows_transaction.sql test case and change the creation statement: instead of

create temp table foo(x int) on commit delete rows;

try

create temp table foo(x serial primary key) on commit delete rows;

The test will fail. It looks like the onCommit field is not properly updated when serial or primary key is used in that context. I did not figure out why.

A serial column uses a sequence behind the scenes. Hmm. Seems like we would need to treat sequences and indexes the same as tables with ON COMMIT DELETE ROWS, i.e. release the locks early and don't error out. All in all, this is getting pretty messy. My patch felt a bit hackish to begin with, and having to add special cases for sequences and indexes would make it even more so. And what about temporary views? I'm starting to feel that instead of special-casing temp relations, we need to move in the opposite direction and make temp relations more like regular relations. Unfortunately, that's not going to happen in the 8.4 timeframe :-(. Let's try the other approach in 8.5.

-- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development Consulting -- Web: http://www.frogthinker.org email: [EMAIL PROTECTED] Skype: emmanuel_cecchet

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] snapshot leak and core dump with serializable transactions
Pavan Deolasee wrote: On Wed, Dec 3, 2008 at 7:42 PM, Tom Lane [EMAIL PROTECTED] wrote: That's absolutely wrong. It'll complain about whatever snapshots the owners still hold.

You must be right; I don't understand that code much. But don't we expect the snapshots to be cleanly released at that point, and if not, we flash warnings anyway? AtEOXact_Snapshot only unregisters the serialized snapshot, which the resource owner release will otherwise complain about.

Yeah, we need two at-commit routines, one of which needs to be called early. I'm prepping a patch.

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] snapshot leak and core dump with serializable transactions
Alvaro Herrera wrote: Yeah, we need two at-commit routines, one of which needs to be called early. I'm prepping a patch.

Here it is ... the large object patch is also included. I've created new functions to specify the resource owner to register a snapshot in; now that there are two callers, it seems likely that there will be more in the future.

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.269
diff -c -p -r1.269 xact.c
*** src/backend/access/transam/xact.c	19 Nov 2008 10:34:50 -0000	1.269
--- src/backend/access/transam/xact.c	3 Dec 2008 19:30:35 -0000
*************** CommitTransaction(void)
*** 1667,1672 ****
--- 1667,1675 ----
  	/* Clean up the relation cache */
  	AtEOXact_RelationCache(true);
  
+ 	/* Clean up the snapshot manager */
+ 	AtEarlyCommit_Snapshot();
+ 
  	/*
  	 * Make catalog changes visible to all backends.  This has to happen after
  	 * relcache references are dropped (see comments for
*************** PrepareTransaction(void)
*** 1906,1911 ****
--- 1909,1917 ----
  	/* Clean up the relation cache */
  	AtEOXact_RelationCache(true);
  
+ 	/* Clean up the snapshot manager */
+ 	AtEarlyCommit_Snapshot();
+ 
  	/* notify and flatfiles don't need a postprepare call */
  	PostPrepare_PgStat();
Index: src/backend/storage/large_object/inv_api.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/large_object/inv_api.c,v
retrieving revision 1.135
diff -c -p -r1.135 inv_api.c
*** src/backend/storage/large_object/inv_api.c	2 Nov 2008 01:45:28 -0000	1.135
--- src/backend/storage/large_object/inv_api.c	3 Dec 2008 18:59:43 -0000
*************** inv_open(Oid lobjId, int flags, MemoryCo
*** 247,253 ****
  	}
  	else if (flags & INV_READ)
  	{
! 		retval->snapshot = RegisterSnapshot(GetActiveSnapshot());
  		retval->flags = IFS_RDLOCK;
  	}
  	else
--- 247,254 ----
  	}
  	else if (flags & INV_READ)
  	{
! 		retval->snapshot = RegisterSnapshotOnOwner(GetActiveSnapshot(),
! 										TopTransactionResourceOwner);
  		retval->flags = IFS_RDLOCK;
  	}
  	else
*************** void
*** 270,277 ****
  inv_close(LargeObjectDesc *obj_desc)
  {
  	Assert(PointerIsValid(obj_desc));
  	if (obj_desc->snapshot != SnapshotNow)
! 		UnregisterSnapshot(obj_desc->snapshot);
  	pfree(obj_desc);
  }
--- 271,281 ----
  inv_close(LargeObjectDesc *obj_desc)
  {
  	Assert(PointerIsValid(obj_desc));
+ 
  	if (obj_desc->snapshot != SnapshotNow)
! 		UnregisterSnapshotFromOwner(obj_desc->snapshot,
! 										TopTransactionResourceOwner);
+ 
  	pfree(obj_desc);
  }
Index: src/backend/utils/time/snapmgr.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/utils/time/snapmgr.c,v
retrieving revision 1.7
diff -c -p -r1.7 snapmgr.c
*** src/backend/utils/time/snapmgr.c	25 Nov 2008 20:28:29 -0000	1.7
--- src/backend/utils/time/snapmgr.c	3 Dec 2008 19:28:50 -0000
*************** GetTransactionSnapshot(void)
*** 136,142 ****
  	 */
  	if (IsXactIsoLevelSerializable)
  	{
! 		CurrentSnapshot = RegisterSnapshot(CurrentSnapshot);
  		registered_serializable = true;
  	}
--- 136,143 ----
  	 */
  	if (IsXactIsoLevelSerializable)
  	{
! 		CurrentSnapshot = RegisterSnapshotOnOwner(CurrentSnapshot,
! 										TopTransactionResourceOwner);
  		registered_serializable = true;
  	}
*************** ActiveSnapshotSet(void)
*** 345,351 ****
  /*
   * RegisterSnapshot
!  *		Register a snapshot as being in use
   *
   * If InvalidSnapshot is passed, it is not registered.
   */
--- 346,352 ----
  /*
   * RegisterSnapshot
!  *		Register a snapshot as being in use by the current resource owner
   *
   * If InvalidSnapshot is passed, it is not registered.
   */
*************** RegisterSnapshot(Snapshot snapshot)
*** 371,376 ****
--- 372,396 ----
  }
  
+ /*
+  * As above, but register it on a specific resource owner
+  */
+ Snapshot
+ RegisterSnapshotOnOwner(Snapshot snapshot, ResourceOwner owner)
+ {
+ 	Snapshot	retval;
+ 	ResourceOwner save_CurrentResourceOwner;
+ 
+ 	save_CurrentResourceOwner = CurrentResourceOwner;
+ 	CurrentResourceOwner = owner;
+ 
+ 	retval = RegisterSnapshot(snapshot);
+ 
+ 	CurrentResourceOwner = save_CurrentResourceOwner;
+ 
+ 	return retval;
+ }
+ 
+ /*
   * UnregisterSnapshot
   *
   * Decrement the reference count of a snapshot, remove the corresponding
*************** UnregisterSnapshot(Snapshot snapshot)
*** 395,400 ****
--- 415,433 ----
  	}
  }
  
+ void
+ UnregisterSnapshotFromOwner(Snapshot snapshot, ResourceOwner owner)
+ {
+ 	ResourceOwner save_CurrentResourceOwner;
+ 
+ 	save_CurrentResourceOwner =
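The save/swap/restore pattern at the heart of RegisterSnapshotOnOwner can be modeled in a few lines of Python (a toy sketch with invented names, not the C code): install the target owner as the current one, delegate to the existing registration routine, and restore the previous owner even if registration fails.

```python
current_owner = "default"

def register_snapshot(snapshot, registry):
    # Stand-in for the existing RegisterSnapshot: files the snapshot
    # under whatever owner is "current" at the moment of the call.
    registry.setdefault(current_owner, []).append(snapshot)
    return snapshot

def register_snapshot_on_owner(snapshot, owner, registry):
    # Same dance as the patch: save the current owner, swap in the
    # requested one, delegate, then restore unconditionally.
    global current_owner
    saved = current_owner
    current_owner = owner
    try:
        return register_snapshot(snapshot, registry)
    finally:
        current_owner = saved

registry = {}
register_snapshot_on_owner("xact-snapshot", "TopTransactionResourceOwner", registry)
```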
Re: [HACKERS] Simple postgresql.conf wizard
I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable.

Why?

Because both of those settings are strictly worse for my database than the defaults. I don't have any partitioned tables, and see: http://archives.postgresql.org/pgsql-hackers/2008-11/msg01837.php

I'm also a bit surprised that there doesn't seem to be anything here that depends on the size of the database, even order-of-magnitude. It seems like the right value for checkpoint_segments, at least, might depend on that.

What does checkpoint_segments have to do with the size of the database?

It seems unlikely that you would want 256 MB of checkpoint segments on a database that is only 100 MB (or even 500 MB). But you might very well want that on a database that is 1 TB.

...Robert

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: On Wed, 2008-12-03 at 13:30 -0500, Robert Haas wrote: I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable.

Why?

Well, did you have any response to what I posited before? I said mixed should produce the same settings that the default initdb settings produce. At least on a moderately low-memory machine that initdb targets. It sure seems strange to me to have initdb, which presumably is targeting a mixed system -- where it doesn't know for sure what workload will be run -- produce a different set of values than the tuner on the same machine.

I'm also a bit surprised that there doesn't seem to be anything here that depends on the size of the database, even order-of-magnitude. It seems like the right value for checkpoint_segments, at least, might depend on that.

What does checkpoint_segments have to do with the size of the database?

I had the same reaction, but I think he's right. checkpoint_segments is the maximum amount of space you want the WAL to take up (ideally). Presumably on a small database you don't want hundreds of megabytes of WAL for a 10M database. But on a terabyte data warehouse sitting on a big SAN, you're not going to be concerned with how much space the WAL files are taking. In fact, really it would be nice if we allowed units of space (MB, GB, etc) for checkpoint_segments.

I used to think of checkpoint_segments in terms of transaction rate and maximum tolerable recovery time, but really if those are your constraints you're better off using checkpoint_timeout, I think.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
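Gregory's wish for a space-denominated checkpoint_segments can be sketched directly: the documentation bounds steady-state WAL at roughly (2 + checkpoint_completion_target) * checkpoint_segments + 1 files of 16MB each, so inverting that formula turns a space budget into a setting. A sketch under those assumptions, not a proposed patch:

```python
SEGMENT_MB = 16  # WAL segment size in default builds

def segments_for_budget(max_wal_mb, completion_target=0.5):
    # Invert: max_files ~= (2 + completion_target) * checkpoint_segments + 1
    files = max_wal_mb / float(SEGMENT_MB)
    return max(1, int((files - 1) / (2 + completion_target)))
```

For example, a 368MB WAL budget at completion target 0.7 maps back to checkpoint_segments = 8.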
Re: [HACKERS] Simple postgresql.conf wizard
Well did you have any response to what I posited before? I said mixed should produce the same settings that the default initdb settings produce. At least on a moderately low-memory machine that initdb targets.

I'm actually really skeptical of this whole idea of modes. The main thing mode does, other than set max_connections, is handle desktop differently than other modes by decreasing shared_buffers and effective_cache_size by 4x and work_mem by 3x. And the default settings for max_connections are a SWAG that could easily be way off for any particular installation. I think it would be more useful to get rid of modes, accept that the user is going to have to specify max_connections if the default of, say, 100 is not reasonable, and handle the desktop case by telling the user to rerun the tool overriding the system memory with a lower value.

I'm not sure if you've thought about this, but there is also a difference between max_connections and maximum LIKELY connections. For example, my apps don't have too many users, since they are internal-facing. But setting max_connections to 100 gives me a nice buffer just in case everyone decides to log on at once. Still, for performance reasons, I'd prefer to calculate based on a more likely scenario, where the concurrent user count might be only 10 or 20.

...Robert

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
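Robert's distinction between max_connections and maximum likely connections matters because per-backend memory is sized by dividing available RAM across a connection count. A hedged sketch of that idea (the fraction and divisor below are invented for illustration, not pgtune's actual numbers):

```python
def work_mem_kb(total_mem_kb, max_connections, likely_fraction=0.2):
    # Size work_mem for the connections you expect to be busy at once,
    # not the worst-case cap, and leave most RAM for caching (the /4).
    busy = max(1, int(max_connections * likely_fraction))
    return total_mem_kb // busy // 4
```

With 1GB of RAM and max_connections=100 but only ~20 connections likely active, this yields roughly 13MB per sort rather than the ~2.6MB you would get by dividing across all 100 slots.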
Re: [HACKERS] Simple postgresql.conf wizard
I can see an argument about constraint_exclusion, but default_statistics_target I don't.

Why not? I don't want to accept a big increase in ANALYZE times (or planning times, though I'm really not seeing that at this point) without some benefit.

It seems unlikely that you would want 256 MB of checkpoint segments on a database that is only 100 MB (or even 500 MB). But you might very well want that on a database that is 1 TB.

It also seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout, and if you did, you certainly did need them. So why do we have this parameter at all?

...Robert

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: It also seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them.

Remember postgresql only creates the segments when it needs them.

Should we change the initdb output then?

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: If you are concerned about the analyze time between 10, 50 and 150, I would suggest that you are concerned about the wrong things. Remember

I can't rule that out. What things do you think I should be concerned about? ISTM that default_statistics_target trades off ANALYZE time and query planning time vs. the possibility of better plans. If the former considerations are not an issue for dst = 50, then maybe we should emit 50 by default. But the limited evidence that has been published in this forum thus far doesn't support that contention.

It also seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them. So why do we have this parameter at all?

Excellent question, for a different thread :)

I think the rhetorical answer is so that we don't fill up the disk, which gets us back to database size.

...Robert

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Visibility map, partial vacuums
Gregory Stark [EMAIL PROTECTED] writes: Heikki Linnakangas [EMAIL PROTECTED] writes: Gregory Stark wrote:

1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just means unnecessary full table vacuums long before they accomplish anything.

It allows you to truncate clog. If I did my math right, 200M transactions amounts to ~50MB of clog. Perhaps we should still raise it, disk space is cheap after all.

Hm, the more I think about it the more this bothers me. It's another subtle change from the current behaviour. Currently *every* vacuum tries to truncate the clog. So you're constantly trimming off a little bit. With the visibility map (assuming you fix it not to do full scans all the time) you can never truncate the clog, just as you can never advance the relfrozenxid, unless you do a special full-table vacuum.

I think in practice most people had a read-only table somewhere in their database which prevented the clog from ever being truncated anyways, so perhaps this isn't such a big deal. But the bottom line is that the anti-wraparound vacuums are going to be a lot more important and much more visible now than they were in the past.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
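Heikki's "~50MB of clog" figure in the message above checks out: clog stores two status bits per transaction, so 200M transactions occupy about 50 million bytes.

```python
CLOG_BITS_PER_XACT = 2  # each transaction's commit status is 2 bits

xacts = 200 * 1000 * 1000
clog_bytes = xacts * CLOG_BITS_PER_XACT // 8
clog_mib = clog_bytes // (1024 * 1024)
# 50,000,000 bytes, i.e. about 47 MiB -- the "~50MB" in the message
```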
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 2008-12-03 at 17:33 -0500, Robert Haas wrote: On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: If you are concerned about the analyze time between 10, 50 and 150, I would suggest that you are concerned about the wrong things. Remember

I can't rule that out. What things do you think I should be concerned about?

Your database's production performance with the change of the parameter. Quite a bit more often than not, your problem (if you have one) isn't going to be that default_statistics_target is too high.

ISTM that default_statistics_target trades off ANALYZE time and query planning time vs. the possibility of better plans. If the former considerations are not an issue for dst = 50, then maybe we should emit 50 by default. But the limited evidence that has been published in this forum thus far doesn't support that contention.

Actually there are years' worth of evidence in these archives. Not that 50 is the right number, but that the current settings are definitely wrong and that higher ones are needed. People generally start around 100 and go from there, except where they don't, and then someone like Tom, I, or some other person says, "Oh, you need to increase default_statistics_target." There is no empirical evidence that 50 is the right setting, but there is more than enough anecdotal evidence to suggest that 50 is a lot better than 10 and that even higher than 50 is reasonable. In an effort to follow the PostgreSQL conservative mantra, 50 is a good compromise.

It also seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them. So why do we have this parameter at all?

Excellent question, for a different thread :) I think the rhetorical answer is so that we don't fill up the disk, which gets us back to database size.

...Robert

I don't think at any time I have said to myself, "I am going to set this parameter low so I don't fill up my disk." If I am saying that to myself, I have greatly underestimated the hardware for the task. Consider that we are quarreling over what amounts to a nominal amount of hard drive space: 1000 checkpoint_segments = 16G of space. My phone has more capacity than that. *If* hard drive space (that much of it) is a concern, then you are having other problems already that pgtune won't satisfy, and you should be manually tuning the conf in the first place.

Joshua D. Drake

-- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Robert Haas [EMAIL PROTECTED] wrote: On Wed, Dec 3, 2008 at 4:41 PM, Joshua D. Drake [EMAIL PROTECTED] wrote: If you are concerned about the analyze time between 10, 50 and 150, I would suggest that you are concerned about the wrong things. Remember

I can't rule that out. What things do you think I should be concerned about? ISTM that default_statistics_target trades off ANALYZE time and query planning time vs. the possibility of better plans. If the former considerations are not an issue for dst = 50, then maybe we should emit 50 by default. But the limited evidence that has been published in this forum thus far doesn't support that contention.

One more data point to try to help. While the jump from a default_statistics_target of 10 to 1000 resulted in a plan time increase for a common query from 50 ms to 310 ms, at a target of 50 the plan time was 53 ms. Analyze time was 7.2 minutes and 18.5 minutes for targets of 10 and 50. This is an 842 GB database on an 8 processor (3.5 GHz Xeon) machine with 64 GB RAM running (soon to be updated) PostgreSQL 8.2.7.

Based on the minimal plan time increase of this test, we're going to try 50 in production and see how it goes. It's worth pondering that at the target of 1000, had we put that into production, running this query 300,000 times per day would have used 21 hours and 40 minutes of additional CPU time per day on planning the runs of this one query, while a target of 50 only consumes an additional 15 minutes of 3.5 GHz CPU time per day.

-Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
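Kevin's CPU-time figures follow directly from the per-plan numbers he gives:

```python
runs_per_day = 300 * 1000          # 300,000 executions of the query per day
extra_ms_at_1000 = 310 - 50        # added planning time per run at target 1000
extra_ms_at_50 = 53 - 50           # added planning time per run at target 50

hours_at_1000 = runs_per_day * extra_ms_at_1000 / 1000.0 / 3600
minutes_at_50 = runs_per_day * extra_ms_at_50 / 1000.0 / 60
# ~21.7 hours/day of extra planning CPU at target 1000,
# 15 minutes/day at target 50
```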
Re: [HACKERS] In-place upgrade: catalog side
Zdenek Kotala wrote: If you compare with pg_migrator, there is better handling of locale, and I think vacuum freeze is used correctly. Also, shuffling with tablespaces is a little bit different (it should avoid moving data outside of a mountpoint). But in principle the idea is the same.

- There are 10 TODO items listed for the pg_migrator project, most or all of which look like they should be squashed before this is really complete. Any chance somebody (Korry?) has an improved version of this floating around beyond what's in the pgfoundry CVS already?

As I mentioned before, pg_migrator and pg_upgrade.sh are not a good way. It is a workaround. It does not make sense to continue in this way.

As the author of the original shell script, which was in /contrib/pg_upgrade, I think the code has grown to the point where it should be reimplemented in something like Perl.

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Gregory Stark wrote: It sure seems strange to me to have initdb which presumably is targeting a mixed system -- where it doesn't know for sure what workload will be run -- produce a different set of values than the tuner on the same machine.

It's been a long time since the output from initdb was targeting anything but a minimal system with an untuned kernel and limited resources. If you invert the normal tuning recommendations, as if its initial configuration were the output from typical practice, it would be aiming at a system with approximately 128MB of RAM. That feels about right to me; when I had 128MB of RAM in my high-end P2-300 server running PG 7.0, 32MB of shared_buffers was huge and 3 checkpoint segments was plenty. I don't recall regularly dirtying things fast enough to see checkpoints occurring too fast then, like you can do trivially nowadays. Here in 2008, I push checkpoint_segments up to 10 even for the most trivial apps, lest the logs fill with those buggers the first time I run an update on a table.

Right now, my program doesn't fiddle with any memory settings if you've got less than 256MB of RAM. Were someone to champion the idea that *nothing* should be fiddled with in those cases, that's not an unreasonable position. I'm not the sort to be too concerned myself that the guy who thinks he's running a DW on a system with 64MB of RAM might get bad settings, but it's a fair criticism to point that out as a problem.

In fact, really it would be nice if we allowed units of space (MB, GB, etc) for checkpoint_segments.

That's a good way to think about this, let's run with that for a minute.
The values I'm throwing in there look like this:

Type      Segs  Completion Target  Max Segments  Max Usage
web       8     0.7                23            368MB
oltp      16    0.9                47            752MB
dw        64    0.9                187           3GB
mixed     16    0.9                47            752MB
desktop   3     0.5                9             144MB

Is 368MB of overhead unreasonable for a web application database today, where you can get a mirrored pair of disks for under $1/GB? It's only the DW case that even starts to leave trivial territory. Your example of somebody who thinks the overhead is too high on their 10MB database is already being blown away even at the default of 3 segments (assuming that data has enough churn on it to go through that many segments ever -- if it doesn't, then the maximum doesn't matter anyway).

The reality here is that it's the recovery playback time that's the real bear. If I were trying to argue against me, what would be more persuasive is some tests showing how long it takes to sort through, clean up, and replay the appropriate portions of as many as 47 segments worth of WAL after an unclean shutdown when checkpoint_segments=16. Given how long that takes, it might be possible to show that a modern system takes a while to process that much WAL volume. It's pretty rare I run into that (usually only after I do something abusive), whereas complaints about the logs filling with checkpoint warnings on systems set to the default seem to pop up all the time.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
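The Max Segments and Max Usage columns in the table above follow from the documented steady-state bound of (2 + checkpoint_completion_target) * checkpoint_segments + 1 WAL files at 16MB apiece; a quick script reproduces them (rounding to the nearest file, which matches the posted numbers):

```python
SEGMENT_MB = 16

def max_wal(checkpoint_segments, completion_target):
    # Documented steady-state bound on the number of WAL files kept around
    files = int((2 + completion_target) * checkpoint_segments + 1 + 0.5)
    return files, files * SEGMENT_MB

for name, (segs, target) in [("web", (8, 0.7)), ("oltp", (16, 0.9)),
                             ("dw", (64, 0.9)), ("mixed", (16, 0.9)),
                             ("desktop", (3, 0.5))]:
    files, mb = max_wal(segs, target)
    print(name, files, mb)
```

The dw row's 2992MB is the "3GB" in the table.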
Re: [HACKERS] Simple postgresql.conf wizard
Kevin Grittner [EMAIL PROTECTED] writes: One more data point to try to help. While the jump from a default_statistics_target of 10 to 1000 resulted in a plan time increase for a common query from 50 ms to 310 ms, at a target of 50 the plan time was 53 ms.

That sounds like it would be an interesting query to analyze in more detail. Is there any chance you could run the complete graph and get a chart of analyze times for all statistics values from 1..1000? And log the explain plans to a file so we can look for at what statistics targets the plan changed? Or, if the data is public, I would be interested in looking at doing it if you want to send it to me.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 2008-12-04 at 00:11 +0000, Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes:

I started to do this for you last week but got side-tracked. Do you have any time for this? I can do it if you have a script.

So how big should a minimum postgres install be, not including your data? Is 100M reasonable? Should we say Postgres requires 200M? 500? 1G? Whatever number we pick (or allow the user to pick) will determine how large this value ought to be. And incidentally also provide a bound on autovacuum_max_freeze_age, as Heikki pointed out on another thread.

I fail to see what any of the above paragraph has to do with checkpoint_segments. Anyway, I have made my arguments.

Joshua D. Drake

-- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Robert Haas wrote: I'm not sure if you've thought about this, but there is also a difference between max_connections and maximum LIKELY connections.

It's actually an implicit assumption of the model Josh threw out, if you stare at the numbers. The settings for work_mem are twice as high per connection in the Web+OLTP application cases, based on the assumption that you're just not going to get everybody doing sorting at once in those situations. I toyed with exposing that as an explicit connection load duty factor, then remembered I was trying to deliver something rather than tweak the parameters forever. It may be a bit too aggressive as written right now in those cases.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Guillaume Smet wrote:

- it would be really nice to make it work with Python 2.4 as RHEL 5 is a Python 2.4 thing and it is a very widespread platform out there,

The 2.5 stuff is only required in order to detect memory on Windows. My primary box is RHEL5 and runs 2.4, it works fine there.

- considering the audience of this tool, I think you should explain in the usage text which type of workload implies each database type (DW, OLTP, Web, Mixed, Desktop).

Once I'm done with the docs I'll refer over to those, it's too much to put into the usage without cluttering it.

- it would be nice to be able to define the architecture (32-64 bits) from the command line (especially considering I won't be able to run it on our target boxes which are all RHEL 5 :))

I'm starting to lean toward making everything that gets detected also possible to override, for these cases. I want to make this just work in as many cases as possible, but the situation where someone is configuring/testing on a system other than the server is pretty common. Now that I think about it, I often set up configs on my 32-bit laptop and then move them over onto 64-bit servers.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: On Wed, 3 Dec 2008, Gregory Stark wrote: It sure seems strange to me to have initdb which presumably is targeting a mixed system -- where it doesn't know for sure what workload will be run -- produce a different set of values than the tuner on the same machine. It's been a long time since the output from initdb was targeting anything but a minimal system with an untuned kernel and limited resources. If you invert the normal tuning recommendations, as if its initial configuration were the output from typical practice, it would be aiming at a system with approximately 128MB of RAM. That feels about right to me; when I had 128MB of RAM in my high-end P2-300 server running PG 7.0, 32MB of shared_buffers was huge and 3 checkpoint segments was plenty. I don't recall regularly dirtying things fast enough to see checkpoints occurring too fast then like you can do trivially nowadays. Well I think there's also an assumption in initdb that Postgres can't assume it's on a dedicated machine. So whether it's 32MB on a dedicated 128MB machine or 32MB on a 256MB machine where it's only expected to be half the workload of the machine it works out to about the same thing. Right now, my program doesn't fiddle with any memory settings if you've got less than 256MB of RAM. What I'm suggesting is that you shouldn't have to special case this. That you should expect whatever formulas you're using to produce the same values as initdb if they were run on the same machine initdb is targeting. But actually I'm more concerned with the *non* memory related parameters. It may make sense to tweak those one way or the other for oltp or dss but mixed should be exactly what initdb produces since that's exactly what it's targeting -- a system that will have a wide mixture of queries and must function reasonably well for both data warehouse and oltp queries.
Type     Segs   Completion Target   Max Segments   Max Usage
web         8   0.7                  23            368MB
oltp       16   0.9                  47            752MB
dw         64   0.9                 187            3GB
mixed      16   0.9                  47            752MB
desktop     3   0.5                   9            144MB

(incidentally using tab stops in emails is probably a bad idea because of quoting as above) Uhm, I hadn't actually seen this list before. I don't understand how web is different from oltp. A web service really is just one (very typical) example of an oltp application. And desktop seems like an outlier here. I suppose it's meant to capture whether postgres is on a dedicated box? But it's possible to have a non-dedicated oltp application or non-dedicated data warehouse box just as easily. It's an orthogonal issue from the oltp/data-warehouse axis. Is 368MB of overhead unreasonable for a web application database today? Well I think it's more than most people expect a single application install to take up before they start putting data in it. It would probably work better if we were asking how big their database was and then could say, well, you said you had 10G of data so 300MB of overhead isn't going to be so bad. The reality here is that it's the recovery playback time that's the real bear. I agree, but then that's what checkpoint_timeout is for, no? It might take longer to replay but the recovery time should bear some relation to how long it took to write out the wal. More so than to the sheer size of the wal. whereas complaints about the logs filling with checkpoint warnings on systems set to the default seem to pop up all the time. filling? The cure to having too much space taken up by logs is to take up space with, well, logs? The logs are filling up with warnings which explain exactly what parameter to adjust. Are there really complaints about this? I'm really beginning to think the root of the problem is the name. If it were transaction_log_max_space and measured in megabytes people would be happy to say ok, I'll make space for 100MB of logs or whatever.
Today they don't know what to set it to or what the impact of setting it will be. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
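The Max Segments and Max Usage columns in the table above follow the documented worst-case bound of (2 + checkpoint_completion_target) * checkpoint_segments + 1 segment files of 16MB each. A quick sketch that reproduces the table, assuming values round to the nearest whole segment:

```python
def max_wal_segments(checkpoint_segments, completion_target):
    """Documented upper bound on the number of WAL segment files:
    (2 + checkpoint_completion_target) * checkpoint_segments + 1,
    rounded to the nearest whole segment (an assumption to match the table)."""
    return int((2 + completion_target) * checkpoint_segments + 1 + 0.5)

def max_wal_mb(checkpoint_segments, completion_target, segment_mb=16):
    """Worst-case WAL disk usage, with the stock 16MB segment size."""
    return max_wal_segments(checkpoint_segments, completion_target) * segment_mb

# Reproduces the table: web (8, 0.7) -> 23 segments / 368MB,
# oltp (16, 0.9) -> 47 / 752MB, dw (64, 0.9) -> 187 / ~3GB,
# desktop (3, 0.5) -> 9 / 144MB.
```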
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Robert Haas wrote: Then I tried -T web and got what seemed like a more reasonable set of values. But I wasn't sure I needed that many connections, so I added -c 150 to see how much difference that made. Kaboom! That and the import errors are fixed in the version attached (just replacing the script, not its data); thanks for testing that out. I'm not sure what mixed mode is supposed to be, but based on what I've seen so far, I'm skeptical of the idea that encouraging people to raise default_statistics_target to 50 and turn on constraint_exclusion is reasonable. The statistics stuff is obviously a broader discussion; I'll let that rage in existing threads. The reason for setting constraint_exclusion in the mixed case is that people who just pick the defaults without reading anything will get a configuration that supports partitions usefully. One of the HINTs I intend to throw out for that specific case is that they should turn it off if they don't ever intend to use partitions. The idea of the mixed mode is that you want to reduce the odds someone will get a massively wrong configuration if they're not paying attention. Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover additional partitions didn't work as you expected? That's a tough call; I could invert things, so that it defaults to off in mixed mode, as always, and just produces a HINT to turn it on. I don't have a really strong opinion there either way. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD pgtune.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover additional partitions didn't work as you expected? Surely that's the same question we faced when deciding what the Postgres default should be? That and the unstated other question: "Is someone more likely to use partitions without reading the manual, or not use partitions without reading the manual about the down-sides of constraint_exclusion (in the partitioning section)?" -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: I started to do this for you last week but got side-tracked. Do you have any time for this? I can do it if you have a script. Well, I can send you what I have so far but it still needs more work. I only got as far as the graphs I sent earlier which don't include scanning for changed plans. Also, if you have any sample databases with skewed data sets that would be interesting. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Gregory Stark escribió: Joshua D. Drake [EMAIL PROTECTED] writes: I don't think at any time I have said to myself, I am going to set this parameter low so I don't fill up my disk. If I am saying that to myself I have either greatly underestimated the hardware for the task. Consider that we are quarreling over what amounts to a nominal amount of hard drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more capacity than that. Well my phone has 16G of RAM, why not 1? I don't think the disk space used is the only consideration here. You also have to keep recovery time in mind. If you set it to 1000, recovery would take way too long. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Joshua D. Drake [EMAIL PROTECTED] writes: On Thu, 2008-12-04 at 00:11 +, Gregory Stark wrote: Joshua D. Drake [EMAIL PROTECTED] writes: I started to do this for you last week but got side-tracked. Do you have any time for this? I can do it if you have a script. So how big should a minimum postgres install be, not including your data? Is 100M reasonable? Should we say Postgres requires 200M? 500? 1G? Whatever number we pick (or allow the user to pick) will determine how large this value ought to be. And incidentally also provide a bound on autovacuum_max_freeze_age as Heikki pointed out on another thread. I fail to see what any of the above paragraph has to do with checkpoint_segments. Are we all on the same page on what checkpoint_segments does? It's the number of segments of WAL log postgres will allow to accumulate before it triggers a checkpoint and trims off any it doesn't need. That means even if your database is just churning, updating the same records over and over, the WAL will grow to this size before Postgres makes any attempt to trim it (unless it hits checkpoint_timeout but that's a separate tunable). If you're loading data all your data will go into the heap *and* the wal log until it hits this size and triggers a checkpoint. So this is the minimum amount of extra space you need in addition to your data for a functioning postgres database install not including your data. If you don't anticipate postgres using this much space and don't set aside enough space for it, your database is at risk of randomly stopping and producing errors when it can't create the new log files it needs. It's interesting that this is the *only* parameter we can come up with that really directly depends on disk space. The tables are obviously going to be as big as they have to be and there's not much to do about that. If we could eliminate this parameter it would be a lot nicer.
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
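Gregory's description of checkpoint_segments as the amount of WAL that accumulates before a checkpoint trims it can be illustrated with a toy model. This is purely a sketch: the one-MB-of-WAL-per-MB-of-heap assumption and all names are hypothetical, and real checkpoints are also triggered by checkpoint_timeout:

```python
def simulate_wal_growth(data_mb, checkpoint_segments, segment_mb=16):
    """Toy model: every MB written to the heap also generates ~1MB of WAL;
    a checkpoint fires (and segments are recycled) once checkpoint_segments
    worth of WAL has filled up.  Returns (checkpoints, peak_wal_mb)."""
    checkpoints = 0
    wal_mb = 0
    peak = 0
    for _ in range(data_mb):
        wal_mb += 1
        if wal_mb >= checkpoint_segments * segment_mb:
            checkpoints += 1
            peak = max(peak, wal_mb)
            wal_mb = 0          # recycled after the checkpoint
    return checkpoints, max(peak, wal_mb)

# Loading 1GB with the default checkpoint_segments=3 forces a checkpoint
# every 48MB -- the "checkpoints occurring too frequently" warning case.
```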
Re: [HACKERS] Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
Magnus Hagander wrote: Hiroshi Inoue wrote: I think the thing is that as long as the encodings are compatible (latin1 with different names for example) it worked fine. In any case I think the problem is that gettext is looking at a setting that is not what we are looking at. Particularly with the 8.4 changes to allow per-database locale settings, this has got to be fixed in a bulletproof way. Attached is a new patch to apply bind_textdomain_codeset() to most server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL and PG_EUC_JIS_2004. EUC-JP may be OK for EUC_JIS_2004. Unfortunately it's hard for Saito-san and me to check encodings other than EUC-JP. In principle this looks good, I think, but I'm a bit worried about the lack of testing. Thanks and I agree with you. I can do some testing under LATIN1 which is what we use in Sweden (just need to get gettext working *at all* in my dev environment again - I've somehow managed to break it), and perhaps we can find someone to do a test in an eastern-european locale to get some more datapoints? Can you outline the steps one needs to go through to show the problem, so we can confirm it's fixed in these locales? Saito-san and I have been working on another related problem about changing LC_MESSAGES locale properly under Windows and would be able to provide a patch in a few days. It seems preferable to apply that patch as well, so that the message catalog can be changed easily. Attached is an example in which LC_MESSAGES is cht_twn(zh_TW) and the server encoding is EUC-TW. You can see it as a UTF-8 text because the client_encoding is set to UTF-8 first. BTW you can see another problem at line 4 in the text. At that point the LC_MESSAGES is still Japanese and postgres fails to convert a Japanese error message to EUC_TW encoding. That's no surprise, but it doesn't seem desirable.
regards, Hiroshi Inoue

set client_encoding to utf_8;
SET
1;
psql:cmd/euctw.sql:2: ERROR: character 0xb9e6 of encoding EUC_TW has no equivalent in UTF8
select current_database();
 current_database
------------------
 euctw
(1 s)
show server_encoding;
 server_encoding
-----------------
 EUC_TW
(1 s)
show lc_messages;
    lc_messages
--------------------
 Japanese_Japan.932
(1 s)
set lc_messages to cht;
SET
select a;
psql:cmd/euctw.sql:7: 錯誤: 欄位a不存在
LINE 1: select a;
               ^
1;
psql:cmd/euctw.sql:8: 錯誤: 在語法錯誤附近發生 1
LINE 1: 1;
        ^
select * from a;
psql:cmd/euctw.sql:9: 錯誤: relation a不存在
LINE 1: select * from a;
                      ^

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 2008-12-03 at 22:17 -0300, Alvaro Herrera wrote: Gregory Stark escribió: Joshua D. Drake [EMAIL PROTECTED] writes: I don't think at any time I have said to my self, I am going to set this parameter low so I don't fill up my disk. If I am saying that to myself I have either greatly underestimated the hardware for the task. Consider that we are quarreling over what amounts to a nominal amount of hard drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more capacity than that. Well my phone has 16G of RAM, why not 1 ? I don't think the disk space used is the only consideration here. You also have to keep recovery time in mind. If you set it to 1000, recovery would take way too long. Well certainly but the original argument that came back was, (from Robert Haas): It seems unlikely that you would want 256 MB of checkpoint segments on a database that is only 100 MB (or even 500 MB). But you might very well want that on a database that is 1 TB. My whole point is that: 1. It seems unlikely that you would hit 256MB of checkpoint segments on a 100MB database before checkpoint_timeout and if you did, you certainly did need them. (the checkpoint segments) 2. taking up space is such a minute concern in comparison to the potential benefit. Recovery is certainly a consideration but let's be realistic it is the last consideration because it is the least likely to happen. What is more likely to happen is IO spikes because we are recycling logs too much. I know we have some other facilities to deal with that now too but it doesn't completely negate the problem and in my opinion, increasing the checkpoint_segments provides no perceivable downside in production use but does provide significant perceivable upside. Joshua D. 
Drake -- PostgreSQL Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Gregory Stark wrote: Right now, my program doesn't fiddle with any memory settings if you've got less than 256MB of RAM. What I'm suggesting is that you shouldn't have to special case this. That you should expect whatever formulas you're using to produce the same values as initdb if they were run on the same machine initdb is targeting. The reason that approach isn't taken is that the model here assumes the OS overhead is negligible relative to everything else going on. If you've only got a small amount of RAM, that assumption is so badly broken that you can't just extend the curves for everything down to there and expect that what comes out will make any sense. I started to make a more complicated bit that did scale down to the bottom by modeling the overhead better, but Josh talked me out of doing it for now. But actually I'm more concerned with the *non* memory related parameters. It may make sense to tweak those one way or the other for oltp or dss but mixed should be exactly what initdb produces since that's exactly what it's targeting -- a system that will have a wide mixture of queries and must function reasonably well for both data warehouse and oltp queries. The only way this line of discussion will go is toward talking about what should be changed in initdb to make it more representative of the current real world, and I know that's not going anywhere (see default_statistics_target=10). The idea that the sample configuration is tuned usefully for any application whatsoever gets nothing from me but a chuckle. And desktop seems like an outlier here. I suppose it's meant to capture whether postgres is on a dedicated box? But it's possible to have a non-dedicated oltp application or non-dedicated data warehouse box just as easily.
That's the target for something that's not a dedicated server--a desktop PC you use as a general workstation, maybe you're installing PostgreSQL as a developer that's competing with your web server and other apps; something like that. There might be a better name for that. Is 368MB of overhead unreasonable for a web application database today Well I think it's more than most people expect a single application install to take up before they start putting data in it. Segments don't get allocated until you churn through that much WAL activity; that figure is an upper-bound after you've pushed more than that worth of data through WAL and into the database. The only example where this overhead isn't dwarfed by the size of the resulting database is where some small number of records are inserted, then constantly updated and vacuumed. And you know what? The person doing that is likely to really benefit from having checkpoint_segments set to a larger value. Update and vacuum heavy workloads are exactly the sort where you end up checkpointing too often with the default parameters. I'm really beginning to think the root of the problem is the name. If it were transaction_log_max_space and measured in megabytes people would be happy to say ok, I'll make space for 100MB of logs or whatever. Today they don't know what to set it to or what the impact of setting it will be. Unless they do something crazy like read the documentation: http://www.postgresql.org/docs/8.3/static/wal-configuration.html There will always be at least one WAL segment file, and will normally not be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 files. Each segment file is normally 16 MB (though this size can be altered when building the server). You can use this to estimate space requirements for WAL. Too complicated for most people you say? 
I agree; that's why I put some annotated examples for what those translate into http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server like 32 (checkpoint every 512MB). What fun. I'm beginning to remember why nobody has ever managed to deliver a community tool that helps with this configuration task before. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
The idea of the mixed mode is that you want to reduce the odds someone will get a massively wrong configuration if they're not paying attention. Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover additional partitions didn't work as you expected? That's a tough call; I could invert things, so that it defaults to off in mixed mode, as always, and just produces a HINT to turn it on. I don't have a really strong opinion there either way. I think that the strong feelings about default_statistics_target and constraint_exclusion come from the fact that when they are too low (in the first case) or off (in the second case), you can get very, very bad query plans. The penalties in the opposite direction are more subtle. If they're so subtle that we don't care about incurring them, then let's change initdb too. If not, then let's not have the tuning tool generate them by default either. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Agreed. I borrowed WAL parsing code from XLogdump and I think WAL parsing should be another candidate. 2008/12/3 Fujii Masao [EMAIL PROTECTED]: Hi, On Thu, Nov 27, 2008 at 9:04 PM, Koichi Suzuki [EMAIL PROTECTED] wrote: Please find enclosed a revised version of pg_readahead and a patch to invoke pg_readahead. Some similar functions are in xlog.c and pg_readahead.c (for example, RecordIsValid). I think that we should unify them as a common function, which helps to develop the tool (for example, xlogdump) treating WAL in the future. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: pgsql: Properly unregister OpenSSL callbacks when libpq is done with
Magnus Hagander wrote: Log Message: --- Properly unregister OpenSSL callbacks when libpq is done with its connection. This is required for applications that unload the libpq library (such as PHP) in which case we'd otherwise have pointers to these functions when they no longer exist. Breaks the build with --enable-thread-safety and --with-openssl because of this typo. Kris Jurka

*** a/src/interfaces/libpq/fe-secure.c
--- b/src/interfaces/libpq/fe-secure.c
***************
*** 918,925 **** destroy_ssl_system(void)
  	 * This means we leak a little memory on repeated load/unload
  	 * of the library.
  	 */
! 	free(pqlockarray);
! 	pqlockarray = NULL;
  }
}
--- 918,925 ----
  	 * This means we leak a little memory on repeated load/unload
  	 * of the library.
  	 */
! 	free(pq_lockarray);
! 	pq_lockarray = NULL;
  }
}

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pgsql: Properly unregister OpenSSL callbacks when libpq is done with
Kris Jurka wrote: Magnus Hagander wrote: Log Message: --- Properly unregister OpenSSL callbacks when libpq is done with its connection. This is required for applications that unload the libpq library (such as PHP) in which case we'd otherwise have pointers to these functions when they no longer exist. Breaks the build with --enable-thread-safety and --with-openssl because of this typo. Thanks, applied. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
What fun. I'm beginning to remember why nobody has ever managed to deliver a community tool that helps with this configuration task before. I have to say I really like this tool. It may not be perfect but it's a lot easier than trying to do this analysis from scratch. And we are really only arguing about a handful of settings. It wouldn't take a lot to convince me that checkpoint_segments=3 is too low. I easily blew through that testing the bulk-insert tuning patch. I'm curious why wal_buffers is being set to 512 * checkpoint_segments. Are they related? The default value for wal_buffers is only 64 kB, which means someone thought you shouldn't need much space for this at all, but this suggests a setting in the 4-32 MB range, an increase of ~2 orders of magnitude. For all I know that could be right but it's a big increase. Regarding the religious war now in progress, I think it would be awfully good for someone to offer some thoughts on how to figure out which particular columns on which particular tables need a higher statistics target. That might allow us to either (a) build a wizard that helps you find those problems that could perhaps be used alongside this one or (b) incorporate those same smarts into core. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
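For what it's worth, Robert's 4-32 MB range is consistent with reading the 512 * checkpoint_segments factor as kilobytes; that unit interpretation is an assumption here, not something confirmed from the script:

```python
def wal_buffers_kb(checkpoint_segments):
    """If the tool sets wal_buffers to 512 * checkpoint_segments in kB
    -- an interpretation assumed here, not confirmed -- the workload
    models span the 4-32MB range Robert mentions."""
    return 512 * checkpoint_segments

# web (8 segments)  -> 4096 kB  =  4 MB
# dw  (64 segments) -> 32768 kB = 32 MB
```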
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)
KaiGai Kohei wrote: I updated the patch set of SE-PostgreSQL (revision 1268). [1/6] http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1268.patch [2/6] http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1268.patch [3/6] http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1268.patch [4/6] http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1268.patch [5/6] http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1268.patch [6/6] http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1268.patch Draft of the SE-PostgreSQL documentation is here: http://wiki.postgresql.org/wiki/SEPostgreSQL List of updates: - The patches are rebased to the CVS HEAD. - RelOptions related hooks are cleaned up. - The Row-level ACL feature is chosen in default. - rowacl_table_default() is added to show the default ACL of the table. - The initial revision of regression test for Row-level ACL is added. If you have anything to comment for the patches, could you disclose it? It is not necessary to be a comprehensive one. Don't hesitate to submit. I looked over the patch and was wondering why you chose to have a configure option to disable row-level ACLs. I assume that could just be always enabled. In fact, perhaps that is the first part of the patch that should be applied because it doesn't rely on SE-Linux. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: On Thu, 4 Dec 2008, Gregory Stark wrote: What I'm suggesting is that you shouldn't have to special case this. That you should expect whatever formulas you're using to produce the same values as initdb if they were run on the same machine initdb is targeting. The reason that approach isn't taken is that the model here assumes the OS overhead is negligible relative to everything else going on. ok that does make sense. But the non-memory parameters... ... I've cut part of my response for a separate thread ... And desktop seems like an outlier here. I suppose it's meant to capture whether postgres is on a dedicated box? But it's possible to have a non-dedicated oltp application or non-dedicated data warehouse box just as easily. That's the target for something that's not a dedicated server--a desktop PC you use as a general workstation, maybe you're installing PostgreSQL as a developer that's competing with your web server and other apps; something like that. There might be a better name for that. My point was more that you could have a data warehouse on a non-dedicated machine, you could have a web server on a non-dedicated machine, or you could have a mixed server on a non-dedicated machine. I don't see how you would decide whether to set enable_constraint_exclusion for desktop for example. Is 368MB of overhead unreasonable for a web application database today? Well I think it's more than most people expect a single application install to take up before they start putting data in it. Segments don't get allocated until you churn through that much WAL activity; that figure is an upper-bound after you've pushed more than that worth of data through WAL and into the database. The only example where this overhead isn't dwarfed by the size of the resulting database Right, well, no, it won't be dwarfed -- it'll be about the same size. I.e., if you load 100MB into the database there'll be about 100MB of logs generated.
Up to the point where you hit this maximum upper bound. But yes, right that it's the upper bound for the extra space allocated in addition to the size of the database. And how much extra space should we allocate? I don't see why this extra space bound should depend on the type of OLTP vs. DSS workload. Only on how much disk space is available that the admin is willing to dedicate to Postgres. Assuming an admin of a 1TB server is willing to dedicate 1GB to logs and the admin of a 1GB server would be annoyed to have to throw in more than a few hundred megs seems as reasonable a place as any to start. is where some small number of records are inserted, then constantly updated and vacuumed. And you know what? The person doing that is likely to really benefit from having checkpoint_segments set to a larger value. Update and vacuum heavy workloads are exactly the sort where you end up checkpointing too often with the default parameters. Well there are a few problems with this. a) we have HOT now so you don't need any vacuums to be part of the picture. b) if you're updating the same pages over and over again a checkpoint will be super-quick since there will only be a few pages to write out so no you don't really need some large checkpoint_segments for any performance reason. Josh's logic is impeccable -- for the specific use case he's describing of a truly dedicated server with enough disk space for a major production database. But not every install is going to have gigabytes of space reserved for it and not every admin is going to realize that he really should set aside gigabytes of space even though he only expects his database to be a few megabytes. I'm really beginning to think the root of the problem is the name. If it were transaction_log_max_space and measured in megabytes people would be happy to say ok, I'll make space for 100MB of logs or whatever. Today they don't know what to set it to or what the impact of setting it will be.
Unless they do something crazy like read the documentation: Well we know nobody does that :/ It's great that Postgres has such great documentation but whenever we have the chance to replace something with an option which doesn't need any documentation that would be even better. I'm just exploring whether that's an option here. What fun. I'm beginning to remember why nobody has ever managed to deliver a community tool that helps with this configuration task before. Well I don't think this is why. Nobody's even tried to do this side of things before. They always got bogged down in trying to parse config files and such. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
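Gregory's renamed-setting idea amounts to inverting the documented space bound: given a transaction_log_max_space budget, solve for the largest checkpoint_segments that fits. A hypothetical sketch (the function name and rounding are assumptions):

```python
def segments_for_space(max_space_mb, completion_target=0.5, segment_mb=16):
    """Invert the documented bound of (2 + target) * segments + 1 files of
    segment_mb each, to get the largest checkpoint_segments that fits the
    given space budget.  Hypothetical sketch of the renamed-setting idea."""
    max_files = max_space_mb / segment_mb
    segs = int((max_files - 1) / (2 + completion_target))
    return max(1, segs)   # checkpoint_segments must be at least 1

# A 100MB budget with the default completion target allows only 2 segments;
# a 1GB budget allows 25.
```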
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Gregory Stark wrote: Greg Smith [EMAIL PROTECTED] writes: Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover additional partitions didn't work as you expected? Surely that's the same question we faced when deciding what the Postgres default should be? Gosh, you're right. I'm really new here, and I just didn't understand how things work. I should have known that there was lots of thorough research into that setting before the default was set. (hangs head in shame) Wait, what list am I on? pgsql-hackers? Oh, crap, that can't be right at all then. This one is actually an interesting example of how this stuff ends up ossified without being revisited, I'm glad you brought it up. First we have to visit the 8.1 and 8.2 documentation. There we find the real reason it originally defaulted to off: http://www.postgresql.org/docs/8.1/static/runtime-config-query.html Currently, constraint_exclusion is disabled by default because it risks incorrect results if query plans are cached: if a table constraint is changed or dropped, the previously generated plan might now be wrong, and there is no built-in mechanism to force re-planning. It stayed off for that reason for years. Then the plan invalidation stuff went into 8.3 that made this no longer true. Bruce even removed the item from the TODO list that used to say that constraint_exclusion should be improved to allow it to be used for all statements with little performance impact. Then a couple of months later, when the 8.3 docs were being worked on, Tom updated the text to remove the obsolete warning about the plan risks: http://archives.postgresql.org/pgsql-committers/2007-03/msg00372.php Leaving only the leftovers of the original caveat about how it can also cause some overhead as the reason for why it was still off--a concern which was certainly more serious when that text was written in 2005 than it is today for multiple reasons.
How much was that overhead lowered by the work done in 8.3? I can't find any public information suggesting that was ever even discussed. The only thing I found when poking around looking for it is that Tom had expressed some concerns that the proof overhead was still too large back in 2006: http://archives.postgresql.org/pgsql-committers/2006-02/msg00035.php But you know what? The cached proof comparison bit Tom committed a couple of weeks ago shifted the mechanics of the overhead for this specific case around, so even if we did have 8.3 results they'd need to get re-run at this point anyway. See below for more on what might be different soon. So, if you want to say that turning on constraint_exclusion by default is a horrible idea because it adds significant overhead, and you have any sort of evidence that will still be true for 8.4 on the kind of hardware 8.4 is likely to run on, I would greatly appreciate that information. But presuming that serious thought must have gone into every decision made about the defaults for all the performance-related parameters in postgresql.conf is something we all know just ain't so. What I see is a parameter that doesn't add enough overhead relative to query execution time on today's systems that I've noticed whether it was on or off, one that's set to off only by historical accident combined with basic conservatism (mainly from Tom as far as I can tell; he's a nice reliable source for that). Whereas if it's accidentally set wrong, it can lead to massively wrong plans. I'm not sure what the right move here is, but the appeal-to-authority approach for defending the default here isn't going to work on me. 
That and the unstated other question: "Is someone more likely to use partitions without reading the manual, or to have read about the down-sides of constraint_exclusion (in the partitioning section) first?" Have you started thinking about the implications of http://archives.postgresql.org/message-id/[EMAIL PROTECTED] yet? We are approaching a bold new world of people who partition with less time stuck in the manual first, and I was very much thinking about that when mulling over whether I agreed with Josh's suggestion to put that into the default mixed settings before I went with it (that's right--I wrote all the above and it wasn't even my idea originally). If that doesn't make it into 8.4 I will yield to your statement of the boring, manual-reading status quo still being on target. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1268)
Bruce Momjian wrote: KaiGai Kohei wrote: I updated the patch set of SE-PostgreSQL (revision 1268). [1/6] http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1268.patch [2/6] http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1268.patch [3/6] http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1268.patch [4/6] http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1268.patch [5/6] http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1268.patch [6/6] http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1268.patch Draft of the SE-PostgreSQL documentation is here: http://wiki.postgresql.org/wiki/SEPostgreSQL List of updates: - The patches are rebased to CVS HEAD. - RelOptions-related hooks are cleaned up. - The Row-level ACL feature is enabled by default. - rowacl_table_default() is added to show the default ACL of the table. - The initial revision of the regression test for Row-level ACL is added. If you have any comments on the patches, please share them; they don't need to be comprehensive. Don't hesitate to submit. I looked over the patch and was wondering why you chose to have a configure option to disable row-level ACLs. There is no explicit reason. I thought it was natural, just as the Linux kernel can be built without any of its enhanced security features (SELinux, SMACK and so on). I don't oppose eliminating the --disable-row-acl option; however, it is not clear to me whether it should be an unavoidable selection in the future or not. I assume that could just be always enabled. It is not always enabled. When we build it with the SE-PostgreSQL feature, the rest of the enhanced security features (including the row-level ACL) are disabled automatically, as we discussed before. 
Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: On Thu, 4 Dec 2008, Gregory Stark wrote: Greg Smith [EMAIL PROTECTED] writes: Is it worse to suffer from additional query overhead if you're sloppy with the tuning tool, or to discover additional partitions didn't work as you expected? Surely that's the same question we faced when deciding what the Postgres default should be? Gosh, you're right. I'm really new here, and I just didn't understand how things work. I should have known that there was lots of thorough research into that setting before the default was set. (hangs head in shame) Oh no, in this case I meant just that if we want to change it we should change it in *both* places. That the argument you're making applies just as much to the Postgres default as it does to the mixed workload default in the tuner. But I admit I totally didn't remember that the main reason it was originally off was the lack of plan invalidation. That does rather change things. Perhaps we should be enabling it now. If we do though, it shouldn't default one way and then get randomly flipped by a tool that has the same information to make its decision on. What I'm saying is that mixed is the same information that initdb had about the workload. If we do change this then I wonder if we need the parameter at all. I mean, we don't generally have parameters to turn off random parts of the optimizer... How much was that overhead lowered by the work done in 8.3? I can't find any public information suggesting that was ever even discussed. Well it does have to compare every constraint with every clause and do a moderately complex analysis. It's never going to be super-fast for complex queries. But on the other hand it should drop out pretty fast if the tables haven't got any constraints, so it does seem like it's only hurting people when they would want it on anyway. 
What I see is a parameter that doesn't add enough overhead relative to query execution time on today's systems that I've noticed whether it was on or off There's a danger in this. There's some famous, probably apocryphal, example of a fast food restaurant that taste tested their menu and got great results. Then they cheapened an ingredient and their testers couldn't taste the difference. Then they did that with another ingredient and another and so on and each time the testers couldn't taste a difference. And in the end they ended up rolling out a dramatically inferior menu which people panned compared to the original... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Re: [HACKERS] Simple postgresql.conf wizard
On Mon, Dec 1, 2008 at 9:32 PM, Greg Smith [EMAIL PROTECTED] wrote: On Mon, 1 Dec 2008, Mark Wong wrote: So then I attempted to see if there might have been a difference between the execution time of each individual query with the above parameters. The queries that don't seem to be affected are Q1, Q4, Q12, Q13, and Q15. Q17 suggests that anything higher than default_statistics_target=10 is an improvement. The rest of the queries appear not to follow any particular trend with respect to default_statistics_target. The interesting ones are Q2, Q9, Q17, Q18, and Q20, and that data is much more useful than the summary. As you mention, Q17 improves significantly with a higher target. All of the rest are dramatically slower in one or both tests going from default_statistics_target=10 to 100. Those look like the most useful data points on the X axis--the increases from 100 up to 1000 aren't particularly interesting in most of these, except in Q20 where the Power Test seems to oscillate between degrees of good and bad behavior seemingly at random. My picks for the most useful graphs from the long list Mark sent: http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png The test I think you could consider next is to graph the target going from 10 to 100 in steps of 10, just for those 5 queries. If it gradually degrades, that's interesting but hard to nail down. But if there's a sharp transition, getting an explain plan for the two sides of that should provide some insight. I'm really more interested in the ones that slowed down than the one that improved, since understanding that might finally provide some evidence against increasing it by default. 
I've updated the charts to include results from setting default_statistics_target from 20-90. The links to the charts are the same. The links to the raw data are in http://207.173.203.223/~markwkm/community6/dbt3/ in directories 45-52. Regards, Mark
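For anyone wanting to reproduce this kind of comparison, the per-run mechanics look roughly like the following; this is a generic sketch, not Mark's actual DBT-3 harness, and `lineitem`/`l_shipdate` stand in for whichever table and column the query under test actually hits:

```sql
-- The new target only takes effect after re-analyzing the table.
SET default_statistics_target = 10;
ANALYZE lineitem;
EXPLAIN ANALYZE SELECT ...;  -- the query under test

SET default_statistics_target = 100;
ANALYZE lineitem;
EXPLAIN ANALYZE SELECT ...;

-- A single column can also be targeted without touching the global default:
ALTER TABLE lineitem ALTER COLUMN l_shipdate SET STATISTICS 100;
ANALYZE lineitem;
```

Comparing the two EXPLAIN ANALYZE outputs on either side of a sharp transition is exactly the diagnostic Greg asks for above.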
Re: [HACKERS] Simple postgresql.conf wizard
If we do though, it shouldn't default one way and then get randomly flipped by a tool that has the same information to make its decision on. What I'm saying is that mixed is the same information that initdb had about the workload. +1. If we do change this then I wonder if we need the parameter at all. I mean, we don't generally have parameters to turn off random parts of the optimizer... It probably isn't a good idea to both change the default setting and remove the parameter in the same release. It would be awesome if this is cheap enough now to have it on by default - but constraints are pretty useful for maintaining data integrity, so it's conceivable to me that someone could have a lot of constraints, most of which are useless for query planning. ...Robert
Re: [HACKERS] Simple postgresql.conf wizard
On Thu, 4 Dec 2008, Gregory Stark wrote: My point was more that you could have a data warehouse on a non-dedicated machine, you could have a web server on a non-dedicated machine, or you could have a mixed server on a non-dedicated machine. I should just finish the documentation, where there will be a big disclaimer saying THESE SETTINGS ASSUME A SERVER DEDICATED TO POSTGRESQL! That's the context here. Why, after you follow my tuning instructions, you're lucky if the server will run anything but the database afterwards. Josh's logic is impeccable -- for the specific use case he's describing of a truly dedicated server with enough disk space for a major production database. But not every install is going to have gigabytes of space reserved for it and not every admin is going to realize that he really should set aside gigabytes of space even though he only expects his database to be a few megabytes. It's really quite simple. Josh and I don't care directly about disk space used by the WAL for people with trivial databases. At all. Whatsoever. Maybe once, long ago, when we were young and frugal and skinny[1]; not now, or probably ever again in the future. If that's your concern, maybe there can be some companion utility named pgmiser that lowers parameters back down again. Your mascot can be some sort of animal that efficiently lives off small scraps of food or something.[2] The context here is pgtune, which is aiming to make a fat elephant of a server faster so that there's an answer to people who say My benchmarks are all running really slow, is this because my system with 16PT of RAM is only using 32MB of it for the database? This sucks, I'm going back to Oracle which used all my RAM. If there are people who instead think, hey, I'll run this tuning utility to make my database faster, then it will also be a lot smaller!, maybe we can find a class about space/time tradeoffs in algorithm design to send them to or something.[3] There are exactly two important things here. 
The first is how large checkpoint_segments needs to be in order for the considerable overhead of checkpoints to be bearable. That drives the setting up. Our super-fat DW application gets set to at least 64 so that when you bulk-load another TB of data into it, that doesn't get bottlenecked dumping gigabytes of dirty buffers every few seconds. If the database crashes and recovery reads or writes a bunch of data, who cares about random writes because your SAN has a 4GB write cache on it and dozens of drives slaving away. Driving the setting down is knowing how much time you'll have to wait for recovery to happen, which is really a measure of what your tolerance for downtime is. We're thinking that someone who picks the Desktop tuning may have no tolerance for the database to be sluggish coming back up after Windows crashed and they rebooted, so they get a tiny setting to make recovery super fast. Everybody else in our sample profiles falls in the middle of those two extremes, which is why the values curve the way they do. Web app? Probably not a lot of write volume, probably trouble if it's down a long time; how about 8, on the low side, but it gives checkpoints more time to spread out their I/O so worst-case latency isn't as bad. That's the sort of analysis those numbers come from. Juggle these trade-offs for new applications long enough and you get a gut feel for the right ballpark an app should start at based on its type. The whole idea behind this tool is that we're taking some of that hard-won knowledge and trying to automate the distribution of it. It's great that Postgres has such great documentation but whenever we have the chance to replace something with an option which doesn't need any documentation that would be even better. I'm just exploring whether that's an option here. I would be glad to have a post-CommitFest discussion of this very topic as it's quite a pain to me in its current form. 
Just not right now because it's too late to touch it. Nobody's even tried to do this side of things before. They always got bogged down in trying to parse config files and such. It's actually because most of them were working in Perl, which encourages deviant behavior where people delight in converting useful ideas into illegible punctuation rather than actually getting anything done. Except for that other Greg around here who's not involved in this discussion, his Perl is pretty good. [1] Josh is being aggressively bulked up right now for his next sumo match. [2] Like a rat, which would give you an excuse to add the long overdue PL/Ratfor. [3] This wouldn't actually help them learn anything, but it would make their heads explode at which point all their problems are gone. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
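The profile reasoning above boils down to postgresql.conf fragments along these lines; the numbers are the illustrative ones from this discussion, not necessarily pgtune's exact output:

```
# Data warehouse: bulk loads shouldn't bottleneck on checkpoints;
# a battery-backed SAN write cache absorbs the recovery I/O.
checkpoint_segments = 64

# Web app: modest write volume, keep worst-case recovery short,
# but still give checkpoints room to spread out their I/O.
checkpoint_segments = 8

# Desktop: recovery speed matters most; stay near the tiny default.
checkpoint_segments = 3
```

Each increment of checkpoint_segments allows roughly another 16MB WAL segment between checkpoints, which is where the disk-space side of the trade-off in this thread comes from.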
Re: [HACKERS] Simple postgresql.conf wizard
The test I think you could consider next is to graph the target going from 10 to 100 in steps of 10, just for those 5 queries. If it gradually degrades, that's interesting but hard to nail down. But if there's a sharp transition, getting an explain plan for the two sides of that should provide some insight. I'm really more interested in the ones that slowed down than the one that improved, since understanding that might finally provide some evidence against increasing it by default. I've updated the charts to include results from setting default_statistics_target from 20-90. The links to the charts are the same. The links to the raw data are in http://207.173.203.223/~markwkm/community6/dbt3/ in directories 45-52. I still think we're missing the boat here because it's not really the same query every time. ...Robert
Re: [HACKERS] Simple postgresql.conf wizard
On Wed, 3 Dec 2008, Mark Wong wrote: http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q2.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q9.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q17.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q18.png http://207.173.203.223/~markwkm/pgsql/default_statistics_target/q20.png I've updated the charts to include results from setting default_statistics_target from 20-90. Interesting. Q2: Explodes hard when the target hits 100, but not before. Would be interesting to compare the explain plans at, say, 50 and 150 to see if it's possible to nail down what is shifting so much. Q9: goes bonkers as soon as the target hits 20, so the interesting plans to compare are 10 and 20. Q17: our one from the original set that improved a bunch with the larger target gets all that benefit just from going to 20. Would be interesting to compare the plans at 10 and 20 to see what changed so much with such a small difference. Q18: looks like it was in some sort of local bad area around 100-400 before; with some more context that one doesn't look interesting anymore. Q20: also doesn't look very interesting anymore. The results at 10 were nice, and the ones at 100 were among the unusually bad ones, but it's pretty random--if there was something inherently bad related to the stats, there wouldn't be low points around 200. Out of those, the most interesting one to me (as someone who is trying to defend raising the target some but not going crazy with that) is Q9. The reason I say that is that everything else is better or basically the same raising the target from 10, as long as you don't go too high (>=100). That one falls apart immediately with a larger target, which seems weird. 
-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Sync Rep: First Thoughts on Code
Hi, On Wed, Dec 3, 2008 at 11:33 PM, Simon Riggs [EMAIL PROTECTED] wrote: I'm patient, I know it takes time. Happy to spend hours on the review, but I want to do that knowing I agree with the higher level features and architecture first. I wrote up the features and restrictions of Synch Rep. Please also check it together with the architecture diagrams. http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#User_Overview Some questions to check my understanding (expected answers in brackets) * Diagram on p.2 has two Archives. We have just one (yes) No, we need an archive on both the primary and the standby. The primary needs an archive because a base backup is required when starting the standby. Meanwhile, the standby needs an archive to cooperate with pg_standby. If the directory pg_standby checks is the same as the directory where walreceiver writes the WAL, a half-written WAL file might be restored by pg_standby, and continuous recovery would fail. So, we have to separate the directories, and I assigned pg_xlog and archive to them. Another idea: walreceiver writes the WAL to a file with a temporary name, and renames it to the final name when it fills. Then pg_standby wouldn't restore a half-written WAL file. But it makes failover more difficult because the unrenamed WAL file remains. Do you have any other good ideas? * We send data continuously, whether or not we are in sync/async? (yes) Yes. So the only difference between sync/async is whether we wait when we flush the commit? (yes) Yes. And in the async case, the backend basically doesn't send the wakeup signal to walsender. * If we have synchronous_commit = off do we ignore synchronous_replication = on (yes) No, we can configure them independently. synchronous_commit covers only local writing of the WAL. If synch_*commit* should cover both local writing and replication, I'd like to add a new GUC which covers only local writing (synchronous_local_write?). 
* If two transactions commit almost simultaneously and one is sync and the other async then only the sync backend will wait? (Yes) Yes. Do we definitely need the archiver to move the files written by walreceiver to archive and then move them back out again? Yes, that's needed to cooperate with pg_standby. Seems like we can streamline that part in many (all?) cases. Agreed. But I thought that such streamlining was a TODO for next time. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
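The directory separation Fujii describes corresponds to a standby setup roughly like this (paths are hypothetical): walreceiver writes incoming WAL under pg_xlog, the archiver moves completed segments into a separate archive directory, and pg_standby watches only that directory so it never sees a half-written file:

```
# recovery.conf on the standby: restore only from the directory
# that holds completed, archived segments, not from pg_xlog.
restore_command = 'pg_standby /var/lib/pgsql/standby_archive %f %p %r'
```

The %f/%p/%r placeholders are expanded by the server to the requested file name, the target path, and the oldest file still needed for restart.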
Re: [HACKERS] In-place upgrade: catalog side
Heikki Linnakangas napsal(a): Zdenek Kotala wrote: Greg Smith napsal(a): -There are 10 TODO items listed for the pg_migrator project, most or all of which look like they should be squashed before this is really complete. Any chance somebody (Korry?) has an improved version of this floating around beyond what's in the pgfoundry CVS already? As I mentioned before, pg_migrator and pg_upgrade.sh are not a good way forward. They are a workaround. It does not make sense to continue in this way. Why not? The problem is that pg_dump does not export all the data important for upgrade, for example relfilenode and so on. However, the biggest problem here is dropped columns (thanks for pointing me to this issue). A dropped column does not retain information about its type. It could be possible to fake it somehow during a dump and drop the columns again after restore, but I'm not convinced that is what we want. The solution is good now as a starter, but it is far from a final solution. Zdenek
Re: [HACKERS] Simple postgresql.conf wizard
Alvaro Herrera wrote: Gregory Stark escribió: Joshua D. Drake [EMAIL PROTECTED] writes: I don't think at any time I have said to my self, I am going to set this parameter low so I don't fill up my disk. If I am saying that to myself I have either greatly underestimated the hardware for the task. Consider that we are quarreling over what amounts to a nominal amount of hard drive space, 1000 checkpoint_segments = 1.6G of space. My phone has more capacity than that. Well my phone has 16G of RAM, why not 1 ? I don't think the disk space used is the only consideration here. You also have to keep recovery time in mind. If you set it to 1000, recovery would take way too long. Presumably if you set checkpoint_segments to a high value, you'd use checkpoint_timeout to limit recovery time. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
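Heikki's suggestion amounts to pairing the two settings, for example like this (values are purely illustrative):

```
# Plenty of WAL headroom between checkpoints for bursty write loads...
checkpoint_segments = 1000
# ...but still checkpoint on a clock, so crash recovery never has
# more than roughly five minutes' worth of WAL to replay.
checkpoint_timeout = 5min
```

A checkpoint fires when either limit is reached first, so the timeout bounds recovery time even when the segment limit is set very high.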
Re: [HACKERS] In-place upgrade: catalog side
On Wed, 3 Dec 2008, Zdenek Kotala wrote: It works fine for 8.3-8.4 too, but I'm working on cleanup and fixing bugs. I hope that I will send an updated version to the community today. That would be great. It didn't feel like you were quite done with it yet. I'll be glad to help test it out, just didn't want to jump into that if it was known to still have issues that were being worked on. Please let us know what the remaining bugs you know about are at that point, I really don't want this part of things to get ignored just because the page format stuff is the harder part. It is more of a workaround or temporary solution. This approach is easy but it has a lot of limitations. The problem with toast tables is one, but the biggest problem is with dropped columns. And maybe there will be more issues. The problem with dump is that you lose internal data. Can you be a bit more specific about what the problems with TOAST and dropped columns are? If those are covered in your presentation or came up already and I missed it, just point me that way; I'm still working my way through parts of this and don't expect to ever have it all in my head like you do at this point. Obviously this approach is going to be somewhat traumatic even if perfectly executed because of things like losing table statistics. As we move closer to final crunch time here, what I am trying to keep clear in my head is which bits are absolutely required to do any type of in-place upgrade, whether or not the page format changes in 8.4. What's nice is that those parts I can be testing right now just by trying to upgrade from 8.3 to 8.4. Barring things like the TOAST problem you mention getting in the way, the fundamental approach taken by these upgrade scripts seems workable for the job even if it's not optimal, and that's a whole lot better than nothing at all. I personally prefer to have a special mode (like bootstrap) which converts data from the old catalog to the new format. That's a perfectly fine idea I would like to see too. 
But if we have to write such a thing from scratch right now, I'm afraid that may be too late to implement and still ship the next release on schedule. And if such bootstrap code is needed, we sure need to make sure the prototype it's going to be built on is solid ASAP. That's what I want to help you look into if you can catch me up a bit here. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD