[PATCHES] Split _bt_insertonpg to two functions
Here's a patch that moves the logic to find a page with enough room from _bt_insertonpg to a new function, _bt_findinsertloc. It makes the code more readable, and simplifies the forthcoming Grouped Index Tuples patch.

Also, the insert location within the page used to be calculated twice for unique indexes, once in _bt_check_unique and a second time in _bt_insertonpg. That's a waste of cycles, and this patch fixes that. I couldn't measure a difference with pgbench, but this micro-benchmark shows it:

psql postgres -c "CREATE TABLE inserttest (i int PRIMARY KEY);"
psql postgres -c "TRUNCATE inserttest; checkpoint;"; sync
time ~/pgsql.cvshead/bin/psql postgres -c "TRUNCATE inserttest; INSERT INTO inserttest SELECT a FROM generate_series(1,100) a;"

Without patch: real 0m7.260s
With patch:    real 0m6.963s

On my laptop, fsync=off, full_page_writes=off, checkpoint_segments = 10, to remove any other variables. It's not a huge difference, but it's worth having, and performance wasn't the main motivation of the patch anyway.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/nbtree/nbtinsert.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.152
diff -c -r1.152 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c	21 Feb 2007 20:02:17 -0000	1.152
--- src/backend/access/nbtree/nbtinsert.c	26 Feb 2007 09:37:16 -0000
***************
*** 46,58 ****
  static Buffer _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf);
  
  static TransactionId _bt_check_unique(Relation rel, IndexTuple itup,
! 				 Relation heapRel, Buffer buf,
  				 ScanKey itup_scankey);
  static void _bt_insertonpg(Relation rel, Buffer buf,
  			   BTStack stack,
- 			   int keysz, ScanKey scankey,
  			   IndexTuple itup,
! 			   OffsetNumber afteritem,
  			   bool split_only_page);
  static Buffer _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
  			OffsetNumber newitemoff, Size newitemsz,
--- 46,63 ----
  static Buffer _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf);
  
  static TransactionId _bt_check_unique(Relation rel, IndexTuple itup,
! 				 Relation heapRel, Buffer buf, OffsetNumber ioffset,
  				 ScanKey itup_scankey);
+ static void _bt_findinsertloc(Relation rel,
+ 			   Buffer *bufptr,
+ 			   OffsetNumber *offsetptr,
+ 			   int keysz,
+ 			   ScanKey scankey,
+ 			   IndexTuple newtup);
  static void _bt_insertonpg(Relation rel, Buffer buf,
  			   BTStack stack,
  			   IndexTuple itup,
! 			   OffsetNumber newitemoff,
  			   bool split_only_page);
  static Buffer _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
  			OffsetNumber newitemoff, Size newitemsz,
***************
*** 86,91 ****
--- 91,97 ----
  	ScanKey		itup_scankey;
  	BTStack		stack;
  	Buffer		buf;
+ 	OffsetNumber offset;
  
  	/* we need an insertion scan key to do our search, so build one */
  	itup_scankey = _bt_mkscankey(rel, itup);
***************
*** 94,99 ****
--- 100,107 ----
  	/* find the first page containing this key */
  	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
  
+ 	offset = InvalidOffsetNumber;
+ 
  	/* trade in our read lock for a write lock */
  	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
  	LockBuffer(buf, BT_WRITE);
***************
*** 128,134 ****
  		{
  			TransactionId xwait;
  
! 			xwait = _bt_check_unique(rel, itup, heapRel, buf, itup_scankey);
  
  			if (TransactionIdIsValid(xwait))
  			{
--- 136,143 ----
  		{
  			TransactionId xwait;
  
! 			offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
! 			xwait = _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey);
  
  			if (TransactionIdIsValid(xwait))
  			{
***************
*** 142,148 ****
  	}
  
  	/* do the insertion */
! 	_bt_insertonpg(rel, buf, stack, natts, itup_scankey, itup, 0, false);
  
  	/* be tidy */
  	_bt_freestack(stack);
--- 151,158 ----
  	}
  
  	/* do the insertion */
! 	_bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup);
! 	_bt_insertonpg(rel, buf, stack, itup, offset, false);
  
  	/* be tidy */
  	_bt_freestack(stack);
***************
*** 152,169 ****
  /*
   *	_bt_check_unique() -- Check for violation of unique index constraint
   *
   * Returns InvalidTransactionId if there is no conflict, else an xact ID
   * we must wait for to see if it commits a conflicting tuple.  If an actual
   * conflict is detected, no return --- just ereport().
   */
  static TransactionId
  _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
! 				 Buffer buf, ScanKey itup_scankey)
  {
  	TupleDesc	itupdesc = RelationGetDescr(rel);
  	int			natts = rel->rd_rel->relnatts;
! 	OffsetNumber offset,
! 				maxoff;
  	Page		page;
  	BTPageOpaque opaque;
  	Buffer		nbuf = InvalidBuffer;
--- 162,182 ----
  /*
   *	_bt_check_unique() -- Check for violation of unique index constraint
   *
+  * offset points to the first
Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum
Hi,

I was wondering if we can apply the same logic of recalculating OldestXmin within IndexBuildHeapScan, which runs as part of the CREATE INDEX operation. Having to index a smaller number of tuples should be a good saving in the CREATE INDEX CONCURRENTLY case, if the above is possible?

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com
Re: [PATCHES] [BUGS] BUG #2969: Inaccuracies in Solaris FAQ
Peter Eisentraut wrote:
> Zdenek Kotala wrote:
>> There is a Solaris FAQ update. Please look at it and let me know any comments.
> The actual answer to the question "Can I compile PostgreSQL with Kerberos v5 support?" is "Yes, why not?". I don't think "Can I use this weird internal private library that seems to provide a similar interface?" is a Frequently Asked Question.

I don't use Solaris, so I don't know if it is a frequent issue or not. I assume the author knows.

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
[PATCHES] 1-byte packed varlena headers
Updated patch at: http://community.enterprisedb.com/varlena/patch-varvarlena-12.patch.gz This fixes a rather critical oversight which caused it all to appear to work but not actually save any space. I've added an assertion check that the predicted tuple size matches the tuple size generated by heap_form*tuple. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PATCHES] Numeric patch to add special-case representations for 8 bytes
I've uploaded a quick hack to store numerics in 8 bytes when possible.

http://community.enterprisedb.com/numeric-hack-1.patch

This is a bit of a kludge, since it doesn't actually provide any interface for external clients of the numeric module to parse the resulting data. Ie, the macros in numeric.h will return garbage. But I'm not entirely convinced that matters. It's not like those macros were really useful to any other modules anyway, since there was no way to extract the actual digits.

The patch is also slightly unsatisfactory because, as I discovered, numbers like 1.1 are stored as two digits currently. But it does work, and it does save a substantial amount of space for integers.

postgres=# select n,pg_column_size(n) from n;
    n    | pg_column_size
---------+----------------
       1 |              2
      11 |              2
     101 |              2
    1001 |              3
   10001 |              9
      11 |              9
     1.1 |              9
    10.1 |              9
   100.1 |              9
  1000.1 |              9
     1.1 |             11
    10.1 |             11

I had hoped to get the second batch down to 3-4 bytes. But even now, note how much space is saved for integers.

--
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [PATCHES] Allow \pset to parse on or off for boolean values
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews and approves it.

---------------------------------------------------------------------------

Chad Wagner wrote:
> This is a TODO item:
>
>   o Allow psql \pset boolean variables to set to fixed values, rather than toggle
>
> Basically allows for:
>
> test=# \pset expanded on
> Expanded display is on.
> test=# \pset footer off
> Default footer is off.
> test=# \pset footer
> Default footer is on.
> test=# \pset footer
> Default footer is off.
> test=# \pset footer
> Default footer is on.
>
> Basically, the change is to check whether the value passed to do_pset is != NULL; if so, call ParseVariableBool to get a true/false and set the value directly. If the second parameter is not provided, then the old logic of toggling is used.
>
> The pset variables that are adjusted are: expanded, numericlocale, tuples_only, pager (supports on/off/always now), footer.
>
> [ Attachment, skipping... ]

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
I am a little concerned about a log_* setting that is INFO. I understand why you used INFO (for log_min_error_messages), but INFO is inconsistent with the log_* prefix, and by default INFO doesn't appear in the log file. So, by default, the INFO is going to go to the user terminal, and not to the logfile. Ideas?

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2007-02-19 at 19:38 +0000, Simon Riggs wrote:
>> On Tue, 2007-02-13 at 22:19 -0500, Bruce Momjian wrote:
>>> Simon Riggs wrote:
>>>> On Thu, 2006-10-26 at 18:45 -0400, Tom Lane wrote:
>>>>> Chris Campbell [EMAIL PROTECTED] writes:
>>>>>> Is there additional logging information I can turn on to get more details? I guess I need to see exactly what locks both processes hold, and what queries they were running when the deadlock occurred? Is that easily done, without turning on logging for *all* statements?
>>>>> log_min_error_statement = error would at least get you the statements reporting the deadlocks, though not what they're conflicting against.
>>>> Yeh, we need a much better locking logger for performance analysis. We really need to dump the whole wait-for graph for deadlocks, since this might be more complex than just two statements involved. Deadlocks ought to be so infrequent that we can afford the log space to do this - plus if we did this it would likely lead to fewer deadlocks. For 8.3 I'd like to have a log_min_duration_lockwait (secs) parameter that would allow you to dump the wait-for graph for any data-level locks that wait too long, rather than just those that deadlock. Many applications experience heavy locking because of lack of holistic design. That will also show up the need for other utilities to act CONCURRENTLY, if possible.
>>> Old email, but I don't see how our current output is not good enough?
>>>
>>> test=> lock a;
>>> ERROR:  deadlock detected
>>> DETAIL:  Process 6855 waits for AccessExclusiveLock on relation 16394 of database 16384; blocked by process 6795.
>>> Process 6795 waits for AccessExclusiveLock on relation 16396 of database 16384; blocked by process 6855.
>> This detects deadlocks, but it doesn't detect lock waits. When I wrote that, it was previous experience driving me. Recent client experience has highlighted the clear need for this. We had a lock wait of 50 hours because of an RI check; that's the kind of thing I'd like to show up in the logs somewhere. Lock wait detection can be used to show up synchronisation points that have been inadvertently designed into an application, so it's a useful tool in investigating performance issues. I have a patch implementing the logging as agreed with Tom, will post to patches later tonight.
>
> Patch for discussion, includes doc entries at top of patch, so it's fairly clear how it works. Output is an INFO message, to allow this to trigger log_min_error_statement when it generates a message, to allow us to see the SQL statement that is waiting. This allows it to generate a message prior to the statement completing, which is important because it may not ever complete in some cases, so simply logging a list of pids won't always tell you what the SQL was that was waiting. Other approaches are possible... Comments?
>
> --
> Simon Riggs
> EnterpriseDB   http://www.enterprisedb.com
>
> [ Attachment, skipping... ]

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
       http://www.postgresql.org/about/donate
Re: [PATCHES] correct format for date, time, timestamp for XML functionality
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews and approves it.

---------------------------------------------------------------------------

Pavel Stehule wrote:
> Hello,
>
> this patch makes the datetime fields independent of the current datestyle setting. It adds a new internal datestyle, USE_XSD_DATESTYLE. It's almost the same as USE_ISO_DATESTYLE. The difference is for timestamp:
>
> ISO: yyyy-mm-dd hh24:mi:ss
> XSD: yyyy-mm-ddThh24:mi:ss
>
> I found one link about this topic: http://forums.oracle.com/forums/thread.jspa?threadID=467278&tstart=0
>
> Regards
> Pavel Stehule
>
> [ Attachment, skipping... ]

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
       http://www.postgresql.org/about/donate
Re: [PATCHES] pg_standby Error cleanup
Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews and approves it.

---------------------------------------------------------------------------

Darcy Buskermolen wrote:
> Please find attached a patch which provides for logging in the event that -k is unable to clean up an old WAL file. It also makes the "failed to remove file" error message consistent for the trigger file.
>
> --
> Darcy Buskermolen
> Command Prompt, Inc.
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> PostgreSQL solutions since 1997  http://www.commandprompt.com/
>
> [ Attachment, skipping... ]

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
On Mon, 2007-02-26 at 13:34 -0500, Bruce Momjian wrote:
> I am a little concerned about a log_* setting that is INFO. I understand why you used INFO (for log_min_error_messages), but INFO is inconsistent with the log_* prefix, and by default INFO doesn't appear in the log file.

Yeh, LOG would be most appropriate, but that's not possible. log_min_messages allows only DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE and WARNING for non-error states.

Possibly DEBUG1?

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
Simon Riggs [EMAIL PROTECTED] writes: Yeh, LOG would be most appropriate, but thats not possible. You have not given any good reason for that. log_min_messages allows only DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE and WARNING for non-error states. I don't think you understand quite how the log message priority works... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
On Mon, 2007-02-26 at 14:11 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Yeh, LOG would be most appropriate, but thats not possible. You have not given any good reason for that. The idea of the patch is that it generates a log message which then invokes log_min_error_statement so that the SQL statement is displayed. LOG is not on the list of options there, otherwise I would use it. The reason for behaving like this is so that a message is generated while the statement is still waiting, rather than at the end. As I mentioned in the submission, you may not like that behaviour; I'm in two minds myself, but I'm trying to get to the stage of having useful information come out of the server when we have long lock waits. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
Simon Riggs [EMAIL PROTECTED] writes: The idea of the patch is that it generates a log message which then invokes log_min_error_statement so that the SQL statement is displayed. LOG is not on the list of options there, otherwise I would use it. As I said, you don't understand how the logging priority control works. LOG *is* the appropriate level for stuff intended to go to the server log. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
On Mon, 2007-02-26 at 14:28 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: The idea of the patch is that it generates a log message which then invokes log_min_error_statement so that the SQL statement is displayed. LOG is not on the list of options there, otherwise I would use it. As I said, you don't understand how the logging priority control works. LOG *is* the appropriate level for stuff intended to go to the server log. Please look at the definition of log_min_error_statement, so you understand where I'm coming from. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
Simon Riggs [EMAIL PROTECTED] writes:
> On Mon, 2007-02-26 at 14:28 -0500, Tom Lane wrote:
>> Simon Riggs [EMAIL PROTECTED] writes:
>>> The idea of the patch is that it generates a log message which then invokes log_min_error_statement so that the SQL statement is displayed. LOG is not on the list of options there, otherwise I would use it.
>> As I said, you don't understand how the logging priority control works. LOG *is* the appropriate level for stuff intended to go to the server log.
> Please look at the definition of log_min_error_statement, so you understand where I'm coming from.

I *have* read the definition of log_min_error_statement. (The SGML docs are wrong btw, as a quick look at the code shows that LOG is an accepted value.) The real issue here is that send_message_to_server_log just does

	if (edata->elevel >= log_min_error_statement && debug_query_string != NULL)

to determine whether to log the statement, whereas arguably it should be using a test like is_log_level_output --- that is, the priority ordering for log_min_error_statement should be like log_min_messages, not like client_min_messages. We've discussed that before in another thread, but it looks like nothing's been done yet.

In any case, if you're unhappy with the code's choice of whether to emit the STATEMENT part of a log message, some changes here are what's indicated, not bizarre choices of elevel for individual messages.

			regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
On Mon, 2007-02-26 at 14:52 -0500, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
>> On Mon, 2007-02-26 at 14:28 -0500, Tom Lane wrote:
>>> As I said, you don't understand how the logging priority control works. LOG *is* the appropriate level for stuff intended to go to the server log.
>> Please look at the definition of log_min_error_statement, so you understand where I'm coming from.
> I *have* read the definition of log_min_error_statement. (The SGML docs are wrong btw, as a quick look at the code shows that LOG is an accepted value.)

OK, I should have looked past the manual.

> The real issue here is that send_message_to_server_log just does a simple comparison of elevel against log_min_error_statement to determine whether to log the statement, whereas arguably it should be using a test like is_log_level_output --- that is, the priority ordering for log_min_error_statement should be like log_min_messages, not like client_min_messages. We've discussed that before in another thread, but it looks like nothing's been done yet.

Hopefully not with me? I don't remember that.

> In any case, if you're unhappy with the code's choice of whether to emit the STATEMENT part of a log message, some changes here are what's indicated, not bizarre choices of elevel for individual messages.

Well, I would have chosen LOG if I thought it was available. I'll do some more to the patch.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
Simon Riggs wrote:
> On Mon, 2007-02-26 at 13:34 -0500, Bruce Momjian wrote:
>> I am a little concerned about a log_* setting that is INFO. I understand why you used INFO (for log_min_error_messages), but INFO is inconsistent with the log_* prefix, and by default INFO doesn't appear in the log file.
> Yeh, LOG would be most appropriate, but that's not possible. log_min_messages allows only DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE and WARNING for non-error states. Possibly DEBUG1?

This highlights a problem we have often had with LOG output where we also want the query. I think there are two possible approaches. First, we could add a new bitmap value like LOG_STATEMENT to ereport when we want the statement with the log line:

	ereport(LOG | LOG_STATEMENT, ...)

(or a new LOG_WITH_STATEMENT log level) and a new GUC like log_include_statement that would control the output of statements for certain GUC parameters, and we would document which GUC values it controls. A simpler idea would be to unconditionally include the query in the errdetail() of the actual LOG ereport.

This is not the first GUC that has needed this. We had this issue with log_temp_files, which we just added, and the only suggested solution was to use log_statement = 'all'. Either of the ideas above would be useful for this as well.

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] Deadlock with pg_dump?
Bruce Momjian [EMAIL PROTECTED] writes: This is not the first GUC that has needed this. Exactly. I think that we simply made a mistake in the initial implementation of log_min_error_statement: we failed to think about whether it should use client or server priority ordering, and the easy-to-code behavior was the wrong one. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] COMMIT NOWAIT Performance Option (patch)
Simon Riggs [EMAIL PROTECTED] writes: A prototype patch is posted to -patches, which is WORK IN PROGRESS. [This patch matches discussion thread on -hackers.] What does this accomplish other than adding syntactic sugar over a feature that really doesn't work well anyway? I don't see any point in encouraging people to use commit_delay in its present form. If we had a portable solution for millisecond-or-so waits then maybe it would work ... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] COMMIT NOWAIT Performance Option (patch)
On Mon, 2007-02-26 at 18:14 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: A prototype patch is posted to -patches, which is WORK IN PROGRESS. [This patch matches discussion thread on -hackers.] What does this accomplish other than adding syntactic sugar over a feature that really doesn't work well anyway? I don't see any point in encouraging people to use commit_delay in its present form. If we had a portable solution for millisecond-or-so waits then maybe it would work ... This patch doesn't intend to implement group commit. I've changed the meaning of commit_delay, sorry if that confuses. You and I discussed this in Toronto actually, IIRC. The best way to describe this proposal is deferred fsync, so perhaps a different parameter commit_fsync_delay would be more appropriate. Bruce has requested this feature many times from me, so I thought it about time to publish. The key point is that COMMIT NOWAIT doesn't wait for group commit to return, it just doesn't wait at all - leaving someone else to flush WAL. It's much better than fsync=off. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] [HACKERS] Load distributed checkpoint
Josh Berkus josh@agliodbs.com wrote: Can I have a copy of the patch to add to the Sun testing queue? This is the revised version of the patch. Delay factors in checkpoints can be specified by checkpoint_write_percent, checkpoint_nap_percent and checkpoint_sync_percent. They are relative to checkpoint_timeout. Also, checking of archive_timeout during checkpoints and some error handling routines were added. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center checkpoint_02-27.patch.gz Description: Binary data ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PATCHES]
Hi,

This is the first posting to the community of the WIP patch for the Updatable Cursor implementation. I want to confirm that the community is satisfied that the effort to date is in a suitable direction, and to get comments on the development so far.

The patch is in the following state:

The grammar definition is complete and 'yacc'ed to produce gram.y.c. The functions transformUpdateStmt and transformDeleteStmt have been updated to process the cursor name and obtain the related portal. The change to save the current tuple id (ctid) into the portal, related to the Fetch command, has been done. The ctids relating to the Update/Delete statements' TidScan are being extracted to be saved in the executor.

The parts in progress are to complete the saving of the ctids from the TidScan into a list stored in a file, plus the related searching of the list for an individual ctid obtained from the Update/Delete statements.

Unstarted as yet:
1) Correctly process, in the database, the Delete/Update of the tuple from the cursor.
2) Enable the cursor name to be defined as a parameter in a PREPARE statement and provided as part of an EXECUTE statement.

The community may wish to comment on the following issue:
1) At present the file that will contain the list of ctids goes into a new directory called pg_ctids, analogous to pg_twophase, and also stored in the pg_data directory.

Regards,
John Bartlett

This is an email from Fujitsu Australia Software Technology Pty Ltd, ABN 27 003 693 481. It is confidential to the ordinary user of the email address to which it was addressed and may contain copyright and/or legally privileged information. No one else may read, print, store, copy or forward all or any of it or its attachments. If you receive this email in error, please return to sender. Thank you.

If you do not wish to receive commercial email messages from Fujitsu Australia Software Technology Pty Ltd, please email [EMAIL PROTECTED]

270207_updatable_cursor.diff
Description: Binary data

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not match
[PATCHES] POSIX shared memory support
Recapitulating and addressing some of the issues with my previous attempt at this feature:

PostgreSQL currently uses the System V shared memory APIs to access shared memory. On Mac OS X and other BSDs, the default System V shared memory limits are often very low and require adjustment for acceptable performance. Particularly, when Postgres is included as part of larger end-user friendly software products, these kernel settings are often difficult to change programmatically.

The System V shared memory facilities provide a method to determine who is attached to a shared memory segment. This is used to prevent backends that were orphaned by crashed or killed database processes from corrupting the database as it is restarted. The same effect can be achieved using the POSIX APIs, but since the POSIX library does not have a way to check who is attached to a segment, atomic segment creation must be used to ensure exclusive access to the database.

In order for this to work, the key name used to open and create the shared memory segment must be unique for each data directory. This is done by using a strong hash of the canonical form of the data directory's pathname. This also removes any risk of other applications' or other databases' memory segments colliding with the current shared memory segment, which conveniently simplifies the logic. The algorithm changes are described by the before and after flowchart PDF included in the tarball below.

There is also a Windows version of this patch included, which can replace the current SysV-to-Win32 shared memory layer, as it currently does not check for orphaned backends in the database. If this is used, src/backend/port/win32/shmem.c and its makefile reference can be removed.

Included is a patch for the configure.in file to opt in all of the supported platforms to use this POSIX shared memory layer except those that are explicitly excluded (of course, this could be done in reverse if desired).
Notably, NetBSD and OpenBSD do not support the POSIX shared memory calls, and their template patches are included. Other platforms should be tested and excluded as needed. In addition, the documentation section 16.4.1 'Shared Memory and Semaphores' could be largely pruned and updated to reflect use of the POSIX calls.

The patches are available here (the postings were being silently dropped when I attached this large a file):

http://homepage.mac.com/cmarcellino/postgres-posix-shmem.tar

Please let me know if there is a better way to post this.

Thanks for your feedback,
Chris Marcellino

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] POSIX shared memory support
Chris Marcellino [EMAIL PROTECTED] writes:
> The System V shared memory facilities provide a method to determine who is attached to a shared memory segment. This is used to prevent backends that were orphaned by crashed or killed database processes from corrupting the database as it is restarted. The same effect can be achieved with using the POSIX APIs,

... except that it can't ...

> but since the POSIX library does not have a way to check who is attached to a segment, atomic segment creation must be used to ensure exclusive access to the database.

How does that fix the problem? If you can't actually tell whether someone is attached to an existing segment, then you're still up against the basic rock-and-a-hard-place issue: either you assume there is no one there (and corrupt your database if you're wrong) or you assume there is someone there (and force manual intervention by the DBA to recover after postmaster crashes). Neither of these alternatives is really acceptable.

> In order for this to work, the key name used to open and create the shared memory segment must be unique for each data directory. This is done by using a strong hash of the canonical form of the data directory's pathname.

"Strong hash" is not a guarantee, even if you could promise that you could get a unique canonical path, which I doubt you can. In any case this fails if the DBA decides to rename the directory on the fly (don't laugh; not only are there instances of that in our archives, there are people opining that we need to allow it --- even with the postmaster still running).

> This also removes any risk of other applications, or other databases' memory segments, colliding with the current shared memory segment, which conveniently simplifies the logic.

How exactly does it remove that risk?
I think you're wishfully thinking that if you are creating an unreadable hash value then there will never be any collisions against someone else who, with the same touching faith, believes that *his* unreadable hash values will never collide with anyone else's. Doesn't give me a lot of comfort.

Not that it matters, since the approach is broken even if this specific assumption were sustainable.

regards, tom lane
Re: [HACKERS] [PATCHES] COMMIT NOWAIT Performance Option (patch)
Simon Riggs [EMAIL PROTECTED] writes:
> On Mon, 2007-02-26 at 18:14 -0500, Tom Lane wrote:
>> What does this accomplish other than adding syntactic sugar over a
>> feature that really doesn't work well anyway?
> This patch doesn't intend to implement group commit. I've changed the
> meaning of commit_delay, sorry if that confuses.

Ah. The patch was pretty much unintelligible without the discussion (which got here considerably later :-(). I've still got misgivings about how safe it really is, but at least this is better than what commit_delay wishes it could do.

regards, tom lane
Re: [PATCHES] [HACKERS] Load distributed checkpoint
On 2/26/07, ITAGAKI Takahiro [EMAIL PROTECTED] wrote:
> Josh Berkus josh@agliodbs.com wrote:
>> Can I have a copy of the patch to add to the Sun testing queue?
> This is the revised version of the patch. Delay factors in checkpoints
> can be specified by checkpoint_write_percent, checkpoint_nap_percent
> and checkpoint_sync_percent. They are relative to checkpoint_timeout.
> Also, checking of archive_timeout during checkpoints and some error
> handling routines were added.

One of the issues we had during testing with the original patch was db stop not working properly. I think you coded something to do a stop checkpoint immediately, but if a checkpoint is already in progress at that time, it would take its own time to complete. Does this patch resolve that issue? Also, is it based on pg82stable or HEAD?

regards,
inaam

--
Inaam Rana
EnterpriseDB http://www.enterprisedb.com
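For reference, a postgresql.conf fragment using the knobs Itagaki describes might look like the following; only the GUC names and their relationship to checkpoint_timeout come from the mail above, and the values are invented for illustration:

```
# Each checkpoint phase is spread over a fraction of checkpoint_timeout
# (illustrative values, not recommendations)
checkpoint_timeout = 5min
checkpoint_write_percent = 50.0   # spread buffer writes over 50% of the interval
checkpoint_nap_percent = 10.0     # pause between the write and sync phases
checkpoint_sync_percent = 20.0    # spread the file syncs
```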
Re: [PATCHES] POSIX shared memory support
On Feb 26, 2007, at 10:43 PM, Tom Lane wrote:
> Chris Marcellino [EMAIL PROTECTED] writes:
>> The System V shared memory facilities provide a method to determine
>> who is attached to a shared memory segment. This is used to prevent
>> backends that were orphaned by crashed or killed database processes
>> from corrupting the database as it is restarted. The same effect can
>> be achieved by using the POSIX APIs,
> ... except that it can't ...
>> ... but since the POSIX library does not have a way to check who is
>> attached to a segment, atomic segment creation must be used to ensure
>> exclusive access to the database.
> How does that fix the problem? If you can't actually tell whether
> someone is attached to an existing segment, then you're still up
> against the basic rock-and-a-hard-place issue: either you assume there
> is no one there (and corrupt your database if you're wrong) or you
> assume there is someone there (and force manual intervention by the
> DBA to recover after postmaster crashes). Neither of these
> alternatives is really acceptable.

Ignoring the case where backends are still alive in the database (since they would require intervention or patience either way), there are two options:

1) There is a postmaster/backend still running and you try to start another postmaster: the unique segment cannot be closed and atomically recreated, and startup will fail as it does in the current implementation.

2) There are no errant processes still in the database: the segment can be closed and atomically recreated.

Try making a build with the patch, then start a postmaster for a given folder, delete the lock file and start another postmaster (on a different port) in that folder. Please let me know if I am overlooking something.

> In order for this to work, the key name used to open and create the
> shared memory segment must be unique for each data directory. This is
> done by using a strong hash of the canonical form of the data
> directory's pathname.
> Strong hash is not a guarantee, even if you could promise that you
> could get a unique canonical path, which I doubt you can. In any case
> this fails if the DBA decides to rename the directory on the fly
> (don't laugh; not only are there instances of that in our archives,
> there are people opining that we need to allow it --- even with the
> postmaster still running).

A strong hash is an effective guarantee that many computing paradigms are based upon. The collision rate is astronomically small, and can be made astronomically smaller with longer hashes. (For MD5 there would need to be 10^15 postmasters on a server before a collision is likely, and they would all need to have crashed and left backends in the database, etc.)

True, renaming is a problem that I had not anticipated at all. Now that you mention it, hard links might be an issue on some machines that don't canonicalize them to a unique path, since that isn't required by the POSIX docs. Oh, the horrible degenerate cases. Good point though. Perhaps there is some other unique identifying feature of a given database; a per-database persistent UUID would fit nicely here. It could just be the shmem key.

>> This also removes any risk of other applications, or other databases'
>> memory segments colliding with the current shared memory segment,
>> which conveniently simplifies the logic.
> How exactly does it remove that risk?

This is fruitless due to the renaming issue, but the hash isn't an issue. I'm not sure that a hex string beginning with \pg_x is any less readable than the shmem id integers that are generated ad hoc by the current implementation.

> I think you're wishfully-thinking that if you are creating an
> unreadable hash value then there will never be any collisions against
> someone else with the same touching faith that *his* unreadable hash
> values will never collide with anyone else's.
I'm flattered that you hold my coding abilities in such devout conviction, but I assure you that cryptography, even in this limited use, is based in rational thought :). In addition, the astronomically unlikely collision isn't a risk, as the database can't be damaged; the admin would merely need to clear the lockfile, after he won the lottery twice and was struck by lightning in his overturned car.

> Doesn't give me a lot of comfort. Not that it matters, since the
> approach is broken even if this specific assumption were sustainable.

Postmasters failing to load don't give me much comfort either, and that isn't a pipe dream. I suppose the renaming issue relegates this patch to situations where the database cannot be renamed or hard linked and started more than once, yet which require starting up databases without restarting, and where one needs to control how many other databases are consuming shmem on the same box.

Thanks for the reply,
Chris Marcellino