Re: [HACKERS] Shared row locking
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Using a B-tree
>
> > At transaction end, nothing special happens (tuples are not unlocked
> > explicitly).
>
> I don't think that works, because there is no guarantee that an entry
> will get cleaned out before the XID counter wraps around.  Worst case,
> you might think that a tuple is locked when the XID is left over from
> the previous cycle.  (Possibly this could be avoided by cleaning out
> old XIDs in this table whenever we truncate pg_clog, but that seems a
> tad messy.)  I'm also a bit concerned about how we avoid table bloat
> if there's no proactive cleanup at transaction end.
>
> I think I like the pg_clog-modeled structure a bit better.  However it
> could be objected that that puts a hard limit of 4G share-locked tuples
> at any one time.
>
> In the clog-modeled idea, it wasn't real clear how you decide whether
> to assign a new counter value to a previously locked row, or reuse its
> previous counter.  You must *not* assign a new value when the existing
> entry still has bits set, but you probably do want to be aggressive
> about assigning new values when you can; else it gets tough to be sure
> that the log can be truncated in a reasonable time.

I assume you check, and if all the bits are zero you don't reuse it but
get a new counter.  In fact you shouldn't reuse it in case the log is
being truncated while you are looking.  :-)

> ISTM that your description is conflating several orthogonal issues:
> how do we identify entries in this data structure (by CTID, or a shared
> counter that increments each time a new lock is acquired); how do we
> index the data structure (btree or linear array); and what is stored in
> each entry (array of XIDs, or bitmap indexed by BackendId).  Not all of
> the eight combinations work, but we do have more alternatives than the
> two offered, even without coming up with any new ideas ;-)

True.  The only advantage to a bitmap vs. just a counter of locked
backends is that you can clean out your own backend's bits from the
table without having to record them in your own memory.  However,
because recording your own counters in local memory doesn't require
fixed shared memory, we might be better off just recording the shared
lock indexes in local backend memory and using an int4 counter in the
pg_clog-like file that we decrement on backend commit.  However, I am
unclear that we can guarantee an exiting backend will do that.
Certainly it is cleared on server start.

> > Note that to check whether a transaction is running we need to lock
> > SInvalLock.  To minimize the time we hold it, we save the BackendId so
> > we don't have to scan the whole shmInvalBuffer->procState array, only
> > the item that we need to look at.  Another possibility would be to use
> > stock TransactionIdIsInProgress and save the extra 4 bytes of storage.
>
> I'm a bit worried about deadlocks and race conditions associated with
> the conflict between locking a page of this data structure and locking
> SInvalLock.
>
> > At server restart, the btree is created empty (or just deleted).  There
> > is one btree per database.
>
> One per cluster you meant, right?  (Else we can't do locking of rows in
> shared tables.)

He meant one per database, I think.  I suppose we would need another
one for global tables, or disallow shared locking of them.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
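A minimal sketch of the int4-counter variant Bruce floats above, under
the stated assumptions: each slot in the pg_clog-like file holds a plain
share-lock count, each backend remembers which slots it incremented, and
commit walks that local list.  Every name here is invented for
illustration and matches no actual patch:

/* Hypothetical sketch of the "int4 counter per slot" idea.
 * slot_counter_increment/decrement are invented helpers standing in
 * for reads/writes of the pg_clog-like file; locking and error
 * handling are omitted. */
typedef struct LocalLockList
{
    int     nslots;
    uint32  slots[64];          /* slot numbers we incremented */
} LocalLockList;

static void
share_lock_slot(LocalLockList *local, uint32 slot)
{
    slot_counter_increment(slot);              /* invented */
    local->slots[local->nslots++] = slot;      /* remember for commit */
}

static void
at_commit_release(LocalLockList *local)
{
    int i;

    for (i = 0; i < local->nslots; i++)
        slot_counter_decrement(local->slots[i]);   /* invented */
    local->nslots = 0;      /* slots that reach zero can be recycled */
}

The open question in the mail above is exactly the failure mode of this
loop: a backend that exits without running at_commit_release leaves a
nonzero count behind until the next server start.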
Re: [HACKERS] Shared row locking
Bruce Momjian <[EMAIL PROTECTED]> writes:
> You mean all empty/zero rows can be removed?  Can we guarantee that on
> commit we can clean up the bitmap?  If not the idea doesn't work.

For whatever data structure we use, we may reset the structure to empty
during backend-crash recovery.  So your objection boils down to "what if
a backend exits normally but forgets to clean up its locks?"  Assuming
that doesn't happen isn't any worse than assuming a backend will clean
up its shared memory state on non-crash exit, so I don't think it's a
serious concern.

That brings another thought: really what this is all about is working
around the fact that the standard lock manager can only cope with a
finite number of coexisting locks, because it's working in a fixed-size
shared memory arena.  Maybe we should instead think about ways to allow
the existing lock table to spill to disk when it gets too big.  That
would eliminate max_locks_per_transaction as a source of hard failures,
which would be a nice benefit.

			regards, tom lane
Re: [HACKERS] Shared row locking
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Using a B-tree

> At transaction end, nothing special happens (tuples are not unlocked
> explicitly).

I don't think that works, because there is no guarantee that an entry
will get cleaned out before the XID counter wraps around.  Worst case,
you might think that a tuple is locked when the XID is left over from
the previous cycle.  (Possibly this could be avoided by cleaning out old
XIDs in this table whenever we truncate pg_clog, but that seems a tad
messy.)  I'm also a bit concerned about how we avoid table bloat if
there's no proactive cleanup at transaction end.

I think I like the pg_clog-modeled structure a bit better.  However it
could be objected that that puts a hard limit of 4G share-locked tuples
at any one time.

In the clog-modeled idea, it wasn't real clear how you decide whether to
assign a new counter value to a previously locked row, or reuse its
previous counter.  You must *not* assign a new value when the existing
entry still has bits set, but you probably do want to be aggressive
about assigning new values when you can; else it gets tough to be sure
that the log can be truncated in a reasonable time.

ISTM that your description is conflating several orthogonal issues:
how do we identify entries in this data structure (by CTID, or a shared
counter that increments each time a new lock is acquired); how do we
index the data structure (btree or linear array); and what is stored in
each entry (array of XIDs, or bitmap indexed by BackendId).  Not all of
the eight combinations work, but we do have more alternatives than the
two offered, even without coming up with any new ideas ;-)

> Note that to check whether a transaction is running we need to lock
> SInvalLock.  To minimize the time we hold it, we save the BackendId so
> we don't have to scan the whole shmInvalBuffer->procState array, only
> the item that we need to look at.  Another possibility would be to use
> stock TransactionIdIsInProgress and save the extra 4 bytes of storage.

I'm a bit worried about deadlocks and race conditions associated with
the conflict between locking a page of this data structure and locking
SInvalLock.

> At server restart, the btree is created empty (or just deleted).  There
> is one btree per database.

One per cluster you meant, right?  (Else we can't do locking of rows in
shared tables.)

			regards, tom lane
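The assign-or-reuse rule in the middle of Tom's message can be stated
compactly in code.  A hedged sketch, with every name invented for
illustration (this is a restatement of the rule, not a proposed patch):

/* Reuse the row's existing counter whenever its bitmap still has bits
 * set (required for correctness: other share-lockers are live), else
 * grab a fresh counter so that old log segments become truncatable.
 * bitmap_has_bits_set and next_shared_lock_counter are invented. */
static uint32
counter_for_row(uint32 existing, bool existing_valid)
{
    if (existing_valid && bitmap_has_bits_set(existing))
        return existing;            /* must reuse */

    return next_shared_lock_counter();  /* be aggressive about fresh
                                         * values when the slot is empty */
}

Bruce's point downthread is that the "else" branch must also win when
the check races against log truncation, which is why he suggests never
reusing an all-zero slot.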
Re: [HACKERS] [INTERFACES] PL/Python: How do I use result methods?
I wrote:
> Michael Fuhr <[EMAIL PROTECTED]> writes:
>> Any comments on this?  The 8.0.0rc1 PL/Python documentation,
>> Section 39.3 "Database Access", still mentions the nrows and
>> status methods, but they don't work.  Here's Oliver's original
>> message and my followup:

>> http://archives.postgresql.org/pgsql-interfaces/2004-10/msg00019.php
>> http://archives.postgresql.org/pgsql-interfaces/2004-10/msg00020.php

> It looks like someone #ifdef'd out those sections after observing that
> the PLy_result_methods table isn't used anyplace.  Perhaps the place
> where it should have been used got lost in some earlier patch?

> Just out of curiosity, what sort of patch would it take to enable these
> functions?  If it's at all nontrivial I'd vote to hold over to 8.1,
> but if it's a line or two of code that got lost at some point, it would
> seem like a reasonable bug fix ...

Comparing the result and plan method types made it pretty obvious how
those methods are supposed to be hooked up, and it was indeed a
one-liner omission in the original source code.  So I've fixed it.

			regards, tom lane
Re: [HACKERS] Shared row locking
The SQL spec does not say anything in this respect (that I can find).
It only talks of "FOR UPDATE" and "FOR READ ONLY".

However, because the FK code uses SPI to do the locking, we definitely
have to expose the functionality through SQL.  So I think we need a new
clause, which I propose to be "FOR SHARE".

MySQL uses LOCK IN SHARE MODE:
http://dev.mysql.com/doc/mysql/en/InnoDB_locking_reads.html
Re: [HACKERS] Shared row locking
Alvaro Herrera wrote:
> The btree idea:
> - does not need crash recovery.  Maybe we could use a stripped down
>   version of nbtree.  This could cause a maintainability nightmare.

Are you saying the btree is an index with no heap?  If so, what about
the XIDs?  Are they just in the btree?  How does the btree get cleaned
up over time?

> The bitmap idea:
> - seems to have lower overhead
>
> - can use the same lazy cleanup mechanism proposed for the btree idea
>   (in which case we don't need the list in local memory).

You mean all empty/zero rows can be removed?  Can we guarantee that on
commit we can clean up the bitmap?  If not the idea doesn't work.

> - What can happen in presence of large max_connections settings?  Is
>   this a real problem?

I thought about that.  50 backends is 7 bytes, 1000 backends is 128
bytes.  For a large number of backends you could just allow X
concurrent locks and use space X*4 bytes.  I think the basic issue is
that the btree can be of variable length while the bitmap has to be of
a fixed length.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
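The sizing arithmetic here is just one bit per backend per locked-row
slot, rounded up to whole bytes.  A trivial, self-contained sketch (the
raw figure for 1000 backends is 125; Bruce's 128 presumably rounds up
for padding or alignment):

#include <stdio.h>

/* Bytes needed per slot in the bitmap idea: one bit per backend. */
static unsigned
slot_bytes(unsigned max_connections)
{
    return (max_connections + 7) / 8;
}

int
main(void)
{
    printf("%u\n", slot_bytes(50));     /* 7   */
    printf("%u\n", slot_bytes(1000));   /* 125 */
    return 0;
}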
[HACKERS] Shared row locking
Hi,

I've been thinking about how to do shared row locking.  These are some
very preliminary ideas on this issue.  Please comment; particularly if
any part of it sounds unworkable or too incomplete.

There are several problems to be solved here: the grammar, the internal
SelectStmt representation, how to store and share the info between
backends, how to clean up at transaction end, and how to clean up at
backend crash.


The Grammar
===========

The SQL spec does not say anything in this respect (that I can find).
It only talks of "FOR UPDATE" and "FOR READ ONLY".

However, because the FK code uses SPI to do the locking, we definitely
have to expose the functionality through SQL.  So I think we need a new
clause, which I propose to be "FOR SHARE".


The Parser and SelectStmt
=========================

The parser uses for_update_clause and opt_for_update_clause
nonterminals.  I assume it's best to change them to (new)
locking_clause, which can in turn be for_update_clause or (new)
for_share_clause.

SelectStmt currently has a forUpdate field (a List of the to-be-locked
tables, or an empty list meaning all of them).  We could simply add
another list, say forShare, or use a common list and a flag saying that
it's one or the other.  I prefer adding a new list.  (Same with the
Query node.)


How to Store the Info
=====================

This is the really interesting part.  I have two ideas, one mine
(btree) and the other Bruce's (bitmap).

Using a B-tree
--------------

When a backend wants to lock a tuple, it sets a bit in its infomask.
Then it inserts into a btree in a special tablespace, using
RelationId-BlockNumber-OffsetNumber as key, and BackendId-TransactionId
as value; actually, an array with a single element containing those two
values.

When a backend wants to lock a tuple that is already locked, it goes to
the btree and inserts itself into the array.  To do this, it inserts a
new index item (the enlarged array) and deletes the previous one.  No
other backend may want to insert simultaneously (thus causing an ugly
race condition), because we hold an exclusive lock on the tuple's heap
page's buffer.

At transaction end, nothing special happens (tuples are not unlocked
explicitly).

When someone wants to know if the tuple is locked (to mark it FOR
UPDATE, or to delete it), it checks the infomask.  If it says it's
locked, it goes to check the btree.  If the array contains only
BackendId-TransactionId pairs that are no longer running, then the
tuple is not locked and can be deleted/marked (and the btree can be
cleaned up).  Else, it will have to wait, using XactLockTableWait, for
the first transaction in the array that is still running.  We can be
sure that no one will try to share-lock the tuple while we check the
btree because we hold an exclusive lock on the tuple's heap page's
buffer.

Note that to check whether a transaction is running we need to lock
SInvalLock.  To minimize the time we hold it, we save the BackendId so
we don't have to scan the whole shmInvalBuffer->procState array, only
the item that we need to look at.  Another possibility would be to use
stock TransactionIdIsInProgress and save the extra 4 bytes of storage.

At server restart, the btree is created empty (or just deleted).  There
is one btree per database.

Using a Bitmap
--------------

First we create a counter called the shared lock row counter.  Then we
create a file like pg_clog, in which each counter slot has a bit for
every backend.

When we want to share-lock a row, we increment the counter and put that
counter value on the row, and set our backend bit in the new file.  We
also store that counter value in our backend-local memory.

On commit we go through that local memory list and reset all our bits
for those counters.  When a row has all zeros, it can be recycled like
we do with pg_clog.


Problems and random comments
============================

There is a possibility of starvation, if somebody wants to lock a tuple
exclusively and shared lockers are coming all the time.  Not sure how
to solve this.

The wakeup mechanism is not discussed ... is there a special need for
something beyond what we can do with XactLockTable{Insert,Wait}?

Thanks to tablespaces, it's very easy to create special Relations that
can be dealt with by the standard buffer and storage managers, etc.

The btree idea:

- does not need crash recovery.  Maybe we could use a stripped down
  version of nbtree.  This could cause a maintainability nightmare.

- can't hold more than 300 or so simultaneous lockers (because of value
  length, limited to 1/3 of a page).  I doubt this is a real problem.

- could have problems (excessive storage requirements) in the long run
  because of empty or almost-empty pages.

The bitmap idea:

- seems to have lower overhead

- can use the same lazy cleanup mechanism proposed for the btree idea
  (in which case we don't need the list in local memory).

- What can happen in presence of large max_connections settings?  Is
  this a real problem?

--
Alvaro
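For concreteness, here is a rough C sketch of the btree entry described
under "Using a B-tree" above, key and value layout only.  Every struct
and field name is invented for illustration and matches no actual
patch; the component types are the usual ones from the PostgreSQL tree:

#include "postgres.h"
#include "storage/backendid.h"  /* BackendId */
#include "storage/block.h"      /* BlockNumber */
#include "storage/off.h"        /* OffsetNumber */

/* Key: identifies the share-locked tuple. */
typedef struct ShareLockKey
{
    Oid           relid;    /* RelationId   */
    BlockNumber   blkno;    /* BlockNumber  */
    OffsetNumber  offnum;   /* OffsetNumber */
} ShareLockKey;

/* Value: a variable-length array of these.  Adding a locker means
 * inserting an enlarged copy of the array and deleting the old index
 * item, which is race-free because the tuple's heap page buffer is
 * held exclusive-locked throughout. */
typedef struct ShareLocker
{
    BackendId      backendId;   /* lets us probe procState directly */
    TransactionId  xid;         /* checked against running xacts    */
} ShareLocker;

The 1/3-of-a-page value-length limit mentioned in the pros/cons list is
what caps the array at roughly 300 ShareLocker entries.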
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> On December 16, 2004 12:37 pm, Tom Lane wrote:
>> I think you are in need of the local equivalent to GNU ld's -E or
>> --export-dynamic switch, ie, make sure that all global symbols within
>> the backend will be available to dynamically loaded libraries.

> xild just gets invoked as a wrapper to gnu ld by the looks of it.

Hmm.  I see in Makefile.freebsd:

ifdef ELF_SYSTEM
export_dynamic = -export-dynamic
rpath = -R$(rpathdir)
shlib_symbolic = -Wl,-Bsymbolic -lc
endif

Perhaps ELF_SYSTEM isn't getting defined?  Or maybe it's being picky
about seeing --export-dynamic instead of -export-dynamic?  Or you need
to spell it like "-Wl,-E" to get it past icc and into the linker?

You did not show the build log, but take a look at how the postgres
executable is getting linked to see if -export-dynamic is getting in
there or not.

			regards, tom lane
Re: [HACKERS] Port report: NetBSD 2.0 mac68k
I will apply the ASM changes that affect only NetBSD mac68k ELF.

Your patch has been added to the PostgreSQL unapplied patches list at:

	http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Rémi Zara wrote:
> Hi,
>
> Here is a port report for NetBSD 2.0 mac68k, with sources of
> postgresql 8.0.0rc1.
>
> Here is the configure line used:
> ./configure --prefix=/data/postgresql/pgsql-8.0.0rc1 --with-openssl
> --with-python --with-perl --with-tcl --with-krb5 --with-pam
>
> But some tweaking was necessary to make it work:
> * krb5.h is in /usr/include/krb5 on netbsd (set via CPPFLAGS)
> * krb5_encrypt is to be found in -lkrb5 -ldes -lasn1 -lroken -lcrypto
>
> --enable-thread-safety does not work because the thread safety test
> fails (src/tools/thread/thread_test):
> configure:18831: ./conftest
> conftest in free(): error: freelist is destroyed.
> [1] Abort trap (core dumped) ./conftest${ac_e...
>
> Then the tas code in src/backend/storage/lmgr/s_lock.c cannot be
> compiled and linked on this system without modification:
> the '_' in front of the tas symbol should be removed, and '%' added in
> front of register names.  I've attached a diff that makes these
> modifications only for NetBSD mac68k ELF.
>
> With these modifications, make and make install are OK!
>
> template1=# SELECT version();
>                               version
> ----------------------------------------------------------------------
>  PostgreSQL 8.0.0rc1 on m68k-unknown-netbsdelf2.0, compiled by GCC gcc
>  (GCC) 3.3.3 (NetBSD nb3 20040520)
> (1 row)
>
> In make check, two tests fail: float8 and misc.
> I've attached the regression.diffs file.
>
> Regards,
>
> Rémi Zara
>
> --
> Rémi Zara
> http://www.remi-zara.net/
>
> [ Attachment, skipping... ]
>
> [ Attachment, skipping... ]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] LISTEN/NOTIFY improvements?
On Fri, Dec 17, 2004 at 09:56:05AM +1100, Gavin Sherry wrote:
> I've got a patch floating around that does this and also moves
> LISTEN/NOTIFY into the shared inval code

Uh, what will happen with idle backends?

--
Alvaro Herrera (<[EMAIL PROTECTED]>)
Maybe there's lots of data loss but the records of data loss are also
lost.  (Lincoln Yeoh)
Re: [HACKERS] LISTEN/NOTIFY improvements?
On Thu, 16 Dec 2004, Aaron Hillegass wrote:
> I am a programmer who works on a couple of products that use
> PostgreSQL as their backend (http://www.nopali.com,
> http://www.iwanttops.com).  Both my clients need to receive
> notifications when rows in the database change.  Thus far, we've been
> rolling our own because LISTEN/NOTIFY doesn't do what we need.
>
> What we need is on the TODO list:
> - Add optional textual message to NOTIFY: This would allow an
>   informational message to be added to the notify message, perhaps
>   indicating the row modified or other custom information.
>
> I want to help get this done for 8.1, but lack the deep understanding
> necessary.  Is there someone on this list that I can beg/bribe to make
> this happen?  I would be delighted to do stress-testing and
> benchmarking on the resulting code.
>
> (Incidentally, there are two related TODO items you might want to take
> care of while you are in there:
> - Allow NOTIFY in rules involving conditionals
> - Allow LISTEN/NOTIFY to store info in memory rather than tables?
>   Currently LISTEN/NOTIFY information is stored in pg_listener.
>   Storing such information in memory would improve performance.)
>
> Regardless, thank you for all the great work that you have been doing;
> PostgreSQL has made my life a better place.

I've got a patch floating around that does this and also moves
LISTEN/NOTIFY into the shared inval code -- which means it's faster but
still suffers the problem of relying on a statically sized shared
memory buffer.  The point is, I can get the patch into shape without
too much trouble and send it in for 8.1.

The great thing is that the patch does not break the FE/BE protocol,
because Tom had the foresight to allow optional messages with NOTIFY
events.  You legend.

Thanks,

Gavin
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
On December 16, 2004 02:28 pm, Tom Lane wrote:
> Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> > On December 16, 2004 12:37 pm, Tom Lane wrote:
> >> I think you are in need of the local equivalent to GNU ld's -E or
> >> --export-dynamic switch, ie, make sure that all global symbols within
> >> the backend will be available to dynamically loaded libraries.
> >
> > basically it's done via a
> > CC=icc LD=xild ./configure ...
> >
> > and for reference find attached the icc -help output.
>
> That's the wrong list --- what about the linker's options?

xild just gets invoked as a wrapper to gnu ld by the looks of it.  But
just in case:

> 			regards, tom lane

--
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx: 250.763.1759
http://www.wavefire.com

Usage: ld [options] file...
Options:
  -E, --export-dynamic        Export all dynamic symbols
  -Bsymbolic                  Bind global references locally
  [ remainder of the GNU ld option listing, skipping... ]
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> On December 16, 2004 12:37 pm, Tom Lane wrote:
>> I think you are in need of the local equivalent to GNU ld's -E or
>> --export-dynamic switch, ie, make sure that all global symbols within
>> the backend will be available to dynamically loaded libraries.

> basically it's done via a
> CC=icc LD=xild ./configure ...
> and for reference find attached the icc -help output.

That's the wrong list --- what about the linker's options?

			regards, tom lane
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
On December 16, 2004 01:49 pm, Bruce Momjian wrote:
> Andrew Dunstan wrote:
> > Bruce Momjian wrote:
> > > Darcy Buskermolen wrote:
> > >> As per
> > >> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
> > >>
> > >> This combination of OS/compiler does not result in a working copy.
> > >>
> > >> Andrew and I have been digging into this for better than a week
> > >> now, and something just doesn't look right.
> > >
> > > That is strange.  The problem line appears to be:
> > >
> > >   creating conversions ... FATAL: could not load library
> > >   ... /postgresql/ascii_and_mic.so": dlopen
> > >   ... /postgresql/ascii_and_mic.so' failed.
> > >   ... /postgresql/ascii_and_mic.so: Undefined symbol "pg_mic2ascii")
> > >
> > > I wonder if running ldconfig would help.
> >
> > What would it do?  This is failing at the "make check" stage, before
> > it's even installed (and buildfarm doesn't install to a standard
> > location anyway).
>
> Sometimes pre-existing libraries that are already installed override
> even the regression initdb on some platforms.  It is just a guess.

And in this case it's a "virgin", not having an installed copy of pg.

--
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx: 250.763.1759
http://www.wavefire.com
[HACKERS] LISTEN/NOTIFY improvements?
I am a programmer who works on a couple of products that use PostgreSQL
as their backend (http://www.nopali.com, http://www.iwanttops.com).
Both my clients need to receive notifications when rows in the database
change.  Thus far, we've been rolling our own because LISTEN/NOTIFY
doesn't do what we need.

What we need is on the TODO list:

- Add optional textual message to NOTIFY: This would allow an
  informational message to be added to the notify message, perhaps
  indicating the row modified or other custom information.

I want to help get this done for 8.1, but lack the deep understanding
necessary.  Is there someone on this list that I can beg/bribe to make
this happen?  I would be delighted to do stress-testing and
benchmarking on the resulting code.

(Incidentally, there are two related TODO items you might want to take
care of while you are in there:

- Allow NOTIFY in rules involving conditionals

- Allow LISTEN/NOTIFY to store info in memory rather than tables?
  Currently LISTEN/NOTIFY information is stored in pg_listener.
  Storing such information in memory would improve performance.)

Regardless, thank you for all the great work that you have been doing;
PostgreSQL has made my life a better place.

Sincerely,

Aaron Hillegass
Big Nerd Ranch, Inc.
404-210-5663
Re: [HACKERS] [PERFORM] UNION ALL vs INHERITANCE
Adi Alurkar <[EMAIL PROTECTED]> writes:
> Why does the append resulting from a inheritance take longer than one
> resulting from UNION ALL?

The index scan is where the time difference is:

>  ->  Index Scan using fftiallbgrgfid_1102715649 on f_f_all_base
>      (cost=0.00..3.52 rows=1 width=51)
>      (actual time=3.871..244.356 rows=28 loops=1)
>        Index Cond: (group_id = 78745)
>        Filter: (all_tidx @@ '\'mmcach\''::tsquery)

>  ->  Index Scan using fftiallbgrgfid_1102715649 on f_f_all_base
>      (cost=0.00..3.52 rows=1 width=51)
>      (actual time=3.714..79.996 rows=28 loops=1)
>        Index Cond: (group_id = 78745)
>        Filter: (all_tidx @@ '\'mmcach\''::tsquery)

One would have to suppose this is a caching effect, ie, the data is
already in RAM on the second try and doesn't have to be read from disk
again.

			regards, tom lane
[HACKERS] Updates for beta/rc stamping
FYI, Magnus just pointed out that src/include/pg_config.h.win32 had the
version string of 8.0devel rather than the correct 8.0.0rc1.

I have added a mention in tools/RELEASE_CHANGES that this file should
be updated during beta and rc as well as final release.  This file is
used by VC and BCC for non-configure-run win32-client-only builds.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
On December 16, 2004 12:37 pm, Tom Lane wrote:
> Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> > As per
> > http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
> >
> > This combination of OS/compiler does not result in a working copy.
>
> The failure is
> ascii_and_mic.so: Undefined symbol "pg_mic2ascii"
>
> I think you are in need of the local equivalent to GNU ld's -E or
> --export-dynamic switch, ie, make sure that all global symbols within
> the backend will be available to dynamically loaded libraries.

basically it's done via a

CC=icc LD=xild ./configure ...

and for reference find attached the icc -help output.

--
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx: 250.763.1759
http://www.wavefire.com

Intel(R) C++ Compiler Help
==========================
usage: icc [options] file1 [file2 ...]

   where options represents zero or more compiler options

   fileN is a C/C++ source (.C .c .cc .cpp .cxx .i), assembly (.s),
   object (.o), static library (.a), or other linkable file

[ remainder of the icc compiler option listing (-O1/-O2/-O3, -ip/-ipo,
  -tpp*, -ax/-x, -prof_* and friends), skipping... ]
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Andrew Dunstan wrote:
>> What would it do?  This is failing at the "make check" stage, before
>> it's even installed (and buildfarm doesn't install to a standard
>> location anyway).

> Sometimes pre-existing libraries that are already installed override
> even the regression initdb on some platforms.  It is just a guess.

The "undefined" symbol is in the main backend, not in any shared
library.

			regards, tom lane
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Tom Lane wrote:

> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> To answer your question about how to choose it, you do it through the
>> buildfarm member's config file - example (showing use of ccache) can
>> be seen here:
>> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/build-farm.conf?rev=1.3&content-type=text/x-cvsweb-markup
>
> I see.  Would it be reasonable for the buildfarm members to provide
> links to the config files they're using?  You'd have to do something
> about separating out the passwords :-(

Not a link, but I could easily have them report the contents of the
config hash, with the password removed.  Feature request filed on
pgfoundry.

cheers

andrew
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> To answer your question about how to choose it, you do it through the
> buildfarm member's config file - example (showing use of ccache) can
> be seen here:
> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/build-farm.conf?rev=1.3&content-type=text/x-cvsweb-markup

I see.  Would it be reasonable for the buildfarm members to provide
links to the config files they're using?  You'd have to do something
about separating out the passwords :-(

			regards, tom lane
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Andrew Dunstan wrote:
> Bruce Momjian wrote:
> > Darcy Buskermolen wrote:
> >> As per
> >> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
> >>
> >> This combination of OS/compiler does not result in a working copy.
> >>
> >> Andrew and I have been digging into this for better than a week now,
> >> and something just doesn't look right.
> >
> > That is strange.  The problem line appears to be:
> >
> >   creating conversions ... FATAL: could not load library
> >   ... /postgresql/ascii_and_mic.so": dlopen
> >   ... /postgresql/ascii_and_mic.so' failed.
> >   ... /postgresql/ascii_and_mic.so: Undefined symbol "pg_mic2ascii")
> >
> > I wonder if running ldconfig would help.
>
> What would it do?  This is failing at the "make check" stage, before
> it's even installed (and buildfarm doesn't install to a standard
> location anyway).

Sometimes pre-existing libraries that are already installed override
even the regression initdb on some platforms.  It is just a guess.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] Port report: NetBSD 2.0 mac68k
I am confused by the threading failure.  I don't see any free() call in
thread_test.c.  Would you go to the tools/thread directory, run the
program manually, and use a debugger to see the failure line?  Is there
some threading flag NetBSD requires for compiles or linking?

---------------------------------------------------------------------------

Rémi Zara wrote:
> Hi,
>
> Here is a port report for NetBSD 2.0 mac68k, with sources of
> postgresql 8.0.0rc1.
>
> Here is the configure line used:
> ./configure --prefix=/data/postgresql/pgsql-8.0.0rc1 --with-openssl
> --with-python --with-perl --with-tcl --with-krb5 --with-pam
>
> But some tweaking was necessary to make it work:
> * krb5.h is in /usr/include/krb5 on netbsd (set via CPPFLAGS)
> * krb5_encrypt is to be found in -lkrb5 -ldes -lasn1 -lroken -lcrypto
>
> --enable-thread-safety does not work because the thread safety test
> fails (src/tools/thread/thread_test):
> configure:18831: ./conftest
> conftest in free(): error: freelist is destroyed.
> [1] Abort trap (core dumped) ./conftest${ac_e...
>
> Then the tas code in src/backend/storage/lmgr/s_lock.c cannot be
> compiled and linked on this system without modification:
> the '_' in front of the tas symbol should be removed, and '%' added in
> front of register names.  I've attached a diff that makes these
> modifications only for NetBSD mac68k ELF.
>
> With these modifications, make and make install are OK!
>
> template1=# SELECT version();
>                               version
> ----------------------------------------------------------------------
>  PostgreSQL 8.0.0rc1 on m68k-unknown-netbsdelf2.0, compiled by GCC gcc
>  (GCC) 3.3.3 (NetBSD nb3 20040520)
> (1 row)
>
> In make check, two tests fail: float8 and misc.
> I've attached the regression.diffs file.
>
> Regards,
>
> Rémi Zara
>
> --
> Rémi Zara
> http://www.remi-zara.net/
>
> [ Attachment, skipping... ]
>
> [ Attachment, skipping... ]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Bruce Momjian wrote:

> Darcy Buskermolen wrote:
>> As per
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
>>
>> This combination of OS/compiler does not result in a working copy.
>>
>> Andrew and I have been digging into this for better than a week now,
>> and something just doesn't look right.
>
> That is strange.  The problem line appears to be:
>
>   creating conversions ... FATAL: could not load library
>   ... /postgresql/ascii_and_mic.so": dlopen
>   ... /postgresql/ascii_and_mic.so' failed.
>   ... /postgresql/ascii_and_mic.so: Undefined symbol "pg_mic2ascii")
>
> I wonder if running ldconfig would help.

What would it do?  This is failing at the "make check" stage, before
it's even installed (and buildfarm doesn't install to a standard
location anyway).

cheers

andrew
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Tom Lane wrote:

> Darcy Buskermolen <[EMAIL PROTECTED]> writes:
>> As per
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
>>
>> This combination of OS/compiler does not result in a working copy.
>
> The failure is
> ascii_and_mic.so: Undefined symbol "pg_mic2ascii"
>
> I think you are in need of the local equivalent to GNU ld's -E or
> --export-dynamic switch, ie, make sure that all global symbols within
> the backend will be available to dynamically loaded libraries.
>
> BTW, how did you get configure to choose icc?  If you used a CC
> environment variable, it seems like the buildfarm output is missing
> important information by not reporting it.

That log shows:

ccache icc -g -fpic -DPIC -I../../../src/interfaces/libpq -I../../../src/include -c -o regress.o regress.c
xild -x -shared -o regress.so regress.o

The status page also says this regarding the member:

FreeBSD 6-CURRENT Intel(tm) CC 7.1 Build 20040901Z ia32

I can highlight this and more info if required - just tell me what you
want to know.  I already have on my list your request for version
numbers for changed files.  It's a bit of a balancing act between
providing too much info and not enough.

To answer your question about how to choose it, you do it through the
buildfarm member's config file - example (showing use of ccache) can be
seen here:

http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/build-farm.conf?rev=1.3&content-type=text/x-cvsweb-markup

cheers

andrew
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Darcy Buskermolen <[EMAIL PROTECTED]> writes:
> As per
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
>
> This combination of OS/compiler does not result in a working copy.

The failure is
ascii_and_mic.so: Undefined symbol "pg_mic2ascii"

I think you are in need of the local equivalent to GNU ld's -E or
--export-dynamic switch, ie, make sure that all global symbols within
the backend will be available to dynamically loaded libraries.

BTW, how did you get configure to choose icc?  If you used a CC
environment variable, it seems like the buildfarm output is missing
important information by not reporting it.

			regards, tom lane
Re: [HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
Darcy Buskermolen wrote:
> As per
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18
>
> This combination of OS/compiler does not result in a working copy.
>
> Andrew and I have been digging into this for better than a week now,
> and something just doesn't look right.

That is strange.  The problem line appears to be:

	creating conversions ... FATAL: could not load library
	... /postgresql/ascii_and_mic.so": dlopen
	... /postgresql/ascii_and_mic.so' failed.
	... /postgresql/ascii_and_mic.so: Undefined symbol "pg_mic2ascii")

I wonder if running ldconfig would help.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] Port report: Fedora Core 3 x86_64
Bernd Helmle <[EMAIL PROTECTED]> writes:
> Peter Eisentraut <[EMAIL PROTECTED]> wrote:
>> Tom Lane wrote:
>>> "Andrew Dunstan" <[EMAIL PROTECTED]> writes:
>>>> Instead of doing that, do:
>>>> --with-includes=/usr/include/et
>>>
>>> This same workaround is in the RPMs.  I wonder if it would be
>>> worthwhile for configure to assume the above when --with-krb5 is
>>> mentioned.  I don't know how widespread this particular file layout
>>> is ...
>>
>> I have never seen it before.  I've never had to add any paths to
>> compile with Kerberos support, at least since they moved it out
>> of /usr/kerberos.

> FYI, i had the same issue here on Mandrake Linux 10.0/10.1 (32Bit),
> Community Edition and Official.

Actually, I'm wondering why we directly include <com_err.h> at all.
At least in the version of <com_err.h> I have here, that file is
included by krb5.h; so both backend/libpq/auth.c and
interfaces/libpq/fe-auth.c compile just fine with the #include
<com_err.h> diked out.

I suspect that our problems in this area all stem from trying to
include something we are not supposed to include to begin with.  It's
possible that com_err.h was considered an exported file in old krb5
releases but no longer is.

So perhaps what we should do is make the C code read

#ifdef HAVE_COM_ERR_H
#include <com_err.h>
#endif

and adjust configure to generate that symbol, but not require the
header to be found.

Comments?

			regards, tom lane
[HACKERS] port report: [FAILURE] FreeBSD 6, Intel icc7
As per
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=herring&dt=2004-12-16%2018:46:18

this combination of OS/compiler does not result in a working copy.

Andrew and I have been digging into this for better than a week now,
and something just doesn't look right.

--
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx: 250.763.1759
http://www.wavefire.com
Re: [HACKERS] join selectivity
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > Well at the moment PostGIS has a RESTRICT function that takes an expression > of the formwhere column is a column consisting of > geometries and constant is a bounding box. This is based upon histogram > statistics and works well. Are these functions that would be useful for GiST indexes in general? What's involved in pulling them into a system? I mean, for example, a database using RTREE (or GiST I guess) boxes and the @ operator. I didn't realize anyone really had any idea where to start with gathering statistics or writing selectivity functions for geometric types. It's great news to hear there's actually work in this area. -- greg ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] join selectivity
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > ... But in the case of > , if we're estimating the number of rows to return then > that becomes harder I didn't say it was easy ;-). The existing selectivity functions can't do better than a rough guess in such cases, and I don't expect you can either. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] join selectivity
Hi Tom,

> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]]
> Sent: 16 December 2004 17:56
> To: Mark Cave-Ayland
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] join selectivity
>
> "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> > OK I think I've misunderstood something more fundamental than that;
> > I understood from what you said that the RESTRICT clause is used to
> > evaluate the cost of table1.geom && table2.geom against table2.geom
> > && table1.geom (i.e. it is used to help decide which one should be
> > seq scanned and which should be index scanned in a nested loop
> > node).  So is the trick here for a commutative operator to simply
> > return the same value for both cases, as other factors such as index
> > size costs are considered elsewhere?
>
> If the operator is commutative then the result should be too.
> Really you should not be thinking about costs at all when coding a
> selectivity estimator: its charter is to estimate how many rows will
> match the condition, not to estimate costs per se.
>
> Note however that these aren't really the "same case", as you'd be
> referencing two different columns with presumably different
> statistics.

Well at the moment PostGIS has a RESTRICT function that takes an
expression of the form <column> && <constant>, where column is a column
consisting of geometries and constant is a bounding box.  This is based
upon histogram statistics and works well.  The surprise came when
writing the JOIN function and finding that the RESTRICT clause was
being called.  Now I understand that this is part of the nested loop
and not the JOIN, so that helps.  But in the case of <column> &&
<column>, if we're estimating the number of rows to return then that
becomes harder - I'm thinking pick a rectangle half the area of the
statistical rectangle for the column and return the number of rows
within that area.

> You should probably read the existing selectivity estimators in
> utils/adt/selfuncs.c.  There's a fair amount of infrastructure code in
> that file that you could borrow.  (It's not currently exported because
> it tends to change from version to version, but maybe we could think
> about making some of the routines global.)

OK, will try and find some inspiration within.

Many thanks,

Mark.

WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk
Re: [HACKERS] integer datetimes
Tom Lane wrote:

>> I'm probably going to add the flag enabling it to the default
>> buildfarm setup.
>
> This should be selected for some buildfarm members but not all, just
> like other configuration options.

We're very democratic - every member gets to choose their own config ;-)

cheers

andrew
Re: [HACKERS] integer datetimes
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Has any thought been given to making integer datetimes the default on
> platforms that support it?  Are there any performance implications?

I don't know that anyone's done any serious performance comparisons.
My guess is there wouldn't be a noticeable difference, but that's just
a guess.

The real reason it's not default is portability worries.  We do not
currently have any fundamental functionality that breaks if you don't
have a working int64 datatype, and I'd just as soon not start.  I'm not
real thrilled with the idea of a platform-dependent default, either.

> I'm probably going to add the flag enabling it to the default
> buildfarm setup.

This should be selected for some buildfarm members but not all, just
like other configuration options.

			regards, tom lane
Re: [HACKERS] [Testperf-general] BufferSync and bgwriter
Josh Berkus wrote:

> Simon,
>
>> Clearly, OSDL-DBT2 is not a real world test!  That is its benefit,
>> since it is heavily instrumented and we are able to re-run it many
>> times with different parameter settings.  The application is well
>> known and doesn't suffer that badly from factors that would allow
>> certain effects to be swamped.  If it had too much randomness or
>> variation, it would be difficult to interpret.
>
> I don't think you followed me.  The issue is that for parameters
> designed to "smooth out spikes" like bgwriter and vacuum delay, it
> helps to have really bad spikes to begin with.  There's a possibility
> that the parameters (and calculations) that work well for a
> "steady-state" OLTP application are actually bad for an application
> with much more erratic usage, just as a high sort_mem is good for DSS
> and bad for OLTP.

I'm a little concerned that in an erratic, or even just a changing,
environment there isn't going to be any set of values that are
"correct".

If I've got this right, the behaviour we're trying to get is:

1. Starting from the oldest dirty block,
2. Write as many dirty blocks as you can, but don't...
3. Re-write frequently used blocks too much (wasteful)

So, can we not just keep track of two numbers:

1. Change in the number of dirty blocks this time vs last
2. Number of re-writes we perform (count collisions in a hash or
   similar - doesn't need to be perfect)

If #1 is increasing, then we need to become more active (reduce
bgwriter_delay, increase bgwriter_maxpages).  If #2 starts to go up, or
goes past some threshold, then we reduce activity (increase
bgwriter_delay, decrease bgwriter_maxpages).  If, of the last N blocks
written, C have been collisions, then assume we've run out of
low-activity blocks to write, and stop and sleep.

This has a downside that the figures will never be completely accurate,
but has the advantage that it will automatically track activity.

I'm clearly beyond my technical knowledge here, so if I haven't
understood / it's impractical / will never work, then don't be afraid
to step up and let me know.  If it helps, you could always think of me
as an idiot savant who failed his savant exams :-)

--
  Richard Huxton
  Archonet Ltd
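Richard's two-signal loop is easy to make concrete.  A toy C sketch of
the feedback rule he describes, where every name, constant, and
threshold is invented for illustration and comes from no patch:

#define MAX(a,b) ((a) > (b) ? (a) : (b))
#define MIN(a,b) ((a) < (b) ? (a) : (b))

typedef struct BgAdapt
{
    int prev_dirty;   /* dirty-block count seen last round */
    int delay_ms;     /* stands in for bgwriter_delay      */
    int maxpages;     /* stands in for bgwriter_maxpages   */
} BgAdapt;

/* Called once per bgwriter round with this round's observations. */
static void
bg_adapt(BgAdapt *st, int dirty_now, int writes, int collisions)
{
    if (dirty_now > st->prev_dirty)
    {
        /* Signal #1: dirty blocks are accumulating - be more active. */
        st->delay_ms = MAX(10, st->delay_ms / 2);
        st->maxpages = MIN(1000, st->maxpages * 2);
    }
    else if (writes > 0 && collisions * 10 > writes)
    {
        /* Signal #2: more than 10% of recent writes were re-writes of
         * hot blocks - we have run out of cold dirty blocks, back off. */
        st->delay_ms = MIN(1000, st->delay_ms * 2);
        st->maxpages = MAX(1, st->maxpages / 2);
    }
    st->prev_dirty = dirty_now;
}

The doubling/halving steps and the 10% collision threshold are
arbitrary placeholders; the point is only that both knobs move in
opposite directions under the two signals, so the loop self-corrects
without any "correct" static setting.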
Re: [HACKERS] join selectivity
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > OK I think I've misunderstood something more fundamental than that; I > understood from what you said that the RESTRICT clause is used to evaluate > the cost of table1.geom && table2.geom against table2.geom && table1.geom > (i.e. it is used to help decide which one should be seq scanned and which > should be index scanned in a nested loop node). So is the trick here for a > commutative operator to simply return the same value for both cases, as > other factors such as index size costs are considered elsewhere? If the operator is commutative then the result should be too. Really you should not be thinking about costs at all when coding a selectivity estimator: its charter is to estimate how many rows will match the condition, not to estimate costs per se. Note however that these aren't really the "same case", as you'd be referencing two different columns with presumably different statistics. > My final question would be how would can we detect the difference between > RESTRICT being called in this manner (as part ofwith > an unknown constant) as opposed towith a known > constant? You should probably read the existing selectivity estimators in utils/adt/selfuncs.c. There's a fair amount of infrastructure code in that file that you could borrow. (It's not currently exported because it tends to change from version to version, but maybe we could think about making some of the routines global.) regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] integer datetimes
Has any thought been given to making integer datetimes the default on
platforms that support it?  Are there any performance implications?  I
saw that Tom Lane raised this when it was added, and it looks like the
decision was deferred.

I know that, ceteris paribus, I would rather have fixed microsecond
precision across a slightly smaller range in every app I've ever worked
on.  Which platforms don't support it?  I did some searching on all
these questions but didn't find much info.

I'm probably going to add the flag enabling it to the default buildfarm
setup.

cheers

andrew
Re: [HACKERS] join selectivity
> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]]
> Sent: 16 December 2004 15:55
> To: Mark Cave-Ayland
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] join selectivity
>
> "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> > ...and with two indices RESTRICT is called four times.  The part I
> > find confusing is why with one index RESTRICT is called twice.
>
> [ shrug... ]  clause_selectivity doesn't try to cache the result.

Hi Tom,

OK I think I've misunderstood something more fundamental than that; I
understood from what you said that the RESTRICT clause is used to
evaluate the cost of table1.geom && table2.geom against table2.geom &&
table1.geom (i.e. it is used to help decide which one should be seq
scanned and which should be index scanned in a nested loop node).  So
is the trick here for a commutative operator to simply return the same
value for both cases, as other factors such as index size costs are
considered elsewhere?

My final question would be how can we detect the difference between
RESTRICT being called in this manner (as part of <column> && <column>
with an unknown constant) as opposed to <column> && <constant> with a
known constant?

Many thanks,

Mark.

WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk

> > I was also thinking whether calling RESTRICT when comparing with an
> > unknown value is worth doing at all, however I did think that
> > perhaps if you are using a cast to perform an operation on two
> > datatypes, then you may be able to imply something from the index,
> > such as its physical size, and hint that the planner should use a
> > particular index in preference for the other.
>
> That would be inappropriate; the index size is factored in elsewhere
> (gistcostestimate() to be specific).  Restriction selectivity
> shouldn't directly consider the existence of indexes at all.
>
> > Would it be correct to assume that returning the same value for
> > RESTRICT for both means that the planner will choose one at random?
>
> If the tables/indexes are exactly the same size then you'd get the
> same cost and the choice would be effectively random.
>
> 			regards, tom lane
Re: [HACKERS] [Testperf-general] BufferSync and bgwriter
> Hmmm, I've not seen this. For example, with people who are having trouble
> with checkpoint spikes on Linux, I've taken to recommending that they call
> sync() (via cron) every 5-10 seconds (thanks, Bruce, for the suggestion!).
> Believe it or not, this does help smooth out the spikes and give better
> overall performance in a many-small-writes situation.

The reason, IMHO, is that the checkpoint otherwise also syncs all other writes -- writes that other backends had to make to replace a buffer. Linux evidently lacks a mechanism to distribute the I/O for cached writes over time a la the bgwriter (or does not do so when already facing an I/O bottleneck).

Andreas

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
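[Editor's sketch of the quoted workaround: cron itself cannot fire more often than once a minute, so in practice the recommendation means a tiny looping helper (or an equivalent shell loop) started from cron or an init script. The 5-second interval is the quoted number; everything else here is assumed.]

    #include <unistd.h>

    int
    main(void)
    {
        for (;;)
        {
            sync();     /* ask the kernel to schedule all dirty buffers
                         * for write-out, so checkpoints find less
                         * accumulated work to flush at once */
            sleep(5);
        }
    }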
Re: [HACKERS] join selectivity
"Mark Cave-Ayland" <[EMAIL PROTECTED]> writes: > ...and with two indices RESTRICT is called four times. The part I find > confusing is why with one index that RESTRICT is called twice. [ shrug... ] clause_selectivity doesn't try to cache the result. > I was also thinking whether calling RESTRICT when comparing with an unknown > value is worth doing at all, however I did think that perhaps if you are > using a cast to perform an operation on two datatypes, then you may be able > to imply something from the index, such as its physical size, and hint that > the planner should use a particular index in preference for the other. That would be inappropriate; the index size is factored in elsewhere (gistcostestimate() to be specific). Restriction selectivity shouldn't directly consider the existence of indexes at all. > Would it be correct to assume that if returning the same value for > RESTRICT for both means that the planner will choose one at random? If the tables/indexes are exactly the same size then you'd get the same cost and the choice would be effectively random. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] join selectivity
Hi Tom,

> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]]
> Sent: 13 December 2004 17:16
> To: Mark Cave-Ayland
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [HACKERS] join selectivity
>
> "Mark Cave-Ayland" <[EMAIL PROTECTED]> writes:
> > For a query like this:
> >
> > SELECT id FROM table1, table2
> > WHERE table1.geom && table2.geom;
> >
> > RESTRICT selectivity is invoked twice and
> > JOIN selectivity is invoked once.
>
> Hm, are you testing in a context where both tables have indexes that are
> relevant to the && operator?
>
> The estimated join result size is computed from the join selectivity
> estimate for the && operator. I was about to say that restriction
> selectivity wouldn't be used at all, but on second thought I believe
> that it would be invoked while considering nestloop with inner indexscan
> plans. That is, we'd consider
>
>     NestLoop
>         Seq Scan on table2
>         Indexscan on table1
>             IndexCond: table1.geom && outer.geom
>
> and to determine the estimated cost of each indexscan, we would invoke
> restriction selectivity for &&, with varRelid referencing table1. Given
> this call you are supposed to treat table2.geom as a constant of
> uncertain value, so the thing is semantically sensible as a restriction
> clause for table1 (whether you can produce a really good estimate is
> another question :-().
>
> Similarly, we'd consider the reverse plan with table1 as outer, and that
> would give rise to another restriction selectivity check with varRelid =
> table2.

Just to clarify, here are the explain results from strk's query:

strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.05)
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.27..105.84 rows=1 width=64) (actual time=0.217..39.305 rows=2700 loops=1)
   Join Filter: ("inner".geom && "outer".geom)
   ->  Seq Scan on test2  (cost=0.00..28.32 rows=132 width=32) (actual time=0.081..1.111 rows=108 loops=1)
   ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual time=0.001..0.011 rows=25 loops=108)
         ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual time=0.043..0.129 rows=25 loops=1)
 Total runtime: 40.471 ms
(6 rows)

so with no indices the JOIN function is called once, RESTRICT never. I can understand this :)

strk=# create index test2_gist on test2 using gist (geom gist_geometry_ops);
CREATE INDEX
strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.05)
NOTICE:  LWGEOM_gist_sel called
NOTICE:  no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:  no constant arguments - returning default selectivity
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.27..92.11 rows=1 width=64) (actual time=0.046..39.219 rows=2700 loops=1)
   Join Filter: ("inner".geom && "outer".geom)
   ->  Seq Scan on test2  (cost=0.00..28.08 rows=108 width=32) (actual time=0.009..0.198 rows=108 loops=1)
   ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual time=0.000..0.013 rows=25 loops=108)
         ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual time=0.002..0.052 rows=25 loops=1)
 Total runtime: 40.307 ms
(6 rows)

...with one index RESTRICT is called twice.
strk=# create index test1_gist on test1 using gist (geom gist_geometry_ops);
CREATE INDEX
strk=# explain analyze select * from test1, test2 where test1.geom && test2.geom;
NOTICE:  LWGEOM_gist_joinsel called (returning 0.05)
NOTICE:  LWGEOM_gist_sel called
NOTICE:  no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:  no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:  no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:  no constant arguments - returning default selectivity
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.27..92.11 rows=1 width=64) (actual time=0.052..38.867 rows=2700 loops=1)
   Join Filter: ("inner".geom && "outer".geom)
   ->  Seq Scan on test2  (cost=0.00..28.08 rows=108 width=32) (actual time=0.012..0.181 rows=108 loops=1)
   ->  Materialize  (cost=3.27..3.52 rows=25 width=32) (actual time=0.000..0.010 rows=25 loops=108)
         ->  Seq Scan on test1  (cost=0.00..3.25 rows=25 width=32) (actual time=0.002..0.032 rows=25
Re: [Fwd: Re: [HACKERS] race condition for drop schema cascade?]
I have rebuilt the filesystem on my indy (MIPS) that Andrew reported on. The first run completed 100%; I would give it a couple more runs before we can say it's the filesystem, not PostgreSQL, that was causing the drop to fail.

-- Original Message ---
From: Andrew Dunstan <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wed, 15 Dec 2004 16:42:59 -0500
Subject: [Fwd: Re: [HACKERS] race condition for drop schema cascade?]

> Jim, please advise?
>
> thanks
>
> andrew
>
> Original Message
> Subject: Re: [HACKERS] race condition for drop schema cascade?
> Date: Wed, 15 Dec 2004 16:29:01 -0500
> From: Tom Lane <[EMAIL PROTECTED]>
> To: Andrew Dunstan <[EMAIL PROTECTED]>
> CC: PostgreSQL-development <[EMAIL PROTECTED]>
> References: <[EMAIL PROTECTED]>
>
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > I have seen this failure several times, but not consistently, on the
> > buildfarm member otter (Debian/MIPS) and possibly on others, and am
> > wondering if it indicates a possible race condition on DROP SCHEMA
> > CASCADE.
>
> Hard to see what, considering that there's only one backend touching
> that tablespace in the test. I'd be inclined to wonder if there's a
> filesystem-level problem on that platform. What filesystem are you
> running on anyway?
>
> regards, tom lane
--- End of Original Message ---

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] bgwriter changes
Zeugswetter Andreas DAZ SD wrote:
> This has the disadvantage of converging against 0 dirty pages. A system
> that has less than maxpages dirty will write every page with every
> bgwriter run.

Yeah, I'm concerned about the bgwriter being overly aggressive if we disable bgwriter_percent. If we leave the settings as they are (delay = 200, maxpages = 100, shared_buffers = 1000 by default), we will be writing all the dirty pages to disk every 2 seconds, which seems far too much.

It might also be good to reduce the delay, in order to more proactively keep the LRUs clean (e.g. scanning to find N dirty pages once per second is likely to reach farther away from the LRU than scanning for N/M pages every 1/M seconds). On the other hand, the more often the bgwriter scans the buffer pool, the more times the BufMgrLock needs to be acquired -- and in a system in which pages aren't being dirtied very rapidly (or the dirtied pages tend to be very hot), each of those scans is going to take a while to find enough dirty pages using #2. So perhaps it is best to leave the delay as is for 8.0.

> This might have the disadvantage of either leaving too much for the
> checkpoint or writing too many dirty pages in one run. Is writing a lot
> in one run actually a problem though? Or does the bgwriter pause
> periodically while writing the pages of one run?

The bgwriter does not pause between writing pages. What would be the point of doing that? The kernel is going to be caching the write() anyway.

> If this is expressed in pages it would naturally need to be more than
> the current maxpages (to accommodate for clean pages). The suggested 2%
> sounded way too low for me (that leaves 98% to the checkpoint).

I agree this might be a problem, but it doesn't necessarily leave 98% to be written at checkpoint: if the buffers in the LRU change over time, the set of pages searched by the bgwriter will also change. I'm not sure how quickly the pages near the LRU change in a "typical workload"; moreover, I think this would vary between different workloads.

-Neil

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
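[Editor's back-of-envelope to put numbers on "far too much", using the defaults quoted above (bgwriter_delay = 200 ms, bgwriter_maxpages = 100, shared_buffers = 1000) and assuming 8 kB pages; the little program is editorial, not PostgreSQL code.]

    #include <stdio.h>

    int
    main(void)
    {
        double rounds_per_sec = 1000.0 / 200.0;         /* delay = 200 ms */
        double pages_per_sec = rounds_per_sec * 100;    /* maxpages = 100 */
        double mb_per_sec = pages_per_sec * 8192 / (1024.0 * 1024.0);
        double pool_secs = 1000 / pages_per_sec;        /* 1000 buffers   */

        /* prints: 500 pages/s (3.9 MB/s); whole pool every 2 s */
        printf("%.0f pages/s (%.1f MB/s); whole pool every %.0f s\n",
               pages_per_sec, mb_per_sec, pool_secs);
        return 0;
    }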
Re: [HACKERS] bgwriter changes
> > Only if you redefine the meaning of bgwriter_percent. At present it's
> > defined by reference to the total number of dirty pages, and that
> > can't be known without collecting them all.
> >
> > If it were, say, a percentage of the total length of the T1/T2 lists,
> > then we'd have some chance of stopping the scan early.
>
> The other way around would make sense. In order to avoid writing the
> busiest buffers at all (except for checkpointing), the parameter should
> mean "don't scan the last x% of the queue at all".

Your meaning is 1 - the above meaning (at least that is what Tom and I meant), but it is probably easier to understand (== Informix LRU_MIN_DIRTY).

> Still, we need to avoid scanning over all the clean blocks of a large
> buffer pool, so there is need for a separate dirty-LRU.

Maybe a "may be dirty" bitmap would be easier to do without being deadlock-prone?

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
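[Editor's toy sketch of the "may be dirty" idea floated above -- speculation, not proposed code; NBUFFERS and both function names are invented. One flag byte per buffer sidesteps the read-modify-write race a packed bitmap would have: setting a byte is a plain store, so the hint can only over-report dirtiness, and the bgwriter wastes at most one visit on a buffer that was cleaned after its flag was set.]

    #define NBUFFERS 1000   /* assumed pool size */

    static volatile unsigned char maybe_dirty[NBUFFERS];

    /* Called by any backend as it dirties a buffer; no lock taken. */
    void
    mark_maybe_dirty(int buf_id)
    {
        maybe_dirty[buf_id] = 1;
    }

    /* bgwriter scan: visit only flagged buffers, skipping clean ones. */
    int
    next_maybe_dirty(int start)
    {
        int i;

        for (i = start; i < NBUFFERS; i++)
        {
            if (maybe_dirty[i])
            {
                maybe_dirty[i] = 0;   /* clear first; a concurrent
                                       * re-dirty simply re-sets it */
                return i;
            }
        }
        return -1;                    /* nothing flagged this round */
    }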
Re: [HACKERS] getting 'order by' working with unicode locale? ICU?
Palle Girgensohn wrote:
> Not on FreeBSD, since collation is not implemented in unicode locales.
> One way would be to implement it in the OS, of course...

Try taking the locale definition files from another system and using localedef to build locale files for your local system. The localedef source files are supposed to be portable.

-- Peter Eisentraut http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 8: explain analyze is your friend