Re: [HACKERS] Additional stats for Relations
Hi Jim,

On 10/13/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
> On Fri, Oct 13, 2006 at 06:17:47PM +0530, NikhilS wrote:
> > Currently a "select * from pg_statio_user_tables;" displays only heap_blks_read and heap_blks_hit stats, amongst others, for the main relation. It would be good to have the following stats collected too. I think these stats can be used to better statistically analyze/understand the block I/O activity on the relation:
> >
> > heap_blks_reused: The number of buffers returned by the FSM for use to store a new tuple in
>
> The description on this is misleading... FSM doesn't return buffers, it returns pages that have free space on them.

Nikhils: FSM returns the block number from which we fetch the buffer. This is similar to the way we track buffer_read stats in ReadBuffer.

> I'm also not sure if this metric is what you actually want, since a single page can be returned many times from the FSM even between vacuums. Tracking how many pages for a relation have been put into the FSM might be more useful...

Nikhils: Pages might be put into the FSM, but by this metric don't we get the actual usage of the pages from the FSM? Agreed, a single page can be returned multiple times, but since it serves a new tuple each time, shouldn't we track it?

> > heap_blks_extend: The number of times file extend was invoked on the relation
> >
> > heap_blks_truncate: The total number of blocks that have been truncated due to vacuum activity, e.g.
> >
> > As an addendum to the truncate stats above, we can also have the following additional stats:
> >
> > heap_blks_maxtruncate: The maximum number of blocks truncated in one go
> >
> > heap_blks_ntruncate: The number of times truncate was called on this relation
> >
> > I can come up with a patch (already have one) for the above. Any thoughts/comments?
>
> Do you have a use-case for this info? I can see where it might be neat to know, but I'm not sure how you'd actually use it in the real world.

Nikhils: The use-case, as I see it, is that these stats help prove the effectiveness of autovacuum/vacuum operations. By varying some autovacuum GUC variables and doing subsequent runs (with pgbench, e.g.), one can find the optimum values for these variables using these stats.

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com

--
All the world's a stage, and most of us are desperately unrehearsed.
Re: [HACKERS] Interface of the R-tree in order to work with postgresql
Hi, Jorge,

jorge alberto wrote:
> I'm Jorge from Peru, South America, and this is my first post. I want to know how I can add a new spatial access method to PostgreSQL (I'm doing research on spatial access methods, reading a lot of papers and programming a lot too), and also how I can add my new data structure (if someday I get it, of course =) ). I mean, where can I find the .h that describes the interface that a spatial access method, like the R-tree, must implement in order to work with PostgreSQL? I think that to understand how PostgreSQL works with spatial access methods like the R-tree, I need to find a .h or .c to begin reading the code, so can you tell me where to start?

Besides the hints others posted here, you might want to look at contrib/tsearch2 and PostGIS for examples of how to use GiST.

HTH,
Markus

--
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
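[Editorial aside: for orientation, the GiST interface an opclass must supply is declared in src/include/access/gist.h: seven support methods (consistent, union, compress, decompress, penalty, picksplit, same), each an ordinary version-1 C function. A minimal sketch of one of them follows; my_box_consistent is a hypothetical name, and the three-argument calling convention shown is the one used in servers of this era, so treat the exact signature as an assumption.]

    #include "postgres.h"
    #include "fmgr.h"
    #include "access/gist.h"
    #include "access/skey.h"

    PG_FUNCTION_INFO_V1(my_box_consistent);

    /* GiST calls consistent() to decide whether a subtree (or leaf
     * entry) can contain matches for the query under the given
     * strategy number. */
    Datum
    my_box_consistent(PG_FUNCTION_ARGS)
    {
        GISTENTRY      *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
        Datum           query = PG_GETARG_DATUM(1);
        StrategyNumber  strategy = (StrategyNumber) PG_GETARG_UINT16(2);

        /* A real opclass compares entry->key against query according
         * to the strategy; this stub just claims a match so that the
         * skeleton compiles. */
        (void) query;
        (void) strategy;
        PG_RETURN_BOOL(entry != NULL);
    }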
Re: [HACKERS] query optimization with UDFs
Hi, Tom,

Tom Lane wrote:
> Neil Conway [EMAIL PROTECTED] writes:
> > BTW, I think it would make sense to implement a limited subset of the xfunc ideas: add options to CREATE FUNCTION to allow cost information to be specified, and then take advantage of this information instead of using the existing constant kludges. This would be a tangible improvement, and would have minimal impact on the planner.
>
> The trick is to figure out what a useful parameterized cost model would look like. IIRC, the main reason the xfunc code rotted on the vine was that its cost parameters didn't seem to be either easy to select or powerful in predicting actual cost. We'd have to do better this time.

I don't know what the xfunc people did, but at least for some varlena data types (arrays, PostGIS, text), some function costs (concatenation, GeomUnion etc.) can be estimated via the average field size of the tables. Has that idea been considered?

HTH,
Markus

--
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
Re: [HACKERS] query optimization with UDFs
Markus Schaber [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > The trick is to figure out what a useful parameterized cost model would look like. IIRC, the main reason the xfunc code rotted on the vine was that its cost parameters didn't seem to be either easy to select or powerful in predicting actual cost. We'd have to do better this time.
>
> I don't know what the xfunc people did, but at least for some varlena data types (arrays, PostGIS, text), some function costs (concatenation, GeomUnion etc.) can be estimated via the average field size of the tables. Has that idea been considered?

[ shrug... ] Concatenation is definitely not one of the functions we need to worry about. In fact, I'd say that only functions containing database accesses are really worth worrying about --- and the hard part of modeling them is the possible variation in the number of rows touched depending on their parameters.

			regards, tom lane
Re: [HACKERS] [PATCHES] New shared memory hooks proposal (was Re: pre_load_libraries)
Marc Munro [EMAIL PROTECTED] writes:
> The attached patch provides add-ins with the means to register for shared memory and LWLocks.

I finally got around to reviewing this patch, and realized that it's got a pretty fundamental design flaw: it isn't useful under Windows (or any other EXEC_BACKEND platform), because there isn't any provision for backends to locate structs allocated by other backends by means of searching in shared memory. AFAICS the code can only do something useful on a platform where allocations made in the postmaster process can be found by backends via fork inheritance of pointers.

The right way to handle shmem allocations is to use ShmemInitStruct to either allocate a shared struct for the first time or attach to a previously made instance of the struct. (This struct could be a memory allocation arena itself, but that's not the core code's problem.) Now we could extend the patch so that each addin has its own ShmemIndex within its private workspace, but I think that's probably overkill. My inclination is to rip out ShmemAllocFromContext and expect addins to use ShmemInitStruct the same as everyone else. The hook callable at shared_preload_libraries time should just serve to add the necessary amount to the computed size of the shared memory segment.

RegisterAddinLWLock is broken in the same way: it could only be used safely if the registered lock ID were remembered in shared memory, but since shared memory doesn't exist at the time it's supposed to be called, there's no way to do that. Again, it'd seem to work as long as the lock ID value were inherited via fork, but that's gonna fail on EXEC_BACKEND builds. I think we should probably take this out in favor of something that just increments a counter that replaces NUM_USER_DEFINED_LWLOCKS, and expect people to use LWLockAssign() at an appropriate time while initializing their shared memory areas.

It strikes me that there's a race condition here, which we've not seen in previous use because backends expect all standard shared memory structs to have already been initialized by the postmaster. An add-on will instead have to operate under the regime of "first backend wanting to use the struct must init it". Although ShmemInitStruct returns a "found" bool telling you you've got to init it, there's no interlock ensuring that you can do so before someone else comes along and tries to use the struct (which he'll assume is done because he sees found = true). And, per above discussion, an add-on can't solve this for itself using an add-on LWLock, because it really has to acquire its add-on locks while initializing that same shmem struct, which is where it's going to keep the locks' identity :-(

So I think we need to provide a standard LWLock reserved for the purpose of synchronizing first-time use of a shmem struct. The coding rules for an add-on would then look like:

* in the shared_preload_libraries hook:

	RequestAddinShmemSpace(size);
	RequestAddinLWLocks(n);

* in a backend, to access a shared memory struct:

	static mystruct *ptr = NULL;

	if (!ptr)
	{
		bool	found;

		LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
		ptr = ShmemInitStruct("my struct name", size, &found);
		if (!ptr)
			elog(ERROR, "out of shared memory");
		if (!found)
		{
			/* initialize contents of shmem area */
			/* acquire any requested LWLocks using: */
			ptr->mylockid = LWLockAssign();
		}
		LWLockRelease(AddinShmemInitLock);
	}

Thoughts?

			regards, tom lane
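[Editorial aside: filling in the pseudocode above, here is a compilable sketch of the proposed pattern. MyAddinState, my_addin_init, and the "my_addin state" key are hypothetical names; RequestAddinShmemSpace, RequestAddinLWLocks, and AddinShmemInitLock are the APIs proposed in this message, so the exact spellings are tentative, not a committed interface.]

    #include "postgres.h"
    #include "fmgr.h"
    #include "storage/lwlock.h"
    #include "storage/shmem.h"

    PG_MODULE_MAGIC;            /* required for loadable modules on 8.2-era servers */

    typedef struct MyAddinState
    {
        LWLockId    mylockid;   /* protects counter */
        long        counter;
    } MyAddinState;

    static MyAddinState *state = NULL;

    /* Runs at shared_preload_libraries time: only size/lock requests
     * here, since the shmem segment doesn't exist yet. */
    void
    _PG_init(void)
    {
        RequestAddinShmemSpace(MAXALIGN(sizeof(MyAddinState)));
        RequestAddinLWLocks(1);
    }

    /* Called lazily from any backend before touching the struct. */
    static void
    my_addin_init(void)
    {
        bool        found;

        if (state)
            return;             /* already attached in this backend */

        LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
        state = ShmemInitStruct("my_addin state",
                                sizeof(MyAddinState), &found);
        if (!state)
            elog(ERROR, "out of shared memory");
        if (!found)
        {
            /* first backend to attach initializes the area */
            state->counter = 0;
            state->mylockid = LWLockAssign();
        }
        LWLockRelease(AddinShmemInitLock);
    }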
[HACKERS] Problems building 8.2beta1 on macos G5 xserve
Trying to build 8.2beta1 on a MacOS G5 Xserve, OS version 10.4.7. I got this:

	gcc -no-cpp-precomp -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wendif-labels -fno-strict-aliasing -dynamiclib -install_name /usr/local/pgsql/lib/libpq.5.dylib -compatibility_version 5 -current_version 5.0 -exported_symbols_list exports.list -multiply_defined suppress fe-auth.o fe-connect.o fe-exec.o fe-misc.o fe-print.o fe-lobj.o fe-protocol2.o fe-protocol3.o pqexpbuffer.o pqsignal.o fe-secure.o md5.o ip.o wchar.o encnames.o noblock.o pgstrcasecmp.o thread.o -L../../../src/port -L/sw/lib -o libpq.5.0.dylib
	/usr/bin/libtool: for architecture: cputype (16777234) cpusubtype (0) file: -lSystem is not an object file (not allowed in a library)
	make[3]: *** [libpq.5.0.dylib] Error 1
	make[2]: *** [all] Error 2
	make[1]: *** [all] Error 2
	make: *** [all] Error 2

	% uname -a
	Darwin Sherlock 8.7.0 Darwin Kernel Version 8.7.0: Fri May 26 15:20:53 PDT 2006; root:xnu-792.6.76.obj~1/RELEASE_PPC Power Macintosh powerpc

Any suggestions?

Thanks,
Sean
[HACKERS] Not quite there on timezone names in timestamp input
This doesn't work:

	regression=# select '2006-07-13 09:20:00 EST5EDT'::timestamptz;
	ERROR:  invalid input syntax for type timestamp with time zone: "2006-07-13 09:20:00 EST5EDT"

It never has worked in the past, of course, but I expected it to work in HEAD seeing that the zic database includes EST5EDT as one of the defined zone names (and pg_timezone_names shows it as available).

On investigation, the problem seems to lie with ParseDateTime(), which is being quite finicky about what it will believe is a timezone name. In particular it won't accept digits as part of the name. That can probably be fixed, but I'm wondering about the other more or less special cases in the timezone name list, such as

	America/Port-au-Prince
	GB-Eire
	GMT-0
	Etc/GMT+8
	W-SU

In particular it seems tough to accept GMT-0 as a timezone name without breaking recognition of month-name dates such as 'Nov-09-2006'. Any thoughts about appropriate heuristics? Should we just accept a few cases where we don't recognize zic timezone names? (If so, we'd better hack pg_timezone_names to not show those cases...)

Another problem here is case folding. The existing code smashes the string to initcap format up to the '/' if any, but doesn't fold the part after the '/', leaving us with a weird combination of case sensitivity and insensitivity, not to mention another reason why EST5EDT won't work. I'd like it to be entirely case insensitive, but am not sure how to make the file lookup work that way.

Thoughts?

			regards, tom lane
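[Editorial aside: one possible heuristic, sketched purely as illustration. This is not the ParseDateTime() code; looks_like_month and could_be_timezone are hypothetical helpers. The idea is to give the month-name recognizer first claim on a mixed token, and only then try it against the zic database.]

    #include <ctype.h>
    #include <stdbool.h>
    #include <strings.h>        /* strncasecmp */

    static bool
    looks_like_month(const char *tok)
    {
        static const char *months[] = {
            "jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov", "dec"
        };
        int         i;

        for (i = 0; i < 12; i++)
            if (strncasecmp(tok, months[i], 3) == 0)
                return true;
        return false;
    }

    /* Accept "EST5EDT" or "GMT-0" as timezone candidates while still
     * letting "Nov-09-2006" parse as a month-day-year date. */
    static bool
    could_be_timezone(const char *tok)
    {
        if (!isalpha((unsigned char) tok[0]))
            return false;       /* zone names start with a letter */
        if (looks_like_month(tok))
            return false;       /* let the date parser claim it */
        return true;            /* worth a lookup in the zic database */
    }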
Re: [HACKERS] Problems building 8.2beta1 on macos G5 xserve
Sean Davis [EMAIL PROTECTED] writes:
> Trying to build 8.2beta1 on a MacOS G5 Xserve, OS version 10.4.7. I got this:
> /usr/bin/libtool: for architecture: cputype (16777234) cpusubtype (0) file: -lSystem is not an object file (not allowed in a library)

What Xcode version have you got? My recollection is that this is the symptom of trying to build with pre-10.4 Xcode tools on 10.4. (Yeah, time to download that 800MB file again :-( ... but at least it's free...)

			regards, tom lane
Re: [HACKERS] Problems building 8.2beta1 on macos G5 xserve
Tom Lane wrote:
> Sean Davis [EMAIL PROTECTED] writes:
> > Trying to build 8.2beta1 on a MacOS G5 Xserve, OS version 10.4.7. I got this:
> > /usr/bin/libtool: for architecture: cputype (16777234) cpusubtype (0) file: -lSystem is not an object file (not allowed in a library)
>
> What Xcode version have you got? My recollection is that this is the symptom of trying to build with pre-10.4 Xcode tools on 10.4. (Yeah, time to download that 800MB file again :-( ... but at least it's free...)

Looks like that might be the case (my gcc is from 2003!). I'll try again after updating and let you know for sure.

Thanks, Tom.

Sean
Re: [HACKERS] [BUGS] BUG #2683: spi_exec_query in plperl returns column names which are not marked as UTF8
Vitali Stupin [EMAIL PROTECTED] writes:
> If the database uses UTF8 encoding, then spi_exec_query in plperl should return query results in UTF8 encoding. But unfortunately only the data is marked as UTF8, while column names are not.

It looks to me like basically everywhere in plperl.c that does newSVpv() should follow it with

	#if PERL_BCDVERSION >= 0x5006000L
		if (GetDatabaseEncoding() == PG_UTF8)
			SvUTF8_on(sv);
	#endif

whereas currently there are only a couple of places that do that. I'm tempted to consolidate this into a function on the order of newSVstring(const char *) or some such. Comments?

			regards, tom lane
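[Editorial aside: a sketch of the consolidation suggested above, using the newSVstring name floated in the message. This is not the committed code; it assumes placement inside plperl.c, where the Perl headers (via ppport.h, which supplies PERL_BCDVERSION) and the encoding functions are already available.]

    /* Build an SV from a NUL-terminated C string, marking it UTF-8
     * when the database encoding is UTF8 so Perl treats it as text
     * rather than bytes. */
    static SV *
    newSVstring(const char *str)
    {
        SV         *sv = newSVpv(str, 0);   /* 0 => length via strlen */

    #if PERL_BCDVERSION >= 0x5006000L
        if (GetDatabaseEncoding() == PG_UTF8)
            SvUTF8_on(sv);
    #endif
        return sv;
    }

Each place in plperl.c that currently calls newSVpv() directly, for data values and for hash keys built from column names alike, would then call newSVstring() instead, which fixes the column-name case by construction.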
Re: [PATCHES] [HACKERS] large object regression tests
Patch applied. Thanks.

---------------------------------------------------------------------------

Jeremy Drake wrote:
> On Sun, 24 Sep 2006, Jeremy Drake wrote:
> > On Thu, 21 Sep 2006, Tom Lane wrote:
> > > I think we could do without the Moby Dick extract too ...
> >
> > I am open to suggestions. I saw one suggestion that I use an image of an elephant, but I suspect that was tongue-in-cheek. I am not very fond of the idea of generating repetitious data, as I think it would be more difficult to determine whether or not the lo_lseek/lo_tell functions put me in the right place in the middle of the file.
>
> I just had the idea that I could use one of the existing data files which are used for testing COPY instead of the Moby Dick extract. They are already there, a few of them are pretty good sized, they have data in the file which is not just simple repetition so it would be pretty obvious if the seek function broke, and they are very unlikely to change.
>
> I am considering changing the test I put together to use tenk.data as the input file tomorrow and sending in what I have again, since I am also doing a test of \lo_import (which also requires a patch to psql I sent in earlier to fix the output of the \lo_* commands to respect the output settings).
>
> --
> When does summertime come to Minnesota, you ask? Well, last year, I think it was a Tuesday.

--
Bruce Momjian [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [PATCHES] [HACKERS] large object regression tests
Sorry, please disregard. Was meant for a Japanese FAQ update.

Bruce Momjian wrote:
> Patch applied. Thanks.

--
Bruce Momjian [EMAIL PROTECTED]
EnterpriseDB   http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +