Re: [HACKERS] Rethinking stats communication mechanisms
Greg Stark <[EMAIL PROTECTED]> writes:
> PFC <[EMAIL PROTECTED]> writes:
>> - Will only be of use if the command is taking a long, long time.
>> So, it need not be realtime; no problem if the data comes with a
>> little delay, or not at all if the command executes quickly.
>
> I would dispute this point. Picture a system running a very short
> query very, very often. It may still be the main problem, may even be
> taking 90+% of the cpu time. If you got an accurate snapshot of all
> the currently executing queries you'll see it popping up suspiciously
> often.

Yeah, but if you turn on query logging in that case you'll see the
bajillions of short queries, so you don't need the accurate snapshot to
diagnose that.

-Doug

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [HACKERS] Rethinking stats communication mechanisms
PFC <[EMAIL PROTECTED]> writes:
> - Will only be of use if the command is taking a long, long time.
> So, it need not be realtime; no problem if the data comes with a
> little delay, or not at all if the command executes quickly.

I would dispute this point. Picture a system running a very short query
very, very often. It may still be the main problem, may even be taking
90+% of the cpu time. If you got an accurate snapshot of all the
currently executing queries you'll see it popping up suspiciously often.

--
greg
R: [HACKERS] Per-server univocal identifier
> I assume by 'univocal' you mean unequivocal.

Yes, sorry about that: I'm writing italish...

> Can you set it up in a table per server? or in a file? or would you
> rather use a GUID?

A per-server table will probably be my way.

> And how is this to be made available?

Well, a function would be fine.

> And is it to be unique per machine, or per cluster (since you can have
> many postgresql clusters on one machine).

If it is a per-machine discriminator, it will be a per-node
discriminator as well... Also, it will be useful to people not running a
cluster (like me), since they only need a multi-master capability on a
table for a legacy app...

> cheers
>
> andrew

Cheers,

giampaolo
Re: [HACKERS] Per-server univocal identifier
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> And is it to be unique per machine, or per cluster (since you can have
> many postgresql clusters on one machine).

Actually, there are *lots* of ambiguities there. For instance, if you
pg_dump and reload a cluster do you want the ID to change or stay the
same? How about copying the $PGDATA tree to another server? How about
redirecting the same cluster to listen on a new port number?

regards, tom lane
Re: [HACKERS] Per-server univocal identifier
I assume by 'univocal' you mean unequivocal.

Can you set it up in a table per server? or in a file? or would you
rather use a GUID? And how is this to be made available? And is it to be
unique per machine, or per cluster (since you can have many postgresql
clusters on one machine)?

cheers

andrew

Giampaolo Tomassoni wrote:
> I'm looking for a way to univocally identify the server on which a sql
> function or statement is running. My idea would be something close to
> the value returned by a 'host -f' under linux: the FQDN of the host,
> but even a serial code or a number would be fine to me. It needs only
> to be immutable, I guess.
>
> I know there is something suitable under Oracle and, even worse, under
> mysql...
>
> The purpose is mostly related to a light replication problem I have,
> in which I need to 'emulate' a multi-master replication on a table.
>
> I placed a question on the IRC list and I found a couple of unreplied
> messages asking the same thing in the pgsql-general list.
R: [HACKERS] Per-server univocal identifier
> ...omissis...
>
> Perhaps inet_server_addr() and inet_server_port() would answer. These
> aren't super-useful on local connections, however.

No, in fact. Mine are local connections...

> How "immutable" do you want it to be exactly? The system_identifier
> embedded in pg_control might be interesting if you want something that
> will change at initdb.

The same immutability of a 'host -f' would be fine to me.

> I don't think there's a way to look at that from
> SQL but you could write a C function to access it.

I would prefer to avoid writing an external module: that way I would
have to put more administration effort into upgrading the postgres
installation.

Well, I'll probably resort to putting a unique value in a table. Or...
can I put a custom variable in postgresql.conf?

---
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100
Re: [HACKERS] Per-server univocal identifier
"Giampaolo Tomassoni" <[EMAIL PROTECTED]> writes:
> I'm looking for a way to univocally identify the server on which a sql
> function or statement is running. My idea would be something close to
> the value returned by a 'host -f' under linux: the FQDN of the host,
> but even a serial code or a number would be fine to me. It needs only
> to be immutable, I guess.

Perhaps inet_server_addr() and inet_server_port() would answer. These
aren't super-useful on local connections, however.

How "immutable" do you want it to be exactly? The system_identifier
embedded in pg_control might be interesting if you want something that
will change at initdb. I don't think there's a way to look at that from
SQL but you could write a C function to access it.

regards, tom lane
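For what it's worth, the C function Tom suggests could be sketched
roughly like this. This is a minimal standalone sketch, not server code:
it assumes (as pg_control's on-disk layout does) that the file begins
with the uint64 system_identifier field, and the function name and
interface are made up for illustration — a proper server-side version
would read the already-loaded ControlFileData instead of the raw file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the cluster's system_identifier out of
 * $PGDATA/global/pg_control.  Assumes the uint64 system_identifier is
 * the first field of the file, in native byte order.
 */
static int
read_system_identifier(const char *pg_control_path, uint64_t *ident)
{
    FILE *f = fopen(pg_control_path, "rb");

    if (f == NULL)
        return -1;                  /* cannot open pg_control */
    if (fread(ident, sizeof(uint64_t), 1, f) != 1)
    {
        fclose(f);
        return -1;                  /* short read: file is damaged */
    }
    fclose(f);
    return 0;
}
```

As noted in the thread, this value changes at initdb (and survives
pg_dump/reload only if you reload into the same cluster), so whether it
is "immutable enough" depends on the replication scheme.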
[HACKERS] Per-server univocal identifier
Dear all,

I'm looking for a way to univocally identify the server on which a sql
function or statement is running. My idea would be something close to
the value returned by a 'host -f' under linux: the FQDN of the host, but
even a serial code or a number would be fine to me. It needs only to be
immutable, I guess.

I know there is something suitable under Oracle and, even worse, under
mysql...

The purpose is mostly related to a light replication problem I have, in
which I need to 'emulate' a multi-master replication on a table.

I placed a question on the IRC list and I found a couple of unreplied
messages asking the same thing in the pgsql-general list.

Regards,

---
Giampaolo Tomassoni - IT Consultant
Piazza VIII Aprile 1948, 4
I-53044 Chiusi (SI) - Italy
Ph: +39-0578-21100
Re: [HACKERS] Rethinking stats communication mechanisms
PFC <[EMAIL PROTECTED]> writes:
> So, the proposal:
> On executing a command, Backend stores the command string, then
> overwrites the counter with (counter + 1) and with the timestamp of
> command start.
> Periodically, like every N seconds, a separate process reads the
> counter, then reads the data, then reads the counter again.

Well, it wouldn't be "periodic", it'd be "whenever someone reads
pg_stat_activity". I was considering solutions like this, but I'm a
little disturbed by the possibility that the would-be reader might loop
indefinitely if the source backend is constantly changing its entry.
Still, slow reads of pg_stat_activity might be a good tradeoff for
taking overhead out of the update operation.

BTW, I think the writer would actually need to bump the counter twice,
once before and once after it modifies its stats area. Else there's no
way to detect that you've copied a partially-updated stats entry.

> If the backend process itself should update its process title, and
> this operation is costly, it should only be done if the current query
> has been running for more than T seconds.

Managing that would cost more than just doing it, I think.

regards, tom lane
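The "bump the counter twice" protocol Tom describes is essentially a
seqlock. A minimal sketch in C — names and sizes are illustrative, not
actual PostgreSQL code, and a real shared-memory version needs explicit
memory barriers so the string copy can't be reordered across the counter
updates (C11 seq_cst atomics supply that ordering here):

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define CMD_LEN 64

/* Hypothetical per-backend slot in shared memory. */
typedef struct
{
    atomic_uint change_count;   /* odd while an update is in progress */
    char        command[CMD_LEN];
} BackendEntry;

/* Writer: bump to odd, modify the entry, bump back to even. */
void
entry_write(BackendEntry *e, const char *cmd)
{
    atomic_fetch_add(&e->change_count, 1);  /* odd: update in progress */
    strncpy(e->command, cmd, CMD_LEN - 1);
    e->command[CMD_LEN - 1] = '\0';
    atomic_fetch_add(&e->change_count, 1);  /* even: update complete */
}

/* Reader: retry until the counter is even and unchanged across the copy. */
void
entry_read(BackendEntry *e, char *out)
{
    unsigned before, after;

    do
    {
        before = atomic_load(&e->change_count);
        memcpy(out, e->command, CMD_LEN);
        after = atomic_load(&e->change_count);
    } while ((before & 1) || before != after);  /* partial copy seen: retry */
}
```

Note how the reader's retry loop is exactly the spot Tom worries about:
a backend updating its entry continuously could starve the reader
indefinitely.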
Re: [HACKERS] Exporting type OID macros in a cleaner fashion
On Fri, Jun 16, 2006 at 10:58:05PM -0400, Tom Lane wrote:
> The alternative I'm currently thinking about is to build and install an
> auto-generated file comparable to fmgroids.h, containing *only* the
> type OID macro #defines extracted from pg_type.h. This would require
> just a trivial amount of sed hacking.

This is a good idea. It would be nice to have stuff useful for the
frontend available without having to pull in everything for the backend.

> I'm not entirely clear where to install such a thing though. The
> fmgroids.h precedent suggests server/utils/fmgroids.h, but if this is
> intended for client-side use it shouldn't go under /server. I'm
> tempted to install it as "pgtypeoids.h" at the top level of the
> installation include directory ... but then I'm not clear which source
> directory ought to generate it.

At first glance I'd get include/catalog to build it and install it.
pgtypeoids.h sounds like a good name to me and it should be in the top
level.

Have a nice day,
--
Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Re: [HACKERS] Rethinking stats communication mechanisms
> It strikes me that we are using a single communication mechanism to
> handle what are really two distinct kinds of data:

Interesting. I recently read a paper on how to get rid of locks for this
kind of pattern.

* For the Command String

- Problem: need to display the currently executing command in the ps
  list.
- Will only be of use if the command is taking a long, long time. So,
  it need not be realtime; no problem if the data comes with a little
  delay, or not at all if the command executes quickly. People are only
  interested in the currently executing command to answer questions
  like "what query has my server grinding?"

Point: the currently executing query is only interesting to display if
it's currently executing. If it's not, it's in the log (if enabled).

So, the proposal:

Each backend has a shared memory area in which to store:
- the currently executing command (like in your proposal)
- a timestamp
- a counter

On executing a command, the backend stores the command string, then
overwrites the counter with (counter + 1) and with the timestamp of
command start.

Periodically, like every N seconds, a separate process reads the
counter, then reads the data, then reads the counter again. If the
counter value changed, the process is repeated. If the counter value did
not change, the command string did not change either, so it's valid and
can be used.

Other thoughts:

If the backend process itself should update its process title, and this
operation is costly, it should only be done if the current query has
been running for more than T seconds. However, syscalls for getting the
current time are costly. A separate process can update a counter in
shared memory with the current time every N seconds, and the backend can
check it.

The main point is that if this value is written to every few seconds but
read often by only one process, or written often but read seldom, there
will not be a lot of interprocessor cache thrashing on it.
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Jim Nasby wrote:
> On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:
>> First thing as soon as I have a login, of course, is to set up a
>> Buildfarm instance.
>
> Keep in mind that buildfarm clients and benchmarking stuff don't
> usually mix well.

On a fast machine like this a buildfarm run is not going to take very
long. You could run those once a day at times of low demand. Or even
once or twice a week.

cheers

andrew
[HACKERS] Rethinking stats communication mechanisms
In view of my oprofile results
http://archives.postgresql.org/pgsql-hackers/2006-06/msg00859.php
I'm thinking we need some major surgery on the way that the stats
collection mechanism works.

It strikes me that we are using a single communication mechanism to
handle what are really two distinct kinds of data:

* Current-state information, eg, what backends are alive and what
commands they are currently working on. Ideally we'd like this type of
info to be 100% up-to-date. But once a particular bit of information
(eg a command string) is obsolete, it's not of interest anymore.

* Event counts. These accumulate and so past information is still
important. On the other hand, it's not so critical that the info be
completely up-to-date --- the central counters can lag behind a bit, so
long as events eventually get counted.

I believe the stats code was designed with the second case in mind, but
we've abused it to handle the first case, and that's why we've now got
performance problems.

If we are willing to assume that the current-state information is of
fixed maximum size, we could store it in shared memory. (This suggestion
already came up in the recent thread about ps_status, and I think it's
been mentioned before too --- but my point here is that we have to
separate this case from the event-counting case.) The only real
restriction we'd be making is that we can only show the first N
characters of the current command string, but we're already accepting
that limitation in the existing stats code. (And we could make N
whatever we wanted, without worrying about UDP datagram limits.)

I'm envisioning either adding fields to the PGPROC array, or perhaps
better using a separate array with an entry for each backend ID.
Backends would write their status info into this array and any
interested backend could read it out again. The stats collector process
needn't be involved at all AFAICS. This eliminates any process-dispatch
overhead to report command start or command termination.

Instead we'd have some locking overhead, but contention ought to be low
enough that that's not a serious problem. I'm assuming a separate lock
for each array entry so that backends don't contend with each other to
update their entries; contention occurs only when someone is actively
reading the information. We should probably use LWLocks not spinlocks
because the time taken to copy a long command string into the shared
area would be longer than we ought to hold a spinlock (but this seems a
bit debatable given the expected low contention ... any thoughts?)

The existing stats collection mechanism seems OK for event counts,
although I'd propose two changes: one, get rid of the separate buffer
process, and two, find a way to emit event reports in a time-driven way
rather than once per transaction commit. I'm a bit vague about how to do
the latter at the moment.

Comments?

regards, tom lane
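The per-backend array with one lock per entry might look roughly like
the following. This is a process-local sketch only: pthread mutexes
stand in for LWLocks, all names and sizes are illustrative rather than
actual PostgreSQL code, and the real array would live in shared memory
indexed by backend ID.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_BACKENDS 8
#define MAX_CMD_LEN  256        /* "N" can be whatever we want */

/* Hypothetical status entry; one per backend ID. */
typedef struct
{
    pthread_mutex_t lock;       /* one lock per entry: a backend updating
                                 * its own entry never contends with other
                                 * writers, only with readers */
    int             pid;
    char            command[MAX_CMD_LEN];
} BackendStatus;

static BackendStatus status_array[MAX_BACKENDS];

void
status_array_init(void)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
        pthread_mutex_init(&status_array[i].lock, NULL);
}

/* Writer side: a backend reports its current command directly. */
void
report_activity(int backend_id, int pid, const char *cmd)
{
    BackendStatus *st = &status_array[backend_id];

    pthread_mutex_lock(&st->lock);
    st->pid = pid;
    strncpy(st->command, cmd, MAX_CMD_LEN - 1);
    st->command[MAX_CMD_LEN - 1] = '\0';
    pthread_mutex_unlock(&st->lock);
}

/* Reader side: any interested backend copies the entry out. */
void
read_activity(int backend_id, int *pid, char *cmd_out)
{
    BackendStatus *st = &status_array[backend_id];

    pthread_mutex_lock(&st->lock);
    *pid = st->pid;
    memcpy(cmd_out, st->command, MAX_CMD_LEN);
    pthread_mutex_unlock(&st->lock);
}
```

The copy of a possibly-long command string while holding the lock is
exactly why Tom suggests LWLocks rather than spinlocks here.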
Re: [HACKERS] MultiXacts & WAL
Josh Berkus writes:
>> Please dump some of the WAL segments with xlogdump so we can get a
>> feeling for what's in there.

> OK, will do on Monday's test run. Is it possible for me to run this at
> the end of the test run, or do I need to freeze it in the middle to
> get useful data?

I'd just copy off a random sample of WAL segment files while the run is
proceeding. You don't need very many, half a dozen at most.

> Also, we're toying with the idea of testing full_page_writes=off for
> Solaris. The Solaris engineers claim that it should be safe on Sol10 +
> Sun hardware. I'm not entirely sure that's true; is there a
> destruction test of the bug that caused us to remove that option?

The bug that made us turn it off in the 8.1 branch had nothing to do
with hardware reliability or the lack thereof. As for testing, will they
let you yank the power cord?

regards, tom lane
Re: [HACKERS] MultiXacts & WAL
Tom,

> Please dump some of the WAL segments with xlogdump so we can get a
> feeling for what's in there.

OK, will do on Monday's test run. Is it possible for me to run this at
the end of the test run, or do I need to freeze it in the middle to get
useful data?

Also, we're toying with the idea of testing full_page_writes=off for
Solaris. The Solaris engineers claim that it should be safe on Sol10 +
Sun hardware. I'm not entirely sure that's true; is there a destruction
test of the bug that caused us to remove that option?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:
> First thing as soon as I have a login, of course, is to set up a
> Buildfarm instance.

Keep in mind that buildfarm clients and benchmarking stuff don't usually
mix well.

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL community
On Jun 16, 2006, at 12:01 PM, Josh Berkus wrote:
> Folks,
>
> I am thrilled to inform you all that Sun has just donated a fully
> loaded T2000 system to the PostgreSQL community, and it's being set up
> by Corey Shields at OSL (osuosl.org) and should be online probably
> early next week. The system has
>
> So this system will be hosted by Open Source Lab in Oregon. It's going
> to be "donated" to Software In the Public Interest, who will own it
> for the PostgreSQL fund.
>
> We'll want to figure out a scheduling system to schedule performance
> and compatibility testing on this machine; I'm not sure exactly how
> that will work. Suggestions welcome. As a warning, Gavin Sherry and I
> have a bunch of pending tests already to run.
>
> First thing as soon as I have a login, of course, is to set up a
> Buildfarm instance.
>
> --Josh
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] MultiXacts & WAL
Josh Berkus writes:
> I would like to see some checking of this, though. Currently I'm doing
> testing of PostgreSQL under very large numbers of connections (2000+)
> and am finding that there's a huge volume of xlog output ... far more
> than comparable RDBMSes. So I think we are logging stuff we don't
> really have to.

Please dump some of the WAL segments with xlogdump so we can get a
feeling for what's in there.

regards, tom lane
Re: [HACKERS] MultiXacts & WAL
Tom, Paolo,

> Yeah, it's difficult to believe that multixact stuff could form a
> noticeable fraction of the total WAL load, except perhaps under really
> pathological circumstances, because the code just isn't supposed to be
> exercised often. So I don't think this is worth pursuing. Paolo's free
> to try to prove the opposite of course ... but I'd want to see numbers
> not speculation.

I would like to see some checking of this, though. Currently I'm doing
testing of PostgreSQL under very large numbers of connections (2000+)
and am finding that there's a huge volume of xlog output ... far more
than comparable RDBMSes. So I think we are logging stuff we don't really
have to.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] MultiXacts & WAL
paolo romano <[EMAIL PROTECTED]> writes:
> Concerning the prepare state of two phase commit, as I was pointing
> out in my previous post, shared locks can safely be released once a
> transaction gets precommitted, hence they do not have to be made
> durable.

The above statement is plainly wrong. It would for example allow
violation of FK constraints.

regards, tom lane
Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL
Tom,

> 18% in s_lock is definitely bad :-(. Were you able to determine which
> LWLock(s) are accounting for the contention?

Gavin Sherry and Tom Daly (Sun) are currently working on identifying the
problem lock using -DLWLOCK_STATS. Any luck, Gavin?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] MultiXacts & WAL
On Sat, 17 Jun 2006, paolo romano wrote:
> The original point I was moving is if there were any concrete reason
> (which still I can't see) to require Multixacts recoverability (by
> means of logging). Concerning the prepare state of two phase commit,
> as I was pointing out in my previous post, shared locks can safely be
> released once a transaction gets precommitted, hence they do not have
> to be made durable.

No, it's not safe to release them until the 2nd phase of commit. Imagine
table foo and table bar. Table bar has a foreign key reference to foo.

1. Transaction A inserts a row to bar, referencing row R in foo. This
   acquires a shared lock on R.
2. Transaction A precommits, releasing the lock.
3. Transaction B deletes R. The new row inserted by A is not visible to
   B, so the delete succeeds.
4. Transaction A and B commit. Oops, the new row in bar references R
   that doesn't exist anymore.

Holding the lock until the true end of transaction, the 2nd phase of
commit, blocks B from deleting R.

- Heikki
[HACKERS] oprofile results for stats collector test
I've gotten some insight into the stats collection issues by monitoring
Bruce's test case with oprofile (http://oprofile.sourceforge.net/).

Test conditions: PG CVS HEAD, built with --enable-debug
--disable-cassert (debug symbols are needed for oprofile), on current
Fedora Core 5 (Linux kernel 2.6.16-1.2133_FC5 #1 SMP, gcc 4.1.1),
running on dual Xeon EM64T with hyperthreading enabled. All
configuration parameters are at defaults except for
stats_command_string. I extended the test script to be 24 "SELECT 1;"
commands so that it would run long enough to get a trustworthy profile.
In the profile data below, I cut off each listing where the percentages
fell below about 1% of global runtime.

With stats_command_string off:
21.7 sec to run script (with oprofile off)
22.7 sec to run script (with oprofile active)
These numbers stay pretty consistent over repeated trials.

oprofile global report:

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 24

GLOBAL_POWER_E...| samples| %|
  156544 47.2570 /usr/lib/debug/lib/modules/2.6.16-1.2133_FC5/vmlinux
   84114 25.3921 /home/tgl/testversion/bin/postgres
   63873 19.2818 /lib64/libc-2.4.so
   13621  4.1119 /home/tgl/testversion/bin/psql
   10101  3.0493 /home/tgl/testversion/lib/libpq.so.5.0
    1324  0.3997 /usr/bin/oprofiled

kernel (note percentages here are percentage of kernel time not total
time):

samples  %        symbol name
19326    12.3454  schedule
11738     7.4982  _raw_spin_lock
11631     7.4299  __switch_to
 9897     6.3222  mwait_idle
 7153     4.5693  __find_first_bit
 6395     4.0851  try_to_wake_up
 3972     2.5373  unix_stream_recvmsg
 3813     2.4357  system_call
 3739     2.3885  thread_return
 3612     2.3073  find_busiest_group
 3446     2.2013  resched_task

postgres:

samples  %        symbol name
 6415     7.6266  base_yyparse
 6115     7.2699  AllocSetAlloc
 3525     4.1907  hash_search

libc:

samples  %        symbol name
 9276    14.5226  memcpy
 8686    13.5989  _int_malloc
 5865     9.1823  strncpy
 5538     8.6703  strlen
 5286     8.2758  strspn
 4351     6.8120  vfprintf

With stats_command_string on:
45.2 sec to run script (with oprofile off)
47.5 sec to run script (with oprofile active)
Also repeatable.

oprofile global report:

CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 24

GLOBAL_POWER_E...| samples| %|
  785756 71.9386 /usr/lib/debug/lib/modules/2.6.16-1.2133_FC5/vmlinux
  135820 12.4348 /home/tgl/testversion/bin/postgres
  105828  9.6889 /lib64/libc-2.4.so
   19903  1.8222 /home/tgl/testversion/lib/libpq.so.5.0
   19293  1.7663 /home/tgl/testversion/bin/psql
    7489  0.6856 /ip_conntrack
    4555  0.4170 /usr/bin/oprofiled

kernel:

samples  %        symbol name
62337     7.9334  security_port_sid
61566     7.8353  schedule
37897     4.8230  _raw_spin_lock
36410     4.6338  __switch_to
26580     3.3827  mwait_idle
21797     2.7740  __find_first_bit
19118     2.4331  avc_has_perm_noaudit
18929     2.4090  sys_select
17005     2.1642  system_call
15979     2.0336  do_select
15762     2.0060  fget
14838     1.8884  try_to_wake_up
13488     1.7166  find_busiest_group
11956     1.5216  _raw_read_lock
11685     1.4871  thread_return
10639     1.3540  resched_task
10542     1.3416  copy_user_generic

postgres:

samples  %        symbol name
 7817     5.7554  PgstatBufferMain
 7541     5.5522  base_yyparse
 7247     5.3357  AllocSetAlloc
 6725     4.9514  hash_search

libc:

samples  %        symbol name
15012    14.1853  strncpy
12265    11.5896  _int_malloc
11949    11.2910  memcpy

This system is running with SELinux enabled (in permissive mode). I
presume that if I turned it off, security_port_sid() and
avc_has_perm_noaudit() would drop off the radar screen but things
wouldn't change much otherwise.

Also, you'd expect to see PgstatCollectorMain() somewhere but it's not
in the oprofile report at all. I believe that gcc sees that it's a
static singly-referenced function and inlines it into PgstatBufferMain;
so the time shown above for PgstatBufferMain should be read as the joint
time for those two functions. Also it looks like all of the
pgstat_recv_foo functions got inlined into it too. The only pgstat
functions that show separately in the oprofile output are

 1214     0.8938  pgstat_add_backend
  860     0.6332  pgstat_report_tabstat
  749     0.5515  pgstat_report_activity
  434     0.3195  pgstat_write_statsfile
  330     0.2430  pgstat_setheader
  196     0.1443  pgstat_count_xact_commit
  106     0.0780  pgstat_send
   91     0.0670  pgstat_get_db_entry

So the total time spent directly in pgstat.c is less than 9% of the
postgres executable's time, or only about 1%
Re: [HACKERS] postgresql and process titles
On Jun 13, 2006, at 9:42 PM, Kris Kennaway wrote:
>> BTW, there's another FBSD performance oddity I've run across. Running
>> pg_dump -t email_contrib -COx stats | bzip2 > ec.sql.bz2 &
>> which dumps the email_contrib table to bzip2 then to disk, the OS
>> won't use more than 1 CPU on an SMP system... unless the data is
>> cached. According to both gstat and systat -v, the system isn't I/O
>> bound; both are reporting the RAID10 with that table on it as only
>> about 10% busy. If I let that command run for a bit then cancel it
>> and re-start it so that the beginning of that table is in cache, it
>> will use one entire CPU for bzip2, which is what I'd expect to
>> happen.
>
> Hmm, odd... maybe something with the scheduler. I'd need access to a
> test case to be able to figure it out though.

Well, pg_dump of any sizeable database piped through bzip2 or gzip
should show this. Try:

pg_dump large_database | bzip2 > databasedump.sql.bz2

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] postgresql and process titles
Moving to osdldbt-general and dropping Tom and Marc.

On Jun 13, 2006, at 1:18 PM, Kris Kennaway wrote:
> On Tue, Jun 13, 2006 at 12:29:14PM -0500, Jim C. Nasby wrote:
>> Unless supersmack has improved substantially, you're unlikely to find
>> much interest. Last I heard it was a pretty brain-dead benchmark.
>> DBT2/3 (http://sourceforge.net/projects/osdldbt) is much more
>> realistic (based on TPC-C and TPC-H).
>
> Have you tried to compile this on FreeBSD? It looks like it (dbt1 at
> least) will need a moderate amount of hacking - there are some Linux
> assumptions in the source and the configure script makes assumptions
> about where things are installed that cannot be overridden on the
> commandline.

Yeah, there's a number of issues that would need to be addressed. The
biggest problem as I see it is that all the framework to run tests is
pretty tied together, without any modularity. The run script itself
launches all the stats collecting stuff, assumes it needs to happen on
the local machine, etc. There is consensus that this kind of thing needs
to be done to improve dbt*, it's just a matter of doing it.

Also, it's desirable to improve the benchmarks so that they can store
results in a database, and set them up so that the monitoring stuff is
common to all the different tests.

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] Parallel index build during COPY
On Jun 15, 2006, at 9:45 PM, Toru SHIMOGAKI wrote:
> NTT has some ideas about index creation during a large amount of data
> loading. Our approach is the following: index tuples are created at
> the same time as heap tuples and added into heapsort. In addition, we
> use old index tuples as a sorted list if the target table already has
> data. It is not necessary for the data loader to sort all the index
> tuples including old ones. After only the new index tuples are sorted,
> both sorted lists are merged and the whole index is built. It can save
> both CPU resources and disk accesses dramatically, especially if the
> target table already has many tuples.
>
> This approach needs to acquire a table lock, which is unlike COPY's
> lock mode, so we have developed it as another bulk load tool. We will
> talk about it in the PostgreSQL Anniversary Conference at Toronto.
> Thank you for Josh's coordination.

So does that mean you're able to do all that without hacking the
back-end? Impressive. :) I look forward to hearing about it.

--
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [HACKERS] MultiXacts & WAL
<[EMAIL PROTECTED]> Yeah, it's difficult to believe that multixact stuff could form a noticeable fraction of the total WAL load, except perhaps under really pathological circumstances, because the code just isn't supposed to be exercised often. So I don't think this is worth pursuing. Paolo's free to try to prove the opposite of course ... but I'd want to see numbers not speculation. regards, tom lane

Tom is right, mine are indeed just plain speculations, motivated by my original doubt concerning whether there were hidden reasons for requiring multixact recoverability. I don't know if I'll find the time to do some performance tests, at least in the short term, but I've enjoyed exchanging views with you all, so thanks a lot for your feedback! Just out of curiosity, what kind of benchmark would you use to evaluate this effect? I am quite familiar with TPC-C and TPC-W, but I am a newbie in the PostgreSQL community, so I was wondering if you were using any reference benchmark. Chiacchiera con i tuoi amici in tempo reale! http://it.yahoo.com/mail_it/foot/*http://it.messenger.yahoo.com
Re: [HACKERS] Test request for Stats collector performance improvement
Bruce Momjian writes: > 1) Run this script and record the time reported: > ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.script One thing you neglected to specify is that the test must be done on a NON ASSERT CHECKING build of CVS HEAD (or recent head, at least). On these trivial "SELECT 1" commands, an assert-checking backend is going to spend over 50% of its time doing end-of-transaction assert checks. I was reminded of this upon trying to do oprofile: CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 24
samples   %        symbol name
129870    37.0714  AtEOXact_CatCache
 67112    19.1571  AllocSetCheck
 16611     4.7416  AtEOXact_Buffers
 10054     2.8699  base_yyparse
  7499     2.1406  hash_seq_search
  7037     2.0087  AllocSetAlloc
  4267     1.2180  hash_search
  4060     1.1589  AtEOXact_RelationCache
  2537     0.7242  base_yylex
  1984     0.5663  grouping_planner
  1873     0.5346  LWLockAcquire
  1837     0.5244  AllocSetFree
  1808     0.5161  exec_simple_query
  1763     0.5032  ExecutorStart
  1527     0.4359  PostgresMain
  1464     0.4179  MemoryContextAllocZeroAligned
Let's be sure we're all measuring the same thing. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MultiXacts & WAL
In PostgreSQL, shared locks are not taken when just reading data. They're used to enforce foreign key constraints. When inserting a row into a table with a foreign key, the row in the parent table is locked to keep another transaction from deleting it. It's not safe to release the lock before end of transaction. Releasing shared locks (whether used for plain reading or enforcing foreign keys) before transaction end would be clearly wrong. The original point I was making is whether there is any concrete reason (which I still can't see) to require multixact recoverability (by means of logging). Concerning the prepared state of two-phase commit, as I was pointing out in my previous post, shared locks can safely be released once a transaction gets precommitted, hence they do not have to be made durable.
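The claim above (from serialization theory: at PREPARE a transaction may drop its shared locks but must keep its exclusive locks until COMMIT or ABORT) can be modeled in a toy sketch. This is purely illustrative Python, with invented names; it is not PostgreSQL's lock manager.

```python
class TwoPhaseTxn:
    """Toy two-phase-commit participant tracking its own locks."""
    def __init__(self):
        self.shared = set()     # rows this txn has read-locked
        self.exclusive = set()  # rows this txn has write-locked

    def lock_shared(self, row):
        self.shared.add(row)

    def lock_exclusive(self, row):
        self.exclusive.add(row)

    def prepare(self):
        # Once precommitted, the transaction can no longer abort due to a
        # conflict on data it read, so shared locks are droppable here --
        # which is why (per the argument above) they need not be durable.
        released, self.shared = self.shared, set()
        return released

    def commit(self):
        # Exclusive locks survive until the second phase completes.
        self.exclusive.clear()
```

Under this model, only the exclusive-lock state would ever need to survive a crash between PREPARE and COMMIT, which is the intuition behind the argument that multixact (shared-lock) state need not be logged.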
Re: [HACKERS] PG_MODULE_MAGIC
Tom Lane wrote: No, each major release (8.2, 8.3, etc). There are hardly ever any major releases where you wouldn't need a new compilation anyway ... True. I'm all in favor of a magic used this way. It will save me some grief. Regards, Thomas Hallgren ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] MultiXacts & WAL
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > Also, multixacts are only used when two transactions hold a shared lock > on the same row. Yeah, it's difficult to believe that multixact stuff could form a noticeable fraction of the total WAL load, except perhaps under really pathological circumstances, because the code just isn't supposed to be exercised often. So I don't think this is worth pursuing. Paolo's free to try to prove the opposite of course ... but I'd want to see numbers not speculation. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] PG_MODULE_MAGIC
Thomas Hallgren <[EMAIL PROTECTED]> writes: > A module magic patch was added recently and I'm a bit uncertain what the > implications are for the external PL modules. Does it affect them at all? Yes. > Will I need to provide separate > binaries for each bug fix release even though the APIs do not change? No, each major release (8.2, 8.3, etc). There are hardly ever any major releases where you wouldn't need a new compilation anyway ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MultiXacts & WAL
On Sat, 17 Jun 2006, paolo romano wrote: * Reduced I/O Activity: during transaction processing: current workloads are typically dominated by reads (rather than updates)... and reads give rise to multixacts (if there are at least two transactions reading the same page or if an explicit lock request is performed through heap_lock_tuple). And (long) transactions can read a lot of tuples, which directly translates into (long) multixact logging sooner or later. To accurately estimate the possible performance gain one should perform some profiling, but at first glance ISTM that there are good potentialities. Read-only transactions don't acquire shared locks. And updating transactions emit WAL records anyway; the additional I/O caused by multixact records is negligible. Also, multixacts are only used when two transactions hold a shared lock on the same row. * Reduced Recovery Time: because of shorter logs & less data structures to rebuild... and reducing recovery time helps improving system availability so should not be overlooked. I doubt the multixact stuff makes much difference compared to all other WAL traffic. In fact, logging the multixact stuff could be skipped when no two-phase transactions are involved. The problem is, you don't know if a transaction is one-phase or two-phase before you see COMMIT or PREPARE TRANSACTION. - Heikki ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
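Heikki's point that "multixacts are only used when two transactions hold a shared lock on the same row" comes from the fact that a tuple's lock field can name only a single transaction id; only when a second locker shows up is that xid replaced by a multixact id mapping to the set of lockers. The following is an illustrative Python sketch of that mechanism under those assumptions, not PostgreSQL's actual implementation.

```python
import itertools

class MultiXactManager:
    """Toy model: a tuple's lock slot holds either one xid or a
    multixact id that expands to a set of xids."""
    def __init__(self):
        self._next_mxid = itertools.count(1)
        self._members = {}       # multixact id -> set of member xids
        self.tuple_locker = {}   # tuple key -> xid, or ('multi', mxid)

    def lock_shared(self, tup, xid):
        holder = self.tuple_locker.get(tup)
        if holder is None:
            # Common case: single locker, no multixact needed at all.
            self.tuple_locker[tup] = xid
        elif isinstance(holder, tuple):
            # Already a multixact: just extend the member set.
            self._members[holder[1]].add(xid)
        else:
            # Second locker arrives: promote the single xid to a multixact.
            mxid = next(self._next_mxid)
            self._members[mxid] = {holder, xid}
            self.tuple_locker[tup] = ('multi', mxid)

    def lockers(self, tup):
        holder = self.tuple_locker.get(tup)
        if holder is None:
            return set()
        if isinstance(holder, tuple):
            return set(self._members[holder[1]])
        return {holder}
```

Note how the single-locker path never touches the multixact table, which is why the mechanism is expected to be exercised rarely and to contribute little WAL traffic.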
Re: [HACKERS] MultiXacts & WAL
On Sat, 17 Jun 2006, paolo romano wrote: When a transaction enters (successfully) the prepared state it only retains its exclusive locks and releases any shared locks (i.e. multixacts)... or, at least, that's how it should be in principle according to serialization theory; I haven't yet checked whether this is what is done in PostgreSQL. In PostgreSQL, shared locks are not taken when just reading data. They're used to enforce foreign key constraints. When inserting a row into a table with a foreign key, the row in the parent table is locked to keep another transaction from deleting it. It's not safe to release the lock before end of transaction. - Heikki ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Preventing DELETE and UPDATE without a WHERE clause?
> On 6/16/06, Mark Woodward <[EMAIL PROTECTED]> wrote: >> > Chris Campbell <[EMAIL PROTECTED]> writes: >> >> I heard an interesting feature request today: preventing the >> >> execution of a DELETE or UPDATE query that does not have a WHERE clause. >> > >> > These syntaxes are required by the SQL spec. Furthermore, it's easy >> > to imagine far-more-probable cases in which the system wouldn't detect >> > that you'd made a mistake, eg >> > >> > DELETE FROM tab WHERE key > 1 >> > >> > where you meant to type >> > >> > DELETE FROM tab WHERE key > 1000 >> > >> > I suggest counseling your client to learn how to use BEGIN/ROLLBACK. >> > This proposal strikes me as falling squarely within the rule about >> > "design a system that even a fool can use, and only a fool will want >> > to use it". >> > >> Just a theory, but couldn't a trigger be set up that would cause the query to >> tank if it touches too many rows? >> > > I haven't tried, but maybe a FOR STATEMENT trigger AFTER the event can > ask for ROW_COUNT using GET DIAGNOSTICS? Well, if you *can't* do it in a trigger, maybe that's a valid modification for Hackers to consider. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
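The idea floated above (run the statement, then inspect how many rows it touched and abort if the count is suspiciously large, the moral equivalent of checking ROW_COUNT via GET DIAGNOSTICS in an AFTER STATEMENT trigger) can be expressed as a small conceptual sketch. This is illustrative Python with an invented threshold and invented names, not a working PostgreSQL trigger.

```python
class TooManyRowsDeleted(Exception):
    pass

def guarded_delete(rows, predicate, max_fraction=0.5):
    """Delete rows matching predicate, but raise (i.e. 'roll back') if the
    delete touched more than max_fraction of the table, which would
    suggest a missing or botched WHERE clause."""
    survivors = [r for r in rows if not predicate(r)]
    deleted = len(rows) - len(survivors)
    if rows and deleted / len(rows) > max_fraction:
        # In SQL this would be RAISE EXCEPTION, aborting the statement.
        raise TooManyRowsDeleted(
            f"delete touched {deleted} of {len(rows)} rows")
    return survivors
```

Note that, exactly as Tom argues in the quoted post, a fractional threshold cannot distinguish a typo like `key > 1` from an intended bulk delete; it only catches the grossest mistakes, which is part of why BEGIN/ROLLBACK discipline is the recommended safeguard.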
[HACKERS] PG_MODULE_MAGIC
A module magic patch was added recently and I'm a bit uncertain what the implications are for the external PL modules. Does it affect them at all? Will I need to provide separate binaries for each bug fix release even though the APIs do not change? Exactly how is the magic determined? Regards, Thomas Hallgren ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] MultiXacts & WAL
Tom Lane <[EMAIL PROTECTED]> wrote: paolo romano <[EMAIL PROTECTED]> writes: > The point I am missing is the need to be able to completely recover > multixact offsets and members data. These carry information about > current transactions holding shared locks on db tuples, which should > not be essential for recovery purposes. This might be optimizable if we want to assume that multixacts will never be used for any purpose except holding locks, but that seems a bit shortsighted. Is there any actually significant advantage to not logging this information? regards, tom lane

I can see two main advantages:

* Reduced I/O Activity during transaction processing: current workloads are typically dominated by reads (rather than updates)... and reads give rise to multixacts (if there are at least two transactions reading the same page or if an explicit lock request is performed through heap_lock_tuple). And (long) transactions can read a lot of tuples, which directly translates into (long) multixact logging sooner or later. To accurately estimate the possible performance gain one should perform some profiling, but at first glance ISTM that there are good potentialities.

* Reduced Recovery Time: because of shorter logs & fewer data structures to rebuild... and reducing recovery time helps improve system availability, so it should not be overlooked.

Regards, Paolo