[HACKERS] random_page_cost vs seq_page_cost
Hello list,

I have a question about how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. Emails in this mailing list's archive seem to indicate that 1.0 vs 3.0-4.0 are appropriate values on modern hardware. That surprised me a bit, as I had thought that on actual hard drives (ignoring SSDs) random_page_cost is higher. I guess the number tries to reflect caching of the relevant pages in memory, and on modern hardware you have more of that? Anyhow, it would be great to have a scientific way of setting those numbers.

Background: We have recently upgraded two of our biggest postgres databases to new hardware, with a minor version bump (8.4.5 -> 8.4.9). We are experiencing a big performance regression in some queries. In those cases the planner seems to choose a nested loop index scan instead of hashing the index once and then joining. The new hardware was optimized for seq scans and does those very fast. We are not sure whether the database used to choose differently before the move and the new hardware performs worse for random seeks, or whether the planner is now making different choices.

As a countermeasure we are experimenting with

  enable_nestloop = off
  random_page_cost = 20 (instead of the previous 4)

It is worth noting that for many small tables the nestloop is indeed marginally faster (in the "doesn't really matter because both cases are fast enough" case). But the regression for the big tables (big in the sense that the index just fits into memory, but in practice might not because there are other frequently accessed big tables) is a show stopper.

For some of those tables we have also recently (as part of the move) clustered for the first time in ages, and it was speculated that that might have changed statistics (such as correlation) and increased the attractiveness of the index scan to the planner.

Another thing I have thought might provide some enlightenment would be an "explain log select ...": that would show all the sub-plans considered and why they were discarded, or something approximating this.

Thanks in advance for any reply, and sorry that this email turned out to be a rather long stream-of-consciousness dump.

Bene

-- relevant parts of our postgresql.conf ---

shared_buffers = 12GB                   # min 128kB
                                        # (change requires restart)
temp_buffers = 512MB                    # min 800kB
#max_prepared_transactions = 0          # zero disables the feature
                                        # (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 192MB                        # min 64kB
maintenance_work_mem = 1GB              # min 1MB
#max_stack_depth = 2MB                  # min 100kB

# - Kernel Resource Usage -
#max_files_per_process = 1000           # min 25
                                        # (change requires restart)
#shared_preload_libraries = ''          # (change requires restart)

# - Cost-Based Vacuum Delay -
vacuum_cost_delay = 100ms               # 0-100 milliseconds
vacuum_cost_page_hit = 1                # 0-10000 credits
vacuum_cost_page_miss = 10              # 0-10000 credits
vacuum_cost_page_dirty = 20             # 0-10000 credits
vacuum_cost_limit = 7500                # 1-10000 credits

# - Background Writer -
#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100            # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multiplier on buffers scanned/round

# - Asynchronous Behavior -
effective_io_concurrency = 40           # 1-1000; 0 disables prefetching

# WRITE AHEAD LOG
fsync = on                              # turns forced synchronization on or off
synchronous_commit = off                # immediate fsync at commit
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
full_page_writes = on                   # recover from partial page writes
wal_buffers = 16MB                      # min 32kB
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds
commit_delay = 1000                     # range 0-100000, in microseconds
commit_siblings = 5                     # range 1-1000

# - Checkpoints -
checkpoint_segments = 128               # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min              # range 30s-1h
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s                # 0 disables

# - Archiving -
archive_mode = off                      # allows
Re: [HACKERS] WIP: Join push-down for foreign tables
On 03.12.2011 01:05, Tom Lane wrote:
> Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
>> Hmm, so you're saying that the FDW function needs to be able to return
>> multiple paths for a single joinrel. Fair enough, and that's not
>> specific to remote joins. Even a single-table foreign scan could be
>> implemented differently depending on whether you prefer fast-start or
>> cheapest total.
>
> ... or ordered vs unordered, etc. Yeah, good point, we already got this
> wrong with the PlanForeignScan API. Good thing we didn't promise that
> would be stable.

This discussion withered down here... I think the advice to Shigeru-san is to work on the API. We didn't reach a consensus on what exactly it should look like, but at least you need to be able to return multiple paths for a single joinrel, and should look at fixing the PlanForeignScan API to allow that too.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Jan 4, 2012, at 21:27, Robert Haas wrote:
> I think the first thing we need to look at is increasing the number of
> CLOG buffers.

What became of the idea to treat the stable (i.e. earlier than the oldest active xid) and the unstable (i.e. the rest) parts of the CLOG differently?

On 64-bit machines at least, we could simply mmap() the stable parts of the CLOG into the backend address space, and access it without any locking at all.

I believe that we could also compress the stable part by 50% if we use one instead of two bits per txid. AFAIK, we need two bits because we a) distinguish between transactions which were ABORTED and those which never completed (due to e.g. a backend crash), and b) mark transactions as SUBCOMMITTED to achieve atomic commits — neither of which is strictly necessary for the stable parts of the clog.

Note that we could still keep the uncompressed CLOG around for debugging purposes - the additional compressed version would require only 2^32/8 bytes = 512 MB in the worst case, which people who're serious about performance can very probably spare.

The fly in the ointment are 32-bit machines, of course - but then, those could still fall back to the current way of doing things.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER DOMAIN DROP CONSTRAINT doesn't catch errors
On Thu, Dec 29, 2011 at 4:02 PM, Peter Eisentraut pete...@gmx.net wrote:
> Is there a secret reason why ALTER DOMAIN foo DROP CONSTRAINT nonexistent;
> doesn't report any error? If not, I think we should add one and also add
> the usual IF EXISTS option.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] FATAL: bogus data in lock file postmaster.pid:
My laptop ran out of battery and turned itself off while I was just starting up postmaster. After plugging in the charger and rebooting, I got the following error when I tried to restart PostgreSQL:

FATAL: bogus data in lock file postmaster.pid:

The postmaster.pid file was present in the data directory, but had zero length. Looking at the way the file is created and written, that can happen if you crash after the file is created, but before it's written/fsync'd (my laptop might have write-cache enabled, which would make the window larger). I was a bit surprised by that.

That's probably not a big deal in practice, but I wonder if there was some easy way to avoid that. First I thought we could create the new postmaster.pid file with a temporary name and rename it in place, but rename(2) will merrily overwrite any existing file, which is not what we want. We could use link(2), I guess.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:
On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> My laptop ran out of battery and turned itself off while I was just
> starting up postmaster. After plugging in the charger and rebooting, I
> got the following error when I tried to restart PostgreSQL:
>
> FATAL: bogus data in lock file postmaster.pid:
>
> postmaster.pid file was present in the data directory, but had zero
> length. Looking at the way the file is created and written, that can
> happen if you crash after the file is created, but before it's
> written/fsync'd (my laptop might have write-cache enabled, which would
> make the window larger). I was a bit surprised by that.
>
> That's probably not a big deal in practice, but I wonder if there was
> some easy way to avoid that. First I thought we could create the new
> postmaster.pid file with a temporary name and rename it in place, but
> rename(2) will merrily overwrite any existing file which is not what we
> want. We could use link(2), I guess.

Is this really a problem big enough to spend even that much effort on? Perhaps a special-case in the error message instead is enough?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Jan 5, 2012 at 5:15 AM, Florian Pflug f...@phlo.org wrote:
> On Jan 4, 2012, at 21:27, Robert Haas wrote:
>> I think the first thing we need to look at is increasing the number of
>> CLOG buffers.
>
> What became of the idea to treat the stable (i.e. earlier than the
> oldest active xid) and the unstable (i.e. the rest) parts of the CLOG
> differently?

I'm curious -- anyone happen to have an idea how big the unstable CLOG xid space is in the typical case? What would be the main driver of making it bigger? What are the main tradeoffs in terms of trying to keep the unstable area compact?

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] optimizing repeated MVCC snapshots
On Tue, Jan 3, 2012 at 2:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Another thought is that it should always be safe to reuse an old
>> snapshot if no transactions have committed or aborted since it was taken
>
> Yeah, that might work better. And it'd be a win for all MVCC snaps, not
> just the ones coming from promoted SnapshotNow ...

Here's a rough patch for that. Some benchmark results from a 32-core Itanium server are included below. They look pretty good. Out of the dozen configurations I tested, all but one came out ahead with the patch. The loser was 80 clients on permanent tables, but I'm not too worried about that since 80 clients on unlogged tables came out ahead.

This is not quite in a committable state yet, since it presumes atomic 64-bit reads and writes, and those aren't actually atomic everywhere. What I'm thinking we can do is: on platforms where 8-byte reads and writes are known to be atomic, we do as the patch does currently. On any platform where we don't know that to be the case, we can move the test I added to GetSnapshotData inside the lock, which should still be a small win at low concurrencies. At high concurrencies it's a bit iffy, because making GetSnapshotData's critical section shorter might lead to lots of people trying to manipulate the ProcArrayLock spinlock in very rapid succession. Even if that turns out to be an issue, I'm inclined to believe that anyone who has enough concurrency for that to matter probably also has atomic 8-byte reads and writes, and so the most that will be needed is an update to our notion of which platforms have that capability. If that turns out to be wrong, the other obvious alternative is not to do the test at all unless it can be done unlocked.

To support the above, I'm inclined to add a new file src/include/atomic.h which optionally defines a macro called ATOMIC_64BIT_OPS and macros atomic_read_uint64(r) and atomic_write_uint64(l, r). That way we can eventually support (a) architectures where 64-bit operations aren't atomic at all, (b) architectures where ordinary 64-bit operations are atomic (atomic_read_uint64(r) -> r, and atomic_write_uint64(l, r) -> l = r), and (c) architectures (like 32-bit x86) where ordinary 64-bit operations aren't atomic but special instructions (cmpxchg8b) can be used to get that behavior.

m = master, s = with patch. Scale factor 100, median of three 5-minute test runs. shared_buffers=8GB, checkpoint_segments=300, checkpoint_timeout=30min, effective_cache_size=340GB, wal_buffers=16MB, wal_writer_delay=20ms, listen_addresses='*', synchronous_commit=off. Binary modified with "chatr +pd L +pi L" and run with "rtsched -s SCHED_NOAGE -p 178".

Permanent Tables
================
m01 tps = 912.865209 (including connections establishing)
s01 tps = 916.848536 (including connections establishing)
m08 tps = 6256.429549 (including connections establishing)
s08 tps = 6364.214425 (including connections establishing)
m16 tps = 10795.373683 (including connections establishing)
s16 tps = 11038.233359 (including connections establishing)
m24 tps = 13710.400042 (including connections establishing)
s24 tps = 13836.823580 (including connections establishing)
m32 tps = 14574.758456 (including connections establishing)
s32 tps = 15125.196227 (including connections establishing)
m80 tps = 12014.498814 (including connections establishing)
s80 tps = 11825.302643 (including connections establishing)

Unlogged Tables
===============
m01 tps = 942.950926 (including connections establishing)
s01 tps = 953.618574 (including connections establishing)
m08 tps = 6492.238255 (including connections establishing)
s08 tps = 6537.197731 (including connections establishing)
m16 tps = 11363.708861 (including connections establishing)
s16 tps = 11561.193527 (including connections establishing)
m24 tps = 14656.659546 (including connections establishing)
s24 tps = 14977.226426 (including connections establishing)
m32 tps = 16310.814143 (including connections establishing)
s32 tps = 16644.921538 (including connections establishing)
m80 tps = 13422.438927 (including connections establishing)
s80 tps = 13780.256723 (including connections establishing)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

optimize-repeated-snapshots.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing repeated MVCC snapshots
* Robert Haas:
> and (c) architectures (like 32-bit x86) where ordinary 64-bit
> operations aren't atomic but special instructions (cmpxchg8b) can be
> used to get that behavior.

FILD and FIST are atomic, too, and are supported by more micro-architectures.

--
Florian Weimer                fwei...@bfk.de
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Jan 5, 2012 at 6:15 AM, Florian Pflug f...@phlo.org wrote:
> On 64-bit machines at least, we could simply mmap() the stable parts of
> the CLOG into the backend address space, and access it without any
> locking at all.

True. I think this could be done, but it would take some fairly careful thought and testing, because (1) we don't currently use mmap() anywhere else in the backend AFAIK, so we might run into portability issues (think: Windows) and perhaps unexpected failure modes (e.g. mmap() fails because there are too many mappings already). Also, it's not completely guaranteed to be a win. Sure, you save on locking, but now you are doing an mmap() call in every backend instead of just one read() into shared memory. If concurrency isn't a problem, that might be more expensive on net. Or maybe not, but I'm kind of inclined to steer clear of this whole area, at least for 9.2.

So far, the only test result I have only supports the notion that we run into trouble when NUM_CPUS > NUM_CLOG_BUFFERS, and people have to wait before they can even start their I/Os. That can be fixed with a pretty modest reengineering. I'm sure there is a second-order effect from the cost of repeated I/Os per se, which a backend-private cache of one form or another might well help with, but it may not be very big. Test results are welcome, of course.

> I believe that we could also compress the stable part by 50% if we use
> one instead of two bits per txid. AFAIK, we need two bits because we
> a) distinguish between transactions which were ABORTED and those which
> never completed (due to e.g. a backend crash), and b) mark transactions
> as SUBCOMMITTED to achieve atomic commits — neither of which is
> strictly necessary for the stable parts of the clog.

Well, if we're going to do compression at all, I'm inclined to think that we should compress by more than a factor of two. Jim Nasby's numbers (the worst we've seen so far) show that 18% of 1k blocks of XIDs were all commits. Presumably if we reduced the chunk size to, say, 8 transactions, that percentage would go up, and even that would be enough to get 16x compression rather than 2x. Of course, then keeping the uncompressed CLOG files becomes required rather than optional, but that's OK.

What bothers me about compressing by only 2x is that the act of compressing is not free. You have to read all the chunks and then write out new chunks, and those chunks then compete with each other in cache. Who is to say that we're not better off just reading the uncompressed data at that point? At least then we have only one copy of it.

> Note that we could still keep the uncompressed CLOG around for
> debugging purposes - the additional compressed version would require
> only 2^32/8 bytes = 512 MB in the worst case, which people who're
> serious about performance can very probably spare.

I don't think it'd be even that much, because we only ever use half the XID space at a time, and often probably much less: the default value of vacuum_freeze_table_age is only 150 million transactions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:
On Thu, Jan 5, 2012 at 8:23 AM, Magnus Hagander mag...@hagander.net wrote:
> On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
> heikki.linnakan...@enterprisedb.com wrote:
>> My laptop ran out of battery and turned itself off while I was just
>> starting up postmaster. After plugging in the charger and rebooting, I
>> got the following error when I tried to restart PostgreSQL:
>>
>> FATAL: bogus data in lock file postmaster.pid:
>>
>> postmaster.pid file was present in the data directory, but had zero
>> length. Looking at the way the file is created and written, that can
>> happen if you crash after the file is created, but before it's
>> written/fsync'd (my laptop might have write-cache enabled, which would
>> make the window larger). I was a bit surprised by that.
>>
>> That's probably not a big deal in practice, but I wonder if there was
>> some easy way to avoid that. First I thought we could create the new
>> postmaster.pid file with a temporary name and rename it in place, but
>> rename(2) will merrily overwrite any existing file which is not what
>> we want. We could use link(2), I guess.
>
> Is this really a problem big enough to spend even that much effort on?
> Perhaps a special-case in the error message instead is enough?

On a laptop it's not a big deal, but it sort of sucks if your production database fails to start automatically after an unexpected power outage.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random_page_cost vs seq_page_cost
On Thu, Jan 5, 2012 at 5:04 AM, Benedikt Grundmann bgrundm...@janestreet.com wrote:
> We are experiencing a big performance regression in some queries. In
> those cases the planner seems to choose a nested loop index scan
> instead of hashing the index once and then joining.

I think you probably need to post EXPLAIN ANALYZE output from the actual queries to get useful advice, probably to pgsql-performance rather than here. It's hard to believe that enable_nestloop=off is doing anything other than masking whatever the real problem is, but it's hard to tell what that problem is based on the information provided.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random_page_cost vs seq_page_cost
On 05/01/12 10:04, Benedikt Grundmann wrote:
> As a counter measure we are experimenting with
>   enable_nestloop = off
>   random_page_cost = 20 (instead of the previous 4).

For what it is worth, we had to revert the enable_nestloop = off change. It just moved the pain around by making other queries perform much worse than before.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] optimizing repeated MVCC snapshots
On Thu, Jan 5, 2012 at 9:01 AM, Florian Weimer fwei...@bfk.de wrote:
> * Robert Haas:
>> and (c) architectures (like 32-bit x86) where ordinary 64-bit
>> operations aren't atomic but special instructions (cmpxchg8b) can be
>> used to get that behavior.
>
> FILD and FIST are atomic, too, and are supported by more
> micro-architectures.

Yeah, I think you (or someone) mentioned the code. If someone wants to write the code, I'm game...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/04/2012 08:32 PM, Tom Lane wrote:
> Andrew Dunstan and...@dunslane.net writes:
>> Tom said:
>>> 2. A slightly cleaner fix for this should be to duplicate the SV and
>>> then release the copy around the SvPVutf8 call, only if it's readonly.
>>> Fixing it in do_util_elog is entirely the wrong thing.
>> How do we tell if it's readonly?
>
> SvREADONLY(sv) macro.

OK, so this seems to fix the issue David found:

diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index ac0a97d..f0e1afa 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -51,7 +51,10 @@ sv2cstr(SV *sv)
 	/*
 	 * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
 	 */
-	val = SvPVutf8(sv, len);
+	if (SvREADONLY(sv))
+		val = SvPVutf8(sv_mortalcopy(sv), len);
+	else
+		val = SvPVutf8(sv, len);

 	/*
 	 * we use perls length in the event we had an embedded null byte to ensure

but it doesn't fix the one I found, which passes a typeglob to elog():

do '$foo=1; elog(NOTICE,*foo);' language plperl;

That still crashes, but doesn't if we use sv_mortalcopy unconditionally.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore direct to database is broken for --insert dumps
On 01/04/2012 06:20 PM, Tom Lane wrote:
>> Could we detect an appropriate line ending in ahwrite() after it's
>> been decompressed and buffer partial lines accordingly?
>
> Not easily: there could be newlines embedded in data strings or SQL
> identifiers.

Should we look at eliminating those newlines for the future by using U& identifiers where there are embedded newlines, and unicode escapes for newlines in data strings? Then at least we'd possibly be able to get rid of the kludge some time in the future.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
For what it's worth, here are the numbers on one of our biggest databases (same system as I posted about separately wrt seq_scan_cost vs random_page_cost):

0053 1001  00BA 1009  0055 1001  00B9 1020  0054 983   00BB 1010  0056 1001  00BC 1019
0069 0     00BD 1009  006A 224   00BE 1018  006B 1009  00BF 1008  006C 1008  00C0 1006
006D 1004  00C1 1014  006E 1016  00C2 1023  006F 1003  00C3 1012  0070 1011  00C4 1000
0071 1011  00C5 1002  0072 1005  00C6 982   0073 1009  00C7 996   0074 1013  00C8 973
0075 1002  00D1 987   0076 997   00D2 968   0077 1007  00D3 974   0078 1012  00D4 964
0079 994   00D5 981   007A 1013  00D6 964   007B 999   00D7 966   007C 1000  00D8 971
007D 1000  00D9 956   007E 1008  00DA 976   007F 1010  00DB 950   0080 1001  00DC 967
0081 1009  00DD 983   0082 1008  00DE 970   0083 988   00DF 965   0084 1007  00E0 984
0085 1012  00E1 1004  0086 1004  00E2 976   0087 996   00E3 941   0088 1008  00E4 960
0089 1003  00E5 948   008A 995   00E6 851   008B 1001  00E7 971   008C 1003  00E8 954
008D 982   00E9 938   008E 1000  00EA 931   008F 1008  00EB 956   0090 1009  00EC 960
0091 1013  00ED 962   0092 1006  00EE 933   0093 1012  00EF 956   0094 994   00F0 978
0095 1017  00F1 292   0096 1004  0097 1005  0098 1014  0099 1012  009A 994   0035 1003
009B 1007  0036 1004  009C 1010  0037 981   009D 1024  0038 1002  009E 1009  0039 998
009F 1011  003A 995   00A0 1015  003B 996   00A1 1018  003C 1013  00A5 1007  003D 1008
00A3 1016  003E 1007  00A4 1020  003F 989   00A7 375   0040 989   00A6 1010  0041 975
00A9 3     0042 994   00A8 0     0043 1010  00AA 1     0044 1007  00AB 1     0045 1008
00AC 0     0046 991   00AF 4     0047 1010  00AD 0     0048 997   00AE 0     0049 1002
00B0 5     004A 1004  00B1 0     004B 1012  00B2 0     004C 999   00B3 0     004D 1008
00B4 0     004E 1007  00B5 807   004F 1010  00B6 1007  0050 1004  00B7 1007  0051 1009
00B8 1006  0052 1005  0057 1008  00C9 994   0058 991   00CA 977   0059 1000  00CB 978
005A 998   00CD 944   005B 971   00CC 972   005C 1005  00CF 969   005D 1010  00CE 988
005E 1006  00D0 975   005F 1015  0060 989   0061 998   0062 1014  0063 1000  0064 991
0065 990   0066 1000  0067 947   0068 377   00A2 1011

On 23/12/11 14:23, Kevin Grittner wrote:
> Jeff Janes jeff.ja...@gmail.com wrote:
>> Could we get some major OLTP users to post their CLOG for analysis? I
>> wouldn't think there would be much security/propietary issues with
>> CLOG data.
>
> FWIW, I got the raw numbers to do my quick check using this Ruby
> script (put together for me by Peter Brant). If it is of any use to
> anyone else, feel free to use it and/or post any enhanced versions of
> it.
>
> #!/usr/bin/env ruby
> Dir.glob("*") do |file_name|
>   contents = File.read(file_name)
>   total = contents.enum_for(:each_byte).enum_for(:each_slice, 256).inject(0) do |count, chunk|
>     if chunk.all? { |b| b == 0x55 }
>       count + 1
>     else
>       count
>     end
>   end
>   printf "%s %d\n", file_name, total
> end
>
> -Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
Benedikt Grundmann bgrundm...@janestreet.com wrote:
> For what's worth here are the numbers on one of our biggest databases
> (same system as I posted about separately wrt seq_scan_cost vs
> random_page_cost).

That would be an 88.4% hit rate on the summarized data.

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] easy way of copying regex_t ?
Hi all,

I've been working on an extension that allows moving the ispell dictionaries to a shared segment. It's almost complete; the last FIXME is about copying the regex_t structure (stored in AFFIX). According to regex.h the structure is fairly complex and not exactly easy to understand, so I'd like to know if anyone here has already implemented that, or something that may serve the same purpose.

Any ideas?

kind regards
Tomas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Jan 4, 2012 at 3:13 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On Wed, Jan 4, 2012 at 1:31 PM, Stephen Frost sfr...@snowman.net wrote:
>> Simon, all,
>>
>> * Simon Riggs (si...@2ndquadrant.com) wrote:
>>> (1) report all errors on a page, including errors that don't change
>>> PostgreSQL data. This involves checksumming long strings of zeroes,
>>> which the checksum algorithm can't tell apart from long strings of
>>> ones.
>>
>> Do we actually know when/where it's supposed to be all zeros, and
>> hence could we check for that explicitly? If we know what it's
>> supposed to be, in order to be consistent with other information, I
>> could certainly see value in actually checking that.
>
> Yes, we can. Excellent suggestion, will implement.

No, we can't. I discover that non-all-zeroes holes are fairly common, just not very frequent. That may or may not be a problem, but not something to be dealt with here and now.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/Perl Does not Like vstrings
Andrew Dunstan and...@dunslane.net writes:
> On 01/04/2012 08:32 PM, Tom Lane wrote:
>> Andrew Dunstan and...@dunslane.net writes:
>>> How do we tell if it's readonly?
>> SvREADONLY(sv) macro.

> but it doesn't fix the one I found which passes a typeglob to elog():
> do '$foo=1; elog(NOTICE,*foo);' language plperl;

Mmm, I was wondering about that one.

> That still crashes, but doesn't if we use sv_mortalcopy
> unconditionally.

Unconditional sv_mortalcopy sounds like the thing to do then, but a comment would help. And if this isn't a Perl bug, I would like to know what is.

BTW, shouldn't we be making some attempt to drop the result of the sv_mortalcopy? Or is it just assumed that it will be garbage-collected before the memory leak gets too bad?

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_restore direct to database is broken for --insert dumps
Andrew Dunstan and...@dunslane.net writes:
> On 01/04/2012 06:20 PM, Tom Lane wrote:
>> Not easily: there could be newlines embedded in data strings or SQL
>> identifiers.

> Should we look at eliminating those newlines for the future by using
> U& identifiers where there are embedded newlines and unicode escapes
> for newlines in data strings? Then at least we'd possibly be able to
> get rid of the kludge some time in the future.

Given our high expectations for forward-compatibility of dump files, we couldn't drop the code anytime in the foreseeable future, so I cannot get excited about spending time on that. (AFAIR, we have *never* intentionally broken compatibility of pg_dump files, unless you count breaking dumps made by unreleased development versions.)

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Wed, Jan 4, 2012 at 1:35 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon Riggs wrote: My focus was on getting something working first, then tuning. If we're agreed that we have everything apart from the tuning then we can proceed with tests to see which works better. Sure. I just think you are there already except for what I got into. FWIW, moving the modulus application out of the loop is a very trivial change and has no effect on the results; it's strictly a performance issue. New version attached, with your suggested changes included. Hole check code is there as well, but ifdef'd out since it isn't a valid check in all cases. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0cc3296..9b367a3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1701,6 +1701,48 @@ SET ENABLE_SEQSCAN TO OFF;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-page-checksums" xreflabel="page_checksums">
+      <indexterm>
+       <primary><varname>page_checksums</> configuration parameter</primary>
+      </indexterm>
+      <term><varname>page_checksums</varname> (<type>boolean</type>)</term>
+      <listitem>
+       <para>
+        When this parameter is on, the <productname>PostgreSQL</> server
+        calculates checksums when it writes main database pages to disk,
+        flagging the page as checksum protected.  When this parameter is off,
+        no checksum is written, only a standard watermark in the page header.
+        The database may thus contain a mix of pages with checksums and pages
+        without checksums.
+       </para>
+
+       <para>
+        When pages are read into shared buffers any page flagged with a
+        checksum has the checksum re-calculated and compared against the
+        stored value to provide greatly improved validation of page contents.
+       </para>
+
+       <para>
+        Writes via temp_buffers are not checksummed.
+       </para>
+
+       <para>
+        Turning this parameter off speeds normal operation, but
+        might allow data corruption to go unnoticed.  The checksum uses
+        16-bit checksums, using the fast Fletcher 16 algorithm.  With this
+        parameter enabled there is still a non-zero probability that an error
+        could go undetected, as well as a non-zero probability of false
+        positives.
+       </para>
+
+       <para>
+        This parameter can only be set in the <filename>postgresql.conf</>
+        file or on the server command line.
+        The default is <literal>off</>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
       <term><varname>wal_buffers</varname> (<type>integer</type>)</term>
       <indexterm>
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 91cc001..a43b7be 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -440,7 +440,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
 
 			/* check for garbage data */
-			if (!PageHeaderIsValid((PageHeader) bufBlock))
+			if (!PageIsVerified((Page) bufBlock))
 			{
 				if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
 				{
@@ -1860,6 +1860,8 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 {
 	XLogRecPtr	recptr;
 	ErrorContextCallback errcontext;
+	Block		bufBlock;
+	char	   *bufCopy;
 
 	/*
 	 * Acquire the buffer's io_in_progress lock.  If StartBufferIO returns
@@ -1907,10 +1909,24 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 	buf->flags &= ~BM_JUST_DIRTIED;
 	UnlockBufHdr(buf);
 
+	/*
+	 * Set page verification info immediately before we write the buffer to disk.
+	 * Once we have flushed the buffer is marked clean again, meaning it can
+	 * be replaced quickly and silently with another data block, so we must
+	 * write verification info now. For efficiency, the process of cleaning
+	 * and page replacement is asynchronous, so we can't do this *only* when
+	 * we are about to replace the buffer, we need to do this for every flush.
+	 */
+	bufBlock = BufHdrGetBlock(buf);
+	bufCopy = PageSetVerificationInfo((Page) bufBlock);
+
+	/*
+	 * bufToWrite is either the shared buffer or a copy, as appropriate.
+	 */
 	smgrwrite(reln,
 			  buf->tag.forkNum,
 			  buf->tag.blockNum,
-			  (char *) BufHdrGetBlock(buf),
+			  (char *) bufCopy,
 			  false);
 
 	pgBufferUsage.shared_blks_written++;
@@ -1921,6 +1937,8 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 	 */
 	TerminateBufferIO(buf, true, 0);
 
+	/* XXX Assert(buf is not BM_JUST_DIRTIED) */
+
 	TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(buf->tag.forkNum,
 									   buf->tag.blockNum,
 									   reln->smgr_rnode.node.spcNode,
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c index
Re: [HACKERS] CLOG contention
On Tue, Dec 27, 2011 at 5:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, Dec 24, 2011 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Dec 22, 2011 at 4:20 PM, Robert Haas robertmh...@gmail.com wrote: Also, if it is that, what do we do about it? I don't think any of the ideas proposed so far are going to help much. If you don't like guessing, don't guess, don't think. Just measure. Does increasing the number of buffers solve the problems you see? That must be the first port of call - is that enough, or not? If not, we can discuss the various ideas, write patches and measure them. Just in case you want a theoretical prediction to test: increasing NUM_CLOG_BUFFERS should reduce the frequency of the spikes you measured earlier. That should happen proportionally, so as that is increased they will become even less frequent. But the size of the buffer will not decrease the impact of each event when it happens. I'm still catching up on email, so apologies for the slow response on this. I actually ran this test before Christmas, but didn't get around to emailing the results. I'm attaching graphs of the last 100 seconds of a run with the normal count of CLOG buffers, and the last 100 seconds of a run with NUM_CLOG_BUFFERS = 32. I am also attaching graphs of the entire runs. It appears to me that increasing the number of CLOG buffers reduced the severity of the latency spikes considerably. In the last 100 seconds, for example, master has several spikes in the 500-700ms range, but with 32 CLOG buffers it never goes above 400 ms. Also, the number of points associated with each spike is considerably less - each spike seems to affect fewer transactions. So it seems that at least on this machine, increasing the number of CLOG buffers both improves performance and reduces latency. I hypothesize that there are actually two kinds of latency spikes here. 
Just taking a wild guess, I wonder if the *remaining* latency spikes are caused by the effect that you mentioned before: namely, the need to write an old CLOG page every time we advance onto a new one. I further speculate that the spikes are more severe on the unpatched code because this effect combines with the one I mentioned before: if there are more simultaneous I/O requests than there are buffers, a new I/O request has to wait for one of the I/Os already in progress to complete. If the new I/O request that has to wait extra-long happens to be the one caused by XID advancement, then things get really ugly. If that hypothesis is correct, then it supports your previous belief that more than one fix is needed here... but it also means we can get a significant and I think quite worthwhile benefit just out of finding a reasonable way to add some more buffers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [attachments: latency-end.png, latency-clog32-end.png, latency.png, latency-clog32.png]
Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: My laptop ran out of battery and turned itself off while I was just starting up postmaster. After plugging in the charger and rebooting, I got the following error when I tried to restart PostgreSQL: FATAL: bogus data in lock file postmaster.pid: postmaster.pid file was present in the data directory, but had zero length. Looking at the way the file is created and written, that can happen if you crash after the file is created, but before it's written/fsync'd (my laptop might have write-cache enabled, which would make the window larger). I was a bit surprised by that. That's probably not a big deal in practice, but I wonder if there was some easy way to avoid that. First I thought we could create the new postmaster.pid file with a temporary name and rename it in place, but rename(2) will merrily overwrite any existing file which is not what we want. We could use link(2), I guess. I think link(2) would create race conditions of its own. I'd be inclined to suggest that maybe we should just special-case a zero length postmaster.pid file as meaning okay to proceed. In general, garbage data in postmaster.pid is something I'm happy to insist on manual recovery from, but maybe we could safely make an exception for empty files. regards, tom lane
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 4:04 PM, Robert Haas robertmh...@gmail.com wrote: It appears to me that increasing the number of CLOG buffers reduced the severity of the latency spikes considerably. In the last 100 seconds, for example, master has several spikes in the 500-700ms range, but with 32 CLOG buffers it never goes above 400 ms. Also, the number of points associated with each spike is considerably less - each spike seems to affect fewer transactions. So it seems that at least on this machine, increasing the number of CLOG buffers both improves performance and reduces latency. I believed before that the increase was worthwhile and now even more so. Let's commit the change to 32. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] CLOG contention
Simon Riggs si...@2ndquadrant.com wrote: Robert Haas robertmh...@gmail.com wrote: So it seems that at least on this machine, increasing the number of CLOG buffers both improves performance and reduces latency. I believed before that the increase was worthwhile and now even more so. Let's commit the change to 32. +1 -Kevin
[HACKERS] Client Messages
I have a need to send banner messages to a psql client that I can set on the server and will be displayed on any psql client that connects to the database. This would be mostly used as an additional indicator to which database you are connecting, but could also be used by people to force their users to see a security message when connecting to the database. The attached patch will allow you to execute ALTER DATABASE postgres SET client_message=E'\nBEWARE: You are connecting to a production database. If you do anything to\n bring this server down, you will be destroyed by your supreme overlord.\n\n'; And then when you connect to psql, you will see:

[e3@workstation bin]$ ./psql -U user1 postgres
psql (9.2devel)

BEWARE: You are connecting to a production database. If you do anything to
 bring this server down, you will be destroyed by your supreme overlord.

Type help for help.

postgres=

Any feedback is welcome. Thanks

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 0cc3296..fa5b942
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** dynamic_library_path = 'C:\tools\postgre
*** 5323,5328 ****
--- 5323,5341 ----
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-client-message" xreflabel="client_message">
+      <term><varname>client_message</varname> (<type>string</type>)</term>
+      <indexterm>
+       <primary><varname>client_message</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        The <varname>client_message</varname> can be any string that will be
+        displayed to the user in the banner of psql.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
    </sect1>
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 5c910dd..58b6000
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*** static char *log_destination_string;
*** 455,460 ****
--- 455,461 ----
  static char *syslog_ident_str;
  static bool phony_autocommit;
  static bool session_auth_is_superuser;
+ static char *client_message_string;
  static double phony_random_seed;
  static char *client_encoding_string;
  static char *datestyle_string;
*** static struct config_string ConfigureNam
*** 3018,3023 ****
--- 3019,3035 ----
  		check_application_name, assign_application_name, NULL
  	},
 
+ 	{
+ 		{"client_message", PGC_USERSET, CLIENT_CONN_OTHER,
+ 			gettext_noop("Sets a message to be displayed to the user when connecting via psql."),
+ 			NULL,
+ 			GUC_REPORT | GUC_NO_SHOW_ALL
+ 		},
+ 		&client_message_string,
+ 		"",
+ 		NULL, NULL, NULL
+ 	},
+
  	/* End-of-list marker */
  	{
  		{NULL, 0, 0, NULL, NULL}, NULL, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
new file mode 100644
index 315db46..8eb5af5
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
*** 515,520 ****
--- 515,521 ----
  #dynamic_library_path = '$libdir'
  #local_preload_libraries = ''
+ #client_message = ''
 
  #--
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 69fac83..44bf114
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** static bool get_create_function_cmd(PGco
*** 65,70 ****
--- 65,71 ----
  static int	strip_lineno_from_funcdesc(char *func);
  static void minimal_error_message(PGresult *res);
+ static void printClientMessage(void);
  static void printSSLInfo(void);
  #ifdef WIN32
*** connection_warnings(bool in_startup)
*** 1699,1709 ****
--- 1700,1729 ----
  		checkWin32Codepage();
  #endif
  		printSSLInfo();
+
+ 		printClientMessage();
  	}
  }
 
  /*
+  * printClientMessage
+  *
+  * Prints any message stored in the client_message GUC
+  */
+ static void
+ printClientMessage(void)
+ {
+ 	const char *message;
+
+ 	message = PQparameterStatus(pset.db, "client_message");
+
+ 	if (message)
+ 		printf(_("%s\n"), message);
+ }
+
+
+ /*
   * printSSLInfo
   *
   * Prints information about the current SSL connection, if SSL is in use
Re: [HACKERS] Client Messages
Jim Mlodgenski jimm...@gmail.com wrote: Any feedback is welcome. You might want to add it here to make sure it doesn't slip through the cracks: https://commitfest.postgresql.org/action/commitfest_view/open -Kevin
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/05/2012 10:34 AM, Tom Lane wrote: Unconditional sv_mortalcopy sounds like the thing to do then, but a comment would help. And if this isn't a Perl bug, I would like to know what is. Indeed. David, can you file a bug on this? BTW, shouldn't we be making some attempt to drop the result of the sv_mortalcopy? Or is it just assumed that it will be garbage-collected before the memory leak gets too bad? Yes, it will be garbage collected before long. See discussion in perldoc perlguts. cheers andrew
Re: [HACKERS] PL/Perl Does not Like vstrings
On Jan 5, 2012, at 9:08 AM, Andrew Dunstan wrote: Unconditional sv_mortalcopy sounds like the thing to do then, but a comment would help. And if this isn't a Perl bug, I would like to know what is. Indeed. David, can you file a bug on this? I could, but don’t know what to say, and wouldn’t be able to answer any questions intelligently. I suggest you or Tom file one. Just run `perlbug` and follow the prompts. I’m sure p5p would appreciate a good bug report that I would not be able to send. Best, David
Re: [HACKERS] PL/Perl Does not Like vstrings
On Jan 5, 2012, at 7:34 AM, Tom Lane wrote: That still crashes, but doesn't if we use sv_mortalcopy unconditionally. Unconditional sv_mortalcopy sounds like the thing to do then, but a comment would help. And if this isn't a Perl bug, I would like to know what is. Question: Is this an issue anywhere else in PL/Perl, or just elog()? What about SPI parameters or return values?

david=# DO LANGUAGE PLPERL $$
david$# my $plan = spi_prepare('SELECT $1', 'TEXT');
david$# spi_query_prepared($plan, $^V);
david$# spi_freeplan($plan);
david$# return;
david$# $$;
ERROR: cannot convert Perl hash to non-composite type text at line 3.
CONTEXT: PL/Perl anonymous code block

No segfault, at least, though that’s a rather bizarre error message. AFAIK, $^V isn’t a hash. This works, though:

DO LANGUAGE PLPERL $$
my $plan = spi_prepare('SELECT $1', 'TEXT');
spi_query_prepared($plan, v1);
spi_freeplan($plan);
return;
$$;
DO

Best, David
Re: [HACKERS] PL/Perl Does not Like vstrings
David E. Wheeler da...@justatheory.com writes: On Jan 5, 2012, at 7:34 AM, Tom Lane wrote: Unconditional sv_mortalcopy sounds like the thing to do then, but a comment would help. And if this isn't a Perl bug, I would like to know what is. Question: Is this an issue anywhere else in PL/Perl, or just elog()? I would imagine you could reproduce it by returning the same kinds of objects as function results, since the actual problem is in utf8 to database-encoding conversion. No segfault, at least, though that's a rather bizarre error message. AFAIK, $^V isn't a hash. This works, though: spi_query_prepared($plan, v1); Is that actually a vstring? I confess I'd never heard of the things before this thread, but I remember reading somewhere that you need multiple dots in a string before it's considered a vstring and not something else. regards, tom lane
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/05/2012 12:28 PM, David E. Wheeler wrote: On Jan 5, 2012, at 7:34 AM, Tom Lane wrote: That still crashes, but doesn't if we use sv_mortalcopy unconditionally. Unconditional sv_mortalcopy sounds like the thing to do then, but a comment would help. And if this isn't a Perl bug, I would like to know what is. Question: Is this an issue anywhere else in PL/Perl, or just elog()? What about SPI parameters or return values? The fix that has been applied, as Tom suggested, is at the point where we call SvPVutf8(), so that's not just for elog().

david=# DO LANGUAGE PLPERL $$
david$# my $plan = spi_prepare('SELECT $1', 'TEXT');
david$# spi_query_prepared($plan, $^V);
david$# spi_freeplan($plan);
david$# return;
david$# $$;
ERROR: cannot convert Perl hash to non-composite type text at line 3.
CONTEXT: PL/Perl anonymous code block

No segfault, at least, though that’s a rather bizarre error message. AFAIK, $^V isn’t a hash. This works, though: As documented, it's not a scalar, and you need to stop treating it as one. If you want it as a scalar, which is what you'd need here, enclose it in quotes, or use the stuff shown in perldoc perlvar. cheers andrew
Re: [HACKERS] PL/Perl Does not Like vstrings
On Jan 5, 2012, at 9:41 AM, Tom Lane wrote: I would imagine you could reproduce it by returning the same kinds of objects as function results, since the actual problem is in utf8 to database-encoding conversion. No segfault, at least, though that's a rather bizarre error message. AFAIK, $^V isn't a hash. This works, though: spi_query_prepared($plan, v1); Is that actually a vstring? I confess I'd never heard of the things before this thread, but I remember reading somewhere that you need multiple dots in a string before it's considered a vstring and not something else. Yes, it is. You can declare v-strings either by prepending a v to one or more dot-separated integers, or just 3 or more dotted integers. http://austin.pm.org/presentations/march2000/raty/vstrings.html I believe the latter syntax is deprecated, though; it turned out to be confusing and pretty roundly hated. It's preferred to use the v syntax. Best, David
Re: [HACKERS] PL/Perl Does not Like vstrings
On Jan 5, 2012, at 9:50 AM, Andrew Dunstan wrote: The fix that has been applied, as Tom suggested, is at the point where we call SvPVutf8(), so that's not just for elog(). Great, thanks. As documented, it's not a scalar, and you need to stop treating it as one. If you want it as a scalar, which is what you'd need here, enclose it in quotes, or use the stuff shown in perldoc perlvar. Oh, right $^V is a version object. Best, David
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/05/2012 12:41 PM, Tom Lane wrote: Is that actually a vstring? I confess I'd never heard of the things before this thread, but I remember reading somewhere that you need multiple dots in a string before it's considered a vstring and not something else. perldoc perlvar says: The revision, version, and subversion of the Perl interpreter, represented as a version object. This variable first appeared in perl 5.6.0; earlier versions of perl will see an undefined value. Before perl 5.10.0 $^V was represented as a v-string. I'm quite sure David isn't running on something older than 5.10.0. cheers andrew
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/05/2012 12:17 PM, David E. Wheeler wrote: On Jan 5, 2012, at 9:08 AM, Andrew Dunstan wrote: Unconditional sv_mortalcopy sounds like the thing to do then, but a comment would help. And if this isn't a Perl bug, I would like to know what is. Indeed. David, can you file a bug on this? I could, but don’t know what to say, and wouldn’t be able to answer any questions intelligently. I suggest you or Tom file one. Just run `perlbug` and follow the prompts. I’m sure p5p would appreciate a good bug report that I would not be able to send. OK, done. cheers andrew
Re: [HACKERS] PL/Perl Does not Like vstrings
Andrew Dunstan and...@dunslane.net writes: The fix that has been applied, as Tom suggested, is at the point where we call SvPVutf8(), so that's not just for elog(). That patch looks good, but do we want to look into the other point about error recovery? The real bottom line here is that when SvPVutf8 threw a croak(), we didn't recover sanely at all --- in fact, it was almost impossible to tell what had happened even in gdb, until I installed perl debugging symbols and single-stepped through things. I would have hoped that we'd at least see the croak message. This seems a bit nervous-making, as plperl sure calls a lot of stuff that in principle could croak. regards, tom lane
Re: [HACKERS] PL/Perl Does not Like vstrings
On 01/05/2012 01:09 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: The fix that has been applied, as Tom suggested, is at the point where we call SvPVutf8(), so that's not just for elog(). That patch looks good, but do we want to look into the other point about error recovery? The real bottom line here is that when SvPVutf8 threw a croak(), we didn't recover sanely at all --- in fact, it was almost impossible to tell what had happened even in gdb, until I installed perl debugging symbols and single-stepped through things. I would have hoped that we'd at least see the croak message. This seems a bit nervous-making, as plperl sure calls a lot of stuff that in principle could croak. Yes, I think so. That seemed like a separate issue, though, which is why I just applied the workaround to the reported problem. I had similar issues with gdb - it wasn't a pleasant experience. cheers andrew
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 4:04 PM, Robert Haas robertmh...@gmail.com wrote: I hypothesize that there are actually two kinds of latency spikes here. Just taking a wild guess, I wonder if the *remaining* latency spikes are caused by the effect that you mentioned before: namely, the need to write an old CLOG page every time we advance onto a new one. I further speculate that the spikes are more severe on the unpatched code because this effect combines with the one I mentioned before: if there are more simultaneous I/O requests than there are buffers, a new I/O request has to wait for one of the I/Os already in progress to complete. If the new I/O request that has to wait extra-long happens to be the one caused by XID advancement, then things get really ugly. If that hypothesis is correct, then it supports your previous belief that more than one fix is needed here... but it also means we can get a significant and I think quite worthwhile benefit just out of finding a reasonable way to add some more buffers. Sounds reasonable. Patch to remove clog contention caused by my dirty clog LRU. The patch implements background WAL allocation also, with the intention of being separately tested, if possible. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4060e60..dbefa02 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -565,6 +565,26 @@ CheckPointCLOG(void)
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
 }
 
+/*
+ * Conditionally flush the CLOG LRU.
+ *
+ * When a backend does ExtendCLOG we need to write the CLOG LRU if it is
+ * dirty. Performing I/O while holding XidGenLock prevents new write
+ * transactions from starting. To avoid that we flush the CLOG LRU, if
+ * we think that a page write is due soon, according to a heuristic.
+ *
+ * Note that we're reading ShmemVariableCache->nextXid without a lock
+ * since the exact value doesn't matter as input into our heuristic.
+ */
+void
+CLOGBackgroundFlushLRU(void)
+{
+	TransactionId	xid = ShmemVariableCache->nextXid;
+	int				threshold = (CLOG_XACTS_PER_PAGE - (CLOG_XACTS_PER_PAGE / 4));
+
+	if (TransactionIdToPgIndex(xid) > threshold)
+		SlruBackgroundFlushLRUPage(ClogCtl);
+}
 
 /*
  * Make sure that CLOG has room for a newly-allocated XID.
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 30538ff..aea6c09 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -885,6 +885,82 @@ SlruReportIOError(SlruCtl ctl, int pageno, TransactionId xid)
 }
 
 /*
+ * Identify the LRU slot but just leave it as it is.
+ *
+ * Control lock must be held at entry, and will be held at exit.
+ */
+static int
+SlruIdentifyLRUSlot(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			slotno;
+	int			cur_count;
+	int			bestslot;
+	int			best_delta;
+	int			best_page_number;
+
+	/*
+	 * If we find any EMPTY slot, just select that one. Else locate the
+	 * least-recently-used slot.
+	 *
+	 * Normally the page_lru_count values will all be different and so
+	 * there will be a well-defined LRU page. But since we allow
+	 * concurrent execution of SlruRecentlyUsed() within
+	 * SimpleLruReadPage_ReadOnly(), it is possible that multiple pages
+	 * acquire the same lru_count values. In that case we break ties by
+	 * choosing the furthest-back page.
+	 *
+	 * In no case will we select the slot containing latest_page_number
+	 * for replacement, even if it appears least recently used.
+	 *
+	 * Notice that this next line forcibly advances cur_lru_count to a
+	 * value that is certainly beyond any value that will be in the
+	 * page_lru_count array after the loop finishes. This ensures that
+	 * the next execution of SlruRecentlyUsed will mark the page newly
+	 * used, even if it's for a page that has the current counter value.
+	 * That gets us back on the path to having good data when there are
+	 * multiple pages with the same lru_count.
+	 */
+	cur_count = (shared->cur_lru_count)++;
+	best_delta = -1;
+	bestslot = 0;				/* no-op, just keeps compiler quiet */
+	best_page_number = 0;		/* ditto */
+	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	{
+		int			this_delta;
+		int			this_page_number;
+
+		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
+			return slotno;
+		this_delta = cur_count - shared->page_lru_count[slotno];
+		if (this_delta < 0)
+		{
+			/*
+			 * Clean up in case shared updates have caused cur_count
+			 * increments to get lost. We back off the page counts,
+			 * rather than trying to increase cur_count, to avoid any
+			 * question of infinite loops or failure in the presence of
+			 * wrapped-around counts.
+			 */
+			shared->page_lru_count[slotno] = cur_count;
+			this_delta = 0;
+		}
+		this_page_number = shared->page_number[slotno];
+		if ((this_delta > best_delta ||
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote: Let's commit the change to 32. I would like to do that, but I think we need to at least figure out a way to provide an escape hatch for people without much shared memory. We could do that, perhaps, by using a formula like this: 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a maximum of 32 I also think it would be worth a quick test to see how the increase performs on a system with 32 cores. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 7:12 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote: Let's commit the change to 32. I would like to do that, but I think we need to at least figure out a way to provide an escape hatch for people without much shared memory. We could do that, perhaps, by using a formula like this: 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a maximum of 32 We're talking about an extra 192KB or thereabouts and Clog buffers will only be the size of subtrans when we've finished. If you want to have a special low-memory option, then it would need to include many more things than clog buffers. Let's just use a constant value for clog buffers until the low-memory patch arrives. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 2:21 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Jan 5, 2012 at 7:12 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote: Let's commit the change to 32. I would like to do that, but I think we need to at least figure out a way to provide an escape hatch for people without much shared memory. We could do that, perhaps, by using a formula like this: 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a maximum of 32 We're talking about an extra 192KB or thereabouts and Clog buffers will only be the size of subtrans when we've finished. If you want to have a special low-memory option, then it would need to include many more things than clog buffers. Let's just use a constant value for clog buffers until the low-memory patch arrives. Tom already stated that he found that unacceptable. Unless he changes his opinion, we're not going to get far if you're only happy if it's constant and he's only happy if there's a formula. On the other hand, I think there's a decent argument that he should change his opinion, because 192kB of memory is not a lot. However, what I mostly want is something that nobody hates, so we can get it committed and move on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 1:12 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote: Let's commit the change to 32. I would like to do that, but I think we need to at least figure out a way to provide an escape hatch for people without much shared memory. We could do that, perhaps, by using a formula like this: 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a maximum of 32 The assumption that machines that need this will have gigabytes of shared memory set is not valid IMO. merlin
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 7:26 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 5, 2012 at 2:21 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Jan 5, 2012 at 7:12 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote: Let's commit the change to 32. I would like to do that, but I think we need to at least figure out a way to provide an escape hatch for people without much shared memory. We could do that, perhaps, by using a formula like this: 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a maximum of 32 We're talking about an extra 192KB or thereabouts and Clog buffers will only be the size of subtrans when we've finished. If you want to have a special low-memory option, then it would need to include many more things than clog buffers. Let's just use a constant value for clog buffers until the low-memory patch arrives. Tom already stated that he found that unacceptable. Unless he changes his opinion, we're not going to get far if you're only happy if it's constant and he's only happy if there's a formula. On the other hand, I think there's a decent argument that he should change his opinion, because 192kB of memory is not a lot. However, what I mostly want is something that nobody hates, so we can get it committed and move on. If that was a reasonable objection it would have applied when we added serializable support, or any other SLRU for that matter. If memory reduction is a concern to anybody, then a separate patch to address *all* issues is required. Blocking this patch makes no sense. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] CLOG contention
Excerpts from Simon Riggs's message of Thu Jan 05 16:21:31 -0300 2012: On Thu, Jan 5, 2012 at 7:12 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote: Let's commit the change to 32. I would like to do that, but I think we need to at least figure out a way to provide an escape hatch for people without much shared memory. We could do that, perhaps, by using a formula like this: 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a maximum of 32 We're talking about an extra 192KB or thereabouts and Clog buffers will only be the size of subtrans when we've finished. Speaking of which, maybe it'd be a good idea to parametrize the subtrans size according to the same (or a similar) formula too. (It might be good to reduce multixact memory consumption too; I'd think that 4+4 pages should be more than sufficient for low memory systems, so making those be half the clog values should be good) So you get both things: reduce memory usage for systems on the low end, which has been slowly increasing lately as we've added more uses of SLRU, and more buffers for large systems. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] CLOG contention
Robert Haas robertmh...@gmail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: Robert Haas robertmh...@gmail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: Let's commit the change to 32. I would like to do that, but I think we need to at least figure out a way to provide an escape hatch for people without much shared memory. We could do that, perhaps, by using a formula like this: 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a maximum of 32 If we go with such a formula, I think 32 MB would be a more appropriate divisor than 128 MB. Even on very large machines where 32 CLOG buffers would be a clear win, we often can't go above 1 or 2 GB of shared_buffers without hitting latency spikes due to overrun of the RAID controller cache. (Now, that may change if we get DW in, but that's not there yet.) 1 GB / 32 is 32 MB. This would leave CLOG pinned at the minimum of 8 buffers (64 KB) all the way up to shared_buffers of 256 MB. Let's just use a constant value for clog buffers until the low-memory patch arrives. Tom already stated that he found that unacceptable. Unless he changes his opinion, we're not going to get far if you're only happy if it's constant and he's only happy if there's a formula. On the other hand, I think there's a decent argument that he should change his opinion, because 192kB of memory is not a lot. However, what I mostly want is something that nobody hates, so we can get it committed and move on. I wouldn't hate it either way, as long as the divisor isn't too large. -Kevin
Re: [HACKERS] CLOG contention
Robert Haas robertmh...@gmail.com writes: I would like to do that, but I think we need to at least figure out a way to provide an escape hatch for people without much shared memory. We could do that, perhaps, by using a formula like this: 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a maximum of 32 I would be in favor of that, or perhaps some other formula (eg, maybe the minimum should be less than 8 for when you've got very little shmem). I think that the reason it's historically been a constant is that the original coding took advantage of having a compile-time-constant number of buffers --- but since we went over to the common SLRU infrastructure for several different logs, there's no longer any benefit whatever to using a simple constant. regards, tom lane
Re: [HACKERS] CLOG contention
Simon Riggs si...@2ndquadrant.com writes: On Thu, Jan 5, 2012 at 7:26 PM, Robert Haas robertmh...@gmail.com wrote: On the other hand, I think there's a decent argument that he should change his opinion, because 192kB of memory is not a lot. However, what I mostly want is something that nobody hates, so we can get it committed and move on. If that was a reasonable objection it would have applied when we added serializable support, or any other SLRU for that matter. If memory reduction is a concern to anybody, then a separate patch to address *all* issues is required. Blocking this patch makes no sense. No, your argument is the one that makes no sense. The fact that things could be made better for low-mem situations is not an argument for instead making them worse. Which is what going to a fixed value of 32 would do, in return for no benefit that I can see compared to using a formula of some sort. The details of the formula barely matter, though I would like to see one that bottoms out at less than 8 buffers so that there is some advantage gained for low-memory cases. regards, tom lane
Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:
On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: My laptop ran out of battery and turned itself off while I was just starting up postmaster. After plugging in the charger and rebooting, I got the following error when I tried to restart PostgreSQL: FATAL: bogus data in lock file postmaster.pid: postmaster.pid file was present in the data directory, but had zero length. Looking at the way the file is created and written, that can happen if you crash after the file is created, but before it's written/fsync'd (my laptop might have write-cache enabled, which would make the window larger). I was a bit surprised by that. That's probably not a big deal in practice, but I wonder if there was some easy way to avoid that. First I thought we could create the new postmaster.pid file with a temporary name and rename it in place, but rename(2) will merrily overwrite any existing file which is not what we want. We could use link(2), I guess. I think link(2) would create race conditions of its own. I'd be inclined to suggest that maybe we should just special-case a zero length postmaster.pid file as meaning okay to proceed. In general, garbage data in postmaster.pid is something I'm happy to insist on manual recovery from, but maybe we could safely make an exception for empty files. That's pretty much what I meant - but with a warning message. +1. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 7:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think that the reason it's historically been a constant is that the original coding took advantage of having a compile-time-constant number of buffers --- but since we went over to the common SLRU infrastructure for several different logs, there's no longer any benefit whatever to using a simple constant. You astound me, you really do. Parameterised slru buffer sizes were proposed for 8.3 and opposed by you. I guess we all reserve the right to change our minds... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] CLOG contention
Simon Riggs si...@2ndquadrant.com writes: Parameterised slru buffer sizes were proposed for 8.3 and opposed by you. I guess we all reserve the right to change our minds... When presented with new data, sure. Robert's results offer a reason to worry about this, which we did not have before now. regards, tom lane
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On Thu, Dec 29, 2011 at 9:03 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: 3. Resolve two anticipated controversies that are, respectively, somewhat orthogonal and completely orthogonal to the binary bloat controversy. The first (controversy A) is that I have added a new piece of infrastructure, pg_always_inline, which, as the name suggests, is a portable way of insisting that a function should be invariably inlined. Portable libraries like libc have been doing this for a long time now, and I actually use the libc macro (that expands to __attribute__((always_inline)) ) where possible. I don't have a problem with the idea of a pg_always_inline, but I'm wondering what sort of fallback mechanism you propose. It seems to me that if the performance improvement is conditioned on inlining being done and we're not confident that we can force the compiler to inline, we ought to fall back to not making the specialization available, rather than to making something available that may not work well. Of course if it's only a question of a smaller performance gain, rather than an actual loss, then that's not as much of an issue... The second possible point of contention (controversy B) is that I have jettisoned various protections against bad qsort implementations that I believe are a legacy of when we used the system qsort pre-2006, that can no longer be justified. For example, quick sort performing badly in the face of lots of duplicates is a well understood problem today (partitioning should stop on equal keys), and ISTM that protecting against that outside the qsort implementation (but only for index sorts) is wrong-headed. Assuming you mean that qsort_arg() will protect against this stuff so that callers don't need to do it separately, +1. I had another idea when writing this patch that I haven't developed at all but I'll share anyway. 
That is, it might be a good idea to use a balanced quicksort: http://xlinux.nist.gov/dads//HTML/balancedqsrt.html I can't find any description of how this actually works... obviously, we'd like to find a close-to-median element, but how do we actually do that cheaply? It might be possible to get a reasonable approximation of the actual median value of a given column with existing statistics, which could be hinted to qsort_arg. I doubt that this would be worthwhile -- the pivot that we pick at the toplevel doesn't really matter much; even if it's bad, we can recover just fine if the pivot we pick one level down is good, or the level after that. We only really have a problem if we keep systematically picking bad pivots every time. And if we do have that problem, improving the toplevel choice of pivot is only going to help modestly, because we'll still be O(n^2) on each partition. Can I get a straw poll on how much of a problem worst-case performance of qsort is believed to be? I'd be reluctant to remove any of the protections we currently have, because I bet they were all put in to fix problems that people hit in the field. But I don't know that we need more. Of course, consolidating them into qsort() itself rather than its callers probably makes sense, as noted above. In a perfect world, if it were somehow possible to know the perfect pivots ahead of time from a histogram or something, we'd have a quick sort variant with worst-case performance of O(n log(n)). That, or the limited approximation that I've sketched would perhaps be worthwhile, even if it was subject to a number of caveats. Wikipedia claims that the worst case for quick sort is O(n log(n)) with the refinements recommended by Sedgewick's 1978 paper, but this seems like nonsense to me - the med3 technique is a good heuristic in practice, but it's perfectly possible in theory for its sampling to always get things completely wrong (this is not an unfamiliar problem). I agree. 
After reading http://nanxkurniawan.wordpress.com/2010/05/31/quicksort/ I am inclined to think that things are still O(n lg n) even if we always partition so that any fixed percentage of the data is on one side of the partition element and the remainder is on the other side. So for example if all of our partition elements are greater than 1% of the elements in their partition and less than the other 99%, we'll still be O(n lg n), though with a considerably higher constant factor, I think. However, if our partition elements are always a fixed NUMBER of elements from the end - whether it's 1 or 100 - then we'll be O(n^2), though again the constant factor will depend on how many elements you knock off each time. In practice I'm not sure whether an algorithmic analysis matters much, because in real life the constant factors are probably pretty important, hence the median-of-medians approach with n > 40. While I'm thinking out loud, here's another idea: Have a GUC which, when enabled (perhaps potentially at various different granularities), makes Timsort the in-memory sorting algorithm, as it now
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 2:44 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: If we go with such a formula, I think 32 MB would be a more appropriate divisor than 128 MB. Even on very large machines where 32 CLOG buffers would be a clear win, we often can't go above 1 or 2 GB of shared_buffers without hitting latency spikes due to overrun of the RAID controller cache. (Now, that may change if we get DW in, but that's not there yet.) 1 GB / 32 is 32 MB. This would leave CLOG pinned at the minimum of 8 buffers (64 KB) all the way up to shared_buffers of 256 MB. That seems reasonable to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: I would be in favor of that, or perhaps some other formula (eg, maybe the minimum should be less than 8 for when you've got very little shmem). I have some results that show that, under the right set of circumstances, 8 -> 32 is a win, and I can quantify by how much it wins. I don't have any data at all to quantify the cost of dropping the minimum from 8 -> 6, or from 8 -> 4, and therefore I'm reluctant to do it. My guess is that it's a bad idea, anyway. Even on a system where shared_buffers is just 8MB, we have 1024 regular buffers and 8 CLOG buffers. If we reduce the number of CLOG buffers from 8 to 4 (i.e. by 50%), we can increase the number of regular buffers from 1024 to 1028 (i.e. by 0.5%). Maybe you can find a case where that comes out to a win, but you might have to look pretty hard. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLOG contention
On Thu, Jan 5, 2012 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 5, 2012 at 2:44 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: If we go with such a formula, I think 32 MB would be a more appropriate divisor than 128 MB. Even on very large machines where 32 CLOG buffers would be a clear win, we often can't go above 1 or 2 GB of shared_buffers without hitting latency spikes due to overrun of the RAID controller cache. (Now, that may change if we get DW in, but that's not there yet.) 1 GB / 32 is 32 MB. This would leave CLOG pinned at the minimum of 8 buffers (64 KB) all the way up to shared_buffers of 256 MB. That seems reasonable to me. likewise (champion bikeshedder here). It just so happens I typically set 'large' server shared memory to 256mb. merlin
Re: [HACKERS] PL/Perl Does not Like vstrings
On Jan 5, 2012, at 9:55 AM, Andrew Dunstan wrote: perldoc perlvar says: The revision, version, and subversion of the Perl interpreter, represented as a version object. This variable first appeared in perl 5.6.0; earlier versions of perl will see an undefined value. Before perl 5.10.0 $^V was represented as a v-string. I'm quite sure David isn't running on something older than 5.10.0. Perl 5.14.2: perl -E 'say ref $^V' version Perl 5.12.4: perl -E 'say ref $^V' version Perl 5.10.1: perl -E 'say ref $^V' version Perl 5.8.9: perl -le 'print ref $^V' Best, David
Re: [HACKERS] [COMMITTERS] pgsql: Work around perl bug in SvPVutf8().
On 01/05/2012 12:05 PM, Andrew Dunstan wrote: Work around perl bug in SvPVutf8(). Certain things like typeglobs or readonly things like $^V cause perl's SvPVutf8() to die nastily and crash the backend. To avoid that bug we make a copy of the object, which will subsequently be garbage collected. Back patched to 9.1 where we first started using SvPVutf8(). Per -hackers discussion. Original problem reported by David Wheeler. Ugh. This broke the regression tests. I'll get it fixed. cheers andrew
Re: [HACKERS] our buffer replacement strategy is kind of lame
On 01/03/2012 06:22 PM, Jim Nasby wrote: On Jan 3, 2012, at 11:15 AM, Robert Haas wrote: I think that our current freelist is practically useless, because it is almost always empty, and the cases where it's not empty (startup, and after a table or database drop) are so narrow that we don't really get any benefit out of having it. However, I'm not opposed to the idea of a freelist in general: I think that if we actually put in some effort to keep the freelist in a non-empty state it would help a lot, because backends would then have much less work to do at buffer allocation time. This is exactly what the FreeBSD VM system does (which is at least one of the places where the idea of a clock sweep for PG came from ages ago). There is a process that does nothing but attempt to keep X amount of memory on the free list, where it can immediately be grabbed by anything that needs memory. Pages on the freelist are guaranteed to be clean (as in not dirty), but not zero'd. In fact, IIRC if a page on the freelist gets referenced again it can be pulled back out of the free list and put back into an active state. The one downside I see to this is that we'd need some heuristic to determine how many buffers we want to keep on the free list. http://wiki.postgresql.org/wiki/Todo#Background_Writer has "Consider adding buffers the background writer finds reusable to the free list" and "Automatically tune bgwriter_delay based on activity rather than using a fixed interval", which both point to my 8.3 musing and other suggestions starting at http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php I could write both those in an afternoon. The auto-tuning stuff already in the background writer originally intended to tackle this issue, but dropped it in lieu of shipping something simpler first. There's even a prototype somewhere on an old drive here. The first missing piece needed before this was useful was separating out the background writer and checkpointer processes. 
Once I realized the checkpoints were monopolizing so much time, especially when they hit bad states, it was obvious the writer couldn't be relied upon for this job. That's much better now since Simon's 806a2aee3791244bf0f916729bfdb5489936e068 "Split work of bgwriter between 2 processes: bgwriter and checkpointer", which just became available in November to build on. The second missing piece blocking this work in my mind was how exactly we're going to benchmark the result, mainly to prove it doesn't hurt some workloads. I haven't fully internalized the implications of Robert's upthread comments, in terms of being able to construct a benchmark stressing both the best and worst case situation here. That's really the hardest part of this whole thing, by a lot. Recent spending has brought me an 8 HyperThread core laptop that can also run DTrace, so I expect to have better visibility into this soon too. I think here in 2011 the idea of having a background writer process that could potentially occupy most of a whole core doing work so backends don't have to is an increasingly attractive one. So long as that comes along with an auto-tuning delay, it shouldn't hurt the work toward lowering power management either. Might even help really, by allowing larger values of bgwriter_delay than you'd want to use during busy periods. I was planning to mimic the sort of fast attack/slow delay logic already used for the auto-tuned timing, so that you won't fall behind by more than one bgwriter_delay worth of activity. Then it should realize a burst is here and the writer has to start moving faster. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:
Magnus Hagander mag...@hagander.net writes: On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote: I think link(2) would create race conditions of its own. I'd be inclined to suggest that maybe we should just special-case a zero length postmaster.pid file as meaning okay to proceed. In general, garbage data in postmaster.pid is something I'm happy to insist on manual recovery from, but maybe we could safely make an exception for empty files. That's pretty much what I meant - but with a warning message. Actually ... wait a minute. If we allow that, don't we create a race condition between two postmasters starting at almost the same instant? The second one could see the lock file when the first has created but not yet filled it. regards, tom lane
Re: [HACKERS] Progress on fast path sorting, btree index creation time
Robert Haas robertmh...@gmail.com writes: On Thu, Dec 29, 2011 at 9:03 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: The first (controversy A) is that I have added a new piece of infrastructure, pg_always_inline, which, as the name suggests, is a portable way of insisting that a function should be invariably inlined. I don't have a problem with the idea of a pg_always_inline, but I'm wondering what sort of fallback mechanism you propose. There is no compiler anywhere that implements always inline, unless you are talking about a macro. inline is a hint and nothing more, and if you think you can force it you are mistaken. So this controversy is easily resolved: we do not need any such construct. The real question is whether we should accept a patch that is a performance loss when the compiler fails to inline some reasonably simple function. I think that would depend on the actual numbers involved, so we'd need to see data before making a decision. The second possible point of contention (controversy B) is that I have jettisoned various protections against bad qsort implementations that I believe are a legacy of when we used the system qsort pre-2006, that can no longer be justified. No objection to that one, I think, as long as you've verified that our implementation is in fact okay about these things. regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Work around perl bug in SvPVutf8().
Excerpts from Andrew Dunstan's message of Thu Jan 05 19:06:54 -0300 2012: On 01/05/2012 12:05 PM, Andrew Dunstan wrote: Work around perl bug in SvPVutf8(). Certain things like typeglobs or readonly things like $^V cause perl's SvPVutf8() to die nastily and crash the backend. To avoid that bug we make a copy of the object, which will subsequently be garbage collected. Back patched to 9.1 where we first started using SvPVutf8(). Per -hackers discussion. Original problem reported by David Wheeler. Ugh. This broke the regression tests. I'll get it fixed. Yeah, notice that it's causing crashes, not just different results. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] CLOG contention
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 5, 2012 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: I would be in favor of that, or perhaps some other formula (eg, maybe the minimum should be less than 8 for when you've got very little shmem). I have some results that show that, under the right set of circumstances, 8 -> 32 is a win, and I can quantify by how much it wins. I don't have any data at all to quantify the cost of dropping the minimum from 8 -> 6, or from 8 -> 4, and therefore I'm reluctant to do it. My guess is that it's a bad idea, anyway. Even on a system where shared_buffers is just 8MB, we have 1024 regular buffers and 8 CLOG buffers. If we reduce the number of CLOG buffers from 8 to 4 (i.e. by 50%), we can increase the number of regular buffers from 1024 to 1028 (i.e. by 0.5%). Maybe you can find a case where that comes out to a win, but you might have to look pretty hard. I think you're rejecting the concept too easily. A setup with very little shmem is only going to be suitable for low-velocity systems that are not pushing too many transactions through per second, so it's not likely to need so many CLOG buffers. And frankly I'm not that concerned about what the performance is like: I'm more concerned about whether PG will start up at all without modifying the system shmem limits, on systems with legacy values for SHMMAX etc. Shaving a few single-purpose buffers to make back what we spent on SSI, for example, seems like a good idea to me. regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Work around perl bug in SvPVutf8().
On 01/05/2012 05:30 PM, Alvaro Herrera wrote: Excerpts from Andrew Dunstan's message of Thu Jan 05 19:06:54 -0300 2012: On 01/05/2012 12:05 PM, Andrew Dunstan wrote: Work around perl bug in SvPVutf8(). Certain things like typeglobs or readonly things like $^V cause perl's SvPVutf8() to die nastily and crash the backend. To avoid that bug we make a copy of the object, which will subsequently be garbage collected. Back patched to 9.1 where we first started using SvPVutf8(). Per -hackers discussion. Original problem reported by David Wheeler. Ugh. This broke the regression tests. I'll get it fixed. Yeah, notice that it's causing crashes, not just different results. Indeed. It looks like the garbage collection was a bit too eager, so I have a fix where we control when it gets done. cheers andrew
Re: [HACKERS] random_page_cost vs seq_page_cost
On 2012-01-05 10:04, Benedikt Grundmann wrote: I have a question of how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. It'd be really nice if the DBMS measured actual experienced values.. -- Jeremy
Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.
On Thu, Jan 5, 2012 at 16:02, Andrew Dunstan and...@dunslane.net wrote: Fix breakage from earlier plperl fix. Apparently the perl garbage collector was a bit too eager, so here we control when the new SV is garbage collected.

I know I'm a little late to the party... I can't help but think this seems a bit inefficient for the common case. Would it be worth only copying the sv when it's a glob or readonly? Something like the below? I tested a few more svtypes that were easy to make (code, regexp) and everything seems peachy.

[ BTW it seems readonly in general is fine; for example elog(ERROR, 1); worked previously, as does elog(ERROR, "1");. Both of those sv's are readonly. ISTM, at least on my version of perl (5.14.2), globs and readonly vstrings seem to be the problem children. I think we could get away with testing if it's a glob or vstring. But I don't have time right now to test all the way back to perl 5.8 and everything in-between; I'll find it if anyone is interested. ]

--

*** a/src/pl/plperl/plperl_helpers.h
--- b/src/pl/plperl/plperl_helpers.h
***************
*** 47,53 ****
  sv2cstr(SV *sv)
  {
  	char *val, *res;
  	STRLEN len;
- 	SV *nsv;
  
  	/*
  	 * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
--- 47,52 ----
***************
*** 58,65 ****
  	 * sv before passing it to SvPVutf8(). The copy is garbage collected
  	 * when we're done with it.
  	 */
! 	nsv = newSVsv(sv);
! 	val = SvPVutf8(nsv, len);
  
  	/*
  	 * we use perl's length in the event we had an embedded null byte to ensure
--- 57,68 ----
  	 * sv before passing it to SvPVutf8(). The copy is garbage collected
  	 * when we're done with it.
  	 */
! 	if (SvTYPE(sv) == SVt_PVGV || SvREADONLY(sv))
! 		sv = newSVsv(sv);
! 	else
! 		SvREFCNT_inc(sv);
! 
! 	val = SvPVutf8(sv, len);
  
  	/*
  	 * we use perl's length in the event we had an embedded null byte to ensure
***************
*** 68,74 ****
  	res = utf_u2e(val, len);
  
  	/* safe now to garbage collect the new SV */
! 	SvREFCNT_dec(nsv);
  
  	return res;
  }
--- 71,77 ----
  	res = utf_u2e(val, len);
  
  	/* safe now to garbage collect the new SV */
! 	SvREFCNT_dec(sv);
  
  	return res;
  }
Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.
On 01/05/2012 06:31 PM, Alex Hunsaker wrote: On Thu, Jan 5, 2012 at 16:02, Andrew Dunstan and...@dunslane.net wrote: Fix breakage from earlier plperl fix. Apparently the perl garbage collector was a bit too eager, so here we control when the new SV is garbage collected. I know I'm a little late to the party... I can't help but think this seems a bit inefficient for the common case. Would it be worth only copying the sv when it's a glob or readonly? Something like the below? I tested a few more svtypes that were easy to make (code, regexp) and everything seems peachy.

I'm not so concerned about elog() use, and anyway there the most common case surely will be passing a readonly string. I'm more concerned about all the other places we call sv2cstr(). SvTYPE(sv) == SVt_PVGV is what I was looking for in vain in the perl docs. So, yes, we should probably adjust this one more time, but ideally we need a better test than just SvREADONLY(). If you want to follow up your investigation of exactly when we need a copied SV and see how much you can narrow it down that would be great.

cheers andrew
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On 5 January 2012 22:27, Tom Lane t...@sss.pgh.pa.us wrote: There is no compiler anywhere that implements always inline, unless you are talking about a macro. inline is a hint and nothing more, and if you think you can force it you are mistaken. So this controversy is easily resolved: we do not need any such construct. I'm slightly puzzled by your remarks here. GCC documentation on this is sparse (although, as I've demonstrated, I can get better numbers using the always_inline attribute on GCC 4.3), but take a look at this: http://msdn.microsoft.com/en-us/library/z8y1yy88.aspx While it is not strictly true to say that pg_always_inline could inline every function under every set of circumstances, it's pretty close to the truth. I do accept that this facility could quite easily be abused if its use isn't carefully measured. I also accept that C99/GNU C inline functions are generally just requests to the compiler that may be ignored (and indeed the compiler may inline without being asked to). It's short sighted to see this as a case of inlining itself making a big difference, so much as it making a big difference as an enabling transformation. The real question is whether we should accept a patch that is a performance loss when the compiler fails to inline some reasonably simple function. I think that would depend on the actual numbers involved, so we'd need to see data before making a decision. Who said anything about a performance loss? Since the raw improvement to qsort_arg is so large, it's pretty difficult to imagine a confluence of circumstances in which this results in a net loss. See the figures at http://archives.postgresql.org/pgsql-hackers/2011-11/msg01316.php , for example. The pg_always_inline idea is relatively recent. It just serves to provide additional encouragement to the compiler to inline, insofar as that is possible on the platform in question. 
The compiler's cost/benefit analysis cannot possibly appreciate how hot a code path qsort_arg is, because it has a set of generic heuristics that are quite fallible, and very probably quite conservative. We can appreciate such things though. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] pg_restore direct to database is broken for --insert dumps
I wrote: Not easily: there could be newlines embedded in data strings or SQL identifiers. I'm not seeing any way around this except to restore the minimal lexing capability. One thing we could probably do is to restrict it to be used only when reading table data, and continue to assume that object creation commands can be emitted as-is. That would at least get us out of needing to parse dollar-quoted literals, which aren't used in the INSERT commands.

Attached is a patch that restores a much-simplified version of the mini lexer; it deals only with the sorts of things that dumpTableData_insert actually emits. Fixing the standard_conforming_strings issue turns out to be really a one-liner, because the setting is in fact readily available to this code. Maybe we should have done it that way to begin with :-( ... though I admit to being glad to have gotten rid of the very questionable dollar-quote-recognition code that was there before. Barring objections to the approach, I'll apply and back-patch this tomorrow.

regards, tom lane

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index d9edebb0f48308787bf263cd90504d3c5ed80e8b..234e50fb734573f9f3287d952821702a0838c4fd 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*************** restore_toc_entry(ArchiveHandle *AH, Toc
*** 620,639 ****
  		if (te->copyStmt && strlen(te->copyStmt) > 0)
  		{
  			ahprintf(AH, "%s", te->copyStmt);
! 			AH->writingCopyData = true;
  		}
  
  		(*AH->PrintTocDataPtr) (AH, te, ropt);
  
  		/*
  		 * Terminate COPY if needed.
  		 */
! 		if (AH->writingCopyData)
! 		{
! 			if (RestoringToDB(AH))
! 				EndDBCopyMode(AH, te);
! 			AH->writingCopyData = false;
! 		}
  
  		/* close out the transaction started above */
  		if (is_parallel && te->created)
--- 620,639 ----
  		if (te->copyStmt && strlen(te->copyStmt) > 0)
  		{
  			ahprintf(AH, "%s", te->copyStmt);
! 			AH->outputKind = OUTPUT_COPYDATA;
  		}
+ 		else
+ 			AH->outputKind = OUTPUT_OTHERDATA;
  
  		(*AH->PrintTocDataPtr) (AH, te, ropt);
  
  		/*
  		 * Terminate COPY if needed.
  		 */
! 		if (AH->outputKind == OUTPUT_COPYDATA &&
! 			RestoringToDB(AH))
! 			EndDBCopyMode(AH, te);
! 		AH->outputKind = OUTPUT_SQLCMDS;
  
  		/* close out the transaction started above */
  		if (is_parallel && te->created)
*************** _allocAH(const char *FileSpec, const Arc
*** 1975,1980 ****
--- 1975,1982 ----
  	AH->mode = mode;
  	AH->compression = compression;
  
+ 	memset(&(AH->sqlparse), 0, sizeof(AH->sqlparse));
+ 
  	/* Open stdout with no compression for AH output handle */
  	AH->gzOut = 0;
  	AH->OF = stdout;
*************** CloneArchive(ArchiveHandle *AH)
*** 4194,4200 ****
  	clone = (ArchiveHandle *) pg_malloc(sizeof(ArchiveHandle));
  	memcpy(clone, AH, sizeof(ArchiveHandle));
  
! 	/* Handle format-independent fields ... none at the moment */
  
  	/* The clone will have its own connection, so disregard connection state */
  	clone->connection = NULL;
--- 4196,4203 ----
  	clone = (ArchiveHandle *) pg_malloc(sizeof(ArchiveHandle));
  	memcpy(clone, AH, sizeof(ArchiveHandle));
  
! 	/* Handle format-independent fields */
! 	memset(&(clone->sqlparse), 0, sizeof(clone->sqlparse));
  
  	/* The clone will have its own connection, so disregard connection state */
  	clone->connection = NULL;
*************** DeCloneArchive(ArchiveHandle *AH)
*** 4227,4233 ****
  	/* Clear format-specific state */
  	(AH->DeClonePtr) (AH);
  
! 	/* Clear state allocated by CloneArchive ... none at the moment */
  
  	/* Clear any connection-local state */
  	if (AH->currUser)
--- 4230,4238 ----
  	/* Clear format-specific state */
  	(AH->DeClonePtr) (AH);
  
! 	/* Clear state allocated by CloneArchive */
! 	if (AH->sqlparse.curCmd)
! 		destroyPQExpBuffer(AH->sqlparse.curCmd);
  
  	/* Clear any connection-local state */
  	if (AH->currUser)
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 7a4fd360737ab3c48fdd8776879f804e0a1062a4..6dd5158ab4d9f7ee815e555029839190f335d9a1 100644
*** a/src/bin/pg_dump/pg_backup_archiver.h
--- b/src/bin/pg_dump/pg_backup_archiver.h
*************** typedef size_t (*CustomOutPtr) (struct _
*** 134,139 ****
--- 134,153 ----
  
  typedef enum
  {
+ 	SQL_SCAN = 0,				/* normal */
+ 	SQL_IN_SINGLE_QUOTE,		/* '...' literal */
+ 	SQL_IN_DOUBLE_QUOTE			/* "..." identifier */
+ } sqlparseState;
+ 
+ typedef struct
+ {
+ 	sqlparseState state;		/* see above */
+ 	bool		backSlash;		/* next char is backslash quoted? */
+ 	PQExpBuffer curCmd;			/* incomplete line (NULL if not created) */
+ } sqlparseInfo;
+ 
+ typedef enum
+ {
  	STAGE_NONE = 0,
  	STAGE_INITIALIZING,
  	STAGE_PROCESSING,
*************** typedef enum
*** 142,147 ****
--- 156,168 ----
  
  typedef enum
  {
+ 	OUTPUT_SQLCMDS = 0,			/* emitting general SQL commands */
+ 	OUTPUT_COPYDATA,			/*
Re: [HACKERS] random_page_cost vs seq_page_cost
On 1/5/12 3:00 PM, Jeremy Harris wrote: On 2012-01-05 10:04, Benedikt Grundmann wrote: I have a question of how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. It'd be really nice if the DBMS measured actual experienced values.. Certainly it would. It would also require a whole lot of instrumentation. Feel free to write some ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] Poorly thought out code in vacuum
Lately I have noticed buildfarm member jaguar occasionally failing regression tests with ERROR: invalid memory alloc request size 1073741824 during a VACUUM, as for example at http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2012-01-04%2023%3A05%3A02

Naturally I supposed it to be a consequence of the CLOBBER_CACHE_ALWAYS option, ie, somebody accessing an already-freed cache entry. I managed to duplicate and debug it here after an afternoon of trying, and it's not actually related to that at all, except perhaps to the extent that CLOBBER_CACHE_ALWAYS slows down some things enormously more than others. The failure comes when a ResourceOwner tries to enlarge its pinned-buffers array to more than 1GB, and that's not a typo: there are umpteen entries for the same buffer, whose PrivateRefCount is 134217727 and trying to go higher. A stack trace soon revealed the culprit, which is this change in lazy_vacuum_heap():

@@ -932,7 +965,8 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vac_strategy);
-		LockBufferForCleanup(buf);
+		if (!ConditionalLockBufferForCleanup(buf))
+			continue;
 		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats);
 
 		/* Now that we've compacted the page, record its available space */

introduced in http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=bbb6e559c4ea0fb4c346beda76736451dc24eb4e

The continue just results in another iteration of the containing tight loop, and of course that immediately re-pins the same buffer without having un-pinned it. So if someone else sits on their pin of the buffer for long enough, kaboom.
We could fix the direct symptom by inserting UnlockReleaseBuffer() in front of the continue, but AFAICS that just makes this into a busy-waiting equivalent of waiting unconditionally, so I don't really see why we should not revert the above fragment altogether. However, I suppose Robert had something more intelligent in mind than a tight loop when the buffer can't be exclusively locked, so maybe there is some other change that should be made here instead. regards, tom lane
Re: [HACKERS] 16-bit page checksums for 9.2
* Simon Riggs (si...@2ndquadrant.com) wrote: I discover that non-all-zeroes holes are fairly common, just not very frequent. Curious, might be interesting to find out why. That may or may not be a problem, but not something to be dealt with here and now. But I agree that it's not the job of this patch/effort. It sounds like we have a clear indication, however, that those areas, as they are not necessarily all zeros, should be included in the checksum. Thanks, Stephen
Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.
On Thu, Jan 5, 2012 at 16:59, Andrew Dunstan and...@dunslane.net wrote: On 01/05/2012 06:31 PM, Alex Hunsaker wrote: On Thu, Jan 5, 2012 at 16:02, Andrew Dunstan and...@dunslane.net wrote: Fix breakage from earlier plperl fix. I can't help but think this seems a bit inefficient. So, yes, we should probably adjust this one more time, but ideally we need a better test than just SvREADONLY(). If you want to follow up your investigation of exactly when we need a copied SV and see how much you can narrow it down that would be great.

After further digging I found it chokes on any non-scalar (IOW any reference). I attached a simple C program that I tested with 5.8.9, 5.10.1, 5.12.4 and 5.14.2 (for those who did not know about it, perlbrew made testing across all those perls relatively painless). PFA a patch that copies only if it's readonly and not a plain scalar. Also I fixed up Tom's complaint about having sv2cstr() inside do_util_elog's PG_TRY block. I didn't bother fixing the ones in plperl.c though -- some seemed like they would require quite a bit of rejiggering. I didn't bother adding regression tests -- should I have?

*** a/src/pl/plperl/Util.xs
--- b/src/pl/plperl/Util.xs
***************
*** 37,47 ****
  static void
  do_util_elog(int level, SV *msg)
  {
  	MemoryContext oldcontext = CurrentMemoryContext;
! 	char * volatile cmsg = NULL;
  
  	PG_TRY();
  	{
- 		cmsg = sv2cstr(msg);
  		elog(level, "%s", cmsg);
  		pfree(cmsg);
  	}
--- 37,46 ----
  do_util_elog(int level, SV *msg)
  {
  	MemoryContext oldcontext = CurrentMemoryContext;
! 	char * volatile cmsg = sv2cstr(msg);
  
  	PG_TRY();
  	{
  		elog(level, "%s", cmsg);
  		pfree(cmsg);
  	}
*** a/src/pl/plperl/plperl_helpers.h
--- b/src/pl/plperl/plperl_helpers.h
***************
*** 47,74 ****
  sv2cstr(SV *sv)
  {
  	char *val, *res;
  	STRLEN len;
! 	SV *nsv;
  
  	/*
  	 * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
  	 *
! 	 * SvPVutf8() croaks nastily on certain things, like typeglobs and
! 	 * readonly objects such as $^V. That's a perl bug - it's not supposed to
! 	 * happen. To avoid crashing the backend, we make a copy of the
! 	 * sv before passing it to SvPVutf8(). The copy is garbage collected
! 	 * when we're done with it.
  	 */
! 	nsv = newSVsv(sv);
! 	val = SvPVutf8(nsv, len);
  
  	/*
  	 * we use perl's length in the event we had an embedded null byte to ensure
  	 * we error out properly
  	 */
! 	res = utf_u2e(val, len);
  
  	/* safe now to garbage collect the new SV */
! 	SvREFCNT_dec(nsv);
  
  	return res;
  }
--- 47,79 ----
  {
  	char *val, *res;
  	STRLEN len;
! 	svtype type = SvTYPE(sv);
  
  	/*
  	 * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
  	 *
! 	 * SvPVutf8() croaks nastily on readonly refs. That's a perl bug - it's not
! 	 * supposed to happen. To avoid crashing the backend, we make a copy of the
! 	 * sv before passing it to SvPVutf8().
  	 */
! 	if (SvREADONLY(sv) &&
! 		(type != SVt_IV ||
! 		 type != SVt_NV ||
! 		 type != SVt_PV))
! 		sv = newSVsv(sv);
! 	else
! 		SvREFCNT_inc(sv);
! 
! 	val = SvPVutf8(sv, len);
  
  	/*
  	 * we use perl's length in the event we had an embedded null byte to ensure
  	 * we error out properly
  	 */
! 	res = utf_u2e(val, len);
  
  	/* safe now to garbage collect the new SV */
! 	SvREFCNT_dec(sv);
  
  	return res;
  }

/*
 * compile with gcc -O2 -ggdb `perl -MExtUtils::Embed -e ccopts -e ldopts` svutf8_ro_test.c
 */
#include "EXTERN.h"
#include "perl.h"

int
main(void)
{
	char	   *embed[] = {"", "-e", "0"};
	int			x;
	AV		   *test;
	PerlInterpreter *perl;

	perl = perl_alloc();
	perl_construct(perl);
	perl_parse(perl, NULL, 3, embed, NULL);
	perl_run(perl);

	eval_pv("my $scalar = 'string';"
			"@test = ("
			" 'string', $scalar, \\$scalar, 1, 1.5,"
			" [], {}, $^V, \"\", v5.0.0, sub {}, qr//,"
			" *STDIN, bless({}, ''),"
			");", 1);
	test = get_av("test", 0);

	for (x = 0; x <= av_len(test); x++)
	{
		char	   *crap;
		STRLEN		len;
		SV		   *sv = *av_fetch(test, x, 0);
		svtype		type = SvTYPE(sv);

		SvREADONLY_on(sv);

		if (SvREADONLY(sv) &&
			(type != SVt_IV ||
			 type != SVt_NV ||
			 type != SVt_PV))
			sv = newSVsv(sv);

		crap = SvPVutf8(sv, len);
	}

	perl_destruct(perl);
	perl_free(perl);
	return 0;
}