[HACKERS] patch: Use pg_mbcliplen for truncation in text-to-name conversion
The text_name function was truncating its input string to the first NAMEDATALEN-1 bytes, which is wrong if the string has multi-byte characters. This patch changes it to use pg_mbcliplen, following the namein function.

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index a5592d5..02fe9b4 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -2255,8 +2255,7 @@ text_name(PG_FUNCTION_ARGS)
 	len = VARSIZE_ANY_EXHDR(s);
 
 	/* Truncate oversize input */
-	if (len >= NAMEDATALEN)
-		len = NAMEDATALEN - 1;
+	len = pg_mbcliplen(VARDATA_ANY(s), len, NAMEDATALEN - 1);
 
 	result = (Name) palloc(NAMEDATALEN);
 	memcpy(NameStr(*result), VARDATA_ANY(s), len);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
index b359d52..f4b58f1 100644
--- a/src/test/regress/expected/name.out
+++ b/src/test/regress/expected/name.out
@@ -15,6 +15,19 @@ SELECT name 'name string' = name 'name string ' AS False;
  f
 (1 row)
 
+-- name truncation with unicode characters
+SELECT length(repeat(U&'\0400', 32)::unknown::name::bytea, 'utf8') as length_from_unknown;
+ length_from_unknown
+---------------------
+                  31
+(1 row)
+
+SELECT length(repeat(U&'\0400', 32)::text::name::bytea, 'utf8') as length_from_text;
+ length_from_text
+------------------
+               31
+(1 row)
+
 --
 --
 --
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
index 1c7a671..9f7a5f0 100644
--- a/src/test/regress/sql/name.sql
+++ b/src/test/regress/sql/name.sql
@@ -8,6 +8,10 @@
 SELECT name 'name string' = name 'name string' AS True;
 SELECT name 'name string' = name 'name string ' AS False;
 
+-- name truncation with unicode characters
+SELECT length(repeat(U&'\0400', 32)::unknown::name::bytea, 'utf8') as length_from_unknown;
+SELECT length(repeat(U&'\0400', 32)::text::name::bytea, 'utf8') as length_from_text;
+
 --
 --
 --

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
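The character-boundary-aware clipping the patch relies on can be sketched outside the server. The function below is a simplified, hypothetical stand-in for pg_mbcliplen, specialized to UTF-8 (the real function dispatches on the database encoding); it shows why clipping 32 copies of U+0400 to NAMEDATALEN - 1 = 63 bytes keeps 31 whole characters, matching the expected regression output above.

```python
# Simplified, illustrative stand-in for pg_mbcliplen, UTF-8 only.
def utf8_cliplen(data: bytes, limit: int) -> int:
    """Largest byte count <= limit that does not split a multi-byte character."""
    clipped = 0
    while clipped < len(data):
        lead = data[clipped]
        # UTF-8 sequence length from the lead byte (assumes valid input).
        seq = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
        if clipped + seq > limit or clipped + seq > len(data):
            break
        clipped += seq
    return clipped

# U+0400 is 2 bytes in UTF-8, so 32 repetitions take 64 bytes.
name = ("\u0400" * 32).encode("utf-8")
assert len(name) == 64

# Blind byte truncation at 63 bytes would split the 32nd character;
# character-aware clipping keeps 31 whole characters = 62 bytes.
assert utf8_cliplen(name, 63) == 62
assert name[:62].decode("utf-8") == "\u0400" * 31
```

For all-ASCII input the clip degenerates to plain byte truncation, which is why the old code only misbehaved on multi-byte encodings.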
[HACKERS] proposal: features for simpler integration psql to bash
Hello

We don't have a macro language in psql. I did some work a few years ago - see enhanced psql, http://postgres.cz/wiki/Enhanced-psql - but the important part of that experimental patch was not accepted (the implementation was not production-quality code and was not finalized).

Implementing our own macro language is relatively difficult - execution in psql is complex because there is support for interactive multiline editing, support for history, ... Actually we can already use a lot of accessible languages, so a new interpreter is not necessary. A more interesting feature would be better integration of psql with bash: the possibility to call bash scripts like custom backslash statements, and similar.

I have ideas for the following features:

a) A new output format that supports simple reading of results from bash. I would like to simplify this code:

pavel ~ $ psql postgres -A -t --field-separator=' ' \
    -c "copy (select 1,'Ahoj Svete', i from generate_series(1,3) g(i)) to stdout delimiter ' '" | \
    while read var1 var2 var3; do echo a=$var1,b=$var2,c=$var3; done

to:

pavel ~ $ psql postgres --format=bash \
    -c "select 1,'Ahoj Svete', i from generate_series(1,3) g(i)" | \
    while read var1 var2 var3; do echo a=$var1,b=$var2,c=$var3; done

b) The possibility to store a query result in psql variables:

\execute [query] into var1, var2, var3

c) Enhancing the communication protocol to support access to client and system variables. We have a good implementation of plpgsql, but usage of this language for scripting is difficult - there is no simple way to parametrize code execution. I believe we can solve this issue with the following functions:

* set_client_variable(varname, value)
* set_client_system_variable(varname, value)
* get_client_variable(varname, value)
* get_client_system_variable(varname, value)

These functions would be executed on the server, but they enable access to client information and configuration - and enable simple parametrization of DO statements (via psql variables).

What do you think about these ideas?
Regards

Pavel Stehule
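The proposal's `--format=bash` is only an idea, but the core problem such a format would have to solve can be sketched concretely: a field that contains the separator breaks bash's `read`-based word splitting. This illustrative Python model (all names are mine, not psql's) shows the failure and the shell-quoting that a bash-friendly format would need:

```python
import shlex

# The example row from the proposal: the second field contains the separator.
row = (1, "Ahoj Svete", 2)
sep = " "

# What "psql -A -t --field-separator=' '" emits today: fields joined naively.
line = sep.join(str(f) for f in row)
assert line == "1 Ahoj Svete 2"

# bash's `read var1 var2 var3` word-splits on whitespace, so var2 gets
# only "Ahoj" and var3 gets the rest -- silently wrong:
var1, var2, var3 = line.split(sep, 2)
assert (var1, var2, var3) == ("1", "Ahoj", "Svete 2")

# A bash-friendly output mode would have to shell-quote each field so the
# shell's own splitting recovers the original values:
quoted = sep.join(shlex.quote(str(f)) for f in row)
assert shlex.split(quoted) == ["1", "Ahoj Svete", "2"]
```

`shlex.quote`/`shlex.split` follow POSIX shell quoting rules, which is exactly the contract a `--format=bash` mode would target.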
Re: [HACKERS] pg_stat_statments queryid
On Thu, May 24, 2012 at 5:20 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
> On 24 May 2012 16:06, Tom Lane t...@sss.pgh.pa.us wrote:
>> I do not want to promise that it's stable over any timeframe longer
>> than a server reboot.
>
> You already have though, since pg_stat_statements persistently stores
> statistics to disk by default, and can only ever recognise statement
> equivalence based on the (dbid, userid, queryid) hash key.

Yes, if that's actually a problem, the whole way pg_stat_statements stores its data persistently across restarts needs to be rewritten, in a way that introduces an identifier that *is* stable across restarts. In which case we could just expose that identifier instead, and we're done. What exactly is it that could/would be unstable across a reboot? Not being stable across an initdb is of course a whole different story - I think it's perfectly reasonable not to be that.

>> Aside from the OID dependence problem, we might well change the way
>> the hash is calculated in minor releases, for example by adding or
>> removing struct fields.
>
> You've already invalidated the saved statistics if you do that, so all
> bets are off anyway. If you have to do it, it'll be necessary to bump
> PGSS_FILE_HEADER, so that pg_stat_statements will be cleared upon
> restart. That will in turn necessitate documenting the issue in the
> minor version release notes. I'd hope to avoid that, but it doesn't
> seem to me that the situation is made any worse than before by
> exposing the value. On the contrary, it could help users to understand
> where the problem may have affected them.

Agreed. We already break something very user-visible in this case. Two symptoms of the same breakage is really not that big an issue, IMO, compared to the big gains to be had.

> If you don't expose the value, users are going to do this sort of
> thing anyway, but will be far worse off due to using the query text or
> a hash thereof instead of the internal value.

Exactly.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
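The PGSS_FILE_HEADER mechanism discussed above can be modeled as a version tag at the front of the saved stats file: bump the tag whenever the layout or the hash computation changes, and stale files are simply discarded at load time. This is an illustrative Python sketch, not the actual pg_stat_statements file format, and the header value here is made up:

```python
import io
import struct

# Hypothetical version tag; bumped when the on-disk layout or the
# queryid hash computation changes, invalidating old saved stats.
PGSS_FILE_HEADER = 0x20120525

def save_stats(buf, entries):
    """Write a header followed by (queryid, calls) records."""
    buf.write(struct.pack("<I", PGSS_FILE_HEADER))
    for queryid, calls in entries:
        buf.write(struct.pack("<qq", queryid, calls))

def load_stats(buf):
    """Load records, discarding the file entirely on a header mismatch."""
    header, = struct.unpack("<I", buf.read(4))
    if header != PGSS_FILE_HEADER:
        return []  # stats saved by an incompatible version: start fresh
    entries = []
    while chunk := buf.read(16):
        entries.append(struct.unpack("<qq", chunk))
    return entries

buf = io.BytesIO()
save_stats(buf, [(12345, 7)])
buf.seek(0)
assert load_stats(buf) == [(12345, 7)]

# A file written under a different header version is rejected wholesale.
assert load_stats(io.BytesIO(struct.pack("<I", 0xDEAD) + struct.pack("<qq", 1, 1))) == []
```

This is why exposing the queryid does not make the minor-release problem worse: a header bump already wipes the persisted statistics, so the exposed identifier and the saved stats become invalid together.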
Re: [HACKERS] libpq URL syntax vs SQLAlchemy
Alex Shulgin a...@commandprompt.com writes:
> Upon closer inspection of the issue I came to believe that the proper
> fix is to drop support for special treatment of a host part starting
> with a slash altogether. Attached is a patch to do that.

Well, I understand I might be asking for too much, but did anyone have a chance to look at the correcting patch? We're having the first CommitFest of 9.3 in three weeks, so this better be dealt with before it's too late.

I believe the correcting patch makes our implementation comply with RFC 3986.

I can produce a patch against 9.1 for improved readability: the removal of the special handling of '/' at the start of the URI created a bit of a mess in the correcting patch, so it might be easier to look at the combined effect of the committed patch and this one.

Comments please?

--
Regards,
Alex
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Thu, May 24, 2012 at 7:02 PM, Fujii Masao masao.fu...@gmail.com wrote:
> On Wed, May 23, 2012 at 9:25 PM, Magnus Hagander mag...@hagander.net wrote:
>> While reviewing and cleaning this patch up a bit I noticed it actually
>> broke pg_receivexlog in the renaming. Here is a new version of the
>> patch, reworked based on the above so we're down to a single callback.
>> I moved the "rename last segment file even if it's not complete" part
>> to be a parameter of ReceiveXlogStream() instead of trying to overload
>> a third functionality on the callback (which is what broke
>> pg_receivexlog). How does this look? Have I overlooked any cases?
>
> Thanks for the patch! Looks good to me except the followings:
>
> pg_basebackup.c:233: warning: passing argument 6 of 'ReceiveXlogStream'
> from incompatible pointer type

Hmm. I could've sworn I fixed that. I think I forgot to refresh the patch :-)

> It seems confusing that *stream_continue()* returns TRUE when streaming
> *cannot continue*, i.e., its name seems to be inconsistent with what it
> does. What about renaming it to stream_stop?

That's a pre-existing issue, but agreed, I will rename it.

> Similarly, it also seems confusing that *continue_streaming()* returns
> TRUE when streaming *cannot continue*.

Yeah, I renamed that one to stop_streaming as well. Will apply the updated version.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
[HACKERS] proclock table corrupted
Hi,

We are encountering the following error during normal operation of postgres:

postgres[10982]: [2-1] PANIC: proclock table corrupted

Version of Postgres: 9.0.3
Architecture: mips
OS: RedHat Linux

Can you please let me know if 'fix-strong-lock-cleanup.patch' and this error are related?

Regards,
Harshitha
Re: [HACKERS] heap metapages
On 24 May 2012 23:02, Bruce Momjian br...@momjian.us wrote:
> On Tue, May 22, 2012 at 09:52:30AM +0100, Simon Riggs wrote:
>> Having pg_upgrade touch data files is both dangerous and difficult to
>> back out in case of mistake, so I am wary of putting the metapage at
>> block 0. Doing it the way I suggest means the .meta files would be
>> wholly new and can be deleted as a back-out. We can also clean away
>> any unnecessary .vm/.fsm files as a later step.
>
> Pg_upgrade never modifies the old cluster, except to lock it in link
> mode, so there is never anything to back out.

Agreed. Robert's proposal was to make pg_upgrade modify the cluster, which I was observing wasn't a good plan.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 24, 2012 at 6:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
> On Thu, 24 May 2012, Jeff Janes wrote:
>> Add #define LWLOCK_STATS near the top of
>> src/backend/storage/lmgr/lwlock.c and recompile and run a reduced-size
>> workload. When the processes exit, they will dump a lot of data about
>> LWLock usage to the logfile. Generally the LWLock with the most blocks
>> on it will be the main culprit.
>
> Here is the output from a multi-threaded run (8 threads, 22 seconds)
> sorted by blk. Not sure whether that's of much use or not:
>
> PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
> PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
> PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
> PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
> PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
> PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
> PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
> PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
> PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
> PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
> PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
> PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
> PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
> PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
> PID 7114 lwlock 47: shacq 227986 exacq 929 blk 449
> PID 7115 lwlock 44: shacq 1127495 exacq 633 blk 401
> PID 7115 lwlock 48: shacq 1124666 exacq 559 blk 397
> PID 7112 lwlock 47: shacq 227993 exacq 1248 blk 387
> PID 7110 lwlock 47: shacq 228218 exacq 1082 blk 378
> PID 7111 lwlock 47: shacq 228093 exacq 907 blk 321
> PID 7114 lwlock 42: shacq 232591 exacq 935 blk 318
> PID 7113 lwlock 47: shacq 228085 exacq 909 blk 306
> PID 7112 lwlock 34: shacq 343247 exacq 1255 blk 271
> PID 7110 lwlock 42: shacq 232599 exacq 1049 blk 262
> PID 7111 lwlock 34: shacq 343398 exacq 926 blk 255
> PID 7112 lwlock 42: shacq 232505 exacq 1262 blk 240
> PID 7108 lwlock 42: shacq 233215 exacq 1539 blk 237
> PID 7110 lwlock 43: shacq 438768 exacq 1026 blk 231
> PID 7114 lwlock 43: shacq 438795 exacq 898 blk 230
> PID 7113 lwlock 34: shacq 343283 exacq 832 blk 226
> PID 7110 lwlock 34: shacq 343338 exacq 1074 blk 226
> PID 7114 lwlock 0: shacq 0 exacq 14864 blk 219
> PID 7112 lwlock 43: shacq 438691 exacq 1123 blk 215
> PID 7111 lwlock 42: shacq 232645 exacq 885 blk 215
> PID 7114 lwlock 34: shacq 343362 exacq 939 blk 214
> PID 7111 lwlock 0: shacq 0 exacq 13638 blk 205
> PID 7111 lwlock 43: shacq 438817 exacq 899 blk 200
> PID 7114 lwlock 7: shacq 0 exacq 547 blk 193
> PID 7113 lwlock 42: shacq 232600 exacq 860 blk 192
> PID 7110 lwlock 0: shacq 0 exacq 16862 blk 191
> PID 7114 lwlock 38: shacq 333266 exacq 977 blk 183
> PID 7112 lwlock 38: shacq 333175 exacq 1246 blk 183
> PID 7113 lwlock 7: shacq 0 exacq 548 blk 178
> PID 7113 lwlock 43: shacq 438720 exacq 833 blk 178
> PID 7115 lwlock 7: shacq 0 exacq 549 blk 177
> PID 7108 lwlock 47: shacq 228682 exacq 1495 blk 177
> PID 7108 lwlock 34: shacq 343982 exacq 1595 blk 177
> PID 7112 lwlock 0: shacq 0 exacq 19538 blk 172
> PID 7111 lwlock 7: shacq 0 exacq 549 blk 172
> PID 7115 lwlock 17: shacq 0 exacq 9927 blk 167
> PID 7109 lwlock 34: shacq 343410 exacq 348 blk 167
> PID 7112 lwlock 7: shacq 0 exacq 548 blk 166
> PID 7110 lwlock 38: shacq 333249 exacq 1013 blk 165
> PID 7113 lwlock 38: shacq 333226 exacq 828 blk 163
> PID 7110 lwlock 7: shacq 0 exacq 548 blk 162
> PID 7109 lwlock 7: shacq 0 exacq 548 blk 161
> PID 7109 lwlock 47: shacq 228097 exacq 386 blk 158
> PID 7112 lwlock 45: shacq 130843 exacq 1201 blk 154
> PID 7109 lwlock 43: shacq 438787 exacq 314 blk 153
> PID 7111 lwlock 38: shacq 06 exacq 882 blk 152
> PID 7108 lwlock 43: shacq 439454 exacq 1614 blk 151
> PID 7108 lwlock 17: shacq 0 exacq 23165 blk 147
> PID 7108 lwlock 7: shacq 0 exacq 549 blk 145
> PID 7113 lwlock 0: shacq 0 exacq 13394 blk 142
> PID 7112 lwlock 46: shacq 132972 exacq 1196 blk 142
> PID 7112 lwlock 36: shacq 204508 exacq 1180 blk 140
> PID 7109 lwlock 42: shacq 232611 exacq 363 blk 140
> PID 7115 lwlock 34: shacq 343582 exacq 600 blk 139
> PID 7114 lwlock 45: shacq 130818 exacq 903 blk 139
> PID 7115 lwlock 42: shacq 232846 exacq 517 blk 138
> PID 7108 lwlock 38: shacq 333989 exacq 1704 blk 137
> PID 7115 lwlock 43: shacq 438993 exacq 558 blk 126
> PID 7115 lwlock 0: shacq 0 exacq 3161 blk 125
> PID 7110 lwlock 40: shacq 129577 exacq 1021 blk 122
> PID 7113 lwlock 45: shacq 130833 exacq 814 blk 120
> PID 7111 lwlock 17: shacq 0 exacq 4607 blk 119
> PID 7109 lwlock 0: shacq 0 exacq 5711 blk 119
> PID 7110 lwlock 45: shacq 130865 exacq 1001 blk 117
> PID 7109 lwlock 38: shacq 40 exacq 322 blk 115
> PID 7114 lwlock 46: shacq 132997 exacq 860 blk 112
> PID 7113 lwlock 36: shacq 204566 exacq 833 blk 112
> PID 7112 lwlock 40: shacq 129528 exacq 1234 blk 111
> PID 7115 lwlock 47: shacq 228397 exacq 516 blk 109
> PID 7114 lwlock 36: shacq 204634 exacq 930 blk 109
> PID 7112 lwlock 41: shacq 133600 exacq 1136 blk 109
> PID 7111 lwlock 36: shacq 204675 exacq 830 blk 108
> PID 7108 lwlock
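A quick way to make sense of a dump like this is to aggregate the blk column per lock across all backends, since the per-PID breakdown obscures which lock is hottest overall. This sketch assumes the one-entry-per-line LWLOCK_STATS format shown above and uses a three-line excerpt of it:

```python
import re
from collections import Counter

# Excerpt of the LWLOCK_STATS dump, format:
# "PID <pid> lwlock <n>: shacq <a> exacq <b> blk <c>"
sample = """\
PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
"""

pattern = re.compile(r"PID (\d+) lwlock (\d+): shacq (\d+) exacq (\d+) blk (\d+)")

# Sum the blocked count per lock id, ignoring which backend hit it.
blocked = Counter()
for m in pattern.finditer(sample):
    pid, lock, shacq, exacq, blk = map(int, m.groups())
    blocked[lock] += blk

# Lock 48 dominates this excerpt: 1373 + 1110 = 2483 blocks.
assert blocked.most_common(1) == [(48, 2483)]
assert blocked[44] == 838
```

Run over the full dump, the same aggregation confirms what the sorted listing suggests: locks 48 and 44 (buffer partition locks) account for the bulk of the blocking.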
Re: [HACKERS] pg_stat_statements temporary file
On Thu, May 24, 2012 at 2:19 PM, Magnus Hagander mag...@hagander.net wrote:
> On Thu, May 24, 2012 at 2:16 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
>> On 24 May 2012 12:42, Magnus Hagander mag...@hagander.net wrote:
>>> What actually happens if it tries to repalloc() something huge?
>>> palloc will throw an elog(ERROR), and since this happens during
>>> postmaster startup, are you sure it won't prevent the server from
>>> starting?
>>
>> Oh, yes, missed that.
>>
>>     /* Previous incarnation might have had a larger query_size */
>>     if (temp.query_len >= buffer_size)
>>     {
>>         buffer = (char *) repalloc(buffer, temp.query_len + 1);
>>         buffer_size = temp.query_len + 1;
>>     }
>>
>> Here, temp receives its value from an fread(). This could probably be
>> coded to be defensive against such things, but a better fix would be
>> preferred. I have to wonder how much of a problem corruption is likely
>> to be though, given that we only save to disk in a corresponding
>> pgss_shmem_shutdown() call, which actually has more protections
>> against corruption. The window for the saved file to be corrupt seems
>> rather small, though I accept that a better window would be zero.
>
> Right. But writing to a temp file and rename()ing it into place is
> trivial. It's really the other issues raised that are bigger ;)

Here's a patch that does the two easy fixes:

1) writes the file to a temp file and rename()s it over the main file once the write is complete. This removes the (small) risk of corruption because of a crash during the write

2) unlinks the file after reading it. This makes sure it's not included in online backups.

I still think we should consider the placement of this file to not be in the global/ directory, but this is a quick (back-patchable) fix...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

pg_stat_statements_file.patch
Description: Binary data
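The write-to-temp-then-rename() scheme in fix 1) is a standard atomicity idiom and can be sketched in a few lines. This is an illustrative model, not the patch itself: `os.replace` plays the role of rename(), and the file names are made up.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write data via a temp file + rename, so a crash mid-write can
    never leave a torn file at the final path."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # data must reach disk before the rename
    os.replace(tmp, path)      # atomic rename over any existing file

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "pg_stat_statements.stat")
    atomic_write(target, b"header|entries")
    # A later save atomically replaces the old copy; readers only ever
    # see the old complete file or the new complete file.
    atomic_write(target, b"header|newer entries")
    with open(target, "rb") as f:
        assert f.read() == b"header|newer entries"
    assert not os.path.exists(target + ".tmp")
```

The rename is atomic on POSIX filesystems, which is exactly the property that closes the crash-during-write window Peter described.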
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, 25 May 2012, Merlin Moncure wrote:
> These are all on the buffer partition locks. That makes sense...I was
> wrong earlier: this case was in fact 'create table as', not 'insert
> select', which rules out both the freelist lock and the wal insert
> lock, because create table as gets to use both a bulk insert strategy
> and wal avoiding logic (assuming wal log level is not 'archive' or
> higher). So, why aren't the lock partitions helping here?

FYI the WAL level is default, and here are the relevant changed params from the .conf (autovacuum was disabled but the data was vacuumed manually):

max_connections = 100            # (change requires restart)
shared_buffers = 10GB            # min 128kB
work_mem = 500MB                 # min 64kB
maintenance_work_mem = 500MB     # min 1MB
synchronous_commit = off         # synchronization level;
checkpoint_segments = 20         # in logfile segments, min 1, 16MB each
enable_hashjoin = off
enable_mergejoin = off
effective_cache_size = 10GB
autovacuum = off                 # Enable autovacuum subprocess?  'on'

Regards,
S

*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551
Web: http://www.ast.cam.ac.uk/~koposov/
[HACKERS] [PATCH] Make skipped sort explicit in EXPLAIN ANALYZE
Hi,

This is just a small clarity improvement. tuplesort_performsort() skips sorting entirely when the result set has 0 or 1 tuples, but EXPLAIN still says it's using quicksort. The patch changes that to "skipped". For example:

db=# explain analyze select * from now() order by 1;
 Sort  (cost=0.02..0.03 rows=1 width=8) (actual time=0.126..0.126 rows=1 loops=1)
   Sort Key: now
   Sort Method: skipped  Memory: 25kB
   ->  Function Scan on now  (cost=0.00..0.01 rows=1 width=8) (actual time=0.032..0.033 rows=1 loops=1)

Patch attached.

Regards,
Marti

explain-sort-skipped.patch
Description: Binary data
Re: [HACKERS] proclock table corrupted
Harshitha S hershe...@gmail.com writes:
> We are encountering the following error during normal operation of
> postgres.
> postgres[10982]: [2-1] PANIC: proclock table corrupted

Ugh. Can you provide a reproducible test case?

> Version of Postgres: 9.0.3
> Architecture: mips
> OS: RedHat Linux

[ raised eyebrow... ] I've been working at Red Hat for ten years, and I'm pretty sure they have never shipped a MIPS-based distro in that time. So what is that OS really?

> Can you please let me know if 'fix-strong-lock-cleanup.patch' and this
> error are related?

This is not an adequate identification of what patch you are talking about; but if you are speaking of something related to Robert Haas' fast-path locking code, that's not in 9.0.x.

regards, tom lane
Re: [HACKERS] [PATCH] Make skipped sort explicit in EXPLAIN ANALYZE
Marti Raudsepp ma...@juffo.org writes:
> This is just a small clarity improvement. tuplesort_performsort()
> skips sorting entirely when the result set has 0 or 1 tuples, but
> EXPLAIN still says it's using quicksort. The patch changes that to
> "skipped".

I'm not convinced this is an improvement; it's just one more value to confuse people with. And any implementation of quicksort is likely to have a fast exit path for 0 or 1 item, so I don't think the existing display is incorrect either.

regards, tom lane
Re: [HACKERS] pg_stat_statements temporary file
On 25 May 2012 14:13, Magnus Hagander mag...@hagander.net wrote:
> Here's a patch that does the two easy fixes:
> 1) writes the file to a temp file and rename()s it over the main file
> as it writes down. This removes the (small) risk of corruption because
> of a crash during write
> 2) unlinks the file after reading it. this makes sure it's not
> included in online backups.

Seems reasonable. It might be better to consistently concatenate the string literals PGSS_DUMP_FILE and ".tmp" statically. Also, I'd have updated the string in the errmsg callsite after the error tag too, to refer to the tmp file rather than the file proper. Forgive the pedantry, but I should mention that I believe that it is project policy to not use squiggly parenthesis following an if expression when that is unnecessary due to there only being a single statement.

> I still think we should consider the placement of this file to not be
> in the global/ directory, but this is a quick (back-patchable) fix...

Where do you suggest the file be written to?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] pg_stat_statements temporary file
On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:
>> I still think we should consider the placement of this file to not be
>> in the global/ directory, but this is a quick (back-patchable) fix...
> Where do you suggest the file be written to?

One could argue stats_temp_directory would be the correct place.

Andres
--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_stat_statements temporary file
Peter Geoghegan pe...@2ndquadrant.com writes:
> On 25 May 2012 14:13, Magnus Hagander mag...@hagander.net wrote:
>> I still think we should consider the placement of this file to not be
>> in the global/ directory, but this is a quick (back-patchable) fix...
> Where do you suggest the file be written to?

Given that pgstats keeps its permanent file in global/, I think the argument that pg_stat_statements should not do likewise is pretty thin.

regards, tom lane
Re: [HACKERS] [PATCH] Make skipped sort explicit in EXPLAIN ANALYZE
On Fri, May 25, 2012 at 4:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> I'm not convinced this is an improvement; it's just one more value to
> confuse people with.

I understand where you're coming from, but personally I think the current output is more confusing: "Gee, Postgres is stupid, it's sorting when there's nothing to sort!" But let's wait for a third opinion.

Regards,
Marti
[HACKERS] pg_upgrade libraries check
pg_upgrade is a little over-keen about checking for shared libraries that back functions. In particular, it checks for libraries that support functions created in pg_catalog, even if pg_dump doesn't export the function. The attached patch mimics the filter that pg_dump uses for functions, so that only the relevant libraries are checked. This would remove the need for a particularly ugly hack in making the 9.1 backport of JSON binary-upgradeable.

cheers

andrew

*** a/contrib/pg_upgrade/function.c
--- b/contrib/pg_upgrade/function.c
***************
*** 11,16 ****
--- 11,17 ----
  
  #include "pg_upgrade.h"
+ #include "pqexpbuffer.h"
  
  #include "access/transam.h"
  
  #define PG_UPGRADE_SUPPORT "$libdir/pg_upgrade_support"
***************
*** 141,157 ****
  	{
  		DbInfo	   *active_db = &old_cluster.dbarr.dbs[dbnum];
  		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
  
! 		/* Fetch all libraries referenced in this DB */
! 		ress[dbnum] = executeQueryOrDie(conn,
! 										"SELECT DISTINCT probin "
! 										"FROM pg_catalog.pg_proc "
! 										"WHERE prolang = 13 /* C */ AND "
! 										"probin IS NOT NULL AND "
! 										"oid >= %u;",
! 										FirstNormalObjectId);
  		totaltups += PQntuples(ress[dbnum]);
  		PQfinish(conn);
  	}
--- 142,183 ----
  	{
  		DbInfo	   *active_db = &old_cluster.dbarr.dbs[dbnum];
  		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
+ 		PQExpBufferData query;
  
! 		initPQExpBuffer(&query);
! 
! 		/*
! 		 * pg_dump doesn't export functions in pg_catalog unless (and only in
! 		 * 9.1 and later) they are part of an extension.  We therefore make
! 		 * the same exclusions when choosing which libraries to test for.
! 		 */
! 		appendPQExpBufferStr(&query,
! 							 "SELECT DISTINCT probin "
! 							 "FROM pg_catalog.pg_proc p "
! 							 "WHERE prolang = 13 /* C */ AND "
! 							 "probin IS NOT NULL AND "
! 							 "oid >= %u AND ("
! 							 "pronamespace != "
! 							 "(SELECT oid FROM pg_catalog.pg_namespace "
! 							 "WHERE nspname = 'pg_catalog')");
! 
! 		if (GET_MAJOR_VERSION(old_cluster.major_version) >= 901)
! 			appendPQExpBufferStr(&query,
! 								 " OR EXISTS(SELECT 1 FROM pg_catalog.pg_depend "
! 								 "WHERE classid = 'pg_proc'::regclass AND "
! 								 "objid = p.oid AND "
! 								 "refclassid = 'pg_extension'::regclass AND "
! 								 "deptype = 'e')");
! 
! 		appendPQExpBufferStr(&query, ")");
! 
! 		/* Fetch all required libraries referenced in this DB */
! 		ress[dbnum] = executeQueryOrDie(conn, query.data, FirstNormalObjectId);
  		totaltups += PQntuples(ress[dbnum]);
+ 
+ 		termPQExpBuffer(&query);
+ 
  		PQfinish(conn);
  	}
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, May 25, 2012 at 8:06 AM, Merlin Moncure mmonc...@gmail.com wrote:
> On Thu, May 24, 2012 at 6:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
>> On Thu, 24 May 2012, Jeff Janes wrote:
>>> Add #define LWLOCK_STATS near the top of
>>> src/backend/storage/lmgr/lwlock.c and recompile and run a
>>> reduced-size workload. When the processes exit, they will dump a lot
>>> of data about LWLock usage to the logfile. Generally the LWLock with
>>> the most blocks on it will be the main culprit.
>>
>> Here is the output from a multi-threaded run (8 threads, 22 seconds)
>> sorted by blk. Not sure whether that's of much use or not:
>>
>> [LWLOCK_STATS dump quoted in full upthread]
Re: [HACKERS] [PATCH] Make skipped sort explicit in EXPLAIN ANALYZE
On 25 May 2012 15:19, Marti Raudsepp ma...@juffo.org wrote: I understand where you're coming from, but personally I think the current output is more confusing: Gee Postgres is stupid, it's sorting when there's nothing to sort! But let's wait for a third opinion. I agree with Tom. The idea that you sort when there's nothing to sort is not confusing. Textbook implementations of recursive sorting algorithms explicitly have a notion of sorting one element, by simply recognising that one element must already be sorted. For example, look at the quicksort pseudocode here: http://en.wikipedia.org/wiki/Quicksort -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements temporary file
On Fri, May 25, 2012 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: On 25 May 2012 14:13, Magnus Hagander mag...@hagander.net wrote: I still think we should consider the placement of this file to not be in the global/ directory, but this is a quick (back-patchable) fix... Where do you suggest the file be written to? Given that pgstats keeps its permanent file in global/, I think the argument that pg_stat_statements should not do likewise is pretty thin. Fair enough. As long as the file is unlinked after reading (per my patch), it doesn't cause issues on a standby anymore, so it's a lot less important, I guess. It's mostly namespace invasion at this time... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements temporary file
Magnus Hagander mag...@hagander.net writes: On Fri, May 25, 2012 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: Given that pgstats keeps its permanent file in global/, I think the argument that pg_stat_statements should not do likewise is pretty thin. Fair enough. As long as the file is unlinked after reading (per my patch), it doesn't cause issues on a standby anymore, so it's a lot less important, I guess. It's mostly namespace invasion at this time... Well, I could support moving both of those stats files someplace else, but it seems neatnik-ism more than something we have a definable need for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements temporary file
Andres Freund and...@2ndquadrant.com writes: On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote: Where do you suggest the file be written to? One could argue stats_temp_directory would be the correct place. No, that would be exactly the *wrong* place, because that directory can be on a RAM disk. We need to put this somewhere where it'll survive a shutdown. One could imagine creating a PGDATA subdirectory just for permanent (not temp) stats files, but right at the moment that seems like overkill. If we accumulate a few more similar files, I'd start to think it was worth doing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Merlin Moncure mmonc...@gmail.com writes: Hm, what if BufTableHashPartition() was pseudo randomized so that different backends would not get the same buffer partition for a particular tag? Huh? You have to make sure that different backends will find the same buffer for the same page, so I don't see how that can possibly work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interrupting long external library calls
On Thu, May 24, 2012 at 04:37:04PM +0200, Florian Pflug wrote: On May24, 2012, at 15:04 , Sandro Santilli wrote: On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote: On 16.05.2012 15:42, Sandro Santilli wrote: But CHECK_FOR_INTERRUPTS doesn't return, right ? Is there another macro for just checking w/out yet acting upon it ? Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending variable, but on Windows it also checks for UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's not totally certain that CHECK_FOR_INTERRUPTS() won't return. I think InterruptPending can be set spuriously (even if that's not possible today, I wouldn't rely on it), and if you're in a HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing even if InterruptPending is true. The only sane way to make 3rd party code interruptible is to add CHECK_FOR_INTERRUPTS() to it, in safe places. No place is safe if CHECK_FOR_INTERRUPTS doesn't return. How could caller code clean up on interruption ? The postgres way is to use PG_TRY/PG_CATCH to make sure stuff gets cleaned up if an error or an interrupt occurs. You could use those to make the third-party library exception safe, but it'll probably be a quite invasive change :-(. Alternatively, you could replicate the check CHECK_FOR_INTERRUPTS() does, I ended up providing an explicit mechanism to request interruption of whatever the library is doing, and experimented (successfully so far) requesting the interruption from a SIGINT handler. Do you see any major drawback in doing so ? So far I installed the SIGINT handler within the library itself, but I guess it could be moved out instead to have more fine-grained control over when to request interruption. Here's the code installing the signal handler within the library: https://github.com/strk/libgeos/commit/e820ecd0469b777953c132661877c2967b10cee2 --strk; Delivering high quality PostGIS 2.0 !
http://strk.keybit.net - http://vizzuality.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, May 25, 2012 at 10:22 AM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: Hm, what if BufTableHashPartition() was pseudo randomized so that different backends would not get the same buffer partition for a particular tag? Huh? You have to make sure that different backends will find the same buffer for the same page, so I don't see how that can possibly work. Right -- duh. Well, hm. Is this worth fixing? ISTM there's a bit of 'optimizing for pgbench-itis' in the buffer partitions -- they seem optimized to lever the mostly random access behavior of pgbench. But how likely is it to see multiple simultaneous scans in the real world? Interleaving scans like that is not a very effective optimization -- if it were me, I'd be trying to organize something around a partitioned tid scan for parallel sequential access. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
* Merlin Moncure (mmonc...@gmail.com) wrote: Right -- duh. Well, hm. Is this worth fixing? ISTM there's a bit of 'optimizing for pgbench-itis' in the buffer partitions -- they seem optimized to lever the mostly random access behavior of pgbench. But how likely is it to see multiple simultaneous scans in the real world? Interleaving scans like that is not a very effective optimization -- if it was me, it'd be trying to organize something around a partitioned tid scan for parallel sequential access. Didn't we implement a system whereby this is exactly what we intend to happen on the read side- that is, everyone doing a SeqScan gangs up on one ring buffer and follows it, which we felt was going to dramatically improve performance in some cases? Or is this completely different from that..? Thanks, Stephen
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 24, 2012 at 4:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: On Thu, 24 May 2012, Jeff Janes wrote: Add #define LWLOCK_STATS near the top of: src/backend/storage/lmgr/lwlock.c and recompile and run a reduced-size workload. When the processes exit, they will dump a lot of data about LWLock usage to the logfile. Generally the LWLock with the most blocks on it will be the main culprit. Here is the output from a multi-threaded run (8 threads, 22 seconds) sorted by blk. Not sure whether that's of much use or not:
PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
That is not as informative as I thought it would be (except to show WAL was not the issue). I'm guessing that 44 and 48 are the buffer mapping partitions which cover the root block of some highly used index. But just because those things are at the top of the list doesn't mean they are a problem. Something has to be at the top, and they don't dominate the total amount of blocking the way I would expect them to if they were truly a substantial bottleneck. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, May 25, 2012 at 11:17 AM, Stephen Frost sfr...@snowman.net wrote: * Merlin Moncure (mmonc...@gmail.com) wrote: Right -- duh. Well, hm. Is this worth fixing? ISTM there's a bit of 'optimizing for pgbench-itis' in the buffer partitions -- they seem optimized to lever the mostly random access behavior of pgbench. But how likely is it to see multiple simultaneous scans in the real world? Interleaving scans like that is not a very effective optimization -- if it were me, I'd be trying to organize something around a partitioned tid scan for parallel sequential access. Didn't we implement a system whereby this is exactly what we intend to happen on the read side- that is, everyone doing a SeqScan gangs up on one ring buffer and follows it, which we felt was going to dramatically improve performance in some cases? yeah:

    /*
     * If the table is large relative to NBuffers, use a bulk-read access
     * strategy and enable synchronized scanning (see syncscan.c).  Although
     * the thresholds for these features could be different, we make them the
     * same so that there are only two behaviors to tune rather than four.
     * (However, some callers need to be able to disable one or both of these
     * behaviors, independently of the size of the table; also there is a GUC
     * variable that can disable synchronized scanning.)
     *
     * During a rescan, don't make a new strategy object if we don't have to.
     */
    if (!RelationUsesLocalBuffers(scan->rs_rd) &&
        scan->rs_nblocks > NBuffers / 4)
    {
        allow_strat = scan->rs_allow_strat;
        allow_sync = scan->rs_allow_sync;
    }
    else
        allow_strat = allow_sync = false;

    if (allow_strat)
    {
        if (scan->rs_strategy == NULL)
            scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);
    }

I wonder if the logic here is just being too strict... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interrupting long external library calls
Sandro Santilli s...@keybit.net writes: I ended up providing an explicit mechanism to request interruption of whatever the library is doing, and experimented (successfully so far) requesting the interruption from a SIGINT handler. Do you see any major drawback in doing so ? This seems a bit fragile. It might work all right in Postgres, where we tend to set up signal handlers just once at process start, but ISTM other systems might assume they can change their signal handlers at any time. The handler itself looks less than portable anyway --- what about the SIGINFO case? I assume that the geos::util::Interrupt::request() call sets a flag somewhere that's going to be periodically checked in long-running loops. Would it be possible for the periodic checks to include a provision for a callback into Postgres-specific glue code, wherein you could test the same flags CHECK_FOR_INTERRUPTS does? A similar approach might then be usable in other contexts, and it seems safer to me than messing with a host environment's signal handling. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Merlin Moncure mmonc...@gmail.com writes: On Fri, May 25, 2012 at 11:17 AM, Stephen Frost sfr...@snowman.net wrote: Didn't we implement a system whereby this is exactly what we intend to happen on the read side- that is, everyone doing a SeqScan gangs up on one ring buffer and follows it, which we felt was going to dramatically improve performance in some cases? yeah: ... I wonder if the logic here is just being too strict... I don't recall how much evidence there is behind the NBuffers/4 threshold. Maybe that needs some tuning? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, 25 May 2012, Merlin Moncure wrote: how likely is it to see multiple simultaneous scans in the real world? Interleaving scans like that is not a very effective optimization -- if it was me, it'd be trying to organize something around a partitioned tid scan for parallel sequential access. Regarding real-world use, I can say for myself that the reason I'm running queries like the ones in my test is that I have a large inflow of data every day, tens of GB, which has to be processed. Since PG doesn't have any parallelization of queries built in, and running the processing in a single thread would take too long, I'm trying to work around that by manually running multiple threads in PG and splitting the work among them. So not having scalability here is going to hurt us (though I admit it's probably not a very popular use-case for PG). Regards, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements temporary file
On 5/25/12 8:19 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote: Where do you suggest the file be written to? One could argue stats_temp_directory would be the correct place. No, that would be exactly the *wrong* place, because that directory can be on a RAM disk. We need to put this somewhere where it'll survive a shutdown. Mind you, I can imagine a busy system wanting to keep PSS on a ram disk as well. But users should be able to make that decision separately from the stats file. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements temporary file
On Fri, May 25, 2012 at 6:49 PM, Josh Berkus j...@agliodbs.com wrote: On 5/25/12 8:19 AM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote: Where do you suggest the file be written to? One could argue stats_temp_directory would be the correct place. No, that would be exactly the *wrong* place, because that directory can be on a RAM disk. We need to put this somewhere where it'll survive a shutdown. Mind you, I can imagine a busy system wanting to keep PSS on a ram disk as well. But users should be able to make that decision separately from the stats file. Why would they want that? PSS only writes the tempfile on shutdown and reads it on startup. Unlike pgstats which reads and writes it all the time. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements temporary file
Why would they want that? PSS only writes the tempfile on shutdown and reads it on startup. Unlike pgstats which reads and writes it all the time. Ah, ok! Didn't know that. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Thu, May 24, 2012 at 7:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: Here is the output from a multi-threaded run (8thtreads, 22 seconds) sorted by blk. Not sure whether that's of much use or not: What are the top dozen or so entries if you sort by shacq? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes complete
On 5/24/12 2:34 PM, Peter Geoghegan wrote: On 21 May 2012 19:10, Josh Berkus j...@agliodbs.com wrote: For these reasons, it may be timely and appropriate, from a purely advocacy point-of-view, to call our new group commit group commit in release notes and documentation, and announce it as a new feature. First, shouldn't we be having this discussion on -advocacy? Well, no, because this is a specific discussion about release notes. True, but there's also the question of what we call this in the promotional materials. In any case, I've given up on the idea that we should market new group commit as group commit. I believe that that would be a useful and fair way of representing the feature, but there doesn't seem to be any support for that view. What else would you call it? What's wrong with Better Group Commit? From my perspective, it's pretty simple: we had group commit before, but the new group commit is much better. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, 25 May 2012, Robert Haas wrote: On Thu, May 24, 2012 at 7:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote: Here is the output from a multi-threaded run (8 threads, 22 seconds) sorted by blk. Not sure whether that's of much use or not: What are the top dozen or so entries if you sort by shacq? Here it is:
PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
PID 7115 lwlock 44: shacq 1127495 exacq 633 blk 401
PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
PID 7115 lwlock 48: shacq 1124666 exacq 559 blk 397
PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
PID 7115 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7115 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7114 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7114 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7113 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7113 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7112 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7112 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7111 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7111 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7110 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7110 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7109 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7109 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7108 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7108 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7108 lwlock 43: shacq 439454 exacq 1614 blk 151
PID 7115 lwlock 43: shacq 438993 exacq 558 blk 126
PID 7111 lwlock 43: shacq 438817 exacq 899 blk 200
PID 7114 lwlock 43: shacq 438795 exacq 898 blk 230
PID 7109 lwlock 43: shacq 438787 exacq 314 blk 153
PID 7110 lwlock 43: shacq 438768 exacq 1026 blk 231
PID 7113 lwlock 43: shacq 438720 exacq 833 blk 178
PID 7112 lwlock 43: shacq 438691 exacq 1123 blk 215
PID 7108 lwlock 34: shacq 343982 exacq 1595 blk 177
PID 7115 lwlock 34: shacq 343582 exacq 600 blk 139
PID 7109 lwlock 34: shacq 343410 exacq 348 blk 167
PID 7111 lwlock 34: shacq 343398 exacq 926 blk 255
PID 7114 lwlock 34: shacq 343362 exacq 939 blk 214
PID 7110 lwlock 34: shacq 343338 exacq 1074 blk 226
PID 7113 lwlock 34: shacq 343283 exacq 832 blk 226
PID 7112 lwlock 34: shacq 343247 exacq 1255 blk 271
PID 7108 lwlock 38: shacq 333989 exacq 1704 blk 137
PID 7115 lwlock 38: shacq 333512 exacq 581 blk 106
PID 7109 lwlock 38: shacq 40 exacq 322 blk 115
PID 7111 lwlock 38: shacq 06 exacq 882 blk 152
PID 7114 lwlock 38: shacq 333266 exacq 977 blk 183
PID 7110 lwlock 38: shacq 333249 exacq 1013 blk 165
PID 7113 lwlock 38: shacq 333226 exacq 828 blk 163
PID 7112 lwlock 38: shacq 333175 exacq 1246 blk 183
PID 7108 lwlock 42: shacq 233215 exacq 1539 blk 237
PID 7115 lwlock 42: shacq 232846 exacq 517 blk 138
PID 7111 lwlock 42: shacq 232645 exacq 885 blk 215
PID 7109 lwlock 42: shacq 232611 exacq 363 blk 140
PID 7113 lwlock 42: shacq 232600 exacq 860 blk 192
PID 7110 lwlock 42: shacq 232599 exacq 1049 blk 262
PID 7114 lwlock 42: shacq 232591 exacq 935 blk 318
PID 7112 lwlock 42: shacq 232505 exacq 1262 blk 240
PID 7108 lwlock 47: shacq 228682 exacq 1495 blk 177
PID 7115 lwlock 47: shacq 228397 exacq 516 blk 109
PID 7110 lwlock 47: shacq 228218 exacq 1082 blk 378
PID 7109 lwlock 47: shacq 228097 exacq 386 blk 158
PID 7111 lwlock 47: shacq 228093 exacq 907 blk 321
PID 7113 lwlock 47: shacq 228085 exacq 909 blk 306
PID 7112 lwlock 47: shacq 227993 exacq 1248 blk 387
PID 7114 lwlock 47: shacq 227986 exacq 929 blk 449
PID 7108 lwlock 36: shacq 205287 exacq 1592 blk 78
PID 7115 lwlock 36: shacq 204845 exacq 535 blk 86
PID 7111 lwlock 36: shacq 204675 exacq 830 blk 108
PID 7114 lwlock 36: shacq 204634 exacq 930 blk 109
PID 7109 lwlock 36: shacq 204581 exacq 363 blk 83
PID 7113 lwlock 36: shacq 204566 exacq 833 blk 112
PID 7110 lwlock 36: shacq 204546 exacq 1014 blk 106
PID 7112 lwlock 36: shacq 204508 exacq 1180 blk 140
PID 7108 lwlock 35: shacq 136067 exacq 1537 blk 56
PID 7115 lwlock 35: shacq 135532 exacq 494 blk 47
PID 7111 lwlock 35: shacq 135404 exacq 820 blk 76
PID 7109 lwlock 35: shacq 135362 exacq 326 blk 56
PID 7114 lwlock 35: shacq 135342 exacq 879 blk 102
PID 7110 lwlock 35: shacq 135333 exacq 1016 blk 88
PID 7113 lwlock 35: shacq 135318 exacq 829 blk 75
PID 7112 lwlock 35: shacq 135257 exacq 1229 blk 98
PID 7108 lwlock 41: shacq 134338 exacq 1540 blk 55
PID 7115 lwlock 41: shacq 133929 exacq 542 blk 50
PID 7109 lwlock 41: shacq 133781 exacq 360 blk 49
PID 7111
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On Thu, May 24, 2012 at 4:52 AM, Magnus Hagander mag...@hagander.net wrote: On Wed, May 23, 2012 at 8:11 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, May 22, 2012 at 11:04 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, May 14, 2012 at 2:24 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander mag...@hagander.net wrote: Should we go down the easy way and just reject connections when the flag is mismatching between the client and the server (trivial to do - see the attached patch)? + char *tmpparam; You forgot to add const before char, which causes a compile-time warning. I went ahead and committed this, with this fix and a slight change to the message text. Thanks! Hope that's OK with everyone... What about calling PQfinish() before exit() to avoid unexpected EOF connection error? Patch attached. Makes sense, applied. Thanks! So, let's go back to the original problem: pg_receivexlog still doesn't work fine under --disable-integer-datetimes. I previously posted the patch which fixes that problem. http://archives.postgresql.org/message-id/CAHGQGwFutqnFPBYcHUCuoy1zMVDXto=o4ogsjrbwxw4zj2t...@mail.gmail.com Attached is the updated version of the patch. Comments? Regards, -- Fujii Masao timeout_handling_v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, May 25, 2012 at 11:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Fri, May 25, 2012 at 11:17 AM, Stephen Frost sfr...@snowman.net wrote: Didn't we implement a system whereby this is exactly what we intend to happen on the read side- that is, everyone doing a SeqScan gangs up on one ring buffer and follows it, which we felt was going to dramatically improve performance in some cases? yeah: ... I wonder if the logic here is just being too strict... I don't recall how much evidence there is behind the NBuffers/4 threshold. Maybe that needs some tuning? probably. Sergey is going to get some numbers back from a hacked version that forces the scan to use a strategy. If that turns out to be the problem, I bet it'd make sense to hook the decision against ring size which is currently internal to freelist. In other words, since freelist is making the call to determine the size of the ring, it should also probably be making the call in terms of a particular input size being useful to apply a strategy to. Maybe something like IsAccessStrategyUseful(BufferAccessStrategyType btype, BlockNumber nblocks) ... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [BUGS] 9.2beta1 regression: pg_restore --data-only does not set sequence values any more
On 05/21/2012 02:59 PM, Andrew Dunstan wrote: On 05/16/2012 10:23 AM, Andrew Dunstan wrote: On Wed, May 16, 2012 at 9:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: Martin Pitt mp...@debian.org writes: while packaging 9.2 beta 1 for Debian/Ubuntu the postgresql-common test suite noticed a regression: It seems that pg_restore --data-only now skips the current value of sequences, so that in the upgraded database the sequence counter is back to the default. I believe this is a consequence of commit a4cd6abcc901c1a8009c62a27f78696717bb8fe1, which introduced the entirely false assumption that --schema-only and --data-only have something to do with the order that entries appear in the archive ... Darn, will investigate. [cc -hackers] Well, the trouble is that we have these pesky SECTION_NONE entries for things like comments, security labels and ACLs that need to be dumped in the right section, so we can't totally ignore the order. But we could (and probably should) ignore the order for making decisions about everything BUT those entries. So, here's a revised plan:
--section=data will dump exactly TABLE DATA, SEQUENCE SET or BLOBS entries
--section=pre-data will dump SECTION_PRE_DATA items (other than SEQUENCE SET) plus any immediately following SECTION_NONE items.
--section=post-data will dump everything else.
It turns out there were some infelicities with pg_dump as well as with pg_restore. I think the attached patch does the right thing. I'll keep testing - I'll be happier if other people bang on it too. cheers andrew

*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***************
*** 2341,2354 **** _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
      if (!ropt->createDB && strcmp(te->desc, "DATABASE") == 0)
          return 0;
  
!     /* skip (all but) post data section as required */
!     /* table data is filtered if necessary lower down */
      if (ropt->dumpSections != DUMP_UNSECTIONED)
      {
!         if (!(ropt->dumpSections & DUMP_POST_DATA) && te->inPostData)
!             return 0;
!         if (!(ropt->dumpSections & DUMP_PRE_DATA) &&
!             !te->inPostData && strcmp(te->desc, "TABLE DATA") != 0)
              return 0;
      }
--- 2341,2365 ----
      if (!ropt->createDB && strcmp(te->desc, "DATABASE") == 0)
          return 0;
  
!     /*
!      * Skip pre and post data section as required
!      * Data is filtered if necessary lower down
!      * Sequence set operations are in the pre data section for parallel
!      * processing purposes, but part of the data section for sectioning
!      * purposes.
!      * SECTION_NONE items are filtered according to where they are
!      * positioned in the list of TOC entries.
!      */
      if (ropt->dumpSections != DUMP_UNSECTIONED)
      {
!         if (!(ropt->dumpSections & DUMP_POST_DATA) &&   /* post data skip */
!             ((te->section == SECTION_NONE && te->inPostData) ||
!              te->section == SECTION_POST_DATA))
              return 0;
+         if (!(ropt->dumpSections & DUMP_PRE_DATA) &&    /* pre data skip */
+             ((te->section == SECTION_NONE && !te->inPostData) ||
+              (te->section == SECTION_PRE_DATA &&
+               strcmp(te->desc, "SEQUENCE SET") != 0)))
+             return 0;
      }

*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 7096,7101 **** dumpDumpableObject(Archive *fout, DumpableObject *dobj)
--- 7096,7103 ----
      switch (dobj->objType)
      {
+         case DO_TABLE:
+             break;              /* has its own controls */
          case DO_INDEX:
          case DO_TRIGGER:
          case DO_CONSTRAINT:
***************
*** 12075,12081 **** dumpTable(Archive *fout, TableInfo *tbinfo)
      if (tbinfo->relkind == RELKIND_SEQUENCE)
          dumpSequence(fout, tbinfo);
!     else if (!dataOnly)
          dumpTableSchema(fout, tbinfo);
  
      /* Handle the ACL here */
--- 12077,12083 ----
      if (tbinfo->relkind == RELKIND_SEQUENCE)
          dumpSequence(fout, tbinfo);
!     else if (dumpSections & DUMP_PRE_DATA)
          dumpTableSchema(fout, tbinfo);
  
      /* Handle the ACL here */
***************
*** 13291,13297 **** dumpSequence(Archive *fout, TableInfo *tbinfo)
       *
       * Add a 'SETVAL(seq, last_val, iscalled)' as part of a data dump.
       */
!     if (!dataOnly)
      {
          /*
           * DROP must be fully qualified in case same name appears in
--- 13293,13299 ----
       *
       * Add a 'SETVAL(seq, last_val, iscalled)' as part of a data dump.
       */
!     if (dumpSections & DUMP_PRE_DATA)
      {
          /*
           * DROP must be fully qualified in case same name appears in
***************
*** 13412,13418 **** dumpSequence(Archive *fout, TableInfo *tbinfo)
                         tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
      }
  
!     if (!schemaOnly)
      {
          resetPQExpBuffer(query);
          appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 13414,13420 ----
                         tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
      }
  
!     if (dumpSections & DUMP_DATA)
      {
          resetPQExpBuffer(query);
          appendPQExpBuffer(query, "SELECT pg_catalog.setval(");

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Archiver not exiting upon crash
Jeff Janes jeff.ja...@gmail.com writes: So my test harness is an inexplicably effective show-case for the vulnerability, but it is not the reason the vulnerability should be fixed. I spent a bit of time looking into this. In principle the postmaster could be fixed to repeat the SIGQUIT signal every second or so, but it would be a rather considerable wart on code that's already overcomplicated for my taste (given that bugs in the postmaster are basically game over). An example of the problems is that the postmaster's own SIGQUIT signal handler presently just sends out the child signals and calls exit(). We'd need some new "waiting for children to die" state to allow for retransmission in such cases. As far as the postmaster is concerned, it would be a lot better if SIGQUIT could be considered reliable. This leads me to the thought that maybe we have to deprecate use of system() in Postgres backend code. If we simply fork and exec without touching the signal handling, ISTM that would do what we want, and the amount of added code would be pretty minimal (a lot less than would have to be added to the postmaster to address this the other way). However, I'm not too sure what would be required to make it go on Windows. Comments? Does the Windows emulation of system() even have this issue to begin with? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Foreground vacuum and buffer access strategy
If I invoke vacuum manually and do so with VacuumCostDelay == 0, I have basically declared my intentions to get this pain over with as fast as possible even if it might interfere with other processes. Under that condition, shouldn't it use BAS_BULKWRITE rather than BAS_VACUUM? The smaller ring size leads to a lot of synchronous WAL flushes which I think can slow the vacuum down a lot. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in new buffering GiST build code
On 22.05.2012 01:09, Alexander Korotkov wrote: Hi! On Tue, May 22, 2012 at 12:56 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The management of the path stacks is a bit complicated, anyway. I'll think about this some more tomorrow, maybe we can make it simpler, knowing that we have to do those extra lookups. WOW! You did enormous work on exploring that! I had just arrived back from PGCon and started looking at it, only to find you'd already done comprehensive research on this problem. At step 5, if we have the NSN in the GISTBufferingInsertStack structure, we could detect the situation where the parent of a split page has changed. Using this, we could save a copy of the GISTBufferingInsertStack for B2 with original parent A, because we know the split of B occurs after creating the GISTBufferingInsertStack but before the split of A. The question is how to find this copy from C — a hash? I tested a patch that adds the extra getNodeBuffer() call after re-finding the parent, as discussed. However, I'm still getting a "failed to re-find parent" error later in the build, so I think we're still missing some corner case. I think we should rewrite the way we track the parents completely. Rather than keeping a path stack attached to every node buffer, let's just maintain a second hash table that contains the parent of every internal node. Whenever a downlink is moved to another page, update the hash table with the new information. That way we always have up-to-date information about the parent of every internal node. That's much easier to understand than the path stack structures we have now. I think the overall memory consumption will be about the same too. Although we need the extra hash table with one entry for every internal node, we get rid of the path stack structs, which are also one per internal node at the moment. I believe it is faster too.
I added some instrumentation to the existing gist code (with the additional getNodeBuffer() call added to fix this bug), to measure the time spent moving right when re-finding the parent of a page. I added gettimeofday() calls before and after moving right, and summed the total. In my test case, the final index size was about 19GB, and the index build took 3545 seconds (59 minutes). Of that time, 580 seconds (~ 10 minutes) was spent moving right to refind parents. That's a lot. I also printed a line whenever a refind operation had to move right 20 pages or more. That happened 2482 times during the build, in the worst case we moved right over 4 pages. Attached is a patch to replace the path stacks with a hash table. With this patch, the index build time in my test case dropped from 59 minutes to about 55 minutes. I don't know how representative or repeatable this test case is, but this definitely seems very worthwhile, not only because it fixes the bug and makes the code simpler, but also on performance grounds. Alexander, do you still have the test environments and data lying around that you used for GiST buffering testing last summer? Could you rerun some of those tests with this patch?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 988896d..9d9a031 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -55,16 +55,24 @@ typedef struct
 {
 	Relation	indexrel;
 	GISTSTATE  *giststate;
-	GISTBuildBuffers *gfbb;
 
 	int64		indtuples;		/* number of tuples indexed */
 	int64		indtuplesSize;	/* total size of all indexed tuples */
 
 	Size		freespace;		/* amount of free space to leave on pages */
 
+	/*
+	 * Extra data structures used during a buffering build. GISTBuildBuffers
+	 * contains information related to managing the build buffers. parentMap
+	 * is a lookup table of the parent of each internal page.
+	 */
+	GISTBuildBuffers *gfbb;
+	HTAB	   *parentMap;
+
 	GistBufferingMode bufferingMode;
 } GISTBuildState;
 
+/* prototypes for private functions */
 static void gistInitBuffering(GISTBuildState *buildstate);
 static int	calculatePagesPerBuffer(GISTBuildState *buildstate, int levelStep);
 static void gistBuildCallback(Relation index,
@@ -76,18 +84,23 @@ static void gistBuildCallback(Relation index,
 static void gistBufferingBuildInsert(GISTBuildState *buildstate,
 						 IndexTuple itup);
 static bool gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
-				GISTBufferingInsertStack *startparent);
+				BlockNumber startblkno, int startlevel);
 static void gistbufferinginserttuples(GISTBuildState *buildstate,
 						  Buffer buffer,
 						  IndexTuple *itup, int ntup, OffsetNumber oldoffnum,
-						  GISTBufferingInsertStack *path);
-static void gistBufferingFindCorrectParent(GISTBuildState *buildstate,
-							   GISTBufferingInsertStack *child);
+						  int level, BlockNumber parent);
+static Buffer gistBufferingFindCorrectParent(GISTBuildState *buildstate,
+							   BlockNumber
Re: [HACKERS] patch: Use pg_mbcliplen for truncation in text-to-name conversion
Karl Schnaitter karl...@gmail.com writes: The text_name function was truncating its input string to the first NAMEDATALEN-1 bytes, which is wrong if the string has multi-byte characters. This patch changes it to use pg_mbcliplen, following the namein function. Good catch, but poking around I note that bpchar_name has the same disease. Will fix, thanks for the report! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/24 Robert Haas robertmh...@gmail.com: On Thu, May 24, 2012 at 6:11 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Perhaps when we see that RLS applies, we should replace the reference to the original table with a subquery RTE that has the security_barrier flag set - essentially treating a table with RLS as if it were a security view. I have come to think that is a better approach than tracking the origin of each qualifier. One problem is case handling in update or delete statements. It may be possible to rewrite the update / delete query as follows:

From:
  UPDATE tbl SET X = X + 1 WHERE f_leak(Y)
To:
  UPDATE tbl SET X = X + 1 WHERE ctid = (
      SELECT * FROM (
          SELECT ctid FROM tbl WHERE uname = getpgusername()  <== (*) should have security-barrier
      ) AS tbl_subqry
      WHERE f_leak(Y)
  );

The expanded sub-query will have the security-barrier flag, so it enforces uname = getpgusername() being checked earlier than f_leak(Y). We may need to measure the performance impact due to the rewrite. The problem with this is that it introduces an extra instance of tbl into the query - there are now two rather than one. UPDATE .. FROM is supposed to be a way to avoid this, but it's insufficiently general to handle all the cases (e.g. UPDATE a LEFT JOIN b can't be written using the existing syntax). Anyway, we want to avoid inserting self-joins for performance reasons if at all possible. It should be easy to do that in the case of SELECT; UPDATE and DELETE may need a bit more work. I'll try to investigate a way to solve the matter without scanning twice. Right now, I have no reasonable ideas. Please give us a suggestion if you have something... I think this situation is similar to the case where we reference a view without privileges on the underlying tables. If Bob sets up a view with some tricky function, it allows Bob to reference credentials of users who reference the view. More or less, it might be a problem whenever a user invokes a user-defined function declared by someone else.
(Thus, sepgsql policy does not allow users to invoke a function declared by another one in a different domain, without the DBA's checks.) This is true, but there are still some new threat models. For example, currently, pg_dump isn't going to run any user-defined code just because you do SELECT * FROM table, but that will change with this patch. Note that pg_dump need not actually select from views, only tables. I think it is a good idea not to apply RLS when the current user has superuser privilege, from the perspective of security model consistency, but it is inconsistent to check privileges on underlying tables. Seems like a somewhat random wart, if it's just an exception for superusers. I think we need to do better than that. For example, at my last company, sales reps A and B were permitted to see all customers of the company, but sales reps C, D, E, F, G, H, I, and J were permitted to see only their own accounts. Those sorts of policies need to be easy to implement. Probably, if a sales_rep column records the responsible rep, its security policy is able to be described as: (my_sales_rep() in ('A', 'B') OR sales_rep = my_sales_rep()) Indeed, the design of checking the underlying table seems to me like column-level privileges versus table-level privileges, since it is checked only when the user does not have the required privileges on the whole table. However, I have no idea how to modify ExecCheckRTEPerms() with regard to RLS. If we assume RLS is applied when the user has no privileges on tables, the current ExecCheckRTEPerms() always raises an error towards unprivileged users, prior to execution of queries. Isn't it preferable behavior to allow unprivileged users to reference a table (or columns) when it has an RLS policy? I think table- and column-level privileges should be checked individually, in addition to the row-level security policy. Another idea is to set things up so that the RLS policy function isn't applied to each row directly; instead, it's invoked once per query and *returns* a WHERE clause.
This would be a lot more powerful than the proposed design, because now the table owner can write a function that imposes quals on some people but not others, which seems very useful. Sorry, I don't favor this idea. Even if the table owner sets up a function to generate additional qualifiers, it still has no guarantee that those qualifiers are invoked prior to user-given ones. It seems to me this approach will have the same problem... It's not intended to solve the qual-ordering problem, just to allow additional policy flexibility. At the beginning, I thought it would take complex code to parse a where-clause provided as a security policy, so that is why I was inclined to give a function instead of a clause. But I noticed we already have similar code at CreateTrigger() to handle it. Does it give policy flexibility? It's not clear to me that there is any need for built-in server functionality here. If the table
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/24 Robert Haas robertmh...@gmail.com: On Thu, May 24, 2012 at 12:00 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to create references to rows which are invisible to you, or should FOREIGN KEY constraints be exempt from security policies? I'd say they shouldn't be, i.e. the policy WHERE clause should be added to constraint checking queries like usual. But maybe I'm missing some reason why that'd be undesirable… I agree. The row-level security policy should not be applied during FK checks (or other internal stuff; to be harmless). In the previous discussion, it was pointed out that iterating FK/PK probing enables a malicious user to infer the existence of an invisible tuple and its key value, although they cannot see the actual values. It is a well-documented limitation; thus, users should not use row-level security (or should not use natural keys) if they cannot accept this limitation. You say I agree, but it seems to me that you and Florian are in fact taking opposite positions. Sorry, I misread what he described. FWIW, I'm inclined to think that you should NOT be able to create a row that references an invisible row. You might end up with that situation anyway, because we don't know what the semantics of the security policy are: rows might become visible or invisible after the fact, and we can't police that. But I think that if you take the opposite position that the select queries inside fkey triggers ought to be exempt from security policy, then you need to build some new mechanism to make that happen, which seems like extra work for no benefit. I think it is fair enough for RI_FKey_check_ins and RI_FKey_check_upd; applying the RLS policy inside these trigger functions will prohibit creating a row that references an invisible row.
However, it should not be applied to triggers set on PK tables, because that would allow modifying or deleting a primary key that is referenced by a foreign key invisible from the operator's viewpoint. I think it makes sense to have exceptional cases, to keep the foreign-key constraint guaranteed at the baseline. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Interface of Row Level Security
2012/5/25 Florian Pflug f...@phlo.org: On May24, 2012, at 19:25 , Robert Haas wrote: FWIW, I'm inclined to think that you should NOT be able to create a row that references an invisible row. You might end up with that situation anyway, because we don't know what the semantics of the security policy are: rows might become visible or invisible after the fact, and we can't police that. Right. I just realized, however, that there's another case which wasn't considered yet, which is how to handle the initial check during ALTER TABLE ADD CONSTRAINT. I'm thinking that it's fine to only consider visible rows in the parent table there too, but we should be checking all rows in the child table. The easiest way would be to restrict ALTER TABLE ADD CONSTRAINT to the table owner for tables with RLS (it seems that currently the REFERENCES privilege is sufficient), and make the table owner exempt from RLS on that table. The latter means you'd need at least two roles to use RLS, but anyone security-conscious enough to use RLS will probably not use the same role for DDL and DML operations anyway... I think the RLS policy should behave as a view with qualifiers. It means database constraints have to be kept from the viewpoint of any clients, so, any orphan foreign-keys or any rows violating check constraints should not exist. The case when a user inserts a row with a foreign key is an exceptional case. Even if he cannot insert a foreign key that references an invisible primary key, it never breaks database constraints from the viewpoint of other folks. My standpoint treats database constraints as first-class citizens that should always be kept at the lowest level, even though a part of the results would be filtered out due to RLS. Isn't that a simple enough criterion?
But I think that if you take the opposite position that the select queries inside fkey triggers ought to be exempt from security policy, then you need to build some new mechanism to make that happen, which seems like extra work for no benefit. Hm, interesting angle. Continuing this thought, without any extra work, UNIQUE and EXCLUSION constraints *will* be enforced regardless of row visibility, because their implementation isn't SPI-based but instead detects conflicts while inserting tuples into the index. For being so obviously inconsistent in its treatment of UNIQUE and EXCLUSION constraints vs. FK constraints, this feels surprisingly right. So, to prevent design by accident, here's an attempt to explain that divergence. For UNIQUE and EXCLUSION constraints, the most conservative assumption possible is that all rows are visible, since that leads to the most rejections. With that assumption, no matter what the actual policy is, the data returned by a query will always satisfy the constraint. Plus, the constraint is still sensible because it neither rejects nor allows all rows. So that conservative assumption is the one we make, i.e. we ignore RLS visibility when checking those kinds of constraints. For FK constraints, OTOH, the most conservative assumption is that no rows are visible. But that is meaningless, since it will simply reject all possible rows. Having thus no chance of enforcing the constraint ourselves under all possible policies, the best we can do is to at least make it possible for the constraint to work correctly for as many policies as possible. Now, if we go with KaiGai's suggestion of skipping RLS while checking FK constraints, the only policy that the constraint will work correctly for is one which doesn't actually hide any parent rows. Whereas if we apply RLS checks while checking FK constraints, all policies which behave consistently for parent and child rows (i.e. don't hide the former but show the latter) will work correctly. 
We thus go with the second option, since the class of working policies is larger. -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap metapages
On 5/22/12 12:09 PM, Simon Riggs wrote: On 22 May 2012 13:52, Robert Haas robertmh...@gmail.com wrote: It seems pretty clear to me that making pg_upgrade responsible for emptying block zero is a non-starter. But I don't think that's a reason to throw out the design; I think it's a problem we can work around. I like your design better as well *if* you can explain how we can get to it. My proposal was a practical alternative that would allow the idea to proceed. It occurred to me that having a metapage with information useful to recovery operations in *every segment* would be useful; it certainly seems worth the extra block. It then occurred to me that we've basically been stuck with 2 places to store relation data; either at the relation level in pg_class or on each page. Sometimes neither one is a good fit. ISTM that a lot of problems we've faced in the past few years are because there's not a good abstraction between a (mostly) linear tuplespace and the physical storage that goes underneath it.

- pg_upgrade progress is blocked because we can't deal with a new page that's BLKSZ
- There's no good way to deal with table (or worse, index) bloat
- There's no good way to add the concept of a heap metapage
- Forks are being used to store data that might not belong there only because there's no other choice (visibility info)

Would it make sense to take a step back and think about ways to abstract between logical tuplespace and physical storage? What if 1GB segments had their own metadata? Or groups of segments? Could certain operations that currently have to rewrite an entire table be changed so that they slowly moved pages from one group of segments to another, with a means of marking old pages as having been moved? Einstein said that problems cannot be solved by the same level of thinking that created them. Perhaps we're at the point where we need to take a step back from our current storage organization and look for a bigger picture? -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backends stalled in 'startup' state: index corruption
Yeah, this is proof that what it was doing is the same as what we saw in Jeff's backtrace, ie loading up the system catalog relcache entries the hard way via seqscans on the core catalogs. So the question to be answered is why that's suddenly a big performance bottleneck. It's not a cheap operation of course (that's why we cache the results ;-)) but it shouldn't take minutes either. And, because they are seqscans, it doesn't seem like messed-up indexes should matter. FWIW, this appeared to be an all-or-nothing event: either every new backend was suffering through this, or none were. They all seemed to clear up at the same time as well. The theory I have in mind about Jeff's case is that it was basically an I/O storm, but it's not clear whether the same explanation works for your case. There may be some other contributing factor that we haven't identified yet. Let me know if you think of anything particular I can test while it is happening again. I'll try to arrange a (netapp) snapshot the next time it happens as well (this system is too busy and too large to do anything else). -- Greg Sabino Mullane g...@endpoint.com End Point Corporation PGP Key: 0x14964AC8
Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
On Fri, 25 May 2012, Merlin Moncure wrote: can you hack this in heapam.c and see if it helps? line 131-ish:

    if (!RelationUsesLocalBuffers(scan->rs_rd) &&
        scan->rs_nblocks > NBuffers / 4)

becomes

    if (!RelationUsesLocalBuffers(scan->rs_rd))

(also you can set the partition count back). The first few runs of my test with the suggested change made the multithreaded queries even slower by a factor of two. E.g. 44 seconds for multithreaded vs ~ 7 sec single threaded. (with the default 9.2beta I get ~ 14-22 sec for a multithreaded run) But now after another few reruns of my test, I see again this variability in times for multithreaded runs. It went down to 18-22 seconds. Then again to 44 sec. So overall the change you suggested either doesn't improve the situation or makes it worse. Regards, S * Sergey E. Koposov, PhD, Research Associate Institute of Astronomy, University of Cambridge Madingley road, CB3 0HA, Cambridge, UK Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backends stalled in 'startup' state: index corruption
Greg Sabino Mullane g...@endpoint.com writes: Yeah, this is proof that what it was doing is the same as what we saw in Jeff's backtrace, ie loading up the system catalog relcache entries the hard way via seqscans on the core catalogs. So the question to be answered is why that's suddenly a big performance bottleneck. It's not a cheap operation of course (that's why we cache the results ;-)) but it shouldn't take minutes either. And, because they are seqscans, it doesn't seem like messed-up indexes should matter. FWIW, this appeared to be an all-or-nothing event: either every new backend was suffering through this, or none were. They all seemed to clear up at the same time as well. Mostly not surprising. They'd definitely all hit the missing init file at the same time, so the stalling would start consistently for all. And once any one process successfully created a new file, subsequent incoming sessions wouldn't stall. However, the remaining processes trying to compute new init files would still have to complete the process, so I'd expect there to be a diminishing effect --- the ones that were stalling shouldn't all release exactly together. Unless there is some additional effect that's syncing them all. (I wonder for instance if the syncscan logic is kicking in here.) One interesting question is why there's a thundering herd of new arrivals in the first place. IIRC you said you were using a connection pooler. I wonder if it has a bug^H^H^Hdesign infelicity that makes it drop and reopen all its connections simultaneously. Let me know if you think of anything particular I can test while it is happening again. I just noticed something that might explain the persistency of the effect as observed by Jeff. The code that seqscans pg_attribute (in RelationBuildTupleDesc) knows how many rows it's expecting to find for a given catalog, and it falls out of the seqscan loop as soon as it's gotten them all. 
Now, the rows belonging to core system catalogs are generally right near the front of pg_attribute, being the oldest rows in that catalog, which means that generally this results in not having to seqscan very far even if pg_attribute is large. It strikes me though that there are at least two ways that nice property could get broken, resulting in much seqscan work if pg_attribute is large: 1. Somebody decides to update one of those rows, and it gets dropped in some remote region of the table. The only really plausible reason for this is deciding to fool with the column-specific stats target (attstattarget) of a system catalog. Does that sound like something either of you might have done? You could check it by looking at the ctid columns of the pg_attribute rows for system catalogs, and seeing if any have large block numbers. 2. If the syncscan logic were to kick in and cause some backend to pick up its seqscan of pg_attribute from a point beyond where some of the target rows are, that backend would have to scan all of pg_attribute, circling back around to the start, before it would find all the rows it seeks. And most likely this effect would lather-rinse-repeat for each catalog it's seeking the pg_attribute entries for. Not only does this mean a much-worse-than-normal startup time for that backend, but any other ones that arrive while the synchronized scan is in progress would be caught in the undertow and likewise spend a long time to get their results. So point 2 is suddenly looking like a really attractive theory for explaining what happened. As we were just remarking in an adjacent thread, the threshold for this to start happening would be for pg_attribute to get larger than one-fourth of shared_buffers; the syncscan logic doesn't kick in for relations smaller than that. 
IIRC this is close enough to the numbers Jeff mentioned to make it plausible that it happened to him, and plausible that his new installation has pg_attribute just enough smaller to avoid the scenario. Not sure about Greg's case, but he should be able to tell us the size of pg_attribute and his shared_buffers setting ... If this is the explanation, then it's easy enough to deal with point 2 --- just disable syncscan for these searches. I don't see an easy answer for problems of ilk #1, other than don't do that in a database that's been around for awhile. Another idea we might consider is to prevent the thundering herd effect by not having all the incoming backends try to update pg_internal.init independently. Instead let the first one take a lock while it's doing that, and the rest just queue up on that lock. They'd be stalled anyway, and they are not helping matters by duplicating the first one's effort. This is a rather more invasive change though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap metapages
On Fri, May 25, 2012 at 5:57 PM, Jim Nasby j...@nasby.net wrote: It occurred to me that having a metapage with information useful to recovery operations in *every segment* would be useful; it certainly seems worth the extra block. It then occurred to me that we've basically been stuck with 2 places to store relation data; either at the relation level in pg_class or on each page. Sometimes neither one is a good fit. AFAICS, having metadata in every segment is mostly only helpful for recovering from the situation where files have become disassociated from their filenames, i.e. database -> lost+found. From the viewpoint of virtually the entire server, the block number space is just a continuous sequence that starts at 0 and counts up forever (or, anyway, until 2^32-1). While it wouldn't be impossible to allow that knowledge to percolate up to other parts of the server, it would basically involve drilling a fairly arbitrary hole through an abstraction boundary that has been intact for a very long time, and it's not clear that there's anything magical about 1GB. Notwithstanding the foregoing... ISTM that a lot of problems we've faced in the past few years are because there's not a good abstraction between a (mostly) linear tuplespace and the physical storage that goes underneath it. ...I agree with this. I'm not sure exactly what the replacement model would look like, but it's definitely worth some thought - e.g. perhaps there ought to be another mapping layer between logical block numbers and files on disk, so that we can effectively delete blocks out of the middle of a relation without requiring any special OS support, and so that we can multiplex many small relation forks onto a single physical file to minimize inode consumption.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] No, pg_size_pretty(numeric) was not such a hot idea
In 9.1:

regression=# select pg_size_pretty(8*1024*1024);
 pg_size_pretty
----------------
 8192 kB
(1 row)

In HEAD:

regression=# select pg_size_pretty(8*1024*1024);
ERROR:  function pg_size_pretty(integer) is not unique
LINE 1: select pg_size_pretty(8*1024*1024);
               ^
HINT:  Could not choose a best candidate function. You might need to add explicit type casts.

The argument for adding pg_size_pretty(numeric) was pretty darn thin in the first place, IMHO; it does not seem to me that it justified this loss of usability.

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backends stalled in 'startup' state: index corruption
On May 25, 2012, at 4:02 PM, Tom Lane wrote: Greg Sabino Mullane g...@endpoint.com writes: Yeah, this is proof that what it was doing is the same as what we saw in Jeff's backtrace, ie loading up the system catalog relcache entries the hard way via seqscans on the core catalogs. So the question to be answered is why that's suddenly a big performance bottleneck. It's not a cheap operation of course (that's why we cache the results ;-)) but it shouldn't take minutes either. And, because they are seqscans, it doesn't seem like messed-up indexes should matter. FWIW, this appeared to be an all-or-nothing event: either every new backend was suffering through this, or none were. They all seemed to clear up at the same time as well. Mostly not surprising. They'd definitely all hit the missing init file at the same time, so the stalling would start consistently for all. And once any one process successfully created a new file, subsequent incoming sessions wouldn't stall. However, the remaining processes trying to compute new init files would still have to complete the process, so I'd expect there to be a diminishing effect --- the ones that were stalling shouldn't all release exactly together. Unless there is some additional effect that's syncing them all. (I wonder for instance if the syncscan logic is kicking in here.) In our customer's case, the size of pg_attribute was a little less than 1/4 of shared_buffers, so might not be the syncscan? BTW, in our case, I thought to take the system down to single user mode and reindex these. When the indexes were disabled, I immediately experienced the slow startup, so it certainly seems like an issue with seq scanning these. I'll see if I can reproduce that behavior by starting up with system indexes disabled. This probably won't happen until Tuesday when we get that data directory moved to a test server.
In our customer's case, it would happen for a while, then stop happening for some time... presumably this was after the caching, then it would start up again... presumably after something invalidated the cache. Switching from the master to the streaming replica made the situation better, but didn't make it go away. Then a full initdb solved the problem. I bet a vacuum full of pg_attribute would've done the trick though. 1. Somebody decides to update one of those rows, and it gets dropped in some remote region of the table. The only really plausible reason for this is deciding to fool with the column-specific stats target (attstattarget) of a system catalog. Does that sound like something either of you might have done? You could check it by looking at the ctid columns of the pg_attribute rows for system catalogs, and seeing if any have large block numbers. Definitely wasn't done by me and I'm pretty sure the customer wouldn't have done that either. --- Jeff Frost j...@pgexperts.com CTO, PostgreSQL Experts, Inc. Phone: 1-888-PG-EXPRT x506 FAX: 415-762-5122 http://www.pgexperts.com/
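A sketch of the check Tom describes — the 16384 cutoff is FirstNormalObjectId, so the filter restricts the query to system catalogs, and the cutoff of 100 blocks is an arbitrary illustration of "large":

```sql
-- pg_attribute rows for system catalogs that have migrated to high block
-- numbers. ctid is (block, offset); the text->point cast is a portable
-- trick to extract the block number for comparison.
SELECT attrelid::regclass, attname, ctid
FROM pg_attribute
WHERE attrelid < 16384
  AND (ctid::text::point)[0] > 100;  -- adjust the cutoff to taste
```

A healthy cluster should return no rows here, since the catalog rows created at initdb time all sit near the start of the table.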
Re: [HACKERS] Backends stalled in 'startup' state: index corruption
Jeff Frost j...@pgexperts.com writes: In our customer's case, the size of pg_attribute was a little less than 1/4 of shared_buffers, so might not be the syncscan? Could you go back and double check that? If the shared_buffers setting were 7GB not 8GB, that would fall right between the pg_attribute sizes you posted before. I'm getting somewhat convinced that this is the right answer, because I've been able to reproduce an unexpectedly long stall with multiple clients connecting simultaneously to an init-file-less database whose pg_attribute is large enough to trigger syncscans. The particular case I'm testing has pg_attribute of about 1GB (in a machine with just 4GB RAM, so I'm not going to push it up much further). If I just remove the init file and connect with psql, there's about a 1-second startup delay, which is bad enough; but if I throw 50 concurrent connections at it with a hacked-up version of pgbench, it takes about 45 seconds for all of them to get through startup. (You need to hack pgbench to suppress the single initialization connection it normally likes to make, else the test degenerates to the one-incoming-connection case.) I think that a big chunk of this is coming from the syncscan logic defeating the early-exit optimization in RelationBuildTupleDesc; but it still seems like there's another inefficiency somewhere. Maybe our syncscan logic just really sucks given enough backends trying to piggyback on the same syncscan. Now that I can reproduce it, I'll take a closer look at that. In the meantime, it looks like a trivial workaround is to disable synchronize_seqscans via postgresql.conf. regards, tom lane
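The trivial workaround Tom mentions is a single setting; as a sketch, in postgresql.conf:

```
# Stopgap for the startup-stall issue discussed above: disable synchronized
# sequential scans so catalog seqscans always start at block 0, preserving
# the early-exit optimization in RelationBuildTupleDesc.
synchronize_seqscans = off
```

This trades away the shared-buffer-friendly behavior of synchronized scans on large user tables, so it's best treated as a temporary measure while the underlying inefficiency is investigated.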
Re: [HACKERS] Per-Database Roles
On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian br...@momjian.us wrote: On Tue, May 22, 2012 at 10:19:12AM -0400, Robert Haas wrote: In retrospect, I think the idea of shared catalogs was probably a bad idea. I think we should have made roles and tablespaces database objects rather than shared objects, and come up with some ad-hoc method of representing the set of available databases. But that decision seems to have been made sometime pre-1996, so the thought of changing it now is pretty painful, but I can dream... Yes, pre-1996. I think the fact that authentication/user names appear in pg_hba.conf really locked the user name idea into global objects, and we have never really been able to make a dent in that. Eh? Why would the presence of usernames in pg_hba.conf mean that they have to be global objects? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Per-Database Roles
* Robert Haas (robertmh...@gmail.com) wrote: On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian br...@momjian.us wrote: Yes, pre-1996. I think the fact that authentication/user names appear in pg_hba.conf really locked the user name idea into global objects, and we have never really been able to make a dent in that. Eh? Why would the presence of usernames in pg_hba.conf mean that they have to be global objects? I haven't had a chance (yet) to look, but perhaps the current code attempts to validate the role before figuring out what database is being requested? We'd have to essentially invert that, of course, for this.. One thing I was wondering about is if we're going to have an issue supporting things like tell me what databases exist (psql -l), which connect to the 'postgres' database by default, for local-only roles. I'm not sure that I actually care, to be honest, but it's something to consider. I don't think we should require users to create every local role also in postgres, nor do I feel that we should allow connections to postgres by any role, nor do I want to break tools which use 'postgres' to basically get access to shared catalogs- but I don't see an immediate or easy solution.. Thanks, Stephen
Re: [HACKERS] pg_upgrade libraries check
On Fri, May 25, 2012 at 10:20:29AM -0400, Andrew Dunstan wrote: pg_upgrade is a little over-keen about checking for shared libraries that back functions. In particular, it checks for libraries that support functions created in pg_catalog, even if pg_dump doesn't export the function. The attached patch mimics the filter that pg_dump uses for functions so that only the relevant libraries are checked. This would remove the need for a particularly ugly hack in making the 9.1 backport of JSON binary upgradeable. Andrew is right that pg_upgrade is overly restrictive in checking _any_ shared object file referenced in pg_proc. I never expected that pg_catalog would have such references, but in Andrew's case it does, and pg_dump doesn't dump them, so I guess pg_upgrade shouldn't check them either. In some sense this is a hack for the JSON type, but it also gives users a way to create shared object references in old clusters that are _not_ checked by pg_upgrade, and not migrated to the new server, so I suppose it is fine. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
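As a rough sketch of the filtered check (the exact query in the patch may differ), the idea is to collect libraries only for functions pg_dump would actually dump, i.e. those with OIDs at or above FirstNormalObjectId (16384):

```sql
-- Shared libraries referenced by user-created functions only; functions
-- installed in pg_catalog at initdb time (oid < 16384) are skipped,
-- mirroring pg_dump's own filter.
SELECT DISTINCT probin
FROM pg_proc
WHERE probin IS NOT NULL
  AND oid >= 16384;
```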
Re: [HACKERS] Per-Database Roles
On Fri, May 25, 2012 at 10:34:54PM -0400, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian br...@momjian.us wrote: Yes, pre-1996. I think the fact that authentication/user names appear in pg_hba.conf really locked the user name idea into global objects, and we have never really been able to make a dent in that. Eh? Why would the presence of usernames in pg_hba.conf mean that they have to be global objects? I haven't had a chance (yet) to look, but perhaps the current code attempts to validate the role before figuring out what database is being requested? We'd have to essentially invert that, of course, for this.. One thing I was wondering about is if we're going to have an issue supporting things like tell me what databases exist (psql -l), which connect to the 'postgres' database by default, for local-only roles. I'm not sure that I actually care, to be honest, but it's something to consider. I don't think we should require users to create every local role also in postgres, nor do I feel that we should allow connections to postgres by any role, nor do I want to break tools which use 'postgres' to basically get access to shared catalogs- but I don't see an immediate or easy solution.. Yes. In a simple case, you have a username, you want to validate it against LDAP or kerberos --- how do you partition the external authentication tool based on database name? Seems like an obvious problem to me. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Backends stalled in 'startup' state: index corruption
On May 25, 2012, at 7:12 PM, Tom Lane wrote: Jeff Frost j...@pgexperts.com writes: In our customer's case, the size of pg_attribute was a little less than 1/4 of shared_buffers, so might not be the syncscan? Could you go back and double check that? If the shared_buffers setting were 7GB not 8GB, It definitely started happening with 8GB of shared_buffers. We actually tried reducing shared_buffers to 2GB to see if we were writing out too many dirty buffers at checkpoint time or something, but that had no effect, so we put it back to 8GB.
Re: [HACKERS] Per-Database Roles
Stephen Frost sfr...@snowman.net writes: * Robert Haas (robertmh...@gmail.com) wrote: Eh? Why would the presence of usernames in pg_hba.conf mean that they have to be global objects? I haven't had a chance (yet) to look, but perhaps the current code attempts to validate the role before figuring out what database is being requested? We'd have to essentially invert that, of course, for this.. Even more to the point, what do you do when the database column is all, or a list of more than one database name? It's possible that we could define this away by saying that only globally known usernames can be listed in pg_hba.conf, but I think we'll still have implementation problems with doing authentication for per-database usernames. regards, tom lane
Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea
On Sat, May 26, 2012 at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: In 9.1: regression=# select pg_size_pretty(8*1024*1024); pg_size_pretty 8192 kB (1 row) In HEAD: regression=# select pg_size_pretty(8*1024*1024); ERROR: function pg_size_pretty(integer) is not unique LINE 1: select pg_size_pretty(8*1024*1024); ^ HINT: Could not choose a best candidate function. You might need to add explicit type casts. The argument for adding pg_size_pretty(numeric) was pretty darn thin in the first place, IMHO; it does not seem to me that it justified this loss of usability. Ouch! But removing pg_size_pretty(numeric) causes another usability issue, e.g., pg_size_pretty(pg_xlog_location_diff(...)) fails. So how about removing pg_size_pretty(bigint) to resolve those two issues? I guess pg_size_pretty(numeric) is a bit slower than the bigint version, but I don't think such a small slowdown of pg_size_pretty() matters in practice. No? Regards, -- Fujii Masao