[HACKERS] patch: Use pg_mbcliplen for truncation in text-to-name conversion

2012-05-25 Thread Karl Schnaitter

The text_name function was truncating its input string to the first
NAMEDATALEN-1 bytes, which is wrong if the string has multi-byte
characters. This patch changes it to use pg_mbcliplen, following
the namein function.
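For illustration, character-boundary-safe clipping works roughly like
this (a minimal sketch, not part of the patch; pg_mbcliplen and
NAMEDATALEN are the real server symbols, the helper function is
hypothetical):

#include "postgres.h"
#include "mb/pg_wchar.h"

/*
 * pg_mbcliplen(str, len, limit) returns the largest byte length <= limit
 * that does not end in the middle of a multibyte character.  With
 * NAMEDATALEN = 64, a 64-byte string of 32 two-byte UTF-8 characters is
 * clipped to 62 bytes (31 whole characters), where plain byte truncation
 * would keep 63 bytes and split the last character.
 */
static int
name_clip_len(const char *str, int len)
{
	return pg_mbcliplen(str, len, NAMEDATALEN - 1);
}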
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index a5592d5..02fe9b4 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -2255,8 +2255,7 @@ text_name(PG_FUNCTION_ARGS)
 	len = VARSIZE_ANY_EXHDR(s);
 
 	/* Truncate oversize input */
-	if (len >= NAMEDATALEN)
-		len = NAMEDATALEN - 1;
+	len = pg_mbcliplen(VARDATA_ANY(s), len, NAMEDATALEN - 1);
 
 	result = (Name) palloc(NAMEDATALEN);
 	memcpy(NameStr(*result), VARDATA_ANY(s), len);
diff --git a/src/test/regress/expected/name.out b/src/test/regress/expected/name.out
index b359d52..f4b58f1 100644
--- a/src/test/regress/expected/name.out
+++ b/src/test/regress/expected/name.out
@@ -15,6 +15,19 @@ SELECT name 'name string' = name 'name string ' AS False;
  f
 (1 row)
 
+-- name truncation with unicode characters
+SELECT length(repeat(U&'\0400', 32)::unknown::name::bytea, 'utf8') as length_from_unknown;
+ length_from_unknown 
+---------------------
+                  31
+(1 row)
+
+SELECT length(repeat(U&'\0400', 32)::text::name::bytea, 'utf8') as length_from_text;
+ length_from_text 
+------------------
+               31
+(1 row)
+
 --
 --
 --
diff --git a/src/test/regress/sql/name.sql b/src/test/regress/sql/name.sql
index 1c7a671..9f7a5f0 100644
--- a/src/test/regress/sql/name.sql
+++ b/src/test/regress/sql/name.sql
@@ -8,6 +8,10 @@ SELECT name 'name string' = name 'name string' AS True;
 
 SELECT name 'name string' = name 'name string ' AS False;
 
+-- name truncation with unicode characters
+SELECT length(repeat(U&'\0400', 32)::unknown::name::bytea, 'utf8') as length_from_unknown;
+SELECT length(repeat(U&'\0400', 32)::text::name::bytea, 'utf8') as length_from_text;
+
 --
 --
 --

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: features for simpler integration psql to bash

2012-05-25 Thread Pavel Stehule
Hello

We don't have a macro language in psql. I did some work a few years ago
- see enhanced psql - http://postgres.cz/wiki/Enhanced-psql - but the
important part of that experimental patch was not accepted (the
implementation was not production-quality code and was not finalized).

Implementing our own macro language is relatively difficult - execution
in psql is complex because of the support for interactive multiline
editing, command history, and so on. In practice we can already use
plenty of existing languages, so a new interpreter is not necessary. A
more interesting feature would be better integration of psql with bash:
the possibility to call bash scripts like custom backslash commands, and
similar. I have the following features in mind:

a) a new output format that supports simple reading of results from bash

I would like to simplify this code:

pavel ~ $ psql postgres -A -t --field-separator=' ' \
   -c "copy (select 1,'Ahoj Svete', i
             from generate_series(1,3) g(i))
       to stdout delimiter ' '" | \
 while read var1 var2 var3;
 do
    echo a=$var1,b=$var2,c=$var3 ;
 done

to

pavel ~ $ psql postgres --format=bash \
   -c "select 1,'Ahoj Svete', i from generate_series(1,3) g(i)" | \
 while read var1 var2 var3;
 do
    echo a=$var1,b=$var2,c=$var3 ;
 done

b) the possibility to store a query result in psql variables

\execute [query] into var1, var2, var


c) enhancing the communication protocol to support access to client and
system variables

We have a good implementation of plpgsql, but using this language for
scripting is difficult - there is no simple way to parametrize code
execution. I believe we can solve this issue with the following
functions:

* set_client_variable(varname, value)
* set_client_system_variable(varname, value)
* get_client_variable(varname, value)
* get_client_system_variable(varname, value)

These functions would be executed on the server, but they enable access
to client information and configuration - and enable simple
parametrization of DO statements (via psql variables).

What do you think about these ideas?

Regards

Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statments queryid

2012-05-25 Thread Magnus Hagander
On Thu, May 24, 2012 at 5:20 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 24 May 2012 16:06, Tom Lane t...@sss.pgh.pa.us wrote:
 I do not want to promise that it's stable over any timeframe longer than
 a server reboot.

 You already have though, since pg_stat_statements persistently stores
 statistics to disk by default, and can only ever recognise statement
 equivalence based on the (dbid, userid, queryid) hash key.

Yes, if that's actually a problem, the whole way pg_stat_statements
stores its data persistently across restarts needs to be rewritten, in
a way that introduces an identifier that *is* stable across restarts.
In which case we could just expose that identifier instead, and we're
done.

What exactly is it that could/would be unstable across a reboot?

Not being stable across an initdb is of course a whole different story
- I think it's perfectly reasonable not to be that.



 Aside from the OID dependence problem, we might well
 change the way the hash is calculated in minor releases, for example by
 adding or removing struct fields.

 You've already invalidated the saved statistics if you do that, so all
 bets are off anyway. If you have to do it, it'll be necessary to bump
 PGSS_FILE_HEADER, so that pg_stat_statements will be cleared upon
 restart. That will in turn necessitate documenting the issue in the
 minor version release notes. I'd hope to avoid that, but it doesn't
 seem to me that the situation is made any worse than before by
 exposing the value. On the contrary, it could help users to understand
 where the problem may have affected them.

Agreed. We already break something very user-visible in this case. Two
symptoms of the same breakage are really not that big an issue, IMO,
compared to the big gains to be had.


 If you don't expose the value, users are going to do this sort of
 thing anyway, but will be far worse off due to using the query text or
 a hash thereof instead of the internal value.

Exactly.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq URL syntax vs SQLAlchemy

2012-05-25 Thread Alex

Alex Shulgin a...@commandprompt.com writes:

 Upon closer inspection of the issue I came to believe that the proper
 fix is to drop support for special treatment of host part starting
 with slash altogether.

 Attached is a patch to do that.

Well, I understand I might be asking for too much, but has anyone had a
chance to look at the correcting patch?  We're having the first
CommitFest of 9.3 in three weeks, so this had better be dealt with
before it's too late.

I believe the correcting patch makes our implementation comply with
RFC 3986.

I can produce a patch against 9.1 for improved readability: the removal
of the special handling of '/' at the start of the URI created a bit of
a mess in the correcting patch, so it might be easier to look at the
combined effect of the committed patch and this one.

Comments please?

--
Regards,
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck

2012-05-25 Thread Magnus Hagander
On Thu, May 24, 2012 at 7:02 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, May 23, 2012 at 9:25 PM, Magnus Hagander mag...@hagander.net wrote:
 While reviewing and cleaning this patch up a bit I noticed it actually
 broke pg_receivexlog in the renaming.

 Here is a new version of the patch, reworked based on the above so
 we're down to a single callback. I moved the "rename the last segment
 file even if it's not complete" behavior into a parameter to
 ReceiveXlogStream(), instead of trying to overload a third
 functionality onto the callback (which is what broke pg_receivexlog).

 How does this look? Have I overlooked any cases?

 Thanks for the patch! Looks good to me except the followings:

    pg_basebackup.c:233: warning: passing argument 6 of
 'ReceiveXlogStream' from incompatible pointer type

Hmm. I could've sworn I fixed that. I think I forgot to refresh the patch :-)

 It seems confusing that *stream_continue()* returns TRUE when
 streaming *cannot continue*, i.e.,
 its name seems to be inconsistent with what it does. What about
 renaming it to stream_stop?

That's a pre-existing issue, but agreed, I will rename it.

 Similarly, it also seems confusing that *continue_streaming()* returns
 TRUE when streaming
 *cannot continue*.

Yeah, I renamed that one to stop_streaming as well.

Will apply the updated version.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proclock table corrupted

2012-05-25 Thread Harshitha S
Hi,

We are encountering the following error during normal operation of Postgres.
 postgres[10982]: [2-1] PANIC:  proclock table corrupted

Version of Postgres : 9.0.3
Architecture : mips
OS: RedHat Linux

Can you please let me know if 'fix-strong-lock-cleanup.patch' and this
error are related?

Regards,
Harshitha


Re: [HACKERS] heap metapages

2012-05-25 Thread Simon Riggs
On 24 May 2012 23:02, Bruce Momjian br...@momjian.us wrote:
 On Tue, May 22, 2012 at 09:52:30AM +0100, Simon Riggs wrote:
 Having pg_upgrade touch data files is both dangerous and difficult to
 back out in case of mistake, so I am wary of putting the metapage at
 block 0. Doing it the way I suggest means the .meta files would be
 wholly new and can be deleted as a back-out. We can also clean away
 any unnecessary .vm/.fsm files as a later step.

 Pg_upgrade never modifies the old cluster, except to lock it in link
 mode, so there is never anything to back out.

Agreed. Robert's proposal was to make pg_upgrade modify the cluster,
which I was observing wasn't a good plan.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Merlin Moncure
On Thu, May 24, 2012 at 6:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 24 May 2012, Jeff Janes wrote:

 Add
 #define LWLOCK_STATS
 near the top of:
 src/backend/storage/lmgr/lwlock.c

 and recompile and run a reduced-size workload.  When the processes
 exits, they will dump a lot of data about LWLock usage to the logfile.
 Generally the LWLock with the most blocks on it will be the main
 culprit.


 Here is the output from a multi-threaded run (8 threads, 22 seconds)
 sorted by blk. Not sure whether that's of much use or not:

 PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
 PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
 PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
 PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
 PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
 PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
 PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
 PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
 PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
 PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
 PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
 PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
 PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
 PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
 PID 7114 lwlock 47: shacq 227986 exacq 929 blk 449
 PID 7115 lwlock 44: shacq 1127495 exacq 633 blk 401
 PID 7115 lwlock 48: shacq 1124666 exacq 559 blk 397
 PID 7112 lwlock 47: shacq 227993 exacq 1248 blk 387
 PID 7110 lwlock 47: shacq 228218 exacq 1082 blk 378
 PID 7111 lwlock 47: shacq 228093 exacq 907 blk 321
 PID 7114 lwlock 42: shacq 232591 exacq 935 blk 318
 PID 7113 lwlock 47: shacq 228085 exacq 909 blk 306
 PID 7112 lwlock 34: shacq 343247 exacq 1255 blk 271
 PID 7110 lwlock 42: shacq 232599 exacq 1049 blk 262
 PID 7111 lwlock 34: shacq 343398 exacq 926 blk 255
 PID 7112 lwlock 42: shacq 232505 exacq 1262 blk 240
 PID 7108 lwlock 42: shacq 233215 exacq 1539 blk 237
 PID 7110 lwlock 43: shacq 438768 exacq 1026 blk 231
 PID 7114 lwlock 43: shacq 438795 exacq 898 blk 230
 PID 7113 lwlock 34: shacq 343283 exacq 832 blk 226
 PID 7110 lwlock 34: shacq 343338 exacq 1074 blk 226
 PID 7114 lwlock 0: shacq 0 exacq 14864 blk 219
 PID 7112 lwlock 43: shacq 438691 exacq 1123 blk 215
 PID 7111 lwlock 42: shacq 232645 exacq 885 blk 215
 PID 7114 lwlock 34: shacq 343362 exacq 939 blk 214
 PID 7111 lwlock 0: shacq 0 exacq 13638 blk 205
 PID 7111 lwlock 43: shacq 438817 exacq 899 blk 200
 PID 7114 lwlock 7: shacq 0 exacq 547 blk 193
 PID 7113 lwlock 42: shacq 232600 exacq 860 blk 192
 PID 7110 lwlock 0: shacq 0 exacq 16862 blk 191
 PID 7114 lwlock 38: shacq 333266 exacq 977 blk 183
 PID 7112 lwlock 38: shacq 333175 exacq 1246 blk 183
 PID 7113 lwlock 7: shacq 0 exacq 548 blk 178
 PID 7113 lwlock 43: shacq 438720 exacq 833 blk 178
 PID 7115 lwlock 7: shacq 0 exacq 549 blk 177
 PID 7108 lwlock 47: shacq 228682 exacq 1495 blk 177
 PID 7108 lwlock 34: shacq 343982 exacq 1595 blk 177
 PID 7112 lwlock 0: shacq 0 exacq 19538 blk 172
 PID 7111 lwlock 7: shacq 0 exacq 549 blk 172
 PID 7115 lwlock 17: shacq 0 exacq 9927 blk 167
 PID 7109 lwlock 34: shacq 343410 exacq 348 blk 167
 PID 7112 lwlock 7: shacq 0 exacq 548 blk 166
 PID 7110 lwlock 38: shacq 333249 exacq 1013 blk 165
 PID 7113 lwlock 38: shacq 333226 exacq 828 blk 163
 PID 7110 lwlock 7: shacq 0 exacq 548 blk 162
 PID 7109 lwlock 7: shacq 0 exacq 548 blk 161
 PID 7109 lwlock 47: shacq 228097 exacq 386 blk 158
 PID 7112 lwlock 45: shacq 130843 exacq 1201 blk 154
 PID 7109 lwlock 43: shacq 438787 exacq 314 blk 153
 PID 7111 lwlock 38: shacq 06 exacq 882 blk 152
 PID 7108 lwlock 43: shacq 439454 exacq 1614 blk 151
 PID 7108 lwlock 17: shacq 0 exacq 23165 blk 147
 PID 7108 lwlock 7: shacq 0 exacq 549 blk 145
 PID 7113 lwlock 0: shacq 0 exacq 13394 blk 142
 PID 7112 lwlock 46: shacq 132972 exacq 1196 blk 142
 PID 7112 lwlock 36: shacq 204508 exacq 1180 blk 140
 PID 7109 lwlock 42: shacq 232611 exacq 363 blk 140
 PID 7115 lwlock 34: shacq 343582 exacq 600 blk 139
 PID 7114 lwlock 45: shacq 130818 exacq 903 blk 139
 PID 7115 lwlock 42: shacq 232846 exacq 517 blk 138
 PID 7108 lwlock 38: shacq 333989 exacq 1704 blk 137
 PID 7115 lwlock 43: shacq 438993 exacq 558 blk 126
 PID 7115 lwlock 0: shacq 0 exacq 3161 blk 125
 PID 7110 lwlock 40: shacq 129577 exacq 1021 blk 122
 PID 7113 lwlock 45: shacq 130833 exacq 814 blk 120
 PID 7111 lwlock 17: shacq 0 exacq 4607 blk 119
 PID 7109 lwlock 0: shacq 0 exacq 5711 blk 119
 PID 7110 lwlock 45: shacq 130865 exacq 1001 blk 117
 PID 7109 lwlock 38: shacq 40 exacq 322 blk 115
 PID 7114 lwlock 46: shacq 132997 exacq 860 blk 112
 PID 7113 lwlock 36: shacq 204566 exacq 833 blk 112
 PID 7112 lwlock 40: shacq 129528 exacq 1234 blk 111
 PID 7115 lwlock 47: shacq 228397 exacq 516 blk 109
 PID 7114 lwlock 36: shacq 204634 exacq 930 blk 109
 PID 7112 lwlock 41: shacq 133600 exacq 1136 blk 109
 PID 7111 lwlock 36: shacq 204675 exacq 830 blk 108
 PID 7108 lwlock 

Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Magnus Hagander
On Thu, May 24, 2012 at 2:19 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, May 24, 2012 at 2:16 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 On 24 May 2012 12:42, Magnus Hagander mag...@hagander.net wrote:
 What actually happens if it tries to repalloc() something huge? palloc
 will throw an elog(ERROR), and since this happens during postmaster
 startup, are you sure it won't prevent the server from starting?

 Oh, yes, missed that.

                /* Previous incarnation might have had a larger query_size */
                if (temp.query_len >= buffer_size)
                {
                        buffer = (char *) repalloc(buffer, temp.query_len + 1);
                        buffer_size = temp.query_len + 1;
                }

 Here, temp receives its value from an fread().

 This could probably be coded to be defensive against such things, but
 a better fix would be preferred. I have to wonder how much of a
 problem corruption is likely to be though, given that we only save to
 disk in a corresponding pgss_shmem_shutdown() call, which actually has
 more protections against corruption. The window for the saved file to
 be corrupt seems rather small, though I accept that a better window
 would be zero.

 Right. But writing to a temp file and rename()ing it into place is trivial.

 It's really the other issues raised that are bigger ;)

Here's a patch that does the two easy fixes:
1) writes the file to a temp file and rename()s it over the main file
when writing it out. This removes the (small) risk of corruption from a
crash during the write.

2) unlinks the file after reading it. This makes sure it's not
included in online backups.

I still think we should consider the placement of this file to not be
in the global/ directory, but this is a quick (back-patchable) fix...
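
For reference, the write-then-rename pattern in fix 1) looks roughly
like this (a minimal standalone sketch, not the patch itself; the file
naming and the buffer-based API are assumptions):

#include <stdio.h>

/*
 * Write the new contents to "<path>.tmp" and rename() it over the real
 * file.  On POSIX systems rename() replaces the destination atomically,
 * so a crash mid-write can leave behind at worst a stale .tmp file,
 * never a half-written stats file.
 */
static int
write_file_atomically(const char *path, const void *buf, size_t len)
{
	char	tmp_path[1024];
	FILE   *f;

	snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);

	f = fopen(tmp_path, "wb");
	if (f == NULL)
		return -1;
	if (fwrite(buf, 1, len, f) != len)
	{
		fclose(f);
		remove(tmp_path);
		return -1;
	}
	if (fclose(f) != 0)
	{
		remove(tmp_path);
		return -1;
	}
	return rename(tmp_path, path);
}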

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


pg_stat_statements_file.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Sergey Koposov

On Fri, 25 May 2012, Merlin Moncure wrote:


These are all on the buffer partition locks.  That makes sense... I was
wrong earlier: this case was in fact 'create table as', not 'insert
select', which rules out both the freelist lock and the WAL insert lock,
because create table as gets to use both a bulk insert strategy and
WAL-avoiding logic (assuming the WAL level is not 'archive' or higher).
So, why aren't the lock partitions helping here?


FYI the WAL level is the default, and here are the relevant changed
params from the .conf (autovacuum was disabled but the data was
vacuumed manually).


max_connections = 100   # (change requires restart)
shared_buffers = 10GB   # min 128kB
work_mem = 500MB# min 64kB
maintenance_work_mem = 500MB# min 1MB
synchronous_commit = off# synchronization level;
checkpoint_segments = 20# in logfile segments, min 1, 16MB each
enable_hashjoin = off
enable_mergejoin = off
effective_cache_size = 10GB
autovacuum = off# Enable autovacuum subprocess?  'on'


Regards,
S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Make skipped sort explicit in EXPLAIN ANALYZE

2012-05-25 Thread Marti Raudsepp
Hi,

This is just a small clarity improvement. tuplesort_performsort()
skips sorting entirely when the result set has 0 or 1 tuples, but
EXPLAIN still says it's using quicksort. The patch changes that to
"skipped".

For example:

db=# explain analyze select * from now() order by 1;
 Sort  (cost=0.02..0.03 rows=1 width=8) (actual time=0.126..0.126
rows=1 loops=1)
   Sort Key: now
   Sort Method: skipped  Memory: 25kB
   ->  Function Scan on now  (cost=0.00..0.01 rows=1 width=8) (actual
time=0.032..0.033 rows=1 loops=1)
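
The change amounts to something like this hypothetical sketch (the
attached patch is the authoritative version; the function and names
here are assumptions):

#include <stdbool.h>

/*
 * When at most one tuple was fed to the sort, report "skipped" rather
 * than the name of a sort algorithm that never ran.
 */
static const char *
sort_method_label(long tupcount, bool used_quicksort)
{
	if (tupcount <= 1)
		return "skipped";	/* nothing was actually sorted */
	return used_quicksort ? "quicksort" : "external merge";
}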

Patch attached.

Regards,
Marti


explain-sort-skipped.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proclock table corrupted

2012-05-25 Thread Tom Lane
Harshitha S hershe...@gmail.com writes:
 We are encountering the following error during normal operation of Postgres.
  postgres[10982]: [2-1] PANIC:  proclock table corrupted

Ugh.  Can you provide a reproducible test case?

 Version of Postgres : 9.0.3
 Architecture : mips
 OS: RedHat Linux

[ raised eyebrow... ]  I've been working at Red Hat for ten years, and
I'm pretty sure they have never shipped a MIPS-based distro in that time.
So what is that OS really?

 Can you please let me know if 'fix-strong-lock-cleanup.patch' and this
 error are related?

This is not an adequate identification of what patch you are talking
about; but if you are speaking of something related to Robert Haas'
fast-path locking code, that's not in 9.0.x.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make skipped sort explicit in EXPLAIN ANALYZE

2012-05-25 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 This is just a small clarity improvement. tuplesort_performsort()
 skips sorting entirely when the result set has 0 or 1 tuples, but
 EXPLAIN still says it's using quicksort. The patch changes that to
 "skipped".

I'm not convinced this is an improvement; it's just one more value
to confuse people with.  And any implementation of quicksort is likely
to have a fast exit path for 0 or 1 item, so I don't think the existing
display is incorrect either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Peter Geoghegan
On 25 May 2012 14:13, Magnus Hagander mag...@hagander.net wrote:
 Here's a patch that does the two easy fixes:
 1) writes the file to a temp file and rename()s it over the main file
 as it writes down. This removes the (small) risk of corruption because
 of a crash during write

 2) unlinks the file after reading it. this makes sure it's not
 included in online backups.

Seems reasonable. It might be better to consistently concatenate the
string literals PGSS_DUMP_FILE and .tmp statically. Also, I'd have
updated the string in the errmsg callsite after the error tag too,
to refer to the tmp file rather than the file proper. Forgive the
pedantry, but I should mention that I believe that it is project
policy to not use squiggly parenthesis following an if expression when
that is unnecessary due to there only being a single statement.

 I still think we should consider the placement of this file to not be
 in the global/ directory, but this is a quick (back-patchable) fix...

Where do you suggest the file be written to?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Andres Freund
On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:
  I still think we should consider the placement of this file to not be
  in the global/ directory, but this is a quick (back-patchable) fix...
 
 Where do you suggest the file be written to?
One could argue stats_temp_directory would be the correct place.

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 25 May 2012 14:13, Magnus Hagander mag...@hagander.net wrote:
 I still think we should consider the placement of this file to not be
 in the global/ directory, but this is a quick (back-patchable) fix...

 Where do you suggest the file be written to?

Given that pgstats keeps its permanent file in global/, I think the
argument that pg_stat_statements should not do likewise is pretty thin.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make skipped sort explicit in EXPLAIN ANALYZE

2012-05-25 Thread Marti Raudsepp
On Fri, May 25, 2012 at 4:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not convinced this is an improvement; it's just one more value
 to confuse people with.

I understand where you're coming from, but personally I think the
current output is more confusing: Gee Postgres is stupid, it's
sorting when there's nothing to sort!

But let's wait for a third opinion.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_upgrade libraries check

2012-05-25 Thread Andrew Dunstan
pg_upgrade is a little over-keen about checking for shared libraries 
that back functions. In particular, it checks for libraries that support 
functions created in pg_catalog, even if pg_dump doesn't export the 
function.


The attached patch mimics the filter that pg_dump uses for functions so 
that only the relevant libraries are checked.


This would remove the need for a particularly ugly hack in making the 
9.1 backport of JSON binary upgradeable.


cheers

andrew
*** a/contrib/pg_upgrade/function.c
--- b/contrib/pg_upgrade/function.c
***************
*** 11,16 ****
--- 11,17 ----
  
  #include "pg_upgrade.h"
  
+ #include "pqexpbuffer.h"
  #include "access/transam.h"
  
  #define PG_UPGRADE_SUPPORT	"$libdir/pg_upgrade_support"
***************
*** 141,157 **** get_loadable_libraries(void)
  	{
  		DbInfo	   *active_db = &old_cluster.dbarr.dbs[dbnum];
  		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
  
! 		/* Fetch all libraries referenced in this DB */
! 		ress[dbnum] = executeQueryOrDie(conn,
! 										"SELECT DISTINCT probin "
! 										"FROM	pg_catalog.pg_proc "
! 										"WHERE	prolang = 13 /* C */ AND "
! 										"probin IS NOT NULL AND "
! 										"oid >= %u;",
! 										FirstNormalObjectId);
  		totaltups += PQntuples(ress[dbnum]);
  
  		PQfinish(conn);
  	}
  
--- 142,183 ----
  	{
  		DbInfo	   *active_db = &old_cluster.dbarr.dbs[dbnum];
  		PGconn	   *conn = connectToServer(&old_cluster, active_db->db_name);
+ 		PQExpBufferData query;
  
! 		initPQExpBuffer(&query);
! 
! 		/*
! 		 * pg_dump doesn't export functions in pg_catalog unless (and only in
! 		 * 9.1 and later) they are part of an extension. We therefore make
! 		 * the same exclusions when choosing which libraries to test for.
! 		 */
! 
! 		appendPQExpBufferStr(&query,
! 							 "SELECT DISTINCT probin "
! 							 "FROM	pg_catalog.pg_proc p "
! 							 "WHERE	prolang = 13 /* C */ AND "
! 							 "probin IS NOT NULL AND "
! 							 "oid >= %u AND ("
! 							 "pronamespace != "
! 							 "(SELECT oid FROM pg_catalog.pg_namespace "
! 							 "WHERE nspname = 'pg_catalog')");
! 
! 		if (GET_MAJOR_VERSION(old_cluster.major_version) >= 901)
! 			appendPQExpBufferStr(&query,
! 								 " OR EXISTS(SELECT 1 FROM pg_catalog.pg_depend "
! 								 "WHERE classid = 'pg_proc'::regclass AND "
! 								 "objid = p.oid AND "
! 								 "refclassid = 'pg_extension'::regclass AND "
! 								 "deptype = 'e')");
! 
! 		appendPQExpBufferStr(&query, ")");
! 
! 		/* Fetch all required libraries referenced in this DB */
! 		ress[dbnum] = executeQueryOrDie(conn, query.data, FirstNormalObjectId);
  		totaltups += PQntuples(ress[dbnum]);
  
+ 		termPQExpBuffer(&query);
+ 
  		PQfinish(conn);
  	}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Merlin Moncure
On Fri, May 25, 2012 at 8:06 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, May 24, 2012 at 6:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 24 May 2012, Jeff Janes wrote:

 Add
 #define LWLOCK_STATS
 near the top of:
 src/backend/storage/lmgr/lwlock.c

 and recompile and run a reduced-size workload.  When the processes
 exits, they will dump a lot of data about LWLock usage to the logfile.
 Generally the LWLock with the most blocks on it will be the main
 culprit.


 Here is the output from a multi-threaded run (8 threads, 22 seconds)
 sorted by blk. Not sure whether that's of much use or not:

 PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
 PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
 PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
 PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
 PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
 PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
 PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
 PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
 PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
 PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
 PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
 PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
 PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
 PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
 PID 7114 lwlock 47: shacq 227986 exacq 929 blk 449
 PID 7115 lwlock 44: shacq 1127495 exacq 633 blk 401
 PID 7115 lwlock 48: shacq 1124666 exacq 559 blk 397
 PID 7112 lwlock 47: shacq 227993 exacq 1248 blk 387
 PID 7110 lwlock 47: shacq 228218 exacq 1082 blk 378
 PID 7111 lwlock 47: shacq 228093 exacq 907 blk 321
 PID 7114 lwlock 42: shacq 232591 exacq 935 blk 318
 PID 7113 lwlock 47: shacq 228085 exacq 909 blk 306
 PID 7112 lwlock 34: shacq 343247 exacq 1255 blk 271
 PID 7110 lwlock 42: shacq 232599 exacq 1049 blk 262
 PID 7111 lwlock 34: shacq 343398 exacq 926 blk 255
 PID 7112 lwlock 42: shacq 232505 exacq 1262 blk 240
 PID 7108 lwlock 42: shacq 233215 exacq 1539 blk 237
 PID 7110 lwlock 43: shacq 438768 exacq 1026 blk 231
 PID 7114 lwlock 43: shacq 438795 exacq 898 blk 230
 PID 7113 lwlock 34: shacq 343283 exacq 832 blk 226
 PID 7110 lwlock 34: shacq 343338 exacq 1074 blk 226
 PID 7114 lwlock 0: shacq 0 exacq 14864 blk 219
 PID 7112 lwlock 43: shacq 438691 exacq 1123 blk 215
 PID 7111 lwlock 42: shacq 232645 exacq 885 blk 215
 PID 7114 lwlock 34: shacq 343362 exacq 939 blk 214
 PID 7111 lwlock 0: shacq 0 exacq 13638 blk 205
 PID 7111 lwlock 43: shacq 438817 exacq 899 blk 200
 PID 7114 lwlock 7: shacq 0 exacq 547 blk 193
 PID 7113 lwlock 42: shacq 232600 exacq 860 blk 192
 PID 7110 lwlock 0: shacq 0 exacq 16862 blk 191
 PID 7114 lwlock 38: shacq 333266 exacq 977 blk 183
 PID 7112 lwlock 38: shacq 333175 exacq 1246 blk 183
 PID 7113 lwlock 7: shacq 0 exacq 548 blk 178
 PID 7113 lwlock 43: shacq 438720 exacq 833 blk 178
 PID 7115 lwlock 7: shacq 0 exacq 549 blk 177
 PID 7108 lwlock 47: shacq 228682 exacq 1495 blk 177
 PID 7108 lwlock 34: shacq 343982 exacq 1595 blk 177
 PID 7112 lwlock 0: shacq 0 exacq 19538 blk 172
 PID 7111 lwlock 7: shacq 0 exacq 549 blk 172
 PID 7115 lwlock 17: shacq 0 exacq 9927 blk 167
 PID 7109 lwlock 34: shacq 343410 exacq 348 blk 167
 PID 7112 lwlock 7: shacq 0 exacq 548 blk 166
 PID 7110 lwlock 38: shacq 333249 exacq 1013 blk 165
 PID 7113 lwlock 38: shacq 333226 exacq 828 blk 163
 PID 7110 lwlock 7: shacq 0 exacq 548 blk 162
 PID 7109 lwlock 7: shacq 0 exacq 548 blk 161
 PID 7109 lwlock 47: shacq 228097 exacq 386 blk 158
 PID 7112 lwlock 45: shacq 130843 exacq 1201 blk 154
 PID 7109 lwlock 43: shacq 438787 exacq 314 blk 153
 PID 7111 lwlock 38: shacq 06 exacq 882 blk 152
 PID 7108 lwlock 43: shacq 439454 exacq 1614 blk 151
 PID 7108 lwlock 17: shacq 0 exacq 23165 blk 147
 PID 7108 lwlock 7: shacq 0 exacq 549 blk 145
 PID 7113 lwlock 0: shacq 0 exacq 13394 blk 142
 PID 7112 lwlock 46: shacq 132972 exacq 1196 blk 142
 PID 7112 lwlock 36: shacq 204508 exacq 1180 blk 140
 PID 7109 lwlock 42: shacq 232611 exacq 363 blk 140
 PID 7115 lwlock 34: shacq 343582 exacq 600 blk 139
 PID 7114 lwlock 45: shacq 130818 exacq 903 blk 139
 PID 7115 lwlock 42: shacq 232846 exacq 517 blk 138
 PID 7108 lwlock 38: shacq 333989 exacq 1704 blk 137
 PID 7115 lwlock 43: shacq 438993 exacq 558 blk 126
 PID 7115 lwlock 0: shacq 0 exacq 3161 blk 125
 PID 7110 lwlock 40: shacq 129577 exacq 1021 blk 122
 PID 7113 lwlock 45: shacq 130833 exacq 814 blk 120
 PID 7111 lwlock 17: shacq 0 exacq 4607 blk 119
 PID 7109 lwlock 0: shacq 0 exacq 5711 blk 119
 PID 7110 lwlock 45: shacq 130865 exacq 1001 blk 117
 PID 7109 lwlock 38: shacq 40 exacq 322 blk 115
 PID 7114 lwlock 46: shacq 132997 exacq 860 blk 112
 PID 7113 lwlock 36: shacq 204566 exacq 833 blk 112
 PID 7112 lwlock 40: shacq 129528 exacq 1234 blk 111
 PID 7115 lwlock 47: shacq 228397 exacq 516 blk 109
 PID 7114 lwlock 36: shacq 204634 exacq 930 blk 109
 PID 7112 lwlock 41: shacq 133600 exacq 1136 blk 

Re: [HACKERS] [PATCH] Make skipped sort explicit in EXPLAIN ANALYZE

2012-05-25 Thread Peter Geoghegan
On 25 May 2012 15:19, Marti Raudsepp ma...@juffo.org wrote:
 I understand where you're coming from, but personally I think the
 current output is more confusing: Gee Postgres is stupid, it's
 sorting when there's nothing to sort!

 But let's wait for a third opinion.

I agree with Tom. The idea that you sort when there's nothing to
sort is not confusing. Textbook implementations of recursive sorting
algorithms explicitly have a notion of sorting one element, by simply
recognising that one element must already be sorted. For example, look
at the quicksort pseudocode here:

http://en.wikipedia.org/wiki/Quicksort

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Magnus Hagander
On Fri, May 25, 2012 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Geoghegan pe...@2ndquadrant.com writes:
 On 25 May 2012 14:13, Magnus Hagander mag...@hagander.net wrote:
 I still think we should consider the placement of this file to not be
 in the global/ directory, but this is a quick (back-patchable) fix...

 Where do you suggest the file be written to?

 Given that pgstats keeps its permanent file in global/, I think the
 argument that pg_stat_statements should not do likewise is pretty thin.

Fair enough. As long as the file is unlinked after reading (per my
patch), it doesn't cause issues on a standby anymore, so it's a lot
less important, I guess. It's mostly namespace invasion at this
time...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Fri, May 25, 2012 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Given that pgstats keeps its permanent file in global/, I think the
 argument that pg_stat_statements should not do likewise is pretty thin.

 Fair enough. As long as the file is unlinked after reading (per my
 patch), it doesn't cause issues on a standby anymore, so it's a lot
 less important, I guess. It's mostly namespace invasion at this
 time...

Well, I could support moving both of those stats files someplace else,
but it seems neatnik-ism more than something we have a definable need
for.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:
 Where do you suggest the file be written to?

 One could argue stats_temp_directory would be the correct place.

No, that would be exactly the *wrong* place, because that directory can
be on a RAM disk.  We need to put this somewhere where it'll survive
a shutdown.

One could imagine creating a PGDATA subdirectory just for permanent (not
temp) stats files, but right at the moment that seems like overkill.
If we accumulate a few more similar files, I'd start to think it was
worth doing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 Hm, what if BufTableHashPartition() was pseudo randomized so that
 different backends would not get the same buffer partition for a
 particular tag?

Huh?  You have to make sure that different backends will find the same
buffer for the same page, so I don't see how that can possibly work.
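
For context, the partition choice is a pure function of the buffer
tag's hash, roughly like this (simplified from memory of
buf_table.c - treat the details as assumptions):

#include <stdint.h>

#define NUM_BUFFER_PARTITIONS 16	/* value in this era of the code */

/*
 * Every backend hashes the same buffer tag to the same value, so the
 * partition (and therefore the partition lock) is identical across
 * backends.  A per-backend randomized mapping would let two backends
 * look up the same page under different partition locks and miss each
 * other's hash-table entries.
 */
static inline uint32_t
buf_table_hash_partition(uint32_t hashcode)
{
	return hashcode % NUM_BUFFER_PARTITIONS;
}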

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interrupting long external library calls

2012-05-25 Thread Sandro Santilli
On Thu, May 24, 2012 at 04:37:04PM +0200, Florian Pflug wrote:
 On May24, 2012, at 15:04 , Sandro Santilli wrote:
  On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:
  On 16.05.2012 15:42, Sandro Santilli wrote:
  But CHECK_FOR_INTERRUPTS doesn't return, right ?
  Is there another macro for just checking w/out yet acting upon it ?
  
  Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
  variable, but on Windows it also checks for
  UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
  not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
  think InterruptPending can be set spuriously (even if that's not
  possible today, I wouldn't rely on it), and if you're in a
  HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
  even if InterruptPending is true.
  
  The only sane way to make 3rd party code interruptible is to add
  CHECK_FOR_INTERRUPTS() to it, in safe places.
  
  No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
  How could caller code cleanup on interruption ?
 
 The postgres way is to use PG_TRY/PG_CATCH to make sure stuff gets cleaned
 up if an error or an interrupts occurs. You could use those to make the
 third-party library exception safe, but it'll probably be a quite
 invasive change :-(.
 
 Alternatively, you could replicate the check CHECK_FOR_INTERRUPTS() does,

I ended up providing an explicit mechanism to request interruption of
whatever the library is doing, and experimented (successfully so far)
requesting the interruption from a SIGINT handler.

Do you see any major drawback in doing so ?

So far I installed the SIGINT handler within the library itself, but
I guess it could be moved out instead, to have more fine-grained
control over when to request interruption.

Here's the code installing the signal handler within the library:
https://github.com/strk/libgeos/commit/e820ecd0469b777953c132661877c2967b10cee2

--strk; 

  ,--o-. 
  |   __/  |Delivering high quality PostGIS 2.0 !
  |  / 2.0 |http://strk.keybit.net - http://vizzuality.com
  `-o--'


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Merlin Moncure
On Fri, May 25, 2012 at 10:22 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 Hm, what if BufTableHashPartition() was pseudo randomized so that
 different backends would not get the same buffer partition for a
 particular tag?

 Huh?  You have to make sure that different backends will find the same
 buffer for the same page, so I don't see how that can possibly work.

Right -- duh.  Well, hm.  Is this worth fixing?  ISTM there's a bit of
'optimizing for pgbench-itis' in the buffer partitions -- they seem
optimized to lever the mostly random access behavior of pgbench.  But
how likely is it to see multiple simultaneous scans in the real world?
 Interleaving scans like that is not a very effective optimization --
if it was me, I'd be trying to organize something around a
partitioned tid scan for parallel sequential access.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Stephen Frost
* Merlin Moncure (mmonc...@gmail.com) wrote:
 Right -- duh.  Well, hm.  Is this worth fixing?  ISTM there's a bit of
 'optimizing for pgbench-itis' in the buffer partitions -- they seem
 optimized to lever the mostly random access behavior of pgbench.  But
 how likely is it to see multiple simultaneous scans in the real world?
  Interleaving scans like that is not a very effective optimization --
 if it was me, I'd be trying to organize something around a
 partitioned tid scan for parallel sequential access.

Didn't we implement a system whereby this is exactly what we intend to
happen on the read side- that is, everyone doing a SeqScan gangs up on
one ring buffer and follows it, which we felt was going to dramatically
improve performance in some cases?

Or is this completely different from that..?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Jeff Janes
On Thu, May 24, 2012 at 4:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 24 May 2012, Jeff Janes wrote:

 Add
 #define LWLOCK_STATS
 near the top of:
 src/backend/storage/lmgr/lwlock.c

 and recompile and run a reduced-size workload.  When the processes
 exits, they will dump a lot of data about LWLock usage to the logfile.
 Generally the LWLock with the most blocks on it will be the main
 culprit.


 Here is the output from a multi-threaded run (8 threads, 22 seconds)
 sorted by blk. Not sure whether that's of much use or not:

 PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
 PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
 PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
 PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
 PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
 PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
 PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
 PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
 PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
 PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
 PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
 PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
 PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
 PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466

That is not as informative as I thought it would be (except to show
WAL was not the issue).

I'm guessing that 44 and 48 are the buffer mapping partitions which
cover the root block of some highly used index.

But just because those things are at the top of the list doesn't mean
they are a problem.  Something has to be at the top, and they don't
dominate the total amount of blocking the way I would expect them to
if they were truly a substantial bottleneck.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Merlin Moncure
On Fri, May 25, 2012 at 11:17 AM, Stephen Frost sfr...@snowman.net wrote:
 * Merlin Moncure (mmonc...@gmail.com) wrote:
 Right -- duh.  Well, hm.  Is this worth fixing?  ISTM there's a bit of
 'optimizing for pgbench-itis' in the buffer partitions -- they seem
 optimized to lever the mostly random access behavior of pgbench.  But
 how likely is it to see multiple simultaneous scans in the real world?
  Interleaving scans like that is not a very effective optimization --
 if it was me, I'd be trying to organize something around a
 partitioned tid scan for parallel sequential access.

 Didn't we implement a system whereby this is exactly what we intend to
 happen on the read side- that is, everyone doing a SeqScan gangs up on
 one ring buffer and follows it, which we felt was going to dramatically
 improve performance in some cases?

yeah:

	/*
	 * If the table is large relative to NBuffers, use a bulk-read access
	 * strategy and enable synchronized scanning (see syncscan.c).  Although
	 * the thresholds for these features could be different, we make them the
	 * same so that there are only two behaviors to tune rather than four.
	 * (However, some callers need to be able to disable one or both of these
	 * behaviors, independently of the size of the table; also there is a GUC
	 * variable that can disable synchronized scanning.)
	 *
	 * During a rescan, don't make a new strategy object if we don't have to.
	 */
	if (!RelationUsesLocalBuffers(scan->rs_rd) &&
		scan->rs_nblocks > NBuffers / 4)
	{
		allow_strat = scan->rs_allow_strat;
		allow_sync = scan->rs_allow_sync;
	}
	else
		allow_strat = allow_sync = false;

	if (allow_strat)
	{
		if (scan->rs_strategy == NULL)
			scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);
	}


I wonder if the logic here is just being too strict...

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interrupting long external library calls

2012-05-25 Thread Tom Lane
Sandro Santilli s...@keybit.net writes:
 I ended up providing an explicit mechanism to request interruption of
 whatever the library is doing, and experimented (successfully so far)
 requesting the interruption from a SIGINT handler.

 Do you see any major drawback in doing so ?

This seems a bit fragile.  It might work all right in Postgres, where
we tend to set up signal handlers just once at process start, but ISTM
other systems might assume they can change their signal handlers at
any time.  The handler itself looks less than portable anyway ---
what about the SIGINFO case?

I assume that the geos::util::Interrupt::request() call sets a flag
somewhere that's going to be periodically checked in long-running
loops.  Would it be possible for the periodic checks to include a
provision for a callback into Postgres-specific glue code, wherein
you could test the same flags CHECK_FOR_INTERRUPTS does?  A similar
approach might then be usable in other contexts, and it seems safer
to me than messing with a host environment's signal handling.
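
Such glue could look roughly like this (the registration and request
functions stand in for whatever hook the library exposes - they are
assumptions, not actual GEOS API):

#include "postgres.h"
#include "miscadmin.h"

/* hypothetical library hooks, not actual GEOS functions */
extern void lib_interrupt_register(void (*callback) (void));
extern void lib_interrupt_request(void);

/*
 * Called by the library from inside its long-running loops.  It tests
 * the same InterruptPending flag CHECK_FOR_INTERRUPTS() looks at, but
 * instead of ereport()ing it asks the library to unwind cleanly.
 */
static void
pg_lib_interrupt_check(void)
{
	if (InterruptPending)
		lib_interrupt_request();
}

/* at module load time: lib_interrupt_register(pg_lib_interrupt_check); */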

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 On Fri, May 25, 2012 at 11:17 AM, Stephen Frost sfr...@snowman.net wrote:
 Didn't we implement a system whereby this is exactly what we intend to
 happen on the read side- that is, everyone doing a SeqScan gangs up on
 one ring buffer and follows it, which we felt was going to dramatically
 improve performance in some cases?

 yeah:
 ...
 I wonder if the logic here is just being too strict...

I don't recall how much evidence there is behind the NBuffers/4 threshold.
Maybe that needs some tuning?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Sergey Koposov

On Fri, 25 May 2012, Merlin Moncure wrote:

how likely is it to see multiple simultaneous scans in the real world?
Interleaving scans like that is not a very effective optimization --
if it was me, it'd be trying to organize something around a
partitioned tid scan for parallel sequential access.


Regarding real-world use: I can say for myself that the reason I'm
running queries like the ones in my test is that I have a large inflow
of data every day, tens of GB, which has to be processed, and since PG
doesn't have any parallelization of queries built in, running the
processing in a single thread would take too long. So I'm trying to work
around that by manually running multiple threads in PG and splitting the
work among them. Not having scalability here is going to hurt us (though
I admit it's probably not a very popular use-case for PG).


Regards,
S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Josh Berkus
On 5/25/12 8:19 AM, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:
 Where do you suggest the file be written to?
 
 One could argue stats_temp_directory would be the correct place.
 
 No, that would be exactly the *wrong* place, because that directory can
 be on a RAM disk.  We need to put this somewhere where it'll survive
 a shutdown.

Mind you, I can imagine a busy system wanting to keep PSS on a ram disk
as well. But users should be able to make that decision separately from
the stats file.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Magnus Hagander
On Fri, May 25, 2012 at 6:49 PM, Josh Berkus j...@agliodbs.com wrote:
 On 5/25/12 8:19 AM, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On Friday, May 25, 2012 04:03:49 PM Peter Geoghegan wrote:
 Where do you suggest the file be written to?

 One could argue stats_temp_directory would be the correct place.

 No, that would be exactly the *wrong* place, because that directory can
 be on a RAM disk.  We need to put this somewhere where it'll survive
 a shutdown.

 Mind you, I can imagine a busy system wanting to keep PSS on a ram disk
 as well. But users should be able to make that decision separately from
 the stats file.

Why would they want that? PSS only writes the tempfile on shutdown and
reads it on startup. Unlike pgstats which reads and writes it all the
time.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements temporary file

2012-05-25 Thread Josh Berkus

 Why would they want that? PSS only writes the tempfile on shutdown and
 reads it on startup. Unlike pgstats which reads and writes it all the
 time.

Ah, ok!  Didn't know that.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Robert Haas
On Thu, May 24, 2012 at 7:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 Here is the output from a multi-threaded run (8 threads, 22 seconds)
 sorted by blk. Not sure whether that's of much use or not:

What are the top dozen or so entries if you sort by shacq?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Draft release notes complete

2012-05-25 Thread Josh Berkus
On 5/24/12 2:34 PM, Peter Geoghegan wrote:
 On 21 May 2012 19:10, Josh Berkus j...@agliodbs.com wrote:

 For these reasons, it may be timely and appropriate, from a purely
 advocacy point-of-view, to call our new group commit "group commit" in
 release notes and documentation, and announce it as a new feature.

 First, shouldn't we be having this discussion on -advocacy?
 
 Well, no, because this is a specific discussion about release notes.

True, but there's also the question of what we call this in the
promotional materials.

 In any case, I've given up on the idea that we should market the new
 group commit as "group commit". I believe that that would be a useful
 and fair way of representing the feature, but there doesn't seem to be
 any support for that view.

What else would you call it?  What's wrong with "Better Group Commit"?

From my perspective, it's pretty simple: we had group commit before, but
the new group commit is much better.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Sergey Koposov

On Fri, 25 May 2012, Robert Haas wrote:


On Thu, May 24, 2012 at 7:26 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:

Here is the output from a multi-threaded run (8 threads, 22 seconds)
sorted by blk. Not sure whether that's of much use or not:


What are the top dozen or so entries if you sort by shacq?


Here it is:

PID 7108 lwlock 44: shacq 1127902 exacq 1548 blk 511
PID 7115 lwlock 44: shacq 1127495 exacq 633 blk 401
PID 7114 lwlock 44: shacq 1127418 exacq 1024 blk 702
PID 7111 lwlock 44: shacq 1127324 exacq 957 blk 651
PID 7109 lwlock 44: shacq 1127261 exacq 388 blk 466
PID 7110 lwlock 44: shacq 1127256 exacq 1132 blk 774
PID 7113 lwlock 44: shacq 1127179 exacq 920 blk 665
PID 7112 lwlock 44: shacq 1127148 exacq 1323 blk 838
PID 7108 lwlock 48: shacq 1125039 exacq 1592 blk 546
PID 7115 lwlock 48: shacq 1124666 exacq 559 blk 397
PID 7111 lwlock 48: shacq 1124523 exacq 1009 blk 955
PID 7114 lwlock 48: shacq 1124502 exacq 1041 blk 976
PID 7110 lwlock 48: shacq 1124460 exacq 1128 blk 1110
PID 7109 lwlock 48: shacq 1124402 exacq 384 blk 602
PID 7112 lwlock 48: shacq 1124394 exacq 1350 blk 1373
PID 7113 lwlock 48: shacq 1124383 exacq 868 blk 871
PID 7115 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7115 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7114 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7114 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7113 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7113 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7112 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7112 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7111 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7111 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7110 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7110 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7109 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7109 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7108 lwlock 1020: shacq 100 exacq 0 blk 0
PID 7108 lwlock 1018: shacq 100 exacq 0 blk 0
PID 7108 lwlock 43: shacq 439454 exacq 1614 blk 151
PID 7115 lwlock 43: shacq 438993 exacq 558 blk 126
PID 7111 lwlock 43: shacq 438817 exacq 899 blk 200
PID 7114 lwlock 43: shacq 438795 exacq 898 blk 230
PID 7109 lwlock 43: shacq 438787 exacq 314 blk 153
PID 7110 lwlock 43: shacq 438768 exacq 1026 blk 231
PID 7113 lwlock 43: shacq 438720 exacq 833 blk 178
PID 7112 lwlock 43: shacq 438691 exacq 1123 blk 215
PID 7108 lwlock 34: shacq 343982 exacq 1595 blk 177
PID 7115 lwlock 34: shacq 343582 exacq 600 blk 139
PID 7109 lwlock 34: shacq 343410 exacq 348 blk 167
PID 7111 lwlock 34: shacq 343398 exacq 926 blk 255
PID 7114 lwlock 34: shacq 343362 exacq 939 blk 214
PID 7110 lwlock 34: shacq 343338 exacq 1074 blk 226
PID 7113 lwlock 34: shacq 343283 exacq 832 blk 226
PID 7112 lwlock 34: shacq 343247 exacq 1255 blk 271
PID 7108 lwlock 38: shacq 333989 exacq 1704 blk 137
PID 7115 lwlock 38: shacq 333512 exacq 581 blk 106
PID 7109 lwlock 38: shacq 40 exacq 322 blk 115
PID 7111 lwlock 38: shacq 06 exacq 882 blk 152
PID 7114 lwlock 38: shacq 333266 exacq 977 blk 183
PID 7110 lwlock 38: shacq 333249 exacq 1013 blk 165
PID 7113 lwlock 38: shacq 333226 exacq 828 blk 163
PID 7112 lwlock 38: shacq 333175 exacq 1246 blk 183
PID 7108 lwlock 42: shacq 233215 exacq 1539 blk 237
PID 7115 lwlock 42: shacq 232846 exacq 517 blk 138
PID 7111 lwlock 42: shacq 232645 exacq 885 blk 215
PID 7109 lwlock 42: shacq 232611 exacq 363 blk 140
PID 7113 lwlock 42: shacq 232600 exacq 860 blk 192
PID 7110 lwlock 42: shacq 232599 exacq 1049 blk 262
PID 7114 lwlock 42: shacq 232591 exacq 935 blk 318
PID 7112 lwlock 42: shacq 232505 exacq 1262 blk 240
PID 7108 lwlock 47: shacq 228682 exacq 1495 blk 177
PID 7115 lwlock 47: shacq 228397 exacq 516 blk 109
PID 7110 lwlock 47: shacq 228218 exacq 1082 blk 378
PID 7109 lwlock 47: shacq 228097 exacq 386 blk 158
PID 7111 lwlock 47: shacq 228093 exacq 907 blk 321
PID 7113 lwlock 47: shacq 228085 exacq 909 blk 306
PID 7112 lwlock 47: shacq 227993 exacq 1248 blk 387
PID 7114 lwlock 47: shacq 227986 exacq 929 blk 449
PID 7108 lwlock 36: shacq 205287 exacq 1592 blk 78
PID 7115 lwlock 36: shacq 204845 exacq 535 blk 86
PID 7111 lwlock 36: shacq 204675 exacq 830 blk 108
PID 7114 lwlock 36: shacq 204634 exacq 930 blk 109
PID 7109 lwlock 36: shacq 204581 exacq 363 blk 83
PID 7113 lwlock 36: shacq 204566 exacq 833 blk 112
PID 7110 lwlock 36: shacq 204546 exacq 1014 blk 106
PID 7112 lwlock 36: shacq 204508 exacq 1180 blk 140
PID 7108 lwlock 35: shacq 136067 exacq 1537 blk 56
PID 7115 lwlock 35: shacq 135532 exacq 494 blk 47
PID 7111 lwlock 35: shacq 135404 exacq 820 blk 76
PID 7109 lwlock 35: shacq 135362 exacq 326 blk 56
PID 7114 lwlock 35: shacq 135342 exacq 879 blk 102
PID 7110 lwlock 35: shacq 135333 exacq 1016 blk 88
PID 7113 lwlock 35: shacq 135318 exacq 829 blk 75
PID 7112 lwlock 35: shacq 135257 exacq 1229 blk 98
PID 7108 lwlock 41: shacq 134338 exacq 1540 blk 55
PID 7115 lwlock 41: shacq 133929 exacq 542 blk 50
PID 7109 lwlock 41: shacq 133781 exacq 360 blk 49
PID 7111 

Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog

2012-05-25 Thread Fujii Masao
On Thu, May 24, 2012 at 4:52 AM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, May 23, 2012 at 8:11 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, May 22, 2012 at 11:04 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 14, 2012 at 2:24 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 Should we go down the easy way and just reject connections when the flag 
 is
 mismatching between the client and the server (trivial to do - see the
 attached patch)?

 +       char       *tmpparam;

 You forgot to add const before char, which causes a compile-time 
 warning.

 I went ahead and committed this, with this fix and a slight change to
 the message text.

 Thanks!

 Hope that's OK with everyone...

 What about calling PQfinish() before exit() to avoid an "unexpected EOF"
 connection error?
 Patch attached.

 Makes sense, applied.

Thanks! So, let's go back to the original problem: pg_receivexlog
still doesn't work properly under --disable-integer-datetimes.
I previously posted the patch which fixes that problem.
http://archives.postgresql.org/message-id/CAHGQGwFutqnFPBYcHUCuoy1zMVDXto=o4ogsjrbwxw4zj2t...@mail.gmail.com

Attached is the updated version of the patch. Comments?

Regards,

-- 
Fujii Masao


timeout_handling_v3.patch
Description: Binary data


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Merlin Moncure
On Fri, May 25, 2012 at 11:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 On Fri, May 25, 2012 at 11:17 AM, Stephen Frost sfr...@snowman.net wrote:
 Didn't we implement a system whereby this is exactly what we intend to
 happen on the read side- that is, everyone doing a SeqScan gangs up on
 one ring buffer and follows it, which we felt was going to dramatically
 improve performance in some cases?

 yeah:
 ...
 I wonder if the logic here is just being too strict...

 I don't recall how much evidence there is behind the NBuffers/4 threshold.
 Maybe that needs some tuning?


probably.  Sergey is going to get some numbers back from a hacked
version that forces the scan to use a strategy.

If that turns out to be the problem, I bet it'd make sense to hook the
decision against ring size which is currently internal to freelist.
In other words, since freelist is making the call to determine the
size of the ring, it should also probably be making the call in terms
of a particular input size being useful to apply a strategy to.

Maybe something like
IsAccessStrategyUseful(BufferAccessStrategyType btype, BlockNumber nblocks) ...
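
A minimal sketch of that helper, assuming it lives in freelist.c (where
NBuffers and the ring sizes are already known); the name and signature are
Merlin's proposal above, and the cutoff mirrors the current heapam.c
heuristic:

bool
IsAccessStrategyUseful(BufferAccessStrategyType btype, BlockNumber nblocks)
{
	switch (btype)
	{
		case BAS_BULKREAD:
			/* a ring only pays off if the scan dwarfs shared_buffers */
			return nblocks > (BlockNumber) (NBuffers / 4);
		case BAS_VACUUM:
		case BAS_BULKWRITE:
			return true;		/* writers always want a ring */
		default:
			return false;
	}
}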

merlin


Re: [HACKERS] Re: [BUGS] 9.2beta1 regression: pg_restore --data-only does not set sequence values any more

2012-05-25 Thread Andrew Dunstan



On 05/21/2012 02:59 PM, Andrew Dunstan wrote:

On 05/16/2012 10:23 AM, Andrew Dunstan wrote:

On Wed, May 16, 2012 at 9:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Martin Pitt mp...@debian.org writes:
 while packaging 9.2 beta 1 for Debian/Ubuntu the postgresql-common
 test suite noticed a regression: It seems that pg_restore --data-only
 now skips the current value of sequences, so that in the upgraded
 database the sequence counter is back to the default.

I believe this is a consequence of commit
a4cd6abcc901c1a8009c62a27f78696717bb8fe1, which introduced the entirely
false assumption that --schema-only and --data-only have something to
do with the order that entries appear in the archive ...

Darn, will investigate.


[cc -hackers]

Well, the trouble is that we have these pesky SECTION_NONE entries for 
things like comments, security labels and ACLs that need to be dumped 
in the right section, so we can't totally ignore the order. But we 
could (and probably should) ignore the order for making decisions 
about everything BUT those entries.


So, here's a revised plan:

--section=data will dump exactly TABLE DATA, SEQUENCE SET or BLOBS entries.

--section=pre-data will dump SECTION_PRE_DATA items (other than
SEQUENCE SET) plus any immediately following SECTION_NONE items.

--section=post-data will dump everything else.


It turns out there were some infelicities with pg_dump as well as with 
pg_restore.


I think the attached patch does the right thing. I'll keep testing - 
I'll be happier if other people bang on it too.


cheers

andrew
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
***
*** 2341,2354  _tocEntryRequired(TocEntry *te, RestoreOptions *ropt, bool include_acls)
  	if (!ropt->createDB && strcmp(te->desc, "DATABASE") == 0)
  		return 0;
  
! 	/* skip (all but) post data section as required */
! 	/* table data is filtered if necessary lower down */
  	if (ropt->dumpSections != DUMP_UNSECTIONED)
  	{
! 		if (!(ropt->dumpSections & DUMP_POST_DATA) && te->inPostData)
! 			return 0;
! 		if (!(ropt->dumpSections & DUMP_PRE_DATA) && !te->inPostData && strcmp(te->desc, "TABLE DATA") != 0)
  			return 0;
  	}
  
  
--- 2341,2365 
  	if (!ropt->createDB && strcmp(te->desc, "DATABASE") == 0)
  		return 0;
  
! 	/* 
! 	 * Skip pre and post data section as required 
! 	 * Data is filtered if necessary lower down 
! 	 * Sequence set operations are in the pre data section for parallel
! 	 * processing purposes, but part of the data section for sectioning
! 	 * purposes.
! 	 * SECTION_NONE items are filtered according to where they are 
! 	 * positioned in the list of TOC entries.
! 	 */
  	if (ropt->dumpSections != DUMP_UNSECTIONED)
  	{
! 		if (!(ropt->dumpSections & DUMP_POST_DATA) &&	/* post data skip */
! 			((te->section == SECTION_NONE && te->inPostData) ||
! 			  te->section == SECTION_POST_DATA))
  			return 0;
+ 		if (!(ropt->dumpSections & DUMP_PRE_DATA) &&	/* pre data skip */
+ 			((te->section == SECTION_NONE && !te->inPostData) ||
+ 			 (te->section == SECTION_PRE_DATA && strcmp(te->desc, "SEQUENCE SET") != 0)))
+ 			return 0;
  	}
  
  
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***
*** 7096,7101  dumpDumpableObject(Archive *fout, DumpableObject *dobj)
--- 7096,7103 
  
  	switch (dobj->objType)
  	{
+ 		case DO_TABLE:
+ 			break; /* has its own controls */
  		case DO_INDEX:
  		case DO_TRIGGER:
  		case DO_CONSTRAINT:
***
*** 12075,12081  dumpTable(Archive *fout, TableInfo *tbinfo)
  
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (!dataOnly)
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
--- 12077,12083 
  
  		if (tbinfo->relkind == RELKIND_SEQUENCE)
  			dumpSequence(fout, tbinfo);
! 		else if (dumpSections & DUMP_PRE_DATA)
  			dumpTableSchema(fout, tbinfo);
  
  		/* Handle the ACL here */
***
*** 13291,13297  dumpSequence(Archive *fout, TableInfo *tbinfo)
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a data dump.
  	 */
! 	if (!dataOnly)
  	{
  		/*
  		 * DROP must be fully qualified in case same name appears in
--- 13293,13299 
  	 *
  	 * Add a 'SETVAL(seq, last_val, iscalled)' as part of a data dump.
  	 */
! 	if (dumpSections & DUMP_PRE_DATA)
  	{
  		/*
  		 * DROP must be fully qualified in case same name appears in
***
*** 13412,13418  dumpSequence(Archive *fout, TableInfo *tbinfo)
  	 tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (!schemaOnly)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");
--- 13414,13420 
  	 tbinfo->dobj.catId, 0, tbinfo->dobj.dumpId);
  	}
  
! 	if (dumpSections & DUMP_DATA)
  	{
  		resetPQExpBuffer(query);
  		appendPQExpBuffer(query, "SELECT pg_catalog.setval(");


Re: [HACKERS] Archiver not exiting upon crash

2012-05-25 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 So my test harness is an inexplicably effective show-case for the
 vulnerability, but it is not the reason the vulnerability should be
 fixed.

I spent a bit of time looking into this.  In principle the postmaster
could be fixed to repeat the SIGQUIT signal every second or so, but
it would be a rather considerable wart on code that's already
overcomplicated for my taste (given that bugs in the postmaster are
basically game over).  An example of the problems is that the
postmaster's own SIGQUIT signal handler presently just sends out the
child signals and calls exit().  We'd need some new "waiting for
children to die" state to allow for retransmission in such cases.
As far as the postmaster is concerned, it would be a lot better if
SIGQUIT could be considered reliable.

This leads me to the thought that maybe we have to deprecate use of
system() in Postgres backend code.  If we simply fork and exec without
touching the signal handling, ISTM that would do what we want, and the
amount of added code would be pretty minimal (a lot less than would have
to be added to the postmaster to address this the other way).  However,
I'm not too sure what would be required to make it go on Windows.
Comments?  Does the Windows emulation of system() even have this issue
to begin with?

regards, tom lane


[HACKERS] Foreground vacuum and buffer access strategy

2012-05-25 Thread Jeff Janes
If I invoke vacuum manually and do so with VacuumCostDelay == 0, I
have basically declared my intentions to get this pain over with as
fast as possible even if it might interfere with other processes.

Under that condition, shouldn't it use BAS_BULKWRITE rather than
BAS_VACUUM?  The smaller ring size leads to a lot of synchronous WAL
flushes which I think can slow the vacuum down a lot.
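
A rough sketch of the idea, assuming the choice is made where vacuum
currently sets up its ring (the helper name here is illustrative, not
existing code):

/* a manual VACUUM with no cost delay wants throughput, so give it the
 * larger bulk-write ring rather than the small vacuum ring */
BufferAccessStrategy
choose_vacuum_strategy(bool is_autovacuum)
{
	if (!is_autovacuum && VacuumCostDelay == 0)
		return GetAccessStrategy(BAS_BULKWRITE);
	return GetAccessStrategy(BAS_VACUUM);
}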

Cheers,

Jeff


Re: [HACKERS] Bug in new buffering GiST build code

2012-05-25 Thread Heikki Linnakangas

On 22.05.2012 01:09, Alexander Korotkov wrote:

Hi!

On Tue, May 22, 2012 at 12:56 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:


The management of the path stacks is a bit complicated, anyway. I'll think
about this some more tomorrow, maybe we can make it simpler, knowing that
we have to do those extra lookups.


WOW! You did enormous work on exploring that!
I just arrived from PGCon and started looking at it, only to find you've
already done comprehensive research on this problem.
At step 5, if we have the NSN in the GISTBufferingInsertStack structure,
we could detect the situation where the parent of a split page changes.
Using this we could save a copy of the GISTBufferingInsertStack for B2
with original parent A, because we know the split of B occurred after
creating the GISTBufferingInsertStack but before the split of A. The
question is how to find this copy from C: a hash?


I tested a patch that adds the extra getNodeBuffer() call after 
refinding the parent, as discussed. However, I'm still getting a "failed 
to re-find parent" error later in the build, so I think we're still 
missing some corner case.


I think we should rewrite the way we track the parents completely. 
Rather than keep a path stack attached to every node buffer, let's just 
maintain a second hash table that contains the parent of every internal 
node. Whenever a downlink is moved to another page, update the hash 
table with the new information. That way we always have up-to-date 
information about the parent of every internal node.
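
A minimal sketch of that second hash table, assuming entries keyed by the
child block number (the layout is illustrative; the patch attached below
is the authoritative version):

typedef struct ParentMapEntry
{
	BlockNumber child;			/* hash key: internal page */
	BlockNumber parent;			/* its current parent page */
} ParentMapEntry;

/* parentMap is an HTAB created with keysize = sizeof(BlockNumber) and
 * entrysize = sizeof(ParentMapEntry); call this whenever a downlink moves */
static void
setParent(HTAB *parentMap, BlockNumber child, BlockNumber parent)
{
	bool		found;
	ParentMapEntry *entry;

	entry = (ParentMapEntry *) hash_search(parentMap, &child,
										   HASH_ENTER, &found);
	entry->parent = parent;
}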


That's much easier to understand than the path stack structures we have 
now. I think the overall memory consumption will be about the same too. 
Although we need the extra hash table with one entry for every internal 
node, we get rid of the path stack structs, which are also one per every 
internal node at the moment.


I believe it is faster too. I added some instrumentation to the existing 
gist code (with the additional getNodeBuffer() call added to fix this 
bug), to measure the time spent moving right, when refinding the parent 
of a page. I added gettimeofday() calls before and after moving right, 
and summed the total. In my test case, the final index size was about 
19GB, and the index build took 3545 seconds (59 minutes). Of that time, 
580 seconds (~ 10 minutes) was spent moving right to refind parents. 
That's a lot. I also printed a line whenever a refind operation had to 
move right 20 pages or more. That happened 2482 times during the build; 
in the worst case we moved right over 4 pages.


Attached is a patch to replace the path stacks with a hash table. With 
this patch, the index build time in my test case dropped from 59 minutes 
to about 55 minutes. I don't know how representative or repeatable this 
test case is, but this definitely seems very worthwhile, not only 
because it fixes the bug and makes the code simpler, but also on 
performance grounds.


Alexander, do you still have the test environments and data lying around 
that you used for GiST buffering testing last summer? Could you rerun 
some of those tests with this patch?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 988896d..9d9a031 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -55,16 +55,24 @@ typedef struct
 {
 	Relation	indexrel;
 	GISTSTATE  *giststate;
-	GISTBuildBuffers *gfbb;
 
 	int64		indtuples;		/* number of tuples indexed */
 	int64		indtuplesSize;	/* total size of all indexed tuples */
 
 	Size		freespace;		/* amount of free space to leave on pages */
 
+	/*
+	 * Extra data structures used during a buffering build. GISTBuildBuffers
+	 * contains information related to managing the build buffers. parentMap
+	 * is a lookup table of the parent of each internal page.
+	 */
+	GISTBuildBuffers *gfbb;
+	HTAB	   *parentMap;
+
 	GistBufferingMode bufferingMode;
 } GISTBuildState;
 
+/* prototypes for private functions */
 static void gistInitBuffering(GISTBuildState *buildstate);
 static int	calculatePagesPerBuffer(GISTBuildState *buildstate, int levelStep);
 static void gistBuildCallback(Relation index,
@@ -76,18 +84,23 @@ static void gistBuildCallback(Relation index,
 static void gistBufferingBuildInsert(GISTBuildState *buildstate,
 		 IndexTuple itup);
 static bool gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
-GISTBufferingInsertStack *startparent);
+BlockNumber startblkno, int startlevel);
 static void gistbufferinginserttuples(GISTBuildState *buildstate,
 		  Buffer buffer,
 		  IndexTuple *itup, int ntup, OffsetNumber oldoffnum,
-		  GISTBufferingInsertStack *path);
-static void gistBufferingFindCorrectParent(GISTBuildState *buildstate,
-			   GISTBufferingInsertStack *child);
+		  int level, BlockNumber parent);
+static Buffer gistBufferingFindCorrectParent(GISTBuildState *buildstate,
+			   BlockNumber 

Re: [HACKERS] patch: Use pg_mbcliplen for truncation in text-to-name conversion

2012-05-25 Thread Tom Lane
Karl Schnaitter karl...@gmail.com writes:
 The text_name function was truncating its input string to the first
 NAMEDATALEN-1 bytes, which is wrong if the string has multi-byte
 characters. This patch changes it to use pg_mbcliplen, following
 the namein function.

Good catch, but poking around I note that bpchar_name has the same
disease.  Will fix, thanks for the report!
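
(Presumably the fix for bpchar_name clips the same way as the text_name
patch; a sketch, not the actual commit:

	/* truncate at a character boundary, not a raw byte count */
	len = pg_mbcliplen(VARDATA_ANY(s), len, NAMEDATALEN - 1);

applied after bpchar_name has computed the blank-stripped length.)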

regards, tom lane


Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-25 Thread Kohei KaiGai
2012/5/24 Robert Haas robertmh...@gmail.com:
 On Thu, May 24, 2012 at 6:11 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 Perhaps when we see that RLS
 applies, we should replace the reference to the original table with a
 subquery RTE that has the security_barrier flag set - essentially
 treating a table with RLS as if it were a security view.

I'm coming to think it is a better approach than tracking the origin of
each qualifier. One problem is handling UPDATE and DELETE statements.

 It may be possible to rewrite the update / delete query as follows:

 From:
  UPDATE tbl SET X = X + 1 WHERE f_leak(Y)
 To:
  UPDATE tbl SET X = X + 1 WHERE ctid = (
      SELECT * FROM (
          SELECT ctid FROM tbl WHERE uname = getpgusername()   <== (*) should have security-barrier
      ) AS tbl_subqry WHERE f_leak(Y)
  );

 Expanded sub-queries will have the security-barrier flag, so it enforces
 uname = getpgusername() being checked earlier than f_leak(Y).
 We may need to measure the performance impact of this rewrite.

 The problem with this is that it introduces an extra instance of tbl
 into the query - there are now two rather than one.  UPDATE .. FROM is
 supposed to be a way to avoid this, but it's insufficiently general to
 handle all the cases (e.g. UPDATE a LEFT JOIN b can't be written using
 the existing syntax).  Anyway we want to avoid inserting self-joins
 for performance reasons if at all possible.  It should be easy to do
 that in the case of SELECT; UPDATE and DELETE may need a bit more
 work.

I'll try to investigate a way to solve the matter without scanning twice.
Right now, I have no reasonable ideas. Please give us suggestions
if you have any...

 I think this situation is similar to the case where we reference a view
 without privileges on the underlying tables. If Bob sets up a view with
 some tricky function, it allows Bob to reference the credentials
 of users who reference the view.
 More or less, it might be a problem whenever a user invokes
 a user-defined function declared by someone else.
 (Thus, sepgsql policy does not allow users to invoke a function
 declared by another one in a different domain, without DBA's checks.)

 This is true, but there are still some new threat models.  For
 example, currently, pg_dump isn't going to run any user-defined code
 just because you do SELECT * FROM table, but that will change with
 this patch.  Note that pg_dump need not actually select from views,
 only tables.

 I think it is a good idea not to apply RLS when the current user has
 superuser privilege, from the perspective of security model consistency,
 but it is inconsistent to check privileges on underlying tables.

 Seems like a somewhat random wart, if it's just an exception for
 superusers.  I think we need to do better than that.  For example, at
 my last company, sales reps A and B were permitted to see all
 customers of the company, but sales reps C, D, E, F, G, H, I, and J
 were permitted to see only their own accounts.  Those sorts of
 policies need to be easy to implement.

Probably, if a sales_rep column records the responsible rep, that
security policy is able to be described as:
  (my_sales_rep() in ('A', 'B') OR sales_rep = my_sales_rep())

Indeed, the design of checking the underlying table seems to me like
column-level privileges versus table-level privileges, since it
is checked only when the user does not have the required privileges
on the whole table.
However, I have no idea how to modify ExecCheckRTEPerms()
with regard to RLS. If we assume RLS is applied when the user has
no privileges on a table, the current ExecCheckRTEPerms()
always raises an error for unprivileged users, prior to
execution of the query.
Isn't it preferable behavior to allow unprivileged users to
reference a table (or columns) when it has an RLS policy?

I think table- and column-level privileges should be checked
individually, in addition to the row-level security policy.

 Another idea is to set things up so that the RLS policy function isn't
 applied to each row directly; instead, it's invoked once per query and
 *returns* a WHERE clause.  This would be a lot more powerful than the
 proposed design, because now the table owner can write a function that
 imposes quals on some people but not others, which seems very useful.

 Sorry, I don't favor this idea. Even if the table owner sets up a function
 to generate additional qualifiers, there is still no guarantee the qualifiers
 are invoked prior to user-given ones.
 It seems to me this approach will have the same problem...

 It's not intended to solve the qual-ordering problem, just to allow
 additional policy flexibility.

At the beginning, I thought it would take complex code to parse
a WHERE clause provided as a security policy, which is the reason
why I was inclined to give a function instead of a clause.
But I noticed we already have similar code at CreateTrigger()
to handle it.
Does that give enough policy flexibility?

 It's not clear to me that there is any need for built-in server
 functionality here.  If the table 

Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-25 Thread Kohei KaiGai
2012/5/24 Robert Haas robertmh...@gmail.com:
 On Thu, May 24, 2012 at 12:00 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 Another issue, BTW, is FOREIGN KEY constraints. Should you be allowed to
 create references to rows which are invisible to you, or should FOREIGN KEY
 constraints be exempt from security policies? I'd say they shouldn't be, 
 i.e.
 the policy WHERE clause should be added to constraint checking queries like
 usual. But maybe I'm missing some reason why that'd be undesirable…

 I agree. The row level security policy should not be applied during FK checks
 (or other internal stuff; to be harmless). In the previous discussion, it was
 raised that iteration of FK/PK probing enables a malicious user to infer the
 existence of an invisible tuple and its key value, although they cannot see
 the actual values. It is a well documented limitation; thus, users should not
 use row-level security (or should not use natural keys) if they cannot accept
 this limitation.

 You say I agree, but it seems to me that you and Florian are in fact
 taking opposite positions.

Sorry, I misread what he described.

 FWIW, I'm inclined to think that you should NOT be able to create a
 row that references an invisible row.  You might end up with that
 situation anyway, because we don't know what the semantics of the
 security policy are: rows might become visible or invisible after the
 fact, and we can't police that.  But I think that if you take the
 opposite position that the select queries inside fkey triggers ought
 to be exempt from security policy, then you need to build some new
 mechanism to make that happen, which seems like extra work for no
 benefit.

I think it is fair enough for RI_FKey_check_ins and RI_FKey_check_upd;
applying the RLS policy inside these trigger functions will inhibit creating
a row that references an invisible row.

However, it should not be applied to triggers set on PK tables,
because that would allow modifying or deleting a primary key that is
referenced by a foreign key invisible from the operator's viewpoint.
I think it makes sense to have exceptional cases, to keep
foreign-key constraints intact at the baseline.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-25 Thread Kohei KaiGai
2012/5/25 Florian Pflug f...@phlo.org:
 On May24, 2012, at 19:25 , Robert Haas wrote:
 FWIW, I'm inclined to think that you should NOT be able to create a
 row that references an invisible row.  You might end up with that
 situation anyway, because we don't know what the semantics of the
 security policy are: rows might become visible or invisible after the
 fact, and we can't police that.

 Right. I just realized, however, that there's another case which wasn't
 considered yet, which is how to handle the initial check during
 ALTER TABEL ADD CONSTRAINT. I'm thinking that it's fine to only consider
 visible rows in the parent table there too, but we should be checking
 all rows in the child table. The easiest way would be to restrict
 ALTER TABLE ADD CONSTRAINT to the table owner for tables with RLS
 (it seems that currently the REFERENCES privilege is sufficient), and
 make the table owner exempt from RLS on that table. The latter means you'd
 need at least two roles to use RLS, but anyone security-conscious enough
 to use RLS will probably not use the same role for DDL and DML operations
 anyway...

I think the RLS policy should behave like a view with qualifiers.
That means database constraints have to hold from the viewpoint of
any client, so no orphan foreign keys or rows violating a check
constraint should exist.

When a user inserts a row with a foreign key, that is an exceptional
case. Even if he cannot insert a foreign key that references an invisible
primary key, it never breaks database constraints from the viewpoint
of other folks.

My standpoint treats database constraints as a first-class citizen
that should always be enforced at the lowest level, even though a part
of the results would be filtered out due to RLS.
Isn't that a simple enough criterion?

 But I think that if you take the
 opposite position that the select queries inside fkey triggers ought
 to be exempt from security policy, then you need to build some new
 mechanism to make that happen, which seems like extra work for no
 benefit.

 Hm, interesting angle. Continuing this thought, without any extra work,
 UNIQUE and EXCLUSION constraints *will* be enforced regardless of row
 visibility, because their implementation isn't SPI-based but instead
 detects conflicts while inserting tuples into the index.

 For being so obviously inconsistent in its treatment of UNIQUE and
 EXCLUSION constraints vs. FK constraints, this feels surprisingly
 right. So, to prevent design by accident, here's an attempt to explain
 that divergence.

 For UNIQUE and EXCLUSION constraints, the most conservative assumption
 possible is that all rows are visible, since that leads to the most
 rejections. With that assumption, no matter what the actual policy is,
 the data returned by a query will always satisfy the constraint. Plus,
 the constraint is still sensible because it neither rejects nor allows
 all rows. So that conservative assumption is the one we make, i.e. we
 ignore RLS visibility when checking those kinds of constraints.

 For FK constraints, OTOH, the most conservative assumption is that
 no rows are visible. But that is meaningless, since it will simply reject
 all possible rows. Having thus no chance of enforcing the constraint
 ourselves under all possible policies, the best we can do is to at least
 make it possible for the constraint to work correctly for as many policies
 as possible. Now, if we go with KaiGai's suggestion of skipping RLS
 while checking FK constraints, the only policy that the constraint will
 work correctly for is one which doesn't actually hide any parent rows.
 Whereas if we apply RLS checks while checking FK constraints, all policies
 which behave consistently for parent and child rows (i.e. don't hide the
 former but show the latter) will work correctly. We thus go with the
 second option, since the class of working policies is larger.

-- 
KaiGai Kohei kai...@kaigai.gr.jp


Re: [HACKERS] heap metapages

2012-05-25 Thread Jim Nasby

On 5/22/12 12:09 PM, Simon Riggs wrote:

On 22 May 2012 13:52, Robert Haasrobertmh...@gmail.com  wrote:


It seems pretty clear to me that making pg_upgrade responsible for
emptying block zero is a non-starter.  But I don't think that's a
reason to throw out the design; I think it's a problem we can work
around.


I like your design better as well *if* you can explain how we can get
to it. My proposal was a practical alternative that would allow the
idea to proceed.


It occurred to me that having a metapage with information useful to recovery 
operations in *every segment* would be useful; it certainly seems worth the 
extra block. It then occurred to me that we've basically been stuck with 2 
places to store relation data; either at the relation level in pg_class or on 
each page. Sometimes neither one is a good fit.

ISTM that a lot of problems we've faced in the past few years are because 
there's not a good abstraction between a (mostly) linear tuplespace and the 
physical storage that goes underneath it.

- pg_upgrade progress is blocked because we can't deal with a new page that's <> BLKSZ
- There's no good way to deal with table (or worse, index) bloat
- There's no good way to add the concept of a heap metapage
- Forks are being used to store data that might not belong there only because 
there's no other choice (visibility info)

Would it make sense to take a step back and think about ways to abstract 
between logical tuplespace and physical storage? What if 1GB segments had their 
own metadata? Or groups of segments? Could certain operations that currently 
have to rewrite an entire table be changed so that they slowly moved pages from 
one group of segments to another, with a means of marking old pages as having 
been moved?

Einstein said that problems cannot be solved by the same level of thinking that 
created them. Perhaps we're at the point where we need to take a step back from our 
current storage organization and look for a bigger picture?
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Greg Sabino Mullane
 Yeah, this is proof that what it was doing is the same as what we saw in
 Jeff's backtrace, ie loading up the system catalog relcache entries the
 hard way via seqscans on the core catalogs.  So the question to be
 answered is why that's suddenly a big performance bottleneck.  It's not
 a cheap operation of course (that's why we cache the results ;-)) but
 it shouldn't take minutes either.  And, because they are seqscans, it
 doesn't seem like messed-up indexes should matter.

FWIW, this appeared to be an all-or-nothing event: either every new backend 
was suffering through this, or none were. They all seemed to clear up 
at the same time as well.

 The theory I have in mind about Jeff's case is that it was basically an
 I/O storm, but it's not clear whether the same explanation works for
 your case.  There may be some other contributing factor that we haven't
 identified yet.

Let me know if you think of anything particular I can test while it is 
happening again. I'll try to arrange a (netapp) snapshot the next time 
it happens as well (this system is too busy and too large to do anything 
else).

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-25 Thread Sergey Koposov

On Fri, 25 May 2012, Merlin Moncure wrote:

can you hack this in heapam.c and see if it helps?
line 131-ish:

if (!RelationUsesLocalBuffers(scan->rs_rd) &&
    scan->rs_nblocks > NBuffers / 4)
becomes
if (!RelationUsesLocalBuffers(scan->rs_rd))

(also you can set the partition count back).


The first few runs of my test with the suggested change made the 
multithreaded queries even slower by a factor of two.

E.g. 44 seconds for multithreaded vs ~7 sec single threaded.
(with the default 9.2beta I get ~14-22 sec for a multithreaded run)
But now after another few reruns of my test, I see again this variability 
in times for multithreaded runs. It went down to 18-22 seconds. Then again 
to 44 sec. So overall the change you suggested either doesn't improve 
the situation or makes it worse.


Regards,
S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/


Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes:
 Yeah, this is proof that what it was doing is the same as what we saw in
 Jeff's backtrace, ie loading up the system catalog relcache entries the
 hard way via seqscans on the core catalogs.  So the question to be
 answered is why that's suddenly a big performance bottleneck.  It's not
 a cheap operation of course (that's why we cache the results ;-)) but
 it shouldn't take minutes either.  And, because they are seqscans, it
 doesn't seem like messed-up indexes should matter.

 FWIW, this appeared to be an all-or-nothing event: either every new backend 
 was suffering through this, or none were. They all seemed to clear up 
 at the same time as well.

Mostly not surprising.  They'd definitely all hit the missing init file
at the same time, so the stalling would start consistently for all.  And
once any one process successfully created a new file, subsequent incoming
sessions wouldn't stall.  However, the remaining processes trying to
compute new init files would still have to complete the process, so I'd
expect there to be a diminishing effect --- the ones that were stalling
shouldn't all release exactly together.  Unless there is some additional
effect that's syncing them all.  (I wonder for instance if the syncscan
logic is kicking in here.)

One interesting question is why there's a thundering herd of new
arrivals in the first place.  IIRC you said you were using a connection
pooler.  I wonder if it has a bug^H^H^Hdesign infelicity that makes it
drop and reopen all its connections simultaneously.

 Let me know if you think of anything particular I can test while it is 
 happening again.

I just noticed something that might explain the persistency of the
effect as observed by Jeff.  The code that seqscans pg_attribute (in
RelationBuildTupleDesc) knows how many rows it's expecting to find for
a given catalog, and it falls out of the seqscan loop as soon as it's
gotten them all.  Now, the rows belonging to core system catalogs are
generally right near the front of pg_attribute, being the oldest rows in
that catalog, which means that generally this results in not having to
seqscan very far even if pg_attribute is large.  It strikes me though
that there are at least two ways that nice property could get broken,
resulting in much seqscan work if pg_attribute is large:

1. Somebody decides to update one of those rows, and it gets dropped in
some remote region of the table.  The only really plausible reason for
this is deciding to fool with the column-specific stats target
(attstattarget) of a system catalog.  Does that sound like something
either of you might have done?  You could check it by looking at the
ctid columns of the pg_attribute rows for system catalogs, and seeing
if any have large block numbers.
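
For example, one way to run that check (the catalog list is just
illustrative):

SELECT attrelid::regclass, attname, ctid
FROM pg_attribute
WHERE attrelid IN ('pg_class'::regclass, 'pg_attribute'::regclass,
                   'pg_proc'::regclass, 'pg_type'::regclass);

Rows whose ctid shows a large block number (the first component) would be
the relocated ones.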

2. If the syncscan logic were to kick in and cause some backend to pick
up its seqscan of pg_attribute from a point beyond where some of the
target rows are, that backend would have to scan all of pg_attribute,
circling back around to the start, before it would find all the rows it
seeks.  And most likely this effect would lather-rinse-repeat for each
catalog it's seeking the pg_attribute entries for.  Not only does this
mean a much-worse-than-normal startup time for that backend, but any
other ones that arrive while the synchronized scan is in progress would
be caught in the undertow and likewise spend a long time to get their
results.

So point 2 is suddenly looking like a really attractive theory for
explaining what happened.  As we were just remarking in an adjacent
thread, the threshold for this to start happening would be for
pg_attribute to get larger than one-fourth of shared_buffers; the
syncscan logic doesn't kick in for relations smaller than that.  IIRC
this is close enough to the numbers Jeff mentioned to make it plausible
that it happened to him, and plausible that his new installation has
pg_attribute just enough smaller to avoid the scenario.  Not sure about
Greg's case, but he should be able to tell us the size of pg_attribute
and his shared_buffers setting ...
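
A quick way to check whether an installation is over that threshold:

SELECT pg_size_pretty(pg_relation_size('pg_attribute')) AS pg_attribute_size,
       current_setting('shared_buffers') AS shared_buffers;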

If this is the explanation, then it's easy enough to deal with point 2
--- just disable syncscan for these searches.  I don't see an easy
answer for problems of ilk #1, other than "don't do that" in a database
that's been around for awhile.
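
Assuming the fix goes through the existing scan-strategy API, the relcache
seqscan would just opt out of syncscan, something like:

	/* sketch: always start the pg_attribute scan at block zero so the
	 * early-exit optimization stays effective */
	scan = heap_beginscan_strat(attrel, SnapshotNow, 1, &skey,
								true,	/* allow buffer strategy */
								false);	/* disallow syncscan */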

Another idea we might consider is to prevent the thundering herd effect
by not having all the incoming backends try to update pg_internal.init
independently.  Instead let the first one take a lock while it's doing
that, and the rest just queue up on that lock.  They'd be stalled
anyway, and they are not helping matters by duplicating the first one's
effort.  This is a rather more invasive change though.

regards, tom lane


Re: [HACKERS] heap metapages

2012-05-25 Thread Robert Haas
On Fri, May 25, 2012 at 5:57 PM, Jim Nasby j...@nasby.net wrote:
 It occurred to me that having a metapage with information useful to recovery
 operations in *every segment* would be useful; it certainly seems worth the
 extra block. It then occurred to me that we've basically been stuck with 2
 places to store relation data; either at the relation level in pg_class or
 on each page. Sometimes neither one is a good fit.

AFAICS, having metadata in every segment is mostly only helpful for
recovering from the situation where files have become disassociated
from their filenames, i.e. database -> lost+found.  From the view
point of virtually the entire server, the block number space is just a
continuous sequence that starts at 0 and counts up forever (or,
anyway, until 2^32-1).  While it wouldn't be impossible to allow that
knowledge to percolate up to other parts of the server, it would
basically involve drilling a fairly arbitrary hole through an
abstraction boundary that has been intact for a very long time, and
it's not clear that there's anything magical about 1GB.
Notwithstanding the foregoing...

 ISTM that a lot of problems we've faced in the past few years are because
 there's not a good abstraction between a (mostly) linear tuplespace and the
 physical storage that goes underneath it.

...I agree with this.  I'm not sure exactly what the replacement model
would look like, but it's definitely worth some thought - e.g. perhaps
there ought to be another mapping layer between logical block numbers
and files on disk, so that we can effectively delete blocks out of the
middle of a relation without requiring any special OS support, and so
that we can multiplex many small relation forks onto a single physical
file to minimize inode consumption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


[HACKERS] No, pg_size_pretty(numeric) was not such a hot idea

2012-05-25 Thread Tom Lane
In 9.1:

regression=# select pg_size_pretty(8*1024*1024);
 pg_size_pretty 
----------------
 8192 kB
(1 row)

In HEAD:

regression=# select pg_size_pretty(8*1024*1024);
ERROR:  function pg_size_pretty(integer) is not unique
LINE 1: select pg_size_pretty(8*1024*1024);
   ^
HINT:  Could not choose a best candidate function. You might need to add 
explicit type casts.

The argument for adding pg_size_pretty(numeric) was pretty darn thin in
the first place, IMHO; it does not seem to me that it justified this
loss of usability.

regards, tom lane


Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Jeff Frost

On May 25, 2012, at 4:02 PM, Tom Lane wrote:

 Greg Sabino Mullane g...@endpoint.com writes:
 Yeah, this is proof that what it was doing is the same as what we saw in
 Jeff's backtrace, ie loading up the system catalog relcache entries the
 hard way via seqscans on the core catalogs.  So the question to be
 answered is why that's suddenly a big performance bottleneck.  It's not
 a cheap operation of course (that's why we cache the results ;-)) but
 it shouldn't take minutes either.  And, because they are seqscans, it
 doesn't seem like messed-up indexes should matter.
 
 FWIW, this appeared to be an all-or-nothing event: either every new backend 
 was suffering through this, or none were. They all seemed to clear up 
 at the same time as well.
 
 Mostly not surprising.  They'd definitely all hit the missing init file
 at the same time, so the stalling would start consistently for all.  And
 once any one process successfully created a new file, subsequent incoming
 sessions wouldn't stall.  However, the remaining processes trying to
 compute new init files would still have to complete the process, so I'd
 expect there to be a diminishing effect --- the ones that were stalling
 shouldn't all release exactly together.  Unless there is some additional
 effect that's syncing them all.  (I wonder for instance if the syncscan
 logic is kicking in here.)

In our customer's case, the size of pg_attribute was a little less than 1/4 of 
shared_buffers, so might not be the syncscan?

BTW, in our case, I decided to take the system down to single user mode and 
reindex these.  When the indexes were disabled, I immediately experienced the 
slow startup, so it certainly seems like an issue with seq scanning these.

I'll see if i can reproduce that behavior by starting up with system indexes 
disabled.  This probably won't happen until tuesday when we get that data 
directory moved to a test server.

In our customer's case, it would happen for a while, then stop happening for 
some time... presumably this was after the caching; then it would start up 
again... presumably after something invalidated the cache.

Switching from the master to the streaming replica made the situation better, 
but not go away.

Then a full initdb solved the problem. I bet a vacuum full of pg_attribute 
would've done the trick though.


 

 
 1. Somebody decides to update one of those rows, and it gets dropped in
 some remote region of the table.  The only really plausible reason for
 this is deciding to fool with the column-specific stats target
 (attstattarget) of a system catalog.  Does that sound like something
 either of you might have done?  You could check it by looking at the
 ctid columns of the pg_attribute rows for system catalogs, and seeing
 if any have large block numbers.
 


Definitely wasn't done by me and I'm pretty sure the customer wouldn't have 
done that either.




---
Jeff Frost j...@pgexperts.com
CTO, PostgreSQL Experts, Inc.
Phone: 1-888-PG-EXPRT x506
FAX: 415-762-5122
http://www.pgexperts.com/ 








Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Tom Lane
Jeff Frost j...@pgexperts.com writes:
 In our customer's case, the size of pg_attribute was a little less than 1/4 
 of shared_buffers, so might not be the syncscan?

Could you go back and double check that?  If the shared_buffers setting
were 7GB not 8GB, that would fall right between the pg_attribute sizes
you posted before.  I'm getting somewhat convinced that this is the
right answer, because I've been able to reproduce an unexpectedly long
stall with multiple clients connecting simultaneously to an
init-file-less database whose pg_attribute is large enough to trigger
syncscans.  The particular case I'm testing has pg_attribute of about
1GB (in a machine with just 4GB RAM, so I'm not going to push it up
much further).  If I just remove the init file and connect with psql,
there's about a 1-second startup delay, which is bad enough; but if
I throw 50 concurrent connections at it with a hacked-up version of
pgbench, it takes about 45 seconds for all of them to get through
startup.  (You need to hack pgbench to suppress the single
initialization connection it normally likes to make, else the test
degenerates to the one-incoming-connection case.)

I think that a big chunk of this is coming from the syncscan logic
defeating the early-exit optimization in RelationBuildTupleDesc;
but it still seems like there's another inefficiency somewhere.
Maybe our syncscan logic just really sucks given enough backends
trying to piggyback on the same syncscan.  Now that I can reproduce it,
I'll take a closer look at that.

In the meantime, it looks like a trivial workaround is to disable
synchronize_seqscans via postgresql.conf.
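
That is:

	synchronize_seqscans = off

in postgresql.conf, or SET synchronize_seqscans = off; for a single session.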

regards, tom lane


Re: [HACKERS] Per-Database Roles

2012-05-25 Thread Robert Haas
On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, May 22, 2012 at 10:19:12AM -0400, Robert Haas wrote:
 In retrospect, I think the idea of shared catalogs was probably a bad
 idea.  I think we should have made roles and tablespaces database
 objects rather than shared objects, and come up with some ad-hoc
 method of representing the set of available databases.  But that
 decision seems to have been made sometime pre-1996, so the thought of
 changing it now is pretty painful, but I can dream...

 Yes, pre-1996.  I think the fact that authentication/user names appear
 in pg_hba.conf really locked the user name idea into global objects, and
 we have never really been able to make a dent in that.

Eh?  Why would the presence of usernames in pg_hba.conf mean that they
have to be global objects?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Per-Database Roles

2012-05-25 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian br...@momjian.us wrote:
  Yes, pre-1996.  I think the fact that authentication/user names appear
  in pg_hba.conf really locked the user name idea into global objects, and
  we have never really been able to make a dent in that.
 
 Eh?  Why would the presence of usernames in pg_hba.conf mean that they
 have to be global objects?

I haven't had a chance (yet) to look, but perhaps the current code
attempts to validate the role before figuring out what database is being
requested?  We'd have to essentially invert that, of course, for this...
One thing I was wondering about is if we're going to have an issue
supporting things like "tell me what databases exist" (psql -l), which
connects to the 'postgres' database by default, for local-only roles.  I'm not
sure that I actually care, to be honest, but it's something to consider.
I don't think we should require users to create every local role also in
postgres, nor do I feel that we should allow connections to postgres by
any role, nor do I want to break tools which use 'postgres' to basically
get access to shared catalogs- but I don't see an immediate or easy
solution..

Thanks,

Stephen



Re: [HACKERS] pg_upgrade libraries check

2012-05-25 Thread Bruce Momjian
On Fri, May 25, 2012 at 10:20:29AM -0400, Andrew Dunstan wrote:
 pg_upgrade is a little over-keen about checking for shared libraries
 that back functions. In particular, it checks for libraries that
 support functions created in pg_catalog, even if pg_dump doesn't
 export the function.

 The attached patch mimics the filter that pg_dump uses for functions
 so that only the relevant libraries are checked.
 
 This would remove the need for a particularly ugly hack in making
 the 9.1 backport of JSON binary upgradeable.

Andrew is right that pg_upgrade is overly restrictive in checking _any_
shared object file referenced in pg_proc.  I never expected that
pg_catalog would have such references, but in Andrew's case it does, and
pg_dump doesn't dump them, so I guess pg_upgrade shouldn't check them
either.

In some sense this is a hack for the JSON type, but it also gives users
a way to create shared object references in old clusters that are _not_
checked by pg_upgrade, and not migrated to the new server, so I suppose
it is fine.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] Per-Database Roles

2012-05-25 Thread Bruce Momjian
On Fri, May 25, 2012 at 10:34:54PM -0400, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  On Thu, May 24, 2012 at 6:21 PM, Bruce Momjian br...@momjian.us wrote:
   Yes, pre-1996.  I think the fact that authentication/user names appear
   in pg_hba.conf really locked the user name idea into global objects, and
   we have never really been able to make a dent in that.
  
  Eh?  Why would the presence of usernames in pg_hba.conf mean that they
  have to be global objects?
 
 I haven't had a chance (yet) to look, but perhaps the current code
 attempts to validate the role before figuring out what database is being
 requested?  We'd have to essentially invert that, of course, for this...
 One thing I was wondering about is if we're going to have an issue
 supporting things like "tell me what databases exist" (psql -l), which
 connects to the 'postgres' database by default, for local-only roles.  I'm not
 sure that I actually care, to be honest, but it's something to consider.
 I don't think we should require users to create every local role also in
 postgres, nor do I feel that we should allow connections to postgres by
 any role, nor do I want to break tools which use 'postgres' to basically
 get access to shared catalogs- but I don't see an immediate or easy
 solution..

Yes.  In a simple case, you have a username, you want to validate it
against LDAP or kerberos --- how do you partition the external
authentication tool based on database name?  Seems like an obvious
problem to me.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Jeff Frost

On May 25, 2012, at 7:12 PM, Tom Lane wrote:

 Jeff Frost j...@pgexperts.com writes:
 In our customer's case, the size of pg_attribute was a little less than 1/4 
 of shared_buffers, so might not be the syncscan?
 
 Could you go back and double check that?  If the shared_buffers setting
 were 7GB not 8GB,

It definitely started happening with 8GB of shared_buffers.

We actually tried reducing shared_buffers to 2GB to see if we were writing out 
too many dirty buffers at checkpoint time or something, but that had no effect, 
so we put it back to 8GB.


Re: [HACKERS] Per-Database Roles

2012-05-25 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 Eh?  Why would the presence of usernames in pg_hba.conf mean that they
 have to be global objects?

 I havn't had a chance (yet) to look, but perhaps the current code
 attempts to validate the role before figuring out what database is being
 requested?  We'd have to essentially invert that, of course, for this..

Even more to the point, what do you do when the database column is
"all", or a list of more than one database name?

It's possible that we could define this away by saying that only
globally known usernames can be listed in pg_hba.conf, but I think
we'll still have implementation problems with doing authentication
for per-database usernames.

regards, tom lane


Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea

2012-05-25 Thread Fujii Masao
On Sat, May 26, 2012 at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In 9.1:

 regression=# select pg_size_pretty(8*1024*1024);
  pg_size_pretty
 ----------------
  8192 kB
 (1 row)

 In HEAD:

 regression=# select pg_size_pretty(8*1024*1024);
 ERROR:  function pg_size_pretty(integer) is not unique
 LINE 1: select pg_size_pretty(8*1024*1024);
               ^
 HINT:  Could not choose a best candidate function. You might need to add 
 explicit type casts.

 The argument for adding pg_size_pretty(numeric) was pretty darn thin in
 the first place, IMHO; it does not seem to me that it justified this
 loss of usability.

Ouch! But removing pg_size_pretty(numeric) causes another usability
issue, e.g., pg_size_pretty(pg_xlog_location_diff(...)) fails. So how about
removing pg_size_pretty(bigint) to resolve those two issues?
I guess pg_size_pretty(numeric) is a bit slower than the bigint version,
but I don't think such a small slowdown of pg_size_pretty() would matter
in practice. No?
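
(In the meantime, an explicit cast works around the ambiguity in HEAD,
e.g. select pg_size_pretty((8*1024*1024)::bigint);)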

Regards,

-- 
Fujii Masao
