[HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Benedikt Grundmann
Hello list,

I have a question about how to benchmark hardware to determine
the appropriate ratio of seq_page_cost to random_page_cost.

Emails in this mailing list's archive seem to indicate that
1.0 vs 3.0 - 4.0 are appropriate values on modern hardware.

That surprised me a bit, as I had thought that on actual
hard drives (ignoring SSDs) random_page_cost would be higher.
I guess the number also tries to reflect caching of the
relevant pages in memory, and on modern hardware you have
more of that?

Anyhow it would be great to have a scientific way of
setting those numbers.
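
(For illustration, one crude way to get a first-order number is to time
sequential vs random 8kB reads yourself.  The sketch below is hypothetical,
uses a placeholder file path, and is only meaningful if the file is much
larger than RAM or the OS cache is dropped first.)

/* Hypothetical sketch, not part of PostgreSQL: time NREADS sequential vs
 * random 8kB reads from one large file to get a rough seq/random ratio. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLKSZ  8192
#define NREADS 10000

static double
seconds(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int
main(void)
{
    int     fd = open("/path/to/large/file", O_RDONLY);   /* placeholder */
    off_t   nblocks;
    char    buf[BLKSZ];
    struct timespec t0, t1, t2;
    long    i;

    if (fd < 0)
        return 1;
    nblocks = lseek(fd, 0, SEEK_END) / BLKSZ;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < NREADS; i++)                 /* sequential reads */
        pread(fd, buf, BLKSZ, (off_t) i * BLKSZ);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    for (i = 0; i < NREADS; i++)                 /* random reads */
        pread(fd, buf, BLKSZ, (off_t) (random() % nblocks) * BLKSZ);
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("seq %.3fs  random %.3fs  ratio %.1f\n",
           seconds(t0, t1), seconds(t1, t2),
           seconds(t1, t2) / seconds(t0, t1));
    return 0;
}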

Background:

We have recently upgraded two of our biggest postgres databases
to new hardware, along with a minor version bump (8.4.5 - 8.4.9).

We are experiencing a big performance regression in some queries.
In those cases the planner seems to choose a nested loop index
scan instead of hashing the index once and then joining.

The new hardware was optimized for seq scans and does those
very fast.  

We are not sure whether the database used to choose differently
before the move and the new hardware is simply performing
worse for random seeks, or whether the planner is
now making different choices.

As a counter measure we are experimenting with 
enable_nestloop = off
random_page_cost = 20 (instead of the previous 4).

It is worth noting that for many small tables the nestloop
is indeed marginally faster (in the "doesn't really matter,
because both cases are fast enough" sense).  But the regression
for the big tables (big in the sense that the index just fits into
memory, but in practice might not, because there are other frequently
accessed big tables) is a show stopper.

Some of those tables we have also recently (as part
of the move) clustered for the first time in ages, and it was
speculated that that might have changed statistics (such
as correlation) and increased the attractiveness of the
index scan to the planner.

Another thing that I have thought might provide
some enlightenment would be an
explain log select ...

that would show all the sub plans considered and why they
were discarded, or something approximating this.

Thanks in advance for any reply, and sorry that this email
turned out to be a rather long stream-of-consciousness dump.

Bene

-- relevant parts of our postgresql.conf ---

shared_buffers = 12GB   # min 128kB
# (change requires restart)
temp_buffers = 512MB# min 800kB
#max_prepared_transactions = 0  # zero disables the feature
# (change requires restart)
# Note:  Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 192MB# min 64kB
maintenance_work_mem = 1GB  # min 1MB
#max_stack_depth = 2MB  # min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
# (change requires restart)
#shared_preload_libraries = ''  # (change requires restart)

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 100ms   # 0-100 milliseconds
vacuum_cost_page_hit = 1    # 0-10000 credits
vacuum_cost_page_miss = 10  # 0-10000 credits
vacuum_cost_page_dirty = 20 # 0-10000 credits
vacuum_cost_limit = 7500    # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100# 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers scanned/round

# - Asynchronous Behavior -

effective_io_concurrency = 40# 1-1000. 0 disables prefetching

# WRITE AHEAD LOG

fsync = on  # turns forced synchronization on or off
synchronous_commit = off# immediate fsync at commit
#wal_sync_method = fsync# the default is the first option 
# supported by the operating system:
#   open_datasync
#   fdatasync
#   fsync
#   fsync_writethrough
#   open_sync
full_page_writes = on   # recover from partial page writes
wal_buffers = 16MB  # min 32kB
# (change requires restart)
#wal_writer_delay = 200ms   # 1-10000 milliseconds

commit_delay = 1000 # range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000

# - Checkpoints -

checkpoint_segments = 128   # in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min  # range 30s-1h
checkpoint_completion_target = 0.9  # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 30s# 0 disables

# - Archiving -

archive_mode = off  # allows archiving to be done

Re: [HACKERS] WIP: Join push-down for foreign tables

2012-01-05 Thread Heikki Linnakangas

On 03.12.2011 01:05, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

Hmm, so you're saying that the FDW function needs to be able to return
multiple paths for a single joinrel. Fair enough, and that's not
specific to remote joins. Even a single-table foreign scan could be
implemented differently depending on whether you prefer fast-start or
cheapest total.


... or ordered vs unordered, etc.  Yeah, good point, we already got this
wrong with the PlanForeignScan API.  Good thing we didn't promise that
would be stable.


This discussion withered down here...

I think the advice to Shigeru-san is to work on the API. We didn't reach 
a consensus on what exactly it should look like, but at least you need 
to be able to return multiple paths for a single joinrel, and should 
look at fixing the PlanForeignScan API to allow that too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Florian Pflug
On Jan4, 2012, at 21:27 , Robert Haas wrote:
 I think the first thing we need to look at is increasing the number of
 CLOG buffers.

What became of the idea to treat the stable (i.e. earlier than the oldest
active xid) and the unstable (i.e. the rest) parts of the CLOG differently.

On 64-bit machines at least, we could simply mmap() the stable parts of the
CLOG into the backend address space, and access it without any locking at all.
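
(Purely as an illustration of that idea, a minimal sketch of a read-only
mmap() of one CLOG segment file follows; the segment path is hypothetical
and this is not actual backend code.)

/* Hypothetical sketch: map one stable CLOG segment read-only.  Because the
 * stable part is never rewritten, the mapping can be read without locks. */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static const char *
map_clog_segment(const char *path, size_t *len_out)    /* e.g. "pg_clog/0053" */
{
    int         fd = open(path, O_RDONLY);
    struct stat st;
    void       *p;

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping outlives the fd */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t) st.st_size;
    return (const char *) p;
}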

I believe that we could also compress the stable part by 50% if we use one
instead of two bits per txid. AFAIK, we need two bits because we

  a) Distinguish between transactions which were ABORTED and those which never
     completed (due to, e.g., a backend crash), and

  b) Mark transactions as SUBCOMMITTED to achieve atomic commits.

Neither of which is strictly necessary for the stable parts of the clog. Note that
we could still keep the uncompressed CLOG around for debugging purposes - the
additional compressed version would require only 2^32/8 bytes = 512 MB in the
worst case, which people who're serious about performance can very probably
spare.
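
(Again only a sketch of the idea, assuming the usual CLOG layout of four
2-bit status fields per byte and that status value 1 means committed; the
function name is made up, not something from clog.c.)

/* Sketch: squeeze one stable CLOG byte (4 x 2-bit statuses) into 4 commit
 * bits.  In the stable region anything not committed can be treated as
 * aborted, so one bit per xid is enough. */
#include <stdint.h>

static uint8_t
compress_clog_byte(uint8_t clog_byte)
{
    uint8_t out = 0;
    int     i;

    for (i = 0; i < 4; i++)
    {
        uint8_t status = (clog_byte >> (2 * i)) & 0x03;

        if (status == 0x01)             /* committed */
            out |= (uint8_t) (1 << i);
    }
    return out;
}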

The fly in the ointment are 32-bit machines, of course - but then, those could
still fall back to the current way of doing things.

best regards,
Florian Pflug




Re: [HACKERS] ALTER DOMAIN DROP CONSTRAINT doesn't catch errors

2012-01-05 Thread Robert Haas
On Thu, Dec 29, 2011 at 4:02 PM, Peter Eisentraut pete...@gmx.net wrote:
 Is there a secret reason why

 ALTER DOMAIN foo DROP CONSTRAINT nonexistent;

 doesn't report any error?

 If not, I think we should add one and also add the usual IF EXISTS
 option.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Heikki Linnakangas
My laptop ran out of battery and turned itself off while I was just 
starting up postmaster. After plugging in the charger and rebooting, I 
got the following error when I tried to restart PostgreSQL:


FATAL:  bogus data in lock file postmaster.pid: 

postmaster.pid file was present in the data directory, but had zero 
length. Looking at the way the file is created and written, that can 
happen if you crash after the file is created, but before it's 
written/fsync'd (my laptop might have write-cache enabled, which would 
make the window larger).


I was a bit surprised by that. That's probably not a big deal in 
practice, but I wonder if there was some easy way to avoid that. First I 
thought we could create the new postmaster.pid file with a temporary 
name and rename it in place, but rename(2) will merrily overwrite any 
existing file which is not what we want. We could use link(2), I guess.
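
(To make the link(2) idea concrete, here is a rough sketch; the temporary
file name is arbitrary, partial writes are ignored, and real pidfile
handling obviously needs more than this, e.g. stale-pid detection.)

/* Sketch of create-then-link: write and fsync the data under a temporary
 * name, then link(2) it into place.  link() fails with EEXIST if
 * postmaster.pid already exists, which is the "don't clobber" behaviour
 * rename(2) lacks, and a crash can no longer leave an empty pidfile. */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int
create_pidfile_atomically(const char *contents)
{
    const char *tmpname = "postmaster.pid.tmp";
    int         fd = open(tmpname, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (write(fd, contents, strlen(contents)) < 0 || fsync(fd) < 0)
    {
        close(fd);
        unlink(tmpname);
        return -1;
    }
    close(fd);

    if (link(tmpname, "postmaster.pid") < 0)
    {
        unlink(tmpname);
        return (errno == EEXIST) ? -2 : -1;     /* -2: pidfile already there */
    }
    unlink(tmpname);        /* drop the temporary name; the real one stays */
    return 0;
}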


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 My laptop ran out of battery and turned itself off while I was just starting
 up postmaster. After plugging in the charger and rebooting, I got the
 following error when I tried to restart PostgreSQL:

 FATAL:  bogus data in lock file postmaster.pid: 

 postmaster.pid file was present in the data directory, but had zero length.
 Looking at the way the file is created and written, that can happen if you
 crash after the file is created, but before it's written/fsync'd (my laptop
 might have write-cache enabled, which would make the window larger).

 I was a bit surprised by that. That's probably not a big deal in practice,
 but I wonder if there was some easy way to avoid that. First I thought we
 could create the new postmaster.pid file with a temporary name and rename it
 in place, but rename(2) will merrily overwrite any existing file which is
 not what we want. We could use link(2), I guess.

Is this really a problem big enough to spend even that much effort on?

Perhaps a special-case in the error message instead is enough?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2012 at 5:15 AM, Florian Pflug f...@phlo.org wrote:
 On Jan4, 2012, at 21:27 , Robert Haas wrote:
 I think the first thing we need to look at is increasing the number of
 CLOG buffers.

 What became of the idea to treat the stable (i.e. earlier than the oldest
 active xid) and the unstable (i.e. the rest) parts of the CLOG differently.


I'm curious -- anyone happen to have an idea how big the unstable CLOG
xid space is in the typical case?  What would be the main driver
of making it bigger?  What are the main tradeoffs in terms of trying
to keep the unstable area compact?

merlin



[HACKERS] optimizing repeated MVCC snapshots

2012-01-05 Thread Robert Haas
On Tue, Jan 3, 2012 at 2:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Another thought is that it should always be safe to reuse an old
 snapshot if no transactions have committed or aborted since it was
 taken

 Yeah, that might work better.  And it'd be a win for all MVCC snaps,
 not just the ones coming from promoted SnapshotNow ...

Here's a rough patch for that.  Some benchmark results from a 32-core
Itanium server are included below.  They look pretty good.  Out of the
dozen configurations I tested, all but one came out ahead with the
patch.  The loser was 80-clients on permanent tables, but I'm not too
worried about that since 80 clients on unlogged tables came out ahead.

This is not quite in a committable state yet, since it presumes atomic
64-bit reads and writes, and those aren't actually atomic everywhere.
What I'm thinking we can do is: on platforms where 8-byte reads and
writes are known to be atomic, we do as the patch does currently.  On
any platform where we don't know that to be the case, we can move the
test I added to GetSnapshotData inside the lock, which should still be
a small win at low concurrencies.  At high concurrencies it's a bit
iffy, because making GetSnapshotData's critical section shorter might
lead to lots of people trying to manipulate the ProcArrayLock spinlock
in very rapid succession.  Even if that turns out to be an issue, I'm
inclined to believe that anyone who has enough concurrency for that to
matter probably also has atomic 8-byte reads and writes, and so the
most that will be needed is an update to our notion of which platforms
have that capability.  If that turns out to be wrong, the other
obvious alternative is to not do the test at all unless it can be done
unlocked.

To support the above, I'm inclined to add a new file
src/include/atomic.h which optionally defines a macro called
ATOMIC_64BIT_OPS and macros atomic_read_uint64(r) and
atomic_write_uint64(l, r).  That way we can eventually support (a)
architectures where 64-bit operations aren't atomic at all, (b)
architectures where ordinary 64-bit operations are atomic
(atomic_read_uint64(r) -> r, and atomic_write_uint64(l, r) -> l = r),
and (c) architectures (like 32-bit x86) where ordinary 64-bit
operations aren't atomic but special instructions (cmpxchg8b) can be
used to get that behavior.
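
(To picture the proposal, here is one purely hypothetical shape such a
header could take; the platform test and the completionCount field used in
the usage comment are placeholders, not existing code.)

/* Hypothetical src/include/atomic.h sketch following the naming above,
 * covering case (b): platforms where aligned 64-bit loads/stores are atomic. */
#ifndef ATOMIC_H
#define ATOMIC_H

#include <stdint.h>

#if defined(__x86_64__) || defined(__aarch64__)     /* placeholder test */
#define ATOMIC_64BIT_OPS
#define atomic_read_uint64(r)       (r)             /* plain load is atomic */
#define atomic_write_uint64(l, r)   ((l) = (r))     /* plain store is atomic */
#endif

/*
 * A caller such as GetSnapshotData could then do roughly:
 *
 *   #ifdef ATOMIC_64BIT_OPS
 *       if (atomic_read_uint64(shared->completionCount) == snap->completionCount)
 *           return snap;    (reuse the old snapshot without taking the lock)
 *   #endif
 *
 * and fall back to performing the same comparison under ProcArrayLock on
 * platforms without ATOMIC_64BIT_OPS.
 */

#endif   /* ATOMIC_H */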

m = master, s = with patch.  scale factor 100, median of three
5-minute test runs.  shared_buffers=8GB, checkpoint_segments=300,
checkpoint_timeout=30min, effective_cache_size=340GB,
wal_buffers=16MB, wal_writer_delay=20ms, listen_addresses='*',
synchronous_commit=off.  binary modified with chatr +pd L +pi L and
run with rtsched -s SCHED_NOAGE -p 178.

Permanent Tables


m01 tps = 912.865209 (including connections establishing)
s01 tps = 916.848536 (including connections establishing)
m08 tps = 6256.429549 (including connections establishing)
s08 tps = 6364.214425 (including connections establishing)
m16 tps = 10795.373683 (including connections establishing)
s16 tps = 11038.233359 (including connections establishing)
m24 tps = 13710.400042 (including connections establishing)
s24 tps = 13836.823580 (including connections establishing)
m32 tps = 14574.758456 (including connections establishing)
s32 tps = 15125.196227 (including connections establishing)
m80 tps = 12014.498814 (including connections establishing)
s80 tps = 11825.302643 (including connections establishing)

Unlogged Tables
===

m01 tps = 942.950926 (including connections establishing)
s01 tps = 953.618574 (including connections establishing)
m08 tps = 6492.238255 (including connections establishing)
s08 tps = 6537.197731 (including connections establishing)
m16 tps = 11363.708861 (including connections establishing)
s16 tps = 11561.193527 (including connections establishing)
m24 tps = 14656.659546 (including connections establishing)
s24 tps = 14977.226426 (including connections establishing)
m32 tps = 16310.814143 (including connections establishing)
s32 tps = 16644.921538 (including connections establishing)
m80 tps = 13422.438927 (including connections establishing)
s80 tps = 13780.256723 (including connections establishing)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


optimize-repeated-snapshots.patch
Description: Binary data



Re: [HACKERS] optimizing repeated MVCC snapshots

2012-01-05 Thread Florian Weimer
* Robert Haas:

 and (c) architectures (like 32-bit x86) where ordinary 64-bit
 operations aren't atomic but special instructions (cmpxchg8b) can be
 used to get that behavior.

FILD and FIST are atomic, too, and are supported by more
micro-architectures.

-- 
Florian Weimer fwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 6:15 AM, Florian Pflug f...@phlo.org wrote:
 On 64-bit machines at least, we could simply mmap() the stable parts of the
 CLOG into the backend address space, and access it without any locking at all.

True.  I think this could be done, but it would take some fairly
careful thought and testing because (1) we don't currently use mmap()
anywhere else in the backend AFAIK, so we might run into portability
issues (think: Windows) and perhaps unexpected failure modes (e.g.
mmap() fails because there are too many mappings already).  Also, it's
not completely guaranteed to be a win.  Sure, you save on locking, but
now you are doing an mmap() call in every backend instead of just one
read() into shared memory.  If concurrency isn't a problem that might
be more expensive on net.  Or maybe not, but I'm kind of inclined to
steer clear of this whole area at least for 9.2.  So far, the only
test result I have supports the notion that we run into trouble
when NUM_CPUS > NUM_CLOG_BUFFERS, and people have to wait before they can
even start their I/Os.  That can be fixed with a pretty modest
reengineering.  I'm sure there is a second-order effect from the cost
of repeated I/Os per se, which a backend-private cache of one form or
another might well help with, but it may not be very big.  Test
results are welcome, of course.

 I believe that we could also compress the stable part by 50% if we use one
 instead of two bits per txid. AFAIK, we need two bits because we

  a) Distinguish between transactions which were ABORTED and those which never
     completed (due to, e.g., a backend crash), and

  b) Mark transactions as SUBCOMMITTED to achieve atomic commits.

 Neither of which is strictly necessary for the stable parts of the clog.

Well, if we're going to do compression at all, I'm inclined to think
that we should compress by more than a factor of two.  Jim Nasby's
numbers (the worst we've seen so far) show that 18% of 1k blocks of
XIDs were all commits.  Presumably if we reduced the chunk size to,
say, 8 transactions, that percentage would go up, and even that would
be enough to get 16x compression rather than 2x.  Of course, then
keeping the uncompressed CLOG files becomes required rather than
optional, but that's OK.  What bothers me about compressing by only 2x
is that the act of compressing is not free.  You have to read all the
chunks and then write out new chunks, and those chunks then compete
with each other for cache.  Who is to say that we're not better off just
reading the uncompressed data at that point?  At least then we have
only one copy of it.

 Note that
 we could still keep the uncompressed CLOG around for debugging purposes - the
 additional compressed version would require only 2^32/8 bytes = 512 MB in the
 worst case, which people who're serious about performance can very probably
 spare.

I don't think it'd be even that much, because we only ever use half
the XID space at a time, and often probably much less: the default
value of vacuum_freeze_table_age is only 150 million transactions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 8:23 AM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 My laptop ran out of battery and turned itself off while I was just starting
 up postmaster. After plugging in the charger and rebooting, I got the
 following error when I tried to restart PostgreSQL:

 FATAL:  bogus data in lock file postmaster.pid: 

 postmaster.pid file was present in the data directory, but had zero length.
 Looking at the way the file is created and written, that can happen if you
 crash after the file is created, but before it's written/fsync'd (my laptop
 might have write-cache enabled, which would make the window larger).

 I was a bit surprised by that. That's probably not a big deal in practice,
 but I wonder if there was some easy way to avoid that. First I thought we
 could create the new postmaster.pid file with a temporary name and rename it
 in place, but rename(2) will merrily overwrite any existing file which is
 not what we want. We could use link(2), I guess.

 Is this really a problem big enough to spend even that much effort on?

 Perhaps a special-case in the error message instead is enough?

On a laptop it's not a big deal, but it sort of sucks if your
production database fails to start automatically after an unexpected
power outage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 5:04 AM, Benedikt Grundmann
bgrundm...@janestreet.com wrote:
 We are experiencing a big performance regression in some queries.
 In those cases the planner seems to choose a nested loop index
 scan instead of hashing the index once and then joining.

I think you probably need to post EXPLAIN ANALYZE output from the
actual queries to get useful advice, probably to pgsql-performance,
rather than here.

It's hard to believe that enable_nestloop=off is doing anything other
than masking whatever the real problem is, but it's hard to tell what
that problem is based on the information provided.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Benedikt Grundmann
On 05/01/12 10:04, Benedikt Grundmann wrote:
 
 As a counter measure we are experimenting with 
 enable_nestloop = off
 random_page_cost = 20 (instead of the previous 4).
 
For what it is worth we had to revert the enable_nestloop = off 
change.  It just moved the pain around by making other queries
perform much worse than before. 



Re: [HACKERS] optimizing repeated MVCC snapshots

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 9:01 AM, Florian Weimer fwei...@bfk.de wrote:
 * Robert Haas:
 and (c) architectures (like 32-bit x86) where ordinary 64-bit
 operations aren't atomic but special instructions (cmpxchg8b) can be
 used to get that behavior.

 FILD and FIST are atomic, too, and are supported by more
 micro-architectures.

Yeah, I think you (or someone) mentioned the code.  If someone wants
to write the code, I'm game...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Andrew Dunstan



On 01/04/2012 08:32 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

Tom said:

2. A slightly cleaner fix for this should be to duplicate the SV and
then release the copy around the SvPVutf8 call, only if it's readonly.
Fixing it in do_util_elog is entirely the wrong thing.

How do we tell if it's readonly?

SvREADONLY(sv) macro.






OK, so this seems to fix the issue David found:

diff --git a/src/pl/plperl/plperl_helpers.h b/src/pl/plperl/plperl_helpers.h
index ac0a97d..f0e1afa 100644
--- a/src/pl/plperl/plperl_helpers.h
+++ b/src/pl/plperl/plperl_helpers.h
@@ -51,7 +51,10 @@ sv2cstr(SV *sv)
 	/*
 	 * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
 	 */
-	val = SvPVutf8(sv, len);
+	if (SvREADONLY(sv))
+		val = SvPVutf8(sv_mortalcopy(sv), len);
+	else
+		val = SvPVutf8(sv, len);
 
 	/*
 	 * we use perls length in the event we had an embedded null byte to ensure



but it doesn't fix the one I found which passes a typeglob to elog():

   do '$foo=1; elog(NOTICE,*foo);' language plperl;


That still crashes, but doesn't if we use sv_mortalcopy unconditionally.


cheers

andrew



Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-05 Thread Andrew Dunstan



On 01/04/2012 06:20 PM, Tom Lane wrote:

Could we detect an appropriate line ending in ahwrite() after it's been
decompressed and buffer partial lines accordingly?

Not easily: there could be newlines embedded in data strings or SQL
identifiers.




Should we look at eliminating those newlines for the future by using 
U& identifiers where there are embedded newlines and unicode escapes 
for newlines in data strings?


Then at least we'd possibly be able to get rid of the kludge some time 
in the future.



cheers

andrew



Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Benedikt Grundmann
For what it's worth, here are the numbers on one of our biggest databases
(same system as I posted about separately wrt seq_scan_cost vs
random_page_cost).


0053 1001
00BA 1009
0055 1001
00B9 1020
0054 983
00BB 1010
0056 1001
00BC 1019
0069 0
00BD 1009
006A 224
00BE 1018
006B 1009
00BF 1008
006C 1008
00C0 1006
006D 1004
00C1 1014
006E 1016
00C2 1023
006F 1003
00C3 1012
0070 1011
00C4 1000
0071 1011
00C5 1002
0072 1005
00C6 982
0073 1009
00C7 996
0074 1013
00C8 973
0075 1002
00D1 987
0076 997
00D2 968
0077 1007
00D3 974
0078 1012
00D4 964
0079 994
00D5 981
007A 1013
00D6 964
007B 999
00D7 966
007C 1000
00D8 971
007D 1000
00D9 956
007E 1008
00DA 976
007F 1010
00DB 950
0080 1001
00DC 967
0081 1009
00DD 983
0082 1008
00DE 970
0083 988
00DF 965
0084 1007
00E0 984
0085 1012
00E1 1004
0086 1004
00E2 976
0087 996
00E3 941
0088 1008
00E4 960
0089 1003
00E5 948
008A 995
00E6 851
008B 1001
00E7 971
008C 1003
00E8 954
008D 982
00E9 938
008E 1000
00EA 931
008F 1008
00EB 956
0090 1009
00EC 960
0091 1013
00ED 962
0092 1006
00EE 933
0093 1012
00EF 956
0094 994
00F0 978
0095 1017
00F1 292
0096 1004
0097 1005
0098 1014
0099 1012
009A 994
0035 1003
009B 1007
0036 1004
009C 1010
0037 981
009D 1024
0038 1002
009E 1009
0039 998
009F 1011
003A 995
00A0 1015
003B 996
00A1 1018
003C 1013
00A5 1007
003D 1008
00A3 1016
003E 1007
00A4 1020
003F 989
00A7 375
0040 989
00A6 1010
0041 975
00A9 3
0042 994
00A8 0
0043 1010
00AA 1
0044 1007
00AB 1
0045 1008
00AC 0
0046 991
00AF 4
0047 1010
00AD 0
0048 997
00AE 0
0049 1002
00B0 5
004A 1004
00B1 0
004B 1012
00B2 0
004C 999
00B3 0
004D 1008
00B4 0
004E 1007
00B5 807
004F 1010
00B6 1007
0050 1004
00B7 1007
0051 1009
00B8 1006
0052 1005
0057 1008
00C9 994
0058 991
00CA 977
0059 1000
00CB 978
005A 998
00CD 944
005B 971
00CC 972
005C 1005
00CF 969
005D 1010
00CE 988
005E 1006
00D0 975
005F 1015
0060 989
0061 998
0062 1014
0063 1000
0064 991
0065 990
0066 1000
0067 947
0068 377
00A2 1011


On 23/12/11 14:23, Kevin Grittner wrote:
 Jeff Janes jeff.ja...@gmail.com wrote:
  
  Could we get some major OLTP users to post their CLOG for
  analysis?  I wouldn't think there would be much
  security/proprietary issues with CLOG data.
  
 FWIW, I got the raw numbers to do my quick check using this Ruby
 script (put together for me by Peter Brant).  If it is of any use to
 anyone else, feel free to use it and/or post any enhanced versions
 of it.
  
 #!/usr/bin/env ruby
 
 Dir.glob("*") do |file_name|
   contents = File.read(file_name)
   total = 
 contents.enum_for(:each_byte).enum_for(:each_slice,
 256).inject(0) do |count, chunk|
   if chunk.all? { |b| b == 0x55 }
 count + 1
   else
 count
   end
 end
   printf "%s %d\n", file_name, total
 end
  
 -Kevin
 



Re: [HACKERS] Page Checksums + Double Writes

2012-01-05 Thread Kevin Grittner
Benedikt Grundmann bgrundm...@janestreet.com wrote:
 
 For what it's worth, here are the numbers on one of our biggest
 databases (same system as I posted about separately wrt
 seq_scan_cost vs random_page_cost).
 
That would be an 88.4% hit rate on the summarized data.
 
-Kevin



[HACKERS] easy way of copying regex_t ?

2012-01-05 Thread Tomas Vondra
Hi all,

I've been working on an extension that allows moving the ispell
dictionaries to a shared memory segment. It's almost complete; the last FIXME is
about copying the regex_t structure (stored in AFFIX).

According to regex.h the structure is fairly complex and not exactly easy
to understand, so I'd like to know if anyone here has already implemented that
or something that may serve the same purpose. Any ideas?

kind regards
Tomas




Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-05 Thread Simon Riggs
On Wed, Jan 4, 2012 at 3:13 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jan 4, 2012 at 1:31 PM, Stephen Frost sfr...@snowman.net wrote:
 Simon, all,

 * Simon Riggs (si...@2ndquadrant.com) wrote:
 (1) report all errors on a page, including errors that don't change
 PostgreSQL data. This involves checksumming long strings of zeroes,
 which the checksum algorithm can't tell apart from long strings of
 ones.

 Do we actually know when/where it's supposed to be all zeros, and hence
 could we check for that explicitly?  If we know what it's supposed to
 be, in order to be consistent with other information, I could certainly
 see value in actually checking that.

 Yes, we can. Excellent suggestion, will implement.

No, we can't.

I discovered that non-all-zeroes holes are fairly common, just not very frequent.

That may or may not be a problem, but not something to be dealt with
here and now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 08:32 PM, Tom Lane wrote:
Andrew Dunstan and...@dunslane.net writes:
 How do we tell if it's readonly?
 SvREADONLY(sv) macro.

 but it doesn't fix the one I found which passes a typeglob to elog():
 do '$foo=1; elog(NOTICE,*foo);' language plperl;

Mmm, I was wondering about that one.

 That still crashes, but doesn't if we use sv_mortalcopy unconditionally.

Unconditional sv_mortalcopy sounds like the thing to do then, but a
comment would help.  And if this isn't a Perl bug, I would like to
know what is.

BTW, shouldn't we be making some attempt to drop the result of the
sv_mortalcopy?  Or is it just assumed that it will be garbage-collected
before the memory leak gets too bad?

regards, tom lane



Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-05 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/04/2012 06:20 PM, Tom Lane wrote:
 Not easily: there could be newlines embedded in data strings or SQL
 identifiers.

 Should we look at eliminating those newlines for the future by using 
 U& identifiers where there are embedded newlines and unicode escapes 
 for newlines in data strings?

 Then at least we'd possibly be able to get rid of the kludge some time 
 in the future.

Given our high expectations for forward-compatibility of dump files, we
couldn't drop the code anytime in the foreseeable future, so I cannot
get excited about spending time on that.  (AFAIR, we have *never*
intentionally broken compatibility of pg_dump files, unless you count
breaking dumps made by unreleased development versions.)

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-05 Thread Simon Riggs
On Wed, Jan 4, 2012 at 1:35 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs  wrote:

 My focus was on getting something working first, then tuning. If
 we're agreed that we have everything apart from the tuning then we
 can proceed with tests to see which works better.

 Sure.  I just think you are there already except for what I got into.

 FWIW, moving the modulus application out of the loop is a very
 trivial change and has no effect on the results; it's strictly a
 performance issue.

New version attached, with your suggested changes included. Hole check
code is there as well, but ifdef'd out since it isn't a valid check in
all cases.
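
(The doc text in the patch below mentions a 16-bit Fletcher checksum; for
reference, here is the textbook Fletcher-16 over a byte buffer, not the
patch's actual implementation nor its handling of the page hole or of the
checksum field itself.)

/* Textbook Fletcher-16 (illustrative only): two running sums modulo 255,
 * the second of which makes the result order-sensitive. */
#include <stddef.h>
#include <stdint.h>

static uint16_t
fletcher16(const uint8_t *data, size_t len)
{
    uint32_t    sum1 = 0;
    uint32_t    sum2 = 0;
    size_t      i;

    for (i = 0; i < len; i++)
    {
        sum1 = (sum1 + data[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}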

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0cc3296..9b367a3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1701,6 +1701,48 @@ SET ENABLE_SEQSCAN TO OFF;
   </listitem>
  </varlistentry>
 
+ <varlistentry id="guc-page-checksums" xreflabel="page_checksums">
+  <indexterm>
+   <primary><varname>page_checksums</> configuration parameter</primary>
+  </indexterm>
+  <term><varname>page_checksums</varname> (<type>boolean</type>)</term>
+  <listitem>
+   <para>
+    When this parameter is on, the <productname>PostgreSQL</> server
+    calculates checksums when it writes main database pages to disk,
+    flagging the page as checksum protected.  When this parameter is off,
+    no checksum is written, only a standard watermark in the page header.
+    The database may thus contain a mix of pages with checksums and pages
+    without checksums.
+   </para>
+
+   <para>
+    When pages are read into shared buffers any page flagged with a
+    checksum has the checksum re-calculated and compared against the
+    stored value to provide greatly improved validation of page contents.
+   </para>
+
+   <para>
+    Writes via temp_buffers are not checksummed.
+   </para>
+
+   <para>
+    Turning this parameter off speeds normal operation, but
+    might allow data corruption to go unnoticed. The checksum uses
+    16-bit checksums, using the fast Fletcher 16 algorithm. With this
+    parameter enabled there is still a non-zero probability that an error
+    could go undetected, as well as a non-zero probability of false
+    positives.
+   </para>
+
+   <para>
+    This parameter can only be set in the <filename>postgresql.conf</>
+    file or on the server command line.
+    The default is <literal>off</>.
+   </para>
+  </listitem>
+ </varlistentry>
+
  <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
   <term><varname>wal_buffers</varname> (<type>integer</type>)</term>
   <indexterm>
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 91cc001..a43b7be 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -440,7 +440,7 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
 
 			/* check for garbage data */
-			if (!PageHeaderIsValid((PageHeader) bufBlock))
+			if (!PageIsVerified((Page) bufBlock))
 			{
 if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
 {
@@ -1860,6 +1860,8 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 {
 	XLogRecPtr	recptr;
 	ErrorContextCallback errcontext;
+	Block		bufBlock;
+	char		*bufCopy;
 
 	/*
 	 * Acquire the buffer's io_in_progress lock.  If StartBufferIO returns
@@ -1907,10 +1909,24 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 	buf->flags &= ~BM_JUST_DIRTIED;
 	UnlockBufHdr(buf);
 
+	/*
+	 * Set page verification info immediately before we write the buffer to disk.
+	 * Once we have flushed the buffer is marked clean again, meaning it can
+	 * be replaced quickly and silently with another data block, so we must
+	 * write verification info now. For efficiency, the process of cleaning
+	 * and page replacement is asynchronous, so we can't do this *only* when
+	 * we are about to replace the buffer, we need to do this for every flush.
+	 */
+	bufBlock = BufHdrGetBlock(buf);
+	bufCopy = PageSetVerificationInfo((Page) bufBlock);
+
+	/*
+	 * bufToWrite is either the shared buffer or a copy, as appropriate.
+	 */
 	smgrwrite(reln,
 			  buf->tag.forkNum,
 			  buf->tag.blockNum,
-			  (char *) BufHdrGetBlock(buf),
+			  (char *) bufCopy,
 			  false);
 
 	pgBufferUsage.shared_blks_written++;
@@ -1921,6 +1937,8 @@ FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
 	 */
 	TerminateBufferIO(buf, true, 0);
 
+	/* XXX Assert(buf is not BM_JUST_DIRTIED) */
+
 	TRACE_POSTGRESQL_BUFFER_FLUSH_DONE(buf->tag.forkNum,
 	   buf->tag.blockNum,
 	   reln->smgr_rnode.node.spcNode,
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 

Re: [HACKERS] CLOG contention

2012-01-05 Thread Robert Haas
On Tue, Dec 27, 2011 at 5:23 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, Dec 24, 2011 at 9:25 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Dec 22, 2011 at 4:20 PM, Robert Haas robertmh...@gmail.com wrote:

 Also, if it is that, what do we do about it?  I don't think any of the
 ideas proposed so far are going to help much.

 If you don't like guessing, don't guess, don't think. Just measure.

 Does increasing the number of buffers solve the problems you see? That
 must be the first port of call - is that enough, or not? If not, we
 can discuss the various ideas, write patches and measure them.

 Just in case you want a theoretical prediction to test:

 increasing NUM_CLOG_BUFFERS should reduce the frequency of the spikes
 you measured earlier. That should happen proportionally, so as that is
 increased they will become even less frequent. But the size of the
 buffer will not decrease the impact of each event when it happens.

I'm still catching up on email, so apologies for the slow response on
this.  I actually ran this test before Christmas, but didn't get
around to emailing the results.  I'm attaching graphs of the last 100
seconds of a run with the normal count of CLOG buffers, and the last
100 seconds of a run with NUM_CLOG_BUFFERS = 32.  I am also attaching
graphs of the entire runs.

It appears to me that increasing the number of CLOG buffers reduced
the severity of the latency spikes considerably.  In the last 100
seconds, for example, master has several spikes in the 500-700ms
range, but with 32 CLOG buffers it never goes above 400 ms.  Also, the
number of points associated with each spike is considerably less -
each spike seems to affect fewer transactions.  So it seems that at
least on this machine, increasing the number of CLOG buffers both
improves performance and reduces latency.

I hypothesize that there are actually two kinds of latency spikes
here.  Just taking a wild guess, I wonder if the *remaining* latency
spikes are caused by the effect that you mentioned before: namely, the
need to write an old CLOG page every time we advance onto a new one.
I further speculate that the spikes are more severe on the unpatched
code because this effect combines with the one I mentioned before: if
there are more simultaneous I/O requests than there are buffers, a new
I/O request has to wait for one of the I/Os already in progress to
complete.  If the new I/O request that has to wait extra-long happens
to be the one caused by XID advancement, then things get really ugly.
If that hypothesis is correct, then it supports your previous belief
that more than one fix is needed here... but it also means we can get
a significant and I think quite worthwhile benefit just out of finding
a reasonable way to add some more buffers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
attachment: latency-end.png
attachment: latency-clog32-end.png
attachment: latency.png
attachment: latency-clog32.png


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 My laptop ran out of battery and turned itself off while I was just 
 starting up postmaster. After plugging in the charger and rebooting, I 
 got the following error when I tried to restart PostgreSQL:

 FATAL:  bogus data in lock file postmaster.pid: 

 postmaster.pid file was present in the data directory, but had zero 
 length. Looking at the way the file is created and written, that can 
 happen if you crash after the file is created, but before it's 
 written/fsync'd (my laptop might have write-cache enabled, which would 
 make the window larger).

 I was a bit surprised by that. That's probably not a big deal in 
 practice, but I wonder if there was some easy way to avoid that. First I 
 thought we could create the new postmaster.pid file with a temporary 
 name and rename it in place, but rename(2) will merrily overwrite any 
 existing file which is not what we want. We could use link(2), I guess.

I think link(2) would create race conditions of its own.  I'd be
inclined to suggest that maybe we should just special-case a zero length
postmaster.pid file as meaning okay to proceed.  In general, garbage
data in postmaster.pid is something I'm happy to insist on manual
recovery from, but maybe we could safely make an exception for empty
files.

regards, tom lane



Re: [HACKERS] CLOG contention

2012-01-05 Thread Simon Riggs
On Thu, Jan 5, 2012 at 4:04 PM, Robert Haas robertmh...@gmail.com wrote:

 It appears to me that increasing the number of CLOG buffers reduced
 the severity of the latency spikes considerably.  In the last 100
 seconds, for example, master has several spikes in the 500-700ms
 range, but with 32 CLOG buffers it never goes above 400 ms.  Also, the
 number of points associated with each spike is considerably less -
 each spike seems to affect fewer transactions.  So it seems that at
 least on this machine, increasing the number of CLOG buffers both
 improves performance and reduces latency.

I believed before that the increase was worthwhile and now even more so.

Let's commit the change to 32.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] CLOG contention

2012-01-05 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 Robert Haas robertmh...@gmail.com wrote:
 
 So it seems that at least on this machine, increasing the number
 of CLOG buffers both improves performance and reduces latency.
 
 I believed before that the increase was worthwhile and now even
 more so.
 
 Let's commit the change to 32.
 
+1
 
-Kevin



[HACKERS] Client Messages

2012-01-05 Thread Jim Mlodgenski
I have a need to send banner messages to a psql client that I can set
on the server and will be displayed on any psql client that connects
to the database. This would be mostly used as an additional indicator
of which database you are connecting to, but could also be used by people
to force their users to see a security message when connecting to the
database. The attached patch will allow you to execute

ALTER DATABASE postgres SET
client_message=E'\nBEWARE:
You are connecting to a production database. If you do anything to\n
 bring this server down, you will be destroyed by your supreme
overlord.\n\n';

And then when you connect to psql, you will see:

[e3@workstation bin]$ ./psql -U user1 postgres
psql (9.2devel)

BEWARE: You are connecting to a production database. If you do anything to
bring this server down, you will be destroyed by your supreme overlord.


Type "help" for help.

postgres=


Any feedback is welcome.

Thanks
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 0cc3296..fa5b942
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** dynamic_library_path = 'C:\tools\postgre
*** 5323,5328 
--- 5323,5341 
</listitem>
   </varlistentry>
  
+  <varlistentry id="guc-client-message" xreflabel="client_message">
+   <term><varname>client_message</varname> (<type>string</type>)</term>
+   <indexterm>
+<primary><varname>client_message</> configuration parameter</primary>
+   </indexterm>
+   <listitem>
+<para>
+ The <varname>client_message</varname> can be any string that will be 
+ displayed to the user in the banner of psql. 
+</para>
+   </listitem>
+  </varlistentry>
+ 
   </variablelist>
  </sect2>
 </sect1>
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 5c910dd..58b6000
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*** static char *log_destination_string;
*** 455,460 
--- 455,461 
  static char *syslog_ident_str;
  static bool phony_autocommit;
  static bool session_auth_is_superuser;
+ static char *client_message_string;
  static double phony_random_seed;
  static char *client_encoding_string;
  static char *datestyle_string;
*** static struct config_string ConfigureNam
*** 3018,3023 
--- 3019,3035 
  		check_application_name, assign_application_name, NULL
  	},
  
+ {
+ {"client_message", PGC_USERSET, CLIENT_CONN_OTHER,
+ gettext_noop("Sets a message to be displayed to the user when connecting via psql."),
+ NULL,
+ GUC_REPORT | GUC_NO_SHOW_ALL
+ },
+ &client_message_string,
+ "",
+ NULL, NULL, NULL
+ },
+ 
  	/* End-of-list marker */
  	{
  		{NULL, 0, 0, NULL, NULL}, NULL, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
new file mode 100644
index 315db46..8eb5af5
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***
*** 515,520 
--- 515,521 
  
  #dynamic_library_path = '$libdir'
  #local_preload_libraries = ''
+ #client_message = ''
  
  
  #--
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 69fac83..44bf114
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** static bool get_create_function_cmd(PGco
*** 65,70 
--- 65,71 
  static int	strip_lineno_from_funcdesc(char *func);
  static void minimal_error_message(PGresult *res);
  
+ static void printClientMessage(void);
  static void printSSLInfo(void);
  
  #ifdef WIN32
*** connection_warnings(bool in_startup)
*** 1699,1709 
--- 1700,1729 
  		checkWin32Codepage();
  #endif
  		printSSLInfo();
+ 
+ 		printClientMessage();
  	}
  }
  
  
  /*
+  * printClientMessage
+  *
+  * Prints any message stored in the client_message GUC
+  */
+ static void
+ printClientMessage(void)
+ {
+ 	const char *message;
+ 
+ 	message = PQparameterStatus(pset.db, "client_message");
+ 
+ 	if (message)
+ 		printf(_("%s\n"), message);
+ }
+ 
+ 
+ /*
   * printSSLInfo
   *
   * Prints information about the current SSL connection, if SSL is in use



Re: [HACKERS] Client Messages

2012-01-05 Thread Kevin Grittner
Jim Mlodgenski jimm...@gmail.com wrote:
 
 Any feedback is welcome.
 
You might want to add it here to make sure it doesn't slip through the
cracks:
 
https://commitfest.postgresql.org/action/commitfest_view/open
 
-Kevin



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Andrew Dunstan



On 01/05/2012 10:34 AM, Tom Lane wrote:


Unconditional sv_mortalcopy sounds like the thing to do then, but a
comment would help.  And if this isn't a Perl bug, I would like to
know what is.


Indeed. David, can you file a bug on this?



BTW, shouldn't we be making some attempt to drop the result of the
sv_mortalcopy?  Or is it just assumed that it will be garbage-collected
before the memory leak gets too bad?





Yes, it will be garbage collected before long. See discussion in perldoc 
perlguts.


cheers

andrew



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread David E. Wheeler
On Jan 5, 2012, at 9:08 AM, Andrew Dunstan wrote:

 Unconditional sv_mortalcopy sounds like the thing to do then, but a
 comment would help.  And if this isn't a Perl bug, I would like to
 know what is.
 
 Indeed. David, can you file a bug on this?

I could, but don’t know what to say, and wouldn’t be able to answer any 
questions intelligently. I suggest you or Tom file one. Just run `perlbug` and 
follow the prompts. I’m sure p5p would appreciate a good bug report that I 
would not be able to send.

Best,

David




Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread David E. Wheeler
On Jan 5, 2012, at 7:34 AM, Tom Lane wrote:

 That still crashes, but doesn't if we use sv_mortalcopy unconditionally.
 
 Unconditional sv_mortalcopy sounds like the thing to do then, but a
 comment would help.  And if this isn't a Perl bug, I would like to
 know what is.

Question: Is this an issue anywhere else in PL/Perl, or just elog()? What about 
SPI parameters or return values?

david=# DO LANGUAGE PLPERL $$
david$# my $plan = spi_prepare('SELECT $1', 'TEXT');
david$# spi_query_prepared($plan, $^V);
david$# spi_freeplan($plan);
david$# return;
david$# $$;
ERROR:  cannot convert Perl hash to non-composite type text at line 3.
CONTEXT:  PL/Perl anonymous code block

No segfault, at least, though that’s a rather bizarre error message. AFAIK, $^V 
isn’t a hash. This works, though:

DO LANGUAGE PLPERL $$
my $plan = spi_prepare('SELECT $1', 'TEXT');
spi_query_prepared($plan, v1);
spi_freeplan($plan);
return;
$$;
DO

Best,

David


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Tom Lane
David E. Wheeler da...@justatheory.com writes:
 On Jan 5, 2012, at 7:34 AM, Tom Lane wrote:
 Unconditional sv_mortalcopy sounds like the thing to do then, but a
 comment would help.  And if this isn't a Perl bug, I would like to
 know what is.

 Question: Is this an issue anywhere else in PL/Perl, or just elog()?

I would imagine you could reproduce it by returning the same kinds of
objects as function results, since the actual problem is in utf8 to
database-encoding conversion.

 No segfault, at least, though that’s a rather bizarre error message. AFAIK, 
 $^V isn’t a hash. This works, though:
 spi_query_prepared($plan, v1);

Is that actually a vstring?  I confess I'd never heard of the things
before this thread, but I remember reading somewhere that you need
multiple dots in a string before it's considered a vstring and not
something else.

regards, tom lane



Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Andrew Dunstan



On 01/05/2012 12:28 PM, David E. Wheeler wrote:

On Jan 5, 2012, at 7:34 AM, Tom Lane wrote:


That still crashes, but doesn't if we use sv_mortalcopy unconditionally.

Unconditional sv_mortalcopy sounds like the thing to do then, but a
comment would help.  And if this isn't a Perl bug, I would like to
know what is.

Question: Is this an issue anywhere else in PL/Perl, or just elog()? What about 
SPI parameters or return values?



The fix that has been applied, as Tom suggested, is at the point where 
we call SvPVutf8(), so that's not just for elog().





david=# DO LANGUAGE PLPERL $$
david$# my $plan = spi_prepare('SELECT $1', 'TEXT');
david$# spi_query_prepared($plan, $^V);
david$# spi_freeplan($plan);
david$# return;
david$# $$;
ERROR:  cannot convert Perl hash to non-composite type text at line 3.
CONTEXT:  PL/Perl anonymous code block

No segfault, at least, though that’s a rather bizarre error message. AFAIK, $^V 
isn’t a hash. This works, though:



As documented, it's not a scalar, and you need to stop treating it as 
one. If you want it as a scalar, which is what you'd need here, enclose 
it in quotes, or use the stuff shown in perldoc perlvar.



cheers

andrew






Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread David E. Wheeler
On Jan 5, 2012, at 9:41 AM, Tom Lane wrote:

 I would imagine you could reproduce it by returning the same kinds of
 objects as function results, since the actual problem is in utf8 to
 database-encoding conversion.
 
 No segfault, at least, though that’s a rather bizarre error message. AFAIK, 
 $^V isn’t a hash. This works, though:
spi_query_prepared($plan, v1);
 
 Is that actually a vstring?  I confess I'd never heard of the things
 before this thread, but I remember reading somewhere that you need
 multiple dots in a string before it's considered a vstring and not
 something else.

Yes, it is. You can declare v-strings either by prepending a v to one or more 
dot-separated integers, or just 3 or more dotted integers.

  http://austin.pm.org/presentations/march2000/raty/vstrings.html

I believe the latter syntax is deprecated, though; it turned out to be 
confusing and pretty roundly hated. It's preferred to use the v syntax.

Best,

David




Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread David E. Wheeler
On Jan 5, 2012, at 9:50 AM, Andrew Dunstan wrote:

 The fix that has been applied, as Tom suggested, is at the point where we 
 call SvPVutf8(), so that's not just for elog().

Great, thanks.

 As documented, it's not a scalar, and you need to stop treating it as one. If 
 you want it as a scalar, which is what you'd need here, enclose it in quotes, 
 or use the stuff shown in perldoc perlvar.

Oh, right $^V is a version object.

Best,

David





Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Andrew Dunstan



On 01/05/2012 12:41 PM, Tom Lane wrote:

Is that actually a vstring?  I confess I'd never heard of the things
before this thread, but I remember reading somewhere that you need
multiple dots in a string before it's considered a vstring and not
something else.



perldoc perlvar says:

   The revision, version, and subversion of the Perl interpreter,
   represented as a version object.

   This variable first appeared in perl 5.6.0; earlier versions of perl
   will see an undefined value. Before perl 5.10.0 $^V was represented
   as a v-string.

I'm quite sure David isn't running on something older than 5.10.0.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Andrew Dunstan



On 01/05/2012 12:17 PM, David E. Wheeler wrote:

On Jan 5, 2012, at 9:08 AM, Andrew Dunstan wrote:


Unconditional sv_mortalcopy sounds like the thing to do then, but a
comment would help.  And if this isn't a Perl bug, I would like to
know what is.

Indeed. David, can you file a bug on this?

I could, but don’t know what to say, and wouldn’t be able to answer any 
questions intelligently. I suggest you or Tom file one. Just run `perlbug` and 
follow the prompts. I’m sure p5p would appreciate a good bug report that I 
would not be able to send.



OK, done.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 The fix that has been applied, as Tom suggested, is at the point where 
 we call SvPVutf8(), so that's not just for elog().

That patch looks good, but do we want to look into the other point about
error recovery?  The real bottom line here is that when SvPVutf8 threw a
croak(), we didn't recover sanely at all --- in fact, it was almost
impossible to tell what had happened even in gdb, until I installed
perl debugging symbols and single-stepped through things.  I would have
hoped that we'd at least see the croak message.  This seems a bit
nervous-making, as plperl sure calls a lot of stuff that in principle
could croak.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread Andrew Dunstan



On 01/05/2012 01:09 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net  writes:

The fix that has been applied, as Tom suggested, is at the point where
we call SvPVutf8(), so that's not just for elog().

That patch looks good, but do we want to look into the other point about
error recovery?  The real bottom line here is that when SvPVutf8 threw a
croak(), we didn't recover sanely at all --- in fact, it was almost
impossible to tell what had happened even in gdb, until I installed
perl debugging symbols and single-stepped through things.  I would have
hoped that we'd at least see the croak message.  This seems a bit
nervous-making, as plperl sure calls a lot of stuff that in principle
could croak.




Yes, I think so. That seemed like a separate issue, though, which is why 
I just applied the workaround to the reported problem. I had similar 
issues with gdb - it wasn't a pleasant experience.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Simon Riggs
On Thu, Jan 5, 2012 at 4:04 PM, Robert Haas robertmh...@gmail.com wrote:

 I hypothesize that there are actually two kinds of latency spikes
 here.  Just taking a wild guess, I wonder if the *remaining* latency
 spikes are caused by the effect that you mentioned before: namely, the
 need to write an old CLOG page every time we advance onto a new one.
 I further speculate that the spikes are more severe on the unpatched
 code because this effect combines with the one I mentioned before: if
 there are more simultaneous I/O requests than there are buffers, a new
 I/O request has to wait for one of the I/Os already in progress to
 complete.  If the new I/O request that has to wait extra-long happens
 to be the one caused by XID advancement, then things get really ugly.
 If that hypothesis is correct, then it supports your previous belief
 that more than one fix is needed here... but it also means we can get
 a significant and I think quite worthwhile benefit just out of finding
 a reasonable way to add some more buffers.

Sounds reasonable.

Patch to remove clog contention caused by my dirty clog LRU.

The patch implements background WAL allocation also, with the
intention of being separately tested, if possible.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 4060e60..dbefa02 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -565,6 +565,26 @@ CheckPointCLOG(void)
 	TRACE_POSTGRESQL_CLOG_CHECKPOINT_DONE(true);
 }
 
+/*
+ * Conditionally flush the CLOG LRU.
+ *
+ * When a backend does ExtendCLOG we need to write the CLOG LRU if it is
+ * dirty. Performing I/O while holding XidGenLock prevents new write
+ * transactions from starting. To avoid that we flush the CLOG LRU, if
+ * we think that a page write is due soon, according to a heuristic.
+ *
+ * Note that we're reading ShmemVariableCache->nextXid without a lock
+ * since the exact value doesn't matter as input into our heuristic.
+ */
+void
+CLOGBackgroundFlushLRU(void)
+{
+	TransactionId xid = ShmemVariableCache->nextXid;
+	int		threshold = (CLOG_XACTS_PER_PAGE - (CLOG_XACTS_PER_PAGE / 4));
+
+	if (TransactionIdToPgIndex(xid) > threshold)
+		SlruBackgroundFlushLRUPage(ClogCtl);
+}
 
 /*
  * Make sure that CLOG has room for a newly-allocated XID.
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 30538ff..aea6c09 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -885,6 +885,82 @@ SlruReportIOError(SlruCtl ctl, int pageno, TransactionId xid)
 }
 
 /*
+ * Identify the LRU slot but just leave it as it is.
+ *
+ * Control lock must be held at entry, and will be held at exit.
+ */
+static int
+SlruIdentifyLRUSlot(SlruCtl ctl)
+{
+	SlruShared	shared = ctl->shared;
+	int			slotno;
+	int			cur_count;
+	int			bestslot;
+	int			best_delta;
+	int			best_page_number;
+
+	/*
+	 * If we find any EMPTY slot, just select that one. Else locate the
+	 * least-recently-used slot.
+	 *
+	 * Normally the page_lru_count values will all be different and so
+	 * there will be a well-defined LRU page.  But since we allow
+	 * concurrent execution of SlruRecentlyUsed() within
+	 * SimpleLruReadPage_ReadOnly(), it is possible that multiple pages
+	 * acquire the same lru_count values.  In that case we break ties by
+	 * choosing the furthest-back page.
+	 *
+	 * In no case will we select the slot containing latest_page_number
+	 * for replacement, even if it appears least recently used.
+	 *
+	 * Notice that this next line forcibly advances cur_lru_count to a
+	 * value that is certainly beyond any value that will be in the
+	 * page_lru_count array after the loop finishes.  This ensures that
+	 * the next execution of SlruRecentlyUsed will mark the page newly
+	 * used, even if it's for a page that has the current counter value.
+	 * That gets us back on the path to having good data when there are
+	 * multiple pages with the same lru_count.
+	 */
+	cur_count = (shared->cur_lru_count)++;
+	best_delta = -1;
+	bestslot = 0;			/* no-op, just keeps compiler quiet */
+	best_page_number = 0;	/* ditto */
+	for (slotno = 0; slotno < shared->num_slots; slotno++)
+	{
+		int			this_delta;
+		int			this_page_number;
+
+		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
+			return slotno;
+		this_delta = cur_count - shared->page_lru_count[slotno];
+		if (this_delta < 0)
+		{
+			/*
+			 * Clean up in case shared updates have caused cur_count
+			 * increments to get lost.  We back off the page counts,
+			 * rather than trying to increase cur_count, to avoid any
+			 * question of infinite loops or failure in the presence of
+			 * wrapped-around counts.
+			 */
+			shared->page_lru_count[slotno] = cur_count;
+			this_delta = 0;
+		}
+		this_page_number = shared->page_number[slotno];
+		if ((this_delta > best_delta ||
+			 

Re: [HACKERS] CLOG contention

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Let's commit the change to 32.

I would like to do that, but I think we need to at least figure out a
way to provide an escape hatch for people without much shared memory.
We could do that, perhaps, by using a formula like this:

1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a
maximum of 32

I also think it would be worth a quick test to see how the increase
performs on a system with 32 cores.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Simon Riggs
On Thu, Jan 5, 2012 at 7:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Let's commit the change to 32.

 I would like to do that, but I think we need to at least figure out a
 way to provide an escape hatch for people without much shared memory.
 We could do that, perhaps, by using a formula like this:

 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a
 maximum of 32

We're talking about an extra 192KB or thereabouts and Clog buffers
will only be the size of subtrans when we've finished.

If you want to have a special low-memory option, then it would need to
include many more things than clog buffers.

Let's just use a constant value for clog buffers until the low-memory
patch arrives.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 2:21 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Jan 5, 2012 at 7:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Let's commit the change to 32.

 I would like to do that, but I think we need to at least figure out a
 way to provide an escape hatch for people without much shared memory.
 We could do that, perhaps, by using a formula like this:

 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a
 maximum of 32

 We're talking about an extra 192KB or thereabouts and Clog buffers
 will only be the size of subtrans when we've finished.

 If you want to have a special low-memory option, then it would need to
 include many more things than clog buffers.

 Let's just use a constant value for clog buffers until the low-memory
 patch arrives.

Tom already stated that he found that unacceptable.  Unless he changes
his opinion, we're not going to get far if you're only happy if it's
constant and he's only happy if there's a formula.

On the other hand, I think there's a decent argument that he should
change his opinion, because 192kB of memory is not a lot.  However,
what I mostly want is something that nobody hates, so we can get it
committed and move on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2012 at 1:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Let's commit the change to 32.

 I would like to do that, but I think we need to at least figure out a
 way to provide an escape hatch for people without much shared memory.
 We could do that, perhaps, by using a formula like this:

 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a
 maximum of 32

The assumption that machines that need this will have gigabytes of
shared memory set is not valid IMO.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Simon Riggs
On Thu, Jan 5, 2012 at 7:26 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 2:21 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Jan 5, 2012 at 7:12 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Let's commit the change to 32.

 I would like to do that, but I think we need to at least figure out a
 way to provide an escape hatch for people without much shared memory.
 We could do that, perhaps, by using a formula like this:

 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a
 maximum of 32

 We're talking about an extra 192KB or thereabouts and Clog buffers
 will only be the size of subtrans when we've finished.

 If you want to have a special low-memory option, then it would need to
 include many more things than clog buffers.

 Let's just use a constant value for clog buffers until the low-memory
 patch arrives.

 Tom already stated that he found that unacceptable.  Unless he changes
 his opinion, we're not going to get far if you're only happy if it's
 constant and he's only happy if there's a formula.

 On the other hand, I think there's a decent argument that he should
 change his opinion, because 192kB of memory is not a lot.  However,
 what I mostly want is something that nobody hates, so we can get it
 committed and move on.

If that was a reasonable objection it would have applied when we added
serializable support, or any other SLRU for that matter.

If memory reduction is a concern to anybody, then a separate patch to
address *all* issues is required. Blocking this patch makes no sense.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of Thu Jan 05 16:21:31 -0300 2012:
 On Thu, Jan 5, 2012 at 7:12 PM, Robert Haas robertmh...@gmail.com wrote:
  On Thu, Jan 5, 2012 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Let's commit the change to 32.
 
  I would like to do that, but I think we need to at least figure out a
  way to provide an escape hatch for people without much shared memory.
  We could do that, perhaps, by using a formula like this:
 
  1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a
  maximum of 32
 
 We're talking about an extra 192KB or thereabouts and Clog buffers
 will only be the size of subtrans when we've finished.

Speaking of which, maybe it'd be a good idea to parametrize the subtrans
size according to the same (or a similar) formula too.  (It might be
good to reduce multixact memory consumption too; I'd think that 4+4
pages should be more than sufficient for low memory systems, so making
those be half the clog values should be good)

So you get both things: reduce memory usage for systems on the low end,
which has been slowly increasing lately as we've added more uses of SLRU,
and more buffers for large systems.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Simon Riggs si...@2ndquadrant.com wrote:
 Robert Haas robertmh...@gmail.com wrote:
 Simon Riggs si...@2ndquadrant.com wrote:
 Let's commit the change to 32.

 I would like to do that, but I think we need to at least figure
 out a way to provide an escape hatch for people without much
 shared memory.  We could do that, perhaps, by using a formula
 like this:

 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8
 and a maximum of 32
 
If we go with such a formula, I think 32 MB would be a more
appropriate divisor than 128 MB.  Even on very large machines where
32 CLOG buffers would be a clear win, we often can't go above 1 or 2
GB of shared_buffers without hitting latency spikes due to overrun
of the RAID controller cache.  (Now, that may change if we get DW
in, but that's not there yet.)  1 GB / 32 is 32 MB.  This would
leave CLOG pinned at the minimum of 8 buffers (64 KB) all the way up
to shared_buffers of 256 MB.
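
For illustration only, a clamped formula of the kind being discussed could
look like the sketch below; the function name and constants here are
placeholders for the example, not anything from the tree.

/*
 * Hypothetical sketch: derive the number of CLOG buffers from
 * shared_buffers.  divisor_mb would be 128 in the version quoted above or
 * 32 in the variant suggested here; the result is clamped between today's
 * constant (8) and the proposed ceiling (32).
 */
static int
clog_buffers_for(int shared_buffers_mb, int divisor_mb)
{
    int     nbuffers = shared_buffers_mb / divisor_mb;

    if (nbuffers < 8)
        nbuffers = 8;
    if (nbuffers > 32)
        nbuffers = 32;
    return nbuffers;
}

With a 32 MB divisor that stays at 8 buffers up through shared_buffers of
256 MB and reaches 32 buffers at 1 GB, matching the arithmetic above.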
 
 Let's just use a constant value for clog buffers until the
 low-memory patch arrives.
 
 Tom already stated that he found that unacceptable.  Unless he
 changes his opinion, we're not going to get far if you're only
 happy if it's constant and he's only happy if there's a formula.
 
 On the other hand, I think there's a decent argument that he
 should change his opinion, because 192kB of memory is not a lot. 
 However, what I mostly want is something that nobody hates, so we
 can get it committed and move on.
 
I wouldn't hate it either way, as long as the divisor isn't too
large.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I would like to do that, but I think we need to at least figure out a
 way to provide an escape hatch for people without much shared memory.
 We could do that, perhaps, by using a formula like this:

 1 CLOG buffer per 128MB of shared_buffers, with a minimum of 8 and a
 maximum of 32

I would be in favor of that, or perhaps some other formula (eg, maybe
the minimum should be less than 8 for when you've got very little shmem).

I think that the reason it's historically been a constant is that the
original coding took advantage of having a compile-time-constant number
of buffers --- but since we went over to the common SLRU infrastructure
for several different logs, there's no longer any benefit whatever to
using a simple constant.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Thu, Jan 5, 2012 at 7:26 PM, Robert Haas robertmh...@gmail.com wrote:
 On the other hand, I think there's a decent argument that he should
 change his opinion, because 192kB of memory is not a lot.  However,
 what I mostly want is something that nobody hates, so we can get it
 committed and move on.

 If that was a reasonable objection it would have applied when we added
 serializable support, or any other SLRU for that matter.
 If memory reduction is a concern to anybody, then a separate patch to
 address *all* issues is required. Blocking this patch makes no sense.

No, your argument is the one that makes no sense.  The fact that things
could be made better for low-mem situations is not an argument for
instead making them worse.  Which is what going to a fixed value of 32
would do, in return for no benefit that I can see compared to using a
formula of some sort.  The details of the formula barely matter, though
I would like to see one that bottoms out at less than 8 buffers so that
there is some advantage gained for low-memory cases.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Magnus Hagander
On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 My laptop ran out of battery and turned itself off while I was just
 starting up postmaster. After plugging in the charger and rebooting, I
 got the following error when I tried to restart PostgreSQL:

 FATAL:  bogus data in lock file "postmaster.pid": ""

 postmaster.pid file was present in the data directory, but had zero
 length. Looking at the way the file is created and written, that can
 happen if you crash after the file is created, but before it's
 written/fsync'd (my laptop might have write-cache enabled, which would
 make the window larger).

 I was a bit surprised by that. That's probably not a big deal in
 practice, but I wonder if there was some easy way to avoid that. First I
 thought we could create the new postmaster.pid file with a temporary
 name and rename it in place, but rename(2) will merrily overwrite any
 existing file which is not what we want. We could use link(2), I guess.

 I think link(2) would create race conditions of its own.  I'd be
 inclined to suggest that maybe we should just special-case a zero length
 postmaster.pid file as meaning okay to proceed.  In general, garbage

That's pretty much what I meant - but with a warning message.

 data in postmaster.pid is something I'm happy to insist on manual
 recovery from, but maybe we could safely make an exception for empty
 files.

+1.
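
A minimal sketch of that special case, ignoring the locking details of the
real CreateLockFile() code (the helper name and message wording below are
invented for the example):

#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Invented helper: report whether an existing lock file is zero length,
 * which would then be treated as debris from a crash during file creation
 * rather than as bogus data requiring manual cleanup.
 */
static bool
lock_file_is_empty(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0 && st.st_size == 0)
    {
        fprintf(stderr, "WARNING: lock file \"%s\" is empty, ignoring it\n",
                path);
        return true;
    }
    return false;
}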

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Simon Riggs
On Thu, Jan 5, 2012 at 7:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I think that the reason it's historically been a constant is that the
 original coding took advantage of having a compile-time-constant number
 of buffers --- but since we went over to the common SLRU infrastructure
 for several different logs, there's no longer any benefit whatever to
 using a simple constant.

You astound me, you really do.

Parameterised slru buffer sizes were proposed for 8.3 and opposed by you.

I guess we all reserve the right to change our minds...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Parameterised slru buffer sizes were proposed for 8.3 and opposed by 
 you.
 I guess we all reserve the right to change our minds...

When presented with new data, sure.  Robert's results offer a reason to
worry about this, which we did not have before now.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-05 Thread Robert Haas
On Thu, Dec 29, 2011 at 9:03 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 3. Resolve two anticipated controversies that are, respectively,
 somewhat orthogonal and completely orthogonal to the binary bloat
 controversy. The first (controversy A) is that I have added a new
 piece of infrastructure, pg_always_inline, which, as the name
 suggests, is a portable way of insisting that a function should be
 invariably inlined. Portable libraries like libc have been doing this
 for a long time now, and I actually use the libc macro (that expands
 to __attribute__((always_inline)) ) where possible.

I don't have a problem with the idea of a pg_always_inline, but I'm
wondering what sort of fallback mechanism you propose.  It seems to me
that if the performance improvement is conditioned on inlining being done
and we're not confident that we can force the compiler to inline, we
ought to fall back to not making the specialization available, rather
than to making something available that may not work well.  Of course
if it's only a question of a smaller performance gain, rather than an
actual loss, then that's not as much of an issue...

 The second
 possible point of contention (controversy B) is that I have jettisoned
 various protections against bad qsort implementations that I believe
 are a legacy of when we used the system qsort pre-2006, that can no
 longer be justified. For example, quick sort performing badly in the
 face of lots of duplicates is a well understood problem today
 (partitioning should stop on equal keys), and ISTM that protecting
 against that outside the qsort implementation (but only for index
 sorts) is wrong-headed.

Assuming you mean that qsort_arg() will protect against this stuff so
that callers don't need to do it separately, +1.

 I had another idea when writing this patch that I haven't developed at
 all but I'll share anyway. That is, it might be a good idea to use a
 balanced quicksort:

 http://xlinux.nist.gov/dads//HTML/balancedqsrt.html

I can't find any description of how this actually works... obviously,
we'd like to find a close-to-median element, but how do we actually do
that cheaply?

 It might be possible to get a reasonable approximation of the actual
 median value of a given column with existing statistics, which could
 be hinted to qsort_arg.

I doubt that this would be worthwhile -- the pivot that we pick at the
toplevel doesn't really matter much; even if it's bad, we can recover
just fine if the pivot we pick one level down is good, or the level
after that.  We only really have a problem if we keep systematically
picking bad pivots every time.  And if we do have that problem,
improving the toplevel choice of pivot is only going to help modestly,
because we'll still be O(n^2) on each partition.

 Can I get a straw poll on how much of a problem worst-case performance
 of qsort is believed to be?

I'd be reluctant to remove any of the protections we currently have,
because I bet they were all put in to fix problems that people hit in
the field.  But I don't know that we need more.  Of course,
consolidating them into qsort() itself rather than its callers
probably makes sense, as noted above.

 In a perfect world, if it were somehow possible to know the perfect
 pivots ahead of time from a histogram or something, we'd have a quick
 sort variant with worst-case performance of O(n log(n)). That, or the
 limited approximation that I've sketched would perhaps be worthwhile,
 even if it was subject to a number of caveats. Wikipedia claims that
 the worst case for quick sort is O(n log(n)) with the refinements
 recommended by Sedgewick's 1978 paper, but this seems like nonsense to
 me - the med3 technique is a good heuristic in practice, but it's
 perfectly possible in theory for its sampling to always get things 
 completely wrong (this is not an unfamiliar problem).

I agree.  After reading
http://nanxkurniawan.wordpress.com/2010/05/31/quicksort/ I am inclined
to think that things are still O(n lg n) even if we always partition
so that any fixed percentage of the data is on one side of the
partition element and the remainder is on the other side.  So for
example if all of our partition elements are greater than 1% of the
elements in their partition and less than the other 99%, we'll still
be O(n lg n), though with a considerably higher constant factor, I
think.  However, if our partition elements are always a fixed NUMBER
of elements from the end - whether it's 1 or 100 - then we'll be
O(n^2), though again the constant factor will depend on how many
elements you knock off each time.  In practice I'm not sure whether an
algorithmic analysis matters much, because in real life the constant
factors are probably pretty important, hence the median-of-medians
approach with n > 40.
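
For reference, the usual median-of-three selection looks like the sketch
below (generic code, not the qsort_arg implementation); the n > 40 case
applies the same comparison to three such samples to get a pseudo-median
of nine.

/* Return the index of the median of a[lo], a[mid], a[hi]. */
static int
med3(const int *a, int lo, int mid, int hi)
{
    if (a[lo] < a[mid])
        return a[mid] < a[hi] ? mid : (a[lo] < a[hi] ? hi : lo);
    else
        return a[mid] > a[hi] ? mid : (a[lo] > a[hi] ? hi : lo);
}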

 While I'm thinking out loud, here's another idea: Have a GUC which,
 when enabled (perhaps potentially at various different granularities),
 makes Timsort the in-memory sorting algorithm, as it now 

Re: [HACKERS] CLOG contention

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 2:44 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 If we go with such a formula, I think 32 MB would be a more
 appropriate divisor than 128 MB.  Even on very large machines where
 32 CLOG buffers would be a clear win, we often can't go above 1 or 2
 GB of shared_buffers without hitting latency spikes due to overrun
 of the RAID controller cache.  (Now, that may change if we get DW
 in, but that's not there yet.)  1 GB / 32 is 32 MB.  This would
 leave CLOG pinned at the minimum of 8 buffers (64 KB) all the way up
 to shared_buffers of 256 MB.

That seems reasonable to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Robert Haas
On Thu, Jan 5, 2012 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I would be in favor of that, or perhaps some other formula (eg, maybe
 the minimum should be less than 8 for when you've got very little shmem).

I have some results that show that, under the right set of
circumstances, 8->32 is a win, and I can quantify by how much it wins.
 I don't have any data at all to quantify the cost of dropping the
minimum from 8->6, or from 8->4, and therefore I'm reluctant to do it.
 My guess is that it's a bad idea, anyway.  Even on a system where
shared_buffers is just 8MB, we have 1024 regular buffers and 8 CLOG
buffers.  If we reduce the number of CLOG buffers from 8 to 4 (i.e. by
50%), we can increase the number of regular buffers from 1024 to 1028
(i.e. by 0.5%).  Maybe you can find a case where that comes out to a
win, but you might have to look pretty hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Merlin Moncure
On Thu, Jan 5, 2012 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 5, 2012 at 2:44 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 If we go with such a formula, I think 32 MB would be a more
 appropriate divisor than 128 MB.  Even on very large machines where
 32 CLOG buffers would be a clear win, we often can't go above 1 or 2
 GB of shared_buffers without hitting latency spikes due to overrun
 of the RAID controller cache.  (Now, that may change if we get DW
 in, but that's not there yet.)  1 GB / 32 is 32 MB.  This would
 leave CLOG pinned at the minimum of 8 buffers (64 KB) all the way up
 to shared_buffers of 256 MB.

 That seems reasonable to me.

likewise (champion bikeshedder here).  It just so happens I typically
set 'large' server shared memory to 256mb.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/Perl Does not Like vstrings

2012-01-05 Thread David E. Wheeler
On Jan 5, 2012, at 9:55 AM, Andrew Dunstan wrote:

 perldoc perlvar says:
 
   The revision, version, and subversion of the Perl interpreter,
   represented as a version object.
 
   This variable first appeared in perl 5.6.0; earlier versions of perl
   will see an undefined value. Before perl 5.10.0 $^V was represented
   as a v-string.
 
 I'm quite sure David isn't running on something older than 5.10.0.

Perl 5.14.2:

 perl -E 'say ref $^V'
version

Perl 5.12.4:

 perl -E 'say ref $^V'
version

Perl 5.10.1:

 perl -E 'say ref $^V'
version

Perl 5.8.9:

 perl -le 'print ref $^V' 


Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Work around perl bug in SvPVutf8().

2012-01-05 Thread Andrew Dunstan



On 01/05/2012 12:05 PM, Andrew Dunstan wrote:

Work around perl bug in SvPVutf8().

Certain things like typeglobs or readonly things like $^V cause
perl's SvPVutf8() to die nastily and crash the backend. To avoid
that bug we make a copy of the object, which will subsequently be
garbage collected.

Back patched to 9.1 where we first started using SvPVutf8().

Per -hackers discussion. Original problem reported by David Wheeler.



Ugh. This broke the regression tests. I'll get it fixed.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-05 Thread Greg Smith

On 01/03/2012 06:22 PM, Jim Nasby wrote:

On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
   

I think that our current freelist is practically useless, because it
is almost always empty, and the cases where it's not empty (startup,
and after a table or database drop) are so narrow that we don't really
get any benefit out of having it.  However, I'm not opposed to the
idea of a freelist in general: I think that if we actually put in some
effort to keep the freelist in a non-empty state it would help a lot,
because backends would then have much less work to do at buffer
allocation time.
 

This is exactly what the FreeBSD VM system does (which is at least one of the 
places where the idea of a clock sweep for PG came from ages ago). There is a 
process that does nothing but attempt to keep X amount of memory on the free 
list, where it can immediately be grabbed by anything that needs memory. Pages 
on the freelist are guaranteed to be clean (as in not dirty), but not zero'd. 
In fact, IIRC if a page on the freelist gets referenced again it can be pulled 
back out of the free list and put back into an active state.

The one downside I see to this is that we'd need some heuristic to determine 
how many buffers we want to keep on the free list.
   


http://wiki.postgresql.org/wiki/Todo#Background_Writer has "Consider 
adding buffers the background writer finds reusable to the free list" 
and "Automatically tune bgwriter_delay based on activity rather than 
using a fixed interval", which both point to my 8.3 musing and other 
suggestions starting at 
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00781.php.  I 
could write both those in an afternoon.  The auto-tuning stuff already 
in the background writer was originally intended to tackle this issue, 
but I dropped it in lieu of shipping something simpler first.  There's 
even a prototype somewhere on an old drive here.


The first missing piece needed before this was useful was separating out 
the background writer and checkpointer processes.  Once I realized the 
checkpoints were monopolizing so much time, especially when they hit bad 
states, it was obvious the writer couldn't be relied upon for this job.  
That's much better now since Simon's 
806a2aee3791244bf0f916729bfdb5489936e068 "Split work of bgwriter between 
2 processes: bgwriter and checkpointer", which just became available in 
November to build on.


The second missing piece blocking this work in my mind was how exactly 
we're going to benchmark the result, mainly to prove it doesn't hurt 
some workloads.  I haven't fully internalized the implications of 
Robert's upthread comments, in terms of being able to construct a 
benchmark stressing both the best and worst case situation here.  That's 
really the hardest part of this whole thing, by a lot.  Recent spending 
has brought me an 8 HyperThread core laptop that can also run DTrace, so 
I expect to have better visibility into this soon too.


I think here in 2011 the idea of having a background writer process that 
could potentially occupy most of a whole core doing work so backends 
don't have to is an increasingly attractive one.  So long as that comes 
along with an auto-tuning delay, it shouldn't hurt the work toward 
lowering power management either.  Might even help really, by allowing 
larger values of bgwriter_delay than you'd want to use during busy 
periods.  I was planning to mimic the sort of fast attack/slow delay 
logic already used for the auto-tuned timing, so that you won't fall 
behind by more than one bgwriter_delay worth of activity.  Then it 
should realize a burst is here and the writer has to start moving faster.
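
As a rough illustration of that fast attack/slow delay idea (the names and
constants below are invented for the example, not taken from the bgwriter
code):

/*
 * Invented example: drop the sleep immediately when a burst of buffer
 * writes shows up, then stretch it back out gradually while idle.
 */
static int
next_delay_ms(int cur_delay_ms, int pages_written_last_round)
{
    const int   min_delay_ms = 10;
    const int   max_delay_ms = 1000;
    const int   busy_threshold = 100;   /* pages per round */

    if (pages_written_last_round > busy_threshold)
        return min_delay_ms;            /* fast attack */

    cur_delay_ms += cur_delay_ms / 10 + 1;  /* slow decay back toward idle */
    return cur_delay_ms > max_delay_ms ? max_delay_ms : cur_delay_ms;
}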


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FATAL: bogus data in lock file postmaster.pid:

2012-01-05 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Jan 5, 2012 at 17:13, Tom Lane t...@sss.pgh.pa.us wrote:
 I think link(2) would create race conditions of its own.  I'd be
 inclined to suggest that maybe we should just special-case a zero length
 postmaster.pid file as meaning okay to proceed.  In general, garbage

 That's pretty much what I meant - but with a warning message.

Actually ... wait a minute.  If we allow that, don't we create a race
condition between two postmasters starting at almost the same instant?
The second one could see the lock file when the first has created but
not yet filled it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Dec 29, 2011 at 9:03 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 The first (controversy A) is that I have added a new
 piece of infrastructure, pg_always_inline, which, as the name
 suggests, is a portable way of insisting that a function should be
 invariably inlined.

 I don't have a problem with the idea of a pg_always_inline, but I'm
 wondering what sort of fallback mechanism you propose.

There is no compiler anywhere that implements "always inline", unless
you are talking about a macro.  "inline" is a hint and nothing more,
and if you think you can force it you are mistaken.  So this controversy
is easily resolved: we do not need any such construct.

The real question is whether we should accept a patch that is a
performance loss when the compiler fails to inline some reasonably
simple function.  I think that would depend on the actual numbers
involved, so we'd need to see data before making a decision.

 The second
 possible point of contention (controversy B) is that I have jettisoned
 various protections against bad qsort implementations that I believe
 are a legacy of when we used the system qsort pre-2006, that can no
 longer be justified.

No objection to that one, I think, as long as you've verified that our
implementation is in fact okay about these things.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Work around perl bug in SvPVutf8().

2012-01-05 Thread Alvaro Herrera

Excerpts from Andrew Dunstan's message of Thu Jan 05 19:06:54 -0300 2012:
 
 On 01/05/2012 12:05 PM, Andrew Dunstan wrote:
  Work around perl bug in SvPVutf8().
 
  Certain things like typeglobs or readonly things like $^V cause
  perl's SvPVutf8() to die nastily and crash the backend. To avoid
  that bug we make a copy of the object, which will subsequently be
  garbage collected.
 
  Back patched to 9.1 where we first started using SvPVutf8().
 
  Per -hackers discussion. Original problem reported by David Wheeler.
 
 
 Ugh. This broke the regression tests. I'll get it fixed.

Yeah, notice that it's causing crashes, not just different results.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLOG contention

2012-01-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 5, 2012 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I would be in favor of that, or perhaps some other formula (eg, maybe
 the minimum should be less than 8 for when you've got very little shmem).

 I have some results that show that, under the right set of
 circumstances, 8->32 is a win, and I can quantify by how much it wins.
  I don't have any data at all to quantify the cost of dropping the
 minimum from 8->6, or from 8->4, and therefore I'm reluctant to do it.
  My guess is that it's a bad idea, anyway.  Even on a system where
 shared_buffers is just 8MB, we have 1024 regular buffers and 8 CLOG
 buffers.  If we reduce the number of CLOG buffers from 8 to 4 (i.e. by
 50%), we can increase the number of regular buffers from 1024 to 1028
 (i.e. by 0.5%).  Maybe you can find a case where that comes out to a
 win, but you might have to look pretty hard.

I think you're rejecting the concept too easily.  A setup with very
little shmem is only going to be suitable for low-velocity systems that
are not pushing too many transactions through per second, so it's not
likely to need so many CLOG buffers.  And frankly I'm not that concerned
about what the performance is like: I'm more concerned about whether
PG will start up at all without modifying the system shmem limits,
on systems with legacy values for SHMMAX etc.  Shaving a few
single-purpose buffers to make back what we spent on SSI, for example,
seems like a good idea to me.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Work around perl bug in SvPVutf8().

2012-01-05 Thread Andrew Dunstan



On 01/05/2012 05:30 PM, Alvaro Herrera wrote:

Excerpts from Andrew Dunstan's message of Thu Jan 05 19:06:54 -0300 2012:

On 01/05/2012 12:05 PM, Andrew Dunstan wrote:

Work around perl bug in SvPVutf8().

Certain things like typeglobs or readonly things like $^V cause
perl's SvPVutf8() to die nastily and crash the backend. To avoid
that bug we make a copy of the object, which will subsequently be
garbage collected.

Back patched to 9.1 where we first started using SvPVutf8().

Per -hackers discussion. Original problem reported by David Wheeler.


Ugh. This broke the regression tests. I'll get it fixed.

Yeah, notice that it's causing crashes, not just different results.



Indeed. It looks like the garbage collection was a bit too eager, so I 
have a fix where we control when it gets done.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Jeremy Harris

On 2012-01-05 10:04, Benedikt Grundmann wrote:

I have a question of how to benchmark hardware to determine
the appropriate ratio of seq_page_cost vs random_page_cost.


It'd be really nice if the DBMS measured actual experienced values..

--
Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-05 Thread Alex Hunsaker
On Thu, Jan 5, 2012 at 16:02, Andrew Dunstan and...@dunslane.net wrote:
 Fix breakage from earlier plperl fix.

 Apparently the perl garbage collector was a bit too eager, so here
 we control when the new SV is garbage collected.

I know im a little late to the party...

I can't help but think this seems a bit inefficient for the common
case. Would it be worth only copying the sv when it's a glob or
readonly? Something like the below? I tested a few more svtypes that
were easy to make (code, regexp) and everything seems peachy.

[ BTW it seems readonly in general is fine; for example elog(ERROR,
"1"); worked previously, as well as elog(ERROR, 1);. Both of those sv's
are readonly. ISTM, at least on my version of perl (5.14.2), globs and
readonly vstrings seem to be the problem children. I think we could
get away with testing if it's a glob or vstring. But I don't have time
right now to test all the way back to perl 5.8 and everything
in-between, I'll find it if anyone is interested.  ]

--

*** a/src/pl/plperl/plperl_helpers.h
--- b/src/pl/plperl/plperl_helpers.h
***
*** 47,53  sv2cstr(SV *sv)
  {
char   *val, *res;
STRLEN  len;
-   SV *nsv;

/*
 * get a utf8 encoded char * out of perl. *note* it may not be valid 
utf8!
--- 47,52 
***
*** 58,65  sv2cstr(SV *sv)
 * sv before passing it to SvPVutf8(). The copy is garbage collected
 * when we're done with it.
 */
!   nsv = newSVsv(sv);
!   val = SvPVutf8(nsv, len);

/*
 * we use perl's length in the event we had an embedded null byte to 
ensure
--- 57,68 
 * sv before passing it to SvPVutf8(). The copy is garbage collected
 * when we're done with it.
 */
!   if(SvTYPE(sv) == SVt_PVGV || SvREADONLY(sv))
!   sv = newSVsv(sv);
!   else
!   SvREFCNT_inc(sv);
!
!   val = SvPVutf8(sv, len);

/*
 * we use perl's length in the event we had an embedded null byte to 
ensure
***
*** 68,74  sv2cstr(SV *sv)
res =  utf_u2e(val, len);

/* safe now to garbage collect the new SV */
!   SvREFCNT_dec(nsv);

return res;
  }
--- 71,77 
res =  utf_u2e(val, len);

/* safe now to garbage collect the new SV */
!   SvREFCNT_dec(sv);

return res;
  }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-05 Thread Andrew Dunstan



On 01/05/2012 06:31 PM, Alex Hunsaker wrote:

On Thu, Jan 5, 2012 at 16:02, Andrew Dunstan and...@dunslane.net  wrote:

Fix breakage from earlier plperl fix.

Apparently the perl garbage collector was a bit too eager, so here
we control when the new SV is garbage collected.

I know im a little late to the party...

I can't help but think this seems a bit inefficient for the common
case. Would it be worth only copying the sv when it's a glob or
readonly? Something like the below? I tested a few more svtypes that
were easy to make (code, regexp) and everything seems peachy.



I'm not so concerned about elog() use, and anyway there the most common 
case surely will be passing a readonly string.


I'm more concerned about all the other places we call sv2cstr().

SvTYPE(sv) == SVt_PVGV is what I was looking for in vain in the perl docs.

So, yes, we should probably adjust this one more time, but ideally we 
need a better test than just SvREADONLY(). If you want to follow up your 
investigation of exactly when we need a copied SV and see how much you 
can narrow it down that would be great.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-01-05 Thread Peter Geoghegan
On 5 January 2012 22:27, Tom Lane t...@sss.pgh.pa.us wrote:
 There is no compiler anywhere that implements always inline, unless
 you are talking about a macro.  inline is a hint and nothing more,
 and if you think you can force it you are mistaken.  So this controversy
 is easily resolved: we do not need any such construct.

I'm slightly puzzled by your remarks here. GCC documentation on this
is sparse (although, as I've demonstrated, I can get better numbers
using the always_inline attribute on GCC 4.3), but take a look at
this:

http://msdn.microsoft.com/en-us/library/z8y1yy88.aspx

While it is not strictly true to say that pg_always_inline could
inline every function under every set of circumstances, it's pretty
close to the truth. I do accept that this facility could quite easily
be abused if its use isn't carefully measured. I also accept that
C99/GNU C inline functions are generally just requests to the compiler
that may be ignored (and indeed the compiler may inline without being
asked to).
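
For concreteness, one plausible shape for such a macro, falling back to a
plain inline hint where no stronger request is known to exist (this is a
sketch, not the definition in the patch):

/* Sketch: use the strongest inlining request each compiler family offers. */
#if defined(__GNUC__)
#define pg_always_inline inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define pg_always_inline __forceinline
#else
#define pg_always_inline inline         /* ordinary hint as the fallback */
#endif

static pg_always_inline int
add_one(int x)
{
    return x + 1;
}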

It's short-sighted to see this as a case of inlining itself making a
big difference; rather, inlining makes a big difference as an enabling
transformation.

 The real question is whether we should accept a patch that is a
 performance loss when the compiler fails to inline some reasonably
 simple function.  I think that would depend on the actual numbers
 involved, so we'd need to see data before making a decision.

Who said anything about a performance loss? Since the raw improvement
to qsort_arg is so large, it's pretty difficult to imagine a
confluence of circumstances in which this results in a net loss. See
the figures at http://archives.postgresql.org/pgsql-hackers/2011-11/msg01316.php
, for example.

The pg_always_inline idea is relatively recent. It just serves to
provide additional encouragement to the compiler to inline, insofar as
that is possible on the platform in question. The compiler's
cost/benefit analysis cannot possibly appreciate how hot a code path
qsort_arg is, because it has a set of generic heuristics that are
quite fallible, and are very probably quite conservative. We can
appreciate such things though.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore direct to database is broken for --insert dumps

2012-01-05 Thread Tom Lane
I wrote:
 Not easily: there could be newlines embedded in data strings or SQL
 identifiers.  I'm not seeing any way around this except to restore the
 minimal lexing capability.  One thing we could probably do is to
 restrict it to be used only when reading table data, and continue to
 assume that object creation commands can be emitted as-is.  That would
 at least get us out of needing to parse dollar-quoted literals, which
 aren't used in the INSERT commands.

Attached is a patch that restores a much-simplified version of the mini
lexer; it deals only with the sorts of things that dumpTableData_insert
actually emits.  Fixing the standard_conforming_strings issue turns out
to be really a one-liner, because the setting is in fact readily
available to this code.  Maybe we should have done it that way to begin
with :-( ... though I admit to being glad to have gotten rid of the very
questionable dollar-quote-recognition code that was there before.
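
To make the shape of that mini lexer concrete, here is a stripped-down
sketch of the per-character state transition; the names are illustrative,
and the real patch keeps this state in the ArchiveHandle rather than in a
free-standing function.

#include <stdbool.h>

typedef enum { SCAN, IN_SQUOTE, IN_DQUOTE } mini_state;

/*
 * Advance the quoting state over one character of dumped SQL.
 * std_strings says whether backslash is inert inside '...' (that is, the
 * standard_conforming_strings setting the dump was made with).  A newline
 * or semicolon seen while the state is SCAN and no backslash is pending
 * can safely end the current command; doubled quotes work out correctly
 * because leaving and immediately re-entering the literal nets to no change.
 */
static mini_state
mini_lex_char(mini_state state, char c, bool *backslash, bool std_strings)
{
    if (*backslash)
    {
        *backslash = false;     /* previous character escaped this one */
        return state;
    }
    switch (state)
    {
        case SCAN:
            if (c == '\'')
                return IN_SQUOTE;
            if (c == '"')
                return IN_DQUOTE;
            break;
        case IN_SQUOTE:
            if (c == '\\' && !std_strings)
                *backslash = true;
            else if (c == '\'')
                return SCAN;
            break;
        case IN_DQUOTE:
            if (c == '"')
                return SCAN;
            break;
    }
    return state;
}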

Barring objections to the approach, I'll apply and back-patch this
tomorrow.

regards, tom lane

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index d9edebb0f48308787bf263cd90504d3c5ed80e8b..234e50fb734573f9f3287d952821702a0838c4fd 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*** restore_toc_entry(ArchiveHandle *AH, Toc
*** 620,639 
  	if (te->copyStmt && strlen(te->copyStmt) > 0)
  	{
  		ahprintf(AH, "%s", te->copyStmt);
! 		AH->writingCopyData = true;
  	}
  
  	(*AH->PrintTocDataPtr) (AH, te, ropt);
  
  	/*
  	 * Terminate COPY if needed.
  	 */
! 	if (AH->writingCopyData)
! 	{
! 		if (RestoringToDB(AH))
! 			EndDBCopyMode(AH, te);
! 		AH->writingCopyData = false;
! 	}
  
  	/* close out the transaction started above */
  	if (is_parallel && te->created)
--- 620,639 
  	if (te->copyStmt && strlen(te->copyStmt) > 0)
  	{
  		ahprintf(AH, "%s", te->copyStmt);
! 		AH->outputKind = OUTPUT_COPYDATA;
  	}
+ 	else
+ 		AH->outputKind = OUTPUT_OTHERDATA;
  
  	(*AH->PrintTocDataPtr) (AH, te, ropt);
  
  	/*
  	 * Terminate COPY if needed.
  	 */
! 	if (AH->outputKind == OUTPUT_COPYDATA &&
! 		RestoringToDB(AH))
! 		EndDBCopyMode(AH, te);
! 	AH->outputKind = OUTPUT_SQLCMDS;
  
  	/* close out the transaction started above */
  	if (is_parallel && te->created)
*** _allocAH(const char *FileSpec, const Arc
*** 1975,1980 
--- 1975,1982 
  	AH->mode = mode;
  	AH->compression = compression;
  
+ 	memset(&(AH->sqlparse), 0, sizeof(AH->sqlparse));
+ 
  	/* Open stdout with no compression for AH output handle */
  	AH->gzOut = 0;
  	AH->OF = stdout;
*** CloneArchive(ArchiveHandle *AH)
*** 4194,4200 
  	clone = (ArchiveHandle *) pg_malloc(sizeof(ArchiveHandle));
  	memcpy(clone, AH, sizeof(ArchiveHandle));
  
! 	/* Handle format-independent fields ... none at the moment */
  
  	/* The clone will have its own connection, so disregard connection state */
  	clone->connection = NULL;
--- 4196,4203 
  	clone = (ArchiveHandle *) pg_malloc(sizeof(ArchiveHandle));
  	memcpy(clone, AH, sizeof(ArchiveHandle));
  
! 	/* Handle format-independent fields */
! 	memset((clone-sqlparse), 0, sizeof(clone-sqlparse));
  
  	/* The clone will have its own connection, so disregard connection state */
  	clone-connection = NULL;
*** DeCloneArchive(ArchiveHandle *AH)
*** 4227,4233 
  	/* Clear format-specific state */
  	(AH->DeClonePtr) (AH);
  
! 	/* Clear state allocated by CloneArchive ... none at the moment */
  
  	/* Clear any connection-local state */
  	if (AH->currUser)
--- 4230,4238 
  	/* Clear format-specific state */
  	(AH->DeClonePtr) (AH);
  
! 	/* Clear state allocated by CloneArchive */
! 	if (AH->sqlparse.curCmd)
! 		destroyPQExpBuffer(AH->sqlparse.curCmd);
  
  	/* Clear any connection-local state */
  	if (AH->currUser)
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 7a4fd360737ab3c48fdd8776879f804e0a1062a4..6dd5158ab4d9f7ee815e555029839190f335d9a1 100644
*** a/src/bin/pg_dump/pg_backup_archiver.h
--- b/src/bin/pg_dump/pg_backup_archiver.h
*** typedef size_t (*CustomOutPtr) (struct _
*** 134,139 
--- 134,153 
  
  typedef enum
  {
+ 	SQL_SCAN = 0,				/* normal */
+ 	SQL_IN_SINGLE_QUOTE,		/* '...' literal */
+ 	SQL_IN_DOUBLE_QUOTE			/* "..." identifier */
+ } sqlparseState;
+ 
+ typedef struct
+ {
+ 	sqlparseState state;		/* see above */
+ 	bool		backSlash;		/* next char is backslash quoted? */
+ 	PQExpBuffer curCmd;			/* incomplete line (NULL if not created) */
+ } sqlparseInfo;
+ 
+ typedef enum
+ {
  	STAGE_NONE = 0,
  	STAGE_INITIALIZING,
  	STAGE_PROCESSING,
*** typedef enum
*** 142,147 
--- 156,168 
  
  typedef enum
  {
+ 	OUTPUT_SQLCMDS = 0,			/* emitting general SQL commands */
+ 	OUTPUT_COPYDATA,			/* 

Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Josh Berkus
On 1/5/12 3:00 PM, Jeremy Harris wrote:
 On 2012-01-05 10:04, Benedikt Grundmann wrote:
 I have a question of how to benchmark hardware to determine
 the appropriate ratio of seq_page_cost vs random_page_cost.
 
 It'd be really nice if the DBMS measured actual experienced values..

Certainly it would.  It would also require a whole lot of
instrumentation.  Feel free to write some ...
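
Short of instrumenting the server, the ratio can at least be approximated
from outside.  A crude sketch that times sequential versus random 8kB reads
of an existing large data file follows; the file path, read counts, and the
assumption of a cold OS cache are all up to whoever runs it, so treat the
output as a rough hint rather than a measured cost.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define PAGESZ 8192
#define NREADS 4096

static double
elapsed(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int
main(int argc, char **argv)
{
    char        buf[PAGESZ];
    struct timespec t0, t1;
    off_t       npages;
    double      seq, rnd;
    long        i;
    int         fd;

    if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0)
    {
        fprintf(stderr, "usage: pagecost <large-data-file>\n");
        return 1;
    }
    npages = lseek(fd, 0, SEEK_END) / PAGESZ;
    if (npages <= 0)
    {
        fprintf(stderr, "file too small\n");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);    /* sequential pass */
    for (i = 0; i < NREADS; i++)
        if (pread(fd, buf, PAGESZ, (i % npages) * (off_t) PAGESZ) < 0)
            perror("pread");
    clock_gettime(CLOCK_MONOTONIC, &t1);
    seq = elapsed(t0, t1);

    clock_gettime(CLOCK_MONOTONIC, &t0);    /* random pass */
    for (i = 0; i < NREADS; i++)
        if (pread(fd, buf, PAGESZ, (random() % npages) * (off_t) PAGESZ) < 0)
            perror("pread");
    clock_gettime(CLOCK_MONOTONIC, &t1);
    rnd = elapsed(t0, t1);

    printf("seq %.3f s, random %.3f s, ratio %.1f\n", seq, rnd, rnd / seq);
    close(fd);
    return 0;
}

If the OS or controller cache absorbs the random reads, the ratio comes out
near 1, which is part of why this is hard to measure well from either side.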

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Poorly thought out code in vacuum

2012-01-05 Thread Tom Lane
Lately I have noticed buildfarm member jaguar occasionally failing
regression tests with
ERROR:  invalid memory alloc request size 1073741824
during a VACUUM, as for example at
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2012-01-04%2023%3A05%3A02

Naturally I supposed it to be a consequence of the CLOBBER_CACHE_ALWAYS
option, ie, somebody accessing an already-freed cache entry.  I managed
to duplicate and debug it here after an afternoon of trying, and it's
not actually related to that at all, except perhaps to the extent that
CLOBBER_CACHE_ALWAYS slows down some things enormously more than others.
The failure comes when a ResourceOwner tries to enlarge its
pinned-buffers array to more than 1GB, and that's not a typo: there are
umpteen entries for the same buffer, whose PrivateRefCount is 134217727
and trying to go higher.  A stack trace soon revealed the culprit, which
is this change in lazy_vacuum_heap():

@@ -932,7 +965,8 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vac_strategy);
-		LockBufferForCleanup(buf);
+		if (!ConditionalLockBufferForCleanup(buf))
+			continue;
 		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats);
 
 /* Now that we've compacted the page, record its available space */

introduced in
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=bbb6e559c4ea0fb4c346beda76736451dc24eb4e

The continue just results in another iteration of the containing tight
loop, and of course that immediately re-pins the same buffer without
having un-pinned it.  So if someone else sits on their pin of the buffer
for long enough, kaboom.

We could fix the direct symptom by inserting UnlockReleaseBuffer()
in front of the continue, but AFAICS that just makes this into a
busy-waiting equivalent of waiting unconditionally, so I don't really
see why we should not revert the above fragment altogether.  However,
I suppose Robert had something more intelligent in mind than a tight
loop when the buffer can't be exclusively locked, so maybe there is
some other change that should be made here instead.
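
For illustration, a minimal sketch of the direct-symptom fix described
above (not the committed change): drop the pin before looping again.
ReleaseBuffer() is used here because ConditionalLockBufferForCleanup()
holds no lock when it returns false, and as noted this still amounts to
busy-waiting rather than sleeping:

		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
								 vac_strategy);
		if (!ConditionalLockBufferForCleanup(buf))
		{
			/* could not get the cleanup lock: drop the pin before retrying */
			ReleaseBuffer(buf);
			continue;
		}
		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats);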

regards, tom lane



Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-05 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> I discover that non-all-zeroes holes are fairly common, just not very
> frequent.

Curious, might be interesting to find out why.

> That may or may not be a problem, but not something to be dealt with
> here and now.

But I agree that it's not the job of this patch/effort.  It sounds like
we have a clear indication, however, that those areas, as they are not
necessarily all zeros, should be included in the checksum.
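
To illustrate the point (this is not the patch's checksum algorithm), a
checksum that folds in every byte of the page, hole included, still catches
stray non-zero bytes there, whereas one that skips the pd_lower..pd_upper
gap cannot:

#include <stddef.h>
#include <stdint.h>

/* toy whole-page checksum: every byte contributes, including the hole */
static uint16_t
whole_page_checksum(const uint8_t *page, size_t page_size)
{
	uint32_t	sum = 0;
	size_t		i;

	for (i = 0; i < page_size; i++)
		sum = (sum * 31 + page[i]) & 0xFFFF;

	return (uint16_t) sum;
}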

Thanks,

Stephen




Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.

2012-01-05 Thread Alex Hunsaker
On Thu, Jan 5, 2012 at 16:59, Andrew Dunstan <and...@dunslane.net> wrote:


> On 01/05/2012 06:31 PM, Alex Hunsaker wrote:

>> On Thu, Jan 5, 2012 at 16:02, Andrew Dunstan <and...@dunslane.net> wrote:

>>> Fix breakage from earlier plperl fix.

>> I can't help but think this seems a bit inefficient

> So, yes, we should probably adjust this one more time, but ideally we need a
> better test than just SvREADONLY(). If you want to follow up your
> investigation of exactly when we need a copied SV and see how much you can
> narrow it down that would be great.

After further digging I found it chokes on any non-scalar (IOW any
reference). I attached a simple C program that I tested with 5.8.9,
5.10.1, 5.12.4 and 5.14.2 (for those who did not know about it,
perlbrew made testing across all those perls relatively painless).

PFA a patch that copies the SV if it's readonly and it's not a scalar.
Also I fixed up Tom's complaint about having sv2cstr() inside
do_util_elog's PG_TRY block. I didn't bother fixing the ones in
plperl.c though -- some seemed like they would require quite a bit of
rejiggering.

I didn't bother adding regression tests -- should I have?
*** a/src/pl/plperl/Util.xs
--- b/src/pl/plperl/Util.xs
***
*** 37,47  static void
  do_util_elog(int level, SV *msg)
  {
  	MemoryContext oldcontext = CurrentMemoryContext;
! 	char	   * volatile cmsg = NULL;
  
  	PG_TRY();
  	{
- 		cmsg = sv2cstr(msg);
  		elog(level, "%s", cmsg);
  		pfree(cmsg);
  	}
--- 37,46 
  do_util_elog(int level, SV *msg)
  {
  	MemoryContext oldcontext = CurrentMemoryContext;
! 	char	   * volatile cmsg = sv2cstr(msg);
  
  	PG_TRY();
  	{
  		elog(level, "%s", cmsg);
  		pfree(cmsg);
  	}
*** a/src/pl/plperl/plperl_helpers.h
--- b/src/pl/plperl/plperl_helpers.h
***
*** 47,74  sv2cstr(SV *sv)
  {
  	char	   *val, *res;
  	STRLEN		len;
! 	SV *nsv;
  
  	/*
  	 * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
  	 *
! 	 * SvPVutf8() croaks nastily on certain things, like typeglobs and
! 	 * readonly objects such as $^V. That's a perl bug - it's not supposed to
! 	 * happen. To avoid crashing the backend, we make a copy of the
! 	 * sv before passing it to SvPVutf8(). The copy is garbage collected 
! 	 * when we're done with it.
  	 */
! 	nsv = newSVsv(sv);
! 	val = SvPVutf8(nsv, len);
  
  	/*
  	 * we use perl's length in the event we had an embedded null byte to ensure
  	 * we error out properly
  	 */
! 	res =  utf_u2e(val, len);
  
  	/* safe now to garbage collect the new SV */
! 	SvREFCNT_dec(nsv);
  
  	return res;
  }
--- 47,79 
  {
  	char	   *val, *res;
  	STRLEN		len;
! 	svtype		type = SvTYPE(sv);
  
  	/*
  	 * get a utf8 encoded char * out of perl. *note* it may not be valid utf8!
  	 *
! 	 * SvPVutf8() croaks nastily on readonly refs. That's a perl bug - it's not
! 	 * supposed to happen. To avoid crashing the backend, we make a copy of the
! 	 * sv before passing it to SvPVutf8().
  	 */
! 	if (SvREADONLY(sv) &&
! 			(type != SVt_IV ||
! 			type != SVt_NV ||
! 			type != SVt_PV))
! 		sv = newSVsv(sv);
! 	else
! 		SvREFCNT_inc(sv);
! 
! 	val = SvPVutf8(sv, len);
  
  	/*
  	 * we use perl's length in the event we had an embedded null byte to ensure
  	 * we error out properly
  	 */
! 	res = utf_u2e(val, len);
  
  	/* safe now to garbage collect the new SV */
! 	SvREFCNT_dec(sv);
  
  	return res;
  }
/*
 * compile with gcc  -O2 -ggdb `perl -MExtUtils::Embed -e ccopts -e ldopts` svutf8_ro_test.c
 */
#include "EXTERN.h"
#include "perl.h"

int main(void)
{
	char *embed[] = { "", "-e", "0" };
	int x;
	AV	*test;
	PerlInterpreter *perl;

	perl = perl_alloc();
	perl_construct(perl);
	perl_parse(perl, NULL, 3, embed, NULL);
	perl_run(perl);

	eval_pv("my $scalar = 'string';"
			"@test = ("
			"'string', "
			"$scalar, "
			"\\$scalar, "
			"1, "
			"1.5, "
			"[], "
			"{}, "
			"$^V, "
			"v5.0.0, "
			"sub {}, "
			"qr//, "
			"*STDIN, "
			"bless({}, ''), "
			");", 1);

	test = get_av("test", 0);
	for(x=0; x<=av_len(test); x++)
	{
		char *crap;
		STRLEN len;
		SV *sv = *av_fetch(test, x, 0);
		svtype type = SvTYPE(sv);

		SvREADONLY_on(sv);

		if (SvREADONLY(sv) &&
			type != SVt_IV ||
			type != SVt_NV ||
			type != SVt_PV)
			sv = newSVsv(sv);

		crap = SvPVutf8(sv, len);
	}

	perl_destruct(perl);
	perl_free(perl);

	return 0;
}
