Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Simon Riggs
On Tue, Jan 3, 2012 at 4:40 AM, Noah Misch n...@leadboat.com wrote:
 On Tue, Jan 03, 2012 at 01:18:41AM +, Simon Riggs wrote:
 Just for the record, yes we do run multiple catalog scans in some
 parts of the code.

 So I can see how we might trigger 4 nested scans, using cache
 replacement while scanning, so best assume more, with no guarantee of
 them being neatly stacked for pop/push type access.

 Yeah, I wouldn't want to commit to a nesting limit.

Agreed

 However, I _would_ have
 expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot()
 is adequate for a great deal of the backend, after all.  In what sort of
 situation do catalog scans not strictly nest?

It's possible. Making assumptions about what is possible bit me
before, as you showed.

I've seen code where we are explicitly running two concurrent scans.

I don't want to put unnecessary and subtle assumptions into catalog
scan code so I'm working on the assumption that endscan may not be
called in strict FIFO order. The difference is fairly minor and
doesn't restrict us in other ways.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: CHECK FUNCTION statement

2012-01-03 Thread Albe Laurenz
Pavel Stehule wrote:

 here is new version of CHECK FUNCTION patch

I won't be able to review that one because I'll be in
California from Jan 6 to Jan 29.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Add SPI results constants available for PL/*

2012-01-03 Thread Samuel PHAN
I'm using PL/Python, and when getting the result object from a
plpy.execute(), I can access to the result.status().

E.g.: the result.status() is 4. But to know what 4 corresponds to, I must
open the spi.h file from the sources to see :

#define SPI_OK_CONNECT 1
#define SPI_OK_FINISH 2
#define SPI_OK_FETCH 3
#define SPI_OK_UTILITY 4
#define SPI_OK_SELECT 5
#define SPI_OK_SELINTO 6
#define SPI_OK_INSERT 7
#define SPI_OK_DELETE 8
#define SPI_OK_UPDATE 9
#define SPI_OK_CURSOR 10
#define SPI_OK_INSERT_RETURNING 11
#define SPI_OK_DELETE_RETURNING 12
#define SPI_OK_UPDATE_RETURNING 13
#define SPI_OK_REWRITTEN 14

Is there a way to have access to these constants from the PL/python code
and other PL/* (Tcl, Perl, etc.) ?

Thanks,

--
Samuel PHAN


Re: [HACKERS] Review of VS 2010 support patches

2012-01-03 Thread Andrew Dunstan



On 12/31/2011 06:10 PM, Brar Piening wrote:

Brar Piening wrote:

Andrew Dunstan wrote:
Can you narrow down exactly what in that commit broke VS 2010? Are 
there any compiler warnings?


I was able to nail down the problem.


In the absence of reaction, to keep my promise, I'm sending the 
attached Patch which restores the previous working behaviour for 
Visual Studio 2011.
Note however that it also restores the previous conflicts with errno.h 
which aren't neccessarily a problem, but might be in future.





OK, committed with a minor change to remove another compiler warning.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-03 Thread Christopher Browne
On Tue, Jan 3, 2012 at 5:55 AM, Samuel PHAN sam...@nomao.com wrote:
 I'm using PL/Python, and when getting the result object from a
 plpy.execute(), I can access to the result.status().

 E.g.: the result.status() is 4. But to know what 4 corresponds to, I must
 open the spi.h file from the sources to see :

 #define SPI_OK_CONNECT 1
 #define SPI_OK_FINISH 2
 #define SPI_OK_FETCH 3
 #define SPI_OK_UTILITY 4
 #define SPI_OK_SELECT 5
 #define SPI_OK_SELINTO 6
 #define SPI_OK_INSERT 7
 #define SPI_OK_DELETE 8
 #define SPI_OK_UPDATE 9
 #define SPI_OK_CURSOR 10
 #define SPI_OK_INSERT_RETURNING 11
 #define SPI_OK_DELETE_RETURNING 12
 #define SPI_OK_UPDATE_RETURNING 13
 #define SPI_OK_REWRITTEN 14

 Is there a way to have access to these constants from the PL/python code and
 other PL/* (Tcl, Perl, etc.) ?

I'd suppose it interesting to add a table to pg_catalog containing this data.

That would be one of the easier ways to allow all languages to have
access to the constants.

It might be a SMOP (Simple Matter Of Programming) to write a script to
transform the .h file into a suitable INSERT statement for such a
table.

I wonder if there are other sets of constants worth having.  I'd think
that the various forms of command OK codes would also be interesting
to have as a table like this.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-03 Thread Pavel Stehule
2012/1/3 Christopher Browne cbbro...@gmail.com:
 On Tue, Jan 3, 2012 at 5:55 AM, Samuel PHAN sam...@nomao.com wrote:
 I'm using PL/Python, and when getting the result object from a
 plpy.execute(), I can access to the result.status().

 E.g.: the result.status() is 4. But to know what 4 corresponds to, I must
 open the spi.h file from the sources to see :

 #define SPI_OK_CONNECT 1
 #define SPI_OK_FINISH 2
 #define SPI_OK_FETCH 3
 #define SPI_OK_UTILITY 4
 #define SPI_OK_SELECT 5
 #define SPI_OK_SELINTO 6
 #define SPI_OK_INSERT 7
 #define SPI_OK_DELETE 8
 #define SPI_OK_UPDATE 9
 #define SPI_OK_CURSOR 10
 #define SPI_OK_INSERT_RETURNING 11
 #define SPI_OK_DELETE_RETURNING 12
 #define SPI_OK_UPDATE_RETURNING 13
 #define SPI_OK_REWRITTEN 14

 Is there a way to have access to these constants from the PL/python code and
 other PL/* (Tcl, Perl, etc.) ?

 I'd suppose it interesting to add a table to pg_catalog containing this data.


 That would be one of the easier ways to allow all languages to have
 access to the constants.

- it is useless overhead

I am expecting so definition some constants in Perl, Python is simple

Regards

Pavel


 It might be a SMOP (Simple Matter Of Programming) to write a script to
 transform the .h file into a suitable INSERT statement for such a
 table.

 I wonder if there are other sets of constants worth having.  I'd think
 that the various forms of command OK codes would also be interesting
 to have as a table like this.
 --
 When confronted by a difficult problem, solve it by reducing it to the
 question, How would the Lone Ranger handle this?

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-03 Thread Robert Haas
On Mon, Jan 2, 2012 at 2:53 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Get rid of the freelist?  Once shared buffers are full, it's just about
 useless anyway.  But you'd need to think about the test cases that you
 pay attention to, as there might be scenarios where it remains useful.

 Agree freelist is mostly useless, but startup and dropping objects requires 
 it.

Not really.  If someone drops an object, we must invalidate all the
buffers immediately, but adding them to the free list is just an
optimization to make sure we reuse the invalidated buffers in
preference to evicting buffers that still have valid contents.  But I
suspect that in practice this is not a very important optimization,
because (1) most people probably don't drop permanent tables or
databases very often and (2) buffers that are being heavily used
should have a positive reference count, in which case the clock sweep
will pass over them and land on one of the newly-invalidated buffers
anyway.

 When its full its just an integer  0 test, which is cheap and its on
 the same cacheline as the nextVictimBuffer anyway, so we have to fetch
 it.

 The clock sweep is where all the time goes, in its current form.

...but I agree with this.  In its current form, the clock sweep has to
acquire a spinlock for every buffer it touches.  That's really
expensive, and I think we need to either get rid of it altogether or
at least find some way to move it into the background.  Ideally, in
the common case, a backend that wants to allocate a buffer shouldn't
need to do more than acquire a spinlock, pop a buffer off some sort of
linked list of allocation targets, and release the spinlock.  Whatever
other computation is needed should get pushed out of foreground
processing.

 We have 2 problems

 1. StrategySyncStart() requests exclusive lock on BufFreelistLock, so
 bgwriter has to fight with backends to find out where it should clean.
 As a result it spends lots of time waiting and insufficient time
 cleaning.

I have trouble accepting that this is really the problem we should be
solving.  There's only one background writer process, and that process
is waking up every 200ms by default.  Many people probably tune that
down quite a bit, but unless I'm quite mistaken, it should still be a
drop in the bucket next to what the backends are doing.

 2. When a backend can't find a free buffer, it spins for a long time
 while holding the lock. This makes the buffer strategy O(N) in its
 worst case, which slows everything down. Notably, while this is
 happening the bgwriter sits doing nothing at all, right at the moment
 when it is most needed. The Clock algorithm is an approximation of an
 LRU, so is already suboptimal as a perfect cache. Tweaking to avoid
 worst case behaviour makes sense. How much to tweak? Well,...

I generally agree with this analysis, but I don't think the proposed
patch is going to solve the problem.  It may have some merit as a way
of limiting the worst case behavior.  For example, if every shared
buffer has a reference count of 5, the first buffer allocation that
misses is going to have a lot of work to do before it can actually
come up with a victim.  But I don't think it's going to provide good
scaling in general.  Even if the background writer only spins through,
on average, ten or fifteen buffers before finding one to evict, that
still means we're acquiring ten or fifteen spinlocks while holding
BufFreelistLock. I don't currently have the measurements to prove
that's too expensive, but I bet it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 5:47 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Jan 3, 2012 at 4:40 AM, Noah Misch n...@leadboat.com wrote:
 On Tue, Jan 03, 2012 at 01:18:41AM +, Simon Riggs wrote:
 Just for the record, yes we do run multiple catalog scans in some
 parts of the code.

 So I can see how we might trigger 4 nested scans, using cache
 replacement while scanning, so best assume more, with no guarantee of
 them being neatly stacked for pop/push type access.

 Yeah, I wouldn't want to commit to a nesting limit.

 Agreed

 However, I _would_ have
 expected that a stack would suffice; PushActiveSnapshot()/PopActiveSnapshot()
 is adequate for a great deal of the backend, after all.  In what sort of
 situation do catalog scans not strictly nest?

 It's possible. Making assumptions about what is possible bit me
 before, as you showed.

 I've seen code where we are explicitly running two concurrent scans.

 I don't want to put unnecessary and subtle assumptions into catalog
 scan code so I'm working on the assumption that endscan may not be
 called in strict FIFO order. The difference is fairly minor and
 doesn't restrict us in other ways.

I feel like the first thing we should be doing here is some
benchmarking.  If we change just the scans in dependency.c and then
try the test case Tom suggested (dropping a schema containing a large
number of functions), we can compare the patched code with master and
get an idea of whether the performance is acceptable.  If it is,
changing everything else is a mostly mechanical process that we can
simply grind through.  If it's not, I'd rather learn that before we
start grinding.

I started to do this before Christmas, but then I ... went on
vacation.  Here's Perl script that can be used to generate an SQL
script to create as many functions as you'd like to try dropping at
once, in case it's helpful.  The idea is to run the resulting SQL
script through psql and then test the speed of DROP SCHEMA
lots_of_functions CASCADE;.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


lots_of_functions.pl
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-03 Thread Robert Haas
On Mon, Jan 2, 2012 at 12:01 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 here is updated patch

I think the comments in parse_utilcmd.c probably need a bit of adjustment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add protransform for numeric, varbit, and temporal types

2012-01-03 Thread Robert Haas
On Sat, Dec 31, 2011 at 7:36 PM, Noah Misch n...@leadboat.com wrote:
 Building on commit 8f9fe6edce358f7904e0db119416b4d1080a83aa, this adds
 protransform functions to the length coercions for numeric, varbit, timestamp,
 timestamptz, time, timetz and interval.  This mostly serves to make more ALTER
 TABLE ALTER TYPE operations avoid a rewrite, including numeric(10,2) -
 numeric(12,2), varbit(4) - varbit(8) and timestamptz(2) - timestamptz(4).
 The rules for varbit are exactly the same as for varchar.  Numeric is slightly
 more complex:

  * Flatten calls to our length coercion function that solely represent
  * increases in allowable precision.  Scale changes mutate every datum, so
  * they are unoptimizable.  Some values, e.g. 1E-1001, can only fit into an
  * unconstrained numeric, so a change from an unconstrained numeric to any
  * constrained numeric is also unoptimizable.

 time{,stamp}{,tz} are similar to varchar for these purposes, except that, for
 example, plain timestamptz is equivalent to timestamptz(6).  interval has
 a vastly different typmod format, but the principles applicable to length
 coercion remain the same.

 Under --disable-integer-datetimes, I'm not positive that timestamp_scale() is
 always a no-op when one would logically expect as much.  Does there exist a
 timestamp such that v::timestamp(2) differs from v:timestamp(2)::timestamp(4)
 due to floating point rounding?  Even if so, I'm fairly comfortable calling it
 a feature rather than a bug to avoid perturbing values that way.

 After these patches, the only core length coercion casts not having
 protransform functions are those for bpchar and bit.  For those, we could
 only optimize trivial cases of no length change.  I'm not planning to do so.

This is cool stuff.  I will plan to review this once the CF starts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-03 Thread Pavel Stehule
Hello

2012/1/3 Robert Haas robertmh...@gmail.com:
 On Mon, Jan 2, 2012 at 12:01 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 here is updated patch

 I think the comments in parse_utilcmd.c probably need a bit of adjustment.

I don't see it - there is only one comment and it is adjusted with
if statement.

please, show it

Regards

Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-03 Thread Simon Riggs
On Tue, Jan 3, 2012 at 3:18 PM, Robert Haas robertmh...@gmail.com wrote:

 The clock sweep is where all the time goes, in its current form.

 ...but I agree with this.  In its current form, the clock sweep has to
 acquire a spinlock for every buffer it touches.  That's really
 expensive, and I think we need to either get rid of it altogether or
 at least find some way to move it into the background.  Ideally, in
 the common case, a backend that wants to allocate a buffer shouldn't
 need to do more than acquire a spinlock, pop a buffer off some sort of
 linked list of allocation targets, and release the spinlock.  Whatever
 other computation is needed should get pushed out of foreground
 processing.

So you don't think a freelist is worth having, but you want a list of
allocation targets.
What is the practical difference?


 We have 2 problems

 1. StrategySyncStart() requests exclusive lock on BufFreelistLock, so
 bgwriter has to fight with backends to find out where it should clean.
 As a result it spends lots of time waiting and insufficient time
 cleaning.

 I have trouble accepting that this is really the problem we should be
 solving.  There's only one background writer process, and that process
 is waking up every 200ms by default.  Many people probably tune that
 down quite a bit, but unless I'm quite mistaken, it should still be a
 drop in the bucket next to what the backends are doing.

That doesn't follow. Forcing the bgwriter to wait makes no sense. Yes,
most of the contention is caused by the other backends, but the
bgwriter experiences that contention currently when it has no need to
do so.

Presumably if you did have an allocation list maintained in the
background by the bgwriter you wouldn't expect the bgwriter to wait
behind a lock for no reason when it could be doing useful work.


 2. When a backend can't find a free buffer, it spins for a long time
 while holding the lock. This makes the buffer strategy O(N) in its
 worst case, which slows everything down. Notably, while this is
 happening the bgwriter sits doing nothing at all, right at the moment
 when it is most needed. The Clock algorithm is an approximation of an
 LRU, so is already suboptimal as a perfect cache. Tweaking to avoid
 worst case behaviour makes sense. How much to tweak? Well,...

 I generally agree with this analysis, but I don't think the proposed
 patch is going to solve the problem.  It may have some merit as a way
 of limiting the worst case behavior.  For example, if every shared
 buffer has a reference count of 5, the first buffer allocation that
 misses is going to have a lot of work to do before it can actually
 come up with a victim.  But I don't think it's going to provide good
 scaling in general.  Even if the background writer only spins through,
 on average, ten or fifteen buffers before finding one to evict, that
 still means we're acquiring ten or fifteen spinlocks while holding
 BufFreelistLock. I don't currently have the measurements to prove
 that's too expensive, but I bet it is.

I think its worth reducing the cost of scanning, but that has little
to do with solving the O(N) problem. I think we need both.

I've left the way open for you to redesign freelist management in many
ways. Please take the opportunity and go for it, though we must
realise that major overhauls require significantly more testing to
prove their worth. Reducing spinlocking only sounds like a good way to
proceed for this release.

If you don't have time in 9.2, then these two small patches are worth
having. The bgwriter locking patch needs less formal evidence to show
its worth. We simply don't need to have a bgwriter that just sits
waiting doing nothing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 10:38 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hello

 2012/1/3 Robert Haas robertmh...@gmail.com:
 On Mon, Jan 2, 2012 at 12:01 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 here is updated patch

 I think the comments in parse_utilcmd.c probably need a bit of adjustment.

 I don't see it - there is only one comment and it is adjusted with
 if statement.

 please, show it

Well, basically, the comment preceding the part you altered say the
lock level requested here, but here is getting spread out quite a
bit more with this code change.  Maybe that doesn't matter.

However, on further examination, this is a pretty awkward way to write
the code.  Why not something like this:

rel = relation_openrv_extended(stmt-relation, lockmode, stmt-missing_ok);
if (rel == NULL)
{
ereport(...);
return NIL;
}

Maybe the intent of sticking heap_openrv_extended() into the upper
branch of the if statement is to try to bounce relations that aren't
tables, but that's not actually what that function does (e.g. a
foreign table will slip through).  And I think if we're going to have
IF EXISTS support for ALTER TABLE at all, we ought to have it for the
whole family of related statements: ALTER VIEW, ALTER SEQUENCE, ALTER
FOREIGN TABLE, etc., not just ALTER TABLE itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Simon Riggs
On Tue, Jan 3, 2012 at 3:24 PM, Robert Haas robertmh...@gmail.com wrote:

 I feel like the first thing we should be doing here is some
 benchmarking.  If we change just the scans in dependency.c and then
 try the test case Tom suggested (dropping a schema containing a large
 number of functions), we can compare the patched code with master and
 get an idea of whether the performance is acceptable.

Yes, I've done this and it takes 2.5s to drop 10,000 functions using
an MVCC snapshot.

That was acceptable to *me*, so I didn't try measuring using just SnapshotNow.

We can do a lot of tests but at the end its a human judgement. Is 100%
correct results from catalog accesses worth having when the real world
speed of it is not substantially very good? (Whether its x100
times slower or not is not relevant if it is still fast enough).

Anybody with that many DDL statements probably cares about doing
various operations with lower lock levels.

Fastpath locking will slow down DDL but we didn't measure the
performance slow down there. We understood the benefit and were
willing to pay the price.

So I'll proceed for now with the patch, which isn't as simple as you think.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to allow users to kill their own queries

2012-01-03 Thread Noah Misch
On Tue, Dec 20, 2011 at 02:30:08PM +0100, Magnus Hagander wrote:
 That said - can someone who knows the translation stuff better than me
 comment on if this is actually going to be translatable, or if it
 violates too many translation rules?

 +pg_signal_backend(int pid, int sig, bool allow_same_role, const char 
 *actionstr, const char *hint)

 + ereport(ERROR,
 + 
 (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 +  errmsg(must be superuser to %s other 
 server processes, actionstr),
 +  errhint(%s, hint)));

 + PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM, false,
 +  
 gettext_noop(terminate),
 +  
 gettext_noop(You can cancel your own processes with pg_cancel_backend().)));
  }

You need errhint(%s, _(hint)) or errhint(hint) to substitute the
translation at runtime; only the printf-pattern string gets an automatic
message catalog lookup.

Regarding the other message, avoid composing a translated message from
independently-translated parts.  The translator sees this:


#: utils/adt/misc.c:110
#, c-format
msgid must be superuser to %s other server processes
msgstr 

#: utils/adt/misc.c:166
msgid terminate
msgstr 

#: utils/adt/misc.c:167
msgid You can cancel your own processes with pg_cancel_backend().
msgstr 


He'll probably need to read the code to see how the strings go together.  If
we add an alternative to terminate, not all languages will necessarily have
a translation of the outer message that works for both inner fragments.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Fix Leaky View Problem

2012-01-03 Thread Kohei KaiGai
2011/12/23 Robert Haas robertmh...@gmail.com:
 On Fri, Dec 23, 2011 at 5:56 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I'd like the regression test on select_view test being committed also
 to detect unexpected changed in the future. How about it?

 Can you resend that as a separate patch?  I remember there were some
 things I didn't like about it, but I don't remember what they were at
 the moment...

Sorry for this late response.

The attached one is patch of the regression test that checks scenario
of malicious function with/without security_barrier option.

I guess you concerned about that expected/select_views_1.out is
patched, not expected/select_views.out.
I'm not sure the reason why regression test script tries to make diff
between results/select_views and expected/select_views_1.out.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp


pgsql-regtest-leaky-views.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 10:56 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Jan 3, 2012 at 3:18 PM, Robert Haas robertmh...@gmail.com wrote:
 The clock sweep is where all the time goes, in its current form.

 ...but I agree with this.  In its current form, the clock sweep has to
 acquire a spinlock for every buffer it touches.  That's really
 expensive, and I think we need to either get rid of it altogether or
 at least find some way to move it into the background.  Ideally, in
 the common case, a backend that wants to allocate a buffer shouldn't
 need to do more than acquire a spinlock, pop a buffer off some sort of
 linked list of allocation targets, and release the spinlock.  Whatever
 other computation is needed should get pushed out of foreground
 processing.

 So you don't think a freelist is worth having, but you want a list of
 allocation targets.
 What is the practical difference?

I think that our current freelist is practically useless, because it
is almost always empty, and the cases where it's not empty (startup,
and after a table or database drop) are so narrow that we don't really
get any benefit out of having it.  However, I'm not opposed to the
idea of a freelist in general: I think that if we actually put in some
effort to keep the freelist in a non-empty state it would help a lot,
because backends would then have much less work to do at buffer
allocation time.

 I have trouble accepting that this is really the problem we should be
 solving.  There's only one background writer process, and that process
 is waking up every 200ms by default.  Many people probably tune that
 down quite a bit, but unless I'm quite mistaken, it should still be a
 drop in the bucket next to what the backends are doing.

 That doesn't follow. Forcing the bgwriter to wait makes no sense. Yes,
 most of the contention is caused by the other backends, but the
 bgwriter experiences that contention currently when it has no need to
 do so.

Sure, but if that contention is a negligible fraction of the total and
doesn't cost anything measurable, then it doesn't make sense to add
code to eliminate it.  There are all sorts of places in the system
where we have contention at a level that doesn't affect performance.
Many of those could probably be fixed by adding more complicated
locking, but that would just complicate the code to no purpose.  If
there's a demonstrable performance benefit to this change then it's
worth a harder look, but my gut feeling is that there won't be.

 2. When a backend can't find a free buffer, it spins for a long time
 while holding the lock. This makes the buffer strategy O(N) in its
 worst case, which slows everything down. Notably, while this is
 happening the bgwriter sits doing nothing at all, right at the moment
 when it is most needed. The Clock algorithm is an approximation of an
 LRU, so is already suboptimal as a perfect cache. Tweaking to avoid
 worst case behaviour makes sense. How much to tweak? Well,...

 I generally agree with this analysis, but I don't think the proposed
 patch is going to solve the problem.  It may have some merit as a way
 of limiting the worst case behavior.  For example, if every shared
 buffer has a reference count of 5, the first buffer allocation that
 misses is going to have a lot of work to do before it can actually
 come up with a victim.  But I don't think it's going to provide good
 scaling in general.  Even if the background writer only spins through,
 on average, ten or fifteen buffers before finding one to evict, that
 still means we're acquiring ten or fifteen spinlocks while holding
 BufFreelistLock. I don't currently have the measurements to prove
 that's too expensive, but I bet it is.

 I think its worth reducing the cost of scanning, but that has little
 to do with solving the O(N) problem. I think we need both.

Maybe.  I would really like a unified solution to both problems, but
I'd be willing to accept a solution for one of them if we're confident
that we've actually solved it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 11:17 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, Jan 3, 2012 at 3:24 PM, Robert Haas robertmh...@gmail.com wrote:
 I feel like the first thing we should be doing here is some
 benchmarking.  If we change just the scans in dependency.c and then
 try the test case Tom suggested (dropping a schema containing a large
 number of functions), we can compare the patched code with master and
 get an idea of whether the performance is acceptable.

 Yes, I've done this and it takes 2.5s to drop 10,000 functions using
 an MVCC snapshot.

 That was acceptable to *me*, so I didn't try measuring using just SnapshotNow.

 We can do a lot of tests but at the end its a human judgement. Is 100%
 correct results from catalog accesses worth having when the real world
 speed of it is not substantially very good? (Whether its x100
 times slower or not is not relevant if it is still fast enough).

Sure.  But I don't see why that means it wouldn't be nice to know
whether or not it is in fact a million times slower.  If Tom's
artificial worst case is a million times slower, then probably there
are some cases we care about more that are going to be measurably
impacted, and we're going to want to think about what to do about
that.  We can wait until you've finished the patch before we do that
testing, or we can do it now and maybe get some idea whether the
approach is likely to be viable or whether it's going to require some
adjustment before we actually go trawl through all that code.

On my laptop, dropping a schema with 10,000 functions using commit
d5448c7d31b5af66a809e6580bae9bd31448bfa7 takes 400-500 ms.  If my
laptop is the same speed as your laptop, that would mean a 5-6x
slowdown, but of course that's comparing apples and oranges...  in any
event, if the real number is anywhere in that ballpark, it's probably
a surmountable obstacle, but I'd like to know rather than guessing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sorting operators in pg_dump

2012-01-03 Thread Robert Haas
On Sun, Jan 1, 2012 at 11:50 PM, Peter Eisentraut pete...@gmx.net wrote:
 Currently, pg_dump sorts operators by name, but operators with the same
 name come out in random order.  A few releases ago we adjusted this for
 functions, so that they are in increasing number of arguments order.
 I'd like to do this for operators as well, so that they come out in
 order, say, prefix, postfix, infix.

 (It might be surprising that something like this is necessary, but it
 happens.  ip4r for example contains operators with different fixnesses
 (fixities?).)

 Patch attached, and a little test case.

Seems like a good idea.  The patch looks good, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 That was acceptable to *me*, so I didn't try measuring using just SnapshotNow.

 We can do a lot of tests but at the end its a human judgement. Is 100%
 correct results from catalog accesses worth having when the real world
 speed of it is not substantially very good? (Whether its x100
 times slower or not is not relevant if it is still fast enough).

That argument is just nonsense AFAICT.  Yes, 2.5 s to drop 1
functions is probably fine, but that is an artificial test case whose
only real interest is to benchmark what a change in SnapshotNow scans
might cost us.  In the real world it's hard to guess what fraction of a
real workload might consist of such scans, but I suspect there are some
where a significant increase in that cost might hurt.  So in my mind the
point of the exercise is to find out how much the cost increased, and
I'm just baffled as to why you won't use a benchmark case to, um,
benchmark.

Another point that requires some thought is that switching SnapshotNow
to be MVCC-based will presumably result in a noticeable increase in each
backend's rate of wanting to acquire snapshots.  Hence, more contention
in GetSnapshotData can be expected.  A single-threaded test case doesn't
prove anything at all about what that might cost under load.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to allow users to kill their own queries

2012-01-03 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Regarding the other message, avoid composing a translated message from
 independently-translated parts.

Yes.  I haven't looked at the patch, but I wonder whether it wouldn't be
better to dodge both of these problems by having the subroutine return a
success/failure result code, so that the actual ereport can be done at
an outer level where the full message (and hint) can be written
directly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Simon Riggs
On Fri, Sep 9, 2011 at 11:02 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina dan...@heroku.com wrote:
 On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Assuming the issue really is the physical unlinks (which I agree I'd
 like to see some evidence for), I wonder whether the problem could be
 addressed by moving smgrDoPendingDeletes() to after locks are released,
 instead of before, in CommitTransaction/AbortTransaction.  There does
 not seem to be any strong reason why we have to do that before lock
 release, since incoming potential users of a table should not be trying
 to access the old physical storage after that anyway.

 Alright, since this concern about confirming the expensive part of
 index dropping has come up a few times but otherwise the waters are
 warm, I'll go ahead and do some work to pin things down a bit before
 we continue working on those assumptions.


 This suspicion seems to be proven correct; there came an opportunity
 where we were removing some indexes on a live system and I took the
 opportunity to carefully control and time the process.  There's not
 much relationship between size of the index and the delay, but the
 pauses are still very real. On the other hand, the first time this was
 noticed there was significantly higher load.

 I'd still like to do something to solve this problem, though: even if
 the time-consuming part of the process is not file unlinking, it's
 clearly something after the AccessExclusiveLock is acquired based on
 our other measurements.

This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.

On top of that, taking what Robert Haas mentioned on another thread,
InvalidateBuffer currently calls StretegyFreeBuffer(), which waits for
an ExclusiveLock on the BufFreelistLock. On a busy system this will be
heavily contended, so adding blocks to the freelist only if the lock
is free seems warranted.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..af7215f 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -218,12 +218,21 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
 }
 
 /*
- * StrategyFreeBuffer: put a buffer on the freelist
+ * StrategyFreeBuffer: put a buffer on the freelist, unless we're busy
  */
 void
 StrategyFreeBuffer(volatile BufferDesc *buf)
 {
-	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+	/*
+	 * The buffer is already invalidated and is now an allocation target.
+	 * Adding buffers back onto the freelist is an optimisation only,
+	 * so we can decide to skip this step if the lock is busy.
+	 * This improves the speed of dropping indexes and tables on a busy system.
+	 * If the system is busy the newly invalidated buffers will be reallocated
+	 * within one clock sweep.
+	 */
+	if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE))
+		return;
 
 	/*
 	 * It is possible that we are told to put something in the freelist that

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 That was acceptable to *me*, so I didn't try measuring using just 
 SnapshotNow.

 We can do a lot of tests but at the end its a human judgement. Is 100%
 correct results from catalog accesses worth having when the real world
 speed of it is not substantially very good? (Whether its x100
 times slower or not is not relevant if it is still fast enough).

 That argument is just nonsense AFAICT.  Yes, 2.5 s to drop 1
 functions is probably fine, but that is an artificial test case whose
 only real interest is to benchmark what a change in SnapshotNow scans
 might cost us.  In the real world it's hard to guess what fraction of a
 real workload might consist of such scans, but I suspect there are some
 where a significant increase in that cost might hurt.  So in my mind the
 point of the exercise is to find out how much the cost increased, and
 I'm just baffled as to why you won't use a benchmark case to, um,
 benchmark.

Ditto.

 Another point that requires some thought is that switching SnapshotNow
 to be MVCC-based will presumably result in a noticeable increase in each
 backend's rate of wanting to acquire snapshots.  Hence, more contention
 in GetSnapshotData can be expected.  A single-threaded test case doesn't
 prove anything at all about what that might cost under load.

This is obviously true at some level, but I'm not sure that it really
matters.  It's not that difficult to construct a test case where we
have lots of people concurrently reading a table, or reading many
tables, or writing a table, or writing many tables, but what kind of
realistic test case involves enough DDL for any of this to matter?  If
you're creating or dropping tables, for example, the filesystem costs
are likely to be a much bigger problem than GetSnapshotData(), to the
point where you probably can't get enough concurrency for
GetSnapshotData() to matter.  Maybe you could find a problem case
involving creating or dropping lots and lots of functions
concurrently, or something like that, but who does that?  You'd have
to be performing operations on hundreds of non-table SQL objects per
second, and it is hard for me to imagine why anyone would be doing
that.  Maybe I'm not imaginative enough?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mar ene 03 12:24:52 -0300 2012:

 I feel like the first thing we should be doing here is some
 benchmarking.  If we change just the scans in dependency.c and then
 try the test case Tom suggested (dropping a schema containing a large
 number of functions), we can compare the patched code with master and
 get an idea of whether the performance is acceptable.  If it is,
 changing everything else is a mostly mechanical process that we can
 simply grind through.  If it's not, I'd rather learn that before we
 start grinding.

If there are many call sites, maybe it'd be a good idea to use a
semantic patcher tool such as Coccinelle instead of doing it one by one.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Another point that requires some thought is that switching SnapshotNow
 to be MVCC-based will presumably result in a noticeable increase in each
 backend's rate of wanting to acquire snapshots.  Hence, more contention
 in GetSnapshotData can be expected.  A single-threaded test case doesn't
 prove anything at all about what that might cost under load.

 This is obviously true at some level, but I'm not sure that it really
 matters.  It's not that difficult to construct a test case where we
 have lots of people concurrently reading a table, or reading many
 tables, or writing a table, or writing many tables, but what kind of
 realistic test case involves enough DDL for any of this to matter?

Um ... you're supposing that only DDL uses SnapshotNow, which is wrong.
I refer you to the parser, the planner, execution functions for arrays,
records, enums, any sort of relcache reload, etc etc etc.  Yes, some
of that is masked by backend-internal caching, some of the time, but
it's folly to just assume that there are no SnapshotNow scans during
normal queries.

None of this is necessarily grounds to reject a patch along the proposed
lines.  I'm just asking for some benchmarking effort to establish what
the costs might be, rather than naively hoping they are negligible.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 1:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Um ... you're supposing that only DDL uses SnapshotNow, which is wrong.
 I refer you to the parser, the planner, execution functions for arrays,
 records, enums, any sort of relcache reload, etc etc etc.  Yes, some
 of that is masked by backend-internal caching, some of the time, but
 it's folly to just assume that there are no SnapshotNow scans during
 normal queries.

Hmm.  That's unfortunate, because it seems difficult to construct a
test case that will exercise every feature in the system.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Um ... you're supposing that only DDL uses SnapshotNow, which is
 wrong.  I refer you to the parser, the planner, execution
 functions for arrays, records, enums, any sort of relcache
 reload, etc etc etc.  Yes, some of that is masked by
 backend-internal caching, some of the time, but it's folly to
 just assume that there are no SnapshotNow scans during normal
 queries.
 
 Hmm.  That's unfortunate, because it seems difficult to construct
 a test case that will exercise every feature in the system.
 
Would the result of IsMVCCSnapshot() change for these snapshots?  If
so, it might require work in the SSI code to avoid a performance hit
there.  In early profiling and stepping through execution I noticed
that we had overhead in serializable transactions for the planner
checks for the actual values at the beginning or end of an index. 
This went away when we avoided SSI work for reads using a non-MVCC
snapshot.  If we're going to start using MVCC snapshots for such
things, we need some other way to avoid unnecessary work in this
area (and possibly others).
 
At a minimum, some comparative benchmarks at the serializable
isolation level would be in order when considering a patch like
this.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Tom Lane
I wrote:
 Another point that requires some thought is that switching SnapshotNow
 to be MVCC-based will presumably result in a noticeable increase in each
 backend's rate of wanting to acquire snapshots.

BTW, I wonder if this couldn't be ameliorated by establishing some
ground rules about how up-to-date a snapshot really needs to be.
Arguably, it should be okay for successive SnapshotNow scans to use the
same snapshot as long as we have not acquired a new lock in between.
If not, reusing an old snap doesn't introduce any race condition that
wasn't there already.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] improve pg_restore warning on text dump input

2012-01-03 Thread Andrew Dunstan
From time to time there are complaints because people mistakenly feed a 
text format dump to pg_restore and get back a somewhat cryptic message 
about the file not being a valid archive. It's been suggested that we 
should have pg_restore run the file through psql, but that would involve 
more work than I at least care to give the problem. However, I think we 
should give a nicer message, suggesting the user try feeding the file to 
psql instead. The attached small patch does that.


cheers

andrew


diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 7d895c4..9918c4d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -77,6 +77,9 @@ typedef struct _parallel_slot
 
 #define NO_SLOT (-1)
 
+#define TEXT_DUMP_HEADER --\n-- PostgreSQL database dump\n--\n\n
+#define TEXT_DUMPALL_HEADER --\n-- PostgreSQL database cluster dump\n--\n\n
+
 /* state needed to save/restore an archive's output target */
 typedef struct _outputContext
 {
@@ -1872,7 +1875,19 @@ _discoverArchiveFormat(ArchiveHandle *AH)
 			die_horribly(AH, modulename, input file does not appear to be a valid archive (too short?)\n);
 
 		if (!isValidTarHeader(AH-lookahead))
-			die_horribly(AH, modulename, input file does not appear to be a valid archive\n);
+		{
+			if (strncmp(AH-lookahead, TEXT_DUMP_HEADER, strlen(TEXT_DUMP_HEADER)) == 0 ||
+strncmp(AH-lookahead, TEXT_DUMPALL_HEADER, strlen(TEXT_DUMPALL_HEADER)) == 0)
+			{
+/* looks like it's probably a text format dump. so suggest they try psql */
+die_horribly(AH, modulename, input file appears to be a text format dump. Please use psql.\n);
+			}
+			else
+			{
+/* we have no idea what this is */
+die_horribly(AH, modulename, input file does not appear to be a valid archive\n);
+			}
+		}
 
 		AH-format = archTar;
 	}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improve pg_restore warning on text dump input

2012-01-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
  From time to time there are complaints because people mistakenly feed a 
 text format dump to pg_restore and get back a somewhat cryptic message 
 about the file not being a valid archive. It's been suggested that we 
 should have pg_restore run the file through psql, but that would involve 
 more work than I at least care to give the problem. However, I think we 
 should give a nicer message, suggesting the user try feeding the file to 
 psql instead. The attached small patch does that.

It would probably be better if you put this test before the one that
insists the file is at least 512 bytes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 1:42 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Um ... you're supposing that only DDL uses SnapshotNow, which is
 wrong.  I refer you to the parser, the planner, execution
 functions for arrays, records, enums, any sort of relcache
 reload, etc etc etc.  Yes, some of that is masked by
 backend-internal caching, some of the time, but it's folly to
 just assume that there are no SnapshotNow scans during normal
 queries.

 Hmm.  That's unfortunate, because it seems difficult to construct
 a test case that will exercise every feature in the system.

 Would the result of IsMVCCSnapshot() change for these snapshots?  If
 so, it might require work in the SSI code to avoid a performance hit
 there.  In early profiling and stepping through execution I noticed
 that we had overhead in serializable transactions for the planner
 checks for the actual values at the beginning or end of an index.
 This went away when we avoided SSI work for reads using a non-MVCC
 snapshot.  If we're going to start using MVCC snapshots for such
 things, we need some other way to avoid unnecessary work in this
 area (and possibly others).

 At a minimum, some comparative benchmarks at the serializable
 isolation level would be in order when considering a patch like
 this.

Ugh.  Yeah, that sounds like a problem.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improve pg_restore warning on text dump input

2012-01-03 Thread Andrew Dunstan



On 01/03/2012 01:55 PM, Tom Lane wrote:

Andrew Dunstanand...@dunslane.net  writes:

   From time to time there are complaints because people mistakenly feed a
text format dump to pg_restore and get back a somewhat cryptic message
about the file not being a valid archive. It's been suggested that we
should have pg_restore run the file through psql, but that would involve
more work than I at least care to give the problem. However, I think we
should give a nicer message, suggesting the user try feeding the file to
psql instead. The attached small patch does that.

It would probably be better if you put this test before the one that
insists the file is at least 512 bytes.





Hmm, yeah. I guess we're pretty much certain that these markers can't 
reasonably appear at the start of a tar archive.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Another point that requires some thought is that switching SnapshotNow
 to be MVCC-based will presumably result in a noticeable increase in each
 backend's rate of wanting to acquire snapshots.

 BTW, I wonder if this couldn't be ameliorated by establishing some
 ground rules about how up-to-date a snapshot really needs to be.
 Arguably, it should be okay for successive SnapshotNow scans to use the
 same snapshot as long as we have not acquired a new lock in between.
 If not, reusing an old snap doesn't introduce any race condition that
 wasn't there already.

Is that likely to help much?  I think our usual pattern is to lock the
catalog, scan it, and then release the lock, so there will normally be
an AcceptInvalidationMessages() just before the scan.  Or at least, I
think there will.

Another thought is that it should always be safe to reuse an old
snapshot if no transactions have committed or aborted since it was
taken (possibly we could narrow that to no transactions have
committed since it was taken, but I think that might cause some
headaches with RecentGlobalXmin).  I wonder if we could come up with a
cheap, hopefully lock-free method of determining whether or not that's
the case.  For example, suppose we store the last XID to commit or
abort in shared memory.  In GetSnapshotData(), we check whether that
value has changed (probably, after enforcing a memory barrier of some
kind) before acquiring the lock. If it has, we proceed normally, but
if not, we just reuse the results from the previous GetSnapshotData().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 3, 2012 at 1:42 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, I wonder if this couldn't be ameliorated by establishing some
 ground rules about how up-to-date a snapshot really needs to be.
 Arguably, it should be okay for successive SnapshotNow scans to use the
 same snapshot as long as we have not acquired a new lock in between.
 If not, reusing an old snap doesn't introduce any race condition that
 wasn't there already.

 Is that likely to help much?  I think our usual pattern is to lock the
 catalog, scan it, and then release the lock, so there will normally be
 an AcceptInvalidationMessages() just before the scan.  Or at least, I
 think there will.

Um, good point.  Those locks aren't meant to avoid race conditions,
but the mechanism doesn't know that.

 Another thought is that it should always be safe to reuse an old
 snapshot if no transactions have committed or aborted since it was
 taken

Yeah, that might work better.  And it'd be a win for all MVCC snaps,
not just the ones coming from promoted SnapshotNow ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: ALTER TABLE IF EXISTS

2012-01-03 Thread Pavel Stehule
Hello

2012/1/3 Robert Haas robertmh...@gmail.com:
 On Tue, Jan 3, 2012 at 10:38 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 Hello

 2012/1/3 Robert Haas robertmh...@gmail.com:
 On Mon, Jan 2, 2012 at 12:01 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 here is updated patch

 I think the comments in parse_utilcmd.c probably need a bit of adjustment.

 I don't see it - there is only one comment and it is adjusted with
 if statement.

 please, show it

 Well, basically, the comment preceding the part you altered say the
 lock level requested here, but here is getting spread out quite a
 bit more with this code change.  Maybe that doesn't matter.

 However, on further examination, this is a pretty awkward way to write
 the code.  Why not something like this:

 rel = relation_openrv_extended(stmt-relation, lockmode, stmt-missing_ok);
 if (rel == NULL)
 {
    ereport(...);
    return NIL;
 }

 Maybe the intent of sticking heap_openrv_extended() into the upper
 branch of the if statement is to try to bounce relations that aren't
 tables, but that's not actually what that function does (e.g. a
 foreign table will slip through).  And I think if we're going to have
 IF EXISTS support for ALTER TABLE at all, we ought to have it for the
 whole family of related statements: ALTER VIEW, ALTER SEQUENCE, ALTER
 FOREIGN TABLE, etc., not just ALTER TABLE itself.


jup, we can continue in enhancing step by step.

I change a patch and now ALTER TABLE, ALTER INDEX, ALTER SEQUENCE and
ALTER VIEW has IF EXISTS clause

Regards

Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
*** ./doc/src/sgml/ref/alter_index.sgml.orig	2012-01-02 17:01:00.0 +0100
--- ./doc/src/sgml/ref/alter_index.sgml	2012-01-03 19:45:24.210189185 +0100
***
*** 21,30 
  
   refsynopsisdiv
  synopsis
! ALTER INDEX replaceable class=PARAMETERname/replaceable RENAME TO replaceable class=PARAMETERnew_name/replaceable
! ALTER INDEX replaceable class=PARAMETERname/replaceable SET TABLESPACE replaceable class=PARAMETERtablespace_name/replaceable
! ALTER INDEX replaceable class=PARAMETERname/replaceable SET ( replaceable class=PARAMETERstorage_parameter/replaceable = replaceable class=PARAMETERvalue/replaceable [, ... ] )
! ALTER INDEX replaceable class=PARAMETERname/replaceable RESET ( replaceable class=PARAMETERstorage_parameter/replaceable [, ... ] )
  /synopsis
   /refsynopsisdiv
  
--- 21,30 
  
   refsynopsisdiv
  synopsis
! ALTER INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable RENAME TO replaceable class=PARAMETERnew_name/replaceable
! ALTER INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable SET TABLESPACE replaceable class=PARAMETERtablespace_name/replaceable
! ALTER INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable SET ( replaceable class=PARAMETERstorage_parameter/replaceable = replaceable class=PARAMETERvalue/replaceable [, ... ] )
! ALTER INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable RESET ( replaceable class=PARAMETERstorage_parameter/replaceable [, ... ] )
  /synopsis
   /refsynopsisdiv
  
***
*** 38,43 
--- 38,53 
variablelist
  
 varlistentry
+ termliteralIF EXISTS/literal/term
+ listitem
+  para
+   Do not throw an error if the index does not exist. A notice is issued
+   in this case.
+  /para
+ /listitem
+/varlistentry
+ 
+varlistentry
  termliteralRENAME/literal/term
  listitem
   para
*** ./doc/src/sgml/ref/alter_sequence.sgml.orig	2012-01-02 17:01:00.0 +0100
--- ./doc/src/sgml/ref/alter_sequence.sgml	2012-01-03 18:44:14.397429013 +0100
***
*** 23,37 
  
   refsynopsisdiv
  synopsis
! ALTER SEQUENCE replaceable class=parametername/replaceable [ INCREMENT [ BY ] replaceable class=parameterincrement/replaceable ]
  [ MINVALUE replaceable class=parameterminvalue/replaceable | NO MINVALUE ] [ MAXVALUE replaceable class=parametermaxvalue/replaceable | NO MAXVALUE ]
  [ START [ WITH ] replaceable class=parameterstart/replaceable ]
  [ RESTART [ [ WITH ] replaceable class=parameterrestart/replaceable ] ]
  [ CACHE replaceable class=parametercache/replaceable ] [ [ NO ] CYCLE ]
  [ OWNED BY { replaceable class=parametertable/replaceable.replaceable class=parametercolumn/replaceable | NONE } ]
! ALTER SEQUENCE replaceable class=parametername/replaceable OWNER TO replaceable class=PARAMETERnew_owner/replaceable
! ALTER SEQUENCE replaceable class=parametername/replaceable RENAME TO replaceable class=parameternew_name/replaceable
! ALTER SEQUENCE replaceable class=parametername/replaceable SET SCHEMA replaceable class=parameternew_schema/replaceable
  /synopsis
   /refsynopsisdiv
  
--- 23,37 
  
   refsynopsisdiv
  synopsis
! ALTER SEQUENCE [ IF EXISTS ] replaceable class=parametername/replaceable [ INCREMENT [ BY ] replaceable class=parameterincrement/replaceable ]
  [ 

Re: [HACKERS] spinlocks on powerpc

2012-01-03 Thread Jeremy Harris

On 2012-01-03 04:44, Robert Haas wrote:

On read-only workloads, you get spinlock contention, because everyone
who wants a snapshot has to take the LWLock mutex to increment the
shared lock count and again (just a moment later) to decrement it.


Does the LWLock protect anything but the shared lock count?  If not
then the usually quickest manipulation is along the lines of:

loop: lwarx r5,0,r3  #load and reserve
add r0,r4,r5 #increment word
stwcx. r0,0,r3  #store new value if still reserved
bne-loop  #loop if lost reservation

(per IBM's software ref manual,
 
https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2
)

The same sort of thing generally holds on other instruction-sets also.

Also, heavy-contention locks should be placed in cache lines away from other
data (to avoid thrashing the data cache lines when processors are fighting
over the lock cache lines).
--
Jeremy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Alexander Korotkov
Hi!

Thanks for your great work on reviewing this patch. Now I'm trying to find
memory corruption bug. Unfortunately it doesn't appears on my system. Can
you check if this bug remains in attached version of patch. If so, please
provide me information about system you're running (processor, OS etc.).

--
With best regards,
Alexander Korotkov.


arrayanalyze-0.8.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on powerpc

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 3:05 PM, Jeremy Harris j...@wizmail.org wrote:
 On 2012-01-03 04:44, Robert Haas wrote:
 On read-only workloads, you get spinlock contention, because everyone
 who wants a snapshot has to take the LWLock mutex to increment the
 shared lock count and again (just a moment later) to decrement it.

 Does the LWLock protect anything but the shared lock count?  If not
 then the usually quickest manipulation is along the lines of:

 loop: lwarx r5,0,r3  #load and reserve
        add     r0,r4,r5 #increment word
        stwcx. r0,0,r3  #store new value if still reserved
        bne-    loop      #loop if lost reservation

 (per IBM's software ref manual,
  https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2
 )

 The same sort of thing generally holds on other instruction-sets also.

Sure, but the actual critical section is not that simple.  You might
look at the code for LWLockAcquire() if you're curious.

 Also, heavy-contention locks should be placed in cache lines away from other
 data (to avoid thrashing the data cache lines when processors are fighting
 over the lock cache lines).

Yep.  This is possibly a problem, and has been discussed before, but I
don't think we have any firm evidence that it's a problem, or how much
padding helps.  The heavily contended LWLocks are mostly
non-consecutive, except perhaps for the buffer mapping locks.

It's been suggested to me that we should replace our existing LWLock
implementation with a CAS-based implementation that crams all the
relevant details into a single 8-byte word.  The pointer to the head
of the wait queue, for example, could be stored as an offset into the
allProcs array rather than a pointer value, which would allow it to be
stored in 24 bits rather than 8 bytes.  But there's not quite enough
bit space to make it work without making compromises -- most likely,
reducing the theoretical upper limit on MaxBackends from 2^24 to, say,
2^22.  Even if we were willing to do that, the performance benefits of
using atomics here are so far unproven... which doesn't mean they
don't exist, but someone's going to have to do some work to show that
they do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Noah Misch
On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:
 Thanks for your great work on reviewing this patch. Now I'm trying to find
 memory corruption bug. Unfortunately it doesn't appears on my system. Can
 you check if this bug remains in attached version of patch. If so, please
 provide me information about system you're running (processor, OS etc.).

I get the same diagnostic from this version.  Opteron processor, operating
system is Ubuntu 8.04 (64-bit).  You're using --enable-cassert, right?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on powerpc

2012-01-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 3, 2012 at 3:05 PM, Jeremy Harris j...@wizmail.org wrote:
 Also, heavy-contention locks should be placed in cache lines away from other
 data (to avoid thrashing the data cache lines when processors are fighting
 over the lock cache lines).

 Yep.  This is possibly a problem, and has been discussed before, but I
 don't think we have any firm evidence that it's a problem, or how much
 padding helps.  The heavily contended LWLocks are mostly
 non-consecutive, except perhaps for the buffer mapping locks.

We are in fact already padding and aligning LWLocks on (if memory
serves) 16 or 32 byte boundaries depending on platform.  So there
might be 2 to 4 LWLocks in the same cache line, depending on platform.
It's been suggested before that we pad more to reduce this number,
but nobody's demonstrated any performance win from doing so.

 It's been suggested to me that we should replace our existing LWLock
 implementation with a CAS-based implementation that crams all the
 relevant details into a single 8-byte word.

I'm under the impression that the main costs are associated with trading
cache line ownership between CPUs, which makes me think that this'd be
essentially a waste of effort, quite aside from the portability problems
involved.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collect frequency statistics for arrays

2012-01-03 Thread Alexander Korotkov
On Wed, Jan 4, 2012 at 12:33 AM, Noah Misch n...@leadboat.com wrote:

 On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:
  Thanks for your great work on reviewing this patch. Now I'm trying to
 find
  memory corruption bug. Unfortunately it doesn't appears on my system. Can
  you check if this bug remains in attached version of patch. If so, please
  provide me information about system you're running (processor, OS etc.).

 I get the same diagnostic from this version.  Opteron processor, operating
 system is Ubuntu 8.04 (64-bit).  You're using --enable-cassert, right?

Oh, actually no. Thanks for point.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Simon Riggs
On Tue, Jan 3, 2012 at 6:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 3, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Another point that requires some thought is that switching SnapshotNow
 to be MVCC-based will presumably result in a noticeable increase in each
 backend's rate of wanting to acquire snapshots.  Hence, more contention
 in GetSnapshotData can be expected.  A single-threaded test case doesn't
 prove anything at all about what that might cost under load.

 This is obviously true at some level, but I'm not sure that it really
 matters.  It's not that difficult to construct a test case where we
 have lots of people concurrently reading a table, or reading many
 tables, or writing a table, or writing many tables, but what kind of
 realistic test case involves enough DDL for any of this to matter?

 Um ... you're supposing that only DDL uses SnapshotNow, which is wrong.
 I refer you to the parser, the planner, execution functions for arrays,
 records, enums, any sort of relcache reload, etc etc etc.  Yes, some
 of that is masked by backend-internal caching, some of the time, but
 it's folly to just assume that there are no SnapshotNow scans during
 normal queries.

 None of this is necessarily grounds to reject a patch along the proposed
 lines.  I'm just asking for some benchmarking effort to establish what
 the costs might be, rather than naively hoping they are negligible.

All of which is reasonable doubt.

So far, all I'm saying is that on a couple of heavy duty cases that
I've run, I haven't noticed a problem. That is sufficient for me to
push forwards with a patch that does very close to what we want, so we
can judge acceptability with wider tests. I'm not saying it will be
acceptable a priori, only that it is worth trying because the benefits
are high. The number of call points are also high.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Simon Riggs
On Tue, Jan 3, 2012 at 6:14 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 If there are many call sites, maybe it'd be a good idea to use a
 semantic patcher tool such as Coccinelle instead of doing it one by one.

Thanks for the suggestion, regrettably I've already made those changes.

After examining the call sites, I identified 35 that might need
changing. Of those, about 30 were changed to use systable_beginscan,
while a few others use declared snapshots instead. So not a great
effort and worth doing the by-hand inspection.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2012-01-03 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of mar ene 03 17:57:56 -0300 2012:
 
 On Tue, Jan 3, 2012 at 6:14 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  If there are many call sites, maybe it'd be a good idea to use a
  semantic patcher tool such as Coccinelle instead of doing it one by one.
 
 Thanks for the suggestion, regrettably I've already made those changes.
 
 After examining the call sites, I identified 35 that might need
 changing. Of those, about 30 were changed to use systable_beginscan,
 while a few others use declared snapshots instead. So not a great
 effort and worth doing the by-hand inspection.

Yeah, for 35 changes I wouldn't have gone all the length required to
learn the new tool either.  If they had been 200, OTOH ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on powerpc

2012-01-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I'm unconvinced by these numbers.  There is a measurable change but it
 is pretty small.  The Itanium changes resulted in an enormous gain at
 higher concurrency levels.

Yeah, that was my problem with it also: I couldn't measure enough gain
to convince me it was a real effect, and not an artifact of the specific
machine being tested.  It occurred to me though that we already know that
pgbench itself is a bottleneck in tests like this, and there's an easy
way to take it out of the picture: move the selects into a plpgsql
function that iterates multiple times per client query.  The attached
testing script reduces the client interaction costs by a thousandfold
compared to plain pgbench -S, and also takes parse/plan time out of
the loop, so that it becomes easier to see the effects of contention.
With this script, I still see a loss of 1% or so from adding the
unlocked test in TAS_SPIN at moderate contention levels, but there's a
very clear jump when the machine is saturated.  So this convinces me
that Manabu-san's results are reproducible, and I've committed the
TAS_SPIN addition.

git head as of this morning, on 8-core IBM 8406-71Y:

pgbench -c 1 -j 1 -f bench.script -T 300 bench  tps = 50.142878 
(including connections establishing)
pgbench -c 2 -j 1 -f bench.script -T 300 bench  tps = 97.179234 
(including connections establishing)
pgbench -c 8 -j 4 -f bench.script -T 300 bench  tps = 341.731982 
(including connections establishing)
pgbench -c 16 -j 8 -f bench.script -T 300 bench tps = 402.114111 
(including connections establishing)
pgbench -c 32 -j 16 -f bench.script -T 300 benchtps = 371.338156 
(including connections establishing)
pgbench -c 64 -j 32 -f bench.script -T 300 benchtps = 359.785087 
(including connections establishing)
pgbench -c 96 -j 48 -f bench.script -T 300 benchtps = 363.879950 
(including connections establishing)
pgbench -c 128 -j 64 -f bench.script -T 300 bench   tps = 376.794934 
(including connections establishing)

after re-adding TAS_SPIN macro:

pgbench -c 1 -j 1 -f bench.script -T 300 bench  tps = 50.182676 
(including connections establishing)
pgbench -c 2 -j 1 -f bench.script -T 300 bench  tps = 96.751910 
(including connections establishing)
pgbench -c 8 -j 4 -f bench.script -T 300 bench  tps = 327.108510 
(including connections establishing)
pgbench -c 16 -j 8 -f bench.script -T 300 bench tps = 395.425611 
(including connections establishing)
pgbench -c 32 -j 16 -f bench.script -T 300 benchtps = 444.291852 
(including connections establishing)
pgbench -c 64 -j 32 -f bench.script -T 300 benchtps = 486.151168 
(including connections establishing)
pgbench -c 96 -j 48 -f bench.script -T 300 benchtps = 496.379981 
(including connections establishing)
pgbench -c 128 -j 64 -f bench.script -T 300 bench   tps = 494.058124 
(including connections establishing)


 For Itanium, I was able to find some fairly official-looking
 documentation that said this is how you should do it.  It would be
 nice to find something similar for PPC64, instead of testing every
 machine and reinventing the wheel ourselves.

You are aware that our spinlock code is pretty much verbatim from the
PPC ISA spec, no?  The issue here is that the official documentation
has been a moving target over the decades the ISA has been in existence.

regards, tom lane

#! /bin/sh

psql bench EOF
create or replace function bench(scale int, reps int) returns void
language plpgsql
as $$
declare
  naccounts int := 10 * scale;
  v_aid int;
  v_abalance int;
begin
  for i in 1 .. reps loop
v_aid := round(random() * naccounts + 1);
SELECT abalance INTO v_abalance FROM pgbench_accounts WHERE aid = v_aid;
  end loop;
end;
$$;
EOF

cat bench.script EOF
select bench(:scale, 1000);
EOF

# warm the caches a bit
pgbench -c 1 -j 1 -f bench.script -T 30 bench /dev/null

echo pgbench -c 1 -j 1 -f bench.script -T 300 bench
pgbench -c 1 -j 1 -f bench.script -T 300 bench | grep including
echo pgbench -c 2 -j 1 -f bench.script -T 300 bench
pgbench -c 2 -j 1 -f bench.script -T 300 bench | grep including
echo pgbench -c 8 -j 4 -f bench.script -T 300 bench
pgbench -c 8 -j 4 -f bench.script -T 300 bench | grep including
echo pgbench -c 16 -j 8 -f bench.script -T 300 bench
pgbench -c 16 -j 8 -f bench.script -T 300 bench | grep including
echo pgbench -c 32 -j 16 -f bench.script -T 300 bench
pgbench -c 32 -j 16 -f bench.script -T 300 bench | grep including
echo pgbench -c 64 -j 32 -f bench.script -T 300 bench
pgbench -c 64 -j 32 -f bench.script -T 300 bench | grep including
echo pgbench -c 96 -j 48 -f bench.script -T 300 bench
pgbench -c 96 -j 48 -f bench.script -T 300 bench | grep including
echo pgbench -c 128 -j 64 -f bench.script -T 300 bench
pgbench -c 128 -j 64 -f bench.script -T 300 bench | grep including

-- 
Sent via pgsql-hackers mailing list 

[HACKERS] [patch] Improve documentation around FreeBSD Kernel Tuning

2012-01-03 Thread Brad Davis
Hi,

I have a patch that improves the documentation for FreeBSD Kernel Tuning:

- Show a # prompt instead of $ to indicate a root shell is needed
- Remove the -w flag to sysctl since it is not needed anymore and just silently 
ignored
- Encourage the user to set the read-only sysctls in /boot/loader.conf, instead 
of setting them manually in the loader.

I have put these in a github fork of the repo, but I am new to git. So I 
apologize if this is incorrect. 

https://github.com/so14k/postgres/commit/12c03bdb2967346e7ad9ce0bdd3db8dfcf81507e


Regards,
Brad Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Improve documentation around FreeBSD Kernel Tuning

2012-01-03 Thread Andrew Dunstan



On 01/03/2012 04:49 PM, Brad Davis wrote:

Hi,

I have a patch that improves the documentation for FreeBSD Kernel Tuning:

- Show a # prompt instead of $ to indicate a root shell is needed
- Remove the -w flag to sysctl since it is not needed anymore and just silently 
ignored
- Encourage the user to set the read-only sysctls in /boot/loader.conf, instead 
of setting them manually in the loader.

I have put these in a github fork of the repo, but I am new to git. So I 
apologize if this is incorrect.

https://github.com/so14k/postgres/commit/12c03bdb2967346e7ad9ce0bdd3db8dfcf81507e




Instead of a URL, please just email us the diff as an attachment. 
Normally we prefer these in context diff format, although it doesn't 
matter so much for such a small patch.


See

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] spinlocks on powerpc

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 4:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 For Itanium, I was able to find some fairly official-looking
 documentation that said this is how you should do it.  It would be
 nice to find something similar for PPC64, instead of testing every
 machine and reinventing the wheel ourselves.

 You are aware that our spinlock code is pretty much verbatim from the
 PPC ISA spec, no?  The issue here is that the official documentation
 has been a moving target over the decades the ISA has been in existence.

I wasn't aware of that, but I think my basic point still stands: it's
gonna be painful if we have to test large numbers of different PPC
boxes to figure all this out...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-03 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 v2 of checksum patch, using a conditional copy if checksumming is
 enabled, so locking is removed.
 
 Thanks to Andres for thwacking me with the cluestick, though I
 have used a simple copy rather than a copy  calc.
 
 Tested using make installcheck with parameter on/off, then restart
 and vacuumdb to validate all pages.
 
 Reviews, objections, user interface tweaks all welcome.
 
I'm happy with how this looks, except (as noted in a code comment)
that there seems to be room for optimization of the calculation
itself.  Details below:
 
(1)  I like the choice of Fletcher-16.  It should be very good at
detecting problems while being a lot less expensive that an official
CRC calculation.  The fact that it was developed at Lawrence
Livermore Labs and has been subjected to peer review in the IEEE
Transactions on Communications generates confidence in the
technique.  According to what I've read, though, the technique is
conceptually based around the modulus of the sums.  Doing the
modulus calculation for each addition is often done to prevent
overflow, but if we define sum1 and sum2 as uint I don't see how we
can get an overflow with 8k byes, so I suggest we declare both of
these local variables as uint and leave the modulus to the end.  It
can't hurt to leave off 16000 division operations per checksum
generation.
 
(2)  I'm not sure about doing this in three parts, to skip the
checksum itself and the hole in the middle of the page.  Is this
because the hole might not have predictable data?  Why would that
matter, as long as it is read back the same?  Is it expected that
this will reduce the cost by checking fewer bytes?  That we could
tolerate the read of an actual corrupted disk sector in the middle
of a page if it didn't contain any data?  If we can set the checksum
to zero before starting, might it be faster to load four bytes at a
time, and iterate fewer times?  (Like I said, I'm not sure about any
of this, but it seemed worth asking the questions.)
 
(3)  Rather than having PageSetVerificationInfo() use memcpy,
followed by pass through the copied data to calculate the checksum,
might it be better to have a copy and calculate version of the
function (as in VMware's original patch) to save an extra pass over
the page image?
 
Other than these performance tweaks around the calculation phase, I
didn't spot any problems.  I beat up on it a bit on a couple
machines without hitting any bugs or seeing any regression test
failures.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] grants vs. inherited tables

2012-01-03 Thread Marko Kreen
On Mon, Jan 02, 2012 at 12:31:13PM +0100, Dimitri Fontaine wrote:
 Marko Kreen mark...@gmail.com writes:
  I tried to generalize a function that creates partitions
  for a table and found out it's impossible to do it for grants.
 
  Basically, what I want is a child table that takes it's grants
  from parent table.  IMHO quite reasonable request.  But I don't
  see a way to do it in pl/pgsql.  (Writing parser in plpgsql
  for aclitemout() output does not count.)
 
 We solved that manually in https://github.com/slardiere/PartMgr, maybe
 you will find it useful for pre-9.2 releases. See function
 partition.grant() and partition.setgrant() in part_api.sql.

Thanks, thats interesting.  Here is my current state:

  
https://github.com/markokr/skytools/blob/master/sql/dispatch/create_partition.sql

which uses info-schema for grants, which seems nicer than
parsing, but still not as nice as including all.

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-03 Thread Jim Nasby
On Jan 3, 2012, at 4:21 PM, Kevin Grittner wrote:
 (2)  I'm not sure about doing this in three parts, to skip the
 checksum itself and the hole in the middle of the page.  Is this
 because the hole might not have predictable data?  Why would that
 matter, as long as it is read back the same?

IMO not checksumming the free space would be a really bad idea. It's entirely 
possible to have your hardware crapping on your free space, and I'd still want 
to know that that was happening. Now, it would be interesting if the free space 
could be checksummed separately, since there's no reason to refuse to read the 
page if only the free space is screwed up... But given the choice, I'd rather 
get an error when the free space is corrupted and be forced to look into 
things rather than blissfully ignore corrupted free space only to be hit later 
with real data loss.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.
 
 On top of that, taking what Robert Haas mentioned on another thread,
 InvalidateBuffer currently calls StretegyFreeBuffer(), which waits for
 an ExclusiveLock on the BufFreelistLock. On a busy system this will be
 heavily contended, so adding blocks to the freelist only if the lock
 is free seems warranted.

Couldn't we just leave the buffers alone? Once an index is dropped and that's 
pushed out through the catalog then nothing should be trying to access them and 
they'll eventually just get aged out.

In fact, IIRC the function that scans for buffers actually checks to see if a 
rel still exists before it returns the buffer...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Improve documentation around FreeBSD Kernel Tuning

2012-01-03 Thread Brad Davis
On Tue, Jan 03, 2012 at 05:02:57PM -0500, Andrew Dunstan wrote:
 
 
 On 01/03/2012 04:49 PM, Brad Davis wrote:
  Hi,
 
  I have a patch that improves the documentation for FreeBSD Kernel Tuning:
 
  - Show a # prompt instead of $ to indicate a root shell is needed
  - Remove the -w flag to sysctl since it is not needed anymore and just 
  silently ignored
  - Encourage the user to set the read-only sysctls in /boot/loader.conf, 
  instead of setting them manually in the loader.
 
  I have put these in a github fork of the repo, but I am new to git. So I 
  apologize if this is incorrect.
 
  https://github.com/so14k/postgres/commit/12c03bdb2967346e7ad9ce0bdd3db8dfcf81507e
 
 
 
 Instead of a URL, please just email us the diff as an attachment. 
 Normally we prefer these in context diff format, although it doesn't 
 matter so much for such a small patch.

Sorry about that.. it is attached below.


Thanks,
Brad Davis


diff against doc/src/sgml/runtime.sgml

797,799c797,799
 prompt$/prompt userinputsysctl -w kern.ipc.shmall=32768/userinput
 prompt$/prompt userinputsysctl -w kern.ipc.shmmax=134217728/userinput
 prompt$/prompt userinputsysctl -w kern.ipc.semmap=256/userinput
---
 prompt#/prompt userinputsysctl kern.ipc.shmall=32768/userinput
 prompt#/prompt userinputsysctl kern.ipc.shmmax=134217728/userinput
 prompt#/prompt userinputsysctl kern.ipc.semmap=256/userinput
807,815c807,815
 commandsysctl/command is concerned, but can be changed
 before boot using the commandloader/command prompt:
 screen
 prompt(loader)/prompt userinputset kern.ipc.semmni=256/userinput
 prompt(loader)/prompt userinputset kern.ipc.semmns=512/userinput
 prompt(loader)/prompt userinputset kern.ipc.semmnu=256/userinput
 /screen
 Similarly these can be saved between reboots in
 filename/boot/loader.conf/filename.
---
 commandsysctl/command is concerned, but can be set in
 filename/boot/loader.conf/filename:
 programlisting
 kern.ipc.semmni=256
 kern.ipc.semmns=512
 kern.ipc.semmnu=256
 /programlisting
 After modifying these values a reboot is required for the new
 settings to take affect.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-03 Thread Jim Nasby
On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
 So you don't think a freelist is worth having, but you want a list of
 allocation targets.
 What is the practical difference?
 
 I think that our current freelist is practically useless, because it
 is almost always empty, and the cases where it's not empty (startup,
 and after a table or database drop) are so narrow that we don't really
 get any benefit out of having it.  However, I'm not opposed to the
 idea of a freelist in general: I think that if we actually put in some
 effort to keep the freelist in a non-empty state it would help a lot,
 because backends would then have much less work to do at buffer
 allocation time.

This is exactly what the FreeBSD VM system does (which is at least one of the 
places where the idea of a clock sweep for PG came from ages ago). There is a 
process that does nothing but attempt to keep X amount of memory on the free 
list, where it can immediately be grabbed by anything that needs memory. Pages 
on the freelist are guaranteed to be clean (as in not dirty), but not zero'd. 
In fact, IIRC if a page on the freelist gets referenced again it can be pulled 
back out of the free list and put back into an active state.

The one downside I see to this is that we'd need some heuristic to determine 
how many buffers we want to keep on the free list.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] controlling the location of server-side SSL files

2012-01-03 Thread Peter Eisentraut
On mån, 2012-01-02 at 23:42 -0500, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On mån, 2012-01-02 at 15:47 +0100, Magnus Hagander wrote:
  Were you thinking one option pointing to a directory or one option per
  file?
 
  One option per file:
 
 That seems like serious overkill.  Why not one option specifying the
 directory?  What use-case is there for letting them be in different
 directories, let alone letting the DBA choose random names for each one?

Several consistency reasons:

- We provide the same per-file options in libpq.

- Indeed, if you use something like dblink or plproxy, these might even
point to the same files.

- We also have settings for hba_file and ident_file that point to a
file.

- All other daemons with SSL support known to me, such as mail and web
servers, have per-file options.

Also some practical reasons:

- Yes, choosing random names is important.  We have local naming
schemes.  And I would rather have a descriptive file name for the CA
than having them all named root.crt, and if I want to know which one it
is I have to look inside the file.

- You might not want all of the files in the same directory.  CA and CRL
might live elsewhere, for example.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.

 Couldn't we just leave the buffers alone? Once an index is dropped and that's 
 pushed out through the catalog then nothing should be trying to access them 
 and they'll eventually just get aged out.

No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file.  It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.  Simon is after a different solution involving getting
rid of the clock sweep, but he has failed to explain how that's not
going to end up being the same type of contention-prone coding that we
got rid of by adopting the clock sweep, some years ago.  Yeah, the sweep
takes a lot of spinlocks, but that only matters if there is contention
for them, and the sweep approach avoids the need for a centralized data
structure.

(BTW, do we have a separate clock sweep hand for each backend?  If not,
there might be some low hanging fruit there.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_regress: Replace exit_nicely() with exit() plus atexit() hook

2012-01-03 Thread Peter Eisentraut
On mån, 2012-01-02 at 17:27 -0500, Andrew Dunstan wrote:
 
 On 01/02/2012 04:37 PM, Peter Eisentraut wrote:
  On mån, 2012-01-02 at 15:55 -0500, Andrew Dunstan wrote:
  On 01/02/2012 03:12 PM, Peter Eisentraut wrote:
  pg_regress: Replace exit_nicely() with exit() plus atexit() hook
 
  This appears to have broken the buildfarm.
  I think you mean it has caused the build to fail on some buildfarm
  members.  AFAICT, the buildfarm itself is still intact.

 This is hardly the first use of this idiom.

But it's about as helpful as a bug report saying help it's broken.

  I think there is some room for improvement there:
 
  - Why is the isolation test not part of check-world/installcheck-world?
 
 
 The buildfarm does not use the -world targets, for several reasons, 
 including:

That was not my question.  I run check-world/installcheck-world locally,
and if the isolation test had been part of this, this problem wouldn't
have happened.

  - Could we get the buildfarm server to send out emails whenever a build
  fails?

 This facility has been there for years

Cool, maybe it could be advertised on buildfarm.postgresql.org.  I
couldn't see any link leading from there to anywhere near the mailing
lists.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Improve documentation around FreeBSD Kernel Tuning

2012-01-03 Thread Andrew Dunstan



On 01/03/2012 06:15 PM, Brad Davis wrote:

On Tue, Jan 03, 2012 at 05:02:57PM -0500, Andrew Dunstan wrote:


On 01/03/2012 04:49 PM, Brad Davis wrote:

Hi,

I have a patch that improves the documentation for FreeBSD Kernel Tuning:

- Show a # prompt instead of $ to indicate a root shell is needed
- Remove the -w flag to sysctl since it is not needed anymore and just silently 
ignored
- Encourage the user to set the read-only sysctls in /boot/loader.conf, instead 
of setting them manually in the loader.

I have put these in a github fork of the repo, but I am new to git. So I 
apologize if this is incorrect.

https://github.com/so14k/postgres/commit/12c03bdb2967346e7ad9ce0bdd3db8dfcf81507e



Instead of a URL, please just email us the diff as an attachment.
Normally we prefer these in context diff format, although it doesn't
matter so much for such a small patch.

Sorry about that.. it is attached below.




But you didn't :-) You just cut and pasted it. And if you're not going 
to send a context diff, it should be a unidiff such as git normally 
produces. Since I didn't actually tell you that I've made a context diff 
for you, and it's attached. I'll let someone with more FBSD-fu than me 
actually comment on it.


cheers

andrew
*** a/doc/src/sgml/runtime.sgml
--- b/doc/src/sgml/runtime.sgml
***
*** 794,802  options SEMMNS=240
  commandloader/command interfaces.  The following
  parameters can be set using commandsysctl/command:
  screen
! prompt$/prompt userinputsysctl -w kern.ipc.shmall=32768/userinput
! prompt$/prompt userinputsysctl -w kern.ipc.shmmax=134217728/userinput
! prompt$/prompt userinputsysctl -w kern.ipc.semmap=256/userinput
  /screen
  To have these settings persist over reboots, modify
  filename/etc/sysctl.conf/filename.
--- 794,802 
  commandloader/command interfaces.  The following
  parameters can be set using commandsysctl/command:
  screen
! prompt#/prompt userinputsysctl kern.ipc.shmall=32768/userinput
! prompt#/prompt userinputsysctl kern.ipc.shmmax=134217728/userinput
! prompt#/prompt userinputsysctl kern.ipc.semmap=256/userinput
  /screen
  To have these settings persist over reboots, modify
  filename/etc/sysctl.conf/filename.
***
*** 804,818  options SEMMNS=240
  
 para
  The remaining semaphore settings are read-only as far as
! commandsysctl/command is concerned, but can be changed
! before boot using the commandloader/command prompt:
! screen
! prompt(loader)/prompt userinputset kern.ipc.semmni=256/userinput
! prompt(loader)/prompt userinputset kern.ipc.semmns=512/userinput
! prompt(loader)/prompt userinputset kern.ipc.semmnu=256/userinput
! /screen
! Similarly these can be saved between reboots in
! filename/boot/loader.conf/filename.
 /para
  
 para
--- 804,818 
  
 para
  The remaining semaphore settings are read-only as far as
! commandsysctl/command is concerned, but can be set in
! filename/boot/loader.conf/filename:
! programlisting
! kern.ipc.semmni=256
! kern.ipc.semmns=512
! kern.ipc.semmnu=256
! /programlisting
! After modifying these values a reboot is required for the new
! settings to take affect.
 /para
  
 para

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_regress: Replace exit_nicely() with exit() plus atexit() hook

2012-01-03 Thread Andrew Dunstan



On 01/03/2012 06:29 PM, Peter Eisentraut wrote:



I think there is some room for improvement there:

- Why is the isolation test not part of check-world/installcheck-world?


The buildfarm does not use the -world targets, for several reasons,
including:

That was not my question.  I run check-world/installcheck-world locally,
and if the isolation test had been part of this, this problem wouldn't
have happened.



I have no idea why. It was probably an oversight when the isolation 
tests were added. I guess something like this would fix it?


   diff --git a/GNUmakefile.in b/GNUmakefile.in
   index 50fae41..5976832 100644
   --- a/GNUmakefile.in
   +++ b/GNUmakefile.in
   @@ -68,7 +68,7 @@ check installcheck installcheck-parallel:

   $(call recurse,check-world,src/test src/pl src/interfaces/ecpg
   contrib,check)

   -$(call recurse,installcheck-world,src/test src/pl
   src/interfaces/ecpg contrib,installcheck)
   +$(call recurse,installcheck-world,src/test src/test/isolation
   src/pl src/interfaces/ecpg contrib,installcheck)

   $(call recurse,maintainer-check,doc src config contrib)


The isolation tests only run installcheck, not plain check. See the 
archives for details of why.




- Could we get the buildfarm server to send out emails whenever a build
fails?

This facility has been there for years

Cool, maybe it could be advertised on buildfarm.postgresql.org.  I
couldn't see any link leading from there to anywhere near the mailing
lists.





OK, I'll look at providing links from the web site to the lists.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Improve documentation around FreeBSD Kernel Tuning

2012-01-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Since I didn't actually tell you that I've made a context diff 
 for you, and it's attached. I'll let someone with more FBSD-fu than me 
 actually comment on it.

I have no FBSD-fu whatever, but the question this patch raises in my
mind is whether /boot/loader.conf exists in every version of FBSD.
If not, we probably need to say something like do this in versions =
whatever, and do the other in versions before that.  Likewise, has
it always been true that -w is unnecessary?  For other systems
such as Mac OS X, we have recommendations covering quite ancient OS
releases, and I don't see why we'd not hold the FreeBSD section to the
same standard.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby
On Jan 3, 2012, at 5:28 PM, Tom Lane wrote:
 Jim Nasby j...@nasby.net writes:
 On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
 This could well be related to the fact that DropRelFileNodeBuffers()
 does a scan of shared_buffers, which is an O(N) approach no matter the
 size of the index.
 
 Couldn't we just leave the buffers alone? Once an index is dropped and 
 that's pushed out through the catalog then nothing should be trying to 
 access them and they'll eventually just get aged out.
 
 No, we can't, because if they're still dirty then the bgwriter would
 first try to write them to the no-longer-existing storage file.  It's
 important that we kill the buffers immediately during relation drop.
 
 I'm still thinking that it might be sufficient to mark the buffers
 invalid and let the clock sweep find them, thereby eliminating the need
 for a freelist.  Simon is after a different solution involving getting
 rid of the clock sweep, but he has failed to explain how that's not
 going to end up being the same type of contention-prone coding that we
 got rid of by adopting the clock sweep, some years ago.  Yeah, the sweep
 takes a lot of spinlocks, but that only matters if there is contention
 for them, and the sweep approach avoids the need for a centralized data
 structure.

Yeah, but the problem we run into is that with every backend trying to run the 
clock on it's own we end up with high contention again... it's just in a 
different place than when we had a true LRU. The clock sweep might be cheaper 
than the linked list was, but it's still awfully expensive. I believe our best 
bet is to have a free list that is actually useful in normal operations, and 
then optimize the cost of pulling buffers out of that list as much as possible 
(and let the bgwriter deal with keeping enough pages in that list to satisfy 
demand).

Heh, it occurs to me that the SQL analogy for how things work right now is that 
backends currently have to run a SeqScan (or 5) to find a free page... what we 
need to do is CREATE INDEX free ON buffers(buffer_id) WHERE count = 0;.

 (BTW, do we have a separate clock sweep hand for each backend?  If not,
 there might be some low hanging fruit there.)

No... having multiple clock hands is an interesting idea, but I'm worried that 
it could potentially get us into trouble if scores of backends were suddenly 
decrementing usage counts all over the place. For example, what if 5 backends 
all had their hands in basically the same place, all pointing at a very heavily 
used buffer. All 5 backends go for free space, they each grab the spinlock on 
that buffer in succession and suddenly this highly used buffer that started 
with a count of 5 has now been freed. We could potentially use more than one 
hand, but I think the relation between the number of hands and the maximum 
usage count has to be tightly controlled.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] information schema/aclexplode doesn't know about default privileges

2012-01-03 Thread Jim Nasby
On Jan 1, 2012, at 10:43 PM, Peter Eisentraut wrote:
 I figured the best and most flexible way to address this is to export
 acldefault() as an SQL function and replace
 
aclexplode(proacl)
 
 with
 
aclexplode(coalesce(proacl, acldefault('f', proowner)))

It would be nice to provide a convenience function that does the coalesce for 
you. End users sometimes need this stuff as well as info_schema.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 Yeah, but the problem we run into is that with every backend trying to run 
 the clock on it's own we end up with high contention again... it's just in a 
 different place than when we had a true LRU. The clock sweep might be cheaper 
 than the linked list was, but it's still awfully expensive. I believe our 
 best bet is to have a free list that is actually useful in normal operations, 
 and then optimize the cost of pulling buffers out of that list as much as 
 possible (and let the bgwriter deal with keeping enough pages in that list to 
 satisfy demand).

Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.

Still, he who wants to do the work can try whatever approach he feels
like.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2012-01-03 Thread Jim Nasby
On Dec 28, 2011, at 3:31 AM, Simon Riggs wrote:
 On Wed, Dec 28, 2011 at 9:00 AM, Robert Haas robertmh...@gmail.com wrote:
 
  What I'm not too clear
 about is whether a 16-bit checksum meets the needs of people who want
 checksums.
 
 We need this now, hence the gymnastics to get it into this release.
 
 16-bits of checksum is way better than zero bits of checksum, probably
 about a million times better (numbers taken from papers quoted earlier
 on effectiveness of checksums).
 
 The strategy I am suggesting is 16-bits now, 32/64 later.

What about allowing for an initdb option? That means that if you want binary 
compatibility so you can pg_upgrade then you're stuck with 16 bit checksums. If 
you can tolerate replicating all your data then you can get more robust 
checksumming.

In either case, it seems that we're quickly approaching the point where we need 
to start putting resources into binary page upgrading...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] Improve documentation around FreeBSD Kernel Tuning

2012-01-03 Thread Brad Davis
On Tue, Jan 03, 2012 at 06:43:52PM -0500, Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  Since I didn't actually tell you that I've made a context diff 
  for you, and it's attached. I'll let someone with more FBSD-fu than me 
  actually comment on it.
 
 I have no FBSD-fu whatever, but the question this patch raises in my
 mind is whether /boot/loader.conf exists in every version of FBSD.
 If not, we probably need to say something like do this in versions =
 whatever, and do the other in versions before that.  Likewise, has
 it always been true that -w is unnecessary?  For other systems
 such as Mac OS X, we have recommendations covering quite ancient OS
 releases, and I don't see why we'd not hold the FreeBSD section to the
 same standard.

Well.. The man page appeared somewhere between FreeBSD 3.0 and 4.0.. and
4.0 was released March 14, 2000.


Regards,
Brad Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Setting -Werror in CFLAGS

2012-01-03 Thread Peter Geoghegan
During the talk How To Get Your PostgreSQL Patch Accepted during
PgCon last year, I raised the idea of making a -Werror build option
easily available. I think it was Robert that pointed out that the
problem with that was that there is a warning due to an upstream Flex
bug that we can't do anything about. He was right about that; I
subsequently tried fairly hard to get the Flex guys to fix it a few
months back, but it seems that Flex isn't that well maintained, so
even though I convinced someone to write a patch to fix it, it ended
up going nowhere. Looking back through the commit log, I can see that
this is an old problem.

i thought that we could get most of the benefit of a -Werror option by
excluding the relevant class of error in CLAGS; gcc allows for this
(pity it doesn't have MSVC's much more granular ability to
discriminate against diagnostic messages). This should work:

./configure --prefix=/home/peter/pgsql CFLAGS=-Werror
-Wno-error=unused-but-set-variable

However, it does not (with or without the -Wno-error):

checking whether getpwuid_r takes a fifth argument... no
checking whether strerror_r returns int... no
checking test program... ok
checking whether long int is 64 bits... no
checking whether long long int is 64 bits... no
configure: error: Cannot find a working 64-bit integer type.

Obviously it breaks this one configure test. However, I got it to work
in 2 minutes by hacking up config.log. It would be nice if someone
could fix the test (and possibly later ones) so that it doesn't
produce a warning/error when invoking the compiler, and it finds a
64-bit int type as expected. That way, it could sort of become folk
wisdom that you should build Postgres with those flags during
development, and maybe even, on some limited basis, on the build farm,
and we'd all be sightly better off.

Giving that many people build with CFLAGS=-O0 -g on a day-to-day
basis, which doesn't show some warnings, the folk wisdom might end up
being: Just before you submit your patch, see how an -O2 -Werror
build goes.

There is an entry in the parser Makefile to support this sort of use,
which makes it look like my -Wno-error=unused-but-set-variable use
would be redundant if it worked:

# Latest flex causes warnings in this file.
ifeq ($(GCC),yes)
gram.o: CFLAGS += -Wno-error
endif

(although we could discriminate better within the parse here by using my flag).

As if to prove my point, I found this warning which I didn't
previously notice, just from 5 minutes of playing around:

hashovfl.c: In function ‘_hash_freeovflpage’:
hashovfl.c:394:10: error: variable ‘bucket’ set but not used
[-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

(plus others)

Yes, I know that these only appeared in GCC 4.6+ and as such are a
relatively recent phenomenon, but there has been some effort to
eliminate them, and if I could get a non-hacked -Werror build I'd feel
happy enough about excluding them as already outlined.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_internal.init and an index file have the same inode

2012-01-03 Thread Daniel Farina
I'm not sure if this is an XFS problem, or Postgres.  There's enough
suspicious evidence that it's too hard to say.

Today, I get an interesting issue raised whereby a reasonably simple
query fails on a system that does take successful pg_dumps regularly.
To make a short story shorter, I end up with the following situation
on an index relation file:

$ sha1sum 16587.8 pg_internal.init
4a0f94285c182b67175d6a68669c6b9fd46fa11e  16587.8
4a0f94285c182b67175d6a68669c6b9fd46fa11e  pg_internal.init

And, even more interestingly,

$ stat 16587.8 pg_internal.init
  File: `16587.8'
  Size: 98532   Blocks: 200IO Block: 4096   regular file
Device: fe00h/65024dInode: 1073741952  Links: 1
Access: (0600/-rw---)  Uid: (  107/postgres)   Gid: (  115/postgres)
Access: 2011-12-14 23:35:35.630043643 +
Modify: 2011-12-14 23:35:35.630043643 +
Change: 2011-12-14 23:35:35.630043643 +
  File: `pg_internal.init'
  Size: 98532   Blocks: 200IO Block: 4096   regular file
Device: fe00h/65024dInode: 1073741952  Links: 1
Access: (0600/-rw---)  Uid: (  107/postgres)   Gid: (  115/postgres)
Access: 2011-12-14 23:35:35.630043643 +
Modify: 2011-12-14 23:35:35.630043643 +
Change: 2011-12-14 23:35:35.630043643 +

Most notably, the inode numbers are the same.  At first, I thought
this was a file descriptor race in PG, but then I noticed the file
system only reports *one* link: that doesn't look like a valid state
for XFS.  Should I raise this on their mailing list, and would
pgsql-hackers like to know more about this?  Also, consider this an
advisory, as this was found on a vanilla Ubuntu Lucid 10.04 machine.

As such, I do think this probably could have happened to a non-index
relation file.

This database has had uptime since mid-December, and that inode seems
to have been modified at the time of the last reboot.  The database
was born in October via point-in-time recovery.

...Happy New Year!

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] our buffer replacement strategy is kind of lame

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 6:22 PM, Jim Nasby j...@nasby.net wrote:
 On Jan 3, 2012, at 11:15 AM, Robert Haas wrote:
 So you don't think a freelist is worth having, but you want a list of
 allocation targets.
 What is the practical difference?

 I think that our current freelist is practically useless, because it
 is almost always empty, and the cases where it's not empty (startup,
 and after a table or database drop) are so narrow that we don't really
 get any benefit out of having it.  However, I'm not opposed to the
 idea of a freelist in general: I think that if we actually put in some
 effort to keep the freelist in a non-empty state it would help a lot,
 because backends would then have much less work to do at buffer
 allocation time.

 This is exactly what the FreeBSD VM system does (which is at least one of the 
 places where the idea of a clock sweep for PG came from ages ago). There is a 
 process that does nothing but attempt to keep X amount of memory on the free 
 list, where it can immediately be grabbed by anything that needs memory. 
 Pages on the freelist are guaranteed to be clean (as in not dirty), but not 
 zero'd. In fact, IIRC if a page on the freelist gets referenced again it can 
 be pulled back out of the free list and put back into an active state.

 The one downside I see to this is that we'd need some heuristic to determine 
 how many buffers we want to keep on the free list.

Fortuitously, I believe the background writer already has most of the
necessary logic: it attempts to predict how many buffers are about to
be needed - I think based on a decaying average.

Actually, I think that logic could use some improvement, because I
believe I've heard Greg Smith comment that it's often necessary to
tune bgwriter_delay downward.  It'd be nice to make the delay adaptive
somehow, to avoid the need for manual tuning (and unnecessary wake-ups
when the system goes idle).

But possibly the existing logic is good enough for a first cut.
However, in the interest of full disclosure, I'll admit that I've done
no testing in this area at all and am talking mostly out of my
posterior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby j...@nasby.net writes:
 Yeah, but the problem we run into is that with every backend trying to run 
 the clock on it's own we end up with high contention again... it's just in a 
 different place than when we had a true LRU. The clock sweep might be 
 cheaper than the linked list was, but it's still awfully expensive. I 
 believe our best bet is to have a free list that is actually useful in 
 normal operations, and then optimize the cost of pulling buffers out of that 
 list as much as possible (and let the bgwriter deal with keeping enough 
 pages in that list to satisfy demand).

 Well, maybe, but I think the historical evidence suggests that that
 approach will be a loser, simply because no matter how cheap, the
 freelist will remain a centralized and heavily contended data structure.
 IMO we need to be looking for a mechanism that has no single point of
 contention, and modifying the clock sweep rules looks like the best way
 to get there.

 Still, he who wants to do the work can try whatever approach he feels
 like.

It might be possible to partition the free list.  So imagine, for
example, 8 free lists.  The background writer runs the clock sweep and
finds some buffers that are about ready to be reallocated and
distributes one-eighth of them to each free list.  Then, when a
backend wants to allocate a buffer, it picks a free list somehow
(round robin?) and pulls a buffer off it.  If the buffer turns out to
have been used since it was added to the free list, we give up on it
and try again.  This hopefully shouldn't happen too often, though, as
long as we only add enough buffers to the free list to satisfy the
requests that we expect to get over the next
small-fraction-of-a-second.

Of course you have to think about what happens if you find that your
chosen free list is empty.  In that case you probably want to try a
different one, and if in the worst case where they're all empty, run
the clock sweep in the foreground.  You probably also want to kick the
background writer in the pants if even a single one is empty (or
nearly empty?) and tell it to hurry up and add some more buffers to
the freelist.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SEGFAULT on SELECT * FROM view

2012-01-03 Thread Robert Haas
On Mon, Jan 2, 2012 at 1:42 AM, chris r. chri...@gmx.net wrote:
 I ported the entire schema to my test DB server and could not reproduce
 the error there. Note that probably recreating the view solves this
 issue. Given this, how should I proceed to create a test case? Any
 tutorial on this? (I'm not too familiar with all this yet.)

 It's possibly statistics-dependent; make sure you have the same stats
 targets on both DBs, and try re-analyzing a few times.  Check other
 planner parameters are the same, too.
 In addition to the schema, I now also copied parts of the data in our
 production system to the testing DB. (I cannot copy all data, as it's
 too large for the testing DB.) After a couple of VACUUM FULL ANALYZEs,
 the bug still doesn't reproduce.

 In our production environment, I'm working with the re-created view that
 doesn't show the SEGFAULT behavior. I can live with this workaround, as
 only one particular view seems to be affected, but if you want me to
 debug more on the error please let me know.

It would be awfully nice to figure out what was going on.  Another
thing that would help, if you can still reproduce the error, would be
to get a stack trace.  The easiest way would probably be to enable
core dumps (pg_ctl -c start, or ulimit -c unlimited followed by pg_ctl
start).  Then, when you get a core, which might end up inside $PGDATA
or perhaps elsewhere depending on your system configuration, do:

gdb /path/to/postgres /path/to/corefile

and then from within gdb:

bt

...to get a backtrace.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 9:33 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I'd suppose it interesting to add a table to pg_catalog containing this data.

 - it is useless overhead

I tend to agree.

 I am expecting so definition some constants in Perl, Python is simple

Presumably one could instead write a script to transform the list of
constants into a .pm file that could be loaded into the background, or
whatever PL/python's equivalent of that concept is.  Not sure if
there's a better way to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-03 Thread Robert Haas
On Fri, Dec 30, 2011 at 11:58 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On 12/29/11, Ants Aasma ants.aa...@eesti.ee wrote:
 Unless I'm missing something, double-writes are needed for all writes,
 not only the first page after a checkpoint. Consider this sequence of
 events:

 1. Checkpoint
 2. Double-write of page A (DW buffer write, sync, heap write)
 3. Sync of heap, releasing DW buffer for new writes.
  ... some time goes by
 4. Regular write of page A
 5. OS writes one part of page A
 6. Crash!

 Now recovery comes along, page A is broken in the heap with no
 double-write buffer backup nor anything to recover it by in the WAL.

 Isn't 3 the very definition of a checkpoint, meaning that 4 is not
 really a regular write as it is the first one after a checkpoint?

I think you nailed it.

 But it doesn't seem safe to me replace a page from the DW buffer and
 then apply WAL to that replaced page which preceded the age of the
 page in the buffer.

That's what LSNs are for.

If we write the page to the checkpoint buffer just once per
checkpoint, recovery can restore the double-written versions of the
pages and then begin WAL replay, which will restore all the subsequent
changes made to the page.  Recovery may also need to do additional
double-writes if it encounters pages that for which we wrote WAL but
never flushed the buffer, because a crash during recovery can also
create torn pages.  When we reach a restartpoint, we fsync everything
down to disk and then nuke the double-write buffer.  Similarly, in
normal running, we can nuke the double-write buffer at checkpoint
time, once the fsyncs are complete.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Yes, I know that these only appeared in GCC 4.6+ and as such are a
 relatively recent phenomenon, but there has been some effort to
 eliminate them, and if I could get a non-hacked -Werror build I'd feel
 happy enough about excluding them as already outlined.

I just do this:

echo COPT=-Werror  src/Makefile.custom

...which seems to work reasonably well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_regress: Replace exit_nicely() with exit() plus atexit() hook

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 6:29 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2012-01-02 at 17:27 -0500, Andrew Dunstan wrote:

 On 01/02/2012 04:37 PM, Peter Eisentraut wrote:
  On mån, 2012-01-02 at 15:55 -0500, Andrew Dunstan wrote:
  On 01/02/2012 03:12 PM, Peter Eisentraut wrote:
  pg_regress: Replace exit_nicely() with exit() plus atexit() hook
 
  This appears to have broken the buildfarm.
  I think you mean it has caused the build to fail on some buildfarm
  members.  AFAICT, the buildfarm itself is still intact.

 This is hardly the first use of this idiom.

 But it's about as helpful as a bug report saying help it's broken.

I don't see why.  Normally you can just go to buildfarm.postgresql.org
and see which machines are failing and at what stage, and the view the
stage logs to see the specific errors.  It's not the best web
interface I've ever seen, but it's not *that* bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] controlling the location of server-side SSL files

2012-01-03 Thread Robert Haas
On Tue, Jan 3, 2012 at 6:25 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2012-01-02 at 23:42 -0500, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On mån, 2012-01-02 at 15:47 +0100, Magnus Hagander wrote:
  Were you thinking one option pointing to a directory or one option per
  file?

  One option per file:

 That seems like serious overkill.  Why not one option specifying the
 directory?  What use-case is there for letting them be in different
 directories, let alone letting the DBA choose random names for each one?

 [ reasons ]

I agree with these reasons.  We don't get charged $0.50 per GUC, so
there's no particular reason to contort things to have fewer of them.
It's nice where it's possible, of course, but not when it makes people
contort things to support the way we think our users should lay out
their filesystem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add SPI results constants available for PL/*

2012-01-03 Thread Andrew Dunstan



On 01/03/2012 08:40 PM, Robert Haas wrote:

On Tue, Jan 3, 2012 at 9:33 AM, Pavel Stehulepavel.steh...@gmail.com  wrote:

I'd suppose it interesting to add a table to pg_catalog containing this data.

- it is useless overhead

I tend to agree.


I am expecting so definition some constants in Perl, Python is simple

Presumably one could instead write a script to transform the list of
constants into a .pm file that could be loaded into the background, or
whatever PL/python's equivalent of that concept is.  Not sure if
there's a better way to do it.


Yeah, I'm with you and Pavel. Here's my quick perl one-liner to produce 
a set of SPI_* constants for pl/perl. I'm looking at the best way to 
include this in the bootstrap code.


   perl -ne 'BEGIN { print use constant\n{\n; } END { print };\n; }
   print \t$1 = $2,\n if /#define (SPI_\S+)\s+\(?(-?\d+)\)?/;'
   src/include/executor/spi.h


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ordering op for WHERE

2012-01-03 Thread Robert Haas
On Tue, Dec 27, 2011 at 10:48 PM, YAMAMOTO Takashi
y...@mwd.biglobe.ne.jp wrote:
 does it make sense to teach the planner (and the executor?) use an ordering op
 to optimize queries like the following?

        select * from t where a - 1000  10

Seems useful to me.  I'm not sure how hard it is, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_regress: Replace exit_nicely() with exit() plus atexit() hook

2012-01-03 Thread Andrew Dunstan



On 01/03/2012 09:00 PM, Robert Haas wrote:

Normally you can just go to buildfarm.postgresql.org
and see which machines are failing and at what stage, and the view the
stage logs to see the specific errors.  It's not the best web
interface I've ever seen, but it's not *that* bad.


And if someone wants to improve it they are welcome. The code is at 
https://github.com/PGBuildFarm/server-code. It helps if you know 
Template Toolkit :-)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standalone synchronous master

2012-01-03 Thread Robert Haas
On Tue, Dec 27, 2011 at 6:39 AM, Alexander Björnhagen
alex.bjornha...@gmail.com wrote:
 And so we get back to the three likelihoods in our two-node setup :

 1.The master fails
  - Okay, promote the standby

 2.The standby fails
  - Okay, the system still works but you no longer have data
 redundancy. Deal with it.

 3.Both fail, together or one after the other.

It seems to me that if you are happy with #2, you don't really need to
enable sync rep in the first place.

At any rate, even without multiple component failures, this
configuration makes it pretty easy to lose durability (which is the
only point of having sync rep in the first place).  Suppose the NIC
card on the master is the failing component.  If it happens to drop
the TCP connection to the clients just before it drops the connection
to the standby, the standby will have all the transactions, and you
can fail over just fine.  If it happens to drop the TCP connection to
the just before it drops the connection to the clients, the standby
will not have all the transactions, and failover will lose some
transactions - and presumably you enabled this feature in the first
place precisely to prevent that sort of occurrence.

I do think that it might be useful to have this if there's a
configurable timeout involved - that way, people could say, well, I'm
OK with maybe losing transactions if the standby has been gone for X
seconds.  But if the only possible behavior is equivalent to a
zero-second timeout I don't think it's too useful.  It's basically
just going to lead people to believe that their data is more secure
than it really is, which IMHO is not helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting -Werror in CFLAGS

2012-01-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Yes, I know that these only appeared in GCC 4.6+ and as such are a
 relatively recent phenomenon, but there has been some effort to
 eliminate them, and if I could get a non-hacked -Werror build I'd feel
 happy enough about excluding them as already outlined.

 I just do this:
 echo COPT=-Werror  src/Makefile.custom
 ...which seems to work reasonably well.

I see no point in -Werror whatsoever.  If you aren't examining the make
output for warnings, you're not following proper development practice
IMO.  gcc is not the only tool we use in the build process, so if you
are relying on -Werror to call attention to everything you should be
worrying about, you lost already.

I'm also less than thrilled with the idea that whatever the gcc boys
decide to make a warning tomorrow will automatically become a MUST FIX
NOW for us.  If you don't see why this is a problem, try building any
PG release more than a few months old on latest and greatest gcc.
(Of note here, latest-and-greatest is changing again this week, at least
for Fedora, and I fully expect 4.7 to start whinging about things we
never heard of before.)  Moving-target warnings are one thing,
moving-target hard errors are another thing entirely.

Personally I tend to do something like

make -j4 make.out 2make.err
cat make.err

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_internal.init and an index file have the same inode

2012-01-03 Thread Tom Lane
Daniel Farina dan...@heroku.com writes:
 And, even more interestingly,

 $ stat 16587.8 pg_internal.init
   File: `16587.8'
   Size: 98532 Blocks: 200IO Block: 4096   regular file
 Device: fe00h/65024d  Inode: 1073741952  Links: 1
 Access: (0600/-rw---)  Uid: (  107/postgres)   Gid: (  115/postgres)
 Access: 2011-12-14 23:35:35.630043643 +
 Modify: 2011-12-14 23:35:35.630043643 +
 Change: 2011-12-14 23:35:35.630043643 +
   File: `pg_internal.init'
   Size: 98532 Blocks: 200IO Block: 4096   regular file
 Device: fe00h/65024d  Inode: 1073741952  Links: 1
 Access: (0600/-rw---)  Uid: (  107/postgres)   Gid: (  115/postgres)
 Access: 2011-12-14 23:35:35.630043643 +
 Modify: 2011-12-14 23:35:35.630043643 +
 Change: 2011-12-14 23:35:35.630043643 +

 Most notably, the inode numbers are the same.  At first, I thought
 this was a file descriptor race in PG, but then I noticed the file
 system only reports *one* link: that doesn't look like a valid state
 for XFS.

Yeah.  In any case it wouldn't be a PG bug, because we don't issue any
link(2) calls AFAIR.  File an XFS bug.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] controlling the location of server-side SSL files

2012-01-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 3, 2012 at 6:25 PM, Peter Eisentraut pete...@gmx.net wrote:
 [ reasons ]

 I agree with these reasons.  We don't get charged $0.50 per GUC, so
 there's no particular reason to contort things to have fewer of them.

Well, there definitely is a distributed cost to each additional GUC.
Peter's given what are probably adequate reasons to add several of them
here, but that doesn't mean we should not ask the question whether each
new GUC is really necessary.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL:2011 features

2012-01-03 Thread temporalcraig
I assume you guys know where to go to get the complete sql:2011 spec:

http://www.iso.org/iso/search.htm?qt=9075searchSubmit=Searchsort=reltype=simplepublished=true

But if time/money is an issue the following seems to be the best publicly
available description of the temporal features:

http://metadata-standards.org/Document-library/Documents-by-number/WG2-N1501-N1550/WG2_N1536_koa046-Temporal-features-in-SQL-standard.pdf
 
If/as you all make progress with this we would love to hear about it in the
Temporal Data group on LinkedIn, where we are discussing temporal data in
general and the SQL:2011 extensions in particular.

http://www.linkedin.com/groups?home=gid=3885228

We have had some excellent input from Jeff Davis in the past. 

Craig

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/SQL-2011-features-tp5114317p5116875.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?

2012-01-03 Thread Jim Nasby
On Jan 3, 2012, at 7:34 PM, Robert Haas wrote:
 On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby j...@nasby.net writes:
 Yeah, but the problem we run into is that with every backend trying to run 
 the clock on it's own we end up with high contention again... it's just in 
 a different place than when we had a true LRU. The clock sweep might be 
 cheaper than the linked list was, but it's still awfully expensive. I 
 believe our best bet is to have a free list that is actually useful in 
 normal operations, and then optimize the cost of pulling buffers out of 
 that list as much as possible (and let the bgwriter deal with keeping 
 enough pages in that list to satisfy demand).
 
 Well, maybe, but I think the historical evidence suggests that that
 approach will be a loser, simply because no matter how cheap, the
 freelist will remain a centralized and heavily contended data structure.
 IMO we need to be looking for a mechanism that has no single point of
 contention, and modifying the clock sweep rules looks like the best way
 to get there.
 
 Still, he who wants to do the work can try whatever approach he feels
 like.
 
 It might be possible to partition the free list.  So imagine, for
 example, 8 free lists.  The background writer runs the clock sweep and
 finds some buffers that are about ready to be reallocated and
 distributes one-eighth of them to each free list.  Then, when a
 backend wants to allocate a buffer, it picks a free list somehow
 (round robin?) and pulls a buffer off it.  If the buffer turns out to
 have been used since it was added to the free list, we give up on it
 and try again.  This hopefully shouldn't happen too often, though, as
 long as we only add enough buffers to the free list to satisfy the
 requests that we expect to get over the next
 small-fraction-of-a-second.
 
 Of course you have to think about what happens if you find that your
 chosen free list is empty.  In that case you probably want to try a
 different one, and if in the worst case where they're all empty, run
 the clock sweep in the foreground.  You probably also want to kick the
 background writer in the pants if even a single one is empty (or
 nearly empty?) and tell it to hurry up and add some more buffers to
 the freelist.

If it comes down to it, we can look at partitioning the free list. But here's 
the thing: this is the strategy that FreeBSD (and I think now Linux as well) 
use to service memory requests, be they for free memory or for reading data 
from disk. If it's good enough for an OS to use, I would expect we could make 
it work as well.

I would expect that pulling a page off the free list would be an extremely fast 
operation... lock the read lock on the list (we should probably have separate 
locks for putting stuff on the list vs taking it off), pin the buffer (which 
shouldn't be contentious), update the read pointer and drop the read lock. Even 
in C code that's probably less than 100 machine instructions, and no looping. 
Compared to the cost of running a clock sweep it should be significantly 
faster. So while there will be contention on the free list read lock, none of 
that contention should last very long.

Do we have any other places where we take extremely short locks like this and 
still run into problems?
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PL/Perl Does not Like vstrings

2012-01-03 Thread David E. Wheeler
Is this perhaps by design?

Oy, this doesn’t look good:

$ do LANGUAGE plperl $$ elog(NOTICE, $^V) $$;
ERROR:  server conn crashed?
ERROR:  server conn crashed?
The connection to the server was lost. Attempting reset: Succeeded.
(pgxn@localhost:5900) 06:44:42 [pgxn] 
$ 

Best,

David

smime.p7s
Description: S/MIME cryptographic signature


[HACKERS] BGWriter latch, power saving

2012-01-03 Thread Peter Geoghegan
As part of the ongoing effort to reduce wake-ups when idle/power
consumption, the attached patch modifies the background writer to
hibernate in ten second bursts once the bgwriter laps the clock sweep.
It's fairly well commented, so a description of how it works here
would probably be redundant. The most controversial aspect of this
patch is the fact that it adds a SetLatch() call to MarkBufferDirty(),
though I went to some lengths to ensure that that call will very
probably find the latch already set when it actually matters, so it
only checks a single flag.

Thoughts? I can produce benchmarks, if that helps, but I think it's
fairly unlikely that the patch introduces a measurable performance
regression.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
new file mode 100644
index e3ae92d..0fcaf01
*** a/src/backend/bootstrap/bootstrap.c
--- b/src/backend/bootstrap/bootstrap.c
*** AuxiliaryProcessMain(int argc, char *arg
*** 416,421 
--- 416,422 
  
  		case BgWriterProcess:
  			/* don't set signals, bgwriter has its own agenda */
+ 			ProcGlobal-bgwriterLatch = MyProc-procLatch; /* Expose */
  			BackgroundWriterMain();
  			proc_exit(1);		/* should never return */
  
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
new file mode 100644
index fc1a579..a5248ba
*** a/src/backend/port/unix_latch.c
--- b/src/backend/port/unix_latch.c
***
*** 50,55 
--- 50,56 
  #include miscadmin.h
  #include postmaster/postmaster.h
  #include storage/latch.h
+ #include storage/proc.h
  #include storage/shmem.h
  
  /* Are we currently in WaitLatch? The signal handler would like to know. */
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
new file mode 100644
index 1f8d2d6..ad8a056
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
***
*** 51,56 
--- 51,58 
  #include storage/ipc.h
  #include storage/lwlock.h
  #include storage/pmsignal.h
+ #include storage/proc.h
+ #include storage/procsignal.h
  #include storage/shmem.h
  #include storage/smgr.h
  #include storage/spin.h
*** static volatile sig_atomic_t shutdown_re
*** 73,78 
--- 75,82 
  /*
   * Private state
   */
+ #define BGWRITER_HIBERNATE_MS			1
+ 
  static bool am_bg_writer = false;
  
  /* Prototypes for private functions */
*** BackgroundWriterMain(void)
*** 123,129 
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, SIG_IGN);			/* reserve for ProcSignal */
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
--- 127,133 
  	pqsignal(SIGQUIT, bg_quickdie);		/* hard crash time */
  	pqsignal(SIGALRM, SIG_IGN);
  	pqsignal(SIGPIPE, SIG_IGN);
! 	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
  	pqsignal(SIGUSR2, SIG_IGN);
  
  	/*
*** BackgroundWriterMain(void)
*** 238,278 
  	for (;;)
  	{
  		/*
! 		 * Emergency bailout if postmaster has died.  This is to avoid the
! 		 * necessity for manual cleanup of all postmaster children.
  		 */
! 		if (!PostmasterIsAlive())
! 			exit(1);
! 
! 		if (got_SIGHUP)
  		{
! 			got_SIGHUP = false;
! 			ProcessConfigFile(PGC_SIGHUP);
! 			/* update global shmem state for sync rep */
  		}
! 		if (shutdown_requested)
  		{
  			/*
! 			 * From here on, elog(ERROR) should end with exit(1), not send
! 			 * control back to the sigsetjmp block above
  			 */
! 			ExitOnAnyError = true;
! 			/* Normal exit from the bgwriter is here */
! 			proc_exit(0);		/* done */
! 		}
  
! 		/*
! 		 * Do one cycle of dirty-buffer writing.
! 		 */
! 		BgBufferSync();
  
! 		/* Nap for the configured time. */
! 		BgWriterNap();
  	}
  }
  
  /*
   * BgWriterNap -- Nap for the configured time or until a signal is received.
   */
  static void
  BgWriterNap(void)
--- 242,337 
  	for (;;)
  	{
  		/*
! 		 * Do one cycle of dirty-buffer writing, potentially hibernating if
! 		 * there have been no buffers to write.
  		 */
! 		if (!BgBufferSync())
  		{
! 			/* Clock sweep was not lapped - nap for the configured time. */
! 			BgWriterNap();
  		}
! 		else
  		{
+ 			int res = 0;
  			/*
! 			 * Initiate hibernation by arming the process latch. This usage
! 			 * differs from the standard pattern for latches outlined in
! 			 * latch.h, where the latch is generally reset immediately after
! 			 * WaitLatch returns, in the next iteration of an infinite loop.
! 			 *
! 			 * It should only be possible to *really* set the latch from client
! 			 * backends while the bgwriter is idle, and not during productive
! 			 * cycles where buffers are written, or shortly thereafter. It's
! 			 * important that the SetLatch() call within the buffer manager
! 			 * usually inexpensively returns having found that the latch is
! 			 * already 

Re: [HACKERS] 16-bit page checksums for 9.2

2012-01-03 Thread Kevin Grittner
Kevin Grittner  wrote:
 
 if we define sum1 and sum2 as uint I don't see how we can get an
 overflow with 8k byes
 
I feel the need to amend that opinion.
 
While sum1 only needs to hold a maximum of (BLCKSZ * 255), which
would be adequate for a BLCKSZ up to 16 MB, sum2 needs to hold up to
a maximum of ((BLCKSZ * (BLCKSZ + 1) / 2) * 255) so a 32-bit unsigned
isn't even good for an 8 KB block size.  A uint64 for sum2 can handle
BLCKSZ up to 256 MB.  So we could define sum1 as uint (which is
presumably either uint32 or uint64) and sum2 as uint64 and for a
BLCKSZ up toe 16 MB we still don't need to find the modulo 255
numbers until we're done adding things up.
 
Does anyone think we need to support anything bigger than that? 
Making them both uint64 would support a BLCKSZ up to 256 MB without
needing to do the division inside the loop. but it might be slightly
slower on 32-bit builds.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BGWriter latch, power saving

2012-01-03 Thread Heikki Linnakangas

On 04.01.2012 07:58, Peter Geoghegan wrote:

As part of the ongoing effort to reduce wake-ups when idle/power
consumption, the attached patch modifies the background writer to
hibernate in ten second bursts once the bgwriter laps the clock sweep.
It's fairly well commented, so a description of how it works here
would probably be redundant. The most controversial aspect of this
patch is the fact that it adds a SetLatch() call to MarkBufferDirty(),
though I went to some lengths to ensure that that call will very
probably find the latch already set when it actually matters, so it
only checks a single flag.


I think SetBufferCommitInfoNeedsSave() needs the same treatment as 
MarkBufferDirty(). And it would probably be good to only set the latch 
if the buffer wasn't dirty already. Setting a latch that's already set 
is fast, but surely it's even faster to not even try.



Thoughts? I can produce benchmarks, if that helps, but I think it's
fairly unlikely that the patch introduces a measurable performance
regression.


Yeah, I'd like to see a micro-benchmark of a worst-case scenario. I'm a 
bit worried about the impact on systems with a lot of CPUs. If you have 
a lot of CPUs writing to the same cache line that contains the latch's 
flag, that might get expensive.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers