Re: [HACKERS] ERRCODE_READ_ONLY_SQL_TRANSACTION
Hi,

Tom Lane <t...@sss.pgh.pa.us> writes:
> Simon Riggs <si...@2ndquadrant.com> writes:
>> Hot Standby returns ERRCODE_READ_ONLY_SQL_TRANSACTION in most cases for illegal actions on a standby.
>
> I don't think I like this patch: you are promoting what are and ought to be very low-level internal sanity checks into user-facing errors (which among other things will require translation effort for the messages).

So it seems the last-9.2-CF deadline is making us a little too hasty. Apparently, as you're saying, there's no way to exercise those code paths from an SQL connection on a Hot Standby short of deploying a C-coded extension calling either GetNewTransactionId() or XLogInsert(), which means it's out of scope.

My quest was figuring out whether ERRCODE_READ_ONLY_SQL_TRANSACTION really is trustworthy as a signal that a “proxy” of some sort could transparently redirect the transaction to the master when it sees that error. I felt that we were missing something simple here, but after review I think we in fact already have all the pieces to achieve that with the current 9.2 code base.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
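For what it's worth, a proxy built on libpq could detect that condition by checking the SQLSTATE of a failed result. A minimal sketch, assuming the proxy already holds separate connections to the standby and the master (the redirect policy itself is not shown):

    #include <stdbool.h>
    #include <string.h>
    #include "libpq-fe.h"

    /*
     * Return true if the failed result reports SQLSTATE 25006,
     * i.e. ERRCODE_READ_ONLY_SQL_TRANSACTION.
     */
    static bool
    is_read_only_error(const PGresult *res)
    {
        const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

        return sqlstate != NULL && strcmp(sqlstate, "25006") == 0;
    }

On seeing that error, the proxy would replay the statement (or the buffered transaction) on its master connection instead of the standby one.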
Re: [HACKERS] ERRCODE_READ_ONLY_SQL_TRANSACTION
On Fri, Jan 13, 2012 at 8:55 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> I felt that we were missing something simple here, but after review I think we finally have all the pieces to achieve that with current 9.2 code base in fact.

Good, patch revoked. No time wasted, it was worth checking.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgbench post-connection command
On Thu, Jan 12, 2012 at 8:04 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
>> On Thu, Jan 12, 2012 at 5:32 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> More like \once ... any SQL command or meta command here ... if we want to extend the scripting language. But I'd be perfectly happy with a command-line switch that specifies a script file to be run once.
>>
>> Once per connection, yes?
>
> Hmmm ... good question. Heikki was speculating about doing CREATE TABLE or similar, which you'd want done only once period. But I see no very strong reason why cases like that couldn't be handled outside of pgbench. So yeah, once per connection.

OK, it's a TODO item, but I don't think I'll have time for it before the next CF. If we need it during testing of other patches then I'll write it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Inlining comparators as a performance optimisation
On Wed, 21 Sep 2011 18:13:07 +0200, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
>> On 21.09.2011 18:46, Tom Lane wrote:
>>> The idea that I was toying with was to allow the regular SQL-callable comparison function to somehow return a function pointer to the alternate comparison function,
>>
>> You could have a new function with a pg_proc entry, that just returns a function pointer to the qsort-callback.
>
> Yeah, possibly. That would be a much more invasive change, but cleaner in some sense. I'm not really prepared to do all the legwork involved in that just to get to a performance-testable patch though.

A few years ago I looked for a way to speed up COPY operations, and it turned out that COPY TO has a good optimization opportunity. At that time, for each datum, COPY TO would:

- test for nullness
- call an outfunc through fmgr
- the outfunc pallocs() a bytea or text, fills it with data, and returns it (sometimes it uses an extensible string buffer which may be repalloc()'d several times)
- COPY memcpy()s the returned data to a buffer and eventually flushes the buffer to the client socket.

I introduced a special write buffer with an on-flush callback (i.e., a close relative of the existing string buffer) -- in this case the callback was "flush to client socket" -- and several outfuncs (one per type) which took that buffer as an argument, besides the datum to output, and simply put the datum into the buffer, with the appropriate transformations (such as converting to bytea or text), flushing if needed. Then COPY TO BINARY of a constant-size datum turned into:

- one test for nullness
- one C function call
- one test to ensure appropriate space is available in the buffer (flush if needed)
- one htonl() and a memcpy of constant size, which the compiler turns into a couple of simple instructions

I recall measuring speedups of 2x-8x on COPY BINARY, less for text, but still large gains. Although eliminating fmgr call and palloc overhead was an important part of it, another large part was getting rid of memcpy()s, which the compiler turned into simple movs for known-size types -- a transformation that can be done only if the buffer write functions are inlined inside the outfuncs. Compilers love constants...

Additionally, code size growth was minimal, since I moved the old outfunc code into the new outfuncs and replaced the old fmgr-callable outfuncs with "create buffer with on-full callback = extend_and_repalloc(), pass to new outfunc(buffer, datum), return buffer". That is basically equivalent to the previous palloc()-based code, maybe with a few extra instructions.

When I submitted the patch for review, Tom rightfully pointed out that my way of obtaining the C function pointer sucked very badly (I don't remember how I did it, only that it was butt-ugly), but the idea was to get a quick measurement of what could be gained, and the result was positive. Unfortunately I had no time available to finish it and make it into a real patch; I'm sorry about that.

So why do I post in this sorting topic? It seems that, by bypassing fmgr for functions which are small, simple, and called lots of times, there is a large gain to be made, not only because of fmgr overhead but also because of the opportunity for new compiler optimizations, palloc removal, etc. However, in my experiment the argument and return types of the new functions were DIFFERENT from the old functions: the new ones do the same thing, but in a different manner.

One manner was suited to SQL-callable functions (i.e., palloc and return a bytea) and another one to writing large amounts of data (direct buffer write). Since both have very different requirements, being fast at both is impossible for the same function.

Anyway, all that rant boils down to: some functions could benefit from having two versions (while sharing almost all the code between them):

- a user-callable (fmgr) version (the current one)
- a C-callable version, usually with different parameters and return type

And it would be cool to have a way to grab a bare function pointer to the second one. Maybe an extra column in pg_proc would do (but then proargtypes and friends would describe only the SQL-callable version)? Or an extra table, pg_cproc? Or an in-memory hash: hashtable[fmgr-callable function] = C version -- but what happens if a C function has no SQL-callable equivalent? Or (ugly) introduce an extra per-type function type_get_function_ptr(function_kind) which returns the requested function pointer.

If one of those happens, I'll dust off my old copy-optimization patch ;)

Hmm... just my 2c

Regards,
Pierre
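To make the "special write buffer" idea concrete, the shape of it was roughly as follows -- a from-memory sketch, not the original patch, and all of the names here are invented:

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    /*
     * A write buffer that calls a caller-supplied callback when it fills up.
     * For COPY TO the callback sent the buffer contents to the client socket;
     * for the fmgr-callable wrapper it just repalloc()'d a larger buffer.
     * The callback is expected to leave len at 0 (or grown capacity).
     */
    typedef struct OutBuffer
    {
        char   *data;
        int     len;
        int     capacity;
        void  (*flush) (struct OutBuffer *buf);
    } OutBuffer;

    /* Make sure "needed" bytes fit, flushing first if necessary. */
    static inline void
    out_buffer_reserve(OutBuffer *buf, int needed)
    {
        if (buf->len + needed > buf->capacity)
            buf->flush(buf);
    }

    /*
     * Binary output of an int4 straight into the buffer: no fmgr call, no
     * palloc, and a constant-size memcpy the compiler turns into a few movs.
     */
    static inline void
    int4_send_buf(OutBuffer *buf, int32_t value)
    {
        uint32_t    netval = htonl((uint32_t) value);

        out_buffer_reserve(buf, (int) sizeof(netval));
        memcpy(buf->data + buf->len, &netval, sizeof(netval));
        buf->len += sizeof(netval);
    }

The old fmgr-callable outfunc then becomes a thin wrapper that points the flush callback at a repalloc()-based grow function, so both call paths share the same body.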
[HACKERS] replay_location indicates incorrect location
Hi,

When I looked at pg_stat_replication just after starting the standby, before executing any write transactions on the master, I found that replay_location indicated an incorrect location, different from sent/write/flush_location. Then, once I ran a write transaction on the master, replay_location indicated the same location as the others.

The cause of this problem is that XLogCtl->recoveryLastRecPtr, which is what replay_location reports, is initialized with the wrong variable, ReadRecPtr. Instead, it should be initialized with EndRecPtr. The attached patch does that. This needs to be backported to 9.0.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** StartupXLOG(void)
*** 6407,6413 ****
  	 */
  	SpinLockAcquire(&xlogctl->info_lck);
  	xlogctl->replayEndRecPtr = ReadRecPtr;
! 	xlogctl->recoveryLastRecPtr = ReadRecPtr;
  	xlogctl->recoveryLastXTime = 0;
  	xlogctl->currentChunkStartTime = 0;
  	xlogctl->recoveryPause = false;
--- 6407,6413 ----
  	 */
  	SpinLockAcquire(&xlogctl->info_lck);
  	xlogctl->replayEndRecPtr = ReadRecPtr;
! 	xlogctl->recoveryLastRecPtr = EndRecPtr;
  	xlogctl->recoveryLastXTime = 0;
  	xlogctl->currentChunkStartTime = 0;
  	xlogctl->recoveryPause = false;
Re: [HACKERS] Standalone synchronous master
At this point I feel that this new functionality might be a bit overkill for Postgres; maybe it's better to stay lean and mean rather than add a controversial feature like this.

I also agree that a more general replication timeout variable would be more useful to a larger audience, but that would in my view add more complexity to the replication code, which is quite simple and understandable right now ...

Anyway, my backup plan was to achieve the same thing by triggering on the logging produced on the primary server and switching to async mode when detecting that the standby replication link has failed (and then back again when it is restored). In effect I would put a replication monitor on the outside of the server instead of embedding it.

/A
[HACKERS] CLONE TABLE DATA TO new_table
Hello,

I wonder if it would be possible to have a fast table clone function (data only) that copies the corresponding data files instead of using the CREATE TABLE AS approach. pg_upgrade seems to have such a mechanism, though it requires first stopping the server...

This would of course require locking the complete table and ensuring that all latest changes are flushed to the platters. I don't know what the plans are about switching from UNLOGGED to LOGGED tables, but I guess this might be required so that logging of the table starts only after the copy.

Background: I have daily tables with hourly imports which may contain 100 million rows and require 7 indexes on them. In order to improve import performance, I first make a copy of the active table, import the new data, and rebuild the indexes.

Thanks for your great job,

Marc Mamin
Re: [HACKERS] replay_location indicates incorrect location
On Fri, Jan 13, 2012 at 10:04 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> When I looked at pg_stat_replication just after starting the standby before executing any write transactions on the master, I found that replay_location indicated incorrect location different from sent/write/flush_location. Then, if I ran write transaction on the master, replay_location indicated the same location as the others.
>
> The cause of this problem is that XLogCtl->recoveryLastRecPtr which points to replay_location is initialized with wrong variable ReadRecPtr. Instead, it should be initialized with EndRecPtr. Attached patch does that. This needs to be backported to 9.0.

Agreed. Will commit.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] read transaction and sync rep
Hi,

I found that even a read transaction waits for sync rep when it generates WAL records, even if no XID is assigned. For example, imagine the case where a SELECT query does a heap clean operation and generates an XLOG_HEAP2_CLEAN record. ISTM that such a read transaction doesn't need to wait for sync rep because it's not visible to the client...

Can we skip waiting for sync rep if no XID is assigned?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] read transaction and sync rep
On Fri, Jan 13, 2012 at 11:30 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> I found that even read transaction waits for sync rep when it generates WAL records even if XID is not assigned. For example, imagine the case where SELECT query does a heap clean operation and generates XLOG_HEAP2_CLEAN record. ISTM that such a read transaction doesn't need to wait for sync rep because that's not visible to the client... Can we skip waiting for sync rep if XID is not assigned?

Your example of XLOG_HEAP2_CLEAN records is a good one, but there are other record types and circumstances, as described in the comment near the top of RecordTransactionCommit():

    /*
     * If we didn't create XLOG entries, we're done here; otherwise we
     * should flush those entries the same as a commit record.  (An
     * example of a possible record that wouldn't cause an XID to be
     * assigned is a sequence advance record due to nextval() --- we want
     * to flush that to disk before reporting commit.)
     */
    if (!wrote_xlog)
        goto cleanup;

Perhaps there is a case for saying that sequences don't need to be flushed if all they did was nextval() with no change associated with that; I'm not sure.

So I think there is a case for optimisation using finer-grained decision making.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] New replication mode: write
On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> Thought? Comments?
>>
>> This is almost exactly the same as my patch series syncrep_queues.v[1,2].patch earlier this year. Which I know because I was updating that patch myself last night for 9.2. I'm about half way through doing that, since you and I agreed in Ottawa I would do this. Perhaps it is better if we work together?
>
> I think this comment is mostly pointless. We don't have time to work together and there's no real reason to. You know what you're doing, so I'll leave you to do it. Please add the Apply mode.

OK, will do.

> In my patch, the reason I avoided doing WRITE mode (which we had previously referred to as RECV) was that no fsync of the WAL contents takes place. In that case we are applying changes using un-fsynced WAL data and in case of crash this would cause a problem.

My patch has not changed the execution order of WAL flush and replay. WAL records are always replayed after they are flushed by walreceiver, so such a problem doesn't happen. But that means a transaction might need to wait for a WAL flush caused by a previous transaction even if WRITE mode is chosen, which limits the performance gain from WRITE mode and should be improved later, I think.

> I was going to make the WalWriter available during recovery to cater for that. Do you not think that is no longer necessary?

That's still necessary to improve the performance of sync rep further, I think. What I'd like to do (maybe in 9.3dev) after supporting WRITE mode is:

* Allow WAL records to be replayed before they are flushed to disk.
* Add a new GUC parameter specifying whether to allow the standby to defer WAL flush. If the parameter is false, walreceiver flushes WAL whenever it receives WAL (i.e., the same as the current behavior). If true, walreceiver doesn't flush WAL at all; instead, walwriter, a backend, or the startup process does that. Walwriter periodically checks whether there is an un-flushed WAL file and flushes it if one exists. When a buffer page is written out, the backend or startup process forces a WAL flush up to the buffer's LSN.

If the above GUC parameter is set to true (i.e., walreceiver doesn't flush WAL at all) and WRITE mode is chosen, a transaction doesn't need to wait for any WAL flush on the standby at all. Also, the frequency of WAL flush on the standby would become lower, which significantly reduces the I/O load. All in all, the performance of sync rep would improve very much.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] New replication mode: write
On Fri, Jan 13, 2012 at 12:27 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> In my patch, the reason I avoided doing WRITE mode (which we had previously referred to as RECV) was that no fsync of the WAL contents takes place. In that case we are applying changes using un-fsynced WAL data and in case of crash this would cause a problem.
>
> My patch has not changed the execution order of WAL flush and replay. WAL records are always replayed after they are flushed by walreceiver. So, such a problem doesn't happen. But which means that transaction might need to wait for WAL flush caused by previous transaction even if WRITE mode is chosen. Which limits the performance gain by WRITE mode, and should be improved later, I think.

If the WALreceiver still flushes, that is OK. The latency would be smoother and lower if the WALwriter were active.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgbench post-connection command
There's one part of this that's still fuzzy in the spec that I'd like to clarify, if nothing else than for my own memory's sake--as the person most likely to review a random pgbench patch.

Simon gave an example like this:

  pgbench -x "SET synchronous_commit = off"

All are agreed this should take the name of a script instead on the command line:

  pgbench -x nosync

And execute that script as part of the initial setup of every connection.

Now: "nosync" might be a shell script, called similarly to \shell. That's more flexible in terms of its ability to do complicated setup things, say a more ambitious iteration along the 'create a table per connection' trail. But it can't really prime connection parameters anymore, so that's pretty worthless.

It seems it must then be a pgbench script like -f specifies instead. It will execute SQL and pgbench's \ meta commands. And I think that's OK so long as two things are nailed down (as in, tested to work and hopefully alluded to in the documentation):

1) The pgbench connection setup script can call \shell or \setshell. Then we've got a backdoor to also handle the complicated external script situation.

2) The connection setup script can set variables and they will still be active after passing control to the main test script. Returning to the table-per-connection sort of idea, that might be:

setup:
  \setrandom tablename 1 100
  CREATE TABLE test_:tablename;

main:
  SELECT count(*) FROM test_:tablename;

I would use this feature all the time once it was added, so I'm glad to see the idea pop up. I also have a long-standing personal TODO to write a doc update in this area: suggest how to use environment variables to sneak settings into things. There have been a couple of ideas for pgbench proposed that were blocked with "you can already do that by setting PGxxx before calling pgbench". That is true, but not obvious, and periodically it gets reinvented again.

--
 Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] CLONE TABLE DATA TO new_table
On Fri, Jan 13, 2012 at 5:56 AM, Marc Mamin <m.ma...@intershop.de> wrote:
> I wonder if it would be possible to have a fast table clone function (data only) while copying the corresponding data files instead of using the CREATE TABLE AS way. pg_upgrade seems to have such a mechanisms, though it requires to first stop the server... This would of course require to lock the complete table and ensure that all latest changes are flushed to the plates.

I think it would be possible to implement this. In fact, it could probably be done as a contrib module.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] CLONE TABLE DATA TO new_table
This would be great, but I can't C :-(

Marc

-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Friday, January 13, 2012 14:12
To: Marc Mamin
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CLONE TABLE DATA TO new_table

On Fri, Jan 13, 2012 at 5:56 AM, Marc Mamin <m.ma...@intershop.de> wrote:
> I wonder if it would be possible to have a fast table clone function (data only) while copying the corresponding data files instead of using the CREATE TABLE AS way. pg_upgrade seems to have such a mechanisms, though it requires to first stop the server... This would of course require to lock the complete table and ensure that all latest changes are flushed to the plates.

I think it would be possible to implement this. In fact, it could probably be done as a contrib module.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
[HACKERS] Disabled features on Hot Standby
The open items list for 9.2 says

  "Index-only scans need to be disabled when running under Hot Standby"

There is no explanation, so please explain here so we can change it, if possible.

I'm not sure it's great policy to keep implementing things that don't work on HS, while not giving other people a proper chance to fix that.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Postgres ReviewFest Patch: URI connection string support for libpq
On 01/12/2012 11:07 PM, Nick Roosevelt wrote:
> Just reviewed the patch for adding URI connection string support for libpq. ... Seems like the file has changed since this patch was created. Please recreate the patch.

Thanks for the review.

In the CommitFest application, it looks like someone marked this patch "Returned with Feedback" after adding your review. Whoever did that, it wasn't the right next step for this one. "Returned with Feedback" suggests a patch has been pushed out of the CommitFest as not possible to commit yet. It's a final state--normally a submission going there is finished with, until the next CommitFest starts. In a case like this one, where a problem is identified but there's still time to fix it, the right next step is "Waiting on Author".

There are some more details about this at http://wiki.postgresql.org/wiki/Running_a_CommitFest , which we don't expect everyone to know because it isn't one of the popular pages to read. I've sorted out the problem for this one already; I'm just letting whoever made that change know about the process here.

--
 Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] Poorly thought out code in vacuum
On Fri, Jan 6, 2012 at 12:45 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Oh, that's brilliant. OK, I'll go try that.

All right, that test does in fact reveal the broken-ness of the current code, and the patch I committed upthread does seem to fix it, so I've committed that.

After further reflection, I'm thinking that skipping the *second* heap pass when a cleanup lock is not immediately available is safe even during an anti-wraparound vacuum (i.e. scan_all = true), because the only thing the second pass is doing is changing dead line pointers to unused, and failing to do that doesn't prevent relfrozenxid advancement: the dead line pointer doesn't contain an XID. The next vacuum will clean it up: it'll see that there's a dead line pointer, add that to the list of TIDs to be killed if seen during the index vacuum (it won't be, since we got that far the first time, but vacuum doesn't know or care), and then try again during the second heap pass to mark that dead line pointer unused. This might be worth a code comment, since it's not entirely obvious.

At any rate, it seems that the worst thing that happens from skipping a block during the second pass is that we postpone reclaiming a line pointer whose tuple storage has already been recovered, which is pretty far down in the weeds.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Patch to allow users to kill their own queries
On 01/03/2012 12:59 PM, Tom Lane wrote:
> Noah Misch <n...@leadboat.com> writes:
>> Regarding the other message, avoid composing a translated message from independently-translated parts.
>
> Yes. I haven't looked at the patch, but I wonder whether it wouldn't be better to dodge both of these problems by having the subroutine return a success/failure result code, so that the actual ereport can be done at an outer level where the full message (and hint) can be written directly.

Between your and Noah's comments I understand the issue and likely direction to sort this out now. Bounced forward to the next CommitFest and back on my plate again. Guess I'll be learning new and exciting things about translation this week.

--
 Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] Patch to allow users to kill their own queries
On Fri, Jan 13, 2012 at 14:42, Greg Smith <g...@2ndquadrant.com> wrote:
> On 01/03/2012 12:59 PM, Tom Lane wrote:
>> Noah Misch <n...@leadboat.com> writes:
>>> Regarding the other message, avoid composing a translated message from independently-translated parts.
>>
>> Yes. I haven't looked at the patch, but I wonder whether it wouldn't be better to dodge both of these problems by having the subroutine return a success/failure result code, so that the actual ereport can be done at an outer level where the full message (and hint) can be written directly.
>
> Between your and Noah's comments I understand the issue and likely direction to sort this out now. Bounced forward to the next CommitFest and back on my plate again. Guess I'll be learning new and exciting things about translation this week.

I should say that I have more or less finished this one off, since I figured you were working on more important things. Just a final round of cleanup and commit left, really, so you can take it off your plate again if you want to :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 8:27 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> The open items list for 9.2 says "Index-only scans need to be disabled when running under Hot Standby". There is no explanation, so please explain here so we can change it, if possible. Not sure its great policy to keep implementing things that don't work on HS, while not giving a proper chance for other people to fix that.

Well, I agree that it's not a great policy to implement lots of things that don't work on Hot Standby, but this particular issue is an old one.

http://archives.postgresql.org/pgsql-hackers/2008-10/msg01422.php

Since the xmin horizon on the standby may precede the xmin horizon on the master, we can't assume that a page is all-visible on the standby just because it's all-visible on the master.

The crash-safe visibility map code is in some ways a step forward for Hot Standby, because in previous releases we didn't WAL-log setting visibility map bits at all, and therefore a just-promoted Hot Standby server would have no bits set apart from those which had been continuously set since the base backup was taken, or those it happened to get by chance in full-page images. Even in old releases that's not particularly good, because it means that (1) sequential scans on a just-promoted standby will be slower, since those can leverage PD_ALL_VISIBLE to avoid retail MVCC checks on every tuple, and (2) the first VACUUM on each table on the standby after promotion will possibly need to scan a lot more of the table than was being scanned by a typical VACUUM on the old master, owing to the relevant bits in the visibility map not being set. 9.2 will be better in this regard, because the bits will propagate from master to standby as they're set on the master, but it's still not safe to rely on them until we've exited recovery.

I'm not sure I have a brilliant idea for how to fix this. We could skip replay of any WAL record that sets a visibility map or PD_ALL_VISIBLE bit unless the page is also all-visible under the standby's xmin horizon, but we'd need some way to know that, and we'd need to worry not only about the records emitted by vacuum that explicitly set the bit, but also about any place where we send the whole page and the bit goes along with it; we'd need to add code to go in and clear the bit. That seems a bit complex and error-prone. Or, we could try to lift this restriction in the special case when hot_standby_feedback is set, though I have a feeling that's not really robust - any time you lose the connection to the master, it'll lose your xmin holdback and possibly mark some things all-visible that really aren't on the standby, and then you reconnect and replay those WAL records and bad things happen.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 2:03 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Since the xmin horizon on the standby may precede the xmin horizon on the master

When hot_standby_feedback is turned on, the xmin of the standby is provided to the master to ensure the situation you describe never happens.

Surely that means the problem is solved, if we can confirm the setting of the parameter. So this seems like a cleanup issue that can/should be resolved.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] show Heap Fetches in EXPLAIN for index-only scans
I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output.

This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

explain-heap-fetches.patch
Description: Binary data
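The patch itself is attached rather than inlined; for readers of the archive, the shape of such instrumentation is roughly the following sketch (not necessarily what the attached patch does, which may differ in detail):

    /*
     * IndexOnlyScanState grows a counter, bumped in IndexOnlyNext() whenever
     * the visibility map bit is unset and we must visit the heap:
     */
    node->ioss_HeapFetches++;

    /* explain.c then reports it for Index Only Scan nodes under ANALYZE: */
    if (es->analyze)
        ExplainPropertyLong("Heap Fetches",
                            ((IndexOnlyScanState *) planstate)->ioss_HeapFetches,
                            es);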
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 10:17 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Fri, Jan 13, 2012 at 2:03 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> Since the xmin horizon on the standby may precede the xmin horizon on the master
>
> When hot_standby_feedback is turned on, the xmin of the standby is provided to the master to ensure the situation you describe never happens. Surely that means the problem is solved, if we can confirm the setting of the parameter. So this seems like a cleanup issue that can/should be resolved.

How do you respond to the concerns I raised about that approach in the last sentence of my previous email?

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On Fri, Jan 13, 2012 at 16:21, Robert Haas <robertmh...@gmail.com> wrote:
> I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

Agreed.

Would also be good to have counters in pg_stat_* for this, since you'd usually want to look at this kind of data over time as well. In your plans? ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Disabled features on Hot Standby
On Jan 13, 2012, at 8:03 AM, Robert Haas wrote:
> Or, we could try to lift this restriction in the special case when hot_standby_feedback is set, though I have a feeling that's not really robust - any time you lose the connection to the master, it'll lose your xmin holdback and possibly mark some things all-visible that really aren't on the standby, and then you reconnect and replay those WAL records and bad things happen.

Also, what happens if you started off with hot_standby_feedback turned off? New stuff won't just magically start working when you turn it on (or is that parameter settable only on restart?).

ISTM that hot standbys need some ability to store interim data that is only needed by the hot standby while older transactions are running. IIRC we've seen other places where we have this problem too. Perhaps it would be possible to keep older copies of pages around when there are older transactions running on the standby?

--
Jim C. Nasby, Database Architect                   j...@nasby.net
512.569.9461 (cell)                                http://jim.nasby.net
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On 13 January 2012 15:21, Robert Haas <robertmh...@gmail.com> wrote:
> I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

Good idea. The fact that that information wasn't available did bother me.

--
 Peter Geoghegan       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On Fri, Jan 13, 2012 at 10:29 AM, Magnus Hagander <mag...@hagander.net> wrote:
> On Fri, Jan 13, 2012 at 16:21, Robert Haas <robertmh...@gmail.com> wrote:
>> I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.
>
> Agreed. Would also be good to have counters in pg_stat_* for this, since you'd usually want to look at this kind of data over time as well. In your plans? ;)

Not really. I don't have a clear enough idea about what that should look like, and I expect a vigorous debate over the distributed cost of another counter. But I'm happy to have someone who feels more strongly about it than I do take up the cause.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 10:31 AM, Jim Nasby <j...@nasby.net> wrote:
> Perhaps it would be possible to keep older copies of pages around when there are older transactions running on the standby?

I've thought about that... it's basically a rollback segment, which for that matter might be judged superior to what we do on the master (i.e. VACUUM). Needless to say, that would be just a teensy bit more work than we're likely to have time for in 9.2. But maybe for 10.2...

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On 13 January 2012 15:31, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 10:29 AM, Magnus Hagander <mag...@hagander.net> wrote:
>> Would also be good to have counters in pg_stat_* for this, since you'd usually want to look at this kind of data over time as well. In your plans? ;)
>
> Not really. I don't have a clear enough idea about what that should look like, and I expect a vigorous debate over the distributed cost of another counter. But I'm happy to have someone who feels more strongly about it than I do take up the cause.

Maybe the ioss_HeapFetches field should be a uint32? That's all it's going to be on some platforms anyway.

--
 Peter Geoghegan       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Text comparison suddenly can't find collation?
Johann 'Myrkraverk' Oskarsson <joh...@2ndquadrant.com> writes:
> Why would a string comparison work in one case and not another? In the following example, it works to compare a and b, but not a and d. This is in a C module which calls
>   DirectFunctionCall2( text_le, d1, d2 );

As of 9.1, I'd expect that coding to fail every time. text_le needs to be passed a collation, and you aren't doing so. You need to be using DirectFunctionCall2Coll. Where to get the collation from might be an interesting question too, but without more context it's hard to guess what will be appropriate for you.

			regards, tom lane
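For illustration, a minimal sketch of the corrected call, assuming the database default collation is acceptable for the comparison (where to obtain a better collation OID depends on the caller's context):

    #include "postgres.h"
    #include "catalog/pg_collation.h"
    #include "utils/builtins.h"

    /* Compare two text datums, passing an explicit collation this time. */
    static bool
    text_datum_le(Datum d1, Datum d2)
    {
        return DatumGetBool(DirectFunctionCall2Coll(text_le,
                                                    DEFAULT_COLLATION_OID,
                                                    d1, d2));
    }

Inside a function reached through fmgr, PG_GET_COLLATION() is normally the right source for that OID; the hard-wired DEFAULT_COLLATION_OID above is only a placeholder.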
Re: [HACKERS] TG_DEPTH patch v1
[reviving discussion of another old patch]

In post http://archives.postgresql.org/pgsql-hackers/2011-06/msg00870.php Florian Pflug <f...@phlo.org> wrote:

>> Updated patch attached.
>
> Thanks. The trigger depth is incremented before calling the trigger function in ExecCallTriggerFunc() and decremented right afterwards, which seems fine - apart from the fact that the decrement is skipped in case of an error. The patch handles that by saving respectively restoring the value of pg_depth in PushTransaction() respectively {Commit,Abort}SubTransaction(). While I can't come up with a failure case for that approach, it seems rather fragile - who's to say that nested trigger invocations correspond that tightly to sub-transactions?

Good question -- is it reasonable to assume that if an error is thrown in a trigger, the state restored when the error is caught will be at the same trigger depth as when the transaction or subtransaction started? FWIW, the patch, using this technique, has been in production use with about 3,000 OLTP users for about six months, with no apparent problems. On the other hand, we don't do a lot with explicit subtransactions.

> At the very least this needs a comment explaining why this is safe,

Agreed.

> but I believe the same effect could be achieved more cleanly by a TRY/CATCH block in ExecCallTriggerFunc().

That occurred to me, but I was concerned about the cost of setting and clearing a longjmp target for every trigger call. It seems like I've seen comments about that being relatively expensive. Tracking one more int in the current subtransaction structures is pretty cheap, if that works.

> * Other in-core PLs
>
> As it stands, the patch only exports tg_depth to plpgsql functions, not to functions written in one of the other in-core PLs. It'd be good to change that, I believe - otherwise the other PLs become second-class citizens in the long run.

Are you suggesting that this be implemented as a special trigger variable in every PL, or that it simply be a regular function that returns zero when not in a trigger and some positive value when called from a trigger? The latter seems like a pretty good idea to me. If that is done, is there any point to *also* having a TG_DEPTH trigger variable in plpgsql? (I don't think there is.)

-Kevin
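For concreteness, the TRY/CATCH alternative being weighed would presumably look something like this inside ExecCallTriggerFunc() -- just a sketch to frame the cost question, with the depth variable name invented here:

    /* hypothetical counter tracking trigger nesting depth */
    static int  MyTriggerDepth = 0;

        MyTriggerDepth++;
        PG_TRY();
        {
            result = FunctionCallInvoke(&fcinfo);
        }
        PG_CATCH();
        {
            MyTriggerDepth--;
            PG_RE_THROW();
        }
        PG_END_TRY();
        MyTriggerDepth--;

Every trigger call would then pay a sigsetjmp(), which is exactly the overhead being weighed here against carrying one more int in the subtransaction state.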
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On Fri, Jan 13, 2012 at 10:41 AM, Peter Geoghegan <pe...@2ndquadrant.com> wrote:
> On 13 January 2012 15:31, Robert Haas <robertmh...@gmail.com> wrote:
>> On Fri, Jan 13, 2012 at 10:29 AM, Magnus Hagander <mag...@hagander.net> wrote:
>>> Would also be good to have counters in pg_stat_* for this, since you'd usually want to look at this kind of data over time as well. In your plans? ;)
>>
>> Not really. I don't have a clear enough idea about what that should look like, and I expect a vigorous debate over the distributed cost of another counter. But I'm happy to have someone who feels more strongly about it than I do take up the cause.
>
> Maybe the ioss_HeapFetches field should be a uint32? That's all it's going to be on some platforms anyway.

I originally had it that way, but then it didn't match what ExplainPropertyLong() was looking for. I didn't think it was worth further complicating explain.c for this...

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 3:22 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 10:17 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> When hot_standby_feedback is turned on, the xmin of the standby is provided to the master to ensure the situation you describe never happens. Surely that means the problem is solved, if we can confirm the setting of the parameter. So this seems like a cleanup issue that can/should be resolved.
>
> How do you respond to the concerns I raised about that approach in the last sentence of my previous email?

I think it should be you that comes up with a fix, not for me to respond to your concerns about how hard it is. Many things that don't fully work are rejected for that reason.

Having said that, I have input that seems to solve the problem.

Many WAL records have latestRemovedXid on them. We can use the same idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the latest vacrelstats->latestRemovedXid. That then creates a recovery snapshot conflict that would cancel any query that might then see a page of the visibility map that was written when the xmin was later than on the standby. If replication disconnects briefly and a vismap bit is updated in a way that would cause a problem, that is the same situation that already causes a problem with other record types.

If the problem is solved that way, we can enable IndexOnlyScans on the standby.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
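Concretely, the idea would be roughly along these lines (a sketch only; the extra field does not exist today and its name here is an assumption):

    /* xl_heap_visible would carry the horizon used to mark the page all-visible */
    typedef struct xl_heap_visible
    {
        RelFileNode   node;
        BlockNumber   block;
        TransactionId latestRemovedXid;     /* new field (sketch) */
    } xl_heap_visible;

    /* and heap_xlog_visible() would raise a snapshot conflict before replaying */
    if (InHotStandby)
        ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
                                            xlrec->node);

That would mirror how XLOG_HEAP2_CLEAN and btree deletion records already cancel conflicting standby queries.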
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
Robert Haas <robertmh...@gmail.com> writes:
> So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

Cosmetic gripes:

1. Please initialize the counter in ExecInitIndexOnlyScan. We don't generally rely on node fields to init as zeroes.

2. Project style is to use foo++, not ++foo, in contexts where it doesn't actually matter which is used.

			regards, tom lane
Re: [HACKERS] Postgres ReviewFest Patch: URI connection string support for libpq
> There are some more details about this at http://wiki.postgresql.org/wiki/Running_a_CommitFest , which we don't expect everyone to know because it isn't one of the popular pages to read. I've sorted out the problem for this one already, just letting whoever made that change know about the process here.

Yeah, I need to go through the new reviewers' comments. Several of them got ahead of the group and started marking things up on the CF page; should be easily cleaned up though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
[HACKERS] Review of: explain / allow collecting row counts without timing info
Eric's review follows:

Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine, from a 2012-01-12 git checkout. The patch applied fine.

'make check' results in failures when this patch is put into place: 6 of 128 tests failed. Here are the relevant failures.

parallel group (2 tests):  create_view create_index
     create_index              ... FAILED
parallel group (9 tests):  create_cast create_aggregate drop_if_exists typed_table vacuum constraints create_table_like triggers inherit
     inherit                   ... FAILED
parallel group (20 tests):  select_distinct_on btree_index update random select_distinct select_having namespace delete case hash_index union select_implicit select_into transactions portals subselect arrays aggregates join prepared_xacts
     join                      ... FAILED
     aggregates                ... FAILED
parallel group (15 tests):  combocid portals_p2 advisory_lock xmlmap tsdicts guc functional_deps dependency select_views cluster tsearch window foreign_data foreign_key bitmapops
     foreign_data              ... FAILED
parallel group (19 tests):  limit prepare copy2 conversion xml plancache returning temp sequence without_oid with rowtypes truncate polymorphism domain largeobject rangefuncs alter_table plpgsql
     alter_table               ... FAILED

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 11:13 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> I think it should be you that comes up with a fix, not for me to respond to your concerns about how hard it is. Many things that don't fully work are rejected for that reason.

Well, I disagree. The fact that all-visible info can't be trusted in standby mode is a problem that has existed since Hot Standby was committed, and I don't feel obliged to fix it just because I was involved in developing a new feature that happens to rely on all-visible info. I'm sorry to butt heads with you on this one, but this limitation has been long known and discussed many times before on pgsql-hackers, and I'm not going to drop everything and start working on this just because you seem to think that I should.

> Having said that, I have input that seems to solve the problem. Many WAL records have latestRemovedXid on them. We can use the same idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the latest vacrelstats->latestRemovedXid. That then creates a recovery snapshot conflict that would cancel any query that might then see a page of the vis map that was written when the xmin was later than on the standby.

That could create a lot of recovery conflicts when hot_standby_feedback=off, I think, but it might work when hot_standby_feedback=on. I don't fully understand the latestRemovedXid machinery, but I guess the idea would be to kill any standby transaction whose proc->xmin precedes the oldest committed xmin or xmax on the page. If hot_standby_feedback=on then there shouldn't be any, except in the case where it's just been enabled or the SR connection is bouncing.

Also, what happens if an all-visible bit gets set on the standby through some other mechanism - e.g. restored from an FPI or XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the visibility map page itself, but we certainly do it for the heap pages. So it might be that this infrastructure would (somewhat bizarrely) trust the visibility map bits but not the PD_ALL_VISIBLE bits.

I'm hoping Heikki or Tom will comment on this thread, because I think there are a bunch of subtle issues here and that we could easily screw it up by trying to plow through the problem too hastily.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] checkpoint writeback via sync_file_range
On Thu, Jan 12, 2012 at 7:26 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> On 1/11/12 9:25 AM, Andres Freund wrote:
>> The heavy pressure putting it directly in the writeback queue leads to less efficient io because quite often it won't reorder sensibly with other io anymore and thelike. At least that was my experience in using it with in another application.
>
> Sure, this is one of the things I was cautioning about in the Double Writes thread, with VACUUM being the worst such case I've measured. The thing to realize here is that the data we're talking about must be flushed to disk in the near future. And Linux will happily cache gigabytes of it. Right now, the database asks for that to be forced to disk via fsync, which means in chunks that can be as large as a gigabyte.
>
> Let's say we have a traditional storage array and there's competing activity. 10MB/s would be a good random I/O write rate in that situation. A single fsync that forces 1GB out at that rate will take *100 seconds*. And I've seen exactly that when trying to--about 80 seconds is my current worst checkpoint stall ever. And we don't have a latency vs. throughput knob any finer than that. If one is added, and you turn it too far toward latency, throughput is going to tank for the reasons you've also seen. Less reordering, elevator sorting, and write combining. If the database isn't going to micro-manage the writes, it needs to give the OS room to do that work for it.

Are there any IO benchmarking tools out there that benchmark the effects of reordering, elevator sorting, write combining, etc.? What I've seen is basically either completely sequential or completely random, with not much in between.

Cheers,

Jeff
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 5:08 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Also, what happens if an all-visible bit gets set on the standby through some other mechanism - e.g. restored from an FPI or XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the visibility map page itself, but we certainly do it for the heap pages. So it might be that this infrastructure would (somewhat bizarrely) trust the visibility map bits but not the PD_ALL_VISIBLE bits.
>
> I'm hoping Heikki or Tom will comment on this thread, because I think there are a bunch of subtle issues here and that we could easily screw it up by trying to plow through the problem too hastily.

An FPI can't change the all-visible flag. If it did, it would imply that we'd skipped generation or replay of an XLOG_HEAP2_VISIBLE record, or that we're doing crash recovery/initial startup and HS is not yet enabled.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Remembering bug #6123
Kevin Grittner <kevin.gritt...@wicourts.gov> writes:
> You were right. One of the isolation tests failed an assertion; things pass with the attached change, setting the cmax conditionally. Some comments updated. I hope this is helpful.

I worked over this patch a bit, mostly cosmetically; updated version attached. However, we're not there yet. I have a couple of remaining concerns:

1. I think the error message needs more thought. I believe it's possible to trigger this error not only with BEFORE triggers, but also with a volatile function used in the query that reaches around and updates row(s) of the target table. Now, I don't have the slightest qualms about breaking any apps that do anything so dirty, but should we try to generalize the message text to cope with that?

2. The HINT needs work too, as it seems pretty useless as it stands. I'd have expected some short reference to the technique of re-executing the update in the trigger and then returning NULL. (BTW, does this patch require any documentation changes, and if so where?)

3. I modified heap_lock_tuple to also return a HeapUpdateFailureData, principally on the grounds that its API should be largely parallel to heap_update, but having done that I can't help wondering whether its callers are okay with skipping self-updated tuples. I see that you changed EvalPlanQualFetch, but I'm not entirely sure that that is right; shouldn't it continue to ignore rows modified by the current command itself? And you did not touch the other two callers, which definitely have got issues. Here is an example, which is basically the reference count case refactored into a single self-referencing table, so that we can hit both parent and child rows in one operation:

create table test (
  id int primary key,
  parent int references test,
  data text,
  nchildren int not null default 0
);

create function test_ins_func() returns trigger language plpgsql as $$
begin
  if new.parent is not null then
    update test set nchildren = nchildren + 1 where id = new.parent;
  end if;
  return new;
end;
$$;

create trigger test_ins_trig before insert on test
  for each row execute procedure test_ins_func();

create function test_del_func() returns trigger language plpgsql as $$
begin
  if old.parent is not null then
    update test set nchildren = nchildren - 1 where id = old.parent;
  end if;
  return old;
end;
$$;

create trigger test_del_trig before delete on test
  for each row execute procedure test_del_func();

insert into test values (1, null, 'root');
insert into test values (2, 1, 'root child A');
insert into test values (3, 1, 'root child B');
insert into test values (4, 2, 'grandchild 1');
insert into test values (5, 3, 'grandchild 2');

update test set data = 'root!' where id = 1;
select * from test;
delete from test;
select * from test;

drop table test;
drop function test_ins_func();
drop function test_del_func();

When you run this, with or without the current patch, you get:

 id | parent |     data     | nchildren
----+--------+--------------+-----------
  2 |      1 | root child A |         1
  4 |      2 | grandchild 1 |         0
  3 |      1 | root child B |         1
  5 |      3 | grandchild 2 |         0
  1 |        | root!        |         2
(5 rows)

DELETE 4

 id | parent | data  | nchildren
----+--------+-------+-----------
  1 |        | root! |         0
(1 row)

the reason being that when the outer delete arrives at the last ("root!") row, GetTupleForTrigger fails because the tuple is already self-updated, and we treat that as canceling the outer delete action. I'm not sure what to do about this. If we throw an error there, there will be no way that the trigger can override the error because it will never get to run.
Possibly we could plow ahead with the expectation of throwing an error later if the trigger doesn't cancel the update/delete, but is it safe to do so if we don't hold lock on the tuple? In any case that idea doesn't help with the remaining caller, ExecLockRows. regards, tom lane diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 5f6ac2ec1fdaf7598084150d017d8c85bfeeebe5..eb94060371a41913c63b2187fb4b66012ebfaed3 100644 *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** simple_heap_insert(Relation relation, He *** 2317,2343 * * relation - table to be modified (caller must hold suitable lock) * tid - TID of tuple to be deleted - * ctid - output parameter, used only for failure case (see below) - * update_xmax - output parameter, used only for failure case (see below) * cid - delete command ID (used for visibility test, and stored into * cmax if successful) * crosscheck - if not InvalidSnapshot, also check tuple against this * wait - true if should wait for any conflicting update to commit/abort * * Normal, successful return value is HeapTupleMayBeUpdated, which * actually
Re: [HACKERS] Standalone synchronous master
On Fri, Jan 13, 2012 at 2:30 AM, Alexander Björnhagen alex.bjornha...@gmail.com wrote: At this point I feel that this new functionality might be a bit overkill for postgres, maybe it's better to stay lean and mean rather than add a controversial feature like this. I don't understand why this is controversial. In the current code, if you have a master and a single sync standby, and the master disappears and you promote the standby, now the new master is running *without a standby*. If you are willing to let the new master run without a standby, why are you not willing to let the old one do so if it were the standby which failed in the first place? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
Jeff Janes jeff.ja...@gmail.com writes: I don't understand why this is controversial. In the current code, if you have a master and a single sync standby, and the master disappears and you promote the standby, now the new master is running *without a standby*. If you configured it to use sync rep, it won't accept any transactions until you give it a standby. If you configured it not to, then it's you that has changed the replication requirements. If you are willing to let the new master run without a standby, why are you not willing to let the old one do so if it were the standby which failed in the first place? Doesn't follow. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] read transaction and sync rep
On Fri, Jan 13, 2012 at 3:44 AM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Jan 13, 2012 at 11:30 AM, Fujii Masao masao.fu...@gmail.com wrote: I found that even read transaction waits for sync rep when it generates WAL records even if XID is not assigned. For example, imagine the case where SELECT query does a heap clean operation and generates XLOG_HEAP2_CLEAN record. ISTM that such a read transaction doesn't need to wait for sync rep because that's not visible to the client... Can we skip waiting for sync rep if XID is not assigned? I think this applies not only to sync rep, but also the xlog sync on the master as well. That is, the transaction could just commit asynchronously. Your example of XLOG_HEAP2_CLEAN records is a good one but there are other record types and circumstances, as described in the comment near the top of RecordTransactionCommit /* * If we didn't create XLOG entries, we're done here; otherwise we * should flush those entries the same as a commit record. (An * example of a possible record that wouldn't cause an XID to be * assigned is a sequence advance record due to nextval() --- we want * to flush that to disk before reporting commit.) */ if (!wrote_xlog) goto cleanup; Perhaps there is a case to say that sequences don't need to be flushed if all they did was do nextval but no change was associated with that, I'm not sure. I think there that that is the case. If one wants to argue that someone could take the sequence value and do something with it outside the database and so that sequence can't be repeated after recovery, then the commit is the wrong place to make that argument. The flush would have to occur before the value is handed out, not before the transaction which obtained the value commits. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
Jeff Janes jeff.ja...@gmail.com wrote: I don't understand why this is controversial. I'm having a hard time seeing why this is considered a feature. It seems to me what is being proposed is a mode with no higher integrity guarantee than asynchronous replication, but latency equivalent to synchronous replication. I can see where it's tempting to want to think it gives something more in terms of integrity guarantees, but when I think it through, I'm not really seeing any actual benefit. If this fed into something such that people got jabber messages, emails, or telephone calls any time it switched between synchronous and stand-alone mode, that would make it a built-in monitoring, fail-over, and alert system -- which *would* have some value. But in the past we've always recommended external tools for such features. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: 3. I modified heap_lock_tuple to also return a HeapUpdateFailureData, principally on the grounds that its API should be largely parallel to heap_update, but having done that I can't help wondering whether its callers are okay with skipping self-updated tuples. I see that you changed EvalPlanQualFetch, but I'm not entirely sure that that is right; shouldn't it continue to ignore rows modified by the current command itself? I made that change when working on the approach where safe conflicts (like a DELETE from within the BEFORE DELETE trigger) were quietly ignored. Without that change, it didn't see the end of the ctid chain, and didn't behave correctly. I've been wondering if it is still needed. I had been planning to create a test case to try to show that it was. Maybe an UPDATE with a join to force multiple row updates from the primary statement, followed by a triggered UPDATE to the row. If that doesn't need the EvalPlanQualFetch change, I don't know what would. The EvalPlanQual code isn't really exercised by any existing regression tests, because it is only triggered when two sessions concurrently update the same row, something that we avoid for reproducibility's sake in the regression tests. I think we could now test it using the isolation test scaffolding, and maybe it would be a good idea to add some tests there. I did verify that whether that part of your patch is included or not makes no difference to any existing test. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Tom Lane t...@sss.pgh.pa.us wrote: I worked over this patch a bit, mostly cosmetically; updated version attached. Thanks! However, we're not there yet. I have a couple of remaining concerns: 1. I think the error message needs more thought. I believe it's possible to trigger this error not only with BEFORE triggers, but also with a volatile function used in the query that reaches around and updates row(s) of the target table. Now, I don't have the slightest qualms about breaking any apps that do anything so dirty, but should we try to generalize the message text to cope with that? I hadn't thought of that, but it does seem possible. Maybe after dealing with the other points, I'll work on a test case to show that behavior. I'm also fine with generating an error for such dirty tricks, and I agree that if that's indeed possible we should make the message general enough to cover that case. Nothing comes to mind at the moment, but I'll think on it. 2. The HINT needs work too, as it seems pretty useless as it stands. I'd have expected some short reference to the technique of re-executing the update in the trigger and then returning NULL. In the previous (rather long) thread on the topic, it seemed that most cases where people have hit this, the problem was easily solved by moving the offending code to the AFTER trigger. The technique of re-issuing the command didn't turn up until much later. I would bet it will be the right thing to do 20% of the time when people get such an error. I don't want to leave the 80% solution out of the hint, but if you don't think it's too verbose, I guess it would be a good thing to mention the 20% solution, too. (BTW, does this patch require any documentation changes, and if so where?) I've been wondering that. Perhaps a paragraph or two with an example on this page?: http://www.postgresql.org/docs/devel/static/trigger-definition.html 3. I modified heap_lock_tuple to also return a HeapUpdateFailureData, principally on the grounds that its API should be largely parallel to heap_update, but having done that I can't help wondering whether its callers are okay with skipping self-updated tuples. I see that you changed EvalPlanQualFetch, but I'm not entirely sure that that is right; shouldn't it continue to ignore rows modified by the current command itself? I made that change when working on the approach where safe conflicts (like a DELETE from within the BEFORE DELETE trigger) were quietly ignored. Without that change, it didn't see the end of the ctid chain, and didn't behave correctly. I've been wondering if it is still needed. I had been planning to create a test case to try to show that it was. Maybe an UPDATE with a join to force multiple row updates from the primary statement, followed by a triggered UPDATE to the row. If that doesn't need the EvalPlanQualFetch change, I don't know what would. And you did not touch the other two callers, which definitely have got issues. Here is an example [example] I'm not sure what to do about this. If we throw an error there, there will be no way that the trigger can override the error because it will never get to run. Possibly we could plow ahead with the expectation of throwing an error later if the trigger doesn't cancel the update/delete, but is it safe to do so if we don't hold lock on the tuple? In any case that idea doesn't help with the remaining caller, ExecLockRows. It will take me some time to work though the example and review the relevant code. 
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reprise: pretty print viewdefs
On 01/13/2012 12:31 AM, Hitoshi Harada wrote: So my conclusion is it's better than nothing, but we could do better job here. From timeline perspective, it'd be ok to apply this patch and improve more later in 9.3+. I agree, let's look at the items other than the target list during 9.3. But I do think this addresses the biggest pain point. Thanks for the review. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Multithread Query Planner
Hi folks. Is there any restriction on creating and starting threads inside Postgres? I'm trying to develop a multithreaded planner, and sometimes a memory access exception is raised. I'm debugging the code to see if there is a bug in the planner, but so far I haven't found one. I tried to use the same memory context as the root process, and also to create a new context for each new thread, but neither worked. Any tips? Att, Fred Sent via iPad -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multithread Query Planner
On Fri, Jan 13, 2012 at 3:14 PM, Frederico zepf...@gmail.com wrote: Hi folks. Is there any restriction on creating and starting threads inside Postgres? I'm trying to develop a multithreaded planner, and sometimes a memory access exception is raised. I'm debugging the code to see if there is a bug in the planner, but so far I haven't found one. I tried to use the same memory context as the root process, and also to create a new context for each new thread, but neither worked. Any tips?

Yes, don't try to use threads. http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F

... threads are not currently used instead of multiple processes for backends because:

- Historically, threads were poorly supported and buggy.
- An error in one backend can corrupt other backends if they're threads within a single process.
- Speed improvements using threads are small compared to the remaining backend startup time.
- The backend code would be more complex.
- Terminating backend processes allows the OS to cleanly and quickly free all resources, protecting against memory and file descriptor leaks and making backend shutdown cheaper and faster.
- Debugging threaded programs is much harder than debugging worker processes, and core dumps are much less useful.
- Sharing of read-only executable mappings and the use of shared_buffers means processes, like threads, are very memory efficient.
- Regular creation and destruction of processes helps protect against memory fragmentation, which can be hard to manage in long-running processes.

There's a pretty large burden of reasons *not* to use threads, and while some of them have diminished in importance, most have not. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Tom Lane t...@sss.pgh.pa.us wrote: [rearranging so results directly follow statements]

select * from test;

 id | parent |     data     | nchildren
----+--------+--------------+-----------
  2 |      1 | root child A |         1
  4 |      2 | grandchild 1 |         0
  3 |      1 | root child B |         1
  5 |      3 | grandchild 2 |         0
  1 |        | root!        |         2
(5 rows)

delete from test;
DELETE 4

select * from test;

 id | parent | data  | nchildren
----+--------+-------+-----------
  1 |        | root! |         0
(1 row)

And other minor updates to the data column can result in totally different sets of rows left over after a DELETE from the table with no WHERE clause. It makes me pretty queasy whenever the semantics of a statement can depend on the order of tuples in the heap. It's pretty hard to view this particular case as anything other than a bug.

the reason being that when the outer delete arrives at the last (root!) row, GetTupleForTrigger fails because the tuple is already self-updated, and we treat that as canceling the outer delete action. I'm not sure what to do about this. If we throw an error there, there will be no way that the trigger can override the error because it will never get to run. Possibly we could plow ahead with the expectation of throwing an error later if the trigger doesn't cancel the update/delete, but is it safe to do so if we don't hold lock on the tuple? In any case that idea doesn't help with the remaining caller, ExecLockRows.

I'm still trying to sort through what could be done at the source code level, but from a user level I would much rather see an error than such surprising and unpredictable behavior.

-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: I'm not sure what to do about this. If we throw an error there, there will be no way that the trigger can override the error because it will never get to run. Possibly we could plow ahead with the expectation of throwing an error later if the trigger doesn't cancel the update/delete, but is it safe to do so if we don't hold lock on the tuple? In any case that idea doesn't help with the remaining caller, ExecLockRows. I'm still trying to sort through what could be done at the source code level, but from a user level I would much rather see an error than such surprising and unpredictable behavior. I don't object to throwing an error by default. What I'm wondering is what would have to be done to make such updates work safely. AFAICT, throwing an error in GetTupleForTrigger would preclude any chance of coding a trigger to make this work, which IMO greatly weakens the argument that this whole approach is acceptable. In this particular example, I think it would work just as well to do the reference-count updates in AFTER triggers, and maybe the short answer is to tell people they have to do it like that instead of in BEFORE triggers. However, I wonder what use-case led you to file bug #6123 to begin with. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Kevin Grittner kevin.gritt...@wicourts.gov writes: I'm also fine with generating an error for such dirty tricks, and I agree that if that's indeed possible we should make the message general enough to cover that case. Nothing comes to mind at the moment, but I'll think on it. What do you think of ERROR: tuple to be updated was already modified by an operation triggered by the UPDATE command HINT: Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows. (s/update/delete/ for the DELETE case of course) The phrase triggered by seems slippery enough to cover cases such as a volatile function executed by the UPDATE. The HINT doesn't cover that case of course, but we have a ground rule that HINTs can be wrong. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Tom Lane t...@sss.pgh.pa.us wrote: What do you think of ERROR: tuple to be updated was already modified by an operation triggered by the UPDATE command HINT: Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows. (s/update/delete/ for the DELETE case of course) The phrase triggered by seems slippery enough to cover cases such as a volatile function executed by the UPDATE. The HINT doesn't cover that case of course, but we have a ground rule that HINTs can be wrong. Looks good to me. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.3 feature proposal: vacuumdb -j #
Hackers, It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. For example, just today I needed to manually analyze a database with over 500 tables, on a server with 24 cores. And I needed to know when the analyze was done, because it was part of a downtime. I had to resort to a python script. I'm picturing doing this in the simplest way possible: get the list of tables and indexes, divide them by the number of processes, and give each child process its own list. Any reason not to hack on this for 9.3? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers
On Tue, Nov 29, 2011 at 10:10 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Nov 29, 2011 at 3:37 AM, Daniel Farina dan...@heroku.com wrote: Hmm, just to prod this thread: has any fix for this been committed? After Nikhil confirmed that this bug could cause pg_dump to not be able to operate without direct catalog surgery I have encountered this bug (and treated its symptoms in the same manner) twice. I tried to do my homework in a backbranch, but am not seeing anything beaming out at me. I have plans to try to improve this, but it's one of those things that I care about more than the people who write the checks do, so it hasn't quite gotten to the top of the priority list yet. All right... I have a patch that I think fixes this, at least so far as relations are concerned. I rewrote RangeVarGetAndCheckCreationNamespace() extensively, did surgery on its callers, and then modified CREATE OR REPLACE VIEW and ALTER relkind .. SET SCHEMA to make use of it rather than doing things as they did before. So this patch prevents (1) concurrently dropping a schema in which a new relation is being created, (2) concurrently dropping a schema into which an existing relation is being moved, and (3) using CREATE OR REPLACE VIEW to attempt to obtain a lock on a relation you don't own (the command would eventually fail, of course, but if there were, say, an outstanding AccessShareLock on the relation, you'd queue up for the lock and thus prevent any further locks from being granted, despite your lack of any rights on the target). It doesn't fix any of the non-relation object types. That would be nice to do, but I think it's material for a separate patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company concurrent-drop-schema.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On 13.01.2012 22:50, Josh Berkus wrote: It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. For example, just today I needed to manually analyze a database with over 500 tables, on a server with 24 cores. And I needed to know when the analyze was done, because it was part of a downtime. I had to resort to a python script. I'm picturing doing this in the simplest way possible: get the list of tables and indexes, divide them by the number of processes, and give each child process its own list. Any reason not to hack on this for 9.3? I don't see any reason not to do it, but plenty of reasons to do it. Right now I have systems hosting many databases that I need to vacuum full from time to time. I have wrapped vacuumdb with a shell script to actually use all the capacity that is available. A vacuumdb -faz just isn't that useful on large machines anymore. Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On Friday, January 13, 2012 10:50:32 PM Josh Berkus wrote: Hackers, It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. For example, just today I needed to manually analyze a database with over 500 tables, on a server with 24 cores. And I needed to know when the analyze was done, because it was part of a downtime. I had to resort to a python script. I'm picturing doing this in the simplest way possible: get the list of tables and indexes, divide them by the number of processes, and give each child process its own list. That doesn't sound like a good idea. It's way too likely that you will end up with one backend doing all the work because it got some big tables. I don't think this task deserves using threads or subprocesses. Multiple connections from one process seem way more sensible and mostly avoid the above problem. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On 13-01-2012 18:50, Josh Berkus wrote: It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. It is in the middle of my TODO list. reindexdb is in the plans too. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On Fri, Jan 13, 2012 at 4:50 PM, Josh Berkus j...@agliodbs.com wrote: It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. For example, just today I needed to manually analyze a database with over 500 tables, on a server with 24 cores. And I needed to know when the analyze was done, because it was part of a downtime. I had to resort to a python script. I'm picturing doing this in the simplest way possible: get the list of tables and indexes, divide them by the number of processes, and give each child process its own list. I think simplest isn't *quite* best... There's the risk that all the big tables get tied to one child, and so the one child is doing them serially. Better: Have two logical tasks: a) A process that manages the list, and b) Child processes doing vacuums. Each time a child completes a table, it asks the parent for another one. So the tendency will be that if there are 8 big tables, and 12 child processes, it's *certain* that the 8 big tables will be spread across the children. It guarantees that the child processes will all be busy until there are fewer tables left than there are child processes. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
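A minimal sketch of the dispatcher scheme described in the message above (one process, several asynchronous libpq connections, each idle connection handed the next table) might look like the following. This is only an illustration, not proposed vacuumdb code: run_parallel_vacuum and the other names are invented for the example, the table names are assumed to be already quoted, and real code would need better error recovery.

/*
 * Hypothetical sketch: one dispatcher process, N async libpq connections.
 * Any connection that finishes its table immediately gets the next one,
 * so large tables naturally spread across the workers.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include "libpq-fe.h"

static void
run_parallel_vacuum(const char *conninfo, int nconns, char **tables, int ntables)
{
	PGconn **conns = calloc(nconns, sizeof(PGconn *));
	int *busy = calloc(nconns, sizeof(int));
	int next = 0, pending = 0, i;

	for (i = 0; i < nconns; i++)
	{
		conns[i] = PQconnectdb(conninfo);
		if (PQstatus(conns[i]) != CONNECTION_OK)
		{
			fprintf(stderr, "connection failed: %s", PQerrorMessage(conns[i]));
			exit(1);
		}
	}

	while (next < ntables || pending > 0)
	{
		fd_set rfds;
		int maxfd = -1;

		/* hand the next table to every idle connection */
		for (i = 0; i < nconns && next < ntables; i++)
		{
			if (!busy[i])
			{
				char sql[1024];

				/* tables[] is assumed to hold already-quoted names */
				snprintf(sql, sizeof(sql), "VACUUM ANALYZE %s;", tables[next++]);
				if (!PQsendQuery(conns[i], sql))
					exit(1);
				busy[i] = 1;
				pending++;
			}
		}

		/* sleep until at least one busy connection has something to read */
		FD_ZERO(&rfds);
		for (i = 0; i < nconns; i++)
			if (busy[i])
			{
				FD_SET(PQsocket(conns[i]), &rfds);
				if (PQsocket(conns[i]) > maxfd)
					maxfd = PQsocket(conns[i]);
			}
		if (maxfd >= 0 && select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
			continue;			/* e.g. EINTR: just retry */

		/* drain whichever connections have finished their command */
		for (i = 0; i < nconns; i++)
		{
			if (!busy[i])
				continue;
			if (!PQconsumeInput(conns[i]))
			{
				fprintf(stderr, "%s", PQerrorMessage(conns[i]));
				exit(1);
			}
			while (!PQisBusy(conns[i]))
			{
				PGresult *res = PQgetResult(conns[i]);

				if (res == NULL)	/* command fully done, worker is idle */
				{
					busy[i] = 0;
					pending--;
					break;
				}
				PQclear(res);
			}
		}
	}

	for (i = 0; i < nconns; i++)
		PQfinish(conns[i]);
	free(conns);
	free(busy);
}

The point of the structure is that no connection sits idle while work remains: as soon as PQgetResult() drains one VACUUM, the outer loop hands that connection the next table from the shared list, which is the behaviour the manager/worker description above is after.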
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On 1/13/12 2:12 PM, Euler Taveira de Oliveira wrote: On 13-01-2012 18:50, Josh Berkus wrote: It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. It is in the mid of my TODO list. reindexdb is in the plans too. I'm even happier to have someone else do it. ;-) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Robert Haas robertmh...@gmail.com writes: Maybe we should try to split the baby here and defer the question of whether to expose any of the parse tree internals, and if so how much, to a future release. It seems to me that we could design a fairly useful set of functionality around AFTER-CREATE, BEFORE-DROP, and maybe even AFTER-ALTER triggers without exposing any parse tree details. Please find attached v5 of the patch, with nodeToString() support removed. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support command-trigger.v5.patch.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Monday, December 19, 2011 07:37:46 PM Robert Haas wrote: Maybe we should try to split the baby here and defer the question of whether to expose any of the parse tree internals, and if so how much, to a future release. I personally think this is an error and those details should at least be available on the c level (e.g. some pg_command_trigger_get_plan() function, only available via C) to allow sensible playing around with that knowledge. I don't really see making progress towards a nice interface unless we get something to play around with out there. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
Robert Haas robertmh...@gmail.com writes: Well, I disagree. The fact that all-visible info can't be trusted in standby mode is a problem that has existed since Hot Standby was committed, and I don't feel obliged to fix it just because I was involved in developing a new feature that happens to rely on I'm very sorry to read that, because I sure feel like that's exactly what we non-committers are asked to be doing when cooking any patch. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
Kevin Grittner kevin.gritt...@wicourts.gov writes: I'm having a hard time seeing why this is considered a feature. It seems to me what is being proposed is a mode with no higher integrity guarantee than asynchronous replication, but latency equivalent to synchronous replication. I can see where it's tempting to want to think it gives something more in terms of integrity guarantees, but when I think it through, I'm not really seeing any actual benefit. Same here, so what I think is that the new recv and write modes that Fujii is working on could maybe be demoted from being sync variants, while not really being async ones. Maybe “eager” or some other term. It seems to me that would answer the OP's use case and your remark here. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multithread Query Planner
Christopher Browne cbbro...@gmail.com writes: Yes, don't try to use threads. http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F ... threads are not currently used instead of multiple processes for backends because: I would only add that the backend code is really written from a process-based perspective, with a giant number of private variables that are in fact global variables. Trying to “clean” that out in order to get to threads… wow. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Andres Freund and...@anarazel.de writes: I personally think this is an error and those details should at least be available on the c level (e.g. some pg_command_trigger_get_plan() function, only available via C) to allow sensible playing around with that knowledge. I don't really see making progress towards a nice interface unless we get something to play around with out there. If you target C coded triggers then all you need to do is provide a pointer to the Node *parsetree, I would think. What else? The drawback though is still the same, the day you do that you've proposed a public API and changing the parsetree stops being internal refactoring. The way around this problem is that if you want a command trigger in C, just write an extension that implements the Process Utility hook. Bonus, you can have that working with already released versions of PostgreSQL. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
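For concreteness, a minimal sketch of the ProcessUtility-hook extension route mentioned above could look like this. It is an illustration only: the hook signature shown is the one from the 9.1/9.2-era tcop/utility.h, and everything prefixed my_ is an invented name; a real extension would do something more interesting than logging.

/*
 * Hypothetical sketch of a "command trigger" done today as a
 * ProcessUtility_hook extension.  Build as a shared library and load it
 * via shared_preload_libraries.  Hook signature per the 9.1/9.2 sources;
 * all my_* names are invented for this example.
 */
#include "postgres.h"
#include "fmgr.h"
#include "tcop/utility.h"

PG_MODULE_MAGIC;

void _PG_init(void);
void _PG_fini(void);

static ProcessUtility_hook_type prev_ProcessUtility = NULL;

static void
my_ProcessUtility(Node *parsetree, const char *queryString,
				  ParamListInfo params, bool isTopLevel,
				  DestReceiver *dest, char *completionTag)
{
	/*
	 * "Before" behaviour: we see the raw parse tree of every utility
	 * statement, so we can log it, inspect nodeTag(parsetree), or error out.
	 */
	ereport(LOG, (errmsg("utility statement: %s", queryString)));

	/* chain to the previous hook, or to the standard implementation */
	if (prev_ProcessUtility)
		prev_ProcessUtility(parsetree, queryString, params,
							isTopLevel, dest, completionTag);
	else
		standard_ProcessUtility(parsetree, queryString, params,
								isTopLevel, dest, completionTag);

	/* "after" behaviour could go here, once the command has run */
}

void
_PG_init(void)
{
	prev_ProcessUtility = ProcessUtility_hook;
	ProcessUtility_hook = my_ProcessUtility;
}

void
_PG_fini(void)
{
	ProcessUtility_hook = prev_ProcessUtility;
}

Chaining through prev_ProcessUtility is what lets several such libraries coexist at all; as noted elsewhere in the thread, stacking hooks from independent projects is fragile and they cannot be installed purely from SQL, which is part of the motivation for a first-class CREATE COMMAND TRIGGER.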
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On 13.1.2012 18:07, Josh Berkus wrote: Eric's review follows: Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from 2012-01-12 git checkout. Patch applied fine. 'make check' results in failures when this patch is put into place. 6 of 128 tests failed. Here are the relevant failures. parallel group (2 tests): create_view create_index create_index ... FAILED parallel group (9 tests): create_cast create_aggregate drop_if_exists typed_table vacuum constraints create_table_like triggers inherit inherit ... FAILED parallel group (20 tests): select_distinct_on btree_index update random select_distinct select_having namespace delete case hash_index union select_implicit select_into transactions portals subselect arrays aggregates join prepared_xacts join ... FAILED aggregates ... FAILED parallel group (15 tests): combocid portals_p2 advisory_lock xmlmap tsdicts guc functional_deps dependency select_views cluster tsearch window foreign_data foreign_key bitmapops foreign_data ... FAILED parallel group (19 tests): limit prepare copy2 conversion xml plancache returning temp sequence without_oid with rowtypes truncate polymorphism domain largeobject rangefuncs alter_table plpgsql alter_table ... FAILED Fixed. The default value of TIMING option did not work as intended, it was set to true even for plain EXPLAIN (without ANALYZE). In that case the EXPLAIN failed. Tomas diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index 61da6a2..9fc0ae1 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -23,6 +23,7 @@ static intauto_explain_log_min_duration = -1; /* msec or -1 */ static bool auto_explain_log_analyze = false; static bool auto_explain_log_verbose = false; static bool auto_explain_log_buffers = false; +static bool auto_explain_log_timing = false; static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT; static bool auto_explain_log_nested_statements = false; @@ -133,6 +134,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable(auto_explain.log_timing, +Collect timing data, not just row counts., +NULL, + auto_explain_log_timing, +true, +PGC_SUSET, +0, +NULL, +NULL, +NULL); + EmitWarningsOnPlaceholders(auto_explain); /* Install hooks. */ @@ -170,7 +182,12 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags) /* Enable per-node instrumentation iff log_analyze is required. */ if (auto_explain_log_analyze (eflags EXEC_FLAG_EXPLAIN_ONLY) == 0) { - queryDesc-instrument_options |= INSTRUMENT_TIMER; + if (auto_explain_log_timing) + queryDesc-instrument_options |= INSTRUMENT_TIMER; + else + queryDesc-instrument_options |= INSTRUMENT_ROWS; + + if (auto_explain_log_buffers) queryDesc-instrument_options |= INSTRUMENT_BUFFERS; } diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml index 8a9c9de..4488956 100644 --- a/doc/src/sgml/ref/explain.sgml +++ b/doc/src/sgml/ref/explain.sgml @@ -40,6 +40,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] replaceable class=parameterstatement/replac VERBOSE [ replaceable class=parameterboolean/replaceable ] COSTS [ replaceable class=parameterboolean/replaceable ] BUFFERS [ replaceable class=parameterboolean/replaceable ] +TIMING [ replaceable class=parameterboolean/replaceable ] FORMAT { TEXT | XML | JSON | YAML } /synopsis /refsynopsisdiv @@ -172,6 +173,20 @@ ROLLBACK; /varlistentry varlistentry +termliteralTIMING/literal/term +listitem + para + Include information on timing for each node. Specifically, include the + actual startup time and time spent in the node. 
This may involve a lot + of gettimeofday calls, and on some systems this means significant + slowdown of the query. + This parameter may only be used when literalANALYZE/literal is also + enabled. It defaults to
Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.
On 01/12/2012 09:28 PM, Alex Hunsaker wrote: Util.c/o not depending on plperl_helpers.h was also throwing me for a loop so I fixed it and SPI.c... Thoughts? Basically looks good, but I'm confused by this: do language plperl $$ elog('NOTICE', ${^TAINT}); $$; Why is NOTICE quoted here? It's a constant that we've been careful to set up. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Tom Lane t...@sss.pgh.pa.us wrote: In this particular example, I think it would work just as well to do the reference-count updates in AFTER triggers, and maybe the short answer is to tell people they have to do it like that instead of in BEFORE triggers. I think that is quite often the right answer. However, I wonder what use-case led you to file bug #6123 to begin with. In our Circuit Court software, we have about 1600 trigger functions on about 400 tables, and this summer we converted them from our Java middle tier framework to native PostgreSQL triggers. Since we had been writing them in our interpretation of ANSI SQL trigger code, parsing that, and using the parse tree to build a Java class for each trigger, we were able to generate the PostgreSQL trigger functions and CREATE TRIGGER statement mechanically (from the parse tree), with pretty good success. In testing, though, our business analysts noticed a number of situations where an attempt to delete a row actually deleted related rows which should have gone away with the row they were directly trying to delete, but the target row was still there. A few days of investigation, including stepping through query execution in gdb, turned up this issue. Having identified (at least one flavor of) the problem, we grepped the source code for the BEFORE triggers for UPDATE and DELETE statements, and were able to fix a number of them by moving code to AFTER triggers or setting values into NEW fields rather than running an UPDATE. So far, so good. But there were a number of situations where the DELETE of a row needed to cause related rows in other tables to be deleted, and for one reason or another a foreign key with ON DELETE CASCADE was not an option. At the same time, triggers on some of those related tables needed to update summary or redundant data in other tables for performance reasons. Because a number of tables could be involved, and some of the triggers (at the lower levels) could be AFTER triggers and still contribute to the problem, (1) we had no reliable way to ensure we would find all of the cases of this on all code paths, and (2) due to referential integrity and other trigger- based validations, it would be hard to restructure such that the DELETE of the child rows was not done on the BEFORE DELETE trigger of the parent. The patch we've been using in production throws errors if the row for a BEFORE UPDATE trigger is updated by another statement. (Well, OK, you showed me that it really is throwing an error if the row is updated and there has been another statement executed, but as long as it is *more* strict than we actually need, we won't corrupt data -- and the current rule hasn't been hard for us to live with.) It allows the DELETE to proceed if the tuple is updated from within the BEFORE DELETE trigger. We would need to tweak some triggers to move to the approach embodied in the recent patch drafts, but the IF FOUND logic suggested by Florian looks like it will cover all of our use cases, and they should be fairly easy to find with grep. Hopefully this answers your question. I went looking for details on particular failures here, but didn't have luck with so far. I can try again if more detail like that would help. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.
On Fri, Jan 13, 2012 at 16:07, Andrew Dunstan and...@dunslane.net wrote: On 01/12/2012 09:28 PM, Alex Hunsaker wrote: Util.c/o not depending on plperl_helpers.h was also throwing me for a loop so I fixed it and SPI.c... Thoughts? Basically looks good, but I'm confused by this: do language plperl $$ elog('NOTICE', ${^TAINT}); $$; Why is NOTICE quoted here? It's a constant that we've been careful to set up. No reason. [ Err, well, it was just part of me flailing around while trying to figure out why elog was still crashing even though I had the issue fixed. The real problem was Util.o was not being recompiled due to Util.c not depending on plperl_helpers.h. ] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: However, I wonder what use-case led you to file bug #6123 to begin with. [ details ] Hopefully this answers your question. I went looking for details on particular failures here, but didn't have luck with so far. I can try again if more detail like that would help. Well, the bottom line that's concerning me here is whether throwing errors is going to push anyone's application into an unfixable corner. I'm somewhat encouraged that your Circuit Courts software can adapt to it, since that's certainly one of the larger and more complex applications out there. Or at least I would be if you had actually verified that the CC code was okay with the recently-proposed patch versions. Do you have any thorough tests you can run against whatever we end up with? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Friday, January 13, 2012 11:53:32 PM Dimitri Fontaine wrote: Andres Freund and...@anarazel.de writes: I personally think this is an error and those details should at least be available on the c level (e.g. some pg_command_trigger_get_plan() function, only available via C) to allow sensible playing around with that knowledge. I don't really see making progress towards a nice interface unless we get something to play around with out there. If you target C coded triggers then all you need to do is provide a pointer to the Node *parsetree, I would think. What else? Yes. Being able to turn that into a statement again is still valuable imo. The drawback though is still the same, the day you do that you've proposed a public API and changing the parsetree stops being internal refactoring. Yes, sure. I don't particularly care though actually. Changing some generic guts of trigger functions is not really that much of a problem compared to the other stuff involved in a version migration. The way around this problem is that if you want a command trigger in C, just write an extension that implements the Process Utility hook. The point is that with CREATE COMMAND TRIGGER only the internal part of the triggers needs to change, not the external interface. Which is a big difference from my pov. Also hooks are relatively hard to stack, i.e. it's hard to use them sensibly from multiple independent projects. They also cannot be purely installed via extensions/sql. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 12:08:05PM -0500, Robert Haas wrote: On Fri, Jan 13, 2012 at 11:13 AM, Simon Riggs si...@2ndquadrant.com wrote: Many WAL records have latestRemovedXid on them. We can use the same idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the latest vacrelstats-latestRemovedXid. That then creates a recovery snapshot conflict that would cancel any query that might then see a page of the vis map that was written when the xmin was later than on the standby. If replication disconnects briefly and a vimap bit is updated that would cause a problem, just as the same situation would cause a problem because of other record types. That could create a lot of recovery conflicts when hot_standby_feedback=off, I think, but it might work when hot_standby_feedback=on. I don't fully understand the latestRemovedXid machinery, but I guess the idea would be to kill any standby transaction whose proc-xmin precedes the oldest committed xmin or xmax on the page. If hot_standby_feedback=on then there shouldn't be any, except in the case where it's just been enabled or the SR connection is bouncing. FWIW, Tom aired the same idea here: http://archives.postgresql.org/message-id/27743.1291135...@sss.pgh.pa.us While reviewing the crash-safe visibility map patch, I echoed the idea and explained why the extra conflicts would be immaterial: http://archives.postgresql.org/message-id/20110618133703.ga1...@tornado.leadboat.com Also, what happens if an all-visible bit gets set on the standby through some other mechanism - e.g. restored from an FPI or XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the visibility map page itself, but we certainly do it for the heap pages. So it might be that this infrastructure would (somewhat bizarrely) trust the visibility map bits but not the PD_ALL_VISIBLE bits. Simon spoke to the FPI side of the question. For heap pages, the XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE. For the last, we will have already logged any PD_ALL_VISIBLE bits through normal channels. CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or visibility map bits. When, someday, they do, we might emit a separate WAL record to force the recovery conflict. However, CLUSTER/VACUUM FULL already remove tuples still-visible to standby snapshots without provoking a recovery conflict. (Again only with hot_standby_feedback=off.) Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Speed dblink using alternate libpq tuple storage
One patch that fell off the truck during a turn in the November CommitFest was Allow substitute allocators for PGresult. Re-reading the whole thing again, it actually turned into a rather different submission in the middle, and I know I didn't follow that shift correctly then. I'm replying to its thread but have changed the subject to reflect that change. From a procedural point of view, I don't feel right kicking this back to its author on a Friday night when the deadline for resubmitting it would be Sunday. Instead I've refreshed the patch myself and am adding it to the January CommitFest. The new patch is a single file; it's easy enough to split out the dblink changes if someone wants to work with the pieces separately. After my meta-review I think we should get another reviewer familiar with using dblink to look at this next. This is fundamentally a performance patch now. Some results and benchmarking code were submitted along with it; the other issues are moot if those aren't reproducible. The secondary goal for a new review here is to provide another opinion on the naming issues and abstraction concerns raised so far. To clear out the original line of thinking, this is not a replacement low-level storage allocator anymore. The idea of using such a mechanism to help catch memory leaks has also been dropped. Instead this adds and documents a new path for libpq callers to more directly receive tuples, for both improved speed and lower memory usage. dblink has been modified to use this new mechanism. Benchmarking by the author suggests no significant change in libpq speed when only that change was made, while the modified dblink using the new mechanism was significantly faster. It jumped from 332K tuples/sec to 450K, a 35% gain, and had a lower memory footprint too. Test methodology and those results are at http://archives.postgresql.org/pgsql-hackers/2011-12/msg8.php Robert Haas did a quick code review of this already; it, along with author responses mixed in, is at http://archives.postgresql.org/pgsql-hackers/2011-12/msg01149.php I see two areas of contention there:

- There are several naming bits no one is happy with yet. Robert didn't like some of them, but neither did Kyotaro. I don't have an opinion myself. Is it the case that some changes to the existing code's terminology are what's actually needed to make this all better? Or is this just fundamentally warty and there's nothing to be done about it. Dunno.

- There is an abstraction wrapper vs. coding convenience trade-off centering around PGresAttValue. It sounded to me like it raised always fun questions like where's the right place for the line between libpq-fe.h and libpq-int.h to be?

dblink is pretty popular, and this is a big performance win for it. If naming and API boundary issues are the worst problems here, this sounds like something well worth pursuing as part of 9.2's still-advancing performance theme.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 36a8e3e..f48fe4f 100644 *** a/contrib/dblink/dblink.c --- b/contrib/dblink/dblink.c *** typedef struct remoteConn *** 63,73 bool newXactForCursor; /* Opened a transaction for a cursor */ } remoteConn; /* * Internal declarations */ static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async); - static void materializeResult(FunctionCallInfo fcinfo, PGresult *res); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); --- 63,85 bool newXactForCursor; /* Opened a transaction for a cursor */ } remoteConn; + typedef struct storeInfo + { + Tuplestorestate *tuplestore; + int nattrs; + AttInMetadata *attinmeta; + MemoryContext oldcontext; + char *attrvalbuf; + void **valbuf; + size_t *valbufsize; + bool error_occurred; + bool nummismatch; + } storeInfo; + /* * Internal declarations */ static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); *** static char *escape_param_str(const char *** 90,95 --- 102,111 static void validate_pkattnums(Relation rel, int2vector *pkattnums_arg, int32 pknumatts_arg, int **pkattnums, int *pknumatts); + static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo); + static void finishStoreInfo(storeInfo *sinfo); + static void *addTuple(PGresult *res, AddTupFunc func, int id, size_t size); + /* Global */ static remoteConn *pconn = NULL; *** dblink_fetch(PG_FUNCTION_ARGS) *** 503,508
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 5:37 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: Well, I disagree. The fact that all-visible info can't be trusted in standby mode is a problem that has existed since Hot Standby was committed, and I don't feel obliged to fix it just because I was involved in developing a new feature that happens to rely on I'm very sorry to read that, because I sure feel like that's exactly what us non commiters are asked to be doing when cooking any patch. There is obviously room for disagreement over what should or should not be within the scope of any particular development project, and we may not always agree on that. However, I do try to act in good faith and apply, as much as I can, the same standard to other people's patches as I do to my own. In general, when I ask for the scope of something to be expanded, I try to limit myself to something that I feel can be accomplished in a day or two of development and which are closely related to the core of what the patch is about. There are exceptions, of course, where I feel that more than that is needed, but there are also many examples on file of me arguing for a narrower scope - either that an already-written patch should be divided into pieces so that the uncontroversial pieces can be committed, or that a project need not include every element that anyone wants in order to be acceptable. I try very hard to be sensitive to the fact that non-committers also have stuff they want to get done, which is why I am one of the most prolific committers of the work of non-committers, even extending in numerous instances (most recently, today) to expending rather large amounts of time rewriting stuff that neither I nor my employer have a huge stake in to reach the quality of code that I think is needed for commit, or to satisfy the objections of other community members that I may not personally share. In this particular case, I am not even the person who committed the patch. Tom committed index-only scans well before I thought it was ready for prime time, and then did a whole lot of work to improve it afterwards. Even now, I believe there are significant issues remaining. There is no EXPLAIN support (for which I supported a patch today); the statistics might need work (as Magnus complained about in response to my EXPLAIN patch); we don't yet support index-only quals (i.e. even though you know you'll eventually need the heap tuple, evaluate the quals you can using just the index tuple in the hopes of avoiding some heap fetches); and I'm pretty worried that we'll run across cases where too much of the heap is not-all-visible and we either don't use index only scans or we use them but they don't perform well - a problem which I suspect will require adjustment of our vacuuming algorithm to avoid. With the exception of EXPLAIN support which I think is merely an oversight, all of those issues, including the problems in Hot Standby mode, remain because nobody knows exactly what we ought to do to fix them. When somebody figures it out, I predict they'll get fixed pretty quickly. Now, whether or not that will be in time for 9.2, or whether I'll be the one to write the code, I don't know. But in the meantime, there are really only two choices here: we can understand that the feature we have has some limitations, or we can revert it. 
Considering the amount of time that was put into it, particular by Tom, I can't imagine that we really want to revert it, but then again I'm relatively shocked that I'm being asked, one day before final CommitFest starts, to drop everything and work on a limitation that I thought had been well-understood by everyone for three years and that it's not clear we know how to lift. It may be that I didn't do a sufficiently good job communicating that this limitation was bound to exist, and I apologize for that. I did mention it on-list multiple times, as did Heikki, and I thought it was understood, but apparently not. If people who didn't object to committing the patch would have done so had they realized this, and don't feel that I made it clear enough, I am sincerely sorry. I would have made the same argument then that I am making now, but at least we would have hashed it out one way or the other before moving forward. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 8:02 PM, Noah Misch n...@leadboat.com wrote: Simon spoke to the FPI side of the question. For heap pages, the XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE. For the last, we will have already logged any PD_ALL_VISIBLE bits through normal channels. CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or visibility map bits. When, someday, they do, we might emit a separate WAL record to force the recovery conflict. However, CLUSTER/VACUUM FULL already remove tuples still-visible to standby snapshots without provoking a recovery conflict. (Again only with hot_standby_feedback=off.) Is the big about CLUSTER/VACUUM FULL a preexisting bug? If not, why not? Other than that, it seems like we might be converging on a workable solution: if hot_standby_feedback=off, disable index-only scans for snapshots taken during recovery; if hot_standby_feedback=on, generate recovery conflicts when a snapshot's xmin precedes the youngest xmin on a page marked all-visible, but allow the use of index-only scans, and allow sequential scans to trust PD_ALL_VISIBLE. Off the top of my head, I don't see a hole in that logic... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 12:18 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Jan 13, 2012 at 5:08 PM, Robert Haas robertmh...@gmail.com wrote: Also, what happens if an all-visible bit gets set on the standby through some other mechanism - e.g. restored from an FPI or XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the visibility map page itself, but we certainly do it for the heap pages. So it might be that this infrastructure would (somewhat bizarrely) trust the visibility map bits but not the PD_ALL_VISIBLE bits. I'm hoping Heikki or Tom will comment on this thread, because I think there are a bunch of subtle issues here and that we could easily screw it up by trying to plow through the problem too hastily. An FPI can't change the all visible flag. If it did, it would imply that we'd skipped generation or replay of an XLOG_HEAP2_VISIBLE record, or that we're doing crash recovery/inital startup and HS is not yet enabled. Interesting. I hadn't considered that point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
On Sat, Jan 14, 2012 at 12:42:02AM -0500, Robert Haas wrote: On Fri, Jan 13, 2012 at 8:02 PM, Noah Misch n...@leadboat.com wrote: Simon spoke to the FPI side of the question. ?For heap pages, the XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE. ?For the last, we will have already logged any PD_ALL_VISIBLE bits through normal channels. ?CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or visibility map bits. ?When, someday, they do, we might emit a separate WAL record to force the recovery conflict. ?However, CLUSTER/VACUUM FULL already remove tuples still-visible to standby snapshots without provoking a recovery conflict. ?(Again only with hot_standby_feedback=off.) Is the big about CLUSTER/VACUUM FULL a preexisting bug? If not, why not? I suspect it goes back to 9.0, yes. I'm on the fence regarding whether to call it a bug or an unimplemented feature. In any case, +1 for improving it. Other than that, it seems like we might be converging on a workable solution: if hot_standby_feedback=off, disable index-only scans for snapshots taken during recovery; if hot_standby_feedback=on, generate recovery conflicts when a snapshot's xmin precedes the youngest xmin on a page marked all-visible, but allow the use of index-only scans, and allow sequential scans to trust PD_ALL_VISIBLE. Off the top of my head, I don't see a hole in that logic... I wouldn't check hot_standby_feedback. Rather, mirror what we do for XLOG_HEAP2_CLEAN. Unconditionally add an xid to xl_heap_visible bearing the youngest xmin on the page (alternately, some convenient upper bound thereof). Have heap_xlog_visible() call ResolveRecoveryConflictWithSnapshot() on that xid. Now, unconditionally trust PD_ALL_VISIBLE and permit index-only scans. The user's settings of hot_standby_feedback, vacuum_defer_cleanup_age, max_standby_streaming_delay and max_standby_archive_delay will drive the consequential trade-off: nothing, query cancellation, or recovery delay. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers