Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-27 Thread Martijn van Oosterhout
On Wed, Jun 27, 2012 at 12:58:47AM +0200, Nils Goroll wrote:
 So it looks like using pthread_mutexes could at least be an option on Linux.
 
 Using futexes directly could be even cheaper.

Note that underneath this there is only the futex(2) system call. Futexes
require all counter manipulation to happen in userspace, just like now,
so all the per-architecture stuff remains.  On Linux, pthread mutexes
are really just a thin wrapper on top of this.

The futex(2) system call merely provides an interface for handling the
blocking and waking of other processes and releasing locks on process
exit (so everything can still work after a kill -9).

So it's more a replacement for the SysV semaphores than anything else.
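To make that division of labour concrete, here is a hypothetical sketch of a futex-based lock in the style of Drepper's "Futexes Are Tricky" (illustrative names, not PostgreSQL code): the state word lives in user space, and the kernel is entered only on contention.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical futex-based mutex.
 * state: 0 = free, 1 = locked, 2 = locked with (possible) waiters. */
typedef struct { atomic_int state; } futex_lock;

static long sys_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void fl_lock(futex_lock *l)
{
    int c = 0;
    /* Fast path: a single compare-and-swap, no system call at all. */
    if (atomic_compare_exchange_strong(&l->state, &c, 1))
        return;
    /* Slow path: mark the lock contended and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0)
    {
        sys_futex(&l->state, FUTEX_WAIT, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

void fl_unlock(futex_lock *l)
{
    /* Only enter the kernel if someone might be sleeping. */
    if (atomic_exchange(&l->state, 0) == 2)
        sys_futex(&l->state, FUTEX_WAKE, 1);
}

int fl_selftest(void)
{
    futex_lock l = { 0 };

    fl_lock(&l);                        /* uncontended: no syscall made */
    int held = atomic_load(&l.state);
    fl_unlock(&l);
    return (held == 1 && atomic_load(&l.state) == 0) ? 0 : -1;
}
```

The uncontended acquire and release are one atomic operation each, which is why going through the pthread wrapper costs only the extra function/library call mentioned later in this thread.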

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] Schema version management

2012-06-27 Thread Joel Jacobson
Robert, thank you for keeping this thread alive.

Hopefully some more will join the discussion.

I'm still hopeful the community can manage to agree upon acceptable
tradeoffs and work-arounds to make this possible.

I think the benefits clearly outweigh the minor issues of filenames,
dumping order, etc.

On Tue, Jun 26, 2012 at 6:04 PM, Robert Haas robertmh...@gmail.com wrote:

 I don't think either of these problems ought to be a complete
 show-stopper.  It seems to me that the trade-off is that when object
 names are long, contain special characters, or are overloaded, we'll
 have to munge the names in some way to prevent collisions.  That could
 mean that the names are not 100% stable, which would possibly produce
 some annoyance if you're using a VCS to track changes, but maybe
 that's an acceptable trade-off, because it shouldn't happen very
 often.  If we could guarantee that identifiers less than 64
 characters which are not overloaded and contain no special characters
 requiring quoting end up in an eponymous file, I think that would be
 good enough to make most of our users pretty happy.  In other cases, I
 think the point would just be to make it work (with a funny name)
 rather than fail.


I agree. It's not a problem if the filename is not identical to the name of
the object, as long as the same name generates the same filename on
all architectures. URL escaping would work, but it would be nicer to convert
all non-ASCII characters to ASCII, dropping any problematic characters
or replacing them with _ or some other suitable character.

For the small fraction of users who have managed to find a good reason
to name a function this/is\a/good.name/of/a\function, the filename
of such a function would be this_is_a_good_name_of_a_function.
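A hedged sketch of that munging (not pg_dump code; it works bytewise, so a multi-byte UTF-8 character would become several underscores, which a real implementation would want to transliterate or escape instead):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Map every byte outside [A-Za-z0-9_] to '_', truncating to outlen-1. */
void sanitize_filename(const char *name, char *out, size_t outlen)
{
    size_t i;

    for (i = 0; name[i] != '\0' && i + 1 < outlen; i++)
    {
        unsigned char c = (unsigned char) name[i];

        out[i] = (isalnum(c) || c == '_') ? (char) c : '_';
    }
    out[i] = '\0';
}

int sanitize_demo(void)
{
    char buf[64];

    /* The example function name from the mail above. */
    sanitize_filename("this/is\\a/good.name/of/a\\function", buf, sizeof buf);
    return strcmp(buf, "this_is_a_good_name_of_a_function");
}
```

Because the mapping is deterministic, the same object name produces the same filename on every platform, which is the property that matters for VCS tracking.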

As long as the objects are dumped in the same order, there will be no
merge problems when two developers commit changes of the same
file. I think pg_dump does a reasonable job already making sure the order is
always the same. How big is the problem, really?

It would of course be a little easier to keep track of changes and do
merging if all overloaded functions were kept in separate files, but I see
that as a minor feature request. As long as all objects with the same name
are kept in separate files, that's good enough for my needs, and I have
_a lot_ of functions, quite a few of which are overloaded.




  \i /home/postgres/database/gluepay-split/aml/FUNCTION/get_linkid.sql
  \i /home/postgres/database/gluepay-split/aml/FUNCTION/set_address.sql

 It would be better to use \ir here rather than hard-code path names, I
 think.  Then you'd only need to require that all the files be in the
 same directory, rather than requiring them to be at a certain
 hard-coded place in the filesystem.


I fully agree!
I didn't know about the \ir feature.

Best regards,

Joel Jacobson


Re: [HACKERS] new --maintenance-db options

2012-06-27 Thread Amit Kapila

Tom Lane t...@sss.pgh.pa.us writes:
Amit Kapila amit.kap...@huawei.com writes:
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
 The implementation I've wanted to see for some time is that you can
 start a standalone backend, but it speaks FE/BE protocol to its caller
 (preferably over pipes, so that there is no issue whatsoever of where
 you can securely put a socket or anything like that).  

 Can't it be done by following the FE/BE protocol, but calling the
 server APIs directly at the required places?

 That wouldn't be easier, nor cleaner, and it would open us up to
 client-induced database corruption (from failure to follow APIs, crashes
 in the midst of an operation, memory stomps, etc).  We decided long ago
 that we would never support truly embedded operation in the sense of PG
 executing in the client's process/address space.  

Okay.

 I like the design
 suggested above because it has many of the good properties of an
 embedded database (in particular, no need to manage or contact a server)
 but still keeps the client code at arm's length.

In such a case, will that standalone backend manage other processes (wal
writer, checkpointer, ...), or will there be no background processes, as in
the current --single mode?

Can there be any performance advantage in such a mode compared to the
current setup, when client and server are on the same machine and use a
Unix domain socket?


With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] plpython issue with Win64 (PG 9.2)

2012-06-27 Thread Asif Naeem
Hi,

On Windows 7 64bit, plpython is causing server crash with the following
test case i.e.

CREATE PROCEDURAL LANGUAGE 'plpython3u';
 CREATE OR REPLACE FUNCTION pymax (a integer, b integer)
   RETURNS integer
 AS $$
   if a > b:
 return a
   return b
 $$ LANGUAGE plpython3u;
 SELECT pymax(1, 2);


Server exit with the following exception i.e.

 Unhandled exception at 0x777a3483 in postgres.exe: 0xC00000FD: Stack
overflow.

  plpython3.dll!PLy_traceback(char * * xmsg, char * * tbmsg, int *
tb_depth)  Line 174 C
  plpython3.dll!PLy_elog(int elevel, const char * fmt, ...)  Line 67
  C
  plpython3.dll!PLyUnicode_AsString(_object * unicode)  Line 96 C
  plpython3.dll!PLy_traceback(char * * xmsg, char * * tbmsg, int *
tb_depth)  Line 176 + 0x8 bytes C
  plpython3.dll!PLy_elog(int elevel, const char * fmt, ...)  Line 67
  C
  plpython3.dll!PLyUnicode_AsString(_object * unicode)  Line 96 C
  ...
  ...
  plpython3.dll!PLy_traceback(char * * xmsg, char * * tbmsg, int *
tb_depth)  Line 176 + 0x8 bytes C
  plpython3.dll!PLy_elog(int elevel, const char * fmt, ...)  Line 67
  C
  plpython3.dll!PLyUnicode_AsString(_object * unicode)  Line 96 C
  plpython3.dll!PLy_traceback(char * * xmsg, char * * tbmsg, int *
tb_depth)  Line 176 + 0x8 bytes C

Dbserver get stuck in the following call loop i.e.
... PLy_elog() -> PLy_traceback() -> PLyUnicode_AsString() ->
PLyUnicode_Bytes() -> PLy_elog() ...
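A common way to break such a cycle is a reentrancy guard in the error-formatting path. A hypothetical sketch, with illustrative names and fallback message rather than the actual plpython functions:

```c
#include <assert.h>
#include <string.h>

/* Depth counter guarding the traceback-rendering path. */
static int traceback_depth = 0;

typedef const char *(*render_fn)(void);

const char *safe_traceback(render_fn render)
{
    const char *result;

    if (traceback_depth > 0)    /* already formatting an error: break the cycle */
        return "(error while formatting traceback)";
    traceback_depth++;
    result = render();
    traceback_depth--;
    return result;
}

/* Simulates a renderer whose own failure re-enters the error path,
 * as PLyUnicode_Bytes() does when the encoding conversion fails. */
static const char *failing_render(void)
{
    return safe_traceback(failing_render);
}

int guard_demo(void)
{
    /* Without the guard this would recurse until stack overflow. */
    return strcmp(safe_traceback(failing_render),
                  "(error while formatting traceback)");
}
```

The guard trades a perfect traceback for a degraded-but-terminating error message, which is the usual choice when the error reporter itself can fail.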

I think the primary reason that triggers this issue is that when the
function PLyUnicode_Bytes() calls PyUnicode_AsEncodedString( , WIN1252
/* server encoding */, ), it fails and returns null. I built the latest
PG 9.2 source code with Python 3.2.2.3 using Visual Studio 2010. Thanks.

Best Regards,
Muhammad Asif Naeem


Re: [HACKERS] experimental: replace s_lock spinlock code with pthread_mutex on linux

2012-06-27 Thread Nils Goroll
 Using futexes directly could be even cheaper.
 Note that below this you only have the futex(2) system call.
I was only referring to the fact that we could save one function call and one
library call, which could make a difference in the uncontended case.



Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling

2012-06-27 Thread Andres Freund
On Tuesday, June 26, 2012 04:06:08 PM Andres Freund wrote:
 On Tuesday, June 26, 2012 04:01:26 PM Robert Haas wrote:
  On Fri, Jun 22, 2012 at 12:35 PM, Andres Freund and...@2ndquadrant.com
 
 wrote:
   Can you elaborate on that a bit?  What scenarios did you play around
   with, and what does win mean in this context?
   
   I had two machines connected locally and setup HS and my prototype
   between them (not at once obviously).
   The patch reduced all the average latency between both nodes (measured
   by 'ticker' rows arriving in a table on the standby), the jitter in
   latency and the amount of load I had to put on the master before the
   standby couldn't keep up anymore.
   
   I played with different loads:
   * multiple concurrent ~50MB COPY's
   * multiple concurrent ~50MB COPY's, pgbench
   * pgbench
   
   All three had a ticker running concurrently with synchronous_commit=off
   (because it shouldn't cause any difference in the replication pattern
   itself).
   
   The difference in average lag and cutoff was smallest with just pgbench
   running alone and biggest with COPY running alone. High jitter was most
   visible with just pgbench running alone, but that's likely just because
   the average lag was smaller.
  
  OK, that sounds pretty promising.  I'd like to run a few performance
  tests on this just to convince myself that it doesn't lead to a
  significant regression in other scenarios.  Assuming that doesn't turn
  up anything major, I'll go ahead and commit this.
 
 Independent testing would be great; it's definitely possible that I
 overlooked something, although I obviously don't think so ;).
 
  Can you provide a rebased version?  It seems that one of the hunks in
  xlog.c no longer applies.
 
 Will do so. Not sure if I can finish it today though, I am in the midst of
 redoing the ilist and xlogreader patches. I guess tomorrow will suffice
 otherwise...
Ok, attached are two patches:
The first is the rebased version of the original patch with 
WalSndWakeupProcess renamed to WalSndWakeupProcessRequests (seems clearer).

The second changes WalSndWakeupRequest and WalSndWakeupProcessRequests into 
macros as you requested before. I am not sure if it's a good idea or not.

Anything else?

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 164276afc60fa2451f28de996fcc54567e6168e2 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 29 May 2012 20:00:16 +0200
Subject: [PATCH 1/2] Overhaul walsender wakeup handling

The previous coding could miss xlog writeouts at several places, e.g. when WAL
was written out by the background writer, or even after a commit if
synchronous_commit=off.
This could lead to delays of up to 7 seconds in sending data to the standby.

To fix this, move the responsibility for notification to the layer where the
necessary information is actually present. We take some care not to do the
notification while we hold contended locks like WALInsertLock or WalWriteLock.

Document the preexisting fact that we rely on SetLatch to be safe from within
signal handlers and critical sections.

This removes the temporary bandaid from 2c8a4e9be2730342cbca85150a2a9d876aa77ff6
---
 src/backend/access/transam/twophase.c |   21 -
 src/backend/access/transam/xact.c |7 --
 src/backend/access/transam/xlog.c |   25 ++--
 src/backend/port/unix_latch.c |3 +++
 src/backend/port/win32_latch.c|4 
 src/backend/replication/walsender.c   |   41 -
 src/include/replication/walsender.h   |2 ++
 7 files changed, 67 insertions(+), 36 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index e8fb78b..7f198c2 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1042,13 +1042,6 @@ EndPrepare(GlobalTransaction gxact)
 
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
-	/*
-	 * Wake up all walsenders to send WAL up to the PREPARE record immediately
-	 * if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
 	/* write correct CRC and close file */
 	if ((write(fd, statefile_crc, sizeof(pg_crc32))) != sizeof(pg_crc32))
 	{
@@ -2045,13 +2038,6 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	/* Flush XLOG to disk */
 	XLogFlush(recptr);
 
-	/*
-	 * Wake up all walsenders to send WAL up to the COMMIT PREPARED record
-	 * immediately if replication is enabled
-	 */
-	if (max_wal_senders > 0)
-		WalSndWakeup();
-
 	/* Mark the transaction committed in pg_clog */
 	TransactionIdCommitTree(xid, nchildren, children);
 
@@ -2133,13 +2119,6 @@ RecordTransactionAbortPrepared(TransactionId xid,
 	XLogFlush(recptr);
 
 	/*
-	 * Wake up all walsenders to send WAL up to the ABORT PREPARED record
-	 * immediately if replication is enabled
-	 */
-	

Re: [HACKERS] [v9.3] Row-Level Security

2012-06-27 Thread Florian Pflug
On Jun27, 2012, at 07:18 , Kohei KaiGai wrote:
 The problem is the way to implement it.
 If we would have permission checks on planner stage, it cannot handle
 a case when user-id would be switched prior to executor stage, thus
 it needs something remedy to handle the scenario correctly.
 Instead of a unique plan per query, it might be a solution to generate
 multiple plans depending on user-id, and choose a proper one in
 executor stage.
 
 Which type of implementation is what everybody is asking for?

I think you need to

 a) Determine the user-id at planning time, and insert the matching
RLS clause

b1) Either re-plan the query if the user-id changes between planning
and execution time, which means making the user-id a part of the
plan-cache key.

b2) Or decree that for RLS purposes, it's the user-id at planning time,
not execution time, that counts.
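Option b1 effectively makes the user-id part of the plan-cache key, so a role switch forces a re-plan. A hypothetical sketch of the reuse test (illustrative structures, not the actual plancache.c code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical cache key: under option b1 the role the query was
 * planned under becomes part of the key alongside the query text. */
typedef struct PlanCacheKey
{
    const char *query_string;
    Oid         planned_as_role;    /* user-id captured at planning time */
} PlanCacheKey;

/* A cached plan is reusable only if both query text and role match;
 * a mismatch on either forces a fresh plan with the right RLS clause. */
bool plan_reusable(const PlanCacheKey *cached, const char *query, Oid role)
{
    return strcmp(cached->query_string, query) == 0 &&
           cached->planned_as_role == role;
}

bool plancache_demo(void)
{
    PlanCacheKey k = { "SELECT * FROM accounts", 10 };

    return plan_reusable(&k, "SELECT * FROM accounts", 10) &&
           !plan_reusable(&k, "SELECT * FROM accounts", 20);
}
```

The cost of b1 is one extra plan per (query, role) pair; b2 avoids that at the price of execution-time semantics that depend on when planning happened.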

best regards,
Florian Pflug




Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Robert Haas
On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 So, here's a patch.  Instead of using POSIX shmem, I just took the
 expedient of using mmap() to map a block of MAP_SHARED|MAP_ANONYMOUS
 memory.  The sysv shm is still allocated, but it's just a copy of
 PGShmemHeader; the real shared memory is the anonymous block.  This
 won't work if EXEC_BACKEND is defined so it just falls back on
 straight sysv shm in that case.

 Um.  I hadn't thought about the EXEC_BACKEND interaction, but that seems
 like a bit of a showstopper.  I would not like to give up the ability
 to debug EXEC_BACKEND mode on Unixen.

 Would Posix shmem help with that at all?  Why did you choose not to
 use the Posix API, anyway?

It seemed more complicated.  If we use the POSIX API, we've got to
have code to find a non-colliding name for the shm, and we've got to
arrange to clean it up at process exit.  Anonymous shm doesn't require
a name and goes away automatically when it's no longer in use.

With respect to EXEC_BACKEND, I wasn't proposing to kill it, just to
make it continue to use a full-sized sysv shm.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proof concept - access to session variables on client side

2012-06-27 Thread Pavel Stehule
2012/6/27 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 Another thing I don't care for is the unannounced protocol extension.

 yes, it needs a protocol extension and a version increase too. But I'm
 not afraid of desynchronisation - the server doesn't send the 'v'
 message when the client doesn't support it.

 And you would know that how, exactly?

minor version of protocol can be used

http://archives.postgresql.org/pgsql-hackers/2011-12/msg00025.php

I don't know if this topic is done, I only remember this thread

Regards

Pavel Stehule


                        regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Magnus Hagander
On Wed, Jun 27, 2012 at 3:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 A.M. age...@themactionfaction.com writes:
 On 06/26/2012 07:30 PM, Tom Lane wrote:
 I solved this via fcntl locking.

 No, you didn't, because fcntl locks aren't inherited by child processes.
 Too bad, because they'd be a great solution otherwise.

 You claimed this last time and I replied:
 http://archives.postgresql.org/pgsql-hackers/2011-04/msg00656.php

 I address this race condition by ensuring that a lock-holding violator
 is the postmaster or a postmaster child. If such a condition is
 detected, the child exits immediately without touching the shared
 memory. POSIX shmem is inherited via file descriptors.

 This is possible because the locking API allows one to request which PID
 violates the lock. The child expects the lock to be held and checks that
 the PID is the parent. If the lock is not held, that means that the
 postmaster is dead, so the child exits immediately.

 OK, I went back and re-read the original patch, and I now agree that
 something like this is possible --- but I don't like the way you did
 it. The dependence on particular PIDs seems both unnecessary and risky.

 The key concept here seems to be that the postmaster first stakes a
 claim on the data directory by exclusive-locking a lock file.  If
 successful, it reduces that lock to shared mode (which can be done
 atomically, according to the SUS fcntl specification), and then holds
 the shared lock until it exits.  Spawned children will not initially
 have a lock, but what they can do is attempt to acquire shared lock on
 the lock file.  If fail, exit.  If successful, *check to see that the
 parent postmaster is still alive* (ie, getppid() != 1).  If so, the
 parent must have been continuously holding the lock, and the child has
 successfully joined the pool of shared lock holders.  Otherwise bail
 out without having changed anything.  It is the "parent is still alive"
 check, not any test on individual PIDs, that makes this work.
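The first half of that handshake can be sketched as follows (hypothetical, single-process; in the real scheme each spawned child would additionally take its own shared lock on the same file and then verify getppid() != 1):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int set_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;           /* F_WRLCK = exclusive, F_RDLCK = shared */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* zero length = the whole file */
    return fcntl(fd, F_SETLK, &fl);
}

int lockfile_demo(void)
{
    char path[] = "/tmp/lockfile_demo_XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (set_lock(fd, F_WRLCK) != 0)     /* stake the exclusive claim */
        return -2;
    if (set_lock(fd, F_RDLCK) != 0)     /* reduce to shared, atomically */
        return -3;
    close(fd);                          /* all locks vanish at close/exit */
    return 0;
}
```

The downgrade is atomic because fcntl replaces an existing lock on the same region rather than releasing and re-acquiring, per the SUS specification Tom cites.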

 There are two concrete reasons why I don't care for the
 GetPIDHoldingLock() way.  Firstly, the fact that you can get a blocking
 PID from F_GETLK isn't an essential part of the concept of file locking
 IMO --- it's just an incidental part of this particular API.  May I
 remind you that the reason we're stuck on SysV shmem in the first place
 is that we decided to depend on an incidental part of that API, namely
 nattch?  I would like to not require file locking to have any semantics
 more specific than a process can hold an exclusive or a shared lock on
 a file, which is auto-released at process exit.  Secondly, in an NFS
 world I don't believe that the returned l_pid value can be trusted for
 anything.  If it's a PID from a different machine then it might
 accidentally conflict with one on our machine, or not.

 Reflecting on this further, it seems to me that the main remaining
 failure modes are (1) file locking doesn't work, or (2) idiot DBA
 manually removes the lock file.  Both of these could be ameliorated
 with some refinements to the basic idea.  For (1), I suggest that
 we tweak the startup process (only) to attempt to acquire exclusive lock
 on the lock file.  If it succeeds, we know that file locking is broken,
 and we can complain.  (This wouldn't help for cases where cross-machine
 locking is broken, but I see no practical way to detect that.)
 For (2), the problem really is that the proposed patch conflates the PID
 file with the lock file, but people are conditioned to think that PID
 files are removable.  I suggest that we create a separate, permanently
 present file that serves only as the lock file and doesn't ever get
 modified (it need have no content other than the string "Don't remove
 this!").  It'd be created by initdb, not by individual postmaster runs;
 indeed the postmaster should fail if it doesn't find the lock file
 already present.  The postmaster PID file should still exist with its
 current contents, but it would serve mostly as documentation and as
 server-contact information for pg_ctl; it would not be part of the data
 directory locking mechanism.

 I wonder whether this design can be adapted to Windows?  IIRC we do
 not have a bulletproof data directory lock scheme for Windows.
 It seems like this makes few enough demands on the lock mechanism
 that there ought to be suitable primitives available there too.

I assume you're saying we need to make changes in the internal API,
right? Because we already have a Windows native implementation of
shared memory that AFAIK works, so if the new Unix stuff can be done
with the same internal APIs, it shouldn't need to be changed. (Sorry,
haven't followed the thread in detail)

If so - can we define exactly what properties it is we *need*?

(A native API worth looking at is e.g.
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx
- but there are probably others as well if that one doesn't do)

-- 
 Magnus Hagander
 Me: 

Re: [HACKERS] plpython issue with Win64 (PG 9.2)

2012-06-27 Thread Jan Urbański

On 27/06/12 11:51, Asif Naeem wrote:

Hi,

On Windows 7 64bit, plpython is causing server crash with the following
test case i.e.

CREATE PROCEDURAL LANGUAGE 'plpython3u';

CREATE OR REPLACE FUNCTION pymax (a integer, b integer)
   RETURNS integer
AS $$
   if a > b:
 return a
   return b
$$ LANGUAGE plpython3u;
SELECT pymax(1, 2);




I think the primary reason that triggers this issue is that when the
function PLyUnicode_Bytes() calls PyUnicode_AsEncodedString( , WIN1252
/* server encoding */, ), it fails and returns null. I built the latest
PG 9.2 source code with Python 3.2.2.3 using Visual Studio 2010. Thanks.


I'll try to reproduce this on Linux, which should be possible given the 
results of your investigation.


Cheers,
Jan



Re: [HACKERS] pg_terminate_backend for same-role

2012-06-27 Thread Magnus Hagander
On Tue, Jun 26, 2012 at 10:58 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Mar 29, 2012 at 3:04 AM, Daniel Farina dan...@heroku.com wrote:
 On Mon, Mar 26, 2012 at 12:14 AM, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2012-03-20 at 01:38 -0700, Daniel Farina wrote:
 On Thu, Mar 15, 2012 at 9:39 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Fri, Mar 16, 2012 at 8:14 AM, Daniel Farina dan...@heroku.com wrote:
  Parallel to pg_cancel_backend, it'd be nice to allow the user to just
  outright kill a backend that they own (politely, with a SIGTERM),
  aborting any transactions in progress, including the idle transaction,
  and closing the socket.
 
  +1

 Here's a patch implementing the simple version, with no more guards
 against signal racing than have been seen previously.  The more
 elaborate variants to close those races is being discussed in a
 parallel thread, but I thought I'd get this simple version out there.

 Review:

 After reading through the threads, it looks like there was no real
 objection to this approach -- pid recycling is not something we're
 concerned about.

 I think you're missing a doc update though, in func.sgml:

 For the less restrictive <function>pg_cancel_backend</>, the role of an
 active backend can be found from
 the <structfield>usename</structfield> column of the
 <structname>pg_stat_activity</structname> view.

 Also, high-availability.sgml makes reference to pg_cancel_backend(), and
 it might be worthwhile to add an ...and pg_terminate_backend() there.

 Other than that, it looks good to me.

 Good comments. Patch attached to address the doc issues.  The only
 iffy thing is the For the less restrictive... paragraph, which I have
 opted to remove in its entirety.  I think the documents are already
 pretty clear about the same-user rule, and I'm not sure if this is the
 right place for a crash-course on attributes in pg_stat_activity (but
 maybe it is).

 ...and pg_terminate_backend seems exactly right.

 And I think now that the system post-patch doesn't have such a strange
 contrast between the ability to send SIGINT vs. SIGTERM, such a
 contrast may not be necessary.

 I'm also not sure what the policy is about filling paragraphs in the
 manual.  I filled one, which increases the sgml churn a bit. git
 (show|diff) --word-diff helps clean it up.

 I went ahead and committed this.

 I kinda think we should back-patch this into 9.2.  It doesn't involve
 a catalog change, and would make the behavior consistent between the
 two releases, instead of changing in 9.1 and then changing again in
 9.2.  Thoughts?

+1.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-27 Thread Kohei KaiGai
2012/6/27 Florian Pflug f...@phlo.org:
 On Jun27, 2012, at 07:18 , Kohei KaiGai wrote:
 The problem is the way to implement it.
 If we would have permission checks on planner stage, it cannot handle
 a case when user-id would be switched prior to executor stage, thus
 it needs something remedy to handle the scenario correctly.
 Instead of a unique plan per query, it might be a solution to generate
 multiple plans depending on user-id, and choose a proper one in
 executor stage.

 Which type of implementation is what everybody is asking for?

 I think you need to

  a) Determine the user-id at planning time, and insert the matching
    RLS clause

 b1) Either re-plan the query if the user-id changes between planning
    and execution time, which means making the user-id a part of the
    plan-cache key.

 b2) Or decree that for RLS purposes, it's the user-id at planning time,
    not execution time, that counts.

My preference is b1, because the b2 approach would make user-visible
changes to the semantics of the permission checks.

Probably, the plan-cache should also be invalidated when the user's
properties are modified or a GRANT/REVOKE is issued, in addition to
changes to the table itself.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-27 Thread Ants Aasma
Sorry about the delay in answering. I have been swamped with non-PG
related things lately.

On Tue, Jun 26, 2012 at 11:08 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 I'm confused by this remark, because surely the query planner does it this
 way only if there's no LIMIT.  When there is a LIMIT, we choose based on
 the startup cost plus the estimated fraction of the total cost we expect
 to pay based on dividing the LIMIT by the overall row count estimate.  Or
 is this not what you're talking about?

My reasoning was that query_planner returns the cheapest-total path
and the cheapest fractional presorted path (by the aggregation pathkeys).
When evaluating hash aggregates with this patch, these two are indeed
compared considering the estimated fraction of the total cost, but this
might miss the cheapest fractional unordered path for lazy hash aggregates.

Reviewing the code now, I discovered this path could be picked out of
the pathlist, just as is done by
get_cheapest_fractional_path_for_pathkeys when pathkeys is NIL. This
would need to be returned in addition to the other two paths. To
minimize overhead, this should only be done when we might want to
consider lazy hash aggregation (there is a group clause with no
aggregates and grouping is hashable). But this is starting to get
pretty crufty considering that there doesn't seem to be any really
compelling use case for this.

 Ants, do you intend to update this patch for this CommitFest?  Or at
 all?  It seems nobody's too excited about this, so I'm not sure
 whether it makes sense for you to put more work on it.  But please
 advise as to your plans.

If anyone thinks that this patch might be worth considering, then I'm
prepared to do minor cleanup this CF (I saw some possibly unnecessary
cruft in agg_fill_hash_and_retrieve). On the other hand, if you think
the use case is too marginal to consider for inclusion then I won't
shed a tear if this gets rejected. For me this was mostly a learning
experience for poking around in the planner.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-27 Thread Robert Haas
On Wed, Jun 27, 2012 at 7:21 AM, Florian Pflug f...@phlo.org wrote:
 On Jun27, 2012, at 07:18 , Kohei KaiGai wrote:
 The problem is the way to implement it.
 If we would have permission checks on planner stage, it cannot handle
 a case when user-id would be switched prior to executor stage, thus
 it needs something remedy to handle the scenario correctly.
 Instead of a unique plan per query, it might be a solution to generate
 multiple plans depending on user-id, and choose a proper one in
 executor stage.

 Which type of implementation is what everybody is asking for?

 I think you need to

  a) Determine the user-id at planning time, and insert the matching
    RLS clause

 b1) Either re-plan the query if the user-id changes between planning
    and execution time, which means making the user-id a part of the
    plan-cache key.

 b2) Or decree that for RLS purposes, it's the user-id at planning time,
    not execution time, that counts.

Or b3, flag plans that depend on the user ID inside the plan-cache,
and just flush all of those (but only those) when the user ID changes.
 In the common case where RLS is not used, that might ease the sting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_terminate_backend for same-role

2012-06-27 Thread Robert Haas
On Wed, Jun 27, 2012 at 8:13 AM, Magnus Hagander mag...@hagander.net wrote:
 I went ahead and committed this.

 I kinda think we should back-patch this into 9.2.  It doesn't involve
 a catalog change, and would make the behavior consistent between the
 two releases, instead of changing in 9.1 and then changing again in
 9.2.  Thoughts?

 +1.

Three votes with no objections is good enough for me, so, done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Reporting hba lines

2012-06-27 Thread Magnus Hagander
When debugging strange and complex pg_hba lines, it can often be quite
useful to know which line matched a particular connection that
failed for some reason, because more often than not it's actually not
the expected line in pg_hba.conf that is being used.

The easiest way to do this is to emit an errdetail for the login
failure, per this patch.

Question is - is that leaking information to the client that we
shouldn't be leaking?

And if it is, what would be the preferred way to deal with it? We
could put that as a detail to basically every single error message
coming out of the auth system, but that seems like a bad idea. Or we
could make a separate ereport(LOG) before sending it to the client,
perhaps?

Thoughts?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


hba_line.patch
Description: Binary data



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-27 Thread Kohei KaiGai
2012/6/27 Robert Haas robertmh...@gmail.com:
 On Wed, Jun 27, 2012 at 7:21 AM, Florian Pflug f...@phlo.org wrote:
 On Jun27, 2012, at 07:18 , Kohei KaiGai wrote:
 The problem is the way to implement it.
 If we would have permission checks on planner stage, it cannot handle
 a case when user-id would be switched prior to executor stage, thus
it needs some remedy to handle the scenario correctly.
 Instead of a unique plan per query, it might be a solution to generate
 multiple plans depending on user-id, and choose a proper one in
 executor stage.

 Which type of implementation is what everybody is asking for?

 I think you need to

  a) Determine the user-id at planning time, and insert the matching
    RLS clause

 b1) Either re-plan the query if the user-id changes between planning
    and execution time, which means making the user-id a part of the
    plan-cache key.

 b2) Or decree that for RLS purposes, it's the user-id at planning time,
    not execution time, that counts.

 Or b3, flag plans that depend on the user ID inside the plan-cache,
 and just flush all of those (but only those) when the user ID changes.
  In the common case where RLS is not used, that might ease the sting.

Probably, PlannedStmt->invalItems allows handling plan-cache invalidation
without big code changes. I'll try to add a flag with the user-id to track
query plans that assume RLS, or InvalidOid if no RLS was applied in the plan.
I'll investigate the implementation in more detail.

Do we have any other scenario that runs a query plan under a different
user privilege than at planner stage?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] Reporting hba lines

2012-06-27 Thread Cédric Villemain
Le mercredi 27 juin 2012 14:54:15, Magnus Hagander a écrit :
 When debugging strange and complex pg_hba lines, it can often be quite
 useful to know which line is matching a particular connection that
 failed for some reason. Because more often than not, it's actually not
 using the line in pg_hba.conf that's expected.
 
 The easiest way to do this is to emit an errdetail for the login
 failure, per this patch.
 
 Question is - is that leaking information to the client that we
 shouldn't be leaking?

I fear it is exactly the problem. Too bad because the feature is interesting.

 And if it is, what would be the preferred way to deal with it? We
 could put that as a detail to basically every single error message
 coming out of the auth system, but that seems like a bad idea. Or we
 could make a separate ereport(LOG) before send it to the client,
 perhaps?

The problem is also that 1/ you're not sure how exactly you connected, and 2/
you may not be connecting the way you intended (my case with IPv4 vs IPv6 just
below). Providing more information on what we receive from the client, and
telling it to the client, sounds good, no?

 Thoughts?

I just spent some time fixing an IPv6 vs IPv4 login failure. The
matched line from pg_hba.conf would have been very useful.

Maybe it is enough to report what happens on the connection: from which host,
TCP/socket (and which one, given that we should have multiple options here
soon), dbname, user. Inspecting pg_hba.conf will remain difficult if a map
is used, or some other +group option.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Wed, Jun 27, 2012 at 3:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wonder whether this design can be adapted to Windows?  IIRC we do
 not have a bulletproof data directory lock scheme for Windows.
 It seems like this makes few enough demands on the lock mechanism
 that there ought to be suitable primitives available there too.

 I assume you're saying we need to make changes in the internal API,
right? Because we already have a Windows-native implementation of
 shared memory that AFAIK works,

Right, but does it provide honest protection against starting two
postmasters in the same data directory?  Or more to the point,
does it prevent starting a new postmaster when the old postmaster
crashed but there are still orphaned backends making changes?
AFAIR we basically punted on those problems for the Windows port,
for lack of an equivalent to nattch.

regards, tom lane



Re: [HACKERS] Visual Studio 2012 RC

2012-06-27 Thread Heikki Linnakangas

On 10.06.2012 11:41, Magnus Hagander wrote:

On Sun, Jun 10, 2012 at 3:16 AM, Brar Pieningb...@gmx.de  wrote:

Magnus Hagander wrote:


I don't have too much hope for them actually changing it - they seem
hell-bent on forcing everybody into metro, and this seems to be yet another
way to do that. But we can always hope...



Looks like they've learnt their lesson...

http://blogs.msdn.com/b/visualstudio/archive/2012/06/08/visual-studio-express-2012-for-windows-desktop.aspx


Yeah. Didn't expect that to happen, but happy to see that it did! :-)


I don't think we can realistically support VS2012 until Microsoft 
releases the gratis Visual Studio Express 2012 for Windows Desktop. 
We'll hardly find anyone willing to run a buildfarm machine with VS2012 
until that, and I don't think any of the committers have access to 
VS2012 to test with.


So, I'm bumping this to the next commitfest. By the time that begins, 
let's see if Microsoft has released it yet.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Would Posix shmem help with that at all?  Why did you choose not to
 use the Posix API, anyway?

 It seemed more complicated.  If we use the POSIX API, we've got to
 have code to find a non-colliding name for the shm, and we've got to
 arrange to clean it up at process exit.  Anonymous shm doesn't require
 a name and goes away automatically when it's no longer in use.

I see.  Those are pretty good reasons ...

 With respect to EXEC_BACKEND, I wasn't proposing to kill it, just to
 make it continue to use a full-sized sysv shm.

Well, if the ultimate objective is to get out from under the SysV APIs
entirely, we're not going to get there if we still have to have all that
code for the EXEC_BACKEND case.  Maybe it's time to decide that we don't
need to support EXEC_BACKEND on Unix.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Stephen Frost
All,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Would Posix shmem help with that at all?  Why did you choose not to
  use the Posix API, anyway?
 
  It seemed more complicated.  If we use the POSIX API, we've got to
  have code to find a non-colliding name for the shm, and we've got to
  arrange to clean it up at process exit.  Anonymous shm doesn't require
  a name and goes away automatically when it's no longer in use.
 
 I see.  Those are pretty good reasons ...

After talking to Magnus a bit this morning regarding this, it sounds
like what we're doing on Windows is closer to Anonymous shm, except that
they use an intentionally specific name, which also allows them to
detect if any children are still alive by using a create-if-not-exists
approach on the shm segment and failing if it still exists.  There were
some corner cases around restarts due to it taking a few seconds for the
Windows kernel to pick up on the fact that all the children are dead and
that the shm segment should go away, but they were able to work around
that, and failure to start is surely much better than possible
corruption.

What this all boils down to is- can you have a shm segment that goes
away when no one is still attached to it, but actually give it a name
and then detect if it already exists atomically on startup on
Linux/Unixes?  If so, perhaps we could use the same mechanism on both..

Thanks,

Stephen




Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Right, but does it provide honest protection against starting two
 postmasters in the same data directory?  Or more to the point,
 does it prevent starting a new postmaster when the old postmaster
 crashed but there are still orphaned backends making changes?
 AFAIR we basically punted on those problems for the Windows port,
 for lack of an equivalent to nattch.

See my other mail, but, after talking to Magnus, it's my understanding
that we had that problem initially, but it was later solved by using a
named shared memory segment which the kernel will clean up when all
children are gone.  That, combined with a 'create-if-not-exists' call,
allows detection of lost children to be done.

Thanks,

Stephen




[HACKERS] Re: [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical

2012-06-27 Thread Heikki Linnakangas

On 13.06.2012 14:28, Andres Freund wrote:

@@ -2584,6 +2610,73 @@ l1:
	rdata[1].buffer_std = true;
	rdata[1].next = NULL;

+	/*
+	 * XXX: We could decide not to log changes when the origin is not the
+	 * local node, that should reduce redundant logging.
+	 */
+	if (need_tuple) {
+		xl_heap_header xlhdr;

 ...

+		relationFindPrimaryKey(relation, &indexoid, &pknratts,
+							   pkattnum, pktypoid, pkopclass);
+
+		if (!indexoid) {
+			elog(WARNING, "Could not find primary key for table with oid %u",
+				 relation->rd_id);
+			goto no_index_found;
+		}
+
+		index_rel = index_open(indexoid, AccessShareLock);
+
+		indexdesc = RelationGetDescr(index_rel);
+
+		for (natt = 0; natt < indexdesc->natts; natt++) {
+			idxvals[natt] =
+				fastgetattr(tp, pkattnum[natt], desc, &idxisnull[natt]);
+			Assert(!idxisnull[natt]);
+		}
+
+		idxtuple = heap_form_tuple(indexdesc, idxvals, idxisnull);
+
+		xlhdr.t_infomask2 = idxtuple->t_data->t_infomask2;
+		xlhdr.t_infomask = idxtuple->t_data->t_infomask;
+		xlhdr.t_hoff = idxtuple->t_data->t_hoff;
+
+		rdata[1].next = &(rdata[2]);
+		rdata[2].data = (char *) &xlhdr;
+		rdata[2].len = SizeOfHeapHeader;
+		rdata[2].buffer = InvalidBuffer;
+		rdata[2].next = NULL;
+
+		rdata[2].next = &(rdata[3]);
+		rdata[3].data = (char *) idxtuple->t_data + offsetof(HeapTupleHeaderData, t_bits);
+		rdata[3].len = idxtuple->t_len - offsetof(HeapTupleHeaderData, t_bits);
+		rdata[3].buffer = InvalidBuffer;
+		rdata[3].next = NULL;
+
+		heap_close(index_rel, NoLock);
+	no_index_found:
+		;
+	}
+
	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE, rdata);

	PageSetLSN(page, recptr);

PageSetLSN(page, recptr);


It's not cool to do all that primary key lookup stuff within the 
critical section, while holding a lock on the page.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Reporting hba lines

2012-06-27 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 When debugging strange and complex pg_hba lines, it can often be quite
 useful to know which line is matching a particular connection that
 failed for some reason. Because more often than not, it's actually not
 using the line in pg_hba.conf that's expected.

 The easiest way to do this is to emit an errdetail for the login
 failure, per this patch.

 Question is - is that leaking information to the client that we
 shouldn't be leaking?

Yes.

 And if it is, what would be the preferred way to deal with it?

Report to the postmaster log only.  errdetail_log should do.

BTW, are you sure that auth_failed is only called in cases where
an hba line has already been identified?  Even if true today,
it seems fairly risky to assume that.

regards, tom lane



Re: [HACKERS] Reporting hba lines

2012-06-27 Thread Magnus Hagander
On Wed, Jun 27, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 When debugging strange and complex pg_hba lines, it can often be quite
 useful to know which line is matching a particular connection that
 failed for some reason. Because more often than not, it's actually not
 using the line in pg_hba.conf that's expected.

 The easiest way to do this is to emit an errdetail for the login
 failure, per this patch.

 Question is - is that leaking information to the client that we
 shouldn't be leaking?

 Yes.

 And if it is, what would be the preferred way to deal with it?

 Report to the postmaster log only.  errdetail_log should do.

Oh, I wasn't aware we had that :) You learn something new every day.


 BTW, are you sure that auth_failed is only called in cases where
 an hba line has already been identified?  Even if true today,
 it seems fairly risky to assume that.

It is true today, but yes, it might be safe to guard against it with
something like this?

I also fixed the error message to follow the guidelines better - I think :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


hba_line.patch
Description: Binary data



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Magnus Hagander
On Wed, Jun 27, 2012 at 3:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Wed, Jun 27, 2012 at 3:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wonder whether this design can be adapted to Windows?  IIRC we do
 not have a bulletproof data directory lock scheme for Windows.
 It seems like this makes few enough demands on the lock mechanism
 that there ought to be suitable primitives available there too.

 I assume you're saying we need to make changes in the internal API,
 right? Because we alreayd have a windows native implementation of
 shared memory that AFAIK works,

 Right, but does it provide honest protection against starting two
 postmasters in the same data directory?  Or more to the point,
 does it prevent starting a new postmaster when the old postmaster
 crashed but there are still orphaned backends making changes?
 AFAIR we basically punted on those problems for the Windows port,
 for lack of an equivalent to nattch.

No, we spent a lot of time trying to *fix* it, and IIRC we did.

We create a shared memory segment with a fixed name based on the data
directory. This shared memory segment is inherited by all children. It
will automatically go away only when all processes that have an open
handle to it go away (in fact, it can even take a second or two more,
if they go away by crash and not by cleanup - we have a workaround in
the code for that). But as long as there is an orphaned backend
around, the shared memory segment stays around.

We don't have nattch. But we do have nattch > 0. Or something like that.

You can work around it if you find two different paths to the same
data directory (e.g .using junctions), but you are really actively
trying to break the system if you do that...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] warning handling in Perl scripts

2012-06-27 Thread Andrew Dunstan



On 06/24/2012 04:05 PM, Robert Haas wrote:

On Sun, Jun 24, 2012 at 2:40 PM, Peter Eisentrautpete...@gmx.net  wrote:

Every time I make a change to the structure of the catalog files,
genbki.pl produces a bunch of warnings (like Use of uninitialized value
in string eq at genbki.pl line ...), and produces corrupted output
files, that are then (possibly) detected later by the compiler.  Also,
getting out of that is difficult because due to the complicated
dependency relationship between the involved files, you need to remove a
bunch of files manually, or clean everything.  So error handling could
be better.

It seems that adding

diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index ebc4825..7d66da9 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -19,6 +19,8 @@
  use strict;
  use warnings;

+local $SIG{__WARN__} = sub { die $_[0] };
+
  my @input_files;
  our @include_path;
  my $output_path = '';

would address that.

Could that cause any other problems?  Should it be added to all Perl
scripts?

This seems like a band-aid.  How about if we instead add whatever
error-handling the script is missing, so that it produces an
appropriate, human-readable error message?



I realise I'm late to this party, but I'm with Robert. The root cause of 
the errors should be fixed.


That's not to say that making warnings fatal might not also be a good 
idea as a general defense mechanism.


cheers

andrew




Re: [HACKERS] Reporting hba lines

2012-06-27 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Wed, Jun 27, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, are you sure that auth_failed is only called in cases where
 an hba line has already been identified?  Even if true today,
 it seems fairly risky to assume that.

 It is true today, but yes, it might be safe to guard against it with
 something like this?

FWIW, the usual approach for conditionally emitting bits of an ereport
is more like

ereport(FATAL,
        (errcode(errcode_return),
         errmsg(errstr, port->user_name),
         port->hba ? errdetail_log("Connection matched pg_hba.conf line %d",
                                   port->hba->linenumber) : 0));

but that's just a nitpick.  A bigger issue is that I'm not convinced
that a line number will be tremendously helpful: it's easy to miscount
lines, and a line number will certainly not be helpful in the frequent
cases where people are modifying the wrong hba file.  Can we show
the source text of the hba line?

regards, tom lane



[HACKERS] Regarding WAL Format Changes

2012-06-27 Thread Amit Kapila
While reading patch-1
(http://archives.postgresql.org/pgsql-hackers/2012-06/txtFskHiYakjO.txt,
1-use-uint64-got-segno.patch) of the WAL Format Changes thread
(http://archives.postgresql.org/message-id/4FDA5136.6080206@enterprisedb.com),
I had a few observations, which are summarized below:

 

1. Function headers for the following functions still contain references to log,
seg
   a. InstallXLogFileSegment() 
   b. RemoveOldXlogFiles() 
   c. XLogFileCopy() 
   d. XLogGetLastRemoved() 
   e. UpdateLastRemovedPtr() 
   f. RemoveOldXlogFiles() 

2. @@ -2680,8 +2645,8 @@ InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
 		LWLockRelease(ControlFileLock);
 		ereport(LOG,
 				(errcode_for_file_access(),
-				 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
-						tmppath, path, *log, *seg)));
+				 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
+						tmppath, path)));
   If the changed error message could contain the log file and segment number,
   it would be clearer. Those should be easily deducible from the segment
   number.

3. -RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
   +RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
...
@@ -4016,8 +3953,9 @@ retry:
 		if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
 		{
 			ereport(emode_for_corrupt_record(emode, *RecPtr),
-					(errmsg("there is no contrecord flag in log file %u, segment %u, offset %u",
-							readId, readSeg, readOff)));
+					(errmsg("there is no contrecord flag in log segment %s, offset %u",
+							XLogFileNameP(curFileTLI, readSegNo),
+							readOff)));
 			goto next_record_is_invalid;
 		}
 		pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
@@ -4025,10 +3963,13 @@ retry:
 		if (contrecord->xl_rem_len == 0 ||
 			total_len != (contrecord->xl_rem_len + gotlen))
 		{
+			char fname[MAXFNAMELEN];
+			XLogFileName(fname, curFileTLI, readSegNo);
 			ereport(emode_for_corrupt_record(emode, *RecPtr),
-					(errmsg("invalid contrecord length %u in log file %u, segment %u, offset %u",
-							contrecord->xl_rem_len,
-							readId, readSeg, readOff)));
+					(errmsg("invalid contrecord length %u in log segment %s, offset %u",
+							contrecord->xl_rem_len,
+							XLogFileNameP(curFileTLI, readSegNo),
+							readOff)));
 			goto next_record_is_invalid;
 		}
  
  For the above 2 changed error messages, 'log segment' is used for the
  filename, whereas other similar changes use 'log file'. Some places use
  'log segment' and others 'log file'; is there any particular reason for
  that?
  
  
4. @@ -533,33 +533,17 @@ pg_xlog_location_diff(PG_FUNCTION_ARGS)
-	/*
-	 * Sanity check
-	 */
-	if (loc1.xrecoff > XLogFileSize)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("xrecoff \"%X\" is out of valid range, 0..%X",
-						loc1.xrecoff, XLogFileSize)));
-	if (loc2.xrecoff > XLogFileSize)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("xrecoff \"%X\" is out of valid range, 0..%X",
-						loc2.xrecoff, XLogFileSize)));
+	bytes1 = (((uint64) loc1.xlogid) << 32L) + loc1.xrecoff;
+	bytes2 = (((uint64) loc2.xlogid) << 32L) + loc2.xrecoff;

Is there no chance that it can be out of the valid range after the new
changes? Just a doubt.


5. 
--- a/src/backend/replication/walreceiver.c 
+++ b/src/backend/replication/walreceiver.c 
@@ -69,11 +69,12 @@ walrcv_disconnect_type walrcv_disconnect = NULL; 
  
 /* 
  * These variables are used similarly to openLogFile/Id/Seg/Off, 
- * but for walreceiver to write the XLOG. 
+ * but for walreceiver to write the XLOG. recvFileTLI is the TimeLineID 

In the above comments, there is still reference to Id/Seg/Off.

 

 

With Regards,

Amit Kapila.

 



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Wed, Jun 27, 2012 at 3:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 AFAIR we basically punted on those problems for the Windows port,
 for lack of an equivalent to nattch.

 No, we spent a lot of time trying to *fix* it, and IIRC we did.

OK, in that case this isn't as interesting as I thought.

If we do go over to a file-locking-based solution on Unix, it might be
worthwhile changing to something similar on Windows.  But it would be
more about reducing coding differences between the platforms than
plugging any real holes.

regards, tom lane



Re: [HACKERS] Reporting hba lines

2012-06-27 Thread Magnus Hagander
On Wed, Jun 27, 2012 at 4:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Wed, Jun 27, 2012 at 3:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, are you sure that auth_failed is only called in cases where
 an hba line has already been identified?  Even if true today,
 it seems fairly risky to assume that.

 It is true today, but yes, it might be safe to guard against it with
 something like this?

 FWIW, the usual approach for conditionally emitting bits of an ereport
 is more like

        ereport(FATAL,
                (errcode(errcode_return),
                 errmsg(errstr, port->user_name),
                 port->hba ? errdetail_log("Connection matched pg_hba.conf
 line %d", port->hba->linenumber) : 0));

Hmm. Ok. So it treats a 0/NULL there as a way to ignore it. I tried
something with the NULL inside the errdetail, which obviously failed.


 but that's just a nitpick.  A bigger issue is that I'm not convinced
 that a line number will be tremendously helpful: it's easy to miscount
 lines, and a line number will certainly not be helpful in the frequent

Editors will help you count the lines, no? :-)

 cases where people are modifying the wrong hba file.  Can we show
 the source text of the hba line?

We don't currently keep the full source text around - but we certainly
could do that if we wanted to.

I'm not sure how much it helps - usually, you're going to end up on a
line that's completely irrelevant if you get the wrong hba file (e.g.
a comment or a line that's not even in the file at all due to size).
Maybe we should just include the *name* of the HBA file in the error
message?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Robert Haas
On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Would Posix shmem help with that at all?  Why did you choose not to
 use the Posix API, anyway?

 It seemed more complicated.  If we use the POSIX API, we've got to
 have code to find a non-colliding name for the shm, and we've got to
 arrange to clean it up at process exit.  Anonymous shm doesn't require
 a name and goes away automatically when it's no longer in use.

 I see.  Those are pretty good reasons ...

 With respect to EXEC_BACKEND, I wasn't proposing to kill it, just to
 make it continue to use a full-sized sysv shm.

 Well, if the ultimate objective is to get out from under the SysV APIs
 entirely, we're not going to get there if we still have to have all that
 code for the EXEC_BACKEND case.  Maybe it's time to decide that we don't
 need to support EXEC_BACKEND on Unix.

I don't personally see a need to do anything that drastic at this
point.  Admittedly, I rarely compile with EXEC_BACKEND, but I don't
think it's bad to have the option available.  Adjusting shared memory
limits isn't really a big problem for PostgreSQL developers; what
we're trying to avoid is the need for PostgreSQL *users* to concern
themselves with it.  And surely anyone who is using EXEC_BACKEND on
Unix is a developer, not a user.

If and when we come up with a substitute for the nattch interlock,
then this might be worth thinking a bit harder about.  At that point,
if we still want to support EXEC_BACKEND on Unix, then we'd need the
EXEC_BACKEND case at least to use POSIX shm rather than anonymous
shared mmap.  Personally I think that would be not that hard and
probably worth doing, but there doesn't seem to be any point in
writing that code now, because for the simple case of just reducing
the amount of shm that we allocate, an anonymous mapping seems better
all around.

We shouldn't overthink this.  Our shared memory code has allocated a
bunch of crufty hacks over the years to work around various
platform-specific issues, but it's still not a lot of code, so I don't
see any reason to worry unduly about making a surgical fix without
having a master plan.  Nothing we want to do down the road will
require moving the earth.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reporting hba lines

2012-06-27 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Wed, Jun 27, 2012 at 4:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 cases where people are modifying the wrong hba file.  Can we show
 the source text of the hba line?

 We don't currently keep the full source text around - but we certainly
 could do that if we wanted to.

If we're concerned about providing a message like this, I think it'd be
well worthwhile.  We presumably would only store the active lines not
comment lines, so the extra memory space would be negligible in just
about any real use-case.

 I'm not sure how much it helps - usually, you're going to end up on a
 line that's completely irrelevant if you get the wrong hba file (e.g.
 a comment or a line that's not even in the file at all due to size).

Only if you count accurately, and aren't fooled into miscounting by the
expectation that you must arrive on a non-comment line.  I don't buy
the "the editor can do it for you" argument either, at least not since
noticing that recent versions of emacs don't count lines the way I do.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Robert Haas
On Wed, Jun 27, 2012 at 9:52 AM, Stephen Frost sfr...@snowman.net wrote:
 What this all boils down to is- can you have a shm segment that goes
 away when no one is still attached to it, but actually give it a name
 and then detect if it already exists atomically on startup on
 Linux/Unixes?  If so, perhaps we could use the same mechanism on both..

As I understand it, no.  You can either have anonymous shared
mappings, which go away when no longer in use but do not have a name.
Or you can have POSIX or sysv shm, which have a name but do not
automatically go away when no longer in use.  There seems to be no
method for setting up a segment that both has a name and goes away
automatically.  POSIX shm in particular tries to look like a file,
whereas anonymous memory tries to look more like malloc (except that
you can share the mapping with child processes).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Regarding WAL Format Changes

2012-06-27 Thread Heikki Linnakangas

On 27.06.2012 17:14, Amit Kapila wrote:

1. Function headers for the following functions still contain references to log,
seg
a. InstallXLogFileSegment()
b. RemoveOldXlogFiles()
c. XLogFileCopy()
d. XLogGetLastRemoved()
e. UpdateLastRemovedPtr()
f. RemoveOldXlogFiles()


Thanks, fixed.


2. @@ -2680,8 +2645,8 @@ InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
    LWLockRelease(ControlFileLock);
    ereport(LOG,
            (errcode_for_file_access(),
-            errmsg("could not link file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
-                   tmppath, path, *log, *seg)));
+            errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
+                   tmppath, path)));
If the changed error message could contain the log file and segment number, it
would be clearer.  That should be easily deducible from the segment number.


That seems redundant. The target file name is calculated from the 
segment number, and we're now using the file name instead of log+seg in 
other messages too.



3. -RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
   +RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr)
...
@@ -4016,8 +3953,9 @@ retry:
    if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
    {
        ereport(emode_for_corrupt_record(emode, *RecPtr),
-               (errmsg("there is no contrecord flag in log file %u, segment %u, offset %u",
-                       readId, readSeg, readOff)));
+               (errmsg("there is no contrecord flag in log segment %s, offset %u",
+                       XLogFileNameP(curFileTLI, readSegNo), readOff)));

        goto next_record_is_invalid;
    }
    pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
@@ -4025,10 +3963,13 @@ retry:
    if (contrecord->xl_rem_len == 0 ||
        total_len != (contrecord->xl_rem_len + gotlen))
    {
+       char fname[MAXFNAMELEN];
+       XLogFileName(fname, curFileTLI, readSegNo);

        ereport(emode_for_corrupt_record(emode, *RecPtr),
-               (errmsg("invalid contrecord length %u in log file %u, segment %u, offset %u",
+               (errmsg("invalid contrecord length %u in log segment %s, offset %u",
                        contrecord->xl_rem_len,
-                       readId, readSeg, readOff)));
+                       XLogFileNameP(curFileTLI, readSegNo), readOff)));

        goto next_record_is_invalid;
    }

   For the above 2 changed error messages, 'log segment' is used for the
filename.
   However, all similar changes have 'log file' for the filename. There are some
places
   where 'log segment' is used and other places where it is 'log file'.
   So is there any particular reason for it?


Not really. There are several messages that use "log file %s", and also 
several places that use "log segment %s". Should we make it consistent 
and use either "log segment" or "log file" everywhere?



4. @@ -533,33 +533,17 @@ pg_xlog_location_diff(PG_FUNCTION_ARGS)
-    /*
-     * Sanity check
-     */
-    if (loc1.xrecoff > XLogFileSize)
-        ereport(ERROR,
-                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("xrecoff \"%X\" is out of valid range, 0..%X",
-                        loc1.xrecoff, XLogFileSize)));
-    if (loc2.xrecoff > XLogFileSize)
-        ereport(ERROR,
-                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("xrecoff \"%X\" is out of valid range, 0..%X",
-                        loc2.xrecoff, XLogFileSize)));
+    bytes1 = (((uint64) loc1.xlogid) << 32L) + loc1.xrecoff;
+    bytes2 = (((uint64) loc2.xlogid) << 32L) + loc2.xrecoff;

 Is there no chance that it can be out of valid range after new changes,
just a doubt?


No. Not in the sense it used to be, anyway; the XLogFileSize check is no 
longer relevant. Perhaps we should check for InvalidXLogRecPtr or that 
the pointer doesn't point e.g. into the middle of a page header. But then 
again, this calculation works fine with both of those cases, so I see no 
reason to make it stricter.
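For what it's worth, the new arithmetic is easy to sanity-check in isolation;
the helper names below are illustrative, not the actual xlogfuncs.c code:

```c
/* Standalone sketch of the byte-position arithmetic from the patch:
 * an (xlogid, xrecoff) pair is just the high and low halves of a
 * 64-bit byte position, so the diff is plain integer subtraction. */
#include <assert.h>
#include <stdint.h>

static uint64_t
xlog_bytepos(uint32_t xlogid, uint32_t xrecoff)
{
    return (((uint64_t) xlogid) << 32) + xrecoff;
}

static int64_t
xlog_location_diff(uint32_t id1, uint32_t off1,
                   uint32_t id2, uint32_t off2)
{
    /* crossing the old 4GB "file" boundary is nothing special now */
    return (int64_t) (xlog_bytepos(id1, off1) - xlog_bytepos(id2, off2));
}
```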



5.
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -69,11 +69,12 @@ 

[HACKERS] Re: [COMMITTERS] pgsql: Move WAL continuation record information to WAL page header.

2012-06-27 Thread Heikki Linnakangas

On 27.06.2012 17:53, Andres Freund wrote:

I had noticed one thing when reviewing the patches before:

@@ -717,6 +719,15 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
    bool        doPageWrites;
    bool        isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
    uint8       info_orig = info;
+   static XLogRecord *rechdr;
+
+   if (rechdr == NULL)
+   {
+       rechdr = malloc(SizeOfXLogRecord);
+       if (rechdr == NULL)
+           elog(ERROR, "out of memory");
+       MemSet(rechdr, 0, SizeOfXLogRecord);
+   }

 /* cross-check on whether we should be here or not */
 if (!XLogInsertAllowed())

Why do you allocate this dynamically? XLogRecord is 32bytes, there doesn't
seem to be much point in this?


On 64-bit architectures, the struct needs padding at the end to make the 
size MAXALIGNed to 32 bytes; a simple "XLogRecord rechdr" local variable 
would not include that. You could do something like:


union
{
  XLogRecord rechdr;
  char bytes[SizeOfXLogRecord];
}

but that's quite ugly too.
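The padding issue is easy to reproduce with a toy struct; DemoRecord and the
MAXALIGN definition below are simplified stand-ins for the real xlog.h
definitions, not PostgreSQL's actual layout:

```c
/* Sketch of the padding problem: a struct whose sizeof() is not
 * MAXALIGNed (12 bytes here), and a union that forces the buffer up
 * to the MAXALIGNed size (16 bytes with 8-byte MAXALIGN). */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
    (((uintptr_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & \
     ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

typedef struct DemoRecord       /* stand-in for XLogRecord */
{
    uint32_t xl_len;
    uint32_t xl_info;
    uint32_t xl_rmid;           /* sizeof() == 12, but MAXALIGN(12) == 16 */
} DemoRecord;

#define SizeOfDemoRecord MAXALIGN(sizeof(DemoRecord))

static size_t
padded_size(void)
{
    /* The union's char[] member pads the buffer out to the full
     * MAXALIGNed size, which the bare struct would not provide. */
    union
    {
        DemoRecord rechdr;
        char       bytes[SizeOfDemoRecord];
    } buf;

    memset(&buf, 0, sizeof(buf));
    return sizeof(buf);
}
```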

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] warning handling in Perl scripts

2012-06-27 Thread David E. Wheeler
On Jun 27, 2012, at 4:07 PM, Andrew Dunstan wrote:

 I realise I'm late to this party, but I'm with Robert. The root cause of the 
 errors should be fixed.
 
 That's not to say that making warnings fatal might not also be a good idea as 
 a general defense mechanism.

ISTM that if they are fatal, one will be more motivated to fix them. I think 
that was the point.

David




Re: [HACKERS] foreign key locks

2012-06-27 Thread Alvaro Herrera

Excerpts from Kevin Grittner's message of mié jun 27 08:40:58 -0400 2012:
 Alvaro Herrera  wrote:
  
  here's fklocks v14, which also merges to new master as there were
  several other conflicts. It passes make installcheck-world now.
  
 Recent commits broke it again, so here's a rebased v15.  (No changes
 other than attempting to merge your work with the pg_controldata and
 pg_resetxlog changes and wrapping that FIXME in a comment.)

Oooh, so sorry -- I was supposed to have git-stashed that before
producing the patch.  This change cannot have caused the pg_dumpall
problem below though, because it only touched the multixact cache code.

 Using that patch, pg_upgrade completes, but pg_dumpall fails:

Hmm, something changed enough that requires more than just a code merge.
I'll look into it.

Thanks for the merge.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Regarding WAL Format Changes

2012-06-27 Thread Alvaro Herrera

Excerpts from Heikki Linnakangas's message of mié jun 27 10:56:13 -0400 2012:
 On 27.06.2012 17:14, Amit Kapila wrote:

 For the above 2 changed error messages, 'log segment' is used for
  filename.
 However, all similar changes have 'log file' for the filename. There are some
  places
 where 'log segment' is used and other places where it is 'log file'.
 So is there any particular reason for it?
 
 Not really. There are several messages that use "log file %s", and also 
 several places that use "log segment %s". Should we make it consistent 
 and use either "log segment" or "log file" everywhere?

I think it would be better to use "log segment" for WAL segments.  That
way we don't cause confusion with the regular text/csv log output files.
Heck, maybe even "WAL segment" instead of "log".

As a translator, I can't have a single, clear explanation of what "log
file" is because there are multiple meanings.  It would be better not to
depend on context.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Regarding WAL Format Changes

2012-06-27 Thread Fujii Masao
On Wed, Jun 27, 2012 at 11:56 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 27.06.2012 17:14, Amit Kapila wrote:

 1. Function header for the following functions still contains references to log,
 seg
    a. InstallXLogFileSegment()
    b. RemoveOldXlogFiles()
    c. XLogFileCopy()
    d. XLogGetLastRemoved()
    e. UpdateLastRemovedPtr()
    f. RemoveOldXlogFiles()


 Thanks, fixed.

There is still a reference to log/seg in the comment of InstallXLogFileSegment().
Should the attached patch be applied?

Regards,

-- 
Fujii Masao


logseg2segno_v1.patch
Description: Binary data



Re: [HACKERS] [v9.3] Row-Level Security

2012-06-27 Thread Florian Pflug
On Jun27, 2012, at 15:07 , Kohei KaiGai wrote:
 Probably, PlannedStmt->invalItems allows us to handle invalidation of
 the plan cache without big code changes. I'll try to add a user-id flag
 to track query plans that assume RLS, or InvalidOid if no RLS
 was applied in the plan.
 I'll investigate the implementation in more detail.
 
 Do we have any other scenario that runs a query plan under a different
 user privilege than the one at planning stage?

Hm, what happens if a SECURITY DEFINER functions returns a refcursor?

Actually, I wonder how we handle that today. If the executor is
responsible for permission checks, wouldn't we apply the calling
function's privilege level in that case, at least if the cursor isn't
fetched from within the SECURITY DEFINER function? If I find some time,
I'll check...

best regards,
Florian Pflug




Re: [HACKERS] [COMMITTERS] pgsql: Move WAL continuation record information to WAL page header.

2012-06-27 Thread Andres Freund
On Wednesday, June 27, 2012 05:09:46 PM Heikki Linnakangas wrote:
 On 27.06.2012 17:53, Andres Freund wrote:
  I had noticed one thing when reviewing the patches before:
  
  @@ -717,6 +719,15 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData
  *rdata)
  
    bool        doPageWrites;
    bool        isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
    uint8       info_orig = info;
  
  +   static XLogRecord *rechdr;
  +
  +   if (rechdr == NULL)
  +   {
  +       rechdr = malloc(SizeOfXLogRecord);
  +       if (rechdr == NULL)
  +           elog(ERROR, "out of memory");
  +       MemSet(rechdr, 0, SizeOfXLogRecord);
  +   }
  
   /* cross-check on whether we should be here or not */
   if (!XLogInsertAllowed())
  
  Why do you allocate this dynamically? XLogRecord is 32bytes, there
  doesn't seem to be much point in this?
 
 On 64-bit architectures, the struct needs padding at the end to make the
 size MAXALIGNed to 32 bytes; a simple XLogRecord rechdr local variable
 would not include that. You could do something like:
 
 union
 {
XLogRecord rechdr;
char bytes[SizeOfXLogRecord];
 }
 
 but that's quite ugly too.
Ah, yes. Makes sense.

Btw, the XLogRecord struct is directly laid out with 32 bytes here (64-bit, 
Linux) because xl_prev needs to be 8-byte aligned...

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Regarding WAL Format Changes

2012-06-27 Thread Heikki Linnakangas

On 27.06.2012 18:55, Fujii Masao wrote:

On Wed, Jun 27, 2012 at 11:56 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 27.06.2012 17:14, Amit Kapila wrote:


1. Function header for the following functions still contains references to log,
seg
a. InstallXLogFileSegment()
b. RemoveOldXlogFiles()
c. XLogFileCopy()
d. XLogGetLastRemoved()
e. UpdateLastRemovedPtr()
f. RemoveOldXlogFiles()



Thanks, fixed.


There is still a reference to log/seg in the comment of InstallXLogFileSegment().
Should the attached patch be applied?


Thanks, applied. Sorry for being so sloppy with this...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Regarding WAL Format Changes

2012-06-27 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 So is there any particular reason for it?

 Not really. There are several messages that use log file %s, and also 
 several places that use log segment %s Should we make it consistent 
 and use either log segment or log file everywhere?

+1 for uniformity.  I think I'd vote for using "file" and eliminating
the "segment" terminology altogether, but the other direction would be
okay too, and might require fewer changes.

IIRC, in the original coding "segment" meant 16MB worth of WAL while
"file" was sometimes used to denote 4GB worth (ie, the boundaries where
we had to increment the high half of the LSN struct).  Now that 4GB
boundaries are not special, there's no reason to retain the "file"
concept or terminology.

regards, tom lane



Re: [HACKERS] [ADMIN] pg_basebackup blocking all queries with horrible performance

2012-06-27 Thread Fujii Masao
On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander mag...@hagander.net wrote:
 You agreed to add something like NOSYNC option into START_REPLICATION 
 command?

 I'm on the fence. I was hoping somebody else would chime in with an
 opinion as well.

 +1

 Nobody else with any opinion on this? :(

 I don't think we really need a NOSYNC flag at this point.  Just not
 setting the flush location in clients that make a point of flushing in
 a timely fashion seems fine.

Okay, I'm in the minority, so I'm writing the patch that way. WIP
patch attached.

In the patch, the pg_basebackup background process and pg_receivexlog always
return an invalid location as the flush position, and will never become a sync
standby even if their name is in synchronous_standby_names. The timing of their
replies depends on the standby_message_timeout specified with the -s option, so
the write position may lag behind the true position.

pg_receivexlog accepts a new option -S (is there a better option character?).
If this option is specified, pg_receivexlog returns the true flush position,
and can become a sync standby. It sends back a reply to the master each time
the write position changes or the timeout passes. If synchronous_commit is set
to remote_write, synchronous replication to pg_receivexlog should work well.

The patch needs more documentation. But I think that it's worth reviewing the
code in advance, so I attached the WIP patch. Comments? Objections?

The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied,
we need to write the backport version of the patch for 9.2.

Regards,

-- 
Fujii Masao


pg_receivexlog_syncstandby_v1.patch
Description: Binary data



Re: [HACKERS] Regarding WAL Format Changes

2012-06-27 Thread Fujii Masao
On Thu, Jun 28, 2012 at 1:13 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 27.06.2012 18:55, Fujii Masao wrote:

 On Wed, Jun 27, 2012 at 11:56 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 On 27.06.2012 17:14, Amit Kapila wrote:


 1. Function header for the following functions still contains references to
 log,
 seg
    a. InstallXLogFileSegment()
    b. RemoveOldXlogFiles()
    c. XLogFileCopy()
    d. XLogGetLastRemoved()
    e. UpdateLastRemovedPtr()
    f. RemoveOldXlogFiles()



 Thanks, fixed.


 There is still a reference to log/seg in the comment of
 InstallXLogFileSegment().
 Should the attached patch be applied?


 Thanks, applied. Sorry for being so sloppy with this..

No problem. WAL format change you did is really nice!

Regards,

-- 
Fujii Masao



Re: [HACKERS] [ADMIN] pg_basebackup blocking all queries with horrible performance

2012-06-27 Thread Simon Riggs
On 27 June 2012 18:24, Fujii Masao masao.fu...@gmail.com wrote:

 will never become sync standby even
 if their name is in synchronous_standby_names.

I don't understand why you'd want that.

What is wrong with removing the name from synchronous_standby_names if
you don't like it?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread A.M.

On Jun 27, 2012, at 7:34 AM, Robert Haas wrote:

 On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 So, here's a patch.  Instead of using POSIX shmem, I just took the
 expedient of using mmap() to map a block of MAP_SHARED|MAP_ANONYMOUS
 memory.  The sysv shm is still allocated, but it's just a copy of
 PGShmemHeader; the real shared memory is the anonymous block.  This
 won't work if EXEC_BACKEND is defined so it just falls back on
 straight sysv shm in that case.
 
 Um.  I hadn't thought about the EXEC_BACKEND interaction, but that seems
 like a bit of a showstopper.  I would not like to give up the ability
 to debug EXEC_BACKEND mode on Unixen.
 
 Would Posix shmem help with that at all?  Why did you choose not to
 use the Posix API, anyway?
 
 It seemed more complicated.  If we use the POSIX API, we've got to
 have code to find a non-colliding name for the shm, and we've got to
 arrange to clean it up at process exit.  Anonymous shm doesn't require
 a name and goes away automatically when it's no longer in use.
 
 With respect to EXEC_BACKEND, I wasn't proposing to kill it, just to
 make it continue to use a full-sized sysv shm.
 

I solved this by unlinking the POSIX shared memory segment immediately after 
creation. The file descriptor to the shared memory is inherited, so, by 
definition, only the postmaster's children can access the memory. This ensures 
that shared memory cleanup is immediate once the postmaster and all children 
close, as well. The fcntl locking is not required to protect the POSIX shared 
memory -- it can protect itself.
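The create-then-unlink trick is easy to demonstrate standalone; the segment
name and function below are made up for illustration:

```c
/* Sketch of create-then-unlink POSIX shm: after shm_unlink() the
 * segment has no name, but the open file descriptor (inherited by any
 * children) keeps it alive; the kernel reclaims it when the last
 * descriptor closes, even after a kill -9. */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int
posix_shm_demo(void)
{
    const char *name = "/pg_demo_shm";      /* hypothetical name */

    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);           /* unlink immediately: no stale segment */

    if (ftruncate(fd, 4096) < 0)
        return -1;

    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    strcpy(p, "ok");            /* segment is still usable after unlink */
    int alive = (strcmp(p, "ok") == 0);

    munmap(p, 4096);
    close(fd);                  /* last close frees the segment */
    return alive;
}
```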

Cheers,
M






Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-06-27 Thread Robert Haas
On Thu, May 24, 2012 at 12:04 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Thanks. I rewrote inverse conversion from pg_wchar to mule. New version of
 patch is attached.

Review:

It looks to me like pg_wchar2utf_with_len will not work, because
unicode_to_utf8 returns its second argument unmodified - not, as your
code seems to assume, the byte following what was already written.

MULE also looks problematic.  The code that you've written isn't
symmetric with the opposite conversion, unlike what you did in all
other cases, and I don't understand why.  I'm also somewhat baffled by
the reverse conversion: it treats a multi-byte sequence beginning with
a byte for which IS_LCPRV1(x) returns true as invalid if there are
less than 3 bytes available, but it only reads two; similarly, for
IS_LCPRV2(x), it demands 4 bytes but converts only 3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-27 Thread Robert Haas
On Sat, Jun 2, 2012 at 10:44 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Tom Lane  wrote:
 Simon Riggs  writes:
 On 31 May 2012 15:00, Tom Lane  wrote:
 If we want to finish the beta cycle in a reasonable time period
 and get back to actual development, we have to refrain from
 adding more possibly-destabilizing development work to 9.2. And
 that is what this is.

 In what way is it possibly destabilising?

 I'm prepared to believe that it only affects performance, but it
 could be destabilizing to that. It needs proper review and testing,
 and the next CF is the right environment for that to happen.

 +1

 This is not a bug fix or even a fix for a performance regression.
 The train has left the station; the next one will be along shortly.

And here it is.  There are a couple of outstanding issues here upon
which it would be helpful to get input from a few more people:

1. Are there any call sites from which this should be disabled, either
because the optimization won't help, or because the caller is holding
a lock that we don't want them sitting on for a moment longer than
necessary?  I think the worst case is that we're doing something like
splitting the root page of a btree, and now nobody can walk the btree
until the flush finishes, and here we are doing an unnecessary
sleep.  I am not sure whether I can construct a workload where this is a
real as opposed to a theoretical problem, though.  So maybe we should
just not worry about it.  It suffices for this to be an improvement
over the status quo; it doesn't have to be perfect.  Thoughts?

2. Should we rename the GUCs, since this patch will cause them to
control WAL flush in general, as opposed to commit specifically?
Peter Geoghegan and Simon were arguing that we should retitle it to
group_commit_delay rather than just commit_delay, but that doesn't
seem to be much of an improvement in describing what the new behavior
will actually be, and I am doubtful that it is worth creating a naming
incompatibility with previous releases for a cosmetic change.  I
suggested wal_flush_delay, but there's no consensus on that.
Opinions?

Also, I think I see a straightforward, if minor, improvement.   Right
now, the patch has this:

/*
 * Sleep before flush! By adding a delay here, we may give further
 * backends the opportunity to join the backlog of group commit
 * followers; this can significantly improve transaction throughput, at
 * the risk of increasing transaction latency.
 *
 * We do not sleep if enableFsync is not turned on, nor if there are
 * fewer than CommitSiblings other backends with active transactions.
 */
if (CommitDelay > 0 && enableFsync &&
    MinimumActiveBackends(CommitSiblings))
    pg_usleep(CommitDelay);

/* Got the lock */
LogwrtResult = XLogCtl->LogwrtResult;
if (!XLByteLE(record, LogwrtResult.Flush))
{
    /* try to write/flush later additions to XLOG as well */
    if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))

The XLByteLE test four lines from the bottom should happen before we
consider whether to sleep, because it's possible that we'll discover
that our flush request has already been satisfied and exit without
doing anything, in which case the sleep is a waste.  The attached
version adjusts the logic accordingly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


move_delay_2012_06_27.patch
Description: Binary data



Re: [HACKERS] Uh, I change my mind about commit_delay + commit_siblings (sort of)

2012-06-27 Thread Peter Geoghegan
On 27 June 2012 23:01, Robert Haas robertmh...@gmail.com wrote:
 1. Are there any call sites from which this should be disabled, either
 because the optimization won't help, or because the caller is holding
 a lock that we don't want them sitting on for a moment longer than
 necessary?  I think the worst case is that we're doing something like
 splitting the root page of a btree, and now nobody can walk the btree
 until the flush finishes, and here we are doing an unnecessary
 sleep.  I am not sure whether I can construct a workload where this is a
 real as opposed to a theoretical problem, though.  So maybe we should
 just not worry about it.  It suffices for this to be an improvement
 over the status quo; it doesn't have to be perfect.  Thoughts?

I also find it hard to believe this is a concern. As I've said before
though, it hasn't been adequately explained just how you're supposed
to skip the delay if you're in this situation. It's highly unlikely
that you'll be the one doing the delaying anyway. Skipping the delay
iff you're the leader when this might possibly be a problem will only
*conceivably* be effective in a tiny minority of cases, so this seems
to make little sense to me.

Look at the original benchmark - there are latency numbers there too.
The patch actually performs slightly better than baseline in terms of
latency (worst and average cases) as well as throughput. To get back
to bus analogies again, if people start treating buses like Taxis, you
get slightly better minimum latency figures (not included in any
benchmark performed), because one or two people immediately
hijacked a bus and left on an hour-long journey rather than waiting
another 15 minutes for more passengers to come along. That doesn't
seem like it should be something to concern ourselves with. More to
the point, I think that there are laws of physics reasons why these
backends that might be particularly adversely affected by a delay (due
to some eventuality like the one you described) cannot *really* avoid
a delay. Of course this isn't an absolute, and certainly won't apply
when there are relatively few clients, when the delay really is
useless, but then we trust the dba to set commit_siblings (and indeed
commit_delay) correctly, and it's hardly that big of a deal, since
it's only typically a fraction of a single fsync() longer at worst -
certainly not a total wait that is longer than the total wait the
backend could reasonably expect to have.

Incidentally, I'm guessing someone is going to come up with a
rule-of-thumb for setting commit_delay that sounds something like
"half the latency of your wal_sync_method as reported by
pg_test_fsync()".

 2. Should we rename the GUCs, since this patch will cause them to
 control WAL flush in general, as opposed to commit specifically?
 Peter Geoghegan and Simon were arguing that we should retitle it to
 group_commit_delay rather than just commit_delay, but that doesn't
 seem to be much of an improvement in describing what the new behavior
 will actually be, and I am doubtful that it is worth creating a naming
 incompatibility with previous releases for a cosmetic change.  I
 suggested wal_flush_delay, but there's no consensus on that.
 Opinions?

My main problem with the existing name is that every single article on
commit_delay since it was introduced over ten years ago has been on
balance very negative. Greg Smith's book, the docs, my talk at PgCon,
and any other material in existence about commit_delay that I'm aware
of says the same thing.

commit_delay in master is essentially the same now as it was in 2000;
it's just that commit_siblings is a bit smarter, in that it is now
based on number of active transactions specifically.

Now, based on the fact that the doc changes concerning the practical
details of setting commit_delay survived your editorialisations, I
take it that you accept my view that this is going to fundamentally
alter the advice that surrounds commit_delay, from "just don't touch
it" to something like "this is something that you should probably
consider, that may be very compelling in certain situations".

Let's not shoot ourselves in the foot by keeping the old, disreputable
name. You felt that wal_flush_delay more accurately reflected what the
new setting actually does. You were right, and I'd be perfectly happy
to go along with that.

 The XLByteLE test four lines from the bottom should happen before we
 consider whether to sleep, because it's possible that we'll discover
 that our flush request has already been satisfied and exit without
 doing anything, in which case the sleep is a waste.  The attached
 version adjusts the logic accordingly.

Seems like a minor point, unlikely to make any appreciable difference,
but there's no reason to think that it will have any negative effect
compared to what I've proposed either, so that's fine with me.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


[HACKERS] pg_signal_backend() asymmetry

2012-06-27 Thread Josh Kupershmidt
Hi all,

I have one nitpick related to the recent changes for
pg_cancel_backend() and pg_terminate_backend(). If you use these
functions as an unprivileged user, and try to signal a nonexistent
PID, you get:

  ERROR:  must be superuser or have the same role to cancel queries
running in other server processes

Whereas if you do the same thing as a superuser, you get:

  WARNING:  PID 123 is not a PostgreSQL server process
   pg_cancel_backend
  ---
   f
  (1 row)

The comment above the WARNING generated for the latter case says:

/*
 * This is just a warning so a loop-through-resultset
will not abort
 * if one backend terminated on its own during the run
 */

which is nice, but wouldn't unprivileged users want the same behavior?
Not to mention, the ERROR above is misleading, as it claims the
nonexistent PID really belongs to another user.

A simple fix is attached. The existing code called both
BackendPidGetProc(pid) and IsBackendPid(pid) while checking a
non-superuser's permissions, which really means two separate calls to
BackendPidGetProc(pid). I simplified that down to a single
BackendPidGetProc(pid) call.

Josh


pg_signal_backend_asymmetry.diff
Description: Binary data



Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink

2012-06-27 Thread Shigeru HANADA
On Thu, Jun 14, 2012 at 10:55 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 13, 2012 at 12:07 AM, Amit Kapila amit.kap...@huawei.com wrote:
 To achieve the same in dblink, we need to parse the passed connection string
 and check if it contains fallback_application_name, if yes then its okay,
 otherwise we need to append fallback_application_name in connection string.

 That seems undesirable.  I don't think this is important enough to be
 worth reparsing the connection string for.  I'd just forget about
 doing it for dblink if there's no cheaper way.

Indeed, reparsing the connection string is not cheap, but dblink already does
it in dblink_connstr_check, to check the password requirement when the
local user is not a superuser.  So Amit's idea doesn't seem
unreasonable to me, if we can avoid the extra PQconninfoParse call.

Just an idea, but how about pushing fallback_application_name handling
into dblink_connstr_check?  We already reparse the connection string unless
the local user is a superuser, so it would not be serious overhead in most
cases.  Although it might require changes in the DBLINK_GET_CONN macro...

Regards,
-- 
Shigeru Hanada



[HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Josh Berkus
Folks,

Yeah, I can't believe I'm calling for *yet another* configuration
variable either.  Suggested workaround fixes very welcome.

The basic issue is that autovacuum_max_workers is set by most users
based on autovac's fairly lightweight action most of the time: analyze,
vacuuming pages not on the visibility list, etc.  However, when XID
wraparound kicks in, then autovac starts reading entire tables from disk
... and those tables may be very large.

This becomes a downtime issue if you've set autovacuum_max_workers to,
say, 5 and several large tables hit the wraparound threshold at the same
time (as they tend to do if you're using the default settings).  Then
you have 5 autovacuum processes concurrently doing heavy IO and getting
in each others' way.

I've seen this at two sites now, and my conclusion is that a single
autovacuum_max_workers setting isn't sufficient to cover the case of
wraparound vacuum.  Nor can we just single-thread the wraparound vacuum
(i.e. just one worker), since that would hurt users who have thousands of
small tables.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink

2012-06-27 Thread Robert Haas
On Wed, Jun 27, 2012 at 9:57 PM, Shigeru HANADA
shigeru.han...@gmail.com wrote:
 On Thu, Jun 14, 2012 at 10:55 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 13, 2012 at 12:07 AM, Amit Kapila amit.kap...@huawei.com wrote:
 To achieve the same in dblink, we need to parse the passed connection string
 and check whether it contains fallback_application_name; if it does, that's okay,
 otherwise we need to append fallback_application_name to the connection string.

 That seems undesirable.  I don't think this is important enough to be
 worth reparsing the connection string for.  I'd just forget about
 doing it for dblink if there's no cheaper way.

 Indeed, reparsing the connection string is not cheap, but dblink already does
 it in dblink_connstr_check to check the password requirement when the local
 user is not a superuser.  So Amit's idea doesn't seem unreasonable to me, if
 we can avoid an extra PQconninfoParse call.

 Just an idea, but how about pushing the fallback_application_name handling
 into dblink_connstr_check?  We already reparse the connection string unless
 the local user is a superuser, so it would not be serious overhead in most
 cases.  Although it might require changes to the DBLINK_GET_CONN macro...

*shrug*

If it can be done without costing anything meaningful, I don't object,
but I would humbly suggest that this is not hugely important one way
or the other.  application_name is primarily a monitoring convenience,
so it's not hugely important to have it set anyway, and
fallback_application_name is only going to apply in cases where the
user doesn't care enough to bother setting application_name.  Let's
not knock ourselves out to solve a problem that may not be that
important to begin with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Yeah, I can't believe I'm calling for *yet another* configuration
 variable either.  Suggested workaround fixes very welcome.

 The basic issue is that autovacuum_max_workers is set by most users
 based on autovac's fairly lightweight action most of the time: analyze,
 vacuuming pages not on the visibility list, etc.  However, when XID
 wraparound kicks in, then autovac starts reading entire tables from disk
 ... and those tables may be very large.

It doesn't seem to me that this has much of anything to do with
wraparound; that just happens to be one possible trigger condition
for a lot of vacuuming activity to be happening.  (Others are bulk
data loads or bulk updates, for instance.)  Nor am I convinced that
changing the max_workers setting is an appropriate fix anyway.

I think what you've really got here is inappropriate autovacuum cost
delay settings, and/or the logic in autovacuum.c to try to divvy up the
available I/O capacity by tweaking workers' delay settings isn't working
very well.  It's hard to propose improvements without a lot more detail
than you've provided, though.

regards, tom lane



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread David Johnston
On Jun 27, 2012, at 22:00, Josh Berkus j...@agliodbs.com wrote:

 Folks,
 
 Yeah, I can't believe I'm calling for *yet another* configuration
 variable either.  Suggested workaround fixes very welcome.
 
 The basic issue is that autovacuum_max_workers is set by most users
 based on autovac's fairly lightweight action most of the time: analyze,
 vacuuming pages not on the visibility list, etc.  However, when XID
 wraparound kicks in, then autovac starts reading entire tables from disk
 ... and those tables may be very large.
 
 This becomes a downtime issue if you've set autovacuum_max_workers to,
 say, 5 and several large tables hit the wraparound threshold at the same
 time (as they tend to do if you're using the default settings).  Then
 you have 5 autovacuum processes concurrently doing heavy IO and getting
 in each others' way.
 
 I've seen this at two sites now, and my conclusion is that a single
 autovacuum_max_workers setting isn't sufficient to cover the case of
 wraparound vacuum.  Nor can we just single-thread the wraparound vacuum
 (i.e. just one worker), since that would hurt users who have thousands of
 small tables.
 
 

Would there be enough benefit to setting up separate small/medium/large
thresholds, with user-changeable default table-size boundaries, so that you
could configure 6 workers where 3 handle the small tables, 2 handle the
medium tables, and 1 handles the large tables?  Or, alternatively, a small
worker consumes 1, a medium 2, and a large 3 'units' from whatever size pool
has been defined, so you could have 6 small tables or two large tables
in progress simultaneously.

David J.


Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Josh Berkus

 I think what you've really got here is inappropriate autovacuum cost
 delay settings, and/or the logic in autovacuum.c to try to divvy up the
 available I/O capacity by tweaking workers' delay settings isn't working
 very well.  It's hard to propose improvements without a lot more detail
 than you've provided, though.

Wait, we *have* that logic?  If so, that's the problem ... it's not
working very well.

What detail do you want?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com





Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Stephen Frost
Josh, all,

* Josh Berkus (j...@agliodbs.com) wrote:
 Yeah, I can't believe I'm calling for *yet another* configuration
 variable either.  Suggested workaround fixes very welcome.

As I suggested on IRC, my thought would be to have a goal-based system
for autovacuum which is similar to our goal-based commit system.  We
don't need autovacuum sucking up all the I/O in the box, nor should we
ask the users to manage that.  Instead, let's decide when the autovacuum
on a given table needs to finish and then plan to keep on working at a
rate that'll allow us to get done well in advance of that deadline.

Just my 2c.

Thanks,

Stephen




Re: [HACKERS] Server crash while trying to fetch EXPLAIN query results with a cursor

2012-06-27 Thread Tom Lane
Rushabh Lathia rushabh.lat...@gmail.com writes:
 The above test case ends up with a server crash. The crash is due to the
 following assert in ScanQueryForLocks():
 Assert(parsetree->commandType != CMD_UTILITY);

Meh, yeah, more fallout from the CREATE TABLE AS representation change.

 PFA patch for the same.

Applied with some editorialization.  Thanks!

regards, tom lane



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I think what you've really got here is inappropriate autovacuum cost
 delay settings, and/or the logic in autovacuum.c to try to divvy up the
 available I/O capacity by tweaking workers' delay settings isn't working
 very well.  It's hard to propose improvements without a lot more detail
 than you've provided, though.

 Wait, we *have* that logic?  If so, that's the problem ... it's not
 working very well.

 What detail do you want?

What's it doing?  What do you think it should do instead?

regards, tom lane



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Josh Berkus (j...@agliodbs.com) wrote:
 Yeah, I can't believe I'm calling for *yet another* configuration
 variable either.  Suggested workaround fixes very welcome.

 As I suggested on IRC, my thought would be to have a goal-based system
 for autovacuum which is similar to our goal-based commit system.  We
 don't need autovacuum sucking up all the I/O in the box, nor should we
 ask the users to manage that.  Instead, let's decide when the autovacuum
 on a given table needs to finish and then plan to keep on working at a
 rate that'll allow us to get done well in advance of that deadline.

If we allow individual vacuum operations to stretch out just because
they don't need to be completed right away, we will need more concurrent
vacuum workers (so that we can respond to vacuum requirements for other
tables).  So I submit that this would only move the problem around:
the number of active workers would increase to the point where things
are just as painful, plus or minus a bit.

The intent of the autovacuum cost delay features is to ensure that
autovacuum doesn't suck an untenable fraction of the machine's I/O
capacity, even when it's running flat out.  So I think Josh's complaint
indicates that we have a problem with cost-delay tuning; hard to tell
what exactly without more info.  It might only be that the defaults
are bad for these particular users, or it could be more involved.

regards, tom lane



Re: [HACKERS] Regarding WAL Format Changes

2012-06-27 Thread Amit Kapila

From: Heikki Linnakangas [mailto:heikki.linnakan...@enterprisedb.com] 
Sent: Wednesday, June 27, 2012 8:26 PM
On 27.06.2012 17:14, Amit Kapila wrote:
 2. @@ -2680,8 +2645,8 @@ InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
    LWLockRelease(ControlFileLock);
    ereport(LOG,
    (errcode_for_file_access(),
 -  errmsg("could not link file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
 -         tmppath, path, *log, *seg)));
 +  errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
 +         tmppath, path)));
 If the changed error message could contain the log file and segment number,
 it would be clearer.  They should be easily deducible from the segment
 number.

That seems redundant. The target file name is calculated from the 
segment number, and we're now using the file name instead of log+seg in 
other messages too.

errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
       tmppath, path)));

If we try to get the meaning of the second part of this message
("initialization of log file"), the previous wording was better: the message
refers to two files, and previously it was clear which log file's
initialization had failed.  So we could mention the file name in the second
part of the message ("initialization of log file") as well.




 3.   -RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
 For the above 2 changed error messages, 'log segment' is used for the
 filename.  However, all similar changes use 'log file' for the filename.
 In some places 'log segment' is used, and in other places 'log file'.
 Is there any particular reason for that?

 Not really. There are several messages that use "log file %s", and also
 several places that use "log segment %s".  Should we make it consistent
 and use either "log segment" or "log file" everywhere?

'file' seems the better option, as some users may not even be aware of
segments; they would be using the default segment settings, and they can
relate to 'file' more easily.
Also, using 'WAL' instead of 'log', as suggested by Alvaro, is good if
others also think the same.

With Regards,
Amit Kapila.




Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Robert Haas
On Wed, Jun 27, 2012 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Stephen Frost sfr...@snowman.net writes:
 * Josh Berkus (j...@agliodbs.com) wrote:
 Yeah, I can't believe I'm calling for *yet another* configuration
 variable either.  Suggested workaround fixes very welcome.

 As I suggested on IRC, my thought would be to have a goal-based system
 for autovacuum which is similar to our goal-based commit system.  We
 don't need autovacuum sucking up all the I/O in the box, nor should we
 ask the users to manage that.  Instead, let's decide when the autovacuum
 on a given table needs to finish and then plan to keep on working at a
 rate that'll allow us to get done well in advance of that deadline.

 If we allow individual vacuum operations to stretch out just because
 they don't need to be completed right away, we will need more concurrent
 vacuum workers (so that we can respond to vacuum requirements for other
 tables).  So I submit that this would only move the problem around:
 the number of active workers would increase to the point where things
 are just as painful, plus or minus a bit.

 The intent of the autovacuum cost delay features is to ensure that
 autovacuum doesn't suck an untenable fraction of the machine's I/O
 capacity, even when it's running flat out.  So I think Josh's complaint
 indicates that we have a problem with cost-delay tuning; hard to tell
 what exactly without more info.  It might only be that the defaults
 are bad for these particular users, or it could be more involved.

I've certainly come across many reports of the cost delay settings
being difficult to tune, both on pgsql-hackers/performance and in
various private EnterpriseDB correspondence.  I think Stephen's got it
exactly right: the system needs to figure out the rate at which vacuum
needs to happen, not rely on the user to provide that information.

For checkpoints, we estimated the percentage of the checkpoint that
ought to be completed and the percentage that actually is completed;
if the latter is less than the former, we speed things up until we're
back on track.  For autovacuum, the trick is to speed things up when
the rate at which tables are coming due for autovacuum exceeds the
rate at which we are vacuuming them; or, when we anticipate that a
whole bunch of wraparound vacuums are going to come due
simultaneously, to start doing them sooner so that they are more
spread out.

For example, suppose that 26 tables each of which is 4GB in size are
going to simultaneously come due for an anti-wraparound vacuum in 26
hours.  For the sake of simplicity suppose that each will take 1 hour
to vacuum.  What we currently do is wait for 26 hours and then start
vacuuming them all at top speed, thrashing the I/O system.  What we
ought to do is start vacuuming them much sooner and do them
consecutively.  Of course, the trick is to design a mechanism that
does something intelligent if we think we're on track and then all of
a sudden the rate of XID consumption changes dramatically, and now
we've got to vacuum faster or with more workers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 For example, suppose that 26 tables each of which is 4GB in size are
 going to simultaneously come due for an anti-wraparound vacuum in 26
 hours.  For the sake of simplicity suppose that each will take 1 hour
 to vacuum.  What we currently do is wait for 26 hours and then start
 vacuuming them all at top speed, thrashing the I/O system.

This is a nice description of a problem that has nothing to do with
reality.  In the first place, we don't vacuum them all at once; we can
only vacuum max_workers of them at a time.  In the second place, the
cost-delay features ought to be keeping autovacuum from thrashing the
I/O, entirely independently of what the reason was for starting the
vacuums.  Clearly, since people are complaining, there's something that
needs work there.  But not the work you're proposing.

regards, tom lane



Re: [HACKERS] Posix Shared Mem patch

2012-06-27 Thread Robert Haas
On Wed, Jun 27, 2012 at 9:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jun 27, 2012 at 12:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Would Posix shmem help with that at all?  Why did you choose not to
 use the Posix API, anyway?

 It seemed more complicated.  If we use the POSIX API, we've got to
 have code to find a non-colliding name for the shm, and we've got to
 arrange to clean it up at process exit.  Anonymous shm doesn't require
 a name and goes away automatically when it's no longer in use.

 I see.  Those are pretty good reasons ...

So, should we do it this way?

I did a little research and discovered that Linux 2.3.51 (released
3/11/2000) apparently returns EINVAL for MAP_SHARED|MAP_ANONYMOUS.
That combination is documented to work beginning in Linux 2.4.0.  How
worried should we be about people trying to run PostgreSQL 9.3 on
pre-2.4 kernels?  If we want to worry about it, we could try mapping a
one-page shared MAP_SHARED|MAP_ANONYMOUS segment first.  If that
works, we could assume that we have a working MAP_SHARED|MAP_ANONYMOUS
facility and try to allocate the whole segment plus a minimal sysv
shm.  If the single page allocation fails with EINVAL, we could fall
back to allocating the entire segment as sysv shm.

A related question is - if we do this - should we enable it only on
ports where we've verified that it works, or should we just turn it on
everywhere and fix breakage if/when it's reported?  I lean toward the
latter.

If we find that there are platforms where (a) mmap is not supported or
(b) MAP_SHARED|MAP_ANON works but has the wrong semantics, we could
either shut off this optimization on those platforms by fiat, or we
could test not only that the call succeeds, but that it works
properly: create a one-page mapping and fork a child process; in the
child, write to the mapping and exit; in the parent, wait for the
child to exit and then test that we can read back the correct
contents.  This would protect against a hypothetical system where the
flags are accepted but fail to produce the correct behavior.  I'm
inclined to think this is over-engineering in the absence of evidence
that there are platforms that work this way.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-27 Thread Kyotaro HORIGUCHI
Hello,  thank you for your sugestion.

  I agree. That is the fundamental question. I coded this just for my
  own amusement, but I don't see much significance in doing it. We might
  omit the test for this, which is non-critical and covers corner cases.
 
 We need these tests to work on Windows too, so fancy gmake tricks are
 probably not the way to deal with varying results.

Hmm. I understand you to be suggesting that we should do this in the
normal regression tests.

OK. Since there turned out to be only two patterns in the regression
test, the fancy gmake trick is no longer needed. I will unfold them and
make sure they work in the MinGW build environment.

And for one more environment, the one with VC++, I'll need a bit more
time to work out what vcregress.pl does.


As for the other points, I will decide as follows, and send the result
to the committer as soon as the above is finished.

- The main patch, which fixes the SQL_ASCII handling itself, should be
  ported to 9.2 and 9.1. Someone should work on this. (Me?)

- The remainder of the patch, which fixes the easily fixable leaks of
  palloc'd memory, won't be ported to 9.1. This is only for 9.3devel.

- The patch for 9.3devel will be provided with the new regression test.
  It could easily be ported to 9.1 and 9.2, and there seems to be no
  technical problem, but I'm a bit unsure from other points of view...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.



Re: [HACKERS] We probably need autovacuum_max_wraparound_workers

2012-06-27 Thread Robert Haas
On Thu, Jun 28, 2012 at 12:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 For example, suppose that 26 tables each of which is 4GB in size are
 going to simultaneously come due for an anti-wraparound vacuum in 26
 hours.  For the sake of simplicity suppose that each will take 1 hour
 to vacuum.  What we currently do is wait for 26 hours and then start
 vacuuming them all at top speed, thrashing the I/O system.

 This is a nice description of a problem that has nothing to do with
 reality.  In the first place, we don't vacuum them all at once; we can
 only vacuum max_workers of them at a time.  In the second place, the
 cost-delay features ought to be keeping autovacuum from thrashing the
 I/O, entirely independently of what the reason was for starting the
 vacuums.

I don't think it works that way.  The point is that the workload
imposed by autovac is intermittent and spikey.  If you configure the
cost limit too low, or the delay too high, or the number of autovac
workers is too low, then autovac can't keep up, which causes all of
your tables to bloat and is a total disaster.  You have to make sure
that isn't going to happen, so you naturally configure the settings
aggressively enough that you're sure autovac will be able to stay
ahead of your bloat problem.  But then autovac is more
resource-intensive ALL the time, not just when there's a real need for
it.  This is like giving a kid a $20 bill to buy lunch and having them
walk around until they find a restaurant sufficiently expensive that
lunch there costs $20.  The point of handing over $20 was that you
were willing to spend that much *if needed*, not that the money was
burning a hole in your pocket.

To make that more concrete, suppose that a table has an update rate
such that it hits the autovac threshold every 10 minutes.  If you set
the autovac settings such that an autovacuum of that table takes 9
minutes to complete, you are hosed: there will eventually be some
10-minute period where the update rate is ten times the typical
amount, and the table will gradually become horribly bloated.  But if
you set the autovac settings such that an autovacuum of the table can
finish in 1 minute, so that you can cope with a spike, then whenever
there isn't a spike you are processing the table ten times faster than
necessary, and now one minute out of every ten carries a heavier I/O
load than the other 9, leading to uneven response times.

It's just ridiculous to assert that it doesn't matter if all the
anti-wraparound vacuums start simultaneously.  It does matter.  For
one thing, once every single autovacuum worker is pinned down doing an
anti-wraparound vacuum of some table, then a table that needs an
ordinary vacuum may have to wait quite some time before a worker is
available.  Depending on the order in which workers iterate through
the tables, you could end up finishing all of the anti-wraparound
vacuums before doing any of the regular vacuums.  If the wraparound
vacuums had been properly spread out, then there would at all times
have been workers available for regular vacuums as needed.  For
another thing, you can't possibly think that three or five workers
running simultaneously, each reading a different table, is just as
efficient as having one worker grind through them consecutively.
Parallelism is not free, ever, and particularly not here, where it has
the potential to yank the disk head around between five different
files, seeking like crazy, instead of a nice sequential I/O pattern on
each file in turn.  Josh wouldn't keep complaining about this if it
didn't suck.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
