Re: [PATCHES] Feature: POSIX Shared memory support, round 2

2007-02-09 Thread Tom Lane
Chris Marcellino [EMAIL PROTECTED] writes:
 Here is a new patch that uses the POSIX APIs. It encodes the
 canonical path (see 'man realpath') of the database's data directory
 into the shared memory segment name using a strong hash function to
 make it fit in the shared memory segment name under all cases,
 without risk of key collision.

I find this patch utterly unreadable, because of your cavalier disregard
for making the comments match the truth.  You have copied-and-pasted the
original SysV code and fixed some small fraction of the comments, and I
cannot tell which ones still reflect reality --- but I can tell that a
lot of them don't.

Also, I don't see where this implements any sort of detection of live
backends attached to an existing segment, so I don't think you have
responded to that objection.  Magnus' idea for Windows was to use a
segment set up to automatically go away as soon as the last attacher
died, but AFAICT that isn't how this works.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-08 Thread Magnus Hagander
On Wed, Feb 07, 2007 at 09:40:16AM -0500, Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  On Tue, Feb 06, 2007 at 11:08:51PM -0500, Tom Lane wrote:
  AFAIK the Windows port is simply wrong/insecure on this point --- it's
  one of the reasons you'll never see me recommending Windows as the OS
  for a production Postgres server.
 
  What exactly is the failure case? Might be able to figure out a way to
  do what we want on win32 even if it's not possible to do it exactly with
  the sysv semantics.
 
 kill -9 postmaster (only), then try to start new postmaster.  This
 should succeed if and only if there are no live orphaned backends.
 An implementation that hasn't got a direct test for the presence of
 backends can only get one of the two cases correct.
 
 On Windows (or really any EXEC_BACKEND platform) there's an additional
 problem, which is that even with an attach count you have a race
 condition: what if the postmaster launched a new backend just before
 dying, and that process has not yet re-attached to shared memory?
 I don't think this is a big problem in practice, because most people
 don't feel a need for an automated postmaster-restarting monitor, and
 so the time scale for human intervention is too long to hit the race
 condition.  But it's annoying from a theoretical perspective.
 
 It's probably possible to replace the attach-count test with some sort
 of file locking convention --- eg if all the backends hold some type of
 shared lock on postmaster.pid.  This seems unlikely to be much more
 portable than the attach-count solution as far as Unixen go, but if
 we're looking for a Windows-specific solution that's where I'd look.

Ok. From what I can tell, we create a shared mem segment named
PostgreSQL.5432001. If I kill postmaster with something active, and
start a new one, it gets named PostgreSQL.5432002.

If we just didn't add the serial number at the end, then it would be
impossible to create a shared memory segment for the same port again.
That protects the port and not the datadir. But what if we change the
name of the shared memory segment to be that of the data directory
instead of the port?

On win32 we do not have the problem of orphaned segments, because once
the last process that holds a segment dies, the segment always goes
away. An anonymous region cannot exist if there are no handles open to
it.

As for the EXEC_BACKEND case you mentioned, I don't think it's an issue
on win32. If the postmaster dies before the backend re-attaches, the
backend will fail to re-attach. I think?

Thoughts?

//Magnus



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-08 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 If we just didn't add the serial number at the end, then it would be
 impossible to create a shared memory segment for the same port again.
 That protects the port and not the datadir. But what if we change the
 name of the shared memory segment to be that of the data directory
 instead of the port?

That would help if there's only one possible spelling of the data
directory path ... otherwise not so much ...

regards, tom lane



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-08 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 If we just didn't add the serial number at the end, then it would be
 impossible to create a shared memory segment for the same port again.
 That protects the port and not the datadir. But what if we change the
 name of the shared memory segment to be that of the data directory
 instead of the port?
 
 That would help if there's only one possible spelling of the data
 directory path ... otherwise not so much ...

Well, we could run GetFullPathName() on it
(http://msdn2.microsoft.com/en-us/library/aa364963.aspx). I think that
should work - takes out the relative vs absolute path part at least.

It won't take care of somebody having a junction pointing at the data
directory and starting it against that one, but that's really someone
*trying* to break the system. You wouldn't do that by mistake...

Seems worthwhile to you? If so I can take a look at doing it when I get
some spare time.

//Magnus



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-08 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 If we just didn't add the serial number at the end, then it would be
 impossible to create a shared memory segment for the same port again.
 That protects the port and not the datadir. But what if we change the
 name of the shared memory segment to be that of the data directory
 instead of the port?
 
 That would help if there's only one possible spelling of the data
 directory path ... otherwise not so much ...

 Well, we could run GetFullPathName() on it
 (http://msdn2.microsoft.com/en-us/library/aa364963.aspx). I think that
 should work - takes out the relative vs absolute path part at least.

 It won't take care of somebody having a junction pointing at the data
 directory and starting it against that one, but that's really someone
 *trying* to break the system. You wouldn't do that by mistake...

 Seems worthwhile to you? If so I can take a look at doing it when I get
 some spare time.

Sounds reasonable --- certainly it'd be better than the current
situation.  I assume that we can have long enough shared memory segment
names that the data directory path length isn't unduly constrained?

regards, tom lane



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-08 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
 If we just didn't add the serial number at the end, then it would be
 impossible to create a shared memory segment for the same port again.
 That protects the port and not the datadir. But what if we change the
 name of the shared memory segment to be that of the data directory
 instead of the port?
 That would help if there's only one possible spelling of the data
 directory path ... otherwise not so much ...
 
 Well, we could run GetFullPathName() on it
 (http://msdn2.microsoft.com/en-us/library/aa364963.aspx). I think that
 should work - takes out the relative vs absolute path part at least.
 
 It won't take care of somebody having a junction pointing at the data
 directory and starting it against that one, but that's really someone
 *trying* to break the system. You wouldn't do that by mistake...
 
 Seems worthwhile to you? If so I can take a look at doing it when I get
 some spare time.
 
 Sounds reasonable --- certainly it'd be better than the current
 situation.  I assume that we can have long enough shared memory segment
 names that the data directory path length isn't unduly constrained?

From what I can see, we can have a shared memory segment name that is
just as long as any path name. Will run some tests on that to make
absolutely sure.

//Magnus



[PATCHES] Feature: POSIX Shared memory support, round 2

2007-02-08 Thread Chris Marcellino

As discussed earlier, using POSIX shared memory can solve a few issues.
On Mac OS X and other BSDs, the default System V shared memory
limits are often very low and require adjustment for acceptable
performance. In particular, when Postgres is included as part of
larger, end-user-friendly software products, these kernel settings
are often difficult to change, for two reasons:


1. The (arbitrarily) limited resources must be shared by all
programs that use System V shared memory. For example, on my Mac OS X
computer I have Postgres running as a standalone database, but also
as part of Apple Remote Desktop. Without manual adjustment, running
both simultaneously causes one of them to fail. Correcting this in
any robust way is challenging to automate for consumer-style (e.g.
Mac) installers.


2. On these BSDs, System V shared memory is wired down and
cannot be swapped out for any reason. If Postgres is running as
part of another software program or at a lower priority, other
programs cannot use the potentially limited memory. This puts the
user or developer in the tricky position of having to minimize
overall system impact while permitting enough shared memory for
Postgres to perform well.


Also, the SysV code is complex, since it must handle the quite likely
case that a shmid collides with one used by another program or
postmaster.


Here is a new patch that uses the POSIX APIs. It encodes the
canonical path (see 'man realpath') of the database's data directory
into the shared memory segment name using a strong hash function to
make it fit in the shared memory segment name under all cases,
without risk of key collision.
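The naming scheme described above can be sketched roughly as follows. The patch's actual hash function and name format are not shown in this thread, so everything here is a hypothetical illustration: FNV-1a is a simple stand-in (it is fast but NOT collision-resistant, unlike the "strong hash" mentioned), and the "/PG." prefix and the helper name `datadir_segment_name` are invented. Only `realpath()` and `snprintf()` are real library calls. Canonicalizing first means two spellings of the same directory yield the same name, and the fixed-width hex digest keeps the name at 20 characters, under the 31-character limit the patch's own comments attribute to Mac OS X.

```c
#define _XOPEN_SOURCE 700   /* declare realpath() under strict C modes */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>         /* strcmp()/strlen() in the usage checks */

/* 64-bit FNV-1a: a simple stand-in hash; it is NOT collision-resistant. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;    /* FNV-1a offset basis */

    for (; *s != '\0'; s++) {
        h ^= (unsigned char) *s;
        h *= 1099511628211ULL;                /* FNV-1a prime */
    }
    return h;
}

/*
 * Hypothetical helper: canonicalize datadir with realpath() and write
 * "/PG.<16 hex digits>" (20 characters) into dest.  Returns 0 on success,
 * -1 if the path cannot be resolved or dest is too small.
 */
static int datadir_segment_name(const char *datadir, char *dest, size_t destlen)
{
    char *canon = realpath(datadir, NULL);    /* POSIX.1-2008: mallocs result */
    int n;

    if (canon == NULL)
        return -1;
    n = snprintf(dest, destlen, "/PG.%016llx",
                 (unsigned long long) fnv1a64(canon));
    free(canon);
    return (n < 0 || (size_t) n >= destlen) ? -1 : 0;
}
```

Unlike GetFullPathName() on Windows, realpath() also resolves symbolic links, so a symlinked spelling of the data directory maps to the same name as the real path.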


I have taken a new, simpler approach to handling databases that have
been killed with kill -9 or have crashed. It is described in the
comments, but essentially, since any collision on the shared memory
key must come from orphaned backends or a crashed postmaster for the
current data directory, the colliding segment can be freed. A
two-character identifier field is prepended to the data directory
hash and is incremented after freeing an orphan, so that the new
postmaster need not wait for the backends to die. This approach works
as well on Windows as it does on Unixen. The comments also describe
some of the portability concerns (which have been handled). Please see
the code (PGSharedMemoryCreate and its helpers) for more information
on this point.
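One possible reading of the identifier scheme above, sketched under explicit assumptions: regular files in a scratch directory stand in for POSIX shared memory names (shm_open() accepts the same O_CREAT | O_EXCL flags and reports an existing name with EEXIST), and the helper names and the "two hex digits + hash" layout are hypothetical. On a collision the name is unlinked, on the patch's premise that any existing object must be an orphan from this data directory, and the next identifier is tried. Unlinking removes only the name; processes that already mapped the old object keep using it.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical layout: "<dir>/<2 hex digit id><hash>". */
static void object_path(const char *dir, unsigned id, const char *hash,
                        char *path, size_t pathlen)
{
    snprintf(path, pathlen, "%s/%02x%s", dir, id & 0xffu, hash);
}

/*
 * Hypothetical sketch: try identifiers start_id, start_id + 1, ... (mod 256).
 * An existing name is assumed to belong to an orphan, so it is unlinked and
 * the next identifier is tried.  Returns an open fd and stores the identifier
 * in *used_id, or returns -1 on any other error or if every identifier
 * collides.
 */
static int create_segment_object(const char *dir, const char *hash,
                                 unsigned start_id, unsigned *used_id)
{
    char path[4096];

    for (unsigned i = 0; i < 256; i++) {
        unsigned id = (start_id + i) & 0xffu;
        int fd;

        object_path(dir, id, hash, path, sizeof path);
        fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            *used_id = id;
            return fd;
        }
        if (errno != EEXIST)
            return -1;
        unlink(path);   /* free the orphan's name, then try the next id */
    }
    return -1;
}
```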


To build/test this, place the attached file in src/backend/port/ and
change the symbolic link pg_shmem.c to point to it. If this gets used
on BSDs, keep in mind that shared memory is no longer drawn from the
SysV pool, so the SysV settings (SHMMAX, etc.) can be reset to their
default values to recover the memory that was wired down for the SysV
pool.

I don't have access to any Linux machines to test this.

Thanks for your feedback,
Chris Marcellino




posix_shmem.c
Description: Binary data



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-07 Thread Magnus Hagander
On Tue, Feb 06, 2007 at 11:08:51PM -0500, Tom Lane wrote:
 Takayuki Tsunakawa [EMAIL PROTECTED] writes:
  From: Tom Lane [EMAIL PROTECTED]
  the POSIX API provides no way to detect whether anyone else is
  attached to the segment.  Not being able to tell that is a tremendous
  robustness hit for us.
 
  How is this done on Windows?  Is it possible to count the number of
  processes that attach a shared memory?
 
 AFAIK the Windows port is simply wrong/insecure on this point --- it's
 one of the reasons you'll never see me recommending Windows as the OS
 for a production Postgres server.

What exactly is the failure case? Might be able to figure out a way to
do what we want on win32 even if it's not possible to do it exactly with
the sysv semantics.

//Magnus



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-07 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 On Tue, Feb 06, 2007 at 11:08:51PM -0500, Tom Lane wrote:
 AFAIK the Windows port is simply wrong/insecure on this point --- it's
 one of the reasons you'll never see me recommending Windows as the OS
 for a production Postgres server.

 What exactly is the failure case? Might be able to figure out a way to
 do what we want on win32 even if it's not possible to do it exactly with
 the sysv semantics.

kill -9 postmaster (only), then try to start new postmaster.  This
should succeed if and only if there are no live orphaned backends.
An implementation that hasn't got a direct test for the presence of
backends can only get one of the two cases correct.
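The direct test described here is what the SysV attach count provides. A minimal sketch, assuming a Unix with SysV IPC available: `shmctl(IPC_STAT)` reports `shm_nattch`, and a restarting postmaster following this rule would refuse to reuse the old segment while that count is nonzero. The helper name is hypothetical; POSIX `shm_open()` objects expose no equivalent field, which is the objection raised in this thread.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: does any process still have this SysV segment
 * attached?  EINVAL from IPC_STAT means the segment no longer exists, so
 * nobody can be attached; any other failure (e.g. EACCES) is treated as
 * "in use" to err on the safe side.
 */
static bool segment_has_attachers(int shmid)
{
    struct shmid_ds buf;

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
        return errno != EINVAL;
    return buf.shm_nattch > 0;
}
```

With kill -9 on the postmaster only, any surviving backend keeps `shm_nattch` above zero, so the new postmaster refuses to start; once they have all exited, the count drops to zero and startup can proceed.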

On Windows (or really any EXEC_BACKEND platform) there's an additional
problem, which is that even with an attach count you have a race
condition: what if the postmaster launched a new backend just before
dying, and that process has not yet re-attached to shared memory?
I don't think this is a big problem in practice, because most people
don't feel a need for an automated postmaster-restarting monitor, and
so the time scale for human intervention is too long to hit the race
condition.  But it's annoying from a theoretical perspective.

It's probably possible to replace the attach-count test with some sort
of file locking convention --- eg if all the backends hold some type of
shared lock on postmaster.pid.  This seems unlikely to be much more
portable than the attach-count solution as far as Unixen go, but if
we're looking for a Windows-specific solution that's where I'd look.
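The file-locking convention suggested here could look roughly like this: each backend holds a shared `fcntl()` lock on postmaster.pid for its whole lifetime, and a new postmaster checks whether it can take an exclusive lock without blocking. This is a hypothetical sketch (the helper name is invented) relying on the POSIX rule that record locks are released when the holding process exits, including via kill -9. Caveats: `fcntl()` locks are not inherited across `fork()`, so each backend must take its own; closing *any* descriptor for the file drops all of that process's locks on it; and the NFS concerns raised later in this thread apply.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>     /* mkstemp(), used in the usage checks */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), used in the usage checks */
#include <unistd.h>

/*
 * Hypothetical helper: try to place a whole-file fcntl() record lock of the
 * given type (F_RDLCK = shared, F_WRLCK = exclusive) without blocking.
 * Returns 0 on success, -1 if another process holds a conflicting lock
 * (errno EACCES or EAGAIN) or on any other error.
 */
static int try_lock_whole_file(int fd, short type)
{
    struct flock fl = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,     /* 0 = through end of file, i.e. the whole file */
    };

    return fcntl(fd, F_SETLK, &fl);
}
```

A new postmaster would call `try_lock_whole_file(fd, F_WRLCK)` on postmaster.pid; failure means some backend still holds its shared lock.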

regards, tom lane



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-07 Thread Alvaro Herrera
Andrew Dunstan wrote:

 Maybe we should look some more at that. Use of file locking was one
 thought I had today after I saw Tom's earlier comments.
 
 Perl provides a moderately portable flock(), which we use in fact in
 buildfarm to stop it from running more than one at a time on a given repo
 copy.

But does it work over NFS?  On my system, the flock manpage claims it
doesn't; the lockf and fcntl manpages don't say, but the flock manpage
says fcntl does.  A lot of people run servers on NFS, even though we
recommend they don't.  And there are those strange hybrids like SANs,
NASes or what have you.

One serious problem is that if the lock doesn't work for some reason
like NFSness, it will fail silently, which is not acceptable.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-07 Thread Andrew Dunstan

Alvaro Herrera wrote:
 Andrew Dunstan wrote:
 
  Maybe we should look some more at that. Use of file locking was one
  thought I had today after I saw Tom's earlier comments.
 
  Perl provides a moderately portable flock(), which we use in fact in
  buildfarm to stop it from running more than one at a time on a given repo
  copy.
 
 But does it work over NFS?  On my system, the flock manpage claims it
 doesn't; the lockf and fcntl manpages don't say, but the flock manpage
 says fcntl does.  A lot of people run servers on NFS, even though we
 recommend they don't.  And there are those strange hybrids like SANs,
 NASes or what have you.
 
 One serious problem is that if the lock doesn't work for some reason
 like NFSness, it will fail silently, which is not acceptable.

Fair point. Perl in fact uses whatever it can from the underlying
system, preferring (I think) flock, then fcntl, then lockf. So its
flock is quite possibly not NFS-safe in many cases.


cheers

andrew



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-07 Thread Alvaro Herrera
Andrew Dunstan wrote:

 Perl provides a moderately portable flock(), which we use in fact in
 buildfarm to stop it from running more than one at a time on a given repo
 copy.
 
[...]

 Maybe we can borrow some code.

Probably not, because it's GPL/Artistic; but we could borrow some ideas
instead.

The relevant code is here
http://public.activestate.com/cgi-bin/perlbrowse/f/pp_sys.c

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Chris Marcellino
On Mac OS X and other BSDs, the default System V shared memory
limits are often very low and require adjustment for acceptable
performance. In particular, when Postgres is included as part of
larger, end-user-friendly software products, these kernel settings are
often difficult to change, for two reasons:


1. The (arbitrarily) limited resources must be shared by all programs
that use System V shared memory. For example, on my Mac OS X computer
I have Postgres running as a standalone database, but also as part of
Apple Remote Desktop. Without manual adjustment, running both
simultaneously causes one of them to fail. Correcting this in any
robust way is challenging to automate for consumer-style (e.g. Mac)
installers.


2. On these BSDs, System V shared memory is wired down and
cannot be swapped out for any reason. If Postgres is running as part
of another software program or at a lower priority, other programs
cannot use the potentially limited memory. This puts the user or
developer in the tricky position of having to minimize overall system
impact while permitting enough shared memory for Postgres to perform
well.


To this end, I have ported the sysv_shmem.c layer to use the POSIX
calls (which are in some ways more robust w.r.t. reducing collisions by
using strings as shared memory IDs instead of ints).


In principle, this should not have any significant effect on
performance. Running pgbench on a few different load types gives very
similar results (-3%/+1%), which are not really statistically
significant. Of course, on an untuned Mac OS X machine (where the
original SysV version is limited to the default 4MB) the POSIX
version outperforms it significantly (+250%). Using the POSIX calls
helps minimize the kernel side of the tuning, which is a big plus for
integrated uses of Postgres, but also for other amateur installations
(e.g. Fink).


If this is appropriate for the distribution, it could become a
'contrib' add-on, or it could be an autoconf custom build option until
it reaches greater maturity.


Any thoughts? Suggestions? I would also appreciate any advice on more
sophisticated ways to measure the performance impact of a change like
this.


Thanks,
Chris Marcellino
Apple Computer, Inc.





posix_shmem.c
Description: Binary data




src/backend/port/posix_shmem.c
===
/*-------------------------------------------------------------------------
 *
 * posix_shmem.c
 *	  Implement shared memory using POSIX facilities
 *
 * These routines represent a fairly thin layer on top of POSIX shared
 * memory functionality.
 *
 * Portions Copyright (c) 1996-2006, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include <signal.h>
#include <unistd.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#ifdef HAVE_KERNEL_OS_H
#include <kernel/OS.h>
#endif

#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/pg_shmem.h"


#define IPCProtection	(0600)	/* access/modify by user only */
#define IPCNameLength	32		/* must be long enough to contain all
								 * possible format strings; see
								 * GenerateIPCName */


unsigned long UsedShmemSegID = 0;
void   *UsedShmemSegAddr = NULL;

static void GenerateIPCName(int memKey, char *dest);
static void *InternalIpcMemoryCreate(int memKey, Size size);
static void IpcMemoryDetach(int status, Datum shmaddr);
static void IpcMemoryDelete(int status, Datum memKey);
static PGShmemHeader *PGSharedMemoryAttach(int key);


/*
 *  GenerateIPCName(key, dest)
 *
 * Generate a shared memory object key name using the argument key.
 * This uses the magic number and text to prevent collisions from other
 * apps.
 */
static void
GenerateIPCName(int memKey, char *dest)
{
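	/*
	 * This must be 31 characters or less for portability (e.g. Mac OS X).
	 * Printing both int values as unsigned int keeps the result at most
	 * 28 characters and matches the argument types to the format.
	 */
	snprintf(dest, IPCNameLength, "PostgreSQL.%x.%x",
			 (unsigned int) PGShmemMagic, (unsigned int) memKey);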
}

/*
 *  InternalIpcMemoryCreate(memKey, size)
 *
 * Attempt to create a new shared memory segment with the specified key.
 * Will fail (return NULL) if such a segment already exists.  If successful,
 * attach the segment to the current process and return its attached address.
 * On success, callbacks are registered with on_shmem_exit to detach and
 * delete the segment when on_shmem_exit is called.
 *
 * If we fail with a failure code other than collision-with-existing-segment,
 * print out an error and abort.  Other types of errors are not recoverable.
 */
static void *
InternalIpcMemoryCreate(int memKey, Size size)
{
	int			fd;
	void	   *memAddress;
	char		keyName[IPCNameLength];
	struct stat statbuf;


Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Tom Lane
Chris Marcellino [EMAIL PROTECTED] writes:
 To this end, I have ported the sysv_shmem.c layer to use the POSIX
 calls (which are in some ways more robust w.r.t. reducing collisions by
 using strings as shared memory IDs instead of ints).

This has been suggested before, and rejected before, on the grounds that
the POSIX API provides no way to detect whether anyone else is attached
to the segment.  Not being able to tell that is a tremendous robustness
hit for us.  We are not going to risk destroying someone's database
(or in the alternative, failing to restart after most crashes, which
it looks like your patch would do) in order to make installation
fractionally easier.

I read through your patch in the hopes that you had a solution for this,
but all I find is a copied-and-pasted comment

   /*
* We detect whether a shared memory segment is in use by seeing whether
* it (a) exists and (b) has any processes are attached to it.
*/

followed by code that does no such thing.

regards, tom lane



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Michael Paesold

Tom Lane wrote:

Chris Marcellino [EMAIL PROTECTED] writes:
To this end, I have ported the sysv_shmem.c layer to use the POSIX
calls (which are in some ways more robust w.r.t. reducing collisions by
using strings as shared memory IDs instead of ints).


This has been suggested before, and rejected before, on the grounds that
the POSIX API provides no way to detect whether anyone else is attached
to the segment.  Not being able to tell that is a tremendous robustness
hit for us.  We are not going to risk destroying someone's database
(or in the alternative, failing to restart after most crashes, which
it looks like your patch would do) in order to make installation
fractionally easier.

I read through your patch in the hopes that you had a solution for this,
but all I find is a copied-and-pasted comment


/*
 * We detect whether a shared memory segment is in use by seeing whether
 * it (a) exists and (b) has any processes are attached to it.
 */


followed by code that does no such thing.


Just an idea, but would it be possible to have a small SysV area as an
advisory lock (using the existing semantics) to protect the POSIX segment?


Best Regards
Michael Paesold




Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Chris Marcellino
Tom, that is definitely a valid point, and thanks for the feedback. I
had assumed that the 'more modern' string segment naming gave the POSIX
methods an edge in avoiding collisions with other apps.
As far as detecting a) whether anyone else is currently attached to
that segment and b) whether an earlier instance of the current
backend was still attached to a segment, I presumed that checking the
PIDs of the backend that owns the shared memory segment and checking
the data directory (both of which the SysV code already does) would
suffice?

What am I forgetting?

Michael, that is an interesting idea. That might be an avenue to  
explore if there isn't a simpler way.


Thanks,
Chris Marcellino


On Feb 6, 2007, at 7:51 AM, Michael Paesold wrote:


Tom Lane wrote:

Chris Marcellino [EMAIL PROTECTED] writes:
To this end, I have ported the sysv_shmem.c layer to use the POSIX
calls (which are in some ways more robust w.r.t. reducing collisions by
using strings as shared memory IDs instead of ints).
This has been suggested before, and rejected before, on the grounds that
the POSIX API provides no way to detect whether anyone else is attached
to the segment.  Not being able to tell that is a tremendous robustness
hit for us.  We are not going to risk destroying someone's database
(or in the alternative, failing to restart after most crashes, which
it looks like your patch would do) in order to make installation
fractionally easier.
I read through your patch in the hopes that you had a solution for this,
but all I find is a copied-and-pasted comment

/*
 * We detect whether a shared memory segment is in use by seeing whether
 * it (a) exists and (b) has any processes are attached to it.
 */

followed by code that does no such thing.


Just an idea, but would it be possible to have a small SysV area as an
advisory lock (using the existing semantics) to protect the POSIX segment?


Best Regards
Michael Paesold







Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Alvaro Herrera
Chris Marcellino wrote:
 Tom, that is definitely a valid point, and thanks for the feedback. I
 had assumed that the 'more modern' string segment naming gave the POSIX
 methods an edge in avoiding collisions with other apps.
 As far as detecting a) whether anyone else is currently attached to
 that segment and b) whether an earlier instance of the current
 backend was still attached to a segment, I presumed that checking the
 PIDs of the backend that owns the shared memory segment and checking
 the data directory (both of which the SysV code already does) would
 suffice?

Is there an API call to list all PIDs that are connected to a particular
segment?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Chris Marcellino
To my knowledge there is unfortunately no portable call that does
that.
I was actually referring to the check that the current SysV code does
on the PID that is stored in the shmem header. I presume that
kill(hdr->creatorPID, 0) returning zero would suffice to confirm the
existence of the other backend process.


Chris Marcellino

On Feb 6, 2007, at 10:32 AM, Alvaro Herrera wrote:


Chris Marcellino wrote:

Tom, that is definitely a valid point, and thanks for the feedback. I
had assumed that the 'more modern' string segment naming gave the POSIX
methods an edge in avoiding collisions with other apps.
As far as detecting a) whether anyone else is currently attached to
that segment and b) whether an earlier instance of the current
backend was still attached to a segment, I presumed that checking the
PIDs of the backend that owns the shared memory segment and checking
the data directory (both of which the SysV code already does) would
suffice?


Is there an API call to list all PIDs that are connected to a particular
segment?

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support





Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Tom Lane
Chris Marcellino [EMAIL PROTECTED] writes:
 I was actually referring to the check that the current SysV code does
 on the PID that is stored in the shmem header. I presume that
 kill(hdr->creatorPID, 0) returning zero would suffice to confirm the
 existence of the other backend process.

No, that's not relevant, because only the postmaster's PID will be there
--- that test is actually more or less redundant with the existing
postmaster.pid lockfile checks.  The thing that the SysV attachment
count is useful for is detecting whether there are orphaned backends
still alive in the database (and potentially changing it, hence the
danger).
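For reference, the creatorPID check under discussion amounts to probing a single PID with signal 0, roughly as below (the helper name is hypothetical). As explained here, it only says whether the process whose PID is recorded in the header (the postmaster) exists, and that PID may even have been recycled by an unrelated process; it says nothing about orphaned backends still attached to the segment.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), used in the usage checks */
#include <unistd.h>

/*
 * Hypothetical helper: kill() with signal 0 performs the existence and
 * permission checks without delivering a signal.  EPERM means the process
 * exists but belongs to another user; ESRCH means there is no such process.
 */
static bool pid_exists(pid_t pid)
{
    if (pid <= 0)           /* 0 and negative values address process groups */
        return false;
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}
```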

We've speculated on occasion about using file locking in some form as a
substitute mechanism for detecting this, but that seems to just bring
its own set of not-too-portable assumptions.

regards, tom lane



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Takayuki Tsunakawa
From: Chris Marcellino [EMAIL PROTECTED]
 To this end, I have ported the sysv_shmem.c layer to use the POSIX
 calls (which are in some ways more robust w.r.t. reducing collisions by
 using strings as shared memory IDs instead of ints).

I hope your work will be accepted.  Setting IPC parameters is tedious
for normal users, and they sometimes miss the relevant section of the
manual and hit the IPC resource shortage problem, particularly when
system developers run multiple instances on a single machine at the
same time.
Then, what about semaphores?  When I just run configure, PostgreSQL
seems to use SysV semaphores.  But a POSIX semaphore implementation is
provided in src/backend/port/posix_sema.c.  Why isn't it used by
default?  Does it have any problems?
# Windows is good in this respect, isn't it?

I'm sorry to ask a question without having read your patch carefully.
Does mmap(MAP_SHARED) need msync() to make a change by one process
visible to other processes?  I found the following in the mmap manual
page on Linux:


   MAP_SHARED Share this mapping with all other processes that map this
   object.  Storing to the region is equivalent to writing to the
   file.  The file may not actually be updated until msync(2) or
   munmap(2) are called.
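On the msync() question: the excerpt describes when the underlying file is written back, not when other processes see the data. Linux's mmap(2) page states that updates to a MAP_SHARED mapping are visible to other processes that map the same region. The following is a minimal sketch that checks this across fork(), assuming Linux or a BSD that provides MAP_ANONYMOUS. The function name is hypothetical:

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS on glibc */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: a child stores into a MAP_SHARED mapping and exits
 * without calling msync(); the parent then reads the value back.
 * Returns the value the parent observes, or -1 on error.
 */
static int value_seen_by_parent(int value)
{
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *shared = value;    /* plain store, no msync() */
        _exit(0);
    }

    /*
     * waitpid() only guarantees the read happens after the child's store;
     * the visibility comes from both processes mapping the same pages.
     */
    waitpid(pid, NULL, 0);
    int seen = *shared;
    munmap(shared, sizeof(int));
    return seen;
}
```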


BTW, does the PostgreSQL manual count the semaphores used by dummy
backends (e.g. bgwriter, autovacuum)?

From: Tom Lane [EMAIL PROTECTED]
 the POSIX API provides no way to detect whether anyone else is
 attached to the segment.  Not being able to tell that is a tremendous
 robustness hit for us.  We are not going to risk destroying someone's
 database (or in the alternative, failing to restart after most
 crashes, which it looks like your patch would do) in order to make
 installation fractionally easier.

How is this done on Windows?  Is it possible to count the number of
processes attached to a shared memory segment?






Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Chris Marcellino

Responses inline.

On Feb 6, 2007, at 7:05 PM, Takayuki Tsunakawa wrote:


 From: Chris Marcellino [EMAIL PROTECTED]
 To this end, I have ported the sysv_shmem.c layer to use the POSIX
 calls (which are in some ways more robust w.r.t. reducing collisions,
 by using strings as shared memory IDs instead of ints).

 I hope your work will be accepted.  Setting IPC parameters is tedious
 for normal users, and they sometimes miss the relevant section of the
 manual and hit IPC resource shortages, particularly when system
 developers run multiple instances on a single machine at the same
 time.


As Tom pointed out, the code I posted yesterday is not robust enough  
for general consumption. I'm working on a better solution, which will  
likely involve using a very small SysV shmem segment as a mutex of  
sorts (as Michael Paesold suggested).



 Then, how about semaphores?  When I just run configure, PostgreSQL
 seems to use SysV semaphores.  But a POSIX semaphore implementation is
 provided in src/backend/port/posix_sema.c.  Why isn't it used by
 default?  Does it have any problem?



In this case, semaphore usage is unrelated to shared memory
shortages. Also, on many platforms the posix_sema.c code is used.
Either way, essentially no one is running out of shared memory due
to semaphores.



 # Windows is good in this respect, isn't it?


From what I can tell, if you look at the Windows SysV shmem
emulation code in src/backend/port/win32/shmem.c, you will see that
the 'other process detection' code in the shmctl() function is not
implemented, since there is no corresponding Win32 API to implement
it. There is only so much you can do in that case.


As far as the other platforms go, any replacement for the SysV shmem  
code should be as reliable as what preceded it.








Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Tom Lane
Takayuki Tsunakawa [EMAIL PROTECTED] writes:
 From: Tom Lane [EMAIL PROTECTED]
 the POSIX API provides no way to detect whether anyone else is
 attached to the segment.  Not being able to tell that is a tremendous
 robustness hit for us.

 How is this done on Windows?  Is it possible to count the number of
 processes that attach a shared memory?

AFAIK the Windows port is simply wrong/insecure on this point --- it's
one of the reasons you'll never see me recommending Windows as the OS
for a production Postgres server.

regards, tom lane



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Tom Lane
Chris Marcellino [EMAIL PROTECTED] writes:
 As Tom pointed out, the code I posted yesterday is not robust enough  
 for general consumption. I'm working on a better solution, which will  
 likely involve using a very small SysV shmem segment as a mutex of  
 sorts (as Michael Paesold suggested).

One problem with Michael's idea is that it gives up one of the better
arguments for having a POSIX option, namely to allow us to run on
platforms where SysV shmem support is not there at all.

I'm not sure whether the idea can be implemented without creating new
failure modes; that will have to wait on seeing a patch.  But the
strength of the coupling between the SysV and POSIX segments is
certainly going to be a red-flag item to look at.

 Then, how about semaphores?  When I just run configure, PostgreSQL
 seems to use SysV semaphores.  But a POSIX semaphore implementation is
 provided in src/backend/port/posix_sema.c.  Why isn't it used by
 default?  Does it have any problem?

 In this case, semaphore usage is unrelated to shared memory
 shortages. Also, on many platforms the posix_sema.c code is used.
 Either way, essentially no one is running out of shared memory due
 to semaphores.

AFAIK the only platform where the POSIX sema code is really used is
Darwin (OS X), and it is not something I'd use there if I had a choice.
The problem with it is that *every* semaphore corresponds to an open
file handle in the postmaster that has to be inherited by *every* forked
child.  So N backend slots cost you O(N^2) in kernel filehandles and
process fork overhead, plus if N is big you're taking a serious hit in
the number of disk files any one backend can have open.  This problem
may be specific to Darwin's implementation of the POSIX spec, but it's
real enough there.  If you trawl the archives you'll probably notice a
lack of people running big Postgres installations on Darwin, and this is
why.
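Tom's O(N^2) estimate follows from simple arithmetic. The following sketch models it, assuming, as described above, one semaphore per backend slot, with each semaphore held as a descriptor that every forked child inherits. The function names and the 256-descriptor limit used below are hypothetical:

```c
/*
 * Hypothetical model: each of n semaphores is one open descriptor in the
 * postmaster, and each of n forked backends inherits all n of them.
 */
static unsigned long total_semaphore_handles(unsigned long n)
{
    unsigned long postmaster = n;      /* opened once by the postmaster */
    unsigned long children   = n * n;  /* n backends x n inherited each */
    return postmaster + children;
}

/*
 * Descriptors left for ordinary files in a single backend, given a
 * per-process descriptor limit; 0 if the semaphores alone exhaust it.
 */
static unsigned long files_left_per_backend(unsigned long n,
                                            unsigned long fd_limit)
{
    return n >= fd_limit ? 0 : fd_limit - n;
}
```

With 100 backend slots, total_semaphore_handles(100) is 100 + 100 * 100 = 10100 descriptors system-wide. Under a 256-descriptor per-process limit, each backend would have only 156 descriptors left for data files.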

regards, tom lane



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Takayuki Tsunakawa
ep




Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Takayuki Tsunakawa
 Then, how about semaphores?  When I just run configure, PostgreSQL
 seems to use SysV semaphores.  But a POSIX semaphore implementation is
 provided in src/backend/port/posix_sema.c.  Why isn't it used by
 default?  Does it have any problem?


 Either way, essentially no one is running out of shared memory due
 to semaphores.
 In this case, semaphore usage is unrelated to shared memory
 shortages.

Yes, of course, shared memory is not related to semaphores.

 Also, on many platforms the posix_sema.c code is used.

Really?  When I run 'configure' without any options on Red Hat
Enterprise Linux 4.0 (kernel 2.6.x), PostgreSQL uses SysV semaphores.
I confirmed that by checking the output of 'ipcs -u'.  On which
platforms does PostgreSQL use POSIX semaphores by default?







Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Chris Marcellino
Yes, as Tom pointed out. Sorry, I misread the autoconf file; I've
gotten quite used to Darwin == BSD.
I've added a note to my to-do list to look into POSIX semaphore
performance on the Darwin side.


--Chris

On Feb 6, 2007, at 8:32 PM, Takayuki Tsunakawa wrote:


 Then, how about semaphores?  When I just run configure, PostgreSQL
 seems to use SysV semaphores.  But a POSIX semaphore implementation is
 provided in src/backend/port/posix_sema.c.  Why isn't it used by
 default?  Does it have any problem?

 Either way, essentially no one is running out of shared memory due
 to semaphores.
 In this case, semaphore usage is unrelated to shared memory
 shortages.

 Yes, of course, shared memory is not related to semaphores.

 Also, on many platforms the posix_sema.c code is used.

 Really?  When I run 'configure' without any options on Red Hat
 Enterprise Linux 4.0 (kernel 2.6.x), PostgreSQL uses SysV semaphores.
 I confirmed that by checking the output of 'ipcs -u'.  On which
 platforms does PostgreSQL use POSIX semaphores by default?








Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Chris Marcellino
Attached is a beta of the POSIX shared memory layer. About 75% of it is
the original sysv_shmem.c code. I'm looking for ways to refactor it down
a bit while changing as little of the tried-and-tested code as
possible. I thought I'd put it out there for comments.


Unfortunately, it is of course more complicated than the original, as
it uses both sets of APIs.  Also, I haven't tested the crash recovery
thoroughly.  The POSIX code could be used Windows-style (i.e. with no
crash recovery) by properly ifdef'ing out the SysV calls, if someone
had a POSIX-only platform they needed to run Postgres on.
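For comparison, the core of a POSIX-only layer is small, and segments are named by strings rather than integer keys. The following is a minimal sketch, assuming a system with shm_open() (older glibc versions need -lrt). The function name and the segment name below are hypothetical and are not taken from the attached posix_shmem.c:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a POSIX shared memory segment under a
 * string name, size it, and map it.  Returns the mapping, or NULL on
 * failure.  O_EXCL makes creation fail if the name is already taken.
 */
static void *create_posix_segment(const char *name, size_t size)
{
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, (off_t) size) < 0) {
        close(fd);
        shm_unlink(name);
        return NULL;
    }

    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        shm_unlink(name);
        return NULL;
    }
    return addr;
}
```

For example, create_posix_segment("/pg_example", 4096) returns NULL if that name is already in use. Nothing in this API reports how many processes currently have the segment mapped, which is the gap Tom objected to.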


Using both APIs is certainly not ideal. You mentioned:

 We've speculated on occasion about using file locking in some form as a
 substitute mechanism for detecting this, but that seems to just bring
 its own set of not-too-portable assumptions.


What sort of file locking did you have in mind? Do you think this  
might be worth me trying?


Thanks for your help,
Chris Marcellino




Attachment: posix_shmem.c



Re: [PATCHES] Feature: POSIX Shared memory support

2007-02-06 Thread Andrew Dunstan
Tom Lane wrote:

 We've speculated on occasion about using file locking in some form as a
 substitute mechanism for detecting this, but that seems to just bring
 its own set of not-too-portable assumptions.



Maybe we should look some more at that. Use of file locking was one
thought I had today after I saw Tom's earlier comments.

Perl provides a moderately portable flock(), which we in fact use in the
buildfarm to stop more than one run at a time on a given repo copy.

The Perl description starts thus:

   Calls flock(2), or an emulation of it, on FILEHANDLE.  Returns
   true for success, false on failure.  Produces a fatal error if
   used on a machine that doesn't implement flock(2), fcntl(2)
   locking, or lockf(3).  flock is Perl's portable file locking
   interface, although it locks only entire files, not records.

Note that this means it works on every platform that has ever reported
to the buildfarm.

Maybe we can borrow some code.
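Here is one way file locking could stand in for the attachment count. The postmaster holds a shared flock() on a file in the data directory. Every forked backend inherits the descriptor, and with it the lock, which flock() releases only when the last copy is closed. This is a sketch under stated assumptions: it assumes native BSD-style flock(). Where flock() is emulated with fcntl() locks (which are not inherited across fork()), or on NFS, the scheme may not hold, which is the kind of portability assumption Tom mentioned. The function names and the lock-file path are hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical: the postmaster takes a shared lock and keeps the
 * descriptor open; children inherit it across fork(), so the lock
 * survives until every process holding a copy has exited.
 * Returns the descriptor, or -1 on error.
 */
static int hold_datadir_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_SH) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical: a new postmaster probes with a non-blocking exclusive
 * lock on a fresh descriptor.  Returns 1 if some process still holds
 * the lock, 0 if not, -1 on other errors.
 */
static int datadir_in_use(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    int in_use = 0;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0)
        in_use = (errno == EWOULDBLOCK) ? 1 : -1;

    close(fd);  /* closing releases the probe lock if we obtained it */
    return in_use;
}
```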

cheers

andrew





