Re: [HACKERS] could not reattach to shared memory captured in buildfarm

2009-05-05 Thread Magnus Hagander
Alvaro Herrera wrote:
 Magnus Hagander wrote:
 
 I didn't mean race condition between backends. I meant against a
 potential other thread started by a loaded DLL for initialization.
 (Again, things like antivirus are known to do this, and we do see these
 issues more often if AV is present for example)
 
 I don't understand this.  How can memory allocated by a completely separate
 process affect what happens to a backend?  I mean, if an antivirus is running,
 surely it does not run on the backend's process?  Or does it?

Anti[something] software regularly injects code into other processes,
yes, either by creating a thread in the process using
CreateRemoteThread() or by using techniques similar to LD_PRELOAD.

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not reattach to shared memory captured in buildfarm

2009-05-05 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 One proposed fix is to allocate a fairly large block of memory in the
 postmaster just before we get the shared memory, and then free it right
 away. The effect should be to push down the shared memory segment
 further in the address space.

I have no enthusiasm for doing something like this when we have so
little knowledge of what's actually happening.  We have *no* idea
whether the above could help, or what size of allocation to request.
It's not very hard to imagine that the wrong size choice could make
things worse rather than better.

It seems to me that what we ought to do now is make a serious effort
to gather more data.  I came across a suggestion that one could use
VirtualQuery() to generate a map of the process address space
under Windows.  I suggest that we add some code that is executed
if the reattach attempt fails and dumps the process address space
details to the postmaster log.  Dumping the postmaster's address
space at the time it successfully creates the shmem segment might
be useful for comparison, too.

(A quick look at the VirtualQuery spec indicates that you can't tell
very much beyond free/allocated status, though.  Maybe there's some
other call that would tell more?  It'd be really good if we could get
the names of DLLs occupying memory ranges, for example.)
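A minimal sketch of the address-space dump suggested above, written in Python with ctypes purely for brevity (a server-side version would be the equivalent C loop in the reattach-failure path). It walks the calling process's address space with VirtualQuery() and logs one line per region. The helper names format_region and dump_address_space are hypothetical. Besides free/reserved/committed, the Type field of MEMORY_BASIC_INFORMATION distinguishes image (EXE/DLL), mapped-file and private regions, which is a little more than free/allocated status but still does not give DLL names.

```python
import ctypes
import sys

# Constants from the Windows SDK header winnt.h.
MEM_COMMIT, MEM_RESERVE, MEM_FREE = 0x1000, 0x2000, 0x10000
MEM_PRIVATE, MEM_MAPPED, MEM_IMAGE = 0x20000, 0x40000, 0x1000000

STATE_NAMES = {MEM_COMMIT: "commit", MEM_RESERVE: "reserve", MEM_FREE: "free"}
TYPE_NAMES = {MEM_PRIVATE: "private", MEM_MAPPED: "mapped", MEM_IMAGE: "image"}


class MEMORY_BASIC_INFORMATION(ctypes.Structure):
    # Field order matches the C struct; ctypes adds the same alignment padding.
    _fields_ = [
        ("BaseAddress", ctypes.c_void_p),
        ("AllocationBase", ctypes.c_void_p),
        ("AllocationProtect", ctypes.c_uint32),
        ("RegionSize", ctypes.c_size_t),
        ("State", ctypes.c_uint32),
        ("Protect", ctypes.c_uint32),
        ("Type", ctypes.c_uint32),
    ]


def format_region(base, size, state, mem_type):
    """Format one region as 'start-end state type' for the log (hypothetical helper)."""
    kind = "-" if state == MEM_FREE else TYPE_NAMES.get(mem_type, hex(mem_type))
    return "%#018x-%#018x %-7s %s" % (
        base, base + size, STATE_NAMES.get(state, hex(state)), kind)


def dump_address_space(log=print):
    """Log every region of the current process's address space (Windows only)."""
    if sys.platform != "win32":
        raise OSError("VirtualQuery() is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.VirtualQuery.argtypes = [
        ctypes.c_void_p, ctypes.POINTER(MEMORY_BASIC_INFORMATION), ctypes.c_size_t]
    kernel32.VirtualQuery.restype = ctypes.c_size_t
    mbi = MEMORY_BASIC_INFORMATION()
    limit = 1 << (8 * ctypes.sizeof(ctypes.c_void_p))  # stop before the pointer wraps
    addr = 0
    # VirtualQuery() returns 0 once addr is beyond the user-mode address range.
    while addr < limit and kernel32.VirtualQuery(addr, ctypes.byref(mbi), ctypes.sizeof(mbi)):
        base = mbi.BaseAddress or 0  # ctypes reads a NULL c_void_p as None
        log(format_region(base, mbi.RegionSize, mbi.State, mbi.Type))
        addr = base + mbi.RegionSize
```

Running the same walk once in the postmaster right after the segment is created, and again in a backend whose reattach failed, would give the two maps to compare. For the DLL-name question, psapi's GetMappedFileName() may be able to report the file backing an image or mapped region; it is not used or tested here.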

regards, tom lane



Re: [HACKERS] could not reattach to shared memory captured in buildfarm

2009-05-04 Thread Magnus Hagander
Tom Lane wrote:
 vaquita has an interesting report today:
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=vaquita&dt=2009-05-01%2020:00:06
 
 Partway through the contrib tests, for absolutely no visible reason
 whatsoever, connections start to fail with
 FATAL:  could not reattach to shared memory (key=364, addr=0292): 487

Note that 487 is ERROR_INVALID_ADDRESS ("Attempt to access invalid
address."), and should not have anything to do with the issues Andrew
mentioned (which were about the already-exists error).

Somebody else mentioned, and IIRC I talked to Dave about this before,
that this could be because the address is no longer available. The
reason for this could be some kind of race condition in the backends
starting - the address is available when the postmaster starts and thus
it's used, but when a regular backend starts, the memory is used for
something else.

One proposed fix is to allocate a fairly large block of memory in the
postmaster just before we get the shared memory, and then free it right
away. The effect should be to push down the shared memory segment
further in the address space.
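To make the proposal concrete, here is a toy model, not a description of real Windows behaviour: it assumes every allocation lands at the lowest free address that fits (first fit), which VirtualAlloc() and MapViewOfFile() are not guaranteed to do. The postmaster optionally allocates a padding block, maps shared memory, then frees the pad; a child has the same baseline layout plus one extra allocation made by injected code before it tries to reattach at the postmaster's address. The functions first_fit and reattach_ok and all sizes are hypothetical.

```python
def first_fit(used, size, limit):
    """Lowest start address where [start, start + size) avoids every used range."""
    addr = 0
    for start, end in sorted(used):
        if start - addr >= size:
            return addr
        addr = max(addr, end)
    if limit - addr >= size:
        return addr
    raise MemoryError("no free range of %d units" % size)


def reattach_ok(shmem_size, dll_size, pad_size=0, baseline=((0, 64),), limit=1 << 20):
    """Toy model: can a child map shared memory at the postmaster's address?"""
    # Postmaster: baseline allocations, optional pad, shared memory, then free the pad.
    used = list(baseline)
    if pad_size:
        pad = first_fit(used, pad_size, limit)
        used.append((pad, pad + pad_size))
    shm = first_fit(used, shmem_size, limit)
    if pad_size:
        used.remove((pad, pad + pad_size))
    # Child: same baseline, but injected code allocates dll_size before reattaching.
    child = list(baseline)
    dll = first_fit(child, dll_size, limit)
    child.append((dll, dll + dll_size))
    # Reattach succeeds only if [shm, shm + shmem_size) is still free in the child.
    return all(end <= shm or start >= shm + shmem_size for start, end in child)


print(reattach_ok(256, 16))               # False: injected block lands on the segment
print(reattach_ok(256, 16, pad_size=32))  # True: it fits in the hole the pad left
print(reattach_ok(256, 48, pad_size=32))  # False: pad smaller than the injected block
```

The last call is the risk of guessing the size: in this model the pad only helps when it is at least as large as whatever the injected code allocates first, and nothing in the failure report tells us that size.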

Comments?

//Magnus



Re: [HACKERS] could not reattach to shared memory captured in buildfarm

2009-05-04 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 Somebody else mentioned, and IIRC I talked to Dave about this before,
 that this could be because the address is no longer available. The
 reason for this could be some kind of race condition in the backends
 starting - the address is available when the postmaster starts and thus
 it's used, but when a regular backend starts, the memory is used for
 something else.

How is it no longer available, when the new backend is a brand new
process?  The race condition bit seems even sillier --- if there
are multiple backends starting, they're each an independent process.

regards, tom lane



Re: [HACKERS] could not reattach to shared memory captured in buildfarm

2009-05-04 Thread Magnus Hagander
Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
 Somebody else mentioned, and IIRC I talked to Dave about this before,
 that this could be because the address is no longer available. The
 reason for this could be some kind of race condition in the backends
 starting - the address is available when the postmaster starts and thus
 it's used, but when a regular backend starts, the memory is used for
 something else.
 
 How is it no longer available, when the new backend is a brand new
 process?  The race condition bit seems even sillier --- if there
 are multiple backends starting, they're each an independent process.

Because some other DLL that was loaded on process startup allocated
memory differently: in a different order, with a different size, or
something like that.

I didn't mean race condition between backends. I meant against a
potential other thread started by a loaded DLL for initialization.
(Again, things like antivirus are known to do this, and we do see these
issues more often if AV is present for example)

//Magnus




Re: [HACKERS] could not reattach to shared memory captured in buildfarm

2009-05-04 Thread Alvaro Herrera
Magnus Hagander wrote:

 I didn't mean race condition between backends. I meant against a
 potential other thread started by a loaded DLL for initialization.
 (Again, things like antivirus are known to do this, and we do see these
 issues more often if AV is present for example)

I don't understand this.  How can memory allocated by a completely separate
process affect what happens to a backend?  I mean, if an antivirus is running,
surely it does not run on the backend's process?  Or does it?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[HACKERS] could not reattach to shared memory captured in buildfarm

2009-05-02 Thread Tom Lane
vaquita has an interesting report today:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=vaquita&dt=2009-05-01%2020:00:06

Partway through the contrib tests, for absolutely no visible reason
whatsoever, connections start to fail with
FATAL:  could not reattach to shared memory (key=364, addr=0292): 487

We've certainly heard more than a couple of field reports of this from
Windows users, but I don't think we've ever seen it in the buildfarm
before.  (I don't see any similar instances in vaquita's history, anyway.)

I assume vaquita's configuration hasn't changed recently (Dave?)
so this seems to put the lie to the theory we've taken refuge in
that it's caused by bad antivirus software.  I don't see that it
gets us any closer to a solution though.

regards, tom lane



Re: [HACKERS] could not reattach to shared memory captured in buildfarm

2009-05-02 Thread Dave Page
On Sat, May 2, 2009 at 4:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I assume vaquita's configuration hasn't changed recently (Dave?)
 so this seems to put the lie to the theory we've taken refuge in
 that it's caused by bad antivirus software.  I don't see that it
 gets us any closer to a solution though.

Well, there's a bit of a story there. Vaquita and Baiji are both the
same Vista machine running on VMware Server. About a month back, for
what seemed like no reason, the guest VM started running at much
higher speed than it should: animated cursors started running at
double speed, double-clicking became impossible and the clock started
gaining significant amounts of time, to the extent that buildfarm
runs were rejected by the server because the finish time was in the
future.

I believe I finally fixed this on Friday - from what I can tell, it
looks like the Java self-update applet was causing the clock rate on
the host to be raised to 1000/1024Hz (this can be done using the
multimedia API). This in turn was apparently upsetting VMware. Anyway,
long story short, I removed the JVM from the host and everything appears
to have returned to normal. Nothing has changed in the config of the
VM itself, though a couple of minor tweaks were made to the VMware
configuration - but they were clock-related.

-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com
