Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2011-09-09 Thread Tom Lane
Daniel Farina dan...@heroku.com writes:
> A huge thanks to Conrad Irwin of Rapportive for furnishing virtually all the
> details of this bug report.

This isn't really enough information to reproduce the problem ...

> The occurrence rate is somewhere in the one per tens-of-millions of
> queries.

... and that statement is going to discourage anyone from even trying,
since with such a low occurrence rate it's going to be impossible to be
sure whether the setup to reproduce the problem is correct.  So if you'd
like this to be fixed, you're either going to need to show us exactly
how to reproduce it, or investigate it yourself.

The way that I'd personally proceed to investigate it would probably be
to change the "invalid memory alloc request size" errors (in
src/backend/utils/mmgr/mcxt.c; there are about four occurrences) from
ERROR to PANIC so that they'll provoke a core dump, and then use gdb
to get a stack trace, which would provide at least a little more
information about what happened.  However, if you are only able to
reproduce it in a production server, you might not like that approach.
Perhaps you can set up an extra standby that's only there for testing,
so you don't mind if it crashes?

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2011-09-09 Thread Heikki Linnakangas

On 09.09.2011 18:02, Tom Lane wrote:

> The way that I'd personally proceed to investigate it would probably be
> to change the "invalid memory alloc request size" errors (in
> src/backend/utils/mmgr/mcxt.c; there are about four occurrences) from
> ERROR to PANIC so that they'll provoke a core dump, and then use gdb
> to get a stack trace, which would provide at least a little more
> information about what happened.  However, if you are only able to
> reproduce it in a production server, you might not like that approach.
> Perhaps you can set up an extra standby that's only there for testing,
> so you don't mind if it crashes?


If that's not possible or doesn't reproduce the issue, there are also
functions in glibc to produce a backtrace without aborting the program:
https://www.gnu.org/s/libc/manual/html_node/Backtraces.html.


I think you could also fork() + abort() to generate a core dump, not 
just a backtrace.
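
A minimal sketch of both ideas, assuming Linux with glibc. The function
names dump_backtrace() and dump_core_and_continue() are hypothetical and
are not part of PostgreSQL; backtrace(), backtrace_symbols_fd(), fork(),
abort() and waitpid() are the real library calls. Whether a core file is
actually written still depends on the core size limit (ulimit -c) and the
kernel's core pattern.

```c
#include <execinfo.h>   /* backtrace, backtrace_symbols_fd (glibc) */
#include <signal.h>     /* SIGABRT */
#include <stdlib.h>     /* abort */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WIFSIGNALED, WTERMSIG */
#include <unistd.h>     /* fork, STDERR_FILENO */

#define MAX_FRAMES 64

/*
 * Write the current call stack to stderr without stopping the process.
 * Returns the number of frames captured.  Function names only show up
 * if the binary was linked with -rdynamic; otherwise raw addresses are
 * printed.
 */
int
dump_backtrace(void)
{
    void   *frames[MAX_FRAMES];
    int     nframes = backtrace(frames, MAX_FRAMES);

    backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
    return nframes;
}

/*
 * Fork a child that aborts immediately.  The child is a copy of this
 * process, so a core file written for it reflects the parent's memory
 * at the time of the call, while the parent keeps running.
 * Returns the child's wait status, or -1 if fork or waitpid failed.
 */
int
dump_core_and_continue(void)
{
    int     status = 0;
    pid_t   pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        abort();        /* child: terminated by SIGABRT */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Either function could be called just before the error is raised, so the
server carries on as usual after the diagnostic has been captured.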


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT

2011-09-09 Thread Simon Riggs
On Thu, Sep 8, 2011 at 11:33 PM, Daniel Farina dan...@heroku.com wrote:

> ERROR: invalid memory alloc request size 18446744073709551613
>
> At least once, a hot standby was promoted to a primary and the errors seem
> to discontinue, but then reappear on a newly-provisioned standby.

So the query that fails uses a btree index on a hot standby. I don't
fully accept it as an HS bug, but let's assume that it is and analyse
what could cause it.

The MO is that certain user queries fail, and only in HS. Those
queries may or may not be related to the way we use indexes.

There is a single, small difference between how a btree index
operates in HS and in normal operation, which relates to whether we
kill tuples in the index. That's simple code and there are no obvious
bugs there, nor even anything that specifically allocates memory. So
the only bug that springs to mind is something related to how we
navigate hot chains with/without killed tuples, i.e. the bug is not
actually HS related, but is only observed under conditions typical in
HS.

HS touches almost nothing else in user space, apart from snapshots. So
there could be a bug there also, maybe in CopySnapshot().
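
A side note on the number in the error message: 18446744073709551613 is
2^64 - 3, which is what a 64-bit unsigned size holds after a signed -3 is
converted to it. The sketch below only checks that arithmetic. The helper
names as_signed() and alloc_size_is_valid() are hypothetical, and
MAX_ALLOC_SIZE is meant to mirror PostgreSQL's MaxAllocSize limit of
1 gigabyte - 1. It characterises the value and says nothing about where in
the code it was computed.

```c
#include <stdint.h>

/* Assumed to match PostgreSQL's MaxAllocSize: 1 gigabyte - 1 */
#define MAX_ALLOC_SIZE ((uint64_t) 0x3fffffff)

/* Nonzero if a request of this size would pass the allocator's check. */
int
alloc_size_is_valid(uint64_t size)
{
    return size <= MAX_ALLOC_SIZE;
}

/*
 * Reinterpret a 64-bit unsigned size as the signed value with the same
 * two's-complement bit pattern, without relying on an
 * implementation-defined cast.
 */
int64_t
as_signed(uint64_t size)
{
    if (size <= (uint64_t) INT64_MAX)
        return (int64_t) size;
    /* size is in the upper half: value is -(2^64 - size) */
    return -(int64_t) (UINT64_MAX - size) - 1;
}
```

With these, as_signed(18446744073709551613ULL) evaluates to -3 and
alloc_size_is_valid() rejects the same value, which would be consistent
with a small negative length reaching the allocator.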

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[BUGS] BUG #6201: Windows User Log Off Causes Backend Exception 0xC0000142

2011-09-09 Thread Jerome Schulteis

The following bug has been logged online:

Bug reference:  6201
Logged by:  Jerome Schulteis
Email address:  jerome.schult...@edstrom.com
PostgreSQL version: 9.0.4
Operating system:   Windows XP Pro SP3
Description:        Windows User Log Off Causes Backend Exception 0xC0000142
Details: 

It does not happen on every log off, but if the Windows console user logs off
of the server at just the wrong time while a backend is starting up, the
backend terminates with exception 0xC0000142 (STATUS_DLL_INIT_FAILED), and the
PostgreSQL Windows service stops (shared memory block is still in use):

2011-09-09 10:34:01 CDT LOG:  server process (PID 3260) was terminated by exception 0xC0000142
2011-09-09 10:34:01 CDT HINT:  See C include file ntstatus.h for a description of the hexadecimal value.
2011-09-09 10:34:01 CDT LOG:  terminating any other active server processes
[...]
2011-09-09 10:34:02 CDT WARNING:  terminating connection because of crash of another server process
2011-09-09 10:34:02 CDT DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-09-09 10:34:02 CDT HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2011-09-09 10:34:02 CDT LOG:  all server processes terminated; reinitializing
2011-09-09 10:34:12 CDT FATAL:  pre-existing shared memory block is still in use
2011-09-09 10:34:12 CDT HINT:  Check if there are any old server processes still running, and terminate them.

Originally encountered with our web app running a stress test on PostgreSQL
8.4.2; I reproduced it on a default one-click install of 9.0.4.  I run the
following Java on a separate machine to get a new connection every 100 ms:

class ConnectionHog {
    // Arguments: JDBC URL, user, password, sleep time in milliseconds
    static public void main(String[] args) {
        try {
            while (true) {
                // Each connection is opened and never closed
                java.sql.DriverManager.getConnection(args[0], args[1], args[2]);
                Thread.sleep(Long.parseLong(args[3]));
            }
        }
        catch (Throwable t) {
            System.err.println(t);
        }
    }
}

I then log on and off the server from a separate user account (that is, not
the postgres account); the error shows up after at most 3 log offs. As long
as no log offs happen on the server, no problem.



Re: [BUGS] BUG #6201: Windows User Log Off Causes Backend Exception 0xC0000142

2011-09-09 Thread Craig Ringer

On 10/09/2011 4:59 AM, Jerome Schulteis wrote:

> The following bug has been logged online:
>
> Bug reference:      6201
> Logged by:          Jerome Schulteis
> Email address:      jerome.schult...@edstrom.com
> PostgreSQL version: 9.0.4
> Operating system:   Windows XP Pro SP3
> Description:        Windows User Log Off Causes Backend Exception 0xC0000142
> Details:
>
> It does not happen on every log off, but if the Windows console user logs off
> of the server at just the wrong time while a backend is starting up, the
> backend terminates with exception 0xC0000142 (STATUS_DLL_INIT_FAILED), and the
> PostgreSQL Windows service stops (shared memory block is still in use):

I wonder which DLL failed? It seems that PostgreSQL doesn't print the
message the operating system generates for these errors. For example, it
should be printing:


{DLL Initialization Failed} Initialization of the dynamic link library 
%hs failed. The process is terminating abnormally.


... where %hs is the DLL name or path. I should have a play with that 
once I finish moving house...
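
In the meantime, the hexadecimal value itself can be split into its
fields by hand. This is a minimal sketch assuming the NTSTATUS layout
Microsoft documents (2 severity bits, a customer bit, a reserved bit, a
12-bit facility and a 16-bit code); the struct and function names are
hypothetical. It does not recover the DLL name, which only the operating
system's message would supply.

```c
#include <stdint.h>

typedef struct
{
    unsigned severity;  /* bits 31-30: 0 success, 1 informational,
                         * 2 warning, 3 error */
    unsigned customer;  /* bit 29: set for customer-defined codes */
    unsigned facility;  /* bits 27-16: subsystem that raised the status */
    unsigned code;      /* bits 15-0: status code within the facility */
} NtStatusFields;

/* Split a 32-bit NTSTATUS value into its documented bit fields. */
NtStatusFields
decode_ntstatus(uint32_t status)
{
    NtStatusFields f;

    f.severity = (status >> 30) & 0x3;
    f.customer = (status >> 29) & 0x1;
    f.facility = (status >> 16) & 0xFFF;
    f.code = status & 0xFFFF;
    return f;
}
```

For 0xC0000142 this gives severity 3 (error), customer 0, facility 0 and
code 0x142, the value that ntstatus.h names STATUS_DLL_INIT_FAILED.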




Do you have any antivirus or antimalware products on the system?

Do you have a Logitech webcam? Their webcam effects app adds a hook DLL 
to every process on the system, and I've seen it cause issues with MinGW 
among other things before.


If you launch a trivial process like notepad.exe then launch Process 
Explorer from SysInternals, select notepad.exe and press control-D to 
show the DLL list, what non-Microsoft DLLs are shown? You can copy and 
paste the list.


If you examine a running postgres.exe backend (one of the processes
listed under the main postgres.exe, which is the postmaster) the same
way, what non-Microsoft DLLs are shown there? Note that you may have to
run Process Explorer as administrator to view the DLL list for
postgres.exe. Ignore any DLLs ending in .NLS.


For bonus points, verify each Microsoft DLL as having a valid signature 
by double-clicking on it in the list, and report any DLLs that fail 
verification.




--
Craig Ringer

