Re: [HACKERS] Win32 hard crash problem
> IIRC there is no real SIGINT on Windows, so it can only come
> from a postgres program. The windows shutdown could be
> calling pg_ctl to stop the service, of course.

Well, not quite that, but it will send a service command to the running pg_ctl (which is our "service supervisor"), which *will* respond with a SIGINT to the postmaster.

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Win32 hard crash problem
IIRC there is no real SIGINT on Windows, so it can only come from a postgres program. The windows shutdown could be calling pg_ctl to stop the service, of course.

cheers

andrew

Joshua D. Drake wrote:
> Magnus Hagander wrote:
>>>> That log entry is the last (of consequence) entry before
>>>> the machine says:
>>>> 2006-09-28 16:40:36.921 LOG: received fast shutdown request
>>> Oh? That's pretty interesting on a Windows machine, because
>>> AFAIK there wouldn't be any standard mechanism that might tie
>>> into our homegrown signal facility. Anyone have a theory on
>>> what might trigger a SIGINT to the postmaster, other than
>>> intentional pg_ctl invocation?
>>
>> pg_ctl will send SIGINT to the postmaster when the service is stopped,
>> or when windows is shutting down.
>
> O.k. that pretty much confirms my suspicion then. The SIGINT likely came
> from the user rebooting windows.
>
>> Do you get anything about the postgresql service in the eventlog within
>> say a minute of this happening? (before or after)
>
> Too late to say now :( I will have to follow up with them.
Re: [HACKERS] Win32 hard crash problem
Magnus Hagander wrote:
>>> That log entry is the last (of consequence) entry before
>>> the machine says:
>>> 2006-09-28 16:40:36.921 LOG: received fast shutdown request
>> Oh? That's pretty interesting on a Windows machine, because
>> AFAIK there wouldn't be any standard mechanism that might tie
>> into our homegrown signal facility. Anyone have a theory on
>> what might trigger a SIGINT to the postmaster, other than
>> intentional pg_ctl invocation?
>
> pg_ctl will send SIGINT to the postmaster when the service is stopped,
> or when windows is shutting down.

O.k. that pretty much confirms my suspicion then. The SIGINT likely came from the user rebooting windows.

> Do you get anything about the postgresql service in the eventlog within
> say a minute of this happening? (before or after)

Too late to say now :( I will have to follow up with them.

Sincerely,

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Re: [HACKERS] Win32 hard crash problem
> > That log entry is the last (of consequence) entry before
> > the machine says:
> > 2006-09-28 16:40:36.921 LOG: received fast shutdown request
>
> Oh? That's pretty interesting on a Windows machine, because
> AFAIK there wouldn't be any standard mechanism that might tie
> into our homegrown signal facility. Anyone have a theory on
> what might trigger a SIGINT to the postmaster, other than
> intentional pg_ctl invocation?

pg_ctl will send SIGINT to the postmaster when the service is stopped, or when windows is shutting down.

Do you get anything about the postgresql service in the eventlog within say a minute of this happening? (before or after)

Could it be a backend or the postmaster trying to send a signal to a different backend, that for some reason sends it to the wrong process?

//Magnus
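[Editorial sketch, not part of the original message.] The link between the log line and SIGINT discussed above follows from the postmaster's documented shutdown modes: SIGTERM requests a smart shutdown, SIGINT a fast one, SIGQUIT an immediate one. A minimal Python sketch of that mapping, where the helper name `signal_for_log_line` is hypothetical and only the log text is taken from the thread:

```python
# Postmaster shutdown modes as documented for PostgreSQL (signal -> mode).
SHUTDOWN_MODES = {
    "SIGTERM": "smart",      # wait for clients to disconnect on their own
    "SIGINT": "fast",        # abort active transactions, disconnect clients
    "SIGQUIT": "immediate",  # stop without a clean shutdown; recovery runs on restart
}


def signal_for_log_line(line):
    """Return the signal implied by a 'received ... shutdown request' log line, or None."""
    for signal_name, mode in SHUTDOWN_MODES.items():
        if f"received {mode} shutdown request" in line:
            return signal_name
    return None


if __name__ == "__main__":
    print(signal_for_log_line("2006-09-28 16:40:36.921 LOG: received fast shutdown request"))
```

This only classifies the log line; it says nothing about which process delivered the signal, which is the open question in the thread.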
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> O.k. further on this.. the crashing is happening quickly now but not
>> predictably. (as in sometimes a week sometimes 2 days).
>
> OK, that seems to eliminate the GetTickCount-overflow theory anyway.
>
>> That log entry is the last (of consequence) entry before the machine says:
>> 2006-09-28 16:40:36.921 LOG: received fast shutdown request
>
> Oh? That's pretty interesting on a Windows machine, because AFAIK there
> wouldn't be any standard mechanism that might tie into our homegrown
> signal facility. Anyone have a theory on what might trigger a SIGINT
> to the postmaster, other than intentional pg_ctl invocation?

Well the other option would be a windows restart. On windows would that send a SIGINT to the backend?

Joshua D. Drake

> regards, tom lane
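[Editorial sketch, not part of the original message.] The reason a failure after two days or a week argues against the GetTickCount-overflow theory is simple arithmetic: GetTickCount() reports milliseconds since boot in a 32-bit unsigned value, so it wraps only after roughly 49.7 days of uptime. The Python below works that out. The helper names and the one-day tolerance are assumptions for illustration, and it assumes each failure was followed by a reboot, so that uptime equals the interval between failures.

```python
# GetTickCount() counts milliseconds since boot in a 32-bit unsigned integer.
TICKS_BEFORE_WRAP = 2 ** 32
MS_PER_DAY = 24 * 60 * 60 * 1000


def days_until_wrap():
    """Uptime in days at which a 32-bit millisecond counter wraps to zero."""
    return TICKS_BEFORE_WRAP / MS_PER_DAY


def is_consistent_with_wrap(uptime_days, tolerance_days=1.0):
    """True if a failure at uptime_days falls within tolerance of a wrap point."""
    period = days_until_wrap()
    remainder = uptime_days % period
    near_multiple = min(remainder, period - remainder) <= tolerance_days
    return uptime_days >= period - tolerance_days and near_multiple


if __name__ == "__main__":
    print(round(days_until_wrap(), 2))
    for uptime in (2, 7, 49.7):
        print(uptime, is_consistent_with_wrap(uptime))
```

Uptimes of 2 and 7 days fall far short of the first wrap point, which matches the conclusion quoted above.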
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > O.k. further on this.. the crashing is happening quickly now but not > predictably. (as in sometimes a week sometimes 2 days). OK, that seems to eliminate the GetTickCount-overflow theory anyway. > That log entry is the last (of consequence) entry before the machine says: > 2006-09-28 16:40:36.921 LOG: received fast shutdown request Oh? That's pretty interesting on a Windows machine, because AFAIK there wouldn't be any standard mechanism that might tie into our homegrown signal facility. Anyone have a theory on what might trigger a SIGINT to the postmaster, other than intentional pg_ctl invocation? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake wrote:
> Tom Lane wrote:
>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>> Yes, unfortunately there isn't much more to be had for another 2
>>> weeks ;)
>>
>> I trust they've got the reboot time and they will know exactly how long
>> from reboot to problem? I'm not all that sold on the "GetTickCount
>> overflow" theory, but certainly we ought not be missing a chance to test
>> or disprove it.
>
> Yes I documented all conversations and disclaimers :)

O.k. further on this.. the crashing is happening quickly now but not predictably. (as in sometimes a week sometimes 2 days).

I just now got them to send some further logs... Interestingly:

2006-09-28 16:38:37.406 LOG: could not send data to client: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

That log entry is the last (of consequence) entry before the machine says:

2006-09-28 16:40:36.921 LOG: received fast shutdown request
2006-09-28 16:40:36.921 LOG: aborting any active transactions
2006-09-28 16:40:36.921 FATAL: terminating connection due to administrator command

On the ERROR side of things I have a bunch of standard, unique key violations etc... AND:

postgresql-2006-09-27_00.log:2006-09-27 23:49:57.671 FATAL: could not read from statistics collector pipe: No error

I have requested a clean run with entire log at DEBUG2. Hopefully that will give us more info.

Sincerely,

Joshua D. Drake

> Joshua D. Drake
>
>> regards, tom lane
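[Editorial sketch, not part of the original message.] When correlating the socket error with the shutdown request, it helps to compute how far apart the two entries are. The Python below does that for the two log lines quoted in this message. It assumes every line begins with a 23-character `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as in the excerpt; the function names are hypothetical.

```python
from datetime import datetime

# Two entries copied from the log excerpt in the message above.
LOG_LINES = [
    "2006-09-28 16:38:37.406 LOG: could not send data to client: An operation on a "
    "socket could not be performed because the system lacked sufficient buffer space "
    "or because a queue was full.",
    "2006-09-28 16:40:36.921 LOG: received fast shutdown request",
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' timestamp (23 characters)."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def seconds_before_shutdown(lines):
    """Seconds between each earlier entry and the first fast shutdown request."""
    shutdown_line = next(l for l in lines if "received fast shutdown request" in l)
    t_shutdown = parse_timestamp(shutdown_line)
    return [
        (t_shutdown - parse_timestamp(l)).total_seconds()
        for l in lines
        if parse_timestamp(l) < t_shutdown
    ]


if __name__ == "__main__":
    print(seconds_before_shutdown(LOG_LINES))
```

For the excerpt above the gap is just under two minutes, which is also the window in which a matching Windows event log entry would be worth looking for.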
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > Yes I am fully aware of that. I am only relaying what the customer said. Yeah sorry, I guess what I sent was pretty obvious to you. I should stop confusing -general with -hackers :) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake wrote:
> Gregory Stark wrote:
> >"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> >>The only known resolution is to reboot Windows. Using the service
> >>control panel to shutdown postgresql will fail once the message is
> >>received. It is unknown if using the task master to individually
> >>kill processes will work.
> >
> >This contradicts your previous email about restarting the postmaster
> >working.
>
> No, it doesn't. I never said restarting the postmaster would work. I
> said rebooting windows, allows postgresql to come back up. Those are
> entirely different things.

Yup. It was me who said that restarting the postmaster solved the problem. That's what Dave Cramer told me. But maybe Dave was not certain about that -- he did use the word "reboot" and I asked for confirmation about whether this was an actual reboot of the machine or just a postmaster "reboot", and he said it was the latter. But this may have been a supposition.

Sorry for the confusion.

--
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Win32 hard crash problem
Gregory Stark wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> O.k. to recap:
>>
>> This message will present itself, if connection attempts are made from
>> the Web Application (Java/JDBC), or locally via PgAdmin. Once the error
>> message is received, all subsequent connection attempts will also result
>> in that same message. We do not know if the error occurs before or after
>> authentication.
>
> I think other people have claimed that this message is in libpq and not
> in JDBC source code which is inconsistent with this description.

Yes I am fully aware of that. I am only relaying what the customer said.

>> The only known resolution is to reboot Windows. Using the service
>> control panel to shutdown postgresql will fail once the message is
>> received. It is unknown if using the task master to individually kill
>> processes will work.
>
> This contradicts your previous email about restarting the postmaster
> working.

No, it doesn't. I never said restarting the postmaster would work. I said rebooting windows, allows postgresql to come back up. Those are entirely different things.

Sincerely,

Joshua D. Drake
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > O.k. to recap: > > This message will present itself, if connection attempts are made from the Web > Application (Java/JDBC), or locally via PgAdmin. Once the error message is > received, all subsequent connection attempts will also result in that same > message. We do not know if the error occurs before or after authentication. I think other people have claimed that this message is in libpq and not in JDBC source code which is inconsistent with this description. > The only known resolution is to reboot Windows. Using the service control > panel > to shutdown postgresql will fail once the message is received. It is unknown > if > using the task master to individually kill processes will work. This contradicts your previous email about restarting the postmaster working. I think you have to sit down and write down *exactly* what sequence of actions cause what results. Describing them in shorthand like "if connection attempts are made" is leading to a lot of speculation instead of systematic deductions. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Win32 hard crash problem
On 6-Sep-06, at 3:27 AM, Magnus Hagander wrote:
>>>>>> Yes they are using a connection pool. A java based one.
>>>>> Since java has its own protocol implementation, this is totally
>>>>> unrelated to any libpq error messages.
>>>> Another important point that we've not been given information on:
>>>> when pgAdmin/libpq starts failing like this, exactly what is
>>>> happening with the connection pool? Is it still able to issue
>>>> queries, and if not what happens exactly?
>>>
>>> No, when this happens everything stops. The only thing they get back
>>> is that message until they reboot the server. The web app (via
>>> java/connection pool), pgAdmin both give the same error.
>>>
>>> Which now that I think about it, seems odd if the message is coming
>>> from libpq yes?
>> Yes, this is very odd, AFAICS, this message does not exist in the
>> java driver. So it would be interesting to get the actual logs
>> from the client.
>
> Definitely - that error msg showing up in the web app really doesn't
> make sense. However, are we sure that the error message is *exactly*
> the same, word for word, or is it possible that it's just "the same in
> what it says" but with different words? I assume there are screendumps
> to verify this ;-)

I looked at the code in the jdbc driver and it doesn't even do this check.

> Another point that at least I don't know - what kind of connection pool
> is it? Is it an external one (like pgpool) to which the java app
> connects (using FE/BE protocol, emulating a "proper postmaster" but
> pooling access to the database), or is it running inside the app server
> (like for example .net connection pooling does, which simply means that
> when you run the Open() method on the connection object it will pick
> something off an *internal* pool)?

It's an internal pool, and the client has told me off list they have removed it and are using the jdbc driver pool. At this point I'm confused as to what they really are using, but as they have contracted Command Prompt to fix this for them, I am no longer in the private loop.

Dave

> //Magnus
Re: [HACKERS] Win32 hard crash problem
> > server sent data ("D" message) without prior row description ("T"
> > message)
>
> During the connection attempt? I don't think libpq can report that
> message until it tries to do a regular query (might be wrong
> though).
> Is the client using some application that's going to issue a query
> immediately on connecting?

In the case of pgAdmin, it does. It will set datestyle, load a list of dbs etc.

//Magnus
Re: [HACKERS] Win32 hard crash problem
Magnus Hagander wrote:
> Another point that at least I don't know - what kind of connection pool
> is it? Is it an external one (like pgpool) to which the java app
> connects (using FE/BE protocol, emulating a "proper postmaster" but
> pooling access to the database), or is it running inside the app server
> (like for example .net connection pooling does, which simply means that
> when you run the Open() method on the connection object it will pick
> something off an *internal* pool)?

Googling for c3p0 [1] shows that it is a Java-based connection pool that implements connection pooling using the JDBC API, i.e. it is an *internal* pool running inside the app server's JVM. PG Admin cannot in any case connect through this pool.

Best Regards

Michael Paesold

[1] http://sourceforge.net/projects/c3p0
Re: [HACKERS] Win32 hard crash problem
> >>>> Yes they are using a connection pool. A java based one.
> >>> Since java has its own protocol implementation, this is totally
> >>> unrelated to any libpq error messages.
> >> Another important point that we've not been given information on:
> >> when pgAdmin/libpq starts failing like this, exactly what is
> >> happening with the connection pool? Is it still able to issue
> >> queries, and if not what happens exactly?
> >
> > No, when this happens everything stops. The only thing they get back
> > is that message until they reboot the server. The web app (via
> > java/connection pool), pgAdmin both give the same error.
> >
> > Which now that I think about it, seems odd if the message is coming
> > from libpq yes?
> Yes, this is very odd, AFAICS, this message does not exist in the
> java driver. So it would be interesting to get the actual logs
> from the client.

Definitely - that error msg showing up in the web app really doesn't make sense. However, are we sure that the error message is *exactly* the same, word for word, or is it possible that it's just "the same in what it says" but with different words? I assume there are screendumps to verify this ;-)

Another point that at least I don't know - what kind of connection pool is it? Is it an external one (like pgpool) to which the java app connects (using FE/BE protocol, emulating a "proper postmaster" but pooling access to the database), or is it running inside the app server (like for example .net connection pooling does, which simply means that when you run the Open() method on the connection object it will pick something off an *internal* pool)?

//Magnus
Re: [HACKERS] Win32 hard crash problem
I'm a bit afraid to engage in this thread, but I've also seen a reproducible case where a libpq client stops working and 'vacuum analyze' helped. It happened on Windows Server 2003 and XP with PostgreSQL 8.1.4. I don't have the client source code, so I can't say more, but the customer's developer said the same behaviour was observed on Linux with 8.1.0 and went away in 8.1.4. They said that this happens only with row statistics enabled. The client inserts some data in a transaction, the backend writes 'COMMIT' to the log, but the client waits for something, and a 'vacuum analyze' of the whole database in some magic way pushed the process along.

I've got their installation CD and will try to investigate this problem. Any suggestions? I'm not familiar with W32 at all.

Oleg

On Tue, 5 Sep 2006, Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Yes, unfortunately there isn't much more to be had for another 2 weeks ;)
>
> I trust they've got the reboot time and they will know exactly how long
> from reboot to problem? I'm not all that sold on the "GetTickCount
> overflow" theory, but certainly we ought not be missing a chance to test
> or disprove it.
>
> regards, tom lane

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Yes, unfortunately there isn't much more to be had for another 2 weeks ;)
>
> I trust they've got the reboot time and they will know exactly how long
> from reboot to problem? I'm not all that sold on the "GetTickCount
> overflow" theory, but certainly we ought not be missing a chance to test
> or disprove it.

Yes I documented all conversations and disclaimers :)

Joshua D. Drake

> regards, tom lane
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > Yes, unfortunately there isn't much more to be had for another 2 weeks ;) I trust they've got the reboot time and they will know exactly how long from reboot to problem? I'm not all that sold on the "GetTickCount overflow" theory, but certainly we ought not be missing a chance to test or disprove it. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Win32 hard crash problem
On 5-Sep-06, at 7:00 PM, Joshua D. Drake wrote:
> Tom Lane wrote:
>> Dave Cramer <[EMAIL PROTECTED]> writes:
>>> On 5-Sep-06, at 6:05 PM, Joshua D. Drake wrote:
>>>> Yes they are using a connection pool. A java based one.
>>> Since java has its own protocol implementation, this is totally
>>> unrelated to any libpq error messages.
>> Another important point that we've not been given information on:
>> when pgAdmin/libpq starts failing like this, exactly what is happening
>> with the connection pool? Is it still able to issue queries, and
>> if not what happens exactly?
>
> No, when this happens everything stops. The only thing they get back is
> that message until they reboot the server. The web app (via
> java/connection pool), pgAdmin both give the same error.
>
> Which now that I think about it, seems odd if the message is coming
> from libpq yes?

Yes, this is very odd, AFAICS, this message does not exist in the java driver. So it would be interesting to get the actual logs from the client.
Re: [HACKERS] Win32 hard crash problem
> It sounds to me like we don't actually know that, because the client
> doesn't know how to restart the postmaster without rebooting the OS.
> (Josh says "pg_ctl stop" doesn't work in this state, which is a tad
> interesting in itself, because that doesn't go through a connection
> request.) It would be useful to try killing off the postgres processes
> via task manager and then see if a new postmaster can be started and
> if things then behave normally, or if a reboot is truly needed.

Right, and I have asked that the next time this happens that they try and use the task manager to kill the process.

> The bottom line here is that all we have so far are client-side
> observations ("I get this message") and we have no clue what state the
> postmaster thinks it's in. We really need more information.

Yes, unfortunately there isn't much more to be had for another 2 weeks ;)

Sincerely,

Joshua D. Drake

> regards, tom lane
Re: [HACKERS] Win32 hard crash problem
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Joshua D. Drake wrote:
>> I already said that ;). The problem IS NOT that we can't restart the
>> system and get postgresql back. It is that it happens at all.
>
> It is quite different a bug that can only be fixed by "rebooting the
> server" (which to me means taking the operating system down and starting
> it afresh) than one that can be fixed by restarting the PostgreSQL
> server (_without_ taking the operating system down). I've been reading
> "reboot" all along -- sorry if I missed an email saying otherwise.

It sounds to me like we don't actually know that, because the client doesn't know how to restart the postmaster without rebooting the OS. (Josh says "pg_ctl stop" doesn't work in this state, which is a tad interesting in itself, because that doesn't go through a connection request.) It would be useful to try killing off the postgres processes via task manager and then see if a new postmaster can be started and if things then behave normally, or if a reboot is truly needed.

The bottom line here is that all we have so far are client-side observations ("I get this message") and we have no clue what state the postmaster thinks it's in. We really need more information.

regards, tom lane
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake wrote:
> The only known resolution is to reboot Windows. Using the service
> ^^
> control panel to shutdown postgresql will fail once the message is
> received. It is unknown if using the task master to individually kill
> processes will work.

This is what I'm saying doesn't match what Dave told me. The part about failing to shut the postmaster down is the first I've heard of it.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake wrote:
> Alvaro Herrera wrote:
> >Joshua D. Drake wrote:
> >>Tom Lane wrote:
> >>>Dave Cramer <[EMAIL PROTECTED]> writes:
> >>>>On 5-Sep-06, at 6:05 PM, Joshua D. Drake wrote:
> >>>>>Yes they are using a connection pool. A java based one.
> >>>>Since java has its own protocol implementation, this is totally
> >>>>unrelated to any libpq error messages.
> >>>Another important point that we've not been given information on:
> >>>when pgAdmin/libpq starts failing like this, exactly what is happening
> >>>with the connection pool? Is it still able to issue queries, and
> >>>if not what happens exactly?
> >>No, when this happens everything stops. The only thing they get back is
> >>that message until they reboot the server. The web app (via
> >>java/connection pool), pgAdmin both give the same error.
> >
> >Actually Dave Cramer told me that if the postmaster was stopped and then
> >restarted, it would start answering fine again. Which would make a lot
> >of sense.
>
> I already said that ;). The problem IS NOT that we can't restart the
> system and get postgresql back. It is that it happens at all.

A bug that can only be fixed by "rebooting the server" (which to me means taking the operating system down and starting it afresh) is quite different from one that can be fixed by restarting the PostgreSQL server (_without_ taking the operating system down). I've been reading "reboot" all along -- sorry if I missed an email saying otherwise.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Win32 hard crash problem
Hello,

O.k. to recap:

OS: Win2k3 SP1
PostgreSQL: 8.1.2
Application Server: Jboss
Connection Pooler: c3p0
JDBC Version: 8.1.404, Also verified with 8.0.311

Problem:

After 2/3 weeks, PostgreSQL will begin issuing the following message:

server sent data ("D" message) without prior row description ("T" message)

This message will present itself, if connection attempts are made from the Web Application (Java/JDBC), or locally via PgAdmin. Once the error message is received, all subsequent connection attempts will also result in that same message. We do not know if the error occurs before or after authentication.

The only known resolution is to reboot Windows. Using the service control panel to shutdown postgresql will fail once the message is received. It is unknown if using the task master to individually kill processes will work.

Sincerely,

Joshua D. Drake
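[Editorial sketch, not part of the original message.] The error text refers to two backend message types of the version 3 frontend/backend protocol: "T" (RowDescription) and "D" (DataRow). Each backend message is framed as a one-byte type followed by a 4-byte big-endian length that counts itself but not the type byte. The Python below walks a captured byte stream and reports the first "D" that arrives without a preceding "T" in the same result set. It is an illustration of the condition the client complains about in the simple-query flow, not libpq's actual code; all function names are hypothetical.

```python
import struct


def iter_messages(buf):
    """Yield (type_char, payload) for each complete v3 backend message in buf."""
    pos = 0
    while pos + 5 <= len(buf):
        msg_type = chr(buf[pos])
        # The length field counts its own 4 bytes but not the type byte.
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        end = pos + 1 + length
        if length < 4 or end > len(buf):
            break  # malformed or truncated message; stop parsing
        yield msg_type, buf[pos + 5:end]
        pos = end


def find_orphan_data_row(buf):
    """Return the index of the first 'D' message with no prior 'T', else None."""
    have_row_description = False
    for index, (msg_type, _payload) in enumerate(iter_messages(buf)):
        if msg_type == "T":
            have_row_description = True
        elif msg_type in ("C", "Z"):
            # CommandComplete / ReadyForQuery end the current result set.
            have_row_description = False
        elif msg_type == "D" and not have_row_description:
            return index
    return None


def make_message(msg_type, payload=b""):
    """Build one framed message; used here only to construct test input."""
    return msg_type.encode("ascii") + struct.pack("!I", len(payload) + 4) + payload


if __name__ == "__main__":
    stream = make_message("Z", b"I") + make_message("D", b"\x00\x00")
    print(find_orphan_data_row(stream))
```

Running something like this over a network capture taken while the failure is occurring would show whether the server really sends an out-of-order "D", or whether the client is misreading the stream.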
Re: [HACKERS] Win32 hard crash problem
Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> Tom Lane wrote:
>>> Dave Cramer <[EMAIL PROTECTED]> writes:
>>>> On 5-Sep-06, at 6:05 PM, Joshua D. Drake wrote:
>>>>> Yes they are using a connection pool. A java based one.
>>>> Since java has its own protocol implementation, this is totally
>>>> unrelated to any libpq error messages.
>>> Another important point that we've not been given information on:
>>> when pgAdmin/libpq starts failing like this, exactly what is happening
>>> with the connection pool? Is it still able to issue queries, and
>>> if not what happens exactly?
>> No, when this happens everything stops. The only thing they get back is
>> that message until they reboot the server. The web app (via
>> java/connection pool), pgAdmin both give the same error.
>
> Actually Dave Cramer told me that if the postmaster was stopped and then
> restarted, it would start answering fine again. Which would make a lot
> of sense.

I already said that ;). The problem IS NOT that we can't restart the system and get postgresql back. It is that it happens at all.

Joshua D. Drake
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake wrote:
> Tom Lane wrote:
> >Dave Cramer <[EMAIL PROTECTED]> writes:
> >>On 5-Sep-06, at 6:05 PM, Joshua D. Drake wrote:
> >>>Yes they are using a connection pool. A java based one.
> >
> >>Since java has its own protocol implementation, this is totally
> >>unrelated to any libpq error messages.
> >
> >Another important point that we've not been given information on:
> >when pgAdmin/libpq starts failing like this, exactly what is happening
> >with the connection pool? Is it still able to issue queries, and
> >if not what happens exactly?
>
> No, when this happens everything stops. The only thing they get back is
> that message until they reboot the server. The web app (via
> java/connection pool), pgAdmin both give the same error.

Actually Dave Cramer told me that if the postmaster was stopped and then restarted, it would start answering fine again. Which would make a lot of sense.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Win32 hard crash problem
-----Original Message-----
From: "Joshua D. Drake" <[EMAIL PROTECTED]>
To: "Joshua D. Drake" <[EMAIL PROTECTED]>; "Tom Lane" <[EMAIL PROTECTED]>; "Merlin Moncure" <[EMAIL PROTECTED]>; "Magnus Hagander" <[EMAIL PROTECTED]>; "PostgreSQL-development"
Sent: 05/09/06 23:27
Subject: Re: [HACKERS] Win32 hard crash problem

> Well except when they are connecting with Pgadmin (which wouldn't go
> through the connection pool) they get the error as well.

It wouldn't? It's just a 'regular' libpq app. Doesn't say much for the connection pool if it cannot handle a simple libpq connection.

/D
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote:
> Dave Cramer <[EMAIL PROTECTED]> writes:
>> On 5-Sep-06, at 6:05 PM, Joshua D. Drake wrote:
>>> Yes they are using a connection pool. A java based one.
>> Since java has its own protocol implementation, this is totally unrelated to any libpq error messages.
> Another important point that we've not been given information on: when pgAdmin/libpq starts failing like this, exactly what is happening with the connection pool? Is it still able to issue queries, and if not what happens exactly?

No, when this happens everything stops. The only thing they get back is that message until they reboot the server. The web app (via java/connection pool), pgAdmin both give the same error.

Which, now that I think about it, seems odd if the message is coming from libpq, yes?

Sincerely,
Joshua D. Drake

> regards, tom lane
Re: [HACKERS] Win32 hard crash problem
Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> Alvaro Herrera wrote:
>>> Joshua D. Drake wrote:
>>>> Alvaro Herrera wrote:
>>>>> What I've been wondering all along is whether they are using a connection pool.
>>>> Yes they are using a connection pool. A java based one.
>>> It's quite possible that it's the connection pool that gets confused, and not PostgreSQL itself. It would be interesting if they change the connection setting when the "hang" next occurs, to point directly to PostgreSQL bypassing the connection pool.
>> Well except when they are connecting with Pgadmin (which wouldn't go through the connection pool) they get the error as well.
> Are you assuming, or did they/you verify that this is indeed the case? I see no reason to assume that pgAdmin can't connect via a pool.

Verified. They do not connect to the connection pool for pgadmin. Although I would think pgadmin might have problems connecting to a java based pool. If I recall, (I could be cranked) JDBC apps can't use pgpool for example.

Sincerely,
Joshua D. Drake
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake wrote:
> Alvaro Herrera wrote:
> >Tom Lane wrote:
> >>"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> >>>>>Fail in what way. Hang, not connect, or get an error msg?
> >>>Just verified with customer. Once the problem occurs the first time, the
> >>>customer will continually get the same error message for each subsequent
> >>>connection attempt:
> >>>server sent data ("D" message) without prior row description ("T"
> >>>message)
> >>During the connection attempt? I don't think libpq can report that
> >>message until it tries to do a regular query (might be wrong though).
> >>Is the client using some application that's going to issue a query
> >>immediately on connecting?
> >
> >What I've been wondering all along is whether they are using a
> >connection pool.
>
> Yes they are using a connection pool. A java based one.

It's quite possible that it's the connection pool that gets confused, and not PostgreSQL itself. It would be interesting if they change the connection setting when the "hang" next occurs, to point directly to PostgreSQL bypassing the connection pool.

OTOH the connection pool may be the thing with the TickCounter problem.

--
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Win32 hard crash problem
Dave Cramer <[EMAIL PROTECTED]> writes:
> On 5-Sep-06, at 6:05 PM, Joshua D. Drake wrote:
>> Yes they are using a connection pool. A java based one.

> Since java has its own protocol implementation, this is totally
> unrelated to any libpq error messages.

Another important point that we've not been given information on: when pgAdmin/libpq starts failing like this, exactly what is happening with the connection pool? Is it still able to issue queries, and if not what happens exactly?

regards, tom lane
Re: [HACKERS] Win32 hard crash problem
Joshua D. Drake wrote:
> Alvaro Herrera wrote:
> >Joshua D. Drake wrote:
> >>Alvaro Herrera wrote:
> >>>What I've been wondering all along is whether they are using a
> >>>connection pool.
> >>Yes they are using a connection pool. A java based one.
> >
> >It's quite possible that it's the connection pool that gets confused,
> >and not PostgreSQL itself. It would be interesting if they change the
> >connection setting when the "hang" next occurs, to point directly to
> >PostgreSQL bypassing the connection pool.
>
> Well except when they are connecting with Pgadmin (which wouldn't go
> through the connection pool) they get the error as well.

Are you assuming, or did they/you verify that this is indeed the case? I see no reason to assume that pgAdmin can't connect via a pool.

--
Alvaro Herrera    http://www.CommandPrompt.com/
Re: [HACKERS] Win32 hard crash problem
On 5-Sep-06, at 6:05 PM, Joshua D. Drake wrote:
> Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>>>> Fail in what way. Hang, not connect, or get an error msg?
>>>> Just verified with customer. Once the problem occurs the first time, the customer will continually get the same error message for each subsequent connection attempt:
>>>> server sent data ("D" message) without prior row description ("T" message)
>>> During the connection attempt? I don't think libpq can report that message until it tries to do a regular query (might be wrong though). Is the client using some application that's going to issue a query immediately on connecting?
>> What I've been wondering all along is whether they are using a connection pool.
> Yes they are using a connection pool. A java based one.

Since java has its own protocol implementation, this is totally unrelated to any libpq error messages. While I've not personally used the pool in question (c3p0), my understanding is that it is pretty robust. Personally, I'm betting on some windows TCP/IP weirdness here.

Dave

> Sincerely,
> Joshua D. Drake
Re: [HACKERS] Win32 hard crash problem
Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>>>>> Fail in what way. Hang, not connect, or get an error msg?
>>>>> Just verified with customer. Once the problem occurs the first time, the customer will continually get the same error message for each subsequent connection attempt:
>>>>> server sent data ("D" message) without prior row description ("T" message)
>>>> During the connection attempt? I don't think libpq can report that message until it tries to do a regular query (might be wrong though). Is the client using some application that's going to issue a query immediately on connecting?
>>> What I've been wondering all along is whether they are using a connection pool.
>> Yes they are using a connection pool. A java based one.
> It's quite possible that it's the connection pool that gets confused, and not PostgreSQL itself. It would be interesting if they change the connection setting when the "hang" next occurs, to point directly to PostgreSQL bypassing the connection pool.

Well except when they are connecting with Pgadmin (which wouldn't go through the connection pool) they get the error as well.

Joshua D. Drake

> OTOH the connection pool may be the thing with the TickCounter problem.
Re: [HACKERS] Win32 hard crash problem
On Tue, 5 Sep 2006, Joshua D. Drake wrote:
> Right, but "just took a reboot to fix it" isn't very confidence inspiring ;)

Are you kidding? This is standard procedure for troubleshooting Windows problems :)

--
The world is coming to an end. Please log off.
Re: [HACKERS] Win32 hard crash problem
Alvaro Herrera wrote:
> Tom Lane wrote:
>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>>> Fail in what way. Hang, not connect, or get an error msg?
>>> Just verified with customer. Once the problem occurs the first time, the customer will continually get the same error message for each subsequent connection attempt:
>>> server sent data ("D" message) without prior row description ("T" message)
>> During the connection attempt? I don't think libpq can report that message until it tries to do a regular query (might be wrong though). Is the client using some application that's going to issue a query immediately on connecting?
> What I've been wondering all along is whether they are using a connection pool.

Yes they are using a connection pool. A java based one.

Sincerely,
Joshua D. Drake
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>> Fail in what way. Hang, not connect, or get an error msg?
>> Just verified with customer. Once the problem occurs the first time, the customer will continually get the same error message for each subsequent connection attempt:
>> server sent data ("D" message) without prior row description ("T" message)
> During the connection attempt? I don't think libpq can report that message until it tries to do a regular query (might be wrong though). Is the client using some application that's going to issue a query immediately on connecting?

Well, windows ;) Customer says that they double click pgadmin and they get that message.

I have informed them on how to increase to debug5 and hopefully we get something from that, of course it will likely be 24.85 days from now ;)

> It would be useful to turn on log_connections and log_statement (and perhaps crank log_min_messages all the way up to DEBUG5) to see if we can get anything in the postmaster log giving a hint what actually happens here. A TCP sniff of the connection attempt traffic would be pretty useful too.

Sincerely,
Joshua D. Drake
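If a TCP capture of the backend-to-client traffic does turn up, the condition the libpq message describes can be looked for mechanically. The following is only a rough sketch: it assumes the capture has already been reduced to the raw server-to-client byte stream of a version 3 protocol session (one type byte followed by a 4-byte big-endian length that counts itself), and the helper names `iter_messages`, `first_orphan_data_row` and `msg` are made up for illustration. It is not how libpq itself is written, and treating `C` and `Z` as the end of a result set is a simplification.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_messages(stream: bytes) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (offset, type, payload) for each complete backend message."""
    pos = 0
    while pos + 5 <= len(stream):
        msg_type = chr(stream[pos])
        # The length field counts its own 4 bytes but not the type byte.
        (length,) = struct.unpack("!I", stream[pos + 1:pos + 5])
        end = pos + 1 + length
        if length < 4 or end > len(stream):
            break  # truncated or corrupt capture; stop scanning
        yield pos, msg_type, stream[pos + 5:end]
        pos = end


def first_orphan_data_row(stream: bytes) -> Optional[int]:
    """Return the offset of the first 'D' seen without a preceding 'T'."""
    have_row_description = False
    for offset, msg_type, _payload in iter_messages(stream):
        if msg_type == "T":            # RowDescription
            have_row_description = True
        elif msg_type in ("C", "Z"):   # CommandComplete / ReadyForQuery
            have_row_description = False
        elif msg_type == "D" and not have_row_description:
            return offset
    return None


def msg(msg_type: str, payload: bytes = b"") -> bytes:
    """Build one synthetic backend message (for trying the scanner out)."""
    return msg_type.encode("ascii") + struct.pack("!I", len(payload) + 4) + payload


if __name__ == "__main__":
    good = msg("T", b"\x00\x00") + msg("D", b"\x00\x00") + msg("C", b"SELECT 1\x00") + msg("Z", b"I")
    bad = msg("Z", b"I") + msg("D", b"\x00\x00")
    print("good stream:", first_orphan_data_row(good))
    print("bad stream:", first_orphan_data_row(bad))
```

An offset into the stream, rather than a plain yes/no, makes it easier to line the finding up with the packet capture and with the postmaster log timestamps.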
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> >>> Fail in what way. Hang, not connect, or get an error msg?
>
> > Just verified with customer. Once the problem occurs the first time, the
> > customer will continually get the same error message for each subsequent
> > connection attempt:
>
> > server sent data ("D" message) without prior row description ("T" message)
>
> During the connection attempt? I don't think libpq can report that
> message until it tries to do a regular query (might be wrong though).
> Is the client using some application that's going to issue a query
> immediately on connecting?

What I've been wondering all along is whether they are using a connection pool.

--
Alvaro Herrera    http://www.CommandPrompt.com/
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>> Fail in what way. Hang, not connect, or get an error msg?

> Just verified with customer. Once the problem occurs the first time, the
> customer will continually get the same error message for each subsequent
> connection attempt:

> server sent data ("D" message) without prior row description ("T" message)

During the connection attempt? I don't think libpq can report that message until it tries to do a regular query (might be wrong though). Is the client using some application that's going to issue a query immediately on connecting?

It would be useful to turn on log_connections and log_statement (and perhaps crank log_min_messages all the way up to DEBUG5) to see if we can get anything in the postmaster log giving a hint what actually happens here. A TCP sniff of the connection attempt traffic would be pretty useful too.

regards, tom lane
Re: [HACKERS] Win32 hard crash problem
> Josh failed to answer the most important question though:

Sorry.

>>> Subsequent connections to the database will fail (such as pgAdmin) and Windows must be completely rebooted.
>> Fail in what way. Hang, not connect, or get an error msg?

Just verified with customer. Once the problem occurs the first time, the customer will continually get the same error message for each subsequent connection attempt:

server sent data ("D" message) without prior row description ("T" message)

> regards, tom lane
Re: [HACKERS] Win32 hard crash problem
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> On 9/5/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>> Magnus Hagander wrote:
>>> What do you mean by this? It doesn't start upon reboot? What is needed
>>> to make it start?
>>
>> It means that postgresql doesn't recover on its own. On linux if a
>> backend crashes all of PostgreSQL will restart and come back up if it can.
>>
>> On Win32 it doesn't.

> it does for me, at least for me when I used to work with windows :).
> I think it just doesn't restart for this particular type of crash.

As best I can tell, Josh isn't describing a crash at all. Something (possibly in the TCP stack) has locked up, but there's no way for the postmaster to know there's anything wrong, and probably no way for the postmaster to fix it if it did know. Restarting backends certainly isn't going to fix a communication problem.

Josh failed to answer the most important question though:

>> Subsequent connections to the database will fail (such as pgAdmin)
>> and Windows must be completely rebooted.
>
> Fail in what way. Hang, not connect, or get an error msg?

regards, tom lane
Re: [HACKERS] Win32 hard crash problem
> >> PostgreSQL will also not recover on its own (e.g; auto restart and
> >> roll through the logs).
> >
> > What do you mean by this? It doesn't start upon reboot?
> > What is needed to make it start?
>
> It means that postgresql doesn't recover on its own. On linux
> if a backend crashes all of PostgreSQL will restart and come
> back up if it can.
>
> On Win32 it doesn't.

Ah, I thought you meant that the database recovery process (that runs after a crash) failed and lost data. But it's not data-loss then, it just took a reboot to fix it?

I think we're somehow seeing a complete postmaster hang, where it's either not able to kill off the backends as required, or just not capable of accepting new connections after that. Which makes a stacktrace from the postmaster the most interesting one to look at.

//Magnus
Re: [HACKERS] Win32 hard crash problem
Magnus Hagander wrote:
>>>> PostgreSQL will also not recover on its own (e.g; auto restart and roll through the logs).
>>> What do you mean by this? It doesn't start upon reboot? What is needed to make it start?
>> It means that postgresql doesn't recover on its own. On linux if a backend crashes all of PostgreSQL will restart and come back up if it can.
>>
>> On Win32 it doesn't.
> Ah, I thought you meant that the database recovery process (that runs after a crash) failed and lost data. But it's not data-loss then, it just took a reboot to fix it?

Right, but "just took a reboot to fix it" isn't very confidence inspiring ;)

> I think we're somehow seeing a complete postmaster hang, where it's either not able to kill off the backends as required, or just not capable of accepting new connections after that. Which makes a stacktrace from the postmaster the most interesting one to look at.

I have asked the customer to also look in Task Manager and see if there was one particular process that was eating cpu, and see if that process can be killed. If that process can be killed and postgresql comes back clean, then that is a step.

However, debugging this beast is a pain. I take it mingw doesn't have a gdb we can use?

> //Magnus
Re: [HACKERS] Win32 hard crash problem
On 9/5/06, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> Magnus Hagander wrote:
> > What do you mean by this? It doesn't start upon reboot? What is needed
> > to make it start?
>
> It means that postgresql doesn't recover on its own. On linux if a backend crashes all of PostgreSQL will restart and come back up if it can.
>
> On Win32 it doesn't.

it does for me, at least for me when I used to work with windows :). I think it just doesn't restart for this particular type of crash. I had a couple of similarly weird undetectable windows problems that I could never quite figure out until I got hired by another company and left that monster behind for good.

merlin
Re: [HACKERS] Win32 hard crash problem
Magnus Hagander wrote:
> Oops, going backwards through the mails it seems :)
>
>> Subsequent connections to the database will fail (such as pgAdmin) and Windows must be completely rebooted.
> Fail in what way. Hang, not connect, or get an error msg?
>
>> PostgreSQL will also not recover on its own (e.g; auto restart and roll through the logs).
> What do you mean by this? It doesn't start upon reboot? What is needed to make it start?

It means that postgresql doesn't recover on its own. On linux if a backend crashes all of PostgreSQL will restart and come back up if it can.

On Win32 it doesn't.

Joshua D. Drake

> //Magnus
Re: [HACKERS] Win32 hard crash problem
> >> My bet is something depending on GetTickCount to measure elapsed time
> >> (and no, it's not used in the core Postgres code, but you've got
> >> plenty of other possible culprits in that stack).
> >
> > This doesn't quite make sense. The only reason we have to reboot is
> > because PostgreSQL no longer responds. The system itself is fine.
>
> The Windows kernel may still work, but that doesn't mean that
> everything Postgres depends on still works.

It may be a listen socket that no longer reacts, which in turn may be caused by a handle leak. Next time it blocks, look at the handle counts (e.g. with handle.exe from sysinternals). You could also watch the handle count in Task Manager now and see if it increases constantly (handle.exe shows you the details).

Andreas
Re: [HACKERS] Win32 hard crash problem
Oops, going backwards through the mails it seems :)

> Subsequent connections to the database will fail (such as pgAdmin)
> and Windows must be completely rebooted.

Fail in what way. Hang, not connect, or get an error msg?

> PostgreSQL will also not recover on its own (e.g; auto restart and
> roll through the logs).

What do you mean by this? It doesn't start upon reboot? What is needed to make it start?

//Magnus
Re: [HACKERS] Win32 hard crash problem
> >> My bet is something depending on GetTickCount to measure elapsed time
> >> (and no, it's not used in the core Postgres code, but you've got
> >> plenty of other possible culprits in that stack).
> >
> > This doesn't quite make sense. The only reason we have to reboot is
> > because PostgreSQL no longer responds. The system itself is fine.
>
> The Windows kernel may still work, but that doesn't mean that
> everything Postgres depends on still works. I'm wondering about
> (a) the TCP stack (and that includes 3rd party firewalls and such,
> not only the core Windows code); (b) timing or threading stuff
> inside the application that's using libpq, which the only thing we
> know about so far is that it's *not* JDBC/Hibernate.

How about getting a simple backtrace from a couple of the stuck postgres processes? And from the postmaster, which should be accepting new connections... Or does that also hang completely?

How to get one? Well, since we don't have the MSVC build yet (yeah, yeah, eventually), you can only get a semi-backtrace that only looks at exported symbols. You can get this using process explorer (thread tab, click stack), using WinDBG, or using Visual Studio (you'll need VS 2005, and you need to check the option for "Load DLL exports" in options->debugging->native).

Oh, btw, if there is a 3rd party firewall on the box, the standard recommendation of uninstalling it definitely sounds like a good plan :-)

//Magnus
Re: [HACKERS] Win32 hard crash problem
On 31/8/06 23:34, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:

> Sure it is a registry entry... so we could (in theory) shrink that quite
> a bit.. However I am confused, if we don't use it, what that is
> connecting to libpq would trigger it?
>
> I know they are using pgAdmin...

Are they using pgAgent? That's the only part of pgAdmin that does any sort of timing I can think of offhand (other than the query tool timer, which only runs whilst a query is running). Even then it's done indirectly through wxWidgets, so I'm not familiar with how it's implemented at the win32 API level.

If it were pgAdmin (or any other client) though, how would that lock up the entire PostgreSQL instance, but not the rest of the server?

Regards, Dave.
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> Which means we need to start stripping it down. Gah, I actually argued
> *for* this port too. Next time slap me.

Well, before you invest a lot of time barking up what might be the wrong tree, there is a very easy test you can use to check the GetTickCount theory: keep closer track of time-since-boot on the affected systems. If that idea is right, it won't be "two or three weeks" between boot and problems appearing, it'll be 24.85 days on the nose. It shouldn't take much except waiting to either falsify the theory or make it look pretty convincing.

regards, tom lane
Re: [HACKERS] Win32 hard crash problem
Alvaro Herrera wrote:
> Dave Cramer wrote:
>> On 31-Aug-06, at 6:01 PM, Tom Lane wrote:
>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>> Tom Lane wrote:
>>>>> BTW, are you sure this is coming from JDBC? I see the exact same message text in libpq:
>>>>> libpq_gettext("server sent data (\"D\" message) without prior row description (\"T\" message)\n"));
>>>>> Maybe the JDBC driver uses the identical message wording but my thought is to look for something going through libpq.
>>>> The error is server side. I was just describing the environment.
>>> I can entirely assure you that that error message is not present in the server code.
>> Well that's even more interesting because it doesn't exist in the jdbc driver either.
> Conclusion: they are using libpq in some form, so you should investigate that.
>
> Is there a way to alter the tick counter, so that a test run does not need to take the full 3 weeks?

Sure it is a registry entry... so we could (in theory) shrink that quite a bit.. However I am confused, if we don't use it, what that is connecting to libpq would trigger it?

I know they are using pgAdmin...

Joshua D. Drake
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>> My bet is something depending on GetTickCount to measure elapsed time (and no, it's not used in the core Postgres code, but you've got plenty of other possible culprits in that stack).
>> This doesn't quite make sense. The only reason we have to reboot is because PostgreSQL no longer responds. The system itself is fine.
> The Windows kernel may still work, but that doesn't mean that everything Postgres depends on still works. I'm wondering about (a) the TCP stack (and that includes 3rd party firewalls and such, not only the core Windows code); (b) timing or threading stuff inside the application that's using libpq, which the only thing we know about so far is that it's *not* JDBC/Hibernate.

/me grumbles in a not so polite way about Windows.

Which means we need to start stripping it down. Gah, I actually argued *for* this port too. Next time slap me.

Joshua D. Drake

> regards, tom lane
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> My bet is something depending on GetTickCount to measure elapsed time
>> (and no, it's not used in the core Postgres code, but you've got plenty
>> of other possible culprits in that stack).

> This doesn't quite make sense. The only reason we have to reboot is
> because PostgreSQL no longer responds. The system itself is fine.

The Windows kernel may still work, but that doesn't mean that everything Postgres depends on still works. I'm wondering about (a) the TCP stack (and that includes 3rd party firewalls and such, not only the core Windows code); (b) timing or threading stuff inside the application that's using libpq, which the only thing we know about so far is that it's *not* JDBC/Hibernate.

regards, tom lane
Re: [HACKERS] Win32 hard crash problem
> That sounds suspiciously close to the time from boot to wraparound of GetTickCount:
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/gettickcount.asp
> M$ list this as 49 days but that's the time to wrap clear around to zero; the value overflows and goes negative in 24.85 days if I've done the math correctly.
>
> My bet is something depending on GetTickCount to measure elapsed time (and no, it's not used in the core Postgres code, but you've got plenty of other possible culprits in that stack).

This doesn't quite make sense. The only reason we have to reboot is because PostgreSQL no longer responds. The system itself is fine.

Sincerely,
Joshua D. Drake
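The two figures quoted above (49 days to wrap to zero, about 24.85 days to go negative) can be checked with a few lines of arithmetic. This is only a sketch of the calculation; it assumes the counter is a 32-bit count of milliseconds since boot, as the linked GetTickCount page describes, and the names `MS_PER_DAY` and `ticks_to_days` are made up for illustration.

```python
# Arithmetic check of the GetTickCount wraparound figures.
# Assumption: the counter is a 32-bit millisecond count since boot.

MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def ticks_to_days(ticks: int) -> float:
    """Convert a millisecond tick value to days."""
    return ticks / MS_PER_DAY


# Stored in a signed 32-bit integer, the value turns negative at 2**31 ms.
signed_overflow_days = ticks_to_days(2**31)

# Stored in an unsigned 32-bit integer, it wraps to zero at 2**32 ms.
unsigned_wrap_days = ticks_to_days(2**32)

if __name__ == "__main__":
    print(f"goes negative after {signed_overflow_days:.4f} days")
    print(f"wraps to zero after {unsigned_wrap_days:.4f} days")
```

Any code that keeps the tick value in a signed 32-bit variable and compares it against an earlier reading would start misbehaving at the first of these two points, which is the one that matters for the theory.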
Re: [HACKERS] Win32 hard crash problem
Dave Cramer wrote:
>
> On 31-Aug-06, at 6:01 PM, Tom Lane wrote:
>
> >"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> >>Tom Lane wrote:
> >>>BTW, are you sure this is coming from JDBC? I see the exact same
> >>>message text in libpq:
> >>>libpq_gettext("server sent data (\"D\" message) without prior row
> >>>description (\"T\" message)\n"));
> >>>Maybe the JDBC driver uses the identical message wording but my
> >>>thought is to look for something going through libpq.
> >
> >>The error is server side. I was just describing the environment.
> >
> >I can entirely assure you that that error message is not present in
> >the server code.
> Well that's even more interesting because it doesn't exist in the
> jdbc driver either.

Conclusion: they are using libpq in some form, so you should investigate that.

Is there a way to alter the tick counter, so that a test run does not need to take the full 3 weeks?

--
Alvaro Herrera    http://www.CommandPrompt.com/
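Short of altering the counter, the theory can also be tested from records alone: given the boot time of an affected machine, the moment a signed 32-bit millisecond counter would overflow is fixed, and the observed failure time can be compared against it. Below is a minimal sketch of that comparison. The function names, the one-hour tolerance and the example boot time are all hypothetical, chosen only to illustrate the check.

```python
from datetime import datetime, timedelta

# Time from boot until a signed 32-bit millisecond counter goes negative.
SIGNED_OVERFLOW = timedelta(milliseconds=2**31)


def predicted_overflow(boot_time: datetime) -> datetime:
    """Moment at which the tick counter would overflow for a given boot."""
    return boot_time + SIGNED_OVERFLOW


def matches_theory(boot_time: datetime, failure_time: datetime,
                   tolerance: timedelta = timedelta(hours=1)) -> bool:
    """True if the failure happened within `tolerance` of the overflow."""
    return abs(failure_time - predicted_overflow(boot_time)) <= tolerance


if __name__ == "__main__":
    boot = datetime(2006, 9, 1, 9, 0, 0)  # hypothetical boot time
    print("predicted overflow:", predicted_overflow(boot))
    print("failure 3 weeks after boot fits:",
          matches_theory(boot, boot + timedelta(weeks=3)))
```

A failure that lands days away from the predicted moment would count against the theory; one that lands within minutes of it on more than one machine would make it look fairly convincing.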
Re: [HACKERS] Win32 hard crash problem
On 31-Aug-06, at 6:01 PM, Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> BTW, are you sure this is coming from JDBC? I see the exact same message text in libpq:
>>> libpq_gettext("server sent data (\"D\" message) without prior row description (\"T\" message)\n"));
>>> Maybe the JDBC driver uses the identical message wording but my thought is to look for something going through libpq.
>> The error is server side. I was just describing the environment.
> I can entirely assure you that that error message is not present in the server code.

Well that's even more interesting because it doesn't exist in the jdbc driver either.

Dave

> regards, tom lane
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> BTW, are you sure this is coming from JDBC? I see the exact same
>>> message text in libpq:
>>> libpq_gettext("server sent data (\"D\" message) without prior row
>>> description (\"T\" message)\n"));
>>> Maybe the JDBC driver uses the identical message wording but my
>>> thought is to look for something going through libpq.
>
>> The error is server side. I was just describing the environment.
>
> I can entirely assure you that that error message is not present in
> the server code.

OK, let me be more clear. The message is being thrown via PostgreSQL.
I am getting it exactly as in the message I posted:

http://projects.commandprompt.com/public/pgsql/browser/trunk/pgsql/src/interfaces/libpq/fe-protocol2.c?rev=22194
http://projects.commandprompt.com/public/pgsql/browser/trunk/pgsql/src/interfaces/libpq/fe-protocol3.c?rev=25989

The message comes from libpq and its protocol code, not from the
backend. When I said server, I was referring to PostgreSQL inclusively,
not the driver that was actually connecting.

Sincerely,

Joshua D. Drake

> regards, tom lane
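For readers unfamiliar with the condition behind this message: in the frontend/backend protocol a result set starts with a row description ('T') message, followed by data row ('D') messages and a command-complete ('C') message, with ready-for-query ('Z') ending the cycle. The client complains when a 'D' arrives with no 'T' in effect. The sketch below illustrates that check for a simplified simple-query flow. It is not the code in fe-protocol2.c or fe-protocol3.c; the function names and the reset-on-'C' rule are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Scans a sequence of message type bytes and returns the index of the
 * first data row ('D') seen without a preceding row description ('T')
 * in the same result set, or -1 if the sequence is consistent.
 * Hypothetical helper for illustration only. */
int first_orphan_data_row(const char *types, size_t n)
{
    int have_row_desc = 0;

    for (size_t i = 0; i < n; i++) {
        switch (types[i]) {
        case 'T':               /* row description: a result set begins */
            have_row_desc = 1;
            break;
        case 'D':               /* data row: only valid after a 'T' */
            if (!have_row_desc)
                return (int)i;
            break;
        case 'C':               /* command complete: result set ends */
        case 'Z':               /* ready for query: cycle ends */
            have_row_desc = 0;
            break;
        default:                /* other message types are ignored here */
            break;
        }
    }
    return -1;
}

/* A few sequences showing when the condition triggers. */
void check_sequences(void)
{
    assert(first_orphan_data_row("TDDCZ", 5) == -1);  /* normal query */
    assert(first_orphan_data_row("DCZ", 3) == 0);     /* 'D' with no 'T' at all */
    assert(first_orphan_data_row("TDCDCZ", 6) == 3);  /* second result set lacks 'T' */
}
```

Seen this way, the error means the client's view of the message stream has gone out of step with what was sent, which fits a problem in the client-side stack at least as well as one in the backend.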
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> BTW, are you sure this is coming from JDBC? I see the exact same
>> message text in libpq:
>> libpq_gettext("server sent data (\"D\" message) without prior row
>> description (\"T\" message)\n"));
>> Maybe the JDBC driver uses the identical message wording but my thought
>> is to look for something going through libpq.

> The error is server side. I was just describing the environment.

I can entirely assure you that that error message is not present in
the server code.

			regards, tom lane
Re: [HACKERS] Win32 hard crash problem
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Dave Cramer and I have dealt with a company today running 8.1.4 on
>> Windows 2003. The application is a web app that runs via JDBC/Hibernate.
>
>> The application will function perfectly for about 2/3 weeks and then we
>> will receive a:
>
>> "server sent data (\"D\" message) without prior row description (\"T\"
>> message)");
>
> That sounds suspiciously close to the time from boot to wraparound of
> GetTickCount:
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/gettickcount.asp
>
> M$ list this as 49 days but that's the time to wrap clear around to
> zero; the value overflows and goes negative in 24.85 days if I've done
> the math correctly. My bet is something depending on GetTickCount to
> measure elapsed time (and no, it's not used in the core Postgres code,
> but you've got plenty of other possible culprits in that stack).
>
> BTW, are you sure this is coming from JDBC? I see the exact same
> message text in libpq:
> libpq_gettext("server sent data (\"D\" message) without prior row
> description (\"T\" message)\n"));
> Maybe the JDBC driver uses the identical message wording but my thought
> is to look for something going through libpq.

The error is server side. I was just describing the environment.

>> Any thoughts?
>
> I suppose "get a real operating system" won't go over well?

Tried that, I got nervous laughter on the other end ;)

Joshua D. Drake

> regards, tom lane
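The arithmetic in Tom Lane's quoted paragraph can be checked directly. GetTickCount returns milliseconds in a 32-bit value, so the counter wraps to zero after 2^32 ms, and a copy held in a signed 32-bit integer goes negative after 2^31 ms. The short sketch below works both figures out in days; the function names are made up for the example.

```c
#include <assert.h>

/* Milliseconds per day: 1000 ms * 60 s * 60 min * 24 h. */
#define MS_PER_DAY 86400000.0

/* Days until a 32-bit millisecond counter, read as unsigned,
 * wraps to zero (2^32 ms). */
double days_to_unsigned_wrap(void)
{
    return 4294967296.0 / MS_PER_DAY;
}

/* Days until the same counter, stored in a signed 32-bit int,
 * overflows and goes negative (2^31 ms). */
double days_to_signed_overflow(void)
{
    return 2147483648.0 / MS_PER_DAY;
}

/* Checks the two figures quoted in the thread. */
void check_thread_figures(void)
{
    /* "49 days" to wrap clear around to zero (about 49.7) */
    assert(days_to_unsigned_wrap() > 49.7 && days_to_unsigned_wrap() < 49.8);
    /* "24.85 days" until the value goes negative */
    assert(days_to_signed_overflow() > 24.85 && days_to_signed_overflow() < 24.86);
}
```

So the 24.85-day figure holds, and it falls somewhat past the reported "2/3 weeks", which is consistent with "suspiciously close" rather than an exact match.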
Re: [HACKERS] Win32 hard crash problem
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> Dave Cramer and I have dealt with a company today running 8.1.4 on
> Windows 2003. The application is a web app that runs via JDBC/Hibernate.

> The application will function perfectly for about 2/3 weeks and then we
> will receive a:

> "server sent data (\"D\" message) without prior row description (\"T\"
> message)");

That sounds suspiciously close to the time from boot to wraparound of
GetTickCount:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/gettickcount.asp

M$ list this as 49 days but that's the time to wrap clear around to
zero; the value overflows and goes negative in 24.85 days if I've done
the math correctly. My bet is something depending on GetTickCount to
measure elapsed time (and no, it's not used in the core Postgres code,
but you've got plenty of other possible culprits in that stack).

BTW, are you sure this is coming from JDBC? I see the exact same
message text in libpq:
libpq_gettext("server sent data (\"D\" message) without prior row
description (\"T\" message)\n"));
Maybe the JDBC driver uses the identical message wording but my thought
is to look for something going through libpq.

> Any thoughts?

I suppose "get a real operating system" won't go over well?

			regards, tom lane