Re: [GENERAL] Questions about connection clean-up and invalid page header

2010-01-25 Thread Scott Marlowe
On Sun, Jan 24, 2010 at 3:17 AM, Herouth Maoz hero...@unicell.co.il wrote:
 Hi Everybody.

 I have two questions.

 1. We have a system that is accessed by Crystal Reports which is in turn
 controlled by another (3rd party) system. Now, when a report takes too long or
 the user cancels it, it doesn't send a cancel request to Postgres. It just
 kills the Crystal process that works on it.

 As a result, the query is left alive on the Postgres backend. Eventually I get
 the message "Unexpected End of file" and the query is cancelled. But this
 doesn't happen soon enough for me - these are usually very heavy queries, and
 I'd like them to be cleaned up as soon as possible if the client connection
 has ended.

The real solution is to fix the application.  But I understand
sometimes you can't do that.

 Is there a parameter to set in the configuration or some other means to
 shorten the time before an abandoned backend's query is cancelled?

You can shorten the tcp_keepalive settings so that dead connections
get detected faster.

 2. I get the following message in my development database:

 vacuumdb: vacuuming of database "reports" failed: ERROR:  invalid page header
 in block 6200 of relation "rb"

 I had this already a couple of months ago. Looking around the web, I saw this
 error is supposed to indicate a hardware error. I informed my sysadmin, but
 since this is just the dev system and the data was not important, I did a
 TRUNCATE TABLE on the rb relation, and the errors stopped...

 But now the error is back, and I'm a bit suspicious. If this is a hardware
 issue, it's rather odd that it returned in the exact same relation
 after I did a truncate table. I have many other relations in the system,
 ones that fill up a lot faster. So I suspect this might be a PostgreSQL issue
 after all. What can I do about this?

Might be, but not very likely.  I and many others run pgsql in
production environments where it handles thousands of updates /
inserts per minute with no corruption.  We run on server class
hardware with ECC memory and large RAID arrays with no corruption.

Have you run something as simple as memtest86+ on your machine to see
if it's got bad memory?

 We are currently using PostgreSQL v. 8.3.1 on the server side.

You should really update to the latest 8.3.x version (around 8.3.8 or
so).  It's simple and easy, and it's possible you've hit a bug in an
older version of 8.3.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
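Scott's suggestion corresponds to the standard TCP keepalive socket options. On the server side, PostgreSQL exposes them as tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count. The Python sketch below illustrates those socket options. It is not PostgreSQL code. The helper names enable_keepalive and worst_case_detection_seconds are hypothetical, and the 60/10/5 values are arbitrary examples.

The sketch also gives a rough estimate of how long detection can take. With common Linux defaults (7200 s idle, 75 s interval, 9 probes), a silent peer can go unnoticed for more than two hours. As Greg Stark points out later in the thread, keepalives only help when the remote machine vanishes without closing the connection.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalive for `sock` and shorten its timers where possible.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    The values are illustrative, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform-specific (these are the
    # Linux names); skip any that this platform's socket module lacks.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


def worst_case_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate time before a peer that went silent is declared dead."""
    return idle + interval * count


if __name__ == "__main__":
    # Common Linux defaults vs. the shortened example values above.
    print(worst_case_detection_seconds(7200, 75, 9))  # 7875 seconds
    print(worst_case_detection_seconds(60, 10, 5))    # 110 seconds
```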


Re: [GENERAL] Questions about connection clean-up and invalid page header

2010-01-25 Thread Herouth Maoz

Scott Marlowe wrote:


 You can shorten the tcp_keepalive settings so that dead connections
 get detected faster.

Thanks, I'll ask my sysadmin to do that.


 Might be, but not very likely.  I and many others run pgsql in
 production environments where it handles thousands of updates /
 inserts per minute with no corruption.  We run on server class
 hardware with ECC memory and large RAID arrays with no corruption.

Someone pointed out to me, though, that comparing data warehouse systems 
to production systems is like apples and oranges - we also have a 
production system that, as you say, makes millions of inserts and 
updates per hour. It works very well with PostgreSQL - a lot better than 
with Sybase with which we worked previously. But the reports system on 
which I work makes bulk inserts using calculations based on complicated 
joins and each transaction is long and memory-consuming, as opposed to 
the production system, where each transaction takes a few milliseconds 
and is cleared immediately.


So far this only happened to me in the development server, and if it 
really is a matter of hardware, I'm not worried. What I am worried about is
whether there really is some sort of bug that may carry over to our production
reports system.

 Have you run something as simple as memtest86+ on your machine to see
 if it's got bad memory?

I'll tell my sysadmin to do that. Thank you.

 We are currently using PostgreSQL v. 8.3.1 on the server side.

 You should really update to the latest 8.3.x version (around 8.3.8 or
 so).  It's simple and easy, and it's possible you've hit a bug in an
 older version of 8.3.

OK, I'll also try to get that done.

Thanks for your help,
Herouth



Re: [GENERAL] Questions about connection clean-up and invalid page header

2010-01-25 Thread Greg Stark
On Mon, Jan 25, 2010 at 8:15 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 Is there a parameter to set in the configuration or some other means to
 shorten the time before an abandoned backend's query is cancelled?

 You can shorten the tcp_keepalive settings so that dead connections
 get detected faster.


This won't help. The TCP connection is already being closed (or I
think only half-closed). The problem is that in the Unix socket API
you don't find out about that unless you check or try to read or write
to it.

The tcp_keepalive setting would only come into play if the remote
machine crashed or was disconnected from the network.



-- 
greg
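Greg's point can be reproduced with a plain TCP socket. The minimal Python sketch below uses only the standard library and is not PostgreSQL's code; peer_has_closed and demo are hypothetical names. The client closes its end, yet nothing changes on the server-side socket. The server finds out only when it explicitly checks, here with select() followed by a non-consuming recv(..., MSG_PEEK).

```python
import select
import socket


def peer_has_closed(sock: socket.socket, timeout: float = 0.0) -> bool:
    """Explicitly check whether the other side has closed the connection.

    select() reports the socket readable once the peer's FIN has arrived;
    recv() with MSG_PEEK then returns b"" without consuming anything.
    Caveat: if unread data is queued ahead of the FIN, this returns False.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False
    try:
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True


def demo() -> tuple:
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    before = peer_has_closed(server_side)   # client still connected
    client.close()                          # client goes away; a FIN is sent
    # server_side does not change by itself; we learn of the close only
    # because we look. Allow up to 1 s for the FIN to arrive.
    after = peer_has_closed(server_side, timeout=1.0)

    server_side.close()
    listener.close()
    return before, after


if __name__ == "__main__":
    print(demo())  # expected: (False, True)
```

According to Greg's explanation, a backend that is busy executing a query makes no such check, which is why the abandoned query keeps running.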



Re: [GENERAL] Questions about connection clean-up and invalid page header

2010-01-25 Thread Herouth Maoz

Greg Stark wrote:


 On Mon, Jan 25, 2010 at 8:15 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 Is there a parameter to set in the configuration or some other means to
 shorten the time before an abandoned backend's query is cancelled?

 You can shorten the tcp_keepalive settings so that dead connections
 get detected faster.

 This won't help. The TCP connection is already being closed (or I
 think only half-closed). The problem is that in the Unix socket API
 you don't find out about that unless you check or try to read or write
 to it.

 The tcp_keepalive setting would only come into play if the remote
 machine crashed or was disconnected from the network.

That's the situation I'm having, so it's OK. Crystal, being a Windows 
application, obviously runs on a different server than the database 
itself, so the connection between them is TCP/IP, not Unix domain 
sockets. And furthermore, that was exactly the problem as I described it: 
the fact that the third party software, instead of somehow instructing 
Crystal to send a cancel request to PostgreSQL, just kills the client 
process on the Windows side.


Herouth


Re: [GENERAL] Questions about connection clean-up and invalid page header

2010-01-25 Thread Greg Stark
On Mon, Jan 25, 2010 at 11:37 AM, Herouth Maoz hero...@unicell.co.il wrote:
 The tcp_keepalive setting would only come into play if the remote
 machine crashed or was disconnected from the network.


 That's the situation I'm having, so it's OK. Crystal, being a Windows
 application, obviously runs on a different server than the database itself,
 so the connection between them is TCP/IP, not Unix domain sockets.

The unix socket api is used for both unix domain sockets and internet
domain sockets. The point is that in the api there's no way to find
out about a connection the other side has closed except for when you
write or read from it or when you explicitly check.


 And furthermore, that was exactly the problem as I described it - the fact
 that the third party software, instead of somehow instructing Crystal to
 send a cancel request to PostgreSQL, just kills the client process on the
 Windows side.

Killing the client process doesn't mean the machine has crashed or
been disconnected from the network. I'm assuming Crystal isn't
crashing the machine just to stop the report... And even if it did, and
tcp_keepalives kicked in, the server *still* wouldn't notice until it
checked or tried to read or write to that socket.

-- 
greg
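The sketch below illustrates the difference Greg draws between a killed client process and a crashed or disconnected machine. It assumes a Unix-like OS and Python 3.8 or later; kill_client_and_read and CHILD are hypothetical names. It starts a child process that connects and idles, then kills that process.

The host stays up, so the kernel closes the dead process's socket and sends a normal FIN, and keepalives never come into play. The server side still learns of the close only when it reads, and that read returns b"" (end of file).

```python
import socket
import subprocess
import sys

# Child program: connect to the given port, then idle as if running a report.
CHILD = (
    "import socket, sys, time\n"
    "s = socket.create_connection(('127.0.0.1', int(sys.argv[1])))\n"
    "time.sleep(60)\n"
)


def kill_client_and_read() -> bytes:
    listener = socket.create_server(("127.0.0.1", 0))
    listener.settimeout(10.0)  # don't hang forever if the child fails to connect
    port = listener.getsockname()[1]
    child = subprocess.Popen([sys.executable, "-c", CHILD, str(port)])
    conn, _ = listener.accept()

    child.kill()   # like the third-party tool killing the client process
    child.wait()

    # The kernel closed the child's socket on its behalf and sent a FIN.
    # Nothing happens on this side until we actually read from the socket.
    conn.settimeout(5.0)
    data = conn.recv(1)  # b"" means orderly end of file from the peer
    conn.close()
    listener.close()
    return data


if __name__ == "__main__":
    print(kill_client_and_read())  # expected: b''
```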



Re: [GENERAL] Questions about connection clean-up and invalid page header

2010-01-25 Thread Herouth Maoz

Greg Stark wrote:


 On Mon, Jan 25, 2010 at 11:37 AM, Herouth Maoz hero...@unicell.co.il wrote:
 The tcp_keepalive setting would only come into play if the remote
 machine crashed or was disconnected from the network.

 That's the situation I'm having, so it's OK. Crystal, being a Windows
 application, obviously runs on a different server than the database itself,
 so the connection between them is TCP/IP, not Unix domain sockets.

 The unix socket api is used for both unix domain sockets and internet
 domain sockets. The point is that in the api there's no way to find
 out about a connection the other side has closed except for when you
 write or read from it or when you explicitly check.

 And furthermore, that was exactly the problem as I described it - the fact
 that the third party software, instead of somehow instructing Crystal to
 send a cancel request to PostgreSQL, just kills the client process on the
 Windows side.

 Killing the client process doesn't mean the machine has crashed or
 been disconnected from the network. I'm assuming Crystal isn't
 crashing the machine just to stop the report... And even if it did, and
 tcp_keepalives kicked in, the server *still* wouldn't notice until it
 checked or tried to read or write to that socket.

Well, I assume from the fact that eventually I get an "Unexpected end of 
file" message for those queries that something does go in and check 
them. Do you have any suggestion as to how to cause the PostgreSQL 
server to do so earlier?


Herouth


Re: [GENERAL] Questions about connection clean-up and invalid page header

2010-01-25 Thread Greg Stark
On Mon, Jan 25, 2010 at 1:16 PM, Herouth Maoz hero...@unicell.co.il wrote:
 Well, I assume by the fact that eventually I get an Unexpected end of file
 message for those queries, that something does go in and check them. Do you
 have any suggestion as to how to cause the postgresql server to do so
 earlier?

No, Postgres pretty intentionally doesn't check because checking would
be quite slow.

If this is a plpgsql function looping, you can put a RAISE NOTICE in
the loop periodically. I suppose you could write such a function and
add it to your query, but whether it does what you want will depend on
the query plan.

-- 
greg
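Greg's RAISE NOTICE suggestion relies on the backend writing to the client socket, because a write is one of the points where a closed connection surfaces as an error. The Python sketch below mimics that pattern outside PostgreSQL. long_running_job is a hypothetical stand-in for a heavy query: it sends a small progress message every notice_every steps and stops when a send fails.

It uses socketpair(), which gives a Unix-domain pair on Unix-like systems. There, a send to a closed peer fails at once with EPIPE. Over TCP, the first write after the peer has gone may still appear to succeed, and the error only shows up on a later write. This sketch does not show whether a failed notice actually cancels the query in a given PostgreSQL version. As Greg says, whether the function is even called often enough depends on the query plan.

```python
import socket


def long_running_job(conn: socket.socket, total_steps: int = 1_000_000,
                     notice_every: int = 10_000):
    """Busy loop that periodically writes a progress message to `conn`.

    Returns the step at which a dead client was noticed, or None if the
    job ran to completion with the client still connected.
    """
    acc = 0
    for step in range(total_steps):
        acc += step * step  # placeholder for real work
        if step % notice_every == 0:
            try:
                conn.sendall(b"NOTICE: step %d\n" % step)
            except (BrokenPipeError, ConnectionResetError):
                return step  # client is gone: stop burning CPU
    return None


if __name__ == "__main__":
    server_end, client_end = socket.socketpair()
    client_end.close()  # the client disappears before the job starts
    print(long_running_job(server_end))
    server_end.close()
```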
