Re: [GENERAL] psql query gets stuck indefinitely

2011-12-06 Thread tamanna madaan
Hi All

Please help me.

Thanks...
Tamanna

-- 
Tamanna Madaan | Associate Consultant | GlobalLogic Inc.
Leaders in Software RD Services
ARGENTINA | CHILE | CHINA | GERMANY | INDIA | ISRAEL | UKRAINE | UK | USA

Office: +0-120-406-2000 x 2971

www.globallogic.com


Re: [GENERAL] psql query gets stuck indefinitely

2011-12-04 Thread tamanna madaan
Hi Tomas

I tried it on a system running postgres-8.4.0, and the behavior is the same.

A cluster here means a group of machines that all have postgres installed.
The same database is created on all of them: one machine works as the
master DB, on which operations (insert/delete/update) are performed, and
the others work as slave DBs, to which Slony replicates the data from the
master. My cluster setup has only two machines, A and B: one holds the
master DB and the other is the slave. I execute the query below from
system A against system B:

  psql -U <db name> -h <host ip of B> -c "select sleep(300);"

This query can be seen running on system B in the `ps -eaf | grep postgres`
output.

Now, while this query is running, I execute the command below on system A,
which blocks every incoming packet on eth0:

  iptables -I INPUT -i eth0 -j DROP

After 5 minutes (the sleep period), the query finishes on system B, but it
can still be seen running on system A, presumably because the message
saying that the query has finished never reaches system A.

Still, I would expect the psql query on system A to return as well after
(tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl) seconds.
But it doesn't return until it is killed manually.

What could be the reason for that?
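While the cause is being tracked down, a script can at least protect itself by putting a hard deadline on psql. A minimal sketch, assuming GNU coreutils `timeout` is available (it is a separate tool, not part of PostgreSQL); the wrapper name `run_with_deadline` is hypothetical. `timeout` terminates the command after the given number of seconds and then exits with status 124; otherwise it passes through the command's own exit status.

```sh
#!/bin/sh
# Hypothetical helper: run a command, killing it if it exceeds a deadline.
# Relies on GNU coreutils `timeout`, which exits with status 124 when the
# deadline expires and otherwise returns the command's own exit status.
# Note: plain POSIX sh has no `local`, so deadline/status are global here.
run_with_deadline() {
    deadline=$1
    shift
    timeout "$deadline" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "command timed out after ${deadline}s: $*" >&2
    fi
    return "$status"
}

# Example usage (angle-bracket values are placeholders):
# run_with_deadline 600 psql -U <db name> -h <host ip of B> -c "select sleep(300);"
```

This does not explain the hang; it only guarantees that a monitoring script gets control back and can report a timeout instead of blocking forever.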


Well, I learnt the following from the postgres release notes:

==

postgres 8.1, server-side changes:

  Add configuration parameters to control TCP/IP keep-alive times for
  idle, interval, and count (Oliver Jowett)

  These values can be changed to allow more rapid detection of lost
  client connections.

postgres 9.0, E.8.3.9. Development Tools, E.8.3.9.1. libpq:

  Add TCP keepalive settings in libpq (Tollef Fog Heen, Fujii Masao,
  Robert Haas)

  Keepalive settings were already supported on the server end of TCP
  connections.

==


Does this mean that the TCP keepalive settings provided from postgres 8.1
onwards only help the server detect lost client connections, and won't
help in the case above, where it is psql (the client) that needs to
return? And would the libpq TCP keepalive settings provided from postgres
9.0 onwards work for this case?
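On 9.0 and later, the libpq keepalive settings from that release note are ordinary connection parameters (`keepalives`, `keepalives_idle`, `keepalives_interval`, `keepalives_count`), and psql from 9.0 on should accept a full connection string in place of the database name (check `man psql` for your version). A minimal sketch, assuming a 9.0+ client; the helper name `keepalive_conninfo` and the timing values are hypothetical and only illustrative:

```sh
#!/bin/sh
# Hypothetical helper: build a libpq connection string that enables
# client-side TCP keepalives (libpq/psql 9.0 or later).
# Arguments: host, dbname, user, idle seconds, probe interval seconds, probe count.
keepalive_conninfo() {
    printf 'host=%s dbname=%s user=%s keepalives=1 keepalives_idle=%s keepalives_interval=%s keepalives_count=%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example usage (host, database and user are placeholders):
# psql "$(keepalive_conninfo <host ip of B> <dbname> <user> 60 10 5)" -c "select 1;"
```

With these example values a dead server should be noticed after roughly 60 + 10 * 5 = 110 seconds of silence on Linux, rather than the 7200 + 9 * 75 = 7875 seconds the system-level defaults listed below would give.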


The kernel version on my system is 2.6.27.7-9-default and postgres is
8.4.0. The keepalive settings are as follows.

postgresql.conf:

  #tcp_keepalives_idle = 0        # TCP_KEEPIDLE, in seconds;
                                  # 0 selects the system default
  #tcp_keepalives_interval = 0    # TCP_KEEPINTVL, in seconds;
                                  # 0 selects the system default
  #tcp_keepalives_count = 0       # TCP_KEEPCNT;
                                  # 0 selects the system default

System-level settings:

  net.ipv4.tcp_keepalive_time = 7200
  net.ipv4.tcp_keepalive_probes = 9
  net.ipv4.tcp_keepalive_intvl = 75
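With these system-level values, a socket that actually has keepalive enabled should notice a dead peer after tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl = 7200 + 9 * 75 = 7875 seconds, i.e. 2 h 11 min 15 s. The sketch below does that arithmetic and can read the live values from Linux's /proc/sys; the function names are hypothetical, and the result only applies to sockets on which the program itself enabled SO_KEEPALIVE.

```sh
#!/bin/sh
# Hypothetical helper: seconds until the kernel drops a dead peer on an idle,
# keepalive-enabled socket: idle time + probe count * probe interval.
# Arguments: tcp_keepalive_time, tcp_keepalive_probes, tcp_keepalive_intvl.
keepalive_deadline() {
    echo $(( $1 + $2 * $3 ))
}

# Same calculation using the current Linux system-wide defaults.
# These defaults only matter for sockets that set SO_KEEPALIVE.
system_keepalive_deadline() {
    d=/proc/sys/net/ipv4
    keepalive_deadline "$(cat "$d/tcp_keepalive_time")" \
                       "$(cat "$d/tcp_keepalive_probes")" \
                       "$(cat "$d/tcp_keepalive_intvl")"
}
```

For example, `keepalive_deadline 7200 9 75` prints 7875.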

  Regards

Tamanna





Re: [GENERAL] psql query gets stuck indefinitely

2011-12-01 Thread tamanna madaan
Hi Craig
I am able to reproduce the issue now. I have postgres-8.1.2 installed in a
cluster setup.

I started the query below from one system, say A, against system B in the
cluster:

  psql -U <dbname> -h <ip of system B> -c "select sleep(300);"

While this command was running, system B was stopped abruptly by pulling
out its power cable. This caused the query on system A to hang; it was
still showing in the 'ps -eaf' output a day later. I think the TCP
keepalive mechanism configured at the system level should have closed this
connection, but it didn't. The following keepalive values are set on
system A:

  net.ipv4.tcp_keepalive_intvl = 75
  net.ipv4.tcp_keepalive_probes = 9
  net.ipv4.tcp_keepalive_time = 7200

Why is system-level keepalive not working in this case? I learnt from the
link you provided that programs must request keepalive control for their
sockets using the setsockopt interface. Does postgres 8.1.2 request
system-level keepalive control? If not, which release/version of postgres
supports it?
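Whether a particular psql connection requested keepalive can be checked from outside the process. On Linux, `ss -o` shows timer information, and for a connection whose keepalive timer is running the field looks like `timer:(keepalive,...)`. A rough sketch, assuming iproute2's `ss` is installed and the server uses the default port 5432; the filter function `keepalive_report` is hypothetical:

```sh
#!/bin/sh
# Hypothetical filter: read `ss -o` output on stdin and, for each line that
# mentions the given port, report whether a keepalive timer is shown.
# The port match is rough (":5432" would also match ":54321").
keepalive_report() {
    port=${1:-5432}
    while IFS= read -r line; do
        case $line in
            *":$port"*) ;;        # a connection involving the PostgreSQL port
            *) continue ;;        # header or unrelated line
        esac
        case $line in
            *"timer:(keepalive"*) echo "keepalive on:  $line" ;;
            *)                    echo "keepalive off: $line" ;;
        esac
    done
}

# Example usage on the client (-p needs root to show other users' processes):
# ss -tnop state established '( dport = :5432 )' | keepalive_report 5432
```

A connection with unacknowledged data in flight shows a retransmission timer instead, so "off" is only a hint; the interesting moment to check is while psql sits idle waiting for the query result.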

Thanks...
Tamanna




Re: [GENERAL] psql query gets stuck indefinitely

2011-12-01 Thread Tomas Vondra
Well, the first thing you should do is upgrade, at least to the latest
8.1 minor version, which is 8.1.22. It may very well be an already-fixed
bug (I haven't checked). BTW, the 8.1 branch has not been supported for a
long time, so upgrade to a more recent version if possible.

Second, what OS are you using, and what version? Keep-alive needs support
at the OS level, and if the OS is upgraded as frequently as the database
(i.e. not at all), this might already be fixed.

And finally - what do you mean by 'cluster setup'?

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] psql query gets stuck indefinitely

2011-11-29 Thread tamanna madaan
Well, one question: is TCP keepalive enabled by default in postgres-8.1.2?

I am using postgres on a Linux platform.





[GENERAL] psql query gets stuck indefinitely

2011-11-28 Thread tamanna madaan
Hi All

I have postgres installed in a cluster setup. My system has a script which
executes the query below on a remote system in the cluster:

  psql -t -q -U slon -h <hostip> -d <dbname> -c "select 1;"

But somehow this query got stuck. It didn't return even after the remote
system (on which the query was supposed to execute) was rebooted. What
could be the reason?


Thanks...


Re: [GENERAL] psql query gets stuck indefinitely

2011-11-28 Thread Craig Ringer

The issue will most likely be related to the network or to the
client-side host. Perhaps the client machine changed IP addresses (maybe
as part of a switch from WiFi to wired or similar)?


Check the man page for psql in 9.1; I think client-side keepalive
support got committed for 9.1. If it didn't, you can always set it
globally for all TCP/IP connections on your system. See e.g.
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html .
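On Linux, setting it globally means changing the kernel's keepalive defaults with sysctl, as the HOWTO describes. As the HOWTO also notes, these values are only defaults for sockets on which the program has enabled keepalive via setsockopt; they do not switch keepalive on for sockets that never requested it. A hedged sketch that prints persistent settings for /etc/sysctl.conf; the helper name `keepalive_sysctl_conf` and the example values are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: emit sysctl.conf lines for the Linux keepalive defaults.
# Arguments: idle time (s), probe interval (s), probe count.
keepalive_sysctl_conf() {
    printf 'net.ipv4.tcp_keepalive_time = %s\n' "$1"
    printf 'net.ipv4.tcp_keepalive_intvl = %s\n' "$2"
    printf 'net.ipv4.tcp_keepalive_probes = %s\n' "$3"
}

# Example usage (as root): persist the values, then reload them.
# keepalive_sysctl_conf 600 60 5 >> /etc/sysctl.conf && sysctl -p
# Or change only the running kernel, until the next reboot:
# sysctl -w net.ipv4.tcp_keepalive_time=600
```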


--
Craig Ringer


Re: [GENERAL] psql query gets stuck indefinitely

2011-11-28 Thread Craig Ringer

I realised just after sending my last message:

You should use ps to find out what exactly psql is doing and which
system call it's blocked in within the kernel (if it's waiting on a
syscall). As you didn't mention your OS I'll assume you're on Linux,
where you'd use:


  ps -C psql -o wchan:80=

or

  ps -p 1234 -o wchan:80=

... where 1234 is the pid of the stuck psql process. In a psql waiting 
for command line input I see it blocked in the kernel routine 
n_tty_read for example.



If you really want to know what it's doing you can also attach gdb and
get a backtrace to see what code it's paused in inside psql:

gdb -q -p 1234 <<__END__
bt
q
__END__

If you get a message about missing debuginfos, lots of lines reading
"no debugging symbols found", or lots of lines ending in "?? ()", then you
need to install debug symbols. How to do that depends on your OS/distro,
so I won't go into it; it's documented on the PostgreSQL wiki under how
to get a stack trace, but you probably won't want to bother if this is
just for curiosity's sake.


You're looking for output that looks like:

#1  0x00369d22a131 in rl_getc () from /lib64/libreadline.so.6
#2  0x00369d22a8e9 in rl_read_key () from /lib64/libreadline.so.6
#3  0x00369d215b11 in readline_internal_char () from /lib64/libreadline.so.6
#4  0x00369d216065 in readline () from /lib64/libreadline.so.6

... etc ...


--
Craig Ringer



Re: [GENERAL] psql query gets stuck indefinitely

2011-11-28 Thread tamanna madaan
Hi Craig

Thanks for your reply. Unfortunately I don't have that process running
right now; I have already killed it. But I have occasionally seen this
problem on my setup. It generally happens when the remote system is
running slowly for some reason (high CPU utilization etc.). Whatever the
reason, I would expect the query to return with some error if the system
it is running on is rebooted, but it doesn't return and stays stuck.
Moreover, the same query sometimes hangs even when it is run against the
local postgres database, so I don't think network issues play any role in
this. Please help.

Thanks

Regards
Tamanna




Re: [GENERAL] psql query gets stuck indefinitely

2011-11-28 Thread Craig Ringer
Well, it *really* shouldn't hang locally.

To help you further I'll need you to collect the information on the
stuck process next time you encounter one and post that as a reply.
Maybe with a bit more info we can see what might be going on.
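For the next occurrence, something like the following could gather that information for each stuck psql in one go: the kernel wait channel (as with the ps command suggested earlier), the process's TCP sockets with their timers, and a gdb backtrace. It is only a sketch, assuming Linux with procps `ps`, iproute2 `ss` and gdb installed, and it usually needs root to attach gdb; the function name `collect_psql_diag` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: dump diagnostics for the given psql PIDs.
# Usage: collect_psql_diag PID [PID...]
#   e.g. collect_psql_diag $(pgrep -x psql) > psql-diag.txt 2>&1
collect_psql_diag() {
    if [ "$#" -eq 0 ]; then
        echo "usage: collect_psql_diag PID [PID...]" >&2
        return 1
    fi
    for pid in "$@"; do
        echo "=== psql pid $pid ==="
        # Kernel routine the process is sleeping in (wait channel).
        ps -p "$pid" -o pid=,stat=,wchan:32=,args=
        # TCP sockets owned by this process, with timers (keepalive, retransmit).
        # The users:(...) format differs between iproute2 versions; this grep
        # assumes the newer "pid=NNN," form.
        ss -tnop | grep "pid=$pid,"
        # Non-interactive backtrace, equivalent to the interactive gdb session above.
        gdb -q -batch -ex bt -p "$pid"
    done
}
```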

--
Craig Ringer