Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-09 Thread Dmitri Girski
Thanks for advice, Dave!

This saga ended in an unexpected way: the firewall died.
Since the replacement firewall installed I have not seen any 3 seconds
connects. Well, there was no real load so far, but I will keep checking.


Thanks to everyone replied, it was very helpful.

Cheers,
Dmitri.




On Fri, Jan 8, 2010 at 7:13 AM, Dave Crooke  wrote:

> Oops, I meant to mention this too  virtually all GigE and/or server
> class NICs do TCP checksum offload.
>
> Dimitri - it's unlikely that you have a hardware issue on the NIC, it's
> more likely to be a cable problem or network congestion. What you want to
> look for in the tcpdump capture is things like SYN retries.
>
> A good way to test for cable issues is to use a ping flood with a large
> packet size.
>
> Cheers
> Dave
>
>  Hang on a sec. You need to ignore bad checksums on *outbound* packets,
>> because many (most?) Ethernet drivers implement some level of TCP
>> offloading, and this will result in packet sniffers seeing invalid checksums
>> for transmitted packets - the checksums haven't been generated by the NIC
>> yet.
>>
>> Unless you know for sure that your NIC doesn't do TSO, ignore bad
>> checksums on outbound packets from the local interface.
>>
>> --
>> Craig Ringer
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
>> )
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>
>


-- 
@Gmail


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-07 Thread Dave Crooke
Oops, I meant to mention this too  virtually all GigE and/or server
class NICs do TCP checksum offload.

Dimitri - it's unlikely that you have a hardware issue on the NIC, it's more
likely to be a cable problem or network congestion. What you want to look
for in the tcpdump capture is things like SYN retries.

A good way to test for cable issues is to use a ping flood with a large
packet size.

Cheers
Dave

Hang on a sec. You need to ignore bad checksums on *outbound* packets,
> because many (most?) Ethernet drivers implement some level of TCP
> offloading, and this will result in packet sniffers seeing invalid checksums
> for transmitted packets - the checksums haven't been generated by the NIC
> yet.
>
> Unless you know for sure that your NIC doesn't do TSO, ignore bad checksums
> on outbound packets from the local interface.
>
> --
> Craig Ringer
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-06 Thread Craig Ringer

On 7/01/2010 10:44 AM, Dmitri Girski wrote:

Hi everybody,

Many thanks to everyone replied, I think we are on the right way.
I've used tcpdump to generate the logs and there are a lot of dropped
packets due to the bad checksum. Network guy is currently looking at the
problem and most likely this is hardware issue.


Hang on a sec. You need to ignore bad checksums on *outbound* packets, 
because many (most?) Ethernet drivers implement some level of TCP 
offloading, and this will result in packet sniffers seeing invalid 
checksums for transmitted packets - the checksums haven't been generated 
by the NIC yet.


Unless you know for sure that your NIC doesn't do TSO, ignore bad 
checksums on outbound packets from the local interface.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-06 Thread Scott Marlowe
On Wed, Jan 6, 2010 at 7:44 PM, Dmitri Girski  wrote:
> Hi everybody,
> Many thanks to everyone replied, I think we are on the right way.
> I've used tcpdump to generate the logs and there are a lot of dropped
> packets due to the bad checksum. Network guy is currently looking at the
> problem and most likely this is hardware issue.

95% of these problems are a bad NIC or a bad cable. Since cables are
easy to change I'd try those first, then NICs.  Since lots of servers
have dual nics that's a pretty easy change too.

 Every now and then it's a bad switch / router.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-06 Thread Dmitri Girski
Hi everybody,

Many thanks to everyone replied, I think we are on the right way.
I've used tcpdump to generate the logs and there are a lot of dropped
packets due to the bad checksum. Network guy is currently looking at the
problem and most likely this is hardware issue.

Cheers,
Dmitri.

On Tue, Jan 5, 2010 at 1:12 PM, Dmitri Girski  wrote:

> Hi everybody,
>
> I am running a PostgreSQL server 8.3.5 with a pretty much standard config.
>
> The web application server which runs Apache 1.3/PHP2.9  has an
> intermittent problem:
> pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
> The long request happens at approximate rate 1:100.
>
> I turned on logs on postgres server side, and there is
> nothing suspicious for me there. When a connection request comes, it is
> being served without any delay.
>
> Could anyone point me to the direction in which I should investigate this
> problem further?
> Thank you in advance!
>
>
> PS The hardware is: Dell SC1435/4Gb/2x2.0GHz/Gentoo Linux.
> The database & web servers are in the 2 local subnets.
>
>
> Dmitri.
>
>


-- 
@Gmail


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-06 Thread Craig James

Dave Crooke wrote:
The fact that the delays are clustered at (3 + 0.2 n) seconds, rather 
than a distributed range, strongly indicates a timeout and not 
(directly) a resource issue.


3 seconds is too fast for a timeout on almost any DNS operation, unless 
it has been modified, so I'd suspect it's the TCP layer, e.g. perhaps 
the SYN packet goes awol and it has to retry.


I'd second the vote for investigation with a packet sniffing tool 
(Wireshark, tcpdump, etc)


If you have a PC (Windows), pingplotter is a remarkable and simple tool to use 
that quickly identifies problems, and gives results that are convincing when 
you show them to your network admin.  Wireshark and tcpdump have a pretty steep 
learning curve and are overkill if your problem is simple.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-06 Thread Dave Crooke
The fact that the delays are clustered at (3 + 0.2 n) seconds, rather than a
distributed range, strongly indicates a timeout and not (directly) a
resource issue.

3 seconds is too fast for a timeout on almost any DNS operation, unless it
has been modified, so I'd suspect it's the TCP layer, e.g. perhaps the SYN
packet goes awol and it has to retry.

I'd second the vote for investigation with a packet sniffing tool
(Wireshark, tcpdump, etc)

Cheers
Dave

On Mon, Jan 4, 2010 at 8:12 PM, Dmitri Girski  wrote:

> Hi everybody,
>
> I am running a PostgreSQL server 8.3.5 with a pretty much standard config.
>
> The web application server which runs Apache 1.3/PHP2.9  has an
> intermittent problem:
> pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
> The long request happens at approximate rate 1:100.
>
> I turned on logs on postgres server side, and there is
> nothing suspicious for me there. When a connection request comes, it is
> being served without any delay.
>
> Could anyone point me to the direction in which I should investigate this
> problem further?
> Thank you in advance!
>
>
> PS The hardware is: Dell SC1435/4Gb/2x2.0GHz/Gentoo Linux.
> The database & web servers are in the 2 local subnets.
>
>
> Dmitri.
>
>


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-06 Thread Robert Haas
On Tue, Jan 5, 2010 at 11:50 PM, Craig Ringer
 wrote:
> Wireshark is your friend.

+1.  I think if you put a packet sniffer on the interface you are
connecting from it will become clear what the problem is in short
order.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-06 Thread Matthew Wakeling

On Wed, 6 Jan 2010, Dmitri Girski wrote:

On the other hand, if I use ip addresses this should not attract any possible 
issues with
DNS, right?


Not true. It is likely that the server program you are connecting to will 
perform a reverse DNS lookup to work out who the client is, for logging or 
authentication purposes.


Matthew

--
"To err is human; to really louse things up requires root
privileges." -- Alexander Pope, slightly paraphrased

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Craig Ringer
Dmitri Girski wrote:
> Hi Andy,
> 
> I tried  2 connections strings:
> - server name (DB1),  which is listed in all machines hosts files.
> - ip address.
> 
> There is no difference in both methods, still I have 5-7 pg_connects which
> last around 3 seconds.

Don't rule out reverse DNS issues (such as a negative cache entry
expiring and being re-checked), packet loss, nss issues on the server
related to NIS, LDAP etc user directories, and so on.

Wireshark is your friend.

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Scott Marlowe
On Tue, Jan 5, 2010 at 8:49 PM, Dmitri Girski  wrote:
> Hi Tom,
> The timing is around 3.0 seconds
> Time=3.0037
> Time=3.4038
> Time=3.0038
> Time=3.004
> Time=3.2037
> Time=3.0039
> Time=3.0034
> Time=3.0034
> Time=3.2039
> Time=3.0044
> Time=3.8044
> Time=3.2034
>
> I don't think that it could relate to DNS problem as I tried 2 methods which
> does not use name resolution ( hosts file & ip address)
> I will definitely seek the help from network geeks and I will check all TCP
> stack settings.

Have you looked at the various logs inn /var/log on both machines?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Dmitri Girski
Hi Greg,

Thank you for idea, reading about checkpints & tuning was very useful.

I had a checkpoints logging turned on. I studied a couple of days logs and I
there is no clear dependency on checkpoint write. Sometimes it is within a
vicinity of 3 seconds CONNECT, sometimes well off it.

Also the postgres log file does not show any long operations, which inclines
me to think that this is a network connectivity/apache/php issue rather than
postgres.


Here is the excerpts from logs:

Log file from WWW server
===
[06-01-10 14:58:16] UserId=15  Time=3.0032 Req=DB CONNECT
===

Log file from DB server:
===
[2010-01-06 14:58:13 EST] idleLOG:  0: disconnection: session time:
0:00:00.027 user=pri_user database=data host=192.168.1.10 port=50087
[2010-01-06 14:58:13 EST] idleLOCATION:  log_disconnections, postgres.c:3982

[2010-01-06 14:58:18 EST] /usr/lib64/postgresql-8.3/bin/postgresLOG:  0:
connection received: host=192.168.1.10 port=52425
[2010-01-06 14:58:18 EST] /usr/lib64/postgresql-8.3/bin/postgresLOCATION:
 BackendInitialize, postmaster.c:3027
[2010-01-06 14:58:18 EST] authenticationLOG:  0: connection authorized:
user=pri_user database=data
[2010-01-06 14:58:18 EST] authenticationLOCATION:  BackendInitialize,
postmaster.c:3097

[2010-01-06 14:58:18 EST] idleLOG:  0: statement: SELECT
"fIsLoggedIn"(15)
[2010-01-06 14:58:18 EST] idleLOCATION:  exec_simple_query, postgres.c:845
[2010-01-06 14:58:18 EST] SELECTLOG:  0: duration: 39.233 ms
[2010-01-06 14:58:18 EST] SELECTLOCATION:  exec_simple_query,
postgres.c:1056
[2010-01-06 14:58:18 EST] idleLOG:  0: statement: START TRANSACTION
[2010-01-06 14:58:18 EST] idleLOCATION:  exec_simple_query, postgres.c:845
[2010-01-06 14:58:18 EST] START TRANSACTIONLOG:  0: duration: 0.050 ms
===

Cheers,
Dmitri.


On Tue, Jan 5, 2010 at 2:01 PM, Greg Smith  wrote:

>  Dmitri Girski wrote:
>
> I am running a PostgreSQL server 8.3.5 with a pretty much standard config.
>
>  The web application server which runs Apache 1.3/PHP2.9  has an
> intermittent problem:
> pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
> The long request happens at approximate rate 1:100.
>
>
> First thing to check for intermittent multi-second delays is whether a
> checkpoint is happening at that time.  See
> http://wiki.postgresql.org/wiki/Logging_Checkpoints for an intro, you'd
> want to see if the checkpoints are around the same time as the delays each
> time.  The default configuration makes checkpoints happen all the time if
> there's any significant write traffic on your database.
>
> --
> Greg Smith2ndQuadrant   Baltimore, MD
> PostgreSQL Training, Services and supportg...@2ndquadrant.com  
> www.2ndQuadrant.com
>
>


-- 
@Gmail


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Dmitri Girski
Hi Tom,

The timing is around 3.0 seconds
Time=3.0037
Time=3.4038
Time=3.0038
Time=3.004
Time=3.2037
Time=3.0039
Time=3.0034
Time=3.0034
Time=3.2039
Time=3.0044
Time=3.8044
Time=3.2034

I don't think that it could relate to DNS problem as I tried 2 methods which
does not use name resolution ( hosts file & ip address)
I will definitely seek the help from network geeks and I will check all TCP
stack settings.

Thank you!

Cheers,
Dmitri.


On Tue, Jan 5, 2010 at 4:14 PM, Tom Lane  wrote:

> Dmitri Girski  writes:
> > I am running a PostgreSQL server 8.3.5 with a pretty much standard
> config.
>
> > The web application server which runs Apache 1.3/PHP2.9  has an
> intermittent
> > problem:
> > pg_connect takes exactly 3.0 seconds. The usual connection time is
> 0.0045.
> > The long request happens at approximate rate 1:100.
>
> Sounds a lot like a dropped-packets problem.  The exact timing would be
> explained if that is the retransmit timeout in your client-side TCP
> stack.  If that's what it is, you need some network engineers, not us
> database geeks ...
>
>regards, tom lane
>



-- 
@Gmail


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Dmitri Girski
Thank you for reply , Andy!

I tried both cases: server name which is listed in hosts file and ip address
( 192.168.2.2) - no difference so far.

Cheers,
Dmitri.

On Wed, Jan 6, 2010 at 2:03 AM, Andy Colson  wrote:

> On 1/4/2010 8:12 PM, Dmitri Girski wrote:
>
>> Hi everybody,
>>
>> I am running a PostgreSQL server 8.3.5 with a pretty much standard config.
>>
>> The web application server which runs Apache 1.3/PHP2.9  has an
>> intermittent problem:
>> pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
>> The long request happens at approximate rate 1:100.
>>
>> I turned on logs on postgres server side, and there is
>> nothing suspicious for me there. When a connection request comes, it is
>> being served without any delay.
>>
>> Could anyone point me to the direction in which I should investigate
>> this problem further?
>> Thank you in advance!
>>
>>
>> PS The hardware is: Dell SC1435/4Gb/2x2.0GHz/Gentoo Linux.
>> The database & web servers are in the 2 local subnets.
>>
>>
>> Dmitri.
>>
>>
> How do you have the connect string?  With an IP or a name?  Maybe its a DNS
> lookup timeout?  You could switch to IP or drop the name in the hosts file
> and see if that makes a difference.
>
> -Andy
>



-- 
@Gmail


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Dmitri Girski
Hi Scott,

Thank you pointers, I've spoken to the network guy, he will help to monitor
connections on the firewall.
On the other hand, if I use ip addresses this should not attract any
possible issues with DNS, right?

Thanks!

Dmitri.

On Wed, Jan 6, 2010 at 9:32 AM, Scott Carey  wrote:

> Delays that are almost exactly 3 seconds over a network are almost always
> some sort of network configuration issue.
>
> Inside a datacenter, mis-configured load balancers or routers can cause low
> level network issues that result in intermittent network delays of exactly
> 3
> seconds (a loop in a routing network?).
> DNS timeouts are often 3 seconds.
>
> Not sure if any of the above is it, but this sounds like a network
> configuration problem to me.
>
> On 1/4/10 6:12 PM, "Dmitri Girski"  wrote:
>
> > Hi everybody,
> >
> > I am running a PostgreSQL server 8.3.5 with a pretty much standard
> config.
> >
> > The web application server which runs Apache 1.3/PHP2.9  has an
> intermittent
> > problem:
> > pg_connect takes exactly 3.0 seconds. The usual connection time is
> 0.0045.
> > The long request happens at approximate rate 1:100.
> >
> > I turned on logs on postgres server side, and there is
> nothing suspicious for
> > me there. When a connection request comes, it is being served without any
> > delay.
> >
> > Could anyone point me to the direction in which I should investigate this
> > problem further?
> > Thank you in advance!
> >
> >
> > PS The hardware is: Dell SC1435/4Gb/2x2.0GHz/Gentoo Linux.
> > The database & web servers are in the 2 local subnets.
> >
> >
> > Dmitri.
> >
> >
>
>


-- 
@Gmail


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Dmitri Girski
Hi Andy,

I tried  2 connections strings:
- server name (DB1),  which is listed in all machines hosts files.
- ip address.

There is no difference in both methods, still I have 5-7 pg_connects which
last around 3 seconds.

Cheers,
Dmitri.

On Wed, Jan 6, 2010 at 2:03 AM, Andy Colson  wrote:

> On 1/4/2010 8:12 PM, Dmitri Girski wrote:
>
>> Hi everybody,
>>
>> I am running a PostgreSQL server 8.3.5 with a pretty much standard config.
>>
>> The web application server which runs Apache 1.3/PHP2.9  has an
>> intermittent problem:
>> pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
>> The long request happens at approximate rate 1:100.
>>
>> I turned on logs on postgres server side, and there is
>> nothing suspicious for me there. When a connection request comes, it is
>> being served without any delay.
>>
>> Could anyone point me to the direction in which I should investigate
>> this problem further?
>> Thank you in advance!
>>
>>
>> PS The hardware is: Dell SC1435/4Gb/2x2.0GHz/Gentoo Linux.
>> The database & web servers are in the 2 local subnets.
>>
>>
>> Dmitri.
>>
>>
> How do you have the connect string?  With an IP or a name?  Maybe its a DNS
> lookup timeout?  You could switch to IP or drop the name in the hosts file
> and see if that makes a difference.
>
> -Andy
>



-- 
@Gmail


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Scott Carey
Delays that are almost exactly 3 seconds over a network are almost always
some sort of network configuration issue.

Inside a datacenter, mis-configured load balancers or routers can cause low
level network issues that result in intermittent network delays of exactly 3
seconds (a loop in a routing network?).
DNS timeouts are often 3 seconds.

Not sure if any of the above is it, but this sounds like a network
configuration problem to me.

On 1/4/10 6:12 PM, "Dmitri Girski"  wrote:

> Hi everybody,
> 
> I am running a PostgreSQL server 8.3.5 with a pretty much standard config.
> 
> The web application server which runs Apache 1.3/PHP2.9  has an intermittent
> problem:
> pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
> The long request happens at approximate rate 1:100.
> 
> I turned on logs on postgres server side, and there is nothing suspicious for
> me there. When a connection request comes, it is being served without any
> delay. 
> 
> Could anyone point me to the direction in which I should investigate this
> problem further?
> Thank you in advance!
> 
> 
> PS The hardware is: Dell SC1435/4Gb/2x2.0GHz/Gentoo Linux.
> The database & web servers are in the 2 local subnets. 
> 
> 
> Dmitri.
> 
> 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-05 Thread Andy Colson

On 1/4/2010 8:12 PM, Dmitri Girski wrote:

Hi everybody,

I am running a PostgreSQL server 8.3.5 with a pretty much standard config.

The web application server which runs Apache 1.3/PHP2.9  has an
intermittent problem:
pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
The long request happens at approximate rate 1:100.

I turned on logs on postgres server side, and there is
nothing suspicious for me there. When a connection request comes, it is
being served without any delay.

Could anyone point me to the direction in which I should investigate
this problem further?
Thank you in advance!


PS The hardware is: Dell SC1435/4Gb/2x2.0GHz/Gentoo Linux.
The database & web servers are in the 2 local subnets.


Dmitri.



How do you have the connect string?  With an IP or a name?  Maybe its a 
DNS lookup timeout?  You could switch to IP or drop the name in the 
hosts file and see if that makes a difference.


-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-04 Thread Tom Lane
Dmitri Girski  writes:
> I am running a PostgreSQL server 8.3.5 with a pretty much standard config.

> The web application server which runs Apache 1.3/PHP2.9  has an intermittent
> problem:
> pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
> The long request happens at approximate rate 1:100.

Sounds a lot like a dropped-packets problem.  The exact timing would be
explained if that is the retransmit timeout in your client-side TCP
stack.  If that's what it is, you need some network engineers, not us
database geeks ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-04 Thread Greg Smith

Dmitri Girski wrote:

I am running a PostgreSQL server 8.3.5 with a pretty much standard config.

The web application server which runs Apache 1.3/PHP2.9  has an 
intermittent problem:

pg_connect takes exactly 3.0 seconds. The usual connection time is 0.0045.
The long request happens at approximate rate 1:100.


First thing to check for intermittent multi-second delays is whether a 
checkpoint is happening at that time.  See 
http://wiki.postgresql.org/wiki/Logging_Checkpoints for an intro, you'd 
want to see if the checkpoints are around the same time as the delays 
each time.  The default configuration makes checkpoints happen all the 
time if there's any significant write traffic on your database.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com