In case anybody looks back on this thread in the future, I fixed the
problem (at least temporarily) by running the following on CentOS 7:

# echo "1025 65535" >/proc/sys/net/ipv4/ip_local_port_range
# sysctl -w net.ipv4.tcp_timestamps=1
# sysctl -w net.ipv4.tcp_tw_recycle=1
# sysctl -w net.ipv4.tcp_tw_reuse=1

on both the database server and the client nodes. It seems the clients
were eating up all the available ephemeral ports by opening and closing
database connections. I'm going to have a long talk with the developer
about why this is done in the first place :)

Cheers, and thanks for the tips.

--Steve

On Tue, Mar 29, 2016 at 6:51 PM Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 03/29/2016 03:25 PM, Stephen Constable wrote:
> > Sorry, my client environment is Linux.
>
> Hmm, so I was reading win32.c wrong. It is mapping a Windows error
> message to that string.
>
> >
> > My current theory is that my clients are running out of available
> > ephemeral ports, like in this thread:
> > http://dba.stackexchange.com/questions/59650/pgbouncer-works-great-but-occasionally-becomes-unavailable
> > (but I'm not currently using pg bouncer). I tried pg bouncer before
> > and had the same errors, which in retrospect makes the client-side
> > issue seem more likely. Are there any configuration variables I can
> > set to reduce the number of ephemeral ports required in the postgresql
> > client libraries? Otherwise, I will attempt to reconfigure the OS of
> > the client machines tomorrow morning.
>
> Not sure how that would work. To make a network connection would seem to
> me to require a port.
>
> Are you seeing the same sort of port churn on your 8.4 machine?
>
> Is the fact that it is processing results slower maybe giving the ports a
> chance to timeout their wait time, versus not on the newer faster machine?
>
> The issue, to me at least, seems to be less the number of jobs, but the
> number of connections each job is producing.
>
> >
> > Thanks,
> > Steve
> >
> > On Tue, Mar 29, 2016 at 4:44 PM Adrian Klaver
> > <adrian.kla...@aklaver.com> wrote:
> >
> > On 03/29/2016 01:28 PM, Stephen Constable wrote:
> > > My apologies, I'm not sure what part of the networking stack the
> > > messages are coming from.
> > > It also states:
> > > """
> > > could not connect to server: Cannot assign requested address
> > > Is the server running on host "<hostname>" and accepting
> > > TCP/IP connections on port <port>?
> > > """
> >
> > Alright I lied, the above is a Postgres error message. I am just not
> > used to seeing 'Cannot assign requested address'. Turns out it is in
> > interfaces/libpq/win32.c.
> >
> > So your client is running on Windows?
> >
> > >
> > > This error is only printed under a 32-job load, never a single
> > > job load.
> > >
> > > The processes are indeed connecting over a local network.
> > >
> > > I have only enabled the logging of connections and disconnections
> > > since I figured that would be the most telling :) Perhaps that was
> > > not the best idea. But, FYI, I see over 5000 such notices in a
> > > single minute. I will reconfigure the logging to be more verbose.
> > >
> > > Thanks,
> > > Steve
> > >
> > > On Tue, Mar 29, 2016 at 4:21 PM Adrian Klaver
> > > <adrian.kla...@aklaver.com> wrote:
> > >
> > > On 03/29/2016 01:10 PM, Stephen Constable wrote:
> > > > Hi All,
> > > >
> > > > I'm a new-ish sysadmin working on porting legacy scientific code
> > > > from a local server/client to a new supercomputer environment. My
> > > > work is mostly done, except that my postgres database doesn't seem
> > > > to be able to keep up with the new environment. The application is
> > > > written in-house in a mixture of FORTRAN 77 and C, and uses
> > > > postgres BLOBS as its main data store. This application in
> > > > particular only reads from the database, it never writes, which
> > > > *should* make it easy to scale.
> > > >
> > > > My main problem is that this client application is unable to
> > > > connect to the database under a modest load (32 simultaneous jobs).
> > > > The client
> > > > error logs print out messages like "could not connect to server:
> > > > Cannot assign requested address" and "Cannot connect to database
> > > > [runlog]!!!" (an important database of ours). The "cannot assign
> > > > requested address"
> > >
> > > Well those do not look like Postgres error messages to me, so the
> > > first thing would be to determine what part of the stack is
> > > generating them.
> > >
> > > Is the client software connecting to the database over a network?
> > >
> > > Are you using connection pooling?
> > >
> > > > message makes me think it's a configuration issue. The logs are
> > > > flooded with hundreds of connection and disconnection notices per
> > > > second. This
> > >
> > > Might want to turn off logging connections/disconnections:
> > >
> > > http://www.postgresql.org/docs/9.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
> > >
> > > log_connections (boolean)
> > >
> > > log_disconnections (boolean)
> > >
> > > > same code and configuration runs fine on our mid-2000's Solaris 10
> > > > box with postgres 8.4 (albeit very slowly) but totally fails with
> > > > these connection errors on a modern Dell system running CentOS 7
> > > > or FreeBSD 10 (I tested both) with postgres 9.4.
> > > >
> > > > While the database is under load (and jobs are actively failing),
> > > > select count(*) from pg_stat_activity returns 30-34 ish
> > > > connections, show max_connections returns 100, and show
> > > > superuser_reserved_connections shows 3. My only other hint is that
> > > > right after a fresh install of CentOS 7 my job success rate was
> > > > around 50%, and now it has approached approximately 5%, so
> > > > something is changing over time.
> > > >
> > > > Does anyone have any advice or experience with similar issues?
> > >
> > > What else does the Postgres log show besides the
> > > connections/disconnections, that might be of interest?
> > >
> > > What does the system log show?
> > >
> > > >
> > > > Thanks,
> > > > Steve
> > > >
> > >
> > >
> > > --
> > > Adrian Klaver
> > > adrian.kla...@aklaver.com
> >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
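
For anyone who lands on this thread with the same "Cannot assign requested address" error: one way to check whether ephemeral ports are really the bottleneck is to compare the number of sockets sitting in TIME_WAIT with the size of the local port range. The following is only a rough sketch, assuming a Linux host with /proc mounted and awk available. The function names count_time_wait and port_range_size are made up for illustration and are not part of any tool mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the thread.

# Count TIME_WAIT sockets in /proc/net/tcp-style input read from stdin.
# The first line is a header; the fourth column of each following line
# is the socket state in hex, and 06 means TIME_WAIT.
count_time_wait() {
    awk 'NR > 1 && $4 == "06" { n++ } END { print n + 0 }'
}

# Print how many ports a "low high" pair covers, which is the format
# used by /proc/sys/net/ipv4/ip_local_port_range.
port_range_size() {
    awk '{ print $2 - $1 + 1 }'
}
```

On a client node the two helpers could be run as `count_time_wait < /proc/net/tcp` and `port_range_size < /proc/sys/net/ipv4/ip_local_port_range`. If the first number stays close to the second while jobs are failing, port exhaustion is a likely cause. Note that /proc/net/tcp only lists IPv4 sockets; IPv6 sockets appear in /proc/net/tcp6.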