On 30 January 2012 14:22, Rob Stewart robstewar...@gmail.com wrote:
Hi,
I'm hitting an "accept: resource exhausted (Too many open
files)" exception when trying to use sockets in my Haskell program.
The situation:
- Around a dozen Linux machines running my Haskell program,
transmitting thousands of messages to each other, sometimes within a
small period of time.
...
$ ulimit -n
1024
This is not an OS limit; it is your freely chosen soft limit, and you should
not run a server with this few file descriptors. Increasing it 50x is
entirely reasonable. However, having too many open TCP connections is not
a good thing either. 1024 was an upper limit way back on the i386 Linux
architecture for code using the select() system call, which is why it is
still a common default.
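For reference, the soft limit can be raised per shell, up to the hard limit
(the value 51200 below is just one illustrative 50x choice; for a permanent
change you would edit /etc/security/limits.conf or your init system's config):

```shell
# Inspect the hard limit first; the soft limit cannot be raised above it
ulimit -Hn        # maximum value the soft limit may be raised to
ulimit -Sn 51200  # raise the soft limit for this shell and its children
ulimit -n         # confirm the new soft limit
```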
There are a few ways to get out of this situation.
1. Reuse your TCP connections. Maybe you could even use HTTP; an HTTP
library might handle connection reuse for you.
2. Since you are blocking in getContents, it may be the senders that are
being lazy in sendAll: they opened the TCP connection but are not sending
everything, so your receiver ends up with lots of threads blocked on
reading. Try to be strict when *sending* so you do not have too many
ongoing TCP connections.
3. On the receiver side, to be robust, you could limit the number of
threads that are allowed to call accept() to the number of file
descriptors you have free. You could also block on a semaphore whenever
accept returns out of resources, and signal that semaphore after every
close.
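Suggestion 3 could be sketched roughly like this, using the network
package (the limit, the function name acceptLimited, and the handler
argument are my own assumptions, not code from your program):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (finally)
import Control.Monad (forever)
import Network.Socket

-- Accept connections, but never allow more than maxConns handlers
-- (and hence descriptors) to be live at once.
acceptLimited :: Int -> Socket -> (Socket -> IO ()) -> IO ()
acceptLimited maxConns listenSock handler = do
  sem <- newQSem maxConns
  forever $ do
    waitQSem sem                      -- block when all slots are in use
    (conn, _peer) <- accept listenSock
    forkIO (handler conn `finally` (close conn >> signalQSem sem))
                                      -- free the slot only after close
```

The key point is that signalQSem runs after close, so the semaphore count
tracks descriptors actually returned to the OS, not merely finished handlers.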
Alexander
Indeed, when I experience the "accept: resource exhausted (Too many
open files)" exception, I check the number of open sockets, which
exceeds 1024, by listing the contents of the directory:
ls -lah /proc/<pid>/fd
It is within the getContents function that, once the lazy bytestring
has been fully received, the socket is shut down (http://goo.gl/B6XcV):
shutdown sock ShutdownReceive
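One thing worth checking here: shutdown only half-closes the connection;
the file descriptor is not returned to the OS until close is called. A
minimal sketch of a receive path that forces the whole lazy bytestring and
then guarantees the descriptor is released (recvAll is a hypothetical name;
uses the network package):

```haskell
import Control.Exception (evaluate, finally)
import qualified Data.ByteString.Lazy as BL
import Network.Socket
import qualified Network.Socket.ByteString.Lazy as NBL

-- Read until the peer closes its side, force the result into memory,
-- and release the file descriptor whatever happens.
recvAll :: Socket -> IO BL.ByteString
recvAll sock =
  (do msg <- NBL.getContents sock
      _ <- evaluate (BL.length msg)  -- force the full read before closing
      return msg)
    `finally` close sock             -- shutdown alone keeps the fd open
```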
There seems to be no way to limit the number of connection requests
permitted from remote nodes. What I am perhaps looking for is a mailbox
implementation on top of sockets, or some other way to avoid this error.
I am looking to scale up to hundreds of nodes, which makes more than 1024
simultaneous socket connections to one node increasingly likely, so merely
increasing the ulimit feels like a temporary measure. Part of the dilemma
is that the `connect' call in `sendMsg' does not throw an error, even
though it does cause an error on the receiving node, by pushing the number
of open connections to the master node's socket beyond the 1024 limit
permitted by the OS.
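On the sending side, one way to address both the laziness Alexander
mentions and the descriptor leak is to scope the connection with bracket,
so the strict sendAll completes and the socket is closed before the call
returns (a sketch only: your actual sendMsg is not shown in the thread,
and the address handling here is an assumption):

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as B
import Network.Socket
import qualified Network.Socket.ByteString as NB

-- Connect, send everything strictly, and always close the socket,
-- so each message holds a descriptor only briefly on both ends.
sendMsg :: SockAddr -> B.ByteString -> IO ()
sendMsg addr payload =
  bracket (socket AF_INET Stream defaultProtocol) close $ \s -> do
    connect s addr
    NB.sendAll s payload        -- strict: returns after all bytes are handed off
    shutdown s ShutdownSend     -- signal end-of-message to the receiver
```

Because bracket runs close on both success and exception, a failed connect
or send cannot leak a descriptor on the sending node either.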
Am I missing something? One would have thought such a problem occurs
frequently with Haskell web servers and the like?
--
Rob Stewart
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe