Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-03-25 Thread Brandon S. Allbery KF8NH

On Mar 25, 2010, at 15:03 , Bardur Arantsson wrote:

On 2010-02-24 20:50, Brandon S. Allbery KF8NH wrote:

tcpdump 'host ps3 and tcp[tcpflags] & 0x27 != 0'
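
The filter above reads as tcp[tcpflags] & 0x27 != 0, i.e. keep any packet whose flag byte shares a bit with 0x27. A small sketch (not from the thread; TcpFlag, flagsIn and matches are names made up for illustration) of which flags that mask covers, using the standard TCP flag bit values:

```haskell
import Data.Bits ((.&.), (.|.))
import Data.Word (Word8)

-- The six classic TCP flags, lowest bit first.
data TcpFlag = FIN | SYN | RST | PSH | ACK | URG
  deriving (Eq, Show, Enum, Bounded)

-- Bit value of each flag inside the TCP flags byte.
flagBit :: TcpFlag -> Word8
flagBit FIN = 0x01
flagBit SYN = 0x02
flagBit RST = 0x04
flagBit PSH = 0x08
flagBit ACK = 0x10
flagBit URG = 0x20

-- Flags selected by a mask such as 0x27.
flagsIn :: Word8 -> [TcpFlag]
flagsIn mask = [f | f <- [minBound .. maxBound], mask .&. flagBit f /= 0]

-- Would a packet carrying these flags pass "tcp[tcpflags] & mask != 0"?
matches :: Word8 -> [TcpFlag] -> Bool
matches mask fs = mask .&. foldr ((.|.) . flagBit) 0 fs /= 0
```

Here flagsIn 0x27 gives [FIN,SYN,RST,URG], so ACK-only data segments are dropped from the capture while FIN and RST packets are kept.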


The only striking thing I can see about the dump is that there are  
22 (conspicuously close to 16) sequences like:


19:45:30.135291 IP 192.168.0.115.64931 > gwendolyn.9000: Flags [R], seq 2112225068, win 0, length 0
19:45:30.135295 IP 192.168.0.115.64931 > gwendolyn.9000: Flags [R], seq 2112225068, win 0, length 0
19:45:30.135299 IP 192.168.0.115.64931 > gwendolyn.9000: Flags [R], seq 2112225068, win 0, length 0
19:45:30.135302 IP 192.168.0.115.64931 > gwendolyn.9000: Flags [R], seq 2112225068, win 0, length 0


The above is a single socket:  the source and destination ports are  
the same for all 4 traces.


More useful, from the dump, is:

19:44:41.774161 IP 192.168.0.115.65265 > gwendolyn.9000: Flags [F.], seq 231, ack 1073301, win 41124, options [nop,nop,TS val 0 ecr 95041042], length 0



which is where the PS/3 sent a FIN telling gwendolyn to close the
socket.  It then follows that with a bunch of RST packets.  The first
of these is in sequence with the above FIN (suggesting the PS/3
responded to the continued attempt to send by dropping the socket on
the floor instead of by resending the FIN); the rest are "this port
is closed" RSTs, presumably due to 22 attempts to continue sending
data.  This is somewhat poor on the part of the PS/3, but
understandable given that it's essentially an embedded device.


It would be interesting to see what the data around there was, but  
that's not easy to do without recording all of it.


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-23 Thread Brandon S. Allbery KF8NH

On Feb 21, 2010, at 20:17 , Jeremy Shaw wrote:
 The PS3 does do something though. If we were doing a write *and*
 read select on the socket, the read select would wakeup. So, it is
 trying to notify us that something has happened, but we are not
 seeing it because we are only looking at the write select().


Earlier the OP claimed this would happen within a few minutes if he  
seeked in a movie.  If it's that reproducible, it should be easy to  
capture a tcpdump and attach it to an email (or pastebin it), allowing  
us to determine what really happens.


Also, Donn, you are incorrect about invalidating premises; we know the  
connection is going away, we can infer it's not going away normally,  
that's why there have been comments about it sending a FIN and  
dropping the connection entirely (bypassing the shutdown handshake),  
or sending an RST, etc.


(I'd also be interested in finding out if OpenSolaris or FreeBSD has  
the same problem, but that may be too difficult to test easily.  I  
still find it highly unlikely that loss of a connection only wakes the  
read end in general, and would absolutely not be surprised if this  
were some odd corner case in the Linux TCP stack.  Sadly, I don't have  
a PS3 (yet, if ever) and I don't know of any streaming software for  
non-hacked Wiis.)




Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-23 Thread Donn Cave
Quoth Brandon S. Allbery KF8NH allb...@ece.cmu.edu,
 On Feb 21, 2010, at 20:17 , Jeremy Shaw wrote:
 The PS3 does do something though. If we were doing a write *and*  
 read select on the socket, the read select would wakeup. So, it is  
 trying to notify us that something has happened, but we are not  
 seeing it because we are only looking at the write select().

 Earlier the OP claimed this would happen within a few minutes if he  
 seeked in a movie.  If it's that reproducible, it should be easy to  
 capture a tcpdump and attach it to an email (or pastebin it), allowing  
 us to determine what really happens.

 Also, Donn, you are incorrect about invalidating premises; we know the  
 connection is going away, we can infer it's not going away normally,  
 that's why there have been comments about it sending a FIN and  
 dropping the connection entirely (bypassing the shutdown handshake),  
 or sending an RST, etc.

That's what I'm saying - it clearly is not a full close, i.e., going
away normally per protocol.

With luck maybe the packets will show that something does happen at
a wire protocol level, and there will be a way to recognize the event
at the `user land' level and plug that into the event loop.

My prediction is that on the contrary, the transition between functional
and defunct will not be announced in any way by the peer, but that's
just guessing.  It would be a lot less interesting.

Donn



Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-23 Thread Brandon S. Allbery KF8NH

On Feb 23, 2010, at 23:47 , Donn Cave wrote:
 My prediction is that on the contrary, the transition between functional
 and defunct will not be announced in any way by the peer, but that's
 just guessing.  It would be a lot less interesting.



But that's not the issue.  The *kernel* is clearly detecting it; the  
problem is it's only being reported for the *read* end of the socket,  
whereas sendfile() (correctly) only cares about, and therefore only  
registers interest in, the *write* end.
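
One way to register interest in both ends from a single Haskell thread can be sketched as follows. This assumes a newer base than was available at the time of this thread (threadWaitReadSTM and threadWaitWriteSTM were added in base 4.7); Ready and waitReadOrWrite are hypothetical names, not part of the sendfile library:

```haskell
import Control.Concurrent (threadWaitReadSTM, threadWaitWriteSTM)
import Control.Exception (finally)
import GHC.Conc (atomically, orElse)
import System.Posix.Types (Fd)

-- Which side of the descriptor became ready first.
data Ready = Readable | Writable
  deriving (Eq, Show)

-- Block until the descriptor is readable or writable, whichever
-- happens first, then deregister interest in both events.
waitReadOrWrite :: Fd -> IO Ready
waitReadOrWrite fd = do
  (waitR, cancelR) <- threadWaitReadSTM fd
  (waitW, cancelW) <- threadWaitWriteSTM fd
  atomically ((Readable <$ waitR) `orElse` (Writable <$ waitW))
    `finally` (cancelR >> cancelW)
```

A Readable result only says that something arrived on the read side; the caller still has to decide whether that is a disconnect or ordinary client data.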




Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-21 Thread Donn Cave
Quoth Bardur Arantsson s...@scientician.net,
 Taru Karttunen wrote:

 Excerpts from Bardur Arantsson's message of Wed Feb 17 21:27:07 +0200 2010:
 For sendfile, a timeout of 1 second would probably be fine. The *ONLY* 
 purpose of threadWaitWrite in the sendfile code is to avoid busy-waiting 
 on EAGAIN from the native sendfile.
 
 Of course this will kill connections for all clients that may have a
 two-second network hiccup.
 

 I'm not talking about killing the connection. I'm talking about retrying 
 sendfile() if threadWaitWrite has been waiting for more than 1 second.

 If the connection *has already been closed* (as detected by the OS), 
 then sendfile() will fail with EBADF, and we're good.
...
 I don't see how that would lead to anything like what you describe.

If I understand correctly, we're talking about what it means for the
OS to detect a closed connection.

The proposal I think was to change the socket options to add keepalive,
and to set a short timeout.  This will indeed allow the OS to discover
connections that didn't properly close, but are effectively closed in
the sense that they are no use any more - disconnected cable, or it
sounds like the PS3 may routinely do this out of negligence.

The problem is that this definition of `closed' is, precisely,
`failed to respond within 2 seconds.'  If there is no observable
difference between a connection that has been abandoned by the PS3,
and a connection that just suffered a momentary lapse, then there's
no way to catch the former without making connections more fragile.

Donn Cave
d...@avvanta.com



Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-21 Thread Jeremy Shaw


On Feb 21, 2010, at 11:50 AM, Donn Cave wrote:


 The problem is that this definition of `closed' is, precisely,
 `failed to respond within 2 seconds.'  If there is no observable
 difference between a connection that has been abandoned by the PS3,
 and a connection that just suffered a momentary lapse, then there's
 no way to catch the former without making connections more fragile.


No. (i think)

What happens is the PS3 has closed the connection, and if you attempt  
to send any more packets the PS3 will tell you it has closed the  
connection and the write() / sendfile() call will raise SIGPIPE.


The problem is we never try to send those packets, because we are
sitting at threadWaitWrite waiting to write -- and nothing is going
to happen that will cause the select() call (made by
threadWaitWrite) to actually wake up.


I believe the proposal is to add a 2 second timeout to the
threadWaitWrite call. If it wakes up and can't write (because the
remote side has lost its connection, etc.) then it will just go back to
sleep. But if it wakes up, tries to write, and then gets SIGPIPE, then
it knows the connection is actually dead and will clean up after itself.
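
The proposal can be sketched as a loop like the one below. This is only an illustration under stated assumptions: sendWithWakeups is a made-up name, and the caller-supplied attempt action is hypothetical; it is assumed to return Nothing when the native call reports EAGAIN and to throw when the connection is dead.

```haskell
import Control.Concurrent (threadWaitWrite)
import System.Posix.Types (Fd)
import System.Timeout (timeout)

-- Retry a non-blocking send.  After each "would block" result, wait
-- for writability, but never longer than the given number of
-- microseconds, so a dead peer is noticed on the next attempt.
sendWithWakeups :: Int -> Fd -> IO (Maybe a) -> IO a
sendWithWakeups usec fd attempt = loop
  where
    loop = do
      r <- attempt
      case r of
        Just x  -> pure x            -- send made progress
        Nothing -> do                -- EAGAIN: sleep, but with a bound
          _ <- timeout usec (threadWaitWrite fd)
          loop
```

With usec set to 2000000 this gives the 2 second wakeup discussed above; a slow but live client just costs one extra failed attempt per period.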


The problem is that we have not successfully figured out what is
causing this issue in the first place.


I wrote a Haskell server and a C client to try to emulate the
situation which causes threadWaitWrite to never wake up, but I could
not actually get that to happen. So far the PS3 client is the only
thing that causes it.


I think that applying a fix without really understanding the problem
is asking for trouble.


Among other things, since the problem is with threadWaitWrite (not  
sendfile), then the same issue ought to exist when we are calling  
hPutStr, etc, since they ultimately call threadWaitWrite as well. If  
hPut never has this problem, then we should understand why and use the  
same solution for sendfile. If hPut does have this problem, then  
fixing just sendfile isn't much of a solution.


So far there is:

 - no way for anyone besides Bardur to reproduce the problem
 - no sound explanation for why the PS3 client causes the error, but  
nothing else does
 - no proof that this error does or does not affect all the normal
I/O functions in Haskell (hPut, etc).


- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-21 Thread Jeremy Shaw
On Sun, Feb 21, 2010 at 6:39 PM, Donn Cave d...@avvanta.com wrote:

 Quoth Jeremy Shaw jer...@n-heptane.com,
 ...
  What happens is the PS3 has closed the connection, and if you attempt
  to send any more packets the PS3 will tell you it has closed the
  connection and the write() / sendfile() call will raise SIGPIPE.
 ...
  So far there is:
 
- no way for anyone besides Bardur to reproduce the problem
- no sound explanation for why the PS3 client causes the error,
  but nothing else does

 I think in fact this invalidates your premise.  If the PS3 really
 closed its connection in the standard fashion, then it would be trivial
 to reproduce this problem with any other peer.  Evidently it doesn't,
 at least in this particular case, and that's why people are talking
 about TCP keep-alives, which address the defunct peer problem (within
 two hours, normally.)


The PS3 does do something though. If we were doing a write *and* read select
on the socket, the read select would wakeup. So, it is trying to notify us
that something has happened, but we are not seeing it because we are only
looking at the write select().

But I can not explain what the PS3 client is doing differently than the
other clients such that it does not cause the threadWaitWrite to wakeup.

Additionally, it is not clear that setting SO_KEEPALIVE will actually fix
anything. The documentation that I have read indicates that that may only
cause the read select() to wakeup not the write select(). Well, that is no
good, because that is supposedly what is happening with the PS3 client
already.

Anyway, part of the annoyance here is that in this particular case we
shouldn't need any timeouts to 'guess' that the client is 'probably dead'.
The client seems to be telling us that it has disconnected, but we are not
looking in the right place. And if we did try to write we would get a
sigPIPE error.

It is not the case that the client is unresponsive -- it is quite responsive.
The problem is that we are not looking in the right place for that response.

But, 'looking in the right place' is tricky. How do you tell hPut that it
should wakeup from threadWaitWrite if the Handle happens to be backed by a
socket, and threadWaitRead has data available? That does not even always
indicate an error condition, it can be a perfectly valid situation.

Well, before I think about that, I want to know what the PS3 client is doing
differently such that it is the only client that seems to exhibit this
behavior at the moment. If we do not understand the real difference between
what the PS3 and the C client are doing, then I don't think we can expect to
arrive at an appropriate fix.

- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-18 Thread Taru Karttunen
Excerpts from Bardur Arantsson's message of Wed Feb 17 21:27:07 +0200 2010:
 For sendfile, a timeout of 1 second would probably be fine. The *ONLY* 
 purpose of threadWaitWrite in the sendfile code is to avoid busy-waiting 
 on EAGAIN from the native sendfile.

Of course this will kill connections for all clients that may have a
two-second network hiccup.

 How so? As a user I expect sendfile to work and not semi-randomly block 
 threads indefinitely.

If you want sending something to terminate you will add a timeout to
it. A nasty client may e.g. take one byte each minute and sending your
file may take a few years.

- Taru Karttunen


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-17 Thread Taru Karttunen
Excerpts from Bardur Arantsson's message of Tue Feb 16 23:48:14 +0200 2010:
  This cannot be fixed in the sendfile library, it is a 
  feature of TCP that connections may linger for a long
  time unless explicit timeouts are used.
 
 The problem is that the sendfile library *doesn't* wake
 up when the connection is terminated (because of threadWaitWrite)
 -- it doesn't matter what the timeout is.

Even server code without sendfile has the same issue since
all writing to sockets ends up using threadWaitWrite.

System.Timeout.timeout terminates a threadWaitWrite using
asynchronous exceptions.

If you want to detect dead sockets somewhat reliably 
without a timeout then there is SO_KEEPALIVE combined
with polling SO_ERROR every few minutes.

- Taru Karttunen


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-17 Thread Jeremy Shaw
On Wed, Feb 17, 2010 at 2:36 AM, Taru Karttunen tar...@taruti.net wrote:

 Excerpts from Bardur Arantsson's message of Tue Feb 16 23:48:14 +0200 2010:
   This cannot be fixed in the sendfile library, it is a
   feature of TCP that connections may linger for a long
   time unless explicit timeouts are used.
 
  The problem is that the sendfile library *doesn't* wake
  up when the connection is terminated (because of threadWaitWrite)
  -- it doesn't matter what the timeout is.

 Even server code without sendfile has the same issue since
 all writing to sockets ends up using threadWaitWrite.


Right, this is my concern -- I want to make sure that all of happstack is
fixed, not just sendfile.


 System.Timeout.timeout terminates a threadWaitWrite using
 asynchronous exceptions.


So for sendfile, instead of threadWaitWrite we could do:

 r <- timeout (60 * 10^6) (threadWaitWrite fd)
 case r of
   Nothing   -> ... -- timed out
   (Just ()) -> ... -- keep going

It seems tricky to use timeout at a higher level in the code, because some
requests may take a very long time to finish. For example, when serving a
long video, or streaming music it could be hours or days before the IO
request finishes.


 If you want to detect dead sockets somewhat reliably
 without a timeout then there is SO_KEEPALIVE combined
 with polling SO_ERROR every few minutes.


This approach sounds promising because it seems like it could be
incorporated into the guts of happstack-server. The timeout period could be
a Config option with a reasonable default. I would be surprised if *any*
happstack programs today are handling this correctly, so updating the core
to do something reasonable would be a big improvement... And if someone has
a special need where it is not ok, they can just change the config to use an
infinite timeout...

Does that sound like the right fix to you? (Obviously, if people are using
sendfile with something other than happstack, it does not help them, but it
sounds like trying to fix things in sendfile is misguided anyway.)

- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-17 Thread Jeremy Shaw
On Wed, Feb 17, 2010 at 1:27 PM, Bardur Arantsson s...@scientician.net wrote:


  (Obviously, if people are using sendfile with something other than
  happstack, it does not help them, but it sounds like trying to fix
  things in sendfile is misguided anyway.)



 How so? As a user I expect sendfile to work and not semi-randomly block
 threads indefinitely.


Because it only addresses *one* case when this type of blocking can happen.

Shouldn't hPut and friends also block indefinitely since they also use
threadWaitWrite? If so, what good is just fixing sendfile, when all other
network I/O will still block indefinitely?

If things are 'fixed' at a higher-level, by using SO_KEEPALIVE, then does
sendfile really need a hack to deal with it?

With your proposed fix, if the user unplugs the network cable, then won't
you get a polling loop that never terminates? That doesn't sound any better
than the current situation.

You said that you have not seen this issue when using the code that uses
hPut, only the code that uses sendfile(). But my research indicates that we
*should* see the error. So, I am not very comfortable fixing just sendfile
and ignoring the fact that all network I/O might be borked..

I am also not 100% pleased by the SO_KEEPALIVE solution. There are really
two errors which can occur:

  1. the remote end drops the connection in such a manner that we
immediately get notified of it by seeing that a read select() on the socket
is successful but there are 0 bytes available to read. This happens because
the remote end sent a notification to us that they have terminated the
connection.

  2. the remote end drops off the network (for example, the network cable is
disconnected). In this case, we will not get any notification via read
select(), because the remote server is not there to send the notification.
The only solution is to eventually timeout.

By using a timeout to handle #2, we implicitly handle #1, but in a very
untimely manner.

Ideally, we would like to handle both these cases separately. In case #1, we
know immediately, that the connection is dead, and can therefore clean
things up. With case #2, the remote client might actually come back online,
(someone plugs the cable back in), and the transfer resumes. Perhaps in some
applications we want infinite timeouts for case #2. That does not mean we do
not want case #1 handled.

However, I do not really see a good way to handle #1 right now that works
for all network code, not just sendfile.

The issue seems to be that select() was designed as a way to *avoid* using
threads. There seems to be the assumption in the network code that you are
going to do a select on the read and write aspects of the socket. When the
select returns you will then look at what happened, and take the correct
action.

But, in Haskell, we are using multiple threads. So the code that is looking
to read data and the code that is looking to write data don't really know
about each other. So even if the read thread detects the closed socket, it
has no idea that some other thread needs to be killed.
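
A sketch of how the reading side could be tied to the writing side, so that whichever finishes first cancels the other. This is not how the sendfile library or happstack works; raceIO and sendUnlessReadable are names made up here (the async package's race is a more careful version of the same idea), and the caveat discussed elsewhere in the thread still applies: readability may also mean the client sent ordinary data.

```haskell
import Control.Concurrent (forkIO, killThread, threadWaitRead)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, finally, throwIO, try)
import System.Posix.Types (Fd)

-- Run two actions in their own threads; return the result of the
-- first to finish and kill the other.  An exception in the first
-- finisher is rethrown in the caller.
raceIO :: IO a -> IO b -> IO (Either a b)
raceIO left right = do
  done <- newEmptyMVar
  lt <- forkIO (try left >>= putMVar done . fmap Left)
  rt <- forkIO (try right >>= putMVar done . fmap Right)
  first <- takeMVar done `finally` (killThread lt >> killThread rt)
  case first of
    Left e  -> throwIO (e :: SomeException)
    Right v -> pure v

-- Abandon a send as soon as the socket becomes readable.
-- Nothing means the read side fired before the send completed.
sendUnlessReadable :: Fd -> IO a -> IO (Maybe a)
sendUnlessReadable fd send =
  either (const Nothing) Just <$> raceIO (threadWaitRead fd) send
```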

So, what to do? Perhaps it is wrong to use a socket in more than one thread?
Obviously, having multiple threads trying to read the same socket, or write to
the same socket would be a mess. So why do we expect it is ok to have one
thread reading and a different thread writing? But, even if we do restrict
ourselves to only accessing a socket from one thread at a time, we still
have the issue that every place which uses threadWaitWrite needs to handle
the disconnect case. We could, of course, write a wrapper function that does
the check, and call that instead. But we still have not really solved the
problem. The code in the I/O libraries that eventually implements hPut calls
threadWaitWrite. But it has no idea that the file descriptor it is waiting
on is a socket which has special requirements. That code is also used for
writing to plain old files, etc, so it probably wouldn't make sense for it
to behave that way by default..

- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-17 Thread Jeremy Shaw
On Wed, Feb 17, 2010 at 3:54 PM, Jeremy Shaw jer...@n-heptane.com wrote:

 On Wed, Feb 17, 2010 at 1:27 PM, Bardur Arantsson s...@scientician.net wrote:


   (Obviously, if people are using sendfile with something other than
   happstack, it does not help them, but it sounds like trying to fix
   things in sendfile is misguided anyway.)



 How so? As a user I expect sendfile to work and not semi-randomly block
 threads indefinitely.


 Because it only addresses *one* case when this type of blocking can happen.

 Shouldn't hPut and friends also block indefinitely since they also use
 threadWaitWrite? If so, what good is just fixing sendfile, when all other
 network I/O will still block indefinitely?

 If things are 'fixed' at a higher-level, by using SO_KEEPALIVE, then does
 sendfile really need a hack to deal with it?


I think I understand the SO_KEEPALIVE + SO_ERROR solution, and that does not
really fix things either.

Setting SO_KEEPALIVE by itself does not cause the write select() to behave
any differently. What it does do is cause the TCP stack to eventually send
an empty packet to the remote host and hopefully get a response back. The
response might be an error, or it might just be an ACK. But either way, I
believe it is intended to cause the read select() to wake up. But, in the
case that started this discussion, we are already getting this information.
So this won't help with that at all.

The second part of the solution is to poll SO_ERROR to determine if
something went wrong. This is an alternative to doing a read() on the socket
and see if it returns 0 bytes. It is a nice alternative *because* it does
not require a read(). However, it is still problematic. When you poll
SO_ERROR, it will clear the error value, so there is a potential race
condition if multiple threads are doing it.

In happstack, we fork a new thread to handle each incoming connection. So at
first it seems like we could just fork a second thread that polls the
SO_ERROR option on the socket and kills the first thread if an error
happens. Unfortunately, it is not that simple. The first thread might fork
another thread that is actually doing the threadWaitWrite. Killing the
parent thread will not kill that child thread.
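
GHC threads have no built-in parent/child relationship, which is the difficulty described above. Where the forking code is under our control, a helper thread's lifetime can be tied to the enclosing action with bracket. A minimal sketch, with withChild as a hypothetical name:

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread)
import Control.Exception (bracket)

-- Fork a helper thread for the duration of the body.  The helper is
-- killed when the body returns, throws, or is itself killed by an
-- asynchronous exception.
withChild :: IO () -> (ThreadId -> IO a) -> IO a
withChild child = bracket (forkIO child) killThread
```

This only helps for threads forked through such a wrapper; a plain forkIO somewhere inside library code still escapes it.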

So, at present, I don't see a solution that is going to fix the problem in
the rest of the IO code. There are multiple ways to hack only sendfile.. but
that is only one place this error can happen.

If this error truly never happens with hPut, then we should figure out why.
If there is a solution that works for write() it should work for sendfile(),
because the real issue is with the select() call anyway..

- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-16 Thread Jeremy Shaw
On Sun, Feb 14, 2010 at 2:04 PM, Bardur Arantsson s...@scientician.net wrote:


 I've tested this extensively during this weekend and not a single leaked
 FD so far.

 I think we can safely say that polling an FD for read readiness is
 sufficient to properly detect a disconnected client regardless of why/how
 the client disconnected.

 The only issue I can see with just dropping the above code directly into
 the sendfile library is that it may lead to busy-waiting on EAGAIN *if* the
 client is actually trying to send data to the server while it's receiving
 the file via sendfile(). If the client sends even a single byte and the
 server isn't reading it from the socket, then threadWaitRead will keep
 returning immediately since it's level-triggered rather than edge triggered.


Yeah. That could be trouble.


 Not sure what the best solution for this would be, API-wise... Maybe
 actually have sendfile read the data and supply it to a user-defined
 function which could react to the data in some way? (Could supply two
 standard functions: disconnect immediately and accumulate all received
 data into a bytestring.)


I think this goes beyond just a sendfile issue -- anyone trying to write
non-blocking network code should run into this issue, right ? For now, maybe
we should patch sendfile with what we have. But I think we really need to
summarize our findings, see if we can generate a test case, and then see
what Simon Marlow and company have to say...

- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-16 Thread Bryan O'Sullivan
On Tue, Feb 16, 2010 at 12:37 PM, Jeremy Shaw jer...@n-heptane.com wrote:


 I think this goes beyond just a sendfile issue -- anyone trying to write
 non-blocking network code should run into this issue, right ?


What's a fairly concise description of the issue at hand? I haven't been
paying much attention to this thread, and the descriptions I have seen have
been somewhat confused.

One admittedly unhelpful observation is that when something goes wrong in
this area, it's usually due to pilot error (either on the part of whoever
wrote the Haskell library, or its user), and not so often caused by a bug in
the underlying platform.


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-16 Thread Taru Karttunen
Excerpts from Bardur Arantsson's message of Tue Feb 16 22:57:23 +0200 2010:
 As far as I can tell, all nonblocking networking code is vulnerable to 
 this issue (unless it actually does use threadWaitRead, obviously :)).

There are a few easy fixes:

1) socket timeouts with Network.Socket.setSocketOption
2) just make your server code have timeouts in Haskell

This cannot be fixed in the sendfile library, it is a 
feature of TCP that connections may linger for a long
time unless explicit timeouts are used.

So just document it and in your code using sendfile
wrap it in an application specific timeout.
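
A minimal sketch of such an application-specific wrapper, using System.Timeout.timeout from base. Outcome and withDeadline are made-up names, and the action passed in stands for whatever sendfile call the application makes:

```haskell
import System.Timeout (timeout)

-- Result of running an action under a deadline.
data Outcome a = Finished a | TimedOut
  deriving (Eq, Show)

-- Run an action, giving up after the given number of seconds.
-- timeout works in microseconds, hence the conversion.
withDeadline :: Int -> IO a -> IO (Outcome a)
withDeadline secs act =
  maybe TimedOut Finished <$> timeout (secs * 1000000) act
```

As noted elsewhere in the thread, choosing the deadline is the hard part: a long video or audio stream can legitimately take hours.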

- Taru Karttunen


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-16 Thread Jeremy Shaw
On Tue, Feb 16, 2010 at 3:48 PM, Bardur Arantsson s...@scientician.net wrote:

 The problem is that the sendfile library *doesn't* wake
 up when the connection is terminated (because of threadWaitWrite)
 -- it doesn't matter what the timeout is.


Have we actually confirmed this? We know that with the default socket
configuration things are good. But have we actually tested setting the
timeout to something short and seeing what happens? It would be good to know
for sure.

- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-11 Thread Jeremy Shaw
On Wed, Feb 10, 2010 at 1:15 PM, Bardur Arantsson s...@scientician.net wrote:

 I've also been contemplating some solutions, but I cannot see any solutions
 to this problem which could reasonably be implemented outside of GHC itself.
 GHC lacks a threadWaitError, so there's no way to detect the problem
 except by timeout or polling. Solutions involving timeouts and polling are
 bad in this case because they arbitrarily restrict the client connection
 rate.

 Cheers,


I believe solutions involving polling and timeouts may be the *only*
solution due to the way TCP works. There are two cases to consider here:

 1. what happens when the remote client does a proper disconnect by sending
a FIN packet, etc
 2. what happens when the remote client just drops the connection

Case #1 - Proper Disconnect

I believe that in this case we are OK. select() may not wake up due to the socket
being closed -- but something will eventually cause select() to wake up, and
then next time through the loop, the call to select() will fail with EBADF.
This will cause everyone to wake up. We can test this case by writing a
client that purposely (and correctly) terminates the connection while
threadWaitWrite is blocking and seeing if that causes it to wake up. To ensure
that the IOManager is eventually waking up, the server can have an IO thread
that just does: forever $ threadDelay (1*10^6)

Look here for more details:
http://darcs.haskell.org/packages/base/GHC/Conc.lhs

Case #2 - Sudden Death

In this case, there is no way to tell if the client is still there without
trying to send / recv data. A TCP connection is not a 'tangible' link. It is
just an agreement to send packets to/from certain ports with certain
sequence numbers. It's much closer to snail mail than a telephone call.

If you set the keepalive socket option, then the TCP layer will
automatically ping the connection to make sure it is still alive. However, I
believe the default time between keepalive packets is 2 hours, and can only
be changed on a system wide basis?

http://www.unixguide.net/network/socketfaq/2.8.shtml
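
To put a number on the keepalive delay, here is a small arithmetic sketch. It assumes the usual Linux sysctl defaults (tcp_keepalive_time 7200 s, tcp_keepalive_intvl 75 s, tcp_keepalive_probes 9); other systems may differ, and as far as I know Linux also allows per-socket overrides through the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options. The type and function names are invented for the example.

```haskell
-- Keepalive parameters, all times in seconds.
data KeepAlive = KeepAlive
  { idleSecs     :: Int  -- idle time before the first probe
  , intervalSecs :: Int  -- gap between unanswered probes
  , probeCount   :: Int  -- unanswered probes before giving up
  }

-- Assumed Linux defaults (see the lead-in; not taken from the thread).
linuxDefaults :: KeepAlive
linuxDefaults = KeepAlive 7200 75 9

-- Approximate time from the last traffic until a silent peer is
-- declared dead: the idle period plus every probe going unanswered.
worstCaseDetection :: KeepAlive -> Int
worstCaseDetection k = idleSecs k + intervalSecs k * probeCount k
```

With these assumed defaults worstCaseDetection linuxDefaults is 7875 seconds, a little over two hours, which is in line with the "2 hours" figure above.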

The other option is to try to send some data. There are at least two cases
that can happen here.

 1. the network cable is unplugged -- this is not an 'error'. The write
buffer will fill up and it will wait until it can send the data. If the
write buffer is full, it will either block or return EAGAIN depending on the
mode. Eventually, after 2 hours, it might give up.

 2. the remote client has terminated the connection as far as it is
concerned but not notified the server -- when you try to send data it will
reject it, and send/write/sendfile/etc will raise sigPIPE.
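
In a GHC program the second case normally does not arrive as a signal: the runtime ignores SIGPIPE by default, so the failed write shows up as an IOError built from EPIPE, whose error type is ResourceVanished. A sketch of telling that apart from other failures; SendResult, classifySend and brokenPipe are illustrative names, not part of any library:

```haskell
import Control.Exception (try)
import Foreign.C.Error (ePIPE, errnoToIOError)
import GHC.IO.Exception (IOErrorType (ResourceVanished), IOException (ioe_type))

-- Outcome of one send attempt.
data SendResult a = Sent a | PeerGone
  deriving (Eq, Show)

-- Run a send action; report a vanished peer as a value and rethrow
-- every other IO error unchanged.
classifySend :: IO a -> IO (SendResult a)
classifySend act = do
  r <- try act
  case r of
    Right x -> pure (Sent x)
    Left e
      | ioe_type e == ResourceVanished -> pure PeerGone
      | otherwise -> ioError e

-- What a write to a closed connection looks like as an IOError.
brokenPipe :: IOError
brokenPipe = errnoToIOError "write" ePIPE Nothing Nothing
```

For example, classifySend (ioError brokenPipe) yields PeerGone, which is the point at which the caller can close the descriptor instead of leaking it.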

Looking at your debug output, we are seeing the sigPIPE / Broken Pipe error
most of the time. But then there is the case where we get stuck on the
threadWaitWrite.

threadWaitWrite is ultimately implemented by passing the file descriptor to
the list of write descriptors in a call to select(). It seems, however, that
select() is not waking up just because calling write() on a file descriptor
*would* cause sigPIPE.

The easiest way to confirm this case is probably to write a small, pure C
program and see what really happens.

If this is the case, then it means the only way to tell if the client has
abruptly dropped the connection is to actually try sending the data and see
if the sending function raises sigPIPE. And that means doing some sort of
polling/timeout?
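
One way the timeout idea could be sketched with what base already provides
is to wrap the wait in System.Timeout.timeout. The names waitWithin and
waitWritableWithin are made up for this example, and the choice of timeout
value is left entirely to the caller. This is a sketch of the idea, not a
claim about how sendfile should be fixed.

```haskell
import Control.Concurrent (threadWaitWrite)
import Data.Maybe (isJust)
import System.Posix.Types (Fd)
import System.Timeout (timeout)

-- | Run a blocking wait, giving up after the given number of
-- microseconds. True means the wait finished in time.
waitWithin :: Int -> IO () -> IO Bool
waitWithin micros wait = isJust <$> timeout micros wait

-- | Wait until the descriptor is writable, or give up after the
-- given number of microseconds. (Hypothetical helper name.)
waitWritableWithin :: Int -> Fd -> IO Bool
waitWritableWithin micros fd = waitWithin micros (threadWaitWrite fd)
```

A caller that gets False back would have to decide for itself whether to
retry the write or to treat the connection as dead and close the
descriptors.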

What do you think?

I do not have a good explanation as to why the portable version does not
fail. Except maybe it is just so slow that it does not ever fill up the
buffer, and hence does not get stuck in threadWaitWrite?

Anyway, the fundamental question is:

 When your write buffer is full, and you call select() on that file
descriptor, will select() return in the case where calling write() again
would raise sigPIPE?

- jeremy
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-11 Thread Thomas DuBuisson
Bardur Arantsson s...@scientician.net wrote:
 ...
       then do errno <- getErrno
               if errno == eAGAIN
                 then do
                    threadDelay 100
                    sendfile out_fd in_fd poff bytes
                 else throwErrno "Network.Socket.SendFile.Linux"
      else return (fromIntegral sbytes)

 That is, I removed the threadWaitWrite in favor of just adding a
 threadDelay 100 when eAGAIN is encountered.

 With this code, I cannot provoke the leak.

 Unfortunately this isn't really a solution -- the CPU is pegged at
 ~50% when I do this and it's not exactly elegant to have a hardcoded
 100 ms delay in there. :)

I don't think it matters wrt the desired final solution, but this is
NOT a 100 ms delay.  It is a 0.1 ms delay, which is less than a GHC
time slice and as such is basically a tight loop.  If you use a
reasonable value for the delay you will probably see the CPU being
almost completely idle.
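
To make the units explicit: threadDelay takes microseconds, so 100 means
0.0001 s. A small sketch (helper names are made up) that lets the call
site say what it means:

```haskell
import Control.Concurrent (threadDelay)

-- | Convert milliseconds to the microseconds that threadDelay expects.
millisToMicros :: Int -> Int
millisToMicros ms = ms * 1000

-- | Sleep for the given number of milliseconds.
sleepMillis :: Int -> IO ()
sleepMillis = threadDelay . millisToMicros
```

With this, a real 100 ms pause is sleepMillis 100, i.e. threadDelay 100000.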

Thomas


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-11 Thread Jeremy Shaw


On Feb 11, 2010, at 1:57 PM, Bardur Arantsson wrote:



>> 2. the remote client has terminated the connection as far as it is
>> concerned but not notified the server -- when you try to send data it
>> will reject it, and send/write/sendfile/etc will raise sigPIPE.
>>
>> Looking at your debug output, we are seeing the sigPIPE / Broken Pipe
>> error most of the time. But then there is the case where we get stuck
>> on the threadWaitWrite.
>>
>> threadWaitWrite is ultimately implemented by passing the file
>> descriptor to the list of write descriptors in a call to select(). It
>> seems, however, that select() is not waking up just because calling
>> write() on a file descriptor *would* cause sigPIPE.
>
> That's what I expect select() with an errfd FDSET would do.

Nope. The exceptfds are only triggered in esoteric conditions. For TCP
sockets, I think it only occurs if there is out-of-band data available
to be read via recv() with the MSG_OOB flag.


http://uw714doc.sco.com/en/SDK_netapi/sockC.OoBdata.html

>> The easiest way to confirm this case is probably to write a small,
>> pure C program and see what really happens.
>>
>> If this is the case, then it means the only way to tell if the client
>> has abruptly dropped the connection is to actually try sending the
>> data and see if the sending function raises sigPIPE. And that means
>> doing some sort of polling/timeout?
>
> Correct, but the trouble is deciding how often to poll and/or how
> long the timeout should be.
>
> I don't see any easy answer to that. That's why my suggested
> solution is to simply punt it to the OS (by using portable mode)
> and suck up the extra overhead of the portable solution. Hopefully
> the new GHC I/O manager will make it possible to have a proper
> solution.


The whole point of the sendfile library is to use sendfile(), so not
using sendfile() seems like the wrong solution. I am also not
convinced that the new GHC I/O manager will do anything new to make it
possible to have a proper solution. I believe we would be seeing the
same error even in pure C, so we need to know the workaround that
works in pure C as well. I am not convinced we are punting to the OS
by using portable mode either (more below).


>> I do not have a good explanation as to why the portable version does
>> not fail. Except maybe it is just so slow that it does not ever fill
>> up the buffer, and hence does not get stuck in threadWaitWrite?
>
> The portable version doesn't call threadWaitWrite. It simply turns
> the Socket into a handle (which causes it to become blocking) and
> so the kernel is tasked with handling all the gritty details.

The portable version does not directly call threadWaitWrite, but it
still calls it.


Data.ByteString.Char8.hPutStr calls
Data.ByteString.hPut which calls
Data.ByteString.hPutBuf which calls
System.IO.hPutBuf which calls
GHC.IO.Handle.Text.hPutBuf which calls
GHC.IO.Handle.Text.bufWrite which calls
GHC.IO.Device.write which calls
GHC.IO.FD.fdWrite which calls
GHC.IO.FD.writeRawBufferPtr

which is defined as:

writeRawBufferPtr :: String -> FD -> Ptr Word8 -> Int -> CSize -> IO CInt
writeRawBufferPtr loc !fd buf off len
  | isNonBlocking fd = unsafe_write -- unsafe is ok, it can't block
  | otherwise   = do r <- unsafe_fdReady (fdFD fd) 1 0 0
                     if r /= 0
                        then write
                        else do threadWaitWrite (fromIntegral (fdFD fd)); write
  where
    do_write call = fromIntegral `fmap`
                      throwErrnoIfMinus1RetryMayBlock loc call
                        (threadWaitWrite (fromIntegral (fdFD fd)))
    write         = if threaded then safe_write else unsafe_write
    unsafe_write  = do_write (c_write (fdFD fd) (buf `plusPtr` off) len)
    safe_write    = do_write (c_safe_write (fdFD fd) (buf `plusPtr` off) len)


According to the following test program, I expect that 'isNonBlocking  
fd' will be 'True'. So it seems like the portable solution should be  
vulnerable to the same condition. Perhaps the portable version is just  
so slow that the OS buffers never fill up so EAGAIN is never raised?


---

{-# LANGUAGE RecordWildCards #-}
module Main where

import Control.Concurrent (forkIO)
import Control.Monad (forever)
import Network (PortID(PortNumber), Socket, listenOn)
import Network.Socket (accept, socketToHandle)
import System.IO
import qualified GHC.IO.FD as FD
import GHC.IO.Handle.Internals (withHandle, flushWriteBuffer)
import GHC.IO.Handle.Types (Handle__(..), HandleType(..))
import qualified GHC.IO.FD as FD
import System.Posix.Types (Fd(..))
import System.IO.Error
import GHC.IO.Exception
import Data.Typeable (cast)
import GHC.IO.Handle.Internals (wantWritableHandle)

main =
  listen (PortNumber (toEnum 2525)) $ \s ->
     do h <- socketToHandle s ReadWriteMode
        wantWritableHandle "main" h $ \h_ -> showBlocking h_


showBlocking :: Handle__ -> IO ()

Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-10 Thread Jeremy Shaw

On Feb 9, 2010, at 6:47 PM, Thomas Hartman wrote:


Matt, have you seen this thread?

Jeremy, are you saying this is a bug in the sendfile library on hackage,
or something underlying?


I'm saying that the behavior of the sendfile library is buggy. But it  
could be due to something underlying..


Either threadWaitWrite is buggy and should be fixed, or
threadWaitWrite is doing the right thing and sendfile needs to be
modified somehow to account for the behavior. But I don't know which
is the case or how to implement a solution to either option.


- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-09 Thread Jeremy Shaw
On Sun, Feb 7, 2010 at 9:22 AM, Bardur Arantsson s...@scientician.net wrote:

True, it is perhaps technically not a bug, but it is certainly a misfeature
 since there is no easy way (at least AFAICT) to discover that something bad
 has happened for the file descriptor and act accordingly. AFAICT any
 solution would have to be based on a separate thread which either 1)
 checks the FD periodically somehow, or 2) simply lets the thread doing the
 threadWaitWrite time out after a set period of inactivity. Neither is very
 optimal.

 Either way, I'd certainly expect the sendfile library to work around this
 somehow such that this situation doesn't occur. I'm just having a hard time
 thinking up a good solution :).


Well, it is certainly a bug in sendfile that needs to be fixed. I'm not sure
how to fix it either. If we can simplify the test case, we can ask Simon
Marlow..

- jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-09 Thread Thomas Hartman
Matt, have you seen this thread?

Jeremy, are you saying this is a bug in the sendfile library on hackage,
or something underlying?

thomas.

2010/2/9 Jeremy Shaw jer...@n-heptane.com:
 On Sun, Feb 7, 2010 at 9:22 AM, Bardur Arantsson s...@scientician.net
 wrote:

 True, it is perhaps technically not a bug, but it is certainly a
 misfeature since there is no easy way (at least AFAICT) to discover that
 something bad has happened for the file descriptor and act accordingly.
 AFAICT any solution would have to be based on a separate thread which either
 1) checks the FD periodically somehow, or 2) simply lets the thread doing
 the threadWaitWrite time out after a set period of inactivity. Neither is
 very optimal.

 Either way, I'd certainly expect the sendfile library to work around this
 somehow such that this situation doesn't occur. I'm just having a hard time
 thinking up a good solution :).

 Well, it is certainly a bug in sendfile that needs to be fixed. I'm not sure
 how to fix it either. If we can simplify the test case, we can ask Simon
 Marlow..
 - jeremy


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-07 Thread Jeremy Shaw
It's not clear to me that this is actually a bug in threadWaitWrite.

I believe that under Linux, select() does not wake up just because the file
descriptor was closed. (Under Windows, and possibly Solaris/BSD/etc., it
does.) So this behavior might be consistent with normal Linux behavior.
However, it is clearly annoying that (a) the expected behavior is not
documented and (b) the behavior might be different under Linux than other OSes.

In some sense it is correct -- if the file descriptor is closed, then we
certainly can't write more to it -- so threadWaitWrite need not wake up.
But that leaves us with the issue of needing some way to be notified that
the file descriptor was closed so that we can clean up after ourselves.

- jeremy

On Sun, Feb 7, 2010 at 2:13 AM, Bardur Arantsson s...@scientician.net wrote:

 Bardur Arantsson wrote:

 Bardur Arantsson wrote:

 (sorry about replying-to-self)

  During yet another bout of debugging, I've added even more "I am here"
 instrumentation code to the SendFile code, and the culprit seems to be
 threadWaitWrite.


 As Jeremy Shaw pointed out off-list, the symptoms are also consistent
 with a thread that simply gets stuck in threadWaitWrite.

 I've tried a couple of different solutions to this based on starting a
 separate thread to enforce a timeout on threadWaitWrite (using throwTo).

 It seems to work to prevent the file descriptor leak, but causes GHC
 to segfault after a while. Probably some sort of other resource exhaustion
 since my code is just an evil hack:

  killer :: MVar () -> ThreadId -> IO ()
  killer dontKill otherThread = do
     threadDelay 5000
     x <- tryTakeMVar dontKill
     case x of
        Just _ -> putStrLn "Killer thread expired"
        Nothing -> throwTo otherThread (Overflow)

 where the relevant bit of sendfile reads:

     mtid <- myThreadId
     dontKill <- newEmptyMVar
     forkIO $ killer dontKill mtid
     threadWaitWrite out_fd
     putMVar dontKill ()

 So I'm basically creating a thread for every single threadWaitWrite
 operation (which is a lot in this case).

 Anyone got any ideas on a simpler way to handle this? Maybe I should just
 report a bug for threadWaitWrite? IMO threadWaitWrite really should
 throw some sort of IOException if the FD goes dead while it's waiting.

 I suppose an alternative way to try to work around this would be by
 forcing the output socket into blocking (as opposed to non-blocking)
 mode, but I can't figure out how to do this on GHC 6.10.x -- I only see
 setNonBlockingFD which doesn't take a parameter unlike its 6.12.x
 counterpart.


 Cheers,



Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-06 Thread Felipe Lessa
On Sat, Feb 06, 2010 at 09:16:35AM +0100, Bardur Arantsson wrote:
 Brandon S. Allbery KF8NH wrote:
 On Feb 5, 2010, at 02:56 , Bardur Arantsson wrote:
 [--snip--]
 
 Broken pipe is normally handled as a signal, and is only mapped
 to an error if SIGPIPE is set to SIG_IGN.  I can well imagine that
 the SIGPIPE signal handler isn't closing resources properly; a
 workaround would be to use the System.Posix.Signals API to ignore
 SIGPIPE, but I don't know if that would work as a general solution
 (it would depend on what other uses of pipes/sockets exist).

 It was a good idea, but it doesn't seem to help to add

   installHandler sigPIPE Ignore (Just fullSignalSet)

 to the main function. (Given the package name I assume
 System.Posix.Signals works similarly to regular old signals, i.e.
 globally per-process.)

 This is really starting to drive me round the bend...

Have you seen GHC ticket #1619?

http://hackage.haskell.org/trac/ghc/ticket/1619


 One further thing I've noticed: When compiling on my 64-bit machine,
 ghc issues the following warnings:

 Linux.hsc:41: warning: format ‘%d’ expects type ‘int’, but argument
 3 has type ‘long unsigned int’
 Linux.hsc:45: warning: format ‘%d’ expects type ‘int’, but argument
 3 has type ‘long unsigned int’
 Linux.hsc:45: warning: format ‘%d’ expects type ‘int’, but argument
 3 has type ‘long unsigned int’
 Linux.hsc:45: warning: format ‘%d’ expects type ‘int’, but argument
 3 has type ‘long unsigned int’

 Those lines are:

 39: -- max num of bytes in one send
 40: maxBytes :: Int64
 41: maxBytes = fromIntegral (maxBound :: (#type ssize_t))

 and

 44: foreign import ccall unsafe "sendfile64" c_sendfile
 45:   :: Fd -> Fd -> Ptr (#type off_t) -> (#type size_t) -> IO
 (#type ssize_t)

 This looks like a typical 32/64-bit problem, but normally I would
 expect any real run-time problems caused by a problematic conversion
 in the FFI to crash the whole process. Maybe I'm wrong about this...

To convert those '#' constants, the hsc2hs preprocessor constructs a
C file containing things like 'printf("%d", sizeof(ssize_t))'. It uses
the system's C compiler so that it does not have to encode the ABI of
every platform itself (which it would need in order to know the memory
layout of the structures).

So that message comes from that C file, not from your Haskell
one.  At runtime it really doesn't matter.

--
Felipe.


Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-06 Thread Thomas Hartman
me too.

2010/2/5 MightyByte mightyb...@gmail.com:
 I've been seeing a steady stream of similar resource vanished messages
 for as long as I've been running my happstack app.  This message I get
 is this:

 socket: 58: hClose: resource vanished (Broken pipe)

 I run my app from a shell script inside a while true loop, so it
 automatically gets restarted if it crashes.  This incurs no more than
 a few seconds of down time.  Since that is acceptable for my
 application, I've never put much effort into investigating the issue.
 But I don't think the resource vanished error results in program
 termination.  When I have looked into it, I've had similar trouble
 reproducing it.  Clients such as wget and firefox don't seem to cause
 the problem.  If I remember correctly it only happens with IE.

 On Fri, Feb 5, 2010 at 2:56 AM, Bardur Arantsson s...@scientician.net wrote:
 Jeremy Shaw wrote:

 Actually,

 We should start by testing if native sendfile leaks file descriptors even
 when the whole file is sent. We have a test suite, but I am not sure if it
 tests for file handle leaking...


 I should have posted this earlier, but the exact message I'm seeing in the
 case where the Bad Client disconnects is this:

   hums: Network.Socket.SendFile.Linux: resource vanished (Broken pipe)

 Oddly, I haven't been able to reproduce this using a wget client with a ^C
 during transfer. When I disconnect wget with ^C or pkill wget or even
 pkill -9 wget, I get this message:

  hums: Network.Socket.SendFile.Linux: resource vanished (Connection reset by
 peer)

 (and no leak, as observed by lsof | grep hums).

 So there appears to be some vital difference between the handling of the two
 cases.

 Another observation which may be useful:

 Before the sendfile' API change (Handle -> FilePath) in sendfile-0.6.x, my
 code used withFile to open the file and to ensure that it was closed. So
 it seems that withBinaryFile *should* also be fine. Unless the Broken pipe
 error somehow escapes the scope without causing a close.

 I don't have time to dig more right now, but I'll try to see if I can find
 out more later.

 Cheers,



Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-05 Thread MightyByte
I've been seeing a steady stream of similar resource vanished messages
for as long as I've been running my happstack app.  This message I get
is this:

socket: 58: hClose: resource vanished (Broken pipe)

I run my app from a shell script inside a while true loop, so it
automatically gets restarted if it crashes.  This incurs no more than
a few seconds of down time.  Since that is acceptable for my
application, I've never put much effort into investigating the issue.
But I don't think the resource vanished error results in program
termination.  When I have looked into it, I've had similar trouble
reproducing it.  Clients such as wget and firefox don't seem to cause
the problem.  If I remember correctly it only happens with IE.

On Fri, Feb 5, 2010 at 2:56 AM, Bardur Arantsson s...@scientician.net wrote:
 Jeremy Shaw wrote:

 Actually,

 We should start by testing if native sendfile leaks file descriptors even
 when the whole file is sent. We have a test suite, but I am not sure if it
 tests for file handle leaking...


 I should have posted this earlier, but the exact message I'm seeing in the
 case where the Bad Client disconnects is this:

   hums: Network.Socket.SendFile.Linux: resource vanished (Broken pipe)

 Oddly, I haven't been able to reproduce this using a wget client with a ^C
 during transfer. When I disconnect wget with ^C or pkill wget or even
 pkill -9 wget, I get this message:

  hums: Network.Socket.SendFile.Linux: resource vanished (Connection reset by
 peer)

 (and no leak, as observed by lsof | grep hums).

 So there appears to be some vital difference between the handling of the two
 cases.

 Another observation which may be useful:

 Before the sendfile' API change (Handle -> FilePath) in sendfile-0.6.x, my
 code used withFile to open the file and to ensure that it was closed. So
 it seems that withBinaryFile *should* also be fine. Unless the Broken pipe
 error somehow escapes the scope without causing a close.

 I don't have time to dig more right now, but I'll try to see if I can find
 out more later.

 Cheers,



Re: [Haskell-cafe] Re: sendfile leaking descriptors on Linux?

2010-02-05 Thread Brandon S. Allbery KF8NH

On Feb 5, 2010, at 02:56 , Bardur Arantsson wrote:
I should have posted this earlier, but the exact message I'm seeing  
in the case where the Bad Client disconnects is this:


  hums: Network.Socket.SendFile.Linux: resource vanished (Broken pipe)

Oddly, I haven't been able to reproduce this using a wget client  
with a ^C during transfer. When I disconnect wget with ^C or  
pkill wget or even pkill -9 wget, I get this message:


 hums: Network.Socket.SendFile.Linux: resource vanished (Connection  
reset by peer)


(and no leak, as observed by lsof | grep hums).



Broken pipe is normally handled as a signal, and is only mapped to  
an error if SIGPIPE is set to SIG_IGN.  I can well imagine that the  
SIGPIPE signal handler isn't closing resources properly; a workaround  
would be to use the System.Posix.Signals API to ignore SIGPIPE, but I  
don't know if that would work as a general solution (it would depend  
on what other uses of pipes/sockets exist).
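
For reference, a minimal sketch of the workaround described above, using
System.Posix.Signals from the unix package. The wrapper name ignoreSigPipe
is made up for this example. Whether ignoring SIGPIPE process-wide is
acceptable depends on the application, as noted.

```haskell
import System.Posix.Signals (Handler (Ignore), installHandler, sigPIPE)

-- | Ignore SIGPIPE for the whole process, so that a write to a closed
-- pipe or socket fails with EPIPE instead of delivering the signal.
-- (Hypothetical helper name.)
ignoreSigPipe :: IO ()
ignoreSigPipe = do
  -- The third argument is the set of signals to block while a handler
  -- runs. With Ignore no handler runs, so Nothing is passed here.
  _ <- installHandler sigPIPE Ignore Nothing
  return ()
```

It would be called once, early in main, before any sockets are opened.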


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH



