On Feb 21, 2010, at 11:50 AM, Donn Cave wrote:
> The problem is that this definition of `closed' is, precisely,
> `failed to respond within 2 seconds.' If there is no observable
> difference between a connection that has been abandoned by the PS3,
> and a connection that just suffered a momentary lapse, then there's
> no way to catch the former without making connections more fragile.
No. (I think)
What happens is the PS3 has closed the connection, and if you attempt
to send any more packets the PS3 will tell you it has closed the
connection and the write() / sendfile() call will raise SIGPIPE.
The problem is we never try to send those packets, because we are
sitting at threadWaitWrite waiting to write -- and there is nothing
that is going to happen that will cause that call to select() (by
threadWaitWrite) to actually wake up.
I believe the proposal is to add a 2 second time out to the
threadWaitWrite call. If it wakes up and can't write (because the
remote side has lost connections, etc) then it will just go back to
sleep. But if it wakes up, tries to write, and then gets SIGPIPE, then
it knows the connection is actually dead and will clean up after itself.
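
A minimal sketch of that proposal, as I understand it, is below. It is
not the actual sendfile code: WaitResult, waitWriteTimeout, writeLoop
and the tryWrite action are hypothetical names made up for
illustration. It assumes the write attempt is passed in as an IO Bool
that returns True once everything has been sent, returns False if the
write would block, and throws if the peer has closed the connection.
Only threadWaitWrite and System.Timeout.timeout are real library calls.

```haskell
import Control.Concurrent (threadWaitWrite)
import Control.Monad (unless)
import System.Posix.Types (Fd)
import System.Timeout (timeout)

-- Outcome of one bounded wait (hypothetical type, illustration only).
data WaitResult = Writable | TimedOut
  deriving (Eq, Show)

-- Wait for the descriptor to become writable, but give up after
-- `micros` microseconds instead of sleeping forever.
waitWriteTimeout :: Int -> Fd -> IO WaitResult
waitWriteTimeout micros fd = do
  r <- timeout micros (threadWaitWrite fd)
  return (maybe TimedOut (const Writable) r)

-- After every wake-up (writable or timed out) attempt the write.
-- `tryWrite` is an assumed caller-supplied action: True means all
-- data has been sent, False means the write would block. If the peer
-- has closed the connection, `tryWrite` is expected to throw, which
-- ends the loop and lets the caller clean up.
writeLoop :: Int -> Fd -> IO Bool -> IO ()
writeLoop micros fd tryWrite = do
  _ <- waitWriteTimeout micros fd
  done <- tryWrite
  unless done (writeLoop micros fd tryWrite)
```

With a 2 second timeout this would be called as something like
writeLoop 2000000 fd attempt. A connection that is merely slow just
goes around the loop again, so the timeout by itself never declares a
connection dead; only a failed write does.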
The problem is that we have not successfully figured out what is
causing this issue in the first place.
I wrote a Haskell server and a C client to try to emulate the
situation which causes threadWaitWrite to never wake up, but I could
not actually get that to happen. So far the PS3 client is the only
thing that causes it.
I think that applying a fix without really understanding the problem
is asking for trouble.
Among other things, since the problem is with threadWaitWrite (not
sendfile), then the same issue ought to exist when we are calling
hPutStr, etc, since they ultimately call threadWaitWrite as well. If
hPut never has this problem, then we should understand why and use the
same solution for sendfile. If hPut does have this problem, then
fixing just sendfile isn't much of a solution.
So far there is:
- no way for anyone besides Bardur to reproduce the problem
- no sound explanation for why the PS3 client causes the error, but
nothing else does
- no proof that this error does or does not affect all the normal I/O
functions in Haskell (hPut, etc).
- jeremy