Jeremy Shaw wrote:
> Hello,
>
> I think to make progress on this bug we really need a failing test case that
> other people can reproduce.
>
> I have hacked up a small server that should reproduce the error (using fdWrite
> instead of sendfile). And a small C client which is intended to reproduce
> the error -- but doesn't.
>
> I have attached both.
>
> The server tries to write a whole lot of 'a' characters to the client. The
> client does not consume any of them. This causes the server to block on the
> threadWaitWrite.
>
> No matter how I kill the client, threadWaitWrite always wakes up.
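
For anyone who wants to look at the blocked-writer state outside the Haskell runtime, here is a rough C sketch of the same situation. It is only an analogy: `fill_send_buffer` and `wait_writable` are hypothetical helper names, not part of the attached server or of the sendfile library, and `poll` with `POLLOUT` stands in for what threadWaitWrite waits on. The sketch assumes a non-blocking stream socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write 'a' bytes to a non-blocking fd until the
   kernel send buffer is full. Returns the number of bytes queued, or -1
   on any error other than "buffer full". */
long fill_send_buffer(int fd)
{
    char chunk[4096];
    long total = 0;

    memset(chunk, 'a', sizeof chunk);
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) { total += n; continue; }
        if (n < 0 && errno == EINTR) continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) return total;
        return -1;
    }
}

/* Hypothetical helper: wait up to timeout_ms for fd to become writable.
   Returns 1 if writable, 0 if the timeout expired, -1 on a poll error or
   if the socket reports an error or hangup. An interrupted poll is
   restarted with the full timeout. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd p;
    int rc;

    p.fd = fd;
    p.events = POLLOUT;
    p.revents = 0;
    do {
        rc = poll(&p, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0) return rc;
    if (p.revents & (POLLERR | POLLHUP | POLLNVAL)) return -1;
    return (p.revents & POLLOUT) ? 1 : 0;
}
```

With a peer that never reads, `fill_send_buffer` followed by `wait_writable(fd, 100)` should time out and return 0. Once the peer's socket is closed cleanly, the same call should come back with -1 instead of waiting, which is the "always wakes up" case described above.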

Are you running the client and server on different physical machines? If so, have you tried simply yanking the connection?

Your client isn't dropping the connection hard -- if you kill the client (even with a -9) your OS cleans up any open sockets it has. On well-behaved OSes that cleanup usually involves properly shutting down the connection somehow. Different OSes have different ideas about what constitutes "properly shutting down the connection" -- and some simply don't do it.
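
One way a C test client could get a little closer to a hard drop is an abortive close: with SO_LINGER enabled and a zero linger time, close() discards unsent data and sends a RST instead of the usual FIN. The sketch below is only an assumption about what might be useful for testing. `close_with_rst` and `errno_after_peer_reset` are hypothetical names, and this covers only the "sends a RST" case, not a peer that vanishes without sending anything.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close fd abortively. SO_LINGER with l_onoff = 1
   and l_linger = 0 makes close() send a RST rather than a FIN. */
int close_with_rst(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;
    lg.l_linger = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0)
        return -1;
    return close(fd);
}

/* Hypothetical self-check over the loopback interface: accept one
   connection, reset it from the client side, and return the errno that
   recv() reports on the server side (0 if recv saw no error, -1 if the
   setup itself failed). */
int errno_after_peer_reset(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    char byte;
    int result = -1;
    int server = -1;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int client = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    if (listener >= 0 && client >= 0 &&
        bind(listener, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(listener, 1) == 0 &&
        getsockname(listener, (struct sockaddr *)&addr, &len) == 0 &&
        connect(client, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        (server = accept(listener, NULL, NULL)) >= 0) {
        close_with_rst(client);
        client = -1;
        result = recv(server, &byte, 1, 0) < 0 ? errno : 0;
    }
    if (client >= 0) close(client);
    if (server >= 0) close(server);
    if (listener >= 0) close(listener);
    return result;
}
```

On Linux I would expect `errno_after_peer_reset()` to return ECONNRESET, whereas a client that is merely killed normally shows up on the server side as an orderly end of stream.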

My hypothesis is that the PS3 doesn't properly shut down the connection, but simply sends a RST (or maybe a FIN) and drops any further packets. I'll do a Wireshark dump after posting this to see if I can see what it's doing at the TCP level -- I'm not optimistic about seeing the exact moment when the "leak" occurs, but maybe the general pattern can yield some useful ideas.

I have no idea how to test this without using an actual PS3.

> So, we need to figure out exactly what the PS3 is doing differently that causes
> threadWaitWrite to not wake up.

Does it matter? I can reproduce this reliably within a few minutes of testing.

Note that this doesn't happen *every* time the PS3 disconnects and reconnects, it just happens some of the time. It's enough to eat up MAX_FDs file descriptors in a few hours of playing media normally. If I do a lot of seeking (forces a disconnect+reconnect) through the movie, at least one file descriptor usually leaks within a few minutes.

> If we don't know why it is failing, then I
> don't think we can properly fix it.

I'm more pragmatic: If, after applying a fix, I cannot reproduce this problem within a few hours (or so) of running my media server, I'd say it's fixed. As long as the modifications to the sendfile library don't change its behavior in other ways, I don't see the problem.

P.S. Does anyone else out there have a PS3 to test with?

