[Haskell-cafe] Project postmortem

Joel Reymont Thu, 17 Nov 2005 05:43:28 -0800

Folks,

I have done a lot of experiments over the past few weeks and came to a few interesting conclusions. First some background, then issues, solutions and conclusions.

I wrote a test harness for a poker server that understands the different binary packets and can send and receive them. The harness launches each "script" in a separate unbound thread that connects to the server via TCP and does its work.
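
The thread-per-script layout can be sketched as follows. This is a minimal illustration, not the harness code: the names Script, launch and runScripts are hypothetical, and a real script would open a TCP connection instead of just returning a value.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException)

-- Hypothetical script type: any IO action that produces a result.
type Script a = IO a

-- Start one script on a lightweight (unbound) thread. The MVar is
-- filled with the script's result, or with the exception that ended it.
launch :: Script a -> IO (MVar (Either SomeException a))
launch script = do
  done <- newEmptyMVar
  _ <- forkFinally script (putMVar done)
  return done

-- Launch every script first, then wait for each one to finish.
-- A failing script yields a Left and does not affect the others.
runScripts :: [Script a] -> IO [Either SomeException a]
runScripts scripts = mapM launch scripts >>= mapM takeMVar
```

Unbound threads created this way are scheduled by the GHC runtime rather than mapped one-to-one onto OS threads, which is what makes a few thousand of them plausible.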

The main goals of the project were: easy scripting, a very high number of connections from the harness (a few thousand) and running on Windows. I develop on Mac OSX but have a Windows machine for testing and to run the poker server.

Another key goal was to support the server encryption. SSL encryption is done in a weird way that requires attaching read/write OpenSSL BIOs to the SSL descriptor so that SSL encrypts to/from memory. Encrypted chunks are then taken from the BIOs and sent as payload in server packets.

Overall, I probably spent about 4 weeks writing the server and about 2 more weeks grappling with the various issues. The issues centered around 1) the program trashing memory like no tomorrow, 2) intermittent crashes on Windows and 3) not being able to launch a high number of connections on Windows before crashing.

I significantly reduced the trashing of memory by switching to plain Haskell structures from nested lists of wxHaskell-style properties (attr := value). Intermittent crashes were harder to troubleshoot, especially given that things were running smoothly on Mac OSX.
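
To make the two representations concrete, here is a small sketch. It is only loosely modelled on the wxHaskell (attr := value) style; the Login packet and the names Attr, Prop, nameA, seatA and set are all hypothetical and not taken from the harness.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A hypothetical packet as a plain Haskell record.
data Login = Login { name :: String, seat :: Int } deriving (Eq, Show)

-- Property-list style: an attribute knows how to write one field of w.
newtype Attr w a = Attr (a -> w -> w)

-- A property pairs an attribute with a value of the matching type.
data Prop w = forall a. (Attr w a) := a
infixr 0 :=

nameA :: Attr Login String
nameA = Attr (\v w -> w { name = v })

seatA :: Attr Login Int
seatA = Attr (\v w -> w { seat = v })

-- Apply a single property to a value.
apply :: w -> Prop w -> w
apply w (Attr f := v) = f v w

-- Apply a list of properties from left to right.
set :: w -> [Prop w] -> w
set = foldl apply
```

With the property list, set (Login "" 0) [nameA := "joel", seatA := 3] builds a list cell, a property and a record update per field, whereas Login "joel" 3 is a single constructor application. That extra allocation is one plausible reason the plain structures were lighter on memory.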

Stack traces pointed into libcrypto (part of OpenSSL) and thus to the BIOs that I was allocating. I guessed that OpenSSL was maxing out some resources and closed the leak by explicitly freeing the SSL descriptor which freed the associated BIO structures. Then things got weirder as my program started crashing in a different place entirely with stack traces like this:


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x3139322e
0x0027c174 in s8j1_info ()
(gdb) where
#0  0x0027c174 in s8j1_info ()
#1  0x0021c9f4 in StgRunIsImplementedInAssembler () at StgCRun.c:576

#2  0x0021cdc4 in schedule (mainThread=0x1100360, initialCapability=0x308548) at Schedule.c:932
#3  0x0021dd6c in waitThread_ (m=0x1100360, initialCapability=0x0) at Schedule.c:2156
#4  0x0021dc50 in scheduleWaitThread (tso=0x13c0000, ret=0x0, initialCapability=0x0) at Schedule.c:2050

#5  0x00219548 in rts_evalLazyIO (p=0x29b47c, ret=0x0) at RtsAPI.c:459
#6  0x001e4768 in main (argc=2262116, argv=0x308548) at Main.c:104

I took waitThread_ as a clue and started digging deeper.

Whenever I connect to the server or send a command I wait for X seconds and if I am not connected or the desired command is not received I throw an exception which fails the script. I implemented the timeout combinator a couple of different ways, including that in the Asynchronous Exceptions paper but it did not help. I think the issue has to do with killing threads that are using FFI. Although I'm killing threads that call the Haskell connectTo, hGetBuf, etc. I think it's still FFI.
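
For readers unfamiliar with the idea, a timeout combinator of this kind can be sketched with timeout from System.Timeout in the base library. This is not the code from the harness or from the paper; the names ScriptTimeout, within and demo are hypothetical.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (Exception, throwIO, try)
import System.Timeout (timeout)

-- Hypothetical exception used to fail a script that waited too long.
data ScriptTimeout = ScriptTimeout String deriving (Eq, Show)
instance Exception ScriptTimeout

-- Run an action and fail the script if it has not finished within
-- the given number of seconds. timeout itself takes microseconds.
within :: Int -> String -> IO a -> IO a
within secs what action = do
  r <- timeout (secs * 1000000) action
  case r of
    Just x  -> return x
    Nothing -> throwIO (ScriptTimeout what)

-- Usage: a 3 second wait under a 1 second limit ends in ScriptTimeout.
demo :: IO (Either ScriptTimeout ())
demo = try (within 1 "slow reply" (threadDelay 3000000))
```

timeout works by throwing an asynchronous exception at the running action, and GHC does not deliver such an exception to a thread while it is inside a foreign call, only once the call returns. That is consistent with the suspicion that FFI is involved, though it does not by itself explain the crashes.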

I disposed of timeouts entirely, leaving connectTo as it is and using hWaitForInput on my socket handle to simulate timeouts. This improved things tremendously and I'm now able to run a few thousand unbound script threads on Windows with OpenSSL FFI and everything.
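
The hWaitForInput approach can be sketched like this. The helper name recvLineWithin is hypothetical, and it reads a text line only to keep the example short; the harness reads binary packets with hGetBuf.

```haskell
import System.IO

-- Wait up to the given number of milliseconds for input on the handle.
-- Nothing signals a timeout; no thread is killed to produce it.
-- hWaitForInput raises an EOF error if the handle is at end of file,
-- and hGetLine still blocks until a complete line has arrived.
recvLineWithin :: Int -> Handle -> IO (Maybe String)
recvLineWithin millis h = do
  ready <- hWaitForInput h millis
  if ready
    then Just <$> hGetLine h
    else return Nothing
```

The difference from a timeout combinator is that the waiting thread itself decides to give up, so no asynchronous exception is thrown at a thread that may be in the middle of an I/O call.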

Memory usage is still higher than I would have liked and crashes in OpenSSL still happen when the number of threads/memory usage is really high so there's still room for improvement. I should probably go back to using a foreign finalizer (SSL_free) on the SSL descriptors rather than freeing them explicitly as the freeing does not happen if a script fails mid-way.
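
The foreign-finalizer pattern looks roughly like the sketch below. To keep it runnable without OpenSSL, a malloc'd word stands in for the SSL descriptor and C free() stands in for SSL_free; the names Session, newSession and readSession are hypothetical. With OpenSSL, the finalizer would presumably be a pointer to SSL_free obtained through a foreign import of "&SSL_free".

```haskell
import Data.Word (Word32)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, malloc)
import Foreign.Storable (peek, poke)

-- Stand-in for an SSL descriptor: one malloc'd 32-bit word.
newtype Session = Session (ForeignPtr Word32)

-- Allocate the resource and attach C free() as its finalizer.
-- The finalizer runs once the garbage collector finds the Session
-- unreachable, including when a script has died with an exception.
newSession :: Word32 -> IO Session
newSession v = do
  p <- malloc
  poke p v
  Session <$> newForeignPtr finalizerFree p

-- withForeignPtr keeps the resource alive while the pointer is in use.
readSession :: Session -> IO Word32
readSession (Session fp) = withForeignPtr fp peek
```

The trade-off is that finalizers run at the garbage collector's convenience, so scarce resources may be held longer than with an explicit free.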

I'm quite satisfied with my first Haskell project. I love Haskell and will continue hacking away with it. This list is invaluable in the depth of offered help whereas #haskell (IRC) is invaluable when speed matters. I'm quite amazed at the things I have been able to do, the expressiveness of Haskell and the clean looks.

Clean looks can be deceptive, though, as they can hide code of amazing complexity. Fundeps, existential types, HList take a while to grasp. Also, I feel somewhat like a pioneer and I definitely got more than a fair share of arrows in my back.

I had GHC run out of memory during compilation (fixed by SPJ), had it quit midway during compilation with an error about generated extents being too large in assembler code. I had GHC crash at runtime with an error like "fromJust not returning Just, this could not be happening!". Yesterday's error topped them all:


internal error: update_fwd: unknown/strange object  0
   Please report this as a bug to glasgow-haskell-bugs@haskell.org,
   or http://www.sourceforge.net/projects/ghc/

I think I got this when using +RTS -C0 -c.

Overall, the experience with Haskell has been exhilarating and I'm already preparing to use it on my next projects like detecting collusion in poker as well as rake optimization (Dazzle paper very helpful here!). Still, I think that GHC can be a bit rough around the edges and I would think twice about writing high-performance network apps with it.


        Thanks, Joel

P.S. The Glasgow Distributed Haskell (GdH) people are supposed to have a mailing list and I would love to share my findings with them but I could not find the mailing list itself.


--
http://wagerlabs.com/





_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
