Send Beginners mailing list submissions to beginners@haskell.org To subscribe or unsubscribe via the World Wide Web, visit http://www.haskell.org/mailman/listinfo/beginners or, via email, send a message with subject or body 'help' to beginners-requ...@haskell.org
You can reach the person managing the list at beginners-ow...@haskell.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beginners digest..." Today's Topics: 1. Fwd: Implementing a spellchecker - problem with Data.HashTable performance (Rados?aw Szymczyszyn) ---------------------------------------------------------------------- Message: 1 Date: Fri, 20 Apr 2012 11:33:51 +0200 From: Rados?aw Szymczyszyn <lav...@gmail.com> Subject: [Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance To: beginners@haskell.org Message-ID: <CAG=dco3z3gh9fdgsikbts4riv1i8gqhvt+o_sgqsptmd9ms...@mail.gmail.com> Content-Type: text/plain; charset=UTF-8 Hello fellow Haskellers, It's my first post to this list, so at first: Greetings to everyone! I've wanted to pick up Haskell for some time and I've run in to a perfect (at least it seemed so) oocasion at my Natural Language Processing course. At first I tried using Python, but it turned out that some extra performance wouldn't hurt and I hoped Haskell would give me some boost without forcing me to sacrifice high-level expressiveness at the same time. The first lesson was to use ByteStrings instead of ordinary Strings for IO. Then I needed to build a dictionary of words found in the source text. That's the problematic part - I've tried using a few structures, namely Data.Map and Data.HashTable from base package and Data.HashMap from unordered-containers. In all cases I couldn't reach acceptable performance. As of that, I'd like to ask about some advice and comments about the following examples. First example [1] just builds a HashTable of ~1 500 000 ByteStrings. The program takes about 11 seconds to execute. Second example [2] does roughly the same, but taking the HashTable data from a file (just a dictionary of valid Polish forms). Reading data itself is not a problem, as I've measured the execution time of a program which just reads the file (and prints its last line to actually force the read, hence "handle" Haskell's laziness) and it takes no more than a second. Coupled together reading and building, however, takes unbearably long. I didn't wait till the program stopped and just terminated it, but it had been running for ~10 minutes already. The most disappointing fact is that a roughly equivalent (and rather effortless to write in contrast to the Haskell example - I'm just learning though) Python test which just reads everything into a dict takes about 10 seconds to execute (reading and updating the dict of course). Writing the spellchecker in Python seemed pretty straightforward for me (but its certainly not the only language I know; I've had some Java (who didn't nowadays), system level C and am daily working in Erlang). I understand of course, that the most convenient tool is generally the one you know the best, but my experience writing the Haskell equivalent reached both extremes - effortless expression of the logic, but frustration because of the unexplainable (for me at least) performance behaviour. The question is why is it so slow? (or AM I RILY DAT DUMB? ;) The link to examples: [1] https://gist.github.com/2411699 TestHashTableArtificial [2] https://gist.github.com/2411699 TestReadIntoHashTable I'm looking forward to somebody shedding some light on this. Thanks in advance! Best regards, Radek Szymczyszyn ------------------------------ _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners End of Beginners Digest, Vol 46, Issue 36 *****************************************