Send Beginners mailing list submissions to beginners@haskell.org To subscribe or unsubscribe via the World Wide Web, visit http://www.haskell.org/mailman/listinfo/beginners or, via email, send a message with subject or body 'help' to beginners-requ...@haskell.org
You can reach the person managing the list at beginners-ow...@haskell.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beginners digest..." Today's Topics: 1. Re: Fwd: Implementing a spellchecker - problem with Data.HashTable performance (Karol Samborski) 2. Re: Fwd: Implementing a spellchecker - problem with Data.HashTable performance (Brent Yorgey) 3. Re: Two questions for a production project... (umptious) 4. Re: Fwd: Implementing a spellchecker - problem with Data.HashTable performance (umptious) 5. Re: Fwd: Implementing a spellchecker - problem with Data.HashTable performance (Karol Samborski) 6. Re: Training tasks (umptious) ---------------------------------------------------------------------- Message: 1 Date: Fri, 20 Apr 2012 12:30:38 +0200 From: Karol Samborski <edv.ka...@gmail.com> Subject: Re: [Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance To: Rados?aw Szymczyszyn <lav...@gmail.com> Cc: beginners@haskell.org Message-ID: <CACe2dTt7vt2q3ypbaB+XU35CU=t1jn7bnucrzzcf5gwfzib...@mail.gmail.com> Content-Type: text/plain; charset=UTF-8 W dniu 20 kwietnia 2012 11:33 u?ytkownik Rados?aw Szymczyszyn <lav...@gmail.com> napisa?: > Hello fellow Haskellers, > > It's my first post to this list, so at first: Greetings to everyone! > > I've wanted to pick up Haskell for some time and I've run in > to a perfect (at least it seemed so) oocasion at my Natural Language > Processing course. At first I tried using Python, but it turned out > that some extra performance wouldn't hurt and I hoped Haskell would > give me some boost without forcing me to sacrifice high-level > expressiveness at the same time. > > The first lesson was to use ByteStrings instead of ordinary Strings > for IO. Then I needed to build a dictionary of words found in the > source text. That's the problematic part - I've tried using a few > structures, namely Data.Map and Data.HashTable from base package and > Data.HashMap from unordered-containers. In all cases I couldn't reach > acceptable performance. As of that, I'd like to ask about some advice > and comments about the following examples. > > First example [1] just builds a HashTable of ~1 500 000 ByteStrings. > The program takes about 11 seconds to execute. Second example [2] does > roughly the same, but taking the HashTable data from a file (just a > dictionary of valid Polish forms). Reading data itself is not a > problem, as I've measured the execution time of a program which just > reads the file (and prints its last line to actually force the read, > hence "handle" Haskell's laziness) and it takes no more than a second. > Coupled together reading and building, however, takes unbearably long. > I didn't wait till the program stopped and just terminated it, but it > had been running for ~10 minutes already. > > The most disappointing fact is that a roughly equivalent (and rather > effortless to write in contrast to the Haskell example - I'm just > learning though) Python test which just reads everything into a dict > takes about 10 seconds to execute (reading and updating the dict of > course). Writing the spellchecker in Python seemed pretty > straightforward for me (but its certainly not the only language I > know; I've had some Java (who didn't nowadays), system level C and am > daily working in Erlang). I understand of course, that the most > convenient tool is generally the one you know the best, but my > experience writing the Haskell equivalent reached both extremes - > effortless expression of the logic, but frustration because of the > unexplainable (for me at least) performance behaviour. > > The question is why is it so slow? (or AM I RILY DAT DUMB? ;) > > The link to examples: > [1] https://gist.github.com/2411699 TestHashTableArtificial > [2] https://gist.github.com/2411699 TestReadIntoHashTable > > I'm looking forward to somebody shedding some light on this. Thanks in > advance! > > Best regards, > Radek Szymczyszyn Hi! Did you try with -O2? I've created file "formy.utf8" filled with 1400000 lines of word "?y?niejszymi" and here is my output: $ ghc --make -O2 TestReadIntoHashTable.hs -o test -rtsopts $ time ./test +RTS -K100m -RTS 1400000 real 0m3.265s user 0m3.097s sys 0m0.143s Best, Karol Samborski ------------------------------ Message: 2 Date: Fri, 20 Apr 2012 07:52:56 -0400 From: Brent Yorgey <byor...@seas.upenn.edu> Subject: Re: [Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance To: beginners@haskell.org Message-ID: <20120420115256.ga22...@seas.upenn.edu> Content-Type: text/plain; charset=utf-8 I seem to recall that Data.HashTable is quite slow and not recommended for use. Try using Data.HashMap.Lazy/Strict from the unordered-containers package instead. -Brent On Fri, Apr 20, 2012 at 11:33:51AM +0200, Rados?aw Szymczyszyn wrote: > Hello fellow Haskellers, > > It's my first post to this list, so at first: Greetings to everyone! > > I've wanted to pick up Haskell for some time and I've run in > to a perfect (at least it seemed so) oocasion at my Natural Language > Processing course. At first I tried using Python, but it turned out > that some extra performance wouldn't hurt and I hoped Haskell would > give me some boost without forcing me to sacrifice high-level > expressiveness at the same time. > > The first lesson was to use ByteStrings instead of ordinary Strings > for IO. Then I needed to build a dictionary of words found in the > source text. That's the problematic part - I've tried using a few > structures, namely Data.Map and Data.HashTable from base package and > Data.HashMap from unordered-containers. In all cases I couldn't reach > acceptable performance. As of that, I'd like to ask about some advice > and comments about the following examples. > > First example [1] just builds a HashTable of ~1 500 000 ByteStrings. > The program takes about 11 seconds to execute. Second example [2] does > roughly the same, but taking the HashTable data from a file (just a > dictionary of valid Polish forms). Reading data itself is not a > problem, as I've measured the execution time of a program which just > reads the file (and prints its last line to actually force the read, > hence "handle" Haskell's laziness) and it takes no more than a second. > Coupled together reading and building, however, takes unbearably long. > I didn't wait till the program stopped and just terminated it, but it > had been running for ~10 minutes already. > > The most disappointing fact is that a roughly equivalent (and rather > effortless to write in contrast to the Haskell example - I'm just > learning though) Python test which just reads everything into a dict > takes about 10 seconds to execute (reading and updating the dict of > course). Writing the spellchecker in Python seemed pretty > straightforward for me (but its certainly not the only language I > know; I've had some Java (who didn't nowadays), system level C and am > daily working in Erlang). I understand of course, that the most > convenient tool is generally the one you know the best, but my > experience writing the Haskell equivalent reached both extremes - > effortless expression of the logic, but frustration because of the > unexplainable (for me at least) performance behaviour. > > The question is why is it so slow? (or AM I RILY DAT DUMB? ;) > > The link to examples: > [1] https://gist.github.com/2411699 TestHashTableArtificial > [2] https://gist.github.com/2411699 TestReadIntoHashTable > > I'm looking forward to somebody shedding some light on this. Thanks in > advance! > > Best regards, > Radek Szymczyszyn > > _______________________________________________ > Beginners mailing list > Beginners@haskell.org > http://www.haskell.org/mailman/listinfo/beginners ------------------------------ Message: 3 Date: Fri, 20 Apr 2012 14:38:34 +0100 From: umptious <umpti...@gmail.com> Subject: Re: [Haskell-beginners] Two questions for a production project... To: Michael Orlitzky <mich...@orlitzky.com> Cc: beginners@haskell.org Message-ID: <cae20bnv0b+wweshuclzap08xrcxhd1bh_1i1_3u-aas2qwx...@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" This is the sort of question you might want to ask on Haskell cafe too. But - This really comes down to whether the GC can trusted, and I would astonished if it was otherwise. It would make writing web frameworks in Haskell rather pointless. (Otoh I can't find a list of real world users for Yesod etc.) - A lot of people use XMonad to manage their desktops and just leave their machines on *forever.* - If Haskell does prove to be unsuitable (which is very unlikely) then Erlang is a safe bet. It's about the most reliable language there is - it was designed for telecoms. It's much simpler to learn than Haskell, although less funky (eg no lazy evaluation) and closer to a scripting language in speed. Get a copy of Programming *Erlang*: Software for a Concurrent World by Joe *Armstrong* and with even minimal Haskell experience you should be writing code straight away. On 20 April 2012 03:43, Michael Orlitzky <mich...@orlitzky.com> wrote: > On 04/19/2012 07:40 PM, Mike Meyer wrote: > > I've got a project coming up that I could use haskell for, providing I > > can convince myself that it's appropriate. The critical question is: > > > > Anyone using Haskell in a long-running production application? I'm > > talking about something where the program would run for weeks or > > months non-stop, with either multiple threads or multiple copies. > > I have a rather stupid application that parses feeds from the Twitter > API. For each username given on the command line, it calls forkIO and > the new thread enters a recursive loop: > > recurse x y z = do > ... > recurse x y z > > forever and ever. > > Contrary to my expectations (I still basically don't know what I'm doing > in Haskell) it seems to run in constant space for months at a time. > > _______________________________________________ > Beginners mailing list > Beginners@haskell.org > http://www.haskell.org/mailman/listinfo/beginners > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.haskell.org/pipermail/beginners/attachments/20120420/8768de00/attachment-0001.htm> ------------------------------ Message: 4 Date: Fri, 20 Apr 2012 14:45:25 +0100 From: umptious <umpti...@gmail.com> Subject: Re: [Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance Cc: beginners@haskell.org Message-ID: <cae20bnv3fex8etersm2ekl-5vzuvw5z8vrhwz3sakgyx4wt...@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" While I think Misters Samborski and Yorgey have fixed the performance problem, I'm still curious about the apparent synergy effect: > First example [1] just builds a HashTable of ~1 500 000 ByteStrings. > > The program takes about 11 seconds to execute. > > > Reading data itself is not a > > problem, as I've measured the execution time of a program which just > > reads the file (and prints its last line to actually force the read, > > hence "handle" Haskell's laziness) and it takes no more than a second. > > > Coupled together reading and building, however, takes unbearably long. > > I didn't wait till the program stopped and just terminated it, but it > > had been running for ~10 minutes already. > ??? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.haskell.org/pipermail/beginners/attachments/20120420/474e2415/attachment-0001.htm> ------------------------------ Message: 5 Date: Fri, 20 Apr 2012 16:01:53 +0200 From: Karol Samborski <edv.ka...@gmail.com> Subject: Re: [Haskell-beginners] Fwd: Implementing a spellchecker - problem with Data.HashTable performance To: umptious <umpti...@gmail.com> Cc: beginners@haskell.org Message-ID: <cace2dtuhrobegtxuuq1qrql1b8b8rtznpttcepdnt4cvvn0...@mail.gmail.com> Content-Type: text/plain; charset=UTF-8 W dniu 20 kwietnia 2012 15:45 u?ytkownik umptious <umpti...@gmail.com> napisa?: > While I think Misters Samborski and Yorgey have fixed the performance > problem, I'm still curious about the apparent synergy effect: > Hi, I wrote about the example program of that effect last time. With -O2 flag it took 3 sek. Best, Karol Samborski ------------------------------ Message: 6 Date: Fri, 20 Apr 2012 15:02:21 +0100 From: umptious <umpti...@gmail.com> Subject: Re: [Haskell-beginners] Training tasks To: beginners@haskell.org Message-ID: <CAE20bNuGd+MaWL6Jme8zqo33ePUo-tKi82ZO-=xwggcxcac...@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" On 19 April 2012 20:20, Michael Litchard <mich...@schmong.org> wrote: > Hi, I noticed one of your suggestions > > > - A version of the classic Traveller/Elite trading system - a really nice > > little toy business rules system that can be code either minimally or to > > show off the funkier features of a language to get more flexibility and > more > > interesting pricing rules. Minimal version would be a few tens of lines > and > > the config file declaring item > > I couldn't find anything on Elite, could you give me a reference? > I found a reference concerning Traveller, but if you had anything in > mind, it may be helpful. > > Thanks :) > If you download Oolite, available from most Linux repositories, it should have the same trading rules as the original Elite (which it stole from the Traveller pen and paper game.) There's a space trading game for the Palm Pilot that has improved more interesting rules - maybe you can find a Palm emulator if you don't have one? - http://en.wikipedia.org/wiki/Space_Trader_%28Palm_OS%29 Basically, you like goods that can be traded and you have worlds to sell them on. Worlds have government types and economic types - and in more complex variants ecologies and events - that affect the price and supply of goods. Eg Fish will be cheap on a Farming world with an Ocean environment, but expensive on an Industrial Desert world especially if it is Rich; Guns will sell for more if a War is going on; Survival Equipment and any Food Item will sell for more in the aftermath of a disaster. You can also model legality based on a Government type. The challenge is to get a system that allows trade rules to be concise and expressive while being written in a form that can be edited by a non-programmer - the config can be an .hs file, but it should be one that even a non-coder can understand and edit. It's a nice problem because it's scalable (you increase the complexity of the rules) and it has a lot of characteristics of real business systems. I knocked out the core of a solution last night - together with the recursive Shape Editor it's one of the best language test programs I know - and should have something nice around Monday if you want a copy. - Jay -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://www.haskell.org/pipermail/beginners/attachments/20120420/e5675244/attachment.htm> ------------------------------ _______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners End of Beginners Digest, Vol 46, Issue 37 *****************************************