Hi Aleks,

On Wed, Jun 1, 2011 at 12:14 AM, Aleksandar Dimitrov
<aleks.dimit...@googlemail.com> wrote:
> I implemented your method, with these minimal changes (i.e. just using a main
> driver in the same file.)
>
>> countUnigrams :: Handle -> IO (M.Map S.ByteString Int)
>> countUnigrams = foldLines (\ m s -> M.insertWith (+) s 1 m) M.empty
>>
>> main :: IO ()
>> main = do (f:_) <- getArgs
>>           openFile f ReadMode >>= countUnigrams >>= print . M.toList
>
> It seems to perform about 3x worse than the iteratee method in terms of time,
> and worse in terms of space :-( On Brandon's War & Peace example, hGetLine 
> uses
> 1.565 seconds for the small file, whereas my iteratee method uses 1.085s for 
> the
> small file, and around 2 minutes for the large file.

That's curious. I chatted with Duncan Coutts today and he mentioned
that hGetLine can be a bit slow as it needs to take a lock in every
read and causes some copying, which could explain why it's slower than
iteratee which works in blocks. However, I don't understand why it
uses more memory. The ByteStrings that are returned by hGetLine should
have an underlying storage of the same size as the ByteString (as
reported by length). You can try to verify this by calling 'copy' on
the ByteString before inserting it.

It looks like hGetLine needs some love.

> I also tried sprinkling strictness annotations throughout your above code, 
> but I
> failed to produce good results :-(

The strictness of the code I gave should be correct. The problem
should be elsewhere.

> I, unfortunately, don't really have any contact to "the elders," apart from 
> what
> I read on their respective blogs…

You and everyone else. :) I just spent enough time talking to people
on IRC, reading good code, blogs and mailing list posts. I think Bryan
described the process pretty well in his CUFP keynote:

http://www.serpentine.com/blog/2009/09/23/video-of-my-cufp-keynote/

Cheers,
Johan

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to