On Thu, Sep 18, 2008 at 1:55 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
> wchogg:
>> On Thu, Sep 18, 2008 at 1:29 PM, Don Stewart <[EMAIL PROTECTED]> wrote:
>> > wchogg:
>> >> Hey Haskell,
>> >> So for a fairly inane reason, I ended up taking a couple of minutes
>> >> and writing a program that would spit out, to the console, the number
>> >> of lines in a file. Off the top of my head, I came up with this which
>> >> worked fine with files that had 100k lines:
>> >>
>> >> main = do
>> >>     path <- liftM head $ getArgs
>> >>     h <- openFile path ReadMode
>> >>     n <- execStateT (countLines h) 0
>> >>     print n
>> >>
>> >> untilM :: Monad m => (a -> m Bool) -> (a -> m ()) -> a -> m ()
>> >> untilM cond action val = do
>> >>     truthy <- cond val
>> >>     if truthy then return () else action val >> (untilM cond action val)
>> >>
>> >> countLines :: Handle -> StateT Int IO ()
>> >> countLines = untilM (\h -> lift $ hIsEOF h) (\h -> do
>> >>     lift $ hGetLine h
>> >>     modify (+1))
>> >>
>> >> If this makes anyone cringe or cry "you're doing it wrong", I'd
>> >> actually like to hear it. I never really share my projects, so I
>> >> don't know how idiosyncratic my style is.
>> >
>> > This makes me cry.
>> >
>> > import System.Environment
>> > import qualified Data.ByteString.Lazy.Char8 as B
>> >
>> > main = do
>> >     [f] <- getArgs
>> >     s <- B.readFile f
>> >     print (B.count '\n' s)
>> >
>> > Compile it.
>> >
>> > $ ghc -O2 --make A.hs
>> >
>> > $ time ./A /usr/share/dict/words
>> > 52848
>> > ./A /usr/share/dict/words 0.00s user 0.00s system 93% cpu 0.007 total
>> >
>> > Against standard tools:
>> >
>> > $ time wc -l /usr/share/dict/words
>> > 52848 /usr/share/dict/words
>> > wc -l /usr/share/dict/words 0.01s user 0.00s system 88% cpu 0.008 total
>>
>> So both you & Bryan do essentially the same thing and of course both
>> versions are far better than mine. So the purpose of using the Lazy
>> version of ByteString was so that the file is only incrementally
>> loaded by readFile as count is processing?
>
> Yep, that's right
>
> The streaming nature is implicit in the lazy bytestring. It's kind of
> the dual of explicit chunkwise control -- chunk processing reified into
> the data structure.
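To make "chunk processing reified into the data structure" concrete, here is a minimal sketch of a hand-written consumer that uses only the public bytestring API. It assumes `toChunks` (exported from Data.ByteString.Lazy and its Char8 variant), which exposes a lazy ByteString as a lazily produced list of strict chunks; a strict left fold over that list then walks the file chunk by chunk without reaching into Data.ByteString.Lazy.Internal. The name `countNewlines` is hypothetical, not part of the library.

```haskell
import qualified Data.ByteString.Char8 as S        -- strict chunks
import qualified Data.ByteString.Lazy.Char8 as L   -- lazy, chunked stream
import Data.List (foldl')
import System.Environment (getArgs)

-- Hypothetical helper: count '\n' by folding over the strict chunks of a
-- lazy ByteString. toChunks yields chunks on demand and foldl' consumes them
-- one at a time, so (as long as nothing else holds on to the input) only a
-- chunk or so needs to be in memory at once.
countNewlines :: L.ByteString -> Int
countNewlines = foldl' step 0 . L.toChunks
  where
    -- S.count is the strict per-chunk counter; we just sum its results.
    step acc chunk = acc + S.count '\n' chunk

main :: IO ()
main = do
  [f] <- getArgs          -- expects exactly one file argument, like Don's version
  s <- L.readFile f       -- lazy read: chunks are pulled in as the fold demands them
  print (countNewlines s)
```

The same pattern (toChunks, then a strict fold with a strict per-chunk function) should work for other consumers as well; the accumulator must be forced, which is why `foldl'` is used instead of `foldl`.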
Hi Don,

I have a bit more of a follow-up, actually. You make use of the built-in bytestring consumer count, which is itself built upon the foldlChunks function, which is only exported from Data.ByteString.Lazy.Internal. If I want to write my own efficient bytestring consumer, is that what I need to use in order to preserve the inherent laziness of the data structure?

Also, I feel a little at a loss for how to make a good bytestring producer for efficiently _writing_ large swaths of data via writeFile. Would it be possible to whip up a small example?

Oh, and lastly, I apologize to both you & Bryan for making you cry. I hope you can forgive my cruelty.

Thanks,
Creighton
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
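On the producer side, one possible sketch, assuming a bytestring release that ships Data.ByteString.Builder (a later release than the one current when this thread was written), builds the output as a Builder and hands the resulting lazy ByteString to `L.writeFile`. The helper names `numberedLines` and `writeNumbers` and the output path `numbers.txt` are hypothetical, chosen only for illustration.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L

-- Hypothetical producer: one decimal number per line, from 1 to n.
-- The list [1 .. n] is generated lazily and foldMap combines the pieces
-- lazily, so no intermediate String or full-size buffer is built up front.
numberedLines :: Int -> B.Builder
numberedLines n = foldMap (\i -> B.intDec i <> B.char7 '\n') [1 .. n]

-- toLazyByteString runs the Builder incrementally, filling one chunk at a
-- time; L.writeFile writes each chunk as it is produced, so the whole file
-- never has to be held in memory at once.
writeNumbers :: FilePath -> Int -> IO ()
writeNumbers path n = L.writeFile path (B.toLazyByteString (numberedLines n))

main :: IO ()
main = writeNumbers "numbers.txt" 1000000
```

A Builder is one reasonable choice because it packs many small writes into large chunks; a simpler alternative is `L.fromChunks` over a lazily generated list of strict chunks, which gives the same streaming behaviour but leaves chunk sizing up to you.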