Just for fun. Here's some improvements. about 6x faster. I'd be interested to see what io-streams could do on this.
Using a 250M test file. -- strict state monad and bang patterns on the uncons and accumulator argument: $ time ./A 4166680 ./A 8.42s user 0.57s system 99% cpu 9.037 total -- just write a loop $ time ./A 4166680 ./A 3.84s user 0.26s system 99% cpu 4.121 total -- switch to Int $ time ./A 4166680 ./A 1.89s user 0.23s system 99% cpu 2.134 total -- custom isSpace function $ time ./A 4166680 ./A 1.56s user 0.24s system 99% cpu 1.808 total -- mmap IO $ time ./A 4166680 ./A 1.54s user 0.09s system 99% cpu 1.636 total Here's the final program: {-# LANGUAGE BangPatterns #-} import qualified Data.ByteString as S import qualified Data.ByteString.Lazy.Char8 as L import System.IO.Posix.MMap.Lazy main = do f <- unsafeMMapFile "test.txt" print $ go 0 f where go :: Int -> L.ByteString -> Int go !a !s = case L.uncons s of Nothing -> a Just (x,xs) | isSpaceChar8 x -> go (a+1) xs | otherwise -> go a xs isSpaceChar8 c = c == '\n' || c == ' ' {-# INLINE isSpaceChar8 #-} On Mon, Mar 18, 2013 at 8:53 AM, Konstantin Litvinenko < to.darkan...@gmail.com> wrote: > Hi All! > > I tune my toy project for performance and hit the wall on simple, in > imperative world, task. Here is the code that model what I'm trying to > achieve > > import qualified Data.ByteString.Lazy as L > import Data.Word8(isSpace) > import Data.Word > import Control.Monad.State > > type Stream = State L.ByteString > > get_byte :: Stream (Maybe Word8) > get_byte = do > s <- get > case L.uncons s of > Nothing -> return Nothing > Just (x, xs) -> put xs >> return (Just x) > > main = do > f <- L.readFile "test.txt" > let r = evalState count_spaces f > print r > where > count_spaces = go 0 > where > go a = do > x <- get_byte > case x of > Just x' -> if isSpace x' then go (a + 1) else go a > Nothing -> return a > > It takes the file and count spaces, in imperative way, consuming bytes one > by one. The problem is: How to rewrite this to get rid of constant > allocation of state but still working with stream of bytes? I can rewrite > this as one-liner L.foldl, but that doesn't help me in any way to optimize > my toy project where all algorithms build upon consuming stream of bytes. > > PS. My main lang is C++ over 10 years and I only learn Haskell :) > > > ______________________________**_________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe> >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe