Oh, I forgot the technique of inlining the lazy bytestring chunks, and processing each chunk seperately.
$ time ./fast 4166680 ./fast 1.25s user 0.07s system 99% cpu 1.325 total Essentially inline Lazy.foldlChunks and specializes is (the inliner should really get that). And now we have a nice unboxed inner loop, which llvm might spot: $ ghc -O2 -funbox-strict-fields fast.hs --make -fllvm $ time ./fast 4166680 ./fast 1.07s user 0.06s system 98% cpu *1.146 total* So about 8x faster. Waiting for some non-lazy bytestring benchmarks... :) {-# LANGUAGE BangPatterns #-} import Data.ByteString.Internal import Data.ByteString.Unsafe import qualified Data.ByteString.Char8 as S import qualified Data.ByteString.Lazy.Char8 as L import qualified Data.ByteString.Lazy.Internal as L import System.IO.Posix.MMap.Lazy main = do f <- unsafeMMapFile "test.txt" print . new 0 $ L.toChunks f new :: Int -> [ByteString] -> Int new i [] = i new i (x:xs) = new (add i x) xs -- jump into the fast path {-# INLINE add #-} add :: Int -> ByteString -> Int add !i !s | S.null s = i | isSpace' x = add (i+1) xs | otherwise = add i xs where T x xs = uncons s data T = T !Char ByteString uncons s = T (w2c (unsafeHead s)) (unsafeTail s) isSpace' c = c == '\n' || c == ' ' {-# INLINE isSpace' #-} On Tue, Mar 19, 2013 at 7:36 PM, Don Stewart <don...@gmail.com> wrote: > Just for fun. Here's some improvements. about 6x faster. > I'd be interested to see what io-streams could do on this. > > Using a 250M test file. > > -- strict state monad and bang patterns on the uncons and accumulator > argument: > > $ time ./A > 4166680 > ./A 8.42s user 0.57s system 99% cpu 9.037 total > > -- just write a loop > > $ time ./A > 4166680 > ./A 3.84s user 0.26s system 99% cpu 4.121 total > > -- switch to Int > > $ time ./A > 4166680 > ./A 1.89s user 0.23s system 99% cpu 2.134 total > > -- custom isSpace function > > $ time ./A > 4166680 > ./A 1.56s user 0.24s system 99% cpu 1.808 total > > -- mmap IO > > $ time ./A > 4166680 > ./A 1.54s user 0.09s system 99% cpu 1.636 total > > Here's the final program: > > > {-# LANGUAGE BangPatterns #-} > > import qualified Data.ByteString as S > import qualified Data.ByteString.Lazy.Char8 as L > import System.IO.Posix.MMap.Lazy > > main = do > f <- unsafeMMapFile "test.txt" > print $ go 0 f > where > go :: Int -> L.ByteString -> Int > go !a !s = case L.uncons s of > Nothing -> a > Just (x,xs) | isSpaceChar8 x -> go (a+1) xs > | otherwise -> go a xs > > isSpaceChar8 c = c == '\n' || c == ' ' > {-# INLINE isSpaceChar8 #-} > > > On Mon, Mar 18, 2013 at 8:53 AM, Konstantin Litvinenko < > to.darkan...@gmail.com> wrote: > >> Hi All! >> >> I tune my toy project for performance and hit the wall on simple, in >> imperative world, task. Here is the code that model what I'm trying to >> achieve >> >> import qualified Data.ByteString.Lazy as L >> import Data.Word8(isSpace) >> import Data.Word >> import Control.Monad.State >> >> type Stream = State L.ByteString >> >> get_byte :: Stream (Maybe Word8) >> get_byte = do >> s <- get >> case L.uncons s of >> Nothing -> return Nothing >> Just (x, xs) -> put xs >> return (Just x) >> >> main = do >> f <- L.readFile "test.txt" >> let r = evalState count_spaces f >> print r >> where >> count_spaces = go 0 >> where >> go a = do >> x <- get_byte >> case x of >> Just x' -> if isSpace x' then go (a + 1) else go a >> Nothing -> return a >> >> It takes the file and count spaces, in imperative way, consuming bytes >> one by one. The problem is: How to rewrite this to get rid of constant >> allocation of state but still working with stream of bytes? I can rewrite >> this as one-liner L.foldl, but that doesn't help me in any way to optimize >> my toy project where all algorithms build upon consuming stream of bytes. >> >> PS. My main lang is C++ over 10 years and I only learn Haskell :) >> >> >> ______________________________**_________________ >> Haskell-Cafe mailing list >> Haskell-Cafe@haskell.org >> http://www.haskell.org/**mailman/listinfo/haskell-cafe<http://www.haskell.org/mailman/listinfo/haskell-cafe> >> > >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe