Send Beginners mailing list submissions to beginners@haskell.org To subscribe or unsubscribe via the World Wide Web, visit http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners or, via email, send a message with subject or body 'help' to beginners-requ...@haskell.org
You can reach the person managing the list at beginners-ow...@haskell.org When replying, please edit your Subject line so it is more specific than "Re: Contents of Beginners digest..." Today's Topics: 1. Re: How to write faster ByteString/Conduit code (John Ky) ---------------------------------------------------------------------- Message: 1 Date: Mon, 04 Apr 2016 09:37:27 +0000 From: John Ky <newho...@gmail.com> To: The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell <Beginners@haskell.org> Subject: Re: [Haskell-beginners] How to write faster ByteString/Conduit code Message-ID: <camb4o-b_leowq7jbkuumzw_w5zqvq68gj7zaemzmf6p8q0k...@mail.gmail.com> Content-Type: text/plain; charset="utf-8" It turns out that using a simple enum type to implement a state machine instead of a function avoids the performance penalty and allows me to collapse a four stage conduit pipeline into one with 4 x performance improvement. blankStrings :: MonadThrow m => Conduit BS.ByteString m BS.ByteString blankStrings = blankStrings' InJson blankStrings' :: MonadThrow m => FastState -> Conduit BS.ByteString m BS.ByteString blankStrings' lastState = do mbs <- await case mbs of Just bs -> do let (!cs, Just (!nextState, _)) = unfoldrN (BS.length bs) blankByteString (lastState, bs) yield cs blankStrings' nextState Nothing -> return () where blankByteString :: (FastState, ByteString) -> Maybe (Word8, (FastState, ByteString)) blankByteString (InJson, bs) = case BS.uncons bs of Just (!c, !cs) | isLeadingDigit c -> Just (w1 , (InNumber , cs)) Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs)) Just (!c, !cs) | isAlphabetic c -> Just (c , (InIdent , cs)) Just (!c, !cs) -> Just (c , (InJson , cs)) Nothing -> Nothing blankByteString (InString, bs) = case BS.uncons bs of Just (!c, !cs) | c == wBackslash -> Just (wSpace , (Escaped , cs)) Just (!c, !cs) | c == wDoubleQuote -> Just (wCloseParen, (InJson , cs)) Just (_ , !cs) -> Just (wSpace , (InString , cs)) Nothing -> Nothing blankByteString (Escaped, bs) = case BS.uncons bs of Just (_, !cs) -> Just (wSpace, (InString, cs)) Nothing -> Nothing blankByteString (InNumber, bs) = case BS.uncons bs of Just (!c, !cs) | isTrailingDigit c -> Just (w0 , (InNumber , cs)) Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs)) Just (!c, !cs) | isAlphabetic c -> Just (c , (InIdent , cs)) Just (!c, !cs) -> Just (c , (InJson , cs)) Nothing -> Nothing blankByteString (InIdent, bs) = case BS.uncons bs of Just (!c, !cs) | isAlphabetic c -> Just (wUnderscore, (InIdent , cs)) Just (!c, !cs) | isLeadingDigit c -> Just (w1 , (InNumber , cs)) Just (!c, !cs) | c == wDoubleQuote -> Just (wOpenParen , (InString , cs)) Just (!c, !cs) -> Just (c , (InJson , cs)) Nothing -> Nothing I?m quite please with this, but any further suggestions are still welcome. Cheers, -John On Sun, 3 Apr 2016 at 23:55 John Ky newho...@gmail.com <http://mailto:newho...@gmail.com> wrote: Hi Haskellers, > > I just rewrote the code to a state-machine in the hope that I can > eventually collapse several stages in a pipeline into one, but this simple > state-machine version turns out to be about 3 times slower even though it > does the same thing: > > newtype Blank = Blank > { blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank)) > } > > escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank)) > escapeChar bs = case BS.uncons bs of > Just (c, cs) -> Just (c, (cs, Blank (if c /= wBackslash then escapeChar > else escapedChar))) > Nothing -> Nothing > > escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank)) > escapedChar bs = case BS.uncons bs of > Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar)) > Nothing -> Nothing > > fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString > fastBlank = fastBlank' escapeChar > > fastBlank' :: MonadThrow m => (BS.ByteString -> Maybe (Word8, (BS.ByteString, > Blank))) -> Conduit BS.ByteString m BS.ByteString > fastBlank' blank = do > mbs <- await > case mbs of > Just bs -> do > let (cs, Just (_, Blank newBlank)) = unfoldrN (BS.length bs) (\(bs, > Blank f) -> f bs) (bs, Blank blank) > yield cs > fastBlank' newBlank > Nothing -> return () > > I worry that if I go this approach, just the cost of the state-machine > might mean I only break-even. > > Is there any reason why this version should be slower? > > Cheers, > > -John > ? > > On Sun, 3 Apr 2016 at 23:11 John Ky <newho...@gmail.com> wrote: > >> Hello Haskellers, >> >> I?ve been trying to squeeze as much performance out of my code as >> possible and I?ve come to a point where can?t figure out what more I can do. >> >> Here is some example code: >> >> blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString >> blankEscapedChars = blankEscapedChars' "" >> >> blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString >> m BS.ByteString >> blankEscapedChars' rs = do >> mbs <- await >> case mbs of >> Just bs -> do >> let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs >> let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs)) >> yield ds >> blankEscapedChars' (BS.drop (BS.length ds) cs) >> Nothing -> when (BS.length rs > 0) (yield rs) >> where >> unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, >> ByteString)) >> unescapeByteString (wasEscaped, bs) = case BS.uncons bs of >> Just (_, cs) | wasEscaped -> Just (wUnderscore, (False, cs)) >> Just (c, cs) | c /= wBackslash -> Just (c, (False, cs)) >> Just (c, cs) -> Just (c, (True, cs)) >> Nothing -> Nothing >> >> The above function blankEscapedChars will go find all \ characters and >> convert the following character to a _. For a 1 MB in memory JSON >> ByteString, it benches at about 6.6 ms >> >> In all my code the basic strategy is the same. await for the next byte >> string, then use and unfoldrN to produce a new ByteString for yielding. >> >> Anyone know of a way to go faster? >> >> Cheers, >> >> -John >> ? >> > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.haskell.org/pipermail/beginners/attachments/20160404/2a10b43f/attachment.html> ------------------------------ Subject: Digest Footer _______________________________________________ Beginners mailing list Beginners@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners ------------------------------ End of Beginners Digest, Vol 94, Issue 2 ****************************************