Today's Topics:

   1. Re:  How to write faster ByteString/Conduit code (John Ky)


----------------------------------------------------------------------

Message: 1
Date: Mon, 04 Apr 2016 09:37:27 +0000
From: John Ky <newho...@gmail.com>
To: The Haskell-Beginners Mailing List - Discussion of primarily
        beginner-level topics related to Haskell <Beginners@haskell.org>
Subject: Re: [Haskell-beginners] How to write faster
        ByteString/Conduit code
Message-ID:
        <camb4o-b_leowq7jbkuumzw_w5zqvq68gj7zaemzmf6p8q0k...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

It turns out that implementing the state machine with a simple enum type
instead of a function avoids the performance penalty. It also lets me
collapse a four-stage conduit pipeline into a single stage, with a 4x
performance improvement.

blankStrings :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
blankStrings = blankStrings' InJson

blankStrings' :: MonadThrow m => FastState -> Conduit BS.ByteString m BS.ByteString
blankStrings' lastState = do
  mbs <- await
  case mbs of
    Just bs -> do
      let (!cs, Just (!nextState, _)) =
            unfoldrN (BS.length bs) blankByteString (lastState, bs)
      yield cs
      blankStrings' nextState
    Nothing -> return ()
  where
    blankByteString :: (FastState, ByteString) -> Maybe (Word8, (FastState, ByteString))
    blankByteString (InJson, bs) = case BS.uncons bs of
      Just (!c, !cs) | isLeadingDigit c   -> Just (w1         , (InNumber , cs))
      Just (!c, !cs) | c == wDoubleQuote  -> Just (wOpenParen , (InString , cs))
      Just (!c, !cs) | isAlphabetic c     -> Just (c          , (InIdent  , cs))
      Just (!c, !cs)                      -> Just (c          , (InJson   , cs))
      Nothing                             -> Nothing
    blankByteString (InString, bs) = case BS.uncons bs of
      Just (!c, !cs) | c == wBackslash    -> Just (wSpace     , (Escaped  , cs))
      Just (!c, !cs) | c == wDoubleQuote  -> Just (wCloseParen, (InJson   , cs))
      Just (_ , !cs)                      -> Just (wSpace     , (InString , cs))
      Nothing                             -> Nothing
    blankByteString (Escaped, bs) = case BS.uncons bs of
      Just (_, !cs)                       -> Just (wSpace, (InString, cs))
      Nothing                             -> Nothing
    blankByteString (InNumber, bs) = case BS.uncons bs of
      Just (!c, !cs) | isTrailingDigit c  -> Just (w0         , (InNumber , cs))
      Just (!c, !cs) | c == wDoubleQuote  -> Just (wOpenParen , (InString , cs))
      Just (!c, !cs) | isAlphabetic c     -> Just (c          , (InIdent  , cs))
      Just (!c, !cs)                      -> Just (c          , (InJson   , cs))
      Nothing                             -> Nothing
    blankByteString (InIdent, bs) = case BS.uncons bs of
      Just (!c, !cs) | isAlphabetic c     -> Just (wUnderscore, (InIdent  , cs))
      Just (!c, !cs) | isLeadingDigit c   -> Just (w1         , (InNumber , cs))
      Just (!c, !cs) | c == wDoubleQuote  -> Just (wOpenParen , (InString , cs))
      Just (!c, !cs)                      -> Just (c          , (InJson   , cs))
      Nothing                             -> Nothing
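
The post never shows the FastState type or the w* byte constants, so the following is only a minimal sketch under assumptions: a reduced three-state FastState (the full code above also uses InNumber and InIdent), byte constants defined by their ASCII values, and two hypothetical helpers, blankChunk and blankChunks. It leaves conduit out and uses a plain list of chunks to show how the enum state is threaded from one chunk to the next through unfoldrN.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.List (mapAccumL)
import Data.Word (Word8)

-- Assumed, reduced definition: the original post uses FastState but does not
-- show it, and its full version also has InNumber and InIdent.
data FastState = InJson | InString | Escaped
  deriving (Eq, Show)

-- Assumed byte constants (ASCII values); the post does not define them.
wBackslash, wDoubleQuote, wSpace, wOpenParen, wCloseParen :: Word8
wBackslash   = 92
wDoubleQuote = 34
wSpace       = 32
wOpenParen   = 40
wCloseParen  = 41

-- String-only subset of the transition function: one input byte in, one
-- output byte and the next state out.
step :: (FastState, BS.ByteString) -> Maybe (Word8, (FastState, BS.ByteString))
step (st, bs) = case BS.uncons bs of
  Nothing      -> Nothing
  Just (c, cs) -> Just $ case st of
    InString | c == wBackslash   -> (wSpace     , (Escaped , cs))
             | c == wDoubleQuote -> (wCloseParen, (InJson  , cs))
             | otherwise         -> (wSpace     , (InString, cs))
    Escaped                      -> (wSpace     , (InString, cs))
    InJson   | c == wDoubleQuote -> (wOpenParen , (InString, cs))
             | otherwise         -> (c          , (InJson  , cs))

-- Hypothetical helper: process one chunk, returning the state to resume from.
-- With the bound set to the chunk length, unfoldrN stops at the bound and
-- hands back the final seed, so the Nothing branch is only a safe fallback.
blankChunk :: FastState -> BS.ByteString -> (FastState, BS.ByteString)
blankChunk st bs = case BS.unfoldrN (BS.length bs) step (st, bs) of
  (out, Just (st', _)) -> (st', out)
  (out, Nothing)       -> (st , out)

-- Hypothetical helper: a list-based stand-in for the conduit loop.
blankChunks :: [BS.ByteString] -> [BS.ByteString]
blankChunks = snd . mapAccumL blankChunk InJson

-- A string literal split across two chunks: ["{(  ", " )}"]
demo :: [BS.ByteString]
demo = blankChunks (map BC.pack ["{\"ab", "c\"}"])
```

Because the state is a plain constructor rather than a closure, the value carried between chunks is just a tag, which matches the approach described above.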

I'm quite pleased with this, but any further suggestions are still welcome.

Cheers,

-John

On Sun, 3 Apr 2016 at 23:55 John Ky <newho...@gmail.com> wrote:

> Hi Haskellers,
>
> I just rewrote the code as a state machine in the hope that I can
> eventually collapse several stages of a pipeline into one, but this simple
> state-machine version turns out to be about 3 times slower even though it
> does the same thing:
>
> newtype Blank = Blank
>   { blank :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
>   }
>
> escapeChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
> escapeChar bs = case BS.uncons bs of
>   Just (c, cs)  -> Just (c, (cs, Blank (if c /= wBackslash then escapeChar else escapedChar)))
>   Nothing       -> Nothing
>
> escapedChar :: BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank))
> escapedChar bs = case BS.uncons bs of
>   Just (_, cs) -> Just (wUnderscore, (cs, Blank escapeChar))
>   Nothing      -> Nothing
>
> fastBlank :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
> fastBlank = fastBlank' escapeChar
>
> fastBlank' :: MonadThrow m
>            => (BS.ByteString -> Maybe (Word8, (BS.ByteString, Blank)))
>            -> Conduit BS.ByteString m BS.ByteString
> fastBlank' blank = do
>   mbs <- await
>   case mbs of
>     Just bs -> do
>       let (cs, Just (_, Blank newBlank)) =
>             unfoldrN (BS.length bs) (\(bs, Blank f) -> f bs) (bs, Blank blank)
>       yield cs
>       fastBlank' newBlank
>     Nothing -> return ()
>
> I worry that if I take this approach, the cost of the state machine
> alone might mean I only break even.
>
> Is there any reason why this version should be slower?
>
> Cheers,
>
> -John
>
> On Sun, 3 Apr 2016 at 23:11 John Ky <newho...@gmail.com> wrote:
>
>> Hello Haskellers,
>>
>> I've been trying to squeeze as much performance out of my code as
>> possible and I've come to a point where I can't figure out what more I can do.
>>
>> Here is some example code:
>>
>> blankEscapedChars :: MonadThrow m => Conduit BS.ByteString m BS.ByteString
>> blankEscapedChars = blankEscapedChars' ""
>>
>> blankEscapedChars' :: MonadThrow m => BS.ByteString -> Conduit BS.ByteString m BS.ByteString
>> blankEscapedChars' rs = do
>>   mbs <- await
>>   case mbs of
>>     Just bs -> do
>>       let cs = if BS.length rs /= 0 then BS.concat [rs, bs] else bs
>>       let ds = fst (unfoldrN (BS.length cs) unescapeByteString (False, cs))
>>       yield ds
>>       blankEscapedChars' (BS.drop (BS.length ds) cs)
>>     Nothing -> when (BS.length rs > 0) (yield rs)
>>   where
>>     unescapeByteString :: (Bool, ByteString) -> Maybe (Word8, (Bool, ByteString))
>>     unescapeByteString (wasEscaped, bs) = case BS.uncons bs of
>>       Just (_, cs) | wasEscaped       -> Just (wUnderscore, (False, cs))
>>       Just (c, cs) | c /= wBackslash  -> Just (c, (False, cs))
>>       Just (c, cs)                    -> Just (c, (True, cs))
>>       Nothing                         -> Nothing
>>
>> The function blankEscapedChars above finds every \ character and
>> replaces the character that follows it with a _. For a 1 MB in-memory
>> JSON ByteString, it benchmarks at about 6.6 ms.
>>
>> The basic strategy is the same throughout my code: await the next
>> ByteString, then use unfoldrN to produce a new ByteString for yielding.
>>
>> Anyone know of a way to go faster?
>>
>> Cheers,
>>
>> -John
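
The strategy described in the original question (take a chunk, then run unfoldrN over it) can be tried out without conduit. The sketch below reuses the unescapeByteString transition function from the quoted message and wraps it in a hypothetical helper, blankEscapedChunk. The wBackslash and wUnderscore constants are assumed to be the ASCII values, since the thread does not define them. One deliberate variation from the quoted code: the helper returns the final escape flag so a caller could carry it into the next chunk, whereas the quoted code starts every chunk with False.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Assumed byte constants (ASCII values); the thread does not define them.
wBackslash, wUnderscore :: Word8
wBackslash  = 92
wUnderscore = 95

-- Transition function as in the quoted message: the Bool records whether the
-- previous byte was a backslash.
unescapeByteString :: (Bool, BS.ByteString) -> Maybe (Word8, (Bool, BS.ByteString))
unescapeByteString (wasEscaped, bs) = case BS.uncons bs of
  Just (_, cs) | wasEscaped      -> Just (wUnderscore, (False, cs))
  Just (c, cs) | c /= wBackslash -> Just (c, (False, cs))
  Just (c, cs)                   -> Just (c, (True, cs))
  Nothing                        -> Nothing

-- Hypothetical helper: blank the byte after each backslash in one chunk and
-- return the flag to resume with. unfoldrN yields the output bytes plus the
-- leftover seed when it stops at the length bound.
blankEscapedChunk :: Bool -> BS.ByteString -> (BS.ByteString, Bool)
blankEscapedChunk wasEscaped bs =
  case BS.unfoldrN (BS.length bs) unescapeByteString (wasEscaped, bs) of
    (out, Just (wasEscaped', _)) -> (out, wasEscaped')
    (out, Nothing)               -> (out, False)

-- A chunk that ends in a backslash reports True, so the next chunk can blank
-- its first byte.
example :: (BS.ByteString, Bool)
example = blankEscapedChunk False (BC.pack "ab\\")
```

Since the output always has the same length as the input chunk, nothing is left over to prepend to the next chunk; only the flag needs to be carried forward.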
------------------------------

End of Beginners Digest, Vol 94, Issue 2
****************************************
