On 13/11/2009, at 18:04, Bryan O'Sullivan wrote:

main = do
  args <- getArgs
  forM_ args $ \a -> do
    s <- B.readFile a
    let t = T.decodeUtf8 s
    print (T.length t)

The streamUtf8 function looks roughly like this:

streamUtf8 :: OnDecodeError -> ByteString -> Stream Char
streamUtf8 onErr bs = Stream next 0 (maxSize l)
    where
      l = B.length bs
      next i
          | i >= l =          Done
          | U8.validate1 x1 = Yield (unsafeChr8 x1) (i+1)
          | {- etc. -}
{-# INLINE [0] streamUtf8 #-}

The values being Yielded from the inner function are, as you can see, themselves constructed by functions.

Originally, with the inner next function manually marked as INLINE, I found that functions like unsafeChr8 were not being inlined by GHC, and performance was terrible due to the amount of boxing and unboxing happening in the inner loop.

Let's see if I understand this correctly. In your code, decodeUtf8 calls streamUtf8. They both get inlined into main but then unsafeChr8 does not. Correct?

If so, are you sure that unsafeChr8 is really called in the simplified code? IIUC, this isn't necessary if you don't actually inspect the Chars (which length presumably doesn't). So perhaps GHC removes the call altogether? If not, what does it do with the result?

I somehow stumbled on the idea of removing the INLINE annotation from next, and performance suddenly improved by a significant integer multiple. This caused the body of streamUtf8 to be inlined into my test program, as I hoped.

Or are you saying that it's streamUtf8 that isn't getting inlined into main?

length :: Text -> Int
length t = Stream.length (Stream.stream t)
{-# INLINE length #-}

And the streaming length is:

length :: Stream Char -> Int
length = S.lengthI
{-# INLINE[1] length #-}

And the lengthI function is defined more generally, in the hope that I could use it for both Int and Int64 lengths:

lengthI :: Integral a => Stream Char -> a
lengthI (Stream next s0 _len) = loop_length 0 s0
    where
      loop_length !z s  = case next s of
                           Done       -> z
                           Skip    s' -> loop_length z s'
                           Yield _ s' -> loop_length (z + 1) s'
{-# INLINE[0] lengthI #-}

Unfortunately, although lengthI is inlined into the Int-typed streaming length function, that function is not in turn marked with __inline_me in simplifier output, so the length/decodeUtf8 loops do not fuse. The code is pretty fast, but there's still a lot of boxing and unboxing happening for all the Yields.

Does changing the definition of length to

length = id S.lengthI

help? GHC used to have a bug in this area but I haven't been bitten by it for quite some time.

Also, I wonder how Stream.stream is defined. Is it strict in Text? If it isn't, does making it strict help?

All of these flip-flops in inliner behaviour are very difficult to understand, and they seem to be exceedingly fragile. Should I expect the situation to be better with the new inliner in 6.12?

I suspect that the fragility you are seeing is just a symptom of a problem in how the UTF-8 library implements stream fusion. It's a bit tricky to get everything right. Generally, I've found the simplifier to be quite stable and predictable in the last year or so. Simon is working hard on making it even better. If you have a spare minute, perhaps you could try the HEAD with the new inliner and see if that helps? Although I somewhat doubt it, to be honest.

Roman


_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Reply via email to