Re: [Haskell-cafe] Re: A suggestion for the next high profile Haskell project

Lennart Augustsson Mon, 18 Dec 2006 16:23:29 -0800

I'm allergic to hex constants.  Surely, they are not necessary.


        -- Lennart

On Dec 18, 2006, at 18:14 , Chris Kuklewicz wrote:

Hi all (and Don!),

  I have some rewritten versions of readInt below...

Bulat Ziganshin wrote:
Hello Donald,

Monday, December 18, 2006, 3:51:48 AM, you wrote:
Haskell can't provide fast execution speed unless very low-level
programming style is used (which is much harder to do inHaskell than in C,
see one of my last messages for example) AND jhc compiler is used
I have to dispute this Bulat's characterisation here. We cansolve lots
of nice problems and have high performance *right now*. Particularly
concurrency problems, and ones involving streams of bytestrings.
No need to leave the safety of GHC either, nor resort to lowlevel evil
code.
let's go further in this long-term discussion. i've read Shootoutproblems
and concluded that there are only 2 tasks which speed is dependent on
code-generation abilities of compiler, all other tasks aredependent onspeed of used libraries. just for example - in one test TCL wasfastestlanguage. why? because this test contained almost nothing but 1000calls tothe regex engine with very large strings and TCL regex engine wasfastest
the same applies to the two above-mentioned areas - GHC wins inconcurrency
tests just because only built-in libraries considered, and GHC has a
lightweight threads library built-in while C compilers don't
with ByteString library, i know at least one example, where youhave added
function - readInt - to the library only to win in Shootout test
This obsession with mutable-variable, imperative code isunhealthy, Bulat ;)
so why your readInt routine is written in imperative way? ;)
ajb:
The PGP format is heavily character stream-based. We know howhorriblethe performance of character streams are in Haskell. On onehand, thiswould be an excellent test case. On the other hand, performancewould
indeed suck now.
Unless you used a stream of lazy bytestrings!
As Duncan did for his pure gzip and bzip2 bindings:
these are binding to existing C libs. if you try to convince usthat to getfast FPS routines one need to write them in C and then provideHaskell
binding, i will 100% agree
otherwise, please explain me why your own function, readInt, don'tuse
these fascinating stream fusion capabilities? ;)
P.S. The comments on this thread makes me think that the stateof the
art high perf programming in Haskell isn't widely known. Bulat-style
imperative Haskell is rarely (ever?) needed in the GHC 6.6Haskell code
I'm writing in these days. Solving large data problems is feasible
*right now* using bytestrings.
may be it's me who wrote FPS library? :) let's agree that high-performancecode can be written in C and then called from Haskell. and when*you* wantto have real performance, you write something very far from youroptimistic
words:

readInt :: ByteString -> Maybe (Int, ByteString)
readInt as
    | null as   = Nothing
    | otherwise =
        case unsafeHead as of
            '-' -> loop True  0 0 (unsafeTail as)
            '+' -> loop False 0 0 (unsafeTail as)
            _   -> loop False 0 0 as
where loop :: Bool -> Int -> Int -> ByteString -> Maybe (Int,ByteString)
          STRICT4(loop)
          loop neg i n ps
              | null ps   = end neg i n ps
              | otherwise =
                  case B.unsafeHead ps of
                    w | w >= 0x30
                     && w <= 0x39 -> loop neg (i+1)
(n * 10 + (fromIntegralw - 0x30))
                                          (unsafeTail ps)
                      | otherwise -> end neg i n ps

          end _    0 _ _  = Nothing
          end True _ n ps = Just (negate n, ps)
          end _    _ n ps = Just (n, ps)
Trying to write code using an imperative style is likely to domore harmthan good, and certainly not something to suggest to beginners onthe
mailing list ;)
of course, if you want to fool beginners, you can continue to singthese
songs :D
That code for readInt has no mutable state or IO, and it may beimperative
style, it is not much like c-code.
I liked my readInt1 version below, which actually benchmarks fasterthan the GHC6.6 Data.ByteString.Char8.readInt (on the little "test r" below).What a
pleasant surprise.
The readInt2 version below is even more functional, but takes about1.7 times aslong to run "test r". (gaged from running "time ./ReadInt.2" onMac OS X on a
G4 cpu).  The latest fusion in darcs fps might be better...
module Main(main) where

import Control.Arrow((***))
import Data.Char(chr,ord)
import Data.ByteString(ByteString)
import qualified Data.ByteString as B(null,foldl',span)
import qualified Data.ByteString.Char8 as C (readInt,pack)
import qualified Data.ByteString.Base as B(unsafeHead,unsafeTail)
import Data.Ix(inRange)
import Data.Maybe(fromJust)
import Data.Word(Word8)

default ()

{-# INLINE chr8 #-}
chr8 :: Word8 -> Char
chr8 = chr . fromEnum
{-# INLINE ord8 #-}
ord8 :: Char -> Word8
ord8 = toEnum . ord

{-# INLINE decompose #-}
decompose :: a -> (Word8 -> ByteString -> a) -> ByteString -> a
decompose whenNull withHeadTail bs =
if B.null bs then whenNull else (withHeadTail $! (B.unsafeHeadbs)) $! (B.unsafeTail bs)
-- This does not do any bound checking if the Int overflows
readInt1 :: ByteString -> Maybe (Int, ByteString)
readInt1 bs = decompose Nothing (\h t ->
    case h of
      0x2d -> fmap (negate *** id) $ first t -- '-'
      0x2b -> first t                        -- '+'
      _    -> first bs) bs
  where first :: ByteString -> Maybe (Int, ByteString)
        first = decompose Nothing (\h t ->
            if inRange (0x30,0x39) h
              then Just (loop t $! (fromIntegral h-0x30))
              else Nothing)
        loop :: ByteString -> Int -> (Int, ByteString)
        loop bs n = decompose (n,bs) (\h t ->
            if inRange (0x30,0x39) h
              then loop t $! (n*10+(fromIntegral h-0x30))
              else (n,bs)) bs


readInt2 :: ByteString -> Maybe (Int,ByteString)
readInt2 bs =  decompose Nothing (\h t ->
    case h of
      0x2d -> fmap (negate *** id) $ toInt t -- '-'
      0x2b -> toInt t                        -- '+'
      _    -> toInt bs) bs
  where toInt :: ByteString -> Maybe (Int,ByteString)
toInt bs = let (digits,rest) = B.span (inRange(0x30,0x39)) bsin if B.null digits then Nothing else Just(convert digits,rest)
        convert :: ByteString -> Int
        convert = B.foldl' (\n h -> n*10 + (fromIntegral h - 0x30)) 0
test r = let a = take 1000000 $ cycle [C.pack "13247897",C.pack"-13247896"]
         in (sum . map (fst . fromJust . r) $ a
            , take 4 a
            , map r $ take 4 a)

main = print $ test $ (readInt2)
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] Re: A suggestion for the next high profile Haskell project

Reply via email to