On Tue, Feb 13, 2007 at 10:43:11PM -0500, Jefferson Heard wrote: > I am running GHC 2.6 now, and am using -O3 as my optimization parameter. I'm
I think you will get much better performance with GHC 6.6. The optimizer has been improved a *lot* in the last 10 years. (I hope that was a typo!!) > Non-lazy version > > { > module Main > where > > import qualified FileReader > > } > > %wrapper "basic" > > $letter = [a-zA-Z] > $digit = 0-9 > $alphanum = [a-zA-Z0-9] > $punct = [\! \@ \# \$ \% \^ \& \* \( \) \_ \- \+ \= \{ \[ \} \] \\ \| \; \: > \' > \" \, \. \? \/ \` \~] > $dec = \. > $posneg = [\- \+] > > @date1 = jan($punct|uary)?\ $digit{1,2}(\,\ $digit{2,4})? > | feb($punct|ruary)?\ $digit{1,2}(\,\ $digit{2,4})? > | mar($punct|ch)?\ $digit{1,2}(\,\ $digit{2,4})? > | apr($punct|il)?\ $digit{1,2}(\,\ $digit{2,4})? > | may?\ $digit{1,2}(\,\ $digit{2,4})? > | jun($punct|e)?\ $digit{1,2}(\,\ $digit{2,4})? > | jul($punct|y)?\ $digit{1,2}(\,\ $digit{2,4})? > | aug($punct|ust)?\ $digit{1,2}(\,\ $digit{2,4})? > | sep($punct|tember)?\ $digit{1,2}(\,\ $digit{2,4})? > | sept($punct)?\ $digit{1,2}(\,\ $digit{2,4})? > | oct($punct|ober)?\ $digit{1,2}(\,\ $digit{2,4})? > | nov($punct|ember)?\ $digit{1,2}(\,\ $digit{2,4})? > | dec($punct|ember)?\ $digit{1,2}(\,\ $digit{2,4})? > > @date2 = $digit{1,2} $punct $digit{1,2} $punct $digit{2,4} > > @time = $digit{1,2} \: $digit{2} (am|pm)? > > @word = $alphanum+ > > @number = $posneg? $digit+ > | $posneg? $digit+ $dec $digit+ > | $posneg? $digit+ (\,$digit{3})+ > | $posneg? $digit? (\,$digit{3})+ $dec $digit+ > > $white = [\t\r\n\v\f\ ] > > @doc = \< DOC \> > @tag = \< $alphanum+ \> > | \<\/ $alphanum+ \> > > tokens :- > @doc { \s -> "" } > @tag ; > $white+ ; > @time { \s -> s } > @number { \s -> s } > @word { \s -> s } > $punct ; > . ; > > { > > printCount c [] = print c > printCount c (l:ls) = if l == "" then printCount (c+1) ls else printCount c ls > > main = do > file <- readFile "trecfile1" > printCount 0 (alexScanTokens file) > > } FTR, regular strings are lazy - too lazy, which is where the performance problems come from. > -- > ------------------------------------------------------------------------------------------------------------ > Version depending on ByteString.Lazy -- note that the grammar is the same, so > it has been omitted > -- > ------------------------------------------------------------------------------------------------------------ > > ... grammar ... > > { > type AlexInput = (Char, -- previous char > B.ByteString) -- current input string > > takebytes :: Int -> B.ByteString -> String > takebytes (0) _ = "" > takebytes n s = c : takebytes (n-1) cs > where c = B.index s 0 > cs = B.drop 1 s > > alexGetChar :: AlexInput -> Maybe (Char,AlexInput) > alexGetChar (_, bytestring) > | bytestring == B.empty = Nothing > | otherwise = Just (c , (c,cs)) > where c = B.index bytestring 0 > cs = B.drop 1 bytestring Hm, you might do better with more specialized functions. > alexGetChar :: AlexInput -> Maybe (Char,AlexInput) > alexGetChar (_, bytestring) > | B.null bytestring = Nothing > | otherwise = Just (c , (c,cs)) > where c = B.head bytestring > cs = B.tail bytestring or even: > alexGetChar :: AlexInput -> Maybe (Char,AlexInput) > alexGetChar (_, bytestring) > | B.null bytestring = Nothing > | otherwise = Just (c , (c,cs)) > where c = B.unsafeHead bytestring > cs = B.unsafeTail bytestring > alexInputPrevChar :: AlexInput -> Char > alexInputPrevChar (c,_) = c If you are certian this isn't the first character, you might do better using B.unsafeIndex (-1). > alexScanTokens :: B.ByteString -> [String] > alexScanTokens str = go ('\n',str) > where go inp@(_,str) = > case alexScan inp 0 of > AlexToken inp' len act -> act (takebytes len str) : go inp' > AlexSkip inp' len -> go inp' > AlexEOF -> [] > AlexError _ -> error "lexical error" > > > > > printCount :: Int -> [String] -> IO () > printCount c [] = print c > printCount c (l:ls) = if l == "" then printCount (c+1) ls else printCount c ls > > main = do > file <- B.readFile "trecfile1" > printCount 0 (alexScanTokens file) > > } _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe