Pete Kazmier: > I attempted to read Oleg's fold-stream implementation [1] as this > sounds quite appealing to me, but I was completely overwhelmed, > especially with all of the various type signatures used. It would be > great if one of the regular Haskell bloggers (Tom Moertel are you > reading this?) might write a blog entry or two interpreting his > implementation for those of us starting out in Haskell perhaps by > starting out with a non-polymorphic version so as to emphasize the > approach. > > [1] http://okmij.org/ftp/Haskell/fold-stream.lhs
The basic idea of the paper is the use of a left-fold operator as the primary interface for enumarating collections. The recursive version (less general than the non-recursive version) of a left-fold operator for enumerating the lines of a text file might look something like this: > import Control.Monad.Fix > import Control.Exception > import Data.List > import qualified Data.Set as S > import qualified Data.Map as M > import System.IO > > enumLines :: (a -> String -> Either a a) -> a -> FilePath -> IO a > enumLines iter accum filename = do > h <- openFile filename ReadMode > flip fix accum $ > \iterate accum -> do > try_line <- try (hGetLine h) > case try_line of > Left e -> hClose h >> return accum > Right line -> do > case iter accum line of > Left accum -> hClose h >> return accum > Right accum -> iterate accum It needs better exception handling, including a mechanism to pass a non-EOF exception back to the caller without losing the value accumulated up to the point of the exception. Perhaps it might also be useful to generalise the first parameter to an IO action. To use this, you provide an "iteratee", a function which takes an accumulator and a line from the file, and returns a new accumulator embedded in an Either. Using the Left branch causes immediate termination of the enumeration. For example, to search for the first occurrence of each of a set of email headers: > getHeaders :: S.Set String -> FilePath -> IO (S.Set String, M.Map String > String) > getHeaders hdrs file = enumLines findHdrs (hdrs,M.empty) file where > findHdrs accum@(wanted,found) line = > if null line > then Left accum > else > case headerLine line of > Nothing -> Right accum > Just hdr -> > case findDelete hdr wanted of > Nothing -> Right accum > Just wanted -> > let accum = (wanted, M.insert hdr line found) in > if S.null wanted > then Left accum > else Right accum > > headerLine :: String -> Maybe String > headerLine (':':xs) = Just [] > headerLine (x:xs) = fmap (x:) (headerLine xs) > headerLine [] = Nothing > > findDelete :: Ord a => a -> S.Set a -> Maybe (S.Set a) > findDelete e s = if S.member e s > then Just (S.delete e s) > else Nothing It's a bit of a case-analysis nightmare, but when comparing this to previous approaches, note that file traversal and processing are cleanly separated, file handle closure is guaranteed to be timely, file traversal stops as soon as all the required headers have been found, memory usage is minimised. I hope that helps. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe