Carlo,

Thanks a lot! This looks very promising (though I still have to test it more thoroughly for my purpose). As you mention, the key seems to be the optionMaybe combinator. Thanks for pointing it out.

Immanuel

2013/3/5 Carlo Hamalainen <carlo.hamalai...@gmail.com>

> On Mon, Mar 4, 2013 at 1:44 AM, Immanuel Normann <
> immanuel.norm...@googlemail.com> wrote:
>
>> I am trying to parse a semi structured text with parsec that basically
>> should identify sections. Each section starts with a headline and has an
>> unstructured content - that's all.
>>
>
> Here's my attempt: https://gist.github.com/carlohamalainen/5087207
>
> {-# LANGUAGE FlexibleContexts #-}
>
> import Text.Parsec
> import Control.Applicative hiding ((<|>),many)
>
> -- Example input:
>
> {-
> top 1:
>
> some text ... bla
>
> top 2:
>
> more text ... bla bla
>
> -}
>
> data Top = Top String deriving (Show)
> data Content = Content [String] deriving (Show)
> data Section = Section Top Content deriving (Show)
>
> headline = do
>     t <- many1 (noneOf ":\n")
>     char ':'
>     newline
>
>     return $ Top t
>
> contentLine = do
>     x <- many (noneOf ":\n")
>     newline
>     return x
>
> content = do
>     line <- optionMaybe (try contentLine)
>
>     case line of Just x -> do xs <- content
>                               return (x:xs)
>                  _      -> return []
>
> section = do
>     h <- headline
>     c <- Content <$> content
>     return $ Section h c
>
> main = do
>     x <- readFile "simple.txt"
>     print $ parse (many section) "" x
>
>
> Example run using your sample data:
>
> $ runhaskell Simple.hs
> Right [Section (Top "top 1") (Content ["","some text ... bla",""]),Section
> (Top "top 2") (Content ["","more text ... bla bla",""])]
>
> Notes:
>
> * I had to assume that a content line does not contain a ':', because that
> is the only way to distinguish a head-line (correct me if I'm wrong).
>
> * The key was to use optionMaybe along with try; see the definition of
> content.
>
> * I haven't tested this code on very large inputs.
>
> * I slightly changed the definition of Content to have a list of Strings,
> one for each line. I'm sure this could be altered if you wanted to retain
> all whitespace.
>
> * I am still new to Parsec, so don't take this as the definitive answer ;-)
>
> --
> Carlo Hamalainen
> http://carlo-hamalainen.net
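
To make the role of try in the note above concrete, here is a minimal sketch that isolates the content parser. The names contentRec, contentMany and contentNoTry are hypothetical and not part of the quoted gist; the sketch assumes Parsec 3 with a recent base and keeps the assumption that a content line contains no ':'.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A content line: any characters except ':' and newline, then a newline.
-- (Same assumption as in the quoted code: content lines contain no ':'.)
contentLine :: Parser String
contentLine = many (noneOf ":\n") <* newline

-- Recursive formulation, mirroring the quoted definition of content:
-- keep collecting lines until contentLine no longer matches.
contentRec :: Parser [String]
contentRec = do
  line <- optionMaybe (try contentLine)
  case line of
    Just x  -> (x :) <$> contentRec
    Nothing -> return []

-- Hypothetical shorter variant: many stops at the first line on which
-- try contentLine fails, because try rewinds the consumed input.
contentMany :: Parser [String]
contentMany = many (try contentLine)

-- The same parser without try. On a headline such as "top 2:" the line
-- parser consumes "top 2" before failing at ':', so many reports an
-- error instead of stopping quietly.
contentNoTry :: Parser [String]
contentNoTry = many contentLine
```

On the input "\nsome text\n\ntop 2:\n", both contentRec and contentMany should return ["","some text",""] and leave the headline unconsumed for the next section, while contentNoTry returns a Left parse error. Whether the shorter many (try contentLine) form is preferable to the explicit recursion is a matter of taste; the two should accept the same inputs.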
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe