Andrey, Thanks a lot for your effort! I have the same suspect that the lookahead in the content parser is the problem, but I don't know how to solve it either. At least the I learned from your code that noneOf is also a quite useful parser in this context which I have overlooked. Anyway, if you find a solution it would be great! In the end the task itself doesn't look very specific, but rather general: an alternation between strictly (the headline in my case) and loosely (the content in my case) structured text. It shouldn't be difficult to build a parser for such a setting.
(btw. I am aware the my test parser would (or rather should) parse only the first section. For testing this would be sufficient.) 2013/3/4 Andrey Chudnov <achud...@gmail.com> > Immanuel, > I tried but I couldn't figure it out. Here's a gist with my attempts and > results so far: https://gist.github.com/achudnov/f3af65f11d5162c73064There, > 'test' uses my attempt at specifying the parser, 'test2' uses yours. > Note that your attempt wouldn't parse multiple sections -- for that you > need to use 'many section' instead of just 'section' in 'parse' > ('parseFromFile' in the original). > I think what's going on is the lookahead is wrong, but I'm not sure how > exactly. I'll give it another go tomorrow if I have time. > > /Andrey > > > On 03/03/2013 05:16 PM, Immanuel Normann wrote: > > Andrey, > > Thanks for your attempt, but it doesn't seem to work. The easy part is > the headline, but the content makes trouble. > > Let me write the code a bit more explicit, so you can copy and paste it: > > ------------------------------------------ > {-# LANGUAGE FlexibleContexts #-} > > module Main where > > import Text.Parsec > > data Top = Top String deriving (Show) > data Content = Content String deriving (Show) > data Section = Section Top Content deriving (Show) > > headline :: Stream s m Char => ParsecT s u m Top > headline = manyTill anyChar (char ':' >> newline) >>= return . Top > > content :: Stream s m Char => ParsecT s u m Content > content = manyTill anyChar (try headline) >>= return . Content > > section :: Stream s m Char => ParsecT s u m Section > section = do {h <- headline; c <- content; return (Section h c)} > ------------------------------------------ > > > Assume the following example text is stored in "/tmp/test.txt": > --------------------------- > top 1: > > some text ... bla > > top 2: > > more text ... bla bla > --------------------------- > > Now I run the section parser in ghci against the above mentioned example > text stored in "/tmp/test.txt": > > *Main> parseFromFile section "/tmp/test.txt" > Right (Section (Top "top 1") (Content "")) > > I don't understand the behaviour of the content parser here. Why does it > return ""? Or perhaps more generally, I don't understand the manyTill > combinator (though I read the docs). > > Side remark: of cause for this little task it is probably to much effort > to use parsec. However, my content in fact has an internal structure which > I would like to parse further, but I deliberately abstracted from these > internals as they don't effect my above stated problem. > > Immanuel > > > 2013/3/3 Andrey Chudnov <achud...@gmail.com> > >> Immanuel, >> Since a heading always starts with a new line (and ends with a colon >> followed by a carriage return or just a colon?), I think it might be useful >> to first separate the input into lines and then classify them depending on >> whether it's a heading or not and reassemble them into the value you need. >> You don't even need parsec for that. >> >> However, if you really want to use parsec, you can write something like >> (warning, not tested): >> many $ liftM2 Section headline content >> where headline = anyChar `manyTill` (char ':' >> spaces >> newline) >> content = anyChar `manyTill` (try $ newline >> headline) >> >> /Andrey >> >> >> On 3/3/2013 10:44 AM, Immanuel Normann wrote: >> >>> I am trying to parse a semi structured text with parsec that basically >>> should identify sections. Each section starts with a headline and has an >>> unstructured content - that's all. For instance, consider the following >>> example text (inside the dashed lines): >>> >>> --------------------------- >>> >>> top 1: >>> >>> some text ... bla >>> >>> top 2: >>> >>> more text ... bla bla >>> >>> >>> --------------------------- >>> >>> This should be parsed into a structure like this: >>> >>> [Section (Top 1) (Content "some text ... bla"), Section (Top 1) (Content >>> "more text ... bla")] >>> >>> Say, I have a parser "headline", but the content after a headline could >>> be anything that is different from what "headline" parses. >>> How could the "section" parser making use of "headline" look like? >>> My idea would be to use the "manyTill" combinator, but I don"t find an >>> easy solution. >>> >> > >
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe