[REBOL] parse or Re:(5)

RChristiansen Wed, 20 Sep 2000 16:15:40 -0700
> I assumed that
> this is NOT what you wanted, but rather you wanted to copy through
> either {.^/} or {."^/} WHICHEVER COMES NEXT.  (Natural language
> text munching is a real pain, speaking from personal experience! ;-)

Yes, this is what I was looking for. As someone who had never 
parsed anything before using REBOL (there will be more like me!), I 
find the parsing rules in the REBOL docs confusing to read. My 
inclination is to want a simple statement that parses until 
one set of characters OR a different set of characters is 
reached, whichever comes first.

> The strategies I've thought of (I don't have time to code, compare,
> and recommend right at the moment) are:
> 
> 1)  Write more complicated parse rules, that either
>     1a)  parse to newline, append the copied chunk to a paragraph
>          string under construction, then look at the tail end of
>          the last chunk to see whether it can be extended or whether
>          a new paragraph should be started (based on whether it
>          looked like the end of a sentence).
>     1b)  parse to period, grab and append the next character if it
>          is a quotation mark, append to paragraph under construction, and
>          start a new paragraph if the next character is newline.
> 2)  Use simpler parsing (break on newlines), then make a postpass
>     across the block of "lines", gluing back together wherever the
>     boundary isn't the end of a sentence.

You missed another option, which I had been using previously. Here 
is the function:

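breakdown-content: func [
        "Break down an e-mail content field into its paragraphs"
        msg [object!] "e-mail message"
        /local article-info end-of-paragraph content-parts
][
        ; work on a copy so replace/all does not alter msg/content
        article-info: copy msg/content
        end-of-paragraph: rejoin [{.} newline]
        ; mark each paragraph end with a tilde, keeping the period
        replace/all article-info end-of-paragraph {.~}
        content-parts: copy []
        foreach part parse/all article-info {~} [
                append content-parts trim/lines part
        ]
        content-parts
]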

In other words, replace every instance of a set of characters with a 
single marker character that can be recognized later. The above 
example needs to be fixed because it only replaces {.^/} with {.~}, 
and I've discovered the tilde is a bad choice of marker anyway. I also 
need to be able to replace any set of characters you might find at the 
end of a paragraph, including {."^/} and {!^/} and {?^/} and {:^/} and 
{...^/}, and I'm sure there are more.
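
As a rough sketch of how the replace-first approach might be 
stretched to cover several endings, the function below loops over a 
list of them. The name mark-and-split, the list of endings, and the 
marker character (ASCII 1F, assumed never to occur in a mail body) are 
all illustrative choices, not anything from REBOL itself. The {...^/} 
case needs no entry of its own, because its last period and newline 
already match {.^/}.

```rebol
REBOL [Title: "Mark paragraph ends, then split (sketch)"]

mark-and-split: func [
    "Mark every paragraph ending with one marker, then split on it"
    text [string!]
    /local marker parts
][
    marker: "^(1F)"   ; assumption: this character never appears in the text
    text: copy text   ; leave the caller's string untouched
    ; swap the newline after each ending for the marker,
    ; keeping the punctuation itself
    foreach ending [{."} {.} {!} {?} {:}] [
        replace/all text join ending newline join ending marker
    ]
    parts: copy []
    foreach part parse/all text marker [
        append parts trim/lines part
    ]
    parts
]
```

It would be called as mark-and-split msg/content and returns a block 
of paragraph strings. The list of endings still has to be maintained 
by hand.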

I was hoping there would be a quick way to use parse instead of 
replacing characters first and then parsing.
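
One way this might be done with parse alone is sketched below, 
assuming REBOL 2 parse behaviour. The rule walks the string one 
character at a time and cuts a paragraph whenever it sees closing 
punctuation, an optional double quote, and a newline. The names 
split-paragraphs, punct and para-end are made up for illustration, 
and the set of punctuation characters is only a guess at what a real 
article needs.

```rebol
REBOL [Title: "Split paragraphs with one parse rule (sketch)"]

split-paragraphs: func [
    "Split text wherever sentence punctuation is followed by a newline"
    text [string!]
    /local parts punct para-end start stop rest
][
    parts: copy []
    punct: charset ".!?:"
    ; closing punctuation, an optional double quote, then the newline
    para-end: [punct opt {"} newline]
    parse/all text [
        start:
        any [
            ; at a paragraph end: copy from START up to here, then restart
            para-end stop: (
                append parts trim/lines copy/part start stop
            ) start:
            | skip   ; otherwise move on by one character
        ]
    ]
    ; keep any trailing text that has no recognised ending
    rest: trim/lines copy start
    if not empty? rest [append parts rest]
    parts
]
```

A call such as foreach para split-paragraphs msg/content [print para] 
would then walk the paragraphs. Because the alternatives are tried at 
every position, whichever ending comes first in the text wins, and the 
original string is left unchanged.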

-Ryan