[REBOL] parse or Re:(5)

2000-09-21 Thread rryost

So what?  It seems the application that's going to use the block of
paragraphs could easily deal with the "" for an empty paragraph.  To me,
that's preferable to trying to outguess the final character of every
conceivable paragraph!

Russell [EMAIL PROTECTED]
- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 20, 2000 10:00 PM
Subject: [REBOL] parse or Re:(4)


 Ahh, but this is not enough, because if the report has more than one
 newline character following a paragraph, you will end up with empty
 paragraphs.

  But what if I'm trying to parse a report and wish to make each
  paragraph a separate string within a block?
 
 Simple parsing with the /all refinement will do this in one step.  The
 /all refinement disables all the default delimiters and uses only the
 supplied string of characters to break apart the target string.  In this
 case, we'll use the control character "^/", end of line, commonly used
 to end a paragraph.
 
 A console session follows that illustrates this.  (Note I've inserted an
 extra "test" period within the first paragraph.)
 
  paragraphs: {First. paragraph.^/Second "paragraph."^/Third paragraph.}
 == {First. paragraph.
 Second "paragraph."
 Third paragraph.}
 
  ; Now apply the simple parse/all with only the single break character "^/"
 
  parse/all paragraphs "^/"
 == ["First. paragraph." {Second "paragraph."} "Third paragraph."]
 
 end of console session.
 
 This seems to be just what is wanted.  Braces are used for the second
 item because it contains double quotes.  The period after "First" is
 ignored, along with all the spaces, quotes, etc., because the /all
 refinement disabled the usual default break characters.
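
 If the empty strings are unwanted, the application can drop them after the
 split.  A minimal sketch, assuming the report is already in a string; the
 names `report` and `paragraphs` here are illustrative only and do not come
 from the thread:

```rebol
; Sketch only: split on newline, then keep just the non-empty pieces.
; A blank line in the report shows up as "" in the parse/all result.
report: {First paragraph.^/^/Second paragraph.^/Third paragraph.}

paragraphs: copy []
foreach part parse/all report "^/" [
    ; skip the "" entries produced by consecutive newlines
    if not empty? part [append paragraphs part]
]

probe paragraphs
```

 This keeps the one-step parse/all and leaves the decision about empty
 paragraphs to the code that consumes the block.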





[REBOL] parse or Re:(5)

2000-09-20 Thread RChristiansen

 I assumed that
 this is NOT what you wanted, but rather you wanted to copy through
 either {.^/} or {."^/} WHICHEVER COMES NEXT.  (Natural language
 text munching is a real pain, speaking from personal experience! ;-)

Yes, this is what I was looking for. As someone who has never 
parsed anything before using REBOL (there will be more like me!) the 
parsing rules are confusing to read in the REBOL docs. My 
inclination is to want to use a simple statement which will parse until 
a set of characters is reached OR a different set of characters is 
reached, whichever comes along first and next.

 The strategies I've thought of (I don't have time to code, compare,
 and recommend right at the moment) are:
 
 1)  Write more complicated parse rules, that either
 1a)  parse to newline, append the copied chunk to a paragraph
  string under construction, then look at the tail end of
  the last chunk to see whether it can be extended or whether
  a new paragraph should be started (based on whether it
  looked like the end of a sentence).
 1b)  parse to period, grab and append the next character if it
  is a quotation mark, append to paragraph under construction, and
  start a new paragraph if the next character is newline.
 2)  Use simpler parsing (break on newlines), then make a postpass
 across the block of "lines", gluing back together wherever the
 boundary isn't the end of a sentence.
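
 Strategy 2 above might be sketched roughly as follows.  This is an
 untested illustration, not code from the thread: `glue-lines` and
 `sentence-enders` are hypothetical names, and the set of "sentence
 ending" characters is an assumption that would need tuning for real text.

```rebol
; Sketch of strategy 2: break on newlines, then glue each line onto the
; previous paragraph unless that paragraph already looks finished.
sentence-enders: {.!?:"}    ; assumed closing characters; extend as needed

glue-lines: func [
    "Split text on newlines and rejoin lines that continue a sentence"
    text [string!]
    /local result last-para
][
    result: copy []
    foreach line parse/all text "^/" [
        if not empty? line [                 ; ignore blank lines
            either all [
                not empty? result
                last-para: last result
                not find sentence-enders last last-para
            ][
                ; previous paragraph is unfinished: continue it
                append last-para join " " line
            ][
                ; previous paragraph ended a sentence: start a new one
                append result copy line
            ]
        ]
    ]
    result
]

probe glue-lines {A line that wraps^/onto the next.^/A new paragraph.}
```

 The post-pass keeps the parsing itself trivial, at the cost of a second
 walk over the block.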

You missed another option, which I had been using previously. Here 
is the function:

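breakdown-content: func [
    "Break down an e-mail content field into its paragraphs"
    msg [object!] "e-mail message"
    /local article-info end-of-paragraph content-parts
][
    ; copy first so that replace/all does not alter msg/content in place
    article-info: copy msg/content
    end-of-paragraph: rejoin [{.} newline]
    ; mark each paragraph end with a tilde, keeping the period
    replace/all article-info end-of-paragraph {.~}
    content-parts: copy []
    foreach part parse/all article-info {~} [
        append content-parts trim/lines part
    ]
    content-parts
]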

In other words, replace all instances of a set of characters with a new 
character that can be recognized later. The above example needs to 
be fixed because it only handles {.^/} (rewriting it to {.~}), and I've 
discovered the tilde is a bad choice anyway. I also need to be able 
to replace any set of characters you might find at the end of a 
paragraph, including {."^/} and {!^/} and {?^/} and {:^/} and {...^/}, and 
I'm sure there are more.

I was hoping there would be a quick way to use parse instead of 
replacing characters first and then parsing.
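
One possible way to skip the replace step is to let parse stop at every 
newline and decide there, from the preceding character, whether a 
paragraph has ended. The following is only an untested sketch: 
`split-paragraphs` and `enders` are hypothetical names, and the list of 
closing characters is an assumption, not a complete answer.

```rebol
; Sketch only: single pass with parse, splitting at a newline only when
; the character just before it looks like closing punctuation.
enders: {.!?:"}    ; assumed paragraph-closing characters

split-paragraphs: func [
    "Split text into paragraphs at punctuation followed by newline"
    text [string!]
    /local result start mark after
][
    result: copy []
    parse/all text [
        start:                               ; start of current paragraph
        any [
            to newline mark: skip after: (
                ; mark is at the newline, back mark is the char before it
                if find enders first back mark [
                    append result trim/lines copy/part start mark
                    start: after
                ]
            )
        ]
        to end
    ]
    ; whatever follows the last recognized paragraph end
    if not tail? start [append result trim/lines copy start]
    result
]

probe split-paragraphs {One line^/still one. "Quoted."^/Third!^/Rest}
```

Because every ending listed above is some punctuation followed by a 
newline, checking the single character before the newline covers {.^/}, 
{."^/}, {!^/}, {?^/}, {:^/} and {...^/} without a placeholder character.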

-Ryan




[REBOL] parse or Re:(5)

2000-09-20 Thread jeff



  Howdy, Joel:

 Notice that now the result block has only TWO elements!
 Since the first test (the thru {.^/} part) can succeed by
 grabbing text all the way to the end of the SECOND
 paragraph, it does so, putting the first two paragraphs
 into the first output string.  I assumed that this is NOT
 what you wanted, but rather you wanted to copy through
 either {.^/} or {."^/} WHICHEVER COMES NEXT.  (Natural
 language text munching is a real pain, speaking from
 personal experience! ;-)

  Sure.  In the interests of advancing its popularity, I
  offered up a simplistic example of PARSE. :-)

  Paragraphs can end in a variety of punctuation ("!?.-;:),
  with different quantities (as Russ pointed out), no? 

  -jeff