[REBOL] parse or Re:(5)
So what? It seems the application that's going to use the block of paragraphs could easily deal with the "" for an empty paragraph. To me, that's preferable to trying to outguess the final character of every conceivable paragraph!

Russell [EMAIL PROTECTED]

- Original Message -
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 20, 2000 10:00 PM
Subject: [REBOL] parse or Re:(4)

Ahh, but this is not enough, because if the report has more than one newline character following a paragraph, you will end up with empty paragraphs.

But what if I'm trying to parse a report and wish to make each paragraph a separate string within a block?

Simple parsing with the /all refinement will do this in one step. The /all refinement disables all the default delimiters and uses only the supplied string of characters to break apart the target string. In this case, we'll use the control character "^/", end of line, commonly used to end a paragraph. A console session follows that illustrates this. (Note I've inserted an extra "test" period within the first paragraph.)

    paragraphs: {First. paragraph.^/Second "paragraph."^/Third paragraph.}
    == {First. paragraph. Second "paragraph." Third paragraph.}
    ; Now apply the simple parse/all with only the single break character "^/"
    parse/all paragraphs "^/"
    == ["First. paragraph." {Second "paragraph."} "Third paragraph."]

End of console session. This seems to be just what is wanted. {}'s are used for the second item because it included "'s. The period at the end of First is ignored, along with all the other spaces, ", etc., because the /all refinement disabled the usual default break chars.
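
The point about letting the consuming application deal with "" entries can be sketched as a small post-pass over the block returned by parse/all. This is only an illustration and assumes REBOL 2, where parse/all with a delimiter string splits the input; the words report, parts, and paragraphs, and the sample text with its blank line, are made up for the example.

```rebol
REBOL [Title: "Drop empty paragraphs after a newline split (sketch)"]

; Hypothetical sample report with a blank line after the first paragraph.
report: {First. paragraph.^/^/Second "paragraph."^/Third paragraph.}

; Split on newline only, as in the console session above.
parts: parse/all report "^/"

; Post-pass: keep only the non-empty strings.
paragraphs: copy []
foreach part parts [
    if not empty? part [append paragraphs part]
]

probe paragraphs
```

The split itself stays a one-liner; the cleanup is a separate step, so no guessing about paragraph-ending punctuation is needed.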
[REBOL] parse or Re:(5)
I assumed that this is NOT what you wanted, but rather you wanted to copy through either {.^/} or {."^/} WHICHEVER COMES NEXT. (Natural language text munching is a real pain, speaking from personal experience! ;-)

Yes, this is what I was looking for. As someone who has never parsed anything before using REBOL (there will be more like me!), I find the parsing rules in the REBOL docs confusing to read. My inclination is to want a simple statement which will parse until one set of characters is reached OR a different set of characters is reached, whichever comes along first.

The strategies I've thought of (I don't have time to code, compare, and recommend right at the moment) are:

1) Write more complicated parse rules, that either
   1a) parse to newline, append the copied chunk to a paragraph string under construction, then look at the tail end of the last chunk to see whether it can be extended or whether a new paragraph should be started (based on whether it looked like the end of a sentence).
   1b) parse to period, grab and append the next character if it is a quotation mark, append to paragraph under construction, and start a new paragraph if the next character is newline.
2) Use simpler parsing (break on newlines), then make a postpass across the block of "lines", gluing back together wherever the boundary isn't the end of a sentence.

You missed another option, which I had been using previously. Here is the function:

    breakdown-content: func [
        "breakdown an e-mail content field into its parts"
        msg [object!] "e-mail message"
    ][
        article-info: msg/content
        end-of-paragraph: rejoin [{.} newline]
        replace/all article-info end-of-paragraph {.~}
        content-parts: copy []
        foreach part parse/all article-info {~} [
            append content-parts trim/lines part
        ]
    ]

In other words, replace all instances of a set of characters with a new character that can be recognized later.
The above example needs to be fixed because it only replaces instances of {.^/} with "~" and I've discovered the tilde is a bad choice, anyway. I need to also be able to replace any set of characters you might find at the end of a paragraph, including {."^/} and {!^/} and {?^/} and {:^/} and {...^/} and I'm sure there are more. I was hoping there would be a quick way to use parse instead of replacing characters first and then parsing.

-Ryan
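
One way to cover several paragraph endings without a placeholder character is the "split on newlines, then glue" strategy (option 2 quoted above) rather than a single parse rule. The following is only a sketch under assumptions: it targets REBOL 2, the ender set {.!?:"} is a guess at the endings listed in this message, and the words text, enders, current, and paragraphs are invented for the example.

```rebol
REBOL [Title: "Glue lines into paragraphs by ending punctuation (sketch)"]

; Hypothetical sample: the first paragraph wraps across two lines.
text: {This first paragraph wraps^/across two lines.^/She said "stop!"^/Is this the third?^/Trailing fragment}

; Assumed set of characters that may end a paragraph just before a newline.
enders: charset {.!?:"}

paragraphs: copy []
current: copy ""

foreach line parse/all text "^/" [
    if not empty? line [
        ; Re-join wrapped lines with a single space.
        if not empty? current [append current " "]
        append current line
        ; A line whose last character is an ender closes the paragraph.
        if find enders last line [
            append paragraphs current
            current: copy ""
        ]
    ]
]

; Keep any unterminated text left over at the end.
if not empty? current [append paragraphs current]

probe paragraphs
```

Because only the last character of each line is tested, {...^/} and {."^/} are covered by the same set, and adding another ending means adding one character to enders. Abbreviations or other mid-paragraph lines that happen to end in one of these characters would still be split too early.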
[REBOL] parse or Re:(5)
Howdy, Joel:

Notice that now the result block has only TWO elements! Since the first test (the thru {.^/} part) can succeed by grabbing text all the way to the end of the SECOND paragraph, it does so, putting the first two paragraphs into the first output string. I assumed that this is NOT what you wanted, but rather you wanted to copy through either {.^/} or {."^/} WHICHEVER COMES NEXT. (Natural language text munching is a real pain, speaking from personal experience! ;-)

Sure. In the interests of advancing its popularity, I offered up a simplistic example of PARSE. :-) Paragraphs can end in a variety of punctuation ("!?.-;:), with different quantities (as Russ pointed out), no?

-jeff