Hi Gerard,
problems like these can usually be fixed by using PARSE's /all refinement and
handling white space yourself. I find I almost always do this
when I am doing more than simple string splitting.

Make a rule that accepts white space and include it
wherever you need it.
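...
ws: charset [#" " #"^-" #"^/"]    ; space, tab, newline
...
; Group the alternatives so ANY WS applies to every day, not only to
; "Every day". ANY (not SOME) because "Thu." is followed directly by ":".
english-day: [["Fri." | "Sat." | "Sun." | "Mon." | "Tue." | "Wed." | "Thu." | "Every day"] any ws]
...
parse/all t4 rules2/expr
...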
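Putting that together, here is a minimal sketch, assuming REBOL/Core 2.x, where PARSE/ALL stops PARSE from skipping white space on its own. It is a cut-down version of the rules below, not your full set: the PRINT tracing is dropped, and the names `groups` and `g` are made up for illustration. WS is allowed before each day and time and after each group, so the newline between the two groups is consumed like any other space. It also leaves out the trailing `to end`, so PARSE only returns true when the whole string has matched.

```rebol
REBOL [Title: "Film showtimes parser sketch (illustrative)"]

; Sketch assuming REBOL/Core 2.x: with PARSE/ALL nothing is skipped
; automatically, so every gap (including newlines) must match WS.
ws: charset [#" " #"^-" #"^/"]         ; space, tab, newline
digit: charset [#"0" - #"9"]

; Literal string matches are case-insensitive unless /CASE is used,
; so "Sun." also matches "sun." in the data.
day: ["Fri." | "Sat." | "Sun." | "Mon." | "Tue." | "Wed." | "Thu." | "Every day"]
show-hour: [digit 0 1 digit ":" digit digit 0 1 "am"]

; One group: days, a colon, then times. ANY WS before each day and
; time, and after the group, swallows the space or newline between groups.
days-group: [
    any ws day any ["," any ws day]
    ":" any ws show-hour any ["," any ws show-hour]
    any ws
]

groups: copy []                         ; hypothetical collector, for checking
film-hours: [some [copy g days-group (append groups g)]]

t4: { Fri.: 1:00, 3:00, 7:00
Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}

; No TO END here, so RESULT is true only if the whole string was matched.
result: parse/all t4 film-hours
print ["matched all input:" result]
foreach g groups [print mold g]
```

Against the two-line t4 this should return true and collect both day groups. If it returns false on other data, the spot where matching stopped is usually where another `any ws` is needed.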



Gerard Cote wrote:
> Hi everybody,
> 
> In an effort to increase a friend's interest in REBOL, I recently tried
> to create a simple data-mining app that analyzes theatre listings
> (the days and times at which films are shown). I retrieve the
> information from the French site http://cinemaquebec.com.
> 
> For the moment, my biggest problem comes from the fact that I don't
> fully understand how PARSE behaves when it encounters
> newline characters.
> 
> Let me give a simplified example extracted from the site to illustrate my 
> point:
> t4: { Fri.: 1:00, 3:00, 7:00 Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}
> 
> Here we have one day (Fri.) followed by a colon (:) and then three times.
> Right after that, the same pattern repeats with not one but six days
> separated by commas (,), again followed by a colon (:) and five more times.
> 
> I wrote a block of relatively simple rules that work well on this
> simple example.
> 
> Here is the result I get from the parse:
> 
> >> parse t4 rules2/expr
> 
> which-day:  "Fri." 4
> Hour: "1" 1
> Min: "00" 2
> which-hour:  " 1:00" 5
> Hour: "3" 1
> Min: "00" 2
> which-hour2:  " 3:00" 5
> Hour: "7" 1
> Min: "00" 2
> which-hour2:  " 7:00 " 6
> which-days:  "Fri.: 1:00, 3:00, 7:00 " 23
> which-day:  "Sat." 4
> which-day2:  " sun." 5
> which-day2:  " mon." 5
> which-day2:  " tue." 5
> which-day2:  " wed." 5
> which-day2:  " thu." 5
> Hour: "10" 2
> Min: "00" 2
> which-hour:  " 10:00" 6
> Hour: "1" 1
> Min: "00" 2
> which-hour2:  " 1:00" 5
> Hour: "3" 1
> Min: "00" 2
> which-hour2:  " 3:00" 5
> Hour: "9" 1
> Min: "00" 2
> which-hour2:  " 9:00" 5
> Hour: "10" 2
> Min: "00" 2
> which-hour2:  " 10:00" 6
> which-days2:  {Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00} 68
> film-hours:  { Fri.: 1:00, 3:00, 7:00 Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}
> ----------------------------------------------------------
> == true
> 
> Here are my parse rules, for those interested in how I did it
> (for convenience, I also attached them to this message).
> You'll notice the many PRINTs I added to follow PARSE step by step.
> 
> rules2: make object! [
>  expr: [copy film-hours film-hours-rules
>      (print ["film-hours: " mold film-hours newline
>           "----------------------------------------------------------"
>        newline])
>      to end
>     ]
> 
>  film-hours-rules: [copy which-days days-group
>         (print ["which-days: " mold which-days length? which-days])
>       any [copy which-days2 days-group
>        (print ["which-days2: " mold which-days2 length? which-days2])
>        ]
>          ]
> 
>  days-group: [copy which-day day
>       (print ["which-day: " mold which-day length? which-day])
>         any ["," copy which-day2 day
>       (print ["which-day2: " mold which-day2 length? which-day2])
>             ]
>           ":"
>         copy which-hour show-hour
>       (print ["which-hour: " mold which-hour length? which-hour])
>         0 1 "am"
>         any ["," copy which-hour2 show-hour
>        (print ["which-hour2: " mold which-hour2 length? which-hour2])
>         0 1 "am"
>                ]
>     ]
> 
>  digit: charset [#"0" - #"9"]
>  hour: [digit 0 1 digit]
>  minutes: [digit digit]
>  show-hour: [copy this-hour hour (print ["Hour:" mold this-hour length? this-hour])
>     ":"
>     copy this-min minutes (print ["Min:" mold this-min length? this-min])]
> 
>  english-day: ["Fri." |"Sat." |"Sun." |"Mon." |"Tue." |"Wed." |"Thu." |"Every day"]
>  french-day: ["Ven." |"Sam." |"Dim." |"Lun." |"Mar." |"Mer." |"Jeu." |"Tous les jours"]
>  day: ["Fri." |"Sat." |"Sun." |"Mon." |"Tue." |"Wed." |"Thu." |"Every day"]
> ]
> 
> Now, my problem is this:
> 
> When I submit data broken across two lines (by a newline), as in
> this new t4, my rules no longer work:
> t4: { Fri.: 1:00, 3:00, 7:00
> Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}
> 
> The new results look like this:
> 
> 
> >> parse t4 rules2/expr
> 
> which-day:  "Fri." 4
> Hour: "1" 1
> Min: "00" 2
> which-hour:  " 1:00" 5
> Hour: "3" 1
> Min: "00" 2
> which-hour2:  " 3:00" 5
> Hour: "7" 1
> Min: "00" 2
> which-hour2:  " 7:00" 5
> which-days:  "Fri.: 1:00, 3:00, 7:00" 22
> film-hours:  " Fri.: 1:00, 3:00, 7:00"
> ----------------------------------------------------------
> 
> == true
> 
> The second part of the results has been chopped off.
> 
> Later, when I extend my rules to get the film title that follows the
> last showtime, this chopped-off part gets mixed up with the next film's title.
> 
> Any help is appreciated.
> 
> Regards, Gerard
> 
> 
> 
> 
> 
> -- Binary/unsupported file stripped by Ecartis --
> -- Type: text/x-rebol
> -- File: parse-film-times.r
> 
> 
