[REBOL] Re: Parse limitation ?

2003-10-09 Thread "Robert M. Münch"

On Wed, 8 Oct 2003 12:10:42 +0200, patrick à la poste 
<[EMAIL PROTECTED]> wrote:

> myText: {}

Hi, one other trick besides doing by-hand backtracking (which is very 
powerful) is to define more than one rule set and use parse several times. 
Why try to write just one rule set at all? No one tries to solve a programming 
problem with one function.

So, what could be done:
1. We could parse for < and > and copy everything in between.
2. The copied string can then be parsed again with another rule set.

parse myText [ some [
to "<" copy sub-parse to ">" ( parse sub-parse [
  thru "HREF=" (print "href")   ; THRU, because sub-parse starts at "<"
| thru "SRC=" (print "src")
])
]
]

What needs to be remembered is that a rule which uses | only hits once. The 
first alternative that makes it to the end terminates further evaluation. The 
logic is clear: the rule did its job, so why continue?
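
A minimal sketch of that "first alternative wins" behaviour, assuming a made-up tag string (tag-text is hypothetical; it is not the myText from the thread):

```rebol
REBOL []

; Hypothetical tag text that carries both attributes.
tag-text: {<img src="foobar.gif" href="#section1">}

parse/all tag-text [
    thru "SRC="  (print "src")    ; tried first, and it succeeds
  | thru "HREF=" (print "href")   ; never tried, because the first alternative matched
]
```

Only "src" should be printed: once the first alternative succeeds, the second one is not evaluated, even though HREF= is also present in the string.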

While doing make-doc-pro I have used this approach at several places, 
where parse rules would get very complicated otherwise.

-- 
Robert M. Münch
Management & IT Freelancer
Mobile: +49 (177) 245 2802
http://www.robertmuench.de

-- 
To unsubscribe from this list, just send an email to
[EMAIL PROTECTED] with unsubscribe as the subject.



[REBOL] Re: Parse limitation ?

2003-10-08 Thread Gregg Irwin

Hi Patrick,

pàlp> I'd like to parse a string searching for two things at the same time.
pàlp> it seems to me that this is impossible.
...
pàlp> parse myText [
pàlp> any [ thru "HREF=" copy target to ">" (print target) |
pàlp>   thru "SRC=" copy target to ">" (print target)
pàlp> ] ; any   
pàlp> ] ; parse

I'm pretty sure this same thing came up not too long ago on the list.
See if rebol.net/list has it, or if you've been around for at least a
couple months, you should have it too (the solution that is). If you
can't find it, let me know and I'll see if I can dig it up.

The issue has to do with wanting the THRU rule to be smarter than it
is. PARSE doesn't do backtracking, so it will keep going forward
until it finds the next occurrence of the first rule you give it,
which isn't what you want, but it isn't wrong either. :)
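
The effect can be reproduced with a small sketch. The input below is hypothetical (sample and found are illustrative names, not from the original post); it contains an HREF, then a SRC, then another HREF:

```rebol
REBOL []

; Hypothetical input: HREF, then SRC, then HREF again.
sample: {<a href="#one"><img src="pic.gif"></a><a href="#two">}

found: copy []
parse/all sample [
    any [
        ; THRU "SRC=" runs forward to the first SRC= and jumps over the first HREF=
        thru "SRC="  copy target to ">" (append found target)
        ; only tried once no further SRC= exists after the current position
      | thru "HREF=" copy target to ">" (append found target)
    ]
]
probe found
```

The first HREF value never ends up in found, because THRU "SRC=" has already moved the input position past it before the HREF alternative gets a chance.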

-- Gregg 




[REBOL] Re: Parse limitation ?

2003-10-08 Thread Gregg Irwin

Hi Petr,

PK> Yes, I can imagine it, really. The problem is (at least for me), that I
PK> am able to understand such grammar once someone creates it, but am not 
PK> able to come up with it to solve problem at hand. Will you blame us 
PK> little bit underskilled rebol programmers now? :-)

It's often a challenge for me as well, but I think it's because of
what Gabriele said; I don't think in the right terms. Once I do that,
it seems to be much easier. The problem, though, isn't with REBOL or
PARSE, it has to do with grammar design, which most of us don't have
much (or any) experience with.

-- Gregg 




[REBOL] Re: Parse limitation ?

2003-10-08 Thread Gabriele Santilli

Hi Petr,

On Wednesday, October 8, 2003, 6:57:38 PM, you wrote:

PK> Well - I am not sure my example will be any slower, except the penalty
PK> of extra function call. First, I pass it string at certain position and

First of all, FIND searches char by char too. It's just way faster
because it's native; but, if you end up searching the string n
times, you get n*m complexity (where m is the size of the string),
and this scales up so badly that in the end it gets slower than
using a PARSE loop.

Probably FIND is still faster for two or three alternatives. We'd
have to test it. When the alternatives are just strings, you could
speed up the PARSE loop using a charset, and I have the feeling
that PARSE is as fast as FIND in such a case, so the PARSE
solution would be n times faster for n alternatives.
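
One possible reading of the charset idea, as a sketch only: skip in bulk over every character that cannot start one of the keywords, and test the alternatives only at the remaining positions. The names sample, starters, non-starters and found are hypothetical, and no timing claims are made here.

```rebol
REBOL []

; Hypothetical sample input, not the text from the thread.
sample: {<a href="#one"><img src="pic.gif"></a><a href="#two">}

; Characters that can start one of the keywords (both cases, to be safe).
starters: charset "HhSs"
non-starters: complement starters

found: copy []
parse/all sample [
    any [
        any non-starters    ; run over characters that cannot start a keyword
        [
            "HREF=" copy target to ">" (append found target)
          | "SRC="  copy target to ">" (append found target)
          | skip            ; a starter character, but not a keyword: step over it
        ]
    ]
    to end
]
probe found
```

The string is walked once, whatever the number of alternatives, and the values are collected in document order.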

PK> Yes, I can imagine it, really. The problem is (at least for me), that I
PK> am able to understand such grammar once someone creates it, but am not
PK> able to come up with it to solve problem at hand. Will you blame us
PK> little bit underskilled rebol programmers now? :-)

Not at all, but you are underestimating yourself. ;-)

PK> Sounds interesting. I am just curious, if e.g. html only (not trying to
PK> complicate things with java-script for now :-) browser would be possible
PK> with Rebol? IIRC Python has web browser. Just curious.

The problem for a web browser is not HTML parsing, it's rendering.
In my dream-future, I will finish the PDF Maker 2 and then write a
HTML2PDF translator. Rendering in View would be possible too, but
I'd like RT to offer us some kind of native rich text handling
first... you see, I'm too lazy to do all of that myself. ;-)

Who needs a REBOL web browser? I'd like an email client much
better.

Regards,
   Gabriele.
-- 
Gabriele Santilli <[EMAIL PROTECTED]>  --  REBOL Programmer
Amiga Group Italia sez. L'Aquila  ---   SOON: http://www.rebol.it/




[REBOL] Re: Parse limitation ?

2003-10-08 Thread Gabriele Santilli

Hi Maxim,

On Wednesday, October 8, 2003, 6:29:03 PM, you wrote:

MOA> can you give a short example of a grammar that would extract the text from

MOA>   paragraph infocontent end>

MOA> and returns a block such as:
[...]

Well, nested tags are not valid HTML so this does not handle them,
but maybe it could be of some inspiration. (Sorry for Joel-style
indentation. ;-)

tag-rule:
[   "<" m1:
[   "/" word thru ">" (end-tag to word! word-res) 
  | "!--" thru "-->" m2: (add-contents to tag! copy/part m1 back m2)
  | "!DOCTYPE" thru ">" m2: (add-contents to tag! copy/part m1 back m2)
  | "?xml" thru "?>" m2: (add-contents to tag! copy/part m1 back m2)
  | word any space (clear attributes) any attribute ["/" (content: no) | none 
(content: yes)] ">"
(open-tag to word! word-res attributes content)
]   ]

chars: complement charset {<>"'= ^/^-/}
value-chars: union chars charset "/"
word: [copy word-res some chars]
space: charset { ^/^-}
attributes: [ ]
attribute:
[   (wrs: word-res) word any space 
[   "=" any space
[   {"} copy value any dquoted-chars {"}
  | {'} copy value any squoted-chars {'}
  | copy value any value-chars
]   any space
  | (value: yes)
]   (insert insert tail attributes to word! word-res any [value copy ""] word-res: 
wrs)
]
dquoted-chars: complement charset {"}
squoted-chars: complement charset {'}

document-rule:
[   some 
[   copy contents to "<" (add-contents contents) tag-rule 
  | copy contents to end (add-contents contents) break
]   ]

stack: [ ]
parsed: none

no-content-tags:
[   basefont br area link img param hr input col frame base meta]

open-tag:
func [tagname attributes content? /local tag]
[   if find no-content-tags tagname [content?: no]
either content?
[   tag: compose/deep [[(tagname) (attributes)]]
insert/only tail last stack tag
insert/only tail stack tag
]
[   tag: compose [(tagname) (attributes)]
insert/only tail last stack tag
]   ]
end-tag:
func [tagname]
[   stack: back tail stack
if head? stack [exit] ; unmatched close tag
while [tagname <> tagname-of stack/1]
[   stack: back stack
if head? stack [exit] ; unmatched close tag
]
stack: head clear stack
]
add-contents:
func [contents]
[   if contents
[   insert tail last stack contents
]   ]

parse-document:
func [document]
[   stack: clear head stack
insert/only stack parsed: make block! 10
parse/all document document-rule
parsed
]


This is extracted from other code so it is possible that something
is missing. Example:

>> parse-document "TitleThis is 
>> atest"
== [[[html] [[head] [[title] "Title"]] [[body] "This is a" [br] "test"]]]
>> parse-document read http://www.rebol.com
== [[[HTML] "^/" [[HEAD] "^/" [META HTTP-EQUIV "Content-Type" CONTENT 
"text/html;CHARSET=iso-8859-1"] "^/" [META NAME "KEYWORDS" CO...

Regards,
   Gabriele.
-- 
Gabriele Santilli <[EMAIL PROTECTED]>  --  REBOL Programmer
Amiga Group Italia sez. L'Aquila  ---   SOON: http://www.rebol.it/




[REBOL] Re: Parse limitation ?

2003-10-08 Thread Patrick Philipot

Hello Ingo,

Wednesday, October 8, 2003, 12:50:20 PM, you wrote:


IH> Hi Patrick,

IH> patrick à la poste wrote:
>> Hi List,
>> 
>> I'd like to parse a string searching for two things at the same time.
>> it seems to me that this is impossible.

IH> One trick is, to find something that is equal between the two strings, and
IH> work from there ...

IH> REBOL []

IH> myText: {}

IH> parse/all myText [
IH> any [
IH> to "=" here: (there: at here -4) :there [
IH> [ "HREF=" | " SRC=" ]
IH> copy target to ">" (print target) |
IH> thru "="
IH> ]
IH> ]
IH> ] ; parse

IH> In this example I used the "=" which is common to both strings, checked
IH> whether what I have _before_ this sign is one of the two strings I'm
IH> interested in, and then start to copy, or just go thru the "=" to start
IH> again ...


IH> I hope that helps,

IH> Ingo

This is brilliant!
Thank you Ingo.



-- 
Best regards,
 Patrick




[REBOL] Re: Parse limitation ?

2003-10-08 Thread Petr Krenzelok

Gabriele Santilli wrote:

>Hi Petr,
>
>On Wednesday, October 8, 2003, 4:39:01 PM, you wrote:
>
>PK> ah, but that is char-by-char execution ...
>
>Do  you know any other way to do that? (Your example is using FIND
>multiple  times,  and  in  a  big  string that would be many times
>slower.)
>  
>
Well - I am not sure my example will be any slower, except for the penalty 
of an extra function call. First, I pass it a string at a certain position and 
it then returns the string at a position where parse rule a) or b) 
can be applied directly; second, it is two direct searches in the string and a 
decision about which index came first, versus probably recursive char-by-char 
rules (whose penalty I am not able to estimate :-)

>PK> yes, exactly - but I think such grammar to simply achieve what was
>PK> requested will not be easy for novices. The tool (REBOL) should support
>PK> our thinking pattern - and the easiest one is to "skip"  "to | thru"
>PK> certain string - no matter what is in between.
>
>I  think  that it is better to think of the problem in a different
>way,  because  it  allows you to understand things much better. If
>you switch to think about grammars instead of patterns you'll find
>out that your problems get simpler, not more complicated. IMHO.
>  
>
Yes, I can imagine it, really. The problem is (at least for me), that I 
am able to understand such grammar once someone creates it, but am not 
able to come up with it to solve problem at hand. Will you blame us 
little bit underskilled rebol programmers now? :-)

>PK> If someone is up-to writing complete html parser, building DOM object,
>PK> then maybe we are near seeing rebol based web-browser? :-)
>
>Well,  the  74-lines  [X]HTML parser built into Temple is far from
>being complete, but has been able to parse all the HTML files I've
>fed  into it until now. I don't think this is so much complicated,
>you  just  need  to avoid that brain-dead way of doing things that
>seems to pervade the world. ;-)
>
>  
>
Sounds interesting. I am just curious, if e.g. html only (not trying to 
complicate things with java-script for now :-) browser would be possible 
with Rebol? IIRC Python has web browser. Just curious.

-pekr-

>Regards,
>   Gabriele.
>  
>





[REBOL] Re: Parse limitation ?

2003-10-08 Thread Maxim Olivier-Adlhoch


> -Original Message-
> From: Gabriele Santilli [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 08, 2003 11:42 AM
> To: Petr Krenzelok
> Subject: [REBOL] Re: Parse limitation ?
> 
> 
> I  think  that it is better to think of the problem in a different
> way,  because  it  allows you to understand things much better. If
> you switch to think about grammars instead of patterns you'll find
> out that your problems get simpler, not more complicated. IMHO.
> 

can you give a short example of a grammar that would extract the text from

  paragraph infocontent end>

and returns a block such as:

[
tag! [
"tag content"
subtag! [
"its content?"
]
p [
"paragraph info"
]
"content end"
]
]

I have no idea how I would approach this!

this could be a nice tutorial for us "less gifted" parsers.


-MAx




[REBOL] Re: Parse limitation ?

2003-10-08 Thread Gabriele Santilli

Hi Petr,

On Wednesday, October 8, 2003, 4:39:01 PM, you wrote:

PK> ah, but that is char-by-char execution ...

Do you know any other way to do that? (Your example is using FIND
multiple times, and in a big string that would be many times
slower.)

PK> yes, exactly - but I think such grammar to simply achieve what was
PK> requested will not be easy for novices. The tool (REBOL) should support
PK> our thinking pattern - and the easiest one is to "skip"  "to | thru"
PK> certain string - no matter what is in between.

I think that it is better to think of the problem in a different
way, because it allows you to understand things much better. If
you switch to think about grammars instead of patterns you'll find
out that your problems get simpler, not more complicated. IMHO.

PK> If someone is up-to writing complete html parser, building DOM object,
PK> then maybe we are near seeing rebol based web-browser? :-)

Well, the 74-line [X]HTML parser built into Temple is far from
being complete, but has been able to parse all the HTML files I've
fed into it until now. I don't think this is so complicated,
you just need to avoid that brain-dead way of doing things that
seems to pervade the world. ;-)

Regards,
   Gabriele.
-- 
Gabriele Santilli <[EMAIL PROTECTED]>  --  REBOL Programmer
Amiga Group Italia sez. L'Aquila  ---   SOON: http://www.rebol.it/




[REBOL] Re: Parse limitation ?

2003-10-08 Thread Petr Krenzelok

Gabriele Santilli wrote:

>Hi Petr,
>
>On Wednesday, October 8, 2003, 2:06:58 PM, you wrote:
>
>PK> I would just like to point out that a 'first directive, or to/thru [a | b
>PK> | c], was proposed as a parse enhancement some time ago, but then some
>PK> parse gurus (e.g. Gabriele) admitted that parse would have to work
>PK> another way internally and that it is not easily achievable (am I right,
>PK> Gabriele?)
>
>The  point  is,  that  internally  PARSE would be forced to do the
>equivalent of:
>
>   [any [a | b | c | skip]]
>
>  
>
ah, but that is char-by-char execution ...

>so  even  if it could be a bit faster than the above I don't think
>it  would  be  of  great  help.  More  readable,  maybe... so it's
>something I could add to compile-rules, if I get some time to work
>on it.
>
>In  this  particular  case,  I wouldn't use this construct at all,
>since  it's  much better to have a more complete grammar 
>
yes, exactly - but I think such a grammar, just to achieve what was 
requested, will not be easy for novices. The tool (REBOL) should support 
our thinking pattern - and the easiest one is to "skip"  "to | thru" 
a certain string - no matter what is in between.

If someone is up-to writing complete html parser, building DOM object, 
then maybe we are near seeing rebol based web-browser? :-)

-pekr-

>(that can
>make  distinction  between  href=  in  a  tag and outside of a tag
>etc.), IMHO.
>
>Regards,
>   Gabriele.
>  
>






[REBOL] Re: Parse limitation ?

2003-10-08 Thread Petr Krenzelok

Petr Krenzelok wrote:

>patrick à la poste wrote:
>
>  
>
>>Hi List,
>>
>>I'd like to parse a string searching for two things at the same time.
>>it seems to me that this is impossible.
>>
>>For example, a text from which I want to extract the HREF and the SRC target.
>>
>>myText: {}
>>
>>parse myText [
>>   any [ thru "HREF=" copy target to ">" (print target) |
>> thru "SRC=" copy target to ">" (print target)
>>   ] ; any   
>>] ; parse
>>
>>"#section1"
>>"#section1"
>>
>>parse myText [
>>   any [ thru "SRC=" copy target to ">" (print target) |
>> thru "HREF=" copy target to ">" (print target)
>>   ] ; any   
>>] ; parse
>>
>>"foobar.gif"
>>"#section1"
>>
>>The result is different depending which rule comes first. The only way I see as a 
>>workaround is to parse the text twice. Is there a better (smarter) way?
>>
>> 
>>
>>
>>
>I would just like to point out that a 'first directive, or to/thru [a | b 
>| c], was proposed as a parse enhancement some time ago, but then some 
>parse gurus (e.g. Gabriele) admitted that parse would have to work 
>another way internally and that it is not easily achievable (am I right, 
>Gabriele?)
>
>OTOH - your example is just one of those which we often enough meet in 
>real life, but have no easy/elegant solution for, at least not for 
>novice being able to solve it 
>
>  
>
Well, I just played a bit and the following hack appeared in my notepad :-)

reposition: func [str blk /local res tmp][
   res: copy []
   foreach item blk [
 if not none? tmp: find str item [append res reduce [index? tmp item]]
   ]
   sort/skip res 2
   either empty? res [str][at str (first res) - (index? str) + 1]
]


myText: {




}

src-rule:  ["SRC=" copy target to ">" (print target)]
href-rule: ["HREF=" copy target to ">" (print target)]

parse/all mytext [
  any [
mark: (mark: reposition mark ["HREF=" "SRC="]) :mark
[src-rule | href-rule]
  ]
to end
]

You can call the 'reposition function with a block containing any number of 
options, and it decides which one comes first. It just does a 
plain search for each, notes the positions, sorts the resulting block and 
"repositions" your parse input string so that the parser pointer points to the 
first of the options, so you can directly apply "HREF=", "SRC=" etc. and you 
can be sure one of them is there ...

Well, I don't know how robust it is, but I tried it with mytext: read 
http://www.rebol.com and it seems it needs further tuning :-) 

The following might get you better results:

mytext: read http://www.rebol.com

src-rule:  [{SRC="} copy target to {"} (print target)]
href-rule: [{HREF="} copy target to {"} (print target)]

parse/all mytext [
  any [
mark: (mark: reposition mark [{HREF="} {SRC="}]) :mark
[src-rule | href-rule]
  ]
to end
]


Anyway ... you've got some inspiration ...

-pekr-

>-pekr-
>
>  
>
>>Regards 
>>Patrick
>>
>> 
>>
>>
>>
>
>
>
>
>  
>







[REBOL] Re: Parse limitation ?

2003-10-08 Thread Gabriele Santilli

Hi Petr,

On Wednesday, October 8, 2003, 2:06:58 PM, you wrote:

PK> I would just like to point out that a 'first directive, or to/thru [a | b
PK> | c], was proposed as a parse enhancement some time ago, but then some
PK> parse gurus (e.g. Gabriele) admitted that parse would have to work
PK> another way internally and that it is not easily achievable (am I right,
PK> Gabriele?)

The point is that internally PARSE would be forced to do the
equivalent of:

   [any [a | b | c | skip]]

so even if it could be a bit faster than the above I don't think
it would be of great help. More readable, maybe... so it's
something I could add to compile-rules, if I get some time to work
on it.

In this particular case, I wouldn't use this construct at all,
since it's much better to have a more complete grammar (that can
make a distinction between href= in a tag and outside of a tag
etc.), IMHO.
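
Applied to the HREF/SRC question, the [any [a | b | c | skip]] pattern could look like the sketch below. The sample string and the word targets are hypothetical (the original myText did not survive in the archive), and this is the simple scanning form, not the more complete grammar recommended above.

```rebol
REBOL []

; Hypothetical sample input; the original myText is not available.
sample: {<a href="#section1"><img src="foobar.gif"></a> <a href="#section2">x</a>}

targets: copy []

parse/all sample [
    any [
        "HREF=" copy target to ">" (append targets target)
      | "SRC="  copy target to ">" (append targets target)
      | skip    ; neither keyword starts here: advance one character
    ]
]

probe targets
```

Because every alternative is tested at each position, the values come out in document order whichever keyword is listed first. As in the original rules, the copied values still include their surrounding quote characters.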

Regards,
   Gabriele.
-- 
Gabriele Santilli <[EMAIL PROTECTED]>  --  REBOL Programmer
Amiga Group Italia sez. L'Aquila  ---   SOON: http://www.rebol.it/




[REBOL] Re: Parse limitation ?

2003-10-08 Thread Ladislav Mecir

Hi Pat,

- Original Message - 
From: "patrick à la poste" 

> 
> Hi List,
> 
> I'd like to parse a string searching for two things at the same time.
> it seems to me that this is impossible.
> 
> For example, a text from which I want to extract the HREF and the SRC target.
> 
> myText: {}
> 
> parse myText [
> any [ thru "HREF=" copy target to ">" (print target) |
>   thru "SRC=" copy target to ">" (print target)
> ] ; any   
> ] ; parse
> 
> "#section1"
> "#section1"
> 
> parse myText [
> any [ thru "SRC=" copy target to ">" (print target) |
>   thru "HREF=" copy target to ">" (print target)
> ] ; any   
> ] ; parse
> 
> "foobar.gif"
> "#section1"
> 
> The result is different depending which rule comes first. The only way I see as a 
> workaround is to parse the text twice. Is there a better (smarter) way?
> 
> 
> 
> Regards 
> Patrick

This is possible with PARSE. You can use my parse enhancements, for example. Have a look at: 
http://www.fm.vslib.cz/~ladislav/rebol/parseen.r

Ladislav





[REBOL] Re: Parse limitation ?

2003-10-08 Thread Petr Krenzelok

patrick à la poste wrote:

>Hi List,
>
>I'd like to parse a string searching for two things at the same time.
>it seems to me that this is impossible.
>
>For example, a text from which I want to extract the HREF and the SRC target.
>
>myText: {}
>
>parse myText [
>any [ thru "HREF=" copy target to ">" (print target) |
>  thru "SRC=" copy target to ">" (print target)
>] ; any   
>] ; parse
>
>"#section1"
>"#section1"
>
>parse myText [
>any [ thru "SRC=" copy target to ">" (print target) |
>  thru "HREF=" copy target to ">" (print target)
>] ; any   
>] ; parse
>
>"foobar.gif"
>"#section1"
>
>The result is different depending which rule comes first. The only way I see as a 
>workaround is to parse the text twice. Is there a better (smarter) way?
>
>  
>
I would just like to point out that a 'first directive, or to/thru [a | b 
| c], was proposed as a parse enhancement some time ago, but then some 
parse gurus (e.g. Gabriele) admitted that parse would have to work 
another way internally and that it is not easily achievable (am I right, 
Gabriele?)

OTOH - your example is just one of those which we meet often enough in 
real life, but have no easy/elegant solution for, at least not one that a 
novice would be able to come up with 

-pekr-

>
>Regards 
>Patrick
>
>  
>







[REBOL] Re: Parse limitation ?

2003-10-08 Thread Ingo Hohmann

Hi Patrick,

patrick à la poste wrote:
> Hi List,
> 
> I'd like to parse a string searching for two things at the same time.
> it seems to me that this is impossible.

One trick is to find something that the two strings have in common, and 
work from there ...

REBOL []

myText: {}

parse/all myText [
any [
to "=" here: (there: at here -4) :there [
[ "HREF=" | " SRC=" ]
copy target to ">" (print target) |
thru "="
]
]
] ; parse

In this example I used the "=" which is common to both strings, checked 
whether what I have _before_ this sign is one of the two strings I'm 
interested in, and then start to copy, or just go thru the "=" to start 
again ...


I hope that helps,

Ingo

