[REBOL] Re: parse-xml and build-tag

Hallvard Ystad Mon, 08 Oct 2001 02:22:23 -0700

Thanks for the explanation, Joel. Question 1 is now out of the way. But as
for Q2, I am still facing some problems.


I am actually parsing HTML, not XML, so I need a method that understands
certain things that are illegal in XML. For example:
<table width="100%" noborder height=75%>
This is valid HTML, I think, or at least it is widely in use. The parse-xml
function understands neither the noborder attribute nor the height attribute:

 >> parse-xml {<table width="100%" noborder height=75%>}
== [document none [["table" ["width" "100%"] none]]]

I once used this method to extract attributes from tags:
ex_att: func [tag attr] [
   trim to-string select difference parse tag "<> =" [""] attr
]
but it doesn't get the noborder attribute right...
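
One possible direction is a hand-written PARSE rule that tolerates
valueless and unquoted attributes. What follows is only a minimal,
untested sketch: the name tag-attrs and the character sets are
assumptions made for illustration, not anything built into REBOL, and it
assumes REBOL 2 PARSE semantics.

```rebol
REBOL []

; Hypothetical helper, not part of REBOL: returns a flat block of
; name/value pairs. A valueless attribute such as "noborder" is
; paired with NONE.
tag-attrs: func [
    tag [tag! string!]
    /local out name value ws name-chars val-chars
] [
    out: copy []
    ws: charset " ^-^/"                            ; space, tab, newline
    name-chars: complement charset { ^-^/="'<>/}   ; assumed name characters
    val-chars: complement charset { ^-^/<>}        ; assumed unquoted-value characters
    parse/all form tag [
        opt "<" any ws some name-chars             ; skip the element name
        any [
            some ws
            | copy name some name-chars (value: none)
              opt [
                  "=" [
                      {"} copy value to {"} skip   ; double-quoted value
                      | "'" copy value to "'" skip ; single-quoted value
                      | copy value some val-chars  ; unquoted value
                  ]
              ]
              (append out name append out value)
            | skip                                 ; ignore anything else, e.g. ">"
        ]
    ]
    out
]
```

Called as tag-attrs {<table width="100%" noborder height=75%>}, this is
intended to return the width and height pairs plus noborder paired with
NONE, so that SELECT can be used on the result.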

Any suggestions (or code), anyone?
~H

Joel Neely skrev (Sunday 07.10.2001, kl. 23.38):
>Hi, Hallvard,
>
>Hallvard Ystad wrote:
> >
> > 1) When I use the parse-xml function, here's what I get:
> >
> > >> xml-doc: parse-xml {<test><tag>This is inside "tag"</tag><goodForNothing/> And this is in the outer tag, the "test" tag.</test>}
> > == [document none [["test" none [["tag" none [{This is inside "tag"}]] ["goodForNothing" none none] { And this is in the outer tag,...
> > >> print mold xml-doc
> > [document none [["test" none [["tag" none [{This is inside "tag"}]] ["goodForNothing" none none] { And this is in the outer tag, the "test" tag.}]]]]
> > >>
> >
> > Is there some good documentation for the use of this function
> > somewhere, and, not least, for the kind of block tree it returns?
> >
>
>I haven't seen it documented, but the returned block structure is
>organized as follows:
>
>*  content strings are represented as strings, with all ignorable
>    whitespace retained (e.g., any leading/trailing newlines,
>    indentation, etc.)
>
>*  an XML element is represented by a three-element block
>
>    [ elementname attributeblock contentblock ]
>
>    where:
>
>    *  elementname is a string giving the name of the element itself;
>    *  attributeblock is either a block of name/value pairs or NONE,
>       depending on whether attributes were present in the element; and
>    *  contentblock is either a block of content items (strings and/or
>       element blocks) or NONE, depending on whether the element had
>       any contents.
>
>*  the top level of the structure is a three-element block with the
>    word DOCUMENT (note: not the string "document"!) as its first element,
>    NONE as the second element (presumably no attributes), and the root
>    XML element as the only member in its third block.
>
>For example:
>
>   >> parse-xml {<foo where="here" when="now"/>}
>   == [document none [["foo" ["where" "here" "when" "now"] none]]]
>
>which shows the DOCUMENT word (with no attributes) and a content of
>only one item -- the "foo" element.  That element has two attributes
>(with values, of course) and no content.  Similarly,
>
>   >> parse-xml {<foo where="here" when="now"></foo>}
>   == [document none [["foo" ["where" "here" "when" "now"] none]]]
>
>having no content is equivalent to being an empty element.  However,
>
>   >> parse-xml {
>   {    <foo where="here" when="now">
>   {    </foo>
>   {    }
>   == [document none [["foo" ["where" "here" "when" "now"] ["^/"]]]]
>
>shows that an ignorable-whitespace string (e.g., only a newline)
>is retained as the content of the "foo" element.
>
> >
> > 2) There is a build-tag function, which isn't perfect, but it _is_.
> > Has anyone written a good function to go the other way? I.e. to turn
> > a tag into a block or into an object?
> >
>
>How about this?
>
>   >> first third parse-xml {<foo where="here" when="now">}
>   == ["foo" ["where" "here" "when" "now"] none]
>
>IOW, let PARSE-XML do the work, then pluck out the first (and only)
>element in the content of the (hypothetical) document containing
>only that single tag.
>
>Then you get a block structure that is consistent with the above
>description (element name, attributes, and NONE).
>
>HTH!
>
>-jn-
>
>--
>; Joel Neely  [EMAIL PROTECTED]  901-263-4460  38017/HKA/9677
>REBOL []  foreach [order string]  sort/skip reduce [ true "!"
>false  head reverse "rekcah"  none "REBOL "  prin "Just " "another "
>] 2 [prin string] print ""
>--
>To unsubscribe from this list, please send an email to
>[EMAIL PROTECTED] with "unsubscribe" in the
>subject, without the quotes.

Praetera censeo Carthaginem esse delendam

