Thanks for the explanation, Joel. Question 1 is now out of the way. But as for Q2, I am still facing some problems.

I am actually parsing HTML, not XML, so I need a method that understands certain things that are illegal in XML. Example:

    <table width="100%" noborder height=75%>

This is valid HTML, I think, or at least it is in wide use. The parse-xml function understands neither the noborder attribute nor the height attribute:

    >> parse-xml {<table width="100%" noborder height=75%>}
    == [document none [["table" ["width" "100%"] none]]]

I once used this method to extract attributes from tags:

    ex_att: func [tag attr] [
        trim to-string select difference parse tag "<> =" [""] attr
    ]

but it doesn't get the noborder attribute right...

Any suggestions (or code), anyone?

~H

Joel Neely wrote (Sunday 07.10.2001, 23:38):
>Hi, Hallvard,
>
>Hallvard Ystad wrote:
> >
> > 1) When I use the parse-xml function, here's what I get:
> >
> > >> xml-doc: parse-xml {<test><tag>This is inside "tag"</tag><goodForNothing/> And this is in the outer tag, the "test" tag.</test>}
> > == [document none [["test" none [["tag" none [{This is inside "tag"}]] ["goodForNothing" none none] { And this is in the outer tag,...
> > >> print mold xml-doc
> > [document none [["test" none [["tag" none [{This is inside "tag"}]] ["goodForNothing" none none] { And this is in the outer tag, the "test" tag.}]]]]
> > >>
> >
> > Is there some good documentation for the use of this function somewhere,
> > and, not least, for the kind of block tree it returns?
> >
>
>I haven't seen it documented, but the returned block structure is
>organized as follows:
>
>* content strings are represented as strings, with all ignorable whitespace
>  retained (e.g., any leading/trailing newlines, indentation, etc.)
>
>* an XML element is represented by a three-element block
>
>      [ elementname attributeblock contentblock ]
>
>  where:
>
>  * elementname is a string giving the name of the element itself;
>  * attributeblock is either a block of name/value pairs or NONE,
>    depending on whether attributes were present in the element; and
>  * contentblock is either a block of content items (strings and/or
>    element blocks) or NONE, depending on whether the element had
>    any contents.
>
>* the top level of the structure is a three-element block with the
>  word DOCUMENT (note: not the string "document"!) as its first element,
>  NONE as the second element (presumably no attributes), and the root
>  XML element as the only member in its third block.
>
>For example:
>
>    >> parse-xml {<foo where="here" when="now"/>}
>    == [document none [["foo" ["where" "here" "when" "now"] none]]]
>
>which shows the DOCUMENT word (with no attributes) and a content of
>only one item -- the "foo" element. That element has two attributes
>(with values, of course) and no content. Similarly,
>
>    >> parse-xml {<foo where="here" when="now"></foo>}
>    == [document none [["foo" ["where" "here" "when" "now"] none]]]
>
>having no content is equivalent to being an empty element. However,
>
>    >> parse-xml {
>    {    <foo where="here" when="now">
>    {    </foo>
>    {    }
>    == [document none [["foo" ["where" "here" "when" "now"] ["^/"]]]]
>
>shows that an ignorable whitespace string (e.g., only a newline)
>is retained as the content of the "foo" element.
>
> >
> > 2) There is a build-tag function, which isn't perfect, but it _is_.
> > Has anyone written a good function to go the other way? I.e. to turn
> > a tag into a block or into an object?
> >
>
>How about this?
>
>    >> first third parse-xml {<foo where="here" when="now">}
>    == ["foo" ["where" "here" "when" "now"] none]
>
>IOW, let PARSE-XML do the work, then pluck out the first (and only)
>element in the content of the (hypothetical) document containing
>only that single tag.
>
>Then you get a block structure that is consistent with the above
>description (element name, attributes, and NONE).
>
>HTH!
>
>-jn-
>
>--
>; Joel Neely [EMAIL PROTECTED] 901-263-4460 38017/HKA/9677
>REBOL [] foreach [order string] sort/skip reduce [ true "!"
>false head reverse "rekcah" none "REBOL " prin "Just " "another "
>] 2 [prin string] print ""
>--
>To unsubscribe from this list, please send an email to
>[EMAIL PROTECTED] with "unsubscribe" in the
>subject, without the quotes.

Praetera censeo Carthaginem esse delendam
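
As a possible starting point for the HTML case raised above (bare attributes such as noborder, unquoted values such as height=75%), here is a minimal sketch that uses PARSE with character sets instead of parse-xml. The function name tag-to-block and its return shape, [name [attr value ...]] with NONE as the value of a bare attribute, are assumptions made for illustration; nothing like it ships with REBOL, and it has only been reasoned through against the example tag, not hardened against real-world HTML.

```rebol
tag-to-block: func [
    "Sketch: turn an HTML tag into [name [attr value ...]]; bare attributes get NONE."
    tag [tag! string!]
    /local name attrs attr value ws name-chars unquoted
][
    ws: charset " ^-^/^M"                                 ; space, tab, newline, CR
    name-chars: complement charset { ^-^/^M=<>/"'}        ; legal in a tag or attribute name (assumption)
    unquoted: complement charset { ^-^/^M<>}              ; legal in an unquoted value (assumption)
    attrs: copy []
    parse/all to-string tag [
        any ws opt "<" copy name some name-chars
        any [
            some ws
            | "/"
            | ">"
            | copy attr some name-chars (value: none)
              opt [
                  any ws "=" any ws [
                      {"} copy value to {"} skip          ; double-quoted value
                      | "'" copy value to "'" skip        ; single-quoted value
                      | copy value some unquoted          ; unquoted value, e.g. 75%
                  ]
              ]
              (append attrs reduce [attr value])
        ]
    ]
    reduce [name attrs]
]

probe tag-to-block {<table width="100%" noborder height=75%>}
; == ["table" ["width" "100%" "noborder" none "height" "75%"]]
```

A value can then be looked up with, for example, select second tag-to-block {<table width="100%" noborder height=75%>} "height". Since a bare attribute is stored with NONE as its value, find is the safer way to test whether such an attribute is present at all.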