RE: setting xml encoding

Dave Watts Fri, 15 Jun 2007 10:32:03 -0700

> A haiku is not XML.  You did not give me an example of an XML 
> string with white space between the tags in which an 
> application would need it to handle the data.  (Note, I am 
> not talking about white space between a corresponding start 
> and end tag as that would actually make sense since it is 
> part of the data being transferred.  I speak of white space 
> after one ending tag and before another starting tag)  This 
> discussion is not about haikus, limericks, or iambic 
> pentameter.  It is about xml as a means to define and 
> transfer data and the fact that I have seen no actual, 
> real-life, honest-to-goodness reason whatsoever that would 
> require an application to be aware of white space between the 
> nodes to be able to correctly receive, and process the data 
> contained within the XML.  Let's say I were to store a haiku 
> in an XML string.  Regardless of whether each line was stored 
> separately, or if it was stored as one large string, the text 
> would reside in the value of a tag's attribute or the inner 
> text of one tag.  In both of those scenarios white space IS 
> significant since it is within a tag.  That is not the scope 
> of this discussion however.  The white space outside of tags 
> is what we are talking about.


The only whitespace "outside of tags" is the whitespace around the root
element. Everything else is inside some element. Given this XML:

<someelement>
        This is some text.
        <somechildelement>...</somechildelement>
        This is some more text.
</someelement>

You have two text nodes within someelement, one before the child element,
one after.
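
To make that concrete, here is a minimal sketch that assumes Python's standard xml.dom.minidom as the parser (the choice of parser is mine, not something from this thread). It lists the text nodes that are direct children of someelement:

```python
from xml.dom import minidom

XML = """<someelement>
        This is some text.
        <somechildelement>...</somechildelement>
        This is some more text.
</someelement>"""


def text_children(xml_string):
    """Return the data of every text node directly under the root element."""
    root = minidom.parseString(xml_string).documentElement
    return [n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE]


texts = text_children(XML)
for t in texts:
    # repr() makes the leading and trailing whitespace visible
    print(repr(t))
```

Both strings come back with their surrounding newlines and indentation intact, because the parser has no basis for throwing them away.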

Given this XML:

<someelement>
        This      is
                some
                        text.
        <somechildelement>...</somechildelement>
        This
        is
        some more
        text.
</someelement>

the values of the text nodes are obviously different from the first example,
and the XML parser cannot, by itself, determine whether whitespace is
relevant. That is up to the application which receives the data from the XML
parser. Imagine if, for example, HTML didn't treat whitespace the way it
does. (The fact that it does treat whitespace as irrelevant is an arbitrary
decision, not the way it had to be.) Assuming you have an XML representation
of HTML, your XML parser would have to preserve the whitespace for
presentation by the browser.
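
As a rough illustration of that division of labor, the sketch below (again assuming Python's xml.dom.minidom; the collapse helper is a hypothetical application-side function, not part of any parser) shows the parser handing over the text exactly as written, and the application then deciding to treat whitespace the way HTML does:

```python
from xml.dom import minidom

XML = """<someelement>
        This      is
                some
                        text.
</someelement>"""

# What the parser delivers: the text node's data, whitespace included.
raw = minidom.parseString(XML).documentElement.firstChild.data


def collapse(text):
    """Application-level choice: collapse whitespace runs, HTML-style."""
    return " ".join(text.split())


print(repr(raw))            # whitespace preserved by the parser
print(repr(collapse(raw)))  # whitespace discarded by the application
```

An application that cares about layout would simply use raw and skip the collapse step. Either way, the decision is made after parsing, not during it.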

As for whether a haiku (or anything else) is XML: obviously, no, it isn't.
But the point of XML is to let me describe any sort of data in a way
that can be understood by any XML parser. That data may contain haikus, or a
representation of how William Shatner talks - lots of whitespace would be
needed there - or anything else you can think of. Just because whitespace is
unimportant to you doesn't mean that it's unimportant to everyone else.

> The two xml strings below are NOT equal in my opinion:
> 
> <tag1>
>       <tag2 />
>       <tag3 />
> </tag1>
> 
> <tag1><extra_tag_you_dont_want><tag2 
> /><extra_tag_you_dont_want><tag3 /><extra_tag_you_dont_want></tag1>
> 
> That is exactly what Mozilla does. It takes the 2 children of 
> tag1 and adds three siblings.  That may seem right from some 
> technical, scientific, sterile point of view, but seems 
> downright wrong to me.

You are confusing elements with nodes. They are not the same thing. There is
no corresponding element for a text node. As for whether something seems
right or wrong to you, I can only point to the specification.
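
The difference between nodes and elements can be seen in a short sketch, assuming Python's xml.dom.minidom (my choice for illustration; Mozilla's DOM exposes the same node types). It parses the first of the two XML strings quoted above and counts the children of tag1 by node type:

```python
from xml.dom import minidom

XML = """<tag1>
      <tag2 />
      <tag3 />
</tag1>"""

root = minidom.parseString(XML).documentElement

all_nodes = list(root.childNodes)
elements = [n for n in all_nodes if n.nodeType == n.ELEMENT_NODE]
text_nodes = [n for n in all_nodes if n.nodeType == n.TEXT_NODE]

print(len(all_nodes))                  # every child node of tag1
print([e.tagName for e in elements])   # only the element children
print([repr(t.data) for t in text_nodes])  # the whitespace-only text nodes
```

tag1 has five child nodes, but only two of them are elements. The other three are text nodes holding whitespace, and none of them has a tag name.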

> > How does an XML parser know whether whitespace is 
> > significant or not? 
> 
> Because its an XML PARSER!

You are confusing the jobs of the XML parser, which is supposed to parse
XML, and the application that invokes the parser, which is supposed to
understand the significance of the parsed values. For example, I can
represent a recordset using XML, but the XML parser doesn't know anything
about recordsets, or databases. That's up to the application using the
parser.
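
A minimal sketch of the recordset case, assuming Python's xml.etree.ElementTree and a made-up layout (the recordset, row, id and name element names are hypothetical, chosen only for illustration):

```python
import xml.etree.ElementTree as ET

XML = """<recordset>
  <row><id>1</id><name>Ann</name></row>
  <row><id>2</id><name>Bob</name></row>
</recordset>"""


def to_rows(xml_string):
    """Application code: interpret the parsed tree as rows of a recordset."""
    root = ET.fromstring(xml_string)
    rows = []
    for row in root.findall("row"):
        rows.append({
            # The parser only supplies strings; knowing that id is an
            # integer is the application's knowledge, not the parser's.
            "id": int(row.findtext("id")),
            "name": row.findtext("name"),
        })
    return rows


rows = to_rows(XML)
print(rows)
```

The parser's part ends at ET.fromstring. Everything in to_rows is the application deciding what the parsed values mean, including the decision to ignore the whitespace between the row elements.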

> Why would a parser create elements out of white space?

Again, you're confusing elements with nodes.
  
> Strictly speaking from a practical point of view, when in the 
> history of XML has someone received a string of text which is 
> known to be xml, and needed data from outside of the tags?  I 
> don't think that has ever happened.

Again, the data is not "outside of the tags".

> I guess it should be noted that I am not necessarily arguing 
> about how XML parsing is currently defined as much as I am 
> how I think it SHOULD be defined if you were to logically 
> think through how it is used.  I think the "correct" 
> interpretation that Mozilla conforms to is wrong because it 
> implicitly creates un-wanted elements out of characters which 
> are NOT supposed to designate the existence of elements.

Again, I can only tell you what the specification says, not how things
should be. I disagree with your interpretation of how things should be,
though.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/

Fig Leaf Software provides the highest caliber vendor-authorized
instruction at our training centers in Washington DC, Atlanta,
Chicago, Baltimore, Northern Virginia, or on-site at your location.
Visit http://training.figleaf.com/ for more information!

Archive: 
http://www.houseoffusion.com/groups/CF-Talk/message.cfm/messageid:281313