On Oct 20, 2003, at 10:09 PM, Gregor N. Purdy wrote:

Here's an example of a wrapper:

  <pmi class="foo" ...>
    <!-- Data of some sort determined by class foo -->
  </pmi>

That's a bit better, although bear in mind that if the intent is that
you could throw the entire chunk at an XML parser and have it not
complain, you are going to have to take some care in generating the
guts. Binary data is in general right out (where does it end? What if
it contains fragments that look like XML markup?). Sure, you could
slap it in a <![CDATA[ ... ]]> film, but you'd still have to watch
out for the possibility that the body might want the sequence "]]>"
in it somewhere...
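
To illustrate the "]]>" problem, here is a minimal sketch of the usual workaround for text payloads: end the CDATA section in the middle of the offending sequence and immediately open a new one. The helper name cdata_wrap is hypothetical, and Python's standard xml.etree.ElementTree is used only to check the round trip. This does nothing for arbitrary binary data.

```python
import xml.etree.ElementTree as ET


def cdata_wrap(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    # "]]>" becomes "]]" + end of section + start of section + ">"
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# A body that happens to contain the CDATA terminator.
body = "if (a[b[0]]> c) { ... }"
doc = "<pmi>" + cdata_wrap(body) + "</pmi>"

# The parser joins the adjacent CDATA sections back into one string.
roundtrip = ET.fromstring(doc).text
```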

A slight aside, but just to build on what you said, since this is an often-misunderstood facet of XML: there are several other reasons you can't just drop binary data into a CDATA section of an arbitrary XML document:


1) If you declare the encoding of the XML to be UTF-8, then your document won't be well-formed XML if your binary data doesn't look like legitimate UTF-8 data (which it won't in the general case--many bytes and byte sequences can't occur in UTF-8).

2) XML parsers are free to transcode--so a document declared in one encoding may pass data on to its client application in another encoding, which will mangle your binary data. For instance, the expat parser can consume documents in a variety of encodings, but always passes text to its callbacks in UTF-8.

3) XML parsers are required to normalize line endings, so any CR or CRLF turns into an LF, even inside a CDATA section.
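
The three points above can be seen in a small sketch, assuming Python's standard xml.etree.ElementTree (which sits on top of expat). The sample documents and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Point 1: 0xFF and 0xFE never occur in UTF-8, so the document is rejected.
bad = b'<?xml version="1.0" encoding="UTF-8"?><pmi><![CDATA[\xff\xfe]]></pmi>'
try:
    ET.fromstring(bad)
    rejected = False
except ET.ParseError:
    rejected = True

# Point 2: the document stores the single Latin-1 byte 0xE9, but the
# application receives decoded text, not the original byte.
latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><pmi>\xe9</pmi>'
received = ET.fromstring(latin1).text

# Point 3: CRLF and a lone CR both come back as LF, even inside CDATA.
crlf = b'<pmi><![CDATA[a\r\nb\rc]]></pmi>'
normalized = ET.fromstring(crlf).text
```

In each case the bytes the application sees are not the bytes that were written, which is exactly what binary data cannot tolerate.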

That said, you can get around all of this by base64-encoding your binary data for storage in XML, since that turns binary data into text. On the other hand, it wastes space (the encoded form is about a third larger) and takes more CPU cycles to consume than a more obvious binary format.
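
A minimal sketch of the base64 approach, assuming Python's standard base64 and xml.etree.ElementTree modules. The pmi element and class="foo" attribute come from the wrapper example earlier in the thread; the payload is invented to include every byte value plus the sequences that cause trouble in CDATA.

```python
import base64
import xml.etree.ElementTree as ET

# Every possible byte, plus a CDATA terminator, a CRLF, and fake markup.
payload = bytes(range(256)) + b"]]>\r\n<pmi>"

# Encode to plain ASCII text and store it as the element body.
encoded = base64.b64encode(payload).decode("ascii")
elem = ET.Element("pmi", {"class": "foo"})
elem.text = encoded
doc = ET.tostring(elem)

# Parse the document again and recover the original bytes.
decoded = base64.b64decode(ET.fromstring(doc).text)
```

The cost is visible in the lengths: every 3 input bytes become 4 output characters.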

And that said, I read Dan's email as just meaning an XML header, but I didn't quite understand exactly what he had in mind either.

JEff


