On Oct 21, 2003, at 6:12 AM, Dan Sugalski wrote:

On Tue, 21 Oct 2003, Elizabeth Mattijsen wrote:

At 08:21 -0400 10/21/03, Dan Sugalski wrote:
I find the notion of an "XML header" a bit confusing, given Dan's
 statement to the effect that it was a throw to XML folks.

 I think anything "XML folks" will be interested in will entail
 *wrapping* stuff, not *prefixing* it.

Nah, I expect what they'll want is for the entire data stream of
serialized objects to be in XML format. Which is fine--they can have that.
(It's why I mentioned the serialization routines can be overridden)


For an XML stream the header might be <xml parrot format='xml'
version=1.0> with the rest of the stream in XML. A YAML stream would start
<xml parrot format='yaml' version=1.0> with the rest in YAML, and the
binary format as <xml parrot format='binary' version=1.0>. Or something
like that, modulo actual correct XML.

If you want that to look like valid XML, it would have to be different:


error: Specification mandate value for attribute parrot
<xml parrot/>
           ^
Better in my opinion would be something like:

<parrot format="xml" version="1.0"/>data yadda yadda yadda

I'm not an XML guy, and I'm making all this up as I go along. If that's better, fine with me. :)

Yeah, you can't put extra things in the "<?xml..." at the start of a document, and you can't create a tag of your own whose name starts with "XML" or "xml".
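To make the difference between the two candidate headers concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The choice of tool and the helper name is_well_formed are mine, not anything proposed in the thread.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as one well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The header as first proposed: a bare attribute with no value,
# an unquoted attribute value, and no closing tag.
print(is_well_formed("<xml parrot format='xml' version=1.0>"))

# The self-closing alternative suggested above.
print(is_well_formed('<parrot format="xml" version="1.0"/>'))
```

The first form should be rejected on attribute syntax alone. I would not count on every parser to also enforce the reserved "xml" name prefix, so this check says nothing about that rule.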


So are we talking about a header or a wrapper?  If it is really a
header, it's not XML and then it's pretty useless from an XML point
of view.

We're talking about the first thing in a file (or stream, or whatever). I
was under the impression that XML files should be entirely composed of
valid XML, hence the need for the stream type marker being valid XML.

No, XML _documents_ must be XML, but that doesn't mean that document == file. (For another example where this comes up, consider an XML document transmitted over HTTP. There are headers and other textual things in the stream along with the xml, and it's the HTTP protocol which determines where the document begins and ends, not xml's.) You can certainly have more than one XML document in a single file, but something needs to decide where an xml document begins and ends, and hand only that data to the xml parser.
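As a rough illustration of "something needs to decide where the document ends", here is a sketch in Python. It assumes, purely for the sake of example, that the header is a single XML element on the first line of the stream, terminated by a newline. That framing rule and the function name split_stream are hypothetical, not part of any Parrot design.

```python
import xml.etree.ElementTree as ET

def split_stream(data: bytes):
    """Split a stream into (format, version, payload).

    Hypothetical framing rule: the header is one XML element on the
    first line, and everything after the first newline is payload.
    """
    header, sep, payload = data.partition(b"\n")
    if not sep:
        raise ValueError("no newline-terminated header found")
    # Only the header bytes are ever handed to the XML parser.
    elem = ET.fromstring(header.decode("ascii"))
    if elem.tag != "parrot":
        raise ValueError("unexpected header element: " + elem.tag)
    return elem.get("format"), elem.get("version"), payload

fmt, version, payload = split_stream(
    b'<parrot format="binary" version="1.0"/>\n\x00\x01\x02'
)
print(fmt, version, payload)
```

The point is that the newline rule, not the XML parser, decides where the document stops, much as HTTP does for a response body.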


YAML doesn't care as much, so far as I understand, and for our own internal
binary format we can do whatever we want. If that's not true, then we can
go for a more compact header.

Yes, if you want the whole serialized stream to count as a well-formed XML document, then you can't put arbitrary binary data in the middle. See my previous post for why.


Once again, modulo my limited and inevitably incorrect YAML knowledge. So
if the header says it's XML the whole thing is valid XML, while if it
doesn't the rest of the stream doesn't have to be. (Just enough of the
header so that an XML processing program can examine the stream and decide
that the valid XML chunk at the beginning says that the rest of the
stream's not XML)

Most XML parsers aren't expecting to handle this. That is, there's no such thing as a valid half-of-an-xml document, from the perspective of the xml spec, and in many cases you'd have trouble getting a parser to stop before hitting something problematic and blowing up. In other words, you can't rely on an xml parser to process something which starts out looking like xml, but isn't.
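Here is a small sketch of that failure mode, again with Python's standard xml.etree.ElementTree (my choice of parser; others may report the error differently). The sample stream and the helper parse_whole are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A well-formed header element followed by YAML-looking text.
stream = '<parrot format="yaml" version="1.0"/>\n--- \nname: example\n'

def parse_whole(text):
    """Hand the entire text to the XML parser and report what happens."""
    try:
        ET.fromstring(text)
        return "parsed"
    except ET.ParseError:
        return "rejected"

print(parse_whole(stream))                                   # header plus non-XML tail
print(parse_whole('<parrot format="yaml" version="1.0"/>'))  # header on its own
```

With this parser the header alone goes through, but the same header followed by non-XML text is rejected as a whole, so the caller never gets to read the format attribute.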


Basically we want some nice, fixed (mostly) thing at the head of the
stream that doesn't vary regardless of the way the stream is encoded, and
XML seemed to be the most restrictive of the forms I know people will
clamor for. (I know, it means the stream can't be valid Lisp-style sexprs,
but XML's more widespread :)

Yeah, if you're just needing to tag the stream with a label to indicate the type plus a version number, then xml's on the one hand overkill and on the other hand not necessarily a big help to xml proponents.
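For comparison, a type label plus a version number needs very little machinery. The following is only a sketch of one possible compact header. The "PARROT" marker, the space-separated layout, and both function names are invented here and are not anything Parrot defines.

```python
MAGIC = b"PARROT"  # hypothetical marker, not an actual Parrot constant

def write_header(fmt: str, version: str) -> bytes:
    """Build a one-line header such as b'PARROT yaml 1.0\\n'."""
    return MAGIC + b" " + fmt.encode("ascii") + b" " + version.encode("ascii") + b"\n"

def read_header(data: bytes):
    """Return (format, version, payload) or raise ValueError."""
    line, sep, payload = data.partition(b"\n")
    parts = line.split(b" ")
    if not sep or len(parts) != 3 or parts[0] != MAGIC:
        raise ValueError("not a recognised header")
    return parts[1].decode("ascii"), parts[2].decode("ascii"), payload

stream = write_header("binary", "1.0") + b"\x00\x01\x02"
print(read_header(stream))
```

The header stays the same shape whatever encoding follows it, and no XML parser is involved unless the label says the payload is XML.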


JEff


