PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________
Preface: I'm not trying to be antagonistic; I have genuine questions. It sounds like you're more familiar with this territory than I am. I've been exposed to XML, but in much the same way some have been exposed to sex offenders in a park: cringe and look away. XSLT was not my cup of tea, though I can see uses for it.

> A number of things...
>
> 1) Not flexible/extensible enough.

What data can you represent in XML that you can't in Cos? Key/value pairs, strings, numbers, binary data: all exist in both formats, and both seem equally extensible, at least to my eye. Once in the XML world, you can rope in other related technologies for added benefit, but that's piling more technologies onto PDF that wouldn't have been necessary had they stuck to Cos.

> 2) Outside the /Catalog structure of the PDF - thus making it indirectly related to the document

I don't see this as a problem. And if it were, they could have linked the DocInfo dict from the root as well as the trailer.

> 3) Not consistent with adding per-object metadata

I strongly disagree. It seems to me that per-object data would be better suited to a Cos construct, where you could make an indirect reference to the object in question. With XML, you'd have to go hunting in some way: a structure lookup, or a form field's name, would have to be tracked down. Anything else is nigh off-limits, because resources aren't indexed at the document level. You could explicitly mention the object's number [and generation], but now you're shoehorning Cos-isms into the XML (XMP, was it?). I'd rather have the Cos. [Not that I'll get it]

> 4) Non-standards based for integration with databases/search engines, etc.

Nearly granted, but not quite. PDF is a standard, though one under corporate control. Quite a few engines out there do search PDF, and several open-source tools are floating about to help. Now those tools have to expand to cover a whole new format.

> (etc)

I still don't see the benefit in inflicting this additional pain on PDF developers. Now we HAVE to get into XML in a variety of cases (such as this one). Admittedly, it can be argued that dragging programmers into the 21st century is a good thing. On the other hand, this is an example of "coupling": tying things together. Strong coupling in the programming world is a bad thing, particularly when it can be avoided (by never having introduced XML metadata in the first place, or by stamping it out now that it's here).

> My hope is that DocInfo is going away...

I'd like to see a single source of information too. Which source? We disagree.

> Correct. If you are reading, you should first read the /Metadata and then if not present there, fall back to the /Info fields.
>
> Do keep in mind that there are multiple namespaces within the XMP that also needs to be kept in sync - DublinCore, pdf, etc.

I fail to see how this is superior to adding data to the CosInfo dictionary. It seems like far more effort than it's worth.

> >And could they please throw a couple more namespaces into that XML? I just
> >don't think *S*E*V*E*N* is enough! I weep for the poor fools who actually
> >have to parse it.
>
> It's actually pretty easy, if you understand XML concepts...

I've worked indirectly with various XML parsers in the past. I was more concerned with human readability, though that thought didn't come out at all in my list post. Reading that XML wasn't easy.

> >Or you could just blow the metadata away.
>
> You could - that is a valid approach to the problem...

...provided there's no additional data in there that Acrobat doesn't know how to generate automatically. (Just to shoot a hole in my own solution.)

> Leonard

My conclusion (which could be wrong; Lord knows I've been wrong before) is that this XML was tacked on as a marketing bullet point or some techy pet project, rather than for sound technical reasons. Nonetheless, it's here, and we have to deal with it.
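For what it's worth, the read order quoted above (try /Metadata first, then fall back to /Info) could be sketched roughly as follows. This is only an illustration under stated assumptions: it presumes some other code has already parsed the XMP packet and the Info dictionary into plain Python dicts, the names `read_field` and `INFO_TO_XMP` are made up for this example, and the key mapping is an illustrative subset rather than a complete list.

```python
from typing import Mapping, Optional

# Illustrative subset of /Info keys and the XMP properties assumed to
# carry the same information. Not a complete mapping.
INFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Producer": "pdf:Producer",
}


def read_field(info_key: str,
               xmp: Optional[Mapping[str, str]],
               info: Optional[Mapping[str, str]]) -> Optional[str]:
    """Return the value for info_key, preferring XMP over /Info.

    xmp  -- already-parsed XMP properties (hypothetical flat dict), or None
    info -- already-parsed /Info dictionary entries, or None
    """
    xmp = xmp or {}
    info = info or {}

    # First choice: the matching XMP property, if present and non-empty.
    xmp_name = INFO_TO_XMP.get(info_key)
    if xmp_name is not None and xmp.get(xmp_name):
        return xmp[xmp_name]

    # Fallback: the old-style /Info entry (None if that is missing too).
    return info.get(info_key)


if __name__ == "__main__":
    xmp_props = {"dc:title": "Title from XMP"}
    info_dict = {"Title": "Title from Info", "Author": "Author from Info"}
    print(read_field("Title", xmp_props, info_dict))
    print(read_field("Author", xmp_props, info_dict))
```

Note that this only covers reading. A writer would have to update both places (and every relevant namespace) to keep them in sync, which is exactly the coupling complained about above.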
Is that XML metadata documented anywhere? I haven't seen it. Its PRESENCE is mentioned in the PDF specification (10.2.2), but not its contents. There's a nice list of things which won't be there, but not much on what will.

Mark

To change your subscription: http://www.pdfzone.com/discussions/lists-pdfdev.html
