Dov is of course correct in stating that PDF should be considered a final form document format. Nevertheless, PDF can be used as an input or intermediate format when converting legacy document formats to XML [1][2].
CambridgeDocs and Exegenix are used by many markup shops, not just for one-off conversion of 'old' legacy documents but also for ongoing production. The process is *not* cheap or easy to set up, but it will sometimes be the only viable solution. It may be cost-effective compared to changing the way documents are initially created.

kind regards
Peter Ring

[1] http://www.cambridgedocs.com/products/overview/pdf2xml.htm
[2] http://exegenix.com/technology/ecs_engine.html

Dov Isaacs wrote:
<snip/>
> PDF is a "final form" document format. It does not have
> the context of the graphical objects it represents.
> At best, if you produce a "tagged" PDF, a "converter"
> can make some guesses as to the original document
> structure in terms of sentences, paragraphs, and tables,
> but not much more. The Acrobat save-as-RTF capability
> as well as the third party products out there try to
> make good guesses as the original formatting, but that
> is about the best they can do. Very little context of
> a FrameMaker or InDesign document remains in the
> resultant PDF file, so any attempt to go back to those
> formats is somewhat doomed. If we were to supply "converters"
> back to those formats, users expectations would be set
> to a level that we could not deliver to.
>
> Conversions from PDF should be viewed as and only be used
> for emergency retrieval of content that has no other
> means of being retrieved. We provide an RTF converter
> simply because just about every text consuming program out
> there can open or import content in RTF and that does satisfy
> most of our customer's needs in terms of such emergency
> retrieval.
>
> - Dov