Thank you for your advice, David. I'll certainly try this as well!

Van

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of David Sewell
Sent: Tuesday, July 28, 2009 5:37 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] PDF conversion trial

It's worth comparing ML's PDF-to-XML (and XHTML) conversion against the export 
facility in Adobe Acrobat 9, if you have it. I've recently been evaluating the 
two. Neither is perfect, and they differ in exactly where their strengths and 
weaknesses are. It is very difficult to get letter-perfect XML/XHTML conversion 
from PDF, if the source is complex, because the underlying PDF data has all 
sorts of font changes, typographic features, and other things that cause 
"interference" in the output.

For example, in converting the PDF from a typeset book containing wide angle 
brackets (U+2329 / U+232A or similar), the Acrobat export consistently captured 
them with styled <span>s, while the MarkLogic export sometimes captured them 
and sometimes dropped them or substituted '( )'. On the other hand, MarkLogic 
normalized the "fi" ligature correctly as "fi", but Acrobat inserted an extra 
space, "fi ", for no good reason.

MarkLogic's PDF conversion pipelines give you more options over how the output 
will be structured than Acrobat does.
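
To illustrate the ligature point above, here is a minimal Python sketch of a 
post-processing step that could be run over text extracted by either tool. 
It is not part of MarkLogic or Acrobat; the function names are hypothetical, 
and it assumes the extracted text still contains the Unicode presentation-form 
ligatures (U+FB00 to U+FB06). Only the ligatures are normalized, because 
normalizing the whole string would also rewrite U+2329 / U+232A to their 
canonical equivalents U+3008 / U+3009.

```python
import re
import unicodedata

# Latin presentation-form ligatures (ff, fi, fl, ffi, ffl, long-s-t, st).
LIGATURES = re.compile("[\uFB00-\uFB06]")

# Angle-bracket code points a typeset book might use (assumed list, extend as needed).
ANGLE_BRACKETS = "\u2329\u232A\u27E8\u27E9\u3008\u3009"


def expand_ligatures(text: str) -> str:
    """Replace each ligature with its NFKC form, leaving all other characters alone."""
    return LIGATURES.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text
    )


def count_angle_brackets(text: str) -> int:
    """Count angle brackets so two conversions of the same page can be compared."""
    return sum(text.count(ch) for ch in ANGLE_BRACKETS)


if __name__ == "__main__":
    sample = "\u2329de\uFB01nition\u232A"
    print(expand_ligatures(sample))
    print(count_angle_brackets(sample))
```

Comparing count_angle_brackets() on the two outputs for the same page is one 
quick way to spot where a converter has dropped or substituted the brackets.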

DS

On Tue, 28 Jul 2009, Baranov, Ivan - Moscow wrote:

> Hi All
>
> I've recently tried to convert PDF to XML using the built-in function
> xdmp:pdf-convert() and discovered that my company's license does not 
> allow this. I already have my own converter, so I just wanted to see 
> whether ML does it better or faster. Is there any way to acquire this 
> functionality on a trial basis?
>
> Thanks,
> Van
>

--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: [email protected]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
