Question about pdfedit functionality:

I am currently working on a project and wonder whether I can use
pdfedit to help it along.

The PDFs in question are scans of books containing text in a mixture
of languages.  A single page might have, interspersed, phrases in
English, Dutch, Japanese, and TRANSLITERATED Japanese in Hepburn
romanization, with circumflexes marking long vowels.  These scans have
been OCR'd, so, in addition to the IMAGE of the page, there is an
underlying layer of PDF text elements, full of errors, of course,
which need to be corrected.

The version of pdfedit I have deployed runs under Cygwin, under
Windoze, and so, naturally, there's no hope of editing the Japanese
strings there.  But it occurs to me that, because of the complicated
way Asian languages with big character sets are implemented in PDF, a
format in which big character sets are a bit of an afterthought, there
might be no way to go about it even if I go to the effort of setting
up a native UNIX system with Japanese support and seeing whether
pdfedit might (by some miracle) handle the Japanese strings there.

I wonder if anybody has any suggestions?  It's a tricky business,
because the OCR process often gets a character quite wrong--and so you
sometimes don't even HAVE the character you need to put into the
string in one of the font subsets embedded in the PDF.  I'm just
trying to think about approaches that don't involve me writing great
big programs to take apart the PDF and put it together again.
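
To make the font-subset problem concrete, here is a minimal sketch in
Python.  It assumes you have already obtained, by some other means,
the set of characters a given embedded font subset can draw; nothing
here reads a PDF or uses pdfedit.  The function name and the sample
subset are hypothetical, for illustration only.

```python
import unicodedata


def missing_characters(corrected_text, subset_chars):
    """List characters in corrected_text that the subset cannot draw.

    subset_chars is an assumed input: the characters covered by one
    embedded font subset, obtained by some other means.
    """
    # Compose "o" + combining circumflex into a single "ô" so that
    # Hepburn long vowels compare equal however they were typed.
    text = unicodedata.normalize("NFC", corrected_text)
    missing = []
    for ch in text:
        if ch.isspace() or ch in subset_chars or ch in missing:
            continue
        missing.append(ch)
    return missing


# Hypothetical subset covering only the characters the OCR pass used.
subset = set("T\u00f4kyo\u6771\u4eac")
print(missing_characters("To\u0302kyo\u0302 \u6771\u4eac\u90fd", subset))
```

A non-empty result means the correction cannot be made by retyping
the string alone; the missing glyphs would have to come from another
font.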

PDFEDIT would be the solution if I were dealing only with routine,
West-European, OCR'd text.  Open up the PDF, find the string that
needs fixing, type in what's needed, save, ship, finished.

_______________________________________________
Pdfedit-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pdfedit-support
