Question about pdfedit functionality: I am currently working on a project and wonder whether I can use pdfedit to help it along.
The PDFs in question are scans of books containing text in a mixture of languages. For example, a single page may have, interspersed, phrases in English, Dutch, Japanese, and TRANSLITERATED Japanese in Hepburn romanization, with circumflexes marking long vowels. These scans have been OCR'd, so, in addition to the IMAGE of the page, there is an underlying layer of PDF text elements, full of errors, of course, which need to be corrected.

The version of pdfedit I have deployed runs under Cygwin, under Windows, and so, naturally, there's no hope of editing the Japanese strings there. But it occurs to me that, because of the complicated way Asian languages with big character sets are implemented in PDF, a language in which big character sets are a bit of an afterthought, there might be no way to go about it at all, even if I go to the effort of setting up a native UNIX system with Japanese support and seeing whether pdfedit might (by some miracle) handle the Japanese strings there.

I wonder if anybody has any suggestions? It's a tricky business, because the OCR process often gets a character quite wrong, and so sometimes the character you need to put into the string isn't even PRESENT in any of the font subsets embedded in the PDF. I'm just trying to think of approaches that don't involve me writing great big programs to take apart the PDF and put it together again.

pdfedit would be the solution if I were dealing only with routine, West-European, OCR'd text: open up the PDF, find the string that needs fixing, type in what's needed, save, ship, finished.
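To make the font-subset problem concrete, here is a minimal Python sketch of the check I have in mind. It is not anything pdfedit provides: `missing_characters` is a name I made up, and the set of characters covered by an embedded subset is simply assumed to be known already. Getting that set out of the PDF's font data is the hard part, and it is left out here.

```python
import unicodedata

def missing_characters(corrected, subset_chars):
    """Return the characters of `corrected` that are not in `subset_chars`,
    in order of first appearance. Whitespace is ignored.

    `subset_chars` is a hypothetical stand-in for the characters an
    embedded font subset can actually render.
    """
    missing = []
    # Normalize so that "o" + combining circumflex compares equal to "ô".
    for ch in unicodedata.normalize("NFC", corrected):
        if ch.isspace():
            continue
        if ch not in subset_chars and ch not in missing:
            missing.append(ch)
    return missing

# Made-up subset: only the characters the OCR happened to emit on one page.
subset = set("Tky\u00f4\u6771\u4eac")  # T, k, y, ô, 東, 京

print(missing_characters("T\u00f4ky\u00f4 \u6771\u4eac", subset))  # Tôkyô 東京
print(missing_characters("\u4eac\u90fd", subset))                  # 京都
```

If the second call reports a character, the corrected string cannot be typed into the existing text element without also adding glyphs to the font, which is exactly the situation that worries me.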
_______________________________________________
Pdfedit-support mailing list
https://lists.sourceforge.net/lists/listinfo/pdfedit-support
