The PDF list is a service provided by PDFzone.com | http://www.pdfzone.com
__________________________________________________________________



If Acrobat Paper Capture uses FineReader 6, why is it that given the SAME
document Abbyy will get 25 hits on a given word search and Paper Capture
will only get 18 hits?

We've looked at thousands of documents and, with statistical consistency,
Acrobat's OCR underperforms both ScanSoft OmniPage Pro 12 and Abbyy
FineReader by very wide margins. Acrobat's Paper Capture results are not at
all up to par with either ScanSoft or FineReader, which is hard to
reconcile with FineReader actually being bundled into Paper Capture.

PdfCompressor 2.1 uses ScanSoft's OCR engine, so let's conduct a simple test
using these two systems. Here's a comparison of some keyword hits for the
same document (the file is posted on Adobe's site):
http://www.adobe.com/products/acrcapture/agentpack/pdfs/pdfimage/AnnualReport.pdf
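
For anyone who wants to repeat this kind of test, here is a minimal Python
sketch of the counting step. It assumes the hidden text layer of each
searchable PDF has already been extracted to a plain string, and that a "hit"
means a case-insensitive whole-word match. Both are assumptions; the exact
matching rules of the search used for the figures in this message are not
stated. The names count_hits and compare_hits are hypothetical.

```python
import re


def count_hits(text: str, keyword: str) -> int:
    """Count case-insensitive whole-word occurrences of keyword in text.

    Assumption: a "hit" is a whole-word match; a real PDF viewer's
    search may count differently.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def compare_hits(text_a: str, text_b: str, keywords: list[str]) -> dict[str, tuple[int, int]]:
    """Return {keyword: (hits in text_a, hits in text_b)} for two OCR outputs."""
    return {kw: (count_hits(text_a, kw), count_hits(text_b, kw)) for kw in keywords}


if __name__ == "__main__":
    # Toy strings standing in for two OCR text layers of the same page.
    ocr_a = "The Commission met. The cornmission adjourned."
    ocr_b = "The Commission met. The commission adjourned."
    print(compare_hits(ocr_a, ocr_b, ["commission"]))
```

A misread character (here "rn" in place of "m") is enough to lose a hit,
which is why hit counts on the same document are a rough proxy for OCR
accuracy.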

Running Acrobat 6's Paper Capture vs. PdfCompressor 2.1, we have the
following hit results:


                     # of keyword hits

keyword         Acrobat 6        CVISION
                Paper Capture    PdfCompressor 2.1

commission           50              169
section               4               19
recall               31               58
requirements         18               31
corrective           13               23
regulations          11               18
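
To put the table in relative terms, the short Python sketch below computes
the per-keyword ratio and the totals from the six rows above. The numbers are
copied from the table; the variable names are just for illustration, and six
keywords on one document are not a full accuracy benchmark.

```python
# Keyword hits from the table: (Acrobat 6 Paper Capture, PdfCompressor 2.1)
hits = {
    "commission": (50, 169),
    "section": (4, 19),
    "recall": (31, 58),
    "requirements": (18, 31),
    "corrective": (13, 23),
    "regulations": (11, 18),
}

# Per-keyword ratio of PdfCompressor hits to Acrobat hits.
for keyword, (acrobat, pdfcompressor) in hits.items():
    ratio = pdfcompressor / acrobat
    print(f"{keyword:<14}{acrobat:>6}{pdfcompressor:>8}{ratio:>8.2f}x")

# Totals across all six keywords.
total_acrobat = sum(a for a, _ in hits.values())        # 127
total_pdfcompressor = sum(p for _, p in hits.values())  # 318
print(f"total: {total_acrobat} vs {total_pdfcompressor} "
      f"({total_pdfcompressor / total_acrobat:.2f}x)")
```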


Acrobat 6's OCR is not only less accurate, it is also much slower. For this
68-page document, converting to searchable PDF with Acrobat 6 takes
3 mins, 40 secs (220 secs) on a 3 GHz Intel P4 machine; converting to
searchable (JBIG2-compressed) PDF with PdfCompressor 2.1 takes only
1 min, 28 secs (88 secs). So Acrobat's Paper Capture takes roughly 2.5x as
long as PdfCompressor 2.1.
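
The arithmetic behind that ratio, plus the per-page cost, can be checked with
a few lines of Python. The inputs are the timings and page count quoted
above; the per-page figures are derived here and assume time scales evenly
across pages, which a single run cannot confirm.

```python
PAGES = 68

# Timings quoted above, converted to seconds.
acrobat_secs = 3 * 60 + 40        # 3 mins, 40 secs
pdfcompressor_secs = 1 * 60 + 28  # 1 min, 28 secs

# How many times longer Acrobat 6 takes on this document.
slowdown = acrobat_secs / pdfcompressor_secs

# Average seconds per page (assumes an even per-page cost).
acrobat_per_page = acrobat_secs / PAGES
pdfcompressor_per_page = pdfcompressor_secs / PAGES

print(f"Acrobat 6:         {acrobat_secs} s total, {acrobat_per_page:.2f} s/page")
print(f"PdfCompressor 2.1: {pdfcompressor_secs} s total, {pdfcompressor_per_page:.2f} s/page")
print(f"Ratio: {slowdown:.1f}x")
```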

In addition, the hidden text layer generated by Acrobat Paper Capture is
about 5x-7x larger than the hidden text layer generated by CVISION's
PdfCompressor 2.1.

Ari




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On
Behalf Of Leonard Rosenthol
Sent: Friday, September 26, 2003 7:54 AM
To: [EMAIL PROTECTED]
Subject: RE: [PDF] Searchable pdf




At 12:21 AM -0400 9/26/03, Ari Gross wrote:
>Acrobat 6 Paper Capture is not reliable, nor accurate, and runs very
>slow. Its accuracy is in no way comparable to either ScanSoft's
>OmniPage Pro 12 or Abbyy FineReader 6.0.

        First, I only said it was better than 5.0 ;).

        However, under certain circumstances, it is EXACTLY the same
as FineReader 6 since that's the engine being used!   Paper Capture
uses multiple engines based on internally determined criteria
(language, quality, platform, etc.) - one of those engines is the
Abbyy FineReader one.


>I've seen it fail to process in Paper Capture mode some very
>standard TIFF files. It also runs very slow.
>

        It runs slowly on color, works much better on B&W.


Leonard
--
---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:[EMAIL PROTECTED]>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-629-3700 (voice)
                                              215-629-0789 (fax)

To change your subscription:
http://www.pdfzone.com/discussions/lists-pdf.html


