[htdig] Correction to patch for Acrobat 4

Gilles Detillieux Wed, 18 Aug 1999 08:22:58 -0700

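Hi again, folks.  I made a silly mistake in the patch I posted last Friday,
August 13, to support Acrobat 4: the code searched for the first space in
acroread instead of in its copy, arg0, so it truncated the wrong string.
Here's the fix for that mistake: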

--- htdig/PDF.cc.bug    Tue Aug 17 11:07:17 1999
+++ htdig/PDF.cc        Wed Aug 18 09:22:28 1999
@@ -109,7 +109,7 @@ PDF::parse(Retriever &retriever, URL &ur
     if (notfound)      // we only need to complain once
        return;
     String arg0 = acroread;
-    char *endarg = strchr(acroread.get(), ' ');
+    char *endarg = strchr(arg0.get(), ' ');
     if (endarg)
        *endarg = '\0';
     // If first arg is a path, check that it exists, and is a regular file. 


It turns out that even without the -pairs option, acroread 4 is still
prone to segmentation violations when generating PostScript, so acroread 3
is a better choice anyway.  However, this fix also addresses a few other
problems in how pdf_parser is handled, and you may find that Acrobat 4
works OK with your files.  Hopefully Adobe will fix these problems soon.

Also, if you applied last Friday's patch after applying the patch file
collection I sent out last Monday, August 9, there's a hunk that would
have failed to apply to htdoc/attrs.html, because of a conflicting
change in the patch file collection.  You can correct that by applying
the patch below (as well as the one above) after Friday's patch.

--- htdig-3.1.2/htdoc/attrs.html.orig   Fri Aug  6 14:00:28 1999
+++ htdig-3.1.2/htdoc/attrs.html        Tue Aug 17 10:55:45 1999
@@ -4283,14 +4283,33 @@
                      <em>infile outfile</em>,<br>
                      where <em>infile</em> is a file to parse and
                      <em>outfile</em> is the PostScript output of the
-                     parser. The program is supposed to convert to a
+                     parser. In the case where acroread is the parser, and
+                     the -pairs option is not given, the second parameter
+                     will be the output directory rather than the output
+                     file name. The program is supposed to convert to a
                      variant of PostScript, which is then parsed
-                     internally. Currently, Adobe's <a
+                     internally. Currently, only Adobe's <a
                      href="http://www.adobe.com/prodindex/acrobat/readstep.html">
-                     acroread</a> program and the pdftops program
-                     that is part of the <a
+                     acroread</a> program has been tested as a pdf_parser.
+                     There is a bug in Acrobat 4's acroread command, which
+                     causes it to fail when -pairs is used, hence the special
+                     case above.<br>
+                      The pdftops program that is part of the <a
                      href="http://www.foolabs.com/xpdf/">xpdf</a>
-                     0.80 package have been tested as pdf_parsers.
+                     package is not suitable as a pdf_parser,
+                     because its variant of PostScript is slightly
+                     different.  However, an alternative is to
+                     use xpdf's pdftotext program as a component
+                     of an <a href="#external_parsers">external
+                     parser</a> with the xpdf 0.90 package installed
+                     on your system, as described in FAQ question <a
+                     href="FAQ.html#q4.9">4.9</a>.<br>
+                      In either case, to successfully index PDF files,
+                     be sure to set the <a
+                     href="#max_doc_size">max_doc_size</a> attribute
+                     to a value larger than the size of your largest
+                     PDF file. PDF documents can not be parsed if they
+                     are truncated.
                        <p>
                          The default value of this attribute is determined at
                          compile time, to include the path to the acroread


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word unsubscribe in
the SUBJECT of the message.