Thus spoke Alex Block (at 12:43 PM 7/28/98 -0400) ...
>Given that the htdig archive at eosys is down, can someone advise me as to
>the steps required to provide PDF support within htdig?

First you need to install Adobe Acrobat Reader on your server.  Get the
latest version from:
        http://www.adobe.com

Second, you need to apply the patch that's included in htdig-pdf.tgz,
available at:
        ftp://sol.ccsf.cc.ca.us/htdig-patches/3.0.8b2/

Don't compile it yet because ...

Third, make the following changes, as pointed out by Sylvain Wallez:

<start quote>
The first one is a bug in PDF.cc (doesn't seem to happen on the PDF
files on my Intranet, but we only use Acrobat to produce them). Here's
the diff he sent me :

diff -c htdig/PDF.cc.old htdig/PDF.cc
*** htdig/PDF.cc.old    Wed Jul 15 10:46:03 1998
--- htdig/PDF.cc        Tue Jul 14 10:21:38 1998
***************
*** 280,286 ****
        }

     }
!    else if (line == "BT")
     {
        // Beginning of text block
        if (debug > 3)
--- 280,286 ----
        }

     }
!    else if ( mystrncasecmp( line.get(), "BT", 2 ) == 0 )
     {
        // Beginning of text block
        if (debug > 3)


The second problem is that the default value for the "bad_extensions"
attribute contains .pdf, which causes all PDF files to be ignored by
htdig, even if a parser is available.

To correct this, you can either put a "bad_extensions" list without
".pdf" in your config file (this is what I did), or apply the following
patch to htcommon/defaults.cc:

diff -c htcommon/defaults.cc.old htcommon/defaults.cc
*** htcommon/defaults.cc.old    Fri Aug 15 01:59:25 1997  
--- htcommon/defaults.cc        Mon Jul 13 19:37:33 1998  
***************
*** 37,43 ****
      {"add_anchors_to_excerpt",        "true"},
      {"allow_numbers",                 "false"},
      {"allow_virtual_hosts",           "true"},
!     {"bad_extensions",                ".wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .pdf .class .map .ram"},
      {"bad_word_list",                 "${common_dir}/bad_words"},
      {"create_image_list",             "false"},
      {"create_url_list",               "false"},
--- 37,43 ----
      {"add_anchors_to_excerpt",        "true"},
      {"allow_numbers",                 "false"},
      {"allow_virtual_hosts",           "true"},
!     {"bad_extensions",                ".wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram"},
      {"bad_word_list",                 "${common_dir}/bad_words"},
      {"create_image_list",             "false"},
      {"create_url_list",               "false"},

Thanks to M.J. Long for bug hunting.

<end quote>
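
If you go the config-file route instead of patching defaults.cc, the
entry might look roughly like the line below. This is only a sketch: the
extension list is copied from the diff above with ".pdf" dropped, and
which config file it belongs in depends on your own installation.

```
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram
```

An entry in the config file overrides the compiled-in default, so
defaults.cc can stay untouched if you do this.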

Now you can run configure, make clean, make and make install.  Voila, PDF
parsing!
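
For anyone curious why the first change in PDF.cc matters, here is a
small stand-alone sketch of the two comparisons. It is not htdig code:
the function names are made up for illustration, and I am assuming that
htdig's mystrncasecmp behaves like the POSIX strncasecmp used here. The
idea is that an exact match on "BT" presumably fails when the line
carries anything extra (a trailing carriage return, say) or a different
letter case, while a two-character, case-insensitive prefix match still
succeeds.

```cpp
#include <string>
#include <strings.h>  // POSIX strncasecmp (assumed stand-in for htdig's mystrncasecmp)

// Exact match, in the spirit of the original code:
// only the bare token "BT" qualifies.
bool is_bt_exact(const std::string& line) {
    return line == "BT";
}

// Prefix match, in the spirit of the patched code: the first two
// characters must be "BT" in any letter case; the rest is ignored.
bool is_bt_prefix(const std::string& line) {
    return strncasecmp(line.c_str(), "BT", 2) == 0;
}
```

With these, is_bt_exact("BT\r") is false but is_bt_prefix("BT\r") is
true. Note that the prefix test is deliberately looser: it accepts any
line that begins with those two letters.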

.........................................................................
Colin Viebrock           Creative Director - Private World Communications
[EMAIL PROTECTED]                          http://www.privateworld.com
ICQ: 11386088

                                                 If puns were deli meat,
                                                this would be the wurst.
