Title: RE: PDF indexing not working in Pro Trial for Windows

Joe:

I've successfully set up mnoGoSearch on NT with MySQL.  Note, though, that I'm not running the Windows binary: I compiled the Unix source version with Cygwin (http://sources.redhat.com/cygwin/).

Here are the lines from indexer.conf that I use:

Mime "application/pdf; charset=iso-8859-1"  "text/html"                  "/usr/local/bin/pdf2html.pl $1 application/pdf"

Mime application/pdf  "text/html"                  "/usr/local/bin/pdf2html.pl $1 application/pdf"

pdf2html.pl is a contributed script from htdig (www.htdig.org) that uses pdfinfo and pdftotext to construct a web page and feed it back to the indexer.  This way you get the document's metadata indexed as well as its text.  Obviously, you need Perl for this to work.
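
In case it helps to see the idea, here is a rough sketch of that kind of conversion in Python.  This is not the actual pdf2html.pl (which is Perl and does more); the function names and the choice of Author/Subject/Keywords as meta tags are my own assumptions for illustration.  It only assumes that pdfinfo prints "Key: value" lines and that "pdftotext file.pdf -" writes the text to stdout.

```python
import html
import subprocess
import sys


def parse_pdfinfo(info_text):
    """Turn pdfinfo's 'Key:   value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in info_text.splitlines():
        # Split on the first colon only, so dates containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def build_html(info_text, body_text):
    """Wrap the extracted text in a minimal HTML page with meta tags."""
    fields = parse_pdfinfo(info_text)
    title = html.escape(fields.get("Title", ""))
    # Which fields become meta tags is an assumption made for this sketch.
    metas = "".join(
        '<meta name="{}" content="{}">\n'.format(
            name.lower(), html.escape(fields[name], quote=True)
        )
        for name in ("Author", "Subject", "Keywords")
        if name in fields
    )
    return (
        "<html>\n<head>\n<title>{}</title>\n{}</head>\n"
        "<body>\n<pre>\n{}\n</pre>\n</body>\n</html>\n"
    ).format(title, metas, html.escape(body_text))


def convert(pdf_path):
    """Run pdfinfo and pdftotext on a file and return the HTML page."""
    info = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    ).stdout
    text = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, text=True, check=True
    ).stdout
    return build_html(info, text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The indexer passes the temporary file name as the first argument
    # and reads the converted page from stdout.
    sys.stdout.write(convert(sys.argv[1]))
```

The point is just that the parser writes a normal HTML page to stdout, so the indexer can treat the PDF like any other "text/html" document.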

Also, there might have been a default line in indexer.conf excluding PDFs from indexing.  If so, you'll have to remove it or comment it out.

Works for me, hope it helps.

Greg Holmes

-----Original Message-----
From: Joe Frost [mailto:[EMAIL PROTECTED]]
Subject: PDF indexing not working in Pro Trial for Windows

...........
a client of mine who hosts exclusively on NT wants to set it up as a search
engine for a new project they have. Much of the content will be in PDFs so
this feature is vital.

I have set up my own test system using the current Pro Trial version on
Windows 2000 with MySQL. Indexing of normal html URLs works fine but
indexing of PDFs does not. I'm using pdftotext.exe with the settings
suggested in the help file including using "/" instead of the normal Windows
"\". The PDF is fetched by the indexer and seems to be briefly parsed but
the URL is not included in any subsequent searches for terms that it is
known to include.

Is this a restriction of the trial version or am I doing something wrong?
