Todd,
I would definitely take Michael's advice to learn more about the
overall issue before you get too far.
A quick answer that may help: Windows does not ship with a built-in
iFilter for PDF. Installing Adobe Reader 8 or higher will install a
decent PDF iFilter.
I am a little surprised by your question, though. I assume that you
have access to your own source code, so you could examine the text the
iFilter returns before it is fed to the IndexWriter and compare the
behavior in the TXT case with the behavior in the PDF case.
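To make that comparison concrete, here is a rough sketch of the kind of
check I mean. It is written in Python only to keep it short, not in the
C# your application uses. The names inspect_extraction, extract_txt and
extract_pdf are made up for illustration; the two extract functions are
stand-ins for whatever calls your code makes to the iFilter, and the
values they return here are invented.

```python
from typing import Callable, Dict


def inspect_extraction(name: str, extract: Callable[[], str], term: str) -> Dict[str, object]:
    """Run one extractor and summarize the text that would reach the indexer."""
    try:
        text = extract() or ""
        error = None
    except Exception as exc:  # a failing filter may surface as an exception
        text, error = "", repr(exc)
    return {
        "source": name,
        "chars": len(text),                           # 0 means nothing to index
        "contains_term": term.lower() in text.lower(),
        "error": error,
        "preview": text[:80],
    }


# Hypothetical stand-ins for the application's real extraction calls.
def extract_txt() -> str:
    return "Invoice for scanning services"


def extract_pdf() -> str:
    return ""  # assumption: what a missing or misregistered PDF filter might yield


if __name__ == "__main__":
    for name, fn in (("sample.txt", extract_txt), ("sample.pdf", extract_pdf)):
        print(inspect_extraction(name, fn, "invoice"))
```

If the PDF row shows zero characters or an error while the TXT row looks
fine, the problem is in extraction rather than in Lucene.net itself.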
Cheers,
Ben
On Jan 6, 2010, at 10:13, Michael Garski <mgar...@myspace-inc.com> wrote:
Todd,
You'll need some way to extract the text from the PDF prior to
indexing. I'm not familiar with any packages that can do that but I
have heard of them. You may want to try searching the mailing list
to see if there has been mention of one previously. Lucid
Imagination hosts a great mailing list search tool at http://www.lucidimagination.com/search/
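As a rough illustration of why the extraction step matters: the text in
a PDF is usually stored in compressed content streams (commonly
Flate/zlib), so a word you can read on the page generally does not
appear as plain bytes in the file. The Python sketch below only mimics
that with zlib. It is not a PDF parser, and the names page_text,
raw_stream, naive_contains and extract_text are invented for the
example.

```python
import zlib

# Assumption: a made-up page of text standing in for a PDF content stream.
page_text = "invoice 42: scanning services for Speedy Solutions"
raw_stream = zlib.compress(page_text.encode("utf-8"))


def naive_contains(data: bytes, term: str) -> bool:
    """Look for the term directly in the stored bytes, with no extraction."""
    return term.encode("utf-8") in data


def extract_text(data: bytes) -> str:
    """Stand-in for a real extractor: undo the compression to recover text."""
    return zlib.decompress(data).decode("utf-8")


if __name__ == "__main__":
    print("raw bytes contain term:     ", naive_contains(raw_stream, "invoice"))
    print("extracted text contains term:", "invoice" in extract_text(raw_stream))
```

Feeding the raw bytes to an indexer gives it nothing useful to tokenize;
only the extracted text is worth indexing.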
Michael
-----Original Message-----
From: Todd McIndoo [mailto:tmcin...@speedyscan.biz]
Sent: Wednesday, January 06, 2010 10:11 AM
To: lucene-net-dev@lucene.apache.org
Subject: Question
Sorry if this is a duplicate.
We are using Lucene.net version 2.0.0.4. I am trying to search a
collection of documents that contains lots of PDFs. I want to find the
documents that contain a specific word, using Lucene.net. We are
getting results from text documents but not from PDFs. Is there
something we have to do to be able to search in PDF documents? All
iFilters have been installed on the computer, so I do not think that
is the issue.
Regards,
SPEEDY SOLUTIONS
Todd McIndoo