I'm using the MoreLikeThis class to find similar documents, but I'm not
sure whether it is correct to pass a PDF file as the argument to the
*MoreLikeThis.like()* method.

To be clearer:

1) I add some PDF files to my Lucene index (I use PDFBox to extract the
text and add fields to the index).
2) Now I want to search for documents similar to a specific PDF file, and
I have the PDF file name (C:\\Example.pdf).

*My question is: what is the correct way to call the like() method when I
have to find similar PDF files?*

I use:
MoreLikeThis mlt = new MoreLikeThis(indexReader);

Query query = mlt.like(new File("C:\\Example.pdf"));

I'm not sure this is the correct way, because I think that if I pass a
file to the like() method, it expects a text file and not a PDF file,
whose text is not stored as plain, readable text...
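
To illustrate the concern, here is a minimal sketch that uses only the
Java standard library. It does not touch Lucene or PDFBox, and it does
not read a real PDF: Deflate compression merely stands in for the way
PDF content streams are commonly compressed, and decompression stands
in for a text extraction step. The class name, method names, and sample
sentence are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class RawBytesSketch {

    // Hypothetical document text: one sentence repeated 20 times.
    static String sampleText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            sb.append("Lucene finds similar documents. ");
        }
        return sb.toString();
    }

    // Deflate-compress bytes; an assumed stand-in for a compressed PDF stream.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress the bytes; an assumed stand-in for extracting text first.
    static byte[] extract(byte[] stored) {
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // nothing more can be decoded
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid Deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Decode the stored bytes directly as characters and look for the word.
    static boolean visibleInRawBytes(String word) {
        byte[] stored = compress(sampleText().getBytes(StandardCharsets.ISO_8859_1));
        return new String(stored, StandardCharsets.ISO_8859_1).contains(word);
    }

    // Run the extraction step first, then look for the word.
    static boolean visibleAfterExtraction(String word) {
        byte[] stored = compress(sampleText().getBytes(StandardCharsets.ISO_8859_1));
        return new String(extract(stored), StandardCharsets.ISO_8859_1).contains(word);
    }

    public static void main(String[] args) {
        System.out.println("raw bytes:        " + visibleInRawBytes("similar"));
        System.out.println("after extraction: " + visibleAfterExtraction("similar"));
    }
}
```

In this sketch the word is not found in the raw bytes but is found once
the bytes have been decoded, which is the situation I suspect applies
when a PDF file is read as if it were a text file.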

Do I have to extract the text from the PDF file and then pass an
InputStream containing that text? Or is my approach OK?

Thanks for any suggestions.
