Hi,
I need to search a single PDF document for a given string and, if the
string is found, return the number of the page on which it occurs.
The string can be any text in the PDF document.
It is a big document (about 5000 pages), so I'm asking whether this is
possible with Lucene.
I'm using PDFBox and found a way to do it (extracting the text page by page
and searching each page with indexOf), but it is too slow:
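PDDocument pddDocument = PDDocument.load(f);
PDFTextStripper textStripper = new PDFTextStripper();
// getEndPage() defaults to Integer.MAX_VALUE, so take the page count from the document
int lastpage = pddDocument.getNumberOfPages();
String page = null;
for (int i = 1; i <= lastpage; i++) {  // pages are 1-based; include the last page
    textStripper.setStartPage(i);
    textStripper.setEndPage(i);
    page = textStripper.getText(pddDocument);
    // indexOf returns 0 for a match at the very start of the page, so test >= 0
    if (page.indexOf(searchtext) >= 0) { returnpage = i; break; }
}
pddDocument.close();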
----------------
Is there a way to speed up the search with Lucene? Can I use indexing to
solve this problem? Thanks.
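
To make clearer what I mean by indexing, here is a rough sketch in plain
Java. It is not Lucene code and not part of PDFBox: PageIndex and firstPage
are names I made up for illustration. It assumes the page texts have already
been extracted once (for example with the PDFTextStripper loop above) and
passed in as a list. It maps every word to the pages it occurs on, so a
search only has to check the few candidate pages. Unlike my indexOf loop it
ignores case and only matches queries made of whole words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper (not a Lucene or PDFBox class): word -> pages containing it. */
class PageIndex {
    // Lower-cased page texts; page 1 is stored at index 0.
    private final List<String> pages = new ArrayList<>();
    // For each word, the sorted set of 1-based page numbers it occurs on.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Builds the index once from page texts that were extracted beforehand. */
    PageIndex(List<String> pageTexts) {
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = pageTexts.get(i).toLowerCase(Locale.ROOT);
            pages.add(text);
            for (String word : tokenize(text)) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(i + 1);
            }
        }
    }

    /** Splits text into runs of letters and digits. */
    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Returns the first 1-based page containing the query, or -1 if there is none. */
    int firstPage(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> words = tokenize(q);
        if (words.isEmpty()) {
            return -1;
        }
        // Intersect the page sets of all query words to get candidate pages.
        TreeSet<Integer> candidates = null;
        for (String w : words) {
            TreeSet<Integer> hits = postings.get(w);
            if (hits == null) {
                return -1;  // a word that occurs nowhere rules out every page
            }
            if (candidates == null) {
                candidates = new TreeSet<>(hits);
            } else {
                candidates.retainAll(hits);
            }
        }
        // Confirm the exact phrase only on the candidate pages, in page order.
        for (int page : candidates) {
            if (pages.get(page - 1).contains(q)) {
                return page;
            }
        }
        return -1;
    }
}
```

With something like this, new PageIndex(pageTexts).firstPage("some words")
would give the page number or -1. My guess is that Lucene could do the same
with one indexed document per page, but I don't know if that is the right
way to use it.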
--
View this message in context:
http://www.nabble.com/search-trough-single-pdf-document---return-page-number-tp25905217p25905217.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.