My crawler indexes crawled pages with this code:
Document doc = new Document();
doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("title", page.getTitle(), Store.YES, Index.TOKENIZED));
doc.add(new Field("id", Integer.toString(page.getId()), Store.YES, Index.NO));
try {
   indexWriter.addDocument(doc);
}
catch (Exception e) {
   log.error(e.getMessage());
}

I need to write an application that can search through the HTML code of the
indexed pages using code patterns like:
<table width="100%" height="50" style="border: 1px solid red;">
 *
 <th>*test*</th>
 *
</table>
This should match all documents containing such code, regardless of the order
of the tag attributes.
Is this possible with the Lucene engine?
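
To make the intended matching concrete, here is a minimal plain-Java sketch of the semantics I have in mind. It does not use Lucene at all, and the class and method names (HtmlPatternSketch, canonicalize, matches) are made up for illustration. It assumes well-formed opening tags with double-quoted attributes, sorts the attributes of each tag alphabetically, and treats * as "any text":

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: illustrates the matching semantics only.
class HtmlPatternSketch {
    // Opening tag with zero or more double-quoted attributes (assumed well-formed).
    private static final Pattern TAG =
        Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)((?:\\s+[a-zA-Z-]+=\"[^\"]*\")*)\\s*>");
    private static final Pattern ATTR = Pattern.compile("([a-zA-Z-]+)=\"([^\"]*)\"");

    // Rewrites every opening tag so its attributes appear in alphabetical order.
    static String canonicalize(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            Map<String, String> attrs = new TreeMap<String, String>();
            Matcher attr = ATTR.matcher(tag.group(2));
            while (attr.find()) {
                attrs.put(attr.group(1), attr.group(2));
            }
            StringBuilder rebuilt = new StringBuilder("<").append(tag.group(1));
            for (Map.Entry<String, String> e : attrs.entrySet()) {
                rebuilt.append(' ').append(e.getKey())
                       .append("=\"").append(e.getValue()).append('"');
            }
            rebuilt.append('>');
            tag.appendReplacement(out, Matcher.quoteReplacement(rebuilt.toString()));
        }
        tag.appendTail(out);
        return out.toString();
    }

    // True if the pattern occurs in the html; '*' in the pattern stands for any text.
    static boolean matches(String pattern, String html) {
        StringBuilder regex = new StringBuilder();
        String[] pieces = canonicalize(pattern).split("\\*");
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            // Literal fragment between wildcards; surrounding whitespace is ignored.
            regex.append(Pattern.quote(pieces[i].trim()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL)
                      .matcher(canonicalize(html)).find();
    }
}
```

Calling HtmlPatternSketch.matches(pattern, page.getHtmlData()) on every stored page would presumably be far too slow for a large crawl, which is why I am looking for a way to do this kind of matching through the index.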

Thanks!
