My crawler indexes the crawled pages with this code:

    Document doc = new Document();
    doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED));
    doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED));
    doc.add(new Field("title", page.getTitle(), Store.YES, Index.TOKENIZED));
    doc.add(new Field("id", Integer.toString(page.getId()), Store.YES, Index.NO));
    try {
        indexWriter.addDocument(doc);
    } catch (Exception e) {
        log.error(e.getMessage());
    }
I need to write an application that can search through the HTML code of the indexed pages using code patterns like this:

    <table width="100%" height="50" style="border: 1px solid red;">
    *
    <th>*test*</th>
    *
    </table>

This should match all documents containing such code, regardless of the order of the tag attributes. Is this possible with the Lucene engine?

Thanks!
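To make the attribute-order requirement concrete, here is a minimal plain-Java sketch of what "the same tag" should mean for matching purposes. It is not Lucene code and not part of my crawler; the class name `TagNormalizer` and the method `canonicalize` are hypothetical, and the sketch assumes double-quoted attribute values with no `>` inside them.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class TagNormalizer {
    // An opening tag: element name followed by everything up to the closing '>'.
    private static final Pattern TAG =
            Pattern.compile("<\\s*([A-Za-z][A-Za-z0-9]*)([^>]*)>");
    // A name="value" pair (assumption: values are always double-quoted).
    private static final Pattern ATTR =
            Pattern.compile("([\\w-]+)\\s*=\\s*\"([^\"]*)\"");

    /** Rewrites one opening tag so that its attributes appear in alphabetical order. */
    static String canonicalize(String openingTag) {
        Matcher tag = TAG.matcher(openingTag.trim());
        if (!tag.matches()) {
            throw new IllegalArgumentException("Not an opening tag: " + openingTag);
        }
        // TreeMap keeps the attribute names sorted, which removes ordering differences.
        Map<String, String> attrs = new TreeMap<>();
        Matcher attr = ATTR.matcher(tag.group(2));
        while (attr.find()) {
            attrs.put(attr.group(1).toLowerCase(Locale.ROOT), attr.group(2));
        }
        StringBuilder out = new StringBuilder("<")
                .append(tag.group(1).toLowerCase(Locale.ROOT));
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            out.append(' ').append(e.getKey())
               .append("=\"").append(e.getValue()).append('"');
        }
        return out.append('>').toString();
    }

    public static void main(String[] args) {
        String a = "<table width=\"100%\" height=\"50\" style=\"border: 1px solid red;\">";
        String b = "<table style=\"border: 1px solid red;\" height=\"50\" width=\"100%\">";
        // Both tags should reduce to the same canonical form.
        System.out.println(canonicalize(a).equals(canonicalize(b)));
    }
}
```

In other words, the two `<table ...>` tags in `main` should count as the same tag, and I am looking for a way to get that behaviour (plus the `*` wildcards between tags) out of a Lucene search.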