Hello, I've been working with Lucene for about a month now. I've got my indexes created, but I'm having a problem with the results I'm getting back.
The site I'm working on has a lot of small HTML files that are used for page construction (nav bars, footers, etc.). They show up high in the results because they contain the search terms I'm looking for and, being small, they rank higher than larger documents.

I want to exclude them from the index and I've come up with two ideas:

1) Move them to a directory that I exclude from the index. The downside is that I'll have to change a bunch of links.

2) Detect them with some sort of flag and exclude them from the index. We were thinking of a fake tag that the indexer would detect, so that those pages are never indexed.

Has anyone run into this problem before? How difficult would it be to implement 2? Is there a way to detect a fake tag? I'm assuming that I can add a new boolean to the HTMLDocument class (true if the page contains the exclude tag) and then skip the call to IndexWriter.addDocument() when it is set.

b
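To make option 2 a bit more concrete, here is a rough sketch of the check I have in mind. It uses plain Java and no Lucene classes, so it only shows the detection and filtering step. The marker itself (a meta tag named "lucene-exclude") and the names ExcludeMarker, shouldSkip and indexable are made up for illustration; nothing in Lucene looks for such a tag on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: decides whether a page carries a hypothetical exclude marker.
final class ExcludeMarker {

    // Assumed marker: <meta name="lucene-exclude" ...> anywhere in the page.
    // The tag name is an invention for this example, not a Lucene convention.
    private static final Pattern MARKER = Pattern.compile(
            "<meta\\b[^>]*\\bname\\s*=\\s*[\"']lucene-exclude[\"'][^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns true if the raw HTML contains the marker tag.
    static boolean shouldSkip(String html) {
        return html != null && MARKER.matcher(html).find();
    }

    // Keeps only the pages that should go into the index.
    static List<String> indexable(List<String> pages) {
        List<String> keep = new ArrayList<>();
        for (String html : pages) {
            if (!shouldSkip(html)) {
                // In the real indexer, this is where the document would be
                // built and passed to IndexWriter.addDocument().
                keep.add(html);
            }
        }
        return keep;
    }
}
```

The idea would be to run shouldSkip() on each file's contents before building the document, so flagged fragments never reach the writer and no links have to change. A regex is only a crude check; if the pages are already being parsed, setting the flag inside the parser would probably be more robust.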