On Tuesday 08 April 2008 15:18:34, Itamar Syn-Hershko wrote:
> Paul,
>
> I don't see how this answers the question.

Towards the end, the page describes when a Scorer is called and roughly
what it does.

> I was asking why Lucene
> has to access the index with exact terms, and not use RegEx or
> simpler wildcard support internally? If Lucene were able to look
> for "w?rd" or "wor*" and treat the wildcards as wildcards, this would
> greatly improve the speed of searches and would eliminate the need for
> Query rewriting.

When it is known in advance that "w?rd" and "wor*" will be used in
queries a lot, one can write a tokenizer that indexes them so that they
can be searched directly. The problem is knowing that in advance, that
is, at indexing time.

> Since some people may want to index chars like those used in
> wildcards, they could be escaped (or, those people will use the
> standard search classes available today instead). I'm not entirely
> sure what part of Lucene does the actual access to the terms and
> position vectors, but if it could be sub-classed or cloned, and then
> modified to honor wildcards or even RegEx, that would bring Lucene to
> new heights.

There are regular expression queries in the regex contrib module;
however, these work by rewriting to actually indexed terms.

> Unless, again, there is a specific reason why this can't
> be done.

There is no specific reason why it cannot be done; one only needs to
provide the corresponding tokenizer to be used at indexing time.

Kind regards,
Paul Elschot

>
> Itamar.
>
> -----Original Message-----
> From: Paul Elschot [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, April 08, 2008 1:56 AM
> To: java-user@lucene.apache.org
> Subject: Re: Why Lucene has to rewrite queries prior to actual
> searching?
>
> Itamar,
>
> Have a look here:
> http://lucene.apache.org/java/2_3_1/scoring.html
>
> Regards,
> Paul Elschot
>
> On Tuesday 08 April 2008 00:34:48, Itamar Syn-Hershko wrote:
> > Paul and John,
> >
> > Thanks for your quick reply.
> >
> > The problem with query rewriting is the aforementioned
> > MaxClauseException.
> > Instead of inflating the query and passing a
> > deterministic list of terms to the actual search routine, Lucene
> > could have accessed the vectors in the index using some sort of
> > filter. So, for example, if it knows how to access "Foobar" by its
> > name in the index, why can't it take "Foo*" and just get all the
> > vectors until "Fop" is met (for example)? Why does it have to get a
> > deterministic list of terms?
> >
> > I will take a look at the Scorer - can you describe in short what
> > exactly it does and where and when it is being called?
> >
> > I don't get John's comment though - Query::rewrite is being called
> > prior to the actual searching (through QueryParser), so how come it
> > can use "information gathered from IndexReader at search time"?
> >
> > Itamar.
> >
> > -----Original Message-----
> > From: Paul Elschot [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, April 08, 2008 12:57 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Why Lucene has to rewrite queries prior to actual
> > searching?
> >
> > Itamar,
> >
> > Query rewrite replaces wildcards with terms available from the
> > index. Usually that involves replacing a wildcard with a
> > BooleanQuery that is effectively an OR over the available terms
> > while using a flat coordination factor, i.e. it does not matter how
> > many of the available terms actually match a document, as long as at
> > least one matches.
> >
> > For the required query parts (AND like), Scorer.skipTo() is used,
> > and that could well be the filter mechanism you are referring to;
> > have a look at the javadocs of Scorer, and, if necessary, at the
> > actual code of ConjunctionScorer.
> >
> > Regards,
> > Paul Elschot
> >
> > On Monday 07 April 2008 23:13:09, Itamar Syn-Hershko wrote:
> > > Hi all,
> > >
> > > Can one of the experts here explain why Lucene has to get a
> > > "rewritten" query for the Searcher - so Phrase or Wildcard
> > > queries have to rewrite themselves into a "primitive" query, that
> > > is then passed to Lucene to look for? I'm probably not too
> > > familiar with the internals of Lucene, but I'd imagine that if
> > > you can inflate a query using wildcards via xxxxQuery
> > > subclassing, you could as easily (?) have some sort of Filter
> > > mechanism during the search, so that Lucene retrieves the
> > > position vectors for all the terms that pass that filter, instead
> > > of retrieving only the position data for deterministic terms
> > > (with no wildcards etc.). If that were possible to do somehow, it
> > > could greatly increase the searchability of Lucene indices by
> > > using RegEx (without rewriting and getting the dreaded
> > > MaxClauseCount error) and similar.
> > >
> > > Would love to hear some insights on this one.
> > >
> > > Itamar.
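
The "take Foo* and read until Fop" idea and the rewrite step described in
the thread amount to the same walk over the sorted terms. A minimal
sketch in plain Java, not Lucene's own code: the TreeSet stands in for
the index's sorted term dictionary, and the class name, method names and
the maxClauses parameter are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansionSketch {

    // Smallest string that sorts after every string starting with prefix,
    // e.g. "Foo" -> "Fop". Assumes a non-empty prefix whose last char is
    // not Character.MAX_VALUE.
    static String upperBound(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Walks the sorted terms from prefix (inclusive) to upperBound (exclusive)
    // and collects them as the clauses of an OR query. The maxClauses check
    // imitates the clause limit complained about in the thread.
    static List<String> expandPrefix(SortedSet<String> termDictionary,
                                     String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : termDictionary.subSet(prefix, upperBound(prefix))) {
            if (clauses.size() >= maxClauses) {
                throw new IllegalStateException(
                        "too many clauses for prefix " + prefix);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("Fog", "Foo", "Foobar", "Food", "Fop", "Zoo"));
        // Every term in the range ["Foo", "Fop") becomes one OR clause.
        System.out.println(expandPrefix(terms, "Foo", 1024));
    }
}
```

In this sketch the range scan and the expansion are one loop; the only
difference from the filter proposed in the thread is whether the matching
terms are collected into a query first or consumed as they are found.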

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
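
One possible reading of the index-time tokenizer suggested in the reply
at the top of the thread, sketched in plain Java rather than with
Lucene's TokenStream API. Everything here is an assumption made for
illustration: the class and method names are hypothetical, only trailing
"*" and single "?" patterns are covered, and the indexed words are
assumed to contain no literal '*' or '?' (otherwise they would need
escaping, as noted in the thread).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class WildcardTokenSketch {

    // Emits the word itself plus the wildcard patterns it should answer to:
    // every prefix followed by '*', and every variant with one char as '?'.
    static Set<String> indexTokens(String word) {
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(word);
        for (int i = 1; i <= word.length(); i++) {
            tokens.add(word.substring(0, i) + "*");
        }
        for (int i = 0; i < word.length(); i++) {
            tokens.add(word.substring(0, i) + "?" + word.substring(i + 1));
        }
        return tokens;
    }

    // Toy inverted index: token -> sorted ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String word : docs.get(docId).toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                for (String token : indexTokens(word)) {
                    postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
                }
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index =
                buildIndex(Arrays.asList("a word here", "the ward", "worthy"));
        // Both patterns are now exact term lookups; no query rewriting needed.
        System.out.println(index.get("w?rd"));
        System.out.println(index.get("wor*"));
    }
}
```

The cost shows up at indexing time: each word of length n produces about
2n + 1 tokens in this sketch, and only the pattern shapes chosen in
advance can be searched this way.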