> Only terms returned from the Analyzer are considered, so if a stop
> word is removed it does not count for tf or idf.

But I also need non-indexed words (such as stop words) to be taken into
account in the comparison. By the way, Google does this.
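To make the stop-word point concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the `analyze` method is only a stand-in for an analyzer, and the stop list and class name are assumptions made up for illustration. It shows that once a stop word is dropped during analysis, there is nothing left for a term-frequency count to see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordTfSketch {
    // Hypothetical stop list for illustration; not Lucene's actual default set.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of"));

    // Stand-in for an analyzer: lowercase, split on non-alphanumerics, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Term frequency counted only over the terms that survived analysis.
    static int termFrequency(String text, String term) {
        int count = 0;
        for (String t : analyze(text)) {
            if (t.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "The quick fox and the lazy dog";
        System.out.println(analyze(doc));              // stop words are gone
        System.out.println(termFrequency(doc, "the")); // removed term contributes nothing
        System.out.println(termFrequency(doc, "fox"));
    }
}
```

In this sketch, counting stop words would mean analyzing the text without the stop list, which is the behaviour I am after.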
> This will happen automatically with PhraseQuery with a slop factor.
> The closer the words, the better the score. However, with a pure
> boolean query, proximity is not considered at all (nor should it
> be). You can use a large slop factor for phrases such as "quick
> fox"~100 and see how the scores work then.

This means that all words must be in the result. That is not always the
case in my application. If I am searching for "quick brown fox", "quick
fox" is an acceptable result. I just need to know whether I have to
re-sort the search results according to my own criteria, or whether there
are methods I can override so that the results come back already sorted.

On 7/22/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Jul 22, 2005, at 9:59 AM, Ahmed El-dawy wrote:
>
> > Hello,
> > I am using lucene to search plain text, but the order of the search
> > results is not satisfying to my needs. First, I want to know how the
> > similarity works. Then, I need to extend it.
>
> Use IndexSearcher.explain() to see how each individual hit is scored
> against a Query - this will be the clearest way to see why things
> score the way they do.
>
> > First, does the similarity class work on analyzed text or original
> > search text? To be precise, does it count the stop words as found
> > terms or not?
>
> Only terms returned from the Analyzer are considered, so if a stop
> word is removed it does not count for tf or idf.
>
> > Second, I want to add a factor of how relative are the terms of the
> > query found in text. For example, when I search for "quick fox", "fox
> > quick" and "quick brown fox" will be less ranked than "quick fox".
>
> This will happen automatically with PhraseQuery with a slop factor.
> The closer the words, the better the score. However, with a pure
> boolean query, proximity is not considered at all (nor should it
> be). You can use a large slop factor for phrases such as "quick
> fox"~100 and see how the scores work then.
>
> Erik

--
Regards,
Ahmed Saad

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
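P.S. In case re-sorting turns out to be necessary, the criteria described in this message could be sketched roughly as below. This is plain Java applied to hit texts after the search, not Lucene API. The class name, the scoring formula (coverage divided by one plus gaps plus order inversions), and the use of each term's first occurrence only are all assumptions chosen for illustration, not a recommended ranking function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProximityResortSketch {
    // Hypothetical score: rewards matching more query terms, penalizes extra
    // words between the matches and matches that appear out of query order.
    static double score(String query, String text) {
        // Distinct query terms, in query order.
        Set<String> terms = new LinkedHashSet<>(
                Arrays.asList(query.trim().toLowerCase().split("\\s+")));
        List<String> words = Arrays.asList(text.trim().toLowerCase().split("\\s+"));

        // Position of the first occurrence of each query term found in the text.
        List<Integer> positions = new ArrayList<>();
        for (String term : terms) {
            int pos = words.indexOf(term);
            if (pos >= 0) {
                positions.add(pos);
            }
        }
        if (positions.isEmpty()) {
            return 0.0;
        }

        // Words inside the matched window that are not themselves matches.
        int span = Collections.max(positions) - Collections.min(positions) + 1;
        int gaps = span - positions.size();

        // Pairs of matched terms that appear in the opposite order to the query.
        int inversions = 0;
        for (int i = 0; i < positions.size(); i++) {
            for (int j = i + 1; j < positions.size(); j++) {
                if (positions.get(i) > positions.get(j)) {
                    inversions++;
                }
            }
        }

        double coverage = (double) positions.size() / terms.size();
        return coverage / (1 + gaps + inversions);
    }

    // Returns a copy of the hits sorted by descending score (stable for ties).
    static List<String> resort(String query, List<String> hits) {
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(score(query, b), score(query, a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("quick brown fox", "fox quick", "quick fox");
        System.out.println(resort("quick fox", hits));
    }
}
```

With this formula, a query for "quick fox" ranks the hit "quick fox" above both "fox quick" and "quick brown fox", and a query for "quick brown fox" still gives the hit "quick fox" a non-zero score, since not every query word has to be present.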
