I had the same question. It seems very hard to do phrase-based queries efficiently.
I think most search engines do phrase-based queries, or at least appear to. In Google, for example, every result must contain all the words the user searched for.

It seems to me that the impact-sorted list makes sense if you are doing pure vector-space ranking. That is what I gather from the research papers I have read: they all discuss how to optimize the vector space model using the impact-sorted list approach.

Unfortunately, the vector space model has serious drawbacks. It does not take inter-word relations into account, so documents matching only some of the keywords can rank higher than documents matching all of them.

I have yet to see whether the impact-sorted list approach can handle this efficiently.

Cheers,

Jian

On 1/11/07, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
On Jan 9, 2007, at 6:25 AM, Dalton, Jeffery wrote:

> e. <impact, num_docs, (doc1,...docN)>
> f. <impact, num_docs, ([doc1, freq ,<positions>],...[docN, freq
> ,<positions>])

How do you build an efficient PhraseScorer to work with an impact-sorted posting list?

The way PhraseScorer currently works is: find a doc that contains all terms, then see if the terms occur consecutively in phrase order, then determine a score. The TermDocs objects feeding PhraseScorer return doc_nums in ascending order, so finding an intersection is easy. But if the document numbers are returned in what looks to the PhraseScorer like random order... ??

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
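
To make the doc-ordered approach described in the quoted message concrete, here is a minimal Python sketch. It is not Lucene's PhraseScorer. The `Postings` layout, the `phrase_match` function, and the sample data are all made up for illustration, and the sketch assumes each posting list is held in memory and sorted by ascending doc_num. It only counts exact consecutive matches and does not compute a real score.

```python
from typing import List, Tuple

# Hypothetical in-memory posting list for one term:
# [(doc_num, [positions...]), ...] sorted by ascending doc_num.
Postings = List[Tuple[int, List[int]]]


def phrase_match(postings: List[Postings]) -> List[Tuple[int, int]]:
    """Return (doc_num, phrase_freq) for docs containing the phrase.

    postings[i] is the posting list of the i-th term in the phrase.
    Correctness depends on every list being in ascending doc_num order.
    """
    cursors = [0] * len(postings)
    hits = []
    while all(c < len(p) for c, p in zip(cursors, postings)):
        docs = [p[c][0] for c, p in zip(cursors, postings)]
        target = max(docs)
        if all(d == target for d in docs):
            # Every term occurs in this doc: check for consecutive positions.
            first_positions = postings[0][cursors[0]][1]
            others = [set(p[c][1]) for c, p in zip(cursors[1:], postings[1:])]
            freq = sum(
                1
                for pos in first_positions
                if all(pos + i + 1 in s for i, s in enumerate(others))
            )
            if freq:
                hits.append((target, freq))
            cursors = [c + 1 for c in cursors]
        else:
            # Skip ahead in every list that is behind the largest doc_num.
            for i, p in enumerate(postings):
                while cursors[i] < len(p) and p[cursors[i]][0] < target:
                    cursors[i] += 1
    return hits


# Made-up postings for the phrase "quick fox".
quick = [(1, [0, 7]), (3, [2]), (8, [4])]
fox = [(1, [1]), (2, [0]), (3, [5]), (8, [5])]
print(phrase_match([quick, fox]))  # [(1, 1), (8, 1)]
```

The skip-ahead step is only valid because a doc_num smaller than the current target can never reappear later in an ascending list. If the same lists arrive in impact order, that assumption no longer holds and this merge can silently skip matching documents, so the intersection would need some other strategy, for example re-sorting by doc_num or a hash-based join, each with its own cost.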