Thank you Chris. That seems like a good suggestion. I will try to pass a different Query object to the Highlighter api that the one used for searching.

I plan to break down the HTML document and store the title/sub title/content in different fields of the index. So if I create a new query comparing company name and keywords against title and content fields, then I am assuming that highlighter api will give a higher ranking to the fragment where both terms of the query match against those fragments where just one term(either title or content) matches. I am assuming that even if I do not increase the boost factor of any of the terms, the api will take care of this ranking. This is my understanding of the scoring/ranking algorithm. Any comments anyone?

Thanks,
Harini

Chris Hostetter wrote:

: My requirement is to show the relevant fragments of the news article for
: each company along with the search results. But the highlighter api
: sometimes picks up the fragments which are not so relevant to the news
: article/company. I would like to know if there is anyway that I can
: modify the scoring/ranking of these fragments in such a way that the
: news items in which a company name & keywords in the headline gets
: assigned a very strong relevancy ranking,  closely followed by a company
: name mention in the first paragraph and a  multiple-mention within the
: entire story. Something like headline =   5 points,  first paragraph =
: four, etc.

Well, the sample query you mentioned isn't checking any company names, or
doing anything with a "keywords" field.  I'm not to familiar with the way
the highlighter package works, but i imagine that with the types of
queries you said you are using, if you are highlighting the "Content"
field, the CompanyId and the FilingDate clauses of your query will be
fairly irelevent (becuase they are numbers, not because they are different
field names)

An idea i've suggested before (but i don't remember if anyone ever said
wether it is a viable use of the Highlighter or not) is to give the
highlighter a completely different Query object then the one you used to
get your search results.

ie, if you search query (what you want used to compute score) is...

 +(CompanyId:10 CompanyId:20) Content:"cost saving" Content:outsource

...but once you've gotten those results, what you really care about is
highlighting the name of the company, and you think the best fragments
when those company names appear near the other words, then give the
highlighter a query that looks like...

 "companyname10 cost savings"~20 "companyname20 outsource"~20 ...etc



: >>> Here is the search query(BooleanQuery) I am passing to the
: >>> IndexSearcher
: >>> and QueryScorer:
: >>> +DocumentType:news
: >>> +(CompanyId:10 CompanyId:20 CompanyId:30 CompanyId:40)
: >>> +FilingDate:[20041201 TO 20051201]
: >>> +(Content:"cost saving" Content:"cost savings" Content:outsource
: >>> Content:outsources Content:downsize Content:downsizes
: >>> Content:restructuring Content:restructure)



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to