Hi, We are implementing a search engine for a large dataset (approximately 50 million HTML pages). We have indexed several fields for each page, such as Title, Body, Meta text, H1, and URL. Lucene provides the setBoost() function to weight these fields. What values should we use for these boosts? Should they be relative to one another? Are there any standard values?
We have also computed PageRank for these web pages. What is the best way to combine the PageRank information with Lucene's document score?
-- Kushal Dave