You cannot, in general, structure a Lucene query so that it yields
the same document rankings that Google would for the same query and
document set.  The reason is that Google's scoring algorithm also uses
information about the topology of the pages (i.e., how the pages are
linked together).  (An overview of what Google does in this
regard may be found at http://www.google.com/technology/index.html .)
Thus, to get Lucene to do "what Google does", you'd have to
rewrite large chunks of it.
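
To make "topology" concrete, here is a toy sketch of link-based ranking
in the spirit of the PageRank idea from the Brin and Page paper cited
further down the thread.  It is not Google's actual algorithm and it is
not part of Lucene; the class name, the damping value, and the
three-page graph are assumptions made up for illustration.  The point is
that the score comes from the link graph, which a query string alone
cannot express.

```java
import java.util.Arrays;

// Hypothetical toy example: rank pages purely by how they link together.
final class LinkRankSketch {

    // links[i] lists the indices of the pages that page i links to.
    static double[] rank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);              // start with equal rank everywhere

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);   // baseline share for every page
            for (int page = 0; page < n; page++) {
                int outDegree = links[page].length;
                if (outDegree == 0) {
                    // Page with no outgoing links: spread its rank over all pages.
                    for (int j = 0; j < n; j++) {
                        next[j] += damping * rank[page] / n;
                    }
                } else {
                    // Otherwise split its rank evenly among the pages it links to.
                    for (int target : links[page]) {
                        next[target] += damping * rank[page] / outDegree;
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Assumed graph: pages 0 and 1 both link to page 2; page 2 links back to 0.
        int[][] links = { {2}, {2}, {0} };
        System.out.println(Arrays.toString(rank(links, 0.85, 100)));
    }
}
```

In this made-up graph, page 2 ends up ranked highest because two pages
link to it, regardless of what words any of the pages contain.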

Joshua

 [EMAIL PROTECTED] Per Obscurius...www.ics.uci.edu/~jmadden
    Joshua Madden: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.

On Mon, 25 Feb 2002, Spencer, Dave wrote:

> I'm pretty sure Google gives priority to words appearing in the
> title and URL.
> 
> I believe sect 4.2.5 says this here:
> http://citeseer.nj.nec.com/cache/papers/cs/13017/http:zSzzSzwww-db.stanford.eduzSzpubzSzpaperszSzgoogle.pdf/brin98anatomy.pdf
> from here: 
> http://citeseer.nj.nec.com/brin98anatomy.html
> 
> So you have to have Lucene store the title as a separate field.
> 
> This is then what you'd have if, like me, you boost the title by a
> factor of 5 and the URL by a factor of 2 (the caret is the "boost"
> operator):
> 
> +(title:george^5.0 url:george^2.0 contents:george)
> +(title:bush^5.0 url:bush^2.0 contents:bush)
> +(title:white^5.0 url:white^2.0 contents:white)
> +(title:house^5.0 url:house^2.0 contents:house)
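
A boosted query like the one quoted above can be generated from the
user's input rather than typed by hand.  What follows is only a sketch
that assembles the query string; it assumes the index really has fields
named title, url and contents, it uses no Lucene classes, and it does
not escape query-syntax special characters in the input.  The class and
method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds a Lucene query string in which every word is
// required and may match in any of several fields, each with its own boost.
final class BoostedFieldQuery {

    // Field names and boosts taken from the quoted example (assumed to exist
    // in the index). LinkedHashMap keeps the fields in insertion order.
    static Map<String, Double> defaultBoosts() {
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("title", 5.0);
        boosts.put("url", 2.0);
        boosts.put("contents", 1.0);
        return boosts;
    }

    static String build(String userInput, Map<String, Double> fieldBoosts) {
        List<String> groups = new ArrayList<>();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;                       // blank input yields no clauses
            }
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Double> field : fieldBoosts.entrySet()) {
                String clause = field.getKey() + ":" + word;
                if (field.getValue() != 1.0) {
                    clause += "^" + field.getValue();   // the caret sets the boost
                }
                clauses.add(clause);
            }
            // '+' makes the whole group required: the word must match somewhere.
            groups.add("+(" + String.join(" ", clauses) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        System.out.println(build("george bush", defaultBoosts()));
        // prints: +(title:george^5.0 url:george^2.0 contents:george) +(title:bush^5.0 url:bush^2.0 contents:bush)
    }
}
```

The resulting string would then be handed to the query parser as usual.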
> 
> 
> -----Original Message-----
> From: Ian Lea [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, February 23, 2002 8:15 AM
> To: Lucene Users List
> Subject: Re: Googlifying lucene querys
> 
> 
> +george +bush +white +house
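
If the search string comes straight from a user, the '+' prefixes can be
added mechanically before the string is passed to the query parser.  A
minimal sketch, assuming plain whitespace-separated words with no
query-syntax special characters (nothing is escaped here); the class and
method names are hypothetical and no Lucene classes are involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns free-form user input into a query string in
// which every word is required.
final class RequireAllTerms {

    static String requireAll(String userInput) {
        List<String> clauses = new ArrayList<>();
        for (String word : userInput.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                clauses.add("+" + word);   // '+' marks the term as required
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(requireAll("george bush white house"));
        // prints: +george +bush +white +house
    }
}
```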
> 
> 
> --
> Ian.
> 
> Jari Aarniala wrote:
> > 
> > Hello,
> > 
> > Despite the confusing subject ;) my question is simple. I'm just
> > trying out Lucene for the first time and would like to know how one
> > would go about implementing a search on the index with the same logic
> > that Google uses.
> >         For example, if the user input is "george bush white house",
> > how do I easily construct a query that matches ALL of the words
> > above? If I have understood correctly, passing the search string
> > above to the queryParser creates a query that searches for ANY of
> > the words above.
> > 
> >         Thanks for any help,
> 
> --
> To unsubscribe, e-mail:
> <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail:
> <mailto:[EMAIL PROTECTED]>
> 
