Re: Search results excerpt similar to Google
I think they do a proximity result based on keyword matches. So... if you search for "lucene" and the document returned has this word at the very start and the very end of the document, then you will see the two sentences (sequences of words) surrounding the two keyword matches, one from the start of the document and one from the end. How you determine which words from the result you include in the summary is up to you.

The problem with this is that in Lucene-land you have to store the content of the document inside the index verbatim (so you can get arbitrary portions of it out). This means your index will be larger than it really needs to be. I usually just store the first 255 characters in the index and use this as a summary. It's not as good as Google, but it seems to work OK.

- Original Message -
From: "Ben" <[EMAIL PROTECTED]>
To: "Lucene"
Sent: Friday, January 28, 2005 5:08 PM
Subject: Search results excerpt similar to Google

Hi

Is it hard to implement a function that displays search result excerpts similar to Google's? Is it just string manipulation, or is there some logic behind it? I like their excerpts.

Thanks
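For illustration, a minimal sketch of the "store a short summary" approach described above, using the Lucene 1.4 field constructors (the field names and the fullText variable are made up):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    static Document makeDoc(String fullText) {
        Document doc = new Document();
        // searchable but not stored, which keeps the index small
        doc.add(Field.UnStored("contents", fullText));
        // store only the first 255 characters to display as the hit summary
        String summary = fullText.substring(0, Math.min(255, fullText.length()));
        doc.add(Field.UnIndexed("summary", summary));
        return doc;
    }

At search time you would then read the stored "summary" field from each hit instead of re-reading the original document.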
Re: google mini? who needs it when Lucene is there
Xiaohong Yang (Sharon) wrote:
> Hi, I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Does anyone know Google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't have one installed, so I cannot measure.
>
> Best,
>
> Sharon

500:1 for Lucene? I don't think so. In my Wikipedia search engine, the data in the MySQL DB I index from is approx 1.0 GB (the sum of the lengths of title and body), while the Lucene index of just these two fields is 250 MB. So in this case the Lucene index is 25% of the corpus size.
Search results excerpt similar to Google
Hi

Is it hard to implement a function that displays search result excerpts similar to Google's? Is it just string manipulation, or is there some logic behind it? I like their excerpts.

Thanks
Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there
Jason Polites wrote:
> I think everyone agrees that this would be a very neat application of open-source technology like Lucene... however (opens drawer, pulls out devil's advocate hat, places on head)... there are several complexities here not addressed by Lucene (et al.). Not because Lucene isn't damn fantastic, just because it's not its job. One of the big ones is security. Enterprise search is no good if it doesn't match up with the authentication and authorization paradigms existing in the organisation. How useful is it to return a whole bunch of search results for documents to which you don't have access? Not to mention the issues around whether you are even authorized to know they exist.

I was gonna mention this - you beat me to the punch. I suspect that LDAP/JNDI integration is a start, but you need hooks for an arbitrary auth plugin. And once we address this, it might be the case that a user has to *log in* to the search server. We have Verity where I work and this is all the case, along with the fact that a sale seems to involve mandatory consulting work (not that that's bad, but if you're trying to ship a shrink-wrapped search engine in a box then this is an issue).

> The other prickly one is file types. It's all well and good to index HTML, XML and text, but when you start looking at PDF, MS Office (OLE docs, PSTs, Outlook MSG files, MS Project files etc.), Lotus Notes databases etc., things begin to look less simple and far less elegant than a nice clean Lucene rackmount. Sure there are great projects like Apache POI, but they still have a bit of a way to go before they mature to the point of really solving these problems. After which time Microsoft will probably be rolling out Longhorn and everyone may need to start from scratch.

You also need http://jcifs.samba.org/ so you can spider Windows file shares.

> This is not to say that it's not a great idea, but as with most great ideas the challenge is not the formation of the idea, but its implementation.

Indeed.

> I think a great first step would be to start developing good, reliable, open-source extensions to Lucene which strive to solve some of these issues. End rant.

- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Friday, January 28, 2005 12:40 PM
Subject: Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

I discuss this with myself a lot inside my head... :) Seriously, I agree with Erik. I think this is a business opportunity. How many people are hating me now and going "shh"? Raise your hands!

Otis
Re: google mini? who needs it when Lucene is there
Overall, even if the Google mini gives a lot of cool features compared to a bare-bones Lucene project, what good is it with the 50,000-document limit? It is useless with that limit. That is just their way of trying to turn it into another cash cow.

Jian

On Thu, 27 Jan 2005 17:45:03 -0800 (PST), Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> 500 times the original data? Not true! :)
>
> Otis
Re: query term frequency
No, the number of occurrences of a term in a Query.

Jonathan

Quoting David Spencer <[EMAIL PROTECTED]>:
> Jonathan Lasko wrote:
> > What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc...
>
> Do you mean the # of docs that have a term?
>
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
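Lucene 1.4 has no direct API for counting how often a term occurs within a Query, but for the common case of a BooleanQuery built from TermQuerys you can walk the clauses yourself - a rough sketch only (other Query types, e.g. PhraseQuery, would need their own handling):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    static int countTerm(BooleanQuery query, Term wanted) {
        int count = 0;
        BooleanClause[] clauses = query.getClauses();
        for (int i = 0; i < clauses.length; i++) {
            // BooleanClause.query is a public field in Lucene 1.4
            if (clauses[i].query instanceof TermQuery
                    && ((TermQuery) clauses[i].query).getTerm().equals(wanted)) {
                count++;
            }
        }
        return count;
    }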
Re: LuceneReader.delete (term t) Failure ?
Could you work up a self-contained RAMDirectory-using example that demonstrates this issue?

Erik

On Jan 27, 2005, at 9:10 PM, <[EMAIL PROTECTED]> wrote:

Erik, I am using the keyword field:

doc.add(Field.Keyword("uid", pathRelToArea));

Anything else I can check on?

thanks
atul

PS: we worked together on the Darden project
Re: Re: LuceneReader.delete (term t) Failure ?
Erik, I am using the keyword field:

doc.add(Field.Keyword("uid", pathRelToArea));

Anything else I can check on?

thanks
atul

PS: we worked together on the Darden project

> From: Erik Hatcher <[EMAIL PROTECTED]>
> Date: 2005/01/27 Thu PM 07:46:40 EST
> To: "Lucene Users List"
> Subject: Re: LuceneReader.delete (term t) Failure ?
>
> How did you index the "uid" field? Field.Keyword? If not, that may be the problem, in that the field was analyzed. For a key field like this, it needs to be unanalyzed/untokenized.
>
> Erik
Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there
I think everyone agrees that this would be a very neat application of open-source technology like Lucene... however (opens drawer, pulls out devil's advocate hat, places on head)... there are several complexities here not addressed by Lucene (et al.). Not because Lucene isn't damn fantastic, just because it's not its job.

One of the big ones is security. Enterprise search is no good if it doesn't match up with the authentication and authorization paradigms existing in the organisation. How useful is it to return a whole bunch of search results for documents to which you don't have access? Not to mention the issues around whether you are even authorized to know they exist.

The other prickly one is file types. It's all well and good to index HTML, XML and text, but when you start looking at PDF, MS Office (OLE docs, PSTs, Outlook MSG files, MS Project files etc.), Lotus Notes databases etc., things begin to look less simple and far less elegant than a nice clean Lucene rackmount. Sure there are great projects like Apache POI, but they still have a bit of a way to go before they mature to the point of really solving these problems. After which time Microsoft will probably be rolling out Longhorn and everyone may need to start from scratch.

This is not to say that it's not a great idea, but as with most great ideas the challenge is not the formation of the idea, but its implementation. I think a great first step would be to start developing good, reliable, open-source extensions to Lucene which strive to solve some of these issues. End rant.

- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Friday, January 28, 2005 12:40 PM
Subject: Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

I discuss this with myself a lot inside my head... :) Seriously, I agree with Erik. I think this is a business opportunity. How many people are hating me now and going "shh"? Raise your hands!

Otis
Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there
As they say, nothing lasts forever ;) I like the idea. If a project like this gets going, I think I'd be interested in helping.

The Google mini looks very well done (they have two demos on the web page). For $5000, it's probably a very good solution for many businesses. If the demos are accurate, it seems like you almost literally plug it in, configure a few things using the web interface, and you're in business. Demos are at http://www.google.com/enterprise/mini/product_tours_demos.html

-chris

On Thu, 27 Jan 2005 17:40:53 -0800 (PST), Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> I discuss this with myself a lot inside my head... :) Seriously, I agree with Erik. I think this is a business opportunity. How many people are hating me now and going "shh"? Raise your hands!
>
> Otis
Re: Reloading an index
: processes ended. If you're under linux, try running the 'lsof'
: command to see if there are any handles to files marked "(deleted)".

: > Searcher, the old Searcher is closed and nulled, but I
: > still see about twice the amount of memory in use well
: > after the original searcher has been closed. Is
: > there something else I can do to get this memory
: > reclaimed? Should I explicitly call garbage
: > collection? Any ideas?

In addition to the previous advice, keep in mind that, depending on the implementation of your JVM, it may never actually "free" memory back to the OS. And even the JVMs that can will only do so after a GC which results in a ratio of unused/used memory that they deem worthy of freeing (usually based on tuning parameters).

Assuming you are using a Sun JVM, take a look at http://java.sun.com/docs/hotspot/gc1.4.2/index.html and search for MinHeapFreeRatio and MaxHeapFreeRatio.

-Hoss
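On a Sun 1.4 JVM those knobs are set on the command line, for example (the values and the MySearchApp class name here are made up; tune for your own heap and load):

    java -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -Xmx512m MySearchApp

A lower MaxHeapFreeRatio makes the JVM return free heap to the OS more eagerly after a GC, at the cost of more frequent heap resizing.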
Re: google mini? who needs it when Lucene is there
500 times the original data? Not true! :)

Otis

--- "Xiaohong Yang (Sharon)" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Does anyone know Google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't have one installed, so I cannot measure.
>
> Best,
>
> Sharon
RE: Disk space used by optimize
Have you tried using the multifile index format? Now I wonder if there is actually a difference in the disk space consumed by optimize() between the multifile and compound index formats...

Otis

--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
> Our copy of LIA is "in the mail" ;)
>
> Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes).
>
> --Leto
Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there
I discuss this with myself a lot inside my head... :) Seriously, I agree with Erik. I think this is a business opportunity. How many people are hating me now and going "shh"? Raise your hands!

Otis

--- David Spencer <[EMAIL PROTECTED]> wrote:
> This reminds me, has anyone ever discussed something similar:
>
> - rackmount server (or, for coolness factor, that Mac mini)
> - web i/f for config/control
> - of course the server would have the following s/w:
> -- web server
> -- lucene / nutch
>
> Part of the work here I think is having a decent web i/f to configure the thing and to customize the L&F of the search results.
Re: Reloading an index
I just ran into a similar issue. When you close an IndexSearcher, it doesn't necessarily close the underlying IndexReader. It depends which constructor you used to create the IndexSearcher. See the constructors' javadocs or the source for the details.

In my case, we were updating and optimizing the index from another process and reopening IndexSearchers. We would eventually run out of disk space, because open file handles to deleted files were being left around, so the disk space was never made available until the JVM processes ended. If you're under linux, try running the 'lsof' command to see if there are any handles to files marked "(deleted)".

-Chris

On Thu, 27 Jan 2005 08:28:30 -0800 (PST), Greg Gershman <[EMAIL PROTECTED]> wrote:
> I have an index that is frequently updated. When indexing is completed, an event triggers a new Searcher to be opened. When the new Searcher is opened, incoming searches are redirected to the new Searcher, the old Searcher is closed and nulled, but I still see about twice the amount of memory in use well after the original searcher has been closed. Is there something else I can do to get this memory reclaimed? Should I explicitly call garbage collection? Any ideas?
>
> Thanks.
>
> Greg Gershman
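A sketch of the constructor difference Chris describes, per the Lucene 1.4 javadocs (the paths are made up):

    IndexSearcher a = new IndexSearcher("/path/to/index"); // opens its own IndexReader
    a.close(); // also closes the reader it opened internally

    IndexReader reader = IndexReader.open("/path/to/index");
    IndexSearcher b = new IndexSearcher(reader); // wraps the caller's reader
    b.close();      // does NOT close 'reader'
    reader.close(); // must be closed explicitly, or its files stay open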
Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there
I've often said that there is a business to be had in packaging up Lucene (and now Nutch) into a cute little box with user-friendly management software to search your intranet. SearchBlox is already there (except they don't include the box).

I really hope that an application like SearchBlox/Zilverline can be created as part of the Lucene project itself, replacing the sad demos that currently ship with Lucene. I've got so many things on my plate that I don't foresee myself getting to this as soon as I'd like, but I would most definitely support and contribute what time I could to such an effort. If the web UI used Tapestry, I'd be very inclined to dig in hardcore; any other web UI technology would likely turn me off. One of these days I'll Tapestry-ify Nutch just for grins and submit it as a replacement for the JSPs. And I'm even more sold on it if Mac minis are involved! :)

Erik

On Jan 27, 2005, at 7:16 PM, David Spencer wrote:

This reminds me, has anyone ever discussed something similar:

- rackmount server (or, for coolness factor, that Mac mini)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch

Part of the work here I think is having a decent web i/f to configure the thing and to customize the L&F of the search results.
Re: text highlighting
Thanks for your reply. I use QueryParser instead of TermQuery, and everything works well now! Thanks.

Youngho

- Original Message -
From: "mark harwood" <[EMAIL PROTECTED]>
Sent: Thursday, January 27, 2005 7:05 PM
Subject: Re: text highlighting

> >> sometimes the returned String is empty.
> >> Is the code analyzer-dependent?
>
> When highlighter.getBestFragments returns nothing, it is because no match was found for the query terms in the TokenStream supplied. This is nearly always because of Analyzer issues. Check the post-analysis tokens produced for the query and the tokens produced in the TokenStream passed to the highlighter. The highlighter simply looks for matches in the two sources of terms and uses the token offsets to select the best sections of the supplied text.
>
> Cheers
> Mark
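For reference, a minimal sketch of that working combination - QueryParser plus the sandbox Highlighter - where the field name, the query string and the String variable 'text' are all made up, and the same analyzer is used both for parsing and for the TokenStream, as Mark advises:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;

    Analyzer analyzer = new StandardAnalyzer();
    Query query = QueryParser.parse("lucene", "contents", analyzer);
    Highlighter highlighter = new Highlighter(new QueryScorer(query));
    String fragment = highlighter.getBestFragment(
            analyzer.tokenStream("contents", new StringReader(text)), text);

If the tokens produced on both sides don't line up (different analyzers, stemming, etc.), getBestFragment returns null - the symptom described above.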
Re: LuceneReader.delete (term t) Failure ?
How did you index the "uid" field? Field.Keyword? If not, that may be the problem in that the field was analyzed. For a key field like this, it needs to be unanalyzed/untokenized. Erik On Jan 27, 2005, at 6:21 PM, <[EMAIL PROTECTED]> wrote: Hi, I am trying to delete a document from Lucene index using: Term aTerm = new Term( "uid", path ); aReader.delete( aTerm ); aReader.close(); If the variable path="xxx/foo.txt" then I am able to delete the document. However, if path variable has "-" in the string, the delete method does not work e.g. path="xxx-yyy/foo.txt" // Does Not work!! Can I get around this problem. I cannot subsitute minus character with '.' as it has other implications. is this a bug ? I am using Lucene 1.4-final version. Thanks for the help Atul - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: google mini? who needs it when Lucene is there
I disagree. Most small companies don't have an IT staff capable of implementing a custom search engine using Lucene for less than $5,000. Nutch might make this possible, but compared to a plug-in-and-go solution like the Google mini, it still would probably cost a significant amount of money.

Getting Lucene/Nutch to the point where it is possible to easily install it on a computer and administrate its settings in a user-friendly way is a great goal, though.

Regards,
Luke Francl

From: jian chen [mailto:[EMAIL PROTECTED]]
Sent: Thu 1/27/2005 5:44 PM
To: Lucene Users List
Subject: google mini? who needs it when Lucene is there

> It seems to me that any small biz will be ripped off if they install this google mini thing, compared to using Lucene to implement an easy-to-use search software, which could search up to whatever number of documents you could imagine.
Re: google mini? who needs it when Lucene is there
I think the Google mini also includes crawling and a server wrapper, so it is not entirely a 1-to-1 comparison. Of course, extending Lucene to have those features is not at all difficult anyway.

-John

On Thu, 27 Jan 2005 16:04:54 -0800 (PST), Xiaohong Yang (Sharon) <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Does anyone know Google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't have one installed, so I cannot measure.
>
> Best,
>
> Sharon
rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there
This reminds me, has anyone ever discussed something similar:

- rackmount server (or, for coolness factor, that Mac mini)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch

Part of the work here I think is having a decent web i/f to configure the thing and to customize the L&F of the search results.

jian chen wrote:
> Hi,
>
> I was searching using Google and just found that there was a new feature called "google mini". Initially I thought it was another free service for small companies. Then I realized that it costs quite some money ($4,995) for the hardware and software. (I guess the proprietary software costs a whole lot more than the actual hardware.)
>
> The "nice" feature is that you can only index up to 50,000 documents at this price. If you need to index more, sorry, send in the check...
>
> It seems to me that any small biz will be ripped off if they install this google mini thing, compared to using Lucene to implement an easy-to-use search software, which could search up to whatever number of documents you could imagine.
>
> I hope the Lucene project gets exposed more to the enterprise, so that people know that they have not only cheaper but, more importantly, BETTER alternatives.
>
> Jian
RE: Disk space used by optimize
Our copy of LIA is "in the mail" ;) Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes). --Leto > -Original Message- > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] > > Hello, > > Yes, that is how optimize works - copies all existing index > segments into one unified index segment, thus optimizing it. > > see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space > > However, three times the space sounds a bit too much, or I > make a mistake in the book. :) > > You said you end up with 3 files - .cfs is one of them, right? > > Otis > > > --- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote: > > > > > Just a quick question: after writing an index and then calling > > optimize(), is it normal for the index to expand to about > three times > > the size before finally compressing? > > > > In our case the optimise grinds the disk, expanding the index into > > many files of about 145MB total, before compressing down to three > > files of about 47MB total. That must be a lot of disk activity for > > the people with multi-gigabyte indexes! > > > > Regards, > > Leto CONFIDENTIALITY NOTICE AND DISCLAIMER Information in this transmission is intended only for the person(s) to whom it is addressed and may contain privileged and/or confidential information. If you are not the intended recipient, any disclosure, copying or dissemination of the information is unauthorised and you should delete/destroy all copies and notify the sender. No liability is accepted for any unauthorised use of the information contained in this transmission. This disclaimer has been automatically added. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: google mini? who needs it when Lucene is there
Hi,

I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Does anyone know Google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't have one installed, so I cannot measure.

Best,

Sharon

jian chen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I was searching using Google and just found that there was a new feature called "google mini". Initially I thought it was another free service for small companies. Then I realized that it costs quite some money ($4,995) for the hardware and software. (I guess the proprietary software costs a whole lot more than the actual hardware.)
>
> The "nice" feature is that you can only index up to 50,000 documents at this price. If you need to index more, sorry, send in the check...
>
> It seems to me that any small biz will be ripped off if they install this google mini thing, compared to using Lucene to implement an easy-to-use search software, which could search up to whatever number of documents you could imagine.
>
> I hope the Lucene project gets exposed more to the enterprise, so that people know that they have not only cheaper but, more importantly, BETTER alternatives.
>
> Jian
Re: Disk space used by optimize
Hello,

Yes, that is how optimize works - it copies all existing index segments into one unified index segment, thus optimizing it.

See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space

However, three times the space sounds a bit too much, or I made a mistake in the book. :)

You said you end up with 3 files - .cfs is one of them, right?

Otis

--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
> Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing?
>
> In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down to three files of about 47MB total. That must be a lot of disk activity for the people with multi-gigabyte indexes!
>
> Regards,
> Leto
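For reference, the calls involved look like this (the path and analyzer are made up); the temporary growth happens inside optimize(), after the merged segment is written and before the old segment files are deleted:

    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.setUseCompoundFile(true); // compound format: yields the single .cfs file
    writer.optimize();               // needs extra disk space while segments merge
    writer.close();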
google mini? who needs it when Lucene is there
Hi,

I was searching using Google and just found that there was a new feature called "google mini". Initially I thought it was another free service for small companies. Then I realized that it costs quite some money ($4,995) for the hardware and software. (I guess the proprietary software costs a whole lot more than the actual hardware.)

The "nice" feature is that you can only index up to 50,000 documents at this price. If you need to index more, sorry, send in the check...

It seems to me that any small biz will be ripped off if they install this google mini thing, compared to using Lucene to implement an easy-to-use search software, which could search up to whatever number of documents you could imagine.

I hope the Lucene project gets exposed more to the enterprise, so that people know that they have not only cheaper but, more importantly, BETTER alternatives.

Jian
LuceneReader.delete (term t) Failure ?
Hi,

I am trying to delete a document from a Lucene index using:

Term aTerm = new Term( "uid", path );
aReader.delete( aTerm );
aReader.close();

If the variable path="xxx/foo.txt" then I am able to delete the document. However, if the path variable has "-" in the string, the delete method does not work, e.g.:

path="xxx-yyy/foo.txt" // Does not work!!

Can I get around this problem? I cannot substitute the minus character with '.' as it has other implications. Is this a bug? I am using the Lucene 1.4-final version.

Thanks for the help
Atul
Disk space used by optimize
Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing?

In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down to three files of about 47MB total. That must be a lot of disk activity for the people with multi-gigabyte indexes!

Regards,
Leto
Re: query term frequency
Jonathan Lasko wrote:
> What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc...
>
> Thanks.
>
> Jonathan

Do you mean the # of docs that have a term?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
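Usage is a single call (the field and term values are made up):

    IndexReader reader = IndexReader.open("/path/to/index");
    int docFreq = reader.docFreq(new Term("contents", "lucene")); // # of docs containing the term
    reader.close();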
query term frequency
What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc...

Thanks.

Jonathan
Re: Sort Performance Problems across large dataset
Peter Hollas wrote:
> Currently we can issue a simple search query and expect a response back in about 0.2 seconds (~3,000 results) with the Lucene index that we have built. Lucene gives a much more predictable and faster average query time than using standard fulltext indexing with mySQL. This however returns results in score order, and not alphabetically. To sort the result set into alphabetical order, we added the species names as a separate keyword field, and sorted using it whilst querying.
>
> This solution works fine, but is unacceptable since a query that returns thousands of results can take upwards of 30 seconds to sort them.

Are you using a Lucene Sort? If you reuse the same IndexReader (or IndexSearcher) then perhaps the first query specifying a Sort will take 30 seconds (although that's much slower than I'd expect), but subsequent searches that sort on the same field should be nearly as fast as results sorted by score.

Doug
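A sketch of what Doug is suggesting (the field name and path are made up); the point is that the searcher is opened once and kept around, so the sort cache built on the first sorted query serves all later ones:

    IndexSearcher searcher = new IndexSearcher("/path/to/index"); // open once, reuse
    Sort byName = new Sort(new SortField("species", SortField.STRING));
    Hits hits = searcher.search(query, byName); // first sorted search pays the cache cost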
Re: Opening up one large index takes 940M or memory?
Kevin A. Burton wrote:
> Is there any way to reduce this footprint? The index is fully optimized... I'm willing to take a performance hit if necessary. Is this documented anywhere?

You can increase TermInfosWriter.indexInterval. You'll need to re-write the .tii file for this to take effect. The simplest way to do this is to use IndexWriter.addIndexes(), adding your index to a new, empty directory. This will of course take a while for a 60GB index...

Doubling TermInfosWriter.indexInterval should halve the Term memory usage and double the time required to look up terms in the dictionary. With an index this large the latter is probably not an issue, since processing term frequency and proximity data probably overwhelmingly dominates search performance.

Perhaps we should make this public by adding an IndexWriter method?

Also, you can list the size of your .tii file by using the main() from CompoundFileReader.

Doug
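A sketch of the rewrite step (paths and analyzer are made up; since TermInfosWriter.indexInterval is not public in 1.4, as Doug notes, raising it means changing the source and rebuilding first):

    // with the modified Lucene on the classpath, rewrite into a fresh directory
    IndexWriter writer = new IndexWriter("/new/index", new StandardAnalyzer(), true);
    writer.addIndexes(new Directory[] {
        FSDirectory.getDirectory("/old/index", false)
    });
    writer.close(); // addIndexes() optimizes, so the new .tii spacing takes effect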
Re: XML index
Hello Karl,

Grab the source code for Lucene in Action - it's got code that parses and indexes XML with DOM and SAX. You can see the coverage of that stuff here: http://lucenebook.com/search?query=indexing+XML+section%3A7*

I haven't used kXML, but I imagine the LIA code should get you going quickly, and you are free to adapt it to work with kXML.

Otis

--- Karl Koch <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I want to use kXML with Lucene to index XML files. I think it is possible to dynamically assign node names as Document fields and node texts as Text (after using an Analyser).
>
> I have seen some XML indexing in the Sandbox. Is anybody here who has done something with a thin pull parser (perhaps even kXML)? Does anybody know of a project or some source code available which covers this topic?
>
> Karl
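For flavor, here is the naive element-to-field mapping Karl describes, sketched as a SAX handler (the class name is made up, and nested elements are ignored; porting the same event loop to kXML's pull API should be mostly mechanical):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class XmlDocHandler extends DefaultHandler {
        private final Document doc = new Document();
        private final StringBuffer text = new StringBuffer();
        private String element;

        public void startElement(String uri, String local, String qName, Attributes atts) {
            element = qName;   // the element name becomes the field name
            text.setLength(0);
        }

        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        public void endElement(String uri, String local, String qName) {
            if (element != null && text.length() > 0) {
                doc.add(Field.Text(element, text.toString())); // analyzed and stored
            }
            element = null;
        }

        public Document getDocument() { return doc; }
    }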
RE: Index Layout Question
That's good to know. I'm indexing on 11 fields (9 keyword, 2 text). The documents themselves are between 1K and 2K in size. Is there a point at which IndexSearcher performance begins to fall off (in terms of the number of index records)?

Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS 66219
(913) 577-1496
[EMAIL PROTECTED]

-----Original Message-----
From: Ian Soboroff [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 27, 2005 10:31 AM
To: Lucene Users List
Subject: Re: Index Layout Question

> Depends on your search infrastructure. Doug Cutting has sent out some basic optimization guidelines on this list which should be in the archives... simply, you need to think about how many CPUs and spindles are involved. 1.5m documents isn't a challenge for Lucene to index or search on a single machine with a monolithic index. I indexed about 1.6m web pages in 22 hours on a single machine with all data local, and search with a single IndexSearcher was instantaneous. We've also done some testing with a larger collection (25m pages) and ParallelMultiSearchers on several machines, and likewise on a fast network haven't felt a slowdown, but we haven't actually benchmarked it.
>
> Ian
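For what it's worth, the per-month layout is searched like this (the index paths are made up); ParallelMultiSearcher takes the same array and queries the sub-indexes in separate threads:

    Searchable[] months = new Searchable[] {
        new IndexSearcher("/indexes/2004-12"),
        new IndexSearcher("/indexes/2005-01"),
    };
    MultiSearcher searcher = new MultiSearcher(months);
    Hits hits = searcher.search(query);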
XML index
Hi,

I want to use kXML with Lucene to index XML files. I think it is possible to dynamically assign node names as Document fields and node texts as Text (after using an Analyser).

I have seen some XML indexing in the Sandbox. Is anybody here who has done something with a thin pull parser (perhaps even kXML)? Does anybody know of a project or some source code available which covers this topic?

Karl
Re: Boosting Questions
Thanks Otis.

- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, January 27, 2005 12:11 PM
Subject: Re: Boosting Questions

> Luke,
>
> Boosting is only one of the factors involved in Document/Query scoring. Assuming that applying your boosts to Document A, or to a single field of Document A, increases the total score enough, then yes, Document A may have the highest score. But just because you boost a single Document and not others, it does not mean it will emerge at the top. You should check out the Explanation class, which can dump all scoring factors in text or HTML format.
>
> Otis
Re: Boosting Questions
Luke,

Boosting is only one of the factors involved in Document/Query scoring. Assuming that applying your boosts to Document A, or to a single field of Document A, increases the total score enough, then yes, Document A may have the highest score. But just because you boost a single Document and not others, it does not mean it will emerge at the top. You should check out the Explanation class, which can dump all scoring factors in text or HTML format.

Otis

--- Luke Shannon <[EMAIL PROTECTED]> wrote:
> Hi All;
>
> I just want to make sure I have the right idea about boosting.
>
> So if I boost a document (Document A) after I index it (let's say with a factor of 2.0), Lucene will now consider this document relatively more important than other documents in the index with a boost factor of less than 2.0. This boost factor will also be applied to all the fields in Document A. Therefore, if I do a TermQuery on a field that all my documents share ("title"), then in the returned Hits (assuming Document A was among the returned documents) Document A will score higher than other documents with a lower boost factor, because the "title" field in A would have been boosted along with all its other fields. Correct?
>
> Now if at indexing time I decided to boost a particular field, let's say "address" in Document A (a field which all documents have), the boost factor is only applied to the "address" field of Document A. Nothing else is boosted by this operation. This means that if a TermQuery on the "address" field returns Document A along with a collection of other documents, Document A will score higher than the others because of the boosting. Correct?
>
> Thanks,
>
> Luke
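Getting an Explanation is a one-liner against an IndexSearcher (the hit index is made up):

    Explanation exp = searcher.explain(query, hits.id(0));
    System.out.println(exp.toString()); // or exp.toHtml(), per Otis's note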
Boosting Questions
Hi All; I just want to make sure I have the right idea about boosting. So if I boost a document (Document A) when I index it (let's say with a boost of 2.0), Lucene will now consider this document relatively more important than other documents in the index with a boost factor less than 2.0. This boost factor will also be applied to all the fields in Document A. Therefore, if I do a TermQuery on a field that all my documents share ("title"), then in the returned Hits (assuming Document A was among the returned documents), Document A will score higher than other documents with a lower boost factor, because the "title" field in A would have been boosted along with all its other fields. Correct? Now if at indexing time I decide to boost a particular field, let's say "address" in Document A (this is a field which all documents have), the boost factor is only applied to the "address" field of Document A. Nothing else is boosted by this operation. This means that if a TermQuery on the "address" field returns Document A along with a collection of other documents, Document A will score higher than the others because of the boosting. Correct? Thanks, Luke
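For reference, a sketch of where the two kinds of boost are applied, assuming the Lucene 1.4-era Field factories; the path and field values are placeholders. Note that both boosts are set before addDocument() - they are baked in at index time, not applied afterwards:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BoostExample {
    public static void main(String[] args) throws IOException {
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), true);

        Document docA = new Document();
        docA.add(Field.Text("title", "annual report"));

        Field address = Field.Text("address", "100 Main Street");
        address.setBoost(3.0f);  // field-level boost: affects scores on "address" only
        docA.add(address);

        docA.setBoost(2.0f);     // document-level boost: folded into every field's boost
        writer.addDocument(docA); // boosts are recorded at index time
        writer.close();
    }
}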
RE: Reloading an index
Make sure that the old searcher is not referenced anywhere else; otherwise the garbage collector cannot delete it. Just remember that the garbage collector runs when memory is needed, not immediately after a reference is set to null. -Original Message- From: Greg Gershman [mailto:[EMAIL PROTECTED] Sent: Thursday, January 27, 2005 17:29 To: lucene-user@jakarta.apache.org Subject: Reloading an index
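A sketch of the swap Greg describes, under the assumption that no other thread still holds the old searcher when it is closed (in-flight searches would need reference counting on top of this):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;

public class SearcherHolder {
    private IndexSearcher current;
    private final String indexDir;

    public SearcherHolder(String indexDir) throws IOException {
        this.indexDir = indexDir;
        this.current = new IndexSearcher(indexDir);
    }

    public synchronized IndexSearcher getSearcher() {
        return current;
    }

    // call this after the index has been updated
    public synchronized void reopen() throws IOException {
        IndexSearcher old = current;
        current = new IndexSearcher(indexDir);
        old.close();
        // close() releases the index files; the heap itself is only reclaimed
        // at the next GC cycle, so seeing roughly double the memory for a
        // while after a swap is expected, not a leak
    }
}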
Re: Index Layout Question
"Jerry Jalenak" <[EMAIL PROTECTED]> writes: > I am in the process of indexing about 1.5 million documents, and have > started down the path of indexing these by month. Each month has between > 100,000 and 200,000 documents. From a performance standpoint, is this the > right approach? This allows me to use MultiSearcher (or > ParallelMultiSearcher), but I'm not sure if the performance gains are really > there. Would one monolithic index be better? Depends on your search infrastructure. Doug Cutting has sent out some basic optimization guidelines on this list which should be in the archives... simply, you need to think about how many CPUs and spindles are involved. 1.5m documents isn't a challenge for Lucene to index or search on a single machine with a monolithic index. I indexed about 1.6m web pages in 22 hours on a single machine with all data local, and search with a single IndexSearcher was instantaneous. We've also done some testing with a larger collection (25m pages) and ParallelMultiSearchers on several machines, and likewise on a fast network haven't felt a slowdown, but we haven't actually benchmarked it. Ian - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Reloading an index
I have an index that is frequently updated. When indexing is completed, an event triggers a new Searcher to be opened. When the new Searcher is opened, incoming searches are redirected to it, and the old Searcher is closed and nulled, but I still see about twice the amount of memory in use well after the original searcher has been closed. Is there something else I can do to get this memory reclaimed? Should I explicitly call garbage collection? Any ideas? Thanks. Greg Gershman
Index Layout Question
I am in the process of indexing about 1.5 million documents, and have started down the path of indexing these by month. Each month has between 100,000 and 200,000 documents. From a performance standpoint, is this the right approach? This allows me to use MultiSearcher (or ParallelMultiSearcher), but I'm not sure if the performance gains are really there. Would one monolithic index be better? Thanks. Jerry Jalenak Senior Programmer / Analyst, Web Publishing LabOne, Inc. 10101 Renner Blvd. Lenexa, KS 66219 (913) 577-1496 [EMAIL PROTECTED]
Re: Different Documents (with fields) in one index?
Nope, you don't need one index per set - it is very possible. We have an index that holds the search info for documents, messages in discussion threads, filled-in forms, etc., each having their own structure. cheers, Aad Karl Koch wrote: > Is it possible to have different kinds of Documents with different index fields in ONE index? Or do I need one index for each set?
Re: Different Documents (with fields) in one index?
Karl, This is completely fine. You can have documents with different fields in the same index. Otis
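A sketch of what that looks like in practice, with hypothetical field names; a query on a field only ever matches documents that actually contain that field:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class MixedIndex {
    public static void main(String[] args) throws IOException {
        IndexWriter writer =
            new IndexWriter("/path/to/index", new StandardAnalyzer(), true);

        // an article document with title/body fields
        Document article = new Document();
        article.add(Field.Text("title", "Q4 results"));
        article.add(Field.Text("body", "Revenue was up this quarter ..."));

        // a form document with a completely different field set
        Document form = new Document();
        form.add(Field.Keyword("formId", "F-17"));
        form.add(Field.Text("answers", "yes no yes"));

        writer.addDocument(article);
        writer.addDocument(form);
        writer.close();
        // a TermQuery on "formId" can only match the form document:
        // documents have no postings for fields they do not contain
    }
}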
Different Documents (with fields) in one index?
Hello all, perhaps not such a sophisticated question: I would like to have a very diverse set of documents in one index. Depending on the content of the text documents, I would like to put parts of the text into different fields. This means that in searches, when searching a particular field, some of those documents won't be addressed at all. Is it possible to have different kinds of Documents with different index fields in ONE index? Or do I need one index for each set? Karl
LuceneRAR nearing first release
https://lucenerar.dev.java.net LuceneRAR is now working, verified, on two containers: the J2EE 1.4 RI and Orion. WebSphere testing is underway, with JBoss to follow. LuceneRAR is a resource adapter for Lucene, allowing J2EE components to look up an entry in a JNDI tree and use that reference to add and search for documents. It's much like RemoteSearcher would be, except it uses JNDI semantics for communication instead of RMI, which is a little more elegant in a J2EE environment (where JNDI communication is very common). LuceneRAR was created to allow J2EE components to legitimately use filesystem indexes (for speed) while not violating J2EE's suggestion not to rely on filesystem access. It also allows distributed access to the index (remote servers simply establish a JNDI connection to the LuceneRAR home). Please take a look at it if you're interested; the feature set isn't complete, but it's workable. A sample application that allows creation, searches, and statistical data about the search is included in the distribution. Any comments are welcomed. --- Joseph B. Ottinger http://enigmastation.com IT Consultant [EMAIL PROTECTED]
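To illustrate only the generic J2EE side of this - the JNDI name below is hypothetical, and the actual binding is whatever the LuceneRAR deployment configures:

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class LookupExample {
    public static void main(String[] args) throws NamingException {
        Context ctx = new InitialContext();
        // hypothetical JNDI name; consult the LuceneRAR docs for the real binding
        Object connector = ctx.lookup("java:comp/env/ra/lucene");
        System.out.println("Got: " + connector.getClass().getName());
    }
}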
Re: text highlighting
>> sometimes the returned String is empty. >> Is the code analyzer-dependent? When highlighter.getBestFragments returns nothing, it is because no match was found for the query terms in the supplied TokenStream. This is nearly always due to Analyzer issues. Check the post-analysis tokens produced for the query against the tokens produced in the TokenStream passed to the highlighter. The highlighter simply looks for matches between the two sources of terms and uses the token offsets to select the best sections of the supplied text. Cheers Mark
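A small diagnostic along the lines Mark suggests - dump the tokens an analyzer actually produces, offsets included, and compare the query side against the highlighted-text side (Lucene 1.4-era TokenStream API assumed; swap in the analyzer under test):

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class DumpTokens {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(); // or new CJKAnalyzer(), etc.
        TokenStream ts =
            analyzer.tokenStream("h", new StringReader("... Family ..."));
        Token t;
        while ((t = ts.next()) != null) {
            // prints each post-analysis term with its character offsets
            System.out.println(t.termText()
                + " [" + t.startOffset() + "," + t.endOffset() + "]");
        }
    }
}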
Re: text highlighting
One more test result: if the text contains "... Family ...", then the query string "family" works OK. But if the query string is "Family", then the highlighter returns nothing. Thanks. Youngho - Original Message - From: "Youngho Cho" <[EMAIL PROTECTED]> To: "Lucene Users List" Cc: "Che Dong" <[EMAIL PROTECTED]> Sent: Thursday, January 27, 2005 6:10 PM Subject: Re: text highlighting > When I used the code with CJKAnalyzer to search English text, > sometimes the returned String is empty. > Is the code analyzer-dependent?
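That behaviour fits Mark's diagnosis: a TermQuery is never run through the analyzer, so the literal term "Family" is looked up as-is, while the analyzer presumably lowercased everything at index time. A sketch of the usual fix - let the same analyzer normalize the query string (QueryParser.parse here is the Lucene 1.4-era static method; whether CJKAnalyzer lowercases Latin text should be verified against its source):

import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class CaseMismatch {
    public static void main(String[] args) throws ParseException {
        // bypasses the analyzer: looks up the literal, capitalized term
        Query raw = new TermQuery(new Term("h", "Family"));

        // runs "Family" through the analyzer, matching the indexed form
        Query analyzed = QueryParser.parse("Family", "h", new CJKAnalyzer());

        System.out.println(raw + " vs. " + analyzed);
    }
}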
Re: text highlighting
Hello, When I used the code with CJKAnalyzer to search English text (the text is mixed Korean and English), sometimes the returned String is empty. Other cases work well. Is the code analyzer-dependent? Thanks. Youngho --- Test Code (just a copy of the book code) ---

private static final String HIGH_LIGHT_OPEN = "<span class=\"highlight\">";
private static final String HIGH_LIGHT_CLOSE = "</span>";

public static String highLight(String value, String queryString) throws IOException
{
    if (StringUtils.isEmpty(value) || StringUtils.isEmpty(queryString))
    {
        return value;
    }

    TermQuery query = new TermQuery(new Term("h", queryString));
    QueryScorer scorer = new QueryScorer(query);
    SimpleHTMLFormatter formatter =
        new SimpleHTMLFormatter(HIGH_LIGHT_OPEN, HIGH_LIGHT_CLOSE);
    Highlighter highlighter = new Highlighter(formatter, scorer);

    Fragmenter fragmenter = new SimpleFragmenter(50);
    highlighter.setTextFragmenter(fragmenter);

    TokenStream tokenStream = new CJKAnalyzer().tokenStream("h",
        new StringReader(value));

    return highlighter.getBestFragments(tokenStream, value, 5, "...");
}

- Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Thursday, January 27, 2005 8:37 AM Subject: Re: text highlighting > Also, there are some examples in the Lucene in Action source code (grab > it from http://www.lucenebook.com) (see HighlightIt.java). > > Erik > > On Jan 26, 2005, at 5:52 PM, markharw00d wrote: > > Michael Celona wrote: > >> Does anyone have a working example of the highlighter class found in the > >> sandbox? > > There are several in the accompanying JUnit test: > > http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/ > > contributions/highlighter/src/test/org/apache/lucene/search/highlight/ > > Cheers > > Mark
Re: Searching with words that contain % , / and the like
Without looking at the source, my guess is that StandardAnalyzer (and StandardTokenizer) is the culprit. The StandardAnalyzer grammar (in StandardTokenizer.jj) is probably defined so "x/y" parses into two tokens, "x" and "y". "s" is a default stopword (see StopAnalyzer.ENGLISH_STOP_WORDS), so it gets filtered out, while "p" does not. To get what you want, you can use a WhitespaceAnalyzer, write your own custom Analyzer or Tokenizer, or modify the StandardTokenizer.jj grammar to suit your needs. WhitespaceAnalyzer is much simpler than StandardAnalyzer, so you may see some other things being tokenized differently. -Chris On Thu, 27 Jan 2005 12:12:16 +0530, Robinson Raju <[EMAIL PROTECTED]> wrote: > Hi , > > Is there a way to search for words that contain "/" or "%" . > if my query is "test/s" , it is just taken as "test" > if my query is "test/p" , it is just taken as "test p" > has anyone done this / faced such an issue ? > > Regards > Robin
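A quick way to confirm Chris's guess is to run the same input through both analyzers and look at the tokens; the expected output in the comments follows his explanation, not a verified run:

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class CompareAnalyzers {
    public static void main(String[] args) throws IOException {
        dump(new StandardAnalyzer(), "test/s");   // expected: "test" ("s" is a stopword)
        dump(new WhitespaceAnalyzer(), "test/s"); // expected: "test/s", kept intact
    }

    private static void dump(Analyzer a, String text) throws IOException {
        TokenStream ts = a.tokenStream("value", new StringReader(text));
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
    }
}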
Re: Searching with words that contain % , / and the like
Hi Jason, yes, the documentation does mention escaping, but that's only for special characters used in queries, right? I've tried escaping too. To answer your question: I am sure it is not the HTTP request that is eating it up. Query query = MultiFieldQueryParser.parse("test/s", "value", analyzer); query ends up as "value:test". I am using StandardAnalyzer. On Thu, 27 Jan 2005 17:53:39 +1100, Jason Polites <[EMAIL PROTECTED]> wrote: > The Lucene documentation mentions escaping, but doesn't include the "/" char... > -- > Lucene supports escaping special characters that are part of the query > syntax. The current list of special characters is: > + - && || ! ( ) { } [ ] ^ " ~ * ? : \ > To escape these characters, use the \ before the character. For example, to > search for (1+1):2 use the query: > \(1\+1\)\:2 > -- > You could try escaping it anyway? > Are you sure it's not an HTTP request which is screwing with the parameter? > > - Original Message - > From: "Robinson Raju" <[EMAIL PROTECTED]> > To: "Lucene Users List" > Sent: Thursday, January 27, 2005 5:42 PM > Subject: Searching with words that contain % , / and the like > > > Is there a way to search for words that contain "/" or "%" . -- Regards, Robin