Re: Lucene in Action e-book now available!

Erik Hatcher Fri, 10 Dec 2004 16:33:32 -0800

Jonathan,

On Dec 10, 2004, at 5:29 PM, Jonathan Hager wrote:

Congratulations on the book.  I ordered my copy the other day via
regular post and am eagerly awaiting it.  It looks like it will make
Lucene available to a much wider audience.

Thank you. Let me know off-line when you receive your copy, so I can get an idea of when it actually ships.

Based on the table of contents, I wanted to toss out a couple of ideas
for your next book or articles.


Suggestions more than welcome!

1. I didn't see any examples of indexing a database table, although
it was mentioned in Chapter 1.  At the company I am currently
consulting at, we index the data from the database because it's cleaner
than indexing the web.

It is true that LIA does not cover any details of indexing a database. The idea is that Lucene indexes text, and how you get text into Lucene is not usually generalizable - or at least not worth the complexity of trying to do so.

The jGuru case study makes some strong points about indexing your content rather than your website. I've found that words of wisdom often are more potent than specific technical details that do not generalize.

  This discussion should include why you would
want to use Lucene to index a database table, rather than just using
the database indexes.  (The top reasons we chose to use Lucene
instead of just database indexes are: it allows stem word recognition;
it allows fuzzy searching; it ranks the results based on how good the
match is; it contains a parser that will parse natural language
queries; it has better Analyzers.)

These are all great points. I will add these to my notes for future editions.

2. This one is a cookbook idea: I think it would be possible to index
the access log of a web server.  Then when a user views product X, the
searcher could search for other products that were viewed by people
who also looked at product X.  In this way you can create basic
"cross-selling" opportunities.  This feature is a big seller to
managers for commercial search offerings.

While this is not exactly what you mention, I did write a section on the new term vector feature, which can be used for finding similar documents.
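
To make the access-log idea concrete, here is a rough sketch in plain Java. It is not from the book and does not use Lucene or term vectors at all: it assumes the log has already been parsed into a map from visitor to the set of products that visitor viewed, and the names CoViewed and alsoViewed are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoViewed {
    /**
     * Returns up to 'limit' products most often viewed by visitors who also
     * viewed 'product', most frequent first (ties broken alphabetically).
     * 'viewsByVisitor' is an assumed, already-parsed form of the access log.
     */
    public static List<String> alsoViewed(Map<String, Set<String>> viewsByVisitor,
                                          String product, int limit) {
        // Count how many visitors saw each other product alongside 'product'.
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> viewed : viewsByVisitor.values()) {
            if (!viewed.contains(product)) {
                continue; // this visitor never looked at the product
            }
            for (String other : viewed) {
                if (!other.equals(product)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        // Rank by co-view count, descending, then by name for stable output.
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.max(0, Math.min(limit, ranked.size()))));
    }
}
```

In a Lucene-based version, one plausible arrangement would be to index each visitor's session as a document and search for sessions containing product X, but that is an assumption on my part rather than something worked out here.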

3. A lot of search applications being built using Lucene are web
applications.  I didn't see any reference to the two different
strategies for paging a hit list.  The two strategies are repeating
the search and caching a search.  An example of this would be good.
[I know that I have seen this online, it's just nice to have a
reference in book form]

The Searching chapter has a brief section discussing this topic, with the advice that re-searching is the best first approach.
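
For what it's worth, here is a minimal sketch of the re-searching approach in plain Java. It is not taken from the book and does not use the Lucene API: the class name RepeatSearchPaging, the page method, and the Supplier standing in for "run the query again" are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class RepeatSearchPaging {
    /**
     * Re-runs the search on every call and returns only the requested page.
     * 'search' is a hypothetical stand-in for executing the user's query;
     * 'pageNumber' is zero-based.
     */
    public static <T> List<T> page(Supplier<List<T>> search, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        // Repeat the search: nothing is kept between requests.
        List<T> hits = search.get();
        long start = (long) pageNumber * pageSize; // long avoids int overflow
        if (start >= hits.size()) {
            return Collections.emptyList(); // requested page is past the last hit
        }
        int from = (int) start;
        int to = Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }
}
```

The caching strategy would instead hold on to the hits (for example in the user's session) and slice the stored list, trading memory and staleness for not having to repeat the query.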

Please don't take this as criticism.


No worries.  Your points are all very valid.

To address the inevitable follow-up, I am setting up a blog site where we'll post items that need further clarification, correction, or completely new things that we did not cover. This will take me a few weeks to get time to set up properly. I will announce it here when it is available.

I look forward to reading the book and appreciate your 14+ months of
hard work to create a concise but valuable book for Lucene.


Thanks again for your feedback.

        Erik

