Various Ideas from ApacheCon

Grant Ingersoll Mon, 07 May 2007 15:26:21 -0700

Hey Gang,

Back from ApacheCon in Amsterdam, and thought I would give a bit of a report on a few things that were interesting related to Lucene.

First off, there was a very high level of interest in Lucene and Solr, which was great to see.

In doing a training and a talk, there were a few things that people seemed to ask about a fair amount.

1. Updates and how to do them. The whole delete/add thing just never sits well with newcomers. I want to throw out the idea of implementing something like the Layers functionality in photo editing tools like Photoshop (whereby the underlying image is not changed, but the layer adds/deletes/masks it). I wonder how complicated it would be to mark a document as being updated and then know that we have to look in an alternate place for information concerning that Field/Document, such as an "updates" file. I don't know the details of implementing it, but wanted to see if it makes any sense at all. Gut reaction is it would be slower for searching, but how much slower I'm not sure. It could potentially be faster for updating and could allow for per-field updates. Just an idea, feel free to shoot it full of holes. The other option might be to think about whether a flexible indexing implementation could be optimized for updates instead of searching. Optimization or merges could then bring the updates back into the fold.

2. How does Lucene search compare w/ using built-in DB search? Has anyone done a study comparing Lucene performance/quality to the likes of MySQL/Postgres/Oracle? A related question is always how to integrate the two.

3. Some questions on the use cases of ParallelReader. So, if anyone cares to contribute in that arena, please do so, since I haven't used it.

4. As much as we like to ignore file format issues (PDF, etc.), it is one of the big questions people have about using Lucene. Tika should help in this area, but still seems to be a little way off. Our website could help by giving more concrete advice on how to handle different file formats and maybe even some benchmarks on it. I think we can maintain Lucene's independence from these libraries while still giving advice on how to handle them. Maybe a best practices section on the wiki?

5. Distributed Searching - Code/demonstration to do search across several indexes on several machines would be useful.
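
As a starting point for item 5, here is a minimal sketch of only the last step of a distributed search: merging the per-index hit lists into one ranked list. It is plain Java, not Lucene API. The `ShardMerge` and `Hit` names are made up for illustration, the per-machine searches are assumed to have already run, and the scores are assumed to be comparable across indexes (in practice, term statistics differ per index, so raw scores may not be).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: merge hits returned by several indexes into one top-k list.
// Assumes each index has already been searched and that scores are comparable.
class ShardMerge {

    // One search hit, tagged with the index (shard) it came from.
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    // Collect all hits, sort by descending score, and keep at most k of them.
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

Each machine only needs to return its own top k hits for the merged top k to be correct, which keeps the amount of data shipped between machines small.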

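To make the layers idea from item 1 a bit more concrete, here is a rough sketch in plain Java. It is not Lucene code and says nothing about the inverted index; it only models stored field values. The `LayeredStore` class and its methods are hypothetical names, and the "updates file" is assumed to be a simple in-memory map. The base is never modified; lookups check a delete mask, then the update layer, then the base, and a merge folds the layers into a new base.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "layers" idea: an untouched base plus update and mask layers.
// Models stored field values only; this is not how Lucene stores or searches documents.
class LayeredStore {
    // Base layer: docId -> (field -> value). Never modified by updates or deletes.
    private final Map<Integer, Map<String, String>> base;
    // Update layer: per-field overrides for documents marked as updated.
    private final Map<Integer, Map<String, String>> updates = new HashMap<>();
    // Mask layer: documents hidden without touching the base.
    private final Set<Integer> deleted = new HashSet<>();

    LayeredStore(Map<Integer, Map<String, String>> base) {
        this.base = base;
    }

    // Record a per-field update in the overlay only.
    void updateField(int docId, String field, String value) {
        updates.computeIfAbsent(docId, id -> new HashMap<>()).put(field, value);
    }

    // Hide a document by masking it.
    void delete(int docId) {
        deleted.add(docId);
    }

    // Lookup order: mask, then update layer, then base. The extra checks are the read-time cost.
    String get(int docId, String field) {
        Map<String, String> doc = base.get(docId);
        if (doc == null || deleted.contains(docId)) {
            return null;
        }
        Map<String, String> overlay = updates.get(docId);
        if (overlay != null && overlay.containsKey(field)) {
            return overlay.get(field);
        }
        return doc.get(field);
    }

    // Fold the layers into a fresh base, as an optimize or merge step might.
    LayeredStore merge() {
        Map<Integer, Map<String, String>> merged = new HashMap<>();
        for (Map.Entry<Integer, Map<String, String>> e : base.entrySet()) {
            if (deleted.contains(e.getKey())) {
                continue;
            }
            Map<String, String> doc = new HashMap<>(e.getValue());
            Map<String, String> overlay = updates.get(e.getKey());
            if (overlay != null) {
                doc.putAll(overlay);
            }
            merged.put(e.getKey(), doc);
        }
        return new LayeredStore(merged);
    }
}
```

In this toy version an update is a single map write, while every read pays for up to two extra lookups until `merge()` is called, which mirrors the trade-off guessed at in item 1.
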
At any rate, just some random thoughts garnered from ApacheCon. All in all, a good conf. w/ lots of Lucene interest.


-Grant


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ



