On Apr 17, 2008, at 3:00 PM, John Wang wrote:

What is the current progress on Lucy?

For various reasons, Dave Balmain has been largely unavailable over the last year. Without my primary co-conspirator, and without a user base, there weren't a lot of people to bounce ideas off of, so I chose to go where the action was -- back to the KinoSearch user base.

However, when I went back, I took the designs and concepts that Dave and I had hashed out for Lucy and worked them into KS. So now those designs have been aired out and tested over a bunch of KS devel releases, and I've managed to work through a number of problems we'd left unresolved. In the process I've accumulated a fair amount of material I can commit to the Lucy repo after a bit of cleanup, and for some reason three different emails arrived today inquiring about Lucy's status -- so I should probably get busy. My plan is to finish the next KS release before I get back to Lucy in earnest, though, so a formal Lucy release is not imminent.

Which version of Lucene index format is it up to?

We aren't at that point. In any case, the only thing which has given Lucy a shot at working with Lucene indexes is the very recent resolution of LUCENE-510 -- had the Lucene file format not changed, Lucy wouldn't have worked with Lucene files at all, except perhaps in a crippled compatibility mode.

Looking forward, file format compatibility may continue to be a bugaboo. The Lucene format spec document was written up as an afterthought rather than composed, and it is exceedingly difficult to implement unless you are able to do a close line-by-line port -- which you can't when your target is a dynamic language. For any port, establishing and maintaining compatibility with the spec is an expensive, fiddly time-suck, and that will continue to be the case so long as the file format keeps changing as rapidly as it has historically.

Indeed, my primary goal with the next KS release is to design and write up a formal file spec which, when compared with the current Lucene spec, is: shorter, simpler, more coherent, easier to implement, easier to extend, evolves more gracefully, uses human-readable metadata, and perhaps even lends itself to faster searching and indexing. To see some of what I'm up to, follow the discussion that Mike McCandless and I are having under the "Flexible indexing design" and the "Pooling of postings in DocumentsWriter" threads.

If I'm successful in that file spec design effort, perhaps I can persuade the Lucene community that it's in everyone's interest to adopt some of its major elements -- or even better, collaborate with other Lucene devs to improve on it. There's historical precedent for that best-case course of events in how Lucene's recent indexing speed improvements came about.

A saner file spec would make Lucy (and other ports) *much* less complicated to write and maintain. That would represent significant "lucy progress", and that's where a good fraction of my energies are being expended right now.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

