Marvin Humphrey wrote on 1/13/10 3:56 PM:
>> Of course, my next question is predictable: when will 0.30 be fully cooked?
> The goal has been to get the KS dev branch file format and API stable before
> displacing the stable branch. However, there are still difficult file format
> problems to solve, particularly with regard to term dictionaries and posting
> lists -- such as support for multi-stream posting formats and indexing of
> non-text field types.
Are the file format problems actually bugs in the current format, or features
you would like to see added? IMO there's a big difference.
I understand your long-time aversion to changing the index format between
releases, and how that can break existing indexes in the wild. I've been bitten
by that situation with other Perl projects from CPAN where ModuleA gets
installed as part of a regular sysadmin upgrade because it is pulled in as a
dependency by ModuleB, and then my code that depends on ModuleA suddenly stops
working. CPAN's versioning is not ideal in that regard.
However, there are already checks in Build.PL for incompatible index formats,
and KS is not likely to be pulled in blindly as a dependency. That is, if
someone is installing KS from CPAN, they are doing it intentionally, and Build.PL
will (or could be made to) help prevent them from shooting themselves in the foot.
Rebuilding an index is not the end of the world. We (and by we I mean search
developers) do it all the time, even with big doc corpora.
The perfect is the enemy of the good. I.e., if we wait for the perfect,
ultimate file format to be finished, a stable KS 0.30 release might never reach CPAN
before KS is made obsolete by Lucy. That would be too bad, I think, because there
are sooo many good improvements in the 0.30 branch, stable and trunk, that people
could be taking advantage of without having to install a dev release or keep up
with svn trunk.
Small, stable, incremental and frequent releases to CPAN. I've been converted to
that idea. I'm trying now to convince you, Marvin. How am I doing? :)
> I haven't really been working on those problems too much lately. After
> reaching our goals for index opening speed and integration of memory mapped
> sort caches last year, I could have gone back to that -- but instead, I've
> gone to work on Lucy. Lucy isn't that far off at this point: N months.
That's great. Really. And as one of your faithful commit list readers, I applaud
everything you've achieved so far. It's monumental. It's good.
I just worry about the perfect being the enemy of the good. It's something I've
struggled with myself wrt Swish3, which has been gestating about as long as KS has.
> There are some people who are using stable branch KS and who would be
> disrupted if we simply clobbered the stable branch by releasing the dev branch
> on top of it, e.g. the MojoMojo folks
> (<http://mojomojo.org/features#Searching>). I'm reluctant to do that, since
> we haven't reached our goals for file format and API stability. Yeah, they
> were warned by the "alpha" label, but KS has also been promising a level of
> stability which we have yet to deliver. A one-time painful switch might have
> been OK, but forcing them back into an ongoing dev cycle isn't.
It's only painful and disruptive if existing users install the newest KS. They
don't have to. And as above, we could come up with a reasonable system to help
them be very intentional about it. It could be as simple as changing the magic
version number in Build.PL (which is currently 0.20) whenever the index format
changes in some backwards-incompatible way.
> To avoid disrupting such users, we could take one of two paths:
> * Fork the current stable release under "KinoSearch0" and expect existing
>   users to switch.
> * Move the dev branch (svn trunk) under "KinoSearch2" and release it as an
>   alpha. (I lean towards this option because it sets a precedent for how I
>   think we'll need to handle versioning in Lucy.)
What about #3: stabilize svn trunk and release it as KS 0.30.
When there's another index compat change, release it as KS 0.40, etc.
> If we'd managed to launch Lucy by now, this question would be academic,
> because Lucy would have become the successor to the KS dev branch. And I've
> kind of been working on Lucy with that in mind.
> Lucy remains my main goal. From a marketing perspective, I'm not sure that
> it's ideal to launch "KinoSearch2" as an alpha, then deprecate it in favor of
> Lucy a few months later. And once Lucy is launched in earnest and people
> outside our small circle start contributing, KS will have to be deprecated
> because licensing issues will eventually prevent us from backporting some
> important chunk of Lucy code to KS.
Deprecating KS in the future is fine and good. Between now and then, though,
let's get 0.30 released.
> So that's why I've been kind of keeping my head down and working feverishly on
> Lucy. I figured we'd get Lucy out as an alpha, grow its user base by
> releasing Ruby and Python bindings, then harness the excitement from that to
> work on the difficult problems that have held back KS. Designing a pluggable
> indexing framework is hard; it's almost impossible without a large user base,
> since only a small subset of users will be in a situation where they can test
> drive the pluggability features and help us refine the API.
That makes total sense to me. Lucy is the future. And its viability to date
depends upon ideas worked out in the past years in KS. I expect that kind of
cross-fertilization to continue as long as the IP issues remain compatible. And
I expect to continue to help as I can.
>> And, how can I help?
> You and Nate have been very helpful with regard to code and API review. If
> we go down the current path towards Lucy, I'd ask you to continue exploring
> new areas and providing feedback about how it went.
Will do.
> If we decide to make a formal CPAN release of the dev branch somehow, there will
> be some mechanical work to do. If you wanted to do that, you could -- but I'm
> under the impression that you don't have that much time (compared with the 60
> or so hours I've been putting in each week) and I don't want to squander a
> limited resource.
I wouldn't call it squandering. I would call it sharing. :)
Regarding time, I have some at the moment because I want to use KS at $work.
> If I could go back in time, I would have released the KS 0.20 branch under the
> namespace "KinoSearch2" and the 0.30 branch under "KinoSearch3". Maybe that
> points the way forward. Whatever we do, though, I'm determined not to let
> progress towards Lucy flag again.
I agree. Lucy should remain your primary concern.
How can I help move toward a KS 0.30 release?
--
Peter Karman . http://peknet.com/ . [email protected]