Doug Cutting wrote:
Michael,
This sounds like very good work. The back-compatibility of this
approach is great. But we should also consider this in the broader
context of index-format flexibility.
Three general approaches have been proposed. They are not exclusive.
1. Make the index format extensible by adding user-implementable
reader and writer interfaces for postings.
2. Add a richer set of standard index formats, including things like
compressed fields, no-positions, per-position weights, etc.
3. Provide hooks for including arbitrary binary data.
Your proposal is of type (3). LUCENE-662 is a (1). Approaches of
type (2) are most friendly to non-Java implementations, since the
semantics of each variation are well-defined.
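To make the distinction concrete, here is a minimal sketch of what a type (1) extension point might look like, assuming a pair of user-implementable interfaces. Every name in it (Posting, PostingWriter, PostingReader, PositionsFormat, NoPositionsFormat, PostingFormats) is hypothetical and invented for illustration; none of it is taken from Lucene or from LUCENE-662, and the fixed-width int encoding is chosen only for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical posting record: one document and the term's positions in it.
final class Posting {
    final int docId;
    final int[] positions;

    Posting(int docId, int[] positions) {
        this.docId = docId;
        this.positions = positions;
    }
}

// Hypothetical type (1) extension points: users supply their own encoding.
interface PostingWriter {
    void write(Posting posting, DataOutputStream out) throws IOException;
}

interface PostingReader {
    Posting read(DataInputStream in) throws IOException;
}

// One possible variation: doc id followed by a position count and positions.
final class PositionsFormat implements PostingWriter, PostingReader {
    public void write(Posting posting, DataOutputStream out) throws IOException {
        out.writeInt(posting.docId);
        out.writeInt(posting.positions.length);
        for (int position : posting.positions) {
            out.writeInt(position);
        }
    }

    public Posting read(DataInputStream in) throws IOException {
        int docId = in.readInt();
        int[] positions = new int[in.readInt()];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = in.readInt();
        }
        return new Posting(docId, positions);
    }
}

// Another variation ("no-positions"): only the doc id is stored.
final class NoPositionsFormat implements PostingWriter, PostingReader {
    public void write(Posting posting, DataOutputStream out) throws IOException {
        out.writeInt(posting.docId);
    }

    public Posting read(DataInputStream in) throws IOException {
        return new Posting(in.readInt(), new int[0]);
    }
}

// Helpers that round-trip a posting through a byte array.
final class PostingFormats {
    static byte[] encode(PostingWriter writer, Posting posting) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writer.write(posting, out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Posting decode(PostingReader reader, byte[] data) {
        try {
            return reader.read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a type (2) approach would correspond to shipping a fixed set of such formats with well-defined semantics, while type (1) would let users plug in their own implementations of the two interfaces.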
I don't see a reason not to pursue all three, but in a coordinated
manner. In particular, we don't want to add a feature of type (3)
that would make it harder to add type (1) APIs. It would thus be best
if we had a rough specification of type (1) and type (2). A proposal
of type (2) is at:
http://wiki.apache.org/jakarta-lucene/FlexibleIndexing
But I'm not sure that we yet have any proposed designs for an
extensible posting API. (Is anyone aware of one?) This payload
proposal can probably be easily incorporated into such a design, but I
would have more confidence if we had one. I guess I should attempt one!

Doug,
thanks for your detailed response. I'm aware that the long-term goal is
the flexible index format and I see the payloads patch only as a part of
it. The patch focuses on extending the index data structures and on a
possible payload encoding. It doesn't focus yet on a flexible API; it
only offers the two mentioned low-level methods to add and retrieve byte
arrays.
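As a rough illustration of the idea of attaching a byte array to each term position, here is a small in-memory sketch. It is not the API of the patch: the class PayloadPostings and its methods add and payloadAt are hypothetical names, and the sketch says nothing about how payloads are actually encoded in the index files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the positions of one term in one document,
// each carrying an optional opaque payload (a byte array).
final class PayloadPostings {
    private final List<Integer> positions = new ArrayList<>();
    private final List<byte[]> payloads = new ArrayList<>();

    // Record a position together with its payload; null means "no payload"
    // and is stored as an empty array.
    void add(int position, byte[] payload) {
        positions.add(position);
        payloads.add(payload == null ? new byte[0] : payload.clone());
    }

    // Return a copy of the payload stored at the given position,
    // or null if the term does not occur at that position.
    byte[] payloadAt(int position) {
        int index = positions.indexOf(position);
        return index < 0 ? null : payloads.get(index).clone();
    }

    // Number of positions recorded so far.
    int size() {
        return positions.size();
    }
}
```

A caller would add a position with, say, a two-byte payload and later read the same bytes back through payloadAt; positions added without a payload yield an empty array.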
I would love to work with you guys on the flexible index format and to
combine my patch with your suggestions and the patch from Nicolas! I
will look at your proposal and Nicolas' patch tomorrow (have to go now).
I just attached my patch (LUCENE-755), so if you get a chance you could
take a look at it.
Maybe it would make sense now to follow the suggestion you made earlier
this year and start a new package to work on the new index format? On
the other hand, if people would like to use the payloads soon, I guess
that, thanks to the backwards compatibility, it would be low risk to add
them to the current index format to provide this feature until we can
finish the flexible format.
- Michael