[
https://issues.apache.org/jira/browse/LUCENE-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141111#comment-13141111
]
Uwe Schindler commented on LUCENE-3490:
---------------------------------------
I removed CodecProvider again in revision 1195963.
The new code uses a simple loader mechanism that can load any Codec or
PostingsFormat that has a no-arg constructor. It relies on a new interface,
NamedSPI, that simply defines the getName() method.
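
To illustrate the idea, here is a minimal sketch of a name-based loader. It is
not the code from the revision above: the NamedSPI interface is reduced to the
one method mentioned, and NamedLoader, ExamplePostingsFormat, register(),
lookup() and availableNames() are hypothetical names made up for this example.
How implementation classes are discovered is left out, so they are registered
by hand here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the NamedSPI interface described above.
interface NamedSPI {
    String getName();
}

// Hypothetical implementation with a public no-arg constructor.
class ExamplePostingsFormat implements NamedSPI {
    public ExamplePostingsFormat() {}

    @Override
    public String getName() {
        return "Example";
    }
}

// Hypothetical loader: creates each class through its no-arg constructor
// and indexes the resulting instance by getName().
final class NamedLoader<S extends NamedSPI> {
    private final Map<String, S> byName = new LinkedHashMap<>();

    void register(Class<? extends S> clazz) {
        try {
            S instance = clazz.getDeclaredConstructor().newInstance();
            if (byName.putIfAbsent(instance.getName(), instance) != null) {
                throw new IllegalArgumentException(
                    "Duplicate name: " + instance.getName());
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                clazz.getName() + " needs an accessible no-arg constructor", e);
        }
    }

    S lookup(String name) {
        S instance = byName.get(name);
        if (instance == null) {
            throw new IllegalArgumentException(
                "Unknown name '" + name + "', available: " + byName.keySet());
        }
        return instance;
    }

    Set<String> availableNames() {
        return byName.keySet();
    }
}
```

With this shape, a caller only needs the name string to get an instance, which
is what makes a separate name-to-class registry unnecessary.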
> Restructure codec hierarchy
> ---------------------------
>
> Key: LUCENE-3490
> URL: https://issues.apache.org/jira/browse/LUCENE-3490
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3490_SPI.patch
>
>
> Spinoff of LUCENE-2621. (Hoping we can do some of the renaming etc here in a
> rote way to make progress).
> Currently Codec.java only represents a portion of the index, but there are
> other parts of the index (stored fields, term vectors, fieldinfos, ...) that
> we want under codec control. There is also some inconsistency about what a
> Codec is currently; for example, Memory and Pulsing are really just
> PostingsFormats that you might apply to a specific field. On the other hand,
> PreFlex actually is a Codec: it represents the Lucene 3.x index format (just
> not all parts yet). I imagine we would like SimpleText to be the same way.
> So, I propose restructuring the classes so that we have something like:
> * CodecProvider <-- codec name to Class resolution only
> * Codec <-- represents the index format (PostingsFormat + FieldsFormat + ...)
> * PostingsFormat: this is what Codec controls today, and Codec will return
> one of these for a field.
> * FieldsFormat: Stored Fields + Term Vectors + FieldInfos?
> I think for PreFlex, it doesn't make sense to expose its PostingsFormat as a
> 'public' class, because PreFlex can never be per-field, so there is no use in
> allowing you to configure PreFlex for a specific field.
> Similarly, I think we should do the same thing for SimpleText. Nobody needs
> SimpleText for production; it should just be a Codec where we try to make as
> much of the index as plain text and simple as possible for
> debugging/learning/etc. So we don't need to expose its PostingsFormat. On the
> other hand, I don't think we need Pulsing or Memory codecs, because it's
> pretty silly to make your entire index use one of their PostingsFormats. To
> parallel with analysis: PostingsFormat is like Tokenizer and Codec is like
> Analyzer, and we don't need Analyzers to "show off" every Tokenizer.
> Later, once we abstract FieldInfos reading/writing out of o.a.l.index into
> codec control, we can also then move the baked-in PerFieldCodecWrapper out
> (it would basically be PerFieldPostingsFormat). Privately it would write the
> ids to the file like it does today. All 3.x hairy backwards code would move
> to PreflexCodec. SimpleTextCodec would get a plain text fieldinfos impl, etc.
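
The class layout proposed in the description can be sketched roughly as
follows. This is only an illustration of the Codec / PostingsFormat /
FieldsFormat split and of a Codec returning a PostingsFormat per field. Every
type and method signature here is hypothetical (including PerFieldExampleCodec
and use()), and none of it is taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// All types below are hypothetical illustrations, not Lucene's API.
interface PostingsFormat {
    String getName();
}

interface FieldsFormat {
    String getName();
}

// A Codec represents the whole index format and hands out its parts.
abstract class Codec {
    abstract String getName();

    // The Codec decides which PostingsFormat a given field uses.
    abstract PostingsFormat postingsFormat(String field);

    abstract FieldsFormat fieldsFormat();
}

// A Codec that uses a default PostingsFormat and allows per-field overrides.
final class PerFieldExampleCodec extends Codec {
    private final PostingsFormat defaultPostings;
    private final FieldsFormat fields;
    private final Map<String, PostingsFormat> overrides = new HashMap<>();

    PerFieldExampleCodec(PostingsFormat defaultPostings, FieldsFormat fields) {
        this.defaultPostings = defaultPostings;
        this.fields = fields;
    }

    // Assign a specific PostingsFormat to one field.
    PerFieldExampleCodec use(String field, PostingsFormat format) {
        overrides.put(field, format);
        return this;
    }

    @Override
    String getName() {
        return "PerFieldExample";
    }

    @Override
    PostingsFormat postingsFormat(String field) {
        return overrides.getOrDefault(field, defaultPostings);
    }

    @Override
    FieldsFormat fieldsFormat() {
        return fields;
    }
}
```

In this sketch a format such as Pulsing would only ever appear as a
PostingsFormat assigned to selected fields, never as a Codec of its own, which
mirrors the Tokenizer/Analyzer analogy in the description.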