[ 
https://issues.apache.org/jira/browse/LUCENE-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141001#comment-13141001
 ] 

Uwe Schindler commented on LUCENE-3490:
---------------------------------------

bq. remains to factor out all these little embedded codecs from tests and move 
them to test-framework, register them, etc.

You can also leave them inside the tests and add a META-INF directory there.

bq. Thanks for doing all this work Uwe... it's a really good step forward... if 
you feel like doing the simplification I am all for it

I think I will do it and require all codecs to have a no-arg constructor. The 
whole spi.* package/folder can then go away. This reduces the amount of code to 
20% of its current size and makes it easier for users to implement their own 
codecs: they just have to list the class names in META-INF. We will have two 
META-INF files, 
META-INF/services/o.a.l.index.codecs.Codec and 
META-INF/services/o.a.l.index.codecs.PostingsFormat, that list all classes. 
Done. No more magic needed.
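
To illustrate the idea, here is a minimal sketch of discovery through a 
META-INF/services file, assuming the lookup goes through the JDK's 
java.util.ServiceLoader (or something that reads the same files). The names 
SpiSketch, DemoFormat and DemoPulsingFormat are hypothetical stand-ins, not 
Lucene classes, and the services file is written to a temp directory at 
runtime only to keep the example self-contained; in a real jar it would be a 
static resource.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class SpiSketch {

    /** Hypothetical stand-in for an abstract codec/format base class. */
    public abstract static class DemoFormat {
        public abstract String getName();
    }

    /** Hypothetical provider: public, with a public no-arg constructor. */
    public static class DemoPulsingFormat extends DemoFormat {
        public DemoPulsingFormat() {}

        @Override
        public String getName() {
            return "DemoPulsing";
        }
    }

    /** Writes a services file, then lets ServiceLoader discover the provider. */
    public static List<String> discoverNames() {
        try {
            // Temp directory acting as a classpath root (left behind for simplicity).
            Path root = Files.createTempDirectory("spi-sketch");
            Path services = Files.createDirectories(
                root.resolve("META-INF").resolve("services"));

            // File name = binary name of the service type;
            // content = one provider class name per line.
            Files.write(
                services.resolve(DemoFormat.class.getName()),
                DemoPulsingFormat.class.getName().getBytes(StandardCharsets.UTF_8));

            // The extra loader only contributes the services file; the provider
            // class itself is resolved through the parent class loader.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() },
                    SpiSketch.class.getClassLoader())) {
                List<String> names = new ArrayList<>();
                for (DemoFormat format : ServiceLoader.load(DemoFormat.class, loader)) {
                    names.add(format.getName());
                }
                return names;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discoverNames());
    }
}
```

The no-arg constructor requirement falls out of this mechanism: ServiceLoader 
instantiates each listed class reflectively, so a provider cannot take 
constructor arguments.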

I will change the branch to use this and add a Lucene40Pulsing codec (named 
"Lucene40Pulsing") based on abstract PulsingCodec.
                
> Restructure codec hierarchy
> ---------------------------
>
>                 Key: LUCENE-3490
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3490
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-3490_SPI.patch
>
>
> Spinoff of LUCENE-2621. (Hoping we can do some of the renaming etc here in a 
> rote way to make progress).
> Currently Codec.java only represents a portion of the index, but there are 
> other parts of the index 
> (stored fields, term vectors, fieldinfos, ...) that we want under codec 
> control. There is also some 
> inconsistency about what a Codec is currently, for example Memory and Pulsing 
> are really just 
> PostingsFormats, you might just apply them to a specific field. On the other 
> hand, PreFlex actually
> is a Codec: it represents the Lucene 3.x index format (just not all parts 
> yet). I imagine we would
> like SimpleText to be the same way.
> So, I propose restructuring the classes so that we have something like:
> * CodecProvider <-- codec name to Class resolution only
> * Codec <-- represents the index format (PostingsFormat + FieldsFormat + ...)
> * PostingsFormat: this is what Codec controls today, and Codec will return 
> one of these for a field.
> * FieldsFormat: Stored Fields + Term Vectors + FieldInfos?
> I think for PreFlex, it doesn't make sense to expose its PostingsFormat as a 
> 'public' class, because preflex
> can never be per-field so there is no use in allowing you to configure 
> PreFlex for a specific field.
> Similarly, I think we should do the same thing for SimpleText. Nobody needs 
> SimpleText for production, it should
> just be a Codec where we try to make as much of the index as plain text and 
> simple as possible for debugging/learning/etc.
> So we don't need to expose its PostingsFormat. On the other hand, I don't 
> think we need Pulsing or Memory codecs,
> because it's pretty silly to make your entire index use one of their 
> PostingsFormats. To parallel with analysis:
> PostingsFormat is like Tokenizer and Codec is like Analyzer, and we don't 
> need Analyzers to "show off" every Tokenizer.
> Later, once we abstract FieldInfos reading/writing out of o.a.l.index into 
> codec control, we can also then
> move the baked in PerFieldCodecWrapper out (it would basically be 
> PerFieldPostingsFormat). Privately it would
> write the ids to the file like it does today. All 3.x hairy backwards code 
> would move to PreflexCodec. SimpleTextCodec
> would get a plain text fieldinfos impl, etc.

