[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093816#comment-13093816
 ] 

Robert Muir commented on LUCENE-2308:
-------------------------------------

Thanks Mike! This is really helpful!

While reviewing this and merging the flexscoring branch, I had a few ideas for 
improvements:
* I think FT (FieldType) should be immutable? Personally, I don't think we 
should enforce "patterns" like Freezable or Builder at this low level. Instead, 
I think FieldType should be a simple immutable class with a single ctor that 
takes the minimal stuff that we (core Lucene) need. It can still be concrete, 
but then you have to specify everything. Then, things like 
TextField/StringField are sugar APIs for common configurations. I don't like 
the idea of mutable FieldTypes that are reused across different fields, because 
I am concerned that somehow the 'wrong configuration' will be applied 
accidentally.
* Along these lines, we can then remove the "copy constructor", which also 
seems unnatural to Java users: since FieldType would then be immutable, there 
is no reason to ever copy it.
* I think BinaryField should be able to index as binary? This is a new feature 
in Lucene 4, but it's unfortunately really hard to do: there are a few 
approaches, but they are all difficult (custom 
tokenstream/AttributeImpls/AttributeFactory etc.).
* In the future, a BinaryField like this could be a base impl for 
CollationField: for historical reasons we expose this capability as an 
Analyzer, but I think this isn't great: it's really an implementation detail. 
For example, in Solr it's a real FieldType that uses an analyzer behind the 
scenes. So in this sense I think it's more consistent with NumericField.
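
To make the first two points concrete, here is a rough sketch of what an 
immutable FieldType with sugar subclasses could look like. Everything in it is 
an assumption for illustration only: the class shapes, the four boolean options 
and the accessor names are hypothetical and are not the API from the patch or 
from Lucene itself.

```java
// Hypothetical sketch only: these are NOT Lucene's actual classes.
public class ImmutableFieldTypeSketch {

    /** Immutable type: every option is fixed by the single constructor. */
    public static final class FieldType {
        private final boolean indexed;
        private final boolean stored;
        private final boolean tokenized;
        private final boolean omitNorms;

        public FieldType(boolean indexed, boolean stored,
                         boolean tokenized, boolean omitNorms) {
            this.indexed = indexed;
            this.stored = stored;
            this.tokenized = tokenized;
            this.omitNorms = omitNorms;
        }

        // Read-only accessors; there are no setters and no copy constructor.
        public boolean indexed()   { return indexed; }
        public boolean stored()    { return stored; }
        public boolean tokenized() { return tokenized; }
        public boolean omitNorms() { return omitNorms; }
    }

    /** A field holds the value; the configuration lives in its FieldType. */
    public static class Field {
        private final String name;
        private final String value;
        private final FieldType type;

        public Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }

        public String name()    { return name; }
        public String value()   { return value; }
        public FieldType type() { return type; }
    }

    /** Sugar: indexed and tokenized text (assumed defaults). */
    public static final class TextField extends Field {
        private static final FieldType TYPE =
            new FieldType(true, false, true, false);

        public TextField(String name, String value) {
            super(name, value, TYPE);
        }
    }

    /** Sugar: indexed as a single token, norms omitted (assumed defaults). */
    public static final class StringField extends Field {
        private static final FieldType TYPE =
            new FieldType(true, false, false, true);

        public StringField(String name, String value) {
            super(name, value, TYPE);
        }
    }

    public static void main(String[] args) {
        Field title = new TextField("title", "Separately specify a field's type");
        Field id = new StringField("id", "LUCENE-2308");
        System.out.println(title.name() + " tokenized=" + title.type().tokenized());
        System.out.println(id.name() + " tokenized=" + id.type().tokenized());
    }
}
```

Because FieldType has no setters, sharing one instance across many fields is 
safe, and a field cannot pick up the 'wrong configuration' after construction.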


> Separately specify a field's type
> ---------------------------------
>
>                 Key: LUCENE-2308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2308
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
> LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
> LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
> LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
> LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
> LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
> LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
> LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
> LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
> LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
> LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
> LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
>
>
> This came up from discussions on IRC.  I'm summarizing here...
> Today when you make a Field to add to a document you can set things
> like indexed or not, stored or not, analyzed or not, details like
> omitTfAP, omitNorms, index term vectors (separately controlling
> offsets/positions), etc.
> I think we should factor these out into a new class (FieldType?).
> Then you could re-use this FieldType instance across multiple fields.
> The Field instance would still hold the actual value.
> We could then do per-field analyzers by adding a setAnalyzer on the
> FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
> for per-field codecs (with flex), where we now have
> PerFieldCodecWrapper).
> This would NOT be a schema!  It's just refactoring what we already
> specify today.  EG it's not serialized into the index.
> This has been discussed before, and I know Michael Busch opened a more
> ambitious (I think?) issue.  I think this is a good first baby step.  We could
> consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
> off on that for starters...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to