[ 
https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697034#action_12697034
 ] 

Michael McCandless commented on LUCENE-1590:
--------------------------------------------


bq. In principle the Field instances should have no indexing options.

You mean retrieved fields, right?  I agree.

bq. but for this case it would really be better to make the Field infos 
"public", so somebody could enumerate all fields and then test which options 
were used during indexing. Mixing this with retrieval of stored fields is not 
good.

I agree, we should make it possible to access the "schema"
(FieldInfos) from the index. This would presumably replace the
getFieldNames(FieldOption) method that IndexReader exposes today.

Since FieldInfos is per-segment, one challenge is how Multi*Reader
should work.  Should it simply merge on-the-fly, i.e. present a single
FieldInfo that merges the fields of the same name across all segments?
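
As a rough illustration of the merge-on-the-fly idea, the sketch below
folds per-segment field options into one entry per field name. The
FieldFlags class, its two flags, and the merge rule (indexed if indexed
in any segment; norms omitted only if every segment that indexed the
field omitted them) are assumptions made for this example. It is not
Lucene's FieldInfos API and does not claim to match its semantics.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergedSchemaSketch {

    // Hypothetical stand-in for one field's schema entry in one segment.
    static final class FieldFlags {
        final String name;
        final boolean indexed;
        final boolean omitNorms;

        FieldFlags(String name, boolean indexed, boolean omitNorms) {
            this.name = name;
            this.indexed = indexed;
            this.omitNorms = omitNorms;
        }

        // Assumed merge rule, see the lead-in above.
        FieldFlags mergeWith(FieldFlags other) {
            if (!other.indexed) return this;  // other segment only stored the field
            if (!indexed) return other;       // this segment only stored the field
            return new FieldFlags(name, true, omitNorms && other.omitNorms);
        }
    }

    // Builds a single view keyed by field name from the per-segment lists.
    static Map<String, FieldFlags> merge(List<List<FieldFlags>> segments) {
        Map<String, FieldFlags> merged = new LinkedHashMap<>();
        for (List<FieldFlags> segment : segments) {
            for (FieldFlags f : segment) {
                merged.merge(f.name, f, FieldFlags::mergeWith);
            }
        }
        return merged;
    }
}
```

A Multi*Reader could compute such a map lazily and cache it, but whether
that is cheap enough depends on how often segments change.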

bq. It would be better for FieldsReader to have a central method like 
copyFieldOptions(FieldInfo, Fieldable) that copies all options from FieldInfo 
to the Fieldable (without looking at the stored contents).

This sounds like a good stop-gap measure, but I'd rather put our
energy towards exposing the schema, decoupling "retrieved" Fields from
indexed fields, etc.
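
For reference, the proposed stop-gap could look roughly like the
following. SchemaEntry and RetrievedField are hypothetical stand-ins for
FieldInfo and Fieldable, and the flag names are assumptions; only the
method name copyFieldOptions comes from the suggestion quoted above.

```java
public class CopyOptionsSketch {

    // Hypothetical stand-in for the per-field schema entry.
    static final class SchemaEntry {
        final boolean indexed;
        final boolean omitNorms;
        final boolean omitTermFreqAndPositions;

        SchemaEntry(boolean indexed, boolean omitNorms, boolean omitTermFreqAndPositions) {
            this.indexed = indexed;
            this.omitNorms = omitNorms;
            this.omitTermFreqAndPositions = omitTermFreqAndPositions;
        }
    }

    // Hypothetical stand-in for a field loaded from the stored-fields file.
    static final class RetrievedField {
        final String name;
        final String storedValue;
        boolean indexed;
        boolean omitNorms;
        boolean omitTermFreqAndPositions;

        RetrievedField(String name, String storedValue) {
            this.name = name;
            this.storedValue = storedValue;
        }
    }

    // Copies every option from the schema entry; the stored value is never inspected.
    static void copyFieldOptions(SchemaEntry info, RetrievedField field) {
        field.indexed = info.indexed;
        field.omitNorms = info.omitNorms;
        field.omitTermFreqAndPositions = info.omitTermFreqAndPositions;
    }
}
```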


> Stored-only fields automatically enable norms and tf when added to document
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4, 2.4.1, 2.9
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1590.patch, LUCENE-1590.patch, LUCENE-1590.patch
>
>
> While updating my internal components to the new TrieAPI, I noticed the 
> following:
> I index a lot of numeric fields with trie encoding, omitting norms and term 
> frequency. This works great. Luke shows that both are omitted.
> Sometimes I also want to store the components of the field under the same 
> field name, so I add the field to the document a second time, but stored 
> only (the Field c'tor that takes a TokenStream cannot also store the 
> field). As it is stored only, I thought I could leave out the explicit 
> setting of omitNorms and omitTermFreqAndPositions. After adding the 
> stored-only field without the omit* settings, Luke shows all fields with 
> norms enabled. I am not sure whether the norms/tf were really added to the 
> index, but Luke shows a value for the norms and FieldInfo has them enabled.
> In my opinion, this is not intuitive: o.a.l.document.Field should switch 
> both omit* options on when a field is stored only (and also disable other 
> indexing-only options). Alternatively, the internal FieldInfo.update(boolean 
> isIndexed, boolean storeTermVector, boolean storePositionWithTermVector, 
> boolean storeOffsetWithTermVector, boolean omitNorms, boolean storePayloads, 
> boolean omitTermFreqAndPositions) should only change the omit* and other 
> options if the isIndexed parameter (not this.isIndexed) is also true, and 
> otherwise leave them as they are.
> In principle, when adding a stored-only field, any indexing-specific options 
> should not be changed in FieldInfo. If the field was indexed with norms 
> before, norms should stay enabled (but this would be the default as it is).
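
The alternative described in the issue, where a stored-only occurrence of
a field leaves the indexing options alone, can be sketched as below.
FieldOptionsSketch is a hypothetical, reduced stand-in for FieldInfo with
only three of its flags, and the rule for combining two indexed
occurrences is an assumption made for the example, not a statement about
what Lucene does.

```java
public class FieldOptionsSketch {
    boolean isIndexed;
    boolean omitNorms;
    boolean omitTermFreqAndPositions;

    FieldOptionsSketch(boolean isIndexed, boolean omitNorms,
                       boolean omitTermFreqAndPositions) {
        this.isIndexed = isIndexed;
        this.omitNorms = omitNorms;
        this.omitTermFreqAndPositions = omitTermFreqAndPositions;
    }

    // Called when another field instance with the same name is added.
    void update(boolean isIndexed, boolean omitNorms,
                boolean omitTermFreqAndPositions) {
        if (!isIndexed) {
            // Stored-only occurrence: indexing options stay as they are.
            return;
        }
        if (!this.isIndexed) {
            // First indexed occurrence: adopt its options unchanged.
            this.isIndexed = true;
            this.omitNorms = omitNorms;
            this.omitTermFreqAndPositions = omitTermFreqAndPositions;
            return;
        }
        // Assumed rule for two indexed occurrences: norms stay enabled if either
        // occurrence wants them; tf/positions stay omitted once omitted.
        this.omitNorms = this.omitNorms && omitNorms;
        this.omitTermFreqAndPositions =
            this.omitTermFreqAndPositions || omitTermFreqAndPositions;
    }
}
```

With this rule, adding a stored-only field after a trie-encoded field
that omits norms and tf would leave both omit* flags set.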

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


