[ 
https://issues.apache.org/jira/browse/LUCENE-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697024#action_12697024
 ] 

Uwe Schindler commented on LUCENE-1590:
---------------------------------------

{quote}
Patch looks good! All tests pass. That was trickier than I expected;
thanks Uwe. I plan to commit in a day or two.
{quote}

The only tricky part was the FieldsReader. The original bug was fixed in a few 
lines (FieldInfo ctor and update()).
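
To illustrate the behaviour requested in the issue description quoted below, here is a minimal sketch. It is not the committed patch: FieldInfoSketch is a hypothetical, stripped-down stand-in for FieldInfo that keeps only three flags, and the merge rule for two indexed instances is my assumption. The point it shows is that a stored-only instance must never change indexing options.

```java
// Hypothetical stand-in for FieldInfo; NOT the real Lucene class or the patch.
class FieldInfoSketch {
    boolean isIndexed;
    boolean omitNorms;
    boolean omitTermFreqAndPositions;

    FieldInfoSketch(boolean isIndexed, boolean omitNorms,
                    boolean omitTermFreqAndPositions) {
        this.isIndexed = isIndexed;
        if (isIndexed) {
            this.omitNorms = omitNorms;
            this.omitTermFreqAndPositions = omitTermFreqAndPositions;
        } else {
            // Stored-only: switch both omit* options on, as the issue suggests.
            this.omitNorms = true;
            this.omitTermFreqAndPositions = true;
        }
    }

    void update(boolean isIndexed, boolean omitNorms,
                boolean omitTermFreqAndPositions) {
        if (!isIndexed) {
            // Stored-only instance: leave every indexing option as it is.
            return;
        }
        if (!this.isIndexed) {
            // First indexed instance of this field name: adopt its options.
            this.isIndexed = true;
            this.omitNorms = omitNorms;
            this.omitTermFreqAndPositions = omitTermFreqAndPositions;
        } else {
            // Assumed merge rule: norms stay enabled if any indexed instance
            // wants them; tf/positions are omitted once any instance omits them.
            this.omitNorms = this.omitNorms && omitNorms;
            this.omitTermFreqAndPositions =
                this.omitTermFreqAndPositions || omitTermFreqAndPositions;
        }
    }
}
```

With this guard, adding a stored-only instance after an indexed instance that omits norms and tf leaves both omit flags set, which is what Luke should then report.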

{quote}
It's a good catch, all the places in FieldsReader where we fail to
carryover OTFAP from FieldInfo --> Field instance on the document.
It's yet another example of how having the loaded Document "seem like"
the indexed document causes problems.
{quote}

I am still not happy with the new FieldsReader because it cannot replicate all 
indexing info (although it now covers almost everything). I know this does not 
affect functionality, as only the stored contents can be retrieved. In principle 
the Field instances should have *no* indexing options. Luke would then display 
nothing anymore, but for that case it would really be better to make the Field 
infos "public", so that somebody could enumerate all fields and then test which 
options were used during indexing. Mixing this with the retrieval of stored 
fields is not good.

One case is still not implemented correctly in FieldsReader: a binary stored 
field has its own special if-clause there. The binary field is loaded as 
stored-only, and currently only omitTf and omitNorms are set (I added this). 
Other options, e.g. INDEX, are always false. In principle, for completeness, 
all options from FieldInfo should be replicated here.
It would be better for FieldsReader to have a central method like 
copyFieldOptions(FieldInfo, Fieldable) that copies all options from FieldInfo 
to the Fieldable (without looking at the stored contents). The other if-cases 
should only initialize the stored parts and the type. I think I will give it a 
try.
This information is now more important: somebody who used to store string 
contents compressed must now use a binary field and do the compression 
themselves. In this case, Luke would no longer display any indexing options. 
This is not bad, but inconsistent.
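
A rough sketch of what such a central method could look like follows. Everything in it is hypothetical: InfoLite and LoadedField are simplified stand-ins for FieldInfo and Fieldable, and the set of flags is reduced to four. It only shows the shape of the idea: each load branch initializes the stored value and type, and one shared method copies the indexing options.

```java
// Hypothetical sketch; InfoLite and LoadedField are NOT Lucene classes.
final class FieldsReaderSketch {

    /** Simplified stand-in for FieldInfo. */
    static final class InfoLite {
        final String name;
        final boolean isIndexed, storeTermVector, omitNorms, omitTf;

        InfoLite(String name, boolean isIndexed, boolean storeTermVector,
                 boolean omitNorms, boolean omitTf) {
            this.name = name;
            this.isIndexed = isIndexed;
            this.storeTermVector = storeTermVector;
            this.omitNorms = omitNorms;
            this.omitTf = omitTf;
        }
    }

    /** Simplified stand-in for a Fieldable loaded from the stored fields. */
    static final class LoadedField {
        final String name;
        final Object storedValue;
        final boolean binary;
        boolean indexed, termVector, omitNorms, omitTf;

        LoadedField(String name, Object storedValue, boolean binary) {
            this.name = name;
            this.storedValue = storedValue;
            this.binary = binary;
        }
    }

    /** Copies all indexing options, without looking at the stored contents. */
    static void copyFieldOptions(InfoLite info, LoadedField field) {
        field.indexed = info.isIndexed;
        field.termVector = info.storeTermVector;
        field.omitNorms = info.omitNorms;
        field.omitTf = info.omitTf;
    }

    // The branches below only set up the stored part and the type.
    static LoadedField loadBinary(InfoLite info, byte[] bytes) {
        LoadedField f = new LoadedField(info.name, bytes, true);
        copyFieldOptions(info, f);
        return f;
    }

    static LoadedField loadText(InfoLite info, String text) {
        LoadedField f = new LoadedField(info.name, text, false);
        copyFieldOptions(info, f);
        return f;
    }
}
```

In this shape the binary branch can no longer drift from the text branch, because neither of them sets indexing options itself.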

So the better approach is to make the Field properties public not at the 
document level, but at the IndexReader level.

> Stored-only fields automatically enable norms and tf when added to document
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.4, 2.4.1, 2.9
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1590.patch, LUCENE-1590.patch, LUCENE-1590.patch
>
>
> While updating my internal components to the new TrieAPI, I noticed the 
> following:
> I index a lot of numeric fields with trie encoding, omitting norms and term 
> frequency. This works great. Luke shows that both are omitted.
> Sometimes I also want to have the components of the field stored, using the 
> same field name. So I add the field to the document a second time, but 
> stored only (as the Field c'tor using a TokenStream cannot additionally 
> store the field). As it is stored only, I thought that I could leave out 
> explicitly setting omitNorms and omitTermFreqAndPositions. After 
> adding the stored-only-without-omits field, Luke shows all fields with norms 
> enabled. I am not sure if the norms/tf were really added to the index, but 
> Luke shows a value for the norms and FieldInfo has it enabled.
> In my opinion, this is not intuitive; o.a.l.document.Field should switch 
> both omit* options on when storing fields only (and also disable other 
> indexing-only options). Alternatively the internal FieldInfo.update(boolean 
> isIndexed, boolean storeTermVector, boolean storePositionWithTermVector, 
> boolean storeOffsetWithTermVector, boolean omitNorms, boolean storePayloads, 
> boolean omitTermFreqAndPositions) should only change the omit* and other 
> options if the isIndexed parameter (not this.isIndexed) is also true, 
> and otherwise leave them as they are.
> In principle, when adding a stored-only field, any indexing-specific options 
> should not be changed in FieldInfo. If the field was indexed with norms 
> before, norms should stay enabled (but this would be the default as it is).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


