[ 
https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997129#comment-12997129
 ] 

Simon Willnauer commented on LUCENE-2881:
-----------------------------------------

bq. I fixed a bug in FieldInfos that could lead to wrong field numbers, that 
might have been related to the wrong behavior you're seeing, Simon

Ah, that could be the reason. I will need to patch my branch again to see if 
your patch helps. Will do tomorrow.

{quote}
About codecIds: I made the fix to FieldInfo.clone() to set the codecId on the 
clone. I also made FieldInfo.codecId private and added getter and setter. The 
setter checks whether the new value for codecId is different from the previous 
one, and throws an exception in that case (unless it was set to the default 0 
before, which I think means Preflex codec).
{quote}

The clone issue seems to be fixed in your latest patch, but the setter seems 
kind of wrong. Let me explain how that numbering works. When you create a 
SegmentInfo (SI) for a flushed segment, SegmentCodecs creates an ordinal for 
each codec and assigns it to the corresponding fields. The ordinal (codecId) is 
the index into the SegmentCodecs codec array, which holds the codec instance 
used for that field. So codecId = 0 is a valid value for segments using PreFlex 
or a >= 4.0 codec. If we open a pre-flex segment, there is only one codec for 
the entire segment, with codecId = 0; that's why 0 is assigned. (Note: I need to 
document this where it's set!) I think we should initialize codecId with a 
different value and replace the this.codecId != 0 check with something like 
this.codecId != -1. Maybe we should just use an assert here instead of an 
exception; this is somewhat internal, after all.
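
To make the numbering concrete, here is a rough sketch of what I mean. It is not 
the actual Lucene code: FieldInfoSketch and SegmentCodecsSketch are hypothetical 
stand-ins, codecs are represented by plain name strings, and the -1 sentinel is 
only the value proposed above. It shows one ordinal per distinct codec (the 
index into the codec list) and a setter that treats 0 as a legal ordinal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FieldInfo; only the codecId handling is modeled.
final class FieldInfoSketch {
    // Assumed sentinel for "no ordinal assigned yet", so that 0 stays a valid ordinal.
    static final int UNASSIGNED_CODEC_ID = -1;

    final String name;
    private int codecId = UNASSIGNED_CODEC_ID;

    FieldInfoSketch(String name) {
        this.name = name;
    }

    int getCodecId() {
        return codecId;
    }

    // Accepts the first assignment and any repeat of the same value;
    // rejects an attempt to change an ordinal that was already assigned.
    void setCodecId(int newCodecId) {
        if (newCodecId < 0) {
            throw new IllegalArgumentException("codecId must be >= 0: " + newCodecId);
        }
        if (codecId != UNASSIGNED_CODEC_ID && codecId != newCodecId) {
            throw new IllegalStateException("codecId for field '" + name
                + "' already set to " + codecId + ", cannot change to " + newCodecId);
        }
        codecId = newCodecId;
    }
}

// Hypothetical stand-in for SegmentCodecs; codec names replace Codec instances.
final class SegmentCodecsSketch {
    // Position in this list is the ordinal (codecId) handed to the fields.
    final List<String> codecs = new ArrayList<>();

    static SegmentCodecsSketch build(List<FieldInfoSketch> fields,
                                     Map<String, String> codecByField) {
        SegmentCodecsSketch result = new SegmentCodecsSketch();
        Map<String, Integer> ordinalByCodec = new LinkedHashMap<>();
        for (FieldInfoSketch field : fields) {
            String codec = codecByField.get(field.name);
            if (codec == null) {
                throw new IllegalArgumentException("no codec for field " + field.name);
            }
            Integer ordinal = ordinalByCodec.get(codec);
            if (ordinal == null) {
                // First field using this codec: the next free list index becomes its ordinal.
                ordinal = result.codecs.size();
                result.codecs.add(codec);
                ordinalByCodec.put(codec, ordinal);
            }
            field.setCodecId(ordinal);
        }
        return result;
    }
}
```

With three fields where the first two share a codec, the first two get ordinal 0 
and the third gets ordinal 1. A segment with a single codec gives every field 
ordinal 0, which is why 0 cannot double as the "unset" marker. Whether the 
conflict check should be an exception or an assert is still open.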

{quote}
All tests pass. Please let me know if that fixes your problem. If not then you 
should at least see the new exception that I added, which might make debugging 
easier.{quote}

Will do tomorrow! What exactly was the problem with the previous patch besides 
the codecId clone issue?


> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
>                 Key: LUCENE-2881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2881
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Michael Busch
>             Fix For: Realtime Branch, CSF branch, 4.0
>
>         Attachments: lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, 
> lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global 
> field naming / ordering. IW carries FI instances over from previous segments, 
> which also carries over field properties like isIndexed etc. While having 
> consistent field ordering per IW session appears to be important due to bulk 
> merging of stored fields etc., carrying over other properties might become 
> problematic with Lucene's Codec support. Codecs that rely on consistent 
> properties in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment 
> and field (using the field id within the file name). Yet, if no DocValues 
> are indexed in a particular segment but a previous segment in the same 
> IW session had DocValues, FieldInfo#docValues will be true since those 
> values are reused from previous segments. 
> We already work around this "limitation" in SegmentInfo with properties like 
> hasVectors or hasProx, which is really something we should manage per Codec & 
> Segment. Ideally FieldInfo would be managed per Segment and Codec such that 
> its properties are valid per segment. It also seems to be necessary to bind 
> FieldInfos to SegmentInfo logically since it's really just per-segment 
> metadata.
