[ https://issues.apache.org/jira/browse/LUCENE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5969:
--------------------------------
    Attachment: LUCENE-5969_part3.patch

Here is a patch for part 3. I think it's ready; we should close the issue after
this.
Other improvements can be separate issues from here.
After resolving this issue and backporting, we can do further cleanups in
trunk: remove all the 4.x support in backwards-codecs and clean up
SegmentInfos further.

This patch finishes adding all the safety checks (docvalues, terms, postings,
commit points). CodecUtil "segmentHeader" is renamed to "indexHeader", as it's
used for all index files (including commit points).
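To illustrate the kind of check an index header enables, here is a minimal Java sketch under stated assumptions: the class IndexHeaderSketch, its byte layout and its magic constant are hypothetical and do not reproduce CodecUtil's real on-disk format. Each file is stamped with a codec name, a version and an ID, and the reader rejects the file if any of them is unexpected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified model of a per-file index header. This is NOT
// Lucene's on-disk format; it only shows the idea of stamping each file with
// a codec name, a version and a unique ID, and verifying all three on read.
final class IndexHeaderSketch {
  static final int MAGIC = 0x3fd76c17; // arbitrary value chosen for this sketch

  static byte[] writeHeader(String codec, int version, byte[] id) {
    byte[] name = codec.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length + 4 + id.length);
    buf.putInt(MAGIC);
    buf.putInt(name.length).put(name); // length-prefixed codec name
    buf.putInt(version);
    buf.put(id);                       // ID of the segment/commit that owns this file
    return buf.array();
  }

  // Returns the file's version, or throws if any part of the header is wrong.
  static int checkHeader(byte[] header, String expectedCodec,
                         int minVersion, int maxVersion, byte[] expectedId) {
    ByteBuffer buf = ByteBuffer.wrap(header);
    if (buf.getInt() != MAGIC) {
      throw new IllegalStateException("bad magic: not an index file");
    }
    byte[] name = new byte[buf.getInt()];
    buf.get(name);
    String codec = new String(name, StandardCharsets.UTF_8);
    if (!codec.equals(expectedCodec)) {
      throw new IllegalStateException("codec mismatch: expected " + expectedCodec + ", got " + codec);
    }
    int version = buf.getInt();
    if (version < minVersion || version > maxVersion) {
      throw new IllegalStateException("unsupported version: " + version);
    }
    byte[] id = new byte[expectedId.length];
    buf.get(id);
    if (!Arrays.equals(id, expectedId)) {
      throw new IllegalStateException("ID mismatch: file belongs to another segment or commit");
    }
    return version;
  }

  public static void main(String[] args) {
    byte[] id = new byte[16];
    id[0] = 42;
    byte[] header = writeHeader("SketchPostings", 1, id);
    System.out.println(checkHeader(header, "SketchPostings", 0, 1, id)); // prints 1
    try {
      checkHeader(header, "SketchDocValues", 0, 1, id);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // the codec mismatch is reported
    }
  }
}
```

In this sketch the codec-name check is only meaningful if every format uses a distinct codec name, and the ID check rejects a file copied in from another segment or commit even when its codec and version look valid.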

BlockTree no longer "backdoors" via CheckIndex to return stats; there is a dead
simple API for this.

Norms sparse encoding is further improved with PATCHED strategy.
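A rough Java sketch of the general "common value plus exceptions" idea behind a patched encoding, assuming the norms are available as an in-memory long array. PatchedNormsSketch is hypothetical; this is not the actual layout or code of the Lucene 5.0 norms format, only an illustration of the technique.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "patched" encoding for per-document norms:
// the most frequent value is stored once, and only documents whose value
// differs are recorded as sorted (doc, value) exceptions.
final class PatchedNormsSketch {
  final long commonValue;
  final int[] exceptionDocs;    // sorted doc IDs whose value != commonValue
  final long[] exceptionValues; // values for exceptionDocs, same order

  private PatchedNormsSketch(long commonValue, int[] docs, long[] values) {
    this.commonValue = commonValue;
    this.exceptionDocs = docs;
    this.exceptionValues = values;
  }

  static PatchedNormsSketch encode(long[] norms) {
    // Pass 1: find the most frequent value (the first to reach the top count wins).
    Map<Long, Integer> counts = new HashMap<>();
    long common = 0;
    int best = 0;
    for (long v : norms) {
      int c = counts.merge(v, 1, Integer::sum);
      if (c > best) {
        best = c;
        common = v;
      }
    }
    // Pass 2: collect the exceptions; iterating in doc order keeps them sorted.
    int n = 0;
    for (long v : norms) {
      if (v != common) n++;
    }
    int[] docs = new int[n];
    long[] values = new long[n];
    for (int doc = 0, i = 0; doc < norms.length; doc++) {
      if (norms[doc] != common) {
        docs[i] = doc;
        values[i] = norms[doc];
        i++;
      }
    }
    return new PatchedNormsSketch(common, docs, values);
  }

  // Decoding: binary-search the exceptions, otherwise fall back to the common value.
  long get(int doc) {
    int idx = Arrays.binarySearch(exceptionDocs, doc);
    return idx >= 0 ? exceptionValues[idx] : commonValue;
  }

  public static void main(String[] args) {
    long[] norms = {5, 5, 5, 9, 5, 5, 2, 5};
    PatchedNormsSketch enc = encode(norms);
    // prints common=5 exceptions=[3, 6]
    System.out.println("common=" + enc.commonValue + " exceptions=" + Arrays.toString(enc.exceptionDocs));
  }
}
```

In this sketch the storage grows with the number of exceptions rather than with the number of documents, which is what makes the approach attractive when a single value dominates.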

There is an API change to SegmentInfos for safety. Instead of instance methods
that read into a "mutable" SIS:
{code}
SegmentInfos sis = new SegmentInfos();
sis.read(dir);       // reads the latest commit into sis
sis.read(dir, file); // reads a specific commit file into sis
{code}

These are now static methods that return a clean instance, named readLatestCommit
and readCommit respectively (renamed so that upgrading is not fragile).
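A minimal Java sketch of the static-factory style, assuming a toy in-memory "directory" (a Map from file name to contents) and a made-up commit file format in which a commit just lists segment names separated by commas. CommitSketch is hypothetical; only the method names readCommit and readLatestCommit come from the text. The N in segments_N is parsed in base 36, as Lucene does for its commit file names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each call returns a fresh, read-only instance
// instead of mutating an existing object in place.
final class CommitSketch {
  static final String PREFIX = "segments_";

  final String fileName;
  final long generation;
  final List<String> segments; // unmodifiable

  private CommitSketch(String fileName, long generation, List<String> segments) {
    this.fileName = fileName;
    this.generation = generation;
    this.segments = segments;
  }

  // The generation N of "segments_N", written in base 36 (Character.MAX_RADIX).
  static long generation(String fileName) {
    return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
  }

  // Reads one specific commit point.
  static CommitSketch readCommit(Map<String, String> dir, String fileName) {
    String contents = dir.get(fileName);
    if (contents == null) {
      throw new IllegalArgumentException("no such commit: " + fileName);
    }
    List<String> segments = contents.isEmpty() ? List.<String>of() : List.of(contents.split(","));
    return new CommitSketch(fileName, generation(fileName), segments);
  }

  // Finds the commit with the highest generation and reads it.
  static CommitSketch readLatestCommit(Map<String, String> dir) {
    String latest = null;
    long maxGen = -1;
    for (String name : dir.keySet()) {
      if (name.startsWith(PREFIX)) {
        long gen = generation(name);
        if (gen > maxGen) {
          maxGen = gen;
          latest = name;
        }
      }
    }
    if (latest == null) {
      throw new IllegalStateException("no commit found");
    }
    return readCommit(dir, latest);
  }

  public static void main(String[] args) {
    Map<String, String> dir = Map.of(
        "segments_9", "_0,_1",
        "segments_a", "_0,_1,_2", // "a" is generation 10 in base 36
        "_2.si", "");
    CommitSketch latest = readLatestCommit(dir);
    // prints segments_a gen=10 [_0, _1, _2]
    System.out.println(latest.fileName + " gen=" + latest.generation + " " + latest.segments);
  }
}
```

Because every call builds a new, unmodifiable instance, a caller of this sketch never sees an object that was partially overwritten by a previous read.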

There is more to fix here. IMO SIS "tries to take on too much": mutable state
used by IndexWriter, tracking of counters etc. by IndexWriter, reading/writing
commits, trying to be a "low level user-friendly" API, and too many publicly
exposed dangers. All of this for an important, heavily versioned file with
conditional logic. But that's a bigger problem.


> Add Lucene50Codec
> -----------------
>
>                 Key: LUCENE-5969
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5969
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 5.0, Trunk
>
>         Attachments: LUCENE-5969.patch, LUCENE-5969.patch, 
> LUCENE-5969_part2.patch, LUCENE-5969_part3.patch
>
>
> Spinoff from LUCENE-5952:
>   * Fix .si to write Version as 3 ints, not a String that requires parsing at 
> read time.
>   * Lucene42TermVectorsFormat should not use the same codecName as 
> Lucene41StoredFieldsFormat
> It would also be nice if we had a "bumpCodecVersion" script so rolling a new 
> codec is not so daunting.
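The first bullet above can be illustrated with a small Java sketch. VersionSketch is hypothetical and is neither Lucene's Version class nor the actual .si layout. Writing three ints gives a fixed-size field that is read back directly, while the string form has to be split and parsed, with its own error handling, every time it is read.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a version written as three ints can be read back
// directly, whereas a string such as "5.0.0" must be parsed at read time.
final class VersionSketch {
  final int major, minor, bugfix;

  VersionSketch(int major, int minor, int bugfix) {
    this.major = major;
    this.minor = minor;
    this.bugfix = bugfix;
  }

  // Fixed-size encoding: 12 bytes, no parsing needed on read.
  void writeTo(ByteBuffer out) {
    out.putInt(major).putInt(minor).putInt(bugfix);
  }

  static VersionSketch readFrom(ByteBuffer in) {
    int major = in.getInt();
    int minor = in.getInt();
    int bugfix = in.getInt();
    return new VersionSketch(major, minor, bugfix);
  }

  // The string alternative: must be split, validated and parsed on every read.
  static VersionSketch parse(String s) {
    String[] parts = s.split("\\.");
    if (parts.length != 3) {
      throw new IllegalArgumentException("malformed version: " + s);
    }
    return new VersionSketch(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
        Integer.parseInt(parts[2]));
  }

  @Override
  public String toString() {
    return major + "." + minor + "." + bugfix;
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(12);
    new VersionSketch(5, 0, 0).writeTo(buf);
    buf.flip();
    System.out.println(readFrom(buf)); // prints 5.0.0
  }
}
```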



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)