Andrzej Bialecki  created LUCENE-4050:
-----------------------------------------

             Summary: Change SegmentInfos format to plain text
                 Key: LUCENE-4050
                 URL: https://issues.apache.org/jira/browse/LUCENE-4050
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/codecs
            Reporter: Andrzej Bialecki 
             Fix For: 4.0


I propose to change the format of SegmentInfos file (segments_NN) to use plain 
text instead of the current binary format.

SegmentInfos file represents a commit point, and it also declares what codecs 
were used for writing each of the segments that the commit point consists of. 
However, this is a chicken-and-egg situation: in theory the format of this 
file is customizable via Codec.getSegmentInfosFormat, but in practice we first 
have to discover which codec implementation wrote this file. So 
SegmentCoreReaders assumes a certain fixed binary layout for a preamble of 
this file that contains the codec name... and then the file is read again, this 
time using the right Codec.
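To make the two-pass problem concrete, here is a minimal, hypothetical sketch. The preamble layout (an int version followed by a modified-UTF-8 codec name), the class `PreambleBootstrap` and the lambda "codec" registry are all invented for illustration and are not the actual segments_NN format or Lucene API. What it shows is that generic code must hard-code the preamble just to learn the codec name, and only then can the named codec parse the file from the start.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// HYPOTHETICAL illustration of the two-pass bootstrap; the preamble layout
// (int version, then a modified-UTF-8 codec name) is an assumption,
// not the real segments_NN format.
public class PreambleBootstrap {

    // Writes the assumed preamble, then an opaque codec-specific body.
    static byte[] write(int version, String codecName, byte[] body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(version);
            out.writeUTF(codecName);
            out.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Pass 1: generic code must hard-code the preamble layout just to learn
    // which codec wrote the file.
    static String readCodecName(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            in.readInt();        // version (unused in this sketch)
            return in.readUTF(); // codec name
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Pass 2: the whole file is handed, from the start, to the codec found in pass 1.
    static String readWithCodec(byte[] file, Map<String, Function<byte[], String>> codecs) {
        String name = readCodecName(file);
        Function<byte[], String> codec = codecs.get(name);
        if (codec == null) {
            throw new IllegalArgumentException("unknown codec: " + name);
        }
        return codec.apply(file);
    }

    public static void main(String[] args) {
        // Stand-in "codec" registry; a real codec would parse the whole file.
        Map<String, Function<byte[], String>> codecs = new HashMap<>();
        codecs.put("Lucene40", f -> "Lucene40 read " + f.length + " bytes");
        byte[] file = write(1, "Lucene40", new byte[] {42});
        System.out.println(readWithCodec(file, codecs)); // prints "Lucene40 read 15 bytes"
    }
}
```

The 15 bytes are 4 for the int, 2 + 8 for the UTF-encoded name, and 1 for the body; the codec sees all of them, including the preamble it was selected by.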

This is ugly. Instead I propose to use a simple plain text format, either 
line-oriented properties or JSON, designed so that newer versions can easily 
extend it and so that no special Codec is needed to read and parse it. 
Consequently we could remove SegmentInfosFormat altogether, and instead add 
SegmentInfoFormat (notice the singular) to Codec to read each per-segment 
SegmentInfo in a codec-specific way. E.g. for the Lucene40 codec we could either 
add another file or extend the .fnm file (FieldInfos) to also contain 
this information. 

Then the plain text SegmentInfos would contain just the following information:

* list of global files for this commit point (if any)
* list of segments for this commit point, and their corresponding codec class 
names
* user data map
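A minimal sketch of the line-oriented properties option, assuming java.util.Properties as the encoding. The class `PlainTextCommit` and the key names (`files.count`, `file.N`, `segments.count`, `segment.N.name`, `segment.N.codec`, `userData.*`) are hypothetical, not part of Lucene. It illustrates the two properties the proposal asks for: the reader needs only the JDK (no Codec), and keys it does not recognize are simply ignored, so newer versions can add fields.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// HYPOTHETICAL plain-text commit point encoded with java.util.Properties.
// Key names are assumptions made for this sketch, not a Lucene format.
public class PlainTextCommit {
    final List<String> globalFiles;          // global files for the commit (may be empty)
    final Map<String, String> segmentCodecs; // segment name -> codec name/class, in commit order
    final Map<String, String> userData;      // opaque user data map

    PlainTextCommit(List<String> globalFiles, Map<String, String> segmentCodecs,
                    Map<String, String> userData) {
        this.globalFiles = globalFiles;
        this.segmentCodecs = segmentCodecs;
        this.userData = userData;
    }

    String write() {
        Properties p = new Properties();
        p.setProperty("version", "1");
        p.setProperty("files.count", Integer.toString(globalFiles.size()));
        for (int i = 0; i < globalFiles.size(); i++) {
            p.setProperty("file." + i, globalFiles.get(i));
        }
        int seg = 0;
        for (Map.Entry<String, String> e : segmentCodecs.entrySet()) {
            p.setProperty("segment." + seg + ".name", e.getKey());
            p.setProperty("segment." + seg + ".codec", e.getValue());
            seg++;
        }
        p.setProperty("segments.count", Integer.toString(seg));
        for (Map.Entry<String, String> e : userData.entrySet()) {
            p.setProperty("userData." + e.getKey(), e.getValue());
        }
        StringWriter out = new StringWriter();
        try {
            p.store(out, "commit point"); // escapes '=', ':', '#', '!' as needed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Needs only the JDK: no Codec is consulted. Unknown keys are ignored,
    // so a newer writer can add fields without breaking this reader.
    static PlainTextCommit read(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> files = new ArrayList<>();
        int fileCount = Integer.parseInt(p.getProperty("files.count", "0"));
        for (int i = 0; i < fileCount; i++) {
            files.add(p.getProperty("file." + i));
        }
        Map<String, String> segments = new LinkedHashMap<>();
        int segmentCount = Integer.parseInt(p.getProperty("segments.count", "0"));
        for (int i = 0; i < segmentCount; i++) {
            segments.put(p.getProperty("segment." + i + ".name"),
                         p.getProperty("segment." + i + ".codec"));
        }
        Map<String, String> user = new LinkedHashMap<>();
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith("userData.")) {
                user.put(key.substring("userData.".length()), p.getProperty(key));
            }
        }
        return new PlainTextCommit(files, segments, user);
    }

    public static void main(String[] args) {
        Map<String, String> segments = new LinkedHashMap<>();
        segments.put("_0", "Lucene40");
        segments.put("_1", "Lucene40");
        Map<String, String> userData = new LinkedHashMap<>();
        userData.put("source", "nightly-import");
        String text = new PlainTextCommit(new ArrayList<>(), segments, userData).write();
        System.out.print(text); // human-readable, one key=value per line
        System.out.println(read(text).segmentCodecs);
    }
}
```

Using Properties rather than hand-rolled parsing gets escaping of separators for free; JSON would serve the same purpose but needs a parser from outside the Java SE standard library.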


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
