[ https://issues.apache.org/jira/browse/LUCENE-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Lalevée updated LUCENE-662:
-----------------------------------

    Attachment: indexFormat.patch
                indexFormat-only.patch

Synchronized with the trunk, and therefore with the payload feature. This let
me move the payload writing, which is in two places today, into a single
class: DefaultPostingWriter.

Since my last update, the TODO list is still outstanding; nothing has been
fixed. Furthermore, there is a regression in the new patch: ensureOpen() is
not correctly handled for lazily loaded fields, so a test fails. This is
because the FieldsReader no longer handles it in my patch. As the data
structure can be customized, lazy loading is delegated to the FieldData
created by the FieldsReader, so both instances have to communicate about the
closing of the streams. That is a new item in the TODO list.

As discussed on java-dev, here is a light patch with only the index format
handling, without the possibility to redefine how data and postings are
stored/retrieved.

> Extendable writer and reader of field data
> ------------------------------------------
>
>                 Key: LUCENE-662
>                 URL: https://issues.apache.org/jira/browse/LUCENE-662
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Nicolas Lalevée
>            Priority: Minor
>         Attachments: entrytable.patch, generic-fieldIO-2.patch,
> generic-fieldIO-3.patch, generic-fieldIO-4.patch, generic-fieldIO-5.patch,
> generic-fieldIO.patch, indexFormat-only.patch, indexFormat.patch,
> indexFormat.patch, indexFormat.patch
>
>
> As discussed on the dev mailing list, I have modified Lucene to make it
> possible to define how the data of a field is written to and read from the
> index.
> Basically, I have introduced the notion of IndexFormat. It is in fact a
> factory of FieldsWriter and FieldsReader. So the IndexReader, the IndexWriter
> and the SegmentMerger use this factory instead of doing a "new
> FieldsReader/Writer()".
> I have also introduced the notion of FieldData.
> It handles all the data of a field, as well as writing it to and reading it
> from a stream. I did it this way because, in the current design of Lucene,
> Fieldable is an interface, so methods with protected or package visibility
> cannot be defined.
> A FieldsWriter just writes data into a stream via the FieldData of the field.
> A FieldsReader instantiates a FieldData depending on the field name. Then it
> uses the field data to read the stream. And finally it instantiates a Field
> with the field data.
> About compatibility, I think it is kept, as I have written a
> DefaultIndexFormat that provides a DefaultFieldsWriter and a
> DefaultFieldsReader. These implementations do exactly the job that is done
> today.
> To achieve this modification, some classes and methods had to be changed from
> private and/or final to public or protected.
> About the lazy fields, I have implemented them in a more general way in the
> abstract class FieldData, so it will be totally transparent for the Lucene
> user who extends FieldData. The stream is kept in the FieldData and used as
> soon as stringValue() (or something else) is called. Implementing it this way
> allowed me to handle the recently introduced LOAD_FOR_MERGE; it is just a
> lazy field data, and when read() is called on this lazy field data, the saved
> input stream is directly copied to the output stream.
> I have a last issue with this patch. The current design allows reading an
> index in an old format and just doing a writer.addIndexes() into a new
> format. With the new design, you cannot, because the writer will use the
> FieldData.write provided by the reader.
> enjoy !

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
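
To make the design described in the issue easier to picture, here is a minimal Java sketch of the idea: an IndexFormat acting as a factory of writer and reader, and a FieldData that knows how to write itself to a stream and read itself back. This is not the code from the attached patches. The class names IndexFormat, DefaultIndexFormat, FieldData, FieldsWriter and FieldsReader are taken from the description above, but every method name and signature (getFieldsWriter, getFieldsReader, createFieldData, addField, readField, roundTrip), the StringFieldData class, and the use of java.io data streams in place of Lucene's own stream classes are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for FieldData: holds a field's value and knows
// how to encode it to, and decode it from, a stream.
abstract class FieldData {
    abstract void write(DataOutputStream out) throws IOException;
    abstract void read(DataInputStream in) throws IOException;
    abstract String stringValue();
}

// Illustrative FieldData that stores a single string (assumed, not from the patch).
class StringFieldData extends FieldData {
    private String value;
    StringFieldData() { }
    StringFieldData(String value) { this.value = value; }
    @Override void write(DataOutputStream out) throws IOException { out.writeUTF(value); }
    @Override void read(DataInputStream in) throws IOException { value = in.readUTF(); }
    @Override String stringValue() { return value; }
}

// Simplified writer: writes the field name, then lets the FieldData write itself.
class FieldsWriter {
    private final DataOutputStream out;
    FieldsWriter(DataOutputStream out) { this.out = out; }
    void addField(String name, FieldData data) throws IOException {
        out.writeUTF(name);
        data.write(out);
    }
}

// Simplified reader: picks a FieldData from the field name, then lets it read the stream.
abstract class FieldsReader {
    private final DataInputStream in;
    FieldsReader(DataInputStream in) { this.in = in; }
    protected abstract FieldData createFieldData(String fieldName);
    FieldData readField() throws IOException {
        String name = in.readUTF();
        FieldData data = createFieldData(name);
        data.read(in);
        return data;
    }
}

// The factory: callers ask the format for a writer or reader instead of
// calling "new FieldsReader/Writer()" themselves.
interface IndexFormat {
    FieldsWriter getFieldsWriter(DataOutputStream out);
    FieldsReader getFieldsReader(DataInputStream in);
}

// Default format: every field is treated as a plain string.
class DefaultIndexFormat implements IndexFormat {
    @Override public FieldsWriter getFieldsWriter(DataOutputStream out) {
        return new FieldsWriter(out);
    }
    @Override public FieldsReader getFieldsReader(DataInputStream in) {
        return new FieldsReader(in) {
            @Override protected FieldData createFieldData(String fieldName) {
                return new StringFieldData();
            }
        };
    }
}

class IndexFormatSketch {
    // Writes one field through the format's writer and reads it back through its reader.
    static String roundTrip(IndexFormat format, String name, String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                format.getFieldsWriter(out).addField(name, new StringFieldData(value));
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return format.getFieldsReader(in).readField().stringValue();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new DefaultIndexFormat(), "title", "hello"));
    }
}
```

In this sketch, a custom format would only need to implement IndexFormat and return a reader whose createFieldData picks a different FieldData subclass per field name. The sketch also makes the addIndexes() limitation mentioned in the description visible: the bytes written depend on the write() method of whichever FieldData the reader created.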