Extendable writer and reader of field data
------------------------------------------
Key: LUCENE-662
URL: http://issues.apache.org/jira/browse/LUCENE-662
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Nicolas Lalevée
Priority: Minor
Attachments: generic-fieldIO.patch
As discussed on the dev mailing list, I have modified Lucene so that you can
define how the data of a field is written to and read from the index.
Basically, I have introduced the notion of IndexFormat, which is in fact a
factory for FieldsWriter and FieldsReader. The IndexReader, the IndexWriter and
the SegmentMerger now use this factory instead of doing a "new
FieldsReader/Writer()" themselves.
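
To make the factory idea concrete, here is a minimal, self-contained sketch.
It is not the code from generic-fieldIO.patch: the method names
(getFieldsWriter, getFieldsReader, writeField, readField), the name/value
layout in the stream, and the IndexFormatDemo helper are all assumptions made
for illustration, and plain java.io streams stand in for Lucene's own stream
classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical signatures: the patch may define these types differently.
interface FieldsWriter {
    void writeField(String name, String value) throws IOException;
}

interface FieldsReader {
    // Returns {name, value} for the next stored field.
    String[] readField() throws IOException;
}

// The factory: callers ask the format for a writer or reader instead of
// instantiating a concrete class themselves.
interface IndexFormat {
    FieldsWriter getFieldsWriter(DataOutputStream out);
    FieldsReader getFieldsReader(DataInputStream in);
}

// A default format with a simple, assumed layout: name followed by value.
class DefaultIndexFormat implements IndexFormat {
    @Override
    public FieldsWriter getFieldsWriter(DataOutputStream out) {
        return (name, value) -> {
            out.writeUTF(name);
            out.writeUTF(value);
        };
    }

    @Override
    public FieldsReader getFieldsReader(DataInputStream in) {
        return () -> new String[] { in.readUTF(), in.readUTF() };
    }
}

// Hypothetical helper showing a caller that only depends on IndexFormat.
class IndexFormatDemo {
    static String[] roundTrip(IndexFormat format, String name, String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            format.getFieldsWriter(out).writeField(name, value);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return format.getFieldsReader(in).readField();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, swapping the on-disk layout only means passing a different
IndexFormat to the caller; roundTrip itself does not change.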
I have also introduced the notion of FieldData. It holds all the data of a
field, and also handles writing it to and reading it from a stream. I did it
this way because in the current design of Lucene, Fieldable is an interface, so
methods with protected or package visibility cannot be defined on it.
A FieldsWriter just writes data into a stream via the FieldData of the field.
A FieldsReader instantiates a FieldData depending on the field name, then uses
that field data to read the stream, and finally instantiates a Field with the
field data.
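
The write and read paths described above could look roughly like the sketch
below. Everything beyond the name FieldData is an assumption for illustration:
the write/read/stringValue signatures, the StringFieldData and IntFieldData
subclasses, the StoredField stand-in for Lucene's Field, the rule that picks an
implementation from the field name, and the choice to store the field name in
the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical shape of FieldData: it owns the value and its serialization.
abstract class FieldData {
    abstract void write(DataOutputStream out) throws IOException;
    abstract void read(DataInputStream in) throws IOException;
    abstract String stringValue();
}

class StringFieldData extends FieldData {
    private String value = "";
    StringFieldData() {}
    StringFieldData(String value) { this.value = value; }
    @Override void write(DataOutputStream out) throws IOException { out.writeUTF(value); }
    @Override void read(DataInputStream in) throws IOException { value = in.readUTF(); }
    @Override String stringValue() { return value; }
}

class IntFieldData extends FieldData {
    private int value;
    IntFieldData() {}
    IntFieldData(int value) { this.value = value; }
    @Override void write(DataOutputStream out) throws IOException { out.writeInt(value); }
    @Override void read(DataInputStream in) throws IOException { value = in.readInt(); }
    @Override String stringValue() { return Integer.toString(value); }
}

// Stand-in for Lucene's Field: a name plus the data that was read for it.
final class StoredField {
    final String name;
    final FieldData data;
    StoredField(String name, FieldData data) { this.name = name; this.data = data; }
}

class FieldDataDemo {
    // Reader side: pick the FieldData implementation from the field name.
    // The "year" rule is purely illustrative.
    static FieldData newFieldData(String fieldName) {
        return fieldName.equals("year") ? new IntFieldData() : new StringFieldData();
    }

    // Writer side: the writer does not know the layout, it delegates to FieldData.
    static byte[] writeField(String name, FieldData data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(name);
            data.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader side: instantiate FieldData, let it read, then build the field.
    static StoredField readField(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            String name = in.readUTF();
            FieldData data = newFieldData(name);
            data.read(in);
            return new StoredField(name, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```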
As for compatibility, I think it is preserved, as I have written a
DefaultIndexFormat that provides a DefaultFieldsWriter and a
DefaultFieldsReader. These implementations do exactly the job that is done
today.
To achieve this modification, some classes and methods had to be changed from
private and/or final to public or protected.
As for lazy fields, I have implemented them in a more general way in the
abstract class FieldData itself, so they are totally transparent for a Lucene
user who extends FieldData. The stream is kept in the FieldData and is read as
soon as stringValue() (or another accessor) is called. Implementing it this way
also allowed me to handle the recently introduced LOAD_FOR_MERGE: it is just a
lazy field data, and when read() is called on this lazy field data, the saved
input stream is copied directly into the output stream.
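
The lazy behaviour can be illustrated with the following sketch. It is an
assumption-laden simplification, not the patch: a byte array with an offset and
length stands in for the saved input stream, the class does not extend the real
FieldData, and copyTo and decodeCount are hypothetical names added only to show
the merge path and to make the laziness observable.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical lazy field data: remembers where the stored bytes are and
// decodes them only when a value accessor is first called.
class LazyFieldData {
    private final byte[] source;  // stands in for the saved input stream
    private final int offset;
    private final int length;
    private String value;         // null until first access
    int decodeCount = 0;          // illustration only: counts real decodes

    LazyFieldData(byte[] source, int offset, int length) {
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    // Decodes on first call, then returns the cached value.
    String stringValue() {
        if (value == null) {
            value = new String(source, offset, length, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return value;
    }

    // Merge path: copy the stored bytes to the output without decoding them.
    void copyTo(OutputStream out) {
        try {
            out.write(source, offset, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a merge never pays the decoding cost, since copyTo works on the
raw bytes only, while ordinary callers see a normal stringValue().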
I have one remaining issue with this patch. The current design allows you to
read an index in an old format and just do a writer.addIndexes() to write it
in a new format. With the new design you cannot, because the writer will use
the FieldData.write provided by the reader.
Enjoy!