I have a base implementation of lazy field loading that I am starting to
test and wanted to run my approach by everyone to hear their thoughts.
I have, as per Doug's suggestion from a while ago, created an interface
named Fieldable that is implemented by Field and a new, private class,
owned by FieldsReader. I have introduced an "enumerated" type to the
Field class named LazyLoad (which can be YES or NO, in the same spirit
as Field.TermVector). Any place that used to take Field now takes
Fieldable. This should be completely transparent and
backward-compatible. The existing constructors of Field all assume lazy
to be off.
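
A rough sketch of the shape described above, not the actual patch. The
method names on Fieldable (name, stringValue, isLazy) and the wrapper
class FieldableSketch are placeholders chosen for illustration; only
Fieldable, Field and LazyLoad (YES/NO) come from the description.

```java
// Illustrative only: a stripped-down stand-in, not the real Lucene classes.
class FieldableSketch {

    /** Common interface; these method names are assumptions. */
    interface Fieldable {
        String name();
        String stringValue();
        boolean isLazy();
    }

    /** Type-safe enum in the same spirit as Field.TermVector. */
    static final class LazyLoad {
        static final LazyLoad YES = new LazyLoad("YES");
        static final LazyLoad NO = new LazyLoad("NO");

        private final String label;

        private LazyLoad(String label) {
            this.label = label;
        }

        @Override
        public String toString() {
            return label;
        }
    }

    /** Simplified Field: the old constructor keeps lazy loading off. */
    static final class Field implements Fieldable {
        private final String name;
        private final String value;
        private final LazyLoad lazy;

        /** Existing-style constructor, defaults to LazyLoad.NO. */
        Field(String name, String value) {
            this(name, value, LazyLoad.NO);
        }

        /** New-style constructor that lets the caller opt in. */
        Field(String name, String value, LazyLoad lazy) {
            this.name = name;
            this.value = value;
            this.lazy = lazy;
        }

        public String name() {
            return name;
        }

        public String stringValue() {
            return value;
        }

        public boolean isLazy() {
            return lazy == LazyLoad.YES;
        }
    }
}
```

Code that previously accepted a Field would accept a Fieldable instead,
so callers using the old constructors see no change in behaviour.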
On creation of a Field, a user can pass in LazyLoad.YES or NO to a
constructor that takes either a String value or a byte array (it does
not apply to the Reader constructors since they do not store their
content). Indexing and writing of fields take place as normal; the only
difference is an extra bit, added when the field is written, that
marks the field as lazy.
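
To make the "extra bit" concrete, here is a minimal sketch of packing a
lazy marker into a per-field flag byte. The constant names and bit
values are assumptions for illustration (FIELD_IS_LAZY in particular is
hypothetical) and may not match what the patch or FieldsWriter use.

```java
// Illustrative only: flag names and values are assumed, not taken from the patch.
class FieldFlagsSketch {
    // Assumed pre-existing flag bits.
    static final byte FIELD_IS_TOKENIZED = 0x1;
    static final byte FIELD_IS_BINARY = 0x2;
    // Hypothetical new bit that marks a stored field as lazy.
    static final byte FIELD_IS_LAZY = 0x8;

    /** Packs the per-field options into a single flag byte. */
    static byte encode(boolean tokenized, boolean binary, boolean lazy) {
        byte bits = 0;
        if (tokenized) {
            bits |= FIELD_IS_TOKENIZED;
        }
        if (binary) {
            bits |= FIELD_IS_BINARY;
        }
        if (lazy) {
            bits |= FIELD_IS_LAZY;
        }
        return bits;
    }

    /** Reader side: tests whether the lazy bit is set. */
    static boolean isLazy(byte bits) {
        return (bits & FIELD_IS_LAZY) != 0;
    }
}
```

Fields written without the new bit decode as non-lazy, which is what
keeps existing indexes readable under this scheme.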
On reading in the field, if it is lazy, instead of reading in the
value and constructing a Field, we construct a LazyField
instance that takes the pointer into the fieldsStream and the amount
of data to read. This instance, since it is a private class of
FieldsReader, maintains access to the fieldsStream. Thus, when an
application goes to access the value of the field, we check to see
whether it has been loaded. If it has not, we load it using the
fieldsStream, the pointer and the length to read.
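
The read path can be sketched as follows. This is a self-contained
stand-in, not the patch: SeekableInput is a hypothetical in-memory
substitute for the fieldsStream, the length is treated as a byte count,
and UTF-8 decoding is assumed purely to keep the example simple.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: simplified stand-ins for FieldsReader internals.
class LazyFieldSketch {

    /** Hypothetical seekable byte source standing in for the fieldsStream. */
    static final class SeekableInput {
        private final byte[] data;
        private int position;
        private boolean closed;
        int reads; // number of readBytes calls, to show when loading happens

        SeekableInput(byte[] data) {
            this.data = data;
        }

        void seek(long pointer) {
            position = (int) pointer;
        }

        void readBytes(byte[] dest, int offset, int length) {
            if (closed) {
                throw new IllegalStateException("stream is closed");
            }
            System.arraycopy(data, position, dest, offset, length);
            position += length;
            reads++;
        }

        void close() {
            closed = true;
        }
    }

    /** Holds only the pointer and length until the value is requested. */
    static final class LazyField {
        private final String name;
        private final SeekableInput stream;
        private final long pointer;
        private final int length;
        private String value; // null until first access

        LazyField(String name, SeekableInput stream, long pointer, int length) {
            this.name = name;
            this.stream = stream;
            this.pointer = pointer;
            this.length = length;
        }

        String name() {
            return name;
        }

        /** Loads the value on first access, then returns the cached copy. */
        synchronized String stringValue() {
            if (value == null) {
                byte[] buffer = new byte[length];
                stream.seek(pointer);
                stream.readBytes(buffer, 0, length);
                value = new String(buffer, StandardCharsets.UTF_8);
            }
            return value;
        }
    }
}
```

In this sketch, a value that was never accessed before the stream is
closed can no longer be loaded: stringValue() throws. A value that was
already loaded stays available from the cache.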
Does anyone see any issues with this? I think it will only really pay
off on large stored fields, but have not quantified it yet. My main
concern is the semantics of the fieldsStream and whether that would be
closed behind the back of the LazyField implementation. My
understanding is that as long as the IndexReader is open, this stream
should also be open. Is that correct? What am I forgetting about?
If testing goes well, I should be able to button this up this week or
next and submit the patch.
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886