You're right, more flexibility is needed, but in my mind it goes beyond
just field loading. I think this is what Doug was getting at (at least
partially) with http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard
#12. Although that item focuses on indexing, I think it should be
considered for searching as well. I am not sure we should just continue
adding more and more methods onto IndexReader. I think the 2.x move gives
us an opportunity to refactor some of the things we think we can make better.
I am not sure you need LUCENE-509 when you have lazy loading. In my mind,
lazy loading gives you the best of both worlds: you can get all the
meta-info about all the stored fields on the Document w/o the penalty of
loading the actual data.
My use case is below (my guess is this is quite common).
Run a search, get back your hits and display summary information on the
hits (i.e. the "small" fields). The user picks the hit they want to see
more info on, and you display the full document, including, most likely,
the info in the really large stored fields (i.e. the original document).
To date, I have been storing this info elsewhere b/c of the loading
penalty. With lazy loading, I don't need to do this. I can just defer
loading until the second-level access is needed, and I never load it if
the user doesn't ask for it.
By contrast, in the case where you load only a few smaller fields, you
have to go back and get the document again when you want to display the
contents of the large field.
Of course, there are several other use cases where you may only want
certain fields, but I don't think there is much cost associated with
loading small fields, just the large ones, so you can just make them lazy.
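
A minimal sketch of the lazy idea described above, under stated
assumptions: LazyField is a hypothetical class, not existing Lucene API,
and the Supplier stands in for whatever actually reads the stored bytes
from disk. The field's name is available immediately; the value is only
read on first access and then cached.

```java
import java.util.function.Supplier;

// Hypothetical lazy stored field: the name (meta-info) is available up
// front, the value is read only when somebody asks for it.
final class LazyField {
    private final String name;
    private final Supplier<String> loader; // assumption: stands in for the disk read
    private String value;
    private boolean loaded;

    LazyField(String name, Supplier<String> loader) {
        this.name = name;
        this.loader = loader;
    }

    String name() {
        return name;
    }

    boolean isLoaded() {
        return loaded;
    }

    // The first call triggers the load; later calls reuse the cached value.
    String stringValue() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}
```

In the use case above, the summary page would only touch name() and the
small fields, and stringValue() on the large field would be called only
if the user opens the full document.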
Yonik Seeley wrote:
On 3/31/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
<https://issues.apache.org:443/jira/browse/LUCENE-509>
Yes, I'd personally find a way to retrieve just fields x,y, and z more
useful than lazy loading.
Thinking a little more, it would be nice if the field reading API was
opened up a little more so that multiple things could be done... even
construct different field/document objects (say a document
implementation that indexed the fields, etc).
That could be used to implement either lazy field loading, or loading
of specific fields.
The lazy loading alone doesn't really address LUCENE-509.
I was thinking something along the lines of:
// an IndexReader would call FieldReader methods for each
abstract class FieldReader {
  // users return true if this field should be read.
  abstract boolean readField(int fieldnum, String fieldName);

  // returns true to keep reading next field
  abstract boolean stringField(int fieldnum, byte[] utf8);
  // OR
  abstract boolean stringField(int fieldnum, String str);

  // returns true to keep reading next field
  abstract boolean binaryField(int fieldnum, byte[] data);
}

abstract class IndexReader {
  // expert level API
  abstract void readFields(int doc, FieldReader reader);
}
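
As an illustration only, here is a sketch of how the callback above could
be used to load just specific fields. It assumes the String variant of
stringField. SelectedFieldReader and StoredFieldsStub are hypothetical
names, and the stub merely imitates what IndexReader.readFields might do
over one in-memory document; none of this is existing Lucene code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// The callback from the proposal, using the String variant of stringField.
abstract class FieldReader {
    abstract boolean readField(int fieldnum, String fieldName);
    abstract boolean stringField(int fieldnum, String str);
    abstract boolean binaryField(int fieldnum, byte[] data);
}

// Hypothetical FieldReader that keeps only the requested string fields.
class SelectedFieldReader extends FieldReader {
    private final Set<String> wanted;
    private final Map<Integer, String> names = new HashMap<>();
    final Map<String, String> values = new LinkedHashMap<>();

    SelectedFieldReader(Set<String> wanted) {
        this.wanted = wanted;
    }

    @Override
    boolean readField(int fieldnum, String fieldName) {
        if (!wanted.contains(fieldName)) {
            return false; // not requested: the caller should skip it
        }
        names.put(fieldnum, fieldName);
        return true;
    }

    @Override
    boolean stringField(int fieldnum, String str) {
        values.put(names.get(fieldnum), str);
        // Stop once every requested field has been seen.
        return values.size() < wanted.size();
    }

    @Override
    boolean binaryField(int fieldnum, byte[] data) {
        return true; // binary values are ignored in this sketch
    }
}

// Stand-in for IndexReader.readFields over a single in-memory document.
class StoredFieldsStub {
    // Returns how many field values were actually handed to the reader.
    static int readFields(String[] names, String[] vals, FieldReader reader) {
        int loaded = 0;
        for (int i = 0; i < names.length; i++) {
            if (!reader.readField(i, names[i])) {
                continue; // a real reader would skip the stored bytes here
            }
            loaded++;
            if (!reader.stringField(i, vals[i])) {
                break; // reader asked to stop
            }
        }
        return loaded;
    }
}
```

A lazy-loading reader could plug into the same callback by returning true
from readField for small fields and recording only the position of large
ones, which is the "either/or" flexibility the proposal is after.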
Just brainstorming so far...
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886