I think it would be better to have IndexReaderProperties and
IndexWriterProperties.
That seems like an easier API to maintain, and it is more logical, since
it keeps related items together.
On Nov 8, 2007, at 12:04 PM, Doug Cutting wrote:
Michael McCandless wrote:
One thing is: I'd prefer to not use system property for this, since
it's so global, but I'm not sure how to better do it.
I agree. That was the quick-and-dirty hack. Ideally it should be
a method on IndexReader. I can think of two ways to do that:
1. Add a generic method like IndexReader#setProperty(String,String).
2. Add a specific method like IndexReader#setTermIndexDivisor(int).
I slightly prefer the former, as it permits various IndexReader
implementations to support arbitrary properties, at the expense of
being untyped, but that might be overkill. Thoughts?
We can't add a "setIndexDivisor(...)" method because the terms are
already loaded (consuming too much RAM) during the ctor.
Aren't indexes loaded lazily? That's an important optimization for
merging, no? For performance reasons, opening an IndexReader
shouldn't do much more than open files. However, if we build a
more generic mechanism, we should not rely on that.
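To make the lazy-loading point concrete, here is a minimal sketch of a term index that loads nothing at construction time, so a divisor can still be set any time before first use. LazyTermIndex and all of its members are hypothetical names for illustration, not Lucene classes, and the "index" is just an array of term ordinals.

```java
// Hypothetical sketch, not Lucene code: the term index is built on first
// access, so construction stays cheap and the divisor can be set afterwards.
final class LazyTermIndex {
    private final int totalIndexedTerms;
    private int divisor = 1;   // assumed default: keep every indexed term
    private int[] loaded;      // null until the index is first needed

    LazyTermIndex(int totalIndexedTerms) {
        // Nothing is loaded here; we only remember how many terms exist.
        this.totalIndexedTerms = totalIndexedTerms;
    }

    synchronized void setDivisor(int divisor) {
        if (loaded != null) {
            throw new IllegalStateException("term index already loaded");
        }
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1: " + divisor);
        }
        this.divisor = divisor;
    }

    synchronized boolean isLoaded() {
        return loaded != null;
    }

    // Loads on first call, keeping every divisor-th term ordinal.
    synchronized int[] entries() {
        if (loaded == null) {
            int kept = (totalIndexedTerms + divisor - 1) / divisor;
            int[] result = new int[kept];
            for (int i = 0; i < kept; i++) {
                result[i] = i * divisor;
            }
            loaded = result;
        }
        return loaded;
    }
}
```

The setter only works while loading is still pending, which is exactly the dependency a more generic mechanism should avoid.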
What if, instead, we passed down a Properties instance to IndexReader
ctors? Or alternatively a dedicated class, eg,
"IndexReaderInitParameters"? The advantage of a dedicated class is
it's strongly typed at compile time, and, you could put things in
there like an optional DeletionPolicy instance as well. I think there
is a growing list of these sorts of "advanced optional parameters
used during init" that could be handled with such an approach?
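For illustration, a dedicated parameter class along these lines might look like the sketch below. Only the name IndexReaderInitParameters comes from the message above; the fields, the default of 1, the fluent setters, and SketchReader are assumptions, and the deletion policy is held as a plain Object stand-in rather than a real Lucene type.

```java
// Hypothetical sketch: a strongly typed bag of optional init parameters.
final class IndexReaderInitParameters {
    private int termIndexDivisor = 1;   // assumed default
    private Object deletionPolicy;      // stand-in for an optional DeletionPolicy

    IndexReaderInitParameters setTermIndexDivisor(int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1: " + divisor);
        }
        this.termIndexDivisor = divisor;
        return this;
    }

    IndexReaderInitParameters setDeletionPolicy(Object policy) {
        this.deletionPolicy = policy;
        return this;
    }

    int getTermIndexDivisor() {
        return termIndexDivisor;
    }

    Object getDeletionPolicy() {
        return deletionPolicy;
    }
}

// Hypothetical reader: the parameters arrive in the constructor, so the
// divisor is known before any terms would be loaded.
final class SketchReader {
    private final int termIndexDivisor;

    SketchReader(IndexReaderInitParameters params) {
        this.termIndexDivisor = params.getTermIndexDivisor();
    }

    int termIndexDivisor() {
        return termIndexDivisor;
    }
}
```

A misspelled setter name or a wrong argument type fails at compile time here, which is the main thing a string-keyed map gives up.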
(I probably should have read your entire message before starting to
respond... But it's nice to see that we think alike!) This is
similar to my (2) approach, but attempts to solve the typing issue,
although I'm not sure how...
The way we handle it in Hadoop is to pass around a <String,String>
map in the abstract kernel, then have concrete implementation
classes provide static methods that access it. So this might look
something like:
public class LuceneProperties extends Properties {
  // utility methods to handle conversion of values to and from Strings
  void setInt(String prop, int value);
  int getInt(String prop);
  void setClass(String prop, Class value);
  Class getClass(String prop);
  Object newInstance(String prop);
  ...
}
public class SegmentReaderProperties {
  private static final String DIVISOR_PROP =
    "org.apache.lucene.index.SegmentReader.divisor";
  public static void setTermIndexDivisor(LuceneProperties props, int i) {
    props.setInt(DIVISOR_PROP, i);
  }
}
Then the IndexReader constructor methods could accept a
LuceneProperties. No point in making this IndexReader specific,
since it might be useful for, e.g., IndexWriter, Searchers,
Directories, etc.
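A compilable version of the outline above, reduced to the int case, might look like the following. This is a sketch under assumptions: the default-value argument on getInt, the getTermIndexDivisor accessor, and the default divisor of 1 do not appear in the outline and are added here so the example is complete.

```java
import java.util.Properties;

// String-to-String property bag with typed conversion helpers.
class LuceneProperties extends Properties {
    private static final long serialVersionUID = 1L;

    void setInt(String prop, int value) {
        setProperty(prop, Integer.toString(value));
    }

    // Assumption: the caller supplies a default for unset properties.
    int getInt(String prop, int defaultValue) {
        String value = getProperty(prop);
        return value == null ? defaultValue : Integer.parseInt(value);
    }
}

// Static accessors keep the property key private and give callers a typed API.
final class SegmentReaderProperties {
    private static final String DIVISOR_PROP =
        "org.apache.lucene.index.SegmentReader.divisor";
    private static final int DEFAULT_DIVISOR = 1; // assumed default

    private SegmentReaderProperties() {}

    static void setTermIndexDivisor(LuceneProperties props, int i) {
        props.setInt(DIVISOR_PROP, i);
    }

    // Hypothetical getter, the counterpart a reader would call during init.
    static int getTermIndexDivisor(LuceneProperties props) {
        return props.getInt(DIVISOR_PROP, DEFAULT_DIVISOR);
    }
}
```

Callers never see the key string, so the storage stays untyped while the call sites are checked by the compiler.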
An advantage of a <String,String> map over a <String,Object> map
for Hadoop is that it's trivial to serialize.
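As a small illustration of the serialization point, a map holding only Strings can be written out and read back with the standard java.util.Properties store and load methods, with no custom serialization code. PropertiesRoundTrip is a hypothetical helper name; this shows the general idea, not how Hadoop itself serializes its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: text round trip for a String-to-String property map.
final class PropertiesRoundTrip {
    private PropertiesRoundTrip() {}

    static String toText(Properties props) {
        StringWriter out = new StringWriter();
        try {
            // store() rejects non-String keys or values with a ClassCastException.
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    static Properties fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

With Object values, each value type would need its own encoding before anything like this could work.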
Is this what you had in mind?
Doug
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]