I think it would be better to have separate IndexReaderProperties and IndexWriterProperties classes.

That seems like an easier API to maintain, and a more logical one, since it keeps related items together.

On Nov 8, 2007, at 12:04 PM, Doug Cutting wrote:

Michael McCandless wrote:
One thing is: I'd prefer to not use system property for this, since
it's so global, but I'm not sure how to better do it.

I agree. That was the quick-and-dirty hack. Ideally it should be a method on IndexReader. I can think of two ways to do that:

1. Add a generic method like IndexReader#setProperty(String,String).
2. Add a specific method like IndexReader#setTermIndexDivisor(int).

I slightly prefer the former, as it permits various IndexReader implementations to support arbitrary properties, at the expense of being untyped, but that might be overkill. Thoughts?
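To make the trade-off concrete, here is a minimal sketch of the two options side by side. SketchReader is a hypothetical stand-in, not Lucene's IndexReader, and the "termIndexDivisor" key is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a reader; this is not Lucene's actual IndexReader.
class SketchReader {
    private final Map<String, String> properties = new HashMap<>();
    private int termIndexDivisor = 1;

    // Option 1: generic and untyped. Any implementation can define its own keys.
    void setProperty(String name, String value) {
        properties.put(name, value);
    }

    String getProperty(String name) {
        return properties.get(name);
    }

    // Option 2: specific and typed. The compiler rejects a non-int argument.
    void setTermIndexDivisor(int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1: " + divisor);
        }
        termIndexDivisor = divisor;
    }

    int getTermIndexDivisor() {
        return termIndexDivisor;
    }
}

class SetterStylesDemo {
    public static void main(String[] args) {
        SketchReader reader = new SketchReader();
        // Compiles fine; the bad value would only be noticed when someone parses it.
        reader.setProperty("termIndexDivisor", "four");
        // Type-checked at compile time and range-checked at run time.
        reader.setTermIndexDivisor(4);
        System.out.println(reader.getProperty("termIndexDivisor")
            + " / " + reader.getTermIndexDivisor());
    }
}
```

The generic setter accepts a value that no reader could use as a divisor, which is the cost of being untyped. The specific setter catches that class of mistake but needs a new method for every option.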

We can't add a "setIndexDivisor(...)" method because the terms are
already loaded (consuming too much RAM) during the ctor.

Aren't indexes loaded lazily? That's an important optimization for merging, no? For performance reasons, opening an IndexReader shouldn't do much more than open files. However, if we build a more generic mechanism, we should not rely on that.
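As a rough illustration of why lazy loading matters here: if the term index is only read on first use, a setter called after construction can still take effect. LazyTermIndexReader and its in-memory term list are hypothetical; this is not a description of how SegmentReader actually loads terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader that defers loading its term index until first use.
class LazyTermIndexReader {
    private final List<String> allTerms; // stands in for the on-disk term index
    private int divisor = 1;
    private List<String> loadedTerms;    // null until the first lookup

    LazyTermIndexReader(List<String> allTerms) {
        // "Opening" only records where the data lives; nothing is loaded yet.
        this.allTerms = allTerms;
    }

    void setTermIndexDivisor(int divisor) {
        if (loadedTerms != null) {
            throw new IllegalStateException("term index already loaded");
        }
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1: " + divisor);
        }
        this.divisor = divisor;
    }

    boolean isLoaded() {
        return loadedTerms != null;
    }

    // Keeps every divisor-th term on the first call; later calls reuse the result.
    List<String> indexTerms() {
        if (loadedTerms == null) {
            List<String> sampled = new ArrayList<>();
            for (int i = 0; i < allTerms.size(); i += divisor) {
                sampled.add(allTerms.get(i));
            }
            loadedTerms = sampled;
        }
        return loadedTerms;
    }
}
```

The setter only works because nothing has been loaded yet, and it fails once the index is in memory. That ordering dependency is the fragile part of relying on laziness, and passing options at construction time avoids it.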

What if, instead, we passed down a Properties instance to IndexReader
ctors?  Or alternatively a dedicated class, eg,
"IndexReaderInitParameters"?  The advantage of a dedicated class is
it's strongly typed at compile time, and, you could put things in
there like an optional DeletionPolicy instance as well. I think there
is a growing list of these sorts of "advanced optional parameters
used during init" that could be handled with such an approach?
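A dedicated parameter class along these lines might look roughly like the sketch below. The field names, the chained setters, and the DeletionPolicy stand-in interface are all assumptions for illustration; none of this is an existing Lucene API.

```java
// Hypothetical stand-in for the deletion policy mentioned above.
interface DeletionPolicy {
    boolean shouldDelete(String commitName);
}

// Hypothetical parameter object; each option is a typed field with a default.
final class IndexReaderInitParameters {
    private int termIndexDivisor = 1;
    private DeletionPolicy deletionPolicy; // optional; null means "use the default"

    IndexReaderInitParameters setTermIndexDivisor(int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1: " + divisor);
        }
        this.termIndexDivisor = divisor;
        return this;
    }

    IndexReaderInitParameters setDeletionPolicy(DeletionPolicy policy) {
        this.deletionPolicy = policy;
        return this;
    }

    int getTermIndexDivisor() {
        return termIndexDivisor;
    }

    DeletionPolicy getDeletionPolicy() {
        return deletionPolicy;
    }
}

// Hypothetical reader that sees every option before it loads anything.
final class ConfiguredReader {
    private final int termIndexDivisor;
    private final DeletionPolicy deletionPolicy;

    ConfiguredReader(IndexReaderInitParameters params) {
        this.termIndexDivisor = params.getTermIndexDivisor();
        this.deletionPolicy = params.getDeletionPolicy();
    }

    int getTermIndexDivisor() {
        return termIndexDivisor;
    }

    boolean hasCustomDeletionPolicy() {
        return deletionPolicy != null;
    }
}
```

Because the constructor receives the whole parameter object, it does not matter whether the terms are loaded eagerly or lazily, and a misspelled or mistyped option fails at compile time.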

(I probably should have read your entire message before starting to respond... But it's nice to see that we think alike!) This is similar to my (2) approach, but attempts to solve the typing issue, although I'm not sure how...

The way we handle it in Hadoop is to pass around a <String,String> map in the abstract kernel, then have concrete implementation classes provide static methods that access it. So this might look something like:

public class LuceneProperties extends Properties {
  // utility methods to handle conversion of values to and from Strings
  void setInt(String prop, int value);
  int getInt(String prop);
  void setClass(String prop, Class value);
  Class getClass(String prop);
  Object newInstance(String prop);
  ...
}

public class SegmentReaderProperties {
  private static final String DIVISOR_PROP =
    "org.apache.lucene.index.SegmentReader.divisor";
  public static void setTermIndexDivisor(LuceneProperties props, int i) {
    props.setInt(DIVISOR_PROP, i);
  }
}

Then the IndexReader constructor methods could accept a LuceneProperties. No point in making this IndexReader-specific, since it might be useful for, e.g., IndexWriter, Searchers, Directories, etc.
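Filled in far enough to compile, the outline above could look something like this. The method bodies, the getInt overload with a default value, the getTermIndexDivisor accessor, and PropertiesBackedReader are assumptions added for illustration; only the class names, the property key, and the setter come from the outline.

```java
import java.util.Properties;

// Typed convenience methods over a String-to-String property map.
class LuceneProperties extends Properties {
    void setInt(String prop, int value) {
        setProperty(prop, Integer.toString(value));
    }

    // Assumed overload: returns defaultValue when the property is unset.
    int getInt(String prop, int defaultValue) {
        String value = getProperty(prop);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    void setClass(String prop, Class<?> value) {
        setProperty(prop, value.getName());
    }

    // Overloads Object#getClass(); throws NullPointerException if prop is unset.
    Class<?> getClass(String prop) throws ClassNotFoundException {
        return Class.forName(getProperty(prop));
    }

    // Instantiates the named class through its no-argument constructor.
    Object newInstance(String prop) throws ReflectiveOperationException {
        return getClass(prop).getDeclaredConstructor().newInstance();
    }
}

// Static accessors keep the property key private to one class.
final class SegmentReaderProperties {
    private static final String DIVISOR_PROP =
        "org.apache.lucene.index.SegmentReader.divisor";

    private SegmentReaderProperties() {}

    static void setTermIndexDivisor(LuceneProperties props, int i) {
        props.setInt(DIVISOR_PROP, i);
    }

    static int getTermIndexDivisor(LuceneProperties props) {
        return props.getInt(DIVISOR_PROP, 1);
    }
}

// Hypothetical reader whose constructor accepts the shared property map.
final class PropertiesBackedReader {
    private final int termIndexDivisor;

    PropertiesBackedReader(LuceneProperties props) {
        this.termIndexDivisor = SegmentReaderProperties.getTermIndexDivisor(props);
    }

    int getTermIndexDivisor() {
        return termIndexDivisor;
    }
}
```

Callers only ever touch the typed static methods, so the untyped map stays an implementation detail while remaining generic enough to hand to other components.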

An advantage of a <String,String> map over a <String,Object> map for Hadoop is that it's trivial to serialize.
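For instance, a String-to-String map can be written out and read back using nothing but java.util.Properties itself. The PropertiesRoundTrip helper below is a hypothetical name, and this is only a sketch of the idea, not how Hadoop serializes its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper: round-trips a String-to-String map through plain text.
class PropertiesRoundTrip {
    static String toText(Properties props) throws IOException {
        StringWriter out = new StringWriter();
        // A null comment skips the header line; store() still writes a date comment.
        props.store(out, null);
        return out.toString();
    }

    static Properties fromText(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }
}
```

With Object values, every value type would need its own serialization story. With Strings, the typed accessors do the parsing after the map has been restored.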

Is this what you had in mind?

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


