Doug Cutting:
> Folks are discussing whether generics are a special case for
> back-compatibility. This is an important discussion, since major
> releases are defined by their back-compatibility. This discussion thus
> should have priority over the discussion of new 3.0 features.
Okeedoke. Since I'm working on this right now for KS, though, I'd like to
continue the conversation under a new thread heading.
I have a bunch of file format changes to push through, and I'm hoping to
implement them using pluggable modules. For instance, I'd like to be able to
swap out bit-vector-based deletions for tombstone-based deletions, just by
overriding a method or two.
Jason Rutherglen:
> Decoupling IndexReader for 3.0 would be great. This includes making
> SegmentReader and MultiSegmentReader public.
I definitely think that IndexReader can and should be made more pluggable. Is
exposing per-segment sub-readers a definite win, though? Does it make sense
to leave open the door to index components which don't operate on segments?
Or even to eliminate SegmentReader entirely and have sub-components of
IndexReader manage collation?
I've been thinking about this with regard to tombstone-based deletions, where
you can't know everything about a segment unless you've opened up other
segments.
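To make the tombstone problem concrete, here is a minimal sketch. Everything in it (the Segment class, addTombstone, deletedDocs) is a hypothetical name invented for illustration, not KS or Lucene API. It assumes each segment may carry tombstones aimed at documents in older segments, so the deletions for one segment can only be computed by consulting every newer segment.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TombstoneSketch {
    /** Hypothetical segment: a doc count plus tombstones aimed at older segments. */
    static final class Segment {
        final int docCount;
        // Key: index of the older segment. Value: doc ids deleted in that segment.
        final Map<Integer, BitSet> tombstones = new HashMap<>();

        Segment(int docCount) {
            this.docCount = docCount;
        }

        void addTombstone(int targetSegment, int docId) {
            tombstones.computeIfAbsent(targetSegment, k -> new BitSet()).set(docId);
        }
    }

    /** Collects deletions for one segment by scanning every newer segment. */
    static BitSet deletedDocs(List<Segment> segments, int target) {
        BitSet deleted = new BitSet(segments.get(target).docCount);
        for (int i = target + 1; i < segments.size(); i++) {
            BitSet found = segments.get(i).tombstones.get(target);
            if (found != null) {
                deleted.or(found);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(5));
        segments.add(new Segment(3));
        segments.add(new Segment(2));
        segments.get(1).addTombstone(0, 2); // segment 1 deletes doc 2 of segment 0
        segments.get(2).addTombstone(0, 4); // segment 2 deletes doc 4 of segment 0
        segments.get(2).addTombstone(1, 0); // segment 2 deletes doc 0 of segment 1

        System.out.println(deletedDocs(segments, 0)); // prints {2, 4}
        System.out.println(deletedDocs(segments, 1)); // prints {0}
    }
}
```

A reader for segment 0 built in isolation would report no deletions at all, which is why a purely per-segment constructor is awkward under this scheme.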
> A constructor like new SegmentReader(TermsDictionary termDictionary,
> TermPostings termPostings, ColumnStrideFields csd, DocIdBitSet deletedDocs);
You end up with a proliferation of constructors that way. Term vectors?
Arbitrary auxiliary components such as an R-tree component supporting
geographic search?
My original proposal to clean this up involved an "IndexComponent" class.
However, when I started implementing it, I ended up with a slew of new classes
with only two factory methods each.
We could possibly move those factory methods up into Schema, but I'm reluctant
to dirty it up, since it's a major public class in KS (as I anticipate it will
be in Lucy) and major public classes should be as simple as possible.
So, how about an IndexArchitecture or IndexPlan class?
    class MyArchitecture extends IndexArchitecture {
        public PostingsWriter PostingsWriter() {
            return new PForDeltaPostingsWriter();
        }
        public PostingsReader PostingsReader() {
            return new PForDeltaPostingsReader();
        }
        public DeletionsWriter DeletionsWriter() {
            return new TombstoneWriter();
        }
        public DeletionsReader DeletionsReader() {
            return new TombstoneReader();
        }
    }
Lucene:
    IndexWriter writer = new IndexWriter("/path/to/index",
        new StandardAnalyzer(), new MyArchitecture());
Lucy with Java bindings:
    class MySchema extends Schema {
        public MySchema() {
            initField("title", "text");
            initField("content", "text");
        }
        public IndexArchitecture indexArchitecture() {
            return new MyArchitecture();
        }
        public Analyzer analyzer() {
            return new PolyAnalyzer("en");
        }
    }
    IndexWriter writer = new IndexWriter(MySchema.open("/path/to/index"));
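For the "override a method or two" goal to work, the base class would need to supply defaults for every component. Here is a rough, self-contained sketch of that idea. The interfaces are cut down to a single format() method, the lowerCamelCase factory names and the BitVectorWriter and DefaultPostingsWriter classes are assumptions made for illustration, and none of this is existing KS, Lucy, or Lucene API.

```java
public class ArchitectureSketch {
    // Hypothetical, drastically simplified component interfaces.
    interface PostingsWriter { String format(); }
    interface DeletionsWriter { String format(); }

    static final class DefaultPostingsWriter implements PostingsWriter {
        public String format() { return "default-postings"; }
    }
    static final class BitVectorWriter implements DeletionsWriter {
        public String format() { return "bit-vector"; }
    }
    static final class TombstoneWriter implements DeletionsWriter {
        public String format() { return "tombstone"; }
    }

    /** Base class: one factory method per component, each with a default. */
    static class IndexArchitecture {
        public PostingsWriter postingsWriter() { return new DefaultPostingsWriter(); }
        public DeletionsWriter deletionsWriter() { return new BitVectorWriter(); }
    }

    /** Swaps the deletions scheme only; postings keep the default. */
    static class TombstoneArchitecture extends IndexArchitecture {
        @Override
        public DeletionsWriter deletionsWriter() { return new TombstoneWriter(); }
    }

    public static void main(String[] args) {
        IndexArchitecture arch = new TombstoneArchitecture();
        System.out.println(arch.postingsWriter().format());  // prints default-postings
        System.out.println(arch.deletionsWriter().format()); // prints tombstone
    }
}
```

Adding a new kind of component (term vectors, an R-tree) then means adding one factory method with a default to the base class, rather than another constructor signature.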
> Decouple rollback, commit, IndexDeletionPolicy from DirectoryIndexReader
> into a class like SegmentsVersionSystem which could act as the controller
> for reopen types of methods. There could be a SegmentVersionSystem that
> manages the versioning of a single segment.
I like it. :)
Sometimes you want to change up the merge policy for different writers against
the same index. How does that fit into your plan?
My thought is that merge-policies would be application-specific rather than
index-specific.
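Here is roughly what I mean, as a sketch only. The MergePolicy interface, the two policies, and the Writer class below are all made-up stand-ins rather than real Lucene or KS classes. The point is just that the policy is handed to each writer, not stored with the index, so two writers can open the same path with different policies.

```java
import java.util.ArrayList;
import java.util.List;

public class MergePolicySketch {
    /** Hypothetical policy: picks which segments to merge from their doc counts. */
    interface MergePolicy {
        List<Integer> selectMerges(List<Integer> segmentDocCounts);
    }

    /** Merges only segments smaller than a threshold (e.g. a low-latency writer). */
    static final class MergeSmall implements MergePolicy {
        private final int threshold;

        MergeSmall(int threshold) {
            this.threshold = threshold;
        }

        public List<Integer> selectMerges(List<Integer> counts) {
            List<Integer> picks = new ArrayList<>();
            for (int i = 0; i < counts.size(); i++) {
                if (counts.get(i) < threshold) {
                    picks.add(i);
                }
            }
            return picks;
        }
    }

    /** Merges everything (e.g. a nightly optimizing writer). */
    static final class MergeAll implements MergePolicy {
        public List<Integer> selectMerges(List<Integer> counts) {
            List<Integer> picks = new ArrayList<>();
            for (int i = 0; i < counts.size(); i++) {
                picks.add(i);
            }
            return picks;
        }
    }

    /** Hypothetical writer: the policy belongs to the writer, not the index. */
    static final class Writer {
        final String indexPath;
        final MergePolicy policy;

        Writer(String indexPath, MergePolicy policy) {
            this.indexPath = indexPath;
            this.policy = policy;
        }
    }

    public static void main(String[] args) {
        List<Integer> docCounts = List.of(1000, 10, 3);
        Writer quick = new Writer("/path/to/index", new MergeSmall(100));
        Writer nightly = new Writer("/path/to/index", new MergeAll());
        System.out.println(quick.policy.selectMerges(docCounts));   // prints [1, 2]
        System.out.println(nightly.policy.selectMerges(docCounts)); // prints [0, 1, 2]
    }
}
```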
Marvin Humphrey