On Fri, Mar 13, 2009 at 4:03 PM, Marvin Humphrey <[email protected]> wrote:
> Every once in a while, the
> segment merging algorithm decides that it needs to perform a big
> consolidation, and you have to wait while it finishes.

Yes, but that's an artifact of the current approach of adding segments
rather than making real-time replacements. I was wondering more whether
there is anything inherent in the required rate of change that would
prevent a fully incremental update from working. If it could be pulled
off, I think the advantages are large: no degradation due to accumulated
changes, and no periodic long merges. There's also the benefit that any
changes written are likely to be hot in the cache, so no warm-up is
needed.

> How does this interface look?
>
>   package MyDelWriter;
>   use base qw( Lucy::Index::DeletionsWriter );
>   ...

This feels to me like it is solving the wrong problem. There's nothing
wrong with it, but DeletionsWriter and DeletionsReader seem like internal
implementation details of a particular type of Index. Should the callers
even have to know about their existence?

I'd hope that the interface between a Scorer and an Index could be very
simple, probably just a single function to get a PostingList. That
PostingList would provide navigation by docID, but deletions would be
handled internally and never be seen by the Scorer.

For indexing, I'd love to see the same agnostic behaviour. The Indexer
knows only about a single function like UpdatePosting(docID,
newPostings). Whether this is done internally via tombstones, real-time
updates or carrier pigeon is hidden from the caller.

So while the interface you propose is probably great for making small
modifications to the current Index, I'd rather it not be part of the
official API that all Index formats must support. I want each component
to make as few assumptions as possible about the internals of other
components.

My canonical example for this is that I want to be able to store my index
in SQLite, and write a thin layer of interface between it and the rest of
Lucy. But my real desire is to substitute a custom mmap() solution such
as the fast graph database referenced earlier.

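To make the shape of that boundary concrete, here is a rough sketch. It
is in Python purely for brevity, and every name in it (Index,
posting_list, update_posting, TombstoneIndex, InPlaceIndex,
matching_docs) is hypothetical: none of this is Lucy's actual API. It
only assumes that postings can be reduced to a set of terms per docID.
Two backends handle replacement differently, and the caller cannot tell
them apart.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Set, Tuple


class Index(ABC):
    """Hypothetical minimal surface exposed to the rest of the engine."""

    @abstractmethod
    def posting_list(self, term: str) -> Iterator[int]:
        """Yield live docIDs for `term` in ascending order."""

    @abstractmethod
    def update_posting(self, doc_id: int, new_postings: Iterable[str]) -> None:
        """Replace everything indexed for `doc_id` with `new_postings`."""


class TombstoneIndex(Index):
    """Append-only backend: stale entries are masked, never removed."""

    def __init__(self) -> None:
        # term -> list of (doc_id, version) entries, oldest first
        self._entries: Dict[str, List[Tuple[int, int]]] = {}
        # doc_id -> latest version; older versions are implicitly deleted
        self._current: Dict[int, int] = {}

    def update_posting(self, doc_id: int, new_postings: Iterable[str]) -> None:
        version = self._current.get(doc_id, 0) + 1
        self._current[doc_id] = version
        for term in set(new_postings):
            self._entries.setdefault(term, []).append((doc_id, version))

    def posting_list(self, term: str) -> Iterator[int]:
        # Deletions are filtered here, so the caller never sees them.
        live = {d for d, v in self._entries.get(term, [])
                if self._current.get(d) == v}
        return iter(sorted(live))


class InPlaceIndex(Index):
    """Backend that rewrites postings immediately on update."""

    def __init__(self) -> None:
        self._terms_by_doc: Dict[int, Set[str]] = {}
        self._docs_by_term: Dict[str, Set[int]] = {}

    def update_posting(self, doc_id: int, new_postings: Iterable[str]) -> None:
        # Remove the old postings for this document first.
        for term in self._terms_by_doc.pop(doc_id, set()):
            self._docs_by_term[term].discard(doc_id)
        terms = set(new_postings)
        self._terms_by_doc[doc_id] = terms
        for term in terms:
            self._docs_by_term.setdefault(term, set()).add(doc_id)

    def posting_list(self, term: str) -> Iterator[int]:
        return iter(sorted(self._docs_by_term.get(term, set())))


def matching_docs(index: Index, term: str) -> List[int]:
    """Stand-in for a Scorer: it depends only on posting_list()."""
    return list(index.posting_list(term))
```

A SQLite-backed or mmap()-backed Index would implement the same two
methods, and matching_docs() would not change. Deleting a document is
just update_posting(doc_id, []).
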
I think the easiest way to make this possible is to reduce the points of
intersection between the components to the simplest set possible.
Instead of specifying a full internal API for each component, specify
(and restrict) only the portions visible to the rest of the program.

Nathan Kurz
[email protected]
