"Chris Hostetter" <[EMAIL PROTECTED]> wrote:
> I haven't really delved into the MergePolicy work that's been done, but a
> recent Jira comment got me poking around the javadocs: MergePolicy is
> a public interface, which suggests clients are allowed to implement it,
> leading me to wonder about two things...
>
> 1) Writing a MergePolicy requires knowing about the package-protected
> SegmentInfos class ... how do we expect people to make that work? (I know
> we've said in the past that people shouldn't have to implement classes in
> the o.a.l namespace just to make things work for them.)
Good point. Currently your class (implementing MergePolicy) must be
part of the o.a.l.index package, so you can see the package-protected
SegmentInfos/SegmentInfo classes. I had thought that was OK.
Is it really so bad to require users to put their class into the
o.a.l.index package, when what they are doing is a very advanced
thing?
The only other option I can see is to make SegmentInfos/SegmentInfo
public.
Maybe we should add API warning caveats in the javadocs ("this API is
advanced & new & may change") like we have now for Payloads, and leave
the package-protection in place for now to limit usage to brave early
adopters (even if we intend later to make things public)?
> 2) should we instead make this an abstract base class to help "future
> proof" ourselves against wanting to add support for more "optional"
> methods we might want to allow MergePolicies to specify?
>
> (This is the age-old interface vs. base class discussion: providing a
> base class allows us to add support for new methods later by providing
> defaults, while interfaces can never be changed except in major releases,
> i.e. X.0.)
>
> For example: suppose down the road we want to support an option like Yonik
> describes here...
>
> https://issues.apache.org/jira/browse/LUCENE-1043?#action_12539675
> > More controversial: maybe even expand the number of docs that can be
> > bulk copied by not bothering to remove deleted docs if it's some very small
> > number (unless it's an optimize). This is probably not worth it.
>
> ...this is the kind of thing a MergePolicy could specify with some new
> method...
>
>   public float getMaxAllowedPercentageOfDeletedDocsIgnored() {
>     return 0.0f;
>   }
>
> ...that individual MergePolicies could override.
Switching to an abstract base class is a good idea. I think it's
important to reserve the freedom to add default methods between
major releases. I'll work out a patch.
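For illustration, a minimal sketch of what the switch buys us. All names here (MergePolicySketch, LegacyPolicy, TolerantPolicy, canSkipPurgingDeletes, useCompoundFile as the "required" hook) are hypothetical and are not Lucene's actual API. The point is that a method added to the base class after the fact ships with a default body, so a subclass written against the older version still compiles and keeps the old behavior, while a newer subclass can opt in by overriding it:

```java
// Hypothetical sketch: none of these names are Lucene's real MergePolicy API.
abstract class MergePolicySketch {
    // A required hook every policy must implement (hypothetical).
    abstract boolean useCompoundFile();

    // Imagine this was added in a later minor release. Because it has a
    // default body, subclasses written earlier still compile unchanged.
    // The 0.0f default means any deleted docs must still be purged.
    public float getMaxAllowedPercentageOfDeletedDocsIgnored() {
        return 0.0f;
    }

    // Hypothetical helper a writer might call: may this segment be bulk
    // copied without purging its deleted docs? Never during optimize.
    final boolean canSkipPurgingDeletes(int deletedDocs, int maxDoc, boolean optimizing) {
        if (optimizing || maxDoc <= 0) {
            return false;
        }
        float pct = 100.0f * deletedDocs / maxDoc;
        return pct <= getMaxAllowedPercentageOfDeletedDocsIgnored();
    }
}

// Written against the "old" base class: knows nothing about the new method.
class LegacyPolicy extends MergePolicySketch {
    boolean useCompoundFile() { return true; }
}

// Newer policy that opts in to ignoring up to 1% deleted docs.
class TolerantPolicy extends MergePolicySketch {
    boolean useCompoundFile() { return true; }

    @Override
    public float getMaxAllowedPercentageOfDeletedDocsIgnored() {
        return 1.0f;
    }
}

public class MergePolicyDefaultsDemo {
    public static void main(String[] args) {
        // 5 deleted out of 1000 docs is 0.5%
        System.out.println(new LegacyPolicy().canSkipPurgingDeletes(5, 1000, false));   // false
        System.out.println(new TolerantPolicy().canSkipPurgingDeletes(5, 1000, false)); // true
        System.out.println(new TolerantPolicy().canSkipPurgingDeletes(5, 1000, true));  // false (optimize)
    }
}
```

With an interface, adding getMaxAllowedPercentageOfDeletedDocsIgnored() would have broken every existing implementation; here LegacyPolicy is untouched.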
> Perhaps the broader question is: do we really want/expect people to write
> their own MergePolicies, or is the interface just to provide an
> abstraction for picking one of the provided impls? ... In that case, it
> seems like we should lock down the API a bit more (we can always open it
> up later).
I *think* people will want to implement their own merge policies,
though it is of course hard to tell at this point :). E.g., use cases:
customize optimize to NOT merge the very large segments; favor merging
segments that have many pending deletes; postpone heavy merging until
overnight when search traffic is low; make a merge policy that's free
to merge non-adjacent segments (though we can't do that one until we
fix IndexWriter to accept such a MergeSpecification).
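As a rough sketch of the first use case, assuming a hypothetical SegmentStats stand-in for SegmentInfo (name and doc count only) and a hypothetical maxSegmentDocs cutoff: an optimize that leaves very large segments alone and only merges contiguous runs of smaller segments, since IndexWriter does not yet accept a MergeSpecification that merges non-adjacent segments. None of these names are Lucene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for SegmentInfo: just a name and a doc count.
final class SegmentStats {
    final String name;
    final int maxDoc;

    SegmentStats(String name, int maxDoc) {
        this.name = name;
        this.maxDoc = maxDoc;
    }
}

// Hypothetical policy: optimize everything except very large segments.
final class SkipLargeSegmentsOptimizer {
    private final int maxSegmentDocs; // hypothetical size cutoff

    SkipLargeSegmentsOptimizer(int maxSegmentDocs) {
        this.maxSegmentDocs = maxSegmentDocs;
    }

    // Returns the merges to perform: each is a contiguous run (in index
    // order) of at least two segments at or below the cutoff. Large
    // segments are left untouched and split the runs, so no merge ever
    // spans non-adjacent segments.
    List<List<SegmentStats>> findOptimizeMerges(List<SegmentStats> segments) {
        List<List<SegmentStats>> merges = new ArrayList<>();
        List<SegmentStats> run = new ArrayList<>();
        for (SegmentStats s : segments) {
            if (s.maxDoc > maxSegmentDocs) {
                if (run.size() > 1) {
                    merges.add(run);
                }
                run = new ArrayList<>();
            } else {
                run.add(s);
            }
        }
        if (run.size() > 1) {
            merges.add(run);
        }
        return merges;
    }
}

public class SkipLargeSegmentsDemo {
    public static void main(String[] args) {
        List<SegmentStats> segs = Arrays.asList(
            new SegmentStats("_a", 100), new SegmentStats("_b", 200),
            new SegmentStats("_big", 1_000_000),
            new SegmentStats("_c", 50), new SegmentStats("_d", 60), new SegmentStats("_e", 70),
            new SegmentStats("_huge", 2_000_000),
            new SegmentStats("_g", 10));
        List<List<SegmentStats>> merges = new SkipLargeSegmentsOptimizer(100_000).findOptimizeMerges(segs);
        for (List<SegmentStats> m : merges) {
            StringBuilder sb = new StringBuilder();
            for (SegmentStats s : m) {
                sb.append(s.name).append(' ');
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```

Here the two large segments stay as they are, "_a _b" and "_c _d _e" each become one merge, and the lone trailing "_g" is left alone. A policy favoring segments with many pending deletes could rank these runs instead of merging them all.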
Mike