[ https://issues.apache.org/jira/browse/LUCENE-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881502#action_12881502 ]
Karthick Sankarachary commented on LUCENE-2348:
-----------------------------------------------
{quote}1. If your filterable data is in another store (e.g. a database), then
you would still need either some way to get to the top level reader or a way to
know what its offset is, but there is no way to get that information from the
reader which was passed in.{quote}
In theory, one could obtain a top-level reader from a segment reader as
follows: IndexReader.open(((SegmentReader) reader).directory()), where reader
is the one passed to the filter. Of course, a top-level reader obtained this
way might be slightly "ahead" of the segment reader's actual parent, because
it is opened later and so may see newer commits. If you think it makes sense,
I can add a convenience method to StatefulFilter that obtains the top-level
reader using this approach.
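
As a rough illustration of the "offset" mentioned in the quoted point, the
sketch below computes each segment's starting position in the top-level doc ID
space from the per-segment document counts. The class and method names
(DocBases, docBases, toGlobal) are hypothetical and not part of Lucene; the
sketch only assumes that segments are numbered consecutively in order.

```java
// Hypothetical helper, NOT Lucene API: models how a segment-local doc ID
// maps to a top-level doc ID when segments are laid out back to back.
final class DocBases {

    // Returns the starting (base) doc ID of each segment, given the number
    // of documents in each segment, in segment order.
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    // Converts a segment-local doc ID to a top-level doc ID.
    static int toGlobal(int[] bases, int segment, int localDoc) {
        return bases[segment] + localDoc;
    }
}
```

A filter that only receives the segment reader has no access to this base,
which is the gap the quoted point describes.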
{quote}2. If you want to return the newest item instead of the oldest item, it
will be too late if getStatefulDocIdSet for an earlier call has already
returned the older one.{quote}
Actually, if you create a DuplicateFilter with keepMode set to
KM_USE_FIRST_OCCURRENCE, it returns the document from the first matching
segment and ignores the ones in subsequent segments (thanks to its stateful
behavior). However, the current approach breaks when keepMode is set to
KM_USE_LAST_OCCURRENCE. Again, in theory, if we could determine whether the
reader corresponds to the last segment, we could defer all matches until
after the last reader has been processed. I'm open to any other suggestions
you might have for addressing that case.
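
The difference between the two keep modes can be sketched with a simplified
model that does not use Lucene at all. Here each segment is assumed to be a
list of field values indexed by segment-local doc ID; CrossSegmentDedup,
keepFirst and keepLast are hypothetical names. The sketch only illustrates
why first-occurrence can be answered segment by segment while last-occurrence
has to wait for the final segment; it is not the patch's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, NOT Lucene API: a segment is a list of field values,
// and a value's index in the list is its segment-local doc ID.
final class CrossSegmentDedup {

    // First-occurrence mode: the set of values already seen is carried from
    // one segment to the next, so each segment can be answered immediately.
    // Returns, per segment, the local doc IDs that are kept.
    static List<List<Integer>> keepFirst(List<List<String>> segments) {
        Set<String> seen = new HashSet<>();
        List<List<Integer>> kept = new ArrayList<>();
        for (List<String> segment : segments) {
            List<Integer> local = new ArrayList<>();
            for (int doc = 0; doc < segment.size(); doc++) {
                if (seen.add(segment.get(doc))) { // true only for a new value
                    local.add(doc);
                }
            }
            kept.add(local);
        }
        return kept;
    }

    // Last-occurrence mode: a later segment can override an earlier match,
    // so nothing is final until every segment has been processed.
    // Returns the kept doc IDs in top-level (global) numbering, ascending.
    static List<Integer> keepLast(List<List<String>> segments) {
        Map<String, Integer> lastGlobalDoc = new HashMap<>();
        int docBase = 0;
        for (List<String> segment : segments) {
            for (int doc = 0; doc < segment.size(); doc++) {
                lastGlobalDoc.put(segment.get(doc), docBase + doc);
            }
            docBase += segment.size();
        }
        return new ArrayList<>(new TreeSet<>(lastGlobalDoc.values()));
    }
}
```

Note that keepLast reports its result only after the loop over all segments
ends, which mirrors the "defer all matches" idea and also shows why a result
already returned for an earlier segment cannot be corrected afterwards.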
> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment
> readers
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-2348
> URL: https://issues.apache.org/jira/browse/LUCENE-2348
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/*
> Affects Versions: 2.9.2
> Reporter: Trejkaz
> Attachments: LUCENE-2348.patch
>
>
> DuplicateFilter currently works by building a single doc ID set, without
> taking into account that getDocIdSet() will be called once per segment and
> only with each segment's local reader.