Re: Doc IDs via IndexReader?

Shai Erera Wed, 22 Jul 2009 21:20:14 -0700

off the top of my head, if you have in hand all the doc IDs that were
returned so far, you can do this:
1) Build a Filter which will return any doc ID that is not in that list. For
example, pass it the list of doc IDs and every time next() or skipTo is
called, it will skip over the given doc IDs.
2) Execute a MatchAllDocsQuery, together w/ the above filter.
3) The result will be the list of doc IDs the filter accepted, namely those
that were not returned so far.
4) Delete each doc ID.


This will make sure you attempt to delete only docs that are not deleted
yet.

Another approach you can take is iterate from 0 to reader.maxDoc() and
delete only the docs that were not retrieved so far. with this, you may
attempt to delete an already deleted document. If you want to avoid it, call
reader.isDeleted(id) and delete it only if it returns false.

Hope this helps,

Shai

On Thu, Jul 23, 2009 at 5:58 AM, Anuj Bhatt <anuj.bh...@gmail.com> wrote:

> Hi,
>
> I'm relatively new to Lucene. I have the following case: I have
> indexed a bunch of documents. I then, query the index using
> IndexSearcher and retrieve the documents using Hits (I do know this is
> deprecated -- I'm using v 2.4.1). So, I do this for a set of queries
> and maintain which documents are returned to each one. In the end of
> it all, I have a list of documents maintained (more specifically, the
> hits.id(some_iterator_int) associated with the doc). Now, I wish to
> delete the documents which have not been returned for any query, from
> the index. How can I do this?
>
> My initial assumption was that I could retrieve all the doc ids from
> IndexReader and just traverse the list that I have maintained, if it
> is in the list, I don't delete it otherwise I do. Looking around
> didn't yield anything, and hence the mail.
>
>
> Any suggestions?
>
>
> Regards,
> Anuj
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

Re: Doc IDs via IndexReader?

Reply via email to