Hi All,
I finally have some time to upgrade our Lucene from the 2.4 series
to the 2.9 series. Everything compiles fine, but at runtime I now get
an UnsupportedOperationException that I did not see before:
java.lang.UnsupportedOperationException
    at org.apache.lucene.index.AbstractAllTermDocs.seek(AbstractAllTermDocs.java:42)
    at org.apache.lucene.index.DirectoryReader$MultiTermDocs.termDocs(DirectoryReader.java:1186)
    at org.apache.lucene.index.DirectoryReader$MultiTermDocs.next(DirectoryReader.java:1118)
    at org.expasy.core.index.SubQueryFilter.fastForLargeResultSets(SubQueryFilter.java:129)
Below is the code that triggers this, followed by an explanation of
what it tries to achieve.
private void fastForLargeResultSets(String foreignField, BitSet bits,
        TermDocs docs, TermDocs foreignDocs, IndexReader foreignReader,
        BitSet queryResults)
    throws IOException
{
    int start = queryResults.nextSetBit(0);
    TermEnum foreignEnum = foreignReader.terms(new Term(foreignField, ""));
    while (foreignEnum.next())
    {
        Term term = foreignEnum.term();
        if (term == null || !term.field().equals(foreignField))
            break;
        if (!term.text().equals("not_null"))
        {
            foreignDocs.skipTo(start);
            foreignDocs.seek(term);
            // Source of the exception in my code (the next() call below)
            while (foreignDocs.next())
            {
                int doc = foreignDocs.doc();
                if (queryResults.get(doc))
                {
                    foreignDocs.skipTo(doc);
                    if (term != null && term.text() != null)
                        buffer.add(term.text());
                }
                // Use a buffer to avoid jumping around on disk too much.
                if (buffer.size() >= BUFFERSIZE)
                {
                    emptyBuffer(buffer, bits, docs);
                }
            }
        }
    }
    if (!buffer.isEmpty())
    {
        emptyBuffer(buffer, bits, docs);
    }
}
The purpose of this code is to fill a BitSet that is used as a filter.
The filter finds the documents in index a that are referenced by a
linking key value in index b.
While resource intensive, this code path was quite fast when millions
of documents in index b point to millions of documents in index a,
i.e. it creates a "join" between two queries on different indexes.
For a live example, see
http://www.uniprot.org/uniprot/?query=citation%3A%28author%3Afink%29
This is a search for "fink" in the field "author" in the "citation"
index. For each document in the "citation" index that matches the term
"fink" in the field "author", the code retrieves the terms that contain
a uniquely identifying key value for documents in the "uniprot" index.
It then generates a BitSet used to filter the documents in the
"uniprot" index (done in the emptyBuffer method).
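To make the intent concrete, here is a minimal, library-free sketch of
the join. It is not the real code: it uses plain java.util collections
instead of Lucene readers, and the names JoinSketch, join, flush,
foreignPostings and keyToDoc are made up for illustration. The real
emptyBuffer is not shown in this mail; flush only stands in for it,
under the assumption that each key identifies exactly one document in
the target index. Unlike the method above, the sketch also stops at the
first matching document per key.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Library-free sketch of the cross-index "join"; no Lucene classes are used. */
class JoinSketch {
    // Tiny on purpose so the example exercises the flush path.
    static final int BUFFER_SIZE = 2;

    /**
     * foreignPostings: key term -> ids of index-b documents containing that key.
     * keyToDoc:        key term -> id of the index-a document identified by that key.
     * queryResults:    index-b documents matched by the original query.
     * Returns the index-a documents reachable from the matched index-b documents.
     */
    static BitSet join(Map<String, int[]> foreignPostings,
                       Map<String, Integer> keyToDoc,
                       BitSet queryResults) {
        BitSet bits = new BitSet();
        List<String> buffer = new ArrayList<String>();
        for (Map.Entry<String, int[]> entry : foreignPostings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (queryResults.get(doc)) {
                    buffer.add(entry.getKey());
                    break; // simplification: one hit is enough to keep this key
                }
            }
            if (buffer.size() >= BUFFER_SIZE) {
                flush(buffer, keyToDoc, bits);
            }
        }
        flush(buffer, keyToDoc, bits);
        return bits;
    }

    /** Hypothetical stand-in for emptyBuffer: resolve buffered keys to index-a doc ids. */
    static void flush(List<String> buffer, Map<String, Integer> keyToDoc, BitSet bits) {
        for (String key : buffer) {
            Integer doc = keyToDoc.get(key);
            if (doc != null) {
                bits.set(doc);
            }
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<String, int[]>();
        postings.put("P00001", new int[] {2});
        postings.put("P12345", new int[] {0, 2});
        postings.put("Q99999", new int[] {1});
        Map<String, Integer> keyToDoc = new HashMap<String, Integer>();
        keyToDoc.put("P00001", 0);
        keyToDoc.put("P12345", 1);
        keyToDoc.put("Q99999", 2);
        BitSet hits = new BitSet();
        hits.set(0); // only index-b document 0 matched the query
        System.out.println(join(postings, keyToDoc, hits)); // prints {1}
    }
}
```

In the example data, index-b document 0 carries the key "P12345", which
identifies index-a document 1, so only bit 1 ends up set in the filter.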
Is this a bug? And does anyone have ideas for an effective (maybe
superior) workaround?
Regards and thanks for a great project!
Jerven