Peter,
Currently we can issue a simple search query and expect a response back in about 0.2 seconds (~3,000 results)
You may want to try something like the following (I do this in FishEye, seems to be performant for moderately large field-spaces).
Use a custom HitCollector, and store all the matching doc-ids in a java.util.BitSet. This will still give you your 0.2second performance.
Then, use a TermDocs iterator to visit each term in your "species name" field, "printing out" (or whatever) each species name if it contains a docid in your bitset. Something like this pseudocode:
BitSet docs = doSearch(query); // 0.2seconds
TermEnum te = reader.terms(new Term("species-name", "")); TermDocs td = reader.termDocs(); Term t = te.term(); while (t!=null && t.field().equals("species-name")) { td.seek(te); while (td.next()) { int docid = td.doc(); if (docs.get(docid)) { print "match:" + docid; break; // try next term } }
if (!te.next()) { break; } t = te.term(); }
te.close(); td.close();
Now, with 2.3 million (or 4 million!) species names, I'm not sure how fast it will be to iterate through all the "species-name" termdocs. But I would be interested to find out; if you give this a code a try, could you report back your results?
=Matt
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]