I just wrote some simple code to test this. I ran the test with 3 queries:

- A 3 term boolean
- A single term query with over 5000 hits
- A single term query with 0 hits

For each query I ran 4 tests of 10,000 searches:

1) using hits.length() to get the counts and the standard similarity
2) using hits.length() to get the counts and a custom similarity
3) using HitCollector to get the counts and the standard similarity
4) using HitCollector to get the counts and a custom similarity

The custom similarity returns 0 for all methods.

The results are kind of surprising. It doesn't look like the speedup is enough to justify making the change to our application. Here are the results; the test class is also attached:

time (mills) 14095, useHC=false, standardSimilarity=true, count=47, query=abstract_recent:(genetically modified organism)
time (mills) 15406, useHC=false, standardSimilarity=false, count=0, query=abstract_recent:(genetically modified organism)
time (mills) 13768, useHC=true, standardSimilarity=true, count=47, query=abstract_recent:(genetically modified organism)
time (mills) 14404, useHC=true, standardSimilarity=false, count=47, query=abstract_recent:(genetically modified organism)

time (mills) 6790, useHC=false, standardSimilarity=true, count=5776, query=lname:smith
time (mills) 4901, useHC=false, standardSimilarity=false, count=0, query=lname:smith
time (mills) 5209, useHC=true, standardSimilarity=true, count=5776, query=lname:smith
time (mills) 5578, useHC=true, standardSimilarity=false, count=5776, query=lname:smith

time (mills) 47, useHC=false, standardSimilarity=true, count=0, query=lname:dfdsalkfjdsalkjflsa
time (mills) 37, useHC=false, standardSimilarity=false, count=0, query=lname:dfdsalkfjdsalkjflsa
time (mills) 41, useHC=true, standardSimilarity=true, count=0, query=lname:dfdsalkfjdsalkjflsa
time (mills) 198, useHC=true, standardSimilarity=false, count=0, query=lname:dfdsalkfjdsalkjflsa

On Thursday 06 April 2006 15:19, Chris Hostetter wrote:
> : I need the count, and don't need the docs at this point. If I had a
> : simple query, (e.g. "book") I can use docFreq(), and it's lightning
> : fast. If I just run it as a query it's much slower. I'm just
> : wondering if I did a custom scorer / similarity / hitcollector, how
> : much faster than a query could I get it? Or is there a better way?
>
> A custom HitCollector would be the first big win, something like this
> would probably work...
>
>   final int[] count = new int[1];
>   searcher.search(query, new HitCollector() {
>     public void collect(int doc, float score) {
>       count[0]++;
>     }
>   });
>   return count[0];
>
> other ways you might be able to shave time would be...
>
> * if your query can be represented in simple set logic (you
>   don't seem to be concerned with score) then implementing it as a
>   Filter may be faster because it won't do any score calculation, just a
>   simple match/no-match (which is what you seem to want) ... but it will
>   definitely take up more memory than a query
>
> * if you customize your similarity so that every function returns 0 or 1
>   you might shave a little bit of time off by skipping some of the math
>   equations ... but i really doubt it.
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

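To make the Filter suggestion quoted above a bit more concrete, here is a minimal sketch of the underlying idea in plain Java. It does not use Lucene at all: the class BitSetCountSketch and its methods addDocument and countAll are made up for illustration, and java.util.BitSet stands in for the per-term bit sets a Filter would work with. The point is only that an AND query reduces to intersecting bit sets and reading the cardinality, with no score calculation anywhere.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an index (hypothetical, not a Lucene class): term -> doc ids containing it. */
class BitSetCountSketch {
    private final Map<String, BitSet> postings = new HashMap<>();

    /** Records that document docId contains every one of the given terms. */
    public void addDocument(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new BitSet()).set(docId);
        }
    }

    /** Counts documents containing ALL of the given terms; no scores are computed. */
    public int countAll(String... terms) {
        BitSet result = null;
        for (String term : terms) {
            BitSet docs = postings.get(term);
            if (docs == null) {
                return 0; // an unknown term matches nothing, so the AND is empty
            }
            if (result == null) {
                result = (BitSet) docs.clone(); // copy so the stored bit set is not mutated
            } else {
                result.and(docs); // set intersection: keep docs that match every term so far
            }
        }
        return result == null ? 0 : result.cardinality();
    }

    public static void main(String[] args) {
        BitSetCountSketch index = new BitSetCountSketch();
        index.addDocument(0, "genetically", "modified", "organism");
        index.addDocument(1, "genetically", "modified");
        index.addDocument(2, "modified", "organism");
        System.out.println(index.countAll("genetically", "modified", "organism"));
        System.out.println(index.countAll("modified"));
    }
}
```

The memory trade-off mentioned in the quoted message shows up here too: each term keeps one bit per document, whether or not the term is ever queried.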
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class LuceneCountTest {

    public static void main(String[] args) throws Exception {
        testCounts("abstract_recent:(genetically modified organism)");
        testCounts("lname:smith");
        testCounts("lname:dfdsalkfjdsalkjflsa");
    }

    // Runs all four combinations of counting method and similarity for one query.
    private static void testCounts(String queryString) throws Exception {
        testCount(queryString, false, true);
        testCount(queryString, false, false);
        testCount(queryString, true, true);
        testCount(queryString, true, false);
        System.out.println();
        System.out.println();
    }

    // Times 10,000 searches and prints the elapsed time and the last hit count.
    private static void testCount(String queryString, final boolean useHitCollector,
            final boolean standardSimilarity) throws Exception {
        IndexSearcher searcher = createSearcher(standardSimilarity);
        QueryParser parser = new QueryParser("f", new SimpleAnalyzer());
        parser.setDefaultOperator(QueryParser.AND_OPERATOR);
        Query query = parser.parse(queryString);

        // Run the query once before timing starts.
        searcher.search(query);

        long startTime = System.currentTimeMillis();
        final int[] hitCounts = new int[1];
        for (int i = 0; i < 10000; i++) {
            hitCounts[0] = 0;
            if (useHitCollector) {
                // Count every collected document; the score is ignored.
                searcher.search(query, new HitCollector() {
                    @Override
                    public void collect(int doc, float score) {
                        hitCounts[0]++;
                    }
                });
            } else {
                Hits hits = searcher.search(query);
                hitCounts[0] = hits.length();
            }
        }
        System.out.println("time (mills) " + (System.currentTimeMillis() - startTime)
                + ", useHC=" + useHitCollector
                + ", standardSimilarity=" + standardSimilarity
                + ", count=" + hitCounts[0]
                + ", query=" + queryString);
        searcher.close();
    }

    // Opens the index. When standardSimilarity is false, every similarity method returns 0.
    private static IndexSearcher createSearcher(final boolean standardSimilarity) throws Exception {
        String indexPath = "/usr/local/cs/scholaruniverse/global/data/cv18/version/lucene/person_profile";
        IndexSearcher searcher = new IndexSearcher(indexPath);
        searcher.setSimilarity(new DefaultSimilarity() {
            public float coord(int overlap, int maxOverlap) {
                return standardSimilarity ? super.coord(overlap, maxOverlap) : 0f;
            }

            public float idf(int docFreq, int numDocs) {
                return standardSimilarity ? super.idf(docFreq, numDocs) : 0f;
            }

            public float queryNorm(float sumOfSquaredWeights) {
                return standardSimilarity ? super.queryNorm(sumOfSquaredWeights) : 0f;
            }

            public float sloppyFreq(int distance) {
                return standardSimilarity ? super.sloppyFreq(distance) : 0f;
            }

            public float tf(float freq) {
                return standardSimilarity ? super.tf(freq) : 0f;
            }

            public float lengthNorm(String field, int numTerms) {
                return standardSimilarity ? super.lengthNorm(field, numTerms) : 0f;
            }
        });
        return searcher;
    }
}
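To put the timings in perspective, here is a small sketch that turns the totals into a per-search average and a percentage saved. TimingSummary and its two methods are hypothetical helpers, not part of the test class; the numbers are copied from the standard-similarity rows of the results (hits.length() versus HitCollector, 10,000 searches each).

```java
/** Hypothetical helper for summarizing the reported timings; not part of LuceneCountTest. */
class TimingSummary {

    /** Average milliseconds per search, given the total time for a batch of searches. */
    static double perSearchMillis(long totalMillis, int searches) {
        return (double) totalMillis / searches;
    }

    /** Percentage of time saved relative to the baseline (negative means slower). */
    static double percentSaved(long baselineMillis, long candidateMillis) {
        return 100.0 * (baselineMillis - candidateMillis) / baselineMillis;
    }

    public static void main(String[] args) {
        int searches = 10000;
        String[] queries = {
            "abstract_recent:(genetically modified organism)",
            "lname:smith",
            "lname:dfdsalkfjdsalkjflsa"
        };
        // Each row: {hits.length() total ms, HitCollector total ms}, standard similarity only.
        long[][] timings = { {14095, 13768}, {6790, 5209}, {47, 41} };

        for (int i = 0; i < queries.length; i++) {
            long baseline = timings[i][0];
            long collector = timings[i][1];
            System.out.printf("%s: %.4f ms/search -> %.4f ms/search (%.1f%% saved)%n",
                    queries[i],
                    perSearchMillis(baseline, searches),
                    perSearchMillis(collector, searches),
                    percentSaved(baseline, collector));
        }
    }
}
```

On these numbers the HitCollector saves roughly 2% for the boolean query and roughly 23% for lname:smith. Each figure comes from a single run, so small differences (especially for the 0-hit query) may well be noise.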