first off: you should double check the correctness ofyour customized similarity class. I'm pretty sure it's resulting in a differnet set of matches then the DefaultSimilarity because your tf function returns 0f regardless of wether there is a match. (when i said "every function returns 0 or 1" i ment you acctually have to have the if (match) return 1 else 0 logic at a minimum)
second: these times really shocked me untill i realized you were reporting the sum of the execution times for all 10000 iterations (whew!) third: for queries this simple, i don't think you're going find much differneces in speed between the differnet models, but if you elliminate the Similarity as a variable (compare only #1 and #3 for each query), you can see that HitCollector was fairly consistently "a little faster" then using Hits (but i'll admit, i thought it would be "more faster") I think if you tried bigger, more complex queries (nested booleans, with mandatory/required clauses, span queries, booleans with 20 or more optional clauses) you might see a bigger discrepency. it's also not clear how big your test index is, as doug has pointed out before, it can make a big difference in benchmarking. : For each query I ran the ran 4 tests of 10,000 searches: : 1) using hits.length to get the counts and the standard similarity : 2) using hits.length to get the counts and a custom similarity : 3) using HitCollector to get the counts and the standard similarity : 4) using HitCollector to get the counts and a custom similarity : : The custom similarity returns 0 for all methods. : The results are kind of surprising. It doesn't look like the speed up is : enough to make the change to our application. : : Here are the results, the test class is also attached: : : time (mills) 14095, useHC=false, standardSimilarity=true, count=47, : query=abstract_recent:(genetically modified organism) : time (mills) 15406, useHC=false, standardSimilarity=false, count=0, : query=abstract_recent:(genetically modified organism) : time (mills) 13768, useHC=true, standardSimilarity=true, count=47, : query=abstract_recent:(genetically modified organism) : time (mills) 14404, useHC=true, standardSimilarity=false, count=47, : query=abstract_recent:(genetically modified organism) : : : time (mills) 6790, useHC=false, standardSimilarity=true, count=5776, : query=lname:smith : time (mills) 4901, useHC=false, standardSimilarity=false, count=0, : query=lname:smith : time (mills) 5209, useHC=true, standardSimilarity=true, count=5776, : query=lname:smith : time (mills) 5578, useHC=true, standardSimilarity=false, count=5776, : query=lname:smith : : : time (mills) 47, useHC=false, standardSimilarity=true, count=0, : query=lname:dfdsalkfjdsalkjflsa : time (mills) 37, useHC=false, standardSimilarity=false, count=0, : query=lname:dfdsalkfjdsalkjflsa : time (mills) 41, useHC=true, standardSimilarity=true, count=0, : query=lname:dfdsalkfjdsalkjflsa : time (mills) 198, useHC=true, standardSimilarity=false, count=0, : query=lname:dfdsalkfjdsalkjflsa : : : : : On Thursday 06 April 2006 15:19, Chris Hostetter wrote: : > : I need the count, and don't need the docs at this point. If I had a : > : simple query, (e.g. "book") I can use docFreq(), and it's lightning : > : fast. If I just run it as a query it's much slower. I'm just : > : wondering if I did a custom scorer / similarity / hitcollector, how : > : much faster than a query could I get it? Or is there a better way? : > : > A custom HitCollector would be the first big win, something like this : > would probably work... : > : > final int[] count = new int[1] : > searcher.search(query, new HitCollector() { : > public void collect(int doc, float score) { : > count[0]++; : > } : > }); : > return count[0] : > : > otherways you might be able to shave time would be... : > : > * if your query can be represented as in simple set logic logic (you : > don't seem to be concerned with score) then implimenting it as a : > Filter may be faster becuase it won't do any score calculation, just a : > simple match/no-match (which is what you seem to want) ... but it will : > definitely take up more memory then a query : > : > * if you customize your similarity so that every function returns 0 or 1 : > you might shave a little bit of time off by skipping some of the math : > equations ... but i really doubt it. : > : > : > : > : > -Hoss : > : > : > --------------------------------------------------------------------- : > To unsubscribe, e-mail: [EMAIL PROTECTED] : > For additional commands, e-mail: [EMAIL PROTECTED] : -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]