Hi all,

I am looking for a 'BooleanMatcher' in Lucene. Many applications do not need the matched documents ordered by relevance score; they only need the boolean match itself. BooleanScorer and BooleanScorer2, however, are rather heavy for that purpose because they are built for relevance scoring.

One use case: we have fields that contain very few tokens (usually a single word), such as id or tag, and we need queries like id in (1,3,5.....). If we express this as a BooleanQuery (id:1 id:3 id:5 ...), BooleanScorer can only be applied to 31 terms, and BooleanScorer2 uses a priority queue to track how many terms matched (coord). Filters may help, but the query can be very complicated, or the filter itself may still use a BooleanQuery, which makes the problem recursive.
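To make the id in (1,3,5.....) case concrete, here is a minimal sketch of match-only evaluation for a disjunction of terms. It is plain Java and does not use any Lucene API: the class DisjunctionMatcherSketch, its in-memory postings map, and the method names are all hypothetical stand-ins. It only shows that answering "which docs match any of these terms" needs a bit set and no tf/idf or coord bookkeeping, and that the number of terms is not limited.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical match-only evaluator for "id in (1, 3, 5, ...)"; not a Lucene class. */
final class DisjunctionMatcherSketch {

    /** Toy inverted index: term -> doc ids. Stands in for real postings lists. */
    private final Map<String, int[]> postings = new HashMap<>();
    private final int maxDoc;

    DisjunctionMatcherSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void addPosting(String term, int... docIds) {
        postings.put(term, docIds);
    }

    /** ORs the postings of every term into one bit set; nothing score-related is computed. */
    BitSet matchAny(List<String> terms) {
        BitSet hits = new BitSet(maxDoc);
        for (String term : terms) {
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in the index
            }
            for (int doc : docs) {
                hits.set(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        DisjunctionMatcherSketch index = new DisjunctionMatcherSketch(8);
        index.addPosting("1", 0);
        index.addPosting("3", 2, 5);
        index.addPosting("5", 7);
        index.addPosting("9", 4);

        // Docs whose id field is 1, 3 or 5; doc 4 (id 9) is not selected.
        BitSet hits = index.matchAny(Arrays.asList("1", "3", "5"));
        System.out.println(hits);
    }
}
```

The cost here is proportional to the total length of the postings that are read, independent of how many terms the query lists, which is the property the use case above needs.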
We could split the current BooleanScorer into a BooleanMatcher and a Ranker. When we need to score the hit documents, we ask the BooleanScorer not only for the matching doc ids but also for tf/idf, coord, or anything else used in ranking. When we only need doc ids, the BooleanMatcher can optimize its implementation: for a query with many disjunctive terms, for example, it could work like a Filter or BooleanScorer instead of BooleanScorer2. Is this possible?

Below are some user requests I found on the mailing lists. The first one is my own requirement.

1. https://github.com/neo4j/community/issues/494

2. Mail to lucene from qibaoy...@126.com (via lucene.apache.org), May 6:

Hi, I met a problem about how to search many keywords in about 5,000,000 documents. For example, the query may be like "(a1 or a2 or a3 ....a200) and (b1 or b2 or b3 or b4 ..... b400)". I found it takes a very long time (40 seconds) to get the answer in only one field (title field), and the JVM throws an out-of-memory error with more fields (title field plus content field). Any suggestions or good ideas to solve this problem? Thanks in advance.

3. Mail to solr-user from Chris Book chrisb...@gmail.com (via lucene.apache.org), Apr 11:

Hello, I have a solr index running that is working very well as a search. But I want to add the ability (if possible) to use it to do matching. The problem is that by default it is only looking for all the input terms to be present, and it doesn't give me any indication as to how many terms in the target field were not specified by the input. For example, if I'm trying to match to the song title "dust in the wind", I'm correctly getting a match if the input query is "dust in wind". But I don't want to get a match if the input is just "dust". Although as a search "dust" should return this result, I'm looking for some way to filter this out based on some indication that the input isn't close enough to the output.
Perhaps if I could get information that the number of input terms is much less than the number of terms in the field. Or something else along those lines? I realize that this isn't the typical use case for a search, but I'm just looking for some suggestions as to how I could improve the above example a bit.

Thanks,
Chris
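The "dust in the wind" request in the last mail can be illustrated with a small post-filter that compares the query terms against the terms of the matched field. This is only a sketch under stated assumptions: TermCoverageSketch, the whitespace tokenizer, and the 0.7 threshold are hypothetical, and nothing here is an existing Lucene or Solr feature.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical post-filter on match quality; not part of Lucene or Solr. */
final class TermCoverageSketch {

    /** Lower-cases and splits on whitespace; a stand-in for a real analyzer. */
    static Set<String> tokens(String text) {
        String normalized = text.toLowerCase(Locale.ROOT).trim();
        return new HashSet<>(Arrays.asList(normalized.split("\\s+")));
    }

    /** Fraction of the field's distinct terms that also occur in the query. */
    static double coverage(String query, String fieldValue) {
        Set<String> fieldTerms = tokens(fieldValue);
        Set<String> matched = new HashSet<>(fieldTerms);
        matched.retainAll(tokens(query));
        return (double) matched.size() / fieldTerms.size();
    }

    /** Accepts a hit only if the query covers enough of the field (threshold is an assumption). */
    static boolean closeEnough(String query, String fieldValue, double minCoverage) {
        return coverage(query, fieldValue) >= minCoverage;
    }

    public static void main(String[] args) {
        String title = "dust in the wind";
        // "dust in wind" covers 3 of the 4 title terms; "dust" covers 1 of 4.
        System.out.println(closeEnough("dust in wind", title, 0.7));
        System.out.println(closeEnough("dust", title, 0.7));
    }
}
```

A check like this needs only the set of matched terms per document, not a relevance score, which is the kind of information a match-only component could expose.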