Simon Willnauer created LUCENE-4628:
---------------------------------------
Summary: Add common terms query to gracefully handle very high
frequent terms dynamically
Key: LUCENE-4628
URL: https://issues.apache.org/jira/browse/LUCENE-4628
Project: Lucene - Core
Issue Type: Improvement
Components: modules/other
Reporter: Simon Willnauer
Priority: Minor
Fix For: 4.1, 5.0
Over the last couple of months I have repeatedly run into searches that
contained very high-frequency terms, which made disjunction queries far too
slow. Stopword filtering wasn't really an option, because in that domain those
high-frequency terms were not really stopwords. For instance, searching for the
song title "this is it" or for a band called "A" doesn't work once stopwords
are removed. I thought about this for a while and came up with a query-based
solution that decides, based on a frequency threshold, whether a term is
treated like a stopword. It sorts the terms into two boolean queries, one for
high-frequency and one for low-frequency terms, such that the high-frequency
terms are only matched if the low-frequency sub-query produces a match. If all
terms are high-frequency, the entire query becomes a conjunction, which gave me
reasonable results as well as reasonable performance.
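
To make the term split concrete, here is a minimal, self-contained sketch of
the matching logic described above. It is a toy in-memory model, not the
proposed Lucene query: the class name CommonTermsSketch, the cutoff parameter
(a fraction of all documents) and the whitespace tokenization are assumptions
made purely for illustration, and scoring is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the high/low-frequency term split; hypothetical, not Lucene code. */
final class CommonTermsSketch {
    // term -> ids of the documents that contain it
    final Map<String, Set<Integer>> postings = new HashMap<>();
    final int numDocs;
    // Assumed threshold: a term found in more than cutoff * numDocs documents
    // is treated as high-frequency.
    final double cutoff;

    CommonTermsSketch(double cutoff, String... docs) {
        this.cutoff = cutoff;
        this.numDocs = docs.length;
        for (int id = 0; id < docs.length; id++) {
            for (String term : docs[id].toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    Set<Integer> docsFor(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    boolean isHighFreq(String term) {
        return docsFor(term).size() > cutoff * numDocs;
    }

    /** Returns the ids of matching documents in ascending order. */
    List<Integer> search(String... terms) {
        List<String> low = new ArrayList<>();
        List<String> high = new ArrayList<>();
        for (String t : terms) {
            if (isHighFreq(t)) {
                high.add(t);
            } else {
                low.add(t);
            }
        }
        Set<Integer> hits = new TreeSet<>();
        if (low.isEmpty()) {
            // Every term is high-frequency: require all of them (conjunction).
            if (!high.isEmpty()) {
                hits.addAll(docsFor(high.get(0)));
                for (String t : high) {
                    hits.retainAll(docsFor(t));
                }
            }
        } else {
            // Only the low-frequency terms decide which documents match
            // (disjunction). High-frequency terms never add documents here;
            // in a real query they would only influence the score of these hits.
            for (String t : low) {
                hits.addAll(docsFor(t));
            }
        }
        return new ArrayList<>(hits);
    }
}
```

With the four documents "this is it", "this is a test", "it is rare gem" and
"this gem" and a cutoff of 0.4, the terms "this", "is", "it" and "gem" count as
high-frequency. A search for "this", "is", "it" then runs as a conjunction and
matches only document 0, while a search for "this", "rare" is driven by the
low-frequency term "rare" and matches only document 2.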