Query basic filter with correction feature
------------------------------------------

         Key: NUTCH-72
         URL: http://issues.apache.org/jira/browse/NUTCH-72
     Project: Nutch
        Type: New Feature
  Components: searcher  
 Environment: lucene
    Reporter: Christophe Noel


This plugin extends the query-basic plugin with a query correction feature.

Lucene includes a FuzzyQuery feature, which searches not only for exactly 
matching terms but also for very similar terms.
This plugin is meant to be used instead of query-basic by people looking for 
an easy way to correct users' queries.

The Correction Query Plugin can be used in two ways:
Solution 1: To search for very similar terms, add autocorrectionmod as the 
first term of the query (example: 'nutch engine' -> 
'autocorrectionmod nutch engine').
Solution 2: Create a new search.jsp page that includes a "correction" checkbox 
(<input type="checkbox" name="autocorrection" value="true">); when it is 
checked, the page can automatically add 'autocorrectionmod' as the first term 
of the query.
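
To make the marker convention concrete, here is a minimal sketch in plain Java 
of how a query string could be checked for the 'autocorrectionmod' marker and 
how a checkbox value could be turned into that marker. The class and method 
names (CorrectionMarker, isCorrectionRequested, searchTerms, withMarker) are 
hypothetical and are not taken from the plugin itself; only the marker word 
and the example query come from the description above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper illustrating the 'autocorrectionmod' marker convention. */
final class CorrectionMarker {
    /** Marker word described in the issue; must be the first term of the query. */
    static final String MARKER = "autocorrectionmod";

    /** Returns true if the first whitespace-separated term is the marker. */
    static boolean isCorrectionRequested(String query) {
        String[] terms = query.trim().split("\\s+");
        return terms[0].equals(MARKER);
    }

    /** Returns the query terms, with a leading marker removed if present. */
    static List<String> searchTerms(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> terms = Arrays.asList(trimmed.split("\\s+"));
        return terms.get(0).equals(MARKER) ? terms.subList(1, terms.size()) : terms;
    }

    /**
     * Solution 2: what a search page might do with the checkbox value
     * (assumption: the checkbox state is passed in as a boolean).
     */
    static String withMarker(String query, boolean autocorrection) {
        if (autocorrection && !isCorrectionRequested(query)) {
            return MARKER + " " + query.trim();
        }
        return query;
    }
}
```

With this sketch, withMarker("nutch engine", true) produces the query from the 
example above, and searchTerms(...) gives back the terms the user actually 
typed.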

FuzzyQuery has one big problem: it is very slow on large indexes!

So the Correction Query Plugin works within the following limits:
- it is not useful for big indexes
- it only applies to words of 5 characters or more
- it only looks for words that share the first 2 characters (to improve 
performance this should be set to 3 or 4)
- it only accepts words whose suffixes match at 65 % or more (the algorithm is 
Levenshtein)
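
The limits listed above can be sketched as a small, self-contained Java check. 
This is only an illustration, not the plugin's code and not Lucene's 
FuzzyQuery implementation: the class name FuzzyTermRules and its methods are 
hypothetical, and the similarity formula (1 - distance / length of the longer 
suffix) is an assumption about what "65 % matching suffixes" means; Lucene's 
own scoring may differ.

```java
/** Hypothetical illustration of the matching limits described in the issue. */
final class FuzzyTermRules {
    static final int MIN_TERM_LENGTH = 5;       // only words of 5 characters or more
    static final int PREFIX_LENGTH = 2;         // first 2 characters must be identical
    static final double MIN_SIMILARITY = 0.65;  // suffixes must match at 65 % or more

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns true if indexTerm is an acceptable correction for queryTerm. */
    static boolean isCandidate(String queryTerm, String indexTerm) {
        if (queryTerm.length() < MIN_TERM_LENGTH) {
            return false; // short words are never corrected
        }
        if (!indexTerm.startsWith(queryTerm.substring(0, PREFIX_LENGTH))) {
            return false; // cheap prefix filter, applied before any distance computation
        }
        String querySuffix = queryTerm.substring(PREFIX_LENGTH);
        String indexSuffix = indexTerm.substring(PREFIX_LENGTH);
        int longest = Math.max(querySuffix.length(), indexSuffix.length());
        // Assumed similarity measure: share of the longer suffix left unedited.
        double similarity = 1.0 - (double) levenshtein(querySuffix, indexSuffix) / longest;
        return similarity >= MIN_SIMILARITY;
    }
}
```

The prefix test is what keeps the cost down: terms that do not share the first 
characters are rejected before any edit distance is computed, which is why a 
longer prefix (3 or 4) should improve performance further.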

Please give your opinion about it.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


