[jira] Updated: (LUCENE-626) Adaptive, user query session analyzing spell checker.

Karl Wettin (JIRA) Tue, 30 Jan 2007 04:25:58 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Karl Wettin updated LUCENE-626:
-------------------------------

    Attachment:     (was: spellcheck_20060804_2.tar.gz)

> Adaptive, user query session analyzing spell checker.
> -----------------------------------------------------
>
>                 Key: LUCENE-626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-626
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Karl Wettin
>            Priority: Minor
>         Attachments: spellchecker.diff
>
>
> From javadocs:
>  This is an adaptive, user query session analyzing spell checker. In plain 
> words, a word and phrase dictionary that learns from how users act while 
> searching.
> Be aware, this is a beta version. It is not finished, but it yields great 
> results if you have enough user activity, RAM and a fairly narrow document 
> corpus. The RAM problem can be fixed if you implement your own subclass of 
> SpellChecker, as the abstract methods of this class are the CRUD methods. This 
> will most probably change to a strategy class in a future version.
> TODO:
> 1. Gram up results to detect composite words that should not be composite 
> words, and vice versa.
> 2. Train a grammed token (Markov) chain with output from an expectation 
> maximization algorithm (Weka clusters?) in parallel with a closest path (A* 
> or breadth first?) to allow contextual suggestions on queries that were 
> never placed.
> Usage:
> Training
> At user query time, create an instance of QueryResults containing the query 
> string, the number of hits and a time stamp. Add it to a chronologically 
> ordered list in the user session (LinkedList makes sense) that you pass on to 
> train(sessionQueries) when the session times out.
> You also want to call the bootstrap() method every 100000 queries or so.
> Spell checking
> Call getSuggestions(query) and look at the results. Do not modify the 
> returned results! This method call will be hidden behind a facade in a 
> future version.
> Note that the spell checker is case sensitive, so you want to clean up the 
> query the same way when you train as when you request suggestions.
> I recommend something like:
>   query = query.toLowerCase().replaceAll("\\s+", " ").trim()
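
The clean-up step recommended above can be sketched as a small helper. This is only an illustration: the class name QueryNormalizer is hypothetical and not part of the attached patch, and the "\\s+" pattern is an assumption about the intent (collapsing runs of whitespace into a single space). Locale.ROOT is used so that lower-casing does not depend on the default locale of the JVM.

```java
import java.util.Locale;

// Hypothetical helper, not part of the LUCENE-626 patch.
final class QueryNormalizer {
    private QueryNormalizer() {}

    /** Lower-cases the query, collapses whitespace runs to one space, and trims. */
    static String normalize(String query) {
        return query.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Apply the same call before training and before asking for suggestions.
        System.out.println("[" + normalize("  Apache   LUCENE\tSpell  ") + "]");
        // prints "[apache lucene spell]"
    }
}
```

Routing both the training path and the suggestion path through one such method is what keeps the two in agreement.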

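The training flow from the usage notes (collect the queries of a session in chronological order, hand them over when the session times out, bootstrap every so often) might be wired up roughly as follows. Everything here is a stand-in: QueryRecord, Trainer, UserSession and TrainingCoordinator are hypothetical names, and the real QueryResults and SpellChecker signatures in the patch may well differ. The sketch is also not thread safe.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch; none of these types come from the LUCENE-626 patch.
final class SessionTrainingSketch {

    /** Stand-in for QueryResults: query string, number of hits, time stamp. */
    static final class QueryRecord {
        final String query;
        final int hits;
        final long timestampMillis;

        QueryRecord(String query, int hits, long timestampMillis) {
            this.query = query;
            this.hits = hits;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Stand-in for the train(sessionQueries) and bootstrap() calls. */
    interface Trainer {
        void train(List<QueryRecord> sessionQueries);
        void bootstrap();
    }

    /** Keeps the queries of one user session in the order they were placed. */
    static final class UserSession {
        private final LinkedList<QueryRecord> queries = new LinkedList<>();

        void record(String query, int hits, long nowMillis) {
            queries.addLast(new QueryRecord(query, hits, nowMillis));
        }

        List<QueryRecord> queries() {
            return queries;
        }
    }

    /** Trains on session timeout and bootstraps every bootstrapInterval queries. */
    static final class TrainingCoordinator {
        private final Trainer trainer;
        private final long bootstrapInterval;
        private long queriesSinceBootstrap;

        TrainingCoordinator(Trainer trainer, long bootstrapInterval) {
            this.trainer = trainer;
            this.bootstrapInterval = bootstrapInterval;
        }

        void onSessionTimeout(UserSession session) {
            trainer.train(session.queries());
            queriesSinceBootstrap += session.queries().size();
            if (queriesSinceBootstrap >= bootstrapInterval) {
                trainer.bootstrap();
                queriesSinceBootstrap = 0;
            }
        }
    }

    public static void main(String[] args) {
        Trainer printing = new Trainer() {
            public void train(List<QueryRecord> s) {
                System.out.println("train on " + s.size() + " queries");
            }
            public void bootstrap() {
                System.out.println("bootstrap");
            }
        };
        // The notes suggest roughly 100000; 3 keeps the demo short.
        TrainingCoordinator coordinator = new TrainingCoordinator(printing, 3);
        for (int i = 0; i < 2; i++) {
            UserSession session = new UserSession();
            session.record("lucene spel checker", 0, System.currentTimeMillis());
            session.record("lucene spell checker", 42, System.currentTimeMillis());
            coordinator.onSessionTimeout(session);
        }
    }
}
```
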
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
