[
https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: (was: spellcheck_20060804_2.tar.gz)
> Adaptive, user query session analyzing spell checker.
> -----------------------------------------------------
>
> Key: LUCENE-626
> URL: https://issues.apache.org/jira/browse/LUCENE-626
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Karl Wettin
> Priority: Minor
> Attachments: spellchecker.diff
>
>
> From javadocs:
> This is an adaptive, user query session analyzing spell checker. In plain
> words, a word and phrase dictionary that will learn from how users act while
> searching.
> Be aware, this is a beta version. It is not finished, but yields great
> results if you have enough user activity, RAM and a fairly narrow document
> corpus. The RAM problem can be fixed if you implement your own subclass of
> SpellChecker, as the abstract methods of this class are the CRUD methods. This
> will most probably change to a strategy class in a future version.
> TODO:
> 1. Gram up results to detect composite words that should not be composite
> words, and vice versa.
> 2. Train a grammed token (Markov) chain with output from an expectation
> maximization algorithm (Weka clusters?) parallel to a closest path (A* or
> breadth first?) to allow contextual suggestions on queries that were never
> placed.
> Usage:
> Training
> At user query time, create an instance of QueryResults containing the query
> string, number of hits and a time stamp. Add it to a chronologically ordered
> list in the user session (LinkedList makes sense) that you pass on to
> train(sessionQueries) as the session times out.
> You also want to call the bootstrap() method every 100000 queries or so.
> Spell checking
> Call getSuggestions(query) and look at the results. Don't modify them! This
> method call will be hidden in a facade in a future version.
> Note that the spell checker is case sensitive, so you want to clean up the
> query the same way when you train as when you request the suggestions.
> I recommend something like
> query = query.toLowerCase().replaceAll("\\s+", " ").trim()
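The usage notes above can be sketched roughly as follows. This is only an illustration, not code from the attached patch: QueryResults and Trainer here are hypothetical stand-ins for the patch's QueryResults and for the train(sessionQueries) method on SpellChecker, SessionLog and normalize are made-up helper names, and the queries and hit counts in main are invented. The sketch assumes that collapsing whitespace runs is the intended clean-up.

```java
import java.util.LinkedList;
import java.util.List;

public class SpellCheckUsageSketch {

    /** Hypothetical stand-in for the patch's QueryResults: query string, hit count, time stamp. */
    static final class QueryResults {
        final String query;
        final int hits;
        final long timestamp;

        QueryResults(String query, int hits, long timestamp) {
            this.query = query;
            this.hits = hits;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical stand-in for the training entry point, train(sessionQueries). */
    interface Trainer {
        void train(List<QueryResults> sessionQueries);
    }

    /** Lower-cases, collapses whitespace runs to a single space, and trims. */
    static String normalize(String query) {
        return query.toLowerCase().replaceAll("\\s+", " ").trim();
    }

    /** Chronologically ordered query log kept in the user session. */
    static final class SessionLog {
        private final LinkedList<QueryResults> queries = new LinkedList<QueryResults>();

        /** Call at user query time; appending keeps the list in chronological order. */
        void record(String rawQuery, int hits) {
            queries.add(new QueryResults(normalize(rawQuery), hits, System.currentTimeMillis()));
        }

        /** Call when the session times out; hands the whole session to the trainer. */
        void onTimeout(Trainer trainer) {
            trainer.train(queries);
        }
    }

    public static void main(String[] args) {
        SessionLog session = new SessionLog();
        session.record("  Heroes  of might and MAGIK ", 0);   // invented example data
        session.record("heroes of might and magic", 42);      // invented example data

        // A real caller would pass the spell checker here; this one just prints.
        session.onTimeout(sessionQueries -> {
            for (QueryResults q : sessionQueries) {
                System.out.println(q.query + " -> " + q.hits + " hits");
            }
        });
    }
}
```

The same normalize call would be applied to the string passed to getSuggestions(query), so that training and lookup see identically cleaned queries.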
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.