[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------
Assignee: (was: Karl Wettin)
Description:
Extensive javadocs are available in the patch, and I also try to keep a compiled copy here:
http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description
A rough reinforcement learning scheme, backed by algorithmic second-level
suggestion schemes, that learns from and adapts to user behavior as
queries change, suggestions are accepted or declined, etc.
Besides detecting spelling errors, it considers context,
composition/decomposition and a few other things.
heroes of light and magik -> heroes of might and magic
vinci da code -> da vinci code
java docs -> javadocs
blacksabbath -> black sabbath
Depends on LUCENE-550
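To make the idea of learning from accepted and declined suggestions concrete, here is a minimal Java sketch. It is not code from the patch: the class AdaptiveSuggesterSketch and its methods accepted, declined and didYouMean are hypothetical names picked for illustration, and the plain +1/-1 scoring is an assumption, not the scheme the patch uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not a class from the LUCENE-626 patch.
class AdaptiveSuggesterSketch {

    // query -> (suggestion -> score); a score rises on accept and falls on decline.
    private final Map<String, Map<String, Integer>> scores = new HashMap<>();

    /** The user was shown the suggestion for the query and accepted it. */
    void accepted(String query, String suggestion) {
        adjust(query, suggestion, +1);
    }

    /** The user was shown the suggestion for the query and declined it. */
    void declined(String query, String suggestion) {
        adjust(query, suggestion, -1);
    }

    private void adjust(String query, String suggestion, int delta) {
        scores.computeIfAbsent(query, q -> new HashMap<>())
              .merge(suggestion, delta, Integer::sum);
    }

    /** Returns the suggestion with the highest positive score, or null if there is none. */
    String didYouMean(String query) {
        Map<String, Integer> candidates = scores.get(query);
        if (candidates == null) {
            return null;
        }
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> e : candidates.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }
}
```

In this sketch a suggestion that users keep declining eventually drops to a non-positive score and is no longer offered, which is one simple way to "adapt as suggestions are accepted or declined".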
was:
Extensive java docs available in patch, but I try to keep it compiled here:
http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description
Example:
{code:java}
public void testImportData() throws Exception {
  // load 200 000 user queries with session data and time stamp. no goals specified.
  System.out.println("Processing http://ginandtonique.org/~kalle/data/pirate.data.gz");
  importFile(new InputStreamReader(new GZIPInputStream(
      new URL("http://ginandtonique.org/~kalle/data/pirate.data.gz").openStream())));
  System.out.println("Processing http://ginandtonique.org/~kalle/data/hero.data.gz");
  importFile(new InputStreamReader(new GZIPInputStream(
      new URL("http://ginandtonique.org/~kalle/data/hero.data.gz").openStream())));
  System.out.println("Done.");

  // run some tests without the second level suggestions,
  // i.e. user behavioral data only. no ngrams or so.
  assertEquals("pirates of the caribbean", facade.didYouMean("pirates ofthe caribbean"));
  assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carribbean"));
  assertEquals("pirates caribbean", facade.didYouMean("pirates carricean"));
  assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carriben"));
  assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabien"));
  assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabbean"));
  assertEquals("pirates of the caribbean", facade.didYouMean("pirates og carribean"));
  assertEquals("pirates of the caribbean soundtrack",
      facade.didYouMean("pirates of the caribbean music"));
  assertEquals("pirates of the caribbean score",
      facade.didYouMean("pirates of the caribbean soundtrack"));
  assertEquals("pirate of caribbean", facade.didYouMean("pirate of carabian"));
  assertEquals("pirates of caribbean", facade.didYouMean("pirate of caribbean"));
  assertEquals("pirates of caribbean", facade.didYouMean("pirates of caribbean"));

  // depending on how many hits and goals are noted with these two queries
  // perhaps the delta should be added to a synonym dictionary?
  assertEquals("homm iv", facade.didYouMean("homm 4"));

  // not yet known.. and we have no second level yet.
  assertNull(facade.didYouMean("the pilates"));

  // use the dictionary built from user queries to build the token phrase and ngram suggester.
  facade.getDictionary().getPrioritesBySecondLevelSuggester().put(
      Factory.ngramTokenPhraseSuggesterFactory(facade.getDictionary()), 1d);

  // now it's learned
  assertEquals("the pirates", facade.didYouMean("the pilates"));

  // typos
  assertEquals("heroes of might and magic", facade.didYouMean("heroes of fight and magic"));
  assertEquals("heroes of might and magic", facade.didYouMean("heroes of right and magic"));
  assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and light"));

  // composite dictionary key not learned yet..
  assertNull(facade.didYouMean("heroesof lightand magik"));

  // learn
  assertEquals("heroes of might and magic", facade.didYouMean("heroes of light and magik"));

  // test
  assertEquals("heroes of might and magic", facade.didYouMean("heroesof lightand magik"));

  // wrong term order
  assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and might"));
}
{code}
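The "composite dictionary key" and "wrong term order" cases in the test above could be handled, for example, by looking queries up under normalized keys. The sketch below is only an assumption about how such keys might work, not the patch's implementation; CompositeKeySketch, learn, compositeKey and termSetKey are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; the key scheme used by the patch is not shown in this message.
class CompositeKeySketch {

    private final Map<String, String> byCompositeKey = new HashMap<>();
    private final Map<String, String> byTermSetKey = new HashMap<>();

    /** Whitespace-insensitive key: "heroesof lightand" and "heroes of light and" collide. */
    static String compositeKey(String query) {
        return query.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    /** Order-insensitive key: the terms of the query, sorted and joined by single spaces. */
    static String termSetKey(String query) {
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Arrays.sort(terms);
        return String.join(" ", terms);
    }

    /** Records that the query should be corrected to the suggestion. */
    void learn(String query, String suggestion) {
        byCompositeKey.put(compositeKey(query), suggestion);
        byTermSetKey.put(termSetKey(query), suggestion);
        // Index the suggestion itself so that its terms in another order map back to it.
        byTermSetKey.put(termSetKey(suggestion), suggestion);
    }

    /** Returns a learned suggestion, or null if nothing is known or the query is already correct. */
    String didYouMean(String query) {
        String hit = byCompositeKey.get(compositeKey(query));
        if (hit == null) {
            hit = byTermSetKey.get(termSetKey(query));
        }
        return (hit == null || hit.equals(query)) ? null : hit;
    }
}
```

With this sketch, "heroesof lightand magik" yields nothing until "heroes of light and magik" has been learned, after which both spellings share one composite key, mirroring the learn-then-test steps in the test method.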
> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
> Key: LUCENE-626
> URL: https://issues.apache.org/jira/browse/LUCENE-626
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Karl Wettin
> Priority: Minor
> Attachments: LUCENE-626_20071023.txt
>