[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------
Attachment: (was: didyoumean.patch.bz2)
> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
> Key: LUCENE-626
> URL: https://issues.apache.org/jira/browse/LUCENE-626
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Priority: Minor
> Attachments: LUCENE-626_2007_10_16.txt
>
>
> Extensive Javadocs are available in the patch, and I try to keep a compiled copy here:
> http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description
> The patch spellcheck.diff should not depend on anything but Lucene trunk. It
> has basic support for phrase suggestions and query goal detection, but is
> pretty buggy and lacks features available in didyoumean.diff.bz2. The latter
> depends on LUCENE-550.
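One way to picture the "adaptive user session analysis" in the issue title is to count how users rephrase a query within one session. What follows is a minimal sketch under stated assumptions, not the patch's implementation. It assumes that a different query issued in the same session within 60 seconds of the previous one is a correction of that query, and it suggests the most frequently observed correction. `SessionSuggesterSketch`, `addQuery`, `suggest` and the 60-second window are all hypothetical. The patch also takes hits and goals into account, and this sketch ignores both.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the LUCENE-626 API: learns corrections from
// consecutive queries issued in the same user session.
public class SessionSuggesterSketch {

    // Assumed window: a follow-up query within this gap counts as a correction.
    static final long MAX_GAP_MILLIS = 60_000L;

    // query -> (follow-up query -> number of times observed)
    private final Map<String, Map<String, Integer>> corrections = new HashMap<>();
    // session id -> last query in that session and its time stamp
    private final Map<String, String> lastQuery = new HashMap<>();
    private final Map<String, Long> lastTime = new HashMap<>();

    public void addQuery(String sessionId, String query, long timestamp) {
        String previous = lastQuery.get(sessionId);
        if (previous != null && !previous.equals(query)) {
            long gap = timestamp - lastTime.get(sessionId);
            if (gap >= 0 && gap <= MAX_GAP_MILLIS) {
                corrections.computeIfAbsent(previous, k -> new HashMap<>())
                           .merge(query, 1, Integer::sum);
            }
        }
        lastQuery.put(sessionId, query);
        lastTime.put(sessionId, timestamp);
    }

    // Most frequently observed follow-up of the query, or null if none is known.
    public String suggest(String query) {
        Map<String, Integer> candidates = corrections.get(query);
        if (candidates == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : candidates.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SessionSuggesterSketch s = new SessionSuggesterSketch();
        s.addQuery("session-1", "pirates of the carribbean", 0L);
        s.addQuery("session-1", "pirates of the caribbean", 5_000L);
        System.out.println(s.suggest("pirates of the carribbean")); // prints pirates of the caribbean
        System.out.println(s.suggest("homm 4"));                    // prints null
    }
}
```

A time window on its own will also count unrelated follow-up queries as corrections. Signals like hit counts and goal detection are what would be needed to filter out that noise.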
> Example:
> {code:java}
> import java.io.InputStreamReader;
> import java.net.URL;
> import java.util.zip.GZIPInputStream;
>
> // facade and importFile(...) are members of the enclosing test case (not shown).
> public void testImportData() throws Exception {
>     // load 200 000 user queries with session data and time stamp. no goals specified.
>     System.out.println("Processing http://ginandtonique.org/~kalle/data/pirate.data.gz");
>     importFile(new InputStreamReader(new GZIPInputStream(
>             new URL("http://ginandtonique.org/~kalle/data/pirate.data.gz").openStream())));
>     System.out.println("Processing http://ginandtonique.org/~kalle/data/hero.data.gz");
>     importFile(new InputStreamReader(new GZIPInputStream(
>             new URL("http://ginandtonique.org/~kalle/data/hero.data.gz").openStream())));
>     System.out.println("Done.");
>     // run some tests without the second level suggestions,
>     // i.e. user behavioral data only. no ngrams or so.
>
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates ofthe caribbean"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carribbean"));
>     assertEquals("pirates caribbean", facade.didYouMean("pirates carricean"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carriben"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabien"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabbean"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates og carribean"));
>     assertEquals("pirates of the caribbean soundtrack",
>             facade.didYouMean("pirates of the caribbean music"));
>     assertEquals("pirates of the caribbean score",
>             facade.didYouMean("pirates of the caribbean soundtrack"));
>     assertEquals("pirate of caribbean", facade.didYouMean("pirate of carabian"));
>     assertEquals("pirates of caribbean", facade.didYouMean("pirate of caribbean"));
>     assertEquals("pirates of caribbean", facade.didYouMean("pirates of caribbean"));
>     // depending on how many hits and goals are noted for these two queries,
>     // perhaps the delta should be added to a synonym dictionary?
>     assertEquals("homm iv", facade.didYouMean("homm 4"));
>     // not yet known, and we have no second level yet.
>     assertNull(facade.didYouMean("the pilates"));
>     // use the dictionary built from user queries to build the token phrase
>     // and ngram suggester.
>     facade.getDictionary().getPrioritesBySecondLevelSuggester().put(
>             Factory.ngramTokenPhraseSuggesterFactory(facade.getDictionary()), 1d);
>     // now it's learned
>     assertEquals("the pirates", facade.didYouMean("the pilates"));
>     // typos
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of fight and magic"));
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of right and magic"));
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and light"));
>     // composite dictionary key not learned yet
>     assertNull(facade.didYouMean("heroesof lightand magik"));
>     // learn
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of light and magik"));
>     // test
>     assertEquals("heroes of might and magic", facade.didYouMean("heroesof lightand magik"));
>     // wrong term order
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and might"));
> }
> {code}
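The test registers a second-level suggester built by `Factory.ngramTokenPhraseSuggesterFactory` over the dictionary learned from user queries. Once that suggester is in place, "the pilates" goes from no suggestion to "the pirates". The sketch below is a rough illustration of n-gram phrase matching in general, not of that factory's actual logic. It scores every known phrase by character-trigram Dice similarity against the query and returns the best match above a threshold. The class name, the trigram size and the 0.5 threshold are assumptions.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the patch's ngramTokenPhraseSuggesterFactory:
// suggests the known phrase with the highest character-trigram Dice similarity.
public class NgramPhraseSuggesterSketch {

    static final int N = 3; // assumed gram size

    private final Collection<String> knownPhrases; // e.g. phrases learned from user queries
    private final double minSimilarity;            // assumed acceptance threshold

    public NgramPhraseSuggesterSketch(Collection<String> knownPhrases, double minSimilarity) {
        this.knownPhrases = knownPhrases;
        this.minSimilarity = minSimilarity;
    }

    // Character n-grams of the lower-cased phrase, padded with spaces so word edges form grams too.
    static Set<String> ngrams(String phrase) {
        String padded = " " + phrase.toLowerCase(Locale.ROOT) + " ";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + N <= padded.length(); i++) {
            grams.add(padded.substring(i, i + N));
        }
        return grams;
    }

    // Dice coefficient: 2 * |a intersect b| / (|a| + |b|).
    static double dice(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return 2.0 * common.size() / (a.size() + b.size());
    }

    // Best known phrase (other than the query itself) scoring above the threshold, or null.
    public String suggest(String query) {
        Set<String> queryGrams = ngrams(query);
        String best = null;
        double bestScore = minSimilarity;
        for (String phrase : knownPhrases) {
            if (phrase.equalsIgnoreCase(query)) {
                continue;
            }
            double score = dice(queryGrams, ngrams(phrase));
            if (score > bestScore) {
                best = phrase;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NgramPhraseSuggesterSketch s = new NgramPhraseSuggesterSketch(
                Arrays.asList("the pirates", "heroes of might and magic", "homm iv"), 0.5);
        System.out.println(s.suggest("the pilates")); // prints the pirates
    }
}
```

Here "the pilates" shares 8 of its 11 padded trigrams with "the pirates", giving a Dice score of 16/22 (about 0.73). It shares at most one trigram with either of the other phrases, so the sketch returns "the pirates".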
--
This message is automatically generated by JIRA.