[ 
https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-626:
-------------------------------

    Attachment:     (was: spellchecker.diff)

> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-626
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>            Priority: Minor
>         Attachments: LUCENE-626_2007_10_16.txt
>
>
> Extensive javadocs are available in the patch, but I try to keep a compiled copy here:
> http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description
> The patch spellcheck.diff should not depend on anything but Lucene trunk. It
> has basic support for phrase suggestions and query goal detection, but it is
> pretty buggy and lacks features available in didyoumean.diff.bz2. The latter
> depends on LUCENE-550.
> Example:
> {code:java}
> public void testImportData() throws Exception {
>     // load 200 000 user queries with session data and time stamp. no goals specified.
>     System.out.println("Processing http://ginandtonique.org/~kalle/data/pirate.data.gz");
>     importFile(new InputStreamReader(new GZIPInputStream(
>         new URL("http://ginandtonique.org/~kalle/data/pirate.data.gz").openStream())));
>     System.out.println("Processing http://ginandtonique.org/~kalle/data/hero.data.gz");
>     importFile(new InputStreamReader(new GZIPInputStream(
>         new URL("http://ginandtonique.org/~kalle/data/hero.data.gz").openStream())));
>     System.out.println("Done.");
>     // run some tests without the second level suggestions,
>     // i.e. user behavioral data only. no ngrams or so.
>     
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates ofthe caribbean"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carribbean"));
>     assertEquals("pirates caribbean", facade.didYouMean("pirates carricean"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carriben"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabien"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabbean"));
>     assertEquals("pirates of the caribbean", facade.didYouMean("pirates og carribean"));
>     assertEquals("pirates of the caribbean soundtrack", facade.didYouMean("pirates of the caribbean music"));
>     assertEquals("pirates of the caribbean score", facade.didYouMean("pirates of the caribbean soundtrack"));
>     assertEquals("pirate of caribbean", facade.didYouMean("pirate of carabian"));
>     assertEquals("pirates of caribbean", facade.didYouMean("pirate of caribbean"));
>     assertEquals("pirates of caribbean", facade.didYouMean("pirates of caribbean"));
>     // depending on how many hits and goals are noted with these two queries,
>     // perhaps the delta should be added to a synonym dictionary?
>     assertEquals("homm iv", facade.didYouMean("homm 4"));
>     // not yet known.. and we have no second level yet.
>     assertNull(facade.didYouMean("the pilates"));
>     // use the dictionary built from user queries to build the token phrase and ngram suggester.
>     facade.getDictionary().getPrioritesBySecondLevelSuggester().put(
>         Factory.ngramTokenPhraseSuggesterFactory(facade.getDictionary()), 1d);
>     // now it's learned
>     assertEquals("the pirates", facade.didYouMean("the pilates"));
>     // typos
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of fight and magic"));
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of right and magic"));
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and light"));
>     // composite dictionary key not learned yet..
>     assertNull(facade.didYouMean("heroesof lightand magik"));
>     // learn
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of light and magik"));
>     // test
>     assertEquals("heroes of might and magic", facade.didYouMean("heroesof lightand magik"));
>     // wrong term order
>     assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and might"));
>   }
> {code}
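
The first half of the test above relies on user behavioral data only: a query gets a suggestion only if earlier sessions showed users moving from it to another query. A rough way to picture that first level is the sketch below. It is not the code or the API from the patch; the class SessionSuggesterSketch and its train method are hypothetical names made up for illustration, and it ignores time stamps, goals and the second level ngram suggester entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not the API of the LUCENE-626 patch. */
public class SessionSuggesterSketch {

    // query as typed -> (follow-up query in the same session -> times observed)
    private final Map<String, Map<String, Integer>> corrections = new HashMap<>();

    /** Record that a user typed {@code query} and then rephrased it to {@code followUp}. */
    public void train(String query, String followUp) {
        corrections.computeIfAbsent(query, k -> new HashMap<>())
                   .merge(followUp, 1, Integer::sum);
    }

    /** Return the most frequently observed follow-up, or null if the query was never seen. */
    public String didYouMean(String query) {
        Map<String, Integer> candidates = corrections.get(query);
        if (candidates == null) {
            return null; // nothing learned yet, like "the pilates" before the ngram suggester is added
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : candidates.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SessionSuggesterSketch s = new SessionSuggesterSketch();
        s.train("pirates of the carribbean", "pirates of the caribbean");
        s.train("pirates of the carribbean", "pirates of the caribbean");
        s.train("pirates of the carribbean", "pirates caribbean");
        System.out.println(s.didYouMean("pirates of the carribbean"));
        System.out.println(s.didYouMean("the pilates"));
    }
}
```

A query that was never observed returns null here, which matches the assertNull calls in the test; in the patch that gap is what the second level suggesters are meant to fill.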

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

