[ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wettin updated LUCENE-626:
-------------------------------

    Attachment:     (was: spellchecker.diff)

> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-626
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Karl Wettin
>            Assignee: Karl Wettin
>            Priority: Minor
>         Attachments: LUCENE-626_2007_10_16.txt
>
>
> Extensive Javadocs are available in the patch, and I try to keep a compiled copy here:
> http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description
> The patch spellcheck.diff should not depend on anything but Lucene trunk. It
> has basic support for phrase suggestions and query goal detection, but is
> pretty buggy and lacks features available in didyoumean.diff.bz2. The latter
> depends on LUCENE-550.
> Example:
> {code:java}
> public void testImportData() throws Exception {
>   // load 200 000 user queries with session data and time stamp. no goals specified.
>   System.out.println("Processing http://ginandtonique.org/~kalle/data/pirate.data.gz");
>   importFile(new InputStreamReader(new GZIPInputStream(
>       new URL("http://ginandtonique.org/~kalle/data/pirate.data.gz").openStream())));
>   System.out.println("Processing http://ginandtonique.org/~kalle/data/hero.data.gz");
>   importFile(new InputStreamReader(new GZIPInputStream(
>       new URL("http://ginandtonique.org/~kalle/data/hero.data.gz").openStream())));
>   System.out.println("Done.");
>   // run some tests without the second level suggestions,
>   // i.e. user behavioral data only. no ngrams or so.
>
>   assertEquals("pirates of the caribbean", facade.didYouMean("pirates ofthe caribbean"));
>   assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carribbean"));
>   assertEquals("pirates caribbean", facade.didYouMean("pirates carricean"));
>   assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carriben"));
>   assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabien"));
>   assertEquals("pirates of the caribbean", facade.didYouMean("pirates of the carabbean"));
>   assertEquals("pirates of the caribbean", facade.didYouMean("pirates og carribean"));
>   assertEquals("pirates of the caribbean soundtrack", facade.didYouMean("pirates of the caribbean music"));
>   assertEquals("pirates of the caribbean score", facade.didYouMean("pirates of the caribbean soundtrack"));
>   assertEquals("pirate of caribbean", facade.didYouMean("pirate of carabian"));
>   assertEquals("pirates of caribbean", facade.didYouMean("pirate of caribbean"));
>   assertEquals("pirates of caribbean", facade.didYouMean("pirates of caribbean"));
>   // depending on how many hits and goals are noted with these two queries
>   // perhaps the delta should be added to a synonym dictionary?
>   assertEquals("homm iv", facade.didYouMean("homm 4"));
>   // not yet known.. and we have no second level yet.
>   assertNull(facade.didYouMean("the pilates"));
>   // use the dictionary built from user queries to build the token phrase and ngram suggester.
>
>   facade.getDictionary().getPrioritesBySecondLevelSuggester().put(
>       Factory.ngramTokenPhraseSuggesterFactory(facade.getDictionary()), 1d);
>   // now it's learned
>   assertEquals("the pirates", facade.didYouMean("the pilates"));
>   // typos
>   assertEquals("heroes of might and magic", facade.didYouMean("heroes of fight and magic"));
>   assertEquals("heroes of might and magic", facade.didYouMean("heroes of right and magic"));
>   assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and light"));
>   // composite dictionary key not learned yet..
>   assertEquals(null, facade.didYouMean("heroesof lightand magik"));
>   // learn
>   assertEquals("heroes of might and magic", facade.didYouMean("heroes of light and magik"));
>   // test
>   assertEquals("heroes of might and magic", facade.didYouMean("heroesof lightand magik"));
>   // wrong term order
>   assertEquals("heroes of might and magic", facade.didYouMean("heroes of magic and might"));
> }
> {code}
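
The test in the issue description exercises two stages: suggestions drawn from user behavioral data alone, then a second level built from the dictionary of learned queries using n-grams. The following is a minimal, self-contained sketch of that idea and nothing more. The class `TwoLevelSuggester`, its methods, the character-trigram Jaccard similarity, and the 0.4 threshold are all assumptions made for illustration; none of them is taken from the LUCENE-626 patch, whose actual classes and scoring may differ considerably.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; not the API of the LUCENE-626 patch. */
final class TwoLevelSuggester {
  // Assumed cut-off for the second level; chosen arbitrarily for this sketch.
  private static final double MIN_SIMILARITY = 0.4;

  // First level: reformulations seen in sessions (typed query -> final query -> count).
  private final Map<String, Map<String, Integer>> observed = new HashMap<>();
  // Queries that users ended their sessions on; feeds the second level.
  private final Set<String> dictionary = new LinkedHashSet<>();
  private boolean secondLevelEnabled = false;

  /** Records that a user typed one query and ended the session on another. */
  void recordSession(String typed, String finalQuery) {
    dictionary.add(finalQuery);
    if (!typed.equals(finalQuery)) {
      observed.computeIfAbsent(typed, k -> new HashMap<>())
          .merge(finalQuery, 1, Integer::sum);
    }
  }

  void enableSecondLevel() {
    secondLevelEnabled = true;
  }

  /** Returns a suggestion, or null when neither level has one. */
  String didYouMean(String query) {
    // First level: the most frequently observed reformulation of this exact query.
    Map<String, Integer> counts = observed.get(query);
    if (counts != null) {
      String best = null;
      int bestCount = 0;
      for (Map.Entry<String, Integer> e : counts.entrySet()) {
        if (e.getValue() > bestCount) {
          best = e.getKey();
          bestCount = e.getValue();
        }
      }
      return best;
    }
    if (!secondLevelEnabled) {
      return null;
    }
    // Second level: the dictionary entry with the highest trigram overlap.
    Set<String> queryGrams = trigrams(query);
    String best = null;
    double bestScore = 0.0;
    for (String candidate : dictionary) {
      double score = jaccard(queryGrams, trigrams(candidate));
      if (score >= MIN_SIMILARITY && score > bestScore) {
        best = candidate;
        bestScore = score;
      }
    }
    return best;
  }

  /** Character trigrams of s; strings shorter than three characters yield themselves. */
  private static Set<String> trigrams(String s) {
    Set<String> grams = new HashSet<>();
    if (s.length() < 3) {
      grams.add(s);
      return grams;
    }
    for (int i = 0; i + 3 <= s.length(); i++) {
      grams.add(s.substring(i, i + 3));
    }
    return grams;
  }

  /** Jaccard similarity: size of the intersection divided by size of the union. */
  private static double jaccard(Set<String> a, Set<String> b) {
    Set<String> intersection = new HashSet<>(a);
    intersection.retainAll(b);
    Set<String> union = new HashSet<>(a);
    union.addAll(b);
    return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
  }
}
```

In this sketch an unseen query such as "the pilates" yields null until `enableSecondLevel()` is called, which loosely mirrors the point in the test where the n-gram suggester is registered. Unlike the behavior the test describes, the sketch does not feed second-level suggestions back into the first level.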