a lexicon object for merging spellchecker and synonyms from stemming
--------------------------------------------------------------------
Key: LUCENE-1190
URL: https://issues.apache.org/jira/browse/LUCENE-1190
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*, Search
Affects Versions: 2.3
Reporter: Mathieu Lecarme
Attachments: aphone+lexicon.patch

Some Lucene features need a reference list of words. Spellchecking is the basic example, but synonyms are another use. Other tools work more smoothly with a separate list of words that does not disturb the main index: stemming and other word simplifications (anagram, phonetic, ...).

For that, I suggest a Lexicon object, which contains words (Term + frequency) and can be built from a Lucene Directory or from plain text files. Classical TokenFilters can be used with Lexicon (LowerCaseFilter and ISOLatin1AccentFilter should be the most useful).

Lexicon uses a Lucene Directory: each Word is a Document, and each piece of metadata is a Field (word, ngram, phonetic, fields, anagram, size, ...).

Above a minimum size, the number of different words used in an index can be considered stable. So a standard Lexicon (built from Wikipedia, for example) can be used.

A similarTokenFilter is provided.
A spellchecker will come soon.
A fuzzySearch implementation and a neutral synonym TokenFilter can be done.
Unused words can be removed on demand (lazy delete?).

Any criticism or suggestions?
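
To make the Word/metadata model described above more concrete, here is a minimal sketch in plain Java. It is not the code from aphone+lexicon.patch. The names LexiconSketch, Word, normalize, anagramKey, and ngrams are hypothetical, an in-memory map stands in for the Lucene Directory, and java.text.Normalizer is only a rough stand-in for LowerCaseFilter and ISOLatin1AccentFilter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory sketch of the proposed Lexicon; not the patch's API. */
final class LexiconSketch {

    /** One entry: the word, its frequency, and derived metadata ("fields"). */
    static final class Word {
        final String text;
        final String anagramKey;   // letters sorted, so anagrams share one key
        final List<String> ngrams; // character trigrams of the word
        final int size;            // word length
        int frequency;

        Word(String text) {
            this.text = text;
            this.anagramKey = LexiconSketch.anagramKey(text);
            this.ngrams = LexiconSketch.ngrams(text, 3);
            this.size = text.length();
        }
    }

    // Assumption: a map replaces the Lucene Directory used in the proposal.
    private final Map<String, Word> words = new LinkedHashMap<>();

    /** Lower-cases and strips combining accents (rough filter stand-in). */
    static String normalize(String raw) {
        String lower = raw.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Sorts the characters of a word; two anagrams yield the same key. */
    static String anagramKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    /** Returns all contiguous character n-grams of the word, in order. */
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    /** Adds one occurrence of a word, creating its entry on first sight. */
    void add(String raw) {
        String text = normalize(raw);
        if (text.isEmpty()) {
            return;
        }
        Word word = words.get(text);
        if (word == null) {
            word = new Word(text);
            words.put(text, word);
        }
        word.frequency++;
    }

    /** Returns how often the normalized word was added, or 0 if unknown. */
    int frequency(String raw) {
        Word word = words.get(normalize(raw));
        return word == null ? 0 : word.frequency;
    }

    /** Returns other known words that are anagrams of the given word. */
    List<String> anagramsOf(String raw) {
        String text = normalize(raw);
        String key = anagramKey(text);
        List<String> result = new ArrayList<>();
        for (Word word : words.values()) {
            if (word.anagramKey.equals(key) && !word.text.equals(text)) {
                result.add(word.text);
            }
        }
        return result;
    }
}
```

In the actual proposal each of these derived values would presumably be stored as a Field of the word's Document, so lookups by ngram or anagram key become ordinary index queries rather than the linear scan used here.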