[jira] Updated: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Lecarme updated LUCENE-1190:

    Attachment: aphone+lexicon.patch

> a lexicon object for merging spellchecker and synonyms from stemming
>
>                 Key: LUCENE-1190
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1190
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*, Search
>    Affects Versions: 2.3
>            Reporter: Mathieu Lecarme
>         Attachments: aphone+lexicon.patch, aphone+lexicon.patch
>
> Some Lucene features need a reference list of words. Spellchecking is the
> basic example, but synonyms are another use. Other tools work more smoothly
> with a list of words kept apart from the main index: stemming and other
> simplifications of a word (anagram, phonetic ...).
> For that, I suggest a Lexicon object, which contains words (Term + frequency)
> and which can be built from a Lucene Directory or from plain text files.
> Classical TokenFilters can be used with the Lexicon (LowerCaseFilter and
> ISOLatin1AccentFilter should be the most useful).
> The Lexicon uses a Lucene Directory: each Word is a Document, and each piece
> of metadata is a Field (word, ngram, phonetic, fields, anagram, size ...).
> Above a minimum size, the number of distinct words used in an index can be
> considered stable. So a standard Lexicon (built from Wikipedia, for
> example) can be used.
> A similarTokenFilter is provided.
> A spellchecker will come soon.
> A fuzzySearch implementation and a neutral synonym TokenFilter could also be
> built.
> Unused words can be removed on demand (lazy delete?).
> Any criticism or suggestions?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Lecarme updated LUCENE-1190:

    Attachment: aphone+lexicon.patch

> a lexicon object for merging spellchecker and synonyms from stemming
>
>                 Key: LUCENE-1190
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1190
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*, Search
>    Affects Versions: 2.3
>            Reporter: Mathieu Lecarme
>         Attachments: aphone+lexicon.patch
>
> Some Lucene features need a reference list of words. Spellchecking is the
> basic example, but synonyms are another use. Other tools work more smoothly
> with a list of words kept apart from the main index: stemming and other
> simplifications of a word (anagram, phonetic ...).
> For that, I suggest a Lexicon object, which contains words (Term + frequency)
> and which can be built from a Lucene Directory or from plain text files.
> Classical TokenFilters can be used with the Lexicon (LowerCaseFilter and
> ISOLatin1AccentFilter should be the most useful).
> The Lexicon uses a Lucene Directory: each Word is a Document, and each piece
> of metadata is a Field (word, ngram, phonetic, fields, anagram, size ...).
> Above a minimum size, the number of distinct words used in an index can be
> considered stable. So a standard Lexicon (built from Wikipedia, for
> example) can be used.
> A similarTokenFilter is provided.
> A spellchecker will come soon.
> A fuzzySearch implementation and a neutral synonym TokenFilter could also be
> built.
> Unused words can be removed on demand (lazy delete?).
> Any criticism or suggestions?
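
The "each Word is a Document, each meta is a Field" layout described in the issue can be illustrated with a small sketch. This is not the code from aphone+lexicon.patch: the class name `LexiconEntrySketch` and its methods are hypothetical, a plain `Map` stands in for a Lucene Document so the example needs no Lucene dependency, the trigram size is an arbitrary choice, and the phonetic field is left out because the issue does not say which encoding is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; names and structure are assumptions,
// not the API proposed in the attached patch.
final class LexiconEntrySketch {

    // Character n-grams of a word: every substring of length n, left to right.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Letters sorted in code-unit order, so that anagrams share the same key.
    static String anagramKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // One lexicon entry as a field-name -> values map. The map is a stand-in
    // for a Lucene Document whose Fields hold the derived forms of the word.
    static Map<String, List<String>> entry(String rawWord, int frequency) {
        // Lower-casing mimics what a LowerCaseFilter would do upstream.
        String word = rawWord.toLowerCase(Locale.ROOT);
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("word", Collections.singletonList(word));
        fields.put("frequency", Collections.singletonList(Integer.toString(frequency)));
        fields.put("ngram", ngrams(word, 3)); // trigram size is an assumption
        fields.put("anagram", Collections.singletonList(anagramKey(word)));
        fields.put("size", Collections.singletonList(Integer.toString(word.length())));
        return fields;
    }
}
```

With such a layout, a lookup for similar words would query the derived fields (shared n-grams, same anagram key, close size) rather than the main index, which is the separation the issue argues for.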