[ https://issues.apache.org/jira/browse/SOLR-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240534#comment-13240534 ]
Christian Moen commented on SOLR-3276:
--------------------------------------

Thanks a lot, Robert and Uwe!

This is a summary of the changes:

* Improvements to {{schema.xml}}
** Improved the description of {{text_ja}} and {{JapaneseAnalyzer}} to explain synonym compounds
** Added a brief description of user dictionaries and a commented-out sample tokenizer entry
** Added a mention of using different segmentation modes for indexing and querying, but no elaborate config
** Added a wiki reference to the Japanese language support page (currently a placeholder)
* Fixed the user dictionary attribute naming conventions for {{JapaneseTokenizerFactory}}. The attributes are now {{userDictionary}} and {{userDictionaryEncoding}} (previously {{user-dictionary}} and {{user-dictionary-encoding}}).
* Added a sample user dictionary file, {{solr/example/solr/conf/lang/userdict_ja.txt}}, with format details. It is referenced from the commented-out example in {{schema.xml}}.
* Changed Kuromoji to Japanese naming in {{stopwords.txt}} and {{stoptags.txt}}. (A {{sync-analyzers}} in {{solr}} is needed prior to commit.)
* Replaced one line of code: a {{System.out.println}} in {{UserDictionary}} now throws a {{RuntimeException}} with a proper error message if user dictionary parsing fails. I've tested this manually.

If you claim that there's been some scope creep in this patch, I can't argue with that.
:)

> Fix attribute conventions for JapaneseTokenizerFactory and add important information to schema.xml
> --------------------------------------------------------------------------------------------------
>
> Key: SOLR-3276
> URL: https://issues.apache.org/jira/browse/SOLR-3276
> Project: Solr
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 3.6, 4.0
> Reporter: Christian Moen
> Assignee: Christian Moen
> Priority: Blocker
> Attachments: SOLR-3276.patch, SOLR-3276.patch
>
> The description of the {{text_ja}} field type in {{schema.xml}} is incomplete: it doesn't describe user dictionaries and lacks a reference to the wiki page with extensive Japanese language support details ([http://wiki.apache.org/solr/JapaneseLanguageSupport], currently a placeholder page).
> The attribute convention used by {{JapaneseTokenizerFactory}} doesn't comply with the standards: it uses {{user-dictionary}} and {{user-dictionary-encoding}} instead of {{userDictionary}} and {{userDictionaryEncoding}}.
> These changes are low risk, and it would be a shame not to get this right in 3.6 given all the work done on Japanese.
> Patch coming up shortly. I really hope it's okay to commit this.
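As a rough illustration of the two code-level changes discussed in this issue (the camelCase {{userDictionary}} / {{userDictionaryEncoding}} attributes, and failing with a {{RuntimeException}} instead of printing when a user dictionary line can't be parsed), here is a minimal Java sketch. It is not the actual {{JapaneseTokenizerFactory}} or {{UserDictionary}} code. The class {{UserDictSketch}}, its methods, the UTF-8 default encoding, and the explicit rejection of the old hyphenated names are all hypothetical. The four-column entry format (surface form, space-separated segmentation, space-separated readings, part-of-speech) is assumed to mirror the sample {{userdict_ja.txt}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: NOT the Lucene/Solr JapaneseTokenizerFactory or
// UserDictionary implementation, just an illustration of the behaviour.
public class UserDictSketch {

    /** One parsed user dictionary entry (hypothetical type). */
    public static final class Entry {
        public final String surface;
        public final String[] segmentation;
        public final String[] readings;
        public final String partOfSpeech;

        Entry(String surface, String[] segmentation, String[] readings, String partOfSpeech) {
            this.surface = surface;
            this.segmentation = segmentation;
            this.readings = readings;
            this.partOfSpeech = partOfSpeech;
        }
    }

    /**
     * Reads the camelCase attributes and returns {path, encoding}; path is null
     * when no user dictionary is configured. Rejecting the old hyphenated names
     * and defaulting the encoding to UTF-8 are assumptions of this sketch.
     */
    public static String[] readAttributes(Map<String, String> args) {
        for (String old : new String[] {"user-dictionary", "user-dictionary-encoding"}) {
            if (args.containsKey(old)) {
                throw new IllegalArgumentException("Attribute '" + old
                    + "' is no longer supported; use userDictionary/userDictionaryEncoding");
            }
        }
        String path = args.get("userDictionary");
        String encoding = args.getOrDefault("userDictionaryEncoding", "UTF-8");
        return new String[] {path, encoding};
    }

    /**
     * Parses lines of the form: surface,seg1 seg2 ...,reading1 reading2 ...,part-of-speech.
     * Blank lines and '#' comments are skipped. A malformed line raises a
     * RuntimeException naming the line number instead of printing to System.out.
     */
    public static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        int lineNo = 0;
        for (String raw : lines) {
            lineNo++;
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cols = line.split(",", -1);
            if (cols.length != 4) {
                throw new RuntimeException("Invalid user dictionary entry at line " + lineNo
                    + ": expected 4 comma-separated fields, found " + cols.length);
            }
            String[] segs = cols[1].trim().split("\\s+");
            String[] reads = cols[2].trim().split("\\s+");
            // Assumed rule: every segment must have exactly one reading.
            if (segs.length != reads.length) {
                throw new RuntimeException("Invalid user dictionary entry at line " + lineNo
                    + ": " + segs.length + " segments but " + reads.length + " readings");
            }
            entries.add(new Entry(cols[0].trim(), segs, reads, cols[3].trim()));
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("userDictionary", "lang/userdict_ja.txt");
        String[] cfg = readAttributes(attrs);
        System.out.println(cfg[0] + " " + cfg[1]); // prints: lang/userdict_ja.txt UTF-8

        List<Entry> ok = parse(Arrays.asList(
            "# custom segmentation for a long compound",
            "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞"));
        System.out.println(ok.get(0).segmentation.length); // prints: 3

        try {
            parse(Arrays.asList("関西国際空港,関西 国際 空港")); // readings and POS missing
        } catch (RuntimeException e) {
            // prints: Invalid user dictionary entry at line 1: expected 4 comma-separated fields, found 2
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing rather than printing presumably means a malformed dictionary is reported when the analyzer is set up, instead of being silently skipped with a message on standard output.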