[
https://issues.apache.org/jira/browse/ROL-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kohei Nozaki updated ROL-2090:
------------------------------
Attachment: ROL-2090.patch
With this patch, users can now add Lucene configuration to
{{roller-custom.properties}}. An example intended for a Japanese blog:
{noformat}
lucene.analyzer.maxTokenCount=100000
lucene.analyzer.class=org.apache.lucene.analysis.cjk.CJKAnalyzer
{noformat}
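For discussion, here is a minimal sketch of how settings like these might be consumed. It is not the code in the attached patch. The class and method names ({{AnalyzerConfigSketch}}, {{readInt}}, {{instantiate}}) are hypothetical, the fallback of 1000 mirrors the limit quoted in the issue description, and {{java.util.ArrayList}} stands in for an analyzer class so the sketch runs without Lucene on the classpath. A real analyzer may need a constructor argument (the existing {{getAnalyzer()}} passes a Lucene version), which this sketch does not handle.

```java
import java.util.Properties;

public class AnalyzerConfigSketch {

    /** Reads an integer property, falling back to a default when absent or malformed. */
    public static int readInt(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Instantiates the class named by a property through its public no-arg constructor. */
    public static <T> T instantiate(Properties props, String key,
                                    Class<T> type, String fallbackClass) {
        String name = props.getProperty(key, fallbackClass).trim();
        try {
            Class<?> cls = Class.forName(name);
            return type.cast(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + name, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("lucene.analyzer.maxTokenCount", "100000");
        // Stand-in class name (assumption) so that no Lucene jar is required here.
        props.setProperty("lucene.analyzer.class", "java.util.ArrayList");

        int maxTokens = readInt(props, "lucene.analyzer.maxTokenCount", 1000);
        Object analyzer = instantiate(props, "lucene.analyzer.class",
                                      Object.class, "java.lang.Object");
        System.out.println(maxTokens + " " + analyzer.getClass().getName());
    }
}
```

Falling back to the previous limit when the property is missing or malformed would keep existing installations behaving as before.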
However, putting a jar that contains an {{Analyzer}} implementation into the
container's library directory didn't work. I guess it's related to a difference
in class loaders.
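One way to check this guess is to look at which class loader defines a class and whether a given loader can resolve a class name. In a typical servlet container, jars in the container's library directory are loaded by a parent of the webapp class loader, and a parent cannot see classes from the webapp's WEB-INF/lib. If that applies here, it would be one possible explanation, not a confirmed diagnosis. The sketch below uses hypothetical names ({{LoaderChainSketch}}, {{loaderChain}}, {{isVisible}}) and only JDK calls:

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChainSketch {

    /** Lists the loader that defined cls, then each parent, ending with the bootstrap loader. */
    public static List<String> loaderChain(Class<?> cls) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = cls.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap"); // a null loader denotes the bootstrap class loader
        return chain;
    }

    /** Returns true if the named class can be resolved through the given loader. */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError for a missing superclass.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println(loaderChain(LoaderChainSketch.class));
        System.out.println(isVisible("java.lang.String", context));
    }
}
```

Calling {{isVisible}} with the configured analyzer class name and the loader that performs the lookup would show whether the jar is reachable from there.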
Any feedback welcome.
> Lucene integration doesn't work well for entries that written in some
> languages
> -------------------------------------------------------------------------------
>
> Key: ROL-2090
> URL: https://issues.apache.org/jira/browse/ROL-2090
> Project: Apache Roller
> Issue Type: Improvement
> Components: Data Model & JPA Backend
> Affects Versions: 5.1.2
> Reporter: Kohei Nozaki
> Assignee: Roller Unassigned
> Priority: Minor
> Attachments: ROL-2090.patch
>
>
> Reported in
> http://benzaiten.dyndns.org/roller/ugya/entry/roller_500_to_510_migration
> (Japanese). Summary in English:
> h4. Japanese keywords don't match in the latter part of a long entry
> This is caused by the maximum token limit in the following code. The author
> said that typical Japanese text is not split by white space, so the limit
> doesn't work well with it.
> {noformat}
> // Limit to 1000 tokens.
> LimitTokenCountAnalyzer analyzer = new LimitTokenCountAnalyzer(
> IndexManagerImpl.getAnalyzer(), 1000);
> {noformat}
> h4. StandardAnalyzer doesn't work well with Japanese text
> Roller uses {{StandardAnalyzer}}, but there are language-specific
> alternatives such as {{CJKAnalyzer}} or {{JapaneseAnalyzer}}. The author said
> that these implementations improve accuracy for such languages. I know these
> implementations are language-specific, so we can't simply replace
> {{StandardAnalyzer}} with them, but we might want to make the analyzer
> switchable in a flexible manner, such as through a per-blog language setting.
> {noformat}
> public static final Analyzer getAnalyzer() {
> return new StandardAnalyzer(FieldConstants.LUCENE_VERSION);
> }
> {noformat}
> I'm still not sure what the proper solution would be, but I believe we have
> room for some improvement here. Any advice would be appreciated.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)