[ https://issues.apache.org/jira/browse/LUCENE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577309#action_12577309 ]
Hiroaki Kawai commented on LUCENE-1032:
---------------------------------------

I think this feature should be merged into https://issues.apache.org/jira/browse/LUCENE-1215
Unicode compatibility decomposition (the basis of NFKC/NFKD normalization) maps half-width katakana to their full-width forms, so it will fix this issue. :-)

> CJKAnalyzer should convert half width katakana to full width katakana
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-1032
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1032
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 2.0.0
>            Reporter: Andrew Lynch
>
> Some of our Japanese customers are reporting errors when performing searches
> using half-width characters.
> The desired behavior is that a document containing half-width characters
> should be returned both when searching with the full-width equivalents and
> when searching with the half-width characters themselves.
> Currently, a search for half-width characters returns no matches.
> Here is a test case outlining the desired behavior (this may require a new
> Analyzer).
> {code}
> import java.io.IOException;
> import java.io.Reader;
> import java.io.StringReader;
> import java.io.UnsupportedEncodingException;
>
> import junit.framework.TestCase;
>
> import org.apache.lucene.analysis.Token;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.cjk.CJKAnalyzer;
>
> public class TestJapaneseEncodings extends TestCase
> {
>   // UTF-8 bytes of U+30AB KATAKANA LETTER KA (full width)
>   byte[] fullWidthKa = new byte[]{(byte) 0xE3, (byte) 0x82, (byte) 0xAB};
>   // UTF-8 bytes of U+FF76 HALFWIDTH KATAKANA LETTER KA
>   byte[] halfWidthKa = new byte[]{(byte) 0xEF, (byte) 0xBD, (byte) 0xB6};
>
>   // Desired behavior: half-width input is emitted as the full-width token
>   public void testAnalyzerWithHalfWidth() throws IOException
>   {
>     Reader r1 = new StringReader(makeHalfWidthKa());
>     TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
>     assertNotNull(stream);
>     Token token = stream.next();
>     assertNotNull(token);
>     assertEquals(makeFullWidthKa(), token.termText());
>   }
>
>   // Full-width input must still come out unchanged
>   public void testAnalyzerWithFullWidth() throws IOException
>   {
>     Reader r1 = new StringReader(makeFullWidthKa());
>     TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
>     assertEquals(makeFullWidthKa(), stream.next().termText());
>   }
>
>   private String makeFullWidthKa() throws UnsupportedEncodingException
>   {
>     return new String(fullWidthKa, "UTF-8");
>   }
>
>   private String makeHalfWidthKa() throws UnsupportedEncodingException
>   {
>     return new String(halfWidthKa, "UTF-8");
>   }
> }
> {code}
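The compatibility-decomposition suggestion in the comment can be illustrated with the JDK alone. This is a minimal sketch, assuming Java 6 or later for `java.text.Normalizer` (the JDK that Lucene 2.0 targeted did not have it). The class `KatakanaWidthFolding` and the method `toSearchForm` are hypothetical names, and the sketch does not show how the normalization would be wired into `CJKAnalyzer` or a Lucene token filter.

```java
import java.text.Normalizer;

/**
 * Hypothetical helper (not part of Lucene): folds half-width katakana to
 * full-width katakana using Unicode NFKC, i.e. compatibility decomposition
 * followed by canonical composition.
 */
public class KatakanaWidthFolding {

    // Applying the same normalization to indexed text and to query text
    // makes the half-width and full-width spellings compare equal.
    public static String toSearchForm(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String halfWidthKa = "\uFF76";       // HALFWIDTH KATAKANA LETTER KA
        String fullWidthKa = "\u30AB";       // KATAKANA LETTER KA
        String halfWidthGa = "\uFF76\uFF9E"; // half-width KA + half-width voiced sound mark
        String fullWidthGa = "\u30AC";       // KATAKANA LETTER GA

        // U+FF76 decomposes to U+30AB, so both spellings normalize identically.
        System.out.println(toSearchForm(halfWidthKa).equals(toSearchForm(fullWidthKa)));
        // U+FF9E decomposes to the combining mark U+3099, which NFKC then
        // composes with U+30AB into the single letter U+30AC.
        System.out.println(toSearchForm(halfWidthGa).equals(fullWidthGa));
    }
}
```

NFKC is used here rather than NFKD so that a half-width letter followed by the separate half-width voiced sound mark ends up as one full-width letter instead of a base letter plus a combining mark. NFKC also folds other compatibility characters, for example full-width Latin letters to ASCII, which may or may not be desirable inside an analyzer.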