[ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531956 ]
Daniel John Debrunner commented on DERBY-2967:
----------------------------------------------

Looking carefully at the SQL Standard, section 8.5 SR 3 c) ii) 4), I think Derby's LIKE with TERRITORY_BASED collation is currently not implemented correctly.

For a pattern like 'aa' (Norwegian) or 'ch' (Spanish), the SQL standard indicates that LIKE operates one character at a time. So the pattern is not the combination 'aa' or 'ch', but two separate characters: 'a' 'a' or 'c' 'h'. The collation is only used when comparing a single ("exactly 1 (one)") character.

MySQL indicates this as well: it states that LIKE performs matching on a per-character basis, and so it can produce different results from the = comparison operator. E.g. 'AA' LIKE 'Å' is FALSE, but 'AA' = 'Å' is TRUE.

This would indicate that during LIKE processing a CollationElementIterator should only ever be created on a single character, though this does go back to Kathey's question of what a single character is (see DERBY-3080). There seem to be three forms that could be called a single character:

1) A simple single Unicode code point, such as 'Å' U+212B
2) A single Unicode code point followed by one or more combining marks, e.g. U+0041 U+030A
3) A contraction, where two or more characters *sort as if* they were a single base character (e.g. CH in Spanish)

I think Unicode TR10 is saying that 1) and 2) are single characters, but 3) is not.

(MySQL reference: http://dev.mysql.com/doc/refman/4.1/en/string-comparison-functions.html )

> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY2967_offset_based_diff_Oct02_07.txt, DERBY2967_offset_based_stat_Oct02_07.txt, fullcoll.out, patch2_setOffset_fullcoll.out, patch2_with_setOffset_diff_Sep2007.txt, patch2_with_setOffset_stat_Sep2007.txt, step1_iteratorbased_Sep1507_diff.txt, step1_iteratorbased_Sep1507_stat.txt, temp_diff.txt, temp_stat.txt, TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D. It is the same for English or Norwegian. For collation UCS_BASIC it matches fine. Could you tell me if this is a bug?
> Here is a program to reproduce it.
>
> import java.sql.*;
>
> public class HighCharacter {
>     public static void main(String args[]) throws Exception
>     {
>         System.out.println("\n Territory no_NO");
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>         Connection conn =
>             DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Territory en_US");
>         conn =
>             DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Collation UCS_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>     }
>
>     public static void testLikeWithHighestValidCharacter(Connection conn)
>             throws SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         } catch (SQLException se) {
>             // drop failure ok.
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4;
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>         String[] match = { "%", "_", "\uFA2D" };
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             if (rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else
>                 System.out.println("FAIL: no match");
>             rs.close();
>         }
>         ps.close();
>         stmt.close();
>     }
> }
>
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html
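The per-character reading of LIKE described in the comment above can be sketched in plain Java. This is a minimal, hypothetical sketch, not Derby's implementation: the class `PerCharLike` and its methods are invented for illustration, the collator is built from the Norwegian example rules in the `java.text.RuleBasedCollator` javadoc rather than from Derby's territory collator, and PRIMARY strength is assumed so that 'AA' and 'Å' compare equal as whole strings. Forms 1) and 2) are treated as single characters via `BreakIterator.getCharacterInstance()`, contractions (form 3) are deliberately not joined, the collator is only ever asked to compare one value character with one pattern character, and the ESCAPE clause is ignored.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Derby code: LIKE evaluated one "single character"
// at a time, consulting the collator only to compare one value character
// with one pattern character. Supports '%' and '_' only (no ESCAPE clause).
public class PerCharLike {

    // Norwegian example rules from the java.text.RuleBasedCollator javadoc
    // (an assumption standing in for Derby's territory-based collator):
    // a-ring sorts after z, and the contractions "aa"/"AA" differ from it
    // only at secondary/tertiary strength.
    static Collator norwegian() {
        String rules = "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I"
                + "< j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R"
                + "< s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z"
                + "< \u00E5=a\u030A, \u00C5=A\u030A"
                + ";aa, AA< \u00E6, \u00C6< \u00F8, \u00D8";
        try {
            RuleBasedCollator coll = new RuleBasedCollator(rules);
            // Assumed PRIMARY strength, so "AA" and U+00C5 compare equal as a whole.
            coll.setStrength(Collator.PRIMARY);
            return coll;
        } catch (ParseException e) {
            throw new IllegalStateException("bad collation rules", e);
        }
    }

    // Split into user-perceived characters: a code point plus any following
    // combining marks (forms 1 and 2). Contractions such as "aa" stay split.
    static List<String> segments(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(s.substring(start, end));
        }
        return out;
    }

    static boolean like(String value, String pattern, Collator coll) {
        return match(segments(value), 0, segments(pattern), 0, coll);
    }

    private static boolean match(List<String> v, int vi, List<String> p, int pi, Collator coll) {
        if (pi == p.size()) {
            return vi == v.size(); // pattern used up: value must be used up too
        }
        String pc = p.get(pi);
        if (pc.equals("%")) {
            // '%' matches zero or more characters: try every split point.
            for (int k = vi; k <= v.size(); k++) {
                if (match(v, k, p, pi + 1, coll)) {
                    return true;
                }
            }
            return false;
        }
        if (vi == v.size()) {
            return false; // pattern needs a character but the value is exhausted
        }
        // '_' matches exactly one character; any other pattern character is
        // compared, via the collator, against exactly one value character.
        boolean same = pc.equals("_") || coll.compare(v.get(vi), pc) == 0;
        return same && match(v, vi + 1, p, pi + 1, coll);
    }

    public static void main(String[] args) {
        Collator coll = norwegian();
        // Whole-string comparison, like '=': the "AA" contraction equals U+00C5.
        System.out.println("'AA' = '\\u00C5'     -> " + (coll.compare("AA", "\u00C5") == 0));
        // Per-character LIKE: two characters cannot match one character.
        System.out.println("'AA' LIKE '\\u00C5'  -> " + like("AA", "\u00C5", coll));
        // The DERBY-2967 case: '_' matches the single character U+FA2D.
        System.out.println("'\\uFA2D' LIKE '_'   -> " + like("\uFA2D", "_", coll));
    }
}
```

Under these assumptions the whole-string comparison of "AA" and U+00C5 returns 0 (the '=' behaviour), while `like("AA", "\u00C5", coll)` is false, and '_' matches U+FA2D because the sketch counts characters rather than collation elements. The '%' handling backtracks naively, which is fine for illustration but can be exponential for patterns with many '%' wildcards.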