[ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522039 ]
Mamta A. Satoor commented on DERBY-2967:
----------------------------------------

I spent some time on this Jira entry exploring Dan's suggestion for the _ search in a string:

*************
// Note that the iterator object is of type CollationElementIterator.
int currentChar = iterator.getOffset();
do {
    iterator.next();
} while (iterator.getOffset() == currentChar);
*************

I believe the code suggested by Dan above will do the trick, but I am not sure how to fit that logic into the current code inside the iapi.types.Like.like method (the method starting at line 258), which is where the current implementation for _ resides.

Some background information on the classes and methods involved in this discussion:

There are 2 like methods inside WorkHorseForCollatorDatatypes (which handles collation-sensitive methods for character string types with territory based collation). They differ only in that one accepts the escape DVD while the other does not. Both of these methods call the like method (starting at line 96) in iapi.types.Like. That like method ends up calling another like method in the same class (starting at line 258), which provides the actual implementation.

Notice that this like method does not work with CollationElementIterator. Instead, it expects the caller to pass in int arrays containing the collation elements for the string to be searched, the pattern to look for, and the escape sequence. This is done for performance reasons. We do not want to construct the collation element array for the strings during every call to like. Instead, we want to construct it once and reuse it every subsequent time. Hence, the current implementation does not work with CollationElementIterator.

As a solution, I am thinking that maybe I should have another int array in WorkHorseForCollatorDatatypes that keeps track of the starting position of the collation elements for each of the characters.
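
A rough sketch of what such a start-position array could look like, using only the JDK and not Derby code. The class CollationOffsetsSketch, its fields elements and charStart, and the helper methods are hypothetical names chosen for illustration. The sketch collates each character on its own, which is a simplifying assumption: contractions, where several characters share collation elements, are ignored here.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; this is not a Derby class.
class CollationOffsetsSketch {

    // Plays the role of collationElementsForString.
    final int[] elements;
    // Hypothetical extra array: charStart[i] is the index in elements where
    // the collation elements of character i begin. The final entry is a
    // sentinel equal to elements.length.
    final int[] charStart;

    CollationOffsetsSketch(String s, RuleBasedCollator collator) {
        int[] buf = new int[Math.max(4, s.length() * 2)];
        int[] starts = new int[s.length() + 1];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            starts[i] = n;
            // Assumption: each UTF-16 char is collated separately, so
            // contractions spanning several chars are not modelled.
            CollationElementIterator it =
                collator.getCollationElementIterator(s.substring(i, i + 1));
            for (int e = it.next();
                 e != CollationElementIterator.NULLORDER;
                 e = it.next()) {
                if (n == buf.length) {
                    buf = Arrays.copyOf(buf, n * 2);
                }
                buf[n++] = e;
            }
        }
        starts[s.length()] = n;
        elements = Arrays.copyOf(buf, n);
        charStart = starts;
    }

    // For a '_' in the pattern: given the current position in elements,
    // return the position where the next character's elements begin,
    // or -1 if there is no character left to consume.
    int nextCharStart(int elementPos) {
        for (int k = 1; k < charStart.length; k++) {
            if (charStart[k] > elementPos) {
                return charStart[k];
            }
        }
        return -1;
    }

    // Collator.getInstance returns a Collator; the element iterator is only
    // available on RuleBasedCollator, so check before casting.
    static RuleBasedCollator collatorFor(Locale locale) {
        Collator c = Collator.getInstance(locale);
        if (!(c instanceof RuleBasedCollator)) {
            throw new IllegalStateException("not a RuleBasedCollator");
        }
        return (RuleBasedCollator) c;
    }

    public static void main(String[] args) {
        CollationOffsetsSketch k =
            new CollationOffsetsSketch("a\uFA2Db", collatorFor(Locale.US));
        System.out.println("elements:  " + k.elements.length);
        System.out.println("charStart: " + Arrays.toString(k.charStart));
    }
}
```

With an array like this, handling _ would no longer mean "advance by one collation element" but "jump to the next entry in charStart", however many elements the current character produced.
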
We already have an int array, collationElementsForString, which holds the collation elements for all the characters that this WorkHorseForCollatorDatatypes holds. If we knew where each character's collation elements start in collationElementsForString, we could just advance to the next character's collation element starting position when we find a _.

Let me know if anyone has any feedback on this approach or has any other suggestions on fixing the problem.

> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>         Attachments: TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation '_' does not match the character \uFA2D. It
> is the same for English or Norwegian. For collation UCS_BASIC it matches
> fine. Could you tell me if this is a bug?
> Here is a program to reproduce.
> import java.sql.*;
> public class HighCharacter {
>     public static void main(String args[]) throws Exception
>     {
>         System.out.println("\n Territory no_NO");
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>         Connection conn =
>             DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Territory en_US");
>         conn =
>             DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Collation USC_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>     }
>     public static void testLikeWithHighestValidCharacter(Connection conn) throws
>             SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         }catch (SQLException se)
>         {// drop failure ok.
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4;
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>         String[] match = { "%", "_", "\uFA2D" };
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             if( rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else System.out.println("FAIL: no match");
>             rs.close();
>         }
>     }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

--
This message is automatically generated by JIRA.