[jira] Commented: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED

Mamta A. Satoor (JIRA) Wed, 22 Aug 2007 23:23:00 -0700

    [ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522039 ]


Mamta A. Satoor commented on DERBY-2967:
----------------------------------------

I spent some time on this Jira entry exploring Dan's suggestion for handling 
the _ wildcard when searching a string:
*************
// Note: iterator is a CollationElementIterator.
int currentChar = iterator.getOffset();
do {
  iterator.next();
} while (iterator.getOffset() == currentChar);
*************

I believe the code suggested by Dan above will do the trick but I am not sure 
how to fit that logic in the current code inside the iapi.types.Like.like 
method (method starting at line 258) which is where the current implementation 
for _ resides. 
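
For anyone who wants to try the loop outside Derby, here is a minimal 
standalone sketch. The class name SkipOneCharacter and the helper 
skipOneCharacter are made up for illustration, and the NULLORDER check is my 
own addition (not part of Dan's suggestion) so that the loop cannot spin at 
the end of the text. It also assumes the JDK hands back a RuleBasedCollator 
for the locale. Whether getOffset() changes at exactly the right moment for 
every character is something the printed output can help check; I am not 
asserting that it does.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class SkipOneCharacter {

    /**
     * Consumes collation elements until getOffset() moves away from the
     * position the iterator started on. Returns the number of elements
     * consumed (0 if the text was already exhausted).
     */
    static int skipOneCharacter(CollationElementIterator iterator) {
        int currentChar = iterator.getOffset();
        int consumed = 0;
        do {
            // Guard added for this sketch: at end of text the offset can no
            // longer change, so stop instead of looping forever.
            if (iterator.next() == CollationElementIterator.NULLORDER) {
                break;
            }
            consumed++;
        } while (iterator.getOffset() == currentChar);
        return consumed;
    }

    public static void main(String[] args) {
        // Assumption: Collator.getInstance returns a RuleBasedCollator here.
        RuleBasedCollator collator =
            (RuleBasedCollator) Collator.getInstance(Locale.US);
        CollationElementIterator iterator =
            collator.getCollationElementIterator("a\uFA2Db");
        int consumed;
        while ((consumed = skipOneCharacter(iterator)) > 0) {
            System.out.println("consumed " + consumed
                + " element(s), offset now " + iterator.getOffset());
        }
    }
}
```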

Some background information on the classes and methods involved in this 
discussion: There are two like methods in WorkHorseForCollatorDatatypes (which 
handles collation-sensitive methods for character string types with 
territory-based collation), and they differ only in that one accepts the 
escape DVD while the other does not. Both methods call the like method 
(starting at line 96) in iapi.types.Like. That like method ends up calling 
another like method in the same class (starting at line 258), which provides 
the actual implementation. Notice that this like method does not work with 
CollationElementIterator. Instead, it expects the caller to pass int arrays 
containing the collation elements for the string to be searched, the pattern 
to look for, and the escape sequence. This is done for performance reasons: we 
do not want to construct the collation element array for the strings on every 
call to like. Instead, we want to construct it once and reuse it on every 
subsequent call. Hence, the current implementation does not work with 
CollationElementIterator.
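
To make the "construct once, reuse afterwards" idea concrete, here is a rough 
sketch of a holder that builds its collation element array lazily. The class 
CachedCollationElements and its getElements method are hypothetical names for 
this illustration only; they are not the Derby classes, and the real code may 
be organised differently.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical holder, loosely modelled on the caching described above. */
final class CachedCollationElements {
    private final String text;
    private final RuleBasedCollator collator;
    private int[] elements; // built on first use, then reused

    CachedCollationElements(String text, RuleBasedCollator collator) {
        this.text = text;
        this.collator = collator;
    }

    /** Returns the collation elements, computing them at most once. */
    int[] getElements() {
        if (elements == null) {
            CollationElementIterator it =
                collator.getCollationElementIterator(text);
            int[] buffer = new int[Math.max(4, text.length())];
            int count = 0;
            int order;
            while ((order = it.next()) != CollationElementIterator.NULLORDER) {
                if (count == buffer.length) {
                    // A character can yield several elements, so grow.
                    buffer = Arrays.copyOf(buffer, buffer.length * 2);
                }
                buffer[count++] = order;
            }
            elements = Arrays.copyOf(buffer, count);
        }
        return elements;
    }

    public static void main(String[] args) {
        // Assumption: Collator.getInstance returns a RuleBasedCollator here.
        RuleBasedCollator collator =
            (RuleBasedCollator) Collator.getInstance(Locale.US);
        CachedCollationElements cached =
            new CachedCollationElements("abc", collator);
        System.out.println(cached.getElements().length + " element(s)");
        // The second call reuses the array built by the first one.
        System.out.println(cached.getElements() == cached.getElements());
    }
}
```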

As a solution, I am thinking that maybe I should have another int array in 
WorkHorseForCollatorDatatypes that keeps track of the starting position of 
the collation elements for each character. We already have an int array, 
collationElementsForString, which holds the collation elements for all the 
characters that this WorkHorseForCollatorDatatypes holds. If we knew where 
each character's collation elements start in collationElementsForString, we 
could just advance to the next character's starting position when we find 
a _. 
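
A rough sketch of what such a parallel array could look like follows. 
Everything here is an assumption made for illustration: the class 
CollationElementStarts, the charStart array, and the afterCharacter helper 
are not Derby names. For simplicity the sketch collects elements one 
character at a time via setText, which ignores contractions that span 
several characters, so it is not necessarily how the real 
collationElementsForString array would be filled.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical pairing of collation elements with per-character starts. */
final class CollationElementStarts {
    /** All collation elements, in order. */
    final int[] elements;
    /**
     * charStart[i] is the index in elements where character i begins;
     * charStart[text.length()] is a sentinel equal to elements.length.
     */
    final int[] charStart;

    CollationElementStarts(String text, RuleBasedCollator collator) {
        int length = text.length();
        int[] starts = new int[length + 1];
        int[] buffer = new int[Math.max(4, length * 2)];
        int count = 0;
        CollationElementIterator it = collator.getCollationElementIterator("");
        for (int i = 0; i < length; i++) {
            starts[i] = count;
            // Simplification: each char is collated on its own.
            it.setText(String.valueOf(text.charAt(i)));
            int order;
            while ((order = it.next()) != CollationElementIterator.NULLORDER) {
                if (count == buffer.length) {
                    buffer = Arrays.copyOf(buffer, count * 2);
                }
                buffer[count++] = order;
            }
        }
        starts[length] = count;
        elements = Arrays.copyOf(buffer, count);
        charStart = starts;
    }

    /** Element index just past character charIndex: where a _ would land. */
    int afterCharacter(int charIndex) {
        return charStart[charIndex + 1];
    }

    public static void main(String[] args) {
        // Assumption: Collator.getInstance returns a RuleBasedCollator here.
        RuleBasedCollator collator =
            (RuleBasedCollator) Collator.getInstance(Locale.US);
        CollationElementStarts s =
            new CollationElementStarts("a\uFA2Db", collator);
        System.out.println("starts: " + Arrays.toString(s.charStart));
        System.out.println("elements: " + s.elements.length);
    }
}
```

With an array like charStart, matching a _ at character i becomes a lookup 
of charStart[i + 1] rather than a fixed step of one collation element.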

Let me know if anyone has any feedback on this approach or has any other 
suggestions on fixing the problem.

> Single character does not match high value unicode character with collation 
> TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>         Attachments: TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation '_' does not match  the character \uFA2D.  It 
> is the same for english or norwegian. FOR collation UCS_BASIC it matches 
> fine.  Could you tell me if this is a bug?
> Here is a program to reproduce.
> import java.sql.*;
> public class HighCharacter {
>    public static void main(String args[]) throws Exception
>    {
>    System.out.println("\n Territory no_NO");
>    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>    Connection conn = 
> DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Territory en_US");
>    conn = 
> DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Collation USC_BASIC");
>    conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>    testLikeWithHighestValidCharacter(conn);
>    }
> public static  void testLikeWithHighestValidCharacter(Connection conn) throws 
> SQLException {
>    Statement stmt = conn.createStatement();
>    try {
>    stmt.executeUpdate("drop table t1");
>    }catch (SQLException se)
>    {// drop failure ok.
>    }
>    stmt.executeUpdate("create table t1(c11 int)");
>    stmt.executeUpdate("insert into t1 values 1");
>  
>    // \uFA2D - the highest valid character according to
>    // Character.isDefined() of JDK 1.4;
>    PreparedStatement ps =
>    conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>      String[] match = { "%", "_", "\uFA2D" };
>    for (int i = 0; i < match.length; i++) {
>    System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>    ps.setString(1, match[i]);
>    ResultSet rs = ps.executeQuery();
>    if( rs.next() && rs.getString(1).equals("1"))
>        System.out.println("PASS");
>    else          System.out.println("FAIL: no match");
>    rs.close();
>    }
>   }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
