[jira] Commented: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED

Mamta A. Satoor (JIRA) Thu, 18 Oct 2007 09:11:19 -0700

[https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535958]


Mamta A. Satoor commented on DERBY-2967:
----------------------------------------

Thanks, Knut, for checking my commit. I was also hesitant about all the object 
creation.

I think we can definitely make the first change you suggested. I will go 
ahead and give it a try.
*************part of the change suggested by Knut****************
a) We could use the compare() method instead of iterators. It caches and reuses 
the iterators across calls and therefore it might be more efficient. It would 
also simplify the code, since the else clause in checkEquality() could be 
rewritten to: 

} else { // dealing with territory based character string 
    return collator.compare(new String(pat, pLoc, 1), new String(val, vLoc, 1)) == 0; 
} 
*************end of part of the change suggested by Knut*********
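For illustration, a minimal sketch of that single-character check as a 
standalone method. The signature checkEquality(val, vLoc, pat, pLoc, collator) 
and the convention that a null collator stands for UCS_BASIC are assumptions 
made for this example, not Derby's actual code; only the territory-based 
branch follows Knut's snippet. Per Knut's note, compare() reuses its iterators 
internally, but each call still allocates two one-character Strings.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch of the single-character equality check discussed above.
// Parameter names mirror the quoted snippet; the signature is illustrative only.
public class CheckEqualitySketch {

    static boolean checkEquality(char[] val, int vLoc, char[] pat, int pLoc,
                                 Collator collator) {
        if (collator == null) {
            // Assumed UCS_BASIC case: plain char (code unit) comparison
            return val[vLoc] == pat[pLoc];
        } else { // dealing with territory based character string
            // No explicit CollationElementIterator is created here; the
            // collator handles iteration inside compare().
            return collator.compare(new String(pat, pLoc, 1),
                                    new String(val, vLoc, 1)) == 0;
        }
    }

    public static void main(String[] args) {
        Collator en = Collator.getInstance(Locale.US);
        char[] value = "caad".toCharArray();
        char[] pattern = "%a%".toCharArray();
        System.out.println(checkEquality(value, 1, pattern, 1, en));   // 'a' vs 'a'
        System.out.println(checkEquality(value, 0, pattern, 1, en));   // 'c' vs 'a'
        System.out.println(checkEquality(value, 2, pattern, 1, null)); // UCS_BASIC path
    }
}
```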

But as for the second alternative, we can't create a CollationElementIterator 
for the entire string ahead of time for the LIKE operation. Let me use an 
example to illustrate why. With Norwegian collation, the collation element(s) 
returned for the string 'aa' are not the same as the collation element(s) 
returned for one 'a' at a time. When the user has a WHERE clause 
'caad' LIKE '%a%', the SQL spec requires us to return TRUE. We would not get 
that behavior if we generated collation elements for the entire string 'caad' 
in one shot. We need to break 'caad' into four characters and get the 
collation element(s) for each of those 4 characters. With Norwegian collation, 
generating collation elements for the whole string 'caad' finds only 3 
characters: 'c', 'aa' and 'd'. Because of this, we have to generate collation 
element(s) one character at a time.
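The contraction problem can be reproduced with java.text.CollationElementIterator. 
This sketch uses a hypothetical rule string "< a < b < c < d < aa" as a 
stand-in for Norwegian collation (it is not the JDK's actual no_NO rule set). 
Because those rules declare "aa" as a single collation unit, iterating over 
"caad" as a whole should yield 3 elements ('c', "aa", 'd'), while iterating 
one character at a time yields 4, so each 'a' stays visible to '%a%'.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Sketch: whole-string iteration fuses contractions; per-character iteration does not.
public class ContractionSketch {

    // Counts the collation elements the collator produces for the given text.
    static int countElements(RuleBasedCollator collator, String text) {
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    // Hypothetical rules standing in for Norwegian: "aa" sorts as one unit after 'd'.
    static RuleBasedCollator contractingCollator() {
        try {
            return new RuleBasedCollator("< a < b < c < d < aa");
        } catch (ParseException e) {
            throw new IllegalStateException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = contractingCollator();
        String value = "caad";

        // Whole string at once: the two 'a's are consumed as one contraction.
        int whole = countElements(collator, value);

        // One character at a time: each 'a' produces its own element.
        int perChar = 0;
        for (int i = 0; i < value.length(); i++) {
            perChar += countElements(collator, value.substring(i, i + 1));
        }
        System.out.println("whole string: " + whole + ", per character: " + perChar);
    }
}
```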

Would love to hear if there are any other ideas to cut down on object creation.


> Single character does not match high value unicode character with collation 
> TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY2967_Oct11_07_diff.txt, 
> DERBY2967_Oct11_07_stat.txt, DERBY2967_offset_based_diff_Oct02_07.txt, 
> DERBY2967_offset_based_stat_Oct02_07.txt, fullcoll.out, 
> patch2_setOffset_fullcoll.out, patch2_with_setOffset_diff_Sep2007.txt, 
> patch2_with_setOffset_stat_Sep2007.txt, step1_iteratorbased_Sep1507_diff.txt, 
> step1_iteratorbased_Sep1507_stat.txt, temp_diff.txt, temp_stat.txt, 
> TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D. It 
> is the same for English or Norwegian. For collation UCS_BASIC it matches 
> fine. Could you tell me if this is a bug?
> Here is a program to reproduce it.
> import java.sql.*;
>
> public class HighCharacter {
>     public static void main(String[] args) throws Exception {
>         System.out.println("\n Territory no_NO");
>         // Load the embedded driver (required with pre-JDBC 4.0 drivers)
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>         Connection conn = DriverManager.getConnection(
>             "jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>
>         System.out.println("\n Territory en_US");
>         conn = DriverManager.getConnection(
>             "jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>
>         // No collation attribute: the database uses the default, UCS_BASIC
>         System.out.println("\n Collation UCS_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>     }
>
>     public static void testLikeWithHighestValidCharacter(Connection conn)
>             throws SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         } catch (SQLException se) {
>             // drop failure ok: the table may not exist yet
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4;
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>         String[] match = { "%", "_", "\uFA2D" };
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             // Every pattern should match, so exactly one row with value 1 is expected
>             if (rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else
>                 System.out.println("FAIL: no match");
>             rs.close();
>         }
>         ps.close();
>         stmt.close();
>     }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html
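
A hypothetical LIKE matcher (a sketch, not Derby's Like implementation) that 
combines both points of the comment above: it walks the value and the pattern 
one character at a time, so '_' always consumes exactly one character such as 
U+FA2D, and it compares literal pattern characters with Collator.compare() on 
one-character strings. It assumes BMP-only input (no surrogate pairs) and 
ignores ESCAPE handling.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical per-character LIKE matcher; not Derby's actual algorithm.
public class LikeSketch {

    static boolean like(String value, String pattern, Collator collator) {
        return match(value, 0, pattern, 0, collator);
    }

    private static boolean match(String v, int vLoc, String p, int pLoc, Collator collator) {
        if (pLoc == p.length()) {
            return vLoc == v.length(); // pattern exhausted: value must be too
        }
        char pc = p.charAt(pLoc);
        if (pc == '%') {
            // '%' matches any run of zero or more characters
            for (int i = vLoc; i <= v.length(); i++) {
                if (match(v, i, p, pLoc + 1, collator)) {
                    return true;
                }
            }
            return false;
        }
        if (vLoc == v.length()) {
            return false; // pattern needs a character but the value is exhausted
        }
        if (pc == '_') {
            // '_' consumes exactly one character, however the collator would group it
            return match(v, vLoc + 1, p, pLoc + 1, collator);
        }
        // Literal character: compare single-character strings with the territory collator
        boolean equal = collator.compare(String.valueOf(pc), String.valueOf(v.charAt(vLoc))) == 0;
        return equal && match(v, vLoc + 1, p, pLoc + 1, collator);
    }

    public static void main(String[] args) {
        Collator no = Collator.getInstance(Locale.forLanguageTag("no-NO"));
        System.out.println(like("\uFA2D", "_", no));
        System.out.println(like("caad", "%a%", no));
    }
}
```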

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
