[
https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535958
]
Mamta A. Satoor commented on DERBY-2967:
----------------------------------------
Thanks, Knut, for checking my commit. I was hesitant too about all the object
creation.
I think we can definitely make the first change you suggested. I will go
ahead and give it a try.
*************part of the change suggested by Knut****************
a) We could use the compare() method instead of iterators. It caches and reuses
the iterators across calls and therefore it might be more efficient. It would
also simplify the code, since the else clause in checkEquality() could be
rewritten to:
} else { // dealing with a territory-based character string
    return collator.compare(new String(pat, pLoc, 1), new String(val, vLoc, 1))
        == 0;
}
*************end of part of the change suggested by Knut*********
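To make the suggestion concrete, here is a minimal standalone sketch of a single-character comparison through Collator.compare(). The class name, the helper name sameUnderCollation and the sample strings are hypothetical and are not taken from the Derby source; only the compare() call mirrors the snippet above.

```java
import java.text.Collator;
import java.util.Locale;

// Illustrative only: a stand-in for the else branch of checkEquality().
class SingleCharCompareSketch {

    // Compares exactly one character of the pattern with one character of
    // the value, using the collator instead of collation element iterators.
    static boolean sameUnderCollation(Collator collator,
                                      char[] pat, int pLoc,
                                      char[] val, int vLoc) {
        return collator.compare(new String(pat, pLoc, 1),
                                new String(val, vLoc, 1)) == 0;
    }

    public static void main(String[] args) {
        // Assumption: an English collator is enough for the demonstration.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        char[] pat = "a".toCharArray();
        char[] val = "caad".toCharArray();
        for (int vLoc = 0; vLoc < val.length; vLoc++) {
            System.out.println("'" + val[vLoc] + "' vs 'a': "
                    + sameUnderCollation(collator, pat, 0, val, vLoc));
        }
    }
}
```

Note that this version still creates two String objects per comparison, so it simplifies the code but does not by itself remove the object creation discussed above.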
But as for the second alternative, we can't create a CollationElementIterator
for the entire string ahead of time for the LIKE operation. Let me use an
example to illustrate why. In Norwegian, the collation element(s) returned for
the string 'aa' are not the same as the collation element(s) returned for one
'a' at a time. So, when the user has a WHERE clause 'caad' LIKE '%a%', the SQL
spec requires us to return TRUE for this WHERE clause. We would not get that
behavior if we generated collation elements for the entire string 'caad' in one
shot: in Norwegian, the iterator would find only 3 characters in that string,
namely 'c', 'aa' and 'd'. Instead, we need to break 'caad' into four characters
and have collation element(s) for each one of those 4 characters. Because of
this, we have to generate collation element(s) one character at a time.
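The effect described above can be sketched with a toy collator. The rule string below is an assumption made purely for illustration: it declares 'aa' as a contraction in the same way the Norwegian rules are described above, but it is not the actual no_NO rule set, and the class and method names are hypothetical. With these rules, iterating over the whole string is expected to yield three collation elements for 'caad', while iterating one character at a time yields four.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Illustrative only: whole-string versus per-character collation elements.
class ContractionSketch {

    // Toy rules (an assumption, not the real Norwegian rules):
    // 'aa' is a single contracted collation unit sorting after 'd'.
    static RuleBasedCollator toyCollator() {
        try {
            return new RuleBasedCollator("< a < c < d < aa");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the collation elements produced when s is iterated as a whole.
    static int countElements(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    // Counts the collation elements when s is fed one character at a time.
    static int countElementsPerChar(RuleBasedCollator collator, String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            count += countElements(collator, s.substring(i, i + 1));
        }
        return count;
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = toyCollator();
        System.out.println("whole string: " + countElements(collator, "caad"));
        System.out.println("per character: "
                + countElementsPerChar(collator, "caad"));
    }
}
```

In the whole-string case there is no separate element for a single 'a', which is why a LIKE '%a%' match cannot be decided from those elements alone.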
Would love to hear if there are any other ideas to cut down on object creation.
> Single character does not match high value unicode character with collation
> TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
> Key: DERBY-2967
> URL: https://issues.apache.org/jira/browse/DERBY-2967
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.4.0.0
> Reporter: Kathey Marsden
> Assignee: Mamta A. Satoor
> Attachments: DERBY2967_Oct11_07_diff.txt,
> DERBY2967_Oct11_07_stat.txt, DERBY2967_offset_based_diff_Oct02_07.txt,
> DERBY2967_offset_based_stat_Oct02_07.txt, fullcoll.out,
> patch2_setOffset_fullcoll.out, patch2_with_setOffset_diff_Sep2007.txt,
> patch2_with_setOffset_stat_Sep2007.txt, step1_iteratorbased_Sep1507_diff.txt,
> step1_iteratorbased_Sep1507_stat.txt, temp_diff.txt, temp_stat.txt,
> TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D. It
> is the same for English or Norwegian. For collation UCS_BASIC it matches
> fine. Could you tell me if this is a bug?
> Here is a program to reproduce.
> import java.sql.*;
>
> public class HighCharacter {
>     public static void main(String args[]) throws Exception
>     {
>         System.out.println("\n Territory no_NO");
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>         Connection conn =
>             DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Territory en_US");
>         conn =
>             DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Collation UCS_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>     }
>
>     public static void testLikeWithHighestValidCharacter(Connection conn) throws
>             SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         } catch (SQLException se)
>         {   // drop failure ok.
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4;
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>         String[] match = { "%", "_", "\uFA2D" };
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             if (rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else System.out.println("FAIL: no match");
>             rs.close();
>         }
>     }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html