-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

robert kuzelj wrote:

> hi mark,
>
>  > However, once you start displaying things on the console, all bets are
>  > off...because your console needs to understand UTF-8 as well, so using
>  > the 'eyeball' method of testing won't work too well here.
> in principle i agree with you. but only having java tests is simply
> not enough. i have to take into consideration that multiple clients
> may access the db. be it via the console or gui clients or what ever.
> so i have to make sure that every client is in sync.
>
> i have done now quite some extensive testing with various
> clients:
>
> - JFaceDBC (an eclipse plugin)
>    i used it for writing the data into the db
>    and reading from it.
> - the (mysql) console (read only)
> - mysql-command-center 0.94  (read only)
> - a java programm that reads from
>    the database via a ResultSet.
>    the reading is done once via getString
>    and once via getBytes.
>
> i have defined two connection that i use for writing
> (exclusively thru JFaceDBC) and reading.
>
> NONE = jdbc:mysql://localhost/?autoReconnect=true
> UTF8 = as_above + &useUnicode=true&characterEncoding=UTF-8
>
> A) write by NONE
>
>     1) read by NONE (JFACE)
>        K??sel;B?;
>        g?l;?lb?rt;
>        ???????????;???????????;
>
>     2) read by UTF8 (JFACE)
>        KÃÃsel;BÃb;
>        ÃgÃl;ÃlbÃrt;
>        ???????????;???????????;
>
>     3) read by UTF8 (mysql-CONSOLE)
>        | KÃÃsel    | BÃb        |
>        | ÃgÃl      | ÃlbÃrt    |
>        | ??????????? | ??????????? |
>
>     4) read by UTF8 (mysql-cc 0.9.4)
>        the same result as 3)
>
>     5) read by NONE (java getBytes)
>        [java] K?sel - B?
>        [java] ?l - ?b?t
>        [java] ??????????? - ???????????
>
>     6) read by NONE (java getString)
>
>        [java] K??sel - B?
>        [java] ?g?l - ?lb?rt
>        [java] ??????????? - ???????????
>
>     7) read by UTF (java getBytes)
>        [java] KÃÃsel - BÃb
>        [java] ÃgÃl - ÃlbÃrt
>        [java] ??????????? - ???????????
>
>     8) read by UTF (java getString)
>        [java] K?sel - B?
>        [java] ?l - ?b?t
>        [java] ??????????? - ???????????
>
> B) write by UTF8
>
>     1) read by NONE (JFACE)
>        K??sel;B?;
> ?      g?l;?lb?rt;
>        ???????????;???????????;
>
>     2) read by UTF8 (JFACE)
>        KÃÃsel;BÃb;
>        ÃgÃl;ÃlbÃrt;
>        ???????????;???????????;
>
>     3) read by UTF8 (mysql-CONSOLE)
>        | KÃÃsel               | BÃb                   |
>        | ÃgÃl                 | ÃlbÃrt               |
>        | ??????????? | ??????????? |
>
>     4) read by UTF8 (mysql-cc 0.9.4)
>        the same result as 3)
>
>     5) read by NONE (java getBytes)
>        [java] K??sel - B?
>        [java] ?g?l - ?lb?rt
>        [java] ??????????? - ???????????
>
>     6) read by NONE (java getString)
>        [java] K??sel - B?
>        [java] ?g?l - ?lb?rt
>        [java] ??????????? - ???????????
>
>     7) read by UTF (java getBytes)
>        [java] KÃÃsel - BÃb
>        [java] ÃgÃl - ÃlbÃrt
>        [java] ??????????? - ???????????
>
>     8) read by UTF (java getString)
>        [java] K?sel - B?
>        [java] ?l - ?b?t
>        [java] ??????????? - ???????????
>
> and here are the results so far
> 1) it is possible to get every into sync.
>     JFace, mysql-console, mysqlcc 0.9.4 and java
>     displayed the same stuff
> 2) it will only work if i write AND read thru
>     a UTF-8 enabled connection. which is really no
>     surprise (not even for me ;-) ).
> 3) i can get java only than into sync if i read
>     from a ResultSet via
>     s = new String(rs.getByte("fieldname"));
>     i can not use the more natural
>     s = rs.getString("fieldname");
>
> now thing begin to get realy weird if i start to
> write thru my java-app.
> - ResultSet.getBytes now returns garbage
> - ResultSet.getString shows the correct result
> - JFace shows almost the correct result (a lot
>    of question marks thou)
> - mysql-console shows only garbage
> - mysqlcc 0.9.4 shows only garbage
>
> i am really puzzled. why cant i write via java to
> the db without f***ing up other clients and why cant
> i read via ResultSet.getString when other clients
> show the correct results but only thru getBytes?
>
> for any help or hints i'd be incredibly happy.

robertj,

Hopefully the following JUnit testcase helps show that your problem
doesn't exist at the JDBC level. It creates a UTF-8 connection to
MySQL-4.1.2, stores UTF-8 encoded strings directly and by prepared
statements, retrieves them, compares to original as strings, and
byte-for-byte using the UTF-8 encoding built in to the JVM.

As a further demonstration, here's the results of a select from data
stored by JDBC in a terminal that obviously doesn't understand UTF-8
(the DOS command window), which is my guess is your problem with
mysql-console and mysqlcc, because mysql_console doesn't know anything
about character sets, it relies on your terminal to display them
correctly, and mysqlcc doesn't know about utf-8, it only knows about
that character sets from MySQL-4.0.x and older:

- ----------------------------------------------------------------------
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 152 to server version: 4.1.2-alpha-max-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select * from testUtf8;
+--------+--------+
| field1 | field2 |
+--------+--------+
| KÂÂsel | BÃb    |
| KÂÂsel | BÃb    |
+--------+--------+
2 rows in set (0.00 sec)
- ----------------------------------------------------------------------

And here's one from a unix x-term, which _does_ know about UTF-8:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 153 to server version: 4.1.2-alpha-max-log

- ----------------------------------------------------------------------

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select * from testUtf8;
+--------+--------+
| field1 | field2 |
+--------+--------+
| KÃÃsel | BÃb    |
| KÃÃsel | BÃb    |
+--------+--------+
2 rows in set (0.02 sec)
- ----------------------------------------------------------------------

    public void testUtf8Encoding2() throws Exception {
        
        String field1 = "KÃÃsel";
        String field2 = "BÃb";
        byte[] field1AsBytes = field1.getBytes("utf-8");
        byte[] field2AsBytes = field2.getBytes("utf-8");
        
                Properties props = new Properties();
        props.put("characterEncoding", "UTF8");
        props.put("useUnicode", "true");

        Connection utfConn = DriverManager.getConnection(dbUrl, props);
        Statement utfStmt = utfConn.createStatement();

        try {
                utfStmt.executeUpdate("DROP TABLE IF EXISTS testUtf8");
                utfStmt.executeUpdate("CREATE TABLE testUtf8 (field1
varchar(32), field2 varchar(32)) CHARACTER SET UTF8");
                utfStmt.executeUpdate("INSERT INTO testUtf8 VALUES ('" + field1
+ "','" + field2 + "')");
                PreparedStatement pStmt = utfConn.prepareStatement("INSERT INTO
testUtf8 VALUES (?, ?)");
                pStmt.setString(1, field1);
                pStmt.setString(2, field2);
                pStmt.executeUpdate();
                
                ResultSet rs = utfStmt.executeQuery("SELECT * FROM testUtf8");
                assertTrue(rs.next());
                
                // Compare results stored using direct statement
                
                // Compare to original string
                assertTrue(field1.equals(rs.getString(1)));
                assertTrue(field2.equals(rs.getString(2)));
                
                // Compare byte-for-byte, ignoring encoding
                assertTrue(bytesAreSame(field1AsBytes, rs.getBytes(1)));
                assertTrue(bytesAreSame(field2AsBytes, rs.getBytes(2)));
                
                assertTrue(rs.next());
                // Compare to original string
                assertTrue(field1.equals(rs.getString(1)));
                assertTrue(field2.equals(rs.getString(2)));
                
                // Compare byte-for-byte, ignoring encoding
                assertTrue(bytesAreSame(field1AsBytes, rs.getBytes(1)));
                assertTrue(bytesAreSame(field2AsBytes, rs.getBytes(2)));
                
        } finally {
                utfStmt.executeUpdate("DROP TABLE IF EXISTS testUtf8");
        }

    }




- --
Mr. Mark Matthews
MySQL AB, Software Development Manager, J2EE and Windows Platforms
Office: +1 708 332 0507
www.mysql.com

MySQL Guide to Lower TCO
http://www.mysql.com/it-resources/white-papers/tco.php
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAtQKptvXNTca6JD8RAvVNAJsEDn/FtfeJcoMCDsvnySpP8WJtrgCgiqwf
dahjntUHL0w4ak9aejZdDq8=
=mDnM
-----END PGP SIGNATURE-----

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to