Sorry. In hast, I copied the wrong results. I generated a new set-here they are:

=========================================
1) WHAT IS INSERTED
=========================================

       ord           binary
----------------------------------------------------------------
 0fc1  4033          11100000 10111111 10000001
 0fc2  4034          11100000 10111111 10000010
 0fc3  4035          11100000 10111111 10000011
 0fc4  4036          11100000 10111111 10000100
 0fc5  4037          11100000 10111111 10000101
 0fc6  4038          11100000 10111111 10000110
 0fc7  4039          11100000 10111111 10000111
 0fc8  4040          11100000 10111111 10001000
 0fc9  4041          11100000 10111111 10001001
 0fca  4042          11100000 10111111 10001010
 0fcb  4043          11100000 10111111 10001011
 0fcc  4044          11100000 10111111 10001100
 0fcd  4045          11100000 10111111 10001101
 0fce  4046          11100000 10111111 10001110
 0fcf  4047          11100000 10111111 10001111
 0fd0  4048          11100000 10111111 10010000
 0fd1  4049          11100000 10111111 10010001
 0fd2  4050          11100000 10111111 10010010
 0fd3  4051          11100000 10111111 10010011
 0fd4  4052          11100000 10111111 10010100
 0fd5  4053          11100000 10111111 10010101

=========================================
2) INCORRECT ORDER RESULT?
=========================================

node_id   path    ord     len         binary
----------------------------------------------------------------
 5086     X       4033     3          11100000 10111111 10000001
 5098     X       4045     3          11100000 10111111 10001101
 5100     X       4047     3          11100000 10111111 10001111
 5101     X       4048     3          11100000 10111111 10010000
 5097     X       4044     3          11100000 10111111 10001100
 5095     X       4042     3          11100000 10111111 10001010
 5099     X       4046     3          11100000 10111111 10001110
 5088     X       4035     3          11100000 10111111 10000011
 5093     X       4040     3          11100000 10111111 10001000
 5102     X       4049     3          11100000 10111111 10010001
 5103     X       4050     3          11100000 10111111 10010010
 5087     X       4034     3          11100000 10111111 10000010
 5104     X       4051     3          11100000 10111111 10010011
 5105     X       4052     3          11100000 10111111 10010100
 5089     X       4036     3          11100000 10111111 10000100
 5091     X       4038     3          11100000 10111111 10000110
 5092     X       4039     3          11100000 10111111 10000111
 5106     X       4053     3          11100000 10111111 10010101
 5090     X       4037     3          11100000 10111111 10000101
 5094     X       4041     3          11100000 10111111 10001001
 5096     X       4043     3          11100000 10111111 10001011


=========================================
3) EXPECTED RESULT
=========================================

node_id  path     ord     len         binary
----------------------------------------------------------------
 5086     X       4033     3          11100000 10111111 10000001
 5087     X       4034     3          11100000 10111111 10000010
 5088     X       4035     3          11100000 10111111 10000011
 5089     X       4036     3          11100000 10111111 10000100
 5090     X       4037     3          11100000 10111111 10000101
 5091     X       4038     3          11100000 10111111 10000110
 5092     X       4039     3          11100000 10111111 10000111
 5093     X       4040     3          11100000 10111111 10001000
 5094     X       4041     3          11100000 10111111 10001001
 5095     X       4042     3          11100000 10111111 10001010
 5096     X       4043     3          11100000 10111111 10001011
 5097     X       4044     3          11100000 10111111 10001100
 5098     X       4045     3          11100000 10111111 10001101
 5099     X       4046     3          11100000 10111111 10001110
 5100     X       4047     3          11100000 10111111 10001111
 5101     X       4048     3          11100000 10111111 10010000
 5102     X       4049     3          11100000 10111111 10010001
 5103     X       4050     3          11100000 10111111 10010010
 5104     X       4051     3          11100000 10111111 10010011
 5105     X       4052     3          11100000 10111111 10010100
 5106     X       4053     3          11100000 10111111 10010101

==========================================================
==========================================================





Paul DuBois wrote:
At 19:05 -0500 11/7/05, Eric Herrera wrote:

I'm attempting to sort using utf8_bin and I don't think its sorting properly. I believe I have everything set correctly. I've appended all related data. I also have a small perl script below which I used to
generate the tests.

I may be wrong, but I expect this query:

select id, utf8_data from test order by utf8_data collate utf8_bin

to generate the "EXPECTED ORDER" data section below. However, "TEST ORDER" is what gets generated.


I don't see any difference between the two sections.





Am I correct in my assumption that this is not ordering according to utf8 binary?

Anyone have experience with utf8 binary collation?




-Eric


===========================================================
===========================================================

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `utf8_data` text character set utf8 collate utf8_bin,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin

============================================================
============================================================
mysql> show collation like 'utf8%';
+--------------------+---------+-----+---------+----------+---------+
| Collation          | Charset | Id  | Default | Compiled | Sortlen |
+--------------------+---------+-----+---------+----------+---------+
| utf8_general_ci    | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin           | utf8    |  83 |         | Yes      |       1 |
| utf8_unicode_ci    | utf8    | 192 |         | Yes      |       8 |
| utf8_icelandic_ci  | utf8    | 193 |         | Yes      |       8 |
| utf8_latvian_ci    | utf8    | 194 |         | Yes      |       8 |
| utf8_romanian_ci   | utf8    | 195 |         | Yes      |       8 |
| utf8_slovenian_ci  | utf8    | 196 |         | Yes      |       8 |
| utf8_polish_ci     | utf8    | 197 |         | Yes      |       8 |
| utf8_estonian_ci   | utf8    | 198 |         | Yes      |       8 |
| utf8_spanish_ci    | utf8    | 199 |         | Yes      |       8 |
| utf8_swedish_ci    | utf8    | 200 |         | Yes      |       8 |
| utf8_turkish_ci    | utf8    | 201 |         | Yes      |       8 |
| utf8_czech_ci      | utf8    | 202 |         | Yes      |       8 |
| utf8_danish_ci     | utf8    | 203 |         | Yes      |       8 |
| utf8_lithuanian_ci | utf8    | 204 |         | Yes      |       8 |
| utf8_slovak_ci     | utf8    | 205 |         | Yes      |       8 |
| utf8_spanish2_ci   | utf8    | 206 |         | Yes      |       8 |
| utf8_roman_ci      | utf8    | 207 |         | Yes      |       8 |
| utf8_persian_ci    | utf8    | 208 |         | Yes      |       8 |
+--------------------+---------+-----+---------+----------+---------+
19 rows in set (0.01 sec)

=========================================================
unidb.pl
=========================================================
use DBI;
use Encode;

my $dbh = DBI->connect("test");

$dbh->do("delete from test");

my $sth = $dbh->prepare(
  "INSERT into test set utf8_data = ? "
);

foreach my $ord (4033..4053){

  my $hex = sprintf("%4.4x", $ord);
  print "$hex - $ord\n";
  my $unicode;
  eval "\$unicode = \"\\x{$hex}\"";

  #$unicode = pack("U*",$ord);
  $unicode = Encode::encode_utf8($unicode);

  $sth->execute($unicode);
}

print "\n\n";

# read them back out
my $sth = $dbh->prepare("select id, utf8_data from test order by utf8_data collate utf8_bin");
$sth->execute;
printf("%5s %5s %10s %5s %25s\n","node_id","path", "ord", "len" ,
"binary");
print "----------------------------------------------------------------\n";
while (my @a = $sth->fetchrow_array()){
  my $len = length($a[1]);
  $a[1] = Encode::decode_utf8($a[1]);
  printf("%5s %5s %10s %5s %25s\n", $a[0], 'X', ord($a[1]), $len),
  unpack("B*",$a[1])
  );
}

====================================================================
TEST ORDER
====================================================================

$ perl unidb.pl
0fc1 - 4033
0fc2 - 4034
0fc3 - 4035
0fc4 - 4036
0fc5 - 4037
0fc6 - 4038
0fc7 - 4039
0fc8 - 4040
0fc9 - 4041
0fca - 4042
0fcb - 4043
0fcc - 4044
0fcd - 4045
0fce - 4046
0fcf - 4047
0fd0 - 4048
0fd1 - 4049
0fd2 - 4050
0fd3 - 4051
0fd4 - 4052
0fd5 - 4053

node_id  path        ord   len                    binary
----------------------------------------------------------------
  106     X       4033     3  111000001011111110000001
  107     X       4034     3  111000001011111110000010
  108     X       4035     3  111000001011111110000011
  109     X       4036     3  111000001011111110000100
  110     X       4037     3  111000001011111110000101
  111     X       4038     3  111000001011111110000110
  112     X       4039     3  111000001011111110000111
  113     X       4040     3  111000001011111110001000
  114     X       4041     3  111000001011111110001001
  115     X       4042     3  111000001011111110001010
  116     X       4043     3  111000001011111110001011
  117     X       4044     3  111000001011111110001100
  118     X       4045     3  111000001011111110001101
  119     X       4046     3  111000001011111110001110
  120     X       4047     3  111000001011111110001111
  121     X       4048     3  111000001011111110010000
  122     X       4049     3  111000001011111110010001
  123     X       4050     3  111000001011111110010010
  124     X       4051     3  111000001011111110010011
  125     X       4052     3  111000001011111110010100
  126     X       4053     3  111000001011111110010101


===================================================================
EXPECTED ORDER
===================================================================

$ perl unidb.pl
node_id  path        ord   len                    binary
----------------------------------------------------------------
  106     X       4033     3  111000001011111110000001
  107     X       4034     3  111000001011111110000010
  108     X       4035     3  111000001011111110000011
  109     X       4036     3  111000001011111110000100
  110     X       4037     3  111000001011111110000101
  111     X       4038     3  111000001011111110000110
  112     X       4039     3  111000001011111110000111
  113     X       4040     3  111000001011111110001000
  114     X       4041     3  111000001011111110001001
  115     X       4042     3  111000001011111110001010
  116     X       4043     3  111000001011111110001011
  117     X       4044     3  111000001011111110001100
  118     X       4045     3  111000001011111110001101
  119     X       4046     3  111000001011111110001110
  120     X       4047     3  111000001011111110001111
  121     X       4048     3  111000001011111110010000
  122     X       4049     3  111000001011111110010001
  123     X       4050     3  111000001011111110010010
  124     X       4051     3  111000001011111110010011
  125     X       4052     3  111000001011111110010100
  126     X       4053     3  111000001011111110010101



--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to