Re: Utf8 collations

2004-12-01 Thread Andrew Nagy
Brown, Brooks wrote:
> I ran mysqld with arguments --default-character-set=utf8 and
> --default-collation=utf8_unicode_ci, and created a table with no collation
> specification.  I didn't test specifically with ç, but e and e-acute were
> equivalent.  For me this was a problem, but it sounds like for you it's the
> desired behavior.

Sure is, thanks!!!
Andrew
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]


RE: Utf8 collations

2004-12-01 Thread Brown, Brooks
I ran mysqld with arguments --default-character-set=utf8 and 
--default-collation=utf8_unicode_ci, and created a table with no collation 
specification.  I didn't test specifically with ç, but e and e-acute were 
equivalent.  For me this was a problem, but it sounds like for you it's the 
desired behavior.
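
A minimal sketch of the test described above, applied to Andrew's query. The column size and the hex-encoded sample row are assumptions made for illustration; only the table name, the column name, and the collation come from the thread.

```sql
-- Hypothetical table for the query in the thread; VARCHAR(100) is an
-- assumed size.
CREATE TABLE books (
  title VARCHAR(100)
) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- "Français" as UTF-8 bytes (the c-cedilla is C3 A7), entered in hex so the
-- client's connection character set cannot alter it.
INSERT INTO books VALUES (x'4672616EC3A7616973');

-- Under an accent-insensitive collation the unaccented pattern is expected
-- to match the accented row, the same way e matched e-acute above.
SELECT title FROM books WHERE title LIKE '%Francais%';
```

If the row does not come back, SELECT HEX(title) FROM books is a quick way to rule out a badly encoded insert before blaming the collation.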

-----Original Message-----
From: Andrew Nagy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 01, 2004 11:03 AM
To: Brown, Brooks
Cc: [EMAIL PROTECTED]
Subject: Re: Utf8 collations


Brooks, this isn't an answer to your question, but a question to you 
regarding what you have done.

I would like to do a query such as:
SELECT title FROM books WHERE title LIKE '%Francais%';

And in return get:

+----------+
| title    |
+----------+
| Français |
+----------+

Does the "CHARACTER SET utf8 collate utf8_unicode_ci" allow for this?

How would one set the collation of international characters with English
characters for searching?

Thanks
Andrew


Brown, Brooks wrote:
> All of the unicode collations listed in the reference manual except the 
> binary collations are not sensitive to diacritical marks.  That is, if I do 
> the following:
> 
> create table t ( filename varchar(260) ) type=InnoDB CHARACTER SET utf8 
> collate utf8_unicode_ci;
> 
> -- insert an e-acute
> insert into t values ( x'c3a9' ); 
> 
> mysql> select * from t where filename = 'e';
> +----------+
> | filename |
> +----------+
> | é        |
> +----------+
> 
> The problem is that e really isn't the same as e-acute for the file system.
> Ideally, what I want is a collation that is case insensitive but sensitive
> to diacritical symbols; a case sensitive collation would also be okay if it
> were sensitive to diacritical symbols.  Is there none available for utf8,
> as the manual indicates?  If not, how difficult would it be to develop one?
> 
> I am using 4.1.3 on Mac OS X.
> 
> Brooks R. Brown
> Software Engineer
> Extensis, Inc.
> <http://www.extensis.com/>
> 






Re: Utf8 collations

2004-12-01 Thread Andrew Nagy
Brooks, this isn't an answer to your question, but a question to you 
regarding what you have done.

I would like to do a query such as:
SELECT title FROM books WHERE title LIKE '%Francais%';
And in return get:
+----------+
| title    |
+----------+
| Français |
+----------+

Does the "CHARACTER SET utf8 collate utf8_unicode_ci" allow for this?

How would one set the collation of international characters with English
characters for searching?

Thanks
Andrew

Brown, Brooks wrote:
> All of the unicode collations listed in the reference manual except the
> binary collations are not sensitive to diacritical marks.  That is, if I do
> the following:
>
> create table t ( filename varchar(260) ) type=InnoDB CHARACTER SET utf8
> collate utf8_unicode_ci;
>
> -- insert an e-acute
> insert into t values ( x'c3a9' );
>
> mysql> select * from t where filename = 'e';
> +----------+
> | filename |
> +----------+
> | é        |
> +----------+
>
> The problem is that e really isn't the same as e-acute for the file system.
> Ideally, what I want is a collation that is case insensitive but sensitive
> to diacritical symbols; a case sensitive collation would also be okay if it
> were sensitive to diacritical symbols.  Is there none available for utf8,
> as the manual indicates?  If not, how difficult would it be to develop one?
>
> I am using 4.1.3 on Mac OS X.
>
> Brooks R. Brown
> Software Engineer
> Extensis, Inc.
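
One workaround that stays within the collations the message mentions is to override the collation for a single comparison. The sketch below assumes the table t from the message, and uses the _utf8 introducer so the literal does not depend on the connection character set. The LOWER() variant is only a suggested approximation of a case-insensitive, accent-sensitive match, not something the thread or the manual prescribes.

```sql
-- Column collation (utf8_unicode_ci): accents are ignored, so the
-- e-acute row is expected to match.
SELECT HEX(filename) FROM t WHERE filename = _utf8'e';

-- Forcing utf8_bin for this comparison only: the strings are compared by
-- code point, so e and e-acute no longer compare equal (and neither do
-- e and E).
SELECT HEX(filename) FROM t WHERE filename COLLATE utf8_bin = _utf8'e';

-- Assumed workaround: fold case on both sides, then compare with utf8_bin.
-- Case differences are folded away while diacritics still count.
SELECT HEX(filename) FROM t
WHERE LOWER(filename) COLLATE utf8_bin = LOWER(_utf8'E');
```

Wrapping the column in LOWER() or COLLATE generally prevents an index on filename from being used, so this suits spot checks better than hot query paths. Much later MySQL releases (8.0) added accent-sensitive, case-insensitive collations such as utf8mb4_0900_as_ci, which were not available on 4.1.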




Re: Utf8 collations

2004-11-02 Thread Gleb Paharenko
Hi.

See:

  http://dev.mysql.com/doc/mysql/en/Charset-config-file.html



"Brown, Brooks" <[EMAIL PROTECTED]> wrote:

> All of the unicode collations listed in the reference manual except the
> binary collations are not sensitive to diacritical marks.  That is, if I
> do the following:
>
> create table t ( filename varchar(260) ) type=InnoDB CHARACTER SET
> utf8 collate utf8_unicode_ci;
>
> -- insert an e-acute
> insert into t values ( x'c3a9' );
>
> mysql> select * from t where filename = 'e';
> +----------+
> | filename |
> +----------+
> | é        |
> +----------+
>
> The problem is that e really isn't the same as e-acute for the file
> system.  Ideally, what I want is a collation that is case insensitive
> but sensitive to diacritical symbols; a case sensitive collation would
> also be okay if it were sensitive to diacritical symbols.  Is there
> none available for utf8, as the manual indicates?  If not, how difficult
> would it be to develop one?
>
> I am using 4.1.3 on Mac OS X.
>
> Brooks R. Brown
> Software Engineer
> Extensis, Inc.


-- 
For technical support contracts, goto https://order.mysql.com/?ref=ensita
This email is sponsored by Ensita.NET http://www.ensita.net/
Gleb Paharenko, [EMAIL PROTECTED]
MySQL AB / Ensita.NET
www.mysql.com







Re: UTF8 collations in 4.1.3

2004-08-04 Thread Jeremy March

> Entering it in hex works for me too.  So the problem _was_ actually with
> the values I inserted into the database.
> 
> What's the best way to actually see what is stored in the database,
> preferably as hex or something else that a terminal is guaranteed to
> display correctly?  Clearly, what I was doing earlier was not correct.


SELECT hex(your_column) FROM your_table;

I usually convert utf8 to ucs2 so that I can recognize the codepoints more easily.

SELECT hex(CONVERT(your_column USING ucs2)) FROM your_table;

There is also a new UNHEX() function which appeared in 4.1.2.
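
As a small illustration of the functions mentioned above, the statements below inspect u-diaeresis without touching any table. The _utf8 introducer and the literal values are chosen for illustration and are not taken from the original messages.

```sql
-- u-diaeresis as it is stored in a utf8 column: two bytes.
SELECT HEX(_utf8 0xC3BC);                       -- C3BC

-- The same character converted to ucs2: the hex is now the code point.
SELECT HEX(CONVERT(_utf8 0xC3BC USING ucs2));   -- 00FC

-- UNHEX() (4.1.2 and later) turns a hex string back into bytes, which is
-- another way to build a value without relying on the terminal encoding.
SELECT HEX(UNHEX('C3BC'));                      -- C3BC
```

Seeing something other than C3BC for a stored u-diaeresis usually points at the client or connection character set rather than at the collation.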

best,

Jeremy March





Re: UTF8 collations in 4.1.3

2004-08-04 Thread Jody McIntyre
On Tue, Aug 03, 2004 at 01:11:44PM -0400, Jeremy March wrote:

> Is this for Swedish language data?  I don't know Swedish so I don't
> actually know where u-diaeresis is sorted in Swedish myself, but
> according to the source code (in the file: strings/ctype-uca.c) the
> u-diaeresis is sorted as an equivalent of "y" in utf8_swedish_ci.

I don't know Swedish either but section 11.3.13 of the manual
( http://dev.mysql.com/doc/mysql/en/Charset-collation-effect.html ) says
that it is sorted with y, as you said.

> The unicode codepoint for u-diaeresis is 0x00FC and the capital
> U-diaeresis is 0x00DC.
> 
> I just tested this with 4.1.4 (from the bk tree) and it worked correctly
> for me.  My keyboard isn't set up to enter u-diaeresis easily so I
> entered it in hex.  Try this:

Entering it in hex works for me too.  So the problem _was_ actually with
the values I inserted into the database.

What's the best way to actually see what is stored in the database,
preferably as hex or something else that a terminal is guaranteed to
display correctly?  Clearly, what I was doing earlier was not correct.

Thanks,
Jody


> 
> CREATE TABLE swedish (col char(20) COLLATE utf8_swedish_ci);
> 
> INSERT INTO swedish VALUES (CONVERT(_ucs2 0x004D00FC006C006C00650072
> USING utf8)), ('MySQL'), ('Muffler'), ('MX Systems');
> 
> SELECT * FROM swedish ORDER BY col;
> +------------+
> | col        |
> +------------+
> | Muffler    |
> | MX Systems |
> | Müller     |
> | MySQL      |
> +------------+
> 4 rows in set (0.00 sec)
> 
> 
> 



re: UTF8 collations in 4.1.3

2004-08-03 Thread Jeremy March
>...
> mysql> SELECT col2 FROM test ORDER BY col2 COLLATE utf8_swedish_ci;
> +--------------------+
> | col2               |
> +--------------------+
> | M(u-diaeresis)ller |
> | Muffler            |
> | MX Systems         |
> | MySQL              |
> +--------------------+
> ...

> I expect M(u-diaeresis)ller to sort after MX Systems in the following:
> ...
> I have tried various UTF8 collations and, apart from utf8_bin, they all
> place M(u-diaeresis)ller at the start.
> ...

Is this for Swedish language data?  I don't know Swedish so I don't
actually know where u-diaeresis is sorted in Swedish myself, but
according to the source code (in the file: strings/ctype-uca.c) the
u-diaeresis is sorted as an equivalent of "y" in utf8_swedish_ci.

The unicode codepoint for u-diaeresis is 0x00FC and the capital
U-diaeresis is 0x00DC.

I just tested this with 4.1.4 (from the bk tree) and it worked correctly
for me.  My keyboard isn't set up to enter u-diaeresis easily so I
entered it in hex.  Try this:

CREATE TABLE swedish (col char(20) COLLATE utf8_swedish_ci);

INSERT INTO swedish VALUES (CONVERT(_ucs2 0x004D00FC006C006C00650072
USING utf8)), ('MySQL'), ('Muffler'), ('MX Systems');

SELECT * FROM swedish ORDER BY col;
+------------+
| col        |
+------------+
| Muffler    |
| MX Systems |
| Müller     |
| MySQL      |
+------------+
4 rows in set (0.00 sec)






Re: utf8 collations with national symbol's grouping

2003-09-23 Thread Ilja Polivanov
Sorry :)

I see that the symbols I wrote were converted into ASCII symbols. That is
what I need, but in MySQL :). So is there any collation where a = a-umlaut
= a with any other diacritical mark?
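
A minimal way to check a given collation for this behaviour is to compare the two characters directly. The sketch assumes a server that has the utf8 character set and the collations named below; 0xC3A4 is a-umlaut in UTF-8, written in hex to avoid the conversion problem described above.

```sql
-- Returns 1 when the collation treats a and a-umlaut as equal.
SELECT _utf8'a' COLLATE utf8_general_ci = _utf8 0xC3A4 AS a_equals_a_umlaut;

-- The same test can be repeated per collation to see which ones ignore
-- the mark, for example a language-specific one.
SELECT _utf8'a' COLLATE utf8_swedish_ci = _utf8 0xC3A4 AS a_equals_a_umlaut;

-- Lists the utf8 collations this particular server offers.
SHOW COLLATION LIKE 'utf8%';
```

Which collation names exist depends on the server version, so the SHOW COLLATION output is the place to start.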


