Thanks for that answer. It squares with my solution: add a column that
holds the lowercase values of the column stored with the case-sensitive
Unicode collation.
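
A minimal sketch of that extra-column approach, assuming the test table
from the quoted message below (column foo, utf8_bin collation). The names
foo_lower and idx_foo_lower are made up for illustration, and the column
would still have to be kept in sync on later inserts and updates, for
example from the application or a trigger:

```sql
-- Assumption: the `test` table from the quoted transcript already exists.
-- foo_lower and idx_foo_lower are hypothetical names.
ALTER TABLE test
  ADD COLUMN foo_lower VARCHAR(3) CHARACTER SET utf8 COLLATE utf8_bin;

-- LOWER() folds case but leaves diacritics alone, so 'Ete' and 'ete'
-- both become 'ete' while 'été' stays 'été'.
UPDATE test SET foo_lower = LOWER(foo);

-- Index the denormalized column so grouping need not recompute LOWER().
CREATE INDEX idx_foo_lower ON test (foo_lower);

-- Group on the binary-collated lowercase column: case-insensitive,
-- diacritics-sensitive.
SELECT foo_lower, GROUP_CONCAT(foo) FROM test GROUP BY foo_lower;
```

With the three sample rows this should give two groups, one for 'ete'
and one for 'été'.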


Martin Mueller

Professor emeritus of English and Classics
Northwestern University




On 11/25/14 6:48 AM, "Rik" <r...@grib.nl> wrote:

>Not a Unicode one that I know of. Converting to latin1 for the grouping
>works for that particular use case, but I can't promise how it would
>work on your entire set, which may hold any Unicode character, many of
>which cannot be converted to latin1:
>
>mysql> SET NAMES utf8;
>Query OK, 0 rows affected (0.00 sec)
>
>mysql> CREATE TABLE test ( foo VARCHAR(3)) ENGINE=InnoDB COLLATE=utf8_bin;
>Query OK, 0 rows affected (0.14 sec)
>
>mysql> SELECT GROUP_CONCAT(foo) FROM test GROUP BY foo;
>Empty set (0.00 sec)
>
>mysql> INSERT INTO test VALUES ('Ete'),('été'),('ete');
>Query OK, 3 rows affected (0.05 sec)
>Records: 3  Duplicates: 0  Warnings: 0
>
>mysql> SELECT *, GROUP_CONCAT(foo) FROM test GROUP BY foo;
>+-------+-------------------+
>| foo   | GROUP_CONCAT(foo) |
>+-------+-------------------+
>| Ete   | Ete               |
>| ete   | ete               |
>| été   | été               |
>+-------+-------------------+
>3 rows in set (0.00 sec)
>
>mysql> SELECT *, GROUP_CONCAT(foo) FROM test GROUP BY foo COLLATE
>utf8_general_ci;
>+------+-------------------+
>| foo  | GROUP_CONCAT(foo) |
>+------+-------------------+
>| Ete  | Ete,été,ete       |
>+------+-------------------+
>1 row in set (0.00 sec)
>
>mysql> SELECT *, GROUP_CONCAT(foo) FROM test GROUP BY CONVERT(foo USING
>latin1) COLLATE latin1_general_ci;
>+-------+-------------------+
>| foo   | GROUP_CONCAT(foo) |
>+-------+-------------------+
>| Ete   | Ete,ete           |
>| été   | été               |
>+-------+-------------------+
>2 rows in set (0.00 sec)
>
>
>If your entire dataset fits in latin1, creating the table as latin1 might
>be the best solution overall, depending on the environment.
>Another option is to keep utf8_bin as the collation but group by
>LOWER(yourcolumnname), or, if that is not fast enough, to denormalize
>into an extra lowercase column.
>
>
>On Mon, Nov 24, 2014 at 11:36 PM, Martin Mueller <
>martinmuel...@northwestern.edu> wrote:
>
>> Is there a unicode setting on mysql that is case insensitive but
>> diacritics sensitive? Given 'Ete', 'été',  'ete' a group by routine for
>> such a setting would return two values: 'été',  'ete'.  I couldn't find
>> it, but I may not have known where to look.
>>
>> Martin Mueller
>>
>> Professor emeritus of English and Classics
>> Northwestern University


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql
