The "mb4" stands for "multi-byte 4". It lets the character set store characters whose UTF-8 encoding is up to 4 bytes long. MariaDB's plain utf8 (also known as utf8mb3) only handles characters up to 3 bytes long. Three bytes is already enough for the entire Basic Multilingual Plane, which includes everything needed for nearly all modern languages, as well as a large number of symbols; what it cannot store are characters from the supplementary planes, such as emoji. The 3-byte character set also has some performance and compatibility benefits. Whether the change you are seeing is intentional, or what the discussion of the tradeoffs for Koha looked like, I couldn't say (I wasn't there).
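To make the 3-byte versus 4-byte distinction concrete, here is a small Python sketch. It is only an illustration of how UTF-8 byte lengths fall out, not anything taken from Koha or MariaDB; the helper names `utf8_len` and `fits_in_utf8mb3` and the sample characters are my own assumptions.

```python
# Illustrative sketch: helper names and sample characters are assumptions
# made up for this example, not part of Koha or MariaDB.

def utf8_len(ch: str) -> int:
    """Return how many bytes one character needs when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes (i.e. lies in the BMP)."""
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    samples = [
        "A",           # U+0041, ASCII
        "\u00e4",      # U+00E4, a with diaeresis
        "\u20ac",      # U+20AC, euro sign
        "\u6f22",      # U+6F22, a CJK ideograph inside the BMP
        "\U0001F600",  # U+1F600, an emoji outside the BMP
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), "
              f"fits in 3-byte utf8: {fits_in_utf8mb3(ch)}")
```

Only the last sample needs 4 bytes, so it is the only one that would require utf8mb4 rather than the 3-byte utf8.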

Joel Coehoorn
Director of Information Technology
York College of Nebraska

On Wed, Aug 25, 2021 at 3:53 PM Michael Kuhn <m...@adminkuhn.ch> wrote:

> Hi
>
> 1. In the last few years when installing Koha on Debian GNU/Linux 9 or
> 10 the character sets in MariaDB were as follows:
>
> MariaDB [(none)]> SHOW VARIABLES LIKE '%char%';
> +--------------------------+----------------------------+
> | Variable_name            | Value                      |
> +--------------------------+----------------------------+
> | character_set_client     | utf8mb4                    |
> | character_set_connection | utf8mb4                    |
> | character_set_database   | utf8mb4                    |
> | character_set_filesystem | binary                     |
> | character_set_results    | utf8mb4                    |
> | character_set_server     | utf8mb4                    |
> | character_set_system     | utf8                       |
> | character_sets_dir       | /usr/share/mysql/charsets/ |
> +--------------------------+----------------------------+
>
> Today I installed Koha 21.05.03 on Debian GNU/Linux 11 with MariaDB
> 10.5.11 where the character sets are as follows:
>
> MariaDB [(none)]> SHOW VARIABLES LIKE '%char%';
> +--------------------------+----------------------------+
> | Variable_name            | Value                      |
> +--------------------------+----------------------------+
> | character_set_client     | utf8                       |
> | character_set_connection | utf8                       |
> | character_set_database   | utf8mb4                    |
> | character_set_filesystem | binary                     |
> | character_set_results    | utf8                       |
> | character_set_server     | utf8mb4                    |
> | character_set_system     | utf8                       |
> | character_sets_dir       | /usr/share/mysql/charsets/ |
> +--------------------------+----------------------------+
>
> I'm not sure what is going on here. Does anyone know why the character
> sets for client, connection and results have changed from utf8mb4 to
> utf8? Is this correct with Koha or should these character sets be changed?
>
> 2. Today I came upon an installation of Koha 18.11.05 using MariaDB
> 10.0.32 which has the following character sets:
>
> MariaDB [(none)]> SHOW VARIABLES LIKE '%char%';
> +--------------------------+----------------------------+
> | Variable_name            | Value                      |
> +--------------------------+----------------------------+
> | character_set_client     | utf8                       |
> | character_set_connection | utf8                       |
> | character_set_database   | latin1                     |
> | character_set_filesystem | binary                     |
> | character_set_results    | utf8                       |
> | character_set_server     | latin1                     |
> | character_set_system     | utf8                       |
> | character_sets_dir       | /usr/share/mysql/charsets/ |
> +--------------------------+----------------------------+
>
> This seems quite wrong to me - as far as I know "latin1" was never a
> supported character set in Koha... as far as I know the character sets
> should be set as shown in topic 1.
>
> However, is it still possible to update such a database with these
> character sets to Koha 21.05.03 without destroying the data completely?
>
> Best wishes: Michael
> --
> Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
> Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
> T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch
> _______________________________________________
>
> Koha mailing list http://koha-community.org
> Koha@lists.katipo.co.nz
> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha