https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=18336
Bug ID: 18336
Summary: Use utf8mb4 instead of utf8 for MySQL tables, columns, and connections
Change sponsored?: ---
Product: Koha
Version: master
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5 - low
Component: Architecture, internals, and plumbing
Assignee: gmcha...@gmail.com
Reporter: dc...@prosentient.com.au
QA Contact: testo...@bugs.koha-community.org

As noted by myself on the koha-devel listserv, by Martin on https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=11944#c247, and by Mark Tompsett on https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=15794#c2, we might want to use utf8mb4 instead of utf8 for MySQL tables, columns, and connections.

MySQL's utf8 character set is limited to 3 bytes per character: https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html. Most "normal" characters in most languages fit in 1-3 bytes, but UTF-8 allows up to 4 bytes per character. That means we have a problem with less commonly used characters in Chinese, Japanese, and Korean, among other languages. It also means we can't store emoji.

When MySQL encounters a 4-byte UTF-8 character, it truncates the string from that character onward. Unfortunately, it doesn't raise an error. It raises a warning, which isn't easy to detect.

In my case, I'm trying to store a MARCXML record that contains a 4-byte character. C4::Biblio::AddBiblio returns true, but MySQL corrupts the XML record when utf8 is used rather than utf8mb4 (for both the MySQL column and the MySQL connection set by Koha::Database).
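To illustrate the 3-byte limit described in this report, here is a minimal sketch in Python. It is not Koha code, and the function names are hypothetical. It finds the characters that need 4 bytes in UTF-8 (code points above U+FFFF) and models the truncation described above. It does not talk to MySQL, so it only approximates what the report says the server does.

```python
def utf8_length(ch):
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def find_four_byte_chars(text):
    """Return (index, character) pairs for characters that need 4 bytes in UTF-8.

    Code points above U+FFFF are exactly the ones that take 4 bytes,
    so they do not fit in a 3-byte-per-character column.
    """
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


def simulate_truncation(text):
    """Model the behaviour described in the report (an assumption, not a
    MySQL call): keep only the text before the first 4-byte character."""
    hits = find_four_byte_chars(text)
    if not hits:
        return text
    first_index = hits[0][0]
    return text[:first_index]


if __name__ == "__main__":
    # Hypothetical MARCXML fragment containing an emoji (U+1F600).
    record = '<subfield code="a">Title \U0001F600 more text</subfield>'
    print(find_four_byte_chars(record))
    print(simulate_truncation(record))
```

A check like find_four_byte_chars could be run on a record before saving it, so that affected records are reported rather than silently cut short.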