It's critical with Manifold that the database instance be able to store any character it is likely to encounter. For PostgreSQL, for instance, we tell people to install it with UTF-8 encoding, and when we create database instances ourselves we try to specify that as well. For MariaDB, please have a look at the database implementation we've got and let me know if this is something we're missing anywhere.

Thanks,
Karl

On Wed, Jan 23, 2019 at 3:00 AM Markus Schuch <markus_sch...@web.de> wrote:

> Hi,
>
> while using MySQL/MariaDB for MCF I encountered a "deadlock" kind of
> situation caused by a supplementary character (one outside the Basic
> Multilingual Plane, e.g. U+1F3AE) in a string inserted into one of the
> varchar columns.
>
> In my case a connector wrote the title of a parent document into the
> version string of the process document, which contained the character
> U+1F3AE - a gamepad :)
>
> This led to SQL Error 22001 "Incorrect string value: '\xF0\x9F\x8E\xAE'
> for column 'lastversion' at row 1" in MySQL, because the utf8 character
> set does not support 4-byte characters like this one. (utf8mb4 does.)
>
> The cause was hard to find, because it somehow led to a transaction
> abort loop in the incremental ingester and the error was not logged
> properly.
>
> My questions:
> - should we create the MySQL database with utf8mb4 by default?
> - or should inserted strings be sanitized to remove such characters?
> - or should 22001 be handled better?
>
> Thanks in advance
> Markus
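
For the second option in the quoted message (sanitizing strings before insert), here is a minimal Java sketch of what that could look like. It assumes the target column uses MySQL's 3-byte utf8 character set, so any code point above U+FFFF (which needs 4 bytes in UTF-8) has to be detected or replaced. The class and method names are hypothetical and are not part of ManifoldCF.

```java
// Hypothetical helper, not part of ManifoldCF: illustrates one way to keep
// 4-byte UTF-8 characters out of a column that uses MySQL's 3-byte utf8.
public final class SupplementaryCharFilter {
    private SupplementaryCharFilter() {}

    /** Returns true if the string contains any code point above U+FFFF. */
    public static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    /**
     * Replaces every code point above U+FFFF with U+FFFD (the Unicode
     * replacement character). All other code points, including unpaired
     * surrogates, are copied through unchanged.
     */
    public static String replaceSupplementary(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp ->
            out.appendCodePoint(Character.isSupplementaryCodePoint(cp) ? 0xFFFD : cp));
        return out.toString();
    }
}
```

Calling `SupplementaryCharFilter.replaceSupplementary(title)` on a title containing U+1F3AE would swap the gamepad for U+FFFD and leave the rest of the string alone. Note that this changes the stored value, so it would also change version strings that are compared later; creating the database with utf8mb4 avoids that trade-off.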