#18392: Use utf8mb4 encoding with MySQL 5.5
-------------------------------------+-------------------------------------
     Reporter:  EmilStenstrom        |                    Owner:  nobody
         Type:  Uncategorized        |                   Status:  new
    Component:  Database layer       |                  Version:  1.4
  (models, ORM)                      |               Resolution:
     Severity:  Normal               |             Triage Stage:  Design
     Keywords:  utf8mb4 mysql        |  decision needed
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  1                    |  Patch needs improvement:  1
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by rogeliorv):

 * keywords:  hack utf8mb4 mysql => utf8mb4 mysql
 * needs_better_patch:  0 => 1
 * has_patch:  0 => 1
 * needs_tests:  0 => 1


Comment:

 Replying to [comment:8 EmilStenstrom]:
 > Replying to [comment:7 rogeliorv]:
 > > As a way to test it: the hack consists of adding self.query('SET NAMES utf8mb4') to the Connection.set_character_set function in MySQLdb.connections, as shown here: http://pastebin.com/MW5BgRgP
 > >
 > > Of course, the correct way would be to change this in Django when setting up the cursor connection.
 >
 > Did your hack remove the exception? What was the rationale behind the hack? What's the next step?

 Yes, the hack removed the exception. The rationale was to force the MySQL client to use a specific encoding.

 The next step is to make Django's MySQL connections use utf8mb4 by default, or otherwise make the charset configurable. Since utf8mb4 is compatible with utf8, there should be no extra changes in that regard.

 To achieve this, the _cursor method of class DatabaseWrapper in django.db.backends.mysql.base should be changed (complete function definition here: http://pastebin.com/A6dMEMd4):

 ''kwargs = { "conv": django_conversions, "charset": "utf8mb4", "use_unicode": True, }''

 Unfortunately, this won't work unless we also change the set_character_set method of class Connection in MySQLdb.connections. Change the two bottom lines to the following (complete function definition here: http://pastebin.com/AMN1B8za):

 # Hack so data can be decoded/encoded using Python's utf8 codec, since
 # Python does not know about MySQL's utf8mb4
 ''if charset == 'utf8mb4':''
     ''charset = 'utf8'''
 ''self.string_decoder.charset = charset''
 ''self.unicode_literal.charset = charset''

 This will guarantee you can use special characters that need four bytes in UTF-8, such as emoji.

 Unlike the previous hack, which worked for both reading and writing data, this patch only allows me to read data in utf8mb4 format. On insertion/creation I now hit the error 'Cursor' object has no attribute '_last_executed'. I will report evidence on this error as I find it.
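
 The charset aliasing described above can be illustrated without MySQLdb or Django installed. This is only a minimal sketch: the names MYSQL_TO_PYTHON_CODEC, python_codec_for and connection_kwargs are hypothetical and do not exist in either library. It assumes that the only mismatch is the one stated in the comment, namely that MySQL calls the charset utf8mb4 while Python's codec is called utf8.

```python
# Hypothetical helpers for illustration only; neither MySQLdb nor Django
# defines these names.

# MySQL charset names that Python knows under a different codec name.
MYSQL_TO_PYTHON_CODEC = {"utf8mb4": "utf8"}


def python_codec_for(mysql_charset):
    """Return the Python codec name to use for a MySQL charset name."""
    return MYSQL_TO_PYTHON_CODEC.get(mysql_charset, mysql_charset)


def connection_kwargs(charset="utf8mb4"):
    """Build the charset-related connection arguments quoted in the comment.

    The "conv" entry is left out because django_conversions lives in Django.
    """
    return {"charset": charset, "use_unicode": True}


# A code point outside the Basic Multilingual Plane needs four bytes in
# UTF-8, which is the case utf8mb4 exists to store.
smiley = u"\U0001F600"
encoded = smiley.encode(python_codec_for("utf8mb4"))
```

 The connection is still asked for utf8mb4, while encoding and decoding on the Python side go through the utf8 codec, which already handles four-byte sequences.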

 All your help regarding this error is appreciated. You can reach me via Twitter, @rogeliorv.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/18392#comment:9>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.