Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users
Hi,

ak wrote:
> After some thoughts I came to the following conclusion: if you guys
> want to keep support of legacy charsets in fact you don't have to
> force model objects to be unicoded. Firstly, they are passed to
> templates and filters and we can't mix legacy charsets with unicode in
> one template. Next, if I don't use unicode, I don't have to code my
> python sources (views) in unicode. So, I need to be able to pass
> string values into my model objects and my strings are not unicoded.
>
> So if everyone agreed, the way is simple:
> 1. when django loads data from db and fills in a model object, all
> strings have to be encoded according to DEFAULT_CHARSET
> 2. when django passes data from form object to model object, it has to
> encode strings according to DEFAULT_CHARSET again

This thread is moving more and more away from the tickets. I started it to get some help in deciding how to proceed with these ...

Regarding ak's proposal, this goes against a widely shared agreement within the Python world that applications should internally use unicode strings (not: utf8 strings) and decode/encode to a bytestring at the boundaries. The boundary is usually input/output; for database applications it is the communication between the database backend (e.g. MySQLdb) and the database.

I'm not in a position to make any decisions for django, but I'm pretty sure that you cannot convince the core developers to follow your path.

Down to earth and back to the tickets, my current understanding is this:

The problem that started the original thread in django-users was that the MySQLdb backend thought it was using latin-1 encoding for the connection and therefore could not encode '€', which is in iso-8859-15 but not in iso-8859-1 aka iso-latin-1. Ticket #2896 seems to explain how this can happen.

In my opinion, each of the three tickets in the subject should solve this issue, and none tries to cope with templates written in a different encoding than settings.DEFAULT_CHARSET.
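As a small illustration of the encoding failure described above, here is a sketch in Python 3 terms (where `str` is the unicode type and `bytes` is the byte string). It is not taken from any of the tickets; it only shows why a connection that believes it speaks latin-1 cannot carry the euro sign, while iso-8859-15 and utf-8 can.

```python
# Python 3 sketch: str is the unicode type, bytes is the byte string.
euro = "\u20ac"  # EURO SIGN

# ISO-8859-15 has a slot for the euro sign (byte 0xA4).
latin9_bytes = euro.encode("iso-8859-15")

# ISO-8859-1 (latin-1) has no such slot, so encoding raises an error.
try:
    euro.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can encode every Unicode character, including this one.
utf8_bytes = euro.encode("utf-8")
```

This is the same kind of error a database adapter would hit when it encodes outgoing unicode data with latin-1.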
#952 allows using a different encoding on the connection than settings.DEFAULT_CHARSET. It does this for all backends.

#1365 sets connection.charset in the mysql backend to utf8. This makes MySQLdb use utf8 encoding, but it's hackish and has been reported not to work in all environments.

#3370 opens the mysql backend connection with charset='utf8', which seems a cleaner way to do the same as #1365. It also fixes the __repr__ of models (not sure if this is the best way, but this can be added to any of the other patches).

My bottom line is that #952 has a different scope than the other two tickets, and that #1365 should be closed as a duplicate of #3370. #3370 and #952 can co-exist.

So, would anybody object to closing #1365 and promoting #952 and #3370 to "Accepted" (which was their state before we started this discussion)?

Michael

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---
Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users
On Jan 28, 2:02 pm, "ak" <[EMAIL PROTECTED]> wrote:
> Bjorn, if you read my first messages and specially my patch #3370, you
> find that I made a suggestion that if the guys want to move to unicode
> they better drop all native encodings support and so does my patch.

You mean require all I/O edge/boundary points to convert to/from Python unicode strings? (We'll of course need to support non-UTF character encodings in databases, files, the web, etc.)

> Then people started to answer me that this is wrong. And at the moment
> noone is able to explain the whole thing and answer my questions:
> 1. how do they want to support templates and python code (views/
> scripts) in native encodings if django itself would be all in unicode.
> The only way i see is to encode/decode everything at programmer's end
> and this means for me no native encodings support at all.

Support for Unicode strings (u"") in code is described in PEP-263, e.g.,

    #!/usr/bin/python
    # -*- coding: -*-

Unfortunately it's not implemented yet (AFAIK), so you can't just have unescaped literals:

    s = u"encoded text goes here"  # doesn't work yet; pending PEP-263

An alternative for literals in code is to surround them with unicode() and specify the appropriate encoding:

    s = unicode("encoded text goes here", "encoding name")

An even better way is to externalize all strings in .po files and use gettext, which has some support for returning unicode strings.

I guess templates could have their character encoding identified either through a similar mechanism, through a global settings variable, or just use the system default encoding.

> 2. how do they want to support legacy databases if db connection speaks
> unicode

I'm not sure I can follow you. How to configure a database adapter depends on the database and adapter you're using. Some can accept unicode strings; for those that don't I guess you'll need a wrapper of some sort.

Rgds,
Bjorn
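The explicit-decoding approach for literals mentioned above can be sketched in Python 3 terms, where `str(raw, encoding)` plays the role of `unicode(raw, encoding)`. The byte string and the `to_text` helper are made up for illustration; windows-1251 is only an example of a native encoding.

```python
# Bytes as they might sit in a source file or template saved as windows-1251.
raw = b"\xcf\xf0\xe8\xe2\xe5\xf2"

# Python 3 spelling of unicode(raw, "windows-1251").
text = str(raw, "windows-1251")

# The equivalent method call on the byte string.
same = raw.decode("windows-1251")


def to_text(value, encoding):
    """Hypothetical helper: accept bytes or str, always return str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

A helper like `to_text` is one way to build the "wrapper of some sort" for code paths that may receive either kind of string.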
Re: Is this group moderated, or is it a bug with google groups?
On 26-Jan-07, at 12:19 PM, medhat wrote:
> So many times I send messages to the group, but my message does not
> appear at all, or it might appear a day or two after I actually send
> it, which of course makes it appear down on the list, and nobody
> really sees it.

Not moderated, and no bug in Google Groups either. There must be some obstruction between your mailserver and Google.

--
regards
kg
http://lawgon.livejournal.com
http://nrcfosshelpline.in/web/
Unicode or strings in models
Splitting this into a new thread since it's no longer about db client encodings...

ak wrote:
> So if everyone agreed, the way is simple:
> 1. when django loads data from db and fills in a model object, all
> strings have to be encoded according to DEFAULT_CHARSET
> 2. when django passes data from form object to model object, it has to
> encode strings according to DEFAULT_CHARSET again

No, it would defeat the purpose of all this unicode endeavor. The point is to work internally (view code, model code etc.) on unicode objects only. I'd rather propose this:

- the db backend decodes data from the db into unicode
- all model properties that are now str's should contain unicode all the time (after reading from db and after assigning from forms)
- users would override __unicode__ of models instead of the __str__ that is used now
- a standard __str__ should be defined as:

      def __str__(self):
          return unicode(self).encode(settings.DEFAULT_CHARSET)

- model validation should call all models' __str__s and warn if they return unicode objects

I believe migration from __str__ to __unicode__ would be as simple as a search/replace operation.

I apologize if this was already discussed and resolved. If so, please point me to where it was.
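The proposal above can be sketched in Python 3 terms, under some loud assumptions: `str` stands in for the old `unicode` type, `__bytes__` stands in for the proposed default `__str__`, `DEFAULT_CHARSET` is a plain module constant rather than `settings.DEFAULT_CHARSET`, and `Model` is a hypothetical base class, not Django's.

```python
DEFAULT_CHARSET = "windows-1251"  # stand-in for settings.DEFAULT_CHARSET


class Model:
    """Hypothetical base class, not Django's actual Model."""

    def __str__(self):
        # Plays the role of __unicode__ in the proposal: always returns text.
        return "%s object" % type(self).__name__

    def __bytes__(self):
        # Plays the role of the proposed default __str__: encode once,
        # at the boundary, using the configured charset.
        return str(self).encode(DEFAULT_CHARSET)


class City(Model):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Users override only the text-returning method.
        return self.name


city = City("Москва")
encoded = bytes(city)  # byte string in DEFAULT_CHARSET
```

The point of the design is that model authors write a single text-returning method, and the byte-string form is derived from it in one place.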
Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users
After some thoughts I came to the following conclusion: if you guys want to keep support of legacy charsets, in fact you don't have to force model objects to be unicoded. Firstly, they are passed to templates and filters and we can't mix legacy charsets with unicode in one template. Next, if I don't use unicode, I don't have to code my python sources (views) in unicode. So, I need to be able to pass string values into my model objects and my strings are not unicoded.

So if everyone agreed, the way is simple:
1. when django loads data from db and fills in a model object, all strings have to be encoded according to DEFAULT_CHARSET
2. when django passes data from form object to model object, it has to encode strings according to DEFAULT_CHARSET again

In fact, my patch #3370 is wrong then. Actually the newforms.model.save() method should be patched to recode clean_data from unicode to DEFAULT_CHARSET (if it differs) when passing this data to the model object. Then we would get everything in place: utf8-based templates and legacy-charset-based templates would both be correctly supported, and any national characters would be stored in the db perfectly, as they are now with oldforms (of course remember what I said about #952).

The second required patch is about recoding unicode strings loaded from the db to DEFAULT_CHARSET (if it differs) when passing them to model objects, and back from DEFAULT_CHARSET to unicode when we save model objects to the db. This patch will solve the #952 issue and again it will work ok with both unicode and legacy-charset based templates. And even more: if we have a legacy database which doesn't understand unicode, we can detect this fact immediately after connecting to the db and decide the correct way to decode/encode strings.

As I see it, this way fixes all unicode/charset issues and answers all questions. So, if there are no objections, I can write this patch tomorrow or by Monday.
Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users
Michael Radziej wrote:
> 1. Are all these tickets really about the connection encoding?
>
> 2. If so, what's the problem of using utf8 for the connection for
> everybody? I don't see how this would be a problem for anybody who is
> using a different encoding for templates, within the database's storage
> or else, since there's no loss in converting anything into utf8. Or is
> there?

I agree with the 2nd point. You can still run into a theoretical problem with it in a scenario where the input is richer than the storage:

- a database that internally stores data in a legacy encoding (say iso-8859-1)
- a web frontend that talks utf-8
- a user enters, say, Russian characters into a form
- data travels as utf-8 right up to the db, where it fails to encode into iso-8859-1 because that encoding has no place for Russian characters

But it's indeed a very theoretical case. Most legacy systems use the same legacy encoding for both backend and frontend, and there would be no errors in the path:

    legacy (web) - unicode (newforms) - utf-8 (db connection) - legacy (db)
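The path above can be traced with a short Python 3 sketch. The `store` function is purely illustrative: it imitates the chain of conversions with plain codec calls and does not talk to any database; the sample strings are made up.

```python
def store(form_bytes, web_charset, db_charset):
    """Imitate: web bytes -> unicode -> utf-8 connection -> db storage charset."""
    text = form_bytes.decode(web_charset)   # web -> unicode (newforms)
    wire = text.encode("utf-8")             # unicode -> utf-8 (db connection)
    # The database converts from the connection charset to its storage charset.
    return wire.decode("utf-8").encode(db_charset)


# Same legacy encoding on both ends: the round trip is lossless.
stored = store("Grüße".encode("iso-8859-1"), "iso-8859-1", "iso-8859-1")

# Input richer than storage: a utf-8 frontend, an iso-8859-1 database.
try:
    store("Привет".encode("utf-8"), "utf-8", "iso-8859-1")
    failed = False
except UnicodeEncodeError:
    failed = True
```

A real database may substitute a placeholder character or raise its own error instead; the sketch only shows that the last conversion step is where information can be lost.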
Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users
ak wrote:
> Could someone please explain me what was a problem with unicode support
> in oldforms so newforms have been made with unicode inside ?

I can! The thing is, it has absolutely nothing to do with forms; it's just a historical coincidence.

Originally Django was written using byte strings everywhere and there was no such thing as a "conversion problem". However, there were problems with incorrect string operations on byte strings (maxlength counting, upper/lower casing, etc.). Some time ago there was a decision to convert Django to work internally with unicode strings and convert them into byte strings at the boundaries to the web and to the database. There was no such thing as newforms at that moment. Then Adrian started to implement newforms and chose to do its internals in unicode, for compatibility with Django's future as I understand it.

> Kick me if I wrong but what is a real reason to convert bytes back and
> forth ? Religion ?

The reasons are purely technical... I'll list them, but please do read until the end of the letter before you disagree. I believe you just misunderstand some things about unicode.

1. Unicode is a universal character set that can represent all characters. Without a universal character set, an app written by a Russian programmer wouldn't be able to use a library written by a French programmer. This is why we need unicode.

2. In Python, unicode text can be held either in 'unicode' objects or in byte strings encoded in utf-8. The problem with utf-8 byte strings is that you can't do string operations on them. For example, you can't cut a month's name to 3 letters just by doing month[0:3], because letters can occupy a different number of bytes. This is what unicode objects are for and why Django should work with unicode internally.
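The month[0:3] example can be tried directly. This is a Python 3 sketch (`str` is the unicode type, `bytes` the byte string), using the Russian genitive form of "January" as sample data.

```python
month = "января"              # 6 characters
utf8 = month.encode("utf-8")  # 12 bytes: each Cyrillic letter takes 2 bytes

short = month[0:3]   # slicing text cuts on character boundaries
broken = utf8[0:3]   # slicing bytes cuts in the middle of the second letter

# The truncated byte string is no longer valid utf-8.
try:
    broken.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```

The same mismatch between byte count and character count is what breaks maxlength checks done on byte strings.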
May I recommend my post about unicode and bytes (it's in Russian): http://softwaremaniacs.org/blog/2006/07/28/unicode-and-bytes/

> I agree with everyone who says that unicode is a
> must and 'legacy' charsets are crap but guys I already have a BIG
> application that was about 80% migrated from other python frameworks to
> django some time ago and for legacy reasons it was all in national
> charset, not unicode.

What gives you the idea that Django won't work with this data? All this unicode stuff is purely internal. If you want your app to output windows-1251, set DEFAULT_CHARSET to windows-1251 and data will be automatically converted from and to it. I believe even newforms already uses this setting to convert unicode data for templates (if not, it should just be fixed, and I'm happy to make a patch since I have some free time).

> Then I found that oldforms support will be
> dropped sooner or later. So we at here have decided to start moving (yes,
> moving again !!!) all our code to newforms and what we got ? We got
> that we now have to recode everything to utf-8

Surely not :-). I'd say it would be a wise thing to do *eventually*. But for now you absolutely can keep your templates and python sources in windows-1251.

> Did anyone who used unicode with oldform has any problems ? I am sure
> no one did.

In fact nobody used unicode with oldforms. All things in request.POST, manipulator.flatten_data and in db models were always byte strings (except db models with psycopg2). And there were problems with it. They were just fixed very early (a couple of them by yours truly).

> So guys please explain me what was a reason to make me to migrate to
> unicode ?

I still think that you're confusing migrating Django internals to unicode objects with converting your files to utf-8. It's not about the latter.

> My opinion is simple: let's decide once either django is for unicode or
> django supports both unicode and national charsets and then let's work.
Sure, Django does and will support national charsets. This is why we have the DEFAULT_CHARSET setting. Internal unicode just lets Django have all the encode/decode stuff localized in two places instead of littered all over the code.
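To illustrate what "localized in two places" could look like, here is a Python 3 sketch. Everything in it is hypothetical: `decode_input`, `encode_output` and `shout` are made-up names, and `DEFAULT_CHARSET` is a plain constant standing in for the Django setting. It is not how Django itself is structured.

```python
DEFAULT_CHARSET = "windows-1251"  # stand-in for settings.DEFAULT_CHARSET


def decode_input(raw):
    """Boundary 1: bytes coming in are decoded exactly once."""
    return raw.decode(DEFAULT_CHARSET)


def encode_output(text):
    """Boundary 2: text going out is encoded exactly once."""
    return text.encode(DEFAULT_CHARSET)


def shout(text):
    """Application code sees only text and never mentions a charset."""
    return text.upper()


request_body = "привет".encode(DEFAULT_CHARSET)
response_body = encode_output(shout(decode_input(request_body)))

# For comparison: upper-casing the raw bytes leaves non-ASCII letters as-is,
# because bytes.upper() only knows about ASCII.
bytes_upper = request_body.upper()
```

The application keeps sending and receiving windows-1251, while case conversion and similar operations run on text, where they behave correctly.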
Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users
Michael, if you read the topic about the euro sign in newforms again, you will find that this touches everything. Personally I couldn't find a way to use utf-8 to connect to MySQL and keep using cp1251 in my templates: it basically doesn't work. With my patch (#3370) and utf8 everywhere it does.
Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users
Guys,

Could someone please explain to me what the problem with unicode support in oldforms was, so that newforms had to be made with unicode inside? Kick me if I'm wrong, but what is the real reason to convert bytes back and forth? Religion?

I agree with everyone who says that unicode is a must and 'legacy' charsets are crap, but guys, I already have a BIG application that was about 80% migrated from other python frameworks to django some time ago, and for legacy reasons it was all in a national charset, not unicode. Then I found that oldforms support will be dropped sooner or later. So we here have decided to start moving (yes, moving again !!!) all our code to newforms, and what did we get? We got that we now have to recode everything to utf-8 and search for bugs in more than 10k lines of our oldforms-based code until we move everything to newforms and utf-8.

But really, why? Did anyone who used unicode with oldforms have any problems? I am sure no one did. Did anyone who used native encodings with oldforms have any problems (except for the patch against one line of code I described before, or #952)? No one did. So guys, please explain to me what the reason was to make me migrate to unicode.

Django is a web framework for perfectionists with deadlines. I see many perfectionists here, but what about deadlines?

My opinion is simple: let's decide once whether django is for unicode only or django supports both unicode and national charsets, and then let's work. If you tell me that from now on there is only a "unicode future", I'd agree and start searching for bugs and sending patches like #3370.