Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users

2007-01-27 Thread Michael Radziej

Hi,

ak schrieb:
> After some thoughts I came to the following conclusion: if you guys 
> want to keep support of legacy charsets in fact you don't have to 
> force model objects to be unicoded. Firstly, they are passed to 
> templates and filters and we can't mix legacy charsets with unicode in 
> one template. Next, if I don't use unicode, I don't have to code my 
> python sources (views) in unicode. So, I need to be able to pass 
> string values into my model objects and my strings are not unicoded.
> 
> So if everyone agreed, the way is simple:
> 1. when django loads data from db and fills in a model object, all 
> strings have to be encoded according to DEFAULT_CHARSET
> 2. when django passes data from form object to model object, it has to 
> encode strings according to DEFAULT_CHARSET again

This thread is drifting further and further away from the tickets. I started
it to get some help in deciding how to proceed with these ...

Regarding ak's proposal, this is going against a widely shared agreement
within the python world that applications should internally use unicode
strings (not: utf8 strings) and decode/encode to a bytestring at the
boundaries, which is usually input/output, or for database applications
it's the communication between the database backend (e.g. MySQLdb) and
the database. I'm not in a position to make any decisions for django,
but I'm pretty sure that you cannot convince the core developers to
follow your path.
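
A minimal sketch of that boundary rule, written for Python 3 as an
assumption (this thread predates it): Python 3's `str` plays the role of the
`unicode` type discussed here, and `bytes` the role of a bytestring. The
function name and the charsets are hypothetical and only serve as an
illustration.

```python
def handle_request(raw_input: bytes, input_charset: str, output_charset: str) -> bytes:
    """Decode at the input boundary, work on text, encode at the output boundary."""
    text = raw_input.decode(input_charset)        # bytes -> text, once, on the way in
    greeting = "Hello, " + text.strip().title()   # all internal work happens on text
    return greeting.encode(output_charset)        # text -> bytes, once, on the way out


# The same internal code serves a cp1251 client and a utf-8 client.
from_cp1251 = handle_request("иван".encode("cp1251"), "cp1251", "utf-8")
from_utf8 = handle_request("иван".encode("utf-8"), "utf-8", "utf-8")
```

Only the two boundary calls know about charsets; the string handling in the
middle does not change when the input or output encoding does.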

Down to earth and back to tickets, my current understanding is this:

The problem that started the original thread in django-users was that
the MySQLdb backend thought it was using latin-1 encoding for the
connection and therefore could not encode '€', which is in iso-8859-15
but not in iso-8859-1 aka iso-latin-1. Ticket #2896 seems to explain how
this can happen.
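
The encoding difference itself is easy to reproduce with Python's own
codecs. This is only a sketch in Python 3 syntax (an assumption, see above),
and it does not involve MySQLdb at all:

```python
euro = "\u20ac"  # the Euro sign

# iso-8859-15 has a slot for the Euro sign; utf-8 can encode it as well.
in_latin9 = euro.encode("iso-8859-15")
in_utf8 = euro.encode("utf-8")

# iso-8859-1 (latin-1) has no such slot, so encoding fails.
try:
    euro.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False  # the kind of failure a latin-1 connection runs into
```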

In my opinion, each of the three tickets in the subject should solve
this issue, and none tries to cope with templates written in a different
encoding than settings.DEFAULT_CHARSET.

#952 allows using a different encoding on the connection than
settings.DEFAULT_CHARSET, and it does so for all backends.

#1365 sets connection.charset in the mysql backend to utf8. This makes
the MySQLdb use utf8 encoding, but it's hackish and has been reported
not to work in all environments.

#3370 opens the mysql backend connection with charset='utf8', which
seems a cleaner way to do the same as #1365. It also fixes the __repr__
of models (not sure if this is the best way, but this can be added to
any of the other patches)

My bottom line is that #952 has a different scope than the other two
tickets, and that #1365 should be closed as duplicate of #3370. #3370
and #952 can co-exist.


So, would anybody object to closing #1365 and promoting #952 and
#3370 to "Accepted" (which was their state before we started this
discussion)?

Michael


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---



Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users

2007-01-27 Thread Bjørn Stabell

On Jan 28, 2:02 pm, "ak" <[EMAIL PROTECTED]> wrote:
> Bjorn, if you read my first messages and especially my patch #3370, you'll
> find that I suggested that if the guys want to move to unicode,
> they had better drop all native encodings support, and so does my patch.

You mean require all I/O edge/boundary points to convert to/from 
Python unicode strings?  (We'll of course need to support non-UTF 
character encodings in databases, files, the web, etc.)

> Then people started to answer me that this is wrong. And at the moment
> no one is able to explain the whole thing and answer my questions:
> 1. how do they want to support templates and python code (views/
> scripts) in native encodings if django itself would be all in unicode.
> The only way I see is to encode/decode everything at the programmer's end,
> and this means for me no native encodings support at all.

Support for Unicode strings (u"") in code is described in PEP-263, 
e.g.,

  #!/usr/bin/python
  # -*- coding:  -*-

Unfortunately it's not implemented yet (AFAIK), so you can't just have 
unescaped literals:

  s = u"encoded text goes here" # doesn't work yet; pending PEP-263

An alternative for literals in code is to surround them with unicode() 
and specify the appropriate encoding:

  s = unicode("encoded text goes here", "encoding name")
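
For comparison, a sketch of the same explicit decoding in Python 3 syntax
(an assumption; the line above is Python 2). The sample text and the cp1251
charset are placeholders:

```python
raw = "Привет".encode("cp1251")  # stand-in for a literal stored as legacy bytes

# Python 3 spelling of unicode(raw, "cp1251"): name the encoding explicitly.
s = str(raw, "cp1251")
same = raw.decode("cp1251")      # equivalent method form
```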

An even better way is to externalize all strings in .po files and use 
gettext, which has some support for returning unicode strings.


I guess templates could have their character encoding identified 
either through a similar mechanism, through a global settings 
variable, or just use the system default encoding.


> 2. how do they want to support legacy databases if db connection speaks 
> unicode

I'm not sure I can follow you.  How to configure a database adapter 
depends on the database and adapter you're using.  Some can accept 
unicode strings; for those that don't I guess you'll need a wrapper of 
some sort.


Rgds,
Bjorn





Re: Is this group moderated, or is it a bug with google groups?

2007-01-27 Thread Kenneth Gonsalves


On 26-Jan-07, at 12:19 PM, medhat wrote:

> So many times I send messages to the group, but my message does not
> appear at all, or it might appear a day or two after I actually send
> it, which of course makes it appear down on the list, and nobody  
> really
> sees it.

not moderated - and no bug in googlegroups either. There must be some  
obstruction between your mailserver and google.

-- 

regards
kg
http://lawgon.livejournal.com
http://nrcfosshelpline.in/web/






Unicode or strings in models

2007-01-27 Thread Ivan Sagalaev

Splitting this into a new thread since it's no longer about db client 
encodings...

ak wrote:
> So if everyone agreed, the way is simple:
> 1. when django loads data from db and fills in a model object, all 
> strings have to be encoded according to DEFAULT_CHARSET
> 2. when django passes data from form object to model object, it has to 
> encode strings according to DEFAULT_CHARSET again

No, it would defeat the purpose of all this unicode endeavor. The point 
is to work internally (view code, models code etc) on unicode objects 
only. I'd rather propose this:

- db backend decodes data from db into unicode
- all models' properties that are now str's should contain unicode all 
the time (after reading from db and after assigning from forms)
- user would override __unicode__ of models instead of __str__ that is 
used now
- a standard __str__ should be defined as:

    def __str__(self):
        return unicode(self).encode(settings.DEFAULT_CHARSET)

- model validation should call all models' __str__s and warn if they 
return unicode objects

I believe migration from __str__ to __unicode__ would be as simple as a 
search/replace operation.

I apologize if this was already discussed and resolved. If so, please 
point me to where it was.




Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users

2007-01-27 Thread ak

After some thoughts I came to the following conclusion: if you guys 
want to keep support of legacy charsets in fact you don't have to 
force model objects to be unicoded. Firstly, they are passed to 
templates and filters and we can't mix legacy charsets with unicode in 
one template. Next, if I don't use unicode, I don't have to code my 
python sources (views) in unicode. So, I need to be able to pass 
string values into my model objects and my strings are not unicoded.

So if everyone agreed, the way is simple:
1. when django loads data from db and fills in a model object, all 
strings have to be encoded according to DEFAULT_CHARSET
2. when django passes data from form object to model object, it has to 
encode strings according to DEFAULT_CHARSET again

In fact, my patch #3370 is wrong then. What should actually be patched is 
the newforms.model.save() method, so that it recodes clean_data from 
unicode to DEFAULT_CHARSET (if it differs) when passing this data to the 
model object. With that we would get everything in place: utf8-based 
templates and legacy-charset-based templates would both be correctly 
supported, and any national characters would be stored in the db 
perfectly, as they are now with oldforms (of course, remember what I said 
about #952).

The second required patch is about recoding unicode strings loaded 
from the db to DEFAULT_CHARSET (if it differs) when passing them to model 
objects, and back from DEFAULT_CHARSET to unicode when we save model 
objects to the db. This patch will solve the #952 issue, and again it will 
work with both unicode and legacy-charset-based templates.

What is more, if we have a legacy database which doesn't 
understand unicode, we can detect this immediately after 
connecting to the db and decide the correct way to decode/encode strings.

As I see it, this approach fixes all unicode/charset issues and answers all 
questions. So, if there are no objections, I can write this patch 
tomorrow or by Monday.





Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users

2007-01-27 Thread Ivan Sagalaev

Michael Radziej wrote:
> 1. Are all these tickets really about the connection encoding?
> 
> 2. If so, what's the problem of using utf8 for the connection for
> everybody? I don't see how this would be a problem for anybody who is
> using a different encoding for templates, within the database's storage
> or else, since there's no loss in converting anything into utf8. Or is
> there?

I agree with the 2nd point. You can still run into a theoretical problem 
in a scenario where the input is richer than the storage:

- a database that internally stores data in a legacy encoding (say 
iso-8859-1)
- a web frontend that talks utf-8
- a user enters, say, Russian characters into a form
- data travels as utf-8 right up to the db, which will fail to encode it 
in iso-8859-1 because that encoding has no place for Russian characters

But it's indeed a very theoretical case. Most legacy systems use the same 
legacy encoding for both backend and frontend, and there would be no 
errors in the path: legacy (web) - unicode (newforms) - utf-8 (db 
connection) - legacy (db)
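
Both cases can be sketched with plain codecs. This is Python 3 syntax (an
assumption; `str` stands in for `unicode`), and `store` is a hypothetical
model of the path, not a real database call:

```python
def store(text: str, connection_charset: str, storage_charset: str) -> bytes:
    """Hypothetical model of text passing over a db connection into storage."""
    wire = text.encode(connection_charset)   # utf-8 on the wire can carry anything
    received = wire.decode(connection_charset)
    return received.encode(storage_charset)  # the storage encoding may not


# Legacy end to end: cp1251 form input, utf-8 connection, cp1251 storage.
form_input = "Привет".encode("cp1251").decode("cp1251")
stored = store(form_input, "utf-8", "cp1251")

# Input richer than storage: Russian text headed for an iso-8859-1 database.
try:
    store("Привет", "utf-8", "iso-8859-1")
    failed = False
except UnicodeEncodeError:
    failed = True
```

The utf-8 connection never loses anything; the failure can only happen at
the last step, where the storage encoding is narrower than the input.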




Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users

2007-01-27 Thread Ivan Sagalaev

ak wrote:
> Could someone please explain to me what the problem was with unicode support
> in oldforms, so that newforms had to be made with unicode inside?

I can! The thing is it has absolutely nothing to do with forms, it's 
just historical coincidence.

Originally Django was written using byte strings everywhere, and 
there was no such thing as a "conversion problem". However, there were 
problems with incorrect string operations on byte strings (maxlength 
counting, upper/lower casing, etc.). Some time ago there was a decision 
to convert Django to work internally with unicode strings and convert 
them into byte strings at the boundaries to the web and to the database. 
And there was no such thing as newforms at that moment.
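
A small sketch of those byte-string pitfalls, in Python 3 syntax (an
assumption; `bytes` stands in for the byte strings of the time, whose case
handling was not identical). The sample word is arbitrary:

```python
word = "привет"
as_bytes = word.encode("cp1251")  # a legacy single-byte encoding

# Case conversion on bytes only touches ASCII letters, so nothing changes.
upper_bytes = as_bytes.upper()
upper_text = word.upper()

# Length counting diverges once an encoding needs more than one byte per letter.
letters = len(word)
utf8_bytes = len(word.encode("utf-8"))
```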

And then Adrian started to implement newforms, and he chose to make 
its internals unicode, for compatibility with Django's future as I 
understand it.

> Kick me if I'm wrong, but what is the real reason to convert bytes back and
> forth? Religion?

The reasons are purely technical... I'll list them, but please do read to 
the end of this message before you disagree. I believe you just 
misunderstand some things about unicode.

1. Unicode is a universal encoding that can store all characters. 
Without universal encoding an app written by a Russian programmer 
wouldn't be able to use a library written by a French programmer. This 
is why we need unicode.

2. In Python, unicode strings can be either 'unicode' objects or 
byte strings encoded in utf-8. The problem with utf-8 is that you can't do 
string operations on it. For example, you can't cut a month's name to 3 
letters just by doing month[0:3], because letters can occupy different 
numbers of bytes. This is what unicode objects are for, and why Django 
should work with unicode internally.
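
The month example can be sketched as follows, in Python 3 syntax (an
assumption; `str` stands in for a unicode object and `bytes` for a utf-8
byte string). The month name is just a sample:

```python
month = "января"                    # a Russian month name, six letters
month_utf8 = month.encode("utf-8")  # two bytes per Cyrillic letter

short_text = month[0:3]             # slices letters
short_bytes = month_utf8[0:3]       # slices bytes: one and a half letters

try:
    short_bytes.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False                   # the cut landed in the middle of a letter
```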

May I recommend my post about unicode and bytes (it's in Russian): 
http://softwaremaniacs.org/blog/2006/07/28/unicode-and-bytes/

> I agree with everyone who says that unicode is a
> must and 'legacy' charsets are crap but guys I already have a BIG
> application that was about 80% migrated from other python frameworks to
> django some time ago and for legacy reasons it was all in national
> charset, not unicode.

What gives you the idea that Django won't work with this data? All this 
unicode stuff is purely internal. If you want your app to output 
windows-1251, set DEFAULT_CHARSET to windows-1251 and data will be 
automatically converted from and to it. I believe even newforms already 
uses this setting to convert unicode data for templates (if not, it should 
simply be fixed, and I'm happy to make a patch since I have some free time).
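
A rough sketch of what "converted from and to it" means. This is not
Django's code: the two helper functions are hypothetical, the module-level
constant only imitates the setting of the same name, and the syntax is
Python 3 (an assumption):

```python
DEFAULT_CHARSET = "windows-1251"  # stand-in for the Django setting


def decode_post_value(raw: bytes) -> str:
    """Hypothetical input boundary: request bytes -> text."""
    return raw.decode(DEFAULT_CHARSET)


def encode_response(text: str) -> bytes:
    """Hypothetical output boundary: rendered text -> response bytes."""
    return text.encode(DEFAULT_CHARSET)


submitted = "Москва".encode("windows-1251")  # what a windows-1251 page would post
name = decode_post_value(submitted)          # text inside the application
body = encode_response("Город: " + name)     # bytes again on the way out
```

Templates and sources can stay in windows-1251; only the values passing
between the two helpers are text.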

> Then I found that oldforms support will be
> dropped sooner or later. So we here decided to start moving (yes,
> moving again!!!) all our code to newforms, and what did we get? We found
> that we now have to recode everything to utf-8

Certainly not :-). I'd say it would be a wise thing to do *eventually*. But 
for now you absolutely can keep your templates and python sources in 
windows-1251.

> Did anyone who used unicode with oldforms have any problems? I am sure
> no one did.

In fact nobody used unicode with old forms. All things in request.POST, 
manipulator.flatten_data and in db models were always in byte strings 
(except db models with psycopg2).

And there were problems with it. They were just fixed very early (a 
couple of them by yours truly).

> So guys, please explain to me what the reason was to make me migrate to
> unicode?

I still think that you're confusing migrating Django internals to 
unicode objects and converting your files to utf-8. It's not about the 
latter.

> My opinion is simple: let's decide once: either django is for unicode, or
> django supports both unicode and national charsets, and then let's work.

Sure Django does and will support national charsets. This is why we have 
DEFAULT_CHARSET setting. Internal unicode just lets Django have all the 
encode/decode stuff localized in two places instead of littered all over 
the code.




Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users

2007-01-27 Thread ak

Michael, if you reread the thread about the euro sign in newforms, you
will find that this touches everything. Personally, I couldn't find a way
to use utf-8 to connect to MySQL and keep using cp1251 in my templates: it
basically doesn't work. With my patch (#3370) and utf8 everywhere, it
does.





Re: unicode issues in multiple tickets (#952, #1356, #3370) and thread about Euro sign in django-users

2007-01-27 Thread ak

Guys

Could someone please explain to me what the problem was with unicode support
in oldforms, so that newforms had to be made with unicode inside?
Kick me if I'm wrong, but what is the real reason to convert bytes back and
forth? Religion? I agree with everyone who says that unicode is a
must and 'legacy' charsets are crap, but guys, I already have a BIG
application that was about 80% migrated from other python frameworks to
django some time ago, and for legacy reasons it was all in a national
charset, not unicode. Then I found that oldforms support will be
dropped sooner or later. So we here decided to start moving (yes,
moving again!!!) all our code to newforms, and what did we get? We found
that we now have to recode everything to utf-8 and search for bugs in
more than 10k lines of our oldforms-based code until we move everything
to newforms and utf-8. But really, why?
Did anyone who used unicode with oldforms have any problems? I am sure
no one did.
Did anyone who used native encodings with oldforms have any problems
(except for the patch against one line of code I described before, or #952)?
No one did.

So guys, please explain to me what the reason was to make me migrate to
unicode?

Django is a web framework for perfectionists with deadlines. I see many
perfectionists here, but what about deadlines?

My opinion is simple: let's decide once: either django is for unicode, or
django supports both unicode and national charsets, and then let's work.
If you tell me that from now on there is only a "unicode future", I'd agree
and start searching for bugs and sending patches like #3370.

