Re: TEMPLATE_CHARSET

Malcolm Tredinnick Tue, 10 Apr 2007 03:08:24 -0700

On Tue, 2007-04-10 at 08:24 +0200, Gabor Farkas wrote:
> Malcolm Tredinnick wrote:
> > On Mon, 2007-04-09 at 18:56 +0400, Ivan Sagalaev wrote:
> >> As implemented now in the unicode branch, templates are loaded from
> >> files stored only in utf-8 (as far as I can read the code). However,
> >> there's a problem with legacy template files that are stored in
> >> one-byte encodings. This is unfortunately not a rare thing, and it
> >> tends to draw complaints from developers, whom we are effectively
> >> forcing to recode all their template files (and maybe change text
> >> editors etc.)
> >>
> >> Maybe a TEMPLATE_CHARSET setting would be useful?
> > 
> > Yeah, I wondered about this. I was kind of hoping it wouldn't be an
> > issue, but people will insist on using non-portable encodings in their
> > files even in the 21st century. :-(
> > 
> > I'm not so much worried about the one-off conversion (after all, it's
> > for those people's benefit that we're doing this) as much as filesystems
> > that store in a particular encoding by default. There's no reliable,
> > non-expensive way to automatically detect the file encoding, so it
> > probably has to be done with a setting.
> > 
> > TEMPLATE_CHARSET is not the right name, though. Templates aren't the
> > problem: the filesystem encoding is. So maybe FILESYSTEM_ENCODING or
> > something explicit like that. We'll need to graft it into each
> > filesystem-based template loader and it defaults to utf-8.
> > 
> 
> python uses sys.getfilesystemencoding to detect the filesystem's 
> encoding, but it's not perfect, because for example on linux, there 
> isn't any guarantee that this encoding (as far as i know python is 
> guessing it based on your locale) is used in your template file.
> 
> 
> but what i don't understand, is: why is this an issue? can't we assume 
> that the developer knows the encoding of his own files? and if he knows 
> them, he should be able to easily recode them to utf-8, for example.


This is the simplest solution from our point of view, but it's the more
painful one from a content author's point of view. I have some sympathy
for the existing "odd" encoding problem.

It's also not worth worrying about too much. It's a total of about four
lines that need changing to implement this and it's not even a real
performance problem -- a slight hit for non-UTF-8 files, but generally
much less than 2% of a total of a few microseconds for single-byte
alternative encodings (for a slow disk read on a relatively large
template file). I did the timings on this; I'm not pulling the numbers
out of thin air. We skip the conversion when the config variable is set
to utf-8 or None (since it's slightly faster to do it in the template
constructor) and we convert on read for other encodings. No problem.
Let's stop spending cycles on this one; it's too much of a small-fry
issue.
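
As a rough illustration of the read path described above (not the actual
branch code), a loader might look something like the sketch below. It is
written in present-day Python rather than what the branch uses. The
setting name FILE_CHARSET and the function load_template_source are
hypothetical, since no name has been settled on in this thread. The
sketch also assumes the loader hands UTF-8 bytes on to the template
constructor.

```python
import codecs

# Hypothetical setting name; the thread has not agreed on one.
FILE_CHARSET = "utf-8"


def load_template_source(path, file_charset=FILE_CHARSET):
    """Read a template file and return its contents as UTF-8 bytes.

    Assumption: whatever consumes the result (e.g. the template
    constructor) accepts UTF-8 bytes, so UTF-8 files need no work here.
    """
    with open(path, "rb") as f:
        raw = f.read()

    # None or any spelling of UTF-8 ("utf8", "UTF-8", ...) means the file
    # is already in the expected encoding: skip the conversion entirely.
    if file_charset is None or codecs.lookup(file_charset).name == "utf-8":
        return raw

    # Any other encoding: convert on read (decode, then re-encode as UTF-8).
    return raw.decode(file_charset).encode("utf-8")
```

With a template saved as cp1251, for example, calling
load_template_source(path, "cp1251") would return the same text as UTF-8
bytes, while passing None or "utf-8" returns the file's bytes untouched.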

> i'm personally for the one-encoding-to-rule-them-all solution btw :)
> the only issue i have with it is that if we make utf-8 mandatory,
> will it be still easy/possible to generate html files in non-utf-8?

Don't get confused between input encoding and output encoding. I am
keeping them very separate internally. Input is unicode or UTF-8
bytestrings, except in known cases (reading from files in template
loaders, database input, form input) and output is in
settings.DEFAULT_CHARSET.
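
To make the separation concrete, here is a small sketch under stated
assumptions, again in present-day Python and not taken from the branch.
OUTPUT_CHARSET is a hypothetical stand-in for the output setting named
above, and to_text and render_bytes are illustrative helper names.

```python
# Hypothetical stand-in for the output setting named above.
OUTPUT_CHARSET = "utf-8"


def to_text(value, input_charset="utf-8"):
    """Normalise input to text: bytes are assumed UTF-8 unless told otherwise."""
    if isinstance(value, bytes):
        return value.decode(input_charset)
    return value


def render_bytes(body, output_charset=OUTPUT_CHARSET):
    """Encode the final text for output; independent of how the input arrived."""
    return to_text(body).encode(output_charset)
```

So a template stored as UTF-8 can still be served as, say, latin-1:
render_bytes("é".encode("utf-8"), "latin-1") yields the single byte
b"\xe9".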

Regards,
Malcolm


