On Tue, 2007-04-10 at 08:24 +0200, Gabor Farkas wrote:
> Malcolm Tredinnick wrote:
> > On Mon, 2007-04-09 at 18:56 +0400, Ivan Sagalaev wrote:
> >> As implemented now in unicode branch templates are loaded from files
> >> stored only in utf-8 (as far as I can read the code). However there's a
> >> problem with legacy template files that are stored in one-byte
> >> encodings. This is unfortunately not a rare thing and happen to raise
> >> offenses from developers who we are forcing effectively recode all their
> >> template files (and may be change text editors etc.)
> >>
> >> May be a TEMPLATE_CHARSET setting will be useful?
> >
> > Yeah, I wondered about this. I was kind of hoping it wouldn't be an
> > issue, but people will insist on using non-portable encodings in their
> > files even in the 21st century. :-(
> >
> > I'm not so much worried about the one-off conversion (after all, it's
> > for those peoples' benefit that we're doing this) as much as filesystems
> > that store in a particular encoding by default. There's no reliable,
> > non-expensive way to automatically detect the file encoding, so it
> > probably has to be done with a setting.
> >
> > TEMPLATE_CHARSET is not the right name, though. Templates aren't the
> > problem: the filesystem encoding is. So maybe FILESYSTEM_ENCODING or
> > something explicit like that. We'll need to graft it into each
> > filesystem-based template loader and it defaults to utf-8.
> >
>
> python uses sys.getfilesystemencoding to detect the filesystem's
> encoding, but it's not perfect, because for example on linux, there
> isn't any guarantee that this encoding (as far as i know python is
> guessing it based on your locale) is used in your template file.
>
>
> but what i don't understand, is: why is this an issue? can't we assume
> that the developer knows the encoding of his own files? and if he knows
> them, he should be able to easily recode them to utf-8, for example.

This is the simplest solution from our point of view, but it's the more
painful one from a content author's point of view. I have some sympathy
for the existing "odd" encoding problem.

It's also not worth worrying about too much. It's a total of about four
lines that need changing to implement this and it's not even a real
performance problem -- a slight hit for non-UTF-8 files, but generally
much less than 2% of a total of a few microseconds for single-byte
alternative encodings (for a slow disk read on a relatively large
template file). I did the timings on this; I'm not pulling the numbers
out of thin air.

We skip the conversion when the config variable is set to utf-8 or None
(since it's slightly faster to do it in the template constructor) and we
convert on read for other encodings. No problem.

Let's stop spending cycles on this one; it's too much of a small-fry
issue.

> i'm personally for the one-encoding-to-rule-them-all solution btw :)
> the only issue i have with it is that if we make utf-8 mandatory,
> will it be still easy/possible to generate html files in non-utf-8?

Don't confuse input encoding with output encoding. I am keeping them
very separate internally. Input is unicode or UTF-8 bytestrings, except
in known cases (reading from files in template loaders, database input,
form input) and output is in settings.DEFAULT_ENCODING.

Regards,
Malcolm
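
For illustration, the read-time handling described in this message could look roughly like the sketch below. It is not the code from the unicode branch: the function name, the helper and the FILESYSTEM_ENCODING value are hypothetical (the thread only floats that name as a candidate), it is written for current Python rather than the Python of 2007, and it assumes that "on read" means decoding the file's bytes to a unicode string, which the message does not spell out.

```python
import codecs

# Hypothetical setting; the thread only suggests "FILESYSTEM_ENCODING or
# something explicit like that", defaulting to utf-8.
FILESYSTEM_ENCODING = "utf-8"


def _needs_no_conversion(charset):
    """True when the setting is None or any spelling of UTF-8."""
    return charset is None or codecs.lookup(charset).name == "utf-8"


def load_template_source(path, charset=FILESYSTEM_ENCODING):
    """Read a template file, converting only for non-UTF-8 charsets.

    UTF-8 (or None) returns the raw bytes untouched, leaving the decode
    to whatever consumes the source later. Any other charset is decoded
    to a unicode string here, at read time.
    """
    with open(path, "rb") as fp:
        raw = fp.read()
    if _needs_no_conversion(charset):
        return raw
    return raw.decode(charset)
```

A legacy file saved as cp1251 would be loaded with load_template_source(path, "cp1251") and come back as a unicode string, while a UTF-8 file passes through as bytes.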
