Re: urlencode in template gives unexpected result (for me :-)

2011-08-25 Thread Tom Evans
On Thu, Aug 25, 2011 at 7:23 AM, Michel30  wrote:
> Thanks Tom that clarifies a lot, learning every day.
>
> My filesystem is ext4, so encoding is irrelevant here, right?
> So, I guess the best thing to do is to convert my database into utf-8
> using a method as described here:
> http://www.bothernomore.com/2008/12/16/character-encoding-hell/
>
> That way I'm consistently using utf-8.
> Would this also be backwards compatible with my legacy app? I don't
> see it using anything encoding-specific.
>
> Thanks,
> Michel
>

Encoding is always relevant. Your filesystem will treat the filename
as just a series of bytes, but what those bytes are depends upon the
character encoding of the application that created the files.

I'm not sure how this will be displayed via email, but here is an
example of a file created with a latin1-encoded name, followed by an
attempt to open it using the same name encoded as UTF-8:

>>> import os
>>> filename=u'£££'
>>> fp=open(filename.encode('latin1'), 'w+')
>>> fp.close()
>>> fp=open(filename.encode('utf-8'), 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '\xc2\xa3\xc2\xa3\xc2\xa3'
>>> os.listdir('.')
['\xa3\xa3\xa3']

\xa3 is the encoding of the '£' symbol in latin1, \xc2\xa3 is the
encoding of the same symbol in UTF-8.
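The same mismatch can be reproduced without touching the file system. This is a minimal Python 3 sketch (the session above is Python 2, where plain strings are already bytes); it only compares the byte sequences the two codecs produce for '£££'.

```python
# Encode the same three-character string with two different codecs.
name = '£££'

latin1_bytes = name.encode('latin-1')
utf8_bytes = name.encode('utf-8')

print(latin1_bytes)  # b'\xa3\xa3\xa3'
print(utf8_bytes)    # b'\xc2\xa3\xc2\xa3\xc2\xa3'

# A file system that compares names as raw bytes sees two different names.
print(latin1_bytes == utf8_bytes)  # False

# The latin1 bytes are not even valid UTF-8: 0xa3 cannot start a UTF-8 sequence.
try:
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    print('not valid UTF-8:', exc.reason)
```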

Cheers

Tom

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en.



Re: urlencode in template gives unexpected result (for me :-)

2011-08-25 Thread Michel30
Thanks Tom that clarifies a lot, learning every day.

My filesystem is ext4, so encoding is irrelevant here, right?
So, I guess the best thing to do is to convert my database into utf-8
using a method as described here:
http://www.bothernomore.com/2008/12/16/character-encoding-hell/

That way I'm consistently using utf-8.
Would this also be backwards compatible with my legacy app? I don't
see it using anything encoding-specific.

Thanks,
Michel




Re: urlencode in template gives unexpected result (for me :-)

2011-08-24 Thread Tom Evans
On Wed, Aug 24, 2011 at 2:50 PM, Michel30  wrote:
>
>
> On Aug 24, 3:22 pm, Tom Evans  wrote:
>> On Wed, Aug 24, 2011 at 1:47 PM, Michel30  wrote:
>> > Hi all,
>>
>> > I have written an application using Django 1.3, apache2 and a mysql
>> > db.
>> > I'm using the db to store filepaths and filenames for legacy purposes
>> > while serving them to users with apache.
>>
>> > Now mysql is using latin-1 (with the filenames most likely stored in
>> > CP-1252) while Django uses utf-8.
>>
>> That is not going to fly. You will likely need to ensure you have a
>> consistent character encoding across your website, database and file
>> system.
>>
>> Cheers
>>
>> Tom
>
> Tom,
>
> that looks like it would be best, yes (this is my first exposure to
> encoding problems)
>
> I cannot change the filesystem or mysql encoding since the legacy
> application is still using it. I assumed that with utf-8 I would be
> good as it covers all(?) and I understood mysql translates itself from
> latin-1 to utf-8 and vice versa.
>
> As far as I can see this only hurts my hyperlinks, more specifically
> only file.filename so wouldn't translating only these work?
>

Trusting mysql to do the right thing (DTRT) with character encoding
does not work well in my experience. For starters, if your database is
latin1, there is a huge range of characters that UTF-8 can represent
but latin1 cannot encode. If your website is presented in UTF-8, as is
the default for Django, then input submitted by your users will be in
UTF-8 as well, and can quite easily contain characters that cannot be
stored in the database. Many browsers will submit \u2019 (’) instead
of a simple ' character, which will not fit in latin1.
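A quick Python 3 check of that claim, also trying CP-1252 since that is what the filenames are suspected to be stored in. This is an illustrative sketch, not code from the thread:

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, often submitted in place of a plain '.
quote_char = '\u2019'

results = {}
for codec in ('latin-1', 'cp1252', 'utf-8'):
    try:
        results[codec] = quote_char.encode(codec)
    except UnicodeEncodeError:
        results[codec] = None  # this codec has no byte sequence for the character

for codec, encoded in results.items():
    print(codec, encoded)
# latin-1 None
# cp1252 b'\x92'
# utf-8 b'\xe2\x80\x99'
```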

When it comes to serving your files, Apache url-decodes your request,
but it doesn't assume anything about the character encoding of the
resulting bytes; it simply opens that file system path. If your files
are stored in the file system with latin1 names, the requested file
name must also be latin1-encoded. So sure, you could latin1 encode
each filename and then urlencode the result.
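A minimal Python 3 sketch of that approach, assuming (as Michel suspected) that the names on disk are CP-1252 rather than strict latin1. Strict latin1 has no '…' at all, so `encoding='latin-1'` would raise UnicodeEncodeError for this name. `filename_to_url_path` is a hypothetical helper; in Django something like it could be registered as a custom template filter and used in place of `urlencode`.

```python
from urllib.parse import quote, unquote_to_bytes


def filename_to_url_path(filename, fs_encoding='cp1252'):
    """Percent-encode `filename` using the byte encoding the files were created with.

    Hypothetical helper: `fs_encoding` must match whatever encoding the legacy
    application used when it wrote the files to disk.
    """
    return quote(filename, safe='/', encoding=fs_encoding)


name = 'File\u2026.pdf'  # 'File….pdf'

# A UTF-8 based urlencode versus the CP-1252 based version.
utf8_url = quote(name)                   # 'File%E2%80%A6.pdf'
cp1252_url = filename_to_url_path(name)  # 'File%85.pdf'

# Apache url-decodes the request to raw bytes and opens that path as-is.
requested_bytes = unquote_to_bytes(cp1252_url)  # b'File\x85.pdf'
on_disk_bytes = name.encode('cp1252')
print(requested_bytes == on_disk_bytes)  # True
```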

You are opening yourself up for a world of pain by not using
consistent character encodings. It will hurt you eventually.

Cheers

Tom




Re: urlencode in template gives unexpected result (for me :-)

2011-08-24 Thread Michel30


On Aug 24, 3:22 pm, Tom Evans  wrote:
> On Wed, Aug 24, 2011 at 1:47 PM, Michel30  wrote:
> > Hi all,
>
> > I have written an application using Django 1.3, apache2 and a mysql
> > db.
> > I'm using the db to store filepaths and filenames for legacy purposes
> > while serving them to users with apache.
>
> > Now mysql is using latin-1 (with the filenames most likely stored in
> > CP-1252) while Django uses utf-8.
>
> That is not going to fly. You will likely need to ensure you have a
> consistent character encoding across your website, database and file
> system.
>
> Cheers
>
> Tom

Tom,

that looks like it would be best, yes (this is my first exposure to
encoding problems)

I cannot change the filesystem or mysql encoding since the legacy
application is still using it. I assumed that with utf-8 I would be
good as it covers all(?) and I understood mysql translates itself from
latin-1 to utf-8 and vice versa.

As far as I can see this only hurts my hyperlinks, more specifically
only file.filename so wouldn't translating only these work?




Re: urlencode in template gives unexpected result (for me :-)

2011-08-24 Thread Tom Evans
On Wed, Aug 24, 2011 at 1:47 PM, Michel30  wrote:
> Hi all,
>
> I have written an application using Django 1.3, apache2 and a mysql
> db.
> I'm using the db to store filepaths and filenames for legacy purposes
> while serving them to users with apache.
>
> Now mysql is using latin-1 (with the filenames most likely stored in
> CP-1252) while Django uses utf-8.
>

That is not going to fly. You will likely need to ensure you have a
consistent character encoding across your website, database and file
system.
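To illustrate what that inconsistency looks like in practice, here is a small Python 3 sketch (not from the thread) using the 'File….pdf' name from the opening message, under the assumption that the legacy data is CP-1252:

```python
# A filename as the legacy app stored it: CP-1252 bytes.
stored = 'File\u2026.pdf'.encode('cp1252')  # b'File\x85.pdf'

# Reading those bytes back as if they were UTF-8 fails outright...
try:
    stored.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# ...while the reverse mistake (UTF-8 bytes read as CP-1252) silently
# produces mojibake instead of raising an error.
mojibake = 'File\u2026.pdf'.encode('utf-8').decode('cp1252')

print(decoded_ok)                                # False
print(mojibake == 'File\u00e2\u20ac\u00a6.pdf')  # True, i.e. 'Fileâ€¦.pdf'
```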

Cheers

Tom




urlencode in template gives unexpected result (for me :-)

2011-08-24 Thread Michel30
Hi all,

I have written an application using Django 1.3, apache2 and a mysql
db.
I'm using the db to store filepaths and filenames for legacy purposes
while serving them to users with apache.

Now mysql is using latin-1 (with the filenames most likely stored in
CP-1252) while Django uses utf-8.

I generate the links to the files thusly in my template:
   {{ file.filename }}

This works until I have a funky character, let's say File….pdf

Then my hyperlink reads:
File….pdf

While Apache throws a 404 with: NotFound /path/File….pdf

Obviously because it expects this link:
File….pdf

Any ideas how to fix this in the template?
Thanks
