Re: [kid-discuss] Kid's decoding guess

David Stanek Sat, 11 Feb 2006 20:01:00 -0800

On Sun, Feb 12, 2006 at 02:03:13AM +0100, Björn Lindqvist wrote:
> > > I think that Kid's way of dealing with encodings is slightly
> > > non-optimal. It could be improved by having Kid default to utf8
> > > instead of ascii. That is likely to bring other problems, but it
> > > is still better than guessing ascii, which in a web context is a
> > > totally brain-damaged guess. I'm sure there are a lot of web apps out
> > > there waiting to be broken because the programmer didn't realise that
> > > his code only works with ascii characters.
> > >
> > > My other idea is that Kid would refuse to run unless an encoding is
> > > explicitly specified somewhere. "In the face of ambiguity, refuse the
> > > temptation to guess." I hope you can please fix this problem somehow.
> > > Please CC me on replies, as I don't subscribe.
> >
> > I believe Kid tries to use the encoding that *you* have set. There
> > may be a case in there somewhere that has a bad default, but I was
> > not able to find it during a few quick greps. See the documentation
> > for 'sys.getdefaultencoding()'.
> 
> Yes! That's what I'm saying. Python's default encoding is ascii and
> that is what is causing the exception:
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position
> 3: ordinal not in range(128)
> 
> Python IMHO made a big mistake by having ascii as the default
> encoding, but it is probably too late to change that now. Kid should
> not replicate that mistake by assuming that "sys.getdefaultencoding()"
> is the explicitly requested encoding. Maybe you can investigate how
> Cheetah chooses encoding? I may be totally wrong, but I don't think it
> relies on sys.getdefaultencoding() at all.
> 
> --
> mvh Björn
> 


I guess it really depends on how you look at it. I think the current
behaviour is pretty sane. If Python interprets all byte strings as
ASCII, why should Kid be any different? If your strings use some other
encoding, I think it is up to you to tell Kid what that encoding is.
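For readers on Python 3, where `sys.getdefaultencoding()` returns `'utf-8'` rather than `'ascii'`, the underlying failure can still be reproduced by decoding explicitly. This is a minimal sketch, not Kid code: the same UTF-8 bytes fail when treated as ASCII and decode cleanly once the caller states the encoding.

```python
# Python 3 sketch (not Kid code): the same bytes fail under an ASCII
# assumption but decode cleanly once the encoding is stated explicitly.
raw = "\u00e4".encode("utf-8")  # b'\xc3\xa4', the UTF-8 bytes for U+00E4

try:
    raw.decode("ascii")  # roughly what an ASCII default amounts to
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

text = raw.decode("utf-8")  # the caller says how the bytes are encoded
print(text == "\u00e4")  # True
```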

Two ways to do this:
1. Decode the string into a unicode object before giving it to Kid.
2. Set the magical 'assume_encoding' property on your template
   instance.
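Both approaches can be mimicked without Kid in a small Python 3 sketch. `render` below is a hypothetical stand-in for a template like `<e>$x</e>`; its `assume_encoding` parameter only imitates the idea behind Kid's property and is not Kid's actual implementation.

```python
from xml.sax.saxutils import escape


def render(x, assume_encoding="ascii"):
    """Hypothetical stand-in for a template that wraps x in <e>...</e>.

    Byte strings are decoded with assume_encoding (imitating the idea of
    Kid's assume_encoding, not its real code); str values pass through.
    The result is serialized as UTF-8 bytes with an XML declaration.
    """
    if isinstance(x, bytes):
        x = x.decode(assume_encoding)
    doc = '<?xml version="1.0" encoding="utf-8"?>\n<e>%s</e>' % escape(x)
    return doc.encode("utf-8")


utf8_bytes = "\u00e4".encode("utf-8")

# Way 1: decode to text yourself before handing the value over.
out1 = render(utf8_bytes.decode("utf-8"))

# Way 2: tell the renderer how incoming byte strings are encoded.
out2 = render(utf8_bytes, assume_encoding="utf-8")

print(out1 == out2)  # True
```

Calling `render(utf8_bytes)` with the default `"ascii"` raises `UnicodeDecodeError`, which mirrors the failure in the session below.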

Quick example:

    In [6]: t = kid.Template(source='<e>$x</e>')

    In [7]: t.x = unichr(0xe4).encode('utf-8')

    In [8]: t.serialize()
    [snip]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

    In [9]: t.assume_encoding = 'utf-8'

    In [10]: t.x = unichr(0xe4).encode('utf-8')

    In [11]: t.serialize()
    Out[11]: '<?xml version="1.0" encoding="utf-8"?>\n<e>\xc3\xa4</e>'
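The `\xc3\xa4` in the output is simply the UTF-8 encoding of U+00E4 (ä). A quick Python 3 check, where `chr` replaces Python 2's `unichr`, also shows that the same character in Latin-1 is the single byte 0xe4, the byte reported in the error quoted earlier; that hints, though it does not prove, that the original data was Latin-1 rather than UTF-8.

```python
# Python 3: chr() replaces Python 2's unichr().
ch = chr(0xE4)  # U+00E4, LATIN SMALL LETTER A WITH DIAERESIS

print(ch.encode("utf-8"))    # b'\xc3\xa4', the two bytes in Out[11]
print(ch.encode("latin-1"))  # b'\xe4', a single byte
```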
    
David                            

-- 
GPG keyID #6272EDAF on http://pgp.mit.edu
Key fingerprint = 8BAA 7E11 8856 E148 6833  655A 92E2 3E00 6272 EDAF

