Re: How to print first(national) char from unicode string encoded in utf-8?

2008-09-01 Thread Marco Bizzarri
2008/9/1  [EMAIL PROTECTED]:
 Hi,

 I have a problem with a unicode string in Pylons templates (Mako). I
 want to print the first char of my string, which is encoded in UTF-8
 and quoted with urllib.quote(), for example the string 'Łukasz':

 ${urllib.unquote(c.user.firstName).encode('latin-1')[0:1]}

 and I got this error:

 type 'exceptions.UnicodeDecodeError': 'utf8' codec can't decode byte
 0xc5 in position 0: unexpected end of data

 When I change [0:1] to [0:2] everything is OK. I think it is because
 of unicode and the utf-8 encoding (2 bytes).

 How can I resolve this problem?

 Best regards
 --
 http://mail.python.org/mailman/listinfo/python-list


First: you're talking about utf8 encoding, but your code encodes to
latin1. Even though I do not know Mako templates, there should be no
problem in your snippet of code if the encoding is latin1, at least as
far as I can understand.

Do not assume utf8 is a two byte encoding; utf8 is a variable length
encoding. Indeed,

'a' encoded as utf8 is 'a' (one byte)

'à' encoded as utf8 is '\xc3\xa0' (two bytes).
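
To make the variable-length point concrete, here is a minimal sketch that counts the UTF-8 bytes of a few characters. It assumes Python 3 (the thread itself uses Python 2), and the sample characters other than 'a' and 'à' are chosen only for illustration.

```python
# Python 3 sketch (assumption: the thread itself is about Python 2).
# Count how many bytes each character takes once encoded as UTF-8.
samples = ["a", "à", "Ł", "€"]

lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(repr(ch), "->", n, "byte(s)")
```

Slicing the encoded bytes with [0:1] is therefore only safe for characters that happen to fit in a single byte.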


Can you explain what you're trying to accomplish (rather than how
you're trying to accomplish it)?



Regards
Marco



-- 
Marco Bizzarri
http://notenotturne.blogspot.com/
http://iliveinpisa.blogspot.com/

Re: How to print first(national) char from unicode string encoded in utf-8?

2008-09-01 Thread sniipe
On 1 Sep, 15:10, Marco Bizzarri [EMAIL PROTECTED] wrote:
 [snip]

When I do ${urllib.unquote(c.user.firstName)} without encoding to
latin-1, I get different chars than I expect: not Łukasz but Å ukasz
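
One plausible explanation for this output, offered as an assumption rather than a diagnosis of the Pylons setup, is that the UTF-8 bytes of the name are being read as Latin-1 text. The sketch below (Python 3, not the Python 2 of the thread) reproduces that effect and shows why encoding back to latin-1 appears to help.

```python
# Python 3 sketch (assumption: the thread itself is about Python 2).
original = "Łukasz"

# UTF-8 bytes misread as Latin-1: each byte becomes its own character.
mojibake = original.encode("utf-8").decode("latin-1")

# Encoding back to Latin-1 recovers the raw UTF-8 bytes, which can then
# be decoded properly.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(repr(mojibake))
print(repaired)
```

Under that assumption, 'Ł' shows up as 'Å' followed by an invisible control character, and taking one byte of the latin-1 encoded result yields only half of the original character.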

Re: How to print first(national) char from unicode string encoded in utf-8?

2008-09-01 Thread Marco Bizzarri
On Mon, Sep 1, 2008 at 3:25 PM,  [EMAIL PROTECTED] wrote:


 When I do ${urllib.unquote(c.user.firstName)} without encoding to
 latin-1, I get different chars than I expect: not Łukasz but Å ukasz

That's crazy. string.encode('latin1') gives you a latin1 encoded
string; latin1 is a single byte encoding, therefore taking the first
byte should be no problem.

Have you tried:

urllib.unquote(c.user.firstName)[0].encode('latin1') or

urllib.unquote(c.user.firstName)[0].encode('utf8')

I'm assuming here that urllib.unquote(c.user.firstName) returns an
encodable string (which I'm not at all sure of), but if it does, this
should take the first 'character'.
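
As a rough illustration of taking the first character before encoding, here is a sketch in Python 3, where the functions live in urllib.parse rather than in the Python 2 urllib module used in the thread. The variable names are made up for the example, and the input is assumed to be the percent-quoted UTF-8 form of the name.

```python
# Python 3 sketch (assumption: the thread itself is about Python 2).
from urllib.parse import quote, unquote, unquote_to_bytes

quoted = quote("Łukasz")        # percent-encodes the UTF-8 bytes
name = unquote(quoted)          # decodes back to text (UTF-8 by default)
first_char = name[0]            # indexing text yields a whole character

raw = unquote_to_bytes(quoted)  # the same data as raw UTF-8 bytes
first_byte = raw[0:1]           # only the first half of the two-byte 'Ł'

try:
    first_byte.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    # Same kind of failure as reported in the original question.
    decode_failed = True

print(first_char, first_byte, decode_failed)
```

Indexing the decoded text gives the whole character, while slicing the bytes with [0:1] cuts the character in half and reproduces the decode error.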

Regards
Marco
-- 
Marco Bizzarri
http://notenotturne.blogspot.com/
http://iliveinpisa.blogspot.com/