Re: Should ugettext_lazy return instanceof unicode? Or are reusable apps responsible for calling force_text a lot?

2016-10-24 Thread Mike Edmunds
Thanks for the helpful responses.

A more-succinct statement of the underlying issue:

   - ugettext_lazy proxies len() and other methods of unicode, but not 
   __class__. So isinstance(ugettext_lazy(), unicode) is False.
   - urlencode in the Python standard library liberally mixes duck typing 
   and isinstance type-testing. As a result, it misinterprets 
   ugettext_lazy() objects as sequences, rather than text strings. (And 
   right or wrong, there are many cases of isinstance text-type detection 
   in Python library code -- and in other popular packages.)
   - My reusable Django app is caught in the middle. (I don't control the 
   calling code that's using ugettext_lazy, and I certainly don't control the 
   Python standard library code.)
   - The calling code can be forgiven for assuming that this should work, 
   because the ugettext_lazy docs state it can be used "wherever you would 
   use a unicode string ... in Python code."
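
To make the first point concrete, here is a minimal Python 3 sketch. LazyText is a hypothetical stand-in written for illustration, not Django's actual lazy proxy; it only assumes the behavior described above (string methods are forwarded, the class is not).

```python
class LazyText:
    """Hypothetical stand-in for a lazy translation proxy (not Django's class)."""

    def __init__(self, func):
        self._func = func  # called every time the text is actually needed

    def __str__(self):
        return self._func()

    def __len__(self):
        return len(str(self))

    def __iter__(self):
        return iter(str(self))


lazy = LazyText(lambda: "ASCII")

print(len(lazy))              # duck typing works: 5
print(str(lazy))              # ASCII
print(isinstance(lazy, str))  # False: a type test does not see text
```

Any library that asks "is this a str?" before asking "does this have a length?" will take the second branch for such an object.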

Per Moritz's comments, it sounds like really the only practical resolution 
is adding a note to the docs. I'll open a ticket/PR to update the 
ugettext_lazy docs, clarifying that the result is not necessarily usable 
"wherever" you can use unicode.


Cheers,
Mike


P.S., FWIW, making isinstance(ugettext_lazy(), unicode) return True does, 
in fact, seem to solve this particular problem. Here's what it might look 
like, with tests. But as 
Raphael Michel points out, there's likely a lot of code out there that's 
depending on the opposite behavior. I found at least one example in Django 
itself (in that linked patch). The change also breaks pickling lazy objects 
in Python 2. And even if there's some way to fix that, given the age of 
ugettext_lazy, any changes to it are likely to cause all kinds of 
downstream problems.
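
For readers wondering how such a change could interact with existing detection code, here is a hedged Python 3 sketch. It is not the linked patch; it assumes one possible technique (exposing __class__ as a property), and LazyText, StrLikeLazyText, and is_lazy are hypothetical names invented for the illustration.

```python
class LazyText:
    """Hypothetical lazy proxy that is not an instance of str."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value


class StrLikeLazyText(LazyText):
    # Assumption for illustration: reporting str as __class__ is one way
    # to make isinstance(obj, str) succeed without subclassing str.
    __class__ = property(lambda self: str)


def is_lazy(value):
    """Detection idiom of the kind Raphael describes."""
    return not isinstance(value, str)


print(is_lazy(LazyText("hi")))         # True
print(is_lazy(StrLikeLazyText("hi")))  # False: the detection silently stops working
```

The second result is the downstream risk: nothing raises, the check just starts giving the opposite answer.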



On Sunday, October 23, 2016 at 9:38:58 AM UTC-7, Raphael Michel wrote:
>
> Hello, 
>
> On Fri, 21 Oct 2016 13:49:16 -0700 (PDT), 
> Mike Edmunds wrote: 
> >1. Should the result of ugettext_lazy somehow inherit from 
> > unicode? 
>
> I believe this would break a huge amount of code out there that uses 
> "not isinstance(lazystr, unicode)" exactly to detect that it is a lazy 
> string and not a regular one. 
>
> Cheers 
> rami 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/17a41e96-b169-4418-8d57-a5ac8dd37e64%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Should ugettext_lazy return instanceof unicode? Or are reusable apps responsible for calling force_text a lot?

2016-10-23 Thread Raphael Michel
Hello,

On Fri, 21 Oct 2016 13:49:16 -0700 (PDT),
Mike Edmunds wrote:
>1. Should the result of ugettext_lazy somehow inherit from
> unicode?

I believe this would break a huge amount of code out there that uses
"not isinstance(lazystr, unicode)" exactly to detect that it is a lazy
string and not a regular one.

Cheers
rami



Re: Should ugettext_lazy return instanceof unicode? Or are reusable apps responsible for calling force_text a lot?

2016-10-23 Thread gilberto dos santos alves
Hi. IMHO, put a single line like this in your app's code:

# -*- coding: utf-8 -*-

Put it explicitly in the source code of the calling app.

Or you could use the codecs module (for Python 2.7):
[1] https://docs.python.org/2/library/codecs.html#module-codecs

P.S. I used this with python-sphinx and it solved a lot of issues with pt-BR
strings.

--
gilberto dos santos alves
+55(11)9-8646-5049
sao paulo - sp - brasil






Re: Should ugettext_lazy return instanceof unicode? Or are reusable apps responsible for calling force_text a lot?

2016-10-22 Thread 'Moritz Sichert' via Django developers (Contributions to Django itself)
>  1. Should the result of ugettext_lazy somehow inherit from unicode? If indeed
> the "result of a ugettext_lazy() call can be used wherever you would use a
> unicode string"

I don't think making isinstance(lazy_str, unicode) return True would really fix
things, as it will probably break somewhere deeper then. In essence I think this
boils down to the other libs not duck-typing "correctly". However it is
definitely worth mentioning those limitations in the docs.

>  2. Or should my reusable app be calling force_text on /everything/ it might
> receive from its callers before passing on to other packages? Essentially
> saying, lazy strings are really only valid while inside the Django world,
> and a (currently-undocumented) responsibility of reusable apps is to convert
> all lazy strings before handing them off to other (non-Django) python code.

You don't need force_text() for that, calling str(my_lazy_str) is enough (or
six.text_type(my_lazy_str) if you want to support Python 2.7).
I think this would be actually the best duck-typing approach.
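
A rough Python 3 sketch of that approach follows. LazyText is a hypothetical stand-in for a lazy string, and to_plain_text and clean_params are illustrative helper names, not part of Django, requests, or the app under discussion.

```python
from urllib.parse import urlencode


class LazyText:
    """Hypothetical stand-in for a lazy string (not Django's class)."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value

    def __len__(self):
        return len(self._value)

    def __iter__(self):
        return iter(self._value)


def to_plain_text(value):
    """Return str and bytes unchanged; convert anything else with str()."""
    if isinstance(value, (str, bytes)):
        return value
    return str(value)


def clean_params(params):
    """Convert every value before handing the dict to non-Django code."""
    return {key: to_plain_text(value) for key, value in params.items()}


params = {"p1": "alpha\u0391", "p4": LazyText("ASCII")}
print(urlencode(clean_params(params), doseq=True))  # p1=alpha%CE%91&p4=ASCII
```

Note that this simple version also stringifies genuine lists, so an app that accepts multi-valued params would need to handle those separately.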

>  3. Or should I just be telling my app's users to call force_text themselves
> if they're using ugettext_lazy? (Not thrilled with this idea, as missing it
> can lead to very subtle errors. See the 'p4' example below. And this might
> warrant a clarification to "... can be used wherever you would use a unicode
> string..." in the docs.)

I don't think this error is subtle, I mean the function name tells you exactly
that it is lazy. I would say it is the responsibility of the programmer using
ugettext_lazy() to transform it to a string when using libraries that know
nothing about Django. Still, I'd say a reusable Django app should probably be
able to deal with lazy strings.

The best "fix" in my opinion is to add a note in the docs.


Moritz



Should ugettext_lazy return instanceof unicode? Or are reusable apps responsible for calling force_text a lot?

2016-10-21 Thread Mike Edmunds
A user has reported an issue with a Django reusable app I maintain, where 
they're passing my app a ugettext_lazy object that I ultimately pass to the 
requests package. Because isinstance(lazystr, unicode) is False, requests 
and urllib.urlencode mis-handle the text, leading to a UnicodeEncodeError 
or an incorrectly-encoded query string.

I'm wondering what the right fix for this is:

   1. Should the result of ugettext_lazy somehow inherit from unicode? If 
   indeed the "result of a ugettext_lazy() call can be used wherever you would 
   use a unicode string" (docs), then it would seem to be a problem that 
   isinstance(lazystr, unicode) isn't true. Both requests and 
   urllib.urlencode (also py3 urllib.parse.urlencode) use isinstance tests 
   to detect text strings. (And unfortunately, that's probably a lot more 
   common than duck-typing in this case.)
   2. Or should my reusable app be calling force_text on *everything* it 
   might receive from its callers before passing on to other packages? 
   Essentially saying, lazy strings are really only valid while inside the 
   Django world, and a (currently-undocumented) responsibility of reusable 
   apps is to convert all lazy strings before handing them off to other 
   (non-Django) python code.
   3. Or should I just be telling my app's users to call force_text 
   themselves if they're using ugettext_lazy? (Not thrilled with this idea, as 
   missing it can lead to very subtle errors. See the 'p4' example below. And 
   this might warrant a clarification to "... can be used wherever you would 
   use a unicode string..." in the docs.)
   4. Or...?

Here's an (extremely pared-down) example demonstrating the specific problem:

# My reusable app passes several string params from the caller to requests:
import requests

def my_reusable_app(params):
    return requests.post('http://example.com', params=params)

# Code in the calling app:
from django.utils.translation import ugettext, ugettext_lazy

response = my_reusable_app({
    'p1': u"alpha\u0391",  # works correctly
    'p2': ugettext(u"beta\u0392"),  # works correctly
    # requests: UnicodeEncodeError "in position 0" (!)
    'p3': ugettext_lazy(u"gamma\u0393"),
    # urlencode: generates "p4=A=S=C=I=I" rather than "p4=ASCII"
    'p4': ugettext_lazy(u"ASCII"),
})

print(response.request.url)

The UnicodeEncodeError in p3 results from requests.models._encode_params 
not realizing the ugettext_lazy object is unicode, and failing to encode it 
to utf-8 before handing off to urlencode.


If you comment p3 out, the exception goes away, but urlencode fails to 
realize the p4 ugettext_lazy object is text, and incorrectly encodes it as a 
sequence of individual character params.
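
The p4 behavior can be approximated without Django or requests. The sketch below uses Python 3's urllib.parse.urlencode and a hypothetical LazyText stand-in (not Django's class); the exact query string differs from the Python 2 output quoted above, but the cause is the same: with doseq=True, a value that is not a str but has a length is treated as a sequence.

```python
from urllib.parse import urlencode


class LazyText:
    """Hypothetical stand-in for a lazy string: str-like, but not a str."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value

    def __len__(self):
        return len(self._value)

    def __iter__(self):
        return iter(self._value)


params = {"p4": LazyText("ASCII")}

plain = urlencode(params)              # doseq=False: str(value) is used
split = urlencode(params, doseq=True)  # doseq=True: len() works, so it iterates

print(plain)  # p4=ASCII
print(split)  # p4=A&p4=S&p4=C&p4=I&p4=I
```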


[Above is all Python 2.7, but also applies to python3; substitute "str" 
wherever I wrote "unicode". Django 1.8--1.10, and probably others.]


Thanks for any advice. Happy to take a shot at proposing doc changes, if that's 
the right answer.


Mike

