[issue3300] urllib.quote and unquote - Unicode issues

Bill Janssen Thu, 07 Aug 2008 17:19:10 -0700

Bill Janssen <[EMAIL PROTECTED]> added the comment:

On Thu, Aug 7, 2008 at 4:23 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:


>
> >> However I fear that this middle ground will in practice cause:
> >>
> >> (a) more in-the-field failures, since devs are notorious for testing
> >> with ASCII only; and
> >
> > Returning bytes deals with this problem.
>
> In an unpleasant way. We might as well consider changing all APIs that
> deal with URLs to insist on bytes.
>

That seems a bit over-the-top.  Most URL operations *are* about strings, and
most of the APIs should deal with strings; we're talking about the return
value of one operation specifically designed to extract binary data from the
one place where it's allowed to occur.  That is a vastly smaller change than
"changing all APIs that deal with URLs".

By the way, I see that the email package dodges this by encoding the bytes
to strings using the codec "raw-unicode-escape".  In other words, byte
sequences in the outward form of a string.  I'd be OK with that.  That is,
make the default codec for "unquote" be "raw-unicode-escape".  All the bytes
will come through unscathed, and people who are naively expecting ASCII
strings will still receive them, so the code won't break.  This actually
seems to be closest to the current usage, so I'm going to change my patch to
do that.
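
As a rough illustration of the "raw-unicode-escape" idea, here is a minimal
sketch. It assumes the present-day Python 3 function
urllib.parse.unquote_to_bytes rather than the patch under discussion, and the
helper name unquote_raw is hypothetical, chosen only for this example.

```python
from urllib.parse import unquote_to_bytes


def unquote_raw(quoted: str) -> str:
    """Hypothetical helper: percent-decode, then map each byte to a character."""
    data = unquote_to_bytes(quoted)
    # Assumption: "raw-unicode-escape" stands in for the proposed default codec.
    return data.decode("raw-unicode-escape")


# Plain ASCII input comes back as the string a naive caller expects.
ascii_result = unquote_raw("hello%20world")

# Non-ASCII bytes become code points 0-255 and can be turned back into bytes.
binary_result = unquote_raw("caf%C3%A9%FF")
recovered = binary_result.encode("raw-unicode-escape")

# Caveat: a literal backslash-u sequence in the data is read as an escape.
escaped = unquote_raw("%5Cu0041")
```

One thing the sketch suggests is that the round trip holds for ordinary byte
values, but the codec interprets a percent-encoded "\uXXXX" sequence as an
escape, so such input would not come back byte-for-byte.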

Added file: http://bugs.python.org/file11078/unnamed

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
_______________________________________

