Robert Bradshaw, 07.09.2010 20:16:
> On Tue, Sep 7, 2010 at 3:31 AM, Stefan Behnel wrote:
>> Robert Bradshaw, 07.09.2010 10:20:
>>>> Could you comment on this please?
>>>>
>>>> http://permalink.gmane.org/gmane.comp.python.cython.devel/10243
>>>>
>>>> I think I made it pretty clear there what I think the two suitable
>>>> alternatives are.
>>>
>>> Yes, you favor either (1) re-interpretation of the literal depending
>>> on the type context they're used in or (2) disallowing interpretation
>>> of string literals when unicode literals are enabled.
>>>
>>> I think (1) is a bad path to take and would prefer not to burden users
>>> with (2).
>>
>> So, what about doing the following then:
>>
>> 1) we keep the current implementation as is, i.e. unprefixed string
>> literals can coerce to char* literals during type analysis that match the
>> byte sequence in the source file and properly handle byte escapes
>
> I'd be more OK with that, except that I'd rather have consistent
> handling of the \u escape. The -2 behavior stays the same and the -3
> behavior is as below, so from __future__ import unicode_literals is
> more of an intermediate step and not quite as important in the long
> run.

I think so, too. In the long run, users should be able to appreciate -3 
more than the partial imports. There's still some way to go to get it 
rolling smoothly (see Lisandro's "str" problem), but that'll come over time.


>> 2) with the -3 option, we disallow byte values > 127 in byte string
>> literals and do not generate a byte string representation for unprefixed
>> string literals that contain them, thus effectively preventing their
>> coercion to char*
>>
>> That's basically the ASCII-only proposal with added escapes, and my
>> proposal minus non-ASCII literal characters. Should make life easy for
>> basically everyone, with the added benefit of increasing the compatibility
>> with Python 3.
>
> +1

Here's an attempt:

http://hg.cython.org/cython-devel/rev/8f4cda480124

Hudson complains about one of the tests in Py<=2.5, but I should be able to 
fix that.
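
A rough sketch of rule 2 in plain Python, not the code in the changeset above. The helper name char_ptr_bytes and the use of None to mean "no byte string representation" are assumptions made for illustration; the argument is taken to be the text of an unprefixed literal after escape processing.

```python
def char_ptr_bytes(literal_value):
    """Return the bytes an unprefixed literal may coerce to, or None.

    Hypothetical helper: any character above 127 means that no byte
    string representation is generated, so coercion to char* is
    effectively prevented.
    """
    if any(ord(ch) > 127 for ch in literal_value):
        return None
    return literal_value.encode("ascii")


coercible = char_ptr_bytes("abc\n")        # ASCII text plus an escape
not_coercible = char_ptr_bytes("caf\xe9")  # contains a value > 127
```

Returning None rather than raising keeps the literal usable as a unicode string; only the char* coercion would have to be rejected by the caller.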


>> We may additionally consider warning about '\u...' in unprefixed char*
>> strings. I think this particular case will be rare enough to encourage a
>> 'b' prefix or a '\\' escape.
>
> If we do this, we should have a warning for sure.

It's generally valid Python to put a plain Unicode escape sequence into a
byte string, but a warning will make it clear that doing so is a code
smell, because it makes the literal look like something that it is not.
I think that in the context of char* literals, we are free to decide
either way (as long as the char* context doesn't occur due to an
internal optimisation of Cython...)
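
One way such a warning could be detected, as a minimal sketch only. The function name warn_on_unicode_escape, the regular expression and the choice of SyntaxWarning are assumptions for this example and not Cython's implementation; the argument is the raw source text of the literal, before escape processing.

```python
import re
import warnings

# A backslash directly followed by "u" or "U", where the backslash is not
# itself escaped (i.e. it is preceded by an even number of backslashes).
_UNICODE_ESCAPE = re.compile(r"(?<!\\)(?:\\\\)*\\[uU]")


def warn_on_unicode_escape(literal_source):
    """Warn if a literal's source text holds an unescaped Unicode escape.

    Hypothetical check for unprefixed literals used in a char* context.
    Returns True if a warning was issued.
    """
    if _UNICODE_ESCAPE.search(literal_source):
        warnings.warn(
            "Unicode escape in char* literal; use a 'b' prefix or "
            "escape the backslash",
            SyntaxWarning,
            stacklevel=2,
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_on_unicode_escape(r"abc\u00e9")   # unescaped: warns
    silent = warn_on_unicode_escape(r"abc\\u00e9")   # escaped backslash
```

The check only looks at the literal's spelling, so both remedies suggested above (a 'b' prefix, which would skip the check, or a doubled backslash) silence it.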


> I'd love to hear what others think.

Sure. Please give it a try.

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev