Re: [Cython] Fixing #602 - type inference for byte string literals

Lisandro Dalcin Mon, 03 Jan 2011 04:02:05 -0800

On 3 January 2011 04:41, Stefan Behnel <stefan...@behnel.de> wrote:
> Hi,
>
> I've been working on a fix for ticket #602, negative indexing for inferred
> char*.
>
> http://trac.cython.org/cython_trac/ticket/602
>
> Currently, when you write this:
>
>     s = b'abc'
>
> s is inferred as char*. This has several drawbacks. For one, we lose the
> length information, so "len(s)" becomes O(n) instead of O(1). Negative
> indexing fails completely because it turns into pointer arithmetic and
> reads outside the allocated memory area of the string. Also, code like the
> following is extremely inefficient because it requires multiple conversions
> from a char* of unknown length to a Python bytes object:
>
>     s = b'abc'
>     a = s1 + s
>     b = s2 + s
>
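
As an illustration of the length and indexing points above, here is a minimal
plain-Python sketch, not Cython's generated code. The helper c_strlen and the
bytearray stand-in for a NUL-terminated char* buffer are assumptions made for
this example only.

```python
def c_strlen(buf):
    """Count bytes up to the first NUL, the way C's strlen() does: O(n)."""
    n = 0
    while buf[n] != 0:
        n += 1
    return n


s = b'abc'

# Hypothetical stand-in for what a char* sees: raw bytes plus a terminator,
# with no stored length.
raw = bytearray(s + b'\x00')

# A bytes object stores its length, so len() does not scan the data.
length_from_object = len(s)

# With only a pointer, the length has to be rediscovered by scanning.
length_from_scan = c_strlen(raw)

# Negative indexing is well defined on a bytes object because the length is
# known. A slice is used here so the result is the same on Py2 and Py3.
last = s[-1:]
```

Both lengths agree, but only the bytes object gets there without walking the
buffer, and only the bytes object can resolve a negative index safely.
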
> I came to the conclusion that the right fix is to stop letting byte string
> literals start off as char*. This immediately fixes these issues and
> improves Python compatibility while still allowing automatic coercion, but
> it also comes with its own drawbacks.
>
> In nogil blocks, you will have to explicitly declare a variable as char*
> when assigning a byte string literal to it, otherwise you'd get a compile
> time error for a Python object assignment. I think this is a minor issue as
> most users would declare their variables anyway when using nogil blocks.
> Given that there isn't much you can do with a Python string inside of a
> nogil block, we could also honour nogil blocks during type inference and
> automatically infer char* for literals here. I don't think it would hurt
> anyone to do that.
>
> The second drawback is that it impacts type inference for char loops.
> Previously, you could write
>
>     s = b'abc'
>     for c in s:
>         print c
>
> and Cython would infer 'char' for c and print integer byte values. When s
> is inferred as 'bytes', c will be inferred as 'Python object' because
> Python 2 returns 1-byte strings and Python 3 returns integers on iteration.
> Thus the loop will run entirely in Python code and return different things
> in Py2 and Py3.
>
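
The Py2/Py3 difference mentioned above can be seen in plain Python. This is
only a sketch of the Python-level semantics, not of what Cython infers; the
variable names are made up for the example.

```python
import sys

s = b'abc'

# Iterating a bytes object: 1-byte strings on Python 2, integers on Python 3.
items = [c for c in s]

if sys.version_info[0] >= 3:
    expected_items = [97, 98, 99]
else:
    expected_items = ['a', 'b', 'c']

# Going through bytearray yields integer byte values on both versions, which
# matches what a loop over an inferred 'char' variable would have printed.
byte_values = list(bytearray(s))
```
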
> I do not expect that this is a major issue either. Iteration over literals
> should be rare, after all, and if the byte string is constructed in any
> way, the type either becomes a bytes object through Python operations (like
> concatenation) or is explicitly provided, e.g. as a return type of a
> function call. But it is a clear behavioural change for the type inference
> in an area where Cython's (and Python's) semantics are tricky anyway.
>
> Personally, I think that the advantages outweigh the disadvantages here.
> Most common use cases won't notice the change because coercion will not be
> impacted, and most existing code (IMHO) either uses explicit typing or
> expects a Python bytes object anyway. So my preferred change would be to
> make byte string literals 'bytes' by default, except in nogil blocks.
>
> Opinions?
>


+1


-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169
_______________________________________________
Cython-dev mailing list
Cython-dev@codespeak.net
http://codespeak.net/mailman/listinfo/cython-dev
