Robert Bradshaw, 11.10.2010 07:27:
> On Sat, Oct 9, 2010 at 9:03 AM, Stefan Behnel wrote:
>> My impression is that the dominating use case for comparison of char* values
>> is to compare the strings rather than the pointers. The only truly common
>> use case for pointer comparison (as Dag pointed out) is for loop variables,
>> e.g. when traversing a char* string value byte by byte. Except for this
>> case, I wouldn't be surprised if most new users (especially Python
>> programmers) intuitively expected "==" to compare the string values and "is"
>> to compare the pointers.
>
> On an orthogonal note, I'm surprised you of all people are talking
> about treating char* as strings
I admit that my terminology is a bit sloppy here. In the above, "string"
refers to what C calls strings, i.e. null-terminated byte sequences.
> (and assuming the encoding is a
> null-terminated one like ASCII or UTF-8).
I'm not making any assumptions about encodings. The content may be encoded
text, base64-encoded binary data or plain binary data. However, I am aware
that if it contains null bytes, the Python comparison operators will not
work "correctly", e.g.
    "" == "\0"
will return True with Python operator semantics when bytes literals are
treated as plain char*.
That's always the case with char* values, though, and Cython's semantics
ignore that in other places, too. I think it's fine to leave this case to
the user. If there are (or may be) null bytes involved, you must pass an
explicit length. Simple rule.
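
To make the null-byte caveat concrete, here is a minimal C sketch. The helper names equal_as_c_strings() and equal_with_length() are made up for illustration; they are not anything Cython provides or generates. The sketch only relies on the standard behaviour that strcmp() stops at the first null byte, whereas memcmp() with an explicit length does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two buffers as C strings.
 * strcmp() stops at the first null byte in either argument,
 * so anything after that byte is ignored. */
int equal_as_c_strings(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: compare two buffers with explicit lengths,
 * so embedded null bytes take part in the comparison. */
int equal_with_length(const char *a, size_t a_len,
                      const char *b, size_t b_len)
{
    assert(a != NULL && b != NULL);
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

With these, equal_as_c_strings("", "\0") reports equality, while equal_with_length("", 0, "\0", 1) does not, which is the difference the rule above is about.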
> For all pointers, it might be useful (and fairly easy) to be able to
> do a[:5] == b[:5]. As we do with loops, char* could have an implicit
> end, so a[:] would be a[:strlen(a)] if a is a char*.
I agree that this would be a nice feature. (BTW, these things are currently
easiest to implement in Optimise.py, but I think they deserve a better
place as they actually implement language features, not optimisations).
I should note that Python semantics dictate that a[:5] creates a copy of
the array. However, it's perfectly reasonable to "optimise" the copy away
inside of a comparison so that the above examples do the right thing
efficiently.
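
As a rough illustration of what "optimising the copy away" could boil down to in C, here is a sketch with two hypothetical helpers (the names are mine, and this is not what Cython currently generates). It assumes both operands are null-terminated char* values, so that the implicit end of a slice is the terminating null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a[:n] == b[:n] for two char* values.
 * strncmp() compares at most n bytes and stops at a null byte,
 * which matches slicing with an implicit end at strlen().
 * The bytes are compared in place; no temporary copy is made. */
int prefix_equal(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, n) == 0;
}

/* Hypothetical model of a[:] == b[:], i.e. a[:strlen(a)] == b[:strlen(b)]. */
int whole_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Note that a string shorter than n only compares equal to another string of the same length and content, just as the two Python slices would.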
On top of that, we could then change the semantics only of bytes *literals*
to make the comparison operators compare their content. I.e.
    cdef char* s = ...
    print s == b"abcdefg"
will strcmp() s to the byte sequence b"abcdefg" (maybe even using
strncmp()), whereas
    cdef char* s = ...
    cdef char* cmpval = b"abcdefg"
    print s == cmpval
compares the pointers as before, as would
    cdef char* s = ...
    print s == <char*>b"abcdefg"
I don't see a use case for comparing a char* to a bytes literal as
pointers. Anyone who does that deserves to have their code break in a
future Cython version.
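
In plain C terms, the two behaviours differ roughly as in the following sketch. The function names are invented for illustration, and the strcmp() call is only my assumption of what the generated code for a literal comparison might look like.

```c
#include <assert.h>
#include <string.h>

/* Assumed translation of `s == b"abcdefg"` under the proposal:
 * compare the content of s against the literal. */
int equals_literal_content(const char *s)
{
    assert(s != NULL);
    return strcmp(s, "abcdefg") == 0;
}

/* Pointer comparison as it works today: only the addresses are
 * compared, the bytes they point to are never looked at. */
int same_pointer(const char *s, const char *cmpval)
{
    return s == cmpval;
}
```

A buffer that holds the bytes "abcdefg" compares equal by content, but its address differs from that of a literal with the same bytes, so the pointer comparison reports inequality.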
In general, I think that bytes literals should behave more like Python
bytes objects than char*, if only to preserve their correct length in the
face of null bytes. They can easily (and safely) coerce to a char* at any
time, so we don't really lose anything by switching them into Python
objects by default and (as Craig hinted) inferring char* where it is safe.
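
The length argument can be seen in C itself. In this sketch (names are made up for illustration), the literal is kept as an array whose size the compiler knows, which is loosely analogous to a Python bytes object carrying its own length; once the value is handled as a plain char*, only strlen() is left and it stops at the first null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A literal with an embedded null byte.  As an array, its size is
 * known to the compiler: 5 content bytes plus the terminator. */
static const char LITERAL[] = "ab\0cd";

/* Length as reported by an object that knows its own size. */
size_t literal_length(void)
{
    return sizeof(LITERAL) - 1;
}

/* Length once the value is only available as a plain char*. */
size_t char_ptr_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

Here literal_length() sees all five content bytes, while char_ptr_length() on the same data sees only the two bytes before the embedded null.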
(The above might sound like a 180° turn on my part, but it's good to change
your mind when the facts change.)
Stefan
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev