Re: [Cython] [cython-users] String comparison

Stefan Behnel Mon, 11 Oct 2010 00:44:42 -0700

Robert Bradshaw, 11.10.2010 07:27:
> On Sat, Oct 9, 2010 at 9:03 AM, Stefan Behnel wrote:
>> My impression is that the dominating use case for comparison of char* values
>> is to compare the strings rather than the pointers. The only truly common
>> use case for pointer comparison (as Dag pointed out) is for loop variables,
>> e.g. when traversing a char* string value byte by byte. Except for this
>> case, I wouldn't be surprised if most new users (especially Python
>> programmers) intuitively expected "==" to compare the string values and "is"
>> to compare the pointers.
>
> On an orthogonal note, I'm surprised you of all people are talking
> about treating char* as strings


I admit that my terminology is a bit sloppy here. In the above, "string" 
refers to what C calls strings, i.e. null-terminated byte sequences.


> (and assuming the encoding is a
> null-terminated one like ASCII or UTF-8).

I'm not making any assumptions about encodings. The content may be encoded 
text, base64 encoded binary data or plain binary data. However, I am aware 
that if it contains null bytes, the Python comparison operators will not 
work "correctly", e.g.

     "" == "\0"

will return True with Python operator semantics when bytes literals are 
treated as plain char*.

That's always the case with char* values, though, and Cython's semantics 
ignore that in other places, too. I think it's fine to leave this case to 
the user. If there are (or may be) null bytes involved, you must pass an 
explicit length. Simple rule.
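Here is a small plain-Python sketch of the null-byte pitfall. It uses ctypes buffers to stand in for char* values. The helper names `as_c_string` and `c_string_equal` are made up for this illustration and only mimic what strcmp() would see. They are not Cython APIs.

```python
import ctypes


def as_c_string(data: bytes) -> bytes:
    """Hypothetical helper: return what C string functions such as
    strcmp() or strlen() see, i.e. the bytes before the first null byte."""
    buf = ctypes.create_string_buffer(data)  # copies data and appends a NUL terminator
    return buf.value                         # .value stops at the first NUL


def c_string_equal(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: equality with char* semantics, like strcmp(a, b) == 0."""
    return as_c_string(a) == as_c_string(b)


# With char* semantics, "" and "\0" are indistinguishable ...
print(c_string_equal(b"", b"\0"))  # True
# ... while Python bytes objects carry an explicit length and differ.
print(b"" == b"\0")                # False
```

Passing an explicit length, as suggested above, amounts to comparing like the second case rather than the first.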


> For all pointers, it might be useful (and fairly easy) to be able to
> do a[:5] == b[:5]. As we do with loops, char* could have an implicit
> end, so a[:] would be a[:strlen(a)] if a is a char*.

I agree that this would be a nice feature. (BTW, these things are currently 
easiest to implement in Optimise.py, but I think they deserve a better 
place as they actually implement language features, not optimisations).

I should note that Python semantics dictate that a[:5] creates a copy of 
the array. However, it's perfectly reasonable to "optimise" the copy away 
inside of a comparison so that the above examples do the right thing 
efficiently.
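The following is a rough Python analogue of comparing slices without copying them. It uses memoryview slicing as a stand-in for the "optimised away" copy. Both helpers, `prefix_equal` and `implicit_end`, are hypothetical. They only model the proposed semantics and say nothing about the C code Cython would actually generate.

```python
def prefix_equal(a: bytes, b: bytes, n: int) -> bool:
    """Hypothetical helper: a[:n] == b[:n] without materialising either slice.
    Slicing a memoryview only creates a view onto the original buffer, so the
    comparison behaves like a length-checked memcmp() over at most n bytes."""
    return memoryview(a)[:n] == memoryview(b)[:n]


def implicit_end(data: bytes) -> int:
    """Hypothetical helper: where a[:] would stop if `data` were a char*,
    i.e. strlen(data), the index of the first null byte."""
    end = data.find(b"\0")
    return len(data) if end == -1 else end


print(prefix_equal(b"abcdefg", b"abcdeXY", 5))  # True
print(implicit_end(b"abc\0def"))                # 3
```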

On top of that, we could then change the semantics of bytes *literals* only, 
so that the comparison operators compare their content. For example,

     cdef char* s = ...
     print s == b"abcdefg"

will strcmp() s to the byte sequence b"abcdefg" (maybe even using 
strncmp()), whereas

     cdef char* s = ...
     cdef char* cmpval = b"abcdefg"
     print s == cmpval

compares the pointers as before, as would

     cdef char* s = ...
     print s == <char*>b"abcdefg"

I don't see a use case for comparing a char* to a bytes literal by 
pointer. Anyone who does that deserves to have their code broken by a 
future Cython version.
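The difference between the two behaviours can be sketched in Python. Two ctypes buffers stand in for the char* variables `s` and `cmpval` above. `pointer_equal` and `content_equal` are hypothetical names for this illustration, not Cython APIs.

```python
import ctypes

# Two separate NUL-terminated buffers holding the same bytes, standing in
# for `cdef char* s = ...` and `cdef char* cmpval = b"abcdefg"`.
s = ctypes.create_string_buffer(b"abcdefg")
cmpval = ctypes.create_string_buffer(b"abcdefg")


def pointer_equal(a, b) -> bool:
    """Hypothetical helper: what `s == cmpval` does for two char* values,
    i.e. compare the addresses."""
    return ctypes.addressof(a) == ctypes.addressof(b)


def content_equal(a, b) -> bool:
    """Hypothetical helper: what the proposal would do for `s == b"abcdefg"`,
    i.e. compare the NUL-terminated contents, like strcmp(a, b) == 0."""
    return a.value == b.value


print(pointer_equal(s, cmpval))  # False: distinct buffers, distinct addresses
print(content_equal(s, cmpval))  # True: same bytes up to the terminator
```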

In general, I think that bytes literals should behave more like Python 
bytes objects than char*, if only to preserve their correct length in the 
face of null bytes. They can easily (and safely) coerce to a char* at any 
time, so we don't really lose anything by turning them into Python 
objects by default and (as Craig hinted) inferring char* where it is safe.
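A short ctypes illustration shows what gets lost when a literal with an embedded null byte is coerced. It uses `ctypes.c_char_p` as a stand-in for a char* pointing at the literal's buffer.

```python
import ctypes

literal = b"ab\0cd"

# As a Python bytes object, the literal keeps its full length,
# embedded null byte included.
python_length = len(literal)         # 5

# Coerced to a char* (here: ctypes.c_char_p), only the bytes before the
# first NUL are visible to C string functions, which is what strlen() reports.
char_p = ctypes.c_char_p(literal)
c_string_length = len(char_p.value)  # 2

print(python_length, c_string_length)  # 5 2
```

Going from the bytes object to a char* is always possible. Going back, however, requires the length to be known from somewhere else.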

(The above might sound like a 180° turn on my part, but it's good to change 
your mind when the facts change.)

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
