Nikolaus Rath, 09.10.2010 03:15:
> It seems that Cython translates the line
>
>          if ent.d_name in (b'.', b'..'):
>
> as well as
>
>          if ent.d_name == b'.' or ent.d_name == '..':
>
> into pointer comparison rather than a strcmp():
>
>      if (!(__pyx_v_ent->d_name == __pyx_k_1)) {
>        __pyx_t_3 = (__pyx_v_ent->d_name == __pyx_k_2);
>      } else {
>        __pyx_t_3 = (__pyx_v_ent->d_name == __pyx_k_1);
>      }
>      if (__pyx_t_3) {
>
> (__pyx_k_2 and __pyx_k_1 are "." and ".." respectively).

That's not explicitly intended but a side effect of a couple of 
implementation details and optimisations in Cython. The main problem here 
is that Cython's byte strings start off as char* values, and "==" on 
pointers is the same as "is", i.e. it compares the pointers.
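
To make the difference concrete, here is a minimal C sketch of the two kinds of comparison. The helper names same_pointer() and same_content() are made up for illustration and are not part of Cython or of the code it generates.

```c
#include <string.h>

/* Pointer identity: what "==" on two char* values means in C.
   (Hypothetical helper, for illustration only.) */
static int same_pointer(const char *a, const char *b) {
    return a == b;
}

/* Content equality: what "==" on byte strings means at the Python level.
   Both arguments must be non-NULL here, as strcmp() requires. */
static int same_content(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}
```

With a buffer such as `char d_name[] = "..";` and a separate string constant "..", same_pointer() reports a mismatch because the two live at different addresses, while same_content() reports a match. That is the situation in the readdir() loop quoted above.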

I think it's ok to consider this specific case a bug. If a user wants 
pointer comparison, "is" is the most explicit operator w.r.t. Python 
semantics. Using strcmp() for the "==" operator on char* values makes sense 
to me. We special case char* all over the place, so this is just another 
exception.

So, for char* values, "s1 == s2" should translate into

    (s1 == s2) || (
       (s1 != NULL) && (s2 != NULL) && (strcmp(s1, s2) == 0))

Note that this allows both pointers to be NULL in the success case. I think 
it makes sense to include this.
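
Written out as a standalone C function, the proposed expression could look like the sketch below. The name charptr_equal() is hypothetical and only serves the example. This is not what Cython currently emits.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the proposed "==" semantics for char* values.
   charptr_equal is a hypothetical name, not a Cython utility function. */
static int charptr_equal(const char *s1, const char *s2) {
    /* Identical pointers (including NULL == NULL) are equal right away. */
    return (s1 == s2) ||
           /* Otherwise both must be non-NULL before strcmp() may be called. */
           ((s1 != NULL) && (s2 != NULL) && (strcmp(s1, s2) == 0));
}
```

The pointer check comes first, so identical pointers and the NULL/NULL case never reach strcmp(). A single NULL argument compares unequal instead of crashing.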

This change certainly introduces new corner cases, though, as "ptr == ptr" 
means "ptr is ptr" for everything else. So this will surprise at least some 
users and also break existing code. But I think it's at least mostly 
obvious to the average Python developer what the right fix is.

One step further: Would it be worth making "is" the correct operator for 
pointers in the long run? I.e. discourage the use of "==" for pointers and 
eventually make it an error? Personally, I often feel an itch in the back 
of my head when I write "ptr == ptr" and I know that what I actually mean 
is "ptr is ptr". I still write it most of the time because I find "==" 
quicker to read than a chain of words around "is", which has to be read as 
text rather than spotted as an operator.

Opinions?

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
