Hi, I'm wondering how to keep supporting string interning, given that identifiers are Unicode strings in Py3. We currently only intern byte strings that look like Python identifiers. In Py3, those byte strings no longer count as identifiers, because they are not Unicode strings.
I can see four ways to deal with this:

1) drop string interning completely

2) disable string interning in Py3 and use normally created byte strings instead

3) keep separate sets of identifier-like byte strings and unicode strings in the compiler and write them into the C file. Then, depending on the Python version, either intern the byte strings or the unicode strings, and create the other set as un-interned strings.

4) record whether a string should be interned for all strings we deal with (bytes and unicode), remove the intern tab and merge it with the general string tab by adding an additional field "intern". Then __Pyx_InitStrings() would create the strings differently depending on the compile time Python version, i.e., it would intern Unicode identifiers in Py3 and byte string identifiers in Py2, and create everything else as normal strings.

Personally, I favour 4) - although I could live with 1) - but since I'm not quite sure what the original intention of string interning was (saving memory?), I'd like to hear other opinions first.

Stefan
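To make option 4) a bit more concrete, here is a rough Python model of the idea, not the C code that Cython actually generates. The names StringTabEntry and init_strings, the field layout and the UTF-8 encoding are assumptions made up for this sketch. It only models the Py3 branch. A Py2 build would flip the condition and intern the byte strings instead.

```python
import sys
from typing import List, NamedTuple, Union


class StringTabEntry(NamedTuple):
    """One row of the merged string tab (hypothetical layout)."""
    data: bytes        # raw bytes as written into the C file (UTF-8 assumed)
    is_unicode: bool   # True if the entry should become a unicode string
    intern: bool       # True if the string looks like a Python identifier


def init_strings(table: List[StringTabEntry]) -> List[Union[str, bytes]]:
    """Model of what __Pyx_InitStrings() might do in Py3 under option 4)."""
    result = []
    for entry in table:
        if entry.is_unicode:
            obj = entry.data.decode("utf-8")
            if entry.intern:
                # Py3: identifiers are unicode, so these get interned.
                obj = sys.intern(obj)
        else:
            # Py3: byte strings are created normally, even if the
            # "intern" flag is set (a Py2 build would intern them here).
            obj = entry.data
        result.append(obj)
    return result


table = [
    StringTabEntry(b"spam", is_unicode=True, intern=True),
    StringTabEntry(b"spam", is_unicode=False, intern=True),
    StringTabEntry(b"hello world", is_unicode=True, intern=False),
]
strings = init_strings(table)
```

The point of the single "intern" flag is that the compiler only has to decide once whether a string is identifier-like, and the version-specific decision moves into the init function.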
