Hi,

I'm wondering how to keep supporting string interning now that identifiers are
Unicode strings in Py3. We currently intern only byte strings that look like
Python identifiers. In Py3, identifiers are Unicode strings, so these byte
strings no longer count as identifiers there.
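
To illustrate the mismatch, here is a small Py3 sketch. It is plain Python, not
Cython's generated code, and the name "my_attr" is made up for the example. It
shows that the interpreter's interning facility accepts Unicode strings only,
so an identifier-like byte string cannot be interned that way.

```python
import sys

name_bytes = b"my_attr"   # the kind of string we currently intern
name_text = "my_attr"     # what Py3 actually uses for identifiers

# A Unicode string can be checked as an identifier and interned.
is_ident = name_text.isidentifier()
interned = sys.intern(name_text)

# sys.intern() rejects byte strings in Py3 with a TypeError.
try:
    sys.intern(name_bytes)
    bytes_internable = True
except TypeError:
    bytes_internable = False
```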

I can see four ways to deal with this:

1) drop string interning completely

2) disable string interning in Py3 and use ordinary, non-interned byte strings instead

3) keep separate sets of identifier-like byte strings and unicode strings in
the compiler and write them into the C file. Then, depending on the Python
version, either intern the byte strings or the unicode strings, and create the
other set as un-interned strings.

4) track for every string we deal with (bytes and unicode) whether it should be
interned, remove the intern tab and merge it into the general string tab by
adding an additional field "intern". Then __Pyx_InitStrings() would create the
strings differently depending on the compile-time Python version, i.e., it
would intern Unicode identifiers in Py3 and byte string identifiers in Py2,
and create everything else as normal strings.
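
The decision logic of 4) could look roughly like the following. This is only a
Python model of the idea, not the actual C code: StringEntry, init_strings()
and the py_major parameter are hypothetical names, and the real
__Pyx_InitStrings() would make the choice at C compile time rather than at
runtime.

```python
from collections import namedtuple

# Hypothetical model of one entry in the merged string tab.
StringEntry = namedtuple("StringEntry", "value is_unicode intern")

def init_strings(table, py_major):
    """Return (value, action) pairs; action is 'intern' or 'plain'."""
    result = []
    for entry in table:
        # Identifiers are Unicode strings in Py3 and byte strings in Py2,
        # so only the matching string type is a candidate for interning.
        matches_identifier_type = entry.is_unicode == (py_major >= 3)
        if entry.intern and matches_identifier_type:
            result.append((entry.value, "intern"))
        else:
            result.append((entry.value, "plain"))
    return result

table = [
    StringEntry(b"spam", is_unicode=False, intern=True),
    StringEntry(u"spam", is_unicode=True, intern=True),
    StringEntry(b"not an identifier", is_unicode=False, intern=False),
]
```

With this model, the same table yields an interned Unicode "spam" under Py3
and an interned byte string "spam" under Py2, while the other entries are
created as normal strings in both cases.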

Personally, I favour 4) - although I could live with 1) - but since I'm not
quite sure what the original intention of string interning was (saving
memory?), I'd like to hear other opinions first.

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
