Hi,

On 18 March 2015 at 15:49, Amaury Forgeot d'Arc <amaur...@gmail.com> wrote:
> 2015-03-17 18:27 GMT+01:00 Eleytherios Stamatogiannakis <est...@gmail.com>:
>> Right now when PyPy receives a utf8 string (from a C function) it has to
>> do 2 copies:
>>
>> 1. convert the cdata string to a pypy byte string via ffi.string
>> 2. convert ffi.string to a unicode string
>>
>> When pypy sends a utf8 string it also does 2 copies:
>>
>> 1. convert pypy unicode string to utf8-encoded byte string
>> 2. copy the byte string into a cdata string.

The "easy" solution to reduce the number of copies is to have one
custom function that does both steps.  The more involved solution that
you suggest is, imho, breaking the way CFFI is supposed to work; see
below.

>> From what i understand, there is a cffi optimization dealing with windows
>> unicode (via set_unicode) where on windows platforms and when using the
>> native windows unicode strings, cffi avoids doing one of the copies in both
>> of above cases.
>>
>> On linux where the default unicode format for C libraries nowadays is
>> UTF8, there is no such optimization, so we have to do the two copies in all
>> string passing.

I think you're misunderstanding set_unicode() (or else I'm
misunderstanding what you say).  It's just a way to declare some
Windows-specific unicode types, like TCHAR, to be either "char" or
"wchar_t".  It doesn't enable or disable any optimization.

>> PyPy at some point was going towards using utf8 string internally, but i
>> don't know if this is still the plan or not.

PyPy might go there, at some point, but clearly not CPython.  We still
need a way to avoid the double copies there.

>> 1. If PyPy doesn't go towards using utf8 strings internally, maybe we need
>> some special C type that denotes that the string is utf8 and pypy/cffi
>> should do the conversion from-to it automatically. Something like "wchar_t"
>> in windows but denoting a utf8 string. CFFI can define a special type
>> ("__utf8char_t"?) for these strings.

What we really want is simply a variant of ffi.string() that accepts a
"char *" pointer, interprets it as utf-8, and returns a unicode
object; as well as another function that does the opposite.  If you're
interested in supporting the Windows case specially, then you want a
variant that would copy from/to a "TCHAR *" pointer on Windows.  This
is doable without any CFFI special types.
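
To make the shape of that helper pair concrete, here is a minimal sketch
of what the two functions would do.  It is only an illustration, not
CFFI's API: the names utf8_from_char_p() and utf8_to_char_buf() are
hypothetical, and the standard-library ctypes module stands in for CFFI
so that the snippet runs without extra packages.  With CFFI the
equivalent two-step spellings would be ffi.string(p).decode("utf-8")
and ffi.new("char[]", text.encode("utf-8")).  Note that this Python
version still performs both copies in each direction; the saving
discussed above would only come from a single built-in function doing
both steps at once.

```python
import ctypes


def utf8_from_char_p(ptr):
    """Read a NUL-terminated C string and return it decoded as UTF-8.

    Hypothetical helper: shows the semantics only, not a one-copy
    implementation.
    """
    raw = ctypes.string_at(ptr)   # copy 1: C memory -> Python bytes
    return raw.decode("utf-8")    # copy 2: bytes -> unicode string


def utf8_to_char_buf(text):
    """Encode text as UTF-8 and copy it into a NUL-terminated C buffer.

    Hypothetical helper: the opposite direction of utf8_from_char_p().
    """
    encoded = text.encode("utf-8")                # copy 1: unicode -> bytes
    return ctypes.create_string_buffer(encoded)   # copy 2: bytes -> C memory


if __name__ == "__main__":
    buf = utf8_to_char_buf("caf\u00e9")
    print(utf8_from_char_p(buf))
```

The returned buffer owns its memory, so it has to be kept alive for as
long as the C side uses the pointer.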

> This is a first step towards SWIG's typemaps:
> http://www.swig.org/Doc3.0/Typemaps.html#Typemaps_nn4
>
> That's also something I wanted to have in another projects: automatic
> conversion to PYTHON_HANDLE, for example.
>
> But typemaps are a tough thing, and they would likely differ between CPython
> and PyPy.
> Armin, what do you think?

I think that typemaps are not the right solution to this problem :-)


A bientôt,

Armin.
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev