Marc-Andre Lemburg <[EMAIL PROTECTED]> added the comment:

On 2008-06-05 21:14, Alexandre Vassalotti wrote:
> Alexandre Vassalotti <[EMAIL PROTECTED]> added the comment:
>
> I now think the proposed changes wouldn't be a bad thing, after all. I
> have been bitten myself by the confusing naming of the Unicode API. So,
> there is definitely a potential for errors.
>
> The main problem with PyUnicode_AsString(), as Marc-André pointed out,
> is that it doesn't follow the API signature of the rest of the Unicode API:
>
>   char *PyUnicode_AsString(PyObject *unicode);
>   PyObject *PyUnicode_AsUTF8String(PyObject *unicode);
>   PyObject *PyUnicode_AsASCIIString(PyObject *unicode);
>
> On the other hand, I do like the simple API of PyUnicode_AsString. Also,
> I have to admit that the apparent similarity between the PyString and
> the PyUnicode API helped me to port my code to Py3K when I first started
> working on Python core. So, pragmatism might beat purity here.
There are a few cases in the interpreter where it is indeed useful
to have direct access to the default-encoded (= UTF-8 in Py3k)
char* buffer.

However, the name of the API is poorly chosen, since the other
PyUnicode_AsXYZ() APIs either return a PyObject* or copy the data
to an output variable.

How about PyUnicode_GetUTF8Buffer() or just PyUnicode_UTF8()?

Note that the function *must* check the UTF-8 buffer for embedded
NUL bytes and raise an exception if it finds one. Otherwise,
callers that treat the result as a C string would silently
truncate the data.

----------
title: Remove PyUnicode_AsString(), rework PyUnicode_AsStringAndSize(), add PyUnicode_AsChar() -> Remove PyUnicode_AsString(), rework PyUnicode_AsStringAndSize(), add PyUnicode_AsChar()

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2799>
_______________________________________
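For reference, the embedded-NUL check asked for in the comment above
could be sketched roughly as follows. This is plain C with no Python
headers; utf8_buffer_as_cstring() is a hypothetical name used only for
illustration, not an existing or proposed CPython function, and a real
implementation would raise a Python exception where this sketch simply
returns NULL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only, not part of the Python C API).
 *
 * buf points to len bytes of UTF-8 data, assumed to be followed by a
 * terminating NUL at buf[len]. If any of the first len bytes is a NUL,
 * handing buf to code that expects a C string would silently cut the
 * data short, so the helper refuses and returns NULL instead.
 */
static const char *
utf8_buffer_as_cstring(const char *buf, size_t len)
{
    /* memchr() scans exactly len bytes, so it also sees bytes that
       lie behind an embedded NUL, unlike strlen(). */
    if (memchr(buf, '\0', len) != NULL) {
        return NULL;  /* a real API would set an exception here */
    }
    return buf;
}
```

With this sketch, a 3-byte buffer "abc" is returned unchanged, while a
5-byte buffer holding "ab", a NUL byte, and "cd" is rejected rather
than being seen by the caller as just "ab".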