Re: Unicode problem in ucs4

M.-A. Lemburg Mon, 23 Mar 2009 03:05:48 -0700

On 2009-03-23 08:18, abhi wrote:
> On Mar 20, 5:47 pm, "M.-A. Lemburg" <[email protected]> wrote:
>>> unicodeTest.c
>>> #include<Python.h>
>>> static PyObject *unicode_helper(PyObject *self,PyObject *args){
>>>    PyObject *sampleObj = NULL;
>>>            Py_UNICODE *sample = NULL;
>>>       if (!PyArg_ParseTuple(args, "O", &sampleObj)){
>>>                 return NULL;
>>>       }
>>>     // Explicitly convert it to unicode and get Py_UNICODE value
>>>       sampleObj = PyUnicode_FromObject(sampleObj);
>>>       sample = PyUnicode_AS_UNICODE(sampleObj);
>>>       wprintf(L"database value after unicode conversion is : %s\n",
>>> sample);
>> You have to use PyUnicode_AsWideChar() to convert a Python
>> Unicode object to a wchar_t representation.
>>
>> Please don't make any assumptions about what Py_UNICODE maps
>> to and always use the Unicode API for this. It is designed
>> to provide a portable interface and will not do more conversion
>> work than necessary.
>
> Hi Mark,
>      Thanks for the help. I tried PyUnicode_AsWideChar() but I am
> getting the same result i.e. only the first letter.
> 
> sample code:
> 
> #include<Python.h>
> 
> static PyObject *unicode_helper(PyObject *self,PyObject *args){
>         PyObject *sampleObj = NULL;
>         wchar_t *sample = NULL;
>         int size = 0;
> 
>       if (!PyArg_ParseTuple(args, "O", &sampleObj)){
>                 return NULL;
>       }
> 
>          // use wide char function
>       size = PyUnicode_AsWideChar(databaseObj, sample,
> PyUnicode_GetSize(databaseObj));


The third argument is the maximum number of wchar_t characters to copy,
not a byte count. The buffer itself needs
sizeof(wchar_t) * PyUnicode_GetSize(databaseObj) bytes without a trailing
NUL, or sizeof(wchar_t) * (PyUnicode_GetSize(databaseObj) + 1) bytes with
one.

You also have to allocate the buffer to store the wchar_t data in.
Passing in a NULL pointer will result in a seg fault. The function
does not allocate a buffer for you:

/* Copies the Unicode Object contents into the wchar_t buffer w.  At
   most size wchar_t characters are copied.

   Note that the resulting wchar_t string may or may not be
   0-terminated.  It is the responsibility of the caller to make sure
   that the wchar_t string is 0-terminated in case this is required by
   the application.

   Returns the number of wchar_t characters copied (excluding a
   possibly trailing 0-termination character) or -1 in case of an
   error. */

PyAPI_FUNC(Py_ssize_t) PyUnicode_AsWideChar(
    PyUnicodeObject *unicode,   /* Unicode object */
    register wchar_t *w,        /* wchar_t buffer */
    Py_ssize_t size             /* size of buffer */
    );
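
A minimal sketch of the caller's side of that contract, written in plain C so it compiles without the Python headers. copy_wide is a hypothetical stand-in that follows the rules in the comment above (copies at most size wchar_t characters, adds no terminator, returns the count); it is not part of the Python API. to_wide_buffer is likewise an illustrative name and shows the allocation and explicit NUL termination that are left to the caller.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for a PyUnicode_AsWideChar-style copy: copies at
   most `size` wchar_t characters from src (length `len`) into w, adds no
   terminator, and returns the number of characters copied. */
long copy_wide(const wchar_t *src, size_t len, wchar_t *w, size_t size)
{
    size_t n = len < size ? len : size;

    memcpy(w, src, n * sizeof(wchar_t));
    return (long)n;
}

/* Caller side: allocate room for len characters plus a terminator,
   copy, then terminate explicitly. Returns NULL if the allocation
   fails; the caller must free() the result. */
wchar_t *to_wide_buffer(const wchar_t *src, size_t len)
{
    wchar_t *buf = malloc((len + 1) * sizeof(wchar_t));
    long copied;

    if (buf == NULL)
        return NULL;
    copied = copy_wide(src, len, buf, len);
    buf[copied] = L'\0';   /* the copy itself does not terminate */
    return buf;
}
```

With the real API the same pattern would presumably apply: allocate (PyUnicode_GetSize(obj) + 1) wchar_t elements, pass that buffer and the character count to PyUnicode_AsWideChar(), check for a -1 return, and write the terminating 0 yourself.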



>       printf("%d chars are copied to sample\n", size);
>       wprintf(L"database value after unicode conversion is : %s\n",
> sample);
>       return Py_BuildValue("");
> 
> }
> 
> 
> static PyMethodDef funcs[]={{"unicodeTest",(PyCFunction)
> unicode_helper,METH_VARARGS,"test ucs2, ucs4"},{NULL}};
> 
> void initunicodeTest(void){
>         Py_InitModule3("unicodeTest",funcs,"");
> 
> }
> 
> This prints the following when input value is given as "test":
> 4 chars are copied to sample
> database value after unicode conversion is : t
> 
> Any ideas?
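
One more thing worth checking, offered as a likely explanation rather than a confirmed diagnosis: in standard C, %s inside a wide format string expects a char * argument, and %ls is the conversion for a wchar_t * argument. With a 4-byte wchar_t on a little-endian machine, the bytes of L"test" are 't' followed by zero bytes, so reading them as a char string stops after the first letter. The sketch below formats a wide string with %ls; format_wide is a made-up helper name used only for illustration.

```c
#include <stddef.h>
#include <wchar.h>

/* Formats a wchar_t string into out (capacity n wide characters) using
   %ls, the standard conversion for wchar_t* arguments in the wide
   printf family. Returns the number of wide characters written, or a
   negative value on error. */
int format_wide(wchar_t *out, size_t n, const wchar_t *value)
{
    return swprintf(out, n, L"value is : %ls", value);
}
```

The same change applies to the wprintf() call in the sample code: replacing %s with %ls makes the format match the wchar_t * argument.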
> 
> -
> Abhigyan

-- 
Marc-Andre Lemburg
eGenix.com
