On Mar 23, 3:04 pm, "M.-A. Lemburg" <m...@egenix.com> wrote:
> On 2009-03-23 08:18, abhi wrote:
> > On Mar 20, 5:47 pm, "M.-A. Lemburg" <m...@egenix.com> wrote:
> >>> unicodeTest.c
> >>> #include<Python.h>
> >>> static PyObject *unicode_helper(PyObject *self,PyObject *args){
> >>>     PyObject *sampleObj = NULL;
> >>>     Py_UNICODE *sample = NULL;
> >>>     if (!PyArg_ParseTuple(args, "O", &sampleObj)){
> >>>         return NULL;
> >>>     }
> >>>     // Explicitly convert it to unicode and get Py_UNICODE value
> >>>     sampleObj = PyUnicode_FromObject(sampleObj);
> >>>     sample = PyUnicode_AS_UNICODE(sampleObj);
> >>>     wprintf(L"database value after unicode conversion is : %s\n",
> >>>             sample);
> >> You have to use PyUnicode_AsWideChar() to convert a Python
> >> Unicode object to a wchar_t representation.
> >>
> >> Please don't make any assumptions on what Py_UNICODE maps
> >> to and always use the Unicode API for this. It is designed
> >> to provide a portable interface and will not do more conversion
> >> work than necessary.
> >
> > Hi Mark,
> > Thanks for the help. I tried PyUnicode_AsWideChar() but I am
> > getting the same result i.e. only the first letter.
> >
> > sample code:
> >
> > #include<Python.h>
> >
> > static PyObject *unicode_helper(PyObject *self,PyObject *args){
> >     PyObject *sampleObj = NULL;
> >     wchar_t *sample = NULL;
> >     int size = 0;
> >
> >     if (!PyArg_ParseTuple(args, "O", &sampleObj)){
> >         return NULL;
> >     }
> >
> >     // use wide char function
> >     size = PyUnicode_AsWideChar(databaseObj, sample,
> >                                 PyUnicode_GetSize(databaseObj));
>
> The 3rd argument is the buffer size in bytes, not code points.
> The result will require sizeof(wchar_t) * PyUnicode_GetSize(databaseObj)
> bytes without a trailing NUL, otherwise sizeof(wchar_t) *
> (PyUnicode_GetSize(databaseObj) + 1).
>
> You also have to allocate the buffer to store the wchar_t data in.
> Passing in a NULL pointer will result in a seg fault. The function
> does not allocate a buffer for you:
>
> /* Copies the Unicode Object contents into the wchar_t buffer w. At
>    most size wchar_t characters are copied.
>
>    Note that the resulting wchar_t string may or may not be
>    0-terminated. It is the responsibility of the caller to make sure
>    that the wchar_t string is 0-terminated in case this is required by
>    the application.
>
>    Returns the number of wchar_t characters copied (excluding a
>    possibly trailing 0-termination character) or -1 in case of an
>    error. */
>
> PyAPI_FUNC(Py_ssize_t) PyUnicode_AsWideChar(
>     PyUnicodeObject *unicode,   /* Unicode object */
>     register wchar_t *w,        /* wchar_t buffer */
>     Py_ssize_t size             /* size of buffer */
>     );
>
> >     printf("%d chars are copied to sample\n", size);
> >     wprintf(L"database value after unicode conversion is : %s\n",
> >             sample);
> >     return Py_BuildValue("");
> > }
> >
> > static PyMethodDef funcs[]={{"unicodeTest",(PyCFunction)
> > unicode_helper,METH_VARARGS,"test ucs2, ucs4"},{NULL}};
> >
> > void initunicodeTest(void){
> >     Py_InitModule3("unicodeTest",funcs,"");
> > }
> >
> > This prints the following when input value is given as "test":
> > 4 chars are copied to sample
> > database value after unicode conversion is : t
> >
> > Any ideas?
> >
> > -
> > Abhigyan
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> --
> Marc-Andre Lemburg
> eGenix.com
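The allocation and termination points made in the quoted reply can be sketched in plain C, without Python.h. This is only an illustration: `copy_wide` is a hypothetical stand-in that mimics the contract in the quoted header comment (copy at most `size` wchar_t characters, add no terminator), and `dup_wide_terminated` is likewise a made-up helper name. Going by that header comment, the size passed to the copy is a character count, while malloc takes a byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for a copy routine with the contract quoted above:
   copies at most `size` wchar_t characters and does NOT add a terminator.
   Returns the number of characters copied. */
static size_t copy_wide(const wchar_t *src, size_t src_len,
                        wchar_t *dst, size_t size)
{
    size_t n = src_len < size ? src_len : size;
    assert(src != NULL && dst != NULL);
    wmemcpy(dst, src, n);
    return n;
}

/* Allocates room for len characters plus a terminator, copies, and then
   terminates explicitly. The caller owns the result and must free() it.
   Returns NULL if the allocation fails. */
static wchar_t *dup_wide_terminated(const wchar_t *src, size_t len)
{
    size_t copied;
    wchar_t *buf = malloc((len + 1) * sizeof *buf);  /* bytes for malloc */
    if (buf == NULL)
        return NULL;
    copied = copy_wide(src, len, buf, len);           /* count for the copy */
    buf[copied] = L'\0';                              /* caller terminates */
    return buf;
}
```

A buffer built this way can be printed with `wprintf(L"%ls\n", buf)`. In standard C, `%s` inside wprintf expects a narrow `char *` string, which would be consistent with output that stops after the first letter on a platform where wchar_t is 4 bytes.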
Thanks Marc, John,

With your help, I am at least getting somewhere. I rewrote the code to compare the Py_UNICODE and wchar_t outputs, and they look exactly the same.

#include <Python.h>

static PyObject *unicode_helper(PyObject *self, PyObject *args){
    PyObject *sampleObj = NULL;
    Py_UNICODE *sample = NULL;
    wchar_t *w = NULL;
    int size = 0;
    int i;

    if (!PyArg_ParseTuple(args, "O", &sampleObj)){
        return NULL;
    }

    // Explicitly convert it to unicode and get Py_UNICODE value
    sampleObj = PyUnicode_FromObject(sampleObj);
    if (sampleObj == NULL){
        return NULL;
    }
    sample = PyUnicode_AS_UNICODE(sampleObj);
    printf("size of sampleObj is : %d\n", (int)PyUnicode_GET_SIZE(sampleObj));

    // Room for every character plus a terminating 0
    w = (wchar_t *) malloc((PyUnicode_GET_SIZE(sampleObj)+1)*sizeof(wchar_t));
    if (w == NULL){
        Py_DECREF(sampleObj);
        return PyErr_NoMemory();
    }
    // The size argument counts wchar_t characters, as the quoted header comment says
    size = (int)PyUnicode_AsWideChar((PyUnicodeObject *)sampleObj, w,
                                     PyUnicode_GET_SIZE(sampleObj)+1);
    printf("%d chars are copied to w\n", size);
    printf("size of wchar_t is : %d\n", (int)sizeof(wchar_t));
    printf("size of Py_UNICODE is: %d\n", (int)sizeof(Py_UNICODE));

    for(i=0; i<PyUnicode_GET_SIZE(sampleObj); i++){
        printf("sample is : %c\n", sample[i]);
        printf("w is : %c\n", w[i]);
    }
    free(w);
    return sampleObj;
}

static PyMethodDef funcs[]={{"unicodeTest",(PyCFunction)
unicode_helper,METH_VARARGS,"test ucs2, ucs4"},{NULL}};

void initunicodeTest(void){
    Py_InitModule3("unicodeTest",funcs,"");
}

This gives the following output when I pass "abc" as input:

size of sampleObj is : 3
3 chars are copied to w
size of wchar_t is : 4
size of Py_UNICODE is: 4
sample is : a
w is : a
sample is : b
w is : b
sample is : c
w is : c

So both Py_UNICODE and wchar_t are 4 bytes, and since each character is followed by three \0 bytes, printf or wprintf prints only one letter. I need to process the data further, and the libraries involved need the data in UCS2 format (2 bytes per character); otherwise they fail. Is there any way to force wchar_t to be 2 bytes, or can I convert this UCS4 data to UCS2 explicitly?

-
Abhigyan
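On the explicit conversion asked about here, one possible approach is to convert the 4-byte code points by hand. The sketch below is plain C with no Python dependency; `ucs4_to_utf16` is a hypothetical helper name, and it assumes the consuming library accepts UTF-16 code units in native byte order. Code points up to U+FFFF map one-to-one, which is all that strict UCS2 covers; anything above that becomes a surrogate pair, which a UCS2-only consumer may not understand. On the Python side, PyUnicode_AsUTF16String() may be an alternative worth checking, since it returns the encoded bytes as a string object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts n UCS-4 code points to UTF-16 code units in native byte order.
   dst must have room for 2 * n units (worst case: every code point needs
   a surrogate pair). Returns the number of units written, or (size_t)-1
   if a code point is a lone surrogate or lies above U+10FFFF. */
static size_t ucs4_to_utf16(const uint32_t *src, size_t n, uint16_t *dst)
{
    size_t i, out = 0;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        uint32_t cp = src[i];
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (cp < 0x10000) {
            /* Basic Multilingual Plane: one unit, identical to UCS2 */
            dst[out++] = (uint16_t)cp;
        } else {
            /* Supplementary plane: split into high and low surrogate */
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 + (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

The width of wchar_t itself is fixed by the platform and compiler, so converting into a separate 16-bit buffer like this avoids depending on it.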