The problem is probably in the transcoding. XMLString::transcode() transcodes to whatever native code page your machine is set up with. Unless that code page can represent ZWJ (U+200D) and ZWNJ (U+200C), the transcoded result will not be what you expect. You should keep your text in an encoding that can represent any character you might receive, such as Xerces' internal UTF-16 encoding (or UTF-8 if you need bytes). See XMLTransService.
________________________________
From: jinesh kj [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 28, 2007 9:56 AM
To: [email protected]
Subject: Re: reg:[reading data with ZWJ and ZWNJ]

hi,

I actually need the whole text, including the ZWJ. I am attaching my code, only the section that interacts with the XML file; I hope that is enough. The code is a little big and I haven't commented it properly, so it may take a while to follow. If you need an explanation of any part, please let me know.

cheers
Jinesh K J

On Nov 28, 2007 5:43 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
The file you attached is correct, and the same modified DOMPrint that I used before returns the ZWJ characters in the result of getTextContent. Could you show us the code you are using to read the file?

Alberto

jinesh kj wrote:
> hi,
>
> I dumped the data using the mysql -X command, which gives me the output as an XML file. I don't know whether there is any problem with my XML files. Is there any specific notation for representing ZWJ and ZWNJ in XML files?
>
> I am attaching an XML file I have.
>
> Thank you for your help. If you have a better idea of what to do with the XML file when I get characters like these, or any links to the details, please point me to them.
>
> regards
>
> Jinesh K J
>
> On Nov 28, 2007 4:46 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
>
>> If you can read the original file, but not after you edit it, I would bet the reason is in the way you edit your XML files (and dump them from the database). What are you using? Could you attach a small sample file?
>>
>> Alberto
>>
>> jinesh kj wrote:
>>> hi,
>>>
>>> I tried reading the file you sent. It didn't give any error, which means it was read correctly. I don't know how to check in the debugger, so I don't know whether it read 200D or not. But if I edit the XML file and add some text data, the text is not read. Do I have to do anything for it? Basically I am trying to read an XML file that is a dump of a MySQL database. It has many ZWJs. I don't know whether they are encoded according to the specified encoding, but since the file was dumped from the database using the built-in function, I think the chance of an error is very low.
>>>
>>> I am using a similar function in my program, and it returns nothing when there is a ZWJ in my data.
>>>
>>> I hope I am clear. I am able to read XML files without ZWJ easily.
>>>
>>> regards
>>>
>>> Jinesh K J
>>>
>>> On Nov 28, 2007 4:02 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
>>>
>>>> I am attaching a sample XML file that contains a U+200D character between --| and |-- markers. I modified DOMPrint to issue
>>>>
>>>>     const XMLCh* data=doc->getDocumentElement()->getTextContent();
>>>>
>>>> and in the debugger I see that data[4] is \x200D.
>>>> Have you checked that your source XML really contains that character? Also, is the ZWJ character represented validly for the encoding declared in the XML file (e.g. in UTF-8 it is 0xE2 0x80 0x8D)?
>>>>
>>>> Alberto
>>>>
>>>> jinesh kj wrote:
>>>>> hi,
>>>>>
>>>>> Actually, getTextContent is not returning any value when there is a Zero Width Joiner.
>>>>>
>>>>> cheers
>>>>>
>>>>> Jinesh K J
>>>>>
>>>>> On Nov 28, 2007 3:28 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi Jinesh,
>>>>>> what kind of issues are you having? The text returned by getTextContent should contain a \x200D value. Or have you transcoded it into chars?
>>>>>>
>>>>>> Alberto
>>>>>>
>>>>>> jinesh kj wrote:
>>>>>>> hi all,
>>>>>>>
>>>>>>> I was trying to read from an XML file where some data contains Zero Width Joiners. I used getTextContent in DOMNode. I was able to read contents without Zero Width Joiners, but there are some issues with these special characters. What do I have to change? Do I have to make any special settings? Or do I have to use another function instead?
>>>>>>>
>>>>>>> cheers
>>>>>>> Jinesh K J
>
> --
> My Feelings, Expressions - http://logbookofanobserver.blogspot.com
>
> SMC : My computer, My language http://smc.org.in
> Swathanthra Malayalam Computing: my language for my computer
