The file you attached is correct, and the same modified DOMPrint that I used before returns the ZWJ character in the text returned by getTextContent. Could you show us the code you are using to read the file?
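
For comparison, here is a minimal sketch of the kind of check I mean. It is not the actual DOMPrint change: Utf16Unit is a stand-in for XMLCh (assumed here to be a 16-bit UTF-16 code unit), and findJoiner and toUtf8 are hypothetical helpers written only for this example. It looks for U+200C/U+200D directly in the UTF-16 data and converts to UTF-8 by hand, instead of transcoding to the local code page, which may be unable to represent these characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for Xerces-C's XMLCh. Assumption: a 16-bit UTF-16 code unit,
// so that this sketch compiles without the library.
typedef char16_t Utf16Unit;

const Utf16Unit kZwnj = 0x200C;  // ZERO WIDTH NON-JOINER
const Utf16Unit kZwj  = 0x200D;  // ZERO WIDTH JOINER

// Hypothetical helper: index of the first ZWJ/ZWNJ in a null-terminated
// UTF-16 string, or -1 if there is none.
long findJoiner(const Utf16Unit* text) {
    assert(text != nullptr);
    for (long i = 0; text[i] != 0; ++i) {
        if (text[i] == kZwj || text[i] == kZwnj) {
            return i;
        }
    }
    return -1;
}

// Hypothetical helper: convert null-terminated UTF-16 to UTF-8 by hand.
// Unpaired surrogates are not rejected here; they come out as three bytes,
// which a production converter should treat as an error.
std::string toUtf8(const Utf16Unit* text) {
    assert(text != nullptr);
    std::string out;
    for (std::size_t i = 0; text[i] != 0; ++i) {
        unsigned long cp = text[i];
        // text[i] is non-zero, so text[i + 1] is at worst the terminator.
        const unsigned long next = text[i + 1];
        if (cp >= 0xD800 && cp <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (next - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With a real document you would pass the pointer returned by getTextContent() (casting or copying if XMLCh is a different 16-bit type in your build). If findJoiner returns -1 there, the character never reached the parsed text; if it returns an index but your output is empty, the loss is more likely happening in a later conversion step.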

Alberto

jinesh kj wrote:
hi,

I dumped the database using the mysql -X command, which writes the output as an XML file. I don't know whether there is any problem with my XML files. Is there any specific notation to represent the ZWJ and ZWNJ in XML files?

I am attaching one of my XML files.

Thank you for your help. If you have a better idea of what to do with the XML file when I get characters like these, or any links with more details, please point me to them.

regards

Jinesh K J

On Nov 28, 2007 4:46 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:

    If you can read the original file, but not when you edit it, I would bet
    the reason is in the way you edit your XML files (and dump them from the
    database). What are you using? Could you attach a small sample file?

    Alberto

    jinesh kj wrote:
    > hi,
    >
    > I tried reading the file you sent. It didn't give any error, which means
    > it was read correctly. I don't know how to check in the debugger, so I
    > don't know whether it read 200d or not. But if I edit the XML file so
    > that it has some text data along with the ZWJ, the text is not read. Do I
    > have to do anything for it? Basically I am trying to read through an XML
    > file which is a dump of a MySQL database. It has many ZWJs and similar
    > characters. I don't know whether it is valid according to the specified
    > encoding. But since it was dumped from the database using the built-in
    > function, I think the chance of an error is very low.
    >
    > I am using a similar function in my program, but it returns nothing
    > when there is a ZWJ in my data.
    >
    > I hope I am clear. I am able to read XML files without ZWJ easily.
    >
    > regards
    >
    > Jinesh K J
    >
    > On Nov 28, 2007 4:02 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
    >
    >
    >> I am attaching a sample XML that contains a U+200D character between a
    >> --| and |-- pattern; I modified DOMPrint to issue a
    >>
    >>            const XMLCh* data = doc->getDocumentElement()->getTextContent();
    >>
    >> and in the debugger I see that data[4] is \x200D.
    >> Have you checked that your source XML really has that character? Also, is
    >> the representation of the ZWJ character in the XML file valid according
    >> to the specified encoding (e.g. in UTF-8, it's 0xE2 0x80 0x8D)?
    >>
    >> Alberto
    >>
    >> jinesh kj wrote:
    >>
    >>> hi,
    >>>
    >>> Actually, getTextContent is not returning any value when there is a
    >>> Zero Width Joiner.
    >>>
    >>> cheers
    >>>
    >>> Jinesh K J
    >>>
    >>> On Nov 28, 2007 3:28 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
    >>>
    >>>> Hi Jinesh,
    >>>> which kind of issues are you having? The text returned by getTextContent
    >>>> should contain a \x200D value inside. Or have you transcoded it into
    >>>> chars?
    >>>>
    >>>> Alberto
    >>>>
    >>>> jinesh kj wrote:
    >>>>
    >>>>
    >>>>> hi all,
    >>>>>
    >>>>> I was trying to read from an XML file where some data have Zero Width
    >>>>> Joiner in it. I used getTextContent in DOMNode. I was able to read the
    >>>>> contents without Zero Width Joiner, but there are some issues with these
    >>>>> special characters. What do I have to change? Do I have to make any
    >>>>> special settings? Or do I have to use some other function instead?
    >>>>>
    >>>>> cheers
    >>>>> Jinesh K J




--
My Feelings,Expressions-
http://logbookofanobserver.blogspot.com

SMC : My computer, My language http://smc.org.in
സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ
