The file you attached is correct, and the same modified DOMPrint that I
used before returns the ZWJ characters in the content of getTextContent.
Could you show us the code you are using to read the file?
Alberto
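
For anyone following the thread, the check described above (looking at the raw code units returned by getTextContent) can be sketched without a debugger. This is only a minimal illustration under stated assumptions: char16_t stands in for XMLCh, assumed here to be a 16-bit UTF-16 code unit, and findFirstJoiner and hexDump are hypothetical helper names, not part of Xerces-C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Assumption: stand-in for Xerces' XMLCh, treated as a 16-bit UTF-16 code unit.
using Utf16Unit = char16_t;

constexpr Utf16Unit kZwnj = 0x200C;  // ZERO WIDTH NON-JOINER
constexpr Utf16Unit kZwj  = 0x200D;  // ZERO WIDTH JOINER

// Hypothetical helper: index of the first ZWJ or ZWNJ in a null-terminated
// UTF-16 buffer, or -1 if there is none (or the pointer is null).
long findFirstJoiner(const Utf16Unit* text) {
    if (text == nullptr) return -1;
    for (std::size_t i = 0; text[i] != 0; ++i) {
        if (text[i] == kZwj || text[i] == kZwnj) return static_cast<long>(i);
    }
    return -1;
}

// Hypothetical helper: renders each code unit as four hex digits separated
// by spaces, so invisible characters show up in plain ASCII output.
std::string hexDump(const Utf16Unit* text) {
    std::string out;
    if (text == nullptr) return out;
    char buf[8];
    for (std::size_t i = 0; text[i] != 0; ++i) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(text[i]));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Dumping the buffer this way avoids converting it to narrow chars first. If the text is transcoded to a local code page that cannot represent U+200D, the character may be lost or replaced, depending on the transcoder, which would look like "nothing returned" even though the parser read it.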
jinesh kj wrote:
hi,
I dumped the database using the mysql -X command, which gives me the output
as an XML file. I don't know whether there is any problem with my XML files.
Is there any specific notation to represent the ZWJ and ZWNJ in XML files?
I am attaching one of my XML files.
Thank you for your help. If you have a better idea of what to do with
the XML file when I get characters like these, or any links to those
details, please point me to them.
regards
Jinesh K J
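
On the notation question: XML has no special name for these characters, but any character can be written as a numeric character reference, so ZWJ (U+200D) can appear as &#x200D; and ZWNJ (U+200C) as &#x200C;. In a UTF-8 file the raw characters are the byte sequences E2 80 8D and E2 80 8C. The following is a minimal sketch, not a recommended workflow, of rewriting the raw UTF-8 bytes as references so they stay visible in an editor; escapeJoiners is a hypothetical name, and the input is assumed to be UTF-8 text content (not a CDATA section, where references are not expanded).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces raw UTF-8 ZWNJ (E2 80 8C) and ZWJ (E2 80 8D)
// byte sequences with XML numeric character references.
// Assumption: the input is UTF-8 encoded.
std::string escapeJoiners(const std::string& utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const bool hasThree = i + 2 < utf8.size();
        if (hasThree &&
            static_cast<unsigned char>(utf8[i]) == 0xE2 &&
            static_cast<unsigned char>(utf8[i + 1]) == 0x80) {
            const unsigned char third = static_cast<unsigned char>(utf8[i + 2]);
            if (third == 0x8C || third == 0x8D) {
                out += (third == 0x8C) ? "&#x200C;" : "&#x200D;";
                i += 2;  // skip the two continuation bytes just consumed
                continue;
            }
        }
        out += utf8[i];  // every other byte is copied unchanged
    }
    return out;
}
```

A conforming parser should report the same character to the application whether the file contains the raw bytes or the reference, so this only changes how the file looks in an editor.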
On Nov 28, 2007 4:46 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
If you can read the original file, but not when you edit it, I would bet
the reason is in the way you edit your XML files (and dump from the
database). What are you using? Could you attach a small sample file?
Alberto
jinesh kj wrote:
> hi,
>
> I tried reading the file you sent. It didn't give any error, which means it
> was read correctly. I don't know how to check in the debugger, so I
> don't know whether it read 200d or not. But if I try to edit the XML file,
> adding some text data, it does not read the text. Do I have to
> do anything for it? Basically I am trying to read through an XML file, which
> is a dump of a MySQL database. It has many ZWJs. I don't know whether
> they are encoded according to the specified encoding. But since it was dumped
> from the database using the built-in function, I think the chance of an error
> is too low.
>
> I am only trying to use a similar function in my program; it returns
> nothing when there is a ZWJ in my data.
>
> I hope I am clear. I am able to read XML files without ZWJ easily.
>
> regards
>
> Jinesh K J
>
> On Nov 28, 2007 4:02 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
>
>
>> I am attaching a sample XML that contains a U+200D character between a
>> --| and |-- pattern; I modified DOMPrint to issue a
>>
>> const XMLCh* data=doc->getDocumentElement()->getTextContent();
>>
>> and in the debugger I see that data[4] is \x200D
>> Have you checked your source XML really has that character? Also, is
>> the representation of the ZWJ character in the XML file valid according
>> to the specified encoding (e.g. in UTF-8, it's 0xE2 0x80 0x8D)?
>>
>> Alberto
>>
>> jinesh kj wrote:
>>
>>> hi,
>>>
>>> Actually, getTextContent is not returning any value when there is a Zero
>>> width joiner.
>>>
>>> cheers
>>>
>>> Jinesh K J
>>>
>>> On Nov 28, 2007 3:28 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Jinesh,
>>>> which kind of issues are you having? The text returned by getTextContent
>>>> should contain a \x200D value inside. Or have you transcoded it into
>>>> chars?
>>>>
>>>> Alberto
>>>>
>>>> jinesh kj wrote:
>>>>
>>>>
>>>>> hi all,
>>>>>
>>>>> I was trying to read from an XML file where some data have Zero Width
>>>>> Joiner in it. I used getTextContent in DOMNode. I was able to read the
>>>>> contents without Zero Width Joiner, but there are some issues with these
>>>>> special characters. What do I have to change? Do I have to make any
>>>>> special settings? Or do I have to use any other function instead?
>>>>>
>>>>> cheers
>>>>> Jinesh K J
--
My Feelings,Expressions-
http://logbookofanobserver.blogspot.com
SMC : My computer, My language http://smc.org.in
സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ