Hi, I created the dump with the mysql -X command, which writes its output as an XML file. I don't know whether there is anything wrong with my XML files. Is there a specific notation for representing ZWJ and ZWNJ in XML files?
I am attaching one of my XML files. Thank you for your help. If you have a better idea of what to do with the XML file when I get characters like these, or any links with more details, please point me to them.

regards
Jinesh K J

On Nov 28, 2007 4:46 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:

> If you can read the original file, but not when you edit it, I would bet
> the reason is in the way you edit your XML files (and dump from the
> database). What are you using? Could you attach a small sample file?
>
> Alberto
>
> jinesh kj wrote:
> > hi,
> >
> > I tried reading the file you sent. It didn't give any error, which means
> > it was reading perfectly. I don't know how to check in the debugger, so
> > I don't know whether it read 200d or not. But if I try to edit the XML
> > file, with some text data along with it, it is not reading the text. Do
> > I have to do anything for it? Basically I am trying to read through an
> > XML file which is a dump of a MySQL database. It has many ZWJs. I don't
> > know whether it is according to the specified encoding. But since it was
> > dumped from the database using the built-in function, I think the chance
> > of an error is too low.
> >
> > I am trying to use a similar function in my program; it returns
> > nothing when there is a ZWJ in my data.
> >
> > I hope I am clear. I am able to read XML files without ZWJ easily.
> >
> > regards
> >
> > Jinesh K J
> >
> > On Nov 28, 2007 4:02 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
> >
> >> I am attaching a sample XML that contains a U+200D character between a
> >> --| and |-- pattern; I modified DOMPrint to issue a
> >>
> >> const XMLCh* data=doc->getDocumentElement()->getTextContent();
> >>
> >> and in the debugger I see that data[4] is \x200D
> >> Have you checked your source XML really has that character? Also, is
> >> the representation of the ZWJ character in the XML file valid according
> >> to the specified encoding (e.g. in UTF-8, it's 0xE2 0x80 0x8D)?
> >>
> >> Alberto
> >>
> >> jinesh kj wrote:
> >>
> >>> hi,
> >>>
> >>> Actually, getTextContent is not returning any value when there is a
> >>> Zero Width Joiner.
> >>>
> >>> cheers
> >>>
> >>> Jinesh K J
> >>>
> >>> On Nov 28, 2007 3:28 PM, Alberto Massari <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> Hi Jinesh,
> >>>> which kind of issues are you having? The text returned by
> >>>> getTextContent should contain a \x200D value inside. Or have you
> >>>> transcoded it into chars?
> >>>>
> >>>> Alberto
> >>>>
> >>>> jinesh kj wrote:
> >>>>
> >>>>> hi all,
> >>>>>
> >>>>> I was trying to read from an XML file where some data have a Zero
> >>>>> Width Joiner in it. I used getTextContent in DOMNode. I was able to
> >>>>> read the contents without a Zero Width Joiner, but there are some
> >>>>> issues with these special characters. What do I have to change? Do I
> >>>>> have to make any special settings? Or do I have to use another
> >>>>> function instead?
> >>>>>
> >>>>> cheers
> >>>>> Jinesh K J

--
My Feelings,Expressions- http://logbookofanobserver.blogspot.com
SMC : My computer, My language http://smc.org.in
സ്വതന്ത്ര മലയാളം കമ്പ്യൂട്ടിങ്ങ്, എന്റെ കമ്പ്യൂട്ടറിന് എന്റെ ഭാഷ
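One way to follow up on Alberto's question about whether the source file really contains the character is to look at the raw bytes before any parser touches them. The following is only a minimal sketch using the C++ standard library, and it assumes the dump is UTF-8 encoded. The names readBytes, countSequence, kZwjUtf8 and kZwnjUtf8 are made up for this illustration and are not part of Xerces-C++ or MySQL.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// UTF-8 byte sequences for the two joiners (assumes the file is UTF-8).
const std::string kZwjUtf8 = "\xE2\x80\x8D";   // U+200D ZERO WIDTH JOINER
const std::string kZwnjUtf8 = "\xE2\x80\x8C";  // U+200C ZERO WIDTH NON-JOINER

// Hypothetical helper: reads a whole file as raw bytes, with no decoding.
// Returns an empty string if the file cannot be opened.
std::string readBytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: counts non-overlapping occurrences of `needle`
// inside `bytes`. An empty needle is reported as zero occurrences.
std::size_t countSequence(const std::string& bytes, const std::string& needle) {
    if (needle.empty()) {
        return 0;
    }
    std::size_t count = 0;
    std::size_t pos = bytes.find(needle);
    while (pos != std::string::npos) {
        ++count;
        pos = bytes.find(needle, pos + needle.size());
    }
    return count;
}
```

Calling countSequence(readBytes("dump.xml"), kZwjUtf8) on the dump (the file name here is a placeholder) gives the number of ZWJ byte sequences. A count of zero would suggest the joiners were lost or re-encoded when the file was dumped or edited, before the parser ever saw them.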
<?xml version="1.0"?> <resultset statement="select * from TEXTS where BookCode=0009 "> <row> <field name="BookCode">0009</field> <field name="PageNo">6</field> <field name="ImageLoc">/test/extra1/garnome-2.20.0/0009_Meinkamph_Img_600_Deskew/0009_MeinKamph_Img_600_Deskew_Page_0006.tif</field> <field name="Text">ut'amayaayirunnu. mahaajanasamuuhatte aat'iyulaykkaanâ poonna prabhaashhand-a ng-ng-al'ilâkkuut'i addeihn' tanr'e bhaashhaaparamaaya shaili muurâchchakuut't'iyet'uttirunnu. etiraal'iyilâninnu vashamaakkeind-t'a at'avukal'â manassilaakkikkond-t'utanne manakkaruttoot'e ayaal'e neirit'unnatilâ hir'r'larâ kaand-ikkunna dhairyavun' sthairyavun' lookananmaykkuveind-t'i upayoogichchirunneng-kilâ ennu naan' praarâtthichchupookun'. charitrn' srxshht'ichcha oru charitrapurushhanâ rachikkunna samakaaliinacharitrn' enna nilayilâ- naasi prasthaanattinr'e aantara muulyavun' atinr'e veirukal'un' pat'hikkaanâ utakunna muulagranthamenna nila yilâ- ii krxti shraddheiyamaand-ennatinu sn'shayamilla. uttamamaaya oru saahityarachana ennatilupari ii irupataan' nuur'r'aand-t'ile ativikasita raajyang-ng-al'ilâ onnaaya jarâmaniyilâ parishuddha aaryanâraktattinr'e mahattvn' uyarâttippit'ikkaanâ vempiya oru raashht'ratantrajnj-anr'e vikalamaaya antarâdarâshanattinr'e aavishhkaarn' enna nilaykk ii krxti shraddha arâhikkunnu. rachayitaavinr'e aatmavatta suukshhmamaayi pratiphalippikkunna ii krxti phaasisatteyun' sarâvaadhipatyatteyun' etirâkkaanaagrahikkunna ellaa varun' paat'hapustakn'poole pat'hikkeind-t'ataand-. vit't'uviizchayillaatta oru vikat'a vishvaasattinr'e vaktaavaaya addeihn' etra lakshhn' nissahaayaraaya juutanmaareyun' mar'r'ul'l'avareyun' gyaas choon'bar'ukal'ilit't'u konnu ennat innun' oru du:svapnamaayi irupataan'nuur'r'aand-t'inr'e kallichcha man:saakshhiyeppoolun' aloosarappet'uttikkond-t'irikkunnu. 
juutaviroodhn', svavn'shaaraadhana, adhikaaradaahn', vit't'uviizchayillaayma, aashayang-ng-al'ilul'l'a kat'un'pit'uttn' enning-ng-ane pala duushhyavashang-ng-al'umul'l'a hir'r'lar'ut'e svabhaavattinr'e mar'uvashn' ii aatmakathayilâ ang-ng-ing-ng- sphurikkunnund-t'. hir'r'larâkk jarâmaniyilâ kit't'iya vampichcha pintund-aykkul'l'a at'isthaanakaarand-avun' ii krxtiyilâninnu kur'eyokke manassilaakkaan'. ad'ool'âph hir'r'larâ vishvacharitrattile orapuurâvvapratibhaasamaand-. deishiiyatayut'eyun' saamyavaadattinr'eyun' meilang-kiyand-inj-nj-u manushhyavarâggatte vn'shaat'isthaanattilâ maatrn' nookkikkond-t' raktattinr'e parishuddhi parigand-ichch aaryavarâggattinuveind-t'i lookaadhipatyn' neit'iyet'ukkaanâ shramichcha oru vanâkit'a kalaapakaariyun' manushhyavidveishhiyun' hin'saamuurâttiyumaayirunnu addeiha menn janang-ng-al'â potuve vishvasikkunnu. hir'r'lar'ut'e cheytikal'ut'e duushhyaphalang-ng-al'â neirit't' anubhavichchavarilâ innavasheishhichchit't'ul'l'avarâkk dashaabda ng-ng-al'âkkusheishhavun' nj-et't'alund-t'aakkunna oru bhiikarasatvamaand- addeihn'. onnaan' lookayuddhattinusheishhn' vijayiraajyang-ng-al'â jarâmaniyut'emeilâ eilpichcha saampattikavun' bhuumishaastraparavun' maanasikavumaaya aaghaatang-ng-al'â und-arâtti vit't'a pratikaaradaahn' aakaarn'puund-t'ataayirunnu addeihn' ennu vichaarikku nnavarun' kur'avalla. lookaavasaanakaalatte sarâvavinaashashaktiyaaya kalâkki</field> </row> <row> <field name="BookCode">0009</field> <field name="PageNo">1</field> <field name="ImageLoc">/extra2/Annotation/0009_Meinkamph_Img_600_Deskew/0009_MeinKamph_Img_600_Deskew_Page_0001.tif</field> <field name="Text">ad'ool'âph hir'r'larâ (1889-1945) pragalbhanaaya jarâmanâ seichchhaadhikaari. rand-t'aan' lookamahaayuddhattinr'e kaarand-a kkaaranâ. 1889-lâ aastriyayilâ janichchu. skuul'âpat'hann' kazinj-nj- kur'enaal'â chitra ng-ng-al'â varachchun' vir'r'un' nat'annu. 1913-lâ sainyattilâ cheirânnu. 
tut'arânn raashht'riiyatti lit'apet't', jarâmmanâ naashhand-alâ sooshhyalisr' (naasi) paarât't'iyut'e talavanaayi. krameind-a jarâmmaniyut'e chaanâsalarâ, prasid'anr' ennii padavikal'â neit'i(1933-â34). 1923-lâ jayililâvachch mainâ kaan'ph (enr'e pooraat't'n') ezuti. it hir'r'lar'ut'e aatmakatha yaand-. charitrapraadhaanyamul'l'a ii krxti rand-t'u bhaagamaayi 1925-â'26-laand- prasiddhiika richchat. rand-t'aan' lookamahaayuddhattilâ paraajayappet't'appool'â 1945 eiprilâ 30-n aatmahatya cheytu. </field> </row> </resultset>
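On the notation question: XML allows any character to be written as a numeric character reference, so U+200D can appear in the file as &#x200D; and U+200C as &#x200C;, and a conforming parser turns them back into the real characters. The sketch below rewrites the UTF-8 byte sequences of the two joiners into those references, so the joiners stay visible and survive an editor that mishandles invisible characters. It assumes UTF-8 input, the function name escapeJoiners is made up for this illustration, and whether this resolves the getTextContent problem described in the thread is untested.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces the UTF-8 bytes of U+200D (ZWJ) and
// U+200C (ZWNJ) with XML numeric character references. All other bytes
// are copied unchanged. Assumes `utf8` holds UTF-8 encoded text.
std::string escapeJoiners(const std::string& utf8) {
    std::string out;
    out.reserve(utf8.size());
    std::size_t i = 0;
    while (i < utf8.size()) {
        // compare() clamps the length, so a truncated sequence at the
        // end of the input simply fails to match and is copied as-is.
        if (utf8.compare(i, 3, "\xE2\x80\x8D") == 0) {
            out += "&#x200D;";
            i += 3;
        } else if (utf8.compare(i, 3, "\xE2\x80\x8C") == 0) {
            out += "&#x200C;";
            i += 3;
        } else {
            out += utf8[i];
            ++i;
        }
    }
    return out;
}
```

This should be applied only to text content such as the Text field values, not blindly to a file in some other encoding, because the byte patterns are specific to UTF-8.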
