replace illegal xml characters
Hi! I am working with InDesign exported XML and parse it in a Python application. I learned here: http://boodebr.org/main/python/all-about-python-and-unicode that there actually are sets of illegal Unicode characters for XML (and hence for every compliant XML parser). I already implemented a regex solution to replace the characters in question, but I wonder if there is an efficient and out-of-the-box solution somewhere out there for this problem. Does anybody know? Thanks! gabriel -- http://mail.python.org/mailman/listinfo/python-list
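For readers following the thread, here is a minimal sketch of the kind of regex replacement described above. It is not the poster's actual code. It assumes Python 3 and the XML 1.0 definition of a legal character (tab, newline, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF); the names `_ILLEGAL_XML_CHARS` and `replace_illegal_xml_chars` are made up for illustration.

```python
import re

# Everything outside the XML 1.0 "Char" production:
# C0 controls except tab/LF/CR, UTF-16 surrogates, and U+FFFE/U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]"
)


def replace_illegal_xml_chars(text, replacement="?"):
    """Return text with every XML-1.0-illegal character replaced."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    dirty = "caption\x0bwith\x00stray controls\n"
    print(repr(replace_illegal_xml_chars(dirty)))
```

Passing `replacement=""` drops the characters instead of substituting them. The function works on decoded text, so bytes read from a file would need to be decoded first.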
Re: replace illegal xml characters
killkolor wrote:
> I am working with InDesign exported XML and parse it in a Python application. I learned here: http://boodebr.org/main/python/all-about-python-and-unicode that there actually are sets of illegal Unicode characters for XML (and hence for every compliant XML parser). I already implemented a regex solution to replace the characters in question, but I wonder if there is an efficient and out-of-the-box solution somewhere out there for this problem. Does anybody know?
Does InDesign export broken XML documents? What exactly is your problem?
Ciao, Marc 'BlackJack' Rintsch
Re: replace illegal xml characters
> Does InDesign export broken XML documents? What exactly is your problem?
Yes, unfortunately it does. It uses all possible Unicode characters, though not all are allowed in valid XML (see the link in the first post). In any case, my application should check whether the incoming XML is valid and replace all invalid characters. Is there something out there to do this?
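The "check, then replace" step asked about here could look roughly like the sketch below. It is only an illustration under stated assumptions: Python 3, the XML 1.0 character rules, and the standard-library `xml.etree.ElementTree` parser. The names `_ILLEGAL` and `check_and_parse` are hypothetical, and the sketch only looks at literal characters, not at numeric character references such as `&#1;`.

```python
import re
import xml.etree.ElementTree as ET

# Characters that XML 1.0 does not allow anywhere in a document.
_ILLEGAL = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]")


def check_and_parse(xml_text):
    """Report illegal characters, strip them, then parse the cleaned text.

    Returns (root_element, findings), where findings is a list of
    (offset, hex_code_point) tuples for each illegal character found.
    """
    findings = [(m.start(), hex(ord(m.group()))) for m in _ILLEGAL.finditer(xml_text)]
    if findings:
        xml_text = _ILLEGAL.sub("", xml_text)
    # Still raises ET.ParseError if the document is broken in other ways.
    return ET.fromstring(xml_text), findings


if __name__ == "__main__":
    root, findings = check_and_parse("<story>Head\x01line</story>")
    print(root.text, findings)
```

Returning the findings alongside the parsed tree makes it possible to log what was removed, which is also handy when preparing a small example for a bug report.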
Re: replace illegal xml characters
On Mar 21, 8:03 am, killkolor wrote:
> > Does InDesign export broken XML documents? What exactly is your problem?
> Yes, unfortunately it does. It uses all possible Unicode characters, though not all are allowed in valid XML (see the link in the first post). In any case, my application should check whether the incoming XML is valid and replace all invalid characters. Is there something out there to do this?
You might be able to use Beautiful Soup: http://www.crummy.com/software/BeautifulSoup/
There are also some good examples for parsing XML at http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/ and the Dive Into Python site.
Mike
Re: replace illegal xml characters
killkolor wrote:
> > Does InDesign export broken XML documents? What exactly is your problem?
> Yes, unfortunately it does. It uses all possible Unicode characters, though not all are allowed in valid XML (see the link in the first post). In any case, my application should check whether the incoming XML is valid and replace all invalid characters. Is there something out there to do this?
I doubt it. Dealing with broken XML is not something the standard modules should have to cope with. The link you provided has all you need, so why not just use it?
Diez
Re: replace illegal xml characters
killkolor wrote:
> > Does InDesign export broken XML documents? What exactly is your problem?
> Yes, unfortunately it does. It uses all possible Unicode characters, though not all are allowed in valid XML (see the link in the first post).
Are you sure about this? Could you post a small example? If this is true, don't forget to file a bug report with Adobe too.
--Irmen