replace illegal xml characters

2007-03-21 Thread killkolor
hi!

I am working with InDesign exported xml and parse it in a python
application. I learned here:
http://boodebr.org/main/python/all-about-python-and-unicode
that there actually are sets of illegal unicode characters for xml
(and hence for every compliant xml parser). I already implemented
a regex solution to replace the characters in question, but I wonder
if there is an efficient and out-of-the-box solution somewhere out
there for this problem. does anybody know?

thanks!
gabriel

-- 
http://mail.python.org/mailman/listinfo/python-list
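
For readers following the thread: the regex approach mentioned in the first post is not shown there, so the following is only a minimal sketch of what such a filter might look like, not the poster's actual code. It assumes Python 3 and the XML 1.0 "Char" production (tab, newline, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The function name and the default replacement character are hypothetical.

```python
import re

# Assumption: "illegal" means outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def replace_illegal_xml_chars(text, replacement="?"):
    """Return text with every character XML 1.0 forbids replaced.

    Hypothetical helper name; pass replacement="" to drop the
    characters instead of substituting them.
    """
    return _ILLEGAL_XML_CHARS.sub(replacement, text)
```

Something like replace_illegal_xml_chars(raw_text) would be applied to the decoded text before handing it to a parser. Replacing rather than deleting keeps the position of the damage visible, which may or may not be what an application wants.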


Re: replace illegal xml characters

2007-03-21 Thread Marc 'BlackJack' Rintsch
In [EMAIL PROTECTED], killkolor
wrote:

> I am working with InDesign exported xml and parse it in a python
> application. I learned here:
> http://boodebr.org/main/python/all-about-python-and-unicode
> that there actually are sets of illegal unicode characters for xml
> (and hence for every compliant xml parser). I already implemented
> a regex solution to replace the characters in question, but I wonder
> if there is an efficient and out-of-the-box solution somewhere out
> there for this problem. does anybody know?

Does InDesign export broken XML documents?  What exactly is your problem?

Ciao,
Marc 'BlackJack' Rintsch


Re: replace illegal xml characters

2007-03-21 Thread killkolor
> Does InDesign export broken XML documents?  What exactly is your problem?

yes, unfortunately it does. it uses all possible unicode characters,
though not all are allowed in valid xml (see link in the first post).
in any case, my application should check whether the xml that comes
in is valid and replace all invalid characters. is there something
out there to do this?
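
The checking half of this question can be sketched with the standard library alone. The snippet below is an illustration under assumptions, not something proposed in the thread: it assumes Python 3 and uses xml.etree.ElementTree, whose expat-based parser rejects characters that XML 1.0 forbids. Strictly speaking it tests well-formedness rather than validity against a DTD or schema. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise.

    Hypothetical helper: it only reports whether the parser accepts
    the document; it does not repair or validate anything.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A document such as "<a>\x0b</a>" (a vertical tab inside an element) would be reported as not well-formed, while "<a>ok</a>" passes. Such a check could decide whether a replacement step is needed at all.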



Re: replace illegal xml characters

2007-03-21 Thread kyosohma
On Mar 21, 8:03 am, killkolor [EMAIL PROTECTED] wrote:
>> Does InDesign export broken XML documents?  What exactly is your problem?
>
> yes, unfortunately it does. it uses all possible unicode characters,
> though not all are allowed in valid xml (see link in the first post).
> in any case, my application should check whether the xml that comes
> in is valid and replace all invalid characters. is there something
> out there to do this?

You might be able to use Beautiful Soup:

http://www.crummy.com/software/BeautifulSoup/

There are also some good examples for parsing XML at
http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/

and the Dive Into Python site.


Mike



Re: replace illegal xml characters

2007-03-21 Thread Diez B. Roggisch
killkolor wrote:

>> Does InDesign export broken XML documents?  What exactly is your problem?
>
> yes, unfortunately it does. it uses all possible unicode characters,
> though not all are allowed in valid xml (see link in the first post).
> in any case, my application should check whether the xml that comes
> in is valid and replace all invalid characters. is there something
> out there to do this?

I doubt it. Dealing with broken XML is not something standard modules
should have to cope with. The link you provided has all you need - why
not just use it?


Diez


Re: replace illegal xml characters

2007-03-21 Thread Irmen de Jong
killkolor wrote:
>> Does InDesign export broken XML documents?  What exactly is your problem?
>
> yes, unfortunately it does. it uses all possible unicode characters,
> though not all are allowed in valid xml (see link in the first post).

Are you sure about this? Could you post a small example?

If this is true, don't forget to file a bug report with Adobe too.

--Irmen