Re: [Tutor] Convert XML codes to normal text?
2009/3/4 Eric Dorsey <dors...@gmail.com>:
> d = feedparser.parse('http://snipt.net/dorseye/feed')
> x=0
> for i in d['entries']:
>     print d['entries'][x].title
>     print d['entries'][x].summary
>     print
>     x+=1
>
> Output
>
> Explode / Implode List
> &gt;&gt;&gt; V = list(V)
<snip>
> I know, for example, that the &gt; code means >, but what I don't know is
> how to convert it in all my data to show properly? In all the feedparser
<snip>

What you are looking for is unescape from saxutils. Example:

>>> from xml.sax.saxutils import unescape
>>> unescape('&gt;')
'>'

Greets
Sander
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
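A minimal sketch of the saxutils suggestion, written for Python 3 (an assumption; the thread itself uses Python 2). The sample strings are illustrative. Note that `unescape` only handles `&amp;`, `&lt;` and `&gt;` unless extra entities are passed in its second argument.

```python
from xml.sax.saxutils import unescape

# Text as it appears in the feed summary, with the prompt escaped.
escaped = "&gt;&gt;&gt; V = list(V)"
plain = unescape(escaped)
print(plain)

# By default only &amp;, &lt; and &gt; are converted; any other entity
# needs an explicit mapping passed as the second argument.
with_quotes = unescape("&quot;spammy&quot;", {"&quot;": '"'})
print(with_quotes)
```

Without the mapping, `unescape("&quot;")` returns the string unchanged, so this approach suits feeds that only escape the basic XML characters.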
[Tutor] Convert XML codes to normal text?
*So here is my program, I'm pulling some information off of my Snipt feed ..*

    import feedparser

    d = feedparser.parse('http://snipt.net/dorseye/feed')
    x=0
    for i in d['entries']:
        print d['entries'][x].title
        print d['entries'][x].summary
        print
        x+=1

*Output*

    Explode / Implode List
    &gt;&gt;&gt; V = list(V)
    &gt;&gt;&gt; V
    ['s', 'p', 'a', 'm', 'm', 'y']
    &gt;&gt;&gt; V = ''.join(V)
    &gt;&gt;&gt; V
    'spammy'
    &gt;&gt;&gt;

I know, for example, that the &gt; code means >, but what I don't know is how to convert it in all my data to show properly? In all the feedparser examples it just smoothly has the output correct (like in one the data was <span>whatever</span> and it had the special characters just fine.) I didn't notice any special call on their feedparser.parse() and I can't seem to find anything in the feedparser documentation that addresses this. Has anyone run into this before? Thanks!
Re: [Tutor] Convert XML codes to normal text?
Eric Dorsey wrote:
> _So here is my program, I'm pulling some information off of my Snipt feed .._
<snip>
> I know, for example, that the &gt; code means >, but what I don't know is
> how to convert it in all my data to show properly? In all the feedparser
> examples it just smoothly has the output correct

Why not str.replace()?

    mystring = mystring.replace('&gt;', '>')

> (like in one the data was <span>whatever</span> and it had the special
> characters just fine.) I didn't notice any special call on their
> feedparser.parse() and I can't seem to find anything in the feedparser
> documentation that addresses this. Has anyone run into this before? Thanks!

It's because &gt; is not the same as >. &gt; is the HTML escape sequence for >, which means a browser would substitute it with a real > instead of considering it part of an HTML tag.
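If str.replace() is used for this, the order of the replacements matters: `&amp;` should be handled last, otherwise doubly escaped text such as `&amp;gt;` gets decoded twice. A rough sketch in Python 3, where `unescape_basic` is a hypothetical helper name and not something from the thread:

```python
def unescape_basic(text):
    """Replace the three basic escapes (hypothetical helper, for illustration)."""
    text = text.replace("&lt;", "<")
    text = text.replace("&gt;", ">")
    # &amp; goes last so that "&amp;gt;" becomes "&gt;" and not ">".
    text = text.replace("&amp;", "&")
    return text


prompt_line = unescape_basic("&gt;&gt;&gt; V = ''.join(V)")
double_escaped = unescape_basic("&amp;gt;")
print(prompt_line)
print(double_escaped)
```

This only covers three entities; numeric references and other named entities pass through untouched.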
Re: [Tutor] Convert XML codes to normal text?
On Wed, Mar 4, 2009 at 11:13 AM, Eric Dorsey <dors...@gmail.com> wrote:
> I know, for example, that the &gt; code means >, but what I don't know is
> how to convert it in all my data to show properly?

Feedparser returns the output as HTML only, so expect HTML tags and entities in the output.
What you want is to unescape HTML entities ( http://effbot.org/zone/re-sub.htm#unescape-html )

    import feedparser
    import re, htmlentitydefs

    def unescape(text):
        def fixup(m):
            text = m.group(0)
            if text[:2] == "&#":
                # character reference
                try:
                    if text[:3] == "&#x":
                        return unichr(int(text[3:-1], 16))
                    else:
                        return unichr(int(text[2:-1]))
                except ValueError:
                    pass
            else:
                # named entity
                try:
                    text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
                except KeyError:
                    pass
            return text # leave as is
        return re.sub("&#?\w+;", fixup, text)

    d = feedparser.parse('http://snipt.net/dorseye/feed')
    x=0
    for i in d['entries']:
        print unescape(d['entries'][x].title)
        print unescape(d['entries'][x].summary)
        print
        x+=1

HTH,
Senthil
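For readers on Python 3 (an assumption; the code above is Python 2), the standard library's `html.unescape` decodes named entities as well as decimal and hexadecimal character references, so a hand-written function may not be needed. A sketch using made-up entries as a stand-in for `d['entries']`, so it runs without fetching the feed:

```python
import html

# Stand-in for d['entries'] from feedparser.parse(); plain dicts with
# illustrative values so the sketch runs without network access.
entries = [
    {"title": "Explode / Implode List",
     "summary": "&gt;&gt;&gt; V = list(V)"},
    {"title": "caf&#233; &amp; caf&#xe9;",
     "summary": "&lt;span&gt;whatever&lt;/span&gt;"},
]

decoded = []
for entry in entries:  # iterate directly, no manual counter needed
    title = html.unescape(entry["title"])
    summary = html.unescape(entry["summary"])
    decoded.append((title, summary))
    print(title)
    print(summary)
    print()
```

Iterating over the entries directly also removes the need for the separate `x` counter used in the original loop.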