Re: [Tutor] Convert XML codes to normal text?

2009-03-04 Thread Sander Sweers
2009/3/4 Eric Dorsey dors...@gmail.com:
> d = feedparser.parse('http://snipt.net/dorseye/feed')
>
> x=0
> for i in d['entries']:
>     print d['entries'][x].title
>     print d['entries'][x].summary
>     print
>     x+=1
>
> Output
>
> Explode / Implode List
> &gt;&gt;&gt; V = list(V)

<snip>

> I know, for example, that the &gt; code means >, but what I don't know is
> how to convert it in all my data to show properly? In all the feedparser

<snip>

What you are looking for is unescape from saxutils. Example:

>>> from xml.sax.saxutils import unescape
>>> unescape('&gt;')
'>'
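
Note that saxutils' unescape only handles &amp;, &lt; and &gt; by default; any other entity has to be passed in through its optional entities dictionary. A minimal sketch, written for Python 3 (an assumption, the thread itself uses Python 2) and using made-up sample strings rather than the real feed:

```python
from xml.sax.saxutils import unescape

# Sample strings invented for illustration; they stand in for feed summaries.
summary = "&gt;&gt;&gt; V = list(V)"
print(unescape(summary))

# Entities other than &amp;, &lt; and &gt; are left alone
# unless a mapping is supplied as the second argument.
quoted = "&quot;spammy&quot; &amp; more"
print(unescape(quoted))
print(unescape(quoted, {"&quot;": '"'}))
```

The first call prints the interpreter prompt with real > characters; the last two show the difference the extra mapping makes for &quot;.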

Greets
Sander
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Convert XML codes to normal text?

2009-03-03 Thread Eric Dorsey
*So here is my program, I'm pulling some information off of my Snipt feed ..*

import feedparser

d = feedparser.parse('http://snipt.net/dorseye/feed')

x=0
for i in d['entries']:
    print d['entries'][x].title
    print d['entries'][x].summary
    print
    x+=1

*Output*

Explode / Implode List
&gt;&gt;&gt; V = list(V)
&gt;&gt;&gt; V
['s', 'p', 'a', 'm', 'm', 'y']
&gt;&gt;&gt; V = ''.join(V)
&gt;&gt;&gt; V
'spammy'
&gt;&gt;&gt;

I know, for example, that the &gt; code means >, but what I don't know is
how to convert it in all my data to show properly? In all the feedparser
examples it just smoothly has the output correct (like in one the data was
<span>whatever</span> and it had the special characters just fine.) I didn't
notice any special call on their feedparser.parse() and I can't seem to find
anything in the feedparser documentation that addresses this. Has anyone run
into this before? Thanks!


Re: [Tutor] Convert XML codes to normal text?

2009-03-03 Thread Lie Ryan

Eric Dorsey wrote:
> _So here is my program, I'm pulling some information off of my Snipt
> feed .._

<snip>

> I know, for example, that the &gt; code means >, but what I don't know
> is how to convert it in all my data to show properly? In all the
> feedparser examples it just smoothly has the output correct


Why not str.replace()?

mystring = mystring.replace('&gt;', '>')
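
A single replace() only covers one entity. If you go this route, a rough sketch for the three basic XML entities could look like the following (Python 3 syntax; the helper name unescape_basic is made up for this example). The ampersand has to be replaced last, otherwise text such as &amp;gt; would be unescaped twice:

```python
def unescape_basic(text):
    """Replace the three basic XML entities using plain str.replace()."""
    text = text.replace("&lt;", "<")
    text = text.replace("&gt;", ">")
    # &amp; goes last so that "&amp;gt;" becomes "&gt;" and not ">".
    return text.replace("&amp;", "&")


print(unescape_basic("&gt;&gt;&gt; V = ''.join(V)"))
print(unescape_basic("&amp;gt; is the escape for &gt;"))
```

This does nothing for numeric references or other named entities, so it is only a fit when the data is known to contain just these three.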

> (like in one
> the data was <span>whatever</span> and it had the special characters
> just fine.) I didn't notice any special call on their feedparser.parse()
> and I can't seem to find anything in the feedparser documentation that
> addresses this. Has anyone run into this before? Thanks!

It's because &gt; is not the same as >. &gt; is the HTML escape sequence for
>, which means a browser substitutes it with a literal > instead of
treating it as part of an HTML tag.




Re: [Tutor] Convert XML codes to normal text?

2009-03-03 Thread Senthil Kumaran
On Wed, Mar 4, 2009 at 11:13 AM, Eric Dorsey dors...@gmail.com wrote:
> I know, for example, that the &gt; code means >, but what I don't know is
> how to convert it in all my data to show properly? I

Feedparser returns the output as HTML only, so expect HTML tags and
entities in the output.
What you want is to unescape the HTML entities (
http://effbot.org/zone/re-sub.htm#unescape-html )

import feedparser
import re, htmlentitydefs

def unescape(text):
    def fixup(m):
        text = m.group(0)
        if text[:2] == "&#":
            # character reference
            try:
                if text[:3] == "&#x":
                    return unichr(int(text[3:-1], 16))
                else:
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
        else:
            # named entity
            try:
                text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
            except KeyError:
                pass
        return text # leave as is
    return re.sub("&#?\w+;", fixup, text)


d = feedparser.parse('http://snipt.net/dorseye/feed')

x=0
for i in d['entries']:
    print unescape(d['entries'][x].title)
    print unescape(d['entries'][x].summary)
    print
    x+=1
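
For readers on Python 3 (an assumption; the code above is Python 2), unichr and htmlentitydefs no longer exist, and the standard library's html.unescape (Python 3.4 and later) covers named, decimal and hexadecimal references in one call. A small sketch with made-up sample strings in place of the feed entries:

```python
import html

# Invented sample data standing in for entry titles/summaries from the feed.
samples = [
    "&gt;&gt;&gt; V = list(V)",   # named entity
    "&#62;&#62;&#62; V",          # decimal character reference
    "&#x3e;&#x3E;&#x3e; V",       # hexadecimal character reference
]

for text in samples:
    print(html.unescape(text))
```

Each sample comes out with literal > characters, which is the same result the fixup() approach above is aiming for.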



HTH,
Senthil