Hello, I'm using the SAX XMLReader (xml.sax.xmlreader.XMLReader) to create an RSS reader.
I can parse the file but am unsure how to extract the elements I require. For example, for each <item> element I want the title and description. I want to build a list of objects, each holding a title and a description.

Here is the stub code I have so far (a bit hacked up):

    import sys
    from xml.sax import make_parser
    from xml.sax import handler

    class rssObject(object):
        # Class-level list, shared by every instance.
        objectList = []

        def addObject(self, obj):
            rssObject.objectList.append(obj)

    class rssObjectDetail(object):
        title = ""
        content = ""

    class SimpleHandler(handler.ContentHandler):
        # For now each callback just prints what the parser reports.
        def startElement(self, name, attrs):
            print(name)

        def endElement(self, name):
            print(name)

        def characters(self, data):
            print(data)

    class SimpleDTDHandler(handler.DTDHandler):
        def notationDecl(self, name, publicid, systemid):
            print("Notation: %s %s %s" % (name, publicid, systemid))

        # SAX passes four arguments here: name, publicId, systemId, ndata.
        def unparsedEntityDecl(self, name, publicid, systemid, ndata):
            print("UnparsedEntity: %s %s %s %s" % (name, publicid, systemid, ndata))

    p = make_parser()
    c = SimpleHandler()
    p.setContentHandler(c)
    p.setDTDHandler(SimpleDTDHandler())
    p.parse('topstories.xml')

And I am using this XML file:

    <?xml version="1.0"?>
    <rss version="2.0">
      <channel>
        <title>Stuff.co.nz - Top Stories</title>
        <link>http://www.stuff.co.nz</link>
        <description>Top Stories from Stuff.co.nz. New Zealand, world, sport, business &amp; entertainment news on Stuff.co.nz.
        </description>
        <language>en-nz</language>
        <copyright>Fairfax New Zealand Ltd.</copyright>
        <ttl>30</ttl>
        <image>
          <url>/static/images/logo.gif</url>
          <title>Stuff News</title>
          <link>http://www.stuff.co.nz</link>
        </image>
        <item id="4423924" count="1">
          <title>Prince Harry 'wants to live in Africa'</title>
          <link>http://www.stuff.co.nz/4423924a10.html?source=RSStopstories_20080303 </link>
          <description>For Prince Harry it must be the ultimate dark irony: to be in such a privileged position and have so much opportunity, and yet be unable to fulfil a dream of fighting for the motherland.</description>
          <author>EDMUND TADROS</author>
          <guid isPermaLink="false">stuff.co.nz/4423924</guid>
          <pubDate>Mon, 03 Mar 2008 00:44:00 GMT</pubDate>
        </item>
      </channel>
    </rss>

Is there something I'm missing? I can't figure out how to correctly interpret the document using the SAX parser. I'm sure I'm missing something obvious :)

Any tips or advice would be appreciated! Advice on a cleaner way to implement what I want to achieve would also be welcome, as using objectList=[] in the ContentHandler seems like a hack.

Thanks!
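For what it's worth, here is a minimal sketch of the kind of handler I think I'm after, not a definitive solution. It assumes the usual SAX approach of tracking state between callbacks: remember when the parser is inside an <item>, buffer the text of <title> and <description>, and store the result when the tag closes. The names RssItem, ItemHandler, parse_items and SAMPLE are made up for illustration, and the sample feed is a cut-down stand-in for topstories.xml.

```python
import xml.sax
from xml.sax import handler


class RssItem(object):
    """Hypothetical container for one <item>."""
    def __init__(self):
        self.title = ""
        self.description = ""


class ItemHandler(handler.ContentHandler):
    """Collects the title and description of every <item> element."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.items = []      # finished RssItem objects, one per <item>
        self._item = None    # the item being built, or None outside <item>
        self._field = None   # "title" or "description" while inside that tag
        self._chunks = []    # text pieces; characters() can fire several times

    def startElement(self, name, attrs):
        if name == "item":
            self._item = RssItem()
        elif self._item is not None and name in ("title", "description"):
            # Only capture these tags inside an <item>, so the channel
            # and image titles are skipped.
            self._field = name
            self._chunks = []

    def characters(self, data):
        if self._field is not None:
            self._chunks.append(data)

    def endElement(self, name):
        if name == "item" and self._item is not None:
            self.items.append(self._item)
            self._item = None
        elif self._field is not None and name == self._field:
            setattr(self._item, name, "".join(self._chunks).strip())
            self._field = None


def parse_items(xml_bytes):
    """Parse an RSS document held in a bytes string; return its items."""
    h = ItemHandler()
    xml.sax.parseString(xml_bytes, h)
    return h.items


# Assumed sample data, much shorter than the real feed.
SAMPLE = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Channel title (ignored)</title>
<description>Channel description (ignored)</description>
<image><title>Image title (ignored)</title></image>
<item id="1"><title>First &amp; foremost</title>
<description>Some text.</description></item>
</channel></rss>"""

for item in parse_items(SAMPLE):
    print(item.title)
```

With the real file, the same handler should plug into the existing setup: pass an ItemHandler instance to p.setContentHandler(), call p.parse('topstories.xml'), then read its items attribute. Keeping the list on the handler instance avoids the shared class-level objectList.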
-- http://mail.python.org/mailman/listinfo/python-list