On Mar 24, 2005, at 8:35 AM, [EMAIL PROTECTED] wrote:

David Reed wrote:

There's probably a better mailing list with XML parsing experts. I'm
certainly not an expert, but I have done a little XML parsing. I've
always followed the pattern of using startElement, characters and
endElement to grab all the data. In the startElement method you set an
instance variable to keep track of the current tag you are processing.
You use the characters method to build up the values, and then in the
endElement method you store the data in your data structure. See the
PyXML HOWTO for an example, specifically this section:
http://pyxml.sourceforge.net/topics/howto/node14.html

Yes, sure. Thanks, but that's not what I wanted to know. Perhaps I wasn't clear enough. It's not really so much XML related...

def startElement(self, name, attrs):
    self._queue.append(name) # keep the order of processed tags
    handler = str('_start_'+name)
    if hasattr(self, handler):
        self.__class__.__dict__[handler](self, attrs)

Is there a better syntax for self.__class__.__dict__[handler]?


You should be able to use getattr to get the method and then call it. That's a little cleaner IMO.
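
A minimal sketch of the getattr approach, assuming a handler shaped like the one quoted above. The class name TagDispatchHandler, the _start_page method, the seen list and the sample XML are all made up for illustration; only the dispatch inside startElement is the point.

```python
import xml.sax
import xml.sax.handler


class TagDispatchHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler that dispatches on the tag name via getattr."""

    def __init__(self):
        super().__init__()
        self._queue = []  # order of processed tags, as in the quoted code
        self.seen = []    # hypothetical: records what _start_page received

    def startElement(self, name, attrs):
        self._queue.append(name)
        # getattr with a default returns a bound method, or None if the
        # handler does not define a method for this tag.
        handler = getattr(self, '_start_' + name, None)
        if handler is not None:
            handler(attrs)  # already bound, so self is not passed again

    def _start_page(self, attrs):
        self.seen.append(attrs.get('no'))


handler = TagDispatchHandler()
xml.sax.parseString(b'<doc><page no="1"/><page no="2"/></doc>', handler)
```

Because getattr returns a bound method, it also finds _start_ methods inherited from a base class, which a lookup in self.__class__.__dict__ would miss.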


And where should the "output" go to?
All examples use print statements in the element handlers.


I'm not certain we are clear. Instead of using output statements, you store the data in some instance variable; in your case it appears self.pages is the instance variable containing the data. So your endElement method would set something in self.pages based on the tag indicated, the data built up in the characters method, and any of the attrs from the start tag. If all your data is in the attrs that you get in the startElement method, then there's no need to do anything in the characters or endElement methods. If you want to use the startElement/characters/endElement approach, I can try to find a small example I've written and send it to you off-list.
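
A rough sketch of that startElement/characters/endElement pattern, not the example offered above. It assumes a hypothetical document layout with <page no="..."> elements, and it uses a plain dict for self.pages instead of whatever container the original handler uses; PageHandler, _current_no and _chunks are illustrative names.

```python
import xml.sax
import xml.sax.handler


class PageHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler that collects page text into self.pages."""

    def __init__(self):
        super().__init__()
        self.pages = {}          # assumption: plain dict, page number -> text
        self._current_no = None  # number of the page being processed, if any
        self._chunks = []        # text pieces collected for the current page

    def startElement(self, name, attrs):
        if name == 'page':
            self._current_no = int(attrs['no'])
            self._chunks = []

    def characters(self, content):
        # The parser may deliver the text of one element in several
        # calls, so the pieces are accumulated and joined later.
        if self._current_no is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == 'page':
            self.pages[self._current_no] = ''.join(self._chunks).strip()
            self._current_no = None


handler = PageHandler()
xml.sax.parseString(
    b'<doc><page no="2">second</page><page no="1">first</page></doc>',
    handler,
)
```

After parsing, the data lives on the handler object, so the calling code reads handler.pages instead of relying on print statements inside the handler.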


I wrote those get... methods - but I guess they don't belong in the XML handler, but perhaps in the parser or somewhere else.
It works, but I don't think it's good design.


def getPages(self):
    return self.pages.getSortedArray()

def getPage(self, no):
    return self.pages[no]

parser = xml.sax.make_parser()
parser.setFeature(xml.sax.handler.feature_namespaces, 0)
pxh = MyHandler()
parser.setContentHandler(pxh)
parser.parse(dateiname)
for p in pxh.getPages(): ...

I should ask the last question on the twisted ML, I guess:

Further, if I'd like to use it in a twisted driven asynchronous app,
would I let the parser run in a thread? (Or how can I make
the parser non-blocking?)


I've never looked into Twisted, so I can't answer this.
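
Independent of Twisted, one option that may be worth a look is the incremental interface of the standard library's SAX parser: instead of one blocking parse() call, the data is passed to feed() in pieces as it arrives. The sketch below only shows that interface. TagCounter and the hard-coded chunks are made up for illustration, and how the chunks would be delivered inside an asynchronous application is not covered here.

```python
import xml.sax
import xml.sax.handler


class TagCounter(xml.sax.handler.ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


parser = xml.sax.make_parser()
parser.setFeature(xml.sax.handler.feature_namespaces, 0)
counter = TagCounter()
parser.setContentHandler(counter)

# Assumption: the document arrives in arbitrary pieces, which may split
# a tag in the middle. Each feed() call handles only the data given.
chunks = [b'<doc><pa', b'ge no="1"/><page ', b'no="2"/></doc>']
for chunk in chunks:
    parser.feed(chunk)
parser.close()  # signals the end of the document
```

Each feed() call still runs synchronously, so this keeps individual calls short without making the parser itself asynchronous.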

Dave

_______________________________________________
Pythonmac-SIG maillist  -  Pythonmac-SIG@python.org
http://mail.python.org/mailman/listinfo/pythonmac-sig
