Hey everyone, this may be a stupid question, but I noticed the following behaviour and, as I'm pretty new to using XML and Python, I was wondering if I could get an explanation.
Let's say I write a simple XML parser that just loads the content of each tag into a dict. The XML file doesn't have multiple hierarchies in it; it's flat other than the parent node, so we have:

    <parent>
        <option1>foo</option1>
        <option2>bar</option2>
        . . .
    </parent>

(I'm using xml.parsers.expat.) The start element handler sets a flag that says the parser is inside the parent, and records the name of the tag it is currently processing. The character data handler sets a dictionary value like so:

    dictName[curTag] = data

After I'm done processing the file, I print out the dict, and the first entry is

    <a few bits of whitespace> : <a whole bunch of whitespace>

There are comments in the XML file. Is this what is causing this? There are also blank lines, but I don't see how a blank line would be interpreted as a tag. Comments though, I could see that happening.

Actually, I just did a test on an XML file that had no comments or whitespace and got the same behaviour. If I feed it the following XML file:

    <options>
        <one>hey</one>
        <two>bee</two>
        <three>eff</three>
    </options>

it prints out:

    " : three : eff two : bee one : hey"

wtf.

For reference, here are the handler functions:

    def handleCharacterData(self, data):
        if self.inOptions and self.curTag != "options":
            self.options[self.curTag] = data

    def handleStartElement(self, name, attributes):
        if name == "options":
            self.inOptions = True
        if self.inOptions:
            self.curTag = name

    def handleEndElement(self, name):
        if name == "options":
            self.inOptions = False
        self.curTag = ""

Sorry if the whitespace in the code got mangled (fingers crossed...)
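For anyone who wants to reproduce this, here is a minimal self-contained sketch built around the three handlers above. The class name OptionsParser, the parse() method, and the calls list are additions of mine for illustration, not part of the original program. The sketch also assumes that the test file has one element per line and that the curTag reset in handleEndElement runs for every end tag. The calls list records every (curTag, data) pair the character data handler receives. That shows expat delivering the whitespace between elements as character data, at a point where curTag has already been reset to "".

```python
import xml.parsers.expat

# Assumed layout of the test file: one element per line, indented.
XML = "<options>\n  <one>hey</one>\n  <two>bee</two>\n  <three>eff</three>\n</options>"


class OptionsParser:
    """Hypothetical wrapper class around the handlers from the post."""

    def __init__(self):
        self.inOptions = False
        self.curTag = ""
        self.options = {}
        self.calls = []  # every (curTag, data) pair the data handler sees

    def handleStartElement(self, name, attributes):
        if name == "options":
            self.inOptions = True
        if self.inOptions:
            self.curTag = name

    def handleEndElement(self, name):
        if name == "options":
            self.inOptions = False
        self.curTag = ""  # assumed to run for every end tag

    def handleCharacterData(self, data):
        self.calls.append((self.curTag, data))
        if self.inOptions and self.curTag != "options":
            self.options[self.curTag] = data

    def parse(self, text):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self.handleStartElement
        parser.EndElementHandler = self.handleEndElement
        parser.CharacterDataHandler = self.handleCharacterData
        parser.Parse(text, True)
        return self.options


p = OptionsParser()
result = p.parse(XML)

# Show what the character data handler actually received.
for tag, data in p.calls:
    print(repr(tag), ":", repr(data))

print(result)
```

With this input the dict ends up with an extra entry whose key is the empty string and whose value is only whitespace, which looks like the stray first line in the output above. Skipping data for which data.strip() is empty, or ignoring character data while curTag is "", would probably avoid it, though I have not checked that against the real file.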