On Wed, Nov 2, 2011 at 9:17 PM, Terry Brown <terry_n_br...@yahoo.com> wrote:
> If it helps at all, the ElementTree model is that elements have
> both .text and .tail attributes.
[snip]
> Python's flagship XML library is lxml of course, http://lxml.de/, but it
> basically uses ElementTree for element representation.

I don't want to use specialized parsers for any language. The problem
isn't parsing, it's code generation and (especially) checking.

Edward

P.S. Here is the common tokenizer for the import code. It's hard to
imagine anything simpler::

    def tokenize (self,s):
        # Returns a list of (kind,val,n) tuples.
        # n is the line number on which the token starts.
        result,i,n = [],0,0
        while i < len(s):
            progress = j = i
            if s[i] == '\n':
                i,kind = i+1,'nl'
            elif s[i].isspace():
                i,kind = self.skipWs(s,i),'ws'
            elif self.startsComment(s,i):
                i,kind = self.skipComment(s,i),'comment'
            elif self.startsString(s,i):
                i,kind = self.skipString(s,i),'string'
            elif self.startsId(s,i):
                i,kind = self.skipId(s,i),'id'
            else:
                i,kind = i+1,'other'
            # Every branch must consume at least one character.
            assert progress < i and j == progress
            val = s[j:i]
            result.append((kind,val,n))
            n += val.count('\n')
            # g.trace('%3s %7s %s' % (n,kind,repr(val[:20])))
        return result

But as I write this, I see that there is something simpler. There
should be an isSpace method that replaces the test::

    s[i].isspace()

Put this test before the "raw" test for newline, and have the html
version of skipWs suck all contiguous whitespace (newlines included)
into a single 'ws' token whose val is always a single blank. This does
*not* change the code generators: it only simplifies the checking
logic, which is where the complications are.

For html, there is still the problem of leading whitespace in comments.
The answer to *that* is to replace all the skipX methods with
skipXtoken methods. They will return (i,kind,val), with the obvious
defaults in the base class that can be overridden in subclasses,
especially the html parser class.

This creates a general framework for doing any kind of token munging
required by the verification code. It's good, and good enough.

Edward
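A minimal sketch of the skipXtoken idea described in the P.S., under stated assumptions. The class names BaseScanner and HtmlScanner, the method names skipWsToken and skipIdToken, and the simplified identifier rule are all hypothetical and are not taken from Leo's import code. Comment and string handling are left out to keep it short. It only illustrates how a subclass could change a token's val without touching the main loop.

```python
class BaseScanner:
    """Hypothetical base class; a sketch, not Leo's actual importer."""

    def isSpace(self, s, i):
        # Default: blanks and tabs only, so newlines remain separate 'nl' tokens.
        return s[i] in ' \t'

    def skipWsToken(self, s, i):
        # Default: val is the whitespace exactly as it appears in the source.
        j = i
        while i < len(s) and self.isSpace(s, i):
            i += 1
        return i, 'ws', s[j:i]

    def skipIdToken(self, s, i):
        # Simplified identifier rule, assumed for illustration only.
        j = i
        while i < len(s) and (s[i].isalnum() or s[i] == '_'):
            i += 1
        return i, 'id', s[j:i]

    def tokenize(self, s):
        result, i, n = [], 0, 0
        while i < len(s):
            j = i
            if self.isSpace(s, i):  # tested before the raw newline test
                i, kind, val = self.skipWsToken(s, i)
            elif s[i] == '\n':
                i, kind, val = i + 1, 'nl', '\n'
            elif s[i].isalpha() or s[i] == '_':
                i, kind, val = self.skipIdToken(s, i)
            else:
                i, kind, val = i + 1, 'other', s[i]
            assert j < i  # every branch must make progress
            result.append((kind, val, n))
            # Count newlines in the source text, not in val,
            # because a subclass may have normalized val.
            n += s[j:i].count('\n')
        return result


class HtmlScanner(BaseScanner):
    """Hypothetical html subclass: any run of whitespace becomes one blank."""

    def isSpace(self, s, i):
        return s[i].isspace()  # newlines now count as whitespace

    def skipWsToken(self, s, i):
        i, kind, val = super().skipWsToken(s, i)
        return i, kind, ' '


base_tokens = BaseScanner().tokenize("a \n b")
html_tokens = HtmlScanner().tokenize("a \n b")
```

With this arrangement the checking code can compare 'ws' tokens by val alone in the html case, while line numbers stay correct because they are counted from the source text rather than from the normalized val.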