On 2006-08-21, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi, I am having some difficulty trying to create a regular expression.
>
> Consider:
>
> <tag1 name="john"/> <br/> <tag2 value="adj__tall__"/>
> <tag1 name="joe"/>
> <tag1 name="jack"/>
> <tag2 value="adj__short__"/>
>
> Whenever a tag1 is followed by a tag 2, I want to retrieve the
> values of the tag1:name and tag2:value attributes. So my end
> result here should be
>
> john, tall
> jack, short
>
> Ideas?
It seems to me that an HTML parser might be a better solution. Here's a
slapped-together example. It uses a simple state machine: in the
"get name" state it waits for a tag1; in the "found name" state it
waits for the tag2 that completes the pair, while a newer tag1
replaces the pending one.

try:
    from HTMLParser import HTMLParser      # Python 2
except ImportError:
    from html.parser import HTMLParser     # Python 3

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.state = "get name"
        self.name_attrs = None
        self.result = {}

    def handle_starttag(self, tag, attrs):
        if self.state == "get name":
            if tag == "tag1":
                self.name_attrs = attrs
                self.state = "found name"
        elif self.state == "found name":
            if tag == "tag2":
                name = None
                for attr in self.name_attrs:
                    if attr[0] == "name":
                        name = attr[1]
                adj = None
                for attr in attrs:
                    if attr[0] == "value" and attr[1][:3] == "adj":
                        # Strip the "adj__" prefix and "__" suffix.
                        adj = attr[1][5:-2]
                if name is None or adj is None:
                    print("Markup error: expected attributes missing.")
                else:
                    self.result[name] = adj
                self.state = "get name"
            elif tag == "tag1":
                # A new tag1 overrides the old one
                self.name_attrs = attrs

p = MyHTMLParser()
p.feed("""
<tag1 name="john"/> <br/> <tag2 value="adj__tall__"/>
<tag1 name="joe"/>
<tag1 name="jack"/>
<tag2 value="adj__short__"/>
""")
p.close()
print(repr(p.result))

There's probably a better way to search for attributes in attrs than
"for attr in attrs", but I didn't think of it, and the example I found
on the net used the same idiom. The format of attrs seems strange. Why
isn't it a dictionary?

-- 
Neil Cerutti
Sermon Outline:
I. Delineate your fear
II. Disown your fear
III. Displace your rear
--Church Bulletin Blooper
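On the attrs question: the parser hands attrs to handle_starttag as a list of (name, value) pairs, so dict(attrs) gives keyed lookup without the "for attr in attrs" loops. A minimal sketch of the same pairing idea for Python 3 (where the module is html.parser) follows. It is not the code above. The class name PairParser and the pending_name attribute are hypothetical, and it assumes every value has the "adj__<word>__" form shown in the example. Instead of an explicit state string, it treats "a tag1 name is pending" as the "found name" state.

```python
from html.parser import HTMLParser


class PairParser(HTMLParser):
    """Pair each tag1 name with the adjective from the next tag2.

    Hypothetical sketch: attrs (a list of (name, value) tuples) is turned
    into a dict so attributes can be looked up by key.
    """

    def __init__(self):
        super().__init__()
        self.pending_name = None  # name from the most recent unmatched tag1
        self.result = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # e.g. [("name", "john")] -> {"name": "john"}
        if tag == "tag1":
            # A new tag1 overrides any earlier unmatched one.
            self.pending_name = attrs.get("name")
        elif tag == "tag2" and self.pending_name is not None:
            value = attrs.get("value") or ""
            # Assumes the "adj__<word>__" format from the example markup.
            if value.startswith("adj__") and value.endswith("__"):
                self.result[self.pending_name] = value[5:-2]
            self.pending_name = None
        # Any other tag (such as <br/>) is ignored.


p = PairParser()
p.feed("""
<tag1 name="john"/> <br/> <tag2 value="adj__tall__"/>
<tag1 name="joe"/>
<tag1 name="jack"/>
<tag2 value="adj__short__"/>
""")
p.close()
print(p.result)
```

Self-closing tags such as <tag1 .../> still reach handle_starttag, because the default handle_startendtag calls it. Fed the sample markup, this prints {'john': 'tall', 'jack': 'short'}: joe is dropped because jack's tag1 replaces it before any tag2 arrives.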