(Damn gmane's authorizer: I think I lost four postings because the auth messages went to my work email address (and I thought the authorization was supposed to be one-time only per group anyway??). I deleted them as spam since I hadn't posted from there for days :-( Grrr. At least I could reconstruct this one...)

"bruce" <[EMAIL PROTECTED]> writes: > for guys with python/xpath expertise.. > > i'm playing with xpath.. and i'm trying to solve an issue... > > i have the following kind of situation where i'm trying to get certain data. > > i have a bunch of tr/td... > > i can create an xpath, that gets me all of the tr.. i only want to get the > sibling tr up until i hit a 'tr' that has a 'th' anybody have an idea as to > how this query might be created?.. [...] ((//tr/th)[2]/../following-sibling::tr/td/..)[count(.|((//tr/th)[3]/../preceding-sibling::*))=count((//tr/th)[3]/../preceding-sibling::*)] which makes use of the following idiom for writing an intersection: $set1[count(.|$set2)=count($set2)] and gets the second group in the sequence you describe. IMHO, this illustrates what happens when XPath is pushed too far ;-) I don't see an easier way, but perhaps I missed one. Example code: (Note that the expression used here doesn't get any trailing group of tr elements if there's no terminating tr/th -- that fits your specification, but may not be what you really wanted. To fix that, meditate on the above expression for an hour or two <0.8 wink>.) 
#---------------------------------------------------------
def xpath(path, source):
    import StringIO
    import pprint
    from lxml import etree
    f = StringIO.StringIO(source)
    tree = etree.parse(f)
    r = tree.xpath(path)
    #return "\n".join(etree.tostring(el) for el in r)
    return pprint.pformat([etree.tostring(el) for el in r])

simple = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""

for i in range(3):
    expr = '((//tr/th)[%s]/../following-sibling::tr/td/..)[count(.|((//tr/th)[%s]/../preceding-sibling::*))=count((//tr/th)[%s]/../preceding-sibling::*)]' % (i+1, i+2, i+2)
    print "---------------------"
    print xpath(expr, simple)
#---------------------------------------------------------

john[0]$ tst.py
---------------------
['<tr><td>B</td></tr>\n', '<tr><td>C</td></tr>\n']
---------------------
['<tr><td>E</td></tr>\n', '<tr><td>F</td></tr>\n']
---------------------
[]

Knowing what you're doing, though, you'd probably be better off with BeautifulSoup than XPath.

Also note that mechanize (which I know you're using) only supports BeautifulSoup 2 at present. You can't use BeautifulSoup 3 yet (I hope to fix that 'RSN').

John
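
For comparison with the XPath approach, here is a minimal sketch of the same grouping done with a plain loop over the rows. It is not the code from the example above: it is written for Python 3, uses the standard library's xml.etree.ElementTree instead of lxml or BeautifulSoup, and the helper name group_rows is made up for illustration. It assumes the input is well-formed XML and that every tr is a direct child of the root element; real-world HTML would likely need a proper HTML parser. Unlike the XPath expression, it also returns the trailing group that has no terminating tr/th.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the example above.
SIMPLE = """\
<html>
<tr><th>A</th></tr>
<tr><td>B</td></tr>
<tr><td>C</td></tr>
<tr><th>D</th></tr>
<tr><td>E</td></tr>
<tr><td>F</td></tr>
<tr><th>G</th></tr>
<tr><td>H</td></tr>
<tr><td>I</td></tr>
</html>
"""


def group_rows(source):
    """Return a list of (header_text, [cell_text, ...]) pairs.

    Hypothetical helper: assumes every tr is a direct child of the root
    and that a tr holding a th starts a new group.
    """
    root = ET.fromstring(source)
    groups = []
    for tr in root.findall("tr"):
        th = tr.find("th")
        if th is not None:
            # A header row opens a new, initially empty group.
            groups.append((th.text, []))
        elif groups:
            # A data row joins the most recent group; rows that come
            # before the first header are skipped.
            groups[-1][1].extend(td.text for td in tr.findall("td"))
    return groups


if __name__ == "__main__":
    for header, cells in group_rows(SIMPLE):
        print(header, cells)
```

Run on the sample document, this yields one group per header row, with H and I attached to G. If you only want groups that are closed by a following header, drop the last entry of the returned list.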