On Sat, 2005-03-19 at 00:14 -0800, Sean McIlroy wrote:
> I'm dealing with XML files in which there are lots of tags of the
> following form: <a><b>x</b><c>y</c></a> (all of these letters are being
> used as 'metalinguistic variables') Not all of the tags in the file are
> of that form, but that's the only type of tag I'm interested in. (For
> the insatiably curious, I'm talking about a conversation log from MSN
> Messenger.) What I need to do is to pull out all the x's and y's in a
> form I can use. In other words, from...
>
> .
> .
> <a><b>x1</b><c>y1</c></a>
> .
> .
> <a><b>x2</b><c>y2</c></a>
> .
> .
> <a><b>x3</b><c>y3</c></a>
> .
> .
>
> ...I would like to produce, for example,...
>
> [ (x1,y1), (x2,y2), (x3,y3) ]
>
> Now, I'm aware that there are extensive libraries for dealing with
> marked-up text, but here's the thing: I think I have a reasonable
> understanding of python, but I use it in a lisplike way, and in
> particular I only know the rudiments of how classes work. So here's
> what I'm asking for:
>
> Can anybody give me a rough idea how to come to grips with the problem
> described above? Or even (dare to dream) example code? Any help will be
> very much appreciated.

There are many tools you can use to get this done in Python. Here's a
recipe using Amara ( http://www.xml.com/pub/a/2005/01/19/amara.html ):

    DOC = """\
    <matrix>
    <a><b>x1</b><c>y1</c></a>
    <a><b>x2</b><c>y2</c></a>
    <a><b>x3</b><c>y3</c></a>
    </matrix>
    """

    from amara import binderytools

    matrix = []
    # pushbind yields one bound object per <a> element as it is parsed
    for row in binderytools.pushbind(u'a', string=DOC):
        # row.b and row.c are the <b> and <c> children of this <a>
        matrix.append((unicode(row.b), unicode(row.c)))
    print matrix

Which outputs:

    [(u'x1', u'y1'), (u'x2', u'y2'), (u'x3', u'y3')]

If your matrix actually has a variable or previously unknown number of
columns (e.g. <a><b>x1</b><c>y1</c><d>z1</d></a> ), the following
version of the for loop is a more general solution:

    for row in binderytools.pushbind(u'a', string=DOC):
        # u'*' selects every child element of the row, whatever its name
        matrix.append(tuple([ unicode(e) for e in row.xml_xpath(u'*') ]))

Same output, of course. I even tested it for you in Amara 0.9.4. And
what the heck, while I was there, I added it to the demos.

You can make things even more
obfuscated^H^H^H^H^H^H^H^H^H^Hterse using further lambda or list comp
tricks, but I leave that as an exercise for the perverse ;-)

-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
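
For readers who do not have Amara installed, the same extraction can be
sketched with the standard library alone. This is not part of the recipe
above: it assumes Python 3 and its xml.etree.ElementTree module, and the
helper name extract_rows is made up for illustration. Like the general
version of the loop, it takes every child of each <a> element, so it
handles any number of columns.

```python
import xml.etree.ElementTree as ET

DOC = """\
<matrix>
<a><b>x1</b><c>y1</c></a>
<a><b>x2</b><c>y2</c></a>
<a><b>x3</b><c>y3</c></a>
</matrix>
"""


def extract_rows(xml_text, row_tag="a"):
    """Return one tuple per <row_tag> element, holding each child's text.

    extract_rows is a hypothetical helper, not an Amara or stdlib function.
    """
    root = ET.fromstring(xml_text)
    # root.iter() walks the whole tree, so rows are found at any depth
    # and elements with other tag names are simply skipped.
    return [tuple(child.text for child in row) for row in root.iter(row_tag)]


matrix = extract_rows(DOC)
print(matrix)
```

Unlike pushbind, ET.fromstring builds the whole tree in memory first,
which should be fine for a chat log but is a different trade-off for
very large files. An empty child element contributes None rather than
an empty string.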