On Feb 23, 9:11 pm, monkeys paw <mon...@joemoney.net> wrote:
> if I have a string such as '<td>01/12/2011</td>' and i want
> to reformat it as '20110112', how do i pull out the components
> of the string and reformat them into a YYYYDDMM format?
>
> I have:
>
> import re
>
> test = re.compile('\d\d\/')
> f = open('test.html')  # This file contains the html dates
> for line in f:
>     if test.search(line):
>         # I need to pull the date components here

What you need are parentheses, which capture part of the text you're
matching. Each set of parentheses creates a "group". To get at these
groups, you need the match object that re.search returns. Group 0 is
the entire match, group 1 is the contents of the first set of
parentheses, and so forth. If the regex does not match, re.search
returns None, so test the result before calling .group() on it.
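A minimal, self-contained sketch of the group mechanics just described, run on a single string rather than a file. The function name to_yyyymmdd is hypothetical, and the pattern assumes every date is written as two-digit month, two-digit day, four-digit year (MM/DD/YYYY) inside a <td> cell. Note that the requested '20110112' is year, month, day order (YYYYMMDD).

```python
import re

# Three capturing groups: month, day, year (assumes MM/DD/YYYY).
DATE_RX = re.compile(r'<td>(\d{2})/(\d{2})/(\d{4})</td>')


def to_yyyymmdd(text):
    """Return the first <td>MM/DD/YYYY</td> date in text as 'YYYYMMDD', or None."""
    m = DATE_RX.search(text)
    if m is None:  # re.search returns None when nothing matches
        return None
    month, day, year = m.groups()  # same as m.group(1), m.group(2), m.group(3)
    return year + month + day


m = DATE_RX.search('<td>01/12/2011</td>')
print(m.group(0))                          # entire match: <td>01/12/2011</td>
print(m.group(1), m.group(2), m.group(3))  # 01 12 2011
print(to_yyyymmdd('<td>01/12/2011</td>'))  # 20110112
print(to_yyyymmdd('<td>David</td>'))       # None
```

m.groups() returns all captured groups as a tuple, which is convenient for unpacking when the pattern has a fixed number of groups.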
DATA FILE (test.html):

<table>
<tr><td>David</td><td>02/19/1967</td></tr>
<tr><td>Susan</td><td>05/23/1948</td></tr>
<tr><td>Clare</td><td>09/22/1952</td></tr>
<tr><td>BP</td><td>08/27/1990</td></tr>
<tr><td>Roger</td><td>12/19/1954</td></tr>
</table>

CODE:

import re

# Groups 1, 2, 3 capture month, day and year.
rx_test = re.compile(r'<td>(\d{2})/(\d{2})/(\d{4})</td>')

with open('test.html') as f:
    for line in f:
        m = rx_test.search(line)
        if m:
            # Reorder MM/DD/YYYY into YYYYMMDD.
            new_date = m.group(3) + m.group(1) + m.group(2)
            print("raw text:", m.group(0))
            print("new date:", new_date)
            print()

OUTPUT:

raw text: <td>02/19/1967</td>
new date: 19670219

raw text: <td>05/23/1948</td>
new date: 19480523

raw text: <td>09/22/1952</td>
new date: 19520922

raw text: <td>08/27/1990</td>
new date: 19900827

raw text: <td>12/19/1954</td>
new date: 19541219

--
http://mail.python.org/mailman/listinfo/python-list