On Feb 24, 2:11 am, monkeys paw <mon...@joemoney.net> wrote:
> if I have a string such as '<td>01/12/2011</td>' and i want
> to reformat it as '20110112', how do i pull out the components
> of the string and reformat them into a YYYYDDMM format?
>
> I have:
>
> import re
>
> test = re.compile('\d\d\/')
> f = open('test.html') # This file contains the html dates
> for line in f:
>     if test.search(line):
>         # I need to pull the date components here
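If you want to stay with a regular expression for the moment, one way to fill in that loop is to add capture groups around each field and reassemble them. This is only a sketch: the pattern name DATE_IN_TD and the helper reformat_line are hypothetical, and the pattern assumes the date is the whole content of a single <td> cell on one line, with the month first (MM/DD/YYYY).

```python
import re

# Hypothetical pattern: captures the two-digit month, two-digit day and
# four-digit year of a date that is the entire content of a <td> cell.
# Assumes month-first dates such as '01/12/2011'.
DATE_IN_TD = re.compile(r'<td>(\d{2})/(\d{2})/(\d{4})</td>')


def reformat_line(line):
    """Return the first <td> date in `line` as 'YYYYMMDD', or None if absent."""
    match = DATE_IN_TD.search(line)
    if match is None:
        return None
    month, day, year = match.groups()
    # Reassemble the captured fields in year, month, day order.
    return year + month + day


print(reformat_line('<td>01/12/2011</td>'))  # prints 20110112
```

Inside your loop you would call reformat_line(line) and skip the None results. Note that the regex only checks the shape of the digits; it happily accepts '13/45/2011'.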
I second using an HTML parser to extract the content of the TDs, but I
would also go one step further when reformatting and do something such as:

>>> from time import strptime, strftime
>>> d = '01/12/2011'
>>> strftime('%Y%m%d', strptime(d, '%m/%d/%Y'))
'20110112'

That way you get some validation of the data, i.e. if you get
'13/12/2011', strptime raises a ValueError and you know you've
probably got mixed date formats.

hth

Jon.

--
http://mail.python.org/mailman/listinfo/python-list
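Putting the two suggestions above together (an HTML parser to get at the cell text, then strptime/strftime to validate and reformat it) might look roughly like the sketch below. It assumes Python 3's html.parser module (in Python 2 the same class lived in the HTMLParser module) and uses datetime.strptime, which accepts the same format codes as time.strptime. The names TdTextCollector and td_dates are hypothetical, and the sketch assumes every <td> in the input holds an MM/DD/YYYY date.

```python
from datetime import datetime
from html.parser import HTMLParser


class TdTextCollector(HTMLParser):
    """Hypothetical helper: collect the text content of every <td> element."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TD> is matched too.
        if tag == 'td':
            self.in_td = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self.in_td:
            self.cells[-1] += data


def td_dates(html):
    """Return each <td> date as 'YYYYMMDD'.

    Raises ValueError if a cell is not a valid MM/DD/YYYY date, which is
    the validation step: '13/12/2011' is rejected because there is no month 13.
    """
    parser = TdTextCollector()
    parser.feed(html)
    parser.close()
    return [datetime.strptime(cell.strip(), '%m/%d/%Y').strftime('%Y%m%d')
            for cell in parser.cells]


print(td_dates('<tr><td>01/12/2011</td><td>02/28/2011</td></tr>'))
# prints ['20110112', '20110228']
```

For a real file you would pass in open('test.html').read() instead of a literal string. If the table also has non-date cells, you would need to filter them before calling strptime, since this sketch treats any unparseable cell as an error.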