In article <kaidnysdesvyi_jqnz2dnuvz_qgdn...@insightbb.com>, monkeys paw <mon...@joemoney.net> wrote:
> if I have a string such as '<td>01/12/2011</td>' and i want
> to reformat it as '20110112', how do i pull out the components
> of the string and reformat them into a YYYYDDMM format?
>
> I have:
>
> import re
>
> test = re.compile('\d\d\/')
> f = open('test.html')  # This file contains the html dates
> for line in f:
>     if test.search(line):
>         # I need to pull the date components here

My first thought is that any attempt to parse HTML by using regex is
doomed to failure.  HTML is meant to be parsed by an HTML parser.
Python gives you several to pick from; the best that I know of is the
third-party lxml package (http://lxml.de/).

My second thought is that my first thought was correct.

-- 
http://mail.python.org/mailman/listinfo/python-list
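For comparison, here is a minimal sketch of the parser-based approach. It swaps in the standard library's html.parser module instead of lxml, so it runs without any third-party package. It assumes the <td> cells hold US-style MM/DD/YYYY dates, which turns '01/12/2011' into the '20110112' from the question; if the source dates are really day-first, both format strings need adjusting. The names TdDateCollector and extract_dates are hypothetical, chosen for illustration.

```python
from datetime import datetime
from html.parser import HTMLParser


class TdDateCollector(HTMLParser):
    """Hypothetical helper: collect <td> text that parses as a date."""

    def __init__(self):
        super().__init__()
        self.in_td = False   # True while we are inside a <td> element
        self.dates = []      # reformatted dates, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if not self.in_td:
            return
        text = data.strip()
        try:
            # Assumption: dates are US-style month/day/year.
            parsed = datetime.strptime(text, "%m/%d/%Y")
        except ValueError:
            return  # cell text is not a date; skip it
        self.dates.append(parsed.strftime("%Y%m%d"))


def extract_dates(html_text):
    """Return every MM/DD/YYYY date found in <td> cells as 'YYYYMMDD'."""
    parser = TdDateCollector()
    parser.feed(html_text)
    parser.close()
    return parser.dates
```

With the file from the question, something like `with open('test.html') as f: print(extract_dates(f.read()))` should print the reformatted dates in the order they appear. Letting the parser decide what is inside a <td> means stray dates elsewhere in the page, or cells split across physical lines, don't trip up the match the way a line-by-line regex can.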