In article <kaidnysdesvyi_jqnz2dnuvz_qgdn...@insightbb.com>,
 monkeys paw <mon...@joemoney.net> wrote:

> if I have a string such as '<td>01/12/2011</td>' and i want
> to reformat it as '20110112', how do i pull out the components
> of the string and reformat them into a YYYYDDMM format?
> 
> I have:
> 
> import re
> 
> test = re.compile('\d\d\/')
> f = open('test.html')  # This file contains the html dates
> for line in f:
>      if test.search(line):
>          # I need to pull the date components here

My first thought is that any attempt to parse HTML by using regex is 
doomed to failure.  HTML is meant to be parsed by an HTML parser.  
Python gives you several to pick from; the best that I know of is the 
third-party lxml package (http://lxml.de/).

My second thought is that my first thought was correct.
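
For what it's worth, here is a rough sketch of the parser-based approach. 
It uses the standard library's html.parser rather than lxml, purely so it 
runs without installing anything; lxml would let you write something 
shorter. The names CellDates and reformat_dates are my own inventions, and 
I am assuming Python 3, that the dates always sit alone inside a <td>, and 
that you simply want the year moved to the front with the other two fields 
kept in their original order.

```python
from html.parser import HTMLParser


class CellDates(HTMLParser):
    """Collect reformatted dates found as the text of <td> cells."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.dates = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if not self.in_td:
            return
        parts = data.strip().split("/")
        # Assumption: a date is exactly three all-digit fields, NN/NN/YYYY.
        if len(parts) == 3 and all(p.isdigit() for p in parts):
            first, second, year = parts
            # Year first, then the other two fields in their original order.
            self.dates.append(year + first + second)


def reformat_dates(html):
    """Return the reformatted dates found in <td> cells of the given HTML."""
    parser = CellDates()
    parser.feed(html)
    parser.close()
    return parser.dates


print(reformat_dates("<td>01/12/2011</td>"))  # ['20110112']
```

To run it over your file, pass open('test.html').read() to reformat_dates 
instead of the literal string.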
-- 
http://mail.python.org/mailman/listinfo/python-list
