If I have a string such as '<td>01/12/2011</td>' and I want
to reformat it as '20110112', how do I pull out the components
of the string and reformat them into a YYYYDDMM format?

I have:

import re

test = re.compile(r'\d\d/')
f = open('test.html')  # This file contains the html dates
for line in f:
    if test.search(line):
        # I need to pull the date components here

I am no Python guru, but you could use BeautifulSoup to parse the HTML, as it's much easier than matching tags by hand.

Some untested pseudocode is below. Adapt it to your needs.

from BeautifulSoup import BeautifulSoup

# Read the HTML data from a file or whatever source you have
html_data = open('/yourwebsite/page.html', 'r').read()
# Create the soup object from the HTML data
soup = BeautifulSoup(html_data)
# Find the proper tag (see the BeautifulSoup docs); this attribute filter is just an example
someData = soup.find('td', attrs={'name': 'someTable'})
# The value of the 3rd attribute of the tag, just an example
value = someData.attrs[2][1]

##end
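
If installing BeautifulSoup is not an option, roughly the same extraction can be sketched with the standard library's html.parser module (Python 3). This is a swapped-in alternative, not the BeautifulSoup approach above. The names TdTextCollector and td_texts are made up for illustration, and the sketch assumes the <td> cells are not nested.

```python
from html.parser import HTMLParser


class TdTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <td> element."""

    def __init__(self):
        super().__init__()
        self.in_td = False   # True while we are between <td> and </td>
        self.cells = []      # one string per <td> seen so far

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        # Only keep text that appears inside a <td> cell
        if self.in_td:
            self.cells[-1] += data


def td_texts(html):
    """Return the stripped text of each <td> cell in the given HTML string."""
    parser = TdTextCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]
```

Calling td_texts() on the contents of the HTML file gives a list of cell strings, which can then be passed to whatever date conversion you choose.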

Now that you have the date as a string, the next step is the date conversion. For this,
refer to the dateutil parser: http://labix.org/python-dateutil
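
If you would rather not install dateutil, here is a minimal sketch of the conversion using only the standard library (re and datetime) instead. It assumes the dates always look like two digits, slash, two digits, slash, four digits, and that the first field is the day, as the requested YYYYDDMM output suggests. The names DATE_RE and reformat_dates are made up for illustration.

```python
import re
from datetime import datetime

# Assumed input shape: NN/NN/NNNN somewhere in the line
DATE_RE = re.compile(r'(\d{2}/\d{2}/\d{4})')


def reformat_dates(line, in_fmt='%d/%m/%Y', out_fmt='%Y%d%m'):
    """Return every date found in `line`, rewritten using `out_fmt`.

    strptime also validates the date, so an impossible date such as
    '99/99/2011' raises ValueError instead of being silently rearranged.
    """
    return [
        datetime.strptime(match, in_fmt).strftime(out_fmt)
        for match in DATE_RE.findall(line)
    ]


print(reformat_dates('<td>01/12/2011</td>'))  # ['20110112']
```

If the source dates are actually month-first, swap %d and %m in both format strings. The output digits stay in the same order either way.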

Hope it helps.