On Tue, Apr 23, 2013 at 11:53 AM, Roy Smith <r...@panix.com> wrote:
> In article <mailman.944.1366680414.3114.python-l...@python.org>,
>  Rodrick Brown <rodrick.br...@gmail.com> wrote:
>
>> I would like some feedback on possible solutions to make this script run
>> faster.
>
> If I had to guess, I would think this stuff:
>
>> line = line.replace('mediacdn.xxx.com', 'media.xxx.com')
>> line = line.replace('staticcdn.xxx.co.uk', 'static.xxx.co.uk')
>> line = line.replace('cdn.xxx', 'www.xxx')
>> line = line.replace('cdn.xxx', 'www.xxx')
>> line = line.replace('cdn.xx', 'www.xx')
>> siteurl = line.split()[6].split('/')[2]
>> line = re.sub(r'\bhttps?://%s\b' % siteurl, "", line, 1)
>
> You make 6 copies of every line.  That's slow.
One of those is a regular expression substitution, which is also likely to be a hot-spot. But definitely profile.

ChrisA
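Rather than guessing, the two suspects named in this thread can be timed in isolation. What follows is a minimal sketch, not the original script: it assumes a made-up log line whose seventh whitespace-separated field is a URL (the real log format isn't shown), and the names `SAMPLE`, `chained_replace`, `single_pass_replace` and `strip_site` are hypothetical. `single_pass_replace` is a swapped-in alternative (one compiled alternation regex that rewrites every match in a single scan), not something proposed above. For these five patterns it gives the same result as the chain, because no replacement text contains "cdn". Whether it is actually faster than five C-level `str.replace` calls is exactly what the timing is for.

```python
import re
import timeit

# Hypothetical sample line: field 6 (0-based) is assumed to be a URL,
# which is what the quoted split()[6].split('/')[2] relies on.
SAMPLE = ('10.0.0.1 - - 2013-04-23 GET 200 '
          'http://mediacdn.xxx.com/img/a.png '
          'ref=http://staticcdn.xxx.co.uk/s.css cdn.xxx.com/x')

# The quoted replacement chain, same order, duplicate call included.
REPLACEMENTS = [
    ('mediacdn.xxx.com', 'media.xxx.com'),
    ('staticcdn.xxx.co.uk', 'static.xxx.co.uk'),
    ('cdn.xxx', 'www.xxx'),
    ('cdn.xxx', 'www.xxx'),
    ('cdn.xx', 'www.xx'),
]

def chained_replace(line):
    """Sequential str.replace calls; each one builds a new string."""
    for old, new in REPLACEMENTS:
        line = line.replace(old, new)
    return line

# Alternative: one regex, longest pattern first so 'cdn.xxx' beats
# 'cdn.xx' at the same position; re.escape makes the dots literal.
_MAPPING = dict(REPLACEMENTS)  # the duplicate key simply collapses
_PATTERN = re.compile('|'.join(
    re.escape(k) for k in sorted(_MAPPING, key=len, reverse=True)))

def single_pass_replace(line):
    """Rewrite every match in one left-to-right scan of the line."""
    return _PATTERN.sub(lambda m: _MAPPING[m.group(0)], line)

def strip_site(line):
    """The quoted re.sub step: drop the first http(s)://<site> prefix.

    Kept faithful to the original, so the dots in siteurl are NOT
    escaped and match any character.
    """
    siteurl = line.split()[6].split('/')[2]
    return re.sub(r'\bhttps?://%s\b' % siteurl, '', line, count=1)

if __name__ == '__main__':
    n = 100_000
    for fn in (chained_replace, single_pass_replace):
        t = timeit.timeit(lambda: fn(SAMPLE), number=n)
        print(f'{fn.__name__:20s} {t:.3f}s for {n} lines')
    fixed = chained_replace(SAMPLE)
    t = timeit.timeit(lambda: strip_site(fixed), number=n)
    print(f'{"strip_site":20s} {t:.3f}s for {n} lines')
```

Timings on one synthetic line only hint at where the cost is. Running the real script on real data under `python -m cProfile -s cumulative yourscript.py` shows which calls dominate overall, including I/O that neither suspect accounts for.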