Re: speed of string chunks file parsing

bearophileHUGS Mon, 06 Apr 2009 08:25:45 -0700

Hyunchul Kim:

> Following script do exactly what I want but I want to improve the speed.


This may be a bit faster, especially if sequences are long (code
untested):

import re
from collections import deque

def scanner1(deque=deque):
    # Group lines into chunks; a new chunk starts at each line that matches.
    result_seq = deque()
    cp_regular_expression = re.compile("^a complex regular expression here$")
    for line in open(inputfile):  # inputfile: the path from your script
        if cp_regular_expression.match(line) and result_seq:
            yield result_seq
            result_seq = deque()
        result_seq.append(line)
    yield result_seq

If the sequences are processed on the fly, you don't need to create
new ones: you can clear and reuse the old one, which may be a bit
faster:

def scanner2(deque=deque):
    # The same deque is reused, so use each chunk before asking for the next.
    result_seq = deque()
    cp_regular_expression = re.compile("^a complex regular expression here$")
    for line in open(inputfile):  # inputfile: the path from your script
        if cp_regular_expression.match(line) and result_seq:
            yield result_seq
            result_seq.clear()
        result_seq.append(line)
    yield result_seq
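
A small sketch of why scanner2 only suits on-the-fly processing (also
untested against your data). The names chunks_reusing and is_start and
the sample lines are made up for the illustration, and it takes any
iterable of lines instead of a file:

```python
from collections import deque

def chunks_reusing(lines, is_start):
    # Same idea as scanner2: one deque, cleared and refilled for every chunk.
    buf = deque()
    for line in lines:
        if is_start(line) and buf:
            yield buf
            buf.clear()
        buf.append(line)
    if buf:
        yield buf

def is_start(line):
    # Hypothetical chunk marker, standing in for the real regular expression.
    return line.startswith("#")

lines = ["#a", "1", "2", "#b", "3"]

# Fine: each chunk is used (copied here) before the generator resumes.
safe = [list(chunk) for chunk in chunks_reusing(lines, is_start)]

# Not fine: every stored item is the same deque, holding only the last chunk.
unsafe = list(chunks_reusing(lines, is_start))
```

If the chunks have to be kept around after the loop, scanner1 (a fresh
deque per chunk) is probably the safer choice.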

Note that most of the time may be spent in the regular expression.
It can often be sped up with string methods, even if they are used
only as a faster, approximate first test before the real match.
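
A minimal sketch of that idea, assuming the lines that start a chunk
always begin with a fixed character. The ">" prefix, the HEADER_RE
pattern and the function names are invented for the example, since the
real regular expression was not posted:

```python
import re

# Hypothetical pattern standing in for the "complex regular expression".
HEADER_RE = re.compile(r"^>\w+ \d+$")

def is_header(line):
    # Cheap approximate test first: lines failing it never reach the regex.
    return line.startswith(">") and HEADER_RE.match(line) is not None

def scan(lines):
    # Group lines into chunks, each one starting at a header line.
    chunk = []
    for line in lines:
        if is_header(line) and chunk:
            yield chunk
            chunk = []
        chunk.append(line)
    if chunk:
        yield chunk

sample = [">seq 1\n", "ACGT\n", ">not a header\n", ">seq 2\n", "TTTT\n"]
chunks = list(scan(sample))
```

Whether this helps depends on how many lines pass the cheap test and
on how expensive the real pattern is, so it is worth timing both
versions on real input.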

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
