Hi, my function works as shown below, handling a file line by line. There are multiple error patterns defined, and each pattern needs to be applied to every line. I use multiprocessing.Pool to process the file in blocks.
Memory usage grows to 2 GB for a 1 GB file, and it stays at 2 GB even after the file has been processed (the file is closed at the end). If I comment out the call to re_pat.match, memory usage stays normal, under 100 MB. Am I using re in the wrong way? I cannot figure out how to fix the memory leak, and Googling has not helped.

import itertools
import multiprocessing
import re

def line_match(lines, errors):
    for error in errors:
        try:
            re_pat = re.compile(error['pattern'])
        except Exception:
            # print an error message (code omitted)
            continue
        for line in lines:
            m = re_pat.match(line)
            # other code to handle the matched object

def process_large_file(fo):
    p = multiprocessing.Pool()
    while True:
        lines = list(itertools.islice(fo, line_per_proc))
        if not lines:
            break
        result = p.apply_async(line_match, args=(lines, errors))

Note: I have omitted some code, as I think the significant difference is with/without re_pat.match(...).

Regards,
-Meiling
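P.S. In case it helps, here is a minimal, self-contained sketch of the same approach that runs on its own. The errors list, the line_per_proc value, and the input filename are made-up examples, and I have filled in the pool shutdown and result collection that my snippet above omits.

import itertools
import multiprocessing
import re

# Made-up patterns and block size, for illustration only.
errors = [{'pattern': r'ERROR\s+\d+'}, {'pattern': r'FATAL\s+\w+'}]
line_per_proc = 100000

def line_match(lines, errors):
    # Compile each error pattern, then apply it to every line in the block.
    for error in errors:
        try:
            re_pat = re.compile(error['pattern'])
        except re.error:
            continue  # skip patterns that fail to compile
        for line in lines:
            m = re_pat.match(line)
            # handling of the match object omitted

def process_large_file(fo):
    p = multiprocessing.Pool()
    results = []
    while True:
        # Read the next block of lines and hand it to a worker process.
        lines = list(itertools.islice(fo, line_per_proc))
        if not lines:
            break
        results.append(p.apply_async(line_match, args=(lines, errors)))
    p.close()
    p.join()

if __name__ == '__main__':
    with open('big.log') as fo:   # made-up filename
        process_large_file(fo)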