On 06/10/2010 22:41, jay thompson wrote:
Hello everyone,

I'm trying to extract some data from a large memory-mapped file (the largest is ~30GB) with re.finditer() and the match object's start(). Python's regular expression module is great, but the offset returned by start() is 32 bits (signed, so I can really only address 2GB). I was wondering if anyone here had some suggestions on how to get the long offsets I need. By the way, I can't break up the file because the pattern I'm looking for can occur anywhere and on any boundary.

Also, is seek() limited to 32-bit addresses?

This is what I have in Python 2.7 AMD64:

    import mmap
    import re

    with open(file_path, 'r+b') as file:
        file_map = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
        file_map.seek(0)
        pattern = re.compile("pattern")
        for iii in re.finditer(pattern, file_map):
            offset = iii.start()
            write_to_sqlite(offset)

I would've thought that a 64-bit version of Python would have 64-bit offsets. Is that not the case?
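
If start() really is capped at 32 bits in that build, one possible workaround is to scan the map in overlapping windows and add each window's base offset back on. The following is only a rough sketch and has not been tried against a 30GB file: find_offsets, window and max_match_len are names made up for illustration, and it assumes that no match is ever longer than max_match_len bytes, which may not hold for your pattern.

```python
import re


def find_offsets(buf, pattern, window=1 << 30, max_match_len=4096):
    """Yield absolute start offsets of matches of `pattern` in `buf`.

    Hypothetical helper: `buf` is anything that supports len() and slicing
    to bytes (an mmap object, for example), and `pattern` is a bytes pattern.
    Assumes no single match is longer than `max_match_len` bytes.
    """
    regex = re.compile(pattern)
    total = len(buf)
    base = 0
    while base < total:
        # Read past the end of the window by max_match_len - 1 bytes so that
        # a match starting on the window's last byte is still seen in full.
        chunk = buf[base:base + window + max_match_len - 1]
        for match in regex.finditer(chunk):
            # Only report matches that start inside this window; a match that
            # starts in the overlap is reported when the next window is scanned.
            if match.start() < window:
                # Offsets within a chunk stay small; the addition is done
                # with Python integers, which are not limited to 32 bits.
                yield base + match.start()
        base += window
```

With the map from the original code this might be used as `for offset in find_offsets(file_map, b"pattern"): write_to_sqlite(offset)`. Each window is copied into memory when it is sliced, so the window size trades memory for the number of passes. Patterns that use anchors or lookbehind, or whose matches can overlap across a window edge, may give results that differ from a single finditer() pass over the whole file.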
