As nice as it would be to use 64-bit offsets, I am instead mmapping the file
in 1GB chunks and getting the results I need. I would still be interested in
a 64-bit solution, though.
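
For anyone finding this in the archive, here is a rough sketch of the
chunked approach, written for Python 3. It is not my exact code: the
function names and the `overlap` parameter are made up for illustration,
and it assumes no single match is longer than `overlap` bytes. Each window
is mapped slightly past its chunk so a match straddling a chunk boundary is
still seen, and the offsets are plain Python integers, so they are not
capped at 2GB.

```python
import mmap
import os
import re


def _scan_window(regex, window, start_pos, limit):
    """Return (start, end) spans of matches that begin before `limit`."""
    spans = []
    for match in regex.finditer(window, start_pos):
        if limit is not None and match.start() >= limit:
            break  # this match belongs to the next chunk
        spans.append((match.start(), match.end()))
    return spans


def iter_match_offsets(path, pattern, chunk_size=1 << 30, overlap=1 << 20):
    """Yield absolute start offsets of `pattern` (bytes) in the file at `path`.

    Assumption: no match is longer than `overlap` bytes.
    """
    # mmap offsets must be multiples of the allocation granularity.
    if chunk_size % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("chunk_size must be a multiple of "
                         "mmap.ALLOCATIONGRANULARITY")
    regex = re.compile(pattern)
    file_size = os.path.getsize(path)
    resume_at = 0  # absolute offset where the previous match ended
    with open(path, "rb") as f:
        for base in range(0, file_size, chunk_size):
            # Map the chunk plus some overlap, without running past EOF.
            length = min(chunk_size + overlap, file_size - base)
            window = mmap.mmap(f.fileno(), length,
                               access=mmap.ACCESS_READ, offset=base)
            try:
                is_last = base + chunk_size >= file_size
                limit = None if is_last else chunk_size
                # Start after the previous match so matches never overlap.
                start_pos = max(0, resume_at - base)
                spans = _scan_window(regex, window, start_pos, limit)
            finally:
                window.close()
            for start, end in spans:
                resume_at = base + end
                yield base + start
```

Usage would look something like
`for offset in iter_match_offsets(file_path, b"pattern"): write_to_sqlite(offset)`.
Collecting the spans before closing the window keeps the regex iterator from
holding the mapping open, at the cost of a temporary list per chunk.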


On Wed, Oct 6, 2010 at 2:41 PM, jay thompson <>wrote:

> Hello everyone,
> I'm trying to extract some data from a large memory-mapped file (the
> largest is ~30GB) with re.finditer() and the match object's start().
> Python's regular expression module is great, but the offset returned by
> start() is 32 bits (signed, so I can really only address 2GB). I was
> wondering if anyone here had some suggestions on how to get the long
> offsets I need. By the way, I can't break up the file because the pattern
> I'm looking for can occur anywhere and on any boundary.
> Also, is seek() limited to 32-bit addresses?
> This is what I have in Python 2.7 AMD64:
> import mmap
> import re
>
> # Read-only access is enough for an ACCESS_READ mapping.
> with open(file_path, 'rb') as f:
>     file_map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
>     pattern = re.compile("pattern")
>     for match in pattern.finditer(file_map):
>         offset = match.start()
>         write_to_sqlite(offset)
> --
> "It's quite difficult to remind people that all this stuff was here for a
> million years before people. So the idea that we are required to manage it
> is ridiculous. What we are having to manage is us."   ...Bill Ballantine,
> marine biologist.
