Greg Lindstrom wrote:

I have a file with varying length records. All but the first record, that is; it's always 107 bytes long. What I would like to do is strip out all linefeeds from the file, read the character in position 107 (the end of segment delimiter) and then replace all of the end of segment characters with linefeeds, making a file where each segment is on its own line.

Hmmmm... here's one way of doing it:

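import mmap
import sys

# Zero-based index of the delimiter; use 106 if "position 107" counts from 1.
DELIMITER_OFFSET = 107

# Open for reading and writing in binary mode so the file can be mapped.
data_file = open(sys.argv[1], "r+b")
data_file.seek(0, 2)  # jump to the end of the file to measure its length
data_length = data_file.tell()
data = mmap.mmap(data_file.fileno(), data_length, access=mmap.ACCESS_WRITE)
delimiter = data[DELIMITER_OFFSET]  # an int (byte value) in Python 3

# Overwrite every delimiter byte with a linefeed, in place.
for index in range(data_length):
    if data[index] == delimiter:
        data[index] = ord("\n")

data.flush()
data.close()
data_file.close()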
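
Note that the loop above only swaps delimiters for linefeeds; it leaves any linefeeds already in the file where they are. Stripping them changes the file's length, which does not fit a fixed-size in-place mapping well. A minimal sketch of the full transformation, working on an in-memory copy and writing to a separate output file, might look like this. It is Python 3, the names resegment and resegment_file are made up for illustration, and it assumes the file fits in memory and that the delimiter sits at zero-based offset 107 once the linefeeds are gone:

```python
def resegment(raw: bytes, delimiter_offset: int = 107) -> bytes:
    """Strip linefeeds, then turn every end-of-segment delimiter into a linefeed."""
    # Step 1: remove every existing linefeed.
    stripped = raw.replace(b"\n", b"")
    if len(stripped) <= delimiter_offset:
        raise ValueError("input is shorter than the delimiter offset")
    # Step 2: read the delimiter (sliced so we get bytes, not an int).
    delimiter = stripped[delimiter_offset:delimiter_offset + 1]
    # Step 3: put each segment on its own line.
    return stripped.replace(delimiter, b"\n")


def resegment_file(src_path: str, dst_path: str, delimiter_offset: int = 107) -> None:
    """Read src_path, resegment it, and write the result to dst_path."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(resegment(raw, delimiter_offset))
```

Calling resegment_file("input.dat", "output.dat") leaves the original file untouched, so there is a copy to fall back on if the offset turns out to be wrong.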

There are doubtless more efficient ways, like using mmap.mmap.find()
instead of iterating over every character, but that's an exercise for
the reader. And personally I would make extra copies of the file
ANYWAY--modifying the only copy in place is asking for trouble.
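
For what it's worth, the find() idea could be sketched roughly as follows. This is Python 3 and only one possible reading of the suggestion: the function name replace_delimiters_in_place and the ".bak" backup suffix are invented for illustration, and like the loop above it does not strip existing linefeeds.

```python
import mmap
import shutil


def replace_delimiters_in_place(path, delimiter_offset=107, backup_suffix=".bak"):
    """Replace every delimiter byte in the file with a linefeed; return the count."""
    # Keep an untouched copy first, since the file is modified in place.
    shutil.copyfile(path, path + backup_suffix)

    count = 0
    with open(path, "r+b") as data_file:
        # Length 0 maps the whole file (this raises ValueError for an empty file).
        with mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
            if len(data) <= delimiter_offset:
                raise ValueError("file is shorter than the delimiter offset")
            delimiter = data[delimiter_offset:delimiter_offset + 1]
            # Jump from one delimiter to the next instead of testing every byte.
            index = data.find(delimiter)
            while index != -1:
                data[index:index + 1] = b"\n"
                count += 1
                index = data.find(delimiter, index + 1)
            data.flush()
    return count
```

The search still scans the whole file, but it does so inside mmap rather than in a Python-level loop, which is where the saving would come from.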
--
Michael Hoffman
--
http://mail.python.org/mailman/listinfo/python-list
