<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> I want to scan a file byte for byte for occurrences of the four byte
> pattern 0x00000100. I've tried with this:
>
> # start
> import sys
>
> numChars = 0
> startCode = 0
> count = 0
>
> inputFile = sys.stdin
>
> while True:
>     ch = inputFile.read(1)
>     numChars += 1
>
>     if len(ch) < 1: break
>
>     startCode = ((startCode << 8) & 0xffffffff) | (ord(ch))
>     if numChars < 4: continue
>
>     if startCode == 0x00000100:
>         count = count + 1
>
> print count
> # end
>
> But it is very slow. What is the fastest way to do this? Using some
> native call? Using a buffer? Using whatever?
>
> /David

How about something like:

#!/usr/bin/env python
import sys

fn = 't.dat'
ss = '\x00\x00\x01\x00'
be = len(ss) - 1  # length of overlap to check
blocksize = 4 * 1024  # need to ensure that blocksize > overlap

fp = open(fn, 'rb')
b = fp.read(blocksize)
found = 0
while len(b) > be:
    if b.find(ss) != -1:
        found = 1
        break
    b = b[-be:] + fp.read(blocksize)
fp.close()
sys.exit(found)
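
The snippet above only reports whether the pattern is present, while the original question keeps a count. Here is a rough sketch of the same block-plus-overlap idea extended to counting. It assumes Python 3 (byte strings are `bytes`); the function name `count_pattern` and its parameters are made up for illustration, not taken from the thread. Overlapping matches are counted, which is my reading of what the byte-by-byte loop in the question does.

```python
import io


def count_pattern(stream, pattern=b"\x00\x00\x01\x00", blocksize=64 * 1024):
    """Count occurrences of pattern in a binary stream, overlapping ones included.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # bytes carried over between blocks
    if blocksize <= overlap:
        raise ValueError("blocksize must be larger than len(pattern) - 1")

    count = 0
    tail = b""
    while True:
        block = stream.read(blocksize)
        if not block:  # end of input
            break
        buf = tail + block
        pos = buf.find(pattern)
        while pos != -1:
            count += 1
            # Step one byte forward so overlapping matches are found too.
            pos = buf.find(pattern, pos + 1)
        # The tail is shorter than the pattern, so it can never hold a
        # complete match and nothing is counted twice.
        tail = buf[-overlap:] if overlap else b""
    return count


# Tiny in-memory demo; the two matches share one byte and straddle a block edge.
demo = io.BytesIO(b"\x00\x00\x01\x00\x00\x01\x00")
print(count_pattern(demo, blocksize=4))  # prints 2
```

To read from standard input as in the question, something like `count_pattern(sys.stdin.buffer)` should work, since `sys.stdin.buffer` is the binary view of stdin in Python 3. Most of the speedup comes from letting `bytes.find` do the scanning in C instead of looping over single bytes in Python.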