m_ahlenius wrote:

> I am using Python 2.6.5.
>
> Unfortunately I don't have other versions installed so it's hard to
> test with a different version.
>
> As for the log compression, it's a bit hard to test. Right now I may
> process 100+ of these logs per night, and will get maybe 5 which are
> reported as corrupt (typically a bad CRC) and 2 which are reported as a
> bad tar archive. This morning I checked each of the 7 reported
> problem files by manually opening them with "tar -xzvof" and they were
> all indeed corrupt. Sigh.

So many corrupted files? I'd say you have to address the problem with
your infrastructure first.

> Unfortunately due to the nature of our business, I can't post the data
> files online, I hope you can understand. But I really appreciate your
> suggestions.
>
> The thing that gets me is that it seems to work just fine for most
> files, but then not others. Labeling normal files as corrupt hurts us
> as we then skip getting any log data from those files.
>
> appreciate all your help.

I've written an autocorruption script,

    # Python 2 (matches the 2.6.5 used in this thread).
    import sys
    import subprocess
    import tarfile

    def process(source, dest, data):
        # Flip every single bit of the archive in turn.
        for pos in range(len(data)):
            for bit in range(8):
                new_data = data[:pos] + chr(ord(data[pos]) ^ (1<<bit)) + data[pos+1:]
                assert len(data) == len(new_data)
                # Archives are binary data, so write in binary mode.
                out = open(dest, "wb")
                out.write(new_data)
                out.close()
                try:
                    t = tarfile.open(dest)
                    for f in t:
                        t.extractfile(f)
                except Exception:
                    # tarfile rejected the file; return the first flip
                    # that the tar command still accepts.
                    if 0 == subprocess.call(["tar", "-xf", dest]):
                        return pos, bit

    if __name__ == "__main__":
        source, dest = sys.argv[1:]
        data = open(source, "rb").read()
        print process(source, dest, data)

and I can indeed construct an archive that is rejected by tarfile, but
not by tar. My working hypothesis is that the Python library is a bit
stricter in what it accepts...

Peter
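
P.S. For anyone who wants to try the bit-flip idea without touching real log files, here is a minimal in-memory sketch. It is written for Python 3 rather than the 2.6.5 discussed in this thread, it only exercises the tarfile side (it does not call the tar command), and the helper names make_archive, flip_bit and tarfile_accepts, as well as the sample member "a.txt", are made up for illustration. It relies on the fact that the plain tar format checksums each header block but not the file data, so a flipped bit in a header should be rejected while a flipped bit in the payload of an uncompressed archive goes unnoticed.

```python
import io
import tarfile


def make_archive():
    """Build a tiny uncompressed tar archive in memory (made-up sample data)."""
    buf = io.BytesIO()
    t = tarfile.open(fileobj=buf, mode="w")
    payload = b"hello"
    info = tarfile.TarInfo(name="a.txt")
    info.size = len(payload)
    t.addfile(info, io.BytesIO(payload))
    t.close()
    return buf.getvalue()


def flip_bit(data, pos, bit):
    """Return a copy of data with a single bit toggled at byte offset pos."""
    corrupted = bytearray(data)
    corrupted[pos] ^= 1 << bit
    return bytes(corrupted)


def tarfile_accepts(data):
    """True if tarfile can open the archive and read every regular member."""
    try:
        # "r:" means: plain tar only, no transparent decompression.
        t = tarfile.open(fileobj=io.BytesIO(data), mode="r:")
        try:
            for member in t:
                f = t.extractfile(member)
                if f is not None:
                    f.read()
        finally:
            t.close()
    except Exception:
        return False
    return True


if __name__ == "__main__":
    data = make_archive()
    print("original accepted:       ", tarfile_accepts(data))
    # Offset 0 is inside the first header block (the member name field).
    print("header bit flip accepted:", tarfile_accepts(flip_bit(data, 0, 0)))
    # Offset 512 is the first byte of the member's data.
    print("data bit flip accepted:  ", tarfile_accepts(flip_bit(data, 512, 0)))
```

With a gzip-compressed archive the picture changes, because the gzip layer carries its own CRC over the compressed stream; that would be where the "bad CRC" reports come from, as opposed to the "bad tar archive" ones.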