On 11Nov2020 07:25, Chris Angelico <ros...@gmail.com> wrote:
>If the main job of the program, as in this situation, is to read the
>entire file, I would probably have it read in the first 1KB or 16KB or
>thereabouts, see if that has any NUL bytes, and if not, proceed to
>read in the rest of the file. But depending on the situation, I might
>actually have a hard limit on the file size (say, "any file over 1GB
>isn't what I'm looking for"), so that would reduce the risks too.

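
A quick check along those lines might look like the sketch below. The function name looks_binary, the 16KB sniff size and the 1GB cap are illustrative assumptions taken from the figures quoted above, not code from the thread.

```python
import os


def looks_binary(filename, sniff_size=16 * 1024, max_size=1024 ** 3):
    """Return True if the file should be skipped as (probably) not text.

    Hypothetical helper: the name and the default sizes are assumptions.
    """
    # Hard limit: any file over max_size isn't what we're looking for.
    if os.path.getsize(filename) > max_size:
        return True
    # Read only the first chunk, in binary mode, and look for NUL bytes.
    with open(filename, 'rb') as f:
        head = f.read(sniff_size)
    return b'\0' in head
```

This only rules files out cheaply; a file which passes still needs whatever stricter checks the program wants before it is modified.
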
You could shoehorn that into my suggested code efficiently. It had a loop
body like this:

    is_text = False
    try:
        # expect utf-8, fail if non-utf-8 bytes encountered
        with open(filename, encoding='utf-8', errors='strict') as f:
            for lineno, line in enumerate(f, 1):
                ... other checks on each line of the file ...
                if not line.endswith('\n'):
                    raise ValueError("line %d: no trailing newline" % lineno)
                if not str.isprintable(line[:-1]):
                    raise ValueError("line %d: not all printable" % lineno)
        # if we get here all checks passed, consider the file to
        # be text
        is_text = True
    except Exception as e:
        print(filename, "not text", e)
    if not is_text:
        print("skip", filename)
        continue

which scans the entire file to see if it is all text (criteria to be
changed to suit the user, but I was going for a clean strict utf-8
decode, all chars "printable").

Since we're doing that anyway, we could accumulate the lines as we go and
make the replacement in memory. If we get all the way out the bottom,
rewrite the file.

If memory is a concern, we could copy the modified lines to a temporary
file, and copy it back if everything was good (or not, if we made no
replacements).

Cheers,
Cameron Simpson <c...@cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list
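
For reference, the temporary-file variant described above could be sketched roughly as follows. The function name replace_in_text_file and its old/new parameters are hypothetical, the text checks are the same strict utf-8 and "printable" criteria as in the loop above, and newline handling is left at the text-mode defaults.

```python
import os
import shutil
import tempfile


def replace_in_text_file(filename, old, new):
    """Replace old with new in filename, but only if the file is clean text.

    Hypothetical helper. Returns True if the file was rewritten, False if
    no replacement was needed. Raises ValueError (UnicodeDecodeError is a
    subclass) if the file fails the text checks; the file is then untouched.
    """
    changed = False
    # Put the temporary file beside the original so os.replace() stays
    # on a single filesystem.
    dirpath = os.path.dirname(os.path.abspath(filename))
    fd, tmppath = tempfile.mkstemp(dir=dirpath)
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as tmpf, \
             open(filename, encoding='utf-8', errors='strict') as f:
            for lineno, line in enumerate(f, 1):
                if not line.endswith('\n'):
                    raise ValueError("line %d: no trailing newline" % lineno)
                if not line[:-1].isprintable():
                    raise ValueError("line %d: not all printable" % lineno)
                updated = line.replace(old, new)
                if updated != line:
                    changed = True
                tmpf.write(updated)
        # Every line passed the checks; copy back only if something changed.
        if changed:
            shutil.copymode(filename, tmppath)
            os.replace(tmppath, filename)
    finally:
        # Clean up the temporary file if it was not moved into place.
        if os.path.exists(tmppath):
            os.remove(tmppath)
    return changed
```

Only one line is held in memory at a time, and the original file is left alone unless every line passed the checks and at least one replacement was made.
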