I have a function which guesses the likely encoding used by text files by reading the BOM (byte order mark) at the beginning of the file. A simplified version:

    def guess_encoding_from_bom(filename, default):
        with open(filename, 'rb') as f:
            sig = f.read(4)
        # Test UTF-32 first: the UTF-32-LE BOM (FF FE 00 00) starts with
        # the UTF-16-LE BOM (FF FE), so the reverse order would report
        # such files as UTF-16.
        if sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
            return 'utf_32'
        elif sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
            return 'utf_16'
        else:
            return default

The idea is that you can call the function with a file name and a default encoding to return if one can't be guessed. I want to provide a default value for the default argument (a default default), but one which will unconditionally fail if you blindly go ahead and use it.

E.g. I want to either provide a default:

    enc = guess_encoding_from_bom("filename", 'latin1')
    f = open("filename", encoding=enc)

or I want to write:

    enc = guess_encoding_from_bom("filename")
    if enc == something:
        # Can't guess, fall back on an alternative strategy
        ...
    else:
        f = open("filename", encoding=enc)

If I forget to check the returned result, I should get an explicit failure as soon as I try to use it, rather than silently getting the wrong results.

What should I return as the default default? I have four possibilities:

(1) 'undefined', which is a standard encoding guaranteed to raise an exception when used;

(2) 'unknown', which best describes the result, and currently there is no encoding with that name;

(3) None, which is not the name of an encoding; or

(4) Don't return anything, but raise an exception. (But which exception?)

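For comparison, option (4) might look roughly like the sketch below. The sentinel `_MISSING` and the exception class `EncodingGuessError` are hypothetical names made up for illustration, and deriving from LookupError is only one plausible answer to "which exception?". The sketch also tests for UTF-32 before UTF-16, because the UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.

```python
# Hypothetical sentinel meaning "the caller supplied no default".
_MISSING = object()


class EncodingGuessError(LookupError):
    """Hypothetical exception: no BOM recognised and no default given."""


def guess_encoding_from_bom(filename, default=_MISSING):
    with open(filename, 'rb') as f:
        sig = f.read(4)
    # UTF-32 first: b'\xFF\xFE\x00\x00' also starts with b'\xFF\xFE'.
    if sig.startswith((b'\x00\x00\xFE\xFF', b'\xFF\xFE\x00\x00')):
        return 'utf_32'
    if sig.startswith((b'\xFE\xFF', b'\xFF\xFE')):
        return 'utf_16'
    if default is _MISSING:
        # No guess and no fallback: fail here rather than at first use.
        raise EncodingGuessError(
            "cannot guess encoding of %r from its BOM" % (filename,))
    return default
```

With this shape, the failure happens inside the guessing function instead of at the later open() or encode() call, and a caller who wants a fallback strategy wraps the call in try/except rather than comparing against a magic value.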
Apart from option (4), here are the exceptions you get from blindly using options (1) through (3):

    py> 'abc'.encode('undefined')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.3/encodings/undefined.py", line 19, in encode
        raise UnicodeError("undefined encoding")
    UnicodeError: undefined encoding

    py> 'abc'.encode('unknown')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    LookupError: unknown encoding: unknown

    py> 'abc'.encode(None)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: encode() argument 1 must be str, not None

At the moment, I'm leaning towards option (1). Thoughts?

-- 
Steven