On Saturday, October 26, 2019 at 17:49:57 UTC+2, Peter Otten wrote:
> Pascal wrote:
>
> > I have a small python (3.7.4) script that should open a log file and
> > display its content but as you can see, an encoding error occurs :
> >
> > -----------------------
> >
> > import fileinput
> > import sys
> > try:
> >     source = sys.argv[1:]
> > except IndexError:
> >     source = None
> > for line in fileinput.input(source):
> >     print(line.strip())
> >
> > -----------------------
> >
> > python3.7.4 myscript.py myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> >
> > python3.7.4 myscript.py < myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> >
> > -----------------------
> >
> > I add the encoding hook to overcome the error but this time, the script
> > reacts differently depending on the input used :
> >
> > -----------------------
> >
> > import fileinput
> > import sys
> > try:
> >     source = sys.argv[1:]
> > except IndexError:
> >     source = None
> > for line in fileinput.input(source,
> >         openhook=fileinput.hook_encoded("utf-8", "ignore")):
> >     print(line.strip())
> >
> > -----------------------
> >
> > python3.7.4 myscript.py myfile.log
> > first line of myfile.log
> > ...
> > last line of myfile.log
> >
> > python3.7.4 myscript.py < myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> >
> > python3.7.4 myscript.py /dev/stdin < myfile.log
> > first line of myfile.log
> > ...
> > last line of myfile.log
> >
> > python3.7.4 myscript.py - < myfile.log
> > Traceback (most recent call last):
> > ...
> > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 799:
> > invalid continuation byte
> >
> > -----------------------
> >
> > does anyone have an explanation and/or solution ?
>
> '-' or no argument tell fileinput to use sys.stdin. This is already text
> decoded using Python's default io-encoding, and the open hook is not called.
> You can override the default encoding by setting the environment variable
>
> PYTHONIOENCODING=UTF8:ignore

Yes, I just found this about it: https://bugs.python.org/issue26756

This modified script works in all cases:

import io
import fileinput
import sys
try:
    source = sys.argv[1:]
except IndexError:
    source = None
# The open hook is not called for stdin, so re-wrap stdin itself
# with a lenient error handler.
sys.stdin = io.TextIOWrapper(sys.stdin.buffer, errors='ignore')
for line in fileinput.input(source,
        openhook=fileinput.hook_encoded('utf-8', 'ignore')):
    print(line.strip())

Thanks for the tip!

-- 
https://mail.python.org/mailman/listinfo/python-list
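
For anyone finding this thread later, here is a minimal sketch of why re-wrapping stdin helps. It is only an illustration under assumptions: the helper name decode_lenient and the sample bytes are made up, io.BytesIO stands in for sys.stdin.buffer, and the encoding is set to UTF-8 explicitly rather than left to the locale default. It shows a stray 0xe8 byte being dropped by errors='ignore' where strict decoding raises UnicodeDecodeError:

```python
import io


def decode_lenient(raw, encoding="utf-8"):
    """Wrap a binary stream as text, dropping bytes that cannot be decoded.

    Hypothetical helper for illustration; the script above applies the
    same kind of wrapping to sys.stdin.buffer.
    """
    return io.TextIOWrapper(raw, encoding=encoding, errors="ignore")


# Made-up sample data: a lone 0xe8 byte followed by a space is not valid UTF-8.
sample = b"first line\ncaf\xe8 au lait\nlast line\n"

# Strict decoding fails on the 0xe8 byte, as in the traceback.
try:
    sample.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Lenient decoding reads every line; the bad byte is simply skipped.
lines = [line.strip() for line in decode_lenient(io.BytesIO(sample))]
print(strict_failed)
print(lines)
```

Keep in mind that 'ignore' discards the offending bytes silently. Using errors='replace' would insert U+FFFD instead, which at least makes the loss visible. If the log is actually Latin-1 (0xe8 is 'è' there), decoding with that encoding would keep the characters, but that is only a guess about the file.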