New submission from Thamme Gowda <tgow...@gmail.com>:
I ran into a line count mismatch bug and narrowed it down to 9 lines where the line break handling is causing an issue. Please find the attachment named line_break_err.txt to reproduce the steps below.

$ md5sum line_break_err.txt
5dea501b8e299a0ece94d85977728545 line_break_err.txt

# wc says there are 9 lines
$ wc -l line_break_err.txt
9 line_break_err.txt

# if I read from sys.stdin, I get 9 lines
$ python -c 'import sys; print(sum(1 for x in sys.stdin))' < line_break_err.txt

# but... if I use an open() call, I get 18
$ python -c 'import sys; print("Linecount=", sum(1 for x in open(sys.argv[1])))' line_break_err.txt
Linecount= 18

# changing the encoding or the error handling has no effect
$ python -c 'import sys; print("Linecount=", sum(1 for x in open(sys.argv[1], "r", encoding="utf-8", errors="replace")))' line_break_err.txt
Linecount= 18
$ python -c 'import sys; print("Linecount=", sum(1 for x in open(sys.argv[1], "r", encoding="utf-8", errors="ignore")))' line_break_err.txt
Linecount= 18

# but, not just wc, even awk says there are only 9 lines
$ awk 'END {print "Linecount=", NR}' line_break_err.txt
Linecount= 9

# let's see python 2 using io
$ python2 -c 'import sys,io; print("Linecount=", sum(1 for x in io.open(sys.argv[1], encoding="ascii", errors="ignore")))' line_break_err.txt
('Linecount=', 18)

# but this one, which we no longer use, somehow gets it right
$ python2 -c 'import sys; print("Linecount=", sum(1 for x in open(sys.argv[1])))' line_break_err.txt
('Linecount=', 9)

Tested on:

1. Linux
   Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
   [GCC 7.3.0] :: Anaconda, Inc. on linux
2. OSX
   Python 3.7.3 (default, Mar 27 2019, 16:54:48)
   [Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
3. Python 2 on OSX
   Python 2.7.16 (default, Jun 19 2019, 07:40:37)
   [GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)] on darwin

----
P.S. This is the first issue I have created. If it is a duplicate, I am happy to close it.
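One possible explanation, not confirmed anywhere in this report, is that the file contains lone "\r" characters. The default text mode of open() (newline=None) treats "\r", "\n" and "\r\n" all as line endings, while wc, awk and binary-mode iteration split on "\n" only. The sketch below checks that idea against a synthetic file; the file contents and the helper count_lines are made up for illustration and are not taken from line_break_err.txt.

```python
import os
import tempfile


def count_lines(path, **open_kwargs):
    """Count lines by iterating over the file object, as in the report."""
    with open(path, **open_kwargs) as f:
        return sum(1 for _ in f)


# Hypothetical stand-in for line_break_err.txt: 9 lines ending in "\n",
# each containing one lone "\r" in the middle.
data = b"".join(b"left\rright\n" for _ in range(9))

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Default text mode: universal newlines, so a lone "\r" also ends a line.
    universal = count_lines(path, mode="r", encoding="utf-8")
    # newline="\n": only "\n" terminates a line.
    lf_only = count_lines(path, mode="r", encoding="utf-8", newline="\n")
    # Binary mode: lines are split on b"\n" only.
    binary = count_lines(path, mode="rb")
finally:
    os.remove(path)

# Number of "\r" bytes that are not part of a "\r\n" pair.
lone_cr = data.replace(b"\r\n", b"").count(b"\r")

print("universal newlines:", universal)
print('newline="\\n":', lf_only)
print("binary mode:", binary)
print("lone CR bytes:", lone_cr)
```

If the real attachment behaves the same way, passing newline="\n" to open() should bring the count back to 9, and counting lone "\r" bytes in the file should account for the 9 extra lines.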
----------
components: IO, Library (Lib), Unicode
messages: 356224
nosy: Thamme Gowda, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: Line count mis match between open() vs sys.stdin api calls
type: behavior

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue38740>
_______________________________________