Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19665#discussion_r149030699

    --- Diff: dev/run-tests.py ---
    @@ -289,7 +289,7 @@ def exec_sbt(sbt_args=()):
                                 stdin=echo_proc.stdout,
                                 stdout=subprocess.PIPE)
         echo_proc.wait()
    -    for line in iter(sbt_proc.stdout.readline, ''):
    +    for line in iter(sbt_proc.stdout.readline, b''):
    --- End diff --

The previous code causes an infinite loop in Python 3 because the sentinel `''` is a `str`, whereas `sbt_proc.stdout.readline()` returns `b''` (`bytes`) at the end of the stream, so the two never compare equal.

The return type can be checked as below:

```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
print(type(sbt_proc.stdout.readline()))
```

In Python 2:

```
>>> import subprocess
>>> sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
>>> print(type(sbt_proc.stdout.readline()))
<type 'str'>
```

In Python 3:

```
>>> import subprocess
>>> sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
>>> print(type(sbt_proc.stdout.readline()))
<class 'bytes'>
```

However, the two literals compare differently in each version.

In Python 2:

```python
>>> b'' == ''
True
>>> print(type(b''), type(''))
(<type 'str'>, <type 'str'>)
```

In Python 3:

```python
>>> b'' == ''
False
>>> print(type(b''), type(''))
<class 'bytes'> <class 'str'>
```

The infinite loop can be reproduced as below, in Python 3:

```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
# In Python 3, readline() returns b'' at EOF, which never equals '',
# so this loop does not terminate.
for line in iter(sbt_proc.stdout.readline, ''):
    print(line)
```

In Python 2, the code above does not cause an infinite loop. Using `b''` as the sentinel is also fine there, because `bytes` is an alias for `str` in Python 2:

```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
for line in iter(sbt_proc.stdout.readline, b''):
    print(line)
```
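The same sentinel mismatch can be observed without hanging an interpreter. Below is a minimal sketch for Python 3 that swaps the sbt process pipe for an in-memory `io.BytesIO` stream and caps the number of reads with `itertools.islice`. The helper `read_lines`, the sample data, and the cap of 10 are hypothetical names and values chosen for illustration; they are not part of `dev/run-tests.py`.

```python
import io
import itertools


def read_lines(stream, sentinel, cap=10):
    """Read lines with two-argument iter(), stopping after `cap` reads.

    The cap is only a safety net for this demo: with a sentinel that never
    matches, iter() would otherwise keep calling readline() forever.
    """
    return list(itertools.islice(iter(stream.readline, sentinel), cap))


data = b"a\nb\n"

# bytes sentinel: iteration stops when readline() returns b'' at EOF.
good = read_lines(io.BytesIO(data), b'')

# str sentinel: in Python 3, b'' != '', so only the cap stops the reads.
bad = read_lines(io.BytesIO(data), '')

print(good)
print(len(bad))
```

Under Python 3, `good` holds just the two lines, while `bad` reaches the cap because every read past the end of the stream yields another `b''` that never equals `''`. Without the cap, the second call would not return.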