Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22782#discussion_r227336318
  
    --- Diff: bin/docker-image-tool.sh ---
    @@ -79,7 +79,7 @@ function build {
       fi
     
       # Verify that Spark has actually been built/is a runnable distribution
    -  # i.e. the Spark JARs that the Docker files will place into the image are present
    --- End diff --
    
    The issue itself comes from Python's history. In Python 2, the `str` type was a byte string. That was arguably a mistake, because it confused users about the difference between bytes and text, so Python 3 redefined `str` as a Unicode string type, as many other programming languages do.
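    A small sketch of the Python 3 side of this, assuming Python 3 (the string `"café"` is just an arbitrary non-ASCII example): bytes and text are separate types, and converting between them goes through an explicit encoding.

```python
# Python 3 keeps bytes and text (str) as distinct types.
text = "café"                  # str: a sequence of Unicode code points
raw = text.encode("utf-8")     # bytes: b'caf\xc3\xa9'

print(type(text).__name__, len(text))  # str 4
print(type(raw).__name__, len(raw))    # bytes 5 ('é' takes two bytes in UTF-8)

# Going back from bytes to str needs an encoding as well.
assert raw.decode("utf-8") == text

# Mixing the two types is rejected instead of being converted implicitly.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False
print("bytes + str allowed:", mixed_ok)  # bytes + str allowed: False
```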
    
    `open(...).read()` returns a `str` (that is, bytes) in Python 2 but a Unicode string in Python 3, so Python 3 has to convert the bytes to a string implicitly. Presumably this was done to minimise breaking changes in users' code.
    
    So a bytes-to-string conversion happens here, and unfortunately the system default encoding on our Jenkins machines is set to ASCII (even though UTF-8 is arguably more common).
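    A minimal sketch of that failure mode, assuming Python 3: a hypothetical temporary file containing UTF-8 text is decoded once as ASCII (standing in for a host whose default encoding is ASCII) and once with an explicit `encoding="utf-8"`. When `encoding` is omitted, Python 3's `open()` falls back to `locale.getpreferredencoding(False)`, so passing the encoding explicitly is one way to make the read independent of the machine's locale.

```python
import locale
import os
import tempfile

# Hypothetical file whose content contains a non-ASCII character, stored as UTF-8.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("# résumé\n".encode("utf-8"))

# The encoding open() would use here if none were given (depends on the locale).
print("default encoding:", locale.getpreferredencoding(False))

# Simulate an ASCII default: decoding the UTF-8 bytes 0xC3 0xA9 fails.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the encoding explicitly works regardless of the locale.
with open(path, encoding="utf-8") as f:
    content = f.read()

os.remove(path)
print(ascii_failed, repr(content))
```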
    
    
    As for the non-ASCII characters themselves, please see the justification in ScalaStyle's rule documentation at http://www.scalastyle.org/rules-dev.html.

