GitHub user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20424#discussion_r164637553
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -191,7 +191,20 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
             daemon = pb.start()
     
             val in = new DataInputStream(daemon.getInputStream)
    -        daemonPort = in.readInt()
    +        try {
    +          daemonPort = in.readInt()
    +        } catch {
    +          case exc: EOFException =>
    +            throw new IOException(s"No port number in $daemonModule's stdout")
    +        }
    +
    +        // test that the returned port number is within a valid range.
    +        // note: this does not cover the case where the port number
    +        // is arbitrary data but is also coincidentally within range
    +        if (daemonPort < 1 || daemonPort > 0xffff) {
    +          throw new IOException(s"Bad port number in $daemonModule's stdout: " +
    +            f"0x$daemonPort%08x")
    --- End diff --
    
    just a thought:
    
    this error message won't mean much to the typical user. Would it be
    sensible to tell the user exactly which python command to run themselves to
    figure out the problem? E.g. "unexpected stdout from
    /foo/bar/some/path/to/python -m /path/to/daemon.py". That's what would help
    with that sitecustomize.py case. Or is that not useful in general?
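A minimal sketch of the suggestion, assuming the full command line is available at the point where the port is read. It is not the code from this PR. It keeps the two checks from the diff: an `EOFException` means no port was written, and a value outside 1..65535 means the bytes were not a port. Both error messages also name the exact command the user can rerun by hand. `DaemonPortCheck` and `readDaemonPort` are hypothetical names. A caller could pass the `ProcessBuilder`'s `command()` list converted to a Scala `Seq`.

```scala
import java.io.{DataInputStream, EOFException, IOException, InputStream}

object DaemonPortCheck {
  // Hypothetical helper: reads the 4-byte big-endian port number the daemon is
  // expected to write first on stdout, and names the full command on failure.
  def readDaemonPort(stdout: InputStream, command: Seq[String]): Int = {
    val cmd = command.mkString(" ")
    val in = new DataInputStream(stdout)
    val port =
      try in.readInt()
      catch {
        case _: EOFException =>
          // Fewer than 4 bytes arrived before stdout closed.
          throw new IOException(s"No port number in stdout of '$cmd'")
      }
    // Range check only: arbitrary bytes that happen to decode into 1..65535
    // still pass, as the comment in the diff points out.
    if (port < 1 || port > 0xffff) {
      throw new IOException(
        f"Unexpected stdout from '$cmd' (read 0x$port%08x instead of a port number); " +
          "try running this command yourself to see what it prints")
    }
    port
  }
}
```

For example, the bytes `0x00 0x00 0x1f 0x90` decode to port 8080. If a `sitecustomize.py` prints text such as "Hello" first, the bytes "Hell" decode to 0x48656c6c. That value is out of range, so the message points the user at the command to run (for instance `python3 -m pyspark.daemon`).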

