GitHub user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20424#discussion_r164637553

    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
    @@ -191,7 +191,20 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
             daemon = pb.start()

             val in = new DataInputStream(daemon.getInputStream)
    -        daemonPort = in.readInt()
    +        try {
    +          daemonPort = in.readInt()
    +        } catch {
    +          case exc: EOFException =>
    +            throw new IOException(s"No port number in $daemonModule's stdout")
    +        }
    +
    +        // test that the returned port number is within a valid range.
    +        // note: this does not cover the case where the port number
    +        // is arbitrary data but is also coincidentally within range
    +        if (daemonPort < 1 || daemonPort > 0xffff) {
    +          throw new IOException(s"Bad port number in $daemonModule's stdout: " +
    +            f"0x$daemonPort%08x")
    --- End diff --

    Just a thought: this error message won't mean much to the typical user. Would it be sensible to tell the user exactly which Python command to run themselves to figure out the problem? E.g. "unexpected stdout from /foo/bar/some/path/to/python -m /path/to/daemon.py". That is what would help with the sitecustomize.py case. Or is that not useful in general?
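
    To make the suggestion concrete, here is a minimal sketch of an error message that names the command to re-run. It is not the code in this PR: `DaemonPortCheck`, `readDaemonPort`, and the `command` parameter are hypothetical names, and the sketch assumes the caller still has the argument list it used to launch the daemon.

```scala
import java.io.{DataInputStream, EOFException, IOException}

object DaemonPortCheck {
  // Hypothetical helper (not part of PythonWorkerFactory): reads the port the
  // daemon writes to its stdout and, on failure, names the exact command so the
  // user can run it by hand and see what it prints.
  def readDaemonPort(in: DataInputStream, command: Seq[String]): Int = {
    val cmd = command.mkString(" ")
    val port = try {
      in.readInt()
    } catch {
      case _: EOFException =>
        throw new IOException(s"No port number in stdout of: $cmd")
    }
    // Same range check as in the diff; data that happens to fall within
    // range is still not detected.
    if (port < 1 || port > 0xffff) {
      throw new IOException(
        f"Unexpected stdout from: $cmd (read 0x$port%08x where a port number was expected). " +
          "Try running that command manually to see what it prints.")
    }
    port
  }
}
```

    A caller would pass something like `Seq(pythonExec, "-m", daemonModule)`, assuming that matches how the process was actually started.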