Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20424#discussion_r164649608 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -191,7 +191,20 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String daemon = pb.start() val in = new DataInputStream(daemon.getInputStream) - daemonPort = in.readInt() + try { + daemonPort = in.readInt() + } catch { + case exc: EOFException => + throw new IOException(s"No port number in $daemonModule's stdout") + } + + // test that the returned port number is within a valid range. + // note: this does not cover the case where the port number + // is arbitrary data but is also coincidentally within range + if (daemonPort < 1 || daemonPort > 0xffff) { --- End diff -- I mean, I left my sign-off because what we do is basically move the _same_ check (`java.net.InetSocketAddress.checkPort`) ahead and another one is simply to wraps an exception, `EOFException`. I think we are here safe in theory. I got your point of reserved ports and now the condition became narrower. I should check other things like which error it produces before in this case and if the current error message is nicer. Also, this seems not completely addressing the concerns about it. I was wondering if this is worth doing these stuff. If you strongly prefer this, I won't stay against but may request few more investigations.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org