GitHub user bersprockets commented on the issue:

    https://github.com/apache/spark/pull/20519

>yea but we can't simply flush and ignore the stdout specifically from sitecustomize unless we define a kind of an additional protocol like this because we can't simply distinguish if the output

We might be able to distinguish between sitecustomize.py output and daemon.py output.

Assuming the code in sitecustomize.py is not multi-threaded, we can assume all output from sitecustomize.py comes *before* any output from daemon.py. Therefore, if daemon.py first prints a "magic number" or some other string that is unlikely to show up in sitecustomize.py output, PythonWorkerFactory.startDaemon() will know when daemon.py output starts. daemon.py would print the port number only after printing this magic value. For example:

<pre>
<junk from sitecustomize.py>daemon port: ^@^@\325
</pre>

Once the Scala code sees "daemon port: " in the launched process's stdout, it knows the next 4 bytes are the port number.

However, if sitecustomize.py starts multi-threaded code (and if that's even possible, that's a corner-corner-corner case), its output could potentially be interleaved with the daemon's output. Also, I am not sure sitecustomize.py output is guaranteed to show up first in stdout, but it seems reasonable that it would.
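To make the idea concrete, here is a minimal sketch of the reader side, written in Python for brevity rather than as the actual Scala change. The function name read_daemon_port is hypothetical, and the sketch assumes the port is written as a 4-byte big-endian integer (the byte order Java's DataInputStream.readInt expects). It skips whatever comes before the marker and then decodes the next 4 bytes:

```python
import io
import struct

# Marker string taken from the example above; any string unlikely to
# appear in sitecustomize.py output would work equally well.
MARKER = b"daemon port: "


def read_daemon_port(stream, marker=MARKER):
    """Skip bytes until `marker` has been seen, then return the next
    4 bytes decoded as a big-endian signed int (an assumption here)."""
    window = b""
    while window != marker:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended before the marker was seen")
        # Keep only the most recent len(marker) bytes for comparison.
        window = (window + byte)[-len(marker):]
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("stream ended before the 4-byte port was read")
    return struct.unpack(">i", raw)[0]


# Simulated stdout: junk from sitecustomize.py, then marker and port.
fake_stdout = io.BytesIO(
    b"hello from sitecustomize.py\n" + MARKER + struct.pack(">i", 54321)
)
print(read_daemon_port(fake_stdout))  # prints 54321
```

This only works under the ordering assumption stated above: if sitecustomize.py output could be interleaved with the daemon's output, the 4 bytes after the marker would not be reliable.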