[GitHub] spark issue #20519: [Spark-23240][python] Don't let python site customizatio...

bersprockets Sat, 10 Feb 2018 10:08:07 -0800

Github user bersprockets commented on the issue:

    https://github.com/apache/spark/pull/20519
  
    > yea but we can't simply flush and ignore the stdout specifically from
    > sitecustomize unless we define a kind of an additional protocol like this
    > because we can't simply distinguish if the output
    
    We might be able to distinguish between sitecustomize.py output and
    daemon.py output. Assuming the code in sitecustomize.py is not
    multi-threaded, all output from sitecustomize.py should come *before* any
    output from daemon.py. Therefore, if daemon.py first prints a "magic
    number" or some other string that is unlikely to appear in
    sitecustomize.py output, PythonWorkerFactory.startDaemon() will know where
    the daemon.py output starts. daemon.py would print the port number only
    after printing this magic value. For example:
    
        <junk from sitecustomize.py>daemon port: ^@^@\325
    
    Once the Scala code sees "daemon port: " in the launched process's stdout,
    it knows that the next 4 bytes are the port number.
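    A minimal Python sketch of the proposed handshake, for illustration only.
    The names MAGIC, write_handshake() and read_daemon_port() are hypothetical;
    the real reader would be Scala code in PythonWorkerFactory.startDaemon().
    It assumes the port is sent as a 4-byte big-endian int (struct format
    "!i"), which is how I understand daemon.py writes it today.

```python
import io
import struct

# Hypothetical marker string; "daemon port: " is the example used above.
MAGIC = b"daemon port: "


def write_handshake(out, port):
    """Daemon side (hypothetical): emit the marker, then the port as a
    4-byte big-endian int, and flush so the launcher sees it promptly."""
    out.write(MAGIC + struct.pack("!i", port))
    out.flush()


def read_daemon_port(inp):
    """Launcher side (hypothetical): skip any leading junk (e.g. output from
    sitecustomize.py) until the full marker is seen, then read the port.

    Returns (port, junk), where junk is every byte that preceded the marker.
    """
    seen = bytearray()
    # Read one byte at a time and check whether the bytes seen so far end
    # with the marker; this also handles junk containing a partial prefix
    # of the marker, such as b"daemon ".
    while not seen.endswith(MAGIC):
        b = inp.read(1)
        if not b:
            raise EOFError("marker not found; output was: %r" % bytes(seen))
        seen += b
    raw = inp.read(4)
    if len(raw) != 4:
        raise EOFError("truncated port after marker")
    (port,) = struct.unpack("!i", raw)
    return port, bytes(seen[: -len(MAGIC)])


if __name__ == "__main__":
    # Simulate a child process whose stdout starts with sitecustomize junk.
    buf = io.BytesIO()
    buf.write(b"hello from sitecustomize\n")
    write_handshake(buf, 54741)
    buf.seek(0)
    port, junk = read_daemon_port(buf)
    print(port, junk)
```

    The sketch only handles junk that precedes the marker; it does nothing
    about output from background threads interleaved with the handshake.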
    
    However, if sitecustomize.py starts multi-threaded code (if that is even
    possible, it is a corner-corner-corner case), its output could be
    interleaved with the daemon's output. Also, I am not sure that
    sitecustomize.py output is guaranteed to appear first in stdout, but it
    seems reasonable that it would, since sitecustomize.py is imported at
    interpreter startup, before daemon.py runs.




---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
