Xich Long Le created ZEPPELIN-4435:
--------------------------------------

             Summary: Zeppelin Spark interpreter stuck with connection refused
                 Key: ZEPPELIN-4435
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4435
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.8.2
            Reporter: Xich Long Le


We're using the Spark interpreter with an isolated process per user and a scoped 
process per notebook.

When a user's Spark job crashes or exits:
 * The user can't restart their interpreter from the notebook UI.
 * An admin can't restart it from the interpreter page either.
 * The Spark interpreter still works fine for other users.
 * To get the Spark interpreter working again for the affected user, we have to 
restart the Zeppelin server process.

We noticed two things:
 * The PID file for the Spark interpreter written under ZEPPELIN_HOME/run/ does not 
include the user name, e.g. zeppelin-interpreter-spark-zeppelin-{ZEPPELIN_HOST}.pid. 
This means an interpreter started later will overwrite the PID of the previous one.
 * The Zeppelin server process still keeps the interpreter port open even though the 
relevant interpreter.sh process has already exited.
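
For anyone who wants to check the two observations above on their own host, here is 
a minimal diagnostic sketch. It is not part of Zeppelin. The helper names are 
hypothetical, and the PID file path and port number in the usage note are placeholders 
that have to be taken from the affected installation. It reads a PID file, checks 
whether that process still exists (POSIX only), and checks whether a TCP port still 
accepts connections.

```python
import os
import socket
from pathlib import Path


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if the file is missing or unparsable."""
    try:
        return int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return None


def process_alive(pid):
    """Check whether a process with this PID exists. POSIX only."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts, unreachable hosts
        return False
```

Example use, with a placeholder path and port: call read_pid("/path/to/zeppelin/run/<pid file>"), 
pass the result to process_alive, then call port_accepts_connections("localhost", <interpreter port>). 
A dead PID together with a port that still accepts connections would match the second 
observation above.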

Restarting the whole service when one user has a problem does not really fit a 
multi-tenant environment. Please let me know which logs or debug output I can 
collect to help.


