An issue has come up regarding PR 789 that I feel should be a community
discussion.

The PR takes configuration preferences and converts them into environment
variables, in particular for VMs launched as independent processes.  It is
not requested functionality and nothing else depends on it.

I don't think we should merge this until there's an interface to support
it, because I think it's going to lead to additional user misconfiguration
issues in an area where we already have too many.

Right now, Zeppelin allows many configuration options to be set in anywhere
from two to five places (depending on the option), without any indication
to the user of which value is being used, any check for conflicts, etc.

The problem arises when a user inputs conflicting configuration choices in
different places.  This is actually very easy to do because the same
options are set in so many places, and I've seen it more than a few times.
If there's a config conflict, Zeppelin will behave in an unexpected manner
or fail, and the problem becomes difficult to diagnose because the user is
*sure* they've set that configuration option correctly (which, of course,
they have).

This is a design issue:  We haven't specified an order of precedence.  The
example I've given on multiple occasions is what happens if a user
specifies one SPARK_HOME and a different spark.home?  Right now (unless it's
changed recently) the Spark interpreter will use SPARK_HOME, but the
PySpark Interpreter -- which wants to connect through the Spark Interpreter
-- will use spark.home.  The range of failure modes is obvious.
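
To make the SPARK_HOME / spark.home case concrete, here is a rough Python
sketch of the kind of conflict check I have in mind.  Nothing like this
exists in Zeppelin as far as I know: the function name find_conflicts and
the ALIASES table are hypothetical, and the sketch assumes only that an
environment variable and an interpreter property can name the same setting.

```python
# Hypothetical sketch, not Zeppelin code.
# Each pair is (environment variable, property) assumed to name the same setting.
ALIASES = [("SPARK_HOME", "spark.home")]


def find_conflicts(env, properties, aliases=ALIASES):
    """Return one tuple per alias pair whose two values are both set and differ."""
    conflicts = []
    for env_name, prop_name in aliases:
        env_value = env.get(env_name)
        prop_value = properties.get(prop_name)
        # It is only a conflict when both are set and they disagree.
        if env_value is not None and prop_value is not None and env_value != prop_value:
            conflicts.append((env_name, env_value, prop_name, prop_value))
    return conflicts
```

Even a check this small, run before an interpreter is launched, would let
us tell the user which two places disagree instead of failing later.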

This PR is likely to make that worse, because it converts some
configuration options, but not others, into environment variables that,
depending on Zeppelin's configuration, will or won't get propagated to some
but not all launched interpreters.  That's terribly complex, and it's sure
to make a confusing situation even more confusing.

I therefore think we should wait on this until we have an interface to
support it -- one that clearly indicates to the user what configuration
settings are being taken from where, and why.
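
As a starting point for discussion, the logic behind such an interface
might look something like the Python sketch below.  It is an illustration
only: the function name resolve, the shape of its arguments, and whatever
precedence order the caller passes in are all assumptions on my part, not
existing Zeppelin behavior.

```python
# Hypothetical sketch, not Zeppelin code.
def resolve(key, sources):
    """Resolve key against sources listed highest precedence first.

    sources: list of (source_name, settings_dict) pairs.
    Returns (value, source_name, overridden), where overridden lists the
    (source_name, value) pairs from lower-precedence sources that set the
    key to a different value.  Returns (None, None, []) if the key is unset.
    """
    winner = None
    overridden = []
    for name, settings in sources:
        if key not in settings:
            continue
        if winner is None:
            # First source that sets the key wins.
            winner = (settings[key], name)
        elif settings[key] != winner[0]:
            # A lower-precedence source disagrees; record it for the user.
            overridden.append((name, settings[key]))
    if winner is None:
        return None, None, []
    return winner[0], winner[1], overridden
```

The point is the return value: the caller learns not just the setting, but
where it came from and which other places were overruled, which is exactly
what the user needs to see.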
