Hi again,

(long intro, jump to the line with the XXXXX if you're not interested in
the background)

Sorry for all the mails; they all stem from my quest to get Knox packaged
up as a Cloudera Parcel plus a CSD (Custom Service Descriptor) so it can be
run from Cloudera Manager (CM), similar to what Ambari currently allows.
I did the same for NiFi and faced similar issues there (e.g.
<https://issues.apache.org/jira/browse/NIFI-5350>,
<https://issues.apache.org/jira/browse/NIFI-5573> and others).

For people unfamiliar with Cloudera Manager, I'll explain how it works. That
should make it clearer why I run into the issues described below.

Cloudera Manager extracts the "things" it manages into a directory (e.g.
/opt/cloudera/parcels/KNOX) owned by root:root. That directory must not be
modified by any process: no configuration file changes, no changing of
symlinks, no storing of PIDs, logs etc.

Every time a process (e.g. the Knox Gateway) is started, CM creates a new
directory (/var/run/cloudera-scm-agent/process/XXX) into which it
copies/creates the config files and keytabs needed for _this run_ of the
tool.

It then starts the process pointed at this directory, so it can pick up
its config there. CM also captures stdout and stderr in this directory.

This is different from Ambari. Ambari extracts Knox and creates symlinks
from its conf, logs, pids and data directories to /etc/XXX. That works for
Ambari because those directories (/etc/...) don't change.


XXXXXXXXX


The problems I have with Knox so far (I'm sure I'll find more the further I
get) are:

* gateway.sh has no way to take in options from the "outside". With Hadoop,
HBase, (now) NiFi you can pass in arbitrary Java options using variables
like HADOOP_JAVA_OPTS and similar.

In theory, all the plumbing is already there for Knox as well, via
variables like APP_CONF_DIR, but unfortunately they are set to hardcoded
values at the beginning of the script.

Proposal: Add at least an APP_JAVA_OPTS variable so I can pass arbitrary
options to be added to the Java command line. What I'd really love, though,
is for the defaults for APP_LOG_DIR etc. to be applied only if those
variables are not already set externally.

* gateway.sh checks whether various directories exist. These paths are
hardcoded (e.g. APP_HOME_DIR/conf), but the directories are configurable
via GATEWAY_HOME etc., so those checks should either be removed or fixed to
take those variables into account.

* knoxcli create-master takes a --master argument, which I only found out
about by looking at Ambari. The source says it's for testing only, but I
think it should be documented anyway: being able to create the master
secret non-interactively is pretty useful.

* gateway.sh does allow one thing to be overridden externally: the PID
directory, via ENV_PID_DIR. Unfortunately, knox-env.sh (which is sourced
unconditionally) overrides this variable with an empty value. I think that
line should simply be removed from knox-env.sh.

* The Launcher looks for a file called gateway.cfg, but it always and
unconditionally looks in its "own" directory (launcherDir). I need a way to
point it to a different location. The Launcher lets me define GATEWAY_HOME
as a system property. I can also define it as an environment variable, but
the system property is checked first, and if a gateway-site.xml is found
there, that one is used. I need it to use the one from the environment
variable instead.

* gateway.sh can run the process in the foreground, but it still redirects
stdout and stderr to files in that case. I would argue that it makes more
sense to leave them alone and let the output go to the console instead.
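To make the gateway.sh points above a bit more concrete, here is a rough bash sketch of the behaviour I have in mind. It is not a patch against the real script. apply_defaults, check_dirs, java_cmd and run_cmd are hypothetical helper names. /opt/knox, the logs/pids sub-directories and bin/gateway.jar are placeholder defaults, not necessarily what gateway.sh uses today.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real gateway.sh. Helper names and the
# /opt/knox, logs/, pids/ and bin/gateway.jar defaults are placeholders.

# Apply defaults only where the caller (e.g. CM) has not set a value.
apply_defaults() {
  # ${VAR:=default} assigns only if VAR is unset or empty.
  : "${APP_HOME_DIR:=/opt/knox}"
  : "${APP_CONF_DIR:=$APP_HOME_DIR/conf}"
  : "${APP_LOG_DIR:=$APP_HOME_DIR/logs}"
  # Proposed hook for arbitrary extra JVM options (empty by default).
  : "${APP_JAVA_OPTS:=}"
  # Honour ENV_PID_DIR when set; knox-env.sh must not reset it to "".
  APP_PID_DIR="${ENV_PID_DIR:-$APP_HOME_DIR/pids}"
}

# Check the directories that are actually in use instead of hardcoded paths.
check_dirs() {
  local dir
  for dir in "$APP_CONF_DIR" "$APP_LOG_DIR" "$APP_PID_DIR"; do
    if [ ! -d "$dir" ]; then
      echo "Directory $dir does not exist" >&2
      return 1
    fi
  done
}

# Print the java command line. APP_JAVA_OPTS is deliberately unquoted so
# that several space-separated options become separate arguments.
java_cmd() {
  # shellcheck disable=SC2086
  echo java $APP_JAVA_OPTS -jar "$APP_HOME_DIR/bin/gateway.jar"
}

# Run a command. In the foreground, stdout/stderr stay on the console;
# otherwise the command is backgrounded and its output appended to files
# in APP_LOG_DIR (gateway.out/gateway.err are placeholder names).
run_cmd() {
  local foreground="$1"
  shift
  if [ "$foreground" = "true" ]; then
    "$@"
  else
    "$@" >>"$APP_LOG_DIR/gateway.out" 2>>"$APP_LOG_DIR/gateway.err" &
  fi
}
```

For example, if CM exports APP_CONF_DIR and APP_LOG_DIR pointing into the per-run process directory, apply_defaults leaves them untouched and fills in only what is missing. check_dirs then validates the directories that are actually in use.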

I'm happy to create issues for all of these and to provide patches for
some or all of them, depending on my available time. I just wanted to bring
this up before starting, to see whether anyone has better ideas or spots
things I might have missed.

Thanks for reading!

Cheers,
Lars
