Cloudera specifics aside, some of these things have been on my personal "back burner" todo list; I just haven't as of yet been able to bring them to the front burner ;-)
Please feel free to create issues for these, and provide patches as you're able.

Thanks for this useful feedback,
Phil

On Thu, Oct 11, 2018 at 8:43 AM Lars Francke <lars.fran...@gmail.com> wrote:

> Hi again,
>
> (long intro, jump to the line with the XXXXX if you're not interested in
> the background)
>
> I'm sorry for all the mails, they all come from my quest to get Knox
> packaged up as a Cloudera Parcel plus a CSD (Custom Service Descriptor) to
> run it from Cloudera Manager (CM) similar to what Ambari currently allows.
> I did the same for NiFi and was facing similar issues there (e.g.
> <https://issues.apache.org/jira/browse/NIFI-5350>,
> <https://issues.apache.org/jira/browse/NIFI-5573> and others).
>
> For people unfamiliar with Cloudera Manager I'll explain how it works. That
> should make it clearer why I have the issues I describe below.
>
> Cloudera Manager extracts the "things" it manages into a directory (e.g.
> /opt/cloudera/parcels/KNOX) and they are owned by root:root. This is not to
> be changed by any process (e.g. no configuration file changes, no changing
> of symlinks, no storing of PIDs, logs etc.).
>
> Every time a process (e.g. Knox Gateway) is started CM creates a new
> directory (/var/run/cloudera-scm-agent/process/XXX) where it copies/creates
> the necessary config files + keytabs for _this run_ of the tool.
>
> It then starts the processes by pointing them at this directory, so they
> can pick up their config there and it also captures stdout & stderr in this
> folder.
>
> This is different from Ambari. Ambari extracts Knox and creates symlinks
> from its conf, logs, pids and data directory to /etc/XXX. This is possible
> here because those directories (/etc/...) don't change.
>
>
> XXXXXXXXX
>
>
> The problems I have with Knox so far (I'm sure I'll find more the further I
> get) are:
>
> * gateway.sh has no way to take in options from the "outside". With Hadoop,
> HBase, (now) NiFi you can pass in arbitrary Java options using variables
> like HADOOP_JAVA_OPTS and similar.
>
> In theory all the "setup" is already there for Knox as well using variables
> like APP_CONF_DIR but unfortunately, they get set to hardcoded values at
> the beginning of the script.
>
> Proposal: Add at least a APP_JAVA_OPTS variable so I can pass in arbitrary
> stuff to be added to the Java command line. But really, I'd love to just
> remove the defaults for APP_LOG_DIR etc. IFF they are already set
> externally
>
> * gateway.sh checks whether various directories exist. These are hardcoded
> (e.g. APP_HOME_DIR/conf). But those directories are configurable using
> GATEWAY_HOME etc. so those checks should either be removed or fixed, so
> they take those variables into account
>
> * knoxcli create-master takes a --master argument which I only found out by
> looking at Ambari. The source says it's for testing only. It seems as if
> that should be documented though. I think it's pretty useful to allow the
> master being created non-interactively
>
> * gateway.sh does allow one thing to be overridden externally and that is
> the pid dir using ENV_PID_DIR. Unfortunately, knox-env.sh (which is being
> sourced unconditionally) overrides this variable with an empty value. I
> think this line should just be removed from knox-env.sh
>
> * Launcher looks for a file called gateway.cfg but it always and
> unconditionally looks in its "own" directory (launcherDir). I need a way to
> point this to a different location. It allows me to define GATEWAY_HOME as
> a system property. While I can also define that as an environment variable
> the System property is checked first. And if it finds a gateway-site.xml
> there it uses that. I need it to use the one from the environment variable.
>
> * gateway.sh allows the process to run in the foreground but still captures
> stdout & stderr to files. I would argue that it makes more sense to leave
> them as is and print them to the console instead.
>
> I'm happy to create issues for all of these and also provide patches for
> some/all of them depending on my available time. I just wanted to bring
> this up before I started to see if anyone has any better ideas and/or
> things that I might have missed.
>
> Thanks for reading!
>
> Cheers,
> Lars
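
To make the gateway.sh points in the quoted mail a bit more concrete, here is a rough sketch of the "only apply defaults if nothing was set externally" idea, together with the proposed APP_JAVA_OPTS pass-through and a directory check that follows the effective value. This is not the actual gateway.sh: the /opt/knox fallback, the default sub-directories, the jar path, and the helper function names are assumptions made for illustration only.

```shell
#!/bin/sh
# Illustrative sketch only; not the real gateway.sh.

# Assumed install location, used here only as a fallback.
APP_HOME_DIR="${APP_HOME_DIR:-/opt/knox}"

# Keep values supplied by the caller (e.g. Cloudera Manager);
# fall back to the usual defaults only when nothing was set.
APP_CONF_DIR="${APP_CONF_DIR:-$APP_HOME_DIR/conf}"
APP_LOG_DIR="${APP_LOG_DIR:-$APP_HOME_DIR/logs}"

# Honour ENV_PID_DIR if it is set and non-empty (assumed default: pids).
APP_PID_DIR="${ENV_PID_DIR:-$APP_HOME_DIR/pids}"

# APP_JAVA_OPTS is the variable proposed in the mail; empty unless set.
APP_JAVA_OPTS="${APP_JAVA_OPTS:-}"

# Hypothetical helper: check the effective conf dir, not a hardcoded path.
check_conf_dir() {
  if [ ! -d "$APP_CONF_DIR" ]; then
    echo "Configuration directory not found: $APP_CONF_DIR" >&2
    return 1
  fi
}

# Hypothetical helper: print the command line instead of running it,
# so the sketch stays side-effect free. The jar path is an assumption.
build_java_command() {
  echo "java $APP_JAVA_OPTS -jar $APP_HOME_DIR/bin/gateway.jar"
}
```

With something along these lines, a caller could export APP_CONF_DIR, APP_LOG_DIR, ENV_PID_DIR and APP_JAVA_OPTS before invoking the script, and an unset variable would still get the current default. The `:-` expansion treats an empty value like an unset one, so an empty ENV_PID_DIR coming from knox-env.sh would no longer produce an empty pid dir, although removing that line from knox-env.sh is still the cleaner fix.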