This sounds like a worthwhile piece of work, Lars. Would the "parcel" need to be added to the Knox project?
+1 to Phil's response.

On Thu, Oct 11, 2018 at 9:33 AM Phil Zampino <pzamp...@apache.org> wrote:

> Cloudera specifics aside, some of these things have been on my personal
> "back burner" todo list; I just haven't as of yet been able to bring them
> to the front burner ;-)
>
> Please feel free to create issues for these, and provide patches as you're
> able.
>
> Thanks for this useful feedback,
> Phil
>
>
> On Thu, Oct 11, 2018 at 8:43 AM Lars Francke <lars.fran...@gmail.com>
> wrote:
>
> > Hi again,
> >
> > (long intro, jump to the line with the XXXXX if you're not interested in
> > the background)
> >
> > I'm sorry for all the mails, they all come from my quest to get Knox
> > packaged up as a Cloudera Parcel plus a CSD (Custom Service Descriptor)
> > to run it from Cloudera Manager (CM) similar to what Ambari currently
> > allows. I did the same for NiFi and was facing similar issues there (e.g.
> > <https://issues.apache.org/jira/browse/NIFI-5350>,
> > <https://issues.apache.org/jira/browse/NIFI-5573> and others).
> >
> > For people unfamiliar with Cloudera Manager I'll explain how it works.
> > That should make it clearer why I have the issues I describe below.
> >
> > Cloudera Manager extracts the "things" it manages into a directory (e.g.
> > /opt/cloudera/parcels/KNOX) and they are owned by root:root. This is not
> > to be changed by any process (e.g. no configuration file changes, no
> > changing of symlinks, no storing of PIDs, logs etc.).
> >
> > Every time a process (e.g. Knox Gateway) is started CM creates a new
> > directory (/var/run/cloudera-scm-agent/process/XXX) where it
> > copies/creates the necessary config files + keytabs for _this run_ of
> > the tool.
> >
> > It then starts the processes by pointing them at this directory, so they
> > can pick up their config there and it also captures stdout & stderr in
> > this folder.
> >
> > This is different from Ambari.
> > Ambari extracts Knox and creates symlinks from its conf, logs, pids and
> > data directory to /etc/XXX. This is possible here because those
> > directories (/etc/...) don't change.
> >
> >
> > XXXXXXXXX
> >
> >
> > The problems I have with Knox so far (I'm sure I'll find more the
> > further I get) are:
> >
> > * gateway.sh has no way to take in options from the "outside". With
> > Hadoop, HBase, (now) NiFi you can pass in arbitrary Java options using
> > variables like HADOOP_JAVA_OPTS and similar.
> >
> > In theory all the "setup" is already there for Knox as well using
> > variables like APP_CONF_DIR but unfortunately, they get set to hardcoded
> > values at the beginning of the script.
> >
> > Proposal: Add at least an APP_JAVA_OPTS variable so I can pass in
> > arbitrary stuff to be added to the Java command line. But really, I'd
> > love to just remove the defaults for APP_LOG_DIR etc. IFF they are
> > already set externally
> >
> > * gateway.sh checks whether various directories exist. These are
> > hardcoded (e.g. APP_HOME_DIR/conf). But those directories are
> > configurable using GATEWAY_HOME etc. so those checks should either be
> > removed or fixed, so they take those variables into account
> >
> > * knoxcli create-master takes a --master argument which I only found out
> > by looking at Ambari. The source says it's for testing only. It seems as
> > if that should be documented though. I think it's pretty useful to allow
> > the master being created non-interactively
> >
> > * gateway.sh does allow one thing to be overridden externally and that
> > is the pid dir using ENV_PID_DIR. Unfortunately, knox-env.sh (which is
> > being sourced unconditionally) overrides this variable with an empty
> > value. I think this line should just be removed from knox-env.sh
> >
> > * Launcher looks for a file called gateway.cfg but it always and
> > unconditionally looks in its "own" directory (launcherDir).
> > I need a way to point this to a different location. It allows me to
> > define GATEWAY_HOME as a system property. While I can also define that
> > as an environment variable the System property is checked first. And if
> > it finds a gateway-site.xml there it uses that. I need it to use the one
> > from the environment variable.
> >
> > * gateway.sh allows the process to run in the foreground but still
> > captures stdout & stderr to files. I would argue that it makes more
> > sense to leave them as is and print them to the console instead.
> >
> > I'm happy to create issues for all of these and also provide patches for
> > some/all of them depending on my available time. I just wanted to bring
> > this up before I started to see if anyone has any better ideas and/or
> > things that I might have missed.
> >
> > Thanks for reading!
> >
> > Cheers,
> > Lars
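
The proposal in the quoted mail (keep the defaults for APP_LOG_DIR etc. only when nothing was set externally, and add an APP_JAVA_OPTS hook) could be sketched with the shell's `${VAR:-default}` expansion. This is an illustration under assumptions, not the actual gateway.sh: APP_HOME_DIR, APP_CONF_DIR, APP_LOG_DIR, APP_JAVA_OPTS and ENV_PID_DIR are names taken from the mail, while the /opt/knox default, the APP_PID_DIR name, the jar path, and the functions apply_defaults and print_java_cmd are hypothetical.

```shell
#!/bin/sh
# Sketch only: NOT the real gateway.sh. Paths and function names are
# hypothetical; only the variable-defaulting pattern is the point.

apply_defaults() {
    # Keep an externally supplied value; fall back to a default only when
    # the variable is unset or empty.
    APP_HOME_DIR="${APP_HOME_DIR:-/opt/knox}"          # hypothetical default
    APP_CONF_DIR="${APP_CONF_DIR:-$APP_HOME_DIR/conf}"
    APP_LOG_DIR="${APP_LOG_DIR:-$APP_HOME_DIR/logs}"
    # ENV_PID_DIR is the override mentioned in the mail; APP_PID_DIR is a
    # hypothetical name for the resolved value.
    APP_PID_DIR="${ENV_PID_DIR:-$APP_HOME_DIR/pids}"
    # Extra JVM options passed in from the outside; empty by default.
    APP_JAVA_OPTS="${APP_JAVA_OPTS:-}"
}

print_java_cmd() {
    # Print the command line instead of running it, so the sketch has no
    # side effects. The jar location is a placeholder.
    printf 'java %s-jar %s\n' \
        "${APP_JAVA_OPTS:+$APP_JAVA_OPTS }" \
        "$APP_HOME_DIR/bin/gateway.jar"
}

apply_defaults
print_java_cmd
```

With this pattern a caller such as Cloudera Manager could export APP_CONF_DIR or APP_LOG_DIR to point at its per-run process directory before invoking the script, and the script would leave those values alone.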