This sounds like a worthwhile piece of work, Lars.
Would the "parcel" need to be added to the Knox project?

+1 to Phil's response.


On Thu, Oct 11, 2018 at 9:33 AM Phil Zampino <pzamp...@apache.org> wrote:

> Cloudera specifics aside, some of these things have been on my personal
> "back burner" todo list; I just haven't as of yet been able to bring them
> to the front burner ;-)
>
> Please feel free to create issues for these, and provide patches as you're
> able.
>
> Thanks for this useful feedback,
>    Phil
>
>
> On Thu, Oct 11, 2018 at 8:43 AM Lars Francke <lars.fran...@gmail.com>
> wrote:
>
> > Hi again,
> >
> > (long intro, jump to the line with the XXXXX if you're not interested in
> > the background)
> >
> > I'm sorry for all the mails, they all come from my quest to get Knox
> > packaged up as a Cloudera Parcel plus a CSD (Custom Service
> > Descriptor) to run it from Cloudera Manager (CM) similar to what
> > Ambari currently allows. I did the same for NiFi and was facing
> > similar issues there (e.g.
> > <https://issues.apache.org/jira/browse/NIFI-5350>,
> > <https://issues.apache.org/jira/browse/NIFI-5573> and others).
> >
> > For people unfamiliar with Cloudera Manager I'll explain how it
> > works. That should make it clearer why I have the issues I describe
> > below.
> >
> > Cloudera Manager extracts the "things" it manages into a directory
> > (e.g. /opt/cloudera/parcels/KNOX) and they are owned by root:root.
> > This is not to be changed by any process (e.g. no configuration file
> > changes, no changing of symlinks, no storing of PIDs, logs etc.).
> >
> > Every time a process (e.g. Knox Gateway) is started CM creates a new
> > directory (/var/run/cloudera-scm-agent/process/XXX) where it
> > copies/creates the necessary config files + keytabs for _this run_ of
> > the tool.
> >
> > It then starts the processes by pointing them at this directory, so
> > they can pick up their config there and it also captures stdout &
> > stderr in this folder.
> >
> > This is different from Ambari. Ambari extracts Knox and creates
> > symlinks from its conf, logs, pids and data directory to /etc/XXX.
> > This is possible here because those directories (/etc/...) don't
> > change.
> >
> >
> > XXXXXXXXX
> >
> >
> > The problems I have with Knox so far (I'm sure I'll find more the
> > further I get) are:
> >
> > * gateway.sh has no way to take in options from the "outside". With
> > Hadoop, HBase, (now) NiFi you can pass in arbitrary Java options using
> > variables like HADOOP_JAVA_OPTS and similar.
> >
> > In theory all the "setup" is already there for Knox as well using
> > variables like APP_CONF_DIR but unfortunately, they get set to
> > hardcoded values at the beginning of the script.
> >
> > Proposal: Add at least an APP_JAVA_OPTS variable so I can pass in
> > arbitrary stuff to be added to the Java command line. But really, I'd
> > love to just remove the defaults for APP_LOG_DIR etc. IFF they are
> > already set externally.
> >
> > * gateway.sh checks whether various directories exist. These are
> > hardcoded (e.g. APP_HOME_DIR/conf). But those directories are
> > configurable using GATEWAY_HOME etc. so those checks should either be
> > removed or fixed, so they take those variables into account.
> >
> > * knoxcli create-master takes a --master argument which I only found
> > out by looking at Ambari. The source says it's for testing only. It
> > seems as if that should be documented though. I think it's pretty
> > useful to allow the master to be created non-interactively.
> >
> > * gateway.sh does allow one thing to be overridden externally and that is
> > the pid dir using ENV_PID_DIR. Unfortunately, knox-env.sh (which is being
> > sourced unconditionally) overrides this variable with an empty value. I
> > think this line should just be removed from knox-env.sh
> >
> > * Launcher looks for a file called gateway.cfg but it always and
> > unconditionally looks in its "own" directory (launcherDir). I need a
> > way to point this to a different location. It allows me to define
> > GATEWAY_HOME as a system property. While I can also define that as an
> > environment variable the system property is checked first. And if it
> > finds a gateway-site.xml there it uses that. I need it to use the one
> > from the environment variable.
> >
> > * gateway.sh allows the process to run in the foreground but still
> > captures stdout & stderr to files. I would argue that it makes more
> > sense to leave them as is and print them to the console instead.
> >
> > I'm happy to create issues for all of these and also provide patches for
> > some/all of them depending on my available time. I just wanted to bring
> > this up before I started to see if anyone has any better ideas and/or
> > things that I might have missed.
> >
> > Thanks for reading!
> >
> > Cheers,
> > Lars
> >
>
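
For anyone following along, here is a rough sketch of the "only apply
defaults IFF not already set externally" behaviour proposed above for
gateway.sh. It is not taken from the Knox source: APP_JAVA_OPTS is the
variable Lars proposes, the default directories and the gateway.jar path
are assumptions for illustration, and the script only prints the command
line instead of starting anything.

```shell
#!/bin/sh
# Hypothetical excerpt, NOT the actual gateway.sh from the Knox source tree.

# Stand-in for the install directory (a read-only parcel directory under CM).
APP_HOME_DIR="${APP_HOME_DIR:-/opt/cloudera/parcels/KNOX}"

# Assign a default only when the caller has not already exported a value.
# The ":" builtin does nothing; "${VAR:=default}" performs the assignment.
: "${APP_CONF_DIR:=$APP_HOME_DIR/conf}"
: "${APP_LOG_DIR:=$APP_HOME_DIR/logs}"

# Extra JVM flags supplied from the outside (proposed name); empty by default.
: "${APP_JAVA_OPTS:=}"

# Print the command line rather than running it, so the sketch is safe to try.
# The jar location is an assumption made for this example.
build_java_cmd() {
    echo "java $APP_JAVA_OPTS -jar $APP_HOME_DIR/bin/gateway.jar"
}

build_java_cmd
```

With something like this, Cloudera Manager could export APP_CONF_DIR and
APP_LOG_DIR pointing into its per-run process directory before calling the
script, and the parcel directory would stay untouched.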

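The last point in the list (foreground mode still capturing stdout &
stderr to files) could look roughly like the sketch below. This is only an
illustration under assumptions: run_app, OUT_FILE and ERR_FILE are made-up
names, not what gateway.sh uses today.

```shell
#!/bin/sh
# Hypothetical launcher helper; all names here are illustrative only.

# Files used only when the process is sent to the background.
OUT_FILE="${OUT_FILE:-${TMPDIR:-/tmp}/app.out}"
ERR_FILE="${ERR_FILE:-${TMPDIR:-/tmp}/app.err}"

# run_app MODE COMMAND [ARGS...]
run_app() {
    mode="$1"
    shift
    if [ "$mode" = "foreground" ]; then
        # Leave stdout/stderr attached to whoever started us (a console,
        # or a supervisor such as CM that captures the streams itself).
        "$@"
    else
        # Background: redirect both streams to files and print the PID.
        nohup "$@" >"$OUT_FILE" 2>"$ERR_FILE" &
        echo "$!"
    fi
}

run_app foreground echo "gateway output stays on the console"
```

In foreground mode the caller decides where the output goes, which matches
how CM already captures stdout & stderr in its process directory.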