So I did spend a few hours looking through the shell scripts, Launcher,
KnoxCLI, GatewayServer and GatewayConfigImpl.

Before I start work I'd like to get opinions to avoid unnecessary work.

There are lots of inconsistencies in the current code for handling
directories and properties. I'd love to make all of that as consistent as I
can.

The big question for me however is: What about backwards compatibility? If
I change the way properties are handled it might mean different behaviour
if people keep on using the same config as they are now.

I say _might_ because I assume that most people stick to the standards
(especially via Ambari) and are not using some of the ways to change
configs. Most of them aren't really documented anyway.

My current plan is as such:

- Make all paths configurable using a consistent way in which these paths
can be specified

- The new order should be:
  1) Java System Property
  2) Environment variable
  3) Config from gateway-default.xml & gateway-site.xm
  4) Some computed default

- This is not possible for all settings. Especially the location of the
gateway-site.xml can obviously not be configured in gateway-site.xml itself
:)

- Knox currently (using Invoker + Launcher) sets Java System Properties for
us coming from the gateway.cfg file. I think this is bad form because they
take precedent over environment variables and are set automatically on
first run and one needs to change them.

So I'd really like to do away with the concept of those cfg files in its
entirety and pass parameters explicitly. I looked at how the new Shell
scripts from Hadoop 3 work and would like to implement something very
similar here.

If we want to keep the cfg file then I would propose either not using it to
set Java System properties or at least to change the default gateway.cfg to
not set GATEWAY_HOME and log4j.configuration

And if we keep the file I'd also like to make its location configurable.

But really I'd like to do away with it. Happy to be convinced otherwise
though.

- gateway.sh will not hardcode any values anymore but use the same logic as
the code to determine final locations

- gateway.sh will allow customization via environment variables

- knox-env.sh will be renamed to knox-function.sh and a new empty
knox-env.sh will be created that basically serves as documentation for all
the environment variables that can be used to customize gateway.sh

- Document knoxcli.sh create-master --master

- Processes running in the foreground will not redirect stdout & stderr to
a file

Any concerns or comments?

On Thu, Oct 11, 2018 at 5:36 PM Kevin Risden <kris...@apache.org> wrote:

> Agree with the changes proposed. It is rather annoying to have to edit
> gateway.sh to apply changes when external environment variables are useful.
> This work would also help support a Docker image (KNOX-1285) since we would
> run into the same customization issues.
>
> Kevin Risden
>
>
> On Thu, Oct 11, 2018 at 11:12 AM Lars Francke <lars.fran...@gmail.com>
> wrote:
>
> > Thanks both of you.
> >
> > No, these changes would benefit me and this is my use case but we would
> not
> > need to add anything parcel or cloudera specific. And I think that while
> I
> > have a specific need they are useful in general as well.
> >
> > It would probably be an overhaul of gateway.sh, knox-env.sh and
> knoxcli.sh.
> > Maybe a few changes to Launcher.
> >
> >
> >
> > On Thu, Oct 11, 2018, 16:50 larry mccay <larry.mc...@gmail.com> wrote:
> >
> > > This sounds like a worthwhile piece of work, Lars.
> > > Would the "parcel" need to be added to the Knox project?
> > >
> > > +1 to Phil's response.
> > >
> > >
> > > On Thu, Oct 11, 2018 at 9:33 AM Phil Zampino <pzamp...@apache.org>
> > wrote:
> > >
> > > > Cloudera specifics aside, some of these things have been on my
> personal
> > > > "back burner" todo list; I just haven't as of yet been able to bring
> > them
> > > > to the front burner ;-)
> > > >
> > > > Please feel free to create issues for these, and provide patches as
> > > you're
> > > > able.
> > > >
> > > > Thanks for this useful feedback,
> > > >    Phil
> > > >
> > > >
> > > > On Thu, Oct 11, 2018 at 8:43 AM Lars Francke <lars.fran...@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Hi again,
> > > > >
> > > > > (long intro, jump to the line with the XXXXX if you're not
> interested
> > > in
> > > > > the background)
> > > > >
> > > > > I'm sorry for all the mails, they all come from my quest to get
> Knox
> > > > > packaged up as a Cloudera Parcel plus a CSD (Custom Service
> > Descriptor)
> > > > to
> > > > > run it from Cloudera Manager (CM) similar to what Ambari currently
> > > > allows.
> > > > > I did the same for NiFi and was facing similar issues there (e.g. <
> > > > > https://issues.apache.org/jira/browse/NIFI-5350>, <
> > > > > https://issues.apache.org/jira/browse/NIFI-5573> and others).
> > > > >
> > > > > For people unfamiliar with Cloudera Manager I'll explain how it
> > works.
> > > > That
> > > > > should make it clearer why I have the issues I describe below.
> > > > >
> > > > > Cloudera Manager extracts the "things" it manages into a directory
> > > (e.g.
> > > > > /opt/cloudera/parcels/KNOX) and they are owned by root:root. This
> is
> > > not
> > > > to
> > > > > be changed by any process (e.g. no configuration file changes, no
> > > > changing
> > > > > of symlinks, no storing of PIDs, logs etc.).
> > > > >
> > > > > Every time a process (e.g. Knox Gateway) is started CM creates a
> new
> > > > > directory (/var/run/cloudera-scm-agent/process/XXX) where it
> > > > copies/creates
> > > > > the necessary config files + keytabs for _this run_ of the tool.
> > > > >
> > > > > It then starts the processes by pointing them at this directory, so
> > > they
> > > > > can pick up their config there and it also captures stdout & stderr
> > in
> > > > this
> > > > > folder.
> > > > >
> > > > > This is different from Ambari. Ambari extracts Knox and creates
> > > symlinks
> > > > > from its conf, logs, pids and data directory to /etc/XXX. This is
> > > > possible
> > > > > here because those directories (/etc/...) don't change.
> > > > >
> > > > >
> > > > > XXXXXXXXX
> > > > >
> > > > >
> > > > > The problems I have with Knox so far (I'm sure I'll find more the
> > > > further I
> > > > > get) are:
> > > > >
> > > > > * gateway.sh has no way to take in options from the "outside". With
> > > > Hadoop,
> > > > > HBase, (now) NiFi you can pass in arbitrary Java options using
> > > variables
> > > > > like HADOOP_JAVA_OPTS and similar.
> > > > >
> > > > > In theory all the "setup" is already there for Knox as well using
> > > > variables
> > > > > like APP_CONF_DIR but unfortunately, they get set to hardcoded
> values
> > > at
> > > > > the beginning of the script.
> > > > >
> > > > > Proposal: Add at least a APP_JAVA_OPTS variable so I can pass in
> > > > arbitrary
> > > > > stuff to be added to the Java command line. But really, I'd love to
> > > just
> > > > > remove the defaults for APP_LOG_DIR etc. IFF they are already set
> > > > > externally
> > > > >
> > > > > * gateway.sh checks whether various directories exist. These are
> > > > hardcoded
> > > > > (e.g. APP_HOME_DIR/conf). But those directories are configurable
> > using
> > > > > GATEWAY_HOME etc. so those checks should either be removed or
> fixed,
> > so
> > > > > they take those variables into account
> > > > >
> > > > > * knoxcli create-master takes a --master argument which I only
> found
> > > out
> > > > by
> > > > > looking at Ambari. The source says it's for testing only. It seems
> as
> > > if
> > > > > that should be documented though. I think it's pretty useful to
> allow
> > > the
> > > > > master being created non-interactively
> > > > >
> > > > > * gateway.sh does allow one thing to be overridden externally and
> > that
> > > is
> > > > > the pid dir using ENV_PID_DIR. Unfortunately, knox-env.sh (which is
> > > being
> > > > > sourced unconditionally) overrides this variable with an empty
> > value. I
> > > > > think this line should just be removed from knox-env.sh
> > > > >
> > > > > * Launcher looks for a file called gateway.cfg but it always and
> > > > > unconditionally looks in its "own" directory (launcherDir). I need
> a
> > > way
> > > > to
> > > > > point this to a different location. It allows me to define
> > GATEWAY_HOME
> > > > as
> > > > > a system property. While I can also define that as an environment
> > > > variable
> > > > > the System property is checked first. And if it finds a
> > > gateway-site.xml
> > > > > there it uses that. I need it to use the one from the environment
> > > > variable.
> > > > >
> > > > > * gateway.sh allows the process to run in the foreground but still
> > > > captures
> > > > > stdout & stderr to files. I would argue that it makes more sense to
> > > leave
> > > > > them as is and print them to the console instead.
> > > > >
> > > > > I'm happy to create issues for all of these and also provide
> patches
> > > for
> > > > > some/all of them depending on my available time. I just wanted to
> > bring
> > > > > this up before I started to see if anyone has any better ideas
> and/or
> > > > > things that I might have missed.
> > > > >
> > > > > Thanks for reading!
> > > > >
> > > > > Cheers,
> > > > > Lars
> > > > >
> > > >
> > >
> >
>

Reply via email to