I think some of the configuration is around log4j configurations which is probably going to have to change related to log4j2 (see KNOX-1462). For Docker support some of the same things (like running in the foreground and configs from environment variables) would be beneficial (see KNOX-1285). I read through the github repo and things looked pretty reasonable. Would be interesting in concrete changes and see how they play out.
Kevin Risden On Thu, Nov 8, 2018 at 4:44 PM Lars Francke <lars.fran...@gmail.com> wrote: > Larry, > > thank you so much for spending time on this. I didn't actually expect an > answer as I saw the ball in my court. Thank you! Especially as this must be > one of the most boring parts of writing any piece of software, sorry :( > > I agree that lots of things are "close" but it's not all consistent. I have > to "relearn" this every time I look at it as well as it's not super trivial > to keep it all in memory ;-) > > I have created some documentation for myself to remember these things. I've > just pushed it to a temporary repository. In case we'll decide to change > something and create an issue from it I'll attach it there as well: < > https://github.com/lfrancke/knox-config-docs> > Far from beautiful but it collects my findings. > > As you can see you are correct that the order is correct for some variables > but different for others and on top of that they are inconsistent. All of > them can be configured slightly differently. And the big problem on top of > that is that the most important setting (the one taking top precedence) is > set via Java System properties. > In general I think that's a good decision because a user will actively have > to add a System Property while an Environment Variable and other default > settings could be coming from an "unknown" environment. > > The problem is that the launcher first reads a properties file (gateway.cfg > or knoxcli.cfg) from the bin directory and sets everything it finds in > there as System Properties. If it can't find this file it uses a default > one with hardcoded things. > > Because the current shell scripts don't allow me to pass in any Java System > Properties that means that I can't effectively override these defaults > without changing this cfg file. I think that's counterintuitive. Especially > as the launcher writes its default cfg file to the bin directory on first > launch. This means that it effectively shifts the precedence around. The > least important ones (hardcoded defaults) now move to the top spot. > > Yes it _can_ all be changed by changing this file but at least in my > opinion that is very counter-intuitive and different from the way all other > Hadoop ecosystem tools work that I know. > > Unfortunately I totally agree with you on the backwards compatibility > issue...and I don't have a good idea here either. > > For the shell script at least I propose to do the following: > 1) Run shellcheck/shellharden over them (same thing happened with Hadoop in > the CLI rewrite) so they are up to current bash standards. > > 2) Currently the script uses APP_ environment variables. I propose adding a > second set called GATEWAY_ and in version X check if any of them are set > and if they are emit a warning (or even abort startup?) that they are going > to be used in version X+1 to modify startup behavior > > 3) At the same time check if anyone set APP_ externally and warn them that > they are currently not being honored and that they'll go away entirely in > X+1 as well > > That way I could at least pass in Java Properties to the startup script and > it would finally allow me to override the defaults. > > I will try to spend a bit of time to work out a more detailed plan for the > rest of it but if we agree that it's worthwhile to make all of this > consistent then I believe we won't be able to do it without breaking > changes in current behavior. I also believe however that almost no one will > rely on the current behavior. I mean...it took me at least a day to > understand how it currently works and I still only get half of it. I doubt > many people have made the same effort (there's always the good chance that > others are smarter than me and didn't take that long ;-) ) though. > > In general I would propose the following: > > * Roll out any changes over multiple versions of Knox in stages > * Stage 1: Improvements that can be made without any breaking changes > * Stage 2: WARN users if they rey on behavior that is going to change in > the future > * Stage 3: Make the change > > Sorry for the rambling... I'll try to keep the Github Repository more > structured. > > But I'd also be happy if you told me that you think this is not worth the > effort. Unfortunately our sponsor only paid me for a few days of work on > this so I'll be a bit slow to work on this anyway but I would be happy to > continue/start the work. > > Cheers, > Lars > > > On Wed, Nov 7, 2018 at 7:59 PM larry mccay <lmc...@apache.org> wrote: > > > Hi Lars - > > > > Sorry for the delay in getting back to this thread. > > I have to admit, I have yet to get my head around the risk associated > with > > the proposed changes. > > Somewhat due to the way that some of the layers in the ordering you > propose > > sort of overlap today. > > For instance, the ConfigConfigImpl class takes environment variables and > > system properties into account within its accessor methods for a number > of > > these directories. > > > > Example: > > @Override > > public String getGatewayDataDir() { > > String systemValue = > > System.getProperty(GATEWAY_DATA_HOME_VAR, > > System.getenv(GATEWAY_DATA_HOME_VAR)); > > String dataDir = null; > > if (systemValue != null) { > > dataDir = systemValue; > > } else { > > dataDir = get(DATA_DIR, getGatewayHomeDir() + File.separator + > > DEFAULT_DATA_DIR); > > } > > return FilenameUtils.normalize(dataDir); > > } > > > > The above does take sys property above environment variable. > > If neither is set it checks the xml config files and ultimately fallsback > > to some default. > > I believe this is exactly what your proposed order is. > > > > Directories that are to be relative to the DataDir determined above > should > > be fine since the are considered relative: > > > > @Override > > public String getGatewayServicesDir() { > > return get(STACKS_SERVICES_DIR, getGatewayDataDir() + File.separator > + > > DEFAULT_STACKS_SERVICES_DIR); > > } > > > > @Override > > public String getGatewayApplicationsDir() { > > return get(APPLICATIONS_DIR, getGatewayDataDir() + File.separator + > > DEFAULT_APPLICATIONS_DIR); > > } > > > > as examples. > > > > Unless we want to be able to use different directory names for those as > > well. > > That would seem more risky. > > > > Now, the ConfDir seems to actually be the same as the DataDir discovery: > > > > @Override > > public String getGatewayConfDir() { > > String value = getVar( GATEWAY_CONF_HOME_VAR, getGatewayHomeDir() + > > File.separator + "conf" ); > > return FilenameUtils.normalize(value); > > } > > > > Where getVar does the interrogation of Sys Prop and Env vars: > > > > private String getVar( String variableName, String defaultValue ) { > > String value = get( variableName ); > > if( value == null ) { > > value = System.getProperty( variableName ); > > } > > if( value == null ) { > > value = System.getenv( variableName ); > > } > > if( value == null ) { > > value = defaultValue; > > } > > return value; > > } > > > > So, it actually seems like the code is processing things in the order > that > > you propose already. > > > > You are saying that the order is being manipulated by the use of the cfg > > files, Launcher, etc by setting system properties then? > > > > Again, I still just trying to assess the risk on existing deployments and > > any provisioning tools out there that may be making assumptions. > > > > thanks, > > > > --larry > > > > On Fri, Oct 12, 2018 at 4:37 PM larry mccay <lmc...@apache.org> wrote: > > > > > Hi Lars - > > > > > > Thanks for raising the backward compatibility concern, it was exactly > > what > > > I was thinking as I was catching backing up the thread. > > > We need to give serious thought to upgrade scenarios. > > > > > > I will need to spend some time with the proposal but the bottom line is > > > this area clearly needs some love and we just have to find the safest > way > > > to improve it. > > > > > > Specific concerns are obviously around existing deployments and any > > > assumptions that are made by devops type installs, Ambari, etc. > > > > > > thanks, > > > > > > --larry > > > > > > On Fri, Oct 12, 2018 at 1:39 PM Lars Francke <lars.fran...@gmail.com> > > > wrote: > > > > > >> So I did spend a few hours looking through the shell scripts, > Launcher, > > >> KnoxCLI, GatewayServer and GatewayConfigImpl. > > >> > > >> Before I start work I'd like to get opinions to avoid unnecessary > work. > > >> > > >> There are lots of inconsistencies in the current code for handling > > >> directories and properties. I'd love to make all of that as consistent > > as > > >> I > > >> can. > > >> > > >> The big question for me however is: What about backwards > compatibility? > > If > > >> I change the way properties are handled it might mean different > > behaviour > > >> if people keep on using the same config as they are now. > > >> > > >> I say _might_ because I assume that most people stick to the standards > > >> (especially via Ambari) and are not using some of the ways to change > > >> configs. Most of them aren't really documented anyway. > > >> > > >> My current plan is as such: > > >> > > >> - Make all paths configurable using a consistent way in which these > > paths > > >> can be specified > > >> > > >> - The new order should be: > > >> 1) Java System Property > > >> 2) Environment variable > > >> 3) Config from gateway-default.xml & gateway-site.xm > > >> 4) Some computed default > > >> > > >> - This is not possible for all settings. Especially the location of > the > > >> gateway-site.xml can obviously not be configured in gateway-site.xml > > >> itself > > >> :) > > >> > > >> - Knox currently (using Invoker + Launcher) sets Java System > Properties > > >> for > > >> us coming from the gateway.cfg file. I think this is bad form because > > they > > >> take precedent over environment variables and are set automatically on > > >> first run and one needs to change them. > > >> > > >> So I'd really like to do away with the concept of those cfg files in > its > > >> entirety and pass parameters explicitly. I looked at how the new Shell > > >> scripts from Hadoop 3 work and would like to implement something very > > >> similar here. > > >> > > >> If we want to keep the cfg file then I would propose either not using > it > > >> to > > >> set Java System properties or at least to change the default > gateway.cfg > > >> to > > >> not set GATEWAY_HOME and log4j.configuration > > >> > > >> And if we keep the file I'd also like to make its location > configurable. > > >> > > >> But really I'd like to do away with it. Happy to be convinced > otherwise > > >> though. > > >> > > >> - gateway.sh will not hardcode any values anymore but use the same > logic > > >> as > > >> the code to determine final locations > > >> > > >> - gateway.sh will allow customization via environment variables > > >> > > >> - knox-env.sh will be renamed to knox-function.sh and a new empty > > >> knox-env.sh will be created that basically serves as documentation for > > all > > >> the environment variables that can be used to customize gateway.sh > > >> > > >> - Document knoxcli.sh create-master --master > > >> > > >> - Processes running in the foreground will not redirect stdout & > stderr > > to > > >> a file > > >> > > >> Any concerns or comments? > > >> > > >> On Thu, Oct 11, 2018 at 5:36 PM > Kevin Risden > <kris...@apache.org> > > wrote: > > >> > > >> > Agree with the changes proposed. It is rather annoying to have to > edit > > >> > gateway.sh to apply changes when external environment variables are > > >> useful. > > >> > This work would also help support a Docker image (KNOX-1285) since > we > > >> would > > >> > run into the same customization issues. > > >> > > > >> > Kevin Risden > > >> > > > >> > > > >> > On Thu, Oct 11, 2018 at 11:12 AM Lars Francke < > lars.fran...@gmail.com > > > > > >> > wrote: > > >> > > > >> > > Thanks both of you. > > >> > > > > >> > > No, these changes would benefit me and this is my use case but we > > >> would > > >> > not > > >> > > need to add anything parcel or cloudera specific. And I think that > > >> while > > >> > I > > >> > > have a specific need they are useful in general as well. > > >> > > > > >> > > It would probably be an overhaul of gateway.sh, knox-env.sh and > > >> > knoxcli.sh. > > >> > > Maybe a few changes to Launcher. > > >> > > > > >> > > > > >> > > > > >> > > On Thu, Oct 11, 2018, 16:50 larry mccay <larry.mc...@gmail.com> > > >> wrote: > > >> > > > > >> > > > This sounds like a worthwhile piece of work, Lars. > > >> > > > Would the "parcel" need to be added to the Knox project? > > >> > > > > > >> > > > +1 to Phil's response. > > >> > > > > > >> > > > > > >> > > > On Thu, Oct 11, 2018 at 9:33 AM Phil Zampino < > pzamp...@apache.org > > > > > >> > > wrote: > > >> > > > > > >> > > > > Cloudera specifics aside, some of these things have been on my > > >> > personal > > >> > > > > "back burner" todo list; I just haven't as of yet been able to > > >> bring > > >> > > them > > >> > > > > to the front burner ;-) > > >> > > > > > > >> > > > > Please feel free to create issues for these, and provide > patches > > >> as > > >> > > > you're > > >> > > > > able. > > >> > > > > > > >> > > > > Thanks for this useful feedback, > > >> > > > > Phil > > >> > > > > > > >> > > > > > > >> > > > > On Thu, Oct 11, 2018 at 8:43 AM Lars Francke < > > >> lars.fran...@gmail.com > > >> > > > > >> > > > > wrote: > > >> > > > > > > >> > > > > > Hi again, > > >> > > > > > > > >> > > > > > (long intro, jump to the line with the XXXXX if you're not > > >> > interested > > >> > > > in > > >> > > > > > the background) > > >> > > > > > > > >> > > > > > I'm sorry for all the mails, they all come from my quest to > > get > > >> > Knox > > >> > > > > > packaged up as a Cloudera Parcel plus a CSD (Custom Service > > >> > > Descriptor) > > >> > > > > to > > >> > > > > > run it from Cloudera Manager (CM) similar to what Ambari > > >> currently > > >> > > > > allows. > > >> > > > > > I did the same for NiFi and was facing similar issues there > > >> (e.g. < > > >> > > > > > https://issues.apache.org/jira/browse/NIFI-5350>, < > > >> > > > > > https://issues.apache.org/jira/browse/NIFI-5573> and > others). > > >> > > > > > > > >> > > > > > For people unfamiliar with Cloudera Manager I'll explain how > > it > > >> > > works. > > >> > > > > That > > >> > > > > > should make it clearer why I have the issues I describe > below. > > >> > > > > > > > >> > > > > > Cloudera Manager extracts the "things" it manages into a > > >> directory > > >> > > > (e.g. > > >> > > > > > /opt/cloudera/parcels/KNOX) and they are owned by root:root. > > >> This > > >> > is > > >> > > > not > > >> > > > > to > > >> > > > > > be changed by any process (e.g. no configuration file > changes, > > >> no > > >> > > > > changing > > >> > > > > > of symlinks, no storing of PIDs, logs etc.). > > >> > > > > > > > >> > > > > > Every time a process (e.g. Knox Gateway) is started CM > > creates a > > >> > new > > >> > > > > > directory (/var/run/cloudera-scm-agent/process/XXX) where it > > >> > > > > copies/creates > > >> > > > > > the necessary config files + keytabs for _this run_ of the > > tool. > > >> > > > > > > > >> > > > > > It then starts the processes by pointing them at this > > >> directory, so > > >> > > > they > > >> > > > > > can pick up their config there and it also captures stdout & > > >> stderr > > >> > > in > > >> > > > > this > > >> > > > > > folder. > > >> > > > > > > > >> > > > > > This is different from Ambari. Ambari extracts Knox and > > creates > > >> > > > symlinks > > >> > > > > > from its conf, logs, pids and data directory to /etc/XXX. > This > > >> is > > >> > > > > possible > > >> > > > > > here because those directories (/etc/...) don't change. > > >> > > > > > > > >> > > > > > > > >> > > > > > XXXXXXXXX > > >> > > > > > > > >> > > > > > > > >> > > > > > The problems I have with Knox so far (I'm sure I'll find > more > > >> the > > >> > > > > further I > > >> > > > > > get) are: > > >> > > > > > > > >> > > > > > * gateway.sh has no way to take in options from the > "outside". > > >> With > > >> > > > > Hadoop, > > >> > > > > > HBase, (now) NiFi you can pass in arbitrary Java options > using > > >> > > > variables > > >> > > > > > like HADOOP_JAVA_OPTS and similar. > > >> > > > > > > > >> > > > > > In theory all the "setup" is already there for Knox as well > > >> using > > >> > > > > variables > > >> > > > > > like APP_CONF_DIR but unfortunately, they get set to > hardcoded > > >> > values > > >> > > > at > > >> > > > > > the beginning of the script. > > >> > > > > > > > >> > > > > > Proposal: Add at least a APP_JAVA_OPTS variable so I can > pass > > in > > >> > > > > arbitrary > > >> > > > > > stuff to be added to the Java command line. But really, I'd > > >> love to > > >> > > > just > > >> > > > > > remove the defaults for APP_LOG_DIR etc. IFF they are > already > > >> set > > >> > > > > > externally > > >> > > > > > > > >> > > > > > * gateway.sh checks whether various directories exist. These > > are > > >> > > > > hardcoded > > >> > > > > > (e.g. APP_HOME_DIR/conf). But those directories are > > configurable > > >> > > using > > >> > > > > > GATEWAY_HOME etc. so those checks should either be removed > or > > >> > fixed, > > >> > > so > > >> > > > > > they take those variables into account > > >> > > > > > > > >> > > > > > * knoxcli create-master takes a --master argument which I > only > > >> > found > > >> > > > out > > >> > > > > by > > >> > > > > > looking at Ambari. The source says it's for testing only. It > > >> seems > > >> > as > > >> > > > if > > >> > > > > > that should be documented though. I think it's pretty useful > > to > > >> > allow > > >> > > > the > > >> > > > > > master being created non-interactively > > >> > > > > > > > >> > > > > > * gateway.sh does allow one thing to be overridden > externally > > >> and > > >> > > that > > >> > > > is > > >> > > > > > the pid dir using ENV_PID_DIR. Unfortunately, knox-env.sh > > >> (which is > > >> > > > being > > >> > > > > > sourced unconditionally) overrides this variable with an > empty > > >> > > value. I > > >> > > > > > think this line should just be removed from knox-env.sh > > >> > > > > > > > >> > > > > > * Launcher looks for a file called gateway.cfg but it always > > and > > >> > > > > > unconditionally looks in its "own" directory (launcherDir). > I > > >> need > > >> > a > > >> > > > way > > >> > > > > to > > >> > > > > > point this to a different location. It allows me to define > > >> > > GATEWAY_HOME > > >> > > > > as > > >> > > > > > a system property. While I can also define that as an > > >> environment > > >> > > > > variable > > >> > > > > > the System property is checked first. And if it finds a > > >> > > > gateway-site.xml > > >> > > > > > there it uses that. I need it to use the one from the > > >> environment > > >> > > > > variable. > > >> > > > > > > > >> > > > > > * gateway.sh allows the process to run in the foreground but > > >> still > > >> > > > > captures > > >> > > > > > stdout & stderr to files. I would argue that it makes more > > >> sense to > > >> > > > leave > > >> > > > > > them as is and print them to the console instead. > > >> > > > > > > > >> > > > > > I'm happy to create issues for all of these and also provide > > >> > patches > > >> > > > for > > >> > > > > > some/all of them depending on my available time. I just > wanted > > >> to > > >> > > bring > > >> > > > > > this up before I started to see if anyone has any better > ideas > > >> > and/or > > >> > > > > > things that I might have missed. > > >> > > > > > > > >> > > > > > Thanks for reading! > > >> > > > > > > > >> > > > > > Cheers, > > >> > > > > > Lars > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > > > >