> You either need Internet access to download python, to have python embedded
> in the build at build time, or to have it pre-installed on the system
Linux Standard Base (LSB) compliant Linux distributions _will_ have a Python
interpreter. Unfortunately, the spec only specifies:
"The default installed Python version shall be 2.4.2 or greater."
(Section 6.3 of the Runtime Languages specification -
https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Languages/LSB-Languages.pdf)
I always like to point that out, but that being said, I agree that Python isn't
really the best fit here. For as long as we keep using 'goose', I think Go is
the way to go (pun intended).
> As a side note, it also paves the way for moving other scripts to
> Python like WebDep.pm,
That may be a bad example because, as was mentioned, it should be going away :P
I think that scripts that need to interface with Traffic Ops and have no really
tight performance constraints are currently our best use-case for Python, since
we already have a working, supported TO client in Python.
But for as long as CentOS (as our only officially supported distro) stubbornly
clings to the rotting carcass of Python2, using it for much else - especially
in the build process of major TC components - seems like a bad idea.
________________________________________
From: Chris Lemmons <[email protected]>
Sent: Tuesday, November 13, 2018 2:16 PM
To: [email protected]
Subject: Re: [EXTERNAL] The future of the db/admin.pl script
Stepping back for a moment, I see a few goals:
- Install without requiring Internet access. Since local security
policies may (rightfully!) restrict upline resources, it would be best
if we can install without needing to download things.
- Not distribute CatX dependencies. This would be a bit of a
non-starter, I think.
- Use technologies well-understood by the team.
- Install quickly.
- Maintain the script easily.
- Minimize system dependencies.
When I analyze some of the suggestions, here's what I see.
Python: You either need Internet access to download python, to have
python embedded in the build at build time, or to have it
pre-installed on the system. The same goes for every dependent
library. The first option doesn't meet the goal of not requiring
Internet access. We can embed python if we wish without violating
licenses, since Python is licensed under the PSF License, which is
Category A.
However, while PyInstaller itself is GPL, which is CatX, it has a
bootstrapper exemption that applies to the files that it stores
directly as part of the installer-creator. Unfortunately, the
exemption does not allow modification of the files and contains what I
think is a Field of Use limitation which requires that they be used
only when linked into another executable. I suspect that this causes a
PyInstaller binary to be incompatible with the distribution mechanisms
of the Apache project. Perhaps there is another python single-binary
system we could consider here, though. Ultimately, though, Python isn't
really designed as a compiled language, and if we want to compile it,
we might want to choose a language designed for that.
Having python pre-installed would increase deployment challenges,
particularly since many system repositories don't have the modern
versions of Python that we'd prefer to use.
Bash: It's pre-installed everywhere, but doesn't come with a sane YAML
implementation. We could build or add such a YAML implementation, but
it sounds gnarly. I've seen some attempts at doing YAML parsing with
sed... that way madness lies. There are some binary YAML-parsing
tools, which we'd need to vendor and build to include as binaries. But
if we're compiling binaries anyway, we've already side-stepped the
advantage of bash.
Go: Since all the choices amount to needing to compile something down
to a binary, it makes the most sense to use a language that compiles
nicely down to a binary. Go is the obvious choice, since it has nice
multithreading options, is a technology the team is already familiar
with, is already a dependency for existing systems (so it's exactly as
portable as most of ATC), and compiles quickly and easily using
existing tools. It's also got a solid YAML library suitably licensed
for us.
C or something like that: Basically the same advantages as Go, but
without the developer familiarity or multithreading options.
All things considered, I think Go is the best choice for this
particular problem. It avoids all the bad things we want to avoid and
gives us some neat advantages in return.
On Mon, Nov 12, 2018 at 4:59 PM Eric Friedrich -X (efriedri - TRITON
UK BIDCO LIMITED c/o Alter Domus (UK) Limited -OBO at Cisco)
<[email protected]> wrote:
>
>
>
> On Nov 12, 2018, at 4:03 PM, Rawlin Peters
> <[email protected]> wrote:
>
> replies inline
>
> On Mon, Nov 12, 2018 at 11:51 AM Eric Friedrich -X (efriedri - TRITON
> UK BIDCO LIMITED c/o Alter Domus (UK) Limited -OBO at Cisco)
> <[email protected]> wrote:
>
> Hey Rawlin-
> Both good points worth some serious thought.
>
> On Nov 12, 2018, at 1:29 PM, Rawlin Peters
> <[email protected]> wrote:
>
> Eric,
>
> I share your sentiment about being reluctant to introduce another
> language as a dependency for Traffic Ops, but I wasn't able to find a
> really good, easily-available utility for parsing yaml (a la `jq` for
> json parsing) in a Bash script. Since the goose config is in yaml,
> `db/admin.pl` uses a yaml package to parse the goose config into
> variables which are then passed to the external `psql` et al.
> commands. It is possible to parse yaml using sed, but the example I
> found for doing that seemed really sketchy and fragile. So I figured
> using a solid YAML-parsing library like PyYAML in Python would be a
> safer bet while still allowing the use of a fully-featured programming
> language rather than "Bash + <insert yaml-parsing CLI tool here>". It
> would also allow us to potentially use a DB library to interface with
> the DB directly in Python rather than requiring `psql` et al. and just
> shelling out to those external commands (although I plan to continue
> doing it that way for now).
> EF> Yeah, the YAML parsing was the only thing I saw that was not a perfect
> fit for bash.
> Given how concise that db.yaml file is, I wouldn’t think twice about
> getting the open line via:
> $ DB_NAME="development"; grep -A2 "$DB_NAME:" dbconf.yml | grep open |
>   awk -F ":" '{print $2}'
>
> No special tool needed.
>
> I wish we could depend on the dbconf.yml file being a standard format
> like that in order to just be able to grep it with context, but there
> are no restrictions on empty lines or arbitrary comments in the yaml
> syntax that would easily break the grep command. We shouldn't have to
> enforce arbitrary formatting of a yaml config file just to remove the
> need for a yaml parser.
>
> EF> Yeah good point. I rarely touch the file so was not too worried about the
> syntax. But yeah, if people regularly change that file we should be more
> permissive with our parsing
>
> Is pyyaml part of the batteries-included packages?
> If not, we’ll need a way to distribute pyyaml.whl as part of the traffic ops
> RPM. (For those of us without permitted Internet access in our deployments,
> installing via pip means standing up our own private PyPI repo at each of
> our customers, something I would really like to avoid.)
>
> I don't believe it is part of the standard library, but I do believe
> we can figure out a way to keep it all self-contained to the traffic
> ops RPM without requiring internet access at install time. I think
> https://github.com/pyinstaller/pyinstaller is more than capable of
> doing that on Linux/Mac/Windows. I just tried it out on my Python
> version of db/admin.pl on my mac, and it created a self-contained
> binary of 4.7MB size that seems to work just fine. It does still
> depend on the external commands like `psql` et al. from the postgres
> package, but that's no different from the current Perl version. Do you
> think that would be sufficient?
>
> I think pyinstaller also solves the python2 vs python3 availability
> problem, since the interpreter is packaged into the resulting binary
> itself.
> EF> I think that would be OK. The licensing on that one is GPLv2 which is
> category X to include in source (we can distribute the boot loader though).
> Do you know of any alternatives with more compatible licensing?
>
>
>
>
> As a side note, it also paves the way for moving other scripts to
> Python like WebDep.pm, which uses a Perl package that is virtually
> impossible to install/get running on Mac because of Perl's broken SSL
> on Mac, which would make it much easier to start as a new developer on
> the project. I remember when I started working on Traffic Control, I
> had to copy someone else's Perl `traffic_ops/app/local` directory who
> had been on the project a long time and had actually gotten it to
> build on Mac before it became unusable. Eliminating issues like that
> by using a more popular and supportable language is a win in my book,
> but right now I'm just focusing on `db/admin.pl` to allow for better
> testability of the DB migration operations.
> EF> All the web_deps stuff should go away with the rest of perl, right?
>
> We really should be targeting RPMs that contain all dependencies (or can be
> resolved via yum/rpm), rather than asking our users to install stuff from the
> Internet at install time. It's fragile (packages disappear) and a bunch of TC
> users do not run in environments with access to the net.
>
> I’m not opposed to Python, it's probably my favorite language, certainly the
> one I'm most comfortable in. I just don't see a compelling need for it with
> these changes.
> At some point soon we will need to rewrite perl scripts into something else
> (postinstall, ORT, etc…). We should closely consider our use of language for
> those as well- Go, Python, bash, etc…
>
> I think Go could be a reasonable language for stuff like db/admin.pl,
> but it seemed more natural to keep scripts as scripts for small stuff
> like that.
>
> - Rawlin
>