Hi Dimitry, FTP masters and others,

I know Dimitry was fighting an uphill battle with Kubernetes between 2016
and 2018 and experienced first-hand the problems posed by vendored code.

We see more and more software making excessive use of vendored code;
pretty much everything written in Go does. Some of these packages are
crucially important, like Docker or Kubernetes. So I understand everyone's
concern about how this fits with the Debian Policy.

Debian Policy, section 4.13, states the following
(for your convenience I include it below :) ):
https://www.debian.org/doc/debian-policy/ch-source.html#convenience-copies-of-code

=================
4.13 Convenience copies of code

Some software packages include in their distribution convenience copies of
code from other software packages, generally so that users compiling from
source don’t have to download multiple packages. Debian packages should not
make use of these convenience copies unless the included package is
explicitly intended to be used in this way. [17] If the included code is
already in the Debian archive in the form of a library, the Debian
packaging should ensure that binary packages reference the libraries
already in Debian and the convenience copy is not used. If the included
code is not already in Debian, it should be packaged separately as a
prerequisite if possible. [18]

[18] Having multiple copies of the same code in Debian is inefficient,
often creates either static linking or shared library conflicts, and, most
importantly, increases the difficulty of handling security vulnerabilities
in the duplicated code.
=================

I think this is the part that has the most bearing on the vendored code
problem, especially the footnote. I agree with this principle. But we
should apply it to the state of affairs in 2020, and to this specific
situation.

Keeping all that in mind, here are the reasons why I think packaging
Kubernetes with the vendored code is acceptable for now, and is even the
best solution currently available:

1. OTHER EXAMPLES. If we take this paragraph completely literally and to
the extreme, then other packages also violate it. True, the current
packaging of Kubernetes does this to a greater extent than its predecessor
did, but perhaps this shows that this section was always open to
interpretation. Some prominent packages in Debian that bundle and use
vendored code (estimated number of bundled Go packages in parentheses):
- docker.io (58, including some that are vendored more than once within the
same source package, but not including the fact that docker.io itself is
made up of 7 tarballs)
- kubernetes (20 for the previous version, 200 now)
- prometheus (4)
- golang (4)
None of these were REJECTed, and please don't sabotage these packages now
:-D The point is only to show that, at least for now, vendoring is a fact
in Debian. There is an effort to improve the situation, but in the meantime
we just go on. Not great, not terrible.

2. MAINTAINABILITY. Having every single vendored repo available as a
separate package in Debian is not feasible. It is true that some of them
are already packaged, but expecting all of them to be packaged, at the
exact versions Kubernetes needs, is unrealistic. Also, the golang-*
packages have many different maintainers, and hundreds of such packages
would be required to build Kubernetes. So one can rest assured that every
future Kubernetes release in Debian would be blocked waiting for dozens of
these packages to be updated. Dimitry and a few others worked hard to pull
this off, but even they could not do it. Since 2016 a total of 3 Kubernetes
releases made it into Debian/unstable, while upstream made 17 major and
countless minor releases. Thousands of issues were fixed upstream,
including serious security flaws, and those fixes never made it into
Debian, precisely because the packaging was too difficult to maintain. So
how maintainable was that solution, despite the huge amount of effort put
in? In my opinion this shows that the maintainability reasoning in the
Debian Policy does not apply here.

3. NO FORKS. Debian developers hacking the Kubernetes source code so that
it compiles with whatever version of a dependency happened to make it into
Debian makes the Debian version of Kubernetes different from the standard
one that everyone expects. This is unwelcome to almost every user. No sane
cluster admin would ever dare to use this "fork". There were some attempts
to get the Kubernetes contributors to update dependencies to specific
versions: https://github.com/kubernetes/kubernetes/issues/27543 .
Reading the whole thread helps to put this in perspective. The Kubernetes
contributors were quite helpful throughout, but they made it clear that
they will not update dependencies for the sake of updating. Maybe Debian
would have the upper hand with some projects, but not with Kubernetes.

4. TESTING. Kubernetes releases are meticulously tested, with far greater
technical resources than Debian can collectively muster. The Kubernetes
project regularly runs e2e tests on thousands of nodes (donated compute
time). If we were to keep maintaining a fork, we would be obliged to do
the same. Even if we could run such extensive tests on our fork, and these
tests revealed a problem, who would interpret the results and fix our
snowflake? The Debian fork was never tested this way, and it seems
unlikely that it ever could be.

5. SECURITY. The strongest and most applicable point in the Debian Policy
footnote is about security vulnerabilities in duplicated code. This is
completely valid. But again, given the maintainability issues (see 2), we
would not be able to roll out security fixes in time. How did security
work in the Debian forks created by DDs in the past?
https://www.explainxkcd.com/wiki/index.php/424:_Security_Holes . It is true
that without listing the bundled dependencies in Built-Using, it is harder
to find out whether a vulnerability in one of them affects the binary.
(Hint: it is hard anyway.) For Kubernetes, and Go programs in general, an
automated tool could extract go.mod/go.sum and monitor the listed
dependencies for security vulnerability reports. Going through the whole
dance of packaging and maintaining hundreds of dependencies, just so we
have a machine-readable Built-Using instead of a machine-readable
go.mod/go.sum, seems to do a lot more harm than good for security.
Furthermore, the current situation forces users to add third-party repos
to sources.list to get up-to-date Kubernetes releases, and/or to download
who-knows-what's-in-it binaries. So this is not great, but the alternative
is terrible.

6. EFFICIENCY. Go libraries, vendored or not, are essentially statically
linked into the binary. This is still the case when the result is a
"dynamic" Go binary, e.g. one linked against libc. While there are some
experiments with shared libraries in Go, they see no real-world use. This
means vendoring has no effect on linking behavior, so the footnote's point
about static linking and shared library conflicts is beside the issue here.

7. DFSG. I am not aware of any DFSG issues in the vendored packages: no
funny licenses, blobs, network-downloaded stuff, etc. If there are any,
please point them out specifically and they will be fixed with high
priority. I have checked the licenses of every dependency and compiled the
package in a container with no network access, so what's in the
.orig.tar.gz is exactly what was compiled, nothing more.

I do think there is a good case for Kubernetes to be an exception to 4.13
for now, just like other Go packages effectively are. It is a massively
popular project, topped only by the Linux kernel. We cannot afford not to
have up-to-date versions in Debian, or to have forks that no one can use.

So let's find a way to make this happen!

Regards,
-- 
LENART, János
<o...@debian.org>