Bug#849754: guerillabackup review

halfdog Mon, 22 Jan 2018 14:30:37 -0800

Hello Michael,

Thanks for you detailed review.

Michael Lustfield wrote:
> I reviewed this packaging and came up with some issues:
> 
> - The description does nothing to explain what makes this solution unique.
>   + Why is this special?
>   + How does it even work?
>   + This should be easily gleaned from the package description.

This is the current description for reference. There is similar
topic at the end of the mail, where improvements are discussed
in detail.

===========================
 The tool supports asynchronous, local-coordinated, distributed
 secure backup generation, forwarding, verification, storage
 and deletion even in rugged environments. The system is open
 to integration of own code. The codebase is very small and
 kept simple to ease auditing.
 .
 Unlike business backup solutions, guerillabackup also considers
 following conditions that might be encountered with private use:
 * infrastructure uptime below 99%, e.g. machines not always powered
 * poor network connectivity, e.g. mobile access, parallel video
   streaming
 * poor physical access and anti-theft protection of machines
 * no security monitoring of machines
 .
 See /usr/share/doc/guerillabackup/Design.txt.gz section "Requirements"
 for more information.
===========================

> - There is a lintian override for a legitimate mistake (unindented list).

Could you please point out the legitimate mistake? The 4 bullet points
above form a list. If you remove them, lintian will not complain
any more.

> - d/patches is scary...
>   + Do not create lintian overrides using d/patches.

Where should they be put instead?

>   + Keep patches to essential changes (not syntax and language changes).
>   + Do not perform such a massive application rewrite using d/patches.

The size of patches is mostly due to a previous review arguing
for Python2 to Python3 change. As the upstream version IS written
in Python2 and I cannot change the past, those changes are applied
as patch. As the changes are very intrusive, they affect nearly
all files, thus the patch is generated automatically. I know,
that this intermediate state is not nice. If the package is accepted,
the next upstream version will include them already and thus the
patches will not require them any more - acceptance will shape
the development goals of upstream. As long as no acceptance can
be reached, I do not want to change/branch upstream in each direction
the contraticting review comments suggest, thus creating a large mess
of parallel versions.

>   + That change should be sent upstream... not held local.

They will go upstream, as soon as it is clear, that they are needed.
Without moving towards acceptance of the packet, the changes are
definitely not needed by anyone. Downstream shapes upstream and
vice versa.

> - Absolutely DO NOT generate lintian-override files from d/patches.

Where should they be put instead? Would this be the correct solution?

* Put them in d/guerillabackup.lintian-overrides

* Change d/rules somehow, so that "dh_lintian" is called automatically?

> - d/control should wrap long lines.

Looking into d/control all lines are wrapped at the first whitespace
after 60 characters. Currently the longest line is currently 70
characters (see d/control posted above). This is significantly
beyond typical terminal width and even ~5 characters below the
average line length of e.g. util-linux control file in Debian
Stretch.

What are the Debian rules for line length? Below 50?

> - d/rules could use some more whitespace.

Where? The file is just 37 lines and in my opinion not hard to
read. Any whitespace I could think of would just decrease readability
by stretching the content. Each paragraph, containing at most
3 commands, is separated from the next by a blank line. Should
there be a blank line between each single command?

> - The py2 dependency is bad (very bad, especially at this point).
>   + There are only two years left until py2 EoL.

Where did you find that? All "grep -i -r python2" hits are in
files not to be included (see Py2/3 patch comments from above)
or are centered around py2/py3 migration itself. Maybe I missed
something.

> - d/links suggests binaries are not being installed into standard locations.

Because Python when invoked via she-bang adds by default the binaries'
directory to the search path. I do not want to have all other
binaries of a target system (some might be Python code by themself)
on the search path for security reasons. As backups are usually
run as root, a single unrelated package could thus allow to gain
privileges.

> - d/NEWS is not needed here and will be needlessly noisy for users.

Removed.

> - d/install should use wildcards to grab docs, instead of a file list

This was done with forward compatibility in mind - future documentation
without relevance for the package should not be included automatically
to keep the package small and to avoid having useless documentation
included (whitelisting vs. blacklisting).

Still I changed it to /*.txt . At the moment the number of files
is small enough to stick to a "card blanche" approach.

> - d/install does not need the leading "./"

Already removed in yesterday uploaded version.

> - d/compat should probably get a bump to 11
>   + in d/control as well

Even if mentors.debian upload platform will then complain about
it? That was the exact reason, why I kept it lower until now -
there is no ring to bind them all.

I increased to 11 in d/compat, but keeping it at 10.0.0 in control -
otherwise the package cannot be built any more.

> - This is a python package, but raw source is being installed to /usr/lib.
>   + The python scripts should be compiled.

This was a discussion point in previous review: as backup should
also work when disasters are evolving, it should not be too fragile.
The code base is very small, so Python JIT does not really degrade
performace. On the other hand: it was not possible to find any
documentation on .pyc file generation/update/use failure modes.
So they just increase failure- and attack surface to an unknown
extend without any reasonable benefit. Therefore .pyc generation
was disabled.

If Debian appreciates taking safety/security decisions without
being backed by facts, I have not problem changing the behaviour
for the Debian version of the package. At the moment they are
not generated and even when generated, use of them is disabled
also - similar to defense in depth. 2 Options:

* Just generate them (and do not use them) to make build infrastructure
  happy.
* Generate and enable them.

>   + There should be no "src/*" inclusion in d/install.

See above.

> - Using d/install is not the correct way to include init script.
> - The debhelper scripts have already been commented on.
> - The interaction with the init system was also already commented on.

What would be the correct way for those 3 topics?

Please consider also safety/security requirements regarding systemd
configuration initialization:

==================
# Do not enable the services on fresh install by default. The
# user should do that manually for those services, he really wants
# to run.

# Do not start the services after install or update. Without this
# option, all units would be started during upgrade, even those
# not enabled. When user did not enable them, dpkg should respect
# that. Those enabled will still be started by custom postinst
# code.
==================

Maybe I am misguided, but I assuemed that packaging should not
have a negative impact on software quality. So if software requirement
is, that software has to be configured MANUALLY for safe operation,
packaging should not enforce the opposite. I tried to find someone
with deep systemd knowledge to create a solution that allows

* default packaging handling of init files _PLUS_

* fulfilling security requirements

but failed. At the moment, the included scripts fulfill the security
requirements as stated above, they seem to work flawless on Debian
Stretch and I do not know, how to improve them any further.

> - This packaging doesn't follow any sort of standard.
>   + There is no way to simply grab the source and build a package.

See below. All questions regarding preferred package building
data storage location were unanswered yet, therefore this part
is missing yet.

>   + I would suggest creating a debian packaging repo (on alioth or salsa).
>   + Using gbp will make it substantially easier for others to review.
>   + Following a standard also means others can easily hop in and help.

That was exactly, what I was asking for multiple times. According
to https://wiki.debian.org/Alioth/Git what is the most suitable
repository mode to use for this type of package (collab/project/
personal)?

> - Tests are included, but nothing is done with them.
>   + If you built them, use them!

They are included in upstream for development time testing. As
it takes ages arguing if 60 characters is an appropriate line
length for d/control files, I assume, that there will be no time
for automated unit test in near future.

> Not Debian-centric:
> 
> - Why is there no Makefile?

Because nothing is built - the Python files are run via interpreter
and JIT for security reasons. See above.

>   + What Install.txt provides should be available via make.

I can add that. Just to be sure and to avoid useless work: I assumed
that "make install" produces poor quality installs as they cannot
be undone easily. Thus nobody should be encouraged to develop
software that way. For development, everything is prepared to
run all tests as non-root build user. So no "make install" is
needed for software development. For rollout staging/production
tests, in my opition, the packages should be used.

If above assumption is wrong or someone or something (Debian build
systems CI, SOPs?) really needs a "make install", I will add it.
If we cannot name anyone in need and this is just a kind of
"we did that for 20 years" argument, I would rather leave them away.

>   + It's also where you would stick bits of code to generate documentation.

Not according to the Debian maintainer's guide, see
https://www.debian.org/doc/manuals/maint-guide/dother.en.html
5.15.3. manpage.xml.ex

This is also, what the debian package template generator proposes.

> - The "release" version v0.0.0 suggests incomplete work.
>   + There has been no release since v0.0.0 (in 2016), despite code changes.

Please suggest any other number, you like to see. The software
is the port of well-tested previous software to Python2. The package
as provided via Debian mentors is working without any flaws 24/7
on multiple machines. As long as there are no other users, there
will be no additional use-cases, no additional requirements and
hence no need for newer versions. So at the moment the next update
is scheduled, when Pyhton3 is discontinued and replaced by Python4
or when systemd is outdated.

> - Why are man pages under debian/?
>   + Make them a part of the software, not the package.
>   + Generate the file man page during build (using make).

Because https://www.debian.org/doc/manuals/maint-guide/dother.en.html
suggests to do so, see above.

> Personally, I don't understand what this backup solution offers over others.
> Even after discussing this on IRC, I still really don't have any idea what it
> offers over other solutions. I feel like a lot of buzz words are being
> arbitrarily assigned to the description. This makes it hard to follow what is
> fact and what is opinion.
> 
> Perhaps:
> Guerilla Backup is a backup solution for small and unstable networks. It can
> use your choice of transports (ssh, telnet, xmpp, etc.) to push backups to a
> distributed cluster. Guerilla Backup ensures security by pre-encrypting data
> before upload and only this key can be used to decrypt the data.

Thanks for this suggestion. I tried to merge your suggestions
with additional information from the core requirements. Do you
think, that this description gives a better impression? I also
added a warning to scare away users, that may not have means to
configure and operate such a system anyway.

============================
GuerillaBackup is a backup solution for tailoring special purpose
backup data flows, e.g. in rugged environments, with special legal
constraints (privacy regulations, cross border data storage), need
for integration of custom data processing, auditing or quality
assurance code. It is kept small to ease code audits and does
not attempt to duplicate features, stable and trusted high quality
encryption, networking or storage solutions already provide. So
for example it can be integrated with your choice of trusted transports
(ssh, SSL, xmpp, etc.) to transfer backups to other nodes according
to predefined or custom transfer policies. GuerillaBackup ensures
security by encrypting data at source and only this key can be
used to decrypt the data.

WARNING: If you are familiar with GDPR, ITIL-service-strategy/design
ISO-27k, ... this software will allow you to create custom solutions
fulfilling your needs for high quality, legally sound backup and
archiving data flows. If you do NOT run production with those
(or similar terms) in mind, you should look out for something
else.

See /usr/share/doc/guerillabackup/Design.txt.gz section "Requirements"
for more information.
============================

> There's no real distinction between opinions, technical info, or general user
> info. It makes it pretty much impossible to know what someone is getting with
> this software.

Therefore the reference to the software requirements is included.
The requirements correspond to all claims made in the description
and each requirement is fulfilled by the current implementation.

What else should be cited to show, that a claim is legit?

Best regards,
hd

Bug#849754: guerillabackup review

Reply via email to