Hello Michael, Thanks for you detailed review.
Michael Lustfield wrote: > I reviewed this packaging and came up with some issues: > > - The description does nothing to explain what makes this solution unique. > + Why is this special? > + How does it even work? > + This should be easily gleaned from the package description. This is the current description for reference. There is similar topic at the end of the mail, where improvements are discussed in detail. =========================== The tool supports asynchronous, local-coordinated, distributed secure backup generation, forwarding, verification, storage and deletion even in rugged environments. The system is open to integration of own code. The codebase is very small and kept simple to ease auditing. . Unlike business backup solutions, guerillabackup also considers following conditions that might be encountered with private use: * infrastructure uptime below 99%, e.g. machines not always powered * poor network connectivity, e.g. mobile access, parallel video streaming * poor physical access and anti-theft protection of machines * no security monitoring of machines . See /usr/share/doc/guerillabackup/Design.txt.gz section "Requirements" for more information. =========================== > - There is a lintian override for a legitimate mistake (unindented list). Could you please point out the legitimate mistake? The 4 bullet points above form a list. If you remove them, lintian will not complain any more. > - d/patches is scary... > + Do not create lintian overrides using d/patches. Where should they be put instead? > + Keep patches to essential changes (not syntax and language changes). > + Do not perform such a massive application rewrite using d/patches. The size of patches is mostly due to a previous review arguing for Python2 to Python3 change. As the upstream version IS written in Python2 and I cannot change the past, those changes are applied as patch. As the changes are very intrusive, they affect nearly all files, thus the patch is generated automatically. I know, that this intermediate state is not nice. If the package is accepted, the next upstream version will include them already and thus the patches will not require them any more - acceptance will shape the development goals of upstream. As long as no acceptance can be reached, I do not want to change/branch upstream in each direction the contraticting review comments suggest, thus creating a large mess of parallel versions. > + That change should be sent upstream... not held local. They will go upstream, as soon as it is clear, that they are needed. Without moving towards acceptance of the packet, the changes are definitely not needed by anyone. Downstream shapes upstream and vice versa. > - Absolutely DO NOT generate lintian-override files from d/patches. Where should they be put instead? Would this be the correct solution? * Put them in d/guerillabackup.lintian-overrides * Change d/rules somehow, so that "dh_lintian" is called automatically? > - d/control should wrap long lines. Looking into d/control all lines are wrapped at the first whitespace after 60 characters. Currently the longest line is currently 70 characters (see d/control posted above). This is significantly beyond typical terminal width and even ~5 characters below the average line length of e.g. util-linux control file in Debian Stretch. What are the Debian rules for line length? Below 50? > - d/rules could use some more whitespace. Where? The file is just 37 lines and in my opinion not hard to read. Any whitespace I could think of would just decrease readability by stretching the content. Each paragraph, containing at most 3 commands, is separated from the next by a blank line. Should there be a blank line between each single command? > - The py2 dependency is bad (very bad, especially at this point). > + There are only two years left until py2 EoL. Where did you find that? All "grep -i -r python2" hits are in files not to be included (see Py2/3 patch comments from above) or are centered around py2/py3 migration itself. Maybe I missed something. > - d/links suggests binaries are not being installed into standard locations. Because Python when invoked via she-bang adds by default the binaries' directory to the search path. I do not want to have all other binaries of a target system (some might be Python code by themself) on the search path for security reasons. As backups are usually run as root, a single unrelated package could thus allow to gain privileges. > - d/NEWS is not needed here and will be needlessly noisy for users. Removed. > - d/install should use wildcards to grab docs, instead of a file list This was done with forward compatibility in mind - future documentation without relevance for the package should not be included automatically to keep the package small and to avoid having useless documentation included (whitelisting vs. blacklisting). Still I changed it to /*.txt . At the moment the number of files is small enough to stick to a "card blanche" approach. > - d/install does not need the leading "./" Already removed in yesterday uploaded version. > - d/compat should probably get a bump to 11 > + in d/control as well Even if mentors.debian upload platform will then complain about it? That was the exact reason, why I kept it lower until now - there is no ring to bind them all. I increased to 11 in d/compat, but keeping it at 10.0.0 in control - otherwise the package cannot be built any more. > - This is a python package, but raw source is being installed to /usr/lib. > + The python scripts should be compiled. This was a discussion point in previous review: as backup should also work when disasters are evolving, it should not be too fragile. The code base is very small, so Python JIT does not really degrade performace. On the other hand: it was not possible to find any documentation on .pyc file generation/update/use failure modes. So they just increase failure- and attack surface to an unknown extend without any reasonable benefit. Therefore .pyc generation was disabled. If Debian appreciates taking safety/security decisions without being backed by facts, I have not problem changing the behaviour for the Debian version of the package. At the moment they are not generated and even when generated, use of them is disabled also - similar to defense in depth. 2 Options: * Just generate them (and do not use them) to make build infrastructure happy. * Generate and enable them. > + There should be no "src/*" inclusion in d/install. See above. > - Using d/install is not the correct way to include init script. > - The debhelper scripts have already been commented on. > - The interaction with the init system was also already commented on. What would be the correct way for those 3 topics? Please consider also safety/security requirements regarding systemd configuration initialization: ================== # Do not enable the services on fresh install by default. The # user should do that manually for those services, he really wants # to run. # Do not start the services after install or update. Without this # option, all units would be started during upgrade, even those # not enabled. When user did not enable them, dpkg should respect # that. Those enabled will still be started by custom postinst # code. ================== Maybe I am misguided, but I assuemed that packaging should not have a negative impact on software quality. So if software requirement is, that software has to be configured MANUALLY for safe operation, packaging should not enforce the opposite. I tried to find someone with deep systemd knowledge to create a solution that allows * default packaging handling of init files _PLUS_ * fulfilling security requirements but failed. At the moment, the included scripts fulfill the security requirements as stated above, they seem to work flawless on Debian Stretch and I do not know, how to improve them any further. > - This packaging doesn't follow any sort of standard. > + There is no way to simply grab the source and build a package. See below. All questions regarding preferred package building data storage location were unanswered yet, therefore this part is missing yet. > + I would suggest creating a debian packaging repo (on alioth or salsa). > + Using gbp will make it substantially easier for others to review. > + Following a standard also means others can easily hop in and help. That was exactly, what I was asking for multiple times. According to https://wiki.debian.org/Alioth/Git what is the most suitable repository mode to use for this type of package (collab/project/ personal)? > - Tests are included, but nothing is done with them. > + If you built them, use them! They are included in upstream for development time testing. As it takes ages arguing if 60 characters is an appropriate line length for d/control files, I assume, that there will be no time for automated unit test in near future. > Not Debian-centric: > > - Why is there no Makefile? Because nothing is built - the Python files are run via interpreter and JIT for security reasons. See above. > + What Install.txt provides should be available via make. I can add that. Just to be sure and to avoid useless work: I assumed that "make install" produces poor quality installs as they cannot be undone easily. Thus nobody should be encouraged to develop software that way. For development, everything is prepared to run all tests as non-root build user. So no "make install" is needed for software development. For rollout staging/production tests, in my opition, the packages should be used. If above assumption is wrong or someone or something (Debian build systems CI, SOPs?) really needs a "make install", I will add it. If we cannot name anyone in need and this is just a kind of "we did that for 20 years" argument, I would rather leave them away. > + It's also where you would stick bits of code to generate documentation. Not according to the Debian maintainer's guide, see https://www.debian.org/doc/manuals/maint-guide/dother.en.html 5.15.3. manpage.xml.ex This is also, what the debian package template generator proposes. > - The "release" version v0.0.0 suggests incomplete work. > + There has been no release since v0.0.0 (in 2016), despite code changes. Please suggest any other number, you like to see. The software is the port of well-tested previous software to Python2. The package as provided via Debian mentors is working without any flaws 24/7 on multiple machines. As long as there are no other users, there will be no additional use-cases, no additional requirements and hence no need for newer versions. So at the moment the next update is scheduled, when Pyhton3 is discontinued and replaced by Python4 or when systemd is outdated. > - Why are man pages under debian/? > + Make them a part of the software, not the package. > + Generate the file man page during build (using make). Because https://www.debian.org/doc/manuals/maint-guide/dother.en.html suggests to do so, see above. > Personally, I don't understand what this backup solution offers over others. > Even after discussing this on IRC, I still really don't have any idea what it > offers over other solutions. I feel like a lot of buzz words are being > arbitrarily assigned to the description. This makes it hard to follow what is > fact and what is opinion. > > Perhaps: > Guerilla Backup is a backup solution for small and unstable networks. It can > use your choice of transports (ssh, telnet, xmpp, etc.) to push backups to a > distributed cluster. Guerilla Backup ensures security by pre-encrypting data > before upload and only this key can be used to decrypt the data. Thanks for this suggestion. I tried to merge your suggestions with additional information from the core requirements. Do you think, that this description gives a better impression? I also added a warning to scare away users, that may not have means to configure and operate such a system anyway. ============================ GuerillaBackup is a backup solution for tailoring special purpose backup data flows, e.g. in rugged environments, with special legal constraints (privacy regulations, cross border data storage), need for integration of custom data processing, auditing or quality assurance code. It is kept small to ease code audits and does not attempt to duplicate features, stable and trusted high quality encryption, networking or storage solutions already provide. So for example it can be integrated with your choice of trusted transports (ssh, SSL, xmpp, etc.) to transfer backups to other nodes according to predefined or custom transfer policies. GuerillaBackup ensures security by encrypting data at source and only this key can be used to decrypt the data. WARNING: If you are familiar with GDPR, ITIL-service-strategy/design ISO-27k, ... this software will allow you to create custom solutions fulfilling your needs for high quality, legally sound backup and archiving data flows. If you do NOT run production with those (or similar terms) in mind, you should look out for something else. See /usr/share/doc/guerillabackup/Design.txt.gz section "Requirements" for more information. ============================ > There's no real distinction between opinions, technical info, or general user > info. It makes it pretty much impossible to know what someone is getting with > this software. Therefore the reference to the software requirements is included. The requirements correspond to all claims made in the description and each requirement is fulfilled by the current implementation. What else should be cited to show, that a claim is legit? Best regards, hd