Hi all. I wrote this draft design doc / deployment plan for the tag-to-upload service, perhaps best summarised by Sean like this:

  We designed and implemented a system to make it possible for DDs to
  upload new versions of packages by simply pushing a specially
  formatted git tag to salsa.

  Please see this blog post to learn about how it works:
  https://spwhitton.name/blog/entry/tag2upload/

The server side of this is not running yet and there is some work to
do for that.  We've had a number of peripheral conversations, and
informal internal reviews, but I think it's the stage now to have a
public design review etc.

I'm CCing this to -devel because I just did a lightning talk demo of
the prototype and IME many people are interested in these kinds of
questions.

Right now this document is maintained here:
  https://salsa.debian.org/dgit-team/dgit/tree/wip.tag2upl-draft
but NB that that is a potentially rewinding branch.  (I probably won't
rewind it until it's time to fold it into master at which point I may
just delete it.)

Ian.


TAG-TO-UPLOAD - DEBIAN - DRAFT DESIGN / DEPLOYMENT PLAN
=======================================================

Overall structure and dataflow
------------------------------

 * Uploader (DD or DM) makes signed git tag (containing metadata
   forming instructions to tag2upload service)

 * Uploader pushes said tag to salsa.  [1]

 * salsa sends webhook to tag2upload service.

 * tag2upload service
    : provides an HTTPS service accessible to salsa's IP addrs
    : fishes url and tag name out of webhook json
    ! checks that url is basically sane
    - retrieves tag data (git shallow clone)
    ! parses the tag metadata
    ! checks to see if it is relevant
    ! verifies signature
    ! checks to see if signed by DD, or DM for appropriate package
    - obtains relevant git history
    - obtains, if applicable, orig tarball from archive
    - makes source package
    # signs source package and "dgit view" git tag
    - pushes history and both tags to dgit git server
    - uploads source package to archive

 * archive publishes package as normal

[1] In principle other git servers would be possible but it would have
to be restricted to ones where we can either avoid, or stop, them
being used as a channel for a DoS attack against the tag2upload
service.

Service architecture
--------------------

I propose the following architecture for the tag2upload service.

 * Packet filter limiting the incoming connections to salsa.

 * Conventional webserver offering TLS and using Let's Encrypt.
   (Alternatively, HTTP could be used, but in the future we might want
   to handle embargoed security uploads so let's not.)

 * Web-service-style "application server" written in some scripting
   language listens on a local TCP port, handles HTTP connections
   proxied by the webserver, parses the JSON, and connects to:

 * Trusted service daemon.  Listens on a TCP connection and accepts a
   simple line-based "url tag" protocol.  Checks urls and tags for
   basic syntax and sanity (eg that it has the right protocol and
   host).  Keeps track of incoming requests in a sqlite3 database so
   that execution can be deferred and retried as applicable.  Spawns
   per-request worker children.

 * Request processor.  Trusted.  Does the trusted parts above.

 * Some VM or container or maybe chroot.  Instantiated by request
   processor via adt-virt protocol.  Request processor controls this
   by sending it commands (via the adt-virt facility for this).

 * In the VM, git is used to fetch all the bits and dgit does the
   actual source package generation work.

 * Trusted service daemon needs access to its GPG key which should be
   on a hardware token and not accessible to the VM instances.
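
As a rough illustration of the trusted service daemon's bookkeeping
described above (a line-based "url tag" protocol, with requests kept
in sqlite3 so they can be deferred and retried), here is a minimal
sketch in Python.  Everything in it is an assumption made for
illustration: the table layout, the function names
(parse_request_line, enqueue, next_due, record_failure), the
exponential backoff and the retry limit are hypothetical and are not
taken from dgit or from the prototype.

```python
import sqlite3
import time

# Hypothetical schema: one row per (url, tag) the daemon has been told about.
SCHEMA = """
CREATE TABLE IF NOT EXISTS requests (
    id         INTEGER PRIMARY KEY,
    url        TEXT NOT NULL,
    tag        TEXT NOT NULL,
    attempts   INTEGER NOT NULL DEFAULT 0,
    not_before REAL NOT NULL,
    status     TEXT NOT NULL DEFAULT 'queued',
    UNIQUE (url, tag)
);
"""


def open_queue(path=":memory:"):
    """Open (and if necessary create) the request database."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def parse_request_line(line):
    """Split one "url tag" protocol line; return (url, tag) or None."""
    fields = line.strip().split(" ")
    if len(fields) != 2 or not all(fields):
        return None
    return fields[0], fields[1]


def enqueue(db, url, tag, now=None):
    """Record a request; return False if this (url, tag) is already known."""
    now = time.time() if now is None else now
    with db:
        known = db.execute(
            "SELECT 1 FROM requests WHERE url = ? AND tag = ?", (url, tag)
        ).fetchone()
        if known:
            return False
        db.execute(
            "INSERT INTO requests (url, tag, not_before) VALUES (?, ?, ?)",
            (url, tag, now),
        )
    return True


def next_due(db, now=None):
    """Return (id, url, tag, attempts) for the oldest due request, or None."""
    now = time.time() if now is None else now
    return db.execute(
        "SELECT id, url, tag, attempts FROM requests "
        "WHERE status = 'queued' AND not_before <= ? "
        "ORDER BY not_before LIMIT 1",
        (now,),
    ).fetchone()


def record_failure(db, req_id, now=None, max_attempts=5, base_delay=60):
    """Defer a failed request with exponential backoff, or give up on it."""
    now = time.time() if now is None else now
    with db:
        (attempts,) = db.execute(
            "SELECT attempts FROM requests WHERE id = ?", (req_id,)
        ).fetchone()
        attempts += 1
        if attempts >= max_attempts:
            db.execute(
                "UPDATE requests SET attempts = ?, status = 'failed' "
                "WHERE id = ?",
                (attempts, req_id),
            )
        else:
            delay = base_delay * 2 ** (attempts - 1)
            db.execute(
                "UPDATE requests SET attempts = ?, not_before = ? "
                "WHERE id = ?",
                (attempts, now + delay, req_id),
            )
```

A daemon built along these lines would call next_due in its main loop
and spawn a worker child for each row it gets back.  Keeping the
retry state in the database rather than in memory means a restart of
the daemon does not lose deferred requests.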

Privsep
-------

The tag2upload service will have to have a signing key that can upload
source packages to the archive.

We do not want that signing key to be abused.  In particular, even
though it will be in a hardware token we want to avoid giving
unrestricted access to that key to code which also has a large attack
surface.  In particular, source package construction is very complex.

So there will be a privilege separation arrangement, as described
above.  Different tasks run in different security contexts:

  !  is fully trusted and has access to the signing key

  -  runs in the discardable VM or container, controlled by `!'

  #  is achieved by the `dgit rpush' protocol, where the trusted
     (invoking, signing) part offers a restricted signing oracle to
     the less-trusted (building) part.  The signing oracle will check
     that the files to be signed are roughly in the right form and
     that they name the right source package.  It will construct the
     "dgit view" git tag itself from metadata provided by the building
     part.

  :  can run as different unix users or even different VMs or
     something, if desirable

Reproducibility, metadata and auditing
--------------------------------------

The trusted part of the tag2upload service will keep some logs,
particularly of each tag it is told about and what the disposition of
that was, and when it was retried.

Also, it will send the following information to a public mailing list:

 - The tag object data for any tag it decides to process, before
   it passes it to the VM.
 - A report (more or less, a shell transcript) of each processing
   attempt
 - The list will also be the public email address of the tag2upload
   robot's signing key

The generated .dscs will contain additional fields

 Git-Tag-Tagger: Firstname Surname <email@address>

   "tagger" line from the git tag converted to deb822 format

 Git-Tag-Info: tag=<tagobjid> fp=<fingerprint> algos=1,8

   <tagobjid> is the git object ID of the tag object (if someone
   wants to find this, it can be found on the dgit git server)

   <fingerprint> is the "fingerprint_in_hex" from the VALIDSIG line
   in the gpgv output.  algos is the <pubkey-algo> and <hash-algo>
   (here, 1,8 as examples).

This additional metadata is necessary to be able to tell by looking at
the .dsc who the original uploader was (which might be different to
the maintainer, in the sponsorship case).  (Programs which use the
uploader signature identity will send mails to the mailing list
mentioned above, until they have been updated.  This is not desirable
but not a blocker for deployment.)

The generated .changes will contain copies of the two .dsc fields
above.

The upload will contain a .source_buildinfo.  This will list the
versions of the software running in the VM, which is primarily what
controls the generated .dsc.  It will also list the versions of
dgit-infrastructure and git running in the trusted part, because the
trusted part assembles the tag lines etc. and interprets the git tag.

Eventually hopefully there will be a mode for sbuild (related to
binary build reproduction), or a suitable script, which can verify a
reproduction attempt.  For now the src:dgit test suite will check that
the upload is reproducible if run again in the same environment.

DoS
---

This service is not very resistant to DoS attacks.  In particular,
sending it bad URLs might stall it (since it has to retry failing
URLs).  So we (i) do not expose it to anyone but salsa and (ii) limit
it to trying to fetch salsa urls.
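
Mitigation (ii), only ever fetching salsa urls, could be enforced by
a check along the following lines before a request is queued.  This
is a sketch under assumptions, not the prototype's code: the name
url_is_acceptable, the ALLOWED_HOSTS set, and the specific rules
(https only, no credentials, no explicit port, no query or fragment)
are all hypothetical choices made for illustration.

```python
from urllib.parse import urlsplit

# Assumption: salsa is the only git server the service will fetch from.
ALLOWED_HOSTS = {"salsa.debian.org"}


def url_is_acceptable(url):
    """Return True only for plain https URLs on an allowed host."""
    # Reject whitespace and control characters before any parsing.
    if any(c.isspace() or ord(c) < 0x20 for c in url):
        return False
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    # No user:password@ prefix and no explicit port.
    if parts.username is not None or parts.password is not None:
        return False
    if port is not None:
        return False
    # Compare the whole hostname, so lookalike suffixes do not match.
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    if parts.query or parts.fragment:
        return False
    # Require a non-empty repository path.
    return parts.path.startswith("/") and len(parts.path) > 1
```

Comparing the parsed hostname against an exact set, rather than doing
a prefix or substring match on the raw string, is what stops urls
such as https://salsa.debian.org.evil.example/ or
https://salsa.debian.org@evil.example/ from slipping through.
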
Making very many tags on salsa would stress this tag2upload service a
bit but not fatally, and it would be a DoS against salsa too.

After signature verification, we are much more vulnerable to DoS.  An
approved signer can get the service to do a lot of work.  That is the
purpose of the service, indeed.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.