The Docker registry protocol has very limited search and index
functionality - you can (if enabled on the server) list all repositories,
list tags for each repository, and then download the manifest for each tag.
This obviously is very inefficient if you want to ask questions like:

 * "What containers are available with the org.flatpak.metadata
annotation, and what are their human-readable names?"
 * "Is there a newer version available of any of these 50 installed
applications?" (especially without fingerprinting your machine by asking
about each application)
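
To give a rough sense of the cost, here is a small sketch that enumerates the
requests a client would need to crawl a registry using the standard
/v2/_catalog, /v2/<name>/tags/list and /v2/<name>/manifests/<tag> endpoints.
The host name and the repository/tag data are made up for illustration, and
pagination of the catalog is ignored.

```python
def crawl_requests(base_url, repo_tags):
    """Return every GET URL needed to read all manifests in a registry.

    repo_tags is a hypothetical stand-in for what the server would report:
    a mapping of repository name -> list of tag names.
    """
    urls = [f"{base_url}/v2/_catalog"]  # one request to list repositories
    for repo, tags in sorted(repo_tags.items()):
        # one request per repository to list its tags
        urls.append(f"{base_url}/v2/{repo}/tags/list")
        for tag in tags:
            # one request per tag to download its manifest
            urls.append(f"{base_url}/v2/{repo}/manifests/{tag}")
    return urls


# Made-up example data: 2 repositories, 3 tags in total.
example = {
    "org.gnome.gedit": ["latest", "stable"],
    "org.gnome.maps": ["latest"],
}
urls = crawl_requests("https://registry.example.com", example)
print(len(urls))  # 1 catalog + 2 tag lists + 3 manifests = 6
```

The request count grows as 1 + repositories + tags, which is why answering
either question above by crawling does not scale.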

We're planning to distribute desktop applications as Flatpaks on
registry.fedoraproject.org, and to make GNOME Software and updates work
properly we need a way to efficiently query the metadata of the
registry.

To handle this, I wrote a new server "Flagstate" -
https://github.com/owtaylor/flagstate - It sits alongside a
docker/distribution registry instance. At initialization, it retrieves all
the metadata from the registry server and stores it in a database. It then
incrementally updates the database in response to webhook notifications
from the registry.
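
As an illustration of the incremental step (not Flagstate's actual code), the
sketch below takes a notification envelope in the JSON shape that
docker/distribution posts to webhook endpoints and works out which
repositories need to be re-read. The function name and the sample payload are
assumptions made for this example; a real indexer would also deal with
authentication, retries, and finer-grained updates.

```python
import json


def repositories_to_refresh(envelope_json):
    """Return the sorted names of repositories changed by a notification.

    Only "push" and "delete" events change what the index should contain;
    "pull" events are ignored.
    """
    envelope = json.loads(envelope_json)
    changed = set()
    for event in envelope.get("events", []):
        if event.get("action") in ("push", "delete"):
            repo = event.get("target", {}).get("repository")
            if repo:
                changed.add(repo)
    return sorted(changed)


# Made-up sample payload with the fields this sketch relies on.
sample = json.dumps({
    "events": [
        {"action": "push",
         "target": {"repository": "org.gnome.gedit", "tag": "latest"}},
        {"action": "pull",
         "target": {"repository": "org.gnome.maps", "tag": "latest"}},
        {"action": "delete",
         "target": {"repository": "org.gnome.gedit"}},
    ]
})
print(repositories_to_refresh(sample))
```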

The index server supports queries. A hypothetical URL would be:


https://registry.fedoraproject.org/index/static?architecture=amd64&annotation:org.flatpak.metadata:exists&label=latest

This returns a JSON dump of the metadata for matching containers. The
response is designed to be cacheable, with consistent ordering and ETag
support. See
https://github.com/owtaylor/flagstate/blob/master/docs/protocol.md for
details.
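
To show why consistent ordering matters for caching, here is a minimal
sketch, under assumed field names ("name", "digest") and an assumed
hash-based ETag scheme. It is not taken from Flagstate: it only demonstrates
that sorting results and serializing them deterministically yields the same
ETag for the same data, so a client sending If-None-Match can get a 304.

```python
import hashlib
import json


def render_index(results):
    """Serialize results deterministically and derive an ETag from the bytes."""
    ordered = sorted(results, key=lambda r: (r["name"], r["digest"]))
    body = json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    return body, etag


def respond(results, if_none_match=None):
    """Return (status, etag, body), answering 304 when the client copy is current."""
    body, etag = render_index(results)
    if if_none_match == etag:
        return 304, etag, b""
    return 200, etag, body


# Hypothetical records; the same set in a different order gives the same ETag.
a = {"name": "org.gnome.gedit", "digest": "sha256:aaa"}
b = {"name": "org.gnome.maps", "digest": "sha256:bbb"}
status, etag, body = respond([a, b])
status_again, _, body_again = respond([b, a], if_none_match=etag)
print(status, status_again)
```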

In addition to the Flatpak use case, this would also be useful for the
Cockpit project, which wants to provide a nice interface for browsing
Cockpit plugins that are packaged as container images. Additional future
use cases that could be accomplished with a fairly simple extension of the
Flagstate code include providing a backend for CLI searches (docker search,
podman search, ...) and providing a more comprehensive web frontend than
what is currently on registry.fedoraproject.org, one that shows names,
descriptions, and so forth extracted from container labels.

I'd like to propose that we work toward a deployment of this on Fedora
infrastructure.

Mini-FAQ
========

Can't this be done by just traversing the registry in a cron job and
writing a static index?
  Without the incremental update process, it's going to be impossible to
provide a reasonable update frequency for a medium-large index. It would be
possible to static-generate a fixed set of queries instead of having
dynamic responses, but I think the current approach is more flexible - it
avoids having to hard-code, say, Flatpak specifics in the server
configuration.

Are there existing projects that could be used instead of writing our own?
  Not that I'm aware of - 'reg', which is used for the current HTML
frontend for registry.fedoraproject.org, takes the above approach of
traversing the entire registry and writing a static index.

How is the data stored?
  Data is stored in a PostgreSQL database - this is purely a shadow of
the data as canonically stored in the registry, so has minimal backup
needs. PostgreSQL 9.4 or newer is needed for the jsonb functionality,
though this requirement could be removed if necessary.

Why the language/license choice?
   Flagstate is written in Go and licensed under the Apache License 2.0 to
match the docker/distribution codebase and maximize the chance of creating
a community.

Is it secure?
   It's hard to say without some careful checking. The index only indexes
public information and care is taken when building SQL queries. DoS is
always a tricky thing to handle on any unauthenticated API - setting the
statement_timeout PostgreSQL parameter might make accidental DoS
harder.
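
As a generic illustration of careful query building (not Flagstate's actual
code or schema), the sketch below maps request filters onto a whitelist of
column names and passes every user-supplied value as a bound parameter. The
table name "image", the column names, and the function itself are
hypothetical; the placeholders use the %s style accepted by psycopg2, and
"?" is PostgreSQL's jsonb key-existence operator.

```python
# Hypothetical mapping of accepted query keys to column names.
ALLOWED_COLUMNS = {"architecture": "architecture", "os": "os"}


def build_query(filters, annotation_exists=None):
    """Return (sql, params) with user input confined to params."""
    clauses, params = [], []
    for key, value in sorted(filters.items()):
        if key not in ALLOWED_COLUMNS:
            # Unknown keys are rejected rather than interpolated into SQL.
            raise ValueError(f"unsupported filter: {key}")
        clauses.append(f"{ALLOWED_COLUMNS[key]} = %s")
        params.append(value)
    if annotation_exists is not None:
        # jsonb key-existence test; the key name is still a bound parameter.
        clauses.append("annotations ? %s")
        params.append(annotation_exists)
    where = " AND ".join(clauses) if clauses else "TRUE"
    # A fixed ORDER BY keeps the result order consistent between requests.
    return f"SELECT digest FROM image WHERE {where} ORDER BY digest", params


sql, params = build_query({"architecture": "amd64"}, "org.flatpak.metadata")
print(sql)
print(params)
```

Because values never become part of the SQL text, a hostile value such as
"amd64'; DROP TABLE image; --" is only ever compared against the column.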
_______________________________________________
infrastructure mailing list -- infrastructure@lists.fedoraproject.org