I noticed that the REST API examples don't mention anything about deleting a particular version of a repository. This is a use case that we need to support.
-Dennis

On Wed, May 17, 2017 at 10:03 PM, Michael Hrivnak <mhriv...@redhat.com> wrote:
> We've discussed versioned repositories and their merits in the past, but
> I'd like to propose a specific direction, and inclusion in 3.0. As a recap
> of goals, versions can help us answer two important questions about the
> history of a repository:
>
> 1) What set of content is in a specific version of a repository?
> 2) What changed between two arbitrary versions of a repository?
>
> I am proposing a model where Pulp creates a new version of a repository
> for every operation that changes that repo's content. For example, a sync
> task would create a single new version.
>
> Basic Example
> -----------
>
> - You create repository "foo".
> - You sync repository "foo", which produces version 1 of that repo.
> - You sync once per day for some period of time, automatically creating a
>   new version each time.
> - You publish repo "foo", which defaults to publishing the most recent
>   version.
> - You don't like something that's new in the repo, so you roll back by
>   publishing a previous version.
>
> Data Model Basics
> -----------
>
> In the past we've stored the relationship between a content unit and a
> repo as a standard many-to-many through table. There's a reference to a
> unit, and a reference to a repo.
>
> The version scheme I'm pitching adds two new fields to that through table:
>
> vadded - a foreign key to the repo version in which this content unit was
>   added
> vremoved - a foreign key to the repo version in which this content unit
>   was removed. This can be null.
>
> Multiple entries can exist for the same content unit and repo, so long as
> a new one is not added until the previous one's "vremoved" field is set.
>
> With this structure, it is easy to query the database to answer both
> questions we started with.
>
> REST API
> ----------
>
> Some endpoint will be made that gives access to the versions of a specific
> repository.
> Ideally we would have a nested endpoint like this:
>
> /api/v3/repositories/foo/versions/
>
> But nested views have been a problem for us with DRF (django rest
> framework). If we aren't able to make that happen, I've gotten this to work
> in my PoC branch:
>
> /api/v3/repositoryversions/?repository=foo
>
> It's not yet clear how best to represent content through the REST API. A
> nested endpoint within the repo version object would be ideal.
>
> /api/v3/repositories/foo/versions/4/content/
>
> Operations on a repo where a version could be chosen, such as a publish,
> should default to the latest version. It's an open question how best to
> represent that, and perhaps it takes the form of two endpoints:
>
> default to latest: POST /api/v3/repositories/foo/distributors/bar/publish
>
> specify a version: POST /api/v3/repositories/foo/versions/4/publish
>
> But that's just one idea. Much about our REST API layout has yet to be
> written in stone, and we have flexibility.
>
> Orphans
> ---------
>
> Notice that this changes the orphan workflow. Removing a content unit from
> a repo doesn't make it an orphan. This helps reduce the need to run an
> orphan cleanup task, which in turn helps avoid the inherent race condition
> that task can introduce.
>
> Trim History
> ---------
>
> But you may not want to keep history forever, so a valuable feature will
> be the ability to trim history. I think this would just be an operation
> that squashes a bunch of versions together, and it could optionally take
> that opportunity to immediately delete a content unit that becomes an
> orphan.
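
Inline note on the Data Model Basics section: here is a minimal in-memory sketch of how the vadded/vremoved fields could answer the two questions from the top of the proposal. It is not the PoC's content() method. The RepoContent name, the plain integer version numbers, and the rule that a unit is absent from the version named in vremoved are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class RepoContent:
    """One through-table row for a single repo (hypothetical name)."""
    unit: str
    vadded: int                      # version in which the unit was added
    vremoved: Optional[int] = None   # version in which it was removed, if any


def content(rows: List[RepoContent], version: int) -> Set[str]:
    """Question 1: which units are in `version`?

    Assumption: a unit is present from vadded up to, but not including,
    the version recorded in vremoved.
    """
    return {
        r.unit
        for r in rows
        if r.vadded <= version
        and (r.vremoved is None or r.vremoved > version)
    }


def changes(rows: List[RepoContent], old: int, new: int) -> Tuple[Set[str], Set[str]]:
    """Question 2: return (added, removed) going from `old` to `new`."""
    before, after = content(rows, old), content(rows, new)
    return after - before, before - after


rows = [
    RepoContent("a", vadded=1),
    RepoContent("b", vadded=1, vremoved=3),
    RepoContent("c", vadded=2),
    RepoContent("b", vadded=4),  # re-added only after the earlier row was closed
]

print(sorted(content(rows, 3)))   # units in version 3
print(changes(rows, 1, 3))        # what changed between versions 1 and 3
```

In a real through table the same filter would presumably be a single query on vadded and vremoved, which is why both questions stay cheap to answer.
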
>
> Illustrating the workflow, if you wanted to squash history prior to
> version 10, the task would:
>
> - delete all of a repo's relationships in the through table where vremoved
>   is a version <= 10
> - optionally check if each content unit is now an orphan and remove if so
> - update all remaining entries where vadded < 10 by setting vadded to 10
>
> PoC
> --------
>
> I have a branch with proof-of-concept code here:
>
> https://github.com/pulp/pulp/compare/3.0-dev...mhrivnak:versioned-repos?expand=1
>
> The models are the most interesting place to look. In particular, I'm very
> pleased with how simple the "content()" method is, which returns a QuerySet
> matching all the content in a given version.
>
> The rest is REST ;) API stuff mostly, which isn't all that interesting
> except to demonstrate how the data could potentially be exposed. You can
> run the included tests (which I made just for dev purposes- not sure if
> they deserve a long-term home) which are found in the root of the git repo,
> and that loads some data into the database. Then you can hit this endpoint
> as an example:
>
> http://yourhost:8000/api/v3/repositoryversions/?repository=r1
>
> Obviously this code is rough, so please consider it for directional and
> conceptual purposes only. Assume major additions and improvements if we
> follow through on this concept.
>
> Value
> -------
>
> Tracking history in this way opens up great possibilities. Some examples:
>
> Promotion could become a matter of having two publishers on a repo with
> different settings, one for "testing" and one for "production", and just
> publishing whichever version you like with each. Multiple repos and copy
> operations are no longer needed for promotion. Austin suggested that the
> ability to tag versions with arbitrary key:value pairs could enhance this
> use case.
>
> An added concept, which could come post-3.0, is tracking publications more
> explicitly and associating each with a version.
> Although I could see a case for laying this groundwork now before the API
> is locked down. Promotion could become more about making a publication
> available in a different location, rather than re-creating it. We'd also
> know which content is part of a publication, and guarantee that content
> doesn't get removed before the publication does. This is a deficiency we
> have in Pulp 2.
>
> Pulp-to-pulp sync could become very efficient since they could easily
> replicate only the changes since the last sync.
>
> Incremental exports become more concrete. Rather than depending on a
> timestamp, you can know with certainty which version you have in the remote
> location, and thus which newer versions need to be exported.
>
> We could add a "finalized" boolean or similar to a version, and use that
> to know if it was successfully completed. If not, for example if a sync
> task stopped abruptly, the incomplete version could easily be recognized
> and removed.
>
> Feedback Please
> ----------
>
> Please ask questions, provide feedback, add ideas, suggest alternatives,
> etc. I'm perfectly happy even throwing this PoC away if we come up with
> something better.
>
> Thanks!
>
> --
>
> Michael Hrivnak
>
> Principal Software Engineer, RHCE
>
> Red Hat
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
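
For reference, the squash steps listed under Trim History can be sketched in plain Python as below. This is only an illustration under assumptions, not code from the PoC branch: RepoContent and squash_before are hypothetical names, versions are treated as ordered integers, and a unit is assumed to be absent from the version recorded in vremoved. The optional orphan step is reduced to reporting candidates, since a real check would also have to look at every other repo.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class RepoContent:
    """One through-table row for a single repo (hypothetical name)."""
    unit: str
    vadded: int                      # version in which the unit was added
    vremoved: Optional[int] = None   # version in which it was removed, if any


def content(rows: List[RepoContent], version: int) -> Set[str]:
    """Units present in `version` (assumes absence from the vremoved version)."""
    return {
        r.unit
        for r in rows
        if r.vadded <= version
        and (r.vremoved is None or r.vremoved > version)
    }


def squash_before(
    rows: List[RepoContent], target: int
) -> Tuple[List[RepoContent], Set[str]]:
    """Squash history prior to `target`; return (new rows, orphan candidates)."""
    # Step 1: drop relationships that ended at or before `target`.
    kept = [r for r in rows if r.vremoved is None or r.vremoved > target]
    # Step 2 (optional in the proposal): units with no remaining row in this
    # repo are only *candidates*; other repos may still reference them.
    orphan_candidates = {r.unit for r in rows} - {r.unit for r in kept}
    # Step 3: anything added earlier now counts as added in `target`.
    squashed = [
        replace(r, vadded=target) if r.vadded < target else r for r in kept
    ]
    return squashed, orphan_candidates


rows = [
    RepoContent("a", vadded=1),
    RepoContent("b", vadded=1, vremoved=3),
    RepoContent("b", vadded=4),
    RepoContent("c", vadded=2),
    RepoContent("d", vadded=2, vremoved=5),
    RepoContent("e", vadded=1, vremoved=2),
]
squashed, candidates = squash_before(rows, 3)

# Versions at or after the squash point must look exactly as they did before.
for v in (3, 4, 5):
    assert content(squashed, v) == content(rows, v)
print(sorted(candidates))
```

The loop at the end is the property that matters here: squashing should only discard the ability to ask about versions older than the target, never change what any surviving version contains.
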