I am looking forward to discussing the use cases. I hope we can get versioned repositories into 3.0. Thanks everyone for the discussion so far.
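For anyone skimming the thread, here is a minimal sketch of the promotion workflow Michael describes below, assuming a publication holds a reference to the repo version it was created from and a distribution holds a reference to a publication. The class names, fields, and package strings are hypothetical and are not the Pulp 3 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoVersion:
    """Immutable snapshot of a repository's content set (hypothetical model)."""
    repo: str
    number: int
    content: frozenset


@dataclass(frozen=True)
class Publication:
    """A publish result that remembers which repo version it came from."""
    version: RepoVersion
    publisher: str


@dataclass
class Distribution:
    """Serves one publication to clients; the reference can be swapped."""
    name: str
    publication: Publication


def promote(distribution, publication):
    """Promotion is only a pointer update: no copying, no re-publishing."""
    distribution.publication = publication


def deployed_content(distribution):
    """Answer 'what do we have deployed right now?' via the repo version."""
    return distribution.publication.version.content


# Illustrative data only; the package strings are made up.
v1 = RepoVersion("zoo", 1, frozenset({"cowsay-old"}))
v2 = RepoVersion("zoo", 2, frozenset({"cowsay-fixed"}))
production = Distribution("production", Publication(v1, "yum"))
testing = Distribution("testing", Publication(v2, "yum"))

# Testing on version 2 passed, so point production at the same publication.
promote(production, testing.publication)
print(production.publication.version.number, sorted(deployed_content(production)))
```

Because the repo version is immutable, the answer to "what is deployed right now" cannot drift between the publish and the question.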
-Dennis

On Fri, Dec 1, 2017 at 5:16 PM, Brian Bouterse <[email protected]> wrote:

> Thank you all for such great discussion!
>
> To recap some discussion we had today. We are going to look at the
> versioned repos use cases at an upcoming MVP call in the near future
> (probably 12/8). Look for the pulp-list announcement. If you have use cases
> you want to share, you can add them in red in the Versioned Repos section
> of the MVP here:
> https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Minimum_Viable_Product/#Versioned-Repositories
>
> Once the use cases are known, we can look at the PR and see if it fulfills
> them. From the discussion today, the general consensus is that gap will be
> relatively small, which makes including it in Pulp3 feasible.
>
> @misa providing those types of features may be possible. Imagine an
> optional attribute on a repo version named 'frozen' that defaults to True.
> While the latest repo_version for a repo has frozen=False, any action that
> would normally create a new repo version (copy, add/remove, delete, etc)
> would act on the existing repo version and *not* create a new one. Then the
> user can update the frozen attribute of the repo version when they want,
> which commits the transaction as a repo version. I don't think this would
> be too hard to implement.
>
> On Thu, Nov 30, 2017 at 3:20 PM, Michael Hrivnak <[email protected]>
> wrote:
>
>> On Thu, Nov 30, 2017 at 11:43 AM, Mihai Ibanescu <[email protected]> wrote:
>>
>>> I am late to the thread, so I apologize if I repeat things that have
>>> been discussed already.
>>>
>>> Is it a meaningful use case to publish an older version of the repo?
>>> Once published, do you keep track of which version got published, and how
>>> do you decide which version to push next? This seems like a complication to
>>> me.
>>
>> A publication will have a reference to the version that it was created
>> from.
>> To illustrate how that would get used: Your CTO calls early on a
>> Saturday morning and says "I read in the news about a major security flaw
>> in cowsay, and I know our applications depend heavily on it. What version
>> do we have deployed right now???!!!" You can concretely determine which
>> publications are being currently "distributed" to your infrastructure, and
>> from there see their exact content sets by virtue of the repo version.
>>
>> Then there is the promotion workflow, which in Pulp 2 requires a lot of
>> copying and re-publishing. With repo versions, you'll have a sequence of
>> versions of course. Let's say there's 1, 2 and 3. Version 1 is deployed
>> now, version 2 is undergoing testing, and version 3 got created last night
>> by the weekly sync job you setup. You would have two different distributors
>> that make these publications available to clients: one for production, and
>> one for testing. "Promotion" becomes just the act of updating the reference
>> on a distribution to a different publication. When testing on version 2 is
>> done, assuming it passes, you can update the production distribution to
>> make it use version 2.
>>
>> There are a few use cases for publishing an old version.
>>
>> One is: I want to publish the same exact content set two different ways,
>> with two different publishers. If the contents change between publishes, I
>> want a guarantee that it won't cause the second publish to use different
>> content than the first.
>>
>> Second: I like the state of the content in a repo as it is right now. I
>> want to publish that exact content set. If any changes happen to the
>> content in that repo between now and when my publish task gets run by a
>> worker, I don't want those changes to affect the publish I'm requesting
>> right now.
>>
>> Third: I want the ability to roll back from a bad content set to a
>> known-good one.
>> How many publications must I keep around to have confidence
>> that if I need to roll back some distance, that publication will still be
>> available? It's valuable to know I can re-publish an older version any time
>> I need it.
>>
>> Fourth: In some cases you may decide after-the-fact that you need to
>> publish the same content set a different way. Maybe you went to kickstart
>> from a yum repo and then remembered that (this is a true story) one version
>> of your installer is too old to know about sha256 checksums, so you have to
>> go re-publish the same content set with different settings for how the
>> metadata gets generated.
>>
>> Otherwise, just as reproducible builds of software is a very valuable
>> trait, reproducible publishes of repositories are valuable for similar
>> reasons.
>>
>>> As a user / content developer, it seems more useful to me to always
>>> publish the latest (i.e. don't have an optional version for publishing),
>>> but have the ability to copy from a specific version of a repo into another
>>> repo (or the same repo, effectively reverting the content of latest).
>>>
>>> So I would shift the discussion away from the REST API (for now), and
>>> more into the expected behavior for manipulating content within pulp. The
>>> operations I am aware of are: syncing units, importing units,
>>> copying/deleting units, and I am seeking clarification on how versioning
>>> will work for each.
>>>
>>> Syncing is probably the easiest, because it can handle all the changes
>>> internally and create a new version at the end.
>>>
>>> For importing, if you don't want to create unnecessary intermediate
>>> versions that are meaningless, I would want the ability to upload more than
>>> one unit and associate it to the repo, and then create a version. In other
>>> words, a transactional multi-upload.
>>
>> Indeed. We want to have a behavior in Pulp 3 anyway that lets you
>> arbitrarily add and remove multiple content units in one operation.
>> That's one of the more notable missing features from Pulp 2. As Brian has
>> pointed out, one option is to let a user directly POST to a "versions"
>> endpoint and express what content they want to add/remove. Even without
>> repo versions, we'd still want an API that lets you bulk add/remove.
>>
>>> For copying, as suggested above, I want to optionally specify the
>>> version.
>>>
>>> Deleting by itself is not hard, it does what it needs to do and then
>>> creates a version.
>>>
>>> The more complicated use case would be: what if I wanted to change the
>>> contents of repoA:
>>> * add 3 packages from repo1 version 1
>>> * add 4 packages from repo2 (latest)
>>> * delete 5 packages
>>>
>>> and at the end have a single version change for repoA.
>>>
>>> Or, for the same repoA:
>>> * delete all units of type "rpm" and name "glibc"
>>> * copy unit type "rpm" and name "glibc" from two versions ago
>>>
>>> If you wanted this use case, then you need a new resource type, somewhat
>>> similar to a Task, let's call it Transaction. It is tied to the repository
>>> it operates on (repoA in the example above), and locks it from further
>>> changes until the transaction is committed or aborted. It could be
>>> implemented internally as a repository. You start with the current contents
>>> of repoA, and you perform whatever operations you need to do (including
>>> changing repo metadata). When you "commit" the Transaction, it becomes
>>> *the* new version of the repository and unlocks repoA.
>>
>> Yep, we're on the same page with the use case I think. The other option
>> is to let you as a user query for whatever content you care about adding
>> and removing; find it however you see fit. Then use the bulk add/remove
>> feature to carry that out in one operation.
>>
>> I do like the idea of persistently storing a Transaction as you call it,
>> and possibly even letting a user build one explicitly.
>> Even just as an implementation detail, any bulk add/remove endpoint may
>> need to store the requested changes temporarily in the database as a means
>> to get the input from the web handler to a celery worker. We probably don't
>> want to stuff 10k+ content references into an AMQP message and pass them
>> all in as an argument to the task. And if we're going to store them in the
>> DB, maybe it would make sense to expose that to the user and let them
>> create a Transaction directly.
>>
>>> Whether a Version is a full copy of the repo or a delta is an
>>> implementation detail. I would argue for full copy, otherwise you run into
>>> the inefficiencies of cvs which had to apply patches in reverse order just
>>> to get to a version in the past. I would find it more useful to have a repo
>>> diff resource (diff version 1 with version 3, or repo1 version 1 with repo2
>>> latest).
>>
>> Agreed that it's an implementation detail. In the case of cvs and
>> similar, all changes had to be applied sequentially in order to construct a
>> final product. When you're only tracking set membership, querying becomes
>> MUCH simpler and is very efficient.
>>
>>> Unfortunately, it is a rather large paradigm shift, and not one that you
>>> can push in a 3.0 -> 3.1 transition. Parts of it will need to land in 3.0
>>> proper, determining what can be left out is an exercise to the reader who
>>> managed to keep up with my long emails.
>>>
>>> Hey, a man can dream.
>>
>> I'm dreaming with you! (and also likely putting people to sleep with my
>> own long emails) I also think this is a hallmark behavior that is important
>> to get right conceptually, and very important to a variety of stakeholders.
>>
>> Thanks a lot for sharing your insight! If you have more thoughts on these
>> use cases, please keep it coming.
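A rough sketch of the two ideas discussed above, a Transaction that yields a single new version and a diff between versions, assuming a repo version is nothing more than a set of content unit identifiers. `Repository`, `Transaction`, and `diff` are hypothetical names rather than Pulp code, and the locking Mihai describes is left out to keep it short.

```python
class Repository:
    """Holds an append-only list of versions; each version is a frozenset."""

    def __init__(self, name):
        self.name = name
        self.versions = [frozenset()]  # version 0 is the empty repository

    @property
    def latest(self):
        return self.versions[-1]


class Transaction:
    """Collects bulk adds/removes and turns them into one new version."""

    def __init__(self, repo):
        self.repo = repo
        self.to_add = set()
        self.to_remove = set()

    def add(self, units):
        units = set(units)
        self.to_add |= units
        self.to_remove -= units  # a later add overrides an earlier remove

    def remove(self, units):
        units = set(units)
        self.to_remove |= units
        self.to_add -= units  # a later remove overrides an earlier add

    def commit(self):
        """Append exactly one new version and return its number."""
        new_content = (self.repo.latest - self.to_remove) | self.to_add
        self.repo.versions.append(frozenset(new_content))
        return len(self.repo.versions) - 1


def diff(old, new):
    """With set membership only, a diff is two set differences."""
    return {"added": new - old, "removed": old - new}


# Illustrative usage; the unit names are made up.
repo_a = Repository("repoA")

txn = Transaction(repo_a)
txn.add({"pkg1", "pkg2", "pkg3"})
first = txn.commit()

txn = Transaction(repo_a)
txn.add({"pkg4"})
txn.remove({"pkg1"})
second = txn.commit()

changes = diff(repo_a.versions[first], repo_a.versions[second])
```

Any two versions can be compared directly, with no need to replay intermediate changes, which is the contrast with cvs drawn above.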
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> [email protected]
>> https://www.redhat.com/mailman/listinfo/pulp-dev
