On Sun, Jul 21, 2019 at 3:00 PM Brian Bouterse <[email protected]> wrote:
> > > On Sun, Jul 21, 2019 at 6:23 AM Tatiana Tereshchenko <[email protected]> > wrote: > >> +1 to the idea of a repo_key. >> >> Should we also add the ability to apply custom validation of the content >> being added? >> Similar to a repo_key, Content model can optionally provide an additional >> validator. >> Use cases: >> - for pulp_file to avoid relative path overlap - e.g. 'a/b' and 'a' >> > In thinking this over more, I'm unsure that pulp_file has the use case. > Two different Artifacts having relative paths 'a' and 'a/b' in one repo > version doesn't seem problematic. This problem statement is similar to the > Distribution.base_path overlap problem statement where it's unavoidably > ambiguous which Distribution should be matched when base_paths are allowed > to overlap. In this case for pulp_file, it's not ambiguous in the same way, > the relative_path I expect to match to exactly 1 content unit either 'a', > or 'a/b', but not both. What do you think about this? > I agree that the problem is similar to the Distribution.base_path overlap. If I understand you correctly, yes, it's not a problem if you query content one by one. What about use cases when we want to have a repo version on a filesystem? E.g. Browsable repositoiries (this feature has already been asked for by our stakeholders), export (e.g. rsync). > > - for pulp_rpm to filter by signature/signing key >> > Can we expand on this use case a bit? Is it that the repo version should > only contains units signed or unsigned rpms? Or is it that we are ok with a > mixture as long as each NEVRA is unique? I suspect the former, but I want > to be sure. > I think it should contain only signed units and optionally signed by specific keys only. See pulp 2 feature description https://docs.pulpproject.org/plugins/pulp_rpm/user-guide/features.html#package-signatures-and-gpg-key-id-filtering Another use case which comes to mind is: keeping the last N versions of a unit within a repo verison. Tanya > >> Plugins can solve it by defining their own stage but it seems like almost >> any plugin needs to ensure absence of collisions specific to it, even the >> simple pulp_file. >> It means that our default pipeline becomes less useful and will be hardly >> ever used by any [currently known] plugins. >> >> Any thoughts? >> >> Tanya >> >> >> On Mon, Jul 8, 2019 at 9:09 PM Brian Bouterse <[email protected]> >> wrote: >> >>> I want to retell Simon's proposal to have "Content defines a 'repo_key' >>> similar to a unit_key. This key must be unique within a repo version (and >>> not globally like the unit_key." >>> >>> We could adopt his proposal to have the repo_key tuple defined on >>> Content in pulpcore. If we left the add/remove APIs in core and adopt for >>> both sync and add/remove a "keep newest to associate" functionality >>> described earlier in the thread. This "keep newest to associate" code would >>> be used by sync in the form of a core stage that is a generalized version >>> of the RemoveDuplicates stage. This would become part of the default >>> pipeline for all users of Stages API. I think this would be better than >>> plugin writers implementing it over and over and also less effort for >>> plugin wrtiers. This design would meet the current needs of pulp_cookbook, >>> pulp_file, and pulp_rpm which are the only 3 places I know we have this >>> problem so far, but I believe more content types are susceptible to this. >>> >>> What do you think we should do? >>> >>> Thanks! >>> Brian >>> >>> >>> >>> >>> >>> On Thu, Jun 27, 2019 at 4:03 AM Tatiana Tereshchenko < >>> [email protected]> wrote: >>> >>>> Sure, the code can be de-duplicated. >>>> My main worry is that it's a responsibility of a plugin writer not to >>>> forget to ensure uniqueness constraints within a repo version for every >>>> workflow (sync, copy, anything else) where a repo version is created. >>>> Every time before RepositoryVersion.create() is called, there should be >>>> a check that there is no colliding content in a repo version. >>>> It would be much more reliable and friendly, in my opinion, if plugin >>>> writer could define rules/callbacks/whatever for each content and it would >>>> be applied to any repository creation. >>>> At the same time this eliminates the flexibility to define different >>>> logic for content collision for different workflows, however I'm not sure >>>> if such a use case exists or is desired. >>>> >>>> Tanya >>>> >>>> >>>> On Wed, Jun 26, 2019 at 6:49 PM Austin Macdonald <[email protected]> >>>> wrote: >>>> >>>>> @Tanya Tereshchenko <[email protected]> >>>>> >>>>>> Do I understand correctly that it doesn't cover the sync case and >>>>>> it's only about explicit repo version creation? >>>>>> >>>>> >>>>> I don't mean that add/remove could not share code with remove >>>>> duplicate stage. I wanted to point out that we have a problem here (how to >>>>> remove duplicates) that has similar patterns to other problems with add >>>>> remove (recursive, copy, deciding which content to keep with a collision, >>>>> etc.) I don't doubt that pulpcore could help solve these problems, but I >>>>> think that as we approach our GA, we should consider solving this problem >>>>> (for now) by getting out of the way of plugin writers rather than by >>>>> implementing code that is supposed to work for all plugins. I suspect that >>>>> plenty of the plugins will be implementing their own add/remove anyway. >>>>> >>>>> On Tue, Jun 25, 2019 at 12:56 PM David Davis <[email protected]> >>>>> wrote: >>>>> >>>>>> I don't think this solution would work in the case of creating a new >>>>>> repository version. Suppose for example you had two content units that >>>>>> collide, one in a repo version and one older unit that a user explicitly >>>>>> wants to add to the repo version. If the latter one is older, then what >>>>>> would happen? >>>>>> >>>>>> David >>>>>> >>>>>> >>>>>> On Tue, Jun 25, 2019 at 12:48 PM Brian Bouterse <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Having a way for units to express their uniqueness per repo sounds >>>>>>> good because then more areas of Pulp's code could answer the question: >>>>>>> "will I have a duplicate if I add content X to repo_version Y". >>>>>>> >>>>>>> Let's assume we know that situation is about to occur during sync >>>>>>> for example, what do we do about it? In the errata case we know the >>>>>>> "new" >>>>>>> one should replace the existing one. Maybe we start to 'order' the units >>>>>>> with colliding repo keys and keep the newest one always? Would this work >>>>>>> for pulp_cookbook and pulp_rpm? Would it generalize? Is this what you >>>>>>> imagined? >>>>>>> >>>>>>> On Tue, Jun 25, 2019 at 5:30 AM Tatiana Tereshchenko < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Do I understand correctly that it doesn't cover the sync case and >>>>>>>> it's only about explicit repo version creation? >>>>>>>> So the suggestion is to implement the same logic twice: for sync >>>>>>>> case - RemoveDuplicates stage and/or maybe some custom stage (e.g. to >>>>>>>> disallow overlapping paths), and for direct repo version creation - >>>>>>>> your >>>>>>>> proposal. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Jun 24, 2019 at 3:13 PM Austin Macdonald < >>>>>>>> [email protected]> wrote: >>>>>>>> >>>>>>>>> I have a design in mind for solving this problem: >>>>>>>>> >>>>>>>>> 1. Remove POST to RepositoryVersion (no general add/remove >>>>>>>>> endpoint). >>>>>>>>> 2. Add an endpoint to kick off an add/remove task, namespaced by >>>>>>>>> plugin. ie `POST pulp/api/v3/docker/add-remove/` >>>>>>>>> This view can be provided to all plugins by the plugin >>>>>>>>> template, and will be based on the current RepositoryVersionCreate: >>>>>>>>> >>>>>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/viewsets/repository.py#L221-L258 >>>>>>>>> Note: the main purpose of this view is to kick off the general >>>>>>>>> add/remove task, which will be unchanged: >>>>>>>>> >>>>>>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/tasks/repository.py#L70 >>>>>>>>> 3. Add an add/remove serializer to the plugin API. >>>>>>>>> 3. Plugins needing further customization can provide their own >>>>>>>>> task and subclassed serializer. >>>>>>>>> >>>>>>>>> This gives the plugin writer full control over the endpoint >>>>>>>>> (customizable arguments and validation), and full control over the >>>>>>>>> flow >>>>>>>>> (extra logic, depsolving, enforced uniqueness). It only uses the >>>>>>>>> existing >>>>>>>>> patterns (and existing required knowledge), but requires no work >>>>>>>>> (other >>>>>>>>> than using the template) for the simple case. >>>>>>>>> >>>>>>>>> On Mon, Jun 3, 2019 at 2:56 PM Simon Baatz <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On Mon, Jun 03, 2019 at 09:11:07AM -0400, David Davis wrote: >>>>>>>>>> > @Simon I like the idea behind the repo_key solution you came >>>>>>>>>> up with. >>>>>>>>>> > Can you be more specific around cases you think that it >>>>>>>>>> couldn't >>>>>>>>>> > handle? I imagine that plugin writers could use properties or >>>>>>>>>> > denormailzation (ie additional database columns) to solve >>>>>>>>>> cases where >>>>>>>>>> > they need uniqueness across data that isn't in the database. >>>>>>>>>> In a worst >>>>>>>>>> > case scenario, they can't use the pulpcore solution and just >>>>>>>>>> have to >>>>>>>>>> > roll their own. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What I wrote probably sounded too pessimistic. You are right, in >>>>>>>>>> most cases that should be doable. >>>>>>>>>> >>>>>>>>>> I agree that we could have a simple default solution that just >>>>>>>>>> requires to specify a couple of field names in the easiest case. >>>>>>>>>> As you >>>>>>>>>> say, it should be possible use custom logic in a plugin if >>>>>>>>>> required. >>>>>>>>>> >>>>>>>>>> Here is the case I was thinking of that it can't handle: >>>>>>>>>> >>>>>>>>>> In pulp_file, a uniqueness constraint on "relative_path" would >>>>>>>>>> allow >>>>>>>>>> content units "a" and "a/b" to be in a repo version. >>>>>>>>>> >>>>>>>>>> However, we may want file repos to be representable on an actual >>>>>>>>>> file >>>>>>>>>> system (e.g. when exporting them as tar files). For the repo >>>>>>>>>> above, >>>>>>>>>> this does not work, as "a" can't be a file and a directory at the >>>>>>>>>> same time on a standard Unix file system. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Pulp-dev mailing list >>>>>>>>>> [email protected] >>>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Pulp-dev mailing list >>>>>>>>> [email protected] >>>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Pulp-dev mailing list >>>>>>>> [email protected] >>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Pulp-dev mailing list >>>>>>> [email protected] >>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>>> >>>>>> _______________________________________________ >>>>>> Pulp-dev mailing list >>>>>> [email protected] >>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>>>> >>>>> _______________________________________________ >>>> Pulp-dev mailing list >>>> [email protected] >>>> https://www.redhat.com/mailman/listinfo/pulp-dev >>>> >>> _______________________________________________ >> Pulp-dev mailing list >> [email protected] >> https://www.redhat.com/mailman/listinfo/pulp-dev >> >
_______________________________________________ Pulp-dev mailing list [email protected] https://www.redhat.com/mailman/listinfo/pulp-dev
