Thanks for raising this issue. pulp_file also suffers from this problem, in that files with duplicate names can be added to repo versions but probably shouldn't be:

https://pulp.plan.io/issues/4028

@Simon I like the idea behind the repo_key solution you came up with. Can you be more specific about the cases you think it couldn't handle? I imagine that plugin writers could use properties or denormalization (i.e. additional database columns) to solve cases where they need uniqueness across data that isn't in the database. In a worst-case scenario, they can't use the pulpcore solution and just have to roll their own.

David

On Fri, May 31, 2019 at 3:27 PM Simon Baatz <gmbno...@gmail.com> wrote:

> On Fri, May 31, 2019 at 01:12:58PM +0200, Tatiana Tereshchenko wrote:
> > A while ago RemoveDuplicates stage [0] was introduced to solve the
> > problem of enforcing uniqueness constraints within a repository version
> > at sync time.
> > The same problem ought to be solved when content which already exists
> > in Pulp is added to a repository. E.g. Content was uploaded, or content
> > was synced as a part of other repo. And now you want to add/copy it to
> > your repo.
> > RPM plugin has to solve this problem (specific examples can be seen in
> > this issue [1]).
> > It would be great if other plugins can share if the same problem exists
> > for them and if it's valuable to add some mechanism to the pulpcore.
> > I believe, if you use RemoveDuplicates stage during sync, then your
> > plugin is impacted by the described problem.
>
> Yes, the problem exists also for pulp_cookbook (although it does not
> use the RemoveDuplicates stage). Currently, the implementation to
> avoid duplicates in pulp_cookbook has the following components:
>
> - Content defines a 'repo_key' [0] similar to a unit_key. This key
>   must be unique within a repo version (and not globally like the
>   unit_key)
>
> - Cookbook metadata obtained during a sync does not contain
>   digests. Therefore pulp_cookbook uses a custom stage
>   QueryExistingRepoContentAndArtifacts [1] to identify existing
>   content within the repo version the sync is based on. Content is
>   queried using the repo key in the base repo version (and duplicates
>   need not to be removed after the fact).
>
>   (However, something like repo_key might be useful in the
>   RemoveDuplicates stage for other plugins.)
>
> - As I found no way to ensure repo_key uniqueness on content
>   association, it is done at publication time [2] based on the repo_key.
>   However, this feels like a workaround. I think it should be
>   enforced on repo version creation.
>
> > My personal opinion: if RemoveDuplicates stage was worth adding to the
> > pulpcore (stages API in pulpcore-plugin), a mechanism to ensure
> > uniqueness constraints within a repo version at association time makes
> > sense to add as well.
>
> I fully agree. I don't think the repo_key approach used by
> pulp_cookbook is general enough. It works well with Cookbooks, but
> other content types might have uniqueness constraints that
> can't be mapped directly to a composite key on repo versions.
>
> [0] https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/models.py#L70
> [1] https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/tasks/synchronizing.py#L61
> [2] https://github.com/gmbnomis/pulp_cookbook/blob/573e1813bd33c0d09d44cf2cab8634f0e4d10fd4/pulp_cookbook/app/tasks/publishing.py#L63
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
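
To make the repo_key idea discussed in this thread concrete, here is a minimal sketch of enforcing it when content is added to a repo version. It is plain Python and not the pulpcore or pulp_cookbook API: the `Cookbook` class, the `repo_key_fields` attribute, the choice of ("name", "version") as the key, and the `add_to_version` helper are all hypothetical names chosen for illustration. It also assumes that an incoming unit should replace an existing unit with the same repo_key, which is only one possible policy.

```python
from dataclasses import dataclass
from typing import ClassVar, Iterable, List, Tuple


@dataclass(frozen=True)
class Cookbook:
    """Hypothetical stand-in for a plugin content model (not the pulpcore API)."""

    name: str
    version: str
    digest: str

    # Assumed convention: fields that must be unique within one repo version.
    repo_key_fields: ClassVar[Tuple[str, ...]] = ("name", "version")

    def repo_key(self) -> Tuple[str, ...]:
        # getattr also works for properties, so computed values could be part of the key.
        return tuple(getattr(self, field) for field in self.repo_key_fields)


def add_to_version(current: Iterable[Cookbook], incoming: Iterable[Cookbook]) -> List[Cookbook]:
    """Return the content of a new repo version with at most one unit per repo_key."""
    by_key = {unit.repo_key(): unit for unit in current}
    for unit in incoming:
        # Assumed policy: the incoming unit replaces any unit sharing its repo_key.
        by_key[unit.repo_key()] = unit
    return list(by_key.values())
```

Because the key is built with `getattr`, a plugin could list a property in `repo_key_fields`, which is roughly the "properties" route mentioned above. Constraints that cannot be expressed as a composite key would still need plugin-specific logic.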