The RubyGems API includes a SHA checksum as part of the metadata for a gem. Couldn't you use that as part of the natural key?
I'm surprised that Chef's supermarket API doesn't include this as well. Maybe we could open a feature request?

David

On Tue, Jan 8, 2019 at 2:50 PM Simon Baatz <gmbno...@gmail.com> wrote:

> On 08.01.2019 17:16, Jeff Ortel wrote:
> >
> >
> > On 1/3/19 1:28 PM, Simon Baatz wrote:
> >> On Thu, Jan 03, 2019 at 01:02:57PM -0500, David Davis wrote:
> >>> I don't think that using integer ids with bulk_create and supporting
> >>> mysql/mariadb are necessarily mutually exclusive. I think there might
> >>> be a way to find the records created using bulk_create if we know the
> >>> natural key. It might be more performant than using UUIDs as well.
> >> This assumes that there is a natural key. For content types with no
> >> digest information in the meta data, there may be a natural key
> >> for content within a repo version only, but no natural key for the
> >> overall content. (If we want to support non-immediate modes for such
> >> content. In immediate mode, a digest can be computed from the
> >> associated artifact(s)).
> >
> > Can you give some examples of Content without a natural key?
>
> For example, the meta-data obtained for Cookbooks is "version" and
> "name" (the same seems to apply to Ruby Gems). With immediate sync
> policy, we can add a digest to each content unit as we know the digest
> of the associated artifact. Thus, the natural key is "version", "name",
> and "digest"
>
> In "non-immediate mode", we only have "version" and "name" to work with
> during sync. Now, there is a trade-off (I think) and we have the
> following possibilities:
>
> 1. Just pretend that "version" and "name" are unique. We have a natural
> key, but it may lead to the cross-repo effects that I described a while
> ago on the list.
> 2. Use "version" and "name" as natural key within a repo version, but
> not globally. In this scenario, it may turn out that two content units
> are actually the same after downloading.
>
> I prefer option 2: Content sharing is not perfect, but as a user, I
> don't have to fear that repositories mix-up content that happens to have
> the same name and version.
>
> There is also an extension of 2., which allows content sharing during
> sync for immediate mode. Define a "pseudo" natural key on global
> content level: "version", "name" and "digest". "digest" may be null. Two
> content units are considered the same if they match in all three
> attributes and these attributes are not null. But even in immediate
> mode, the artifact will not be downloaded if "name" and "version" are
> already present in the repository version the sync is based on. A
> pipeline for this could look like:
>
> def pipeline_stages(self, new_version):
>     pipeline = [
>         self.first_stage,
>         QueryExistingContentUnits(new_version=new_version),
>         ExistingContentNeedsNoArtifacts()
>     ]
>     if self.download_artifacts:
>         pipeline.extend([ArtifactDownloader(), ArtifactSaver(),
>                          UpdateContentWithDownloadResult(),
>                          QueryExistingContentUnits()])
>     pipeline.extend([ContentUnitSaver()])
>     return pipeline
>
> QueryExistingContentUnits(new_version=new_version) associates based on
> the "repo version key",
> QueryExistingContentUnits() associates globally based on the "pseudo
> natural key" (digest must be set to match at all)
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
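
For reference, the two matching rules from the quoted proposal (the "repo version key" and the "pseudo natural key") could be sketched roughly as follows. This is plain Python, not Pulp code: ContentUnit, same_in_repo_version and same_globally are hypothetical names made up for illustration, and the cookbook name, versions and digests are invented sample values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContentUnit:
    """Hypothetical stand-in for a content unit; not a Pulp model."""
    name: str
    version: str
    digest: Optional[str] = None  # stays None until an artifact digest is known


def same_in_repo_version(a: ContentUnit, b: ContentUnit) -> bool:
    """'Repo version key': name and version only, valid within one repo version."""
    return (a.name, a.version) == (b.name, b.version)


def same_globally(a: ContentUnit, b: ContentUnit) -> bool:
    """'Pseudo natural key': name, version and digest, and digest must be set."""
    if a.digest is None or b.digest is None:
        # Without a digest we cannot claim two units are the same globally.
        return False
    return (a.name, a.version, a.digest) == (b.name, b.version, b.digest)


# Non-immediate mode: no digest yet, so units only match within a repo version.
lazy_a = ContentUnit("apache2", "5.0.1")
lazy_b = ContentUnit("apache2", "5.0.1")
assert same_in_repo_version(lazy_a, lazy_b)
assert not same_globally(lazy_a, lazy_b)

# Immediate mode: digests are known, so identical units can be shared globally,
# while units with the same name/version but different content stay apart.
full_a = ContentUnit("apache2", "5.0.1", digest="aaa111")
full_b = ContentUnit("apache2", "5.0.1", digest="aaa111")
other = ContentUnit("apache2", "5.0.1", digest="bbb222")
assert same_globally(full_a, full_b)
assert not same_globally(full_a, other)
```

The point of the null check is that a missing digest never matches anything globally, which is what keeps repositories from mixing up content that merely shares a name and version.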