I apologize for the lengthy post, but I did not know where to file an issue for this. It is a generic problem affecting most Pulp 3 plugins.
I have been puzzled for some time now by the natural keys used for content in plugins. Examples are:

    pulp_python: 'filename'
    pulp_ansible: 'version', 'role' (for role: 'namespace', 'name')
    pulp_rpm (RPM package): 'name', 'epoch', 'version', 'release', 'arch', 'checksum_type', 'pkgId'
    pulp_cookbook: 'name', 'version'

These look like keys that make sense for content in a single repo (version), but not necessarily for content in a per-plugin pool of content. In an ideal world, these keys are globally unique, i.e. there is only a single "utils-0.9.0" Python module world-wide that refers to the same artifacts as the "utils-0.9.0" module on PyPI. But, as far as I know, the world is far from ideal, especially in an enterprise setting...

With the current implementation, the following scenarios could happen, if I got it right:

1. In Acme Corp, a team develops a Python module/Ansible role/Chef cookbook called "acme_utils", which is part of a repo on a Pulp instance. Another team using different repos happens to choose the same name for their unrelated utility package. They may not be able to create a content unit if they use e.g. the same version or file name.

2. A team happens to choose a name that is already known in PyPI/Galaxy/Supermarket. (Or, someone posts a new name on PyPI/Galaxy/Supermarket that happens to have been in use in the company for years.) Then, as above, the team may not be able to create content units for their own artifacts.

Additionally, *very ugly* things may happen during a sync. The current QueryExistingContentUnits stage may decide that, based on the natural key, completely unrelated content units are already present. The stage just puts them into the new repo version.

Example for pulp_python:

Somebody does something very stupid (or very sinister). (The files "Django-1.11.16-py2.py3-none-any.whl" and "Django-1.11.16.tar.gz" need to be in the current directory.)

    export ARTIFACT_HREF=$(http --form POST :8000/pulp/api/v3/artifacts/ file@./Django-1.11.16-py2.py3-none-any.whl | jq -r '._href')
    http POST :8000/pulp/api/v3/content/python/packages/ artifact=$ARTIFACT_HREF filename=Django-2.0-py3-none-any.whl
    export ARTIFACT_HREF=$(http --form POST :8000/pulp/api/v3/artifacts/ file@./Django-1.11.16.tar.gz | jq -r '._href')
    http POST :8000/pulp/api/v3/content/python/packages/ artifact=$ARTIFACT_HREF filename=Django-2.0.tar.gz

Somebody else wants to mirror Django 2.0 from PyPI (version_specifier: "==2.0"):

    http POST :8000/pulp/api/v3/repositories/ name=foo
    export REPO_HREF=$(http :8000/pulp/api/v3/repositories/ | jq -r '.results[] | select(.name == "foo") | ._href')
    http -v POST :8000/pulp/api/v3/remotes/python/ name='bar' url='https://pypi.org/' 'includes:=[{"name": "django", "version_specifier":"==2.0"}]'
    export REMOTE_HREF=$(http :8000/pulp/api/v3/remotes/python/ | jq -r '.results[] | select(.name == "bar") | ._href')
    http POST :8000$REMOTE_HREF'sync/' repository=$REPO_HREF

Now the created repo version contains bogus content (Django 1.11.16 instead of 2.0):

    $ http :8000/pulp/api/v3/repositories/1/versions/1/content/ | jq '.["results"] | map(.version, .artifact)'
    [
      "1.11.16",
      "/pulp/api/v3/artifacts/1/",
      "1.11.16",
      "/pulp/api/v3/artifacts/2/"
    ]

A "not so dumb" version of this scenario may happen by mistake, like this:

    export ARTIFACT_HREF=$(http --form POST :8000/pulp/api/v3/artifacts/ file@./Django-1.11.15-py2.py3-none-any.whl | jq -r '._href')
    http POST :8000/pulp/api/v3/content/python/packages/ artifact=$ARTIFACT_HREF filename=Django-1.11.15-py2.py3-none-any.whl
    # Forgot to do this:
    # export ARTIFACT_HREF=$(http --form POST :8000/pulp/api/v3/artifacts/ file@./Django-1.11.16-py2.py3-none-any.whl | jq -r '._href')
    http POST :8000/pulp/api/v3/content/python/packages/ artifact=$ARTIFACT_HREF filename=Django-1.11.16-py2.py3-none-any.whl

From now on, no synced repo version on the same Pulp instance will have a correct Django 1.11.16 wheel: the existing content unit with that file name, which points to the 1.11.15 artifact, is reused instead.

3. A team releases "module" version "2.0.0" by creating a new version of the "release" repo. However, packaging went wrong and the release needs to be rebuilt. Nobody wants to use version "2.0.1" for the new shiny release; it must be "2.0.0" (the version hasn't been published to the outside world yet). How does the team publish a new repo version containing the re-released module? (The best idea I have is: the team needs to create a new repo version without the content unit first. Then, find _all_ repo versions that still reference the content unit and delete them. Delete orphan content units. Create the new content unit and add it to a new repo version.)

4. A Pulp instance contains unsigned RPM content that will be signed for release. It is not possible to store the signed RPMs on the same instance. (Or alternatively, someone just forgot to sign the RPMs when importing/syncing. They will remain unsigned on subsequent syncs even if the remote repo has been fixed.)

(I did not check the behavior in Pulp 2, but most content types there have fields like checksum/commit/repo_id/digest in their unit key.)

Before discussing implementation options (changing the key, adapting sync), I have the following questions:

- Is the assessment of the scenarios outlined above correct?
- Do you think it makes sense to support (some of) these use cases?
- If so, are there plans to do so that I am not aware of?

_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev
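
P.S. To make the sync problem from scenario 2 concrete, here is a minimal, self-contained sketch of the matching logic as I understand it. This is not the actual QueryExistingContentUnits code: the names ContentUnit, digest and query_existing are hypothetical and only model "reuse whatever is already in the pool under the same key". It compares a key made of the file name alone with a key that also contains the artifact digest.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentUnit:
    """Hypothetical stand-in for a content unit (not a Pulp model)."""
    filename: str  # the natural key pulp_python uses, per the list above
    sha256: str    # digest of the artifact the unit actually points to


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def query_existing(pool, remote_units, key):
    """Return the units that would end up in the new repo version.

    For each unit declared by the remote, reuse a pool unit with the same
    key if there is one; otherwise keep the remote unit (i.e. download it).
    """
    by_key = {key(unit): unit for unit in pool}
    return [by_key.get(key(unit), unit) for unit in remote_units]


# The pool already holds the mislabeled upload: 1.11.16 bytes, 2.0 file name.
pool = [ContentUnit("Django-2.0.tar.gz", digest(b"django 1.11.16 sdist"))]
# What the remote really offers for Django 2.0.
remote = [ContentUnit("Django-2.0.tar.gz", digest(b"django 2.0 sdist"))]

# Key = file name only: the bogus pool unit is silently reused.
by_name = query_existing(pool, remote, key=lambda u: u.filename)

# Key = file name plus digest: the pool unit no longer matches.
by_name_and_digest = query_existing(
    pool, remote, key=lambda u: (u.filename, u.sha256)
)

print(by_name[0].sha256 == pool[0].sha256)               # bogus unit reused
print(by_name_and_digest[0].sha256 == remote[0].sha256)  # remote unit kept
```

With the file name as the only key, the sync result contains the unit that was already in the pool, whatever its artifact is. Adding the digest to the key is just one possible option; it would also change what "the same content" means for scenarios 1, 3 and 4.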