We'd like to start a discussion on the "relative path problem" identified recently.

Problem:

Currently, a relative_path is tied to content in Pulp. This means that if a content unit exists in two places within a repository, or across repositories, it has to be stored as two separate content units. This creates redundant data and potential confusion for users.

As a specific example, we need to support mirroring content in pulp_rpm <https://pulp.plan.io/issues/6353>. Under the current model, we'd need to create a separate content unit for each location at which a single package is stored, so we could end up with several records representing a single package. Users may be confused about why they see multiple records for one package, and they may have trouble deciding, for example, which content unit to copy.

Proposed Solution:

- Move "relative_path" from its current location on ContentArtifact to RepositoryContent.
  - This will require a sizable data migration.
  - It is possible that, in rare cases, repository versions may change slightly due to deduplication.
  - A repository-version-wide uniqueness constraint will be present on "relative_path", independently of any other repository uniqueness constraints (repo_key_fields) defined by the plugin writer.
- Modify the Stages API so that the relative_path can be processed in the correct location: instead of "DeclarativeArtifact", it will likely need to go on "DeclarativeContent".
- Remove "location_href" from the RPM Package content model. It was never truly part of the RPM (file) metadata; it is derived from the repository metadata, so storing it as part of the Content unit doesn't entirely make sense.

Alternatives:

In most cases, a content unit will have a single relative path. Creating a general solution to solve a one-off problem is usually not a good idea. As an alternative, we could look at another solution for mirroring content. One example might be to create a new object (e.g. RpmRepoMirrorContentMapping) that maps content to specific paths within a repo or repo version.

Questions:

- How do we handle this in pulp_file? How are content units identified in pulp_file without relative_path?
  - Checksum?
- How was this problem handled in Pulp 2?

Please weigh in if you have any input on potential problems with the proposal, potential alternate solutions, or other insights or questions!
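To make the proposed move of relative_path concrete, here is a rough sketch in plain Python of how the model might behave afterwards. It is not Pulp's actual Django model code: the classes below, the sha256 field used as the content unit's identity, and the add()/content_units()/paths_for() helpers are all hypothetical, chosen only to show one package published at two paths while remaining a single content unit, plus the version-wide uniqueness check on relative_path.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Content:
    """Hypothetical content unit: note there is no relative_path here."""
    sha256: str  # assumed identity for illustration only


@dataclass(frozen=True)
class RepositoryContent:
    """Hypothetical membership record: relative_path lives here instead."""
    content: Content
    relative_path: str


@dataclass
class RepositoryVersion:
    """Hypothetical repository version holding membership records."""
    entries: list[RepositoryContent] = field(default_factory=list)

    def add(self, content: Content, relative_path: str) -> None:
        # Version-wide uniqueness on relative_path, checked independently
        # of any plugin-defined repo_key_fields.
        if any(e.relative_path == relative_path for e in self.entries):
            raise ValueError(f"relative_path already in use: {relative_path}")
        self.entries.append(RepositoryContent(content, relative_path))

    def content_units(self) -> set[Content]:
        # Distinct content units, however many paths each one appears at.
        return {e.content for e in self.entries}

    def paths_for(self, content: Content) -> list[str]:
        # Every path at which this content unit is published, in order added.
        return [e.relative_path for e in self.entries if e.content == content]


# Usage: one package mirrored at two hypothetical locations.
pkg = Content(sha256="abc123")
version = RepositoryVersion()
version.add(pkg, "Packages/f/foo-1.0-1.noarch.rpm")
version.add(pkg, "mirror/foo-1.0-1.noarch.rpm")
print(len(version.content_units()))  # 1
```

The point of the sketch is that deduplication falls out naturally: the package is one record, and only the membership rows differ by path. Adding a different unit at an already-used path fails regardless of what the plugin's own uniqueness fields would say.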
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev