Yes, but I was imagining the mapping would be stored not as Content but as a separate object. So we wouldn't use the filename for the mapping (rather, we'd use the ContentArtifact pk), and we wouldn't need to change ContentArtifact's relative_path at all. That said, I think your solution captures the idea and is better in some ways.
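A minimal sketch of the separate-object idea, written in plain Python rather than as a Django model. The class name `RelativePathMapping` and its methods are hypothetical, not pulpcore API. Keys are assumed to be (repository version pk, ContentArtifact pk) pairs, so each artifact of a multi-artifact unit can get its own path and ContentArtifact.relative_path is left untouched, serving only as the fallback.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RelativePathMapping:
    """Hypothetical per-repository-version path mapping stored outside Content.

    Keys are (repository_version_pk, content_artifact_pk) pairs, so the same
    ContentArtifact can be published at different paths in different
    repository versions without modifying ContentArtifact.relative_path.
    """

    paths: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def set_path(self, repo_version_pk: int, content_artifact_pk: int, relative_path: str) -> None:
        # Record where this artifact should be published in this repository version.
        self.paths[(repo_version_pk, content_artifact_pk)] = relative_path

    def published_path(self, repo_version_pk: int, content_artifact_pk: int, default: str) -> str:
        # At publish time, prefer the mapped path; otherwise fall back to the
        # path already stored on the ContentArtifact (passed in as `default`).
        return self.paths.get((repo_version_pk, content_artifact_pk), default)
```

For example, after `set_path(1, 42, "Packages/f/foo-1.0-1.noarch.rpm")`, repository version 1 publishes artifact 42 at that path, while any other version falls back to the default passed in.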
Changing the RepositoryContent model to point to ContentArtifacts and store relative_paths is probably the best and most correct solution in theory. However, it's going to be painful to implement for both core and plugins.

David

On Thu, Apr 30, 2020 at 12:33 PM Daniel Alley <dal...@redhat.com> wrote:
> @David Davis <davidda...@redhat.com> so this proposal would go something like this, correct?
>
> * For the signed metadata / exact mirror use-case, we need to store the repository metadata itself as a content unit inside the RepositoryVersion anyway (because the hash must be equal)
> * Because we have this metadata lying around, we can reference it at publish time to discover the appropriate PublishedArtifact.relative_path
> * Create a map of "filename" -> "location_href" and look up the filename of each RPM package to find the appropriate path
> * This should be pretty fast for the RPM plugin since createrepo_c is doing all the hard work
> * Data migration to ensure ContentArtifact.relative_path is only storing the filename (and I would suggest we also change the name to "filename")
> * If metadata isn't present in the RepositoryVersion, then just tweak the PublishedArtifact.relative_path so that it uses our default repo layout
>
> On Tue, Apr 28, 2020 at 11:41 AM David Davis <davidda...@redhat.com> wrote:
>
>> Yes, that's correct. During our meeting we discussed two options: the first was to extend RepositoryContent to store a relative path per ContentArtifact, as storing a relative_path per Content won't work for multi-Artifact Content units.
>>
>> An alternative that I pitched was to have plugins (or maybe even core someday) store this information outside RepositoryContent and then use this information during publishing to set relative_path on PublishedArtifacts.
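The publish-time lookup in Daniel's list can be sketched in a few lines of Python. This assumes the location_href values have already been parsed out of the repository metadata (for example by createrepo_c; parsing is not shown). The function names and the "Packages/<first letter>/<filename>" default layout are hypothetical placeholders, not pulp_rpm's actual code.

```python
import posixpath
from typing import Dict, Iterable


def build_href_map(location_hrefs: Iterable[str]) -> Dict[str, str]:
    """Map each package filename to its location_href from repo metadata."""
    href_map: Dict[str, str] = {}
    for href in location_hrefs:
        # Note: if two hrefs share a filename, the later one wins.
        href_map[posixpath.basename(href)] = href
    return href_map


def default_layout(filename: str) -> str:
    # Hypothetical default repo layout: Packages/<first letter>/<filename>.
    return posixpath.join("Packages", filename[0].lower(), filename)


def published_relative_path(filename: str, href_map: Dict[str, str]) -> str:
    """Use the path from the mirrored metadata if known, else the default layout."""
    return href_map.get(filename, default_layout(filename))
```

With an empty map (no metadata in the RepositoryVersion), every package falls back to the default layout. Because the map is keyed by filename, it cannot represent one filename stored at two locations; that is the duplicate case raised in Pavel's message, and it would need a list-valued map or an explicit error.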
>> We'd have to modify the content app if we wanted to support pass-through publications, but I think asking plugins to use published artifacts in this case is warranted. That said, I don't think anyone else was keen on this idea.
>>
>> David
>>
>> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg <mdell...@redhat.com> wrote:
>>
>>> That is only used for pass-through publication, afaik. If you publish each content unit "by hand", you create a new relative path for each published artifact. That is why it can be empty and the content can still be published.
>>>
>>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley <dal...@redhat.com> wrote:
>>>
>>>> We realized in our discussion that the original proposal described in my email will not work, because "relative_path" ultimately describes the path of the published *artifacts* (not content), and for content types with multiple artifacts, storing this information in a field on RepositoryContent would not be possible.
>>>>
>>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley <dal...@redhat.com> wrote:
>>>>
>>>>> There is a video call scheduled to discuss this issue tomorrow (Tuesday, April 28th) at 13:30 UTC (please convert to your local time).
>>>>> https://meet.google.com/scy-csbx-qiu
>>>>>
>>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis <davidda...@redhat.com> wrote:
>>>>>
>>>>>> I had a chance to think about this some more yesterday and wanted to email out my thoughts. I also think that this change sounds scary and will have a big impact on plugin writers, so I thought of a couple of alternatives:
>>>>>>
>>>>>> First, we could add a relative_path field to RepositoryContent instead of moving it there. This would be an optional field. It would be up to plugins to manage this field, and they would still need to populate the relative_path field on ContentArtifact.
>>>>>> But plugins could use this optional field to store relative paths per repository and then use this field when generating publications.
>>>>>>
>>>>>> The second alternative is one that is already laid out in the original email, but to call it out again: it would be to not solve this in pulpcore. RPM would create its own object that would map content in a repository to relative_paths.
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp <p...@atix.de> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am not currently very well versed in the classes involved, but moving relative_path around sounds slightly scary, with the potential to break things.
>>>>>>>
>>>>>>> As such, I would be interested in being kept in the loop as this moves forward. (The mailing list, once there is some movement, is entirely sufficient 😉)
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Quirin Pamp
>>>>>>> ------------------------------
>>>>>>> *From:* pulp-dev-boun...@redhat.com <pulp-dev-boun...@redhat.com> on behalf of Ina Panova <ipan...@redhat.com>
>>>>>>> *Sent:* 21 April 2020 14:07:13
>>>>>>> *To:* Daniel Alley <dal...@redhat.com>
>>>>>>> *Cc:* Pulp-dev <pulp-dev@redhat.com>
>>>>>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>> How about setting up a meeting to brainstorm the alternatives and their pros/cons there?
>>>>>>>
>>>>>>> --------
>>>>>>> Regards,
>>>>>>>
>>>>>>> Ina Panova
>>>>>>> Senior Software Engineer | Pulp | Red Hat Inc.
>>>>>>>
>>>>>>> "Do not go where the path may lead,
>>>>>>> go instead where there is no path and leave a trail."
>>>>>>>
>>>>>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley <dal...@redhat.com> wrote:
>>>>>>>
>>>>>>> Bump, this item needs to move forward soon. Does anyone have any thoughts?
>>>>>>>
>>>>>>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka <ppi...@redhat.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>> I'd like to add one more question to this topic. Do you think it is a blocker for PRs [0] & [1]? While testing [2] these features, I haven't run into a real-world example where two packages with exactly the same name appear.
>>>>>>> I think this is a 'must have' feature, but until we solve/decide it, we could have the two features working, maybe with a warning in the docs telling users that this can happen in some 'special' repositories.
>>>>>>>
>>>>>>> To follow the topic directly, I like the proposed move to 'RepositoryContent' and adding it to its uniqueness constraint (if I understand correctly).
>>>>>>>
>>>>>>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>>>>>>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>>>>>>> [2] tested with CentOS 7 and 8, openSUSE, and SLE repositories
>>>>>>>
>>>>>>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley <dal...@redhat.com> wrote:
>>>>>>>
>>>>>>> We'd like to start a discussion on the "relative path problem" identified recently.
>>>>>>> Problem:
>>>>>>>
>>>>>>> Currently, a relative_path is tied to content in Pulp. This means that if a content unit exists in two places within a repository or across repositories, it has to be stored as two separate content units. This creates redundant data and potential confusion for users.
>>>>>>>
>>>>>>> As a specific example, we need to support mirroring content in pulp_rpm <https://pulp.plan.io/issues/6353>. Currently, for each location at which a single package is stored, we’ll need to create a content unit. We could end up with several records representing a single package. Users may be confused about why they see multiple records for a package, and they may have trouble, for example, deciding which content unit to copy.
>>>>>>> Proposed Solution:
>>>>>>>
>>>>>>> Move “relative_path” from its current location on ContentArtifact to RepositoryContent. This will require a sizable data migration. In rare cases, repository versions may change slightly due to deduplication.
>>>>>>>
>>>>>>> A repository-version-wide uniqueness constraint will be present on “relative_path”, independently of any other repository uniqueness constraints (repo_key_fields) defined by the plugin writer.
>>>>>>>
>>>>>>> Modify the Stages API so that the relative_path can be processed in the correct location: instead of “DeclarativeArtifact”, it will likely need to go on “DeclarativeContent”.
>>>>>>>
>>>>>>> Remove “location_href” from the RPM Package content model. It was never a true part of the RPM (file) metadata; it is derived from the repository metadata, so storing it as part of the Content unit doesn’t entirely make sense.
>>>>>>>
>>>>>>> Alternatives
>>>>>>>
>>>>>>> In most cases, a content unit will have a single relative path. Creating a general solution to solve a one-off problem is usually not a good idea. As an alternative, we could look at another solution for mirroring content. One example might be to create a new object (e.g. RpmRepoMirrorContentMapping) that maps content to specific paths within a repo or repo version.
>>>>>>>
>>>>>>> Questions
>>>>>>>
>>>>>>> - How do we handle this in pulp_file? How are content units identified in pulp_file without relative_path?
>>>>>>>   - Checksum?
>>>>>>> - How was this problem handled in Pulp 2?
>>>>>>>
>>>>>>> Please weigh in if you have any input on potential problems with the proposal, potential alternate solutions, or other insights or questions!
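To make the repository-version-wide uniqueness constraint on relative_path concrete, here is a minimal Python check over one repository version. It keys paths per ContentArtifact rather than per RepositoryContent, following the later finding in this thread that multi-artifact content needs one path per artifact. The tuple layout and the function name `find_path_conflicts` are hypothetical; in the database this would presumably be a unique constraint rather than a Python check, and it deliberately ignores the plugin's repo_key_fields.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# Hypothetical row shape for one repository version:
# (content_pk, content_artifact_pk, relative_path)
Entry = Tuple[int, int, str]


def find_path_conflicts(entries: Iterable[Entry]) -> Dict[str, List[int]]:
    """Return each relative_path claimed by more than one ContentArtifact.

    An empty result means the repository version satisfies the
    per-version uniqueness rule on relative_path.
    """
    claimed: DefaultDict[str, List[int]] = defaultdict(list)
    for _content_pk, content_artifact_pk, path in entries:
        claimed[path].append(content_artifact_pk)
    # Keep only paths that more than one artifact wants to occupy.
    return {path: pks for path, pks in claimed.items() if len(pks) > 1}
```

For example, a single content unit with two artifacts at `images/a.iso` and `images/a.iso.sig` passes, but adding a second content unit whose artifact also claims `images/a.iso` reports that path with both ContentArtifact pks.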
>>>>>>> _______________________________________________
>>>>>>> Pulp-dev mailing list
>>>>>>> Pulp-dev@redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>
>>>>>>> --
>>>>>>> Pavel Picka
>>>>>>> Red Hat