@David Davis <[email protected]> So this proposal would go something like this, correct?

* For the signed metadata / exact mirror use-case we need to store the repository metadata itself as a content unit inside the RepositoryVersion anyway (because the hash must be equal)
* Because we have this metadata lying around, we can reference it at publish time to discover the appropriate PublishedArtifact.relative_path
* Create a map of "filename" -> "location_href" and look up the filename of each RPM package to find the appropriate path
* This should be pretty fast for the RPM plugin since createrepo_c is doing all the hard work
* Data migration to ensure ContentArtifact.relative_path is only storing the filename (and I would suggest we also change the name to "filename")
* If metadata isn't present in the RepositoryVersion, then just tweak the PublishedArtifact.relative_path so that it uses whatever our default repo layout is

On Tue, Apr 28, 2020 at 11:41 AM David Davis <[email protected]> wrote:

> Yes, that's correct. During our meeting we discussed two options: the
> first was to extend RepositoryContent to store a relative path per
> ContentArtifact, since storing a relative_path per Content won't work for
> multi-Artifact Content units.
>
> An alternative that I pitched was to have plugins (or maybe even core
> someday) store this information outside RepositoryContent and then use this
> information during publishing to set relative_path on PublishedArtifacts.
> We'd have to modify the content app if we wanted to support pass-through
> publications, but I think asking plugins to use published artifacts in this
> case is warranted. That said, I don't think anyone else was keen on this
> idea.
>
> David
>
>
> On Tue, Apr 28, 2020 at 10:30 AM Matthias Dellweg <[email protected]>
> wrote:
>
>> That is only used for pass-through publication afaik. If you publish each
>> content unit "by hand", you create a new relative path for each published
>> artifact. That is why it can be empty and the content can still be
>> published.
>>
>> On Tue, Apr 28, 2020 at 4:09 PM Daniel Alley <[email protected]> wrote:
>>
>>> We realized in our discussion that the original proposal described in my
>>> email will not work, because "relative_path" ultimately describes the path
>>> of the published *artifacts* (not content), and for content types with
>>> multiple artifacts, storing this information in a field on
>>> RepositoryContent would not be possible.
>>>
>>> On Mon, Apr 27, 2020 at 6:08 PM Daniel Alley <[email protected]> wrote:
>>>
>>>> There is a video call scheduled to discuss this issue tomorrow (Tuesday
>>>> April 28th) at 13:30 UTC (please convert to your local time).
>>>> https://meet.google.com/scy-csbx-qiu
>>>>
>>>> On Sat, Apr 25, 2020 at 7:02 AM David Davis <[email protected]>
>>>> wrote:
>>>>
>>>>> I had a chance to think about this some more yesterday and wanted to
>>>>> email out my thoughts. I also think that this change sounds scary and will
>>>>> have a big impact on plugin writers so I thought of a couple alternatives:
>>>>>
>>>>> First, we could add a relative_path field to RepositoryContent instead
>>>>> of moving it there. This would be an optional field. It would be up to
>>>>> plugins to manage this field and they would still need to populate the
>>>>> relative_path field on ContentArtifact. But plugins could use this
>>>>> optional field to store relative paths per repository and then use this
>>>>> field when generating publications.
>>>>>
>>>>> The second alternative is one that is already laid out in the original
>>>>> email but to call it out again: it would be to not solve this in pulpcore.
>>>>> RPM would create its own object that would map content in a repository to
>>>>> relative_paths.
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> On Tue, Apr 21, 2020 at 9:22 AM Quirin Pamp <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am not currently very well versed in the classes involved, but
>>>>>> moving relative_path around sounds slightly scary with the potential to
>>>>>> break things.
>>>>>>
>>>>>> As such, I would be interested to be kept in the loop as this moves
>>>>>> forward. (Mailing list once there is some movement is entirely sufficient
>>>>>> 😉)
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Quirin Pamp
>>>>>> ------------------------------
>>>>>> *From:* [email protected] <[email protected]> on
>>>>>> behalf of Ina Panova <[email protected]>
>>>>>> *Sent:* 21 April 2020 14:07:13
>>>>>> *To:* Daniel Alley <[email protected]>
>>>>>> *Cc:* Pulp-dev <[email protected]>
>>>>>> *Subject:* Re: [Pulp-dev] the "relative path" problem
>>>>>>
>>>>>> Daniel,
>>>>>>
>>>>>> how about setting up a meeting to brainstorm the alternatives and
>>>>>> pros/cons there?
>>>>>>
>>>>>> --------
>>>>>> Regards,
>>>>>>
>>>>>> Ina Panova
>>>>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>>>>
>>>>>> "Do not go where the path may lead,
>>>>>> go instead where there is no path and leave a trail."
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 17, 2020 at 5:57 PM Daniel Alley <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Bump, this item needs to move forwards soon. Does anyone have any
>>>>>> thoughts?
>>>>>>
>>>>>> On Wed, Apr 1, 2020 at 9:40 AM Pavel Picka <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I'd like to add one more question to this topic. Do you think it is a
>>>>>> blocker for PRs [0] & [1]? While testing [2] these features I haven't run
>>>>>> into a real-world example where two packages with exactly the same name
>>>>>> appear.
>>>>>> I think this is a 'must have' feature, but until we solve/decide it we
>>>>>> can have the two features working, maybe with a warning in the docs for
>>>>>> users that this can happen in some 'special' repositories.
>>>>>>
>>>>>> To follow the topic directly, I like the proposed move to
>>>>>> 'RepositoryContent' and adding it to its uniqueness constraint (if I
>>>>>> understand well).
>>>>>>
>>>>>> [0] https://github.com/pulp/pulp_rpm/pull/1657
>>>>>> [1] https://github.com/pulp/pulp_rpm/pull/1642
>>>>>> [2] tested with centos 7, 8, opensuse and SLE repositories
>>>>>>
>>>>>> On Wed, Apr 1, 2020 at 3:22 PM Daniel Alley <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> We'd like to start a discussion on the "relative path problem"
>>>>>> identified recently.
>>>>>>
>>>>>> Problem:
>>>>>>
>>>>>> Currently, a relative_path is tied to content in Pulp. This means
>>>>>> that if a content unit exists in two places within a repository or across
>>>>>> repositories, it has to be stored as two separate content units. This
>>>>>> creates redundant data and potential confusion for users.
>>>>>>
>>>>>> As a specific example, we need to support mirroring content in
>>>>>> pulp_rpm <https://pulp.plan.io/issues/6353>. Currently, for each
>>>>>> location at which a single package is stored, we’ll need to create a
>>>>>> content unit. We could end up with several records representing a single
>>>>>> package. Users may be confused about why they see multiple records for a
>>>>>> package, and they may have trouble deciding, for example, which content
>>>>>> unit to copy.
>>>>>>
>>>>>> Proposed Solution:
>>>>>>
>>>>>> Move “relative_path” from its current location on ContentArtifact to
>>>>>> RepositoryContent. This will require a sizable data migration. It is
>>>>>> possible that, in rare cases, repository versions may change
>>>>>> slightly due to deduplication.
>>>>>>
>>>>>> A repository-version-wide uniqueness constraint will be present on
>>>>>> “relative_path”, independently of any other repository uniqueness
>>>>>> constraints (repo_key_fields) defined by the plugin writer.
>>>>>>
>>>>>> Modify the Stages API so that the relative_path can be processed in
>>>>>> the correct location – instead of “DeclarativeArtifact” it will likely
>>>>>> need to go on “DeclarativeContent”.
>>>>>>
>>>>>> Remove “location_href” from the RPM Package content model – it was
>>>>>> never a true part of the RPM (file) metadata, it is derived from the
>>>>>> repository metadata. So storing it as a part of the Content unit doesn’t
>>>>>> entirely make sense.
>>>>>>
>>>>>> Alternatives
>>>>>>
>>>>>> In most cases, a content unit will have a single relative path.
>>>>>> Creating a general solution to solve a one-off problem is
>>>>>> usually not a good idea. As an alternative, we could look at another
>>>>>> solution for mirroring content. One example might be to create a new
>>>>>> object (e.g. RpmRepoMirrorContentMapping) that maps content to specific
>>>>>> paths within a repo or repo version.
>>>>>>
>>>>>> Questions
>>>>>>
>>>>>> - How do we handle this in pulp_file? How are content units
>>>>>>   identified in pulp_file without relative_path?
>>>>>>   - Checksum?
>>>>>> - How was this problem handled in Pulp 2?
>>>>>>
>>>>>>
>>>>>> Please weigh in if you have any input on potential problems with the
>>>>>> proposal, potential alternate solutions, or other insights or questions!
>>>>>> _______________________________________________
>>>>>> Pulp-dev mailing list
>>>>>> [email protected]
>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>
>>>>>> --
>>>>>> Pavel Picka
>>>>>> Red Hat
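
P.S. To make the publish-time lookup from the list at the top of this mail a bit more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration only: the function names, the plain-string inputs, and the "Packages/<first letter>/" default layout are hypothetical and are not pulpcore or pulp_rpm APIs. It only shows the idea of building a "filename" -> "location_href" map from the stored repository metadata and falling back to a default layout when no metadata is available.

```python
import posixpath
from typing import Dict, Iterable, Optional


def build_location_map(location_hrefs: Iterable[str]) -> Dict[str, str]:
    """Map each package filename to the location_href listed in the repo metadata.

    Assumption: the hrefs have already been parsed out of the stored metadata.
    Limitation of this sketch: if one filename is listed at several locations,
    only the first href is kept.
    """
    mapping: Dict[str, str] = {}
    for href in location_hrefs:
        filename = posixpath.basename(href)
        mapping.setdefault(filename, href)
    return mapping


def default_layout_path(filename: str) -> str:
    """Hypothetical default repo layout: Packages/<first letter>/<filename>."""
    return posixpath.join("Packages", filename[0].lower(), filename)


def published_relative_path(
    filename: str, location_map: Optional[Dict[str, str]]
) -> str:
    """Pick the relative path a published artifact would get for this filename."""
    if location_map and filename in location_map:
        # Metadata is present and knows this package: mirror its original path.
        return location_map[filename]
    # No metadata (or package not listed): fall back to the default layout.
    return default_layout_path(filename)


if __name__ == "__main__":
    hrefs = ["Packages/b/bear-4.1-1.noarch.rpm", "extras/cat-1.0-1.noarch.rpm"]
    location_map = build_location_map(hrefs)
    for name in ("cat-1.0-1.noarch.rpm", "dog-4.23-1.noarch.rpm"):
        print(name, "->", published_relative_path(name, location_map))
```

With something like this, ContentArtifact would only need to carry the bare filename, and the path would be decided per publication rather than per content unit.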
