I think using pkgId is problematic, though. Consider the case where you have two packages with the same location_href but different pkgIds. Since the pulp_rpm code uses location_href (which also gets stored as relative_path) as the filename, which one will get published when a repo version is published?
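To make the question concrete, here is a minimal sketch in plain Python. It is not pulp_rpm code: the `Package` tuple, the sample values, and `find_path_collisions` are all hypothetical, made up for illustration. It only shows that a uniqueness key of NEVRA + pkgId accepts two packages that still map to a single filename.

```python
from collections import namedtuple

# Hypothetical stand-in for a package record; NOT the pulp_rpm Package model.
Package = namedtuple("Package", ["nevra", "pkgid", "location_href"])


def find_path_collisions(packages):
    """Return location_hrefs shared by more than one distinct (nevra, pkgid) key."""
    by_path = {}
    for pkg in packages:
        # Collect every uniqueness key that wants to live at this path.
        by_path.setdefault(pkg.location_href, set()).add((pkg.nevra, pkg.pkgid))
    return {path: keys for path, keys in by_path.items() if len(keys) > 1}


# Made-up sample data: same NEVRA and location_href, different pkgId.
packages = [
    Package("foo-0:1.0-1.x86_64", "aaa111", "Packages/foo-1.0-1.x86_64.rpm"),
    Package("foo-0:1.0-1.x86_64", "bbb222", "Packages/foo-1.0-1.x86_64.rpm"),
    Package("bar-0:2.0-1.noarch", "ccc333", "Packages/bar-2.0-1.noarch.rpm"),
]

collisions = find_path_collisions(packages)
for path, keys in collisions.items():
    print(path, "is claimed by", len(keys), "distinct packages")
```

Both `foo` entries pass a uniqueness check on (NEVRA, pkgId), yet they share one path, so anything that writes files by relative_path has to pick one of them.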
PS - Don't tell me that two different packages will never have the same location_href. If there's one thing I've learned working on RPM, it's that things that will never happen sometimes do happen.

David

On Fri, Mar 20, 2020 at 4:46 AM Pavel Picka <ppi...@redhat.com> wrote:

> I think we should keep nevra as the unique constraint, but as I mentioned
> before (above in this thread) your idea is similar to mine, as my suggestion
> was NEVRA + checksum (pkgId).
> I've already tested it with pkgId and it works well.
>
> On Fri, Mar 20, 2020 at 5:43 AM Daniel Alley <dal...@redhat.com> wrote:
>
>> I discussed this a little bit on the #rpm.org channel. Here is the gist
>> of that discussion:
>>
>> - The metadata is "crazy, but technically valid"
>> - "the entire SUSE ecosystem tends to do this a lot, anything using
>>   OBS, including nvidia and dell and friends"
>> - "also, SUSE packages can have the same NEVRA with being completely
>>   different packages because of how their build system makes packages"
>>
>> I'm not sure what the best means to fix it would be. Perhaps the
>> uniqueness constraint should be on the location_href, instead of on the
>> NEVRA? Or on NEVRA + location_href?
>>
>> On Wed, Mar 18, 2020 at 9:47 AM Ina Panova <ipan...@redhat.com> wrote:
>>
>>> Pavel,
>>> I meant to say that pulp3 does not have the limitation that pulp2 had
>>> (saving rpms on the filesystem with the same nevra).
>>> The error is raised in pulp3 [0] when a repo version is created: because
>>> of the repo key [1], we cannot have 2 rpms with the same NEVRA.
>>>
>>> We can enable that, if we decide to, by adding location_href to the
>>> repo_key, *but* this needs to be evaluated; it can have side effects and we
>>> should involve our stakeholders to weigh in.
>>>
>>> [0] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
>>> [1] https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/models/package.py#L188
>>>
>>> --------
>>> Regards,
>>>
>>> Ina Panova
>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>
>>> "Do not go where the path may lead,
>>> go instead where there is no path and leave a trail."
>>>
>>>
>>> On Wed, Mar 18, 2020 at 2:24 PM Pavel Picka <ppi...@redhat.com> wrote:
>>>
>>>> True, in the opensuse repository there are two possibilities, 'src' and
>>>> 'nosrc' (this one should be legacy, without source code); both are
>>>> recognized by createrepo_c as arch 'src'.
>>>>
>>>> The pulp2 code I mentioned is here [0] (the base rpm package, as I
>>>> understood it).
>>>>
>>>> The error in pulp3 is raised here [1], in pulpcore, when adding
>>>> packages to a repository version.
>>>> So, as Ina mentioned, the issue doesn't have to be with the packages
>>>> themselves but rather with the logic in sync.
>>>>
>>>> [0] https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
>>>> [1] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
>>>>
>>>> On Wed, Mar 18, 2020 at 1:55 PM Ina Panova <ipan...@redhat.com> wrote:
>>>>
>>>>> Tanya and Pavel,
>>>>> this issue explains why we cannot keep 2 packages with the same
>>>>> NEVRA but different checksums within a repo:
>>>>> https://pulp.plan.io/issues/494
>>>>>
>>>>> Pulp2 had a limitation where it was not able to save on the filesystem
>>>>> 2 rpms with the same filename; it led to a primary.xml that could have
>>>>> pointed to an rpm that did not actually get saved.
>>>>> I believe in Pulp3 we could allow having rpms with the same NEVRA if they
>>>>> have different location_href within a repo.
>>>>>
>>>>> --------
>>>>> Regards,
>>>>>
>>>>> Ina Panova
>>>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>>>
>>>>> "Do not go where the path may lead,
>>>>> go instead where there is no path and leave a trail."
>>>>>
>>>>>
>>>>> On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko <ttere...@redhat.com> wrote:
>>>>>
>>>>>> Hi Pavel,
>>>>>>
>>>>>> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka <ppi...@redhat.com> wrote:
>>>>>>
>>>>>>> Hello, I would like to ask you how to proceed with an issue with
>>>>>>> duplicate (but not really) packages.
>>>>>>>
>>>>>>> I am syncing a suse repository (opensuse42 and SLE12) and get a
>>>>>>> duplicate error. But when checking the packages [0] (from primary.xml),
>>>>>>> glibc and glibc, they have the same nevra but a different checksum (and
>>>>>>> a few more differences, such as size), so they don't look like real
>>>>>>> duplicates.
>>>>>>>
>>>>>> Those are weird, they have the same nevra but see the location_href,
>>>>>> one is src and the other one is nosrc! :/ :
>>>>>> <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>>>>>> <location href="src/glibc-2.19-20.3.src.rpm"/>
>>>>>>
>>>>>> It looks like something OpenSUSE specific. I'm not sure if it's a
>>>>>> valid way to create a repo with such metadata, we need to figure it out
>>>>>> at some point.
>>>>>>
>>>>>>
>>>>>>> I've checked Pulp2 and it uses nevra+sum for repository
>>>>>>> uniqueness. In pulp3 we use only nevra.
>>>>>>>
>>>>>> Why do you think that in pulp 2 we use NEVRA + checksum? Have you
>>>>>> tested it? Please point to the code.
>>>>>> I believe in Pulp 2 as well as in Pulp 3 we allow packages
>>>>>> with different checksums in Pulp storage.
>>>>>> I don't think we allow having the same packages with different
>>>>>> checksums in the same repo.
>>>>>> FWIW, in pulp 2 the most recently added package is chosen to stay in
>>>>>> a repo, and no packages with duplicate NEVRA are left after sync, see
>>>>>> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> My suggestion is to extend repo_key_fields for the rpm package, as in
>>>>>>> pulp2, with pkgId (checksum), as I don't think they are really duplicates
>>>>>>> and other software can rely on a specific version of a package.
>>>>>>>
>>>>>>
>>>>>> Unfortunately, I don't remember the main reason to remove duplicates
>>>>>> based on nevra. Was it because some tooling will complain, or was it just
>>>>>> to avoid duplicates at resync time? Does anyone know?
>>>>>> We should not change it unless we know for sure that it's needed + we
>>>>>> would need to have an agreement from all our stakeholders for that
>>>>>> change.
>>>>>>
>>>>>> For now, I think we can move on and ensure that no duplicates are in
>>>>>> a repo version. To my understanding, the behaviour will be the same as in
>>>>>> pulp 2.
>>>>>> Feel free to share where you get the duplicate error to see if it's a bug
>>>>>> or not. I wonder why duplicates are not removed automatically. Maybe
>>>>>> because the first version contains duplicates due to this bug
>>>>>> https://pulp.plan.io/issues/6217 ?
>>>>>>
>>>>>> Tanya
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>> [0]
>>>>>>>
>>>>>>>> <package type="rpm">
>>>>>>>>   <name>glibc</name>
>>>>>>>>   <arch>src</arch>
>>>>>>>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>>>>>>>   <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
>>>>>>>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>>>>>>>   <description>The GNU C Library provides the most important standard libraries used
>>>>>>>> by nearly all programs: the standard C library, the standard math
>>>>>>>> library, and the POSIX thread library. A system is not functional
>>>>>>>> without these libraries.</description>
>>>>>>>>   <packager>https://www.suse.com/</packager>
>>>>>>>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>>>>>>>   <time file="1426696882" build="1425645307"/>
>>>>>>>>   <size package="591662" installed="13047428" archive="974464"/>
>>>>>>>>   <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>>>>>>>>   <format>
>>>>>>>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>>>>>>>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>>>>>>>     <rpm:group>System/Libraries</rpm:group>
>>>>>>>>     <rpm:buildhost>sheep16</rpm:buildhost>
>>>>>>>>     <rpm:sourcerpm/>
>>>>>>>>     <rpm:header-range start="872" end="144403"/>
>>>>>>>>     <rpm:requires>
>>>>>>>>       <rpm:entry name="pwdutils"/>
>>>>>>>>       <rpm:entry name="xz"/>
>>>>>>>>       <rpm:entry name="fdupes"/>
>>>>>>>>       <rpm:entry name="systemd-rpm-macros"/>
>>>>>>>>       <rpm:entry name="libselinux-devel"/>
>>>>>>>>       <rpm:entry name="makeinfo"/>
>>>>>>>>     </rpm:requires>
>>>>>>>>   </format>
>>>>>>>> </package>
>>>>>>>>
>>>>>>>> <package type="rpm">
>>>>>>>>   <name>glibc</name>
>>>>>>>>   <arch>src</arch>
>>>>>>>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>>>>>>>   <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
>>>>>>>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>>>>>>>   <description>The GNU C Library provides the most important standard libraries used
>>>>>>>> by nearly all programs: the standard C library, the standard math
>>>>>>>> library, and the POSIX thread library. A system is not functional
>>>>>>>> without these libraries.</description>
>>>>>>>>   <packager>https://www.suse.com/</packager>
>>>>>>>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>>>>>>>   <time file="1426696883" build="1423750734"/>
>>>>>>>>   <size package="12678975" installed="13047285" archive="13057760"/>
>>>>>>>>   <location href="src/glibc-2.19-20.3.src.rpm"/>
>>>>>>>>   <format>
>>>>>>>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>>>>>>>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>>>>>>>     <rpm:group>System/Libraries</rpm:group>
>>>>>>>>     <rpm:buildhost>sheep02</rpm:buildhost>
>>>>>>>>     <rpm:sourcerpm/>
>>>>>>>>     <rpm:header-range start="872" end="144334"/>
>>>>>>>>     <rpm:requires>
>>>>>>>>       <rpm:entry name="pwdutils"/>
>>>>>>>>       <rpm:entry name="xz"/>
>>>>>>>>       <rpm:entry name="fdupes"/>
>>>>>>>>       <rpm:entry name="systemd-rpm-macros"/>
>>>>>>>>       <rpm:entry name="libselinux-devel"/>
>>>>>>>>       <rpm:entry name="makeinfo"/>
>>>>>>>>     </rpm:requires>
>>>>>>>>   </format>
>>>>>>>> </package>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Pavel Picka
>>>>>>> Red Hat
>>>>>>> _______________________________________________
>>>>>>> Pulp-dev mailing list
>>>>>>> Pulp-dev@redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://www.redhat.com/mailman/listinfo/pulp-dev