Hi Pavel,

On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka <ppi...@redhat.com> wrote:

> Hello, I would like to ask how to proceed with an issue with duplicate
> (but not really) packages.
>
> I am syncing SUSE repositories (opensuse42 and SLE12) and get a duplicate
> error. But when checking the packages [0] (from primary.xml), glibc and
> glibc have the same NEVRA but different checksums (and a few other
> differences, such as size), so they don't look like real duplicates.
>
Those are weird: they have the same NEVRA, but look at the location_href.
One is src and the other one is nosrc! :/
<location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
<location href="src/glibc-2.19-20.3.src.rpm"/>

It looks like something OpenSUSE specific. I'm not sure if it's a valid way
to create a repo with such metadata, we need to figure it out at some point.
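
To make the collision concrete, here is a minimal Python sketch that groups
primary.xml entries by NEVRA and reports the groups whose checksums differ.
It is only an illustration, not Pulp code: the XML is a trimmed,
namespace-free stand-in for the two entries quoted at [0] (a real
primary.xml declares XML namespaces), and find_nevra_collisions is a
hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed, namespace-free stand-in for the two glibc entries quoted in this
# thread. A real primary.xml uses XML namespaces, which this sketch ignores.
PRIMARY_FRAGMENT = """
<metadata>
  <package type="rpm">
    <name>glibc</name>
    <arch>src</arch>
    <version epoch="0" ver="2.19" rel="20.3"/>
    <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
    <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
  </package>
  <package type="rpm">
    <name>glibc</name>
    <arch>src</arch>
    <version epoch="0" ver="2.19" rel="20.3"/>
    <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
    <location href="src/glibc-2.19-20.3.src.rpm"/>
  </package>
</metadata>
"""


def find_nevra_collisions(xml_text):
    """Return {nevra: [(checksum, href), ...]} for NEVRAs with differing checksums."""
    root = ET.fromstring(xml_text)
    by_nevra = defaultdict(list)
    for pkg in root.iter("package"):
        version = pkg.find("version")
        nevra = (
            pkg.findtext("name"),
            version.get("epoch"),
            version.get("ver"),
            version.get("rel"),
            pkg.findtext("arch"),
        )
        by_nevra[nevra].append(
            (pkg.findtext("checksum"), pkg.find("location").get("href"))
        )
    # Keep only NEVRAs that map to more than one distinct checksum.
    return {
        nevra: entries
        for nevra, entries in by_nevra.items()
        if len({checksum for checksum, _ in entries}) > 1
    }


collisions = find_nevra_collisions(PRIMARY_FRAGMENT)
for nevra, entries in collisions.items():
    print(nevra)
    for checksum, href in entries:
        print("  ", checksum[:12], href)
```

Both entries report arch "src", so the only visible difference besides the
checksum is the directory and suffix in location_href.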


> I've checked Pulp 2, and it uses NEVRA + checksum for repository
> uniqueness. In Pulp 3 we use only NEVRA.
>
Why do you think that Pulp 2 uses NEVRA + checksum? Have you tested it?
Please point to the code.
I believe both Pulp 2 and Pulp 3 allow packages with the same NEVRA but
different checksums in Pulp storage.
I don't think either allows such packages in the same repo.
FWIW, in Pulp 2 the most recently added package is chosen to stay in the
repo, so no packages with duplicate NEVRA are left after sync; see
https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
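
As a rough illustration of that "most recently added wins" rule, here is a
minimal Python sketch. It is not the code from purge.py: the Unit tuple, the
added counter, and keep_most_recent are hypothetical names, the checksums
are shortened for readability, and which of the two glibc packages was added
last is an assumption made only for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a package record. "added" is an assumed
# monotonically increasing counter marking when the unit entered the repo.
Unit = namedtuple("Unit", "name epoch version release arch pkg_id added")


def nevra(unit):
    """Build the NEVRA key used to decide whether two units are duplicates."""
    return (unit.name, unit.epoch, unit.version, unit.release, unit.arch)


def keep_most_recent(units):
    """For each NEVRA, keep only the unit with the highest "added" value."""
    newest = {}
    for unit in units:
        key = nevra(unit)
        if key not in newest or unit.added > newest[key].added:
            newest[key] = unit
    return list(newest.values())


# Checksums shortened for readability; the order of addition is assumed.
units = [
    Unit("glibc", "0", "2.19", "20.3", "src", "00d36c0f", added=1),
    Unit("glibc", "0", "2.19", "20.3", "src", "353e1dc8", added=2),
]

kept = keep_most_recent(units)
for unit in kept:
    print(nevra(unit), unit.pkg_id)
```

With NEVRA alone as the key, only one of the two glibc packages survives;
adding pkg_id to the key would keep both.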


>
> My suggestion is to extend repo_key_fields for RPM packages with pkgId
> (checksum), as in Pulp 2. I don't think they are really duplicates, and
> other software can rely on a specific version of a package.
>

Unfortunately, I don't remember the main reason to remove duplicates based
on nevra. Was it because some tooling will complain, or was it just to
avoid duplicates at resync time? Does anyone know?
We should not change it unless we know for sure that it's needed + we would
need to have an agreement from all our stakeholders for that change.

For now, I think we can move on and ensure that no duplicates are in a repo
version. To my understanding, the behaviour will be the same as in pulp 2.
Feel free to share where you get the duplicate error so we can see whether
it's a bug or not. I wonder why duplicates are not removed automatically.
Maybe because the first version contains duplicates due to this bug
https://pulp.plan.io/issues/6217 ?

Tanya


>
> What do you think?
>
>
> [0]
>
>> <package type="rpm">
>>   <name>glibc</name>
>>   <arch>src</arch>
>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>   <checksum type="sha256"
>> pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>   <description>The GNU C Library provides the most important standard
>> libraries used
>> by nearly all programs: the standard C library, the standard math
>> library, and the POSIX thread library. A system is not functional
>> without these libraries.</description>
>>   <packager>https://www.suse.com/</packager>
>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>   <time file="1426696882" build="1425645307"/>
>>   <size package="591662" installed="13047428" archive="974464"/>
>> <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>>   <format>
>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and
>> GPL-2.0+</rpm:license>
>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>     <rpm:group>System/Libraries</rpm:group>
>>     <rpm:buildhost>sheep16</rpm:buildhost>
>>     <rpm:sourcerpm/>
>>     <rpm:header-range start="872" end="144403"/>
>>     <rpm:requires>
>>       <rpm:entry name="pwdutils"/>
>>       <rpm:entry name="xz"/>
>>       <rpm:entry name="fdupes"/>
>>       <rpm:entry name="systemd-rpm-macros"/>
>>       <rpm:entry name="libselinux-devel"/>
>>       <rpm:entry name="makeinfo"/>
>>     </rpm:requires>
>>   </format>
>> </package>
>>
>> <package type="rpm">
>>   <name>glibc</name>
>>   <arch>src</arch>
>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>   <checksum type="sha256"
>> pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>   <description>The GNU C Library provides the most important standard
>> libraries used
>> by nearly all programs: the standard C library, the standard math
>> library, and the POSIX thread library. A system is not functional
>> without these libraries.</description>
>>   <packager>https://www.suse.com/</packager>
>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>   <time file="1426696883" build="1423750734"/>
>>   <size package="12678975" installed="13047285" archive="13057760"/>
>> <location href="src/glibc-2.19-20.3.src.rpm"/>
>>   <format>
>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and
>> GPL-2.0+</rpm:license>
>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>     <rpm:group>System/Libraries</rpm:group>
>>     <rpm:buildhost>sheep02</rpm:buildhost>
>>     <rpm:sourcerpm/>
>>     <rpm:header-range start="872" end="144334"/>
>>     <rpm:requires>
>>       <rpm:entry name="pwdutils"/>
>>       <rpm:entry name="xz"/>
>>       <rpm:entry name="fdupes"/>
>>       <rpm:entry name="systemd-rpm-macros"/>
>>       <rpm:entry name="libselinux-devel"/>
>>       <rpm:entry name="makeinfo"/>
>>     </rpm:requires>
>>   </format>
>> </package>
>
>
> --
> Pavel Picka
> Red Hat
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev@redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>