Re: [Rpm-ecosystem] Some points about zchunk

2018-07-11 Thread Michael Schroeder
On Wed, Jul 11, 2018 at 11:20:00AM +0100, Jonathan Dieter wrote:
> I must be missing something because I don't understand how that
> follows.  As I understand it, dnf requests the primary metadata. 
> Librepo then downloads either primary.xml.gz or primary.xml.zck. 
> Librepo then asks libsolv to decompress the xml file and convert it
> into a solv file.  dnf then uses the solv file directly.  Why should
> dnf care whether librepo downloaded primary.xml.gz or primary.xml.zck?

But it's not librepo that calls libsolv, it's libdnf.

Anyway, this discussion started because you said:

> I had originally planned to do something along these lines (I think I
> used primary-zck rather than primary@zchunk), but realized that this
> pushed the "choose best format" code into the top-level tools, rather
> than leaving the decision in librepo.

So you're kind of contradicting yourself, IMHO.

Basically all libdnf does is call:
  path = lr_yum_repo_path(yum_repo, "primary");
and then:
  fp_primary = solv_xfopen(hy_repo_get_string(hrepo, path), "r");
  repo_add_rpmmd(repo, fp_primary, 0, 0);
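
Since the caller only hands over a record type and gets a path back, the
format choice could sit behind that one lookup. A minimal sketch in Python
(this is not librepo's C API; resolve_repo_path, the record layout and the
file paths are assumptions made up for illustration):

```python
def resolve_repo_path(records, md_type, supported_formats):
    """Return the path for md_type, preferring an annotated variant.

    records maps a record type ("primary", "primary@zchunk", ...) to a
    local file path. supported_formats lists the annotations the caller's
    stack can decode, in order of preference. All names are hypothetical.
    """
    for fmt in supported_formats:
        key = f"{md_type}@{fmt}"
        if key in records:
            return records[key]
    # Fall back to the plain record, or None if that is missing too.
    return records.get(md_type)


records = {
    "primary": "repodata/primary.xml.gz",
    "primary@zchunk": "repodata/primary.xml.zck",
}
print(resolve_repo_path(records, "primary", ["zchunk"]))
print(resolve_repo_path(records, "primary", []))
```

The supported_formats argument is where the open question of this thread
lives: somebody still has to say which formats the stack can actually read.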

I don't see why librepo can't automagically download/return the
"primary@zchunk" entry instead of "primary".

Cheers,
  Michael.

-- 
Michael Schroeder   m...@suse.de
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
___
Rpm-ecosystem mailing list
Rpm-ecosystem@lists.rpm.org
http://lists.rpm.org/mailman/listinfo/rpm-ecosystem


Re: [Rpm-ecosystem] Some points about zchunk

2018-07-11 Thread Michael Schroeder
On Tue, Jul 10, 2018 at 02:05:26PM +0100, Jonathan Dieter wrote:
> The top-level tool only needs to deal with the uncompressed metadata. 
> dnf/libdnf requests the primary metadata from librepo, which downloads
> the zchunk version, passes it to libsolv which decompresses it and
> creates the .solv file usable by the top-level tools.

Yes, so the selection of the flavor to download should be in dnf/libdnf.

> DNF neither knows, nor cares that librepo downloaded the zchunk metadata
> rather than gz.

That's just because libsolv uses the file suffix to autodetect the
compression.
Actually dnf/libdnf should ask libsolv if it supports the compression
(by calling solv_xfopen_iscompressed()) and not blindly assume that
it will magically work.
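
To illustrate the point, a small Python sketch of suffix-based detection
combined with an explicit capability check. The suffix table and the
"built_with" set are assumptions invented for this example; in the real
stack the answer has to come from libsolv itself.

```python
import os

# Hypothetical suffix table; a real one must mirror what the library supports.
SUFFIX_TO_COMPRESSION = {
    ".gz": "gzip",
    ".xz": "xz",
    ".bz2": "bzip2",
    ".zst": "zstd",
    ".zck": "zchunk",
}


def detect_compression(path):
    """Guess the compression from the file suffix; None means uncompressed."""
    _, suffix = os.path.splitext(path)
    return SUFFIX_TO_COMPRESSION.get(suffix)


def can_open(path, supported):
    """Check the guessed compression against what the reader was built with."""
    compression = detect_compression(path)
    return compression is None or compression in supported


built_with = {"gzip", "xz"}  # assumed build configuration
print(can_open("primary.xml.gz", built_with))
print(can_open("primary.xml.zck", built_with))
```

With this build configuration the .zck file is rejected up front instead of
failing somewhere inside the parser.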

Cheers,
  Michael.

-- 
Michael Schroeder   m...@suse.de
SUSE LINUX GmbH,   GF Jeff Hawn, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}


Re: [Rpm-ecosystem] Some points about zchunk

2018-07-10 Thread Michael Schroeder
On Mon, Jul 09, 2018 at 09:32:13PM +0100, Jonathan Dieter wrote:
> I had originally planned to do something along these lines (I think I
> used primary-zck rather than primary@zchunk), but realized that this
> pushed the "choose best format" code into the top-level tools, rather
> than leaving the decision in librepo.

But doesn't it belong in the top-level tools? How can librepo decide that
it's ok to use zchunk if the top-level tool can't deal with it?
IMHO the top-level tool has to tell librepo what compression/format it
understands.

Cheers,
  Michael.



Re: [Rpm-ecosystem] Some points about zchunk

2018-07-09 Thread Michael Schroeder
On Sun, Jul 08, 2018 at 07:45:36PM +0100, Jonathan Dieter wrote:
> On Fri, 2018-07-06 at 11:48 +0000, Michael Schroeder wrote:
> > On Thu, Jul 05, 2018 at 08:07:58PM +0300, Jonathan Dieter wrote:
> > > My proposal is here:
> > > https://www.jdieter.net/downloads/zchunk/repomd.dtd
> > > 
> > > In summary, I'm just adding extra zchunk attributes to the main file
> > > element:
> > > zck-location
> > > header-checksum
> > > header-size
> > > zck-timestamp
> > > 
> > > librepo first downloads header-size of the file and then verifies that
> > > the header checksum matches and is valid.
> > 
> > Please use zck-header-checksum and zck-header-size instead.
> 
> Ok, will do.

I thought about this a bit more over the weekend, and maybe we
should do this in a bit more general way. Basically zchunk is
just another compression format, like "xz" or "zstd". If we
want to support yet another compression format, we probably wouldn't
want to add new attributes to the existing elements, but instead
add new elements. E.g.


  
  [XML example stripped by the list archive]


We might also want to add a "format" attribute in case we want
to switch from "xml" to something that can be parsed faster,
like "json".

The zchunk compression format would be the same, but with added
"header-size" and "header-checksum" elements (so back to what
you had earlier):


  
  [XML example stripped by the list archive]


The problem with all this is that we don't know how all the
repomd.xml parsers behave when there are multiple <data> elements
with the same type, so we might need to annotate the "type" with
the compression/format, e.g. "primary@zchunk".
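
The concern can be reproduced with a deliberately naive parser. The Python
sketch below assumes a simplified repomd.xml without XML namespaces and with
made-up file names; the "compression" attribute is likewise hypothetical.
A parser that keys records by "type" silently keeps only the last duplicate,
while the annotated type keeps both.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: two records sharing one type.
DUPLICATE_TYPES = """
<repomd>
  <data type="primary"><location href="repodata/primary.xml.gz"/></data>
  <data type="primary" compression="zchunk"><location href="repodata/primary.xml.zck"/></data>
</repomd>
"""

# Hypothetical layout: the format is folded into the type string.
ANNOTATED_TYPES = """
<repomd>
  <data type="primary"><location href="repodata/primary.xml.gz"/></data>
  <data type="primary@zchunk"><location href="repodata/primary.xml.zck"/></data>
</repomd>
"""


def naive_parse(xml_text):
    """Key each record by its type, as a simple existing parser might."""
    records = {}
    for data in ET.fromstring(xml_text.strip()).findall("data"):
        records[data.get("type")] = data.find("location").get("href")
    return records


print(naive_parse(DUPLICATE_TYPES))
print(naive_parse(ANNOTATED_TYPES))
```

An old parser treats "primary@zchunk" as just another unknown type and still
finds the gzip record under "primary".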

Cheers,
  Michael.



Re: [Rpm-ecosystem] Some points about zchunk

2018-07-06 Thread Michael Schroeder
On Thu, Jul 05, 2018 at 08:07:58PM +0300, Jonathan Dieter wrote:
> My plan was to just keep the same dictionaries (a different one for
> each metadata file) for at least a whole release, if not more.  My
> dictionary generation script
> (https://www.jdieter.net/downloads/zchunk-dicts/split.py)
> removes checksums before running zstd -D, so the dictionary should
> remain effective for a minimum of one release.

Yes, I guessed that you would say something like this. And it's
also a reasonable thing to do.

It's just a shame that we can't generate the dictionary when creating
the repository metadata, it would be such a nice feature to have.

> At the point where the dictionary changes, everybody just downloads the
> full metadata again with the new dictionary and gets good deltas from
> then on.
> 
> I'm planning to package up the optimal Fedora dictionaries, make them
> Recommended: in createrepo_c, and only change them in Rawhide once,
> somewhere around branching.
> 
> By using the same dictionaries, we are able to validate the checksums
> before decompression, which keeps zchunk from decompressing unverified
> data, a possible attack vector.

That depends. Maybe I'll implement dictionary transcoding for zchunk
just in case the zstd algorithms don't change. Even if that's pretty
unlikely.

> >  2) What to put into repomd.xml? We'll need the old primary.xml.gz for
> > compatibility reasons. It's a good security practice to minimize the
> > attack vector, so we should put the zchunk header checksum into
> > the repomd.xml so that it can be verified before running the zchunk
> > code. So primary.xml.zck with extra attributes for the header? Or an
> > extra element that describes the zchunk header?
> 
> My proposal is here:
> https://www.jdieter.net/downloads/zchunk/repomd.dtd
> 
> In summary, I'm just adding extra zchunk attributes to the main file
> element:
> zck-location
> header-checksum
> header-size
> zck-timestamp
> 
> librepo first downloads header-size of the file and then verifies that
> the header checksum matches and is valid.

Please use zck-header-checksum and zck-header-size instead.
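
The verification step quoted above might look roughly like this in Python.
It assumes the header checksum is SHA-256 and that the first header_size
bytes have already been downloaded; the function name and the sample bytes
are invented for the example and are not the zchunk or librepo API.

```python
import hashlib
import hmac


def header_is_trusted(downloaded, header_size, expected_sha256_hex):
    """Verify the first header_size bytes against the checksum from repomd.xml."""
    header = downloaded[:header_size]
    if len(header) != header_size:
        return False  # short read: not enough data to verify
    actual = hashlib.sha256(header).hexdigest()
    # Constant-time comparison; nothing is parsed until this passes.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


header = b"example header bytes"  # made-up stand-in for a real header
expected = hashlib.sha256(header).hexdigest()
print(header_is_trusted(header + b"chunk data", len(header), expected))
print(header_is_trusted(b"tampered" + header, len(header), expected))
```

The design point is ordering: the checksum is checked over raw bytes before
any format-specific code looks at them.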

> librepo then grabs any common chunks from already downloaded metadata,
> downloads the remaining chunks, and verifies the body checksum that's
> embedded in the header.

And libzypp will never use librepo, so I have to implement all this
myself ;)

The good thing is that libzypp already supports range downloads from
multiple mirrors in parallel, because we already support delta
metadata downloads. So I just need a libzchunk API that "fills"
the target file with the data from the old metadata and returns a
list of ranges that need to be downloaded.
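
A sketch of what such a call could compute, written in Python for brevity.
The name plan_download, the (chunk_id, length) index layout and the sample
data are assumptions for illustration; no such libzchunk function is implied.

```python
def plan_download(new_index, old_chunks):
    """Split the new file into reusable chunks and ranges still to fetch.

    new_index is a list of (chunk_id, length) in file order. old_chunks
    maps chunk_id to the bytes found in the old metadata. Returns
    (target, ranges): target is a bytearray with reusable chunks copied
    in and zero-filled gaps, ranges is a list of (offset, length) to
    download, with adjacent ranges merged.
    """
    target = bytearray()
    ranges = []
    for chunk_id, length in new_index:
        offset = len(target)
        data = old_chunks.get(chunk_id)
        if data is not None and len(data) == length:
            target += data
            continue
        target += bytes(length)  # placeholder until the range arrives
        if ranges and ranges[-1][0] + ranges[-1][1] == offset:
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
        else:
            ranges.append((offset, length))
    return target, ranges


old = {"a": b"AAAA", "c": b"CC"}
index = [("a", 4), ("b", 3), ("x", 2), ("c", 2)]
target, ranges = plan_download(index, old)
print(ranges)
```

Merging adjacent ranges keeps the number of HTTP range requests down, which
matters when the missing chunks are spread over several mirrors.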

> >  3) I don't think signature support in zchunk is useful ;)
> 
> Fair enough.  ;)  It doesn't actually work yet, and I suspect that
> you're right in the librepo context, but I think it could be useful in
> other contexts.
> 
> >  4) Nitpick: Why does zchunk use sha1 checksums for the chunks? Either
> > it's something that needs to be cryptographically sound, then sha1 is the
> > wrong choice. Or it's just meant for identifying chunks, then
> > md5 is probably faster/smaller. Or some other checksum. But you
> > really don't need 20 bytes like with sha1.
> 
> It doesn't need to be cryptographically sound because we have a body
> checksum that is sha256.  I'll look at adding MD5 support and
> defaulting to it for the chunk checksum type.

Ah, no, I think you misunderstood. Do *not* add md5 support. In fact,
I'd ask you to remove sha1 support as well to make your code smaller.

My point is that you shouldn't use 20 bytes just for chunk identification
purposes. As you said, it doesn't need to be cryptographically sound, we
don't have to make sure it withstands an attacker.
Just use the first 8 bytes of the sha256 sum instead (or sha512, as
it's a bit faster than sha256 IIRC).
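
In Python the suggestion amounts to the following sketch. The helper name
chunk_id and the sample chunks are made up for the example; the 8-byte
length is the one proposed above.

```python
import hashlib


def chunk_id(data, id_bytes=8):
    """Identify a chunk by a prefix of its SHA-256 digest."""
    return hashlib.sha256(data).digest()[:id_bytes]


chunks = [b"<package>foo</package>", b"<package>bar</package>"]
index = {chunk_id(c): c for c in chunks}
print(len(chunk_id(chunks[0])), len(index))
```

Compared with a full 20-byte sha1 this saves 12 bytes per chunk in the
index, and only one hash implementation has to be shipped.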

Cheers,
  Michael.



Re: [Rpm-ecosystem] Proposed zchunk file format

2018-03-02 Thread Michael Schroeder
On Fri, Mar 02, 2018 at 02:33:09PM +0200, Jonathan Dieter wrote:
> No, I didn't expect it to have much effect.  Since openSUSE's xml files
> are (presumably) ordered so new packages come last, do you have any old
> primary.xml files lying around that I can test?
> 
> If not, I'll grab them from the next few updates.

They are ordered for the update channels of Leap, but Tumbleweed
is a rolling release distro and thus not ordered. (This also means
that delta repo downloads currently don't work that well for Tumbleweed,
so I'm eager to find something better).

How about using the Fedora metadata but reorder the entries with
the buildtime as sort key?
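
A sketch of that reordering in Python, assuming the package entries have
already been parsed into dicts; the field names and the timestamps are
placeholders, not real Fedora data.

```python
def order_by_buildtime(packages):
    """Return packages sorted oldest first, so new builds land at the end.

    Each package is a dict with "name" and "buildtime" (seconds since the
    epoch). The name breaks ties so the order is reproducible.
    """
    return sorted(packages, key=lambda p: (p["buildtime"], p["name"]))


packages = [
    {"name": "zlib", "buildtime": 1519800000},
    {"name": "bash", "buildtime": 1500000000},
    {"name": "curl", "buildtime": 1519900000},
]
print([p["name"] for p in order_by_buildtime(packages)])
```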

Cheers,
  Michael.



Re: [Rpm-ecosystem] Proposed zchunk file format

2018-02-26 Thread Michael Schroeder
On Fri, Feb 23, 2018 at 11:15:40PM +0200, Jonathan Dieter wrote:
> On Fri, 2018-02-23 at 14:14 +0000, Michael Schroeder wrote:
> > Hi Jonathan!
> > 
> > On Fri, Feb 16, 2018 at 08:52:23PM +0200, Jonathan Dieter wrote:
> > > So here's my proposed file format for the zchunk file.  Should I add
> > > some flags to facilitate possible different compression formats?
> > > 
> > > +-+-+-+-+-+-+-+-+-+-+-+-+==================+=================+
> > > |  ID   |  Index size   | Compressed Index | Compressed Dict |
> > > +-+-+-+-+-+-+-+-+-+-+-+-+==================+=================+
> > > 
> > > +===========+===========+
> > > |   Chunk   |   Chunk   | ==> More chunks
> > > +===========+===========+
> > > [...]
> > 
> > This may be an unfair question, but how does it compare to the
> > > 'gzip --rsyncable' + zsync approach that we (openSUSE) have been
> > > using for almost eight years? I guess it's better, but how much?
> > 
> > Cheers,
> >   Michael.
> 
> I've run some tests with zsync (since it's not in Fedora, I rebuilt the
> latest Tumbleweed source rpm), but ran into problems (which is probably
> unsurprising, given that upstream hasn't released an update in eight
> years).

Oh, I didn't propose to use the zsync tool itself, but just the
file format. I.e. --rsyncable compressed files that are accompanied
by .zsync files.

Cheers,
  Michael.



Re: [Rpm-ecosystem] Proposed zchunk file format

2018-02-26 Thread Michael Schroeder
On Fri, Feb 23, 2018 at 03:23:00PM -0500, Colin Walters wrote:
> 
> 
> On Fri, Feb 23, 2018, at 9:14 AM, Michael Schroeder wrote:
> 
> > This may be an unfair question, but how does it compare to the
> > 'gzip --rsyncable' + zsync approach that we (openSUSE) have been
> > using for almost eight years? I guess it's better, but how much?
> 
> Where is that code?  `git grep zsync` in zypper git master has zero
> hits.  I don't see any obvious library dependencies like librepo, it
> isn't obvious to me that it's in repos.cc in the source (that's what
> fetches metadata right?).

You need to check libzypp, not zypper.

> And I don't see any zsync files in e.g.:
> http://download.opensuse.org/distribution/leap/42.3/repo/oss/suse/

Well, it's because we do it a bit differently. openSUSE uses metalinks
for all files on download.opensuse.org. A metalink file consists of
a list of mirrors plus block checksums.

Now if you look at how zsync works, it's a strong checksum and a
rolling checksum for every block. So what we've decided to do is
just add the rolling checksums to the metalink files we generate.
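
For readers who haven't met rolling checksums: the point is that the
checksum of a window can be updated in constant time when the window slides
by one byte. The Python sketch below uses the classic rsync-style weak
checksum as a stand-in; it is an assumption that this resembles what zsync
or the openSUSE metalinks use, and the block size is arbitrary.

```python
M = 1 << 16


def weak_checksum(block):
    """rsync-style weak checksum of a block, returned as the pair (a, b)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_checksum(data[:n])
for start in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[start - 1], data[start + n - 1], n)
    # The rolled value must equal a from-scratch computation.
    assert (a, b) == weak_checksum(data[start:start + n])
print("rolled checksum matches direct computation at every offset")
```

The client slides this cheap checksum over its old file and only computes
the strong checksum where the weak one matches a block from the list.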

Cheers,
  Michael.



Re: [Rpm-ecosystem] A proof-of-concept for delta'ing repodata

2018-02-13 Thread Michael Schroeder
On Tue, Feb 13, 2018 at 10:52:14AM +0100, Igor Gnatenko wrote:
> This would "break" DNF, because libsolv is assigning Id's by the order of
> packages in metadata. So if something requires "webserver" and there is
> "nginx" and "httpd" providing it (without versions), then lowest Id is
> picked up (not going into details of this).

No, this is not correct. Libsolv doesn't use the Id to pick a package,
exactly to be independent of the package order of the repository.

Cheers,
  Michael.



Re: [Rpm-ecosystem] rpm -q --whatrequires and rich deps

2016-04-11 Thread Michael Schroeder
On Mon, Apr 11, 2016 at 12:53:35PM +0200, Michael Mraka wrote:
> So to make things consistent I'd propose to fix
>   rpm -q --whatrequires H 
> which currently returns "none". And fix --whatsuggests, --whatrecommends, etc.
> to work the same way.

Wait, don't make "rpm -q --whatrequires H" return richdep if richdep
contains
Requires: (G if H else I)

The H package is on the "if" side, so it is not required. Either
G or I is required. Putting H in the requires index (that's what
you propose) will just mess up rpm's dependency solving.
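
The distinction can be sketched in Python with rich dependencies written as
nested tuples. This is an illustration of the reasoning only, not rpm's
index code; the tuple encoding is made up, and counting the "else" branch
follows from "either G or I is required" rather than from a check of what
rpm actually indexes.

```python
def potentially_required(dep):
    """Names whose removal could break a package carrying this dependency.

    dep is either a plain name or a tuple: ("and", x, y), ("or", x, y),
    ("if", then, cond) or ("if", then, cond, else_). The condition of an
    "if" is skipped because it is tested, not required.
    """
    if isinstance(dep, str):
        return {dep}
    op, *args = dep
    if op == "if":
        args = [args[0]] + args[2:]  # drop the condition
    names = set()
    for arg in args:
        names |= potentially_required(arg)
    return names


richdep_requires = [
    "A", "B",
    ("and", "C", "D"),
    ("or", "E", "F"),
    ("if", "G", "H", "I"),
]
index = set()
for dep in richdep_requires:
    index |= potentially_required(dep)
print(sorted(index))
```

H never enters the set, so a --whatrequires lookup built on it would not
report richdep for H.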

Cheers,
  Michael.



Re: [Rpm-ecosystem] rpm -q --whatrequires and rich deps

2016-04-07 Thread Michael Schroeder
On 7.4.2016 at 11:17, Michael Mraka wrote:
> I'd like to hear your unbiased opinion, which is why I include
> neither my preferences nor current rpm behavior for now.
>
> An example to think about - have a package with following requires installed
>   richdep.spec:
> Requires: A
> Requires: B
> Requires: (C and D)
> Requires: (E or F)
> Requires: (G if H else I)
>
> Which of the following queries should include 'richdep' in the output?
> rpm -q --whatrequires A
> rpm -q --whatrequires B
>
> rpm -q --whatrequires C
> rpm -q --whatrequires D
> rpm -q --whatrequires '(C and D)'
> 
> rpm -q --whatrequires E
> rpm -q --whatrequires F
> rpm -q --whatrequires '(E or F)'
> rpm -q --whatrequires G
> rpm -q --whatrequires '(G if H)'
> rpm -q --whatrequires '(G if H else I)'

The current implementation returns packages that *potentially* break
if the package is deinstalled. I.e. all of

rpm -q --whatrequires A
rpm -q --whatrequires B
rpm -q --whatrequires C
rpm -q --whatrequires D
rpm -q --whatrequires E
rpm -q --whatrequires F
rpm -q --whatrequires G

return "richdep". I think that's the correct behaviour, but I'm biased
as I implemented it ;)

(I think that "--whatrequires '(G if H else I)'" also gives you an answer,
but it does an exact string match.)

Cheers,
  Michael.



Re: [Rpm-ecosystem] Using rpm db to track unneeded packages

2016-04-01 Thread Michael Schroeder
On Fri, Apr 01, 2016 at 02:05:46PM +0200, Tomas Chvatal wrote:
> I was wondering if it would be possible to extend rpmdb to contain
> information about how package was pulled in the dependency graph.
> 
> At this point we at openSUSE have some sort of solver trying to
> magically do it in zypper and you guys at Fedora have it in dnf.

libzypp stores this information in /var/lib/zypp/AutoInstalled,
yum stores it in /var/lib/yum/yumdb, dnf stores it in
/var/lib/dnf/yumdb. (I guess the next version of dnf that is based
on libhif will store it in /var/lib/yum/yumdb again, but we'll see...)

> I would more like to see it tracked in the rpm because that way we
> could properly have all packages in the db, including the ones users
> installed via rpm commands and could see if they are
>  a) directly requested by user
>  b) just dependency of something and thus eligible for removal

Most implementations (including libzypp) assume that packages that
are not in the extra database are installed via rpm commands.

> I guess dnf/zypper would just flag them during install as True/False
> depending if they are direct request or dependency and for rpm we would
> always flag them as True for the solver as requested by user.
> 
> Would something like this make sense?

Dunno. Check https://bugzilla.redhat.com/show_bug.cgi?id=1167239

You can either store this information directly in the rpm db, or
you can have a little extra db like currently done with libzypp/yum/dnf.

Having an extra db makes it easier to change the autoinstalled status
of a package later on (which is something that needs to be done from
time to time). Having it stored in the rpm header makes changing
the info more expensive, but it makes sure that the data is always
in sync.

Independent of the database implementation I would very much like
to have a common interface in librpm, so that different software
management stacks all use the same mechanism.

Cheers,
  Michael.



Re: [Rpm-ecosystem] Requires based on destination hardware

2015-12-18 Thread Michael Schroeder
On Fri, Dec 18, 2015 at 01:33:14PM +0100, Florian Festi wrote:
> RPM upstream doesn't currently support hardware dependencies. IIRC SuSE
> does something like this in zypper/libzypp. They add all pci ids (and
> probably USB and maybe other sources) as Provides: to the transaction.
> They also have an extension to match for Providename patterns. This
> allows drivers to Supplement: a range of hardware.
> 
> I couldn't find any docs on a quick web search. No idea if it is still
> being used.

Yes, those are still in use. The supplements are of the form:

Supplements: modalias(<pattern>)

or

Supplements: modalias(<package>:<pattern>)

(The version with the package is no longer needed when we support
"and" with the new rich deps.)

An example is:

modalias(xorg-x11-server:pci:v10DEd*sv*sd*bc03sc*i*)  

which matches the nouveau modalias on my system.

The mechanism is implemented with libsolv's namespace callback.
libzypp also supports locale() and filesystem() supplements.
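
Roughly, evaluating such a supplement means glob-matching the pattern
against the modalias strings of the hardware that is present. A Python
sketch under loud assumptions: the function name is invented, the package
part is taken as already split off from the pattern, and the system alias
below is a shortened, made-up string shaped to fit the pattern shown above.
This is not the libsolv namespace callback interface.

```python
from fnmatch import fnmatchcase


def modalias_supplement_applies(pattern, system_aliases, package=None, installed=()):
    """True if some system modalias matches pattern (and package is installed).

    pattern is a shell-style glob. package is the optional package part of
    the two-part form; when given, it must be in installed.
    """
    if package is not None and package not in installed:
        return False
    return any(fnmatchcase(alias, pattern) for alias in system_aliases)


# Shortened, made-up modalias string; real ones come from the kernel.
system_aliases = ["pci:v10DEd0DF8sv1458sd353Cbc03sc00i00"]
pattern = "pci:v10DEd*sv*sd*bc03sc*i*"
print(modalias_supplement_applies(pattern, system_aliases,
                                  package="xorg-x11-server",
                                  installed={"xorg-x11-server"}))
```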

Cheers,
  Michael.



Re: [Rpm-ecosystem] Rich deps syntax finalization

2015-09-01 Thread Michael Schroeder
On Tue, Sep 01, 2015 at 08:54:03AM +0200, Florian Festi wrote:
> Libsolv - the currently only dependency solver with rich deps support -
> does not order the "or" terms but instead tries to pick the "best" of
> all packages - no matter of the order within the dependency.
> 
> While we have discussed adding this there are several reasons not to:
> 
>  * We have Suggests and Enhances which can be used to state package
> preferences
>  * The semantic is not that clear if you have nested expressions
>  * Especially in combination with Suggests and Enhances
>  * The implementation may be tricky as the rules get normalized to
> conjunctive normal form before solving

Note that Debian does use the "or" order to specify preferences,
so when at some point far far in the future Debian uses libsolv as
their dependency solver, I'll have to implement this feature...
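
For contrast, order-as-preference boils down to "first alternative that can
be satisfied wins". A toy Python sketch, with a made-up function name and
package set; a real solver has to weigh this against every other rule.

```python
def pick_first_available(alternatives, available):
    """Return the first alternative that is available, in written order."""
    for name in alternatives:
        if name in available:
            return name
    return None


available = {"httpd", "nginx"}
print(pick_first_available(["nginx", "httpd"], available))
print(pick_first_available(["httpd", "nginx"], available))
```

Swapping the two alternatives changes the result here, which is exactly what
an order-independent "pick the best" policy avoids.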

Cheers,
  Michael.
