Re: [gentoo-portage-dev] Re: [RFC] gpkg format proposal v2

2018-11-13 Thread Michał Górny
On Tue, 2018-11-13 at 00:45 +0100, Ulrich Mueller wrote:
> > > > > > On Mon, 12 Nov 2018, Michał Górny wrote:
> > Once tar is used for inner archive format, it is also a natural choice
> > for the outer format.  If you believe we should use another format, that
> > is, introduce a second distinct archive format and depend on a second
> > tool, you need to have a good justification for it.
> 
> Right, that's a better reason. :)
> 
> > So yes, ar is an option, as well as cpio.  In both cases the format is
> > simpler (yet obscure), and the files are smaller.  But does that justify
> > using a second tool that serves the same purpose as tar, given that tar
> > works and we need to use it anyway?  Even if we skip the fact that ar is
> > bundled as part of binutils rather than as a stand-alone archiver, we're
> > introducing the unnecessary complexity of learning a second tool.
> > And both ar(1) and cpio(1) have a weird CLI, compared to tar(1).
> 
> cpio is not feasible because of file size limitations (4 GiB IIRC).
> 

FWICS, ar has a limit of 10 decimal digits, so around 9.3 GiB.
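
A quick check of that figure, as a sketch only: it assumes the member size
is stored as exactly 10 ASCII decimal digits, which is the premise stated
above, and the constant name is made up for illustration.

```python
# Assumption: the ar member size field holds 10 ASCII decimal digits.
AR_SIZE_DIGITS = 10

# Largest value representable in that field.
max_bytes = 10 ** AR_SIZE_DIGITS - 1

# Convert to GiB (2**30 bytes).
max_gib = max_bytes / 2 ** 30

print(f"{max_bytes} bytes is about {max_gib:.2f} GiB")
```

The result lands just above 9.3 GiB, consistent with the estimate above.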

-- 
Best regards,
Michał Górny




Re: [gentoo-portage-dev] Re: [RFC] gpkg format proposal v2

2018-11-12 Thread Ulrich Mueller
> On Mon, 12 Nov 2018, Michał Górny wrote:

> Once tar is used for inner archive format, it is also a natural choice
> for the outer format.  If you believe we should use another format, that
> is, introduce a second distinct archive format and depend on a second
> tool, you need to have a good justification for it.

Right, that's a better reason. :)

> So yes, ar is an option, as well as cpio.  In both cases the format is
> simpler (yet obscure), and the files are smaller.  But does that justify
> using a second tool that serves the same purpose as tar, given that tar
> works and we need to use it anyway?  Even if we skip the fact that ar is
> bundled as part of binutils rather than as a stand-alone archiver, we're
> introducing the unnecessary complexity of learning a second tool.
> And both ar(1) and cpio(1) have a weird CLI, compared to tar(1).

cpio is not feasible because of file size limitations (4 GiB IIRC).

Ulrich




Re: [gentoo-portage-dev] Re: [RFC] gpkg format proposal v2

2018-11-12 Thread Michał Górny
On Mon, 2018-11-12 at 21:23 +0100, Ulrich Mueller wrote:
> > > > > > On Mon, 12 Nov 2018, Michał Górny wrote:
> > > Also, what would be wrong with ar? It's a standard POSIX tool, and
> > > should be available everywhere.
> > The original post says what's wrong with ar.  Please be more specific
> > if you disagree with it.
> 
> AFAICS, the arguments are that ar would be obscure, and that the LSB
> considers it deprecated. I don't find either of them convincing.
> Since when do we care about the LSB?
> 

Do you have any convincing arguments for using ar?

I think it's quite obvious that tar is the only sane choice for
the inner archive format since we need to preserve permissions,
ownership etc.  ar can't do it.
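
To illustrate the tar side of this, here is a minimal sketch using Python's
standard tarfile module.  The member name, mode and ownership values are
hypothetical; it only shows that the tar header round-trips them.

```python
import io
import tarfile

payload = b"#!/bin/sh\necho demo\n"

# Hypothetical member with setuid permissions and root ownership.
info = tarfile.TarInfo("usr/bin/demo")
info.size = len(payload)
info.mode = 0o4755
info.uid = 0
info.gid = 0
info.uname = "root"
info.gname = "root"

# Write the member into an in-memory, uncompressed tar archive.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as archive:
    archive.addfile(info, io.BytesIO(payload))

# Read it back and inspect the metadata stored in the header.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:") as archive:
    restored = archive.getmember("usr/bin/demo")

print(oct(restored.mode), restored.uid, restored.gid,
      restored.uname, restored.gname)
```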

Once tar is used for the inner archive format, it is also a natural choice
for the outer format.  If you believe we should use another format, that
is, introduce a second distinct archive format and depend on a second
tool, you need to have a good justification for it.

So yes, ar is an option, as well as cpio.  In both cases the format is
simpler (yet obscure), and the files are smaller.  But does that justify
using a second tool that serves the same purpose as tar, given that tar
works and we need to use it anyway?  Even if we skip the fact that ar is
bundled as part of binutils rather than as a stand-alone archiver, we're
introducing the unnecessary complexity of learning a second tool.
And both ar(1) and cpio(1) have a weird CLI, compared to tar(1).

Plus, ar apparently doesn't support directories, so we end up adding
extra complexity to get it unpacked sanely.

For the record, I did a little experiment and here are the results:

-rw-r--r-- 1 mgorny  mgorny  112928836 11-12 22:13 wine-any-3.20-1.gpkg.ar
-rw-r--r-- 1 mgorny  mgorny  112929280 11-12 22:21 wine-any-3.20-1.gpkg.cpio
-rw-r--r-- 1 mgorny  mgorny  112936960 11-12 22:11 wine-any-3.20-1.gpkg.tar

So yes, we are saving around 8 KiB... out of 108 MiB.  Of course,
the savings may become relevant in case of tiny archives but do we
really need to be concerned about that?
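
The arithmetic behind that estimate can be reproduced with a short sketch.
The sizes are copied from the listing above; the variable names are made up
for illustration.

```python
# File sizes in bytes, taken from the ls output above.
sizes = {
    "ar": 112928836,
    "cpio": 112929280,
    "tar": 112936960,
}

tar_size = sizes["tar"]
savings = {}
for fmt in ("ar", "cpio"):
    # Bytes saved by using this format instead of tar for the outer archive.
    saved = tar_size - sizes[fmt]
    savings[fmt] = saved
    print(f"{fmt}: {saved} bytes saved "
          f"({saved / 1024:.1f} KiB, {100 * saved / tar_size:.4f}% of the tar)")

print(f"tar archive: {tar_size / 2 ** 20:.1f} MiB")
```

Both differences come out below 8 KiB, that is, under 0.01% of the archive.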

The whole point of the proposal is to make the format simpler, easier to
introspect and to modify.  I believe limiting the number of formats
in use certainly serves that purpose while starting to depend on obscure
tools in order to save 8 KiB is a case of premature optimization.

-- 
Best regards,
Michał Górny




Re: [gentoo-portage-dev] Re: [RFC] gpkg format proposal v2

2018-11-12 Thread Alec Warner
On Mon, Nov 12, 2018 at 3:24 PM Ulrich Mueller wrote:

> > On Mon, 12 Nov 2018, Michał Górny wrote:
>
> >> Also, what would be wrong with ar? It's a standard POSIX tool, and
> >> should be available everywhere.
>
> > The original post says what's wrong with ar.  Please be more specific
> > if you disagree with it.
>
> AFAICS, the arguments are that ar would be obscure, and that the LSB
> considers it deprecated. I don't find either of them convincing.
> Since when do we care about the LSB?
>

I assert that it doesn't matter which tool we pick, so we have arbitrarily
chosen tar because we like it.

If you have a basis for preferring ar over tar, I'd love to hear it. I only
brought it up because I know Debian uses it.

-A


>
> Ulrich
>


Re: [gentoo-portage-dev] Re: [RFC] gpkg format proposal v2

2018-11-12 Thread Ulrich Mueller
> On Mon, 12 Nov 2018, Michał Górny wrote:

>> Also, what would be wrong with ar? It's a standard POSIX tool, and
>> should be available everywhere.

> The original post says what's wrong with ar.  Please be more specific
> if you disagree with it.

AFAICS, the arguments are that ar would be obscure, and that the LSB
considers it deprecated. I don't find either of them convincing.
Since when do we care about the LSB?

Ulrich




Re: [gentoo-portage-dev] Re: [RFC] gpkg format proposal v2

2018-11-12 Thread Michał Górny
On Mon, 2018-11-12 at 18:33 +0100, Ulrich Mueller wrote:
> > > > > > On Mon, 12 Nov 2018, Michał Górny wrote:
> > On Mon, 2018-11-12 at 17:51 +0100, Fabian Groffen wrote:
> > > I'm wondering here, how much sense does it make to compress 2., 3.
> > > and/or 4. if you compress the whole gpkg?  I have the impression
> > > compression on compression isn't beneficial here.  Shouldn't just
> > > compressing of the gpkg tar be sufficient?
> > Please read the spec again.  It explicitly says it's not compressed.
> 
> Isn't that the wrong way around? The tar format contains a lot of
> padding, so using uncompressed tar for the outer archive would be
> somewhat wasteful. Why not leave the inner tar files uncompressed, but
> compress the whole binpkg instead?

Uncompressed tar is mostly suitable for random access.  Compressed tar
isn't suitable for random access at all.

With uncompressed tar, it's trivial to access one of the members.  With
compressed tar, you always end up decompressing everything.

With uncompressed tar, it's easy to rewrite the metadata (read: apply
package updates) without updating the rest.  With compressed tar, you'd
have to recompress all the huge packages in order to apply updates.
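
A minimal sketch of the random access point, using Python's standard tarfile
module.  The member names and payloads are invented for the demo and are not
the layout from the spec; the helper build_demo_archive is hypothetical.

```python
import io
import tarfile


def build_demo_archive() -> bytes:
    """Create a small uncompressed outer tar with two made-up members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as outer:
        members = [
            ("demo/metadata.tar", b"M" * 100),
            ("demo/image.tar", b"I" * 5000),
        ]
        for name, payload in members:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            outer.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


archive = build_demo_archive()

# "r:" opens the archive strictly as uncompressed tar.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as outer:
    member = outer.getmember("demo/image.tar")
    # Regular way: let tarfile seek to the member and read it.
    via_api = outer.extractfile(member).read()

# Because nothing is compressed, the payload sits verbatim at a known
# byte offset and can be sliced out without touching the other member.
start = member.offset_data
direct = archive[start:start + member.size]

print(direct == via_api)
```

With a compressed outer archive no such offset into the file exists, so
reaching the second member means decompressing everything before it.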

> Also, what would be wrong with ar? It's a standard POSIX tool, and
> should be available everywhere.
> 

The original post says what's wrong with ar.  Please be more specific if
you disagree with it.

-- 
Best regards,
Michał Górny




[gentoo-portage-dev] Re: [RFC] gpkg format proposal v2

2018-11-12 Thread Ulrich Mueller
> On Mon, 12 Nov 2018, Michał Górny wrote:

> On Mon, 2018-11-12 at 17:51 +0100, Fabian Groffen wrote:
>> I'm wondering here, how much sense does it make to compress 2., 3.
>> and/or 4. if you compress the whole gpkg?  I have the impression
>> compression on compression isn't beneficial here.  Shouldn't just
>> compressing of the gpkg tar be sufficient?

> Please read the spec again.  It explicitly says it's not compressed.

Isn't that the wrong way around? The tar format contains a lot of
padding, so using uncompressed tar for the outer archive would be
somewhat wasteful. Why not leave the inner tar files uncompressed, but
compress the whole binpkg instead?

Also, what would be wrong with ar? It's a standard POSIX tool, and
should be available everywhere.

Ulrich

