Hi!

On Sat, 2020-04-11 at 14:44:32 +0200, Sebastian Andrzej Siewior wrote:
> Package: dpkg
> Version: 1.19.7
> Severity: wishlist

> I've been thinking about parallel decompression for dpkg/xz. Is there
> any interest in doing this? I hacked parallel-unxz [0] in the meantime
> to see what is missing from the API point of view (querying block
> offsets is missing).

I'm interested, but mainly if this is provided transparently by
liblzma, in a similar way to how it is currently provided for
compression.
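
For comparison, this is roughly what the encoder side already looks
like through liblzma's threaded interface (a trimmed-down sketch, not
the actual dpkg code; init_threaded_xz_encoder() is just a made-up
name). Ideally decompression would get something equally transparent:

  #include <lzma.h>

  /* Threaded xz compression as liblzma provides it today; error
   * handling and option plumbing omitted. */
  static lzma_ret
  init_threaded_xz_encoder(lzma_stream *strm, uint32_t nthreads)
  {
      lzma_mt mt = {
          .threads = nthreads,       /* number of worker threads */
          .block_size = 0,           /* 0 = let liblzma pick a default */
          .preset = 6,               /* same meaning as `xz -6' */
          .check = LZMA_CHECK_CRC64,
      };

      return lzma_stream_encoder_mt(strm, &mt);
  }

After that, the normal lzma_code()/LZMA_FINISH loop applies, so the
caller does not have to know anything about the threading.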

> My idea of accomplishing this is roughly the following:
> During archive creation the output of tar is also analysed (like via
> libarchive) in order to gain the start position of the files within the
> uncompressed archive (which is something pixz does).

When I checked the implementation in pixz, it did not look very
memory-efficient, AFAIR.
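
For reference, gathering those start positions with libarchive could
look roughly like the following (just a sketch; dump_member_offsets()
is a made-up name, it reads from a file to keep things short, and how
this would actually hook into dpkg-deb is an open question):

  #include <stdint.h>
  #include <stdio.h>
  #include <archive.h>
  #include <archive_entry.h>

  /* Walk an (uncompressed) tar archive and print where each member's
   * header starts; this is the kind of offset table the decompression
   * side would need. */
  static void
  dump_member_offsets(const char *tar_path)
  {
      struct archive *a = archive_read_new();
      struct archive_entry *entry;

      archive_read_support_format_tar(a);
      archive_read_open_filename(a, tar_path, 10240);

      while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
          printf("%jd %s\n",
                 (intmax_t)archive_read_header_position(a),
                 archive_entry_pathname(entry));
          archive_read_data_skip(a);
      }

      archive_read_free(a);
  }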

> Once we have those, we can reduce the list to the files which spread
> across a block within the stream and those which are the first files
> within a block.
> Then on the decompression side, each thread could focus on an
> independent block. It starts decompressing the block and throws away
> data until it reaches the start of a file. Then it continues to
> decompress as many files as it can until it reaches the end of the block
> and finishes the file crossing the block boundary.
> 
> So is this something that sounds worth doing or does it sound too
> complex / hacky in general?
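
Just to check I understand the per-block scheme, I guess each worker
would end up doing something along these lines with liblzma's block
APIs (a very rough sketch: decode_block_from() and extract_data() are
made-up names, the block's compressed offset and the amount of data to
skip are assumed to come from the stream index, and the CRC64 check is
simply assumed rather than read from the stream header):

  #include <lzma.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Hypothetical hand-off to the tar extraction code. */
  extern void extract_data(const uint8_t *buf, size_t len);

  /* Decode one block starting at a known compressed offset, discard
   * output until the first file owned by this worker begins, and pass
   * the rest on.  Most error handling omitted. */
  static lzma_ret
  decode_block_from(int fd, off_t block_offset, uint64_t skip_uncomp)
  {
      uint8_t header[LZMA_BLOCK_HEADER_SIZE_MAX];
      lzma_filter filters[LZMA_FILTERS_MAX + 1];
      lzma_block block = {
          .version = 1,
          .check = LZMA_CHECK_CRC64,
          .filters = filters,
      };
      lzma_stream strm = LZMA_STREAM_INIT;
      uint8_t inbuf[BUFSIZ], outbuf[BUFSIZ];
      uint64_t discarded = 0;
      off_t pos;
      lzma_ret ret;

      /* The first byte of the block encodes the header size. */
      pread(fd, header, 1, block_offset);
      block.header_size = lzma_block_header_size_decode(header[0]);
      pread(fd, header + 1, block.header_size - 1, block_offset + 1);

      ret = lzma_block_header_decode(&block, NULL, header);
      if (ret != LZMA_OK)
          return ret;

      ret = lzma_block_decoder(&strm, &block);
      pos = block_offset + block.header_size;

      while (ret == LZMA_OK) {
          if (strm.avail_in == 0) {
              ssize_t n = pread(fd, inbuf, sizeof(inbuf), pos);
              if (n <= 0)
                  break;
              pos += n;
              strm.next_in = inbuf;
              strm.avail_in = (size_t)n;
          }

          strm.next_out = outbuf;
          strm.avail_out = sizeof(outbuf);
          ret = lzma_code(&strm, LZMA_RUN);

          size_t produced = sizeof(outbuf) - strm.avail_out;
          size_t drop = 0;
          if (discarded < skip_uncomp) {
              drop = produced < skip_uncomp - discarded
                     ? produced : (size_t)(skip_uncomp - discarded);
              discarded += drop;
          }
          if (produced > drop)
              extract_data(outbuf + drop, produced - drop);
      }

      lzma_end(&strm);
      return ret == LZMA_STREAM_END ? LZMA_OK : ret;
  }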

The biggest issue is that the dpkg codebase is very much not
thread-safe, so implementing this would imply either duplicating much
of the existing infrastructure, such as the error handling code, or
bolting this on top, which seems rather unappealing.

I've seen the discussion on the upstream list, and I'd really like to
get this supported there, also because it would benefit many other
projects. :)

Thanks,
Guillem
