On Wed, Jan 22, 2020 at 11:04 AM Anatol Pomozov <anatol.pomo...@gmail.com> wrote:
> Hello
>
> On Wed, Jan 22, 2020 at 2:23 AM Allan McRae <al...@archlinux.org> wrote:
> >
> > On 22/1/20 6:54 pm, Anatol Pomozov wrote:
> > > The first experiment is to parse the db tarfile using the script
> > > and then write it back to a file:
> > >
> > >   uncompressed size is 17757184, equal to the original sample
> > >   'zstd -19' compressed size is 4366994, i.e. 1.0084540990896713
> > >   times better than the original sample
> > >
> > > The tar *entries'* content is identical to the original file. The
> > > uncompressed size is exactly the same. The compressed (zstd -19)
> > > size is 0.8% better. This comes from the fact that my script sets
> > > neither the entries' user/group values nor their modification
> > > times. I am not sure whether this information is actually used by
> > > pacman. Modification times contain a lot of entropy that the
> > > compressor does not like.
> >
> > tl;dr
> >
> > "original"   4366994
> > no md5       4188019
> > no pgp       1160912
> > no md5+pgp   1021667
> >
> > But do any of these numbers stand if you keep the tar file?
>
> I do not fully understand your question here. plainXXX+uncompressed
> is a TAR file that matches the current db format:
>
> original     17757184
> no md5       17536365
> no pgp       14085120
> no md5/pgp   13248000
>
> But compressed size is what really matters for users. Dropping the
> pgp signature from the db file provides the biggest benefit for
> compressed data (3.8 times smaller files).
>
> > Also, I find downloading signature files causes a big pause in
> > processing the downloads. Is that just a slow connection to the
> > world at my end?
>
> *.sig files are small, so bandwidth should not be the problem.
>
> My guess is that the latency to your Arch mirror is too high, and
> setting up twice as many ssl connections gives a noticeable slowdown.
> Check that you are using a local Australian mirror - that will help
> reduce connection setup time. Using HTTP instead of HTTPS might help
> a bit as well.

Point of order: if you only use a single mirror, there should only be
a single connection -- pacman (curl) reuses connections whenever
possible, and only gives up if the remote doesn't support keepalives
(which should be rare) or a socket error occurs.

> But the best solution for your problem is proper parallel download
> support in pacman. In that case connection setup would run in
> parallel, amortizing its latency. It would also require fewer
> HTTP/HTTPS connections, as HTTP/2 supports multiplexing - multiple
> downloads from the same server would share a single connection.

It's more subtle than this. As I mentioned above, there should only be
a single (reused) connection. If the problem is actually latency in
TTFB (time to first byte), then it might be a matter of using a more
geographically local mirror. Parallelization could help mask some
problems here, but it's going to be a LOT of work to change pacman
internals to accommodate this.
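
For what it's worth, a minimal sketch of what "parallel downloads over
a shared connection" could look like, using libcurl's multi interface
via pycurl. The mirror URL and file names are placeholders, and
PIPE_MULTIPLEX needs a reasonably recent libcurl/pycurl:

#!/usr/bin/env python3
# A minimal sketch (not pacman code) of parallel downloads driven by
# libcurl's multi interface. With an HTTP/2-capable mirror, libcurl can
# multiplex all transfers to the same host over a single connection.
import pycurl

MIRROR = 'https://example.mirror/archlinux/core/os/x86_64/'  # placeholder
FILES = ['core.db', 'core.db.sig']                           # placeholders

multi = pycurl.CurlMulti()
# Allow libcurl to multiplex transfers over one HTTP/2 connection.
multi.setopt(pycurl.M_PIPELINING, pycurl.PIPE_MULTIPLEX)

transfers = []
for name in FILES:
    out = open(name, 'wb')
    easy = pycurl.Curl()
    easy.setopt(pycurl.URL, MIRROR + name)
    easy.setopt(pycurl.WRITEDATA, out)
    multi.add_handle(easy)
    transfers.append((easy, out))

# Standard multi-interface driving loop: perform until no handles remain.
remaining = len(transfers)
while remaining:
    while True:
        ret, remaining = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    if remaining:
        multi.select(1.0)

for easy, out in transfers:
    multi.remove_handle(easy)
    easy.close()
    out.close()

Pacman would of course not use Python; the point is only that the
parallelism comes from the multi interface driving several transfers
at once, not from opening more connections.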
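And for completeness, the repacking experiment quoted at the top could
be reproduced with something like the script below. This is my
reconstruction, not Anatol's actual script; it assumes an uncompressed
.db tar as input and the third-party 'zstandard' module:

#!/usr/bin/env python3
# Rough reconstruction (a guess) of the repacking experiment: rewrite a
# pacman db tar with uid/gid/uname/gname/mtime zeroed out, then compare
# 'zstd -19' compressed sizes of the original and repacked tars.
import sys
import tarfile
import zstandard  # pip install zstandard

def repack(src, dst):
    with tarfile.open(src) as tin, tarfile.open(dst, 'w') as tout:
        for entry in tin:
            entry.mtime = 0                # mtimes are pure entropy here
            entry.uid = entry.gid = 0      # user/group, possibly unused
            entry.uname = entry.gname = ''
            if entry.isreg():
                tout.addfile(entry, tin.extractfile(entry))
            else:
                tout.addfile(entry)

def zstd19_size(path):
    with open(path, 'rb') as f:
        return len(zstandard.ZstdCompressor(level=19).compress(f.read()))

if __name__ == '__main__':
    src, dst = sys.argv[1], sys.argv[2]
    repack(src, dst)
    print('zstd -19, original:', zstd19_size(src))
    print('zstd -19, repacked:', zstd19_size(dst))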