Hi

On Wed, Jan 22, 2020 at 2:03 PM Allan McRae <al...@archlinux.org> wrote:
>
> On 23/1/20 2:03 am, Anatol Pomozov wrote:
> > Hello
> >
> > On Wed, Jan 22, 2020 at 2:23 AM Allan McRae <al...@archlinux.org> wrote:
> >>
> >> On 22/1/20 6:54 pm, Anatol Pomozov wrote:
> >>> The first experiment is to parse db tarfile using the script and then
> >>> write it back to a file:
> >>> uncompressed size is 17757184 that is equal to original sample
> >>> 'zstd -19' compressed size is 4366994 that is 1.0084540990896713
> >>> times better than original sample
> >>>
> >>> Tar *entries* content is identical to the original file. Uncompressed
> >>> size is exactly the same. Compressed (zstd -19) size is 0.8% better.
> >>> It comes from the fact that my script does not set entries user/group
> >>> value and neither sets tar entries modification time. I am not sure if
> >>> this information is actually used by pacman. Modification time
> >>> contains a lot of entropy that compressor does not like.
> >>
> >> tl;dr
> >>
> >> "original" 4366994
> >> no md5 4188019
> >> no pgp 1160912
> >> np md5+pgp 1021667
> >>
> >>
> >> But do any of these numbers stand if you keep the tar file?
> >
> > I do not fully understand your question here. plainXXX+uncomressed is
> > a TAR file that matches current db format.
> >
>
> Oops... Did not look down far enough your supplied files. I downloaded
> db.original from your link, which is not original, and thought your
> numbers were based off that.

Yeah, db.original at [1] is the uncompressed community.db without any
modifications. The script uses it as a base for comparisons.

It was not clear to me exactly which compression parameters are currently
used for the *.db files, so I chose 'zstd -19'. Let me know if anyone wants
to see other compression algorithms/parameters; it is easy to add another
experiment to this set.

I also pushed a few updates to the 'packed' format implementation, and
the script is now available on GitHub [2].

[1] https://pkgbuild.com/~anatolik/db-size-experiments/data/
[2] https://github.com/anatol/pacmandb-size-analysis
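
For anyone who wants to try the first experiment quoted above (rewrite the
tar without owner and mtime fields, then compare compressed sizes) without
the full script, here is a minimal Python sketch. It is not the code from
[2]: the names strip_metadata and compressed_size are made up for
illustration, and it uses the standard-library lzma module as a stand-in
for 'zstd -19', so absolute numbers will differ from the ones in this
thread.

```python
import io
import lzma
import tarfile


def strip_metadata(tar_bytes: bytes) -> bytes:
    """Rewrite an uncompressed tar with owner and mtime fields zeroed.

    Hypothetical helper for illustration; entry names and contents are
    copied through unchanged.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as src, \
            tarfile.open(fileobj=out, mode="w",
                         format=tarfile.GNU_FORMAT) as dst:
        for member in src:
            # Drop the fields that the quoted experiment leaves unset.
            member.uid = member.gid = 0
            member.uname = member.gname = ""
            member.mtime = 0
            # Only regular files carry a payload to copy.
            payload = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, payload)
    return out.getvalue()


def compressed_size(data: bytes) -> int:
    """Size after compression. Assumption: lzma stands in for 'zstd -19'."""
    return len(lzma.compress(data, preset=9))
```

To run it on real data, read db.original from [1] into bytes and compare
compressed_size(data) with compressed_size(strip_metadata(data)).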