Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On 01/27/14 08:35, Steev Klimaszewski wrote:
> On Sun, 2014-01-26 at 21:00 +0100, Michał Górny wrote:
>> Hi again.
>>
>> If someone is interested in the results of my tests and benchmarks, I've uploaded the initial version of my article on the topic in our dev-space.
>>
>> http://dev.gentoo.org/~mgorny/tmp/squashfs-deltas.pdf
>>
>> I am terribly busy with the uni right now so it will take some time before I continue working on it. I will try to provide a final specification for the first attempt at the idea and ask infra if they are ready to sacrifice the hardware for it.
>>
>> Further possible improvements:
>>
>> 1. switch to LZ4 (stronger compression, even faster) -- will require a newer kernel (3.14?),

It should be in the kernel 3.11 "windows for workgroups" release (check anyway).

> While stronger compression and higher speed are definitely nice, having portage on squashfs is really nice on ARM devices. However, the ARM devices that have a decently running kernel newer than 3.8 are few and far between, so I'd like to ask that this be held off as long as possible. I know these are just possible improvements, but doing so would definitely alienate a really good place where this would shine.

Yes, there are good reasons for amd64 as well.

>> 2. dedicated SquashFS delta tool -- I'm working on it but the format seems to be poorly documented so it will take some time :).
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On 2014-01-27, at 14:53:34, viv...@gmail.com wrote:
> On 01/27/14 08:35, Steev Klimaszewski wrote:
>> On Sun, 2014-01-26 at 21:00 +0100, Michał Górny wrote:
>>> Hi again.
>>>
>>> If someone is interested in the results of my tests and benchmarks, I've uploaded the initial version of my article on the topic in our dev-space.
>>>
>>> http://dev.gentoo.org/~mgorny/tmp/squashfs-deltas.pdf
>>>
>>> I am terribly busy with the uni right now so it will take some time before I continue working on it. I will try to provide a final specification for the first attempt at the idea and ask infra if they are ready to sacrifice the hardware for it.
>>>
>>> Further possible improvements:
>>>
>>> 1. switch to LZ4 (stronger compression, even faster) -- will require a newer kernel (3.14?),
>
> it should be in kernel 3.11 windows for workgroups release (check anyway)

That's just the LZ4 library code. We additionally need the SquashFS support code. It has been introduced in squashfs-tools lately (4.2_p20140119 has it, though disabled by ebuild) and I don't see it in the kernel's master branch yet.

--
Best regards,
Michał Górny
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On 2014-01-27, at 01:35:52, Steev Klimaszewski st...@gentoo.org wrote:
> On Sun, 2014-01-26 at 21:00 +0100, Michał Górny wrote:
>> Hi again.
>>
>> If someone is interested in the results of my tests and benchmarks, I've uploaded the initial version of my article on the topic in our dev-space.
>>
>> http://dev.gentoo.org/~mgorny/tmp/squashfs-deltas.pdf
>>
>> I am terribly busy with the uni right now so it will take some time before I continue working on it. I will try to provide a final specification for the first attempt at the idea and ask infra if they are ready to sacrifice the hardware for it.
>>
>> Further possible improvements:
>>
>> 1. switch to LZ4 (stronger compression, even faster) -- will require a newer kernel (3.14?),
>
> While stronger compression and higher speed are definitely nice, having portage on squashfs is really nice on ARM devices. However, the ARM devices that have a decently running kernel newer than 3.8 are few and far between, so I'd like to ask that this be held off as long as possible. I know these are just possible improvements, but doing so would definitely alienate a really good place where this would shine.

I think that if we decide to do the switch, we will host multiple formats at least for some time. It won't be really helpful to prevent people from upgrading their kernel due to a new repo archive format :).

--
Best regards,
Michał Górny
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On Mon, 27 Jan 2014 16:52:09 +0100 Michał Górny mgo...@gentoo.org wrote:
> That's just the LZ4 library code. We additionally need the SquashFS support code. It has been introduced in squashfs-tools lately (4.2_p20140119 has it, though disabled by ebuild) and I don't see it in the kernel's master branch yet.

I'll be glad to add that squashfs-tools support for you shortly.

Regards,
jer
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
Hi again.

If someone is interested in the results of my tests and benchmarks, I've uploaded the initial version of my article on the topic in our dev-space.

http://dev.gentoo.org/~mgorny/tmp/squashfs-deltas.pdf

I am terribly busy with the uni right now so it will take some time before I continue working on it. I will try to provide a final specification for the first attempt at the idea and ask infra if they are ready to sacrifice the hardware for it.

Further possible improvements:

1. switch to LZ4 (stronger compression, even faster) -- will require a newer kernel (3.14?),

2. dedicated SquashFS delta tool -- I'm working on it but the format seems to be poorly documented so it will take some time :).

--
Best regards,
Michał Górny
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On Sun, 2014-01-26 at 21:00 +0100, Michał Górny wrote:
> Hi again.
>
> If someone is interested in the results of my tests and benchmarks, I've uploaded the initial version of my article on the topic in our dev-space.
>
> http://dev.gentoo.org/~mgorny/tmp/squashfs-deltas.pdf
>
> I am terribly busy with the uni right now so it will take some time before I continue working on it. I will try to provide a final specification for the first attempt at the idea and ask infra if they are ready to sacrifice the hardware for it.
>
> Further possible improvements:
>
> 1. switch to LZ4 (stronger compression, even faster) -- will require a newer kernel (3.14?),

While stronger compression and higher speed are definitely nice, having portage on squashfs is really nice on ARM devices. However, the ARM devices that have a decently running kernel newer than 3.8 are few and far between, so I'd like to ask that this be held off as long as possible. I know these are just possible improvements, but doing so would definitely alienate a really good place where this would shine.

> 2. dedicated SquashFS delta tool -- I'm working on it but the format seems to be poorly documented so it will take some time :).
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On Fri, 17 Jan 2014 19:53:44 +0100 Michał Górny mgo...@gentoo.org wrote:
> On 2014-01-17, at 10:18:58, Alec Warner anta...@gentoo.org wrote:
>> On Fri, Jan 17, 2014 at 8:27 AM, Michał Górny mgo...@gentoo.org wrote:
>>> Hello, all.
>>>
>>> I've been using squashfs to hold my Gentoo repositories on all of my systems for some time. As you probably know, this allows me to save space while keeping portage fast. However, it makes updating the tree quite burdensome and time-consuming.
>
> Yes, full metadata with md5-cache. That's the same thing you get via 'emerge --sync'. And that's why the deltas are so big -- I recall three big cache updates this week.

I would absolutely use this on my machines.

--
Christopher Head
[gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
Hello, all.

I've been using squashfs to hold my Gentoo repositories on all of my systems for some time. As you probably know, this allows me to save space while keeping portage fast. However, it makes updating the tree quite burdensome and time-consuming.

We're already hosting daily gx86 tarballs on our mirrors, along with deltas made using diffball. Those can be used with Zac's emerge-delta-webrsync to get daily updates done with minimal network overhead. Sadly, it makes the whole process even more time-consuming :).

Therefore, I'd like to suggest an alternative solution that could help Gentoo users who use squashfs for gx86 and would like to get daily updates quickly and easily.

The idea is to host -- along with the tarballs -- daily squashfs images of gx86 in a chosen format. Additionally, the images would come with deltas made using xdelta3 or a similar tool. Those deltas -- with a slight download overhead -- would allow very fast updates of the squashfs.

Now some numbers. I did some tests 'converting' recent gx86 daily tarballs to squashfs. I've used squashfs 4.2 with LZO compression since it's quite good and very fast.

  96M portage-20140108.sqfs
  96M portage-20140109.sqfs
  96M portage-20140110.sqfs
  96M portage-20140111.sqfs
  96M portage-20140112.sqfs
  96M portage-20140113.sqfs
  97M portage-20140114.sqfs
  97M portage-20140115.sqfs

For deltas, I've used xdelta3 with max compression (-9) and djw secondary compression (it gave files ~0.1M smaller than fgk and ~0.5M smaller than with no secondary compression).

  4,9M portage-20140108.sqfs-portage-20140109.sqfs.vcdiff.djw
  6,3M portage-20140109.sqfs-portage-20140110.sqfs.vcdiff.djw
  5,6M portage-20140110.sqfs-portage-20140111.sqfs.vcdiff.djw
  8,9M portage-20140111.sqfs-portage-20140112.sqfs.vcdiff.djw
  6,3M portage-20140112.sqfs-portage-20140113.sqfs.vcdiff.djw
  7,8M portage-20140113.sqfs-portage-20140114.sqfs.vcdiff.djw
  8,5M portage-20140114.sqfs-portage-20140115.sqfs.vcdiff.djw

As you can see, the deltas are quite large compared to the actual changes. However, that was to be expected since we're diffing a compressed filesystem. What's important, however, is that applying a delta takes ~2.5 seconds on my 2 GHz Athlon64. So, even with the extra download time, the update is much faster than recreating the squashfs. And unlike some types of unionfs, it doesn't come with extra runtime slowdown.

What do you think?

--
Best regards,
Michał Górny
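The delta workflow described in the post above could look roughly like the following sketch. The post names only the tool and its settings (xdelta3, -9, djw secondary compression), not the exact command line, so the invocation built here is an assumption based on xdelta3's usual options (-e, -d, -s, -S). The function names and the output file name portage-new.sqfs are hypothetical. The code only builds and prints the commands; it does not run them.

```python
import shlex


def delta_name(old_image, new_image):
    """File name of a delta, following the naming pattern used in the post."""
    return f"{old_image}-{new_image}.vcdiff.djw"


def encode_cmd(old_image, new_image, delta):
    """xdelta3 command that creates a delta from old_image to new_image.

    -e: encode, -9: maximum compression level,
    -S djw: djw secondary compression, -s: source (old) file.
    Assumed invocation; the post does not show the exact command.
    """
    return ["xdelta3", "-e", "-9", "-S", "djw", "-s", old_image, new_image, delta]


def decode_cmd(old_image, delta, new_image):
    """xdelta3 command that rebuilds new_image from old_image plus a delta.

    -d: decode, -s: source (old) file.
    """
    return ["xdelta3", "-d", "-s", old_image, delta, new_image]


if __name__ == "__main__":
    old, new = "portage-20140108.sqfs", "portage-20140109.sqfs"
    delta = delta_name(old, new)
    # Server side: produce the daily delta.
    print(shlex.join(encode_cmd(old, new, delta)))
    # Client side: apply it to the locally stored image.
    print(shlex.join(decode_cmd(old, delta, "portage-new.sqfs")))
```

On the client, only the decode step would be needed for each day; the encode step would run once on whatever machine generates the images.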
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On 17/01/14 11:27 AM, Michał Górny wrote:
> Hello, all.

[snip]

> The idea is to host -- along with the tarballs -- daily squashfs images of gx86 in a chosen format. Additionally, the images would come with deltas made using xdelta3 or a similar tool. Those deltas -- with a slight download overhead -- would allow very fast updates of the squashfs.

[snip]

> What do you think?

PLEASE DO! This sounds fantastic, and is something I've been considering proposing for some time.
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
Michał Górny wrote:
> What do you think?

Excellent. I think it could quickly become the preferred portage storage format, although a loopback mount is needed.

//Peter
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On 2014-01-17, at 17:51:41, Peter Stuge pe...@stuge.se wrote:
> Michał Górny wrote:
>> What do you think?
>
> Excellent. I think it could quickly become the preferred portage storage format, although a loopback mount is needed.

You can use sys-fs/squashfuse :).

Though honestly I'd prefer it if portage were able to take a squashfs image path in repos.conf and mount it locally at build time. This would have the extra advantage that we wouldn't have to worry about remounting it after sync.

--
Best regards,
Michał Górny
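For readers unfamiliar with squashfuse, the remount-after-sync concern mentioned above can be sketched as follows. This is only an illustration: it assumes the usual squashfuse IMAGE MOUNTPOINT invocation and fusermount -u for unmounting. The paths and function names are hypothetical. The code builds the commands without executing them.

```python
def mount_cmd(image, mountpoint):
    """Command to mount a squashfs image through FUSE, without a loop device."""
    return ["squashfuse", image, mountpoint]


def unmount_cmd(mountpoint):
    """Command to unmount a FUSE filesystem as an unprivileged user."""
    return ["fusermount", "-u", mountpoint]


def remount_after_sync(new_image, mountpoint):
    """Commands needed once a sync has produced a new image.

    The old mount still shows the old image, so it has to be unmounted
    before the new image is mounted at the same place.
    """
    return [unmount_cmd(mountpoint), mount_cmd(new_image, mountpoint)]


if __name__ == "__main__":
    # Hypothetical locations, chosen only for the example.
    for cmd in remount_after_sync("/var/cache/portage-20140109.sqfs", "/usr/portage"):
        print(" ".join(cmd))
```

If portage mounted the image itself at build time, as suggested above, this extra step would not be needed.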
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
Michał Górny wrote:
>>> What do you think?
>>
>> Excellent. I think it could quickly become the preferred portage storage format, although a loopback mount is needed.
>
> You can use sys-fs/squashfuse :).
>
> Though honestly I'd prefer if portage was able to take squashfs image path in repos.conf and mount it locally for build-time. This will have the extra advantage that we wouldn't have to worry about remounting it after sync.

Or read it directly without mounting.

//Peter
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On 01/17/14 17:27, Michał Górny wrote:
> Hello, all.
>
> I've been using squashfs to hold my Gentoo repositories on all of my systems for some time. As you probably know, this allows me to save space while keeping portage fast. However, it makes updating the tree quite burdensome and time-consuming.

Me too, and you have my total support (maybe I've even proposed this to the list before).

[snip]

> As you can see, the deltas are quite large compared to the actual changes. However, we could have expected that since we're diffing a compressed filesystem. What's important, however, is that applying it takes ~2.5 second on my 2 GHz Athlon64.

Have you tried giving the compressed files a fixed order (always the same)? It could give an advantage, though it may be limited to 2^16 files. The option is -sort sort_file.

Thanks for it,
Francesco
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On Fri, Jan 17, 2014 at 8:27 AM, Michał Górny mgo...@gentoo.org wrote:
> Hello, all.

[snip]

> As you can see, the deltas are quite large compared to the actual changes. However, we could have expected that since we're diffing a compressed filesystem. What's important, however, is that applying it takes ~2.5 second on my 2 GHz Athlon64.

It wasn't clear to me: are these trees with metadata included?

-A

> So, even with the extra download time, the update is much faster than recreating the squashfs. And unlike some types of unionfs, it doesn't come with extra runtime slowdown.
>
> What do you think?
>
> --
> Best regards,
> Michał Górny
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On 2014-01-17, at 10:18:58, Alec Warner anta...@gentoo.org wrote:
> On Fri, Jan 17, 2014 at 8:27 AM, Michał Górny mgo...@gentoo.org wrote:
>> Hello, all.

[snip]

>> As you can see, the deltas are quite large compared to the actual changes. However, we could have expected that since we're diffing a compressed filesystem. What's important, however, is that applying it takes ~2.5 second on my 2 GHz Athlon64.
>
> It wasn't clear to me, are these trees with metadata included?

Yes, full metadata with md5-cache. That's the same thing you get via 'emerge --sync'. And that's why the deltas are so big -- I recall three big cache updates this week.

--
Best regards,
Michał Górny
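To put the delta sizes under discussion into proportion, here is a small back-of-the-envelope calculation using only the figures from the original post (seven daily deltas against images of about 96M). The variable names are made up for the example.

```python
# Delta sizes in M, as listed in the original post (decimal commas converted).
delta_sizes = [4.9, 6.3, 5.6, 8.9, 6.3, 7.8, 8.5]

# Size of a full daily image in M (the post lists 96M to 97M).
image_size = 96.0

# Mean daily delta over the sampled week.
average_delta = sum(delta_sizes) / len(delta_sizes)

# Share of a full image that an average daily delta represents.
fraction_of_image = average_delta / image_size

print(f"average delta: {average_delta:.1f}M")
print(f"share of a full image: {fraction_of_image:.1%}")
```

The average works out to roughly 6.9M, or about 7% of a full image, so a daily update downloads well under a tenth of what fetching a fresh image would.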
Re: [gentoo-dev] RFC: Hosting daily gx86 squashfs images and deltas
On 01/17/2014 04:27 PM, Michał Górny wrote:
> Hello, all.

[snip]

> The idea is to host -- along with the tarballs -- daily squashfs images of gx86 in a chosen format. Additionally, the images would come with deltas made using xdelta3 or a similar tool. Those deltas -- with a slight download overhead -- would allow very fast updates of the squashfs.

[snip]

> What do you think?

+1

I like the idea.

--
Regards,
Markos Chandras