Re: GIT for pdiff generation
> > Right now the source contents of unstable has, unpacked, 220MB.
> > (Packed with gzip it's 28MB, while the binary Contents per
> > architecture each have 18MB packed.)
>
> That should not be a problem in any non-joke box. Unless you'll run
> it in a memory-constrained vm or something.

Well. For our archives it is turned on in main and backports. I don't
think main will ever run into trouble there:

             total       used       free     shared    buffers     cached
Mem:      33006584   29241780    3764804          0    2343936   20783680

while backports isn't as big, but still large enough:

             total       used       free     shared    buffers     cached
Mem:       8198084    7352164     845920          0    1063012    5650672

> > Let's add a safety margin: 350MB is a good guess for the largest.
> > A Packages file hardly counts compared to them; unpacked it's just
> > some 34MB.
>
> I.e. something very easy to keep in RAM on a server class or desktop
> class box.

Yes.

> > > Other than that, git loads entire objects into memory to
> > > manipulate them, which AFAIK CAN cause problems in datasets with
> > > very large files (the problem is not usually the size of the
> > > repository, but rather the size of the largest object). You
> > > probably want to test your use case with several worst-case files
> > > AND a large safety margin to ensure it won't break on us anytime
> > > soon, using something to track git memory usage.
> >
> > Well, yes.
>
> At the sizes you explained now (I thought it would deal with objects
> 7GB in size, not 7GB worth of objects at most 0.5GB in size), it
> should not be a problem in any box with a reasonable amount of free
> RAM and vm space (say, 1GB).

Right, could have written that better.

-- 
bye, Joerg
liw: I'm a blabbermouth

-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact
listmas...@lists.debian.org
Archive: http://lists.debian.org/87fwq73fqh@gkar.ganneff.de
Re: GIT for pdiff generation
> > As we are no git gurus ourself: Does anyone out there see any
> > trouble doing it this way? It means storing something around 7GB of
> > uncompressed text files in git, plus the daily changes happening to
> > them, then diffing them in the way described above. However, the
> > archive will only need to go back a couple of weeks, and therefore
> > we should be able to apply git gc --prune (coupled with whatever
> > way to actually tell git that everything before $DATE can be
> > removed) to keep the size down.
>
> AFAIK, there can be trouble. It all depends on how you're structuring
> the data in git, and the size of the largest data object you will
> want to commit to the repository.

Right now the source contents of unstable has, unpacked, 220MB.
(Packed with gzip it's 28MB, while the binary Contents per
architecture each have 18MB packed.)

Let's add a safety margin: 350MB is a good guess for the largest. A
Packages file hardly counts compared to them; unpacked it's just some
34MB.

> There is an alternative: git can rewrite the entire history
> (invalidating all commit IDs from the starting point up to all the
> branch heads in the process). You can use that facility to drop old
> commits. Given the intended use, where you don't seem to need the
> commit ids to be constant across runs and you will rewrite the
> history of the entire repo at once and drop everything that was not
> rewritten, this is likely the less ugly way of doing what you want.
> Refer to git filter-branch.

It's the one and only thing I've ever seen where history rewrite is
actually something one wants to do.

> Other than that, git loads entire objects into memory to manipulate
> them, which AFAIK CAN cause problems in datasets with very large
> files (the problem is not usually the size of the repository, but
> rather the size of the largest object). You probably want to test
> your use case with several worst-case files AND a large safety margin
> to ensure it won't break on us anytime soon, using something to track
> git memory usage.

Well, yes.
-- 
bye, Joerg
Some NM: FTBFS=Fails to Build from Start
Err, yes? How do you start in the middle?

Archive: http://lists.debian.org/87lj00a3vq@gkar.ganneff.de
Re: GIT for pdiff generation
On Sun, 27 Mar 2011, Joerg Jaspert wrote:
> Right now the source contents of unstable has, unpacked, 220MB.
> (Packed with gzip it's 28MB, while the binary Contents per
> architecture each have 18MB packed.)

That should not be a problem in any non-joke box. Unless you'll run it
in a memory-constrained vm or something.

> Let's add a safety margin: 350MB is a good guess for the largest. A
> Packages file hardly counts compared to them; unpacked it's just some
> 34MB.

I.e. something very easy to keep in RAM on a server class or desktop
class box.

> > There is an alternative: git can rewrite the entire history
> > (invalidating all commit IDs from the starting point up to all the
> > branch heads in the process). You can use that facility to drop old
> > commits. Given the intended use, where you don't seem to need the
> > commit ids to be constant across runs and you will rewrite the
> > history of the entire repo at once and drop everything that was not
> > rewritten, this is likely the less ugly way of doing what you want.
> > Refer to git filter-branch.
>
> It's the one and only thing I've ever seen where history rewrite is
> actually something one wants to do.

Indeed.

> > Other than that, git loads entire objects into memory to manipulate
> > them, which AFAIK CAN cause problems in datasets with very large
> > files (the problem is not usually the size of the repository, but
> > rather the size of the largest object). You probably want to test
> > your use case with several worst-case files AND a large safety
> > margin to ensure it won't break on us anytime soon, using something
> > to track git memory usage.
>
> Well, yes.

At the sizes you explained now (I thought it would deal with objects
7GB in size, not 7GB worth of objects at most 0.5GB in size), it
should not be a problem in any box with a reasonable amount of free
RAM and vm space (say, 1GB).

> Some NM: FTBFS=Fails to Build from Start
> Err, yes? How do you start in the middle?

You screw up debian/rules clean, and try two builds in sequence ;-)

-- 
One disk to rule them all, One disk to find them.
One disk to bring them all and in the darkness grind them.
In the Land of Redmond where the shadows lie. -- The Silicon Valley Tarot

Henrique Holschuh

Archive: http://lists.debian.org/20110327192759.ga4...@khazad-dum.debian.net