Re: GIT for pdiff generation

2011-03-28 Thread Joerg Jaspert
 Right now the source contents of unstable is, unpacked, 220MB. (Packed
 with gzip it's 28MB, while the binary Contents files per architecture
 each have about 18MB packed.)
 That should not be a problem in any non-joke box.  Unless you'll run it
 in a memory-constrained vm or something.

Well. For our archives it is turned on for main and for backports. I
don't think main will ever run into trouble there:
             total       used       free     shared    buffers     cached
Mem:      33006584   29241780    3764804          0    2343936   20783680

while backports isn't as big, but still large enough:
             total       used       free     shared    buffers     cached
Mem:       8198084    7352164     845920          0    1063012    5650672

 Let's add a safety margin: 350MB is a good guess for the largest.
 A Packages file hardly counts compared to them; unpacked it's just
 some 34MB.
 I.e. something very easy to keep in RAM on a server-class or
 desktop-class box.

Yes.

  Other than that, git loads entire objects to memory to manipulate them,
  which AFAIK CAN cause problems in datasets with very large files (the
  problem is not usually the size of the repository, but rather the size
  of the largest object).  You probably want to test your use case with
  several worst-case files AND a large safety margin to ensure it won't
  break on us anytime soon, using something to track git memory usage.
 Well, yes.
 At the sizes you explained now (I thought it would deal with objects 7GB
 in size, not 7GB worth of objects at most 0.5GB in size), it should not
 be a problem in any box with a reasonable amount of free RAM and vm
 space (say, 1GB).

Right, could have written that better.

-- 
bye, Joerg
<liw> I'm a blabbermouth





Re: GIT for pdiff generation

2011-03-27 Thread Joerg Jaspert

   As we are no git gurus ourselves: does anyone out there see any trouble
   doing it this way? It means storing something around 7GB of
   uncompressed text files in git, plus the daily changes happening to
   them, then diffing them in the way described above. However, the
   archive will only need to go back a couple of weeks, so we should be
   able to apply git gc --prune (coupled with whatever way to actually
   tell git that everything before $DATE can be removed) to keep the
   size down.
 AFAIK, there can be trouble.  It all depends on how you're structuring
 the data in git, and the size of the largest data object you will want
 to commit to the repository.

Right now the source contents of unstable is, unpacked, 220MB. (Packed
with gzip it's 28MB, while the binary Contents files per architecture
each have about 18MB packed.)

Let's add a safety margin: 350MB is a good guess for the largest.
A Packages file hardly counts compared to them; unpacked it's just
some 34MB.
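
To make that a bit more concrete, the per-dinstall run we have in mind
would look roughly like the sketch below; paths, file names and the exact
diff/compress invocation are only placeholders (real pdiffs are ed-style
diffs, this is just to show the shape of it):

  # keep one tracked, uncompressed copy of each index file in a work tree
  cd /srv/pdiff-work/git
  cp /srv/archive/dists/sid/main/source/Sources .
  cp /srv/archive/dists/sid/main/Contents-amd64 .
  git add Sources Contents-amd64
  git commit -m "dinstall $(date -u +%Y%m%d%H%M)"

  # a pdiff is then just the difference between the last two commits
  # (skipped on the very first run, when there is no previous commit)
  git diff HEAD~1 HEAD -- Contents-amd64 | gzip -9 \
      > /srv/archive/dists/sid/main/Contents-amd64.diff/$(date -u +%Y-%m-%d-%H%M).gz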

 There is an alternative: git can rewrite the entire history
 (invalidating all commit IDs from the start pointing up to all the
 branch heads in the process).  You can use that facility to drop old
 commits.  Given the intended use, where you don't seem to need the
 commit ids to be constant across runs and you will rewrite the history
 of the entire repo at once and drop everything that was not rewritten,
 this is likely the less ugly way of doing what you want.  Refer to git
 filter-branch.

It's the one and only thing I've ever seen where history rewrite is
actually something one wants to do.
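
A minimal sketch of how that rewrite could look, assuming a single branch
and that everything older than a few weeks can go (untested, the command
details would still need checking):

  # newest commit that is older than what we still need to keep
  cutoff=$(git rev-list -n 1 --before="4 weeks ago" master)

  # pretend that commit has no parents, then make the cut permanent
  echo "$cutoff" > .git/info/grafts
  git filter-branch -- --all
  rm .git/info/grafts

  # drop the backup refs and reflogs filter-branch leaves behind,
  # then really throw the unreachable objects away
  git for-each-ref --format='%(refname)' refs/original/ |
      xargs -r -n 1 git update-ref -d
  git reflog expire --expire=now --all
  git gc --prune=now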

 Other than that, git loads entire objects to memory to manipulate them,
 which AFAIK CAN cause problems in datasets with very large files (the
 problem is not usually the size of the repository, but rather the size
 of the largest object).  You probably want to test your use case with
 several worst-case files AND a large safety margin to ensure it won't
 break on us anytime soon, using something to track git memory usage.

Well, yes.
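
For the testing part, something as simple as GNU time should already tell
us enough about the peak memory of the individual git calls; "Maximum
resident set size" in the -v output is the interesting number:

  /usr/bin/time -v git gc --aggressive
  /usr/bin/time -v git diff HEAD~1 HEAD -- Contents-amd64 > /dev/null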

-- 
bye, Joerg
Some NM:
 FTBFS=Fails to Build from Start
Err, yes? How do you start in the middle?





Re: GIT for pdiff generation

2011-03-27 Thread Henrique de Moraes Holschuh
On Sun, 27 Mar 2011, Joerg Jaspert wrote:
 Right now the source contents of unstable is, unpacked, 220MB. (Packed
 with gzip it's 28MB, while the binary Contents files per architecture
 each have about 18MB packed.)

That should not be a problem in any non-joke box.  Unless you'll run it
in a memory-constrained vm or something.

 Let's add a safety margin: 350MB is a good guess for the largest.
 A Packages file hardly counts compared to them; unpacked it's just
 some 34MB.

I.e. something very easy to keep in RAM on a server class or desktop
class box.

  There is an alternative: git can rewrite the entire history
  (invalidating all commit IDs from the start pointing up to all the
  branch heads in the process).  You can use that facility to drop old
  commits.  Given the intended use, where you don't seem to need the
  commit ids to be constant across runs and you will rewrite the history
  of the entire repo at once and drop everything that was not rewritten,
  this is likely the less ugly way of doing what you want.  Refer to git
  filter-branch.
 
 It's the one and only thing I've ever seen where history rewrite is
 actually something one wants to do.

Indeed.

  Other than that, git loads entire objects to memory to manipulate them,
  which AFAIK CAN cause problems in datasets with very large files (the
  problem is not usually the size of the repository, but rather the size
  of the largest object).  You probably want to test your use case with
  several worst-case files AND a large safety margin to ensure it won't
  break on us anytime soon, using something to track git memory usage.
 
 Well, yes.

At the sizes you explained now (I thought it would deal with objects 7GB
in size, not 7GB worth of objects at most 0.5GB in size), it should not
be a problem in any box with a reasonable amount of free RAM and vm
space (say, 1GB).
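
If you want to be really sure, cap the address space before the git call
and see whether it still goes through; the 1GB limit below is just an
example:

  # limit the virtual address space for this subshell only (value in kB)
  ( ulimit -v 1048576; git gc --aggressive ) || echo "did not fit in 1GB"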

 Some NM:
  FTBFS=Fails to Build from Start
 Err, yes? How do you start in the middle?

You screw up debian/rules clean, and try two builds in sequence ;-)

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh

