On Fri, Apr 15, 2005 at 01:19:30PM -0400, C. Scott Ananian wrote:
> Why are blobs per-file?  [After all, Linus insists that files are an 
> illusion.]  Why not just have 'chunks', and assemble *these* 
> into blobs (read, 'files')?  A good chunk size would fit evenly into some 
> number of disk blocks (no wasted space!).

[ I've only been earwigging,  not paying a lot of attention,  however ...]

Funny,  I was just thinking of this having read Linus' discourse on
"files don't matter".  The obvious chunking unit would be,  say,
a function.

The problem is that this tends towards very small files - I know
I tend to prefer small functions.  Hmm - an underlying filesystem that
efficiently stores small files - why does that ring a bell :-)

However the simple answer is to have a preparser for a file / tree
checkin which splits,  say,  a .c file into its associated chunks,  and
represents it in git as a signed/hashed object,  i.e. an automatically
created extra level of indirection (as I seem to recall was added
somewhere else?).

  So say fred.c:

  /*
   * File boiler
   */
  #include <guff>
  #include <more guff>

  /*
   * Fn a boiler
   */
  int fn_a(args) {
  }

  /*
   * Fn b boiler
   */
  long fn_b(args) {
  }

This would be split into 4 parts within git:  the 'file object',  which
simply points to the content objects,  and 3 content objects,  being the
stuff before 'Fn a boiler',  fn_a and its boiler,  and fn_b and its boiler.
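
A rough sketch of that layout,  in Python for brevity.  Everything here
is an assumption made for illustration:  the in-memory dict standing in
for the object store,  the names store/store_file/load_file,  and the
"one hash per line" file object format.  None of it is git's actual
object format.

```python
import hashlib

def store(objects, data):
    """Store data under its SHA-1 and return the hex key (content-addressed)."""
    key = hashlib.sha1(data).hexdigest()
    objects[key] = data
    return key

def store_file(objects, chunks):
    """Store each chunk as a content object, then a 'file object' listing them."""
    keys = [store(objects, chunk) for chunk in chunks]
    # Hypothetical file-object format: one content hash per line.
    return store(objects, ("\n".join(keys) + "\n").encode("ascii"))

def load_file(objects, file_key):
    """Reassemble the original bytes by following the file object's pointers."""
    keys = objects[file_key].decode("ascii").split()
    return b"".join(objects[k] for k in keys)

# fred.c, split by hand into its three content objects.
header = b"/*\n * File boiler\n */\n#include <guff>\n#include <more guff>\n\n"
fn_a = b"/*\n * Fn a boiler\n */\nint fn_a(args) {\n}\n\n"
fn_b = b"/*\n * Fn b boiler\n */\nlong fn_b(args) {\n}\n"

objects = {}
fred_v1 = store_file(objects, [header, fn_a, fn_b])

# A second revision that only touches fn_b reuses the other two objects.
fn_b2 = fn_b.replace(b"long", b"int")
fred_v2 = store_file(objects, [header, fn_a, fn_b2])
```

Two revisions of fred.c cost two file objects plus four content objects,
since the header and fn_a are stored once and shared.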

The interesting bit is needing a preprocessor which can roughly parse
the code - i.e. detect where to place the boiler blocks.
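
As a very rough illustration of such a preprocessor,  the sketch below
(Python,  hypothetical function name split_at_boilers) simply assumes
that any comment opening in column 0 is a boiler block introducing the
next function.  That is a heuristic,  not a real C parser,  and it would
mis-split files that put other column-0 comments between functions.

```python
def split_at_boilers(text):
    """Split C source into chunks, each starting at a column-0 '/*' comment."""
    chunks, current = [], []
    for line in text.splitlines(keepends=True):
        # Assumption: a comment opening in column 0 is a boiler block
        # that introduces the next function.
        if line.startswith("/*") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks

fred_c = (
    "/*\n * File boiler\n */\n"
    "#include <guff>\n#include <more guff>\n\n"
    "/*\n * Fn a boiler\n */\nint fn_a(args) {\n}\n\n"
    "/*\n * Fn b boiler\n */\nlong fn_b(args) {\n}\n"
)
parts = split_at_boilers(fred_c)
```

Joining the parts back together gives the original file byte for byte,
which is the property the file object relies on.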

You would then do most of your tree operations upon the file objects,
but get the space savings from the content objects being shared.

I suspect that simply to prevent pathological conditions you'd have to
arrange that the content objects have a minimum size,  irrespective
of the number of desired chunks (functions) they would naturally
contain,  i.e. for compression efficiency you might choose something like
2K as the minimum pre-compression content object size.
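
One way to enforce such a floor,  sketched in Python:  merge adjacent
natural chunks until each content object reaches the minimum.  The name
coalesce and the choice to fold a short tail into the previous object
are my own assumptions;  2048 is just the 2K figure from above.

```python
MIN_CHUNK = 2048  # the 2K pre-compression floor suggested above

def coalesce(chunks, min_size=MIN_CHUNK):
    """Merge adjacent chunks so each output chunk is at least min_size bytes.

    The only exception is input whose total size is below min_size,
    which comes back as a single short chunk.
    """
    merged, pending = [], b""
    for chunk in chunks:
        pending += chunk
        if len(pending) >= min_size:
            merged.append(pending)
            pending = b""
    if pending:
        # Fold a short tail into the previous chunk rather than emit a runt.
        if merged:
            merged[-1] += pending
        else:
            merged.append(pending)
    return merged

natural = [b"a" * 500, b"b" * 1000, b"c" * 800, b"d" * 3000, b"e" * 100]
sizes = [len(c) for c in coalesce(natural)]
```

The trade-off is that larger content objects are shared less often,
since a change to any one function alters the whole merged object.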

DF