Re: [git-users] How does Git storing entire files rather than deltas make it superior?

Michael Sat, 02 Nov 2019 20:04:51 -0700

On 2019-11-01, at 12:39 PM, likejudo <anil.r...@gmail.com> wrote:

> I was wondering if this isn't space inefficient - and how does it become 
> superior to a VCS by storing snapshots rather than deltas?

Some people will cite studies showing that the pack files have better
compression than you'd normally expect; this is to be expected from compressing
a larger amount of data.

Some people will cite that "unmodified files checksummed" prevents unexpected
alterations; git is actually the first real-life example I know of a block
chain, from before it was called a block chain. Git gains all the advantages of
blockchains for detecting tampering.
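
To make the block-chain comparison concrete, here is a minimal sketch in Python.
It is a toy, not git's real commit format: the names commit_id and build_chain
are made up for illustration, and the only property it shares with git is that
each id is a hash covering the parent's id as well as the content.

```python
import hashlib

def commit_id(parent_id: str, snapshot: bytes) -> str:
    """Toy commit id: hash of the parent's id plus this snapshot's content."""
    return hashlib.sha1(parent_id.encode() + b"\0" + snapshot).hexdigest()

def build_chain(snapshots):
    """Return the list of ids for a linear history of snapshots."""
    ids, parent = [], ""
    for snap in snapshots:
        parent = commit_id(parent, snap)
        ids.append(parent)
    return ids

history = [b"v1 of the project", b"v2 of the project", b"v3 of the project"]
original_ids = build_chain(history)

# Quietly alter the oldest snapshot and rebuild the chain.
tampered = [b"v1 of the project, with a backdoor"] + history[1:]
tampered_ids = build_chain(tampered)

# Every id from the altered snapshot onward differs, so anyone who
# remembers the latest id can notice the change.
changed = [a != b for a, b in zip(original_ids, tampered_ids)]
print(changed)
```

Because each id depends on its parent's id, changing an old snapshot changes
every later id as well, which is why knowing one trusted id is enough to check
everything behind it.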

Some people will question what "superior" means.

The bottom line is this: Git was developed for the Linux kernel. That is, Git
was developed based on the needs of a decently sized, but still small, project.

Yeah, there was a time when I thought Linux was big. "Big" is what you get when
Microsoft and Google both start moving their development/version control over
to git. There's stuff in git designed to deal with very, very large archives
that these two have contributed.

In a nutshell, git has these advantages over everything else that came before
it:
1. Ability to work with really large archives.
2. Ability to recover not just a version of a file, but a version of a project,
even as filenames change.
3. Ability to check what changes were made in a given subdirectory during a
period of time -- used by people working on a subset of the Linux kernel, for
example.
4. Ability to merge more than two deltas off a previous base
5. Ability to ensure no one slipped unauthorized changes into the source code.
6. Ability to have different people work on different files at the same time
without ever running into "locking" issues, without having to have a network
connection at "checkout" time, without needing to have a concept of checking
out.
7. Ability to consider anyone's copy as the "master" copy -- useful if the
maintainer/"master" of a project changes.

When you consider these goals, space used by text files isn't nearly as
important. Only once you get to something the size of the Linux code base can
you start to think that you might be consuming disk space.

====

As stated, the best way to think of git is as a read-only filesystem. Files are
presented to git only in their finished form, and do not get stored in the
filesystem until finished. There is no "differential" at the lowest level, only
a bunch of full files that do not change.

Everything else is layered on top of that.

The files are named by their hash code.
There are files that contain mappings of user-visible file names to hash codes
-- which in turn are named by their own hash code. These are the "directory
listings". Some of the entries in them are sub-directories instead of
user-supplied files.
There are files that contain the hash of the top-level project directory, and
information about which version that project directory represents.
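
The three layers above can be sketched as a toy content-addressed store in
Python. The blob naming ("blob <size>", a NUL byte, then the content, hashed
with SHA-1) is how git names file contents; the directory-listing and version
formats below are simplified stand-ins rather than git's real on-disk formats,
and the helper names (put, put_tree, put_commit) are invented for this example.

```python
import hashlib

store = {}  # hash -> bytes; objects are never modified once written

def put(kind: str, body: bytes) -> str:
    """Store an object under the SHA-1 of its header plus body."""
    data = f"{kind} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data
    return oid

def put_tree(entries: dict) -> str:
    """Directory listing: maps a user-visible name to a blob or sub-tree hash."""
    listing = "".join(f"{oid} {name}\n" for name, oid in sorted(entries.items()))
    return put("tree", listing.encode())

def put_commit(tree: str, parent: str, message: str) -> str:
    """Version record: top-level directory hash plus information about it."""
    return put("commit", f"tree {tree}\nparent {parent}\n\n{message}\n".encode())

# First version of the project.
readme = put("blob", b"hello\n")
main_c = put("blob", b"int main(void) { return 0; }\n")
src = put_tree({"main.c": main_c})
root1 = put_tree({"README": readme, "src": src})
c1 = put_commit(root1, "", "first version")

# Second version: only README changes, so src and main.c are reused as-is.
readme2 = put("blob", b"hello, world\n")
root2 = put_tree({"README": readme2, "src": src})
c2 = put_commit(root2, c1, "update README")

print(len(store))  # objects stored across both versions
```

Note that the second version adds only three new objects (the new README, the
new top-level listing, and the new version record); everything that did not
change is shared between the two snapshots by hash.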

What does this not give you, that has to be calculated every time? The diff
from version N to N+1, for instance when you want to apply "what changed"
between C and D as a rebase onto B.
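
As a rough illustration of "calculated every time", here is a sketch in Python
that derives the diff from two full snapshots on demand, using the standard
difflib module. The snapshots are plain dicts of file name to content and
diff_snapshots is a made-up helper; git's own diff machinery is considerably
more involved.

```python
import difflib

# Two full snapshots; nothing about "what changed" is stored anywhere.
version_n = {"README": "hello\n", "main.c": "int main(void) { return 0; }\n"}
version_n1 = {"README": "hello, world\n", "main.c": "int main(void) { return 0; }\n"}

def diff_snapshots(old: dict, new: dict) -> list:
    """Compute a unified diff between two snapshots, file by file."""
    out = []
    for name in sorted(set(old) | set(new)):
        a = old.get(name, "").splitlines(keepends=True)
        b = new.get(name, "").splitlines(keepends=True)
        if a != b:
            out.extend(difflib.unified_diff(
                a, b, fromfile=f"a/{name}", tofile=f"b/{name}"))
    return out

patch = diff_snapshots(version_n, version_n1)
print("".join(patch))
```

The snapshots are the stored truth and the patch is a by-product; a diff-based
system has it the other way round.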

Diff-based VCSs give you that cheaply, but lose all the other benefits.
The Linux developers found those benefits to be worth more.
Microsoft and Google are switching.

Are there issues/problems? Sure.
Are they less of an issue this way than any other way so far? Seems like it.
Are there features people would like to see in Git? Yep.
Could most of them be added to git without changing the "read-only filesystem"
at the heart? Yes.

Is there a better system design than git? Sure. Do we know what it is? Probably
not.

--
You received this message because you are subscribed to the Google Groups "Git
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to git-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/git-users/45F7E631-0FD0-48C7-AF98-16785213BC77%40gmail.com.
