Hi Michael, Just back from a vacation.
On 24/05/2019 19:51, Michael wrote:
On 2019-05-16, at 11:35 AM, Giorgio Forti <alvarmayo...@gmail.com> wrote:
If I commit ONE file Git builds a "zip" that contains the actual situation of
ALL the 6 thousand of files in my C# solution?
And if I check out this commit, Git gives me back the complete situation
at that moment?
This would be the solution.
This can work with files committed only locally and not pushed to the remote
repository?
So let me explain, if I can, the cheat that git uses.
Git's internal backing store is a write-once file system. Once a file goes into the
"filesystem", it never changes. Since it never changes, the true internal name
is the hash of the file.
True (A) (these are Git's 'Blob' objects).
Elsewhere, there are maps of user-visible-filenames to hash-filenames. And
those maps are also files inside git's file system, and they have a
hash-filename.
True (B) (These are Git's 'Tree' objects).
Subtlety: At (A), the hash is that of just the _content_, and excludes
the file name, date, and other meta data. This de-duplicates renames of
content!
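That blob hashing can be sketched in a few lines. This is the actual recipe git uses for blob ids (SHA-1 over a "blob <size>\0" header plus the raw bytes); the helper name `blob_hash` is just mine:

```python
import hashlib

def blob_hash(content: bytes) -> str:
    # Git's blob id: SHA-1 of a "blob <size>\0" header plus the raw content.
    # The file name, date and permissions are NOT part of the hash, so a
    # renamed file produces no new object at all.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The well-known id of the empty blob:
print(blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

You can check this against `git hash-object` on any file: same content, same id, whatever the file is called.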
Subtlety: At (B), this is where the _filenames_ (and directory names up
the trees) are remembered and associated with blob hashes.
Please re-read those two bits twice more.
Alright, so what git does is this:
1. At any given moment, the "index" or "cache" contains the state of "what will be
checked in next" -- and it consists of a full set of filename to hashname maps for the entire project.
2. On any given commit, the vast majority of files will not change, so the
actual commit will have the same filename to hashname map as the last commit.
3. The actual commit is NOT the set of files, but the set of filename to
hashname maps.
Yes. (C) (This is the Git 'Commit' object, it just knows its parent(s),
and its top level tree).
Please re-read number 3 there.
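Point 3 is easy to model as a toy (this is a sketch of the idea, not git's real object encoding): a commit is just a snapshot of the filename-to-hash map, so an unchanged file costs nothing new to store.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# The "index": a full filename -> hash map for the whole project.
commit1 = {
    "a.cs": h(b"class A {}"),
    "b.cs": h(b"class B {}"),
    "c.cs": h(b"class C {}"),
}

# The next commit changes only b.cs: the map is copied, and two of the
# three entries still point at the *same* blobs as before.
commit2 = dict(commit1)
commit2["b.cs"] = h(b"class B { int x; }")

new_blobs = set(commit2.values()) - set(commit1.values())
print(len(new_blobs))  # 1 -- only one new object needs storing
```

Scale that up to 6000 files and you see why committing one change is cheap.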
Git will store lots and lots of junk files over time.
'Junk' - the intermediate work-in-progress stuff (especially from
git-gui), yes.
There is a separate mechanism that goes through and finds all hash-filename
files that are not referenced from any of the commit lists of user-filename to
hash-filename maps, and cleans those out.
Yes.
Git does not build a zip of all the 6000 files in your commit.
It builds a "zip" of the mappings of user-filenames to hash-filenames.
"yes" - see (C) above - that "zip" is just a 40-char hash of the top tree
(which then drives the 'zip' cascade...)
And, since this is a tree structure, ** if a directory does not change, it is
reusing the same directory user filename to git-hash-filename map **.
Absolutely!
The result is that when you change one file, all that changes is the directory
object for the directory containing the filename of that object (because the
user-filename now points to a different hash-filename), and all the parent
directories back up to the root.
This is what makes git fast.
yep
When you say "git add filename", the new file has been added to git's
write-once backing store. And, the filename to hashname map in the index/cache has been
updated.
That's one copy, one hash calculation, and one updating of a 40 byte hash
record in a file. Plus possibly updating hash calculations and data for each
parent directory.
After that, it's just a case of recording which new hash number is in use in
various places.
Yep.
This is git's cheat. It's just relying on unchanging hashes that have been
mostly calculated in the past and are cheap to copy from A to B.
It's more than a simple "just". It's a stonking great benefit from the
verifiable certainty of the strong hash. Even with the recent SHA-1
'breakage', Git has some extra robustness features that mean it isn't
broken yet (while PDFs are). If you know the hash, and have a copy of a
repo containing it, you definitely have the right content, history and
everything.
The above descriptions relate to the "loose" object viewpoint. Git then
goes one better, by being able to create a 'pack' file that compresses
all those loose objects into one efficiently accessed view of the data,
mainly aided by "Linus's Law" (files grow, older files are smaller).
Pack files are something for the weekend.
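For a taste of why packing helps, here is a rough intuition (this is NOT git's real delta format, just a demonstration that near-duplicate content compresses far better together than apart):

```python
import zlib

# Two "versions" of a file that differ by one line.
v1 = b"some repeated line of source\n" * 500
v2 = v1 + b"one new line\n"

# Stored as independent loose objects, each is compressed alone;
# stored together, the shared content only pays once.
loose = len(zlib.compress(v1)) + len(zlib.compress(v2))
packed = len(zlib.compress(v1 + v2))
print(packed < loose)  # True
```

Git's actual pack files go further, storing later versions as deltas against similar objects, but the win comes from the same redundancy.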
I definitely recommend learning about "git rebase -i" because you will
probably need it in the near future. Git cherry-pick may help too, if you
are not ready to learn about rebasing.
If you are looking to have to do a big, massive merge, which it sounds like ...
You will want to look into a tool called "git-imerge". Git-imerge is a
program that tries to solve the "massive merge" problem. It breaks the
merge down into many, many little merges, most of which are automatic,
and a tiny few of which will need your help.
Yes. While I haven't used imerge, it is well thought of and does much of
the early hard work.
It makes the giant merge much less painful. Not pain-free. Less painful.
Git-imerge operates in two passes. Pass one does the merge, and
absolutely clutters the history.
Pass 2 cleans up the history, and gives you a choice of either "this
looks like a normal rebase", "this looks like a normal merge", or "this
looks like a rebase, but keeps the history of how the changes were made".
This last option *should* be the best choice, but at the moment it isn't.
It leaves you with two sets of "history links" -- one set important, one
set unimportant -- and there is no way to indicate to git which is which,
and no tools to say "don't show these links by default". The result is
that your history will look very messy, because the existing tools make
assumptions that this system breaks.
---
This message was composed with the aid of a laptop cat, and no mouse
Hope it's all looking good.
--
Philip