Hi Michael, Just back from a vacation.
On 24/05/2019 19:51, Michael wrote:
On 2019-05-16, at 11:35 AM, Giorgio Forti <alvarmayo...@gmail.com> wrote:
If I commit ONE file Git builds a "zip" that contains the actual situation of
ALL the 6 thousand of files in my C# solution?
And if I check out this commit, Git gives me back the complete situation
at that moment?
This would be the solution.
This can work with files committed only locally and not pushed to the remote
repository?
So let me explain, if I can, the cheat that git uses.
Git's internal backing store is a write-once file system. Once a file goes into the
"filesystem", it never changes. Since it never changes, the true internal name
is the hash of the file.
True (A) (these are Git's 'Blob' objects).
Elsewhere, there are maps of user-visible-filenames to hash-filenames. And
those maps are also files inside git's file system, and they have a
hash-filename.
True (B) (These are Git's 'Tree' objects).
Subtlety: At (A), the hash is that of just the _content_, and excludes
the file name, date, and other meta data. This de-duplicates renames of
content!
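That blob hashing can be sketched in a few lines. This is the actual recipe git uses for blob ids (SHA-1 over a "blob <size>\0" header plus the raw bytes); the helper name `blob_hash` is just mine:

```python
import hashlib

def blob_hash(content: bytes) -> str:
    # Git's blob id: SHA-1 of a "blob <size>\0" header plus the raw content.
    # The file name, date and permissions are NOT part of the hash, so a
    # renamed file produces no new object at all.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The well-known id of the empty blob:
print(blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

You can check this against `git hash-object` on any file: same content, same id, whatever the file is called.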
Subtlety: At (B), this is where the _filenames_ (and directory names up
the trees) are remembered and associated with blob hashes.
Please re-read those two bits twice more.
Alright, so what git does is this:
1. At any given moment, the "index" or "cache" contains the state of "what will be
checked in next" -- and it consists of a full set of filename to hashname maps for the entire project.
2. On any given commit, the vast majority of files will not change, so the
actual commit will have the same filename to hashname map as the last commit.
3. The actual commit is NOT the set of files, but the set of filename to
hashname maps.
Yes. (C) (This is the Git 'Commit' object, it just knows its parent(s),
and its top level tree).
Please re-read number 3 there.
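Point 3 is easy to model as a toy (this is a sketch of the idea, not git's real object encoding): a commit is just a snapshot of the filename-to-hash map, so an unchanged file costs nothing new to store.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# The "index": a full filename -> hash map for the whole project.
commit1 = {
    "a.cs": h(b"class A {}"),
    "b.cs": h(b"class B {}"),
    "c.cs": h(b"class C {}"),
}

# The next commit changes only b.cs: the map is copied, and two of the
# three entries still point at the *same* blobs as before.
commit2 = dict(commit1)
commit2["b.cs"] = h(b"class B { int x; }")

new_blobs = set(commit2.values()) - set(commit1.values())
print(len(new_blobs))  # 1 -- only one new object needs storing
```

Scale that up to 6000 files and you see why committing one change is cheap.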
Git will store lots and lots of junk files over time.
'Junk' - the intermediate work-in-progress stuff (especially from
git-gui), yes.
There is a separate mechanism that goes through and finds all hash-filename
files that are not referenced from any of the commit lists of user-filename to
hash-filename maps, and cleans those out.
Yes.
Git does not build a zip of all the 6000 files in your commit.
It builds a "zip" of the mappings of user-filenames to hash-filenames.
"yes" - see (C) above - that "zip" is just a 40-char hash of the top tree
(which then drives the 'zip' cascade...)
And, since this is a tree structure, ** if a directory does not change, it is
reusing the same directory user filename to git-hash-filename map **.
Absolutely!
The result is that when you change one file, all that changes is the directory
object for the directory containing the filename of that object (because the
user-filename now points to a different hash-filename), and all the parent
directories back up to the root.
This is what makes git fast.
yep
When you say "git add filename", the new file has been added to git's
write-once backing store. And, the filename to hashname map in the index/cache has been
updated.
That's one copy, one hash calculation, and one updating of a 40 byte hash
record in a file. Plus possibly updating hash calculations and data for each
parent directory.
After that, it's just a case of recording which new hash number is in use in
various places.
Yep.
This is git's cheat. It's just relying on unchanging hashes that have been
mostly calculated in the past and are cheap to copy from A to B.
It's more than a simple "just". It's a stonking great benefit from the
verifiable certainty of the strong hash. Even with the recent SHA-1
'breakage', Git has some extra robustness features that mean it isn't
broken yet (while PDFs are). If you know the hash, and have a copy of a
repo containing it, you definitely have the right content, history and
everything.
The above descriptions relate to the "loose" object viewpoint. Git then
goes one better, by being able to create a 'pack' file that compresses
all those loose objects into one efficiently accessed view of the data,
mainly aided by "Linus's Law" (files grow, older files are smaller).
Pack files are something for the weekend.
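For a taste of why packing helps, here is a rough intuition (this is NOT git's real delta format, just a demonstration that near-duplicate content compresses far better together than apart):

```python
import zlib

# Two "versions" of a file that differ by one line.
v1 = b"some repeated line of source\n" * 500
v2 = v1 + b"one new line\n"

# Stored as independent loose objects, each is compressed alone;
# stored together, the shared content only pays once.
loose = len(zlib.compress(v1)) + len(zlib.compress(v2))
packed = len(zlib.compress(v1 + v2))
print(packed < loose)  # True
```

Git's actual pack files go further, storing later versions as deltas against similar objects, but the win comes from the same redundancy.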
I definitely recommend learning about "git rebase -i" because you will
probably need it in the near future. Git cherry-pick may help too, if you
are not ready to learn about rebasing.
If you are looking to have to do a big, massive merge, which it sounds like ...
You will want to look into a tool called "git-imerge". Git-imerge is a
program that tries to solve the "massive merge" problem. It breaks the
merge down into many, many little merges, most of which are automatic,
and a tiny few of which will need your help.
Yes. While I haven't used imerge, it is well thought of and does much of
the early hard work.
It makes the giant merge much less painful. Not pain-free. Less painful.
Git-imerge operates in two passes. Pass one does the merge, and
absolutely clutters the history.
Pass 2 cleans up the history, and gives you a choice of either "this
looks like a normal rebase", "this looks like a normal merge", or "this
looks like a rebase, but keeps the history of how the changes were made".
This last option *should* be the best choice, but at the moment it isn't.
It leaves you with two sets of "history links" -- one set important, one
set unimportant -- and there is no way to indicate to git which is which,
and no tools to say "don't show these links by default". The result is
that your history will look very messy, because the existing tools make
assumptions that this system breaks.
---
This message was composed with the aid of a laptop cat, and no mouse
Hope it's all looking good.
--
Philip