The files in question would be in a directory containing many files, some small and others huge (for example: text files, docs, and JPGs are MBs, but executables and OVA images are GBs, etc.).

Lou

From: Gary Fixler [mailto:gfix...@gmail.com] 
Sent: Tuesday, May 20, 2014 12:09 PM
To: Stewart, Louis (IS)
Cc: git@vger.kernel.org
Subject: EXT :Re: GIT and large files

Technically yes, but from a practical standpoint, not really. Facebook recently 
revealed that they have a 54GB git repo[1], but I doubt it has 20+GB files in 
it. I've put 18GB of photos into a git repo, but everything about the process 
was fairly painful, and I don't plan to do it again.

Are your files non-mergeable binaries (e.g. videos)? The biggest problem here 
is with branching and merging. Conflict resolution with non-mergeable assets 
ends up being an us-vs-them fight, and I don't understand all of the particulars 
of that. From git's standpoint it's simple - you just have to choose one or the 
other. From a workflow standpoint, you end up causing trouble if two people 
have changed an asset, and both people consider their change important. 
Centralized systems get around this problem with locks.
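
For the record, git's side of that choice is purely mechanical; the pain is 
deciding whose version wins. A rough sketch, with a made-up asset path:

    # merge stops on a conflicted binary; pick one side wholesale
    git merge topic                               # conflict in assets/level01.bin (hypothetical path)
    git checkout --ours   -- assets/level01.bin   # keep our version, or
    git checkout --theirs -- assets/level01.bin   # take theirs instead
    git add assets/level01.bin
    git commit

That works today, but it only decides which file survives - the losing change 
is simply gone, which is exactly the workflow problem locks try to prevent.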
Git could do this, and I've thought about it quite a bit. I work in games - we 
have code, but also a lot of binaries that I'd like to keep in sync with the 
code. For a while I considered suggesting some ideas to this group, but I'm 
pretty sure the locking issue makes it a non-starter.

The basic idea - skipping locking for the moment - would be to allow setting 
git attributes by file type, file size threshold, folder, etc., to let git know 
that some files are considered "bigfiles." These could be placed into the 
objects folder, but I'd actually prefer they go into a .git/bigfile folder. 
They'd still be saved as contents under their hash, but a normal git transfer 
wouldn't send them. They'd appear in the tree as 'big' or 'bigfile' (instead of 
'blob', 'tree', or 'commit' (for submodules)).
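
To make that concrete, a purely hypothetical sketch of the attribute side - the 
'bigfile' attribute and the size threshold below don't exist in git today, 
they're just invented notation for this idea:

    # .gitattributes (hypothetical "bigfile" attribute, not real git syntax)
    *.ova        bigfile
    *.mp4        bigfile
    build/**     bigfile

    # .git/config (hypothetical threshold: anything larger is treated as big)
    [bigfile]
        threshold = 512M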

Git would warn you on push that there were bigfiles to send, and you could add, 
say, --with-big to also send them, or send them later with, say, `git push 
--big`. They'd simply be zipped up and sent over, without any packfile 
fanciness. When you clone, you wouldn't get the bigfiles unless you specified 
--with-big; git would warn you that there are also bigfiles, and tell you what 
command to run to also get them (`git fetch --big`, perhaps). Git status would 
always let you know if you were missing bigfiles. I think hopping around 
between commits would follow the same strategy: you'd always have to, e.g., 
`git checkout foo --with-big`, or `git checkout foo` and then `git update big` 
(or whatever - I'm not married to any of these names).
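
Pulled together, the day-to-day flow might look something like this (all of 
these flags are hypothetical, and the URL is made up):

    git clone git://example.com/game.git      # bigfiles skipped; git warns that they exist
    git fetch --big                           # fetch the bigfiles separately
    git checkout topic --with-big             # switch branches and update bigfiles too
    git push origin topic --with-big          # send commits and their bigfiles together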

Resolving conflicts on merge would simply have to be up to you. It would be 
documented clearly that you're entering weird territory, and that your team has 
to deal with bigfiles somehow, perhaps with some suggested strategies ("Pass 
the conch?"). I could imagine some strategies for this. Maybe bigfiles require 
connecting to a blessed repo to grab the right to make a commit on it. That has 
many problems, of course, and now I can feel everyone reading this shifting 
uneasily in their seats :)
-g

[1] https://twitter.com/feross/status/459259593630433280

On Tue, May 20, 2014 at 8:37 AM, Stewart, Louis (IS) <louis.stew...@ngc.com> 
wrote:
Can GIT handle versioning of large 20+ GB files in a directory?

Lou Stewart
AOCWS Software Configuration Management
757-269-2388


