Re: Migration to Git LFS inflates repository multiple times
On Mon, 12 Nov 2018 at 00:47, Mateusz Loskot wrote:
>
> Hi,
>
> I'm posting here for the first time and I hope it's the right place to ask
> questions about Git LFS.
>
> TL;DR: Is it normal that a repository migrated to Git LFS inflates multiple
> times, and how to deal with it?

FYI, my questions have been answered on GitHub:
https://github.com/git-lfs/git-lfs/issues/3374

I'd like to thank Jeff and Ævar here for their help too.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net
Re: Migration to Git LFS inflates repository multiple times
On Mon, 12 Nov 2018 at 13:31, Jeff King wrote:
> On Mon, Nov 12, 2018 at 12:47:42AM +0100, Mateusz Loskot wrote:
> >
> > TL;DR: Is it normal that a repository migrated to Git LFS inflates
> > multiple times, and how to deal with it?
>
> That does sound odd to me. People with more LFS experience can probably
> give you better answers

FYI, I forwarded my question to
https://github.com/git-lfs/git-lfs/issues/3374

> but one thought occurred to me: does LFS
> store backup copies of the original refs that it rewrites (similar to
> the way filter-branch stores refs/original)?

I don't think I see any backup refs (see below for the full list).
But I may be misunderstanding what they are or how to look for them.

> history. Which might mean storing those large blobs both as Git objects
> (for the old history) and in an LFS cache directory (for the new
> history).

Yes, it makes sense.

> And the right next step is probably to delete those backup refs, and
> then "git gc --prune=now". Hmm, actually thinking about it, reflogs
> could be making the old history reachable, too.
>
> Try looking at the output of "git for-each-ref" and seeing if there are
> any backup refs.

I see. Here is the list (long!)
of all I found:

proj.git (BARE:master) $ git for-each-ref
c718eadcf8d09d68c385f0a9355a2c871474fb43 commit refs/heads/1.0
daa75889053b70515179e334cbe3fe6fc7873ff3 commit refs/heads/1.1
cb70db292c1f0c62170d05ffa8dad3c87a6f8ebd commit refs/heads/2.0
f1597e80fcea16bec96dc43f7ab706616126305b commit refs/heads/3.0
1d9e4813ae2fdc5c2b52f7115facda9059b009dc commit refs/heads/master
f41edf37e9a4120bc5d5d66b29d110d403b8db9b commit refs/svn/attic/tags/1.0.1006/6674
850166b21f27447c6b503bb753c454ccedcea8ef commit refs/svn/attic/tags/1.0.216/1291
8a24407f2df0ea7a401fbc08b387387538912642 commit refs/svn/attic/tags/1.0.252/1543
771d81b0756d6ff7d73779ed79f49a607bffb80e commit refs/svn/attic/tags/1.0.299/1883
10925d0fe0de090d4de109fd6403b86d014d6a21 commit refs/svn/attic/tags/1.0.342/2288
ca8dc7b243d002ac6b27f219a6172d36e7885ac1 commit refs/svn/attic/tags/1.0.391/2470
79ebabed25d31dfa34ad68e58fa9327f71928df1 commit refs/svn/attic/tags/1.0.433/2657
d3ed45804843c5aa153810b7494e2c7c0b842c82 commit refs/svn/attic/tags/1.0.450/2724
088af5dbb225cb8dfbeafb5b63158234f4e4017d commit refs/svn/attic/tags/1.0.502/2967
5da8598ed98a5a6610108d67849955da14f9d5b8 commit refs/svn/attic/tags/1.0.546/3212
c3463397337f9f6e5f9d8e64cd79c013fd798bc8 commit refs/svn/attic/tags/1.0.615/3470
673b2f93fda830cc8f28d436abead5fd54baa361 commit refs/svn/attic/tags/1.0.657/3704
247cf24b90afd39f4f5dd7a27cf6b74483215285 commit refs/svn/attic/tags/1.0.662/3725
e2a1609bb6b15ee767565ca0ff152eae3f72a76b commit refs/svn/attic/tags/1.0.673/3820
48033fb9046b1ee60ad9b73073e9185c31ee4568 commit refs/svn/attic/tags/1.0.742/4325
d7b566a275209d0aebba39c7a4028c9dcfb8a468 commit refs/svn/attic/tags/1.1.1141/13525
80f922becfc406420d2f14543e6da684f7377504 commit refs/svn/attic/tags/1.1.1535/16534
252481191cacdef0e77eb6ec02c98b07fca7bc77 commit refs/svn/attic/tags/1.1.1582/16435
601f072d559b664c101d89f3445ed3d00f4ef5dd commit refs/svn/attic/tags/1.1.939/12077
417fb23d71cab30e2c6218faed6f86021b67ca25 commit refs/svn/attic/tags/2.0.1156/21143
5539fbbe0078b782af02d46e0c1abc86ce3d5902 commit refs/svn/map
c718eadcf8d09d68c385f0a9355a2c871474fb43 commit refs/svn/root/branches/1.0
daa75889053b70515179e334cbe3fe6fc7873ff3 commit refs/svn/root/branches/1.1
cb70db292c1f0c62170d05ffa8dad3c87a6f8ebd commit refs/svn/root/branches/2.0
f1597e80fcea16bec96dc43f7ab706616126305b commit refs/svn/root/branches/3.0
e946e96ce2b37a771769196027ae87b8f24181e0 commit refs/svn/root/tags/1.0.1058
06928f42664384bd5e24c115f9c23acc2fd949da commit refs/svn/root/tags/1.0.1240
9f059337974aa195386c5f3ee21957551624aa27 commit refs/svn/root/tags/1.0.1653
db98d68f93d2e9127ab766d3bbe6c933ec169d29 commit refs/svn/root/tags/1.0.764
20ad69c6b94bc8b73b613b5c21d367f22f423501 commit refs/svn/root/tags/1.1.1163
458be535d7f0bc512e800759759e86a211a418b6 commit refs/svn/root/tags/1.1.1290
df045ab97bee94e8cfe72b70802b837719899587 commit refs/svn/root/tags/1.1.1556
cd8ce83868016a0854c0f3cf1b23ea68a32674a2 commit refs/svn/root/tags/1.1.1706
fcd0801b93f48bd46b276ccb82678a70f11fc3ca commit refs/svn/root/tags/1.1.1809
5df0902cfe973b0a041409ec2e8d2314f2b8031e commit refs/svn/root/tags/1.1.2368
c9a06b4f43bab77ed283fe2736ab5c865e03026e commit refs/svn/root/tags/1.1.2417
32e505f8c4deadb73c63bd20a598481cd164541d commit refs/svn/root/tags/1.1.947
ef9c3667ec419bcb6d5eb5b9dbacb1cff0b1051e commit refs/svn/root/tags/2.0.1187
a1a6a5bedb8949eb91f3509929edb9efa9ad2875 commit refs/svn/root/tags/2.0.1198
33a8f49da311caecdb5521759251bbcb78e3bff2 commit refs/svn/root/tags/2.0.1338
63e59278131281858296b56f4ef5dd91c332941a commit refs/svn/root/tags/2.0.1481
d23a1c662f772a7fc0d23a07794b57cfd9eff064 commit refs/svn/root/tags/2.0.1835
c53e2cc4660a9e3121dff33c28c1383766fda39b commit refs/svn/root/tags/2.0.2148
c7e0293ec09fee809fba707054cd1fd8fe492664 commit
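A listing this long is tedious to scan by eye for backup refs. The following is a minimal Python sketch, not part of the original thread, that filters `git for-each-ref` output for refs under a backup namespace. The function name `find_backup_refs` and the default prefix tuple are assumptions made for illustration: `refs/original/` is the namespace filter-branch uses, and whether `git lfs migrate` writes any backup refs at all is exactly the open question here, so the prefixes may need adjusting.

```python
# Hypothetical helper: scan `git for-each-ref` output for backup refs.
# "refs/original/" is where filter-branch keeps its backups; other
# prefixes are an assumption and can be passed in by the caller.
BACKUP_PREFIXES = ("refs/original/",)


def find_backup_refs(for_each_ref_output, prefixes=BACKUP_PREFIXES):
    """Return the ref names that start with any of the given prefixes."""
    hits = []
    for line in for_each_ref_output.splitlines():
        parts = line.split()  # "<sha> <type> <refname>", space or tab separated
        if len(parts) != 3:
            continue  # skip blank or truncated lines
        _sha, _objtype, refname = parts
        if refname.startswith(tuple(prefixes)):
            hits.append(refname)
    return hits


# Two lines taken from the listing above, plus one made-up backup ref
# (the all-zero hash and the refs/original/ entry are NOT from the post).
listing = "\n".join([
    "1d9e4813ae2fdc5c2b52f7115facda9059b009dc commit refs/heads/master",
    "5539fbbe0078b782af02d46e0c1abc86ce3d5902 commit refs/svn/map",
    "0000000000000000000000000000000000000000 commit refs/original/refs/heads/master",
])
print(find_backup_refs(listing))
```

Applied to the real listing above, the default prefix matches nothing, which agrees with the observation that no backup refs are visible. The `refs/svn/...` refs come from the SVN translation, not from the LFS migration.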
Re: Migration to Git LFS inflates repository multiple times
On Mon, Nov 12 2018, Jeff King wrote:

> On Mon, Nov 12, 2018 at 12:47:42AM +0100, Mateusz Loskot wrote:
>
>> Hi,
>>
>> I'm posting here for the first time and I hope it's the right place to ask
>> questions about Git LFS.
>>
>> TL;DR: Is it normal that a repository migrated to Git LFS inflates multiple
>> times, and how to deal with it?
>
> That does sound odd to me. People with more LFS experience can probably
> give you better answers, but one thought occurred to me: does LFS
> store backup copies of the original refs that it rewrites (similar to
> the way filter-branch stores refs/original)?
>
> If so, then the resulting repo has the new history _and_ the old
> history. Which might mean storing those large blobs both as Git objects
> (for the old history) and in an LFS cache directory (for the new
> history).
>
> And the right next step is probably to delete those backup refs, and
> then "git gc --prune=now". Hmm, actually thinking about it, reflogs
> could be making the old history reachable, too.
>
> Try looking at the output of "git for-each-ref" and seeing if there are
> any backup refs. After deleting them (or confirming that there aren't),
> prune the reflogs with:
>
>   git reflog expire --expire-unreachable=now --all
>
> and then "git gc --prune=now".

Even if it's only the most recent version of each file, this could also
be explained by LFS storing each file inflated as-is on disk, whereas
git will store them delta-compressed.

According to the initial email, "*.exe,*.dll,*.lib,*.pdb,*.zip" was
added to LFS. Depending on their content, those files might
delta-compress somewhat better than random data.
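The sizes reported in the original post allow a rough check of this explanation. The sketch below is an illustration, not something posted in the thread. It takes the per-type figures from the first `git lfs migrate info --everything` run and the `du -sh` results from the original message, and it assumes the `migrate info` figures are uncompressed blob sizes. Note that `*.zip` was also migrated but does not appear in that top-five listing, so the total is a lower bound.

```python
# Sizes in GB for the migrated file types, as listed by
# `git lfs migrate info --everything` in the original post.
migrated_gb = {"*.lib": 27.0, "*.pdb": 5.6, "*.exe": 2.3, "*.dll": 2.0}

# `du -sh` results from the original post, in GB.
packed_before_gb = 11   # after `git gc`, before the LFS migration
after_cleanup_gb = 39   # after migration, reflog expiry and aggressive gc

# Total size of the migrated files if stored as-is, without deltas.
lfs_total_gb = round(sum(migrated_gb.values()), 1)

# What would be left for everything else (Git objects, refs, ...).
remainder_gb = round(after_cleanup_gb - lfs_total_gb, 1)

print(f"migrated files stored as-is: {lfs_total_gb} GB")
print(f"left over for everything else: {remainder_gb} GB")
```

Under these assumptions, the migrated files alone account for most of the 39 GB, which is consistent with LFS keeping every version of every file in full rather than as deltas. It does not rule out old history also being retained.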
Re: Migration to Git LFS inflates repository multiple times
On Mon, Nov 12, 2018 at 12:47:42AM +0100, Mateusz Loskot wrote:

> Hi,
>
> I'm posting here for the first time and I hope it's the right place to ask
> questions about Git LFS.
>
> TL;DR: Is it normal that a repository migrated to Git LFS inflates multiple
> times, and how to deal with it?

That does sound odd to me. People with more LFS experience can probably
give you better answers, but one thought occurred to me: does LFS
store backup copies of the original refs that it rewrites (similar to
the way filter-branch stores refs/original)?

If so, then the resulting repo has the new history _and_ the old
history. Which might mean storing those large blobs both as Git objects
(for the old history) and in an LFS cache directory (for the new
history).

And the right next step is probably to delete those backup refs, and
then "git gc --prune=now". Hmm, actually thinking about it, reflogs
could be making the old history reachable, too.

Try looking at the output of "git for-each-ref" and seeing if there are
any backup refs. After deleting them (or confirming that there aren't),
prune the reflogs with:

  git reflog expire --expire-unreachable=now --all

and then "git gc --prune=now".

-Peff
Migration to Git LFS inflates repository multiple times
Hi,

I'm posting here for the first time and I hope it's the right place to ask
questions about Git LFS.

TL;DR: Is it normal that a repository migrated to Git LFS inflates multiple
times, and how to deal with it?

I'm migrating a big SVN repository to Git. In SVN, a collection of
third-party SDKs is maintained along with the codebase. Many of the
third-party libraries come in binary form, so I'm migrating their
binary files to Git LFS.

I'm following the Git LFS tutorial, section "Migrating existing
repository data to LFS":
https://github.com/git-lfs/git-lfs/wiki/Tutorial

First, I ran the initial translation of the SVN repo into Git.
The new repository is a Git bare repository.
There are 5 branches and 10+ tags in the proj.git repo.
It is quite large:

proj.git (BARE:master) $ du -sh
19G

Next, I performed the following sequence of steps to optimise it and
migrate it to Git LFS:

1. Optimise the repo

proj.git (BARE:master) $ git gc
Enumerating objects: 1432599, done.
Counting objects: 100% (1432599/1432599), done.
Delta compression using up to 48 threads
Compressing objects: 100% (864524/864524), done.
Writing objects: 100% (1432599/1432599), done.
Total 1432599 (delta 541698), reused 1405922 (delta 525738)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 1432599, done.

proj.git (BARE:master) $ du -sh
11G

2. List the file types taking up the most space in the repo

proj.git (BARE:master) $ git lfs migrate info --everything
migrate: Sorting commits: ..., done
migrate: Examining commits: 100% (29412/29412), done
*.lib   27 GB     3524/3524 files(s)      100%
*.pdb   5.6 GB    1412/1412 files(s)      100%
*.cpp   4.8 GB    131848/131854 files(s)  100%
*.exe   2.3 GB    798/798 files(s)        100%
*.dll   2.0 GB    1000/1000 files(s)      100%

3. Migrate the repo to Git LFS

proj.git (BARE:master) $ git lfs migrate import --include="*.exe,*.dll,*.lib,*.pdb,*.zip" --everything

4. Check the size of the repo after migration to Git LFS

proj.git (BARE:master) $ du -sh
47G

5. Clean up the `.git` directory after migration to Git LFS

proj.git (BARE:master) $ git reflog expire --expire-unreachable=now --all
proj.git (BARE:master) $ git gc --prune=now --aggressive
Enumerating objects: 1462310, done.
Counting objects: 100% (1462310/1462310), done.
Delta compression using up to 48 threads
Compressing objects: 100% (1422322/1422322), done.
Writing objects: 100% (1462310/1462310), done.
Total 1462310 (delta 577640), reused 845097 (delta 0)
Removing duplicate objects: 100% (256/256), done.
Checking connectivity: 1462310, done.

6. Check the final disk size of the repo

proj.git (BARE:master) $ du -sh
39G

7. List the file types taking up the most space in the repository
   after migration to Git LFS

proj.git (BARE:master) $ git lfs migrate info --everything
migrate: Sorting commits: ..., done
migrate: Examining commits: 100% (29412/29412), done
*.cpp   4.8 GB    131848/131854 files(s)  100%
*.png   1.1 GB    696499/696499 files(s)  100%
*.h     828 MB    86386/86471 files(s)    100%
*.csv   820 MB    939/939 files(s)        100%
*.html  686 MB    34126/34126 files(s)    100%

Now, I'm looking for answers to the following questions:

1. Is the procedure presented above correct to migrate
   (SVN ->) Git -> Git LFS?

2. Given that the initial translation to Git generated a 19 GB repo
   (optimised to 11 GB), is it normal that the Git LFS migration
   inflates the repository to 47 GB (optimised to 39 GB)?

3. Why does the inflation happen? Is it a function of the number of
   branches? How should I understand the jump from 11 GB to 39 GB?

4. How can I optimise the repository to cut the size down further?

My next step is to somehow push the fat pig into GitHub, Bitbucket
or Azure DevOps ;-)

I've used Git for a few years, but I'm pretty much a newbie regarding
low-level or administration tasks, so I might have made basic errors.
I'll be thankful for any feedback.

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net
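The `du -sh` checks in steps 4 and 6 measure the whole repository directory, so they cannot tell Git's own object store apart from locally cached LFS objects. Here is a minimal Python sketch, not part of the original post, that reports the two separately. It assumes LFS objects live under an `lfs/` subdirectory of the Git directory and Git objects under `objects/`. The function names `dir_size` and `split_usage` are hypothetical, and the figures are apparent file sizes, which can differ slightly from what `du` reports.

```python
import os


def dir_size(path):
    """Sum the apparent sizes of all regular files below path (0 if missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # do not follow or count symlinks
                total += os.path.getsize(full)
    return total


def split_usage(git_dir):
    """Break a Git directory's size into Git objects, LFS data and the rest.

    Assumes the layout <git_dir>/objects and <git_dir>/lfs; for a bare
    repository, git_dir is the repository directory itself.
    """
    git_objects = dir_size(os.path.join(git_dir, "objects"))
    lfs = dir_size(os.path.join(git_dir, "lfs"))
    other = dir_size(git_dir) - git_objects - lfs
    return {"git_objects": git_objects, "lfs": lfs, "other": other}


if __name__ == "__main__":
    # Run from inside the bare repository, e.g. proj.git
    for part, size in split_usage(".").items():
        print(f"{part:12s} {size / 1e9:8.2f} GB")
```

If most of the total turns out to sit under `lfs/`, the growth reflects LFS storage rather than Git history that failed to get pruned, and it may not count against what has to be pushed as ordinary Git objects.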