I've used something like this approach in the past. All the normal caveats about making sure you've got a backup apply.
Find the names of the largest objects in the pack file (not necessarily in size order). In this case they're almost all .rda files.

    git rev-list --objects --all | grep -f <(git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | cut -f 1 -d " " | tail -15)

    e63fb55738f4d6643939863ec7799776d4b161c5 EWCE.html
    f67b528ec5e029fbeb45c2ff90d619de0d7ae4c0 articles/EWCE.html
    b871cbacac1c1fe1b372a8eca9f7c68122fc4bf4 articles/EWCE_files/figure-html/unnamed-chunk-21-1.png
    ae0e4cda88322aaff0b064136c84096d16dc219f reference/ewce.plot-1.png
    8946eeb7255c328676a61da71276a29002e34d1f data/all_hgnc.rda
    60814dfe9cbf3cb77b846a9fc0270bc7cc00d50c data/all_hgnc_wtEnsembl.rda
    d152a56e7290abb06eab1112910a499145dbd3e1 data/all_mgi.rda
    7075962fb2ccc78b826c7fc1823d0e3d5e5d7b01 data/all_mgi_wtEnsembl.rda
    5d7d0a395c104ad39f105ad85c7a84663e0e6002 data/ensembl_transcript_lengths_GCcontent.rda
    100a7fa8df12deb1803a437b442c0897811916df data/mgi_synonym_data.rda
    f890d2bbd63b7ecff94e4917b6b7188399659221 data/mouse_to_human_homologs.rda
    fddddd7022bc96d24d75cf71d65c097d84bade88 data/tt_alzh.rda
    98aba69ade5c09a2100248c963bb5397860ae089 data/tt_alzh_BA36.rda
    0f006997c7a45a5647dd5ce21be650d6c197ea29 data/tt_alzh_BA44.rda
    67b2d63f55531f85ece47e298213fd25cacdaa01 data/cortex_mrna.rda

Filter out files with the .rda extension. You should be more careful here if there are .rda files you want to retain, but I don't see any in the main branch on GitHub. I get a pretty scary looking warning from git, but it has worked out OK for me in the past.

    git filter-branch --index-filter 'git rm --cached --ignore-unmatch *.rda' -- --all

Apply the removal to the repo.

    rm -Rf .git/refs/original
    rm -Rf .git/logs/
    git gc --aggressive --prune=now

Check the new size of the pack folder.

    du -h .git/objects/pack
    3,9M .git/objects/pack

You could probably apply this approach to remove the large .html files too, but it looks like they're part of the pkgdown site for your package, so I imagine you want to keep them.
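If it helps to see the sizes as well as the names, the listing step can be sketched as below. This is only a rough sketch, not what I ran above: the function name largest_blobs is made up for illustration, and the size it prints is the uncompressed blob size that git reports, so treat it as a guide to what is bloating the pack rather than an exact on-disk figure.

```shell
# Hypothetical helper (name is illustrative, not part of git):
# print "<object id> <uncompressed size in bytes> <path>" for the largest
# blobs reachable from any ref, smallest first, biggest last.
# Usage: largest_blobs [count]   (run from inside a git repository)
largest_blobs() {
    count="${1:-15}"
    # rev-list emits "<id> <path>"; cat-file looks up type and size for
    # each id and passes the path through unchanged via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |   # keep blobs only, drop commits and trees
        sort -k 2,2n |           # order by the size column
        tail -n "$count"
}
```

Running largest_blobs 15 in the repo root should give a list comparable to the one above, but ordered by size, which makes it easier to decide what to pass to the filter step.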
Mike

On Tue, 9 Mar 2021 at 10:09, Murphy, Alan E <a.mur...@imperial.ac.uk> wrote:

> Hi both,
>
> Thank you for your suggestions. Yes, I am still having problems with the
> size of my git history in the EWCE package. To clarify, I have already
> tried the BFG cleaner to no avail even when I set the max limit to 1 MB
> (see my first email for details).
>
> The issue is that a .git/objects/pack/ file is still greater than the
> allotted 5MB, it appears to be 8.9MB in size. As mentioned, I have used the
> BFG cleaner and yet this still remains too large. If anyone has suggestions
> on how else I could reduce this size that would be great.
>
> @Nitesh Turaga <nturaga.b...@gmail.com> how would I go about
> checking (and removing?) hidden files from the .git/objects/pack history?
>
> Kind regards,
> Alan.
> ________________________________
> From: stefano <mangiolastef...@gmail.com>
> Sent: 08 March 2021 22:18
> To: Nitesh Turaga <nturaga.b...@gmail.com>
> Cc: Murphy, Alan E <a.mur...@imperial.ac.uk>; bioc-devel@r-project.org <bioc-devel@r-project.org>
> Subject: Re: [Bioc-devel] Removal of large items in git history - BiocCheck warning
>
> Hello,
>
> you can use bfg-repo-cleaner ,
>
> have a read to this document, in the section "eliminate big files from
> repo"
>
> https://docs.google.com/document/d/1jxg7KCMQq3kiCcvodQk9JgtU51LqczOwLit1gHiTP4Q/edit?usp=sharing
>
> Best wishes.
>
> Stefano
>
> Stefano Mangiola | Postdoctoral fellow
> Papenfuss Laboratory
> The Walter Eliza Hall Institute of Medical Research
> +61 (0)466452544
>
> On Tue, 9 Mar 2021 at 09:11, Nitesh Turaga <nturaga.b...@gmail.com> wrote:
>
> Hi Alan,
>
> Did you manage to solve this?
>
> There seem to be objects in your git repo which are bigger than the size
> which is required by Bioconductor for a software package. Please check
> hidden files as well.
>
> One test you can do is to clone your package from GitHub and see how many
> MB are downloaded to this new location. This is a good test to check which
> files are still larger than the limit.
>
> Best,
>
> Nitesh
>
> On 3/4/21, 11:19 AM, "Bioc-devel on behalf of Murphy, Alan E"
> <bioc-devel-boun...@r-project.org on behalf of a.mur...@imperial.ac.uk>
> wrote:
>
> Hi all,
>
> I am working on the development of EWCE
> <https://github.com/NathanSkene/EWCE> for submission to Bioconductor. I
> have removed some large objects from the package and moved them to a
> separate ExperimentHub package; however, after their removal, I got a
> BiocCheck large file warning.
>
> To deal with the data stored in git history, I followed the
> instructions to use the BFG cleaner with the max size set to 5MB. This
> appeared to work and some things were removed, yet I still get the
> warning below:
>
>     $warning[1] "The following files are over 5MB in size:
>     '.git/objects/pack/pack-366a7ab7a2ba4e656f3a9f3f1408be7ab9f41303.pack'"
>
> If I try to rerun the BFG cleaner I get the following output:
>
>     Warning : no large blobs matching criteria found in packfiles - does
>     the repo need to be packed?
>
> I have tried two different methods of using the BFG cleaner, one from
> BFG <https://rtyley.github.io/bfg-repo-cleaner/> themselves and one from
> Bioconductor
> <https://bioconductor.org/developers/how-to/git/remove-large-data/>. I
> have also completed all steps in both, including the prune step:
>
>     git reflog expire --expire=now --all && git gc --prune=now --aggressive
>
> I have even tried reducing the max from 5MB to 1MB but still nothing
> seems to be left even at that size. Does anyone know of another way to sort
> this issue or have any clue what I may be doing wrong?
>
> Kind regards,
> Alan.
>
> Alan Murphy
> Bioinformatician
> Neurogenomics lab
> UK Dementia Research Institute
> Imperial College London
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel