Dear community,

First, I want to thank Nathan for his amazing help with my two previous
inquiries. His answers led me straight to the issue.

After hours of analysis, I finally decided to remove every data file larger
than 50 KB from the git history. This practice is not sustainable, however,
and is in fact pathological: it invalidates virtually all earlier data files
and therefore hampers the reproducibility of earlier commits, especially
their unit tests. I am therefore leaving this message here in the hope of
reaching the Bioconductor administrators.

I would argue that the file size policy should be relaxed, at least for the
git packfile. Most of us know that the .pack file in .git/objects/pack is
frequently flagged by BiocCheck() for its size (as
here<https://stat.ethz.ch/pipermail/bioc-devel/2019-February/014703.html> or
here<https://stat.ethz.ch/pipermail/bioc-devel/2020-October/017273.html>).
This is natural given the purpose of a packfile: storing the entire history,
removed files included, in a single compact
space<https://git-scm.com/book/en/v2/Git-Internals-Packfiles#:~:text=The%20packfile%20is%20a%20single,seek%20to%20a%20specific%20object.>.
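
For anyone who wants to check their own working copy, here is a minimal sketch that lists the size of each .pack file. The function names are hypothetical, and the 5 MB default is only an assumption based on the limit discussed in this message, not a value read from BiocCheck.

```python
from pathlib import Path


def pack_sizes(repo_root):
    """Return {pack file name: size in bytes} for a git working copy."""
    pack_dir = Path(repo_root) / ".git" / "objects" / "pack"
    if not pack_dir.is_dir():
        return {}
    return {p.name: p.stat().st_size for p in sorted(pack_dir.glob("*.pack"))}


def oversized_packs(repo_root, limit_bytes=5 * 1000 * 1000):
    """Names of pack files larger than limit_bytes (assumed 5 MB limit)."""
    return [name for name, size in pack_sizes(repo_root).items()
            if size > limit_bytes]


if __name__ == "__main__":
    # Inspect the repository in the current directory.
    for name, size in pack_sizes(".").items():
        print(f"{name}: {size / 1e6:.2f} MB")
    print("over the assumed limit:", oversized_packs("."))
```

Running it from the package root prints one line per pack file; an empty result just means the repository has no packed objects yet.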

Compressing the whole git history into one file works well only as long as
most deltas are line-level changes in text source files. In my experience,
however, modifications to binary data files contributed far more, because the
delta grows when a modification to a compressed dataset scrambles its bit
pattern. Each such change was still at the kilobyte level, but together they
gradually increased the size of the whole pack file, so I had to remove those
files. The current policy therefore forces the deletion of kilobyte-sized
files from the git history, not just 'large' files...
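
The effect can be illustrated with a small sketch. It builds a made-up numeric table, edits a few rows, and compares how cheaply the new version can be encoded against the old one, first as plain text and then after each version has been compressed with zlib. The table, the helper names, and the cost measure (deflating the new version with the old one as a preset dictionary) are all assumptions for illustration; this is not git's actual delta encoding.

```python
import random
import zlib


def delta_cost(old, new):
    """Bytes needed to deflate `new` with `old` as a preset dictionary.

    Only a rough proxy for a delta; this is not git's delta encoding.
    """
    comp = zlib.compressobj(level=9, zdict=old)
    return len(comp.compress(new) + comp.flush())


def make_rows(rng, n_rows):
    """Build a made-up numeric table, one CSV row per string."""
    return [
        f"sample_{i:04d},{rng.random():.6f},{rng.random():.6f}\n"
        for i in range(n_rows)
    ]


def compare(n_rows=600, step=50, seed=0):
    """Return (delta on raw text, delta on zlib-compressed files)."""
    rng = random.Random(seed)
    old_rows = make_rows(rng, n_rows)
    new_rows = list(old_rows)
    # Small edit: replace the values in every `step`-th row only.
    for i in range(0, n_rows, step):
        new_rows[i] = f"sample_{i:04d},{rng.random():.6f},{rng.random():.6f}\n"
    raw_old = "".join(old_rows).encode()
    raw_new = "".join(new_rows).encode()
    # Simulate data files that are stored in compressed form.
    packed_old = zlib.compress(raw_old, 9)
    packed_new = zlib.compress(raw_new, 9)
    return delta_cost(raw_old, raw_new), delta_cost(packed_old, packed_new)


if __name__ == "__main__":
    raw_delta, packed_delta = compare()
    print("delta between raw text versions:       ", raw_delta, "bytes")
    print("delta between compressed file versions:", packed_delta, "bytes")
```

On input like this I would expect the second number to be several times the first, since the two compressed streams share few byte runs even though the underlying tables are almost identical.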

I am probably not the only one who uses multiple experimental data files of
around 100 KB each in unit tests and vignettes. Dozens of such files in a
5 MB package would presumably be acceptable. I believe the same should hold
for the pack file, because it is just a collection of earlier files that are
each still smaller than 5 MB. I would suggest relaxing the file size limit
for it, to allow safer and more reproducible developer practices.

Sincerely,
Adam.



_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
