>> `git gc` does delete the old data (if it's not reachable any more).
> And it is very expensive. My point exactly.

It's fairly expensive indeed, but it's usually not a time-sensitive
operation: it can be delayed to a convenient time, and you can run it
infrequently and as a low-priority background task.

A good reason not to run it frequently is that, thanks to the sharing
("deduplication"), there's usually not that much garbage to collect.
[ IOW, a thousand backups (of the same machine) often don't take up much
more space than a single backup. ]

>> BTW, if you want to (ab)use a Git repository to do backups, you should
>> definitely look at `bup`.
> Thanks, it might be exactly what I am looking for.

Bup uses the same format as Git, but has its own implementation of most
operations, because Git's performance is tuned for a very different
use-case. With Bup it's common to have a repository much larger than
100GB, whereas Git very rarely manages repositories of that size.


        Stefan
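
To illustrate the sharing ("deduplication") point above, here is a minimal sketch of a content-addressed store. It is not Git's or Bup's actual code: the `Store` class, the file names, and the sizes are all made up for illustration. The only part taken from Git is the way a blob's id is computed (SHA-1 over a `blob <length>\0` header followed by the content).

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Object id as Git computes it for a blob: SHA-1 of header + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class Store:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}  # object id -> content

    def put(self, data: bytes) -> str:
        oid = blob_id(data)
        # Identical content maps to the same id, so it is stored only once.
        self.objects.setdefault(oid, data)
        return oid

    def backup(self, files: dict) -> dict:
        """Store every file; return a snapshot mapping path -> object id."""
        return {path: self.put(data) for path, data in files.items()}

    def size(self) -> int:
        return sum(len(data) for data in self.objects.values())


store = Store()

# A made-up "machine" with 50 files that never change.
machine = {f"/etc/file{i}": (b"config %d\n" % i) * 100 for i in range(50)}
store.backup(machine)
size_after_one = store.size()

# 999 further backups; only one small file changes each day.
for day in range(1, 1000):
    machine["/var/log/today"] = b"log entry for day %d\n" % day
    store.backup(machine)
size_after_thousand = store.size()

print(size_after_one, size_after_thousand, len(store.objects))
```

In this toy setup, every backup after the first adds only the one object that changed, so the store after a thousand backups stays well under twice the size of the first one. It also suggests why there is little garbage to collect: almost every object is still referenced by some snapshot.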