You came late to the party, but you're the only one who brought a cheque!
Thanks, it's exactly what I was looking for.

On May 28, 2013 4:22 PM, "Ori Berger" <linux...@orib.net> wrote:
> On 05/08/2013 09:22 PM, Elazar Leibovich wrote:
>
>> Hi,
>>
>> I have a software product being built a few times a day (continuous
>> integration style). The end product is an installable tar.gz with many
>> java jars.
>>
>> Since the content of the tar.gz's is mostly the same, I want to use a
>> filesystem that would dedupe the duplicated content.
>>
>> As I see it, it's a FUSE filesystem that:
>>
>> .
> .snip
> .
>
>> Is there anything like that available?
>> Is there a smarter solution?
>>
> .
>
> Apologies for being late to the party.
>
> The tar.gz makes everything a problem - a zip would work better for what
> you want (because, unlike a .tar.gz, it will not compress across files -
> each one will compress individually).
>
> However, there is an (essentially) ready-made solution which will work
> with .zips, but much much much better with the original folders: bup
>
> https://github.com/bup/bup
>
> As long as you don't care about ownership/permissions/modification-time
> (there's a branch that has those as well, but IIRC it's not in the main
> branch yet), bup:
>
> a) dedups at the sub-file level (that is, if you add/delete/change 1 byte
> in a 100GB file, the additional version will take ~10KB on average). bup
> breaks each file into "easy to find again" sections, and actually stores
> those sections. A change of one byte will likely change just one such
> section, which has an expected size of ~8KB
>
> b) gzips each such section individually (so it won't be much larger than a
> .tar.gz except for pathological cases)
>
> c) is randomly accessible - any version, any time
>
> d) comes with a command line front end, an FTP front end, a FUSE front
> end, and possibly more I forgot.
>
> e) uses git as a storage format. If all else fails, you can poke at the
> internals using git.
>
> f) has a "manual mode" (bup split / bup join), in which you supply your
> own file through stdin, and bup still does its own dedup magic. You'd still
> want to use .tar (best) or .zip (2nd best) rather than .tar.gz, of course.
>
> bup is the best thing for backup since sliced bread. It's also reasonably
> fast, works locally or client/server through ssh, and more. The only thing
> I'm really missing is built-in encryption, and some people who care more
> about perms and ctime/mtime/atime in backups miss those - but otherwise, it
> is teh awesome.
>
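For the archive, a rough sketch of the idea behind point (a), in case it helps to see why a one-byte change costs only about one section. This is not bup's code: it assumes a toy polynomial rolling hash instead of bup's own rolling checksum, ~1 KiB sections instead of ~8KB, and a plain dict instead of git packs. The names `chunks`, `store`, `join` and `repo` are made up for illustration.

```python
import hashlib
import random

WINDOW = 48                # bytes of context that decide a boundary
MASK = (1 << 10) - 1       # cut when the low 10 bits are all ones (~1 KiB chunks)
BASE = 257
MOD = (1 << 31) - 1        # prime modulus for the toy polynomial hash
OUT = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunks(data: bytes):
    """Split data where the hash of (at most) the last WINDOW bytes says so."""
    h = 0
    start = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * OUT) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # take in the newest byte
        if (h & MASK) == MASK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]                           # trailing partial chunk


def store(data: bytes, repo: dict) -> list:
    """Save unseen chunks in repo (hash -> bytes); return the list of hashes."""
    recipe = []
    for chunk in chunks(data):
        key = hashlib.sha1(chunk).hexdigest()
        repo.setdefault(key, chunk)                  # already-known chunks cost nothing
        recipe.append(key)
    return recipe


def join(recipe: list, repo: dict) -> bytes:
    """Rebuild a stored version from its list of chunk hashes."""
    return b"".join(repo[key] for key in recipe)


# Two "builds": the second has a single byte inserted in the middle.
rng = random.Random(0)
v1 = bytes(rng.getrandbits(8) for _ in range(200_000))
v2 = v1[:100_000] + b"!" + v1[100_000:]

repo = {}
r1 = store(v1, repo)
before = len(repo)
r2 = store(v2, repo)
print(len(r1), "chunks for v1;", len(repo) - before, "new chunks for v2")
```

Because a boundary depends only on the bytes just before it, the boundaries after the inserted byte fall in the same places relative to the content, so nearly every chunk of the second version is already in the store. Fixed-size blocks would not survive the shift. With the real tool, the equivalent is the "manual mode" mentioned in (f), bup split and bup join.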
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il