Hi,

I have a software product that is built a few times a day (continuous
integration style). The end product is an installable tar.gz containing
many Java jars.

Since the contents of these tar.gz files are mostly the same from build to
build, I want to use a filesystem that dedupes the duplicated content.

As I see it, it would be a FUSE filesystem that:

1. When a file with a .tar.gz extension is stored, untars it and stores the
contents in a folder (keeping the file order in a list).
2. When the file is read again, tars and gzips the underlying folder and
returns the gzip'd result.
3. Keeps a list of file hashes, and replaces a file with a symlink to an
identical file where possible.
4. Bonus: does the same for jars. Java is linked at runtime, so if a .java
file didn't change, neither does its class.
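
To make steps 1-3 concrete, here is a rough Python sketch of the storage
side only, with no FUSE layer. The function names (store_tar_gz,
rebuild_tar_gz), the manifest format and the choice of SHA-256 are my own
assumptions for illustration, and it uses a directory of blobs named by
hash instead of symlinks. It only handles regular files and drops metadata
(mode, mtime, owner), so the rebuilt archive has the same members in the
same order but is not expected to be byte-identical to the original.

```python
import hashlib
import os
import tarfile


def store_tar_gz(archive_path, store_dir):
    """Unpack a tar.gz into a content-addressed store (hypothetical helper).

    Each regular file is written once to store_dir under its SHA-256 hex
    digest. Returns a manifest: a list of (member name, digest) tuples in
    archive order. Non-regular members (dirs, links) are skipped here.
    """
    os.makedirs(store_dir, exist_ok=True)
    manifest = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            digest = hashlib.sha256(data).hexdigest()
            blob = os.path.join(store_dir, digest)
            # Identical content from any earlier archive is stored only once.
            if not os.path.exists(blob):
                with open(blob, "wb") as f:
                    f.write(data)
            manifest.append((member.name, digest))
    return manifest


def rebuild_tar_gz(manifest, store_dir, out_path):
    """Recreate a tar.gz from a manifest, preserving member order."""
    with tarfile.open(out_path, "w:gz") as tar:
        for name, digest in manifest:
            blob = os.path.join(store_dir, digest)
            info = tarfile.TarInfo(name)
            info.size = os.path.getsize(blob)
            with open(blob, "rb") as f:
                tar.addfile(info, f)
```

Usage would be something like manifest = store_tar_gz("build-1.tar.gz",
"store") on write and rebuild_tar_gz(manifest, "store", "out.tar.gz") on
read; the file names are placeholders. A real filesystem would also need
to persist the manifests and the dropped metadata.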

Is there anything like that available?
Is there a smarter solution?

(It is theoretically possible to save a folder instead of a tar.gz and
dedupe at a higher level, but it's much easier to use a tar.gz, since it
plays well with existing Java software, e.g. Nexus/Artifactory, Maven etc.)
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
