On 2014-10-30 05:26, lu...@plaintext.sk wrote:
Hi,
I want to ask whether deduplicated file content is cached in the Linux kernel 
just once for two deduplicated files.

To explain in depth:
  - I use btrfs for the whole system, with a few subvolumes and compression on 
some of them.
  - I have two directories with the Eclipse SDK that differ slightly (same 
version, different config)
  - I assume that the given directories are deduplicated, so the two Eclipse 
installations take up about as much space on the HDD as one would (as a rough 
estimate)
  - I will start one of the two Eclipse installations
  - the Linux kernel will cache all files opened during Eclipse startup (I have 
enough free RAM)
  - I am just a happy, stupid Linux user:
     1. will the kernel cache file content after decompression? (I think yes)
     2. will the cached data be in the VFS layer or in the block device layer?
  - When I launch the second Eclipse (different from the first, but 
deduplicated against it) after the first one:
     1. will the second start require less data to be read from the HDD?
     2. will the metadata for the second instance be read from the HDD? (I assume yes)
     3. will the actual data be read a second time? (I hope not)

Thanks for answers,
have a nice day,

I don't know for certain, but here is how I understand things work in this case:

1. Individual blocks are cached in the block device layer, which means that de-duplicated data would be cached at most as many times as there are disks holding it (i.e. at most once on a single-device filesystem, up to twice on a multi-device btrfs raid1 setup).
2. In the VFS layer, the cache holds decoded inodes (the actual file metadata), dentries (the file's entry in the parent directory), and individual pages of file content (after decompression). AFAIK, the VFS layer's cache is pathname based, so it would probably hold two copies of the data, but after the metadata look-up the second read wouldn't need to go to the disk, because of the block layer cache.

Overall, this means that while de-duplicated data may be cached more than once, it shouldn't need to be reread from disk if there is still a copy in cache. Metadata may or may not need to be read from the disk, depending on what is in the VFS cache.
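
To make that reasoning concrete, here is a toy model in Python of the two layers as I described them. It is only a sketch of my understanding, not kernel code: the class names, the path-keyed page cache, and the block numbers are all made up for illustration, and the real kernel may well behave differently.

```python
# Toy model of the two cache layers described above. Everything here
# (class names, keys, block numbers) is hypothetical, not kernel code.

class Disk:
    def __init__(self, blocks):
        self.blocks = blocks      # physical block number -> bytes
        self.reads = 0            # number of reads that reached the device

    def read(self, block_no):
        self.reads += 1
        return self.blocks[block_no]


class BlockCache:
    """Caches blocks by physical block number (one copy per block)."""

    def __init__(self, disk):
        self.disk = disk
        self.cached = {}

    def read(self, block_no):
        if block_no not in self.cached:
            self.cached[block_no] = self.disk.read(block_no)
        return self.cached[block_no]


class PageCache:
    """Caches file content per file, keyed by (path, page index)."""

    def __init__(self, block_cache, extent_map):
        self.block_cache = block_cache
        self.extent_map = extent_map   # path -> list of physical block numbers
        self.pages = {}

    def read_file(self, path):
        out = []
        for index, block_no in enumerate(self.extent_map[path]):
            key = (path, index)
            if key not in self.pages:
                # Page miss: fall through to the block layer.
                self.pages[key] = self.block_cache.read(block_no)
            out.append(self.pages[key])
        return b"".join(out)


def demo():
    # Two files share blocks 0 and 1 (the de-duplicated part) and
    # differ only in their last block.
    disk = Disk({0: b"shared-", 1: b"plugin-", 2: b"configA", 3: b"configB"})
    extent_map = {
        "eclipse-a/app": [0, 1, 2],
        "eclipse-b/app": [0, 1, 3],
    }
    cache = PageCache(BlockCache(disk), extent_map)

    cache.read_file("eclipse-a/app")
    reads_first = disk.reads
    cache.read_file("eclipse-b/app")
    reads_second = disk.reads - reads_first
    return reads_first, reads_second, len(cache.pages)
```

In this model, reading the second file only touches the disk for the one block that is not shared, while the upper layer still ends up holding a separate set of pages for each file. That matches the summary above: possibly cached twice, but not reread from disk.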
