On Wed, Jun 28, 2023 at 5:38 AM Jingbo Xu <jeffl...@linux.alibaba.com> wrote: > > Hi all, > > Sorry for the late reply as I was on vacation these days. > > I test the hash bit for all xattrs given by Alex[1], to see if each > xattr could be mapped into one unique bit in the 32-bit bloom filter. > > [1] > https://lore.kernel.org/all/cal7ro1hhyudrox7a-13p7rlbzswhtqwgodozvcykddku_la...@mail.gmail.com/ > > > On 6/21/23 4:32 PM, Jingbo Xu wrote: > > > > 3.2. input of hash function > > ------------------------- > > As previously described, each hash function will map the given data into > > one bit of the bloom filter map. In our use case, xattr name serves as > > the key of hash function. > > > > When .getxattr() gets called, only index (e.g. EROFS_XATTR_INDEX_USER) > > and the remaining name apart from the prefix are handy. To avoid > > constructing the full xattr name, the above index and name are fed into > > the hash function directly in the following way: > > > > ``` > > bit = xxh32(name, strlen(name), index + i); > > ``` > > > > where index serves as part of seed, so that it gets involved in the > > calculation for the hash. > > > All xattrs are hashed with one single hash function. > > I first tested with the following hash function: > > ``` > xxh32(name, strlen(name), index) > ``` > > where `index` represents the index of corresponding predefined name > prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name > after stripping the above predefined name prefix (e.g. > "overlay.metacopy" for "user.overlay.metacopy") > > > The mapping results are: > > bit 0: security.SMACK64EXEC > bit 1: > bit 2: user.overlay.protattr > bit 3: trusted.overlay.impure, user.overlay.opaque, user.mime_type > bit 4: > bit 5: user.overlay.origin > bit 6: user.overlay.metacopy, security.evm > bit 8: trusted.overlay.opaque > bit 9: trusted.overlay.origin > bit 10: trusted.overlay.upper, trusted.overlay.protattr > bit 11: security.apparmor, security.capability > bit 12: security.SMACK64 > bit 13: user.overlay.redirect, security.ima > bit 14: user.overlay.upper > bit 15: trusted.overlay.redirect > bit 16: security.SMACK64IPOUT > bit 17: > bit 18: system.posix_acl_access > bit 19: security.selinux > bit 20: > bit 21: > bit 22: system.posix_acl_default > bit 23: security.SMACK64MMAP > bit 24: user.overlay.impure, user.overlay.nlink, security.SMACK64TRANSMUTE > bit 25: trusted.overlay.metacopy > bit 26: > bit 27: security.SMACK64IPIN > bit 28: > bit 29: > bit 30: trusted.overlay.nlink > bit 31: > > Here 30 xattrs are mapped into 22 bits. There are two potential > conflicts, i.e. bit 10 (trusted.overlay.upper, trusted.overlay.protattr) > and bit 24 (user.overlay.impure, user.overlay.nlink).
Bit 11 (apparmor and capabilities) seems like the most likely thing to run into. I.e. on an apparmor-using system, many files would have apparmor xattr set, so looking up security.capabilities on it would cause a false negative and we'd unnecessarily read the xattrs. > > An alternative way is to calculate the hash from the full xattr name by > > feeding the prefix string and the remaining name string separately in > > the following way: > > > > ``` > > xxh32_reset() > > xxh32_update(prefix string, ...) > > xxh32_update(remaining name, ...) > > xxh32_digest() > > ``` > > > > But I doubt if it really deserves to call multiple APIs instead of one > > single xxh32(). > > > I also tested with the following hash function, where the full name of > the xattr, e.g. "user.overlay.metacopy", is fed into the hash function. > > ``` > xxh32(name, strlen(name), 0) > ``` > > > Following are the mapping results: > > bit 0: trusted.overlay.impure, user.overlay.protattr > bit 1: security.SMACK64IPOUT > bit 2: > bit 3: security.capability > bit 4: security.selinux > bit 5: security.ima > bit 6: user.overlay.metacopy > bit 8: > bit 9: trusted.overlay.redirect, security.SMACK64EXEC > bit 10: system.posix_acl_access > bit 11: trusted.overlay.nlink > bit 12: trusted.overlay.opaque > bit 13: > bit 14: > bit 15: > bit 16: > bit 17: user.overlay.impure > bit 18: security.apparmor > bit 19: > bit 20: user.overlay.origin, user.overlay.nlink, security.SMACK64TRANSMUTE > bit 21: > bit 22: trusted.overlay.metacopy, trusted.overlay.protattr > bit 23: user.overlay.upper, security.evm > bit 24: user.overlay.redirect, security.SMACK64IPIN, > system.posix_acl_default > bit 25: security.SMACK64 > bit 26: > bit 27: trusted.overlay.upper, security.SMACK64MMAP > bit 28: trusted.overlay.origin, user.mime_type > bit 29: > bit 30: > bit 31: user.overlay.opaque > > 30 xattrs are mapped into 20 bits. Similarly there are two potential > conflicts, i.e. bit 20 (user.overlay.origin, user.overlay.nlink) and bit > 22 (trusted.overlay.metacopy, trusted.overlay.protattr). > > > Summary > ======= > > Personally I would prefer the former, as it maps xattrs into the bloom > filter more evenly (22 bits vs 20 bits) and can better cooperate with > the kernel routine (index and the remaining name string, rather than the > full name string, are handy). I agree that we want the approach with better cooperation with the kernel function. However, I would much prefer if all the xattrs that are commonly set on many files are unconflicted. This would be at least: selinux, ima, evm, apparmor. Can't you just add a magic constant to the seed? Then we can come up with one that gives a good spread and hardcode that. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Alexander Larsson Red Hat, Inc al...@redhat.com alexander.lars...@gmail.com