Hi Bruno,

great work!

About the IndexInput (and IndexOutput): maybe have a look at OpenSearch, where I worked together with the OpenSearch people on an IndexInput modeled after MMapIndexInput that also decrypts on the fly. When reading from the index with hot pages it is almost as fast as MMapIndexInput, and it is much faster than the one in your repo! The trick is to implement our own FS cache that keeps a pool of (decrypted) pages off-heap as MemorySegments. The code also has a cache manager that sits on top of a directory and has a fixed size (off-heap, for the MemorySegments mentioned before).

The IndexInput used for reading is basically a clone of MMapIndexInput, but it differs in how it manages its MemorySegments: they are much smaller (not 16 GiB chunks), and there is a hook into the underlying cache to get a decrypted buffer. In addition, all those pages are marked as "sticky" (they will never go to swap), so the decrypted pages stay in memory only and never reach the disk.
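To make the idea concrete, here is a minimal sketch (not the OpenSearch code): a fixed-size LRU cache of decrypted pages held off-heap, plus a reader that maps a file position to (page, offset). The class names, the page size and the `loader` callback standing in for "read + decrypt" are hypothetical. It uses a `LinkedHashMap` in access order instead of Caffeine and direct `ByteBuffer`s instead of `MemorySegment`s, so evicted pages are freed by the GC rather than deterministically through an `Arena`, and nothing here pins pages against swap.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical sketch: fixed-size cache of decrypted pages kept off-heap. */
final class DecryptedPageCache {
    static final int PAGE_SIZE = 8192; // small pages instead of 16 GiB mmap chunks (illustrative)

    /** Cache key: file name plus page index within that file. */
    record PageKey(String file, long pageIndex) {}

    private final BiFunction<String, Long, byte[]> loader; // hypothetical "read + decrypt" hook
    private final LinkedHashMap<PageKey, ByteBuffer> pages;
    private long misses;

    DecryptedPageCache(int maxPages, BiFunction<String, Long, byte[]> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries in LRU order; removeEldestEntry caps the cache size.
        this.pages = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<PageKey, ByteBuffer> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns a read-only view of the decrypted page, loading it on a miss. */
    synchronized ByteBuffer page(String file, long pageIndex) {
        PageKey key = new PageKey(file, pageIndex);
        ByteBuffer buf = pages.get(key); // get() also refreshes the LRU position
        if (buf == null) {
            misses++;
            byte[] plain = loader.apply(file, pageIndex);
            buf = ByteBuffer.allocateDirect(plain.length); // off-heap copy of the plaintext
            buf.put(plain).flip();
            pages.put(key, buf);
        }
        return buf.asReadOnlyBuffer();
    }

    synchronized long misses() { return misses; }

    synchronized int cachedPages() { return pages.size(); }
}

/** Hypothetical IndexInput-like reader: file position -> (page, offset) -> cached page. */
final class CachedPageInput {
    private final DecryptedPageCache cache;
    private final String file;
    private long pos;

    CachedPageInput(DecryptedPageCache cache, String file) {
        this.cache = cache;
        this.file = file;
    }

    void seek(long newPos) { pos = newPos; }

    byte readByte() {
        ByteBuffer page = cache.page(file, pos / DecryptedPageCache.PAGE_SIZE);
        byte b = page.get((int) (pos % DecryptedPageCache.PAGE_SIZE));
        pos++;
        return b;
    }
}
```

A real IndexInput would presumably keep a reference to the current page and only go back to the cache when a read crosses a page boundary, which is what keeps the hot path close to MMapIndexInput.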

The underlying IO layer uses DirectIO so that the data does not pollute two caches (the OS page cache and our own). It reads with DirectIO, decrypts, and stores the pages in its own fixed-size off-heap cache (one global cache for the whole OpenSearch node).
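A minimal sketch of that read path, under stated assumptions: the cipher is assumed to be AES-CTR (chosen here because the counter can be derived from the file offset, so every page decrypts independently; this is not confirmed to be what the plugin uses), and the class and method names are hypothetical. It opens the file with the JDK's `com.sun.nio.file.ExtendedOpenOption.DIRECT` and falls back to buffered reads when the filesystem rejects O_DIRECT.

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: read one encrypted page, bypassing the OS cache, and decrypt it. */
final class DirectPageReader implements AutoCloseable {
    static final int PAGE_SIZE = 8192; // must be a multiple of the AES block size (16)

    private final FileChannel channel;
    private final boolean direct;
    private final int alignment;

    DirectPageReader(Path path) throws IOException {
        FileChannel c = null;
        int align = 1;
        try {
            long bs = Files.getFileStore(path).getBlockSize();
            if (bs > 0 && PAGE_SIZE % bs == 0) {
                // O_DIRECT: the kernel page cache is bypassed, so only our own cache holds the page
                c = FileChannel.open(path, StandardOpenOption.READ, ExtendedOpenOption.DIRECT);
                align = (int) bs;
            }
        } catch (IOException | UnsupportedOperationException e) {
            // e.g. a filesystem that rejects O_DIRECT: fall back to buffered reads below
        }
        this.direct = c != null;
        this.channel = direct ? c : FileChannel.open(path, StandardOpenOption.READ);
        this.alignment = align;
    }

    boolean isDirect() { return direct; }

    /** Reads page pageIndex and decrypts it with AES-CTR (counter derived from the file offset). */
    byte[] readPage(long pageIndex, byte[] key, byte[] baseIv) throws IOException, GeneralSecurityException {
        // DirectIO needs an aligned off-heap buffer, an aligned position and an aligned length.
        ByteBuffer buf = ByteBuffer.allocateDirect(PAGE_SIZE + alignment).alignedSlice(alignment);
        buf.limit(PAGE_SIZE);
        long pos = pageIndex * PAGE_SIZE;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos + buf.position());
            // With O_DIRECT a short read means EOF; a retry at an unaligned offset would be rejected.
            if (n <= 0 || direct) break;
        }
        buf.flip();
        byte[] cipherText = new byte[buf.remaining()];
        buf.get(cipherText);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(counterFor(baseIv, pos)));
        return cipher.doFinal(cipherText);
    }

    /** CTR counter for a byte offset: baseIv + offset/16, as a 128-bit big-endian integer. */
    static byte[] counterFor(byte[] baseIv, long offset) {
        BigInteger ctr = new BigInteger(1, baseIv).add(BigInteger.valueOf(offset / 16));
        byte[] raw = ctr.toByteArray();
        byte[] iv = new byte[16];
        int n = Math.min(raw.length, 16); // keep the low 16 bytes (wraps modulo 2^128)
        System.arraycopy(raw, raw.length - n, iv, 16 - n, n);
        return iv;
    }

    @Override
    public void close() throws IOException { channel.close(); }
}
```

The decrypted bytes returned by `readPage` are what would be handed to the off-heap page cache; allocating a fresh aligned buffer per read keeps the sketch short, while a real implementation would reuse them from a pool.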

In addition, the IndexOutput may also write its pages to the cache, like a real filesystem does (I am not sure whether it does so yet).

https://github.com/opensearch-project/opensearch-storage-encryption/tree/main/src/main/java/org/opensearch/index/store

 * Block cache, backed by Caffeine (basically an FS cache
   implementation):
   https://github.com/opensearch-project/opensearch-storage-encryption/tree/main/src/main/java/org/opensearch/index/store/block_cache
 * The main buffer pool directory:
   https://github.com/opensearch-project/opensearch-storage-encryption/blob/8e9cade44c38b9906af379ccb9aa5099aa1763d7/src/main/java/org/opensearch/index/store/CryptoDirectoryFactory.java#L430-L512
 * The BufferPoolDirectory using the block_cache and its buffer pool
   (preinitialized)

The initial issue where we mocked everything up: https://github.com/opensearch-project/opensearch-storage-encryption/issues/21

Uwe

On 06.08.2025 at 10:14, Bruno Roustant wrote:
Hi, I think the encryption module [1] in solr-sandbox is ready for a SIP
discussion.

I created SIP-25 [2] in the wiki, which contains attachments with the
architecture description and some diagrams. (Interestingly, I created them
by driving a generative AI on the encryption module code).

I think the encryption module now supports everything that needs to be
encrypted: index, transaction logs, replication, backups. It requires Solr
9.9.0. There are many tests, but the final test plan is still to be
discussed. FYI, it is currently used in production at my company.

In this module, the focus is on seamless encryption and easy key
rotation, which can be done without service interruption (queries and
indexing continue to be served in parallel). It has an impact on query
performance, so the architecture description has a section that explains
the use case: when to use this Java-level encryption rather than faster
OS-level encryption.

[1] https://github.com/apache/solr-sandbox/tree/main/encryption
[2] https://cwiki.apache.org/confluence/display/SOLR/SIP-25%3A+Encryption+Module

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
