Hi Bruno,
great work!
About the IndexInput (and IndexOutput): maybe have a look at OpenSearch,
where I worked together with the OpenSearch people on an IndexInput that
is modeled on the mmap one and also decrypts on the fly. When reading
from the index while the pages are hot it is almost as fast as
MMapIndexInput, and otherwise it is much faster than the one in your
repo! The trick is to implement our own FS cache that keeps a pool of
(decrypted) pages off-heap as MemorySegments. The code also has a cache
manager which sits on top of a directory and has a fixed size (off-heap,
for the MemorySegments mentioned before). The IndexInput used for reading
is basically a clone of MMapIndexInput, but differs in how it manages its
MemorySegments: they are much smaller (not 16 GiB) and there is a hook
into the underlying cache to get a decrypted buffer. In addition, all
those pages are marked as "sticky" (they will never go to swap), so the
decrypted pages are only in memory and won't go to disk.
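To make the cache idea concrete, here is a minimal sketch. It is not the
OpenSearch code: DecryptedPageCache and PageLoader are made-up names, a
LinkedHashMap-based LRU stands in for Caffeine, direct ByteBuffers stand
in for MemorySegments so it runs on older JDKs, the XOR in main is only a
placeholder for real decryption, and the "sticky" page locking is left
out entirely.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: fixed-size LRU cache of decrypted pages kept in off-heap buffers. */
final class DecryptedPageCache {
    /** Hypothetical hook: reads one encrypted page and decrypts it into dst (relative puts). */
    interface PageLoader {
        void loadDecrypted(String file, long pageIndex, ByteBuffer dst);
    }

    record PageKey(String file, long pageIndex) {}

    private final int pageSize;
    private final PageLoader loader;
    private final Map<PageKey, ByteBuffer> pages;
    private long misses;

    DecryptedPageCache(int pageSize, int maxPages, PageLoader loader) {
        this.pageSize = pageSize;
        this.loader = loader;
        // An access-ordered LinkedHashMap evicts the least recently used page.
        this.pages = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<PageKey, ByteBuffer> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns a read-only view of the decrypted page, loading it on a miss. */
    synchronized ByteBuffer page(String file, long pageIndex) {
        PageKey key = new PageKey(file, pageIndex);
        ByteBuffer buf = pages.get(key);
        if (buf == null) {
            misses++;
            buf = ByteBuffer.allocateDirect(pageSize); // off-heap
            loader.loadDecrypted(file, pageIndex, buf);
            buf.flip();
            pages.put(key, buf);
        }
        return buf.asReadOnlyBuffer();
    }

    /** Reads one byte at an absolute file position, as an IndexInput-like reader would. */
    byte readByte(String file, long pos) {
        return page(file, pos / pageSize).get((int) (pos % pageSize));
    }

    synchronized long misses() { return misses; }

    synchronized int cachedPages() { return pages.size(); }

    public static void main(String[] args) {
        byte[] encrypted = new byte[64];
        for (int i = 0; i < encrypted.length; i++) {
            encrypted[i] = (byte) (i ^ 0x5A); // placeholder "encryption"
        }
        DecryptedPageCache cache = new DecryptedPageCache(16, 2, (file, idx, dst) -> {
            int start = (int) idx * 16;
            for (int i = 0; i < 16; i++) {
                dst.put((byte) (encrypted[start + i] ^ 0x5A)); // placeholder "decryption"
            }
        });
        System.out.println(cache.readByte("segment", 5) + " misses=" + cache.misses());
    }
}
```

A reader built on this only ever sees decrypted pages through page(), so
the page size and the total off-heap budget are both controlled by the
cache rather than by the reader.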
The underlying IO layer uses DirectIO so that the same data does not
end up in two caches (the OS page cache and ours). It reads with
DirectIO, decrypts, and stores the pages in its own off-heap cache of
fixed size (global for the whole OpenSearch node).
In addition, the IndexOutput may (I am not sure if it does) write its
pages to the cache, like a real filesystem would.
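A rough sketch of what such a DirectIO read can look like on the JDK,
using ExtendedOpenOption.DIRECT with a block-aligned buffer. This is an
illustration under assumptions, not the OpenSearch code:
DirectBlockReader and readBlock are made-up names, the fallback to a
normal read on file systems without direct I/O is my addition, and the
decryption step that would follow the read is left out.

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch only: reads one file-system block, bypassing the OS page cache where supported. */
final class DirectBlockReader {

    static byte[] readBlock(Path file, long blockIndex) throws IOException {
        // Direct I/O needs position, length and buffer address aligned to the block size.
        int blockSize = Math.toIntExact(Files.getFileStore(file).getBlockSize());
        try (FileChannel channel = open(file)) {
            // Over-allocate, then take a slice that starts on a block boundary.
            ByteBuffer buf = ByteBuffer.allocateDirect(2 * blockSize).alignedSlice(blockSize);
            buf.limit(blockSize);
            int n = channel.read(buf, blockIndex * blockSize);
            byte[] out = new byte[Math.max(n, 0)]; // n is -1 at end of file
            buf.flip();
            buf.get(out);
            return out; // still encrypted; decrypt before caching
        }
    }

    private static FileChannel open(Path file) throws IOException {
        try {
            return FileChannel.open(file, StandardOpenOption.READ, ExtendedOpenOption.DIRECT);
        } catch (IOException | UnsupportedOperationException e) {
            // Assumption for this sketch: fall back to a buffered read if direct I/O is unavailable.
            return FileChannel.open(file, StandardOpenOption.READ);
        }
    }
}
```

The block returned here would then be decrypted and handed to the
off-heap cache, so the plaintext never sits in the OS page cache.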
https://github.com/opensearch-project/opensearch-storage-encryption/tree/main/src/main/java/org/opensearch/index/store
* Block cache, backed by Caffeine, basically it's a FS cache
implementation:
https://github.com/opensearch-project/opensearch-storage-encryption/tree/main/src/main/java/org/opensearch/index/store/block_cache
* The main buffer pool directory:
https://github.com/opensearch-project/opensearch-storage-encryption/blob/8e9cade44c38b9906af379ccb9aa5099aa1763d7/src/main/java/org/opensearch/index/store/CryptoDirectoryFactory.java#L430-L512
* The BufferPoolDirectory using the block_cache and its buffer pool
(preinitialized)
The initial issue where we mocked everything up:
https://github.com/opensearch-project/opensearch-storage-encryption/issues/21
Uwe
On 06.08.2025 at 10:14, Bruno Roustant wrote:
Hi, I think the encryption module [1] in solr-sandbox is ready for a SIP
discussion.
I created SIP-25 [2] in the wiki, which contains attachments with the
architecture description and some diagrams. (Interestingly, I created them
by driving a generative AI on the encryption module code).
I think the encryption module now supports everything that needs to be
encrypted: index, transaction logs, replication, backups. It requires Solr
9.9.0. There are many tests, but the final test plan is still to be
discussed. FYI, it is currently used in production at my company.
In this module, the focus is on seamless encryption and ease of key
rotation, which can be done without service interruption (while serving
queries and indexing in parallel). It has an impact on query performance,
so there is a section in the architecture description that explains the
use case and when to use this Java-level encryption rather than a faster
OS-level encryption.
[1] https://github.com/apache/solr-sandbox/tree/main/encryption
[2] https://cwiki.apache.org/confluence/display/SOLR/SIP-25%3A+Encryption+Module
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:[email protected]