I would like to store Lucene indexes in an encrypted format. The only security requirement is that if an intruder copies files from the file system, no file will contain raw (unencrypted) data. It is acceptable for raw data to be visible in raw disk scans. All I want to do is encrypt the readable index files.

Here is one way to encrypt Lucene indexes: encrypt the entire file on disk and store the decrypted version in memory. This is OK with a RAMDirectory, but does not scale.

Using a little-known feature of POSIX, it is possible to create a memory-mapped file with a raw copy of the data which cannot be found from the file system. The POSIX feature is that when you open a file and then delete it, the file's data still exists on disk but the file is no longer visible by name through the file system. The data exists as an invisible file, and the file is actually removed when you close the file descriptor. (This does not work on Windows.) Let's call this a 'ghost file'.

If memory-mapping works with ghost files, this seems like it should work: a new Directory class will create a file and immediately delete it, then memory-map it. The memory-mapped file will stay allocated inside the JVM until the JVM closes the associated Directory object. The Directory class would create an entire 'ghost Lucene index'.

This sequence opens an index:

* open encrypted segment file in memory-mapped format
* create ghost memory-mapped file
* decrypt from encrypted memory into ghost file memory
* close the encrypted index file

Directory.close() wipes the ghost file data, closes the ghost file, and the file system reclaims the disk space.

This sequence creates an index: Directory.createOutput makes a ghost file and a real file. All data is saved to the ghost file. Close on the file encrypts the ghost file data into the real file, and wipes the ghost data. Both files are then closed.

One glaring flaw is: what if close() is not called? The raw data will still exist in the free disk space. There are two cases where this would happen:

1) The user fails to call close() but the program finishes normally. This can be countered by adding a finalize() method that makes sure to clear the memory.
2) The JVM fails and shutdown code is not run. The freed ghost data is on the hard disk in the free disk space.
In the second case, the freed data can only be found by scanning the raw disks. One counter to this is to run the app in a virtual machine which does not have access to the raw disk drivers.

Is this a workable design? Are there any quirks of the Directory abstraction that make this impossible or pointless? Or quirks in memory-mapped files or how the JVM implements them?

Thanks for your time,

Lance Norskog

--
Lance Norskog
goks...@gmail.com
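
P.S. To make the question concrete, here is a minimal Java sketch of the ghost file idea only: create a file, delete it, memory-map it, and wipe it on close. It assumes a POSIX file system, and the class name GhostFileSketch and its methods are hypothetical names for illustration. It is not a Lucene Directory implementation and does no encryption.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of a 'ghost file': an unlinked, memory-mapped temp file.
final class GhostFileSketch implements AutoCloseable {
    private final Path path;              // kept only to show the name is gone
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    GhostFileSketch(int size) throws IOException {
        path = Files.createTempFile("ghost", ".bin");
        channel = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Assumption: POSIX semantics. The name is removed, but the data stays
        // reachable through the open channel.
        Files.delete(path);
        // Mapping READ_WRITE past the end of the file extends it to 'size' bytes.
        buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
    }

    // True if the file can still be found by name in the file system.
    boolean visibleInFileSystem() {
        return Files.exists(path);
    }

    // Copies raw (decrypted) bytes into the start of the ghost mapping.
    void write(byte[] data) {
        buffer.position(0);
        buffer.put(data);
    }

    // Reads 'length' bytes back from the start of the ghost mapping.
    byte[] read(int length) {
        byte[] out = new byte[length];
        buffer.position(0);
        buffer.get(out);
        return out;
    }

    // Overwrites the whole mapping with zeros and flushes it to the file.
    void wipe() {
        buffer.clear(); // position = 0, limit = capacity; contents untouched
        while (buffer.hasRemaining()) {
            buffer.put((byte) 0);
        }
        buffer.force();
    }

    @Override
    public void close() throws IOException {
        wipe();
        channel.close();
    }
}
```

One thing this makes visible: MappedByteBuffer has no public unmap call, so the mapping stays valid after channel.close() until the buffer is garbage-collected, and the disk space may not be reclaimed until then.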