Hi Lance,

I did a proof of concept in which I stored the main document, encrypted, in a MongoDB database, while the index contained only unstored (indexed but not stored) versions of the fields. I built a Solr search component, configured as a last-component, that took the docIds from the search results, queried MongoDB to build the documents, and rendered them. I know of at least one installation that does something very similar.
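The following is a minimal, self-contained sketch of that pattern. It is not the proof-of-concept code itself. A HashMap stands in for the MongoDB collection, AES-GCM is an assumed cipher choice, and the class and method names (EncryptedDocStore, put, get, render) are hypothetical. In a real deployment, render() would be the body of a Solr SearchComponent registered under last-components that reads the docIds from the response. That wiring is left out here.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EncryptedDocStore {
    private static final int IV_LEN = 12;     // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    private final SecretKey key;
    // Hypothetical stand-in for the MongoDB collection: docId -> (IV || ciphertext).
    private final Map<String, byte[]> collection = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    public EncryptedDocStore(SecretKey key) {
        this.key = key;
    }

    // Store the full document encrypted; the index would hold only unstored fields.
    public void put(String docId, String body) throws Exception {
        byte[] iv = new byte[IV_LEN];
        random.nextBytes(iv);  // fresh nonce per record; never reuse one with the same key
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(body.getBytes(StandardCharsets.UTF_8));
        byte[] record = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, record, 0, IV_LEN);
        System.arraycopy(ct, 0, record, IV_LEN, ct.length);
        collection.put(docId, record);
    }

    // Fetch and decrypt one document; returns null if the id is unknown.
    public String get(String docId) throws Exception {
        byte[] record = collection.get(docId);
        if (record == null) {
            return null;
        }
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, record, 0, IV_LEN));
        byte[] plain = c.doFinal(record, IV_LEN, record.length - IV_LEN);
        return new String(plain, StandardCharsets.UTF_8);
    }

    // The last-component's job: turn the matching docIds, in rank order, into documents.
    public List<String> render(List<String> docIds) throws Exception {
        List<String> docs = new ArrayList<>();
        for (String id : docIds) {
            String doc = get(id);
            if (doc != null) {
                docs.add(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        EncryptedDocStore store = new EncryptedDocStore(kg.generateKey());
        store.put("doc1", "first document");
        store.put("doc2", "second document");
        // Pretend the Solr query matched doc2 first, then doc1.
        System.out.println(store.render(Arrays.asList("doc2", "doc1")));
    }
}
```

One caveat applies to this approach. Unstored fields are still indexed, so their terms remain readable in the index files. Only the stored documents are protected.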
-sujit

On May 26, 2013, at 5:59 PM, Lance Norskog wrote:

> I would like to store Lucene indexes in an encrypted format. The only
> security requirement is that if an intruder copies files from the file
> system, no file will have raw data. It is acceptable for raw data to be
> visible in raw disk scans. All I want to do is encrypt the readable index
> files.
>
> Here is one way to encrypt Lucene indexes: encrypt the entire file on disk
> and store the decrypted version in memory. This is OK with a RAMDirectory,
> but does not scale. Using a little-known feature of POSIX, it is possible to
> create a memory-mapped file with a raw copy of the data which cannot be found
> from the file system. The POSIX feature is that when you open a file and then
> delete it, the file still exists in the file system but can no longer be
> reached through any path. The data exists as an invisible file in the file
> system, and the file is deleted when you close the file descriptor. (This
> does not work on Windows.) Let's call this a 'ghost file'.
>
> If memory-mapping works with ghost files, this seems like it should work: a
> new Directory class will create a file and immediately delete it, then
> memory-map it. The memory-mapped file will stay allocated inside the JVM
> until the JVM closes the associated Directory object. The Directory class
> would create an entire 'ghost Lucene index'.
>
> This sequence opens an index:
> * open encrypted segment file in memory-mapped format
> * create ghost memory-mapped file
> * decrypt from encrypted memory into ghost file memory
> * close the encrypted index file
> Directory.close() wipes the ghost file data, closes the ghost file, and the
> file system reclaims the disk space.
>
> This sequence creates an index:
> Directory.createOutput makes a ghost file and a real file.
> All data is saved to the ghost file.
> Close on the file encrypts the ghost file data into the real file, and wipes
> the ghost data.
> Both files are then closed.
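The ghost-file idea and the two sequences above can be sketched with plain java.nio, outside Lucene. This is a minimal sketch under several assumptions. The GhostFile class and its openDecrypted and encryptTo methods are hypothetical names and are not Lucene's Directory, IndexInput, or IndexOutput API. AES/CTR, which preserves length, and a file layout of IV followed by ciphertext are assumed choices. The sketch also relies on POSIX unlink-while-open semantics, so it is not expected to work on Windows.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;

public class GhostFile implements AutoCloseable {
    private static final int IV_LEN = 16;  // AES/CTR takes a 16-byte IV
    private final FileChannel channel;
    final MappedByteBuffer buffer;

    // Create a file, keep it open, unlink it, then memory-map it (POSIX only).
    // Nothing sensitive is written until after the unlink.
    GhostFile(Path dir, int size) throws IOException {
        Path p = Files.createTempFile(dir, "ghost", ".tmp");
        channel = FileChannel.open(p, StandardOpenOption.READ, StandardOpenOption.WRITE);
        Files.delete(p);  // the name is gone; the open channel keeps the data alive
        buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);  // grows the file to size
    }

    // "Open" sequence: map the encrypted file, create a ghost, decrypt into it,
    // then close the encrypted file (via try-with-resources).
    static GhostFile openDecrypted(Path encrypted, SecretKey key, Path ghostDir) throws Exception {
        try (FileChannel enc = FileChannel.open(encrypted, StandardOpenOption.READ)) {
            MappedByteBuffer src = enc.map(FileChannel.MapMode.READ_ONLY, 0, enc.size());
            byte[] iv = new byte[IV_LEN];
            src.get(iv);  // assumed layout: IV followed by ciphertext
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
            GhostFile ghost = new GhostFile(ghostDir, src.remaining());
            c.doFinal(src, ghost.buffer);  // CTR: plaintext length == ciphertext length
            ghost.buffer.flip();           // expose exactly the decrypted bytes
            return ghost;
        }
    }

    // "Create" sequence, on close of an output: encrypt the first `length` ghost
    // bytes into the real file.
    void encryptTo(Path real, SecretKey key, int length) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        ByteBuffer plain = buffer.duplicate();  // leave the ghost's own position alone
        plain.position(0);
        plain.limit(length);
        ByteBuffer out = ByteBuffer.allocate(IV_LEN + length);
        out.put(iv);
        c.doFinal(plain, out);
        out.flip();
        try (FileChannel ch = FileChannel.open(real, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (out.hasRemaining()) {
                ch.write(out);
            }
        }
    }

    // Wipe the plaintext, push the zeros to the (unlinked) file, and close the channel.
    @Override
    public void close() throws IOException {
        buffer.clear();
        while (buffer.hasRemaining()) {
            buffer.put((byte) 0);
        }
        buffer.force();
        channel.close();
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        Path dir = Files.createTempDirectory("idx");
        Path real = dir.resolve("segment.enc");  // hypothetical file name
        byte[] data = "segment bytes".getBytes(StandardCharsets.UTF_8);
        try (GhostFile out = new GhostFile(dir, data.length)) {
            out.buffer.put(data);
            out.encryptTo(real, key, data.length);
        }
        try (GhostFile in = openDecrypted(real, key, dir)) {
            byte[] back = new byte[in.buffer.remaining()];
            in.buffer.get(back);
            System.out.println(new String(back, StandardCharsets.UTF_8));
        }
    }
}
```

One JVM quirk bears on the last question. Closing a FileChannel does not unmap a MappedByteBuffer created from it. The mapping stays valid until the buffer itself is garbage-collected, and the JDK has no public unmap call. As a result, the unlinked file's disk space is reclaimed when the buffer is collected, not at Directory.close(). Lucene's MMapDirectory works around this with a non-public unmap hack.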
>
> One glaring flaw is: what if close() is not called? The raw data will still
> exist in the free disk space.
> There are two cases where this would happen:
> 1) the user fails to call close() but the program finishes normally. This can
> be countered by adding a finalize() method that makes sure to clear the
> memory.
> 2) the JVM fails and shutdown code is not run. The freed ghost data is on the
> hard disk in the free disk space. It can only be found by scanning the raw
> disks. One counter to this is to run the app in a virtual machine which does
> not have access to the raw disk drivers.
>
> Is this a workable design? Are there any quirks of the Directory abstraction
> that make this impossible or pointless? Or quirks in memory-mapped files or
> how the JVM implements them?
>
> Thanks for your time,
>
> Lance Norskog
>
> --
> Lance Norskog
> goks...@gmail.com
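Case 1 has a caveat. Object.finalize() is deprecated as of Java 9, and the JVM does not guarantee that finalizers run before it exits. A finalize() method may therefore never fire for a program that simply finishes. The sketch below shows an alternative, assuming Java 9+. GhostWiper, register, wipe and wipeAll are hypothetical names. It pairs a shutdown hook, which runs on normal exit and on SIGINT/SIGTERM, with java.lang.ref.Cleaner as a fallback for owners that become unreachable without close() being called. Neither mechanism helps case 2, because SIGKILL or a JVM crash skips both.

```java
import java.lang.ref.Cleaner;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class GhostWiper {
    // Plaintext buffers not yet wiped. Identity-based, because ByteBuffer.equals and
    // hashCode depend on contents, which change as the buffer is written.
    private static final Set<ByteBuffer> LIVE = Collections.synchronizedSet(
            Collections.newSetFromMap(new IdentityHashMap<ByteBuffer, Boolean>()));
    private static final Cleaner CLEANER = Cleaner.create();

    static {
        // Runs on normal exit and on SIGINT/SIGTERM; not on SIGKILL or a JVM crash.
        Runtime.getRuntime().addShutdownHook(new Thread(GhostWiper::wipeAll));
    }

    // Overwrite every byte with zero without moving the owner's position or limit.
    static void wipe(ByteBuffer b) {
        ByteBuffer d = b.duplicate();
        d.clear();
        while (d.hasRemaining()) {
            d.put((byte) 0);
        }
        LIVE.remove(b);
    }

    // Wipe everything still registered (called from the shutdown hook).
    static void wipeAll() {
        List<ByteBuffer> snapshot;
        synchronized (LIVE) {
            snapshot = new ArrayList<>(LIVE);
        }
        for (ByteBuffer b : snapshot) {
            wipe(b);
        }
    }

    // Track a plaintext buffer owned by `owner`. Calling clean() on the result is the
    // explicit close(); if it is never called, the Cleaner wipes the buffer after
    // `owner` becomes unreachable. The action captures only `b`, never `owner`.
    static Cleaner.Cleanable register(Object owner, ByteBuffer b) {
        LIVE.add(b);
        return CLEANER.register(owner, () -> wipe(b));
    }

    public static void main(String[] args) {
        ByteBuffer plain = ByteBuffer.allocate(16);
        plain.put("secret".getBytes(StandardCharsets.UTF_8));
        Object owner = new Object();  // stands in for a hypothetical ghost-file object
        Cleaner.Cleanable handle = register(owner, plain);
        handle.clean();               // what an explicit close() would call
        System.out.println(plain.get(0));
    }
}
```

In the ghost-file design, each ghost file would call register(this, buffer) when created and handle.clean() from its close(). Because wipe() is idempotent, it does no harm if the shutdown hook later visits an already-wiped buffer.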