Hi Lance,

I did a proof of concept where I stored the main document encrypted in a 
MongoDB database, while the index held only unstored versions of the fields. 
I built a Solr component, set up in last-components, that took the docIds 
from the search results and queried MongoDB to build the documents and render 
them. I know of at least one installation that does something very similar.

-sujit

On May 26, 2013, at 5:59 PM, Lance Norskog wrote:

> I would like to store Lucene indexes in an encrypted format. The only 
> security requirement is that if an intruder copies files from the file 
> system, no file will have raw data. It is acceptable for raw data to be 
> visible in raw disk scans. All I want to do is encrypt the readable index 
> files. 
> 
> Here is one way to encrypt Lucene indexes: encrypt the entire file on disk 
> and keep the decrypted version in memory. This is OK with a RAMDirectory, 
> but does not scale. Using a little-known feature of POSIX, it is possible to 
> create a memory-mapped file with a raw copy of the data which cannot be found 
> through the file system. The POSIX feature is that when you open a file and 
> then delete it, the data still exists on disk but the file no longer has a 
> name in any directory. It survives as an invisible file, and the file system 
> reclaims it once the last open descriptor (or mapping) is released. (This 
> does not work on Windows.) Let's call this a 'ghost file'. 
> 
> If memory-mapping works with ghost files, this seems like it should work: a 
> new Directory class will create a file and immediately delete it, then 
> memory-map it. The memory-mapped file will stay allocated inside the JVM 
> until the JVM closes the associated Directory object. The Directory class 
> would create an entire 'ghost Lucene index'.
> 
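A minimal Java sketch of the 'ghost file' idea described above, assuming a 
POSIX file system (the delete-while-open step is expected to fail on Windows). 
The names GhostMapDemo, Ghost and createGhost are made up for illustration and 
are not Lucene APIs; a real Directory implementation would need far more than 
this.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Lucene.
final class GhostMapDemo {

    /** A mapping plus the path the file had before it was unlinked. */
    static final class Ghost {
        final MappedByteBuffer buffer;
        final Path formerPath;

        Ghost(MappedByteBuffer buffer, Path formerPath) {
            this.buffer = buffer;
            this.formerPath = formerPath;
        }
    }

    /** Creates a temp file, unlinks it while it is open, and maps it. */
    static Ghost createGhost(int size) throws IOException {
        Path path = Files.createTempFile("ghost", ".bin");
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // POSIX only: the name disappears, the open file stays usable.
            Files.delete(path);
            // Mapping READ_WRITE past the current end grows the file to 'size'.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            // The mapping stays valid after the channel is closed.
            return new Ghost(buf, path);
        }
    }

    public static void main(String[] args) throws IOException {
        Ghost ghost = createGhost(4096);
        byte[] secret = "raw index data".getBytes(StandardCharsets.UTF_8);
        ghost.buffer.put(secret);

        byte[] back = new byte[secret.length];
        ghost.buffer.flip();
        ghost.buffer.get(back);

        System.out.println("read back: " + new String(back, StandardCharsets.UTF_8));
        System.out.println("visible in file system: " + Files.exists(ghost.formerPath));
    }
}
```

One JVM quirk that bears on the design: Java has no public call to unmap a 
MappedByteBuffer. The mapping (and therefore the ghost file's disk space) 
stays allocated until the buffer object is garbage-collected, so closing a 
Directory cannot by itself guarantee that the space is released at that 
moment.
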
> This sequence opens an index:
> * open encrypted segment file in memory-mapped format
> * create ghost memory-mapped file
> * decrypt from encrypted memory into ghost file memory
> * close the encrypted index file
> Directory.close() wipes the ghost file data, closes the ghost file, and the 
> file system reclaims the disk space.
> 
> This sequence creates an index:
> * Directory.createOutput makes a ghost file and a real file
> * all data is saved to the ghost file
> * close on the file encrypts the ghost file data into the real file, and 
>   wipes the ghost data
> * both files are then closed
> 
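The two sequences above (create, then open) can be sketched roughly as 
follows. This is only an illustration under several assumptions that are not 
in the original message: AES-GCM from the standard javax.crypto API as the 
cipher, a made-up file layout of a 12-byte IV followed by the cipher output, 
and hypothetical names (EncryptedGhostDemo, newGhost, closeOutput, openInput, 
wipe). It also assumes a POSIX file system. Note that the JDK cipher may copy 
data through heap arrays internally, which the stated requirement (no raw 
data in visible files) permits.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical demo class; the file layout (12-byte IV, then AES-GCM output)
// is an assumption made for this sketch.
final class EncryptedGhostDemo {
    private static final int IV_LEN = 12;
    private static final int TAG_BITS = 128;

    /** Maps a temp file that was unlinked while open (POSIX only). */
    static MappedByteBuffer newGhost(int size) throws IOException {
        Path p = Files.createTempFile("ghost", ".bin");
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            Files.delete(p);
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    /** Overwrites the whole ghost mapping with zeros and flushes it. */
    static void wipe(MappedByteBuffer ghost) {
        for (int i = 0; i < ghost.capacity(); i++) {
            ghost.put(i, (byte) 0);
        }
        ghost.force();
    }

    /** Creation sequence: encrypt ghost[0, length) into realFile, then wipe. */
    static void closeOutput(MappedByteBuffer ghost, int length, Path realFile,
                            SecretKey key) throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        ByteBuffer plain = ghost.duplicate();
        plain.clear();
        plain.limit(length);
        ByteBuffer out = ByteBuffer.allocate(IV_LEN + c.getOutputSize(length));
        out.put(iv);
        c.doFinal(plain, out);
        out.flip();
        try (FileChannel ch = FileChannel.open(realFile, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (out.hasRemaining()) {
                ch.write(out);
            }
        }
        wipe(ghost);
    }

    /** Opening sequence: map the encrypted file, decrypt into a new ghost. */
    static MappedByteBuffer openInput(Path realFile, SecretKey key)
            throws IOException, GeneralSecurityException {
        try (FileChannel ch = FileChannel.open(realFile, StandardOpenOption.READ)) {
            MappedByteBuffer enc = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] iv = new byte[IV_LEN];
            enc.get(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            MappedByteBuffer ghost = newGhost(c.getOutputSize(enc.remaining()));
            c.doFinal(enc, ghost);
            ghost.flip();  // limit = number of plaintext bytes
            return ghost;
        }
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] data = "raw index data".getBytes(StandardCharsets.UTF_8);
        Path real = Files.createTempFile("segment", ".enc");
        MappedByteBuffer out = newGhost(4096);
        out.put(data);
        closeOutput(out, data.length, real, key);
        MappedByteBuffer in = openInput(real, key);
        byte[] back = new byte[in.remaining()];
        in.get(back);
        System.out.println(new String(back, StandardCharsets.UTF_8));
        wipe(in);
        Files.delete(real);
    }
}
```

The sketch leaves key management out entirely, and it decrypts a whole file 
in one call, so it says nothing about how this would behave with large 
segment files.
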
> One glaring flaw is: what if close() is not called? The raw data will still 
> exist in the free disk space.
> There are two cases where this would happen:
> 1) the user fails to call close() but the program finishes normally. This can 
> be countered by adding a finalize() method that makes sure to clear the 
> memory.
> 2) the JVM crashes and shutdown code is not run. The freed ghost data remains 
> on the hard disk in the free disk space, where it can only be found by 
> scanning the raw disk. One counter to this is to run the app in a virtual 
> machine which does not have access to the raw disk devices. 
> 
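For case 1, here is a small sketch of one alternative to finalize(): an 
AutoCloseable wrapper that wipes on close() and also registers a JVM shutdown 
hook as a fallback. This swaps in a different technique from the one proposed 
above (finalize() has been deprecated since Java 9). WipingBuffer is a 
hypothetical name, and a plain ByteBuffer stands in for the ghost mapping.

```java
import java.nio.ByteBuffer;

// Hypothetical wrapper: wipes its buffer on close(), or at normal JVM exit
// if close() was never called.
final class WipingBuffer implements AutoCloseable {
    private final ByteBuffer buffer;
    private final Thread hook;
    private boolean wiped;

    WipingBuffer(ByteBuffer buffer) {
        this.buffer = buffer;
        // Fallback for a forgotten close(): runs during orderly shutdown only.
        this.hook = new Thread(this::wipe);
        Runtime.getRuntime().addShutdownHook(hook);
    }

    ByteBuffer buffer() {
        return buffer;
    }

    /** Overwrites every byte with zero; safe to call more than once. */
    synchronized void wipe() {
        if (wiped) {
            return;
        }
        for (int i = 0; i < buffer.capacity(); i++) {
            buffer.put(i, (byte) 0);
        }
        wiped = true;
    }

    @Override
    public void close() {
        wipe();
        try {
            Runtime.getRuntime().removeShutdownHook(hook);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down; the hook is harmless now.
        }
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocate(16);
        try (WipingBuffer w = new WipingBuffer(raw)) {
            w.buffer().put(0, (byte) 42);
        }
        System.out.println("first byte after close: " + raw.get(0));
    }
}
```

A shutdown hook only helps when the JVM exits in an orderly way. It does 
nothing for case 2, where the process is killed or crashes, so the free-space 
exposure described there remains.
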
> Is this a workable design? Are there any quirks of the Directory abstraction 
> that make this impossible or pointless? Or quirks in memory-mapped files or 
> how the JVM implements them?
> 
> Thanks for your time,
> 
> Lance Norskog
> 
> -- 
> Lance Norskog
> goks...@gmail.com
