The easiest way to do this is to put the index on
an encrypted file system. The index is then protected
at rest, while Lucene itself still works with plaintext
and searches normally. Encrypting the actual
_tokens_ has a few problems, not the least of
which is that any encryption algorithm worth
its salt is going to make most kinds of searching
impossible.

Consider run, runner, running and runs with
simple wildcards. A search for run* works because
all 4 variants share the prefix 'run', so they sit
next to each other in the sorted term dictionary.
Any decent encryption algorithm will scramble that
shared prefix, and any encryption that _does_
preserve it leaks so much structure that it is
trivially broken. I usually stop my thinking there,
but ngrams, casing and WordDelimiterFilterFactory
all come immediately to mind as "interesting" too.
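
To make the prefix point concrete, here is a small, self-contained Java
sketch (plain JDK, not Lucene code; the class and method names are
hypothetical). It "encrypts" the four terms two ways: with deterministic
AES (ECB mode and a hard-coded demo key, chosen only so the same token
always yields the same ciphertext, which even exact-match lookup needs),
and with a toy per-character substitution (a Caesar shift). The AES
ciphertexts share essentially no prefix with the ciphertext of 'run', so
run* has nothing to match. The substitution keeps run* working precisely
because it preserves prefixes, which is also why it falls to simple
letter-frequency analysis.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class TokenPrefixDemo {

    // Hypothetical 128-bit demo key. Never hard-code a real key.
    private static final byte[] DEMO_KEY =
            "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    // Deterministic AES (ECB, PKCS5 padding), hex-encoded: the same token always
    // gives the same ciphertext. ECB is used only so exact matches stay possible
    // in this demo; it is not a recommendation.
    public static String aesToken(String token) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(DEMO_KEY, "AES"));
            byte[] out = cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : out) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Toy prefix-preserving "encryption": shift each lowercase letter by 3.
    // Equal prefixes in, equal prefixes out, so run* still works (and so does
    // frequency analysis against it).
    public static String substitutionToken(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            sb.append(c >= 'a' && c <= 'z' ? (char) ('a' + (c - 'a' + 3) % 26) : c);
        }
        return sb.toString();
    }

    // Number of leading characters two strings have in common.
    public static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String aesRun = aesToken("run");
        String subRun = substitutionToken("run");
        for (String t : new String[] {"run", "runner", "running", "runs"}) {
            String aes = aesToken(t);
            String sub = substitutionToken(t);
            System.out.printf("%-8s aes=%s (shares %d chars with run) subst=%s (starts with %s: %b)%n",
                    t, aes, commonPrefixLength(aes, aesRun), sub, subRun, sub.startsWith(subRun));
        }
    }
}
```

A randomized mode (for example AES-GCM with a fresh IV per token) is the
better choice cryptographically, but then even an exact match on 'run'
stops working, because the same token encrypts differently every time.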

But what about stored data, you ask? Yes, the
stored fields are compressed but otherwise stored
verbatim, so I've seen arguments for encrypting
_that_ stream, but that's really a "feel good" fig
leaf. If I get access to the index and it has
position information, I can reconstruct documents
without the stored data, as Luke does. The process
is a bit lossy, but the reconstructed document has
enough fidelity to give people who are seriously
concerned about encryption conniption fits.
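
The idea behind rebuilding a document from positions fits in a few lines
of Java. This is a toy, in-memory postings map (term to positions within
one document), not Lucene's actual postings API, and the sample data is
hypothetical: it assumes an analysis chain that lowercased "The quick
brown fox jumped over the lazy dog", dropped the stopword "the" while
leaving a gap in the positions, and stemmed "jumped" to "jump". Walking
the positions in order gets the text back, minus exactly what the
analysis threw away.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReconstructDemo {

    // Rebuilds approximate text from term -> positions postings for one document.
    // Positions with no term (e.g. removed stopwords) become "_"; terms sharing a
    // position (e.g. injected synonyms) are joined with "/".
    public static String reconstruct(Map<String, List<Integer>> postings) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int pos : e.getValue()) {
                byPosition.merge(pos, e.getKey(), (a, b) -> a + "/" + b);
            }
        }
        if (byPosition.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (int pos = 0; pos <= byPosition.lastKey(); pos++) {
            if (pos > 0) {
                sb.append(' ');
            }
            sb.append(byPosition.getOrDefault(pos, "_"));
        }
        return sb.toString();
    }

    // Hypothetical postings; "the" at positions 0 and 6 was removed at index time.
    public static Map<String, List<Integer>> samplePostings() {
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        postings.put("quick", Arrays.asList(1));
        postings.put("brown", Arrays.asList(2));
        postings.put("fox", Arrays.asList(3));
        postings.put("jump", Arrays.asList(4));
        postings.put("over", Arrays.asList(5));
        postings.put("lazy", Arrays.asList(7));
        postings.put("dog", Arrays.asList(8));
        return postings;
    }

    public static void main(String[] args) {
        // prints: _ quick brown fox jump over _ lazy dog
        System.out.println(reconstruct(samplePostings()));
    }
}
```

Case, stopwords and the original word forms are gone, but nobody reading
that output would have trouble guessing what the document said.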

So all in all I have to back up Shawn's comments: you're
better off isolating your Solr/Lucene system, enforcing
authorization to view _documents_ at that level, and
possibly using an encrypted file system.
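
A minimal sketch of putting document-level authorization in your own
code, in front of the search, rather than in the index. Everything here
is hypothetical and in-memory: each document carries the set of groups
allowed to see it, and the application only returns hits that share a
group with the caller. With Solr or Lucene the same check is usually
expressed as a filter on an ACL field (for example a Solr fq) that the
application adds to every user query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AclFilterDemo {

    // Hypothetical indexed document: id, analyzed terms, groups allowed to see it.
    public static final class Doc {
        final String id;
        final Set<String> terms;
        final Set<String> allowedGroups;

        Doc(String id, Set<String> terms, Set<String> allowedGroups) {
            this.id = id;
            this.terms = terms;
            this.allowedGroups = allowedGroups;
        }
    }

    // The application, not the index, decides visibility: a document is returned
    // only if it matches the term AND shares at least one group with the user.
    public static List<String> search(List<Doc> index, String term, Set<String> userGroups) {
        List<String> hits = new ArrayList<>();
        for (Doc d : index) {
            if (d.terms.contains(term) && !Collections.disjoint(d.allowedGroups, userGroups)) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    public static List<Doc> sampleIndex() {
        return Arrays.asList(
                new Doc("doc1", setOf("salary", "report"), setOf("hr")),
                new Doc("doc2", setOf("salary", "policy"), setOf("hr", "staff")),
                new Doc("doc3", setOf("holiday", "policy"), setOf("staff")));
    }

    public static Set<String> setOf(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        System.out.println(search(sampleIndex(), "salary", setOf("staff"))); // [doc2]
        System.out.println(search(sampleIndex(), "salary", setOf("hr")));    // [doc1, doc2]
    }
}
```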

FWIW,
Erick

On Sat, Sep 5, 2015 at 7:27 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 9/5/2015 5:06 AM, Adam Retter wrote:
>> I wondered if there is any facility already existing in Lucene for
>> encrypting the values stored into the index and still being able to
>> search them?
>>
>> If not, I wondered if anyone could tell me if this is impossible to
>> implement, and if not to point me perhaps in the right direction?
>>
>> I imagine that just the text values and document fields to index (and
>> optionally store) would be encrypted on the fly by Lucene, perhaps
>> using a public/private key mechanism. When a user issues a search
>> query to Lucene they would also provide a key so that Lucene can
>> decrypt the values as necessary to try and answer their query.
>
> I think you could probably add transparent encryption/decryption at the
> Lucene level in a custom codec.  That probably has implications for
> being able to read the older index when it's time to upgrade Lucene,
> with a complete reindex being the likely solution.  Others will need to
> confirm ... I'm not very familiar with Lucene code, I'm here for Solr.
>
> Any verification of user identity/permission is probably best done in
> your own code, before it makes the Lucene query, and wouldn't
> necessarily be related to the encryption.
>
> Requirements like this are usually driven by paranoid customers or
> product managers.  I think that when you really start to examine what an
> attacker has to do to actually reach the unencrypted information (Lucene
> index in this case), they already have acquired so much access that the
> system is completely breached and it won't matter what kind of
> encryption is added.
>
> I find many of these requirements to be silly, and put an incredible
> burden on admin and developer resources with little or no benefit.
> Here's an example of a similar customer encryption requirement that I
> encountered recently:
>
> We have a web application that has three "hops" involved.  A user talks
> to a load balancer, which talks to Apache, where the connection is then
> proxied to a Tomcat server with the AJP protocol.  The customer wanted
> all three hops encrypted.  The first hop was already encrypted, the
> second was easy, but the third proved to be very difficult.  Finally we
> decided that we did not need load balancing on that last hop, and it
> could simply talk to localhost, eliminating the need to encrypt it.
>
> The customer was worried about an attacker sniffing the traffic on the
> LAN and seeing details like passwords.  I consider this to be an insane
> requirement.  In order to sniff that traffic, the attacker would need
> one of two things:  Root access on a server, or physical access to the
> infrastructure.  Physical access can be escalated to root access if you
> know what you're doing.  Once someone has either of those things,
> encrypted traffic won't matter, they will be able to learn anything they
> need or do any damage they desire, without even needing to sniff the
> traffic.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
