Thanks Walter, that would be a neat solution if we just wanted to store
values, but we also want full-text query capabilities.

On 5 September 2015 at 17:56, Walter Underwood <wun...@wunderwood.org>
wrote:

> Alternatively, do not store values in the Solr fields. Return a key and
> fetch encrypted data from a database or other repository.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Sep 5, 2015, at 9:40 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> The easiest way to do this is to put the index on
> an encrypted file system. Encrypting the actual
> _tokens_ has a few problems, not the least of
> which is that any encryption algorithm worth
> its salt is going to make most searching totally
> impossible.
>
> Consider run, runner, running and runs with
> simple wildcards. Searching for run* requires that all 4
> variants have 'run' as a prefix, and any decent
> encryption algorithm will not do that. Any
> encryption that _does_ make that search possible
> is trivially broken. I usually stop my thinking there,
> but ngrams, casing, WordDelimiterFilterFactory
> all come immediately to mind as "interesting".
>
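
To make the prefix argument above concrete, here is a minimal Python sketch (not Lucene code). It assumes HMAC-SHA256 as a stand-in for any deterministic per-term encryption; the key and the `opaque_token` helper are hypothetical and exist only for illustration.

```python
import hashlib
import hmac

KEY = b"demo-key-not-a-real-secret"  # hypothetical key, illustration only


def opaque_token(term: str) -> str:
    """Deterministic keyed transform standing in for per-term encryption."""
    return hmac.new(KEY, term.encode("utf-8"), hashlib.sha256).hexdigest()


terms = ["run", "runner", "running", "runs"]

# Plaintext terms: the wildcard query run* is just a prefix check.
plain_hits = [t for t in terms if t.startswith("run")]

# Transformed terms: the transformed "run" is no longer a prefix of the
# transformed variants, so the same prefix check finds only the exact term.
prefix = opaque_token("run")
opaque_hits = [t for t in terms if opaque_token(t).startswith(prefix)]

print(plain_hits)
print(opaque_hits)
```

With the transformed terms, run* degrades to an exact-match lookup on "run". A scheme that kept the shared prefix visible would leak exactly the structure the encryption was supposed to hide.
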
> But what about stored data, you ask? Yes, the
> stored fields are compressed but stored verbatim,
> so I've seen arguments for encrypting _that_ stream,
> but that's really a "feel good" fig-leaf. If I get access to the
> index and it has position information, I can reconstruct
> documents without the stored data, as Luke does. The
> process is a bit lossy, but the reconstructed document
> has enough fidelity that it'll give people seriously
> concerned about encryption conniption fits.
>
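
The reconstruction point can be sketched with a toy positional index. This is an assumption-laden illustration in plain Python, not how Lucene or Luke is implemented; `build_positional_index` and `reconstruct` are hypothetical helpers, and the analysis chain is reduced to lowercasing plus whitespace splitting.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} using lowercase whitespace tokens."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index


def reconstruct(index, doc_id):
    """Rebuild a document's token stream from the postings alone."""
    slots = {}
    for term, postings in index.items():
        for pos in postings.get(doc_id, []):
            slots[pos] = term
    return " ".join(slots[p] for p in sorted(slots))


docs = {
    1: "Patient Smith was prescribed Warfarin",
    2: "Smith missed the appointment",
}
index = build_positional_index(docs)

# No stored fields are consulted here, only terms and positions.
print(reconstruct(index, 1))
```

The result is lossy in the way described above (the original casing is gone, and a real analyzer would also drop or alter some tokens), but the content of the document is still plainly readable.
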
> So all in all I have to back up Shawn's comments: You're
> better off isolating your Solr/Lucene system, putting
> authorization to view _documents_ at that level, and possibly
> using an encrypted filesystem.
>
> FWIW,
> Erick
>
> On Sat, Sep 5, 2015 at 7:27 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 9/5/2015 5:06 AM, Adam Retter wrote:
>
> I wondered if there is any facility already existing in Lucene for
> encrypting the values stored into the index and still being able to
> search them?
>
> If not, I wondered if anyone could tell me if this is impossible to
> implement, and if not to point me perhaps in the right direction?
>
> I imagine that just the text values and document fields to index (and
> optionally store) in the index would be encrypted on the fly by
> Lucene, perhaps using a public/private key mechanism. When a user issues
> a search query to Lucene they would also provide a key so that Lucene
> can decrypt the values as necessary to try and answer their query.
>
>
> I think you could probably add transparent encryption/decryption at the
> Lucene level in a custom codec.  That probably has implications for
> being able to read the older index when it's time to upgrade Lucene,
> with a complete reindex being the likely solution.  Others will need to
> confirm ... I'm not very familiar with Lucene code, I'm here for Solr.
>
> Any verification of user identity/permission is probably best done in
> your own code, before it makes the Lucene query, and wouldn't
> necessarily be related to the encryption.
>
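
One way to picture the "check permission in your own code before the query" suggestion is the sketch below. Everything in it is hypothetical: the `User` type, the in-memory `DOCS` list, and the `search` function stand in for an application layer sitting in front of a real index, and the term match is a plain word comparison rather than a Lucene query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset


# Hypothetical documents, each tagged with the groups allowed to view it.
DOCS = [
    {"id": 1, "text": "quarterly payroll summary", "allowed": {"hr"}},
    {"id": 2, "text": "payroll system outage notes", "allowed": {"hr", "ops"}},
    {"id": 3, "text": "office party planning", "allowed": {"hr", "ops", "staff"}},
]


def search(user, term):
    """Return ids of matching documents that the user is allowed to see."""
    # The identity/permission check happens before any query is run.
    if not user.groups:
        raise PermissionError(f"{user.name} has no search permissions")
    # Restrict the candidate set to documents visible to this user.
    visible = [d for d in DOCS if d["allowed"] & user.groups]
    return [d["id"] for d in visible if term in d["text"].split()]


print(search(User("ops-user", frozenset({"ops"})), "payroll"))
```

The access decision here is independent of any encryption, which matches the point above that the two concerns need not be related.
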
> Requirements like this are usually driven by paranoid customers or
> product managers.  I think that when you really start to examine what an
> attacker has to do to actually reach the unencrypted information (Lucene
> index in this case), they already have acquired so much access that the
> system is completely breached and it won't matter what kind of
> encryption is added.
>
> I find many of these requirements to be silly, and put an incredible
> burden on admin and developer resources with little or no benefit.
> Here's an example of a similar customer encryption requirement which I
> encountered recently:
>
> We have a web application that has three "hops" involved.  A user talks
> to a load balancer, which talks to Apache, where the connection is then
> proxied to a Tomcat server with the AJP protocol.  The customer wanted
> all three hops encrypted.  The first hop was already encrypted, the
> second was easy, but the third proved to be very difficult.  Finally we
> decided that we did not need load balancing on that last hop, and it
> could simply talk to localhost, eliminating the need to encrypt it.
>
> The customer was worried about an attacker sniffing the traffic on the
> LAN and seeing details like passwords.  I consider this to be an insane
> requirement.  In order to sniff that traffic, the attacker would need
> one of two things:  Root access on a server, or physical access to the
> infrastructure.  Physical access can be escalated to root access if you
> know what you're doing.  Once someone has either of those things,
> encrypted traffic won't matter, they will be able to learn anything they
> need or do any damage they desire, without even needing to sniff the
> traffic.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>


-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
