I think the point of the discussion is really to determine the answer to #1.

I would counter that it is not a compelling feature for MOST users of Lucene, and that those who do require it can implement it externally using binary fields, or, even more easily (and perhaps faster), by using an encrypted filesystem with proper security.
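
A minimal sketch of the external approach Robert describes. It assumes the application encrypts each value itself and stores the resulting bytes in a binary stored field (in Lucene 2.x, for example, through the `Field(String, byte[], Field.Store)` constructor). The Lucene call is left out so the sketch runs on the JDK alone. AES-GCM via `javax.crypto` is an assumed choice of cipher, the class name `FieldCipher` is hypothetical, and key management is not addressed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper (not a Lucene API): encrypts a field value into bytes an
// application could put in a binary stored field, and decrypts them on retrieval.
final class FieldCipher {
    private static final int IV_BYTES = 12;   // recommended GCM nonce size
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKey key;

    FieldCipher(SecretKey key) {
        this.key = key;
    }

    /** Generates a random 128-bit AES key (storing it safely is out of scope). */
    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV || ciphertext || tag, using a fresh random IV for every value. */
    byte[] encrypt(String value) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encrypt(); throws if the stored bytes were tampered with. */
    String decrypt(byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because every value gets a fresh random IV, equal plaintexts produce different ciphertexts, so a field stored this way can be retrieved and decrypted but not matched by a term query.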

Adding it to core Lucene complicates the code base, and I do not believe it is warranted.

This is only my opinion.

On Dec 2, 2006, at 2:38 PM, negrinv wrote:


On the contrary, Mike, I am beginning to think that there have been a number of misunderstandings, starting with my original posting.
When I submitted my proposal, I was prepared for some discussion of the merits (or otherwise) of my proposed solution. I had no idea that the discussion would drift towards security and performance in absolute terms. I would now like to steer the debate back in its intended direction.

I have no difficulty agreeing with you on both counts: a non-encrypted swap file is a security risk, and encryption imposes a performance penalty. I submit, however, that neither point is relevant to my posting, for the following reasons.
Security is all about knowing where you stand so that you can take counter-measures; it is not about the "false sense of security" provided by knowing you have an encrypted swap file or a 3000-byte encryption key.
Lucene cannot provide security. It would be a legal nightmare and an absurd expectation. The underlying operating system within which Lucene runs does not guarantee security, the encryption software provider does not guarantee security, and password protection and physical security are also outside of Lucene's control. What Lucene can do is provide encryption services, while the application has to provide a given level of security. For instance, if you run under an operating system which does not provide swap file encryption, then you must disable the swap file. Does that impose a performance penalty? Probably, if your memory is limited, but now you know where you stand, so you can make a decision: performance, encryption, or more memory. But one cannot, in my view, shift the responsibility for that decision to Lucene.
I'll give you another example. You mentioned padding of 128 bits. True, there are encryption routines which impose that penalty. For my (initial) implementation I had the choice between an algorithm with padding and RC4, which does not pad. A 10-character term remains a 10-character term after encryption: no padding and no index size implications. I said so in my posting, and as an application developer you then have a choice to make: use Lucene RC4 encryption as proposed (for the time being), use another product, or write your own. Without knowing the application, any decision would be totally out of context, and no one piece of software can satisfy all applications. A possible solution would be for Lucene to offer a choice of algorithms.
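
To make the padding question concrete, a small sketch comparing ciphertext sizes for a 10-character term. RC4 itself is not used because it is now considered insecure. AES in CTR mode stands in as a length-preserving stream mode, and AES/CBC with PKCS#5 padding as a padded block mode. This substitution, and the class name `PaddingDemo`, are assumptions of the sketch, not part of the proposal.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative comparison of ciphertext sizes for a length-preserving stream mode
// and a padded block mode (AES substituted for RC4, which is no longer considered safe).
final class PaddingDemo {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts plaintext with the given JCE transformation and returns the ciphertext length. */
    static int ciphertextLength(String transformation, byte[] plaintext) {
        try {
            byte[] keyBytes = new byte[16];
            byte[] iv = new byte[16];  // one AES block; CTR uses it as the initial counter
            RANDOM.nextBytes(keyBytes);
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance(transformation);
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"),
                    new IvParameterSpec(iv));
            return cipher.doFinal(plaintext).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] term = "0123456789".getBytes(StandardCharsets.US_ASCII); // a 10-character term
        // CTR needs no padding, so the ciphertext is as long as the term: prints 10
        System.out.println(ciphertextLength("AES/CTR/NoPadding", term));
        // PKCS#5 pads up to the next full 16-byte block: prints 16
        System.out.println(ciphertextLength("AES/CBC/PKCS5Padding", term));
    }
}
```

A length-preserving mode avoids the index-size cost, but it also reveals the length of every encrypted term, which is part of the trade-off an application developer would weigh.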

The army, I am sure, would like to run its tanks at the speed of a Ferrari, but it cannot; it hits a wall known as the cost-benefit ratio. It must balance security, speed and budget, keeping the application in mind. The modern tank is the answer: a compromise.
My original posting avoided the notion of security and performance in absolute terms precisely because of all the above considerations. It simply addressed a couple of points which need to be resolved before the specifics of the implementation can be discussed.

1) Is it a good idea to add encryption to Lucene? Obviously I think so, but not everyone agrees. As was pointed out in this discussion, some relational database software provides encryption at the column level, functionality equivalent to what I proposed, and Lucene in some ways competes with relational databases.

2) Assuming the answer to 1) above is yes, how should one go about including encryption in Lucene? My solution is just that, one approach. Others have proposed directory or file system encryption. My view is that this level of encryption is already provided by all major operating systems, as well as by some hardware devices, so I would not see a justifiable benefit in adding it to Lucene. But that is only my personal opinion, although I am aware that directory encryption is in the hands of the system administrator, not the application end user. Perhaps there are other options which have not been raised yet.

3) Assuming my proposal is acceptable, can it be implemented better? I am not a Lucene expert; I learned Lucene on the go. I would be delighted to see a better solution presented; it would be a learning experience for me.
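
For contrast with field-level encryption, a sketch of the whole-file granularity that directory- or filesystem-level approaches work at: every byte of a file passes through one cipher stream, whichever field it belongs to. It uses `CipherOutputStream` and `CipherInputStream` from `javax.crypto` with AES in CTR mode (an assumed choice). The class `FileLevelCipher` is hypothetical, and a real Lucene `Directory` wrapper would also need random-access reads, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical whole-file encryption: the cipher sees a file as one byte stream and
// knows nothing about fields. The IV must be unique per file for a given key.
final class FileLevelCipher {
    private static Cipher cipher(int mode, SecretKey key, byte[] iv) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, key, new IvParameterSpec(iv));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Wraps a raw output stream so that everything written to it is encrypted. */
    static OutputStream encrypting(OutputStream raw, SecretKey key, byte[] iv) {
        return new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE, key, iv));
    }

    /** Wraps a raw input stream so that everything read from it is decrypted. */
    static InputStream decrypting(InputStream raw, SecretKey key, byte[] iv) {
        return new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE, key, iv));
    }

    /** Convenience: encrypts a whole in-memory "file" in one pass. */
    static byte[] encryptFile(byte[] contents, SecretKey key, byte[] iv) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = encrypting(sink, key, iv)) {
            out.write(contents);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Convenience: decrypts a whole in-memory "file" in one pass. */
    static byte[] decryptFile(byte[] encrypted, SecretKey key, byte[] iv) {
        try (InputStream in = decrypting(new ByteArrayInputStream(encrypted), key, iv)) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```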

I hope I have not added to the confusion.

Season's greetings to you and to all who took time to participate in this discussion.
Victor

Robert Engels wrote:

I think you misunderstood. If you do not have encrypted swap (as OSX provides), then your encryption is pointless, as anyone can inspect the data as it is loaded into the heap by Lucene, bypassing the encryption.

I also think you underestimated the impact on the size of the indexes, as most secure encryption schemes are going to pad the payloads to a minimum of 128 bits, and usually much more.

This is going to make a HUGE difference in the size of the index.
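
Robert's size concern can be put in rough numbers. The sketch below assumes, as he describes, padding to 128-bit (16-byte) blocks in the PKCS#5/#7 style, and uses a made-up sample of short term lengths. Per-value IVs or authentication tags would add further bytes, and real term-length distributions will give different figures; the class name `PaddingOverhead` is hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Rough, illustrative estimate of how block padding inflates stored terms.
// Assumes PKCS#5/#7-style padding to 16-byte blocks and ignores IVs and tags.
final class PaddingOverhead {
    static final int BLOCK = 16;

    /** Ciphertext length for n plaintext bytes: always rounds up to the next full block. */
    static int paddedLength(int n) {
        return BLOCK * (n / BLOCK + 1);
    }

    /** Ratio of padded size to plaintext size over a sample of term lengths. */
    static double inflation(List<Integer> termLengths) {
        long plain = 0;
        long padded = 0;
        for (int n : termLengths) {
            plain += n;
            padded += paddedLength(n);
        }
        return (double) padded / plain;
    }

    public static void main(String[] args) {
        // Hypothetical sample of term lengths (45 bytes in total, 96 bytes once padded).
        List<Integer> sample = List.of(3, 5, 7, 8, 10, 12);
        System.out.println(String.format(Locale.ROOT, "inflation: %.2fx", inflation(sample)));
        // prints "inflation: 2.13x"
    }
}
```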

On Dec 1, 2006, at 2:00 PM, negrinv wrote:


Good news for OSX users! But what about all the others, or should I say the majority?
One more reason for encrypting at field level.
Victor


Robert Engels wrote:

Not if running under OSX with encrypted swap turned on ! :)

-----Original Message-----
From: Nicolas Lalevée <[EMAIL PROTECTED]>
Sent: Dec 1, 2006 4:49 AM
To: java-dev@lucene.apache.org
Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

On Friday 1 December 2006 11:10, negrinv wrote:
Nicolas Lalevée-2 wrote:
On Friday 1 December 2006 01:33, negrinv wrote:
Thank you, Robert, for your comments. I am inclined to agree with you, but I would like to establish first of all whether simplicity of implementation is the overriding consideration. But before I dwell on that, let me say that I have discovered that I am not a master of diff file creation with Eclipse. The diff file attached to my original posting is absurdly large and not correct. I have therefore attached a zip file containing the complete source code of the classes I modified. I leave it to others to extract the diffs properly.
Back to the issue. So far the implementation has not been difficult, considering that I knew nothing about Lucene internals before I started. The reason is that Lucene is very well structured, and the changes just fitted nicely by adding some code in the right places with minimal changes to the existing code. But I admit that the proposed implementation is not complete so far, and more work is required to overcome some of its restrictions. While I like your idea, I believe that it imposes too coarse a granularity on the encrypted data: all fields, with all kinds of data, will be encrypted, including images and other data which would normally be left alone, thus adding to the performance penalty due to encryption.

I don't agree with you here. In Lucene, you will encrypt the field data, the field names, and the tokens: I would say that represents at least 2/3 of the index size. Then, with the implementation you suggest, I think (sorry, I didn't take the time to look at your patch) that Lucene data is decrypted every time it needs to be read. With an encrypted FS, your kernel will maintain a cache in RAM for you, so it won't hurt so much. It would need some benchmarks to see which is actually best, but I doubt that your solution will be faster.

Nicolas.
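
Nicolas's point about repeated decryption can be illustrated with a small read-through cache, loosely analogous to what the kernel's page cache provides for an encrypted filesystem. This is a hypothetical sketch, not taken from the proposed patch: the class name, the key scheme and the injected decrypt function are assumptions. Note that cached plaintext lives on the Java heap, which is exactly the memory Nicolas warns may end up in swap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU read-through cache of decrypted values, keyed by an
// application-chosen id (e.g. document number plus field name).
final class DecryptionCache<K> {
    private final Function<byte[], byte[]> decrypt;
    private final Map<K, byte[]> cache;
    private int decryptCalls;

    DecryptionCache(Function<byte[], byte[]> decrypt, int maxEntries) {
        this.decrypt = decrypt;
        // accessOrder = true makes iteration order least-recently-used first,
        // and removeEldestEntry evicts once the cache grows past maxEntries.
        this.cache = new LinkedHashMap<K, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the plaintext for key, running the decrypt function only on a cache miss. */
    byte[] get(K key, byte[] ciphertext) {
        return cache.computeIfAbsent(key, k -> {
            decryptCalls++;
            return decrypt.apply(ciphertext);
        });
    }

    /** Number of times the (potentially expensive) decrypt function has run. */
    int decryptCalls() {
        return decryptCalls;
    }
}
```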

Nicolas, I am all in favour of some tests to establish which solution is best, but I have to say that I don't believe file system or directory encryption in Lucene is really justified. Most operating systems already provide this feature, although these are system-wide or policy-based solutions, hence not always within individual user control. But if the issue is user control, then I believe Lucene should provide maximum granularity when it comes to the choice of data to encrypt.
The issue, I believe, is whether some form of encryption should be provided within Lucene to enable application developers to create applications which offer some data protection under user control, with a minimum of impact, where by impact I mean both on performance and on workload, either in Lucene code or in user code.
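
The per-field control Victor argues for can be sketched as a simple policy that encrypts only the fields an application names and stores everything else, such as image bytes, unchanged. The class `FieldEncryptionPolicy` and its methods are hypothetical, and the cipher is passed in as a function so that any algorithm could be plugged in.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical per-field policy: only the fields the application names are encrypted;
// all other fields (e.g. image bytes or public metadata) are stored as-is.
final class FieldEncryptionPolicy {
    private final Set<String> encryptedFields;
    private final UnaryOperator<byte[]> encrypt;

    FieldEncryptionPolicy(Set<String> encryptedFields, UnaryOperator<byte[]> encrypt) {
        this.encryptedFields = Set.copyOf(encryptedFields);
        this.encrypt = encrypt;
    }

    boolean isEncrypted(String field) {
        return encryptedFields.contains(field);
    }

    /** Produces the stored form of each field, encrypting only the selected ones. */
    Map<String, byte[]> toStored(Map<String, byte[]> document) {
        Map<String, byte[]> stored = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : document.entrySet()) {
            byte[] value = e.getValue();
            stored.put(e.getKey(), isEncrypted(e.getKey()) ? encrypt.apply(value) : value);
        }
        return stored;
    }
}
```

An application could then, for instance, encrypt a hypothetical `ssn` field while leaving a `photo` field untouched, avoiding the cost of encrypting data that does not need protection.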

In fact you mean a user who has no control over their machine and cannot encrypt their partition. Here you will have the issue with the swap: Lucene will decrypt the data in RAM, which could possibly be pushed to swap... I know this is extreme, but it's a security hole.

--
Nicolas LALEVÉE
Solutions & Technologies
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7645198
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7657011
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
