I think the point of the discussion is really to determine the answer
to #1.
I would counter that it is not a compelling feature for MOST users of
Lucene. Those who require it can still implement it externally using
binary fields or, even easier (and maybe even faster), using an
encrypted filesystem with proper security.
Adding it to core Lucene complicates the code base, and I do not
believe it is warranted.
This is only my opinion.
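
To make the "implemented externally using binary fields" option concrete, here is a minimal sketch, assuming the application encrypts each value itself and hands Lucene only opaque bytes. The class name `ExternalFieldCrypto`, the choice of AES-GCM, and the all-zero demo key are assumptions for illustration, not anything proposed in this thread. The resulting byte array would be stored in a stored binary field; the exact Field constructor depends on the Lucene version, so it is left out here.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical helper, not part of Lucene: encrypts a field value in
// application code so that only opaque bytes ever reach the index.
final class ExternalFieldCrypto {
    private static final int IV_BYTES = 12;  // usual GCM nonce size
    private static final int TAG_BITS = 128; // GCM authentication tag size

    // Returns IV || ciphertext, suitable as the value of a binary field.
    static byte[] encrypt(byte[] key, String plainText) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv); // fresh nonce per value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reverses encrypt(): reads the IV from the first bytes, then decrypts.
    static String decrypt(byte[] key, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] demoKey = new byte[16]; // all-zero demo key; a real key comes from a key store
        byte[] stored = encrypt(demoKey, "confidential note");
        System.out.println("stored bytes: " + stored.length);
        System.out.println("round trip:   " + decrypt(demoKey, stored));
    }
}
```

A value stored this way can be retrieved and decrypted by the application, but it is not searchable, which is one of the trade-offs between this approach and encrypting inside the index.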
On Dec 2, 2006, at 2:38 PM, negrinv wrote:
On the contrary, Mike, I am beginning to think that there have been a
number of misunderstandings, of my original posting to start with.
When I submitted my proposal I was prepared for some discussion on the
merits or otherwise of my proposed solution. I had no idea that the
discussion would drift towards security and performance in absolute
terms. I would now like to steer the debate in its intended direction.
I have no difficulty agreeing with you on both counts: a non-encrypted
swap file is a security risk, and encryption imposes a performance
penalty. I submit that neither is relevant to my posting, for the
following reasons.
Security is all about knowing where you stand so that you can take
counter-measures. It is not about the "false sense of security" provided
by knowing you have an encrypted swap file or a 3000-byte encryption key.
Lucene cannot provide security; that would be a legal nightmare and an
absurd expectation. The underlying operating system within which Lucene
runs does not guarantee security, the encryption software provider does
not guarantee security, and password protection and physical security
are also outside of Lucene's control. What Lucene can do is provide
encryption services, while the application has to provide a given level
of security. For instance, if you run under an operating system which
does not provide swap file encryption, then you must disable the swap
file. Does that impose a performance penalty? Probably, if your memory
is limited, but now you know where you stand, so you can make a
decision: performance, or encryption, or more memory. But one cannot, in
my view, shift the responsibility for that decision to Lucene.
I'll give you another example: you mentioned padding of 128 bits. True,
there are encryption routines which impose that penalty. For my
(initial) implementation I had the choice between an algorithm with
padding and RC4, which does not pad. With RC4, a 10-character term
remains a 10-character term after encryption: no padding and no index
size implications. I said so in my posting, and as an application
developer you then have a choice to make: use Lucene RC4 encryption as
proposed (for the time being), use another product, or write your own.
Without knowing the application, any decision would be totally out of
context, and no one piece of software can satisfy all applications. A
possible solution would be for Lucene to offer a choice of algorithms.
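
The length claim above can be checked with the standard Java crypto API. The following is a small sketch, not the code from the proposed patch: the class name `CipherLengthDemo`, the all-zero demo keys and the fixed IV are assumptions made only to keep the example short. It encrypts the same 10-byte term with RC4 (named "ARCFOUR" in the Java API) and with AES in CBC mode with PKCS5 padding, and compares the ciphertext lengths.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Hypothetical demo class: compares ciphertext sizes of a stream cipher
// (RC4) and a padded block cipher (AES/CBC) for the same short term.
final class CipherLengthDemo {

    // RC4 is a stream cipher: output length equals input length.
    static int rc4Length(byte[] key, byte[] plain) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "ARCFOUR"));
            return cipher.doFinal(plain).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES has a 128-bit block: padded output is a multiple of 16 bytes.
    static int aesCbcLength(byte[] key, byte[] plain) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            // Fixed all-zero IV for the demo only; never reuse an IV in practice.
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(new byte[16]));
            return cipher.doFinal(plain).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // demo key, all zeros
        byte[] term = "encryption".getBytes(StandardCharsets.US_ASCII); // 10 bytes
        System.out.println("plain:   " + term.length);              // 10
        System.out.println("RC4:     " + rc4Length(key, term));     // 10
        System.out.println("AES/CBC: " + aesCbcLength(key, term));  // 16
    }
}
```

The size advantage is real, but it is only one factor in the choice of algorithm; RC4 is generally regarded as weak today, which is one more argument for letting the application pick the cipher.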
I am sure the army would like to run its tanks at the speed of a
Ferrari, but it cannot; it hits a wall known as the cost-benefit ratio.
It must choose between security, speed and budget, keeping the
application in mind. The modern tank is the answer: a compromise.
My original posting avoided the notion of security and performance in
absolute terms precisely because of all the above considerations. It
simply addressed a couple of points which need to be resolved before
the specifics of the implementation can be discussed.
1) Is it a good idea to have encryption added to Lucene? I think so,
obviously, but not everyone agrees. As was pointed out in this
discussion, some relational database software provides encryption at
the column level, a functionality equivalent to the one I proposed.
Lucene in some ways competes with relational databases.
2) Assuming the answer to 1) above is yes, how should one go about
including encryption in Lucene? My solution is just that, one approach.
Others have proposed directory or file system encryption. My view is
that this level of encryption is already provided by all major
operating systems, as well as by some hardware devices, so I would not
see a justifiable benefit in adding it to Lucene. That is only my
personal opinion, though I am also aware that directory encryption is
in the hands of the system administrator, not the application end user.
Perhaps there are other options which have not been raised yet.
3) Assuming my proposal is acceptable, can it be implemented better? I
am not a Lucene expert; I learned Lucene on the go. I would be delighted
to see a better solution presented. It would be a learning experience
for me.
I hope I have not added to the confusion.
Season's greetings to you and to all who took time to participate in
this discussion.
Victor
Robert Engels wrote:
I think you misunderstood. If you do not have encrypted swap (which
OSX provides), then your encryption is pointless, as anyone can
inspect the data as it is loaded into the heap by Lucene, bypassing
the encryption.
I also think you underestimated the impact on the size of the
indexes, as most secure encryption schemes are going to pad the
payloads to a minimum of 128 bits, and usually much more.
This is going to make a HUGE difference in the size of the index.
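
As a rough illustration of the padding argument, the sketch below estimates the overhead when every term is padded to a 128-bit (16-byte) block using PKCS#5/PKCS#7-style padding. The class name `PaddingOverhead` and the sample terms are hypothetical, and the calculation counts only raw term bytes; it ignores how Lucene actually lays out and compresses terms on disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical back-of-the-envelope estimate, not a model of the index format.
final class PaddingOverhead {
    static final int BLOCK_BYTES = 16; // 128-bit cipher block

    // PKCS#5/PKCS#7 padding always adds between 1 and BLOCK_BYTES bytes,
    // so the result is the next multiple of the block size above the input.
    static int paddedLength(int plainLength) {
        return (plainLength / BLOCK_BYTES + 1) * BLOCK_BYTES;
    }

    // Sum of the UTF-8 byte lengths of all terms.
    static int totalPlain(List<String> terms) {
        int total = 0;
        for (String term : terms) {
            total += term.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Sum of the padded lengths when each term is encrypted separately.
    static int totalPadded(List<String> terms) {
        int total = 0;
        for (String term : terms) {
            total += paddedLength(term.getBytes(StandardCharsets.UTF_8).length);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "index", "encryption", "field");
        System.out.println("plain bytes:  " + totalPlain(terms));  // 26
        System.out.println("padded bytes: " + totalPadded(terms)); // 64
    }
}
```

For short terms like these the padded size is more than double the plain size, which is the effect described above; for long field values the relative overhead shrinks.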
On Dec 1, 2006, at 2:00 PM, negrinv wrote:
Good news for OSX users! But what about all the others, or should I say
the majority?
One more reason for encrypting at field level.
Victor
Robert Engels wrote:
Not if running under OSX with encrypted swap turned on ! :)
-----Original Message-----
From: Nicolas Lalevée <[EMAIL PROTECTED]>
Sent: Dec 1, 2006 4:49 AM
To: java-dev@lucene.apache.org
Subject: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
On Friday 1 December 2006 11:10, negrinv wrote:
Nicolas Lalevée-2 wrote:
On Friday 1 December 2006 01:33, negrinv wrote:
Thank you Robert for your comments. I am inclined to agree with you,
but I would like to establish first of all whether simplicity of
implementation is the overriding consideration. But before I dwell on
that, let me say that I have discovered that I am not a master of DIFF
file creation with Eclipse. The diff file attached to my original
posting is absurdly large and not correct. I have therefore attached a
zip file containing the complete source code of the classes I modified.
I leave it to others to extract the diffs properly.
Back to the issue. So far the implementation has not been difficult,
considering that I knew nothing about Lucene internals before I
started. The reason is that Lucene is very well structured, and the
changes fitted nicely: some code added in the right place, with
minimal changes to the existing code. But I admit that the proposed
implementation so far is not complete, and more work is required to
overcome some of its restrictions. While I like your idea, I believe
that it imposes too large a granularity on the encrypted data: all
fields with all kinds of data will be encrypted, including images and
others which normally would be left alone, thus adding to the
performance penalty due to encryption.
I don't agree with you here. In Lucene, you will encrypt the field
data, the field names, and the tokens: I would say that represents at
least 2/3 of the index size. Then, with the implementation you suggest,
I think (sorry, I didn't take the time to look at your patch) that
Lucene data is decrypted every time it needs to be read. With an
encrypted FS, your kernel will maintain a cache in RAM for you, so it
won't hurt so much.
It needs some benchmarking to see which is actually best, but I doubt
that your solution will be faster.
Nicolas.
Nicolas, I am all in favour of some tests to establish which solution
is best, but I have to say that I don't believe file system or
directory encryption in Lucene is really justified. Most operating
systems already provide this feature, although as system-wide or
policy-based solutions, hence not always within individual user
control.
But if the issue is user control, then I believe Lucene should provide
maximum granularity when it comes to the choice of data to encrypt.
The issue, I believe, is whether some form of encryption should be
provided within Lucene to enable application developers to create
applications which offer some data protection under user control with
a minimum of impact, where by impact I mean both performance and
workload, whether in Lucene code or user code.
In fact you mean a user that has no control of his machine and that
cannot encrypt his partition. Here you will have the issue with the
swap: Lucene will decrypt the data in RAM, which can possibly be pushed
to the swap... I know this is extreme, but it's a security hole.
--
Nicolas LALEVÉE
Solutions & Technologies
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7645198
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.