Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

2006-11-30 Thread negrinv

Luke, I should have mentioned in my earlier posting that what I am proposing
uses password-based encryption, where the password is NOT stored anywhere
within Lucene. I deliberately avoided making any references to security (as
opposed to encryption) because I believe security to be the responsibility
of the end application, not of Lucene. In my opinion Lucene can only provide
encryption services. None of the encryption APIs themselves, whether written
by a third party or by Sun, can guarantee security, and neither can Lucene.
What it can do is encrypt the data and its index. Any application using
these proposed API extensions will have to work out the extent to which it
can provide security within the context of all the other APIs involved and
the application requirements themselves.
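A minimal sketch of what "password-based, password never stored" can look like, assuming PBKDF2 (the JDK's PBKDF2WithHmacSHA256, Java 8 and later) rather than whatever the attached patch does internally: only a random salt is persisted alongside the index, and the key is re-derived from the user's password each time the index is opened. The class name PasswordKeys and the iteration count are illustrative assumptions, not part of Lucene or of the proposal.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: derives an AES key from a password; only the salt is stored. */
final class PasswordKeys {
    // Illustrative work factor; tune it to your hardware and threat model.
    private static final int ITERATIONS = 100_000;
    private static final int KEY_BITS = 128;

    private PasswordKeys() {}

    /** Random per-index salt; it is not secret and can be kept next to the index. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Same password and same salt always yield the same key; the password is never persisted. */
    static SecretKey deriveKey(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] raw = factory.generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }
}
```

Because the salt is not secret it can live next to the index; forgetting the password makes the encrypted data unrecoverable, which is the property described above.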
I have to agree with you that at some stage Lucene will have to stop
providing new functionality or it will become unmaintainable. But has it
reached that stage yet?
Victor


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

2006-11-30 Thread negrinv

Thank you Robert for your comments. I am inclined to agree with you, but I
would first like to establish whether simplicity of implementation is the
overriding consideration. Before I dwell on that, let me say that I have
discovered I am not a master of DIFF file creation with Eclipse. The
diff file attached to my original posting is absurdly large and not
correct. I have therefore attached a zip file containing the complete source
code of the classes I modified. I leave it to others to extract the diffs
properly.
Back to the issue. So far the implementation has not been difficult,
considering that I knew nothing about Lucene internals before I started. The
reason is that Lucene is very well structured, and the changes fitted
nicely by adding some code in the right places with minimal changes to the
existing code. But I admit that the proposed implementation is not yet
complete and more work is required to overcome some of its restrictions.
While I like your idea, I believe that it imposes too coarse a granularity on
the encrypted data: all fields with all kinds of data will be encrypted,
including images and others which normally would be left alone, thus adding
to the performance penalty due to encryption. Many hardware devices and most
operating systems already provide directory or file system encryption,
so that level of encryption appears to me an unnecessary addition to
Lucene. Encryption at field level, however, is not provided by anything I
know of. The key, in my opinion, is to decide what is best from the end user's
point of view, but perhaps we need more discussion on this.
Victor

http://www.nabble.com/file/4390/LuceneEncryptionMods.zip
LuceneEncryptionMods.zip 



[jira] Commented: (LUCENE-734) Upload Lucene 2.0 artifacts in the Maven 1 repository

2006-11-30 Thread Hoss Man (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-734?page=comments#action_12454777 ]

Hoss Man commented on LUCENE-734:
-

FYI: anyone can edit the wiki if you create an account and login.

> Upload Lucene 2.0 artifacts in the Maven 1 repository
> -
>
> Key: LUCENE-734
> URL: http://issues.apache.org/jira/browse/LUCENE-734
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Other
>Reporter: Jukka Zitting
>Priority: Minor
>
> The Lucene 2.0 artifacts can be found in the Maven 2 repository, but not in 
> the Maven 1 repository. There are still projects using Maven 1 who might be 
> interested in upgrading to Lucene 2, so having the artifacts also in the 
> Maven 1 repository would be very helpful.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-734) Upload Lucene 2.0 artifacts in the Maven 1 repository

2006-11-30 Thread Jukka Zitting (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-734?page=comments#action_12454774 ]

Jukka Zitting commented on LUCENE-734:
--

The ReleaseTodo page is immutable so I can't modify it directly.

At least the Maven sync directory information is outdated, the new official 
path (although I think the previous one is still symlinked) is 
/www/people.apache.org/repo/m2-ibiblio-rsync-repository.

You are right in that the artifacts in the Maven 2 repository above should 
(AFAIK) get automatically copied also to the Maven 1 repository. At least it 
works the other way. I'll check that and report back.




[jira] Resolved: (LUCENE-733) problems with some non word ascii characters in searchs

2006-11-30 Thread Hoss Man (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-733?page=all ]

Hoss Man resolved LUCENE-733.
-

Resolution: Invalid

The situation described very likely depends on the Analyzers used when
indexing the source text and when parsing the query ... without specific code
demonstrating exactly which analyzers were used, there isn't really any evidence
of a "bug".

When getting unexpected results back from a Lucene search, please consult the
user mailing list before submitting a bug ... the number of people
reading/replying to the user list who can provide assistance in understanding
the results you are getting is much larger than the number of people watching
the Jira issue queue.
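To illustrate why the analyzer matters for the report below, this sketch tokenizes the same text two ways: splitting on whitespace only, which keeps the trailing comma in "Smith,", and keeping only runs of letters and digits. Both tokenizers are hypothetical stand-ins written in plain Java, not Lucene's Analyzer classes; the point is only that a term produced one way will not equal a term produced the other way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Two hypothetical tokenizers standing in for different analyzer choices. */
final class TokenizeDemo {
    private static final Pattern ALNUM_RUN = Pattern.compile("[\\p{L}\\p{N}]+");

    private TokenizeDemo() {}

    /** Splits on whitespace only, so punctuation stays attached to terms. */
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Keeps only runs of letters and digits, dropping punctuation. */
    static List<String> alnumTokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ALNUM_RUN.matcher(text);
        while (m.find()) out.add(m.group().toLowerCase(Locale.ROOT));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTokens("Smith, Bob"));    // [smith,, bob]
        System.out.println(alnumTokens("Smith, Bob"));         // [smith, bob]
        System.out.println(alnumTokens("(212) 422-5100"));     // [212, 422, 5100]
    }
}
```

A query term "smith," produced by the first strategy can never equal the indexed term "smith" produced by the second, which is why the analyzers used at index and query time have to be known before a result can be called a bug.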

> problems with some non word ascii characters in searchs
> ---
>
> Key: LUCENE-733
> URL: http://issues.apache.org/jira/browse/LUCENE-733
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser, Search
>Reporter: Neil Despain
>
> Here are a number of examples of searches that are not acting as I would 
> expect.
> 1.
> -
> I have a document with the text:
> Smith, Bob
> 1.a
> If I do a search:
> Smith,~0.9 Bob~0.9
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:smith,~0.9 content:bob~0.9
> But it only gets a hit on: Bob
> 1.b
> If I do this search:
> "Smith,~0.9 Bob~0.9"~1
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:"bob"~1
> and it also only returns a hit for: Bob
> In both cases words that end with a comma are not found. (Other characters 
> have the same effect as commas.)
> =
> 2.
> -
> For a document with phone numbers:
> 2124225100
> 212 422 5100
> 212-422-5100
> (212) 422-5100
> (212)4225100
> (212)422-5100
> (212) 422.5100
> (212) 422 5100
> 212.422.5100
> 212.422-5100
> 2.a
> If I do a search:
> 212*422*5100~0.9
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:"(212.422-5100 212-422-5100 2124225100 212.422.5100)"
> I do not get a match on 212)422-5100 -- Doesn't find anything that starts 
> with (212)...
> 2.b
> Search term:
> 212*422*5100
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:212*422*5100
> and does not match 212)422-5100 -- Doesn't find anything that starts with 
> (212)...
> 2.c
> If I try to work around that by searching with proximity for:
> "212 422*5100"~1
> MultiPhraseQueryParser.parse(term) returns a query for:
> content:"(422-5100 422.5100 4225100)"~1
> and again does not find anything with (212)... like (212) 422-5100 or 
> (212)422-5100
> =




[jira] Commented: (LUCENE-734) Upload Lucene 2.0 artifacts in the Maven 1 repository

2006-11-30 Thread Hoss Man (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-734?page=comments#action_12454770 ]

Hoss Man commented on LUCENE-734:
-

The best way to make sure this happens would be to provide information in this 
bug about how it can/should be done ... ideally this information should be 
added directly to the "ReleaseTodo" info so it not only gets done for the 1.9.1 
and 2.0.0 releases, but also for future releases...

http://wiki.apache.org/jakarta-lucene/ReleaseTodo

...I can't find the reference now, but I seem to recall someone somewhere saying 
that things put in the maven2 repos were automagically copied to maven1 ... I 
guess that's not true after all.




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

2006-11-30 Thread Robert Engels
I think a simpler solution would be to create an EncryptedDirectory 
implementation of Directory, which requires a password to open/modify the 
directory.

Far simpler, and if you are using encryption to begin with, you are probably 
encrypting most of the data anyway.
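The EncryptedDirectory idea amounts to encrypting every byte a Directory writes and decrypting every byte it reads. A minimal sketch of that byte-level layer only, assuming AES in CTR mode (a stream mode, so ciphertext is exactly as long as plaintext) via the JDK's CipherOutputStream and CipherInputStream. StreamCrypto is a hypothetical name, and wiring it into Lucene's Directory, IndexInput and IndexOutput classes is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical byte-level layer for an encrypting Directory: AES in CTR mode. */
final class StreamCrypto {
    private StreamCrypto() {}

    private static Cipher ctr(int mode, SecretKey key, byte[] iv) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, key, new IvParameterSpec(iv)); // iv: 16 bytes, unique per file
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Bytes written to the returned stream reach 'out' encrypted, with the same length. */
    static OutputStream encrypting(OutputStream out, SecretKey key, byte[] iv) {
        return new CipherOutputStream(out, ctr(Cipher.ENCRYPT_MODE, key, iv));
    }

    /** Bytes read from the returned stream are decrypted from 'in'. */
    static InputStream decrypting(InputStream in, SecretKey key, byte[] iv) {
        return new CipherInputStream(in, ctr(Cipher.DECRYPT_MODE, key, iv));
    }

    /** Convenience wrapper for small in-memory buffers. */
    static byte[] encrypt(byte[] plain, SecretKey key, byte[] iv) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = encrypting(sink, key, iv)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decrypt(byte[] cipherText, SecretKey key, byte[] iv) {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = decrypting(new ByteArrayInputStream(cipherText), key, iv)) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                plain.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return plain.toByteArray();
    }
}
```

CTR was picked here because the keystream for any byte offset can be computed from the IV and the block number, which a seekable IndexInput would need; that seek logic is left out of this sketch. The IV must never be reused for two files under the same key, so it would typically be stored in each file's header.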

-Original Message-
>From: negrinv <[EMAIL PROTECTED]>
>Sent: Nov 29, 2006 9:45 PM
>To: java-dev@lucene.apache.org
>Subject: Re: Attached proposed modifications to Lucene 2.0 to support 
>Field.Store.Encrypted
>
>
>Thank you Luke for your comments and the references you supplied. I read
>through them and reached the following conclusions. There seems to be a
>philosophical issue about the boundary between a user application and the
>Lucene API: where one should start and the other stop.
>The other issue is the significant difference between compression and
>encryption.
>As far as the first issue is concerned it is really a matter of personal
>choice and preference. My feeling is that as long as adding functionality
>does not impair the performance of the API as a whole, it makes sense to add
>it to Lucene and thus simplify the task of the application developer. After
>all, application developers do not have to use all the features of the API
>and always have the option of subclassing, writing a better version of it if
>they can, or writing the functionality as part of the application, even if
>the API provides that functionality already. The API is there to make life
>easier for those developers who want to use it, nobody "has" to use it.
>The second issue is more technical. Compression simply compresses the stored
>data to save storage. The index itself is not compressed therefore searching
>proceeds as normal. With encryption however you must encrypt the index as
>well as the stored data otherwise one could reconstruct the source document
>from the index and thus defeat the purpose of encryption. Correct me if I am
>wrong, but I think that encrypting the Lucene index is not easy to achieve
>from outside of Lucene: it implies rewriting, as part of the application,
>much code that is now part of Lucene (see issue number one above), hence my
>preference for including it as part of the Lucene API rather than as part of
>the application.
>Victor

Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

2006-11-30 Thread Robert Engels
Agreed.

-Original Message-
>From: Luke Nezda <[EMAIL PROTECTED]>
>Sent: Nov 29, 2006 8:30 PM
>To: java-dev@lucene.apache.org
>Subject: Re: Attached proposed modifications to Lucene 2.0 to support 
>Field.Store.Encrypted
>
>I think that adding encryption support to Lucene fields is a bad idea for
>the same reasons adding compression was a bad idea (conclusive comments on
>the tail of this  issue
>http://issues.apache.org/jira/browse/LUCENE-648?page=all).  Binary fields
>can be used by users to achieve this end.  Maybe a contrib with utility
>methods would be a compromise to preserve this work and make it accessible
>to others, or alternatively just a faq entry with the sample code or
>references to it.
>Luke
>
>On 11/29/06, negrinv <[EMAIL PROTECTED]> wrote:
>>
>>
>> Attached are proposed modifications to Lucene 2.0 to support
>> Field.Store.Encrypted.
>> The rationale behind this proposal is simple. Since Lucene can store data
>> in the index, it effectively makes the data portable. It is conceivable
>> that some of the data may be sensitive in nature, hence the option to
>> encrypt it.
>> Both the data and its index are encrypted in this implementation.
>> This is only an initial implementation. It has the following
>> restrictions, all of which can be resolved if required, albeit with some
>> effort and more changes to Lucene:
>> 1) Binary and compressed fields cannot also be encrypted (a plaintext,
>> once encrypted, becomes binary).
>> 2) Field.Store.Encrypted implies Field.Store.YES.
>> This makes sense, but it forces one to store the data in the same index
>> where the tokens are stored. It may be preferable at times to have two
>> indices, one for the tokens, the other for the data.
>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is an
>> open source package, very simple to use, which has the advantage of
>> guaranteeing that the length of the encrypted field is the same as the
>> original plaintext. As of Java 1.5 (5.0) Sun provides an RC4 equivalent in
>> its Java Cryptography Extension, but unfortunately not in Java 1.4.
>> The BouncyCastle RC4 is not the only algorithm available; others not
>> depending on third-party code can be used, but it was just the simplest
>> to implement for this first attempt.
>> 4) The attachments are modifications in diff form based on an early (I
>> think August or September '06) repository snapshot of Lucene 2.0,
>> subsequently updated from the Lucene repository on 29/11/06. They may
>> need some additional work to merge with the latest version in the Lucene
>> repository. They also include a couple of JUnit test programs which
>> explain, as well as test, the usage. You will need the BouncyCastle .jar
>> (bcprov-jdk14-134.jar) to run them. I did not attach it, to minimize the
>> size of the attachments, but it can be downloaded free from:
>> http://www.bouncycastle.org/latest_releases.html
>>
>> 5) Searching an encrypted field is restricted to single terms: no phrase
>> or boolean searches are allowed yet, and the term has to be encrypted by
>> the application before searching for it. (ref. attached JUnit test programs)
>>
>> To the extent that I have tested it, the code works as intended and does
>> not appear to introduce any regression problems, but more testing by
>> others would be desirable.
>> I don't propose at this stage to do any further work on these API
>> extensions unless there is some expression of interest and direction from
>> the Lucene Developers team. I have an application ready to roll which uses
>> the proposed Lucene encryption API additions (please see
>> http://www.kbforge.com/index.html). The application is not yet available
>> for downloading simply because I am not sure if the Lucene licence allows
>> me to do so. I would appreciate your advice in this regard. My application
>> is free but its source code is not available (yet). I should add that
>> encryption does not have to be an integral part of Lucene; it can be just
>> part of the end application, but somehow it seems to me that
>> Field.Store.Encrypted belongs in the same category as compression and
>> binary values.
>> I would be happy to receive your feedback.
>>
>> victor negrin
>>
>> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
>> http://www.nabble.com/file/4377/TestEncryptedDocument.java
>> TestEncryptedDocument.java
>> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
>>
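Luke's point that binary fields can already carry encrypted data, and restriction 3 above, meet at the level of a single stored value. A minimal sketch, assuming AES-GCM from the standard JDK in place of the BouncyCastle RC4 the patch uses: each value gets a random 12-byte IV, so the stored bytes are 28 bytes longer than the plaintext rather than the same length. That trade-off is deliberate, since an exactly length-preserving stream cipher used with one key and no per-value nonce would reuse its keystream across fields. FieldValueCrypto is a hypothetical name, and putting the resulting byte[] into a binary stored field is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical helper: encrypts one stored field value into bytes for a binary field. */
final class FieldValueCrypto {
    private static final int IV_BYTES = 12;  // recommended GCM nonce size
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RNG = new SecureRandom();

    private FieldValueCrypto() {}

    /** Returns iv || ciphertext || tag; a fresh IV per value avoids keystream reuse. */
    static byte[] seal(String value, SecretKey key) {
        byte[] iv = new byte[IV_BYTES];
        RNG.nextBytes(iv);
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(IV_BYTES + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses seal(); fails if the bytes were altered or the key is wrong. */
    static String open(byte[] sealed, SecretKey key) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
            byte[] pt = c.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot decrypt field value", e);
        }
    }
}
```

This only protects stored values; terms placed in the inverted index still need separate treatment, which is the harder part of the proposal.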





Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

2006-11-30 Thread Luke Nezda

Victor-
Your point is well taken that a comprehensive encryption strategy is not
quite analogous to compression: it involves more than a transformation
of field values into a more compact form, since it requires (at a minimum)
that all data structures which comprise the index be encrypted too. Maybe I
spoke too soon.

However, after considering this more, I think the scheme would need to be
quite invasive to provide good security.  I think just plugging in
encryption simplistically would be very vulnerable to side channel attacks.
It seems the attacker can get clear text terms encrypted via the particular
index's QueryParser implementation and eventually create a fairly complete
decryption lookup table using Lucene's data structures, thus undermining
the security of the internal data structures (encrypted payloads would
potentially be unaffected, unless they corresponded to index Terms).
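The lookup-table concern can be made concrete. If index terms are transformed deterministically (the same plaintext term always yields the same output, which is exactly what lets query-time and index-time terms match), anyone who can get chosen words transformed, for example through the query path, can build an output-to-word table without ever learning the key. This sketch uses HMAC-SHA256 as a stand-in for any deterministic keyed term transform; the class and method names are hypothetical, and the attack needs only determinism, not decryption.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo of a chosen-plaintext lookup table against deterministic terms. */
final class LookupTableAttack {
    private LookupTableAttack() {}

    /** Deterministic keyed term transform; the attacker never sees 'key'. */
    static Function<String, String> deterministicTransform(byte[] key) {
        return term -> {
            try {
                Mac mac = Mac.getInstance("HmacSHA256");
                mac.init(new SecretKeySpec(key, "HmacSHA256"));
                byte[] out = mac.doFinal(term.getBytes(StandardCharsets.UTF_8));
                return Base64.getEncoder().encodeToString(out);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    /** Attacker side: push guessed words through the oracle and invert the results. */
    static Map<String, String> buildTable(Function<String, String> oracle, List<String> guesses) {
        Map<String, String> table = new HashMap<>();
        for (String word : guesses) {
            table.put(oracle.apply(word), word);
        }
        return table;
    }
}
```

Any stored term whose plaintext appears in the guess list is recovered by a simple table lookup, which is why a dictionary of likely words goes a long way against such an index.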

Let's say this weakness is OK with you.  Using the current API, I think you
can achieve your ends by encrypting binary field values and adding a
trailing org.apache.lucene.analysis.TokenFilter, used at both index and query
time, that encrypts and Base64 encodes its input (which has to be a String).
This would effectively give you an encrypted form of Lucene's internal data
structures.
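The core of such an encrypting, Base64-encoding TokenFilter is a deterministic String-to-String transform applied to each token's text at both index and query time. A minimal sketch of that transform only, assuming AES in ECB mode purely because it is deterministic (and therefore carries the weakness just described) and the JDK's java.util.Base64 (Java 8 and later). TermCipher is a hypothetical name; the Lucene 2.0 TokenFilter plumbing around it is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

/** Hypothetical per-token transform for an encrypting, Base64-encoding TokenFilter. */
final class TermCipher {
    private final SecretKey key;

    TermCipher(SecretKey key) {
        this.key = key;
    }

    /** The same input term always yields the same output, so index and query terms match. */
    String encryptTerm(String term) {
        byte[] ct = run(Cipher.ENCRYPT_MODE, term.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(ct);
    }

    /** Recovers the original term when the key is available. */
    String decryptTerm(String encoded) {
        byte[] pt = run(Cipher.DECRYPT_MODE, Base64.getDecoder().decode(encoded));
        return new String(pt, StandardCharsets.UTF_8);
    }

    private byte[] run(int mode, byte[] input) {
        try {
            // ECB is used only for determinism; it reveals which terms are equal by design.
            Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
            c.init(mode, key);
            return c.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because encrypted terms share no prefixes or edit distances with their plaintexts, prefix, wildcard and fuzzy queries cannot work on them, which is consistent with the single-term restriction in the original proposal.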

In addition to my security concerns with the concept, I also still agree
with the related philosophical issues put forward to this point on the
related field compression topic.  It seems inevitable to me that if
encryption support were added, eventually, application developers will try
to sell Lucene developers on adding features to it in addition to supporting
and maintaining it (à la configurable compression quality factor).  A
configurable, encrypting Base64 TokenFilter would also be a cool contrib.

Luke


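Luke's alternative, using binary fields from the application, might look like the sketch below: a value is encrypted with a key derived from a password (PBKDF2) using AES-GCM, and the resulting byte[] could then be stored in a binary stored field, for example via the `Field` constructor that takes a `byte[]` value. The `FieldCrypto` class, the salt | IV | ciphertext layout and the iteration count are assumptions for illustration, and the code targets a modern JDK rather than the Java versions Lucene 2.0 supported.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical application-side helper: password-based encryption of one field value.
// Result layout: salt (16 bytes) | IV (12 bytes) | AES-GCM ciphertext with 16-byte tag.
final class FieldCrypto {
    private static final int SALT_LEN = 16;
    private static final int IV_LEN = 12;
    private static final int TAG_BITS = 128;
    private static final int ITERATIONS = 100_000; // assumed work factor
    private static final int KEY_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    private FieldCrypto() {}

    // Derives an AES key from the password; the password itself is never stored.
    private static SecretKeySpec deriveKey(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            byte[] raw = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } finally {
            spec.clearPassword();
        }
    }

    static byte[] encrypt(String plaintext, char[] password) {
        try {
            byte[] salt = new byte[SALT_LEN];
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(salt);
            RANDOM.nextBytes(iv); // fresh random IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt), new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(SALT_LEN + IV_LEN + ct.length).put(salt).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Throws IllegalStateException if the password is wrong or the ciphertext was tampered with.
    static String decrypt(byte[] stored, char[] password) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(stored);
            byte[] salt = new byte[SALT_LEN];
            byte[] iv = new byte[IV_LEN];
            buf.get(salt).get(iv);
            byte[] ct = new byte[buf.remaining()];
            buf.get(ct);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(password, salt), new GCMParameterSpec(TAG_BITS, iv));
            return new String(cipher.doFinal(ct), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

As Victor points out, this protects only the stored value: binary fields are stored but not indexed, so anything the application also indexes in clear text remains readable from the index.
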
[jira] Commented: (LUCENE-735) Simple tool to back-convert from lockless to pre-lockless file format

2006-11-30 Thread Michael McCandless (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-735?page=comments#action_12454591 ] 

Michael McCandless commented on LUCENE-735:
---

To use this, apply the patch to the Lucene trunk, run ant jar-core, and then run 
this:

java org.apache.lucene.index.ConvertPreLockless 

The conversion is in place, meaning, after this tool runs, your  
should be in 2.0 file format.

> Simple tool to back-convert from lockless to pre-lockless file format
> -
>
> Key: LUCENE-735
> URL: http://issues.apache.org/jira/browse/LUCENE-735
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Affects Versions: 2.1
>Reporter: Michael McCandless
> Assigned To: Michael McCandless
>Priority: Minor
> Fix For: 2.1
>
> Attachments: LUCENE-735.patch
>
>
> Simple tool to back-convert from lockless to pre-lockless file format
> The changes for lockless commits (LUCENE-701) are fairly minor and so
> creating a tool to convert a lockless format index back to a
> pre-lockless format index is 1) fairly simple, and 2) useful at least
> for brave souls who want to try lockless but have the freedom to roll
> back to Lucene 2.0, using the same index, if anything goes wrong.
> I will attach an initial patch.
> This has not yet received extensive testing so please be extremely
> careful if you use this in production!  I've only done minimal testing
> so far: using IndexFiles to produce an index under lockless,
> converting it to pre-lockless, and then doing searches against that
> index with 2.0.  More testing is clearly needed to ensure separate
> deletes, separate norms, etc, are working correctly.
> The tool prints details of what it did, eg:
>   >> java org.apache.lucene.index.ConvertPreLockless index
>   3 segments in index
>   segment 0: not compound file format
>     has deletions
>     rename _a_2.del to _a.del
>     no separate norms
>   segment 1: not compound file format
>     has deletions
>     rename _b_1.del to _b.del
>     no separate norms
>   segment 2: not compound file format
>     has deletions
>     rename _c_1.del to _c.del
>     no separate norms
>   wrote "segments" file
>   rename segments_8 to segments_8.old
> Caveats:
>   * Tread very carefully! Test first in a sandbox, etc.
>   * Make sure you only run this tool on an index that is not in use by
>     any readers/writers, else you could have problems: the tool
>     currently does not acquire the write lock even though it's
>     modifying the index.
>   * On Windows only: if your index has any un-referenced files (ie,
>     files that should have been deleted but were in use at the time)
>     at the time you run this tool, then they will never be deleted
>     (ie, pre-lockless Lucene won't know to delete them).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-735) Simple tool to back-convert from lockless to pre-lockless file format

2006-11-30 Thread Michael McCandless (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-735?page=all ]

Michael McCandless updated LUCENE-735:
--

Attachment: LUCENE-735.patch





[jira] Created: (LUCENE-735) Simple tool to back-convert from lockless to pre-lockless file format

2006-11-30 Thread Michael McCandless (JIRA)
Simple tool to back-convert from lockless to pre-lockless file format
-

 Key: LUCENE-735
 URL: http://issues.apache.org/jira/browse/LUCENE-735
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.1
Reporter: Michael McCandless
 Assigned To: Michael McCandless
Priority: Minor
 Fix For: 2.1
 Attachments: LUCENE-735.patch


Simple tool to back-convert from lockless to pre-lockless file format

The changes for lockless commits (LUCENE-701) are fairly minor and so
creating a tool to convert a lockless format index back to a
pre-lockless format index is 1) fairly simple, and 2) useful at least
for brave souls who want to try lockless but have the freedom to roll
back to Lucene 2.0, using the same index, if anything goes wrong.

I will attach an initial patch.

This has not yet received extensive testing so please be extremely
careful if you use this in production!  I've only done minimal testing
so far: using IndexFiles to produce an index under lockless,
converting it to pre-lockless, and then doing searches against that
index with 2.0.  More testing is clearly needed to ensure separate
deletes, separate norms, etc, are working correctly.

The tool prints details of what it did, eg:

  >> java org.apache.lucene.index.ConvertPreLockless index

  3 segments in index
  segment 0: not compound file format
    has deletions
    rename _a_2.del to _a.del
    no separate norms
  segment 1: not compound file format
    has deletions
    rename _b_1.del to _b.del
    no separate norms
  segment 2: not compound file format
    has deletions
    rename _c_1.del to _c.del
    no separate norms
  wrote "segments" file
  rename segments_8 to segments_8.old
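
The deletion-file renames in the output above follow a visible pattern: a lockless deletion file carries a generation suffix (`_a_2.del`) that pre-lockless Lucene does not expect (`_a.del`). The sketch below reproduces only that mapping; it is inferred from the sample output, not taken from LUCENE-735.patch, and the `PreLocklessNames` helper is hypothetical. It does not handle separate norms or the rewrite of the `segments` file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper inferred from the tool's sample output, not from LUCENE-735.patch.
final class PreLocklessNames {
    // Segment name (e.g. "_a"), then "_" + generation (e.g. "_2"), then ".del".
    private static final Pattern GENERATION_DEL = Pattern.compile("(_[0-9a-z]+)_[0-9a-z]+\\.del");

    private PreLocklessNames() {}

    // "_a_2.del" -> "_a.del"; returns null when the name carries no generation suffix.
    static String preLocklessDelName(String fileName) {
        Matcher m = GENERATION_DEL.matcher(fileName);
        return m.matches() ? m.group(1) + ".del" : null;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"_a_2.del", "_b_1.del", "_c_1.del", "_a.del"}) {
            System.out.println(name + " -> " + preLocklessDelName(name));
        }
    }
}
```

Run as-is, it maps the three names from the sample output to `_a.del`, `_b.del` and `_c.del`, and prints `null` for `_a.del`, which already has the pre-lockless form.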

Caveats:

  * Tread very carefully! Test first in a sandbox, etc.

  * Make sure you only run this tool on an index that is not in use by
    any readers/writers, else you could have problems: the tool
    currently does not acquire the write lock even though it's
    modifying the index.

  * On Windows only: if your index has any un-referenced files (ie,
    files that should have been deleted but were in use at the time)
    at the time you run this tool, then they will never be deleted
    (ie, pre-lockless Lucene won't know to delete them).






[jira] Created: (LUCENE-734) Upload Lucene 2.0 artifacts in the Maven 1 repository

2006-11-30 Thread Jukka Zitting (JIRA)
Upload Lucene 2.0 artifacts in the Maven 1 repository
-

 Key: LUCENE-734
 URL: http://issues.apache.org/jira/browse/LUCENE-734
 Project: Lucene - Java
  Issue Type: Task
  Components: Other
Reporter: Jukka Zitting
Priority: Minor


The Lucene 2.0 artifacts can be found in the Maven 2 repository, but not in the 
Maven 1 repository. There are still projects using Maven 1 that might be 
interested in upgrading to Lucene 2, so having the artifacts also in the 
Maven 1 repository would be very helpful.
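
For context, the two repositories use different directory layouts. A minimal sketch, assuming the usual legacy (Maven 1) layout `groupId/<type>s/artifactId-version.<type>` and the default (Maven 2) layout; the `RepoLayout` class is hypothetical, and the coordinates in `main` are illustrative rather than taken from the issue.

```java
// Hypothetical helper contrasting the two repository layouts.
final class RepoLayout {
    private RepoLayout() {}

    // Maven 1 (legacy) layout: groupId/<type>s/artifactId-version.<type>
    static String maven1Path(String groupId, String artifactId, String version, String type) {
        return groupId + "/" + type + "s/" + artifactId + "-" + version + "." + type;
    }

    // Maven 2 (default) layout: groupId with dots turned into directories,
    // then artifactId/version/artifactId-version.<type>
    static String maven2Path(String groupId, String artifactId, String version, String type) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + type;
    }

    public static void main(String[] args) {
        // Illustrative coordinates only.
        System.out.println(maven1Path("org.apache.lucene", "lucene-core", "2.0.0", "jar"));
        System.out.println(maven2Path("org.apache.lucene", "lucene-core", "2.0.0", "jar"));
    }
}
```

With those coordinates, the first call yields `org.apache.lucene/jars/lucene-core-2.0.0.jar` and the second `org/apache/lucene/lucene-core/2.0.0/lucene-core-2.0.0.jar`.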
