Re: Index a source, but not store it... can it be done?

Jason Pump Thu, 08 Mar 2007 10:59:53 -0800

If you store a hash code of each word rather than the actual word, you should be able to search for stuff but not be able to actually retrieve it. You can trade precision for "security" based on the number of bits in the hash code (e.g. 32 or 64 bits): fewer bits mean more collisions, so more false matches but less recoverable information. I'd think a 64-bit hash would be a reasonable midpoint.


hash64("dog") = 4312311231123121;

"body:4312311231123121" returns the document with dog, but also any other document with a word that hashes to the same value.

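As a rough illustration of the idea above, here is a minimal Python sketch rather than Lucene code. The names hash_token, build_index and search are hypothetical, and BLAKE2b truncated to 64 bits is just one assumed choice of hash function; any hash of the chosen width would do. The index maps hashed terms to document names, so the original words never appear in it.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Set


def hash_token(token: str, bits: int = 64) -> int:
    """Map a token to a fixed-width integer; fewer bits means more collisions."""
    if bits % 8 != 0 or not 8 <= bits <= 64:
        raise ValueError("bits must be a multiple of 8 between 8 and 64")
    # Assumption: tokens are lowercased before hashing, as a simple analyzer might do.
    data = token.lower().encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")


def build_index(docs: Dict[str, str], bits: int = 64) -> Dict[int, Set[str]]:
    """Build an inverted index keyed by hashed terms instead of the words themselves."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[hash_token(token, bits)].add(doc_id)
    return index


def search(index: Dict[int, Set[str]], word: str, bits: int = 64) -> Set[str]:
    """Hash the query word the same way and look up the matching documents."""
    return index.get(hash_token(word, bits), set())


docs = {
    "animals.doc": "Let's sell cat and dog robot toys this December",
    "report.doc": "quarterly sales report",
}
index = build_index(docs)
hits = search(index, "dog")
```

The query has to be hashed with the same function and bit width as the index. Passing a smaller bits value makes collisions more likely, which is the precision trade-off mentioned above.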


Walt Stoneburner wrote:

Have an interesting scenario I'd like to get your take on with respect
to Lucene:

A data provider (e.g. someone with a private website or corporately
shared directory of proprietary documents) has requested their content
be indexed with Lucene so employees can be redirected to it, but
provisionally -- under no circumstance should that content be stored
or recreated from the index.

Is that even possible?

The data owner's request makes sense in the context of them wanting to
retain full access control via logins as well as collecting access
metrics.

If the token 'CAT' points to C:\Corporate\animals.doc and the token
'DOG' also points there, then great, CAT AND DOG will give that
document a higher rating, though it is not possible to reconstruct
(with any great accuracy) what the actual document content is.

However, if for the sake of using the NEAR operator with Lucene the
tokens are stored as LET'S:1 SELL:2 CAT:3 AND:4 DOG:5 ROBOT:6 TOYS:7
THIS:8 DECEMBER:9 ... then someone could pull all tokens for
animals.doc and reconstitute the token stream.
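
The reconstruction concern can be made concrete with a small Python sketch. It is not Lucene code; build_positional_index and reconstruct are hypothetical names, and the index layout (token, then document, then list of positions) is an assumption for illustration. Once positions are stored per token, sorting them recovers the original token order.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map token -> document -> list of 1-based positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.upper().split(), start=1):
            index[token][doc_id].append(pos)
    return index


def reconstruct(index, doc_id):
    """Collect every (position, token) pair for one document and sort by position."""
    placed = [
        (pos, token)
        for token, postings in index.items()
        for pos in postings.get(doc_id, [])
    ]
    return " ".join(token for _, token in sorted(placed))


docs = {"animals.doc": "LET'S SELL CAT AND DOG ROBOT TOYS THIS DECEMBER"}
index = build_positional_index(docs)
rebuilt = reconstruct(index, "animals.doc")
```

Without the position lists, the same index would yield only an unordered set of tokens per document.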

Does Lucene have any kind of trade off for working with "secure" (and
I use this term loosely) data?

-wls

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


