Ron Hawkins wrote:

>That would be an improvement over a random cypher, but wouldn't the length
>and repeatability of the data patterns after encryption negatively affect
>LZW compression, along with deduplication?

 

Not sure I understand your question, but I'll try.

 

Length: unchanged. FPE output is the same length as its input.

 

Repeatability: XYZ encrypts to (perhaps) ABC consistently, so it's repeatable. 
But WXYZ does not encrypt to xABC. Is that what you mean by repeatability? If 
so, yes, that will affect compression to some extent. My suspicion is that it 
doesn't make a huge difference: your database of names with ROB and ROBERT and 
ROBBIE won't compress the ROB part, but there will be some chance convergence 
of strings in the ciphertext that wasn't there before (less than in the 
plaintext, but some). My impression is that compression is the big win on 
larger fields anyway, like comment fields. And you probably wouldn't FPE those 
because they're unstructured by definition, so there's not much win there. We 
do occasionally have customers who want to encrypt, say, comment fields because 
"some of our reps put SSNs or PANs in those even though they aren't supposed 
to"; but for those, another AES encryption mode is a better choice anyway. Of 
course, then you still lose some compression!
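
If you want to measure this on your own data rather than take my suspicion for it, here is a rough sketch. Big assumptions: toy_fpe_letters is a made-up Feistel cipher over A-Z that only stands in for FPE (it is not FF1, not any vendor's algorithm, and not secure), DEMO_KEY is a hypothetical key, and zlib's DEFLATE stands in for LZW because that is what the Python standard library ships.

```python
import hashlib
import hmac
import zlib

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DEMO_KEY = b"demo key only"  # hypothetical key, for illustration


def _num(s):
    """Interpret a string of A-Z as a base-26 number ('' -> 0)."""
    value = 0
    for ch in s:
        value = value * 26 + ALPHABET.index(ch)
    return value


def _fmt(value, width):
    """Render value as exactly `width` base-26 letters."""
    chars = []
    for _ in range(width):
        value, rem = divmod(value, 26)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def _round_value(key, rnd, half, width):
    """Keyed round function: HMAC-SHA256 reduced to `width` letters."""
    msg = f"{rnd}:{width}:{half}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (26 ** width)


def toy_fpe_letters(key, text, rounds=8):
    """Toy deterministic, length-preserving cipher over A-Z (NOT real FPE)."""
    u = len(text) // 2
    v = len(text) - u
    a, b = text[:u], text[u:]
    for i in range(rounds):
        m = u if i % 2 == 0 else v  # width of the half being replaced
        c = (_num(a) + _round_value(key, i, b, m)) % (26 ** m)
        a, b = b, _fmt(c, m)
    return a + b


names = ["ROB", "ROBERT", "ROBBIE"]
encrypted = [toy_fpe_letters(DEMO_KEY, n) for n in names]
for plain, cipher in zip(names, encrypted):
    print(plain, "->", cipher)  # same name always gives the same output

# Same total size before compression; compare what the compressor does.
plain_blob = "\n".join(names * 200).encode()
cipher_blob = "\n".join(encrypted * 200).encode()
print("plain :", len(plain_blob), "->", len(zlib.compress(plain_blob)))
print("cipher:", len(cipher_blob), "->", len(zlib.compress(cipher_blob)))
```

The sizes you get depend entirely on the data, so swap in a realistic extract of the field in question before drawing conclusions.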

 

Note also that in the case above (comment field with possible SSN/PAN) another 
choice is to FPE just the digits. So:

Talked to John; he says his SSN is 123-45-6789, but file has 123-44-6789.

Might encrypt to:

Talked to John; he says his SSN is 761-64-3552, but file has 749-43-7477.

 

If they also had "Will call him back on the 13th", the "13" would also get 
encrypted, of course. Kinda weird but it works.
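
For the curious, the "FPE just the digits" idea can be sketched roughly like this. Assumptions, stated loudly: toy_fpe_digits is a made-up Feistel cipher over decimal digits standing in for real FPE (not FF1, not secure, not how any product does it), DEMO_KEY is a hypothetical key, and treating hyphen-joined digit groups as one unit is my guess at reproducing the example above, where the two SSNs come out with entirely different digits.

```python
import hashlib
import hmac
import re

DEMO_KEY = b"demo key only"  # hypothetical key, for illustration

# A run of digits, optionally with hyphens inside (e.g. 123-45-6789 or 13).
_DIGIT_GROUP = re.compile(r"[0-9](?:[0-9-]*[0-9])?")


def _num(s):
    return int(s) if s else 0


def _fmt(value, width):
    return f"{value:0{width}d}" if width else ""


def _round_value(key, rnd, half, width):
    """Keyed round function: HMAC-SHA256 reduced to `width` digits."""
    msg = f"{rnd}:{width}:{half}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** width)


def toy_fpe_digits(key, digits, decrypt=False, rounds=8):
    """Toy length-preserving cipher over decimal digits (NOT real FPE)."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    if not decrypt:
        for i in range(rounds):
            m = u if i % 2 == 0 else v
            c = (_num(a) + _round_value(key, i, b, m)) % (10 ** m)
            a, b = b, _fmt(c, m)
    else:
        for i in reversed(range(rounds)):
            m = u if i % 2 == 0 else v
            c = (_num(b) - _round_value(key, i, a, m)) % (10 ** m)
            a, b = _fmt(c, m), a
    return a + b


def fpe_digits_in_text(key, text, decrypt=False):
    """Encrypt only the digits in free text; everything else is untouched."""
    def repl(match):
        token = match.group(0)
        digits = token.replace("-", "")
        out = iter(toy_fpe_digits(key, digits, decrypt=decrypt))
        # Put the new digits back in place, keeping hyphens where they were.
        return "".join(ch if ch == "-" else next(out) for ch in token)
    return _DIGIT_GROUP.sub(repl, text)


comment = ("Talked to John; he says his SSN is 123-45-6789, "
           "but file has 123-44-6789. Will call him back on the 13th.")
protected = fpe_digits_in_text(DEMO_KEY, comment)
print(protected)
print(fpe_digits_in_text(DEMO_KEY, protected, decrypt=True))
```

Note that the "13" in "13th" gets encrypted along with everything else, exactly the weirdness mentioned above; a real deployment would need some rule for which digit runs to leave alone.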

 

Did I answer the question at all, or am I off in far left field?


