Eran Tromer wrote:

> You'd gain protection from a 3rd-party adversary listening in on the
> traffic (if you didn't use a secure channel like SSH in the first
> place). But it would be simpler to just use SSH, if that were the
> problem.

>> not only inconvenient, but won't solve your fundamental problem.
> It will, and here's how.

Ok, so maybe it will. Not as elegantly as storing a single ~60 byte
state per file, though.

>> As far as I can tell, the encryption method we use is reasonably
>> resilient to even this kind of attack.
> "Reasonably" depends on the application.

No doubt about it. What I found, however, is that many of the attacks
described apply, usually with no more than minor adjustments, to plain
CBC encryption. In other words, most of the attacks result from
reusing the same key and IV for repeated encryptions, rather than from
the IV resets.
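
For the skeptical, here is a minimal C++ sketch of that key+IV reuse
leak. It uses OpenSSL's EVP interface; the key, IV and record strings
are made up for illustration, and none of this is rsyncrypto's actual
code. The point: encrypting twice under the same key and IV in CBC
mode yields identical ciphertext up to the first block where the
plaintexts differ.

  // cbc_reuse_demo.cpp -- build with: g++ cbc_reuse_demo.cpp -lcrypto
  // Shows the classic key+IV reuse leak in CBC mode: plaintexts that
  // share a prefix encrypt to identical ciphertext prefixes, so an
  // observer of repeated encryptions learns where the data changed.
  #include <openssl/evp.h>
  #include <cstdio>
  #include <cstring>
  #include <vector>

  static std::vector<unsigned char> cbc_encrypt(const unsigned char *key,
                                                const unsigned char *iv,
                                                const char *msg)
  {
      std::vector<unsigned char> out(strlen(msg) + 16); // room for padding
      int len = 0, total = 0;
      EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
      EVP_EncryptInit_ex(ctx, EVP_aes_128_cbc(), nullptr, key, iv);
      EVP_EncryptUpdate(ctx, out.data(), &len,
                        (const unsigned char *)msg, (int)strlen(msg));
      total = len;
      EVP_EncryptFinal_ex(ctx, out.data() + total, &len);
      total += len;
      EVP_CIPHER_CTX_free(ctx);
      out.resize(total);
      return out;
  }

  int main()
  {
      const unsigned char key[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
      const unsigned char iv[16]  = {9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9};
      // Same 16-byte prefix; the difference starts in the second block:
      auto c1 = cbc_encrypt(key, iv, "record 0001: foo status=active..");
      auto c2 = cbc_encrypt(key, iv, "record 0001: foo status=closed..");
      printf("first ciphertext block equal: %s\n",
             memcmp(c1.data(), c2.data(), 16) == 0 ? "yes" : "no");
  }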

> I don't think an efficient stream-based system can be perfectly
> resilient (for example, once I found where in the stream the customer
> list is stored, I can watch how often it is changed -- which leaks a
> non-zero amount of business information).
True.

> So you're taking the self-synchronizing approach of rsyncable gzip:
> changing the encryption so that local changes in plaintext cause
> (pretty) local changes in ciphertext, so the thing behaves reasonably
> efficiently when piped to vanilla lib/rsync.

CPU-wise, this is no more expensive than the usual compression + CBC.

> Specifically, you're encrypting the compressed plaintext in CBC mode,
> and resetting the chained value to the IV whenever the sum of the
> last sum_span=256 bytes
Where did you get the "256"? It used to be that, but it's now supposed
to be 8192. I may have left some reference dangling.

> is divisible by sum_mod=8192, but no sooner than sum_min_dist=8192
> bytes after the last reset (a comment to this effect in the source
> code would have saved some people some time...).
The intention was to publish the full paper on the site once the AP IV
proceedings were out. Sorry about that.
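
Until then, here is a minimal sketch of the reset rule as discussed in
this thread. The names (ChunkPolicy, reset_points) are mine, not
rsyncrypto's, and the defaults follow the values mentioned above:

  // Sketch of the IV-reset (chunking) rule discussed above: a rolling
  // sum over the last sum_span bytes, a reset whenever that sum is
  // divisible by sum_mod, but never within sum_min_dist bytes of the
  // previous reset.
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct ChunkPolicy {
      size_t sum_span     = 8192; // window the rolling sum covers
      size_t sum_mod      = 8192; // reset when window sum % sum_mod == 0
      size_t sum_min_dist = 8192; // minimum distance between resets
  };

  // Returns the offsets in the compressed plaintext at which the CBC
  // chained value would be reset back to the IV.
  std::vector<size_t> reset_points(const uint8_t *data, size_t len,
                                   const ChunkPolicy &p)
  {
      std::vector<size_t> resets;
      uint64_t sum = 0;
      size_t since_reset = 0;
      for (size_t i = 0; i < len; ++i) {
          sum += data[i];
          if (i >= p.sum_span)
              sum -= data[i - p.sum_span]; // slide the window
          ++since_reset;
          if (since_reset >= p.sum_min_dist && sum % p.sum_mod == 0) {
              resets.push_back(i + 1);     // reset takes effect here
              since_reset = 0;
          }
      }
      return resets;
  }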

> So the expected chunk size (i.e., runs between IV resets) is 16K,
How did you figure that? I would love to see your mathematical model.

I tried to build one, and failed. I then contacted a mathematician
friend of mine, who said that this is a hard problem. By that, he
meant that mathematicians have not come to a consensus regarding how a
model for this problem should look. If you have, indeed, built a
model, you may have a publishable paper on your hands.

I ended up doing a "Monte Carlo", which merely means I measured the
behavior, more or less, over random data. If memory serves me right,
the change from a 256-byte span to 8192 was made specifically BECAUSE
it brought the expected length to about 12KB but bumped the variance
to over 4KB. In other words, it should be very difficult to guess
where an IV reset happened without further information.
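
The measurement itself is only a few lines. Here is a sketch of that
kind of Monte Carlo, reusing the hypothetical ChunkPolicy and
reset_points from the sketch above (the 64MB sample size is an
arbitrary choice of mine):

  // Monte Carlo estimate of the chunk-length distribution over random
  // data; compile together with the reset_points() sketch above.
  #include <cmath>
  #include <cstdint>
  #include <cstdio>
  #include <random>
  #include <vector>

  int main()
  {
      std::mt19937 gen(42);
      std::uniform_int_distribution<int> byte(0, 255);
      std::vector<uint8_t> data(64u * 1024 * 1024);
      for (auto &b : data) b = (uint8_t)byte(gen);

      auto resets = reset_points(data.data(), data.size(), ChunkPolicy{});
      if (resets.empty()) return 0;

      std::vector<size_t> lens;
      size_t prev = 0;
      for (size_t r : resets) { lens.push_back(r - prev); prev = r; }

      double mean = 0, var = 0;
      for (size_t l : lens) mean += (double)l;
      mean /= lens.size();
      for (size_t l : lens) var += ((double)l - mean) * ((double)l - mean);
      var /= lens.size();
      printf("chunks: %zu  mean: %.0f  stddev: %.0f\n",
             lens.size(), mean, std::sqrt(var));
  }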

> meaning a typical change of N bytes in the compressed plaintext will
> change roughly N+8K bytes in the ciphertext.

It's actually more than that. The changes propagate backwards in the
file, as a side effect of the compression.

> That's on top of a comparable overhead from rsyncable gzip.
> Reasonable for many workloads, I presume, though it would be nasty
> for a database undergoing many tiny updates.
That's why these are tweakable parameters :-).

> As for security... Well, it would certainly give the attacker some
> trouble, but it's not leak-proof. For example:
>
> Suppose the adversary can inject some data into the stream (via a web
> form or whatever). Then he'll craft a plaintext sequence consisting
> of 4097 bytes that sum to 0 modulo RSYNC_WIN=4096 (this forces 'gzip
> --rsyncable' to a known state) followed by some data whose gzip
> compression has sum 0 modulo 8192 over a rolling 256-byte window
> exactly once, and no sooner than 8192 bytes from the beginning (this
> forces the tweaked CBC mode to revert to its IV). Now, whenever this
> magic sequence is injected, the whole compression+encryption process
> is reset to a state that is fully determined by the IV, meaning (for
> example) the next block can be thought of as encrypted under ECB,
> with all the obvious attacks on that. Now, whether the adversary can
> inject the magic sequence into the stream just before interesting
> business data -- that really depends on the circumstances, but is far
> from unthinkable.

> To give one marginally realistic scenario for the above, suppose your
> customer mailing list is stored in a sorted plain text file. Now, I
> want to find out if you're dealing with [EMAIL PROTECTED]. I thus
> subscribe "devik${MAGICSTRING}pad" to your list, so it would appear
> just before "[EMAIL PROTECTED]" in the plaintext stream. I also
> inject "[EMAIL PROTECTED]" into an arbitrary location in the
> plaintext stream. Then, I look at the next backup and check if the
> resulting ciphertext blocks are identical. I don't even need to look
> at more than one backup to do that.
Interesting attack. You would have to have a security hole in the
actual database for that to work, of course. You assume "\n" is the
record separator, but also that if you place "\n" inside your own
record, it will not be escaped when entered into the database. What
your attack did, in effect, was to inject "[EMAIL PROTECTED]" into the
database. Still, the attack is a valid attack, even if it seems to me
a little on the theoretical side.

To show just how theoretical it is: everyone knows it's actually
[EMAIL PROTECTED], if not [EMAIL PROTECTED] (.net...).
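
For what it's worth, the detection half of the attack is trivial
either way. Here is a sketch that scans a ciphertext for repeated
cipher blocks -- the ECB-style giveaway described above. The 16-byte
(AES) block size and block-aligned scanning are assumptions for
illustration:

  // Scans a ciphertext file for repeated 16-byte (AES) cipher blocks,
  // the ECB-style giveaway the attack above relies on.
  #include <cstdio>
  #include <fstream>
  #include <map>
  #include <string>
  #include <vector>

  int main(int argc, char **argv)
  {
      if (argc != 2) {
          fprintf(stderr, "usage: %s <ciphertext-file>\n", argv[0]);
          return 1;
      }
      std::ifstream in(argv[1], std::ios::binary);
      std::map<std::string, std::vector<size_t>> seen;
      char block[16];
      size_t off = 0;
      while (in.read(block, sizeof block)) {
          seen[std::string(block, sizeof block)].push_back(off);
          off += sizeof block;
      }
      for (const auto &e : seen) {
          if (e.second.size() < 2)
              continue;
          printf("repeated block at offsets:");
          for (size_t o : e.second)
              printf(" %zu", o);
          printf("\n");
      }
      return 0;
  }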

> BTW, encrypting the raw data files separately via 'rsyncrypto -r'
> reveals potentially meaningful filenames and makes traffic analysis
> oh so much easier.

True. None of the metadata is encrypted in this mode.

> In the unlikely case you're doing that, and in the more likely case
> that users of rsyncrypto will misuse '-r' once they see it, I would
> have replaced
>   rsyncrypto -r plain_dir cipher_dir keyfile key
> with
>   tar cf - plain_dir | rsyncrypto - cipher_file keyfile key
> (which also gets you the various nifty features of tar).
True, but that mode is not very useful to customers. The main problem
with it is that it does not allow selective restore of information. I
am working on a mode in which the file names are garbled upon
compression, and the garbling index is stored in a file (that gets
encrypted, just like everything else...).
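
To give an idea of what I mean (one possible shape only, not the
actual implementation): each plaintext path gets a random
ciphertext-side name, and the name-to-path map lives in an index file
that is encrypted like any other file.

  // One possible shape for a filename-garbling index -- illustrative
  // only, not rsyncrypto's actual format. Each plaintext path maps to
  // a random ciphertext-side name; the map itself goes into a file
  // that would then be encrypted like any other.
  #include <fstream>
  #include <map>
  #include <random>
  #include <string>

  static std::string garbled_name(std::mt19937_64 &gen)
  {
      static const char hex[] = "0123456789abcdef";
      std::uniform_int_distribution<int> d(0, 15);
      std::string name(32, '0');
      for (auto &c : name)
          c = hex[d(gen)];
      return name;
  }

  int main()
  {
      std::mt19937_64 gen(std::random_device{}());
      std::map<std::string, std::string> index; // plain path -> garbled name
      for (const char *path : { "plain_dir/customers.txt",
                                "plain_dir/ledger.db" })
          index[path] = garbled_name(gen);

      // The index file would be encrypted along with everything else,
      // so an attacker sees only the garbled names.
      std::ofstream out("filemap.idx");
      for (const auto &e : index)
          out << e.second << "\t" << e.first << "\n";
  }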

> Better yet, using rexecsync [1] to completely avoid temporary files,
> I'd like to do
>
>   [EMAIL PROTECTED] rexecsync 'ssh [EMAIL PROTECTED]' \
>             'tar clf - / | rsyncrypto - - keyfile key' \
>             /backup/fiasco/today
>
> (or its secure sudo-based equivalent). Is that a pipe dream?

>   Eran

I hadn't known about rexecsync. I don't think I'll start using it for
this particular use, but it may come in handy for other things. I may
even package it for Debian...

> [1] http://tromer.org/misc/rexecsync
>     "rexecsync: Run a command remotely and save its output to a
>      local file, using a difference-based algorithm to reduce
>      communication on subsequent updates."
--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html

