Re: file checksums

2000-05-11 Thread Thomas Pornin

On Wed, May 10, 2000 at 05:30:53PM -0400, Theodore Y. Ts'o wrote:
 Consider that block-swapping attacks can preserve the checksum even
 though the attacker doesn't know the underlying data.

That's why I also added:

 each 64-bit (or 128-bit) block is enciphered with a different key.

From a theoretical point of view, a good encryption function is
indistinguishable from a random permutation of the message space.
Moreover, two instances of this function, with two different keys,
should have no more relation between them than two random
permutations. Otherwise, we speak of "related keys", and this is
considered a valid attack (although an academic one).

The AES candidates have been designed to achieve these goals. And
it appears that 3DES is also good in that respect. Actually, 3DES
might be a very good choice for what I intend because:
-- it has been well studied, by many smart people and for many years
-- in bitslice mode, it is efficient, with a free key schedule
-- it has 168-bit keys

About bitslice: the trick (rediscovered by Biham and Biryukov in
1997) is to compute the function with bit-by-bit logic, as if it
were specialized hardware, so there is one bit of data in each
register/variable. With 64-bit registers, we therefore have 64
parallel instances of 3DES, each running on its own data and with its
own key. Bitslice makes fixed permutations free: they become a routing
problem resolved at compile time, not at runtime. So the key schedule
becomes free, too, and it is easy to make each instance run with its
own key. S-boxes become rather expensive, but altogether bitslice is
a big win. Bitslice was used by distributed.net for their DES-cracking
challenge.
I happen to be writing these lines on an Alpha machine: 64 bits per
register, and there are exactly 64 64-bit blocks in 512 bytes (a
hard-disk sector). Just what was needed.
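
To make the idea concrete, here is a minimal C sketch of the bitslice
principle. It is a toy one-bit logic function, not DES: the only point
is that each 64-bit word carries one bit from each of 64 independent
instances, so a single bitwise operation advances all 64 instances at
once.

/*
 * Toy bitslice illustration (not actual DES/3DES code).  Each uint64_t
 * holds one bit position taken from 64 independent instances, so one
 * bitwise operation evaluates the same logic for all 64 instances in
 * parallel.  Fixed permutations are just a renaming of variables at
 * compile time, hence free.
 */
#include <stdint.h>
#include <stdio.h>

static uint64_t toy_sbox_bit(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & b) ^ c;     /* one expression == 64 parallel evaluations */
}

int main(void)
{
    /* bit i of each word belongs to instance i (its own data, its own key) */
    uint64_t a = 0xAAAAAAAAAAAAAAAAULL;
    uint64_t b = 0xF0F0F0F0F0F0F0F0ULL;
    uint64_t c = 0x123456789ABCDEF0ULL;

    printf("%016llx\n", (unsigned long long)toy_sbox_bit(a, b, c));
    return 0;
}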

About the key diversification: each 64-bit block is numbered on, say,
40 bits (this leaves room for an eight-terabyte filesystem). A simple
diversification is the following: XOR this number with the 40 least
significant bits of the master key. It could be argued that, by doing
this, we give the attacker up to 2^40 related keys, so we might
(intuitively) reduce the cost of an exhaustive search by a factor of
2^40. So what? We still have 128 "secure" bits, which is considered
safe nowadays (I personally think that 80 bits are sufficient, but 128
bits are some sort of standard; AES required 128-bit keys).
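
A minimal sketch of this diversification, assuming (only for
illustration) that the 168-bit 3DES master key is held as three 64-bit
words, with k[0] carrying the least significant key bits:

/*
 * Per-block key diversification: XOR the 40-bit block number into the
 * 40 least significant bits of the master key.  The key representation
 * below is an assumption made for the sketch, not part of the scheme.
 */
#include <stdint.h>

struct tdes_key {
    uint64_t k[3];              /* k[0] holds the least significant key bits */
};

static struct tdes_key diversify(struct tdes_key master, uint64_t block_no)
{
    struct tdes_key per_block = master;

    /* 2^40 blocks of 64 bits = 8 TB of addressable data */
    per_block.k[0] ^= block_no & ((1ULL << 40) - 1);
    return per_block;
}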


I am rather confident that there is no security problem in my scheme,
although I am of course discussing it with other people (this mailing
list is not my only source of information).
What I do not quite know is the feasibility of the implementation; my
two main questions are:
-- will 3DES be fast enough?
-- will the checksum calculation be too expensive?

To the first question I will soon be able to provide an answer. I have
been working on this sort of thing (DES, bitslice...) for a long time,
and the pieces are coming together. By "fast enough" I mean 100 MB/s on
a recent Alpha machine. With a 10 MB/s disk, we could then read from and
write to the disk using only 10% of the CPU, which is acceptable as a
default setting.

To answer the second question, I need the experience of the filesystem
people, who know how a filesystem is typically used, and who are
supposed to at least lurk on this mailing list. Hence this discussion.
It is my understanding that typical filesystem write usage consists of
creating new files and filling them, truncating files to zero length and
refilling them, or appending to files. For these operations, the checksum
cost is zero (in terms of disk accesses). This is not true for mmap()ed
files (databases, production of executables...), but I think (and I beg
for comments) that this may be handled with a per-file exception (for
instance, producing the executable file in /tmp and then copying it
into place -- since /tmp may be emptied at boot time, it is pointless to
ensure data integrity in /tmp when the machine is not powered).
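
For concreteness, here is a sketch of why appends are free while
overwrites need a read-back, assuming the simple XOR-of-64-bit-words
checksum from the original proposal further down this thread; the
function names are hypothetical.

/*
 * Appending: the implicit zero padding contributes nothing to an XOR,
 * so only the new words need to be folded in -- no read-back needed.
 * Overwriting in place: the old word must first be reread from disk so
 * that its contribution can be cancelled out.
 */
#include <stddef.h>
#include <stdint.h>

static uint64_t ckm_append(uint64_t checksum,
                           const uint64_t *new_words, size_t n)
{
    for (size_t i = 0; i < n; i++)
        checksum ^= new_words[i];
    return checksum;
}

static uint64_t ckm_overwrite(uint64_t checksum,
                              uint64_t old_word, uint64_t new_word)
{
    return checksum ^ old_word ^ new_word;
}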


Any comment is welcome, of course. Thank you for spending part of your
time reading my prose.


--Thomas Pornin



Re: file checksums

2000-05-11 Thread Erez Zadok

In message [EMAIL PROTECTED], Thomas Pornin writes:
[...]
 To answer the second question, I need the experience of the filesystem
 people, who know how a filesystem is typically used, and who are
 supposed to at least lurk on this mailing list. Hence this discussion.
 It is my understanding that typical filesystem write usage consists of
 creating new files and filling them, truncating files to zero length and
 refilling them, or appending to files. For these operations, the checksum
 cost is zero (in terms of disk accesses). This is not true for mmap()ed
 files (databases, production of executables...), but I think (and I beg
 for comments) that this may be handled with a per-file exception (for
 instance, producing the executable file in /tmp and then copying it
 into place -- since /tmp may be emptied at boot time, it is pointless to
 ensure data integrity in /tmp when the machine is not powered).
 
 
 Any comment is welcome, of course. Thank you for spending part of your
 time reading my prose.
 
 
   --Thomas Pornin

Thomas, f/s usage patterns vary of course.  There are at least three papers
you should take a look at concerning this topic:

L. Mummert and M. Satyanarayanan. 
Long Term Distributed File Reference Tracing: Implementation and Experience. 
Technical Report CMU-CS-94-213. 
Carnegie Mellon University, Pittsburgh, U.S., 1994. 

Werner Vogels
File system usage in Windows NT 4.0
SOSP '99

D. Roselli, J. R. Lorch, and T. E. Anderson. 
A Comparison of File System Workloads. 
To appear in USENIX Conf. Proc., June 2000.
(You're going to have to wait to read that one, unless you can get an
advance copy from the authors.  It's an interesting paper.)

Also, I've had to revisit this issue recently when I added size-changing
support to my stackable templates (for compression, CBC encryption, all
kinds of encoding, etc.).  You can get a copy of a paper (entitled
"Performance of Size-Changing Algorithms in Stackable File Systems") at
http://www.cs.columbia.edu/~ezk/research/fist/.

Cheers,
Erez.



Re: file checksums

2000-05-11 Thread Jan Harkes

On Thu, May 11, 2000 at 04:30:58AM -0400, Erez Zadok wrote:
 Thomas, f/s usage patterns vary of course.  There are at least three papers
 you should take a look at concerning this topic:
 
 L. Mummert and M. Satyanarayanan. 
 Long Term Distributed File Reference Tracing: Implementation and Experience. 
 Technical Report CMU-CS-94-213. 
 Carnegie Mellon University, Pittsburgh, U.S., 1994. 

You mean this one?

  Long Term Distributed File Reference Tracing:
   Implementation and Experience  (# 45)
  Mummert, L.B., Satyanarayanan, M. 
  Software-Practice and Experience June 1996, Volume 26, No. 6 

  Earlier version appeared as:
  CMU SCS Technical Report, Nov. 1994, CMU-CS-94-213 

Incidentally, Tom Kroeger from UCSC pulled most of the traces off
the tapes in Fall 1998 and burned them onto CD-ROMs. For more info, see

http://www.coda.cs.cmu.edu/DFSTrace/

Jan




Re: file checksums

2000-05-10 Thread Thomas Pornin

On Tue, May 09, 2000 at 03:13:40PM -0400, Theodore Y. Ts'o wrote:
 ... and what prevents the attacker from simply updating the checksum
 when he's modifying the blocks?  

As you may not have noticed, I am talking about a block device where
all data is enciphered. To be more specific, each 64-bit (or 128-bit)
block is enciphered with a different key. The attacker has access
neither to the data nor to the checksum. However, he knows where these
items are, and may perform modifications (although their effect would
be essentially random). Hence the checksum.

 Clearly you don't understand cryptographic checksums.

Sarcasm ignored. I have been studying cryptography for the last 5 years.

--Thomas Pornin



Re: file checksums

2000-05-10 Thread Erez Zadok

In message [EMAIL PROTECTED], Thomas Pornin writes:
 On Tue, May 09, 2000 at 03:13:40PM -0400, Theodore Y. Ts'o wrote:
  ... and what prevents the attacker from simply updating the checksum
  when he's modifying the blocks?  
 
 As you may not have noticed, I am talking about a block device where
 all data is enciphered. To be more specific, each 64-bit (or 128-bit)
 block is enciphered with a different key. The attacker has access
 neither to the data nor to the checksum. However, he knows where these
 items are, and may perform modifications (although their effect would
 be essentially random). Hence the checksum.
 
  Clearly you don't understand cryptographic checksums.
 
 Sarcasm ignored. I have been studying cryptography for the last 5 years.

Thomas, I must agree with Ted.  I've written a cryptographic f/s myself,
mostly as an exercise in stacking, and have also written several papers on
related topics.  What you've described in this thread didn't convince me
that this is a strong and secure design.  Perhaps it wasn't explained in
enough detail.  Either way, you should follow Ted's advice and pass your
detailed design by some of the security-oriented mailing lists.  (Warning:
security buffs aren't as polite in their criticism as the people on this
list... :-)

The reasons you've explained for separate checksums aren't very compelling.
You can get most of the same benefit by using a different cipher, perhaps
in CBC mode.  This would allow you to detect corruption or attacks in the
middle of a file, if that's what you're concerned about.

If you haven't already, I suggest you also read up on some of the more
prominent papers in the area of secure file systems:

- Matt Blaze's CFS
- Mazieres's SFS (from the last SOSP)
- others you can follow from those two


Since I'm partial to stacking anyway, let me suggest an alternate design
using stacking.  If you can pull that off, you'll have the advantage of a
f/s that works on top of any other f/s.  This allows safe and fast backups
of ciphertext, for example (assuming you're backing up via the f/s, not
dump-ing the raw device).

- use overlay stacking, so you hide the mount point.  This also helps in
  hiding certain files.

- for each file foo, you create a file called foo.ckm, which will contain
  your checksum information in whatever way you choose (a hypothetical
  layout is sketched after this list).  You'll have to come up with a
  fast, reliable, incremental checksum algorithm.  It may exist.  I don't
  know.  If you use the XOR you've suggested, you're not that secure.  If
  you use MD5, you waste CPU because you'll have to re-compute the
  checksum on every tiny change to the file.  The .ckm file may contain
  checksum info for the whole file, per block, or whatever you choose.
  Your f/s will manage that file any way it wants.

- make sure that the .ckm files aren't listed by default: this means hacking
  readdir and lookup, possibly others.

- make sure that only authenticated, authorized users can view/access/modify
  the .ckm files (if you want that).  You can do that using special-purpose
  ioctls that pass (one-time?) keys to the f/s.  You can use new ioctls in
  general to create a whole API for updating the .ckm files.

- limit root attacks as much as possible.  One of my cryptfs versions stored
  keys in memory only, hashed by the uid and the session ID of the process.
  The SID was added to make it harder for root users to decrypt other
  users' files.  It's not totally safe, but every bit helps.
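
As a purely hypothetical illustration of the .ckm idea above, a
per-block layout might look like the following; the names, sizes and
the 4 KB granularity are assumptions for the sketch, not part of any
existing FiST template.

/*
 * Hypothetical on-disk layout for a per-file "foo.ckm" checksum file.
 * A header is followed by one digest entry per data block, so a small
 * overwrite only forces recomputing the digests of the touched blocks
 * rather than a whole-file MD5.
 */
#include <stdint.h>

#define CKM_MAGIC      0x434b4d31u      /* "CKM1" */
#define CKM_BLOCK_SIZE 4096u            /* file bytes covered per digest */
#define CKM_DIGEST_LEN 16u              /* e.g. one MD5 digest per block */

struct ckm_header {
    uint32_t magic;                     /* identifies a .ckm file        */
    uint32_t block_size;                /* bytes of file data per digest */
    uint64_t file_size;                 /* size of the protected file    */
};

struct ckm_entry {
    uint8_t digest[CKM_DIGEST_LEN];     /* checksum of one data block    */
};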

You can do much of the above with my stacking templates.  I've distributed
f/s examples showing how to achieve various features.  You can get more info
from http://www.cs.columbia.edu/~ezk/research/fist/.  Either way, note that
this stackable method still doesn't give you all the security you want:
attackers can still get at the raw device of the lower f/s; there is a
window of opportunity between the lower mount and the stackable f/s mount;
anything that's in memory or can swap/page over the network is vulnerable;
and more.

You will find that there's a tremendous amount of effort and many details
that must be addressed to build a secure file system.  I for one would love
to see ultra-secure, fast cryptographic file systems become a standard
component of operating systems.

Good luck.

Erez.



Re: file checksums

2000-05-09 Thread Theodore Y. Ts'o

   Date: Tue, 9 May 2000 10:10:01 +0200
   From: Thomas Pornin [EMAIL PROTECTED]

   My "attack model" is offline: the disk is analysed, possibly modified,
   while the machine is not powered; a somehow similar situation is the
   following: I send (by snail mail) a Jaz disk to a friend. It could be
   intercepted, so it should not be readable by anyone except my friend,
   and it might be modified and sent again.

   Since the encryption aims at being transparent, it does not provide
   any protection against online attacks, when the attacker gains root
   privilege.

... and what prevents the attacker from simply updating the checksum
when he's modifying the blocks?  

Worse yet, you're talking about using an XOR checksum.  It's utterly
trivial to modify data in such a way that the new data has the same XOR
checksum, so an XOR checksum buys you very, very little.  You could
use a cryptographic checksum, such as MD5, but such checksums are much
larger (they require 128 bits, at least), and you can no longer "take
off" the contribution of the old data; otherwise it would be too easy
for an attacker to do the same thing.  You instead have to recalculate
the MD5 checksum from scratch each time you modify the file in any way.
(With a modified MD5 checksum you can append to the file as long as the
size of the file is an exact multiple of 16 bytes and you are appending
in 16-byte chunks, but that modification also weakens the resulting MD5
checksum, for similar reasons.)
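
A tiny illustration of how easy the forgery is: XORing the same delta
into any two 64-bit words leaves the XOR of all the words unchanged.

/*
 * An XOR checksum is preserved by complementary modifications: tamper
 * with one word and compensate in another, and the checksum over all
 * words does not change.
 */
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t w[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };
    uint64_t before = w[0] ^ w[1] ^ w[2] ^ w[3];

    uint64_t delta = 0xdeadbeefULL;     /* arbitrary modification      */
    w[1] ^= delta;                      /* tamper with one word ...    */
    w[3] ^= delta;                      /* ... compensate in another   */

    uint64_t after = w[0] ^ w[1] ^ w[2] ^ w[3];
    assert(before == after);            /* checksum is unchanged       */
    return 0;
}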

Clearly you don't understand cryptographic checksums.  I suggest
that you find a copy of Bruce Schneier's Applied Cryptography and read
it.

By storing the checksum in the filesystem, the only thing this protects
you against is attackers who modify the filesystem via the block
device.  This is actually a relatively rare attack path.

   A short (but real) story to illustrate the need for on-the-fly,
   transparent encryption: in a cryptography-oriented lab, some data was
   considered "sensitive" (commercial secrets, and other stuff). On a Sun
   workstation, a disk broke. Sun agreed to replace it for free; all they
   asked was that the previous disk be returned. However, it was stamped
   "sensitive", so it could not be sent to Sun. Two years later, the
   workstation had not yet been powered up again.

   The problem was that it was impossible to wipe the disk before sending
   it, since it was broken. All this could have been avoided if any data
   ever written to the disk had been enciphered.

That's a silly story!  If they really needed to return the disk, all
they'd have to do is swipe it a couple of times over a gigantic magnetic
degausser, and that would wipe the disk clean.  At the NSA, where a
friend of mine used to work, they had deals with companies such as HP
where they would return the disks without the platters in them, or, if
the company really wanted the platters, they'd sandblast the magnetic
oxide off the disks first and then return the sandblasted platters.

- Ted



Re: file checksums

2000-05-08 Thread Theodore Y. Ts'o

   Date: Sun, 7 May 2000 15:12:49 +0200
   From: Thomas Pornin [EMAIL PROTECTED]

   I am wondering whether it would be easy to maintain, at the
   filesystem level, a per-file checksum of the data. I am thinking of a
   64-bit value which would be the exclusive or of all 64-bit words in the
   file (the file being padded with 0 up to the next block limit).

   If my understanding of the structure of the ext2 filesystem is correct,
   we would need the following:
   -- 64 bits free in each inode, in order to store the checksum
   -- whenever some data is written over old data in the same file, the old
      data must be reread, so that the checksum can be updated accordingly
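
For reference, a minimal sketch of the checksum described in the quoted
proposal (illustration only; the zero padding is free, since zero words
do not change an XOR):

#include <stddef.h>
#include <stdint.h>

/* XOR of all 64-bit words of the file, the file being padded with
 * zeros up to the next 64-bit boundary. */
static uint64_t file_xor_checksum(const uint64_t *words, size_t nwords)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < nwords; i++)
        sum ^= words[i];
    return sum;
}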

If the checksum is updated automatically when the file is modified, then
it doesn't protect you in the case where the attacker has breached root
(or some other user/group privileges) and then uses that privilege to
modify the file.  This is why all of the programs which provide this kind
of crypto checksum store the checksums off-line, on some trusted media
(for example, on a CD-ROM, so that the attacker can't modify the
checksums).

By storing the checksum in the filesystem, the only thing this protects
you against is attackers who modify the filesystem via the block
device.  This is actually a relatively rare attack path.  (However,
storing the checksums off-line on a secure, trusted store protects you
against both attacks).

   Besides encryption, the main cost is that overwritten data needs to be
   read again. I am under the impression that, under typical disk usage,
   files are truncated to zero size and rewritten from scratch, which does
   not imply any additional cost in my design.

There are a number of files where this isn't the case.  Database files
and ELF files written by the GCC toolchain come to mind as immediate
examples.  There are also many files to which you are constantly
appending --- log files, for example.  If each time you open and append
to a file you have to reread the entire log (which could in some cases
grow to hundreds of megabytes), the resulting performance degradation
would be somewhat non-optimal.  :-)

- Ted