Re: file checksums
On Wed, May 10, 2000 at 05:30:53PM -0400, Theodore Y. Ts'o wrote:
> Consider that block-swapping attacks can preserve the checksum even
> though the attacker doesn't know the underlying data.

That's why I also added: each 64-bit (or 128-bit) block is enciphered with a different key. From a theoretical point of view, a good encryption function is indistinguishable from a random permutation of the message space. Moreover, two instances of this function, with two different keys, should not be any more related to each other than two random permutations are. Otherwise, this is called a "related-key" weakness and is considered a valid attack (although an academic one). The AES candidates were designed to achieve these goals, and it appears that 3DES is also good in that respect.

Actually, 3DES might be a very good choice for what I intend, because:
-- it has been well studied, by many smart people and for many years
-- in bitslice mode, it is efficient, with a free key schedule
-- it has 168-bit keys

About bitslice: the trick (rediscovered by Biham and Biryukov in 1997) is to compute the function with bit-by-bit logic, as if it were specialized hardware, so there is one bit of data in each register/variable. With 64-bit registers, we get 64 parallel instances of 3DES, each running on its own data and with its own key. Bitslicing makes fixed permutations free: they become a routing problem resolved at compile time, not at runtime. The key schedule therefore becomes free too, and it is easy to make each instance run with its own key. S-boxes become rather expensive, but altogether bitslicing is a big win; it was used by distributed.net for their DES-cracking challenge. And I happen to be writing these lines on an Alpha machine: 64 bits per register, and there are exactly 64 64-bit blocks in 512 bytes (a hard-disk sector). Just what was needed.

About key diversification: each 64-bit block is numbered on, say, 40 bits (which leaves room for an eight-terabyte filesystem).
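To make the bitslice idea concrete, here is a toy sketch (not real DES: the round function below is invented purely for illustration) showing how holding one bit of each instance's state per register bit turns plain bitwise operators into 64 parallel 1-bit computations, each lane with its own key bit:

```python
import random

MASK = (1 << 64) - 1  # one 64-bit register, as on the Alpha

def round_scalar(a, b, k):
    # A made-up 1-bit "round function" standing in for real cipher logic.
    return ((a ^ k) & b) | ((b ^ 1) & a)

def round_sliced(A, B, K):
    # The same function evaluated on 64 independent instances at once:
    # bit i of A, B and K belongs to instance i.  Each instance gets its
    # own key bit -- the "free" per-instance keying mentioned above.
    return (((A ^ K) & B) | ((B ^ MASK) & A)) & MASK

A, B, K = (random.getrandbits(64) for _ in range(3))
out = round_sliced(A, B, K)
for i in range(64):                # every lane agrees with the
    lane = lambda w: (w >> i) & 1  # bit-at-a-time computation
    assert lane(out) == round_scalar(lane(A), lane(B), lane(K))
```

Fixed permutations are free in this representation because they only rename which variable holds which bit, a decision made at compile time.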
A simple diversification is the following: XOR this number with the 40 least significant bits of the master key. It could be argued that, by doing this, we hand the attacker up to 2^40 related keys, so we might (intuitively) reduce the cost of an exhaustive search by a factor of 2^40. So what? We still have 128 "secure" bits, which is considered safe nowadays (I personally think that 80 bits are sufficient, but 128 bits are some sort of standard; AES required 128-bit keys). I am rather confident that there is no security problem in my scheme, although I am of course discussing it with other people (this mailing list is not my only source of information).

What I do not quite know is the feasibility of the implementation; my two main questions are:
-- will 3DES be fast enough?
-- will the checksum calculation be expensive?

To the first question I will soon be able to provide an answer; I have been working on this sort of thing (DES, bitslicing...) for a long time, and the pieces are coming together. By "fast enough" I mean 100 MB/s on a recent Alpha machine: on a 10 MB/s disk, we could then read and write using only 10% of the CPU, which is acceptable as a default setting.

To answer the second question, I need the experience of the filesystem guys, who know how a filesystem is typically used, and who are supposed to at least lurk on this mailing list. Hence this discussion. It is my understanding that typical filesystem write usage consists of creating new files and filling them, truncating files to zero length and refilling them, and appending to files. For these operations, the checksum cost is zero (in terms of disk accesses). This is not true for mmap()ed files (databases, production of executables...)
but I think (and I beg for comments) that this may be handled with a per-file exception (for instance, producing the executable file in /tmp and then copying it into place -- since /tmp may be emptied at boot time, it is pointless to ensure data integrity in /tmp while the machine is not powered).

Any comment is welcome, of course. Thank you for spending part of your time reading my prose.

--Thomas Pornin
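For concreteness, the 40-bit key diversification described above might look like this (a sketch; the function name and the example key are illustrative, not from the original posting):

```python
def diversify(master_key: int, block_number: int) -> int:
    """Derive a per-block 3DES key by XORing the 40-bit block number
    into the 40 least significant bits of the 168-bit master key."""
    assert 0 <= block_number < 1 << 40  # 2**40 64-bit blocks = 8 TB
    return master_key ^ block_number

# 42 hex digits = 168 bits, the 3DES key size mentioned above:
master = 0x0123456789ABCDEF0123456789ABCDEF0123456789

k0 = diversify(master, 0)
k1 = diversify(master, 1)
assert k0 == master              # block 0 keeps the master key as-is
assert k0 != k1                  # every block gets a distinct key
assert (k0 >> 40) == (k1 >> 40)  # only the low 40 bits ever differ,
                                 # leaving 128 untouched "secure" bits
```

The 8 TB bound follows from 2^40 blocks of 8 bytes each, i.e. 2^43 bytes.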
Re: file checksums
In message [EMAIL PROTECTED], Thomas Pornin writes:
> [...] To answer the second question, I need the experience of the
> filesystem guys, who know how a filesystem is typically used [...]
> It is my understanding that typical filesystem write usage is either
> creating new files and filling them, truncating files to zero length
> and filling them, or appending to files. For these operations, the
> checksum cost is zero (in terms of disk accesses). This is not true
> for mmap()ed files (databases, production of executables...) [...]

Thomas, f/s usage patterns vary, of course. There are at least three papers you should take a look at on this topic:

- L. Mummert and M. Satyanarayanan. Long Term Distributed File Reference Tracing: Implementation and Experience. Technical Report CMU-CS-94-213, Carnegie Mellon University, Pittsburgh, 1994.
- W. Vogels. File System Usage in Windows NT 4.0. SOSP '99.
- D. Roselli, J. R. Lorch, and T. E. Anderson. A Comparison of File System Workloads. To appear in USENIX Conf. Proc., June 2000. (You're going to have to wait to read that one, unless you can get an advance copy from the authors. It's an interesting paper.)

Also, I had to visit this issue recently when I added size-changing support to my stackable templates (for compression, CBC encryption, all kinds of encoding, etc.). You can get a copy of a paper (entitled "Performance of Size-Changing Algorithms in Stackable File Systems") at http://www.cs.columbia.edu/~ezk/research/fist/.

Cheers,
Erez.
Re: file checksums
On Thu, May 11, 2000 at 04:30:58AM -0400, Erez Zadok wrote:
> There are at least three papers you should take a look at concerning
> this topic: L. Mummert and M. Satyanarayanan. Long Term Distributed
> File Reference Tracing: Implementation and Experience. Technical
> Report CMU-CS-94-213, Carnegie Mellon University, Pittsburgh, 1994.

You mean this one?

  Long Term Distributed File Reference Tracing: Implementation and
  Experience (# 45). Mummert, L.B., Satyanarayanan, M.
  Software--Practice and Experience, June 1996, Volume 26, No. 6.
  An earlier version appeared as CMU SCS Technical Report
  CMU-CS-94-213, Nov. 1994.

Incidentally, Tom Kroeger from UCSC pulled most of the traces off the tapes in Fall 1998 and burned them onto CD-ROMs. For more info, see http://www.coda.cs.cmu.edu/DFSTrace/

Jan
Re: file checksums
On Tue, May 09, 2000 at 03:13:40PM -0400, Theodore Y. Ts'o wrote:
> ... and what prevents the attacker from simply updating the checksum
> when he's modifying the blocks?

As you may not have noticed, I am talking about a block device where all data is enciphered. To be more specific, each 64-bit (or 128-bit) block is enciphered with a different key. The attacker has access neither to the data nor to the checksum. However, he knows where these items are, and may perform modifications (although their effect would be essentially random). Hence the checksum.

> Clearly you don't understand about cryptographic checksums.

Sarcasm ignored. I have been studying cryptography for the last 5 years.

--Thomas Pornin
Re: file checksums
In message [EMAIL PROTECTED], Thomas Pornin writes:
> On Tue, May 09, 2000 at 03:13:40PM -0400, Theodore Y. Ts'o wrote:
> > ... and what prevents the attacker from simply updating the checksum
> > when he's modifying the blocks?
>
> As you may not have noticed, I am talking about a block device where
> all data is enciphered. To be more specific, each 64-bit (or 128-bit)
> block is enciphered with a different key. The attacker has access
> neither to the data nor to the checksum. However, he knows where these
> items are, and may perform modifications (although they would be
> essentially random). Hence the checksum.
>
> > Clearly you don't understand about cryptographic checksums.
>
> Sarcasm ignored. I have been studying cryptography for the last 5 years.

Thomas, I must agree with Ted. I've written a cryptographic f/s myself, mostly as an exercise in stacking, and I've also written several papers on related topics. What you've described in this thread didn't convince me that it is a strong and secure design. Perhaps it wasn't explained in enough detail. Either way, you should follow Ted's advice and pass your detailed design by some of the security-oriented mailing lists. (Warning: security buffs aren't as polite in their criticism as the people on this list... :-)

The reasons you've given for separate checksums aren't very compelling. You can get most of the benefit by using a different cipher, perhaps in CBC mode. This would allow you to detect corruptions or attacks in the middle of a file, if that's what you're concerned about. If you haven't already, I suggest you also read some of the more prominent papers in the area of secure file systems:

- Matt Blaze's CFS
- Mazieres's SFS (from the last SOSP)
- others you can follow from those two

Since I'm partial to stacking anyway, let me suggest an alternate design using stacking. If you can pull it off, you'll have the advantage of a f/s that works on top of any other f/s.
This allows safe and fast backups of ciphertext, for example (assuming you're backing up via the f/s, not dump-ing the raw device).

- Use overlay stacking, so you hide the mount point. This also helps in hiding certain files.
- For each file foo, create a file called foo.ckm, which will contain your checksum information in whatever form you choose. You'll have to come up with a fast, reliable, incremental checksum algorithm. It may exist; I don't know. If you use the XOR you've suggested, you're not that secure. If you use MD5, you waste CPU, because you'll have to re-compute the checksum on every tiny change to the file. The .ckm file may contain checksum info for the whole file, per block, or whatever you choose. Your f/s will manage that file any way it wants.
- Make sure that the .ckm files aren't listed by default: this means hacking readdir and lookup, and possibly other operations.
- Make sure that only authenticated, authorized users can view/access/modify the .ckm files (if you want that). You can do it using special-purpose ioctls that pass (one-time?) keys to the f/s. You can use new ioctls in general to create a whole API for updating the .ckm files.
- Limit root attacks as much as possible. One of my cryptfs versions stored keys in memory only, hashed by the uid and the session ID of the process. The SID was added to make it harder for root users to decrypt other users' files. It's not totally safe, but every bit helps.

You can do much of the above with my stacking templates; I've distributed f/s examples showing how to achieve various features. You can get more info from http://www.cs.columbia.edu/~ezk/research/fist/. Either way, note that this stackable method still doesn't give you all the security you want: attackers can still get at the raw device of the lower f/s; there is a window of opportunity between the lower mount and the stackable f/s mount; anything that's in memory or can swap/page over the network is vulnerable; and more.
You will find that there's a tremendous amount of effort and many details that must be addressed to build a secure file system. I for one would love to see ultra-secure, fast cryptographic file systems become a standard component of operating systems. Good luck. Erez.
Re: file checksums
Date: Tue, 9 May 2000 10:10:01 +0200
From: Thomas Pornin [EMAIL PROTECTED]

> My "attack model" is offline: the disk is analysed, possibly modified,
> while the machine is not powered. A somewhat similar situation is the
> following: I send (by snail mail) a Jaz disk to a friend. It could be
> intercepted, so it should not be readable by anyone except my friend,
> and it might be modified and sent again. Since the encryption aims at
> being transparent, it does not provide any protection against online
> attacks, where the attacker gains root privilege.

... and what prevents the attacker from simply updating the checksum when he's modifying the blocks? Worse yet, you're talking about using an XOR checksum. It's utterly trivial to modify data in such a way that the new data has the same XOR checksum, so an XOR checksum buys you very, very little.

You could use a cryptographic checksum, such as MD5, but such checksums are much larger (they require 128 bits, at least), and you can no longer "take off" the contribution of the old data; otherwise, it would be too easy for an attacker to do the same thing. You instead have to recalculate the MD5 checksum from scratch each time you modify the file in any way. (With a modified MD5 checksum you can append to the file as long as the size of the file is an exact multiple of 16 bytes and you are appending in 16-byte chunks, but that modification also weakens the resulting MD5 checksum, for similar reasons.)

Clearly you don't understand cryptographic checksums. I suggest that you find a copy of Bruce Schneier's Applied Cryptography and read it.

By storing the checksum in the filesystem, the only thing this protects you against is attackers who modify the filesystem via the block device. This is actually a relatively rare attack path.

> A short (but real) story to illustrate the need for on-the-fly,
> transparent encryption: in a cryptography-oriented lab, some data was
> considered "sensitive" (commercial secrets, among other things).
> On a Sun workstation, a disk broke. Sun agreed to replace it for free;
> all they asked was that the previous disk be returned. However, it was
> stamped "sensitive", so it could not be sent to Sun. Two years later,
> the station had not yet been powered up again. The problem was that it
> was impossible to wipe the disk before sending it, since it was
> broken. All this could have been avoided if all data ever written to
> the disk had been enciphered.

That's a silly story! If they really needed to return the disk, all they'd have to do is swipe it a couple of times over a gigantic magnetic degausser, and that would wipe the disk clean. At the NSA, where a friend of mine used to work, they had deals with companies such as HP under which they would return disks without the platters in them; or, if the company really wanted the platters, they'd sandblast the magnetic oxide off first and then return the sandblasted platters.

- Ted
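Ted's point about the XOR checksum being trivially forgeable on plaintext is easy to demonstrate: swapping two words, or XORing the same mask into any two words, leaves the checksum unchanged. A minimal sketch (the block values are made up for illustration):

```python
from functools import reduce

def xor_checksum(words):
    """64-bit XOR of all 64-bit words in a file -- the proposed scheme."""
    return reduce(lambda a, b: a ^ b, words, 0)

data = [0x1111, 0x2222, 0x3333, 0x4444]

# Forgery 1: reorder the data -- XOR is commutative, so the sum is blind
# to block-swapping.
swapped = [0x3333, 0x2222, 0x1111, 0x4444]
assert xor_checksum(swapped) == xor_checksum(data)

# Forgery 2: flip the same bits in any two words -- the changes cancel.
mask = 0xDEAD
tweaked = [data[0] ^ mask, data[1] ^ mask] + data[2:]
assert xor_checksum(tweaked) == xor_checksum(data)
```

Thomas's rebuttal elsewhere in the thread is that these manipulations happen on ciphertext enciphered with per-block keys, so after decryption the forged blocks become essentially random plaintext.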
Re: file checksums
Date: Sun, 7 May 2000 15:12:49 +0200
From: Thomas Pornin [EMAIL PROTECTED]

> I am asking myself whether it would be easy to maintain, at the
> filesystem level, a per-file checksum of the data. I am thinking of a
> 64-bit value which would be the exclusive-or of all 64-bit words in
> the file (the file being padded with zeros up to the next block
> limit). If my understanding of the structure of the ext2 filesystem
> is correct, we would need the following:
> -- 64 bits free in each inode, in order to store the checksum
> -- whenever some data is written over old data in the same file, the
>    old data must be reread, so that the checksum can be updated
>    accordingly

If the checksum is updated automatically when the file is modified, then it doesn't protect you in the case where the attacker has breached root (or some other user/group privilege) and then uses that privilege to modify the file. This is why all of the programs which provide this kind of crypto checksum store the checksums off-line, on some trusted medium. (Example: on a CD-ROM, so that the attacker can't modify the checksum.)

By storing the checksum in the filesystem, the only thing this protects you against is attackers who modify the filesystem via the block device. This is actually a relatively rare attack path. (Storing the checksums off-line on a secure, trusted store, however, protects you against both attacks.)

> Besides encryption, the main cost is that data overwritten needs to
> be read again. I am under the impression that, under typical disk
> usage, files are truncated to zero size and rewritten from scratch,
> which does not imply additional cost in my design.

There are a number of files for which this isn't the case: database files, and ELF files written by the GCC toolchain, come to mind as immediate examples. There are also many files to which you are constantly appending --- log files, for example.
If, each time you open and append to a file, you have to reread the entire log (which could in some cases grow to hundreds of megabytes), the resulting performance degradation would be somewhat non-optimal. :-)

- Ted
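The cost difference under discussion can be sketched as follows: the XOR checksum admits an O(1) incremental update (at the price of one read-back of the overwritten word, which is the extra disk access Thomas's design accepts), while a cryptographic digest such as MD5 offers no shortcut and must re-read the whole file after any in-place change. Helper names here are illustrative:

```python
import hashlib
from functools import reduce

def xor_checksum(words):
    return reduce(lambda a, b: a ^ b, words, 0)

def xor_update(old_sum, old_word, new_word):
    # In-place overwrite: re-read only the one word being replaced.
    # XORing the old word out and the new word in fixes the sum in O(1).
    return old_sum ^ old_word ^ new_word

file_words = [10, 20, 30, 40]
s = xor_checksum(file_words)

# Overwrite word 2 in place and update the checksum incrementally:
old, new = file_words[2], 99
file_words[2] = new
assert xor_update(s, old, new) == xor_checksum(file_words)

# An MD5 checksum has no such shortcut -- the entire (possibly huge)
# file must be re-read and re-hashed after the same one-word change:
digest = hashlib.md5(bytes(file_words)).hexdigest()
```

This is exactly the trade-off in the thread: XOR is cheap to maintain but forgeable; MD5 resists forgery but turns every small overwrite or append into a full re-read.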