On Thu, Oct 3, 2019 at 9:42 PM Stephen Frost <sfr...@snowman.net> wrote:
> > It doesn't seem like it would require
> > much work at all to construct an argument that a hacker might enjoy
> > having unfettered access to pg_clog even if no other part of the
> > database can be read.
>
> The question isn't about what hackers would like to have access to, it's
> about what would actually provide them with a channel to get information
> that's sensitive, and at what rate. Perhaps there's an argument to be
> made that clog would provide a high enough rate of information that
> could be used to glean sensitive information, but that's certainly not
> an argument that's been put forth, instead it's the knee-jerk reaction
> of "oh goodness, if anything isn't encrypted then hackers will be able
> to get access to everything" and that's just not a real argument.

Well, I gather that you didn't much like my characterization of your argument as "making it up as you go along," which is probably fair, but I doubt that the people who are arguing that we should encrypt anything will appreciate your characterization of their argument as "knee-jerk" any better.

I think everyone would agree that if you have no information about a database other than the contents of pg_clog, that's not a meaningful information leak. You would be able to tell which transactions committed and which transactions aborted, but since you know nothing about the data inside those transactions, it's of no use to you. However, in that situation, you probably wouldn't be attacking the database in the first place. Most likely you have some knowledge about what it contains. Maybe there's a stream of sensor data that flows into the database, and you can see that stream. By watching pg_clog, you can see when a particular bit of data is rejected. That could be valuable.

To take a highly artificial example, suppose that the database is fed by secret video cameras which identify the faces of everyone who boards a commercial aircraft and record all of those names in a database, but high-ranking government officials are exempt from the program and there's a white-list of people whose names can't be inserted. When the system tries, a constraint violation occurs and the transaction aborts. Now, if you see a transaction abort show up in pg_clog, you know that either a high-ranking government official just tried to walk onto a plane, or the system is broken. If you see a whole bunch of aborts within a few hours of each other, separated by lots of successful insertions, maybe you can infer a cabinet meeting. I don't know. That's a little bit of a stretch, but I don't see any reason why something like that can't happen. There are probably more plausible examples.
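
To show how little work reading those outcomes takes, here is a minimal sketch that pulls abort positions out of raw pg_clog bytes. It assumes the layout used by clog.c as I understand it: two status bits per transaction, four transactions per byte, lowest-order bits first. The function names and the sample bytes are made up for illustration, and the slot numbers are relative to the start of the buffer, not real transaction IDs.

```python
# Status codes, assumed to match the values defined in PostgreSQL's clog.h.
IN_PROGRESS, COMMITTED, ABORTED, SUB_COMMITTED = 0, 1, 2, 3

BITS_PER_XACT = 2    # two status bits per transaction
XACTS_PER_BYTE = 4   # so four transactions share one byte


def xact_status(clog_bytes: bytes, slot: int) -> int:
    """Return the 2-bit status stored in the given slot of the buffer."""
    byte = clog_bytes[slot // XACTS_PER_BYTE]
    shift = (slot % XACTS_PER_BYTE) * BITS_PER_XACT
    return (byte >> shift) & 0x03


def aborted_slots(clog_bytes: bytes) -> list:
    """List every slot in the buffer whose status is ABORTED."""
    total = len(clog_bytes) * XACTS_PER_BYTE
    return [s for s in range(total) if xact_status(clog_bytes, s) == ABORTED]


# Hypothetical sample: seven committed transactions and one abort.
sample = bytes([0x55, 0x59])
aborts = aborted_slots(sample)
```

Nothing about the table contents is needed here; an observer who can also see the incoming stream only has to line up the abort positions with what was sent.
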
The point is that it's unreasonable, at least in my view, to decide that the knowledge of which transactions commit and which transactions abort isn't sensitive. Yeah, on a lot of systems it won't be, but on some systems it might be, so it should be encrypted.

What I really find puzzling here is that Cybertec had a patch that encrypted -- well, I don't remember whether it encrypted this, but it encrypted a lot of stuff, and it spent a lot of time being concerned about these exact kinds of issues. I know for example that they thought about the stats file, which is an even more clear vector for information leakage than we're talking about here. They thought about logical decoding spill files, also a clear vector for information leakage. Pretty sure they also thought about WAL. That's all really important stuff, and one thing I learned from reading that patch is that you can't solve those problems in a trivial, mechanical way. Some of those systems currently write data byte-by-byte, and converting them to work block-by-block makes encrypting them a lot easier. So it seems to me that even if you think that patch had the dumbest key management system in the history of the universe, you ought to be embracing some of the ideas that are in that patch because they'll make any future encryption project easier.

Instead of arguing about whether these side-channel attacks are important -- and I seem not to be alone here in believing that they are -- we could be working to get code that has already been written to help solve those problems committed. I ask again -- why are you so opposed to a single-key, encrypt-everything approach? Even if you think multiple-key, encrypt-only-some-things is better, they don't have to block each other.

> Which database systems have you looked at which have the properties
> you're describing above that we should be working hard towards?

I haven't studied other database systems much myself.
I have, however, talked with coworkers of mine who are trying to convince people to use PostgreSQL and/or Advanced Server, and I've heard a lot from them about what the customers with whom they work would like to see. I base my comments on those conversations. What I hear from them is basically that anything we could give them would help. More would be better than less, of course. People would like a solution with key rotation better than one without; fine-grained encryption better than coarse-grained encryption; less performance overhead better than more; and an encryption algorithm perceived as highly secure better than one perceived as less secure. But having anything at all would help.

Secondarily, what I hear is that a lot of EnterpriseDB customers or potential customers reject filesystem encryption not so much because it's not sufficiently fine-grained, but rather because it depends on root@localhost. Getting root@localhost to cooperate is difficult and undesirable, and also filesystem encryption doesn't help at all to protect against root@localhost. I've pointed out repeatedly to many people that putting the encryption inside the database doesn't *really* fix this problem, because root@localhost can ultimately do anything. But, as I said in my earlier email, people perceive that if the filesystem does the encryption, root can just cp all the files and win, whereas if the database does the encryption, that doesn't work, and root's got to work harder. That seems to matter to a lot of people who are talking to my colleagues here at EnterpriseDB.

That may, of course, not matter to your users, and that's fine. I'm not trying to block people from attacking this problem from other angles; but I *am* frustrated that you seem to be trying to block what seems to me to be the most promising angle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company