On Fri, Oct 4, 2019 at 5:49 PM Bruce Momjian <br...@momjian.us> wrote:
> We spend a lot of time figuring out exactly how to safely encrypt WAL,
> heap, index, and pgsql_tmp files. The idea of doing this for another
> 20 types of files --- to find a safe nonce, to be sure a file rewrite
> doesn't reuse the nonce, figuring the API, crash recovery, forensics,
> tool interface --- is something I would like to avoid. I want to avoid
> it not because I don't like work, but because I am afraid the code
> impact and fragility will doom the feature.

I'm concerned about that, too, but there's no getting around the fact that there are a bunch of types of files and that they do all need to be dealt with. If we have a good scheme for doing that, hopefully extending it to additional types of files is not that bad, which would then spare us the trouble of arguing about each one individually, and also be more secure.

As I also said to Stephen, the people who are discussing this here should *really really really* be looking at the Cybertec patch instead of trying to invent everything from scratch - unless that patch has, like, typhoid, or something, in which case please let me know so that I, too, can avoid looking at it. Even if you wanted to use 0% of the code, you could look at the list of file types that they consider encrypting and think about whether you agree with the decisions they made. I suspect that you would quickly find that you've left some things out of your list. In fact, I can think of a couple of pretty clear examples, like the stats files, which clearly contain user data.

Another reason that you should go look at that patch is that it actually tries to grapple with the exact problem that you're worrying about in the abstract: there are a LOT of different kinds of files and they all need to be handled somehow. Even if you can convince yourself that things like pg_clog don't need encryption, which I think is a pretty tough sell, there are a LOT of file types that directly contain user data and do need to be handled. A lot of the code that writes those various types of files is pretty ad-hoc. It doesn't necessarily do nice things like build up a block of data and then write it out together; it may for example write a byte at a time. That's not going to work well for encryption, I think, so the Cybertec patch changes that stuff around.

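To make the buffering point concrete, here is a minimal sketch in Python. It is not taken from the Cybertec patch or from PostgreSQL, and every name in it (`BlockBufferedWriter`, `toy_transform`, the 16-byte block size) is hypothetical. It only illustrates the general idea: a caller may still write one byte at a time, but the bytes are collected and handed to a single per-block transform, which is the one place where an encryption step and a per-block nonce could be applied. The transform used here is a placeholder and is not encryption.

```python
import io


class BlockBufferedWriter:
    """Collect small writes and emit them as whole blocks via one transform."""

    def __init__(self, raw, transform, block_size=16):
        self.raw = raw                # underlying binary file-like object
        self.transform = transform    # callable(block_no, data) -> bytes
        self.block_size = block_size  # hypothetical size, small for readability
        self.buf = bytearray()
        self.block_no = 0

    def write(self, data):
        self.buf += data
        # Emit only complete blocks; leftover bytes wait for more data.
        while len(self.buf) >= self.block_size:
            self._emit(bytes(self.buf[:self.block_size]))
            del self.buf[:self.block_size]

    def close(self):
        # A final partial block, if any, is emitted exactly once.
        if self.buf:
            self._emit(bytes(self.buf))
            self.buf.clear()

    def _emit(self, block):
        # Each block gets a distinct block number, never reused by this writer.
        self.raw.write(self.transform(self.block_no, block))
        self.block_no += 1


calls = []


def toy_transform(block_no, block):
    # Placeholder only, NOT encryption: record the call and XOR every byte.
    calls.append((block_no, len(block)))
    return bytes(b ^ 0x5A for b in block)


# The caller writes a byte at a time; the transform still sees whole blocks.
sink = io.BytesIO()
writer = BlockBufferedWriter(sink, toy_transform, block_size=16)
for i in range(40):
    writer.write(bytes([i]))
writer.close()
```

With 40 single-byte writes and a 16-byte block, the transform is invoked three times (two full blocks and one 8-byte tail) rather than 40 times. A real design would also have to decide how a rewritten file avoids reusing a block number, which this sketch does not address.
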
I personally don't think that the patch does that in a way that is sufficiently clean and carefully considered for it to be integrated into core, and my plan had been to work on that with the patch authors. However, that plan has been somewhat derailed by the fact that we now have hundreds of emails arguing about the design, because I don't want to be trying to push water up a hill if everyone else is going in a different direction.

It looks to me, though, like we haven't really gotten beyond the point where that patch already was. The issues of nonce and many file types have already been thought about carefully there. I rather suspect that they did not get it all right. But, it seems to me that it would be a lot more useful to look at the code actually written and think about what it gets right and wrong than to discuss these points as a strictly theoretical matter. In other words: maybe I'm wrong here, but it looks to me like we're laboriously reinventing the wheel when we could be working on improving the working prototype.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company