On Sat, Jul 13, 2019 at 6:26 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > I think then I am missing something because what I am talking about is > below code in rbt_insert:
What you're saying here is that, with an rbtree, an exact match will result in a merging of requests which we don't want, so we have to make them always unique. That's fine, but even if you used a binary heap where it wouldn't be absolutely required that you break the ties, you'd still want to think at least a little bit about what behavior is best in case of a tie, just from the point of view of making the system efficient. > I think it would be better to use start_urec_ptr because XID can be > non-unique in our case. As I explained in one of the emails above [1] > that we register the requests for logged and unlogged relations > separately, so XID can be non-unique. Yeah. I didn't understand that explanation. It seems to me that one of the fundamental design questions for this system is whether we should allow there to be an unbounded number of transactions that are pending undo application, or whether it's OK to enforce a hard limit. Either way, there should certainly be pressure applied to try to keep the number low, like forcing undo application into the foreground when a backlog is accumulating, but the question is what to do when that's insufficient. My original idea was that we should not have a hard limit, in which case the shared memory data on what is pending might be incomplete, in which case we would need the discard workers to discover transactions needing undo and add them to the shared memory data structures, and if those structures are full, then we'd just skip adding those details and rediscover those transactions again at some future point. But, my understanding of the current design being implemented is that there is a hard limit on the number of transactions that can be pending undo and the in-memory data structures are sized accordingly. In such a system, we cannot rely on the discard worker(s) to (re)discover transactions that need undo, because if there can be transactions that need undo that we don't know about, then we can't enforce a hard limit correctly. The exception, I suppose, is that after a crash, we'll need to scan all the undo logs and figure out which transactions are pending, but that doesn't preclude using a single queue entry covering both the logged and the unlogged portion of a transaction that has written undo of both kinds. We've got to scan all of the undo logs before we allow any new undo-using transactions to start, and so we can create one fully-up-to-date entry that reflects the data for both persistence levels before any concurrent activity happens. I am wondering (and would love to hear other opinions on) the question of which kind of design we ought to be pursuing, but it's got to be one or the other, not something in the middle. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company