On 2017-12-12 18:18:09 -0500, David Steele wrote:
> On 12/12/17 6:07 PM, Andres Freund wrote:
> > > I don't see this as any different than what happens during recovery. The
> > > unlogged forks are cleaned / re-inited before replay starts which is the
> > > same thing we are doing here.
> >
> > It's quite different - in the recovery case there's no other write
> > activity going on. But on a normally running cluster the persistence of
> > existing tables can get changed, and oids can get recycled. What
> > guarantees that between the time you checked for the init fork the table
> > hasn't been dropped, the oid reused and now a permanent relation is in
> > its place?
>
> Well, that's a good point!
>
> How about rechecking the presence of the init fork after a main/other fork
> has been found? Is it possible for an init fork to still be lying around
> after an oid has been recycled? Seems like it could be...
I don't see how that'd help. You could have gone through this cycle
multiple times by the time you get to rechecking. All not very likely, but
I don't want us to rely on luck here...

If we had a way to prevent relfilenode reuse across multiple checkpoints
this'd be easier, although ALTER TABLE SET UNLOGGED would still complicate
things. I guess we could have the basebackup create placeholder files that
prevent relfilenode reuse, but that seems darned ugly.

Greetings,

Andres Freund
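To make the window concrete, here is a minimal sketch of the race (in Python, with made-up file names; this is not PostgreSQL code, just an illustration of why a single recheck cannot close it):

```python
import os
import tempfile

# Illustrative only: pretend a backup decides a relation is unlogged by
# looking for a "<relfilenode>_init" fork next to the main fork.
def looks_unlogged(datadir, relfilenode):
    return os.path.exists(os.path.join(datadir, relfilenode + "_init"))

def simulate_race(datadir, relfilenode):
    # t0: backup checks -- the init fork is present, relation looks unlogged.
    first_check = looks_unlogged(datadir, relfilenode)

    # ...meanwhile, on the running cluster: the table is dropped, the
    # relfilenode is recycled, and a *permanent* relation reuses the number.
    os.remove(os.path.join(datadir, relfilenode + "_init"))
    os.remove(os.path.join(datadir, relfilenode))
    open(os.path.join(datadir, relfilenode), "w").close()  # new permanent rel

    # t1: backup rechecks -- no init fork, so the (new, unrelated) relation
    # now looks permanent. The recheck cannot tell that the relfilenode
    # changed identity in between; with enough concurrent cycles of
    # drop/reuse/SET UNLOGGED, any fixed number of rechecks can be fooled.
    second_check = looks_unlogged(datadir, relfilenode)
    return first_check, second_check

datadir = tempfile.mkdtemp()
open(os.path.join(datadir, "16384"), "w").close()       # main fork
open(os.path.join(datadir, "16384_init"), "w").close()  # init fork
print(simulate_race(datadir, "16384"))  # -> (True, False)
```

The two checks disagree even though the backup did everything "right" at each step, which is why the placeholder-file idea (pinning the relfilenode for the duration of the backup) keeps coming up despite its ugliness.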