On 15 March 2016 at 07:10, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Petr Jelinek wrote: > > On 04/03/16 17:08, Craig Ringer wrote: > > >I'd really appreciate some review of the logic there by people who know > > >timelines well and preferably know the xlogreader. It's really just one > > >function and 2/3 comments; the code is simple but the reasoning leading > > >to it is not. > > > > I think this will have to be work for committer at this point. I can't > find > > any flaws in the logic myself so I unless somebody protests I am going to > > mark this as ready for committer. > > Great, thanks. I've studied this to the point where I'm confident that > it makes sense, so I'm about to push it. > Thanks, though I'm happy enough to wait for Andres's input as he requested. > Also, hopefully you don't have any future callers for the functions I > marked static (I checked the failover slots series and couldn't see > anything). I will push this early tomorrow. > I don't see it being overly useful for an extension, so that sounds fine. > One thing I'm not quite sure about is why this is said to apply to > "logical slots" and not just to replication slots in general. In what > sense does it not apply to physical replication slots? > It's only useful for logical slots currently because physical slot timeline following is done client side by the walreceiver. When the walsender hits the end of the timeline it sends CopyDone and returns to command mode. The client is expected to request the timeline history file, parse it, work out which timeline to request next and specify that in its next START_REPLICATION call. None of that happens for logical slots. Andres was quite deliberate about keeping timelines out of the interface for logical slots and logical replication. I tried to retain that when adding the ability to follow physical failover, keeping things simple for the client. Given that logical replication slots are intended to be consumed by more than just a postgres downstream it IMO makes sense to keep as much internal complexity hidden, especially when dealing with something as surprisingly convoluted as timelines. This does mean that we can't tell if we replayed past the timeline switch point on the old master before we switched to the new master, but I don't think sending a timeline history file is the solution there. I'd rather expose a walsender command and SQL function to let the client say, after connecting, "I last replayed from timeline <x> up to point <y>, if I start replaying from you will I get a consistent stream?". That can be done separately to this patch and IMO isn't crucial since clients can currently work it out, just with more difficulty. Maybe physical rep can use the same logic, but I'm really not convinced. It already knows how to follow timelines client-side and handles that as part of walreceiver and archive recovery. Because recovery handles following the timeline switches and walreceiver just fetches the WAL as an alternative to archive fetches I'm not sure it makes sense; we still have to have all that logic for archive recovery from restore_command, so doing it differently for streaming just makes things more complicated for no clear benefit. I certainly didn't want to address it in the same patch, much like I intentionally avoided trying to teach pg_xlogdump to be able to follow timelines. Partly to keep it simpler and focused, partly because it'd require timeline.c to be made frontend-clean which means no List, no elog, etc... > > (I noticed that we have this new line, > - randAccess = true; > + randAccess = true; /* allow readPageTLI to go > backwards */ > in which now the comment is an identical copy of an identical line > elsewhere; however, randAccess doesn't actually affect readPageTLI in > any way, so AFAICS both comments are now wrong.) > Yeah, that's completely bogus. I have a clear image in my head of randAccess being tested by ValidXLogPageHeader where it sanity checks state->latestPageTLI, the TLI of the *prior* page, against the TLI of the most recent page read, to make sure the TLI advances. But nope, I'm apparently imagining things 'cos it just isn't there, it just tests to see if we read a LSN later than the prior page instead. > > Well for testing purposes it's quite fine I think. The TAP framework > > enhancements needed for this are now in and it works correctly against > > current master. > > Certainly the new src/test/recovery/t/006whatever.pl file is going to be > part of the commit. What I'm not so sure about is > src/test/modules/decoding_failover: is there any reason we don't just > put the new functions in contrib/test_decoding? Because contrib/test_decoding isn't an extension. It is a sample logical decoding output plugin. I didn't want to mess it up by adding an extension control file, extension script and some extension functions. Sure, you *can* use the same shared library as an extension and as a logical decoding output plugin but I think that makes it rather more confusing as an example. Also, because I don't want people reading test_decoding to find those functions and think it's a good idea to actually use them. It's hidden away in src/test/modules for a reason ;) I have no particular objection to stripping the test module and the parts of the t/ script associated with it if you don't think it's worthwhile. I'd be happy to have it as an ongoing test showing that "hot" timeline following on a logical slot does work, but the tests that use only base backups are probably sufficient to detect obvious breakage/regressions. If Michael Paquier's decoder_raw and receiver_raw were in contrib/ I'd add support for doing it into reciever_raw without exposing SQL functions. They aren't in core and can't form part of the core test suite, so I needed something minimalist to prove the concept, and figured it might as well serve as a regression test as well. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services