Hi,

> On the heuristic itself, I am only mildly in favor, and I want to be
> honest about how narrow the benefit is. It only helps when a usable
> unique index already exists on the subscriber but is not picked first.
>
> But in that case the correct answer is REPLICA IDENTITY USING INDEX (or
> a primary key) on that index, which we already recommend. The case that
> really pushes people to use REPLICA IDENTITY FULL: no unique key is
> possible, only non-unique indexes is exactly the case this patch
> leaves unchanged. Even Ethan's own benchmark uses a table that has a
> unique index on "id", which would be better served by setting it as the
> replica identity.
>
> So I would describe this as a small, low-risk improvement to the default
> choice, I am fine with it on that basis.

I fully agree with this assessment of the change. It is both convenient
and simple for the apply worker to make a clearly better choice if the
user hasn't specified the correct index to use as the replica identity.
To further justify this patch, we have seen real users make this mistake,
which then caused them pain through increased replication lag.

After some thought, I decided it would be best to align the change better
with this goal (making a simple decision), and therefore I removed the
logic that chose based on the number of key columns. Thus, I propose a new
patch (attached to this email) which only selects the first unique index
and returns early. This may partially address the feedback around looping
through the indexes. Furthermore, this simplification makes the behavior
more focused and simpler for users to understand when multiple indexes are
involved. Incorporating other aspects of the index (including the key
column logic which I had in v1-v3) would likely make the behavior less
intuitive for users.

> Yes, I agreed it's not a serious problem. just I wanted to see such the micro
> bench.

ACK. I will perform some tests on tables with many indexes to see if there
is any performance degradation, and I will share the results shortly.

> It might be worth factoring in the index size when more than one index
> is usable unless others think otherwise. Since the replica identity
> index is only re-picked on relcache invalidation, the choice could go
> stale as bloat grows, so the apply worker might need to re-check the
> replica identity index choice periodically.

I touched on this point earlier in this message, but my opinion is that
apply should either keep the heuristic simple or go all the way to full
query planning. In addition, adding relation size to the heuristic would
make the behavior both dynamic and less predictable, which might be
difficult for users to understand.

Thank you,
Ethan Mertz
SDE, Amazon Web Services
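
For readers following the thread, the "first unique index, return early"
rule described in this message can be sketched as standalone C. This is an
illustration only and is not code from the attached patch: the struct, its
fields, and the function name are hypothetical, and the fallback to the
first usable non-unique index is an assumption about how the case without
any unique index is handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a subscriber-side index. */
typedef struct IndexInfoSketch
{
    const char *name;
    bool        usable;     /* could serve a REPLICA IDENTITY FULL lookup */
    bool        is_unique;  /* index enforces uniqueness */
} IndexInfoSketch;

/*
 * Return the first usable unique index, stopping the scan as soon as one
 * is found.  If there is none, fall back to the first usable index of any
 * kind (assumed behavior), or NULL when nothing is usable.
 */
const IndexInfoSketch *
pick_index_sketch(const IndexInfoSketch *indexes, size_t n)
{
    const IndexInfoSketch *fallback = NULL;

    for (size_t i = 0; i < n; i++)
    {
        if (!indexes[i].usable)
            continue;

        if (indexes[i].is_unique)
            return &indexes[i];     /* early return on first unique index */

        if (fallback == NULL)
            fallback = &indexes[i]; /* remember first usable non-unique */
    }

    return fallback;
}
```

Because the result depends only on list order and the unique flag, the
choice stays predictable when several indexes qualify; no column counts or
sizes are consulted.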
Attachment: v4-0001-Improve-index-selection-for-REPLICA-IDENTITY-FULL.patch
