On Sun, Jul 5, 2020 at 4:47 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > On Sat, Jul 4, 2020 at 11:35 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > 9. > > +ReorderBufferHandleConcurrentAbort(ReorderBuffer *rb, ReorderBufferTXN > > *txn, > > { > > .. > > + ReorderBufferToastReset(rb, txn); > > + if (specinsert != NULL) > > + ReorderBufferReturnChange(rb, specinsert); > > .. > > } > > > > Why do we need to do these here when we wouldn't have been done for > > any exception other than ERRCODE_TRANSACTION_ROLLBACK? > > Because we are handling this exception "ERRCODE_TRANSACTION_ROLLBACK" > gracefully and we are continuing with further decoding so we need to > return this change back. >
Okay, then I suggest we should do these before calling stream_stop and also move ReorderBufferResetTXN after calling stream_stop to follow a pattern similar to try block unless there is a reason for not doing so. Also, it would be good if we can initialize specinsert with NULL after returning the change as we are doing at other places. > > 10. I have got the below failure once. I have not investigated this > > in detail as the patch is still under progress. See, if you have any > > idea? > > # Failed test 'check extra columns contain local defaults' > > # at t/013_stream_subxact_ddl_abort.pl line 81. > > # got: '2|0' > > # expected: '1000|500' > > # Looks like you failed 1 test of 2. > > make[2]: *** [check] Error 1 > > make[1]: *** [check-subscription-recurse] Error 2 > > make[1]: *** Waiting for unfinished jobs.... > > make: *** [check-world-src/test-recurse] Error 2 > > Even I got the failure once and after that, it did not reproduce. I > have executed it multiple time but it did not reproduce again. Are > you able to reproduce it consistently? > No, I am also not able to reproduce it consistently but I think this can fail if a subscriber sends the replay_location before actually replaying the changes. First, I thought that extra send_feedback we have in apply_handle_stream_commit might have caused this but I guess that can't happen because we need the commit time location for that and we are storing the same at the end of apply_handle_stream_commit after applying all messages. I am not sure what is going on here. I think we somehow need to reproduce this or some variant of this test consistently to find the root cause. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com