> On Sept. 25, 2017, 5:36 a.m., Jie Yu wrote:
> > src/log/recover.cpp
> > Lines 652 (patched)
> > <https://reviews.apache.org/r/62286/diff/1/?file=1820908#file1820908line652>
> >
> > Putting this in `recover.cpp|hpp` is very weird. I am leaning towards
> > moving this to `catchup.hpp|cpp` and just overloading the `catchup` method
> > (without positions).

Done!


> On Sept. 25, 2017, 5:36 a.m., Jie Yu wrote:
> > src/log/recover.cpp
> > Lines 701 (patched)
> > <https://reviews.apache.org/r/62286/diff/1/?file=1820908#file1820908line701>
> >
> > Any reason for not returning a failure in this case?

No. Fixed.


> On Sept. 25, 2017, 5:36 a.m., Jie Yu wrote:
> > src/log/recover.cpp
> > Lines 711 (patched)
> > <https://reviews.apache.org/r/62286/diff/1/?file=1820908#file1820908line711>
> >
> > Can you add a few comments on this? This is a bit hacky because we are
> > essentially hijacking the recovery protocol. Ideally, we probably should
> > split the recovery into several phases and use only the phase that's
> > relevant to this.

Per our offline discussion, exposed `runRecoverProtocol()` in `recover.hpp` and
moved the retry logic from `RecoverProtocolProcess` to `RecoverProcess`.


> On Sept. 25, 2017, 5:36 a.m., Jie Yu wrote:
> > src/log/recover.cpp
> > Lines 730 (patched)
> > <https://reviews.apache.org/r/62286/diff/1/?file=1820908#file1820908line730>
> >
> > Can you explain why we need to recover the same `begin` after a restart?

Discussed offline.


- Ilya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62286/#review186084
-----------------------------------------------------------


On Oct. 4, 2017, 12:16 a.m., Ilya Pronin wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62286/
> -----------------------------------------------------------
>
> (Updated Oct. 4, 2017, 12:16 a.m.)
>
>
> Review request for mesos and Jie Yu.
>
>
> Bugs: MESOS-7973
>     https://issues.apache.org/jira/browse/MESOS-7973
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The process is used to catch up a non-leading VOTING replica by running
> the recovery protocol to find the current begin and end positions of the log
> and catching up the positions that are missing on the replica. This allows
> following replicas to serve eventually consistent reads.
>
>
> Diffs
> -----
>
>   src/log/catchup.hpp 123bc7a57e5e89f9ba75c36ba0cbe5ead807c518
>   src/log/catchup.cpp 94e1b00db2cd9d5a2368a979c1fd155bb6cac1f2
>   src/tests/log_tests.cpp f9f9400c901152779ae0ebfe74cf8f7aac1d3396
>
>
> Diff: https://reviews.apache.org/r/62286/diff/2/
>
>
> Testing
> -------
>
> Added tests that verify that the new recovery process performs correctly and
> produces meaningful results under various circumstances (recovered positions
> were truncated, replica was lagging far behind). Ran `make check`.
>
>
> Thanks,
>
> Ilya Pronin
>
>
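
For readers skimming the thread, the catch-up step in the Description can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code in the patch: `missingPositions` is a hypothetical helper, positions are modelled as plain `uint64_t` values, and the range is assumed to be inclusive of both `begin` and `end`. Given the `begin` and `end` positions found by the recovery protocol and the set of positions the replica has already learned, it lists the positions that would still need to be caught up.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper (not part of Mesos): returns the positions in the
// inclusive range [begin, end] that are absent from `learned`, i.e. the
// positions a lagging replica would still have to catch up.
std::vector<uint64_t> missingPositions(
    uint64_t begin,
    uint64_t end,
    const std::set<uint64_t>& learned)
{
  std::vector<uint64_t> missing;

  // An empty range means there is nothing to catch up.
  if (begin > end) {
    return missing;
  }

  // Loop with an explicit break so that `end == UINT64_MAX` cannot
  // overflow the counter.
  for (uint64_t position = begin;; ++position) {
    if (learned.count(position) == 0) {
      missing.push_back(position);
    }

    if (position == end) {
      break;
    }
  }

  return missing;
}
```

In this sketch, positions before `begin` are ignored on the assumption that they were truncated, so a replica that has learned only positions 3 and 5 of a log spanning 3 to 7 would need to catch up 4, 6 and 7.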