Ping? Does anyone know this area of the code? Can we just remove the monitor_resume() call from migrate_fd_put_buffer()?
09.07.2011 15:07, Michael Tokarev wrote:
> After some debugging I found a programming error in
> error handling in migration, but I'm not sure how to
> fix it.
>
> When migration starts, the monitor gets suspended by calling
> the monitor_suspend() routine, which increments the associated
> suspend_cnt counter.
>
> At the end of migration, in migrate_fd_cleanup(),
> monitor_resume() gets called, which decrements the
> counter.
>
> But monitor_resume() also gets called from another
> place, in migrate_fd_put_buffer(), in case we
> encountered a write error.
>
> So, suppose a tcp endpoint has disconnected, or the
> exec: program terminated due to error or whatnot --
> in all these cases write will fail, and we'll call
> monitor_resume() twice as a result: once in this
> place in migrate_fd_put_buffer(), and once more at
> the end in migrate_fd_cleanup().
>
> This results in suspend_cnt being decremented twice,
> with the resultant value being -1.
>
> So monitor_can_read() will return 0 from now on, since
> it compares suspend_cnt with 0. And hence, the monitor will
> stop working.
>
> To me it looks like the monitor_resume() call should be
> removed from migrate_fd_put_buffer(), but I'm not sure
> _why_ it was here in the first place.
>
> There's more: monitor_suspend() gets called from within
> protocol handlers (using the migrate_fd_monitor_suspend()
> routine) -- are we sure that all current and future
> protocol handlers will call this function?
>
> Thanks!
>
> /mjt
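
To make the failure mode concrete, here is a minimal standalone model of the counter behaviour described above. It is not the QEMU code: ModelMonitor, model_suspend(), model_resume(), model_can_read() and model_migration() are hypothetical names made up for this sketch, and it assumes only what the report states (suspend increments suspend_cnt, resume decrements it, and the read check compares the counter with 0).

```c
#include <stdbool.h>

/* Simplified stand-in for the monitor state; not the real QEMU struct. */
typedef struct {
    int suspend_cnt;
} ModelMonitor;

/* Models monitor_suspend(): one more outstanding suspend. */
static void model_suspend(ModelMonitor *mon)
{
    mon->suspend_cnt++;
}

/* Models monitor_resume(): one fewer outstanding suspend, unchecked. */
static void model_resume(ModelMonitor *mon)
{
    mon->suspend_cnt--;
}

/* Models monitor_can_read(): input is accepted only when the count is 0. */
static bool model_can_read(const ModelMonitor *mon)
{
    return mon->suspend_cnt == 0;
}

/* Hypothetical driver replaying one migration; returns the final count. */
static int model_migration(ModelMonitor *mon, bool write_fails)
{
    model_suspend(mon);        /* migration starts */
    if (write_fails) {
        model_resume(mon);     /* error path, as in migrate_fd_put_buffer() */
    }
    model_resume(mon);         /* end of migration, as in migrate_fd_cleanup() */
    return mon->suspend_cnt;
}
```

With write_fails set to false the counter returns to 0 and model_can_read() is true again. With write_fails set to true the counter ends at -1 and model_can_read() stays false, which matches the dead monitor described in the report. Dropping the resume in the error path of this model restores the balance, but whether that is safe in the real code depends on why the call was added there.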