On Thu, Jun 08 2017, Shaohua Li wrote: > On Fri, Jun 09, 2017 at 07:24:29AM +1000, Neil Brown wrote: >> On Thu, Jun 08 2017, Mikulas Patocka wrote: >> >> > On Thu, 8 Jun 2017, Shaohua Li wrote: >> > >> >> On Thu, Jun 08, 2017 at 04:59:03PM +1000, Neil Brown wrote: >> >> > On Wed, Jun 07 2017, Mikulas Patocka wrote: >> >> > >> >> > > The function flush_signals clears all pending signals for the >> >> > > process. It >> >> > > may be used by kernel threads when we need to prepare a kernel thread >> >> > > for >> >> > > responding to signals. However using this function for an userspaces >> >> > > processes is incorrect - clearing signals without the program >> >> > > expecting it >> >> > > can cause misbehavior. >> >> > > >> >> > > The raid1 and raid5 code uses flush_signals in its request routine >> >> > > because >> >> > > it wants to prepare for an interruptible wait. This patch drops >> >> > > flush_signals and uses sigprocmask instead to block all signals >> >> > > (including >> >> > > SIGKILL) around the schedule() call. The signals are not lost, but the >> >> > > schedule() call won't respond to them. >> >> > > >> >> > > Signed-off-by: Mikulas Patocka <[email protected]> >> >> > > Cc: [email protected] >> >> > >> >> > Thanks for catching that! >> >> > >> >> > Acked-by: NeilBrown <[email protected]> >> >> >> >> Applied, thanks! >> >> >> >> Neil, >> >> Not about the patch itself. I had question about that part of code. >> >> Dropped >> >> others since this is raid related. I didn't get the point why it's a >> >> TASK_INTERRUPTIBLE sleep. It seems suggesting the thread will bail out if >> >> a >> >> signal is sent. But I didn't see we check the signal and exit the loop. >> >> What's >> >> the correct behavior here? Since the suspend range is controlled by >> >> userspace, >> > >> > As I understand the code - the purpose is to have an UNINTERRUPTIBLE sleep >> > that isn't accounted in load average and that doesn't trigger the hung >> > task warning. >> >> Exactly my reason - yes. >> >> > >> > There should really be something like TASK_UNINTERRUPTIBLE_LONG for this >> > purpose. >> >> That would be nice. >> >> > >> >> I think the correct behavior is if user kills the thread, we exit the >> >> loop. So >> >> it seems like to be we check if there is fatal signal pending, exit the >> >> loop, >> >> and return IO error. Not sure if we should return IO error though. >> > >> > No, this is not correct - if we report an I/O error for the affected bio, >> > it could corrupt filesystem or confuse other device mapper targets that >> > could be on the top of MD. It is not right to corrupt filesystem if the >> > user kills a process. >> >> Yes, we are too deep to even return something like ERESTARTSYS. >> Blocking is the only option. > > My concern is if the app controlling the suspend range dies, other threads > will block in the kernel side forever. We can't even force kill them. This > is an unfortunate behavior. Would adding a timeout here make sense? The app > controlling the suspend range looks part of the disk firmware now. If the > firmware doesn't respond, returning IO timeout is normal.
Yes, this happens. You can write to /sys/block/mdXX/md/suspend_lo to unblock it, but most people don't know this. A timeout might be appropriate, but I would want it to be quite a long one. Several minutes at least... Though if I found I wanted to do some careful raid6repair surgery on an array, and so suspending the problematic region and started work, I might get annoyed if I/O started getting errors, instead of just waiting like I asked.... but maybe that is a rather far-fetched scenario. Thanks, NeilBrown > > Thanks, > Shaohua
signature.asc
Description: PGP signature

