Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On 13-08-15 12:45 PM, Roland Dreier wrote: Jens / James, do you guys plan to send this to Linus for 3.11? Triggering this bug is a bit esoteric but the impact is pretty nasty (corrupting an unrelated process). The patch is fine with me. Even though the sg driver is named in the patch title, I note that the v2 patch is only against the block layer. Hence it does not need an ack from me. Doug Gilbert -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
Jens / James, do you guys plan to send this to Linus for 3.11? Triggering this bug is a bit esoteric but the impact is pretty nasty (corrupting an unrelated process). Thanks, Roland -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On Wed, Aug 7, 2013 at 9:31 AM, Douglas Gilbert wrote: > So what kind of signal was leading to your "stomping on the memory"? > Was it user generated or something like SIGIO, SIGPIPE or a RT signal? It was sometimes SIGHUP (for reopening log files) and sometimes SIGALARM (for various periodic things). > To get around the SG_IO ioctl restart problem (for non idempotent > SCSI commands) could we replace a -ERESTARTSYS return value > with -EINTR ? > > As I noted in a previous post, for robust user space code using the > SG_IO ioctl, masking signals during the IO may help. Yes, absolutely. But process A should be able to keep its memory uncorrupted even if process B is coded wrong :) > And what about bsg? Is it any better or worse than sg in the case > of interrupted SG_IO ioctls? Apart from the interface (sg_io_hdr > v3 versus v4) it should be a drop in replacement for sg. As far as I can tell bsg looks much better w.r.t. signals -- I don't see anywhere that it schedules work onto a workqueue or other kernel thread, and it looks like the SG_IO ioctl there actually has nowhere that checks for signals. All sleeps will be uninterruptible, which I guess may be better or worse depending on your perspective. - R. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On 13-08-07 11:50 AM, Roland Dreier wrote: On Wed, Aug 7, 2013 at 7:38 AM, David Milburn wrote: I was able to succesfully test this patch overnight, I had been experimenting with the sg driver setting the BIO_NULL_MAPPED flag in sg_rq_end_io_usercontext for a orphan process which prevented the corruption, but your solution seems much better. Very cool, thanks for the testing. I actually looked at using BIO_NULL_MAPPED as well, but it seemed a bit too fragile to me -- it had the right effect of skipping __bio_copy_iov(), and skipping the __free_pages() stuff in there is OK because sg owns its pages rather than the bio layer, but all that seemed vulnerable to being broken by an unrelated change. Out of curiousity, were you already working on this bug? Because if you had fixed it a few weeks earlier we might not have spent so long wondering WTF was stomping on the memory of one of our processes :) Roland, So what kind of signal was leading to your "stomping on the memory"? Was it user generated or something like SIGIO, SIGPIPE or a RT signal? To get around the SG_IO ioctl restart problem (for non idempotent SCSI commands) could we replace a -ERESTARTSYS return value with -EINTR ? As I noted in a previous post, for robust user space code using the SG_IO ioctl, masking signals during the IO may help. And what about bsg? Is it any better or worse than sg in the case of interrupted SG_IO ioctls? Apart from the interface (sg_io_hdr v3 versus v4) it should be a drop in replacement for sg. Doug Gilbert -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
Roland Dreier wrote: On Wed, Aug 7, 2013 at 7:38 AM, David Milburn wrote: I was able to succesfully test this patch overnight, I had been experimenting with the sg driver setting the BIO_NULL_MAPPED flag in sg_rq_end_io_usercontext for a orphan process which prevented the corruption, but your solution seems much better. Very cool, thanks for the testing. I actually looked at using BIO_NULL_MAPPED as well, but it seemed a bit too fragile to me -- it had the right effect of skipping __bio_copy_iov(), and skipping the __free_pages() stuff in there is OK because sg owns its pages rather than the bio layer, but all that seemed vulnerable to being broken by an unrelated change. Out of curiousity, were you already working on this bug? Because if you had fixed it a few weeks earlier we might not have spent so long wondering WTF was stomping on the memory of one of our processes :) Hi Roland, Actually, I was waiting for confirmation from the field which I recently received, I was getting ready to bring this up on linux-scsi, sorry I should have brought it up sooner. I wasn't positive that setting BIO_NULL_MAPPED flag from sg driver was the fix. David Jeffery came up with a reproducer which I ran overnight on the latest upstream kernel with your patch. Thanks, David -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On Wed, Aug 7, 2013 at 7:38 AM, David Milburn wrote: > I was able to succesfully test this patch overnight, I had been experimenting > with the > sg driver setting the BIO_NULL_MAPPED flag in sg_rq_end_io_usercontext for a > orphan process > which prevented the corruption, but your solution seems much better. Very cool, thanks for the testing. I actually looked at using BIO_NULL_MAPPED as well, but it seemed a bit too fragile to me -- it had the right effect of skipping __bio_copy_iov(), and skipping the __free_pages() stuff in there is OK because sg owns its pages rather than the bio layer, but all that seemed vulnerable to being broken by an unrelated change. Out of curiousity, were you already working on this bug? Because if you had fixed it a few weeks earlier we might not have spent so long wondering WTF was stomping on the memory of one of our processes :) - R. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
Roland Dreier wrote: From: Roland Dreier There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp->read_wait, (srp_done(sfp, srp) || sdp->detached)); but neither srp_done() nor sdp->detached is true, so we end up just setting srp->orphan and returning to userspace: srp->orphan = 1; write_unlock_irq(&sfp->rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(&sfp->rq_list_lock, iflags); if (unlikely(srp->orphan)) { if (sfp->keep_orphan) srp->sg_io_owned = 0; else done = 0; } srp->done = done; write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); kref_put(&sfp->f_ref, sg_remove_sfp); } else { INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); schedule_work(&srp->ew.work); } Since srp->orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. - In workqueue context we go through sg_rq_end_io_usercontext() -> sg_finish_rem_req() -> blk_rq_unmap_user() -> ... -> bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current->mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! As suggested by James Bottomley , add a check for current->mm (which is NULL if we're on a kernel thread without a real userspace address space) in bio_uncopy_user(), and skip the copy if we're on a kernel thread. There's no reason that I can think of for any caller of bio_uncopy_user() to want to do copying on a kernel thread with a random active userspace address space. Huge thanks to Costa Sapuntzakis for the original pointer to this bug in the sg code. Signed-off-by: Roland Dreier Cc: --- fs/bio.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/fs/bio.c b/fs/bio.c index 94bbc04..c5eae72 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs, int bio_uncopy_user(struct bio *bio) { struct bio_map_data *bmd = bio->bi_private; - int ret = 0; + struct bio_vec *bvec; + int ret = 0, i; - if (!bio_flagged(bio, BIO_NULL_MAPPED)) - ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, -bmd->nr_sgvecs, bio_data_dir(bio) == READ, -0, bmd->is_our_pages); + if (!bio_flagged(bio, BIO_NULL_MAPPED)) { + /* +* if we're in a workqueue, the request is orphaned, so +* don't copy into a random user address space, just free. +*/ + if (current->mm) + ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, +bmd->nr_sgvecs, bio_data_dir(bio) == READ, +0, bmd->is_our_pages); + else if (bmd->is_our_pages) + bio_for_each_segment_all(bvec, bio, i) + __free_page(bvec->bv_page); + } bio_free_map_data(bmd); bio_put(bio); return ret; Hi Roland, I was able to succesfully test this patch overnight, I had been experimenting with
[PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
From: Roland Dreier There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp->read_wait, (srp_done(sfp, srp) || sdp->detached)); but neither srp_done() nor sdp->detached is true, so we end up just setting srp->orphan and returning to userspace: srp->orphan = 1; write_unlock_irq(&sfp->rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(&sfp->rq_list_lock, iflags); if (unlikely(srp->orphan)) { if (sfp->keep_orphan) srp->sg_io_owned = 0; else done = 0; } srp->done = done; write_unlock_irqrestore(&sfp->rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(&sfp->read_wait); kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); kref_put(&sfp->f_ref, sg_remove_sfp); } else { INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); schedule_work(&srp->ew.work); } Since srp->orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. - In workqueue context we go through sg_rq_end_io_usercontext() -> sg_finish_rem_req() -> blk_rq_unmap_user() -> ... -> bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current->mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! As suggested by James Bottomley , add a check for current->mm (which is NULL if we're on a kernel thread without a real userspace address space) in bio_uncopy_user(), and skip the copy if we're on a kernel thread. There's no reason that I can think of for any caller of bio_uncopy_user() to want to do copying on a kernel thread with a random active userspace address space. Huge thanks to Costa Sapuntzakis for the original pointer to this bug in the sg code. Signed-off-by: Roland Dreier Cc: --- fs/bio.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/fs/bio.c b/fs/bio.c index 94bbc04..c5eae72 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs, int bio_uncopy_user(struct bio *bio) { struct bio_map_data *bmd = bio->bi_private; - int ret = 0; + struct bio_vec *bvec; + int ret = 0, i; - if (!bio_flagged(bio, BIO_NULL_MAPPED)) - ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, -bmd->nr_sgvecs, bio_data_dir(bio) == READ, -0, bmd->is_our_pages); + if (!bio_flagged(bio, BIO_NULL_MAPPED)) { + /* +* if we're in a workqueue, the request is orphaned, so +* don't copy into a random user address space, just free. +*/ + if (current->mm) + ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, +bmd->nr_sgvecs, bio_data_dir(bio) == READ, +0, bmd->is_our_pages); + else if (bmd->is_our_pages) + bio_for_each_segment_all(bvec, bio, i) + __free_page(bvec->bv_page); + } bio_free_map_data(bmd); bio_put(bio); return ret; -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to m