Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On 13-08-05 11:54 PM, Peter Chang wrote: 2013/8/5 Roland Dreier rol...@kernel.org: From: Roland Dreier rol...@purestorage.com There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp-read_wait, (srp_done(sfp, srp) || sdp-detached)); but neither srp_done() nor sdp-detached is true, so we end up just setting srp-orphan and returning to userspace: srp-orphan = 1; write_unlock_irq(sfp-rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. i think that an additional issue here is that part of reissuing the ioctl is re-queueing the command. since the re-queue is at the front of the block queue there are issues if the command is non-idempotent. Re-issuing a SG_IO ioctl is wrong. More and more SCSI commands (even in SBC-3) are non-idempotent (e.g. COMPARE AND WRITE). And the st driver gets to use the block layer as well and many of its important SCSI commands (SSC) are non-idempotent. we have a local fix that gets rid of most of the orphan stuff and re-waiting if a non-fatal signal was waiting. simpler than unmapping but maybe we're missing some other interesting case? Like to share that fix with us? Also I'm interested in how you know from within a kernel driver whether a signal sent to the controlling process is fatal or not? For example SIGIO's default action is terminate but sg assumes if the controlling process requests SIGIO generation then it will at least override that default action and handle it sensibly. Is there a way to check that assumption? Looked at bsg in the situation where a signal interrupts a SG_IO ioctl. It seems broken; anyone like to comment on this snippet from bsg if a signal hits the first call: blk_execute_rq(bd-queue, NULL, rq, at_head); ret = blk_complete_sgv4_hdr_rq(rq, hdr, bio, bidi_bio); if (copy_to_user(uarg, hdr, sizeof(hdr))) return -EFAULT; As an aside, I got tired of handling signals during SCSI commands in the ddpt utility, especially after adding tape support. So it now masks all the usual suspects during the IO then checks for signals in a small window between IOs. Non maskable signals will still terminate but of course the user gets no guarantees, but it would be still reasonable in the termination case that the interrupted SCSI command was _not_ resubmitted. Doug Gilbert -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
From: Roland Dreier rol...@purestorage.com There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp-read_wait, (srp_done(sfp, srp) || sdp-detached)); but neither srp_done() nor sdp-detached is true, so we end up just setting srp-orphan and returning to userspace: srp-orphan = 1; write_unlock_irq(sfp-rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(sfp-rq_list_lock, iflags); if (unlikely(srp-orphan)) { if (sfp-keep_orphan) srp-sg_io_owned = 0; else done = 0; } srp-done = done; write_unlock_irqrestore(sfp-rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(sfp-read_wait); kill_fasync(sfp-async_qp, SIGPOLL, POLL_IN); kref_put(sfp-f_ref, sg_remove_sfp); } else { INIT_WORK(srp-ew.work, sg_rq_end_io_usercontext); schedule_work(srp-ew.work); } Since srp-orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. - In workqueue context we go through sg_rq_end_io_usercontext() - sg_finish_rem_req() - blk_rq_unmap_user() - ... - bio_uncopy_user() - __bio_copy_iov() - copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current-mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! Fix this by telling sg_finish_rem_req() whether we're on a workqueue or not, and if we are, calling a new function blk_rq_unmap_user_nocopy() that does everything the original blk_rq_unmap_user() does except calling copy_{to,from}_user(). This requires a few levels of plumbing through a copy flag in the bio layer. I also considered fixing this by having the sg code just set BIO_NULL_MAPPED for bios that are unmapped from a workqueue, which happens to work because the __free_page() part of __bio_copy_iov() isn't needed for sg (because sg handles its own pages). However, this seems coincidental and fragile, so I preferred making the fix explicit, at the cost of minor tweaks to the bio code. Huge thanks to Costa Sapuntzakis co...@purestorage.com for the original pointer to this bug in the sg code. Signed-off-by: Roland Dreier rol...@purestorage.com Cc: sta...@vger.kernel.org --- block/blk-map.c| 15 --- drivers/scsi/sg.c | 19 ++- fs/bio.c | 22 +++--- include/linux/bio.h| 2 +- include/linux/blkdev.h | 11 ++- 5 files changed, 44 insertions(+), 25 deletions(-) diff --git a/block/blk-map.c b/block/blk-map.c index 623e1cd..bd63201 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -25,7 +25,7 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq, return 0; } -static int __blk_rq_unmap_user(struct bio *bio) +static int __blk_rq_unmap_user(struct bio *bio, bool copy) { int ret = 0; @@ -33,7 +33,7 @@ static int __blk_rq_unmap_user(struct bio *bio) if (bio_flagged(bio, BIO_USER_MAPPED)) bio_unmap_user(bio); else - ret = bio_uncopy_user(bio); + ret = bio_uncopy_user(bio, copy); } return ret; @@ -80,7 +80,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq, /* if it was boucned we must call the end io function */ bio_endio(bio, 0); -
Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On Mon, 2013-08-05 at 15:02 -0700, Roland Dreier wrote: From: Roland Dreier rol...@purestorage.com There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp-read_wait, (srp_done(sfp, srp) || sdp-detached)); but neither srp_done() nor sdp-detached is true, so we end up just setting srp-orphan and returning to userspace: srp-orphan = 1; write_unlock_irq(sfp-rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(sfp-rq_list_lock, iflags); if (unlikely(srp-orphan)) { if (sfp-keep_orphan) srp-sg_io_owned = 0; else done = 0; } srp-done = done; write_unlock_irqrestore(sfp-rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(sfp-read_wait); kill_fasync(sfp-async_qp, SIGPOLL, POLL_IN); kref_put(sfp-f_ref, sg_remove_sfp); } else { INIT_WORK(srp-ew.work, sg_rq_end_io_usercontext); schedule_work(srp-ew.work); } Since srp-orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. - In workqueue context we go through sg_rq_end_io_usercontext() - sg_finish_rem_req() - blk_rq_unmap_user() - ... - bio_uncopy_user() - __bio_copy_iov() - copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current-mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! Fix this by telling sg_finish_rem_req() whether we're on a workqueue or not, and if we are, calling a new function blk_rq_unmap_user_nocopy() that does everything the original blk_rq_unmap_user() does except calling copy_{to,from}_user(). This requires a few levels of plumbing through a copy flag in the bio layer. I agree with the analysis. The fix is a bit draconian, though. A workqueue actually runs in a kernel thread and there's a simple test for that (!current-mm), so how about this instead (which is much less intrusive) James --- diff --git a/fs/bio.c b/fs/bio.c index 94bbc04..e2ab39c 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs, int bio_uncopy_user(struct bio *bio) { struct bio_map_data *bmd = bio-bi_private; - int ret = 0; + struct bio_vec *bvec; + int ret = 0, i; - if (!bio_flagged(bio, BIO_NULL_MAPPED)) - ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs, -bmd-nr_sgvecs, bio_data_dir(bio) == READ, -0, bmd-is_our_pages); + if (!bio_flagged(bio, BIO_NULL_MAPPED)) { + /* +* if we're in a workqueue, the request is orphaned, so +* don't copy into the kernel address space, just free +*/ + if (current-mm) + ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs, +bmd-nr_sgvecs, bio_data_dir(bio) == READ, +0, bmd-is_our_pages); + else if (bmd-is_our_pages) + bio_for_each_segment_all(bvec, bio, i) + __free_page(bvec-bv_page); + } bio_free_map_data(bmd); bio_put(bio); return ret; -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to
Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On Mon, Aug 5, 2013 at 4:31 PM, James Bottomley james.bottom...@hansenpartnership.com wrote: I agree with the analysis. The fix is a bit draconian, though. A workqueue actually runs in a kernel thread and there's a simple test for that (!current-mm), so how about this instead (which is much less intrusive) --- diff --git a/fs/bio.c b/fs/bio.c index 94bbc04..e2ab39c 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs, int bio_uncopy_user(struct bio *bio) { struct bio_map_data *bmd = bio-bi_private; - int ret = 0; + struct bio_vec *bvec; + int ret = 0, i; - if (!bio_flagged(bio, BIO_NULL_MAPPED)) - ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs, -bmd-nr_sgvecs, bio_data_dir(bio) == READ, -0, bmd-is_our_pages); + if (!bio_flagged(bio, BIO_NULL_MAPPED)) { + /* +* if we're in a workqueue, the request is orphaned, so +* don't copy into the kernel address space, just free +*/ + if (current-mm) + ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs, +bmd-nr_sgvecs, bio_data_dir(bio) == READ, +0, bmd-is_our_pages); + else if (bmd-is_our_pages) + bio_for_each_segment_all(bvec, bio, i) + __free_page(bvec-bv_page); + } bio_free_map_data(bmd); bio_put(bio); return ret; Yes, looks reasonable -- I can't think of any reason why anyone would ever want the bio code to copy to a random userspace address space. Acked-by: Roland Dreier rol...@purestorage.com -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
On Mon, 2013-08-05 at 16:38 -0700, Roland Dreier wrote: On Mon, Aug 5, 2013 at 4:31 PM, James Bottomley james.bottom...@hansenpartnership.com wrote: I agree with the analysis. The fix is a bit draconian, though. A workqueue actually runs in a kernel thread and there's a simple test for that (!current-mm), so how about this instead (which is much less intrusive) --- diff --git a/fs/bio.c b/fs/bio.c index 94bbc04..e2ab39c 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs, int bio_uncopy_user(struct bio *bio) { struct bio_map_data *bmd = bio-bi_private; - int ret = 0; + struct bio_vec *bvec; + int ret = 0, i; - if (!bio_flagged(bio, BIO_NULL_MAPPED)) - ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs, -bmd-nr_sgvecs, bio_data_dir(bio) == READ, -0, bmd-is_our_pages); + if (!bio_flagged(bio, BIO_NULL_MAPPED)) { + /* +* if we're in a workqueue, the request is orphaned, so +* don't copy into the kernel address space, just free +*/ + if (current-mm) + ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs, +bmd-nr_sgvecs, bio_data_dir(bio) == READ, +0, bmd-is_our_pages); + else if (bmd-is_our_pages) + bio_for_each_segment_all(bvec, bio, i) + __free_page(bvec-bv_page); + } bio_free_map_data(bmd); bio_put(bio); return ret; Yes, looks reasonable -- I can't think of any reason why anyone would ever want the bio code to copy to a random userspace address space. Acked-by: Roland Dreier rol...@purestorage.com You did all the work ... just replace this patch with your previous one and keep the original tags. (test it first, of course ...) James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
Roland, When this sg code was originally designed, there wasn't a bio in sight :-) Now I'm trying to get my head around this. We have launched a data-in SCSI command like READ(10) and the DMA is underway so we are waiting for a done indication. Instead we receive a signal interruption. It is not clear to me why that DMA would not just keep going unless we can get to something that can stop or redirect the DMA. That something is more likely to be the low level driver being used rather than the block layer. In the original design to cope with this the destination pages were locked in memory until the DMA completed. So originally the design was to allow for this case at the top of the waterfall. Now it seems there is bio magic going on half way down the waterfall in the case of a signal interruption. BTW, the keep_orphan logic probably only works for the asynchronous sg interface (i.e. write sg_io_hdr then read response) rather the the synchronous SG_IO ioctl. To support the keep_orphan the user would need to do a read() on the sg device after the SG_IO ioctl was interrupted. Anyway, this obviously needs to be fixed. Doug Gilbert On 13-08-05 06:02 PM, Roland Dreier wrote: From: Roland Dreier rol...@purestorage.com There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp-read_wait, (srp_done(sfp, srp) || sdp-detached)); but neither srp_done() nor sdp-detached is true, so we end up just setting srp-orphan and returning to userspace: srp-orphan = 1; write_unlock_irq(sfp-rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. - Eventually, the SCSI command issued by the first ioctl finishes and ends up in sg_rq_end_io(). At the end of that function, we run through: write_lock_irqsave(sfp-rq_list_lock, iflags); if (unlikely(srp-orphan)) { if (sfp-keep_orphan) srp-sg_io_owned = 0; else done = 0; } srp-done = done; write_unlock_irqrestore(sfp-rq_list_lock, iflags); if (likely(done)) { /* Now wake up any sg_read() that is waiting for this * packet. */ wake_up_interruptible(sfp-read_wait); kill_fasync(sfp-async_qp, SIGPOLL, POLL_IN); kref_put(sfp-f_ref, sg_remove_sfp); } else { INIT_WORK(srp-ew.work, sg_rq_end_io_usercontext); schedule_work(srp-ew.work); } Since srp-orphan *is* set, we set done to 0 (assuming the userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() to run in a workqueue. - In workqueue context we go through sg_rq_end_io_usercontext() - sg_finish_rem_req() - blk_rq_unmap_user() - ... - bio_uncopy_user() - __bio_copy_iov() - copy_to_user(). The key point here is that we are doing copy_to_user() on a workqueue -- that is, we're on a kernel thread with current-mm equal to whatever random previous user process was scheduled before this kernel thread. So we end up copying whatever data the SCSI command returned to the virtual address of the buffer passed into the original ioctl, but it's quite likely we do this copying into a different address space! Fix this by telling sg_finish_rem_req() whether we're on a workqueue or not, and if we are, calling a new function blk_rq_unmap_user_nocopy() that does everything the original blk_rq_unmap_user() does except calling copy_{to,from}_user(). This requires a few levels of plumbing through a copy flag in the bio layer. I also considered fixing this by having the sg code just set BIO_NULL_MAPPED for bios that are unmapped from a workqueue, which happens to work because the __free_page() part of __bio_copy_iov() isn't needed for sg (because sg handles its own pages). However, this seems coincidental and fragile, so I preferred making the fix explicit, at the cost of minor tweaks to the bio code. Huge thanks to Costa Sapuntzakis co...@purestorage.com for the original pointer to this bug in the sg code. Signed-off-by: Roland Dreier
Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
2013/8/5 Roland Dreier rol...@kernel.org: From: Roland Dreier rol...@purestorage.com There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances leads to one process writing data into the address space of some other random unrelated process if the ioctl is interrupted by a signal. What happens is the following: - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the underlying SCSI command will transfer data from the SCSI device to the buffer provided in the ioctl) - Before the command finishes, a signal is sent to the process waiting in the ioctl. This will end up waking up the sg_ioctl() code: result = wait_event_interruptible(sfp-read_wait, (srp_done(sfp, srp) || sdp-detached)); but neither srp_done() nor sdp-detached is true, so we end up just setting srp-orphan and returning to userspace: srp-orphan = 1; write_unlock_irq(sfp-rq_list_lock); return result; /* -ERESTARTSYS because signal hit process */ At this point the original process is done with the ioctl and blithely goes ahead handling the signal, reissuing the ioctl, etc. i think that an additional issue here is that part of reissuing the ioctl is re-queueing the command. since the re-queue is at the front of the block queue there are issues if the command is non-idempotent. we have a local fix that gets rid of most of the orphan stuff and re-waiting if a non-fatal signal was waiting. simpler than unmapping but maybe we're missing some other interesting case? \p -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html