Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-06 Thread Douglas Gilbert

On 13-08-05 11:54 PM, Peter Chang wrote:

2013/8/5 Roland Dreier rol...@kernel.org:

From: Roland Dreier rol...@purestorage.com

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

  - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
underlying SCSI command will transfer data from the SCSI device to
the buffer provided in the ioctl)

  - Before the command finishes, a signal is sent to the process waiting
in the ioctl.  This will end up waking up the sg_ioctl() code:

 result = wait_event_interruptible(sfp-read_wait,
 (srp_done(sfp, srp) || sdp-detached));

but neither srp_done() nor sdp-detached is true, so we end up just
setting srp-orphan and returning to userspace:

 srp-orphan = 1;
 write_unlock_irq(sfp-rq_list_lock);
 return result;  /* -ERESTARTSYS because signal hit process */

At this point the original process is done with the ioctl and
blithely goes ahead handling the signal, reissuing the ioctl, etc.


i think that an additional issue here is that part of reissuing the
ioctl is re-queueing the command. since the re-queue is at the front
of the block queue there are issues if the command is non-idempotent.


Re-issuing a SG_IO ioctl is wrong. More and more SCSI commands (even
in SBC-3) are non-idempotent (e.g. COMPARE AND WRITE). And the st
driver gets to use the block layer as well and many of its important
SCSI commands (SSC) are non-idempotent.


we have a local fix that gets rid of most of the orphan stuff and
re-waiting if a non-fatal signal was waiting. simpler than unmapping
but maybe we're missing some other interesting case?


Like to share that fix with us?

Also I'm interested in how you know from within a kernel driver
whether a signal sent to the controlling process is fatal or
not? For example SIGIO's default action is terminate but sg
assumes if the controlling process requests SIGIO generation
then it will at least override that default action and
handle it sensibly. Is there a way to check that assumption?

Looked at bsg in the situation where a signal interrupts a
SG_IO ioctl. It seems broken; anyone like to comment on this
snippet from bsg if a signal hits the first call:
blk_execute_rq(bd-queue, NULL, rq, at_head);
ret = blk_complete_sgv4_hdr_rq(rq, hdr, bio, bidi_bio);
if (copy_to_user(uarg, hdr, sizeof(hdr)))
return -EFAULT;


As an aside, I got tired of handling signals during SCSI commands
in the ddpt utility, especially after adding tape support. So
it now masks all the usual suspects during the IO then checks
for signals in a small window between IOs. Non maskable signals
will still terminate but of course the user gets no guarantees,
but it would be still reasonable in the termination case that
the interrupted SCSI command was _not_ resubmitted.

Doug Gilbert

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Roland Dreier
From: Roland Dreier rol...@purestorage.com

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

 - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
   underlying SCSI command will transfer data from the SCSI device to
   the buffer provided in the ioctl)

 - Before the command finishes, a signal is sent to the process waiting
   in the ioctl.  This will end up waking up the sg_ioctl() code:

result = wait_event_interruptible(sfp-read_wait,
(srp_done(sfp, srp) || sdp-detached));

   but neither srp_done() nor sdp-detached is true, so we end up just
   setting srp-orphan and returning to userspace:

srp-orphan = 1;
write_unlock_irq(sfp-rq_list_lock);
return result;  /* -ERESTARTSYS because signal hit process */

   At this point the original process is done with the ioctl and
   blithely goes ahead handling the signal, reissuing the ioctl, etc.

 - Eventually, the SCSI command issued by the first ioctl finishes and
   ends up in sg_rq_end_io().  At the end of that function, we run through:

write_lock_irqsave(sfp-rq_list_lock, iflags);
if (unlikely(srp-orphan)) {
if (sfp-keep_orphan)
srp-sg_io_owned = 0;
else
done = 0;
}
srp-done = done;
write_unlock_irqrestore(sfp-rq_list_lock, iflags);

if (likely(done)) {
/* Now wake up any sg_read() that is waiting for this
 * packet.
 */
wake_up_interruptible(sfp-read_wait);
kill_fasync(sfp-async_qp, SIGPOLL, POLL_IN);
kref_put(sfp-f_ref, sg_remove_sfp);
} else {
INIT_WORK(srp-ew.work, sg_rq_end_io_usercontext);
schedule_work(srp-ew.work);
}

   Since srp-orphan *is* set, we set done to 0 (assuming the
   userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
   ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
   to run in a workqueue.

 - In workqueue context we go through sg_rq_end_io_usercontext() -
   sg_finish_rem_req() - blk_rq_unmap_user() - ... -
   bio_uncopy_user() - __bio_copy_iov() - copy_to_user().

   The key point here is that we are doing copy_to_user() on a
   workqueue -- that is, we're on a kernel thread with current-mm
   equal to whatever random previous user process was scheduled before
   this kernel thread.  So we end up copying whatever data the SCSI
   command returned to the virtual address of the buffer passed into
   the original ioctl, but it's quite likely we do this copying into a
   different address space!

Fix this by telling sg_finish_rem_req() whether we're on a workqueue
or not, and if we are, calling a new function blk_rq_unmap_user_nocopy()
that does everything the original blk_rq_unmap_user() does except
calling copy_{to,from}_user().  This requires a few levels of plumbing
through a copy flag in the bio layer.

I also considered fixing this by having the sg code just set
BIO_NULL_MAPPED for bios that are unmapped from a workqueue, which
happens to work because the __free_page() part of __bio_copy_iov()
isn't needed for sg (because sg handles its own pages).  However, this
seems coincidental and fragile, so I preferred making the fix
explicit, at the cost of minor tweaks to the bio code.

Huge thanks to Costa Sapuntzakis co...@purestorage.com for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier rol...@purestorage.com
Cc: sta...@vger.kernel.org
---
 block/blk-map.c| 15 ---
 drivers/scsi/sg.c  | 19 ++-
 fs/bio.c   | 22 +++---
 include/linux/bio.h|  2 +-
 include/linux/blkdev.h | 11 ++-
 5 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 623e1cd..bd63201 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -25,7 +25,7 @@ int blk_rq_append_bio(struct request_queue *q, struct request 
*rq,
return 0;
 }
 
-static int __blk_rq_unmap_user(struct bio *bio)
+static int __blk_rq_unmap_user(struct bio *bio, bool copy)
 {
int ret = 0;
 
@@ -33,7 +33,7 @@ static int __blk_rq_unmap_user(struct bio *bio)
if (bio_flagged(bio, BIO_USER_MAPPED))
bio_unmap_user(bio);
else
-   ret = bio_uncopy_user(bio);
+   ret = bio_uncopy_user(bio, copy);
}
 
return ret;
@@ -80,7 +80,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct 
request *rq,
 
/* if it was boucned we must call the end io function */
bio_endio(bio, 0);
-   

Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread James Bottomley
On Mon, 2013-08-05 at 15:02 -0700, Roland Dreier wrote:
 From: Roland Dreier rol...@purestorage.com
 
 There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
 leads to one process writing data into the address space of some other
 random unrelated process if the ioctl is interrupted by a signal.
 What happens is the following:
 
  - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
underlying SCSI command will transfer data from the SCSI device to
the buffer provided in the ioctl)
 
  - Before the command finishes, a signal is sent to the process waiting
in the ioctl.  This will end up waking up the sg_ioctl() code:
 
   result = wait_event_interruptible(sfp-read_wait,
   (srp_done(sfp, srp) || sdp-detached));
 
but neither srp_done() nor sdp-detached is true, so we end up just
setting srp-orphan and returning to userspace:
 
   srp-orphan = 1;
   write_unlock_irq(sfp-rq_list_lock);
   return result;  /* -ERESTARTSYS because signal hit process */
 
At this point the original process is done with the ioctl and
blithely goes ahead handling the signal, reissuing the ioctl, etc.
 
  - Eventually, the SCSI command issued by the first ioctl finishes and
ends up in sg_rq_end_io().  At the end of that function, we run through:
 
   write_lock_irqsave(sfp-rq_list_lock, iflags);
   if (unlikely(srp-orphan)) {
   if (sfp-keep_orphan)
   srp-sg_io_owned = 0;
   else
   done = 0;
   }
   srp-done = done;
   write_unlock_irqrestore(sfp-rq_list_lock, iflags);
 
   if (likely(done)) {
   /* Now wake up any sg_read() that is waiting for this
* packet.
*/
   wake_up_interruptible(sfp-read_wait);
   kill_fasync(sfp-async_qp, SIGPOLL, POLL_IN);
   kref_put(sfp-f_ref, sg_remove_sfp);
   } else {
   INIT_WORK(srp-ew.work, sg_rq_end_io_usercontext);
   schedule_work(srp-ew.work);
   }
 
Since srp-orphan *is* set, we set done to 0 (assuming the
userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
to run in a workqueue.
 
  - In workqueue context we go through sg_rq_end_io_usercontext() -
sg_finish_rem_req() - blk_rq_unmap_user() - ... -
bio_uncopy_user() - __bio_copy_iov() - copy_to_user().
 
The key point here is that we are doing copy_to_user() on a
workqueue -- that is, we're on a kernel thread with current-mm
equal to whatever random previous user process was scheduled before
this kernel thread.  So we end up copying whatever data the SCSI
command returned to the virtual address of the buffer passed into
the original ioctl, but it's quite likely we do this copying into a
different address space!
 
 Fix this by telling sg_finish_rem_req() whether we're on a workqueue
 or not, and if we are, calling a new function blk_rq_unmap_user_nocopy()
 that does everything the original blk_rq_unmap_user() does except
 calling copy_{to,from}_user().  This requires a few levels of plumbing
 through a copy flag in the bio layer.

I agree with the analysis.  The fix is a bit draconian, though.  A
workqueue actually runs in a kernel thread and there's a simple test for
that (!current-mm), so how about this instead (which is much less
intrusive)

James

---

diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..e2ab39c 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct 
bio_vec *iovecs,
 int bio_uncopy_user(struct bio *bio)
 {
struct bio_map_data *bmd = bio-bi_private;
-   int ret = 0;
+   struct bio_vec *bvec;
+   int ret = 0, i;
 
-   if (!bio_flagged(bio, BIO_NULL_MAPPED))
-   ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs,
-bmd-nr_sgvecs, bio_data_dir(bio) == READ,
-0, bmd-is_our_pages);
+   if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
+   /*
+* if we're in a workqueue, the request is orphaned, so
+* don't copy into the kernel address space, just free
+*/
+   if (current-mm)
+   ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs,
+bmd-nr_sgvecs, bio_data_dir(bio) 
== READ,
+0, bmd-is_our_pages);
+   else if (bmd-is_our_pages)
+   bio_for_each_segment_all(bvec, bio, i)
+   __free_page(bvec-bv_page);
+   }
bio_free_map_data(bmd);
bio_put(bio);
return ret;


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to 

Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Roland Dreier
On Mon, Aug 5, 2013 at 4:31 PM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
 I agree with the analysis.  The fix is a bit draconian, though.  A
 workqueue actually runs in a kernel thread and there's a simple test for
 that (!current-mm), so how about this instead (which is much less
 intrusive)

 ---

 diff --git a/fs/bio.c b/fs/bio.c
 index 94bbc04..e2ab39c 100644
 --- a/fs/bio.c
 +++ b/fs/bio.c
 @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct 
 bio_vec *iovecs,
  int bio_uncopy_user(struct bio *bio)
  {
 struct bio_map_data *bmd = bio-bi_private;
 -   int ret = 0;
 +   struct bio_vec *bvec;
 +   int ret = 0, i;

 -   if (!bio_flagged(bio, BIO_NULL_MAPPED))
 -   ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs,
 -bmd-nr_sgvecs, bio_data_dir(bio) == 
 READ,
 -0, bmd-is_our_pages);
 +   if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
 +   /*
 +* if we're in a workqueue, the request is orphaned, so
 +* don't copy into the kernel address space, just free
 +*/
 +   if (current-mm)
 +   ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs,
 +bmd-nr_sgvecs, 
 bio_data_dir(bio) == READ,
 +0, bmd-is_our_pages);
 +   else if (bmd-is_our_pages)
 +   bio_for_each_segment_all(bvec, bio, i)
 +   __free_page(bvec-bv_page);
 +   }
 bio_free_map_data(bmd);
 bio_put(bio);
 return ret;

Yes, looks reasonable -- I can't think of any reason why anyone would
ever want the bio code to copy to a random userspace address space.

Acked-by: Roland Dreier rol...@purestorage.com
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread James Bottomley
On Mon, 2013-08-05 at 16:38 -0700, Roland Dreier wrote:
 On Mon, Aug 5, 2013 at 4:31 PM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  I agree with the analysis.  The fix is a bit draconian, though.  A
  workqueue actually runs in a kernel thread and there's a simple test for
  that (!current-mm), so how about this instead (which is much less
  intrusive)
 
  ---
 
  diff --git a/fs/bio.c b/fs/bio.c
  index 94bbc04..e2ab39c 100644
  --- a/fs/bio.c
  +++ b/fs/bio.c
  @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct 
  bio_vec *iovecs,
   int bio_uncopy_user(struct bio *bio)
   {
  struct bio_map_data *bmd = bio-bi_private;
  -   int ret = 0;
  +   struct bio_vec *bvec;
  +   int ret = 0, i;
 
  -   if (!bio_flagged(bio, BIO_NULL_MAPPED))
  -   ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs,
  -bmd-nr_sgvecs, bio_data_dir(bio) == 
  READ,
  -0, bmd-is_our_pages);
  +   if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
  +   /*
  +* if we're in a workqueue, the request is orphaned, so
  +* don't copy into the kernel address space, just free
  +*/
  +   if (current-mm)
  +   ret = __bio_copy_iov(bio, bmd-iovecs, bmd-sgvecs,
  +bmd-nr_sgvecs, 
  bio_data_dir(bio) == READ,
  +0, bmd-is_our_pages);
  +   else if (bmd-is_our_pages)
  +   bio_for_each_segment_all(bvec, bio, i)
  +   __free_page(bvec-bv_page);
  +   }
  bio_free_map_data(bmd);
  bio_put(bio);
  return ret;
 
 Yes, looks reasonable -- I can't think of any reason why anyone would
 ever want the bio code to copy to a random userspace address space.
 
 Acked-by: Roland Dreier rol...@purestorage.com

You did all the work ... just replace this patch with your previous one
and keep the original tags. (test it first, of course ...)

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Douglas Gilbert

Roland,
When this sg code was originally designed, there wasn't a bio
in sight :-)

Now I'm trying to get my head around this. We have launched
a data-in SCSI command like READ(10) and the DMA is underway
so we are waiting for a done indication. Instead we receive
a signal interruption. It is not clear to me why that DMA
would not just keep going unless we can get to something that
can stop or redirect the DMA. That something is more likely to
be the low level driver being used rather than the block layer.
In the original design to cope with this the destination pages
were locked in memory until the DMA completed.

So originally the design was to allow for this case at the top
of the waterfall. Now it seems there is bio magic going on
half way down the waterfall in the case of a signal interruption.


BTW, the keep_orphan logic probably only works for the
asynchronous sg interface (i.e. write sg_io_hdr then read response)
rather the the synchronous SG_IO ioctl. To support the keep_orphan
the user would need to do a read() on the sg device after the
SG_IO ioctl was interrupted.


Anyway, this obviously needs to be fixed.

Doug Gilbert

On 13-08-05 06:02 PM, Roland Dreier wrote:

From: Roland Dreier rol...@purestorage.com

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

  - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
underlying SCSI command will transfer data from the SCSI device to
the buffer provided in the ioctl)

  - Before the command finishes, a signal is sent to the process waiting
in the ioctl.  This will end up waking up the sg_ioctl() code:

result = wait_event_interruptible(sfp-read_wait,
(srp_done(sfp, srp) || sdp-detached));

but neither srp_done() nor sdp-detached is true, so we end up just
setting srp-orphan and returning to userspace:

srp-orphan = 1;
write_unlock_irq(sfp-rq_list_lock);
return result;  /* -ERESTARTSYS because signal hit process */

At this point the original process is done with the ioctl and
blithely goes ahead handling the signal, reissuing the ioctl, etc.

  - Eventually, the SCSI command issued by the first ioctl finishes and
ends up in sg_rq_end_io().  At the end of that function, we run through:

write_lock_irqsave(sfp-rq_list_lock, iflags);
if (unlikely(srp-orphan)) {
if (sfp-keep_orphan)
srp-sg_io_owned = 0;
else
done = 0;
}
srp-done = done;
write_unlock_irqrestore(sfp-rq_list_lock, iflags);

if (likely(done)) {
/* Now wake up any sg_read() that is waiting for this
 * packet.
 */
wake_up_interruptible(sfp-read_wait);
kill_fasync(sfp-async_qp, SIGPOLL, POLL_IN);
kref_put(sfp-f_ref, sg_remove_sfp);
} else {
INIT_WORK(srp-ew.work, sg_rq_end_io_usercontext);
schedule_work(srp-ew.work);
}

Since srp-orphan *is* set, we set done to 0 (assuming the
userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
to run in a workqueue.

  - In workqueue context we go through sg_rq_end_io_usercontext() -
sg_finish_rem_req() - blk_rq_unmap_user() - ... -
bio_uncopy_user() - __bio_copy_iov() - copy_to_user().

The key point here is that we are doing copy_to_user() on a
workqueue -- that is, we're on a kernel thread with current-mm
equal to whatever random previous user process was scheduled before
this kernel thread.  So we end up copying whatever data the SCSI
command returned to the virtual address of the buffer passed into
the original ioctl, but it's quite likely we do this copying into a
different address space!

Fix this by telling sg_finish_rem_req() whether we're on a workqueue
or not, and if we are, calling a new function blk_rq_unmap_user_nocopy()
that does everything the original blk_rq_unmap_user() does except
calling copy_{to,from}_user().  This requires a few levels of plumbing
through a copy flag in the bio layer.

I also considered fixing this by having the sg code just set
BIO_NULL_MAPPED for bios that are unmapped from a workqueue, which
happens to work because the __free_page() part of __bio_copy_iov()
isn't needed for sg (because sg handles its own pages).  However, this
seems coincidental and fragile, so I preferred making the fix
explicit, at the cost of minor tweaks to the bio code.

Huge thanks to Costa Sapuntzakis co...@purestorage.com for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier 

Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Peter Chang
2013/8/5 Roland Dreier rol...@kernel.org:
 From: Roland Dreier rol...@purestorage.com

 There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
 leads to one process writing data into the address space of some other
 random unrelated process if the ioctl is interrupted by a signal.
 What happens is the following:

  - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
underlying SCSI command will transfer data from the SCSI device to
the buffer provided in the ioctl)

  - Before the command finishes, a signal is sent to the process waiting
in the ioctl.  This will end up waking up the sg_ioctl() code:

 result = wait_event_interruptible(sfp-read_wait,
 (srp_done(sfp, srp) || sdp-detached));

but neither srp_done() nor sdp-detached is true, so we end up just
setting srp-orphan and returning to userspace:

 srp-orphan = 1;
 write_unlock_irq(sfp-rq_list_lock);
 return result;  /* -ERESTARTSYS because signal hit process */

At this point the original process is done with the ioctl and
blithely goes ahead handling the signal, reissuing the ioctl, etc.

i think that an additional issue here is that part of reissuing the
ioctl is re-queueing the command. since the re-queue is at the front
of the block queue there are issues if the command is non-idempotent.

we have a local fix that gets rid of most of the orphan stuff and
re-waiting if a non-fatal signal was waiting. simpler than unmapping
but maybe we're missing some other interesting case?

\p
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html