Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-22 Thread Sagi Grimberg



Hello Christoph,

The comment about locality in the above quote is interesting. How about
modifying patch 2/9 as indicated below ? The modification below does not
change the behavior of this patch if ib_cq.w.cpu is not modified. And it
allows users who care about locality and who want to skip the scheduler
overhead by setting ib_cq.w.cpu to the index of the CPU they want the
work to be processed on.


That sounds acceptable...
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-22 Thread Christoph Hellwig
On Sun, Nov 22, 2015 at 11:51:13AM +0200, Sagi Grimberg wrote:
>
>> Hello Christoph,
>>
>> The comment about locality in the above quote is interesting. How about
>> modifying patch 2/9 as indicated below ? The modification below does not
>> change the behavior of this patch if ib_cq.w.cpu is not modified. And it
>> allows users who care about locality and who want to skip the scheduler
>> overhead by setting ib_cq.w.cpu to the index of the CPU they want the
>> work to be processed on.
>
> That sounds acceptable...

Wouldn't it be a better idea to set the WQ_SYSFS interface and use
the standard sysfs interface for specifying cpumasks or node affinity?
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-22 Thread Sagi Grimberg



Wouldn't it be a better idea to set the WQ_SYSFS interface and use
the standard sysfs interface for specifying cpumasks or node affinity?


I think that bart wants to allow the caller to select cpu affinity
per CQ. In this case ib_alloc_cq in workqueue mode would need to
accept a affinity_hint from the caller (default to wild-card 
WORK_CPU_UNBOUND).

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-22 Thread Christoph Hellwig
On Sun, Nov 22, 2015 at 12:36:00PM +0200, Sagi Grimberg wrote:
>
>> Wouldn't it be a better idea to set the WQ_SYSFS interface and use
>> the standard sysfs interface for specifying cpumasks or node affinity?
>
> I think that bart wants to allow the caller to select cpu affinity
> per CQ. In this case ib_alloc_cq in workqueue mode would need to
> accept a affinity_hint from the caller (default to wild-card 
> WORK_CPU_UNBOUND).

Hmm, true.  How would be set that hint from userspace?  I'd really prefer
to see a practical justification for it first.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-22 Thread Sagi Grimberg




I think that bart wants to allow the caller to select cpu affinity
per CQ. In this case ib_alloc_cq in workqueue mode would need to
accept a affinity_hint from the caller (default to wild-card
WORK_CPU_UNBOUND).


Hmm, true.  How would be set that hint from userspace?  I'd really prefer
to see a practical justification for it first.


In order to assign CPUs from user-space we'd need an ethtool like
interface for isert/srpt/. Given that this is something we don't
want to get into right now, I assumed that Bart meant that srpt
would take a "least used" approach from srpt driver (which isn't better
taking the wild-card option I'd say), So I'll let Bart answer...
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] scsi_transport_fc: Introduce scsi_host_{get,put}()

2015-11-22 Thread Christoph Hellwig
On Fri, Nov 20, 2015 at 01:33:04PM -0800, Bart Van Assche wrote:
> Use scsi_host_{get,put}() instead of open-coding these functions.
> Compile-tested only.

s/Introdue/use/ in the subject?

Otherwise this looks fine:

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix a memory leak in scsi_host_dev_release()

2015-11-22 Thread Christoph Hellwig
On Fri, Nov 20, 2015 at 09:49:42AM -0800, Bart Van Assche wrote:
> On 11/20/2015 03:52 AM, Christoph Hellwig wrote:
> >the memory leak looks real, and your fix looks corret, but I still
> >don't like it.
> >
> >I think it's reasonable for SCSI to assume that the final put_device
> >fully frees the struct device including the name pointer that is
> >assigned entirely behind the back of the caller.
> >
> >So I think the fix for this probably should be in the driver core.
> 
> Hello Christoph,
> 
> Thanks for the feedback. However, I'm not sure this can be fixed by
> modifying the driver core. If scsi_host_remove() is not called the SCSI core
> doesn't call put_device(&shost->shost_dev). I will post a second version of
> this patch that ensures that the SCSI core always calls
> put_device(&shost->shost_dev).


Oh, I see.  The release method is called on shost_gendev, but the
name that needs to be freed is in shost_dev.  I take my comment on the
core back.

Let's get this patch in for now and see if we can do something about the
creative driver model (ab-)use for struct Scsi_Host in the long run.

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/9] IB: add a proper completion queue abstraction

2015-11-22 Thread Bart Van Assche

On 11/22/15 06:57, Sagi Grimberg wrote:

I think that bart wants to allow the caller to select cpu affinity
per CQ. In this case ib_alloc_cq in workqueue mode would need to
accept a affinity_hint from the caller (default to wild-card
WORK_CPU_UNBOUND).


Hmm, true.  How would be set that hint from userspace?  I'd really prefer
to see a practical justification for it first.


In order to assign CPUs from user-space we'd need an ethtool like
interface for isert/srpt/. Given that this is something we don't
want to get into right now, I assumed that Bart meant that srpt
would take a "least used" approach from srpt driver (which isn't better
taking the wild-card option I'd say), So I'll let Bart answer...


Hello Christoph and Sagi,

My intention is indeed to allow to control CPU affinity per CQ. One use 
case is to implement a least-used policy in RDMA drivers that use 
multiple completion queues. Another use case is to make CPU affinity 
configurable from user space through something similar to ethtool or via 
sysfs.


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESEND] scsi_debug: fix prevent_allow+verify regressions

2015-11-22 Thread Douglas Gilbert

Ruediger Meier observed a regression with the PREVENT ALLOW
MEDIUM REMOVAL command in lk 3.19:
  http://www.spinics.net/lists/util-linux-ng/msg11448.html

Inspection indicated the same regression with VERIFY(10).

The patch is against lk 3.19.3 and also works with lk 4.3.0 .
With this patch both commands are accepted and do nothing.

ChangeLog:
  - fix the lk 3.19 regression so that the PREVENT ALLOW
MEDIUM REMOVAL command is supported once again
  - same fix for VERIFY(10)

Signed-off-by: Douglas Gilbert 
--- a/drivers/scsi/scsi_debug.c	2015-02-11 17:47:09.276206425 -0500
+++ b/drivers/scsi/scsi_debug.c	2015-04-03 22:42:36.343971372 -0400
@@ -455,8 +455,9 @@ static const struct opcode_info_t opcode
 	 0} },
 	{0, 0, 0, F_INV_OP | FF_RESPOND, NULL, NULL, /* MAINT OUT */
 	{0,  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} },
-	{0, 0, 0, F_INV_OP | FF_RESPOND, NULL, NULL, /* VERIFY */
-	{0,  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} },
+	{0, 0x2f, 0, F_D_OUT_MAYBE | FF_DIRECT_IO, NULL, NULL, /* VERIFY(10) */
+	{10,  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xc7,
+	 0, 0, 0, 0, 0, 0} },
 	{1, 0x7f, 0x9, F_SA_HIGH | F_D_IN | FF_DIRECT_IO, resp_read_dt0,
 	vl_iarr, {32,  0xc7, 0, 0, 0, 0, 0x1f, 0x18, 0x0, 0x9, 0xfe, 0,
 		  0xff, 0xff, 0xff, 0xff} },/* VARIABLE LENGTH, READ(32) */
@@ -467,8 +468,8 @@ static const struct opcode_info_t opcode
 	{10,  0x13, 0xff, 0xff, 0, 0, 0, 0xff, 0xff, 0xc7, 0, 0, 0, 0, 0,
 	 0} },
 /* 20 */
-	{0, 0, 0, F_INV_OP | FF_RESPOND, NULL, NULL, /* ALLOW REMOVAL */
-	{0,  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} },
+	{0, 0x1e, 0, 0, NULL, NULL, /* ALLOW REMOVAL */
+	{6,  0, 0, 0, 0x3, 0xc7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} },
 	{0, 0x1, 0, 0, resp_start_stop, NULL, /* REWIND ?? */
 	{6,  0x1, 0, 0, 0, 0xc7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} },
 	{0, 0, 0, F_INV_OP | FF_RESPOND, NULL, NULL, /* ATA_PT */


Re: [PATCH v2 1/2] scsi: lpfc: Use kzalloc instead of kmalloc

2015-11-22 Thread Sebastian Herbszt
Punit Vara wrote:
> This patch is to the lpfc_els.c which resolves following warning
> reported by coccicheck:
> 
> WARNING: kzalloc should be used for rdp_context, instead of
> kmalloc/memset
> 
> Signed-off-by: Punit Vara 
> ---
>  drivers/scsi/lpfc/lpfc_els.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
> index 36bf58b..9729ab1 100644
> --- a/drivers/scsi/lpfc/lpfc_els.c
> +++ b/drivers/scsi/lpfc/lpfc_els.c
> @@ -4990,13 +4990,12 @@ lpfc_els_rcv_rdp(struct lpfc_vport *vport, struct 
> lpfc_iocbq *cmdiocb,
>   if (RDP_NPORT_ID_SIZE !=
>   be32_to_cpu(rdp_req->nport_id_desc.length))
>   goto rjt_logerr;
> - rdp_context = kmalloc(sizeof(struct lpfc_rdp_context), GFP_KERNEL);
> + rdp_context = kzalloc(sizeof(struct lpfc_rdp_context), GFP_KERNEL);
>   if (!rdp_context) {
>   rjt_err = LSRJT_UNABLE_TPC;
>   goto error;
>   }
>  
> - memset(rdp_context, 0, sizeof(struct lpfc_rdp_context));
>   cmd = &cmdiocb->iocb;
>   rdp_context->ndlp = lpfc_nlp_get(ndlp);
>   rdp_context->ox_id = cmd->unsli3.rcvsli3.ox_id;

Reviewed-by: Sebastian Herbszt 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] lpfc: replaced kmalloc + memset with kzalloc

2015-11-22 Thread Sebastian Herbszt
Saurabh Sengar wrote:
> replacing kmalloc and memset by a single call of kzalloc
> 
> Signed-off-by: Saurabh Sengar 
> ---
> v2 : I didn't got any response for my initial patch,
> I am sending it again on top of latest kernel(today's)
> 
>  drivers/scsi/lpfc/lpfc_els.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
> index b6fa257..92dd204 100644
> --- a/drivers/scsi/lpfc/lpfc_els.c
> +++ b/drivers/scsi/lpfc/lpfc_els.c
> @@ -4956,13 +4956,12 @@ lpfc_els_rcv_rdp(struct lpfc_vport *vport, struct 
> lpfc_iocbq *cmdiocb,
>   if (RDP_NPORT_ID_SIZE !=
>   be32_to_cpu(rdp_req->nport_id_desc.length))
>   goto rjt_logerr;
> - rdp_context = kmalloc(sizeof(struct lpfc_rdp_context), GFP_KERNEL);
> + rdp_context = kzalloc(sizeof(struct lpfc_rdp_context), GFP_KERNEL);
>   if (!rdp_context) {
>   rjt_err = LSRJT_UNABLE_TPC;
>   goto error;
>   }
>  
> - memset(rdp_context, 0, sizeof(struct lpfc_rdp_context));
>   cmd = &cmdiocb->iocb;
>   rdp_context->ndlp = lpfc_nlp_get(ndlp);
>   rdp_context->ox_id = cmd->unsli3.rcvsli3.ox_id;

Same patch was already sent by Punit Vara on 2015-10-24.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Mark Salter
On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> On Sat, 21 Nov 2015 12:30:14 +0100
> Laurent Dufour  wrote:
> 
> > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > 
> > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > merge bios to request infrastucture where we don't obey the merging
> > > > limits properly.
> > > > 
> > > > Does either of you have a known good and first known bad kernel?
> > > 
> > > Not me, I've only hit it one or two times. All I can say is I have hit it 
> > > in
> > > 4.4-rc1.
> > > 
> > > Laurent, can you narrow it down at all?
> > 
> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > setup bi_phys_segments after splitting") which has been pulled by the
> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > git://git.kernel.dk/linux-block").
> > 
> > My system is panicing promptly when running a kernel built at
> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > without panicing.
> > 
> > This being said, I can't explain what's going wrong.
> > 
> > May Ming shed some light here ?
> 
> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> mind testing the following patch to see if it fixes your issue?
> 
> ---
> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> From: Ming Lei 
> Date: Sun, 22 Nov 2015 00:47:13 +0800
> Subject: [PATCH] block: fix segment split
> 
> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> always points to the iterator local variable, which is obviously
> wrong, so fix it by pointing to the local variable of 'bvprv'.
> 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-merge.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index de5716d8..f2efe8a 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
> request_queue *q,
>  
>   seg_size += bv.bv_len;
>   bvprv = bv;
> - bvprvp = &bv;
> + bvprvp = &bvprv;
>   sectors += bv.bv_len >> 9;
>   continue;
>   }
> @@ -108,7 +108,7 @@ new_segment:
>  
>   nsegs++;
>   bvprv = bv;
> - bvprvp = &bv;
> + bvprvp = &bvprv;
>   seg_size = bv.bv_len;
>   sectors += bv.bv_len >> 9;
>   }

I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Ming Lei
On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter  wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour  wrote:
>>
>> > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > >
>> > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > merge bios to request infrastucture where we don't obey the merging
>> > > > limits properly.
>> > > >
>> > > > Does either of you have a known good and first known bad kernel?
>> > >
>> > > Not me, I've only hit it one or two times. All I can say is I have hit 
>> > > it in
>> > > 4.4-rc1.
>> > >
>> > > Laurent, can you narrow it down at all?
>> >
>> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
>> > setup bi_phys_segments after splitting") which has been pulled by the
>> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > git://git.kernel.dk/linux-block").
>> >
>> > My system is panicing promptly when running a kernel built at
>> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > without panicing.
>> >
>> > This being said, I can't explain what's going wrong.
>> >
>> > May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei 
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei 
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
>> request_queue *q,
>>
>>   seg_size += bv.bv_len;
>>   bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>>   sectors += bv.bv_len >> 9;
>>   continue;
>>   }
>> @@ -108,7 +108,7 @@ new_segment:
>>
>>   nsegs++;
>>   bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>>   seg_size = bv.bv_len;
>>   sectors += bv.bv_len >> 9;
>>   }
>
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

OK, looks there are still other bugs, care to share us how to reproduce
it on arm64?

thanks,
Ming
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Mark Salter
On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter  wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> > > On Sat, 21 Nov 2015 12:30:14 +0100
> > > Laurent Dufour  wrote:
> > > 
> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > > > 
> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > > > merge bios to request infrastucture where we don't obey the merging
> > > > > > limits properly.
> > > > > > 
> > > > > > Does either of you have a known good and first known bad kernel?
> > > > > 
> > > > > Not me, I've only hit it one or two times. All I can say is I have 
> > > > > hit it in
> > > > > 4.4-rc1.
> > > > > 
> > > > > Laurent, can you narrow it down at all?
> > > > 
> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > > > setup bi_phys_segments after splitting") which has been pulled by the
> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > > > git://git.kernel.dk/linux-block").
> > > > 
> > > > My system is panicing promptly when running a kernel built at
> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > > > without panicing.
> > > > 
> > > > This being said, I can't explain what's going wrong.
> > > > 
> > > > May Ming shed some light here ?
> > > 
> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> > > mind testing the following patch to see if it fixes your issue?
> > > 
> > > ---
> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> > > From: Ming Lei 
> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> > > Subject: [PATCH] block: fix segment split
> > > 
> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> > > always points to the iterator local variable, which is obviously
> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> > > 
> > > Signed-off-by: Ming Lei 
> > > ---
> > >  block/blk-merge.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > > index de5716d8..f2efe8a 100644
> > > --- a/block/blk-merge.c
> > > +++ b/block/blk-merge.c
> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
> > > request_queue *q,
> > > 
> > >   seg_size += bv.bv_len;
> > >   bvprv = bv;
> > > - bvprvp = &bv;
> > > + bvprvp = &bvprv;
> > >   sectors += bv.bv_len >> 9;
> > >   continue;
> > >   }
> > > @@ -108,7 +108,7 @@ new_segment:
> > > 
> > >   nsegs++;
> > >   bvprv = bv;
> > > - bvprvp = &bv;
> > > + bvprvp = &bvprv;
> > >   seg_size = bv.bv_len;
> > >   sectors += bv.bv_len >> 9;
> > >   }
> > 
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> 
> OK, looks there are still other bugs, care to share us how to reproduce
> it on arm64?
> 
> thanks,
> Ming

Unfortunately, the best reproducer I have is to boot the platform. I have seen 
the
BUG a few times post-boot, but I don't have a consistant reproducer. I am using
upstream 4.4-rc1 with this config:

  http://people.redhat.com/msalter/fh_defconfig

With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 
boots.
On an AMD Seattle platform, about every 9 boots.

I have a script that loops through an ssh command to reboot the platform under 
test.
I manually install test kernels and then run the script and wait for failure. 
While
debugging, I have tried more minimal configs with which I have been unable to
reproduce the problem even after several hours of reboots. With the above 
mentioned
fh_defconfig, I have been able to get a failure within 20 or so boots with most
kernel builds but at certain kernel commits, the failure has taken a longer 
time to
reproduce.

>From my POV, I can't say which commit causes the problem. So far, I have not 
>been
able to reproduce at all before commit d9734e0d1ccf but I am currently trying to
reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Ming Lei
Hi Mark,

On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter  wrote:
> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter  wrote:
>> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> > > On Sat, 21 Nov 2015 12:30:14 +0100
>> > > Laurent Dufour  wrote:
>> > >
>> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > > > >
>> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > > > merge bios to request infrastucture where we don't obey the merging
>> > > > > > limits properly.
>> > > > > >
>> > > > > > Does either of you have a known good and first known bad kernel?
>> > > > >
>> > > > > Not me, I've only hit it one or two times. All I can say is I have 
>> > > > > hit it in
>> > > > > 4.4-rc1.
>> > > > >
>> > > > > Laurent, can you narrow it down at all?
>> > > >
>> > > > It seems that the panic is triggered by the commit bdced438acd8 
>> > > > ("block:
>> > > > setup bi_phys_segments after splitting") which has been pulled by the
>> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > > > git://git.kernel.dk/linux-block").
>> > > >
>> > > > My system is panicing promptly when running a kernel built at
>> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > > > without panicing.
>> > > >
>> > > > This being said, I can't explain what's going wrong.
>> > > >
>> > > > May Ming shed some light here ?
>> > >
>> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> > > mind testing the following patch to see if it fixes your issue?
>> > >
>> > > ---
>> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> > > From: Ming Lei 
>> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
>> > > Subject: [PATCH] block: fix segment split
>> > >
>> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> > > always points to the iterator local variable, which is obviously
>> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
>> > >
>> > > Signed-off-by: Ming Lei 
>> > > ---
>> > >  block/blk-merge.c | 4 ++--
>> > >  1 file changed, 2 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
>> > > index de5716d8..f2efe8a 100644
>> > > --- a/block/blk-merge.c
>> > > +++ b/block/blk-merge.c
>> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
>> > > request_queue *q,
>> > >
>> > >   seg_size += bv.bv_len;
>> > >   bvprv = bv;
>> > > - bvprvp = &bv;
>> > > + bvprvp = &bvprv;
>> > >   sectors += bv.bv_len >> 9;
>> > >   continue;
>> > >   }
>> > > @@ -108,7 +108,7 @@ new_segment:
>> > >
>> > >   nsegs++;
>> > >   bvprv = bv;
>> > > - bvprvp = &bv;
>> > > + bvprvp = &bvprv;
>> > >   seg_size = bv.bv_len;
>> > >   sectors += bv.bv_len >> 9;
>> > >   }
>> >
>> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> OK, looks there are still other bugs, care to share us how to reproduce
>> it on arm64?
>>
>> thanks,
>> Ming
>
> Unfortunately, the best reproducer I have is to boot the platform. I have 
> seen the
> BUG a few times post-boot, but I don't have a consistant reproducer. I am 
> using
> upstream 4.4-rc1 with this config:
>
>   http://people.redhat.com/msalter/fh_defconfig
>
> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 
> boots.
> On an AMD Seattle platform, about every 9 boots.

Thanks for the input, and I will try to reproduce the issue on mustang with
your kernel config.

>
> I have a script that loops through an ssh command to reboot the platform 
> under test.
> I manually install test kernels and then run the script and wait for failure. 
> While
> debugging, I have tried more minimal configs with which I have been unable to
> reproduce the problem even after several hours of reboots. With the above 
> mentioned
> fh_defconfig, I have been able to get a failure within 20 or so boots with 
> most
> kernel builds but at certain kernel commits, the failure has taken a longer 
> time to
> reproduce.
>
> From my POV, I can't say which commit causes the problem. So far, I have not 
> been
> able to reproduce at all before commit d9734e0d1ccf but I am currently trying 
> to
> reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).

The patch for fixing 'bvprvp' is better to be included for test,
because that issue
may have a big effect on computing physical seg count.

Also I appreciate if you, Laurent or anyone may provide debug log which
can be captured with the attached debug patch when this issue is trigered .


Thanks,
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dd8ad2a..a0db1fe 100644
--- a/drivers/s

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Hannes Reinecke

On 11/20/2015 04:28 PM, Ewan Milne wrote:

On Fri, 2015-11-20 at 15:55 +0100, Hannes Reinecke wrote:

Can't we have a joint effort here?
I've been spending a _LOT_ of time trying to debug things here, but
none of the ideas I've come up with have been able to fix anything.


Yes.  I'm not the one primarily looking at it, and we don't have a
reproducer in-house.  We just have the one dump right now.


Oh, I got plenty of them :-(



I'm almost tempted to increase the count from scsi_alloc_sgtable()
by one and be done with ...



That might not fix it if it is a problem with the merge code, though.


Of course not. But it'll be a band-aid to keep the customer happy.

Cheers,

Hannes
--
Dr. Hannes ReineckezSeries & Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html