RE: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-07-15 Thread Jean Delvare
On Friday 10 July 2015 at 19:46 +0530, Kashyap Desai wrote:
> > I am about to commit the patch that was successfully tested by the
> > customer on SLES 12, but I'm a bit confused. The upstream patch you
> > referred to is:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?h=for-next&id=6431f5d7c6025f8b007af06ea090de308f7e6881
> > [SCSI] megaraid_sas: megaraid_sas driver init fails in kdump kernel
> >
> > But the patch I used is the one you sent by e-mail on May 28th. It is
> > completely different!
> >
> > So what am I supposed to do? Use the patch you sent (and that was tested
> > by the customer) for SLES 11 SP3 and SLES 12? Or was it just for testing
> > and the proper way of fixing the problem would be to backport the
> > upstream commit?
>
> You can use that patch as a valid candidate for upstream submission. One of
> the MR maintainers (Sumit Saxena) will send that patch. We are just
> organizing the other patch series.
> Since SLES already ported the patch without a commit id, we are fine. I am
> just noting that the patch which was sent via email will be sent upstream
> very soon along with the other patch set.

OK, thanks for the clarification. The patched SLES 11 SP3 kernel is
currently being tested by the customer. Apparently it doesn't work, but
I don't have all the details yet. Maybe some more patches need to be
backported because that kernel is older.

-- 
Jean Delvare
SUSE L3 Support

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-07-10 Thread Kashyap Desai
> I am about to commit the patch that was successfully tested by the
> customer on SLES 12, but I'm a bit confused. The upstream patch you
> referred to is:
>
> https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?h=for-next&id=6431f5d7c6025f8b007af06ea090de308f7e6881
> [SCSI] megaraid_sas: megaraid_sas driver init fails in kdump kernel
>
> But the patch I used is the one you sent by e-mail on May 28th. It is
> completely different!
>
> So what am I supposed to do? Use the patch you sent (and that was tested
> by the customer) for SLES 11 SP3 and SLES 12? Or was it just for testing
> and the proper way of fixing the problem would be to backport the
> upstream commit?

You can use that patch as a valid candidate for upstream submission. One of
the MR maintainers (Sumit Saxena) will send that patch. We are just
organizing the other patch series.
Since SLES already ported the patch without a commit id, we are fine. I am
just noting that the patch which was sent via email will be sent upstream
very soon along with the other patch set.

Thanks, Kashyap
>
> Please advise,
> --
> Jean Delvare
> SUSE L3 Support


RE: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-07-10 Thread Jean Delvare
Hi Kashyap,

On Tuesday 07 July 2015 at 14:48 +0530, Kashyap Desai wrote:
> > -----Original Message-----
> > From: Jean Delvare [mailto:jdelv...@suse.de]
> > Sent: Tuesday, July 07, 2015 2:14 PM
> > To: Kashyap Desai
> > Cc: Bjorn Helgaas; Robin H. Johnson; Adam Radford; Neela Syam Kolli;
> > linux-s...@vger.kernel.org; arkadiusz.bub...@open-e.com; Matthew Garrett;
> > Sumit Saxena; Uday Lingala; PDL,MEGARAIDLINUX; linux-...@vger.kernel.org;
> > linux-ker...@vger.kernel.org; Myron Stowe
> > Subject: Re: megaraid_sas: "FW in FAULT state!!", how to get more debug
> > output? [BKO63661]
> >
> > Hi Kashyap,
> >
> > On Thu, 28 May 2015 19:05:35 +0530, Kashyap Desai wrote:
> > > Bjorn/Robin,
> > >
> > > Apologies for the delay. Here is one quick suggestion, as we have
> > > seen a similar issue (not exactly the same, but with a high
> > > probability of being the same issue) when the controller is configured
> > > on a VM as pass-through and the VM reboots abruptly.
> > > In that particular issue, the driver interacts with the FW, which may
> > > require a chip reset to bring the controller to the operational state.
> > >
> > > The relevant patch was submitted only for older controllers, as the
> > > issue was only seen on a few MegaRAID controllers. The patch below
> > > already tries to do a chip reset, but only for a limited set of
> > > controllers. I have attached one more patch which does a chip reset at
> > > driver load time for Thunderbolt/Invader/Fury etc. (In your case you
> > > have a Thunderbolt controller, so the attached patch is required.)
> > >
> > > http://www.spinics.net/lists/linux-scsi/msg67288.html
> > >
> > > Please post the result with the attached patch.
> >
> > Good news! Customer tested your patch and said it fixed the problem :-)
> >
> > I am now in the process of backporting the patch to the SLES 11 SP3
> > kernel for further testing. I'll let you know how it goes. Thank you
> > very much for your assistance.

For the record, I was able to backport the patch to SLES 11 SP3 by
myself; it is currently being tested by the customer.

> Thanks for the confirmation. The patch I sent you is one we added
> recently (as part of a common interface approach to do chip reset at load
> time). We will be submitting that patch to mainline soon.

I am about to commit the patch that was successfully tested by the
customer on SLES 12, but I'm a bit confused. The upstream patch you
referred to is:

https://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?h=for-next&id=6431f5d7c6025f8b007af06ea090de308f7e6881
[SCSI] megaraid_sas: megaraid_sas driver init fails in kdump kernel

But the patch I used is the one you sent by e-mail on May 28th. It is
completely different!

So what am I supposed to do? Use the patch you sent (and that was tested
by the customer) for SLES 11 SP3 and SLES 12? Or was it just for testing
and the proper way of fixing the problem would be to backport the
upstream commit?

Please advise,
-- 
Jean Delvare
SUSE L3 Support


RE: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-07-07 Thread Kashyap Desai
> -----Original Message-----
> From: Jean Delvare [mailto:jdelv...@suse.de]
> Sent: Tuesday, July 07, 2015 2:14 PM
> To: Kashyap Desai
> Cc: Bjorn Helgaas; Robin H. Johnson; Adam Radford; Neela Syam Kolli;
> linux-s...@vger.kernel.org; arkadiusz.bub...@open-e.com; Matthew Garrett;
> Sumit Saxena; Uday Lingala; PDL,MEGARAIDLINUX; linux-...@vger.kernel.org;
> linux-ker...@vger.kernel.org; Myron Stowe
> Subject: Re: megaraid_sas: "FW in FAULT state!!", how to get more debug
> output? [BKO63661]
>
> Hi Kashyap,
>
> On Thu, 28 May 2015 19:05:35 +0530, Kashyap Desai wrote:
> > Bjorn/Robin,
> >
> > Apologies for the delay. Here is one quick suggestion, as we have seen
> > a similar issue (not exactly the same, but with a high probability of
> > being the same issue) when the controller is configured on a VM as
> > pass-through and the VM reboots abruptly.
> > In that particular issue, the driver interacts with the FW, which may
> > require a chip reset to bring the controller to the operational state.
> >
> > The relevant patch was submitted only for older controllers, as the
> > issue was only seen on a few MegaRAID controllers. The patch below
> > already tries to do a chip reset, but only for a limited set of
> > controllers. I have attached one more patch which does a chip reset at
> > driver load time for Thunderbolt/Invader/Fury etc. (In your case you
> > have a Thunderbolt controller, so the attached patch is required.)
> >
> > http://www.spinics.net/lists/linux-scsi/msg67288.html
> >
> > Please post the result with the attached patch.
>
> Good news! Customer tested your patch and said it fixed the problem :-)
>
> I am now in the process of backporting the patch to the SLES 11 SP3
> kernel for further testing. I'll let you know how it goes. Thank you
> very much for your assistance.

Thanks for the confirmation. The patch I sent you is one we added
recently (as part of a common interface approach to do chip reset at load
time). We will be submitting that patch to mainline soon.

~ Kashyap
>
> --
> Jean Delvare
> SUSE L3 Support
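
[Archive note] The load-time recovery discussed in this message (firmware
found in FAULT state, driver resets the chip before continuing
initialization) can be pictured with a toy model. This is only a sketch in
Python, not the driver's C code and not the patch itself, which is not
reproduced in this thread. FwState, FakeController, bring_up and the
max_resets limit are hypothetical names and values chosen for illustration.

```python
# Toy model of "reset the chip at load time if the firmware reports FAULT".
# Every name here is hypothetical; this is NOT megaraid_sas driver code.
from enum import Enum


class FwState(Enum):
    READY = "ready"
    FAULT = "fault"


class FakeController:
    """Stand-in for the adapter; a real driver would read a status register."""

    def __init__(self, state, resets_needed=1):
        self.state = state
        self.resets_needed = resets_needed  # resets before the firmware recovers
        self.reset_count = 0

    def read_fw_state(self):
        return self.state

    def chip_reset(self):
        # Assumption for the model: enough resets bring the firmware back.
        self.reset_count += 1
        if self.reset_count >= self.resets_needed:
            self.state = FwState.READY


def bring_up(ctrl, max_resets=3):
    """Return True once the firmware is usable, resetting the chip on FAULT."""
    while ctrl.read_fw_state() is FwState.FAULT:
        if ctrl.reset_count >= max_resets:
            return False  # give up: the controller stays faulted
        ctrl.chip_reset()
    return True
```

In this model a controller left in FAULT by an abrupt VM reboot would be
created as FakeController(FwState.FAULT) and passed to bring_up() before any
further initialization. The bounded retry count is a design choice of the
sketch, so that a controller that never recovers does not stall loading
forever.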


Re: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-07-07 Thread Jean Delvare
Hi Kashyap,

On Thu, 28 May 2015 19:05:35 +0530, Kashyap Desai wrote:
> Bjorn/Robin,
> 
> Apologies for the delay. Here is one quick suggestion, as we have seen a
> similar issue (not exactly the same, but with a high probability of being
> the same issue) when the controller is configured on a VM as pass-through
> and the VM reboots abruptly.
> In that particular issue, the driver interacts with the FW, which may
> require a chip reset to bring the controller to the operational state.
>
> The relevant patch was submitted only for older controllers, as the issue
> was only seen on a few MegaRAID controllers. The patch below already tries
> to do a chip reset, but only for a limited set of controllers. I have
> attached one more patch which does a chip reset at driver load time for
> Thunderbolt/Invader/Fury etc. (In your case you have a Thunderbolt
> controller, so the attached patch is required.)
>
> http://www.spinics.net/lists/linux-scsi/msg67288.html
>
> Please post the result with the attached patch.

Good news! Customer tested your patch and said it fixed the problem :-)

I am now in the process of backporting the patch to the SLES 11 SP3
kernel for further testing. I'll let you know how it goes. Thank you
very much for your assistance.

-- 
Jean Delvare
SUSE L3 Support


RE: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-07-01 Thread Jean Delvare
Hi Kashyap,

I finally managed to backport your patch to the SLES 12 kernel :-) I'll
build a test kernel for the customer and have them test it. I'll let you
know if I need your help later for the SLES 11 SP3 kernel backport -
thanks for the offer!

Jean

On Tuesday 30 June 2015 at 16:03 +0530, Kashyap Desai wrote:
> Jean,
> 
> The patch is available in the repo below:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git -b for-next
>
> Commit id:
> 6431f5d7c6025f8b007af06ea090de308f7e6881
>
> If you share the megaraid_sas driver code from your tree, I can provide a
> patch for you.
>
> ~ Kashyap
> 
> > -----Original Message-----
> > From: Jean Delvare [mailto:jdelv...@suse.de]
> > Sent: Monday, June 29, 2015 6:55 PM
> > To: Kashyap Desai
> > Cc: Bjorn Helgaas; Robin H. Johnson; Adam Radford; Neela Syam Kolli;
> > linux-s...@vger.kernel.org; arkadiusz.bub...@open-e.com; Matthew Garrett;
> > Sumit Saxena; Uday Lingala; PDL,MEGARAIDLINUX; linux-...@vger.kernel.org;
> > linux-ker...@vger.kernel.org; Myron Stowe
> > Subject: RE: megaraid_sas: "FW in FAULT state!!", how to get more debug
> > output? [BKO63661]
> >
> > Hi Kashyap,
> >
> > Thanks for the patch. May I ask what tree it was based on? Linus'
> > latest? I am trying to apply it to the SLES 11 SP3 and SLES 12 kernel
> > trees (based on kernel v3.0 + a bunch of backports and v3.12
> > respectively) but your patch fails to apply in both cases. I'll try
> > harder but I don't know anything about the megaraid_sas code so I really
> > don't know where I'm going.
> >
> > Does your patch depend on any other that may not be present in the SLES
> > 11 SP3 and SLES 12 kernels?
> >
> > Thanks,
> > Jean
> >
> > On Thursday 28 May 2015 at 19:05 +0530, Kashyap Desai wrote:
> > > Bjorn/Robin,
> > >
> > > Apologies for the delay. Here is one quick suggestion, as we have
> > > seen a similar issue (not exactly the same, but with a high
> > > probability of being the same issue) when the controller is configured
> > > on a VM as pass-through and the VM reboots abruptly.
> > > In that particular issue, the driver interacts with the FW, which may
> > > require a chip reset to bring the controller to the operational state.
> > >
> > > The relevant patch was submitted only for older controllers, as the
> > > issue was only seen on a few MegaRAID controllers. The patch below
> > > already tries to do a chip reset, but only for a limited set of
> > > controllers. I have attached one more patch which does a chip reset at
> > > driver load time for Thunderbolt/Invader/Fury etc. (In your case you
> > > have a Thunderbolt controller, so the attached patch is required.)
> > >
> > > http://www.spinics.net/lists/linux-scsi/msg67288.html
> > >
> > > Please post the result with the attached patch.
> > >
> > > Thanks, Kashyap


RE: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-06-30 Thread Kashyap Desai
Jean,

The patch is available in the repo below:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git -b for-next

Commit id:
6431f5d7c6025f8b007af06ea090de308f7e6881

If you share the megaraid_sas driver code from your tree, I can provide a
patch for you.

~ Kashyap

> -----Original Message-----
> From: Jean Delvare [mailto:jdelv...@suse.de]
> Sent: Monday, June 29, 2015 6:55 PM
> To: Kashyap Desai
> Cc: Bjorn Helgaas; Robin H. Johnson; Adam Radford; Neela Syam Kolli;
> linux-s...@vger.kernel.org; arkadiusz.bub...@open-e.com; Matthew Garrett;
> Sumit Saxena; Uday Lingala; PDL,MEGARAIDLINUX; linux-...@vger.kernel.org;
> linux-ker...@vger.kernel.org; Myron Stowe
> Subject: RE: megaraid_sas: "FW in FAULT state!!", how to get more debug
> output? [BKO63661]
>
> Hi Kashyap,
>
> Thanks for the patch. May I ask what tree it was based on? Linus'
> latest? I am trying to apply it to the SLES 11 SP3 and SLES 12 kernel
> trees (based on kernel v3.0 + a bunch of backports and v3.12
> respectively) but your patch fails to apply in both cases. I'll try
> harder but I don't know anything about the megaraid_sas code so I really
> don't know where I'm going.
>
> Does your patch depend on any other that may not be present in the SLES
> 11 SP3 and SLES 12 kernels?
>
> Thanks,
> Jean
>
> On Thursday 28 May 2015 at 19:05 +0530, Kashyap Desai wrote:
> > Bjorn/Robin,
> >
> > Apologies for the delay. Here is one quick suggestion, as we have seen
> > a similar issue (not exactly the same, but with a high probability of
> > being the same issue) when the controller is configured on a VM as
> > pass-through and the VM reboots abruptly.
> > In that particular issue, the driver interacts with the FW, which may
> > require a chip reset to bring the controller to the operational state.
> >
> > The relevant patch was submitted only for older controllers, as the
> > issue was only seen on a few MegaRAID controllers. The patch below
> > already tries to do a chip reset, but only for a limited set of
> > controllers. I have attached one more patch which does a chip reset at
> > driver load time for Thunderbolt/Invader/Fury etc. (In your case you
> > have a Thunderbolt controller, so the attached patch is required.)
> >
> > http://www.spinics.net/lists/linux-scsi/msg67288.html
> >
> > Please post the result with the attached patch.
> >
> > Thanks, Kashyap
>


RE: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-06-29 Thread Jean Delvare
Hi Kashyap,

Thanks for the patch. May I ask what tree it was based on? Linus'
latest? I am trying to apply it to the SLES 11 SP3 and SLES 12 kernel
trees (based on kernel v3.0 + a bunch of backports and v3.12
respectively) but your patch fails to apply in both cases. I'll try
harder but I don't know anything about the megaraid_sas code so I really
don't know where I'm going.

Does your patch depend on any other that may not be present in the SLES
11 SP3 and SLES 12 kernels?

Thanks,
Jean

On Thursday 28 May 2015 at 19:05 +0530, Kashyap Desai wrote:
> Bjorn/Robin,
> 
> Apologies for the delay. Here is one quick suggestion: we have seen a
> similar issue (not exactly the same, but with a high probability of
> sharing the same cause) when the controller is configured in a VM as
> pass-through and the VM reboots abruptly. In that case, the driver
> interacts with FW, which may require a chip reset to bring the
> controller to an operational state.
> 
> The relevant patch was submitted only for older controllers, as the
> issue was seen on only a few MegaRAID controllers. The patch linked
> below already tries to do a chip reset, but only for a limited set of
> controllers. I have attached one more patch which does a chip reset at
> driver load time for Thunderbolt/Invader/Fury etc. (In your case you
> have a Thunderbolt controller, so the attached patch is required.)
> 
> http://www.spinics.net/lists/linux-scsi/msg67288.html
> 
> Please post the results with the attached patch.
> 
> Thanks, Kashyap




RE: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-05-28 Thread Kashyap Desai
Bjorn/Robin,

Apologies for the delay. Here is one quick suggestion: we have seen a
similar issue (not exactly the same, but with a high probability of
sharing the same cause) when the controller is configured in a VM as
pass-through and the VM reboots abruptly. In that case, the driver
interacts with FW, which may require a chip reset to bring the
controller to an operational state.

The relevant patch was submitted only for older controllers, as the
issue was seen on only a few MegaRAID controllers. The patch linked
below already tries to do a chip reset, but only for a limited set of
controllers. I have attached one more patch which does a chip reset at
driver load time for Thunderbolt/Invader/Fury etc. (In your case you
have a Thunderbolt controller, so the attached patch is required.)

http://www.spinics.net/lists/linux-scsi/msg67288.html

Please post the results with the attached patch.

Thanks, Kashyap

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelg...@google.com]
> Sent: Thursday, May 28, 2015 5:55 PM
> To: Robin H. Johnson
> Cc: Adam Radford; Neela Syam Kolli; linux-s...@vger.kernel.org;
> arkadiusz.bub...@open-e.com; Matthew Garrett; Kashyap Desai; Sumit Saxena;
> Uday Lingala; megaraidlinux@avagotech.com; linux-...@vger.kernel.org;
> linux-kernel@vger.kernel.org; Jean Delvare; Myron Stowe
> Subject: Re: megaraid_sas: "FW in FAULT state!!", how to get more debug
> output? [BKO63661]
>
> [+cc Jean, Myron]
>
> Hello megaraid maintainers,
>
> Have you been able to take a look at this at all?  People have been
> reporting this issue since 2012 on upstream, Debian, and Ubuntu, and
> now we're getting reports on SLES.
>
> My theory is that the Linux driver relies on some MegaRAID initialization
> done by the option ROM, and the bug happens when the BIOS doesn't execute
> the option ROM.
>
> If that's correct, you should be able to reproduce it on any system by
> booting Linux (v3.3 or later) without running the MegaRAID SAS 2208 option
> ROM (either by enabling a BIOS "fast boot" switch, or modifying the BIOS to
> skip it).  If the Linux driver doesn't rely on the option ROM, you might
> even be able to reproduce it by physically removing the option ROM from the
> MegaRAID.
>
> Bjorn
>
> On Wed, Apr 29, 2015 at 12:28:32PM -0500, Bjorn Helgaas wrote:
> > [+cc linux-pci, linux-kernel, Kashyap, Sumit, Uday, megaraidlinux.pdl]
> >
> > On Sun, Jul 13, 2014 at 01:35:51AM +, Robin H. Johnson wrote:
> > > On Sat, Jul 12, 2014 at 11:29:20AM -0600, Bjorn Helgaas wrote:
> > > > Thanks for the report, Robin.
> > > >
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=63661 bisected the problem
> > > > to 3c076351c402 ("PCI: Rework ASPM disable code"), which appeared in
> > > > v3.3.  For starters, can you verify that, e.g., by building
> > > > 69166fbf02c7 (the parent of 3c076351c402) to make sure that it works,
> > > > and building 3c076351c402 itself to make sure it fails?
> > > >
> > > > Assuming that's the case, please attach the complete dmesg and "lspci
> > > > -vvxxx" output for both kernels to the bugzilla.  ASPM is a feature
> > > > that is configured on both ends of a PCIe link, so I want to see the
> > > > lspci info for the whole system, not just the SAS adapters.
> > > >
> > > > It's not practical to revert 3c076351c402 now, so I'd also like to see
> > > > the same information for the newest possible kernel (if this is
> > > > possible; I'm not clear on whether you can boot your system or not) so
> > > > we can figure out what needs to be changed.
> > > TL;DR: FastBoot is leaving the MegaRaidSAS in a weird state, and it
> > > fails to start; Commit 3c076351c402 did make it worse, but I think
> > > we're right that the bug lies in the SAS code.
> > >
> > > Ok, I have done more testing on it (40+ boots), and I think we can
> > > show the problem is somewhere in how the BIOS/EFI/ROM brings up the
> > > card in FastBoot more, or how it leaves the card.
> >
> > I attached your dmesg and lspci logs to
> > https://bugzilla.kernel.org/show_bug.cgi?id=63661, thank you!  You did
> > a huge amount of excellent testing and analysis, and I'm sorry that we
> > haven't made progress using the results.
> >
> > I still think this is a megaraid_sas driver bug, but I don't have
> > enough evidence to really point fingers.
> >
> > Based on your testing, before 3c076351c402 ("PCI: Rework ASPM disable
> > code"), megaraid_sas worked reliably.  After 3c076351c402,
> > megaraid_sas does not work reliably when BIOS Fast Boot is enabled.
> >
> >

Re: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-05-28 Thread Bjorn Helgaas
[+cc Jean, Myron]

Hello megaraid maintainers,

Have you been able to take a look at this at all?  People have been
reporting this issue since 2012 on upstream, Debian, and Ubuntu, and
now we're getting reports on SLES.

My theory is that the Linux driver relies on some MegaRAID initialization
done by the option ROM, and the bug happens when the BIOS doesn't execute
the option ROM.

If that's correct, you should be able to reproduce it on any system by
booting Linux (v3.3 or later) without running the MegaRAID SAS 2208 option
ROM (either by enabling a BIOS "fast boot" switch, or modifying the BIOS to
skip it).  If the Linux driver doesn't rely on the option ROM, you might
even be able to reproduce it by physically removing the option ROM from the
MegaRAID.

Bjorn

On Wed, Apr 29, 2015 at 12:28:32PM -0500, Bjorn Helgaas wrote:
> [+cc linux-pci, linux-kernel, Kashyap, Sumit, Uday, megaraidlinux.pdl]
> 
> On Sun, Jul 13, 2014 at 01:35:51AM +, Robin H. Johnson wrote:
> > On Sat, Jul 12, 2014 at 11:29:20AM -0600, Bjorn Helgaas wrote:
> > > Thanks for the report, Robin.
> > > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=63661 bisected the problem
> > > to 3c076351c402 ("PCI: Rework ASPM disable code"), which appeared in
> > > v3.3.  For starters, can you verify that, e.g., by building
> > > 69166fbf02c7 (the parent of 3c076351c402) to make sure that it works,
> > > and building 3c076351c402 itself to make sure it fails?
> > > 
> > > Assuming that's the case, please attach the complete dmesg and "lspci
> > > -vvxxx" output for both kernels to the bugzilla.  ASPM is a feature
> > > that is configured on both ends of a PCIe link, so I want to see the
> > > lspci info for the whole system, not just the SAS adapters.
> > > 
> > > It's not practical to revert 3c076351c402 now, so I'd also like to see
> > > the same information for the newest possible kernel (if this is
> > > possible; I'm not clear on whether you can boot your system or not) so
> > > we can figure out what needs to be changed.
> > TL;DR: FastBoot is leaving the MegaRaidSAS in a weird state, and it fails to
> > start; Commit 3c076351c402 did make it worse, but I think we're right that the
> > bug lies in the SAS code.
> > 
> > Ok, I have done more testing on it (40+ boots), and I think we can show the
> > problem is somewhere in how the BIOS/EFI/ROM brings up the card in FastBoot
> > more, or how it leaves the card.
> 
> I attached your dmesg and lspci logs to
> https://bugzilla.kernel.org/show_bug.cgi?id=63661, thank you!  You did
> a huge amount of excellent testing and analysis, and I'm sorry that we
> haven't made progress using the results.
> 
> I still think this is a megaraid_sas driver bug, but I don't have
> enough evidence to really point fingers.
> 
> Based on your testing, before 3c076351c402 ("PCI: Rework ASPM disable
> code"), megaraid_sas worked reliably.  After 3c076351c402,
> megaraid_sas does not work reliably when BIOS Fast Boot is enabled.
> 
> Fast Boot probably means we don't run the option ROM on the device.
> Your dmesg logs show that in the working case, BIOS has enabled the
> device.  In the failing case it has not.  They also show that when
> Fast Boot is enabled, there's a little less MTRR write-protect space,
> which I'm guessing is space that wasn't needed for shadowing option
> ROMs.
> 
> I suspect megaraid_sas depends on something done by the option ROM,
> and that prior to 3c076351c402, Linux did something to ASPM that was
> enough to make megaraid_sas work.
> 
> I attached a couple debug patches to
> https://bugzilla.kernel.org/show_bug.cgi?id=63661 that log all the
> ASPM configuration the PCI core does.  One applies to 69166fbf02c7
> (the pre-3c076351c402 commit), and the other applies to v4.1-rc1.
> Could you boot both of those with "pci=earlydump" and attach the dmesg
> logs to the bugzilla?  If you boot with the BIOS CMOS reset settings
> (Fast Boot enabled and ASPM set to "BIOS"), I expect the 69166fbf02c7-
> based kernel to work, and the v4.1-rc1-based one to fail.
> 
> > Full boot of the system was difficult on the 3.2 kernels, they didn't make it
> > to userspace for other stuff being too new. For testing, I compiled
> > CONFIG_MEGARAID_SAS=y on 3.2, and =m on 3.16-rc4; that way when the initramfs &
> > userspace failed, the megaraid load was captured over IPMI serial.
> > 
> > I've done a lot of the analysis below while capturing.
> > 
> > I was going to be booting many times, so I flipped the 'Fast Boot'
> > option back to Disabled, so I could more easily get to the BIOS settings
> > to change options while testing. When I did so, an accidental boot on a
> > kernel that previously failed suddenly worked, leading me to raise an
> > eyebrow, and this expanded my test matrix more.
> > 
> > 3 kernels, 6 different BIOS config combinations (2x3) = 18 test cases
> > Each configuration was booted at least twice; if the result of two boots was
> > not identical, I booted a third time and 

Re: megaraid_sas: "FW in FAULT state!!", how to get more debug output? [BKO63661]

2015-04-29 Thread Bjorn Helgaas
[+cc linux-pci, linux-kernel, Kashyap, Sumit, Uday, megaraidlinux.pdl]

On Sun, Jul 13, 2014 at 01:35:51AM +, Robin H. Johnson wrote:
> On Sat, Jul 12, 2014 at 11:29:20AM -0600, Bjorn Helgaas wrote:
> > Thanks for the report, Robin.
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=63661 bisected the problem
> > to 3c076351c402 ("PCI: Rework ASPM disable code"), which appeared in
> > v3.3.  For starters, can you verify that, e.g., by building
> > 69166fbf02c7 (the parent of 3c076351c402) to make sure that it works,
> > and building 3c076351c402 itself to make sure it fails?
> > 
> > Assuming that's the case, please attach the complete dmesg and "lspci
> > -vvxxx" output for both kernels to the bugzilla.  ASPM is a feature
> > that is configured on both ends of a PCIe link, so I want to see the
> > lspci info for the whole system, not just the SAS adapters.
> > 
> > It's not practical to revert 3c076351c402 now, so I'd also like to see
> > the same information for the newest possible kernel (if this is
> > possible; I'm not clear on whether you can boot your system or not) so
> > we can figure out what needs to be changed.
> TL;DR: FastBoot is leaving the MegaRaidSAS in a weird state, and it fails to
> start; Commit 3c076351c402 did make it worse, but I think we're right that the
> bug lies in the SAS code.
> 
> Ok, I have done more testing on it (40+ boots), and I think we can show the
> problem is somewhere in how the BIOS/EFI/ROM brings up the card in FastBoot
> more, or how it leaves the card.

I attached your dmesg and lspci logs to
https://bugzilla.kernel.org/show_bug.cgi?id=63661, thank you!  You did
a huge amount of excellent testing and analysis, and I'm sorry that we
haven't made progress using the results.

I still think this is a megaraid_sas driver bug, but I don't have
enough evidence to really point fingers.

Based on your testing, before 3c076351c402 ("PCI: Rework ASPM disable
code"), megaraid_sas worked reliably.  After 3c076351c402,
megaraid_sas does not work reliably when BIOS Fast Boot is enabled.

Fast Boot probably means we don't run the option ROM on the device.
Your dmesg logs show that in the working case, BIOS has enabled the
device.  In the failing case it has not.  They also show that when
Fast Boot is enabled, there's a little less MTRR write-protect space,
which I'm guessing is space that wasn't needed for shadowing option
ROMs.

I suspect megaraid_sas depends on something done by the option ROM,
and that prior to 3c076351c402, Linux did something to ASPM that was
enough to make megaraid_sas work.

I attached a couple debug patches to
https://bugzilla.kernel.org/show_bug.cgi?id=63661 that log all the
ASPM configuration the PCI core does.  One applies to 69166fbf02c7
(the pre-3c076351c402 commit), and the other applies to v4.1-rc1.
Could you boot both of those with "pci=earlydump" and attach the dmesg
logs to the bugzilla?  If you boot with the BIOS CMOS reset settings
(Fast Boot enabled and ASPM set to "BIOS"), I expect the 69166fbf02c7-
based kernel to work, and the v4.1-rc1-based one to fail.
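
The per-kernel data collection requested in this thread (complete dmesg
plus "lspci -vvxxx" for the whole system) could be scripted roughly as
follows. This is only a sketch: the function name, the output layout and
the kernel label are illustrative assumptions, not something posted to the
list. Both commands typically need root to return complete output.

```python
import subprocess
from pathlib import Path

# The two commands named in the thread. File names are an assumption.
DEFAULT_COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "lspci-vvxxx.txt": ["lspci", "-vvxxx"],
}


def collect(outdir, label, commands=None, run=subprocess.run):
    """Run each command and save its stdout under outdir/label/.

    `label` is a free-form name for the kernel under test
    (hypothetical example: "69166fbf02c7").
    """
    commands = DEFAULT_COMMANDS if commands is None else commands
    target = Path(outdir) / label
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, argv in commands.items():
        # check=False: keep whatever output we got even on a non-zero exit.
        result = run(argv, capture_output=True, text=True, check=False)
        path = target / filename
        path.write_text(result.stdout)
        written.append(path)
    return written
```

Calling collect("logs", "69166fbf02c7") once per booted kernel would leave
one directory per kernel, ready to attach to the bugzilla entry.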

> Full boot of the system was difficult on the 3.2 kernels, they didn't make it
> to userspace for other stuff being too new. For testing, I compiled
> CONFIG_MEGARAID_SAS=y on 3.2, and =m on 3.16-rc4; that way when the initramfs &
> userspace failed, the megaraid load was captured over IPMI serial.
> 
> I've done a lot of the analysis below while capturing.
> 
> I was going to be booting many times, so I flipped the 'Fast Boot'
> option back to Disabled, so I could more easily get to the BIOS settings
> to change options while testing. When I did so, an accidental boot on a
> kernel that previously failed suddenly worked, leading me to raise an
> eyebrow, and this expanded my test matrix more.
> 
> 3 kernels, 6 different BIOS config combinations (2x3) = 18 test cases
> Each configuration was booted at least twice; if the result of two boots was
> not identical, I booted a third time and took the majority result.
> 
> All kernels had no boot params involving PCI specified (none of pci=, pcie*=,
> disable_msi*).
> 
> Kernels:
> K.1: Ubuntu's 3.16-rc4
> K.2: 3.2-rc4 3c076351c402 - aspm merged
> K.3: 3.2-rc4 69166fbf02c7 - aspm merge parent
> Notes: 3.2* compiled with GCC4.6, 3.16-rc4 with GCC4.8
> 
> BIOS: Boot -> FastBoot:
> B1.1 Off
> B1.2 On (CMOS reset default)
> 
> BIOS: Advanced -> PCIe/PCI/PnP Configuration -> ASPM Support
> B2.1 Force L0s
> B2.2 BIOS (CMOS reset default)
> B2.3 Disabled
> 
> Reduced Karnaugh Map of results:
> Kernels,B1,B2:   Result
>   *, B1.1,*  PASS
>   *, B1.2, B2.1  VARIABLE (9 runs: 5 fail, 4 pass, no kernel consistency)
> K.1, B1.2, B2.2  FAIL
> K.1, B1.2, B2.3  FAIL
> K.2, B1.2, B2.2  FAIL
> K.2, B1.2, B2.3  FAIL
> K.3, B1.2, B2.2  PASS
> K.3, B1.2, B2.3  PASS
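
As a cross-check of the reduced table quoted above, the sketch below
expands its wildcard rows back into the full 3 x 2 x 3 = 18 test cases.
The labels (K.1, B1.2, ...) and results are taken from the table; the
data layout and function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

KERNELS = ("K.1", "K.2", "K.3")
FAST_BOOT = ("B1.1", "B1.2")      # Off, On
ASPM = ("B2.1", "B2.2", "B2.3")   # Force L0s, BIOS, Disabled

# Rows of the reduced table; "*" matches any value in that column.
RULES = (
    ("*", "B1.1", "*", "PASS"),
    ("*", "B1.2", "B2.1", "VARIABLE"),
    ("K.1", "B1.2", "B2.2", "FAIL"),
    ("K.1", "B1.2", "B2.3", "FAIL"),
    ("K.2", "B1.2", "B2.2", "FAIL"),
    ("K.2", "B1.2", "B2.3", "FAIL"),
    ("K.3", "B1.2", "B2.2", "PASS"),
    ("K.3", "B1.2", "B2.3", "PASS"),
)


def lookup(kernel, fast_boot, aspm):
    """Return the result of the first row matching this combination."""
    for k, b1, b2, result in RULES:
        if k in ("*", kernel) and b1 in ("*", fast_boot) and b2 in ("*", aspm):
            return result
    raise KeyError((kernel, fast_boot, aspm))


def expand():
    """Map every (kernel, fast_boot, aspm) combination to its result."""
    return {combo: lookup(*combo) for combo in product(KERNELS, FAST_BOOT, ASPM)}


def summarize():
    """Count how many of the expanded cases fall under each result."""
    return Counter(expand().values())
```

expand() raises KeyError if any combination has no matching row, so a
clean run indicates that the reduced table covers all 18 cases.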

I'm not very practiced with Karnaugh maps, so correct me if my
understanding is wrong:

  - Fast Boot disabled: all kernels always passed

  - Fast Boot enabled, ASPM set to Force L0s enabled: variable; no
    consistency of results

  - Fast Boot enabled, ASPM set to BIOS or