Re: Sym2 scsi hang on boot on sparc64

2014-08-20 Thread Meelis Roos
  Bisection (on PA-RISC) points to:
  
  71e75c97f97a9645d25fbf3d8e4165a558f18747 is the first bad commit
  commit 71e75c97f97a9645d25fbf3d8e4165a558f18747
  Author: Christoph Hellwig <h...@lst.de>
  Date:   Fri Apr 11 19:07:01 2014 +0200
  
  scsi: convert device_busy to atomic_t
 
 That's fixed upstream:
 
 commit 480cadc2b7e0fa2bbab20141efb547dfe0c3707c

Yes, works for both sparc64 and parisc.

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Sym2 scsi hang on boot on sparc64

2014-08-19 Thread James Bottomley
On Tue, 2014-08-19 at 14:25 +0300, Meelis Roos wrote:
 3.16 scsi worked fine, 3.17-rc1 misbehaves on 3 of my sparc64 test
 machines. E220R and E420R are with onboard 53c875, V210 is with onboard
 53c1010 and all behave the same. Any ideas where to dig deeper? Bisection
 might be nontrivial, because of sparc64 changes that are OK on 3.17-rc1
 again - but it is possible if nothing else helps.

We've got a parisc with an 875 as a root SCSI bus ... I haven't got
around to building for it yet, but I might find time to try today.

 [  164.639697] PCI: Enabling device: (:00:03.0), cmd 147
 [  164.705076] sym0: 875 rev 0x14 at pci :00:03.0 irq 13
 [  164.858446] sym0: No NVRAM, ID 7, Fast-20, SE, parity checking
 [  164.935031] sym0: SCSI BUS has been reset.
 [  164.983113] scsi host0: sym-2.2.3
 [  165.026358] PCI: Enabling device: (:00:03.1), cmd 3
 [  165.089634] sym1: 875 rev 0x14 at pci :00:03.1 irq 14
 [  165.242820] sym1: No NVRAM, ID 7, Fast-20, SE, parity checking
 [  165.319227] sym1: SCSI BUS has been reset.
 [  165.367281] scsi host1: sym-2.2.3

Does it detect drives in the bit you cut?  I ask because one of the
symptoms of a misrouted irq is random problems with bring up.  However,
if anything is detected, then the irq must be OK.

James

 [  388.835999] INFO: task swapper/0:1 blocked for more than 120 seconds.
 [  388.912181]   Not tainted 3.17.0-rc1 #46
 [  388.963187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [  389.056953] swapper/0   D 00483958  7584 1  0 0x2000100
 [  389.148575] Call Trace:
 [  389.177747]  [0082e5fc] schedule+0x1c/0x80
 [  389.235024]  [00483958] async_synchronize_cookie_domain+0x58/0x100
 [  389.317301]  [00483a28] async_synchronize_full+0x8/0x20
 [  389.388133]  [006ebe04] wait_for_device_probe+0x64/0x80
 [  389.458938]  [009dcffc] prepare_namespace+0x4/0x1b8
 [  389.525590]  [009dcbac] kernel_init_freeable+0x1c0/0x1d8
 [  389.597450]  [008298e4] kernel_init+0x4/0x100
 [  389.657868]  [004060c4] ret_from_fork+0x1c/0x2c
 [  389.720324]  []   (null)
 [  389.775518] no locks held by swapper/0/1.
 
 
 
 -- 
 Meelis Roos (mr...@linux.ee)
 




Re: Sym2 scsi hang on boot on sparc64

2014-08-19 Thread Meelis Roos
  3.16 scsi worked fine, 3.17-rc1 misbehaves on 3 of my sparc64 test
  machines. E220R and E420R are with onboard 53c875, V210 is with onboard
  53c1010 and all behave the same. Any ideas where to dig deeper? Bisection
  might be nontrivial, because of sparc64 changes that are OK on 3.17-rc1
  again - but it is possible if nothing else helps.
 
 We've got a parisc with an 875 as a root SCSI bus ... I haven't got
 around to building for it yet, but I might find time to try today.

Come to think of it, I have a couple of parisc machines with 875 too, will try.

  [  164.639697] PCI: Enabling device: (:00:03.0), cmd 147
  [  164.705076] sym0: 875 rev 0x14 at pci :00:03.0 irq 13
  [  164.858446] sym0: No NVRAM, ID 7, Fast-20, SE, parity checking
  [  164.935031] sym0: SCSI BUS has been reset.
  [  164.983113] scsi host0: sym-2.2.3
  [  165.026358] PCI: Enabling device: (:00:03.1), cmd 3
  [  165.089634] sym1: 875 rev 0x14 at pci :00:03.1 irq 14
  [  165.242820] sym1: No NVRAM, ID 7, Fast-20, SE, parity checking
  [  165.319227] sym1: SCSI BUS has been reset.
  [  165.367281] scsi host1: sym-2.2.3
 
 Does it detect drives in the bit you cut?  I ask because one of the
 symptoms of a misrouted irq is random problems with bring up.  However,
 if anything is detected, then the irq must be OK.

No, nothing scsi related - rtc detection etc.

 
 James
 
  [  388.835999] INFO: task swapper/0:1 blocked for more than 120 seconds.
  [  388.912181]   Not tainted 3.17.0-rc1 #46
  [  388.963187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  389.056953] swapper/0   D 00483958  7584 1  0 0x2000100
  [  389.148575] Call Trace:
  [  389.177747]  [0082e5fc] schedule+0x1c/0x80
  [  389.235024]  [00483958] async_synchronize_cookie_domain+0x58/0x100
  [  389.317301]  [00483a28] async_synchronize_full+0x8/0x20
  [  389.388133]  [006ebe04] wait_for_device_probe+0x64/0x80
  [  389.458938]  [009dcffc] prepare_namespace+0x4/0x1b8
  [  389.525590]  [009dcbac] kernel_init_freeable+0x1c0/0x1d8
  [  389.597450]  [008298e4] kernel_init+0x4/0x100
  [  389.657868]  [004060c4] ret_from_fork+0x1c/0x2c
  [  389.720324]  []   (null)
  [  389.775518] no locks held by swapper/0/1.
  
  
  
  -- 
  Meelis Roos (mr...@linux.ee)
  
 
 

-- 
Meelis Roos (mr...@linux.ee)


Re: Sym2 scsi hang on boot on sparc64

2014-08-19 Thread Meelis Roos
 On Tue, 2014-08-19 at 14:25 +0300, Meelis Roos wrote:
  3.16 scsi worked fine, 3.17-rc1 misbehaves on 3 of my sparc64 test
  machines. E220R and E420R are with onboard 53c875, V210 is with onboard
  53c1010 and all behave the same. Any ideas where to dig deeper? Bisection
  might be nontrivial, because of sparc64 changes that are OK on 3.17-rc1
  again - but it is possible if nothing else helps.
 
 We've got a parisc with an 875 as a root SCSI bus ... I haven't got
 around to building for it yet, but I might find time to try today.

Same on parisc:

sym0: 1010-66 rev 0x1 at pci :20:01.0 irq 22
sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi host0: sym-2.2.3
random: nonblocking pool is initialized

and hangs here. So hopefully it is reproducible for you.

-- 
Meelis Roos (mr...@linux.ee)


Re: Sym2 scsi hang on boot on sparc64

2014-08-19 Thread James Bottomley
On Tue, 2014-08-19 at 17:37 +0300, Meelis Roos wrote:
  On Tue, 2014-08-19 at 14:25 +0300, Meelis Roos wrote:
   3.16 scsi worked fine, 3.17-rc1 misbehaves on 3 of my sparc64 test
   machines. E220R and E420R are with onboard 53c875, V210 is with onboard
   53c1010 and all behave the same. Any ideas where to dig deeper? Bisection
   might be nontrivial, because of sparc64 changes that are OK on 3.17-rc1
   again - but it is possible if nothing else helps.
  
  We've got a parisc with an 875 as a root SCSI bus ... I haven't got
  around to building for it yet, but I might find time to try today.
 
 Same on parisc:
 
 sym0: 1010-66 rev 0x1 at pci :20:01.0 irq 22
 sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
 sym0: SCSI BUS has been reset.
 scsi host0: sym-2.2.3
 random: nonblocking pool is initialized
 
 and hangs here. So hopefully it is reproducible for you.

And also independent of the sparc changes.  The only other change in the
window you quote is 64 bit luns.

James





Re: Sym2 scsi hang on boot on sparc64

2014-08-19 Thread Aaro Koskinen
Hi,

On Tue, Aug 19, 2014 at 09:47:35AM -0500, James Bottomley wrote:
 On Tue, 2014-08-19 at 17:37 +0300, Meelis Roos wrote:
   On Tue, 2014-08-19 at 14:25 +0300, Meelis Roos wrote:
    3.16 scsi worked fine, 3.17-rc1 misbehaves on 3 of my sparc64 test
    machines. E220R and E420R are with onboard 53c875, V210 is with onboard
    53c1010 and all behave the same. Any ideas where to dig deeper? Bisection
    might be nontrivial, because of sparc64 changes that are OK on 3.17-rc1
    again - but it is possible if nothing else helps.
   
   We've got a parisc with an 875 as a root SCSI bus ... I haven't got
   around to building for it yet, but I might find time to try today.
  
  Same on parisc:
  
  sym0: 1010-66 rev 0x1 at pci :20:01.0 irq 22
  sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
  sym0: SCSI BUS has been reset.
  scsi host0: sym-2.2.3
  random: nonblocking pool is initialized
  
  and hangs here. So hopefully it is reproducible for you.
 
 And also independent of the sparc changes.  The only other change in the
 window you quote is 64 bit luns.

Bisection (on PA-RISC) points to:

71e75c97f97a9645d25fbf3d8e4165a558f18747 is the first bad commit
commit 71e75c97f97a9645d25fbf3d8e4165a558f18747
Author: Christoph Hellwig <h...@lst.de>
Date:   Fri Apr 11 19:07:01 2014 +0200

scsi: convert device_busy to atomic_t

A.


Re: Sym2 scsi hang on boot on sparc64

2014-08-19 Thread Sam Ravnborg
On Tue, Aug 19, 2014 at 11:17:48PM +0300, Aaro Koskinen wrote:
 Hi,
 
 On Tue, Aug 19, 2014 at 09:47:35AM -0500, James Bottomley wrote:
  On Tue, 2014-08-19 at 17:37 +0300, Meelis Roos wrote:
    On Tue, 2014-08-19 at 14:25 +0300, Meelis Roos wrote:
     3.16 scsi worked fine, 3.17-rc1 misbehaves on 3 of my sparc64 test
     machines. E220R and E420R are with onboard 53c875, V210 is with onboard
     53c1010 and all behave the same. Any ideas where to dig deeper? Bisection
     might be nontrivial, because of sparc64 changes that are OK on 3.17-rc1
     again - but it is possible if nothing else helps.
    
    We've got a parisc with an 875 as a root SCSI bus ... I haven't got
    around to building for it yet, but I might find time to try today.
   
   Same on parisc:
   
   sym0: 1010-66 rev 0x1 at pci :20:01.0 irq 22
   sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
   sym0: SCSI BUS has been reset.
   scsi host0: sym-2.2.3
   random: nonblocking pool is initialized
   
   and hangs here. So hopefully it is reproducible for you.
  
  And also independent of the sparc changes.  The only other change in the
  window you quote is 64 bit luns.
 
 Bisection (on PA-RISC) points to:
 
 71e75c97f97a9645d25fbf3d8e4165a558f18747 is the first bad commit
 commit 71e75c97f97a9645d25fbf3d8e4165a558f18747
 Author: Christoph Hellwig <h...@lst.de>
 Date:   Fri Apr 11 19:07:01 2014 +0200
 
 scsi: convert device_busy to atomic_t

I guess you need this fix:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9c44392..ce62e87 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1774,7 +1774,7 @@ static void scsi_request_fn(struct request_queue *q)
 	blk_requeue_request(q, req);
 	atomic_dec(&sdev->device_busy);
 out_delay:
-	if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
+	if (!atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
 		blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }


James already sent it to Linus.
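
For readers who want to see why the single '!' in the diff matters, here is a
minimal user-space sketch in C11. It is only a model, not kernel code: struct
model_dev, its blocked flag (standing in for scsi_device_blocked()) and the two
should_delay_*() helpers are hypothetical names introduced for illustration.
It assumes, as the diff suggests, that the out_delay path is meant to ask for
a delayed queue run when no commands are outstanding on an unblocked device.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two pieces of state the out_delay test reads. */
struct model_dev {
	atomic_int device_busy;	/* number of commands currently outstanding */
	bool blocked;		/* stand-in for scsi_device_blocked(sdev) */
};

/*
 * Test as it reads before the fix: true only while commands are
 * outstanding and the device is not blocked.
 */
bool should_delay_buggy(struct model_dev *d)
{
	return atomic_load(&d->device_busy) && !d->blocked;
}

/*
 * Test as it reads after the fix: true only when the device is idle
 * (no commands outstanding) and not blocked.
 */
bool should_delay_fixed(struct model_dev *d)
{
	return !atomic_load(&d->device_busy) && !d->blocked;
}
```

For an idle, unblocked device (device_busy == 0), should_delay_buggy() returns
false and should_delay_fixed() returns true, so in this model only the fixed
test asks for the queue to be run again later. That would be consistent with
the boot hang reported in this thread, though the sketch does not prove it.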

Sam


Re: Sym2 scsi hang on boot on sparc64

2014-08-19 Thread James Bottomley
On Tue, 2014-08-19 at 23:17 +0300, Aaro Koskinen wrote:
 Hi,
 
 On Tue, Aug 19, 2014 at 09:47:35AM -0500, James Bottomley wrote:
  On Tue, 2014-08-19 at 17:37 +0300, Meelis Roos wrote:
    On Tue, 2014-08-19 at 14:25 +0300, Meelis Roos wrote:
     3.16 scsi worked fine, 3.17-rc1 misbehaves on 3 of my sparc64 test
     machines. E220R and E420R are with onboard 53c875, V210 is with onboard
     53c1010 and all behave the same. Any ideas where to dig deeper? Bisection
     might be nontrivial, because of sparc64 changes that are OK on 3.17-rc1
     again - but it is possible if nothing else helps.
    
    We've got a parisc with an 875 as a root SCSI bus ... I haven't got
    around to building for it yet, but I might find time to try today.
   
   Same on parisc:
   
   sym0: 1010-66 rev 0x1 at pci :20:01.0 irq 22
   sym0: PA-RISC Firmware, ID 7, Fast-80, LVD, parity checking
   sym0: SCSI BUS has been reset.
   scsi host0: sym-2.2.3
   random: nonblocking pool is initialized
   
   and hangs here. So hopefully it is reproducible for you.
  
  And also independent of the sparc changes.  The only other change in the
  window you quote is 64 bit luns.
 
 Bisection (on PA-RISC) points to:
 
 71e75c97f97a9645d25fbf3d8e4165a558f18747 is the first bad commit
 commit 71e75c97f97a9645d25fbf3d8e4165a558f18747
 Author: Christoph Hellwig <h...@lst.de>
 Date:   Fri Apr 11 19:07:01 2014 +0200
 
 scsi: convert device_busy to atomic_t

That's fixed upstream:

commit 480cadc2b7e0fa2bbab20141efb547dfe0c3707c
Author: Guenter Roeck <li...@roeck-us.net>
Date:   Sun Aug 10 05:54:25 2014 -0700

scsi: Fix qemu boot hang problem

Could you try with a kernel that has that fix?

Thanks,

James




Re: Sym2 scsi hang on boot on sparc64

2014-08-19 Thread Aaro Koskinen
Hi,

On Tue, Aug 19, 2014 at 03:37:18PM -0500, James Bottomley wrote:
 On Tue, 2014-08-19 at 23:17 +0300, Aaro Koskinen wrote:
  Bisection (on PA-RISC) points to:
  
  71e75c97f97a9645d25fbf3d8e4165a558f18747 is the first bad commit
  commit 71e75c97f97a9645d25fbf3d8e4165a558f18747
  Author: Christoph Hellwig <h...@lst.de>
  Date:   Fri Apr 11 19:07:01 2014 +0200
  
  scsi: convert device_busy to atomic_t
 
 That's fixed upstream:
 
 commit 480cadc2b7e0fa2bbab20141efb547dfe0c3707c
 Author: Guenter Roeck <li...@roeck-us.net>
 Date:   Sun Aug 10 05:54:25 2014 -0700
 
 scsi: Fix qemu boot hang problem
 
 Could you try with a kernel that has that fix?

Yes, the box boots now fine with the fix.

Thanks,

A.