Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello,

I've encountered an error similar to the one Matthias Prager described in his first mail in this thread in 2012. I use Debian 8 with kernel 3.16 and also own an LSI 2008 card flashed to IT mode (firmware P20), and I have problems with disks that were spun down. Writing to them when they are spun down usually ends in the following errors:

[59526.359997] sd 0:0:1:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59526.360003] sd 0:0:1:0: [sdc] CDB:
[59526.360006] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59526.360022] blk_update_request: I/O error, dev sdc, sector 824769880
[59544.111090] sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59544.111097] sd 0:0:0:0: [sdb] CDB:
[59544.00] Read(16): 88 00 00 00 00 00 31 28 fd 50 00 00 00 08 00 00
[59544.15] blk_update_request: I/O error, dev sdb, sector 824769872
[59544.114465] sd 0:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59544.114468] sd 0:0:4:0: [sdf] CDB:
[59544.114469] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59544.114483] blk_update_request: I/O error, dev sdf, sector 824769880
[59552.117436] sd 0:0:3:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59552.117443] sd 0:0:3:0: [sde] CDB:
[59552.117446] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59552.117462] blk_update_request: I/O error, dev sde, sector 824769968
[59572.951158] sd 0:0:2:0: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59572.951167] sd 0:0:2:0: [sdd] CDB:
[59572.951170] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59572.951192] blk_update_request: I/O error, dev sdd, sector 824769968
[59572.955679] sd 0:0:5:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59572.955695] sd 0:0:5:0: [sdg] CDB:
[59572.955701] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59572.955720] blk_update_request: I/O error, dev sdg, sector 824769968
[70357.782677] sd 0:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[70357.782686] sd 0:0:4:0: [sdf] CDB:
[70357.782690] Read(16): 88 00 00 00 00 00 85 c1 c9 08 00 00 00 08 00 00
[70357.782712] blk_update_request: I/O error, dev sdf, sector 2244069640
[70368.087947] sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[70368.087953] sd 0:0:0:0: [sdb] CDB:
[70368.087955] Read(16): 88 00 00 00 00 00 85 c1 c9 00 00 00 00 08 00 00
[70368.087969] blk_update_request: I/O error, dev sdb, sector 2244069632

Notice the lack of the "Device not ready" message; otherwise these errors look very similar to Matthias' errors. I have no clue what to do to fix this problem. Any suggestions?

Greetings,
Felix

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
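
The sector numbers in the log above can be cross-checked against the CDB bytes, which is a quick way to confirm that a failed request really is the read shown. The following is a minimal Python sketch, not part of the original report: the helper name parse_read16 is made up for illustration, and it assumes the standard READ(16) layout (opcode 0x88, 8-byte big-endian LBA at bytes 2..9, 4-byte transfer length at bytes 10..13).

```python
def parse_read16(cdb_hex):
    """Return (lba, transfer_length) from a READ(16) CDB given as hex bytes.

    Hypothetical helper for illustration; assumes the standard READ(16) layout.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    length = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: number of blocks
    return lba, length


# First failed request on sdc from the log above
lba, blocks = parse_read16("88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00")
print(lba, blocks)  # 824769880 8
```

The decoded LBA matches the "sector 824769880" reported by blk_update_request, and each request covers 8 blocks.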
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 07/25/2012 03:55 PM, James Bottomley wrote:
> Well, reading it, so do I. Unfortunately, we get to deal with the world as it is rather than as we would wish it to be. We likely have this problem with a lot of USB SATLs as well ...

Has this patch made it into the main git trees yet? I haven't seen anything about it in nearly a month, but I've been using James' patch since he posted it and the sleep/wakeup behavior seems improved/correct.

-- Robert
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 16.08.2012 20:26, Robert Trace wrote:
> On 07/25/2012 03:55 PM, James Bottomley wrote:
> > Well, reading it, so do I. Unfortunately, we get to deal with the world as it is rather than as we would wish it to be. We likely have this problem with a lot of USB SATLs as well ...
>
> Has this patch made it into the main git trees yet?

Not yet, but it is in James' scsi misc tree and, last I heard, was scheduled for inclusion in the 3.6 kernel. Anyway, here is his commit:
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=commit;h=98dc81b0d6c483a3eb256764ae10f156ccefdbbb

> I haven't seen anything about it in nearly a month, but I've been using James' patch since he posted it and the sleep/wakeup behavior seems improved/correct.

I have been running smoothly with the patch too - problem solved I'd say :-)

> -- Robert
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 08/16/2012 04:24 PM, Matthias Prager wrote:
> Not yet, but it is in James scsi misc tree and last I heard was scheduled for inclusion in the 3.6 kernel.

Close enough. :-) I didn't track the changes on the SCSI tree and I just wanted to make sure that it didn't slip through the cracks.

Thanks to all involved for all of the help and a speedy fix!

-- Robert
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 07/25/2012 07:56 PM, Matthias Prager wrote:
> I don't yet understand all the code but I'm following your discussion with Tejun: I've set up a minimal vm running gentoo with a mpt2sas driven controller in passthrough mode. I've applied your proposed patch against the vanilla 3.5.0 kernel (which includes Tejun's commit), and I'm happy to report the problem does seem to get fixed by it.

I can confirm this on my hardware as well with both 3.4.4 and 3.5.0. Without James' patch the kernels will immediately drop the I/O, and with the patch both kernels will wake the SATA disks and then complete the I/O successfully.

-- Robert
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 07/25/2012 06:35 PM, tomm wrote:
> If this is a driver or firmware bug, then why would commit 85ef06d1d252f6a2e73b678591ab71caad4667bb cause this to happen? What is the interaction between this issue and this commit which just flushes events?

That's confusing to me as well. Tejun's patch seems unrelated to power-state handling on non-removable disks.

> Also this issue does not happen with mvsas, only with mpt2sas.

Now _that_ is a useful data point. Is that with SATA disks attached? Why is it limited (so far) to just the mpt2sas controller?

-- Robert
RE: 'Device not ready' issue on mpt2sas since 3.1.10
Hi,

We have done some analysis on this issue. We observed that the issue is reproducible from kernel 3.1.10 onwards, but not on 3.0.36. So we took the mpt2sas code from the 3.1.10 kernel, compiled it, and ran it on the 3.0.36 kernel. There the issue is not reproducible (i.e. it works fine). Since the 3.0.36 kernel we have not added any patches that would cause this issue. In other words, this issue is not caused by the mpt2sas driver.

Regards,
Sreekanth.

-----Original Message-----
From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Matthias Prager
Sent: Wednesday, July 25, 2012 3:34 AM
To: Tejun Heo
Cc: Robert Trace; linux-scsi@vger.kernel.org; Jens Axboe; Moore, Eric; James E.J. Bottomley; Alan; Darrick J. Wong; Matthias Prager
Subject: Re: 'Device not ready' issue on mpt2sas since 3.1.10

Hello everyone,

I retested with a new firmware (P14 - released today), since it contains a bunch of sata and SATL fixes (according to the changelog). Unfortunately the observed behavior is unchanged (tested on a 3.4.5 kernel). Just wanted to let everyone know.

Cheers
Matthias
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On Sun, 2012-07-22 at 10:31 -0700, Tejun Heo wrote:
> Hello,
>
> On Sat, Jul 21, 2012 at 02:15:56PM +0200, Matthias Prager wrote:
> > Now I'm not sure this isn't taping over another bug. Which leads me to my question: What is the correct behavior?
> >
> > #1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o by setting allow_restart=1 for sata disks on sas controllers
> > or
> > #2 Teaching the sas drivers they do not need spin-up commands and can simply start issuing i/o to sata disks
>
> I haven't consulted SAT but it seems like a bug in SAS driver or firmware. If it's a driver bug, we better fix it there. If a firmware bug, working around those is one of major roles of drivers, so I think setting allow_restart is fine.

Actually, I don't think so. SAT-2 section 8.12.2 does say

    if the device is in the stopped state as the result of processing a START STOP UNIT command (see 9.11), then the SATL shall terminate the TEST UNIT READY command with CHECK CONDITION status with the sense key set to NOT READY and the additional sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED;

START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and that's what hdparm -y issues. We don't see this in /drivers/ata because TEST UNIT READY always returns success.

So it looks like the mpt2sas SAT is doing the correct thing and we only don't see this problem in normal SATA devices because of a bug in the libata-scsi SAT.

However, the kernel log

Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Sense Key : Not Ready [current]
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Add. Sense: Logical unit not ready, initializing command required
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 57 54 52 3f 00 00 08 00

indicates we got the NOT READY to a non-TUR command, so I suspect what's happening is that sending the TUR causes the SAT to remember the standby state and respond NOT READY to all subsequent commands. However, if we just send an ordinary command, not a TUR, it quietly wakes the drive and we don't see any problems.

There is support in SAT for this behaviour because there's a note on the START STOP UNIT command saying

    After returning GOOD status for a START STOP UNIT command with the START bit set to zero, the SATL shall consider the ATA device to be in the Stopped power state (see SBC-2)

which in SCSI terms would mean return NOT READY to any subsequent commands.

Can someone verify this is indeed what the mpt2sas HBA is doing?

James
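
The behaviour James suspects can be made concrete with a toy state machine. This is a sketch of the hypothesis only: the class ToySatl, its latched flag and the helper host_read are invented for illustration and say nothing about what the mpt2sas firmware actually does. It assumes that a TEST UNIT READY against a stopped unit makes the SATL "remember" the stopped state, that plain I/O without a prior TUR quietly wakes the drive, and that a host with allow_restart set answers NOT READY with START UNIT and retries.

```python
GOOD = "GOOD"
NOT_READY = "NOT READY"


class ToySatl:
    """Toy model of the suspected SATL behaviour; hypothetical, not real firmware."""

    def __init__(self):
        self.standby = False  # drive received STANDBY IMMEDIATE (START=0)
        self.latched = False  # a TUR has already reported the stopped state

    def start_stop_unit(self, start):
        # START=0 puts the drive in standby; START=1 spins it up again.
        self.standby = not start
        if start:
            self.latched = False
        return GOOD

    def test_unit_ready(self):
        if self.standby:
            self.latched = True  # assumed: the SATL now remembers "stopped"
            return NOT_READY
        return GOOD

    def read(self):
        if self.standby and self.latched:
            return NOT_READY
        self.standby = False  # assumed: plain I/O quietly wakes the drive
        return GOOD


def host_read(satl, allow_restart):
    """Read once; with allow_restart, send START UNIT on NOT READY and retry."""
    status = satl.read()
    if status == NOT_READY and allow_restart:
        satl.start_stop_unit(start=True)
        status = satl.read()
    return status


satl = ToySatl()
satl.start_stop_unit(start=False)  # what 'hdparm -y' amounts to
satl.test_unit_ready()             # media-check TUR issued before the I/O
print(host_read(satl, allow_restart=False))  # NOT READY
```

In this model, dropping the TUR makes the read succeed, and so does setting allow_restart, which mirrors the two behaviours reported in the thread.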
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello, James.

On Wed, Jul 25, 2012 at 06:19:13PM +0400, James Bottomley wrote:
> > I haven't consulted SAT but it seems like a bug in SAS driver or firmware. If it's a driver bug, we better fix it there. If a firmware bug, working around those is one of major roles of drivers, so I think setting allow_restart is fine.
>
> Actually, I don't think so. SAT-2 section 8.12.2 does say
>
>     if the device is in the stopped state as the result of processing a START STOP UNIT command (see 9.11), then the SATL shall terminate the TEST UNIT READY command with CHECK CONDITION status with the sense key set to NOT READY and the additional sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED;
>
> START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and that's what hdparm -y issues. We don't see this in /drivers/ata because TEST UNIT READY always returns success.

Urgh... ATA device in standby mode is ready for any command and definitely doesn't need an initializing command. Oh, well...

> So it looks like the mpt2sas SAT is doing the correct thing and we only don't see this problem in normal SATA devices because of a bug in the libata-scsi SAT.

libata is inconsistent with the standard but I think the standard is wrong here. :(

Thanks.

-- tejun
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On Wed, 2012-07-25 at 10:17 -0700, Tejun Heo wrote:
> Hello, James.
>
> On Wed, Jul 25, 2012 at 06:19:13PM +0400, James Bottomley wrote:
> > > I haven't consulted SAT but it seems like a bug in SAS driver or firmware. If it's a driver bug, we better fix it there. If a firmware bug, working around those is one of major roles of drivers, so I think setting allow_restart is fine.
> >
> > Actually, I don't think so. SAT-2 section 8.12.2 does say
> >
> >     if the device is in the stopped state as the result of processing a START STOP UNIT command (see 9.11), then the SATL shall terminate the TEST UNIT READY command with CHECK CONDITION status with the sense key set to NOT READY and the additional sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED;
> >
> > START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and that's what hdparm -y issues. We don't see this in /drivers/ata because TEST UNIT READY always returns success.
>
> Urgh... ATA device in standby mode is ready for any command and definitely doesn't need an initializing command. Oh, well...

Well, it does in sleep mode ... which seems to most closely map to what SCSI thinks of as a stopped unit. I checked the specs just in case there was an error ... they all say STANDBY not SLEEP.

> > So it looks like the mpt2sas SAT is doing the correct thing and we only don't see this problem in normal SATA devices because of a bug in the libata-scsi SAT.
>
> libata is inconsistent with the standard but I think the standard is wrong here. :(

Well, reading it, so do I. Unfortunately, we get to deal with the world as it is rather than as we would wish it to be. We likely have this problem with a lot of USB SATLs as well ...

It looks like a hack like this might be needed.

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4a6381c..7e59a7f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -42,6 +42,8 @@
 
 #include <trace/events/scsi.h>
 
+static void scsi_eh_done(struct scsi_cmnd *scmd);
+
 #define SENSE_TIMEOUT		(10*HZ)
 
 /*
@@ -241,6 +243,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 	if (! scsi_command_normalize_sense(scmd, &sshdr))
 		return FAILED;	/* no valid sense data */
 
+	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+		/*
+		 * nasty: for mid-layer issued TURs, we need to return the
+		 * actual sense data without any recovery attempt.  For eh
+		 * issued ones, we need to try to recover and interpret
+		 */
+		return SUCCESS;
+
 	if (scsi_sense_is_deferred(&sshdr))
 		return NEEDS_RETRY;
 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 56a9379..91d3366 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -764,6 +764,16 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	sdev->model = (char *) (sdev->inquiry + 16);
 	sdev->rev = (char *) (sdev->inquiry + 32);
 
+	if (strncmp(sdev->vendor, "ATA     ", 8) == 0) {
+		/*
+		 * sata emulation layer device.  This is a hack to work around
+		 * the SATL power management specifications which state that
+		 * when the SATL detects the device has gone into standby
+		 * mode, it shall respond with NOT READY.
+		 */
+		sdev->allow_restart = 1;
+	}
+
 	if (*bflags & BLIST_ISROM) {
 		sdev->type = TYPE_ROM;
 		sdev->removable = 1;
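
The two decisions in the patch above can be restated outside the kernel for readers who want to experiment with them. This is an illustrative Python sketch, not kernel code: the function names and the example INQUIRY buffer are made up, and it assumes the standard INQUIRY layout (vendor identification in bytes 8..15, space padded, so a SATL device reports "ATA" followed by five spaces).

```python
TEST_UNIT_READY = 0x00  # SCSI opcode


def is_satl_device(inquiry):
    """Mirror of the strncmp() on the 8-byte vendor field of INQUIRY data."""
    return inquiry[8:16] == b"ATA     "


def returns_sense_unprocessed(opcode, issued_by_eh):
    """True where the patched scsi_check_sense() returns SUCCESS early.

    issued_by_eh stands in for 'scmd->scsi_done == scsi_eh_done'.
    """
    return opcode == TEST_UNIT_READY and not issued_by_eh


# Made-up 36-byte INQUIRY response: 8 header bytes, vendor, model, revision
inquiry = bytes(8) + b"ATA     " + b"EXAMPLE DISK    " + b"0001"

allow_restart = 1 if is_satl_device(inquiry) else 0
print(allow_restart)                                  # 1
print(returns_sense_unprocessed(0x00, False))         # True: mid-layer TUR
print(returns_sense_unprocessed(0x00, True))          # False: error-handler TUR
```

The split matters because a mid-layer TUR is only a media check and should see the raw NOT READY, while a TUR issued by the error handler is part of recovery and must go through the normal sense interpretation.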
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Tejun Heo <tj at kernel.org> writes:
> Hello,
>
> On Sat, Jul 21, 2012 at 02:15:56PM +0200, Matthias Prager wrote:
> > Now I'm not sure this isn't taping over another bug. Which leads me to my question: What is the correct behavior?
> >
> > #1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o by setting allow_restart=1 for sata disks on sas controllers
> > or
> > #2 Teaching the sas drivers they do not need spin-up commands and can simply start issuing i/o to sata disks
>
> I haven't consulted SAT but it seems like a bug in SAS driver or firmware. If it's a driver bug, we better fix it there. If a firmware bug, working around those is one of major roles of drivers, so I think setting allow_restart is fine.
>
> Thanks.

If this is a driver or firmware bug, then why would commit 85ef06d1d252f6a2e73b678591ab71caad4667bb cause this to happen? What is the interaction between this issue and this commit which just flushes events?

Also this issue does not happen with mvsas, only with mpt2sas.
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello James,

On 25.07.2012 21:55, James Bottomley wrote:
> It looks like a hack like this might be needed.
>
> James
>
<SNIP>

I don't yet understand all the code but I'm following your discussion with Tejun: I've set up a minimal vm running gentoo with a mpt2sas driven controller in passthrough mode. I've applied your proposed patch against the vanilla 3.5.0 kernel (which includes Tejun's commit), and I'm happy to report the problem does seem to get fixed by it.

Well, at least sending the sata drive into standby using 'hdparm -y' now works (according to 'hdparm -C') without these nasty i/o errors on later i/o. That is to say, the drive wakes up again (e.g. from a 'fdisk -l /dev/sda' command) and returns data.

-- Matthias
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello everyone,

I retested with a new firmware (P14 - released today), since it contains a bunch of sata and SATL fixes (according to the changelog). Unfortunately the observed behavior is unchanged (tested on a 3.4.5 kernel). Just wanted to let everyone know.

Cheers
Matthias
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello,

On Sat, Jul 21, 2012 at 02:15:56PM +0200, Matthias Prager wrote:
> Now I'm not sure this isn't taping over another bug. Which leads me to my question: What is the correct behavior?
>
> #1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o by setting allow_restart=1 for sata disks on sas controllers
> or
> #2 Teaching the sas drivers they do not need spin-up commands and can simply start issuing i/o to sata disks

I haven't consulted SAT but it seems like a bug in SAS driver or firmware. If it's a driver bug, we better fix it there. If a firmware bug, working around those is one of major roles of drivers, so I think setting allow_restart is fine.

Thanks.

-- tejun
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello Tejun,

On 22.07.2012 19:31, Tejun Heo wrote:
> I haven't consulted SAT but it seems like a bug in SAS driver or firmware. If it's a driver bug, we better fix it there. If a firmware bug, working around those is one of major roles of drivers, so I think setting allow_restart is fine.

As it turns out, my workaround (setting allow_restart=1) isn't all that useful after all. There are no more i/o errors because the drive just never goes to standby mode anymore (at least 'hdparm -y /dev/sda' does not seem to have any effect anymore). I don't really understand why - do sas drives ever get to standby mode? (They have allow_restart=1 set by default.) And is this desired or expected behavior for sata disks on sas controllers?

For the moment the only way for me to have my sata drives sleeping without i/o errors is to revert your original commit (85ef06d1d252f6a2e73b678591ab71caad4667bb - tested with kernels 3.1.10, 3.4.4, 3.4.5, 3.4.6 and 3.5.0).

-- Matthias

P.S. I hope I'm not getting on everybody's nerves here (especially yours, Tejun)
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 17.07.2012 22:01, Tejun Heo wrote:
> On Tue, Jul 17, 2012 at 09:39:41PM +0200, Matthias Prager wrote:
> > I could not however reproduce the issue on any other device than a LSI SAS controller (using SATA disks) - on a regular ICH10 using AHCI and a SATA drive I don't see these i/o errors. But since I'm experiencing these issues on two different systems (both with lsi controllers while running vmware-guests on them) and Robert sees them on his (non-virtualized) system with the same lsi controller (9211-8i), I'm inclined to make the following assumptions: Either it is an issue which is limited to this controller and possibly sata disks hanging off it or it is a more general issue with sas controllers and sata disks (again it could well affect sas disks too). Lacking other controllers or sas disks I can't be sure.
>
> So, nothing in the libata stack generates NOT_READY - initializing command required. I suppose it's LSI firmware / driver translating TUR to CHECK_POWER_MODE and generating NOT_READY. I don't know what SAT says about this but this can't be correct. An ATA device in standby mode is ready to process any commands. It should be able to come back to full operation on demand as necessary and that's why it can be transparently enabled from device side. Eric?

While reading the linux-scsi mailing list I stumbled upon '[Bug 16070] Fail to issue Start/Stop Unit'
http://marc.info/?l=linux-scsi&m=134278835822649&w=2
(bugtracker: https://bugzilla.kernel.org/show_bug.cgi?id=16070)
which led me to try enabling the 'allow_restart' flag for my disks. With this workaround a vanilla kernel 3.4.5 does not exhibit the i/o errors on sleeping sata disks hanging off sas controllers.

I'm currently running one of my systems with an
'echo 1 | tee /sys/block/sd?/device/scsi_disk/*/allow_restart > /dev/null'
line added to the init scripts. This way I can use the untouched kernel sources and still get around the i/o errors. But I reckon this is no solution.

I'm no expert on scsi/sas/ata internals, so please take the following thoughts with a grain of salt:

As far as I can see (and Tejun confirmed that - I think) Tejun's commit 85ef06d1d252f6a2e73b678591ab71caad4667bb somehow exposes a bug which lies deeper in the sas/ata code. The 'sas_slave_configure()' function in 'drivers/scsi/libsas/sas_scsi_host.c' sets the 'allow_restart' flag for sas disks hanging off sas controllers. But if it encounters a sata disk it calls 'ata_sas_slave_configure()' in 'drivers/ata/libata-scsi.c' instead and returns without enabling the 'allow_restart' flag. A simple fix would be to set allow_restart=1 after having called 'ata_sas_slave_configure()' but before returning (in 'sas_slave_configure()').

Now I'm not sure this isn't taping over another bug. Which leads me to my question: What is the correct behavior?

#1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o by setting allow_restart=1 for sata disks on sas controllers
or
#2 Teaching the sas drivers they do not need spin-up commands and can simply start issuing i/o to sata disks

-- Matthias
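
The init-script workaround described above can also be written as a small script, which makes it easier to log which disks were touched. This is a hedged Python sketch of the same sysfs write, not a recommended fix: the function name enable_allow_restart and the sysfs_root parameter (there only so the function can be tried against a scratch directory) are assumptions for illustration; the sysfs path pattern is the one from the shell one-liner.

```python
import glob
import os


def enable_allow_restart(sysfs_root="/sys"):
    """Write 1 to allow_restart for every sd? disk; return the paths written.

    Equivalent in intent to:
      echo 1 | tee /sys/block/sd?/device/scsi_disk/*/allow_restart > /dev/null
    """
    pattern = os.path.join(
        sysfs_root, "block", "sd?", "device", "scsi_disk", "*", "allow_restart"
    )
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as attr:
            attr.write("1\n")
        written.append(path)
    return written
```

On a real system this would be called as root with no arguments, for example from an init script. Like the shell glob it mirrors, the 'sd?' pattern only matches single-letter disk names (sda to sdz).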
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello,

On Wed, Jul 11, 2012 at 03:48:00PM +0200, Matthias Prager wrote:
> I just tested kernel version 3.4.4 without commit 85ef06d1d252f6a2e73b678591ab71caad4667bb and it also works fine (beware of commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 as it conflicts with reverting 85ef06d1d252f6a2e73b678591ab71caad4667bb).
>
> I'm trying to understand why this commit leads to the issue of i/o failing on spun down drives, in hopes of being able to fix it. Meanwhile maybe Tejun Heo (author of the commit) or Jens Axboe (the committer) are able to shed some light on this (I've included them in the CC list).

Nothing rings a bell for me. How does it fail? The only thing it changes is when and which media check commands are issued.

Thanks.

-- tejun
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello Tejun,

On 17.07.2012 20:09, Tejun Heo wrote:
> Hello,
>
> On Wed, Jul 11, 2012 at 03:48:00PM +0200, Matthias Prager wrote:
> > I'm trying to understand why this commit leads to the issue of i/o failing on spun down drives, in hopes of being able to fix it. Meanwhile maybe Tejun Heo (author of the commit) or Jens Axboe (the committer) are able to shed some light on this (I've included them in the CC list).
>
> Nothing rings a bell for me. How does it fail? The only thing it change is when and which media check commands are issued.

I will try to describe the issue as best as I can (please feel free to point me to more helpful debugging steps or guides):

Whenever I put a drive to sleep (either via 'hdparm -y ...' or by letting it run into standby timeout) and issue i/o's afterwards (for example with 'fdisk -l'), I get back i/o errors (along the lines of 'end_request: I/O error, ...' - see previous posts in this thread) and the drive remains in standby (instead of waking up).

Robert (who also saw these errors) bisected the issue down to your patch. Without it, kernels 3.1.10 and 3.4.4 run smoothly for him and me.

I could not however reproduce the issue on any other device than an LSI SAS controller (using SATA disks) - on a regular ICH10 using AHCI and a SATA drive I don't see these i/o errors. But since I'm experiencing these issues on two different systems (both with lsi controllers while running vmware-guests on them) and Robert sees them on his (non-virtualized) system with the same lsi controller (9211-8i), I'm inclined to make the following assumptions: Either it is an issue which is limited to this controller and possibly sata disks hanging off it, or it is a more general issue with sas controllers and sata disks (again it could well affect sas disks too). Lacking other controllers or sas disks I can't be sure.

Thank you for taking the time to look into this - it's much appreciated

Matthias
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello,

On Tue, Jul 17, 2012 at 09:39:41PM +0200, Matthias Prager wrote:
> I could not however reproduce the issue on any other device than a LSI SAS controller (using SATA disks) - on a regular ICH10 using AHCI and a SATA drive I don't see these i/o errors. But since I'm experiencing these issues on two different systems (both with lsi controllers while running vmware-guests on them) and Robert sees them on his (non-virtualized) system with the same lsi controller (9211-8i), I'm inclined to make the following assumptions: Either it is an issue which is limited to this controller and possibly sata disks hanging off it or it is a more general issue with sas controllers and sata disks (again it could well affect sas disks too). Lacking other controllers or sas disks I can't be sure.

So, nothing in the libata stack generates NOT_READY - initializing command required. I suppose it's LSI firmware / driver translating TUR to CHECK_POWER_MODE and generating NOT_READY. I don't know what SAT says about this but this can't be correct. An ATA device in standby mode is ready to process any commands. It should be able to come back to full operation on demand as necessary and that's why it can be transparently enabled from device side. Eric?

Thanks.

-- tejun
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 11.07.2012 01:27, Robert Trace wrote:
> On 07/09/2012 09:51 PM, Robert Trace wrote:
> > Huh.. I just retested this and I'm seeing really random behavior.
>
> Ok, with a refined test I've been able to reliably reproduce this and I bisected it back to commit 85ef06d1d252f6a2e73b678591ab71caad4667bb in Linus' tree (introduced between 3.0 and 3.1):
>
> commit 85ef06d1d252f6a2e73b678591ab71caad4667bb
> Author: Tejun Heo <t...@kernel.org>
> Date: Fri Jul 1 16:17:47 2011 +0200
>
>     block: flush MEDIA_CHANGE from drivers on close(2)
>
> Prior to the above commit, sleeping disks will spin up as a result of I/O sent to them. With the above commit, they don't spin up and immediately return an I/O failure.

This is good news, thank you. I can confirm your findings - omitting commit 85ef06d1d252f6a2e73b678591ab71caad4667bb solves my initial issue here (with 3.1.10).

> That's all the further I've gotten so far. I'll be happy to test any patches or suggestions.
>
> -- Rob
Re: 'Device not ready' issue on mpt2sas since 3.1.10
I just tested kernel version 3.4.4 without commit 85ef06d1d252f6a2e73b678591ab71caad4667bb and it also works fine (beware of commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 as it conflicts with reverting 85ef06d1d252f6a2e73b678591ab71caad4667bb).

I'm trying to understand why this commit leads to the issue of i/o failing on spun down drives, in hopes of being able to fix it. Meanwhile maybe Tejun Heo (author of the commit) or Jens Axboe (the committer) are able to shed some light on this (I've included them in the CC list).

Matthias
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 07/09/2012 09:51 PM, Robert Trace wrote:
> Huh.. I just retested this and I'm seeing really random behavior.

Ok, with a refined test I've been able to reliably reproduce this and I bisected it back to commit 85ef06d1d252f6a2e73b678591ab71caad4667bb in Linus' tree (introduced between 3.0 and 3.1):

commit 85ef06d1d252f6a2e73b678591ab71caad4667bb
Author: Tejun Heo <t...@kernel.org>
Date: Fri Jul 1 16:17:47 2011 +0200

    block: flush MEDIA_CHANGE from drivers on close(2)

Prior to the above commit, sleeping disks will spin up as a result of I/O sent to them. With the above commit, they don't spin up and immediately return an I/O failure.

That's all the further I've gotten so far. I'll be happy to test any patches or suggestions.

-- Rob
Re: 'Device not ready' issue on mpt2sas since 3.1.10
Hello linux-scsi and linux-raid,

I did some further research regarding my problem. It appears to me the fault does not lie with the mpt2sas driver (not that I can definitely exclude it), but with the md implementation.

I reproduced what I think is the same issue on a different machine (also running VMware ESXi 5 and an LSI 9211-8i in IR mode) with a different set of hard-drives of the same model. Using systemrescuecd (2.8.1-beta003) and booting the 64-bit 3.4.4 kernel, I issued the following commands:

1) 'hdparm -y /dev/sda' (to put the hard-drive to sleep)
2) 'mdadm --create /dev/md1 --metadata 1.2 --level=mirror --raid-devices=2 --name=test1 /dev/sda missing'
3) 'fdisk -l /dev/md127' (for some reason /proc/mdstat indicates the md is being created as md127)

2) gave me this feedback:
---
mdadm: super1.x cannot open /dev/sda: Device or resource busy
mdadm: /dev/sda is not suitable for this array.
mdadm: create aborted
---
Even though it says creating aborted it still created md127.

And 3) led to these lines in dmesg:
---
[ 604.838640] sd 2:0:0:0: [sda] Device not ready
[ 604.838645] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 604.838655] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
[ 604.838663] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
[ 604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00 20 00
[ 604.838680] end_request: I/O error, dev sda, sector 2048
[ 604.838688] Buffer I/O error on device md127, logical block 0
[ 604.838695] Buffer I/O error on device md127, logical block 1
[ 604.838699] Buffer I/O error on device md127, logical block 2
[ 604.838702] Buffer I/O error on device md127, logical block 3
[ 604.838783] sd 2:0:0:0: [sda] Device not ready
[ 604.838785] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 604.838789] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
[ 604.838793] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
[ 604.838797] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00 08 00
[ 604.838805] end_request: I/O error, dev sda, sector 2048
[ 604.838808] Buffer I/O error on device md127, logical block 0
[ 604.838983] sd 2:0:0:0: [sda] Device not ready
[ 604.838986] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 604.838989] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
[ 604.838993] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
[ 604.838998] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00 08 00
[ 604.839006] end_request: I/O error, dev sda, sector 146514
[ 604.839009] Buffer I/O error on device md127, logical block 183143355
[ 604.839087] sd 2:0:0:0: [sda] Device not ready
[ 604.839090] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 604.839093] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
[ 604.839097] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
[ 604.839102] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00 08 00
[ 604.839110] end_request: I/O error, dev sda, sector 146514
[ 604.839113] Buffer I/O error on device md127, logical block 183143355
[ 604.839271] sd 2:0:0:0: [sda] Device not ready
[ 604.839274] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 604.839278] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
[ 604.839282] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
[ 604.839286] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00 20 00
[ 604.839321] end_request: I/O error, dev sda, sector 2048
[ 604.839324] Buffer I/O error on device md127, logical block 0
[ 604.839330] Buffer I/O error on device md127, logical block 1
[ 604.840494] sd 2:0:0:0: [sda] Device not ready
[ 604.840497] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 604.840504] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
[ 604.840512] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
[ 604.840516] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00 08 00
[ 604.840526] end_request: I/O error, dev sda, sector 2048
---

This rules out hardware errors as the cause (different physical machine and devices), and also ext4, which the other system was using as its filesystem.

Maybe Neil Brown (whom scripts/get_maintainer.pl identified as the maintainer of the md code) can make heads or tails of this. It may well be that this is the same problem with a different error path - I don't know. I will try to make the scenario more generic, but I don't have a non-virtual machine to spare atm.

Also please do let me know if I'm posting this to the wrong lists (linux-scsi and linux-raid) or if there is anything unhelpful in the way I'm reporting this.

Regards,
Matthias Prager
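For readers who want to cross-check the CDB bytes in the log against the reported sector numbers, here is a minimal shell sketch. It assumes the standard Read(10) and Read(16) CDB layouts; `decode_read_cdb` is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Decode the LBA and transfer length from a Read(10) or Read(16) CDB
# given as space-separated hex bytes, as printed in the kernel log.
# decode_read_cdb is a hypothetical helper, not part of sg3_utils.
decode_read_cdb() {
    # Split the single argument into positional parameters $1..$n.
    set -- $1
    case "$1" in
        28) # Read(10): CDB bytes 2-5 hold the LBA, bytes 7-8 the length
            lba=$(( 0x$3$4$5$6 ))
            len=$(( 0x$8$9 ))
            ;;
        88) # Read(16): CDB bytes 2-9 hold the LBA, bytes 10-13 the length
            lba=$(( 0x$3$4$5$6$7$8$9${10} ))
            len=$(( 0x${11}${12}${13}${14} ))
            ;;
        *)  echo "unsupported opcode: $1" >&2
            return 1
            ;;
    esac
    echo "lba=$lba blocks=$len"
}

decode_read_cdb '28 00 00 00 08 00 00 00 20 00'
```

The call shown decodes the first failing read in the log to LBA 2048, which matches the "sector 2048" line. Feeding it `28 00 57 54 65 d8 00 00 08 00` gives LBA 1465148888, which suggests the "sector 146514" figure in the archived log is truncated.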
Re: 'Device not ready' issue on mpt2sas since 3.1.10
> I did some further research regarding my problem. It appears to me the fault does not lie with the mpt2sas driver (not that I can definitely exclude it), but with the md implementation.

I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA disks), but I've come to a slightly different conclusion.

I noticed that when my SATA disks are on a SATA controller and they spin down (or are spun down via hdparm -y), they respond to TUR (TEST UNIT READY) commands with an OK. Any I/O sent to these disks simply waits while the disks spin up and then completes as usual.

However, my SATA disks on the SAS controller respond to TUR with the sense error Not Ready/Initializing command required. Any I/O sent to these disks immediately fails. You saw this in your logging:

[ 604.838640] sd 2:0:0:0: [sda] Device not ready
[ 604.838645] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 604.838655] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
[ 604.838663] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
[ 604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00 20 00
[ 604.838680] end_request: I/O error, dev sda, sector 2048
[ 604.838688] Buffer I/O error on device md127, logical block 0
[ 604.838695] Buffer I/O error on device md127, logical block 1
[ 604.838699] Buffer I/O error on device md127, logical block 2
[ 604.838702] Buffer I/O error on device md127, logical block 3

Sending an explicit START UNIT command to these sleeping disks will wake them up and then they behave normally. (BTW, you can issue TURs and START UNITs via the sg_turs and sg_start commands).

I've reproduced this behavior on the raw disks themselves, no MD layer involved (although the freak-out by my MD layer is what alerted me to this issue too... Having your entire array punted the first time you access it is a little scary :-). I'm also on raw hardware and I've seen this behavior on kernels 3.0.33 through 3.4.4.

So, SATA disks respond differently depending on the controller they're on. I don't know if this is a SCSI thing, a SAS thing or a firmware/driver thing for the 9211.

Now, whether or not the MD layer should be assembling arrays from failed disks is, I think, a separate issue.

-- Rob
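The TUR and START UNIT workaround described in this message can be sketched in shell as follows. This is only an illustration under stated assumptions: `wake_if_asleep` is a hypothetical helper name, `sg_turs` and `sg_start` are the sg3_utils commands mentioned above, and any TUR failure is treated as "asleep" without inspecting the sense data.

```shell
#!/bin/sh
# Send START UNIT to a disk only if TEST UNIT READY reports a failure.
# wake_if_asleep is a hypothetical helper; sg_turs and sg_start come
# from the sg3_utils package.
wake_if_asleep() {
    dev=$1
    if sg_turs "$dev" >/dev/null 2>&1; then
        echo "$dev: ready"
    else
        # Assumption: any TUR failure means the disk is in standby.
        # A stricter version would check for the Not Ready sense key.
        echo "$dev: not ready, sending START UNIT"
        sg_start --start "$dev"
    fi
}
```

It would be called as `wake_if_asleep /dev/sda` (as root) before sending I/O to a disk that may have been spun down.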
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On Mon, 09 Jul 2012 16:40:15 +0200 Matthias Prager li...@matthiasprager.de wrote:
> Hello linux-scsi and linux-raid,
>
> I did some further research regarding my problem. It appears to me the fault does not lie with the mpt2sas driver (not that I can definitely exclude it), but with the md implementation.
>
> I reproduced what I think is the same issue on a different machine (also running VMware ESXi 5 and an LSI 9211-8i in IR mode) with a different set of hard-drives of the same model. Using systemrescuecd (2.8.1-beta003) and booting the 64-bit 3.4.4 kernel, I issued the following commands:
>
> 1) 'hdparm -y /dev/sda' (to put the hard-drive to sleep)
> 2) 'mdadm --create /dev/md1 --metadata 1.2 --level=mirror --raid-devices=2 --name=test1 /dev/sda missing'
> 3) 'fdisk -l /dev/md127' (for some reason /proc/mdstat indicates the md is being created as md127)
>
> 2) gave me this feedback:
> ---
> mdadm: super1.x cannot open /dev/sda: Device or resource busy
> mdadm: /dev/sda is not suitable for this array.
> mdadm: create aborted
> ---
> Even though it says creating aborted it still created md127.

One of my pet peeves is when people interpret the observations wrongly and then report their interpretation instead of their observation. However, sometimes it is very hard to separate the two. Your comment above looks perfectly reasonable and looks like a clean observation and not an interpretation. Yet it is an interpretation :-)

The observation would be "Even though it says creating abort, md127 was still created".

You see, it wasn't this mdadm that created md127 - it certainly shouldn't have, as you asked it to create md1.

I don't know the exact sequence of events, but something - possibly relating to the error messages reported below - caused udev to notice /dev/sda. udev then ran mdadm -I /dev/sda and, as it had some metadata on it, it created an array with it. As the name information in that metadata was probably test1 or similar, rather than 1, mdadm didn't know what number was wanted for the array, so it chose a free high number - 127.

This metadata must have been left over from an earlier experiment.

So it might have been something like:
- you run mdadm (call this mdadm-1).
- mdadm tries to open sda
- driver notices that device is asleep, and wakes it up
- the waking up of the device causes a CHANGE uevent to udev
- this causes udev to run a new mdadm - mdadm-2
- mdadm-2 reads the metadata, sees old metadata, assembles sda in a new md127
- mdadm-1 gets scheduled again, tries to get O_EXCL access to sda and fails, because sda is now part of md127

Clearly undesirable behaviour. I'm not sure which bit is wrong.

NeilBrown

> And 3) led to these lines in dmesg:
> ---
> [ 604.838640] sd 2:0:0:0: [sda] Device not ready
> [ 604.838645] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 604.838655] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
> [ 604.838663] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
> [ 604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00 20 00
> [ 604.838680] end_request: I/O error, dev sda, sector 2048
> [ 604.838688] Buffer I/O error on device md127, logical block 0
> [ 604.838695] Buffer I/O error on device md127, logical block 1
> [ 604.838699] Buffer I/O error on device md127, logical block 2
> [ 604.838702] Buffer I/O error on device md127, logical block 3
> [ 604.838783] sd 2:0:0:0: [sda] Device not ready
> [ 604.838785] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 604.838789] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
> [ 604.838793] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
> [ 604.838797] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00 08 00
> [ 604.838805] end_request: I/O error, dev sda, sector 2048
> [ 604.838808] Buffer I/O error on device md127, logical block 0
> [ 604.838983] sd 2:0:0:0: [sda] Device not ready
> [ 604.838986] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 604.838989] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
> [ 604.838993] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
> [ 604.838998] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00 08 00
> [ 604.839006] end_request: I/O error, dev sda, sector 146514
> [ 604.839009] Buffer I/O error on device md127, logical block 183143355
> [ 604.839087] sd 2:0:0:0: [sda] Device not ready
> [ 604.839090] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 604.839093] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
> [ 604.839097] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
> [ 604.839102] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00 08 00
> [ 604.839110] end_request: I/O error, dev sda, sector 146514
> [ 604.839113] Buffer I/O error on device md127, logical block
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 10.07.2012 00:08, NeilBrown wrote:
> On Mon, 09 Jul 2012 16:40:15 +0200 Matthias Prager li...@matthiasprager.de wrote:
>> Even though it says creating aborted it still created md127.
>
> One of my pet peeves is when people interpret the observations wrongly and then report their interpretation instead of their observation. However, sometimes it is very hard to separate the two. Your comment above looks perfectly reasonable and looks like a clean observation and not an interpretation. Yet it is an interpretation :-)
>
> The observation would be "Even though it says creating abort, md127 was still created".
>
> You see, it wasn't this mdadm that created md127 - it certainly shouldn't have, as you asked it to create md1.

Sorry - I jumped to conclusions without knowing what was actually going on.

> I don't know the exact sequence of events, but something - possibly relating to the error messages reported below - caused udev to notice /dev/sda. udev then ran mdadm -I /dev/sda and, as it had some metadata on it, it created an array with it. As the name information in that metadata was probably test1 or similar, rather than 1, mdadm didn't know what number was wanted for the array, so it chose a free high number - 127.
>
> This metadata must have been left over from an earlier experiment.

That is correct (as I am just realizing now). There is metadata of a raid1 array left on the disk even though it was used (for a short time) with zfs on freebsd before doing these experiments.

> So it might have been something like:
> - you run mdadm (call this mdadm-1).
> - mdadm tries to open sda
> - driver notices that device is asleep, and wakes it up
> - the waking up of the device causes a CHANGE uevent to udev
> - this causes udev to run a new mdadm - mdadm-2
> - mdadm-2 reads the metadata, sees old metadata, assembles sda in a new md127
> - mdadm-1 gets scheduled again, tries to get O_EXCL access to sda and fails, because sda is now part of md127
>
> Clearly undesirable behaviour. I'm not sure which bit is wrong.

As it turns out, mdadm is doing everything right. md127 is actually already present (though inactive) at boot time. So mdadm is absolutely correct in saying sda is busy and refusing to do anything further.

> NeilBrown

The real problem seems to be located in some layer below md, which is not waking up the disk for any I/O (at all - not even for fdisk -l).

Matthias
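One way to spot the situation described here, where a disk is already claimed by an inactive array assembled from leftover metadata, is to look for the device in /proc/mdstat. The following is a minimal sketch that assumes the usual /proc/mdstat line format (`md127 : inactive sda[0](S)`); `md_holders` is a hypothetical helper name.

```shell
#!/bin/sh
# Print each md array (and its state) that lists the given device as a
# member, reading /proc/mdstat-formatted text on stdin.
# md_holders is a hypothetical helper name.
md_holders() {
    awk -v dev="$1" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 4; i <= NF; i++) {
                # Member fields look like "sda[0]" or "sda[0](S)".
                n = index($i, "[")
                if (n > 1 && substr($i, 1, n - 1) == dev)
                    print $1, $3
            }
        }'
}
```

Run as `md_holders sda < /proc/mdstat`. Any output before running `mdadm --create` would indicate that the device is already held by an existing array, which is consistent with the "Device or resource busy" message reported earlier in the thread.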
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 09.07.2012 21:37, Robert Trace wrote:
>> I did some further research regarding my problem. It appears to me the fault does not lie with the mpt2sas driver (not that I can definitely exclude it), but with the md implementation.
>
> I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA disks), but I've come to a slightly different conclusion.
>
> I noticed that when my SATA disks are on a SATA controller and they spin down (or are spun down via hdparm -y), they respond to TUR (TEST UNIT READY) commands with an OK. Any I/O sent to these disks simply waits while the disks spin up and then completes as usual.
>
> However, my SATA disks on the SAS controller respond to TUR with the sense error Not Ready/Initializing command required. Any I/O sent to these disks immediately fails. You saw this in your logging:
>
> [ 604.838640] sd 2:0:0:0: [sda] Device not ready
> [ 604.838645] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 604.838655] sd 2:0:0:0: [sda] Sense Key : Not Ready [current]
> [ 604.838663] sd 2:0:0:0: [sda] Add. Sense: Logical unit not ready, initializing command required
> [ 604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00 20 00
> [ 604.838680] end_request: I/O error, dev sda, sector 2048
> [ 604.838688] Buffer I/O error on device md127, logical block 0
> [ 604.838695] Buffer I/O error on device md127, logical block 1
> [ 604.838699] Buffer I/O error on device md127, logical block 2
> [ 604.838702] Buffer I/O error on device md127, logical block 3
>
> Sending an explicit START UNIT command to these sleeping disks will wake them up and then they behave normally. (BTW, you can issue TURs and START UNITs via the sg_turs and sg_start commands).

Thanks for these pointers.

> I've reproduced this behavior on the raw disks themselves, no MD layer involved (although the freak-out by my MD layer is what alerted me to this issue too... Having your entire array punted the first time you access it is a little scary :-). I'm also on raw hardware and I've seen this behavior on kernels 3.0.33 through 3.4.4.

This is interesting - are you sure about 3.0.33? I'm running this kernel atm because it gives me no trouble (as opposed to >=3.1.10). The SATA disks are spun up when I access data on them.

> So, SATA disks respond differently depending on the controller they're on. I don't know if this is a SCSI thing, a SAS thing or a firmware/driver thing for the 9211.
>
> Now, whether or not the MD layer should be assembling arrays from failed disks is, I think, a separate issue.

I realize now that in my cases the MD layer behaved correctly.

> -- Rob
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 10.07.2012 00:24, Robert Trace wrote:
> Also, TURs don't appear to actually wake the disk up (should they?). The only thing I've found that'll wake the disk up is an explicit START UNIT command.

I haven't checked the SCSI logging side, but about the only commands that wake up the disks are 'smartctl -a /dev/sda' and 'sg_start' (smartctl may be issuing a START UNIT command on its own).

Matthias
Re: 'Device not ready' issue on mpt2sas since 3.1.10
[removed linux-raid since the md layer seems unrelated]

On 07/09/2012 08:12 PM, Matthias Prager wrote:
>> I've reproduced this behavior on the raw disks themselves, no MD layer involved (although the freak-out by my MD layer is what alerted me to this issue too... Having your entire array punted the first time you access it is a little scary :-). I'm also on raw hardware and I've seen this behavior on kernels 3.0.33 through 3.4.4.
>
> This is interesting - are you sure about 3.0.33? I'm running this kernel atm because it gives me no trouble (as opposed to >=3.1.10). The SATA disks are spun up when I access data on them.

Huh.. I just retested this and I'm seeing really random behavior.

I tried 3.0.33 a few days ago after I saw your initial e-mail to this list. At that time, the one disk I tried didn't wake up when I sent I/O to it.

My first retest (just now), on 3.0.33 with four disks, showed the behavior you initially reported. Two of the disks woke up from the I/O, but not all of them. Repeating the test without rebooting made two disks wake up, but only one of them was the same as in the first test. The second disk that woke up was different.

After rebooting and running the test again, none of the disks woke up. Rebooting again and all of the disks are waking up.

(FYI, here's the test I ran:

1. hdparm -y /dev/sd[lmjk]
2. hdparm -C /dev/sd[lmjk] (to verify disks in standby)
3. for i in l m j k; do sg_turs -v /dev/sd${i}; done (All disks reported Not Ready)
4. echo 3 > /proc/sys/vm/drop_caches
5. for i in l m j k; do dd if=/dev/sd${i} of=/dev/null bs=512 count=1 skip=<number>; done

I've been manually changing the skip=<number> because I've seen the dd command complete successfully without the disk waking up. I think this is because the disk is satisfying the read from its own cache. Changing where on the disk I'm reading should thwart this.)

I'm confused. I'll try more recent kernels again and see if the behavior becomes predictable.

-- Rob
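The test steps listed in this message can be collected into a small script so that repeated runs are identical apart from the read offset. This is a sketch, not the script actually used in the thread: `spin_test`, `run` and the `DRY_RUN` variable are hypothetical names, and the offset is passed in by the caller because it was chosen by hand in the original test.

```shell
#!/bin/sh
# Sketch of the spin-down test from this message. With DRY_RUN=1 the
# commands are only printed. spin_test and run are hypothetical names.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

spin_test() {
    skip=$1; shift                # block offset to read; vary between runs
    for dev in "$@"; do
        run hdparm -y "$dev"      # put the disk into standby
        run hdparm -C "$dev"      # confirm the reported power state
        run sg_turs -v "$dev"     # expected to report Not Ready
    done
    # Drop the page cache so the reads must go to the device (needs root).
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    for dev in "$@"; do
        run dd if="$dev" of=/dev/null bs=512 count=1 skip="$skip"
    done
}
```

For example, `DRY_RUN=1 spin_test 123456 /dev/sdl /dev/sdm` prints the command sequence without touching the disks. Using a different offset on each real run avoids the read being served from the disk's own cache, as noted above.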
Re: 'Device not ready' issue on mpt2sas since 3.1.10
On 07/09/2012 08:21 PM, Matthias Prager wrote:
> I haven't checked the SCSI logging side, but about the only commands that wake up the disks are 'smartctl -a /dev/sda' and 'sg_start' (smartctl may be issuing a START UNIT command on its own).

smartctl -a does appear to wake the disks. The SCSI log shows an IDENTIFY and then several ATA passthrough commands (one of which takes ~10 seconds to complete). So, I don't see an explicit START UNIT, but one of those ATA commands, which I didn't decode, could certainly trigger the wakeup.

-- Rob