[Patch][RFC] st: provide tape statistics via sysfs
First, forgive me for using Outlook for this; if there are any issues with what I sent, let me know and I'll send it again from Gmail. This is also my first attempt at a kernel patch, so please be gentle.

This patch enables tape statistics via sysfs for the st driver and is based on kernel 3.8.0-rc6. It creates two new files in sysfs and builds on work done in 2005 by Kai Mäkisara. Any feedback would be greatly appreciated.

Assuming sysfs is mounted at /sys, the first file is /sys/bus/scsi/drivers/st/drives, which contains a single number indicating the largest tape drive instance assigned by st_probe in the st module. If it is 4, it is possible that st0, st1, st2, and st3 exist on the system. Since tape drives can later be disconnected, they don't all have to exist; the count is a hint that makes it possible to gather statistics in a loop with an upper bound. This makes it easier to gather statistics in iostat.

The second file is /sys/class/scsi_tape/stxx/stat, where xx is the instance of the tape drive. The file contents are almost the same as the stat file for disks, except that the merge statistics are always 0 (since tape drives are sequential, merged I/Os don't make sense) and the inflight value is either 0 or 1, since the st module only ever has at most one read or write outstanding. I've also added one field to the end of the file: a count of other I/Os. These could be commands issued by the driver within the kernel (e.g. rewind) or via an ioctl from user space. For tape drives, some commands involving actions like tape movement can take a long time, so it's important to keep track of SCSI requests sent to the tape drive other than reads and writes, so that delays can be explained when they happen.
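For anyone who wants to poke at the proposed files from user space, here is a minimal Python sketch (not part of the patch). It assumes the stat file carries the eleven counters of the disk stat file in the same order, followed by the extra "other I/Os" counter; the field names and the helper names (parse_tape_stat, candidate_drives, avg_wait) are made up for illustration.

```python
# Assumed layout: the first eleven names follow the disk stat file order,
# the twelfth is the extra "other I/Os" counter described in the RFC.
FIELDS = (
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue", "other_ios",
)


def parse_tape_stat(text):
    """Turn the contents of a tape stat file into a dict of counters."""
    values = [int(v) for v in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def candidate_drives(drives_text):
    """Names worth probing; the 'drives' count is only an upper bound."""
    return ["st%d" % i for i in range(int(drives_text.strip()))]


def avg_wait(prev, cur, ios_key, ticks_key):
    """Average time per I/O between two samples (ticks delta / I/O delta)."""
    ios = cur[ios_key] - prev[ios_key]
    ticks = cur[ticks_key] - prev[ticks_key]
    return ticks / ios if ios else 0.0
```

A caller would loop over candidate_drives(), skip any stX whose stat file is missing (the drive may have been disconnected), and feed two successive samples to avg_wait().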
With some future patches to iostat, this figure will be reported and used to calculate an average wait for all I/Os (a_await and oio/s in this output):

tape:     wr/s  KiB_write/s    rd/s  KiB_read/s  r_await  w_await  a_await  oio/s
st0     186.50        46.75    0.00        0.00    0.000    0.276    0.276   0.00
st1     186.00        93.00    0.00        0.00    0.000    0.180    0.180   0.00
st2       0.00         0.00  181.50       45.50    0.347    0.000    0.347   0.00
st3       0.00         0.00  183.00       45.75    0.224    0.000    0.224   0.00

Q: Does anyone have strong objections to extending the stat format to include another field (a count of SCSI commands issued to the target other than reads or writes), or should the format stay in common with disks and a new device-class-specific file be created that provides extra statistics that may be useful only for a specific class of SCSI device? For example, it could be called stat-tape, stat-st or something else.

On to justification: we have a customer using virtual tape libraries (lots of drives) who wanted to be able to monitor the activity and performance of their backups. Because of a lack of functionality they resorted to using a publicly available SystemTap script (created by Red Hat, presumably when they received similar requests from other customers):

http://sourceware.org/systemtap/wiki/WSiostatSCSI

Unfortunately, using this script occasionally results in kernel panics on older kernels. Those issues have been addressed, but most customers still don't end up running the SystemTap script unless they have to, and they still want to monitor the performance of their tape drives.

Just googling "linux tape throughput statistics" is enough to yield many hits on the topic, including these:

1. http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14775056
2. http://h30499.www3.hp.com/t5/System-Administration/How-to-get-tape-drive-performance-stats/td-p/3880235#.UKoJxNGloUo
3. http://docs.oracle.com/cd/E19455-01/816-3319/6m9k06r58/index.html

The first two are asking about getting tape stats on Linux. The reply for 1. is that you can get the information on AIX; 2. is similar, but the reply is that you can get the information for HP-UX 11.31. The last one is the iostat manual page for Solaris, which can report tape stats as well. All three show that iostat can print tape statistics on the largest of the commercial Unix operating systems.

Q: Does anyone have any general feedback about things that need to change, or demands about changing the implementation before it can be accepted?

The checkpatch.pl script generates warnings for the diffs because of CamelCase; however, the CamelCase warnings are there because I wanted to stay consistent with the module (look for things like STp).

Signed-off-by: Shane Seymour
Signed-off-by: Darren Lavender
Tested-by: Shane Seymour
Tested-by: Darren Lavender
---
diff -uprN -X linux-3.8-rc6-vanilla/Documentation/dontdiff linux-3.8-rc6-vanilla/drivers/scsi/st.c linux-3.8-rc6/drivers/scsi/st.c
--- linux-3.8-rc6-vanilla/drivers/scsi/st.c 2013-02-08 14:35:27.0 +
+++ linux-3.8-rc6/drivers/scsi/st.c 2013-02-22 00:06:50.0 +
@@ -174,6 +174,9 @@ static int debugging = DEBUG; stat
Re: Issue with mini-SaS to eSATA to USB 3.0 setup
On Thu, Feb 21, 2013 at 05:27:00PM -0300, Fabio David wrote:
> On Thu, Feb 21, 2013 at 4:26 PM, Sarah Sharp wrote:
> > On Tue, Jan 29, 2013 at 12:56:02PM -0200, Fabio David wrote:
> > > Do you have any suggestions?
> >
> > A couple possible root causes come to mind:
> >
> > 1. Perhaps the USB 3.0 hub is interfering with communication to your
> > eSATA to USB 3.0 adapters.
> >
> > 2. Maybe USB device suspend is to blame. Do you have USB device suspend
> > enabled for the eSATA to USB adapters?
>
> I am not sure, I thought it was disabled by default. How can I check?

It is disabled by default. I just wanted to make sure an installed udev script wasn't enabling auto-suspend.

You can check whether auto-suspend is enabled by running powertop and looking for the lines that correspond to the USB 3.0 to eSATA adapters. If they say 'Bad', device suspend is disabled. If they say 'Good', device suspend is enabled.

Or you can find the power/control entries for the devices in /sys/bus/usb/devices/ and make sure they say 'on' rather than 'auto'. E.g.

sarah@xanatos:~$ lsusb
Bus 001 Device 002: ID 050d:0413 Belkin Components
Bus 003 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 045e:0750 Microsoft Corp. Wired Keyboard 600
Bus 001 Device 004: ID 046d:c018 Logitech, Inc. Optical Wheel Mouse
Bus 003 Device 004: ID 04f2:b2ea Chicony Electronics Co., Ltd
sarah@xanatos:~$ lsusb -t
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ehci_hcd/3p, 480M
    |__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/8p, 480M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ehci_hcd/3p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
    |__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/4p, 480M
        |__ Port 3: Dev 3, If 0, Class=HID, Driver=usbhid, 1.5M
        |__ Port 3: Dev 3, If 1, Class=HID, Driver=usbhid, 1.5M
        |__ Port 4: Dev 4, If 0, Class=HID, Driver=usbhid, 1.5M
sarah@xanatos:~$ cd /sys/bus/usb/devices/
sarah@xanatos:/sys/bus/usb/devices$ ls
1-0:1.0  1-1  1-1:1.0  1-1.3  1-1.3:1.0  1-1.3:1.1  1-1.4  1-1.4:1.0
2-0:1.0  3-0:1.0  3-1  3-1:1.0  3-1.6  3-1.6:1.0  3-1.6:1.1
4-0:1.0  4-1  4-1:1.0  usb1  usb2  usb3  usb4
sarah@xanatos:/sys/bus/usb/devices$ cat 1-1.4/idVendor
046d
sarah@xanatos:/sys/bus/usb/devices$ cat 1-1.4/power/control
on
sarah@xanatos:/sys/bus/usb/devices$

That means my USB mouse is 'on', so device auto-suspend is disabled.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
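The manual sysfs check above can be done for every device in one go. The following is only a sketch in Python, assuming the usual /sys/bus/usb/devices layout shown in the transcript; the function names are invented for this example, and entries without a readable power/control file are simply skipped.

```python
import os


def usb_power_control(devices_dir="/sys/bus/usb/devices"):
    """Map each USB sysfs entry to the contents of its power/control file."""
    settings = {}
    for name in sorted(os.listdir(devices_dir)):
        control = os.path.join(devices_dir, name, "power", "control")
        try:
            with open(control) as fh:
                settings[name] = fh.read().strip()
        except OSError:
            # Skip entries that do not expose a readable power/control file.
            continue
    return settings


def autosuspend_enabled(settings):
    """Entries set to 'auto', i.e. auto-suspend is allowed for them."""
    return [name for name, value in settings.items() if value == "auto"]
```

Calling autosuspend_enabled(usb_power_control()) and getting an empty list back would mean every listed device reports 'on', matching the mouse example above.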
Re: [usb-storage] Re: Issue with mini-SaS to eSATA to USB 3.0 setup
On Thu, Feb 21, 2013 at 03:48:42PM -0500, Douglas Gilbert wrote:
> On 13-02-21 02:26 PM, Sarah Sharp wrote:
> >Cc-ing the SCSI and USB storage list.
> >
> >Folks, does the attached picture look like a sane setup? I've never
> >used mini-SaS to eSATA adapter before, let alone with four eSATA to USB
> >3.0 adapters.
>
> Well SAS to eSATA is okay (works for me: LSI SAS9212-4i4e HBA
> via a SATA to eSATA cable to a SATA disk caddy with an eSATA
> port).

This seems to be all just SATA signalling, no SAS involved at all, just the physical shape of the connector is miniSAS.

--
Vojtech Pavlik
Director SuSE Labs
Re: Read I/O starvation with writeback RAID controller
Hi Martin,

On Thu, 2013-02-21 at 12:43 +0100, Martin Svec wrote:
> I'm sorry, I forgot to mention hardware details. It isn't aacraid, it
> is megaraid-based Dell PERC H700 w/ 1GB NVRAM and 12x 450GB 15k SAS
> drives in RAID-10. All in Dell R510 server.
>

Jan Engelhardt (CC'ed) mentioned the currently out-of-tree ROW scheduler worked for him:

https://lkml.org/lkml/2012/12/11/534

Perhaps this would be worth a shot..?

--nab

> Thanks,
>
> Martin
>
> On 20.2.2013 21:48, Nicholas A. Bellinger wrote:
> > Hi Martin,
> >
> > CC'ing linux-scsi here, as aacraid doesn't have an official maintainer
> > atm.
> >
> > --nab
> >
> > On Wed, 2013-02-20 at 16:38 +0100, Martin Svec wrote:
> >> Hello,
> >>
> >> I've noticed read I/O starvation problems of LIO iSCSI target when
> >> used on top of writeback-enabled HW RAID controller (PERC H700 with
> >> 1GB cache). For intensive mixed read-write workload in virtualized
> >> environments, writes are able to consume over 95% of the IOPS
> >> throughput and cause starvation of reads.
> >>
> >> After a number of tests it seems to me it's a general issue of block
> >> layer I/O scheduling when running on top of a writeback device. If
> >> there is a write-intensive task, all writes go to the writeback cache
> >> with near-zero latency. This allows writer to quickly saturate the
> >> device with thousands of writes while using only a minimal fraction of
> >> queue depth. However, non-cached reads depend on spinning drive
> >> latencies which are orders of magnitude higher than writeback cache
> >> latencies, and so readers cannot submit so many requests per second as
> >> writers. Consequently, I guess the controller has totally wrong view
> >> of the incoming workload pattern, tries to satisfy the write flood
> >> first and the net result is unacceptable starvation of reads, with
> >> latencies up to hundreds of milliseconds.
> >>
> >> A simple fio test with 1TiB block device where one thread does 4k
> >> random sync writes with iodepth=32 and one thread does 4k random reads
> >> with iodepth=32 shows that instead of the theoretical 50:50 IOPS
> >> ratio, the block device runs with 95:5 ratio in favor of writes. In
> >> fact, the imbalance is so high that even write iodepth=2 is enough to
> >> achieve the same numbers.
> >>
> >> Real workloads that tend to exhibit this problem are: initial zeroing
> >> of a virtual machine disk, virtual machine migration, virtual machine
> >> cloning, intensive swapping of one virtual machine etc.
> >>
> >> I tried to set WCE=1 on target iblock device, played with queue
> >> depths, tested all three I/O schedulers and their parameters,
> >> controller's parameters, but with no luck. To achieve reasonably good
> >> fairness, the only solution is to set nr_requests to 1 or disable
> >> controller's writeback cache at all -- at the expense of degraded
> >> overall performance :-(
> >>
> >> Regarding nr_requests, there's obvious relation between iodepths and
> >> read starvation: if (nr_requests >= workload iodepth) then starvation
> >> surely occurs. Lowering nr_requests below this threshold slowly starts
> >> improving fairness and for every rd+wr iodepths pair, there exists
> >> sufficiently low nr_requests value at which IOPS ratio is finally
> >> balanced according to rd:wr iodepth ratio. Unfortunately it means
> >> there is no minimal nr_requests value suitable for all workloads. For
> >> iodepths around 2 to 8, only nr_requests=1 provides fair load balancing.
> >>
> >> Is this a known problem? Does anybody find block layer parameters that
> >> eliminate this problem for iscsi-target storage in mixed random
> >> read-write environments like virtualization? Or should I start writing
> >> my own I/O scheduler? ;-)
> >>
> >> Update: I've just found https://lkml.org/lkml/2012/12/10/550 (Read
> >> starvation by sync writes), where Jan Kara describes identical
> >> symptoms. But setting nr_requests=1 doesn't help in my case.
> >> CC'ing LKML too (I'm not LKML subscriber).
> >>
> >> Thanks,
> >>
> >> Martin
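For anyone trying to reproduce the nr_requests experiments described in the quoted report, the knobs live under /sys/block/<dev>/queue/. The snippet below is only an illustrative Python sketch: the helper names are invented here, writing nr_requests needs root, and read_share() is just a convenience for expressing an observed IOPS split such as the 95:5 case.

```python
import os
import re


def queue_attr(device, attr, sys_block="/sys/block"):
    """Path of a request-queue attribute for one block device."""
    return os.path.join(sys_block, device, "queue", attr)


def get_nr_requests(device, sys_block="/sys/block"):
    with open(queue_attr(device, "nr_requests", sys_block)) as fh:
        return int(fh.read())


def set_nr_requests(device, value, sys_block="/sys/block"):
    # Needs root when pointed at the real sysfs tree.
    with open(queue_attr(device, "nr_requests", sys_block), "w") as fh:
        fh.write("%d\n" % value)


def active_scheduler(device, sys_block="/sys/block"):
    """The scheduler file marks the active choice with square brackets."""
    with open(queue_attr(device, "scheduler", sys_block)) as fh:
        match = re.search(r"\[([^\]]+)\]", fh.read())
    return match.group(1) if match else None


def read_share(read_iops, write_iops):
    """Fraction of total IOPS that went to reads."""
    total = read_iops + write_iops
    return read_iops / total if total else 0.0
```

With these, a test loop could lower nr_requests step by step, rerun the fio job, and record read_share() for each setting.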
Re: Issue with mini-SaS to eSATA to USB 3.0 setup
On 13-02-21 02:26 PM, Sarah Sharp wrote:
> Cc-ing the SCSI and USB storage list.
>
> Folks, does the attached picture look like a sane setup? I've never
> used mini-SaS to eSATA adapter before, let alone with four eSATA to USB
> 3.0 adapters.

Well SAS to eSATA is okay (works for me: LSI SAS9212-4i4e HBA via a SATA to eSATA cable to a SATA disk caddy with an eSATA port). eSATA to USB 3.0 adapters sound pretty dodgy, especially when no mention is made of UAS(P).

Doug Gilbert

> On Tue, Jan 29, 2013 at 12:56:02PM -0200, Fabio David wrote:
>> Hi Sarah,
>>
>> My name is Fabio David and I am from Brazil. I've seen your posts on
>> several forums and read articles about you. I really admire your work.
>>
>> Maybe you can help me. I'm trying to connect a PC running Centos 6.3
>> to a CRU dataport 4-bay storage device. This device only has a miniSaS
>> port.
>>
>> Here is my scenario:
>>
>> - DataCRU device with 4 hot-swapables bays.
>> http://www.cru-inc.com/slideshow.php?dir=//Digital-Cinema//&sel=5
>> - MiniSaS cable connects to the DataCRU device and on the other side
>> there are 4 eSata connectors
>> http://www.elpeus.com/sas-mini-sas/external-mini-sas-cables/sff-8088-to-4-esata/3m-mini-sas-sff-8088-to-4-esata-cable/
>> - 4 eSata<->USB3.0 adaptors connected to each eSata connector
>> - Adaptors connected to a USB3.0 HUB
>> - USB3.0 hub connected to PC
>>
>> Everything works ok, I can mount/read the HDs, but sometimes the
>> system does not detect when a hard drive is inserted/removed from a
>> DataCru bay. No events are generated, nothing appears in
>> /proc/partitions nor udev is called to apply my rules.
>
> Do you lose only hard drive insertion events, or do you lose remove
> events as well?
>
> For example, what happens when you do this:
>
> 1. Unplug the eSATA to USB adapters from the USB 3.0 hub.
> 2. Insert a hard drive into the bay.
> 3. Connect the eSATA to USB adapter to the USB 3.0 hub.
> 4. Wait for hard drive detection, then hot-remove the drive from the
> bay.
>
>> However, everything works fine when connected directly to PC's USB
>> port. Please look at the attached picture.
>
> It looks like you're only attaching one eSATA to USB adapter to the
> roothub. Do you only have one USB 3.0 port on the host, or can you try
> plugging in multiple eSATA to USB adapters into the roothub?
>
> Does the setup work when only one eSATA to USB adapter is plugged into
> the USB 3.0 hub?
>
>> Do you have any suggestions?
>
> A couple possible root causes come to mind:
>
> 1. Perhaps the USB 3.0 hub is interfering with communication to your
> eSATA to USB 3.0 adapters.
>
> 2. Maybe USB device suspend is to blame. Do you have USB device suspend
> enabled for the eSATA to USB adapters?
>
> 3. Perhaps the SATA adapters aren't responding with a Medium Changed
> status when the USB storage device is plugged in.
>
> Can you send me dmesg, starting from just before you insert a hard drive
> into the drive bays? I need dmesg for both when the SATA adapter is
> connected directly to the roothub, and when it's connected to the USB
> 3.0 hub.
>
> A usbmon trace might also be useful for the USB storage developers.
> Documentation on how to take that trace is here:
>
> http://lxr.linux.no/#linux/Documentation/usb/usbmon.txt
>
> Sarah Sharp
>
>> ===
>>
>> lsusb returns
>>
>> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>> Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
>> Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
>> Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
>> Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
>> Bus 001 Device 002: ID 13d3:3323 IMC Networks
>> Bus 001 Device 009: ID 2109:3431 < HUB 3.0
>> Bus 006 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
>> Bus 007 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
>> Bus 007 Device 040: ID 2109:0810 < HUB 3.0
>> Bus 007 Device 041: ID 1234:5678 Brain Actuated Technologies
>> Bus 007 Device 042: ID 1234:5678 Brain Actuated Technologies
>> Bus 007 Device 043: ID 1234:5678 Brain Actuated Technologies
>> Bus 007 Device 044: ID 1234:5678 Brain Actuated Technologies
>>
>> /var/log/messages
>>
>> Jan 27 18:00:28 localhost kernel: usb 7-1: New USB device found, idVendor=2109, idProduct=0810
>> Jan 27 18:00:28 localhost kernel: usb 7-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
>> Jan 27 18:00:28 localhost kernel: usb 7-1: Product: 4-Port USB 3.0 Hub
>> Jan 27 18:00:28 localhost kernel: usb 7-1: Manufacturer: VIA Labs, Inc.
>> Jan 27 18:00:28 localhost kernel: usb 7-1: configuration #1 chosen from 1 choice
>> Jan 27 18:00:28 localhost kernel: hub 7-1:1.0: USB hub found
>> Jan 27 18:00:28 localhost kernel: hub 7-1:1.0: 4 ports detected
>> Jan 28 21:32:02 localhost kernel: usb 7-1.1: new SuperSpeed USB device number 9 using xhci_hcd
>> Jan 28 21:32:56 localhost kernel: xhci_hcd :01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>> Jan 28 21:32:56 localhost kernel: xhci_hcd :01:00.0: xHCI Host Controller
>> Jan 28 21:32:56 localhost kernel: xhci_hcd 000
Re: [usb-storage] Issue with mini-SaS to eSATA to USB 3.0 setup
I highly doubt hot-insert and hot-remove of HDDs from the 4-bay container (without removing the corresponding USB/eSATA adaptor) will work. The USB/eSATA adaptor does not have a way to inform the host that the eSATA side has been disconnected from the HDD. That functionality isn't in the usb-storage protocol. This type of functionality *might* be supported in the UAS protocol, but I don't know. Matt On Thu, Feb 21, 2013 at 11:26 AM, Sarah Sharp wrote: > Cc-ing the SCSI and USB storage list. > > Folks, does the attached picture look like a sane setup? I've never > used mini-SaS to eSATA adapter before, let alone with four eSATA to USB > 3.0 adapters. > > On Tue, Jan 29, 2013 at 12:56:02PM -0200, Fabio David wrote: >> Hi Sarah, >> >> My name is Fabio David and I am from Brazil. I've seen your posts on >> several forums and read articles about you. I really admire your work. >> >> Maybe you can help me. I'm trying to connect a PC running Centos 6.3 >> to a CRU dataport 4-bay storage device. This device only has a miniSaS >> port. >> >> Here is my scenario: >> >> - DataCRU device with 4 hot-swapables bays. >> http://www.cru-inc.com/slideshow.php?dir=//Digital-Cinema//&sel=5 >> - MiniSaS cable connects to the DataCRU device and on the other side >> there are 4 eSata connectors >> >> http://www.elpeus.com/sas-mini-sas/external-mini-sas-cables/sff-8088-to-4-esata/3m-mini-sas-sff-8088-to-4-esata-cable/ >> - 4 eSata<->USB3.0 adaptors connected to each eSata connector >> - Adaptors connected to a USB3.0 HUB >> - USB3.0 hub connected to PC >> >> Everything works ok, I can mount/read the HDs, but sometimes the >> system does not detect when a hard drive is inserted/removed from a >> DataCru bay. No events are generated, nothing appears in >> /proc/partitions nor udev >> is called to apply my rules. > > Do you lose only hard drive insertion events, or do you lose remove > events as well? > > For example, what happens when you do this: > > 1. 
Unplug the eSATA to USB adapters from the USB 3.0 hub. > 2. Insert a hard drive into the bay. > 3. Connect the eSATA to USB adapter to the USB 3.0 hub. > 4. Wait for hard drive detection, then hot-remove the drive from the > bay. > >> However, everything works fine when connected directly to PC's USB >> port. Please look at the attached picture. > > It looks like you're only attaching one eSATA to USB adapter to the > roothub. Do you only have one USB 3.0 port on the host, or can you try > plugging in multiple eSATA to USB adapters into the roothub? > > Does the setup work when only one eSATA to USB adapter is plugged into > the USB 3.0 hub? > >> Do you have any suggestions? > > A couple possible root causes come to mind: > > 1. Perhaps the USB 3.0 hub is interfering with communication to your > eSATA to USB 3.0 adapters. > > 2. Maybe USB device suspend is to blame. Do you have USB device suspend > enabled for the eSATA to USB adapters? > > 3. Perhaps the SATA adapters aren't responding with a Medium Changed > status when the USB storage device is plugged in. > > Can you send me dmesg, starting from just before you insert a hard drive > into the drive bays? I need dmesg for both when the SATA adapter is > connected directly to the roothub, and when it's connected to the USB > 3.0 hub. > > A usbmon trace might also be useful for the USB storage developers. 
> Documentation on how to take that trace is here: > > http://lxr.linux.no/#linux/Documentation/usb/usbmon.txt > > Sarah Sharp > >> === >> >> lsusb returns >> >> Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub >> Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub >> Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub >> Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub >> Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub >> Bus 001 Device 002: ID 13d3:3323 IMC Networks >> Bus 001 Device 009: ID 2109:3431 < HUB 3.0 >> Bus 006 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub >> Bus 007 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub >> Bus 007 Device 040: ID 2109:0810 < HUB 3.0 >> Bus 007 Device 041: ID 1234:5678 Brain Actuated Technologies >> Bus 007 Device 042: ID 1234:5678 Brain Actuated Technologies >> Bus 007 Device 043: ID 1234:5678 Brain Actuated Technologies >> Bus 007 Device 044: ID 1234:5678 Brain Actuated Technologies >> >> /var/log/messages >> >> Jan 27 18:00:28 localhost kernel: usb 7-1: New USB device found, >> idVendor=2109, idProduct=0810 >> Jan 27 18:00:28 localhost kernel: usb 7-1: New USB device strings: >> Mfr=1, Product=2, SerialNumber=0 >> Jan 27 18:00:28 localhost kernel: usb 7-1: Product: 4-Port USB 3.0 Hub >> Jan 27 18:00:28 localhost kernel: usb 7-1: Manufacturer: VIA Labs, Inc. >> Jan 27 18:00:28 localhost kernel: usb 7-1: configuration #1 chosen from 1 >> choice >> Jan 27 18:00:28 localhos
[PATCH RESEND 2/4] scsi: storvsc: avoid usage of WRITE_SAME
From: Olaf Hering

Set scsi_device->no_write_same because the host does not support it. Also blacklist WRITE_SAME to avoid (and log) accidental usage.

If the guest uses the ext4 filesystem, storvsc hangs while it prints these messages in an endless loop:

...
[ 161.459523] hv_storvsc vmbus_0_1: cmd 0x41 scsi status 0x2 srb status 0x6
[ 161.462157] sd 2:0:0:0: [sda]
[ 161.463135] Sense Key : No Sense [current]
[ 161.464983] sd 2:0:0:0: [sda]
[ 161.465899] Add. Sense: No additional sense information
[ 161.468211] hv_storvsc vmbus_0_1: cmd 0x41 scsi status 0x2 srb status 0x6
[ 161.475766] sd 2:0:0:0: [sda]
[ 161.476728] Sense Key : No Sense [current]
[ 161.478284] sd 2:0:0:0: [sda]
[ 161.479441] Add. Sense: No additional sense information
...

This happens with a guest running on Windows Server 2012, but happens to work while running on Windows Server 2008. WRITE_SAME isn't really supported by either version, so disable the command usage globally.

Signed-off-by: Olaf Hering
Cc: KY Srinivasan
Cc:
Signed-off-by: K. Y. Srinivasan
---
 drivers/scsi/storvsc_drv.c |4
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 5ada1d0..2060509 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1156,6 +1156,8 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
 
 	blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
 
+	sdevice->no_write_same = 1;
+
 	return 0;
 }
 
@@ -1238,6 +1240,8 @@ static bool storvsc_scsi_cmd_ok(struct scsi_cmnd *scmnd)
 	u8 scsi_op = scmnd->cmnd[0];
 
 	switch (scsi_op) {
+	/* the host does not handle WRITE_SAME, log accident usage */
+	case WRITE_SAME:
 	/*
 	 * smartd sends this command and the host does not handle
 	 * this. So, don't send it.
--
1.7.4.1
[PATCH 4/4] Drivers: scsi: storvsc: Handle dynamic resizing of the device
Handle LUN size changes by re-scanning the device.

Signed-off-by: K. Y. Srinivasan
Reviewed-by: Haiyang Zhang
---
 drivers/scsi/storvsc_drv.c | 31 +++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 16d5aac..16a3a0c 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -201,6 +201,7 @@ enum storvsc_request_type {
 #define SRB_STATUS_AUTOSENSE_VALID	0x80
 #define SRB_STATUS_INVALID_LUN	0x20
 #define SRB_STATUS_SUCCESS	0x01
+#define SRB_STATUS_ABORTED	0x02
 #define SRB_STATUS_ERROR	0x04
 
 /*
@@ -295,6 +296,25 @@ struct storvsc_scan_work {
 	uint lun;
 };
 
+static void storvsc_device_scan(struct work_struct *work)
+{
+	struct storvsc_scan_work *wrk;
+	uint lun;
+	struct scsi_device *sdev;
+
+	wrk = container_of(work, struct storvsc_scan_work, work);
+	lun = wrk->lun;
+
+	sdev = scsi_device_lookup(wrk->host, 0, 0, lun);
+	if (!sdev)
+		goto done;
+	scsi_rescan_device(&sdev->sdev_gendev);
+	scsi_device_put(sdev);
+
+done:
+	kfree(wrk);
+}
+
 static void storvsc_bus_scan(struct work_struct *work)
 {
 	struct storvsc_scan_work *wrk;
@@ -791,7 +811,18 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb,
 		do_work = true;
 		process_err_fn = storvsc_remove_lun;
 		break;
+	case (SRB_STATUS_ABORTED | SRB_STATUS_AUTOSENSE_VALID):
+		if ((asc == 0x2a) && (ascq == 0x9)) {
+			do_work = true;
+			process_err_fn = storvsc_device_scan;
+			/*
+			 * Retry the I/O that trigerred this.
+			 */
+			set_host_byte(scmnd, DID_REQUEUE);
+		}
+		break;
 	}
+
 	if (!do_work)
 		return;
 
--
1.7.4.1
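The sense data the patch keys on (ASC 0x2a, ASCQ 0x09) is the SPC "CAPACITY DATA HAS CHANGED" additional sense code. As a rough user-space analogue of what storvsc_device_scan() triggers, a rescan can also be requested by writing to the device's sysfs rescan attribute. The Python below is a sketch only; the helper names are invented for illustration, and it is not how the driver itself works.

```python
import os

# ASC/ASCQ pair checked by the patch: CAPACITY DATA HAS CHANGED.
CAPACITY_DATA_HAS_CHANGED = (0x2A, 0x09)


def needs_rescan(asc, ascq):
    """True when the sense data reports that the LUN capacity changed."""
    return (asc, ascq) == CAPACITY_DATA_HAS_CHANGED


def rescan_path(host, channel, target, lun, root="/sys/class/scsi_device"):
    """sysfs attribute that asks the SCSI midlayer to rescan one device."""
    hctl = "%d:%d:%d:%d" % (host, channel, target, lun)
    return os.path.join(root, hctl, "device", "rescan")


def request_rescan(path):
    # Needs root when pointed at the real sysfs attribute.
    with open(path, "w") as fh:
        fh.write("1\n")
```

For the sd 2:0:0:0 device seen in the logs of this series, rescan_path(2, 0, 0, 0) yields /sys/class/scsi_device/2:0:0:0/device/rescan.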
[PATCH RESEND 1/4] Drivers: scsi: storvsc: Initialize the sglist
Initialize sglist before using it.

Signed-off-by: K. Y. Srinivasan
Cc:
---
 drivers/scsi/storvsc_drv.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 270b3cf..5ada1d0 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -467,6 +467,7 @@ static struct scatterlist *create_bounce_buffer(struct scatterlist *sgl,
 	if (!bounce_sgl)
 		return NULL;
 
+	sg_init_table(bounce_sgl, num_pages);
 	for (i = 0; i < num_pages; i++) {
 		page_buf = alloc_page(GFP_ATOMIC);
 		if (!page_buf)
--
1.7.4.1
[PATCH 3/4] Drivers: scsi: storvsc: Restructure error handling code on command completion
In preparation for handling additional sense codes, restructure and cleanup the error handling code in the command completion code path.

Signed-off-by: K. Y. Srinivasan
Reviewed-by: Haiyang Zhang
---
 drivers/scsi/storvsc_drv.c | 101 +--
 1 files changed, 59 insertions(+), 42 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2060509..16d5aac 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -761,6 +761,55 @@ cleanup:
 	return ret;
 }
 
+static void storvsc_handle_error(struct vmscsi_request *vm_srb,
+				struct scsi_cmnd *scmnd,
+				struct Scsi_Host *host,
+				u8 asc, u8 ascq)
+{
+	struct storvsc_scan_work *wrk;
+	void (*process_err_fn)(struct work_struct *work);
+	bool do_work = false;
+
+	switch (vm_srb->srb_status) {
+	case SRB_STATUS_ERROR:
+		/*
+		 * If there is an error; offline the device since all
+		 * error recovery strategies would have already been
+		 * deployed on the host side. However, if the command
+		 * were a pass-through command deal with it appropriately.
+		 */
+		switch (scmnd->cmnd[0]) {
+		case ATA_16:
+		case ATA_12:
+			set_host_byte(scmnd, DID_PASSTHROUGH);
+			break;
+		default:
+			set_host_byte(scmnd, DID_TARGET_FAILURE);
+		}
+		break;
+	case SRB_STATUS_INVALID_LUN:
+		do_work = true;
+		process_err_fn = storvsc_remove_lun;
+		break;
+	}
+	if (!do_work)
+		return;
+
+	/*
+	 * We need to schedule work to process this error; schedule it.
+	 */
+	wrk = kmalloc(sizeof(struct storvsc_scan_work), GFP_ATOMIC);
+	if (!wrk) {
+		set_host_byte(scmnd, DID_TARGET_FAILURE);
+		return;
+	}
+
+	wrk->host = host;
+	wrk->lun = vm_srb->lun;
+	INIT_WORK(&wrk->work, process_err_fn);
+	schedule_work(&wrk->work);
+}
+
 static void storvsc_command_completion(struct storvsc_cmd_request *cmd_request)
 {
@@ -769,8 +818,13 @@ static void storvsc_command_completion(struct storvsc_cmd_request *cmd_request)
 	void (*scsi_done_fn)(struct scsi_cmnd *);
 	struct scsi_sense_hdr sense_hdr;
 	struct vmscsi_request *vm_srb;
-	struct storvsc_scan_work *wrk;
 	struct stor_mem_pools *memp = scmnd->device->hostdata;
+	struct Scsi_Host *host;
+	struct storvsc_device *stor_dev;
+	struct hv_device *dev = host_dev->dev;
+
+	stor_dev = get_in_stor_device(dev);
+	host = stor_dev->host;
 
 	vm_srb = &cmd_request->vstor_packet.vm_srb;
 	if (cmd_request->bounce_sgl_count) {
@@ -783,55 +837,18 @@ static void storvsc_command_completion(struct storvsc_cmd_request *cmd_request)
 					cmd_request->bounce_sgl_count);
 	}
 
-	/*
-	 * If there is an error; offline the device since all
-	 * error recovery strategies would have already been
-	 * deployed on the host side. However, if the command
-	 * were a pass-through command deal with it appropriately.
-	 */
 	scmnd->result = vm_srb->scsi_status;
 
-	if (vm_srb->srb_status == SRB_STATUS_ERROR) {
-		switch (scmnd->cmnd[0]) {
-		case ATA_16:
-		case ATA_12:
-			set_host_byte(scmnd, DID_PASSTHROUGH);
-			break;
-		default:
-			set_host_byte(scmnd, DID_TARGET_FAILURE);
-		}
-	}
-
-
-	/*
-	 * If the LUN is invalid; remove the device.
-	 */
-	if (vm_srb->srb_status == SRB_STATUS_INVALID_LUN) {
-		struct storvsc_device *stor_dev;
-		struct hv_device *dev = host_dev->dev;
-		struct Scsi_Host *host;
-
-		stor_dev = get_in_stor_device(dev);
-		host = stor_dev->host;
-
-		wrk = kmalloc(sizeof(struct storvsc_scan_work),
-				GFP_ATOMIC);
-		if (!wrk) {
-			scmnd->result = DID_TARGET_FAILURE << 16;
-		} else {
-			wrk->host = host;
-			wrk->lun = vm_srb->lun;
-			INIT_WORK(&wrk->work, storvsc_remove_lun);
-			schedule_work(&wrk->work);
-		}
-	}
-
 	if (scmnd->result) {
 		if (scsi_normalize_sense(scmnd->sense_buffer,
 				SCSI_SENSE_BUFFERSIZE, &sense_hdr))
 			scsi_print_sense_hdr("storvsc", &sense_hdr);
 	}
 
+	if (vm_srb->srb_status != SRB_STATUS_SUCCESS)
+
[PATCH 0/4] Drivers: scsi: storvsc
This patch set (two of the patches are being resent) fixes and enhances the functionality of the Hyper-V storage driver.

K. Y. Srinivasan (3):
  Drivers: scsi: storvsc: Initialize the sglist
  Drivers: scsi: storvsc: Restructure error handling code on command completion
  Drivers: scsi: storvsc: Handle dynamic resizing of the device

Olaf Hering (1):
  scsi: storvsc: avoid usage of WRITE_SAME

 drivers/scsi/storvsc_drv.c | 137 ++-
 1 files changed, 95 insertions(+), 42 deletions(-)

--
1.7.4.1
Re: [PATCH 0/4] scsi: 64-bit LUN support
On Thu, 2013-02-21 at 16:15 +, Elliott, Robert (Server Storage) wrote:
> Regarding changes like this:
> - printk(MYIOC_s_NOTE_FMT "[%d:%d:%d:%d] "
> + printk(MYIOC_s_NOTE_FMT "[%d:%d:%d:%llu] "
>   "FCP_ResponseInfo=%08xh\n", ioc->name,
>   sc->device->host->host_no, sc->device->channel,
>   sc->device->id, sc->device->lun,
>
> It might be preferable to print the LUN values in hex rather than
> decimal, particularly if they are large values. SAM-5 includes some
> guidance for displaying LUNs, shown below.

We can't really change from decimal to hex without causing confusion and
possibly breaking ABIs. All the existing SCSI references look like
h:c:t:l and all expect l to be a simple decimal. It's not just in the
logs: we have active use of this form in all the /sys/class/scsi_*/
directories, and some tools may parse this value.

> One important goal is to match the format, if any, that the user must
> use in a configuration file or command line argument, so
> cutting-and-pasting the LUN value works. So, the answer might differ
> for prints from different drivers. If a driver expects decimal input
> values, then print decimal.
>
> SAM-5 excerpt:
> 4.7.2 Logical unit representation format
[...]

We're a bit bound by kernel convention here as well. To retain
compatibility with SPI and flat addressing schemes, we really need to
show the 8 and 16 bit flat addresses as simple decimal numerics.
However, we *might* be free to move to a more hierarchical scheme with
the multi-level LUNs, since I don't think there are too many people
who've got arrays that output them (yet).

James
RE: [PATCH 0/4] scsi: 64-bit LUN support
Regarding changes like this:
- printk(MYIOC_s_NOTE_FMT "[%d:%d:%d:%d] "
+ printk(MYIOC_s_NOTE_FMT "[%d:%d:%d:%llu] "
  "FCP_ResponseInfo=%08xh\n", ioc->name,
  sc->device->host->host_no, sc->device->channel,
  sc->device->id, sc->device->lun,

It might be preferable to print the LUN values in hex rather than
decimal, particularly if they are large values. SAM-5 includes some
guidance for displaying LUNs, shown below.

One important goal is to match the format, if any, that the user must
use in a configuration file or command line argument, so
cutting-and-pasting the LUN value works. So, the answer might differ
for prints from different drivers. If a driver expects decimal input
values, then print decimal.

SAM-5 excerpt:
4.7.2 Logical unit representation format

When an application client displays or otherwise makes a 64-bit LUN
value visible, the application client should display it in hexadecimal
format with byte 0 first (i.e., on the left) and byte 7 last (i.e., on
the right), regardless of the internal representation of the LUN value
(e.g., a single level LUN with an ADDRESS METHOD field set to 01b (i.e.,
flat space addressing) and a FLAT SPACE LUN field set to 0001h should be
displayed as 40 01 00 00 00 00 00 00h, not 00 00 00 00 00 00 01 40h). A
separator (e.g., space, dash, or colon) may be included between each
byte, each two bytes (e.g., 4001---h), or each four bytes (e.g.,
4001 h).

[The trailing h is just the T10 documentation convention... a 0x prefix
is fine too]

[The next three paragraphs allow stripping off unnecessary trailing
zeros:]

When displaying a single level LUN structure using the peripheral device
addressing method (see table 11) or a single level LUN structure using
the flat space addressing method (see table 12), an application client
may display the value as a single 2-byte value representing only the
first level LUN (e.g., 40 01h). A separator (e.g., space, dash, or
colon) may be included between each byte.
When displaying a single level LUN structure using the extended flat
space addressing method (see table 13), an application client may
display the value as a single 4-byte value representing only the first
level LUN (e.g., D2 00 00 01h). A separator (e.g., space, dash, or
colon) may be included between each byte, or between each two bytes
(e.g., D200 0001h).

When displaying a single level LUN structure using the long extended
flat space addressing method (see table 14), an application client may
display the value as a single 6-byte value representing only the first
level LUN (e.g., E2 00 00 01 00 01h). A separator (e.g., space, dash, or
colon) may be included between each byte, or between each two bytes
(e.g., E200 0001 0001h).

When displaying a 16-bit LUN value, an application client should display
the value as a single 2-byte value (e.g., 40 01h). A separator (e.g.,
space, dash, or colon) may be included between each byte.

> -Original Message-
> From: Hannes Reinecke [mailto:h...@suse.de]
> Sent: Tuesday, 19 February, 2013 2:18 AM
> To: linux-scsi@vger.kernel.org
> Cc: James Bottomley; Jeremy Linton; Elliott, Robert (Server Storage); Bart Van
> Assche; Hannes Reinecke
> Subject: [PATCH 0/4] scsi: 64-bit LUN support
>
> This patchset updates the SCSI midlayer to use 64-bit LUNs internally.
> It eliminates the need to limit the number of LUNs artificially to
> avoid aliasing issues; the SCSI midlayer can now accept any LUN presented
> to it.
>
> The LLDD specific settings for 'max_lun' have been left untouched;
> it should be raised to '~0' if the HBA supports 64-bit LUNs internally.
> However, it is up to the driver maintainer to raise that limit.
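
The display rule in the SAM-5 excerpt (hexadecimal, byte 0 on the left, optional truncation to the first-level LUN) can be sketched in userspace C. This is only an illustration, not code from the patch set: the helper name lun_to_hex() and its buffer convention are hypothetical, and the caller is assumed to already know how many bytes (2, 4, 6 or 8) the addressing method allows it to show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the first nbytes of an 8-byte LUN, byte 0
 * first, as space-separated hex with the T10-style trailing 'h'.
 * The output needs 3 * nbytes characters plus the terminating NUL.
 * Returns the number of characters written (excluding the NUL).
 */
static size_t lun_to_hex(const uint8_t lun[8], size_t nbytes,
                         char *out, size_t outlen)
{
    size_t pos = 0;

    assert(nbytes >= 1 && nbytes <= 8);
    assert(outlen >= 3 * nbytes + 1);

    for (size_t i = 0; i < nbytes; i++)
        pos += (size_t)snprintf(out + pos, outlen - pos,
                                i ? " %02X" : "%02X", lun[i]);

    /* Append the trailing 'h'; a "0x" prefix would be an equally valid choice. */
    pos += (size_t)snprintf(out + pos, outlen - pos, "h");
    return pos;
}
```

Called with the flat space LUN from the excerpt (bytes 40 01 followed by six zero bytes), nbytes = 8 gives the full form and nbytes = 2 gives the short 2-byte form.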
[PATCH] [SCSI]: megaraid: avoid sleeping on spinlock
GFP_KERNEL may cause pci_pool_alloc() to sleep while the pool spinlock
is held, so we need to use GFP_ATOMIC instead of GFP_KERNEL.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Denis Efremov
---
 drivers/scsi/megaraid/megaraid_mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_mm.c b/drivers/scsi/megaraid/megaraid_mm.c
index 25506c7..4b2f336 100644
--- a/drivers/scsi/megaraid/megaraid_mm.c
+++ b/drivers/scsi/megaraid/megaraid_mm.c
@@ -568,7 +568,7 @@ mraid_mm_attach_buf(mraid_mmadp_t *adp, uioc_t *kioc, int xferlen)
  kioc->pool_index= right_pool;
  kioc->free_buf = 1;
- kioc->buf_vaddr = pci_pool_alloc(pool->handle, GFP_KERNEL,
+ kioc->buf_vaddr = pci_pool_alloc(pool->handle, GFP_ATOMIC,
  &kioc->buf_paddr);
  spin_unlock_irqrestore(&pool->lock, flags);
--
1.8.1.2
Re: Read I/O starvation with writeback RAID controller
I'm sorry, I forgot to mention the hardware details. It isn't aacraid;
it is a megaraid-based Dell PERC H700 w/ 1GB NVRAM and 12x 450GB 15k SAS
drives in RAID-10. All in a Dell R510 server.

Thanks,
Martin

On 20.2.2013 21:48, Nicholas A. Bellinger wrote:
> Hi Martin,
>
> CC'ing linux-scsi here, as aacraid doesn't have an official maintainer
> atm.
>
> --nab
>
> On Wed, 2013-02-20 at 16:38 +0100, Martin Svec wrote:
>> Hello,
>>
>> I've noticed read I/O starvation problems of LIO iSCSI target when
>> used on top of writeback-enabled HW RAID controller (PERC H700 with
>> 1GB cache). For intensive mixed read-write workload in virtualized
>> environments, writes are able to consume over 95% of the IOPS
>> throughput and cause starvation of reads.
>>
>> After a number of tests it seems to me it's a general issue of block
>> layer I/O scheduling when running on top of a writeback device. If
>> there is a write-intensive task, all writes go to the writeback cache
>> with near-zero latency. This allows writer to quickly saturate the
>> device with thousands of writes while using only a minimal fraction of
>> queue depth. However, non-cached reads depend on spinning drive
>> latencies which are orders of magnitude higher than writeback cache
>> latencies, and so readers cannot submit so many requests per second as
>> writers. Consequently, I guess the controller has totally wrong view
>> of the incoming workload pattern, tries to satisfy the write flood
>> first and the net result is unacceptable starvation of reads, with
>> latencies up to hundreds of milliseconds.
>>
>> A simple fio test with 1TiB block device where one thread does 4k
>> random sync writes with iodepth=32 and one thread does 4k random reads
>> with iodepth=32 shows that instead of the theoretical 50:50 IOPS
>> ratio, the block device runs with 95:5 ratio in favor of writes. In
>> fact, the imbalance is so high that even write iodepth=2 is enough to
>> achieve the same numbers.
>>
>> Real workloads that tend to exhibit this problem are: initial zeroing
>> of a virtual machine disk, virtual machine migration, virtual machine
>> cloning, intensive swapping of one virtual machine etc.
>>
>> I tried to set WCE=1 on target iblock device, played with queue
>> depths, tested all three I/O schedulers and their parameters,
>> controller's parameters, but with no luck. To achieve reasonably good
>> fairness, the only solution is to set nr_requests to 1 or disable
>> controller's writeback cache at all -- at the expense of degraded
>> overall performance :-(
>>
>> Regarding nr_requests, there's obvious relation between iodepths and
>> read starvation: if (nr_requests >= workload iodepth) then starvation
>> surely occurs. Lowering nr_requests below this threshold slowly starts
>> improving fairness and for every rd+wr iodepths pair, there exists
>> sufficiently low nr_requests value at which IOPS ratio is finally
>> balanced according to rd:wr iodepth ratio. Unfortunately it means
>> there is no minimal nr_requests value suitable for all workloads. For
>> iodepths around 2 to 8, only nr_requests=1 provides fair load balancing.
>>
>> Is this a known problem? Does anybody find block layer parameters that
>> eliminate this problem for iscsi-target storage in mixed random
>> read-write environments like virtualization? Or should I start writing
>> my own I/O scheduler? ;-)
>>
>> Update: I've just found https://lkml.org/lkml/2012/12/10/550 (Read
>> starvation by sync writes), where Jan Kara describes identical
>> symptoms. But setting nr_requests=1 doesn't help in my case.
>> CC'ing LKML too (I'm not LKML subscriber).
>>
>> Thanks,
>>
>> Martin
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe target-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
[PATCH v2] [SCSI] aacraid: suppress two GCC warnings
Building src.o for a 32 bit system triggers two GCC warnings:

    drivers/scsi/aacraid/src.c: In function ‘aac_src_deliver_message’:
    drivers/scsi/aacraid/src.c:410:3: warning: right shift count >= width of type [enabled by default]
    drivers/scsi/aacraid/src.c:434:2: warning: right shift count >= width of type [enabled by default]

These warnings are caused by a right shift of 32. Use upper_32_bits()
to suppress them.

Signed-off-by: Paul Bolle
---
0) Instead of a cast to u64, this version uses upper_32_bits() as James
suggested. I also stopped changing 0L to 0UL, because I keep having
doubts about the cargo cult.

1) Still compile tested only, but now on v3.8.

 drivers/scsi/aacraid/src.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 3b021ec..e2e3492 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -407,7 +407,7 @@ static int aac_src_deliver_message(struct fib *fib)
   fib->hw_fib_va->header.StructType = FIB_MAGIC2;
   fib->hw_fib_va->header.SenderFibAddress = (u32)address;
   fib->hw_fib_va->header.u.TimeStamp = 0;
-  BUG_ON((u32)(address >> 32) != 0L);
+  BUG_ON(upper_32_bits(address) != 0L);
   address |= fibsize;
  } else {
   /* Calculate the amount to the fibsize bits */
@@ -431,7 +431,7 @@ static int aac_src_deliver_message(struct fib *fib)
   address |= fibsize;
  }
- src_writel(dev, MUnit.IQ_H, (address >> 32) & 0x);
+ src_writel(dev, MUnit.IQ_H, upper_32_bits(address) & 0x);
  src_writel(dev, MUnit.IQ_L, address & 0x);
  return 0;
--
1.8.1.2
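
For readers unfamiliar with the idiom, here is a small userspace sketch of why upper_32_bits() avoids the warning. The macro body mirrors the kernel's definition (two 16-bit shifts instead of one 32-bit shift); the type dma_addr32_t and the function high_word() are hypothetical stand-ins for a 32-bit dma_addr_t and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two shifts by 16 never shift by the full width of a 32-bit operand,
 * so the expression is well defined (and warning-free) for both 32-bit
 * and 64-bit address types.
 */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))

/* Hypothetical stand-in for dma_addr_t on a 32-bit build. */
typedef uint32_t dma_addr32_t;

/* High half of a 32-bit address: always 0, with no ">> 32" involved. */
static uint32_t high_word(dma_addr32_t address)
{
    uint32_t hi = upper_32_bits(address);

    assert(hi == 0); /* a 32-bit value has no upper half */
    return hi;
}
```

With a 64-bit operand the same macro yields the real upper half, so one expression serves both configurations.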
Re: [PATCH] SCSI: amd_iommu dma_boundary overflow
Hi Eddie,

> On Tue, 2013-02-19 at 18:30 -0800, Eddie Wai wrote:
> > The code seems correct as it makes sense to impose the same hardware
> > segment boundary limit on both the blk queue and the DMA code. It would
> > be an easy alternative to simply prevent the shost->dma_boundary from
> > being set to DMA_BIT_MASK(64), but it seems more correct to fix the
> > amd_iommu code itself to detect and handle this max 64-bit mask condition.

Thanks for tracking this problem down. It turns out that this code
exists not only in the AMD IOMMU driver but also in other ones (Calgary
and GART at least; I haven't checked all).

> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -1526,11 +1526,14 @@ static unsigned long dma_ops_area_alloc(struct
> > device *dev,
> > unsigned long boundary_size;
> > unsigned long address = -1;
> > unsigned long limit;
> > + unsigned long mask;
> >
> > next_bit >>= PAGE_SHIFT;
> >
> > - boundary_size = ALIGN(dma_get_seg_boundary(dev) + 1,
> > - PAGE_SIZE) >> PAGE_SHIFT;

Given that there is a BUG_ON() in the iommu-helpers which checks for
!is_power_of_2(boundary_size), I think we can simplify this expression
and avoid the overflow in a more clever way:

  boundary_size = (dma_get_seg_boundary(dev) >> PAGE_SHIFT) + 1;

This should work because dma_get_seg_boundary(dev) really needs to be a
bitmask which becomes a power_of_2 on incrementing.

Regards,
Joerg
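
The arithmetic behind Joerg's suggestion can be checked with a short userspace sketch. It assumes a 64-bit unsigned long (modelled here as uint64_t) and 4 KiB pages; the function names boundary_pages_old() and boundary_pages_new() and the macro ALIGN_UP are illustrative, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assumption: 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Existing form: "+ 1" wraps to 0 when the mask is all ones. */
static uint64_t boundary_pages_old(uint64_t seg_boundary)
{
    return ALIGN_UP(seg_boundary + 1, PAGE_SIZE) >> PAGE_SHIFT;
}

/* Suggested form: shift first, then add 1, so nothing can wrap. */
static uint64_t boundary_pages_new(uint64_t seg_boundary)
{
    /* The boundary must be a bitmask, i.e. mask + 1 is a power of two (or wraps to 0). */
    assert((seg_boundary & (seg_boundary + 1)) == 0);
    return (seg_boundary >> PAGE_SHIFT) + 1;
}
```

For a 32-bit mask both forms give 2^20 pages. For an all-ones 64-bit mask the old form yields 0, which is not a power of two, while the new form yields 2^52.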
Re: [PATCH v10 0/4] block layer runtime pm
On Wed, Feb 20, 2013 at 10:43:50AM -0500, Alan Stern wrote:
> On Wed, 20 Feb 2013, Aaron Lu wrote:
>
> > In August 2010, Jens and Alan discussed "Runtime PM and the block
> > layer". http://marc.info/?t=12825910841&r=1&w=2
> > And then Alan gave a detailed implementation guide:
> > http://marc.info/?l=linux-scsi&m=133727953625963&w=2
>
> > v10:
> > - Add link of Alan Stern's ideas on block layer runtime PM to patch 2
> > and 3's changelog;
> > - Add back code to schedule device suspend if the scsi driver returns -EBUSY.
>
> This all looks okay now. You can add
>
> Acked-by: Alan Stern
>
> to each of the patches.

Great, thanks a lot for your kind help.

Hi James,
Can I have your ack for patches 1 and 4?

And Jens,
Do you have any comments on this series?

Thanks,
Aaron
[PATCH] block: modify __bio_add_page check to accept pages that don't start a new segment
The original behavior was to refuse all pages after the maximum number
of segments has been reached. However, some drivers (like st) craft
their buffers to potentially require exactly max segments and multiple
pages in the last segment. This patch modifies the check to allow pages
that can be merged into the last segment.

This change fixes EBUSY failures when using a large (1 MB) tape block
size under high memory fragmentation.

Signed-off-by: Jan Vesely
---
 fs/bio.c | 26 --
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index b96fc6c..02efbd5 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -500,7 +500,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
    *page, unsigned int len, unsigned int offset,
    unsigned short max_sectors)
 {
- int retried_segments = 0;
  struct bio_vec *bvec;

  /*
@@ -551,18 +550,12 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
   return 0;

  /*
-  * we might lose a segment or two here, but rather that than
-  * make this too complex.
+  * prepare segment count check, reduce segment count if possible
   */
- while (bio->bi_phys_segments >= queue_max_segments(q)) {
-
-  if (retried_segments)
-   return 0;
-
-  retried_segments = 1;
+ if (bio->bi_phys_segments >= queue_max_segments(q))
   blk_recount_segments(q, bio);
- }
+

  /*
   * setup the new entry, we might clear it again later if we
@@ -572,6 +565,19 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
  bvec->bv_page = page;
  bvec->bv_len = len;
  bvec->bv_offset = offset;
+
+ /*
+  * the other part of the segment count check, allow mergeable pages
+  */
+ if ((bio->bi_phys_segments > queue_max_segments(q)) ||
+     ( (bio->bi_phys_segments == queue_max_segments(q)) &&
+     !BIOVEC_PHYS_MERGEABLE(bvec - 1, bvec))) {
+  bvec->bv_page = NULL;
+  bvec->bv_len = 0;
+  bvec->bv_offset = 0;
+  return 0;
+ }
+

  /*
   * if queue has other restrictions (eg varying max sector size
--
1.7.1
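
The decision the patch introduces can be reduced to a small truth table. The sketch below is a simplified userspace model, not the kernel code: may_add_page() and may_add_page_old() are hypothetical names, the segment recount step is left out, and the `mergeable` flag stands in for the BIOVEC_PHYS_MERGEABLE() test on the last two bio_vec entries.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of the old check: once the segment limit is reached
 * (and recounting did not help), every further page is refused.
 */
static bool may_add_page_old(unsigned int segs, unsigned int max_segs)
{
    return segs < max_segs;
}

/*
 * Simplified model of the patched check: at the limit, a page is still
 * accepted if it merges into the last segment and therefore does not
 * start a new one.
 */
static bool may_add_page(unsigned int segs, unsigned int max_segs,
                         bool mergeable)
{
    assert(max_segs > 0);

    if (segs > max_segs)
        return false;
    if (segs == max_segs && !mergeable)
        return false;
    return true;
}
```

The only case where the two models differ is segs == max_segs with a mergeable page, which is the situation the st driver runs into with large block sizes.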