Re: [MINI SUMMIT] SCSI core performance

2012-07-18 Thread James Bottomley
On Tue, 2012-07-17 at 19:39 -0700, Nicholas A. Bellinger wrote:
 Hi KS-PCs,
 
 I'd like to propose a SCSI performance mini-summit to see how interested
 folks are in helping address the long-term issues that SCSI core is
 currently facing wrt to multi-lun per host and heavy small block random
 I/O workloads.
 
 I know this would probably be better suited for LSF (for the record it
 was proposed this year) but now that we've acknowledge there is a
 problem with SCSI LLDs vs. raw block drivers vs. other SCSI subsystems,
 it would be useful to get the storage folks into a single room at some
 point during KS/LPC to figure out what is actually going on with SCSI
 core.

You seem to have a short memory:  The last time it was discussed

http://marc.info/?t=13415537393

It rapidly became apparent there isn't a problem.  Enabling high IOPS in
the SCSI stack is what I think you mean.

 As mentioned in the recent tcm_vhost thread, there are a number of cases
 where drivers/target/ code can demonstrate this limitation pretty
 vividly now.
 
 This includes the following scenarios using raw block flash export with
 target_core_mod + target_core_iblock export and the same small block
 (4k) mixed random I/O workload with fio:
 
 *) tcm_loop local SCSI LLD performance is an order of magnitude slower 
than the same local raw block flash backend.
 *) tcm_qla2xxx performs better using MSFT Server hosts than Linux v3.x
based hosts using 2x socket Nehalem hardware w/ PCI-e Gen2 HBAs
 *) ib_srpt performs better using MSFT Server host than RHEL 6.x .32 
based hosts using 2x socket Romley hardware w/ PCI-e Gen3 HCAs
 *) Raw block IBLOCK export into KVM guest v3.5-rc w/ virtio-scsi is 
behind in performance vs. raw local block flash.  (cmwq on the host 
is helping here, but still need to with MSFT SCSI mini-port)
 
 Also, with 1M IOPs into a single VM guest now being done by other non
 Linux based hypervisors, the virtualization bit for high performance KVM
 SCSI based storage is quickly coming on..
 
 So all of that said, I'd like to at least have a discussion with the key
 SCSI + block folks who will be present in San Diego on path forward to
 address these without having to wait until LSF-2013 + hope for a topic
 slot to materialize then.
 
 Thank you for your consideration,

Well, your proposal is devoid of an actual proposal.

Enabling high IOPS involves reducing locking overhead and path length
through the code.  I think most of the low hanging fruit in this area is
already picked, but if you have an idea, please say.  There might be
something we can extract from the lockless queue work Jens is doing, but
we need that to materialise first.

Without a concrete thing to discuss, shooting the breeze on high IOPS in
the SCSI stack is about as useful as discussing what happened in last
night's episode of Coronation Street which, when it happens in my house,
always helps me see how incredibly urgent fixing the leaky tap I've been
putting off for months actually is.

If someone can come up with a proposal ... or even perhaps another path
trace showing where the reducible overhead and lock problems are we can
discuss it on the list and we might have a real topic by the time LSF
rolls around.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target: Allow for target_submit_cmd() returning errors

2012-07-18 Thread Sebastian Andrzej Siewior

On 07/18/2012 02:05 AM, Nicholas A. Bellinger wrote:

index 02ace18..9ddf315 100644
--- a/drivers/usb/gadget/tcm_usb_gadget.c
+++ b/drivers/usb/gadget/tcm_usb_gadget.c
@@ -1060,7 +1060,10 @@ static void usbg_cmd_work(struct work_struct *work)
tpg = cmd-fu-tpg;
tv_nexus = tpg-tpg_nexus;
dir = get_cmd_dir(cmd-cmd_buf);
-   if (dir  0) {
+   if (dir  0 ||
+   target_submit_cmd(se_cmd, tv_nexus-tvn_se_sess,
+ cmd-cmd_buf, cmd-sense_iu.sense, 
cmd-unpacked_lun,
+ 0, cmd-prio_attr, dir, TARGET_SCF_UNKNOWN_SIZE)) 
{
transport_init_se_cmd(se_cmd,
tv_nexus-tvn_se_sess-se_tpg-se_tpg_tfo,
tv_nexus-tvn_se_sess, cmd-data_len, DMA_NONE,
@@ -1069,12 +1072,7 @@ static void usbg_cmd_work(struct work_struct *work)
transport_send_check_condition_and_sense(se_cmd,
TCM_UNSUPPORTED_SCSI_OPCODE, 1);
usbg_cleanup_cmd(cmd);
-   return;
}
-
-   target_submit_cmd(se_cmd, tv_nexus-tvn_se_sess,
-   cmd-cmd_buf, cmd-sense_iu.sense, cmd-unpacked_lun,
-   0, cmd-prio_attr, dir, TARGET_SCF_UNKNOWN_SIZE);
  }

  static int usbg_submit_command(struct f_uas *fu,


Looking at this part again target_submit_cmd() has been moved ahead of
target_init_se_cmd() in the exception path, which ends up getting called
twice during a target_get_sess_cmd() failure..

It looks like usbg_cmd_work() + bot_cmd_work() will need some work if
they intends to use a non zero failure to signal exception here, without
invoking a CHECK_CONDITION and sense.  Actually, I'm not even sure
sending a CHECK_CONDITION here after the target_submit_cmd() is going to
work for other fabrics drivers, but it appears to work with usb-gadget.
(Sebastian..?)


For UAS the status/ sense URB is sent right away (without any data) and
this worked the last time I tested.
For BOT things are a little complicated. What I do is to send empty
data (to fill the data pipe) and send a bad status so the other side
learns that the transfer failed somehow. The sense information is lost.
What I should do is to queue this sense data for the next MODE_SENSE
request which will be coming soon. Right now WinXP repeats a failed
command thrice if MODE_SENSE returns no error. This is on my to-fix
list…


So for the moment, lets fix the double se_cmd init and make things a
little easier to read..   Sebastian, please have a look and double check
that the (dir  0) + target_submit_cmd() failures cases are both going
to work as expected whilst sending a CHECK_CONDITION response.


The sense code should only be sent once and
transport_send_check_condition_and_sense() sets
SCF_SENT_CHECK_CONDITION and checks for it so doing it twice should not
do any harm right? But grep does not say when this flag gets removed.



Thanks!

--nab


Sebastian
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 0/5] Add vhost-blk support

2012-07-18 Thread Stefan Hajnoczi
On Tue, Jul 17, 2012 at 4:09 PM, Michael S. Tsirkin m...@redhat.com wrote:
 On Fri, Jul 13, 2012 at 04:55:06PM +0800, Asias He wrote:

 Hi folks,

 [I am resending to fix the broken thread in the previous one.]

 This patchset adds vhost-blk support. vhost-blk is a in kernel virito-blk
 device accelerator. Compared to userspace virtio-blk implementation, 
 vhost-blk
 gives about 5% to 15% performance improvement.

 Same thing as tcm_host comment:

 It seems not 100% clear whether this driver will have major
 userspace using it. And if not, it would be very hard to support a
 driver when recent userspace does not use it in the end.

 I think a good idea for 3.6 would be to make it depend on
 CONFIG_STAGING.  Then we don't commit to an ABI.  For this, you can 
 add
 a separate Kconfig and source it from drivers/staging/Kconfig.  Maybe 
 it
 needs to be in a separate directory drivers/vhost/staging/Kconfig.

 I Cc'd the list of tcm_host in the hope that you can cooperate on this.


Adding it to staging allows more people to try it out, so that's a
good thing.  If I get a moment to play with it I'll let you know the
results.

Stefan
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 12:59:28AM +, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org
 
 Hi folks,
 
 The following is the RFC-v3 series of tcm_vhost target fabric driver code
 currently in-flight for-3.6 mainline code.
 
 With the merge window opening soon, the tcm_vhost code has started seeing
 time in linux-next.  The v2 - v3 changelog from the last week is currently
 looking like:
 
  *) Unlock on error in tcm_vhost_drop_nexus() (DanC)
  *) Fix strlen() doesn't count the terminator (DanC)
  *) Call kfree() on an error path (DanC)
  *) Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
  *) Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
  *) Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/
 as requested by MST (nab)

I actually only wanted to have a separate Kconfig (in a separate
directory or Kconfig.tcm), so you do not need to move code back to move
driver out of staging.  But if you prefer this, I'm fine with it too for
now.

 Thanks to Dan Carpenter for his smatch fixes this past round.
 
 Also as requested by MST, the code has been moved to a seperate tcm/ 
 subdirectory
 under drivers/vhost/ so that it can be included under staging's config options
 until we can settle on the necessary userspace bits for QEMU and kvm-tool.
 
 The updated series will be going out shortly to target-pending/for-next-merge.
 
 Please have another look and let us know if you have any concerned.
 
 Thanks!
 
 Nicholas Bellinger (2):
   vhost: Add vhost_scsi specific defines
   tcm_vhost: Initial merge for vhost level target fabric driver
 
 Stefan Hajnoczi (2):
   vhost: Separate vhost-net features from vhost features
   vhost: make vhost work queue visible
 
  drivers/staging/Kconfig   |2 +
  drivers/vhost/Makefile|2 +
  drivers/vhost/net.c   |4 +-
  drivers/vhost/tcm/Kconfig |6 +
  drivers/vhost/tcm/Makefile|1 +
  drivers/vhost/tcm/tcm_vhost.c | 1611 
 +
  drivers/vhost/tcm/tcm_vhost.h |   74 ++
  drivers/vhost/test.c  |4 +-
  drivers/vhost/vhost.c |5 +-
  drivers/vhost/vhost.h |6 +-
  include/linux/vhost.h |9 +
  11 files changed, 1716 insertions(+), 8 deletions(-)
  create mode 100644 drivers/vhost/tcm/Kconfig
  create mode 100644 drivers/vhost/tcm/Makefile
  create mode 100644 drivers/vhost/tcm/tcm_vhost.c
  create mode 100644 drivers/vhost/tcm/tcm_vhost.h
 
 -- 
 1.7.2.5
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 3/4] vhost: Add vhost_scsi specific defines

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 12:59:31AM +, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@risingtidesystems.com
 
 This patch adds the initial vhost_scsi_ioctl() callers for 
 VHOST_SCSI_SET_ENDPOINT
 and VHOST_SCSI_CLEAR_ENDPOINT respectively, and also adds struct 
 vhost_vring_target
 that is used by tcm_vhost code when locating target ports during qemu setup.
 
 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Cc: Zhi Yong Wu wu...@cn.ibm.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Paolo Bonzini pbonz...@redhat.com,
 Signed-off-by: Nicholas A. Bellinger n...@risingtidesystems.com
 ---
  include/linux/vhost.h |9 +
  1 files changed, 9 insertions(+), 0 deletions(-)
 
 diff --git a/include/linux/vhost.h b/include/linux/vhost.h
 index e847f1e..33b313b 100644
 --- a/include/linux/vhost.h
 +++ b/include/linux/vhost.h
 @@ -24,7 +24,11 @@ struct vhost_vring_state {
  struct vhost_vring_file {
   unsigned int index;
   int fd; /* Pass -1 to unbind from file. */
 +};
  
 +struct vhost_vring_target {

Can this be renamed vhost_scsi_target?

 + unsigned char vhost_wwpn[224];

224? I am guessing ISCSI_NAME_LEN from include/scsi/iscsi_proto.h?
Unfortunately we can't include iscsi_proto.h here as it
is not exported to users. But let's add a comment for now.

 + unsigned short vhost_tpgt;
  };
  
  struct vhost_vring_addr {
 @@ -121,6 +125,11 @@ struct vhost_memory {
   * device.  This can be used to stop the ring (e.g. for migration). */
  #define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct 
 vhost_vring_file)
  
 +/* VHOST_SCSI specific defines */
 +
 +#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct 
 vhost_vring_target)
 +#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct 
 vhost_vring_target)
 +
  /* Feature bits */
  /* Log all write descriptors. Can be changed while device is active. */

Can these go into appropriate ifdef CONFIG_TCP_VHOST please?

  #define VHOST_F_LOG_ALL 26
 -- 
 1.7.2.5
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Anthony Liguori

On 07/17/2012 04:50 PM, Nicholas A. Bellinger wrote:

On Tue, 2012-07-17 at 13:55 -0500, Anthony Liguori wrote:

On 07/17/2012 10:05 AM, Michael S. Tsirkin wrote:

On Wed, Jul 11, 2012 at 09:15:00PM +, Nicholas A. Bellinger wrote:


SNIP



It still seems not 100% clear whether this driver will have major
userspace using it. And if not, it would be very hard to support a driver
when recent userspace does not use it in the end.


I don't think this is a good reason to exclude something from the kernel.
However, there are good reasons why this doesn't make sense for something like
QEMU--specifically because we have a large number of features in our block layer
that tcm_vhost would bypass.



I can definitely appreciate your concern here as the QEMU maintainer.


But perhaps it makes sense for something like native kvm tool.  And if it did go
into the kernel, we would certainly support it in QEMU.



...


But I do think the kernel should carefully consider whether it wants to support
an interface like this.  This an extremely complicated ABI with a lot of subtle
details around state and compatibility.

Are you absolutely confident that you can support a userspace application that
expects to get exactly the same response from all possible commands in 20 kernel
versions from now?  Virtualization requires absolutely precise compatibility in
terms of bugs and features.  This is probably not something the TCM stack has
had to consider yet.



We most certainly have thought about long term userspace compatibility
with TCM.  Our userspace code (that's now available in all major
distros) is completely forward-compatible with new fabric modules such
as tcm_vhost.  No update required.


I'm not sure we're talking about the same thing when we say compatibility.

I'm not talking about the API.  I'm talking about the behavior of the commands 
that tcm_vhost supports.


If you add support for a new command, you need to provide userspace a way to 
disable this command.  If you change what gets reported for VPD, you need to 
provide userspace a way to make VPD look like what it did in a previous version.


Basically, you need to be able to make a TCM device behave 100% the same as it 
did in an older version of the kernel.


This is unique to virtualization due to live migration.  If you migrate from a 
3.6 kernel to a 3.8 kernel, you need to make sure that the 3.8 kernel's TCM 
device behaves exactly like the 3.6 kernel because the guest that is interacting 
with it does not realize that live migration happened.


Yes, you can add knobs via configfs to control this behavior, but I think the 
question is, what's the plan for this?


BTW, I think this is a good thing to cover in Documentation/vhost/tcm_vhost.txt. 
 I think that's probably the only change that's needed here.


Regards,

Anthony Liguori



Also, by virtue of the fact that we are using configfs + rtslib (python
object library) on top, it's very easy to keep any type of compatibility
logic around in python code.  With rtslib, we are able to hide configfs
ABI changes from higher level apps.

So far we've had a track record of 100% userspace ABI compatibility in
mainline since .38, and I don't intend to merge a patch that breaks this
any time soon.  But if that ever happens, apps using rtslib are not
going to be effected.


I think a good idea for 3.6 would be to make it depend on CONFIG_STAGING.
Then we don't commit to an ABI.


I think this is a good idea.  Even if it goes in, a really clear policy would be
needed wrt the userspace ABI.

While tcm_vhost is probably more useful than vhost_blk, it's a much more complex
ABI to maintain.



As far as I am concerned, the kernel API (eg: configfs directory layout)
as it is now in sys/kernel/config/target/vhost/ is not going to change.
It's based on the same drivers/target/target_core_fabric_configfs.c
generic layout that we've had since .38.

The basic functional fabric layout in configfs is identical (with fabric
dependent WWPN naming of course) regardless of fabric driver, and by
virtue of being generic it means we can add things like fabric dependent
attributes + parameters in the future for existing fabrics without
breaking userspace.

So while I agree the ABI is more complex than vhost-blk, the logic in
target_core_fabric_configfs.c is a basic ABI fabric definition that we
are enforcing across all fabric modules in mainline for long term
compatibility.

--nab



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Paolo Bonzini
Il 18/07/2012 15:42, Anthony Liguori ha scritto:
 If you add support for a new command, you need to provide userspace a
 way to disable this command.  If you change what gets reported for VPD,
 you need to provide userspace a way to make VPD look like what it did in
 a previous version.

The QEMU target is not enforcing this to this level.  We didn't for
CD-ROM ATAPI, and we're not doing it for SCSI.

It may indeed be useful for changes to VPD pages or major features.
However, so far we've never introduced any feature that deserved it.
This is also because OSes typically don't care: they use a small subset
of the features and all the remaining decorations are only needed to
be pedantically compliant to the spec.

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 08:42:21AM -0500, Anthony Liguori wrote:
 On 07/17/2012 04:50 PM, Nicholas A. Bellinger wrote:
 On Tue, 2012-07-17 at 13:55 -0500, Anthony Liguori wrote:
 On 07/17/2012 10:05 AM, Michael S. Tsirkin wrote:
 On Wed, Jul 11, 2012 at 09:15:00PM +, Nicholas A. Bellinger wrote:
 
 SNIP
 
 
 It still seems not 100% clear whether this driver will have major
 userspace using it. And if not, it would be very hard to support a driver
 when recent userspace does not use it in the end.
 
 I don't think this is a good reason to exclude something from the kernel.
 However, there are good reasons why this doesn't make sense for something 
 like
 QEMU--specifically because we have a large number of features in our block 
 layer
 that tcm_vhost would bypass.
 
 
 I can definitely appreciate your concern here as the QEMU maintainer.
 
 But perhaps it makes sense for something like native kvm tool.  And if it 
 did go
 into the kernel, we would certainly support it in QEMU.
 
 
 ...
 
 But I do think the kernel should carefully consider whether it wants to 
 support
 an interface like this.  This an extremely complicated ABI with a lot of 
 subtle
 details around state and compatibility.
 
 Are you absolutely confident that you can support a userspace application 
 that
 expects to get exactly the same response from all possible commands in 20 
 kernel
 versions from now?  Virtualization requires absolutely precise 
 compatibility in
 terms of bugs and features.  This is probably not something the TCM stack 
 has
 had to consider yet.
 
 
 We most certainly have thought about long term userspace compatibility
 with TCM.  Our userspace code (that's now available in all major
 distros) is completely forward-compatible with new fabric modules such
 as tcm_vhost.  No update required.
 
 I'm not sure we're talking about the same thing when we say compatibility.
 
 I'm not talking about the API.  I'm talking about the behavior of
 the commands that tcm_vhost supports.
 
 If you add support for a new command, you need to provide userspace
 a way to disable this command.  If you change what gets reported for
 VPD, you need to provide userspace a way to make VPD look like what
 it did in a previous version.
 
 Basically, you need to be able to make a TCM device behave 100% the
 same as it did in an older version of the kernel.
 
 This is unique to virtualization due to live migration.  If you
 migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure
 that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel
 because the guest that is interacting with it does not realize that
 live migration happened.
 
 Yes, you can add knobs via configfs to control this behavior, but I
 think the question is, what's the plan for this?
 
 BTW, I think this is a good thing to cover in
 Documentation/vhost/tcm_vhost.txt.  I think that's probably the only
 change that's needed here.
 
 Regards,
 
 Anthony Liguori

I agree it's needed but it's not a requirement for merging IMHO.
As a first step we can disable live migration.

 
 Also, by virtue of the fact that we are using configfs + rtslib (python
 object library) on top, it's very easy to keep any type of compatibility
 logic around in python code.  With rtslib, we are able to hide configfs
 ABI changes from higher level apps.
 
 So far we've had a track record of 100% userspace ABI compatibility in
 mainline since .38, and I don't intend to merge a patch that breaks this
 any time soon.  But if that ever happens, apps using rtslib are not
 going to be effected.
 
 I think a good idea for 3.6 would be to make it depend on CONFIG_STAGING.
 Then we don't commit to an ABI.
 
 I think this is a good idea.  Even if it goes in, a really clear policy 
 would be
 needed wrt the userspace ABI.
 
 While tcm_vhost is probably more useful than vhost_blk, it's a much more 
 complex
 ABI to maintain.
 
 
 As far as I am concerned, the kernel API (eg: configfs directory layout)
 as it is now in sys/kernel/config/target/vhost/ is not going to change.
 It's based on the same drivers/target/target_core_fabric_configfs.c
 generic layout that we've had since .38.
 
 The basic functional fabric layout in configfs is identical (with fabric
 dependent WWPN naming of course) regardless of fabric driver, and by
 virtue of being generic it means we can add things like fabric dependent
 attributes + parameters in the future for existing fabrics without
 breaking userspace.
 
 So while I agree the ABI is more complex than vhost-blk, the logic in
 target_core_fabric_configfs.c is a basic ABI fabric definition that we
 are enforcing across all fabric modules in mainline for long term
 compatibility.
 
 --nab
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Christoph Hellwig
On Wed, Jul 18, 2012 at 08:42:21AM -0500, Anthony Liguori wrote:
 
 If you add support for a new command, you need to provide userspace
 a way to disable this command.  If you change what gets reported for
 VPD, you need to provide userspace a way to make VPD look like what
 it did in a previous version.
 
 Basically, you need to be able to make a TCM device behave 100% the
 same as it did in an older version of the kernel.
 
 This is unique to virtualization due to live migration.  If you
 migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure
 that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel
 because the guest that is interacting with it does not realize that
 live migration happened.

I don't think these strict live migration rules apply to SCSI targets.

Real life storage systems get new features and different behaviour with
firmware upgrades all the time, and SCSI initiators deal with that just
fine.  I don't see any reason to be more picky just because we're
virtualized.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 11:53:38AM -0400, Christoph Hellwig wrote:
 On Wed, Jul 18, 2012 at 08:42:21AM -0500, Anthony Liguori wrote:
  
  If you add support for a new command, you need to provide userspace
  a way to disable this command.  If you change what gets reported for
  VPD, you need to provide userspace a way to make VPD look like what
  it did in a previous version.
  
  Basically, you need to be able to make a TCM device behave 100% the
  same as it did in an older version of the kernel.
  
  This is unique to virtualization due to live migration.  If you
  migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure
  that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel
  because the guest that is interacting with it does not realize that
  live migration happened.
 
 I don't think these strict live migration rules apply to SCSI targets.
 
 Real life storage systems get new features and different behaviour with
 firmware upgrades all the time, and SCSI initiators deal with that just
 fine.
  I don't see any reason to be more picky just because we're
 virtualized.


Presumably initiators are shut down for target firmware upgrades?
With virtualization your host can change without guest shutdown.
You can also *lose* commands when migrating to an older host.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 4/4] tcm_vhost: Initial merge for vhost level target fabric driver

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 12:59:32AM +, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org
 
 This patch adds the initial code for tcm_vhost, a Vhost level TCM
 fabric driver for virtio SCSI initiators into KVM guest.
 
 This code is currently up and running on v3.5-rc2 host+guest along
 with the virtio-scsi vdev-scan() patch to allow a proper
 scsi_scan_host() to occur once the tcm_vhost nexus has been established
 by the paravirtualized virtio-scsi client here:
 
 virtio-scsi: Add vdrv-scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
 http://marc.info/?l=linux-scsim=134160609212542w=2
 
 Using tcm_vhost requires Zhi's - Stefan's qemu vhost-scsi tree here:
 
 https://github.com/wuzhy/qemu/tree/vhost-scsi
 
 along with the recent QEMU patch to hw/virtio-scsi.c to set max_target=0
 during vhost-scsi operation.
 
 Changelog v2 - v3:
 
   Unlock on error in tcm_vhost_drop_nexus() (DanC)
   Fix strlen() doesn't count the terminator (DanC)
   Call kfree() on an error path (DanC)
   Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
   Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
   Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/
   as requested by MST (nab)
 
 Changelog v1 - v2:
 
   Fix tv_cmd completion - release SGL memory leak (nab)
   Fix sparse warnings for static variable usage ((Fengguang Wu)
   Fix sparse warnings for min() typing + printk format specs (Fengguang Wu)
   Convert to cmwq submission for I/O dispatch (nab + hch)
 
 Changelog v0 - v1:
 
   Merge into single source + header file, and move to drivers/vhost/
 
 Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Cc: Zhi Yong Wu wu...@cn.ibm.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Christoph Hellwig h...@lst.de
 Cc: Hannes Reinecke h...@suse.de
 Cc: Jens Axboe ax...@kernel.dk
 Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
 ---
  drivers/staging/Kconfig   |2 +
  drivers/vhost/Makefile|2 +
  drivers/vhost/tcm/Kconfig |6 +
  drivers/vhost/tcm/Makefile|1 +
  drivers/vhost/tcm/tcm_vhost.c | 1611 
 +
  drivers/vhost/tcm/tcm_vhost.h |   74 ++
  6 files changed, 1696 insertions(+), 0 deletions(-)
  create mode 100644 drivers/vhost/tcm/Kconfig
  create mode 100644 drivers/vhost/tcm/Makefile
  create mode 100644 drivers/vhost/tcm/tcm_vhost.c
  create mode 100644 drivers/vhost/tcm/tcm_vhost.h
 

Really sorry about making you run around like that,
I did not mean moving all of tcm to a directory,
just adding tcm/Kconfig or adding drivers/vhost/Kconfig.tcm
because eventually it's easier to keep it all together
in one place.

 diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
 index 05e33c7..8d1a627 100644
 --- a/drivers/staging/Kconfig
 +++ b/drivers/staging/Kconfig
 @@ -132,4 +132,6 @@ source drivers/staging/ipack/Kconfig
  
  source drivers/staging/gdm72xx/Kconfig
  
 +source drivers/vhost/tcm/Kconfig
 +
  endif # STAGING
 diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
 index 72dd020..3408bea 100644
 --- a/drivers/vhost/Makefile
 +++ b/drivers/vhost/Makefile
 @@ -1,2 +1,4 @@
  obj-$(CONFIG_VHOST_NET) += vhost_net.o
  vhost_net-y := vhost.o net.o
 +
 +obj-$(CONFIG_TCM_VHOST) += tcm/
 diff --git a/drivers/vhost/tcm/Kconfig b/drivers/vhost/tcm/Kconfig
 new file mode 100644
 index 000..a9c6f76
 --- /dev/null
 +++ b/drivers/vhost/tcm/Kconfig
 @@ -0,0 +1,6 @@
 +config TCM_VHOST
 + tristate TCM_VHOST fabric module (EXPERIMENTAL)
 + depends on TARGET_CORE  EVENTFD  EXPERIMENTAL  m
 + default n
 + ---help---
 + Say M here to enable the TCM_VHOST fabric module for use with 
 virtio-scsi guests
 diff --git a/drivers/vhost/tcm/Makefile b/drivers/vhost/tcm/Makefile
 new file mode 100644
 index 000..54b0ea6
 --- /dev/null
 +++ b/drivers/vhost/tcm/Makefile
 @@ -0,0 +1 @@
 +obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
 diff --git a/drivers/vhost/tcm/tcm_vhost.c b/drivers/vhost/tcm/tcm_vhost.c
 new file mode 100644
 index 000..0ee4046
 --- /dev/null
 +++ b/drivers/vhost/tcm/tcm_vhost.c
 @@ -0,0 +1,1611 @@
 +/***
 + * Vhost kernel TCM fabric driver for virtio SCSI initiators
 + *
 + * (C) Copyright 2010-2012 RisingTide Systems LLC.
 + * (C) Copyright 2010-2012 IBM Corp.
 + *
 + * Licensed to the Linux Foundation under the General Public License (GPL) 
 version 2.
 + *
 + * Authors: Nicholas A. Bellinger n...@risingtidesystems.com
 + *  Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation; either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but 

Re: [RFC-v3 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 12:59:28AM +, Nicholas A. Bellinger wrote:
 From: Nicholas Bellinger n...@linux-iscsi.org
 
 Hi folks,
 
 The following is the RFC-v3 series of tcm_vhost target fabric driver code
 currently in-flight for-3.6 mainline code.

So I sent some comments. I think it's in an OK state for a staging
driver.  I'd suggest doing some interations while there is still time
and the rest we can fix up in tree.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Anthony Liguori

On 07/18/2012 10:53 AM, Christoph Hellwig wrote:

On Wed, Jul 18, 2012 at 08:42:21AM -0500, Anthony Liguori wrote:


If you add support for a new command, you need to provide userspace
a way to disable this command.  If you change what gets reported for
VPD, you need to provide userspace a way to make VPD look like what
it did in a previous version.

Basically, you need to be able to make a TCM device behave 100% the
same as it did in an older version of the kernel.

This is unique to virtualization due to live migration.  If you
migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure
that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel
because the guest that is interacting with it does not realize that
live migration happened.


I don't think these strict live migration rules apply to SCSI targets.

Real life storage systems get new features and different behaviour with
firmware upgrades all the time, and SCSI initiators deal with that just
fine.  I don't see any reason to be more picky just because we're
virtualized.


But would this happen while a system is running live?

I agree that in general, SCSI targets don't need this, but I'm pretty sure that 
if a guest probes for a command, you migrate to an old version, and that command 
is no longer there, badness will ensue.


It's different when you're talking about a reboot happening or a 
disconnect/reconnect due to firmware upgrade.  The OS would naturally be 
reprobing in this case.


Regards,

Anthony Liguori





--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Rustad, Mark D
On Jul 18, 2012, at 9:00 AM, Michael S. Tsirkin wrote:

 On Wed, Jul 18, 2012 at 11:53:38AM -0400, Christoph Hellwig wrote:
 On Wed, Jul 18, 2012 at 08:42:21AM -0500, Anthony Liguori wrote:
 
 If you add support for a new command, you need to provide userspace
 a way to disable this command.  If you change what gets reported for
 VPD, you need to provide userspace a way to make VPD look like what
 it did in a previous version.
 
 Basically, you need to be able to make a TCM device behave 100% the
 same as it did in an older version of the kernel.
 
 This is unique to virtualization due to live migration.  If you
 migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure
 that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel
 because the guest that is interacting with it does not realize that
 live migration happened.
 
 I don't think these strict live migration rules apply to SCSI targets.
 
 Real life storage systems get new features and different behaviour with
 firmware upgrades all the time, and SCSI initiators deal with that just
 fine.
 I don't see any reason to be more picky just because we're
 virtualized.
 
 Presumably initiators are shut down for target firmware upgrades?
 With virtualization your host can change without guest shutdown.
 You can also *lose* commands when migrating to an older host.


Actually no. Storage vendors do not want to impose a need to take initiators 
down for any reason. I have worked for a storage system vendor that routinely 
did firmware upgrades on-the-fly. This is done by multi-pathing and taking one 
path down, upgrade, bring up, repeat. There was even one non-redundant system 
that I am aware of that could upgrade firmware and reboot fast enough that the 
initiators would not notice.

You do have to pay very close attention to some things however. Don't change 
the device identity in any way - even version information, otherwise a Windows 
initiator will blue-screen. I made that mistake myself, so I remember it well. 
It seemed like such an innocent change. I don't recall there being any issue 
with adding commands and we did do that on occasion.

-- 
Mark Rustad, LAN Access Division, Intel Corporation

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread James Bottomley
On Wed, 2012-07-18 at 11:00 -0500, Anthony Liguori wrote:
 On 07/18/2012 10:53 AM, Christoph Hellwig wrote:
  On Wed, Jul 18, 2012 at 08:42:21AM -0500, Anthony Liguori wrote:
 
  If you add support for a new command, you need to provide userspace
  a way to disable this command.  If you change what gets reported for
  VPD, you need to provide userspace a way to make VPD look like what
  it did in a previous version.
 
  Basically, you need to be able to make a TCM device behave 100% the
  same as it did in an older version of the kernel.
 
  This is unique to virtualization due to live migration.  If you
  migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure
  that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel
  because the guest that is interacting with it does not realize that
  live migration happened.
 
  I don't think these strict live migration rules apply to SCSI targets.
 
  Real life storage systems get new features and different behaviour with
  firmware upgrades all the time, and SCSI initiators deal with that just
  fine.  I don't see any reason to be more picky just because we're
  virtualized.
 
 But would this happen while a system is running live?

Of course: Think about the consequences: you want to upgrade one array
on your SAN.  You definitely don't want to shut down your entire data
centre to achieve it.  In place upgrades on running SANs have been
common in enterprise environments for a while.

 I agree that in general, SCSI targets don't need this, but I'm pretty sure 
 that 
 if a guest probes for a command, you migrate to an old version, and that 
 command 
 is no longer there, badness will ensue.

What command are we talking about?  Operation of initiators is usually
just READ and WRITE.  So perhaps we might have inline UNMAP ... but the
world wouldn't come to an end even if the latter stopped working.

Most of the complex SCSI stuff is done at start of day; it's actually
only then we'd notice things like changes in INQUIRY strings or mode
pages.

Failover, which is what you're talking about, requires reinstatement of
all the operating parameters of the source/target system, but that's not
wholly the responsibility of the storage system ...

James

 It's different when you're talking about a reboot happening or a 
 disconnect/reconnect due to firmware upgrade.  The OS would naturally be 
 reprobing in this case.



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] scsi/ibmvscsi: add module alias for ibmvscsic

2012-07-18 Thread olaf
From: Olaf Hering o...@aepfle.de

The driver is named ibmvscsic, at runtime it its name is advertised as
ibmvscsi. For this reason mkinitrd wont pickup the driver properly.
Reported by IBM during SLES11 beta testing:

https://bugzilla.novell.com/show_bug.cgi?id=459933
LTC50724

Signed-off-by: Olaf Hering o...@aepfle.de

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 3a6c474..e580aa4 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -100,6 +100,9 @@ static struct scsi_transport_template 
*ibmvscsi_transport_template;
 
 static struct ibmvscsi_ops *ibmvscsi_ops;
 
+/* The driver is named ibmvscsic, map ibmvscsi to module name */
+#define IBMVSCSI_NAME ibmvscsi
+MODULE_ALIAS(IBMVSCSI_NAME);
 MODULE_DESCRIPTION(IBM Virtual SCSI);
 MODULE_AUTHOR(Dave Boutcher);
 MODULE_LICENSE(GPL);
@@ -1796,7 +1799,7 @@ static struct device_attribute *ibmvscsi_attrs[] = {
 static struct scsi_host_template driver_template = {
.module = THIS_MODULE,
.name = IBM POWER Virtual SCSI Adapter  IBMVSCSI_VERSION,
-   .proc_name = ibmvscsi,
+   .proc_name = IBMVSCSI_NAME,
.queuecommand = ibmvscsi_queuecommand,
.eh_abort_handler = ibmvscsi_eh_abort_handler,
.eh_device_reset_handler = ibmvscsi_eh_device_reset_handler,
@@ -1936,7 +1939,7 @@ static int ibmvscsi_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
}
 
hostdata-work_thread = kthread_run(ibmvscsi_work, hostdata, %s_%d,
-   ibmvscsi, host-host_no);
+   IBMVSCSI_NAME, host-host_no);
 
if (IS_ERR(hostdata-work_thread)) {
dev_err(vdev-dev, couldn't initialize kthread. rc=%ld\n,
@@ -2061,7 +2064,7 @@ static struct vio_driver ibmvscsi_driver = {
.probe = ibmvscsi_probe,
.remove = ibmvscsi_remove,
.get_desired_dma = ibmvscsi_get_desired_dma,
-   .name = ibmvscsi,
+   .name = IBMVSCSI_NAME,
.pm = ibmvscsi_pm_ops,
 };
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] scsi/ibmvscsi: /sys/class/scsi_host/hostX/config doesn't show any information

2012-07-18 Thread olaf
From: Linda Xie lx...@us.ibm.com

Expected result:
It should show something like this:
x1521p4:~ # cat /sys/class/scsi_host/host1/config
PARTITIONNAME='x1521p4'
NWSDNAME='X1521P4'
HOSTNAME='X1521P4'
DOMAINNAME='RCHLAND.IBM.COM'
NAMESERVERS='9.10.244.100 9.10.244.200'

Actual result:
x1521p4:~ # cat /sys/class/scsi_host/host0/config
x1521p4:~ #

This patch changes the size of the buffer used for transfering config
data to 4K. It was tested against 2.6.19-rc2 tree.

Reported by IBM during SLES11 beta testing:

https://bugzilla.novell.com/show_bug.cgi?id=439970
LTC49349

Signed-off-by: Olaf Hering o...@aepfle.de

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index e580aa4..1513ca8 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -93,6 +93,8 @@ static int max_requests = IBMVSCSI_MAX_REQUESTS_DEFAULT;
 static int max_events = IBMVSCSI_MAX_REQUESTS_DEFAULT + 2;
 static int fast_fail = 1;
 static int client_reserve = 1;
+/* host data buffer size */
+#define HOST_BUFFER_SIZE 4096
 
 static struct scsi_transport_template *ibmvscsi_transport_template;
 
@@ -1666,7 +1668,7 @@ static ssize_t show_host_srp_version(struct device *dev,
struct ibmvscsi_host_data *hostdata = shost_priv(shost);
int len;
 
-   len = snprintf(buf, PAGE_SIZE, %s\n,
+   len = snprintf(buf, HOST_BUFFER_SIZE, %s\n,
   hostdata-madapter_info.srp_version);
return len;
 }
@@ -1687,7 +1689,7 @@ static ssize_t show_host_partition_name(struct device 
*dev,
struct ibmvscsi_host_data *hostdata = shost_priv(shost);
int len;
 
-   len = snprintf(buf, PAGE_SIZE, %s\n,
+   len = snprintf(buf, HOST_BUFFER_SIZE, %s\n,
   hostdata-madapter_info.partition_name);
return len;
 }
@@ -1708,7 +1710,7 @@ static ssize_t show_host_partition_number(struct device 
*dev,
struct ibmvscsi_host_data *hostdata = shost_priv(shost);
int len;
 
-   len = snprintf(buf, PAGE_SIZE, %d\n,
+   len = snprintf(buf, HOST_BUFFER_SIZE, %d\n,
   hostdata-madapter_info.partition_number);
return len;
 }
@@ -1728,7 +1730,7 @@ static ssize_t show_host_mad_version(struct device *dev,
struct ibmvscsi_host_data *hostdata = shost_priv(shost);
int len;
 
-   len = snprintf(buf, PAGE_SIZE, %d\n,
+   len = snprintf(buf, HOST_BUFFER_SIZE, %d\n,
   hostdata-madapter_info.mad_version);
return len;
 }
@@ -1748,7 +1750,7 @@ static ssize_t show_host_os_type(struct device *dev,
struct ibmvscsi_host_data *hostdata = shost_priv(shost);
int len;
 
-   len = snprintf(buf, PAGE_SIZE, %d\n, hostdata-madapter_info.os_type);
+   len = snprintf(buf, HOST_BUFFER_SIZE, %d\n, 
hostdata-madapter_info.os_type);
return len;
 }
 
@@ -1767,7 +1769,7 @@ static ssize_t show_host_config(struct device *dev,
struct ibmvscsi_host_data *hostdata = shost_priv(shost);
 
/* returns null-terminated host config data */
-   if (ibmvscsi_do_host_config(hostdata, buf, PAGE_SIZE) == 0)
+   if (ibmvscsi_do_host_config(hostdata, buf, HOST_BUFFER_SIZE) == 0)
return strlen(buf);
else
return 0;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 04:42:33PM +, Rustad, Mark D wrote:
 On Jul 18, 2012, at 9:00 AM, Michael S. Tsirkin wrote:
 
  On Wed, Jul 18, 2012 at 11:53:38AM -0400, Christoph Hellwig wrote:
  On Wed, Jul 18, 2012 at 08:42:21AM -0500, Anthony Liguori wrote:
  
  If you add support for a new command, you need to provide userspace
  a way to disable this command.  If you change what gets reported for
  VPD, you need to provide userspace a way to make VPD look like what
  it did in a previous version.
  
  Basically, you need to be able to make a TCM device behave 100% the
  same as it did in an older version of the kernel.
  
  This is unique to virtualization due to live migration.  If you
  migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure
  that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel
  because the guest that is interacting with it does not realize that
  live migration happened.
  
  I don't think these strict live migration rules apply to SCSI targets.
  
  Real life storage systems get new features and different behaviour with
  firmware upgrades all the time, and SCSI initiators deal with that just
  fine.
  I don't see any reason to be more picky just because we're
  virtualized.
  
  Presumably initiators are shut down for target firmware upgrades?
  With virtualization your host can change without guest shutdown.
  You can also *lose* commands when migrating to an older host.
 
 
 Actually no. Storage vendors do not want to impose a need to take initiators 
 down for any reason. I have worked for a storage system vendor that routinely 
 did firmware upgrades on-the-fly. This is done by multi-pathing and taking 
 one path down, upgrade, bring up, repeat.

With live migration even that does not happen.

 There was even one non-redundant system that I am aware of that could upgrade 
 firmware and reboot fast enough that the initiators would not notice.
 
 You do have to pay very close attention to some things however. Don't change 
 the device identity in any way - even version information, otherwise a 
 Windows initiator will blue-screen. I made that mistake myself, so I remember 
 it well. It seemed like such an innocent change. I don't recall there being 
 any issue with adding commands and we did do that on occasion.

How about removing commands?

 -- 
 Mark Rustad, LAN Access Division, Intel Corporation
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Anthony Liguori

On 07/18/2012 11:47 AM, James Bottomley wrote:

On Wed, 2012-07-18 at 11:00 -0500, Anthony Liguori wrote:

Of course: Think about the consequences: you want to upgrade one array
on your SAN.  You definitely don't want to shut down your entire data
centre to achieve it.  In place upgrades on running SANs have been
common in enterprise environments for a while.


Would firmware upgrades ever result in major OS visible changes though?

Maybe OSes are more robust with SCSI than with other types of buses, but I don't 
think it's safe to completely ignore the problem.



I agree that in general, SCSI targets don't need this, but I'm pretty sure that
if a guest probes for a command, you migrate to an old version, and that command
is no longer there, badness will ensue.


What command are we talking about?  Operation of initiators is usually
just READ and WRITE.  So perhaps we might have inline UNMAP ... but the
world wouldn't come to an end even if the latter stopped working.


Is that true for all OSes?  Linux may handle things gracefully if UNMAP starts 
throwing errors but that doesn't mean that Windows will.


There are other cases where this creates problems.  Windows (and some other 
OSes) fingerprint the hardware profile in order to do license enforcement.  If 
the hardware changes beyond a certain amount, then they refuse to boot.


Windows does this with a points system and I do believe that INQUIRY responses 
from any local disks are included in this tally.



Most of the complex SCSI stuff is done at start of day; it's actually
only then we'd notice things like changes in INQUIRY strings or mode
pages.

Failover, which is what you're talking about, requires reinstatement of
all the operating parameters of the source/target system, but that's not
wholly the responsibility of the storage system ...


It's the responsibility of the hypervisor when dealing with live migration.

Regards,

Anthony Liguori



James


It's different when you're talking about a reboot happening or a
disconnect/reconnect due to firmware upgrade.  The OS would naturally be
reprobing in this case.






--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Rustad, Mark D
On Jul 18, 2012, at 10:17 AM, Michael S. Tsirkin wrote:

snip

 You do have to pay very close attention to some things however. Don't change 
 the device identity in any way - even version information, otherwise a 
 Windows initiator will blue-screen. I made that mistake myself, so I 
 remember it well. It seemed like such an innocent change. I don't recall 
 there being any issue with adding commands and we did do that on occasion.
 
 How about removing commands?

Good question. With the storage system I am familiar with, that would only be a 
risk if firmware were downgraded. Downgrading would never have been 
recommended. I am sure that if something like persistent reserve suddenly went 
away it would cause big trouble for some initiators.

-- 
Mark Rustad, LAN Access Division, Intel Corporation

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 44771] [REGRESSION] a7a20d103994fd760766e6c9d494daa569cbfe06 makes kernel 3.5 unbootable on an Intel chipset based motherboard

2012-07-18 Thread Alan Stern
On Tue, 17 Jul 2012, James Bottomley wrote:

 On Tue, 2012-07-17 at 12:51 -0700, Linus Torvalds wrote:
  Wrong people cc'd, it looks like.
  
  Guys, commit a7a20d103994 (sd: limit the scope of the async probe
  domain) is causing boot problems. It's timing-dependent and
  apparently sometimes works, which makes sense with that commit.
  
  However, it *should* have been fixed by commit 43a8d39d0137 (fix
  async probe regression), but Artem seems to report the problem even
  in -rc7.
  
  Comments?
 
 As far as I can tell, the fix should have worked.  However, there are a
 lot of assumptions in the async stuff that end up not being true in the
 presence of separate async domains.   We should be fixing it all here:
 
 http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=shortlog;h=refs/heads/async

Those commits haven't been merged into 3.5-rc7, right?  So Artem isn't 
using them.

 I've got to say, I don't understand the bug report.  all of those
 commits were about probing for devices.  However, the screen shot
 
 https://bugzilla.kernel.org/attachment.cgi?id=75351
 
 shows the devices were found, it's the partition tables that weren't.
 For us to see the message about sda's capacity, we're already in the
 async code the commits were trying to synchronise with.

That's understandable, given that those commits aren't present in
Artem's kernel.  Are they queued for -stable?

 However, there are some missing messages: there's no partition table
 print and no final 
 
   sd_printk(KERN_NOTICE, sdkp, Attached SCSI %sdisk\n,
 sdp-removable ? removable  : );
 
 So sd_probe_async() got stuck somewhere after the first
 sd_revalidata_disk().

No, it didn't get stuck.  The problem is that the scsi_wait_scan module 
didn't wait for the async scanning to finish.

And the reason for that is one you're already familiar with: 
CONFIG_SCSI_MOD=y.

Artem, if you change all your SCSI drivers to be modular rather than 
built-in, that ought to fix the problem.  Alternatively, you can simply 
continue to use the rootwait option.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 3/4] vhost: Add vhost_scsi specific defines

2012-07-18 Thread Nicholas A. Bellinger
On Wed, 2012-07-18 at 16:05 +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 18, 2012 at 12:59:31AM +, Nicholas A. Bellinger wrote:
  From: Nicholas Bellinger n...@risingtidesystems.com
  
  This patch adds the initial vhost_scsi_ioctl() callers for 
  VHOST_SCSI_SET_ENDPOINT
  and VHOST_SCSI_CLEAR_ENDPOINT respectively, and also adds struct 
  vhost_vring_target
  that is used by tcm_vhost code when locating target ports during qemu setup.
  
  Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
  Cc: Zhi Yong Wu wu...@cn.ibm.com
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: Paolo Bonzini pbonz...@redhat.com,
  Signed-off-by: Nicholas A. Bellinger n...@risingtidesystems.com
  ---
   include/linux/vhost.h |9 +
   1 files changed, 9 insertions(+), 0 deletions(-)
  
  diff --git a/include/linux/vhost.h b/include/linux/vhost.h
  index e847f1e..33b313b 100644
  --- a/include/linux/vhost.h
  +++ b/include/linux/vhost.h
  @@ -24,7 +24,11 @@ struct vhost_vring_state {
   struct vhost_vring_file {
  unsigned int index;
  int fd; /* Pass -1 to unbind from file. */
  +};
   
  +struct vhost_vring_target {
 
 Can this be renamed vhost_scsi_target?

Done

 
  +   unsigned char vhost_wwpn[224];
 
 224? I am guessing ISCSI_NAME_LEN from include/scsi/iscsi_proto.h?
 Unfortunately we can't include iscsi_proto.h here as it
 is not exported to users. But let's add a comment for now.
 

This is actually from target/target_core_base.h:TRANSPORT_IQN_LEN.

Fixing this up now..

  +   unsigned short vhost_tpgt;
   };
   
   struct vhost_vring_addr {
  @@ -121,6 +125,11 @@ struct vhost_memory {
* device.  This can be used to stop the ring (e.g. for migration). */
   #define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct 
  vhost_vring_file)
   
  +/* VHOST_SCSI specific defines */
  +
  +#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct 
  vhost_vring_target)
  +#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct 
  vhost_vring_target)
  +
   /* Feature bits */
   /* Log all write descriptors. Can be changed while device is active. */
 
 Can these go into appropriate ifdef CONFIG_TCP_VHOST please?
 

M, I don't think we can do that with CONFIG_TCM_VHOST=m, or at least
not with the following patch

diff --git a/include/linux/vhost.h b/include/linux/vhost.h
index 33b313b..e4b1ee3 100644
--- a/include/linux/vhost.h
+++ b/include/linux/vhost.h
@@ -125,10 +126,14 @@ struct vhost_memory {
  * device.  This can be used to stop the ring (e.g. for migration). */
 #define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
 
+#ifdef CONFIG_TCM_VHOST
+
 /* VHOST_SCSI specific defines */
 
-#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct 
vhost_vring_target)
-#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct 
vhost_vring_target)
+# define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct 
vhost_scsi_target)
+# define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct 
vhost_scsi_target)
+
+#endif

--

drivers/vhost/tcm/tcm_vhost.c: In function ‘vhost_scsi_ioctl’:
drivers/vhost/tcm/tcm_vhost.c:970: error: ‘VHOST_SCSI_SET_ENDPOINT’ undeclared 
(first use in this function)
drivers/vhost/tcm/tcm_vhost.c:970: error: (Each undeclared identifier is 
reported only once
drivers/vhost/tcm/tcm_vhost.c:970: error: for each function it appears in.)
drivers/vhost/tcm/tcm_vhost.c:975: error: ‘VHOST_SCSI_CLEAR_ENDPOINT’ 
undeclared (first use in this function)
make[3]: *** [drivers/vhost/tcm/tcm_vhost.o] Error 1


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 4/4] tcm_vhost: Initial merge for vhost level target fabric driver

2012-07-18 Thread Nicholas A. Bellinger
On Wed, 2012-07-18 at 19:09 +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 18, 2012 at 12:59:32AM +, Nicholas A. Bellinger wrote:

SNIP

  
  Changelog v2 - v3:
  
Unlock on error in tcm_vhost_drop_nexus() (DanC)
Fix strlen() doesn't count the terminator (DanC)
Call kfree() on an error path (DanC)
Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/
as requested by MST (nab)
  
  ---
   drivers/staging/Kconfig   |2 +
   drivers/vhost/Makefile|2 +
   drivers/vhost/tcm/Kconfig |6 +
   drivers/vhost/tcm/Makefile|1 +
   drivers/vhost/tcm/tcm_vhost.c | 1611 
  +
   drivers/vhost/tcm/tcm_vhost.h |   74 ++
   6 files changed, 1696 insertions(+), 0 deletions(-)
   create mode 100644 drivers/vhost/tcm/Kconfig
   create mode 100644 drivers/vhost/tcm/Makefile
   create mode 100644 drivers/vhost/tcm/tcm_vhost.c
   create mode 100644 drivers/vhost/tcm/tcm_vhost.h
  
 
 Really sorry about making you run around like that,
 I did not mean moving all of tcm to a directory,
 just adding tcm/Kconfig or adding drivers/vhost/Kconfig.tcm
 because eventually it's easier to keep it all together
 in one place.
 

Er, apologies for the slight mis-understanding here..  Moving back now +
fixing up the Kbuild bits.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v3 3/4] vhost: Add vhost_scsi specific defines

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 02:17:05PM -0700, Nicholas A. Bellinger wrote:
 On Wed, 2012-07-18 at 16:05 +0300, Michael S. Tsirkin wrote:
  On Wed, Jul 18, 2012 at 12:59:31AM +, Nicholas A. Bellinger wrote:
   From: Nicholas Bellinger n...@risingtidesystems.com
   
   This patch adds the initial vhost_scsi_ioctl() callers for 
   VHOST_SCSI_SET_ENDPOINT
   and VHOST_SCSI_CLEAR_ENDPOINT respectively, and also adds struct 
   vhost_vring_target
   that is used by tcm_vhost code when locating target ports during qemu 
   setup.
   
   Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
   Cc: Zhi Yong Wu wu...@cn.ibm.com
   Cc: Michael S. Tsirkin m...@redhat.com
   Cc: Paolo Bonzini pbonz...@redhat.com,
   Signed-off-by: Nicholas A. Bellinger n...@risingtidesystems.com
   ---
include/linux/vhost.h |9 +
1 files changed, 9 insertions(+), 0 deletions(-)
   
   diff --git a/include/linux/vhost.h b/include/linux/vhost.h
   index e847f1e..33b313b 100644
   --- a/include/linux/vhost.h
   +++ b/include/linux/vhost.h
   @@ -24,7 +24,11 @@ struct vhost_vring_state {
struct vhost_vring_file {
 unsigned int index;
 int fd; /* Pass -1 to unbind from file. */
   +};

   +struct vhost_vring_target {
  
  Can this be renamed vhost_scsi_target?
 
 Done
 
  
   + unsigned char vhost_wwpn[224];
  
  224? I am guessing ISCSI_NAME_LEN from include/scsi/iscsi_proto.h?
  Unfortunately we can't include iscsi_proto.h here as it
  is not exported to users. But let's add a comment for now.
  
 
 This is actually from target/target_core_base.h:TRANSPORT_IQN_LEN.
 
 Fixing this up now..
 
   + unsigned short vhost_tpgt;
};

struct vhost_vring_addr {
   @@ -121,6 +125,11 @@ struct vhost_memory {
 * device.  This can be used to stop the ring (e.g. for migration). */
#define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct 
   vhost_vring_file)

   +/* VHOST_SCSI specific defines */
   +
   +#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct 
   vhost_vring_target)
   +#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct 
   vhost_vring_target)
   +
/* Feature bits */
/* Log all write descriptors. Can be changed while device is active. */
  
  Can these go into appropriate ifdef CONFIG_TCP_VHOST please?
  
 
 M, I don't think we can do that with CONFIG_TCM_VHOST=m, or at least
 not with the following patch
 
 diff --git a/include/linux/vhost.h b/include/linux/vhost.h
 index 33b313b..e4b1ee3 100644
 --- a/include/linux/vhost.h
 +++ b/include/linux/vhost.h
 @@ -125,10 +126,14 @@ struct vhost_memory {
   * device.  This can be used to stop the ring (e.g. for migration). */
  #define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct 
 vhost_vring_file)
  
 +#ifdef CONFIG_TCM_VHOST
 +
  /* VHOST_SCSI specific defines */
  
 -#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct 
 vhost_vring_target)
 -#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct 
 vhost_vring_target)
 +# define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct 
 vhost_scsi_target)
 +# define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct 
 vhost_scsi_target)
 +
 +#endif
 
 --
 
 drivers/vhost/tcm/tcm_vhost.c: In function ‘vhost_scsi_ioctl’:
 drivers/vhost/tcm/tcm_vhost.c:970: error: ‘VHOST_SCSI_SET_ENDPOINT’ 
 undeclared (first use in this function)
 drivers/vhost/tcm/tcm_vhost.c:970: error: (Each undeclared identifier is 
 reported only once
 drivers/vhost/tcm/tcm_vhost.c:970: error: for each function it appears in.)
 drivers/vhost/tcm/tcm_vhost.c:975: error: ‘VHOST_SCSI_CLEAR_ENDPOINT’ 
 undeclared (first use in this function)
 make[3]: *** [drivers/vhost/tcm/tcm_vhost.o] Error 1


Maybe ifdefs only work for booleans? If yes you can probably add a boolean and 
select it?
What I want to prevent is exposing tcm stuff in the header if it is
configured off. I'll play with it tomorrow.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [MINI SUMMIT] SCSI core performance

2012-07-18 Thread Nicholas A. Bellinger
On Wed, 2012-07-18 at 09:00 +0100, James Bottomley wrote:
 On Tue, 2012-07-17 at 19:39 -0700, Nicholas A. Bellinger wrote:
  Hi KS-PCs,
  
  I'd like to propose a SCSI performance mini-summit to see how interested
  folks are in helping address the long-term issues that SCSI core is
  currently facing wrt to multi-lun per host and heavy small block random
  I/O workloads.
  
  I know this would probably be better suited for LSF (for the record it
  was proposed this year) but now that we've acknowledge there is a
  problem with SCSI LLDs vs. raw block drivers vs. other SCSI subsystems,
  it would be useful to get the storage folks into a single room at some
  point during KS/LPC to figure out what is actually going on with SCSI
  core.
 
 You seem to have a short memory:  The last time it was discussed
 
 http://marc.info/?t=13415537393
 
 It rapidly became apparent there isn't a problem.  Enabling high IOPS in
 the SCSI stack is what I think you mean.
 

small block random I/O == performance, that is correct.

The host-lock-less stuff is doing better these days for small-ish
multi-lun setups with large block sequential I/O workloads.

Doing ~1 GB/sec per LUN is achievable with multi-lun per host (say up to
6-8 LUNs dependent on your setup) using PCI-e Gen3 hardware.

  As mentioned in the recent tcm_vhost thread, there are a number of cases
  where drivers/target/ code can demonstrate this limitation pretty
  vividly now.
  
  This includes the following scenarios using raw block flash export with
  target_core_mod + target_core_iblock export and the same small block
  (4k) mixed random I/O workload with fio:
  
  *) tcm_loop local SCSI LLD performance is an order of magnitude slower 
 than the same local raw block flash backend.
  *) tcm_qla2xxx performs better using MSFT Server hosts than Linux v3.x
 based hosts using 2x socket Nehalem hardware w/ PCI-e Gen2 HBAs
  *) ib_srpt performs better using MSFT Server host than RHEL 6.x .32 
 based hosts using 2x socket Romley hardware w/ PCI-e Gen3 HCAs
  *) Raw block IBLOCK export into KVM guest v3.5-rc w/ virtio-scsi is 
 behind in performance vs. raw local block flash.  (cmwq on the host 
 is helping here, but still need to with MSFT SCSI mini-port)
  
  Also, with 1M IOPs into a single VM guest now being done by other non
  Linux based hypervisors, the virtualization bit for high performance KVM
  SCSI based storage is quickly coming on..
  
  So all of that said, I'd like to at least have a discussion with the key
  SCSI + block folks who will be present in San Diego on path forward to
  address these without having to wait until LSF-2013 + hope for a topic
  slot to materialize then.
  
  Thank you for your consideration,
 
 Well, your proposal is devoid of an actual proposal.
 

Huh..?  It's a proposal for a discussion to (hopefully) identify the
main culprit(s) and figure out an incremental way forward.

Due to the fact that 1M IOPs machines aren't quite the norm (yet), the
idea is to get storage folks in the same room who do have access to 1M
IOPs systems + have an interest in making SCSI core go faster for random
small block I/O workloads.

This can be vendors / LLD maintainers who've run into similar
limitations with SCSI core, or folks who have an interest in KVM guest
SCSI performance.

 Enabling high IOPS involves reducing locking overhead and path length
 through the code.  I think most of the low hanging fruit in this area is
 already picked, but if you have an idea, please say.  There might be
 something we can extract from the lockless queue work Jens is doing, but
 we need that to materialise first.
 

Would really like to hear from Jens here, but I don't know how much time
he is spending on the SCSI layer these days..

I've been more interested recently in working on a fabric that can
demonstrate this bottleneck with raw block flash into KVM guest -
virtio-scsi, as I think it's a important vehicle for short-term
diagnosis.

 Without a concrete thing to discuss, shooting the breeze on high IOPS in
 the SCSI stack is about as useful as discussing what happened in last
 night's episode of Coronation Street which, when it happens in my house,
 always helps me see how incredibly urgent fixing the leaky tap I've been
 putting off for months actually is.
 

Sorry, I've never heard of that show.  

 If someone can come up with a proposal ... or even perhaps another path
 trace showing where the reducible overhead and lock problems are we can
 discuss it on the list and we might have a real topic by the time LSF
 rolls around.
 

So identifying root culprit(s) is still a WIP at this point.

In the next weeks I'll be back spending time back on 1M IOPs machines
with raw block flash + qla2xxx/srpt/vhost + Linux/MSFT SCSI clients, and
should be getting some more data-points by then.

Anyways, if it ends up taking until LSF it ends up at LSF.  I figured
since things are heating up for virtio-scsi that KS might be a good
venue for 

Re: [Bug 44771] [REGRESSION] a7a20d103994fd760766e6c9d494daa569cbfe06 makes kernel 3.5 unbootable on an Intel chipset based motherboard

2012-07-18 Thread Dan Williams
On Wed, Jul 18, 2012 at 1:25 PM, Alan Stern st...@rowland.harvard.edu wrote:
 On Tue, 17 Jul 2012, James Bottomley wrote:

 On Tue, 2012-07-17 at 12:51 -0700, Linus Torvalds wrote:
  Wrong people cc'd, it looks like.
 
  Guys, commit a7a20d103994 (sd: limit the scope of the async probe
  domain) is causing boot problems. It's timing-dependent and
  apparently sometimes works, which makes sense with that commit.
 
  However, it *should* have been fixed by commit 43a8d39d0137 (fix
  async probe regression), but Artem seems to report the problem even
  in -rc7.
 
  Comments?

 As far as I can tell, the fix should have worked.  However, there are a
 lot of assumptions in the async stuff that end up not being true in the
 presence of separate async domains.   We should be fixing it all here:

 http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=shortlog;h=refs/heads/async

 Those commits haven't been merged into 3.5-rc7, right?  So Artem isn't
 using them.

 I've got to say, I don't understand the bug report.  all of those
 commits were about probing for devices.  However, the screen shot

 https://bugzilla.kernel.org/attachment.cgi?id=75351

 shows the devices were found, it's the partition tables that weren't.
 For us to see the message about sda's capacity, we're already in the
 async code the commits were trying to synchronise with.

 That's understandable, given that those commits aren't present in
 Artem's kernel.  Are they queued for -stable?

I did not flag them as such given the wider entanglements.


 However, there are some missing messages: there's no partition table
 print and no final

   sd_printk(KERN_NOTICE, sdkp, Attached SCSI %sdisk\n,
 sdp-removable ? removable  : );

 So sd_probe_async() got stuck somewhere after the first
 sd_revalidata_disk().

 No, it didn't get stuck.  The problem is that the scsi_wait_scan module
 didn't wait for the async scanning to finish.

 And the reason for that is one you're already familiar with:
 CONFIG_SCSI_MOD=y.

I had forgotten that additional aspect. but I don't think that is the
root cause.  scsi_wait_scan() is a nop in Artem's config given
CONFIG_SCSI_WAIT_SCAN=m.  From what I have seen ahci typically ensures
that wait_for_device_probe() is all that is needed to guarantee
scanning is complete which commit a7a20d103994 (sd: limit the scope
of the async probe domain) breaks, and which commit 43a8d39d0137
([SCSI] fix async probe regression), does not resolve because that
requires a call to scsi_complete_async_scans() to close the loop.

 Artem, if you change all your SCSI drivers to be modular rather than
 built-in, that ought to fix the problem.  Alternatively, you can simply
 continue to use the rootwait option.

I think setting CONFIG_SCSI_WAIT_SCAN=y would also be a workaround, or
going forward with the pending async rework to make sd probe work once
again visible to async_synchronize_domain_full().

--
Dan
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6

2012-07-18 Thread Nicholas A. Bellinger
On Wed, 2012-07-18 at 08:42 -0500, Anthony Liguori wrote:
 On 07/17/2012 04:50 PM, Nicholas A. Bellinger wrote:
  On Tue, 2012-07-17 at 13:55 -0500, Anthony Liguori wrote:
  On 07/17/2012 10:05 AM, Michael S. Tsirkin wrote:
  On Wed, Jul 11, 2012 at 09:15:00PM +, Nicholas A. Bellinger wrote:
 

SNIP

 
  But I do think the kernel should carefully consider whether it wants to 
  support
  an interface like this.  This an extremely complicated ABI with a lot of 
  subtle
  details around state and compatibility.
 
  Are you absolutely confident that you can support a userspace application 
  that
  expects to get exactly the same response from all possible commands in 20 
  kernel
  versions from now?  Virtualization requires absolutely precise 
  compatibility in
  terms of bugs and features.  This is probably not something the TCM stack 
  has
  had to consider yet.
 
 
  We most certainly have thought about long term userspace compatibility
  with TCM.  Our userspace code (that's now available in all major
  distros) is completely forward-compatible with new fabric modules such
  as tcm_vhost.  No update required.
 
 I'm not sure we're talking about the same thing when we say compatibility.
 
 I'm not talking about the API.  I'm talking about the behavior of the 
 commands 
 that tcm_vhost supports.
 

OK, I understand what your getting at now..

 If you add support for a new command, you need to provide userspace a way to 
 disable this command.  If you change what gets reported for VPD, you need to 
 provide userspace a way to make VPD look like what it did in a previous 
 version.
 
 Basically, you need to be able to make a TCM device behave 100% the same as 
 it 
 did in an older version of the kernel.
 
 This is unique to virtualization due to live migration.  If you migrate from 
 a 
 3.6 kernel to a 3.8 kernel, you need to make sure that the 3.8 kernel's TCM 
 device behaves exactly like the 3.6 kernel because the guest that is 
 interacting 
 with it does not realize that live migration happened.
 
 Yes, you can add knobs via configfs to control this behavior, but I think the 
 question is, what's the plan for this?
 

So we already allow for some types of CDBs emulation to be toggled via
backend device attributes:

root@tifa:/usr/src/target-pending.git# tree 
/sys/kernel/config/target/core/iblock_2/fioa/attrib/
/sys/kernel/config/target/core/iblock_2/fioa/attrib/
├── block_size
├── emulate_dpo
├── emulate_fua_read
├── emulate_fua_write
├── emulate_rest_reord
├── emulate_tas
├── emulate_tpu
├── emulate_tpws
├── emulate_ua_intlck_ctrl
├── emulate_write_cache
├── enforce_pr_isids

SNIP

Some things like SPC-3 persistent reservations + implict/explict ALUA
multipath currently can't be disabled, but adding two more backend
attributes to disable/enable this logic individual is easy enough to do.

So that said, I don't have a problem with adding the necessary device
attributes to limit what type of CDBs a backend device is capable of
processing.

Trying to limiting this per-guest (instead of per-device) is where
things get a little more tricky..

 BTW, I think this is a good thing to cover in 
 Documentation/vhost/tcm_vhost.txt. 
   I think that's probably the only change that's needed here.
 

Sure, but I'll need to know what else that you'd like to optionally
restrict it terms of CDB processing that's not already there..

Thanks for your feedback!

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: scsi_lib: fix scsi_io_completion's SG_IO error propagation

2012-07-18 Thread Mike Snitzer
Hi James,

Seems this one slipped through the cracks.  Please review/apply.

Thanks,
Mike

On Thu, May 31 2012 at  4:33pm -0400,
Moger, Babu babu.mo...@netapp.com wrote:

 Thanks Mike. Looks good.
 
  -Original Message-
  From: Mike Snitzer [mailto:snit...@redhat.com]
  Sent: Thursday, May 31, 2012 3:24 PM
  To: Paolo Bonzini
  Cc: linux-scsi@vger.kernel.org; Moger, Babu; h...@suse.de;
  micha...@cs.wisc.edu; james.bottom...@hansenpartnership.com;
  sta...@vger.kernel.org.#.3.4
  Subject: Re: scsi_lib: fix scsi_io_completion's SG_IO error propagation
  
  On Thu, May 31 2012 at  3:51pm -0400,
  Paolo Bonzini pbonz...@redhat.com wrote:
  
   Il 31/05/2012 21:05, Mike Snitzer ha scritto:
The following v3.4-rc1 commit unmasked an existing bug in
scsi_io_completion's SG_IO error handling:
47ac56d [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a
  permanent TARGET_ERROR
   
Given that certain ILLEGAL_REQUEST are now properly categorized as
TARGET_ERROR the host_byte is being set (before host_byte wasn't ever
set for these ILLEGAL_REQUEST).
   
In scsi_io_completion, initialize req-errors with cmd-result _after_
the SG_IO block that calls __scsi_error_from_host_byte (which may
modify the host_byte).
   
Before this fix:
   
cdb to send: 12 01 01 00 00 00
ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00],
mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=2, flags=0,
status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b,
00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0x10,
driver_status=0x8, resid=0, duration=0, info=0x1}) = 0
SCSI Status: Check Condition
   
Sense Information:
sense buffer empty
   
After:
   
cdb to send: 12 01 01 00 00 00
ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00],
mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=2, flags=0,
status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b,
00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0,
driver_status=0x8, resid=0, duration=0, info=0x1}) = 0
SCSI Status: Check Condition
   
Sense Information:
 Fixed format, current;  Sense key: Illegal Request
 Additional sense: Invalid field in cdb
 Raw sense data (in hex):
70 00 05 00 00 00 00 0b  00 00 00 00 24 00 00 00
00 00 00
   
Signed-off-by: Mike Snitzer snit...@redhat.com
Reviewed-by: Babu Moger babu.mo...@netapp.com
  
   Reported-by: Paolo Bonzini pbonz...@redhat.com
   Tested-by: Paolo Bonzini pbonz...@redhat.com
  
  Yes, thanks for backfilling the Reported-by and Tested-by Paolo!
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target: Allow for target_submit_cmd() returning errors

2012-07-18 Thread Nicholas A. Bellinger
On Wed, 2012-07-18 at 11:10 +0200, Sebastian Andrzej Siewior wrote:
 On 07/18/2012 02:05 AM, Nicholas A. Bellinger wrote:
  index 02ace18..9ddf315 100644
  --- a/drivers/usb/gadget/tcm_usb_gadget.c
  +++ b/drivers/usb/gadget/tcm_usb_gadget.c
  @@ -1060,7 +1060,10 @@ static void usbg_cmd_work(struct work_struct *work)
 tpg = cmd-fu-tpg;
 tv_nexus = tpg-tpg_nexus;
 dir = get_cmd_dir(cmd-cmd_buf);
  -  if (dir  0) {
  +  if (dir  0 ||
  +  target_submit_cmd(se_cmd, tv_nexus-tvn_se_sess,
  +cmd-cmd_buf, cmd-sense_iu.sense, 
  cmd-unpacked_lun,
  +0, cmd-prio_attr, dir, TARGET_SCF_UNKNOWN_SIZE)) 
  {
 transport_init_se_cmd(se_cmd,
 tv_nexus-tvn_se_sess-se_tpg-se_tpg_tfo,
 tv_nexus-tvn_se_sess, cmd-data_len, DMA_NONE,
  @@ -1069,12 +1072,7 @@ static void usbg_cmd_work(struct work_struct *work)
 transport_send_check_condition_and_sense(se_cmd,
 TCM_UNSUPPORTED_SCSI_OPCODE, 1);
 usbg_cleanup_cmd(cmd);
  -  return;
 }
  -
  -  target_submit_cmd(se_cmd, tv_nexus-tvn_se_sess,
  -  cmd-cmd_buf, cmd-sense_iu.sense, cmd-unpacked_lun,
  -  0, cmd-prio_attr, dir, TARGET_SCF_UNKNOWN_SIZE);
}
 
static int usbg_submit_command(struct f_uas *fu,
 
  Looking at this part again target_submit_cmd() has been moved ahead of
  target_init_se_cmd() in the exception path, which ends up getting called
  twice during a target_get_sess_cmd() failure..
 
  It looks like usbg_cmd_work() + bot_cmd_work() will need some work if
  they intends to use a non zero failure to signal exception here, without
  invoking a CHECK_CONDITION and sense.  Actually, I'm not even sure
  sending a CHECK_CONDITION here after the target_submit_cmd() is going to
  work for other fabrics drivers, but it appears to work with usb-gadget.
  (Sebastian..?)
 
 For UAS the status/ sense URB is sent right away (without any data) and
 this worked the last time I tested.

nod, thanks for the re-clarification on this..  ;)

 For BOT things are a little complicated. What I do is to send empty
 data (to fill the data pipe) and send a bad status so the other side
 learns that the transfer failed somehow. The sense information is lost.
 What I should do is to queue this sense data for the next MODE_SENSE
 request which will be coming soon. Right now WinXP repeats a failed
 command thrice if MODE_SENSE returns no error. This is on my to-fix
 list…
 

Ok, so I think the special case for usb-gadget where is able to call
transport_send_check_condition_and_sense() should be OK for the single
target_get_sess_cmd() - target_submit_cmd() failure case..

This is not going to be safe for just about every other fabric AFAICT,
so I think we need a comment here to describe that fact..

  So for the moment, lets fix the double se_cmd init and make things a
  little easier to read..   Sebastian, please have a look and double check
  that the (dir  0) + target_submit_cmd() failures cases are both going
  to work as expected whilst sending a CHECK_CONDITION response.
 
 The sense code should only be sent once and
 transport_send_check_condition_and_sense() sets
 SCF_SENT_CHECK_CONDITION and checks for it so doing it twice should not
 do any harm right? But grep does not say when this flag gets removed.
 

Correct.  The SCF_SENT_CHECK_CONDITION bit does not get removed ahead of
fabric descriptor release.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] target fixes for v3.5

2012-07-18 Thread Nicholas A. Bellinger
Hi Linus,

Here are the last handful of target fixes that have come in recently
that are headed for-3.5 final.  Please go ahead and pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git master

This includes a bugfix from MDR to address a NULL pointer OOPs with FCoE
aborts, along with a WRITE_SAME emulation bugfix for NOLB=0 cases +
persistent reservation return cleanups from Roland.

All three patches are CC'ed to stable.

Thank you!

--nab

Mark Rustad (1):
  tcm_fc: Fix crash seen with aborts and large reads

Roland Dreier (2):
  target: Clean up returning errors in PR handling code
  target: Fix range calculation in WRITE SAME emulation when num blocks
== 0

 drivers/target/target_core_cdb.c |2 +-
 drivers/target/target_core_pr.c  |7 ---
 drivers/target/tcm_fc/tfc_cmd.c  |2 ++
 3 files changed, 7 insertions(+), 4 deletions(-)



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html