Re: WARNING: CPU: 1 PID: 495 at mm/slab_common.c:69 kmem_cache_create+0x1a9/0x330()

2014-07-18 Thread poma

On 18.07.2014 22:16, poma wrote:

On 18.07.2014 22:07, James Bottomley wrote:

On Fri, 2014-07-18 at 22:01 +0200, poma wrote:

On 18.07.2014 16:17, Christoph Hellwig wrote:

On Fri, Jul 18, 2014 at 05:21:04PM +0400, Vladimir Davydov wrote:

Slab warns, because the name of the cache being created contains spaces.
The "bad" cache is created by scsi_get_host_cmd_pool. Its name
(pool->cmd_name) is initialized by scsi_alloc_host_cmd_pool as follows:

pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);

So, if hostt->name contains spaces, the cache name will also contain
spaces and we'll get the warning. And hostt->name can contain spaces,
e.g. virtscsi_host_template_single.name="Virtio SCSI HBA".


Or might not even be present.  I'll send a patch to replace it with
->proc_name, which must not contain spaces and is generally shorter
as well.



Is this what you thought?


No, he means this, if you want to try it.

James

---

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe..eb07a9b 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -368,8 +368,8 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
if (!pool)
return NULL;

-   pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);
-   pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->name);
+   pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->proc_name);
+   pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->proc_name);
if (!pool->cmd_name || !pool->sense_name) {
scsi_free_host_cmd_pool(pool);
return NULL;




Man, I just now read it correctly - "So, if hostt->name contains spaces".
Thanks.

I'll be back.



Yea, I can confirm it works! :)
No warnings.


poma


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: CPU: 1 PID: 495 at mm/slab_common.c:69 kmem_cache_create+0x1a9/0x330()

2014-07-18 Thread poma

On 18.07.2014 22:07, James Bottomley wrote:

On Fri, 2014-07-18 at 22:01 +0200, poma wrote:

On 18.07.2014 16:17, Christoph Hellwig wrote:

On Fri, Jul 18, 2014 at 05:21:04PM +0400, Vladimir Davydov wrote:

Slab warns, because the name of the cache being created contains spaces.
The "bad" cache is created by scsi_get_host_cmd_pool. Its name
(pool->cmd_name) is initialized by scsi_alloc_host_cmd_pool as follows:

pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);

So, if hostt->name contains spaces, the cache name will also contain
spaces and we'll get the warning. And hostt->name can contain spaces,
e.g. virtscsi_host_template_single.name="Virtio SCSI HBA".


Or might not even be present.  I'll send a patch to replace it with
->proc_name, which must not contain spaces and is generally shorter
as well.



Is this what you thought?


No, he means this, if you want to try it.

James

---

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe..eb07a9b 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -368,8 +368,8 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
if (!pool)
return NULL;

-   pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);
-   pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->name);
+   pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->proc_name);
+   pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->proc_name);
if (!pool->cmd_name || !pool->sense_name) {
scsi_free_host_cmd_pool(pool);
return NULL;




Man, I just now read it correctly - "So, if hostt->name contains spaces".
Thanks.

I'll be back.



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: CPU: 1 PID: 495 at mm/slab_common.c:69 kmem_cache_create+0x1a9/0x330()

2014-07-18 Thread James Bottomley
On Fri, 2014-07-18 at 22:01 +0200, poma wrote:
> On 18.07.2014 16:17, Christoph Hellwig wrote:
> > On Fri, Jul 18, 2014 at 05:21:04PM +0400, Vladimir Davydov wrote:
> >> Slab warns, because the name of the cache being created contains spaces.
> >> The "bad" cache is created by scsi_get_host_cmd_pool. Its name
> >> (pool->cmd_name) is initialized by scsi_alloc_host_cmd_pool as follows:
> >>
> >>pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);
> >>
> >> So, if hostt->name contains spaces, the cache name will also contain
> >> spaces and we'll get the warning. And hostt->name can contain spaces,
> >> e.g. virtscsi_host_template_single.name="Virtio SCSI HBA".
> >
> > Or might not even be present.  I'll send a patch to replace it with
> > ->proc_name, which must not contain spaces and is generally shorter
> > as well.
> >
> 
> Is this what you thought?

No, he means this, if you want to try it.

James

---

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 88d46fe..eb07a9b 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -368,8 +368,8 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
if (!pool)
return NULL;
 
-   pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);
-   pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->name);
+   pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->proc_name);
+   pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->proc_name);
if (!pool->cmd_name || !pool->sense_name) {
scsi_free_host_cmd_pool(pool);
return NULL;


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: CPU: 1 PID: 495 at mm/slab_common.c:69 kmem_cache_create+0x1a9/0x330()

2014-07-18 Thread poma

On 18.07.2014 16:17, Christoph Hellwig wrote:

On Fri, Jul 18, 2014 at 05:21:04PM +0400, Vladimir Davydov wrote:

Slab warns, because the name of the cache being created contains spaces.
The "bad" cache is created by scsi_get_host_cmd_pool. Its name
(pool->cmd_name) is initialized by scsi_alloc_host_cmd_pool as follows:

pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);

So, if hostt->name contains spaces, the cache name will also contain
spaces and we'll get the warning. And hostt->name can contain spaces,
e.g. virtscsi_host_template_single.name="Virtio SCSI HBA".


Or might not even be present.  I'll send a patch to replace it with
->proc_name, which must not contain spaces and is generally shorter
as well.



Is this what you thought?
@@ -148,20 +148,20 @@
struct kmem_cache   *cmd_slab;
struct kmem_cache   *sense_slab;
unsigned intusers;
-   char*cmd_name;
+   char*proc_name;
char*sense_name;
unsigned intslab_flags;
gfp_t   gfp_mask;
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_pool = {

-   .cmd_name   = "scsi_cmd_cache",
+   .proc_name  = "scsi_cmd_cache",
.sense_name = "scsi_sense_cache",
.slab_flags = SLAB_HWCACHE_ALIGN,
 };
 
 static struct scsi_host_cmd_pool scsi_cmd_dma_pool = {

-   .cmd_name   = "scsi_cmd_cache(DMA)",
+   .proc_name  = "scsi_cmd_cache(DMA)",
.sense_name = "scsi_sense_cache(DMA)",
.slab_flags = SLAB_HWCACHE_ALIGN|SLAB_CACHE_DMA,
.gfp_mask   = __GFP_DMA,
@@ -354,7 +354,7 @@
 scsi_free_host_cmd_pool(struct scsi_host_cmd_pool *pool)
 {
kfree(pool->sense_name);
-   kfree(pool->cmd_name);
+   kfree(pool->proc_name);
kfree(pool);
 }
 
@@ -368,9 +368,9 @@

if (!pool)
return NULL;
 
-	pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);

+   pool->proc_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);
pool->sense_name = kasprintf(GFP_KERNEL, "%s_sense", hostt->name);
-   if (!pool->cmd_name || !pool->sense_name) {
+   if (!pool->proc_name || !pool->sense_name) {
scsi_free_host_cmd_pool(pool);
return NULL;
}
@@ -403,7 +403,7 @@
}
 
 	if (!pool->users) {

-   pool->cmd_slab = kmem_cache_create(pool->cmd_name, cmd_size, 0,
+   pool->cmd_slab = kmem_cache_create(pool->proc_name, cmd_size, 0,
   pool->slab_flags, NULL);
if (!pool->cmd_slab)
goto out_free_pool;


however ain't workin.


poma


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/25] Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread James Bottomley
On Fri, 2014-07-18 at 11:17 -0700, Greg KH wrote:
> On Fri, Jul 18, 2014 at 09:54:32AM -0700, James Bottomley wrote:
> > On Fri, 2014-07-18 at 09:43 -0700, Greg KH wrote:
> > > On Fri, Jul 18, 2014 at 12:22:13PM -0400, John W. Linville wrote:
> > > > On Fri, Jul 18, 2014 at 05:26:47PM +0200, Benoit Taine wrote:
> > > > > We should prefer `const struct pci_device_id` over
> > > > > `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines.
> > > > > This issue was reported by checkpatch.
> > > > 
> > > > Honestly, I prefer the macro -- it stands-out more.  Maybe the style
> > > > guidelines and/or checkpatch should change instead?
> > > 
> > > The macro is horrid, no other bus has this type of thing just to save a
> > > few characters in typing
> > 
> > OK, so this is the macro:
> > 
> > #define DEFINE_PCI_DEVICE_TABLE(_table) \
> > const struct pci_device_id _table[]
> > 
> > Could you explain what's so horrible?
> > 
> > The reason it's useful today is that people forget the const (and
> > sometimes the [] making it a true table instead of a pointer).  If you
> > use the DEFINE_PCI_DEVICE_TABLE macro, the compile breaks if you use it
> > wrongly (good) and you automatically get the correct annotations.
> 
> We have almost 1000 more uses of the non-macro version than the "macro"
> version in the kernel today:
> $ git grep -w DEFINE_PCI_DEVICE_TABLE | wc -l
> 262
> $ git grep "const struct pci_device_id" | wc -l
> 1254
> 
> My big complaint is that we need to be consistant, either pick one or
> the other and stick to it.  As the macro is the least used, it's easiest
> to fix up, and it also is more consistant with all other kernel
> subsystems which do not have such a macro.

I've a weak preference for consistency, but not at the expense of
hundreds of patches churning the kernel to remove an innocuous macro.
Churn costs time and effort.

> As there is no need for the __init macro mess anymore, there's no real
> need for the DEFINE_PCI_DEVICE_TABLE macro either.  I think checkpatch
> will catch the use of non-const users for the id table already today, it
> catches lots of other uses like this already.
> 
> > > , so why should PCI be "special" in this regard
> > > anymore?
> > 
> > I think the PCI usage dwarfs most other bus types now, so you could turn
> > the question around.  However, I don't think majority voting is a good
> > guide to best practise; lets debate the merits for their own sake.
> 
> Not really "dwarf", USB is close with over 700 such structures:
> $ git grep "const struct usb_device_id" | wc -l
> 725
> 
> And i2c is almost just as big as PCI:
> $ git grep "const struct i2c_device_id" | wc -l
> 1223
> 
> So again, this macro is not consistent with the majority of PCI drivers,
> nor with any other type of "device id" declaration in the kernel, which
> is why I feel it should be removed.
> 
> And finally, the PCI documentation itself says to not use this macro, so
> this isn't a "new" thing.  From Documentation/PCI/pci.txt:
> 
>   The ID table is an array of struct pci_device_id entries ending with an
>   all-zero entry.  Definitions with static const are generally preferred.
>   Use of the deprecated macro DEFINE_PCI_DEVICE_TABLE should be avoided.
> 
> That wording went into the file last December, when we last talked about
> this and everyone in that discussion agreed to remove the macro for the
> above reasons.
> 
> Consistency matters.

In this case, I don't think it does that much ... a cut and paste either
way (from a macro or non-macro based driver) yields correct code.  Since
there's no bug here and no apparent way to misuse the macro, why bother?

Anyway, it's PCI code ... let the PCI maintainer decide.  However, if he
does want this do it as one big bang patch via either the PCI or Trivial
tree (latter because Jiří has experience doing this, but the former
might be useful so the decider feels the pain ...)

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/25] Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread Greg KH
On Fri, Jul 18, 2014 at 09:54:32AM -0700, James Bottomley wrote:
> On Fri, 2014-07-18 at 09:43 -0700, Greg KH wrote:
> > On Fri, Jul 18, 2014 at 12:22:13PM -0400, John W. Linville wrote:
> > > On Fri, Jul 18, 2014 at 05:26:47PM +0200, Benoit Taine wrote:
> > > > We should prefer `const struct pci_device_id` over
> > > > `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines.
> > > > This issue was reported by checkpatch.
> > > 
> > > Honestly, I prefer the macro -- it stands-out more.  Maybe the style
> > > guidelines and/or checkpatch should change instead?
> > 
> > The macro is horrid, no other bus has this type of thing just to save a
> > few characters in typing
> 
> OK, so this is the macro:
> 
> #define DEFINE_PCI_DEVICE_TABLE(_table) \
>   const struct pci_device_id _table[]
> 
> Could you explain what's so horrible?
> 
> The reason it's useful today is that people forget the const (and
> sometimes the [] making it a true table instead of a pointer).  If you
> use the DEFINE_PCI_DEVICE_TABLE macro, the compile breaks if you use it
> wrongly (good) and you automatically get the correct annotations.

We have almost 1000 more uses of the non-macro version than the "macro"
version in the kernel today:
$ git grep -w DEFINE_PCI_DEVICE_TABLE | wc -l
262
$ git grep "const struct pci_device_id" | wc -l
1254

My big complaint is that we need to be consistant, either pick one or
the other and stick to it.  As the macro is the least used, it's easiest
to fix up, and it also is more consistant with all other kernel
subsystems which do not have such a macro.

As there is no need for the __init macro mess anymore, there's no real
need for the DEFINE_PCI_DEVICE_TABLE macro either.  I think checkpatch
will catch the use of non-const users for the id table already today, it
catches lots of other uses like this already.

> > , so why should PCI be "special" in this regard
> > anymore?
> 
> I think the PCI usage dwarfs most other bus types now, so you could turn
> the question around.  However, I don't think majority voting is a good
> guide to best practise; lets debate the merits for their own sake.

Not really "dwarf", USB is close with over 700 such structures:
$ git grep "const struct usb_device_id" | wc -l
725

And i2c is almost just as big as PCI:
$ git grep "const struct i2c_device_id" | wc -l
1223

So again, this macro is not consistent with the majority of PCI drivers,
nor with any other type of "device id" declaration in the kernel, which
is why I feel it should be removed.

And finally, the PCI documentation itself says to not use this macro, so
this isn't a "new" thing.  From Documentation/PCI/pci.txt:

The ID table is an array of struct pci_device_id entries ending with an
all-zero entry.  Definitions with static const are generally preferred.
Use of the deprecated macro DEFINE_PCI_DEVICE_TABLE should be avoided.

That wording went into the file last December, when we last talked about
this and everyone in that discussion agreed to remove the macro for the
above reasons.

Consistency matters.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread Douglas Gilbert

On 14-07-18 07:41 AM, James Bottomley wrote:

On Fri, 2014-07-18 at 17:17 +, Elliott, Robert (Server Storage)
wrote:




From: James Bottomley [mailto:jbottom...@parallels.com]

On Fri, 2014-07-18 at 00:51 +, Elliott, Robert (Server Storage)
wrote:

...


Also, in both sd_setup_flush_cmnd and sd_sync_cache:
   cmd->cmnd[0] = SYNCHRONIZE_CACHE;
   cmd->cmd_len = 10;

SYNCHRONIZE CACHE (16) should be favored over SYNCHRONIZE
CACHE (10) unless SYNCHRONIZE CACHE (10) is not supported.


(sorry - meant "unless ... 16 is not supported")


Yes, I guessed that?


For what reason.  We usually go for the safe alternatives, which is 10
byte commands because they have the widest testing and greatest level of
support.  We don't do range flushes currently, so there doesn't seem to
be a practical different.  If we did support range flushes, we'd likely
only use SC(16) on >2TB devices.

James


A goal of the simplified SCSI feature set idea is to drop all the
short CDBs that have larger, more capable equivalents - don't carry
READ 6/10/12/16 and SYNCHRONIZE CACHE 10/16, just keep the 16-byte
versions.  With modern serial IU-based protocols, short CDBs don't
save any transfer time.  This will simplify design and testing on
both initiator and target sides. Competing command sets like NVMe
rightly point out that SCSI has too much legacy baggage - all you
need for IO is one READ, one WRITE, and one FLUSH command.


But that's not relevant to us.  This is the problem of practical vs
standards approaches.  We have to support older and buggy devices.  Most
small USB storage sticks die if they see 16 byte CDB commands because
their interpreters.  The more "legacy baggage" the standards committee
dumps, the greater the number of heuristics OSs have to have to cope
with the plethora of odd devices.


That's why SBC-3 ended up with these warning notes for all the
non-16 byte CDBs:

NOTE 15 - Migration from the SYNCHRONIZE CACHE (10) command to
the SYNCHRONIZE CACHE (16) command is recommended for all
implementations.

If the LBA field in SYNCHRONIZE CACHE went obsolete, then maybe
SYNCHRONIZE CACHE (10) would be kept instead of (16), but that
field is still present.  So, (16) is the likely survivor.


OK, but look at it from our point of view:  The reply above justifies
why we prefer 10 byte CDBs over 16.  If the standards body ever did
remove SC(10) completely, then we'd have to have yet another heuristic
to recognise devices that don't support SC(10), but until that day,
using SC(10) alone works in all cases, so is the better path for the OS.

If you could, please get the standards body to recognise that the more
command churn they introduce (in the name of rationalisation or
whatever), the more problems they introduce for Operating Systems and
the more likelihood (because of different people reading different
revisions of standards) that we end up with compliance bugs in devices.


From the term: "feature sets" I'm guessing T10 will follow what T13
does and have something like a VPD page with descriptors of feature
sets supported. Each set has mandatory and optional commands,
perhaps a similar categorization of mode and VPD pages as well. Such
a "clean slate" for SCSI would make it simpler in the future, at
least for what to put on the fast path. Perhaps some legacy
support could be pushed to the user space.

For many technical areas "legacy" is a derogatory term, but
not necessarily for storage!

Doug Gilbert

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/25] Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread Joe Perches
On Fri, 2014-07-18 at 09:43 -0700, Greg KH wrote:
> On Fri, Jul 18, 2014 at 12:22:13PM -0400, John W. Linville wrote:
> > On Fri, Jul 18, 2014 at 05:26:47PM +0200, Benoit Taine wrote:
> > > We should prefer `const struct pci_device_id` over
> > > `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines.
> > > This issue was reported by checkpatch.
> >  scripts/checkpatch.pl | 4 ++--
> > Honestly, I prefer the macro -- it stands-out more.  Maybe the style
> > guidelines and/or checkpatch should change instead?
> 
> The macro is horrid, no other bus has this type of thing just to save a
> few characters in typing, so why should PCI be "special" in this regard
> anymore?

I think it doesn't matter much.

The PCI_DEVICE and PCI_VDEVICE macro uses are somewhat similar
and are frequently used with PCI_DEVICE_TABLE, so there's some
commonality there.

The checkpatch message could be made --strict/CHK instead of
WARN so most people would never see it.

Of course it could be removed altogether too.  I don't care.
---
(suggested patch is for -next)

 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index dc72a9b..754fbf2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3018,8 +3018,8 @@ sub process {
 
 # check for uses of DEFINE_PCI_DEVICE_TABLE
if ($line =~ /\bDEFINE_PCI_DEVICE_TABLE\s*\(\s*(\w+)\s*\)\s*=/) 
{
-   if (WARN("DEFINE_PCI_DEVICE_TABLE",
-"Prefer struct pci_device_id over deprecated 
DEFINE_PCI_DEVICE_TABLE\n" . $herecurr) &&
+   if (CHK("DEFINE_PCI_DEVICE_TABLE",
+   "Prefer struct pci_device_id over deprecated 
DEFINE_PCI_DEVICE_TABLE\n" . $herecurr) &&
$fix) {
$fixed[$fixlinenr] =~ 
s/\b(?:static\s+|)DEFINE_PCI_DEVICE_TABLE\s*\(\s*(\w+)\s*\)\s*=\s*/static const 
struct pci_device_id $1\[\] = /;
}


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread James Bottomley
On Fri, 2014-07-18 at 17:17 +, Elliott, Robert (Server Storage)
wrote:
> 
> 
> > From: James Bottomley [mailto:jbottom...@parallels.com]
> > 
> > On Fri, 2014-07-18 at 00:51 +, Elliott, Robert (Server Storage)
> > wrote:
> ...
> > >
> > > Also, in both sd_setup_flush_cmnd and sd_sync_cache:
> > >   cmd->cmnd[0] = SYNCHRONIZE_CACHE;
> > >   cmd->cmd_len = 10;
> > >
> > > SYNCHRONIZE CACHE (16) should be favored over SYNCHRONIZE
> > > CACHE (10) unless SYNCHRONIZE CACHE (10) is not supported.
> 
> (sorry - meant "unless ... 16 is not supported")

Yes, I guessed that?

> > For what reason.  We usually go for the safe alternatives, which is 10
> > byte commands because they have the widest testing and greatest level of
> > support.  We don't do range flushes currently, so there doesn't seem to
> > be a practical different.  If we did support range flushes, we'd likely
> > only use SC(16) on >2TB devices.
> > 
> > James
> 
> A goal of the simplified SCSI feature set idea is to drop all the
> short CDBs that have larger, more capable equivalents - don't carry
> READ 6/10/12/16 and SYNCHRONIZE CACHE 10/16, just keep the 16-byte 
> versions.  With modern serial IU-based protocols, short CDBs don't 
> save any transfer time.  This will simplify design and testing on
> both initiator and target sides. Competing command sets like NVMe 
> rightly point out that SCSI has too much legacy baggage - all you 
> need for IO is one READ, one WRITE, and one FLUSH command.

But that's not relevant to us.  This is the problem of practical vs
standards approaches.  We have to support older and buggy devices.  Most
small USB storage sticks die if they see 16 byte CDB commands because
their interpreters.  The more "legacy baggage" the standards committee
dumps, the greater the number of heuristics OSs have to have to cope
with the plethora of odd devices.

> That's why SBC-3 ended up with these warning notes for all the
> non-16 byte CDBs:
> 
>   NOTE 15 - Migration from the SYNCHRONIZE CACHE (10) command to
>   the SYNCHRONIZE CACHE (16) command is recommended for all
>   implementations.
> 
> If the LBA field in SYNCHRONIZE CACHE went obsolete, then maybe
> SYNCHRONIZE CACHE (10) would be kept instead of (16), but that
> field is still present.  So, (16) is the likely survivor.

OK, but look at it from our point of view:  The reply above justifies
why we prefer 10 byte CDBs over 16.  If the standards body ever did
remove SC(10) completely, then we'd have to have yet another heuristic
to recognise devices that don't support SC(10), but until that day,
using SC(10) alone works in all cases, so is the better path for the OS.

If you could, please get the standards body to recognise that the more
command churn they introduce (in the name of rationalisation or
whatever), the more problems they introduce for Operating Systems and
the more likelihood (because of different people reading different
revisions of standards) that we end up with compliance bugs in devices.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread Elliott, Robert (Server Storage)


> From: James Bottomley [mailto:jbottom...@parallels.com]
> 
> On Fri, 2014-07-18 at 00:51 +, Elliott, Robert (Server Storage)
> wrote:
...
> >
> > Also, in both sd_setup_flush_cmnd and sd_sync_cache:
> >   cmd->cmnd[0] = SYNCHRONIZE_CACHE;
> >   cmd->cmd_len = 10;
> >
> > SYNCHRONIZE CACHE (16) should be favored over SYNCHRONIZE
> > CACHE (10) unless SYNCHRONIZE CACHE (10) is not supported.

(sorry - meant "unless ... 16 is not supported")

> For what reason.  We usually go for the safe alternatives, which is 10
> byte commands because they have the widest testing and greatest level of
> support.  We don't do range flushes currently, so there doesn't seem to
> be a practical different.  If we did support range flushes, we'd likely
> only use SC(16) on >2TB devices.
> 
> James

A goal of the simplified SCSI feature set idea is to drop all the
short CDBs that have larger, more capable equivalents - don't carry
READ 6/10/12/16 and SYNCHRONIZE CACHE 10/16, just keep the 16-byte 
versions.  With modern serial IU-based protocols, short CDBs don't 
save any transfer time.  This will simplify design and testing on
both initiator and target sides. Competing command sets like NVMe 
rightly point out that SCSI has too much legacy baggage - all you 
need for IO is one READ, one WRITE, and one FLUSH command.

That's why SBC-3 ended up with these warning notes for all the
non-16 byte CDBs:

NOTE 15 - Migration from the SYNCHRONIZE CACHE (10) command to
the SYNCHRONIZE CACHE (16) command is recommended for all
implementations.

If the LBA field in SYNCHRONIZE CACHE went obsolete, then maybe
SYNCHRONIZE CACHE (10) would be kept instead of (16), but that
field is still present.  So, (16) is the likely survivor.

---
Rob ElliottHP Server Storage




Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread h...@infradead.org
On Fri, Jul 18, 2014 at 10:12:38AM -0700, h...@infradead.org wrote:
> This is what I plan to put in after it passes basic testing:

And that one was on top of my previous version.  One that applies
against core-for-3.17 below:

---
>From 8a79783e5f72ec034a724e16c1f46604bd97bf68 Mon Sep 17 00:00:00 2001
From: "K. Y. Srinivasan" 
Date: Fri, 18 Jul 2014 17:11:27 +0200
Subject: sd: fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O
 timeout

Commit ID: 7e660100d85af860e7ad763202fff717adcdaacd added code to derive the
FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use the
basic I/O timeout of the device. Fix this bug.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: James Bottomley 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 377a520..9ffb393 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -880,7 +880,7 @@ static int sd_setup_flush_cmnd(struct scsi_cmnd *cmd)
cmd->transfersize = 0;
cmd->allowed = SD_MAX_RETRIES;
 
-   rq->timeout *= SD_FLUSH_TIMEOUT_MULTIPLIER;
+   rq->timeout = rq->q->rq_timeout * SD_FLUSH_TIMEOUT_MULTIPLIER;
return BLKPREP_OK;
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread h...@infradead.org
This is what I plan to put in after it passes basic testing:

---
>From bb617c9465b839d70ecbbc69002a20ccf5f935bd Mon Sep 17 00:00:00 2001
From: "K. Y. Srinivasan" 
Date: Fri, 18 Jul 2014 19:12:58 +0200
Subject: sd: fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O
 timeout

Commit ID: 7e660100d85af860e7ad763202fff717adcdaacd added code to derive the
FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use the
basic I/O timeout of the device. Fix this bug.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: James Bottomley 
Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bef4e78..9ffb393 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -880,7 +880,7 @@ static int sd_setup_flush_cmnd(struct scsi_cmnd *cmd)
cmd->transfersize = 0;
cmd->allowed = SD_MAX_RETRIES;
 
-   rq->timeout = SD_TIMEOUT * SD_FLUSH_TIMEOUT_MULTIPLIER;
+   rq->timeout = rq->q->rq_timeout * SD_FLUSH_TIMEOUT_MULTIPLIER;
return BLKPREP_OK;
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread KY Srinivasan


> -Original Message-
> From: James Bottomley [mailto:jbottom...@parallels.com]
> Sent: Friday, July 18, 2014 9:57 AM
> To: KY Srinivasan
> Cc: linux-ker...@vger.kernel.org; h...@infradead.org; a...@canonical.com;
> de...@linuxdriverproject.org; micha...@cs.wisc.edu; ax...@kernel.dk;
> linux-scsi@vger.kernel.org; oher...@suse.com;
> gre...@linuxfoundation.org; jasow...@redhat.com
> Subject: Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT
> from the basic I/O timeout
> 
> On Fri, 2014-07-18 at 16:44 +, KY Srinivasan wrote:
> >
> > > -Original Message-
> > > From: Christoph Hellwig (h...@infradead.org)
> > > [mailto:h...@infradead.org]
> > > Sent: Friday, July 18, 2014 8:11 AM
> > > To: KY Srinivasan
> > > Cc: Jens Axboe; James Bottomley; micha...@cs.wisc.edu; Christoph
> > > Hellwig (h...@infradead.org); linux-scsi@vger.kernel.org;
> > > gre...@linuxfoundation.org; jasow...@redhat.com; linux-
> > > ker...@vger.kernel.org; oher...@suse.com; a...@canonical.com;
> > > de...@linuxdriverproject.org
> > > Subject: Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the
> > > FLUSH_TIMEOUT from the basic I/O timeout
> > >
> > > On Thu, Jul 17, 2014 at 11:53:33PM +, KY Srinivasan wrote:
> > > > I still see this problem. There was talk of fixing it elsewhere.
> > >
> > > Well, what we have right not is entirely broken, given that the
> > > block layer doesn't initialize ->timeout on TYPE_FS requeuests.
> > >
> > > We either need to revert that initial commit or apply something like
> > > the attached patch as a quick fix.
> > I had sent this exact patch sometime back:
> >
> > http://www.spinics.net/lists/linux-scsi/msg75385.html
> 
> Actually, no you didn't.  The difference is in the derivation of the timeout.
> Christoph's patch is absolute in terms of SD_TIMEOUT; yours is relative to the
> queue timeout setting ... I thought there was a reason for preferring the
> relative version.

You are right; sorry about that. I think my version is better since we do want 
to base the
flush timeout relative to the basic timeout.

K. Y
> 
> James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread James Bottomley
On Fri, 2014-07-18 at 10:00 -0700, Christoph Hellwig wrote:
> On Wed, Jun 04, 2014 at 09:33:43AM -0700, K. Y. Srinivasan wrote:
> > Commit ID: 7e660100d85af860e7ad763202fff717adcdaacd added code to derive the
> > FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use 
> > the
> > basic I/O timeout of the device. Fix this bug.
> > 
> > Signed-off-by: K. Y. Srinivasan 
> 
> Looks good to me,
> 
> Reviewed-by: Christoph Hellwig 
> 
> (it will need some changes to apply to my tree, but I'm happy to do that
> if I can get another review).
> 
> James, does this look fine to you?

Yes:

Reviewed-by: James Bottomley 

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread Christoph Hellwig
On Wed, Jun 04, 2014 at 09:33:43AM -0700, K. Y. Srinivasan wrote:
> Commit ID: 7e660100d85af860e7ad763202fff717adcdaacd added code to derive the
> FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use the
> basic I/O timeout of the device. Fix this bug.
> 
> Signed-off-by: K. Y. Srinivasan 

Looks good to me,

Reviewed-by: Christoph Hellwig 

(it will need some changes to apply to my tree, but I'm happy to do that
if I can get another review).

James, does this look fine to you?

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread h...@infradead.org
On Fri, Jul 18, 2014 at 04:57:13PM +, James Bottomley wrote:
> Actually, no you didn't.  The difference is in the derivation of the
> timeout.  Christoph's patch is absolute in terms of SD_TIMEOUT; yours is
> relative to the queue timeout setting ... I thought there was a reason
> for preferring the relative version.

Yes, KYs version is better.  It takes the base timeout drivers set
on the request queue into account.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread James Bottomley
On Fri, 2014-07-18 at 16:44 +, KY Srinivasan wrote:
> 
> > -Original Message-
> > From: Christoph Hellwig (h...@infradead.org) [mailto:h...@infradead.org]
> > Sent: Friday, July 18, 2014 8:11 AM
> > To: KY Srinivasan
> > Cc: Jens Axboe; James Bottomley; micha...@cs.wisc.edu; Christoph Hellwig
> > (h...@infradead.org); linux-scsi@vger.kernel.org;
> > gre...@linuxfoundation.org; jasow...@redhat.com; linux-
> > ker...@vger.kernel.org; oher...@suse.com; a...@canonical.com;
> > de...@linuxdriverproject.org
> > Subject: Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT
> > from the basic I/O timeout
> > 
> > On Thu, Jul 17, 2014 at 11:53:33PM +, KY Srinivasan wrote:
> > > I still see this problem. There was talk of fixing it elsewhere.
> > 
> > Well, what we have right not is entirely broken, given that the block layer
> > doesn't initialize ->timeout on TYPE_FS requeuests.
> > 
> > We either need to revert that initial commit or apply something like the
> > attached patch as a quick fix.
> I had sent this exact patch sometime back:
> 
> http://www.spinics.net/lists/linux-scsi/msg75385.html

Actually, no you didn't.  The difference is in the derivation of the
timeout.  Christoph's patch is absolute in terms of SD_TIMEOUT; yours is
relative to the queue timeout setting ... I thought there was a reason
for preferring the relative version.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/25] Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread James Bottomley
On Fri, 2014-07-18 at 09:43 -0700, Greg KH wrote:
> On Fri, Jul 18, 2014 at 12:22:13PM -0400, John W. Linville wrote:
> > On Fri, Jul 18, 2014 at 05:26:47PM +0200, Benoit Taine wrote:
> > > We should prefer `const struct pci_device_id` over
> > > `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines.
> > > This issue was reported by checkpatch.
> > 
> > Honestly, I prefer the macro -- it stands-out more.  Maybe the style
> > guidelines and/or checkpatch should change instead?
> 
> The macro is horrid, no other bus has this type of thing just to save a
> few characters in typing

OK, so this is the macro:

#define DEFINE_PCI_DEVICE_TABLE(_table) \
const struct pci_device_id _table[]

Could you explain what's so horrible?

The reason it's useful today is that people forget the const (and
sometimes the [] making it a true table instead of a pointer).  If you
use the DEFINE_PCI_DEVICE_TABLE macro, the compile breaks if you use it
wrongly (good) and you automatically get the correct annotations.

> , so why should PCI be "special" in this regard
> anymore?

I think the PCI usage dwarfs most other bus types now, so you could turn
the question around.  However, I don't think majority voting is a good
guide to best practise; lets debate the merits for their own sake.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/25] Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread Greg KH
On Fri, Jul 18, 2014 at 12:22:13PM -0400, John W. Linville wrote:
> On Fri, Jul 18, 2014 at 05:26:47PM +0200, Benoit Taine wrote:
> > We should prefer `const struct pci_device_id` over
> > `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines.
> > This issue was reported by checkpatch.
> 
> Honestly, I prefer the macro -- it stands-out more.  Maybe the style
> guidelines and/or checkpatch should change instead?

The macro is horrid, no other bus has this type of thing just to save a
few characters in typing, so why should PCI be "special" in this regard
anymore?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread KY Srinivasan


> -Original Message-
> From: Christoph Hellwig (h...@infradead.org) [mailto:h...@infradead.org]
> Sent: Friday, July 18, 2014 8:11 AM
> To: KY Srinivasan
> Cc: Jens Axboe; James Bottomley; micha...@cs.wisc.edu; Christoph Hellwig
> (h...@infradead.org); linux-scsi@vger.kernel.org;
> gre...@linuxfoundation.org; jasow...@redhat.com; linux-
> ker...@vger.kernel.org; oher...@suse.com; a...@canonical.com;
> de...@linuxdriverproject.org
> Subject: Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT
> from the basic I/O timeout
> 
> On Thu, Jul 17, 2014 at 11:53:33PM +, KY Srinivasan wrote:
> > I still see this problem. There was talk of fixing it elsewhere.
> 
> Well, what we have right not is entirely broken, given that the block layer
> doesn't initialize ->timeout on TYPE_FS requeuests.
> 
> We either need to revert that initial commit or apply something like the
> attached patch as a quick fix.
I had sent this exact patch sometime back:

http://www.spinics.net/lists/linux-scsi/msg75385.html

K. Y


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/25] Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread John W. Linville
On Fri, Jul 18, 2014 at 05:26:47PM +0200, Benoit Taine wrote:
> We should prefer `const struct pci_device_id` over
> `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines.
> This issue was reported by checkpatch.

Honestly, I prefer the macro -- it stands-out more.  Maybe the style
guidelines and/or checkpatch should change instead?

John
-- 
John W. LinvilleSomeday the world will need a hero, and you
linvi...@tuxdriver.com  might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/25] Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread James Bottomley
On Fri, 2014-07-18 at 17:26 +0200, Benoit Taine wrote:
> We should prefer `const struct pci_device_id` over
> `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines.
> This issue was reported by checkpatch.

What kernel coding style?  checkpatch isn't the arbiter of style, if
that's the only problem.

The DEFINE_PCI macro was a well reasoned addition when it was added in
2008.  The problem was most people were getting the definition wrong.
When we converted away from CONFIG_HOTPLUG, having this DEFINE_ meant
that only one place needed changing instead of hundreds for PCI tables.

The reason people were getting the PCI table wrong was mostly the init
section specifiers which are now gone, but it has enough underlying
utility (mostly constification) that I don't think we'd want to churn
the kernel hugely to make a change to struct pci_table and then have to
start detecting and fixing misuses.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/25] BusLogic: Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread Benoit Taine
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet
kernel coding style guidelines. This issue was reported by checkpatch.

Signed-off-by: Benoit Taine 

---
Tested by compilation without errors.

 drivers/scsi/BusLogic.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/BusLogic.c b/drivers/scsi/BusLogic.c
index 972f817..64c7514 100644
--- a/drivers/scsi/BusLogic.c
+++ b/drivers/scsi/BusLogic.c
@@ -3893,7 +3893,7 @@ __setup("BusLogic=", blogic_setup);
  PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
{ }
 };*/
-static DEFINE_PCI_DEVICE_TABLE(blogic_pci_tbl) = {
+static const struct pci_device_id blogic_pci_tbl[] = {
{PCI_DEVICE(PCI_VENDOR_ID_BUSLOGIC, 
PCI_DEVICE_ID_BUSLOGIC_MULTIMASTER)},
{PCI_DEVICE(PCI_VENDOR_ID_BUSLOGIC, 
PCI_DEVICE_ID_BUSLOGIC_MULTIMASTER_NC)},
{PCI_DEVICE(PCI_VENDOR_ID_BUSLOGIC, PCI_DEVICE_ID_BUSLOGIC_FLASHPOINT)},

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread James Bottomley
On Fri, 2014-07-18 at 00:51 +, Elliott, Robert (Server Storage)
wrote:
> In sd_sync_cache:
>   rq->timeout *= SD_FLUSH_TIMEOUT_MULTIPLIER;
> 
> Regardless of the baseline for the multiplication, a magic 
> number of 2 is too arbitrary.  That might work for an
> individual drive, but could be far too short for a RAID
> controller that runs into worst case error handling for
> the drives to which it is flushing (e.g., if its cache
> is volatile and the drives all have recoverable errors
> during writes).  That time goes up with a bigger cache and 
> with more fragmented write data in the cache requiring
> more individual WRITE commands.

RAID devices with gigabytes of cache are usually battery backed and lie
about their cache type (they usually claim write through).  This
behaviour is exactly what we want because the flush is used to enforce
write barriers for filesystem consistency, so we only want to flush
volatile caches.

The rare case you cite, where the RAID device is volatile and having
difficulty flushing, we do want a failure because otherwise the
filesystem will become unusable and the system will live lock ...
barriers are sent down frequently under normal filesystem operation.

The SCSI standards committees have been struggling with defining and
detecting the difference between volatile and non-volatile cache and the
heuristics we'd have to use to avoid annoying USB devices with detecting
this don't look pretty.  We always zero the SYNC_NV bit anyway, so even
if the devices stopped lying, we'd be safe.

> A better value would be the Recommended Command Timeout field 
> value reported in the REPORT SUPPORTED OPERATION CODES command,
> if reported by the device server.  That is supposed to account
> for the worst case.
> 
> For cases where that is not reported, exposing the multiplier
> in sysfs would let the timeout for simple devices be set to
> smaller values than complex devices.
> 
> Also, in both sd_setup_flush_cmnd and sd_sync_cache:
>   cmd->cmnd[0] = SYNCHRONIZE_CACHE;
>   cmd->cmd_len = 10;
> 
> SYNCHRONIZE CACHE (16) should be favored over SYNCHRONIZE 
> CACHE (10) unless SYNCHRONIZE CACHE (10) is not supported.

For what reason.  We usually go for the safe alternatives, which is 10
byte commands because they have the widest testing and greatest level of
support.  We don't do range flushes currently, so there doesn't seem to
be a practical different.  If we did support range flushes, we'd likely
only use SC(16) on >2TB devices.

James



[PATCH 0/25] Replace DEFINE_PCI_DEVICE_TABLE macro use

2014-07-18 Thread Benoit Taine
We should prefer `const struct pci_device_id` over
`DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines.
This issue was reported by checkpatch.

A simplified version of the semantic patch that makes this change is as
follows (http://coccinelle.lip6.fr/):

// 

@@
identifier i;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer z;
@@

- DEFINE_PCI_DEVICE_TABLE(i)
+ const struct pci_device_id i[]
= z;

// 

I have 103 patches ready, and will only send a few for you to judge if
it is useful enough, and to prevent from spamming too much.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] scsi: move the writeable from struct scsi_device to struct scsi_cd

2014-07-18 Thread Christoph Hellwig
We currently set the field in common code based on the device type,
but then only use it in the cdrom driver which also overrides the
value previously set in the generic code.

Just leave this entirely to the CDROM driver to make everyones life
simpler.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi_scan.c   | 24 
 drivers/scsi/sd.c  |  3 ---
 drivers/scsi/sr.c  |  4 ++--
 drivers/scsi/sr.h  |  1 +
 include/scsi/scsi_device.h |  1 -
 5 files changed, 3 insertions(+), 30 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index b91cfaf..a5a0bde 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -807,30 +807,6 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
sdev->removable = (inq_result[1] & 0x80) >> 7;
}
 
-   switch (sdev->type) {
-   case TYPE_RBC:
-   case TYPE_TAPE:
-   case TYPE_DISK:
-   case TYPE_PRINTER:
-   case TYPE_MOD:
-   case TYPE_PROCESSOR:
-   case TYPE_SCANNER:
-   case TYPE_MEDIUM_CHANGER:
-   case TYPE_ENCLOSURE:
-   case TYPE_COMM:
-   case TYPE_RAID:
-   case TYPE_OSD:
-   sdev->writeable = 1;
-   break;
-   case TYPE_ROM:
-   case TYPE_WORM:
-   sdev->writeable = 0;
-   break;
-   default:
-   sdev_printk(KERN_INFO, sdev, "unknown device type %d\n",
-   sdev->type);
-   }
-
if (sdev->type == TYPE_RBC || sdev->type == TYPE_ROM) {
/* RBC and MMC devices can return SCSI-3 compliance and yet
 * still not support REPORT LUNS, so make them act as
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3663e38..377a520 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -992,9 +992,6 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
}
}
if (rq_data_dir(rq) == WRITE) {
-   if (!sdp->writeable) {
-   goto out;
-   }
SCpnt->cmnd[0] = WRITE_6;
 
if (blk_integrity_rq(rq))
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index cce4771..7eeb936 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -435,7 +435,7 @@ static int sr_init_command(struct scsi_cmnd *SCpnt)
}
 
if (rq_data_dir(rq) == WRITE) {
-   if (!cd->device->writeable)
+   if (!cd->writeable)
goto out;
SCpnt->cmnd[0] = WRITE_10;
cd->cdi.media_written = 1;
@@ -927,7 +927,7 @@ static void get_capabilities(struct scsi_cd *cd)
 */
if ((cd->cdi.mask & (CDC_DVD_RAM | CDC_MRW_W | CDC_RAM | CDC_CD_RW)) !=
(CDC_DVD_RAM | CDC_MRW_W | CDC_RAM | CDC_CD_RW)) {
-   cd->device->writeable = 1;
+   cd->writeable = 1;
}
 
kfree(buffer);
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index 5334e98..1d1f6f4 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -36,6 +36,7 @@ typedef struct scsi_cd {
struct scsi_device *device;
unsigned int vendor;/* vendor code, see sr_vendor.c */
unsigned long ms_offset;/* for reading multisession-CD's
*/
+   unsigned writeable : 1;
unsigned use:1; /* is this device still supportable */
unsigned xa_flag:1; /* CD has XA sectors ? */
unsigned readcd_known:1;/* drive supports READ_CD (0xbe) */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 0f853f2..b895784 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -127,7 +127,6 @@ struct scsi_device {
 * pass settings from slave_alloc to scsi
 * core. */
unsigned int eh_timeout; /* Error handling timeout */
-   unsigned writeable:1;
unsigned removable:1;
unsigned changed:1; /* Data invalid due to media change */
unsigned busy:1;/* Used to prevent races */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] scsi: add a symbolic name for the ZBC device type

2014-07-18 Thread Christoph Hellwig
Make sure we have a symbolic name for the ZBC type available,
so that e.g. patch for a SATA to translate ZAC commands can
make use of it.

Signed-off-by: Christoph Hellwig 
---
 include/scsi/scsi.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 91e2e42..e6df23c 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -332,6 +332,7 @@ static inline int scsi_status_is_good(int status)
 #define TYPE_ENCLOSURE  0x0d/* Enclosure Services Device */
 #define TYPE_RBC   0x0e
 #define TYPE_OSD0x11
+#define TYPE_ZBC0x14
 #define TYPE_NO_LUN 0x7f
 
 /* SCSI protocols; these are taken from SPC-3 section 7.5 */
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] scsi: update scsi_device_types

2014-07-18 Thread Christoph Hellwig
Add two new device types, most importantly the zoned block device
one.

Split from an earlier patch by Hannes Reinecke.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 013709f..33318f5 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -122,6 +122,8 @@ static const char *const scsi_device_types[] = {
"Bridge controller",
"Object storage   ",
"Automation/Drive ",
+   "Security Manager ",
+   "Direct-Access-ZBC",
 };
 
 /**
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


minor device type updates

2014-07-18 Thread Christoph Hellwig
Two trivial patches extracted from Hannes' ZBC series, and one cleanup
in response to my review of it.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread Christoph Hellwig (h...@infradead.org)
On Fri, Jul 18, 2014 at 12:51:06AM +, Elliott, Robert (Server Storage) 
wrote:
> SYNCHRONIZE CACHE (16) should be favored over SYNCHRONIZE 
> CACHE (10) unless SYNCHRONIZE CACHE (10) is not supported.

I gues you mean (16) for the last occurance?  What's the benefit of
using SYNCHRONIZE CACHE (16) if we don't pass a LBA range?

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] [SCSI] Fix a bug in deriving the FLUSH_TIMEOUT from the basic I/O timeout

2014-07-18 Thread Christoph Hellwig (h...@infradead.org)
On Thu, Jul 17, 2014 at 11:53:33PM +, KY Srinivasan wrote:
> I still see this problem. There was talk of fixing it elsewhere.

Well, what we have right not is entirely broken, given that the
block layer doesn't initialize ->timeout on TYPE_FS requeuests.

We either need to revert that initial commit or apply something like
the attached patch as a quick fix.

>From ecbf154d15f4022676219b9ff90e542d1db64392 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Fri, 18 Jul 2014 17:11:27 +0200
Subject: sd: set a non-zero timeout for flush requests

rq->timeout for TYPE_FS commands needs to be initialized by the driver,
so we can't simply multiply it.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 377a520..bef4e78 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -880,7 +880,7 @@ static int sd_setup_flush_cmnd(struct scsi_cmnd *cmd)
cmd->transfersize = 0;
cmd->allowed = SD_MAX_RETRIES;
 
-   rq->timeout *= SD_FLUSH_TIMEOUT_MULTIPLIER;
+   rq->timeout = SD_TIMEOUT * SD_FLUSH_TIMEOUT_MULTIPLIER;
return BLKPREP_OK;
 }
 
-- 
1.9.1



Re: [PATCH] virtio_scsi: check on resp->sense_len instead of 'sense_buffer'

2014-07-18 Thread Ming Lei
On Fri, Jul 18, 2014 at 11:00 PM, Paolo Bonzini  wrote:
> Il 18/07/2014 16:57, Ming Lei ha scritto:
>> - if (sc->sense_buffer) {
>> + if (resp->sense_len) {
>
> In the (unlikely) case that sc->sense_buffer == NULL, you'd pass a NULL
> to memcpy.

Every valid scsi_cmnd should have non NULL sc->sense_buffer, please
see scsi_host_alloc_command().

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] SCSI: Add a blacklist flag which enables VPD page inquiries

2014-07-18 Thread KY Srinivasan


> -Original Message-
> From: Christoph Hellwig [mailto:h...@infradead.org]
> Sent: Friday, July 18, 2014 8:04 AM
> To: Martin K. Petersen
> Cc: linux-scsi@vger.kernel.org; KY Srinivasan
> Subject: Re: [PATCH] SCSI: Add a blacklist flag which enables VPD page
> inquiries
> 
> On Tue, Jul 15, 2014 at 12:49:17PM -0400, Martin K. Petersen wrote:
> >
> > Despite supporting modern SCSI features some storage devices continue
> > to claim conformance to an older version of the SPC spec. This is done
> > for compatibility with legacy operating systems.
> >
> > Linux by default will not attempt to read VPD pages on devices that
> > claim SPC-2 or older. Introduce a blacklist flag that can be used to
> > trigger VPD page inquiries on devices that are known to support them.
> >
> > Reported-by: KY Srinivasan 
> > Tested-by: KY Srinivasan 
> > Signed-off-by: Martin K. Petersen 
> > CC: 
> 
> Looks good,
> 
> Reviewed-by: Christoph Hellwig 
> 
> KY, can you send a hyperv patch that sets it to go along with this one?
Will do. Thanks Christoph.

K. Y
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] SCSI: Add a blacklist flag which enables VPD page inquiries

2014-07-18 Thread Christoph Hellwig
On Tue, Jul 15, 2014 at 12:49:17PM -0400, Martin K. Petersen wrote:
> 
> Despite supporting modern SCSI features some storage devices continue to
> claim conformance to an older version of the SPC spec. This is done for
> compatibility with legacy operating systems.
> 
> Linux by default will not attempt to read VPD pages on devices that
> claim SPC-2 or older. Introduce a blacklist flag that can be used to
> trigger VPD page inquiries on devices that are known to support them.
> 
> Reported-by: KY Srinivasan 
> Tested-by: KY Srinivasan 
> Signed-off-by: Martin K. Petersen 
> CC: 

Looks good,

Reviewed-by: Christoph Hellwig 

KY, can you send a hyperv patch that sets it to go along with this one?

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] virtio_scsi: check on resp->sense_len instead of 'sense_buffer'

2014-07-18 Thread Paolo Bonzini
Il 18/07/2014 16:57, Ming Lei ha scritto:
> - if (sc->sense_buffer) {
> + if (resp->sense_len) {

In the (unlikely) case that sc->sense_buffer == NULL, you'd pass a NULL
to memcpy.

If you want, you can change this if to

if (sc->sense_buffer && resp->sense_len)

but frankly it seems like slightly pointless churn to me.

Paolo

>   memcpy(sc->sense_buffer, resp->sense,
>  min_t(u32, resp->sense_len, VIRTIO_SCSI_SENSE_SIZE));
> - if (resp->sense_len)
> - set_driver_byte(sc, DRIVER_SENSE);
> + set_driver_byte(sc, DRIVER_SENSE);
>   }


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] virtio_scsi: check on resp->sense_len instead of 'sense_buffer'

2014-07-18 Thread Ming Lei
The 'sense_buffer' of 'struct scsi_cmnd' is always true, but
resp->sense_len is only set if there is sense available.

Signed-off-by: Ming Lei 
---
 drivers/scsi/virtio_scsi.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index eee1bc0..4d2d3b2 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -194,11 +194,10 @@ static void virtscsi_complete_cmd(struct virtio_scsi 
*vscsi, void *buf)
}
 
WARN_ON(resp->sense_len > VIRTIO_SCSI_SENSE_SIZE);
-   if (sc->sense_buffer) {
+   if (resp->sense_len) {
memcpy(sc->sense_buffer, resp->sense,
   min_t(u32, resp->sense_len, VIRTIO_SCSI_SENSE_SIZE));
-   if (resp->sense_len)
-   set_driver_byte(sc, DRIVER_SENSE);
+   set_driver_byte(sc, DRIVER_SENSE);
}
 
sc->scsi_done(sc);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/15] block copy: initial XCOPY offload support

2014-07-18 Thread Mikulas Patocka


On Fri, 18 Jul 2014, Tomas Henzl wrote:

> > +   if (src_sector + nr_sects < src_sector ||
> > +   dst_sector + nr_sects < dst_sector)
> > +   return -EINVAL;
> 
> Hi Mikulas,
> this^ is meant as an overflow test or what is the reason?
> Thanks, Tomas

Yes. It is a test for overflow.

Mikulas
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: CPU: 1 PID: 495 at mm/slab_common.c:69 kmem_cache_create+0x1a9/0x330()

2014-07-18 Thread Christoph Hellwig
On Fri, Jul 18, 2014 at 05:21:04PM +0400, Vladimir Davydov wrote:
> Slab warns, because the name of the cache being created contains spaces.
> The "bad" cache is created by scsi_get_host_cmd_pool. Its name
> (pool->cmd_name) is initialized by scsi_alloc_host_cmd_pool as follows:
> 
>   pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);
> 
> So, if hostt->name contains spaces, the cache name will also contain
> spaces and we'll get the warning. And hostt->name can contain spaces,
> e.g. virtscsi_host_template_single.name="Virtio SCSI HBA".

Or might not even be present.  I'll send a patch to replace it with
->proc_name, which must not contain spaces and is generally shorter
as well.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: CPU: 1 PID: 495 at mm/slab_common.c:69 kmem_cache_create+0x1a9/0x330()

2014-07-18 Thread Vladimir Davydov
Slab warns, because the name of the cache being created contains spaces.
The "bad" cache is created by scsi_get_host_cmd_pool. Its name
(pool->cmd_name) is initialized by scsi_alloc_host_cmd_pool as follows:

pool->cmd_name = kasprintf(GFP_KERNEL, "%s_cmd", hostt->name);

So, if hostt->name contains spaces, the cache name will also contain
spaces and we'll get the warning. And hostt->name can contain spaces,
e.g. virtscsi_host_template_single.name="Virtio SCSI HBA".

The issue was introduced by commit 89d9a567952ba ("[SCSI] add support
for per-host cmd pools"). It can be fixed e.g. by substituting spaces
with underscores in cache names.

Adding the patch author and reviewers to Cc.

On Fri, Jul 18, 2014 at 12:57:25PM +0200, poma wrote:
> 
> I guess someone working over the summertime. :)
> 
> [ cut here ]
> WARNING: CPU: 1 PID: 495 at mm/slab_common.c:69 
> kmem_cache_create+0x1a9/0x330()
> Modules linked in: virtio_net virtio_scsi(+) drm virtio_pci i2c_core 
> virtio_ring ata_generic pata_acpi virtio sunrpc dm_crypt dm_round_robin 
> linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor 
> async_tx raid6_pq raid1 raid0 iscsi_ibft iscsi_boot_sysfs floppy iscsi_tcp 
> libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs edd dm_multipath
> CPU: 1 PID: 495 Comm: systemd-udevd Not tainted 
> 3.16.0-0.rc5.git0.1.fc21.x86_64 #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>    bfd3b05a 88007f55fa40 8171b4e3
>    88007f55fa78 8108f30d 880079175d40
>   88007f55ffd8 81c6de08 88007f55ffd8 88007f55ffd8
> Call Trace:
>   [] dump_stack+0x45/0x56
>   [] warn_slowpath_common+0x7d/0xa0
>   [] warn_slowpath_null+0x1a/0x20
>   [] kmem_cache_create+0x1a9/0x330
>   [] scsi_setup_command_freelist+0x120/0x270
>   [] scsi_add_host_with_dma+0x60/0x2b0
>   [] virtscsi_probe+0x269/0x2af [virtio_scsi]
>   [] ? vp_reset+0x90/0x90 [virtio_pci]
>   [] virtio_dev_probe+0xe9/0x160 [virtio]
>   [] driver_probe_device+0xa3/0x400
>   [] __driver_attach+0x8b/0x90
>   [] ? __device_attach+0x40/0x40
>   [] bus_for_each_dev+0x73/0xc0
>   [] driver_attach+0x1e/0x20
>   [] bus_add_driver+0x180/0x250
>   [] ? 0xa017
>   [] driver_register+0x64/0xf0
>   [] register_virtio_driver+0x20/0x40 [virtio]
>   [] init+0x85/0x1000 [virtio_scsi]
>   [] do_one_initcall+0xd8/0x210
>   [] ? __vunmap+0xa2/0x100
>   [] load_module+0x2061/0x2600
>   [] ? store_uevent+0x70/0x70
>   [] SyS_init_module+0xcd/0x120
>   [] system_call_fastpath+0x16/0x1b
> ---[ end trace 64d7cf025fd3bf4a ]---
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1121092
> 
> 
> poma
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/15] block copy: initial XCOPY offload support

2014-07-18 Thread Tomas Henzl
On 07/15/2014 09:34 PM, Mikulas Patocka wrote:
> This is Martin Petersen's xcopy patch
> (https://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=xcopy&id=0bdeed274e16b3038a851552188512071974eea8)
> with some bug fixes, ported to the current kernel.
>
> This patch makes it possible to use the SCSI XCOPY command.
>
> We create a bio that has REQ_COPY flag in bi_rw and a bi_copy structure
> that defines the source device. The target device is defined in the
> bi_bdev and bi_iter.bi_sector.
>
> There is a new BLKCOPY ioctl that makes it possible to use XCOPY from
> userspace. The ioctl argument is a pointer to an array of four uint64_t
> values.
>
> The first value is a source byte offset, the second value is a destination
> byte offset, the third value is byte length. The forth value is written by
> the kernel and it represents the number of bytes that the kernel actually
> copied.
>
> Signed-off-by: Martin K. Petersen 
> Signed-off-by: Mikulas Patocka 
>
> ---
>  Documentation/ABI/testing/sysfs-block |9 +
>  block/bio.c   |2 
>  block/blk-core.c  |5 
>  block/blk-lib.c   |   95 
>  block/blk-merge.c |7 
>  block/blk-settings.c  |   13 +
>  block/blk-sysfs.c |   10 +
>  block/compat_ioctl.c  |1 
>  block/ioctl.c |   49 ++
>  drivers/scsi/scsi.c   |   57 +++
>  drivers/scsi/sd.c |  263 
> +-
>  drivers/scsi/sd.h |4 
>  include/linux/bio.h   |9 -
>  include/linux/blk_types.h |   15 +
>  include/linux/blkdev.h|   15 +
>  include/scsi/scsi_device.h|3 
>  include/uapi/linux/fs.h   |1 
>  17 files changed, 545 insertions(+), 13 deletions(-)
>
> Index: linux-3.16-rc5/Documentation/ABI/testing/sysfs-block
> ===
> --- linux-3.16-rc5.orig/Documentation/ABI/testing/sysfs-block 2014-07-14 
> 15:17:07.0 +0200
> +++ linux-3.16-rc5/Documentation/ABI/testing/sysfs-block  2014-07-14 
> 16:26:44.0 +0200
> @@ -220,3 +220,12 @@ Description:
>   write_same_max_bytes is 0, write same is not supported
>   by the device.
>  
> +
> +What:/sys/block//queue/copy_max_bytes
> +Date:January 2014
> +Contact: Martin K. Petersen 
> +Description:
> + Devices that support copy offloading will set this value
> + to indicate the maximum buffer size in bytes that can be
> + copied in one operation. If the copy_max_bytes is 0 the
> + device does not support copy offload.
> Index: linux-3.16-rc5/block/blk-core.c
> ===
> --- linux-3.16-rc5.orig/block/blk-core.c  2014-07-14 16:26:22.0 
> +0200
> +++ linux-3.16-rc5/block/blk-core.c   2014-07-14 16:26:44.0 +0200
> @@ -1831,6 +1831,11 @@ generic_make_request_checks(struct bio *
>   goto end_io;
>   }
>  
> + if (bio->bi_rw & REQ_COPY && !bdev_copy_offload(bio->bi_bdev)) {
> + err = -EOPNOTSUPP;
> + goto end_io;
> + }
> +
>   /*
>* Various block parts want %current->io_context and lazy ioc
>* allocation ends up trading a lot of pain for a small amount of
> Index: linux-3.16-rc5/block/blk-lib.c
> ===
> --- linux-3.16-rc5.orig/block/blk-lib.c   2014-07-14 16:26:40.0 
> +0200
> +++ linux-3.16-rc5/block/blk-lib.c2014-07-14 16:32:21.0 +0200
> @@ -304,3 +304,98 @@ int blkdev_issue_zeroout(struct block_de
>   return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
>  }
>  EXPORT_SYMBOL(blkdev_issue_zeroout);
> +
> +/**
> + * blkdev_issue_copy - queue a copy same operation
> + * @src_bdev:source blockdev
> + * @src_sector:  source sector
> + * @dst_bdev:destination blockdev
> + * @dst_sector: destination sector
> + * @nr_sects:number of sectors to copy
> + * @gfp_mask:memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + *Copy a block range from source device to target device.
> + */
> +int blkdev_issue_copy(struct block_device *src_bdev, sector_t src_sector,
> +   struct block_device *dst_bdev, sector_t dst_sector,
> +   unsigned int nr_sects, gfp_t gfp_mask)
> +{
> + DECLARE_COMPLETION_ONSTACK(wait);
> + struct request_queue *sq = bdev_get_queue(src_bdev);
> + struct request_queue *dq = bdev_get_queue(dst_bdev);
> + unsigned int max_copy_sectors;
> + struct bio_batch bb;
> + int ret = 0;
> +
> + if (!sq || !dq)
> + return -ENXIO;
> +
> + max_copy_sectors = min(sq->li

RE: megasas: failed to get PD list

2014-07-18 Thread Kashyap Desai
Peter:

We will check with Falcon MR controller and will get back to you. Can you
quickly check using stable kernel instead of upstream. (Just to check if
it is regression>) ?

~ Kashyap

> -Original Message-
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of Peter Andersson
> Sent: Friday, July 18, 2014 12:01 AM
> To: linux-scsi@vger.kernel.org
> Subject: megasas: failed to get PD list
>
> Hi.
> I'm having major problems with my "MegaRAID SAS 9240-81" raid
controller.
> Basically the kernel doesn't find my raid drives.
>
> Bios version: 4.38.02.0 (January 16 2014) The bios of the raid
controller shows
> a fully working (and initialized) raid 0.
>
> Kernel: 3.15.1-031501-generic (ubuntu server 14.04)
>
> This is the full dmesg regarding the card:
>
> [0.902723] megasas: 06.803.01.00-rc1 Mon. Mar. 10 17:00:00 PDT 2014
> [0.902788] megasas: 0x1000:0x0073:0x1000:0x9240: bus 3:slot 0:func 0
> [0.903121] megasas: FW now in Ready state
> [0.903180] megaraid_sas :03:00.0: irq 80 for MSI/MSI-X
> [0.903188] megaraid_sas :03:00.0: [scsi0]: FW supports<0> MSIX
> vector,Online CPUs: <12>,Current MSIX <1>
> [0.924336] megasas_init_mfi: fw_support_ieee=67108864
> [0.924372] megasas: INIT adapter done
> [   72.855404] megasas: failed to get PD list
>
> lspci:
> 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008
> [Falcon] (rev 03)
>
> megaraidsas-status:
> -- Arrays informations --
> -- ID | Type | Size | Status
>
> -- Disks informations
> -- ID | Model | Status | Warnings
>
> After the drives are initialized i also get this message:
> Virtual drive handled by bios
>
> That's all i can think to be of interest to you.
> Could anyone give me some pointers on what could be wrong?
>
> Thanks!
>
> /Peter
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the
> body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 1/2] percpu_tags: Prototype implementation

2014-07-18 Thread Alexander Gordeev
This is a development of "percpu_ida" library. The major
differrences between "percpu_ida" and "percpu_tags" are:

* The global freelist has gone. As result, CPUs do not compete
  for the global lock.

* Long-running operatons (scanning thru a cpumask) are executed
  with local interrupts enabled;

* percpu_ida::percpu_max_size limit is eliminated. Instead, the
  limit is determined dynamically. Depending from how many CPUs
  are requesting tags each CPU gets a fair share of the tag space;

* A tag is attempted to return to the CPU it was allocated on. As
  result, it is expected the locality of data associated with the
  tag benefits;

Signed-off-by: Alexander Gordeev 
Cc: linux-scsi@vger.kernel.org
Cc: qla2xxx-upstr...@qlogic.com
Cc: Nicholas Bellinger 
Cc: Kent Overstreet 
Cc: "Michael S. Tsirkin" 
---
 include/linux/percpu_tags.h |   37 +++
 lib/Makefile|2 +-
 lib/percpu_tags.c   |  556 +++
 3 files changed, 594 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/percpu_tags.h
 create mode 100644 lib/percpu_tags.c

diff --git a/include/linux/percpu_tags.h b/include/linux/percpu_tags.h
new file mode 100644
index 000..ee89863
--- /dev/null
+++ b/include/linux/percpu_tags.h
@@ -0,0 +1,37 @@
+#ifndef __PERCPU_TAGS_H__
+#define __PERCPU_TAGS_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct percpu_cache;
+
+struct percpu_tags {
+   int nr_tags;
+   struct percpu_cache __percpu*cache;
+
+   int *tag_cpu_map;
+
+   cpumask_t   alloc_tags;
+   cpumask_t   free_tags;
+   cpumask_t   wait_tags;
+};
+
+int percpu_tags_alloc(struct percpu_tags *pt, int state);
+void percpu_tags_free(struct percpu_tags *pt, int tag);
+
+int percpu_tags_init(struct percpu_tags *pt, int nr_tags);
+void percpu_tags_destroy(struct percpu_tags *pt);
+
+int percpu_tags_for_each_free(struct percpu_tags *pt,
+ int (*fn)(unsigned, void *), void *data);
+int percpu_tags_free_tags(struct percpu_tags *pt, int cpu);
+
+#define PERCPU_TAGS_BATCH_MAX  4
+
+#endif /* __PERCPU_TAGS_H__ */
diff --git a/lib/Makefile b/lib/Makefile
index ba967a1..671143f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -26,7 +26,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o 
debug_locks.o random32.o \
 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
 gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
-percpu-refcount.o percpu_ida.o hash.o
+percpu-refcount.o percpu_ida.o percpu_tags.o hash.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += kstrtox.o
diff --git a/lib/percpu_tags.c b/lib/percpu_tags.c
new file mode 100644
index 000..7bb61f5
--- /dev/null
+++ b/lib/percpu_tags.c
@@ -0,0 +1,556 @@
+/*
+ * Percpu tags library, based on percpu IDA library
+ *
+ * Copyright (C) 2014 RedHat, Inc. Alexander Gordeev
+ * Copyright (C) 2013 Datera, Inc. Kent Overstreet
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2, or (at
+ * your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct percpu_cache {
+   spinlock_t  lock;
+   int cpu;/* CPU this cache belongs to */
+   wait_queue_head_t   wait;   /* tasks waiting for a tag */
+
+   int nr_alloc;   /* nr of allocated tags */
+
+   int nr_free;/* nr of unallocated tags */
+   int freelist[];
+};
+
+#define spin_lock_irqsave_cond(lock, cpu, flags)   \
+do  {  \
+   preempt_disable();  \
+   if ((cpu) == smp_processor_id())\
+   local_irq_save(flags);  \
+   spin_lock(lock);\
+   preempt_enable();   \
+} while (0)
+
+#define spin_unlock_irqrestore_cond(lock, cpu, flags)  \
+do {   \
+   int __this_cpu = smp_processor_id();\
+   spin_unlock(lock);  \
+   

[PATCH RFC 0/2] percpu_tags: Prototype implementation

2014-07-18 Thread Alexander Gordeev
Hello Gentleman,

This is a development of "percpu_ida" library. I named it
"percpu_tags" to simplify review, since most of "percpu_ida"
code has gone and a diff would not be informative.

While the concept of per-cpu arrays is is preserved, the
implementation is heavily reworked. The main objective is to
improve the "percpu_ida" locking scheme.

Here is the list of major differrences between "percpu_ida" and
"percpu_tags":

* The global freelist has gone. As result, CPUs do not compete
  for the global lock.

* Long-running operatons (scanning thru a cpumask) are executed
  with local interrupts enabled;

* percpu_ida::percpu_max_size limit is eliminated. Instead, the
  limit is determined dynamically. Depending from how many CPUs
  are requesting tags each CPU gets a fair share of the tag space;

* A tag is attempted to return to the CPU it was allocated on. As
  result, it is expected the locality of data associated with the
  tag benefits;

The code is largely raw and untested. The reason I am posting
is the prototype implementation performs 2-3 times faster than
"percpu_ida", so I would like to ensure if it worth going further
with this approach or is there a no-go.

The performance test is not decent, though. I used "fio" random
read against a "null_blk" device sitting on top of "percpu_tags",
which is not exactly how "percpu_ida" is used. This is another
reason I am posting - an advice on how to properly test is very
appreciated.

The source code could be found at
https://github.com/a-gordeev/linux.git  percpu_tags-v0

Thanks!

Cc: linux-scsi@vger.kernel.org
Cc: qla2xxx-upstr...@qlogic.com
Cc: Nicholas Bellinger 
Cc: Kent Overstreet 
Cc: "Michael S. Tsirkin" 

Alexander Gordeev (2):
  percpu_tags: Prototype implementation
  percpu_tags: Use percpu_tags instead of percpu_ida

 drivers/scsi/qla2xxx/qla_target.c|6 +-
 drivers/target/iscsi/iscsi_target_util.c |6 +-
 drivers/target/target_core_transport.c   |4 +-
 drivers/target/tcm_fc/tfc_cmd.c  |8 +-
 drivers/vhost/scsi.c |6 +-
 include/linux/percpu_tags.h  |   37 ++
 include/target/target_core_base.h|4 +-
 lib/Makefile |2 +-
 lib/percpu_tags.c|  556 ++
 9 files changed, 611 insertions(+), 18 deletions(-)
 create mode 100644 include/linux/percpu_tags.h
 create mode 100644 lib/percpu_tags.c

-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 2/2] percpu_tags: Use percpu_tags instead of percpu_ida

2014-07-18 Thread Alexander Gordeev
All users of "percpu_ida" are blindly converted to
"percpu_tags" users. No testing has been conducted.

Signed-off-by: Alexander Gordeev 
Cc: linux-scsi@vger.kernel.org
Cc: qla2xxx-upstr...@qlogic.com
Cc: Nicholas Bellinger 
Cc: Kent Overstreet 
Cc: "Michael S. Tsirkin" 
---
 drivers/scsi/qla2xxx/qla_target.c|6 +++---
 drivers/target/iscsi/iscsi_target_util.c |6 +++---
 drivers/target/target_core_transport.c   |4 ++--
 drivers/target/tcm_fc/tfc_cmd.c  |8 
 drivers/vhost/scsi.c |6 +++---
 include/target/target_core_base.h|4 ++--
 6 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c 
b/drivers/scsi/qla2xxx/qla_target.c
index e632e14..cec4847 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -2729,7 +2729,7 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
WARN_ON(1);
return;
}
-   percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+   percpu_tags_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
 }
 EXPORT_SYMBOL(qlt_free_cmd);
 
@@ -3153,7 +3153,7 @@ out_term:
 */
spin_lock_irqsave(&ha->hardware_lock, flags);
qlt_send_term_exchange(vha, NULL, &cmd->atio, 1);
-   percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+   percpu_tags_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
ha->tgt.tgt_ops->put_sess(sess);
spin_unlock_irqrestore(&ha->hardware_lock, flags);
 }
@@ -3173,7 +3173,7 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t 
*vha,
struct qla_tgt_cmd *cmd;
int tag;
 
-   tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+   tag = percpu_tags_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
if (tag < 0)
return NULL;
 
diff --git a/drivers/target/iscsi/iscsi_target_util.c 
b/drivers/target/iscsi/iscsi_target_util.c
index fd90b28..f76729e 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -17,7 +17,7 @@
  
**/
 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -158,7 +158,7 @@ struct iscsi_cmd *iscsit_allocate_cmd(struct iscsi_conn 
*conn, int state)
struct se_session *se_sess = conn->sess->se_sess;
int size, tag;
 
-   tag = percpu_ida_alloc(&se_sess->sess_tag_pool, state);
+   tag = percpu_tags_alloc(&se_sess->sess_tag_pool, state);
if (tag < 0)
return NULL;
 
@@ -701,7 +701,7 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd)
kfree(cmd->iov_data);
kfree(cmd->text_in_ptr);
 
-   percpu_ida_free(&sess->se_sess->sess_tag_pool, se_cmd->map_tag);
+   percpu_tags_free(&sess->se_sess->sess_tag_pool, se_cmd->map_tag);
 }
 EXPORT_SYMBOL(iscsit_release_cmd);
 
diff --git a/drivers/target/target_core_transport.c 
b/drivers/target/target_core_transport.c
index 7fa62fc..5c7f6f4 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -272,7 +272,7 @@ int transport_alloc_session_tags(struct se_session *se_sess,
}
}
 
-   rc = percpu_ida_init(&se_sess->sess_tag_pool, tag_num);
+   rc = percpu_tags_init(&se_sess->sess_tag_pool, tag_num);
if (rc < 0) {
pr_err("Unable to init se_sess->sess_tag_pool,"
" tag_num: %u\n", tag_num);
@@ -445,7 +445,7 @@ EXPORT_SYMBOL(transport_deregister_session_configfs);
 void transport_free_session(struct se_session *se_sess)
 {
if (se_sess->sess_cmd_map) {
-   percpu_ida_destroy(&se_sess->sess_tag_pool);
+   percpu_tags_destroy(&se_sess->sess_tag_pool);
if (is_vmalloc_addr(se_sess->sess_cmd_map))
vfree(se_sess->sess_cmd_map);
else
diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index be0c0d0..6176c92 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -100,7 +100,7 @@ static void ft_free_cmd(struct ft_cmd *cmd)
if (fr_seq(fp))
lport->tt.seq_release(fr_seq(fp));
fc_frame_free(fp);
-   percpu_ida_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
+   percpu_tags_free(&sess->se_sess->sess_tag_pool, cmd->se_cmd.map_tag);
ft_sess_put(sess);  /* undo get from lookup at recv */
 }
 
@@ -456,7 +456,7 @@ static void ft_recv_cmd(struct ft_sess *sess, struct 
fc_frame *fp)
struct se_session *se_sess = sess->se_sess;
int tag;
 
-   tag = percpu_ida_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
+   tag = percpu_tags_alloc(&se_sess->sess_tag_pool, TASK_RUNNING);
   

[PATCH 03/14] scsi: centralize command re-queueing in scsi_dispatch_fn

2014-07-18 Thread Christoph Hellwig
Make sure we only have the logic for requeing commands in one place.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi.c | 35 ---
 drivers/scsi/scsi_lib.c |  9 ++---
 2 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 321f854..2396e89 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -645,9 +645,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 * returns an immediate error upwards, and signals
 * that the device is no longer present */
cmd->result = DID_NO_CONNECT << 16;
-   scsi_done(cmd);
-   /* return 0 (because the command has been processed) */
-   goto out;
+   goto done;
}
 
/* Check to see if the scsi lld made this device blocked. */
@@ -659,17 +657,9 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 * occur until the device transitions out of the
 * suspend state.
 */
-
-   scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
-
SCSI_LOG_MLQUEUE(3, scmd_printk(KERN_INFO, cmd,
"queuecommand : device blocked\n"));
-
-   /*
-* NOTE: rtn is still zero here because we don't need the
-* queue to be plugged on return (it's already stopped)
-*/
-   goto out;
+   return SCSI_MLQUEUE_DEVICE_BUSY;
}
 
/*
@@ -693,20 +683,19 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
   "cdb_size=%d host->max_cmd_len=%d\n",
   cmd->cmd_len, cmd->device->host->max_cmd_len));
cmd->result = (DID_ABORT << 16);
-
-   scsi_done(cmd);
-   goto out;
+   goto done;
}
 
if (unlikely(host->shost_state == SHOST_DEL)) {
cmd->result = (DID_NO_CONNECT << 16);
-   scsi_done(cmd);
-   } else {
-   trace_scsi_dispatch_cmd_start(cmd);
-   cmd->scsi_done = scsi_done;
-   rtn = host->hostt->queuecommand(host, cmd);
+   goto done;
+
}
 
+   trace_scsi_dispatch_cmd_start(cmd);
+
+   cmd->scsi_done = scsi_done;
+   rtn = host->hostt->queuecommand(host, cmd);
if (rtn) {
trace_scsi_dispatch_cmd_error(cmd, rtn);
if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
@@ -715,12 +704,12 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 
SCSI_LOG_MLQUEUE(3, scmd_printk(KERN_INFO, cmd,
"queuecommand : request rejected\n"));
-
-   scsi_queue_insert(cmd, rtn);
}
 
- out:
return rtn;
+ done:
+   scsi_done(cmd);
+   return 0;
 }
 
 /**
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3ac677c..bf73427 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1557,9 +1557,12 @@ static void scsi_request_fn(struct request_queue *q)
 * Dispatch the command to the low-level driver.
 */
rtn = scsi_dispatch_cmd(cmd);
-   spin_lock_irq(q->queue_lock);
-   if (rtn)
+   if (rtn) {
+   scsi_queue_insert(cmd, rtn);
+   spin_lock_irq(q->queue_lock);
goto out_delay;
+   }
+   spin_lock_irq(q->queue_lock);
}
 
return;
@@ -1579,7 +1582,7 @@ static void scsi_request_fn(struct request_queue *q)
blk_requeue_request(q, req);
sdev->device_busy--;
 out_delay:
-   if (sdev->device_busy == 0)
+   if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
blk_delay_queue(q, SCSI_QUEUE_DELAY);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/14] scsi: set ->scsi_done before calling scsi_dispatch_cmd

2014-07-18 Thread Christoph Hellwig
The blk-mq code path will set this to a different function, so make the
code simpler by setting it up in a legacy-request specific place.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi.c | 23 +--
 drivers/scsi/scsi_lib.c | 20 
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2396e89..6200a26 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -72,8 +72,6 @@
 #define CREATE_TRACE_POINTS
 #include 
 
-static void scsi_done(struct scsi_cmnd *cmd);
-
 /*
  * Definitions and constants.
  */
@@ -693,8 +691,6 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
}
 
trace_scsi_dispatch_cmd_start(cmd);
-
-   cmd->scsi_done = scsi_done;
rtn = host->hostt->queuecommand(host, cmd);
if (rtn) {
trace_scsi_dispatch_cmd_error(cmd, rtn);
@@ -708,28 +704,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
 
return rtn;
  done:
-   scsi_done(cmd);
+   cmd->scsi_done(cmd);
return 0;
 }
 
 /**
- * scsi_done - Invoke completion on finished SCSI command.
- * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
- * ownership back to SCSI Core -- i.e. the LLDD has finished with it.
- *
- * Description: This function is the mid-level's (SCSI Core) interrupt routine,
- * which regains ownership of the SCSI command (de facto) from a LLDD, and
- * calls blk_complete_request() for further processing.
- *
- * This function is interrupt context safe.
- */
-static void scsi_done(struct scsi_cmnd *cmd)
-{
-   trace_scsi_dispatch_cmd_done(cmd);
-   blk_complete_request(cmd->request);
-}
-
-/**
  * scsi_finish_command - cleanup and pass command back to upper layer
  * @cmd: the command
  *
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index bf73427..b832696 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -29,6 +29,8 @@
 #include 
 #include 
 
+#include 
+
 #include "scsi_priv.h"
 #include "scsi_logging.h"
 
@@ -1454,6 +1456,23 @@ static void scsi_softirq_done(struct request *rq)
}
 }
 
+/**
+ * scsi_done - Invoke completion on finished SCSI command.
+ * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
+ * ownership back to SCSI Core -- i.e. the LLDD has finished with it.
+ *
+ * Description: This function is the mid-level's (SCSI Core) interrupt routine,
+ * which regains ownership of the SCSI command (de facto) from a LLDD, and
+ * calls blk_complete_request() for further processing.
+ *
+ * This function is interrupt context safe.
+ */
+static void scsi_done(struct scsi_cmnd *cmd)
+{
+   trace_scsi_dispatch_cmd_done(cmd);
+   blk_complete_request(cmd->request);
+}
+
 /*
  * Function:scsi_request_fn()
  *
@@ -1556,6 +1575,7 @@ static void scsi_request_fn(struct request_queue *q)
/*
 * Dispatch the command to the low-level driver.
 */
+   cmd->scsi_done = scsi_done;
rtn = scsi_dispatch_cmd(cmd);
if (rtn) {
scsi_queue_insert(cmd, rtn);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/14] scsi: convert device_busy to atomic_t

2014-07-18 Thread Christoph Hellwig
Avoid taking the queue_lock to check the per-device queue limit.  Instead
we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Unlike the host and target busy counters this doesn't allow us to avoid the
queue_lock in the request_fn due to the way the interface works, but it'll
allow us to prepare for using the blk-mq code, which doesn't use the
queue_lock at all, and it at least avoids a queue_lock round trip in
scsi_device_unbusy, which is still important given how busy the queue_lock
is.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/message/fusion/mptsas.c |  2 +-
 drivers/scsi/scsi_lib.c | 50 +++--
 drivers/scsi/scsi_sysfs.c   | 10 -
 drivers/scsi/sg.c   |  2 +-
 include/scsi/scsi_device.h  |  4 +---
 5 files changed, 40 insertions(+), 28 deletions(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 711fcb5..d636dbe 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -3763,7 +3763,7 @@ mptsas_send_link_status_event(struct fw_event_work 
*fw_event)
printk(MYIOC_s_DEBUG_FMT
"SDEV OUTSTANDING CMDS"
"%d\n", ioc->name,
-   sdev->device_busy));
+   
atomic_read(&sdev->device_busy)));
}
 
}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d0bd7e0..1ddf0fb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -302,9 +302,7 @@ void scsi_device_unbusy(struct scsi_device *sdev)
spin_unlock_irqrestore(shost->host_lock, flags);
}
 
-   spin_lock_irqsave(sdev->request_queue->queue_lock, flags);
-   sdev->device_busy--;
-   spin_unlock_irqrestore(sdev->request_queue->queue_lock, flags);
+   atomic_dec(&sdev->device_busy);
 }
 
 /*
@@ -355,9 +353,9 @@ static void scsi_single_lun_run(struct scsi_device 
*current_sdev)
 
 static inline int scsi_device_is_busy(struct scsi_device *sdev)
 {
-   if (sdev->device_busy >= sdev->queue_depth || sdev->device_blocked)
+   if (atomic_read(&sdev->device_busy) >= sdev->queue_depth ||
+   sdev->device_blocked)
return 1;
-
return 0;
 }
 
@@ -1204,7 +1202,7 @@ scsi_prep_return(struct request_queue *q, struct request 
*req, int ret)
 * queue must be restarted, so we schedule a callback to happen
 * shortly.
 */
-   if (sdev->device_busy == 0)
+   if (atomic_read(&sdev->device_busy) == 0)
blk_delay_queue(q, SCSI_QUEUE_DELAY);
break;
default:
@@ -1255,26 +1253,33 @@ static void scsi_unprep_fn(struct request_queue *q, 
struct request *req)
 static inline int scsi_dev_queue_ready(struct request_queue *q,
  struct scsi_device *sdev)
 {
-   if (sdev->device_busy == 0 && sdev->device_blocked) {
+   unsigned int busy;
+
+   busy = atomic_inc_return(&sdev->device_busy) - 1;
+   if (sdev->device_blocked) {
+   if (busy)
+   goto out_dec;
+
/*
 * unblock after device_blocked iterates to zero
 */
-   if (--sdev->device_blocked == 0) {
-   SCSI_LOG_MLQUEUE(3,
-  sdev_printk(KERN_INFO, sdev,
-  "unblocking device at zero depth\n"));
-   } else {
+   if (--sdev->device_blocked != 0) {
blk_delay_queue(q, SCSI_QUEUE_DELAY);
-   return 0;
+   goto out_dec;
}
+   SCSI_LOG_MLQUEUE(3, sdev_printk(KERN_INFO, sdev,
+  "unblocking device at zero depth\n"));
}
-   if (scsi_device_is_busy(sdev))
-   return 0;
+
+   if (busy >= sdev->queue_depth)
+   goto out_dec;
 
return 1;
+out_dec:
+   atomic_dec(&sdev->device_busy);
+   return 0;
 }
 
-
 /*
  * scsi_target_queue_ready: checks if there we can send commands to target
  * @sdev: scsi device on starget to check.
@@ -1448,7 +1453,7 @@ static void scsi_kill_request(struct request *req, struct 
request_queue *q)
 * bump busy counts.  To bump the counters, we need to dance
 * with the locks as normal issue path does.
 */
-   sdev->device_busy++;
+   atomic_inc(&sdev->device_busy);
atomic_inc(&shost->host_busy);
atomic_inc(&starget->target_bus

[PATCH 05/14] scsi: push host_lock down into scsi_{host,target}_queue_ready

2014-07-18 Thread Christoph Hellwig
Prepare for not taking a host-wide lock in the dispatch path by pushing
the lock down into the places that actually need it.  Note that this
patch is just a preparation step, as it will actually increase lock
roundtrips and thus decrease performance on its own.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi_lib.c | 75 +
 1 file changed, 39 insertions(+), 36 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b832696..112c737 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1274,18 +1274,18 @@ static inline int scsi_dev_queue_ready(struct 
request_queue *q,
 /*
  * scsi_target_queue_ready: checks if there we can send commands to target
  * @sdev: scsi device on starget to check.
- *
- * Called with the host lock held.
  */
 static inline int scsi_target_queue_ready(struct Scsi_Host *shost,
   struct scsi_device *sdev)
 {
struct scsi_target *starget = scsi_target(sdev);
+   int ret = 0;
 
+   spin_lock_irq(shost->host_lock);
if (starget->single_lun) {
if (starget->starget_sdev_user &&
starget->starget_sdev_user != sdev)
-   return 0;
+   goto out;
starget->starget_sdev_user = sdev;
}
 
@@ -1293,57 +1293,66 @@ static inline int scsi_target_queue_ready(struct 
Scsi_Host *shost,
/*
 * unblock after target_blocked iterates to zero
 */
-   if (--starget->target_blocked == 0) {
-   SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
-"unblocking target at zero depth\n"));
-   } else
-   return 0;
+   if (--starget->target_blocked != 0)
+   goto out;
+
+   SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
+"unblocking target at zero depth\n"));
}
 
if (scsi_target_is_busy(starget)) {
list_move_tail(&sdev->starved_entry, &shost->starved_list);
-   return 0;
+   goto out;
}
 
-   return 1;
+   scsi_target(sdev)->target_busy++;
+   ret = 1;
+out:
+   spin_unlock_irq(shost->host_lock);
+   return ret;
 }
 
 /*
  * scsi_host_queue_ready: if we can send requests to shost, return 1 else
  * return 0. We must end up running the queue again whenever 0 is
  * returned, else IO can hang.
- *
- * Called with host_lock held.
  */
 static inline int scsi_host_queue_ready(struct request_queue *q,
   struct Scsi_Host *shost,
   struct scsi_device *sdev)
 {
+   int ret = 0;
+
+   spin_lock_irq(shost->host_lock);
+
if (scsi_host_in_recovery(shost))
-   return 0;
+   goto out;
if (shost->host_busy == 0 && shost->host_blocked) {
/*
 * unblock after host_blocked iterates to zero
 */
-   if (--shost->host_blocked == 0) {
-   SCSI_LOG_MLQUEUE(3,
-   shost_printk(KERN_INFO, shost,
-"unblocking host at zero 
depth\n"));
-   } else {
-   return 0;
-   }
+   if (--shost->host_blocked != 0)
+   goto out;
+
+   SCSI_LOG_MLQUEUE(3,
+   shost_printk(KERN_INFO, shost,
+"unblocking host at zero depth\n"));
}
if (scsi_host_is_busy(shost)) {
if (list_empty(&sdev->starved_entry))
list_add_tail(&sdev->starved_entry, 
&shost->starved_list);
-   return 0;
+   goto out;
}
 
/* We're OK to process the command, so we can't be starved */
if (!list_empty(&sdev->starved_entry))
list_del_init(&sdev->starved_entry);
 
-   return 1;
+   shost->host_busy++;
+   ret = 1;
+out:
+   spin_unlock_irq(shost->host_lock);
+   return ret;
 }
 
 /*
@@ -1524,7 +1533,7 @@ static void scsi_request_fn(struct request_queue *q)
blk_start_request(req);
sdev->device_busy++;
 
-   spin_unlock(q->queue_lock);
+   spin_unlock_irq(q->queue_lock);
cmd = req->special;
if (unlikely(cmd == NULL)) {
printk(KERN_CRIT "impossible request in %s.\n"
@@ -1534,7 +1543,6 @@ static void scsi_request_fn(struct request_queue *q)
blk_dump_rq_flags(req, "foo");
BUG();

[PATCH 07/14] scsi: convert host_busy to atomic_t

2014-07-18 Thread Christoph Hellwig
Avoid taking the host-wide host_lock to check the per-host queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if nessecary decrement it after finishing all checks.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/advansys.c |  4 +-
 drivers/scsi/libiscsi.c |  4 +-
 drivers/scsi/libsas/sas_scsi_host.c |  5 ++-
 drivers/scsi/qlogicpti.c|  2 +-
 drivers/scsi/scsi.c |  2 +-
 drivers/scsi/scsi_error.c   |  7 ++--
 drivers/scsi/scsi_lib.c | 74 ++---
 drivers/scsi/scsi_sysfs.c   |  9 -
 include/scsi/scsi_host.h| 10 ++---
 9 files changed, 69 insertions(+), 48 deletions(-)

diff --git a/drivers/scsi/advansys.c b/drivers/scsi/advansys.c
index e716d0a..43761c1 100644
--- a/drivers/scsi/advansys.c
+++ b/drivers/scsi/advansys.c
@@ -2512,7 +2512,7 @@ static void asc_prt_scsi_host(struct Scsi_Host *s)
 
printk("Scsi_Host at addr 0x%p, device %s\n", s, dev_name(boardp->dev));
printk(" host_busy %u, host_no %d,\n",
-  s->host_busy, s->host_no);
+  atomic_read(&s->host_busy), s->host_no);
 
printk(" base 0x%lx, io_port 0x%lx, irq %d,\n",
   (ulong)s->base, (ulong)s->io_port, boardp->irq);
@@ -3346,7 +3346,7 @@ static void asc_prt_driver_conf(struct seq_file *m, 
struct Scsi_Host *shost)
 
seq_printf(m,
   " host_busy %u, max_id %u, max_lun %llu, max_channel %u\n",
-  shost->host_busy, shost->max_id,
+  atomic_read(&shost->host_busy), shost->max_id,
   shost->max_lun, shost->max_channel);
 
seq_printf(m,
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index f2db82b..f9f3a12 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -2971,7 +2971,7 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 */
for (;;) {
spin_lock_irqsave(session->host->host_lock, flags);
-   if (!session->host->host_busy) { /* OK for ERL == 0 */
+   if (!atomic_read(&session->host->host_busy)) { /* OK for ERL == 
0 */
spin_unlock_irqrestore(session->host->host_lock, flags);
break;
}
@@ -2979,7 +2979,7 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
msleep_interruptible(500);
iscsi_conn_printk(KERN_INFO, conn, "iscsi conn_destroy(): "
  "host_busy %d host_failed %d\n",
- session->host->host_busy,
+ atomic_read(&session->host->host_busy),
  session->host->host_failed);
/*
 * force eh_abort() to unblock
diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 7d02a19..24e477d 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -813,7 +813,7 @@ retry:
spin_unlock_irq(shost->host_lock);
 
SAS_DPRINTK("Enter %s busy: %d failed: %d\n",
-   __func__, shost->host_busy, shost->host_failed);
+   __func__, atomic_read(&shost->host_busy), 
shost->host_failed);
/*
 * Deal with commands that still have SAS tasks (i.e. they didn't
 * complete via the normal sas_task completion mechanism),
@@ -858,7 +858,8 @@ out:
goto retry;
 
SAS_DPRINTK("--- Exit %s: busy: %d failed: %d tries: %d\n",
-   __func__, shost->host_busy, shost->host_failed, tries);
+   __func__, atomic_read(&shost->host_busy),
+   shost->host_failed, tries);
 }
 
 enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
diff --git a/drivers/scsi/qlogicpti.c b/drivers/scsi/qlogicpti.c
index 6d48d30..740ae49 100644
--- a/drivers/scsi/qlogicpti.c
+++ b/drivers/scsi/qlogicpti.c
@@ -959,7 +959,7 @@ static inline void update_can_queue(struct Scsi_Host *host, 
u_int in_ptr, u_int
/* Temporary workaround until bug is found and fixed (one bug has been 
found
   already, but fixing it makes things even worse) -jj */
int num_free = QLOGICPTI_REQ_QUEUE_LEN - REQ_QUEUE_DEPTH(in_ptr, 
out_ptr) - 64;
-   host->can_queue = host->host_busy + num_free;
+   host->can_queue = atomic_read(&host->host_busy) + num_free;
host->sg_tablesize = QLOGICPTI_MAX_SG(num_free);
 }
 
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 6200a26..21fb97b 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -600,7 +600,7 @@ void scsi_log_completion(struct scsi_cmnd *cmd, int 
disposition)
if (level > 3)
scm

[PATCH 06/14] scsi: convert target_busy to an atomic_t

2014-07-18 Thread Christoph Hellwig
Avoid taking the host-wide host_lock to check the per-target queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if nessecary decrement it after finishing all checks.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi_lib.c| 53 --
 include/scsi/scsi_device.h |  4 ++--
 2 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 112c737..0580711 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -294,7 +294,7 @@ void scsi_device_unbusy(struct scsi_device *sdev)
 
spin_lock_irqsave(shost->host_lock, flags);
shost->host_busy--;
-   starget->target_busy--;
+   atomic_dec(&starget->target_busy);
if (unlikely(scsi_host_in_recovery(shost) &&
 (shost->host_failed || shost->host_eh_scheduled)))
scsi_eh_wakeup(shost);
@@ -361,7 +361,7 @@ static inline int scsi_device_is_busy(struct scsi_device 
*sdev)
 static inline int scsi_target_is_busy(struct scsi_target *starget)
 {
return ((starget->can_queue > 0 &&
-starget->target_busy >= starget->can_queue) ||
+atomic_read(&starget->target_busy) >= starget->can_queue) ||
 starget->target_blocked);
 }
 
@@ -1279,37 +1279,50 @@ static inline int scsi_target_queue_ready(struct 
Scsi_Host *shost,
   struct scsi_device *sdev)
 {
struct scsi_target *starget = scsi_target(sdev);
-   int ret = 0;
+   unsigned int busy;
 
-   spin_lock_irq(shost->host_lock);
if (starget->single_lun) {
+   spin_lock_irq(shost->host_lock);
if (starget->starget_sdev_user &&
-   starget->starget_sdev_user != sdev)
-   goto out;
+   starget->starget_sdev_user != sdev) {
+   spin_unlock_irq(shost->host_lock);
+   return 0;
+   }
starget->starget_sdev_user = sdev;
+   spin_unlock_irq(shost->host_lock);
}
 
-   if (starget->target_busy == 0 && starget->target_blocked) {
+   busy = atomic_inc_return(&starget->target_busy) - 1;
+   if (starget->target_blocked) {
+   if (busy)
+   goto starved;
+
/*
 * unblock after target_blocked iterates to zero
 */
-   if (--starget->target_blocked != 0)
-   goto out;
+   spin_lock_irq(shost->host_lock);
+   if (--starget->target_blocked != 0) {
+   spin_unlock_irq(shost->host_lock);
+   goto out_dec;
+   }
+   spin_unlock_irq(shost->host_lock);
 
SCSI_LOG_MLQUEUE(3, starget_printk(KERN_INFO, starget,
 "unblocking target at zero depth\n"));
}
 
-   if (scsi_target_is_busy(starget)) {
-   list_move_tail(&sdev->starved_entry, &shost->starved_list);
-   goto out;
-   }
+   if (starget->can_queue > 0 && busy >= starget->can_queue)
+   goto starved;
 
-   scsi_target(sdev)->target_busy++;
-   ret = 1;
-out:
+   return 1;
+
+starved:
+   spin_lock_irq(shost->host_lock);
+   list_move_tail(&sdev->starved_entry, &shost->starved_list);
spin_unlock_irq(shost->host_lock);
-   return ret;
+out_dec:
+   atomic_dec(&starget->target_busy);
+   return 0;
 }
 
 /*
@@ -1419,7 +1432,7 @@ static void scsi_kill_request(struct request *req, struct 
request_queue *q)
spin_unlock(sdev->request_queue->queue_lock);
spin_lock(shost->host_lock);
shost->host_busy++;
-   starget->target_busy++;
+   atomic_inc(&starget->target_busy);
spin_unlock(shost->host_lock);
spin_lock(sdev->request_queue->queue_lock);
 
@@ -1589,9 +1602,7 @@ static void scsi_request_fn(struct request_queue *q)
return;
 
  host_not_ready:
-   spin_lock_irq(shost->host_lock);
-   scsi_target(sdev)->target_busy--;
-   spin_unlock_irq(shost->host_lock);
+   atomic_dec(&scsi_target(sdev)->target_busy);
  not_ready:
/*
 * lock q, handle tag, requeue req, and decrement device_busy. We
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 9aa38f7..4e078b6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -291,8 +291,8 @@ struct scsi_target {
unsigned intexpecting_lun_change:1; /* A device has reported
 * a 3F/0E UA, other devices on
 * the same target will also. */
-   /* commands actu

[PATCH 09/14] scsi: fix the {host,target,device}_blocked counter mess

2014-07-18 Thread Christoph Hellwig
Seems like these counters are missing any sort of synchronization for
updates, as a over 10 year old comment from me noted.  Fix this by
using atomic counters, and while we're at it also make sure they are
in the same cacheline as the _busy counters and not needlessly stored
to in every I/O completion.

With the new model the _busy counters can temporarily go negative,
so all the readers are updated to check for > 0 values.  Longer
term every successful I/O completion will reset the counters to zero,
so the temporarily negative values will not cause any harm.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi.c| 21 +++
 drivers/scsi/scsi_lib.c| 66 +++---
 drivers/scsi/scsi_sysfs.c  | 10 ++-
 include/scsi/scsi_device.h |  7 ++---
 include/scsi/scsi_host.h   |  7 ++---
 5 files changed, 58 insertions(+), 53 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 21fb97b..3dde8a3 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -726,17 +726,16 @@ void scsi_finish_command(struct scsi_cmnd *cmd)
 
scsi_device_unbusy(sdev);
 
-/*
- * Clear the flags which say that the device/host is no longer
- * capable of accepting new commands.  These are set in scsi_queue.c
- * for both the queue full condition on a device, and for a
- * host full condition on the host.
-*
-* XXX(hch): What about locking?
- */
-shost->host_blocked = 0;
-   starget->target_blocked = 0;
-sdev->device_blocked = 0;
+   /*
+* Clear the flags that say that the device/target/host is no longer
+* capable of accepting new commands.
+*/
+   if (atomic_read(&shost->host_blocked))
+   atomic_set(&shost->host_blocked, 0);
+   if (atomic_read(&starget->target_blocked))
+   atomic_set(&starget->target_blocked, 0);
+   if (atomic_read(&sdev->device_blocked))
+   atomic_set(&sdev->device_blocked, 0);
 
/*
 * If we have valid sense information, then some kind of recovery
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1ddf0fb..69da4cb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -99,14 +99,16 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 */
switch (reason) {
case SCSI_MLQUEUE_HOST_BUSY:
-   host->host_blocked = host->max_host_blocked;
+   atomic_set(&host->host_blocked, host->max_host_blocked);
break;
case SCSI_MLQUEUE_DEVICE_BUSY:
case SCSI_MLQUEUE_EH_RETRY:
-   device->device_blocked = device->max_device_blocked;
+   atomic_set(&device->device_blocked,
+  device->max_device_blocked);
break;
case SCSI_MLQUEUE_TARGET_BUSY:
-   starget->target_blocked = starget->max_target_blocked;
+   atomic_set(&starget->target_blocked,
+  starget->max_target_blocked);
break;
}
 }
@@ -351,29 +353,35 @@ static void scsi_single_lun_run(struct scsi_device 
*current_sdev)
spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
-static inline int scsi_device_is_busy(struct scsi_device *sdev)
+static inline bool scsi_device_is_busy(struct scsi_device *sdev)
 {
-   if (atomic_read(&sdev->device_busy) >= sdev->queue_depth ||
-   sdev->device_blocked)
-   return 1;
-   return 0;
+   if (atomic_read(&sdev->device_busy) >= sdev->queue_depth)
+   return true;
+   if (atomic_read(&sdev->device_blocked) > 0)
+   return true;
+   return false;
 }
 
-static inline int scsi_target_is_busy(struct scsi_target *starget)
+static inline bool scsi_target_is_busy(struct scsi_target *starget)
 {
-   return ((starget->can_queue > 0 &&
-atomic_read(&starget->target_busy) >= starget->can_queue) ||
-starget->target_blocked);
+   if (starget->can_queue > 0 &&
+   atomic_read(&starget->target_busy) >= starget->can_queue)
+   return true;
+   if (atomic_read(&starget->target_blocked) > 0)
+   return true;
+   return false;
 }
 
-static inline int scsi_host_is_busy(struct Scsi_Host *shost)
+static inline bool scsi_host_is_busy(struct Scsi_Host *shost)
 {
-   if ((shost->can_queue > 0 &&
-atomic_read(&shost->host_busy) >= shost->can_queue) ||
-   shost->host_blocked || shost->host_self_blocked)
-   return 1;
-
-   return 0;
+   if (shost->can_queue > 0 &&
+   atomic_read(&shost->host_busy) >= shost->can_queue)
+   return true;
+   if (atomic_read(&shost->host_blocked) > 0)
+   return true;
+   if (shost->host_self_blo

[PATCH 01/14] scsi: add scsi_setup_cmnd helper

2014-07-18 Thread Christoph Hellwig
Factor out command setup code that will be shared with the blk-mq code path.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/scsi_lib.c | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 85cf0ef..04c3684 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1092,6 +1092,27 @@ static int scsi_setup_fs_cmnd(struct scsi_device *sdev, 
struct request *req)
return scsi_cmd_to_driver(cmd)->init_command(cmd);
 }
 
+static int scsi_setup_cmnd(struct scsi_device *sdev, struct request *req)
+{
+   struct scsi_cmnd *cmd = req->special;
+
+   if (!blk_rq_bytes(req))
+   cmd->sc_data_direction = DMA_NONE;
+   else if (rq_data_dir(req) == WRITE)
+   cmd->sc_data_direction = DMA_TO_DEVICE;
+   else
+   cmd->sc_data_direction = DMA_FROM_DEVICE;
+
+   switch (req->cmd_type) {
+   case REQ_TYPE_FS:
+   return scsi_setup_fs_cmnd(sdev, req);
+   case REQ_TYPE_BLOCK_PC:
+   return scsi_setup_blk_pc_cmnd(sdev, req);
+   default:
+   return BLKPREP_KILL;
+   }
+}
+
 static int
 scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 {
@@ -1195,24 +1216,7 @@ static int scsi_prep_fn(struct request_queue *q, struct 
request *req)
goto out;
}
 
-   if (!blk_rq_bytes(req))
-   cmd->sc_data_direction = DMA_NONE;
-   else if (rq_data_dir(req) == WRITE)
-   cmd->sc_data_direction = DMA_TO_DEVICE;
-   else
-   cmd->sc_data_direction = DMA_FROM_DEVICE;
-
-   switch (req->cmd_type) {
-   case REQ_TYPE_FS:
-   ret = scsi_setup_fs_cmnd(sdev, req);
-   break;
-   case REQ_TYPE_BLOCK_PC:
-   ret = scsi_setup_blk_pc_cmnd(sdev, req);
-   break;
-   default:
-   ret = BLKPREP_KILL;
-   }
-
+   ret = scsi_setup_cmnd(sdev, req);
 out:
return scsi_prep_return(q, req, ret);
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/14] scsi: unwind blk_end_request_all and blk_end_request_err calls

2014-07-18 Thread Christoph Hellwig
Replace the calls to the various blk_end_request variants with opencode
equivalents.  Blk-mq is using a model that gives the driver control
between the bio updates and the actual completion, and making the old
code follow that same model allows us to keep the code more similar for
both paths.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi_lib.c | 61 ++---
 1 file changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a643353..8723abe 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -621,6 +621,37 @@ static void scsi_release_bidi_buffers(struct scsi_cmnd 
*cmd)
cmd->request->next_rq->special = NULL;
 }
 
+static bool scsi_end_request(struct request *req, int error,
+   unsigned int bytes, unsigned int bidi_bytes)
+{
+   struct scsi_cmnd *cmd = req->special;
+   struct scsi_device *sdev = cmd->device;
+   struct request_queue *q = sdev->request_queue;
+   unsigned long flags;
+
+
+   if (blk_update_request(req, error, bytes))
+   return true;
+
+   /* Bidi request must be completed as a whole */
+   if (unlikely(bidi_bytes) &&
+   blk_update_request(req->next_rq, error, bidi_bytes))
+   return true;
+
+   if (blk_queue_add_random(q))
+   add_disk_randomness(req->rq_disk);
+
+   spin_lock_irqsave(q->queue_lock, flags);
+   blk_finish_request(req, error);
+   spin_unlock_irqrestore(q->queue_lock, flags);
+
+   if (bidi_bytes)
+   scsi_release_bidi_buffers(cmd);
+   scsi_release_buffers(cmd);
+   scsi_next_command(cmd);
+   return false;
+}
+
 /**
  * __scsi_error_from_host_byte - translate SCSI error code into errno
  * @cmd:   SCSI command (unused)
@@ -693,7 +724,7 @@ static int __scsi_error_from_host_byte(struct scsi_cmnd 
*cmd, int result)
  *be put back on the queue and retried using the same
  *command as before, possibly after a delay.
  *
- * c) We can call blk_end_request() with -EIO to fail
+ * c) We can call scsi_end_request() with -EIO to fail
  *the remainder of the request.
  */
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
@@ -744,13 +775,9 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
 * both sides at once.
 */
req->next_rq->resid_len = scsi_in(cmd)->resid;
-
-   scsi_release_buffers(cmd);
-   scsi_release_bidi_buffers(cmd);
-
-   blk_end_request_all(req, 0);
-
-   scsi_next_command(cmd);
+   if (scsi_end_request(req, 0, blk_rq_bytes(req),
+   blk_rq_bytes(req->next_rq)))
+   BUG();
return;
}
} else if (blk_rq_bytes(req) == 0 && result && !sense_deferred) {
@@ -797,15 +824,16 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
/*
 * If we finished all bytes in the request we are done now.
 */
-   if (!blk_end_request(req, error, good_bytes))
-   goto next_command;
+   if (!scsi_end_request(req, error, good_bytes, 0))
+   return;
 
/*
 * Kill remainder if no retrys.
 */
if (error && scsi_noretry_cmd(cmd)) {
-   blk_end_request_all(req, error);
-   goto next_command;
+   if (scsi_end_request(req, error, blk_rq_bytes(req), 0))
+   BUG();
+   return;
}
 
/*
@@ -919,8 +947,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int 
good_bytes)
scsi_print_sense("", cmd);
scsi_print_command(cmd);
}
-   if (!blk_end_request_err(req, error))
-   goto next_command;
+   if (!scsi_end_request(req, error, blk_rq_err_bytes(req), 0))
+   return;
/*FALLTHRU*/
case ACTION_REPREP:
requeue:
@@ -939,11 +967,6 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
__scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY, 0);
break;
}
-   return;
-
-next_command:
-   scsi_release_buffers(cmd);
-   scsi_next_command(cmd);
 }
 
 static int scsi_init_sgtable(struct request *req, struct scsi_data_buffer *sdb,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.or

[PATCH 14/14] fnic: reject device resets without assigned tags for the blk-mq case

2014-07-18 Thread Christoph Hellwig
Current the midlayer fakes up a struct request for the explicit reset
ioctls, and those don't have a tag allocated to them.  The fnic driver pokes
into midlayer structures to paper over this design issue, but that won't
work for the blk-mq case.

Either someone who can actually test the hardware will have to come up with
a similar hack for the blk-mq case, or we'll have to bite the bullet and fix
the way the EH ioctls work for real, but until that happens we fail these
explicit requests here.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
Cc: Hiral Patel 
Cc: Suma Ramars 
Cc: Brian Uchino 
---
 drivers/scsi/fnic/fnic_scsi.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 3f88f56..961bdf5 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -2224,6 +2224,22 @@ int fnic_device_reset(struct scsi_cmnd *sc)
 
tag = sc->request->tag;
if (unlikely(tag < 0)) {
+   /*
+* XXX(hch): current the midlayer fakes up a struct
+* request for the explicit reset ioctls, and those
+* don't have a tag allocated to them.  The below
+* code pokes into midlayer structures to paper over
+* this design issue, but that won't work for blk-mq.
+*
+* Either someone who can actually test the hardware
+* will have to come up with a similar hack for the
+* blk-mq case, or we'll have to bite the bullet and
+* fix the way the EH ioctls work for real, but until
+* that happens we fail these explicit requests here.
+*/
+   if (shost_use_blk_mq(sc->device->host))
+   goto fnic_device_reset_end;
+
tag = fnic_scsi_host_start_tag(fnic, sc);
if (unlikely(tag == SCSI_NO_TAG))
goto fnic_device_reset_end;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/14] scsi: add support for a blk-mq based I/O path.

2014-07-18 Thread Christoph Hellwig
This patch adds support for an alternate I/O path in the scsi midlayer
which uses the blk-mq infrastructure instead of the legacy request code.

Use of blk-mq is fully transparent to drivers, although for now a host
template field is provided to opt out of blk-mq usage in case any unforseen
incompatibilities arise.

In general replacing the legacy request code with blk-mq is a simple and
mostly mechanical transformation.  The biggest exception is the new code
that deals with the fact the I/O submissions in blk-mq must happen from
process context, which slightly complicates the I/O completion handler.
The second biggest differences is that blk-mq is build around the concept
of preallocated requests that also include driver specific data, which
in SCSI context means the scsi_cmnd structure.  This completely avoids
dynamic memory allocations for the fast path through I/O submission.

Due the preallocated requests the MQ code path exclusively uses the
host-wide shared tag allocator instead of a per-LUN one.  This only
affects drivers actually using the block layer provided tag allocator
instead of their own.  Unlike the old path blk-mq always provides a tag,
although drivers don't have to use it.

For now the blk-mq path is disable by defauly and must be enabled using
the "use_blk_mq" module parameter.  Once the remaining work in the block
layer to make blk-mq more suitable for slow devices is complete I hope
to make it the default and eventually even remove the old code path.

Based on the earlier scsi-mq prototype by Nicholas Bellinger.

Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and
various sugestions and code contributions.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/hosts.c  |  35 +++-
 drivers/scsi/scsi.c   |   5 +-
 drivers/scsi/scsi_lib.c   | 464 --
 drivers/scsi/scsi_priv.h  |   3 +
 drivers/scsi/scsi_scan.c  |   5 +-
 drivers/scsi/scsi_sysfs.c |   2 +
 include/scsi/scsi_host.h  |  18 +-
 include/scsi/scsi_tcq.h   |  28 ++-
 8 files changed, 488 insertions(+), 72 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 0632eee..6de80e3 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -213,9 +213,24 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
goto fail;
}
 
+   if (shost_use_blk_mq(shost)) {
+   error = scsi_mq_setup_tags(shost);
+   if (error)
+   goto fail;
+   }
+
+   /*
+* Note that we allocate the freelist even for the MQ case for now,
+* as we need a command set aside for scsi_reset_provider.  Having
+* the full host freelist and one command available for that is a
+* little heavy-handed, but avoids introducing a special allocator
+* just for this.  Eventually the structure of scsi_reset_provider
+* will need a major overhaul.
+*/
error = scsi_setup_command_freelist(shost);
if (error)
-   goto fail;
+   goto out_destroy_tags;
+
 
if (!shost->shost_gendev.parent)
shost->shost_gendev.parent = dev ? dev : &platform_bus;
@@ -226,7 +241,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
 
error = device_add(&shost->shost_gendev);
if (error)
-   goto out;
+   goto out_destroy_freelist;
 
pm_runtime_set_active(&shost->shost_gendev);
pm_runtime_enable(&shost->shost_gendev);
@@ -279,8 +294,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct 
device *dev,
device_del(&shost->shost_dev);
  out_del_gendev:
device_del(&shost->shost_gendev);
- out:
+ out_destroy_freelist:
scsi_destroy_command_freelist(shost);
+ out_destroy_tags:
+   if (shost_use_blk_mq(shost))
+   scsi_mq_destroy_tags(shost);
  fail:
return error;
 }
@@ -309,8 +327,13 @@ static void scsi_host_dev_release(struct device *dev)
}
 
scsi_destroy_command_freelist(shost);
-   if (shost->bqt)
-   blk_free_tags(shost->bqt);
+   if (shost_use_blk_mq(shost)) {
+   if (shost->tag_set.tags)
+   scsi_mq_destroy_tags(shost);
+   } else {
+   if (shost->bqt)
+   blk_free_tags(shost->bqt);
+   }
 
kfree(shost->shost_data);
 
@@ -436,6 +459,8 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template 
*sht, int privsize)
else
shost->dma_boundary = 0x;
 
+   shost->use_blk_mq = scsi_use_blk_mq && !shost->hostt->disable_blk_mq;
+
device_initialize(&shost->shost_gendev);
dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
shost->shost_gendev.bus = &scsi_bus_

[PATCH 12/14] scatterlist: allow chaining to preallocated chunks

2014-07-18 Thread Christoph Hellwig
Blk-mq drivers usually preallocate their S/G list as part of the request,
but if we want to support the very large S/G lists currently supported by
the SCSI code that would tie up a lot of memory in the preallocated request
pool.  Add support to the scatterlist code so that it can initialize a
S/G list that uses a preallocated first chunks and dynamically allocated
additional chunks.  That way the scsi-mq code can preallocate a first
page worth of S/G entries as part of the request, and dynamically extend
the S/G list when needed.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi_lib.c | 16 +++-
 include/linux/scatterlist.h |  6 +++---
 lib/scatterlist.c   | 25 +
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8723abe..bbd7a0a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -564,6 +564,11 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
nents, gfp_t gfp_mask)
return mempool_alloc(sgp->pool, gfp_mask);
 }
 
+static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
+{
+   __sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS, false, scsi_sg_free);
+}
+
 static int scsi_alloc_sgtable(struct scsi_data_buffer *sdb, int nents,
  gfp_t gfp_mask)
 {
@@ -572,19 +577,12 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer 
*sdb, int nents,
BUG_ON(!nents);
 
ret = __sg_alloc_table(&sdb->table, nents, SCSI_MAX_SG_SEGMENTS,
-  gfp_mask, scsi_sg_alloc);
+  NULL, gfp_mask, scsi_sg_alloc);
if (unlikely(ret))
-   __sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS,
-   scsi_sg_free);
-
+   scsi_free_sgtable(sdb);
return ret;
 }
 
-static void scsi_free_sgtable(struct scsi_data_buffer *sdb)
-{
-   __sg_free_table(&sdb->table, SCSI_MAX_SG_SEGMENTS, scsi_sg_free);
-}
-
 /*
  * Function:scsi_release_buffers()
  *
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index a964f72..f4ec8bb 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -229,10 +229,10 @@ void sg_init_one(struct scatterlist *, const void *, 
unsigned int);
 typedef struct scatterlist *(sg_alloc_fn)(unsigned int, gfp_t);
 typedef void (sg_free_fn)(struct scatterlist *, unsigned int);
 
-void __sg_free_table(struct sg_table *, unsigned int, sg_free_fn *);
+void __sg_free_table(struct sg_table *, unsigned int, bool, sg_free_fn *);
 void sg_free_table(struct sg_table *);
-int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int, gfp_t,
-sg_alloc_fn *);
+int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
+struct scatterlist *, gfp_t, sg_alloc_fn *);
 int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
 int sg_alloc_table_from_pages(struct sg_table *sgt,
struct page **pages, unsigned int n_pages,
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 3a8e8e8..b4415fc 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -165,6 +165,7 @@ static void sg_kfree(struct scatterlist *sg, unsigned int 
nents)
  * __sg_free_table - Free a previously mapped sg table
  * @table: The sg table header to use
  * @max_ents:  The maximum number of entries per single scatterlist
+ * @skip_first_chunk: don't free the (preallocated) first scatterlist chunk
  * @free_fn:   Free function
  *
  *  Description:
@@ -174,7 +175,7 @@ static void sg_kfree(struct scatterlist *sg, unsigned int 
nents)
  *
  **/
 void __sg_free_table(struct sg_table *table, unsigned int max_ents,
-sg_free_fn *free_fn)
+bool skip_first_chunk, sg_free_fn *free_fn)
 {
struct scatterlist *sgl, *next;
 
@@ -202,7 +203,10 @@ void __sg_free_table(struct sg_table *table, unsigned int 
max_ents,
}
 
table->orig_nents -= sg_size;
-   free_fn(sgl, alloc_size);
+   if (!skip_first_chunk) {
+   free_fn(sgl, alloc_size);
+   skip_first_chunk = false;
+   }
sgl = next;
}
 
@@ -217,7 +221,7 @@ EXPORT_SYMBOL(__sg_free_table);
  **/
 void sg_free_table(struct sg_table *table)
 {
-   __sg_free_table(table, SG_MAX_SINGLE_ALLOC, sg_kfree);
+   __sg_free_table(table, SG_MAX_SINGLE_ALLOC, false, sg_kfree);
 }
 EXPORT_SYMBOL(sg_free_table);
 
@@ -241,8 +245,8 @@ EXPORT_SYMBOL(sg_free_table);
  *
  **/
 int __sg_alloc_table(struct sg_table *table, unsigned int nents,
-unsigned int max_ents, gfp_t gfp_mask,
-sg_alloc_fn *alloc_fn)
+unsigned int max_ents, struct scatterlist *first_chun

[PATCH 10/14] scsi: only maintain target_blocked if the driver has a target queue limit

2014-07-18 Thread Christoph Hellwig
This saves us an atomic operation for each I/O submission and completion
for the usual case where the driver doesn't set a per-target can_queue
value.  Only a few iscsi hardware offload drivers set the per-target
can_queue value at the moment.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi_lib.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 69da4cb..a643353 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -295,7 +295,8 @@ void scsi_device_unbusy(struct scsi_device *sdev)
unsigned long flags;
 
atomic_dec(&shost->host_busy);
-   atomic_dec(&starget->target_busy);
+   if (starget->can_queue > 0)
+   atomic_dec(&starget->target_busy);
 
if (unlikely(scsi_host_in_recovery(shost) &&
 (shost->host_failed || shost->host_eh_scheduled))) {
@@ -364,11 +365,12 @@ static inline bool scsi_device_is_busy(struct scsi_device 
*sdev)
 
 static inline bool scsi_target_is_busy(struct scsi_target *starget)
 {
-   if (starget->can_queue > 0 &&
-   atomic_read(&starget->target_busy) >= starget->can_queue)
-   return true;
-   if (atomic_read(&starget->target_blocked) > 0)
-   return true;
+   if (starget->can_queue > 0) {
+   if (atomic_read(&starget->target_busy) >= starget->can_queue)
+   return true;
+   if (atomic_read(&starget->target_blocked) > 0)
+   return true;
+   }
return false;
 }
 
@@ -1309,6 +1311,9 @@ static inline int scsi_target_queue_ready(struct 
Scsi_Host *shost,
spin_unlock_irq(shost->host_lock);
}
 
+   if (starget->can_queue <= 0)
+   return 1;
+
busy = atomic_inc_return(&starget->target_busy) - 1;
if (atomic_read(&starget->target_blocked) > 0) {
if (busy)
@@ -1324,7 +1329,7 @@ static inline int scsi_target_queue_ready(struct 
Scsi_Host *shost,
 "unblocking target at zero depth\n"));
}
 
-   if (starget->can_queue > 0 && busy >= starget->can_queue)
+   if (busy >= starget->can_queue)
goto starved;
 
return 1;
@@ -1334,7 +1339,8 @@ starved:
list_move_tail(&sdev->starved_entry, &shost->starved_list);
spin_unlock_irq(shost->host_lock);
 out_dec:
-   atomic_dec(&starget->target_busy);
+   if (starget->can_queue > 0)
+   atomic_dec(&starget->target_busy);
return 0;
 }
 
@@ -1455,7 +1461,8 @@ static void scsi_kill_request(struct request *req, struct 
request_queue *q)
 */
atomic_inc(&sdev->device_busy);
atomic_inc(&shost->host_busy);
-   atomic_inc(&starget->target_busy);
+   if (starget->can_queue > 0)
+   atomic_inc(&starget->target_busy);
 
blk_complete_request(req);
 }
@@ -1624,7 +1631,8 @@ static void scsi_request_fn(struct request_queue *q)
return;
 
  host_not_ready:
-   atomic_dec(&scsi_target(sdev)->target_busy);
+   if (scsi_target(sdev)->can_queue > 0)
+   atomic_dec(&scsi_target(sdev)->target_busy);
  not_ready:
/*
 * lock q, handle tag, requeue req, and decrement device_busy. We
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/14] scsi: split __scsi_queue_insert

2014-07-18 Thread Christoph Hellwig
Factor out a helper to set the _blocked values, which we'll reuse for the
blk-mq code path.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Webb Scales 
Acked-by: Jens Axboe 
Tested-by: Bart Van Assche 
Tested-by: Robert Elliott 
---
 drivers/scsi/scsi_lib.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 04c3684..3ac677c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -75,28 +75,12 @@ struct kmem_cache *scsi_sdb_cache;
  */
 #define SCSI_QUEUE_DELAY   3
 
-/**
- * __scsi_queue_insert - private queue insertion
- * @cmd: The SCSI command being requeued
- * @reason:  The reason for the requeue
- * @unbusy: Whether the queue should be unbusied
- *
- * This is a private queue insertion.  The public interface
- * scsi_queue_insert() always assumes the queue should be unbusied
- * because it's always called before the completion.  This function is
- * for a requeue after completion, which should only occur in this
- * file.
- */
-static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
+static void
+scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
 {
struct Scsi_Host *host = cmd->device->host;
struct scsi_device *device = cmd->device;
struct scsi_target *starget = scsi_target(device);
-   struct request_queue *q = device->request_queue;
-   unsigned long flags;
-
-   SCSI_LOG_MLQUEUE(1, scmd_printk(KERN_INFO, cmd,
-   "Inserting command %p into mlqueue\n", cmd));
 
/*
 * Set the appropriate busy bit for the device/host.
@@ -123,6 +107,30 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int 
reason, int unbusy)
starget->target_blocked = starget->max_target_blocked;
break;
}
+}
+
+/**
+ * __scsi_queue_insert - private queue insertion
+ * @cmd: The SCSI command being requeued
+ * @reason:  The reason for the requeue
+ * @unbusy: Whether the queue should be unbusied
+ *
+ * This is a private queue insertion.  The public interface
+ * scsi_queue_insert() always assumes the queue should be unbusied
+ * because it's always called before the completion.  This function is
+ * for a requeue after completion, which should only occur in this
+ * file.
+ */
+static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
+{
+   struct scsi_device *device = cmd->device;
+   struct request_queue *q = device->request_queue;
+   unsigned long flags;
+
+   SCSI_LOG_MLQUEUE(1, scmd_printk(KERN_INFO, cmd,
+   "Inserting command %p into mlqueue\n", cmd));
+
+   scsi_set_blocked(cmd, reason);
 
/*
 * Decrement the counters, since these commands are no longer
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


scsi-mq V4

2014-07-18 Thread Christoph Hellwig
At this point the code is ready for merging and use by developers and early
adopters.  Except for the newly added first patch all have been thru multiple
review cycles and I would like to merge the series early next week assuming
I can get reviews for this.  Please scream loud if you see any reason not
to merge it now.

The core blk-mq code isn't that suitable for slow devices
yet, mostly due to the lack of an I/O scheduler, but Jens is working on it.
Similarly there is no dm-multipath support for drivers using blk-mq yet,
but I'm working on it.  It should also be noted that the code doesn't
actually support multiple hardware queues or fine grained tuning of the
blk-mq parameters yet.  All these could be added fairly easily as soon
as low-level drivers want to make use of them.

The amount of chances to the existing code are fairly small, and mostly
speedups or cleanups that also apply to the old path as well.  Because
of this I also haven't bothered to put it under a config option, just
like the blk-mq core.

The usage of blk-mq dramatically decreases CPU usage under all workloads going
down from 100% CPU usage that the old setup can hit easily to usually less
than 20% for maxing out storage subsystems with 512byte reads and writes,
and it allows to easily archive millions of IOPS.  Bart and Robert have
helped with some very detailed measurements that they might be able to send
in reply to this, although these usually involve significantly reworked low
level drivers to avoid other bottle necks.

One major objection to previous iterations of this code was the simple
replacement of the host_lock with atomic counters for the host and busy
counters.  The host_lock avoidance on it's own already improves performance,
and with the patch to avoid maintaining the per-target busy counter unless
needed we now replace a lock round trip on the host_lock with just a single
atomic increment in the submission path, and a single atomic decrement in
completion path, which should provide benefits even for the oddest RISC
architecture.  Longer term I'd still love to get rid of these entirely
and use the counters in blk-mq, but due to the difference in how they
are maintained this doesn't seem feasible as long as we still need to
support the legacy request code path.

Changes from V3:
 - micro optimize the scsi_*_queue_ready functions (Webb Scales)
 - reverted an uninited but harmless transformation in
   scsi_host_queue_ready (Reported by Webb Scales)
 - remove a superflous cancel_delayed_work (Reported by Mike Christie)
 - fix for error handling during failed host initialization
   (Reported by Robert Elliot)

Changes from V2:
 - rebased on top of the I/O path cleanups

Changes from V1:
 - rebased on top of the core-for-3.17 branch, most notable the
   scsi logging changes
 - fixed handling of cmd_list to prevent crashes for some heavy
   workloads
 - fixed incorrect handling of !target->can_queue
 - avoid scheduling a workqueue on I/O completions when no queues
   are congested

In addition to the patches in this thread there also is a git available at:

git://git.infradead.org/users/hch/scsi.git scsi-mq.4

This work was sponsored by the ION division of Fusion IO.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] scsi patch queue tree updated

2014-07-18 Thread Christoph Hellwig
I've pushed out new version of the for-3.17 core and drivers trees:

  git://git.infradead.org/users/hch/scsi-queue.git core-for-3.17
  git://git.infradead.org/users/hch/scsi-queue.git drivers-for-3.17

In the core tree the biggest update is the merge of the I/O path
cleanups, in addition various individual patches were merged, and
the a few fixes for the 64-bit LUN series were squashed into the
patches.

The drivers side gained various updates, but there's still lots of
driver work on the list that needs more reviews.

I did a rebase to Linus' latest tree to pick up the fix for the aio
regression to help people doing IOPS testing.

Note that I've also pushed out a core-for-3.16 tree with James' flush
fix in case James wants to push this for 3.16 still; but this late
in the release cycles it might be better to wait for 3.17 for this as
well.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] mpt3sas, mpt2sas: fix scsi_add_host error handling problems in _scsih_probe

2014-07-18 Thread Christoph Hellwig
On Thu, Jul 17, 2014 at 02:39:30PM -0500, Robert Elliott wrote:
> In _scsih_probe, propagate the return value from scsi_add_host.
> In mpt3sas, avoid calling list_del twice if that returns an
> error, which causes list_del corruption warnings if an error
> is returned.
> 
> Tested with blk-mq and scsi-mq patches to properly cleanup
> from and propagate blk_mq_init_rq_map errors.
> 
> Signed-off-by: Robert Elliott 

Looks good.  It would be even better if we'd have proper error values
returned from mpt2/3sas_base_attach as well.

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] blk-mq: pass along blk_mq_alloc_tag_set return values

2014-07-18 Thread Christoph Hellwig
On Thu, Jul 17, 2014 at 02:39:21PM -0500, Robert Elliott wrote:
> Two of the blk-mq based drivers do not pass back the return value
> from blk_mq_alloc_tag_set, instead just returning -ENOMEM.
> 
> blk_mq_alloc_tag_set returns -EINVAL if the number of queues or
> queue depth is bad.  -ENOMEM implies that retrying after freeing some
> memory might be more successful, but that won't ever change
> in the -EINVAL cases.
> 
> Change the null_blk and mtip32xx drivers to pass along
> the return value.
> 
> Signed-off-by: Robert Elliott 

Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] blk-mq: cleanup after blk_mq_init_rq_map failures

2014-07-18 Thread Christoph Hellwig
On Thu, Jul 17, 2014 at 02:39:09PM -0500, Robert Elliott wrote:
> In blk-mq.c blk_mq_alloc_tag_set, if:
>   set->tags = kmalloc_node()
> succeeds, but one of the blk_mq_init_rq_map() calls fails,
>   goto out_unwind;
> needs to free set->tags so the caller is not obligated
> to do so.  None of the current callers (null_blk,
> virtio_blk, virtio_blk, or the forthcoming scsi-mq)
> do so.
> 
> set->tags needs to be set to NULL after doing so,
> so other tag cleanup logic doesn't try to free
> a stale pointer later.  Also set it to NULL
> in blk_mq_free_tag_set.
> 
> Tested with error injection on the forthcoming
> scsi-mq + hpsa combination.
> 
> Signed-off-by: Robert Elliott 

Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html