from:"James Bottomley"

Re: [PATCH] scsi: prevent stack buffer overflow in host_reset

2012-11-30 Thread James Bottomley

On Thu, 2012-11-15 at 15:51 -0500, Sasha Levin wrote:
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index ce5224c..77ba946 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -249,9 +249,9 @@ static DEVICE_ATTR(active_mode, S_IRUGO | S_IWUSR, 
> show_shost_active_mode, NULL)
>  
>  static int check_reset_type(char *str)
>  {
> - if (strncmp(str, "adapter", 10) == 0)
> + if (sysfs_streq(str, "adapter"))
>   return SCSI_ADAPTER_RESET;
> - else if (strncmp(str, "firmware", 10) == 0)
> + else if (sysfs_streq(str, "firmware"))
>   return SCSI_FIRMWARE_RESET;
>   else
>   return 0;
> @@ -264,12 +264,9 @@ store_host_reset(struct device *dev, struct 
> device_attribute *attr,
>   struct Scsi_Host *shost = class_to_shost(dev);
>   struct scsi_host_template *sht = shost->hostt;
>   int ret = -EINVAL;
> - char str[10];
>   int type;
>  
> - sscanf(buf, "%s", str);
> - type = check_reset_type(str);
> -
> + type = check_reset_type(buf);
>   if (!type)
>   goto exit_store_host_reset;

You didn't compile test this:

  CC [M]  drivers/scsi/scsi_sysfs.o
drivers/scsi/scsi_sysfs.c: In function ‘store_host_reset’:
drivers/scsi/scsi_sysfs.c:269:2: warning: passing argument 1 of 
‘check_reset_type’ discards ‘const’ qualifier from pointer target type [enabled 
by default]
drivers/scsi/scsi_sysfs.c:250:12: note: expected ‘char *’ but argument is of 
type ‘const char *’

It needs a const in check_reset_type().

I've added it.

James



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 5/5] arcmsr: Modify ARC-1214 IO Behavior

2012-11-30 Thread James Bottomley

On Fri, 2012-11-16 at 19:56 +0800, NickCheng wrote:
> From: Nick Cheng 
> 
> Modify ARC-1214 IO behavior to make up for HW seldom malfunction.
> Signed-off-by: Nick Cheng 

This still isn't right.  I said you could ignore most of the warnings
(like lines over 80 characters), but you have a lot of this:

> - printk(KERN_NOTICE "arcmsr%d: memory mapping region 
> fail \n", =
> acb->host->host_no);
> + pr_notice("arcmsr%d: memory mapping"
> + "region fail\n", acb->host->host_no);

I'm agnostic on whether you split a print, in spite of the warning but I
do tend to prefer longer lines, since they avoid the mistake you made
quite a few times, which is not leaving a space between the separate
words in the split string.  This will produce a message like

arcmsr1: memory mappingregion fail

Plus this indentation

> - if (readl(&acb->pmuC->outbound_doorbell) & 
> ARCMSR_HBCMU_IOP2DRV_MESSAGE_CMD_DONE) {
> - 
> writel(ARCMSR_HBCMU_IOP2DRV_MESSAGE_CMD_DONE_DOORBELL_CLEAR, 
> &acb->pmuC->outbound_doorbell_clear);/*clear interrupt*/=0A=
> + if (readl(&acb->pmuC->outbound_doorbell) &
> + ARCMSR_HBCMU_IOP2DRV_MESSAGE_CMD_DONE) {
>+  writel(ARCMSR_HBCMU_IOP2DRV_MESSAGE_CMD_DONE_DOORBELL_CLEAR,
> + &acb->pmuC->outbound_doorbell_clear);

Is being changed from correctly indented to incorrectly indented (writel
needs another tab).

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] x86/efi: pull NV+BS variables out before we exit boot services

2013-03-18 Thread James Bottomley

From: James Bottomley 

The object here is to make the NV+BS variables accessible (at least read only)
at runtime so we can get a full picture of the state of the EFI variables for
debugging and secure boot purposes.

The way this is done is to get the efi stub to pull all the NV+BS
(i.e. variables without the RT flag) out into setup_data records which can
then be read back in at runtime.  the EFI calls get_next_variable() and
get_variable() are modified to run through these setup records when the real
calls fail.

Signed-off-by: James Bottomley 

---

This same mechanism should work for all architectures, but the method of
transferring information from boot to runtime differs, so it can't be
done generically

James

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index c205035..62e 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -251,6 +251,85 @@ static void find_bits(unsigned long mask, u8 *pos, u8 
*size)
*size = len;
 }
 
+static efi_status_t setup_efi_bootvars(struct boot_params *params)
+{
+   const int len = 1024;
+   efi_char16_t *str;
+   efi_status_t status;
+   struct setup_data **data = (struct setup_data 
**)¶ms->hdr.setup_data;
+
+   status = efi_call_phys3(sys_table->boottime->allocate_pool,
+   EFI_LOADER_DATA, len, &str);
+   if (status != EFI_SUCCESS)
+   return status;
+
+   /*
+* Note: this trick only works on Little Endian if hdr.setup_data
+* is u64 on both 64 and 32 bit
+*/
+   while (*data)
+   data = (struct setup_data **)&(*data)->next;
+
+   memset(str, 0, len);
+
+   for (;;) {
+   unsigned long size = len, str_len;
+   efi_guid_t guid;
+   struct efi_setup_bootvars *bvs;
+   unsigned int attributes;
+
+   status = efi_call_phys3(sys_table->runtime->get_next_variable,
+   &size, str, &guid);
+   if (status != EFI_SUCCESS)
+   break;
+
+   str_len = size;
+   size = 0;
+   status = efi_call_phys5(sys_table->runtime->get_variable,
+   str, &guid, &attributes, &size, NULL);
+   if (status != EFI_SUCCESS && status != EFI_BUFFER_TOO_SMALL)
+   break;
+
+
+   /*
+* Please don't be tempted to optimise here: some
+* UEFI implementations fail to set attributes if
+* they return any error (including EFI_BUFFER_TOO_SMALL)
+*/
+   status = efi_call_phys3(sys_table->boottime->allocate_pool,
+   EFI_LOADER_DATA,
+   size + sizeof(*bvs) + str_len,
+   &bvs);
+   if (status != EFI_SUCCESS)
+   continue;
+   memset(bvs, 0 , sizeof(bvs));
+   status = efi_call_phys5(sys_table->runtime->get_variable,
+   str, &guid, &attributes,
+   &size, bvs->data);
+
+   if (status != EFI_SUCCESS
+   || (attributes & 0x07) != (EFI_VARIABLE_BOOTSERVICE_ACCESS |
+  EFI_VARIABLE_NON_VOLATILE)) {
+   efi_call_phys1(sys_table->boottime->free_pool, bvs);
+   continue;
+   }
+   memcpy(bvs->data + size, str, str_len);
+   bvs->name_size = str_len;
+   bvs->s_d.type = SETUP_EFIVAR;
+   bvs->s_d.len = size + sizeof(*bvs) + str_len;
+   bvs->guid = guid;
+   bvs->size = size;
+   bvs->attributes = attributes;
+   *data = (struct setup_data *)bvs;
+   data = (struct setup_data **)&(*data)->next;
+   }
+   *data = NULL;
+   status = EFI_SUCCESS;
+ out:
+   efi_call_phys1(sys_table->boottime->free_pool, str);
+   return status;
+}
+
 static efi_status_t setup_efi_pci(struct boot_params *params)
 {
efi_pci_io_protocol *pci;
@@ -1159,6 +1238,8 @@ struct boot_params *efi_main(void *handle, 
efi_system_table_t *_table,
 
setup_efi_pci(boot_params);
 
+   setup_efi_bootvars(boot_params);
+
status = efi_call_phys3(sys_table->boottime->allocate_pool,
EFI_LOADER_DATA, sizeof(*gdt),
(void **)&gdt);
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 60c89f3..2dadecd 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -122,4 +122,13 @@ sta

Re: [PATCH] [SCSI]: print the msgbytes and statusbyte from scsi result

2013-03-18 Thread James Bottomley

On Sun, 2013-03-17 at 17:29 +0900, Namjae Jeon wrote:
> From: Namjae Jeon 
> 
> Introduce msgbyte and statusbyte in the prints as part of the
> result which is returned by the lower layer driver in response to
> SCSI command issued, in case of any error conditions.
> 
> Purpose of adding these prints is to convey, during any I/O
> error case, which condition exactly has happened in lower device and
> from the prints we can directly deduce, what is the status of command
> issued. This will help to quickly debug the scenario and also making
> a test case to create new scenarios.
> 
> Also change the printk to more appropriate pr_* macro.
> 
> Signed-off-by: Namjae Jeon 
> Signed-off-by: Amit Sahrawat 
> ---
>  drivers/scsi/constants.c |6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
> index 76e4c03..77bb1dc 100644
> --- a/drivers/scsi/constants.c
> +++ b/drivers/scsi/constants.c
> @@ -1445,8 +1445,10 @@ void scsi_show_result(int result)
>  
>  void scsi_show_result(int result)
>  {
> - printk("Result: hostbyte=0x%02x driverbyte=0x%02x\n",
> -host_byte(result), driver_byte(result));
> + pr_info("Result: hostbyte=0x%02x driverbyte=0x%02x"
> + "msgbyte=0x%02x statusbyte=0x%02x\n",
> +host_byte(result), driver_byte(result), msg_byte(result),
> + status_byte(result));

You didn't test this, did you? If you did, you'd have noticed the change
from printk to pr_info gives you an unwanted "6" in the message.

Also, what are you hoping to achieve? scsi_show_result() is only used by
sd in a very few special command situations.  I can't believe the msg
byte would be anything other than zero and the status byte check
condition.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86/efi: pull NV+BS variables out before we exit boot services

2013-03-19 Thread James Bottomley

On Tue, 2013-03-19 at 01:48 +, Matthew Garrett wrote:
> On Mon, Mar 18, 2013 at 08:40:14AM +0000, James Bottomley wrote:
> 
> > The object here is to make the NV+BS variables accessible (at least read 
> > only)
> > at runtime so we can get a full picture of the state of the EFI variables 
> > for
> > debugging and secure boot purposes.
> 
> I'd really prefer not to do this - the reason these aren't flagged as RT 
> is that they're not supposed to be visible at runtime and may break 
> certain security assumptions.

I suppose they could be flagged as read only by root, but I haven't seen
anything non-innocuous in them yet.  It's mostly shell redirects and a
bit more useful information about the secure boot configuration.

Any security assumptions that rely on inability to read certain
information aren't really going to be that secure.  Inability to modify,
sure, but inability to read, not really.

>  If there's a real development purpose to 
> this then it ought to be guarded as a config option.

I've no objection to them going under the secure boot lockdown config
option if that's what you're thinking.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V5 1/5] virtio-scsi: redo allocation of target data

2013-03-19 Thread James Bottomley

On Tue, 2013-03-19 at 17:57 +0800, Wanlong Gao wrote:
> From: Paolo Bonzini 
> 
> virtio_scsi_target_state is now empty.  We will find new uses for it in
> the next few patches, so this patch does not drop it completely.
> However, having dropped the sglist flexible array member, we can turn
> the tgt array-of-pointers into a simple array.  This simplifies the
> allocation.
> 
> Even simpler would be to place the virtio_scsi_target_state structs in a
> flexible array member at the end of struct virtio_scsi.  But we do not
> do that, because we will place the virtqueues there in the next patches.

I'm really sorry, but I must have been asleep at the wheel when I let
code like this go in.  No modern driver should have fixed arrays for
target information.  The way this is supposed to work is that you have
entries in the host template for target_alloc and target_destroy.  You
hook into these and attach your struct virtio_scsi_target_state to
scsi_target->hostdata, which you kmalloc in the target_alloc routine and
kfree in the target_destroy routine.  Now you get at it from the sdev
with scsi_target(sdev)->hostdata. No messing around with fixed size
arrays and bulk memory allocation and no need to pass in the maximum
target size as a parameter because everything should now happen
dynamically.

Since you're redoing the code anyway, can you fix it to work this way?

Thanks,

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH V5 1/5] virtio-scsi: redo allocation of target data

2013-03-19 Thread James Bottomley

On Tue, 2013-03-19 at 12:45 +0100, Paolo Bonzini wrote:
> Il 19/03/2013 12:32, James Bottomley ha scritto:
> > On Tue, 2013-03-19 at 17:57 +0800, Wanlong Gao wrote:
> >> From: Paolo Bonzini 
> >>
> >> virtio_scsi_target_state is now empty.  We will find new uses for it in
> >> the next few patches, so this patch does not drop it completely.
> >> However, having dropped the sglist flexible array member, we can turn
> >> the tgt array-of-pointers into a simple array.  This simplifies the
> >> allocation.
> >>
> >> Even simpler would be to place the virtio_scsi_target_state structs in a
> >> flexible array member at the end of struct virtio_scsi.  But we do not
> >> do that, because we will place the virtqueues there in the next patches.
> > 
> > I'm really sorry, but I must have been asleep at the wheel when I let
> > code like this go in.  No modern driver should have fixed arrays for
> > target information.  The way this is supposed to work is that you have
> > entries in the host template for target_alloc and target_destroy.  You
> > hook into these and attach your struct virtio_scsi_target_state to
> > scsi_target->hostdata,
> 
> So that would be sc->device->sdev_target->hostdata.

No, unfortunate name, but it's used for something else (actually, I
think it *was* used by something else and is unused now).  The construct
is

scsi_target(sc->device)->hostdata

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86/efi: pull NV+BS variables out before we exit boot services

2013-03-19 Thread James Bottomley

On Tue, 2013-03-19 at 16:35 +, Matthew Garrett wrote:
> On Tue, Mar 19, 2013 at 08:14:45AM +0000, James Bottomley wrote:
> 
> > Any security assumptions that rely on inability to read certain
> > information aren't really going to be that secure.  Inability to modify,
> > sure, but inability to read, not really.
> 
> Well, I guess that's public/private key cryptography screwed.

Well, OK, it's ex-BIOS writers we're dealing with, so I won't say no-one
would be stupid enough to come up with a security scheme embedding
Private Keys in BS+NV variables, but I would have thought the fact that
Linux would blow the lid off it might be a good incentive not to do it
and thus a plus point for this patch.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86/efi: pull NV+BS variables out before we exit boot services

2013-03-19 Thread James Bottomley

On Tue, 2013-03-19 at 17:25 +, Matthew Garrett wrote:
> On Tue, Mar 19, 2013 at 05:17:27PM +0000, James Bottomley wrote:
> > On Tue, 2013-03-19 at 16:35 +, Matthew Garrett wrote:
> > > On Tue, Mar 19, 2013 at 08:14:45AM +0000, James Bottomley wrote:
> > > 
> > > > Any security assumptions that rely on inability to read certain
> > > > information aren't really going to be that secure.  Inability to modify,
> > > > sure, but inability to read, not really.
> > > 
> > > Well, I guess that's public/private key cryptography screwed.
> > 
> > Well, OK, it's ex-BIOS writers we're dealing with, so I won't say no-one
> > would be stupid enough to come up with a security scheme embedding
> > Private Keys in BS+NV variables, but I would have thought the fact that
> > Linux would blow the lid off it might be a good incentive not to do it
> > and thus a plus point for this patch.
> 
> The hibernation scheme we'd discussed involved having the first stage 
> loader generating a keypair and handing half of it to the OS for 
> encryption of the hibernation partition, then handing the other half to 
> the OS on the next boot so it can decrypt it. That requires non-RT 
> variables to be restricted from OS visibility.

The scheme we discussed, unless something radically changed, was to
convey a temporary key pair via a mechanism to later verify the
hybernate kernel on a resume.  That only requires reboot safe knowledge
of the public key.  The private key can be conveyed in BS only (not NV),
and should be consumed (as in deleted) by the OS when it receives it, so
it wouldn't be exposed by this patch.

James



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86/efi: pull NV+BS variables out before we exit boot services

2013-03-19 Thread James Bottomley

On Tue, 2013-03-19 at 18:28 +, Matthew Garrett wrote:
> On Tue, Mar 19, 2013 at 06:23:31PM +0000, James Bottomley wrote:
> 
> > The scheme we discussed, unless something radically changed, was to
> > convey a temporary key pair via a mechanism to later verify the
> > hybernate kernel on a resume.  That only requires reboot safe knowledge
> > of the public key.  The private key can be conveyed in BS only (not NV),
> > and should be consumed (as in deleted) by the OS when it receives it, so
> > it wouldn't be exposed by this patch.
> 
> It requires the key to survive the system being entirely powered down, 
> which means it needs to be BS+NV. It shouldn't be possible for userspace 
> to access this key.

It requires the *public* key to survive power down, certainly.  The
private key can be thrown away once the hibernate image is signed.  I
think the scheme can be constructed so the private key is never in NV
storage ... that also makes it more secure against tampering.

James



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86/efi: pull NV+BS variables out before we exit boot services

2013-03-19 Thread James Bottomley

On Tue, 2013-03-19 at 18:50 +, Matthew Garrett wrote:
> On Tue, Mar 19, 2013 at 06:40:56PM +0000, James Bottomley wrote:
> > On Tue, 2013-03-19 at 18:28 +, Matthew Garrett wrote:
> > > It requires the key to survive the system being entirely powered down, 
> > > which means it needs to be BS+NV. It shouldn't be possible for userspace 
> > > to access this key.
> > 
> > It requires the *public* key to survive power down, certainly.  The
> > private key can be thrown away once the hibernate image is signed.  I
> > think the scheme can be constructed so the private key is never in NV
> > storage ... that also makes it more secure against tampering.
> 
> Well, that somewhat complicates implementation - we'd be encrypting the 
> entire contents of memory except for the key that we're using to encrypt 
> memory. Keeping the public key away from userspace avoids having to care 
> about that.

I don't quite understand what you're getting at: the principle of public
key cryptography is that you can make the public key, well public.  You
only need to guard the private key.

James



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86/efi: pull NV+BS variables out before we exit boot services

2013-03-20 Thread James Bottomley

On Tue, 2013-03-19 at 23:17 +, Matthew Garrett wrote:
> On Tue, Mar 19, 2013 at 11:00:31PM +0000, James Bottomley wrote:
> > On Tue, 2013-03-19 at 18:50 +, Matthew Garrett wrote:
> > > Well, that somewhat complicates implementation - we'd be encrypting the 
> > > entire contents of memory except for the key that we're using to encrypt 
> > > memory. Keeping the public key away from userspace avoids having to care 
> > > about that.
> > 
> > I don't quite understand what you're getting at: the principle of public
> > key cryptography is that you can make the public key, well public.  You
> > only need to guard the private key.
> 
> Ok, so let's just rephrase it as asymmetric cryptography. The aim is to 
> ensure that there's never a situation where userspace can decrypt a 
> hibernation file, modify it and reencrypt it. So, shim (or whatever) 
> generates a keypair. The encryption key is passed to the kernel being 
> booted. The decryption key is stashed in a variable in order to be 
> passed to the resume kernel.
> 
> If the decryption key is available to userspace then the kernel needs to 
> discard the encryption key during image write-out - otherwise the 
> encryption key will be in the encrypted image. If the decryption key 
> isn't available to userspace then this isn't a concern.
> 
> If the decryption key *is* available to userspace (as it would be in 
> your case), there's a requirement to discard the encryption key during 
> the hibernation process. This isn't impossible, but it does add a little 
> to the complexity.

I agree with this.  But I do think the volatile secret key scheme, where
you discard the key immediately after use is the more secure one because
it relies on fewer secrets (and, indeed, no secrets at all after the
event).  It's a case where transparency encourages better security
architecture.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH RESEND 1/1] arch Kconfig: remove references to IRQ_PER_CPU

2013-02-04 Thread James Bottomley

On Mon, 2013-02-04 at 10:09 +, James Hogan wrote:
> The IRQ_PER_CPU Kconfig symbol was removed in the following commit:
> 
> Commit 6a58fb3bad099076f36f0f30f44507bc3275cdb6 ("genirq: Remove
> CONFIG_IRQ_PER_CPU") merged in v2.6.39-rc1.
> 
> But IRQ_PER_CPU wasn't removed from any of the architecture Kconfig
> files where it was defined or selected. It's completely unused so remove
> the remaining references.
> 
> Signed-off-by: James Hogan 
> Cc: Thomas Gleixner 
> Cc: Mike Frysinger 
> Cc: Fenghua Yu 
> Cc: Ralf Baechle 
> Cc: "James E.J. Bottomley" 
> Cc: Helge Deller 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Paul Mundt 
> Acked-by: Tony Luck 
> Acked-by: Richard Kuo 

For what it's worth ACK, but I don't really think you need it since the
patch is trivial and obviously correct.

> 
> Does anybody want to pick this patch up?

I see Thomas already has.  Thanks, by the way, for not doing this as one
patch per architecture ...

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: scanning for LUNs

2013-04-04 Thread James Bottomley

On Thu, 2013-04-04 at 08:12 -0700, K. Y. Srinivasan wrote:
> Here is the code snippet for scanning LUNS (drivers/scsi/scsi_scan.c in 
> function 
> __scsi_scan_target()):
> 
> /*
>  * Scan LUN 0, if there is some response, scan further. Ideally, we
>  * would not configure LUN 0 until all LUNs are scanned.
>  */
> res = scsi_probe_and_add_lun(starget, 0, &bflags, NULL, rescan, NULL);
> if (res == SCSI_SCAN_LUN_PRESENT || res == SCSI_SCAN_TARGET_PRESENT) {
> if (scsi_report_lun_scan(starget, bflags, rescan) != 0)
>  
> 
> So, if we don't get a response while scanning LUN0, we will not use
> scsi_report_lun_scan().
> On Hyper-V, the scsi emulation on the host does not treat LUN0 as
> anything special and we 
> could have situations where the only device under a scsi controller is
> at a location other than 0
> or 1. In this case the standard LUN scanning code in Linux fails to
> detect this device. Is this
> behaviour expected? Why is LUN0 treated differently here. Looking at
> the scsi spec, I am not sure
> if this is what is specified. Any help/guidance will be greatly
> appreciated.

Why don't you describe the problem.  We can't scan randomly a bunch of
LUNs hoping for a response (the space is 10^19).  SAM thinks you use
LUNW for this, but that's not well supported.  We can't annoy USB
devices by probing with REPORT LUNS, so conventionally most arrays
return something for LUN0 even if they don't actually have one (That's
what the peripheral qualifier codes are supposed to be about).  We
translate PQ1 and PQ2 to SCSI_SCAN_TARGET_PRESENT, which means no LUN,
but there is a target to scan here.

If you're sending back an error to an INQUIRY to LUN0, then you're out
of spec.  The SCSI standards say:

SPC3 6.4.1: In response to an INQUIRY command received by an
incorrect logical unit, the SCSI target device shall return the
INQUIRY data with the peripheral qualifier set to the value
defined in 6.4.2. The INQUIRY command shall return CHECK
CONDITION status only when the device server is unable to return
the requested INQUIRY data

James

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Add non-zero module sections to sysfs

2013-04-05 Thread James Bottomley

On Fri, 2013-04-05 at 14:30 +1030, Rusty Russell wrote:
> Sebastian Wankerl  writes:
> > On 04/04/13 03:00, Rusty Russell wrote:
> >> Sebastian Wankerl  writes:
> >>> Add non-zero module sections to sysfs on architectures unequal to PARISC.
> >>> KGDB needs all module sections for proper module debugging. Therefore, 
> >>> commit 
> >>> 35dead4235e2b67da7275b4122fed37099c2f462 is revoked except for PARISC
> >>> architecture.

Thanks for actually cc'ing us.

> >> #ifdef CONFIG_PARISC in the middle of kernel/module.c is super-ugly, and
> >> wrong.
> >
> > I don't see why this is wrong. It used to load all sections to sysfs
> > until the patch mentioned. Actually, it is the PARISC build chain which
> > is broken.
> 
> Exactly.  Don't workaround it here, revert it and put the
> duplicate-section-name fixup in parisc where it belongs.
> 
> Assuming parisc still produces these dup sections: that patch is 4 years
> old now.

Just so you know: this isn't a parisc specific problem.  Gcc produces
duplicate section names under various circumstances, but the one that
bites us is -ffunction-sections.  Note that there are proposals to use
-ffunction-sections on all architectures (so we can garbage collect
unused functions) in which case you'll induce the bug identified in
35dead4235e2b67da7275b4122fed37099c2f462 on every architecture

The problem is our assumption that section names be unique.  This
assumption is wrong.  The ELF spec says (version 1.1 page 1-15): "An
object file may have more than one section with the same name."  We need
to fix the kernel not to rely on a bogus assumption ... but we had no
idea how to do that in a way that preserved the backwards compatibility
of sections subdirectory.

I admit that 35dead4235e2b67da7275b4122fed37099c2f462 is a hack, but now
the problem has got attention, can we fix it properly?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] parisc: unbreak automounter support on 64-bit kernel with 32-bit userspace

2013-01-31 Thread James Bottomley

[adding autofs list and maintainer for their perusal and ack, although
this is an obvious fix to me

James]
On Thu, 2013-01-31 at 21:01 +0100, Helge Deller wrote:
> Similiar to other 64 bit Linux targets autofs_wqt_t needs to be of type int
> which has a size of 32 bits on 32- and 64-bit parisc kernels.
> 
> Signed-off-by: Helge Deller 
> 
> diff --git a/include/uapi/linux/auto_fs.h b/include/uapi/linux/auto_fs.h
> index 77cdba9..d551754 100644
> --- a/include/uapi/linux/auto_fs.h
> +++ b/include/uapi/linux/auto_fs.h
> @@ -43,7 +43,7 @@
>   */
>  
>  #if defined(__sparc__) || defined(__mips__) || defined(__x86_64__) \
> - || defined(__powerpc__) || defined(__s390__)
> + || defined(__powerpc__) || defined(__s390__) || defined(__hppa__)
>  typedef unsigned int autofs_wqt_t;
>  #else
>  typedef unsigned long autofs_wqt_t;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-parisc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] parisc: unbreak automounter support on 64-bit kernel with 32-bit userspace

2013-01-31 Thread James Bottomley

On Thu, 2013-01-31 at 13:29 -0800, H. Peter Anvin wrote:
> On 01/31/2013 12:26 PM, James Bottomley wrote:
> > [adding autofs list and maintainer for their perusal and ack, although
> > this is an obvious fix to me
> > 
> > James]
> 
> The #if list probably should be inverted, and only specific platforms
> should have unsigned long...

The whitelist is 64 bit only platforms, like ia64, isn't it?  So yes,
they're far less numerous.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] don't wait on disk to start on resume

2013-01-31 Thread James Bottomley

On Thu, 2013-01-31 at 14:02 -0800, dbasehore . wrote:
> (Resending message as plain text so it's tracked through vger lists)

Just FYI ... I don't get messages sent to me personally and the list, so
if it wasn't on vger, I haven't seen it.

> I didn't notice the other call to async_synchronize_full, I'll remove that 
> one.
> 
> I split up the functionality of sd_start_stop_device to ensure that
> the command to start the disk was sent before resume was completed.
> For now, it's open coded, but that could be changed if I make
> additions are made to scsi_lib.c to allow us to return immediately
> after scheduling the command.

That would be better; I'll wait to see the proposed patch.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/2] don't wait on disk to start on resume

2013-01-31 Thread James Bottomley

On Thu, 2013-01-31 at 14:00 -0800, dbasehore . wrote:
> (Resending as plain text so the message is tracked on vger)
> 
> Hi, thanks for reading through my patch.
> 
> With regards to SCSI_SENSE_BUFFERSIZE, I'm following the precedent of
> scsi_execute_req found in drivers/scsi/scsi_lib.c

I have no context for this: I don't recall making any comment on sense
buffers; what are you replying to?

James

> It seems that it is used by the scsi_normalize_sense function which I
> call in sd_resume_async. I just input SCSI_SENSE_BUFFERSIZE directly
> there though. I didn't know if anything would change its behavior on a
> lower level if I made sense_len = SCSI_SENSE_BUFFERSIZE, so I just
> went with what was already done.
> 
> I'll make sure the semantic fixes go into the final patch.
> 
> Also, I forgot to mention one possible problem earlier. I understand
> that some hard drives have a command buffer that can be executed by
> the hard drive in an order it determines. Does anyone know of a
> potential problem if the following happens?
> 
> -resume finishes (hard drive not started yet)
> -read/write sent to disk, inserted before start command
> 
> Could this happen? If so, could it cause any problems?
> 
> I've tested the possibility of a program trying to read/write from the
> disk before it has started, and the read/write blocks until the disk
> has actually been spun up. I don't know if there are specific hard
> drives where this could be a problem though.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/8] mm: use vm_unmapped_area() on parisc architecture

2013-01-24 Thread James Bottomley

On Wed, 2013-01-23 at 17:29 -0800, Michel Lespinasse wrote:
> Update the parisc arch_get_unmapped_area function to make use of
> vm_unmapped_area() instead of implementing a brute force search.
> 
> Signed-off-by: Michel Lespinasse 
> Acked-by: Rik van Riel 
> 
> ---
>  arch/parisc/kernel/sys_parisc.c |   46 ++
>  1 files changed, 17 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
> index f76c10863c62..6ab138088076 100644
> --- a/arch/parisc/kernel/sys_parisc.c
> +++ b/arch/parisc/kernel/sys_parisc.c
> @@ -35,18 +35,15 @@
>  
>  static unsigned long get_unshared_area(unsigned long addr, unsigned long len)
>  {
> - struct vm_area_struct *vma;
> + struct vm_unmapped_area_info info;
>  
> - addr = PAGE_ALIGN(addr);
> -
> - for (vma = find_vma(current->mm, addr); ; vma = vma->vm_next) {
> - /* At this point:  (!vma || addr < vma->vm_end). */
> - if (TASK_SIZE - len < addr)
> - return -ENOMEM;
> - if (!vma || addr + len <= vma->vm_start)
> - return addr;
> - addr = vma->vm_end;
> - }
> + info.flags = 0;
> + info.length = len;
> + info.low_limit = PAGE_ALIGN(addr);
> + info.high_limit = TASK_SIZE;
> + info.align_mask = 0;
> + info.align_offset = 0;
> + return vm_unmapped_area(&info);
>  }
>  
>  #define DCACHE_ALIGN(addr) (((addr) + (SHMLBA - 1)) &~ (SHMLBA - 1))

This macro is now redundant and can be removed.

> @@ -63,30 +60,21 @@ static unsigned long get_unshared_area(unsigned long 
> addr, unsigned long len)
>   */
>  static int get_offset(struct address_space *mapping)
>  {
> - int offset = (unsigned long) mapping << (PAGE_SHIFT - 8);
> - return offset & 0x3FF000;
> + return (unsigned long) mapping >> 8;

I'm not sure I agree with this shift (but I think the original was wrong
as well so the comment probably needs updating.)  Trying to derive
entropy from the mapping address is always nasty.  Mostly they're
embedded in the inode, so the right shift should be something like
log2(sizeof(inode)) + 1 and since the inode size is usually somewhere
between 512 and 1024 bytes, that comes out to 10 I think.

>  }
>  
>  static unsigned long get_shared_area(struct address_space *mapping,
>   unsigned long addr, unsigned long len, unsigned long pgoff)
>  {
> - struct vm_area_struct *vma;
> - int offset = mapping ? get_offset(mapping) : 0;
> -
> - offset = (offset + (pgoff << PAGE_SHIFT)) & 0x3FF000;
> + struct vm_unmapped_area_info info;
>  
> - addr = DCACHE_ALIGN(addr - offset) + offset;
> -
> - for (vma = find_vma(current->mm, addr); ; vma = vma->vm_next) {
> - /* At this point:  (!vma || addr < vma->vm_end). */
> - if (TASK_SIZE - len < addr)
> - return -ENOMEM;
> - if (!vma || addr + len <= vma->vm_start)
> - return addr;
> - addr = DCACHE_ALIGN(vma->vm_end - offset) + offset;
> - if (addr < vma->vm_end) /* handle wraparound */
> - return -ENOMEM;
> - }
> + info.flags = 0;
> + info.length = len;
> + info.low_limit = PAGE_ALIGN(addr);
> + info.high_limit = TASK_SIZE;
> + info.align_mask = PAGE_MASK & (SHMLBA - 1);
> + info.align_offset = (get_offset(mapping) + pgoff) << PAGE_SHIFT;
> + return vm_unmapped_area(&info);
>  }
>  
>  unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,

Other than that, I think this will work, but I'd like to test it.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Linux-c6x-dev] [PATCH 3/9] c6x: Provide dma_mmap_coherent() and dma_get_sgtable()

2013-01-24 Thread James Bottomley


On Wed, 2013-01-23 at 18:44 +0100, Marek Szyprowski wrote:
> On 1/23/2013 11:29 AM, James Bottomley wrote:
> > On Wed, 2013-01-23 at 10:47 +0100, Marek Szyprowski wrote:
> > > On 1/22/2013 11:13 AM, James Bottomley wrote:
> > > > There might be a simple solution:  just replace void *cpu_addr with void
> > > > **cpu_addr in the API.  This is a bit nasty since compilers think that
> > > > void ** to void * conversion is quite legal, so it would be hard to pick
> > > > up misuse of this (uint8_t ** would be better).  That way VIPT could
> > > > remap the kernel pages to a coherent address.  This would probably have
> > > > to change in the dma_mmap_attr() and dma_ops structures.
> > > >
> > > > All consumers would have to expect cpu_addr might change, but that seems
> > > > doable.
> > >
> > > I still don't get how this can help having a coherent buffer between DMA
> > > (devices) and CPU (kernel and user space mappings). The main purpose of
> > > the dma_mmap_coherent() call is to provide a common API for mapping a
> > > coherent buffer between DMA (device) and userspace. It is not about
> > > creating a coherent mapping for sharing memory between userspace and
> > > kernel space.
> >
> > OK, so I assume you don't understand how VIPT architectures work?
> > On a VIPT architecture, the CPU cache is indexed by the virtual address
> > but tagged by the physical address.  This means that when an address
> > comes into the CPU, we can do simultaneous lookups in the cache and the
> > TLB (by the virtual address).  The cache doesn't have the full address
> > bits of the index, so it usually only looks up the lowest n bits.  The
> > value of n gives the congruency of the cache (sometimes called the
> > colour of the cache lines).  The cache usually produces a number of
> > possible lines depending on its associativity and the TLB lookup
> > produces the physical address.  We can now sweep through the cache lines
> > and if a physical address tag matches, return the answer from the cache
> > instead of having to consult main memory.  This gives a speed advantage
> > over PIPT (Physically Indexed Physically Tagged) caches because on PIPT
> > the cache lookup can only occur *after* the TLB lookup instead of
> > concurrently with.
> >
> > As an aside, in practise every PIPT CPU actually has a VIPT cache,
> > purely because of the speed angle.  The trick to making it all work is
> > to shrink n so that n <= PAGE_SIZE_BITS and increase the associativity.
> > This means you can get VIPT speed without producing the aliasing
> > effects.
> >
> > Coherence is achieved in VIPT CPUs when two mapping addresses for the
> > same physical page, say v1 and v2 are congruent i.e. (v1 & ((1< > == (v2 & ((1< > This is why we have an architecture override for
> > arch_get_unmapped_area(). we use this to ensure all shareable mappings
> > for all userspace process are at congruent addresses, so we don't get
> > any aliasing problems in shared VMAs.
> >
> > Aliasing happens when v1 & ((1< > v2 are virtual addresses of the same physical page.  Now that page has
> > *two* cache copies and you have to be very careful to resolve the
> > aliases.  This, unfortunately, is the default scenario for kernel
> > virtual and user virtual addresses because all kernel pages are offset
> > mapped which is why we have to be annoyingly careful about flushing
> > kernel pages before we hand control back to userspace.
> >
> > As you can see, we can only share a mapping between the kernel and user
> > space without all the aliasing problems if the address is congruent.
> >
> > dma_mmap_coherent() seeks to be coherent from the device to the kernel
> > to userspace.  On PARISC, we fix device coherency by loading a coherence
> > index into our IOMMU (it effectively tells the IOMMU the virtual alias
> > for the physical memory and allows the CPU cache to be managed as part
> > of the I/O transaction).  The only way to prevent aliasing between the
> > kernel and user space is to make the two addresses congruent.  We
> > already have a fixed user address (given to us by the vma), so our only
> > choice is to remap the page in the kernel to a congruent address and
> > bingo we have a buffer shared between device/kernel/user space with no
> > aliasing problems.
> >
> > The reason for making the cpu_addr mutable is to allow the
> > implementation to remap the page and make full congruency happen.  We
> > can&#x

Re: [git pull] signal.git

2013-02-24 Thread James Bottomley

On Sat, 2013-02-23 at 18:56 -0800, Linus Torvalds wrote:
> Ok, in the meantime I had merged the parisc and powerpc trees, which
> had their own fixes in this area: powerpc added the transactional
> memory support for power8 (which impacted signal save/restore), and
> parisc had some fixes to the routines you then removed in favor of
> generic ones.

I compiled and booted it on parisc.  Everything seems to be OK, at least
on our pa8800 systems.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] [SCSI] aacraid: silence two GCC warnings

2013-02-20 Thread James Bottomley

On Sat, 2013-02-09 at 21:09 +0100, Paul Bolle wrote:
> Compiling src.o for a 32 bit system triggers two GCC warnings:
> drivers/scsi/aacraid/src.c: In function ‘aac_src_deliver_message’:
> drivers/scsi/aacraid/src.c:410:3: warning: right shift count >= width of 
> type [enabled by default]
> drivers/scsi/aacraid/src.c:434:2: warning: right shift count >= width of 
> type [enabled by default]
> 
> Silence these warnings by casting the 'address' variable (of type
> dma_addr_t) to u64 on those two lines.
> 
> Signed-off-by: Paul Bolle 
> ---
> 0) Compile tested only.
> 
> 1) Changing '0L' to 'OUL' might be a bit of cargo cult programming. I
> doubt it's necessary. Members of that cult might also change
> '0x' to '0xUL', but I didn't.
> 
>  drivers/scsi/aacraid/src.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
> index 3b021ec..dfa8a37 100644
> --- a/drivers/scsi/aacraid/src.c
> +++ b/drivers/scsi/aacraid/src.c
> @@ -407,7 +407,7 @@ static int aac_src_deliver_message(struct fib *fib)
>   fib->hw_fib_va->header.StructType = FIB_MAGIC2;
>   fib->hw_fib_va->header.SenderFibAddress = (u32)address;
>   fib->hw_fib_va->header.u.TimeStamp = 0;
> - BUG_ON((u32)(address >> 32) != 0L);
> + BUG_ON(((u64)address >> 32) != 0UL);

Actually, this isn't the right way to do this.  The correct way would be
address >> 16 >> 16, but no-one is expected to remember this, so we have
it macroised in kernel.h as upper_32_bits().

So this should become

BUG_ON(upper_32_bits(address) != 0L);

and a similar change for the below.

James

>   address |= fibsize;
>   } else {
>   /* Calculate the amount to the fibsize bits */
> @@ -431,7 +431,7 @@ static int aac_src_deliver_message(struct fib *fib)
>   address |= fibsize;
>   }
>  
> - src_writel(dev, MUnit.IQ_H, (address >> 32) & 0x);
> + src_writel(dev, MUnit.IQ_H, ((u64)address >> 32) & 0x);
>   src_writel(dev, MUnit.IQ_L, address & 0x);
>  
>   return 0;


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] First round of SCSI updates for the 3.8+ merge window

2013-02-22 Thread James Bottomley

This is the first round, consisting mostly of drivers and patches
submitted 3 weeks ago.  Since I've been travelling quite a bit, there
will be a second round just before the merge window closes for all the
patches three weeks or newer (that includes the FCoE tree).

The patch set is mostly driver updates (bnx2fc, ipr, lpfc, qla4) and a
few bug fixes.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-misc

The short changelog is

Bhanu Prakash Gollapudi (6):
  bnx2fc: Bumped version to 1.0.13
  bnx2fc: Support max IO size to 512KB
  bnx2fc: Tx/Rx byte counts reset to 0 when exceeding 32 bit values
  bnx2fc: Map the doorbell register between offload and enable requests
  bnx2fc: Move offload/upload wait logic into a function
  bnx2fc: support software fcoe target

Bjorn Helgaas (1):
  gdth: Remove buggy ROM handling

Brian King (2):
  ipr: Resource path error logging cleanup
  ipr: Handler ID memory allocation failure at module load time

Cyril Roelandt (1):
  bnx2fc: remove useless calls to memset().

Dan Carpenter (2):
  bfa: fix strncpy() limiter in bfad_start_ops()
  mpt3sas: cut and paste bug storing trigger mpi

Harish Zunjarrao (1):
  qla4xxx: Allow reset in link down case

Hiral Patel (2):
  fnic: fix for trusted cos
  fnic: updated MAINTAINERS list

James Smart (11):
  lpfc 8.3.37: Update lpfc version for 8.3.37 driver release
  lpfc 8.3.37: Fixed infinite loop in lpfc_sli4_fcf_rr_next_index_get.
  lpfc 8.3.37: Fixed crash due to SLI Port invalid resource count
  lpfc 8.3.37: Provide support for FCoE protocol dual-chute (ULP) operation
  lpfc 8.3.37: Fixed stale ndlp state when the node is marked for deferred 
removal.
  lpfc 8.3.37: Fix potential memory corruption bug
  lpfc 8.3.37: Fixed no-context ABTS failed with BA_RJT
  lpfc 8.3.37: Removed use of NOP mailboxes for interrupt verification
  lpfc 8.3.37: Fixed exhausted retry for plogi to nameserver.
  lpfc 8.3.37: Fixed ELS_REC received on the unsolicited receive queue
  lpfc 8.3.37: Provide support for change_queue_type

Jesper Juhl (1):
  csiostor: Don't leak mem or fail to release firmware in 
csio_hw_flash_config()

Julia Lawall (2):
  bnx2fc: adjust duplicate test
  bnx2fc: Remove potential NULL dereference

Karen Higgins (2):
  qla4xxx: Throttle active IOCBs to firmware limits
  qla4xxx: Remove unnecessary code from qla4xxx_init_local_data

Mahesh Rajashekhara (1):
  aacraid: 1024 max outstanding command support for Series 7 and above

Manish Rangankar (4):
  qla4xxx: Fix return code for qla4xxx_session_get_param.
  qla4xxx: wait for boot target login response during probe.
  scsi_transport_iscsi: export iscsi class session's target_id in sysfs.
  qla4xxx: Fix memory corruption issue in qla4xxx_get_ep_fwdb.

Nilesh Javali (4):
  qla4xxx: Quiesce driver activities while loopback
  qla4xxx: Rename MBOX_ASTS_IDC_NOTIFY to MBOX_ASTS_IDC_REQUEST_NOTIFICATION
  qla4xxx: Add spurious interrupt messages under debug level 2
  qla4xxx: Correct the validation to check in get_sys_info mailbox

Poornima Vonti (1):
  qla4xxx: Re-register IRQ handler while retrying initialize of adapter

Sachin Kamat (1):
  mpt3sas: Remove unneeded version.h header inclusion

Sreekanth Reddy (1):
  mpt2sas: fix for driver fails EEH, recovery from injected pci bus error

Thadeu Lima de Souza Cascardo (1):
  cxgb4i: Remove the scsi host device when removing device

Tomas Henzl (1):
  bnx2i: fix the bit manipulation when setting the error mask

Vijaya Mohan Guvva (1):
  bfa: Update MAINTAINERS list for BFA driver

Vikas Chaudhary (6):
  qla4xxx: Update driver version to 5.03.00-k4
  qla4xxx: Added support for force firmware dump
  qla4xxx: Update driver version to 5.03.00-k3
  qla4xxx: Pass correct function param to qla4_8xxx_rd_direct
  qla4xxx: Update driver version to 5.03.00-k2
  qla4xxx: Fix MBOX intr switching from polling to intr mode for ISP83XX

Wei Yongjun (2):
  mpt3sas: remove unused variables
  csiostor: convert to use simple_open()

wenxi...@linux.vnet.ibm.com (6):
  ipr: Fix sparse error in ipr driver
  ipr: Driver version 2.6.0
  ipr: Implement block iopoll
  ipr: Reduce lock contention
  ipr: Add support for MSI-X and distributed completion
  ipr: Add sereral new CCIN definitions for new adapters support

And the diffstat:

 MAINTAINERS |7 +-
 drivers/scsi/aacraid/aacraid.h  |8 +-
 drivers/scsi/aacraid/comminit.c |   11 +-
 drivers/scsi/bfa/bfad.c |2 +-
 drivers/scsi/bnx2fc/bnx2fc.h|   27 +-
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c   |   21 +-
 drivers/scsi/bnx2fc/bnx2fc_hwi.c|   29 +-
 drivers/scsi/bnx2fc/bnx2fc_io.c |8 +-
 drivers/scsi/bnx2f

Re: [PATCH 0/9] overlay filesystem: request for inclusion (v17)

2013-03-15 Thread James Bottomley

On Fri, 2013-03-15 at 05:13 +, Al Viro wrote:
> On Fri, Mar 15, 2013 at 02:09:14PM +0900, J. R. Okajima wrote:
> 
> > If so, it has a big disadvantage for the layer-fs (or branch-fs) to have
> > to implement a new method for whiteout.
> > 
> > Overlayfs implements whiteout as symlink+xattr which consumes an
> > inode. And you don't like it, right?
> > What I showed is another generic approach without xattr where the new
> > method to whiteout is unnecessary.
> 
> I'm yet to see the reason that would make implementing that method a big
> disadvantage, TBH...

It's the fact that a directory entry based whiteout limits the amount of
change to the VFS, but has to be supported by underlying filesystems.
The generic_dirent_fallthrough() mechanism is a nice way of hiding it,
but there are still quite a few fs specific mods in the union mount tree
because of this.  Having to modify filesystems to me indicates the
mechanism is a bit fragile.  If we could do whiteouts purely in the VFS,
so it would work for any filesystem (without needing filesystem
modifications) that would seem to be a more robust approach.  I'm not
saying we can definitely do this in an elegant way ... I'm just saying
that if someone comes up with it, it's obviously preferable.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Results of the Linux Foundation TAB election

2012-08-30 Thread James Bottomley

As a result of the election at Bali Hai, The five new members of the TAB
elected to two year terms are:

Chris Mason 
James Bottomley 
John W. Linville 
Ric Wheeler 
Jesse Barnes 

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] parisc fixes for 3.4-rc1

2012-08-31 Thread James Bottomley

This is a set of two bug fixes.  One is the ATOMIC problem which is now
causing a compile failure in certain situations.  The other is
mishandling of PER_LINUX32 which may also cause user visible effects.

The patches are here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6.git parisc-fixes

The short changelog is:

Jiri Kosina (1):
  fix personality flag check in copy_thread()

Mel Gorman (1):
  Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts

And the diffstat:

 arch/parisc/include/asm/atomic.h | 4 ++--
 arch/parisc/kernel/process.c | 2 +-
 arch/parisc/kernel/sys_parisc.c  | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

Full diff below.

James

---

diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 6c6defc..af9cf30 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -141,7 +141,7 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int 
a, int u)
 
 #define atomic_sub_and_test(i,v)   (atomic_sub_return((i),(v)) == 0)
 
-#define ATOMIC_INIT(i) ((atomic_t) { (i) })
+#define ATOMIC_INIT(i) { (i) }
 
 #define smp_mb__before_atomic_dec()smp_mb()
 #define smp_mb__after_atomic_dec() smp_mb()
@@ -150,7 +150,7 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int 
a, int u)
 
 #ifdef CONFIG_64BIT
 
-#define ATOMIC64_INIT(i) ((atomic64_t) { (i) })
+#define ATOMIC64_INIT(i) { (i) }
 
 static __inline__ s64
 __atomic64_add_return(s64 i, atomic64_t *v)
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index d4b94b3..2c05a92 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -309,7 +309,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp,
cregs->ksp = (unsigned long)stack
+ (pregs->gr[21] & (THREAD_SIZE - 1));
cregs->gr[30] = usp;
-   if (p->personality == PER_HPUX) {
+   if (personality(p->personality) == PER_HPUX) {
 #ifdef CONFIG_HPUX
cregs->kpc = (unsigned long) &hpux_child_return;
 #else
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index c9b9322..7426e40 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -225,12 +225,12 @@ long parisc_personality(unsigned long personality)
long err;
 
if (personality(current->personality) == PER_LINUX32
-   && personality == PER_LINUX)
-   personality = PER_LINUX32;
+   && personality(personality) == PER_LINUX)
+   personality = (personality & ~PER_MASK) | PER_LINUX32;
 
err = sys_personality(personality);
-   if (err == PER_LINUX32)
-   err = PER_LINUX;
+   if (personality(err) == PER_LINUX32)
+   err = (err & ~PER_MASK) | PER_LINUX;
 
return err;
 }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] parisc fixes for 3.6-rc3

2012-08-31 Thread James Bottomley

This is a set of two bug fixes.  One is the ATOMIC problem which is now
causing a compile failure in certain situations.  The other is
mishandling of PER_LINUX32 which may also cause user visible effects.

The patches are here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6.git parisc-fixes

The short changelog is:

Jiri Kosina (1):
  fix personality flag check in copy_thread()

Mel Gorman (1):
  Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts

And the diffstat:

 arch/parisc/include/asm/atomic.h | 4 ++--
 arch/parisc/kernel/process.c | 2 +-
 arch/parisc/kernel/sys_parisc.c  | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

Full diff below.

James

---

diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 6c6defc..af9cf30 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -141,7 +141,7 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int 
a, int u)
 
 #define atomic_sub_and_test(i,v)   (atomic_sub_return((i),(v)) == 0)
 
-#define ATOMIC_INIT(i) ((atomic_t) { (i) })
+#define ATOMIC_INIT(i) { (i) }
 
 #define smp_mb__before_atomic_dec()smp_mb()
 #define smp_mb__after_atomic_dec() smp_mb()
@@ -150,7 +150,7 @@ static __inline__ int __atomic_add_unless(atomic_t *v, int 
a, int u)
 
 #ifdef CONFIG_64BIT
 
-#define ATOMIC64_INIT(i) ((atomic64_t) { (i) })
+#define ATOMIC64_INIT(i) { (i) }
 
 static __inline__ s64
 __atomic64_add_return(s64 i, atomic64_t *v)
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index d4b94b3..2c05a92 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -309,7 +309,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp,
cregs->ksp = (unsigned long)stack
+ (pregs->gr[21] & (THREAD_SIZE - 1));
cregs->gr[30] = usp;
-   if (p->personality == PER_HPUX) {
+   if (personality(p->personality) == PER_HPUX) {
 #ifdef CONFIG_HPUX
cregs->kpc = (unsigned long) &hpux_child_return;
 #else
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index c9b9322..7426e40 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -225,12 +225,12 @@ long parisc_personality(unsigned long personality)
long err;
 
if (personality(current->personality) == PER_LINUX32
-   && personality == PER_LINUX)
-   personality = PER_LINUX32;
+   && personality(personality) == PER_LINUX)
+   personality = (personality & ~PER_MASK) | PER_LINUX32;
 
err = sys_personality(personality);
-   if (err == PER_LINUX32)
-   err = PER_LINUX;
+   if (personality(err) == PER_LINUX32)
+   err = (err & ~PER_MASK) | PER_LINUX;
 
return err;
 }


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] sg_io: allow UNMAP and WRITE SAME without CAP_SYS_RAWIO

2012-09-12 Thread James Bottomley

On Tue, 2012-09-11 at 19:56 +0200, Paolo Bonzini wrote:
> The set of use cases is so variable that no single filter can accomodate
> all of them: high availability people want persistent reservations, NAS
> people want trim/discard, but these are just two groups.  Someone is
> using a Windows VM to run vendor tools and wants to have access to
> vendor-specific commands.
> 
> You can tell this last group to use root, but not everyone else who is
> already relying on Unix permissions, SELinux and/or device cgroups to
> confine their virtual machines.

This is why the whole filter thing was mutable via sysfs.  That way the
admin could set this up per device.  It sounds like this is what you
want to fix, rather than opening up more holes in an already leaky
security apparatus.  The ideal is that we would be much more restrictive
by default and give root the ability to override this both globally and
per-device to conform to whatever policy it has for the virtual
environments.

The patch which removed all of the sysfs pieces was:

commit 018e0446890661504783f92388ecce7138c1566d
Author: Jens Axboe 
Date:   Fri Jun 26 16:27:10 2009 +0200

block: get rid of queue-private command filter

So that's probably the place to start for putting it back properly.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [2.6 patch] scsi/qlogicpti.c section fixes

2008-01-30 Thread James Bottomley


On Wed, 2008-01-30 at 22:03 +0200, Adrian Bunk wrote:
> This patch fixes the following section mismatches:
> 
> <--  snip  -->
> 
> ...
> WARNING: drivers/scsi/qlogicpti.o(.devexit.text+0x8): Section mismatch in 
> reference from the function qpti_sbus_remove() to the function 
> .init.text:qpti_chain_del()
> WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x56c): Section mismatch in 
> reference from the function qpti_sbus_probe() to the function 
> .init.text:qpti_map_regs()
> WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x580): Section mismatch in 
> reference from the function qpti_sbus_probe() to the function 
> .init.text:qpti_register_irq()
> WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x594): Section mismatch in 
> reference from the function qpti_sbus_probe() to the function 
> .init.text:qpti_get_scsi_id()
> WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x5b8): Section mismatch in 
> reference from the function qpti_sbus_probe() to the function 
> .init.text:qpti_map_queues()
> WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x780): Section mismatch in 
> reference from the function qpti_sbus_probe() to the function 
> .init.text:qpti_chain_add()
> ...

OK, look, this is really getting out of hand.

__init is possibly justifiable with a few hundred k savings on boot.
__devinit and the rest are surely killable on the grounds they provide
little benefit for all the pain they cause.

all __exit seems to do is set us up for unreferenced pointers in
discarded sections, so could we kill that too?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [2.6 patch] scsi/qlogicpti.c section fixes

2008-01-30 Thread James Bottomley


On Wed, 2008-01-30 at 22:20 +0100, Sam Ravnborg wrote:
> On Wed, Jan 30, 2008 at 03:00:16PM -0600, James Bottomley wrote:
> > 
> > On Wed, 2008-01-30 at 22:03 +0200, Adrian Bunk wrote:
> > > This patch fixes the following section mismatches:
> > > 
> > > <--  snip  -->
> > > 
> > > ...
> > > WARNING: drivers/scsi/qlogicpti.o(.devexit.text+0x8): Section mismatch in 
> > > reference from the function qpti_sbus_remove() to the function 
> > > .init.text:qpti_chain_del()
> > > WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x56c): Section mismatch 
> > > in reference from the function qpti_sbus_probe() to the function 
> > > .init.text:qpti_map_regs()
> > > WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x580): Section mismatch 
> > > in reference from the function qpti_sbus_probe() to the function 
> > > .init.text:qpti_register_irq()
> > > WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x594): Section mismatch 
> > > in reference from the function qpti_sbus_probe() to the function 
> > > .init.text:qpti_get_scsi_id()
> > > WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x5b8): Section mismatch 
> > > in reference from the function qpti_sbus_probe() to the function 
> > > .init.text:qpti_map_queues()
> > > WARNING: drivers/scsi/qlogicpti.o(.devinit.text+0x780): Section mismatch 
> > > in reference from the function qpti_sbus_probe() to the function 
> > > .init.text:qpti_chain_add()
> > > ...
> > 
> > OK, look, this is really getting out of hand.
> We now at last have some infrastructure that does proper consistency
> check of the use of annotation.
> So it is by no means getting out-of-hands..
> 
> To my understanding Adrian just fixed a potential oops.
> You have a driver that surely are thought to be hotplugable -
> otherwise qpti_sbus_probe() would not have been annotated __devinit.
> And had it ever been hotplugged then you had called a function in
> a __init section that was long gone.

I never said it didn't fix a bug.  I'm just reflecting that all of the
problems would go away and we'd save thousands of person hours on the
infrastructure and bug fixing if we simply #defined most of the
sectional annotations to be nops.

My basic question is "is the value of what all of this buys us
(basically a few hundred k in savings)" worth all the investment we're
putting into it.  Particularly as it looks like that investment is
becoming increasingly expensive.

> > __init is possibly justifiable with a few hundred k savings on boot.
> > __devinit and the rest are surely killable on the grounds they provide
> > little benefit for all the pain they cause.
> For the embedded people a few kb here and there is worth it.
> 
> > all __exit seems to do is set us up for unreferenced pointers in
> > discarded sections, so could we kill that too?
> Again - savings when we build-in the drivers.
> And without the checks we see 'funny' linker errors on the architectues
> that can continue to add the .exit.text in /DISCARD/

Perhaps you have different figures, but my standard kernel linking ones
tell me that the discard sections only save tens of k (not hundreds that
the init ones save), so I really do think they have no real benefit and
land us huge problems of pointer references into discarded sections.

I don't deny we can invest large amounts of work to fix our current
issues and build large scriptable checks to ensure we keep it fixed ...
I'm just asking if, at the end of the day, it's really worth it.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PATCH] intermediate SCSI update for 2.6.24

2008-01-30 Thread James Bottomley

This isn't the final update I promised, most of this is fixing bugs in
the previous update which became more evident with wider testing (a
nasty set of cases where we have &cmnd->sense_buffer instead of
&cmnd->sense_buffer[0] which cause problems now that sense_buffer is a
pointer instead and a bug in the setup of immediate request sense
commands that lead to some USB devices becoming unusable).  However, the
two feature updates it has are to convert SCSI to allow it to use
bidirectional commands and to make sg chaining the default (drivers now
turn it off simply by setting their sg_tablesize to something which
won't excite chaining).

The patch is available here:

master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git

The short changelog is:

Boaz Harrosh (3):
  bidirectional command support
  implement scsi_data_buffer
  tgt: use scsi_init_io instead of scsi_alloc_sgtable

David Milburn (1):
  aic7xxx: fix ahc_done check SCB_ACTIVE for tagged transactions

FUJITA Tomonori (11):
  handle scsi_init_queue failure properly
  destroy scsi_bidi_sdb_cache in scsi_exit_queue
  scsi_debug: add XDWRITEREAD_10 support
  scsi_debug: add bidi data transfer support
  scsi_debug: add get_data_transfer_info helper function
  aic7xxx: fix warnings with CONFIG_PM disabled
  aic79xx: fix warnings with CONFIG_PM disabled
  zfcp: fix sense_buffer access bug
  ncr53c8xx: fix sense_buffer access bug
  aic79xx: fix sense_buffer access bug
  hptiop: fix sense_buffer access bug

James Bottomley (3):
  Revert "[SCSI] aacraid: fib context lock for management ioctls"
  bsg: copy the cmd_type field to the subordinate request for bidi
  remove use_sg_chaining

Kiyoshi Ueda (1):
  bidirectional: fix up for the new blk_end_request code

Nathan Lynch (1):
  sym53c8xx: fix bad memset argument in sym_set_cam_result_error

Thomas Bogendoerfer (1):
  sgiwd93: use cached memory access to make driver work on IP28


And the diffstat:

 arch/ia64/hp/sim/simscsi.c |1 
 block/bsg.c|1 
 drivers/infiniband/ulp/srp/ib_srp.c|1 
 drivers/s390/scsi/zfcp_fsf.c   |4 
 drivers/scsi/3w-9xxx.c |1 
 drivers/scsi/3w-.c |1 
 drivers/scsi/BusLogic.c|1 
 drivers/scsi/Kconfig   |2 
 drivers/scsi/NCR53c406a.c  |1 
 drivers/scsi/a100u2w.c |1 
 drivers/scsi/aacraid/commctrl.c|   29 +--
 drivers/scsi/aacraid/linit.c   |1 
 drivers/scsi/aha1740.c |1 
 drivers/scsi/aic7xxx/aic79xx.h |5 
 drivers/scsi/aic7xxx/aic79xx_core.c|2 
 drivers/scsi/aic7xxx/aic79xx_osm.c |3 
 drivers/scsi/aic7xxx/aic79xx_osm_pci.c |   33 +--
 drivers/scsi/aic7xxx/aic79xx_pci.c |2 
 drivers/scsi/aic7xxx/aic7xxx.h |4 
 drivers/scsi/aic7xxx/aic7xxx_core.c|3 
 drivers/scsi/aic7xxx/aic7xxx_osm.c |   10 -
 drivers/scsi/aic7xxx/aic7xxx_osm_pci.c |   33 +--
 drivers/scsi/aic7xxx/aic7xxx_pci.c |2 
 drivers/scsi/aic7xxx_old.c |1 
 drivers/scsi/arcmsr/arcmsr_hba.c   |1 
 drivers/scsi/dc395x.c  |1 
 drivers/scsi/dpt_i2o.c |1 
 drivers/scsi/eata.c|1 
 drivers/scsi/hosts.c   |1 
 drivers/scsi/hptiop.c  |3 
 drivers/scsi/ibmmca.c  |1 
 drivers/scsi/ibmvscsi/ibmvscsi.c   |1 
 drivers/scsi/initio.c  |1 
 drivers/scsi/iscsi_tcp.c   |1 
 drivers/scsi/libsrp.c  |4 
 drivers/scsi/lpfc/lpfc_scsi.c  |2 
 drivers/scsi/mac53c94.c|1 
 drivers/scsi/megaraid.c|1 
 drivers/scsi/megaraid/megaraid_mbox.c  |1 
 drivers/scsi/megaraid/megaraid_sas.c   |1 
 drivers/scsi/mesh.c|1 
 drivers/scsi/ncr53c8xx.c   |2 
 drivers/scsi/nsp32.c   |1 
 drivers/scsi/pcmcia/sym53c500_cs.c |1 
 drivers/scsi/qla1280.c |1 
 drivers/scsi/qla2xxx/qla_os.c  |2 
 drivers/scsi/qla4xxx/ql4_os.c  |1 
 drivers/scsi/qlogicfas.c   |1 
 drivers/scsi/scsi.c|2 
 drivers/scsi/scsi_debug.c  |  174 ++--
 drivers/scsi/scsi_error.c  |   33 +--
 drivers/scsi/scsi_lib.c|  274 +++--
 drivers/scsi/scsi_tgt_lib.c|   28 ---
 drivers/scsi/sd.c  |4 
 drivers/scsi/sgiwd93.c |   64 ---
 drivers/scsi/sr.c  |   25 +--
 drivers/scsi/stex.c|1 
 drivers/scsi/sym53c416.c   |1 
 drivers/scsi/sym53c8xx_2/sym_glue.c|3 
 drivers/scsi/u14-34f.

Re: Value of __*{init,exit} anotations?

2008-01-30 Thread James Bottomley

On Thu, 2008-01-31 at 00:32 +0200, Adrian Bunk wrote:
> On Wed, Jan 30, 2008 at 03:41:35PM -0600, James Bottomley wrote:
> > On Wed, 2008-01-30 at 22:20 +0100, Sam Ravnborg wrote:
> > > On Wed, Jan 30, 2008 at 03:00:16PM -0600, James Bottomley wrote:
> >...
> > > > __init is possibly justifiable with a few hundred k savings on boot.
> > > > __devinit and the rest are surely killable on the grounds they provide
> > > > little benefit for all the pain they cause.
> > > For the embedded people a few kb here and there is worth it.
> > > 
> > > > all __exit seems to do is set us up for unreferenced pointers in
> > > > discarded sections, so could we kill that too?
> > > Again - savings when we build-in the drivers.
> > > And without the checks we see 'funny' linker errors on the architectues
> > > that can continue to add the .exit.text in /DISCARD/
> > 
> > Perhaps you have different figures, but my standard kernel linking ones
> > tell me that the discard sections only save tens of k (not hundreds that
> > the init ones save), so I really do think they have no real benefit and
> > land us huge problems of pointer references into discarded sections.
> > 
> > I don't deny we can invest large amounts of work to fix our current
> > issues and build large scriptable checks to ensure we keep it fixed ...
> > I'm just asking if, at the end of the day, it's really worth it.
> 
> Some people consider it worth it for their memory restricted systems
> and would like to drive the annotations even further. [1]
> 
> My experience while fixing section bugs during the last years is that 
> the __dev{init,exit}* are actually the main question since they are both 
> the majority of annotations and the ones that bring benefits only 
> in a case that has become very exotic (CONFIG_HOTPLUG=n).
> 
> All the other annotations either both bring value for everyone
> (plain __init* and __exit*) or are nothing normal drivers would
> use (__cpu* and _mem*).
> 
> People at linux-arch (Cc'ed) might be better at explaining how often 
> CONFIG_HOTPLUG gets used in real-life systems and how big the savings 
> are there.
> 
> That might be a good basis for deciding whether it's worth it.

I'll certainly buy this.  Perhaps killing everything other than __init
and __exit (meaning discardable whether the system is hotplug, suspend
or whatever) might get rid of 90% of the problem while still preserving
90% of the benefits.  I think a lot of the issues do come from confusion
over whether it should be __init, __devinint etc .

We can argue later over the benefit of __exit ...

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] kill hotplug init/exit section annotations

2008-01-31 Thread James Bottomley

No-one seems to see much value in these, and they cause about 90% of our
problems with __init/__exit markers, so simply eliminate them.  Rather
than run over the whole tree removing them, this patch #defines them to
be nops.

Signed-off-by: James Bottomley <[EMAIL PROTECTED]>

---

I'll probably be going after __exit after this one, but it makes sense
to split them up, since the hotplug annotation removal looks
uncontroversial, whereas __exit and discard section removal might
produce more robust debate.  I also think doing the hotplug removal
gives us 90% of the benefits and removes 90% of the section mismatch
problems.

James


diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index f784d2f..5099021 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -9,46 +9,11 @@
 /* Align . to a 8 byte boundary equals to maximum function alignment. */
 #define ALIGN_FUNCTION()  . = ALIGN(8)
 
-/* The actual configuration determine if the init/exit sections
- * are handled as text/data or they can be discarded (which
- * often happens at runtime)
- */
-#ifdef CONFIG_HOTPLUG
-#define DEV_KEEP(sec)*(.dev##sec)
-#define DEV_DISCARD(sec)
-#else
-#define DEV_KEEP(sec)
-#define DEV_DISCARD(sec) *(.dev##sec)
-#endif
-
-#ifdef CONFIG_HOTPLUG_CPU
-#define CPU_KEEP(sec)*(.cpu##sec)
-#define CPU_DISCARD(sec)
-#else
-#define CPU_KEEP(sec)
-#define CPU_DISCARD(sec) *(.cpu##sec)
-#endif
-
-#if defined(CONFIG_MEMORY_HOTPLUG)
-#define MEM_KEEP(sec)*(.mem##sec)
-#define MEM_DISCARD(sec)
-#else
-#define MEM_KEEP(sec)
-#define MEM_DISCARD(sec) *(.mem##sec)
-#endif
-
-
 /* .data section */
 #define DATA_DATA  \
*(.data)\
*(.data.init.refok) \
*(.ref.data)\
-   DEV_KEEP(init.data) \
-   DEV_KEEP(exit.data) \
-   CPU_KEEP(init.data) \
-   CPU_KEEP(exit.data) \
-   MEM_KEEP(init.data) \
-   MEM_KEEP(exit.data) \
. = ALIGN(8);   \
VMLINUX_SYMBOL(__start___markers) = .;  \
*(__markers)\
@@ -171,12 +136,6 @@
/* __*init sections */  \
__init_rodata : AT(ADDR(__init_rodata) - LOAD_OFFSET) { \
*(.ref.rodata)  \
-   DEV_KEEP(init.rodata)   \
-   DEV_KEEP(exit.rodata)   \
-   CPU_KEEP(init.rodata)   \
-   CPU_KEEP(exit.rodata)   \
-   MEM_KEEP(init.rodata)   \
-   MEM_KEEP(exit.rodata)   \
}   \
\
/* Built-in module parameters. */   \
@@ -208,12 +167,6 @@
*(.ref.text)\
*(.text.init.refok) \
*(.exit.text.refok) \
-   DEV_KEEP(init.text) \
-   DEV_KEEP(exit.text) \
-   CPU_KEEP(init.text) \
-   CPU_KEEP(exit.text) \
-   MEM_KEEP(init.text) \
-   MEM_KEEP(exit.text)
 
 
 /* sched.text is aling to function alignment to secure we have same
@@ -241,33 +194,15 @@
 /* init and exit section handling */
 #define INIT_DATA  \
*(.init.data)   \
-   DEV_DISCARD(init.data)  \
-   DEV_DISCARD(init.rodata)\
-   CPU_DISCARD(init.data)  \
-   CPU_DISCARD(init.rodata)\
-   MEM_DISCARD(init.data)  \
-   MEM_DISCARD(init.rodata)
 
 #define INIT_TEXT  \
*

Re: Are Section mismatches out of control?

2008-02-01 Thread James Bottomley

On Fri, 2008-02-01 at 03:03 -0800, Andrew Morton wrote:
> On Fri, 1 Feb 2008 11:47:18 +0100 Sam Ravnborg <[EMAIL PROTECTED]> wrote:
> 
> > James said in a related posting that the Section mismatch
> > warnings were getting out of control.
> 
> eh.  They're easy - the build system tells you about them!
> 
> > The list is here:
> 
> Question is: why do people keep adding new ones when they are so easy to
> detect and fix?
> 
> Asnwer: because neither they nor their patch integrators are doing adequate
> compilation testing.

That's because certain configurations, where this shows up, like !
CONFIG_HOTPLUG or !CONFIG_HOTPLUG_CPU simply aren't anything other than
extremely weird configs for most modern HW and distributions.

I'm not saying we can't fix the problem with tools or that we can't
invest huge amounts of time patching and fixing this.  I am saying we
should be questioning the value of the investment in doing so.

I also claim that getting rid of the __dev.* sectional attributes (for
which I've already submitted a small patch) fixes 90% of the problems
(since they're mainly caused by driver writers not understanding what
all of these things mean).  The remaining ones are clearer and
additionally, show up on every build (whether HOTPLUG or not).

Before we start to pillory driver writers for not caring about configs
they rightly don't give a toss about, could we at least get someone
(anyone!) in the community that cares about extreme low memory
configurations to give us reasons why there's value to all this effort
(rather than spending it on say reducing cache footprints or eliminating
space in structures)?  All the people I've asked so far (mainly in arm,
admittedly) have said they have bigger things to worry about.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Are Section mismatches out of control?

2008-02-01 Thread James Bottomley

On Fri, 2008-02-01 at 11:47 +0100, Sam Ravnborg wrote:
> James said in a related posting that the Section mismatch
> warnings were getting out of control.

What I actually said was that the churn in the source base caused by
these sectional mismatches was getting out of hand.

What I questioned was the value of inspecting all the drivers and trying
to fix them all (and the effort involved) vs my proposed solution of
simply making 90% of them go away.

> So I decided to take a closer look at current
> status. Latest mainline with Adrian + mine fixes applied.
> 
> Target was x86 - an allyesconfig build.
> I looked at the reported Section mismatch warnings per
> directory so see where it was looking bad.
> 
> The list is here:
> 
> 
> Directory Warnings
> = 
> block/ 0
> fs/0
> init/  0
> lib/   0
> net/   2
> sound/ 3
> crypto/0
> ipc/   0
> kernel/0
> mm/0
> usr/   0
> security/  0
> drivers/net/  11
> drivers/ata/   2
> drivers/base/  0
> drivers/block/ 0
> drivers/cdrom/ 0
> drivers/crypto/0
> drivers/hid/   0
> drivers/input/ 0
> drivers/lguest/0
> drivers/leds/  0
> drivers/media/ 0
> drivers/pci/   0
> drivers/pcmcia/9
> drivers/power/ 0
> drivers/ps3/   0
> drivers/rtc/   1
> drivers/scsi/  3
> drivers/isdn/ 34
> drivers/serial/   10
> drivers/spi/   0
> drivers/usb/   0
> drivers/video/14
> drivers/telephony/ 0
> drivers/watchdog/  0
> drivers/w1/0
> drivers/dca0
> drivers/edac/  0
> drivers/acpi/  2
> drivers/char/  3
> drivers/cpufreq3
> drivers/hwmon/ 1
> drivers/infiniband/ 0
> drivers/md 0
> drivers/message/   0
> drivers/misc/  0
> drivers/mmc/   0
> drivers/mtd/   0
> drivers/parport/   0
> drivers/pnp/   0
> 
> As expected the majority is in drivers/
> And as is obvious from the above the warnings are
> concentrated on a few places:
> drivers/isdn/ has the top score.
> Then we have video/, net/ serial/ and pcmia.
> 
> With thos four directories clean we are down with
> 78 less warnings.
> 
> The total figure for this build is 106 warnings.
> In my book things are not out of control.
> So stop complaining and lets see some fixes.
> 
> I will look at drivers/isdn as next step.

But on your own estimate, that's around 50 patches to fix all of this,
isn't it?  Which all have to be inspected, tested and integrated.  Is
that really a worthwhile exercise for saving 5k of memory (also from
your own estimate).  Particularly as the people who care about extreme
memory configurations aren't going "goody sections saved us 5k", they're
going "&^*&^ me the kernel increased in size by 150k on my system from
2.6.15 to 2.6.24". 

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Are Section mismatches out of control?

2008-02-01 Thread James Bottomley


On Fri, 2008-02-01 at 22:47 +0100, Jan Engelhardt wrote:
> On Feb 1 2008 03:21, Harvey Harrison wrote:
> >> 
> >> Question is: why do people keep adding new ones when they are so easy to
> >> detect and fix?
> >> 
> >> Asnwer: because neither they nor their patch integrators are doing adequate
> >> compilation testing.
> >
> >[...]
> >Unless they break the build, or if there currently are 0 and they make
> >it non-zero, people seem not to caresad.  Probably the same for
> >sparse/checkpatch, "there's plenty already, I can't be bothered to look"
> 
> checkpatch does not parse C, it uses heuristical regexes.
> 
> That makes it very different from sparse or the section mismatch
> finder which do not output false positives.

Even by the exalted standards of LKML which sometimes seems to make a
virtue of misinformation, four wrong statements in twenty seven words is
pretty impressive ... I salute you!

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Add iSCSI iBFT support (v0.4.6)

2008-02-01 Thread James Bottomley

On Wed, 2008-01-30 at 17:37 -0400, Konrad Rzeszutek wrote:
> This patch (v0.4.6) adds /sysfs/firmware/ibft/[initiator|targetX|ethernetX]
> directories along with text properties which export the the iSCSI
> Boot Firmware Table (iBFT) structure.
> 
> What is iSCSI Boot Firmware Table? It is a mechanism for the iSCSI
> tools to extract from the machine NICs the iSCSI connection information
> so that they can automagically mount the iSCSI share/target. Currently
> the iSCSI information is hard-coded in the initrd. The /sysfs entries
> are read-only one-name-and-value fields.
> 
> The usual set of data exposed is:
> 
> # for a in `find /sys/firmware/ibft/ -type f -print`; do  echo -n "$a: ";  
> cat $a; done
> /sys/firmware/ibft/target0/target-name: iqn.2007.com.intel-sbx44:storage-10gb
> /sys/firmware/ibft/target0/nic-assoc: 0
> /sys/firmware/ibft/target0/chap-type: 0
> /sys/firmware/ibft/target0/lun: 
> /sys/firmware/ibft/target0/port: 3260
> /sys/firmware/ibft/target0/ip-addr: 192.168.79.116
> /sys/firmware/ibft/target0/flags: 3
> /sys/firmware/ibft/target0/index: 0
> /sys/firmware/ibft/ethernet0/mac: 00:11:25:9d:8b:01
> /sys/firmware/ibft/ethernet0/vlan: 0
> /sys/firmware/ibft/ethernet0/gateway: 192.168.79.254
> /sys/firmware/ibft/ethernet0/origin: 0
> /sys/firmware/ibft/ethernet0/subnet-mask: 255.255.252.0
> /sys/firmware/ibft/ethernet0/ip-addr: 192.168.77.41
> /sys/firmware/ibft/ethernet0/flags: 7
> /sys/firmware/ibft/ethernet0/index: 0
> /sys/firmware/ibft/initiator/initiator-name: iqn.2007-07.com:konrad.initiator
> /sys/firmware/ibft/initiator/flags: 3
> /sys/firmware/ibft/initiator/index: 0
> 
> 
> For full details of the IBFT structure please take a look at:
> ftp://ftp.software.ibm.com/systems/support/system_x_pdf/ibm_iscsi_boot_firmware_table_v1.02.pdf

Some pieces of the patch are obviously wrong:  find_ibft() shouldn't be
in ibft_init ... if ibft_phys was zero, it means the bootmem reservation
wasn't done and you shouldn't be poking about in memory which has likely
now been overwritten.

Also, why is ibft_phys the global variable you pass?  You never actually
want to use the physical address, what you always end up using is the
kernel virtual address.

I'd simply use the ibft variable to point to the virtual address of the
ibft or null if not found, then you can throw out all the phys_to_virt()
calls.

Also, move the reserve_bootmem into the ibft_find routines and ensure
they're only called once on boot.  Refuse to attach the ibft driver if
the virtual pointer ibft is null.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread James Bottomley


On Sat, 2008-02-02 at 10:51 -0500, Jeff Garzik wrote:
> Ingo Molnar wrote:
> > ===
> > --- linux.orig/drivers/scsi/lpfc/lpfc_init.c
> > +++ linux/drivers/scsi/lpfc/lpfc_init.c
> > @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
> > uint16_t iotag;
> > int bars = pci_select_bars(pdev, IORESOURCE_MEM);
> >  
> > -   if (pci_enable_device_bars(pdev, bars))
> > +   if (pci_enable_device_io(pdev))
> > goto out;
> 
> Look at the line right above it...  AFAICS you want 
> pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs.
> 
> Also a CC to linux-scsi and the driver author would be nice, as they are 
> the ones with hardware and can verify.
> 
> This set of changes seemed like 50% guesswork to me, without consulting 
> the authors :(  And unlike many changes, you actually have to know the 
> hardware [or get clues from surrounding code] to make the change.

Jeff is right, this fix is wrong.

The correct fix is here:

http://marc.info/?l=linux-scsi&m=120196768605031

Ingo, could you please cc linux-scsi on your mails ... it would have
been a complete disaster to have this incorrect patch go in.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] pci: pci_enable_device_bars() fix

2008-02-02 Thread James Bottomley

On Sat, 2008-02-02 at 18:08 +0100, Ingo Molnar wrote:
> * Jeff Garzik <[EMAIL PROTECTED]> wrote:
> 
> > Ingo Molnar wrote:
> >> ===
> >> --- linux.orig/drivers/scsi/lpfc/lpfc_init.c
> >> +++ linux/drivers/scsi/lpfc/lpfc_init.c
> >> @@ -1894,7 +1894,7 @@ lpfc_pci_probe_one(struct pci_dev *pdev,
> >>uint16_t iotag;
> >>int bars = pci_select_bars(pdev, IORESOURCE_MEM);
> >>  - if (pci_enable_device_bars(pdev, bars))
> >> +  if (pci_enable_device_io(pdev))
> >>goto out;
> >
> > Look at the line right above it...  AFAICS you want 
> > pci_enable_device_mem(), if the mask is selecting IORESOURCE_MEM BARs.
> >
> > Also a CC to linux-scsi and the driver author would be nice, as they 
> > are the ones with hardware and can verify.
> 
> it would have been totally appropriate for me to just send a mail to 
> lkml with the proper subject line about the breakage. (I might even have 
> decided to stay completely silent about the issue and fix it for my own 
> build, letting you guys figure it out.)

We agreed to differ a while ago on your head in the sand, post it to
LKML because everybody follows that list attitude.  Some nice soul will
always (eventually) forward to the correct list, so this practice more
or less works (just not in as timely fashion as actually posting to the
relevant list).

> Instead i did a search of lkml (based on the function name in the build 
> error) and figured out where the pull request was on lkml: Greg. I 
> replied to that mail, he'll obviously know whom else to Cc from that 
> point on (if anyone). I really didnt want to (nor did i need to) figure 
> out whether this was some general driver level API change that happen 
> kernel-wide, or some SCSI specific change. I simply replied to the pull 
> request whose Cc: line seemed well-populated to me already. I also took 
> a look at the commit itself and did a quick hack in a hurry to keep the 
> tests rolling. It really did not occur to me that i should have added 
> anyone else to the Cc: line, as [EMAIL PROTECTED] was 
> Cc:-ed already so i assumed the interest was from that angle.
> 
> ( And as this was spent from my family's weekend time and i had no time
>   and no interest to dig any further than to figure out the "first hop"
>   of the change that broke the build, and the parties who initiated that
>   hop. I'm in fact surprised that your and James's answer to my
>   bugreport is hostility. )

What's hostile about telling you your patch is wrong and pointing you at
the correct one?

> So i find your suggestion that i should have added more people to the 
> Cc: line unfair on several levels.
> 
> > This set of changes seemed like 50% guesswork to me, without 
> > consulting the authors :( And unlike many changes, you actually have 
> > to know the hardware [or get clues from surrounding code] to make the 
> > change.
> 
> you mean the whole set of changes? Or just mine? Mine was a 30 seconds 
> guesswork of course, i dont have the hardware, i didnt do the change, 
> nor did i do anything near that code - i just saw the build failure stop 
> my testing and sent in this notification and a hack. That's why i sent 
> it to Greg, as a reply to the mail where he pushed these changes 
> upstream and who'll know what to do with it from that point on. My patch 
> can be totally thrown away (and should be, apparently).
> 
> but ... i guess next time i'll think twice before sending any bugreports 
> about or related to the SCSI code anywhere, unless they become really 
> annoying. Who needs this hassle?

Oh good grief, don't come the raw prawn with us!

You're a kernel maintainer, you are expected to know the rules and
follow them.  The very fact that a well known maintainer posts a patch
with a signed-off-by to Andrew and Linus means that it likely gets
applied.  Because of this, maintainers and other people with similarly
trusted positions are expected to go via the lists and subsystems in
part as general courtesy, but also to verify that their patch is
actually valid before it gets applied.

Are you seriously telling us that it required too much investigation on
your part to figure out that something with a compile failure in
drivers/scsi might belong on the scsi list?

Let me tell you exactly what would have happened if that patch had been
applied:  Because the wrong bars were enabled, the device would have
attached but been non-functional.  Likely Emulex would have picked this
up in their -rc1 testing and spent several days trying to trace the
cause in their labs.  Chances are, that, because this type of breakage
is extremely subtle, they wouldn't have noticed the fact that the enable
should have been _mem not _io.  Then they would have raised the issue to
linux-scsi.  It would probably take at least a person week of effort
before anyone finally noticed what the issue was.

If you can't see that your behaviour needs modifying, I suggest you
seriously conside

Re: [patch] scsi: fix spurious build failures in aic7xxx

2008-02-03 Thread James Bottomley

On Sun, 2008-02-03 at 02:41 -0500, Sam Ravnborg wrote:
> Thanks for testing Ingo.
> 
> James - I plan to submit this via kbuild.git in about 10
> hours.
> Let me know if you would like it to go in via scsi and I
> will drop you an updated patch (need to elaboarte a bit more
> in
> the commit message).

Sure, the changes look fine .. and will not conflict with anything
pending I currently have.  You can add my acked-by if you want.

Please also note the @steeleye.com email address doesn't come to me any
more, so this email had to be forwarded.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] scsi: fix spurious build failures in aic7xxx

2008-02-03 Thread James Bottomley

On Sun, 2008-02-03 at 21:05 +0100, Sam Ravnborg wrote:
> On Sun, Feb 03, 2008 at 01:23:42PM -0600, James Bottomley wrote:
> > On Sun, 2008-02-03 at 02:41 -0500, Sam Ravnborg wrote:
> > > Thanks for testing Ingo.
> > > 
> > > James - I plan to submit this via kbuild.git in about 10
> > > hours.
> > > Let me know if you would like it to go in via scsi and I
> > > will drop you an updated patch (need to elaboarte a bit more
> > > in
> > > the commit message).
> > 
> > Sure, the changes look fine .. and will not conflict with anything
> > pending I currently have.  You can add my acked-by if you want.
> Thanks - will do so.

Great

> > 
> > Please also note the @steeleye.com email address doesn't come to me any
> > more, so this email had to be forwarded.
> OK.
> Can I ask you to fix these:
> Documentation/DocBook/scsi.tmpl:[EMAIL PROTECTED]
> Documentation/scsi/scsi_mid_low_api.txt:    James Bottomley  Bottomley at steeleye dot com>

Sure, I'll patch those

> drivers/base/attribute_container.c: * Copyright (c) 2005 - James Bottomley 
> <[EMAIL PROTECTED]>
> drivers/base/transport_class.c: * Copyright (c) 2005 - James Bottomley 
> <[EMAIL PROTECTED]>
> drivers/scsi/raid_class.c: * Copyright (c) 2005 - James Bottomley <[EMAIL 
> PROTECTED]>
> include/linux/attribute_container.h: * Copyright (c) 2005 - James Bottomley 
> <[EMAIL PROTECTED]>
> include/linux/raid_class.h: * Copyright (c) 2005 - James Bottomley <[EMAIL 
> PROTECTED]>
> include/linux/transport_class.h: * Copyright (c) 2005 - James Bottomley 
> <[EMAIL PROTECTED]>

This is less obvious:  The copyrights are historical rather than current
and also, I was at steeleye when I did that work, so it does deserve to
be reflected in the file history.  I know it adds confusion from a whom
do I contact for this file point of view, but it is legally the right
thing to do.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] scsi: fix spurious build failures in aic7xxx

2008-02-03 Thread James Bottomley

On Sun, 2008-02-03 at 21:38 +0100, Sam Ravnborg wrote:
> > 
> > > drivers/base/attribute_container.c: * Copyright (c) 2005 - James 
> > > Bottomley <[EMAIL PROTECTED]>
> > > drivers/base/transport_class.c: * Copyright (c) 2005 - James Bottomley 
> > > <[EMAIL PROTECTED]>
> > > drivers/scsi/raid_class.c: * Copyright (c) 2005 - James Bottomley <[EMAIL 
> > > PROTECTED]>
> > > include/linux/attribute_container.h: * Copyright (c) 2005 - James 
> > > Bottomley <[EMAIL PROTECTED]>
> > > include/linux/raid_class.h: * Copyright (c) 2005 - James Bottomley 
> > > <[EMAIL PROTECTED]>
> > > include/linux/transport_class.h: * Copyright (c) 2005 - James Bottomley 
> > > <[EMAIL PROTECTED]>
> > 
> > This is less obvious:  The copyrights are historical rather than current
> > and also, I was at steeleye when I did that work, so it does deserve to
> > be reflected in the file history.  I know it adds confusion from a whom
> > do I contact for this file point of view, but it is legally the right
> > thing to do.
> When I read the above is that the Copyright are assigned to the
> person "James Bottomley" with an email address.

That's the correct reading.

> If Steeleye deserve copyrights then _I_ would have
> explicitly stated the company name too.

No, I own the copyrights, but SteelEye was partially funding my open
source work at the time.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] enclosure: add support for enclosure services

2008-02-03 Thread James Bottomley

The enclosure misc device is really just a library providing sysfs
support for physical enclosure devices and their components.

Signed-off-by: James Bottomley <[EMAIL PROTECTED]>
---

See the additional ses patch for SCSI enclosure services users of this.

---
 drivers/misc/Kconfig  |   10 +
 drivers/misc/Makefile |1 +
 drivers/misc/enclosure.c  |  449 +
 include/linux/enclosure.h |  120 
 4 files changed, 580 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/enclosure.c
 create mode 100644 include/linux/enclosure.h

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index b5e67c0..c6e5c09 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -11,6 +11,7 @@ menuconfig MISC_DEVICES
 
  If you say N, all options in this submenu will be skipped and 
disabled.
 
+
 if MISC_DEVICES
 
 config IBM_ASM
@@ -232,4 +233,13 @@ config ATMEL_SSC
 
  If unsure, say N.
 
+config ENCLOSURE_SERVICES
+   tristate "Enclosure Services"
+   default n
+   help
+ Provides support for intelligent enclosures (bays which
+ contain storage devices).  You also need either a host
+ driver (SCSI/ATA) which supports enclosures
+ or a SCSI enclosure device (SES) to use these services.
+
 endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 87f2685..de9f1f5 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_SONY_LAPTOP) += sony-laptop.o
 obj-$(CONFIG_THINKPAD_ACPI)+= thinkpad_acpi.o
 obj-$(CONFIG_FUJITSU_LAPTOP)   += fujitsu-laptop.o
 obj-$(CONFIG_EEPROM_93CX6) += eeprom_93cx6.o
+obj-$(CONFIG_ENCLOSURE_SERVICES) += enclosure.o
\ No newline at end of file
diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
new file mode 100644
index 000..e4683cd
--- /dev/null
+++ b/drivers/misc/enclosure.c
@@ -0,0 +1,449 @@
+/*
+ * Enclosure Services
+ *
+ * Copyright (C) 2008 James Bottomley <[EMAIL PROTECTED]>
+ *
+**-
+**
+**  This program is free software; you can redistribute it and/or
+**  modify it under the terms of the GNU General Public License
+**  version 2 as published by the Free Software Foundation.
+**
+**  This program is distributed in the hope that it will be useful,
+**  but WITHOUT ANY WARRANTY; without even the implied warranty of
+**  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+**  GNU General Public License for more details.
+**
+**  You should have received a copy of the GNU General Public License
+**  along with this program; if not, write to the Free Software
+**  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+**
+**-
+*/
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static LIST_HEAD(container_list);
+static DEFINE_MUTEX(container_list_lock);
+static struct class enclosure_class;
+static struct class enclosure_component_class;
+
+/**
+ * enclosure_find - find an enclosure given a device
+ * @dev:   the device to find for
+ *
+ * Looks through the list of registered enclosures to see
+ * if it can find a match for a device.  Returns NULL if no
+ * enclosure is found.
+ */
+struct enclosure_device *enclosure_find(struct device *dev)
+{
+   struct enclosure_device *edev = NULL;
+
+   mutex_lock(&container_list_lock);
+   list_for_each_entry(edev, &container_list, node) {
+   if (edev->cdev.dev == dev) {
+   mutex_unlock(&container_list_lock);
+   return edev;
+   }
+   }
+   mutex_unlock(&container_list_lock);
+
+   return NULL;
+}
+EXPORT_SYMBOL(enclosure_find);
+
+/**
+ * enclosure_for_each_device - calls a function for each enclosure
+ * @fn:the function to call
+ * @data:  the data to pass to each call
+ *
+ * Loops over all the enclosures calling the function.
+ *
+ * Note, this function uses a mutex which will be held across calls to
+ * @fn, so it must have user context, and @fn should not sleep or
+ * otherwise cause the mutex to be held for indefinite periods
+ */
+int enclosure_for_each_device(int (*fn)(struct enclosure_device *, void *),
+ void *data)
+{
+   int error = 0;
+   struct enclosure_device *edev;
+
+   mutex_lock(&container_list_lock);
+   list_for_each_entry(edev, &container_list, node) {
+   error = fn(edev, data);
+   if (error)
+   break;
+   }
+   mutex_unlock(&container_list_lock);
+
+   return error;
+}
+EXPORT_SYMBOL_GPL(enclosure_for_each_device);
+
+/**
+ * enclosure_register - register device as an enclosure
+ *
+ * @dev:   device containing the enclosure
+ * @com

[PATCH] ses: add new Enclosure ULD

2008-02-03 Thread James Bottomley

This adds support to SCSI for enclosure services devices. It also makes
use of the enclosure services added in an earlier patch to display the
enclosure topology in sysfs.

At the moment, the enclosures are SAS specific, but if anyone actually
has a non-SAS enclosure that follows the SES-2 standard, we can add that
as well.

On my Vitesse based system, the enclosures show up like this:

sparkweed:~# ls -l /sys/class/enclosure/0\:0\:1\:0/
total 0
-r--r--r-- 1 root root 4096 2008-02-03 15:44 components
lrwxrwxrwx 1 root root0 2008-02-03 15:44 device -> 
../../../devices/pci:01/:01:02.0/host0/port-0:0/expander-0:0/port-0:0:12/end_device-0:0:12/target0:0:1/0:0:1:0
drwxr-xr-x 2 root root0 2008-02-03 15:44 SLOT 000
drwxr-xr-x 2 root root0 2008-02-03 15:44 SLOT 001
drwxr-xr-x 2 root root0 2008-02-03 15:44 SLOT 002
drwxr-xr-x 2 root root0 2008-02-03 15:44 SLOT 003
drwxr-xr-x 2 root root0 2008-02-03 15:44 SLOT 004
drwxr-xr-x 2 root root0 2008-02-03 15:44 SLOT 005
lrwxrwxrwx 1 root root0 2008-02-03 15:44 subsystem -> ../../enclosure
--w--- 1 root root 4096 2008-02-03 15:44 uevent

And the individual occupied slots like this:

sparkweed:~# ls -l /sys/class/enclosure/0\:0\:1\:0/SLOT\ 001/
total 0
-rw-r--r-- 1 root root 4096 2008-02-03 15:45 active
lrwxrwxrwx 1 root root0 2008-02-03 15:45 device -> 
../../../../devices/pci:01/:01:02.0/host0/port-0:0/expander-0:0/port-0:0:11/end_device-0:0:11/target0:0:0/0:0:0:0
-rw-r--r-- 1 root root 4096 2008-02-03 15:45 fault
-rw-r--r-- 1 root root 4096 2008-02-03 15:45 locate
-rw-r--r-- 1 root root 4096 2008-02-03 15:45 status
lrwxrwxrwx 1 root root0 2008-02-03 15:45 subsystem -> 
../../../enclosure_component
-r--r--r-- 1 root root 4096 2008-02-03 15:45 type
--w--- 1 root root 4096 2008-02-03 15:45 uevent

You can flash the various blinky lights by echoing to the fault and locate 
files.

>From the device's point of view, you can see it has an enclosure like this:

sparkweed:~# ls /sys/class/scsi_disk/0\:0\:0\:0/device/
block:sda genericqueue_depth  state
bsg:0:0:0:0   iocounterbits  queue_type   subsystem
bus   iodone_cnt rescan   timeout
deleteioerr_cnt  rev  type
device_blockediorequest_cnt  scsi_device:0:0:0:0  uevent
drivermodalias   scsi_disk:0:0:0:0vendor
enclosure_component:SLOT 001  model  scsi_generic:sg0
evt_media_change  power  scsi_level

Note the enclosure_component:SLOT 001 which shows where in the enclosure
this device fits.

The astute will notice that I'm using SCSI VPD Inquiries to identify the
devices.  This, unfortunately, won't work for SATA devices unless we do
some really nasty hacking about on the SAT because the only think that
knows the SAS addresses for SATA devices is libsas, not libata where the
SAT resides.

Signed-off-by: James Bottomley <[EMAIL PROTECTED]>
---
 drivers/scsi/Kconfig  |   10 +-
 drivers/scsi/Makefile |1 +
 drivers/scsi/ses.c|  684 +
 3 files changed, 694 insertions(+), 1 deletions(-)
 create mode 100644 drivers/scsi/ses.c

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 0b243e0..7bc7b1a 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -179,7 +179,15 @@ config CHR_DEV_SCH
  say M here and read  and
  . The module will be called ch.o.
  If unsure, say N.
-   
+
+config SCSI_ENCLOSURE
+   tristate "SCSI Enclosure Support"
+   depends on SCSI && ENCLOSURE_SERVICES
+   help
+ Enclosures are devices sitting on or in SCSI backplanes that
+ manage devices.  If you have a disk cage, the chances are that
+ it has an enclosure device.  Selecting this option will just allow
+ certain enclosure conditions to be reported and is not required.
 
 comment "Some SCSI devices (e.g. CD jukebox) support multiple LUNs"
depends on SCSI
diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index 93e1428..46370cb 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -138,6 +138,7 @@ obj-$(CONFIG_BLK_DEV_SD)+= sd_mod.o
 obj-$(CONFIG_BLK_DEV_SR)   += sr_mod.o
 obj-$(CONFIG_CHR_DEV_SG)   += sg.o
 obj-$(CONFIG_CHR_DEV_SCH)  += ch.o
+obj-$(CONFIG_SCSI_ENCLOSURE)   += ses.o
 
 # This goes last, so that "real" scsi devices probe earlier
 obj-$(CONFIG_SCSI_DEBUG)   += scsi_debug.o
diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
new file mode 100644
index 000..2656b2d
--- /dev/null
+++ b/drivers/scsi/ses.c
@@ -0,0 +1,684 @@
+/*
+ * SCSI Enclosure Services
+ *
+ * Copyright (C) 2008 James Bottomley <[EMAIL PROTECTED]>
+ *
+**-
+*

Re: [PATCH] enclosure: add support for enclosure services

2008-02-03 Thread James Bottomley

On Sun, 2008-02-03 at 23:03 +0100, Sam Ravnborg wrote:
> Hi James.
> 
> Nitpicking only.
> 
>   Sam

Thanks for the review.

> > +
> >  if MISC_DEVICES
> 
> Unrelated change.

Yes, removed it.

> > +config ENCLOSURE_SERVICES
> > +   tristate "Enclosure Services"
> > +   default n
> Not needed. n is default.

It is now ... plus there are a few other entries in this file with
default n.

> > +obj-$(CONFIG_ENCLOSURE_SERVICES) += enclosure.o
> > \ No newline at end of file
> Can we get one..

added

> > +EXPORT_SYMBOL(enclosure_find);
> 
> No GPL - but the other of your EXPORT's are GPL.

Done ... Greg gets annoyed about these.

> > +struct enclosure_device *
> > +enclosure_register(struct device *dev, const char *name, int components,
> > +  struct enclosure_component_callbacks *cb)
> style purists would request you to put return type and function
> name on the same line...

I tend to prefer the function name always at the beginning of the line.
But even style purists have to agree it's better than trying to futilely
squash all the arguments on separate lines because the return complex
type is quite long.

>   +   struct enclosure_device *edev =
> > +   kzalloc(sizeof(struct enclosure_device) +
> > +   sizeof(struct enclosure_component)*components,
> > +   GFP_KERNEL);
> > +   int err, i;
> > +
> > +   if (!edev)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   if (!cb)
> > +   return ERR_PTR(-EINVAL);
> We leak memory here - we never free edev.

We do indeed .. fixed.

> > +struct enclosure_component *
> > +enclosure_component_register(struct enclosure_device *edev,
> > +unsigned int number,
> > +    enum enclosure_component_type type,
> > +const char *name)
> > +{
> Missing kernel-doc for this and the following exports.

Yes, added.

James

---

From: James Bottomley <[EMAIL PROTECTED]>
Date: Sun, 3 Feb 2008 15:40:56 -0600
Subject: [SCSI] enclosure: add support for enclosure services

The enclosure misc device is really just a library providing sysfs
support for physical enclosure devices and their components.

Signed-off-by: James Bottomley <[EMAIL PROTECTED]>
---
 drivers/misc/Kconfig  |9 +
 drivers/misc/Makefile |1 +
 drivers/misc/enclosure.c  |  486 +
 include/linux/enclosure.h |  120 +++
 4 files changed, 616 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/enclosure.c
 create mode 100644 include/linux/enclosure.h

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index b5e67c0..ca68480 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -232,4 +232,13 @@ config ATMEL_SSC
 
  If unsure, say N.
 
+config ENCLOSURE_SERVICES
+   tristate "Enclosure Services"
+   default n
+   help
+ Provides support for intelligent enclosures (bays which
+ contain storage devices).  You also need either a host
+ driver (SCSI/ATA) which supports enclosures
+ or a SCSI enclosure device (SES) to use these services.
+
 endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 87f2685..639c755 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_SONY_LAPTOP) += sony-laptop.o
 obj-$(CONFIG_THINKPAD_ACPI)+= thinkpad_acpi.o
 obj-$(CONFIG_FUJITSU_LAPTOP)   += fujitsu-laptop.o
 obj-$(CONFIG_EEPROM_93CX6) += eeprom_93cx6.o
+obj-$(CONFIG_ENCLOSURE_SERVICES) += enclosure.o
diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
new file mode 100644
index 000..42e6e43
--- /dev/null
+++ b/drivers/misc/enclosure.c
@@ -0,0 +1,486 @@
+/*
+ * Enclosure Services
+ *
+ * Copyright (C) 2008 James Bottomley <[EMAIL PROTECTED]>
+ *
+**-
+**
+**  This program is free software; you can redistribute it and/or
+**  modify it under the terms of the GNU General Public License
+**  version 2 as published by the Free Software Foundation.
+**
+**  This program is distributed in the hope that it will be useful,
+**  but WITHOUT ANY WARRANTY; without even the implied warranty of
+**  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+**  GNU General Public License for more details.
+**
+**  You should have received a copy of the GNU General Public License
+**  along with this program; if not, write to the Free Software
+**  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+**
+**-
+*/
+#include 
+#include 
+#include 
+#include 
+#include 
+#include

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 15:27 +0300, Vladislav Bolkhovitin wrote:
> Vladislav Bolkhovitin wrote:
> So, James, what is your opinion on the above? Or the overall SCSI target 
> project simplicity doesn't matter much for you and you think it's fine 
> to duplicate Linux page cache in the user space to keep the in-kernel 
> part of the project as small as possible?

The answers were pretty much contained here

http://marc.info/?l=linux-scsi&m=120164008302435

and here:

http://marc.info/?l=linux-scsi&m=120171067107293

Weren't they?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 21:38 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> > On Mon, 2008-02-04 at 20:56 +0300, Vladislav Bolkhovitin wrote:
> > 
> >>James Bottomley wrote:
> >>
> >>>On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:
> >>>
> >>>
> >>>>James Bottomley wrote:
> >>>>
> >>>>
> >>>>>>>>So, James, what is your opinion on the above? Or the overall SCSI 
> >>>>>>>>target 
> >>>>>>>>project simplicity doesn't matter much for you and you think it's 
> >>>>>>>>fine 
> >>>>>>>>to duplicate Linux page cache in the user space to keep the in-kernel 
> >>>>>>>>part of the project as small as possible?
> >>>>>>>
> >>>>>>>
> >>>>>>>The answers were pretty much contained here
> >>>>>>>
> >>>>>>>http://marc.info/?l=linux-scsi&m=120164008302435
> >>>>>>>
> >>>>>>>and here:
> >>>>>>>
> >>>>>>>http://marc.info/?l=linux-scsi&m=120171067107293
> >>>>>>>
> >>>>>>>Weren't they?
> >>>>>>
> >>>>>>No, sorry, it doesn't look so for me. They are about performance, but 
> >>>>>>I'm asking about the overall project's architecture, namely about one 
> >>>>>>part of it: simplicity. Particularly, what do you think about 
> >>>>>>duplicating Linux page cache in the user space to have zero-copy cached 
> >>>>>>I/O? Or can you suggest another architectural solution for that problem 
> >>>>>>in the STGT's approach?
> >>>>>
> >>>>>
> >>>>>Isn't that an advantage of a user space solution?  It simply uses the
> >>>>>backing store of whatever device supplies the data.  That means it takes
> >>>>>advantage of the existing mechanisms for caching.
> >>>>
> >>>>No, please reread this thread, especially this message: 
> >>>>http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of 
> >>>>the advantages of the kernel space implementation. The user space 
> >>>>implementation has to have data copied between the cache and user space 
> >>>>buffer, but the kernel space one can use pages in the cache directly, 
> >>>>without extra copy.
> >>>
> >>>
> >>>Well, you've said it thrice (the bellman cried) but that doesn't make it
> >>>true.
> >>>
> >>>The way a user space solution should work is to schedule mmapped I/O
> >>>from the backing store and then send this mmapped region off for target
> >>>I/O.  For reads, the page gather will ensure that the pages are up to
> >>>date from the backing store to the cache before sending the I/O out.
> >>>For writes, You actually have to do a msync on the region to get the
> >>>data secured to the backing store. 
> >>
> >>James, have you checked how fast is mmaped I/O if work size > size of 
> >>RAM? It's several times slower comparing to buffered I/O. It was many 
> >>times discussed in LKML and, seems, VM people consider it unavoidable. 
> > 
> > 
> > Erm, but if you're using the case of work size > size of RAM, you'll
> > find buffered I/O won't help because you don't have the memory for
> > buffers either.
> 
> James, just check and you will see, buffered I/O is a lot faster.

So in an out of memory situation the buffers you don't have are a lot
faster than the pages I don't have?

> >>So, using mmaped IO isn't an option for high performance. Plus, mmaped 
> >>IO isn't an option for high reliability requirements, since it doesn't 
> >>provide a practical way to handle I/O errors.
> > 
> > I think you'll find it does ... the page gather returns -EFAULT if
> > there's an I/O error in the gathered region. 
> 
> Err, to whom return? If you try to read from a mmaped page, which can't 
> be populated due to I/O error, you will get SIGBUS or SIGSEGV, I don't 
> remember exactly. It's quite tricky to get back to the faulted command 
> from the signal handler.
> 
> Or do you mean mmap(MAP_POPULATE)/munmap() for each command? Do you 
> think that such mapping/unmapping is good for performance?
> 
> > msync does something
> > similar if there's a write failure.
> > 
> >>>You also have to pull tricks with
> >>>the mmap region in the case of writes to prevent useless data being read
> >>>in from the backing store.
> >>
> >>Can you be more exact and specify what kind of tricks should be done for 
> >>that?
> > 
> > Actually, just avoid touching it seems to do the trick with a recent
> > kernel.
> 
> Hmm, how can one write to an mmaped page and don't touch it?

I meant from user space ... the writes are done inside the kernel.

However, as Linus has pointed out, this discussion is getting a bit off
topic.  There's no actual evidence that copy problems are causing any
performatince issues issues for STGT.  In fact, there's evidence that
they're not for everything except IB networks.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley


On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> >>>>So, James, what is your opinion on the above? Or the overall SCSI target 
> >>>>project simplicity doesn't matter much for you and you think it's fine 
> >>>>to duplicate Linux page cache in the user space to keep the in-kernel 
> >>>>part of the project as small as possible?
> >>>
> >>>
> >>>The answers were pretty much contained here
> >>>
> >>>http://marc.info/?l=linux-scsi&m=120164008302435
> >>>
> >>>and here:
> >>>
> >>>http://marc.info/?l=linux-scsi&m=120171067107293
> >>>
> >>>Weren't they?
> >>
> >>No, sorry, it doesn't look so for me. They are about performance, but 
> >>I'm asking about the overall project's architecture, namely about one 
> >>part of it: simplicity. Particularly, what do you think about 
> >>duplicating Linux page cache in the user space to have zero-copy cached 
> >>I/O? Or can you suggest another architectural solution for that problem 
> >>in the STGT's approach?
> > 
> > 
> > Isn't that an advantage of a user space solution?  It simply uses the
> > backing store of whatever device supplies the data.  That means it takes
> > advantage of the existing mechanisms for caching.
> 
> No, please reread this thread, especially this message: 
> http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of 
> the advantages of the kernel space implementation. The user space 
> implementation has to have data copied between the cache and user space 
> buffer, but the kernel space one can use pages in the cache directly, 
> without extra copy.

Well, you've said it thrice (the bellman cried) but that doesn't make it
true.

The way a user space solution should work is to schedule mmapped I/O
from the backing store and then send this mmapped region off for target
I/O.  For reads, the page gather will ensure that the pages are up to
date from the backing store to the cache before sending the I/O out.
For writes, You actually have to do a msync on the region to get the
data secured to the backing store.  You also have to pull tricks with
the mmap region in the case of writes to prevent useless data being read
in from the backing store.  However, none of this involves data copies.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 20:56 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> > On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:
> > 
> >>James Bottomley wrote:
> >>
> >>>>>>So, James, what is your opinion on the above? Or the overall SCSI 
> >>>>>>target 
> >>>>>>project simplicity doesn't matter much for you and you think it's fine 
> >>>>>>to duplicate Linux page cache in the user space to keep the in-kernel 
> >>>>>>part of the project as small as possible?
> >>>>>
> >>>>>
> >>>>>The answers were pretty much contained here
> >>>>>
> >>>>>http://marc.info/?l=linux-scsi&m=120164008302435
> >>>>>
> >>>>>and here:
> >>>>>
> >>>>>http://marc.info/?l=linux-scsi&m=120171067107293
> >>>>>
> >>>>>Weren't they?
> >>>>
> >>>>No, sorry, it doesn't look so for me. They are about performance, but 
> >>>>I'm asking about the overall project's architecture, namely about one 
> >>>>part of it: simplicity. Particularly, what do you think about 
> >>>>duplicating Linux page cache in the user space to have zero-copy cached 
> >>>>I/O? Or can you suggest another architectural solution for that problem 
> >>>>in the STGT's approach?
> >>>
> >>>
> >>>Isn't that an advantage of a user space solution?  It simply uses the
> >>>backing store of whatever device supplies the data.  That means it takes
> >>>advantage of the existing mechanisms for caching.
> >>
> >>No, please reread this thread, especially this message: 
> >>http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of 
> >>the advantages of the kernel space implementation. The user space 
> >>implementation has to have data copied between the cache and user space 
> >>buffer, but the kernel space one can use pages in the cache directly, 
> >>without extra copy.
> > 
> > 
> > Well, you've said it thrice (the bellman cried) but that doesn't make it
> > true.
> > 
> > The way a user space solution should work is to schedule mmapped I/O
> > from the backing store and then send this mmapped region off for target
> > I/O.  For reads, the page gather will ensure that the pages are up to
> > date from the backing store to the cache before sending the I/O out.
> > For writes, You actually have to do a msync on the region to get the
> > data secured to the backing store. 
> 
> James, have you checked how fast is mmaped I/O if work size > size of 
> RAM? It's several times slower comparing to buffered I/O. It was many 
> times discussed in LKML and, seems, VM people consider it unavoidable. 

Erm, but if you're using the case of work size > size of RAM, you'll
find buffered I/O won't help because you don't have the memory for
buffers either.

> So, using mmaped IO isn't an option for high performance. Plus, mmaped 
> IO isn't an option for high reliability requirements, since it doesn't 
> provide a practical way to handle I/O errors.

I think you'll find it does ... the page gather returns -EFAULT if
there's an I/O error in the gathered region.  msync does something
similar if there's a write failure.

> > You also have to pull tricks with
> > the mmap region in the case of writes to prevent useless data being read
> > in from the backing store.
> 
> Can you be more exact and specify what kind of tricks should be done for 
> that?

Actually, just avoid touching it seems to do the trick with a recent
kernel.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley


On Mon, 2008-02-04 at 19:25 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> >>Vladislav Bolkhovitin wrote:
> >>So, James, what is your opinion on the above? Or the overall SCSI target 
> >>project simplicity doesn't matter much for you and you think it's fine 
> >>to duplicate Linux page cache in the user space to keep the in-kernel 
> >>part of the project as small as possible?
> > 
> > 
> > The answers were pretty much contained here
> > 
> > http://marc.info/?l=linux-scsi&m=120164008302435
> > 
> > and here:
> > 
> > http://marc.info/?l=linux-scsi&m=120171067107293
> > 
> > Weren't they?
> 
> No, sorry, it doesn't look so for me. They are about performance, but 
> I'm asking about the overall project's architecture, namely about one 
> part of it: simplicity. Particularly, what do you think about 
> duplicating Linux page cache in the user space to have zero-copy cached 
> I/O? Or can you suggest another architectural solution for that problem 
> in the STGT's approach?

Isn't that an advantage of a user space solution?  It simply uses the
backing store of whatever device supplies the data.  That means it takes
advantage of the existing mechanisms for caching.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 10:29 -0800, Linus Torvalds wrote:
> 
> On Mon, 4 Feb 2008, James Bottomley wrote:
> > 
> > The way a user space solution should work is to schedule mmapped I/O
> > from the backing store and then send this mmapped region off for target
> > I/O.
> 
> mmap'ing may avoid the copy, but the overhead of a mmap operation is 
> quite often much *bigger* than the overhead of a copy operation.
> 
> Please do not advocate the use of mmap() as a way to avoid memory copies. 
> It's not realistic. Even if you can do it with a single "mmap()" system 
> call (which is not at all a given, considering that block devices can 
> easily be much larger than the available virtual memory space), the fact 
> is that page table games along with the fault (and even just TLB miss) 
> overhead is easily more than the cost of copying a page in a nice 
> streaming manner.
> 
> Yes, memory is "slow", but dammit, so is mmap().
> 
> > You also have to pull tricks with the mmap region in the case of writes 
> > to prevent useless data being read in from the backing store.  However, 
> > none of this involves data copies.
> 
> "data copies" is irrelevant. The only thing that matters is performance. 
> And if avoiding data copies is more costly (or even of a similar cost) 
> than the copies themselves would have been, there is absolutely no upside, 
> and only downsides due to extra complexity.
> 
> If you want good performance for a service like this, you really generally 
> *do* need to in kernel space. You can play games in user space, but you're 
> fooling yourself if you think you can do as well as doing it in the 
> kernel. And you're *definitely* fooling yourself if you think mmap() 
> solves performance issues. "Zero-copy" does not equate to "fast". Memory 
> speeds may be slower that core CPU speeds, but not infinitely so!
> 
> (That said: there *are* alternatives to mmap, like "splice()", that really 
> do potentially solve some issues without the page table and TLB overheads. 
> But while splice() avoids the costs of paging, I strongly suspect it would 
> still have easily measurable latency issues. Switching between user and 
> kernel space multiple times is definitely not going to be free, although 
> it's probably not a huge issue if you have big enough requests).

Sorry ... this is really just a discussion of how something (zero copy)
could be done, rather than an implementation proposal.  (I'm not
actually planning to make the STGT people do anything ... although
investigating splice does sound interesting).

Right at the moment, STGT seems to be performing just fine on
measurements up to gigabit networks.  There are suggestions that there
may be a problem on 8G IB networks, but it's not definitive yet.

I'm already on record as saying I think the best fix for IB networks is
just to reduce the context switches by increasing the transfer size, but
the infrastructure to allow that only just went into git head.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [rfc] direct IO submission and completion scalability issues

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 05:33 -0500, Jens Axboe wrote:
> As Andi mentions, we can look into making that lockless. For the initial
> implementation I didn't really care, just wanted something to play with
> that would nicely allow me to control both the submit and complete side
> of the affinity issue.

Sorry, late to the party ... it went to my steeleye address, not my
current one.

Could you try re-running the tests with a low queue depth (say around 8)
and the card interrupt bound to a single CPU.

The reason for asking you to do this is that it should emulate almost
precisely what you're looking for:  The submit path will be picked up in
the SCSI softirq where the queue gets run, so you should find that all
submit and returns happen on a single CPU, so everything gets cache hot
there.

James

p.s. if everyone could also update my email address to the
hansenpartnership one, the people at steeleye who monitor my old email
account would be grateful.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 22:43 +, Alan Cox wrote:
> > better. So for example, I personally suspect that ATA-over-ethernet is way 
> > better than some crazy SCSI-over-TCP crap, but I'm biased for simple and 
> > low-level, and against those crazy SCSI people to begin with.
> 
> Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP
> would probably trash iSCSI for latency if nothing else.

Actually, there's also FCoE now ... which is essentially SCSI
encapsulated in Fibre Channel Protocols (FCP) running over ethernet with
Jumbo frames.  It does the standard SCSI TCQ, so should answer all the
latency pieces.  Intel even has an implementation:

http://www.open-fcoe.org/

I tend to prefer the low levels as well.  The whole disadvantage for IP
as regards iSCSI was the layers of protocols on top of it for
addressing, authenticating, encrypting and finding any iSCSI device
anywhere in the connected universe.

I tend to see loss of routing from operating at the MAC level to be a
nicely justifiable tradeoff (most storage networks tend to be hubbed or
switched anyway).  Plus an ethernet MAC with jumbo frames is a large
framed nearly lossless medium, which is practically what FCP is
expecting.  If you really have to connect large remote sites ... well
that's what tunnelling bridges are for.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] enclosure: add support for enclosure services

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 16:32 -0800, Luben Tuikov wrote:
> --- On Sun, 2/3/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> 
> > The enclosure misc device is really just a library providing
> > sysfs
> > support for physical enclosure devices and their
> > components.
> 
> Who is the target audience/user of those facilities?
> a) The kernel itself needing to read/write SES pages?

That depends on the enclosure integration, but right at the moment, it
doesn't

> b) A user space application using sysfs to read/write
>SES pages?

Not an application so much as a user.  The idea of sysfs is to allow
users to get and set the information in addition to applications.

> At the moment SES device management is done via
> an application (user-space) and a user-space library
> used by the application and /dev/sgX to send SCSI
> commands to the SES device.

I must have missed that when I was looking for implementations; what's
the URL?

But, if we have non-scsi enclosures to integrate, that makes it harder
for a user application because it has to know all the implementations.
A sysfs framework on the other hand is a universal known thing for the
user applications.

> One could have a very good argument to not bloat
> the kernel with this but leave it to a user-space
> application and a library to do all this and
> communicate with the SES device via the kernel's /dev/sgX.

The same thing goes for other esoteric SCSI infrastructure pieces like
cd changers.  On the whole, given that ATA is asking for enclosure
management in kernel, it makes sense to consolidate the infrastructure
and a ses ULD is a very good test bed.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] enclosure: add support for enclosure services

2008-02-04 Thread James Bottomley


On Mon, 2008-02-04 at 18:01 -0800, Luben Tuikov wrote:
> --- On Mon, 2/4/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> > > > The enclosure misc device is really just a
> > library providing
> > > > sysfs
> > > > support for physical enclosure devices and their
> > > > components.
> > > 
> > > Who is the target audience/user of those facilities?
> > > a) The kernel itself needing to read/write SES pages?
> > 
> > That depends on the enclosure integration, but right at the
> > moment, it
> > doesn't
> 
> Yes, I didn't suspect so.
> 
> > 
> > > b) A user space application using sysfs to read/write
> > >SES pages?
> > 
> > Not an application so much as a user.  The idea of sysfs is
> > to allow
> > users to get and set the information in addition to
> > applications.
> 
> Exactly the same argument stands for a user-space
> application with a user-space library.
> 
> This is the classical case of where it is better to
> do this in user-space as opposed to the kernel.
> 
> The kernel provides capability to access the SES
> device.  The user space application and library
> provide interpretation and control.  Thus if the
> enclosure were upgraded, one doesn't need to
> upgrade their kernel in order to utilize the new
> capabilities of the SES device.  Plus upgrading
> a user-space application is a lot easier than
> the kernel (and no reboot necessary).

The implementation is modular, so it's remove and insert ...

> Consider another thing: vendors would really like
> unprecedented access to the SES device in the enclosure
> so as your ses/enclosure code keeps state it would
> get out of sync when vendor user-space enclosure
> applications access (and modify) the SES device's
> pages.

The state model doesn't assume nothing else will alter the state.

> You can test this yourself: submit a patch
> that removes SES /dev/sgX support; advertise your
> ses/class solution and watch the fun.
> 
> > > At the moment SES device management is done via
> > > an application (user-space) and a user-space library
> > > used by the application and /dev/sgX to send SCSI
> > > commands to the SES device.
> > 
> > I must have missed that when I was looking for
> > implementations; what's
> > the URL?
> 
> I'm not aware of any GPLed ones.  That doesn't
> necessarily mean that the best course of action is
> to bloat the kernel.  You can move your ses/enclosure
> stuff to a user space application library
> and thus start a GPLed one.

Certainly ... patches welcome.

> > But, if we have non-scsi enclosures to integrate, that
> > makes it harder
> > for a user application because it has to know all the
> > implementations.
> 
> So does the kernel.  And as I pointed out above, it
> is a lot easier to upgrade a user-space application and
> library than it is to upgrade a new kernel and having
> to reboot the computer to run the new kernel.

No, think again ... it's easy for SES based enclosures because they have
a SCSI transport.  We have no transport for SGPIO based enclosures nor
for any of the other more esoteric ones.

That's not to say it can't be done, but it does mean that it can't be
completely userspace.

> > A sysfs framework on the other hand is a universal known
> > thing for the
> > user applications.
> 
> So would a user-space ses library, a la libses.so.
> 
> > > One could have a very good argument to not bloat
> > > the kernel with this but leave it to a user-space
> > > application and a library to do all this and
> > > communicate with the SES device via the kernel's
> > /dev/sgX.
> > 
> > The same thing goes for other esoteric SCSI infrastructure
> > pieces like
> > cd changers.  On the whole, given that ATA is asking for
> > enclosure
> > management in kernel, it makes sense to consolidate the
> > infrastructure
> > and a ses ULD is a very good test bed.
> 
> What is wrong with exporting the SES device as /dev/sgX
> and having a user-space application and library to
> do all this?

How do you transport the enclosure commands over /dev/sgX?  Only SES has
SCSI command encapsulation ... the rest won't even be SCSI targets ...

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] enclosure: add support for enclosure services

2008-02-04 Thread James Bottomley

On Mon, 2008-02-04 at 19:28 -0800, Luben Tuikov wrote:
> --- On Mon, 2/4/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> > On Mon, 2008-02-04 at 18:01 -0800, Luben Tuikov wrote:
> > > --- On Mon, 2/4/08, James Bottomley
> > <[EMAIL PROTECTED]> wrote:
> > > > > > The enclosure misc device is really
> > just a
> > > > library providing
> > > > > > sysfs
> > > > > > support for physical enclosure devices
> > and their
> > > > > > components.
> > > > > 
> > > > > Who is the target audience/user of those
> > facilities?
> > > > > a) The kernel itself needing to read/write
> > SES pages?
> > > > 
> > > > That depends on the enclosure integration, but
> > right at the
> > > > moment, it
> > > > doesn't
> > > 
> > > Yes, I didn't suspect so.
> > > 
> > > > 
> > > > > b) A user space application using sysfs to
> > read/write
> > > > >SES pages?
> > > > 
> > > > Not an application so much as a user.  The idea
> > of sysfs is
> > > > to allow
> > > > users to get and set the information in addition
> > to
> > > > applications.
> > > 
> > > Exactly the same argument stands for a user-space
> > > application with a user-space library.
> > > 
> > > This is the classical case of where it is better to
> > > do this in user-space as opposed to the kernel.
> > > 
> > > The kernel provides capability to access the SES
> > > device.  The user space application and library
> > > provide interpretation and control.  Thus if the
> > > enclosure were upgraded, one doesn't need to
> > > upgrade their kernel in order to utilize the new
> > > capabilities of the SES device.  Plus upgrading
> > > a user-space application is a lot easier than
> > > the kernel (and no reboot necessary).
> > 
> > The implementation is modular, so it's remove and
> > insert ...
> 
> I guess the same could be said for STGT and SCST, right?

You mean both of their kernel pieces are modular?  That's correct.

> LOL, no seriously, this is unnecessary kernel bloat,
> or rather at the wrong place (see below).
> 
> > 
> > > Consider another thing: vendors would really like
> > > unprecedented access to the SES device in the
> > enclosure
> > > so as your ses/enclosure code keeps state it would
> > > get out of sync when vendor user-space enclosure
> > > applications access (and modify) the SES device's
> > > pages.
> > 
> > The state model doesn't assume nothing else will alter
> > the state.
> 
> But it would be trivial exercise to show that an
> inconsistent state can be had by modifying pages
> of the SES device directly from userspace bypassing
> your implementation.

I don't think so ... if you actually look at the code, you'll see it
doesn't really have persistent state for the enclosure.

> > > You can test this yourself: submit a patch
> > > that removes SES /dev/sgX support; advertise your
> > > ses/class solution and watch the fun.
> > > 
> > > > > At the moment SES device management is done
> > via
> > > > > an application (user-space) and a user-space
> > library
> > > > > used by the application and /dev/sgX to send
> > SCSI
> > > > > commands to the SES device.
> > > > 
> > > > I must have missed that when I was looking for
> > > > implementations; what's
> > > > the URL?
> > > 
> > > I'm not aware of any GPLed ones.  That doesn't
> > > necessarily mean that the best course of action is
> > > to bloat the kernel.  You can move your ses/enclosure
> > > stuff to a user space application library
> > > and thus start a GPLed one.
> > 
> > Certainly ... patches welcome.
> 
> I've non at the moment, plus I don't think you'd be
> the point of contact for a user-space SES library.
> Unless of course you've already started something up
> on sourceforge.
> 
> Really, such an effort already exists: it is called
> sg_ses(8).
> 
> > 
> > > > But, if we have non-scsi enclosures to integrate,
> > that
> > > > makes it harder
> > > > for a user application because it has to know all
> > the
> > > > implementatio

Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread James Bottomley


On Tue, 2008-02-05 at 05:43 +0100, Matteo Tescione wrote:
> Hi all,
> And sorry for intrusion, i am not a developer but i work everyday with iscsi
> and i found it fantastic.
> Altough Aoe, Fcoe and so on could be better, we have to look in real world
> implementations what is needed *now*, and if we look at vmware world,
> virtual iron, microsoft clustering etc, the answer is iSCSI.
> And now, SCST is the best open-source iSCSI target. So, from an end-user
> point of view, what are the really problems to not integrate scst in the
> mainstream kernel?

The fact that your last statement is conjecture.  It's definitely untrue
for non-IB networks, and the jury is still out on IB networks.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] enclosure: add support for enclosure services

2008-02-05 Thread James Bottomley

On Mon, 2008-02-04 at 21:35 -0800, Luben Tuikov wrote:
> > > I guess the same could be said for STGT and SCST,
> > right?
> > 
> > You mean both of their kernel pieces are modular? 
> > That's correct.
> 
> No, you know very well what I mean.
> 
> By the same logic you're preaching to include your
> solution part of the kernel, you can also apply to
> SCST.

Ah, but it's not ... the current patch is merely exporting an interface.
The debate in STGT vs SCST is not whether to export an interface but
where to draw the line.

You could also argue in the same vein that sd is redundant because a
filesystem could talk directly to the device via /dev/sgX (in fact OSD
based filesystems already do this).  The argument is true, but misses
the bigger picture that the interfaces exported by sd are more portable
(apply to non-SCSI block devices) and easier to use.

> > > Yes, for which the transport layer, implements the
> > > scsi device node for the SES device.  It doesn't
> > really
> > > matter if the SCSI commands sent to the SES device go
> > > over SGPIO or FC or SAS or Bluetooth or I2C, etc, the
> > > transport layer can implement that and present the
> > > /dev/sgX node.
> > 
> > But it does matter if the enclosure device doesn't
> > speak SCSI.
> 
> Enclosure management isn't as simple as you're
> portraying it here.  The enclosure management
> device speaks either SES or SAF-TE.  The transport
> protocol to access it could be SGPIO or I2C or...

Look, just read the spec; SGPIO is a bus for driving enclosures ... it
doesn't require SES or SAF-TE or even any SCSI protocol.

> >  SGPIO
> > isn't a SCSI protocol ... it's a general purpose
> > serial bus protocol.
> > It's pretty simple and register based, but it might (or
> > might not) be
> > accessible via a SCSI bridge.
> 
> I see.  You've just discovered SGPIO -- good for you.
> 
> At any rate, I told you already that what is needed
> is not what you've provided but a _device node_
> exported by the kernel, either a processor or
> enclosure type.

Wrong ... we don't export non-SCSI devices as SCSI (with the single and
rather annoying exception of ATA via SAT).

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: aic7xxx build failure

2008-02-05 Thread James Bottomley

On Tue, 2008-02-05 at 19:40 +0200, Adrian Bunk wrote:
> Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx 
> compilation:
> 
> <--  snip  -->
> 
> $ make O=../out/x86-full
> ...
>   SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h
>   SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h
>   CC  drivers/scsi/aic7xxx/aic79xx_core.o
> gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory
> gcc: no input files
> make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1
> 
> <--  snip  -->
> 
> Next "make" run brings the same failure in 
> drivers/scsi/aic7xxx/aic7xxx_core.c.
> 
> With the third "make" it works.
> 
> It might compile for people with SMP systems using -j?

I'd just say "weird behaviour"  the file being complained about is
definitely part of the tree ... does it actually exist in your tree when
gcc claims it doesn't? if so, I suspect some type of make path screwup
here.  The commit in question is this:

commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb
Author: Sam Ravnborg <[EMAIL PROTECTED]>
Date:   Sun Feb 3 21:55:49 2008 +0100

scsi: fix dependency bug in aic7 Makefile

Building the aic7xxx driver includes the copy
of an .h file from a _shipped file.

In a highly parallel build Ingo saw that the
build sometimes failed (included distcc usage).
It was tracked down to a missing dependency from the .c
source file to the generated .h file.
We started to build the .c file before the
copy (cat) operation of the .h file completed
and we then only got half of the definitions
from the copied .h file.

Add an explicit dependency from the .c files to the
generated .h files so make knows all dependencies and
finsih the build of the .h files before it starts
building the .o files.

Ingo tested this fix and reported:
good news: hundreds of successful kernel builds and no failures
overnight.

    Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
Acked-by: Ingo Molnar <[EMAIL PROTECTED]>
Acked-by: James Bottomley <[EMAIL PROTECTED]>

diff --git a/drivers/scsi/aic7xxx/Makefile b/drivers/scsi/aic7xxx/Makefile
index e4f70c5..4c54954 100644
--- a/drivers/scsi/aic7xxx/Makefile
+++ b/drivers/scsi/aic7xxx/Makefile
@@ -44,13 +44,8 @@ clean-files += aic79xx_seq.h aic79xx_reg.h 
aic79xx_reg_print.c

 # Dependencies for generated files need to be listed explicitly

-$(obj)/aic7xxx_core.o: $(obj)/aic7xxx_seq.h
-$(obj)/aic7xxx_core.o: $(obj)/aic7xxx_reg.h
-$(obj)/aic79xx_core.o: $(obj)/aic79xx_seq.h
-$(obj)/aic79xx_core.o: $(obj)/aic79xx_reg.h
-
-$(addprefix $(obj)/,$(aic7xxx-y)): $(obj)/aic7xxx_seq.h
-$(addprefix $(obj)/,$(aic79xx-y)): $(obj)/aic79xx_seq.h
+$(addprefix $(src)/,$(aic7xxx-y:.o=.c)): $(obj)/aic7xxx_seq.h 
$(obj)/aic7xxx_reg.h
+$(addprefix $(src)/,$(aic79xx-y:.o=.c)): $(obj)/aic79xx_seq.h 
$(obj)/aic79xx_reg.h

 aic7xxx-gen-$(CONFIG_AIC7XXX_BUILD_FIRMWARE)   := $(obj)/aic7xxx_reg.h
 aic7xxx-gen-$(CONFIG_AIC7XXX_REG_PRETTY_PRINT) += $(obj)/aic7xxx_reg_print.c

The last two additions look wrong:  they make source files depend on
headers, which isn't right: it's object files that depend on headers (we
don't know how to rebuild the .c files because they're not auto
generated).  However, the commit log indicates the cause might be
deeper.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: aic7xxx build failure

2008-02-05 Thread James Bottomley


On Tue, 2008-02-05 at 19:40 +0200, Adrian Bunk wrote:
> Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx 
> compilation:
> 
> <--  snip  -->
> 
> $ make O=../out/x86-full
> ...
>   SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h
>   SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h
>   CC  drivers/scsi/aic7xxx/aic79xx_core.o
> gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory
> gcc: no input files
> make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1

Could you run this with V=1 to get us a verbose output of what the exact
files gcc is failing on are?

Thanks,

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: aic7xxx build failure

2008-02-05 Thread James Bottomley


On Tue, 2008-02-05 at 20:24 +0200, Adrian Bunk wrote:
> On Tue, Feb 05, 2008 at 12:18:04PM -0600, James Bottomley wrote:
> > 
> > On Tue, 2008-02-05 at 19:40 +0200, Adrian Bunk wrote:
> > > Commit 8891fec65ac5b5a74b50c705e31b66c92c3eddeb broke aic7xxx 
> > > compilation:
> > > 
> > > <--  snip  -->
> > > 
> > > $ make O=../out/x86-full
> > > ...
> > >   SHIPPED drivers/scsi/aic7xxx/aic79xx_seq.h
> > >   SHIPPED drivers/scsi/aic7xxx/aic79xx_reg.h
> > >   CC  drivers/scsi/aic7xxx/aic79xx_core.o
> > > gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory
> > > gcc: no input files
> > > make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1
> > 
> > Could you run this with V=1 to get us a verbose output of what the exact
> > files gcc is failing on are?
> 
> make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build 
> obj=drivers/scsi
> make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build 
> obj=drivers/scsi/aacraid
> (cat /dev/null; ) > drivers/scsi/aacraid/modules.order
> make -f /home/bunk/linux/kernel-2.6/git/linux-2.6/scripts/Makefile.build 
> obj=drivers/scsi/aic7xxx
>   cat 
> /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx/aic79xx_seq.h_shipped
>  > drivers/scsi/aic7xxx/aic79xx_seq.h
>   cat 
> /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx/aic79xx_reg.h_shipped
>  > drivers/scsi/aic7xxx/aic79xx_reg.h
>   gcc -Wp,-MD,drivers/scsi/aic7xxx/.aic79xx_core.o.d  -nostdinc -isystem 
> /usr/lib/gcc/i486-linux-gnu/4.2.3/include -D__KERNEL__ -Iinclude -Iinclude2 
> -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include -include 
> include/linux/autoconf.h 
> -I/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi/aic7xxx 
> -Idrivers/scsi/aic7xxx -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs 
> -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os 
> -m32 -msoft-float -mregparm=3 -freg-struct-return 
> -mpreferred-stack-boundary=2 -march=athlon -ffreestanding -DCONFIG_AS_CFI=1 
> -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare 
> -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow 
> -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-x86/mach-generic 
> -Iinclude/asm-x86/mach-generic 
> -I/home/bunk/linux/kernel-2.6/git/linux-2.6/include/asm-x86/mach-default 
> -Iinclude/asm-x86/mach-default -fno-omit-frame-pointer 
> -fno-optimize-sibling-calls -fno-stack-protector 
> -Wdeclaration-after-statement -Wno-pointer-sign 
> -I/home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/scsi -Idrivers/scsi  
> -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(aic79xx_core)"  
> -D"KBUILD_MODNAME=KBUILD_STR(aic79xx)" -c -o 
> drivers/scsi/aic7xxx/aic79xx_core.o drivers/scsi/aic7xxx/aic79xx_core.c
> gcc: drivers/scsi/aic7xxx/aic79xx_core.c: No such file or directory
> gcc: no input files
> make[4]: *** [drivers/scsi/aic7xxx/aic79xx_core.o] Error 1
> make[3]: *** [drivers/scsi/aic7xxx] Error 2
> make[2]: *** [drivers/scsi] Error 2
> make[1]: *** [drivers] Error 2
> make: *** [sub-make] Error 2

Do I assume from this that you have different source and object
directories?  There shouldn't be a failure if this is building
in /home/bunk/linux/kernel-2.6/git/linux-2.6/ because the source file
should be there.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread James Bottomley

This email somehow didn't manage to make it to the list (I suspect
because it had html attachments).

James

---

  From: 
Julian Satran
<[EMAIL PROTECTED]>
To: 
Nicholas A. Bellinger
<[EMAIL PROTECTED]>
Cc: 
Andrew Morton
<[EMAIL PROTECTED]>, Alan
Cox <[EMAIL PROTECTED]>, Bart
Van Assche
<[EMAIL PROTECTED]>, FUJITA
Tomonori
<[EMAIL PROTECTED]>,
James Bottomley
<[EMAIL PROTECTED]>, ...
   Subject: 
Re: Integration of SCST in the
mainstream Linux kernel
  Date: 
Mon, 4 Feb 2008 21:31:48 -0500
(20:31 CST)


Well stated. In fact the "layers" above ethernet do provide the services 
that make the TCP/IP stack compelling - a whole complement of services.
ALL services required (naming, addressing, discovery, security etc.) will 
have to be recreated if you take the FcOE route. That makes good business 
for some but not necessary for the users. Those services BTW are not on 
the data path and are not "overhead".
The TCP/IP stack pathlength is decently low. What makes most 
implementations poor is that they where naively extended in the SMP world. 
Recent implementations (published) from IBM and Intel show excellent 
performance (4-6 times the regular stack). I do not have unfortunately 
latency numbers (as the community major stress has been throughput) but I 
assume that RDMA (not necessarily hardware RDMA) and/or the use of 
infiniband or latency critical applications - within clusters may be the 
ultimate low latency solution. Ethernet has some inherent latency issues 
(the bridges) that are inherited by anything on ethernet (FcOE included). 
The IP protocol stack is not inherently slow but some implementations are 
somewhat sluggish.
But instead of replacing them with new and half backed contraptions we 
would be all better of improving what we have and understand.

In the whole debate of around FcOE I heard a single argument that may have 
some merit - building convertors iSCSI-FCP to support legacy islands of 
FCP (read storage products that do not support iSCSI natively) is 
expensive. It is correct technically - only that FcOE eliminates an 
expense at the wrong end of the wire - it reduces the cost of the storage 
box at the expense of added cost at the server (and usually there a many 
servers using a storage box). FcOE vendors are also bound to provide FCP 
like services for FcOE - naming, security, discovery etc. - that do not 
exist on Ethernet. It is a good business for FcOE vendors - a duplicate 
set of solution for users.

It should be apparent by now that if one speaks about a "converged" 
network we should speak about an IP network and not about Ethernet.
If we take this route we might get perhaps also to an "infrastructure 
physical variants" that support very low latency better than ethernet and 
we might be able to use them with the same "stack" - a definite forward 
looking solution.

IMHO it is foolish to insist on throwing away the whole stack whenever we 
make a slight improvement in the physical layer of the network. We have a 
substantial investment and body of knowledge in the protocol stack and 
nothing proposed improves on it - obviously not as in its total level of 
service nor in performance.

Julo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread James Bottomley

On Tue, 2008-02-05 at 21:59 +0300, Vladislav Bolkhovitin wrote:
> >>Hmm, how can one write to an mmaped page and don't touch it?
> > 
> > I meant from user space ... the writes are done inside the kernel.
> 
> Sure, the mmap() approach agreed to be unpractical, but could you 
> elaborate more on this anyway, please? I'm just curious. Do you think 
> about implementing a new syscall, which would put pages with data in the 
> mmap'ed area?

No, it has to do with the way invalidation occurs.  When you mmap a
region from a device or file, the kernel places page translations for
that region into your vm_area.  The regions themselves aren't backed
until faulted.  For write (i.e. incoming command to target) you specify
the write flag and send the area off to receive the data.  The gather,
expecting the pages to be overwritten, backs them with pages marked
dirty but doesn't fault in the contents (unless it already exists in the
page cache).  The kernel writes the data to the pages and the dirty
pages go back to the user.  msync() flushes them to the device.

The disadvantage of all this is that the handle for the I/O if you will
is a virtual address in a user process that doesn't actually care to see
the data. non-x86 architectures will do flushes/invalidates on this
address space as the I/O occurs.

> > However, as Linus has pointed out, this discussion is getting a bit off
> > topic. 
> 
> No, that isn't off topic. We've just proved that there is no good way to 
> implement zero-copy cached I/O for STGT. I see the only practical way 
> for that, proposed by FUJITA Tomonori some time ago: duplicating Linux 
> page cache in the user space. But will you like it?

Well, there's no real evidence that zero copy or lack of it is a problem
yet.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[GIT PULL] First round of SCSI updates for the 3.6+ merge window

2012-10-02 Thread James Bottomley

This is a large set of updates, mostly for drivers (qla2xxx [including
support for new 83xx based card], qla4xxx, mpt2sas, bfa, zfcp, hpsa,
be2iscsi, isci, lpfc, ipr, ibmvfc, ibmvscsi, megaraid_sas).  There's
also a rework for tape adding virtually unlimited numbers of tape drives
plus a set of dif fixes for sd and a fix for a live lock on hot remove
of SCSI devices.

This round includes a signed tag pull of isci-for-3.6

Because of the need to keep the tree stable for next testing, anything
submitted after last Saturday will be in the next round.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-for-linus

The short changelog is

Adam Radford (3):
  megaraid_sas: Version and Changelog update
  megaraid_sas: Add resetwaittime module parameter
  megaraid_sas: Add throttlequeuedepth module parameter

Alexey Khoroshilov (1):
  mpt2sas: fix double mutex lock in NON_BLOCKING state

Andi Kleen (1):
  Fix incorrect memset in bnx2fc_parse_fcp_rsp

Andrew Vasquez (2):
  qla2xxx: Fail initialization if unable to load RISC code.
  qla2xxx: Ensure PLOGI is sent to Fabric Management-Server upon request.

Artur Wojcik (1):
  isci: implement suspend/resume support

Arun Easi (5):
  qla2xxx: Use the right field for container_of.
  qla2xxx: Fix incorrect status reporting on DIF errors.
  qla2xxx: T10 DIF - ISP83xx changes.
  qla2xxx: Use #defines instead of hardcoded values for intr status.
  qla2xxx: Fix for continuous rescan attempts in arbitrated loop topology.

Atul Deshmukh (1):
  qla2xxx: Wrong PCIe(2.5Gb/s x8) speed in the kerenel message for ISP82xx.

Bart Van Assche (1):
  scsi_dh_alua: Enable STPG for unavailable ports

Benjamin Herrenschmidt (2):
  ibmvscsi: Fix host config length field overflow
  ibmvscsi: Remove backend abstraction

Brian King (6):
  ibmvfc: Driver version 1.0.10
  ibmvfc: Ignore fabric RSCNs when link is dead
  ibmvfc: Fix double completion on abort timeout
  ipr: Driver version 2.5.4
  ipr: Reduce interrupt lock time
  ipr: Reduce queuecommand lock time

Chad Dupuis (12):
  qla2xxx: Update version number to 8.04.00.07-k.
  qla2xxx: Set Maximum Read Request Size to 4K.
  qla2xxx: Fix for handling some error conditions in loopback.
  qla2xxx: Fix description of qla2xmaxqdepth parameter.
  qla2xxx: Remove setting Scsi_host->this_id during adapter probe.
  qla2xxx: Implement beacon support for ISP83xx.
  qla2xxx: Only enable link up on the correct interrupt event.
  qla2xxx: Update the driver copyright.
  qla2xxx: Changes for ISP83xx loopback support.
  qla2xxx: Use bitmap to store loop_id's for fcports.
  qla2xxx: Add check in qla82xx_watchdog for failed hardware state.
  qla2xxx: Bind to ISP8031 devices.

Dan Carpenter (4):
  ipr: remove an unneeded check
  ipr: missing unlock before a return
  isci: add a couple __iomem annotations
  isci: make function declaration match implementation

Dan Williams (6):
  scsi_remove_target: fix softlockup regression on hot remove
  libsas, ipr: cleanup ata_host flags initialization via ata_host_init
  libsas: suspend / resume support
  libata: export ata_port suspend/resume infrastructure for sas
  libata: reset once
  isci: fix isci_pci_probe() generates warning on efi failure path

Dave Jiang (1):
  isci: Fix interrupt coalescing assumption of active TCs

Dave Maurer (1):
  isci: fix COMSAS negation timout workaround for WD SAS drives

Fengguang Wu (1):
  megaraid_sas: combine kmalloc+memset into kzalloc

Giridhar Malavali (4):
  qla2xxx: Allow MSI interrupt registration for ISP82xx.
  qla2xxx: Don't toggle RISC interrupt bits after IRQ lines are attached.
  qla2xxx: Fix for legacy interrupts for ISP83xx.
  qla2xxx: Don't register to legacy interrupt for ISP82xx.

Hanjun Guo (1):
  Fusion MPT: disable pci device when mpt map resoures failed

Hannes Reinecke (1):
  st: remove st_mutex

Harish Zunjarrao (1):
  qla2xxx: Add FW DUMP SIZE sysfs attribute.

James Smart (33):
  lpfc 8.3.34: Update lpfc version for 8.3.34 driver release
  lpfc 8.3.34: Fixed leaking memory from pci dma pool
  lpfc 8.3.34: Correct lock handling to eliminate reset escalation on I/O 
abort
  lpfc 8.3.34: Adjust IO Channels to 1 when INTx
  lpfc 8.3.34: Add XRI to abort handler timeout log message
  lpfc 8.3.34: Streamline fcp underrun message printing
  lpfc 8.3.34: Correct typecasts for snprintf messages
  lpfc 8.3.34: Simplify BlockGuard lpfc_printf_vlog messages
  lpfc 8.3.34: Fix parameter field in CQE to mask for LOCAL_REJECT status
  lpfc 8.3.34: Fix number of IO channels to match CPUs
  lpfc 8.3.34: Add SLI-4 V1 Capacity and Resource Descriptor support
  lpfc 8.3.34: Add LOGO support after ABTS compliance
  lpfc 8.3.33: Update lpfc version for 8.3.33 driver release

Re: [PATCH 0/2] virtio-scsi fixes for 3.6

2012-10-02 Thread James Bottomley

On Mon, 2012-10-01 at 15:11 +0200, Paolo Bonzini wrote:
> Il 26/07/2012 15:28, Paolo Bonzini ha scritto:
> > James,
> > 
> > patch 1 fixes scanning of LUNs whose number is greater than 255.  QEMU
> > passes a max_lun of 16383 (because it uses SAM numbering) but in Linux
> > it must become 32768 (because LUNs above 255 are "relocated" to 16640).
> > Patch 2 is a resubmission of the patch for online resizing of virtio-scsi
> > LUNs, which needs to be rebased.
> > 
> > LUNs above 255 now work for all of scanning, hotplug, hotunplug and
> > resize.
> > 
> > Thanks,
> > 
> > Paolo
> > 
> > Paolo Bonzini (2):
> >   virtio-scsi: fix LUNs greater than 255
> >   virtio-scsi: support online resizing of disks
> > 
> >  drivers/scsi/virtio_scsi.c  |   37 +++--
> >  include/linux/virtio_scsi.h |2 ++
> >  2 files changed, 37 insertions(+), 2 deletions(-)
> > 
> 
> Ping, are these patches going into 3.7?

They're 3.7 candidates yes (enhancements certainly aren't 3.6).  I seem
to have become lost with the virtio-scsi updates since what I have
marked for inclusion is a patch series that's a partial intersection
with this.  I'll flush my queue for virto-scsi, please resend all the
missing patches you want in 3.7.

Thanks,

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/5] SCSI: Re-name the HBA Type

2012-10-03 Thread James Bottomley

On Wed, 2012-10-03 at 20:14 +0800, NickCheng wrote:
> From: Nick Cheng 
> 
> Replace the nameing, hba, hbb and hbc, with hbaA, hbaB abd hbaC respectively
> Signed-off-by: Nick Cheng< nick.ch...@areca.com.tw >

Please send these to the SCSI mailing list: linux-s...@vger.kernel.org

With a proper subject (which is :).

And a proper signed off by (you need a space between your name and the
opening < and no space within).

Thanks,

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Bug 48241] New: oops when setting up LVM

2012-10-03 Thread James Bottomley

On Wed, 2012-10-03 at 15:04 +, bugzilla-dae...@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=48241
> 
>Summary: oops when setting up LVM
>Product: IO/Storage
>Version: 2.5
> Kernel Version: 3.6.0-next-20121003
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: SCSI
> AssignedTo: linux-s...@vger.kernel.org
> ReportedBy: daniel.san...@pobox.com
> Regression: No
> 
> 
> Created an attachment (id=81921)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=81921)
> image of oops

The image says the RIP is at kthread_data + 0xb

That implies something went wrong within the workqueue or kthread
systems, I've cc'd linux-kernel, but it's a bit of a vague thing to go
on and could conceivably be a hardware issue (or some weird thread
interaction in linux-next).

The first question would be "does it happen in vanilla 3.6"?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT PULL] Disintegrate UAPI for parisc

2012-10-05 Thread James Bottomley

On Thu, 2012-10-04 at 20:51 +0100, David Howells wrote:
> Can you merge the following branch into the parisc tree please.

Just to confirm: this is for testing purposes only ... you'll be sending
the pull requests to Linus?  Because otherwise I'll need a signed pull
request from you.

I should be able to get around to testing this after I've looked at some
of the signal/kernel_thread/execve patches for Al Viro.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Drivers: scsi

2012-10-24 Thread James Bottomley

On Wed, 2012-10-24 at 09:25 -0700, K. Y. Srinivasan wrote:
> When the low level driver returns SCSI_MLQUEUE_DEVICE_BUSY,
> how is the command retried; I suspect the retry is done after some delay.

Delay depends mainly on I/O pressure and the unplug timer in the block
layer.

> Is this delay programmable? If the device state changes,
> can the low level driver notify upper layers that it can now handle
> the command that it had failed earlier with SCSI_MLQUEUE_DEVICE_BUSY.

In theory, you can call blk_run_queue() from the event handler that sees
the device become ready.  I don't think any driver actually does this,
but I can't see it would cause any problem.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [SCSI] ses: fix memory leaks

2008-02-11 Thread James Bottomley


On Mon, 2008-02-11 at 10:23 -0600, James Bottomley wrote:
> On Sun, 2008-02-10 at 23:25 -0800, Yinghai Lu wrote:
> > please check it...
> 
> This one looks perfect, thanks!

Well, nearly perfect.  I corrected this typo:

+   if (!buf)
+   goto simple_polulate;


Which a compile check before you submitted the patch would have picked
up ...

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH?][arch/parisc/kernel/pci-dma.c] pcxl_dma_ops.alloc_noncoherent = pa11_dma_alloc_consistent?

2008-02-11 Thread James Bottomley

On Mon, 2008-02-11 at 09:42 -0700, Matthew Wilcox wrote:
> On Mon, Feb 11, 2008 at 05:23:33PM +0100, Roel Kluin wrote:
> > duplicate pa11_dma_alloc_consistent; more appropriate appears
> > pa11_dma_alloc_noncoherent here. 
> > 
> > Not tested, please confirm that this fix is correct
> 
> I don't think it is.  The memories are fading, so I don't recall why it
> is we do it this way, but I'm pretty sure it's correct the way it is.

dma_alloc_noncoherent is a fallback for boxes with coherency problems
which cannot allocate coherent memory.  The API asks for coherent memory
but the driver promises to behave correctly even if an incoherent area
is returned.  dma_alloc_coherent() requires only coherent memory.  The
pcx boxes (PA7200 and below) can't do the uncached trick for coherent
memory, so the API is designed for them.  The PCXL can, so we allocate
coherent memory even for dma_alloc_noncoherent() in that case (and nop
out the coherency handlers the driver uses).

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH?][arch/parisc/kernel/pci-dma.c] pcxl_dma_ops.alloc_noncoherent = pa11_dma_alloc_consistent?

2008-02-11 Thread James Bottomley


On Mon, 2008-02-11 at 17:23 +0100, Roel Kluin wrote:
> duplicate pa11_dma_alloc_consistent; more appropriate appears
> pa11_dma_alloc_noncoherent here. 
> 
> Not tested, please confirm that this fix is correct

No, it looks completely incorrect to me.  What makes you think a pcxl
box has a problem with coherency?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [GIT PATCH] final SCSI updates for 2.6.24 merge window

2008-02-11 Thread James Bottomley

On Mon, 2008-02-11 at 13:13 -0800, Harvey Harrison wrote:
> On Fri, 2008-02-08 at 10:37 -0600, James Bottomley wrote:
> > On Fri, 2008-02-08 at 10:03 +0100, Geert Uytterhoeven wrote:
> > > On Thu, 7 Feb 2008, James Bottomley wrote:
> > > > On Thu, 2008-02-07 at 17:04 -0800, Harvey Harrison wrote:
> > > > > I'm going to guess that this is the entry in feature-removal.txt
> > > > > that need an update then:
> > > > > 
> > > > > ---
> > > > > 
> > > > > What: old NCR53C9x driver
> > > > > When: October 2007
> > > > > Why:  Replaced by the much better esp_scsi driver.  Actual low-level
> > > > >   driver can be ported over almost trivially.
> > > > > Who:  David Miller <[EMAIL PROTECTED]>
> > > > >   Christoph Hellwig <[EMAIL PROTECTED]>
> > > > 
> > > > Not immediately ... I anticipate a few "where'd my driver go?" type
> > > > questions from m68k for which this provides a useful reference to point
> > > > to ...
> > > 
> > > Don't bother, we're fully aware of this.
> > > 
> > > The shortest feature removal notice in Linux's history (is it?) didn't go
> > > unnoticed ;-)
> 
> Sowould a patch be welcome to remove this entry then?

Not at the moment for the reasons I already gave.  Ping me in about
three months if it's not gone by then.

James



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-11 Thread James Bottomley

On Tue, 2008-02-12 at 12:02 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Andrew was looking for someone to run a linux-next tree that just
> contained the subsystem git and quilt trees for 2.6.x+1 and I (in a
> moment of madness) volunteered.  So, this is to announce the creating of
> such a tree (it doesn't exist yet) which will require some (hopefully)
> small amount of work on the part of subsystem maintainers.

Actually, it sort of does.  If you look here:

http://www.kernel.org/pub/linux/kernel/people/jejb/merge-tree/

You'll find a merge candidate tree that builds nightly from everyone's
git and quilt trees.  I'm using it to track merge conflicts (so I only
build the patch, I don't check it compiles).

You're welcome to the scripts that do this:

http://www.kernel.org/pub/linux/kernel/people/jejb/build.pl

And the config file that runs it:

http://www.kernel.org/pub/linux/kernel/people/jejb/merge-tree-build

I don't plan to do much more than keep it building to check conflicts,
so you're welcome to take it over.

> Those interested in discussion about this are encouraged to join the
> [EMAIL PROTECTED] mailing list.
> 
> The first things I need from the subsystem maintainers (you know who you
> are) are a contact address (a list address is fine) and at least one git
> branch or quilt series that contains all the things you want to see go
> into 2.6.26.  I am happy for there to be multiple branches/series (in
> fact there probably will need to be in some cases where there are
> dependencies on others work).
> 
> The tree will be based on the current version of Linus' tree but you may
> specify an earlier branch point if you need to (hopefully not - but more
> likely for quilters, I guess).
> 
> I hope to recreate this tree every day automatically.  In order to do
> this, any tree that has a conflict will be dropped from that days tree.
> The maintainer will be notified.  I hope to provide some clue as to what
> the conflict is with, but probably not initially.

Actually the experiment with the -mc tree shows that most of the
conflicts are trivial in nature (usually docbook stuff or
feature-removal.txt stuff), so you can do a trivial triage by hand.  You
can't automatically drop them (well, not unless you want to end up
dropping half the trees).

The other problem is that we actually maintain deliberate conflicts with
a last person to merge fixes it type attitude.  Again, it's usually in
minor areas, and the fixups are fairly trivial, but it illustrates why
conflicts can't be a reason to drop a tree, you have to maintain some
sort of automatic fixup (at least I had to with the -mc tree).  The
reason we do this is that it would give the maintainers a nasty web of
included trees (which is almost impossible for the quilt trees anyway)
if we tried to resolve the conflicts and destroy our ability to rebase.

> I will attempt to build the tree between each merge (and a failed build
> will again cause the offending tree to be dropped).  These builds will be
> necessarily restricted to probably one architecture/config.  I will build
> the entire tree on as many architectures/configs as seem sensible and
> the results of that will be available on a web page (to be announced).

Yes, this is the bit I've never dared do ... principally because it's
such a time sink.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [SCSI] ses: fix memory leaks

2008-02-11 Thread James Bottomley


On Sun, 2008-02-10 at 23:25 -0800, Yinghai Lu wrote:
> please check it...

This one looks perfect, thanks!

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] scsi: ses fix mem leaking when fail to add intf

2008-02-10 Thread James Bottomley


On Sat, 2008-02-09 at 15:15 -0800, Yinghai Lu wrote:
> [PATCH] scsi: ses fix mem leaking when fail to add intf
> 
> fix leaking with scomp leaking when failing.
> also remove one extra space.

There are still a few extraneous code moves in this one.  This is about
the correct minimal set, isn't it?

James

---

From: Yinghai Lu <[EMAIL PROTECTED]>
Date: Sat, 9 Feb 2008 15:15:47 -0800
Subject: [SCSI] ses: fix memory leaks

fix leaking with scomp leaking when failing. Also free page10 on
driver removal  and remove one extra space.

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>
Signed-off-by: James Bottomley <[EMAIL PROTECTED]>
---
 drivers/scsi/ses.c |   20 
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index 2a6e4f4..8abc4a9 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -416,11 +416,11 @@ static int ses_intf_add(struct class_device *cdev,
int i, j, types, len, components = 0;
int err = -ENOMEM;
struct enclosure_device *edev;
-   struct ses_component *scomp;
+   struct ses_component *scomp = NULL;
 
if (!scsi_device_enclosure(sdev)) {
/* not an enclosure, but might be in one */
-   edev =  enclosure_find(&sdev->host->shost_gendev);
+   edev = enclosure_find(&sdev->host->shost_gendev);
if (edev) {
ses_match_to_enclosure(edev, sdev);
class_device_put(&edev->cdev);
@@ -456,9 +456,6 @@ static int ses_intf_add(struct class_device *cdev,
if (!buf)
goto err_free;
 
-   ses_dev->page1 = buf;
-   ses_dev->page1_len = len;
-
result = ses_recv_diag(sdev, 1, buf, len);
if (result)
goto recv_failed;
@@ -473,6 +470,9 @@ static int ses_intf_add(struct class_device *cdev,
type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
components += type_ptr[1];
}
+   ses_dev->page1 = buf;
+   ses_dev->page1_len = len;
+   buf = NULL;
 
result = ses_recv_diag(sdev, 2, hdr_buf, INIT_ALLOC_SIZE);
if (result)
@@ -489,6 +489,7 @@ static int ses_intf_add(struct class_device *cdev,
goto recv_failed;
ses_dev->page2 = buf;
ses_dev->page2_len = len;
+   buf = NULL;
 
/* The additional information page --- allows us
 * to match up the devices */
@@ -506,11 +507,12 @@ static int ses_intf_add(struct class_device *cdev,
goto recv_failed;
ses_dev->page10 = buf;
ses_dev->page10_len = len;
+   buf = NULL;
 
  no_page10:
-   scomp = kmalloc(sizeof(struct ses_component) * components, GFP_KERNEL);
+   scomp = kzalloc(sizeof(struct ses_component) * components, GFP_KERNEL);
if (!scomp)
-   goto  err_free;
+   goto err_free;
 
edev = enclosure_register(cdev->dev, sdev->sdev_gendev.bus_id,
  components, &ses_enclosure_callbacks);
@@ -521,7 +523,7 @@ static int ses_intf_add(struct class_device *cdev,
 
edev->scratch = ses_dev;
for (i = 0; i < components; i++)
-   edev->component[i].scratch = scomp++;
+   edev->component[i].scratch = scomp + i;
 
/* Page 7 for the descriptors is optional */
buf = NULL;
@@ -598,6 +600,7 @@ static int ses_intf_add(struct class_device *cdev,
err = -ENODEV;
  err_free:
kfree(buf);
+   kfree(scomp);
kfree(ses_dev->page10);
kfree(ses_dev->page2);
kfree(ses_dev->page1);
@@ -630,6 +633,7 @@ static void ses_intf_remove(struct class_device *cdev,
ses_dev = edev->scratch;
edev->scratch = NULL;
 
+   kfree(ses_dev->page10);
kfree(ses_dev->page1);
kfree(ses_dev->page2);
kfree(ses_dev);
-- 
1.5.3.8



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-11 Thread James Bottomley

On Tue, 2008-02-12 at 13:23 +1100, Stephen Rothwell wrote:
> Hi James,
> 
> On Mon, 11 Feb 2008 19:36:49 -0600 James Bottomley <[EMAIL PROTECTED]> wrote:
> >
> > On Tue, 2008-02-12 at 12:02 +1100, Stephen Rothwell wrote:
> > > 
> > > Andrew was looking for someone to run a linux-next tree that just
> > > contained the subsystem git and quilt trees for 2.6.x+1 and I (in a
> > > moment of madness) volunteered.  So, this is to announce the creating of
> > > such a tree (it doesn't exist yet) which will require some (hopefully)
> > > small amount of work on the part of subsystem maintainers.
> > 
> > Actually, it sort of does.  If you look here:
> 
> Yes, Andrew pointed me there and I should have mentioned it, sorry.
> 
> > http://www.kernel.org/pub/linux/kernel/people/jejb/merge-tree/
> > 
> > You'll find a merge candidate tree that builds nightly from everyone's
> > git and quilt trees.  I'm using it to track merge conflicts (so I only
> > build the patch, I don't check it compiles).
> > 
> > You're welcome to the scripts that do this:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/jejb/build.pl
> > 
> > And the config file that runs it:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/jejb/merge-tree-build
> > 
> > I don't plan to do much more than keep it building to check conflicts,
> > so you're welcome to take it over.
> 
> Thanks, they are very useful.
> 
> > > I hope to recreate this tree every day automatically.  In order to do
> > > this, any tree that has a conflict will be dropped from that days tree.
> > > The maintainer will be notified.  I hope to provide some clue as to what
> > > the conflict is with, but probably not initially.
> > 
> > Actually the experiment with the -mc tree shows that most of the
> > conflicts are trivial in nature (usually docbook stuff or
> > feature-removal.txt stuff), so you can do a trivial triage by hand.  You
> > can't automatically drop them (well, not unless you want to end up
> > dropping half the trees).
> 
> Well I did a trial run with the 40 git trees in the latest -mm and got
> only 6 conflicts (which actually were not trivial, unfortunately).  Of
> course, a lot of them have already been pulled into Linus' tree by now.

Hmm, OK  of the 46 I pull in, it's only xfs causing difficulties at the
moment.  Pre 2.6.24 being declared, we had about 10 fixups, all of which
were trivial except for one or two I got the subsystems to fix.

> > The other problem is that we actually maintain deliberate conflicts with
> > a last person to merge fixes it type attitude.  Again, it's usually in
> 
> I was hoping to be able to automatically find the other tree involved in
> a conflict and point both maintainers at the problem.  Also, there is
> always the possibility of reordering the trees ;-)

It's frequently just not possible.  There are a lot of global API
changes that sweep across subsystems.  For instance, I had a trivial
fixup script for scsi and net because of a netlink API change.

> > minor areas, and the fixups are fairly trivial, but it illustrates why
> > conflicts can't be a reason to drop a tree, you have to maintain some
> > sort of automatic fixup (at least I had to with the -mc tree).  The
> 
> OK, maybe if the conflict is trivial, we can do fixups.

Right .. it's really not possible to work without an infrastructure that
does this.

> > reason we do this is that it would give the maintainers a nasty web of
> > included trees (which is almost impossible for the quilt trees anyway)
> > if we tried to resolve the conflicts and destroy our ability to rebase.
> 
> I am hoping (one of Andrew's bugbears) that over time we will end up with
> several branches from git using trees (the vast majority) most of which
> are completely contained within their own subsystem and don't depend on
> anything but Linus' tree.  The conflicting and dependent branches will be
> merged later in the sequence.  Thus we will end up with a large amount of
> the tree becoming stable as the merge window approaches.  (Yes, sometimes
> I am an optimist :-))

We can't really do this.  We don't work in a utopian my area only
environment.  Almost every release cycle involves dealing with patches
that cross subsystem boundaries.  Often we try to minimise the problems
by taking patches from different subsystems via our trees to keep the
dependencies straight, but it does lead to conflicts.  Most of them are
ones we're aware of, and easily resolvable as the trees go in in the
merge window.  So any infrastructure that builds a unified tree has to
be aware of them too.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 10:00 -0800, Linus Torvalds wrote:
> 
> On Tue, 12 Feb 2008, James Bottomley wrote:
> >
> > On Tue, 2008-02-12 at 09:09 -0800, Linus Torvalds wrote:
> > >  (a) create a base tree with _just_ that fundamental infrastructure 
> > > change,
> > >  and make sure that base branch is so obviously good that there is no 
> > >  question about merging it.
> > 
> > The problem is how do we use a?  Usually we need to track your -rc tree
> > as our fixes go in ... some of which affect our development trees.
> 
> So?
> 
> > If we stick with (a) as the base, we don't get to pull in the fixes in
> > your tree.  If we use your tree we have to pull in (a) creating n
> > different merge points for the n different upstream trees..
> 
> I don't understand what you mean. This is true whether you pulled (a) or 
> not. If you have any changes what-so-ever in your tree, if you pull in 
> fixes from my tree, you'll get a merge.
> 
> But if you mean that you cannot rebase (a), then yes. That was what I 
> said.

Yes, that's what I meant ... no rebasing of (a) once it becomes the
fixed point we all work from.

>  Rebases *do*not*work* (and fundamentally cannot work) in a 
> distributed environment.

Hm ... I think net is a counter example to this.  Rebases certainly work
for them.  The issue, I thought, was around the policy of rebasing and
how often.

I see the question as being one of who creates the history.  When
something goes into your tree, that's an obvious history point and we
can't rewrite it.  However, when something goes into my trees, I don't
think it's as obviously a history point that has to be preserved, so I
can rebase obviously, rebasing causes disruption, so I don't do it
unless it's necessitated by an actual conflict.

> But why would you merge with my tree in the first place? My tree won't 
> normally have any conflicts or anything like that anyway.

This normally happens when we have bug fixes to a driver that is
simultaneously being worked on for the merge window ... and it actually
happens quite a lot.  Sometimes the conflicts actually don't come via my
tree (usually a -mm docbook update, include file removal or something
else that looked trivial to the submitter).

The worst examples of this are the "run lindent on the driver" type of
patches, but a lot of other changes give trivial conflict too.

> With a "Linux-next" tree, you'll see the conflicts if they occur (since 
> *that* tree would merge!), and in that case you would say  "now I need to 
> merge Linus' tree just to resolve the conflicts!"

Right ... that's exactly why I created the -mc tree ... it warns me that
there's a looming problem I need to fix (and not just in your tree).
For that reason, a -next tree that does exactly the same thing will be
great ... I won't have to maintain the -mc tree.

> But before that, merging my tree (or rebasing on top of it) is simply 
> *wrong*. It has nothing to do with your SCSI development.

Yes ... I don't do that ... Like I said, I only rebase for an actual
conflict.

> > Yes, this is effectively what I did with the post merge SCSI tree.
> > However, if you do this rebasing becomes a fact of life because you need
> > to rebase out all the dependencies you have before you merge (in fact,
> > it's a good way of checking whether your dependencies have been merged
> > yet or not, seeing what survives a rebase).
> 
> I don't see the logic. You shouldn't need to rebase at all. I don't see 
> why you claim that this makes rebasing more of a fact. It doesn't. It has 
> no impact at all, except making rebasing _less_ possible!

Well, it came at me because Jens was rebasing the block tree as he
worked through issues in the two branches I was based on.  If the block
tree hadn't rebased, I wouldn't have rebased that often.  However, we
did come across other things that had to move from my tree to Jens' as
we developed this, mainly bug fixes moving closer to the source to
preserve bisectability, and in that case rebases were really necessary.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 10:48 -0800, Linus Torvalds wrote:
> But you're all ignoring my fundamental objection: you're talking as if 
> cross-tree fundamental API changes should be the "norm", and that we 
> should try to solve the workflow issues that stem from that. And I'm 
> saying that I think we should try to FIX the issue, and make sure that 
> it simply *isn't* the norm.

OK, I'll address it then (or try to).  Just don't eat me please ... I'm
only a small and timid storage maintainer ...

> In other words, I'm perfectly happy to be an a*hole and tell people that I 
> simply won't merge things that cause undue API churn at all, and that were 
> not thought out sufficiently.
> 
> We've had too many issues like that (SG chaining, iommu, driver core, not 
> to mention the upheavals in x86) lately, but realistically, which 
> subsystem remains a problem for the future? And maybe the correct thing to 
> do is to just say "enough!".

I can sort of agree, but I don't think you can say enough == no more API
changes.  There's the request_irq argument removal patch floating around
somewhere for instance ...but as someone said "Every rule always has an
exception".

I think what's really needed is a balance where it's painful (on the
originator) to make an API change.  Part of what -mc or -next trees can
do is to make that pain visible early (and often).

> I'm perfectly happy being hardnosed and saying "nope, that's crap, it 
> doesn't matter if the code is slightly better if you cause those kinds of 
> issues".

OK, so you can supply the pain as well.

> The thing is, sometimes the answer really *is* "Don't do that then!". If 
> our model becomes bogged up by cross-subsystem serialization issues, that 
> means that we (a) spend too much time handling the fall-out from that and 
> (b) too little time just refining the details.

I think this is exaggerated.  By the end of the -rc cycle for 2.6.24 I
had six merge fixup patches because of trivial cross subsystem API
changes and two dropped trees because of intractable problems (and they
were notified and later fixed themselves up).  Given that I was
including about 50 git trees and 4 quilt ones, that's not too bad,
Especially as we had all the API churn you mentioned.

> And quite frankly, while "good core architecture" matters a lot, in the 
> end, "getting all the boring small things right" matters even more! Big 
> architectural churn that cleans up and fixes the "big issues" is totally 
> anti-productive if it means that we don't look at the boring small detail 
> stuff.

I agree with this too.  When I introduce an API, I usually try to make
sure it will last the test of time ... the trouble is, I'm not always
omniscient.

So the point, I think we've reached is that:

 1. we want to try to enforce good design for APIs to try to ensure
they're as future proof as foreseeable.
 2. We have to recognise that 1. is impossible in practice and
provide a mechanism for correcting APIs

So, I think our current system is good in that it enforces pain on
people who change APIs.  However, perhaps we are missing the bit where
we reflect longer before introducing APIs in the first place. (hey,
better reviewing ... I've heard that one before ...)

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] 2.6.25-rc1-git2: GDT SCSI: change drivers/scsi/gdth.c into using pci_get device

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 20:48 -0300, Sergio Luis wrote:
> reposting an updated version of it. Please check if it's ok.

Looks fine, thanks!  You added an extra space at the end of 

while ((pdev = pci_get_device(vendor, device, pdev)) 

Which I fixed.  Unfortunately checkpatch isn't very helpful for this
driver since it uses spaces not tabs everywhere, which checkpatch really
hates.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 18:35 -0800, Linus Torvalds wrote:
> 
> On Tue, 12 Feb 2008, James Bottomley wrote:
> > 
> > Yes, this is exactly the feature I'm looking for.  It would allow the
> > downstream users of a rebased tree to rebase themselves correctly.
> > 
> > All the information about the rebase is in the reflog ... it can't be
> > too difficult to pass it through on a pull and allow the downstream tree
> > to do the right thing.
> 
> Guys, you simply have no idea what you're talking about.
> 
> Those downstream trees can have done things themselves. They *depended* on 
> the state you gave them.
> 
> You can't just say "oops, I lied, this is the state you should have used, 
> now it's _your_ mess to sort out".
> 
> OF COURSE it's what you'd like to use - it absolves you of any and all 
> actual responsibility. But dammit, that's not what anybody else wants than 
> the irresponsible person who cannot be bothered to stand up for his work!
> 
> If you're not confident enough about your work, don't push it out! It's 
> that simple. Pushing out to a public branch is a small "release".
> 
> Have the f*cking back-bone to be able to stand behind what you did!

Erm, I would like this feature as a downstream user.

I'm not asking for this to be the default or even easily available.
However, when you know you've based a downstream tree on what you know
to be a volatile base, it would be very useful information to have.

Right at the moment, I maintain a  and a -base and
simply cherry pick the commits between the two to do the right thing
when I know my volatile base has changed.  It would be very helpful to
have a version of rebase that new my base had been rebased.

Basing on a tree I know to be volatile is sometimes a development
decision I make as a downstream user ... I'm just wishing the tools
could help me handle the problems better.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley


On Tue, 2008-02-12 at 19:31 -0800, Linus Torvalds wrote:
> 
> On Tue, 12 Feb 2008, James Bottomley wrote:
> > 
> > Right at the moment, I maintain a  and a -base and
> > simply cherry pick the commits between the two to do the right thing
> > when I know my volatile base has changed.  It would be very helpful to
> > have a version of rebase that new my base had been rebased.
> 
> Hey, I know, you could use.. drumroll..
> 
>   "git rebase"
> 
> I know that's a big leap of faith, to use git rebase for rebasing, but 
> there you have it. Us git people are kind of odd that way.
> 
> IOW, if you know the old broken base, and the new base, just do
> 
>   git rebase --onto newbase oldbase
> 
> and it should do exactly that (basically lots of automated cherry-picks).

OK, smarty-pants ... that works nicely, thanks!

I'm used to maintaining  and -base, so this probably
suits my workflow better than getting the information from the reflog.

It wasn't clear from the git rebase man page that it would work like
that.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] enclosure: add support for enclosure services

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 11:07 -0800, Kristen Carlson Accardi wrote:
> I understand what you are trying to do - I guess I just doubt the value
> you've added by doing this.  I think that there's going to be so much
> customization that system vendors will want to add, that they are going
> to wind up adding a custom library regardless, so standardising those
> few things won't buy us anything.

It depends ... if you actually have a use for the customisations, yes.
If you just want the basics of who (what's in the enclousure), what
(activity) and where (locate) then I think it solves your problem almost
entirely.

So, entirely as a straw horse, tell me what else your enclosures provide
that I haven't listed in the four points.  The SES standards too provide
a huge range of things that no-one ever seems to implement (temperature,
power, fan speeds etc).

I think the users of enclosures fall int these categories

85% just want to know where their device actually is (i.e. that sdc is
in enclosure slot 5)
50% like watching the activity lights
30% want to be able to have a visual locate function
20% want a visual failure indication (the other 80% rely on some OS
notification instead)

When you add up the overlapping needs, you get about 90% of people happy
with the basics that the enclosure services provide.  Could there be
more ... sure; should there be more ... I don't think so ... that's what
value add the user libraries can provide.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] enclosure: add support for enclosure services

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 10:22 -0800, Kristen Carlson Accardi wrote:
> I apologize for taking so long to review this patch.  I obviously agree
> wholeheartedly with Luben.  The problem I ran into while trying to
> design an enclosure management interface for the SATA devices is that
> there is all this vendor defined stuff.  For example, for the AHCI LED
> protocol, the only "defined" LED is 'activity'.  For LED2 and LED3 it
> is up to hardware vendors to define these.  For SGPIO there's all kinds
> of ways for hw vendors to customize.  I felt that it was going to be a
> maintainance nightmare to have to keep track of various vendors
> enclosure implementations in the ahci driver, and that it'd be better
> to just have user space libraries take care of that.  Plus, that way a
> vendor doesn't have to get a patch into the kernel to get their new
> spiffy wizzy bang blinky lights working (think of how long it takes
> something to even get into a vendor kernel, which is what these guys
> care about...).  So I'm still not sold on having an enclosure
> abstraction in the kernel - at least for the SATA controllers.

Correct me if I'm wrong, but didn't the original AHCI enclosure patch
expose activity LEDs via sysfs?

I'm not saying there aren't a lot of non standard pieces that need to be
activated by direct commands or other user activated protocol.  I am
saying there are a lot of standard pieces that we could do with showing
in a uniform manner.

The pieces I think are absolutely standard are

1. Actual enclosure presence (is this device in an enclosure)
2. Activity LED, this seems to be a feature of every enclosure.

I also think the following are reasonably standard (based on the fact
that most enclosure standards recommend but don't require this):

3. Locate LED (for locating the device).  Even if you only have an
activity LED, this is usually done by flashing the activity LED in a
well defined pattern.
4. Fault.  this is the least standardised of the lot, but does seem to
be present in about every enclosure implementation.

All I've done is standardise these four pieces ... the services actually
take into account that it might not be possible to do certain of these
(like fault).

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 17:20 -0800, David Miller wrote:
> What would be really cool is if you could do the rebase thing, push
> that to a remote tree you were already pushing into and others could
> pull from that and all the right things happen.
> 
> A rebase is just a series of events, and those could propagate to
> people who are pulling from the tree.  I'm pretty sure GIT could
> support this.

Yes, this is exactly the feature I'm looking for.  It would allow the
downstream users of a rebased tree to rebase themselves correctly.

All the information about the rebase is in the reflog ... it can't be
too difficult to pass it through on a pull and allow the downstream tree
to do the right thing.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 09:09 -0800, Linus Torvalds wrote:
>  (a) create a base tree with _just_ that fundamental infrastructure change,
>  and make sure that base branch is so obviously good that there is no 
>  question about merging it.

The problem is how do we use a?  Usually we need to track your -rc tree
as our fixes go in ... some of which affect our development trees.

If we stick with (a) as the base, we don't get to pull in the fixes in
your tree.  If we use your tree we have to pull in (a) creating n
different merge points for the n different upstream trees..

> (b) tell other people about the reason for the infrastructure change, and 
>  simply allow others to merge it. You don't have to wait for *me* to 
>  open the merge window, you need to make sure that the people that get 
>  impacted most can continue development!

Yes, this is effectively what I did with the post merge SCSI tree.
However, if you do this rebasing becomes a fact of life because you need
to rebase out all the dependencies you have before you merge (in fact,
it's a good way of checking whether your dependencies have been merged
yet or not, seeing what survives a rebase).

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 11:36 -0500, Jeff Garzik wrote:
> David Miller wrote:
> > This is why, with the networking, we've just tossed all of the network
> > driver stuff in there too.  I can rebase freely, remove changesets,
> > rework them, etc. and this causes a very low amount of pain for Jeff
> > Garzik and John Linville.
> 
> 
> s/very low/not low/
> 
> Rebasing is always a pain, and John and I both agreed the other day that 
> you do it too often.
> 
> I've complained about this before, too...  but figured this was just 
> another thing I was getting ignored on, and so life moved on.  But don't 
> try to sell rebasing as "low pain".
> 
> Rebasing makes the history all nice and pretty, but by totalling 
> flattening the history, trashing all the commit ids (and rewriting 
> associated metadata), you create obvious downstream problems.
> 
> Rebasing is  low impact only if you don't have git downstream people. 
> Otherwise, you're just treating it as a useful quilt clone, really.

I used to be in this camp too: seeing rebasing as the ultimate evil.
However, what I found was that as I tried to preserve my merge tree's
history, I was picking up merge points with non trivial fixups.  You can
argue that this is how git is supposed to operate, but the problem is
that these merge points become immovable objects in the tree history ...
I can't rebase the tree at all any more and I can't simply remove a
commit (which gets messy for commits duplicated by other trees)  They
also effectively separate the fix up from the patch causing it.   Plus,
it became difficult for people to go to gitweb to see the tree history,
because everything before the merge point got buried.

I finally concluded that I needed to rebase my trees to resolve
conflicts rather than merging them.  The advantages are I know the exact
patch with the problem and I can keep the history annotated with what
had to be done to get it to apply.  Plus with git (and this is a huge
advantage over quilt) knowing the base of the patch, I can still use the
merge algorithms to help me do the fixup.

I'm not saying rebase constantly ... I'm still mindful of the disruption
it causes, so I only do it if I pick up an actual conflict.  However, I
do think it does becomes necessary at times.

The problem rebasing causes is that git is pretty horrible at rebasing a
tree which is based on your now rebased tree (try saying that after a
few martinis).  Git really needs to acquire better rebasing tools for
this one case, but if that were solved, I think a lot of the objections
to rebasing could be answered.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 17:32 +0200, Benny Halevy wrote:
> On Feb. 12, 2008, 17:07 +0200, James Bottomley <[EMAIL PROTECTED]> wrote:
> > On Mon, 2008-02-11 at 21:53 -0800, Greg KH wrote:
> >>> this is why you need specific trees for just the API change, and these
> >>> need to EXPLICITLY go first before EVERYTHING ELSE. Yes this needs a
> >>> bit of coordination, but it's the only way.
> >> Even then, it will not work.
> >>
> >> Again, Roland isn't going to want to always pull in my driver tree just
> >> to build his tree.  He wants to, and needs to do his own development
> >> effort.
> >>
> >> But when we merge them together, there would be problems.
> >>
> >> So, you can't just "drop" the IB tree.
> >> You can't just "drip" my tree.
> >>
> >> Where do you "fix this up" at?  I can send a patch for the IB tree, but
> >> Roland can't put it in his tree, and I can't put it in my tree, it needs
> >> to go _after_ both of our trees.
> > 
> > Actually, we had exactly this issue with the SCSI bidirectional patches:
> > They depended on the sg_table patches in block.  The solution I adopted
> > was two merge trees:  One to go in immediately with no dependencies
> > (scsi-misc-2.6) and the other based on the pieces of block (so it would
> > compile and apply) to go in mid way through the merge round after block
> > (scsi-bidi-2.6).  What I did was keep rebasing the bidi tree until I
> > could see there was nothing other than a plane base before merging it.
> 
> My take on this, in retrospect, is that the code should probably have been
> developed in one branch off of one of the trees, or maybe even better in a
> third tree.

For your development, possibly, but not for my merge ... there were a
lot of other patches besides yours in the bidi tree (it was, in fact,
misnamed, but I thought at the time I created it it would only contain
the bidi patches).

> The point is that the structure of git trees followed the organizational
> structure rather than the problem at hand and if contributions coming
> from different trees depend on each other, staging branches should be created
> in the integration tree for working out the conflicts and when ready,
> the integrated branches should be pushed towards the tree's trunk.

Actually, I think you'll find the problem is that we can't build an
integration tree that's both stable and long lived, and hence you can't
base anything off it that requires a long lived base.

So, the way I worked for your patches was to use the -mc tree to
identify my conflicts, and only include the actual conflicting branches
as a basis for the scsi-bidi tree beofre fixing up the patches.  I then
actually used the -mc tree as a canary to tell me when to rebase.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] 2.6.25-rc1-git2: GDT SCSI: change drivers/scsi/gdth.c into using pci_get device

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 11:31 -0300, Sergio Luis wrote:
> Fix compilation warning in drivers/scsi/gdth.c, using deprecated 
> pci_find_device. 
> Change it into using pci_get_device instead:
> drivers/scsi/gdth.c:645: warning: 'pci_find_device' is deprecated (declared 
> at include/linux/pci.h:495)
> 
> Signed-off-by: Sergio Luis <[EMAIL PROTECTED]>
> 
> gdth.c |2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff -urN linux-2.6.25-rc1-git2.orig/drivers/scsi/gdth.c 
> linux-2.6.25-rc1-git2/drivers/scsi/gdth.c
> --- linux-2.6.25-rc1-git2.orig/drivers/scsi/gdth.c2008-02-12 
> 09:26:14.0 -0300
> +++ linux-2.6.25-rc1-git2/drivers/scsi/gdth.c 2008-02-12 10:33:08.0 
> -0300
> @@ -642,7 +642,7 @@
>*cnt, vendor, device));
>  
>  pdev = NULL;
> -while ((pdev = pci_find_device(vendor, device, pdev)) 
> +while ((pdev = pci_get_device(vendor, device, pdev)) 
> != NULL) {
>  if (pci_enable_device(pdev))
>  continue;

This can't be correct without a matching put in the error leg.

The difference between the two APIs is that pci_get_device returns a
pci_device with a reference taken and pci_find_device doesn't.  However,
pci_get_device does drop the reference again so as long as you never
exit the loop until it returns NULL, it is OK ... it's the exits before
pci_get_device() returns NULL that need the put.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Mon, 2008-02-11 at 21:53 -0800, Greg KH wrote:
> > this is why you need specific trees for just the API change, and these
> > need to EXPLICITLY go first before EVERYTHING ELSE. Yes this needs a
> > bit of coordination, but it's the only way.
> 
> Even then, it will not work.
> 
> Again, Roland isn't going to want to always pull in my driver tree just
> to build his tree.  He wants to, and needs to do his own development
> effort.
> 
> But when we merge them together, there would be problems.
> 
> So, you can't just "drop" the IB tree.
> You can't just "drip" my tree.
> 
> Where do you "fix this up" at?  I can send a patch for the IB tree, but
> Roland can't put it in his tree, and I can't put it in my tree, it needs
> to go _after_ both of our trees.

Actually, we had exactly this issue with the SCSI bidirectional patches:
They depended on the sg_table patches in block.  The solution I adopted
was two merge trees:  One to go in immediately with no dependencies
(scsi-misc-2.6) and the other based on the pieces of block (so it would
compile and apply) to go in mid way through the merge round after block
(scsi-bidi-2.6).  What I did was keep rebasing the bidi tree until I
could see there was nothing other than a plane base before merging it.

Of course, this only worked because Jens has a git tree ... it would
have been a lot harder (but not impossible) if I'd had entangled patches
from a quilt tree.

So I've already proven that the split tree solution is viable, if not
pretty.  The bidi tree had to be rebased an awful lot as the block trees
changed and rebased.  Unfortunately, git isn't very good at this, I
eventually had to keep a base and a top reference and just try to cherry
pick this series into the new constructed block tree.  But it can be
done...

> That's what -mm has been able to handle so far, and that needs to also
> work with -next.

Actually, we never successfully got block and bidi via -mm.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Open bugs

2008-02-12 Thread James Bottomley

Added linux-scsi for the SCSI ones

On Tue, 2008-02-12 at 00:18 -0800, Natalie Protasevich wrote:
> Hello,
> 
> The bugs listed are over a month old, and haven't been addressed yet.
> It would be appreciated if corresponding maintainers identify whether
> the bugs have been fixed, or need to be worked on, and take
> appropriate action.
> 
> In most cases, reporters are standing by and ready to provide
> information and necessary testing.
> Thanks!
> 
> SCSI==
> 
> Problems on booting
> http://bugzilla.kernel.org/show_bug.cgi?id=9621
> Date: 12/22/2007
> Regression
> Summary: The boot stops / hangs on hardware detection of SCSI.  I have
> an InitioINI-9X00U/UW
> When I have an usb key sticked in /dev/sba, and run lilo then, then it
> dont boot but give L99 99 99 99 ... error

I think this was fixed by commit
e2d435ea4084022ab88efa74214accb45b1f9e92

Apparently bugzilla email is on the fritz again because this bug report
didn't come across linux-scsi.

> Resetting RAID attached to a FC Switch causes kernel panic and crash
> http://bugzilla.kernel.org/show_bug.cgi?id=9598
> 12/18/2007
> Hardware Environment:SunFire X4200 - 2 x dual core AMD Opteron CPUs,
> 8GB Ram, Qlogic FC adapter.
> Summary: Resetting the RAID box causes the X4200 to crash.

This one looks like the usual problem of remove re-add with the SCSI
device model.

> 3ware 9650SE -8LPML not recognized by 3w-9xxx driver
> http://bugzilla.kernel.org/show_bug.cgi?id=8908
> 08/19/2007
> Problem Description: The 3w-9xxx kernel driver for 3ware 9xxx SATA
> RAID Controller series did not recognize the 3ware 9650SE-8LPML SATA
> RAID Controller.

Since this one never apparently worked it's not a regression but an
enhancement request, isn't it?

However, adding this PCI ID to the driver should be fairly
straightforward.  Does anyone know what the actual PCI IDs are?

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Announce: Linux-next (Or Andrew's dream :-))

2008-02-12 Thread James Bottomley

On Tue, 2008-02-12 at 21:18 +0100, Jiri Kosina wrote:
> On Tue, 12 Feb 2008, James Bottomley wrote:
> 
> > >  Rebases *do*not*work* (and fundamentally cannot work) in a 
> > > distributed environment.
> > Hm ... I think net is a counter example to this.  Rebases certainly work
> > for them.  The issue, I thought, was around the policy of rebasing and
> > how often.
> 
> Hmm ... as far as I can see, Jeff and John (i.e. net tree downstreams) are 
> pretty loudly unhappy with Dave rebasing too often, right?

That's true ... but irrelevant to the argument of whether rebasing does
or doesn't work.  I don't think anyone's arguing that rebasing doesn't
cause real problems to downstream users ... when I was based on block
for the scsi post merge tree, I was just such a user ...

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

2008-02-13 Thread James Bottomley

On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> - gdth_flush(ha);
> -

This piece doesn't look right.  gdth_flush() forces the internal cache
to disk backing.  If you remove it, you're taking the chance that the
machine will be powered off without a writeback which can cause data
corruption.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] enclosure: add support for enclosure services

2008-02-13 Thread James Bottomley

On Wed, 2008-02-13 at 09:08 -0500, James Smart wrote:
> The keep-it-in-user-space arguments seem fairly compelling to me.
> Especially as we've pushed whole i/o subsystems out to user space
> (iscsi, stgt, talked about fcoe, a lot of dm control, etc).

And to me too.

> The functionality seems to align with Doug's sg/lsscsi utility chain
> as well.  Granted, the new utility would have to be designed in such
> as way that it can incorporate vendor "hardware handlers".  But, if
> James has a somewhat common implementation already for a kernel
> implementation, I'm sure that can be the starting point for lsscsi.
> 
> So, the main question I believe is being asked is:
> - Do we need to represent this via the object/sysfs tree or can an
>outside utility be depended upon to show it ?
> 
> Note that I am not supporting:
> "Vendors may choose to distribute their own applications".
> For this to become truly useful, there needs to be a common tool/method
> that presents common features in a common manner, regardless of whether
> it is in kernel or not.

I don't disagree with that, but the fact is that there isn't such a
tool.   It's also a fact that the enterprise is reasonably unhappy with
the lack of an enclosure management infrastructure, since it's something
they got on all the other unix systems.

I think a minimal infrastructure in-kernel does just about everything
the enterprise wants ... and since it's stateless, they can always use
direct connect tools in addition.

However, I'm happy to be proven wrong ... anyone on this thread is
welcome to come up with a userland enclosure infrastructure.  Once it
does everything the in-kernel one does (which is really about the
minimal possible set), I'll be glad to erase the in-kernel one.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

2008-02-13 Thread James Bottomley


On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[EMAIL PROTECTED]> wrote:
> > On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[EMAIL PROTECTED]> 
> > wrote:
> >> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>> - gdth_flush(ha);
> >>> -
> >> This piece doesn't look right.  gdth_flush() forces the internal cache
> >> to disk backing.  If you remove it, you're taking the chance that the
> >> machine will be powered off without a writeback which can cause data
> >> corruption.
> >>
> >> James
> >>
> > Yes. 
> > I have more problems reported, with exit, and am just sending one more 
> > patch that puts
> > this back in. Which was tested.
> > 
> > So I will resend this one plus one new one.
> > 
> > Boaz
> > 
> 
> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> to a gdth_halt() function, which would then redo half of what gdth_exit
> does, and wrongly so, and crash.  
> 
> Are we guaranteed in todays kernel that modules .exit function be called
> on an halt or reboot? If so then there is no need for duplications and
> the gdth_halt() should go.

No.  The __exit section is actually discardable if you promise never to
remove the module.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] enclosure: add support for enclosure services

2008-02-13 Thread James Bottomley

On Wed, 2008-02-13 at 11:22 -0500, James Smart wrote:
> James Bottomley wrote:
> > I don't disagree with that, but the fact is that there isn't such a
> > tool.   It's also a fact that the enterprise is reasonably unhappy with
> > the lack of an enclosure management infrastructure, since it's something
> > they got on all the other unix systems.
> 
> I don't disagree.
> 
> > I think a minimal infrastructure in-kernel does just about everything
> > the enterprise wants ... and since it's stateless, they can always use
> > direct connect tools in addition.
> > 
> > However, I'm happy to be proven wrong ... anyone on this thread is
> > welcome to come up with a userland enclosure infrastructure.  Once it
> > does everything the in-kernel one does (which is really about the
> > minimal possible set), I'll be glad to erase the in-kernel one.
> 
> yeah, but...  putting something new in, only to pull it later, is a bad
> paradigm for adding new mgmt interfaces. Believe me, I've felt users pain in
> the reverse flow : driver-specific stuff that then has to migrate to upstream
> interfaces, complicated by different pull points by different distros. You can
> migrate a management interface, but can you really remove/pull one out ?

That depends on the result.  I agree that migration will be a pain, so I
suppose I set the bar a bit low; the user tool needs to be a bit more
compelling; plus I'll manage the interface transition ... if there is
one; there's no such tool yet.

> Isn't it better to let the lack of an interface give motivation to create
> the "right" interface, once the "right way" is determined - which is what I
> thought we were discussing ?or is this simply that there is no motivation
> until something exists, that people don't like, thus they become motivated ?

Well ... I did learn the latter from Andrew, so I thought I'd try it.
It's certainly true that the enclosure problem has been an issue for
over a decade, so there doesn't seem to be anything motivating anyone to
solve it.  I wouldn't have bothered except that I could see ad-hoc
in-kernel sysfs solutions beginning to appear.  At least this way they
can all present a unified interface.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

2008-02-13 Thread James Bottomley

On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <[EMAIL PROTECTED]> wrote:
> > On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> >> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[EMAIL PROTECTED]> wrote:
> >>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[EMAIL PROTECTED]> 
> >>> wrote:
> >>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>>>> -   gdth_flush(ha);
> >>>>> -
> >>>> This piece doesn't look right.  gdth_flush() forces the internal cache
> >>>> to disk backing.  If you remove it, you're taking the chance that the
> >>>> machine will be powered off without a writeback which can cause data
> >>>> corruption.
> >>>>
> >>>> James
> >>>>
> >>> Yes. 
> >>> I have more problems reported, with exit, and am just sending one more 
> >>> patch that puts
> >>> this back in. Which was tested.
> >>>
> >>> So I will resend this one plus one new one.
> >>>
> >>> Boaz
> >>>
> >> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> >> to a gdth_halt() function, which would then redo half of what gdth_exit
> >> does, and wrongly so, and crash.  
> >>
> >> Are we guaranteed in todays kernel that modules .exit function be called
> >> on an halt or reboot? If so then there is no need for duplications and
> >> the gdth_halt() should go.
> > 
> > No.  The __exit section is actually discardable if you promise never to
> > remove the module.
> > 
> I don't understand please explain. 
> What does a driver need to do if it needs a consistent shutdown retine?
> module or built in? unload or shutdown?

It needs to register a reboot notifier, which gdth does.

However, the notifier is only called on reboot, so it also needs to
clean up correctly on module exit as well.

The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
command.  That's done by a shutdown notifier from sd, so the correct
thing would always get done; however it does mean the driver has to be
in a condition to process the last sync cache command.

For the quick fix, just keep the current infrastructure and put back the
gdth_flush() command where it can be effective.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

2008-02-13 Thread James Bottomley

On Wed, 2008-02-13 at 19:18 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <[EMAIL PROTECTED]> wrote:
> > It needs to register a reboot notifier, which gdth does.
> > 
> > However, the notifier is only called on reboot, so it also needs to
> > clean up correctly on module exit as well.
> > 
> > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
> 
> Why would we think that the controller does not support this command
> is it not in the mandatory section of the standard?

Um, because the controller isn't a SCSI device.  It's an emulated device
which means the SCSI comands are processed in the driver.  It does look
like the driver<->HBA communication is some sort of translated dialect
of SCSI.

> > command.  That's done by a shutdown notifier from sd, so the correct
> > thing would always get done; however it does mean the driver has to be
> > in a condition to process the last sync cache command.
> 
> Why would it not be ready? what do other drivers do?
> The drivers is ready until the very last module's .exit. Is that good
> enough?

shutdown is called as part of device removal and module unload ...
usually from scsi_remove_host().  So you can't tear down command
processing before that point.

> > 
> > For the quick fix, just keep the current infrastructure and put back the
> > gdth_flush() command where it can be effective.
> > 
> 
> Just did. But if needed I would prefer to emulate the SCSI SYNCHRONIZE CACHE
> command and not that boot notifier thing. Please advise.

I think such a change, though desirable, would be too large to count as
a bug fix.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash

2008-02-13 Thread James Bottomley

On Wed, 2008-02-13 at 19:12 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <[EMAIL PROTECTED]> wrote:
> > On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
> >> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <[EMAIL PROTECTED]> 
> >> wrote:
> >>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> >>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <[EMAIL PROTECTED]> 
> >>>> wrote:
> >>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <[EMAIL PROTECTED]> 
> >>>>> wrote:
> >>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>>>>>> - gdth_flush(ha);
> >>>>>>> -
> >>>>>> This piece doesn't look right.  gdth_flush() forces the internal cache
> >>>>>> to disk backing.  If you remove it, you're taking the chance that the
> >>>>>> machine will be powered off without a writeback which can cause data
> >>>>>> corruption.
> >>>>>>
> >>>>>> James
> >>>>>>
> >>>>> Yes. 
> >>>>> I have more problems reported, with exit, and am just sending one more 
> >>>>> patch that puts
> >>>>> this back in. Which was tested.
> >>>>>
> >>>>> So I will resend this one plus one new one.
> >>>>>
> >>>>> Boaz
> >>>>>
> >>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> >>>> to a gdth_halt() function, which would then redo half of what gdth_exit
> >>>> does, and wrongly so, and crash.  
> >>>>
> >>>> Are we guaranteed in todays kernel that modules .exit function be called
> >>>> on an halt or reboot? If so then there is no need for duplications and
> >>>> the gdth_halt() should go.
> >>> No.  The __exit section is actually discardable if you promise never to
> >>> remove the module.
> >>>
> >> I don't understand please explain. 
> >> What does a driver need to do if it needs a consistent shutdown retine?
> >> module or built in? unload or shutdown?
> > 
> > It needs to register a reboot notifier, which gdth does.
> > 
> > However, the notifier is only called on reboot, so it also needs to
> > clean up correctly on module exit as well.
> > 
> > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
> > command.  That's done by a shutdown notifier from sd, so the correct
> > thing would always get done; however it does mean the driver has to be
> > in a condition to process the last sync cache command.
> > 
> > For the quick fix, just keep the current infrastructure and put back the
> > gdth_flush() command where it can be effective.
> > 
> > James
> > 
> > 
> Totally untested.
> 
> ---
> From: Boaz Harrosh <[EMAIL PROTECTED]>
> Subject: [PATCH] gdth: bugfix for the at-exit problems
> 
> gdth_exit would first remove all cards then stop the timer
> and would not sync with the timer function. This caused a crash
> in gdth_timer() when module was unloaded.
> So del_timer_sync the timer before we delete the cards.
> 
> also the reboot notifier function would crash. So unify
> the exit and halt functions with a gdth_shutdown() that's
> called by both.
> 
> Signed-off-by: Boaz Harrosh <[EMAIL PROTECTED]>
> ---
>  drivers/scsi/gdth.c |   99 --
>  1 files changed, 40 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
> index 8eb78be..7bb9b45 100644
> --- a/drivers/scsi/gdth.c
> +++ b/drivers/scsi/gdth.c
> @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file 
> *filep,
>unsigned int cmd, unsigned long arg);
>  
>  static void gdth_flush(gdth_ha_str *ha);
> -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
>  static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
>  static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
>   struct gdth_cmndinfo *cmndinfo);
> @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
>  #include "gdth_proc.h"
>  #include "gdth_proc.c"
>  
> -/* notifier block to get a notify on system shutdown/halt/reboot */
> -static struct n

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 2540 matches

Mail list logo