date:20070312

Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Chris Wedgwood

On Tue, Mar 13, 2007 at 01:11:44AM -0400, Andreas Dilger wrote:

> I'd guess a vast majority of IO will have the end similarly
> misaligned as the start.  Very little filesystem IO is 512 bytes,
> possibly excluding XFS in an unusual mode.

XFS (mkfs.xfs) can be told what the native sector size is and will
adjust writes accordingly.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Andreas Dilger

On Mar 12, 2007  10:26 -0400, Jeff Garzik wrote:
> In my own experiments on my own Fedora workstation, ~66% of IOs in Linux 
> start on an odd sector, and ~33% started on even-numbered sectors.  For 
> a 1K-sector drive with 'odd' alignment, the configuration Microsoft will 
> likely want, that means the majority of disk transactions will avoid a 
> RMW cycle, but a still-numerous minority will not.

Isn't that purely an artifact of the DOS partition table alignment, possibly
skewed by the fact that most of your IO is on partition 1 & 3?  Hard to
believe this because of the nice even numbers though.

Since ext3 has at least 1kB blocksize and defaults to 4kB blocksize with
most modern disks because they are > 500MB in size, you should never
have misaligned writes generated by the filesystem itself.

> I did not test 
> transfer length, to see how many transfers /ended/ on an odd sector, 
> thus determining how many RMW cycles the tail of an average I/O requires.

I'd guess a vast majority of IO will have the end similarly misaligned as
the start.  Very little filesystem IO is 512 bytes, possibly excluding XFS
in an unusual mode.
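To make the head/tail distinction concrete, here is a minimal sketch. It assumes 512-byte logical sectors, a physical sector of `per_phys` logical sectors, and physical sectors that begin at LBAs congruent to `align` (mod `per_phys`). The struct and function names are hypothetical. If IO lengths are whole multiples of the physical size, a misaligned head implies a misaligned tail, which is Andreas's guess.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which ends of a request fall in the middle of a physical sector. */
struct rmw_check {
    bool head;  /* first logical sector is not at a physical boundary */
    bool tail;  /* request does not end on a physical boundary */
};

/*
 * Hypothetical helper: `lba`/`nsect` describe the request in 512-byte
 * logical sectors; `align` is 1 for the "odd-aligned" 1K case discussed
 * in this thread and 0 for an even-aligned drive (align < per_phys).
 */
static struct rmw_check needs_rmw(uint64_t lba, uint64_t nsect,
                                  unsigned per_phys, unsigned align)
{
    struct rmw_check r;
    uint64_t end = lba + nsect;   /* one past the last sector */

    /* shift so that physical sectors start at multiples of per_phys */
    r.head = ((lba + per_phys - align) % per_phys) != 0;
    r.tail = ((end + per_phys - align) % per_phys) != 0;
    return r;
}
```

For example, `needs_rmw(63, 8, 8, 0)` (a 4KB write at LBA 63 on an even-aligned 4KB drive) reports RMW at both ends, while the same write at LBA 64 needs none.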

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


[PATCH] tgt: remove the code to build sense

2007-03-12 Thread FUJITA Tomonori

As Doug pointed out, it would be better for tgt to do nothing than to
send bogus sense when tgt hits user-space bugs.

The patch was made against the scsi-misc tree.

---
From: FUJITA Tomonori <[EMAIL PROTECTED]>

tgt notifies an LLD of the failure with sense when it hits
user-space daemon bugs. However, tgt doesn't know anything about the
SCSI devices that initiators talk to, so it's impossible to send a
proper sense buffer (format and contents).

This patch changes tgt not to notify an LLD of the failure with bogus
sense. Instead, tgt just re-queues the failed command on the internal
list so that it will be freed cleanly later, when the scsi_host is
removed.

Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 drivers/scsi/scsi_tgt_lib.c |   27 +--
 1 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index c05dff9..2570f48 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -459,16 +459,6 @@ static struct request *tgt_cmd_hash_look
return rq;
 }
 
-static void scsi_tgt_build_sense(unsigned char *sense_buffer, unsigned char key,
-unsigned char asc, unsigned char asq)
-{
-   sense_buffer[0] = 0x70;
-   sense_buffer[2] = key;
-   sense_buffer[7] = 0xa;
-   sense_buffer[12] = asc;
-   sense_buffer[13] = asq;
-}
-
 int scsi_tgt_kspace_exec(int host_no, int result, u64 tag,
 unsigned long uaddr, u32 len, unsigned long sense_uaddr,
 u32 sense_len, u8 rw)
@@ -528,12 +518,21 @@ int scsi_tgt_kspace_exec(int host_no, in
 * user-space daemon bugs or OOM
 * TODO: we can do better for OOM.
 */
+   struct scsi_tgt_queuedata *qdata;
+   struct list_head *head;
+   unsigned long flags;
+
eprintk("cmd %p ret %d uaddr %lx len %d rw %d\n",
cmd, err, uaddr, len, rw);
-   cmd->result = SAM_STAT_CHECK_CONDITION;
-   memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
-   scsi_tgt_build_sense(cmd->sense_buffer,
-HARDWARE_ERROR, 0, 0);
+
+   qdata = shost->uspace_req_q->queuedata;
+   head = &qdata->cmd_hash[cmd_hashfn(tcmd->tag)];
+
+   spin_lock_irqsave(&qdata->cmd_hash_lock, flags);
+   list_add(&tcmd->hash_list, head);
+   spin_unlock_irqrestore(&qdata->cmd_hash_lock, flags);
+
+   goto done;
}
}
err = scsi_tgt_transfer_response(cmd);
-- 
1.4.3.2
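For reference, the helper removed above filled a fixed-format sense buffer: byte 0 = 0x70 (current error), byte 2 = sense key, byte 7 = additional sense length (0x0a, so 18 bytes total), bytes 12/13 = ASC/ASCQ. With HARDWARE_ERROR and ASC/ASCQ 0/0, an initiator would have seen a hardware error the device never reported. The user-space decoder below is a sketch of that layout; the struct and function names are hypothetical and not part of tgt.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of a fixed-format (response code 0x70/0x71) SCSI sense buffer. */
struct fixed_sense {
    uint8_t response_code; /* 0x70 current error, 0x71 deferred error */
    uint8_t key;           /* sense key, low nibble of byte 2 */
    uint8_t asc;           /* additional sense code, byte 12 */
    uint8_t ascq;          /* additional sense code qualifier, byte 13 */
};

/* Returns 0 on success, -1 if the buffer is too short or not fixed format. */
static int parse_fixed_sense(const uint8_t *buf, size_t len,
                             struct fixed_sense *out)
{
    if (len < 14)
        return -1;
    out->response_code = buf[0] & 0x7f;   /* bit 7 is the VALID bit */
    if (out->response_code != 0x70 && out->response_code != 0x71)
        return -1;
    out->key  = buf[2] & 0x0f;
    out->asc  = buf[12];
    out->ascq = buf[13];
    return 0;
}
```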


Re: WRITE BUFFER commands through SG_IO getting rounded up to sector past 32k

2007-03-12 Thread Aravind Parchuri


[EMAIL PROTECTED] wrote:
> Aravind Parchuri wrote:
>> [EMAIL PROTECTED] wrote:
>>> Aravind Parchuri wrote:
>>>> My log messages were getting all mixed up, so I cleaned up my little
>>>> test to send just one command at a time. It actually looks like the mid
>>>> layer passes the command through to open-iscsi with the right size the
>>>> first time, but then it sends a second command with request_bufflen = 0.
>>>>
>>>> I can verify that the command completed on the target just like the
>>>> regular ones did, so there should be no reason for a retry of any sort.
>>>>
>>>> Here's the log for a 32896 byte command:
>>>> Mar  9 11:27:43 ITX000c292c3c8d kernel: sg_open: dev=0, flags=0x802
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_add_sfp: sfp=0xcbadc000
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_build_reserve: req_size=32768
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_build_indirect: buff_size=32768, blk_size=32768
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_add_sfp:   bufflen=32768, k_use_sg=1
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_ioctl: sg0, cmd=0x2285
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_common_write:  scsi opcode=0x3b, cmd_size=10
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_start_req: dxfer_len=32896
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_build_indirect: buff_size=32896, blk_size=33280
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_write_xfer: num_xfer=32896, iovec_count=0, k_use_sg=2
>>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: iscsi_queuecommand: opcode 3b request_bufflen 32896 transfersize 32896
>>>
>>> Did you add your own output? Could you enable iscsi debugging? What
>>> kernel is this with and what versions of open-iscsi (upstream or svn or
>>> tarball release)?
>>
>> No custom output, all of this is from scsi (SCSI_LOG_TIMEOUT=5) and
>> open-iscsi (DEBUG_SCSI enabled). I thought I mentioned this in an
>> earlier mail -  the kernel is 2.6.19, but the open-iscsi drivers and
>
> The output here does not have DEBUG_SCSI enabled. It just has your
> custom iscsi output. The first log had DEBUG_SCSI iscsi output.



I'm extremely sorry - I think I forgot to enable it while recompiling 
stuff with your patch, and I was fooled by that print that we put in 
iscsi_queuecommand.



> But what I want is not just parts you cut out. I am looking for all of
> it. If I run:
>
> sg_write_buffer /dev/sdi -l 32896


I wasn't aware that such a utility existed - it must be part of 
sg_utils, I guess. I just hacked up a test to fire an SG_IO ioctl to a 
given device. I guess I should try with this one next, just to check.
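Aravind's own test program isn't shown, so the following is only a sketch of what "firing an SG_IO ioctl" with WRITE BUFFER can look like. The buffer mode (0x02, data), buffer ID 0, offset 0 and the 20-second timeout are assumptions, and the function names are hypothetical. The `round_up_512()` helper shows that the `blk_size=33280` reported by `sg_build_indirect` for a 32896-byte request is what rounding up to a multiple of 512 gives, which is consistent with the "rounded up to sector" symptom in the subject.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

/* Round a transfer length up to the next multiple of 512 bytes. */
static uint32_t round_up_512(uint32_t len)
{
    return (len + 511u) & ~511u;
}

/* Build a 10-byte WRITE BUFFER (0x3b) CDB; offset and len are 24-bit. */
static void build_write_buffer_cdb(uint8_t cdb[10], uint8_t mode,
                                   uint8_t buffer_id, uint32_t offset,
                                   uint32_t len)
{
    memset(cdb, 0, 10);
    cdb[0] = 0x3b;
    cdb[1] = mode & 0x1f;
    cdb[2] = buffer_id;
    cdb[3] = (offset >> 16) & 0xff;   /* buffer offset */
    cdb[4] = (offset >> 8) & 0xff;
    cdb[5] = offset & 0xff;
    cdb[6] = (len >> 16) & 0xff;      /* parameter list length */
    cdb[7] = (len >> 8) & 0xff;
    cdb[8] = len & 0xff;
}

/* Send WRITE BUFFER via SG_IO; returns 0, a SCSI status, or -errno. */
static int sg_write_buffer(const char *dev, const void *data, uint32_t len)
{
    uint8_t cdb[10], sense[32];
    struct sg_io_hdr io;
    int fd, ret;

    fd = open(dev, O_RDWR);
    if (fd < 0)
        return -errno;

    build_write_buffer_cdb(cdb, 0x02, 0, 0, len);
    memset(&io, 0, sizeof(io));
    io.interface_id    = 'S';
    io.dxfer_direction = SG_DXFER_TO_DEV;
    io.cmd_len         = sizeof(cdb);
    io.cmdp            = cdb;
    io.dxfer_len       = len;
    io.dxferp          = (void *)data;
    io.mx_sb_len       = sizeof(sense);
    io.sbp             = sense;
    io.timeout         = 20000;       /* milliseconds */

    ret = ioctl(fd, SG_IO, &io) < 0 ? -errno : 0;
    if (ret == 0 && (io.info & SG_INFO_OK_MASK) != SG_INFO_OK)
        ret = io.status ? io.status : -EIO;  /* check sense[] for details */
    close(fd);
    return ret;
}
```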




> with my patch and a netapp target, I get the error:
>
> Write buffer command not supported
>
> If I look at the iscsi log, I see
>
> Mar 12 12:59:54 madmax kernel: iscsi: cmd [itt 0x25 total 32896 imm_data 32896 unsol count 0, unsol offset 32896]
> Mar 12 12:59:54 madmax kernel: iscsi: ctask enq [write cid 0 sc 81000e82d6c0 cdb 0x3b itt 0x25 len 32896 cmdsn 294 win 128]
> Mar 12 12:59:54 madmax kernel: iscsi: ctask deq [cid 0 xmstate 2 itt 0x25]
> Mar 12 12:59:55 madmax kernel: iscsi: cmdrsp [op 0x21 cid 0 itt 0x25 len 24]
> Mar 12 12:59:55 madmax kernel: iscsi: copied 22 bytes of sense
> Mar 12 12:59:55 madmax kernel: iscsi: done [sc 81000e82d6c0 res 2 itt 0x25]
>
> which shows the len got set and that the target failed the command and we
> got some sense back. The command is then failed to sg and sg returns to
> userspace.
>
> Your output just shows that we may be retrying the same command over and
> over and it never completes, which would be strange because the command
> is a block pc command. It would help to see what happens with iscsi's
> DEBUG_SCSI option to see if the command failed or completed ok. It would
> also be nice to get the scsi-ml output to see what the sense is and what
> is going on there, but as a first step we can look at the iscsi output.



Thanks for explaining in detail what's useful in debugging the problem. 
Here are logs with DEBUG_SCSI enabled, SCSI_LOG_TIMEOUT set to level 5, 
SCSI_LOG_MLQUEUE to 4 and SCSI_LOG_MLCOMPLETE to 4. This is everything 
in /var/log/messages from the time I issued the ioctl.


Command with 32896 bytes:

Mar 12 11:02:33 ITX000c292c3c8d kernel: iscsi: mgmtpdu [op 0x40 hdr->itt 0x datalen 0]
Mar 12 11:02:33 ITX000c292c3c8d kernel: iscsi: mtask deq [cid 0 state 4 itt 0xa05]
Mar 12 11:03:03 ITX000c292c3c8d kernel: iscsi: mgmtpdu [op 0x40 hdr->itt 0x datalen 0]
Mar 12 11:03:03 ITX000c292c3c8d kernel: iscsi: mtask deq [cid 0 state 4 itt 0xa06]
Mar 12 11:03:32 ITX000c292c3c8d kernel: iscsi: mgmtpdu [op 0x40 hdr->itt 0x datalen 0]
Mar 12 11:03:32 ITX000c292c3c8d kernel: iscsi: mtask deq [cid 0 state 4 itt 0xa07]
Mar 12 11:04:02 ITX000c292c3c8d kernel: iscsi: mgmtpdu [op 0x40 hdr->itt 0x datalen 0]
Mar 12 11:04:02 ITX000c292c3c8d kernel: iscsi: mtask deq [cid 0 state 4 itt 0xa08]

Mar 12 11:04:21 ITX000c292c3c8d kernel: sg_open: dev=0, flags=0x802
Mar 12 11:04:21 ITX000c292c3c8d kernel: sg_add_sfp: sfp=0xcb554000
Mar 12 11:04:21 ITX000c292c3c8d kernel: sg_build_reserve: req_size=32768
Mar 12 11:04:21 ITX000c292c3c8d k

[PATCH] mptsas: Fix oops during driver load time (rev 2)

2007-03-12 Thread Judith Lebzelter

This fixes an oops during driver load time.   

mptsas_probe calls mpt_attach() (in mptbase.c). Inside that
call, we read some manufacturing config pages to set up some
defaults. While reading the config pages, the firmware doesn't
complete the reply in time, and we hit a timeout. The timeout
results in the hard-reset handler being called. The hard-reset
handler calls all of the fusion upper-layer driver reset callback
handlers; mptsas_ioc_reset is the callback handler in mptsas.c.
In summary, mptsas_ioc_reset is getting called before
scsi_host_alloc is called, so the pointer ioc->sh is still NULL
and there is no hostdata yet.

Signed-off-by: Judith Lebzelter <[EMAIL PROTECTED]>

---
Sorry I was not more descriptive.  Here is the patch with Eric's 
description as requested.

Index: linux-2.6.21-rc3/drivers/message/fusion/mptsas.c
===
--- linux-2.6.21-rc3.orig/drivers/message/fusion/mptsas.c
+++ linux-2.6.21-rc3/drivers/message/fusion/mptsas.c
@@ -815,7 +815,7 @@ mptsas_taskmgmt_complete(MPT_ADAPTER *io
 static int
 mptsas_ioc_reset(MPT_ADAPTER *ioc, int reset_phase)
 {
-   MPT_SCSI_HOST   *hd = (MPT_SCSI_HOST *)ioc->sh->hostdata;
+   MPT_SCSI_HOST   *hd;
struct mptsas_target_reset_event *target_reset_list, *n;
int rc;
 
@@ -827,7 +827,10 @@ mptsas_ioc_reset(MPT_ADAPTER *ioc, int r
if (reset_phase != MPT_IOC_POST_RESET)
goto out;
 
-   if (!hd || !hd->ioc)
+   if (!ioc->sh || !ioc->sh->hostdata)
+   goto out;
+   hd = (MPT_SCSI_HOST *)ioc->sh->hostdata;
+   if (!hd->ioc)
goto out;
 
if (list_empty(&hd->target_reset_list))
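The essence of the fix is ordering: the old code dereferenced `ioc->sh->hostdata` in the declaration's initializer, before any check could run, so the later `!hd` test could not help. Below is a user-space model of the corrected order; the stub structs and the function name are hypothetical simplifications, not the fusion driver's types.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the fusion structures. */
struct scsi_host_stub { void *hostdata; };
struct host_data_stub { void *ioc; };
struct adapter_stub   { struct scsi_host_stub *sh; };

/*
 * Reset callback that may run before scsi_host_alloc(): check ioc->sh and
 * its hostdata before dereferencing.  Returns 1 if the post-reset work
 * would run, 0 if it was skipped.
 */
static int ioc_reset_stub(struct adapter_stub *ioc)
{
    struct host_data_stub *hd;

    if (!ioc->sh || !ioc->sh->hostdata)
        return 0;          /* host not allocated yet: nothing to do */
    hd = ioc->sh->hostdata;
    if (!hd->ioc)
        return 0;
    return 1;              /* safe to walk hd's target reset list */
}
```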



Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Bryan Henderson

>> I don't get this.  If you mean partitions defined by the classic DOS 
>> partition table format, then AFAICS, such a partition can start in any 
>> sector.
>
>Only at "logical cylinder boundary" (except for the first partition).

That's a requirement in ancient DOS systems that use CHS addressing 
(physical CHS, no less), isn't it (so you can properly convert a 
within-partition address to a within-disk address)?

While I would guess most people still partition disks that way (even 
util-linux fdisk seems to do it by default), they don't have to.

Doesn't matter for this discussion, though.  As Doug demonstrated, even 
when you do start at cylinder boundaries, half your partitions start on an 
even sector, because typical cylinders have an odd number of sectors.
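The parity argument is simple arithmetic: with the common 255-head, 63-sector translated geometry, a cylinder is 255 × 63 = 16065 sectors, which is odd, so cylinder boundaries alternate between even and odd LBAs. The sketch below (function name hypothetical) reproduces the starts in Doug's fdisk listing in this thread: hda2 begins at cylinder 1140 (LBA 18314100, even) and hda4 at cylinder 1217 (LBA 19551105, odd).

```c
#include <stdint.h>

/* LBA of the first sector of cylinder `cyl` for a heads x sectors geometry. */
static uint64_t cylinder_start(uint64_t cyl, unsigned heads, unsigned sectors)
{
    return cyl * (uint64_t)heads * sectors;
}
```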


Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Jeff Garzik


Douglas Gilbert wrote:

Bryan Henderson wrote:

What is an odd-aligned disk?


s/disk/partition/ ?



Example:  An odd-aligned disk in the 512-b logical / 1K-physical 
scenario is where odd LBAs indicate the start of a 1K physical sector. 
An even-aligned disk is where even LBAs indicate the start of a 1K 
physical sector.


In order to avoid too many RMW cycles, partitioning software SHOULD (using 
IETF language) be aware of the underlying physical sector size and 
alignment, in order to align partitions for optimal performance.
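A partitioning tool that knows the alignment could round a proposed start LBA up to the next physical-sector boundary. This is a minimal sketch using Jeff's definitions: a physical sector holds `per_phys` logical sectors, and physical sectors start at LBAs congruent to `align` (so a 1K "odd-aligned" disk is `per_phys = 2, align = 1`). The function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Smallest LBA >= start at which a physical sector begins (align < per_phys). */
static uint64_t next_aligned_lba(uint64_t start, unsigned per_phys,
                                 unsigned align)
{
    uint64_t rem = (start + per_phys - align) % per_phys;

    return rem ? start + (per_phys - rem) : start;
}
```

With these definitions the traditional first-partition start, LBA 63, is already aligned on an odd-aligned 1K disk but moves to 64 on an even-aligned 1K or 4K disk.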


Jeff



Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Douglas Gilbert

Bryan Henderson wrote:
>> DOS partitions start partitions on odd-numbered sectors
> 
> I don't get this.  If you mean partitions defined by the classic DOS 
> partition table format, then AFAICS, such a partition can start in any 
> sector.

Bryan,
Typically the first partition on a DOS-partitioned disk
starts at the first sector after the track holding the MBR,
which, for some bizarre reason, is 63 sectors long.
Hence:

# fdisk -lu /dev/hda

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders, total 156301488 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *          63    18314099     9157018+   c  W95 FAT32 (LBA)
/dev/hda2        18314100    19551104      618502+  82  Linux swap / Solaris
/dev/hda4        19551105   156296384    68372640   83  Linux


> 
>> so presuming you have odd-aligned disks, life is good.
> 
> What is an odd-aligned disk?

s/disk/partition/ ?
Perhaps hda1 and hda4 above are examples.

Doug Gilbert


[PATCH] fc_transport: update potential link speeds

2007-03-12 Thread James Smart

This patch updates the FC transport for all speeds identified in SM-HBA.
Note: it does not sync the "bit" definitions, as that is actually insulated
from user-space via the sysfs text string. (I could do it, but it does introduce
a potential binary-incompatibility).

-- james s

Signed-off-by: James Smart <[EMAIL PROTECTED]>


diff -upNr a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
--- a/drivers/scsi/scsi_transport_fc.c  2007-03-12 13:56:16.0 -0500
+++ b/drivers/scsi/scsi_transport_fc.c  2007-03-12 14:11:20.0 -0500
@@ -200,6 +200,8 @@ static const struct {
{ FC_PORTSPEED_2GBIT,   "2 Gbit" },
{ FC_PORTSPEED_4GBIT,   "4 Gbit" },
{ FC_PORTSPEED_10GBIT,  "10 Gbit" },
+   { FC_PORTSPEED_8GBIT,   "8 Gbit" },
+   { FC_PORTSPEED_16GBIT,  "16 Gbit" },
{ FC_PORTSPEED_NOT_NEGOTIATED,  "Not Negotiated" },
 };
 fc_bitfield_name_search(port_speed, fc_port_speed_names)
diff -upNr a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
--- a/include/scsi/scsi_transport_fc.h  2007-03-12 13:56:23.0 -0500
+++ b/include/scsi/scsi_transport_fc.h  2007-03-12 14:10:28.0 -0500
@@ -108,6 +108,8 @@ enum fc_port_state {
 #define FC_PORTSPEED_2GBIT 2
 #define FC_PORTSPEED_4GBIT 4
 #define FC_PORTSPEED_10GBIT 8
+#define FC_PORTSPEED_8GBIT 0x10
+#define FC_PORTSPEED_16GBIT 0x20
 #define FC_PORTSPEED_NOT_NEGOTIATED (1 << 15) /* Speed not established */
 
 /*
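James's note that user space is insulated from the bit values can be seen in how a speed mask becomes text: sysfs readers only ever see the names. The sketch below uses the bit values from the patch (1 Gbit omitted because its value isn't shown here) and mimics the idea behind `fc_bitfield_name_search`; it is not that macro's code, and the ", " separator is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define FC_PORTSPEED_2GBIT   2
#define FC_PORTSPEED_4GBIT   4
#define FC_PORTSPEED_10GBIT  8
#define FC_PORTSPEED_8GBIT   0x10
#define FC_PORTSPEED_16GBIT  0x20

static const struct {
    unsigned value;
    const char *name;
} speed_names[] = {
    { FC_PORTSPEED_2GBIT,  "2 Gbit" },
    { FC_PORTSPEED_4GBIT,  "4 Gbit" },
    { FC_PORTSPEED_10GBIT, "10 Gbit" },
    { FC_PORTSPEED_8GBIT,  "8 Gbit" },
    { FC_PORTSPEED_16GBIT, "16 Gbit" },
};

/* Write names of all set bits, comma-separated (len must be >= 1). */
static int speeds_to_string(unsigned mask, char *buf, size_t len)
{
    int n = 0;
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(speed_names) / sizeof(speed_names[0]); i++) {
        if (!(mask & speed_names[i].value))
            continue;
        n += snprintf(buf + n, len - n, "%s%s", n ? ", " : "",
                      speed_names[i].name);
        if ((size_t)n >= len)
            break;            /* output truncated */
    }
    return n;
}
```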



Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Sergei Shtylyov


Hello.

Bryan Henderson wrote:


DOS partitions start partitions on odd-numbered sectors


I don't get this.  If you mean partitions defined by the classic DOS 
partition table format, then AFAICS, such a partition can start in any 
sector.


   Only at "logical cylinder boundary" (except for the first partition).


Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Bryan Henderson

>DOS partitions start partitions on odd-numbered sectors

I don't get this.  If you mean partitions defined by the classic DOS 
partition table format, then AFAICS, such a partition can start in any 
sector.

>so presuming you have odd-aligned disks, life is good.

What is an odd-aligned disk?


Re: WRITE BUFFER commands through SG_IO getting rounded up to sector past 32k

2007-03-12 Thread Mike Christie

Aravind Parchuri wrote:
> [EMAIL PROTECTED] wrote:
>> Aravind Parchuri wrote:
>>   
>>> My log messages were getting all mixed up, so I cleaned up my little
>>> test to send just one command at a time. It actually looks like the mid
>>> layer passes the command through to open-iscsi with the right size the
>>> first time, but then it sends a second command with request_bufflen = 0.
>>>
>>> I can verify that the command completed on the target just like the
>>> regular ones did, so there should be no reason for a retry of any sort.
>>>
>>> Here's the log for a 32896 byte command:
>>> Mar  9 11:27:43 ITX000c292c3c8d kernel: sg_open: dev=0, flags=0x802
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_add_sfp: sfp=0xcbadc000
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_build_reserve: req_size=32768
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_build_indirect:
>>> buff_size=32768, blk_size=32768
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_add_sfp:   bufflen=32768,
>>> k_use_sg=1
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_ioctl: sg0, cmd=0x2285
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_common_write:  scsi
>>> opcode=0x3b, cmd_size=10
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_start_req: dxfer_len=32896
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_build_indirect:
>>> buff_size=32896, blk_size=33280
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: sg_write_xfer: num_xfer=32896,
>>> iovec_count=0, k_use_sg=2
>>> Mar  8 11:27:43 ITX000c292c3c8d kernel: iscsi_queuecommand: opcode 3b
>>> request_bufflen 32896 transfersize 32896
>>> 
>> Did you add your own output? Could you enable iscsi debugging? What
>> kernel is this with and what versions of open-iscsi (upstream or svn or
>> tarball release)?
>>   
> No custom output, all of this is from scsi (SCSI_LOG_TIMEOUT=5) and
> open-iscsi (DEBUG_SCSI enabled). I thought I mentioned this in an
> earlier mail -  the kernel is 2.6.19, but the open-iscsi drivers and

The output here does not have DEBUG_SCSI enabled. It just has your
custom iscsi output. The first log had DEBUG_SCSI iscsi output.

But what I want is not just parts you cut out. I am looking for all of
it. If I run:

sg_write_buffer /dev/sdi -l 32896

with my patch and a netapp target, I get the error:

Write buffer command not supported

If I look at the iscsi log, I see

Mar 12 12:59:54 madmax kernel: iscsi: cmd [itt 0x25 total 32896 imm_data
32896 unsol count 0, unsol offset 32896]
Mar 12 12:59:54 madmax kernel: iscsi: ctask enq [write cid 0 sc
81000e82d6c0 cdb 0x3b itt 0x25 len 32896 cmdsn 294 win 128]
Mar 12 12:59:54 madmax kernel: iscsi: ctask deq [cid 0 xmstate 2 itt 0x25]
Mar 12 12:59:55 madmax kernel: iscsi: cmdrsp [op 0x21 cid 0 itt 0x25 len 24]
Mar 12 12:59:55 madmax kernel: iscsi: copied 22 bytes of sense
Mar 12 12:59:55 madmax kernel: iscsi: done [sc 81000e82d6c0 res 2
itt 0x25]


which shows the len got set and that the target failed the command and we
got some sense back. The command is then failed to sg and sg returns to
userspace.

Your output just shows that we may be retrying the same command over and
over and it never completes, which would be strange because the command
is a block pc command. It would help to see what happens with iscsi's
DEBUG_SCSI option to see if the command failed or completed ok. It would
also be nice to get the scsi-ml output to see what the sense is and what
is going on there, but as a first step we can look at the iscsi output.

Also what target are you using?

[PATCH 5/6] qla2xxx: Allow the extended-error-logging flag to be dynamic.

2007-03-12 Thread Andrew Vasquez

The module parameter, ql2xextended_error_logging, can now be
set dynamically by writing to the following sysfs entry:

/sys/module/qla2xxx/parameters/ql2xextended_error_logging

This alleviates the need for the driver to be unloaded and
reloaded in order to enable logging.

Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
---
 drivers/scsi/qla2xxx/qla_os.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index f67ef38..b6c96a8 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -62,7 +62,7 @@ MODULE_PARM_DESC(ql2xallocfwdump,
"vary by ISP type.  Default is 1 - allocate memory.");
 
 int ql2xextended_error_logging;
-module_param(ql2xextended_error_logging, int, S_IRUGO|S_IRUSR);
+module_param(ql2xextended_error_logging, int, S_IRUGO|S_IWUSR);
 MODULE_PARM_DESC(ql2xextended_error_logging,
"Option to enable extended error logging, "
"Default is 0 - no logging. 1 - log errors.");
-- 
1.5.0.3.382.g34572
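Note that `S_IRUGO` already includes `S_IRUSR`, so the old mode was effectively read-only; the patch adds owner write permission. Once the parameter is writable, root can flip it at run time by writing to the sysfs file named above. The helper below is a hypothetical sketch of doing that from C; it takes the path as an argument so it can be pointed at `/sys/module/qla2xxx/parameters/ql2xextended_error_logging`.

```c
#include <stdio.h>

/* Write an integer to a module parameter file; 0 on success, -1 on error. */
static int set_module_param(const char *path, int value)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;            /* e.g. not root, or parameter not writable */
    if (fprintf(f, "%d\n", value) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;   /* sysfs reports errors at flush */
}
```

For example, `set_module_param("/sys/module/qla2xxx/parameters/ql2xextended_error_logging", 1)` would enable extended error logging without reloading the driver.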


[PATCH 6/6] qla2xxx: Update version number to 8.01.07-k6.

2007-03-12 Thread Andrew Vasquez

Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
---
 drivers/scsi/qla2xxx/qla_version.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_version.h b/drivers/scsi/qla2xxx/qla_version.h
index 61347ae..dc85495 100644
--- a/drivers/scsi/qla2xxx/qla_version.h
+++ b/drivers/scsi/qla2xxx/qla_version.h
@@ -7,7 +7,7 @@
 /*
  * Driver version
  */
-#define QLA2XXX_VERSION  "8.01.07-k5"
+#define QLA2XXX_VERSION  "8.01.07-k6"
 
 #define QLA_DRIVER_MAJOR_VER   8
 #define QLA_DRIVER_MINOR_VER   1
-- 
1.5.0.3.382.g34572


[PATCH 1/6] qla2xxx: fix RSCN handling on big-endian systems

2007-03-12 Thread Andrew Vasquez

From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>

The qla2xxx driver fails to handle RSCN events affecting an area or
domain, due to an endianness issue on big-endian systems. This fixes
the port_id_t structure on big-endian systems.

Signed-off-by: Malahal Naineni <[EMAIL PROTECTED]>
Acked-by: Seokmann Ju <[EMAIL PROTECTED]>
Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
---
 drivers/scsi/qla2xxx/qla_def.h |   13 -
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_def.h b/drivers/scsi/qla2xxx/qla_def.h
index 05f4f2a..e8948b6 100644
--- a/drivers/scsi/qla2xxx/qla_def.h
+++ b/drivers/scsi/qla2xxx/qla_def.h
@@ -1478,14 +1478,17 @@ typedef union {
uint32_t b24 : 24;
 
struct {
-   uint8_t d_id[3];
-   uint8_t rsvd_1;
-   } r;
-
-   struct {
+#ifdef __BIG_ENDIAN
+   uint8_t domain;
+   uint8_t area;
+   uint8_t al_pa;
+#elif __LITTLE_ENDIAN
uint8_t al_pa;
uint8_t area;
uint8_t domain;
+#else
+#error "__BIG_ENDIAN or __LITTLE_ENDIAN must be defined!"
+#endif
uint8_t rsvd_1;
} b;
 } port_id_t;
-- 
1.5.0.3.382.g34572
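The union overlays a 24-bit bitfield `b24` with three single bytes, so the byte order inside the struct has to follow the host's; that is why the big-endian branch lists `domain` first. An FC port ID places the domain in the most significant byte, then area, then AL_PA. For comparison, a shift-and-mask decode (sketched below with hypothetical names) does not depend on host byte order or on bitfield layout, which C leaves implementation-defined.

```c
#include <stdint.h>

/* FC port ID fields; domain is the most significant byte of the 24 bits. */
struct fc_port_id {
    uint8_t domain;
    uint8_t area;
    uint8_t al_pa;
};

/* Shift-based decoding gives the same answer on any host byte order. */
static struct fc_port_id port_id_from_b24(uint32_t b24)
{
    struct fc_port_id id;

    id.domain = (b24 >> 16) & 0xff;
    id.area   = (b24 >> 8) & 0xff;
    id.al_pa  = b24 & 0xff;
    return id;
}
```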


[PATCH 2/6] qla2xxx: Add scan_[start|finish]() callbacks for ISP24xx HBAs.

2007-03-12 Thread Andrew Vasquez

Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
---
 drivers/scsi/qla2xxx/qla_os.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 68f5d24..f67ef38 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -157,6 +157,8 @@ static struct scsi_host_template qla24xx_driver_template = {
 
.slave_alloc= qla2xxx_slave_alloc,
.slave_destroy  = qla2xxx_slave_destroy,
+   .scan_finished  = qla2xxx_scan_finished,
+   .scan_start = qla2xxx_scan_start,
.change_queue_depth = qla2x00_change_queue_depth,
.change_queue_type  = qla2x00_change_queue_type,
.this_id= -1,
-- 
1.5.0.3.382.g34572


[PATCH 3/6] qla2xxx: Add cond_resched() calls during HBA flash manipulation.

2007-03-12 Thread Andrew Vasquez

We're observing soft lockups during HBA flash retrieval and
update.  Add a cond_resched() call on each pass through the
tight loops used for flash reads and writes.

Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
---
 drivers/scsi/qla2xxx/qla_sup.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_sup.c b/drivers/scsi/qla2xxx/qla_sup.c
index ff1dd41..362d041 100644
--- a/drivers/scsi/qla2xxx/qla_sup.c
+++ b/drivers/scsi/qla2xxx/qla_sup.c
@@ -466,6 +466,7 @@ qla24xx_read_flash_dword(scsi_qla_host_t *ha, uint32_t addr)
udelay(10);
else
rval = QLA_FUNCTION_TIMEOUT;
+   cond_resched();
}
 
/* TODO: What happens if we time out? */
@@ -508,6 +509,7 @@ qla24xx_write_flash_dword(scsi_qla_host_t *ha, uint32_t addr, uint32_t data)
udelay(10);
else
rval = QLA_FUNCTION_TIMEOUT;
+   cond_resched();
}
return rval;
 }
@@ -1255,6 +1257,7 @@ qla2x00_poll_flash(scsi_qla_host_t *ha, uint32_t addr, uint8_t poll_data,
}
udelay(10);
barrier();
+   cond_resched();
}
return status;
 }
@@ -1403,6 +1406,7 @@ qla2x00_read_flash_data(scsi_qla_host_t *ha, uint8_t *tmp_buf, uint32_t saddr,
if (saddr % 100)
udelay(10);
*tmp_buf = data;
+   cond_resched();
}
 }
 
@@ -1689,6 +1693,7 @@ update_flash:
rval = QLA_FUNCTION_FAILED;
break;
}
+   cond_resched();
}
} while (0);
qla2x00_flash_disable(ha);
-- 
1.5.0.3.382.g34572
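The pattern the patch applies is a bounded polling loop that gives up the CPU on every iteration. As a user-space analogue only (not driver code), `sched_yield()` stands in below for the kernel's `cond_resched()`. The two differ: `cond_resched()` reschedules only when a reschedule is pending, whereas `sched_yield()` always yields. The function and callback names are hypothetical.

```c
#include <sched.h>
#include <stdbool.h>

/*
 * Poll `ready(arg)` up to `max_iter` times, yielding the CPU between
 * polls.  Returns the iteration at which it became ready, or -1.
 */
static int poll_with_yield(bool (*ready)(void *), void *arg, int max_iter)
{
    int i;

    for (i = 0; i < max_iter; i++) {
        if (ready(arg))
            return i;
        sched_yield();        /* roughly what cond_resched() does in-kernel */
    }
    return -1;                /* timed out */
}

/* Example poll callback: becomes ready after *(int *)arg calls. */
static bool countdown_ready(void *arg)
{
    int *left = arg;

    return --*left <= 0;
}
```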


[PATCH 4/6] qla2xxx: Drop acquisition of hardware_lock during flash manipulations.

2007-03-12 Thread Andrew Vasquez

There's no need, given that I/O has been quiesced, RISC
interrupts have been disabled, and finally the RISC has been
paused.  Flash manipulation on ISP21xx, ISP22xx, and ISP23xx
parts requires the RISC to go through a full reset to
recover.

Signed-off-by: Andrew Vasquez <[EMAIL PROTECTED]>
---
 drivers/scsi/qla2xxx/qla_sup.c |6 --
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_sup.c b/drivers/scsi/qla2xxx/qla_sup.c
index 362d041..206bda0 100644
--- a/drivers/scsi/qla2xxx/qla_sup.c
+++ b/drivers/scsi/qla2xxx/qla_sup.c
@@ -1453,7 +1453,6 @@ uint8_t *
 qla2x00_read_optrom_data(struct scsi_qla_host *ha, uint8_t *buf,
 uint32_t offset, uint32_t length)
 {
-   unsigned long flags;
uint32_t addr, midpoint;
uint8_t *data;
struct device_reg_2xxx __iomem *reg = &ha->iobase->isp;
@@ -1462,7 +1461,6 @@ qla2x00_read_optrom_data(struct scsi_qla_host *ha, uint8_t *buf,
qla2x00_suspend_hba(ha);
 
/* Go with read. */
-   spin_lock_irqsave(&ha->hardware_lock, flags);
midpoint = ha->optrom_size / 2;
 
qla2x00_flash_enable(ha);
@@ -1477,7 +1475,6 @@ qla2x00_read_optrom_data(struct scsi_qla_host *ha, uint8_t *buf,
*data = qla2x00_read_flash_byte(ha, addr);
}
qla2x00_flash_disable(ha);
-   spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
/* Resume HBA. */
qla2x00_resume_hba(ha);
@@ -1491,7 +1488,6 @@ qla2x00_write_optrom_data(struct scsi_qla_host *ha, uint8_t *buf,
 {
 
int rval;
-   unsigned long flags;
uint8_t man_id, flash_id, sec_number, data;
uint16_t wd;
uint32_t addr, liter, sec_mask, rest_addr;
@@ -1504,7 +1500,6 @@ qla2x00_write_optrom_data(struct scsi_qla_host *ha, uint8_t *buf,
sec_number = 0;
 
/* Reset ISP chip. */
-   spin_lock_irqsave(&ha->hardware_lock, flags);
WRT_REG_WORD(&reg->ctrl_status, CSR_ISP_SOFT_RESET);
pci_read_config_word(ha->pdev, PCI_COMMAND, &wd);
 
@@ -1697,7 +1692,6 @@ update_flash:
}
} while (0);
qla2x00_flash_disable(ha);
-   spin_unlock_irqrestore(&ha->hardware_lock, flags);
 
/* Resume HBA. */
qla2x00_resume_hba(ha);
-- 
1.5.0.3.382.g34572


[PATCH 0/6] qla2xxx: fixes for 2.6.21 [8.01.07-k6].

2007-03-12 Thread Andrew Vasquez

This patchset updates the qla2xxx driver to 8.01.07-k6.

 drivers/scsi/qla2xxx/qla_def.h |   13 -
 drivers/scsi/qla2xxx/qla_os.c  |4 +++-
 drivers/scsi/qla2xxx/qla_sup.c |   11 +--
 drivers/scsi/qla2xxx/qla_version.h |2 +-
 4 files changed, 17 insertions(+), 13 deletions(-)

Here are the commits:

- fix RSCN handling on big-endian systems
- Add scan_[start|finish]() callbacks for ISP24xx HBAs.
- Add cond_resched() calls during HBA flash manipulation.
- Drop acquisition of hardware_lock during flash manipulations.
- Allow the extended-error-logging flag to be dynamic.
- Update version number to 8.01.07-k6.

Regards,
Andrew Vasquez
QLogic Corporation

Re: [PATCH] Make SG_SET_FORCE_LOW_DMA behave as advertised

2007-03-12 Thread Mike Christie

Olaf Kirch wrote:
> Make SG_SET_FORCE_LOW_DMA behave as advertised
> 
> I came across this by accident. I have serious doubts whether ISA DMA
> is really relevant these days :-) but what the heck. Feel free to disregard
> if this code is headed for the recycler anyway.
> 
> The SCSI-HOWTO says this about SG_SET_FORCE_LOW_DMA:
> 
>   If the "reserved" buffer allocated on open() is not in use then
>   it will be de-allocated and re-allocated under the 16MB level
>   (and the latter operation could fail yielding ENOMEM).
> 
> I came across this by accident - the current code will never reallocate
> the buffer during the ioctl, because it first sets sfp->low_dma to 1,
> and then checks the very same variable for equality with 0.
> 
> The patch below changes the order of commands, and also moves
> the buffer reallocation around so that it also happens when
> the device has unchecked_isa_dma set.

It might be needed; the removal was my fault. If you are just worried
about using the right memory when the host has unchecked_isa_dma set, the
buffer is now bounced for sg by the block layer helpers, so sg does not
need to worry about that.

The problem with this is that the sg code would allocate large chunks of
memory, so it would try to make large segments; if you were trying to
do a really large IO, the isa_page_pool may not work like the old sg
code did.

Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Martin K. Petersen

> "Doug" == Douglas Gilbert <[EMAIL PROTECTED]> writes:

Doug> SAT is now a standard and an agenda item for SAT-2 is to wire
Doug> ATA8-ACS's large sector size support to the additions to SBC-3
Doug> mentioned above.

Doug> I'm not sure how this stuff plays with end to end data
Doug> protection :-) 

The proposal you forwarded talks about "transformed protection
information" but doesn't go into details.  

Assuming the drive has 4KB physical blocks and receives 512 byte
logical blocks, it's easy to verify the integrity of the 512 byte
sector and then do R-M-W on the physical.  Similarly, on the way out
logical guard and ref tags could be generated after integrity of the
physical has been verified.

The only thing that really bites is that the app tag will be per
physical block and not per logical (unless the drive leaves enough
space to store 8 tags per 4KB sector).
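
A minimal sketch of the outbound path described above, assuming T10 DIF-style 8-byte tuples (a 16-bit CRC guard, a 16-bit app tag and a 32-bit ref tag holding the low bits of the LBA) on a 4KB-physical / 512-byte-logical drive. The names `pi_tuple`, `crc_t10dif()` and `pi_from_physical()` are illustrative, not taken from any drive firmware or kernel API. It shows why the app tag ends up per physical block: there is only one stored value to hand to all eight logical tuples.

```c
#include <stddef.h>
#include <stdint.h>

#define LOGICAL_BYTES 512u
#define PHYS_BYTES    4096u
#define LOG_PER_PHYS  (PHYS_BYTES / LOGICAL_BYTES)   /* 8 tuples per 4KB */

/* 8-byte DIF-style tuple: guard (CRC), application tag, reference tag. */
struct pi_tuple {
    uint16_t guard;
    uint16_t app_tag;
    uint32_t ref_tag;
};

/* CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, MSB-first,
 * no reflection and no final XOR. */
static uint16_t crc_t10dif(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical "way out" path: once the 4KB physical sector has passed
 * its own integrity check, generate one tuple per 512-byte logical block.
 * Only one app tag is stored per physical block, so all eight logical
 * tuples inherit the same value. */
static void pi_from_physical(const uint8_t phys[PHYS_BYTES],
                             uint64_t first_lba, uint16_t phys_app_tag,
                             struct pi_tuple out[LOG_PER_PHYS])
{
    for (unsigned i = 0; i < LOG_PER_PHYS; i++) {
        out[i].guard   = crc_t10dif(phys + (size_t)i * LOGICAL_BYTES,
                                    LOGICAL_BYTES);
        out[i].app_tag = phys_app_tag;
        /* Type 1 style: reference tag is the low 32 bits of the LBA. */
        out[i].ref_tag = (uint32_t)(first_lba + i);
    }
}
```

Calling `pi_from_physical(buf, 1000, tag, t)` yields ref tags 1000 through 1007, each with the same app tag; the guards are ordinary per-512-byte CRCs and can be regenerated exactly.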

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: How to send inquiry command through sd path (i.e. /dev/sda) by using SG_IO ioctl

2007-03-12 Thread Douglas Gilbert

dudekula mastan wrote:
> Hi Gilbert,
>
>   Thanks for quick reply.
>
>   The example program (sg_simple, and in fact all of the examples) takes
> a /dev/sg path as input, but I want to use a /dev/sd path.

In the lk 2.6 series, it will also work for "sd" devices
(and "hd" devices if they happen to be cd/dvd drives).

>   Please show me an example that takes a /dev/sd path as input.

You have one already. Actually you have lots of examples there.

> Does SG_IO support the sd driver?

yes, in the lk 2.6 series.

> I am not sure; I think it only works with the sg driver. Am I correct?

That was correct several years ago, not now.

Doug Gilbert

Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Douglas Gilbert

Ric Wheeler wrote:
> Alan Cox wrote:
>>> First generation of 1K sector drives will continue to use the same
>>> 512-byte ATA sector size you are familiar with.  A single 512-byte
>>> write will cause the drive to perform a read-modify-write cycle. 
>>> This configuration is physical 1K sector, logical 512b sector.
>>
>> The problem case is "read-modify-screwup"
>>
>> At that point we've trashed the block we were writing (a well studied
>> recovery case), and we've blasted some previously sane, totally
>> unrelated sector of data out of existence. That's why we need to know
>> ideally if they are doing the write to a different physical block when
>> they do this, so that we don't lose the old data. My guess is they won't
>> as it'll be hard.
> 
> I think that the firmware would have to do this in the drive's write
> cache and would always write the modified data back to the same physical
> sector (unless a media error forces a sector remap).
> 
> If the firmware modifies the seven 512-byte sectors that it read in order
> to do the single 512-byte sector write, then we certainly would see what
> you describe happen.
> 
> In general, it would seem to be a bad idea to allocate a different
> physical sector to underpin this kind of read-modify-write, since that
> would kill contiguous layout of files, etc.
> 
>>> A future configuration will change the logical ATA interface away
>>> from 512-byte sectors to 1K or 4K.  Here, it is impossible to read a
>>> quantity smaller than 1K or 4K, whatever the sector size is.
>>
>> That one I'm not worried about - other than "guess how Redmond decide to
>> make partition tables work" that one is mostly easy (be fun to see how
>> many controllers simply can't cope with the command formats)
>>
> 
> This will be interesting to find out. I will be sharing a panel with
> some BIOS & MS people, so I will update everyone on what I hear.

Ric,
Just to add a SCSI perspective, it looks like 4 KB sectored
disks will be almost exclusively ATA devices. It is being
done to improve capacity at the expense of performance.
[SCSI/FC/SAS disks typically trade off capacity for better
performance.]

Support for disks with smaller logical block size than
physical block size has already been added to SBC-3. The
overview of this document gives a rationale:
www.t10.org/ftp/t10/document.06/06-034r5.pdf
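
For reference, this is roughly how an initiator could see the logical/physical split on such a disk. The sketch assumes the READ CAPACITY (16) parameter data layout used by later SBC-3 revisions (block length in bytes 8-11, a logical-blocks-per-physical-block exponent in the low nibble of byte 13, and the lowest aligned LBA in the low 14 bits of bytes 14-15); the 06-034r5 proposal may differ in detail, and `rc16_info`, `parse_rc16()` and `rc16_physical_bytes()` are hypothetical names.

```c
#include <stdint.h>

/* Fields of interest from READ CAPACITY (16) parameter data
 * (assumed SBC-3 layout; names are illustrative). */
struct rc16_info {
    uint64_t last_lba;            /* bytes 0-7, big-endian */
    uint32_t logical_bytes;       /* bytes 8-11, big-endian */
    unsigned lbppb_exponent;      /* byte 13, bits 3:0 */
    uint16_t lowest_aligned_lba;  /* byte 14 bits 5:0, then byte 15 */
};

static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void parse_rc16(const uint8_t d[32], struct rc16_info *c)
{
    c->last_lba = get_be64(d);
    c->logical_bytes = get_be32(d + 8);
    c->lbppb_exponent = d[13] & 0x0f;
    /* top two bits of byte 14 are flags, not part of the LBA */
    c->lowest_aligned_lba = (uint16_t)(((d[14] & 0x3f) << 8) | d[15]);
}

/* Physical block size = logical block size * 2^exponent. */
static uint32_t rc16_physical_bytes(const struct rc16_info *c)
{
    return c->logical_bytes << c->lbppb_exponent;
}
```

A 512-byte-logical disk reporting an exponent of 3 therefore has 4 KB physical blocks, and a lowest aligned LBA of 1 would describe the "odd" alignment discussed elsewhere in this thread.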

SAT is now a standard and an agenda item for SAT-2 is
to wire ATA8-ACS's large sector size support to the
additions to SBC-3 mentioned above.


I'm not sure how this stuff plays with end to end data
protection :-)
Most SCSI disks currently allow formatting sizes of 512
up to 528 bytes per logical block.

Doug Gilbert




Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Christoph Hellwig

On Mon, Mar 12, 2007 at 10:45:16AM -0400, Jeff Garzik wrote:
> Christoph Hellwig wrote:
> > the occasional 2k sector SCSI MO device as well.  It would be nice to
> >get samples of large sector size ATA devices into the hands of developers
> >to do real world testing of the whole stack.
> 
> "hands of developers" meaning you specifically?  :)

No.  I probably wouldn't have time to deal with it as well.

> I've had a 512b-logical/1K-physical ATA test drive for a few months now, 
> and another couple arrived today.

Ok, that's exactly what I meant.


Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Jeff Garzik


Christoph Hellwig wrote:
> the occasional 2k sector SCSI MO device as well.  It would be nice to
> get samples of large sector size ATA devices into the hands of developers
> to do real world testing of the whole stack.

"hands of developers" meaning you specifically?  :)

I've had a 512b-logical/1K-physical ATA test drive for a few months now, 
and another couple arrived today.


Hopefully people can parse what I've been posting, since I cannot give 
out raw numbers or data at this time.


Of course, with RMW drives that leave the 512-b logical interface 
untouched, I had expected that they would Just Work(tm) and that is 
pretty much what happened.


Jeff



Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Jeff Garzik


Jan Engelhardt wrote:
> On Mar 11 2007 22:45, Ric Wheeler wrote:
>> Jan Engelhardt wrote:
>>> On Mar 11 2007 18:51, Ric Wheeler wrote:
>>>> During the recent IO/FS workshop, we spoke briefly about the
>>>> coming change to a 4k sector size for disks on linux. If I
>>>> recall correctly, the general feeling was that the impact was
>>>> not significant since we already do most file system IO in 4k
>>>> page sizes and should be fine as long as we partition drives
>>>> correctly and avoid non-4k aligned partitions.
>>>
>>> Sorry about jumping right in, but what about an 'old-style'
>>> partition table that relies on 512 as a unit?
>>
>> I think that the normal case would involve new drives which
>> would need to be partitioned in 4k aligned partitions.
>> Shouldn't that work regardless of the unit used in the
>> partition table?
>
> Assume this partition table on my current HD:
>
> Disk /dev/hdc: 251.0 GB, 251000193024 bytes
> 255 heads, 63 sectors/track, 30515 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>    Device Start   End     Blocks  Id  System
> /dev/hdc1     1    33     265041  82  Linux swap / Solaris
> /dev/hdc2    34 30515  244846665   5  Extended
>
> That is, 255 * 63 * 30515 * 512 == roughly 251 GB.
>
> Now, if this disk was copied byte per byte (/bin/dd) to a
> 4096-based disk, and Linux would start using a sector size of
> 4096, then I would suddenly have
>
> 255 * 63 * 30515 * 4096 == 2 TB
>
> Although I would not mind the 2 TB, the partition table would
> read quite differently (note the Blocks column which is
> multiplied by 4 (512x4=4096))


At this level, for RMW drives, nothing changes.  The partition software, 
ATA driver, and all other bits continue to think that sector size == 512 
bytes.


The partition software /hopefully/ becomes smart enough to understand 
the alignment necessary, but that is not a requirement.


This is the key to understanding the difference between a physical 
(==platters) sector size change without a logical (==ATA interface) 
sector size change.




>    Device Start   End     Blocks  Id  System
> /dev/hdc1     1    33    1060164  82  Linux swap / Solaris
> /dev/hdc2    34 30515  979386660   5  Extended
>
> Which would mean that the swap partition reaches into the real
> data partition and would corrupt it.


For RMW drives, RMW cycles would occur but not corruption.

For non-RMW drives, this just wouldn't occur.

Jeff



Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Alan Cox

> For 1K/4K logical sector sizes, who knows.  EFI?  
> Certainly seems incompatible with the current popular DOS partition format.

Its a bit messier than that. There are two interpretations of "DOS"
partition formats found on 2K sector size magneto opticals. One is that
everything is the same as before (as if sectors were 512 byte), the other
is a different "everything is the same" which scales by the 2K sector
size. The two are of course wonderfully incompatible
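
To make the incompatibility concrete: a DOS partition entry stores its starting LBA and length as plain 32-bit sector counts, so the two readings differ only in the multiplier applied to those fields. The helper below is a hypothetical illustration, not code from any partitioning tool.

```c
#include <stdint.h>

/* The two readings of a DOS partition entry's 32-bit LBA/length fields
 * on a drive with larger logical sectors (names are illustrative). */
enum dos_units {
    DOS_UNITS_512,        /* "as if sectors were 512 bytes" */
    DOS_UNITS_NATIVE      /* scaled by the device's logical sector size */
};

/* Convert one field (start LBA or sector count) to a byte quantity. */
static uint64_t dos_field_to_bytes(uint32_t field, uint32_t logical_bytes,
                                   enum dos_units units)
{
    uint64_t unit = (units == DOS_UNITS_512) ? 512u : logical_bytes;

    return (uint64_t)field * unit;
}
```

With Jan's disk (255 * 63 * 30515 = 490223475 sectors), the 512-byte reading gives 250994419200 bytes (about 251 GB) and the native 4096-byte reading gives 2007955353600 bytes (about 2 TB): the same bytes on disk, two different layouts.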

Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread James Bottomley

On Mon, 2007-03-12 at 08:18 +, Christoph Hellwig wrote:
> The FS stack and higher levels of the I/O stack should be mostly ready.
> The S/390 DASDs are commonly used with 4k sector sizes, and we've had
> the occasional 2k sector SCSI MO device as well.  It would be nice to
> get samples of large sector size ATA devices into the hands of developers
> to do real world testing of the whole stack.

Theoretically, we already have the capacity to verify this.  Although
not with ATA. However, since ATA uses virtually the same paths as SCSI,
we could test with variable sector SCSI devices, and SCSI does allow you
to reformat the device with different sector sizes.

James


Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Jeff Garzik


Jan Engelhardt wrote:
> On Mar 11 2007 18:51, Ric Wheeler wrote:
>> During the recent IO/FS workshop, we spoke briefly about the
>> coming change to a 4k sector size for disks on linux. If I
>> recall correctly, the general feeling was that the impact was
>> not significant since we already do most file system IO in 4k
>> page sizes and should be fine as long as we partition drives
>> correctly and avoid non-4k aligned partitions.
>
> Sorry about jumping right in, but what about an 'old-style'
> partition table that relies on 512 as a unit?


For 1K/4K physical sector size, where logical sector size remains 512-b, 
nothing changes.  DOS partitions start partitions on odd-numbered 
sectors, so presuming you have odd-aligned disks, life is good.


For 1K/4K logical sector sizes, who knows.  EFI?  

Certainly seems incompatible with the current popular DOS partition format.

Jeff




Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Jeff Garzik


Alan Cox wrote:
>> First generation of 1K sector drives will continue to use the same
>> 512-byte ATA sector size you are familiar with.  A single 512-byte write
>> will cause the drive to perform a read-modify-write cycle.  This
>> configuration is physical 1K sector, logical 512b sector.
>
> The problem case is "read-modify-screwup"
>
> At that point we've trashed the block we were writing (a well studied
> recovery case), and we've blasted some previously sane, totally
> unrelated sector of data out of existence. That's why we need to know
> ideally if they are doing the write to a different physical block when
> they do this, so that we don't lose the old data. My guess is they won't
> as it'll be hard.


Strict ATA command set answer:  you will have no idea what goes on under 
the hood.  The current 512-b interface stays /exactly/ the same, save 
for a word or two in IDENTIFY DEVICE telling you the "secret" physical 
sector size.  If all your I/Os are aligned properly, then you need not 
worry about RMW cycles, as they will not occur.
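
Jeff does not name the IDENTIFY DEVICE words involved. As a hedged sketch, assuming the layout ATA8-ACS gives word 106 (bits 15:14 = 01b for a valid word, bit 13 for "multiple logical sectors per physical sector", bits 3:0 for log2 of that count), a host could recover the "secret" physical sector size like this; `ata_physical_sector_bytes()` is a hypothetical name.

```c
#include <stdint.h>

/* Physical sector size from IDENTIFY DEVICE data, assuming the ATA8-ACS
 * layout of word 106:
 *   bits 15:14 == 01b  word contents are valid
 *   bit 13             multiple logical sectors per physical sector
 *   bits 3:0           log2(logical sectors per physical sector)
 * Also assumes 512-byte logical sectors, i.e. the RMW drives discussed here. */
static unsigned ata_physical_sector_bytes(const uint16_t id[256])
{
    uint16_t w = id[106];

    if ((w & 0xC000) != 0x4000)       /* word not valid: assume 512 */
        return 512;
    if (!(w & (1u << 13)))            /* one logical per physical */
        return 512;
    return 512u << (w & 0x0F);
}
```

Under these assumptions a 512b-logical/1K-physical drive would report 0x6001 in word 106 and a 4K-physical drive 0x6003.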


Intuition answer:  they will use their firmware-internal standard code 
for scheduling reads and writes, and will only reallocate sectors as 
needed by media failure or similar events.


The "M" part of the read-modify-write cycle happens in disk RAM.  So from the 
disk's point of view, a single 512-b write would require reading a 
single 1K hard sector, updating the contents in cache RAM, and then 
writing a single 1K hard sector.  The reading of the unknown half of the 
sector can be scheduled well in advance, usually, since writeback 
caching gives the drive plenty of time (relatively speaking) to optimize 
things.
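
The read-modify-write described above can be modelled in a few lines. This is a toy, assuming 1K physical sectors with physical sector 0 starting at logical LBA 0 (no odd alignment); `struct media` and `rmw_write()` are hypothetical and say nothing about how real firmware schedules the read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOGICAL_BYTES  512u
#define PHYS_BYTES     1024u     /* 1K physical sectors, as in this thread */
#define LOG_PER_PHYS   (PHYS_BYTES / LOGICAL_BYTES)

/* Hypothetical model of the platters: nphys physical sectors, back to back.
 * Physical sector 0 is assumed to start at logical LBA 0. */
struct media {
    uint8_t *bytes;
    uint64_t nphys;
};

/* One 512-byte logical write on a 1K-physical drive: read the whole hard
 * sector into cache RAM, patch the 512-byte half, write the hard sector back.
 * Returns 0 on success, -1 if the LBA is beyond the end of the media. */
static int rmw_write(struct media *m, uint64_t lba,
                     const uint8_t data[LOGICAL_BYTES])
{
    uint8_t cache[PHYS_BYTES];                       /* drive cache RAM */
    uint64_t psect = lba / LOG_PER_PHYS;
    size_t off = (size_t)(lba % LOG_PER_PHYS) * LOGICAL_BYTES;
    uint8_t *hard;

    if (psect >= m->nphys)
        return -1;
    hard = m->bytes + psect * PHYS_BYTES;

    memcpy(cache, hard, PHYS_BYTES);                 /* R: read hard sector */
    memcpy(cache + off, data, LOGICAL_BYTES);        /* M: modify in RAM    */
    memcpy(hard, cache, PHYS_BYTES);                 /* W: write it back    */
    return 0;
}
```

In the model, writing logical sector 1 leaves logical sector 0's bytes unchanged, but the final step rewrites all 1024 bytes, so the untouched neighbour is exposed to whatever goes wrong during that write. That is the "read-modify-screwup" window Alan raised.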


Overall, it definitely adds a few more points of failure, but we can't 
do much at all about those points of failure.


In my own experiments on my own Fedora workstation, ~66% of IOs in Linux 
start on an odd sector, and ~33% started on even-numbered sectors.  For 
a 1K-sector drive with 'odd' alignment, the configuration Microsoft will 
likely want, that means the majority of disk transactions will avoid a 
RMW cycle, but a still-numerous minority will not.  I did not test 
transfer length, to see how many transfers /ended/ on an odd sector, 
thus determining how many RMW cycles the tail of an average I/O requires.
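
Counting head and tail separately is straightforward. The helper below is a hypothetical sketch: it takes an I/O in 512-byte logical sectors, the number of logical sectors per physical sector, and the logical LBA (mod that ratio) at which physical sectors begin; "odd" alignment on a 1K drive is modelled as ratio 2, offset 1.

```c
#include <stdint.h>

/* Hypothetical helper: how many partial physical sectors (0, 1 or 2) an
 * I/O touches, i.e. how many read-modify-write cycles the drive must do.
 *   start, nsect : I/O in 512-byte logical sectors (nsect >= 1)
 *   ratio        : logical sectors per physical sector (2 for 1K, 8 for 4K)
 *   offset       : LBA (mod ratio) where physical sectors begin, < ratio */
static unsigned rmw_cycles(uint64_t start, uint64_t nsect,
                           unsigned ratio, unsigned offset)
{
    uint64_t end = start + nsect;                /* one past the last LBA */
    /* shift so that physical boundaries fall on multiples of ratio */
    uint64_t s = start + ratio - offset;
    uint64_t e = end + ratio - offset;
    int head_partial = (s % ratio) != 0;
    int tail_partial = (e % ratio) != 0;

    /* a short I/O may start and end inside the same physical sector */
    if (head_partial && tail_partial && s / ratio == (e - 1) / ratio)
        return 1;
    return (unsigned)(head_partial + tail_partial);
}
```

For example, `rmw_cycles(1, 2, 2, 1)` (odd start, even length) costs nothing, while `rmw_cycles(2, 2, 2, 1)` misses on both ends. On a 4K drive whose physical sectors start at LBA 0, a 4K block at LBA 63 (the usual DOS partition start) gives `rmw_cycles(63, 8, 8, 0) == 2`; an offset of 7 would line the physical sectors up with LBA 63 instead.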




>> A future configuration will change the logical ATA interface away from
>> 512-byte sectors to 1K or 4K.  Here, it is impossible to read a quantity
>> smaller than 1K or 4K, whatever the sector size is.
>
> That one I'm not worried about - other than "guess how Redmond decide to
> make partition tables work" that one is mostly easy (be fun to see how
> many controllers simply can't cope with the command formats)


Indeed...

Jeff




Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Ric Wheeler


Alan Cox wrote:
>> First generation of 1K sector drives will continue to use the same
>> 512-byte ATA sector size you are familiar with.  A single 512-byte write
>> will cause the drive to perform a read-modify-write cycle.  This
>> configuration is physical 1K sector, logical 512b sector.
>
> The problem case is "read-modify-screwup"
>
> At that point we've trashed the block we were writing (a well studied
> recovery case), and we've blasted some previously sane, totally
> unrelated sector of data out of existence. That's why we need to know
> ideally if they are doing the write to a different physical block when
> they do this, so that we don't lose the old data. My guess is they won't
> as it'll be hard.


I think that the firmware would have to do this in the drive's write 
cache and would always write the modified data back to the same physical 
sector (unless a media error forces a sector remap).


If the firmware modifies the seven 512-byte sectors that it read in order
to do the single 512-byte sector write, then we certainly would see what
you describe happen.

In general, it would seem to be a bad idea to allocate a different
physical sector to underpin this kind of read-modify-write, since that
would kill contiguous layout of files, etc.


>> A future configuration will change the logical ATA interface away from
>> 512-byte sectors to 1K or 4K.  Here, it is impossible to read a quantity
>> smaller than 1K or 4K, whatever the sector size is.
>
> That one I'm not worried about - other than "guess how Redmond decide to
> make partition tables work" that one is mostly easy (be fun to see how
> many controllers simply can't cope with the command formats)



This will be interesting to find out. I will be sharing a panel with
some BIOS & MS people, so I will update everyone on what I hear.


ric

[PATCH] Make SG_SET_FORCE_LOW_DMA behave as advertised

2007-03-12 Thread Olaf Kirch

Make SG_SET_FORCE_LOW_DMA behave as advertised

I came across this by accident. I have serious doubts whether ISA DMA
is really relevant these days :-) but what the heck. Feel free to disregard
if this code is headed for the recycler anyway.

The SCSI-HOWTO says this about SG_SET_FORCE_LOW_DMA:

If the "reserved" buffer allocated on open() is not in use then
it will be de-allocated and re-allocated under the 16MB level
(and the latter operation could fail yielding ENOMEM).

As it stands, the current code will never reallocate
the buffer during the ioctl, because it first sets sfp->low_dma to 1,
and then checks the very same variable for equality with 0.
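
Stripped to the two fields involved, the ordering problem looks like this. `struct fake_sfp` and both functions are hypothetical stand-ins for the SG_SET_FORCE_LOW_DMA branch of sg_ioctl(), not the real sg.c code; the return value says whether the reserve buffer would be rebuilt below 16MB.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two sg_fd fields the branch touches;
 * this is not the real struct sg_fd from drivers/scsi/sg.c. */
struct fake_sfp {
    int low_dma;
    bool reserve_in_use;
};

/* Pre-patch ordering: the flag is written before it is tested,
 * so the "was not low_dma yet" test can never be true. */
static bool force_low_dma_old(struct fake_sfp *sfp)
{
    sfp->low_dma = 1;
    return sfp->low_dma == 0 && !sfp->reserve_in_use;
}

/* Patched ordering: remember the previous value, then set the flag. */
static bool force_low_dma_fixed(struct fake_sfp *sfp)
{
    int was_low_dma = sfp->low_dma;

    sfp->low_dma = 1;
    return was_low_dma == 0 && !sfp->reserve_in_use;
}
```

The old version returns false for every input; the fixed one returns true exactly once, on the first switch to low DMA while the reserve buffer is idle.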

The patch below changes the order of commands, and also moves
the buffer reallocation around so that it also happens when
the device has unchecked_isa_dma set.

Signed-off-by: [EMAIL PROTECTED]

---
 drivers/scsi/sg.c |   13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6.20/drivers/scsi/sg.c
===
--- linux-2.6.20.orig/drivers/scsi/sg.c
+++ linux-2.6.20/drivers/scsi/sg.c
@@ -842,17 +842,22 @@ sg_ioctl(struct inode *inode, struct fil
 		result = get_user(val, ip);
 		if (result)
 			return result;
+		if (val == 0) {
+			if (sdp->detached)
+				return -ENODEV;
+			val = sdp->device->host->unchecked_isa_dma;
+		}
 		if (val) {
+			int was_low_dma = sfp->low_dma;
+
 			sfp->low_dma = 1;
-			if ((0 == sfp->low_dma) && (0 == sg_res_in_use(sfp))) {
+			if ((0 == was_low_dma) && (0 == sg_res_in_use(sfp))) {
 				val = (int) sfp->reserve.bufflen;
 				sg_remove_scat(&sfp->reserve);
 				sg_build_reserve(sfp, val);
 			}
 		} else {
-			if (sdp->detached)
-				return -ENODEV;
-			sfp->low_dma = sdp->device->host->unchecked_isa_dma;
+			sfp->low_dma = 0;
 		}
 		return 0;
 	case SG_GET_LOW_DMA:

-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

Re: How to send inquiry command through sd path (i.e. /dev/sda) by using SG_IO ioctl

2007-03-12 Thread Douglas Gilbert

MasthanUsha wrote:
> 
> Hi All,
>  
> Does anyone have any idea about the SCSI INQUIRY command?
>  
> I want to send an INQUIRY command to a SCSI device through the sd path
> (i.e. /dev/sda or /dev/sdb) by using the SG_IO ioctl. Please explain how.

If you look at http://www.torque.net/sg/sg3_utils.html
and fetch a tarball (e.g. sg3_utils-1.23.tgz) then have
a look at the examples/sg_simple1.c file.
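
In the same spirit as that example, a minimal sketch of a standard INQUIRY sent through an sd node with SG_IO might look like the following. It assumes a lk 2.6 kernel (where SG_IO is accepted on sd devices), the `struct sg_io_hdr` definitions from `<scsi/sg.h>`, and sufficient permissions on the device node; `build_inquiry()`, `inq_field()` and `sd_inquiry()` are hypothetical helper names, not sg3_utils functions.

```c
#include <fcntl.h>
#include <scsi/sg.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define INQ_LEN 96

/* Fill in an sg_io_hdr for a standard INQUIRY (opcode 0x12). */
static void build_inquiry(struct sg_io_hdr *h, unsigned char cdb[6],
                          unsigned char *resp, unsigned len,
                          unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, 6);
    cdb[0] = 0x12;                      /* INQUIRY */
    cdb[3] = (len >> 8) & 0xff;         /* allocation length, big-endian */
    cdb[4] = len & 0xff;

    memset(h, 0, sizeof(*h));
    h->interface_id = 'S';              /* required by the SG_IO interface */
    h->cmd_len = 6;
    h->cmdp = cdb;
    h->dxfer_direction = SG_DXFER_FROM_DEV;
    h->dxferp = resp;
    h->dxfer_len = len;
    h->sbp = sense;
    h->mx_sb_len = sense_len;
    h->timeout = 20000;                 /* milliseconds */
}

/* Copy a space-padded ASCII field out of the INQUIRY data. */
static void inq_field(const unsigned char *resp, int off, int n, char *out)
{
    memcpy(out, resp + off, (size_t)n);
    out[n] = '\0';
}

/* Send INQUIRY through a node such as /dev/sda; returns 0 on success. */
static int sd_inquiry(const char *dev, unsigned char resp[INQ_LEN])
{
    unsigned char cdb[6], sense[32];
    struct sg_io_hdr h;
    int fd = open(dev, O_RDONLY | O_NONBLOCK);

    if (fd < 0) {
        perror(dev);
        return -1;
    }
    build_inquiry(&h, cdb, resp, INQ_LEN, sense, sizeof(sense));
    if (ioctl(fd, SG_IO, &h) < 0) {
        perror("SG_IO");
        close(fd);
        return -1;
    }
    close(fd);
    return ((h.info & SG_INFO_OK_MASK) == SG_INFO_OK) ? 0 : -1;
}
```

After `sd_inquiry("/dev/sda", resp) == 0`, `inq_field(resp, 8, 8, vendor)` and `inq_field(resp, 16, 16, product)` copy out the space-padded vendor and product identification fields (bytes 8-15 and 16-31 of the standard INQUIRY data), given buffers of at least 9 and 17 bytes.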

Doug Gilbert


Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Alan Cox

> First generation of 1K sector drives will continue to use the same 
> 512-byte ATA sector size you are familiar with.  A single 512-byte write 
> will cause the drive to perform a read-modify-write cycle.  This 
> configuration is physical 1K sector, logical 512b sector.

The problem case is "read-modify-screwup"

At that point we've trashed the block we were writing (a well studied
recovery case), and we've blasted some previously sane, totally
unrelated sector of data out of existence. That's why we need to know
ideally if they are doing the write to a different physical block when
they do this, so that we don't lose the old data. My guess is they won't
as it'll be hard.
 
> A future configuration will change the logical ATA interface away from 
> 512-byte sectors to 1K or 4K.  Here, it is impossible to read a quantity 
> smaller than 1K or 4K, whatever the sector size is.

That one I'm not worried about - other than "guess how Redmond decide to
make partition tables work" that one is mostly easy (be fun to see how
many controllers simply can't cope with the command formats)

Alan

Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Alan Cox

> Now, if this disk was copied byte per byte (/bin/dd) to a
> 4096-based disk, and Linux would start using a sector size of
> 4096, then I would suddenly have

The ATA drives I'm aware of report a 512-byte sector size and do 512-byte
I/Os, but use 4K physical sectors, and to get sane performance they expect
the OS to issue sensibly sized I/O requests.

Re: impact of 4k sector size on the IO & FS stack

2007-03-12 Thread Christoph Hellwig

On Sun, Mar 11, 2007 at 06:51:53PM -0400, Ric Wheeler wrote:
> 
> During the recent IO/FS workshop, we spoke briefly about the coming 
> change to a 4k sector size for disks on linux. If I recall correctly, 
> the general feeling was that the impact was not significant since we 
> already do most file system IO in 4k page sizes and should be fine as 
> long as we partition drives correctly and avoid non-4k aligned partitions.
> 
> Are there other concerns in the IO or FS stack that we should bring up 
> with vendors?  I have been asked to summarize the impact of 4k sectors 
> on linux  for a disk vendor gathering and want to make sure that I put 
> all of our linux specific items into that summary...

The FS stack and higher levels of the I/O stack should be mostly ready.
The S/390 DASDs are commonly used with 4k sector sizes, and we've had
the occasional 2k sector SCSI MO device as well.  It would be nice to
get samples of large sector size ATA devices into the hands of developers
to do real world testing of the whole stack.

