Re: Integration of SCST in the mainstream Linux kernel

2008-02-19 Thread Erez Zilber
Bart Van Assche wrote:
 On Feb 18, 2008 10:43 AM, Erez Zilber [EMAIL PROTECTED] wrote:
   
 If you use a high value for FirstBurstLength, all (or most) of your data
 will be sent as unsolicited data-out PDUs. These PDUs don't use the RDMA
 engine, so you miss the advantage of IB.
 

 Hello Erez,

 Did you notice the e-mail Roland Dreier wrote on February 6, 2008?
 This is what Roland wrote:
   
 I think the confusion here is caused by a slight misuse of the term
 RDMA.  It is true that all data is always transported over an
 InfiniBand connection when iSER is used, but not all such transfers
 are one-sided RDMA operations; some data can be transferred using
 send/receive operations.
 

   
Yes, I saw that. I tried to give a more detailed explanation.

 Or: data sent during the first burst is not transferred via one-sided
 remote memory reads or writes but via two-sided send/receive
 operations. At least on my setup, these operations are as fast as
 one-sided remote memory reads or writes. As an example, I obtained the
 following numbers on my setup (SDR 4x network):
 ib_write_bw: 933 MB/s.
 ib_read_bw: 905 MB/s.
 ib_send_bw: 931 MB/s.

   
According to these numbers, one might conclude that you don't need RDMA
at all and can just send iSCSI PDUs over IB. However, the benchmarks that
you used are synthetic IB benchmarks that are not equivalent to iSCSI
over iSER: they just send IB packets. I'm not surprised that you got more
or less the same performance because, AFAIK, ib_send_bw doesn't copy data
(unlike iSCSI, which has to copy data that is sent/received without RDMA).

When you use RDMA with iSCSI (i.e. iSER), you don't need to create iSCSI
PDUs and process them. The CPU is not busy as it is with iSCSI over TCP
because no data copies are required. Another advantage is that you don't
need header/data digest because the IB HW does that.
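
To make the digest point concrete: the iSCSI header and data digests are
CRC32C checksums, so without help from the hardware the CPU has to touch
every byte that is covered by a digest. The following is only a minimal
bitwise sketch in Python, not the kernel's implementation; the function
name crc32c is chosen here for illustration.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), the checksum used for iSCSI digests.

    Illustrative only: real implementations are table-driven or use
    dedicated CPU instructions instead of this per-bit loop.
    """
    crc = 0xFFFFFFFF                      # initial value: all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):                # process one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78   # reflected CRC32C polynomial
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final inversion


# Every payload byte costs eight loop iterations in this naive form.
print(hex(crc32c(b"123456789")))
```

The inner loop runs once per bit, which shows why computing digests in
software over a large data stream is not free.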

Erez
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-18 Thread Erez Zilber
Bart Van Assche wrote:
 On Feb 5, 2008 6:01 PM, Erez Zilber [EMAIL PROTECTED] wrote:
   
 Using such large values for FirstBurstLength will give you poor
 performance numbers for WRITE commands (with iSER). FirstBurstLength
 specifies how much data you should send as unsolicited data (i.e.
 without RDMA). It means that your WRITE commands were sent without RDMA.
 

 Sorry, but I'm afraid you got this wrong. When the iSER transport is
 used instead of TCP, all data is sent via RDMA, including unsolicited
 data. If you have a look at the iSER implementation in the Linux kernel
 (source files under drivers/infiniband/ulp/iser), you will see that
 all data is transferred via RDMA and not via TCP/IP.

   

When you execute WRITE commands with iSCSI, it works like this:

EDTL (Expected Data Transfer Length) - the data length of your command

FirstBurstLength - the length of data that will be sent as unsolicited
data (i.e. as immediate data with the SCSI command and as unsolicited
data-out PDUs)

If you use a high value for FirstBurstLength, all (or most) of your data
will be sent as unsolicited data-out PDUs. These PDUs don't use the RDMA
engine, so you miss the advantage of IB.

If you use a lower value for FirstBurstLength, EDTL - FirstBurstLength
bytes will be sent as solicited data-out PDUs. With iSER, solicited
data-out PDUs are RDMA operations.
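
The split described above can be sketched as follows. This is only a
rough model in Python: it assumes that the session allows unsolicited data
at all, it ignores MaxBurstLength, and the function name split_write and
the 64 KiB example value are made up for illustration.

```python
def split_write(edtl, first_burst_length):
    """Return (unsolicited_bytes, solicited_bytes) for one WRITE command.

    Simplified model: up to FirstBurstLength bytes go out unsolicited
    (immediate data plus unsolicited data-out PDUs); the rest is solicited
    and, with iSER, is moved by RDMA operations.
    """
    unsolicited = min(edtl, first_burst_length)
    solicited = edtl - unsolicited
    return unsolicited, solicited


KIB = 1024
MIB = 1024 * KIB

# 1 MiB WRITE with FirstBurstLength = 16 MiB: nothing is left for RDMA.
print(split_write(1 * MIB, 16 * MIB))

# 1 MiB WRITE with a hypothetical FirstBurstLength of 64 KiB:
# most of the data is solicited and therefore goes through RDMA.
print(split_write(1 * MIB, 64 * KIB))
```

With the 16 MB setting and 1 MB commands used in the tests, the solicited
part is zero, which is the case that I was warning about.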

I hope that this is clearer now.

Erez


Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread Erez Zilber
Bart Van Assche wrote:
 On Jan 30, 2008 12:32 AM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
   
 iSER has parameters to limit the maximum size of RDMA (it needs to
 repeat RDMA with a poor configuration)?
 

 Please specify which parameters you are referring to. As you know I
 had already repeated my tests with ridiculously high values for the
 following iSER parameters: FirstBurstLength, MaxBurstLength and
 MaxRecvDataSegmentLength (16 MB, which is more than the 1 MB block
 size specified to dd).

   
Using such large values for FirstBurstLength will give you poor
performance numbers for WRITE commands (with iSER). FirstBurstLength
specifies how much data you should send as unsolicited data (i.e.
without RDMA). It means that your WRITE commands were sent without RDMA.

Erez


Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread Erez Zilber
Bart Van Assche wrote:
 As you probably know there is a trend in enterprise computing towards
 networked storage. This is illustrated by the emergence during the
 past few years of standards like SRP (SCSI RDMA Protocol), iSCSI
 (Internet SCSI) and iSER (iSCSI Extensions for RDMA). Two different
 pieces of software are necessary to make networked storage possible:
 initiator software and target software. As far as I know there exist
 three different SCSI target implementations for Linux:
 - The iSCSI Enterprise Target Daemon (IETD,
 http://iscsitarget.sourceforge.net/);
 - The Linux SCSI Target Framework (STGT, http://stgt.berlios.de/);
 - The Generic SCSI Target Middle Level for Linux project (SCST,
 http://scst.sourceforge.net/).
 Since I was wondering which SCSI target software would be best suited
 for an InfiniBand network, I started evaluating the STGT and SCST SCSI
 target implementations. Apparently the performance difference between
 STGT and SCST is small on 100 Mbit/s and 1 Gbit/s Ethernet networks,
 but the SCST target software outperforms the STGT software on an
 InfiniBand network. See also the following thread for the details:
 http://sourceforge.net/mailarchive/forum.php?thread_name=e2e108260801170127w2937b2afg9bef324efa945e43%40mail.gmail.com&forum_name=scst-devel.

   
Sorry for the late response (but better late than never).

One may claim that STGT should have lower performance than SCST because
its data path is from userspace. However, your results show that for
non-IB transports, they both show the same numbers. Furthermore, with IB
there shouldn't be any additional difference between the two targets
because data transfer from userspace is as efficient as data transfer
from kernel space.

The only explanation that I see is that fine tuning for iSCSI and iSER is
required. As was already mentioned in this thread, with SDR you can get
~900 MB/sec with iSER (on STGT).

Erez


[PATCH] IB/iSER: add logical unit reset support

2008-01-22 Thread Erez Zilber
eh_device_reset_handler was already added to scsi_host_template
in iscsi_tcp, and is now added also for iscsi_iser.

Signed-off-by: Erez Zilber [EMAIL PROTECTED]
Signed-off-by: Mike Christie [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index fd69fb3..4cd0705 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -552,6 +552,7 @@ static struct scsi_host_template iscsi_iser_sht = {
 	.max_sectors		= 1024,
 	.cmd_per_lun		= ISCSI_MAX_CMD_PER_LUN,
 	.eh_abort_handler	= iscsi_eh_abort,
+	.eh_device_reset_handler= iscsi_eh_device_reset,
 	.eh_host_reset_handler	= iscsi_eh_host_reset,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.proc_name		= "iscsi_iser",
-- 
1.5.3.7





Re: Performance of SCST versus STGT

2008-01-17 Thread Erez Zilber
FUJITA Tomonori wrote:
 On Thu, 17 Jan 2008 12:48:28 +0300
 Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:

   
 FUJITA Tomonori wrote:
 
 On Thu, 17 Jan 2008 10:27:08 +0100
 Bart Van Assche [EMAIL PROTECTED] wrote:


   
 Hello,

 I have performed a test to compare the performance of SCST and STGT.
 Apparently the SCST target implementation performed far better than
 the STGT target implementation. This makes me wonder whether this is
 due to the design of SCST or whether STGT's performance can be
 improved to the level of SCST ?

 Test performed: read 2 GB of data in blocks of 1 MB from a target (hot
 cache -- no disk reads were performed, all reads were from the cache).
 Test command: time dd if=/dev/sde of=/dev/null bs=1M count=2000

                             STGT read            SCST read
                             performance (MB/s)   performance (MB/s)
 Ethernet (1 Gb/s network)   77                   89
 IPoIB (8 Gb/s network)      82                   229
 SRP (8 Gb/s network)        N/A                  600
 iSER (8 Gb/s network)       80                   N/A

 These results show that SCST uses the InfiniBand network very well
 (efficiency of about 88% via SRP), but that the current STGT version
 is unable to transfer data faster than 82 MB/s. Does this mean that
 there is a severe bottleneck present in the current STGT
 implementation?
 
 I don't know about the details but Pete said that he can achieve more
 than 900MB/s read performance with tgt iSER target using ramdisk.

 http://www.mail-archive.com/[EMAIL PROTECTED]/msg4.html
   
 Please don't confuse a multithreaded, latency-insensitive workload with
 a single-threaded, hence latency-sensitive, one.
 

 It seems that he can get good performance with a single-threaded workload:

 http://www.osc.edu/~pw/papers/wyckoff-iser-snapi07-talk.pdf


 But I don't know about the details so let's wait for Pete to comment
 on this.

 Perhaps Voltaire people could comment on the tgt iSER performance.
   

We haven't run any real performance tests with tgt, so I don't have
numbers yet. I know that Pete got ~900 MB/sec by hacking sgp_dd so that
all data was read from/written to the same block (so it was all done in
the cache). Pete - am I right?

As already mentioned, he got that with IB SDR cards that are 10 Gb/sec
cards in theory (actual speed is ~900 MB/sec). With DDR cards (20
Gb/sec), you can get even more. I plan to test that in the near future.
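
For reference, the arithmetic behind these figures can be sketched as
below. It assumes 4x links, 8b/10b line encoding (8 data bits per 10
signalled bits) and MB = 10^6 bytes; the function name is made up for
illustration.

```python
def ib_data_rate_mb_per_s(lanes, signal_gbit_per_lane):
    """Usable data rate in MB/s for a link with 8b/10b line encoding."""
    signal_gbit = lanes * signal_gbit_per_lane   # raw signalling rate
    data_gbit = signal_gbit * 8 / 10             # strip 8b/10b overhead
    return data_gbit * 1000 / 8                  # Gbit/s -> MB/s


sdr_4x = ib_data_rate_mb_per_s(4, 2.5)   # SDR: 2.5 Gb/s signalling per lane
ddr_4x = ib_data_rate_mb_per_s(4, 5.0)   # DDR: 5 Gb/s signalling per lane

print("SDR 4x:", sdr_4x, "MB/s")
print("DDR 4x:", ddr_4x, "MB/s")
print("900 MB/s on SDR 4x uses", 900 / sdr_4x, "of the data rate")
```

So a 10 Gb/sec SDR link carries at most 8 Gb/sec of data, and ~900 MB/sec
is already close to that limit.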

Erez


Re: Performance of SCST versus STGT

2008-01-17 Thread Erez Zilber

 We didn't run any real performance test with tgt, so I don't have
 numbers yet. I know that Pete got ~900 MB/sec by hacking sgp_dd, so all
 data was read/written to the same block (so it was all done in the
 cache). Pete - am I right?

 As already mentioned, he got that with IB SDR cards that are 10 Gb/sec
 cards in theory (actual speed is ~900 MB/sec). With DDR cards (20
 Gb/sec), you can get even more. I plan to test that in the near future.

 Are you writing about the maximum possible speed that he got, including
 multithreaded tests with many outstanding commands, or about the speed
 he got on single-threaded reads with one outstanding command? This
 thread is about the latter.


As I said, we didn't run any performance tests on stgt yet.

Erez


Re: usage of max_sectors in scsi_host_template

2007-11-07 Thread Erez Zilber

Stefan Richter wrote:


Erez Zilber wrote:
 I'm not sure that I understand the meaning of max_sectors in
 scsi_host_template.

Did you have a look at scsi_mid_low_api.txt?
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/scsi/scsi_mid_low_api.txt;h=6f70f2b9327e1f0db7bc05bdbf2d6ce3b2fcbdcf#l1232



I will go over it. Thanks for the link.


 Is it the maximum data length of a single SCSI command?

Yes.

 Is it in bytes?

No, it is in units of 512 bytes.

 What's the size of a sector?

Usually 512 bytes according to above doc.  Always 512 bytes from the
point of view of block/ll_rw_blk.c::blk_queue_max_sectors().
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=block/ll_rw_blk.c;h=75c98d58f4ddf7252e2717e0924b9d6a8925b4e5#l590
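
As a quick sanity check of the units, here is a small sketch in Python.
The value 1024 is the max_sectors setting of the iSER host template; the
helper names are made up, and commands_needed only illustrates what
splitting at the limit would mean, it is not how the block layer is
implemented.

```python
SECTOR_SIZE = 512  # bytes; max_sectors is counted in 512-byte units


def max_sectors_to_bytes(max_sectors):
    """Largest data length of a single command, in bytes."""
    return max_sectors * SECTOR_SIZE


def commands_needed(request_bytes, max_sectors):
    """Number of commands if a request is split at the max_sectors limit."""
    limit = max_sectors_to_bytes(max_sectors)
    return -(-request_bytes // limit)    # ceiling division


print(max_sectors_to_bytes(1024))            # bytes allowed per command
print(commands_needed(1024 * 1024, 1024))    # commands for a 1 MiB request
```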



So, ll_rw_blk actually uses the max_sectors value to chop requests 
larger than max_sectors. Am I right? If yes, I have a problem:


I'm running sgp_dd (on RHAS 4 up4 - kernel version is 2.6.9), so it 
calls scsi-ml directly (without going through ll_rw_blk). I ran it with 
the following parameters:


sgp_dd bs=512 of=/dev/null if=/dev/sg1 bpt=2048 thr=4 time=1 count=100k deb=9


I see that a single 1MB command is generated. Here's the debug info from 
sgp_dd:


sgp_dd: if=/dev/sg1 skip=0 of=/dev/null seek=0 count=102400
Start of loop, count=102400, in_num_sect=0, out_num_sect=0
Starting worker thread k=0
sg_start_io: SCSI READ, blk=0 num_blks=2048
Read (10) [28 00 00 00 00 00 00 08 00 00 ]
dir=-3, len=1048576, dxfrp=0x2a9558a000, cmd_len=10

Now, the low-level driver below scsi-ml is open-iscsi over iSER. 
max_sectors is set to 1024 (i.e. 512 kB). Still, the iSER driver 
receives a 1MB command. I guess that the max_sectors value is never 
used. Am I right?
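
To double-check the numbers, the CDB from the debug output can be decoded
by hand. The sketch below (Python) follows the standard READ(10) layout:
opcode 0x28, a 4-byte big-endian LBA in bytes 2-5 and a 2-byte big-endian
transfer length in bytes 7-8. It assumes a 512-byte block size, as given
by bs=512; the helper name is made up.

```python
import struct

SECTOR_SIZE = 512     # bytes per block, matching bs=512 on the command line
MAX_SECTORS = 1024    # max_sectors of the iSER scsi_host_template


def decode_read10(cdb):
    """Return (lba, num_blocks) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    (lba,) = struct.unpack(">I", cdb[2:6])          # logical block address
    (num_blocks,) = struct.unpack(">H", cdb[7:9])   # transfer length
    return lba, num_blocks


# CDB bytes copied from the sgp_dd debug output.
cdb = bytes.fromhex("28 00 00 00 00 00 00 08 00 00")
lba, num_blocks = decode_read10(cdb)

transfer_bytes = num_blocks * SECTOR_SIZE
limit_bytes = MAX_SECTORS * SECTOR_SIZE

print("lba:", lba, "blocks:", num_blocks, "bytes:", transfer_bytes)
print("exceeds max_sectors limit:", transfer_bytes > limit_bytes)
```

The decoded transfer length matches len=1048576 in the debug output and
is twice the 512 kB that max_sectors = 1024 would allow.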


Thanks,
Erez


usage of max_sectors in scsi_host_template

2007-11-06 Thread Erez Zilber
I'm not sure that I understand the meaning of max_sectors in 
scsi_host_template. Is it the maximum data length of a single SCSI 
command? Is it in bytes? What's the size of a sector?



Thanks,

--
Erez Zilber | 972-9-971-7689
Software Engineer, Storage Solutions
Voltaire – The Grid Backbone
www.voltaire.com

