Re: Integration of SCST in the mainstream Linux kernel
Bart Van Assche wrote:
> On Feb 18, 2008 10:43 AM, Erez Zilber [EMAIL PROTECTED] wrote:
>> If you use a high value for FirstBurstLength, all (or most) of your data will be sent as unsolicited data-out PDUs. These PDUs don't use the RDMA engine, so you miss the advantage of IB.
>
> Hello Erez,
>
> Did you notice the e-mail Roland Dreier wrote on February 6, 2008? This is what Roland wrote:
>
>> I think the confusion here is caused by a slight misuse of the term RDMA. It is true that all data is always transported over an InfiniBand connection when iSER is used, but not all such transfers are one-sided RDMA operations; some data can be transferred using send/receive operations.

Yes, I saw that. I tried to give an explanation with more details.

> Or: data sent during the first burst is not transferred via one-sided remote memory reads or writes but via two-sided send/receive operations. At least on my setup, these operations are as fast as one-sided remote memory reads or writes. As an example, I obtained the following numbers on my setup (SDR 4x network):
>
> ib_write_bw: 933 MB/s.
> ib_read_bw: 905 MB/s.
> ib_send_bw: 931 MB/s.

Going by these numbers, one might think that you don't need RDMA at all and can just send iSCSI PDUs over IB. The benchmarks that you use are synthetic IB benchmarks that are not equivalent to iSCSI over iSER. They just send IB packets. I'm not surprised that you got more or less the same performance because, AFAIK, ib_send_bw doesn't copy data (unlike iSCSI, which has to copy data that is sent/received without RDMA).

When you use RDMA with iSCSI (i.e. iSER), you don't need to create iSCSI PDUs and process them. The CPU is not as busy as it is with iSCSI over TCP because no data copies are required. Another advantage is that you don't need header/data digest because the IB HW does that.

Erez

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Integration of SCST in the mainstream Linux kernel
Bart Van Assche wrote:
> On Feb 5, 2008 6:01 PM, Erez Zilber [EMAIL PROTECTED] wrote:
>> Using such large values for FirstBurstLength will give you poor performance numbers for WRITE commands (with iSER). FirstBurstLength means how much data should you send as unsolicited data (i.e. without RDMA). It means that your WRITE commands were sent without RDMA.
>
> Sorry, but I'm afraid you got this wrong. When the iSER transport is used instead of TCP, all data is sent via RDMA, including unsolicited data. If you have a look at the iSER implementation in the Linux kernel (source files under drivers/infiniband/ulp/iser), you will see that all data is transferred via RDMA and not via TCP/IP.

When you execute WRITE commands with iSCSI, it works like this:

EDTL (Expected Data Transfer Length) - the data length of your command
FirstBurstLength - the length of data that will be sent as unsolicited data (i.e. as immediate data with the SCSI command and as unsolicited data-out PDUs)

If you use a high value for FirstBurstLength, all (or most) of your data will be sent as unsolicited data-out PDUs. These PDUs don't use the RDMA engine, so you miss the advantage of IB.

If you use a lower value for FirstBurstLength, EDTL - FirstBurstLength bytes will be sent as solicited data-out PDUs. With iSER, solicited data-out PDUs are RDMA operations.

I hope this is clearer now.

Erez
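To make the split above concrete, here is a minimal Python sketch. It assumes only the simplified model described in this message: up to FirstBurstLength bytes of a WRITE go out as unsolicited data, and the remaining EDTL - FirstBurstLength bytes are solicited (RDMA operations with iSER). The function name and the example sizes are made up for illustration, and other negotiated parameters such as ImmediateData, InitialR2T and MaxBurstLength are deliberately ignored.

```python
def split_write(edtl, first_burst_length):
    """Split a WRITE of `edtl` bytes into (unsolicited, solicited) byte counts.

    Hypothetical helper for illustration only. Simplified model: everything up
    to FirstBurstLength is unsolicited, the rest is solicited. It ignores
    ImmediateData, InitialR2T and MaxBurstLength.
    """
    if edtl < 0 or first_burst_length < 0:
        raise ValueError("lengths must be non-negative")
    unsolicited = min(edtl, first_burst_length)
    solicited = edtl - unsolicited
    return unsolicited, solicited


MB = 1024 * 1024

# A 1 MB WRITE with a very large and with a small FirstBurstLength.
for fbl in (16 * MB, 64 * 1024):
    unsolicited, solicited = split_write(1 * MB, fbl)
    print(f"FirstBurstLength={fbl}: unsolicited={unsolicited} solicited={solicited}")
```

With FirstBurstLength set to 16 MB, the whole 1 MB WRITE falls into the unsolicited part, which is the situation described above: nothing is left to be transferred as solicited data.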
Re: Integration of SCST in the mainstream Linux kernel
Bart Van Assche wrote:
> On Jan 30, 2008 12:32 AM, FUJITA Tomonori [EMAIL PROTECTED] wrote:
>> iSER has parameters to limit the maximum size of RDMA (it needs to repeat RDMA with a poor configuration)?
>
> Please specify which parameters you are referring to. As you know I had already repeated my tests with ridiculously high values for the following iSER parameters: FirstBurstLength, MaxBurstLength and MaxRecvDataSegmentLength (16 MB, which is more than the 1 MB block size specified to dd).

Using such large values for FirstBurstLength will give you poor performance numbers for WRITE commands (with iSER). FirstBurstLength determines how much data you send as unsolicited data (i.e. without RDMA). It means that your WRITE commands were sent without RDMA.

Erez
Re: Integration of SCST in the mainstream Linux kernel
Bart Van Assche wrote:
> As you probably know there is a trend in enterprise computing towards networked storage. This is illustrated by the emergence during the past few years of standards like SRP (SCSI RDMA Protocol), iSCSI (Internet SCSI) and iSER (iSCSI Extensions for RDMA). Two different pieces of software are necessary to make networked storage possible: initiator software and target software. As far as I know there exist three different SCSI target implementations for Linux:
> - The iSCSI Enterprise Target Daemon (IETD, http://iscsitarget.sourceforge.net/);
> - The Linux SCSI Target Framework (STGT, http://stgt.berlios.de/);
> - The Generic SCSI Target Middle Level for Linux project (SCST, http://scst.sourceforge.net/).
>
> Since I was wondering which SCSI target software would be best suited for an InfiniBand network, I started evaluating the STGT and SCST SCSI target implementations. Apparently the performance difference between STGT and SCST is small on 100 Mbit/s and 1 Gbit/s Ethernet networks, but the SCST target software outperforms the STGT software on an InfiniBand network. See also the following thread for the details: http://sourceforge.net/mailarchive/forum.php?thread_name=e2e108260801170127w2937b2afg9bef324efa945e43%40mail.gmail.comforum_name=scst-devel.

Sorry for the late response (but better late than never).

One may claim that STGT should have lower performance than SCST because its data path is from userspace. However, your results show that for non-IB transports, they both show the same numbers. Furthermore, with IB there shouldn't be any additional difference between the 2 targets because data transfer from userspace is as efficient as data transfer from kernel space.

The only explanation that I see is that fine tuning for iSCSI iSER is required. As was already mentioned in this thread, with SDR you can get ~900 MB/sec with iSER (on STGT).
Erez
[PATCH] IB/iSER: add logical unit reset support
eh_device_reset_handler was already added to scsi_host_template in iscsi_tcp, and is now added also for iscsi_iser.

Signed-off-by: Erez Zilber [EMAIL PROTECTED]
Signed-off-by: Mike Christie [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index fd69fb3..4cd0705 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -552,6 +552,7 @@ static struct scsi_host_template iscsi_iser_sht = {
 	.max_sectors		= 1024,
 	.cmd_per_lun		= ISCSI_MAX_CMD_PER_LUN,
 	.eh_abort_handler	= iscsi_eh_abort,
+	.eh_device_reset_handler= iscsi_eh_device_reset,
 	.eh_host_reset_handler	= iscsi_eh_host_reset,
 	.use_clustering		= DISABLE_CLUSTERING,
 	.proc_name		= "iscsi_iser",
-- 
1.5.3.7
Re: Performance of SCST versus STGT
FUJITA Tomonori wrote:
> On Thu, 17 Jan 2008 12:48:28 +0300 Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:
>> FUJITA Tomonori wrote:
>>> On Thu, 17 Jan 2008 10:27:08 +0100 Bart Van Assche [EMAIL PROTECTED] wrote:
>>>> Hello,
>>>>
>>>> I have performed a test to compare the performance of SCST and STGT. Apparently the SCST target implementation performed far better than the STGT target implementation. This makes me wonder whether this is due to the design of SCST or whether STGT's performance can be improved to the level of SCST ?
>>>>
>>>> Test performed: read 2 GB of data in blocks of 1 MB from a target (hot cache -- no disk reads were performed, all reads were from the cache).
>>>> Test command: time dd if=/dev/sde of=/dev/null bs=1M count=2000
>>>>
>>>>                               STGT read            SCST read
>>>>                           performance (MB/s)   performance (MB/s)
>>>> Ethernet (1 Gb/s network)        77                   89
>>>> IPoIB (8 Gb/s network)           82                  229
>>>> SRP (8 Gb/s network)            N/A                  600
>>>> iSER (8 Gb/s network)            80                  N/A
>>>>
>>>> These results show that SCST uses the InfiniBand network very well (effectivity of about 88% via SRP), but that the current STGT version is unable to transfer data faster than 82 MB/s. Does this mean that there is a severe bottleneck present in the current STGT implementation ?
>>>
>>> I don't know about the details but Pete said that he can achieve more than 900MB/s read performance with tgt iSER target using ramdisk.
>>>
>>> http://www.mail-archive.com/[EMAIL PROTECTED]/msg4.html
>>
>> Please don't confuse multithreaded latency insensitive workload with single threaded, hence latency sensitive one.
>
> Seems that he can get good performance with single threaded workload:
>
> http://www.osc.edu/~pw/papers/wyckoff-iser-snapi07-talk.pdf
>
> But I don't know about the details so let's wait for Pete to comment on this.
>
> Perhaps Voltaire people could comment on the tgt iSER performances.

We didn't run any real performance test with tgt, so I don't have numbers yet. I know that Pete got ~900 MB/sec by hacking sgp_dd, so all data was read/written to the same block (so it was all done in the cache). Pete - am I right?

As already mentioned, he got that with IB SDR cards that are 10 Gb/sec cards in theory (actual speed is ~900 MB/sec). With DDR cards (20 Gb/sec), you can get even more. I plan to test that in the near future.

Erez
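For readers who want to relate the SDR and DDR figures quoted above to MB/s, here is a small Python sketch of the arithmetic. It relies on background knowledge that is not stated in this thread: SDR and DDR InfiniBand links use 8b/10b line encoding, so only 8 of every 10 signalled bits carry data. That would also match the "8 Gb/s network" label used in the results table. The function name is made up for illustration, and the result is a theoretical ceiling, not a measurement.

```python
def max_data_rate_mb_s(signalling_gbit_s, data_bits=8, line_bits=10):
    """Upper bound on data throughput in MB/s (1 MB = 10**6 bytes).

    Assumes 8b/10b line encoding by default (data_bits out of line_bits
    carry payload). Hypothetical helper for illustration only.
    """
    data_mbit_s = signalling_gbit_s * 1000 * data_bits // line_bits
    return data_mbit_s // 8  # 8 bits per byte


sdr_limit = max_data_rate_mb_s(10)  # SDR 4x: 10 Gb/s signalling
ddr_limit = max_data_rate_mb_s(20)  # DDR 4x: 20 Gb/s signalling

print(f"SDR 4x ceiling: {sdr_limit} MB/s")
print(f"DDR 4x ceiling: {ddr_limit} MB/s")
print(f"~900 MB/s is {900 / sdr_limit:.0%} of the SDR ceiling")
```

Under these assumptions, the ~900 MB/sec reported for SDR is close to what the link can carry at all, which is why DDR cards are the obvious next thing to try.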
Re: Performance of SCST versus STGT
>> We didn't run any real performance test with tgt, so I don't have numbers yet. I know that Pete got ~900 MB/sec by hacking sgp_dd, so all data was read/written to the same block (so it was all done in the cache). Pete - am I right?
>>
>> As already mentioned, he got that with IB SDR cards that are 10 Gb/sec cards in theory (actual speed is ~900 MB/sec). With DDR cards (20 Gb/sec), you can get even more. I plan to test that in the near future.
>
> Are you writing about a maximum possible speed which he got, including multithreaded tests with many outstanding commands or about speed he got on single threaded reads with one outstanding command? This thread is about the second one.

As I said, we didn't run any performance tests on stgt yet.

Erez
Re: usage of max_sectors in scsi_host_template
Stefan Richter wrote:
> Erez Zilber wrote:
>> I'm not sure that I understand the meaning of max_sectors in scsi_host_template.
>
> Did you have a look at scsi_mid_low_api.txt?
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/scsi/scsi_mid_low_api.txt;h=6f70f2b9327e1f0db7bc05bdbf2d6ce3b2fcbdcf#l1232

I will go over it. Thanks for the link.

>> Is it the maximum data length of a single SCSI command?
>
> Yes.
>
>> Is it in bytes?
>
> No, it is in units of 512 bytes.
>
>> What's the size of a sector?
>
> Usually 512 bytes according to above doc. Always 512 bytes from the point of view of block/ll_rw_blk.c::blk_queue_max_sectors().
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=block/ll_rw_blk.c;h=75c98d58f4ddf7252e2717e0924b9d6a8925b4e5#l590

So, ll_rw_blk actually uses the max_sectors value to chop requests larger than max_sectors. Am I right?

If yes, I have a problem: I'm running sgp_dd (on RHAS 4 up4 - kernel version is 2.6.9), so it calls scsi-ml directly (without going through ll_rw_blk). I ran it with the following parameters:

sgp_dd bs=512 of=/dev/null if=/dev/sg1 bpt=2048 thr=4 time=1 count=100k deb=9

I see that a single 1MB command is generated. Here's the debug info from sgp_dd:

sgp_dd: if=/dev/sg1 skip=0 of=/dev/null seek=0 count=102400
Start of loop, count=102400, in_num_sect=0, out_num_sect=0
Starting worker thread k=0
sg_start_io: SCSI READ, blk=0 num_blks=2048
  Read (10) [28 00 00 00 00 00 00 08 00 00 ]
  dir=-3, len=1048576, dxfrp=0x2a9558a000, cmd_len=10

Now, the low-level driver below scsi-ml is open-iscsi over iSER. max_sectors is set to 1024 (i.e. 512 kB). Still, the iSER driver receives a 1MB command. I guess that the max_sectors value is never used. Am I right?

Thanks,
Erez
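The mismatch described above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it decodes the transfer length from the READ(10) CDB shown in the sgp_dd debug output (bytes 7 and 8 of a READ(10) CDB hold the transfer length in blocks, big-endian) and compares the resulting byte count with the limit implied by max_sectors = 1024. The function and variable names are made up for illustration; the 512-byte block size comes from bs=512 on the command line and from the 512-byte sector unit mentioned in the reply.

```python
SECTOR_SIZE = 512  # bytes; max_sectors is counted in 512-byte units


def read10_transfer_blocks(cdb):
    """Return the transfer length in blocks from a READ(10) CDB.

    Hypothetical helper for illustration only. READ(10) has opcode 0x28
    and is 10 bytes long; bytes 7..8 carry the block count.
    """
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return int.from_bytes(cdb[7:9], "big")


# CDB copied from the sgp_dd debug output above.
cdb = bytes.fromhex("28 00 00 00 00 00 00 08 00 00")

blocks = read10_transfer_blocks(cdb)
request_bytes = blocks * SECTOR_SIZE  # bs=512
limit_bytes = 1024 * SECTOR_SIZE      # max_sectors = 1024 from the host template

print(f"command asks for {blocks} blocks = {request_bytes} bytes")
print(f"max_sectors allows {limit_bytes} bytes")
print("exceeds limit:", request_bytes > limit_bytes)
```

The decoded request is twice the size that max_sectors would allow, which is consistent with the observation that the limit is not being applied on this path.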
usage of max_sectors in scsi_host_template
I'm not sure that I understand the meaning of max_sectors in scsi_host_template. Is it the maximum data length of a single SCSI command? Is it in bytes? What's the size of a sector?

Thanks,

-- 
Erez Zilber | 972-9-971-7689
Software Engineer, Storage Solutions
Voltaire – _The Grid Backbone_
http://www.voltaire.com/