Re: iSCSI throughput drops as link rtt increases?

2010-01-07 Thread Jack Z
Hi Pasi,

Thank you very much for your help. I really appreciate it!

On Jan 5, 12:58 pm, Pasi Kärkkäinen pa...@iki.fi wrote:

 On Tue, Jan 05, 2010 at 02:05:03AM -0800, Jack Z wrote:
   Try using some benchmarking tool that can do multiple outstanding IOs..
   for example ltp disktest.
  And I tried ltp disktest, too. But I'm not sure whether I used it
  right because the result was a little surprising...
  I did
  disktest -w -S0:1k -B 1024 /dev/sdb
  (/dev/sdb is the iSCSI device file, no partition or file system on it)
  And the result was:
  | 2010/01/05-02:58:26 | START | 27293 | v1.4.2 | /dev/sdb | Start
  args: -w -S0:1024k -B 1024 -PA (-I b) (-N 8385867) (-K 4) (-c) (-p R)
  (-L 1048577) (-D 0:100) (-t 0:2m) (-o 0)
  | 2010/01/05-02:58:26 | INFO  | 27293 | v1.4.2 | /dev/sdb | Starting
  pass
  ^C| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
  bytes written in 85578 transfers: 87631872
  | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
  write throughput: 701055.0B/s (0.67MB/s), IOPS 684.6/s.
  | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
  Write Time: 125 seconds (0d0h2m5s)
  | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
  overall runtime: 152 seconds (0d0h2m32s)
  | 2010/01/05-03:00:58 | END   | 27293 | v1.4.2 | /dev/sdb | User
  Interrupt: Test Done (Passed)
  As you can see, the throughput was only 0.67MB/s and only 87631872
  bytes were written in 85578 transfers...
  I also tweaked the options with -p l and/or -I bd (change seek
  pattern to linear and/or specify IO type as block and direct IO) but
  no improvement happened...
 Hmm.. so it does 684 IO operations per second (IOPS), and each IO was 1k
 in size, so it makes 684 kB/sec of throughput.
 1000 milliseconds (1 second) divided by 684 IOPS is 1.46 milliseconds per IO..
 Are you sure you had 16ms of rtt?


Actually that was probably the output from 0.2 ms rtt instead of 16
ms... I'm sorry for the mistake. I tried the same command again at a
16 ms RTT, and the IOPS was mostly around 150.

 Try to play and experiment with these options:

 -B 64k  (blocksize 64k, try also 4k)
 -I BD (block device, direct IO (O_DIRECT))
 -K 16 (16 threads, aka 16 outstanding IOs. -K 1 should be the same as dd)

 Examples:

 Sequential (linear) reads using blocksize 4k and 4 simultaneous threads, for 
 60 seconds:
 disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 60 -r /dev/sdX

 Random writes:

 disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 60 -w /dev/sdX

 30% random reads, 70% random writes:
 disktest -r -w -D30:70 -K2 -E32 -B 8k -T 60 -pR -Ibd -PA /dev/md4

 Hopefully that helps..


That did help! I tried the following combinations of -B -K and -p at
20 ms RTT and the other options were -h 30 -I BD -P T -S0:(1 GB size)

-B 4k/64k -K 4/64 -p l

When I put -p l there the performance went down
drastically...

-B 4k -K 4/64 -p r

The disk throughput was similar to what I got with the command in my
previous post (disktest -w -S0:1k -B 1024 /dev/sdb), and it was much
lower than dd could get.

-B 64k -K 4 -p r

The disk throughput was higher than the last one but still not as high
as dd could get.

-B 64k -K 64 -p r

The disk throughput was boosted to 8.06 MB/s and the IOPS was 129.0.
At the link layer, the traffic rate was 70.536 Mbps (the TCP baseline
was 96.202 Mbps). At the same time, dd (bs=64K count=(1 GB size)) got
a throughput of 6.7 MB/s and the traffic rate on the link layer was
57.749 Mbps.
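
In other words, the best case above corresponds to roughly this command
line (the -S0: value is just whatever block count adds up to 1 GB on the
device):

disktest -w -B 64k -K 64 -p r -h 30 -I BD -P T -S0:(1 GB size) /dev/sdb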

Although not much, it was still an improvement :) and it was the first
improvement I have ever seen since I started my experiments! Thank you
very much!

As for
 Oh, also make sure you have 'oflag=direct' for dd.

The result was surprisingly low again... Do you think the reason might
be that I was running dd on a device file (/dev/sdb), which did not
have any partitions/file systems on it?
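
For completeness, the direct-IO run was a plain dd onto the device, along
these lines (the block size and count here are only an example, the exact
values may have differed):

dd if=/dev/zero of=/dev/sdb bs=64K count=16384 oflag=direct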

Thanks a lot!

Jack





Re: iSCSI throughput drops as link rtt increases?

2010-01-07 Thread Ulrich Windl
On 4 Jan 2010 at 6:54, Jack Z wrote:

 Hi all,
 
 I was testing the performance of open-iscsi initiator with IET target
 over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
 raw disk sequential write by
 
 $ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576
 
 , in which /dev/sdb is the iSCSI device. I also measured TCP
 throughput using iperf with the default setup except -n 1024M. And I
 got the following data on iSCSI throughput and TCP throughput v.s. rtt
 
 rtt (ms)   iSCSI throughput by dd (MB/s)   TCP throughput by iperf (Mbit/s)
 0.2        11.3                            94.3
 4          11.1                            94.3
 8          10.2                            94.3
 12          8.6                            94.2
 16          7.2                            94.2
 20          6.0                            94.1
 
 local disk throughput by dd was 26.7 MB/s.
 
 As shown in the table above, iSCSI throughput declined rapidly with
 rtt increased from 0.2ms to 20ms. TCP throughput, however, only
 dropped less than 1 percent.

From what I know, the (estimated) RTT (Round Trip Time) increases if a
link problem (e.g. lost packets) is detected (if other parameters are
unchanged).

 
 Then I used Wireshark to grab the traces of iSCSI and iperf and I
 found lots of iSCSI PDUs were divided into TCP segments of 1448 bytes
 but with iperf TCP segments could be as large as 65000+ bytes.

How would you transport such a segment unfragmented?

 
 I first thought this was because of the small default value (8192) for
 MaxRecvDataSegmentLength. So I increased that value to 262144. But in
 a later test with 16ms rtt, I found the iSCSI throughput was only
 improved by 0.7 MB/s and a lot of iSCSI PDUs were still divided into
 1448 byte long TCP segments... So I think MaxRecvDataSegmentLength may
 not be the reason.

I think the question is how big the TCP receive window will be.


 
 I also skimmed through the iSCSI specification, but it seemed no luck
 there either...
 
 I know the Ethernet MTU is 1500 byte long and that might be the reason
 of the 1448 byte TCP segments, but iperf did get to send much larger
 TCP segments of 65000+ bytes...

over which layer 2?

 
 So does anyone have any idea about this: why iSCSI is not fully
 utilizing the bandwidth on long rtt links by increasing the TCP
 segment size?

Sorry, but I think utilizing a high-delay connection works via increasing
the window size (i.e. the number of packets in flight), not the size of the
segments. Both would be valid, but due to layer 2 and layer 3 restrictions
(ISO OSI talk), only sending more packets while waiting for an answer is a
valid approach (unless you have a dedicated single-hop line).
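
For a rough idea of the numbers on this particular link: at 100 Mbit/s and
20 ms RTT the bandwidth-delay product is

  (100 Mbit/s / 8) * 0.020 s = 12.5 MB/s * 0.020 s = 250 kB

so something on the order of 250 kB has to be in flight (TCP window and/or
outstanding iSCSI data) to keep the pipe full.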

Regards,
Ulrich





Re: iSCSI throughput drops as link rtt increases?

2010-01-07 Thread Pasi Kärkkäinen
On Wed, Jan 06, 2010 at 11:59:37PM -0800, Jack Z wrote:
 Hi Pasi,
 
 Thank you very much for your help. I really appreciate it!
 
 On Jan 5, 12:58 pm, Pasi Kärkkäinen pa...@iki.fi wrote:
  On Tue, Jan 05, 2010 at 02:05:03AM -0800, Jack Z wrote:
 
 
Try using some benchmarking tool that can do multiple outstanding IOs..
for example ltp disktest.
 
   And I tried ltp disktest, too. But I'm not sure whether I used it
   right because the result was a little surprising...
 
   I did
 
   disktest -w -S0:1k -B 1024 /dev/sdb
 
   (/dev/sdb is the iSCSI device file, no partition or file system on it)
 
   And the result was:
 
   | 2010/01/05-02:58:26 | START | 27293 | v1.4.2 | /dev/sdb | Start
   args: -w -S0:1024k -B 1024 -PA (-I b) (-N 8385867) (-K 4) (-c) (-p R)
   (-L 1048577) (-D 0:100) (-t 0:2m) (-o 0)
   | 2010/01/05-02:58:26 | INFO  | 27293 | v1.4.2 | /dev/sdb | Starting
   pass
   ^C| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
   bytes written in 85578 transfers: 87631872
   | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
   write throughput: 701055.0B/s (0.67MB/s), IOPS 684.6/s.
   | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
   Write Time: 125 seconds (0d0h2m5s)
   | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
   overall runtime: 152 seconds (0d0h2m32s)
   | 2010/01/05-03:00:58 | END   | 27293 | v1.4.2 | /dev/sdb | User
   Interrupt: Test Done (Passed)
 
    As you can see, the throughput was only 0.67MB/s and only 87631872
    bytes were written in 85578 transfers...
    I also tweaked the options with -p l and/or -I bd (change seek
    pattern to linear and/or specify IO type as block and direct IO) but
    no improvement happened...
 
  Hmm.. so it does 684 IO operations per second (IOPS), and each IO was 1k
  in size, so it makes 684 kB/sec of throughput.
 
  1000 milliseconds (1 second) divided by 684 IOPS is 1.46 milliseconds per 
  IO..
 
  Are you sure you had 16ms of rtt?
 
 Actually that was probably the output from 0.2 ms rtt instead of 16
 ms... I'm sorry for the mistake. I tried again the same command on a
 16ms RTT, and the IOPS was mostly around 180.
 

1000ms divided by 16ms rtt gives you 62.5 synchronous IOPS max.
So that means you had about 3 outstanding IOs running, since you
got 180 IOPS.

If I'm still following everything correctly :)
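
Written out, that estimate is simply:

  1000 ms / 16 ms rtt = 62.5 IOs per second with a single IO in flight
  180 IOPS / 62.5     = ~2.9, i.e. roughly 3 IOs in flight on average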

 
  Try to play and experiment with these options:
 
  -B 64k  (blocksize 64k, try also 4k)
  -I BD (block device, direct IO (O_DIRECT))
  -K 16 (16 threads, aka 16 outstanding IOs. -K 1 should be the same as dd)
 
  Examples:
 
  Sequential (linear) reads using blocksize 4k and 4 simultaneous threads, 
  for 60 seconds:
  disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 60 -r /dev/sdX
 
  Random writes:
 
  disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 60 -w /dev/sdX
 
  30% random reads, 70% random writes:
  disktest -r -w -D30:70 -K2 -E32 -B 8k -T 60 -pR -Ibd -PA /dev/md4
 
  Hopefully that helps..
 
 That did help. I tried the following combinations of -B -K and -p at
 20 ms RTT and the other options were -h 30 -I BD -P T -S0:(1 GB size)
 
 -B 4k/64k -K 4/64 -p l
 
 It seems that when I put -p l there the performance goes down
 drastically...
 

That's really weird.. linear/sequential (-p l) should always be faster
than random.

 -B 4k -K 4/64 -p r
 
 The disk throughput is similar to the one I used in the previous post
 disktest -w -S0:1k -B 1024 /dev/sdb  and it's much lower than dd
 could get.
 

like said, weird.

 -B 64k -K 4 -p r
 
 The disk throughput is higher than the last one but still not as high
 as dd could get.
 
 -B 64k -K 64 -p r
 
 The disk throughput was boosted to 8.06 MB/s and the IOPS was 129.0.
 At the link layer, the traffic rate was 70.536 Mbps (the TCP baseline
 was 96.202 Mbps). At the same time, dd ( bs=64K count=(1 GB size)) got
 a throughput of 6.7 MB/s and the traffic rate on the link layer was
 57.749 Mbps.
 

Ok.

129 IOPS * 64kB = 8256 kB/sec, which pretty much matches the 8 MB/sec
you measured.

This still means there was only 1 outstanding IO.. and definitely not 64
(-K 64).

 Although not much, it was still an improvement and it was the first
 improvement I have ever seen since I started my experiments! Thank you
 very much!
 
 As for
 
  Oh, also make sure you have 'oflag=direct' for dd.
 
 The result was surprisingly low again... Do you think the reason might
 be that I was running dd on a device file (/dev/sdb), which did not
 have any partitions/file systems on it?
 
 Thanks a lot!
 

oflag=direct makes dd use O_DIRECT, aka bypass all kernel/initiator caches for 
writing.
iflag=direct would bypass all caches for reading.

It shouldn't matter if you write or read from /dev/sda1 instead of /dev/sda. 
As long as it's a raw block device, it shouldn't matter.
If you write/read to/from a filesystem, that obviously matters.
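
For clarity, the two variants look like this (sdX and the count are just
placeholders):

dd if=/dev/zero of=/dev/sdX bs=64k count=16384 oflag=direct   # direct write, bypasses the page cache
dd if=/dev/sdX of=/dev/null bs=64k count=16384 iflag=direct   # direct read, bypasses the page cache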

What kind of target you are using for this benchmark? 

-- Pasi


Re: iSCSI throughput drops as link rtt increases?

2010-01-07 Thread Jack Z
Hi Ulrich,

Thanks for your reply!


  I was testing the performance of open-iscsi initiator with IET target
  over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
  raw disk sequential write by

  $ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576

  , in which /dev/sdb is the iSCSI device. I also measured TCP
  throughput using iperf with the default setup except -n 1024M. And I
  got the following data on iSCSI throughput and TCP throughput v.s. rtt

  rtt (ms)   iSCSI throughput by dd (MB/s)   TCP throughput by iperf (Mbit/s)
  0.2        11.3                            94.3
  4          11.1                            94.3
  8          10.2                            94.3
  12          8.6                            94.2
  16          7.2                            94.2
  20          6.0                            94.1

  local disk throughput by dd was 26.7 MB/s.

  As shown in the table above, iSCSI throughput declined rapidly with
  rtt increased from 0.2ms to 20ms. TCP throughput, however, only
  dropped less than 1 percent.

 From what I know the (estimated) RTT (Round Trip Time) increases if a link 
 problem
 (i.e. lost packets) was detected (if other parameters are unchanged).

As explained at the beginning of my first thread, I was doing an
experiment. And the experiment was done on two laptops over a straight-
through cable. The RTT was increased intentionally, as I was measuring
the iSCSI performance against RTT changes. The other parameters of the
link, such as packet loss etc, were not changed and no packet loss was
observed when using ping over the link.
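
(In case it helps: the delay was emulated in software. One common way to
do this on Linux is netem, e.g.

tc qdisc add dev eth0 root netem delay 10ms

run on each host, which adds about 10 ms in each direction for a ~20 ms
round trip. The interface name and delay value here are only an
illustration, not necessarily the exact setup used.)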

  Then I used Wireshark to grab the traces of iSCSI and iperf and I
  found lots of iSCSI PDUs were divided into TCP segments of 1448 bytes
  but with iperf TCP segments could be as large as 65000+ bytes.

 How would you transport such a segment unfragmented?

  I also skimmed through the iSCSI specification, but it seemed no luck
  there either...

  I know the Ethernet MTU is 1500 byte long and that might be the reason
  of the 1448 byte TCP segments, but iperf did get to send much larger
  TCP segments of 65000+ bytes...

 over which layer 2?

As Mike suggested in his reply, this could be a jumbo frame. The
following is the data of a 65160-byte packet captured by Wireshark:

No.  Time      Source    S_Port  Destination  D_Port  Protocol  Info
266  0.137810  10.0.0.1  56099   10.0.0.2     5001    TCP       56099 > 5001 [ACK] Seq=376505 Ack=1 Win=92 Len=65160
[Packet size limited during capture]

Frame 266 (65226 bytes on wire, 58 bytes captured)
Arrival Time: Jan  4, 2010 04:44:33.711762000
[Time delta from previous captured frame: 0.000206000 seconds]
[Time delta from previous displayed frame: 0.002861000 seconds]
[Time since reference or first frame: 0.13781 seconds]
Frame Number: 266
Frame Length: 65226 bytes
Capture Length: 58 bytes
[Frame is marked: True]
[Protocols in frame: eth:ip:tcp]
[Coloring Rule Name: TCP]
[Coloring Rule String: tcp]
Ethernet II, Src: HonHaiPr_0f:35:65 , Dst: Ibm_8d:59:02
Destination: Ibm_8d:59:02
Address: Ibm_8d:59:02
 ...0     = IG bit: Individual address
(unicast)
 ..0.     = LG bit: Globally unique
address (factory default)
Source: HonHaiPr_0f:35:65
Address: HonHaiPr_0f:35:65
 ...0     = IG bit: Individual address
(unicast)
 ..0.     = LG bit: Globally unique
address (factory default)
Type: IP (0x0800)
Internet Protocol, Src: 10.0.0.1 (10.0.0.1), Dst: 10.0.0.2 (10.0.0.2)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN:
0x00)
 00.. = Differentiated Services Codepoint: Default (0x00)
 ..0. = ECN-Capable Transport (ECT): 0
 ...0 = ECN-CE: 0
Total Length: 65212
Identification: 0x8729 (34601)
Flags: 0x04 (Don't Fragment)
0... = Reserved bit: Not set
.1.. = Don't fragment: Set
..0. = More fragments: Not set
Fragment offset: 0
Time to live: 64
Protocol: TCP (0x06)
Header checksum: 0xa10f [correct]
[Good: True]
[Bad : False]
Source: 10.0.0.1 (10.0.0.1)
Destination: 10.0.0.2 (10.0.0.2)
Transmission Control Protocol, Src Port: 56099 (56099), Dst Port:
commplex-link (5001), Seq: 376505, Ack: 1, Len: 65160
Source port: 56099 (56099)
Destination port: commplex-link (5001)
[Stream index: 0]
Sequence number: 376505(relative sequence number)
[Next sequence number: 441665(relative sequence number)]
Acknowledgement number: 1(relative ack number)
Header length: 32 bytes
Flags: 0x10 (ACK)
0...  = Congestion Window Reduced (CWR): Not set
.0..  = ECN-Echo: Not set
..0.  = Urgent: Not set
...1  = Acknowledgement: Set
 0... = Push: Not set
 .0.. = Reset: Not set
 ..0. = Syn: Not set
 ...0 = Fin: Not set
 

Re: iSCSI throughput drops as link rtt increases?

2010-01-06 Thread Jack Knight
Hi Pasi,

Thank you very much for your reply.

  I was testing the performance of open-iscsi initiator with IET target
  over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
  raw disk sequential write by

  $ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576

 Did you also try with bigger block sizes? 1k blocks are pretty small.

 try bs=1024k to see if it makes a difference.


I tried bs = 1024k and the throughput improved, but not much... It
went from 7.2 MB/s to 8.0 MB/s at an rtt of 16 ms. And again, over 90% of
the TCP segments on the wire were only 1448 bytes...


 dd will use only one outstanding IO, so you have to wait for rtt
 milliseconds after every IO for the ack.. so that definitely slows you
 down a lot when rtt gets bigger.

 Try using some benchmarking tool that can do multiple outstanding IOs..
 for example ltp disktest.


And I tried ltp disktest, too. But I'm not sure whether I used it
right because the result was a little surprising...

I did

disktest -w -S0:1k -B 1024 /dev/sdb

(/dev/sdb is the iSCSI device file, no partition or file system on it)

And the result was:

| 2010/01/05-02:58:26 | START | 27293 | v1.4.2 | /dev/sdb | Start
args: -w -S0:1024k -B 1024 -PA (-I b) (-N 8385867) (-K 4) (-c) (-p R)
(-L 1048577) (-D 0:100) (-t 0:2m) (-o 0)
| 2010/01/05-02:58:26 | INFO  | 27293 | v1.4.2 | /dev/sdb | Starting
pass
^C| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
bytes written in 85578 transfers: 87631872
| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
write throughput: 701055.0B/s (0.67MB/s), IOPS 684.6/s.
| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
Write Time: 125 seconds (0d0h2m5s)
| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
overall runtime: 152 seconds (0d0h2m32s)
| 2010/01/05-03:00:58 | END   | 27293 | v1.4.2 | /dev/sdb | User
Interrupt: Test Done (Passed)

As you can see, the throughput was only 0.67MB/s and only 87631872
bytes were written in 85578 transfers...
I also tweaked the options with -p l and/or -I bd (change seek
pattern to linear and/or specify IO type as block and direct IO) but
no improvement happened...

There must be something I've done wrong... Could you maybe help me out
here?

Thanks a lot!

jack




Re: iSCSI throughput drops as link rtt increases?

2010-01-06 Thread Mike Christie

On 01/04/2010 08:54 AM, Jack Z wrote:

Then I used Wireshark to grab the traces of iSCSI and iperf and I
found lots of iSCSI PDUs were divided into TCP segments of 1448 bytes
but with iperf TCP segments could be as large as 65000+ bytes.

I first thought this was because of the small default value (8192) for
MaxRecvDataSegmentLength. So I increased that value to 262144. But in
a later test with 16ms rtt, I found the iSCSI throughput was only
improved by 0.7 MB/s and a lot of iSCSI PDUs were still divided into
1448 byte long TCP segments... So I think MaxRecvDataSegmentLength may
not be the reason.



Are you using jumbo frames?




Re: iSCSI throughput drops as link rtt increases?

2010-01-06 Thread Jack Z
Hi Mike,

I use the default configuration of open-iscsi initiator and IET,
except I change the NOP interval to 500s and the
MaxRecvDataSegmentLength of IET to 262144 (default value is 8192).
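
If it helps, the relevant lines in the two configs look roughly like this
(I am quoting the parameter names from memory, and the target name and LUN
path below are placeholders):

# /etc/iscsi/iscsid.conf (open-iscsi initiator)
node.conn[0].timeo.noop_out_interval = 500
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144

# ietd.conf (IET target)
Target iqn.2010-01.com.example:disk0
    Lun 0 Path=/dev/sdX,Type=blockio
    MaxRecvDataSegmentLength 262144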

And the network I'm using is a straight-through cable between two
laptops. I'm not sure whether the NICs support or choose to use jumbo
frames... But as you can see in my previous post, the TCP segments of
iperf traffic were mostly as large as 65000+ bytes but over 90% of the
iSCSI ones were only 1448 bytes. So I thought that TCP did support
large segments but it seemed that iSCSI or TCP chose not to use the
large ones but went with the small ones for some reason...

Do you think there might be some configurations I can play with to
change this?

Thanks a lot!

Jack!



On Jan 6, 12:03 pm, Mike Christie micha...@cs.wisc.edu wrote:
 On 01/04/2010 08:54 AM, Jack Z wrote:

  Then I used Wireshark to grab the traces of iSCSI and iperf and I
  found lots of iSCSI PDUs were divided into TCP segments of 1448 bytes
  but with iperf TCP segments could be as large as 65000+ bytes.

  I first thought this was because of the small default value (8192) for
  MaxRecvDataSegmentLength. So I increased that value to 262144. But in
  a later test with 16ms rtt, I found the iSCSI throughput was only
  improved by 0.7 MB/s and a lot of iSCSI PDUs were still divided into
  1448 byte long TCP segments... So I think MaxRecvDataSegmentLength may
  not be the reason.

 Are you using jumbo frames?




Re: iSCSI throughput drops as link rtt increases?

2010-01-05 Thread Jack Z
Hi Pasi,

Thank you very much for your reply.

  I was testing the performance of open-iscsi initiator with IET target
  over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
  raw disk sequential write by

  $ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576

 Did you also try with bigger block sizes? 1k blocks are pretty small.

 try bs=1024k to see if it makes a difference.


I tried bs = 1024k and the throughput improved, but not much... It
went from 7.2 MB/s to 8.0 MB/s at an rtt of 16 ms. And again, over 90% of
the TCP segments on the wire were only 1448 bytes...


 dd will use only one outstanding IO, so you have to wait for rtt
 milliseconds after every IO for the ack.. so that definitely slows you
 down a lot when rtt gets bigger.

 Try using some benchmarking tool that can do multiple outstanding IOs..
 for example ltp disktest.


And I tried ltp disktest, too. But I'm not sure whether I used it
right because the result was a little surprising...

I did

disktest -w -S0:1k -B 1024 /dev/sdb

(/dev/sdb is the iSCSI device file, no partition or file system on it)

And the result was:

| 2010/01/05-02:58:26 | START | 27293 | v1.4.2 | /dev/sdb | Start
args: -w -S0:1024k -B 1024 -PA (-I b) (-N 8385867) (-K 4) (-c) (-p R)
(-L 1048577) (-D 0:100) (-t 0:2m) (-o 0)
| 2010/01/05-02:58:26 | INFO  | 27293 | v1.4.2 | /dev/sdb | Starting
pass
^C| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
bytes written in 85578 transfers: 87631872
| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
write throughput: 701055.0B/s (0.67MB/s), IOPS 684.6/s.
| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
Write Time: 125 seconds (0d0h2m5s)
| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
overall runtime: 152 seconds (0d0h2m32s)
| 2010/01/05-03:00:58 | END   | 27293 | v1.4.2 | /dev/sdb | User
Interrupt: Test Done (Passed)

As you can see, the throughput was only 0.67MB/s and only 87631872
bytes were written in 85578 transfers...
I also tweaked the options with -p l and/or -I bd (change seek
pattern to linear and/or specify IO type as block and direct IO) but
no improvement happened...

There must be something I've done wrong... Could you maybe help me out
here?

Thanks a lot!

jack





Re: iSCSI throughput drops as link rtt increases?

2010-01-05 Thread Pasi Kärkkäinen
On Tue, Jan 05, 2010 at 02:05:03AM -0800, Jack Z wrote:
 Hi Pasi,
 
 Thank you very much for your reply.
 
   I was testing the performance of open-iscsi initiator with IET target
   over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
   raw disk sequential write by
 
   $ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576
 
  Did you also try with bigger block sizes? 1k blocks are pretty small.
 
  try bs=1024k to see if it makes a difference.
 
 
 I tried bs = 1024k and the throughput is improved, but not much... It
 goes from 7.2MB/s to 8.0MB/s at a rtt of 16ms. And again, over 90% of
 the TCP segments on the wire was only of 1448 bytes...


Ok..


 
  dd will use only one outstanding IO, so you have to wait for rtt
  milliseconds after every IO for the ack.. so that definitely slows you
  down a lot when rtt gets bigger.
 
  Try using some benchmarking tool that can do multiple outstanding IOs..
  for example ltp disktest.
 
 
 And I tried ltp disktest, too. But I'm not sure whether I used it
 right because the result was a little surprising...
 
 I did
 
 disktest -w -S0:1k -B 1024 /dev/sdb
 
 (/dev/sdb is the iSCSI device file, no partition or file system on it)
 
 And the result was:
 
 | 2010/01/05-02:58:26 | START | 27293 | v1.4.2 | /dev/sdb | Start
 args: -w -S0:1024k -B 1024 -PA (-I b) (-N 8385867) (-K 4) (-c) (-p R)
 (-L 1048577) (-D 0:100) (-t 0:2m) (-o 0)
 | 2010/01/05-02:58:26 | INFO  | 27293 | v1.4.2 | /dev/sdb | Starting
 pass
 ^C| 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
 bytes written in 85578 transfers: 87631872
 | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
 write throughput: 701055.0B/s (0.67MB/s), IOPS 684.6/s.
 | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
 Write Time: 125 seconds (0d0h2m5s)
 | 2010/01/05-03:00:58 | STAT  | 27293 | v1.4.2 | /dev/sdb | Total
 overall runtime: 152 seconds (0d0h2m32s)
 | 2010/01/05-03:00:58 | END   | 27293 | v1.4.2 | /dev/sdb | User
 Interrupt: Test Done (Passed)
 
 As you can see, the throughput was only 0.67MB/s and only 87631872
 bytes were written in 85578 transfers...
 I also tweaked the options with -p l and/or -I bd (change seek
 pattern to linear and/or specify IO type as block and direct IO) but
 no improvement happened...
 

Hmm.. so it does 684 IO operations per second (IOPS), and each IO was 1k
in size, so it makes 684 kB/sec of throughput.

1000 milliseconds (1 second) divided by 684 IOPS is 1.46 milliseconds per IO..

Are you sure you had 16ms of rtt? 

 There must be something I've done wrong... Could you maybe help me out
 here?
 
 Thanks a lot!
 

Try to play and experiment with these options:

-B 64k  (blocksize 64k, try also 4k)
-I BD (block device, direct IO (O_DIRECT))
-K 16 (16 threads, aka 16 outstanding IOs. -K 1 should be the same as dd)

Examples:

Sequential (linear) reads using blocksize 4k and 4 simultaneous threads, for 60 
seconds:
disktest -B 4k -h 1 -I BD -K 4 -p l -P T -T 60 -r /dev/sdX

Random writes:
disktest -B 4k -h 1 -I BD -K 4 -p r -P T -T 60 -w /dev/sdX

30% random reads, 70% random writes:
disktest -r -w -D30:70 -K2 -E32 -B 8k -T 60 -pR -Ibd -PA /dev/md4
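
For the sequential write case being chased in this thread, the analogous
run would presumably be:

disktest -B 64k -h 1 -I BD -K 16 -p l -P T -T 60 -w /dev/sdX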

Hopefully that helps..

-- Pasi





Re: iSCSI throughput drops as link rtt increases?

2010-01-05 Thread Pasi Kärkkäinen
On Tue, Jan 05, 2010 at 09:58:09PM +0200, Pasi Kärkkäinen wrote:
 On Tue, Jan 05, 2010 at 02:05:03AM -0800, Jack Z wrote:
  Hi Pasi,
  
  Thank you very much for your reply.
  
I was testing the performance of open-iscsi initiator with IET target
over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
raw disk sequential write by
  
$ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576
  
   Did you also try with bigger block sizes? 1k blocks are pretty small.
  
   try bs=1024k to see if it makes a difference.
  
  
  I tried bs = 1024k and the throughput is improved, but not much... It
  goes from 7.2MB/s to 8.0MB/s at a rtt of 16ms. And again, over 90% of
  the TCP segments on the wire was only of 1448 bytes...
 
 

Oh, also make sure you have 'oflag=direct' for dd.

-- Pasi





iSCSI throughput drops as link rtt increases?

2010-01-04 Thread Jack Z
Hi all,

I was testing the performance of open-iscsi initiator with IET target
over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
raw disk sequential write by

$ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576

, in which /dev/sdb is the iSCSI device. I also measured TCP
throughput using iperf with the default setup except -n 1024M. And I
got the following data on iSCSI throughput and TCP throughput v.s. rtt

rtt (ms)   iSCSI throughput by dd (MB/s)   TCP throughput by iperf (Mbit/s)
0.2        11.3                            94.3
4          11.1                            94.3
8          10.2                            94.3
12          8.6                            94.2
16          7.2                            94.2
20          6.0                            94.1

local disk throughput by dd was 26.7 MB/s.

As shown in the table above, iSCSI throughput declined rapidly as rtt
increased from 0.2 ms to 20 ms. TCP throughput, however, dropped by less
than 1 percent.
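
For reference, the TCP baseline was collected with iperf defaults plus
-n 1024M, i.e. roughly:

iperf -s                              (server side)
iperf -c <server address> -n 1024M    (client side)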

Then I used Wireshark to grab the traces of iSCSI and iperf and I
found lots of iSCSI PDUs were divided into TCP segments of 1448 bytes
but with iperf TCP segments could be as large as 65000+ bytes.

I first thought this was because of the small default value (8192) for
MaxRecvDataSegmentLength. So I increased that value to 262144. But in
a later test with 16ms rtt, I found the iSCSI throughput was only
improved by 0.7 MB/s and a lot of iSCSI PDUs were still divided into
1448 byte long TCP segments... So I think MaxRecvDataSegmentLength may
not be the reason.
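
(To rule out a negotiation problem, the value the session actually agreed
on can be checked on the initiator with

iscsiadm -m session -P 3

which prints the negotiated parameters, MaxRecvDataSegmentLength included,
for each connection.)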

I also skimmed through the iSCSI specification, but it seemed no luck
there either...

I know the Ethernet MTU is 1500 bytes long and that might be the reason
for the 1448-byte TCP segments, but iperf did get to send much larger
TCP segments of 65000+ bytes...

So does anyone have any idea about this: why is iSCSI not fully
utilizing the bandwidth on long-rtt links by increasing the TCP
segment size?

Thanks a lot! Any help would be highly appreciated!

jack





Re: iSCSI throughput drops as link rtt increases?

2010-01-04 Thread Pasi Kärkkäinen
On Mon, Jan 04, 2010 at 06:54:17AM -0800, Jack Z wrote:
 Hi all,
 
 I was testing the performance of open-iscsi initiator with IET target
 over a 100Mbps Ethernet link with emulated rtt.  What I did was to do
 raw disk sequential write by
 
 $ dd if=/dev/zero of=/dev/sdb bs=1024 count=1048576
 

Did you also try with bigger block sizes? 1k blocks are pretty small.

try bs=1024k to see if it makes a difference.

 , in which /dev/sdb is the iSCSI device. I also measured TCP
 throughput using iperf with the default setup except -n 1024M. And I
 got the following data on iSCSI throughput and TCP throughput v.s. rtt
 
 rtt (ms)   iSCSI throughput by dd (MB/s)   TCP throughput by iperf (Mbit/s)
 0.2        11.3                            94.3
 4          11.1                            94.3
 8          10.2                            94.3
 12          8.6                            94.2
 16          7.2                            94.2
 20          6.0                            94.1
 
 local disk throughput by dd was 26.7 MB/s.
 
 As shown in the table above, iSCSI throughput declined rapidly with
 rtt increased from 0.2ms to 20ms. TCP throughput, however, only
 dropped less than 1 percent.
 

dd will use only one outstanding IO, so you have to wait for rtt
milliseconds after every IO for the ack.. so that definitely slows you
down a lot when rtt gets bigger.
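
As a rough upper bound: with one fully synchronous IO at a time and 16 ms
rtt, at most

  1000 ms / 16 ms = ~62 IOs per second

can complete, i.e. about 62 kB/s with 1 kB IOs, before even counting the
time to move the data over the 100 Mbit/s wire. (In practice dd without
oflag=direct goes through the page cache, which is presumably why the
measured numbers are higher than this bound.)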

Try using some benchmarking tool that can do multiple outstanding IOs..
for example ltp disktest.

-- Pasi
