After scratching my head for a few weeks, I've decided to ask for some help.

First, I've got two machines connected by gigabit ethernet. Network performance 
is not the problem: I can essentially saturate the wire with non-iSCSI traffic 
(iperf, ftp). Both systems run FreeBSD 8.1-RELENG; both are multi-core with 8GB 
of RAM.
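
For reference, this is the sort of check I used to rule the network out 
(the address is a placeholder):

    server# iperf -s
    client# iperf -c 10.0.0.10 -t 30   # non-iSCSI traffic runs at close to wire speed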

Symptoms: When doing writes from a client to a server via iSCSI (largely 
independent of write size) I seem to hit a wall between 18-26MB/s. This is 
repeatable, whether doing a newfs on a 2TB iSCSI volume or a dd from /dev/zero 
to the iSCSI target. I haven't compared read performance. What originally put 
me onto this was watching the newfs *fly* across the screen, then hang for 
several seconds, then *fly* again, and then pause.
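
The reproduction is nothing more exotic than this, run on the client against 
the iSCSI-attached disk (the device name here is just an example):

    client# newfs /dev/da5
    client# dd if=/dev/zero of=/dev/da5 bs=1M count=4096

Both show the same burst-then-stall pattern.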

This looked like a write-delay problem, so I tweaked the txg write-limit and 
synctime values. That showed some improvement: iostat reported something closer 
to continuous writes on the server, but the stall remained whether the write 
limit was 384MB or all the way up to 4GB, which tells me the spindles weren't 
holding the throughput back. The iostat average never got much beyond 
20-26MB/s; peaks were frequently two to three times that, but then throughput 
would sit at 1MB/s for a few seconds, which brought us back to that average. 
CPU and network load were never the limiting factor, nor did the spindles ever 
get above 20-30% busy.
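
For the record, these are the knobs I was adjusting in /boot/loader.conf (the 
names are as I recall them for the ZFS code in 8.1, and the values are just 
examples of the range I tried, not recommendations):

    vfs.zfs.txg.timeout="5"
    vfs.zfs.txg.synctime="1"
    vfs.zfs.txg.write_limit_override="402653184"   # 384MB; also tried up to 4GB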

So I added two USB keys that write at around 30-40MB/s and mirrored them as a 
ZIL (log) device; the commands are sketched below. iostat verifies they are 
being used, but not continuously -- it seems the txg write limit also governs 
writing to the ZIL. I also tried turning the ZIL off and saw no particular 
performance increase (or decrease). With newfs (which jumps around a lot more 
than dd) the throughput does not change much at all either. Even at 26K-40K 
pps, interrupt load and the like are not problematic, and turning on polling 
does not change the performance appreciably.
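
Roughly, the log-device setup and the ZIL-off comparison looked like this 
(da1/da2 stand in for the two USB keys; the zil_disable tunable goes in 
/boot/loader.conf on 8.x):

    # add the mirrored log vdev and watch whether it actually gets written
    zpool add tank log mirror da1 da2
    zpool iostat -v tank 1

    # for the ZIL-off comparison
    vfs.zfs.zil_disable=1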

The "server" is a RAIDZ2 of 15 drives @ 2TB each. So *write* throughput should 
be pretty fast sequentially (i.e. the dd case), but it is returning 
identically. This server does nothing much but istgt -- tried NCQ values from 
255 down to 32 to no improvement.
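
The tag-depth changes were made per drive along these lines (da0 is just an 
example member of the raidz2):

    camcontrol tags da0 -N 32    # tried values from 255 down to 32
    camcontrol tags da0 -v       # verify the current openings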

Even though network performance was not showing a particular limit, I *did* go 
from 18MB/s to 26MB/s by pushing the tcp sendbuf* and sendspace values way 
beyond reason, even though TCP throughput had never been a problem for 
non-iSCSI traffic.
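
Those were along these lines (again, the values are examples of the range I 
pushed them to, not sane defaults):

    net.inet.tcp.sendbuf_auto=1
    net.inet.tcp.sendbuf_inc=65536
    net.inet.tcp.sendbuf_max=16777216
    net.inet.tcp.sendspace=262144
    kern.ipc.maxsockbuf=16777216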

So whatever I'm doing is not addressing the actual problem. The drives have 
plenty of I/O headroom, but instead of using it -- or the RAM, or the ZIL -- 
the system sits largely idle, then pegs itself with continuous (but not 
full-speed) writes while the network transfer stalls, and then continues on 
its way.

Even if it's a threading issue (i.e. we are single-threading), there should be 
some way to make this behave like a normal system, considering how much RAM, 
SSD, and other resources I'm trying to throw at this thing. For example, once 
the buffer starts to drain, additional writes from the client should be 
accepted, and NCQ should help reorder them so they are processed efficiently, 
etc., etc.

istgt settings:
istgt version 0.3
istgt extra version 20100707

    MaxSessions              32
    MaxConnections           32
    FirstBurstLength         65536
    MaxBurstLength           262144
    MaxRecvDataSegmentLength 262144
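
For completeness, those correspond to this excerpt of the [Global] section in 
my istgt.conf (path per the stock install):

    # /usr/local/etc/istgt/istgt.conf
    [Global]
      MaxSessions              32
      MaxConnections           32
      FirstBurstLength         65536
      MaxBurstLength           262144
      MaxRecvDataSegmentLength 262144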

Local benchmarks like dd if=/dev/zero of=/tank/dump bs=1M count=12000 return 
around 200MB/s (12582912000 bytes transferred in 61.140903 secs, 205801867 
bytes/sec) and show continuous writes to the spindles, as expected. 200MB/s is 
pretty close to the maximum I/O we can expect given the port the controller is 
in, RAID overhead, etc. with 7200 RPM drives; with 5900 RPM drives the number 
is about 80MB/s.

If this is an istgt problem, is there a way to get reasonable performance out 
of it?

I know I'm not losing my mind here, so if someone has tackled this particular 
problem (or its sort), please chime in and let me know what tunable I'm 
missing. :)

Thanks very much, in advance,

DJ





