Re: [zfs-discuss] Improving zfs send performance

2008-11-12 Thread Thomas Maier-Komor
Roch wrote:
> Thomas, for long latency fat links, it should be quite
> beneficial to set the socket buffer on the receive side
> (instead of having users tune tcp_recv_hiwat).
> 
> throughput of a TCP connection is gated by 
> "receive socket buffer / round trip time".
> 
> Could that be Ross' problem ?
> 
> -r
> 
> 

Hmm, I'm not a TCP expert, but that sounds entirely possible if
Solaris 10 isn't tuning the TCP buffer automatically. The default
receive buffer seems to be 48k (at least on a V240 running 118833-33).
So if the block size is something like 128k, it would absolutely make
sense to tune the receive buffer to compensate for the round trip time...
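For anyone wanting to test this without a patched mbuffer, the global Solaris tunable mentioned in this thread can be raised by hand. A sketch only - the 1 MB value is an arbitrary illustration, not a recommendation from the thread:

```shell
# Query and raise the default TCP receive buffer (tcp_recv_hiwat) on
# Solaris 10; requires root. 1048576 (1 MB) is an illustrative value
# that affects connections opened after the change.
ndd /dev/tcp tcp_recv_hiwat              # print the current default
ndd -set /dev/tcp tcp_recv_hiwat 1048576 # raise it system-wide
```

Note this changes the default for every TCP connection, which is why a per-socket setsockopt in the receiving tool would be the cleaner fix.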

Ross: Would you like a patch to test if this is the case? Which version
of mbuffer are you currently using?

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-11-12 Thread Roch

Thomas, for long latency fat links, it should be quite
beneficial to set the socket buffer on the receive side
(instead of having users tune tcp_recv_hiwat).

throughput of a TCP connection is gated by 
"receive socket buffer / round trip time".

Could that be Ross' problem ?

-r



Ross Smith writes:
 > 
 > Thanks, that got it working.  I'm still only getting 10MB/s, so it hasn't 
 > solved my problem - I've still got a bottleneck somewhere, but mbuffer is a 
 > huge improvement over standard zfs send / receive.  It makes such a 
 > difference when you can actually see what's going on.
 > 
 > 
 > 
 > > Date: Wed, 15 Oct 2008 12:08:14 +0200
 > > From: [EMAIL PROTECTED]
 > > To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
 > > Subject: Re: [zfs-discuss] Improving zfs send performance
 > > 
 > > Ross wrote:
 > >> Hi,
 > >> 
 > >> I'm just doing my first proper send/receive over the network and I'm 
 > >> getting just 9.4MB/s over a gigabit link.  Would you be able to provide 
 > >> an example of how to use mbuffer / socat with ZFS for a Solaris beginner?
 > >> 
 > >> thanks,
 > >> 
 > >> Ross
 > >> --
 > >> This message posted from opensolaris.org
 > > 
 > > receiver> mbuffer -I sender:1 -s 128k -m 512M | zfs receive
 > > 
 > > sender> zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m
 > > 512M -O receiver:1
 > > 
 > > BTW: I released a new version of mbuffer today.
 > > 
 > > HTH,
 > > Thomas
 > 



Re: [zfs-discuss] Improving zfs send performance

2008-10-20 Thread Scott Williamson
On Mon, Oct 20, 2008 at 1:52 AM, Victor Latushkin
<[EMAIL PROTECTED]> wrote:

> Indeed. For example, less than a week ago a fix for the following two CRs
> (along with some others) was put back into Solaris Nevada:
>
> 6333409 traversal code should be able to issue multiple reads in parallel
> 6418042 want traversal in depth-first pre-order for quicker 'zfs send'
>

That is helpful, Victor. Does anyone have a full list of CRs that I can
provide to Sun support? I have tried searching the bug database, but I
didn't even find those two on my own.


Re: [zfs-discuss] Improving zfs send performance

2008-10-20 Thread Victor Latushkin
Richard Elling wrote:
>> Keep in mind that this is for Solaris 10 not opensolaris.
> 
> Keep in mind that any changes required for Solaris 10 will first
> be available in OpenSolaris, including any changes which may
> have already been implemented.

Indeed. For example, less than a week ago a fix for the following two CRs 
(along with some others) was put back into Solaris Nevada:

6333409 traversal code should be able to issue multiple reads in parallel
6418042 want traversal in depth-first pre-order for quicker 'zfs send'

This should have positive impact on 'zfs send' performance.

Wbr,
victor


Re: [zfs-discuss] Improving zfs send performance

2008-10-18 Thread Carsten Aulbert
Hi

Miles Nordin wrote:
>> "r" == Ross  <[EMAIL PROTECTED]> writes:
> 
>  r> figures so close to 10MB/s.  All three servers are running
>  r> full duplex gigabit though
> 
> there is one tricky way 100Mbit/s could still bite you, but it's
> probably not happening to you.  It mostly affects home users with
> unmanaged switches:
> 
>   http://www.smallnetbuilder.com/content/view/30212/54/
>   http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
> 
> because the big switch vendors all use pause frames safely:
> 
>  http://www.networkworld.com/netresources/0913flow2.html -- pause frames as 
> interpreted by netgear are harmful

That rings a bell. Ross, are you using NFS via UDP or TCP? Could it be
that your network has different performance levels for different
transport types? For our network we have disabled pause frames completely
and rely only on TCP's internal mechanisms to prevent flooding/blocking.

Carsten

PS: the job sending 25k files totalling about 800 GB is now done - zfs send
took only 52 hrs at a speed of ~4.5 MB/s :(


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Scott Williamson
On Fri, Oct 17, 2008 at 2:48 PM, Richard Elling <[EMAIL PROTECTED]> wrote:

> Keep in mind that any changes required for Solaris 10 will first
> be available in OpenSolaris, including any changes which may
> have already been implemented.
>

For me (a Solaris 10 user) it is the only way I can find out which bugs
and changes have been identified, and it helps me get fixes from
OpenSolaris into Solaris 10. The last support ticket resulted in a
Solaris 10 patch for the iSCSI target (to a Windows initiator) that made
iSCSI targets on ZFS actually work for us.


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Richard Elling
Scott Williamson wrote:
> Hi All,
>  
> I have opened a ticket with sun support #66104157 regarding zfs send / 
> receive and will let you know what I find out.

Thanks.

>  
> Keep in mind that this is for Solaris 10 not opensolaris.

Keep in mind that any changes required for Solaris 10 will first
be available in OpenSolaris, including any changes which may
have already been implemented.
 -- richard



Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Miles Nordin
> "r" == Ross  <[EMAIL PROTECTED]> writes:

 r> figures so close to 10MB/s.  All three servers are running
 r> full duplex gigabit though

there is one tricky way 100Mbit/s could still bite you, but it's
probably not happening to you.  It mostly affects home users with
unmanaged switches:

  http://www.smallnetbuilder.com/content/view/30212/54/
  http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

because the big switch vendors all use pause frames safely:

 http://www.networkworld.com/netresources/0913flow2.html -- pause frames as 
interpreted by netgear are harmful





Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Scott Williamson
Hi All,

I have opened a ticket with sun support #66104157 regarding zfs send /
receive and will let you know what I find out.

Keep in mind that this is for Solaris 10 not opensolaris.


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Ross
Yup, that's one of the first things I checked when it came out with
figures so close to 10MB/s.  All three servers are running full duplex
gigabit though, as reported by both Solaris and the switch.  And both
the NFS traffic at 60+MB/s and the zfs send / receive are going over the
same network link, in some cases to the same servers.


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Dimitri Aivaliotis
Hi Ross,

On Fri, Oct 17, 2008 at 1:35 PM, Ross <[EMAIL PROTECTED]> wrote:
> Ok, just did some more testing on this machine to try to find where my 
> bottlenecks are.  Something very odd is going on here.  As best I can tell 
> there are two separate problems now:
>
> - something is throttling network output to 10MB/s


I'll try to help you with this problem.


> The network throughput I've verified with mbuffer:
>
> 1.  A quick mbuffer test from /dev/zero to /dev/null gave me 565MB/s.
> 2.  On a test server, mbuffer sending from /dev/zero on one machine to 
> /dev/null on another gave me 37MB/s
> 3.  On the live server, mbuffer sending from /dev/zero to the same receiving 
> machine gave me just under 10MB/s.
>
> This looks very much like mbuffer is throttled on this machine, but I know 
> NFS can give me 60-80MB/s.  Can anybody give me a clue as to what could be 
> causing this?
>

Does your NFS mount go over a separate network?  If not, just ignore
this advice. :)

When first testing out ZFS over NFS performance, I ran into a similar
problem.  I had very nice graphs, all plateauing at 10MB/s, and was
getting frustrated at performance being so slow.  It turned out that
one of my links was 100Mbit.  I took a moment to breathe, learn from
my mistake (check the network links BEFORE running performance tests),
and ran my tests again.

Check your network links, make sure that it's Gigabit all the way
through, and that you're negotiating full-duplex.  A 100Mbit link will
give you just about 10MB/s throughput on network transfers.
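The arithmetic behind that rule of thumb, as a quick sanity check (the ~6% framing overhead is an illustrative figure, not from the thread):

```shell
# 100 Mbit/s / 8 bits per byte = 12.5 MB/s raw byte rate; Ethernet,
# IP and TCP headers then cost a few percent, landing near the
# observed ~10-11 MB/s for bulk transfers.
awk 'BEGIN {
    raw = 100e6 / 8 / 1e6
    printf "raw: %.1f MB/s, after ~6%% overhead: %.2f MB/s\n", raw, raw * 0.94
}'
```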

- Dimitri


Re: [zfs-discuss] Improving zfs send performance

2008-10-17 Thread Ross
Ok, just did some more testing on this machine to try to find where my 
bottlenecks are.  Something very odd is going on here.  As best I can tell 
there are two separate problems now:

- something is throttling network output to 10MB/s
- something is throttling zfs send to around 20MB/s

The network throughput I've verified with mbuffer:

1.  A quick mbuffer test from /dev/zero to /dev/null gave me 565MB/s.
2.  On a test server, mbuffer sending from /dev/zero on one machine to 
/dev/null on another gave me 37MB/s
3.  On the live server, mbuffer sending from /dev/zero to the same receiving 
machine gave me just under 10MB/s.

This looks very much like mbuffer is throttled on this machine, but I know NFS 
can give me 60-80MB/s.  Can anybody give me a clue as to what could be causing 
this?


And the disk performance is just as confusing.  Again I used a test server to 
provide a comparison, and this time used a zfs scrub with iostat to check the 
performance possible on the disks.
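The comparison figures were gathered with commands along these lines (a sketch reconstructed from the post; pool name as given in the iostat output):

```shell
# Kick off a scrub, then watch per-vdev ops and bandwidth in
# 30-second samples while it runs.
zpool scrub rc-pool
zpool iostat -v rc-pool 30
```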

Live server:  5 sets of 3 way mirrors
Test server:  5 disk raid-z2

1.  On the Live server, zfs send to /dev/null via mbuffer reports a speed of 
21MB/s
 # zfs send [EMAIL PROTECTED] | mbuffer -s 128k -m 512M > /dev/null
2.  On the Test server, zfs send to /dev/null via mbuffer reports a speed of 
35MB/s
3.  On the Live server, zpool scrub and iostat report a peak of 3k iops, and 
283MB/s throughput.
4.  On the Test server, zpool scrub and iostat report a peak of 472 iops, and 
53MB/s throughput.

Surely the send and scrub operations should give similar results?  Why is zpool 
scrub running 10-15x faster than zfs send on the live server?

The iostat figures on the live server are particularly telling.

During a scrub (30s intervals):
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
rc-pool  734G  1.55T  2.94K 41   189M   788K
  mirror 144G   320G578  6  39.2M   166K
c1t1d0  -  -379  5  39.9M   166K
c1t2d0  -  -379  5  39.9M   166K
c2t1d0  -  -385  5  40.1M   166K
  mirror 147G   317G633  2  37.8M   170K
c1t3d0  -  -389  2  38.7M   171K
c2t2d0  -  -393  2  38.9M   171K
c2t0d0  -  -384  2  38.9M   171K
  mirror 147G   317G619  6  37.3M  57.5K
c2t3d0  -  -377  2  38.3M  57.9K
c1t5d0  -  -377  2  38.3M  57.9K
c1t4d0  -  -373  3  38.2M  57.9K
  mirror 148G   316G638 10  37.6M  64.0K
c2t4d0  -  -375  4  38.5M  64.4K
c2t5d0  -  -386  6  38.2M  64.4K
c1t6d0  -  -384  6  38.2M  64.4K
  mirror 149G   315G540  6  37.4M   164K
c1t7d0  -  -356  4  38.1M   164K
c2t6d0  -  -362  5  38.2M   164K
c2t7d0  -  -361  5  38.2M   164K
  c3d1p0  12K   504M  0  8  0   166K
--  -  -  -  -  -  -

During a send (30s intervals):
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
rc-pool  734G  1.55T148 55  18.6M  1.71M
  mirror 144G   320G 25  6  3.15M   235K
c1t1d0  -  -  8  3  1.02M   235K
c1t2d0  -  -  7  3   954K   235K
c2t1d0  -  -  9  3  1.19M   235K
  mirror 147G   317G 27  3  3.40M   203K
c1t3d0  -  -  8  2  1.03M   203K
c2t2d0  -  -  9  3  1.25M   203K
c2t0d0  -  -  8  2  1.11M   203K
  mirror 147G   317G 32  2  4.12M   205K
c2t3d0  -  - 11  1  1.45M   205K
c1t5d0  -  - 10  1  1.34M   205K
c1t4d0  -  - 10  1  1.34M   205K
  mirror 148G   316G 32  2  4.02M   201K
c2t4d0  -  - 10  1  1.37M   201K
c2t5d0  -  -  9  1  1.23M   201K
c1t6d0  -  - 11  1  1.43M   201K
  mirror 149G   315G 31  6  3.89M   180K
c1t7d0  -  - 11  2  1.45M   180K
c2t6d0  -  -  8  2  1.10M   180K
c2t7d0  -  - 10  2  1.35M   180K
  c3d1p0  12K   504M  0 34  0   727K
--  -  -  -  -  -  -

Can anybody explain why zfs send could be so slow on one server?  Is anybody 
else able to compare their iostat results for a zfs send and zpool scrub to see 
if they also have such a huge difference between the figures?

thanks,

Ross


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Scott Williamson
So I am zfs sending ~450 datasets between thumpers running SOL10U5 via ssh,
most are empty except maybe 10 that have a few GB of files.

I see the following output on one that contained ~1GB  of files in my send
report:

Output from zfs receive -v "received 1.07Gb stream in 30 seconds
(36.4Mb/sec)"

I have a few problems with this:

1. Should it not read 1.07GB for Bytes?

2. Should it not read that this was done at a rate of 36.4MB/s?

The output seems to be incorrect, but it makes sense if you uppercase the 'b's.

This is an underwhelming ~292Mb/s!
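Redoing the arithmetic supports the uppercase reading (binary units, as zfs reports them):

```shell
# 1.07 GB in 30 seconds works out to ~36.5 MB/s, matching the reported
# "36.4Mb/sec" once the 'b' is read as bytes; the small difference
# comes from rounding in the reported stream size.
awk 'BEGIN { printf "%.1f MB/s\n", 1.07 * 1073741824 / 30 / 1048576 }'
```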


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Ross Smith


Oh dear god.  Sorry folks, it looks like the new hotmail really doesn't play 
well with the list.  Trying again in plain text:
 
 
> Try to separate the two things:
> 
> (1) Try /dev/zero -> mbuffer --- network ---> mbuffer > /dev/null
> That should give you wirespeed
 
I tried that already.  It still gets just 10-11MB/s from this server.
I can get zfs send / receive and mbuffer working at 30MB/s though from a couple 
of test servers (with much lower specs).
 
> (2) Try zfs send | mbuffer > /dev/null
> That should give you an idea how fast zfs send really is locally.
 
Hmm, that's better than 10MB/s, but the average is still only around 20MB/s:
summary:  942 MByte in 47.4 sec - average of 19.9 MB/s
 
I think that points to another problem though, as the send-side mbuffer is 100% 
full.  Certainly the pool itself doesn't appear to be under any strain at all 
while this is going on:
 
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
rc-pool  732G  1.55T171 85  21.3M  1.01M
  mirror 144G   320G 38  0  4.78M  0
c1t1d0  -  -  6  0   779K  0
c1t2d0  -  - 17  0  2.17M  0
c2t1d0  -  - 14  0  1.85M  0
  mirror 146G   318G 39  0  4.89M  0
c1t3d0  -  - 20  0  2.50M  0
c2t2d0  -  - 13  0  1.63M  0
c2t0d0  -  -  6  0   779K  0
  mirror 146G   318G 34  0  4.35M  0
c2t3d0  -  - 19  0  2.39M  0
c1t5d0  -  -  7  0  1002K  0
c1t4d0  -  -  7  0  1002K  0
  mirror 148G   316G 23  0  2.93M  0
c2t4d0  -  -  8  0  1.09M  0
c2t5d0  -  -  6  0   890K  0
c1t6d0  -  -  7  0  1002K  0
  mirror 148G   316G 35  0  4.35M  0
c1t7d0  -  -  6  0   779K  0
c2t6d0  -  - 12  0  1.52M  0
c2t7d0  -  - 17  0  2.07M  0
  c3d1p0  12K   504M  0 85  0  1.01M
--  -  -  -  -  -  -
 
Especially when compared to the zfs send stats on my backup server, which 
managed 30MB/s via mbuffer (being received on a single virtual SATA disk):
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
rpool   5.12G  42.6G  0  5  0  27.1K
  c4t0d0s0  5.12G  42.6G  0  5  0  27.1K
--  -  -  -  -  -  -
zfspool  431G  4.11T261  0  31.4M  0
  raidz2 431G  4.11T261  0  31.4M  0
c4t1d0  -  -155  0  6.28M  0
c4t2d0  -  -155  0  6.27M  0
c4t3d0  -  -155  0  6.27M  0
c4t4d0  -  -155  0  6.27M  0
c4t5d0  -  -155  0  6.27M  0
--  -  -  -  -  -  -
The really ironic thing is that the 30MB/s send / receive was sending to a 
virtual SATA disk which is stored (via sync NFS) on the server I'm having 
problems with...
 
Ross

 

> Date: Thu, 16 Oct 2008 14:27:49 +0200
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Improving zfs send performance
> 
> Hi Ross
> 
> Ross wrote:
>> Now though I don't think it's network at all. The end result from that 
>> thread is that we can't see any errors in the network setup, and using 
>> nicstat and NFS I can show that the server is capable of 50-60MB/s over the 
>> gigabit link. Nicstat also shows clearly that both zfs send / receive and 
>> mbuffer are only sending 1/5 of that amount of data over the network.
>> 
>> I've completely run out of ideas of my own (but I do half expect there's a 
>> simple explanation I haven't thought of). Can anybody think of a reason why 
>> both zfs send / receive and mbuffer would be so slow?
> 
> Try to separate the two things:
> 
> (1) Try /dev/zero -> mbuffer --- network ---> mbuffer > /dev/null
> 
> That should give you wirespeed
> 
> (2) Try zfs send | mbuffer > /dev/null
> 
> That should give you an idea how fast zfs send really is locally.
> 
> Carsten



Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Carsten Aulbert
Hi Ross

Ross wrote:
> Now though I don't think it's network at all.  The end result from that 
> thread is that we can't see any errors in the network setup, and using 
> nicstat and NFS I can show that the server is capable of 50-60MB/s over the 
> gigabit link.  Nicstat also shows clearly that both zfs send / receive and 
> mbuffer are only sending 1/5 of that amount of data over the network.
> 
> I've completely run out of ideas of my own (but I do half expect there's a 
> simple explanation I haven't thought of).  Can anybody think of a reason why 
> both zfs send / receive and mbuffer would be so slow?

Try to separate the two things:

(1) Try /dev/zero -> mbuffer --- network ---> mbuffer > /dev/null

That should give you wirespeed

(2) Try zfs send | mbuffer > /dev/null

That should give you an idea how fast zfs send really is locally.
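Concretely, the two tests might look like this (hostnames, port number, and dataset names are placeholders, not from the thread):

```shell
# (1) network only - no ZFS involved; should run at wire speed
receiver$ mbuffer -I 8000 -s 128k -m 512M > /dev/null
sender$   dd if=/dev/zero bs=128k | mbuffer -s 128k -m 512M -O receiver:8000

# (2) local only - no network involved; shows raw zfs send read speed
sender$   zfs send pool/fs@snap | mbuffer -s 128k -m 512M > /dev/null
```

If (1) is slow, the problem is in the network or TCP tuning; if (2) is slow, the pool itself is the bottleneck.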

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Carsten Aulbert
Hi Scott,

Scott Williamson wrote:
> You seem to be using dd for write testing. In my testing I noted that
> there was a large difference in write speed between using dd to write
> from /dev/zero and using other files. Writing from /dev/zero always
> seemed to be fast, reaching the maximum of ~200MB/s, while cp would
> perform more poorly the fewer the vdevs.

You are right, the write benchmarks were done with dd just to have some
"bulk" figures, since zeros can usually be generated fast enough.
> 
> This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the
> disks seemed to spend most of their time seeking during the send.
> 

That seems a bit too simplistic to me. If you compare raidz with raidz2,
it seems that raidz2 is not too bad with fewer vdevs. I wish there were a
way for zfs send to avoid so many seeks. The << 1 TB file system is
still being sent with zfs send, now close to 48 hours in.

Cheers

Carsten

PS: We still have a spare thumper sitting around; maybe I'll give it a try
with 5 vdevs


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Ross
Ok, I'm not entirely sure this is the same problem, but it does sound fairly 
similar.  Apologies for hijacking the thread if this does turn out to be 
something else.

After following the advice here to get mbuffer working with zfs send / receive, 
I found I was only getting around 10MB/s throughput.  Thinking it was a network 
problem I started the below thread in the OpenSolaris help forum:
http://www.opensolaris.org/jive/thread.jspa?messageID=294846

Now though I don't think it's network at all.  The end result from that thread 
is that we can't see any errors in the network setup, and using nicstat and NFS 
I can show that the server is capable of 50-60MB/s over the gigabit link.  
Nicstat also shows clearly that both zfs send / receive and mbuffer are only 
sending 1/5 of that amount of data over the network.

I've completely run out of ideas of my own (but I do half expect there's a 
simple explanation I haven't thought of).  Can anybody think of a reason why 
both zfs send / receive and mbuffer would be so slow?


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Scott Williamson
Hi Carsten,

You seem to be using dd for write testing. In my testing I noted that there
was a large difference in write speed between using dd to write from
/dev/zero and using other files. Writing from /dev/zero always seemed to be
fast, reaching the maximum of ~200MB/s, while cp would perform more poorly
the fewer the vdevs.

This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the
disks seemed to spend most of their time seeking during the send.

On Thu, Oct 16, 2008 at 1:27 AM, Carsten Aulbert <[EMAIL PROTECTED]
> wrote:

> Some time ago I made some tests to find this:
>
> (1) create a new zpool
> (2) Copy user's home to it (always the same ~ 25 GB IIRC)
> (3) zfs send to /dev/null
> (4) evaluate && continue loop
>
> I did this for fully mirrored setups, raidz as well as raidz2, the
> results were mixed:
>
>
> https://n0.aei.uni-hannover.de/cgi-bin/twiki/view/ATLAS/ZFSBenchmarkTest#ZFS_send_performance_relevant_fo
>
> The caveat here might be that in retrospect this seemed like a "good"
> home filesystem, i.e. one which was quite fast.
>
> If you don't want to bother with the table:
>
> Mirrored setup never exceeded 58 MB/s and was getting faster the more
> small mirrors you used.
>
> RaidZ had its sweetspot with a configuration of '6 6 6 6 6 6 5 5', i.e.
> 6 or 5 disks per RaidZ and 8 vdevs
>
> RaidZ2 finally was best at '10 9 9 9 9', i.e. 5 vdevs but not much worse
> with only 3, i.e. what we are currently using to get more storage space
> (gains us about 2 TB/box).
>
> Cheers
>
> Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi again

Brent Jones wrote:
> 
> Scott,
> 
> Can you tell us the configuration that you're using that is working for you?
> Were you using RaidZ, or RaidZ2? I'm wondering what the "sweetspot" is
> to get a good compromise in vdevs and usable space/performance
>

Some time ago I made some tests to find this:

(1) create a new zpool
(2) Copy user's home to it (always the same ~ 25 GB IIRC)
(3) zfs send to /dev/null
(4) evaluate && continue loop

I did this for fully mirrored setups, raidz as well as raidz2, the
results were mixed:

https://n0.aei.uni-hannover.de/cgi-bin/twiki/view/ATLAS/ZFSBenchmarkTest#ZFS_send_performance_relevant_fo

The caveat here might be that in retrospect this seemed like a "good"
home filesystem, i.e. one which was quite fast.

If you don't want to bother with the table:

Mirrored setup never exceeded 58 MB/s and was getting faster the more
small mirrors you used.

RaidZ had its sweetspot with a configuration of '6 6 6 6 6 6 5 5', i.e.
6 or 5 disks per RaidZ and 8 vdevs

RaidZ2 finally was best at '10 9 9 9 9', i.e. 5 vdevs but not much worse
with only 3, i.e. what we are currently using to get more storage space
(gains us about 2 TB/box).

Cheers

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Brent Jones
On Wed, Oct 15, 2008 at 2:17 PM, Scott Williamson
<[EMAIL PROTECTED]> wrote:
> Hi All,
>
> Just want to note that I had the same issue with zfs send + vdevs that had
> 11 drives in them on a X4500. Reducing the count of drives per vdev cleared
> this up.
>
> One vdev is IOPS limited to the speed of one drive in that vdev, according
> to this post (see comment from ptribble.)
>

Scott,

Can you tell us the configuration that you're using that is working for you?
Were you using RaidZ, or RaidZ2? I'm wondering what the "sweetspot" is
to get a good compromise in vdevs and usable space/performance

Thanks!

-- 
Brent Jones
[EMAIL PROTECTED]


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Scott Williamson
Hi All,

Just want to note that I had the same issue with zfs send + vdevs that had
11 drives in them on a X4500. Reducing the count of drives per vdev cleared
this up.

One vdev is IOPS limited to the speed of one drive in that vdev, according
to this post  (see
comment from ptribble.)

On Wed, Oct 15, 2008 at 3:07 PM, Carsten Aulbert <[EMAIL PROTECTED]
> wrote:

> Hi Richard,
>
> Richard Elling wrote:
> > Since you are reading, it depends on where the data was written.
> > Remember, ZFS dynamic striping != RAID-0.
> > I would expect something like this if the pool was expanded at some
> > point in time.
>
> No, the RAID was set-up in one go right after jumpstarting the box.
>
> >> (2) The disks should be able to perform much much faster than they
> >> currently output data at, I believe it's 2008 and not 1995.
> >>
> >
> > X4500?  Those disks are good for about 75-80 random iops,
> > which seems to be about what they are delivering.  The dtrace
> > tool, iopattern, will show the random/sequential nature of the
> > workload.
> >
>
> I need to read about this a bit and will try to analyze it.
>
> >> (3) The four cores of the X4500 are dying of boredom, i.e. idle >95% all
> >> the time.
> >>
> >> Has anyone a good idea, where the bottleneck could be? I'm running out
> >> of ideas.
> >>
> >
> > I would suspect the disks.  30 second samples are not very useful
> > to try and debug such things -- even 1 second samples can be
> > too coarse.  But you should take a look at 1 second samples
> > to see if there is a consistent I/O workload.
> > -- richard
> >
>
> Without doing too much statistics (yet, if needed I can easily do that)
> it looks like these:
>
>
>   capacity operationsbandwidth
> pool used  avail   read  write   read  write
> --  -  -  -  -  -  -
> atlashome   3.54T  17.3T256  0  7.97M  0
>  raidz2 833G  6.00T  0  0  0  0
>c0t0d0  -  -  0  0  0  0
>c1t0d0  -  -  0  0  0  0
>c4t0d0  -  -  0  0  0  0
>c6t0d0  -  -  0  0  0  0
>c7t0d0  -  -  0  0  0  0
>c0t1d0  -  -  0  0  0  0
>c1t1d0  -  -  0  0  0  0
>c4t1d0  -  -  0  0  0  0
>c5t1d0  -  -  0  0  0  0
>c6t1d0  -  -  0  0  0  0
>c7t1d0  -  -  0  0  0  0
>c0t2d0  -  -  0  0  0  0
>c1t2d0  -  -  0  0  0  0
>c4t2d0  -  -  0  0  0  0
>c5t2d0  -  -  0  0  0  0
>  raidz21.29T  5.52T133  0  4.14M  0
>c6t2d0  -  -117  0   285K  0
>c7t2d0  -  -114  0   279K  0
>c0t3d0  -  -106  0   261K  0
>c1t3d0  -  -114  0   282K  0
>c4t3d0  -  -118  0   294K  0
>c5t3d0  -  -125  0   308K  0
>c6t3d0  -  -126  0   311K  0
>c7t3d0  -  -118  0   293K  0
>c0t4d0  -  -119  0   295K  0
>c1t4d0  -  -120  0   298K  0
>c4t4d0  -  -120  0   291K  0
>c6t4d0  -  -106  0   257K  0
>c7t4d0  -  - 96  0   236K  0
>c0t5d0  -  -109  0   267K  0
>c1t5d0  -  -114  0   282K  0
>  raidz21.43T  5.82T123  0  3.83M  0
>c4t5d0  -  -108  0   242K  0
>c5t5d0  -  -104  0   236K  0
>c6t5d0  -  -104  0   239K  0
>c7t5d0  -  -107  0   245K  0
>c0t6d0  -  -108  0   248K  0
>c1t6d0  -  -106  0   245K  0
>c4t6d0  -  -108  0   250K  0
>c5t6d0  -  -112  0   258K  0
>c6t6d0  -  -114  0   261K  0
>c7t6d0  -  -110  0   253K  0
>c0t7d0  -  -109  0   248K  0
>c1t7d0  -  -109  0   246K  0
>c4t7d0  -  -108  0   243K  0
>c5t7d0  -  -108  0   244K  0
>c6t7d0  -  -106  0   240K  0
>c7t7d0  -  -109  0   244K  0
> --  -  -  -  -  -  -
>
> The iops vary between about 70 and 140; the interesting bit is that the first
> raidz2 does not get any hits at all :(
>
> Cheers
>
> Carsten
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>

Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi Richard,

Richard Elling wrote:
> Since you are reading, it depends on where the data was written.
> Remember, ZFS dynamic striping != RAID-0.
> I would expect something like this if the pool was expanded at some
> point in time.

No, the RAID was set-up in one go right after jumpstarting the box.

>> (2) The disks should be able to perform much much faster than they
>> currently output data at; I believe it's 2008 and not 1995.
>>   
> 
> X4500?  Those disks are good for about 75-80 random iops,
> which seems to be about what they are delivering.  The dtrace
> tool, iopattern, will show the random/sequential nature of the
> workload.
>

I need to read about this a bit and will try to analyze it.

>> (3) The four cores of the X4500 are dying of boredom, i.e. idle >95% all
>> the time.
>>
>> Has anyone a good idea, where the bottleneck could be? I'm running out
>> of ideas.
>>   
> 
> I would suspect the disks.  30 second samples are not very useful
> to try and debug such things -- even 1 second samples can be
> too coarse.  But you should take a look at 1 second samples
> to see if there is a consistent I/O workload.
> -- richard
> 

Without doing too much statistics (yet, if needed I can easily do that)
it looks like these:


   capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
atlashome   3.54T  17.3T256  0  7.97M  0
  raidz2 833G  6.00T  0  0  0  0
c0t0d0  -  -  0  0  0  0
c1t0d0  -  -  0  0  0  0
c4t0d0  -  -  0  0  0  0
c6t0d0  -  -  0  0  0  0
c7t0d0  -  -  0  0  0  0
c0t1d0  -  -  0  0  0  0
c1t1d0  -  -  0  0  0  0
c4t1d0  -  -  0  0  0  0
c5t1d0  -  -  0  0  0  0
c6t1d0  -  -  0  0  0  0
c7t1d0  -  -  0  0  0  0
c0t2d0  -  -  0  0  0  0
c1t2d0  -  -  0  0  0  0
c4t2d0  -  -  0  0  0  0
c5t2d0  -  -  0  0  0  0
  raidz21.29T  5.52T133  0  4.14M  0
c6t2d0  -  -117  0   285K  0
c7t2d0  -  -114  0   279K  0
c0t3d0  -  -106  0   261K  0
c1t3d0  -  -114  0   282K  0
c4t3d0  -  -118  0   294K  0
c5t3d0  -  -125  0   308K  0
c6t3d0  -  -126  0   311K  0
c7t3d0  -  -118  0   293K  0
c0t4d0  -  -119  0   295K  0
c1t4d0  -  -120  0   298K  0
c4t4d0  -  -120  0   291K  0
c6t4d0  -  -106  0   257K  0
c7t4d0  -  - 96  0   236K  0
c0t5d0  -  -109  0   267K  0
c1t5d0  -  -114  0   282K  0
  raidz21.43T  5.82T123  0  3.83M  0
c4t5d0  -  -108  0   242K  0
c5t5d0  -  -104  0   236K  0
c6t5d0  -  -104  0   239K  0
c7t5d0  -  -107  0   245K  0
c0t6d0  -  -108  0   248K  0
c1t6d0  -  -106  0   245K  0
c4t6d0  -  -108  0   250K  0
c5t6d0  -  -112  0   258K  0
c6t6d0  -  -114  0   261K  0
c7t6d0  -  -110  0   253K  0
c0t7d0  -  -109  0   248K  0
c1t7d0  -  -109  0   246K  0
c4t7d0  -  -108  0   243K  0
c5t7d0  -  -108  0   244K  0
c6t7d0  -  -106  0   240K  0
c7t7d0  -  -109  0   244K  0
--  -  -  -  -  -  -

The iops vary between about 70 and 140; the interesting bit is that the first
raidz2 does not get any hits at all :(
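A crude cross-check of Richard's random-iops explanation, dividing one disk's bandwidth by its iops (figures taken from the c6t2d0 row in the sample above):

```python
# one disk from the busy raidz2 in the zpool iostat sample above
reads_per_sec = 117            # read ops column for c6t2d0
bandwidth_bytes = 285 * 1024   # "285K" in the read bandwidth column
avg_read_size = bandwidth_bytes / reads_per_sec
print(round(avg_read_size))    # roughly 2.5 KB per read -> tiny, random reads
```

Reads averaging only a couple of kilobytes each are consistent with a seek-bound, random workload rather than a disk-bandwidth limit.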

Cheers

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Richard Elling
comments below...

Carsten Aulbert wrote:
> Hi all,
>
> Carsten Aulbert wrote:
>   
>> More later.
>> 
>
> OK, I'm completely puzzled right now (and sorry for this lengthy email).
>  My first (and currently only) idea was that the size of the files is
> related to this effect, but that does not seem to be the case:
>
> (1) A 185 GB zfs file system was transferred yesterday with a speed of
> about 60 MB/s to two different servers. The histogram of files looks like:
>
> 2822 files were investigated, total size is: 185.82 Gbyte
>
> Summary of file sizes [bytes]:
> zero:  2
> 1 -> 2 0
> 2 -> 4 1
> 4 -> 8 3
> 8 -> 16   26
> 16 -> 32   8
> 32 -> 64   6
> 64 -> 128 29
> 128 -> 25611
> 256 -> 51213
> 512 -> 1024   17
> 1024 -> 2k33
> 2k -> 4k  45
> 4k -> 8k      9044
> 8k -> 16k 60
> 16k -> 32k41
> 32k -> 64k19
> 64k -> 128k   22
> 128k -> 256k  12
> 256k -> 512k   5
> 512k -> 1024k   1218  **
> 1024k -> 2M   16004  *
> 2M -> 4M   46202
> 
> 4M -> 8M   0
> 8M -> 16M  0
> 16M -> 32M 0
> 32M -> 64M 0
> 64M -> 128M0
> 128M -> 256M   0
> 256M -> 512M   0
> 512M -> 1024M  0
> 1024M -> 2G0
> 2G -> 4G   0
> 4G -> 8G   0
> 8G -> 16G  1
>
> (2) Currently a much larger file system is being transferred, the same
> script (even the same incarnation, i.e. process) has now been running for
> close to 22 hours:
>
> 28549 files were investigated, total size is: 646.67 Gbyte
>
> Summary of file sizes [bytes]:
> zero:   4954  **
> 1 -> 2 0
> 2 -> 4 0
> 4 -> 8 1
> 8 -> 161
> 16 -> 32   0
> 32 -> 64   0
> 64 -> 128  1
> 128 -> 256 0
> 256 -> 512 9
> 512 -> 1024   71
> 1024 -> 2k 1
> 2k -> 4k      1095  **
> 4k -> 8k      8449  *
> 8k -> 16k   2217  
> 16k -> 32k   503  ***
> 32k -> 64k 1
> 64k -> 128k1
> 128k -> 256k   1
> 256k -> 512k   0
> 512k -> 1024k  0
> 1024k -> 2M0
> 2M -> 4M   0
> 4M -> 8M  16
> 8M -> 16M  0
> 16M -> 32M 0
> 32M -> 64M 11218
> 
> 64M -> 128M0
> 128M -> 256M   0
> 256M -> 512M   0
> 512M -> 1024M  0
> 1024M -> 2G0
> 2G -> 4G   5
> 4G -> 8G   1
> 8G -> 16G  3
> 16G -> 32G 1
>
>
> When watching zpool iostat I get this (30 second average, NOT the first
> output):
>
>    capacity     operations    bandwidth
> pool used  avail   read  write   read  write
> --  -  -  -  -  -  -
> atlashome   3.54T  17.3T137  0  4.28M  0
>   raidz2 833G  6.00T  1  0  30.8K  0
> c0t0d0  -  -  1  0  2.38K  0
> c1t0d0  -  -  1  0  2.18K  0
> c4t0d0  -  -  0  0  1.91K  0
> c6t0d0  -  -  0  0  1.76K  0
> c7t0d0  -  -  0  0  1.77K  0
> c0t1d0  -  -  0  0  1.79K  0
> c1t1d0  -  -  0  0  1.86K  0
> c4t1d0  -  -  0  0  1.97K  0
> c5t1d0  -  -  0  0  2.04K  0
> c6t1d0  -  -  1  0  2.25K  0
> c7t1d0  -  -  1  0  2.31K  0
> c0t2d0  -  -  1  0  2.21K  0
> c1t2d0  -  -  0  0  1.99K  0
> c4t2d0  -  -  0  0  1.99K  0
> c5t2d0  -  -  1  0  2.38K  0
>   raidz21.29T  5.52T 67  0  2.09M  0
> c6t2d0  -  - 58  0   143K  0
> c7t2d0  -  - 58  0   141K  0
> c0t3d0  -  - 53  0   131K  0
> c1t3d0  -  - 53  0   130K  0
> c4t3d0  -  - 58  0   143K  0
> c5t3d0  -  - 58  0   145K  0
> c6t3d0  -  - 59  0   147K  0
> c7t3d0  -  - 59  0   146K  0
> c0t4d0  -  - 59  0   145K  0
> c1t4d0  -  - 58  0   145K  0
> c4t4d0  -  - 58  0   145K  0
> c6t4d0  -  - 58  0   143K  0
> c7t4d0  -  - 58  0   143K  0
> c0t5d0  -

Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Marcelo Leal
Hello all,
 I think in SS 11 it should be -xarch=amd64.

 Leal.
--
This message posted from opensolaris.org


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Ross Smith schrieb:
> I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?
> 
> 
>> Date: Wed, 15 Oct 2008 13:52:42 +0200
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
>> Subject: Re: [zfs-discuss] Improving zfs send performance
>>
>> Thomas Maier-Komor schrieb:
>>> BTW: I released a new version of mbuffer today.
>> WARNING!!!
>>
>> Sorry people!!!
>>
>> The latest version of mbuffer has a regression that can CORRUPT output
>> if stdout is used. Please fall back to the last version. A fix is on the
>> way...
>>
>> - Thomas
> 
> _
> Discover Bird's Eye View now with Multimap from Live Search
> http://clk.atdmt.com/UKM/go/111354026/direct/01/

Yes this one is OK. The regression appeared in 20081014.

- Thomas



Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi all,

Carsten Aulbert wrote:
> More later.

OK, I'm completely puzzled right now (and sorry for this lengthy email).
 My first (and currently only) idea was that the size of the files is
related to this effect, but that does not seem to be the case:

(1) A 185 GB zfs file system was transferred yesterday with a speed of
about 60 MB/s to two different servers. The histogram of files looks like:

2822 files were investigated, total size is: 185.82 Gbyte

Summary of file sizes [bytes]:
zero:  2
1 -> 2 0
2 -> 4 1
4 -> 8 3
8 -> 16   26
16 -> 32   8
32 -> 64   6
64 -> 128 29
128 -> 25611
256 -> 51213
512 -> 1024   17
1024 -> 2k33
2k -> 4k  45
4k -> 8k      9044
8k -> 16k 60
16k -> 32k41
32k -> 64k19
64k -> 128k   22
128k -> 256k  12
256k -> 512k   5
512k -> 1024k   1218  **
1024k -> 2M   16004  *
2M -> 4M   46202

4M -> 8M   0
8M -> 16M  0
16M -> 32M 0
32M -> 64M 0
64M -> 128M0
128M -> 256M   0
256M -> 512M   0
512M -> 1024M  0
1024M -> 2G0
2G -> 4G   0
4G -> 8G   0
8G -> 16G  1
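(The histograms here came out of Carsten's own script, which isn't shown; purely as an illustration, a minimal Python sketch of the same power-of-two bucketing - function names are ours, not his - could look like this:)

```python
import os
from collections import Counter

def bucket(size_bytes):
    """Power-of-two bucket (lo, hi) for a file size; zero-length files get (0, 0)."""
    if size_bytes == 0:
        return (0, 0)
    lo = 1 << (size_bytes.bit_length() - 1)  # largest power of two <= size
    return (lo, 2 * lo)

def size_histogram(root):
    """Walk a tree and count files per power-of-two size bucket."""
    counts = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                counts[bucket(os.path.getsize(os.path.join(dirpath, name)))] += 1
            except OSError:
                pass  # file vanished mid-scan or is unreadable; skip it
    return counts
```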

(2) Currently a much larger file system is being transferred, the same
script (even the same incarnation, i.e. process) has now been running for
close to 22 hours:

28549 files were investigated, total size is: 646.67 Gbyte

Summary of file sizes [bytes]:
zero:   4954  **
1 -> 2 0
2 -> 4 0
4 -> 8 1
8 -> 161
16 -> 32   0
32 -> 64   0
64 -> 128  1
128 -> 256 0
256 -> 512 9
512 -> 1024   71
1024 -> 2k 1
2k -> 4k      1095  **
4k -> 8k      8449  *
8k -> 16k   2217  
16k -> 32k   503  ***
32k -> 64k 1
64k -> 128k1
128k -> 256k   1
256k -> 512k   0
512k -> 1024k  0
1024k -> 2M0
2M -> 4M   0
4M -> 8M  16
8M -> 16M  0
16M -> 32M 0
32M -> 64M 11218

64M -> 128M0
128M -> 256M   0
256M -> 512M   0
512M -> 1024M  0
1024M -> 2G0
2G -> 4G   5
4G -> 8G   1
8G -> 16G  3
16G -> 32G 1


When watching zpool iostat I get this (30 second average, NOT the first
output):

   capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
atlashome   3.54T  17.3T137  0  4.28M  0
  raidz2 833G  6.00T  1  0  30.8K  0
c0t0d0  -  -  1  0  2.38K  0
c1t0d0  -  -  1  0  2.18K  0
c4t0d0  -  -  0  0  1.91K  0
c6t0d0  -  -  0  0  1.76K  0
c7t0d0  -  -  0  0  1.77K  0
c0t1d0  -  -  0  0  1.79K  0
c1t1d0  -  -  0  0  1.86K  0
c4t1d0  -  -  0  0  1.97K  0
c5t1d0  -  -  0  0  2.04K  0
c6t1d0  -  -  1  0  2.25K  0
c7t1d0  -  -  1  0  2.31K  0
c0t2d0  -  -  1  0  2.21K  0
c1t2d0  -  -  0  0  1.99K  0
c4t2d0  -  -  0  0  1.99K  0
c5t2d0  -  -  1  0  2.38K  0
  raidz21.29T  5.52T 67  0  2.09M  0
c6t2d0  -  - 58  0   143K  0
c7t2d0  -  - 58  0   141K  0
c0t3d0  -  - 53  0   131K  0
c1t3d0  -  - 53  0   130K  0
c4t3d0  -  - 58  0   143K  0
c5t3d0  -  - 58  0   145K  0
c6t3d0  -  - 59  0   147K  0
c7t3d0  -  - 59  0   146K  0
c0t4d0  -  - 59  0   145K  0
c1t4d0  -  - 58  0   145K  0
c4t4d0  -  - 58  0   145K  0
c6t4d0  -  - 58  0   143K  0
c7t4d0  -  - 58  0   143K  0
c0t5d0  -  - 58  0   145K  0
c1t5d0  -  - 58  0   144K  0
  raidz21.43T  5.82T 69  0  2.16M  0
c4t5d0  -  - 62  0   141K  0
c5t5d0  -  - 60  0   138K  0
c6t5d0  -  - 59  0   135K  0
c7t5d0  - 

Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Ross Smith

I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?


> Date: Wed, 15 Oct 2008 13:52:42 +0200
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Improving zfs send performance
> 
> Thomas Maier-Komor schrieb:
>> BTW: I released a new version of mbuffer today.
> 
> WARNING!!!
> 
> Sorry people!!!
> 
> The latest version of mbuffer has a regression that can CORRUPT output
> if stdout is used. Please fall back to the last version. A fix is on the
> way...
> 
> - Thomas

_
Discover Bird's Eye View now with Multimap from Live Search
http://clk.atdmt.com/UKM/go/111354026/direct/01/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Thomas Maier-Komor schrieb:
> BTW: I released a new version of mbuffer today.

WARNING!!!

Sorry people!!!

The latest version of mbuffer has a regression that can CORRUPT output
if stdout is used. Please fall back to the last version. A fix is on the
way...

- Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi Ross

Ross Smith wrote:
> Thanks, that got it working.  I'm still only getting 10MB/s, so it hasn't 
> solved my problem - I've still got a bottleneck somewhere, but mbuffer is a 
> huge improvement over standard zfs send / receive.  It makes such a 
> difference when you can actually see what's going on.

I'm currently trying to investigate this a bit. One of our users' home
directories is extremely slow to 'zfs send'. The send started yesterday
afternoon at about 1600+0200, is still running, and has copied less
than 50% of the whole tree:

On the receiving side zfs get tells me:

atlashome/BACKUP/XXX  used   193G   -
atlashome/BACKUP/XXX  available  17.2T  -
atlashome/BACKUP/XXX  referenced 193G   -
atlashome/BACKUP/XXX  compressratio  1.81x  -

So close to 350 GB have been transferred, with about 500 GB to go.

More later.

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Ross Smith

Thanks, that got it working.  I'm still only getting 10MB/s, so it hasn't solved 
my problem - I've still got a bottleneck somewhere, but mbuffer is a huge 
improvement over standard zfs send / receive.  It makes such a difference when 
you can actually see what's going on.



> Date: Wed, 15 Oct 2008 12:08:14 +0200
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Improving zfs send performance
> 
> Ross schrieb:
>> Hi,
>> 
>> I'm just doing my first proper send/receive over the network and I'm getting 
>> just 9.4MB/s over a gigabit link.  Would you be able to provide an example 
>> of how to use mbuffer / socat with ZFS for a Solaris beginner?
>> 
>> thanks,
>> 
>> Ross
> 
> receiver> mbuffer -I sender:1 -s 128k -m 512M | zfs receive
> 
> sender> zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m
> 512M -O receiver:1
> 
> BTW: I released a new version of mbuffer today.
> 
> HTH,
> Thomas

_
Make a mini you and download it into Windows Live Messenger
http://clk.atdmt.com/UKM/go/111354029/direct/01/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Ross schrieb:
> Hi,
> 
> I'm just doing my first proper send/receive over the network and I'm getting 
> just 9.4MB/s over a gigabit link.  Would you be able to provide an example of 
> how to use mbuffer / socat with ZFS for a Solaris beginner?
> 
> thanks,
> 
> Ross

receiver> mbuffer -I sender:1 -s 128k -m 512M | zfs receive

sender> zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m
512M -O receiver:1

BTW: I released a new version of mbuffer today.

HTH,
Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Ross
Hi,

I'm just doing my first proper send/receive over the network and I'm getting 
just 9.4MB/s over a gigabit link.  Would you be able to provide an example of 
how to use mbuffer / socat with ZFS for a Solaris beginner?

thanks,

Ross


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Thomas Maier-Komor
Carsten Aulbert schrieb:
> Hi again,
> 
> Thomas Maier-Komor wrote:
>> Carsten Aulbert schrieb:
>>> Hi Thomas,
>> I don't know socat or what benefit it gives you, but have you tried
>> using mbuffer to send and receive directly (options -I and -O)?
> 
> I thought we tried that in the past and with socat it seemed faster, but
> I just made a brief test and I got (/dev/zero -> remote /dev/null) 330
> MB/s with mbuffer+socat and 430MB/s with mbuffer alone.
> 
>> Additionally, try to set the block size of mbuffer to the recordsize of
>> zfs (usually 128k):
>> receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
>> sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1
> 
> We are using 32k since many of our users use tiny files (and then I need
> to reduce the buffer size because of this 'funny' error):
> 
> mbuffer: fatal: Cannot address so much memory
> (32768*65536=2147483648>1544040742911).
> 
> Does this qualify for a bug report?
> 
> Thanks for the hint of looking into this again!
> 
> Cheers
> 
> Carsten

Yes, this qualifies for a bug report. As a workaround for now, you can
compile in 64-bit mode.
I.e.:
$ ./configure CFLAGS="-g -O -m64"
$ make && make install

This works for Sun Studio 12 and gcc. For older version of Sun Studio,
you need to pass -xarch=v9 instead of -m64.

I am planning to release an updated version of mbuffer this week. I'll
include a patch for this issue.

Cheers,
Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Carsten Aulbert
Hi again,

Thomas Maier-Komor wrote:
> Carsten Aulbert schrieb:
>> Hi Thomas,
> I don't know socat or what benefit it gives you, but have you tried
> using mbuffer to send and receive directly (options -I and -O)?

I thought we tried that in the past and with socat it seemed faster, but
I just made a brief test and I got (/dev/zero -> remote /dev/null) 330
MB/s with mbuffer+socat and 430MB/s with mbuffer alone.

> Additionally, try to set the block size of mbuffer to the recordsize of
> zfs (usually 128k):
> receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
> sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1

We are using 32k since many of our users use tiny files (and then I need
to reduce the buffer size because of this 'funny' error):

mbuffer: fatal: Cannot address so much memory
(32768*65536=2147483648>1544040742911).

Does this qualify for a bug report?
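The numbers in that message suggest a 32-bit arithmetic problem (our guess, not confirmed from mbuffer's source; the 64-bit-compile workaround elsewhere in the thread points the same way): -m 2048M with -s 32k means 65536 blocks of 32768 bytes, and that product is exactly 2^31, which wraps negative in a signed 32-bit integer:

```python
import ctypes

# mbuffer invoked with -m 2048M and -s 32k => 65536 blocks of 32 KiB
blocksize = 32 * 1024
nblocks = (2048 * 1024 * 1024) // blocksize   # 65536
total = blocksize * nblocks                   # 2147483648 == 2**31 bytes
# in a 32-bit build, that size overflows a signed 32-bit integer:
print(ctypes.c_int32(total).value)            # -2147483648
```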

Thanks for the hint of looking into this again!

Cheers

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Thomas Maier-Komor
Carsten Aulbert schrieb:
> Hi Thomas,
> 
> Thomas Maier-Komor wrote:
> 
>> Carsten,
>>
>> the summary looks like you are using mbuffer. Can you elaborate on what
>> options you are passing to mbuffer? Maybe changing the blocksize to be
>> consistent with the recordsize of the zpool could improve performance.
>> Is the buffer running full or is it empty most of the time? Are you sure
>> that the network connection is 10Gb/s all the way through from machine
>> to machine?
> 
> Well spotted :)
> 
> right now plain mbuffer with plenty of buffer (-m 2048M) on both ends
> and I have not seen any buffer exceeding the 10% watermark level. The
> network connections are via Neterion XFrame II Sun Fire NICs, then via CX4
> cables to our core switch where both boxes are directly connected
> (Woven Systems EFX1000). netperf tells me that the TCP performance is
> close to 7.5 GBit/s duplex and if I use
> 
> cat /dev/zero | mbuffer | socat ---> socat | mbuffer > /dev/null
> 
> I easily see speeds of about 350-400 MB/s so I think the network is fine.
> 
> Cheers
> 
> Carsten

I don't know socat or what benefit it gives you, but have you tried
using mbuffer to send and receive directly (options -I and -O)?
Additionally, try to set the block size of mbuffer to the recordsize of
zfs (usually 128k):
receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1

As transmitting from /dev/zero to /dev/null runs at a rate of 350 MB/s, I
guess you are really hitting the maximum speed of your zpool. From my
understanding, I'd guess sending is always slower than receiving,
because reads are random and writes are sequential. So it should be
quite normal that mbuffer's buffer doesn't really see a lot of usage.

Cheers,
Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Carsten Aulbert
Hi Thomas,

Thomas Maier-Komor wrote:

> 
> Carsten,
> 
> the summary looks like you are using mbuffer. Can you elaborate on what
> options you are passing to mbuffer? Maybe changing the blocksize to be
> consistent with the recordsize of the zpool could improve performance.
> Is the buffer running full or is it empty most of the time? Are you sure
> that the network connection is 10Gb/s all the way through from machine
> to machine?

Well spotted :)

right now plain mbuffer with plenty of buffer (-m 2048M) on both ends
and I have not seen any buffer exceeding the 10% watermark level. The
network connections are via Neterion XFrame II Sun Fire NICs, then via CX4
cables to our core switch where both boxes are directly connected
(Woven Systems EFX1000). netperf tells me that the TCP performance is
close to 7.5 GBit/s duplex and if I use

cat /dev/zero | mbuffer | socat ---> socat | mbuffer > /dev/null

I easily see speeds of about 350-400 MB/s so I think the network is fine.
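For scale (treating MB as 10^6 bytes, an assumption since netperf and mbuffer report in different units), 7.5 Gbit/s works out to roughly 940 MB/s, so the /dev/zero pipeline reaches under half of the measured TCP capacity:

```python
# netperf reports ~7.5 Gbit/s; convert to MB/s for comparison with the
# 350-400 MB/s observed through the mbuffer/socat pipeline
netperf_bits_per_sec = 7.5e9
netperf_mb_per_sec = netperf_bits_per_sec / 8 / 1e6
print(round(netperf_mb_per_sec))  # ~938 MB/s of raw TCP capacity
```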

Cheers

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Carsten Aulbert
Hi

Darren J Moffat wrote:

> 
> What are you using to transfer the data over the network ?
> 

Initially just plain ssh, which was way too slow; now we use mbuffer on
both ends and transfer the data over a socket via socat - I know that
mbuffer already allows this, but in a few tests socat seemed to be faster.

Sorry for not writing this into the first email.

Cheers

Carsten




Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Thomas Maier-Komor
Carsten Aulbert schrieb:
> Hi all,
> 
> although I'm running all this in a Sol10u5 X4500, I hope I may ask this
> question here. If not, please let me know where to head to.
> 
> We are running several X4500 with only 3 raidz2 zpools since we want
> quite a bit of storage space[*], but the performance we get when using
> zfs send is sometimes really lousy. Of course this depends what's in the
> file system, but when doing a few backups today I have seen the following:
> 
> receiving full stream of atlashome/[EMAIL PROTECTED] into
> atlashome/BACKUP/[EMAIL PROTECTED]
> in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer   0% full
> summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s
> 
> So, a mere 15 GB were transferred in 45 minutes, another user's home
> which is quite large (7TB) took more than 42 hours to be transferred.
> Since all this is going over a 10 Gb/s network and the CPUs are all idle I
> would really like to know why
> 
> * zfs send is so slow and
> * how can I improve the speed?
> 
> Thanks a lot for any hint
> 
> Cheers
> 
> Carsten
> 
> [*] we have run quite a few tests with more zpools but were not able to
> improve the speeds substantially. For this particular bad file system I
> still need to histogram the file sizes.
> 


Carsten,

the summary looks like you are using mbuffer. Can you elaborate on what
options you are passing to mbuffer? Maybe changing the blocksize to be
consistent with the recordsize of the zpool could improve performance.
Is the buffer running full or is it empty most of the time? Are you sure
that the network connection is 10Gb/s all the way through from machine
to machine?

- Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Darren J Moffat
Carsten Aulbert wrote:
> Hi all,
> 
> although I'm running all this in a Sol10u5 X4500, I hope I may ask this
> question here. If not, please let me know where to head to.
> 
> We are running several X4500 with only 3 raidz2 zpools since we want
> quite a bit of storage space[*], but the performance we get when using
> zfs send is sometimes really lousy. Of course this depends what's in the
> file system, but when doing a few backups today I have seen the following:
> 
> receiving full stream of atlashome/[EMAIL PROTECTED] into
> atlashome/BACKUP/[EMAIL PROTECTED]
> in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer   0% full
> summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s
> 
> So, a mere 15 GB were transferred in 45 minutes, another user's home
> which is quite large (7TB) took more than 42 hours to be transferred.
> Since all this is going over a 10 Gb/s network and the CPUs are all idle I
> would really like to know why

What are you using to transfer the data over the network ?

-- 
Darren J Moffat