Re: [zfs-discuss] JBOD performance
> I was just wondering whether maybe there's a problem with just one
> disk...

No, this is something I have observed on at least four different
systems, with vastly varying hardware. Probably just the effects of the
known problem.

Thanks,

-- 
/ Peter Schuller
PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] JBOD performance
Hello Peter,

Tuesday, December 18, 2007, 5:12:48 PM, you wrote:

>> Sequential writing problem with process throttling - there's an open
>> bug for it for quite a while. Try to lower txg_time to 1s - should
>> help a little bit.

PS> Yeah, my post was mostly to emphasize that on commodity hardware
PS> raidz2 does not even come close to being a CPU bottleneck. It wasn't
PS> a poke at the streaming performance. Very interesting to hear
PS> there's a bug open for it though.

>> Can you also post iostat -xnz 1 while you're doing dd?
>> and zpool status

PS> This was FreeBSD, but I can provide iostat -x if you still want it
PS> for some reason.

I was just wondering whether maybe there's a problem with just one
disk...

-- 
Best regards,
Robert                          mailto:[EMAIL PROTECTED]
                                http://milek.blogspot.com
Re: [zfs-discuss] JBOD performance
> Sequential writing problem with process throttling - there's an open
> bug for it for quite a while. Try to lower txg_time to 1s - should
> help a little bit.

Yeah, my post was mostly to emphasize that on commodity hardware raidz2
does not even come close to being a CPU bottleneck. It wasn't a poke at
the streaming performance. Very interesting to hear there's a bug open
for it though.

> Can you also post iostat -xnz 1 while you're doing dd?
> and zpool status

This was FreeBSD, but I can provide iostat -x if you still want it for
some reason.

-- 
/ Peter Schuller
Re: [zfs-discuss] JBOD performance
Frank Penczek writes:
> Hi,
>
> On Dec 17, 2007 4:18 PM, Roch - PAE <[EMAIL PROTECTED]> wrote:
> >
> > > The pool holds home directories so small sequential writes to one
> > > large file present one of a few interesting use cases.
> >
> > Can you be more specific here?
> >
> > Do you have a body of applications that would do small sequential
> > writes, or one in particular? Another interesting piece of
> > information is whether we expect those to be allocating writes or
> > overwrites (beware that some apps move the old file out, then run
> > allocating writes, then unlink the original file).
>
> Sorry, I try to be more specific.
> The zpool contains home directories that are exported to client
> machines. It is hard to predict what exactly users are doing, but one
> thing users do for certain is checking out software projects from our
> subversion server. The projects typically contain many source code
> files (thousands) and a build process accesses all of them in the
> worst case. That is what I meant by "many (small) files like
> compiling projects" in my previous post. The performance for this
> case is ... hopefully improvable.

This we'll have to work on. But first: if this is storage with NVRAM, I
assume you checked that the storage does not flush its caches:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

If that is not your problem, and ZFS underperforms another FS on the
backend of NFS, then this needs investigation. If ZFS/NFS underperforms
a direct-attached FS, that might just be an NFS issue not related to
ZFS. Again, that needs investigation. Performance gains won't happen
unless we find out what doesn't work.

> Now for sequential writes:
> We don't have a specific application issuing sequential writes but I
> can think of at least a few cases where these writes may occur, e.g.
> dumps of substantial amounts of measurement data or growing log files
> of applications. In either case these would be mainly allocating
> writes.

Right, but I'd hope the application would issue substantially large
writes, especially if it needs to dump data at a high rate. If the data
rate is more modest, then the CPU lost to this effect will itself be
modest.

> Does this provide the information you're interested in?

I get a sense that it's more important we find out what your build
issue is. But the small writes will have to be improved one day also.

-r

> Cheers,
> Frank
Re: [zfs-discuss] JBOD performance
Hi,

On Dec 17, 2007 4:18 PM, Roch - PAE <[EMAIL PROTECTED]> wrote:
>
> > The pool holds home directories so small sequential writes to one
> > large file present one of a few interesting use cases.
>
> Can you be more specific here ?
>
> Do you have a body of application that would do small
> sequential writes; or one in particular ? Another
> interesting info is if we expect those to be allocating
> writes or overwrite (beware that some app, move the old file
> out, then run allocating writes, then unlink the original
> file).

Sorry, I try to be more specific.
The zpool contains home directories that are exported to client
machines. It is hard to predict what exactly users are doing, but one
thing users do for certain is checking out software projects from our
subversion server. The projects typically contain many source code
files (thousands) and a build process accesses all of them in the worst
case. That is what I meant by "many (small) files like compiling
projects" in my previous post. The performance for this case is ...
hopefully improvable.

Now for sequential writes:
We don't have a specific application issuing sequential writes but I
can think of at least a few cases where these writes may occur, e.g.
dumps of substantial amounts of measurement data or growing log files
of applications. In either case these would be mainly allocating
writes.

Does this provide the information you're interested in?

Cheers,
Frank
Re: [zfs-discuss] JBOD performance
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0

> That service time is just terrible!

Yes, that service time is unreasonable: almost a second for each
command, and 35 more commands queued? (reorder = faster)

I had a server with similar service times, so I prepared a replacement
blade, and when I went to slide it in, I noticed a loud noise coming
from the blade below it. I notified the Windows person who owned it
(it had been "broken" for some time) and they turned it off. It was
much better after that.

Vibration... check vibration.

Rob
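Rob's "almost a second for each command" reading can be sanity-checked
with Little's law (throughput ≈ outstanding commands / per-command
latency). A small illustrative sketch using the figures quoted above
for c2t8d0; the helper name is made up for this post:

```python
def expected_iops(actv, asvc_t_ms):
    """Little's law: sustainable IOPS ~= queued commands / latency (s)."""
    return actv / (asvc_t_ms / 1000.0)

# c2t8d0 above: 35 active commands, ~728.9 ms average service time.
print(round(expected_iops(35.0, 728.9), 1))  # ~48.0, matching the w/s column
```

So the terrible service time and the modest ~48 writes/s are two views
of the same saturated queue.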
Re: [zfs-discuss] JBOD performance
Frank Penczek writes:
> Hi,
>
> On Dec 17, 2007 10:37 AM, Roch - PAE <[EMAIL PROTECTED]> wrote:
> >
> > dd uses a default block size of 512B. Does this map to your
> > expected usage ? When I quickly tested the CPU cost of small
> > reads from cache, I did see that ZFS was more costly than UFS
> > up to a crossover between 8K and 16K. We might need a more
> > comprehensive study of that (data in/out of cache, different
> > recordsize & alignment constraints). But for small
> > syscalls, I think we might need some work in ZFS to make it
> > CPU efficient.
> >
> > So first, does small sequential write to a large file
> > match an interesting use case ?
>
> The pool holds home directories so small sequential writes to one
> large file present one of a few interesting use cases.

Can you be more specific here?

Do you have a body of applications that would do small sequential
writes, or one in particular? Another interesting piece of information
is whether we expect those to be allocating writes or overwrites
(beware that some apps move the old file out, then run allocating
writes, then unlink the original file).

> The performance is equally disappointing for many (small) files
> like compiling projects in svn repositories.

???

-r

> Cheers,
> Frank
Re: [zfs-discuss] JBOD performance
Hi,

On Dec 17, 2007 10:37 AM, Roch - PAE <[EMAIL PROTECTED]> wrote:
>
> dd uses a default block size of 512B. Does this map to your
> expected usage ? When I quickly tested the CPU cost of small
> reads from cache, I did see that ZFS was more costly than UFS
> up to a crossover between 8K and 16K. We might need a more
> comprehensive study of that (data in/out of cache, different
> recordsize & alignment constraints). But for small
> syscalls, I think we might need some work in ZFS to make it
> CPU efficient.
>
> So first, does small sequential write to a large file
> match an interesting use case ?

The pool holds home directories so small sequential writes to one large
file present one of a few interesting use cases. The performance is
equally disappointing for many (small) files like compiling projects in
svn repositories.

Cheers,
Frank
Re: [zfs-discuss] JBOD performance
dd uses a default block size of 512B. Does this map to your expected
usage? When I quickly tested the CPU cost of small reads from cache, I
did see that ZFS was more costly than UFS up to a crossover between 8K
and 16K. We might need a more comprehensive study of that (data in/out
of cache, different recordsize & alignment constraints). But for small
syscalls, I think we might need some work in ZFS to make it CPU
efficient.

So first, does small sequential write to a large file match an
interesting use case?

-r
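Roch's 512-byte point is easy to demonstrate: with no bs= argument, dd
issues one write(2) per 512 bytes, so per-syscall overhead dominates. A
rough, hypothetical measurement sketch (the sizes and the use of a temp
file are my own choices, not from the thread):

```python
import os
import tempfile
import time

def write_throughput(block_size, total=8 * 1024 * 1024):
    """Write `total` bytes in `block_size` chunks and return MB/s."""
    buf = b"\0" * block_size
    fd, path = tempfile.mkstemp()
    try:
        start = time.time()
        with os.fdopen(fd, "wb") as f:
            for _ in range(total // block_size):
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.time() - start
        return (total / (1024 * 1024)) / elapsed
    finally:
        os.unlink(path)

# Many small syscalls vs. few large ones over the same byte count:
for bs in (512, 128 * 1024):
    print(bs, round(write_throughput(bs), 1), "MB/s")
```

The absolute numbers depend entirely on the machine; the point is the
per-syscall cost visible in the gap between the two block sizes.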
Re: [zfs-discuss] JBOD performance
Robert Milkowski wrote:
> Hello James,
>
> Sunday, December 16, 2007, 9:54:18 PM, you wrote:
>
> JCM> hi Frank,
>
> JCM> there is an interesting pattern here (at least, to my
> JCM> untrained eyes) - your %b starts off quite low:
> JCM> All of which, to me, look like you're filling a buffer
> JCM> or two.
>
> JCM> I don't recall the config of your zpool, but if the
> JCM> devices are disks that are direct or san-attached, I
> JCM> would be wondering about their outstanding queue depths.
>
> JCM> I think it's time to break out some D to find out where
> JCM> in the stack the bottleneck(s) really are.
>
> Maybe he could try to limit # of queued requests per disk in zfs to
> something smaller than the default 35 (maybe even down to 1?)

Hi Robert,
yup, that's on my list of things for Frank to try. I've asked for a bit
more config information though so we can get a bit of clarity on that
front first.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp  http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] JBOD performance
Hello James,

Sunday, December 16, 2007, 9:54:18 PM, you wrote:

JCM> hi Frank,

JCM> there is an interesting pattern here (at least, to my
JCM> untrained eyes) - your %b starts off quite low:

JCM> Frank Penczek wrote:
>> ---
>> dd'ing to NFS mount:
>> [EMAIL PROTECTED]://tmp> dd if=./file.tmp of=/home/fpz/file.tmp
>> 20+0 records in
>> 20+0 records out
>> 10240 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s
>>
>> # iostat -xnz 1
>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     2.8   17.3   149.4   127.6  0.0  1.3    0.0   66.0   0  12 c2t8d0
>>     2.8   17.3   149.4   127.6  0.0  1.3    0.0   65.9   0  13 c2t9d0
>>     2.8   17.3   149.3   127.6  0.0  1.3    0.0   66.1   0  13 c2t10d0
>>     2.8   17.3   149.3   127.6  0.0  1.3    0.0   66.4   0  13 c2t11d0
>>     2.8   17.3   149.5   127.6  0.0  1.3    0.0   66.5   0  13 c2t12d0
>>     0.3    1.0     5.4   133.9  0.0  0.0    0.1   27.2   0   1 c1t1d0
>>     0.5    0.3    26.8    16.5  0.0  0.0    0.1   11.1   0   0 c1t0d0
>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0    1.0     0.0     8.0  0.0  0.0    0.0    8.9   0   1 c1t1d0
>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   10.0     0.0     7.0  0.0  0.0    0.0    0.5   0   0 c2t8d0
>>     0.0   10.0     0.0     7.5  0.0  0.0    0.0    0.5   0   1 c2t9d0
>>     0.0   10.0     0.0     6.0  0.0  0.0    0.0    0.7   0   1 c2t10d0
>>     0.0   10.0     0.0     7.0  0.0  0.0    0.0    0.3   0   0 c2t11d0
>>     0.0   10.0     0.0     7.5  0.0  0.0    0.0    0.3   0   0 c2t12d0

JCM> then it jumps - roughly, quadrupling

>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   67.6     0.0  1298.6  0.0  9.8    0.2  145.2   1  71 c2t8d0
>>     0.0   64.8     0.0  1139.4  0.0  9.2    0.0  141.8   0  69 c2t9d0
>>     0.0   59.2     0.0   898.9  0.0  8.6    0.0  144.9   0  68 c2t10d0
>>     0.0   67.6     0.0  1379.4  0.0  9.5    0.0  140.0   0  68 c2t11d0
>>     0.0   70.4     0.0  1257.3  0.0 11.4    0.0  162.1   0  73 c2t12d0

JCM> then it maxes out and stays that way

>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   43.8     0.0  3068.5  0.0 34.9    0.0  796.0   0 100 c2t8d0
>>     0.0   55.6     0.0  3891.9  0.0 34.7    0.0  624.9   0 100 c2t9d0
>>     0.0   58.8     0.0  4211.9  0.0 33.4    0.0  568.2   0 100 c2t10d0
>>     0.0   49.2     0.0  3388.6  0.0 34.5    0.0  702.3   0 100 c2t11d0
>>     0.0   57.7     0.0  3805.3  0.0 34.3    0.0  594.0   0 100 c2t12d0
>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   60.0     0.0  4279.6  0.0 35.0    0.0  583.2   0 100 c2t8d0
>>     0.0   48.0     0.0  3423.7  0.0 35.0    0.0  729.1   0 100 c2t9d0
>>     0.0   41.0     0.0  2910.3  0.0 35.0    0.0  853.6   0 100 c2t10d0
>>     0.0   50.0     0.0  3552.2  0.0 35.0    0.0  699.9   0 100 c2t11d0
>>     0.0   48.0     0.0  3423.7  0.0 35.0    0.0  729.1   0 100 c2t12d0
>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
>>     0.0   60.0     0.0  4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
>>     0.0   55.0     0.0  3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
>>     0.0   56.0     0.0  4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
>>     0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0
>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   52.0     0.0  3723.5  0.0 35.0    0.0  672.9   0 100 c2t8d0
>>     0.0   43.0     0.0  3081.5  0.0 35.0    0.0  813.8   0 100 c2t9d0
>>     0.0   46.0     0.0  3296.0  0.0 35.0    0.0  760.7   0 100 c2t10d0
>>     0.0   48.0     0.0  3424.0  0.0 35.0    0.0  729.0   0 100 c2t11d0
>>     0.0   62.0     0.0  4408.1  0.0 35.0    0.0  564.4   0 100 c2t12d0
>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   60.0     0.0  4279.8  0.0 35.0    0.0  583.2   0 100 c2t8d0
>>     0.0   57.0     0.0  4065.8  0.0 35.0    0.0  613.9   0 100 c2t9d0
>>     0.0   59.0     0.0  4194.3  0.0 35.0    0.0  593.1   0 100 c2t10d0
>>     0.0   56.0     0.0  4023.3  0.0 35.0    0.0  624.9   0 100 c2t11d0
>>     0.0   48.0     0.0  3424.3  0.0 35.0    0.0  729.1   0 100 c2t12d0

JCM> drops back a fraction

>>                     extended device statistics
>>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>>     0.0   65.7     0.0  1385.0  0.0 14.5    0.0  220.8   0  90 c2t8d0
>>     0.9   68.4    39.8  1623.6  0.0 13.0    0.0  187.8   0
Re: [zfs-discuss] JBOD performance
hi Frank,

there is an interesting pattern here (at least, to my untrained eyes) -
your %b starts off quite low:

Frank Penczek wrote:
> ---
> dd'ing to NFS mount:
> [EMAIL PROTECTED]://tmp> dd if=./file.tmp of=/home/fpz/file.tmp
> 20+0 records in
> 20+0 records out
> 10240 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s
>
> # iostat -xnz 1
>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     2.8   17.3   149.4   127.6  0.0  1.3    0.0   66.0   0  12 c2t8d0
>     2.8   17.3   149.4   127.6  0.0  1.3    0.0   65.9   0  13 c2t9d0
>     2.8   17.3   149.3   127.6  0.0  1.3    0.0   66.1   0  13 c2t10d0
>     2.8   17.3   149.3   127.6  0.0  1.3    0.0   66.4   0  13 c2t11d0
>     2.8   17.3   149.5   127.6  0.0  1.3    0.0   66.5   0  13 c2t12d0
>     0.3    1.0     5.4   133.9  0.0  0.0    0.1   27.2   0   1 c1t1d0
>     0.5    0.3    26.8    16.5  0.0  0.0    0.1   11.1   0   0 c1t0d0
>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0    1.0     0.0     8.0  0.0  0.0    0.0    8.9   0   1 c1t1d0
>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   10.0     0.0     7.0  0.0  0.0    0.0    0.5   0   0 c2t8d0
>     0.0   10.0     0.0     7.5  0.0  0.0    0.0    0.5   0   1 c2t9d0
>     0.0   10.0     0.0     6.0  0.0  0.0    0.0    0.7   0   1 c2t10d0
>     0.0   10.0     0.0     7.0  0.0  0.0    0.0    0.3   0   0 c2t11d0
>     0.0   10.0     0.0     7.5  0.0  0.0    0.0    0.3   0   0 c2t12d0

then it jumps - roughly, quadrupling

>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   67.6     0.0  1298.6  0.0  9.8    0.2  145.2   1  71 c2t8d0
>     0.0   64.8     0.0  1139.4  0.0  9.2    0.0  141.8   0  69 c2t9d0
>     0.0   59.2     0.0   898.9  0.0  8.6    0.0  144.9   0  68 c2t10d0
>     0.0   67.6     0.0  1379.4  0.0  9.5    0.0  140.0   0  68 c2t11d0
>     0.0   70.4     0.0  1257.3  0.0 11.4    0.0  162.1   0  73 c2t12d0

then it maxes out and stays that way

>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   43.8     0.0  3068.5  0.0 34.9    0.0  796.0   0 100 c2t8d0
>     0.0   55.6     0.0  3891.9  0.0 34.7    0.0  624.9   0 100 c2t9d0
>     0.0   58.8     0.0  4211.9  0.0 33.4    0.0  568.2   0 100 c2t10d0
>     0.0   49.2     0.0  3388.6  0.0 34.5    0.0  702.3   0 100 c2t11d0
>     0.0   57.7     0.0  3805.3  0.0 34.3    0.0  594.0   0 100 c2t12d0
>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   60.0     0.0  4279.6  0.0 35.0    0.0  583.2   0 100 c2t8d0
>     0.0   48.0     0.0  3423.7  0.0 35.0    0.0  729.1   0 100 c2t9d0
>     0.0   41.0     0.0  2910.3  0.0 35.0    0.0  853.6   0 100 c2t10d0
>     0.0   50.0     0.0  3552.2  0.0 35.0    0.0  699.9   0 100 c2t11d0
>     0.0   48.0     0.0  3423.7  0.0 35.0    0.0  729.1   0 100 c2t12d0
>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
>     0.0   60.0     0.0  4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
>     0.0   55.0     0.0  3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
>     0.0   56.0     0.0  4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
>     0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0
>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   52.0     0.0  3723.5  0.0 35.0    0.0  672.9   0 100 c2t8d0
>     0.0   43.0     0.0  3081.5  0.0 35.0    0.0  813.8   0 100 c2t9d0
>     0.0   46.0     0.0  3296.0  0.0 35.0    0.0  760.7   0 100 c2t10d0
>     0.0   48.0     0.0  3424.0  0.0 35.0    0.0  729.0   0 100 c2t11d0
>     0.0   62.0     0.0  4408.1  0.0 35.0    0.0  564.4   0 100 c2t12d0
>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   60.0     0.0  4279.8  0.0 35.0    0.0  583.2   0 100 c2t8d0
>     0.0   57.0     0.0  4065.8  0.0 35.0    0.0  613.9   0 100 c2t9d0
>     0.0   59.0     0.0  4194.3  0.0 35.0    0.0  593.1   0 100 c2t10d0
>     0.0   56.0     0.0  4023.3  0.0 35.0    0.0  624.9   0 100 c2t11d0
>     0.0   48.0     0.0  3424.3  0.0 35.0    0.0  729.1   0 100 c2t12d0

drops back a fraction

>                     extended device statistics
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   65.7     0.0  1385.0  0.0 14.5    0.0  220.8   0  90 c2t8d0
>     0.9   68.4    39.8  1623.6  0.0 13.0    0.0  187.8   0  87 c2t9d0
>     0.9   74.9    39.3  2054.6  0.0 16.7    0.0  219.6   0  94 c2t10d0
>     0.9   70.3    39.3  1662.9  0.0 15.4    0.0  216.1   0  95 c2t11d0
>     0.0   68.4     0.0  1
Re: [zfs-discuss] JBOD performance
>     r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
>     0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
>     0.0   60.0     0.0  4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
>     0.0   55.0     0.0  3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
>     0.0   56.0     0.0  4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
>     0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0

That service time is just terrible!
Re: [zfs-discuss] JBOD performance
Hi,

On Dec 14, 2007 8:24 PM, Richard Elling <[EMAIL PROTECTED]> wrote:
> Frank Penczek wrote:
> >
> > The performance is slightly disappointing. Does anyone have
> > a similar setup and can anyone share some figures?
> > Any pointers to possible improvements are greatly appreciated.
> >
>
> Use a faster processor or change to a mirrored configuration.
> raidz2 can become processor bound in the Reed-Solomon calculations
> for the 2nd parity set. You should be able to see this in mpstat,
> and to a coarser grain in vmstat.
> -- richard

Thanks for the hint. When dd'ing to the pool, mpstat tells me:

# mpstat 1
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 46 0 67 518 413 5094 46 260 2721 3 0 96
  1 44 0 64 1765 141 5765 46 260 2561 3 0 96
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 0 0 10 804 698 11304 69 520 2180 5 0 95
  1 7 0 10 3301 390 11895 79 640 1060 4 0 96
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 0 0 4 294 188 6013 56 590 850 2 0 98
  1 0 0 4 1029 319 5931 57 610 780 2 0 98
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 0 0 2 303 198 2382 33 100 1450 0 0 100
  1 0 0 4 283 74 2613 3080 1590 1 0 99
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 2 0 113 1055 885 813 137 120 209 328337 57 0 36
  1 90 2 74 3622 220 1328 74 118 49 18 199565 45 0 50
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 14 0 115 438 191 876 181 127 400 324257 54 0 39
  1 14 0 197 1513 453 671 118 132 540 239295 53 0 42
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 0 0 98 901 679 1726 163 189 430 267185 48 0 47
  1 0 0 121 3722 508 843 171 194 320 299476 53 0 41
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 0 0 50 418 185 905 162 109 210 318847 54 0 39
  1 0 0 135 1772 550 670 102 107 370 238825 55 0 40
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 0 0 58 353 184 330 106 105 170 30134 28 47 0 26
  1 1 0 74 862 250 604 128 106 270 263126 45 0 50
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0 0 0 60 1087 851 1542 231 174 311 338007 61 0 32
  1 0 0 136 4191 444 1072 125 165 391 202734 52 0 44
...

Based on the 'idl' column I interpret these numbers as "there are
resources left" - or is it me being naive?

Cheers,
Frank
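Frank's reading of the idl column can be made concrete: mpstat's last
four columns (usr, sys, wt, idl) are percentages that total roughly 100
per CPU, so idl is the headroom left. A tiny sketch of that
interpretation (the helper function is hypothetical, and the sample
values are read off one of the less garbled rows above):

```python
def cpu_headroom(usr, sys, wt, idl):
    """Return the idle percentage after checking that the four mpstat
    utilization columns sum to ~100 (allowing for rounding)."""
    assert abs(usr + sys + wt + idl - 100) <= 2
    return idl

# One of the busier samples above appears to report usr=28 sys=47 wt=0 idl=26:
print(cpu_headroom(28, 47, 0, 26))  # 26 -> a quarter of that CPU still idle
```

So "there are resources left" is a fair reading, as long as no single
thread is pinned to one core while the other sits idle.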
Re: [zfs-discuss] JBOD performance
Hi,

sorry for the lengthy post ...

On Dec 15, 2007 1:56 PM, Robert Milkowski <[EMAIL PROTECTED]> wrote:
[...]
> Sequential writing problem with process throttling - there's an open
> bug for it for quite a while. Try to lower txg_time to 1s - should
> help a little bit.

Since setting txg_time to 1 the periodic drop in bandwidth seems to
have gone. That's great. Unfortunately the performance is still not
amazing - 10MB/s over the network and not more...

> Can you also post iostat -xnz 1 while you're doing dd?
> and zpool status

---
# zpool status
  pool: storage_array
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed Dec 12 23:38:36 2007
config:

        NAME           STATE     READ WRITE CKSUM
        storage_array  ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            c2t8d0     ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c2t10d0    ONLINE       0     0     0
            c2t11d0    ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0

errors: No known data errors
---
dd'ing to NFS mount:
[EMAIL PROTECTED]://tmp> dd if=./file.tmp of=/home/fpz/file.tmp
20+0 records in
20+0 records out
10240 bytes (102 MB) copied, 11.3959 seconds, 9.0 MB/s

# iostat -xnz 1
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    2.8   17.3   149.4   127.6  0.0  1.3    0.0   66.0   0  12 c2t8d0
    2.8   17.3   149.4   127.6  0.0  1.3    0.0   65.9   0  13 c2t9d0
    2.8   17.3   149.3   127.6  0.0  1.3    0.0   66.1   0  13 c2t10d0
    2.8   17.3   149.3   127.6  0.0  1.3    0.0   66.4   0  13 c2t11d0
    2.8   17.3   149.5   127.6  0.0  1.3    0.0   66.5   0  13 c2t12d0
    0.3    1.0     5.4   133.9  0.0  0.0    0.1   27.2   0   1 c1t1d0
    0.5    0.3    26.8    16.5  0.0  0.0    0.1   11.1   0   0 c1t0d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    1.0     0.0     8.0  0.0  0.0    0.0    8.9   0   1 c1t1d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   10.0     0.0     7.0  0.0  0.0    0.0    0.5   0   0 c2t8d0
    0.0   10.0     0.0     7.5  0.0  0.0    0.0    0.5   0   1 c2t9d0
    0.0   10.0     0.0     6.0  0.0  0.0    0.0    0.7   0   1 c2t10d0
    0.0   10.0     0.0     7.0  0.0  0.0    0.0    0.3   0   0 c2t11d0
    0.0   10.0     0.0     7.5  0.0  0.0    0.0    0.3   0   0 c2t12d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   67.6     0.0  1298.6  0.0  9.8    0.2  145.2   1  71 c2t8d0
    0.0   64.8     0.0  1139.4  0.0  9.2    0.0  141.8   0  69 c2t9d0
    0.0   59.2     0.0   898.9  0.0  8.6    0.0  144.9   0  68 c2t10d0
    0.0   67.6     0.0  1379.4  0.0  9.5    0.0  140.0   0  68 c2t11d0
    0.0   70.4     0.0  1257.3  0.0 11.4    0.0  162.1   0  73 c2t12d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   43.8     0.0  3068.5  0.0 34.9    0.0  796.0   0 100 c2t8d0
    0.0   55.6     0.0  3891.9  0.0 34.7    0.0  624.9   0 100 c2t9d0
    0.0   58.8     0.0  4211.9  0.0 33.4    0.0  568.2   0 100 c2t10d0
    0.0   49.2     0.0  3388.6  0.0 34.5    0.0  702.3   0 100 c2t11d0
    0.0   57.7     0.0  3805.3  0.0 34.3    0.0  594.0   0 100 c2t12d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   60.0     0.0  4279.6  0.0 35.0    0.0  583.2   0 100 c2t8d0
    0.0   48.0     0.0  3423.7  0.0 35.0    0.0  729.1   0 100 c2t9d0
    0.0   41.0     0.0  2910.3  0.0 35.0    0.0  853.6   0 100 c2t10d0
    0.0   50.0     0.0  3552.2  0.0 35.0    0.0  699.9   0 100 c2t11d0
    0.0   48.0     0.0  3423.7  0.0 35.0    0.0  729.1   0 100 c2t12d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t8d0
    0.0   60.0     0.0  4280.8  0.0 35.0    0.0  583.1   0 100 c2t9d0
    0.0   55.0     0.0  3938.2  0.0 35.0    0.0  636.1   0 100 c2t10d0
    0.0   56.0     0.0  4024.3  0.0 35.0    0.0  624.7   0 100 c2t11d0
    0.0   48.0     0.0  3424.6  0.0 35.0    0.0  728.9   0 100 c2t12d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   52.0     0.0  3723.5  0.0 35.0    0.0  672.9   0 100 c2t8d0
    0.0   43.0     0.0  3081.5  0.0 35.0    0.0  813.8   0 100 c2t9d0
    0.0   46.0     0.0  3296.0  0.0 35.0    0.0  760.7   0 100 c2t10d0
    0.0   48.0     0.0  3424.0  0.0 35.0    0.0  729.0   0 100 c2t11d0
    0.0   62.0     0.0  4408.1  0.0 35.0    0.0  564.4   0 100 c2t12d0
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   60.0     0.0  4279.8  0.0 35.0    0.0  583.2   0 100 c2t8d
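One thing worth keeping in mind when reading the numbers above: with a
5-disk raidz2 like storage_array, every logical byte written costs
roughly 5/3 bytes of disk bandwidth, since two of the five disks in
each stripe hold parity. A back-of-the-envelope sketch (idealized
full-width stripes, metadata ignored; the 500 GB disk size is an
arbitrary example, not from the thread):

```python
def raidz2_overheads(n_disks, disk_size_gb):
    """Usable capacity and write amplification for an n-disk raidz2
    vdev, assuming idealized full-width stripes."""
    data_disks = n_disks - 2
    usable_gb = data_disks * disk_size_gb
    write_amp = n_disks / data_disks  # bytes hitting disk per logical byte
    return usable_gb, write_amp

usable, amp = raidz2_overheads(5, 500)
print(usable)         # 1500 GB usable out of 2500 GB raw
print(round(amp, 2))  # 1.67: 10 MB/s of NFS writes -> ~16.7 MB/s across disks
```

That still leaves a large gap between ~17 MB/s of aggregate disk
traffic and five disks at 100% busy, which is what makes the service
times above so suspicious.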
Re: [zfs-discuss] JBOD performance
Hi,

On Dec 14, 2007 7:50 PM, Louwtjie Burger <[EMAIL PROTECTED]> wrote:
[...]
> I would have said ... to be expected, since the 280 came with a
> 100Mbit interface. So a 9-12 MB/s peak would be acceptable. You did
> mention a "gigabit switch"... did you install a gigabit HBA ? If
> that's the case then yes, performance sucks.

Yes, sorry, I forgot to mention that we're using a GigaSwift NIC in
that machine.

> fsstat?

What 'fsstat' output are you interested in? For a start here's
'fsstat -F':

# fsstat -F
  new  name  name   attr  attr lookup rddir  read  read write write
 file remov  chng    get   set    ops   ops   ops bytes   ops bytes
 660K 69.8K 16.0K  11.3M  170K  60.2M  154K 25.6M 14.0G 6.05M 58.9G ufs
    0     0     0  43.6K     0  78.8K 14.0K 27.8K 10.0M     0     0 proc
    0     0     0     21     0      0     0     0     0     0     0 nfs
30.7K 7.67K 11.2K   348M 56.9K   122M 4.61M 1.65M 25.1G 1.05M 58.2G zfs
    0     0     0   574K     0      0     0     0     0     0     0 lofs
 162K 17.3K  120K   273K 15.9K  1.48M 4.60K  418K 1.24G 1.10M 5.93G tmpfs
    0     0     0  6.18K     0      0     0    51 9.31K     0     0 mntfs
    0     0     0      0     0      0     0     0     0     0     0 nfs3
    0     0     0      0     0      0     0     0     0     0     0 nfs4
    0     0     0     43     0      0     0     0     0     0     0 autofs

Thanks,
Frank
Re: [zfs-discuss] JBOD performance
Hello Peter,

Saturday, December 15, 2007, 7:45:50 AM, you wrote:

>> Use a faster processor or change to a mirrored configuration.
>> raidz2 can become processor bound in the Reed-Solomon calculations
>> for the 2nd parity set. You should be able to see this in mpstat,
>> and to a coarser grain in vmstat.

PS> Hmm. Is the OP's hardware *that* slow? (I don't know enough about
PS> the Sun hardware models)

PS> I have a 5-disk raidz2 (cheap SATA) here on my workstation, which
PS> is an X2 3800+ (i.e., one of the earlier AMD dual-core offerings).
PS> Here's me dd:ing to a file on ZFS on FreeBSD running on that
PS> hardware:

PS> promraid     741G   387G      0    380      0  47.2M
PS> promraid     741G   387G      0    336      0  41.8M
PS> promraid     741G   387G      0    424    510  51.0M
PS> promraid     741G   387G      0    441      0  54.5M
PS> promraid     741G   387G      0    514      0  19.2M
PS> promraid     741G   387G     34    192  4.12M  24.1M
PS> promraid     741G   387G      0    341      0  42.7M
PS> promraid     741G   387G      0    361      0  45.2M
PS> promraid     741G   387G      0    350      0  43.9M
PS> promraid     741G   387G      0    370      0  46.3M
PS> promraid     741G   387G      1    423   134K  51.7M
PS> promraid     742G   386G     22    329  2.39M  10.3M
PS> promraid     742G   386G     28    214  3.49M  26.8M
PS> promraid     742G   386G      0    347      0  43.5M
PS> promraid     742G   386G      0    349      0  43.7M
PS> promraid     742G   386G      0    354      0  44.3M
PS> promraid     742G   386G      0    365      0  45.7M
PS> promraid     742G   386G      2    460  7.49K  55.5M

PS> At this point the bottleneck looks architectural rather than CPU.
PS> None of the cores are saturated, and the CPU usage of the ZFS
PS> kernel threads is pretty low.

PS> I say architectural because writes to the underlying devices are
PS> not sustained; it drops to almost zero for certain periods (this is
PS> more visible in iostat -x than it is in the zpool statistics). What
PS> I think is happening is that ZFS is too late to evict data in the
PS> cache, thus blocking the writing process. Once a transaction group
PS> with a bunch of data gets committed the application unblocks, but
PS> presumably ZFS waits for a little while before resuming writes.

PS> Note that this is also being run on plain hardware; it's not even
PS> PCI Express. During throughput peaks, but not constantly, the
PS> bottleneck is probably the PCI bus.

Sequential writing problem with process throttling - there's an open
bug for it for quite a while. Try to lower txg_time to 1s - should
help a little bit.

Can you also post iostat -xnz 1 while you're doing dd?
and zpool status

-- 
Best regards,
Robert                          mailto:[EMAIL PROTECTED]
                                http://milek.blogspot.com
Re: [zfs-discuss] JBOD performance
> Use a faster processor or change to a mirrored configuration.
> raidz2 can become processor bound in the Reed-Solomon calculations
> for the 2nd parity set. You should be able to see this in mpstat,
> and to a coarser grain in vmstat.

Hmm. Is the OP's hardware *that* slow? (I don't know enough about the
Sun hardware models)

I have a 5-disk raidz2 (cheap SATA) here on my workstation, which is an
X2 3800+ (i.e., one of the earlier AMD dual-core offerings). Here's me
dd:ing to a file on ZFS on FreeBSD running on that hardware:

promraid     741G   387G      0    380      0  47.2M
promraid     741G   387G      0    336      0  41.8M
promraid     741G   387G      0    424    510  51.0M
promraid     741G   387G      0    441      0  54.5M
promraid     741G   387G      0    514      0  19.2M
promraid     741G   387G     34    192  4.12M  24.1M
promraid     741G   387G      0    341      0  42.7M
promraid     741G   387G      0    361      0  45.2M
promraid     741G   387G      0    350      0  43.9M
promraid     741G   387G      0    370      0  46.3M
promraid     741G   387G      1    423   134K  51.7M
promraid     742G   386G     22    329  2.39M  10.3M
promraid     742G   386G     28    214  3.49M  26.8M
promraid     742G   386G      0    347      0  43.5M
promraid     742G   386G      0    349      0  43.7M
promraid     742G   386G      0    354      0  44.3M
promraid     742G   386G      0    365      0  45.7M
promraid     742G   386G      2    460  7.49K  55.5M

At this point the bottleneck looks architectural rather than CPU. None
of the cores are saturated, and the CPU usage of the ZFS kernel threads
is pretty low.

I say architectural because writes to the underlying devices are not
sustained; they drop to almost zero for certain periods (this is more
visible in iostat -x than it is in the zpool statistics). What I think
is happening is that ZFS is too late to evict data in the cache, thus
blocking the writing process. Once a transaction group with a bunch of
data gets committed the application unblocks, but presumably ZFS waits
for a little while before resuming writes.

Note that this is also being run on plain hardware; it's not even PCI
Express. During throughput peaks, but not constantly, the bottleneck is
probably the PCI bus.

-- 
/ Peter Schuller
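The PCI-bus suspicion above is easy to bound with arithmetic:
conventional 32-bit/33 MHz PCI tops out around 133 MB/s, shared by
everything on the bus, and a 5-disk raidz2 multiplies the logical write
rate by 5/3 before it reaches the disks. A rough sketch under those
assumptions (the helper name and figures are illustrative, not from the
post):

```python
def pci_utilization(app_mb_s, n_disks, parity=2, bus_mb_s=133.0):
    """Fraction of a shared 32-bit/33 MHz PCI bus consumed, assuming the
    controller carries data plus raidz parity for every logical byte."""
    on_bus = app_mb_s * n_disks / (n_disks - parity)
    return on_bus / bus_mb_s

# ~50 MB/s of application writes on a 5-disk raidz2:
print(round(pci_utilization(50, 5), 2))  # ~0.63 of the bus, before any reads
```

So the ~45-55 MB/s peaks in the zpool output are plausibly brushing
against the shared bus, consistent with "during throughput peaks, but
not constantly".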
Re: [zfs-discuss] JBOD performance
Frank Penczek wrote:
>
> The performance is slightly disappointing. Does anyone have
> a similar setup and can anyone share some figures?
> Any pointers to possible improvements are greatly appreciated.
>

Use a faster processor or change to a mirrored configuration. raidz2
can become processor bound in the Reed-Solomon calculations for the
2nd parity set. You should be able to see this in mpstat, and to a
coarser grain in vmstat.

-- richard
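Why the second parity set is so much more expensive than the first
shows up in the arithmetic itself: the first parity column is a plain
XOR, while the Reed-Solomon column needs a Galois-field multiplication
per data byte. An illustrative RAID-6-style sketch in GF(2^8) with the
0x11d polynomial (ZFS's actual raidz2 implementation differs in
detail; this is only the shape of the computation):

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8+x^4+x^3+x^2+1 (0x11d)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1D
    return p

def gf_pow2(i):
    """Compute 2^i in GF(2^8)."""
    v = 1
    for _ in range(i):
        v = gf_mul(v, 2)
    return v

def dual_parity(data):
    """P is a cheap XOR; Q costs a GF multiply per data byte."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d                      # first parity: XOR only
        q ^= gf_mul(gf_pow2(i), d)  # second parity: Reed-Solomon weighting
    return p, q

stripe = [0x01, 0x02, 0x04]  # one byte from each of three data disks
p, q = dual_parity(stripe)
print(p, q)  # 7 21
# Losing any single data disk is recoverable from P alone, with XOR:
assert p ^ stripe[0] ^ stripe[2] == stripe[1]
```

The GF multiply is where the extra CPU goes, which is why mpstat is the
right place to look for a raidz2 bottleneck.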
Re: [zfs-discuss] JBOD performance
> The throughput when writing from a local disk to the
> zpool is around 30MB/s, when writing from a client

Err.. sorry, the internal storage would be good old 1Gbit FCAL disks
@ 10K rpm. Still, not the fastest around ;)
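The earlier 9-12 MB/s estimate for a 100 Mbit interface is just
wire-speed arithmetic; here it is spelled out, with an assumed
protocol-efficiency figure standing in for Ethernet/IP/TCP/NFS header
overhead:

```python
def wire_speed_mb_s(link_mbit, efficiency=0.95):
    """Rough payload ceiling of an Ethernet link in MB/s; `efficiency`
    is an assumed allowance for protocol header overhead."""
    return link_mbit / 8.0 * efficiency

print(round(wire_speed_mb_s(100), 1))   # ~11.9 MB/s ceiling on 100BaseT
print(round(wire_speed_mb_s(1000), 1))  # ~118.8 MB/s on gigabit
```

So the observed ~9-10 MB/s over NFS would roughly saturate a 100 Mbit
link, which is why the gigabit-HBA question matters.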