Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Tracey Bernath
On Mon, Feb 15, 2010 at 5:51 PM, Daniel Carosone d...@geek.com.au wrote:

 On Sun, Feb 14, 2010 at 11:08:52PM -0600, Tracey Bernath wrote:
  Now, to add the second SSD ZIL/L2ARC for a mirror.

 Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
 add more devices and let them load-balance.   This is especially true
 if you're sharing ssd writes with ZIL, as slices on the same devices.

Well, the problem I am trying to solve is this: wouldn't it read 2x faster
with the mirror?  It seems once I can drive the single device to 10 queued
actions, and 100% busy, it would be more useful to have two channels to the
same data. Is ZFS not smart enough to understand that there are two
identical mirror devices in the cache to split requests to? Or, are you
saying that ZFS is smart enough to cache it in two places, although not
mirrored?

If the device itself was full, and items were falling off the L2ARC, then I
could see having two separate cache devices, but since I am only at about
50% utilization of the available capacity, and maxing out the IO, then
mirroring seemed smarter.

Am I missing something here?

Tracey



  I may even splurge for one more to get a three way mirror.

 With more devices, questions about selecting different devices
 appropriate for each purpose come into play.

  Now I need a bigger server

 See? :)

 --
 Dan.


Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Bob Friesenhahn

On Mon, 15 Feb 2010, Tracey Bernath wrote:


If the device itself was full, and items were falling off the L2ARC, then I
could see having two separate cache devices, but since I am only at about
50% utilization of the available capacity, and maxing out the IO, then
mirroring seemed smarter.

Am I missing something here?


I doubt it.  The only way to know for sure is to test it, but it seems 
unlikely to me that the zfs implementors would fail to load-share the 
reads from a mirrored L2ARC.  Richard's points about L2ARC bandwidth vs 
pool disk bandwidth are still good ones.  L2ARC is all about read 
latency, but L2ARC does not necessarily help with read bandwidth.  It 
is also useful to keep in mind that L2ARC offers at least 40x less 
bandwidth than ARC in RAM.  So always populate RAM first if you can 
afford it.
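
A quick way to see how much of the ARC and L2ARC is actually being used
before spending on either is the arcstats kstats; a sketch (the statistic
names below are the usual arcstats entries and can vary slightly between
releases):

   kstat -p zfs:0:arcstats:size        # bytes of ARC currently held in RAM
   kstat -p zfs:0:arcstats:c_max       # ARC size ceiling
   kstat -p zfs:0:arcstats:l2_size     # bytes currently held in the L2ARC
   kstat -p zfs:0:arcstats:l2_hits     # reads served from the L2ARC
   kstat -p zfs:0:arcstats:l2_misses   # reads that fell through to the pool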


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Daniel Carosone
On Mon, Feb 15, 2010 at 09:11:02PM -0600, Tracey Bernath wrote:
 On Mon, Feb 15, 2010 at 5:51 PM, Daniel Carosone d...@geek.com.au wrote:
  Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
  add more devices and let them load-balance.   This is especially true
  if you're sharing ssd writes with ZIL, as slices on the same devices.
 
  Well, the problem I am trying to solve is this: wouldn't it read 2x faster
 with the mirror?  It seems once I can drive the single device to 10 queued
 actions, and 100% busy, it would be more useful to have two channels to the
 same data. Is ZFS not smart enough to understand that there are two
 identical mirror devices in the cache to split requests to? Or, are you
 saying that ZFS is smart enough to cache it in two places, although not
 mirrored?

First, Bob is right, measurement trumps speculation.  Try it.

As for speculation, you're thinking only about reads.  I expect
reading from l2arc devices will be the same as reading from any other
zfs mirror, and largely the same in both cases above; load balanced
across either device.  In the rare case of a bad read from unmirrored
l2arc, data will be fetched from the pool, so mirroring l2arc doesn't
add any resiliency benefit.

However, your cache needs to be populated and maintained as well, and
this needs writes.  Twice as many of them for the mirror as for the
stripe, and half of what is written never needs to be read again.  These
writes go to the same SSD devices you're using for the ZIL; on commodity
SSDs, which are not well write-optimised, they may be hurting ZIL
latency by making the SSD do more writing, stealing from the total
IOPS count on the channel, and (as a lesser concern) adding wear
cycles to the device.

When you're already maxing out the IO, eliminating wasted cycles opens
your bottleneck, even if only a little. 

Once you reach steady state, I don't know how much turnover in l2arc
contents you will have, and therefore how many extra writes we're
talking about.  It may not be many, but they are unnecessary ones.  

Normally, we'd talk about measuring a potential benefit, and then
choosing based on the results.  In this case, if I were you I'd
eliminate the unnecessary writes, and measure the difference more as a
matter of curiosity and research, since I was already set up to do so.
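
For that measurement, watching the log and cache slices alongside the pool
disks while the workload runs is usually enough; a quick sketch, using the
device names from earlier in this thread:

   zpool iostat -v dpool 5    # per-vdev ops and bandwidth, incl. log and cache slices
   iostat -xn 5               # per-device service times and %busy over the same interval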

--
Dan.





Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Richard Elling
On Feb 16, 2010, at 12:39 PM, Daniel Carosone wrote:
 On Mon, Feb 15, 2010 at 09:11:02PM -0600, Tracey Bernath wrote:
 On Mon, Feb 15, 2010 at 5:51 PM, Daniel Carosone d...@geek.com.au wrote:
 Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
 add more devices and let them load-balance.   This is especially true
 if you're sharing ssd writes with ZIL, as slices on the same devices.
 
  Well, the problem I am trying to solve is this: wouldn't it read 2x faster
  with the mirror?  It seems once I can drive the single device to 10 queued
 actions, and 100% busy, it would be more useful to have two channels to the
 same data. Is ZFS not smart enough to understand that there are two
 identical mirror devices in the cache to split requests to? Or, are you
 saying that ZFS is smart enough to cache it in two places, although not
 mirrored?
 
 First, Bob is right, measurement trumps speculation.  Try it.
 
 As for speculation, you're thinking only about reads.  I expect
 reading from l2arc devices will be the same as reading from any other
 zfs mirror, and largely the same in both cases above; load balanced
 across either device.  In the rare case of a bad read from unmirrored
 l2arc, data will be fetched from the pool, so mirroring l2arc doesn't
 add any resiliency benefit.
 
 However, your cache needs to be populated and maintained as well, and
 this needs writes.  Twice as many of them for the mirror as for the
 stripe, and half of what is written never needs to be read again.  These
 writes go to the same SSD devices you're using for the ZIL; on commodity
 SSDs, which are not well write-optimised, they may be hurting ZIL
 latency by making the SSD do more writing, stealing from the total
 IOPS count on the channel, and (as a lesser concern) adding wear
 cycles to the device.

The L2ARC writes are throttled to 8 MB/sec, except during cold
start, where the throttle is 16 MB/sec.  This should not be noticeable
on the channels.
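
Those figures correspond to the l2arc_write_max and l2arc_write_boost
tunables; a sketch of where they live, showing the defaults in bytes (only
worth touching if measurement shows the feed rate is actually the limit):

   * /etc/system (sketch; these are the defaults)
   * steady-state L2ARC feed rate, 8 MB/sec:
   set zfs:l2arc_write_max = 0x800000
   * extra allowance while the ARC is still warming up (16 MB/sec total):
   set zfs:l2arc_write_boost = 0x800000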

 When you're already maxing out the IO, eliminating wasted cycles opens
 your bottleneck, even if only a little. 

+1 
 -- richard

 Once you reach steady state, I don't know how much turnover in l2arc
 contents you will have, and therefore how many extra writes we're
 talking about.  It may not be many, but they are unnecessary ones.  
 
 Normally, we'd talk about measuring a potential benefit, and then
 choosing based on the results.  In this case, if I were you I'd
 eliminate the unnecessary writes, and measure the difference more as a
 matter of curiosity and research, since I was already set up to do so.
 
 --
 Dan.
 

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 15-17, 2010)





Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Fajar A. Nugraha
On Sun, Feb 14, 2010 at 12:51 PM, Tracey Bernath tbern...@ix.netcom.com wrote:
 I went from all four disks of the array at 100%, doing about 170 read
 IOPS/25MB/s
 to all four disks of the array at 0%, once hitting nearly 500 IOPS/65MB/s
 off the cache drive (@ only 50% load).


 And, keep  in mind this was on less than $1000 of hardware.

Really? Complete box and all, or is it just the disks? Because the 4
disks alone should cost about $400. Did you use ECC RAM?

-- 
Fajar


Re: [zfs-discuss] SSD and ZFS

2010-02-15 Thread Tracey Bernath
For those following the saga:
With the prefetch problem fixed, and data coming off the L2ARC instead of
the disks, the system switched from IO bound to CPU bound. I opened up the
throttles with some explicit PARALLEL hints in the Oracle commands, and we
were finally able to max out the single SSD:


     r/s    w/s      kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   826.0    3.2  104361.8   35.2   0.0   9.9     0.0    12.0   3 100  c0t0d4

So, when we maxed out the SSD cache, it was delivering 100+ MB/s and 830 IOPS
with 3.4 TB behind it in a 4-disk SATA RAIDZ1.

Still have to remap it to 8k blocks to get more efficiency, but for raw
numbers, it's exactly what I was looking for. Now, to add the second SSD
ZIL/L2ARC for a mirror. I may even splurge for one more to get a three-way
mirror. That will completely saturate the SCSI channel. Now I need a bigger
server...

Did I mention it was $1000 for the whole setup? Bah-ha-ha-ha.

Tracey


On Sat, Feb 13, 2010 at 11:51 PM, Tracey Bernath tbern...@ix.netcom.com wrote:

  OK, that was the magic incantation I was looking for:
  - changing the noprefetch option opened the floodgates to the L2ARC
  - changing the max queue depth relieved the wait time on the drives,
  although I may undo this again in the benchmarking since these drives all
  have NCQ

 I went from all four disks of the array at 100%, doing about 170 read
 IOPS/25MB/s
  to all four disks of the array at 0%, once hitting nearly 500 IOPS/65MB/s
 off the cache drive (@ only 50% load).
  This bodes well for adding a second mirrored cache drive to push toward
  1K IOPS.

 Now I am ready to insert the mirror for the ZIL and the CACHE, and we will
 be ready
 for some production benchmarking.


 BEFORE:
  device      r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
  sd0       170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
  sd1       168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
  sd2       172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
  sd4       170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
  sd5         1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31



 AFTER:
 extended device statistics
      r/s    w/s     kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
      0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d0
      0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d1
      0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d2
      0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d3
    285.2    0.8  36236.2   14.4   0.0   0.5     0.0     1.8   1  37  c0t0d4


 And, keep  in mind this was on less than $1000 of hardware.

 Thanks for the pointers guys,
 Tracey



 On Sat, Feb 13, 2010 at 9:22 AM, Richard Elling 
  richard.ell...@gmail.com wrote:

 comment below...

 On Feb 12, 2010, at 2:25 PM, TMB wrote:
  I have a similar question. I put together a cheapo RAID with four 1TB WD
 Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
 slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
  # zpool status dpool
   pool: dpool
  state: ONLINE
  scrub: none requested
  config:
 
          NAME        STATE     READ WRITE CKSUM
          dpool       ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              c0t0d0  ONLINE       0     0     0
              c0t0d1  ONLINE       0     0     0
              c0t0d2  ONLINE       0     0     0
              c0t0d3  ONLINE       0     0     0
          logs
            c0t0d4s0  ONLINE       0     0     0
          cache
            c0t0d4s1  ONLINE       0     0     0
          spares
            c0t0d6    AVAIL
            c0t0d7    AVAIL
 
                   capacity     operations    bandwidth
   pool            used  avail   read  write   read  write
   ------------   -----  -----  -----  -----  -----  -----
   dpool          72.1G  3.55T    237     12  29.7M   597K
     raidz1       72.1G  3.55T    237      9  29.7M   469K
       c0t0d0         -      -    166      3  7.39M   157K
       c0t0d1         -      -    166      3  7.44M   157K
       c0t0d2         -      -    166      3  7.39M   157K
       c0t0d3         -      -    167      3  7.45M   157K
     c0t0d4s0        20K  4.97G      0      3      0   127K
   cache              -      -      -      -      -      -
     c0t0d4s1      17.6G  36.4G      3      1   249K   119K
   ------------   -----  -----  -----  -----  -----  -----
  I just don't seem to be getting any bang for the buck I should be.  This
 was taken while rebuilding an Oracle index, all files stored in this pool.
  The WD disks are at 100%, and nothing is coming from the cache.  The cache
 does have the entire DB cached (17.6G used), but hardly reads anything from
 it.  I also am not seeing the spike of data flowing into the ZIL either,
  although iostat shows there is just write traffic hitting the SSD:
 
  extended device statistics  cpu
  devicer/sw/s   kr/s   kw/s wait actv  svc_t  %w  %b  

Re: [zfs-discuss] SSD and ZFS

2010-02-15 Thread Daniel Carosone
On Sun, Feb 14, 2010 at 11:08:52PM -0600, Tracey Bernath wrote:
 Now, to add the second SSD ZIL/L2ARC for a mirror. 

Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
add more devices and let them load-balance.   This is especially true
if you're sharing ssd writes with ZIL, as slices on the same devices.
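
In zpool terms the two approaches look roughly like this; a sketch only,
with c0t0d5 standing in for the second SSD and the slice layout copied from
earlier in the thread (attaching to a log device assumes a pool version
that supports mirrored slogs):

   # mirror the existing ZIL slice by attaching the second SSD's log slice
   zpool attach dpool c0t0d4s0 c0t0d5s0
   # add the second SSD's large slice as an extra, independent L2ARC device
   zpool add dpool cache c0t0d5s1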

 I may even splurge for one more to get a three way mirror.

With more devices, questions about selecting different devices
appropriate for each purpose come into play.

 Now I need a bigger server

See? :)

--
Dan.



Re: [zfs-discuss] SSD and ZFS

2010-02-14 Thread Tracey Bernath
OK, that was the magic incantation I was looking for:
- changing the noprefetch option opened the floodgates to the L2ARC
- changing the max queue depth relieved the wait time on the drives, although
I may undo this again in the benchmarking since these drives all have NCQ
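
For reference, the two knobs being described map onto these tunables; a
sketch of the persistent /etc/system form, with values assumed from this
thread (a reboot is needed for /etc/system changes; the same variables can
also be poked live with mdb):

   * cache prefetched (streaming) reads in the L2ARC as well
   set zfs:l2arc_noprefetch = 0
   * cut the per-vdev queue depth down from the default of 35
   set zfs:zfs_vdev_max_pending = 4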

I went from all four disks of the array at 100%, doing about 170 read
IOPS/25MB/s
to all four disks of the array at 0%, once hitting nearly 500 IOPS/65MB/s
off the cache drive (@ only 50% load).
This bodes well for adding a second mirrored cache drive to push toward
1K IOPS.

Now I am ready to insert the mirror for the ZIL and the CACHE, and we will
be ready
for some production benchmarking.


 device      r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
 sd0       170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
 sd1       168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
 sd2       172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
 sd4       170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
 sd5         1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31

extended device statistics
     r/s    w/s     kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
     0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d0
     0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d1
     0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d2
     0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d3
   285.2    0.8  36236.2   14.4   0.0   0.5     0.0     1.8   1  37  c0t0d4


And, keep  in mind this was on less than $1000 of hardware.

Thanks,
Tracey


On Sat, Feb 13, 2010 at 9:22 AM, Richard Elling richard.ell...@gmail.com wrote:

 comment below...

 On Feb 12, 2010, at 2:25 PM, TMB wrote:
  I have a similar question. I put together a cheapo RAID with four 1TB WD
 Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
 slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
  # zpool status dpool
   pool: dpool
  state: ONLINE
  scrub: none requested
  config:
 
          NAME        STATE     READ WRITE CKSUM
          dpool       ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              c0t0d0  ONLINE       0     0     0
              c0t0d1  ONLINE       0     0     0
              c0t0d2  ONLINE       0     0     0
              c0t0d3  ONLINE       0     0     0
          logs
            c0t0d4s0  ONLINE       0     0     0
          cache
            c0t0d4s1  ONLINE       0     0     0
          spares
            c0t0d6    AVAIL
            c0t0d7    AVAIL
 
                   capacity     operations    bandwidth
   pool            used  avail   read  write   read  write
   ------------   -----  -----  -----  -----  -----  -----
   dpool          72.1G  3.55T    237     12  29.7M   597K
     raidz1       72.1G  3.55T    237      9  29.7M   469K
       c0t0d0         -      -    166      3  7.39M   157K
       c0t0d1         -      -    166      3  7.44M   157K
       c0t0d2         -      -    166      3  7.39M   157K
       c0t0d3         -      -    167      3  7.45M   157K
     c0t0d4s0        20K  4.97G      0      3      0   127K
   cache              -      -      -      -      -      -
     c0t0d4s1      17.6G  36.4G      3      1   249K   119K
   ------------   -----  -----  -----  -----  -----  -----
  I just don't seem to be getting any bang for the buck I should be.  This
 was taken while rebuilding an Oracle index, all files stored in this pool.
  The WD disks are at 100%, and nothing is coming from the cache.  The cache
 does have the entire DB cached (17.6G used), but hardly reads anything from
 it.  I also am not seeing the spike of data flowing into the ZIL either,
  although iostat shows there is just write traffic hitting the SSD:
 
  extended device statistics  cpu
  device      r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
  sd0       170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
  sd1       168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
  sd2       172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
  sd3         0.0   0.0     0.0    0.0   0.0   0.0    0.0   0   0
  sd4       170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
  sd5         1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31

 iostat has a -n option, which is very useful for looking at device names
 :-)

  The SSD here is performing well.  The rest are clobbered. A 205 millisecond
 response time will be agonizingly slow.

 By default, for this version of ZFS, up to 35 I/Os will be queued to the
 disk, which is why you see 35.0 in the actv column. The combination
  of actv=35 and svc_t > 200 indicates that this is the place to start working.
 Begin by reducing zfs_vdev_max_pending from 35 to something like 1 to 4.
 This will reduce the concurrent load on the disks, thus reducing svc_t.

 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

  -- richard

  

Re: [zfs-discuss] SSD and ZFS

2010-02-13 Thread Tracey Bernath
Thanks Brendan,
I was going to move it over to 8kb block size once I got through this index
rebuild. My thinking was that a disproportionate block size would show up as
excessive IO thruput, not a lack of thruput.

The question about the cache comes from the fact that the 18GB or so that it
says is in the cache IS the database. This was why I was thinking the index
rebuild should be CPU constrained, and I should see a spike in reading from
the cache.  If the entire file is cached, why would it go to the disks at
all for the reads?

The disks are delivering about 30MB/s of reads, but this SSD is rated for
sustained 70MB/s, so there should be a chance to pick up 100% gain.

I've seen lots of mention of kernel settings, but those only seem to apply
to cache flushes on sync writes.

Any idea on where to look next? I've spent about a week tinkering with
it. I'm trying to get a major customer to switch over to ZFS and an open
storage solution, but I'm afraid that if I can't get it to work at the small
scale, I can't convince them about the large scale.

Thanks,
Tracey


On Fri, Feb 12, 2010 at 4:43 PM, Brendan Gregg - Sun Microsystems 
bren...@sun.com wrote:

 On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
  I have a similar question. I put together a cheapo RAID with four 1TB WD
 Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
 slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
  # zpool status dpool
pool: dpool
   state: ONLINE
   scrub: none requested
  config:
 
          NAME        STATE     READ WRITE CKSUM
          dpool       ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              c0t0d0  ONLINE       0     0     0
              c0t0d1  ONLINE       0     0     0
              c0t0d2  ONLINE       0     0     0
              c0t0d3  ONLINE       0     0     0
          logs
            c0t0d4s0  ONLINE       0     0     0
          cache
            c0t0d4s1  ONLINE       0     0     0
          spares
            c0t0d6    AVAIL
            c0t0d7    AVAIL
 
                   capacity     operations    bandwidth
   pool            used  avail   read  write   read  write
   ------------   -----  -----  -----  -----  -----  -----
   dpool          72.1G  3.55T    237     12  29.7M   597K
     raidz1       72.1G  3.55T    237      9  29.7M   469K
       c0t0d0         -      -    166      3  7.39M   157K
       c0t0d1         -      -    166      3  7.44M   157K
       c0t0d2         -      -    166      3  7.39M   157K
       c0t0d3         -      -    167      3  7.45M   157K
     c0t0d4s0        20K  4.97G      0      3      0   127K
   cache              -      -      -      -      -      -
     c0t0d4s1      17.6G  36.4G      3      1   249K   119K
   ------------   -----  -----  -----  -----  -----  -----
  I just don't seem to be getting any bang for the buck I should be.  This
 was taken while rebuilding an Oracle index, all files stored in this pool.
  The WD disks are at 100%, and nothing is coming from the cache.  The cache
 does have the entire DB cached (17.6G used), but hardly reads anything from
 it.  I also am not seeing the spike of data flowing into the ZIL either,
  although iostat shows there is just write traffic hitting the SSD:
 
   extended device statistics  cpu
  device      r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
  sd0       170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
  sd1       168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
  sd2       172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
  sd3         0.0   0.0     0.0    0.0   0.0   0.0    0.0   0   0
  sd4       170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
  sd5         1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31
 
  Since this SSD is in a RAID array, and just presents as a regular disk
 LUN, is there a special incantation required to turn on the Turbo mode?
 
  Doesn't it seem that all this traffic should be maxing out the SSD? Reads
 from the cache, and writes to the ZIL? I have a second identical SSD I
 wanted to add as a mirror, but it seems pointless if there's no zip to be
 had...

 The most likely reason is that this workload has been identified as
 streaming by ZFS, which is prefetching from disk instead of the L2ARC
 (l2arc_noprefetch=1).

 It also looks like you've used a 128 Kbyte ZFS record size.  Is Oracle
 doing 128 Kbyte random I/O?  We usually tune that down before creating
 the database; which will use the L2ARC device more efficiently.

 Brendan

 --
 Brendan Gregg, Fishworks
 http://blogs.sun.com/brendan




-- 
Tracey Bernath
913-488-6284


Re: [zfs-discuss] SSD and ZFS

2010-02-13 Thread Richard Elling
comment below...

On Feb 12, 2010, at 2:25 PM, TMB wrote:
 I have a similar question. I put together a cheapo RAID with four 1TB WD 
 Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with 
 slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
 # zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: none requested
 config:
 
         NAME        STATE     READ WRITE CKSUM
         dpool       ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             c0t0d0  ONLINE       0     0     0
             c0t0d1  ONLINE       0     0     0
             c0t0d2  ONLINE       0     0     0
             c0t0d3  ONLINE       0     0     0
         logs
           c0t0d4s0  ONLINE       0     0     0
         cache
           c0t0d4s1  ONLINE       0     0     0
         spares
           c0t0d6    AVAIL
           c0t0d7    AVAIL
 
                  capacity     operations    bandwidth
  pool            used  avail   read  write   read  write
  ------------   -----  -----  -----  -----  -----  -----
  dpool          72.1G  3.55T    237     12  29.7M   597K
    raidz1       72.1G  3.55T    237      9  29.7M   469K
      c0t0d0         -      -    166      3  7.39M   157K
      c0t0d1         -      -    166      3  7.44M   157K
      c0t0d2         -      -    166      3  7.39M   157K
      c0t0d3         -      -    167      3  7.45M   157K
    c0t0d4s0        20K  4.97G      0      3      0   127K
  cache              -      -      -      -      -      -
    c0t0d4s1      17.6G  36.4G      3      1   249K   119K
  ------------   -----  -----  -----  -----  -----  -----
 I just don't seem to be getting any bang for the buck I should be.  This was 
 taken while rebuilding an Oracle index, all files stored in this pool.  The 
 WD disks are at 100%, and nothing is coming from the cache.  The cache does 
 have the entire DB cached (17.6G used), but hardly reads anything from it.  I 
 also am not seeing the spike of data flowing into the ZIL either, although 
 iostat shows there is just write traffic hitting the SSD:
 
 extended device statistics  cpu
 device      r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
 sd0       170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
 sd1       168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
 sd2       172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
 sd3         0.0   0.0     0.0    0.0   0.0   0.0    0.0   0   0
 sd4       170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
 sd5         1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31

iostat has a -n option, which is very useful for looking at device names :-)

The SSD here is performing well.  The rest are clobbered. A 205 millisecond
response time will be agonizingly slow.

By default, for this version of ZFS, up to 35 I/Os will be queued to the
disk, which is why you see 35.0 in the actv column. The combination
of actv=35 and svc_t > 200 indicates that this is the place to start working.
Begin by reducing zfs_vdev_max_pending from 35 to something like 1 to 4.
This will reduce the concurrent load on the disks, thus reducing svc_t.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
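
The guide above also shows the live form of that change; a sketch (the value
4 is just one point in the suggested 1-4 range, takes effect immediately,
and does not persist across a reboot):

   echo zfs_vdev_max_pending/W0t4 | mdb -kw    # set it to 4 on the running kernel
   echo zfs_vdev_max_pending/D | mdb -k        # read the current value back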

 -- richard

 Since this SSD is in a RAID array, and just presents as a regular disk LUN, 
 is there a special incantation required to turn on the Turbo mode?
 
 Doesn't it seem that all this traffic should be maxing out the SSD? Reads from 
 the cache, and writes to the ZIL? I have a second identical SSD I wanted to 
 add as a mirror, but it seems pointless if there's no zip to be had...
 
 help?
 
 Thanks,
 Tracey


[zfs-discuss] SSD and ZFS

2010-02-12 Thread Andreas Höschler

Hi all,

just after sending a message to sunmanagers I realized that my question 
should rather have gone here. So sunmanagers, please excuse the double 
post:


I have inherited an X4140 (8 SAS slots) and have just set up the system 
with Solaris 10 09. I first set up the system on a mirrored pool over 
the first two disks


  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

and then tried to add the second pair of disks to this pool, which did 
not work (the famous error message regarding the label, a root pool BIOS 
issue). I therefore simply created an additional pool, tank.


  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

 pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0

errors: No known data errors

So far so good. I have now replaced the last two SAS disks with 32GB 
SSDs and am wondering how to add these to the system. I googled a lot 
for best practices but found nothing so far that made me any wiser. My 
current approach still is to simply do


zpool add tank mirror c0t6d0 c0t7d0

as I would do with normal disks, but I am wondering whether that's the 
right approach to significantly increase system performance. Will ZFS 
automatically use these SSDs and optimize accesses to tank? Probably! 
But it won't optimize accesses to rpool of course. Not sure whether I 
need that or should look for that. Should I try to get all disks into 
rpool in spite of the BIOS label issue so that SSDs are used for all 
accesses to the disk system?


Hints (best practices) are greatly appreciated!

Thanks a lot,

 Andreas




Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread Scott Meilicke
I don't think adding an SSD mirror to an existing pool will do much for 
performance. Some of your data will surely go to those SSDs, but I don't think 
Solaris will know they are SSDs and move blocks in and out according to 
usage patterns to give you an all around boost. They will just be used to store 
data, nothing more.

Perhaps it will be more useful to add the SSDs as either an L2ARC or SLOG for 
the ZIL, but that will depend upon your workload. If you do NFS or iSCSI 
access, then putting the ZIL onto the SSD drive(s) will speed up writes. Adding 
them to the L2ARC will speed up reads.
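
For the two 32GB SSDs mentioned (c0t6d0 and c0t7d0), one possible split
would look something like the sketch below; note that an unmirrored slog is
a data-loss risk on pool versions of that era, so whether to dedicate both
SSDs to a mirrored log or split them as shown depends on the workload:

   zpool add tank log c0t6d0      # one SSD as a separate intent log (slog) for sync writes
   zpool add tank cache c0t7d0    # the other as L2ARC to absorb reads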

Here is the ZFS best practices guide, which should help with this decision:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Read that, then come back with more questions.

Best,
Scott


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread Brendan Gregg - Sun Microsystems
On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
 I have a similar question. I put together a cheapo RAID with four 1TB WD 
 Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with 
 slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
 # zpool status dpool
   pool: dpool
  state: ONLINE
  scrub: none requested
 config:
 
         NAME        STATE     READ WRITE CKSUM
         dpool       ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             c0t0d0  ONLINE       0     0     0
             c0t0d1  ONLINE       0     0     0
             c0t0d2  ONLINE       0     0     0
             c0t0d3  ONLINE       0     0     0
         logs
           c0t0d4s0  ONLINE       0     0     0
         cache
           c0t0d4s1  ONLINE       0     0     0
         spares
           c0t0d6    AVAIL
           c0t0d7    AVAIL
 
                  capacity     operations    bandwidth
  pool            used  avail   read  write   read  write
  ------------   -----  -----  -----  -----  -----  -----
  dpool          72.1G  3.55T    237     12  29.7M   597K
    raidz1       72.1G  3.55T    237      9  29.7M   469K
      c0t0d0         -      -    166      3  7.39M   157K
      c0t0d1         -      -    166      3  7.44M   157K
      c0t0d2         -      -    166      3  7.39M   157K
      c0t0d3         -      -    167      3  7.45M   157K
    c0t0d4s0        20K  4.97G      0      3      0   127K
  cache              -      -      -      -      -      -
    c0t0d4s1      17.6G  36.4G      3      1   249K   119K
  ------------   -----  -----  -----  -----  -----  -----
 I just don't seem to be getting any bang for the buck I should be.  This was 
 taken while rebuilding an Oracle index, all files stored in this pool.  The 
 WD disks are at 100%, and nothing is coming from the cache.  The cache does 
 have the entire DB cached (17.6G used), but hardly reads anything from it.  I 
 also am not seeing the spike of data flowing into the ZIL either, although 
 iostat shows there is just write traffic hitting the SSD:
 
  extended device statistics  cpu
 device      r/s   w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
 sd0       170.0   0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
 sd1       168.4   0.4  7680.2    0.0   0.0  34.6  205.1   0 100
 sd2       172.0   0.4  7761.7    0.0   0.0  35.0  202.9   0 100
 sd3         0.0   0.0     0.0    0.0   0.0   0.0    0.0   0   0
 sd4       170.0   0.4  7727.1    0.0   0.0  35.0  205.3   0 100
 sd5         1.6   2.6   182.4  104.8   0.0   0.5  117.8   0  31
 
 Since this SSD is in a RAID array, and just presents as a regular disk LUN, 
 is there a special incantation required to turn on the Turbo mode?
 
 Doesn't it seem that all this traffic should be maxing out the SSD? Reads from 
 the cache, and writes to the ZIL? I have a second identical SSD I wanted to 
 add as a mirror, but it seems pointless if there's no zip to be had...

The most likely reason is that this workload has been identified as streaming
by ZFS, which is prefetching from disk instead of the L2ARC (l2arc_noprefetch=1).

It also looks like you've used a 128 Kbyte ZFS record size.  Is Oracle doing
128 Kbyte random I/O?  We usually tune that down before creating the database;
which will use the L2ARC device more efficiently.
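
A sketch of that tuning, assuming an 8 Kbyte recordsize to match a typical
Oracle db_block_size; the property only applies to files written after it
is set, so existing datafiles need to be recreated or copied for it to take
effect:

   zfs set recordsize=8k dpool     # or on a dedicated dataset holding the datafiles
   zfs get recordsize dpool        # verify the setting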

Brendan

-- 
Brendan Gregg, Fishworks   http://blogs.sun.com/brendan