[zfs-discuss] Resilver / Scrub Status

2011-06-07 Thread Paul Kraus
I am running zpool 22 (Solaris 10U9) and I am looking for a way to
determine how much more work has to be done to complete a resilver
operation (it is already at 100%, but I know that is not a really
accurate number).

From my understanding of how the resilver operation works, it
walks the metadata structure and then the transaction groups. So if
there is no write (or snapshot or clone or ...) activity, once it
completes the walk of the metadata it is done (I assume the % complete
number is based on this). If there is write activity, it then replays
the TXGs that came in after the resilver started. I have two zpools
that are resilvering while also taking write activity. I know data is
still being committed to the devices being resilvered, but I am
looking for a way to determine how close they are to being done.

So is there a kernel structure I can look at (with kstat or mdb)
that will tell me how many TXGs remain to be written to complete the
resilver? I know this will be a dynamic number, but it will help in
determining whether we should idle the replication job (in one of our
two cases) and catch up later (the replication happens over a WAN
link, so it is not very fast, 3 MB/sec maybe) or just wait it out.

I'll be honest, I am nervous with a raidz2 vdev not at full
strength, and I am looking for some comfort :-)
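
Something like the following is the sort of poking around I have in
mind (the pool name here is made up, and the mdb member names are from
memory, so treat this as a sketch rather than a recipe):

# what ZFS itself reports (percent done and elapsed time)
zpool status tank | grep -i resilver

# a rough feel for how far the pool has moved on since the resilver
# started: dump the currently synced TXG for each imported pool
echo "::walk spa | ::print spa_t spa_name spa_ubsync.ub_txg" | mdb -k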

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Separate Log Devices

2011-06-07 Thread Karl Rossing
The server I currently have only has 2GB of RAM. At some point I will 
be adding more RAM to the server, but I'm not sure when.


I want to add a mirrored ZIL. I have 2 Intel 32GB SSDSA2SH032G1GN drives.

As such, I have been reading the ZFS Best Practices Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

The guide suggests that the ZIL be sized to 1/2 the amount of RAM in the 
server, which would be 1GB.


I have a couple of questions:

What happens if I oversize the ZIL?
If I create a 1GB slice for the ZIL, can I add another slice for another 
ZIL in the future when more RAM is added?
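
For reference, the command I have in mind for the initial setup is along
these lines (the device/slice names are just placeholders for my two SSDs):

zpool add tank log mirror c1t4d0s0 c1t5d0s0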


Thanks
Karl








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] L2ARC and poor read performance

2011-06-07 Thread Phil Harman

Ok here's the thing ...

A customer has some big tier 1 storage, and has presented 24 LUNs (from 
four RAID6 groups) to an OI148 box which is acting as a kind of iSCSI/FC 
bridge (using some of the cool features of ZFS along the way). The OI 
box currently has 32GB configured for the ARC, and 4x 223GB SSDs for 
L2ARC. It has a dual port QLogic HBA, and is currently configured to do 
round-robin MPXIO over two 4Gbps links. The iSCSI traffic is over a dual 
10Gbps card (rather like the one Sun used to sell).


I've just built a fresh pool, and have created 20x 100GB zvols which are 
mapped to iSCSI clients. I have initialised the first 20GB of each zvol 
with random data. I've had a lot of success with write performance (e.g. 
in earlier tests I had 20 parallel streams writing 100GB each at over 
600MB/sec aggregate), but read performance is very poor.
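
For context, the zvols were created along these lines (the names are
illustrative, and I've left out the COMSTAR target/view plumbing):

zfs create -V 100G tank/vol01
sbdadm create-lu /dev/zvol/rdsk/tank/vol01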


Right now I'm just playing with 20 parallel streams of reads from the 
first 2GB of each zvol (i.e. 40GB in all). During each run, I see lots 
of writes to the L2ARC, but less than a quarter the volume of reads. Yet 
my FC LUNS are hot with 1000s of reads per second. This doesn't change 
from run to run. Why?


Surely 20x 2GB of data (and its associated metadata) will sit nicely in 
4x 223GB SSDs?
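
For what it's worth, the L2ARC counters I'm referring to can be pulled
straight from the arcstats kstat, e.g.:

kstat -p zfs:0:arcstats | egrep 'l2_(size|hits|misses|read_bytes|write_bytes)'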


Phil
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate Log Devices

2011-06-07 Thread Christopher George
 The guide suggests that the zil be sized to 1/2 the amount of ram in the 
 server which would be 1GB.

The ZFS Best Practices Guide does give the absolute maximum size
the ZIL can grow to in theory, which, as you stated, is 1/2 the size
of the host's physical memory.  But in practice, the very next bullet 
point gives the log device sizing equation, which we have found to 
be a more relevant indicator.  Excerpt below:

For a target throughput of X MB/sec and given that ZFS pushes 
transaction groups every 5 seconds (and have 2 outstanding), we also 
expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service 
100MB/sec of synchronous writes, 1 GB of log device should be 
sufficient.
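
Put as plain arithmetic:

  log device size  ~=  target sync write rate (MB/sec) * 10 sec
  e.g. 100 MB/sec * 10 sec  =  ~1 GB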

 What happens if I oversize the zil?

Oversizing the log device capacity has no negative repercussions other
than the underutilization of your SSD.

 If I create a 1GB slice for the zil, can I add another slice for another 
 zil in the future when more ram is added?

If the question is whether multiple disk slices can be striped to 
aggregate capacity, then the answer is yes.  Be aware that with most 
SSDs, including the Intel X25-E, using a disk slice instead of the 
entire device will automatically disable the on-board write cache.
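
If you do go that route later, adding a second mirrored log vdev is just
another zpool add; ZFS stripes across the log vdevs, which is what
aggregates the capacity (device/slice names below are placeholders):

zpool add tank log mirror c1t4d0s1 c1t5d0s1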

Christopher George
Founder / CTO
http://www.ddrdrive.com/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-07 Thread Marty Scholes
I'll throw out some (possibly bad) ideas.

Is ARC satisfying the caching needs?  32 GB for ARC should almost cover the 
40GB of total reads, suggesting that the L2ARC doesn't add any value for this 
test.

Are the SSD devices saturated from an I/O standpoint?  Put another way, can ZFS 
put data to them fast enough?  If they aren't taking writes fast enough, then 
maybe they can't effectively load for caching.  Certainly if they are saturated 
for writes they can't do much for reads.

Are some of the reads sequential?  Sequential reads don't go to L2ARC.

What does iostat say for the SSD units?  What does arc_summary.pl (maybe 
spelled differently) say about the ARC / L2ARC usage?  How much of the SSD 
units are in use as reported in zpool iostat -v?
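
(The sort of commands I mean, with the pool name as a placeholder:)

iostat -xn 5               # per-device throughput and %b for the SSDs
zpool iostat -v tank 5     # per-vdev activity, including cache devices
kstat -p zfs:0:arcstats    # raw ARC / L2ARC counters
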
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-07 Thread Phil Harman

On 07/06/2011 20:34, Marty Scholes wrote:

I'll throw out some (possibly bad) ideas.


Thanks for taking the time.


Is ARC satisfying the caching needs?  32 GB for ARC should almost cover the 
40GB of total reads, suggesting that the L2ARC doesn't add any value for this 
test.

Are the SSD devices saturated from an I/O standpoint?  Put another way, can ZFS 
put data to them fast enough?  If they aren't taking writes fast enough, then 
maybe they can't effectively load for caching.  Certainly if they are saturated 
for writes they can't do much for reads.


The SSDs are barely ticking over, and can deliver almost as much 
throughput as the current SAN storage.



Are some of the reads sequential?  Sequential reads don't go to L2ARC.


That'll be it. I assume the L2ARC is just taking metadata. In situations 
such as mine, I would quite like the option of routing sequential read 
data to the L2ARC also.
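
As far as I know (and this is from memory, so verify before relying on it),
the only supported knob is the per-dataset secondarycache property, which
merely selects all / none / metadata:

zfs get secondarycache tank/vol01

The skipping of streaming reads is governed by the unofficial
l2arc_noprefetch tunable, which can be experimented with via /etc/system:

set zfs:l2arc_noprefetch = 0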


I do notice a benefit with a sequential update (i.e. COW for each 
block), and I think this is because the L2ARC satisfies most of the 
metadata reads instead of having to read them from the SAN.



What does iostat say for the SSD units?  What does arc_summary.pl (maybe 
spelled differently) say about the ARC / L2ARC usage?  How much of the SSD 
units are in use as reported in zpool iostat -v?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-07 Thread LaoTsao
You have an unbalanced setup:
FC 4Gbps vs 10Gbps NIC.
After 8b/10b encoding it is even worse, but this does not yet impact your 
benchmark.

Sent from my iPad
Hung-Sheng Tsao ( LaoTsao) Ph.D

On Jun 7, 2011, at 5:46 PM, Phil Harman phil.har...@gmail.com wrote:

 On 07/06/2011 20:34, Marty Scholes wrote:
 I'll throw out some (possibly bad) ideas.
 
 Thanks for taking the time.
 
 Is ARC satisfying the caching needs?  32 GB for ARC should almost cover the 
 40GB of total reads, suggesting that the L2ARC doesn't add any value for 
 this test.
 
 Are the SSD devices saturated from an I/O standpoint?  Put another way, can 
 ZFS put data to them fast enough?  If they aren't taking writes fast enough, 
 then maybe they can't effectively load for caching.  Certainly if they are 
 saturated for writes they can't do much for reads.
 
 The SSDs are barely ticking over, and can deliver almost as much throughput 
 as the current SAN storage.
 
 Are some of the reads sequential?  Sequential reads don't go to L2ARC.
 
 That'll be it. I assume the L2ARC is just taking metadata. In situations such 
 as mine, I would quite like the option of routing sequential read data to the 
 L2ARC also.
 
 I do notice a benefit with a sequential update (i.e. COW for each block), and 
 I think this is because the L2ARC satisfies most of the metadata reads 
 instead of having to read them from the SAN.
 
 What does iostat say for the SSD units?  What does arc_summary.pl (maybe 
 spelled differently) say about the ARC / L2ARC usage?  How much of the SSD 
 units are in use as reported in zpool iostat -v?
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC and poor read performance

2011-06-07 Thread Phil Harman

On 07/06/2011 22:57, LaoTsao wrote:

You have an unbalanced setup:
FC 4Gbps vs 10Gbps NIC.


It's actually 2x 4Gbps (using MPXIO) vs 1x 10Gbps.


After 8b/10b encoding it is even worse, but this does not yet impact your 
benchmark.

Sent from my iPad
Hung-Sheng Tsao ( LaoTsao) Ph.D

On Jun 7, 2011, at 5:46 PM, Phil Harman phil.har...@gmail.com wrote:


On 07/06/2011 20:34, Marty Scholes wrote:

I'll throw out some (possibly bad) ideas.

Thanks for taking the time.


Is ARC satisfying the caching needs?  32 GB for ARC should almost cover the 
40GB of total reads, suggesting that the L2ARC doesn't add any value for this 
test.

Are the SSD devices saturated from an I/O standpoint?  Put another way, can ZFS 
put data to them fast enough?  If they aren't taking writes fast enough, then 
maybe they can't effectively load for caching.  Certainly if they are saturated 
for writes they can't do much for reads.

The SSDs are barely ticking over, and can deliver almost as much throughput as 
the current SAN storage.


Are some of the reads sequential?  Sequential reads don't go to L2ARC.

That'll be it. I assume the L2ARC is just taking metadata. In situations such 
as mine, I would quite like the option of routing sequential read data to the 
L2ARC also.

I do notice a benefit with a sequential update (i.e. COW for each block), and I 
think this is because the L2ARC satisfies most of the metadata reads instead of 
having to read them from the SAN.


What does iostat say for the SSD units?  What does arc_summary.pl (maybe 
spelled differently) say about the ARC / L2ARC usage?  How much of the SSD 
units are in use as reported in zpool iostat -v?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Weird write performance problem

2011-06-07 Thread Ding Honghui

Hi,

I have run into a weird write performance problem and need your help.

One day, the write performance of ZFS degraded: sequential write
throughput dropped from 60MB/s to about 6MB/s.

Command:
date;dd if=/dev/zero of=block bs=1024*128 count=1;date

The hardware configuration is 1 Dell MD3000 and 1 MD1000 with 30 disks.
The OS is Solaris 10U8, zpool version 15 and zfs version 4.

I ran DTrace to trace the write latency:

fbt:zfs:zfs_write:entry
{
        self->ts = timestamp;
}

fbt:zfs:zfs_write:return
/self->ts/
{
        @time = quantize(timestamp - self->ts);
        self->ts = 0;
}
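
The script is run with something like the following (the file name is
made up); the aggregation prints when the script is interrupted with
Ctrl-C:

dtrace -s zfs_write_time.d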

It shows
           value  ------------- Distribution ------------- count
8192 | 0
   16384 | 16
   32768 | 3270
   65536 |@@@  898
  131072 |@@@  985
  262144 | 33
  524288 | 1
 1048576 | 1
 2097152 | 3
 4194304 | 0
 8388608 |@180
16777216 | 33
33554432 | 0
67108864 | 0
   134217728 | 0
   268435456 | 1
   536870912 | 1
  1073741824 | 2
  2147483648 | 0
  4294967296 | 0
  8589934592 | 0
 17179869184 | 2
 34359738368 | 3
 68719476736 | 0

Compared to a storage system that is working well (a single MD3000), where 
the maximum zfs_write time falls in the 4294967296 ns (~4.3 s) bucket rather 
than ~34 s as above, this pool is roughly 10 times slower.


Any suggestions?

Thanks
Ding

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Weird write performance problem

2011-06-07 Thread Ding Honghui

One more comment:
When we run the write test (the dd command), heavy read activity appears 
on each disk, increasing from zero to about 3M per disk, while the write 
bandwidth stays poor. The disk I/O %b rises from 0 to about 60.

I don't understand why this happens.

                                         capacity     operations    bandwidth
pool                                   used  avail   read  write   read  write
-------------------------------------  -----  -----  -----  -----  -----  -----
datapool                               19.8T  5.48T    543     47  1.74M  5.89M
  raidz1                               5.64T   687G    146     13   480K  1.66M
    c3t600221900085486703B2490FB009d0      -      -     49     13  3.26M   293K
    c3t600221900085486703B4490FB063d0      -      -     48     13  3.19M   296K
    c3t6002219000852889055F4CB79C10d0      -      -     48     13  3.19M   293K
    c3t600221900085486703B8490FB0FFd0      -      -     50     13  3.28M   284K
    c3t600221900085486703BA490FB14Fd0      -      -     50     13  3.31M   287K
    c3t6002219000852889041C490FAFA0d0      -      -     49     14  3.27M   297K
    c3t600221900085486703C0490FB27Dd0      -      -     48     14  3.24M   300K
  raidz1                               5.73T   594G    102      7   337K   996K
    c3t600221900085486703C2490FB2BFd0      -      -     52      5  3.59M   166K
    c3t6002219000852889041F490FAFD0d0      -      -     54      5  3.72M   166K
    c3t60022190008528890428490FB0D8d0      -      -     55      5  3.79M   166K
    c3t60022190008528890422490FB02Cd0      -      -     52      5  3.57M   166K
    c3t60022190008528890425490FB07Cd0      -      -     53      5  3.64M   166K
    c3t60022190008528890434490FB24Ed0      -      -     55      5  3.76M   166K
    c3t6002219000852889043949100968d0      -      -     55      5  3.83M   166K
  raidz1                               5.81T   519G    117     10   388K  1.26M
    c3t6002219000852889056B4CB79D66d0      -      -     46      9  3.09M   215K
    c3t600221900085486704B94CB79F91d0      -      -     44      9  2.91M   215K
    c3t600221900085486704BB4CB79FE1d0      -      -     44      9  2.97M   224K
    c3t600221900085486704BD4CB7A035d0      -      -     44      9  2.96M   215K
    c3t600221900085486704BF4CB7A0ABd0      -      -     44      9  2.97M   216K
    c3t6002219000852889055C4CB79BB8d0      -      -     45      9  3.04M   215K
    c3t600221900085486704C14CB7A0FDd0      -      -     46      9  3.02M   215K
  raidz1                               2.59T  3.72T    176     16   581K  2.00M
    c3t6002219000852889042B490FB124d0      -      -     48      5  3.21M   342K
    c3t600221900085486704C54CB7A199d0      -      -     46      5  2.99M   342K
    c3t600221900085486704C74CB7A1D5d0      -      -     49      5  3.27M   342K
    c3t600221900085288905594CB79B64d0      -      -     46      6  3.00M   342K
    c3t600221900085288905624CB79C86d0      -      -     47      6  3.11M   342K
    c3t600221900085288905654CB79CCCd0      -      -     50      6  3.29M   342K
    c3t600221900085288905684CB79D1Ed0      -      -     45      5  2.98M   342K
  c3t6B8AC6FF837605864DC9E9F1d0            4K   928G      0      0      0      0
-------------------------------------  -----  -----  -----  -----  -----  -----


^C
root@nas-hz-01:~#


On 06/08/2011 11:07 AM, Ding Honghui wrote:

Hi,

I have run into a weird write performance problem and need your help.

One day, the write performance of ZFS degraded: sequential write
throughput dropped from 60MB/s to about 6MB/s.


Command:
date;dd if=/dev/zero of=block bs=1024*128 count=1;date

The hardware configuration is 1 Dell MD3000 and 1 MD1000 with 30 disks.
The OS is Solaris 10U8, zpool version 15 and zfs version 4.

I ran DTrace to trace the write latency:

fbt:zfs:zfs_write:entry
{
        self->ts = timestamp;
}

fbt:zfs:zfs_write:return
/self->ts/
{
        @time = quantize(timestamp - self->ts);
        self->ts = 0;
}

It shows
           value  ------------- Distribution ------------- count
8192 | 0
   16384 | 16
   32768 | 3270
   65536 |@@@  898
  131072 |@@@  985
  262144 | 33
  524288 | 1
 1048576 | 1
 2097152 | 3
 4194304 | 0
 8388608 |@180
16777216 | 33
33554432 | 0

Re: [zfs-discuss] Weird write performance problem

2011-06-07 Thread Donald Stahl
 One day, the write performance of ZFS degraded: sequential write
 throughput dropped from 60MB/s to about 6MB/s.

 Command:
 date;dd if=/dev/zero of=block bs=1024*128 count=1;date

See this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=139317&tstart=45

And search in the page for:
metaslab_min_alloc_size

Try adjusting metaslab_min_alloc_size and see if it fixes your performance
problem.
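
From memory (so please double-check against that thread before touching a
production pool), the current value can be read, and then changed on the
fly, with mdb; I believe the default is 10MB, and the commonly suggested
experiment was to drop it to 4KB (0x1000):

echo "metaslab_min_alloc_size/J" | mdb -k
echo "metaslab_min_alloc_size/Z 1000" | mdb -kw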

-Don
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss