Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-25 Thread Wojciech Puchar


  here is my real-world production example of users' mail as well as documents.


  /dev/mirror/home1.eli      2788 1545  1243    55% 1941057 20981181    8%  /home


Not the same data, I imagine.


A mix: about 90% mailboxes and user data (documents, pictures); the rest is some 
.tar.gz backups.


At other sites I have a similar situation: one or more gmirror sets, 1-3 TB 
each, depending on the drives.


For those who host thousands of mailboxes I recommend Dovecot with the mdbox 
storage backend.



  I was dealing with the actual byte counts ... that figure is going to be in whole blocks.



Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Adam Nowacki

On 2013-01-23 21:22, Wojciech Puchar wrote:

While RAID-Z is already a king of bad performance,


I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
measurements to back up your claim?


it is clearly described even in ZFS papers. Both on reads and writes it
gives single drive random I/O performance.


With ZFS and RAID-Z the situation is a bit more complex.

Let's assume a 5-disk raidz1 vdev with ashift=9 (512-byte sectors).

A worst-case scenario occurs when the random I/O workload reads random files 
of 2048 bytes each. Each file read then needs data from 4 disks (the 5th holds 
parity and won't be read unless there are errors). However, if the files were 
512 bytes or less, only one disk would be used; 1024 bytes - two disks, and so on.
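As a rough sketch of that arithmetic (a hypothetical sh snippet, only 
illustrating the 4-data-disk raidz1 case above; the file size is a placeholder):

  #!/bin/sh
  # Data disks touched by one whole-file read on the 5-disk raidz1 above
  # (4 data disks + 1 parity, ashift=9).  Purely illustrative.
  filesize=2048
  datadisks=4
  sectors=$(( (filesize + 511) / 512 ))   # 512-byte data sectors needed
  if [ "$sectors" -lt "$datadisks" ]; then
      touched=$sectors
  else
      touched=$datadisks
  fi
  echo "a ${filesize}-byte file read touches ${touched} data disk(s)"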


So ZFS is probably not the best choice to store millions of small files 
if random access to whole files is the primary concern.


But let's look at a different scenario - a PostgreSQL database. Here table 
data is split and stored in 1 GB files. ZFS splits each file into 128 KiB 
records (the recordsize property). Each record is then split again into 4 
columns of 32768 bytes each, and a 5th column is generated containing parity. 
Each column is then stored on a different disk. You can think of it as a 
regular RAID-5 with a stripe size of 32768 bytes.


PostgreSQL uses 8192-byte pages that fit evenly into both the ZFS record size 
and the column size. Each page access requires only a single disk read, so 
random I/O performance here should be about 5 times that of a single disk.
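As a back-of-the-envelope sketch of those numbers (values assumed from the 
description above, nothing measured):

  # 128 KiB record split across 4 data disks, read in 8 KiB PostgreSQL pages.
  recordsize=131072   # ZFS recordsize
  datadisks=4         # 5-disk raidz1 minus parity
  pagesize=8192       # PostgreSQL page
  echo "column (per-disk stripe) size: $(( recordsize / datadisks )) bytes"     # 32768
  echo "pages per column             : $(( recordsize / datadisks / pagesize ))" # 4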


For me the reliability ZFS offers is far more important than pure 
performance.



Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Wojciech Puchar
then stored on a different disk. You could think of it as a regular RAID-5 
with stripe size of 32768 bytes.


PostgreSQL uses 8192 byte pages that fit evenly both into ZFS record size and 
column size. Each page access requires only a single disk read. Random i/o 
performance here should be 5 times that of a single disk.


Think about writing 8192-byte pages randomly, and then doing a sequential scan 
over the table.




For me the reliability ZFS offers is far more important than pure 
performance.

Except it is on-paper reliability.


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Zaphod Beeblebrox
Wow! OK. It sounds like you (or someone like you) can answer some of my
burning questions about ZFS.

On Thu, Jan 24, 2013 at 8:12 AM, Adam Nowacki nowa...@platinum.linux.pl wrote:


 Lets assume 5 disk raidz1 vdev with ashift=9 (512 byte sectors).

 A worst case scenario could happen if your random i/o workload was reading
 random files each of 2048 bytes. Each file read would require data from 4
 disks (5th is parity and won't be read unless there are errors). However if
 files were 512 bytes or less then only one disk would be used. 1024 bytes -
 two disks, etc.

 So ZFS is probably not the best choice to store millions of small files if
 random access to whole files is the primary concern.

 But lets look at a different scenario - a PostgreSQL database. Here table
 data is split and stored in 1GB files. ZFS splits the file into 128KiB
 records (recordsize property). This record is then again split into 4
 columns each 32768 bytes. 5th column is generated containing parity. Each
 column is then stored on a different disk. You could think of it as a
 regular RAID-5 with stripe size of 32768 bytes.


OK... so my question then would be: what about small files? If I write
several small files at once, does the transaction use one record, or does
each file need its own record? Additionally, if small files use
sub-records, when you delete such a file, does the sub-record get moved or
just wasted (until the record is completely free)?

I'm considering the difference, say, between Cyrus IMAP (one file per
message on ZFS, database files on a different ZFS filesystem) and DBMail IMAP
(PostgreSQL on ZFS).

... now I realize that PostgreSQL on ZFS has some special issues (but I
don't have a choice here between ZFS and non-ZFS ... ZFS has already been
chosen), but I'm also figuring that PostgreSQL on ZFS has some waste
compared to Cyrus IMAP on ZFS.

So far in my research, Cyrus makes some compelling arguments that the
common use case for most IMAP database files is a full scan --- for which its
database files are optimized and SQL-based storage is not. I agree that
some operations can be more efficient in a good SQL database, but a full scan
(as the most frequently used query) is not.

Cyrus also makes sense to me as a collection of small files ... at which I
expect ZFS to excel ... including the ability to snapshot with impunity ...
but I am terribly curious how the files are handled in transactions.

I'm actually (right now) running some filesize statistics (and I'll get
back to the list, if asked), but I'd like to know how ZFS is going to store
the arriving mail... :).


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Wojciech Puchar

several small files at once, does the transaction use a record, or does
each file need to use a record?  Additionally, if small files use
sub-records, when you delete that file, does the sub-record get moved or
just wasted (until the record is completely free)?


Writes of small files always perform well with ZFS.



Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Adam Nowacki

On 2013-01-24 15:24, Wojciech Puchar wrote:

For me the reliability ZFS offers is far more important than pure
performance.

Except it is on paper reliability.


This on-paper reliability saved a 20 TB pool in practice - see one of my 
previous emails. Any other filesystem, or hardware/software RAID without 
per-disk checksums, would have failed. Silent corruption of unimportant 
files would have been the best case; complete filesystem death from corrupted 
metadata, the worst.


I've been using ZFS for 3 years on many systems. The biggest one has 44 
disks and 4 ZFS pools - it has survived SAS expander disconnects, a 
few kernel panics and countless power failures (the UPS only holds for a few 
hours).


So far I've not lost a single ZFS pool or any data stored.



Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Adam Nowacki

On 2013-01-24 15:45, Zaphod Beeblebrox wrote:

Ok... so my question then would be... what of the small files.  If I write
several small files at once, does the transaction use a record, or does
each file need to use a record?  Additionally, if small files use
sub-records, when you delete that file, does the sub-record get moved or
just wasted (until the record is completely free)?


Each file is a fully self-contained object (together with full parity) 
all the way down to physical storage. A 1-byte file on a RAID-Z2 pool will 
always use 3 disks - 3 sectors in total for the data alone. You can use du to 
verify this: it reports physical size, parity included. Metadata such as the 
directory entry or file attributes is stored separately and shared with 
other files. For small files there may be a lot of wasted space.
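A quick way to see this for yourself - a sketch assuming FreeBSD du(1) with 
-A (apparent size) and a placeholder path on a raidz2 dataset:

  printf 'x' > /tank/tiny   # a 1-byte file; /tank/tiny is a placeholder
  du -Ah /tank/tiny         # apparent size of the file's contents
  du -h  /tank/tiny         # physical space charged, parity included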




Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Zaphod Beeblebrox
Ok... here's the existing data:

There are 3,236,316 files summing to 97,500,008,691 bytes.  That puts the
average file at 30,127 bytes.  But for the full breakdown:

512 : 7758
1024 : 139046
2048 : 1468904
4096 : 325375
8192 : 492399
16384 : 324728
32768 : 263210
65536 : 102407
131072 : 43046
262144 : 22259
524288 : 17136
1048576 : 13788
2097152 : 8279
4194304 : 4501
8388608 : 2317
16777216 : 1045
33554432 : 119
67108864 : 2

I produced that list with the output of ls -R's byte counts, sorted and
then processed with:

(size=512; count=0; while read num; do
  count=$((count+1))
  if [ "$num" -gt "$size" ]; then echo "$size : $count"; size=$((size*2)); count=0; fi
done) < imapfilesizelist

... now the new machine has two 2T disks in a ZFS mirror --- so I suppose
it won't waste as much space as RAID-Z would --- in that files smaller than
512 bytes will still take 512 bytes? By far the most common case is 2048 bytes
... so that would indicate that a RAID-Z wider than 5 disks would waste
much space.

Does that go to your recommendations on vdev size, then? To have an 8- or
9-disk vdev, you should be storing files of at least 4k?


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Wojciech Puchar

So far I've not lost a single ZFS pool or any data stored.

So far my house hasn't been robbed.


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Wojciech Puchar

There are 3,236,316 files summing to 97,500,008,691 bytes.  That puts the
average file at 30,127 bytes.  But for the full breakdown:


Quite low. What do you store?

Here is my real-world production example of users' mail as well as documents:



/dev/mirror/home1.eli  2788 1545  1243  55%  1941057 20981181  8%  /home




Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Zaphod Beeblebrox
On Thu, Jan 24, 2013 at 2:26 PM, Wojciech Puchar 
woj...@wojtek.tensor.gdynia.pl wrote:

 There are 3,236,316 files summing to 97,500,008,691 bytes.  That puts the
 average file at 30,127 bytes.  But for the full breakdown:


 quite low. what do you store.


Apparently you're not really following this thread... just trolling? I had
said that it was Cyrus IMAP data (which, for reference, is one file per
email message).


 here is my real world production example of users mail as well as
 documents.


 /dev/mirror/home1.eli  2788 1545  1243  55%  1941057 20981181  8%  /home


Not the same data, I imagine.  I was dealing with the actual byte counts
... that figure is going to be in whole blocks.


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-24 Thread Nikolay Denev

On Jan 24, 2013, at 4:24 PM, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl 
wrote:
 
 Except it is on paper reliability.

This on-paper reliability has saved my ass numerous times.
For example, I had a home NAS machine with a flaky SATA controller that from 
time to time would fail to detect one of the four drives on reboot.
This left my pool degraded several times, and even rebooting from a state where, 
say, disk4 had failed into one where disk3 was failed did not corrupt any data.
I don't think this is possible with any other open-source filesystem, let alone 
hardware RAID, which would drop the whole array because of this.
I have never personally lost any data on ZFS. Yes, performance is another topic, 
and you must know what you are doing and what your usage pattern is, but from a 
reliability standpoint ZFS looks more durable to me than anything else.

P.S.: My home NAS has been running FreeBSD-CURRENT with ZFS since the first 
version available. Several drives have died, and twice the pool was expanded 
by replacing all drives one by one and resilvering - not a single byte lost.




Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

While RAID-Z is already a king of bad performance,


I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
measurements to back up your claim?


it is clearly described even in ZFS papers. Both on reads and writes it 
gives single drive random I/O performance.



Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

This is because RAID-Z spreads each block out over all disks, whereas RAID5
(as it is typically configured) puts each block on only one disk.  So to
read a block from RAID-Z, all data disks must be involved, vs. for RAID5
only one disk needs to have its head moved.

For other workloads (especially streaming reads/writes), there is no
fundamental difference, though of course implementation quality may vary.

A streaming workload is generally fine; random I/O is what matters.




Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Chris Rees
On 23 Jan 2013 20:23, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl
wrote:

 While RAID-Z is already a king of bad performance,


 I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
 measurements to back up your claim?


 it is clearly described even in ZFS papers. Both on reads and writes it
gives single drive random I/O performance.

So we have to take your word for it?

Provide a link if you're going to make assertions, or they're no more than
your own opinion.

Chris


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Mark Felder

On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:



So we have to take your word for it?
Provide a link if you're going to make assertions, or they're no more  
than

your own opinion.


I've heard this same thing -- every vdev == 1 drive in performance. I've  
never seen any proof/papers on it though.



Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Artem Belevich
On Wed, Jan 23, 2013 at 12:22 PM, Wojciech Puchar
woj...@wojtek.tensor.gdynia.pl wrote:
 While RAID-Z is already a king of bad performance,


 I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
 measurements to back up your claim?


 it is clearly described even in ZFS papers. Both on reads and writes it
 gives single drive random I/O performance.

For reads - true. For writes it probably behaves better than RAID5,
as it does not have to go through read-modify-write for partial block
updates. Search for "RAID-5 write hole".
If you need higher performance, build your pool out of multiple RAID-Z vdevs.
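For example (a sketch only - pool and device names are placeholders):

  # One pool built from two raidz1 vdevs instead of a single wide vdev;
  # random IOPS scale roughly with the number of vdevs.
  zpool create tank raidz da0 da1 da2 da3 da4 raidz da5 da6 da7 da8 da9
  zpool status tank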


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Artem Belevich
On Wed, Jan 23, 2013 at 1:09 PM, Mark Felder f...@feld.me wrote:
 On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:


 So we have to take your word for it?
 Provide a link if you're going to make assertions, or they're no more than
 your own opinion.


 I've heard this same thing -- every vdev == 1 drive in performance. I've
 never seen any proof/papers on it though.

"1 drive in performance" only applies to the number of random I/O
operations a vdev can perform; you still get increased throughput. I.e.
a 5-drive RAIDZ will have 4x the bandwidth of an individual disk in the vdev,
but will deliver only as many IOPS as the slowest drive, since each record
has to be read back from N-1 or N-2 drives in the vdev. It's the same for
RAID5. IMHO, for identical record/block sizes RAID5 has no advantage
over RAID-Z for reads and has a disadvantage when it comes to
small writes. Never mind the lack of data-integrity checks and the other
bells and whistles ZFS provides.
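A quick back-of-the-envelope illustration of that rule (all numbers made up):

  vdevs=2; disks_per_vdev=5; disk_iops=200; disk_mbs=150
  echo "random read IOPS ~ $(( vdevs * disk_iops ))"                       # one disk's worth per vdev
  echo "streaming MB/s   ~ $(( vdevs * (disks_per_vdev - 1) * disk_mbs ))" # data disks add up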

--Artem


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar


I've heard this same thing -- every vdev == 1 drive in performance. I've 
never seen any proof/papers on it though.

Read the original ZFS papers.




Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

gives single drive random I/O performance.


For reads - true. For writes it's probably behaves better than RAID5


Yes, because as with reads it gives single-drive performance. Small writes 
on RAID5 give lower than single-disk performance.



If you need higher performance, build your pool out of multiple RAID-Z vdevs.

Even if you just need normal performance, use gmirror and UFS.


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Chris Rees
On 23 January 2013 21:24, Wojciech Puchar
woj...@wojtek.tensor.gdynia.pl wrote:

 I've heard this same thing -- every vdev == 1 drive in performance. I've
 never seen any proof/papers on it though.

 read original ZFS papers.

No, you are making the assertion, provide a link.

Chris


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

1 drive in performance only applies to number of random i/o
operations vdev can perform. You still get increased throughput. I.e.
5-drive RAIDZ will have 4x bandwidth of individual disks in vdev, but


Unless your workload is serving movies, it doesn't matter.


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Michel Talon
On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:


 So we have to take your word for it?
 Provide a link if you're going to make assertions, or they're no more  
 than
 your own opinion.

I've heard this same thing -- every vdev == 1 drive in performance. I've  
never seen any proof/papers on it though.


First Google answer for a search on RAID-Z performance:
https://blogs.oracle.com/roch/entry/when_to_and_not_to

Effectively, as a first approximation, an N-disk RAID-Z group will behave as 
a single device in terms of delivered random input IOPS. Thus a 10-disk group 
of devices, each capable of 200 IOPS, will globally act as a 200-IOPS-capable 
RAID-Z group. This is the price to pay to achieve proper data protection 
without the 2X block overhead associated with mirroring.



--

Michel Talon
ta...@lpthe.jussieu.fr









Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Artem Belevich
On Wed, Jan 23, 2013 at 1:25 PM, Wojciech Puchar
woj...@wojtek.tensor.gdynia.pl wrote:
 gives single drive random I/O performance.


 For reads - true. For writes it's probably behaves better than RAID5


 yes, because as with reads it gives single drive performance. small writes
 on RAID5 gives lower than single disk performance.


 If you need higher performance, build your pool out of multiple RAID-Z
 vdevs.

 even you need normal performance use gmirror and UFS

I've no objection. If it works for you -- go for it.

For me personally, ZFS performance is good enough, and data-integrity
verification is something I'm willing to sacrifice some performance for.
A ZFS scrub either gives me a warm and fuzzy feeling that everything is OK,
or explicitly tells me that something bad happened *and* reconstructs the
data if possible.
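For reference, that routine amounts to something like this ("tank" is a 
placeholder pool name):

  zpool scrub tank
  zpool status -v tank   # shows checksum errors, repaired data and any affected files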

Just my $0.02,

--Artem


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Nikolay Denev

On Jan 23, 2013, at 11:09 PM, Mark Felder f...@feld.me wrote:

 On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:
 
 
 So we have to take your word for it?
 Provide a link if you're going to make assertions, or they're no more than
 your own opinion.
 
 I've heard this same thing -- every vdev == 1 drive in performance. I've 
 never seen any proof/papers on it though.


Here is a blog post that describes why this is true for IOPS:

http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance




Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Chris Rees
On 23 Jan 2013 21:45, Michel Talon ta...@lpthe.jussieu.fr wrote:

 On Wed, 23 Jan 2013 14:26:43 -0600, Chris Rees utis...@gmail.com wrote:

 
  So we have to take your word for it?
  Provide a link if you're going to make assertions, or they're no more
  than
  your own opinion.

 I've heard this same thing -- every vdev == 1 drive in performance. I've
 never seen any proof/papers on it though.


 first google answer from request raids performance
 https://blogs.oracle.com/roch/entry/when_to_and_not_to

 Effectively, as a first approximation, an N-disk RAID-Z group will behave as
 a single device in terms of delivered random input IOPS. Thus a 10-disk group
 of devices, each capable of 200 IOPS, will globally act as a 200-IOPS-capable
 RAID-Z group. This is the price to pay to achieve proper data protection
 without the 2X block overhead associated with mirroring.

Thanks for the link, but I could have done that;  I am attempting to
explain to Wojciech that his habit of making bold assertions and
arrogantly refusing to back them up makes for frustrating reading.

Chris


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar

associated with mirroring.


Thanks for the link, but I could have done that;  I am attempting to
explain to Wojciech that his habit of making bold assertions and

As you can see, it is not a bold assertion - you just use something without 
even reading its docs, never mind doing any further research.


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread Wojciech Puchar


even you need normal performance use gmirror and UFS


I've no objection. If it works for you -- go for it.


Both work. For today's trend of solving everything with more hardware, ZFS 
may even have enough performance.


But it is still dangerous, for the reasons I explained, and it promotes bad 
setups and layouts, like making a single filesystem out of a large number of 
disks. That is bad no matter what filesystem and RAID setup you use, or even 
what OS.




Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-23 Thread matt
On 01/23/13 14:27, Wojciech Puchar wrote:


 both works. For todays trend of solving everything by more hardware
 ZFS may even have enough performance.

 But still it is dangerous for a reasons i explained, as well as it
 promotes bad setups and layouts like making single filesystem out of
 large amount of disks. This is bad for no matter what filesystem and
 RAID setup you use, or even what OS.


ZFS mirror performance is quite good (both random and sequential I/O),
and resilvers/scrubs are measured in an hour or less. You can always
build the pool out of mirrors instead of RAIDZ if you can get away with less
total available space.

I think RAIDZ vs. gmirror is a bad comparison; you can use a ZFS mirror
with all the ZFS features, plus N-way mirroring (not sure if gmirror does this).

Regarding single large filesystems, there is an old saying about not
putting all your eggs into one basket, even if it's a great basket :)

Matt




Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-22 Thread Matthew Ahrens
On Mon, Jan 21, 2013 at 11:36 PM, Peter Jeremy pe...@rulingia.com wrote:
 On 2013-Jan-21 12:12:45 +0100, Wojciech Puchar 
woj...@wojtek.tensor.gdynia.pl wrote:
While RAID-Z is already a king of bad performance,

 I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
 measurements to back up your claim?

Leaving aside anecdotal evidence (or actual measurements), RAID-Z is
fundamentally slower than RAID4/5 *for random reads*.

This is because RAID-Z spreads each block out over all disks, whereas RAID5
(as it is typically configured) puts each block on only one disk.  So to
read a block from RAID-Z, all data disks must be involved, vs. for RAID5
only one disk needs to have its head moved.

For other workloads (especially streaming reads/writes), there is no
fundamental difference, though of course implementation quality may vary.

 Even better - use UFS.

To each their own.  As a ZFS developer, it should come as no surprise that
in my opinion and experience, the benefits of ZFS almost always outweigh
this downside.

--matt


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-21 Thread Wojciech Puchar

Please don't misinterpret this post: ZFS's ability to recover from fairly
catastrophic failures is pretty stellar, but I'm wondering if there can be


From my testing it is exactly the opposite. You have to see the difference 
between marketing and reality.



a little room for improvement.

I use RAID pretty much everywhere.  I don't like to loose data and disks
are cheap.  I have a fair amount of experience with all flavors ... and ZFS


Just like me. And because I want performance and - as you described - 
disks are cheap, I use RAID-1 (gmirror).



has become a go-to filesystem for most of my applications.


My applications don't tolerate low performance, overcomplexity and a 
high risk of data loss.


That's why I use properly tuned UFS with gmirror, and prefer multiple 
filesystems to gstripe.



One of the best recommendations I can give for ZFS is it's
crash-recoverability.


Which is marketing, not truth. If you want bullet-proof recoverability, 
UFS beats everything I've ever seen.


If you want FAST crash recovery, use soft updates + journaling (SU+J), 
available in FreeBSD 9.



 As a counter example, if you have most hardware RAID
going or a software whole-disk raid, after a crash it will generally
declare one disk as good and the other disk as to be repaired ... after
which a full surface scan of the affected disks --- reading one and writing
the other --- ensues.


True - gmirror does this, but you can defer the mirror rebuild, which is what 
I do. I have a script that sends me mail when a gmirror is degraded, and - 
after finding the cause of the problem and possibly replacing the disk - I run 
the rebuild after work hours, so no slowdown is experienced.
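Something along these lines, as a sketch (the mirror name and mail recipient 
are assumptions):

  #!/bin/sh
  # Mail the admin when gmirror "home1" is no longer COMPLETE; run from cron.
  # The rebuild itself is then started by hand after hours, e.g.:
  #   gmirror rebuild home1 da1
  if gmirror status home1 | grep -q DEGRADED; then
      gmirror status home1 | mail -s "gmirror home1 degraded" root
  fi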



ZFS is smart on this point: it will recover on reboot with a minimum amount
of fuss.  Even if you dislodge a drive ... so that it's missing the last
'n' transactions, ZFS seems to figure this out (which I thought was extra
cudos).


Yes, this is marketing. Practice is somewhat different, as you discovered 
yourself.




MY PROBLEM comes from problems that scrub can fix.

Let's talk, in specific, about my home array.  It has 9x 1.5T and 8x 2T in
a RAID-Z configuration (2 sets, obviously).


While RAID-Z is already a king of bad performance, I assume 
you mean two POOLS, not 2 RAID-Z sets. If you mixed 2 different RAID-Z pools 
you would spread the load unevenly and make performance even worse.




A full scrub of my drives weighs in at 36 hours or so.


Which is funny, as ZFS is marketed as doing this efficiently (e.g. checking 
only used space).


dd if=/dev/disk of=/dev/null bs=2m would take no more than a few hours, 
and you can do all the disks in parallel.
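For example (device names are placeholders):

  # Read every member disk sequentially, all in parallel.
  for d in da0 da1 da2 da3; do
      dd if=/dev/$d of=/dev/null bs=2m &
  done
  wait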



   vr2/cvs:0x1c1

Now ... this is just an example: after each scrub, the hex number was


Seems like scrub simply isn't doing its work right.


before the old error was cleared.  Then this new error gets similarly
cleared by the next scrub.  It seems that if the scrub returned to this new
found error after fixing the known errors, this could save whole new
scrub runs from being required.


Even better - use UFS, for both bullet-proof recoverability and performance.
If you need help with tuning you may ask me privately.


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-21 Thread Peter Jeremy
On 2013-Jan-21 12:12:45 +0100, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl 
wrote:
That's why i use properly tuned UFS, gmirror, and prefer not to use 
gstripe but have multiple filesystems

When I started using ZFS, I didn't fully trust it so I had a gmirrored
UFS root (including a full src tree).  Over time, I found that gmirror
plus UFS was giving me more problems than ZFS.  In particular, I was
seeing behaviour that suggested that the mirrors were out of sync,
even though gmirror insisted they were in sync.  Unfortunately, there
is no way to get gmirror to verify the mirroring or to get UFS to
check correctness of data or metadata (fsck can only check metadata
consistency).  I've since moved to a ZFS root.

Which is marketing, not truth. If you want bullet-proof recoverability, 
UFS beats everything i've ever seen.

I've seen the opposite.  One big difference is that ZFS is designed to
ensure it returns the data that was written to it whereas UFS just
returns the bytes it finds where it thinks it wrote your data.  One
side effect of this is that ZFS is far fussier about hardware quality
- since it checksums everything, it is likely to pick up glitches that
UFS doesn't notice.

If you want FAST crash recovery, use softupdates+journal, available in 
FreeBSD 9.

I'll admit that I haven't used SU+J but one downside of SU+J is that
it prevents the use of snapshots, which in turn prevents the (safe)
use of dump(8) (which is the official tool for UFS backups) on live
filesystems.

 of fuss.  Even if you dislodge a drive ... so that it's missing the last
 'n' transactions, ZFS seems to figure this out (which I thought was extra
 cudos).

Yes this is marketing. practice is somehow different. as you discovered 
yourself.

Most of the time this works as designed.  It's possible there are bugs
in the implementation.

While RAID-Z is already a king of bad performance,

I don't believe RAID-Z is any worse than RAID5.  Do you have any actual
measurements to back up your claim?

 i assume 
you mean two POOLS, not 2 RAID-Z sets. if you mixed 2 different RAID-Z pools 
you would 
spread load unevenly and make performance even worse.

There's no real reason why you couldn't have 2 different vdevs in the
same pool.

 A full scrub of my drives weighs in at 36 hours or so.

which is funny as ZFS is marketed as doing this efficient (like checking 
only used space).

It _does_ only check used space but it does so in logical order rather
than physical order.  For a fragmented pool, this means random accesses.

Even better - use UFS.

Then you'll never know that your data has been corrupted.

For both bullet proof recoverability and performance.

Use ZFS.

-- 
Peter Jeremy




ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-20 Thread Zaphod Beeblebrox
Please don't misinterpret this post: ZFS's ability to recover from fairly
catastrophic failures is pretty stellar, but I'm wondering if there can be
a little room for improvement.

I use RAID pretty much everywhere. I don't like to lose data and disks
are cheap. I have a fair amount of experience with all flavors ... and ZFS
has become a go-to filesystem for most of my applications.

One of the best recommendations I can give for ZFS is its
crash-recoverability. As a counter-example, if you have most hardware RAID
setups, or a software whole-disk RAID, after a crash it will generally
declare one disk as good and the other as to-be-repaired ... after
which a full surface scan of the affected disks --- reading one and writing
the other --- ensues. On my Windows desktop, the pair of 2T's takes 3 or 4
hours to do this. A pair of green 2T's can take over 6. You don't lose
any data, but you have severely reduced performance until it's repaired.

The rub is that you know only one or two blocks could possibly even be
different ... and that this is a highly unoptimized way of going about the
problem.

ZFS is smart on this point: it will recover on reboot with a minimum amount
of fuss. Even if you dislodge a drive ... so that it's missing the last
'n' transactions, ZFS seems to figure this out (which I thought deserved extra
kudos).

MY PROBLEM comes from problems that scrub can fix.

Let's talk specifically about my home array. It has 9x 1.5T and 8x 2T in
a RAID-Z configuration (2 sets, obviously). The drives themselves are
housed (4 each) in external drive bays with a single SATA connection for
each. I think I have spoken of this here before.

A full scrub of my drives weighs in at 36 hours or so.

Now, around Christmas, while moving some things, I managed to pull the plug
on one cabinet of 4 drives. It was likely that the only active use of the
filesystem was an automated CVS check-in (backup), given that the errors only
appeared on the cvs directory.

IN-THE-END, no data was lost, but I had to scrub 4 times to remove the
complaints, which showed like this from zpool status -v

errors: Permanent errors have been detected in the following files:

vr2/cvs:0x1c1

Now ... this is just an example: after each scrub, the hex number was
different.  I also couldn't actually find the error on the cvs filesystem,
as a side note.  Not many files are stored there, and they all seemed to be
present.

MY TAKEAWAY from this is that 2 major improvements could be made to ZFS:

1) a pause for scrub... such that long scrubs could be paused during
working hours.

2) going back over errors... during each scrub, the new error was found
before the old error was cleared. Then this new error gets similarly
cleared by the next scrub. It seems that if the scrub returned to this newly
found error after fixing the known errors, it could save whole new
scrub runs from being required.


Re: ZFS regimen: scrub, scrub, scrub and scrub again.

2013-01-20 Thread Attila Nagy

Hi,

On 01/20/13 23:26, Zaphod Beeblebrox wrote:


1) a pause for scrub... such that long scrubs could be paused during
working hours.



It's not exactly a pause, but wouldn't playing with scrub_delay work here?

vfs.zfs.scrub_delay: Number of ticks to delay scrub

Set this to a high value during working hours, and set it back to its 
normal value (or even lower) outside working hours. (Maybe the resilver delay 
or some other values should also be set; I haven't yet read the relevant 
code.)
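A sketch of that approach, e.g. from two cron jobs (the tick values are only 
placeholders; 4 is, I believe, the stock default):

  sysctl vfs.zfs.scrub_delay=20   # 08:00 - throttle scrubbing during work hours
  sysctl vfs.zfs.scrub_delay=4    # 20:00 - back to normal speed overnight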
