Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-10 Thread InterNetX - Juergen Gotteswinter
www.bolthole.com/solaris/zrep/

On 09.03.2016 at 03:08, Manuel Amador (Rudd-O) wrote:
> On 03/09/2016 12:05 AM, Liam Slusser wrote:
>>
>> We use a slightly modified zrep to handle the replication between the two.
> 
> zrep?
> 




Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-09 Thread Miroslav Lachman

Manuel Amador (Rudd-O) wrote on 03/09/2016 03:08:

On 03/09/2016 12:05 AM, Liam Slusser wrote:


We use a slightly modified zrep to handle the replication between the two.


zrep?


In ports: sysutils/zrep - ZFS based replication and failover solution

WWW: http://www.bolthole.com/solaris/zrep/




zrep - WAS::Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-09 Thread Jerry Kemp



On 03/08/16 08:08 PM, Manuel Amador (Rudd-O) wrote:

On 03/09/2016 12:05 AM, Liam Slusser wrote:


We use a slightly modified zrep to handle the replication between the two.


zrep?




A repost from Phil Brown on the OpenSolaris zfs-discuss mailing list,
12 March 2012.

The bolthole website is up and online

Enjoy,

Jerry

..



I'm happy to announce the first release of zrep (v0.1)

http://www.bolthole.com/solaris/zrep/

This is a self-contained "single executable" tool, to implement
synchronization *and* failover of an active/passive zfs filesystem
pair.
No configuration files needed: configuration is stored in the zfs
filesystem properties.

Setting up replication is a simple 2-step process
(presuming you already have root ssh trust set up)


  1. zrep init pool/myfs remotehost remotepool/remotefs
  (This will create, and sync, the remote filesystem)

  2. zrep sync pool/myfs
  (or if you prefer, zrep sync all)
   Do this manually, or crontab it.

Failover is equally simple:

  zrep failover pool/myfs

This will automatically switch roles, making the src the destination, and
vice versa.

You can then in theory set up "zrep sync -q SOME_SEC all"
as a cronjob on both sides, and then forget about it.
(although you should note that it currently is only single-threaded)
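
For illustration, a minimal crontab sketch of that "run it on both sides and
forget about it" setup. The zrep path, the 5-minute interval, and the value
900 for SOME_SEC are placeholders, not recommendations from the announcement:

  # identical entry on both hosts; per the note above, running it on the
  # passive side as well is expected to be harmless
  */5 * * * * /usr/local/bin/zrep sync -q 900 all >/dev/null 2>&1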


zrep uses an internal locking mechanism to avoid problems with
overlapping operations on a filesystem.

zrep automatically handles serialization of snapshots. It uses a 6-digit
hex serial number, of the form
  @zrep_######
It can thus handle running once a minute, every minute, for 11,650
days, or over 30 years.

By default it only keeps the last 5 snapshots, but that's tunable via
a property.
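
Because the configuration lives in ZFS user properties, it can be inspected
and tuned with ordinary zfs commands. A sketch; the zrep:savecount name below
is an assumption from memory, so trust whatever the first command actually
lists:

  # show the zrep: properties recorded on a replicated filesystem
  zfs get all pool/myfs | grep zrep:

  # raise snapshot retention from the default of 5 (property name assumed)
  zfs set zrep:savecount=10 pool/myfs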



Simple usage summary:
zrep (init|-i) ZFS/fs remotehost remoteZFSpool/fs
zrep (sync|-S) ZFS/fs
zrep (sync|-S) all
zrep (status|-s) [ZFS/fs]
zrep (list|-l) [-v] [ZFS/fs]
zrep (expire|-e) [-L] (ZFS/fs ...)|(all)|()
zrep (changeconfig|-C) ZFS/fs remotehost remoteZFSpool/fs
zrep failover [-L] ZFS/fs
zrep takeover [-L] ZFS/fs
zrep clear ZFS/fs  -- REMOVE ZREP CONFIG AND SNAPS FROM FILESYSTEM




RE: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-08 Thread Fred Liu
Gotcha!

From: Liam Slusser [mailto:lslus...@gmail.com]
Sent: Wednesday, March 09, 2016 8:05
To: z...@lists.illumos.org
Cc: smartos-disc...@lists.smartos.org; develo...@lists.open-zfs.org; developer; 
illumos-developer; omnios-discuss; Discussion list for OpenIndiana; 
zfs-disc...@list.zfsonlinux.org; freebsd...@freebsd.org; zfs-de...@freebsd.org
Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- 
the zpool with most disks you have ever built


Hi Fred -

We don't use any cluster software.  Our backup server is just a full copy of 
our data and nothing more.  So in the event of a failure of the master our 
server clients don't automatically fail over or anything nifty like that.  This 
filer isn't customer facing, so in the event of a failure of the master there 
is no customer impact.  We use a slightly modified zrep to handle the 
replication between the two.

thanks,
liam


[Fred]: zpool with 280 drives in production is pretty big! I think 2000 drives
were just in test. It is true that huge pools have lots of operational
challenges. I have met a similar sluggish issue caused by a dying disk.
Just curious, what is the cluster software implemented in
http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ ?

Thanks.

Fred







Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-08 Thread Manuel Amador
On 03/09/2016 12:05 AM, Liam Slusser wrote:
>
> We use a slightly modified zrep to handle the replication between the two.

zrep?

-- 
Rudd-O
http://rudd-o.com/










Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-08 Thread Liam Slusser
Hi Fred -

We don't use any cluster software.  Our backup server is just a full copy
of our data and nothing more.  So in the event of a failure of the master
our server clients don't automatically fail over or anything nifty like
that.  This filer isn't customer facing, so in the event of a failure of
the master there is no customer impact.  We use a slightly modified zrep to
handle the replication between the two.

thanks,
liam



> [Fred]: zpool with 280 drives in production is pretty big! I think 2000
> drives were just in test. It is true that huge pools have lots of
> operational challenges. I have met a similar sluggish issue caused by a
> dying disk. Just curious, what is the cluster software implemented in
> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ ?
>
> Thanks.
>
> Fred
>





Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-07 Thread Fred Liu
2016-03-08 4:55 GMT+08:00 Liam Slusser:

> I don't have a 2000 drive array (that's amazing!) but I do have two 280
> drive arrays which are in production.  Here are the generic stats:
>
> server setup:
> OpenIndiana oi_151
> 1 server rack
> Dell r720xd 64g ram with mirrored 250g boot disks
> 5 x LSI 9207-8e dualport SAS pci-e host bus adapters
> Intel 10g fibre ethernet (dual port)
> 2 x SSD for log
> 2 x SSD for cache
> 23 x Dell MD1200 with 3T, 4T, or 6T NLSAS disks (a mix of Toshiba, Western
> Digital, and Seagate drives - basically whatever Dell sends)
>
> zpool setup:
> 23 x 12-disk raidz2 glued together.  276 total disks.  Basically each new
> 12 disk MD1200 is a new raidz2 added to the pool.
>
> Total size: ~797T
>
> We have an identical server to which we replicate changes via zfs snapshots
> every few minutes.  The whole setup has been up and running for a few years
> now, no issues.  As we run low on space we purchase two additional MD1200
> shelves (one for each system) and add the new raidz2 into the pool on the fly.
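
For concreteness, a minimal sketch of the layout Liam describes: one 12-disk
raidz2 top-level vdev per MD1200 shelf, plus log and cache SSDs, grown a
shelf at a time. Device names are hypothetical placeholders, and mirroring
the log devices is an assumption, not something stated above:

  # initial pool: first shelf as a 12-disk raidz2 vdev
  zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
             c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0

  # 2 x SSD for log (mirrored here) and 2 x SSD for cache
  zpool add tank log mirror c2t0d0 c2t1d0
  zpool add tank cache c2t2d0 c2t3d0

  # each additional shelf becomes another top-level raidz2, added on the fly
  zpool add tank raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 \
                        c3t6d0 c3t7d0 c3t8d0 c3t9d0 c3t10d0 c3t11d0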
>
> The only real issue we've had is that sometimes a disk fails in such a way
> (think Monty Python and the Holy Grail: "I'm not dead yet") where the disk
> hasn't failed but is timing out and slows the whole array to a standstill
> until we can manually find and remove the disk.  Another problem is that
> once a disk has been replaced, the resilver process can sometimes take
> an eternity.  We have also found the snapshot replication process can
> ends - so we end up stopping or only doing one replication a day until the
> resilver process is done.
>
> The last helpful hint I have was lowering all the drive timeouts, see
> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
> for info.
>
[Fred]: zpool with 280 drives in production is pretty big! I think 2000
drives were just in test. It is true that huge pools have lots of
operational challenges. I have met a similar sluggish issue caused by a
dying disk. Just curious, what is the cluster software implemented in
http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ ?

Thanks.

Fred
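
For anyone curious about the drive-timeout tuning linked above, a rough
sketch of the usual illumos/OpenIndiana approach; treat the linked post as
the authoritative walkthrough, including whether already-attached disks pick
up the change:

  # inspect the global SCSI command timeout (default 60 seconds)
  echo "sd_io_time/D" | mdb -k

  # lower it to 10 seconds on the live kernel
  echo "sd_io_time/W 0t10" | mdb -kw

  # persist the setting across reboots
  echo "set sd:sd_io_time = 10" >> /etc/system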






Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-07 Thread Richard Elling

> On Mar 6, 2016, at 9:06 PM, Fred Liu wrote:
> 
> 
> 
> 2016-03-06 22:49 GMT+08:00 Richard Elling:
> 
>> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote:
>> 
>> Hi,
>> 
>> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC RAID 
>> introduction,
>> the interesting survey -- the zpool with most disks you have ever built 
>> popped in my brain.
> 
> We test to 2,000 drives. Beyond 2,000 there are some scalability issues that 
> impact failover times.
> We’ve identified these and know what to fix, but need a real customer at this 
> scale to bump it to
> the top of the priority queue.
> 
> [Fred]: Wow! 2000 drives almost need 4~5 whole racks! 
>> 
>> Since zfs doesn't support nested vdevs, the maximum fault tolerance should be
>> three (from raidz3).
> 
> Pedantically, it is N, because you can have N-way mirroring.
>  
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in 
> theory and rarely happens in reality.
> 
>> It is stranded if you want to build a very huge pool.
> 
> Scaling redundancy by increasing parity improves data loss protection by 
> about 3 orders of 
> magnitude. Adding capacity by striping reduces data loss protection by 1/N. 
> This is why there is
> not much need to go beyond raidz3. However, if you do want to go there, 
> adding raidz4+ is 
> relatively easy.
> 
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2000
> drives. If that is true, the probability of 4 failures out of 2000 will not
> be so low. Plus, resilvering takes longer if a single disk has bigger
> capacity. And further, the cost of over-provisioning spare disks vs raidz4+
> will be a worthwhile trade-off when the storage mesh is at the scale of
> 2000 drives.

Please don't assume, you'll just hurt yourself :-)
For example, do not assume the only option is striping across raidz3 vdevs. 
Clearly, there are many
different options.
 -- richard







Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built

2016-03-06 Thread Fred Liu
2016-03-07 14:04 GMT+08:00 Richard Elling:

>
> On Mar 6, 2016, at 9:06 PM, Fred Liu  wrote:
>
>
>
> 2016-03-06 22:49 GMT+08:00 Richard Elling <
> richard.ell...@richardelling.com>:
>
>>
>> On Mar 3, 2016, at 8:35 PM, Fred Liu  wrote:
>>
>> Hi,
>>
>> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC
>> RAID introduction,
>> the interesting survey -- the zpool with most disks you have ever built
>> popped in my brain.
>>
>>
>> We test to 2,000 drives. Beyond 2,000 there are some scalability issues
>> that impact failover times.
>> We’ve identified these and know what to fix, but need a real customer at
>> this scale to bump it to
>> the top of the priority queue.
>>
>> [Fred]: Wow! 2000 drives almost need 4~5 whole racks!
>
>>
>> Since zfs doesn't support nested vdevs, the maximum fault tolerance should
>> be three (from raidz3).
>>
>>
>> Pedantically, it is N, because you can have N-way mirroring.
>>
>
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works
> in theory and rarely happens in reality.
>
>>
>> It is stranded if you want to build a very huge pool.
>>
>>
>> Scaling redundancy by increasing parity improves data loss protection by
>> about 3 orders of
>> magnitude. Adding capacity by striping reduces data loss protection by
>> 1/N. This is why there is
>> not much need to go beyond raidz3. However, if you do want to go there,
>> adding raidz4+ is
>> relatively easy.
>>
>
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of
> 2000 drives. If that is true, the probability of 4 failures out of 2000
> will not be so low. Plus, resilvering takes longer if a single disk has
> bigger capacity. And further, the cost of over-provisioning spare disks vs
> raidz4+ will be a worthwhile trade-off when the storage mesh is at the
> scale of 2000 drives.
>
>
> Please don't assume, you'll just hurt yourself :-)
> For example, do not assume the only option is striping across raidz3
> vdevs. Clearly, there are many
> different options.
>

[Fred]: Yeah. Assumptions always go far away from facts! ;-) Is designing a
storage mesh with 2000 drives a business secret? Or is it just too complicated
to elaborate on?
Never mind. ;-)

Thanks.

Fred




