Re: [zfs-discuss] "NearLine SAS"?

2010-01-18 Thread Tim Cook
On Tue, Jan 19, 2010 at 1:06 AM, Erik Trimble  wrote:

>  stupid question here:  I understand the advantages of dual-porting a drive
> with a FC interface, but for SAS, exactly what are the advantages other than
> being able to read and write simultaneously (obviously, only from the
> on-drive cache).
> And yeah, these Seagates are dual-ported SAS. (according to the spec sheet)
>

Path redundancy.  While it's fairly rare, paths to drives do go down.
Redundancy is a good thing :)


> Also, a 38% increase in IOPS with LESS drive cache seems unlikely.  Or,
> at least highly workload-dependent.
> Check that, they're claiming 38% better IOPS/watt over the SATA version,
> which, given that the SAS one pulls 10% more watts, means in absolute terms
> 45% or so.   I'm really skeptical that only an interface change can do that.
>
>
Without benchmarking myself, I can't really speak much to their claims.  I
WILL however say it's VERY unlikely they'd drop the cache on something
intended for the enterprise without being extremely confident its
performance would be the same or better.  It wouldn't surprise me at all to
hear the components they use for their SAS interfaces yield significantly
better performance.  Plus, if it's dual ported...  I wouldn't expect to see
38% consistently, but I would expect to see better performance across the
board.

-- 
--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] "NearLine SAS"?

2010-01-18 Thread Erik Trimble

Tim Cook wrote:



On Tue, Jan 19, 2010 at 12:16 AM, Erik Trimble wrote:


A poster in another forum mentioned that Seagate (and Hitachi,
amongst others) is now selling something labeled as "NearLine SAS"
storage  (e.g. Seagate's NL35 series).

Is it me, or does this look like nothing more than their standard
7200-rpm enterprise drives with a SAS or FC interface instead of a
SATA one?

I can't see any real advantage of those over the existing
enterprise SATA drives (e.g. Seagate's Constellation ES series),
other than not needing a FC/SAS->SATA gateway in the external
drive enclosure.



Seagate claims the SAS versions of their drives actually see IOPS 
improvements:

http://www.seagate.com/www/en-us/products/servers/barracuda_es/barracuda_es.2

If the SAS version is dual ported like I would expect, that's also a 
MAJOR benefit.


--
--Tim
stupid question here:  I understand the advantages of dual-porting a 
drive with a FC interface, but for SAS, exactly what are the advantages 
other than being able to read and write simultaneously (obviously, only 
from the on-drive cache). 


And yeah, these Seagates are dual-ported SAS. (according to the spec sheet)

Also, a 38% increase in IOPS with LESS drive cache seems unlikely.  
Or, at least highly workload-dependent.
Check that, they're claiming 38% better IOPS/watt over the SATA version, 
which, given that the SAS one pulls 10% more watts, means in absolute 
terms 45% or so.   I'm really skeptical that only an interface change 
can do that.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] "NearLine SAS"?

2010-01-18 Thread Tim Cook
On Tue, Jan 19, 2010 at 12:16 AM, Erik Trimble  wrote:

> A poster in another forum mentioned that Seagate (and Hitachi, amongst
> others) is now selling something labeled as "NearLine SAS" storage  (e.g.
> Seagate's NL35 series).
>
> Is it me, or does this look like nothing more than their standard 7200-rpm
> enterprise drives with a SAS or FC interface instead of a SATA one?
>
> I can't see any real advantage of those over the existing enterprise SATA
> drives (e.g. Seagate's Constellation ES series), other than not needing a
> FC/SAS->SATA gateway in the external drive enclosure.
>
>
>
Seagate claims the SAS versions of their drives actually see IOPS
improvements:
http://www.seagate.com/www/en-us/products/servers/barracuda_es/barracuda_es.2

If the SAS version is dual ported like I would expect, that's also a MAJOR
benefit.

-- 
--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] "NearLine SAS"?

2010-01-18 Thread Erik Trimble
A poster in another forum mentioned that Seagate (and Hitachi, amongst 
others) is now selling something labeled as "NearLine SAS" storage  
(e.g. Seagate's NL35 series).


Is it me, or does this look like nothing more than their standard 
7200-rpm enterprise drives with a SAS or FC interface instead of a SATA one?


I can't see any real advantage of those over the existing enterprise 
SATA drives (e.g. Seagate's Constellation ES series), other than not 
needing a FC/SAS->SATA gateway in the external drive enclosure.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Erik Trimble

Daniel Carosone wrote:

On Mon, Jan 18, 2010 at 03:25:56PM -0800, Erik Trimble wrote:

  

Hopefully, once BP rewrite materializes (I know, I'm treating this
much too much as a Holy Grail, here to save us from all the ZFS
limitations, but really...), we can implement defragmentation which
will seriously reduce the amount of reserved space required to keep
up performance. 



I doubt that.  I expect bp-rewrite in general, and its use for
effective defragmentation in particular, to require rather *more* free
space to be available.  Of course, you may be able to add that space by
stretching a raidz vdev to one more disk, but you also may not due to
other constraints (not enough ports, etc).
  
Really?  I would expect that frequent defragging (i.e. defrag as the 
pool is used, say on a nightly basis, not just once it gets to 90%+ 
utilization) would seriously reduce the amount of reserved space 
required, as it keeps the pool in a much more optimal layout, and thus 
has a lower overhead requirement.


Another thought is that pools which are heavily used (and thus, likely to 
need frequent defrag) would require more reserve than those which 
contain relatively static datasets.  It would certainly be helpful to 
include a tunable parameter in ZFS so that it knows whether the dataset 
is likely to be very write-intensive, or is generally 
write-once-read-many.   In the latter case, I would expect that a couple 
hundred MB is all that a pool would ever need for reserve space, 
regardless of the actual pool size.



Another poster pointed out recently that you can readily add more
reserved space using an unmounted filesystem with a reservation of the
appropriate size.  This is most relevant for systems without the "stop
looking and start ganging" fix.
  



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot that won't go away.

2010-01-18 Thread Ian Collins

Daniel Carosone wrote:

On Mon, Jan 18, 2010 at 05:52:25PM +1300, Ian Collins wrote:
  

Is it the parent snapshot for a clone?
  
  
I'm almost certain it isn't.  I haven't created any clones and none show  
in zpool history.



What about snapshot holds?  I don't know if (and doubt whether) these
are in S10, but since they produce exactly an EBUSY it's worth
checking. 

  

Good idea but they were added in ZFS pool version 18 (b121).
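
(For anyone on a build with pool version 18 or later, a quick sketch of how both
possibilities could be ruled out - the pool and snapshot names below are only
placeholders:

# zfs holds tank/fs@snap       <- any user hold listed here makes destroy fail with EBUSY
# zfs get -r origin tank       <- a clone reports its parent snapshot as "origin"; "-" means no clone

The origin check works on older builds too; only "zfs holds" needs v18.)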

I might have to destroy the filesystem and restore it.

Thanks for the input.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Erik Trimble

Richard Elling wrote:

On Jan 18, 2010, at 3:25 PM, Erik Trimble wrote:
  

Given my (imperfect) understanding of the internals of ZFS, the non-ZIL 
portions of the reserved space are there mostly to ensure that there is 
sufficient (reasonably) contiguous space for doing COW.  Hopefully, once BP 
rewrite materializes (I know, I'm treating this much too much as a Holy Grail, 
here to save us from all the ZFS limitations, but really...), we can implement 
defragmentation which will seriously reduce the amount of reserved space 
required to keep up performance.



[Richard pauses to remember the first garbage collection presentation by
a bunch of engineers dressed as garbage men... they're probably still working
at the same job :-)]
  

I think I work with a couple of those guys over here in Java-land.

:-)



There is still some work being done on the allocation front. From my experience
with other allocators (and garbage collectors) I'm sure the technology will change
right after it gets near perfect and we'll have to adapt again.  For example, b129
includes a fix for CR6869229, zfs should switch to shiny new metaslabs more
frequently.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6869229
I think the CR is worth reading if you have an interest in allocators and 
performance.
  
I'd be interested in knowing if there is any idea cross-pollination 
between memory and disk allocation/GC methods.  The GC guys over here 
are damned slick, and there's quite a bit of good academic literature on 
memory GC (and the GC in Sun JVM has undergone considerable work - there 
are now several built-in, and having such flexibility might be a good 
thing for a GC/defragger).  Maybe it's time for a guest symposium for 
the ZFS folks? 


Once all this gets done, I'd think we seldom would need more than a GB or two 
as reserve space...


I hope :-)
 -- richard
  

Please!

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot that won't go away.

2010-01-18 Thread Daniel Carosone
On Mon, Jan 18, 2010 at 05:52:25PM +1300, Ian Collins wrote:
>> Is it the parent snapshot for a clone?
>>   
> I'm almost certain it isn't.  I haven't created any clones and none show  
> in zpool history.

What about snapshot holds?  I don't know if (and doubt whether) these
are in S10, but since they produce exactly an EBUSY it's worth
checking. 

--
Dan


pgpmpqB2IMKLv.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!

2010-01-18 Thread Tim Cook
On Mon, Jan 18, 2010 at 8:48 PM, Charles Hedrick wrote:

> From the web page it looks like this is a card that goes into the computer
> system. That's not very useful for enterprise applications, as they are
> going to want to use an external array that can be used by a redundant pair
> of servers.
>
> I'm very interested in a cost-effective device that will interface to two
> systems.
>
>
That's called an SSD in a SAS array.

-- 
--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New ZFS Intent Log (ZIL) device available - Beta program now open!

2010-01-18 Thread Charles Hedrick
>From the web page it looks like this is a card that goes into the computer 
>system. That's not very useful for enterprise applications, as they are going 
>to want to use an external array that can be used by a redundant pair of 
>servers.

I'm very interested in a cost-effective device that will interface to two 
systems.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] how are free blocks are used?

2010-01-18 Thread Rodney Lindner
Hi all,
I was wondering, when blocks are freed as part of the COW process, are the old blocks
put on the top or bottom of the free block list?

The question came about from looking at thin provisioning using zfs on top of
dynamically expanding disk images (VDI). If the free blocks are put at the end of the
free block list, over time the VDI will grow to its maximum size before it
reuses any of the blocks.

Do I have to wait for ZFS defrag?

Regards
Rodney
=
Rodney Lindner
Services Chief Technologist
Sun Microsystems Australia
Phone: +61 (0)2 94669674 (EXTN:59674)
Mobile +61 (0)404 815 842
Email: rodney.lind...@sun.com
=

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-01-18 Thread Simon Breden
Thanks. Newegg shows quite a good customer rating for that drive: 70% rated it 
with 5 stars, and 11% with four stars, with 240 ratings.

Seems like some people have complained about them sleeping - presumably to save 
power, although others report they don't, so I'll need to look into that more. 
Did yours sleep?

Also, someone reported some issues with smartctl and understanding some of the 
attributes. Does checking your drive temperatures using smartctl work? Like 
with this script: http://breden.org.uk/2008/05/16/home-fileserver-drive-temps/
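
(For what it's worth, a rough check, assuming smartctl is installed - the device
path shown is only an example, and some controllers need an explicit "-d" type:

# smartctl -a /dev/rdsk/c7t0d0 | grep -i temperature
)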
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Richard Elling
On Jan 18, 2010, at 3:25 PM, Erik Trimble wrote:
> Given my (imperfect) understanding of the internals of ZFS, the non-ZIL 
> portions of the reserved space are there mostly to ensure that there is 
> sufficient (reasonably) contiguous space for doing COW.  Hopefully, once BP 
> rewrite materializes (I know, I'm treating this much too much as a Holy Grail, 
> here to save us from all the ZFS limitations, but really...), we can 
> implement defragmentation which will seriously reduce the amount of reserved 
> space required to keep up performance.

[Richard pauses to remember the first garbage collection presentation by
a bunch of engineers dressed as garbage men... they're probably still working
at the same job :-)]

There is still some work being done on the allocation front. From my experience
with other allocators (and garbage collectors) I'm sure the technology will change
right after it gets near perfect and we'll have to adapt again.  For example, b129
includes a fix for CR6869229, zfs should switch to shiny new metaslabs more
frequently.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6869229
I think the CR is worth reading if you have an interest in allocators and 
performance.

> Once all this gets done, I'd think we seldom would need more than a GB or two 
> as reserve space...

I hope :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Daniel Carosone
On Mon, Jan 18, 2010 at 03:25:56PM -0800, Erik Trimble wrote:

> Hopefully, once BP rewrite materializes (I know, I'm treating this
> much too much as a Holy Grail, here to save us from all the ZFS
> limitations, but really...), we can implement defragmentation which
> will seriously reduce the amount of reserved space required to keep
> up performance. 

I doubt that.  I expect bp-rewrite in general, and its use for
effective defragmentation in particular, to require rather *more* free
space to be available.  Of course, you may be able to add that space by
stretching a raidz vdev to one more disk, but you also may not due to
other constraints (not enough ports, etc).

Another poster pointed out recently that you can readily add more
reserved space using an unmounted filesystem with a reservation of the
appropriate size.  This is most relevant for systems without the "stop
looking and start ganging" fix.
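
A minimal sketch of that approach (pool and dataset names, and the size, are
arbitrary):

# zfs create -o canmount=off -o mountpoint=none -o reservation=10G tank/slack
  ...later, when the pool is nearly full and you need headroom:
# zfs set reservation=none tank/slack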



pgpkeQL4nCvUf.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Daniel Carosone
On Mon, Jan 18, 2010 at 01:38:16PM -0800, Richard Elling wrote:
> The Solaris 10 10/09 zfs(1m) man page says:
> 
>  The format of the stream is committed. You will be  able
>  to receive your streams on future versions of ZFS.
> 
> I'm not sure when that hit snv, but obviously it was after b112.
> ...

I only noticed it recently.  My guess was that it arrived together
with the format changes for "zfs send -D".   As handy as that change
is, it doesn't address some of the other deficiencies involved in
trying to repurpose a replication stream as an archive format.

> > When we brought it up last time, I think we found no one knows of a
> > userland tool similar to 'ufsdump' that's capable of serializing a ZFS
> > along with holes, large files, ``attribute'' forks, windows ACL's, and
> > checksums of its own, and then restoring the stream in a
> > filesystem-agnostic read/write/lseek/... manner like 'ufsrestore'.

It was precisely the realisation that the zpool-in-a-file format had
all of the desired characteristics that led to the scheme I described
elsewhere in this thread.  

(It wasn't one of my criteria, but this goes up to and including the
"userland tool" component - although I've not tried, I understand
there are userland versions of the zfs tools in the test suite.)

It has the advantage of being entirely common with the original, so it
will be well tested and keep in sync with new features (dedup, crypto,
next+N).   A focus on formalising this usage pattern would bring
further benefits in terms of features  (e.g, as a worthwhile use case
for "read only import") and of practices (documentation and process).

--
Dan.



pgpmzgIoI2655.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Erik Trimble

Tim Cook wrote:



On Mon, Jan 18, 2010 at 3:49 PM, Richard Elling wrote:


On Jan 18, 2010, at 7:55 AM, Jesus Cea wrote:
> zpool and zfs report different free space because zfs takes into account
> an internal reservation of 32MB or 1/64 of the capacity of the pool,
> whichever is bigger.

This space is also used for the ZIL.

> So in a 2TB Harddisk, the reservation would be 32 gigabytes. Seems a bit
> excessive to me...

Me too.  Before firing off an RFE, what would be a reasonable upper
bound?  A percentage?
 -- richard



Not being intimate with the guts of ZFS, it would seem to me that a 
percentage would be the best choice.  I'll make the (perhaps 
incorrect) assumption that as disks grow, if you have a set amount of 
free space (say 5g), it becomes harder and harder to find/get to that 
free space resulting in performance tanking.  Whereas we can expect 
linear performance if it's a percentage.  No?


--
--Tim
Actually, as a section of the reservation is ZIL, that portion's impact 
on performance is directly tied to the PERFORMANCE of the underlying 
zpool, not its size.  As such, given that hard drive performance has 
pretty much hit a wall, I think we should look at having a 
non-size-determined limit on the size of the reserved area, regardless 
of the actual size of the zpool.  The limit would still need to be 
heuristic, since higher-performing zpools would need to have a larger 
maximum reservation than lower-performing ones.


Given my (imperfect) understanding of the internals of ZFS, the non-ZIL 
portions of the reserved space are there mostly to ensure that there is 
sufficient (reasonably) contiguous space for doing COW.  Hopefully, once 
BP rewrite materializes (I know, I'm treating this much too much as a 
Holy Grail, here to save us from all the ZFS limitations, but 
really...), we can implement defragmentation which will seriously reduce 
the amount of reserved space required to keep up performance.


Once all this gets done, I'd think we seldom would need more than a GB 
or two as reserve space...


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Peter Jeremy
On 2010-Jan-19 00:26:27 +0800, Jesus Cea  wrote:
>On 01/18/2010 05:11 PM, David Magda wrote:
>> Ext2/3 uses 5% by default for root's usage; 8% under FreeBSD for FFS.
>> Solaris (10) uses a bit more nuance for its UFS:
>
>That reservation is to preclude users from exhausting disk space in such a way
>that even "root" cannot log in and solve the problem.

At least for UFS-derived filesystems (ie FreeBSD and Solaris), the
primary reason for the 8-10% "reserved" space is to minimise FS
fragmentation and improve space allocation performance:  More total
free space means it's quicker and easier to find the required
contiguous (or any) free space whilst searching a free space bitmap.
Allowing root to eat into that "reserved" space provided a neat
solution to resource starvation issues but was not the justification.

>I agree that it is a lot of space, but only 2% of a "modern disk". My point
>is that 32GB is a lot of space to reserve to be able, for instance, to
>delete a file when the pool is "full" (thanks to COW). And more when the
>minimum reserved is 32MB and ZFS can get away with it. I think it
>would be a good thing to put a cap on the maximum implicit reservation.

AFAIK, it's also necessary to ensure reasonable ZFS performance - the
"find some free space" issue becomes much more time critical with a
COW filesystem.  I recently had a 2.7TB RAIDZ1 pool get to the point
where zpool was reporting ~2% free space - and performance was
absolutely abysmal (fsync() was taking over 16 seconds).  When I
freed up a few percent more space, the performance recovered.

Maybe it would be useful if ZFS allowed the "reserved" space to be
tuned lower but, at least for ZFS v13, the "reserved" space seems to
actually be a bit less than is needed for ZFS to function reasonably.

-- 
Peter Jeremy


pgpaYK13eLyWU.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [nfs-discuss] Two pools, one flop-

2010-01-18 Thread CD

On 01/18/2010 07:50 PM, Tom Haynes wrote:

CD wrote:

On 01/18/2010 06:36 PM, Tom Haynes wrote:

CD wrote:

Greetings.

I've got two pools, but can only access one of them from my 
linux-machine. Both pools got the same settings and acl.


Both pools have sharenfs=on. Also, every filesystem got 
aclinherit=passthrough

NAME  PROPERTY  VALUE SOURCE
tank  sharenfs  onlocal
bitbox  sharenfs  onlocal



Does 'zfs list' show bitbox to be at the root of the server's 
namespace?


# zfs list -o name,sharenfs,mountpoint

NAME                     SHARENFS  MOUNTPOINT
bitbox                   on        /bitbox
bitbox/fs0               on        /bitbox/fs0
bitbox/fs1               on        /bitbox/fs1
rpool                    off       /rpool
rpool/ROOT               off       legacy
rpool/ROOT/opensolaris   off       /
rpool/ROOT/xvm           off       /mnt/xvm
rpool/ROOT/xvm-1         off       /mnt/xvm1
rpool/dump               -         -
rpool/export             off       /export
rpool/export/home        off       /export/home
rpool/swap               -         -
tank                     on        /tank
tank/fs0                 on        /tank/fs0
tank/fs1                 on        /tank/fs2



Hmm, tank/fs1 is mounted on /tank/fs2. Do you also have
a /tank/fs1? I.e., the shares down below don't match
the paths.

This shouldn't be the problem you are seeing...


I must apologize; I edited the output to make it simpler, and made a 
typo. The fs0 and fs1 are just placeholders. The original output looks okay.






What does share show as the active shares?

# share

-...@tank  /tank   rw   ""
-...@tank  /tank/fs0   rw   ""
-...@tank  /tank/fs1   rw   ""
f...@tank/fs0 /tank/fs0   rw   ""
f...@tank/fs1   /tank/fs1   rw   ""



If you don't see bitbox here, it will be a problem.

Seems I've got a problem ^^
But what? Aren't the filesystem handling the sharing?




Yes, they should be. I'm adding zfs-discuss to see what further 
triaging will help.




Great, thanks.






I've got samba shares active for most of my filesystems - can this be 
a problem?



Same ACL:
/bitbox
drwxr--r--+ 25 root sa25 Dec 18 12:43 folder0
   group:sa:rwxpdDaARWcCos:---:allow
 owner@:rwxpdDaARWcCos:---:allow
  everyone@:r-a-R-c---:---:allow
drwxr--r--+  3 root sa 3 Jun  1  2009 folder1
   group:sa:rwxpdDaARWcCos:---:allow
 owner@:rwxpdDaARWcCos:---:allow
  everyone@:r-a-R-c---:---:allow

/tank
drwxr--r--+  4 root root   4 Sep  9 15:47 folder0
   group:sa:rwxpdDaARWcCos:---:allow
 owner@:rwxpdDaARWcCos:---:allow
  everyone@:r-a-R-c---:---:allow
drwxr--r--+  7 root sa 9 May 19  2009 folder1
   group:sa:rwxpdDaARWcCos:---:allow
 owner@:rwxpdDaARWcCos:---:allow
  everyone@:r-a-R-c---:---:allow

Yet, when I mount the nfs, only 'tank' is listed:
mount -t nfs4 srv:/ /mnt/server

If I try to mount the pools separately:
$ sudo mount -t nfs4 srv:/tank /mnt/tank/ --work perfectly
$ sudo mount -t nfs4 srv:/bitbox /mnt/bitbox --gives error:
mount.nfs4: mounting srv:/bitbox failed, reason given by server:
  No such file or directory


What if you try a v3 mount?

I assume the prefix "-t nfs" equals v3? I get:
mount.nfs: access denied by server while mounting srv:/bitbox


You don't need "-t nfs", just dropping the -t option will work.

But the fact that we don't see a share means we do not
expect to get access here.


The /etc/dfs/sharetab only contains /tank entries, even though I've got 
both nfs and smb shares in the /bitbox pool... Not sure why.
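
A few checks that might help narrow this down (dataset names taken from the
listing above; commands as on OpenSolaris):

# zfs get -r sharenfs bitbox     <- confirm the property value and where it is inherited from
# svcs -xv nfs/server            <- is the NFS server service online and healthy?
# sharemgr show -vp              <- do the bitbox filesystems appear in the zfs share group?
# zfs set sharenfs=on bitbox     <- re-setting the property (or "zfs share -a") should re-publish the shares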

Also







I don't get it!
Also, where are the config files, such as the /etc/export?



If this were non-zfs, you'd want to look in /etc/dfs. But since this 
is zfs, the share

(i.e., export) is in the sharenfs property of the filesystem.





Thanks!





Thanks for replying.






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [nfs-discuss] Two pools, one flop-

2010-01-18 Thread Tom Haynes

CD wrote:

On 01/18/2010 06:36 PM, Tom Haynes wrote:

CD wrote:

Greetings.

I've got two pools, but can only access one of them from my 
linux-machine. Both pools got the same settings and acl.


Both pools have sharenfs=on. Also, every filesystem got 
aclinherit=passthrough

NAME  PROPERTY  VALUE SOURCE
tank  sharenfs  onlocal
bitbox  sharenfs  onlocal



Does 'zfs list' show bitbox to be at the root of the server's namespace?

# zfs list -o name,sharenfs,mountpoint

NAME                     SHARENFS  MOUNTPOINT
bitbox                   on        /bitbox
bitbox/fs0               on        /bitbox/fs0
bitbox/fs1               on        /bitbox/fs1
rpool                    off       /rpool
rpool/ROOT               off       legacy
rpool/ROOT/opensolaris   off       /
rpool/ROOT/xvm           off       /mnt/xvm
rpool/ROOT/xvm-1         off       /mnt/xvm1
rpool/dump               -         -
rpool/export             off       /export
rpool/export/home        off       /export/home
rpool/swap               -         -
tank                     on        /tank
tank/fs0                 on        /tank/fs0
tank/fs1                 on        /tank/fs2



Hmm, tank/fs1 is mounted on /tank/fs2. Do you also have
a /tank/fs1? I.e., the shares down below don't match
the paths.

This shouldn't be the problem you are seeing...



What does share show as the active shares?

# share

-...@tank  /tank   rw   ""
-...@tank  /tank/fs0   rw   ""
-...@tank  /tank/fs1   rw   ""
f...@tank/fs0 /tank/fs0   rw   ""
f...@tank/fs1   /tank/fs1   rw   ""



If you don't see bitbox here, it will be a problem.

Seems I've got a problem ^^
But what? Aren't the filesystem handling the sharing?




Yes, they should be. I'm adding zfs-discuss to see what further triaging 
will help.






I've got samba shares active for most of my filesystems - can this be 
a problem?



Same ACL:
/bitbox
drwxr--r--+ 25 root sa25 Dec 18 12:43 folder0
   group:sa:rwxpdDaARWcCos:---:allow
 owner@:rwxpdDaARWcCos:---:allow
  everyone@:r-a-R-c---:---:allow
drwxr--r--+  3 root sa 3 Jun  1  2009 folder1
   group:sa:rwxpdDaARWcCos:---:allow
 owner@:rwxpdDaARWcCos:---:allow
  everyone@:r-a-R-c---:---:allow

/tank
drwxr--r--+  4 root root   4 Sep  9 15:47 folder0
   group:sa:rwxpdDaARWcCos:---:allow
 owner@:rwxpdDaARWcCos:---:allow
  everyone@:r-a-R-c---:---:allow
drwxr--r--+  7 root sa 9 May 19  2009 folder1
   group:sa:rwxpdDaARWcCos:---:allow
 owner@:rwxpdDaARWcCos:---:allow
  everyone@:r-a-R-c---:---:allow

Yet, when I mount the nfs, only 'tank' is listed:
mount -t nfs4 srv:/ /mnt/server

If I try to mount the pools separately:
$ sudo mount -t nfs4 srv:/tank /mnt/tank/ --work perfectly
$ sudo mount -t nfs4 srv:/bitbox /mnt/bitbox --gives error:
mount.nfs4: mounting srv:/bitbox failed, reason given by server:
  No such file or directory


What if you try a v3 mount?

I assume the prefix "-t nfs" equals v3? I get:
mount.nfs: access denied by server while mounting srv:/bitbox


You don't need "-t nfs", just dropping the -t option will work.

But the fact that we don't see a share means we do not
expect to get access here.





I don't get it!
Also, where are the config files, such as the /etc/export?



If this were non-zfs, you'd want to look in /etc/dfs. But since this 
is zfs, the share

(i.e., export) is in the sharenfs property of the filesystem.





Thanks!





Thanks for replying.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] I can't seem to get the pool to export...

2010-01-18 Thread Travis Tabbal
On Sun, Jan 17, 2010 at 8:14 PM, Richard Elling wrote:

> On Jan 16, 2010, at 10:03 PM, Travis Tabbal wrote:
>
> > Hmm... got it working after a reboot. Odd that it had problems before
> that. I was able to rename the pools and the system seems to be running well
> now. Irritatingly, the settings for sharenfs, sharesmb, quota, etc. didn't
> get copied over with the zfs send/recv. I didn't have that many filesystems
> though, so it wasn't too bad to reconfigure them.
>
> What OS or build?  I've had similar issues with b130 on all sorts of mounts
> besides ZFS.
>


Opensolaris snv_129.
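
(Side note on the properties: a full replication stream should carry them over.
A sketch, with made-up pool/dataset names:

# zfs snapshot -r tank/fs@move
# zfs send -R tank/fs@move | zfs recv -d newtank

"-R" replicates the dataset's properties, descendants and snapshots; a plain
"zfs send" of a single snapshot does not.)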
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Robert Milkowski

On 18/01/2010 18:28, Lassi Tuura wrote:

Hi,

   

Here is the big difference. For a professional backup people still typically
use tapes although tapes have become expensive.

I still believe that a set of compressed incremental star archives give you
more features.
 

Thanks for your comments!

I think I roughly understand the feature trade-offs, and have some experience 
with back-up solutions ranging from simple to enterprise.

I guess what I am after is, for data which really matters to its owners and 
which they actually had to recover, did people use tar/pax archives (~ file 
level standard archive format), dump/restore (~ semi-standard format based on 
files/inodes) or zfs send/receive (~ file system block level dump), or 
something else, and why? (Regardless of how these are implemented, hobby 
scripts or enterprise tools, how they dealt with possible media failure issues, 
etc.)

Other than the file system vs. file restore, is there any concern in doing the block level thing? 
"Concerns" as in "mind being in the line of fire if it fails?" :-)

   


What we are doing basically is:

1. incremental rsync from a client to a dedicated filesystem for that client
2. snapshot after rsync finished
3. go to #1 for next backup

a pool is a dynamic stripe across raidz-2 groups + hot spares.
Then selected clients along with all their backups (snapshots) are 
replicated to another device which is exactly the same hardware/software 
configuration.
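
A rough sketch of steps 1-3 (hostnames, paths and the snapshot naming are only
illustrative):

# rsync -a --delete clientA:/export/ /backup/clientA/
# zfs snapshot backup/clientA@`date +%Y%m%d-%H%M`
(repeat per client; replication to the second box is then just an incremental
"zfs send -R -i <previous> backup/clientA@<latest> | ssh otherbox zfs recv -F ...")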


Add to it a management of snapshots (retention policies), reporting, 
etc. and you have a pretty good backup solution which allows you to 
restore a single file or entire filesystem with a very easy access to 
any backup. And it works for different OSes as clients.


See more details at
http://milek.blogspot.com/2009/12/my-presentation-at-losug.html

--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Tim Cook
On Mon, Jan 18, 2010 at 3:49 PM, Richard Elling wrote:

> On Jan 18, 2010, at 7:55 AM, Jesus Cea wrote:
> > zpool and zfs report different free space because zfs takes into account
> > an internal reservation of 32MB or 1/64 of the capacity of the pool,
> > whichever is bigger.
>
> This space is also used for the ZIL.
>
> > So in a 2TB Harddisk, the reservation would be 32 gigabytes. Seems a bit
> > excessive to me...
>
> Me too.  Before firing off an RFE, what would be a reasonable upper
> bound?  A percentage?
>  -- richard
>
>
>
Not being intimate with the guts of ZFS, it would seem to me that a
percentage would be the best choice.  I'll make the (perhaps incorrect)
assumption that as disks grow, if you have a set amount of free space (say
5g), it becomes harder and harder to find/get to that free space resulting
in performance tanking.  Whereas we can expect linear performance if it's a
percentage.  No?

-- 
--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Daniel Carosone
On Mon, Jan 18, 2010 at 07:34:51PM +0100, Lassi Tuura wrote:
> > Consider then, using a zpool-in-a-file as the file format, rather than
> > zfs send streams.
> 
> This is an interesting suggestion :-)
> 
> Did I understand you correctly that once a slice is written, zfs
> won't rewrite it? In other words, I can have an infinitely growing
> pile of these slices, and once zfs fills one file up (or one
> "raidz2" set of files), I flush it to tape/optical disk/whatever,
> and zfs won't change it "ever after"? When I need more space, I just
> add more slices, but old slices are effectively read-only?

No.  The intention of this scheme is to have an ongoing backup pool,
with a rolling set of historical snapshots collected together.  I
wouldn't suggest growing the pool indefinitely, because you'll need an
indefinite number of pool backing files online when it comes time for
a distant-future restore.

Even if you were never to delete old snapshots, zfs will still
update some of the metadata in each file, or raidz set of files,
whenever you make changes to your backup pool.  You will need to sync
each of these files to your backup media or offsite storage, to have a
consistent pool to restore from.  

However, you can add new top-level vdevs (e.g. raidz set of files) as
your backup pool starts to fill.  Until you start to recycle space (by
deleting old snapshots from the backup pool) there won't be much new
data written to the original full files.  Therefore, if your offsite
replication of these files is efficient for incremental changes
(e.g. rsync) then they will update quickly.
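
For the record, a minimal sketch of such a file-backed backup pool (file names,
sizes and the raidz2 layout are arbitrary):

# mkfile 4g /backup/s0 /backup/s1 /backup/s2 /backup/s3
# zpool create bkpool raidz2 /backup/s0 /backup/s1 /backup/s2 /backup/s3
# zfs send -R tank@today | zfs recv -Fd bkpool
# zpool export bkpool      <- then copy the backing files to tape/DVD/offsite storage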

The same will happen for changes within the existing files, of course,
if they're not yet full or if you are recycling space.  It's the same
data, all that changes is how it's spread among the files.

If you're writing to tapes or dvd's, you'll either need to rewrite the
files completely every time, or layer some *other* incremental
mechanism on top (perhaps your existing large-scale enterprise VTL,
which you'd like to continue using, or are forced to continue using :).

I don't mind rewriting tape media every time.. keep 2 or three sets
and just roll between them each time.  There are advantages to
having a predictable cycle - knowing that you're going to need exactly
N tapes this week and they will take T time to write out.  You also
get the benefit of all other D-to-D-to-T schemes, that this can be
done at leisure asynchronously to host backup windows. For DVD's its a
little more annoying, but at least the media is cheap. 

For these purposes, you can also consider removable hard disks as
tapes.  As I replace one generation of drives with the next, higher
capacity, I intend for the previous ones to move to backup service. 

> And perhaps most importantly: has anyone actually done this for
> their back-ups and has success stories on restore after media
> failure? 

For me it's somewhere between concept and regular practice. It works,
I've tinkered with it and done intermittent runs at archiving off the
pool files and test imports and restores, but not written automation
or made it as regular practice as I should.  I've used it to drop
backups of my opensolaris hosts onto enterprise backed-up servers, in 
non-solaris customer environments.  

There are some wrinkles, like you can't mount zpool files off
read-only DVD media directly - you have to copy them to r/w scratch
space because import wants to update metadata (is there an RFE for
read-only import? I forget).  This is mildly irritating at worst -
copying one or two dvd's worth of files is easy, yet any more than
this in your backup pool, you'll need to copy anyway for lack of many
dvd readers at once. 

I also don't recommend files >1Gb in size for DVD media, due to
iso9660 limitations.  I haven't used UDF enough to say much about any
limitations there.

--
Dan.

pgpwXMKusfxKP.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Richard Elling
On Jan 18, 2010, at 7:55 AM, Jesus Cea wrote:
> zpool and zfs report different free space because zfs takes into account
> an internal reservation of 32MB or 1/64 of the capacity of the pool,
> whichever is bigger.

This space is also used for the ZIL.

> So in a 2TB Harddisk, the reservation would be 32 gigabytes. Seems a bit
> excessive to me...

Me too.  Before firing off an RFE, what would be a reasonable upper
bound?  A percentage?
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Richard Elling
On Jan 18, 2010, at 11:04 AM, Miles Nordin wrote:
...
> Another problem is that the snv_112 man page says this:
> 
> -8<-
> The format of the stream is evolving. No backwards  com-
> patibility is guaranteed. You may not be able to receive
> your streams on future versions of ZFS.
> -8<-
> 
> I think the new man page says something more generous. 


The Solaris 10 10/09 zfs(1m) man page says:

 The format of the stream is committed. You will be  able
 to receive your streams on future versions of ZFS.

I'm not sure when that hit snv, but obviously it was after b112.
...

> the solaris userland is ancient, so if you use gtar you'll be
> surprised less often, like IMHO you are less likely to have silently
> truncated files, but then it's maintained upstream so it's missing
> support for all these forked files, mandatory labels, and windows
> ACL's that they're piling into ZFS now for NAS features.

OOB, the default OpenSolaris PATH places /usr/gnu/bin ahead
of /usr/bin, so gnu tar is "the default."  As of b130 (I'm not running
an older build currently) the included gnu tar is version 1.22 which
is the latest as released March 2009 at http://www.gnu.org/software/tar/

> When we brought it up last time, I think we found no one knows of a
> userland tool similar to 'ufsdump' that's capable of serializing a ZFS
> along with holes, large files, ``attribute'' forks, windows ACL's, and
> checksums of its own, and then restoring the stream in a
> filesystem-agnostic read/write/lseek/... manner like 'ufsrestore'.

That is my understanding as well.  It seems that the backup vendors
are moving forward in a more-or-less vendor-specific way. This is
not necessarily a bad thing, since there are open source solutions.
But I see the requirements for backups being much more sophisticated
than ufsdump was 25 years ago.  hmmm... has ufsdump changed over 
the past 25 years? ;-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Boot Disk Configuration

2010-01-18 Thread Richard Elling
On Jan 18, 2010, at 10:22 AM, Mr. T Doodle wrote:
> I would like some opinions on what people are doing in regards to configuring 
> ZFS for root/boot drives:
>  
> 1) If you have onboard RAID controllers are you using them then creating the
> ZFS pool (mirrored from hardware)?

I let ZFS do the mirroring.

> 2) How many slices? lol

One.
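
(For what it's worth, on x86 a common way to get the mirror in place after
install is roughly the following - device names are only examples; on SPARC
you'd use installboot instead of installgrub:

# zpool attach rpool c1t0d0s0 c1t1d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
)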

> I can't seem to find any best practices such as the old EIS standards for UFS 
> filesystems.

Since I'm one of the co-authors for that document, I can assure you that
most of what is in the EIS boot disk standard can be blissfully forgotten.
Much of the document deals with the pain involved in systems where 
the file system was not integrated with the RAID system. The pain of
such architectures led to the need for more documentation to help 
steer people into good, manageable configurations.

One of the reasons we needed the EIS boot disk standard is because
there was no single place to document such things in the existing
structure: Solaris installation, platform installation, SVM, and VxVM
are all documented separately, by separate organizations, each with
their own agenda. The only place where all four come together for 
boot disks is in the EIS docs.  IMHO, the need for such integrated docs 
is a bug. The best practices for ZFS boot disk are now captured in the 
ZFS Administration Guide (thanks Cindy!)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Mattias Pantzare
>> Ext2/3 uses 5% by default for root's usage; 8% under FreeBSD for FFS.
>> Solaris (10) uses a bit more nuance for its UFS:
>
> That reservation is to preclude users from exhausting disk space in such a way
> that even "root" cannot log in and solve the problem.

No, the reservation in UFS/FFS is to keep the performance up. It will
be harder and harder to find free space as the disk fills. Is is even
more important for ZFS to be able to find free space as all writes
need free space.

The root-thing is just a side effect.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Miles Nordin
> "mg" == Mike Gerdts  writes:
> "tt" == Toby Thain  writes:
> "tb" == Thomas Burgess  writes:

mg> Yet it is used in ZFS flash archives on Solaris 10 and are
mg> slated for use in the successor to flash archives.

in FLAR, ``if a single bit's flipped, the whole archive and any
incrementals descended from it are toast'' is a desirable
characteristic.  FLAR is a replication tool, just like 'zfs send | zfs
receive' together are, and the sensitivity's desirable because you
want a deterministically exact replica.

If zpools themselves had the characteristic, ``single bit flipped,
entire pool lost,'' few would be happy with that.  In fact it's part
of the ZFS kool-aid cocktail that bitflips are detected and reported,
and that critical metadata is written several times far apart on the
pool so the damage of a single bitflip, even on an unredundant pool,
is limited.  Other things that aren't ``enterprise backup solutions'',
like tarballs, are also resilient to single bit flips.  so, why
tolerate this fragility from a backup format?  It's not appropriate.

The other problems are lack of a verification tool that doesn't
involve ponderous amounts of kernel code revision-bound to hardware
drivers and such that may one day become unobtainable, the possibility
of painting yourself into a corner (ex. if you are backing up a 17TB
filesystem, you must provide 17TB of restore space or else there is no
way to access a 2kB file), and the stream format which is
intentionally tied to the relatively often-changing zfs version so
that it can make exact copies but at the cost of constraining the
restore environment which may be separated from the backup environment
by seven years and/or in the midst of a disaster in a way that
tarballs and cpios and so on don't constrain.

Another problem is that the snv_112 man page says this:

-8<-
 The format of the stream is evolving. No backwards  com-
 patibility is guaranteed. You may not be able to receive
 your streams on future versions of ZFS.
-8<-

I think the new man page says something more generous.  The use case
here is, we should be able to:

   old solaris        new solaris        newer solaris

   zfs send  ->   zfs recv
  \
   \_ zpool upgrade -->
  (not 'zfs upgrade')


 -- zfs send
|
 -> zfs recv


   zfs recv  <- zfs send


That is, the stream format should be totally deterministic given the
zfs version, and not depend on the zpool version nor the
kernel/userland version.  This way a single backup pool can hold the
backups of several different versions of solaris.  The alternative to
this level of guarantee is for your archival backup to involve a ball
of executable code like a LiveCD or a .VDI, and that's just not on.
Seven to ten years, people---there are laws TODAY that require
archiving WORM media that long, and being able to read it, which means
no rewriting for a decade.  That's a ``hard requirement'' as
Christopher says, _today,_ never mind what's desirable or useful.
I'm not sure the new man page makes a promise quite that big as
accommodating the above chart, but IIRC it's much better than the old
one.


anyway, zfs send and receive are very good replication tools, and
replication can be used for backup (by replicating into another zpool
*NOT* by storing the stream modulo the caveat above) or for system
install (by recv'ing multiple times like FLAR).

If you choose to store zfs send streams as backups, the least you can
do is warn those you advise that zfs send streams are different from
other kidns of backup stream, because sysadmin experience in how to
write backups is old and hard-won, and these tools diverge from it.
they diverge usefully---I'm not putting them down---I'm saying *don't
use them that way* unless you are able to think through
$subtleproblems, after which you'll probably not want to use them that
way.

tt> I can see the temptation, but isn't it a bit under-designed? I
tt> think Mr Nordin might have ranted about this in the past...

no, I think it's great for FLAR.  single-bit-self-destruct is exactly
what's wanted for FLAR.

for those cases where you could somehow magically capture and replay an
rsync stream it's great.  It's not a dumb design because IMHO a single
format probably can't do well for both replication and backup, but
it's not a backup format, enterprise or otherwise.  What escalates my
objection into a rant is the way ``enterprise backup solution'' gets
tossed around as some kind of soundbite hatetorial prole phrase
substituting and blocking off exploring the actual use cases which
we've done over and over again on this list.  I 

Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Lassi Tuura
Hi,

> .. it's hard to beat the convenience of a "backup file" format, for
> all sorts of reasons, including media handling, integration with other
> services, and network convenience. 

Yes.

> Consider then, using a zpool-in-a-file as the file format, rather than
> zfs send streams.

This is an interesting suggestion :-)

Did I understand you correctly that once a slice is written, zfs won't rewrite 
it? In other words, I can have an infinitely growing pile of these slices, and 
once zfs fills one file up (or one "raidz2" set of files), I flush it to 
tape/optical disk/whatever, and zfs won't change it "ever after"? When I need 
more space, I just add more slices, but old slices are effectively read-only?

Or did I misunderstand how you meant it work? It sounded very interesting but 
my understanding on zfs is currently limited :-)

And perhaps most importantly: has anyone actually done this for their back-ups 
and has success stories on restore after media failure?

Regards,
Lassi
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Lassi Tuura
Hi,

> Here is the big difference. For a professional backup people still typically
> use tapes although tapes have become expensive.
> 
> I still believe that a set of compressed incremental star archives give you 
> more features.

Thanks for your comments!

I think I roughly understand the feature trade-offs, and have some experience 
with back-up solutions ranging from simple to enterprise.

I guess what I am after is, for data which really matters to its owners and 
which they actually had to recover, did people use tar/pax archives (~ file 
level standard archive format), dump/restore (~ semi-standard format based on 
files/inodes) or zfs send/receive (~ file system block level dump), or 
something else, and why? (Regardless of how these are implemented, hobby 
scripts or enterprise tools, how they dealt with possible media failure issues, 
etc.)

Other than the file system vs. file restore, is there any concern in doing the 
block level thing? "Concerns" as in "mind being in the line of fire if it 
fails?" :-)

Regards,
Lassi
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Boot Disk Configuration

2010-01-18 Thread Mr. T Doodle
I would like some opinions on what people are doing in regards to
configuring ZFS for root/boot drives:

1) If you have onboard RAID controllers are you using them then creating the
ZFS pool (mirrored from hardware)?

2) How many slices? lol

I can't seem to find any best practices such as the old EIS standards for
UFS filesystems.

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Lassi Tuura
Hi,

>> I am considering building a modest sized storage system with zfs. Some
>> of the data on this is quite valuable, some small subset to be backed
>> up "forever", and I am evaluating back-up options with that in mind.
> 
> You don't need to store the "zfs send" data stream on your backup media.
> This would be annoying for the reasons mentioned - some risk of being able
> to restore in future (although that's a pretty small risk) and inability to
> restore with any granularity, i.e. you have to restore the whole FS if you
> restore anything at all.
> 
> A better approach would be "zfs send" and pipe directly to "zfs receive" on
> the external media.  This way, in the future, anything which can read ZFS
> can read the backup media, and you have granularity to restore either the
> whole FS, or individual things inside there.
> 
> Plus, the only way to guarantee the integrity of a "zfs send" data stream is
> to perform a "zfs receive" on that data stream.  So by performing a
> successful receive, you've guaranteed the datastream is not corrupt.  Yet.

Thanks for your feedback!

My plan is to have near-line back-up on zfs on another physically independent 
media, much as you describe. Then another copy off-site on separate system 
(disk), and third copy on WORM-like media in "chunks" somewhere else. I am 
really looking to have the latter two sliced to fit non-disk media (tape, 
optical disks, ...), or external storage not really usable as a filesystem (S3, 
web hosting, ...).

The off-site disk is not in a zfs-capable system, unless I set up a virtual 
machine; but it won't have enough physical disks for raidz2 anyway. I am 
primarily looking for something I can store encrypted in dvd- or tape-sized 
slices on at least two physically separate media. This backup would be used 
when at least two other media have already been lost, so while convenience is a 
plus I really desire reliability and longevity. (Yes I know tapes and DVDs die 
too.)
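
For concreteness, the kind of mechanism I mean (names, sizes and the choice of
gzip/gpg are arbitrary, and the caveats raised elsewhere in this thread about
storing send streams still apply):

# zfs send -R tank@weekly | gzip | gpg -c | split -b 4300m - /stage/tank-weekly.
# (restore) cat /stage/tank-weekly.* | gpg -d | gunzip | zfs recv -Fd <newpool>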

I appreciate the comments by others on zfs send not allowing individual file 
restore, but as I wrote before this is not very important to me, at least not 
as important as my other questions. Apropos, is anyone able to respond to 
those? (Is format documented, independent tools, bugs known to have affected 
send/restore, would you recommend over tar/pax in real life, etc.)

I am very interested in what people have actually done and experienced in real 
life, including disaster recovery from multiple media failure.

Regards,
Lassi
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Jesus Cea
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/18/2010 05:11 PM, David Magda wrote:
> On Jan 18, 2010, at 10:55, Jesus Cea wrote:
> 
>> zpool and zfs report different free space because zfs takes into account
>> an internal reservation of 32MB or 1/64 of the capacity of the pool,
>> whichever is bigger.
>>
>> So in a 2TB Harddisk, the reservation would be 32 gigabytes. Seems a bit
>> excessive to me...
> 
> 1/64 is ~1.5% according to my math.
> 
> Ext2/3 uses 5% by default for root's usage; 8% under FreeBSD for FFS.
> Solaris (10) uses a bit more nuance for its UFS:

That reservation is to preclude users from exhausting disk space in such a way
that even "root" cannot log in and solve the problem.

> 32 GB may seem like a lot (and it can hold a lot of stuff), but it's not
> what it used to be. :)

I agree that it is a lot of space, but only 2% of a "modern disk". My point
is that 32GB is a lot of space to reserve to be able, for instance, to
delete a file when the pool is "full" (thanks to COW). And more so when the
minimum reserved is 32MB and ZFS can get away with it. I think it
would be a good thing to put a cap on the maximum implicit reservation.

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
"Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQCVAwUBS1SLs5lgi5GaxT1NAQJkPgP+NGg1iKbNX3BzHXJjYcFLYpVNA376Ys79
VHDbElKlCAzIo80ZqW1gHQpOumUzUCZaR910+0e+0vpUzL81hHQ9wncS8BBhmXZN
Hp3jA39zzB7JjvQxJ9K/CWxbg3O4Nqi+HTcez3sczyg5dx6k1aSf05MgNPt8jtvJ
VNbuQ1hdy7o=
=qxDK
-END PGP SIGNATURE-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread David Magda

On Jan 18, 2010, at 10:55, Jesus Cea wrote:

zpool and zfs report different free space because zfs takes into account
an internal reservation of 32MB or 1/64 of the capacity of the pool,
whichever is bigger.

So in a 2TB Harddisk, the reservation would be 32 gigabytes. Seems a bit
excessive to me...


1/64 is ~1.5% according to my math.

Ext2/3 uses 5% by default for root's usage; 8% under FreeBSD for FFS.  
Solaris (10) uses a bit more nuance for its UFS:


The default is ((64 Mbytes/partition size) * 100), rounded down to  
the nearest integer and limited between 1% and 10%, inclusively.


32 GB may seem like a lot (and it can hold a lot of stuff), but it's  
not what it used to be. :)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is ZFS internal reservation excessive?

2010-01-18 Thread Jesus Cea
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

zpool and zfs report different free space because zfs takes into account
an internal reservation of 32MB or 1/64 of the capacity of the pool,
whichever is bigger.

So in a 2TB Harddisk, the reservation would be 32 gigabytes. Seems a bit
excessive to me...
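
For example, the difference shows up when comparing (pool name is just an
example; on a simple single-disk pool the gap is essentially this reservation):

# zpool list tank      <- raw pool size and free space
# zfs list tank        <- AVAIL here is smaller, with the 1/64 (min 32MB) held back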

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
"Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Robert Milkowski
Or you might do something like:
http://milek.blogspot.com/2009/12/my-presentation-at-losug.html

However, in your case, if all your clients are running ZFS-only filesystems,
then relying just on zfs send|recv might be a good idea.



--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Robert Milkowski

On 18/01/2010 08:59, Phil Harman wrote:
YMMV. At a recent LOSUG meeting we were told of a case where rsync was 
faster than an incremental zfs send/recv. But I think that was for a 
mail server with many tiny files (i.e. changed blocks are very easy to 
find in files with very few blocks).





After changes around build 114 (IIRC) this shouldn't be the case anymore,
and an incremental zfs send should always be able to at least match the
performance of an incremental rsync. With lots of files where only a small
subset of them changes, incremental zfs send should be much faster, and that
matches my observation.
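
As a rough sketch of the comparison (host and dataset names are made up):

  # incremental rsync: has to walk and stat every file on each run
  rsync -a --delete /tank/data/ backuphost:/backup/data/

  # incremental zfs send: only touches blocks written between the snapshots
  zfs snapshot tank/data@today
  zfs send -i tank/data@yesterday tank/data@today | \
      ssh backuphost zfs recv -F backup/data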


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backing up a ZFS pool

2010-01-18 Thread Daniel Carosone
On Mon, Jan 18, 2010 at 03:24:19AM -0500, Edward Ned Harvey wrote:
> Unless I am mistaken, I believe, the following is not possible:
> 
> On the source, create snapshot "1"
> Send snapshot "1" to destination
> On the source, create snapshot "2"
> Send incremental, from "1" to "2" to the destination.
> On the source, destroy snapshot "1"
> On the destination, destroy snapshot "1"
> 
> I think, since snapshot "2" was derived from "1" you can't destroy "1"
> unless you've already destroyed "2"
> 
> Am I wrong?

As noted already, yes you are.

Indeed, if you specify zfs recv -F, you only need to destroy @1 at the
source.  When you later send -R, snapshots destroyed at the source
will also be destroyed at the receiver.  That's not always what you
want, so be careful, but if it is what you want it's useful.
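
Spelled out as a sketch (dataset names are made up):

  zfs snapshot tank/fs@1
  zfs send tank/fs@1 | zfs recv -F backup/fs
  zfs snapshot tank/fs@2
  zfs send -i @1 tank/fs@2 | zfs recv -F backup/fs
  zfs destroy tank/fs@1     # fine: @2 does not need @1 in order to keep existing
  zfs destroy backup/fs@1   # or leave it for a later send -R | recv -F to prune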

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recordsize...

2010-01-18 Thread Robert Milkowski

On 17/01/2010 20:34, Bob Friesenhahn wrote:

On Mon, 18 Jan 2010, Tristan Ball wrote:

Is there a way to check the recordsize of a given file, assuming that 
the filesystems recordsize was changed at some point?


This would be problematic since a file may consist of different size 
records (at least I think so).  If the record size was changed after 
the file was already created, then new/updated parts would use the new 
record size.


A single file can only have one recordsize (except for a tail block, which
might be shorter). So if you created a large file with the default
recordsize of 128K and later change the filesystem's recordsize to, say, 8K,
that will only affect newly created files - the existing file will keep
using 128K records. However, if you copy the file, the copy will use the new
8K recordsize.
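
A hedged illustration of that behaviour (the dataset name and sizes are
invented):

  zfs create -o recordsize=128k tank/fs
  dd if=/dev/urandom of=/tank/fs/big bs=128k count=512   # written as 128K records
  zfs set recordsize=8k tank/fs                          # only affects new files
  cp /tank/fs/big /tank/fs/big.copy                      # the copy is written as 8K records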


--
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Thomas Burgess
On Mon, Jan 18, 2010 at 3:59 AM, Phil Harman  wrote:

> YMMV. At a recent LOSUG meeting we were told of a case where rsync was
> faster than an incremental zfs send/recv. But I think that was for a mail
> server with many tiny files (i.e. changed blocks are very easy to find in
> files with very few blocks).
>
> However, I don't see why further ZFS performance work couldn't close that
> gap, since rsync will always need to compare directories and timestamps.
>
> Phil
>

The best info I've read on this was on this blog:
http://richardelling.blogspot.com/2009/01/parallel-zfs-sendreceive.html

>
> On 18 Jan 2010, at 08:07, Edward Ned Harvey  wrote:
>
>  I still believe that a set of compressed incremental star archives give
>>> you
>>> more features.
>>>
>>
>> Big difference there is that in order to create an incremental star
>> archive,
>> star has to walk the whole filesystem or folder that's getting backed up,
>> and do a "stat" on every file to see which files have changed since the
>> last
>> backup run.  If you have a large filesystem, that can take a very long
>> time.
>>
>> I recently switched to ZFS specifically for this reason.  Previously, I
>> was
>> doing a nightly rsync on 1Tb of data.  It required 10 hrs every night to
>> run, and copy typically a few hundred megs that had changed that day.
>>  Now,
>> I run incremental zfs send & receive, and it completes typically in a
>> minute
>> or two.
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backing up a ZFS pool

2010-01-18 Thread Ian Collins

Edward Ned Harvey wrote:

Personally, I like to start with a fresh "full" image once a month,
and then do daily incrementals for the rest of the month.

This doesn't buy you anything. ZFS isn't like traditional backups.



If you never send another full, then eventually the delta from the original
to the present will become large.  Not a problem, you're correct, as long as
your destination media is sufficiently large.

Unless I am mistaken, I believe, the following is not possible:

On the source, create snapshot "1"
Send snapshot "1" to destination
On the source, create snapshot "2"
Send incremental, from "1" to "2" to the destination.
On the source, destroy snapshot "1"
On the destination, destroy snapshot "1"

I think, since snapshot "2" was derived from "1" you can't destroy "1"
unless you've already destroyed "2"

Am I wrong?

Yes, you are wrong - what you describe is exactly how I maintain my remote
backups!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recordsize...

2010-01-18 Thread Fajar A. Nugraha
On Mon, Jan 18, 2010 at 10:22 AM, Richard Elling
 wrote:
> On Jan 17, 2010, at 11:59 AM, Tristan Ball wrote:
>> Is there a way to check the recordsize of a given file, assuming that the 
>> filesystems recordsize was changed at some point?
>
> I don't know of an easy way to do this.

Can't you use zdb? Something like

zdb - pool_name/fs_name

then read the dblk values in the output. You'd have to manually look for
that file in the output, though.
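
For what it's worth, a hedged sketch of one way to do the lookup (names are
made up, and the output format differs between builds):

  ls -i /tank/fs/somefile        # the inode number is the ZFS object id
  zdb -dddd tank/fs <object-id>  # the "dblk" field shows the block size in use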

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Phil Harman
YMMV. At a recent LOSUG meeting we were told of a case where rsync was  
faster than an incremental zfs send/recv. But I think that was for a  
mail server with many tiny files (i.e. changed blocks are very easy to  
find in files with very few blocks).


However, I don't see why further ZFS performance work couldn't close
that gap, since rsync will always need to compare directories and
timestamps.


Phil

On 18 Jan 2010, at 08:07, Edward Ned Harvey   
wrote:


I still believe that a set of compressed incremental star archives give
you more features.


Big difference there is that in order to create an incremental star archive,
star has to walk the whole filesystem or folder that's getting backed up,
and do a "stat" on every file to see which files have changed since the last
backup run.  If you have a large filesystem, that can take a very long time.


I recently switched to ZFS specifically for this reason.  Previously, I was
doing a nightly rsync on 1Tb of data.  It required 10 hrs every night to
run, and copy typically a few hundred megs that had changed that day.  Now,
I run incremental zfs send & receive, and it completes typically in a minute
or two.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backing up a ZFS pool

2010-01-18 Thread Gaëtan Lehmann


On 18 Jan 2010, at 09:24, Edward Ned Harvey wrote:


Personally, I like to start with a fresh "full" image once a month,
and then do daily incrementals for the rest of the month.

This doesn't buy you anything. ZFS isn't like traditional backups.


If you never send another full, then eventually the delta from the original
to the present will become large.  Not a problem, you're correct, as long as
your destination media is sufficiently large.

Unless I am mistaken, I believe, the following is not possible:

On the source, create snapshot "1"
Send snapshot "1" to destination
On the source, create snapshot "2"
Send incremental, from "1" to "2" to the destination.
On the source, destroy snapshot "1"
On the destination, destroy snapshot "1"

I think, since snapshot "2" was derived from "1" you can't destroy "1"
unless you've already destroyed "2"



This is definitely possible with zfs. Just try!

Gaëtan

--
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recordsize...

2010-01-18 Thread Phil Harman

Richard Elling wrote:


Tristan Ball wrote:
Also - Am I right in thinking that if a 4K write is made to a  
filesystem block with a recordsize of 8K, then the original block  
is read (assuming it's not in the ARC), before the new block is  
written elsewhere (the "copy", from copy on write)? This would be  
one of the reasons that aligning application IO size and filesystem  
record sizes is a good thing, because where such IO is aligned, you  
remove the need for that original read?


No.  Think of recordsize as a limit.  As long as the recordsize >= 4  
KB, a 4KB file will only use one, 4KB record.

-- richard


I didn't read Tristan's question as referring to a 4KB file.

If a file with an 8KB recordsize already has one or more 8KB records,  
then a single 4KB non-synchronous write to a record not already in the  
ARC will require a read as part of the copy on write operation.


However, I'm assuming that multiple synchronous sequential writes to,
say, an Oracle redo log, which are first committed to the ZIL, will
generally coalesce before the file's records are COW-ed, thus avoiding
reads for all but the last record (assuming it's not aligned or cached).
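
The practical upshot of the alignment point, as a hedged sketch (dataset
name and block size are made up): if the application writes in 8K units,
matching the recordsize means each aligned write replaces a whole record,
so no prior read is needed.

  zfs create -o recordsize=8k tank/db
  zfs get recordsize tank/db    # verify the records now match the 8K writes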


But I'm always open to having my assumptions verified :)

Phil 
 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backing up a ZFS pool

2010-01-18 Thread Edward Ned Harvey
> > Personally, I like to start with a fresh "full" image once a month,
> and then do daily incrementals for the rest of the month.
> 
> This doesn't buy you anything. ZFS isn't like traditional backups.

If you never send another full, then eventually the delta from the original
to the present will become large.  Not a problem, you're correct, as long as
your destination media is sufficiently large.

Unless I am mistaken, I believe, the following is not possible:

On the source, create snapshot "1"
Send snapshot "1" to destination
On the source, create snapshot "2"
Send incremental, from "1" to "2" to the destination.
On the source, destroy snapshot "1"
On the destination, destroy snapshot "1"

I think, since snapshot "2" was derived from "1" you can't destroy "1"
unless you've already destroyed "2"

Am I wrong?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Edward Ned Harvey
> Consider then, using a zpool-in-a-file as the file format, rather than
> zfs send streams.

That's a pretty cool idea.  You've still got the entire ZFS volume inside a
file, but you can still mount it and extract individual files if you want,
and you can pipe your zfs send directly to zfs receive.  Basically you get
all the benefits of zfs send & receive, plus the benefit of being able to
treat your backup like a file or a set of files.

Pretty cool.
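
A rough sketch of the zpool-in-a-file idea (paths, sizes and names are all
invented):

  mkfile 100g /backup/tank.img
  zpool create filepool /backup/tank.img
  zfs send tank/data@snap | zfs recv filepool/data
  zpool export filepool    # the backing file can now be copied around like any other file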

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Edward Ned Harvey
> I still believe that a set of compressed incremental star archives give
> you
> more features.

Big difference there is that in order to create an incremental star archive,
star has to walk the whole filesystem or folder that's getting backed up,
and do a "stat" on every file to see which files have changed since the last
backup run.  If you have a large filesystem, that can take a very long time.

I recently switched to ZFS specifically for this reason.  Previously, I was
doing a nightly rsync on 1Tb of data.  It required 10 hrs every night to
run, and copy typically a few hundred megs that had changed that day.  Now,
I run incremental zfs send & receive, and it completes typically in a minute
or two.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss