Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-06 Thread Carsten Aulbert
Hi,

Brent Jones wrote:
> 
> Using mbuffer can speed it up dramatically, but this seems like a hack
> without addressing a real problem with zfs send/recv.
> Trying to send any meaningful sized snapshots from say an X4540 takes
> up to 24 hours, for as little as 300GB changerate.

I haven't found a solution yet either. It seems to depend highly on
the distribution of file sizes, the number of files per directory, or
similar factors. The last tests I made still showed more than 50 hours for 700
GB and ~45 hours for 5 TB (both were null tests where zfs send
wrote to /dev/null).
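
For reference, the null tests were simply timed sends into /dev/null, along
the lines of the following (the dataset and snapshot names are placeholders):

  time zfs send somepool/somefs@snap > /dev/null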

Cheers from a still puzzled Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Neil Perrin


On 01/06/09 21:25, Nicholas Lee wrote:
> Since zfs is so smart in other areas, is there a particular reason why a 
> high water mark is not calculated and the available space not reset to it?
> 
> I'd far rather have a zpool of 1000GB that said it only had 900GB but 
> did not have corruption as it ran out of space.
> 
> Nicholas

Is there any evidence of corruption at high capacity or just
a lack of performance? All file systems will slow down when
near capacity, as they struggle to find space and then have to
spread writes over the disk. Our priorities are integrity first,
followed somewhere by performance.

I vaguely remember a time when UFS reserved a percentage of space (minfree)
to prevent ordinary users from consuming past a certain limit, allowing
only the super-user to use it. Not that I'm advocating that
approach for ZFS.
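
For what it's worth, a self-imposed high water mark can be approximated today
with existing properties; a rough sketch (pool and dataset names are only
examples):

  # keep ~10% of a 1000GB pool out of everyone else's reach
  zfs create -o reservation=100G -o mountpoint=none tank/headroom
  # or cap the dataset being written to directly
  zfs set quota=900G tank/data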

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread JZ
BTW, the high water mark method is not perfect; here is some background from 
Novell's support of water marks...
best,
z

http://www.novell.com/coolsolutions/tools/16991.html

Based on my own belief that there had to be a "better way" and the number of 
issues I'd seen reported in the Support Forums, I spent a lot of time 
researching how different memory settings affect the memory management and 
stability of the server. Based on that research I've made memory tuning 
recommendations to a large number of forum posters who were having memory 
tuning issues, and most of them have found their servers to be significantly 
more stable since applying the changes I recommended.

What follows are the formulas I developed for recommending memory tuning 
changes to a server. The formulas take a number of the values available from 
SEG.NLM (available from: http://www.novell.com/coolsolutions/tools/14445.html). 
To get the required values, load SEG.NLM, then from the main screen do '/', 
then 'Info', then 'Write SEGSTATS.TXT'. The SEGSTATS.TXT file will be created 
in SYS:SYSTEM.

SEG monitors the server and records a number of key memory statistics; my 
formulae take those statistics and recommend manual memory tuning parameters.

Note that as these are manual settings, auto tuning is disabled, and if the 
memory usage of the server changes significantly, then the server will need to 
be retuned to reflect the change in memory usage.

Also, after making the changes to use manual rather than auto tuning, the 
server may still recommend that the FCMS and "-u" memory settings be changed. 
These recommendations can be ignored. Following them will have the same effect 
as auto tuning, except you're doing it rather than the server doing it 
automatically - the same problems will still occur.

  - Original Message - 
  From: Tim 
  To: Nicholas Lee 
  Cc: zfs-discuss@opensolaris.org ; Sam 
  Sent: Wednesday, January 07, 2009 12:02 AM
  Subject: Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05





  On Tue, Jan 6, 2009 at 10:25 PM, Nicholas Lee  wrote:

Since zfs is so smart in other areas, is there a particular reason why a 
high water mark is not calculated and the available space not reset to it?


I'd far rather have a zpool of 1000GB that said it only had 900GB but did 
not have corruption as it ran out of space.


Nicholas


  WHAT??!?  Put artificial limits in place to prevent users from killing 
themselves?  How did that go Jeff?

  "I suggest that you retire to the safety of the rubber room while the rest of 
us enjoy these zfs features. By the same measures, you would advocate that 
people should never be allowed to go outside due to the wide open spaces.  
Perhaps people will wander outside their homes and forget how to make it back.  
Or perhaps there will be gravity failure and some of the people outside will be 
lost in space."

  It's NEVER a good idea to put a default limitation in place to protect a 
*regular user*.  If they can't RTFM from front cover to back they don't deserve 
to use a computer.

  --Tim



--


  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Tim
On Tue, Jan 6, 2009 at 10:25 PM, Nicholas Lee  wrote:

> Since zfs is so smart in other areas, is there a particular reason why a
> high water mark is not calculated and the available space not reset to it?
> I'd far rather have a zpool of 1000GB that said it only had 900GB but did
> not have corruption as it ran out of space.
>
> Nicholas
>


WHAT??!?  Put artificial limits in place to prevent users from killing
themselves?  How did that go Jeff?

"I suggest that you retire to the safety of the rubber room while the rest
of us enjoy these zfs features. By the same measures, you would advocate
that people should never be allowed to go outside due to the wide open
spaces.  Perhaps people will wander outside their homes and forget how to
make it back.  Or perhaps there will be gravity failure and some of the
people outside will be lost in space."

It's NEVER a good idea to put a default limitation in place to protect a
*regular user*.  If they can't RTFM from front cover to back they don't
deserve to use a computer.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Nicholas Lee
Since zfs is so smart in other areas, is there a particular reason why a high
water mark is not calculated and the available space not reset to it?
I'd far rather have a zpool of 1000GB that said it only had 900GB but did
not have corruption as it ran out of space.

Nicholas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Tim
On Tue, Jan 6, 2009 at 6:19 PM, Sam  wrote:

> I was hoping that this was the problem (because just buying more discs is
> the cheapest solution given time=$$) but running it by somebody at work they
> said going over 90% can cause decreased performance but is unlikely to cause
> the strange errors I'm seeing.  However, I think I'll stick a 1TB drive in
> as a new volume and pull some data onto it to bring the zpool down to <75%
> capacity and see if that helps though anyway.  Probably update the OS to
> 2008.11 as well.
> --
>


Uhh, I would never accept that one as a solution.  90% full or not a READ
should never, ever, ever corrupt a pool.  Heck, a write shouldn't either.  I
could see the system falling over and puking on itself performance wise, but
corruption?  No way.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread David Magda
On Jan 6, 2009, at 14:21, Rob wrote:

> Obviously ZFS is ideal for large databases served out via  
> application level or web servers. But what other practical ways are  
> there to integrate the use of ZFS into existing setups to experience  
> its benefits?

Remember that ZFS is made up of the ZPL and the DMU (amongst other  
things). The ZPL is the POSIX compatibility layer that most of us use.  
The DMU is the actual transactional object model that stores the  
actual data objects (e.g. files).

It would technically be possible for (say) MySQL to create a database  
engine on top of that transactional store. I believe that the Lustre  
people are using the DMU for their future data store back end. The DMU  
runs in userland so anyone can use it for any object store system.

People keep talking about ZFS in the context of replacing UFS/FFS,  
ext3, WAFL, etc., but few are utilizing (or realize the availability  
of) the transactional store.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Carson Gaspar
On 1/6/2009 4:19 PM, Sam wrote:
> I was hoping that this was the problem (because just buying more
> discs is the cheapest solution given time=$$) but running it by
> somebody at work they said going over 90% can cause decreased
> performance but is unlikely to cause the strange errors I'm seeing.
> However, I think I'll stick a 1TB drive in as a new volume and pull
> some data onto it to bring the zpool down to<75% capacity and see if
> that helps though anyway.  Probably update the OS to 2008.11 as
> well.

Pool corruption is _always_ a bug. It may be ZFS or your block devices,
but something is broken.

-- 
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Sam
I was hoping that this was the problem (because just buying more discs is the 
cheapest solution given time=$$) but running it by somebody at work they said 
going over 90% can cause decreased performance but is unlikely to cause the 
strange errors I'm seeing.  However, I think I'll stick a 1TB drive in as a new 
volume and pull some data onto it to bring the zpool down to <75% capacity and 
see if that helps though anyway.  Probably update the OS to 2008.11 as well.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Orvar Korvar
It is not recommended to fill any file system beyond 90%, I think. For 
instance, NTFS can behave very badly when it runs out of space. It is similar to 
filling up your RAM when you have no swap space: the computer starts to 
thrash badly. Avoid 90% and above, and you have eliminated a 
possible source of problems.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-06 Thread Brent Jones
On Sat, Dec 6, 2008 at 11:40 AM, Ian Collins  wrote:
> Richard Elling wrote:
>> Ian Collins wrote:
>>> Ian Collins wrote:
 Andrew Gabriel wrote:
> Ian Collins wrote:
>> I've just finished a small application to couple zfs_send and
>> zfs_receive through a socket to remove ssh from the equation and the
>> speed up is better than 2x.  I have a small (140K) buffer on the
>> sending
>> side to ensure the minimum number of sent packets
>>
>> The times I get for 3.1GB of data (b101 ISO and some smaller
>> files) to a
>> modest mirror at the receive end are:
>>
>> 1m36s for cp over NFS,
>> 2m48s for zfs send though ssh and
>> 1m14s through a socket.
>>
> So the best speed is equivalent to 42MB/s.
> It would be interesting to try putting a buffer (5 x 42MB = 210MB
> initial stab) at the recv side and see if you get any improvement.
>
>>> It took a while...
>>>
>>> I was able to get about 47MB/s with a 256MB circular input buffer. I
>>> think that's about as fast it can go, the buffer fills so receive
>>> processing is the bottleneck.  Bonnie++ shows the pool (a mirror) block
>>> write speed is 58MB/s.
>>>
>>> When I reverse the transfer to the faster box, the rate drops to 35MB/s
>>> with neither the send nor receive buffer filling.  So send processing
>>> appears to be the limit in this case.
>> Those rates are what I would expect writing to a single disk.
>> How is the pool configured?
>>
> The "slow" system has a single mirror pool of two SATA drives, the
> faster one a stripe of 4 mirrors and an IDE SD boot drive.
>
> ZFS send though ssh from the slow to the fast box takes 189 seconds, the
> direct socket connection send takes 82 seconds.
>
> --
> Ian.
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>

Reviving an old discussion, but has the core issue been addressed in
regards to zfs send/recv performance issues? I'm not able to find any
new bug reports on bugs.opensolaris.org related to this, but my search
kung-fu may be weak.

Using mbuffer can speed it up dramatically, but this seems like a hack
without addressing a real problem with zfs send/recv.
Trying to send any meaningful sized snapshots from say an X4540 takes
up to 24 hours, for as little as 300GB changerate.
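
For reference, the mbuffer workaround is roughly the following (host name,
port, and buffer sizes are placeholders):

  # on the receiving host: listen, buffer, and feed zfs recv
  mbuffer -s 128k -m 1G -I 9090 | zfs recv -d backuppool
  # on the sending host: stream the snapshot into mbuffer instead of ssh
  zfs send pool/fs@snap | mbuffer -s 128k -m 1G -O backuphost:9090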



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread JZ
Ok, folks, new news -  [feel free to comment in any fashion, since I don't 
know how yet.]


EMC ACQUIRES OPEN-SOURCE ASSETS FROM SOURCELABS 
http://go.techtarget.com/r/5490612/6109175






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Tim
On Tue, Jan 6, 2009 at 2:58 PM, Rob  wrote:

> Wow. I will read further into this. That seems like it could have great
> applications. I assume the same is true of FCoE?
> --
>

Yes, iSCSI, FC, and FCoE all present a LUN to Windows.  For the layman, from
the Windows system the disk will look identical to a SCSI disk plugged
directly into the motherboard.  That's not entirely accurate, but close
enough for you to get an idea.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 04:10:10PM -0500, JZ wrote:
> Hello Darren,
> This one, ok, was a valid thought/question --

Darn, I was hoping...

> On Solaris, root pools cannot have EFI labels (the boot firmware doesn't  
> support booting from them).
> http://blog.yucas.info/2008/11/26/zfs-boot-solaris/

Yup.  If that were to change, it would make this much simpler.

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Alex Viskovatoff
Hi Cindy,

I now suspect that the boot blocks are located outside of the space in 
partition 0 that actually belongs to the zpool, in which case it is not 
necessarily a bug that zpool attach does not write those blocks, IMO. Indeed, 
that must be the case, since GRUB needs to get to stage2 in order to be able to 
read zfs file systems. I'm just glad zpool attach warned me that I need to 
invoke installgrub manually!

Thank you for making things less mysterious.

Alex
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 01:24:17PM -0800, Alex Viskovatoff wrote:

> a...@diotiima:~# installgrub -m /boot/grub/stage1 /boot/grub/stage2 
> /dev/rdsk/c4t0d0s0
> Updating master boot sector destroys existing boot managers (if any).
> continue (y/n)?y
> stage1 written to partition 0 sector 0 (abs 16065)
> stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
> stage1 written to master boot sector
> a...@diotima:~# 

> So installgrub writes to partition 0. How does one know that those
> sectors have not already been used by zfs, in its mirroring of the
> first drive by this second drive?

Because this is a VTOC label partition (necessary for Solaris boot), I
believe ZFS lives only within slice 0 (I need to verify this).  So VTOC
cylinder 0 is free.

> And why is writing to partition 0 even necessary? Since c3t0d0 must
> contain stage1 and stage2 in its partition 0, wouldn't c4t0d0 already
> have stage1 and stage 2 in its partition 0 through the silvering
> process?

I doubt that all of partition 0 is mirrored.  Only the data under ZFS
control (not the boot blocks) is copied.

Do not confuse the MBR partition 0 (the "Solaris partition") with the
VTOC slice 0 (the one that has all the disk cylinders other than
cylinder 0 in it and that appeared in your earlier "label" output).

> I don't find the present disk format/label/partitioning experience
> particularly unpleasant (except for installgrub writing directly into
> a partition which belongs to a zpool). I just wish I understood what
> it involves.

Partition 0 contains all of Solaris.  So the OS just needs to keep
things straight.  It does this with the VTOC slicing within that
partition.

> Thank you for that link to the System Administration Guide. I just
> looked at it again, and it says partition 8 "Contains GRUB boot
> information". So partition 8 is the master boot sector and contains
> GRUB stage1?

It should probably refer to "slice 8" to reduce confusion.  Boot loaders
(GRUB since that's what is in use here) are simultaneously in partition
0 and slice 8.
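
If it helps to see the two layers separately, something like this shows them
(the device names are just examples):

  fdisk -W - /dev/rdsk/c4t0d0p0   # MBR view: the Solaris fdisk partition
  prtvtoc /dev/rdsk/c4t0d0s2      # VTOC view: the slices inside that partition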

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Alex Viskovatoff
Thanks for clearing that up. That all makes sense.

I was wondering why ZFS doesn't use the whole disk in the standard OpenSolaris 
install. That explains it.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-06 Thread Sam
I've run into this problem twice now, before I had 10x500GB drives in a ZFS+ 
setup and now again in a 12x500GB ZFS+ setup.

The problem is when the pool reaches ~85% capacity I get random read failures 
and around ~90% capacity I get read failures AND zpool corruption.  For example:

-I open a directory that I know for a fact has files and folders in it but it 
either shows 0 items or hangs on a directory listing
-I try to copy a file from the zpool volume to another volume and it hangs then 
fails

In both these situations, if I do a 'zpool status' after the fact it claims that 
the volume has experienced an unrecoverable error and I should find the faulty 
drive and replace it, blah blah.  If I do a 'zpool scrub' it eventually reports 
0 faults or errors; also, if I restart the machine it will usually work just fine 
again (i.e. I can read the directory and copy files again).

Is this a systemic problem at 90% capacity, or do I perhaps have a faulty drive 
in the array that only gets hit at 90%?  If it is a faulty drive, why does 
'zpool status' report completely good health?  That makes it hard to find the 
problem drive.
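
Is something like the following the right way to look for a marginal drive?
(Pool and device names here are just examples.)

  zpool scrub tank
  zpool status -v tank           # per-device READ/WRITE/CKSUM error counters
  iostat -En | grep -i error     # driver-level soft/hard/transport error counts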

Thanks,
Sam
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] POSIX permission bits, ACEs, and inheritance confusion

2009-01-06 Thread Nicolas Williams
On Tue, Jan 06, 2009 at 01:27:41PM -0800, Peter Skovgaard Nielsen wrote:
> > ls -V file
> ----------+  1 root     root           0 Jan  6 22:15 file
>              user:root:rwxpdDaARWcCos:------:allow
>              everyone@:--------------:------:allow
> 
> Not bad at all. However, I contend that this shouldn't be necessary -
> and I don't understand why the inclusion of just one "POSIX ACE"
> (empty to boot) makes things work as expected.

Because, IIRC, NFSv4 ACLs have an ambiguity as to what happens if no
ACE matches the subject that's trying to access the resource.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] POSIX permission bits, ACEs, and inheritance confusion

2009-01-06 Thread Mark Shellenbaum
  >> ls -V file
> -rw-r--r--+  1 root     root           0 Jan  6 21:42 d
>              user:root:rwxpdDaARWcCos:------:allow
>             owner@:--x-----------:------:deny
>             owner@:rw-p---A-W-Co-:------:allow
>             group@:-wxp----------:------:deny
>             group@:r-------------:------:allow
>          everyone@:-wxp---A-W-Co-:------:deny
>          everyone@:r-----a-R-c--s:------:allow
> 
> Can anyone explain to me what just happened? Why are owner/group/everyone 
> ACEs (corresponding to old fashioned POSIX permission bits) created and even 
> more strange, why are deny entries created? Is there something mandating the 
> creation of these ACEs? I can understand that umask might affect this, but I 
> wouldn't have thought that it would be causing ACEs to appear out of the blue.
> 
> While writing this, I stumbled into this thread: http://tinyurl.com/7ofxfj. 
> Ok, so it seems that this is "intended" behavior to comply with POSIX. As the 
> author of the thread mentioned, I would like to see an inheritance mode that 
> completely ignores POSIX. The thread ends with Mark Shellenbaum commenting 
> that he will fasttrack "the behavior that many people want". It is not clear 
> to me what exactly he means by this.
> 
> Then I found http://docs.sun.com/app/docs/doc/819-5461/ftyxi?l=zh&a=view and 
> much to my confusion, the deny ACEs aren't created in example 8-10. How could 
> this be? Following some playing around, I came to the conclusion that as long 
> as at least one ACE corresponding to owner/group/everyone exists, the extra 
> ACEs aren't created:
> 

The requested mode from an application is only ignored if the directory 
has inheritable ACEs that would affect the mode when aclinherit is set 
to passthrough.  Otherwise the mode is honored and you get the 
owner@, group@, and everyone@ ACEs.


The way a file create works is something like this:

1. Build up the ACL based on inherited ACEs from the parent.  This ACL will
often be "empty" when no inheritable ACEs exist.

2. Next, the chmod algorithm is applied to the ACL in order to make it
reflect the requested mode.  This step is bypassed if the ACL created
in step 1 had any ACEs that affect the mode and the aclinherit property
is set to passthrough.

>> mkdir test
>> chmod A=user:root:rwxpdDaARWcCos:fd:allow,everyone@::fd:allow test
>> ls -dV test
> d---------+  3 root     root          15 Jan  6 22:11 test
>              user:root:rwxpdDaARWcCos:fd----:allow
>              everyone@:--------------:fd----:allow
> 
>> cd test
>> touch file
>> ls -V file
> ----------+  1 root     root           0 Jan  6 22:15 file
>              user:root:rwxpdDaARWcCos:------:allow
>              everyone@:--------------:------:allow
> 
> Not bad at all. However, I contend that this shouldn't be necessary - and I 
> don't understand why the inclusion of just one "POSIX ACE" (empty to boot) 
> makes things work as expected.
> 
> /Peter

We don't have a mode that says to completely disregard POSIX.  We try to 
honor the application's mode request except in situations where the 
inherited ACEs would conflict with the requested mode from the application.


-Mark

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Cindy . Swearingen
Hi Alex,

The fact that you have to install the boot blocks manually on the
second disk that you added with zpool attach is a bug! I should have
mentioned this bug previously.

If you had used the initial installation method to create a mirrored
root pool, the boot blocks would have been applied automatically.

I don't think a way exists to discern whether the boot blocks are
already applied. I can't comment on why resilvering can't do this step.
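
For the archives, the manual sequence for mirroring an existing root pool
disk is roughly:

  zpool attach rpool c3t0d0s0 c4t0d0s0
  zpool status rpool     # wait until the resilver completes
  installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t0d0s0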

Cindy

Alex Viskovatoff wrote:
> Cindy,
> 
> Well, it worked. The system can boot off c4t0d0s0 now.
> 
> But I am still a bit perplexed. Here is how the invocation of installgrub 
> went:
> 
> a...@diotiima:~# installgrub -m /boot/grub/stage1 /boot/grub/stage2 
> /dev/rdsk/c4t0d0s0
> Updating master boot sector destroys existing boot managers (if any).
> continue (y/n)?y
> stage1 written to partition 0 sector 0 (abs 16065)
> stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
> stage1 written to master boot sector
> a...@diotima:~# 
> 
> So installgrub writes to partition 0. How does one know that those sectors 
> have not already been used by zfs, in its mirroring of the first drive by 
> this second drive? And why is writing to partition 0 even necessary? Since 
> c3t0d0 must contain stage1 and stage2 in its partition 0, wouldn't c4t0d0 
> already have stage1 and stage 2 in its partition 0 through the silvering 
> process?
> 
> I don't find the present disk format/label/partitioning experience 
> particularly unpleasant (except for installgrub writing directly into a 
> partition which belongs to a zpool). I just wish I understood what it 
> involves.
> 
> Thank you for that link to the System Administration Guide. I just looked at 
> it again, and it says partition 8 "Contains GRUB boot information". So 
> partition 8 is the master boot sector and contains GRUB stage1?
> 
> Alex
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] POSIX permission bits, ACEs, and inheritance confusion

2009-01-06 Thread Peter Skovgaard Nielsen
I am running a test system with Solaris 10u6 and I am somewhat confused as to 
how ACE inheritance works. I've read through 
http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf but it doesn't seem 
to cover what I am experiencing.

The ZFS file system that I am working on has both aclmode and aclinherit set to 
passthrough, which I thought would result in the ACEs being just that - passed 
through without modification.
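
For reference, the properties were set along these lines (the dataset name is
a placeholder):

> zfs set aclmode=passthrough tank/test
> zfs set aclinherit=passthrough tank/test
> zfs get aclmode,aclinherit tank/test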

In my test scenario, I am creating a folder, removing all ACEs and adding a 
single full access allow ACE with file and directory inheritance for one user:

> mkdir test
> chmod A=user:root:rwxpdDaARWcCos:fd:allow test

Permission check:
> ls -dV test
d---------+  2 root     root           2 Jan  6 21:17 test
             user:root:rwxpdDaARWcCos:fd----:allow

Ok, that seems to be as I intended. Now I cd into the folder and create a file:

> cd test
> touch file

Permission check:

> ls -V file
-rw-r--r--+  1 root     root           0 Jan  6 21:42 d
             user:root:rwxpdDaARWcCos:------:allow
            owner@:--x-----------:------:deny
            owner@:rw-p---A-W-Co-:------:allow
            group@:-wxp----------:------:deny
            group@:r-------------:------:allow
         everyone@:-wxp---A-W-Co-:------:deny
         everyone@:r-----a-R-c--s:------:allow

Can anyone explain to me what just happened? Why are owner/group/everyone ACEs 
(corresponding to old fashioned POSIX permission bits) created and even more 
strange, why are deny entries created? Is there something mandating the 
creation of these ACEs? I can understand that umask might affect this, but I 
wouldn't have thought that it would be causing ACEs to appear out of the blue.

While writing this, I stumbled into this thread: http://tinyurl.com/7ofxfj. Ok, 
so it seems that this is "intended" behavior to comply with POSIX. As the 
author of the thread mentioned, I would like to see an inheritance mode that 
completely ignores POSIX. The thread ends with Mark Shellenbaum commenting that 
he will fasttrack "the behavior that many people want". It is not clear to me 
what exactly he means by this.

Then I found http://docs.sun.com/app/docs/doc/819-5461/ftyxi?l=zh&a=view and 
much to my confusion, the deny ACEs aren't created in example 8-10. How could 
this be? Following some playing around, I came to the conclusion that as long 
as at least one ACE corresponding to owner/group/everyone exists, the extra 
ACEs aren't created:

> mkdir test
> chmod A=user:root:rwxpdDaARWcCos:fd:allow,everyone@::fd:allow test
> ls -dV test
d---------+  3 root     root          15 Jan  6 22:11 test
             user:root:rwxpdDaARWcCos:fd----:allow
             everyone@:--------------:fd----:allow

> cd test
> touch file
> ls -V file
----------+  1 root     root           0 Jan  6 22:15 file
             user:root:rwxpdDaARWcCos:------:allow
             everyone@:--------------:------:allow

Not bad at all. However, I contend that this shouldn't be necessary - and I 
don't understand why the inclusion of just one "POSIX ACE" (empty to boot) 
makes things work as expected.

/Peter
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Alex Viskovatoff
Cindy,

Well, it worked. The system can boot off c4t0d0s0 now.

But I am still a bit perplexed. Here is how the invocation of installgrub went:

a...@diotiima:~# installgrub -m /boot/grub/stage1 /boot/grub/stage2 
/dev/rdsk/c4t0d0s0
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
stage1 written to master boot sector
a...@diotima:~# 

So installgrub writes to partition 0. How does one know that those sectors have 
not already been used by zfs, in its mirroring of the first drive by this 
second drive? And why is writing to partition 0 even necessary? Since c3t0d0 
must contain stage1 and stage2 in its partition 0, wouldn't c4t0d0 already have 
stage1 and stage 2 in its partition 0 through the silvering process?

I don't find the present disk format/label/partitioning experience particularly 
unpleasant (except for installgrub writing directly into a partition which 
belongs to a zpool). I just wish I understood what it involves.

Thank you for that link to the System Administration Guide. I just looked at it 
again, and it says partition 8 "Contains GRUB boot information". So partition 8 
is the master boot sector and contains GRUB stage1?

Alex
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread JZ
Hello Darren,
This one, ok, was a valid thought/question --

On Solaris, root pools cannot have EFI labels (the boot firmware doesn't 
support booting from them).
http://blog.yucas.info/2008/11/26/zfs-boot-solaris/

But again, this is a ZFS discussion, and obviously EFI is not a ZFS, or even 
a Sun, thing.
http://en.wikipedia.org/wiki/Extensible_Firmware_Interface

Hence, on ZFS turf, I would offer the following comment in the spirit of 
innovation, not to trash EFI.
The ZFS design point is to "make IT simple", so EFI or no EFI is debatable.
http://kerneltrap.org/node/6884

;-)
best,
z

- Original Message - 
From: "A Darren Dunham" 
To: 
Sent: Tuesday, January 06, 2009 3:38 PM
Subject: Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme


> On Tue, Jan 06, 2009 at 11:49:27AM -0700, cindy.swearin...@sun.com wrote:
>> My wish for this year is to boot from EFI-labeled disks so examining
>> disk labels is mostly unnecessary because ZFS pool components could be
>> constructed as whole disks, and the unpleasant disk
>> format/label/partitioning experience is just a dim memory...
>
> Is there any non-EFI hardware that would support EFI boot?
>
> -- 
> Darren
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
Wow. I will read further into this. That seems like it could have great 
applications. I assume the same is true of FCoE?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
> Yes, iozone does support threading.  Here is a test with a record size of
> 8KB, eight threads, synchronous writes, and a 2GB test file:
>
>Multi_buffer. Work area 16777216 bytes
>OPS Mode. Output is in operations per second.
>Record Size 8 KB
>SYNC Mode.
>File size set to 2097152 KB
>Command line used: iozone -m -t 8 -T -O -r 8k -o -s 2G
>Time Resolution = 0.01 seconds.
>Processor cache size set to 1024 Kbytes.
>Processor cache line size set to 32 bytes.
>File stride size set to 17 * record size.
>Throughput test with 8 threads
>Each thread writes a 2097152 Kbyte file in 8 Kbyte records
>
> When testing with iozone, you will want to make sure that the test file is
> larger than available RAM, such as 2X the size.
>
> Bob


OK, I ran it as suggested (using a 17GB file pre-generated from
urandom) and I'm getting what appear to be sane iozone results now.
Do we have a place to compare performance notes?
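
(The test file was pre-generated with something along these lines; the output
path is a placeholder:)

  # ~17GB of incompressible data; slow to create, but defeats caching/compression
  dd if=/dev/urandom of=/tank/testfile bs=1024k count=17408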

thx
jake
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread JZ
[ok, no one replying, my spam then...]

Open folks just care about SMART so far.
http://www.mail-archive.com/linux-s...@vger.kernel.org/msg07346.html

Enterprise folks care more about spin-down.
(not an open thing yet, unless new practical industry standard is here that 
I don't know. yeah right.)

best,
z

- Original Message - 
From: "Anton B. Rang" 
To: 
Sent: Tuesday, January 06, 2009 9:07 AM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> For SCSI disks (including FC), you would use the FUA bit on the read 
> command.
>
> For SATA disks ... does anyone care?  ;-)
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 11:49:27AM -0700, cindy.swearin...@sun.com wrote:
> My wish for this year is to boot from EFI-labeled disks so examining
> disk labels is mostly unnecessary because ZFS pool components could be
> constructed as whole disks, and the unpleasant disk
> format/label/partitioning experience is just a dim memory...

Is there any non-EFI hardware that would support EFI boot?

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs create performance degrades dramatically with increasing number of file systems

2009-01-06 Thread Alastair Neil
On Mon, Jan 5, 2009 at 5:27 AM, Roch  wrote:
> Alastair Neil writes:
>  > I am attempting to create approx 10600 zfs file systems across two
>  > pools.  The devices underlying the pools are mirrored iscsi volumes
>  > shared over a dedicated gigabit Ethernet with jumbo frames enabled
>  > (MTU 9000) from a Linux Openfiler 2.3 system. I have added a couple of
>  > 4GByte  zvols from the root pool to use as zil devices for the two
>  > iscsi backed pools.
>  >
>  > The solaris system is running snv_101b, has 8 Gbyte RAM and dual 64
>  > bit Xeon processors.
>  >
>  > Initially I was able to create zfs file systems at a rate of around 1
>  > every 6 seconds (wall clock time),
>  > now three days later I have created
>  > 9000 zfs file systems and the creation rate has dropped to approx  1
>  > per minute, an order of magnitude slower.
>  >
>  > I attempted an experiment on a system with half the memory and using
>  > looped back zvols and saw similar performance.
>  >
>  > I find it hard to believe that such a performance degradation is
>  > expected.  Is there some parameters I should be tuning for using large
>  > numbers of file systems?
>  >
>  > Regards, Alastair
>  > ___
>  > zfs-discuss mailing list
>  > zfs-discuss@opensolaris.org
>  > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
> Sounds like
>
> 6763592 : creating zfs filesystems gets slower as the number of zfs 
> filesystems increase
> 6572357 : libzfs should do more to avoid mnttab lookups
>
> Which  just integrated in SNV 105.
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6763592
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6572357
>
> -r
>
>

Roch

thanks for the information. I replied to you directly by accident,
sorry about that.  I am curious what the update process for
OpenSolaris is.  I installed my machines almost the instant 2008.11 was
available, and yet so far there have been no released updates - unless
I am doing something wrong.  Will updates from the SNV releases be
periodically rolled into 2008.11?  Am I out of luck with 2008.11 and
have to wait for 2009.04 for these fixes?  I guess I could wait and
upgrade to a Developer Edition release?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Volker A. Brandt
> http://docs.sun.com/app/docs/doc/817-5093/disksconcepts-20068?a=view
>
> (To add more confusion, partitions are also referred to as slices.)

Nope, at least not on x86 systems.  A partition holds the Solaris part
of the disk, and that part is subdivided into slices.  Partitions
are visible to other OSes on the box; slices aren't.  Wherever the
wrong term appears in Sun docs, it should be treated as a doc bug.

For Sparc systems, some people intermix the two terms, but it's not
really correct there either.


Regards -- Volker
-- 

Volker A. Brandt  Consulting and Support for Sun Solaris
Brandt & Brandt Computer GmbH   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim Email: v...@bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 45
Geschäftsführer: Rainer J. H. Brandt und Volker A. Brandt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 10:22:20AM -0800, Alex Viskovatoff wrote:
> I did an install of OpenSolaris in which I specified that the whole disk 
> should be used for the installation. Here is what "format> verify" produces 
> for that disk:
> 
> Part      Tag    Flag     Cylinders         Size            Blocks
>   0       root    wm       1 - 60797      465.73GB    (60797/0/0) 976703805
>   1 unassigned    wm       0                0         (0/0/0)             0
>   2     backup    wu       0 - 60797      465.74GB    (60798/0/0) 976719870
>   3 unassigned    wm       0                0         (0/0/0)             0
>   4 unassigned    wm       0                0         (0/0/0)             0
>   5 unassigned    wm       0                0         (0/0/0)             0
>   6 unassigned    wm       0                0         (0/0/0)             0
>   7 unassigned    wm       0                0         (0/0/0)             0
>   8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
>   9 unassigned    wm       0                0         (0/0/0)             0
> 
> I have several questions. First, what is the purpose of partitions 2 and 8 
> here? Why not simply have partition 0, the "root" partition, be the only 
> partition, and start at cylinder 0 as opposed to 1?

It's traditional in the VTOC label to have slice 2 encompass all
cylinders.  You don't have to use it.

On SPARC, the boot blocks fit into the 15 "free" blocks before the
filesystem actually starts writing data.  On x86, the boot code requires
more space, so putting a UFS filesystem on cylinder 0 would not leave
sufficient room for the boot code.  The traditional solution is that data
slices on x86 begin at cylinder 1, leaving cylinder 0 for boot data.

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Rob wrote:

> Are you saying that a Windows Server can access a ZFS drive via 
> iSCSI and store NTFS files?

A volume is created under ZFS, similar to a large sequential file. 
The iSCSI protocol is used to export that volume as a LUN.  Windows 
can then format it and put NTFS on it.
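
A minimal sketch of that flow with the OpenSolaris-era iSCSI target (names
and sizes are only examples):

  zfs create -V 100G tank/winvol      # the volume, like a large sequential file
  zfs set shareiscsi=on tank/winvol   # export it as an iSCSI LUN
  iscsitadm list target               # confirm the target for the Windows initiator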

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
I am not experienced with iSCSI. I understand it's block level disk access via 
TCP/IP. However I don't see how using it eliminates the need for virtualization.

Are you saying that a Windows Server can access a ZFS drive via iSCSI and store 
NTFS files?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Rob wrote:

> The only way I can visualize doing so would be to virtualize the 
> Windows server and store its image in a ZFS pool. That would add 
> additional overhead but protect the data at the disk level. It would 
> also allow snapshots of the Windows machine's virtual file. However 
> none of these benefits would protect Windows from hurting its own 
> data, if you catch my meaning.

With OpenSolaris you can use its built in SMB/CIFS service so that 
files are stored natively in ZFS by the Windows client.  Since the 
files are stored natively, individual lost/damaged files can be 
retrieved from a ZFS snapshot if snapshots are configured to be taken 
periodically.
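
A rough sketch of that route (service and property names as I recall them;
dataset names are only examples):

  svcadm enable -r smb/server                        # turn on the CIFS service
  zfs create -o casesensitivity=mixed tank/winfiles  # friendlier for Windows clients
  zfs set sharesmb=name=winfiles tank/winfiles       # share it over SMB/CIFS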

If you use iSCSI or the forthcoming COMSTAR project (iSCSI, FC target, 
FCOE) then you can create native Windows volumes and the whole volume 
could be "protected" via snapshots but without the ability to retrieve 
individual files.  As you say, Windows could still destroy its own 
volume.  Snapshots of iSCSI volumes will be similar to if the Windows 
system suddenly lost power at the time the snapshot was taken.

As far as ZFS portability goes, ZFS is also supported on FreeBSD, on 
Linux in an inferior mode, and soon on OS-X.  The main 
interoperability issues seem to be with the disk partitioning 
strategies used by the different operating systems.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Marcelo Leal
Hello,
 - One way is virtualization: if you use a virtualization technology that uses 
NFS, for example, you could keep your virtual machine images on a ZFS filesystem 
(see the sketch after this list).  NFS can be used without virtualization too, 
but as you said the machines are Windows, and I don't think the NFS client for 
Windows is production ready. 
 Maybe somebody else on the list can say...
 - Virtualization inside Solaris branded zones... IIRC, the idea is to have 
branded zones that support another OS (like GNU/Linux, MS Windows, etc.).
 - Another option is iSCSI, and you would not need virtualization.
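
 A rough sketch of the NFS option (the dataset name is just an example):

   zfs create tank/vmimages
   zfs set sharenfs=on tank/vmimages   # access lists can be tightened later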

 Leal
[http://www.eall.com.br/blog]

> ZFS is the bomb. It's a great file system. What are
> its real world applications besides Solaris
> userspace? What I'd really like is to utilize the
> benefits of ZFS across all the platforms we use. For
> instance, we use Microsoft Windows Servers as our
> primary platform here. How might I utilize ZFS to
> protect that data? 
> 
> The only way I can visualize doing so would be to
> virtualize the Windows server and store its image in
> a ZFS pool. That would add additional overhead but
> protect the data at the disk level. It would also
> allow snapshots of the Windows Machine's virtual
> file. However none of these benefits would protect
> Windows from hurting its own data, if you catch my
> meaning.
> 
> Obviously ZFS is ideal for large databases served out
> via application level or web servers. But what other
> practical ways are there to integrate the use of ZFS
> into existing setups to experience its benefits?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
ZFS is the bomb. It's a great file system. What are its real world 
applications besides Solaris userspace? What I'd really like is to utilize the 
benefits of ZFS across all the platforms we use. For instance, we use Microsoft 
Windows Servers as our primary platform here. How might I utilize ZFS to 
protect that data?

The only way I can visualize doing so would be to virtualize the Windows server 
and store its image in a ZFS pool. That would add additional overhead but 
protect the data at the disk level. It would also allow snapshots of the 
Windows machine's virtual file. However none of these benefits would protect 
Windows from hurting its own data, if you catch my meaning.

Obviously ZFS is ideal for large databases served out via application level or 
web servers. But what other practical ways are there to integrate the use of 
ZFS into existing setups to experience its benefits?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Jacob Ritorto wrote:

> I have that iozone program loaded, but its results were rather cryptic
> for me.  Is it adequate if I learn how to decipher the results?  Can
> it thread out and use all of my CPUs?

Yes, iozone does support threading.  Here is a test with a record size 
of 8KB, eight threads, synchronous writes, and a 2GB test file:

 Multi_buffer. Work area 16777216 bytes
 OPS Mode. Output is in operations per second.
 Record Size 8 KB
 SYNC Mode.
 File size set to 2097152 KB
 Command line used: iozone -m -t 8 -T -O -r 8k -o -s 2G
 Time Resolution = 0.01 seconds.
 Processor cache size set to 1024 Kbytes.
 Processor cache line size set to 32 bytes.
 File stride size set to 17 * record size.
 Throughput test with 8 threads
 Each thread writes a 2097152 Kbyte file in 8 Kbyte records

When testing with iozone, you will want to make sure that the test 
file is larger than available RAM, such as 2X the size.
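
For example, a quick sanity check (the 8GB figure is just an example):

  prtconf | grep -i 'Memory size'   # e.g. "Memory size: 8192 Megabytes"

With -t 8 and -s 2G above, the threads write 8 x 2GB = 16GB in aggregate,
comfortably more than that much RAM.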

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send fails incremental snapshot

2009-01-06 Thread Brent Jones
On Mon, Jan 5, 2009 at 4:29 PM, Brent Jones  wrote:
> On Mon, Jan 5, 2009 at 2:50 PM, Richard Elling  wrote:
>> Correlation question below...
>>
>> Brent Jones wrote:
>>>
>>> On Sun, Jan 4, 2009 at 11:33 PM, Carsten Aulbert
>>>  wrote:
>>>

 Hi Brent,

 Brent Jones wrote:

>
> I am using 2008.11 with the Timeslider automatic snapshots, and using
> it to automatically send snapshots to a remote host every 15 minutes.
> Both sides are X4540's, with the remote filesystem mounted read-only
> as I read earlier that would cause problems.
> The snapshots send fine for several days, I accumulate many snapshots
> at regular intervals, and they are sent without any problems.
> Then I will get the dreaded:
> "
> cannot receive incremental stream: most recent snapshot of pdxfilu02
> does not match incremental source
> "
>
>

 Which command line are you using?

 Maybe you need to do a rollback first (zfs receive -F)?

 Cheers

 Carsten
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


>>>
>>> I am using a command similar to this:
>>>
>>> zfs send -i pdxfilu01/arch...@zfs-auto-snap:frequent-2009-01-04-03:30
>>> pdxfilu01/arch...@zfs-auto-snap:frequent-2009-01-04-03:45 | ssh -c
>>> blowfish u...@host.com /sbin/zfs recv -d pdxfilu02
>>>
>>> It normally works, then after some time it will stop. It is still
>>> doing a full snapshot replication at this time (very slowly it seems,
>>> I'm bit by the bug of slow zfs send/resv)
>>>
>>> Once I get back on my regular snapshotting, if it comes out of sync
>>> again, I'll try doing a -F rollback and see if that helps.
>>>
>>
>> When this gets slow, are the other snapshot-related commands also
>> slow?  For example, normally I see "zfs list -t snapshot" completing
>> in a few seconds, but sometimes it takes minutes?
>> -- richard
>>
>>
>
> I'm not seeing zfs related commands any slower. On the remote side, it
> builds up thousands of snapshots, and aside from SSH scrolling as fast
> as it can over the network, no other slowness.
> But the actual send and receive is getting very very slow, almost to
> the point of needing the scrap the project and find some other way to
> ship data around!
>
> --
> Brent Jones
> br...@servuhome.net
>

Got a small update on the ZFS send: I am in fact seeing 'zfs list'
take several minutes to complete. I must have timed it correctly
during the send; neither side has completed the 'zfs list', and
it's been about 5 minutes already. There is a small amount of network
traffic between the two hosts, so maybe it's comparing what needs to
be sent, not sure.
I'll update when/if it completes.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
I have that iozone program loaded, but its results were rather cryptic
for me.  Is it adequate if I learn how to decipher the results?  Can
it thread out and use all of my CPUs?



> Do you have tools to do random I/O exercises?
>
> --
> Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Cindy . Swearingen
Alex,

I think the root cause of your confusion is that the format utility and
disk labels are very unfriendly and confusing.

Partition 2 identifies the whole disk. On x86 systems, space is also
needed for boot-related information, which is currently stored in
partition 8. Neither of these partitions requires any administration, and 
they should not be used for anything else. You can read more here:

http://docs.sun.com/app/docs/doc/817-5093/disksconcepts-20068?a=view

(To add more confusion, partitions are also referred to as slices.)

However, the system actually boots from the root file system, in 
partition 0 on your system, which is why you need to run the installgrub 
command on c4t0d0s0. Your installgrub syntax looks correct to me.

My wish for this year is to boot from EFI-labeled disks so examining
disk labels is mostly unnecessary because ZFS pool components could be
constructed as whole disks, and the unpleasant disk
format/label/partitioning experience is just a dim memory...

Cindy



Alex Viskovatoff wrote:
> Hi all,
> 
> I did an install of OpenSolaris in which I specified that the whole disk 
> should be used for the installation. Here is what "format> verify" produces 
> for that disk:
> 
> Part      Tag    Flag     Cylinders         Size            Blocks
>   0       root    wm       1 - 60797      465.73GB    (60797/0/0) 976703805
>   1 unassigned    wm       0                0         (0/0/0)             0
>   2     backup    wu       0 - 60797      465.74GB    (60798/0/0) 976719870
>   3 unassigned    wm       0                0         (0/0/0)             0
>   4 unassigned    wm       0                0         (0/0/0)             0
>   5 unassigned    wm       0                0         (0/0/0)             0
>   6 unassigned    wm       0                0         (0/0/0)             0
>   7 unassigned    wm       0                0         (0/0/0)             0
>   8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
>   9 unassigned    wm       0                0         (0/0/0)             0
> 
> I have several questions. First, what is the purpose of partitions 2 and 8 
> here? Why not simply have partition 0, the "root" partition, be the only 
> partition, and start at cylinder 0 as opposed to 1?
> 
> My second question concerns the disk I have used to mirror the first root 
> zpool disk. After I set up the second disk to mirror the first one with 
> "zpool attach -f rpool c3t0d0s0 c4t0d0s0", I got the response
> 
> Please be sure to invoke installgrub(1M) to make 'c4t0d0s0' bootable.
> 
> Is that correct? Or do I want to make c4t0d0s8 bootable, given that the label 
> of that partition is "boot"? I cannot help finding this a little confusing. 
> As far as i can tell, c4t0d0s8 (as well as c3t0d0s8 from the original disk 
> which I mirrored), cylinder 0, is not used for anything.
> 
> Finally, is the correct command to make the disk I have added to mirror the 
> first disk bootable
> 
> "installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t0d0s0" ?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Performance issue with zfs send of a zvol

2009-01-06 Thread Brian H. Nelson
I noticed this issue yesterday when I first started playing around with 
zfs send/recv. This is on Solaris 10U6.

It seems that a zfs send of a zvol issues volblocksize-sized reads to the 
physical devices. This doesn't make any sense to me, as zfs generally 
consolidates read/write requests to improve performance. Even the dd 
case with the same snapshot does not exhibit this behavior. It seems to 
be specific to zfs send.

I checked with 8k, 64k, and 128k volblocksize, and the reads generated 
by zfs send always seem to follow that size, while the reads with dd do not.
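
The test setup was along these lines (pool, volume names, and sizes are
placeholders):

  zfs create -V 4G -o volblocksize=8k pool1/vol8k
  # ...populate the volume with data, then snapshot it...
  zfs snapshot pool1/vol8k@now
  zfs send pool1/vol8k@now > /dev/null &
  zpool iostat -v pool1 5   # compare ops vs bandwidth to infer per-device read size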

The small reads seem to hurt the performance of zfs send. I tested with a 
mirror, but on another machine with a 7 disk raidz, the performance is 
MUCH worse because the 8k reads get broken up into even smaller reads 
and spread across the raidz.

Is this a bug, or can someone explain why this is happening?

Thanks
-Brian

Using 8k volblocksize:

-bash-3.00# zfs send pool1/vo...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
pool1        4.01G   274G  1.88K      0  15.0M      0
  mirror     4.01G   274G  1.88K      0  15.0M      0
    c0t9d0       -      -    961      0  7.46M      0
    c0t11d0      -      -    968      0  7.53M      0
-----------  -----  -----  -----  -----  -----  -----
== ~8k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vo...@now of=/dev/null bs=8k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
pool1        4.01G   274G  2.25K      0  17.9M      0
  mirror     4.01G   274G  2.25K      0  17.9M      0
    c0t9d0       -      -    108      0  9.00M      0
    c0t11d0      -      -    109      0  8.92M      0
-----------  -----  -----  -----  -----  -----  -----
== ~8k reads to pool, ~85k reads to drives


Using volblocksize of 64k:

-bash-3.00# zfs send pool1/vol...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
pool1        6.01G   272G    378      0  23.5M      0
  mirror     6.01G   272G    378      0  23.5M      0
    c0t9d0       -      -    189      0  11.8M      0
    c0t11d0      -      -    189      0  11.7M      0
-----------  -----  -----  -----  -----  -----  -----
== ~64k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vol...@now of=/dev/null bs=64k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
pool1        6.01G   272G    414      0  25.7M      0
  mirror     6.01G   272G    414      0  25.7M      0
    c0t9d0       -      -    107      0  12.9M      0
    c0t11d0      -      -    106      0  12.8M      0
-----------  -----  -----  -----  -----  -----  -----
== ~64k reads to pool, ~124k reads to drives


Using volblocksize of 128k:

-bash-3.00# zfs send pool1/vol1...@now > /dev/null

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
pool1        4.01G   274G    188      0  23.3M      0
  mirror     4.01G   274G    188      0  23.3M      0
    c0t9d0       -      -     94      0  11.7M      0
    c0t11d0      -      -     93      0  11.7M      0
-----------  -----  -----  -----  -----  -----  -----
== ~128k reads to pool and drives

-bash-3.00# dd if=/dev/zvol/dsk/pool1/vol1...@now of=/dev/null bs=128k

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
pool1        4.01G   274G    247      0  30.8M      0
  mirror     4.01G   274G    247      0  30.8M      0
    c0t9d0       -      -    122      0  15.3M      0
    c0t11d0      -      -    123      0  15.5M      0
-----------  -----  -----  -----  -----  -----  -----
== ~128k reads to pool and drives

-- 
---
Brian H. Nelson Youngstown State University
System Administrator   Media and Academic Computing
  bnelson[at]cis.ysu.edu
---

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread A Darren Dunham
On Tue, Jan 06, 2009 at 08:44:01AM -0800, Jacob Ritorto wrote:

> Is this increase explicable / expected?  The throughput calculator
> sheet output I saw seemed to forecast better iops with the striped
> raidz vdevs and I'd read that, generally, throughput is augmented by
> keeping the number of vdevs in the single digits.  Is my superlative
> result perhaps related to the large cpu and memory bandwidth?

I'd think that for pure sequential loads, larger column setups wouldn't
have too many performance issues.

But as soon as you try to do random reads on the large setup you're
going to be much more limited.  Do you have tools to do random I/O
exercises?

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Questions about OS 2008.11 partitioning scheme

2009-01-06 Thread Alex Viskovatoff
Hi all,

I did an install of OpenSolaris in which I specified that the whole disk should 
be used for the installation. Here is what "format> verify" produces for that 
disk:

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 - 60797      465.73GB    (60797/0/0) 976703805
  1 unassigned    wm       0                   0      (0/0/0)             0
  2     backup    wu       0 - 60797      465.74GB    (60798/0/0) 976719870
  3 unassigned    wm       0                   0      (0/0/0)             0
  4 unassigned    wm       0                   0      (0/0/0)             0
  5 unassigned    wm       0                   0      (0/0/0)             0
  6 unassigned    wm       0                   0      (0/0/0)             0
  7 unassigned    wm       0                   0      (0/0/0)             0
  8       boot    wu       0 - 0            7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                   0      (0/0/0)             0

I have several questions. First, what is the purpose of partitions 2 and 8 
here? Why not simply have partition 0, the "root" partition, be the only 
partition, and start at cylinder 0 as opposed to 1?

My second question concerns the disk I have used to mirror the first root zpool 
disk. After I set up the second disk to mirror the first one with "zpool attach 
-f rpool c3t0d0s0 c4t0d0s0", I got the response

Please be sure to invoke installgrub(1M) to make 'c4t0d0s0' bootable.

Is that correct? Or do I want to make c4t0d0s8 bootable, given that the label 
of that partition is "boot"? I cannot help finding this a little confusing. As 
far as I can tell, c4t0d0s8 (as well as c3t0d0s8 from the original disk which I 
mirrored), cylinder 0, is not used for anything.

Finally, is the correct command to make the disk I have added to mirror the 
first disk bootable

"installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c4t0d0s0" ?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500, snv_101a, hd and zfs

2009-01-06 Thread Elaine Ashton
OK, so it gets a bit more specific:

hdadm and write_cache run 'format -e -d $disk' 

On this system, plain format produces the list of devices in short order; format 
-e, however, takes much, much longer, which would explain why it takes hours to 
iterate over 48 drives.
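
A quick way to see whether a single expert-mode invocation is the slow part 
(the disk name is just an example; with stdin redirected from /dev/null, 
format should exit as soon as it reaches its menu) seems to be:

# time format -d c1t0d0 < /dev/null
# time format -e -d c1t0d0 < /dev/null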

It's very curious and I'm not sure at this point if it's related to ZFS.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
OK, so use a real I/O test program, or at least pre-generate files large
enough to exceed RAM caching?
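
Something along these lines, maybe (path and size are just examples; the 
generation step is slow because of /dev/urandom, but it only has to run once)?

dd if=/dev/urandom of=/tank/random.dat bs=128k count=262144   # about 32 GB, twice RAM
ptime dd if=/tank/random.dat of=/dev/null bs=128k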



On Tue, Jan 6, 2009 at 1:19 PM, Bob Friesenhahn
 wrote:
> On Tue, 6 Jan 2009, Jacob Ritorto wrote:
>
>> Is urandom nonblocking?
>
> The OS provided random devices need to be secure and so they depend on
> collecting "entropy" from the system so the random values are truly random.
>  They also execute complex code to produce the random numbers. As a result,
> both of the random device interfaces are much slower than a disk drive.
>
> Bob
> ==
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Jacob Ritorto wrote:

> Is urandom nonblocking?

The OS provided random devices need to be secure and so they depend on 
collecting "entropy" from the system so the random values are truly 
random.  They also execute complex code to produce the random numbers. 
As a result, both of the random device interfaces are much slower than 
a disk drive.
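
It is easy to see for yourself; something like the following (sizes are just 
an example), compared with what a single disk can stream, makes the point:

time dd if=/dev/urandom of=/dev/null bs=128k count=1024   # about 128 MB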

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Keith Bierman

On Jan 6, 2009, at 11:12 AM, Bob Friesenhahn wrote:

> On Tue, 6 Jan 2009, Keith Bierman wrote:
>
>> Do you get the same sort of results from /dev/random?
>
> /dev/random is very slow and should not be used for benchmarking.
>
Not directly, no. But copying from /dev/random to a real file and  
using that should provide better insight than all zeros or all ones  
(I have seen "clever" devices optimize things away).

Tests like bonnie are probably a better bet than rolling one's own;  
although the latter is good for building intuition ;>
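
For example, assuming bonnie++ (rather than the original bonnie) is what's 
installed (flags from memory, so check the man page):

bonnie++ -d /tank/bench -s 32768 -u nobody

where -d is a directory in the pool under test, -s is the working-set size in 
MB (pick something comfortably larger than RAM), and -u is the user to run as 
when invoked as root.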

-- 
Keith H. Bierman   khb...@gmail.com  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
 Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
Is urandom nonblocking?



On Tue, Jan 6, 2009 at 1:12 PM, Bob Friesenhahn
 wrote:
> On Tue, 6 Jan 2009, Keith Bierman wrote:
>
>> Do you get the same sort of results from /dev/random?
>
> /dev/random is very slow and should not be used for benchmarking.
>
> Bob
> ==
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Keith Bierman wrote:

> Do you get the same sort of results from /dev/random?

/dev/random is very slow and should not be used for benchmarking.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Bob Friesenhahn
On Tue, 6 Jan 2009, Jacob Ritorto wrote:

> My OpenSolaris 2008/11 PC seems to attain better throughput with one 
> big sixteen-device RAIDZ2 than with four stripes of 4-device RAIDZ. 
> I know it's by no means an exhaustive test, but catting /dev/zero to 
> a file in the pool now frequently exceeds 600 Megabytes per second, 
> whereas before with the striped RAIDZ I was only occasionally 
> peaking around 400MB/s.  The kit is SuperMicro Intel 64 bit,
>
> Is this increase explicable / expected?  The throughput calculator

This is not surprising.  However, your test is only testing the write 
performance using a single process.  With multiple writers and 
readers, the throughput will be better when using the configuration 
with more vdevs.

It is not recommended to use such a large RAIDZ2 due to the multi-user 
performance concern, and because a single slow/failing disk drive can 
destroy the performance until it is identified and fixed.  Maybe a 
balky (but still functioning) drive won't be replaced under warranty 
and so you have to pay for a replacement out of your own pocket.
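
For the archives, the two layouts under discussion would be created roughly 
like this (device names are placeholders):

  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
                           c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                    raidz c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
                    raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 \
                    raidz c2t4d0 c2t5d0 c2t6d0 c2t7d0

The first gives one wide double-parity vdev: excellent streaming bandwidth, 
but roughly a single disk's worth of small random reads. The second spreads 
random I/O across four vdevs, at the cost of only single parity within each 
group.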

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Keith Bierman

On Jan 6, 2009, at 9:44 AM, Jacob Ritorto wrote:

>  but catting /dev/zero to a file in the pool now f

Do you get the same sort of results from /dev/random?

I wouldn't be surprised if /dev/zero turns out to be a special case.

Indeed, using any of the special files is probably not ideal.
>

-- 
Keith H. Bierman   khb...@gmail.com  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
 Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Observation of Device Layout vs Performance

2009-01-06 Thread Jacob Ritorto
My OpenSolaris 2008/11 PC seems to attain better throughput with one big 
sixteen-device RAIDZ2 than with four stripes of 4-device RAIDZ.  I know it's by 
no means an exhaustive test, but catting /dev/zero to a file in the pool now 
frequently exceeds 600 Megabytes per second, whereas before with the striped 
RAIDZ I was only occasionally peaking around 400MB/s.  The kit is SuperMicro 
Intel 64 bit, 2-socket by 4 thread, 3 GHz with two AOC MV8 boards and 800 MHz 
(iirc) fsb connecting 16 GB RAM that runs at equal speed to fsb.  Cheap 7200 
RPM Seagate SATA half-TB disks with 32MB cache.
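
(The "benchmark", for what it's worth, was nothing fancier than roughly the 
following, with the bandwidth numbers read off zpool iostat in another window; 
the pool name and file path are just examples:)

cat /dev/zero > /tank/bigfile &
zpool iostat tank 5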

Is this increase explicable / expected?  The throughput calculator sheet output 
I saw seemed to forecast better iops with the striped raidz vdevs and I'd read 
that, generally, throughput is augmented by keeping the number of vdevs in the 
single digits.  Is my superlative result perhaps related to the large cpu and 
memory bandwidth?

Just throwing this out for the sake of discussion / a sanity check.

thx
jake
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to find out the zpool of an uberblock printed with the fbt:zfs:uberblock_update: probes?

2009-01-06 Thread Marcelo Leal
> Hi,

 Hello Bernd,

> 
> After I published a blog entry about installing
> OpenSolaris 2008.11 on a 
> USB stick, I read a comment about a possible issue
> with wearing out 
> blocks on the USB stick after some time because ZFS
> overwrites its 
> uberblocks in place.
 I did not quite understand what you mean by "wearing out blocks", but in fact 
the uberblocks are not overwritten in place. The pattern you noticed with the 
dtrace script is the update of the uberblock, which is kept in an array of 128 
elements (1K each, only one active at a time). Each physical vdev has four 
labels (256K structures): L0, L1, L2 and L3. Two sit at the beginning of the 
device and two at the end.
 Because the labels are at fixed locations on disk, they are the only on-disk 
structures that ZFS does not update copy-on-write; instead it uses a two-staged 
update. IIRC, the update order is L0 and L2 first, and after that L1 and L3.
 Take a look:

 
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_label.c

 So:
 - The label is overwritten (in a two-staged update);
 - The uberblock is not overwritten; it is written to a new element of the 
array, so the transition from one uberblock (txg and timestamp) to the next is 
atomic.
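
 If you are curious, you can look at both structures from userland with zdb 
(the pool name and device path below are just examples):

 zdb -u mypool                # print the currently active uberblock
 zdb -l /dev/rdsk/c1t0d0s0    # dump the four labels (L0-L3) of one device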

 I'm deploying a USB solution too, so if you can clarify the problem, I would 
appreciate it.

P.S.: I did look at your blog, but did not see any comments about that, and the 
comments section is closed. ;-)

 Leal
[http://www.eall.com.br/blog]

> 
> I tried to get more information about how updating
> uberblocks works with 
> the following dtrace script:
> 
> /* io:genunix::start */
> io:genunix:default_physio:start,
> io:genunix:bdev_strategy:start,
> io:genunix:biodone:done
> {
> printf ("%d %s %d %d", timestamp, execname,
>  args[0]->b_blkno, args[0]->b_bcount);
> }
> 
> fbt:zfs:uberblock_update:entry
> {
> printf ("%d (%d) %d, %d, %d, %d, %d, %d, %d, %d",
>  timestamp, args[0]->ub_timestamp,
>  args[0]->ub_rootbp.blk_prop, args[0]->ub_guid_sum,
>  args[0]->ub_rootbp.blk_birth, args[0]->ub_rootbp.blk_fill,
>  args[1]->vdev_id, args[1]->vdev_asize, args[1]->vdev_psize,
>  args[2]);
> }
> 
> The output shows the following pattern after most of the 
> uberblock_update events:
> 
> 0  34404 uberblock_update:entry 244484736418912 (1231084189) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26747
> 0   6668 bdev_strategy:start 244485190035647 sched 502 1024
> 0   6668 bdev_strategy:start 244485190094304 sched 1014 1024
> 0   6668 bdev_strategy:start 244485190129133 sched 39005174 1024
> 0   6668 bdev_strategy:start 244485190163273 sched 39005686 1024
> 0   6656 biodone:done 244485190745068 sched 502 1024
> 0   6656 biodone:done 244485191239190 sched 1014 1024
> 0   6656 biodone:done 244485191737766 sched 39005174 1024
> 0   6656 biodone:done 244485192236988 sched 39005686 1024
> ...
> 0  34404 uberblock_update:entry 244514710086249 (1231084219) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26748
> 0  34404 uberblock_update:entry 244544710086804 (1231084249) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26749
> ...
> 0  34404 uberblock_update:entry 244574740885524 (1231084279) 9226475971064889345, 4541013553469450828, 26750, 159, 0, 0, 0, 26750
> 0   6668 bdev_strategy:start 244575189866189 sched 508 1024
> 0   6668 bdev_strategy:start 244575189926518 sched 1020 1024
> 0   6668 bdev_strategy:start 244575189961783 sched 39005180 1024
> 0   6668 bdev_strategy:start 244575189995547 sched 39005692 1024
> 0   6656 biodone:done 244575190584497 sched 508 1024
> 0   6656 biodone:done 244575191077651 sched 1020 1024
> 0   6656 biodone:done 244575191576723 sched 39005180 1024
> 0   6656 biodone:done 244575192077070 sched 39005692 1024
> I am not a dtrace or zfs expert, but to me it looks like in many cases 
> an uberblock update is followed by a write of 1024 bytes to four 
> different disk blocks. I also found that the four block numbers are 
> always incremented by two (256, 258, 260, ...) 127 times and then the 
> first block is written again. Which would mean that for a txg of 
> ~50000, the four uberblock copies have each been written about 
> 50000/127 = 393 times (correct?).
> 
> What I would like to find out is how to access fields from arg1 (this is 
> the data of type vdev_t in:
> 
> int uberblock_update(uberblock_t *ub, vdev_t *rvd, uint64_t txg)
> 
> ). When using the fbt:zfs:uberblock_update:entry
> probe, its elements are 
> always 0, as you can see in the above output. When
> using the 
> fbt:zfs:uberblock_update:return probe, I am getting
> an error message 
> like the following:
> 
> dtrace: failed to compile script
> zfs-uberblock-report-04.d: line 14: 
> operator -> must be applied to a pointer
> 
> Any idea how to access the fields of vdev_t?

Re: [zfs-discuss] SOLVED: Mount ZFS pool on different system

2009-01-06 Thread Cyril Payet
OK, got it: just use zpool import.
Sorry for the inconvenience ;-)
C. 

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Cyril Payet
Sent: Tuesday, January 6, 2009 09:16
To: D. Eckert; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SOLVED: Mount ZFS pool on different system

Btw,
When you import a pool, you must know its name.
Is there any command to get the name of the pool to which a non-imported disk belongs?
"# vxdisk -o alldgs list" does this with VxVM.
Thanx for your replies.
C. 

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of D. Eckert
Sent: Saturday, January 3, 2009 09:10
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] SOLVED: Mount ZFS pool on different system

RTFM seems to solve many problems ;-)

:# zpool import poolname
--
This message posted from opensolaris.org 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs list improvements?

2009-01-06 Thread Chris Gerhard
To improve the performance of scripts that manipulate zfs snapshots, and of the 
zfs snapshot service in particular, there needs to be a way to list all the 
snapshots for a given dataset and only the snapshots for that dataset.

There are two RFEs filed that cover this:

http://bugs.opensolaris.org/view_bug.do?bug_id=6352014 :

'zfs list' should have an option to only present direct descendents

http://bugs.opensolaris.org/view_bug.do?bug_id=6762432 zfs 

list --depth

The first is asking for a way to list only the direct descendents of a data 
set, i.e. its children. The second asks to be able to list all the data sets 
down to a depth of N.

So zfs list --depth 0 would be almost the same as zfs list -c except that it 
would also list the parent data set.
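
For context, what a script has to do today to get just the snapshots of one 
dataset is something like the following (the dataset name is an example), 
pulling back every snapshot underneath it and filtering out the descendants 
itself:

zfs list -H -t snapshot -o name -r tank/home | grep '^tank/home@'

With either RFE implemented, the grep (and the cost of listing every 
descendant's snapshots) would go away.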

While zfs list -c is more user friendly, zfs list --depth is more powerful. I'm 
wondering if both should be implemented or just one, and if just one, which?

Comments?

--chris
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread Anton B. Rang
For SCSI disks (including FC), you would use the FUA bit on the read command.

For SATA disks ... does anyone care?  ;-)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500, snv_101a, hd and zfs

2009-01-06 Thread Jens Elkner
On Mon, Jan 05, 2009 at 05:43:23PM -0800, Elaine Ashton wrote:
> I've got a fresh install of 101a on a thumper with 48 disks and zfs with one 
> large 46 drive raidz2 pool. It has no load at the moment. My problem is that 
> the SUNWhd tools are excruciatingly slow; by excruciating I mean that the 
> command "/opt/SUNWhd/hd/bin/hdadm write_cache display all" takes three 
> hours to iterate over the 48 drives which, I suspect, is not the expected 
> time for such a command to take. 

SunOS isis 5.10 Generic_138889-02 i86pc i386 i86pc
aka S10u6 with latest patches:

+ time -p hdadm display all
...
platform = Sun Fire X4500
...
real 9.90
user 0.04
sys 0.36

+ time -p sh /opt/SUNWhd/hd/bin/write_cache display all
...
real 104.67
user 14.89
sys 6.57

Regards,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Metaslab alignment on RAID-Z

2009-01-06 Thread Robert Milkowski
Is there any update on this? You suggested that Jeff had some kind of solution 
for this - has it been integrated or is someone working on it?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SOLVED: Mount ZFS pool on different system

2009-01-06 Thread Rodney Lindner - Services Chief Technologist
Yep..
Just run zpool import without a poolname and it will list any pools
that are available for import.

e.g.:

sb2000::#zpool import
  pool: mp
    id: 17232673347678393572
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        mp           ONLINE
          raidz2     ONLINE
            c1t2d0   ONLINE
            c1t3d0   ONLINE
            c1t4d0   ONLINE
            c1t5d0   ONLINE
            c1t8d0   ONLINE
            c1t9d0   ONLINE
            c1t10d0  ONLINE
            c1t11d0  ONLINE
            c1t12d0  ONLINE
            c1t13d0  ONLINE
            c1t14d0  ONLINE
        spares
          c1t15d0

sb2000::#zpool import mp

Regards
Rodney

Cyril Payet wrote:

Btw,
When you import a pool, you must know its name.
Is there any command to get the name of the pool to which a non-imported disk belongs?
"# vxdisk -o alldgs list" does this with VxVM.
Thanx for your replies.
C.

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of D. Eckert
Sent: Saturday, January 3, 2009 09:10
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] SOLVED: Mount ZFS pool on different system

RTFM seems to solve many problems ;-)

:# zpool import poolname
--
This message posted from opensolaris.org


-- 
Rodney Lindner / Services Chief Technologist
Sun Microsystems, Inc.
33 Berry St
Nth Sydney, NSW, AU, 2060
Phone x59674/+61294669674
Email rodney.lind...@sun.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SOLVED: Mount ZFS pool on different system

2009-01-06 Thread Cyril Payet
Btw,
When you import a pool, you must know its name.
Is there any command to get the name of the pool to which a non-imported disk belongs?
"# vxdisk -o alldgs list" does this with VxVM.
Thanx for your replies.
C. 

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of D. Eckert
Sent: Saturday, January 3, 2009 09:10
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] SOLVED: Mount ZFS pool on different system

RTFM seems to solve many problems ;-)

:# zpool import poolname
--
This message posted from opensolaris.org 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss