Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-15 Thread Tonmaus
Hello again,

I am still concerned whether my points are being taken on board.

 If you are concerned that a
 single 200TB pool would take a long
 time to scrub, then use more pools and scrub in
 parallel.

The main concern is not scrub time. Scrub time could be weeks for all I care, if 
only scrub behaved itself. You may imagine that there are applications where 
segmenting the storage into multiple pools is a pain point, too.

 The scrub will queue no more
 than 10 I/Os at one time to a device, so devices which
 can handle concurrent I/O
 are not consumed entirely by scrub I/O. This could be
 tuned lower, but your storage 
 is slow and *any* I/O activity will be noticed.

There are a couple of things I maybe don't understand, then.

- zpool iostat is reporting more than 1k operations while scrubbing (see the 
sketch after this list)
- throughput is as high as it can go, up to the point of maxing out the CPU
- the nominal I/O capacity of a single device is still around 90 IOPS, so how 
can 10 queued I/Os already bring down payload throughput?
- scrubbing the same pool configured as raidz1 didn't max out the CPU, which is 
no surprise (haha, slow storage...); the notable part is that it didn't slow 
down payload that much either
- scrub obviously copes with data being added or deleted during a pass, so it 
should be possible to pause and resume a pass, shouldn't it?
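
For reference, this is roughly how I am watching it while a scrub runs (a 
sketch; 'tank' below stands in for the real pool name):

# zpool iostat -v tank 10
# zpool status tank

The first command shows per-vdev operations and bandwidth every 10 seconds, 
scrub I/O included; the second shows the scrub progress itself.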

My conclusion from these observations is that not only disk speed counts here; 
other bottlenecks may strike as well. Solving the issue with the wallet is one 
way, solving it by configuring parameters is another. So, is there a lever for 
scrub I/O priority, or not? And is there a way to pause a scrub pass and resume 
it later?

Regards,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Michael Hassey
Sorry if this is too basic -

So I have a single zpool in addition to the rpool, called xpool.

NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool   136G   109G  27.5G    79%  ONLINE  -
xpool   408G   171G   237G    42%  ONLINE  -

I have 408 GB in the pool, am using 171 GB, leaving me 237 GB.

The pool is built up as;

  pool: xpool
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
xpool   ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c8t1d0  ONLINE   0 0 0
c8t2d0  ONLINE   0 0 0
c8t3d0  ONLINE   0 0 0

errors: No known data errors


But - and here is the question -

I have created file systems on it, and the file systems in play report only 
76 GB of free space:

SNIP FROM ZFS LIST

xpool/zones/logserver/ROOT/zbe 975M  76.4G   975M  legacy
xpool/zones/openxsrvr 2.22G  76.4G  21.9K  /export/zones/openxsrvr
xpool/zones/openxsrvr/ROOT2.22G  76.4G  18.9K  legacy
xpool/zones/openxsrvr/ROOT/zbe2.22G  76.4G  2.22G  legacy
xpool/zones/puggles241M  76.4G  21.9K  /export/zones/puggles
xpool/zones/puggles/ROOT   241M  76.4G  18.9K  legacy
xpool/zones/puggles/ROOT/zbe   241M  76.4G   241M  legacy
xpool/zones/reposerver 299M  76.4G  21.9K  /export/zones/reposerver


So my question is: where is the space from xpool being used? Or is it?


Thanks for reading.

Mike.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Gabriele Bulfon
Hello,
I'd like to ask for some guidance about using zfs on iscsi storage appliances.
Recently I had an unlucky situation where a storage machine froze.
Once the storage was up again (rebooted), all the other iscsi clients were 
happy, while one of them (a Sun Solaris SPARC box running Oracle) did not mount 
the volume and marked it as corrupted.
I had no way to get my zfs data back: I had to destroy the pool and recreate it 
from backups.
So I have some questions regarding this nice story:
- I remember sysadmins almost always being able to recover data on corrupted 
ufs filesystems by the magic of alternate superblocks. Is there something 
similar on zfs? Is there really no way to access the data of a corrupted zfs 
filesystem?
- In this case, the storage appliance is a legacy system based on linux, so 
raids/mirrors are managed on the storage side in its own way. Being an iscsi 
target, this volume was mounted as a single iscsi disk on the solaris host and 
prepared as a zfs pool consisting of this single iscsi target. The ZFS best 
practices tell me that, to be safe against corruption, pools should always be 
mirrors or raidz across 2 or more disks. In this case I considered everything 
safe, because the mirroring and raid were managed by the storage machine. But 
from the solaris host's point of view, the pool consisted of just one device! 
And maybe this was the point of failure. What is the correct way to go in this 
case?
- Finally, looking forward to running new storage appliances using OpenSolaris 
and its ZFS plus iscsitadm and/or comstar, I feel a bit confused by the 
possibility of a double-zfs situation: in this case I would have the storage's 
zfs filesystem divided into zfs volumes, accessed via iscsi by a solaris host 
that creates its own zfs pool on top of them (...is that too redundant??), and 
again I would fall into the same case as before (a host zfs pool backed by only 
one iscsi resource).

Any guidance would be really appreciated :)
Thanks a lot
Gabriele.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Cindy Swearingen

Hi Michael,

For a RAIDZ pool, the zpool list command identifies the inflated space
for the storage pool, which is the physical available space without
accounting for redundancy overhead.

The zfs list command identifies how much actual pool space is available
to the file systems.

See the example below of a RAIDZ-2 pool created with three 44 GB disks.
The total pool capacity reported by zpool list is 134 GB. The amount of
pool space that is available to the file systems is 43.8 GB due to
RAIDZ-2 redundancy overhead.

See this FAQ section for more information.

http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq#HZFSAdministrationQuestions

Why doesn't the space that is reported by the zpool list command and the 
zfs list command match?


Although this site is dog-slow for me today...

Thanks,

Cindy

# zpool create xpool raidz2 c3t40d0 c3t40d1 c3t40d2
# zpool list xpool
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
xpool   134G   234K   134G     0%  ONLINE  -
# zfs list xpool
NAME    USED  AVAIL  REFER  MOUNTPOINT
xpool  73.2K  43.8G  20.9K  /xpool
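
Applying the same reasoning to your xpool (rough numbers, so treat this as an
estimate): the three disks contribute 408G of raw space, which is what zpool
list shows. With RAIDZ-2 across three disks only one disk's worth of that can
hold data, so about 408 / 3 = 136G is usable. The 171G ALLOC in zpool list is
also raw space, i.e. roughly 171 / 3 = 57G of actual data, and 136 - 57 = 79G,
which, less a few percent of metadata overhead, is the ~76G AVAIL that your
zfs list output reports.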


On 03/15/10 08:38, Michael Hassey wrote:

Sorry if this is too basic -

So I have a single zpool in addition to the rpool, called xpool.

NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool   136G   109G  27.5G    79%  ONLINE  -
xpool   408G   171G   237G    42%  ONLINE  -

I have 408 in the pool, am using 171 leaving me 237 GB. 


The pool is built up as;

  pool: xpool
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
xpool   ONLINE   0 0 0
  raidz2ONLINE   0 0 0
c8t1d0  ONLINE   0 0 0
c8t2d0  ONLINE   0 0 0
c8t3d0  ONLINE   0 0 0

errors: No known data errors


But - and here is the question -

Creating file systems on it, and the file systems in play report only 76GB of 
space free

SNIP FROM ZFS LIST

xpool/zones/logserver/ROOT/zbe 975M  76.4G   975M  legacy
xpool/zones/openxsrvr 2.22G  76.4G  21.9K  /export/zones/openxsrvr
xpool/zones/openxsrvr/ROOT2.22G  76.4G  18.9K  legacy
xpool/zones/openxsrvr/ROOT/zbe2.22G  76.4G  2.22G  legacy
xpool/zones/puggles241M  76.4G  21.9K  /export/zones/puggles
xpool/zones/puggles/ROOT   241M  76.4G  18.9K  legacy
xpool/zones/puggles/ROOT/zbe   241M  76.4G   241M  legacy
xpool/zones/reposerver 299M  76.4G  21.9K  /export/zones/reposerver


So my question is, where is the space from xpool being used? or is it?


Thanks for reading.

Mike.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Posible newbie question about space between zpool and zf

2010-03-15 Thread Michael Hassey
That solved it.

Thank you Cindy.

zpool list NOT deducting the raidz overhead is what threw me...


Thanks again.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Ware Adams

On Mar 15, 2010, at 10:55 AM, Gabriele Bulfon wrote:

 - In this case, the storage appliance is a legacy system based on linux, so 
 raids/mirrors are managed at the storage side its own way. Being an iscsi 
 target, this volume was mounted as a single iscsi disk from the solaris host, 
 and prepared as a zfs pool consisting of this single iscsi target. ZFS best 
 practices, tell me that to be safe in case of corruption, pools should always 
 be mirrors or raidz on 2 or more disks. In this case, I considered all safe, 
 because the mirror and raid was managed by the storage machine. But from the 
 solaris host point of view, the pool was just one! And maybe this has been 
 the point of failure. What is the correct way to go in this case?

I'd guess this could be because the iscsi target wasn't honoring ZFS flush 
requests.

 - Finally, looking forward to run new storage appliances using OpenSolaris 
 and its ZFS+iscsitadm and/or comstar, I feel a bit confused by the 
 possibility of having a double zfs situation: in this case, I would have the 
 storage zfs filesystem divided into zfs volumes, accessed via iscsi by a 
 possible solaris host that creates his own zfs pool on it (...is it too 
 redundant??) and again I would fall in the same previous case (host zfs pool 
 connected to one only iscsi resource).

My experience with this is significantly lower end, but I have had iSCSI shares 
from a ZFS NAS come up as corrupt to the client.  It's fixable if you have 
snapshots.

I've been using iSCSI to provide Time Machine targets to OS X boxes.  We had a 
client crash during writing, and upon reboot it showed the iSCSI volume as 
corrupt.  You can put whatever file system you like on the iSCSI target, 
obviously.  The current OpenSolaris iSCSI implementation, I believe, uses 
synchronous writes, so hopefully what happened to you wouldn't happen in this 
case.

In my case I was using HFS+ (the OS X client requires it), and I couldn't 
repair the volume.  However, with a snapshot I could roll it back.  If you plan 
ahead this should save you some restoration work (you'll need to be able to 
roll back all the files that have to be consistent).
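
For what it's worth, the roll back itself is just ordinary snapshot handling on 
the zvol backing the iSCSI LUN -- a sketch with made-up dataset names:

# zfs snapshot tank/tm-target@nightly     (taken ahead of time, e.g. from cron)
# zfs rollback tank/tm-target@nightly     (after the client declares the volume corrupt)

The client then sees the volume exactly as it was at snapshot time; everything 
written since is gone, which is why all the files that have to stay consistent 
need to be covered by the same snapshot.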

Good luck,
Ware
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Ross Walker
On Mar 15, 2010, at 10:55 AM, Gabriele Bulfon gbul...@sonicle.com  
wrote:



Hello,
I'd like to check for any guidance about using zfs on iscsi storage  
appliances.
Recently I had an unlucky situation with an unlucky storage machine  
freezing.
Once the storage was up again (rebooted) all other iscsi clients  
were happy, while one of the iscsi clients (a sun solaris sparc,  
running Oracle) did not mount the volume marking it as corrupted.
I had no way to get back my zfs data: had to destroy and recreate  
from backups.

So I have some questions regarding this nice story:
- I remember sysadmins being able to almost always recover data on  
corrupted ufs filesystems by magic of superblocks. Is there  
something similar on zfs? Is there really no way to access data of a  
corrupted zfs filesystem?
- In this case, the storage appliance is a legacy system based on  
linux, so raids/mirrors are managed at the storage side its own way.  
Being an iscsi target, this volume was mounted as a single iscsi  
disk from the solaris host, and prepared as a zfs pool consisting of  
this single iscsi target. ZFS best practices, tell me that to be  
safe in case of corruption, pools should always be mirrors or raidz  
on 2 or more disks. In this case, I considered all safe, because the  
mirror and raid was managed by the storage machine. But from the  
solaris host point of view, the pool was just one! And maybe this  
has been the point of failure. What is the correct way to go in this  
case?
- Finally, looking forward to run new storage appliances using  
OpenSolaris and its ZFS+iscsitadm and/or comstar, I feel a bit  
confused by the possibility of having a double zfs situation: in  
this case, I would have the storage zfs filesystem divided into zfs  
volumes, accessed via iscsi by a possible solaris host that creates  
his own zfs pool on it (...is it too redundant??) and again I would  
fall in the same previous case (host zfs pool connected to one only  
iscsi resource).


Any guidance would be really appreciated :)
Thanks a lot
Gabriele.


What iSCSI target was this?

If it was IET I hope you were NOT using the write-back option on it as  
it caches write data in volatile RAM.


IET does support cache flushes, but if you cache in RAM (bad idea) a 
system lockup or panic will ALWAYS lose data.
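
For illustration, the relevant bit of /etc/ietd.conf looks something like this 
(from memory, so check the ietd.conf man page for your IET version; the names 
are made up):

Target iqn.2010-03.example.com:storage.lun0
    Lun 0 Path=/dev/vg0/lun0,Type=blockio
    Lun 1 Path=/srv/iscsi/lun1.img,Type=fileio,IOMode=wb

Type=blockio pushes writes straight to the block layer, while fileio with 
IOMode=wb keeps them in the page cache -- that is the write-back mode I mean.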


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Gabriele Bulfon
Well, I actually don't know which implementation is inside this legacy machine.
The machine is an AMI StoreTrends ITX; maybe it has been built around IET, I 
don't know.
Should I perhaps disable write-back on every zfs host connecting over iscsi?
How do I check this?

Thx
Gabriele.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Ware Adams

On Mar 15, 2010, at 12:13 PM, Gabriele Bulfon wrote:

 Well, I actually don't know what implementation is inside this legacy machine.
 This machine is an AMI StoreTrends ITX, but maybe it has been built around 
 IET, don't know.
 Well, maybe I should disable write-back on every zfs host connecting on iscsi?
 How do I check this?

I think this would be a property of the NAS, not the clients.

--Ware
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Ross Walker
On Mar 15, 2010, at 12:19 PM, Ware Adams rwali...@washdcmail.com  
wrote:




On Mar 15, 2010, at 12:13 PM, Gabriele Bulfon wrote:

Well, I actually don't know what implementation is inside this  
legacy machine.
This machine is an AMI StoreTrends ITX, but maybe it has been built  
around IET, don't know.
Well, maybe I should disable write-back on every zfs host  
connecting on iscsi?

How do I check this?


I think this would be a property of the NAS, not the clients.


Yes, Ware's right, the setting should be on the AMI device.

I don't know what target it's using either, but if it has an option to 
disable write-back caching, then even if it doesn't honor cache flushes 
your data should still be safe.
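
For comparison, on a COMSTAR-based OpenSolaris target the equivalent knob is 
the write-cache-disable property of the logical unit -- a sketch, with a 
made-up LU GUID:

# stmfadm list-lu -v
# stmfadm modify-lu -p wcd=true 600144F0C0FFEE0000004B9E00010001

With wcd=true the LU reports its write cache as disabled, so writes are put on 
stable storage before they are acknowledged.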


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR 6880994 and pkg fix

2010-03-15 Thread David Dyer-Bennet

On Sun, March 14, 2010 13:54, Frank Middleton wrote:


 How can it even be remotely possible to get a checksum failure on mirrored
 drives
 with copies=2? That means all four copies were corrupted? Admittedly this
 is
 on a grotty PC with no ECC and flaky bus parity, but how come the same
 file always
 gets flagged as being clobbered (even though apparently it isn't).

 The oddest part is that libdlpi.so.1 doesn't actually seem to be
 corrupted. nm lists
 it with no problem and you can copy it to /tmp, rename it, and then copy
 it back.
 objdump and readelf can all process this library with no problem. But pkg
 fix
 flags an error in its own inscrutable way. CCing pkg-discuss in case a
 pkg guru
 can shed any light on what the output of pkg fix (below) means.
 Presumably libc
 is OK, or it wouldn't boot :-).

This sounds really bizarre.

One detail suggestion on checking what's going on (since I don't have a
clue towards a real root-cause determination): get an md5 sum of a clean
copy of the file, say from a new install or something, and check the
allegedly-corrupted copy against that.  This can fairly easily give you a
pretty reliable indication of whether the file is truly corrupted or not.
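
On Solaris that comparison is one-liner territory -- something along these 
lines (paths are just an example):

# digest -a md5 /lib/libdlpi.so.1
# digest -a md5 /tmp/libdlpi.so.1.from-clean-install

If the two hashes match, then pkg fix is presumably complaining about something 
other than the file contents (ownership, mode, or its own metadata).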
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool reporting consistent read errors

2010-03-15 Thread David Dyer-Bennet

On Mon, March 15, 2010 00:54, no...@euphoriq.com wrote:
 I'm running a raidz1 with 3 Samsung 1.5TB drives.  Every time I scrub the
 pool I get multiple read errors, no write errors and no checksum errors on
 one drive (always the same drive, and no data loss).

 I've changed cables, changed the sata ports the drives are attached to, I
 always get the same outcome.  The drives are new.  Is this likely a drive
 problem?

Given what you've already changed, it's sounding like it could well be a
drive problem.  The one other thing that comes to mind is power to the
drive.
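
Before swapping more hardware it may also be worth looking at what the drive 
and the fault manager report -- a sketch, with the device name as a 
placeholder:

# iostat -En c7t3d0
# fmdump -eV

iostat -En shows the per-device hard/soft/transport error counters, and 
fmdump -eV dumps the underlying error telemetry, which usually makes it clearer 
whether the disk or the path to it is the culprit.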

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-15 Thread Scott Meilicke
Greg, I am using NetBackup 6.5.3.1 (7.x is out) with fine results. Nice and 
fast.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-15 Thread Greg
Hey Scott, 
Thanks for the information. I doubt I can drop that kind of cash, but back to 
getting bacula working!

Thanks again,
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool reporting consistent read errors

2010-03-15 Thread no...@euphoriq.com
Wow.  I never thought about it.  I changed the power supply to a cheap one a 
while back (a now seemingly foolish effort to save money) - it could be the 
issue.  I'll change it back and let you know.

Thanks
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool reporting consistent read errors

2010-03-15 Thread Svein Skogen
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 15.03.2010 21:13, no...@euphoriq.com wrote:
 Wow.  I never thought about it.  I changed the power supply to a cheap one a 
 while back (a now seemingly foolish effort to save money) - it could be the 
 issue.  I'll change it back and let you know.

cheap power supplies rarely are.  ;)

It's been my experience that if you overengineer the psu a bit, the
efficiency of the PSU increases (it's no longer pushing 100% of its
rated spec) and actually the consumed power (on the 220v side) drops.

//Svein

- -- 
- +---+---
  /\   |Svein Skogen   | sv...@d80.iso100.no
  \ /   |Solberg Østli 9| PGP Key:  0xE5E76831
   X|2020 Skedsmokorset | sv...@jernhuset.no
  / \   |Norway | PGP Key:  0xCE96CE13
|   | sv...@stillbilde.net
 ascii  |   | PGP Key:  0x58CD33B6
 ribbon |System Admin   | svein-listm...@stillbilde.net
Campaign|stillbilde.net | PGP Key:  0x22D494A4
+---+---
|msn messenger: | Mobile Phone: +47 907 03 575
|sv...@jernhuset.no | RIPE handle:SS16503-RIPE
- +---+---
 If you really are in a hurry, mail me at
   svein-mob...@stillbilde.net
 This mailbox goes directly to my cellphone and is checked
even when I'm not in front of my computer.
- 
 Picture Gallery:
  https://gallery.stillbilde.net/v/svein/
- 
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.12 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuemfgACgkQSBMQn1jNM7ZJ5gCghZuA3LnqkZnA54zddSlrkG6Y
MbcAoK8RU5td2Xx79q+Wmbztth7pB217
=pRID
-END PGP SIGNATURE-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] persistent L2ARC

2010-03-15 Thread Abdullah Al-Dahlawi
Greeting ALL


I understand that L2ARC is still under enhancement. Does anyone know if ZFS
can be upgraded to include a persistent L2ARC, i.e. an L2ARC that will not lose
its contents after a system reboot?
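
For context, this is how I am attaching the cache device today (the device 
name is just an example):

# zpool add tank cache c2t0d0
# zpool iostat -v tank

After every reboot the cache device shows up again in zpool iostat -v, but its 
contents start out empty and have to be warmed up from scratch.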




-- 
Abdullah Al-Dahlawi
George Washington University
Department of Electrical & Computer Engineering

Check The Fastest 500 Super Computers Worldwide
http://www.top500.org/list/2009/11/100
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] pool causes kernel panic, recursive mutex enter, 134

2010-03-15 Thread Mark
hi,
I've been using OpenSolaris for about 2 years with a mirrored rpool and a data 
pool with 3 x 2 (mirrored) drives.
The data pool drives are connected to SIL PCI Express cards.

Yesterday I updated from 130 to 134. Everything seemed to be fine, and I also 
replaced one pair of mirrored drives with larger disks.
Still no problems; I did some tests, rebooted a few times, checked the logs, 
nothing special.

Today I started copying a larger amount of data. While copying, at about 40 GB, 
OpenSolaris gave me the first kernel panic ever seen on this system. The system 
rebooted and, while mounting the data pool, you may guess it, panicked again.

What I have done so far in trying to get it up again:

booted without the data drives, tried to import the pool manually and with 
-F -n (non-destructive, as the manual says)
tried to import normally with different combinations of mirrors taken offline, 
so that there is only a single drive left in each mirror. Same panic.

I still have the drives that I replaced with the newer ones, but I believe they 
are useless since the pool structure has changed?

The kernel panic I get is cpu(0) recursive mutex enter, plus several lines of 
SIL driver errors.
I also tried booting into the previous BE (130, from before the update, when 
the pools never had an error) - same panic.

ANY ideas for volume rescue are welcome - if I missed some important 
information, please tell me.
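
For completeness, the import attempts looked roughly like this (typed from 
memory, so the exact option spelling may be off; the pool name is a 
placeholder):

# zpool import                 (shows the pool and its config)
# zpool import -F -n data      (recovery dry run, the non-destructive one)
# zpool import data            (the normal import -- this is what panics)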
regards, mark
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool reporting consistent read errors

2010-03-15 Thread David Dyer-Bennet

On Mon, March 15, 2010 15:35, Svein Skogen wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On 15.03.2010 21:13, no...@euphoriq.com wrote:
 Wow.  I never thought about it.  I changed the power supply to a cheap
 one a while back (a now seemingly foolish effort to save money) - it
 could be the issue.  I'll change it back and let you know.

 cheap power supplies rarely are.  ;)

I've had all types fail on me.  I think I've had more power supplies than
disk drives fail on me, even.

And they can produce the most *amazing* range of symptoms, if they don't
fail completely.  Quite remarkable.

 It's been my experience that if you overengineer the psu a bit, the
 efficiency of the PSU increases (it's no longer pushing 100% of its
 rated spec) and actually the consumed power (on the 220v side) drops.

Strangely enough, running up to the limit is hard on components, yes.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pool causes kernel panic, recursive mutex enter, 134

2010-03-15 Thread Mark
some screenshots that may help:

 pool: tank
id: 5649976080828524375
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        data         ONLINE
          mirror-0   ONLINE
            c27t2d0  ONLINE
            c27t0d0  ONLINE
          mirror-1   ONLINE
            c27t3d0  ONLINE
            c29t1d0  ONLINE
          mirror-2   ONLINE
            c27t1d0  ONLINE
            c29t0d0  ONLINE



Mar 15 21:42:50 solaris1.local ^Mpanic[cpu0]/thread=d6792f00:
Mar 15 21:42:50 solaris1.local genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=d76d3658 addr=34 occurred in module zfs due to a NULL pointer dereference
Mar 15 21:42:50 solaris1.local unix: [ID 10 kern.notice]
Mar 15 21:42:50 solaris1.local unix: [ID 839527 kern.notice] syseventd:
Mar 15 21:42:50 solaris1.local unix: [ID 753105 kern.notice] #pf Page fault
Mar 15 21:42:50 solaris1.local unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x34
Mar 15 21:42:50 solaris1.local unix: [ID 243837 kern.notice] pid=93, pc=0xf924b97e, sp=0xd76d36c4, eflags=0x10282
Mar 15 21:42:50 solaris1.local unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
Mar 15 21:42:50 solaris1.local unix: [ID 624947 kern.notice] cr2: 34
Mar 15 21:42:50 solaris1.local unix: [ID 625075 kern.notice] cr3: 2ead020
Mar 15 21:42:50 solaris1.local unix: [ID 10 kern.notice]
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] gs: d76d01b0  fs: 0  es: cb0160  ds: e31a0160
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] edi: 0  esi: de581350  ebp: d76d36a4  esp: d76d3690
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] ebx: 0  edx: b  ecx: 0  eax: 0
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice] trp: e  err: 0  eip: f924b97e  cs: 158
Mar 15 21:42:50 solaris1.local unix: [ID 717149 kern.notice] efl: 10282  usp: d76d36c4  ss: f924b9c6
Mar 15 21:42:50 solaris1.local unix: [ID 10 kern.notice]
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3594 unix:die+93 (e, d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3644 unix:trap+1449 (d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3658 unix:cmntrap+7c (d76d01b0, 0, cb0160)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36a4 zfs:vdev_is_dead+6 (0, 0, cb36a7, e31ad)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36c4 zfs:vdev_readable+e (0, 1, 0, fe96c13d)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3704 zfs:vdev_mirror_child_select+55 (dedc6560, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice]

Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Tonmaus
Hi Cindy,
trying to reproduce this 

 For a RAIDZ pool, the zpool list command identifies
 the inflated space
 for the storage pool, which is the physical available
 space without an
 accounting for redundancy overhead.
 
 The zfs list command identifies how much actual pool
 space is available
 to the file systems.

I am lacking 1 TB on my pool:

u...@filemeister:~$ zpool list daten
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
daten   10T  3,71T  6,29T    37%  1.00x  ONLINE  -
u...@filemeister:~$ zpool status daten
  pool: daten
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
daten ONLINE   0 0 0
  raidz2-0ONLINE   0 0 0
c10t2d0   ONLINE   0 0 0
c10t3d0   ONLINE   0 0 0
c10t4d0   ONLINE   0 0 0
c10t5d0   ONLINE   0 0 0
c10t6d0   ONLINE   0 0 0
c10t7d0   ONLINE   0 0 0
c10t8d0   ONLINE   0 0 0
c10t9d0   ONLINE   0 0 0
c11t18d0  ONLINE   0 0 0
c11t19d0  ONLINE   0 0 0
c11t20d0  ONLINE   0 0 0
spares
  c11t21d0AVAIL

errors: No known data errors
u...@filemeister:~$ zfs list daten
NAME    USED  AVAIL  REFER  MOUNTPOINT
daten  3,01T  4,98T   110M  /daten

I am counting 11 disks of 1 TB each in a raidz2 pool. That is 11 TB gross 
capacity, and 9 TB net. zpool is however stating 10 TB and zfs is stating 8 TB. 
The difference between net and gross is correct, but where is the capacity of 
the 11th disk going?

Regards,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Carson Gaspar

Tonmaus wrote:


I am lacking 1 TB on my pool:

u...@filemeister:~$ zpool list daten
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
daten   10T  3,71T  6,29T    37%  1.00x  ONLINE  -
u...@filemeister:~$ zpool status daten
  pool: daten
 state: ONLINE
 scrub: none requested
config:

NAME          STATE     READ WRITE CKSUM
daten         ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    c10t2d0   ONLINE       0     0     0
    c10t3d0   ONLINE       0     0     0
    c10t4d0   ONLINE       0     0     0
    c10t5d0   ONLINE       0     0     0
    c10t6d0   ONLINE       0     0     0
    c10t7d0   ONLINE       0     0     0
    c10t8d0   ONLINE       0     0     0
    c10t9d0   ONLINE       0     0     0
    c11t18d0  ONLINE       0     0     0
    c11t19d0  ONLINE       0     0     0
    c11t20d0  ONLINE       0     0     0
spares
  c11t21d0    AVAIL

errors: No known data errors
u...@filemeister:~$ zfs list daten
NAME    USED  AVAIL  REFER  MOUNTPOINT
daten  3,01T  4,98T   110M  /daten

I am counting 11 disks 1 TB each in a raidz2 pool. This is 11 TB
gross capacity, and 9 TB net. Zpool is however stating 10 TB and zfs
is stating 8TB. The difference between net and gross is correct, but
where is the capacity from the 11th disk going?


My guess is unit conversion and rounding. Your pool has 11 base 10 TB, 
which is 10.2445 base 2 TiB.


Likewise your fs has 9 base 10 TB, which is 8.3819 base 2 TiB.

--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Erik Trimble
On Mon, 2010-03-15 at 15:03 -0700, Tonmaus wrote:
 Hi Cindy,
 trying to reproduce this 
 
  For a RAIDZ pool, the zpool list command identifies
  the inflated space
  for the storage pool, which is the physical available
  space without an
  accounting for redundancy overhead.
  
  The zfs list command identifies how much actual pool
  space is available
  to the file systems.
 
 I am lacking 1 TB on my pool:
 
 u...@filemeister:~$ zpool list daten
 NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
 daten   10T  3,71T  6,29T    37%  1.00x  ONLINE  -
 u...@filemeister:~$ zpool status daten
   pool: daten
  state: ONLINE
  scrub: none requested
 config:
 
 NAME  STATE READ WRITE CKSUM
 daten ONLINE   0 0 0
   raidz2-0ONLINE   0 0 0
 c10t2d0   ONLINE   0 0 0
 c10t3d0   ONLINE   0 0 0
 c10t4d0   ONLINE   0 0 0
 c10t5d0   ONLINE   0 0 0
 c10t6d0   ONLINE   0 0 0
 c10t7d0   ONLINE   0 0 0
 c10t8d0   ONLINE   0 0 0
 c10t9d0   ONLINE   0 0 0
 c11t18d0  ONLINE   0 0 0
 c11t19d0  ONLINE   0 0 0
 c11t20d0  ONLINE   0 0 0
 spares
   c11t21d0AVAIL
 
 errors: No known data errors
 u...@filemeister:~$ zfs list daten
 NAME    USED  AVAIL  REFER  MOUNTPOINT
 daten  3,01T  4,98T   110M  /daten
 
 I am counting 11 disks 1 TB each in a raidz2 pool. This is 11 TB gross 
 capacity, and 9 TB net. Zpool is however stating 10 TB and zfs is stating 
 8TB. The difference between net and gross is correct, but where is the 
 capacity from the 11th disk going?
 
 Regards,
 
 Tonmaus

1TB disks aren't a full binary terabyte.

Remember, the storage industry uses powers of 10, not powers of 2.  It's
annoying.

For each GB, you lose about 7% in the actual space computation. For each TB,
it's about 9%. So, each of your 1TB disks is actually about 931 GB.

'zfs list' is going to report in actual powers of 2, just like df.


In my case, I have a 12 x 1TB configuration, and zpool list shows:


# zpool list
NAME        SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
array2540  10.9T  5.46T  5.41T    50%  ONLINE  -

Likewise:

# zfs list
NAME        USED  AVAIL  REFER  MOUNTPOINT
array2540  4.53T  4.34T  80.4M  /data


So, here's the math:

1 storage TB = 1e12 / (1024^3) = 931 actual GB

931 GB x 12 = 11,172 GB
but, 1TB = 1024 GB
so:  931 GB x 12 / (1024) = 10.9TB.


Quick Math: 1 TB of advertised space = 0.91 TB of real space
1 GB of advertised space = 0.93 GB of real space





-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Erik Trimble
On Mon, 2010-03-15 at 15:40 -0700, Carson Gaspar wrote:
 Tonmaus wrote:
 
  I am lacking 1 TB on my pool:
  
  u...@filemeister:~$ zpool list daten
  NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
  daten   10T  3,71T  6,29T    37%  1.00x  ONLINE  -
  u...@filemeister:~$ zpool status daten
    pool: daten
   state: ONLINE
   scrub: none requested
  config:

  NAME          STATE     READ WRITE CKSUM
  daten         ONLINE       0     0     0
    raidz2-0    ONLINE       0     0     0
      c10t2d0   ONLINE       0     0     0
      c10t3d0   ONLINE       0     0     0
      c10t4d0   ONLINE       0     0     0
      c10t5d0   ONLINE       0     0     0
      c10t6d0   ONLINE       0     0     0
      c10t7d0   ONLINE       0     0     0
      c10t8d0   ONLINE       0     0     0
      c10t9d0   ONLINE       0     0     0
      c11t18d0  ONLINE       0     0     0
      c11t19d0  ONLINE       0     0     0
      c11t20d0  ONLINE       0     0     0
  spares
    c11t21d0    AVAIL

  errors: No known data errors
  u...@filemeister:~$ zfs list daten
  NAME    USED  AVAIL  REFER  MOUNTPOINT
  daten  3,01T  4,98T   110M  /daten
  
  I am counting 11 disks 1 TB each in a raidz2 pool. This is 11 TB
  gross capacity, and 9 TB net. Zpool is however stating 10 TB and zfs
  is stating 8TB. The difference between net and gross is correct, but
  where is the capacity from the 11th disk going?
 
 My guess is unit conversion and rounding. Your pool has 11 base 10 TB, 
 which is 10.2445 base 2 TiB.
 
 Likewise your fs has 9 base 10 TB, which is 8.3819 base 2 TiB.

Not quite.  

11 x 10^12 =~ 10.004 x (1024^4).

So, the 'zpool list' is right on, at 10T available.

For the 'zfs list', remember there is a slight overhead for filesystem
formatting. 

So, instead of 

9 x 10^12 =~ 8.185 x (1024^4)

it shows 7.99TB usable. The roughly 200GB is the overhead. (or, about
3%).




-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Tonmaus
 Being an iscsi
 target, this volume was mounted as a single iscsi
 disk from the solaris host, and prepared as a zfs
 pool consisting of this single iscsi target. ZFS best
 practices, tell me that to be safe in case of
 corruption, pools should always be mirrors or raidz
 on 2 or more disks. In this case, I considered all
 safe, because the mirror and raid was managed by the
 storage machine. 

As far as I understand the Best Practices, redundancy needs to be within zfs in 
order to provide full protection. So the Best Practices actually say that your 
scenario is rather one to be avoided.

Regards,
Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Tonmaus
 My guess is unit conversion and rounding. Your pool
  has 11 base 10 TB, 
  which is 10.2445 base 2 TiB.
  
 Likewise your fs has 9 base 10 TB, which is 8.3819
  base 2 TiB.
 Not quite.  
 
 11 x 10^12 =~ 10.004 x (1024^4).
 
 So, the 'zpool list' is right on, at 10T available.

Duh! I completely forgot about this. Thanks for the heads-up.

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems

2010-03-15 Thread Carson Gaspar

Someone wrote (I haven't seen the mail, only the unattributed quote):

My guess is unit conversion and rounding. Your pool
 has 11 base 10 TB, 
 which is 10.2445 base 2 TiB.
 
Likewise your fs has 9 base 10 TB, which is 8.3819

 base 2 TiB.


Not quite.  


11 x 10^12 =~ 10.004 x (1024^4).

So, the 'zpool list' is right on, at 10T available.


Duh, I was doing GiB math (y = x * 10^9 / 2^20), not TiB math (y = x * 
10^12 / 2^40).


Thanks for the correction.

--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Tim Cook
On Mon, Mar 15, 2010 at 9:55 AM, Gabriele Bulfon gbul...@sonicle.com wrote:

 Hello,
 I'd like to check for any guidance about using zfs on iscsi storage
 appliances.
 Recently I had an unlucky situation with an unlucky storage machine
 freezing.
 Once the storage was up again (rebooted) all other iscsi clients were
 happy, while one of the iscsi clients (a sun solaris sparc, running Oracle)
 did not mount the volume marking it as corrupted.
 I had no way to get back my zfs data: had to destroy and recreate from
 backups.
 So I have some questions regarding this nice story:
 - I remember sysadmins being able to almost always recover data on
 corrupted ufs filesystems by magic of superblocks. Is there something
 similar on zfs? Is there really no way to access data of a corrupted zfs
 filesystem?
 - In this case, the storage appliance is a legacy system based on linux, so
 raids/mirrors are managed at the storage side its own way. Being an iscsi
 target, this volume was mounted as a single iscsi disk from the solaris
 host, and prepared as a zfs pool consisting of this single iscsi target. ZFS
 best practices, tell me that to be safe in case of corruption, pools should
 always be mirrors or raidz on 2 or more disks. In this case, I considered
 all safe, because the mirror and raid was managed by the storage machine.
 But from the solaris host point of view, the pool was just one! And maybe
 this has been the point of failure. What is the correct way to go in this
 case?
 - Finally, looking forward to run new storage appliances using OpenSolaris
 and its ZFS+iscsitadm and/or comstar, I feel a bit confused by the
 possibility of having a double zfs situation: in this case, I would have the
 storage zfs filesystem divided into zfs volumes, accessed via iscsi by a
 possible solaris host that creates his own zfs pool on it (...is it too
 redundant??) and again I would fall in the same previous case (host zfs pool
 connected to one only iscsi resource).

 Any guidance would be really appreciated :)
 Thanks a lot
 Gabriele.


To answer the other portion of your question: yes, you can roll the pool back
if you're at the proper version.  The procedure is described at the link below;
essentially, it will try to find the last known good transaction.  If that
doesn't work, your only remaining option is to restore from backup:
http://docs.sun.com/app/docs/doc/817-2271/gbctt?l=jaa=view
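
In practice that procedure boils down to the pool-recovery options -- a sketch,
assuming a recent enough zpool version and a pool named tank:

# zpool clear -F tank        (if the damaged pool is still imported)
# zpool import -F -n tank    (if it is exported: preview how far it would rewind)
# zpool import -F tank       (discard the last few transactions and import)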

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] persistent L2ARC

2010-03-15 Thread Giovanni Tirloni
On Mon, Mar 15, 2010 at 5:39 PM, Abdullah Al-Dahlawi dahl...@ieee.org wrote:

 Greeting ALL


 I understand that L2ARC is still under enhancement. Does any one know if
 ZFS can be upgrades to include Persistent L2ARC, ie. L2ARC will not loose
 its contents after system reboot ?


There is a bug/RFE open for that, but it doesn't seem to have been implemented yet.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6662467

-- 
Giovanni
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Ross Walker

On Mar 15, 2010, at 7:11 PM, Tonmaus sequoiamo...@gmx.net wrote:


Being an iscsi
target, this volume was mounted as a single iscsi
disk from the solaris host, and prepared as a zfs
pool consisting of this single iscsi target. ZFS best
practices, tell me that to be safe in case of
corruption, pools should always be mirrors or raidz
on 2 or more disks. In this case, I considered all
safe, because the mirror and raid was managed by the
storage machine.


As far as I understand Best Practises, redundancy needs to be within  
zfs in order to provide full protection. So, actually Best Practises  
says that your scenario is rather one to be avoided.


There is nothing saying redundancy can't be provided below ZFS; it's just 
that if you want auto recovery you need redundancy within ZFS itself as well.

You can have 2 separate raid arrays served up via iSCSI to ZFS, which then 
makes a mirror out of the storage.
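
A sketch of that layout, with made-up device names for the two iSCSI LUNs as 
they appear on the Solaris host:

# zpool create tank mirror c4t0d0 c5t0d0

Each side of the mirror lives on a different array, so ZFS has a second copy 
to heal from when one array returns bad or stale data.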


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Tim Cook
On Mon, Mar 15, 2010 at 9:10 PM, Ross Walker rswwal...@gmail.com wrote:

 On Mar 15, 2010, at 7:11 PM, Tonmaus sequoiamo...@gmx.net wrote:

  Being an iscsi
 target, this volume was mounted as a single iscsi
 disk from the solaris host, and prepared as a zfs
 pool consisting of this single iscsi target. ZFS best
 practices, tell me that to be safe in case of
 corruption, pools should always be mirrors or raidz
 on 2 or more disks. In this case, I considered all
 safe, because the mirror and raid was managed by the
 storage machine.


 As far as I understand Best Practises, redundancy needs to be within zfs
 in order to provide full protection. So, actually Best Practises says that
 your scenario is rather one to be avoided.


 There is nothing saying redundancy can't be provided below ZFS just if you
 want auto recovery you need redundancy within ZFS itself as well.

 You can have 2 separate raid arrays served up via iSCSI to ZFS which then
 makes a mirror out of the storage.

 -Ross


Perhaps I'm remembering incorrectly, but I didn't think mirroring would
auto-heal/recover, I thought that was limited to the raidz* implementations.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corruption of ZFS on iScsi storage

2010-03-15 Thread Ross Walker

On Mar 15, 2010, at 11:10 PM, Tim Cook t...@cook.ms wrote:




On Mon, Mar 15, 2010 at 9:10 PM, Ross Walker rswwal...@gmail.com  
wrote:

On Mar 15, 2010, at 7:11 PM, Tonmaus sequoiamo...@gmx.net wrote:

Being an iscsi
target, this volume was mounted as a single iscsi
disk from the solaris host, and prepared as a zfs
pool consisting of this single iscsi target. ZFS best
practices, tell me that to be safe in case of
corruption, pools should always be mirrors or raidz
on 2 or more disks. In this case, I considered all
safe, because the mirror and raid was managed by the
storage machine.

As far as I understand Best Practises, redundancy needs to be within  
zfs in order to provide full protection. So, actually Best Practises  
says that your scenario is rather one to be avoided.


There is nothing saying redundancy can't be provided below ZFS just  
if you want auto recovery you need redundancy within ZFS itself as  
well.


You can have 2 separate raid arrays served up via iSCSI to ZFS which  
then makes a mirror out of the storage.


-Ross


Perhaps I'm remembering incorrectly, but I didn't think mirroring  
would auto-heal/recover, I thought that was limited to the raidz*  
implementations.


Mirroring auto-heals; in fact copies=2 on a single-disk vdev can auto-heal 
(as long as it isn't a whole-disk failure).
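
For the copies case that is just a per-dataset property -- e.g.:

# zfs set copies=2 tank/important

Newly written blocks in that dataset are then stored twice (on different parts 
of the disk where possible), which gives the checksum code something to heal 
from even without a second device.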


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-15 Thread Richard Elling
On Mar 14, 2010, at 11:25 PM, Tonmaus wrote:
 Hello again,
 
 I am still concerned if my points are being well taken.
 
 If you are concerned that a
 single 200TB pool would take a long
 time to scrub, then use more pools and scrub in
 parallel.
 
 The main concern is not scrub time. Scrub time could be weeks if scrub just 
 would behave. You may imagine that there are applications where segmentation 
 is a pain point, too.

I agree.

 The scrub will queue no more
 than 10 I/Os at one time to a device, so devices which
 can handle concurrent I/O
 are not consumed entirely by scrub I/O. This could be
 tuned lower, but your storage 
 is slow and *any* I/O activity will be noticed.
 
 There are a couple of things I maybe don't understand, then.
 
 - zpool iostat is reporting more than 1k of outputs while scrub

ok

 - throughput is as high as can be until maxing out CPU

You would rather your CPU be idle?  What use is an idle CPU, besides wasting 
energy :-)?

 - nominal I/O capacity of a single device is still around 90, how can 10 I/Os 
 already bring down payload

90 IOPS is approximately the worst-case rate for a 7,200 rpm disk for a small, 
random
workload. ZFS tends to write sequentially, so random writes tend to become 
sequential
writes on ZFS. So it is quite common to see scrub workloads with more than 90 IOPS.

 - scrubbing the same pool, configured as raidz1 didn't max out CPU which is 
 no surprise (haha, slow storage...) the notable part is that it didn't slow 
 down payload that much either.

raidz creates more, smaller writes than a mirror or simple stripe. If the disks 
are slow,
then the IOPS will be lower and the scrub takes longer, but the I/O scheduler 
can
manage the queue better (disks are slower).

 - scrub is obviously fine with data added or deleted during a pass. So, it 
 could be possible to pause and resume a pass, couldn't it?

You can start or stop scrubs, but there is no resume directive.  There are several
bugs/RFEs along these lines, something like:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6743992

 My conclusion from these observations is that not only disk speed counts 
 here, but other bottlenecks may strike as well. Solving the issue by the 
 wallet is one way, solving it by configuration of parameters is another. So, 
 is there a lever for scrub I/O prio, or not? Is there a possibility to pause 
 a scrub pass and resume?

Scrub is already the lowest priority.  Would you like it to be lower?
I think the issue is more related to which queue is being managed by
the ZFS priority scheduler rather than the lack of scheduling priority.
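
If you really want to experiment, the knob behind that 10-I/O limit is the
zfs_scrub_limit tunable -- an untested sketch, and the name and default may
differ between builds, so treat this as an assumption to verify:

# echo zfs_scrub_limit/D | mdb -k        (show the current per-leaf-vdev limit)
# echo zfs_scrub_limit/W0t5 | mdb -kw    (drop it to 5 until the next reboot)

or persistently in /etc/system:

set zfs:zfs_scrub_limit = 5

But as noted above, lowering it mostly just makes the scrub take longer.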
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Atlanta, March 16-18, 2010 http://nexenta-atlanta.eventbrite.com 
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss