Re: New ZFSv28 patchset for 8-STABLE

2011-01-12 Thread Marcus Reid
On Mon, Jan 10, 2011 at 06:30:39PM +0100, Attila Nagy wrote:
> >why and we can't ask him now, I'm afraid. I just sent an e-mail to
>
> What happened to him?

Oops, I was thinking of something else.

http://valleywag.gawker.com/383763/freebsd-developer-kip-macy-arrested-for-tormenting-tenants

Marcus


Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

Link to mfsBSD ISO files for testing (i386 and amd64):
 http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-amd64.iso
 http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-i386.iso

The root password for the ISO files: "mfsroot"
The ISO files work on real systems and in virtualbox.
They contain a full install of FreeBSD 8.2-PRERELEASE with ZFS v28;
simply use the provided "zfsinstall" script.

The patch is against FreeBSD 8-STABLE as of 2010-12-15.

When applying the patch be sure to use correct options for patch(1)
and make sure the file sys/cddl/compat/opensolaris/sys/sysmacros.h gets
deleted:

 # cd /usr/src
 # fetch
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
 # xz -d stable-8-zfsv28-20101215.patch.xz
 # patch -E -p0<  stable-8-zfsv28-20101215.patch
 # rm sys/cddl/compat/opensolaris/sys/sysmacros.h

I've just got a panic:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/IMAGE_006.jpg

The panic line for google:
panic: solaris assert: task->ost_magic == TASKQ_MAGIC, file: 
/usr/src/sys/modules/zfs/../../cddl/compat/opensolaris/kern/opensolaris_taskq.c, 
line: 150


I hope this is enough for debugging, if the issue isn't already known. If 
not, I will try to catch it again and make a dump.
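A minimal sketch of one way to capture such a dump with the standard
FreeBSD tools (the swap device name below is only a placeholder):

 # echo 'dumpdev="AUTO"' >> /etc/rc.conf   # let savecore(8) pick the dump up at boot
 # dumpon /dev/ada0s1b                     # enable immediately; ada0s1b is a placeholder swap device
 # kgdb /boot/kernel/kernel /var/crash/vmcore.0   # after the next panic and reboot
 (kgdb) bt                                 # backtrace of the panicking thread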


Thanks,


Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

 On 01/10/2011 09:57 AM, Pawel Jakub Dawidek wrote:

On Sun, Jan 09, 2011 at 12:52:56PM +0100, Attila Nagy wrote:
[...]

I've finally found the time to read the v28 patch and figured out the
problem: vfs.zfs.l2arc_noprefetch was changed to 1, so it doesn't use
the prefetched data on the L2ARC devices.
This is a major hit in my case. Enabling this again restored the
previous hit rates and lowered the load on the hard disks significantly.

Well, not storing prefetched data on L2ARC vdevs is the default in
Solaris. For some reason it was changed by kmacy@ in r205231. Not sure
why, and we can't ask him now, I'm afraid. I just sent an e-mail to

What happened to him?

Brendan Gregg from Oracle who originally implemented L2ARC in ZFS why
this is turned off by default. Once I get answer we can think about
turning it on again.

I think it makes some sense as a crude way of preferring random IO over 
sequential IO in the L2ARC. But if I rely on auto-tuning and leave 
prefetch enabled, even a busy mail server will prefetch a lot of blocks, 
and I think that's a fine example of random IO (it also makes the 
system unusable, but that's another story).


Having this choice is good, and in this case enabling it makes sense 
for me. I don't know of any reason why you wouldn't use all of your 
L2ARC space (apart from sparing the quickly wearing flash and moving 
the disk heads instead), but I'm sure Brendan made this choice for a 
good reason.

If you get an answer, please tell us. :)

Thanks,


Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

 On 01/10/2011 10:02 AM, Pawel Jakub Dawidek wrote:

On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:

No, it's not related. One of the disks in the RAIDZ2 pool went bad:
(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
(da4:arcmsr0:0:4:0): SCSI status: Check Condition
(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read
error)
and it seems it froze the whole zpool. Removing the disk by hand solved
the problem.
I've seen this previously on other machines with ciss.
I wonder why ZFS didn't throw it out of the pool.

Such hangs happen when I/O never returns. ZFS doesn't time out I/O
requests on its own; that is the driver's responsibility. It is still
strange that the driver didn't pass an I/O error up to ZFS, or it might
as well be a ZFS bug, but I don't think so.

Indeed, it may be a controller/driver bug. The firmware released last 
December mentions a similar problem. I've upgraded, so we'll see whether 
it helps the next time a drive goes awry.
I've only seen these errors in dmesg, not in zpool status, where 
everything was clear (all zeroes).


BTW, I've swapped those bad drives (da4, which reported the above 
errors, and da16, which didn't report anything to the OS but was just 
plain bad according to the controller firmware; after its removal I 
could offline da4, so it seems to be the real cause, see my previous 
e-mail), and first ran zpool replace for da4, but after a few seconds 
of thinking, all IO on all disks ceased.
After waiting some minutes it was still the same, so I rebooted. 
Then I noticed that a scrub was going on, so I stopped it.
After that, the zpool replace for da4 went fine and it started to 
resilver the disk. But another zpool replace (for da16) caused the same 
problem: a few seconds of IO, then nothing, and it stayed stuck.


Has anybody tried replacing two drives simultaneously with the ZFS v28 
patch? (This is a stripe of two raidz2 vdevs, and da4 and da16 are in 
different raidz2 vdevs.)
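A minimal sketch of the command sequence in question, assuming the pool
is named "data" as in the zpool status output elsewhere in this thread,
and with da4new/da16new as hypothetical replacement device names:

 # zpool replace data da4 da4new      # first replace; resilver starts
 # zpool replace data da16 da16new    # second replace, the one that hung here
 # zpool status -v data               # watch the resilver progress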



Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Pawel Jakub Dawidek
On Sat, Dec 18, 2010 at 10:00:11AM +0100, Krzysztof Dajka wrote:
> Hi,
> I applied the patch against 8-STABLE as of the evening of 2010-12-16. I did what Martin asked:
> 
> On Thu, Dec 16, 2010 at 1:44 PM, Martin Matuska  wrote:
> >    # cd /usr/src
> >    # fetch
> > http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
> >    # xz -d stable-8-zfsv28-20101215.patch.xz
> >    # patch -E -p0 < stable-8-zfsv28-20101215.patch
> >    # rm sys/cddl/compat/opensolaris/sys/sysmacros.h
> >
> Patch applied cleanly.
> 
> #make buildworld
> #make buildkernel
> #make installkernel
> Reboot into single user mode.
> #mergemaster -p
> #make installworld
> #mergemaster
> Reboot.
> 
> 
> Rebooting with the old world and new kernel went fine. But after rebooting
> with the new world I got:
> ZFS: zfs_alloc()/zfs_free() mismatch
> just before loading kernel modules; after that my system hangs.

Could you tell me more about your pool configuration?
'zpool status' output might be helpful.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Pawel Jakub Dawidek
On Sun, Jan 09, 2011 at 12:52:56PM +0100, Attila Nagy wrote:
[...]
> I've finally found the time to read the v28 patch and figured out the 
> problem: vfs.zfs.l2arc_noprefetch was changed to 1, so it doesn't use 
> the prefetched data on the L2ARC devices.
> This is a major hit in my case. Enabling this again restored the 
> previous hit rates and lowered the load on the hard disks significantly.

Well, not storing prefetched data on L2ARC vdevs is the default in
Solaris. For some reason it was changed by kmacy@ in r205231. Not sure
why, and we can't ask him now, I'm afraid. I just sent an e-mail to
Brendan Gregg from Oracle who originally implemented L2ARC in ZFS why
this is turned off by default. Once I get answer we can think about
turning it on again.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Pawel Jakub Dawidek
On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:
> No, it's not related. One of the disks in the RAIDZ2 pool went bad:
> (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
> (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
> (da4:arcmsr0:0:4:0): SCSI status: Check Condition
> (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read 
> error)
> and it seems it froze the whole zpool. Removing the disk by hand solved 
> the problem.
> I've seen this previously on other machines with ciss.
> I wonder why ZFS didn't throw it out of the pool.

Such hangs happen when I/O never returns. ZFS doesn't time out I/O
requests on its own; that is the driver's responsibility. It is still
strange that the driver didn't pass an I/O error up to ZFS, or it might
as well be a ZFS bug, but I don't think so.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Jeremy Chadwick
On Sun, Jan 09, 2011 at 01:42:13PM +0100, Attila Nagy wrote:
>  On 01/09/2011 01:18 PM, Jeremy Chadwick wrote:
> >On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:
> >>  On 01/09/2011 10:00 AM, Attila Nagy wrote:
> >>>On 12/16/2010 01:44 PM, Martin Matuska wrote:
> Hi everyone,
> 
> following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
> providing a ZFSv28 testing patch for 8-STABLE.
> 
> Link to the patch:
> 
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
> 
> 
> >>>I've got an IO hang with dedup enabled (not sure it's related,
> >>>I've started to rewrite all data on pool, which makes a heavy
> >>>load):
> >>>
> >>>The processes are in various states:
> >>>65747   1001  1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
> >>>80383   1001  1  54   10 40616K 30196K select  1   5:38  0.00% rsync
> >>> 1501 www         1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
> >>> 1479 www         1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
> >>> 1477 www         1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
> >>> 1487 www         1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
> >>> 1490 www         1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
> >>> 1486 www         1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx
> >>>
> >>>And everything which wants to touch the pool is/becomes dead.
> >>>
> >>>Procstat says about one process:
> >>># procstat -k 1497
> >>>  PID    TID COMM             TDNAME           KSTACK
> >>> 1497 100257 nginx            -                mi_switch
> >>>sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock
> >>>VOP_LOCK1_APV _vn_lock nullfs_root lookup namei vn_open_cred
> >>>kern_openat syscallenter syscall Xfast_syscall
> >>No, it's not related. One of the disks in the RAIDZ2 pool went bad:
> >>(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
> >>(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
> >>(da4:arcmsr0:0:4:0): SCSI status: Check Condition
> >>(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered
> >>read error)
> >>and it seems it froze the whole zpool. Removing the disk by hand
> >>solved the problem.
> >>I've seen this previously on other machines with ciss.
> >>I wonder why ZFS didn't throw it out of the pool.
> >Hold on a minute.  An unrecoverable read error does not necessarily mean
> >the drive is bad, it could mean that the individual LBA that was
> >attempted to be read resulted in ASC 0x11 (MEDIUM ERROR) (e.g. a bad
> >block was encountered).  I would check SMART stats on the disk (since
> >these are probably SATA given use of arcmsr(4)) and provide those.
> >*That* will tell you if the disk is bad.  I'll help you decode the
> >attributes values if you provide them.
> You are right, and I gave incorrect information. There are a lot
> more errors for that disk in the logs, and the zpool was frozen.
> I tried to offline the given disk. That helped in the ciss case,
> where the symptom was the same or something similar: no IO for ages,
> then a small burst and nothing again for long seconds or minutes,
> with no errors logged. zpool status reported no errors, and dmesg
> was clear too.
> There I could find the bad disk by watching gstat output: when the
> very small amount of IO was done, one disk had response times well
> above a second, while the others responded quickly.
> There the zpool offline helped. Here it didn't; the command just
> hung, like everything else.
> So what I did then: I went into areca-cli and searched for errors.
> One disk was marked as failed and it seemed to be the cause. I removed
> it (and did a camcontrol rescan, though I'm not sure whether that was
> necessary), and suddenly the zpool offline finished and everything
> went back to normal.
> But there are two controllers in the system, and now I see that the
> above disk is on controller 1, while the one I removed is on controller 2.
> I was misled by their identical positions. So now I have an offlined
> disk (which produces read errors, but I couldn't see them in the
> zpool output) and another one, which is shown as failed by the RAID
> controller and was removed by hand (which resolved the situation):
> NAME STATE READ WRITE CKSUM
> data DEGRADED 0 0 0
>   raidz2-0   DEGRADED 0 0 0
> label/disk20-01  ONLINE   0 0 0
> label/disk20-02  ONLINE   0 0 0
> label/disk20-03  ONLINE   0 0 0
> label/disk20-04  ONLINE   0 0 0
> label/disk20-05  OFFLINE  0 0 0
> label/disk20-06  ONLINE   0 0 0
> label/disk20-07  ONLINE   0 0 0
> label/disk20-08  ONLINE   0 0 0
> label/disk20-09  ONLINE   0 0 0
> label/disk20-10  ONLINE   0 0 0

Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Rich
Once upon a time, this was a known problem with the arcmsr driver not
correctly interacting with ZFS, resulting in this behavior.

Since I'm presuming that the arcmsr driver update which was intended
to fix this behavior (in my case, at least) is in your nightly build,
it's probably worth pinging the arcmsr driver maintainer about this.

- Rich

On Sun, Jan 9, 2011 at 7:18 AM, Jeremy Chadwick wrote:
> On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:
>>  On 01/09/2011 10:00 AM, Attila Nagy wrote:
>> > On 12/16/2010 01:44 PM, Martin Matuska wrote:
>> >>Hi everyone,
>> >>
>> >>following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
>> >>providing a ZFSv28 testing patch for 8-STABLE.
>> >>
>> >>Link to the patch:
>> >>
>> >>http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>> >>
>> >>
>> >I've got an IO hang with dedup enabled (not sure it's related,
>> >I've started to rewrite all data on pool, which makes a heavy
>> >load):
>> >
>> >The processes are in various states:
>> >65747   1001      1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
>> >80383   1001      1  54   10 40616K 30196K select  1   5:38  0.00% rsync
>> > 1501 www         1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
>> > 1479 www         1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
>> > 1477 www         1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
>> > 1487 www         1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
>> > 1490 www         1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
>> > 1486 www         1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx
>> >
>> >And everything which wants to touch the pool is/becomes dead.
>> >
>> >Procstat says about one process:
>> ># procstat -k 1497
>> >  PID    TID COMM             TDNAME           KSTACK
>> > 1497 100257 nginx            -                mi_switch
>> >sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock
>> >VOP_LOCK1_APV _vn_lock nullfs_root lookup namei vn_open_cred
>> >kern_openat syscallenter syscall Xfast_syscall
>> No, it's not related. One of the disks in the RAIDZ2 pool went bad:
>> (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
>> (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
>> (da4:arcmsr0:0:4:0): SCSI status: Check Condition
>> (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered
>> read error)
>> and it seems it froze the whole zpool. Removing the disk by hand
>> solved the problem.
>> I've seen this previously on other machines with ciss.
>> I wonder why ZFS didn't throw it out of the pool.
>
> Hold on a minute.  An unrecoverable read error does not necessarily mean
> the drive is bad, it could mean that the individual LBA that was
> attempted to be read resulted in ASC 0x11 (MEDIUM ERROR) (e.g. a bad
> block was encountered).  I would check SMART stats on the disk (since
> these are probably SATA given use of arcmsr(4)) and provide those.
> *That* will tell you if the disk is bad.  I'll help you decode the
> attributes values if you provide them.
>
> My understanding is that a single LBA read failure should not warrant
> ZFS marking the disk UNAVAIL in the pool.  It should have incremented
> the READ error counter and that's it.  Did you receive a *single* error
> for the disk and then things went catatonic?
>
> If the entire system got wedged (a soft wedge, e.g. kernel is still
> alive but nothing's happening in userland), that could be a different
> problem -- either with ZFS or arcmsr(4).  Does ZFS have some sort of
> timeout value internal to itself where it will literally mark a disk
> UNAVAIL in the case that repeated I/O transactions take "too long"?
> What is its error recovery methodology?
>
> Speaking strictly about Solaris 10 and ZFS: I have seen many, many times
> a system "soft wedge" after repeated I/O errors (read or write) are
> spewed out on the console for a single SATA disk (via AHCI), but only
> when the disk is used as a sole root filesystem disk (no mirror/raidz).
> My impression is that ZFS isn't the problem in this scenario.  In most
> cases, post-mortem debugging on my part shows that disks encountered
> some CRC errors (indicating cabling issues, etc.), sometimes as few as
> 2, but "something else" went crazy -- or possibly ZFS couldn't mark the
> disk UNAVAIL (if it has that logic) because it's a single disk
> associated with root.  Hardware in this scenario are Hitachi SATA disks
> with an ICH ESB2 controller, software is Solaris 10 (Generic_142901-06)
> with ZFS v15.
>
> --
> | Jeremy Chadwick                                   j...@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |
>

Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Attila Nagy

 On 01/09/2011 01:18 PM, Jeremy Chadwick wrote:

On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:

  On 01/09/2011 10:00 AM, Attila Nagy wrote:

On 12/16/2010 01:44 PM, Martin Matuska wrote:

Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz



I've got an IO hang with dedup enabled (not sure it's related,
I've started to rewrite all data on pool, which makes a heavy
load):

The processes are in various states:
65747   1001  1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
80383   1001  1  54   10 40616K 30196K select  1   5:38  0.00% rsync
 1501 www         1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
 1479 www         1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
 1477 www         1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
 1487 www         1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
 1490 www         1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
 1486 www         1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx

And everything which wants to touch the pool is/becomes dead.

Procstat says about one process:
# procstat -k 1497
  PID    TID COMM             TDNAME           KSTACK
 1497 100257 nginx            -                mi_switch
sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock
VOP_LOCK1_APV _vn_lock nullfs_root lookup namei vn_open_cred
kern_openat syscallenter syscall Xfast_syscall

No, it's not related. One of the disks in the RAIDZ2 pool went bad:
(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
(da4:arcmsr0:0:4:0): SCSI status: Check Condition
(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered
read error)
and it seems it froze the whole zpool. Removing the disk by hand
solved the problem.
I've seen this previously on other machines with ciss.
I wonder why ZFS didn't throw it out of the pool.

Hold on a minute.  An unrecoverable read error does not necessarily mean
the drive is bad, it could mean that the individual LBA that was
attempted to be read resulted in ASC 0x11 (MEDIUM ERROR) (e.g. a bad
block was encountered).  I would check SMART stats on the disk (since
these are probably SATA given use of arcmsr(4)) and provide those.
*That* will tell you if the disk is bad.  I'll help you decode the
attributes values if you provide them.
You are right, and I gave incorrect information. There are a lot more 
errors for that disk in the logs, and the zpool was frozen.
I tried to offline the given disk. That helped in the ciss case, where 
the symptom was the same or something similar: no IO for ages, then a 
small burst and nothing again for long seconds or minutes, with no 
errors logged. zpool status reported no errors, and dmesg was clear too.
There I could find the bad disk by watching gstat output: when the very 
small amount of IO was done, one disk had response times well above a 
second, while the others responded quickly.
There the zpool offline helped. Here it didn't; the command just hung, 
like everything else.
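For reference, a sketch of that kind of watching and offlining (the
label name is taken from the zpool status output below; the gstat flags
are from memory, so double-check gstat(8)):

 # gstat -a -I 1s                      # show only active providers, refresh every second
 # zpool offline data label/disk20-05  # take the suspect disk out of service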

So what I did then: I went into areca-cli and searched for errors.
One disk was marked as failed and it seemed to be the cause. I removed 
it (and did a camcontrol rescan, though I'm not sure whether that was 
necessary), and suddenly the zpool offline finished and everything went 
back to normal.
But there are two controllers in the system, and now I see that the 
above disk is on controller 1, while the one I removed is on controller 2.
I was misled by their identical positions. So now I have an offlined 
disk (which produces read errors, but I couldn't see them in the zpool 
output) and another one, which is shown as failed by the RAID controller 
and was removed by hand (which resolved the situation):

NAME STATE READ WRITE CKSUM
data DEGRADED 0 0 0
  raidz2-0   DEGRADED 0 0 0
label/disk20-01  ONLINE   0 0 0
label/disk20-02  ONLINE   0 0 0
label/disk20-03  ONLINE   0 0 0
label/disk20-04  ONLINE   0 0 0
label/disk20-05  OFFLINE  0 0 0
label/disk20-06  ONLINE   0 0 0
label/disk20-07  ONLINE   0 0 0
label/disk20-08  ONLINE   0 0 0
label/disk20-09  ONLINE   0 0 0
label/disk20-10  ONLINE   0 0 0
label/disk20-11  ONLINE   0 0 0
label/disk20-12  ONLINE   0 0 0
  raidz2-1   DEGRADED 0 0 0
label/disk21-01  ONLINE   0 0 0
label/disk21-02  ONLINE   0 0 0
label/disk21-03  ONLINE   0 0 0
label/disk21-04  ONLINE   0 0  

Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Jeremy Chadwick
On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:
>  On 01/09/2011 10:00 AM, Attila Nagy wrote:
> > On 12/16/2010 01:44 PM, Martin Matuska wrote:
> >>Hi everyone,
> >>
> >>following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
> >>providing a ZFSv28 testing patch for 8-STABLE.
> >>
> >>Link to the patch:
> >>
> >>http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
> >>
> >>
> >I've got an IO hang with dedup enabled (not sure it's related,
> >I've started to rewrite all data on pool, which makes a heavy
> >load):
> >
> >The processes are in various states:
> >65747   1001  1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
> >80383   1001  1  54   10 40616K 30196K select  1   5:38  0.00% rsync
> > 1501 www         1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
> > 1479 www         1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
> > 1477 www         1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
> > 1487 www         1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
> > 1490 www         1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
> > 1486 www         1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx
> >
> >And everything which wants to touch the pool is/becomes dead.
> >
> >Procstat says about one process:
> ># procstat -k 1497
> >  PID    TID COMM             TDNAME           KSTACK
> > 1497 100257 nginx            -                mi_switch
> >sleepq_wait __lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock
> >VOP_LOCK1_APV _vn_lock nullfs_root lookup namei vn_open_cred
> >kern_openat syscallenter syscall Xfast_syscall
> No, it's not related. One of the disks in the RAIDZ2 pool went bad:
> (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
> (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
> (da4:arcmsr0:0:4:0): SCSI status: Check Condition
> (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered
> read error)
> and it seems it froze the whole zpool. Removing the disk by hand
> solved the problem.
> I've seen this previously on other machines with ciss.
> I wonder why ZFS didn't throw it out of the pool.

Hold on a minute.  An unrecoverable read error does not necessarily mean
the drive is bad, it could mean that the individual LBA that was
attempted to be read resulted in ASC 0x11 (MEDIUM ERROR) (e.g. a bad
block was encountered).  I would check SMART stats on the disk (since
these are probably SATA given use of arcmsr(4)) and provide those.
*That* will tell you if the disk is bad.  I'll help you decode the
attributes values if you provide them.
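For what it's worth, a sketch of pulling SMART data through an Areca
controller with smartmontools (this assumes a smartctl build with Areca
pass-through support, and the slot number is only a guess that has to
match da4's physical slot):

 # smartctl -a -d areca,5 /dev/arcmsr0

The attributes of most interest here would be Reallocated_Sector_Ct,
Current_Pending_Sector and UDMA_CRC_Error_Count.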

My understanding is that a single LBA read failure should not warrant
ZFS marking the disk UNAVAIL in the pool.  It should have incremented
the READ error counter and that's it.  Did you receive a *single* error
for the disk and then things went catatonic?

If the entire system got wedged (a soft wedge, e.g. kernel is still
alive but nothing's happening in userland), that could be a different
problem -- either with ZFS or arcmsr(4).  Does ZFS have some sort of
timeout value internal to itself where it will literally mark a disk
UNAVAIL in the case that repeated I/O transactions take "too long"?
What is its error recovery methodology?

Speaking strictly about Solaris 10 and ZFS: I have seen many, many times
a system "soft wedge" after repeated I/O errors (read or write) are
spewed out on the console for a single SATA disk (via AHCI), but only
when the disk is used as a sole root filesystem disk (no mirror/raidz).
My impression is that ZFS isn't the problem in this scenario.  In most
cases, post-mortem debugging on my part shows that disks encountered
some CRC errors (indicating cabling issues, etc.), sometimes as few as
2, but "something else" went crazy -- or possibly ZFS couldn't mark the
disk UNAVAIL (if it has that logic) because it's a single disk
associated with root.  Hardware in this scenario are Hitachi SATA disks
with an ICH ESB2 controller, software is Solaris 10 (Generic_142901-06)
with ZFS v15.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Attila Nagy

 On 01/01/2011 08:09 PM, Artem Belevich wrote:

On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)


...

Any ideas on what could cause these? I haven't upgraded the pool version and
nothing was changed in the pool or in the file system.

The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.


I've finally found the time to read the v28 patch and figured out the 
problem: vfs.zfs.l2arc_noprefetch was changed to 1, so it doesn't use 
the prefetched data on the L2ARC devices.
This is a major hit in my case. Enabling this again restored the 
previous hit rates and lowered the load on the hard disks significantly.
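For anyone wanting to try the same thing, a minimal sketch of flipping
the tunable back (runtime via sysctl, persisted via /etc/sysctl.conf;
both are assumptions about how the knob is exposed in this patchset):

 # sysctl vfs.zfs.l2arc_noprefetch=0
 # echo 'vfs.zfs.l2arc_noprefetch=0' >> /etc/sysctl.conf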



Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Attila Nagy

 On 01/09/2011 10:00 AM, Attila Nagy wrote:

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz 



I've got an IO hang with dedup enabled (not sure it's related, I've 
started to rewrite all data on pool, which makes a heavy load):


The processes are in various states:
65747   1001  1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
80383   1001  1  54   10 40616K 30196K select  1   5:38  0.00% rsync
 1501 www         1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
 1479 www         1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
 1477 www         1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
 1487 www         1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
 1490 www         1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
 1486 www         1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx

And everything which wants to touch the pool is/becomes dead.

Procstat says about one process:
# procstat -k 1497
  PID    TID COMM             TDNAME           KSTACK
 1497 100257 nginx            -                mi_switch sleepq_wait 
__lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock VOP_LOCK1_APV 
_vn_lock nullfs_root lookup namei vn_open_cred kern_openat 
syscallenter syscall Xfast_syscall

No, it's not related. One of the disks in the RAIDZ2 pool went bad:
(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
(da4:arcmsr0:0:4:0): SCSI status: Check Condition
(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read 
error)
and it seems it froze the whole zpool. Removing the disk by hand solved 
the problem.

I've seen this previously on other machines with ciss.
I wonder why ZFS didn't throw it out of the pool.


Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Attila Nagy

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

I've got an IO hang with dedup enabled (not sure it's related, I've 
started to rewrite all data on pool, which makes a heavy load):


The processes are in various states:
65747   1001  1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
80383   1001  1  54   10 40616K 30196K select  1   5:38  0.00% rsync
 1501 www         1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
 1479 www         1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
 1477 www         1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
 1487 www         1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
 1490 www         1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
 1486 www         1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx

And everything which wants to touch the pool is/becomes dead.

Procstat says about one process:
# procstat -k 1497
  PID    TID COMM             TDNAME           KSTACK
 1497 100257 nginx            -                mi_switch sleepq_wait 
__lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock VOP_LOCK1_APV 
_vn_lock nullfs_root lookup namei vn_open_cred kern_openat syscallenter 
syscall Xfast_syscall




Re: New ZFSv28 patchset for 8-STABLE

2011-01-04 Thread Attila Nagy

 On 01/03/2011 10:35 PM, Bob Friesenhahn wrote:


>> After four days, the L2 hit rate is still hovering around 10-20
>> percent (was between 60-90), so I think it's clearly a regression in
>> the ZFSv28 patch...
>>
>> And the massive growth in CPU usage can also very nicely be seen...
>>
>> I've updated the graphs at (switch time can be checked on the zfs-mem
>> graph):
>>
>> http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
>>
>> There is a new phenomenon: the large IOPS peaks. I use this munin
>> script on a lot of machines and have never seen anything like this...
>> I'm not sure whether it's related or not.


> It is not so clear that there is a problem.  I am not sure what you
> are using this server for but it is wise

The IO pattern has changed radically, so for me it's a problem.

> to consider that this is the funny time when a new year starts, SPAM
> delivery goes through the roof, and employees and customers behave
> differently.  You chose the worst time of the year to implement the
> change and observe behavior.

It's a free software mirror, ftp.fsn.hu, and I'm sure that the very low
hit rate and the increased CPU usage are not related to the time when I
made the switch.


> CPU use is indeed increased somewhat.  A lower loading of the l2arc is
> not necessarily a problem.  The l2arc is usually bandwidth limited
> compared with main store so if bulk data can not be cached in RAM,
> then it is best left in main store.  A smarter l2arc algorithm could
> put only the data producing the expensive IOPS (the ones requiring a
> seek) in the l2arc, lessening the amount of data cached on the device.

That would make sense if I didn't have 100-120 IOPS on the disks (about
the maximum for 7200 RPM disks; gstat tells me the same) and an L2 hit
rate as low as 10 percent.
What's smarter: having a 60-90% hit rate from the SSDs and moving the
slow disk heads less, or having a 10-20 percent hit rate and killing
the disks with random IO?
If you are right, ZFS tries to be too smart and falls on its face with
this kind of workload.


BTW, I've checked the v15-v28 patch for arc.c, and I can't see any
L2ARC-related change there. I'm not sure whether the hypothetical logic
would be there or in a different file; I haven't read it end to end.
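A quick way to double-check that against the patch itself (the file
name is the one from the announcement at the top of the thread; treat
this as a sketch):

 # xz -dk stable-8-zfsv28-20101215.patch.xz   # -k keeps the compressed copy around
 # grep -n 'l2arc_noprefetch' stable-8-zfsv28-20101215.patch
 # grep -c 'l2arc' stable-8-zfsv28-20101215.patch   # rough count of L2ARC-related lines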




Re: New ZFSv28 patchset for 8-STABLE

2011-01-03 Thread Bob Friesenhahn


After four days, the L2 hit rate is still hovering around 10-20 percent (was 
between 60-90), so I think it's clearly a regression in the ZFSv28 patch...

And the massive growth in CPU usage can also very nicely be seen...

I've updated the graphs at (switch time can be checked on the zfs-mem graph):
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/

There is a new phenomenon: the large IOPS peaks. I use this munin script on a 
lot of machines and have never seen anything like this... I'm not sure whether 
it's related or not.


It is not so clear that there is a problem.  I am not sure what you 
are using this server for but it is wise to consider that this is the 
funny time when a new year starts, SPAM delivery goes through the 
roof, and employees and customers behave differently.  You chose the 
worst time of the year to implement the change and observe behavior.


CPU use is indeed increased somewhat.  A lower loading of the l2arc is 
not necessarily a problem.  The l2arc is usually bandwidth limited 
compared with main store so if bulk data can not be cached in RAM, 
then it is best left in main store.  A smarter l2arc algorithm could 
put only the data producing the expensive IOPS (the ones requiring a 
seek) in the l2arc, lessening the amount of data cached on the device.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: New ZFSv28 patchset for 8-STABLE

2011-01-03 Thread Attila Nagy

 On 01/01/2011 08:09 PM, Artem Belevich wrote:

On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)


...

Any ideas on what could cause these? I haven't upgraded the pool version and
nothing was changed in the pool or in the file system.

The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.


After four days, the L2 hit rate is still hovering around 10-20 percent 
(was between 60-90), so I think it's clearly a regression in the ZFSv28 
patch...

And the massive growth in CPU usage can also very nicely be seen...

I've updated the graphs at (switch time can be checked on the zfs-mem 
graph):

http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/

There is a new phenomenon: the large IOPS peaks. I use this munin script 
on a lot of machines and have never seen anything like this... I'm not sure 
whether it's related or not.



Re: New ZFSv28 patchset for 8-STABLE

2011-01-02 Thread J. Hellenthal
On 01/02/2011 03:45, Attila Nagy wrote:
>  On 01/02/2011 05:06 AM, J. Hellenthal wrote:
>> On 01/01/2011 13:18, Attila Nagy wrote:
>>>   On 12/16/2010 01:44 PM, Martin Matuska wrote:
 Link to the patch:

 http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz




>>> I've used this:
>>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
>>>
>>>
>>> on a server with amd64, 8 G RAM, acting as a file server on
>>> ftp/http/rsync, the content being read only mounted with nullfs in
>>> jails, and the daemons use sendfile (ftp and http).
>>>
>>> The effects can be seen here:
>>> http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
>>> the exact moment of the switch can be seen on zfs_mem-week.png, where
>>> the L2 ARC has been discarded.
>>>
>>> What I see:
>>> - increased CPU load
>>> - decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
>>> hard disk load (IOPS graph)
>>>
>>> Maybe I could accept the higher system load as normal, because there
>>> were a lot of things changed between v15 and v28 (but I was hoping if I
>>> use the same feature set, it will require less CPU), but dropping the
>>> L2ARC hit rate so radically seems to be a major issue somewhere.
>>> As you can see from the memory stats, I have enough kernel memory to
>>> hold the L2 headers, so the L2 devices got filled up to their maximum
>>> capacity.
>>>
>>> Any ideas on what could cause these? I haven't upgraded the pool version
>>> and nothing was changed in the pool or in the file system.
>>>
>> Running arc_summary.pl[1] -p4 should print a summary about your l2arc,
>> and in that section you should also notice a high number of "SPA
>> Mismatch" entries; mine usually grew to around 172k before I would
>> notice a crash, and I could reliably trigger this while in a scrub.
>>
>> Whatever is causing this needs desperate attention!
>>
>> I emailed mm@ privately off-list when I noticed this going on but have
>> not received any feedback as of yet.
> It's at zero currently (2 days of uptime):
> kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
> 

Right, but do you have a 'cache' (l2arc) vdev attached to any pool in
the system? This suggests to me that you do not at this time.

If not, can you attach a cache vdev, run a scrub on it, and monitor the
value of that MIB?
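A sketch of that test, with hypothetical pool and cache device names
(the kstat name is the one quoted above):

 # zpool add data cache gpt/l2arc0    # attach a cache vdev; device name is hypothetical
 # zpool scrub data
 # while :; do sysctl kstat.zfs.misc.arcstats.l2_write_spa_mismatch; sleep 10; done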

-- 

Regards,

 jhell,v
 JJH48-ARIN


Re: New ZFSv28 patchset for 8-STABLE

2011-01-02 Thread Attila Nagy

 On 01/02/2011 05:06 AM, J. Hellenthal wrote:


On 01/01/2011 13:18, Attila Nagy wrote:

  On 12/16/2010 01:44 PM, Martin Matuska wrote:

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz




I've used this:
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz

on a server with amd64, 8 G RAM, acting as a file server on
ftp/http/rsync, the content being read only mounted with nullfs in
jails, and the daemons use sendfile (ftp and http).

The effects can be seen here:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
the exact moment of the switch can be seen on zfs_mem-week.png, where
the L2 ARC has been discarded.

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)

Maybe I could accept the higher system load as normal, because there
were a lot of things changed between v15 and v28 (but I was hoping if I
use the same feature set, it will require less CPU), but dropping the
L2ARC hit rate so radically seems to be a major issue somewhere.
As you can see from the memory stats, I have enough kernel memory to
hold the L2 headers, so the L2 devices got filled up to their maximum
capacity.

Any ideas on what could cause these? I haven't upgraded the pool version
and nothing was changed in the pool or in the file system.


Running arc_summary.pl[1] -p4 should print a summary about your l2arc,
and in that section you should also notice a high number of "SPA
Mismatch" entries; mine usually grew to around 172k before I would
notice a crash, and I could reliably trigger this while in a scrub.

Whatever is causing this needs desperate attention!

I emailed mm@ privately off-list when I noticed this going on but have
not received any feedback as of yet.

It's at zero currently (2 days of uptime):
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0



Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread J. Hellenthal

On 01/01/2011 13:18, Attila Nagy wrote:
>  On 12/16/2010 01:44 PM, Martin Matuska wrote:
>> Link to the patch:
>>
>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>>
>>
>>
> I've used this:
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
> 
> on a server with amd64, 8 G RAM, acting as a file server on
> ftp/http/rsync, the content being read only mounted with nullfs in
> jails, and the daemons use sendfile (ftp and http).
> 
> The effects can be seen here:
> http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
> the exact moment of the switch can be seen on zfs_mem-week.png, where
> the L2 ARC has been discarded.
> 
> What I see:
> - increased CPU load
> - decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
> hard disk load (IOPS graph)
> 
> Maybe I could accept the higher system load as normal, because there
> were a lot of things changed between v15 and v28 (but I was hoping if I
> use the same feature set, it will require less CPU), but dropping the
> L2ARC hit rate so radically seems to be a major issue somewhere.
> As you can see from the memory stats, I have enough kernel memory to
> hold the L2 headers, so the L2 devices got filled up to their maximum
> capacity.
> 
> Any ideas on what could cause these? I haven't upgraded the pool version
> and nothing was changed in the pool or in the file system.
> 

Running arc_summary.pl[1] -p4 should print a summary about your l2arc,
and in that section you should also notice a high number of "SPA
Mismatch" entries; mine usually grew to around 172k before I would
notice a crash, and I could reliably trigger this while in a scrub.

Whatever is causing this needs desperate attention!

I emailed mm@ privately off-list when I noticed this going on but have
not received any feedback as of yet.

[1] http://bit.ly/fdRiYT

-- 

Regards,

 jhell,v
 JJH48-ARIN


Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread Attila Nagy

 On 01/01/2011 08:09 PM, Artem Belevich wrote:

On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)


...

Any ideas on what could cause these? I haven't upgraded the pool version and
nothing was changed in the pool or in the file system.

The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.


Sadly no, but I remember that I saw increasing hit rates as the cache 
grew; that's why I wrote the email after one and a half days.

Currently it's at the same level as it was right after the reboot...

We'll see after a few days.


Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread Artem Belevich
On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:
> What I see:
> - increased CPU load
> - decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
> hard disk load (IOPS graph)
>
...
> Any ideas on what could cause these? I haven't upgraded the pool version and
> nothing was changed in the pool or in the file system.


The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.
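For what it's worth, a sketch of watching the L2ARC hit rate directly
from the arcstats kstats while the cache warms up (assuming the usual
kstat names):

 # sysctl -n kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses | \
     awk 'NR==1{h=$1} NR==2{m=$1} END{if (h+m > 0) printf "L2ARC hit rate: %.1f%%\n", 100*h/(h+m); else print "no L2ARC activity yet"}'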

--Artem


Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread Attila Nagy

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz



I've used this:
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
on a server with amd64, 8 GB RAM, acting as a file server for 
ftp/http/rsync; the content is mounted read-only with nullfs in jails, 
and the daemons use sendfile (ftp and http).


The effects can be seen here:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
the exact moment of the switch can be seen on zfs_mem-week.png, where 
the L2 ARC has been discarded.


What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased 
hard disk load (IOPS graph)


Maybe I could accept the higher system load as normal, because there 
were a lot of things changed between v15 and v28 (but I was hoping if I 
use the same feature set, it will require less CPU), but dropping the 
L2ARC hit rate so radically seems to be a major issue somewhere.
As you can see from the memory stats, I have enough kernel memory to 
hold the L2 headers, so the L2 devices got filled up to their maximum 
capacity.


Any ideas on what could cause these? I haven't upgraded the pool version 
and nothing was changed in the pool or in the file system.


Thanks,


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-29 Thread Jean-Yves Avenard
On Wednesday, 29 December 2010, jhell  wrote:

>
> Another note too: I think I read that you mentioned using the L2ARC and
> slog device on the same disk. You simply shouldn't do this; it could
> be contributing to the real cause, there is absolutely no gain in either
> sanity or performance, and you will end up bottlenecking your system.
>
>>

And why would that be?

I've read so much conflicting information on the matter over the past
few days that I'm starting to wonder whether there's an actual
definitive answer, or whether anyone has a clue what they're talking
about.

It ranges from "you should only use raw disks", to "FreeBSD isn't
Solaris, so slices are fine", to "don't use slices because they can't
be read by another OS, use partitions", to "it doesn't apply to SSDs",
and so on.

The way I look at it, the only thing that would bottleneck access to
that SSD drive is the SATA interface itself. So whether I use two
drives or two partitions on the same drive, I can't see how it would
make much difference, if any, other than the traditional "I think I
know" argument. Surely latency as we know it with hard drives does not
apply to SSDs.

Even Sun's official documentation contains contradictory information,
starting with the commands for adding and removing cache and log
devices.

It seems to me that tuning ZFS is very much like black magic: everyone
has their own idea about what to do, and not once did I find conclusive
evidence about what is best, or information people actually agree on.

As for using unofficial code, sure, I accept that risk. I made a
conscious decision to use it, there's now no way to go back, and I
accept that.
At the end of the day, only one thing will make that code suitable for
real-world conditions: testing. If that particular code isn't put under
any actual stress, how else are you going to know whether it's good or
not?

I don't really like reading between the lines of your post that I
shouldn't be surprised if anything breaks, or that it doesn't matter if
it crashes. There's a deadlock occurring somewhere: it needs to be
found. I know nothing about the ZFS code, so I could only do what I'm
capable of under the circumstances: find a way to reproduce the problem
consistently and report as much information as I have, so that someone
more clued-in will know what to do with it.

Hope that makes sense

Jean-Yves


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-28 Thread jhell

On 12/28/2010 18:20, Martin Matuska wrote:
> Please don't consider these patches as production-ready.
> What we want to do is find and resolve as many bugs as possible.

I completely agree with Martin here. If you're running it, then you're
willing to lose what you have if you haven't taken precautions to save
your data somewhere else. Even with that said, ZFS does a pretty fine
job of ensuring that nothing happens to it, but it is still best
practice to have a copy somewhere other than "IN THAT BOX" ;)

Another note too: I think I read that you mentioned using the L2ARC and
slog device on the same disk. You simply shouldn't do this; it could be
contributing to the real cause, there is absolutely no gain in either
sanity or performance, and you will end up bottlenecking your system.

> 
> To help us fix these bugs, a way to reproduce the bug from a clean start
> (e.g. in virtualbox) would be great and speed up finding the cause for
> the problem.
> 
> Your problem looks like some sort of deadlock. In your case, when you
> experience the hang, try running "procstat -k -k PID" in another shell
> (console). That will give us valuable information.
> 

Martin,

I agree with the above that it may be some sort of livelock or deadlock
problem in this case. It would be awesome to know what the following
sysctl(8) values are and how the system reacts when they are set to the
opposite of their current values (see the sketch after this list).

vfs.zfs.l2arc_noprefetch
vfs.zfs.dedup.prefetch
vfs.zfs.prefetch_disable
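A sketch of checking and flipping those, on the assumption that all
three are runtime-writable in this patchset (if not, they can be set
from /boot/loader.conf instead):

 # sysctl vfs.zfs.l2arc_noprefetch vfs.zfs.dedup.prefetch vfs.zfs.prefetch_disable
 # sysctl vfs.zfs.prefetch_disable=1   # then re-run the workload (or a scrub) and compare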

The reason I say this is that one of my personal systems that I toy
with cannot make it very long with prefetch enabled, on either v15 or
v28, after some 'unknown' commit to world on stable/8. Now this may
actually just be a contributing factor that makes it happen sooner than
it normally would, but it probably also relates directly to the exact
problem. I would love to see this go away, as I had been using the
L2ARC with prefetch enabled for a long time, and now all of a sudden it
just plainly does not work correctly.

I also have about 19 core.txt.NN files with various stack traces from
when this started happening. If you would like these, just let me know
and I'll privately mail them to you.


Regards,

-- 

 jhell,v - JJH48-ARIN
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-28 Thread Jean-Yves Avenard
Hi

On Wednesday, 29 December 2010, Martin Matuska  wrote:
> Please don't consider these patches as production-ready.
> What we want to do is find and resolve as many bugs as possible.
>
> To help us fix these bugs, a way to reproduce the bug from a clean start
> (e.g. in virtualbox) would be great and speed up finding the cause for
> the problem.
>
> Your problem looks like some sort of deadlock. In your case, when you
> experience the hang, try running "procstat -k -k PID" in another shell
> (console). That will give us valuable information.
>

I am away until next week now (hopefully no problem will occur until then)

I will try to reproduce the issue then. I have to say that v28
massively increased write performance over Samba, more than 3 times
faster than with v14 or v15.

How do you disable the ZIL with v28? I wanted to test performance in
the case I'm trying to troubleshoot.
Writing the file over Samba:
v14-v15: 55s
v14-v15, ZIL disabled: 6s
v28: 16s (with or without separate log drive: SSD Intel X25-M 40GB).
Playing with the only ZIL parameter showing in sysctl made no
difference whatsoever.
UFS boot drive: 14s
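
Regarding disabling the ZIL: is it now the per-dataset "sync" property
in v28? If so, I assume a quick test would look something like this
(untested here, and the dataset name is just an example):

# zfs set sync=disabled pool/home
# zfs get sync pool/home
# zfs inherit sync pool/home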

Sequential reads show over 280MB/s from that raidz array; writes are similar.

I started a thread in the freebsd forum:
http://forums.freebsd.org/showthread.php?t=20476

And finally, which patch should I try on your site?

Thanks
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-28 Thread Martin Matuska
Please don't consider these patches as production-ready.
What we want to do is find and resolve as many bugs as possible.

To help us fix these bugs, a way to reproduce the bug from a clean start
(e.g. in virtualbox) would be great and speed up finding the cause for
the problem.

Your problem looks like some sort of deadlock. In your case, when you
experience the hang, try running "procstat -k -k PID" in another shell
(console). That will give us valuable information.
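
If you do not know the PID offhand, something along these lines should
do (the 405 below is just whatever pgrep prints for the stuck zpool
process):

# pgrep -x zpool
405
# procstat -k -k 405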

Cheers,
mm

On 28.12.2010 18:39, Jean-Yves Avenard wrote:
> On 29 December 2010 03:15, Jean-Yves Avenard  wrote:
> 
>> # zpool import
>> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 15.11r 0.00u 0.03s 0% 2556k
>> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 15.94r 0.00u 0.03s 0% 2556k
>> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 16.57r 0.00u 0.03s 0% 2556k
>> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 16.95r 0.00u 0.03s 0% 2556k
>> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 32.19r 0.00u 0.03s 0% 2556k
>> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 32.72r 0.00u 0.03s 0% 2556k
>> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 40.13r 0.00u 0.03s 0% 2556k
>>
>> Ah ah!
>> It's not the separate log that makes zpool crash, it's the cache!
>>
>> Having the cache in prevents the pool from being imported again.
>>
>> Rebooting: same deal... can't access the pool any longer!
>>
>> Hopefully this is enough of a hint for someone to track down the bug...
>>
> 
> More details as I was crazy enough to try various things.
> 
> The problem of zpool being stuck in spa_namespace_lock only occurs if
> you are using both the cache and the log at the same time.
> Use one or the other: then there's no issue.
> 
> But the instant you add both log and cache to the pool, it becomes unusable.
> 
> Now, I haven't tried using cache and log from a different disk. The
> motherboard on the server has 8 SATA ports, and I have no free port to
> add another disk. So my only option to have both a log and cache
> device in my zfs pool, is to use two slices on the same disk.
> 
> Hope this helps..
> Jean-Yves
> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-28 Thread Freddie Cash
On Tue, Dec 28, 2010 at 9:39 AM, Jean-Yves Avenard wrote:
> Now, I haven't tried using cache and log from a different disk. The
> motherboard on the server has 8 SATA ports, and I have no free port to
> add another disk. So my only option to have both a log and cache
> device in my zfs pool, is to use two slices on the same disk.

For testing, you can always just connect a USB stick, and use that for
the cache.  I've done this on large ZFS systems (24-drives) and on my
home server (5-drives).  Works nicely.

That should narrow it down to either "can't use cache/log on same
device" or "can't use cache and log at the same time".
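
Roughly like this, assuming the slice-based cache is still attached as
ada1s2 and the stick probes as da0 (names will differ, and anything on
the stick gets destroyed):

# zpool remove pool ada1s2
# zpool add pool cache da0
# zpool status pool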

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: ARRRGG HELP !!

2010-12-28 Thread Freddie Cash
On Tue, Dec 28, 2010 at 8:58 AM, Jean-Yves Avenard  wrote:
> On 28 December 2010 08:56, Freddie Cash  wrote:
>
>> Is that a typo, or the actual command you used?  You have an extra "s"
>> in there.  Should be "log" and not "logs".  However, I don't think
>> that command is correct either.
>>
>> I believe you want to use the "detach" command, not "remove".
>>
>> # zpool detach pool label/zil
>
> well, I tried the detach command:
>
> server4# zpool detach pool ada1s1
> cannot detach ada1s1: only applicable to mirror and replacing vdevs
>
> server4# zpool remove pool ada1s1
> server4#
>
> so you need to use remove, and adding log (or cache) makes no
> difference whatsoever..
>
Interesting, thanks for the confirmation.  I don't have any ZFS
systems using log devices, so can only go by what's in the docs.

May want to make a note that the man page (at least for ZFSv15 in
FreeBSD 8.1) includes several sections that use "zpool detach" when
talking about log devices.

If that's still in the man page for ZFSv28, it'll need to be cleaned up.


-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-28 Thread Jean-Yves Avenard
On 29 December 2010 03:15, Jean-Yves Avenard  wrote:

> # zpool import
> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 15.11r 0.00u 0.03s 0% 2556k
> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 15.94r 0.00u 0.03s 0% 2556k
> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 16.57r 0.00u 0.03s 0% 2556k
> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 16.95r 0.00u 0.03s 0% 2556k
> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 32.19r 0.00u 0.03s 0% 2556k
> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 32.72r 0.00u 0.03s 0% 2556k
> load: 0.00  cmd: zpool 405 [spa_namespace_lock] 40.13r 0.00u 0.03s 0% 2556k
>
> Ah ah!
> It's not the separate log that makes zpool crash, it's the cache!
>
> Having the cache in prevents the pool from being imported again.
>
> Rebooting: same deal... can't access the pool any longer!
>
> Hopefully this is enough of a hint for someone to track down the bug...
>

More details as I was crazy enough to try various things.

The problem of zpool being stuck in spa_namespace_lock only occurs if
you are using both the cache and the log at the same time.
Use one or the other: then there's no issue.

But the instant you add both log and cache to the pool, it becomes unusable.

Now, I haven't tried using the cache and log from different disks. The
motherboard on the server has 8 SATA ports, and I have no free port to
add another disk, so my only option for having both a log and a cache
device in my ZFS pool is to use two slices on the same disk.
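
If someone wants to try reproducing this from a clean start without
spare disks, I imagine something like this with md(4) scratch devices
would exercise the same code path (sizes and names are arbitrary,
everything here is throwaway):

# mdconfig -a -t swap -s 1g
md0
# mdconfig -a -t swap -s 1g
md1
# mdconfig -a -t swap -s 1g
md2
# zpool create testpool md0
# zpool add testpool log md1
# zpool add testpool cache md2
# zpool export testpool
# zpool import testpool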

Hope this helps..
Jean-Yves
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: ARRRGG HELP !!

2010-12-28 Thread Jean-Yves Avenard
On 28 December 2010 08:56, Freddie Cash  wrote:

> Is that a typo, or the actual command you used?  You have an extra "s"
> in there.  Should be "log" and not "logs".  However, I don't think
> that command is correct either.
>
> I believe you want to use the "detach" command, not "remove".
>
> # zpool detach pool label/zil

well, I tried the detach command:

server4# zpool detach pool ada1s1
cannot detach ada1s1: only applicable to mirror and replacing vdevs

server4# zpool remove pool ada1s1
server4#

so you need to use remove, and adding log (or cache) makes no
difference whatsoever..
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-28 Thread Jean-Yves Avenard
Hi

On 27 December 2010 16:04, jhell  wrote:

> 1) Set vfs.zfs.recover=1 at the loader prompt (OK set vfs.zfs.recover=1)
> 2) Boot into single user mode without opensolaris.ko and zfs.ko loaded
> 3) ( mount -w / ) to make sure you can remove and also write a new
> zpool.cache as needed.
> 4) Remove /boot/zfs/zpool.cache
> 5) kldload both zfs and opensolaris, i.e. ( kldload zfs ) should do the trick
> 6) Verify that vfs.zfs.recover=1 is set, then ( zpool import pool )
> 7) Give it a little bit and monitor activity using Ctrl+T.

Ok..

I've got into the same situation again, no idea why this time.

I've followed your instructions, and sure enough I could do an import
of my pool again.

However, I wanted to find out what was going on, so I did:
zpool export pool

followed by zpool import

And guess what... zpool hung again. I can't Ctrl-C it; I have to reboot.

So here we go again.
Rebooted as above.
zpool import pool -> ok

This time, I decided that maybe what was screwing things up was the cache.
zpool remove pool ada1s2 -> ok
zpool status:
# zpool status
  pool: pool
 state: ONLINE
 scan: scrub repaired 0 in 18h20m with 0 errors on Tue Dec 28 10:28:05 2010
config:

NAMESTATE READ WRITE CKSUM
poolONLINE   0 0 0
  raidz1-0  ONLINE   0 0 0
ada2ONLINE   0 0 0
ada3ONLINE   0 0 0
ada4ONLINE   0 0 0
ada5ONLINE   0 0 0
ada6ONLINE   0 0 0
ada7ONLINE   0 0 0
logs
  ada1s1ONLINE   0 0 0

errors: No known data errors

# zpool export pool -> ok
# zpool import pool -> ok
# zpool add pool cache /dev/ada1s2 -> ok
# zpool status
  pool: pool
 state: ONLINE
 scan: scrub repaired 0 in 18h20m with 0 errors on Tue Dec 28 10:28:05 2010
config:

NAMESTATE READ WRITE CKSUM
poolONLINE   0 0 0
  raidz1-0  ONLINE   0 0 0
ada2ONLINE   0 0 0
ada3ONLINE   0 0 0
ada4ONLINE   0 0 0
ada5ONLINE   0 0 0
ada6ONLINE   0 0 0
ada7ONLINE   0 0 0
logs
  ada1s1ONLINE   0 0 0
cache
  ada1s2ONLINE   0 0 0

errors: No known data errors

# zpool export pool -> ok

# zpool import
load: 0.00  cmd: zpool 405 [spa_namespace_lock] 15.11r 0.00u 0.03s 0% 2556k
load: 0.00  cmd: zpool 405 [spa_namespace_lock] 15.94r 0.00u 0.03s 0% 2556k
load: 0.00  cmd: zpool 405 [spa_namespace_lock] 16.57r 0.00u 0.03s 0% 2556k
load: 0.00  cmd: zpool 405 [spa_namespace_lock] 16.95r 0.00u 0.03s 0% 2556k
load: 0.00  cmd: zpool 405 [spa_namespace_lock] 32.19r 0.00u 0.03s 0% 2556k
load: 0.00  cmd: zpool 405 [spa_namespace_lock] 32.72r 0.00u 0.03s 0% 2556k
load: 0.00  cmd: zpool 405 [spa_namespace_lock] 40.13r 0.00u 0.03s 0% 2556k

Ah ah!
It's not the separate log that makes zpool crash, it's the cache!

Having the cache in prevents the pool from being imported again.

Rebooting: same deal... can't access the pool any longer!

Hopefully this is enough of a hint for someone to track down the bug...
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: ARRRGG HELP !!

2010-12-28 Thread Jean-Yves Avenard
Well

Today I added the log device:
zpool add pool log /dev/ada1s1 (8GB slice on a SSD Intel X25 disk)..

then added the cache (32GB)
zpool add pool cache /dev/ada1s2

So far so good.
zpool status -> all good.

Reboot: it hangs.

booted in single user mode, zpool status:
ZFS filesystem version 5
ZFS storage pool version 28

and that's it, nothing more. Just like before, when I thought that
removing the log disk had failed.

This time no error, nothing... just a nasty hang and an unusable system again...

:(
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: ARRRGG HELP !!

2010-12-27 Thread Jean-Yves Avenard
Hi

On Tuesday, 28 December 2010, Freddie Cash  wrote:
> On Sun, Dec 26, 2010 at 4:43 PM, Jean-Yves Avenard  
> wrote:
>> On 27 December 2010 09:55, Jean-Yves Avenard  wrote:
>>> Hi there.
>>>
>>> I used stable-8-zfsv28-20101223-nopython.patch.xz from
>>> http://people.freebsd.org/~mm/patches/zfs/v28/
>>
>> I did the following:
>>
>> # zpool status
>>  pool: pool
>>  state: ONLINE
>>  scan: none requested
>> config:
>>
>>        NAME            STATE     READ WRITE CKSUM
>>        pool            ONLINE       0     0     0
>>          raidz1-0      ONLINE       0     0     0
>>            ada2        ONLINE       0     0     0
>>            ada3        ONLINE       0     0     0
>>            ada4        ONLINE       0     0     0
>>            ada5        ONLINE       0     0     0
>>            ada6        ONLINE       0     0     0
>>            ada7        ONLINE       0     0     0
>>        cache
>>          label/zcache  ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> so far so good
>>
>> [r...@server4 /pool/home/jeanyves_avenard]# zpool add pool log
>> /dev/label/zil [r...@server4 /pool/home/jeanyves_avenard]# zpool
>> status
>>  pool: pool
>>  state: ONLINE
>>  scan: none requested
>> config:
>>
>>        NAME            STATE     READ WRITE CKSUM
>>        pool            ONLINE       0     0     0
>>          raidz1-0      ONLINE       0     0     0
>>            ada2        ONLINE       0     0     0
>>            ada3        ONLINE       0     0     0
>>            ada4        ONLINE       0     0     0
>>            ada5        ONLINE       0     0     0
>>            ada6        ONLINE       0     0     0
>>            ada7        ONLINE       0     0     0
>>        logs
>>          label/zil     ONLINE       0     0     0
>>        cache
>>          label/zcache  ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> so far so good:
>>
>> # zpool remove pool logs label/zil
>> cannot remove logs: no such device in pool
>
> Is that a typo, or the actual command you used?  You have an extra "s"
> in there.  Should be "log" and not "logs".  However, I don't think
> that command is correct either.
>
> I believe you want to use the "detach" command, not "remove".

> # zpool detach pool label/zil
>
> --
> Freddie Cash
> fjwc...@gmail.com
>

It was a typo; it should have been "log" (according to Sun's docs). As
it was showing "logs" in the status, I typed that.
According to Sun, it's "zpool remove pool cache/log".

A typo should never have resulted in what happened. Showing an error,
sure, but zpool hanging and a kernel panic?
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: ARRRGG HELP !!

2010-12-27 Thread Freddie Cash
On Sun, Dec 26, 2010 at 4:43 PM, Jean-Yves Avenard  wrote:
> On 27 December 2010 09:55, Jean-Yves Avenard  wrote:
>> Hi there.
>>
>> I used stable-8-zfsv28-20101223-nopython.patch.xz from
>> http://people.freebsd.org/~mm/patches/zfs/v28/
>
> I did the following:
>
> # zpool status
>  pool: pool
>  state: ONLINE
>  scan: none requested
> config:
>
>        NAME            STATE     READ WRITE CKSUM
>        pool            ONLINE       0     0     0
>          raidz1-0      ONLINE       0     0     0
>            ada2        ONLINE       0     0     0
>            ada3        ONLINE       0     0     0
>            ada4        ONLINE       0     0     0
>            ada5        ONLINE       0     0     0
>            ada6        ONLINE       0     0     0
>            ada7        ONLINE       0     0     0
>        cache
>          label/zcache  ONLINE       0     0     0
>
> errors: No known data errors
>
> so far so good
>
> [r...@server4 /pool/home/jeanyves_avenard]# zpool add pool log
> /dev/label/zil [r...@server4 /pool/home/jeanyves_avenard]# zpool
> status
>  pool: pool
>  state: ONLINE
>  scan: none requested
> config:
>
>        NAME            STATE     READ WRITE CKSUM
>        pool            ONLINE       0     0     0
>          raidz1-0      ONLINE       0     0     0
>            ada2        ONLINE       0     0     0
>            ada3        ONLINE       0     0     0
>            ada4        ONLINE       0     0     0
>            ada5        ONLINE       0     0     0
>            ada6        ONLINE       0     0     0
>            ada7        ONLINE       0     0     0
>        logs
>          label/zil     ONLINE       0     0     0
>        cache
>          label/zcache  ONLINE       0     0     0
>
> errors: No known data errors
>
> so far so good:
>
> # zpool remove pool logs label/zil
> cannot remove logs: no such device in pool

Is that a typo, or the actual command you used?  You have an extra "s"
in there.  Should be "log" and not "logs".  However, I don't think
that command is correct either.

I believe you want to use the "detach" command, not "remove".

# zpool detach pool label/zil

-- 
Freddie Cash
fjwc...@gmail.com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-26 Thread Jean-Yves Avenard
Hi

On 27 December 2010 16:04, jhell  wrote:

>
> Before anything else can you: (in FreeBSD)
>
> 1) Set vfs.zfs.recover=1 at the loader prompt (OK set vfs.zfs.recover=1)
> 2) Boot into single user mode without opensolaris.ko and zfs.ko loaded
> 3) ( mount -w / ) to make sure you can remove and also write a new
> zpool.cache as needed.
> 4) Remove /boot/zfs/zpool.cache
> 5) kldload both zfs and opensolaris, i.e. ( kldload zfs ) should do the trick
> 6) Verify that vfs.zfs.recover=1 is set, then ( zpool import pool )
> 7) Give it a little bit and monitor activity using Ctrl+T.
>
> You should have your pool back to a working condition after this. The
> reason why oi_127 can't work with your pool is because it cannot see
> FreeBSD generic labels. The only way to work around this for oi_127
> would be to either point it directly at the replacing device or to use
> actual slices or partitions for your slogs and other such devices.
>
> Use adaNsN or gpt or gptid labels for working with your pool if you
> plan on using other OSes for recovery efforts.
>

Hi..

Thank you for your response; I will keep it somewhere safe should this ever occur again.

Let me explain why I used labels..

It all started when I was trying to solve some serious performance
issues when running with ZFS:
http://forums.freebsd.org/showthread.php?t=20476

One of the steps in trying to troubleshoot the latency problem was to
use AHCI; I had always thought that activating AHCI in the BIOS was
sufficient to get it going on FreeBSD, but it turned out that was not
the case and that I needed to load ahci.ko as well.

After doing so, my system wouldn't boot anymore, as it was trying to
use /dev/ad0, which didn't exist anymore and was now named /dev/ada0.
So I put a label on the boot disk to ensure that I would never
encounter that problem again.

In the same mindset, I used labels for the cache and log devices I
later added to the pool...
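
For reference, that was just glabel(8), roughly like this (slice
numbers from memory):

# glabel label zil /dev/ada1s1
# glabel label zcache /dev/ada1s2
# zpool add pool log /dev/label/zil
# zpool add pool cache /dev/label/zcache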

I have to say, however, that ZFS had no issue using the labels until I
tried to remove one. I had rebooted several times without having any
problems, and zpool status never hung.

It all started to play up when I ran the command:
zpool remove pool log label/zil

zpool never came back from running that command (I let it run for a
good 30 minutes, during which I was fearing the worst, and once I had
rebooted and nothing worked anymore, suicide looked like an appealing
alternative).

It is very disappointing, however, that because the pool is in a
non-working state, none of the commands available to troubleshoot the
problem would actually work (which I'm guessing is related to zpool
looking for a device name, a label, that it can never find).

I also can't explain why FreeBSD would kernel panic when it was
finally in a state of being able to do an import.

I have to say, unfortunately, that if I hadn't had OpenIndiana, I would
probably still be crying underneath my desk right now...

Thanks again for your email. I have no doubt that this would have
worked in my situation; I got your answer in just 2 hours, which is
better than any paid support could provide!

Jean-Yves
PS: saving my 5MB files over the network went from 40-55s with v15
to a constant 16s with v28... I can't test with the ZIL completely
disabled; it seems that vfs.zfs.zil_disable has been removed, and so
has vfs.zfs.write_limit_override.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-26 Thread jhell

On 12/26/2010 23:17, Jean-Yves Avenard wrote:
> Responding to myself again :P
> 
> On 27 December 2010 13:28, Jean-Yves Avenard  wrote:
>> tried to force a zpool import
>>
>> got a kernel panic:
>> panic: solaris assert: weight >= space && weight <= 2 * space, file:
>> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c,
>> line: 793
>>
>> cpuid = 5
>> KDB: stack backtrace
>> #0: 0xff805f64be at kdb_backtrace
>> #1 ..  panic+0x187
>> #2 .. metaslab_weight+0xe1
>> #3: metaslab_sync_done+0x21e
>> #4: vdev_sync_done
>> #5: spa_sync+0x6a2
>> #6 txg_sync_thread+0x147
>> #7: fork_exit+0x118
>> #8: fork_trampoline+0xe
>>
>> uptime 2m25s..
>>
> 
> Command used to import in FreeBSD was:
> zpool import -fF -R / pool
> which told me that zil was missing, and to use -m
> 
> I booted OpenIndiana (which is the only distribution I could find with
> a live CD supporting zpool v28)
> 
> Doing a zpool import actually made it show that the pool had
> successfully been repaired by the command above.
> It did think that the pool was in use (and it was, as I didn't do a
> zpool export).
> 
> So I ran zpool import -f pool in OpenIndiana, and luckily, all my
> files were there. Not sure if anything was lost...
> 
> in openindiana, I then ran zpool export and rebooted into FreeBSD.
> 
> I ran zpool import there, and got the same original behaviour of a
> zpool import hanging, I can't sigbreak it nothing. Only left with the
> option of rebooting.
> 
> Back into openindiana, tried to remove the log drive, but no luck.
> Always end up with the message:
> cannot remove log: no such device in pool
> 
> Googling that error seems to be a common issue when trying to remove a
> ZIL but while that message is displayed, the log drive is actually
> removed.
> Not in my case..
> 
> So I tried something brave:
> In Open Indiana
> zpool export pool
> 
> rebooted the PC, disconnected the SSD drive I had use and rebooted
> into openindiana
> ran zpool import -fF -R / pool (complained that log device was
> missing) and again zpool import -fF -m -R / pool
> 
> zpool status showed the log device as unavailable this time.
> 
> ran zpool remove pool log hex_number_showing_in_place
> 
> It showed the error "cannot remove log: no such device in pool",
> but zpool status showed that everything was all right.
> 
> zpool export pool , then reboot into FreeBSD
> 
> zpool import this time didn't hang and successfully imported my pool.
> All data seems to be there.
> 
> 
> Summary: v28 is still buggy when it comes to removing the log
> device... And once something is screwed, zpool utility becomes
> hopeless as it hangs.
> 
> So better have a OpenIndiana live CD to repair things :(
> 
> But I won't be trying to remove the log device for a long time ! at
> least the data can be recovered when it happens..
> 
> Could it be that this is related to the v28 patch I used
> (http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
> and should have stuck to the standard one).
> 

Before anything else, can you (in FreeBSD, rough transcript below):

1) Set vfs.zfs.recover=1 at the loader prompt (OK set vfs.zfs.recover=1)
2) Boot into single user mode without opensolaris.ko and zfs.ko loaded
3) ( mount -w / ) to make sure you can remove and also write a new
zpool.cache as needed.
4) Remove /boot/zfs/zpool.cache
5) kldload both zfs and opensolaris, i.e. ( kldload zfs ) should do the trick
6) Verify that vfs.zfs.recover=1 is set, then ( zpool import pool )
7) Give it a little bit and monitor activity using Ctrl+T.
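
Put together as a console transcript it would look roughly like this,
assuming zfs_load="YES" in your loader.conf is what normally pulls the
modules in (hence the unset), and using your pool name:

OK unset zfs_load
OK set vfs.zfs.recover=1
OK boot -s
# mount -w /
# rm /boot/zfs/zpool.cache
# kldload zfs
# sysctl vfs.zfs.recover
# zpool import pool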

You should have your pool back in a working condition after this. The
reason oi_127 can't work with your pool is that it cannot see FreeBSD
generic labels. The only way to work around this for oi_127 would be
to either point it directly at the replacing device or to use actual
slices or partitions for your slogs and other such devices.

Use adaNsN or gpt or gptid labels for working with your pool if you
plan on using other OSes for recovery efforts.
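
With GPT labels, for example, setting up a slog and cache on a blank
disk would be something like this (disk name and sizes are just an
example); the pool then references gpt/slog0 and gpt/cache0:

# gpart create -s gpt ada1
# gpart add -t freebsd-zfs -l slog0 -s 8g ada1
# gpart add -t freebsd-zfs -l cache0 ada1
# zpool add pool log gpt/slog0
# zpool add pool cache gpt/cache0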


Regards,

-- 

 jhell,v
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-26 Thread Jean-Yves Avenard
Responding to myself again :P

On 27 December 2010 13:28, Jean-Yves Avenard  wrote:
> tried to force a zpool import
>
> got a kernel panic:
> panic: solaris assert: weight >= space && weight <= 2 * space, file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c,
> line: 793
>
> cpuid = 5
> KDB: stack backtrace
> #0: 0xff805f64be at kdb_backtrace
> #1 ..  panic+0x187
> #2 .. metaslab_weight+0xe1
> #3: metaslab_sync_done+0x21e
> #4: vdev_sync_done
> #5: spa_sync+0x6a2
> #6 txg_sync_thread+0x147
> #7: fork_exit+0x118
> #8: fork_trampoline+0xe
>
> uptime 2m25s..
>

Command used to import in FreeBSD was:
zpool import -fF -R / pool
which told me that zil was missing, and to use -m

I booted OpenIndiana (which is the only distribution I could find with
a live CD supporting zpool v28)

Doing a zpool import actually made it show that the pool had
successfully been repaired by the command above.
It did think that the pool was in use (and it was, as I didn't do a
zpool export).

So I ran zpool import -f pool in OpenIndiana, and luckily, all my
files were there. Not sure if anything was lost...

In OpenIndiana, I then ran zpool export and rebooted into FreeBSD.

I ran zpool import there, and got the same original behaviour of a
zpool import hanging; I can't sigbreak it or anything. I was only left
with the option of rebooting.

Back in OpenIndiana, I tried to remove the log drive, but no luck. I
always end up with the message:
cannot remove log: no such device in pool

Googling, that error seems to be a common issue when trying to remove
a ZIL, but while the message is displayed, the log drive is normally
actually removed.
Not in my case...

So I tried something brave.
In OpenIndiana:
zpool export pool

I rebooted the PC, disconnected the SSD drive I had used, and rebooted
into OpenIndiana, then
ran zpool import -fF -R / pool (it complained that the log device was
missing) and then zpool import -fF -m -R / pool.

zpool status showed the log device as unavailable this time.

ran zpool remove pool log hex_number_showing_in_place

It showed the error "cannot remove log: no such device in pool",
but zpool status showed that everything was all right.

zpool export pool, then rebooted into FreeBSD.

zpool import this time didn't hang and successfully imported my pool.
All data seems to be there.


Summary: v28 is still buggy when it comes to removing the log
device... and once something is screwed, the zpool utility becomes
hopeless, as it hangs.

So better have a OpenIndiana live CD to repair things :(

But I won't be trying to remove the log device for a long time! At
least the data can be recovered when it happens...

Could it be that this is related to the v28 patch I used
(http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz)
and that I should have stuck to the standard one?

Jean-Yves
Breezing again !
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: Kernel Panic

2010-12-26 Thread Jean-Yves Avenard
tried to force a zpool import

got a kernel panic:
panic: solaris assert: weight >= space && weight <= 2 * space, file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c,
line: 793

cpuid = 5
KDB: stack backtrace
#0: 0xff805f64be at kdb_backtrace
#1 ..  panic+0x187
#2 .. metaslab_weight+0xe1
#3: metaslab_sync_done+0x21e
#4: vdev_sync_done
#5: spa_sync+0x6a2
#6 txg_sync_thread+0x147
#7: fork_exit+0x118
#8: fork_trampoline+0xe

uptime 2m25s..

Sorry for not writing down all the addresses in the backtrace...

Starting to smell very bad :(
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: ARRRGG HELP !!

2010-12-26 Thread Jean-Yves Avenard
Rebooting in single-user mode.

zpool status pool
or zpool scrub pool

hangs just the same ... and there's no disk activity either ...

Will download a liveCD of OpenIndiana, hopefully it will show me what's wrong :(

Jean-Yves
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE: ARRRGG HELP !!

2010-12-26 Thread Jean-Yves Avenard
On 27 December 2010 09:55, Jean-Yves Avenard  wrote:
> Hi there.
>
> I used stable-8-zfsv28-20101223-nopython.patch.xz from
> http://people.freebsd.org/~mm/patches/zfs/v28/

I did the following:

# zpool status
  pool: pool
 state: ONLINE
 scan: none requested
config:

NAMESTATE READ WRITE CKSUM
poolONLINE   0 0 0
  raidz1-0  ONLINE   0 0 0
ada2ONLINE   0 0 0
ada3ONLINE   0 0 0
ada4ONLINE   0 0 0
ada5ONLINE   0 0 0
ada6ONLINE   0 0 0
ada7ONLINE   0 0 0
cache
  label/zcache  ONLINE   0 0 0

errors: No known data errors

so far so good

[r...@server4 /pool/home/jeanyves_avenard]# zpool add pool log
/dev/label/zil [r...@server4 /pool/home/jeanyves_avenard]# zpool
status
  pool: pool
 state: ONLINE
 scan: none requested
config:

NAMESTATE READ WRITE CKSUM
poolONLINE   0 0 0
  raidz1-0  ONLINE   0 0 0
ada2ONLINE   0 0 0
ada3ONLINE   0 0 0
ada4ONLINE   0 0 0
ada5ONLINE   0 0 0
ada6ONLINE   0 0 0
ada7ONLINE   0 0 0
logs
  label/zil ONLINE   0 0 0
cache
  label/zcache  ONLINE   0 0 0

errors: No known data errors

so far so good:

# zpool remove pool logs label/zil
cannot remove logs: no such device in pool

^C

Great... now nothing responds...

Rebooting the box, I can boot into single user mode,
but doing zpool status gives me:

ZFS filesystem version 5
ZFS storage pool version 28

and it hangs there forever...

What should I do :( ?
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE

2010-12-26 Thread Jean-Yves Avenard
Hi there.

I used stable-8-zfsv28-20101223-nopython.patch.xz from
http://people.freebsd.org/~mm/patches/zfs/v28/

simply because it was the most recent at this location.

Is this the one to use?

Just asking because the file server I installed it on stopped
responding this morning and a remote power cycle didn't work.

So I have to get to the office and see what went on :(
I suspect a kernel panic of some kind.

Jean-Yves
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE

2010-12-23 Thread jhell

Hi Martin, List,

Patched up to ZFSv28 20101218 and it is working as expected, great job!

There seem to be some assertion errors left to be fixed yet; the
following are examples:

Panic String: solaris assert: vd->vdev_stat.vs_alloc == 0 (0x18a000 ==
0x0),
file:/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c,
line: 4623

#3  0x84caca35 in spa_vdev_remove (spa=0x84dba000,
guid=2330662286000872312, unspare=0)
at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:4623
4623ASSERT3U(vd->vdev_stat.vs_alloc, ==, 0);
(kgdb) list
4618
4619/*
4620 * The evacuation succeeded.  Remove any remaining MOS
metadata
4621 * associated with this vdev, and wait for these changes
to sync.
4622 */
4623ASSERT3U(vd->vdev_stat.vs_alloc, ==, 0);
4624txg = spa_vdev_config_enter(spa);
4625vd->vdev_removing = B_TRUE;
4626vdev_dirty(vd, 0, NULL, txg);
4627vdev_config_dirty(vd);

This happens on i386 upon ( zpool remove pool  )

Also, if it is of any relevance, this happens during ``offline'' too.


If further information is needed, I still have these cores and the
kernel; just let me know what you need.


Regards,

-- 

 jhell,v
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Updated py-zfs ? Re: New ZFSv28 patchset for 8-STABLE

2010-12-23 Thread Ruben van Staveren
Thanks, I'm going to check it out!

On 23 Dec 2010, at 9:58, Martin Matuska wrote:

> I have updated the py-zfs port right now so it should work with v28,
> too. The problem was a non-existing solaris.misc module, I had to patch
> and remove references to this module.
> 
> Cheers,
> mm
> 

Regards,
Ruben
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Updated py-zfs ? Re: New ZFSv28 patchset for 8-STABLE

2010-12-23 Thread Martin Matuska
I have updated the py-zfs port right now so it should work with v28,
too. The problem was a non-existing solaris.misc module, I had to patch
and remove references to this module.

Cheers,
mm

On 23.12.2010 09:27, Ruben van Staveren wrote:
> Hi,
> 
> On 16 Dec 2010, at 13:44, Martin Matuska wrote:
> 
>> Hi everyone,
>>
>> following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
>> providing a ZFSv28 testing patch for 8-STABLE.
> 
> Where can I find an updated py-zfs so that zfs (un)allow/userspace/groupspace 
> can be tested ?
> 
> Regards,
>   Ruben
> 
> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Updated py-zfs ? Re: New ZFSv28 patchset for 8-STABLE

2010-12-23 Thread Ruben van Staveren
Hi,

On 16 Dec 2010, at 13:44, Martin Matuska wrote:

> Hi everyone,
> 
> following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
> providing a ZFSv28 testing patch for 8-STABLE.

Where can I find an updated py-zfs so that zfs (un)allow/userspace/groupspace 
can be tested ?

Regards,
Ruben

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE

2010-12-21 Thread Ruben van Staveren
Ok,

On 16 Dec 2010, at 13:44, Martin Matuska wrote:

> Please test, test, test. Chances are this is the last patchset before
> v28 going to HEAD (finally) and after a reasonable testing period into
> 8-STABLE.
> Especially test new changes, like boot support and sendfile(2) support.
> Also be sure to verify if you can import for existing ZFS pools
> (v13-v15) when running v28 or boot from your existing pools.
> 
> Please test the (v13-v15) compatibility layer as well:
> Old usereland + new kernel / old kernel + new userland

Using a v28 kernel+userland seems to work on FreeBSD/amd64. I didn't
dare to mix userland/kernel, as that is ill-advised by itself when
there are major changes like this one.

I can't seem to use zfs allow/userspace/groupspace. The old py-zfs just
dumped core on those commands; recompiling gave me warnings about a
missing solaris.misc module, which persisted even after an upgrade to
py26-zfs-1_1.

Thanks for keeping up the good work on ZFS in FreeBSD!

Best Regards,
Ruben
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE

2010-12-18 Thread Krzysztof Dajka
On Sat, Dec 18, 2010 at 7:30 PM, Martin Matuska  wrote:
> The information about pools is stored in /boot/zfs/zpool.cache
> If this file doesn't contain correct information, your system pools will
> not be discovered.
>
> In v28, importing a pool with the "altroot" option does not touch the
> cache file (it has to be specified manually with a option to zpool import).
>
> Regarding rollback - rolling back a live root file system is not
> recommended.
>
On 18.12.2010 19:43, Krzysztof Dajka wrote:
>> t my system working
>> again. I did:
>> zpool import -o altroot=/tank tank
>> chroot /tank
>> reboot
>>
>> Can anyone explain why chroot to /tank is needed?
>

I used the 8.2-BETA1 memstick image to import and roll back. Thanks for
the explanation; a few moments ago I would have argued that zpool(1)
didn't mention cachefile :)
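
If I now understand the v28 behaviour correctly, importing with an
altroot while still recording the pool in a cache file would be
something like this (paths only as an example):

# zpool import -o altroot=/tank -o cachefile=/boot/zfs/zpool.cache tank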
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE

2010-12-18 Thread Krzysztof Dajka
Hi,
I applied the patch against the 2010-12-16 evening STABLE. I did what Martin asked:

On Thu, Dec 16, 2010 at 1:44 PM, Martin Matuska  wrote:
>    # cd /usr/src
>    # fetch
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>    # xz -d stable-8-zfsv28-20101215.patch.xz
>    # patch -E -p0 < stable-8-zfsv28-20101215.patch
>    # rm sys/cddl/compat/opensolaris/sys/sysmacros.h
>
Patch applied cleanly.

#make buildworld
#make buildkernel
#make installkernel
Reboot into single user mode.
#mergemaster -p
#make installworld
#mergemaster
Reboot.


Rebooting with the old world and new kernel went fine, but after
rebooting with the new world I got:
ZFS: zfs_alloc()/zfs_free() mismatch
just before loading the kernel modules; after that my system hangs.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE

2010-12-17 Thread Andrei V. Lavreniyuk

On 17.12.2010 12:12, Romain Garbage wrote:



following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

Link to mfsBSD ISO files for testing (i386 and amd64):
http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-amd64.iso
http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-i386.iso

The root password for the ISO files: "mfsroot"
The ISO files work on real systems and in virtualbox.
They contain a full install of FreeBSD 8.2-PRERELEASE with ZFS v28,
simply use the provided "zfsinstall" script.

The patch is against FreeBSD 8-STABLE as of 2010-12-15.

When applying the patch be sure to use correct options for patch(1)
and make sure the file sys/cddl/compat/opensolaris/sys/sysmacros.h gets
deleted:

# cd /usr/src
# fetch
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
# xz -d stable-8-zfsv28-20101215.patch.xz
# patch -E -p0<  stable-8-zfsv28-20101215.patch
# rm sys/cddl/compat/opensolaris/sys/sysmacros.h


Patch seemed to apply fine against yesterday evening (2010-12-16)
8-STABLE, world and kernel compiled fine, and booting from mirrored
pool v15 was also fine.



...booting from RAIDZ2 pool v15 was bad... :(



---
Best regards, Andrei Lavreniyuk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: New ZFSv28 patchset for 8-STABLE

2010-12-17 Thread Romain Garbage
2010/12/16 Martin Matuska :
> Hi everyone,
>
> following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
> providing a ZFSv28 testing patch for 8-STABLE.
>
> Link to the patch:
>
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>
> Link to mfsBSD ISO files for testing (i386 and amd64):
>    http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-amd64.iso
>    http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-i386.iso
>
> The root password for the ISO files: "mfsroot"
> The ISO files work on real systems and in virtualbox.
> They contain a full install of FreeBSD 8.2-PRERELEASE with ZFS v28,
> simply use the provided "zfsinstall" script.
>
> The patch is against FreeBSD 8-STABLE as of 2010-12-15.
>
> When applying the patch be sure to use correct options for patch(1)
> and make sure the file sys/cddl/compat/opensolaris/sys/sysmacros.h gets
> deleted:
>
>    # cd /usr/src
>    # fetch
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>    # xz -d stable-8-zfsv28-20101215.patch.xz
>    # patch -E -p0 < stable-8-zfsv28-20101215.patch
>    # rm sys/cddl/compat/opensolaris/sys/sysmacros.h

Patch seemed to apply fine against yesterday evening (2010-12-16)
8-STABLE, world and kernel compiled fine, and booting from mirrored
pool v15 was also fine.

Cheers,
Romain
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"