Re: [OmniOS-discuss] (no subject)

2015-09-14 Thread Stephan Budach

Am 15.09.15 um 03:46 schrieb Paul B. Henson:

From: Omen Wild
Sent: Monday, September 14, 2015 3:10 PM

Mostly we are wondering how to clear the corruption off disk and worried
what else might be corrupt since the scrub turns up no issues.

While looking into possible corruption from the recent L2 cache bug it seems
that running 'zdb -bbccsv' is a good test for finding corruption as it looks
at all of the blocks and verifies all of the checksums.

___
As George Wilson wrote on the ZFS mailing list: "Unfortunately, if the 
corruption impacts a data block then we won't be able to detect it." 
So I am afraid that, apart from corruption of metadata and indirect blocks, 
there is no way to even detect corruption inside a data block, since the 
checksum still matches.


I think the best one can do is to run a scrub and act on its results. If 
the scrub reports no errors, one can either live with that or look for ways 
to verify the data against a known-good copy from outside the pool, e.g. a 
backup taken before 6214 was introduced; but depending on the sheer amount 
of data, or its type, that might not even be possible.
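
A minimal sketch of that scrub-and-check workflow (the pool name "tank" is an example; substitute your own):

```shell
# Start a scrub; it runs in the background and reads every allocated block.
zpool scrub tank

# Check progress, per-device checksum error counts, and any files
# ZFS knows to be damaged.
zpool status -v tank

# Quick summary across all pools; prints "all pools are healthy"
# when no errors are known.
zpool status -x
```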


Cheers,
Stephan
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS r151014 update - needs reboot!

2015-09-14 Thread Stephan Budach

Hi Dan,

I will apply the upgrade to a couple of my OmniOS boxes today and give 
it a go.


Thanks,
Stephan


Re: [OmniOS-discuss] OmniOS Bloody update

2015-09-14 Thread Dan McDonald
There have been some fixes there, but I'm not sure if it's all there.  I do 
know one has to use --reject options to make the switch.  

Lauri "lotheac" Tirkkonen can provide more details.  Also note - there is an 
effort to replace SunSSH with OpenSSH altogether.

Dan


Sent from my iPhone (typos, autocorrect, and all)

On Sep 14, 2015, at 9:38 PM, Paul B. Henson  wrote:

>> From: Dan McDonald
>> Sent: Monday, September 14, 2015 2:58 PM
>> 
>> - OpenSSH is now at version 7.1p1.
> 
> Has the packaging been fixed in bloody so you can actually install this now
> :)? If so, any thoughts on potentially back porting that to the current LTS
> :)?
> 
>> - An additional pair of ZFS fixes from Delphix not yet upstreamed in
> illumos-gate.
> 
> That would be DLPX-36997 and DLPX-35372? Do you happen to know if Delphix
> has their issue tracker accessible to the Internet if somebody wanted to
> take a look in more detail at these? Google didn't provide anything of any
> obvious use.
> 
> Thanks!
> 
> 


[OmniOS-discuss] zdb -h bug?

2015-09-14 Thread Paul B. Henson
While trying to look for corruption from the recent L2 cache bug, I noticed
that zdb core dumps trying to list the history on both my data pool (which
had L2 cache) and my rpool (which did not). I'm wondering if there is some
bug with zdb that is causing this as opposed to corruption of the pool.

I'd be curious as to what 'zdb -h' does on the various pools out there,
particularly ones created prior to 014 then subsequently upgraded to 014 but
without large_blocks being enabled (as those are the characteristics of my
pools :) ).

If I get a little time I'm going to try to build a 012 box and simulate how
my pools got to where they are and see if I can reproduce it.
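
For anyone who wants to compare, the invocation in question is simply (pool names here are examples matching mine):

```shell
# Dump the pool command history (the same data "zpool history" shows);
# this is the invocation that was core dumping for me.
zdb -h rpool
zdb -h data
```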




Re: [OmniOS-discuss] (no subject)

2015-09-14 Thread Paul B. Henson
> From: Omen Wild
> Sent: Monday, September 14, 2015 3:10 PM
> 
> Mostly we are wondering how to clear the corruption off disk and worried
> what else might be corrupt since the scrub turns up no issues.

While looking into possible corruption from the recent L2 cache bug it seems
that running 'zdb -bbccsv' is a good test for finding corruption as it looks
at all of the blocks and verifies all of the checksums.
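
As a sketch, with an example pool name (note this traverses every block, so it can run for hours on a large pool):

```shell
# -b (doubled): gather and print detailed block statistics
# -c (doubled): verify checksums of metadata AND user data blocks
# -s: report I/O statistics while traversing, -v: verbose
zdb -bbccsv tank

# Watch for assertion failures or a non-zero exit status.
echo "zdb exit status: $?"
```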



Re: [OmniOS-discuss] OmniOS r151014 update - needs reboot!

2015-09-14 Thread Paul B. Henson
> From: Dan McDonald
> Sent: Monday, September 14, 2015 2:58 PM
> 
> Most importantly, this update fixes illumos 6214 for OmniOS.  You should be
> able to restore your L2ARC devices using the method I mentioned in my last
> e-mail:

Call me a scaredy-cat, but I think I might wait a bit for that to burn in
before I reenable my cache :).

> Because of the changes to zfs, this update requires that you reboot your
system.

And then, despite successful scrub and zdb runs, I'm still nervous about the
pool not being successfully imported after a reboot 8-/ , so I might
put this off until I've got a chunk of free time in case of unexpected
recovery issues.

Thanks much for the quick turnaround though!



Re: [OmniOS-discuss] OmniOS Bloody update

2015-09-14 Thread Paul B. Henson
> From: Dan McDonald
> Sent: Monday, September 14, 2015 2:58 PM
> 
> - OpenSSH is now at version 7.1p1.

Has the packaging been fixed in bloody so you can actually install this now
:)? If so, any thoughts on potentially back porting that to the current LTS
:)?

> - An additional pair of ZFS fixes from Delphix not yet upstreamed in
illumos-gate.

That would be DLPX-36997 and DLPX-35372? Do you happen to know if Delphix
has their issue tracker accessible to the Internet if somebody wanted to
take a look in more detail at these? Google didn't provide anything of any
obvious use.

Thanks!




Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 - steps to check and repair...

2015-09-14 Thread Paul B. Henson
> From: Guenther Alka
> Sent: Monday, September 14, 2015 9:21 AM
> 
> 1. what is the recommended way to detect possible problems
>a. run scrub? seems useless

I don't think it is necessarily useless; it might detect a problem. However,
from what I understand, there might be problems it doesn't detect. So a
failing scrub confirms there is a problem, but a clean one doesn't confirm
that there isn't.

>b. run zdb pool and check for what

I ran a basic zdb and also a 'zdb -bbccsv', the former seems to be core
dumping on parsing the history, but the latter ran successfully with no
issues. If I understood George correctly, 'zdb -bbccsv' should be fairly
reliable on finding metadata corruption as it traverses all of the blocks.

> 2. when using an L2Arc and there is no obvious error detected by scrub
> or zdb
>a. trash the pool and restore from backup  via rsync with possible
> file corruptions but ZFS structure is 100% ok then
>b. keep the pool and hope that there is no metadata corruption?
>c. some action to verify that at least the pool is ok: 

Hmm, at this point, given a successful scrub and successful zdb runs, I'm
going to keep my fingers crossed that I have no corruption. I was only
running the buggy code for about a month, without a particularly high
load, so hopefully I got lucky.

> 3. when using an L2Arc and there is an error detected by scrub or zdb
[...]
>b. keep the pool and hope that there is no metadata corruption

If the scrub or zdb detects errors, it is possible your box might panic at
some point, or be unable to import the pool after a reboot. So in that case,
I don't think just keeping it is advisable :). I'm not sure there is any
way to fix it in place; the best option is probably to restore from backup,
or to temporarily transfer the data elsewhere, re-create the pool, and copy
it back.



Re: [OmniOS-discuss] ZFS panic when unlinking a file

2015-09-14 Thread Omen Wild
Quoting Dan McDonald  on Mon, Sep 14 18:41:
>
> One thing you can try is to overwrite the file and then remove it. 
> Someone else reported a similar bug, and it turned out to be corrupt
> metadata or extended attributes.

We will try it, probably tomorrow. This is a backup server and we have
a long running job in progress.

> Do you have a URL for the panic?

I will send it off-list.

> Also, please try today's update.

I have run the update, but we will have to wait until at least
tomorrow to reboot.

Thanks!


Re: [OmniOS-discuss] (no subject)

2015-09-14 Thread Dan McDonald
One thing you can try is to overwrite the file and then remove it.  Someone 
else reported a similar bug, and it turned out to be corrupt metadata or 
extended attributes.

Do you have a URL for the panic?  Also, please try today's update.

Dan 

Sent from my iPhone (typos, autocorrect, and all)

> On Sep 14, 2015, at 6:09 PM, Omen Wild  wrote:
> 
> [ I originally posted this to the Illumos ZFS list but got no responses. ]
> 
> We have an up to date OmniOS system that panics every time we try to
> unlink a specific file. We have a kernel pages-only crashdump and can
> reproduce easily. I can make the panic files available to an interested
> party. 
> 
> A zpool scrub turned up no errors or repairs. 
> 
> Mostly we are wondering how to clear the corruption off disk and worried
> what else might be corrupt since the scrub turns up no issues.
> 
> Details below.
> 
> When we first encountered the issue we were running with a version from
> mid-July: zfs@0.5.11,5.11-0.151014:20150417T182430Z .
> 
> After the first couple panics we upgraded to the newest (as of a couple
> days ago, zfs@0.5.11,5.11-0.151014:20150818T161042Z) which still panics.
> 
> # uname -a
> SunOS zaphod 5.11 omnios-d08e0e5 i86pc i386 i86pc
> 
> The error looks like this:
> BAD TRAP: type=e (#pf Page fault) rp=ff002ed54b00 addr=e8 occurred in 
> module "zfs" due to a NULL pointer dereference
> 
> The panic stack looks like this in every case:
>   param_preset
>   die+0xdf
>   trap+0xdb3
>   0xfb8001d6
>   zfs_remove+0x395
>   fop_remove+0x5b
>   vn_removeat+0x382
>   unlinkat+0x59
>   _sys_sysenter_post_swapgs+0x149
> 
> It is triggered by trying to rm a specific file. ls'ing the file gives
> the error "Operation not applicable", ls'ing the directory shows ? in
> place of the data:
> 
> ??   ? ??  ?? filename.html
> 
> I have attached the output of:
> echo '::panicinfo\n::cpuinfo -v\n::threadlist -v 
> 10\n::msgbuf\n*panic_thread::findstack -v\n::stacks' | mdb 7
> 
> I am a Solaris/OI/OmniOS debugging neophyte, but will happily run any
> commands recommended.
> 
> Thanks
>  Omen
> 


Re: [OmniOS-discuss] ZFS panic when unlinking a file

2015-09-14 Thread Omen Wild
Apologies, this email escaped without a subject line. I'm hoping this
one, coupled with threading, will help ameliorate the problem.

-- 
"What is this talk of 'release'?  Klingons do not make software
'releases'.  Our software 'escapes,' leaving a bloody trail of
designers and quality assurance people in its wake."


[OmniOS-discuss] OmniOS Bloody update

2015-09-14 Thread Dan McDonald
Hello again!

With one week left until Surge & illumos Day, I wanted to make sure an update 
to bloody happened.  With recent illumos bugs (e.g. 6214) taking high priority, 
I wanted to make sure their fixes ended up in the bloody release (as well as 
appropriate ones making it back into r151014).

New with this update out of omnios-build (now at master revision f01dd5c):

- Mozilla NSS up to version 3.20.  (Includes ca-bundle update.)

- OpenSSH is now at version 7.1p1.

- The kayak images now include previously missing bits.


And highlights of illumos-omnios progress (now at master revision 23b18eb, 
meaning uname -v == omnios-23b18eb) are:

- A fix to illumos 6214, which will prevent the existence of l2arc/cache 
devices from potentially corrupting data.

- An additional pair of ZFS fixes from Delphix not yet upstreamed in 
illumos-gate.

- Updated ses connector lists.

- An htable_reap() fix from Joyent, which may prevent memory hogging and 
reap-related slowdowns.

- New kstats for the NFS server (see illumos 6090).


There will be one, possibly two, more bloody updates before I freeze for 
r151016.  '016 will be a bit late this time (late October/early November), and 
one more bloody update will contain a potentially large set of upgrades to 
various omnios-build packages.

I have updated the .iso, .usb-dd, and the kayak images as well.

Happy updating!
Dan



[OmniOS-discuss] OmniOS r151014 update - needs reboot!

2015-09-14 Thread Dan McDonald
I have updated release media as well as the IPS server.

omnios-build branch r151014 is now on revision 437bddb.
illumos-omnios branch r151014 is now on revision c65, which means 
"uname -v" now shows omnios-c65.

Most importantly, this update fixes illumos 6214 for OmniOS.  You should be 
able to restore your L2ARC devices using the method I mentioned in my last 
e-mail:

zpool add <pool> cache <vdev>

PLEASE MAKE SURE YOU SPECIFY "cache" when adding the vdev, or else you will 
append <vdev>'s space to your pool.
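
A cautious way to apply this (pool and device names are examples; `zpool add -n` performs a dry run that prints the resulting layout without changing the pool):

```shell
# Dry run: show where the device would land in the pool layout.
zpool add -n data cache c2t2d0

# If (and only if) it appears under a "cache" heading, do it for real.
zpool add data cache c2t2d0

# Confirm the device now shows under the cache section.
zpool status data
```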

Special thanks to illumos community member Arne "sensille" Jansen for both 
finding 6214 and fixing it.


Additional fixes in this update include:

- An additional set of ZFS fixes from Delphix.

- Mozilla NSS up to version 3.20 (including ca-bundle).

- Kernel htable fixes, which should improve kernel memory behavior in the face 
of reaping (illumos 6202).

- Fault management topology changes for ses brought up to date with 
illumos-gate.

- Small bug with zpool import fixed (illumos 1778).

Because of the changes to zfs, this update requires that you reboot your system.


Thank you!
Dan



Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Matthew Lagoe
I know it is multi-threaded; it's just that in my experience (at least 
historically) it wasn't completely multi-threaded, and you could run into 
bottlenecks with spare CPU cores sitting idle.

-Original Message-
From: Saso Kiselkov [mailto:skiselkov...@gmail.com] 
Sent: Monday, September 14, 2015 12:46 PM
To: Matthew Lagoe; 'Doug Hughes'
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs compression limits write throughput to
100MB/sec

On 9/14/15 9:40 PM, Matthew Lagoe wrote:
> Also I believe the compression is not threaded as well as it could be 
> so you may be limited by the single core performance of your machine.

It is multi-threaded.

Cheers,
--
Saso





Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Saso Kiselkov
On 9/14/15 9:40 PM, Matthew Lagoe wrote:
> Also I believe the compression is not threaded as well as it could be so you
> may be limited by the single core performance of your machine.

It is multi-threaded.

Cheers,
-- 
Saso



Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Matthew Lagoe
Also I believe the compression is not threaded as well as it could be so you
may be limited by the single core performance of your machine.

-Original Message-
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On
Behalf Of Saso Kiselkov
Sent: Monday, September 14, 2015 12:34 PM
To: Doug Hughes
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs compression limits write throughput to
100MB/sec

On 9/14/15 9:18 PM, Doug Hughes wrote:
> That does seem to keep performance at much closer to parity. It still 
> seems about 70-80% of peak vs what I was seeing before, but not that 
> 100MB/sec bottleneck.

Well, that's the reality of compression. Even the compressibility check is
not free, but it's a lot less of an impact with lz4 than with lzjb.

Cheers,
--
Saso


Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Saso Kiselkov
On 9/14/15 9:18 PM, Doug Hughes wrote:
> That does seem to keep performance at much closer to parity. It still
> seems about 70-80% of peak vs what I was seeing before, but not that
> 100MB/sec bottleneck.

Well, that's the reality of compression. Even the compressibility check
is not free, but it's a lot less of an impact with lz4 than with lzjb.

Cheers,
-- 
Saso


Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Doug Hughes
That does seem to keep performance at much closer to parity. It still seems
about 70-80% of peak vs what I was seeing before, but not that 100MB/sec
bottleneck.


On Mon, Sep 14, 2015 at 3:08 PM, Saso Kiselkov 
wrote:

> On 9/14/15 9:05 PM, Doug Hughes wrote:
> > Probably something for Illumos, but you guys may have seen this or may
> > like to know.
> >
> > I've got a 10g connected Xyratex box running OmniOS, and I noticed that
> > no matter how many streams (1, 2, 3) I only get 100MB/sec write
> > throughput and it just tops out. Even with 1 stream. This is with the
> > default lzjb compression on (fast option).
> >
> > I turned off compression and have 2 streams running now and am getting
> > about 250-600MB/sec in aggregate. Much better!
> >
> > The compress ratio was only 1.02x - 1.03x so it's no great loss on this
> > data. I just thought the 100MB/sec speed limit was interesting.
>
> Try setting compression=lz4. It should perform much, much better than
> lzjb on incompressible data.
>
> --
> Saso


Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Saso Kiselkov
On 9/14/15 9:05 PM, Doug Hughes wrote:
> Probably something for Illumos, but you guys may have seen this or may
> like to know.
> 
> I've got a 10g connected Xyratex box running OmniOS, and I noticed that
> no matter how many streams (1, 2, 3) I only get 100MB/sec write
> throughput and it just tops out. Even with 1 stream. This is with the
> default lzjb compression on (fast option).
> 
> I turned off compression and have 2 streams running now and am getting
> about 250-600MB/sec in aggregate. Much better!
> 
> The compress ratio was only 1.02x - 1.03x so it's no great loss on this
> data. I just thought the 100MB/sec speed limit was interesting.

Try setting compression=lz4. It should perform much, much better than
lzjb on incompressible data.
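
For example (dataset name hypothetical; note the setting only affects blocks written after the change):

```shell
# Enable lz4 on the dataset receiving the write streams.
zfs set compression=lz4 tank/data

# After some data has been rewritten, check the achieved ratio.
zfs get compression,compressratio tank/data
```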

-- 
Saso


[OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

2015-09-14 Thread Doug Hughes
Probably something for Illumos, but you guys may have seen this or may like
to know.

I've got a 10g connected Xyratex box running OmniOS, and I noticed that no
matter how many streams (1, 2, 3) I only get 100MB/sec write throughput and
it just tops out. Even with 1 stream. This is with the default lzjb
compression on (fast option).

I turned off compression and have 2 streams running now and am getting
about 250-600MB/sec in aggregate. Much better!

The compress ratio was only 1.02x - 1.03x so it's no great loss on this
data. I just thought the 100MB/sec speed limit was interesting.
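
A rough way to reproduce this kind of comparison (paths are hypothetical; note that /dev/zero output compresses almost perfectly, so pre-generate random data for a fair test of incompressible writes):

```shell
# Generate ~1 GiB of incompressible input once
# (urandom is too slow to feed the timed write directly).
dd if=/dev/urandom of=/var/tmp/rand.bin bs=1M count=1024

# Time a single-stream write into the dataset under test.
ptime dd if=/var/tmp/rand.bin of=/tank/data/testfile bs=1M
```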


Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 - steps to check and repair...

2015-09-14 Thread Guenther Alka

What is the recommended action on OmniOS r151014 regarding the L2ARC problem?

1. what is the recommended way to detect possible problems
  a. run scrub? seems useless
  b. run zdb pool and check for what
 You said: look for assertion failures, or other non-zero exits. Is 
this the key indicator of a corrupt pool?


2. when using an L2Arc and there is no obvious error detected by scrub 
or zdb
  a. trash the pool and restore from backup  via rsync with possible 
file corruptions but ZFS structure is 100% ok then

  b. keep the pool and hope that there is no metadata corruption?
  c. some action to verify that at least the pool is ok: 

3. when using an L2Arc and there is an error detected by scrub or zdb
  a. trash the pool and restore from backup with possible file 
corruption but pool is 100% ok

  b. keep the pool and hope that there is no metadata corruption
  c. some action to verify that at least the pool is ok: 

Is there an alert page about this on the OmniOS wiki?

Gea


Am 10.09.2015 um 13:53 schrieb Dan McDonald:

If you are using a zpool with r151014 and you have an L2ARC ("cache") vdev, I 
recommend at this time disabling it.  You may disable it by uttering:

zpool remove <pool> <vdev>

For example:

zpool remove data c2t2d0

The bug in question has a good analysis here:

https://www.illumos.org/issues/6214

This bug can lead to problems ranging from false-positives on zpool scrub all 
the way up to actual pool corruption.

We will be updating the package repo AND the install media once 6214 is 
upstreamed to illumos-gate, and pulled back into the r151014 branch of 
illumos-omnios.  The fix is undergoing some tests from ZFS experts right now to 
verify its correctness.

So please disable your L2ARC/cache devices for maximum data safety.  You can 
add them back after we update r151014 by uttering:

zpool add <pool> cache <vdev>

PLEASE NOTE the "cache" indicator when you add back.  If you omit this, the 
vdev is ADDED to your pool, an operation one can't reverse.

zpool add data cache c2t2d0

Thanks,
Dan
