Re: [OmniOS-discuss] (no subject)
On 15.09.15 at 03:46, Paul B. Henson wrote:

>> From: Omen Wild
>> Sent: Monday, September 14, 2015 3:10 PM
>>
>> Mostly we are wondering how to clear the corruption off disk and worried what else might be corrupt since the scrub turns up no issues.
>
> While looking into possible corruption from the recent L2 cache bug, it seems that running 'zdb -bbccsv' is a good test for finding corruption, as it looks at all of the blocks and verifies all of the checksums.

As George Wilson wrote on the ZFS mailing list: "Unfortunately, if the corruption impacts a data block then we won't be able to detect it." So I am afraid that apart from metadata and indirect block corruption, there is no way to even detect corruption inside a data block, because the checksum still matches.

I think the best one can do is to run a scrub and act on the results of that. If the scrub reports no errors, one can live with that, or one would need to think of options to compare the data with known-good data from that pool, e.g. from a backup taken before 6214 was introduced; but depending on the sheer amount of data, or the type of it, that might not even be possible.

Cheers,
Stephan

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
Re: [OmniOS-discuss] OmniOS r151014 update - needs reboot!
Hi Dan,

I will apply the upgrade to a couple of my OmniOS boxes today and give it a go.

Thanks,
Stephan
Re: [OmniOS-discuss] OmniOS Bloody update
There have been some fixes there, but I'm not sure if it's all there. I do know one has to use --reject options to make a switch. Lauri "lotheac" Tirkkonen can provide more details.

Also note - there is an effort to replace sunssh with OpenSSH altogether.

Dan

Sent from my iPhone (typos, autocorrect, and all)

On Sep 14, 2015, at 9:38 PM, Paul B. Henson wrote:

>> From: Dan McDonald
>> Sent: Monday, September 14, 2015 2:58 PM
>>
>> - OpenSSH is now at version 7.1p1.
>
> Has the packaging been fixed in bloody so you can actually install this now :)? If so, any thoughts on potentially backporting that to the current LTS :)?
>
>> - An additional pair of ZFS fixes from Delphix not yet upstreamed in illumos-gate.
>
> That would be DLPX-36997 and DLPX-35372? Do you happen to know if Delphix has their issue tracker accessible to the Internet, if somebody wanted to take a look at these in more detail? Google didn't provide anything of any obvious use.
>
> Thanks!
[OmniOS-discuss] zdb -h bug?
While trying to look for corruption from the recent L2 cache bug, I noticed that zdb core dumps trying to list the history on both my data pool (which had L2 cache) and my rpool (which did not). I'm wondering if there is some bug with zdb that is causing this, as opposed to corruption of the pool.

I'd be curious as to what 'zdb -h' does on the various pools out there, particularly ones created prior to 014 and then subsequently upgraded to 014 but without large_blocks being enabled (as those are the characteristics of my pools :) ).

If I get a little time I'm going to try to build a 012 box, simulate how my pools got to where they are, and see if I can reproduce it.
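The requested experiment is easy to script: run 'zdb -h' against each imported pool and record which ones fail. A sketch of my own (no assumptions beyond the stock zdb and zpool CLIs):

```shell
# check_history runs 'zdb -h' (pool history) on the named pool and
# reports whether it completed or crashed/exited non-zero.
check_history() {
    if zdb -h "$1" > /dev/null 2>&1; then
        echo "$1: zdb -h OK"
    else
        echo "$1: zdb -h FAILED"
    fi
}

# Example driver (requires a live system with imported pools):
#   for pool in $(zpool list -H -o name); do check_history "$pool"; done
```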
Re: [OmniOS-discuss] (no subject)
> From: Omen Wild
> Sent: Monday, September 14, 2015 3:10 PM
>
> Mostly we are wondering how to clear the corruption off disk and worried what else might be corrupt since the scrub turns up no issues.

While looking into possible corruption from the recent L2 cache bug, it seems that running 'zdb -bbccsv' is a good test for finding corruption, as it looks at all of the blocks and verifies all of the checksums.
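The suggested check can be wrapped up as a small script along these lines ("tank" is a placeholder pool name; this is a sketch, not a vetted tool):

```shell
# check_pool runs 'zdb -bbccsv' against the named pool and keys off the
# exit status: -bb traverses all blocks, -cc verifies all checksums,
# -s/-v add per-dataset stats and verbose output.
check_pool() {
    if zdb -bbccsv "$1" > /dev/null 2>&1; then
        echo "OK: $1 - all block checksums verified"
    else
        echo "FAIL: $1 - zdb exited non-zero"
    fi
}

# e.g.: check_pool tank
```

Note that, per George Wilson's caveat quoted elsewhere in this thread, a clean run only covers what checksums can catch.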
Re: [OmniOS-discuss] OmniOS r151014 update - needs reboot!
> From: Dan McDonald
> Sent: Monday, September 14, 2015 2:58 PM
>
> Most importantly, this update fixes illumos 6214 for OmniOS. You should be able to restore your L2ARC devices using the method I mentioned in my last e-mail:

Call me a scaredy-cat, but I think I might wait a bit for that to burn in before I re-enable my cache :).

> Because of the changes to zfs, this update requires that you reboot your system.

And despite successful scrub and zdb runs, I'm still nervous about the pool not being successfully imported after a reboot 8-/ , so I might put this off until I've got a chunk of free time in case of unexpected recovery issues.

Thanks much for the quick turnaround though!
Re: [OmniOS-discuss] OmniOS Bloody update
> From: Dan McDonald
> Sent: Monday, September 14, 2015 2:58 PM
>
> - OpenSSH is now at version 7.1p1.

Has the packaging been fixed in bloody so you can actually install this now :)? If so, any thoughts on potentially backporting that to the current LTS :)?

> - An additional pair of ZFS fixes from Delphix not yet upstreamed in illumos-gate.

That would be DLPX-36997 and DLPX-35372? Do you happen to know if Delphix has their issue tracker accessible to the Internet, if somebody wanted to take a look at these in more detail? Google didn't provide anything of any obvious use.

Thanks!
Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 - steps to check and repair...
> From: Guenther Alka
> Sent: Monday, September 14, 2015 9:21 AM
>
> 1. what is the recommended way to detect possible problems
>    a. run scrub? seems useless

I don't think it is necessarily useless; it might detect a problem. However, from what I understand there might be a problem it doesn't detect. So it can be considered verification that there is a problem, but not verification that there isn't.

>    b. run zdb pool and check for what

I ran a basic zdb and also a 'zdb -bbccsv'; the former seems to be core dumping on parsing the history, but the latter ran successfully with no issues. If I understood George correctly, 'zdb -bbccsv' should be fairly reliable at finding metadata corruption, as it traverses all of the blocks.

> 2. when using an L2Arc and there is no obvious error detected by scrub or zdb
>    a. trash the pool and restore from backup via rsync with possible file corruptions but ZFS structure is 100% ok then
>    b. keep the pool and hope that there is no metadata corruption?
>    c. some action to verify that at least the pool is ok:

Hmm, at this point, given a successful scrub and successful zdb runs, I'm going to keep my fingers crossed that I have no corruption. I was only running the buggy code for about a month, without a particularly high load, so hopefully I got lucky.

> 3. when using an L2Arc and there is an error detected by scrub or zdb [...]
>    b. keep the pool and hope that there is no metadata corruption

If the scrub or zdb detect errors, it is possible your box might panic at some point, or be unable to import the pool after a reboot. So in that case, I don't think just keeping it is advisable :). I'm not sure if there is any way to fix it, or if the best case is to try to restore it, or temporarily transfer the data elsewhere, re-create the pool, and put the data back.
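The "run a scrub and act on the results" step discussed in this thread can be scripted along these lines (a sketch; the pool name and the 60-second poll interval are assumptions):

```shell
# wait_for_scrub kicks off a scrub, polls 'zpool status' until it is no
# longer reported as in progress, then prints the scan summary line
# (repaired bytes and error count) for inspection.
wait_for_scrub() {
    zpool scrub "$1"
    while zpool status "$1" | grep -q 'scrub in progress'; do
        sleep 60
    done
    zpool status "$1" | grep 'scan:'
}

# e.g.: wait_for_scrub data
```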
Re: [OmniOS-discuss] ZFS panic when unlinking a file
Quoting Dan McDonald on Mon, Sep 14 18:41:
>
> One thing you can try is to overwrite the file and then remove it. Someone else reported a similar bug, and it turned out to be corrupt metadata or extended attributes.

We will try it, probably tomorrow. This is a backup server and we have a long-running job in progress.

> Do you have a URL for the panic?

I will send it off-list.

> Also, please try today's update.

I have run the update, but we will have to wait until at least tomorrow to reboot.

Thanks!
Re: [OmniOS-discuss] (no subject)
One thing you can try is to overwrite the file and then remove it. Someone else reported a similar bug, and it turned out to be corrupt metadata or extended attributes.

Do you have a URL for the panic?

Also, please try today's update.

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Sep 14, 2015, at 6:09 PM, Omen Wild wrote:
>
> [ I originally posted this to the Illumos ZFS list but got no responses. ]
>
> We have an up-to-date OmniOS system that panics every time we try to unlink a specific file. We have a kernel pages-only crash dump and can reproduce easily. I can make the panic files available to an interested party.
>
> A zpool scrub turned up no errors or repairs.
>
> Mostly we are wondering how to clear the corruption off disk and worried what else might be corrupt since the scrub turns up no issues.
>
> Details below.
>
> When we first encountered the issue we were running with a version from mid-July: zfs@0.5.11,5.11-0.151014:20150417T182430Z .
>
> After the first couple panics we upgraded to the newest (as of a couple days ago, zfs@0.5.11,5.11-0.151014:20150818T161042Z) which still panics.
>
> # uname -a
> SunOS zaphod 5.11 omnios-d08e0e5 i86pc i386 i86pc
>
> The error looks like this:
> BAD TRAP: type=e (#pf Page fault) rp=ff002ed54b00 addr=e8 occurred in module "zfs" due to a NULL pointer dereference
>
> The panic stack looks like this in every case:
> param_preset
> die+0xdf
> trap+0xdb3
> 0xfb8001d6
> zfs_remove+0x395
> fop_remove+0x5b
> vn_removeat+0x382
> unlinkat+0x59
> _sys_sysenter_post_swapgs+0x149
>
> It is triggered by trying to rm a specific file. ls'ing the file gives the error "Operation not applicable", ls'ing the directory shows ? in place of the data:
>
> ?? ? ?? ?? filename.html
>
> I have attached the output of:
> echo '::panicinfo\n::cpuinfo -v\n::threadlist -v 10\n::msgbuf\n*panic_thread::findstack -v\n::stacks' | mdb 7
>
> I am a Solaris/OI/OmniOS debugging neophyte, but will happily run any commands recommended.
>
> Thanks
> Omen
Re: [OmniOS-discuss] ZFS panic when unlinking a file
Apologies, this email escaped without a subject line. I'm hoping this one, coupled with threading, will help ameliorate the problem.

--
"What is this talk of 'release'? Klingons do not make software 'releases'. Our software 'escapes,' leaving a bloody trail of designers and quality assurance people in its wake."
[OmniOS-discuss] OmniOS Bloody update
Hello again!

With one week left until Surge & illumos Day, I wanted to make sure an update to bloody happened. With recent illumos bugs (e.g. 6214) taking high priority, I wanted to make sure their fixes ended up in the bloody release (as well as appropriate ones making it back into r151014).

New with this update out of omnios-build (now at master revision f01dd5c):

- Mozilla NSS up to version 3.20. (Includes ca-bundle update.)
- OpenSSH is now at version 7.1p1.
- The kayak images now include previously missing bits.

And highlights of illumos-omnios progress (now at master revision 23b18eb, meaning uname -v == omnios-23b18eb) are:

- A fix to illumos 6214, which will prevent the existence of l2arc/cache devices from potentially corrupting data.
- An additional pair of ZFS fixes from Delphix not yet upstreamed in illumos-gate.
- Updated ses connector lists.
- An htable_reap() fix from Joyent, which may prevent memory hogging and reap-related slowdowns.
- New kstats for the NFS server (see illumos 6090).

There will be one, possibly two, more bloody updates before I freeze for r151016. '016 will be a bit late this time (late October/early November), and one more bloody update will contain a potentially numerous upgrade of various omnios-build packages.

I have updated the .iso, .usb-dd, and the kayak images as well.

Happy updating!
Dan
[OmniOS-discuss] OmniOS r151014 update - needs reboot!
I have updated release media as well as the IPS server. omnios-build branch r151014 is now on revision 437bddb. illumos-omnios branch r151014 is now on revision c65, which means "uname -v" now shows omnios-c65.

Most importantly, this update fixes illumos 6214 for OmniOS. You should be able to restore your L2ARC devices using the method I mentioned in my last e-mail:

    zpool add <pool> cache <device>

PLEASE MAKE SURE YOU SPECIFY "cache" when adding the vdev, or else you will append <device>'s space to your pool.

Special thanks to illumos community member Arne "sensille" Jansen for both finding 6214 and fixing it.

Additional fixes in this update include:

- An additional set of ZFS fixes from Delphix.
- Mozilla NSS up to version 3.20 (including ca-bundle).
- Kernel htable fixes, which should improve kernel memory behavior in the face of reaping (illumos 6202).
- Fault management topology changes for ses brought up to date with illumos-gate.
- Small bug with zpool import fixed (illumos 1778).

Because of the changes to zfs, this update requires that you reboot your system.

Thank you!
Dan
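To guard against exactly the mistake the announcement warns about (forgetting the "cache" keyword and irreversibly growing the pool), the re-add can be wrapped in a tiny check. This is a sketch of my own, not an official tool; pool and device names are placeholders:

```shell
# add_cache_vdev re-adds an L2ARC device with the literal word "cache"
# hard-coded, so the device can never be appended as a top-level data
# vdev by mistake.
add_cache_vdev() {
    pool="$1"; dev="$2"
    if [ -z "$pool" ] || [ -z "$dev" ]; then
        echo "usage: add_cache_vdev <pool> <device>" >&2
        return 2
    fi
    zpool add "$pool" cache "$dev"
}

# e.g.: add_cache_vdev data c2t2d0
```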
Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec
I know it is multithreaded; just in my experience (at least historically) it wasn't completely multithreaded, and you could run into bottlenecks with spare CPU cores sitting idle.

-----Original Message-----
From: Saso Kiselkov [mailto:skiselkov...@gmail.com]
Sent: Monday, September 14, 2015 12:46 PM
To: Matthew Lagoe; 'Doug Hughes'
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

On 9/14/15 9:40 PM, Matthew Lagoe wrote:
> Also I believe the compression is not threaded as well as it could be so you may be limited by the single core performance of your machine.

It is multi-threaded.

Cheers,
--
Saso
Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec
On 9/14/15 9:40 PM, Matthew Lagoe wrote:
> Also I believe the compression is not threaded as well as it could be so you may be limited by the single core performance of your machine.

It is multi-threaded.

Cheers,
--
Saso
Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec
Also I believe the compression is not threaded as well as it could be, so you may be limited by the single-core performance of your machine.

-----Original Message-----
From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf Of Saso Kiselkov
Sent: Monday, September 14, 2015 12:34 PM
To: Doug Hughes
Cc: omnios-discuss@lists.omniti.com
Subject: Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec

On 9/14/15 9:18 PM, Doug Hughes wrote:
> That does seem to keep performance at much closer to parity. It still seems about 70-80% of peak vs what I was seeing before, but not that 100MB/sec bottleneck.

Well, that's the reality of compression. Even the compressibility check is not free, but it's a lot less of an impact with lz4 than with lzjb.

Cheers,
--
Saso
Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec
On 9/14/15 9:18 PM, Doug Hughes wrote:
> That does seem to keep performance at much closer to parity. It still seems about 70-80% of peak vs what I was seeing before, but not that 100MB/sec bottleneck.

Well, that's the reality of compression. Even the compressibility check is not free, but it's a lot less of an impact with lz4 than with lzjb.

Cheers,
--
Saso
Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec
That does seem to keep performance at much closer to parity. It still seems about 70-80% of peak vs what I was seeing before, but not that 100MB/sec bottleneck.

On Mon, Sep 14, 2015 at 3:08 PM, Saso Kiselkov wrote:

> On 9/14/15 9:05 PM, Doug Hughes wrote:
> > Probably something for Illumos, but you guys may have seen this or may like to know.
> >
> > I've got a 10g connected Xyratex box running OmniOS, and I noticed that no matter how many streams (1, 2, 3) I only get 100MB/sec write throughput and it just tops out. Even with 1 stream. This is with the default lzjb compression on (fast option).
> >
> > I turned off compression and have 2 streams running now and am getting about 250-600MB/sec in aggregate. Much better!
> >
> > The compress ratio was only 1.02x - 1.03x so it's no great loss on this data. I just thought the 100MB/sec speed limit was interesting.
>
> Try setting compression=lz4. It should perform much, much better than lzjb on incompressible data.
>
> --
> Saso
Re: [OmniOS-discuss] zfs compression limits write throughput to 100MB/sec
On 9/14/15 9:05 PM, Doug Hughes wrote:
> Probably something for Illumos, but you guys may have seen this or may like to know.
>
> I've got a 10g connected Xyratex box running OmniOS, and I noticed that no matter how many streams (1, 2, 3) I only get 100MB/sec write throughput and it just tops out. Even with 1 stream. This is with the default lzjb compression on (fast option).
>
> I turned off compression and have 2 streams running now and am getting about 250-600MB/sec in aggregate. Much better!
>
> The compress ratio was only 1.02x - 1.03x so it's no great loss on this data. I just thought the 100MB/sec speed limit was interesting.

Try setting compression=lz4. It should perform much, much better than lzjb on incompressible data.

--
Saso
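The lz4 suggestion can be checked empirically: switch the dataset's compression property and watch compressratio afterwards. A rough sketch (the dataset name and the 1.10x "worth it" threshold are my own assumptions, not from this thread):

```shell
# The ratio string comes from e.g.:
#   zfs get -H -o value compressratio tank/data
# worth_compressing decides whether a given compressratio justifies
# keeping compression on; the 1.10x cutoff is an arbitrary assumption.
worth_compressing() {
    ratio=${1%x}    # "1.02x" -> "1.02"
    awk -v r="$ratio" 'BEGIN { exit !(r >= 1.10) }'
}

# To switch the algorithm (affects newly written blocks only):
#   zfs set compression=lz4 tank/data
if worth_compressing "1.02x"; then
    echo "keep compression enabled"
else
    echo "compression buys little here"
fi
```

At the 1.02x-1.03x ratio reported in this thread, the helper would say compression buys little.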
[OmniOS-discuss] zfs compression limits write throughput to 100MB/sec
Probably something for Illumos, but you guys may have seen this or may like to know.

I've got a 10g connected Xyratex box running OmniOS, and I noticed that no matter how many streams (1, 2, 3) I only get 100MB/sec write throughput and it just tops out. Even with 1 stream. This is with the default lzjb compression on (fast option).

I turned off compression and have 2 streams running now and am getting about 250-600MB/sec in aggregate. Much better!

The compress ratio was only 1.02x - 1.03x, so it's no great loss on this data. I just thought the 100MB/sec speed limit was interesting.
Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 - steps to check and repair...
What is the recommended action on OmniOS 151014 about the L2ARC problem?

1. what is the recommended way to detect possible problems
   a. run scrub? seems useless
   b. run zdb pool and check for what

You said: Look for assertion failures, or other non-0 exits. Is this the key for a corrupt pool?

2. when using an L2ARC and there is no obvious error detected by scrub or zdb
   a. trash the pool and restore from backup via rsync, with possible file corruptions, but the ZFS structure is 100% ok then
   b. keep the pool and hope that there is no metadata corruption?
   c. some action to verify that at least the pool is ok:

3. when using an L2ARC and there is an error detected by scrub or zdb
   a. trash the pool and restore from backup, with possible file corruption, but the pool is 100% ok
   b. keep the pool and hope that there is no metadata corruption
   c. some action to verify that at least the pool is ok:

Is there an alert page at the OmniOS wiki about this?

Gea

On 10.09.2015 at 13:53, Dan McDonald wrote:
> If you are using a zpool with r151014 and you have an L2ARC ("cache") vdev, I recommend at this time disabling it. You may disable it by uttering:
>
>     zpool remove <pool> <device>
>
> For example:
>
>     zpool remove data c2t2d0
>
> The bug in question has a good analysis here: https://www.illumos.org/issues/6214
>
> This bug can lead to problems ranging from false-positives on zpool scrub all the way up to actual pool corruption. We will be updating the package repo AND the install media once 6214 is upstreamed to illumos-gate, and pulled back into the r151014 branch of illumos-omnios. The fix is undergoing some tests from ZFS experts right now to verify its correctness.
>
> So please disable your L2ARC/cache devices for maximum data safety. You can add them back after we update r151014 by uttering:
>
>     zpool add <pool> cache <device>
>
> PLEASE NOTE the "cache" indicator when you add back. If you omit this, the vdev is ADDED to your pool, an operation one can't reverse. For example:
>
>     zpool add data cache c2t2d0
>
> Thanks,
> Dan
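For anyone following Dan's advice, the device argument for "zpool remove" can be read straight out of the pool's status output. A small sketch (the output-format assumptions are based on stock 'zpool status'; the pool name is a placeholder):

```shell
# list_cache_devs prints the device names listed under the "cache"
# section of 'zpool status', i.e. exactly what to hand to 'zpool remove'.
list_cache_devs() {
    zpool status "$1" | awk '
        $1 == "cache"              { incache = 1; next }
        incache && NF == 0         { incache = 0 }       # blank line ends section
        incache && $1 ~ /^c[0-9]/  { print $1 }          # e.g. c2t2d0
    '
}

# e.g.: zpool remove data $(list_cache_devs data)
```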