Re: [OmniOS-discuss] (no subject)
> From: Stephan Budach
> Sent: Monday, September 14, 2015 10:00 PM
>
> As George Wilson wrote on the ZFS mailing list: "Unfortunately, if the
> corruption impacts a data block then we won't be able to detect it."
> So, I am afraid apart from metadata and indirect blocks corruption,
> there's no way to even detect a corruption inside a data block, as the
> checksum fits.

Yes, that's true, assuming you have no external source of verification. However, Arne said he didn't think this bug would result in data corruption, only metadata corruption. I was mostly worried about pool corruption that would cause panics or failure to import, which data-level corruption would not cause. Most of the data on the pool I was worried about is media; a bad data block here or there wouldn't be too tragic.

> from that pool, e.g. from a backup prior to 6214 having been introduced,
> but depending on the sheer amount of data or the type of it, that might
> not be even possible.

Yup. This was a sucky bug :(.

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
Re: [OmniOS-discuss] (no subject)
> From: Andy Fiddaman
> Sent: Tuesday, September 15, 2015 1:41 AM
>
> zdb_blkptr_cb: Got error 50 reading <3077, 212, 0, 52>
> DVA[0]=<0:14d528f8200:1ce00> [L0 ZFS plain file] fletcher4 lz4 LE
> contiguous unique single size=2L/13200P birth=3708038L/3708038P fill=1
> cksum=1717c7d38f62:374184e099ada9b:a86cf60db2f68605:2be4a1817f9f4b1d
> -- skipping
>
> Is this an indicator of corruption in the pool?
> It's going to be a right royal pain to rebuild them if I need to!

That certainly doesn't look good :(. I'd recommend posting this output on the ZFS mailing list and asking for feedback.
Re: [OmniOS-discuss] (no subject)
On Mon, 14 Sep 2015, Paul B. Henson wrote:

; > From: Omen Wild
; > Sent: Monday, September 14, 2015 3:10 PM
; >
; > Mostly we are wondering how to clear the corruption off disk and worried
; > what else might be corrupt since the scrub turns up no issues.
;
; While looking into possible corruption from the recent L2 cache bug it seems
; that running 'zdb -bbccsv' is a good test for finding corruption as it looks
; at all of the blocks and verifies all of the checksums.

zpool scrub reports no problems, but I get lots of messages like this when I run zdb -bbccsv:

zdb_blkptr_cb: Got error 50 reading <3077, 212, 0, 52>
DVA[0]=<0:14d528f8200:1ce00> [L0 ZFS plain file] fletcher4 lz4 LE
contiguous unique single size=2L/13200P birth=3708038L/3708038P fill=1
cksum=1717c7d38f62:374184e099ada9b:a86cf60db2f68605:2be4a1817f9f4b1d
-- skipping

Is this an indicator of corruption in the pool?
It's going to be a right royal pain to rebuild them if I need to!

Thanks,

Andy

--
Citrus IT Limited | +44 (0)870 199 8000 | enquir...@citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123
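When zdb prints "lots of messages like this", it can help to see how many distinct objects are affected before asking for feedback. The following is a minimal sketch, not part of the original thread, that tallies such lines per object. It assumes the four numbers in angle brackets are objset, object, level and block ID (the thread does not say so), and the function name `summarize_zdb_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches the "Got error N reading <a, b, c, d>" lines quoted in this thread.
# ASSUMPTION: the four fields are objset, object, level, blkid; the thread
# itself does not label them.
ERROR_LINE = re.compile(
    r"zdb_blkptr_cb: Got error (?P<errno>\d+) reading "
    r"<(?P<objset>\d+), (?P<object>\d+), (?P<level>-?\d+), "
    r"(?P<blkid>[0-9a-fA-F]+)>"
)


def summarize_zdb_errors(text):
    """Count reported read errors per (objset, object) pair."""
    counts = Counter()
    for match in ERROR_LINE.finditer(text):
        key = (int(match["objset"]), int(match["object"]))
        counts[key] += 1
    return counts


# The one message quoted in the thread, used here as sample input.
sample = (
    "zdb_blkptr_cb: Got error 50 reading <3077, 212, 0, 52> "
    "DVA[0]=<0:14d528f8200:1ce00> [L0 ZFS plain file] fletcher4 lz4 LE "
    "contiguous unique single size=2L/13200P birth=3708038L/3708038P fill=1 "
    "cksum=1717c7d38f62:374184e099ada9b:a86cf60db2f68605:2be4a1817f9f4b1d "
    "-- skipping\n"
)

for (objset, obj), n in sorted(summarize_zdb_errors(sample).items()):
    print(f"objset {objset} object {obj}: {n} block(s) reported")
```

In practice the input would be the saved output of the zdb run rather than the inline sample; a short per-object summary is easier to post to a list than the raw log.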
Re: [OmniOS-discuss] (no subject)
On 15.09.15 at 03:46, Paul B. Henson wrote:
>> From: Omen Wild
>> Sent: Monday, September 14, 2015 3:10 PM
>>
>> Mostly we are wondering how to clear the corruption off disk and worried
>> what else might be corrupt since the scrub turns up no issues.
>
> While looking into possible corruption from the recent L2 cache bug it seems
> that running 'zdb -bbccsv' is a good test for finding corruption as it looks
> at all of the blocks and verifies all of the checksums.

As George Wilson wrote on the ZFS mailing list: "Unfortunately, if the corruption impacts a data block then we won't be able to detect it." So, I am afraid that apart from corruption of metadata and indirect blocks, there's no way to even detect corruption inside a data block, as the checksum fits.

I think the best one can do is to run a scrub and act on the results of that. If scrub reports no errors, one can either live with that or look for ways to compare the data against known-good data from that pool, e.g. from a backup taken before 6214 was introduced. Depending on the sheer amount of data or the type of it, that might not even be possible.

Cheers,
Stephan
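The comparison against known-good data that Stephan describes could look roughly like the sketch below, which is not from the thread. It assumes a backup tree is mounted alongside the live data; the function names and the example paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(live_root, backup_root):
    """Yield relative paths whose contents differ between the two trees.

    Only files present in both trees are compared. Files that were
    legitimately modified after the backup was taken will show up too,
    so the result is a list of candidates, not proof of corruption.
    """
    live_root, backup_root = Path(live_root), Path(backup_root)
    for backup_file in sorted(backup_root.rglob("*")):
        if not backup_file.is_file():
            continue
        rel = backup_file.relative_to(backup_root)
        live_file = live_root / rel
        if live_file.is_file() and sha256_of(live_file) != sha256_of(backup_file):
            yield rel
```

A call such as `list(find_mismatches("/tank/media", "/backup/media"))` (both paths made up for illustration) would list the files worth a closer look. For large media pools this reads every byte twice, which is exactly the cost Stephan warns may make the approach impractical.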
Re: [OmniOS-discuss] (no subject)
> From: Omen Wild
> Sent: Monday, September 14, 2015 3:10 PM
>
> Mostly we are wondering how to clear the corruption off disk and worried
> what else might be corrupt since the scrub turns up no issues.

While looking into possible corruption from the recent L2 cache bug it seems that running 'zdb -bbccsv' is a good test for finding corruption as it looks at all of the blocks and verifies all of the checksums.
Re: [OmniOS-discuss] (no subject)
One thing you can try is to overwrite the file and then remove it. Someone else reported a similar bug, and it turned out to be corrupt metadata or extended attributes.

Do you have a URL for the panic?

Also, please try today's update.

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Sep 14, 2015, at 6:09 PM, Omen Wild wrote:
>
> [ I originally posted this to the Illumos ZFS list but got no responses. ]
>
> We have an up to date OmniOS system that panics every time we try to
> unlink a specific file. We have a kernel pages-only crashdump and can
> reproduce easily. I can make the panic files available to an interested
> party.
>
> A zpool scrub turned up no errors or repairs.
>
> Mostly we are wondering how to clear the corruption off disk and worried
> what else might be corrupt since the scrub turns up no issues.
>
> Details below.
>
> When we first encountered the issue we were running with a version from
> mid-July: zfs@0.5.11,5.11-0.151014:20150417T182430Z .
>
> After the first couple panics we upgraded to the newest (as of a couple
> days ago, zfs@0.5.11,5.11-0.151014:20150818T161042Z) which still panics.
>
> # uname -a
> SunOS zaphod 5.11 omnios-d08e0e5 i86pc i386 i86pc
>
> The error looks like this:
> BAD TRAP: type=e (#pf Page fault) rp=ff002ed54b00 addr=e8 occurred in
> module "zfs" due to a NULL pointer dereference
>
> The panic stack looks like this in every case:
> param_preset
> die+0xdf
> trap+0xdb3
> 0xfb8001d6
> zfs_remove+0x395
> fop_remove+0x5b
> vn_removeat+0x382
> unlinkat+0x59
> _sys_sysenter_post_swapgs+0x149
>
> It is triggered by trying to rm a specific file. ls'ing the file gives
> the error "Operation not applicable", ls'ing the directory shows ? in
> place of the data:
>
> ?? ? ?? ?? filename.html
>
> I have attached the output of:
> echo '::panicinfo\n::cpuinfo -v\n::threadlist -v
> 10\n::msgbuf\n*panic_thread::findstack -v\n::stacks' | mdb 7
>
> I am a Solaris/OI/OmniOS debugging neophyte, but will happily run any
> commands recommended.
>
> Thanks
> Omen
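Dan's suggestion to overwrite the file and then remove it can be sketched as follows. This is an illustration only, not something posted in the thread: the helper name `overwrite_then_remove` and the filler contents are hypothetical, and nothing here is known to avoid the panic. On the affected system even opening the file may fail or trigger the same crash, so it would only make sense to try once the crash dump has been saved.

```python
import os


def overwrite_then_remove(path, filler=b"\0" * 4096):
    """Replace a file's contents with filler bytes, sync, then unlink it."""
    # Opening with "wb" truncates the file before the new bytes are written.
    with open(path, "wb") as fh:
        fh.write(filler)
        fh.flush()
        # Ask the OS to push the new contents to stable storage before
        # the unlink is attempted.
        os.fsync(fh.fileno())
    os.remove(path)
```

The sync before the unlink is a design choice in this sketch, so that the rewritten contents are on disk before the remove is attempted; whether that matters for this particular bug is not established anywhere in the thread.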