Re: [OmniOS-discuss] (no subject)

2015-09-15 Thread Paul B. Henson

> From: Stephan Budach
> Sent: Monday, September 14, 2015 10:00 PM
>
> As George Wilson wrote on the ZFS mailing list: "Unfortunately, if the
> corruption impacts a data block then we won't be able to detect it."
> So, I am afraid that, apart from metadata and indirect block corruption,
> there's no way to even detect corruption inside a data block, as the
> checksum fits.

Yes, that's true, assuming you have no external source of verification.
However, Arne said he didn't think this bug would result in data corruption,
only metadata corruption. I was mostly worried about pool corruption that
would cause panics or failure to import, which data level corruption would
not cause. Most of the data on the pool I was worried about is media, a bad
data block here or there wouldn't be too tragic.

> from that pool, e.g. from a backup prior to 6214 having been introduced,
> but depending on the sheer amount of data or the type of it, that might
> not be even possible.

Yup. This was a sucky bug :(.

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

Re: [OmniOS-discuss] (no subject)

2015-09-15 Thread Paul B. Henson

> From: Andy Fiddaman
> Sent: Tuesday, September 15, 2015 1:41 AM
>
> zdb_blkptr_cb: Got error 50 reading <3077, 212, 0, 52>
> DVA[0]=<0:14d528f8200:1ce00> [L0 ZFS plain file] fletcher4 lz4 LE
> contiguous unique single size=2L/13200P birth=3708038L/3708038P fill=1
> cksum=1717c7d38f62:374184e099ada9b:a86cf60db2f68605:2be4a1817f9f4b1d
> --
> skipping
> 
> Is this an indicator of corruption in the pool?
> It's going to be a right royal pain to rebuild them if I need to!

That certainly doesn't look good :(. I'd recommend posting this output on
the zfs mailing list and asking for feedback.
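
For anyone collecting this kind of output before posting it, here is a minimal
Python sketch that counts the "Got error" lines and lists the objects they
refer to. The function name is made up for illustration, and reading the four
bracketed values as <objset, object, level, blkid> (with blkid kept as the raw
string zdb printed) is an assumption about the zdb output format, not
something confirmed in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   zdb_blkptr_cb: Got error 50 reading <3077, 212, 0, 52>
# ASSUMPTION: the four values are <objset, object, level, blkid>.
ERROR_LINE = re.compile(
    r"zdb_blkptr_cb: Got error (\d+) reading "
    r"<(\d+), (\d+), (-?\d+), ([0-9a-fA-F]+)>"
)


def summarize_zdb_errors(text):
    """Return (count per error code, sorted list of (objset, object) pairs)."""
    by_code = Counter()
    objects = set()
    for match in ERROR_LINE.finditer(text):
        code, objset, obj, _level, _blkid = match.groups()
        by_code[int(code)] += 1
        objects.add((int(objset), int(obj)))
    return by_code, sorted(objects)
```

Feeding it the saved output of a zdb run, for example
summarize_zdb_errors(open("zdb.log").read()) with a hypothetical log file
name, gives a short summary that is easier to post than the full log. It also
works on quoted mail text, since the pattern is searched for anywhere in a line.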



Re: [OmniOS-discuss] (no subject)

2015-09-15 Thread Andy Fiddaman


On Mon, 14 Sep 2015, Paul B. Henson wrote:

; > From: Omen Wild
; > Sent: Monday, September 14, 2015 3:10 PM
; >
; > Mostly we are wondering how to clear the corruption off disk and worried
; > what else might be corrupt since the scrub turns up no issues.
;
; While looking into possible corruption from the recent L2 cache bug it seems
; that running 'zdb -bbccsv' is a good test for finding corruption as it looks
; at all of the blocks and verifies all of the checksums.

zpool scrub is fine, but I get lots of messages like this when I run
'zdb -bbccsv':

zdb_blkptr_cb: Got error 50 reading <3077, 212, 0, 52>
DVA[0]=<0:14d528f8200:1ce00> [L0 ZFS plain file] fletcher4 lz4 LE
contiguous unique single size=2L/13200P birth=3708038L/3708038P fill=1
cksum=1717c7d38f62:374184e099ada9b:a86cf60db2f68605:2be4a1817f9f4b1d --
skipping

Is this an indicator of corruption in the pool?
It's going to be a right royal pain to rebuild them if I need to!

Thanks,

Andy
-- 
Citrus IT Limited | +44 (0)870 199 8000 | enquir...@citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123


Re: [OmniOS-discuss] (no subject)

2015-09-14 Thread Stephan Budach


On 15.09.15 at 03:46, Paul B. Henson wrote:

> > From: Omen Wild
> > Sent: Monday, September 14, 2015 3:10 PM
> >
> > Mostly we are wondering how to clear the corruption off disk and worried
> > what else might be corrupt since the scrub turns up no issues.
>
> While looking into possible corruption from the recent L2 cache bug it seems
> that running 'zdb -bbccsv' is a good test for finding corruption as it looks
> at all of the blocks and verifies all of the checksums.

As George Wilson wrote on the ZFS mailing list: "Unfortunately, if the
corruption impacts a data block then we won't be able to detect it."
So, I am afraid that, apart from metadata and indirect block corruption,
there's no way to even detect corruption inside a data block, as the
checksum fits.


I think the best one can do is to run a scrub and act on its results. If
the scrub reports no errors, one can either live with that or think about
ways to check the data against known-good copies of the data from that
pool, e.g. from a backup taken before 6214 was introduced. Depending on
the sheer amount of data or the type of it, though, that might not even
be possible.
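
To illustrate the comparison against a backup, here is a rough Python sketch
that hashes every file in a backup tree and reports the files whose contents
differ on the pool. The function names and the two directory arguments are
hypothetical, and SHA-256 is just one reasonable choice of hash. Files that
were legitimately changed after the backup was taken will show up as well, so
the result is a list of candidates to inspect, not proof of corruption.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(pool_root, backup_root):
    """Return relative paths whose contents differ between the two trees.

    Only files present in both trees are compared; files missing from
    the pool are skipped.
    """
    pool_root, backup_root = Path(pool_root), Path(backup_root)
    mismatches = []
    for backup_file in sorted(backup_root.rglob("*")):
        if not backup_file.is_file():
            continue
        rel = backup_file.relative_to(backup_root)
        pool_file = pool_root / rel
        if pool_file.is_file() and file_digest(pool_file) != file_digest(backup_file):
            mismatches.append(str(rel))
    return mismatches
```

On a large media pool this reads every file once from each side, so it is
only practical when the backup is reachable and the data volume is manageable.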


Cheers,
Stephan

Re: [OmniOS-discuss] (no subject)

2015-09-14 Thread Paul B. Henson

> From: Omen Wild
> Sent: Monday, September 14, 2015 3:10 PM
> 
> Mostly we are wondering how to clear the corruption off disk and worried
> what else might be corrupt since the scrub turns up no issues.

While looking into possible corruption from the recent L2 cache bug it seems
that running 'zdb -bbccsv' is a good test for finding corruption as it looks
at all of the blocks and verifies all of the checksums.


Re: [OmniOS-discuss] (no subject)

2015-09-14 Thread Dan McDonald

One thing you can try is to overwrite the file and then remove it. Someone
else reported a similar bug, and it turned out to be corrupt metadata or
extended attributes.

Do you have a URL for the panic? Also, please try today's update.

Dan 

Sent from my iPhone (typos, autocorrect, and all)

> On Sep 14, 2015, at 6:09 PM, Omen Wild wrote:
> 
> [ I originally posted this to the Illumos ZFS list but got no responses. ]
> 
> We have an up to date OmniOS system that panics every time we try to
> unlink a specific file. We have a kernel pages-only crashdump and can
> reproduce easily. I can make the panic files available to an interested
> party. 
> 
> A zpool scrub turned up no errors or repairs. 
> 
> Mostly we are wondering how to clear the corruption off disk and worried
> what else might be corrupt since the scrub turns up no issues.
> 
> Details below.
> 
> When we first encountered the issue we were running with a version from
> mid-July: zfs@0.5.11,5.11-0.151014:20150417T182430Z .
> 
> After the first couple panics we upgraded to the newest (as of a couple
> days ago, zfs@0.5.11,5.11-0.151014:20150818T161042Z) which still panics.
> 
> # uname -a
> SunOS zaphod 5.11 omnios-d08e0e5 i86pc i386 i86pc
> 
> The error looks like this:
> BAD TRAP: type=e (#pf Page fault) rp=ff002ed54b00 addr=e8 occurred in module "zfs" due to a NULL pointer dereference
> 
> The panic stack looks like this in every case:
>   param_preset
>   die+0xdf
>   trap+0xdb3
>   0xfb8001d6
>   zfs_remove+0x395
>   fop_remove+0x5b
>   vn_removeat+0x382
>   unlinkat+0x59
>   _sys_sysenter_post_swapgs+0x149
> 
> It is triggered by trying to rm a specific file. ls'ing the file gives
> the error "Operation not applicable", ls'ing the directory shows ? in
> place of the data:
> 
> ??   ? ??  ?? filename.html
> 
> I have attached the output of:
> echo '::panicinfo\n::cpuinfo -v\n::threadlist -v 10\n::msgbuf\n*panic_thread::findstack -v\n::stacks' | mdb 7
> 
> I am a Solaris/OI/OmniOS debugging neophyte, but will happily run any
> commands recommended.
> 
> Thanks
>  Omen
> 
