Re: [OmniOS-discuss] Multiple faulty SSD's ?
Thanks everyone for your advice on this. What a pain in the ass...

Shaun

From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf Of Schweiss, Chip
Sent: Wednesday, 27 July 2016 5:55 AM
To: Piotr Jasiukajtis
Cc: omnios-discuss
Subject: Re: [OmniOS-discuss] Multiple faulty SSD's ?

I don't have a lot of experience with the 850 Pro, but a lot with the 840 Pro under OmniOS. With the 4K block size set in sd.conf, and the drives sliced to use only 80% of their capacity, a pool of 72 of them has been under a near-constant heavy read/write workload for over 3 years without a single checksum error.

-Chip

On Tue, Jul 26, 2016 at 1:30 PM, Piotr Jasiukajtis wrote:
> I don't know the root cause, but it's better to have a workaround than a corrupted pool.
>
> --
> Piotr Jasiukajtis

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
Re: [OmniOS-discuss] Multiple faulty SSD's ?
I don't have a lot of experience with the 850 Pro, but a lot with the 840 Pro under OmniOS. With the 4K block size set in sd.conf, and the drives sliced to use only 80% of their capacity, a pool of 72 of them has been under a near-constant heavy read/write workload for over 3 years without a single checksum error.

-Chip

On Tue, Jul 26, 2016 at 1:30 PM, Piotr Jasiukajtis wrote:
> I don't know the root cause, but it's better to have a workaround than a corrupted pool.
>
> --
> Piotr Jasiukajtis
Re: [OmniOS-discuss] Multiple faulty SSD's ?
I don't know the root cause, but it's better to have a workaround than a corrupted pool.

--
Piotr Jasiukajtis

> On 26 Jul 2016, at 20:06, Dan McDonald wrote:
>
> I wonder if those sd.conf changes should be upstreamed or not?
>
> Dan
>
> Sent from my iPhone (typos, autocorrect, and all)
Re: [OmniOS-discuss] Multiple faulty SSD's ?
I wonder if those sd.conf changes should be upstreamed or not?

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Jul 26, 2016, at 1:28 PM, Piotr Jasiukajtis wrote:
>
> You may want to force the driver to use 4k instead of 512b for those drives and create a new pool:
>
> https://github.com/joyent/smartos-live/commit/dd25937d2f9725def16f5e8dbb16a8bcbc2213d5
>
> --
> Piotr Jasiukajtis
Re: [OmniOS-discuss] Multiple faulty SSD's ?
You may want to force the driver to use 4k instead of 512b for those drives and create a new pool:

https://github.com/joyent/smartos-live/commit/dd25937d2f9725def16f5e8dbb16a8bcbc2213d5

--
Piotr Jasiukajtis

> On 26 Jul 2016, at 02:24, Shaun McGuane wrote:
>
> Hi List,
>
> I want to report very strange SSD behaviour on a new pool I set up.
>
> The hardware is an HP DL180 G6 server with an LSI 9207-8i card
> and 8x 1TB Samsung SSD Pro drives, running omnios-10b9c79.
>
> All the drives are brand new, set up in a raidz2 array.
>
> Within 2 months the below has happened, and there has been very
> little use on this array.
>
>   pool: SSD-TANK
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>   scan: scrub repaired 23K in 1h12m with 0 errors on Mon Jul 25 20:13:04 2016
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         SSD-TANK                   DEGRADED  16735
>           raidz2-0                 DEGRADED  472 113
>             c5t500253884014D0D3d0  ONLINE       0     0     2
>             c5t50025388401F767Ad0  DEGRADED     0     0    19  too many errors
>             c5t50025388401F767Bd0  FAULTED      0     0     0  too many errors
>             c5t50025388401F767Dd0  ONLINE       0     0     0
>             c5t50025388401F767Fd0  ONLINE       0     0     1
>             c5t50025388401F7679d0  ONLINE       0     0     2
>             c5t50025388401F7680d0  REMOVED      0     0     0
>             c5t50025388401F7682d0  ONLINE       0     0     1
>
> Can anyone suggest why I would have this problem, where I am seeing CKSUM
> errors on most disks, and while only one has faulted, others have been
> degraded or removed.
>
> Thanks
> Shaun
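For reference, the linked SmartOS commit implements the 4k override as an entry in /kernel/drv/sd.conf. A minimal sketch of that kind of entry, assuming the 850 Pro's inquiry strings (the vendor field is padded to 8 characters and the product field to 16; check the commit or your own drive's inquiry data before copying this):

```
# /kernel/drv/sd.conf -- tell the sd driver this device really has
# 4K physical sectors, even though it reports 512-byte logical sectors.
# The inquiry string below is an assumption; verify against your hardware.
sd-config-list =
    "ATA     Samsung SSD 850 ", "physical-block-size:4096";
```

The change only takes effect after the driver re-attaches (in practice, a reboot), and, as noted above, it only helps pools created afterwards: an existing pool keeps the ashift it was created with.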
Re: [OmniOS-discuss] Multiple faulty SSD's ?
Hi Shaun,

I have seen something very similar on an Oracle X4-2. I added two 256GB Samsung 850 PROs and created a mirrored zpool. The system zpool was a mirror of two 600GB SAS Seagates which came with the system. The system was running Solaris 10 with the latest patches.

After a few hours I saw checksum errors on both disks. Oracle told me they did not support the Samsung disks, so I swapped them for Intel DC S3500s, which have been working fine for months.

The 850 PRO drives worked fine in an X4140, but that has a different RAID controller. So I guess you need to be careful which SSD's you use with certain controllers :-(

Shaun McGuane wrote:
> Hi List,
>
> I want to report very strange SSD behaviour on a new pool I setup.
>
> The hardware is a HP DL180 G6 Server with the LSI 9207-8i Card
> and 8x 1TB Samsung SSD Pro drives. Running omnios-10b9c79
>
> Can anyone suggest why I would have this problem where I am seeing CKSUM
> errors on most disks and while only one has faulted others have been
> degraded or removed.
>
> Thanks
> Shaun