Re: [OmniOS-discuss] Multiple faulty SSD's ?

2016-07-26 Thread Shaun McGuane
Thanks everyone for your advice on this.

What a pain in the ass…..

Shaun


From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On Behalf 
Of Schweiss, Chip
Sent: Wednesday, 27 July 2016 5:55 AM
To: Piotr Jasiukajtis 
Cc: omnios-discuss 
Subject: Re: [OmniOS-discuss] Multiple faulty SSD's ?

I don't have a lot of experience with the 850 Pro, but a lot with the 840 Pro 
under OmniOS.

With the 4K block size set in sd.conf, and with the drives sliced to use only 
80% of their capacity, a pool of 72 of them has been under near-constant heavy 
read/write workload for over 3 years without a single checksum error.

-Chip


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Multiple faulty SSD's ?

2016-07-26 Thread Schweiss, Chip
I don't have a lot of experience with the 850 Pro, but a lot with the 840
Pro under OmniOS.

With the 4K block size set in sd.conf, and with the drives sliced to use only
80% of their capacity, a pool of 72 of them has been under near-constant heavy
read/write workload for over 3 years without a single checksum error.
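
As a rough illustration of the 80% slicing described above, the arithmetic can be sketched in Python. The function name and the example capacity are hypothetical and not taken from this thread; the sketch only rounds the slice size down to a whole number of 4K blocks.

```python
BLOCK_SIZE = 4096  # 4K block size, matching the sd.conf setting mentioned above


def slice_bytes(capacity_bytes: int, percent: int = 80, block: int = BLOCK_SIZE) -> int:
    """Return the largest multiple of `block` that is at most `percent`% of the drive."""
    if capacity_bytes <= 0 or not 0 < percent <= 100:
        raise ValueError("capacity must be positive and percent must be in 1..100")
    usable = capacity_bytes * percent // 100  # integer math avoids float rounding
    return usable // block * block            # round down to a block boundary


if __name__ == "__main__":
    capacity = 1_000_000_000_000  # hypothetical drive capacity in bytes
    size = slice_bytes(capacity)
    print(f"slice: {size} bytes ({size // BLOCK_SIZE} blocks of {BLOCK_SIZE} bytes)")
```

The remaining 20% is simply left unallocated, so the drive's controller can use it as spare area.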

-Chip

On Tue, Jul 26, 2016 at 1:30 PM, Piotr Jasiukajtis wrote:

> I don’t know the root cause, but it’s better to have a workaround than a
> corrupted pool.
>
> --
> Piotr Jasiukajtis


Re: [OmniOS-discuss] Multiple faulty SSD's ?

2016-07-26 Thread Piotr Jasiukajtis
I don’t know the root cause, but it’s better to have a workaround than a 
corrupted pool.

--
Piotr Jasiukajtis

> On 26 Jul 2016, at 20:06, Dan McDonald wrote:
> 
> I wonder if those sd.conf changes should be upstreamed or not?
> 
> Dan
> 
> Sent from my iPhone (typos, autocorrect, and all)


Re: [OmniOS-discuss] Multiple faulty SSD's ?

2016-07-26 Thread Dan McDonald
I wonder if those sd.conf changes should be upstreamed or not?

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Jul 26, 2016, at 1:28 PM, Piotr Jasiukajtis wrote:
> 
> You may want to force the driver to use 4K instead of 512-byte blocks for
> those drives and create a new pool:
> 
> https://github.com/joyent/smartos-live/commit/dd25937d2f9725def16f5e8dbb16a8bcbc2213d5
> 
> --
> Piotr Jasiukajtis


Re: [OmniOS-discuss] Multiple faulty SSD's ?

2016-07-26 Thread Piotr Jasiukajtis
You may want to force the driver to use 4K instead of 512-byte blocks for
those drives and create a new pool:

https://github.com/joyent/smartos-live/commit/dd25937d2f9725def16f5e8dbb16a8bcbc2213d5
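
For orientation, here is a hedged Python sketch that prints the kind of sd.conf entry this approach relies on. It assumes the illumos sd-config-list syntax with the physical-block-size property and a vendor field padded to 8 characters. The inquiry strings "ATA" and "Samsung SSD 850" are guesses, not taken from this thread, and should be checked against what the drives actually report and against the commit linked above. The ashift helper shows why a new pool is needed: 4096 bytes corresponds to ashift 12, and ZFS fixes that value when a vdev is created.

```python
def sd_config_entry(vendor: str, product: str, block_size: int = 4096) -> str:
    """Build one sd-config-list line for /kernel/drv/sd.conf (assumed syntax)."""
    if len(vendor) > 8:
        raise ValueError("SCSI vendor ID is at most 8 characters")
    ident = vendor.ljust(8) + product  # vendor padded to 8 chars, then product ID
    return f'sd-config-list = "{ident}", "physical-block-size:{block_size}";'


def ashift(block_size: int) -> int:
    """Return log2(block_size); block_size must be a power of two."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of two")
    return block_size.bit_length() - 1


if __name__ == "__main__":
    # "ATA" and "Samsung SSD 850" are assumed inquiry strings, not from this thread.
    print(sd_config_entry("ATA", "Samsung SSD 850"))
    print("ashift for 512 bytes:", ashift(512), "/ for 4096 bytes:", ashift(4096))
```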

--
Piotr Jasiukajtis

> On 26 Jul 2016, at 02:24, Shaun McGuane wrote:
> 
> Hi List,
>
> I want to report very strange SSD behaviour on a new pool I set up.
>
> The hardware is an HP DL180 G6 server with an LSI 9207-8i card
> and 8x 1TB Samsung SSD Pro drives, running omnios-10b9c79.
>
> All the drives are brand spanking new and set up in a raidz2 array.
>
> Within 2 months the below has happened, and there has been very
> little use on this array.
>  
>   pool: SSD-TANK
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent errors.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the device
>         repaired.
>   scan: scrub repaired 23K in 1h12m with 0 errors on Mon Jul 25 20:13:04 2016
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         SSD-TANK                   DEGRADED 16735
>           raidz2-0                 DEGRADED 472   113
>             c5t500253884014D0D3d0  ONLINE       0     0     2
>             c5t50025388401F767Ad0  DEGRADED     0     0    19  too many errors
>             c5t50025388401F767Bd0  FAULTED      0     0     0  too many errors
>             c5t50025388401F767Dd0  ONLINE       0     0     0
>             c5t50025388401F767Fd0  ONLINE       0     0     1
>             c5t50025388401F7679d0  ONLINE       0     0     2
>             c5t50025388401F7680d0  REMOVED      0     0     0
>             c5t50025388401F7682d0  ONLINE       0     0     1
>  
> Can anyone suggest why I would have this problem, where I am seeing CKSUM
> errors on most disks and, while only one has faulted, others have been
> degraded or removed?
>  
> Thanks
> Shaun
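
As a loose aid for reading listings like the one quoted above, the following Python sketch picks out devices that are not ONLINE or that report any errors. It is only an illustration: the function names are hypothetical, the sample rows are a subset of the listing with the counter columns spaced out by hand, and rows whose counters are not plain integers (for example abbreviated values such as "1.2K") are skipped rather than interpreted.

```python
from typing import NamedTuple


class DeviceRow(NamedTuple):
    name: str
    state: str
    read: int
    write: int
    cksum: int


def parse_device_rows(text: str) -> list[DeviceRow]:
    """Parse 'NAME STATE READ WRITE CKSUM' rows; skip lines that do not fit."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not all(f.isdigit() for f in fields[2:5]):
            continue  # header, blank line, or a row with unreadable counters
        rows.append(DeviceRow(fields[0], fields[1], *map(int, fields[2:5])))
    return rows


def needs_attention(rows: list[DeviceRow]) -> list[DeviceRow]:
    """Return devices that are not ONLINE or that report any error count."""
    return [r for r in rows if r.state != "ONLINE" or r.read or r.write or r.cksum]


SAMPLE = """\
NAME                   STATE    READ WRITE CKSUM
c5t500253884014D0D3d0  ONLINE      0     0     2
c5t50025388401F767Ad0  DEGRADED    0     0    19  too many errors
c5t50025388401F767Dd0  ONLINE      0     0     0
c5t50025388401F7680d0  REMOVED     0     0     0
"""

if __name__ == "__main__":
    for row in needs_attention(parse_device_rows(SAMPLE)):
        print(f"{row.name}: {row.state}, cksum={row.cksum}")
```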


Re: [OmniOS-discuss] Multiple faulty SSD's ?

2016-07-26 Thread Richard Skelton
Hi Shaun,
I have seen something very similar on an Oracle X4-2.
I added two 256GB Samsung 850 PRO drives and created a mirrored zpool.
The system zpool was a mirror of the two Seagate 600GB SAS disks that came
with the system.
The system was running Solaris 10 with the latest patches.
After a few hours I saw checksum errors on both disks.
Oracle told me they did not support the Samsung disks, so I swapped them
for Intel DC S3500 drives, which have been working fine for months.
The 850 PRO drives worked fine on an X4140, but that has a different RAID
controller.
So I guess you need to be careful which SSDs you use with certain
controllers :-(
