Re: [OmniOS-discuss] NVMe JBOF

2018-12-14 Thread Richard Elling



> On Dec 14, 2018, at 6:39 AM, Schweiss, Chip  wrote:
> 
> Has the NVMe support in Illumos come far enough along to properly support two 
> servers connected to NVMe JBOF storage such as the Supermicro SSG-136R-N32JBF?

I can't speak to the Supermicro, but I can talk in detail about 
https://www.vikingenterprisesolutions.com/products-2/nds-2244/

> 
> While I do not run HA because of too many issues, I still build everything 
> with two server nodes.  This makes updates and reboots possible by moving a 
> pool to the sister host and greatly minimizing downtime.   This is essential 
> when the NFS target is hosting 300+ vSphere VMs.

The NDS-2244 is a 24-slot U.2 NVMe chassis with programmable PCIe switches.
To the host, the devices look like locally attached NVMe and no software changes
are required. Multiple hosts can connect, up to the PCIe port limits. If you use
dual-port NVMe drives, then you can share the drives between any two hosts
concurrently. Programming the switches is accomplished out-of-band by an
HTTP-based interface that also monitors the enclosure.

In other words, if you want an NVMe equivalent to a dual-hosted SAS JBOD, the
NDS-2244 is very capable, and more configurable.
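
For example, as a rough sketch (assuming an illumos/OmniOS host; the device and
pool names below are hypothetical), the shared drives simply show up as ordinary
local disks, so the usual tools apply without special drivers:

  nvmeadm list                              # NVMe controllers/namespaces this host sees
  format < /dev/null                        # the same namespaces appear as plain disks
  zpool create tank mirror c1t1d0 c2t1d0    # hypothetical device names
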
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow NFS writes in 151026

2018-08-23 Thread Richard Elling
fwiw, nfssvrtop breaks down the NFS writes by sync, async, and commits,
explicitly for determining how the workload will impact the ZIL. For writing many
files, the (compound) operations can also include creates and sync-on-close,
which also impact performance.
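
For example (a sketch; exact options depend on which version of the script you
have), sampling every 10 seconds shows the Reads/SWrites/AWrites/Commits split
per client:

  ./nfssvrtop 10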

  -- richard



> On Aug 23, 2018, at 5:23 PM, Lee Damon  wrote:
> 
> (This doesn't appear to have gone out so I'm re-sending. Apologies if it's a 
> duplicate.)
> 
>> On 8/23/18 16:43 , Lee Damon wrote:
>> (I've just changed from digest to regular subscription as I see there
>> are messages relevant to this that I haven't received yet...)
>> Doug, I'm not familiar with the evil zfs tuning wiki mechanism. I'll
>> have to see if Google can help me find it.
>> As for the ZIL + L2ARC on the same SSD potentially being the problem,
>> clearly I can't say with 100% certainty that it is not a problem; however, I
>> have a second host (running 151022) with _exactly_ the same configuration
>> of hard drives + split-SSD, and NFS writes to that pool are fine.
>> hvfs2 is ~18 months old but the chrup0 pool is a few months old.
>> time cp -rp /misc/fs1test/004test /misc/hvfs2chru/omics1
>> real    3m11.431s
>> user    0m0.177s
>> sys     0m28.030s
>> time cp -rp /misc/fs1test/004test /misc/fs2test/omics1
>> real    21m13.412s
>> user    0m0.188s
>> sys     0m28.678s
>> nomad
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-16 Thread Richard Elling
fmadm allows you to load/unload modules.
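
For example (the module name below is only illustrative; pick the one implicated
on your system from the 'fmadm config' listing):

  fmadm config                                           # list loaded fmd modules
  fmadm unload disk-transport                            # unload a module by name
  fmadm load /usr/lib/fm/fmd/plugins/disk-transport.so   # load it again later
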
 -- richard

> On Mar 16, 2018, at 8:24 AM, Schweiss, Chip  wrote:
> 
> I need to get this JBOD working with OmniOS.  Is there a way to get FMD to 
> ignore this SES device until this issue is fixed?   
> 
> It is a RAID, Inc. 4U 96-Bay  
> http://www.raidinc.com/products/object-storage/ability-4u-96-bay 
>  
> 
> -Chip
> 
> On Fri, Mar 16, 2018 at 9:18 AM, Schweiss, Chip  > wrote:
> While this problem was originally ruled out as an artifact of running as a 
> virtual machine, I've now installed the same HBA and JBOD to a physical 
> server.   The problem is exactly the same.   
> 
> This is on OmniOS CE r151024r
> 
> -Chip
> 
> # /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
> fmd: [ loading modules ... ABORT: attempted zero-length allocation: Operation 
> not supported
> Abort (core dumped)
> 
> > $C
> 080462a8 libc.so.1`_lwp_kill+0x15(1, 6, 80462f8, fef42000, fef42000, 8046330)
> 080462c8 libc.so.1`raise+0x2b(6, 0, 80462e0, feec1b59, 0, 0)
> 08046318 libc.so.1`abort+0x10e(fead51f0, 0, fede2a40, 30, 524f4241, 61203a54)
> 08046748 libses.so.1`ses_panic(fdde6758, 8046774, 80467e8, fdb6b67a, 83eb0a8, 
> fdb6c398)
> 08046768 libses.so.1`ses_realloc(fdde6758, 0, 83f01b8, fdde6130, fddf7000, 
> fdb6658f)
> 08046788 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 8111627)
> 080467b8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f0190, 8)
> 08046838 ses2.so`elem_parse_aes_misc+0x91(81114f4, 83eb0a8, 8, fdb65d85)
> 08046888 ses2.so`elem_parse_aes+0xfc(82f1ac8, 83f0288, 80468f8, fdb80eae)
> 080468a8 ses2.so`ses2_fill_element_node+0x37(82f1ac8, 83f0288, 832e930, 4)
> 080468d8 ses2.so`ses2_node_parse+0x53(82f1ac8, 83f0288, e, fddf7000)
> 080468f8 libses.so.1`ses_fill_node+0x22(83f0288, 83f0348, fdde38ae, fdde394c)
> 08046918 libses.so.1`ses_fill_tree+0x21(83f0288, 82f1c88, 83e4cc8, fdde394c)
> 08046938 libses.so.1`ses_fill_tree+0x33(82f1d88, 82f1b88, 8046968, fdde394c)
> 08046958 libses.so.1`ses_fill_tree+0x33(82f1c88, 82ef758, 8046998, fdde394c)
> 08046978 libses.so.1`ses_fill_tree+0x33(82f1b88, 0, 18, fddf7000)
> 08046998 libses.so.1`ses_fill_snap+0x22(82f08a0, 80, 0, fdde56eb)
> 080469e8 libses.so.1`ses_snap_new+0x325(82f1b48, 0, 8046a18, fdde3006)
> 08046a18 libses.so.1`ses_open_scsi+0xc4(1, 82ef688, 8046aa0, fed71c1b, 
> 81053f8, fede4042)
> 08046a68 libses.so.1`ses_open+0x98(1, 8046aa0, 0, feecedd3, 43, fde1fc58)
> 08046eb8 ses.so`ses_process_dir+0x133(fde20159, 83cc348, 0, fed77e40)
> 08046ee8 ses.so`ses_enum+0xc1(81053f8, 82f21a0, 8386608, 0, 400, 0)
> 08046f38 libtopo.so.1`topo_mod_enumerate+0xc4(81053f8, 82f21a0, 82fb1c8, 
> 8386608, 0, 400)
> 08046f88 libtopo.so.1`enum_run+0xe9(8105a18, 83d6f78, a, fed7b1dd)
> 08046fd8 libtopo.so.1`topo_xml_range_process+0x13e(8105a18, 82eb5b0, 83d6f78, 
> 8047008)
> 08047028 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82eb5b0, 82f21a0)
> 08047088 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82ebd30, 82f21a0, 
> 8105a18, 83cbac0)
> 080470e8 libtopo.so.1`topo_xml_walk+0x1b2(8105a18, 82dfde0, 82de080, 82f21a0)
> 08047128 libtopo.so.1`dependent_create+0x127(8105a18, 82dfde0, 83d3aa0, 
> 82de080, 82f21a0, fed7b1f9)
> 08047168 libtopo.so.1`dependents_create+0x64(8105a18, 82dfde0, 83d3aa0, 
> 82de300, 82f21a0, 81eb0d8)
> 08047218 libtopo.so.1`pad_process+0x51e(8105a18, 83ce100, 82de300, 82f21a0, 
> 83ce128, 81d8638)
> 08047278 libtopo.so.1`topo_xml_range_process+0x31f(8105a18, 82de300, 83ce100, 
> 80472a8)
> 080472c8 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82de300, 81eb198)
> 08047328 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82d1ca0, 81eb198, 
> 8103f40, fed8c000)
> 08047358 libtopo.so.1`topo_xml_enum+0x67(8105a18, 82dfde0, 81eb198, feac2000)
> 08047488 libtopo.so.1`topo_file_load+0x139(8105a18, 81eb198, fe20c127, 
> fe20bda2, 0, 82d2000)
> 080474b8 libtopo.so.1`topo_mod_enummap+0x26(8105a18, 81eb198, fe20c127, 
> fe20bda2, 8105a18, fe20b11c)
> 08047508 x86pi.so`x86pi_enum_start+0xc5(8105a18, 8047530, 8047538, fe205580, 
> 8105a18, 8105a18)
> 08047558 x86pi.so`x86pi_enum+0x55(8105a18, 81eb198, 81d8a90, 0, 0, 0)
> 080475a8 libtopo.so.1`topo_mod_enumerate+0xc4(8105a18, 81eb198, 80ebf38, 
> 81d8a90, 0, 0)
> 080475f8 libtopo.so.1`enum_run+0xe9(8105b68, 81f1070, a, fed7b1dd)
> 08047648 libtopo.so.1`topo_xml_range_process+0x13e(8105b68, 81f94c8, 81f1070, 
> 8047678)
> 08047698 libtopo.so.1`tf_rdata_new+0x135(8105b68, 81f4240, 81f94c8, 81eb198)
> 080476f8 libtopo.so.1`topo_xml_walk+0x246(8105b68, 81f4240, 81f9608, 81eb198, 
> 8103f40, fed8c000)
> 08047728 libtopo.so.1`topo_xml_enum+0x67(8105b68, 81f4240, 81eb198, 81d8ad0)
> 08047858 libtopo.so.1`topo_file_load+0x139(8105b68, 81eb198, 80f3f38, 
> 81d8aa0, 0, 2c)
> 08047898 libtopo.so.1`topo_tree_enum+0x89(8103f40, 81f51c8, 80478c8, 
> fe70e6f8, 81f7f78, 8103f40)
> 080478b8 libtopo.so.1`topo_tree_enum_all+0x20(8103f40, 81f7f78, 80478f8, 
> fed71087)
> 080478f8

Re: [OmniOS-discuss] write amplification zvol

2017-10-02 Thread Richard Elling

> On Oct 2, 2017, at 12:51 AM, anthony omnios  wrote:
> 
> Hi, 
> 
> i have tried with a pool with ashift=9 and there is no write amplification, 
> problem is solved.

ashift=13 means that the minimum size (bytes) written will be 8k (1<<13). So when
you write a single byte, there will be at least 2 writes for the data (both sides
of the mirror) and 4 writes for metadata (both sides of the mirror * 2 copies of
metadata for redundancy). Each metadata block contains information on 128 or more
data blocks, so there is not a 1:1 correlation between data and metadata writes.

Reducing ashift doesn't change the number of blocks written for a single byte 
write. It can only
reduce or increase the size in bytes of the writes.
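
As a quick illustration of the sizes involved (plain shell arithmetic, nothing
ZFS-specific):

  $ echo $((1<<9)) $((1<<12)) $((1<<13))   # minimum write size for ashift 9, 12, 13
  512 4096 8192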

HTH
 -- richard

> 
> But I can't use ashift=9 with an SSD (850 EVO); I have read many articles
> indicating problems with ashift=9 on SSDs.
> 
> What can I do? Do I need to tweak a specific ZFS value?
> 
> Thanks,
> 
> Anthony
> 
> 
> 
> 2017-09-28 11:48 GMT+02:00 anthony omnios  <mailto:icoomn...@gmail.com>>:
> Thanks for you help Stephan.
> 
> i have tried differents LUN with default of 512b and 4096:
> 
> LU Name: 600144F04D4F060059A588910001
> Operational Status: Online
> Provider Name : sbd
> Alias : /dev/zvol/rdsk/filervm2/hdd-110002b
> View Entry Count  : 1
> Data File : /dev/zvol/rdsk/filervm2/hdd-110002b
> Meta File : not set
> Size  : 26843545600
> Block Size: 4096
> Management URL: not set
> Vendor ID : SUN 
> Product ID: COMSTAR 
> Serial Num: not set
> Write Protect : Disabled
> Writeback Cache   : Disabled
> Access State  : Active
> 
> Problem is the same.
> 
> Cheers,
> 
> Anthony
> 
> 2017-09-28 10:33 GMT+02:00 Stephan Budach  <mailto:stephan.bud...@jvm.de>>:
> - Ursprüngliche Mail -
> 
> > Von: "anthony omnios" mailto:icoomn...@gmail.com>>
> > An: "Richard Elling"  > <mailto:richard.ell...@richardelling.com>>
> > CC: omnios-discuss@lists.omniti.com <mailto:omnios-discuss@lists.omniti.com>
> > Gesendet: Donnerstag, 28. September 2017 09:56:42
> > Betreff: Re: [OmniOS-discuss] write amplification zvol
> 
> > Thanks Richard for your help.
> 
> > My problem is that I have network iSCSI traffic of 2 MB/s, so every 5
> > seconds I need to write 10 MB of network traffic to disk, but on
> > pool filervm2 I am writing much more than that, approximately 60 MB
> > every 5 seconds. Each SSD of filervm2 is writing 15 MB every 5
> > seconds. When I check with smartmontools, every SSD is writing
> > approximately 250 GB of data each day.
> 
> > How can I reduce the amount of data written to each SSD? I have tried to
> > reduce the block size of the zvol but it changed nothing.
> 
> > Anthony
> 
> > 2017-09-28 1:29 GMT+02:00 Richard Elling <
> > richard.ell...@richardelling.com <mailto:richard.ell...@richardelling.com> 
> > > :
> 
> > > Comment below...
> >
> 
> > > > On Sep 27, 2017, at 12:57 AM, anthony omnios <
> > > > icoomn...@gmail.com <mailto:icoomn...@gmail.com>
> > > > > wrote:
> >
> > > >
> >
> > > > Hi,
> >
> > > >
> >
> > > > i have a problem, i used many ISCSI zvol (for each vm), network
> > > > traffic is 2MB/s between kvm host and filer but i write on disks
> > > > many more than that. I used a pool with separated mirror zil
> > > > (intel s3710) and 8 ssd samsung 850 evo 1To
> >
> > > >
> >
> > > > zpool status
> >
> > > > pool: filervm2
> >
> > > > state: ONLINE
> >
> > > > scan: resilvered 406G in 0h22m with 0 errors on Wed Sep 20
> > > > 15:45:48
> > > > 2017
> >
> > > > config:
> >
> > > >
> >
> > > > NAME STATE READ WRITE CKSUM
> >
> > > > filervm2 ONLINE 0 0 0
> >
> > > > mirror-0 ONLINE 0 0 0
> >
> > > > c7t5002538D41657AAFd0 ONLINE 0 0 0
> >
> > > > c7t5002538D41F85C0Dd0 ONLINE 0 0 0
> >
> > > > mirror-2 ONLINE 0 0 0
> >
> > > > c7t5002538D41CC7105d0 ONLINE 0 0 0
> >
> > > > c7t5002538D41CC7127d0 ONLINE 0 0 0
> >
> > > > mirror-3 ONLINE 0 0 0
> >
> > > > c7t5002538D41CD7F7Ed0 ONLINE 0 0 0
> >
> > > > c7t5002538D41CD83FDd0 ONLINE 0 0 0

Re: [OmniOS-discuss] write amplification zvol

2017-09-27 Thread Richard Elling
Comment below...

> On Sep 27, 2017, at 12:57 AM, anthony omnios  wrote:
> 
> Hi,
> 
> i have a problem, i used many ISCSI zvol (for each vm), network traffic is 
> 2MB/s between kvm host and filer but i write on disks many more than that. I 
> used a pool with separated mirror zil (intel s3710) and 8 ssd samsung  850 
> evo 1To
> 
>  zpool status
>   pool: filervm2
>  state: ONLINE
>   scan: resilvered 406G in 0h22m with 0 errors on Wed Sep 20 15:45:48 2017
> config:
> 
> NAME   STATE READ WRITE CKSUM
> filervm2   ONLINE   0 0 0
>   mirror-0 ONLINE   0 0 0
> c7t5002538D41657AAFd0  ONLINE   0 0 0
> c7t5002538D41F85C0Dd0  ONLINE   0 0 0
>   mirror-2 ONLINE   0 0 0
> c7t5002538D41CC7105d0  ONLINE   0 0 0
> c7t5002538D41CC7127d0  ONLINE   0 0 0
>   mirror-3 ONLINE   0 0 0
> c7t5002538D41CD7F7Ed0  ONLINE   0 0 0
> c7t5002538D41CD83FDd0  ONLINE   0 0 0
>   mirror-4 ONLINE   0 0 0
> c7t5002538D41CD7F7Ad0  ONLINE   0 0 0
> c7t5002538D41CD7F7Dd0  ONLINE   0 0 0
> logs
>   mirror-1 ONLINE   0 0 0
> c4t2d0 ONLINE   0 0 0
> c4t4d0 ONLINE   0 0 0
> 
> i used correct ashift of 13 for samsung 850 evo
> zdb|grep ashift :
> 
> ashift: 13
> ashift: 13
> ashift: 13
> ashift: 13
> ashift: 13
> 
> But i write a lot on ssd every 5 seconds (many more than the network traffic 
> of 2 MB/s)
> 
> iostat -xn -d 1 : 
> 
>  r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>11.0 3067.5  288.3 153457.4  6.8  0.52.20.2   5  14 filervm2

filervm2 is seeing 3067 writes per second. This is the interface to the upper 
layers.
These writes are small.

> 0.00.00.00.0  0.0  0.00.00.0   0   0 rpool
> 0.00.00.00.0  0.0  0.00.00.0   0   0 c4t0d0
> 0.00.00.00.0  0.0  0.00.00.0   0   0 c4t1d0
> 0.0  552.60.0 17284.0  0.0  0.10.00.2   0   8 c4t2d0
> 0.0  552.60.0 17284.0  0.0  0.10.00.2   0   8 c4t4d0

The log devices are seeing 552 writes per second and since sync=standard that 
means that the upper layers are requesting syncs.
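
A quick way to confirm that, using the dataset names from your own output (the
pool root or any of the zvols):

  zfs get sync filervm2                 # standard | always | disabled

Setting sync=disabled would make the log devices go quiet, but it also discards
the sync semantics the initiators asked for, so treat it as a diagnostic only.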

> 1.0  233.3   48.1 10051.6  0.0  0.00.00.1   0   3 
> c7t5002538D41657AAFd0
> 5.0  250.3  144.2 13207.3  0.0  0.00.00.1   0   3 
> c7t5002538D41CC7127d0
> 2.0  254.3   24.0 13207.3  0.0  0.00.00.1   0   4 
> c7t5002538D41CC7105d0
> 3.0  235.3   72.1 10051.6  0.0  0.00.00.1   0   3 
> c7t5002538D41F85C0Dd0
> 0.0  228.30.0 16178.7  0.0  0.00.00.2   0   4 
> c7t5002538D41CD83FDd0
> 0.0  225.30.0 16210.7  0.0  0.00.00.2   0   4 
> c7t5002538D41CD7F7Ed0
> 0.0  282.30.0 19991.1  0.0  0.00.00.2   0   5 
> c7t5002538D41CD7F7Dd0
> 0.0  280.30.0 19871.0  0.0  0.00.00.2   0   5 
> c7t5002538D41CD7F7Ad0

The pool disks see 1989 writes per second total (the w/s figures for the eight
pool disks above sum to roughly that), or 994 writes per second logically once
you halve it for the mirror pairs.

It seems to me that reducing 3067 requested writes to 994 logical writes is the 
opposite
of amplification. What do you expect?
 -- richard

> 
> I used zvol of 64k, i try with 8k and problem is the same.
> 
> zfs get all filervm2/hdd-110022a :
> 
> NAME  PROPERTY  VALUE  SOURCE
> filervm2/hdd-110022a  type  volume -
> filervm2/hdd-110022a  creation  Tue May 16 10:24 2017  -
> filervm2/hdd-110022a  used  5.26G  -
> filervm2/hdd-110022a  available 2.90T  -
> filervm2/hdd-110022a  referenced5.24G  -
> filervm2/hdd-110022a  compressratio 3.99x  -
> filervm2/hdd-110022a  reservation   none   default
> filervm2/hdd-110022a  volsize   25Glocal
> filervm2/hdd-110022a  volblocksize  64K-
> filervm2/hdd-110022a  checksum  on default
> filervm2/hdd-110022a  compression   lz4local
> filervm2/hdd-110022a  readonly  offdefault
> filervm2/hdd-110022a  copies1  default
> filervm2/hdd-110022a  refreservationnone   default
> filervm2/hdd-110022a  primarycache  alldefault
> filervm2/hdd-110022a  secondarycachealldefault
> filervm2/hdd-110022a  usedbysnapshots   15.4M  -
> filervm2/hdd-1

Re: [OmniOS-discuss] Upgrade to 151022m from 014 - horrible NFS performance

2017-08-24 Thread Richard Elling

> On Aug 24, 2017, at 5:41 AM, Schweiss, Chip  wrote:
> 
> I just move one of my production systems to OmniOS CE 151022m from 151014 and 
> my NFS performance has tanked.  
> 
> Here's a snapshot of nfssvrtop:
> 
> 2017 Aug 24 07:34:39, load: 1.54, read: 5427 KB, swrite: 104 KB, awrite: 9634 KB
> Ver Client        NFSOPS Reads SWrites AWrites Commits Rd_bw SWr_bw AWr_bw  Rd_t  SWr_t AWr_t   Com_t Align%
> 3   10.28.17.10        0     0       0       0       0     0      0      0     0      0     0       0      0
> 3   all                0     0       0       0       0     0      0      0     0      0     0       0      0
> 4   10.28.17.19        0     0       0       0       0     0      0      0     0      0     0       0      0
> 4   10.28.16.160      17     0       0       0       0     0      0      0     0      0     0       0      0
> 4   10.28.16.127      20     0       0       0       0     0      0      0     0      0     0       0      0
> 4   10.28.16.113      74     6       6       0       0    48     56      0  1366  20824     0       0    100
> 4   10.28.16.64      338    16       0      36       3   476      0   1065   120      0   130  117390    100
> 4   10.28.16.54      696    68       0      91       5  2173      0   2916    52      0    93  142083    100
> 4   all             1185    90       6     127       8  2697     56   3996   151  20824   104  133979    100
> 
> The pool is not doing anything but serving NFS.   Before the upgrade, the 
> pool would sustain 20k NFS ops.   

The commit time is in microseconds, and it does look high. Is there a slog?
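
A quick way to check, assuming you know the pool name:

  zpool status <pool>        # a separate "logs" section lists any slog devices
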
 — richard

> 
> Is there some significant change in NFS that I need to adjust its tuning?
> 
> -Chip
> 
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Fragmentation

2017-06-23 Thread Richard Elling
ZIL pre-allocates at the block level, so think along the lines of 12k or 132k,
i.e. an 8k or 128k write plus a physical-block-sized (e.g. 4k) chain block.
 — richard

> On Jun 23, 2017, at 11:30 AM, Günther Alka  wrote:
> 
> hello Richard
> 
> I can follow that the Zil does not add more fragmentation to the free space 
> but is this effect relevant?
> If a ZIL pre-allocates say 4G and the remaining fragmented poolsize for 
> regular writes is 12T
> 
> Gea
> 
> Am 23.06.2017 um 19:30 schrieb Richard Elling:
>> A slog helps fragmentation because the space for ZIL is pre-allocated based 
>> on a prediction of
>> how big the write will be. The pre-allocated space includes a 
>> physical-block-sized chain block for the
>> ZIL. An 8k write can allocate 12k for the ZIL entry that is freed when the 
>> txg commits. Thus, a slog
>> can help decrease free space fragmentation in the pool.
>>  — richard
>> 
>> 
>>> On Jun 23, 2017, at 8:56 AM, Guenther Alka  wrote:
>>> 
>>> A ZIL, or better a dedicated slog device, will not help, as this is not a write
>>> cache but a log device. It's only there to commit every written data block and
>>> to put it onto stable storage. It is read only after a crash to redo a
>>> missing committed write.
>>> 
>>> All writes, no matter whether sync or not, go through the RAM-based
>>> write cache (by default up to 4GB). This is flushed from time to time as a
>>> large sequential write. Writes are then fragmented depending on the
>>> fragmentation of the free space.
>>> 
>>> Gea
>>> 
>>> 
>>>> To prevent it, a ZIL caching all writes (including sync ones, e.g. nfs) 
>>>> can help. Perhaps a DDR drive (or mirror of these) with battery and flash 
>>>> protection from poweroffs, so it does not wear out like flash would. In 
>>>> this case, how-ever random writes come, ZFS does not have to put them on 
>>>> media asap - so it can do larger writes later. This can also protect SSD 
>>>> arrays from excessive small writes and wear-out, though there a bad(ly 
>>>> sized) ZIL can become a bottleneck.
>>>> 
>>>> Hope this helps,
>>>> Jim
>>>> --
>>> ___
>>> OmniOS-discuss mailing list
>>> OmniOS-discuss@lists.omniti.com
>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> -- 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Fragmentation

2017-06-23 Thread Richard Elling
A slog helps fragmentation because the space for ZIL is pre-allocated based on 
a prediction of
how big the write will be. The pre-allocated space includes a 
physical-block-sized chain block for the
ZIL. An 8k write can allocate 12k for the ZIL entry that is freed when the txg 
commits. Thus, a slog
can help decrease free space fragmentation in the pool.
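
If you want to watch the actual ZIL allocations, a DTrace-based tool such as
zilstat can show bytes written and allocated per interval (a sketch; grab the
script from the usual DTrace tool collections, and note the invocation varies by
version):

  ./zilstat 1 10       # 1-second samples, 10 samples
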
 — richard


> On Jun 23, 2017, at 8:56 AM, Guenther Alka  wrote:
> 
> A ZIL, or better a dedicated slog device, will not help, as this is not a write
> cache but a log device. It's only there to commit every written data block and
> to put it onto stable storage. It is read only after a crash to redo a
> missing committed write.
> 
> All writes, no matter whether sync or not, go through the RAM-based write
> cache (by default up to 4GB). This is flushed from time to time as a large
> sequential write. Writes are then fragmented depending on the fragmentation
> of the free space.
> 
> Gea
> 
> 
>> To prevent it, a ZIL caching all writes (including sync ones, e.g. nfs) can 
>> help. Perhaps a DDR drive (or mirror of these) with battery and flash 
>> protection from poweroffs, so it does not wear out like flash would. In this 
>> case, how-ever random writes come, ZFS does not have to put them on media 
>> asap - so it can do larger writes later. This can also protect SSD arrays 
>> from excessive small writes and wear-out, though there a bad(ly sized) ZIL 
>> can become a bottleneck.
>> 
>> Hope this helps,
>> Jim
>> --
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] To the OmniOS Community

2017-05-12 Thread Richard Elling
Thanks for everything Dan, you’ve been a terrific asset to the OmniOS community!
 — richard

> On May 12, 2017, at 8:07 AM, Dan McDonald  wrote:
> 
> Dear OmniOS Community,
> 
> For the past 3+ years, it has been my pleasure and honor to be "OmniOS 
> Engineering" at OmniTI. I hope I made OmniOS a nice platform to use for 
> solving problems, whether they be Home-Data-Center, service-hosting box, 
> network filer, or other uses.
> 
> As you saw, OmniTI is turning over OmniOS completely to the community.  The 
> decision-making was sensible for OmniTI, and I understand it completely. With 
> r151022, OmniOS should be at a nice long-term state (022 was always intended 
> to be LTS).
> 
> I will be around OmniTI for another week (though on Friday the 19th my 
> availability will be spotty).  During this next week, I encourage people to 
> start upgrading or installing r151022 on their environments.  I already have 
> updated it on my own HDC, and it seems to be performing as it has previously. 
>  
> 
> Thank you again, OmniOS community, for making these past three years as 
> rewarding as I'd hoped they'd be when I joined OmniTI. And as for OmniTI, if 
> you need web or database consulting, please keep them in mind. Still a fan, 
> even though I'm no longer with them.
> 
> Dan McDonald -- OmniOS Engineering
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Resilver zero progress

2017-05-10 Thread Richard Elling

> On May 10, 2017, at 8:29 AM, Schweiss, Chip  wrote:
> 
> I have a pool that has had a resilver running for about an hour but the 
> progress status is a bit alarming.  I'm concerned for some reason it will not 
> resilver.   Resilvers are tuned to be faster in /etc/system.   This is on 
> OmniOS r151014, currently fully updated.   Any suggestions?

mdb’s "::zfs_dbgmsg" macro shows scan progress
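
For example, on the live kernel:

  echo "::zfs_dbgmsg" | mdb -k
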
 — richard

> 
> -Chip
> 
> from /etc/system:
> 
> set zfs:zfs_resilver_delay = 0
> set zfs:zfs_scrub_delay = 0
> set zfs:zfs_top_maxinflight = 64
> set zfs:zfs_resilver_min_time_ms = 5000
> 
> 
> # zpool status hcp03
>   pool: hcp03
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
> continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Wed May 10 09:22:15 2017
> 1 scanned out of 545T at 1/s, (scan is slow, no estimated time)
> 0 resilvered, 0.00% done
> config:
> 
> NAME STATE READ WRITE CKSUM
> hcp03DEGRADED 0 0 0
>   raidz2-0   DEGRADED 0 0 0
> c0t5000C500846F161Fd0ONLINE   0 0 0
> spare-1  UNAVAIL  0 0 0
>   5676922542927845170UNAVAIL  0 0 0  was 
> /dev/dsk/c0t5000C5008473DBF3d0s0
>   c0t5000C500846F1823d0  ONLINE   0 0 0
> c0t5000C500846F134Fd0ONLINE   0 0 0
> c0t5000C500846F139Fd0ONLINE   0 0 0
> c0t5000C5008473B89Fd0ONLINE   0 0 0
> c0t5000C500846F145Bd0ONLINE   0 0 0
> c0t5000C5008473B6BBd0ONLINE   0 0 0
> c0t5000C500846F131Fd0ONLINE   0 0 0
>   raidz2-1   ONLINE   0 0 0
> c0t5000C5008473BB63d0ONLINE   0 0 0
> c0t5000C5008473C9C7d0ONLINE   0 0 0
> c0t5000C500846F1A17d0ONLINE   0 0 0
> c0t5000C5008473A0A3d0ONLINE   0 0 0
> c0t5000C5008473D047d0ONLINE   0 0 0
> c0t5000C5008473BF63d0ONLINE   0 0 0
> c0t5000C5008473BC83d0ONLINE   0 0 0
> c0t5000C5008473E35Bd0ONLINE   0 0 0
>   raidz2-2   ONLINE   0 0 0
> c0t5000C5008473ABAFd0ONLINE   0 0 0
> c0t5000C5008473ADF3d0ONLINE   0 0 0
> c0t5000C5008473AE77d0ONLINE   0 0 0
> c0t5000C5008473A23Bd0ONLINE   0 0 0
> c0t5000C5008473C907d0ONLINE   0 0 0
> c0t5000C5008473CCABd0ONLINE   0 0 0
> c0t5000C5008473C77Fd0ONLINE   0 0 0
> c0t5000C5008473B6D3d0ONLINE   0 0 0
>   raidz2-3   ONLINE   0 0 0
> c0t5000C5008473E4FFd0ONLINE   0 0 0
> c0t5000C5008473ECFFd0ONLINE   0 0 0
> c0t5000C5008473F4C3d0ONLINE   0 0 0
> c0t5000C5008473F8CFd0ONLINE   0 0 0
> c0t5000C500846F1897d0ONLINE   0 0 0
> c0t5000C500846F14B7d0ONLINE   0 0 0
> c0t5000C500846F1353d0ONLINE   0 0 0
> c0t5000C5008473EEDFd0ONLINE   0 0 0
>   raidz2-4   ONLINE   0 0 0
> c0t5000C500846F144Bd0ONLINE   0 0 0
> c0t5000C5008473F10Fd0ONLINE   0 0 0
> c0t5000C500846F15CBd0ONLINE   0 0 0
> c0t5000C500846F1493d0ONLINE   0 0 0
> c0t5000C5008473E26Fd0ONLINE   0 0 0
> c0t5000C500846F1A0Bd0ONLINE   0 0 0
> c0t5000C5008473EE07d0ONLINE   0 0 0
> c0t5000C500846F1453d0ONLINE   0 0 0
>   raidz2-5   ONLINE   0 0 0
> c0t5000C500846F153Bd0ONLINE   0 0 0
> c0t5000C5008473F9EBd0ONLINE   0 0 0
> c0t5000C500846F14EFd0ONLINE   0 0 0
> c0t5000C5008473AB0Bd0ONLINE   0 0 0
> c0t5000C500846F140Bd0ONLINE   0 0 0
> c0t5000C5008473FC0Fd0ONLINE   0 0 0
> c0t5000C5008473DFA3d0ONLINE   0 0 0
> c0t5000C5008473F89Bd0ONLINE   0 0 0
>   raidz2-6   ONLINE   0 0 0
> c0t5000C500846F19BFd0ONLINE  

Re: [OmniOS-discuss] zdb doesn't find a pool

2017-04-17 Thread Richard Elling

> On Apr 15, 2017, at 7:31 PM, Dan McDonald  wrote:
> 
> That woz was the result of zpool split is news to me (or I missed it, in 
> which case I apologize).
> 
> I wonder if a toy test with files ala the zfs test suite can reproduce this?
> 
> - create 3-way mirror
> - zdb
> - split one disk
> - zdb original and split-created pool

hint: when provided a pool name, zdb looks in /etc/zfs/zpool.cache to determine 
the
poolname to devices mapping. Using the -e (exported) option causes zdb to read 
ZFS labels on the disks to locate the devices mapping. Since zdb runs in 
userland and
can be run by non-root users, it has no knowledge of internal kernel state. 
"zdb -C"
will pretty-print the packed nvlist cachefile for your viewing pleasure
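
So for the case above, something along these lines should work (pool name taken
from your output):

  zdb -C           # pretty-print /etc/zfs/zpool.cache
  zdb -e woz       # find the pool by reading the on-disk labels instead
  zdb -eC woz      # show the config discovered from the labels
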
 -- richard

> 
> Adding illumos zfs list.
> 
> Dan
> 
> Sent from my iPhone (typos, autocorrect, and all)
> 
>> On Apr 15, 2017, at 9:19 PM, Michael Mounteney  
>> wrote:
>> 
>> Hello and apology to Dan to whom I've already mentioned this matter on
>> IRC.
>> 
>> Summary:  zdb doesn't see a pool specified by name.
>> 
>> My (home) server has three pools so:
>> 
>> 
>> # zpool status
>> pool: rpool
>> state: ONLINE
>> scan: scrub repaired 0 in 0h2m with 0 errors on Sun Feb 19 14:01:41
>> 2017 config:
>> 
>>   NAMESTATE READ WRITE CKSUM
>>   rpool   ONLINE   0 0 0
>> c2t0d0s0  ONLINE   0 0 0
>> 
>> errors: No known data errors
>> 
>> pool: vault
>> state: ONLINE
>> scan: none requested
>> config:
>> 
>>   NAME  STATE READ WRITE CKSUM
>>   vault ONLINE   0 0 0
>> raidz2-0ONLINE   0 0 0
>>   c2t1d0s0  ONLINE   0 0 0
>>   c2t2d0s0  ONLINE   0 0 0
>>   c2t3d0s0  ONLINE   0 0 0
>>   c2t5d0s0  ONLINE   0 0 0
>> 
>> errors: No known data errors
>> 
>> pool: woz
>> state: ONLINE
>> scan: resilvered 359G in 8h21m with 0 errors on Fri Mar 17 03:16:51
>> 2017 config:
>> 
>>   NAMESTATE READ WRITE CKSUM
>>   woz ONLINE   0 0 0
>> c2t4d0s0  ONLINE   0 0 0
>> 
>> errors: No known data errors
>> 
>> 
>> However, zdb won't see the pool 'woz':
>> 
>> 
>> # zdb vault | head -4
>> 
>> Cached configuration:
>>   version: 5000
>>   name: 'vault'
>> # zdb woz
>> zdb: can't open 'woz': No such file or directory
>> # zdb -l /dev/rdsk/c2t4d0s0 | head -6
>> 
>> LABEL 0
>> 
>>   version: 5000
>>   name: 'woz'
>>   state: 0
>> 
>> 
>> So zdb finds pool 'vault' alright but not 'woz'.  It will however see
>> 'woz' if it's referred-to by disk.  The difference which I think might
>> be crucial is that 'vault' was created via zpool create, whereas 'woz'
>> was created via zpool split.  In the full output of zdb
>> -l /dev/rdsk/c2t4d0s0 (omitted here for brevity), there are four labels
>> numbered 0 to 3.  The output is much shorter as well;  it doesn't list
>> the individual file objects as zdb vault does.
>> 
>> Is this worth a mention on https://www.illumos.org/issues ?
>> 
>> __
>> Michael Mounteney
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZeusRAM - predictive failure

2017-04-11 Thread Richard Elling

> On Apr 10, 2017, at 4:30 PM, Machine Man  wrote:
> 
> Do you select drives based on DWPD?

Not really. Inside a given product line, the difference in DWPD is a matter of 
overprovisioning.
You can adjust the overprovisioning yourself, if needed.

note to the lurkers, overprovisioning also impacts the write performance of 
garbage collection

> I am struggling to find $500 - $700 drives in stock. I am limited to a number of
> distributors, and pretty much unless it's HP, Cisco or Dell it's not kept in
> stock. On a number of disk options I got a ship date of late June, and all 3
> distributors indicated SSD drives are constrained.

Yes, there is a global shortage and all major vendors are on allocation.

> I am now down to adding a single SSD during busy hours or when the alerts 
> start rolling in and removing the ZIL afterhours or when the load reduces 
> again.
> 
> My only other options for the next 3 weeks are:
> 1 - add 15K drives for ZIL and see if that helps.
> 2 - Hope for the best on the single old OCZ Talos 2

I have bad luck with these

> 3 - Mix SAS/SATA on the same backplane.

No guarantees, but for more modern expanders and HBAs, we see fewer problems 
mixing.
I wouldn’t attempt for 3G SAS/SATA, but 12G seems more robust.
 — richard


> 
> I was 100% banking on the ZeusRAM since that is what I could get my hands 
> immediately.
> From: Richard Elling 
> Sent: Monday, April 10, 2017 5:49:55 PM
> To: Machine Man
> Cc: omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] ZeusRAM - predictive failure
>  
> 
>> On Apr 10, 2017, at 2:39 PM, Machine Man > <mailto:gearbo...@outlook.com>> wrote:
>> 
>> Thank you. I am sending it back to where we purchased it from. I thought 
>> these were no longer avail, but the distributor still listed them and had in 
>> stock.
>> I was hesitant to purchase, but I am in desperate need for a ZIL. 
> 
> ZeusRAMs have been EOL for a year or more. AIUI, the parts are no longer 
> available to build them.
> We do see better performance from the modern, enterprise-class, 12G SAS parts 
> from HGST and Toshiba.
> Unfortunately, they are priced by $/GB and not $/latency, so the smaller 
> capacity (GB) drives are also slower.
>  — richard
> 
>> 
>> 
>> From: Richard Elling > <mailto:richard.ell...@richardelling.com>>
>> Sent: Monday, April 10, 2017 4:15:32 PM
>> To: Machine Man
>> Cc: omnios-discuss@lists.omniti.com <mailto:omnios-discuss@lists.omniti.com>
>> Subject: Re: [OmniOS-discuss] ZeusRAM - predictive failure
>>  
>> 
>>> On Apr 10, 2017, at 1:00 PM, Machine Man >> <mailto:gearbo...@outlook.com>> wrote:
>>> 
>>> Today I received one of the ZeusRAM that I ordered, both brand new. I was 
>>> struggling to find SAS SSD drives that were available in my price range as 
>>> I desperately need to add a ZIL. 
>>> I decided to order ZeusRAM since they had one in stock and figured I'll add 
>>> it while waiting for the other one as they are really should not be prone 
>>> to failure based on design. I have not used them and would normally just 
>>> prefer to use regular SSD drives.
>>> 
>>> I slotted the ZeusRAM in and it began to rapidly blink the same as the disks that
>>> are currently in the pool on that backplane. Running the command format
>>> would never return with a list of disks. I left it for about 15 min and
>>> pulled it, since it says on the disk that it can take up to 10 min for the
>>> caps. I could see there is an amber and green LED on the drive itself
>>> blinking, even when removed.
>>> I slotted it back in and the disk was then available. After a few min the
>>> fault light came on and the disk was unavailable due to the following:
>>> 
>>> Fault class : fault.io.disk.predictive-failure
>> 
>> This occurs when the drive responds to an I/O and indicates a predictive 
>> failure or
>> the periodic query for drives sees a predicted failure. It is the drive 
>> telling the OS that
>> the drive thinks it will fail. There is nothing you can do on the OS to 
>> “fix” this.
>> 
>> It is possible that HGST (nee STEC) can help with further diagnosis using 
>> the vendor-specific
>> log pages. Several years ago, STEC helped us with root cause of failing 
>> ultracapacitor in a drive.
>> AFAIK, there is no publicly available decoder for those log pages.
>>  — richard
>> 
>> 
>>> Affects : 
>>> dev:///:devid=id1,sd@n5000a720300b3d57//pci@0,0/pci8086,340e@7/pci1000,3040@0/iport@f0/disk@w5000a72a300b3d57,0
>>>  
>

Re: [OmniOS-discuss] ZeusRAM - predictive failure

2017-04-10 Thread Richard Elling

> On Apr 10, 2017, at 2:39 PM, Machine Man  wrote:
> 
> Thank you. I am sending it back to where we purchased it from. I thought 
> these were no longer avail, but the distributor still listed them and had in 
> stock.
> I was hesitant to purchase, but I am in desperate need for a ZIL. 

ZeusRAMs have been EOL for a year or more. AIUI, the parts are no longer 
available to build them.
We do see better performance from the modern, enterprise-class, 12G SAS parts 
from HGST and Toshiba.
Unfortunately, they are priced by $/GB and not $/latency, so the smaller 
capacity (GB) drives are also slower.
 — richard

> 
> 
> From: Richard Elling 
> Sent: Monday, April 10, 2017 4:15:32 PM
> To: Machine Man
> Cc: omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] ZeusRAM - predictive failure
>  
> 
>> On Apr 10, 2017, at 1:00 PM, Machine Man > <mailto:gearbo...@outlook.com>> wrote:
>> 
>> Today I received one of the ZeusRAM that I ordered, both brand new. I was 
>> struggling to find SAS SSD drives that were available in my price range as I 
>> desperately need to add a ZIL. 
>> I decided to order ZeusRAM since they had one in stock and figured I'll add 
>> it while waiting for the other one as they are really should not be prone to 
>> failure based on design. I have not used them and would normally just prefer 
>> to use regular SSD drives.
>> 
>> I slotted the ZeusRAM in and it began to rapidly blink the same as the disks that
>> are currently in the pool on that backplane. Running the command format
>> would never return with a list of disks. I left it for about 15 min and
>> pulled it, since it says on the disk that it can take up to 10 min for the
>> caps. I could see there is an amber and green LED on the drive itself
>> blinking, even when removed.
>> I slotted it back in and the disk was then available. After a few min the
>> fault light came on and the disk was unavailable due to the following:
>> 
>> Fault class : fault.io.disk.predictive-failure
> 
> This occurs when the drive responds to an I/O and indicates a predictive 
> failure or
> the periodic query for drives sees a predicted failure. It is the drive 
> telling the OS that
> the drive thinks it will fail. There is nothing you can do on the OS to “fix” 
> this.
> 
> It is possible that HGST (nee STEC) can help with further diagnosis using the 
> vendor-specific
> log pages. Several years ago, STEC helped us with root cause of failing 
> ultracapacitor in a drive.
> AFAIK, there is no publicly available decoder for those log pages.
>  — richard
> 
> 
>> Affects : 
>> dev:///:devid=id1,sd@n5000a720300b3d57//pci@0,0/pci8086,340e@7/pci1000,3040@0/iport@f0/disk@w5000a72a300b3d57,0
>>  
>> 
>>   faulted and taken out of service
>> FRU : "Slot 09" 
>> (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=50030480178cf57f:serial=STM000:part=STEC-ZeusRAM:revision=C025/ses-enclosure=1/bay=8/disk=0
>>  
>> )
>>   faulty
>> Description : SMART health-monitoring firmware reported that a disk
>>   failure is imminent.
>> 
>> 
>> I cleared the fault and the drive was then usable again for a few min same 
>> thing happened. Eventually the amber light on the disk itself (not the 
>> enclosure disk light) no longer blinked and the disks was online for quite 
>> some time before the alert above reappeared.
>> 
>> 
>> === START OF INFORMATION SECTION ===
>> Vendor:   STEC
>> Product:  ZeusRAM
>> Revision: C025
>> Compliance:   SPC-4
>> User Capacity:8,000,000,000 bytes [8.00 GB]
>> Logical block size:   512 bytes
>> Rotation Rate:Solid State Device
>> Form Factor:  3.5 inches
>> Logical Unit id:  0x5000a720300b3d57
>> Serial number:STM000**
>> Device type:  disk
>> Transport protocol:   SAS (SPL-3)
>> Local Time is:Mon Apr 10 19:17:23 2017 UTC
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>> Temperature Warning:  Enabled
>> === START OF READ SMART DATA SECTION ===
>> SMART Health Status: OK
>> Current Drive Temperature: 40 C
>> Drive Trip Temperature:80 C
>> Elements in grown defect list: 0
>> Vendor (Seagate) cache information
>>   Blocks sent to initiator = 0
>>   Blocks sent to initiator = 0
>> Error counter log:
>>Errors Corrected by   Total   Correction Gigabytes
>> Total
>>   

Re: [OmniOS-discuss] ZeusRAM - predictive failure

2017-04-10 Thread Richard Elling

> On Apr 10, 2017, at 1:00 PM, Machine Man  wrote:
> 
> Today I received one of the ZeusRAM that I ordered, both brand new. I was 
> struggling to find SAS SSD drives that were available in my price range as I 
> desperately need to add a ZIL. 
> I decided to order ZeusRAM since they had one in stock and figured I'll add 
> it while waiting for the other one as they are really should not be prone to 
> failure based on design. I have not used them and would normally just prefer 
> to use regular SSD drives.
> 
> I slotted the ZeusRAM in and it began to rapidly blink the same as the disks that
> are currently in the pool on that backplane. Running the command format would
> never return with a list of disks. I left it for about 15 min and pulled it,
> since it says on the disk that it can take up to 10 min for the caps. I could
> see there is an amber and green LED on the drive itself blinking, even when
> removed.
> I slotted it back in and the disk was then available. After a few min the
> fault light came on and the disk was unavailable due to the following:
> 
> Fault class : fault.io.disk.predictive-failure

This occurs when the drive responds to an I/O and indicates a predictive 
failure or
the periodic query for drives sees a predicted failure. It is the drive telling 
the OS that
the drive thinks it will fail. There is nothing you can do on the OS to “fix” 
this.

It is possible that HGST (nee STEC) can help with further diagnosis using the 
vendor-specific
log pages. Several years ago, STEC helped us with root cause of failing 
ultracapacitor in a drive.
AFAIK, there is no publicly available decoder for those log pages.
 — richard


> Affects : 
> dev:///:devid=id1,sd@n5000a720300b3d57//pci@0,0/pci8086,340e@7/pci1000,3040@0/iport@f0/disk@w5000a72a300b3d57,0
>  
> 
>   faulted and taken out of service
> FRU : "Slot 09" 
> (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=50030480178cf57f:serial=STM000:part=STEC-ZeusRAM:revision=C025/ses-enclosure=1/bay=8/disk=0
>  
> )
>   faulty
> Description : SMART health-monitoring firmware reported that a disk
>   failure is imminent.
> 
> 
> I cleared the fault and the drive was then usable again for a few min same 
> thing happened. Eventually the amber light on the disk itself (not the 
> enclosure disk light) no longer blinked and the disks was online for quite 
> some time before the alert above reappeared.
> 
> 
> === START OF INFORMATION SECTION ===
> Vendor:   STEC
> Product:  ZeusRAM
> Revision: C025
> Compliance:   SPC-4
> User Capacity:8,000,000,000 bytes [8.00 GB]
> Logical block size:   512 bytes
> Rotation Rate:Solid State Device
> Form Factor:  3.5 inches
> Logical Unit id:  0x5000a720300b3d57
> Serial number:STM000**
> Device type:  disk
> Transport protocol:   SAS (SPL-3)
> Local Time is:Mon Apr 10 19:17:23 2017 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> Temperature Warning:  Enabled
> === START OF READ SMART DATA SECTION ===
> SMART Health Status: OK
> Current Drive Temperature: 40 C
> Drive Trip Temperature:80 C
> Elements in grown defect list: 0
> Vendor (Seagate) cache information
>   Blocks sent to initiator = 0
>   Blocks sent to initiator = 0
> Error counter log:
>Errors Corrected by   Total   Correction Gigabytes
> Total
>ECC  rereads/errors   algorithm  processed
> uncorrected
>fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  
> errors
> read:  00 0 0  0 21.323   
> 0
> write: 00 0 0  0 83.809   
> 0
> Non-medium error count:0
> 
> 
> 
> Is there anything special that should be done for ZeusRAM in sd.conf? Its a 
> node install and both nodes can see all the drives. I don't see any smart 
> errors listed, but running fmadm it will show the disk as faulty due to 
> predictive failure.
> OmniOS r20 all patches applied.
> 
> 
> thanks,  
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] nfsv3rwsnoop.d lists NFS writes to files

2017-03-22 Thread Richard Elling

> On Mar 22, 2017, at 12:54 AM, Richard Skelton  wrote:
> 
> Hi Richard,
> The nfsv3rwsnoop.d script shows the offset of the write as 0; is this then not
> a new file?

not necessarily

> TIME(us) CLIENT OP OFFSETBYTES PATHNAME
> 20811423544560   xxx.xx.xxx.21   W 0   116 
> 20811423525741   xxx.xx.xxx.50   W 0  5232 
> 20811423528311   xxx.xx.xxx.50   W 0   304 
> 20811423529251   xxx.xx.xxx.50   W 0   376 
> 20811423555753   xxx.xx.xxx.18   R 0   685 
> If they are files which have been deleted they must be very short lived files 
> ?

not necessarily

If you want auditing, you need auditing tools and dtrace is not the right tool
 -- richard

> 
> Richard Elling wrote:
>> 
>> 
>>> On Mar 21, 2017, at 3:54 PM, Richard Skelton >> <mailto:skelt...@btconnect.com>> wrote:
>>> 
>>> Hi,
>>> I am using the dtrace script nfsv3rwsnoop.d to find file that are accessed 
>>> from my OmniOS r151020 filer and some file names are listed as unknown :-(
>>> I guess they are files that have been open for a long time and have dropped 
>>> out of some data structure.
>> 
>> almost... they are files that were open prior to the dtrace script running 
>> or they are files
>> which have been deleted (!), such that there is no mapping between the nfs 
>> file handle and
>> the current file system
>> 
>>> Is there any way to increase the persistence of the name stored.
>>> I have lots on memory in this system and would be happy to sacrifice some 
>>> if I could see more file name :-)
>> 
>> depending on what you wish to accomplish, dtrace might be the wrong tool and 
>> you might
>> want auditing or NFS logging instead 
>>  — richard
>> 
>>> 
>>> root@filer:/scratch# /root/dtrace/nfsv3rwsnoop.d |more
>>> 1189849649391xxx.xx.xxx.59   W 10500138879 
>>> 1189849649582xxx.xx.xxx.59   W 10500137788 
>>> 1189849740621xxx.xx.xxx.118  W 0  2404 
>>> 1189849781136xxx.xx.xxx.109  W 19832756675 /scratch/run.log
>>> 1189849849301xxx.xx.xxx.102  W 1096  57513 
>>> /scratch/avm_remote_job_f64ccf56-efa1-4605-8a48-874816779289_2.out
>>> Cheers
>>> Richard
>>> 
>>> 
>>> ___
>>> OmniOS-discuss mailing list
>>> OmniOS-discuss@lists.omniti.com <mailto:OmniOS-discuss@lists.omniti.com>
>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
>>> <http://lists.omniti.com/mailman/listinfo/omnios-discuss>
>> 

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] nfsv3rwsnoop.d lists NFS writes to files

2017-03-21 Thread Richard Elling

> On Mar 21, 2017, at 3:54 PM, Richard Skelton  wrote:
> 
> Hi,
> I am using the dtrace script nfsv3rwsnoop.d to find files that are accessed
> from my OmniOS r151020 filer, and some file names are listed as unknown :-(
> I guess they are files that have been open for a long time and have dropped 
> out of some data structure.

almost... they are files that were open prior to the dtrace script running or 
they are files
which have been deleted (!), such that there is no mapping between the nfs file 
handle and
the current file system

> Is there any way to increase the persistence of the name stored.
> I have lots on memory in this system and would be happy to sacrifice some if 
> I could see more file name :-)

depending on what you wish to accomplish, dtrace might be the wrong tool and 
you might
want auditing or NFS logging instead 
 — richard

> 
> root@filer:/scratch# /root/dtrace/nfsv3rwsnoop.d |more
> 1189849649391xxx.xx.xxx.59   W 10500138879 
> 1189849649582xxx.xx.xxx.59   W 10500137788 
> 1189849740621xxx.xx.xxx.118  W 0  2404 
> 1189849781136xxx.xx.xxx.109  W 19832756675 /scratch/run.log
> 1189849849301xxx.xx.xxx.102  W 1096  57513 
> /scratch/avm_remote_job_f64ccf56-efa1-4605-8a48-874816779289_2.out
> Cheers
> Richard
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] DTrace Scripts

2017-03-11 Thread Richard Elling

> On Mar 9, 2017, at 10:06 AM, John Barfield  wrote:
> 
> Im looking for some general dtrace scripts for debugging ZFS on OmniOS (like 
> updated dtrace toolkit)..didnt want to reinvent the wheel if some folks are 
> willing to share. Also willing to purchase if needed.

Having written a number of these over the years, most are constructed to answer 
specific
questions. Those questions that we feel become routine, usually get converted 
to kstats,
which is a better method for long-term collection.

That said, I've got a few that are more generally useful when debugging some 
performance
issues. But they aren't documented, so a little work is required to get them 
documented.
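
For example, the ARC questions that come up most often are already answered by
kstats:

  kstat -p zfs:0:arcstats | egrep 'hits|misses|c_max|size'
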
 -- richard

> 
> John Barfield
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-30 Thread Richard Elling

> On Jan 29, 2017, at 3:10 AM, Stephan Budach  wrote:
> 
> Hi,
> 
> just to wrap this up… I decided to go with 15 additional LUNs on each storage
> zpool, to avoid zfs complaining about replication mismatches. I know, I could
> have done otherwise, but it somehow felt better this way.
> 
> After all three underlying zpools were "pimped", I was able to mount the
> problematic zpool in my S11.1 host without any issue. It just took a couple
> of seconds and zfs reported approx 2.53MB resilvered…
> 
> Now, there's a scrub running on that zpool that is just happily humming away
> on the data.
> 
> Thanks for all the input, everyone.

may all your scrubs complete cleanly :-)
 — richard

> 
> Stephan 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-26 Thread Richard Elling

> On Jan 26, 2017, at 12:20 AM, Stephan Budach  wrote:
> 
> Hi Richard,
> 
> gotcha… read on, below…

"thin provisioning" bit you. For "thick provisioning" you’ll have a 
refreservation and/or reservation.
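
A quick way to see which one you have (the dataset name below is hypothetical):
a thick zvol carries a refreservation equal to its volsize, a sparse one shows
none.

  zfs get volsize,refreservation,reservation somepool/lun0
  # zfs create -V 100G pool/vol       -> thick (refreservation set by default)
  # zfs create -s -V 100G pool/vol    -> sparse/thin (no refreservation)
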
 — richard

> 
> Am 26.01.17 um 00:43 schrieb Richard Elling:
>> more below…
>> 
>>> On Jan 25, 2017, at 3:01 PM, Stephan Budach >> <mailto:stephan.bud...@jvm.de>> wrote:
>>> 
>>> Ooops… should have waited with sending that message after I rebootet the 
>>> S11.1 host…
>>> 
>>> 
>>> Am 25.01.17 um 23:41 schrieb Stephan Budach:
>>>> Hi Richard,
>>>> 
>>>> Am 25.01.17 um 20:27 schrieb Richard Elling:
>>>>> Hi Stephan,
>>>>> 
>>>>>> On Jan 25, 2017, at 5:54 AM, Stephan Budach >>>>> <mailto:stephan.bud...@jvm.de>> wrote:
>>>>>> 
>>>>>> Hi guys,
>>>>>> 
>>>>>> I have been trying to import a zpool, based on a 3way-mirror provided by 
>>>>>> three omniOS boxes via iSCSI. This zpool had been working flawlessly 
>>>>>> until some random reboot of the S11.1 host. Since then, S11.1 has been 
>>>>>> importing this zpool without success.
>>>>>> 
>>>>>> This zpool consists of three 108TB LUNs, based on a raidz-2 zvols… yeah 
>>>>>> I know, we shouldn't have done that in the first place, but performance 
>>>>>> was not the primary goal for that, as this one is a backup/archive pool.
>>>>>> 
>>>>>> When issueing a zpool import, it says this:
>>>>>> 
>>>>>> root@solaris11atest2:~# zpool import
>>>>>>   pool: vsmPool10
>>>>>> id: 12653649504720395171
>>>>>>  state: DEGRADED
>>>>>> status: The pool was last accessed by another system.
>>>>>> action: The pool can be imported despite missing or damaged devices.  The
>>>>>> fault tolerance of the pool may be compromised if imported.
>>>>>>see: http://support.oracle.com/msg/ZFS-8000-EY 
>>>>>> <http://support.oracle.com/msg/ZFS-8000-EY>
>>>>>> config:
>>>>>> 
>>>>>> vsmPool10  DEGRADED
>>>>>>   mirror-0 DEGRADED
>>>>>> c0t600144F07A350658569398F60001d0  DEGRADED  corrupted 
>>>>>> data
>>>>>> c0t600144F07A35066C5693A0D90001d0  DEGRADED  corrupted 
>>>>>> data
>>>>>> c0t600144F07A35001A5693A2810001d0  DEGRADED  corrupted 
>>>>>> data
>>>>>> 
>>>>>> device details:
>>>>>> 
>>>>>> c0t600144F07A350658569398F60001d0DEGRADED 
>>>>>> scrub/resilver needed
>>>>>> status: ZFS detected errors on this device.
>>>>>> The device is missing some data that is recoverable.
>>>>>> 
>>>>>> c0t600144F07A35066C5693A0D90001d0DEGRADED 
>>>>>> scrub/resilver needed
>>>>>> status: ZFS detected errors on this device.
>>>>>> The device is missing some data that is recoverable.
>>>>>> 
>>>>>> c0t600144F07A35001A5693A2810001d0DEGRADED 
>>>>>> scrub/resilver needed
>>>>>> status: ZFS detected errors on this device.
>>>>>> The device is missing some data that is recoverable.
>>>>>> 
>>>>>> However, when  actually running zpool import -f vsmPool10, the system 
>>>>>> starts to perform a lot of writes on the LUNs and iostat report an 
>>>>>> alarming increase in h/w errors:
>>>>>> 
>>>>>> root@solaris11atest2:~# iostat -xeM 5
>>>>>>  extended device statistics  errors 
>>>>>> ---
>>>>>> devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn 
>>>>>> tot
>>>>>> sd0   0.00.00.00.0  0.0  0.00.0   0   0   0   0   0  
>>>>>>  0
>>>>>> sd1   0.00.00.00.0  0.0  0.00.0   0   0   0 

Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-25 Thread Richard Elling
more below…

> On Jan 25, 2017, at 3:01 PM, Stephan Budach  wrote:
> 
> Ooops… should have waited with sending that message after I rebootet the 
> S11.1 host…
> 
> 
> Am 25.01.17 um 23:41 schrieb Stephan Budach:
>> Hi Richard,
>> 
>> Am 25.01.17 um 20:27 schrieb Richard Elling:
>>> Hi Stephan,
>>> 
>>>> On Jan 25, 2017, at 5:54 AM, Stephan Budach >>> <mailto:stephan.bud...@jvm.de>> wrote:
>>>> 
>>>> Hi guys,
>>>> 
>>>> I have been trying to import a zpool, based on a 3way-mirror provided by 
>>>> three omniOS boxes via iSCSI. This zpool had been working flawlessly until 
>>>> some random reboot of the S11.1 host. Since then, S11.1 has been importing 
>>>> this zpool without success.
>>>> 
>>>> This zpool consists of three 108TB LUNs, based on a raidz-2 zvols… yeah I 
>>>> know, we shouldn't have done that in the first place, but performance was 
>>>> not the primary goal for that, as this one is a backup/archive pool.
>>>> 
>>>> When issueing a zpool import, it says this:
>>>> 
>>>> root@solaris11atest2:~# zpool import
>>>>   pool: vsmPool10
>>>> id: 12653649504720395171
>>>>  state: DEGRADED
>>>> status: The pool was last accessed by another system.
>>>> action: The pool can be imported despite missing or damaged devices.  The
>>>> fault tolerance of the pool may be compromised if imported.
>>>>see: http://support.oracle.com/msg/ZFS-8000-EY 
>>>> <http://support.oracle.com/msg/ZFS-8000-EY>
>>>> config:
>>>> 
>>>> vsmPool10  DEGRADED
>>>>   mirror-0 DEGRADED
>>>> c0t600144F07A350658569398F60001d0  DEGRADED  corrupted data
>>>> c0t600144F07A35066C5693A0D90001d0  DEGRADED  corrupted data
>>>> c0t600144F07A35001A5693A2810001d0  DEGRADED  corrupted data
>>>> 
>>>> device details:
>>>> 
>>>> c0t600144F07A350658569398F60001d0DEGRADED 
>>>> scrub/resilver needed
>>>> status: ZFS detected errors on this device.
>>>> The device is missing some data that is recoverable.
>>>> 
>>>> c0t600144F07A35066C5693A0D90001d0DEGRADED 
>>>> scrub/resilver needed
>>>> status: ZFS detected errors on this device.
>>>> The device is missing some data that is recoverable.
>>>> 
>>>> c0t600144F07A35001A5693A2810001d0DEGRADED 
>>>> scrub/resilver needed
>>>> status: ZFS detected errors on this device.
>>>> The device is missing some data that is recoverable.
>>>> 
>>>> However, when  actually running zpool import -f vsmPool10, the system 
>>>> starts to perform a lot of writes on the LUNs and iostat report an 
>>>> alarming increase in h/w errors:
>>>> 
>>>> root@solaris11atest2:~# iostat -xeM 5
>>>>  extended device statistics  errors ---
>>>> devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
>>>> sd0   0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0
>>>> sd1   0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0
>>>> sd2   0.00.00.00.0  0.0  0.00.0   0   0   0  71   0  71
>>>> sd3   0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0
>>>> sd4   0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0
>>>> sd5   0.00.00.00.0  0.0  0.00.0   0   0   0   0   0   0
>>>>  extended device statistics  errors ---
>>>> devicer/sw/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
>>>> sd0  14.2  147.30.70.4  0.2  0.12.0   6   9   0   0   0   0
>>>> sd1  14.28.40.40.0  0.0  0.00.3   0   0   0   0   0   0
>>>> sd2   0.04.20.00.0  0.0  0.00.0   0   0   0  92   0  92
>>>> sd3 157.3   46.22.10.2  0.0  0.73.7   0  14   0  30   0  30
>>>> sd4 123.9   29.41.60.1  0.0  1.7   10.9   0  36   0  40   0  40
>>>> sd5 142.5   43.02.00.1  0.0  1.9   10

Re: [OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

2017-01-25 Thread Richard Elling
Hi Stephan,

> On Jan 25, 2017, at 5:54 AM, Stephan Budach  wrote:
> 
> Hi guys,
> 
> I have been trying to import a zpool, based on a 3way-mirror provided by 
> three omniOS boxes via iSCSI. This zpool had been working flawlessly until 
> some random reboot of the S11.1 host. Since then, S11.1 has been importing 
> this zpool without success.
> 
> This zpool consists of three 108TB LUNs, based on a raidz-2 zvols… yeah I 
> know, we shouldn't have done that in the first place, but performance was not 
> the primary goal for that, as this one is a backup/archive pool.
> 
> When issuing a zpool import, it says this:
> 
> root@solaris11atest2:~# zpool import
>   pool: vsmPool10
> id: 12653649504720395171
>  state: DEGRADED
> status: The pool was last accessed by another system.
> action: The pool can be imported despite missing or damaged devices.  The
> fault tolerance of the pool may be compromised if imported.
>see: http://support.oracle.com/msg/ZFS-8000-EY 
> 
> config:
> 
> vsmPool10  DEGRADED
>   mirror-0 DEGRADED
> c0t600144F07A350658569398F60001d0  DEGRADED  corrupted data
> c0t600144F07A35066C5693A0D90001d0  DEGRADED  corrupted data
> c0t600144F07A35001A5693A2810001d0  DEGRADED  corrupted data
> 
> device details:
> 
> c0t600144F07A350658569398F60001d0DEGRADED 
> scrub/resilver needed
> status: ZFS detected errors on this device.
> The device is missing some data that is recoverable.
> 
> c0t600144F07A35066C5693A0D90001d0DEGRADED 
> scrub/resilver needed
> status: ZFS detected errors on this device.
> The device is missing some data that is recoverable.
> 
> c0t600144F07A35001A5693A2810001d0DEGRADED 
> scrub/resilver needed
> status: ZFS detected errors on this device.
> The device is missing some data that is recoverable.
> 
> However, when  actually running zpool import -f vsmPool10, the system starts 
> to perform a lot of writes on the LUNs and iostat report an alarming increase 
> in h/w errors:
> 
> root@solaris11atest2:~# iostat -xeM 5
>                    extended device statistics                  ---- errors ---
> device    r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  71   0  71
> sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd4       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd5       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>                    extended device statistics                  ---- errors ---
> device    r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0      14.2  147.3    0.7    0.4  0.2  0.1    2.0   6   9   0   0   0   0
> sd1      14.2    8.4    0.4    0.0  0.0  0.0    0.3   0   0   0   0   0   0
> sd2       0.0    4.2    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3     157.3   46.2    2.1    0.2  0.0  0.7    3.7   0  14   0  30   0  30
> sd4     123.9   29.4    1.6    0.1  0.0  1.7   10.9   0  36   0  40   0  40
> sd5     142.5   43.0    2.0    0.1  0.0  1.9   10.2   0  45   0  88   0  88
>                    extended device statistics                  ---- errors ---
> device    r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0  234.5    0.0    0.6  0.2  0.1    1.4   6  10   0   0   0   0
> sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3       3.6   64.0    0.0    0.5  0.0  4.3   63.2   0  63   0 235   0 235
> sd4       3.0   67.0    0.0    0.6  0.0  4.2   60.5   0  68   0 298   0 298
> sd5       4.2   59.6    0.0    0.4  0.0  5.2   81.0   0  72   0 406   0 406
>                    extended device statistics                  ---- errors ---
> device    r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0  234.8    0.0    0.7  0.4  0.1    2.2  11  10   0   0   0   0
> sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3       5.4   54.4    0.0    0.3  0.0  2.9   48.5   0  67   0 384   0 384
> sd4       6.0   53.4    0.0    0.3  0.0  4.6   77.7   0  87   0 519   0 519
> sd5       6.0   60.8    0.0    0.3  0.0  4.8   72.5   0  87   0 727   0 727

The h/w numbers are an aggregate classification covering several underlying error
types. The full per-device error detail is available from "iostat -E" and will be
important for tracking this down.
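For example, the two views to start from (exact output varies by device and HBA):

  iostat -En          # per-device error counters plus the underlying detail
  fmdump -eV | less   # the corresponding FMA ereports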

A better, more detailed analysis can be g

Re: [OmniOS-discuss] PCI-e x8 only running x4

2017-01-23 Thread Richard Elling

> On Jan 23, 2017, at 5:17 PM, Michael Rasmussen  wrote:
> 
> On Mon, 23 Jan 2017 19:27:51 -0500
> Dale Ghent  wrote:
> 
>> 
>> Check your BIOS settings for the slots, they might be limited there. I also 
>> note that this motherboard model has two physical x8 PCIe slots but one of 
>> them is wired for 4 lanes only. Is your card in that slot? Something not 
>> linking up like this is likely to be more of a hardware configuration issue.
>> 
> Doesn't this proof an x8 slot? LnkCap: Port #0, Speed 2.5GT/s, Width x8
> I was aware of that one of the x8 slot was actually a x4 slot so I am
> quite certain I have picked the right one.
> 
> It couldn't be as simple as since there is no disks connected to the
> second channel so the 4 lanes for the second channel is not active?
> 
>> Also keep in mind that the 1068E card is quite old and its driver support in 
>> OmniOS/illumos is still the closed-source mpt driver binary, so there is not 
>> much room here.
>> 
> I am aware of that and an upgrade to a 2x08 is on the todo list.

This card can’t saturate a modern x4, so you gain nothing going to x8.
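Rough numbers from the LnkCap line above: at 2.5 GT/s with 8b/10b encoding each lane
carries about 250 MB/s, so x4 is roughly 1 GB/s and x8 roughly 2 GB/s, while a modern
gen2/gen3 x4 slot is 2-4 GB/s. The spinning disks typically hanging off a 1068E will
not get anywhere near those figures.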
 — richard

> 
> -- 
> Hilsen/Regards
> Michael Rasmussen
> 
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --
> /usr/games/fortune -es says:
> Tis man's perdition to be safe, when for the truth he ought to die.
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Backup script (one local, multiple remote snapshots)

2017-01-21 Thread Richard Elling


> On Jan 21, 2017, at 8:07 AM, Dale Ghent  wrote:
> 
> We developed Zetaback for this.

+1 for zetaback. There are perhaps hundreds of implementations of this over the 
years. I think you'll find that zetaback is one of the best designs.
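
For the retention scheme described below, a minimal hand-rolled sketch with plain
zfs send/recv looks roughly like this (dataset, host and snapshot names are
placeholders, the initial full send is assumed to have been done already, and error
handling is omitted):

  # on the source, once a day
  zfs snapshot tank/data@daily-2017-01-21
  zfs send -i tank/data@daily-2017-01-20 tank/data@daily-2017-01-21 | \
      ssh backuphost zfs receive -u backup/data
  zfs destroy tank/data@daily-2017-01-20      # keep only one daily locally

  # on the backup host, prune so one daily, one weekly and one monthly remain
  zfs destroy backup/data@daily-2017-01-14

zetaback expresses the same idea as a retention policy, so you don't have to
maintain the bookkeeping by hand.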

  -- richard

> As for how you exactly want your snapshots to be in number and how long they 
> should stay around, you might be able to configure a backup policy which 
> covers that.
> 
> https://github.com/omniti-labs/zetaback
> 
> The documentation is perdoc within the zetaback script.
> 
> /dale
> 
>> On Jan 21, 2017, at 8:33 AM, Matej Žerovnik  wrote:
>> 
>> Hello,
>> 
>> I would like to implement backup for one of my servers with zfs send/recv. 
>> My scenario would be the following:
>> 
>> For each dataset:
>> - keep one daily snapshot on src server
>> - copy daily snapshot from src server to backup server
>> - on backup server, I would like to have one daily, one weekly and one 
>> monthly snapshots
>> 
>> I checked out some ZFS backup scripts I found on GitHub, but non of them did 
>> that I wanted (znapzend came close, but it keeps too many snapshots for my 
>> case)
>> 
>> Does anyone has anything like that implementented? If yes, would be willing 
>> to share?
>> 
>> Thanks, Matej
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Understanding OmniOS disk IO timeouts and options to control them

2017-01-04 Thread Richard Elling
one more thing…

> On Jan 4, 2017, at 10:29 AM, Richard Elling 
>  wrote:
> 
>> 
>> On Jan 4, 2017, at 10:04 AM, Chris Siebenmann  wrote:
>> 
>> We recently had a server reboot due to the ZFS vdev_deadman/spa_deadman
>> timeout timer activating and panicing the system. If you haven't heard
>> of this timer before, that's not surprising; triggering it requires an
>> IO to a vdev to take more than 1000 seconds (by default; it's controlled
>> by the value of zfs_deadman_synctime_ms, in spa_misc.c).
>> 
>> Before this happened, I would not have expected that our OmniOS system
>> allowed an IO to run that long before timing it out and returning an
>> error to ZFS. Clearly I'm wrong, which means that I'd like to understand
>> what disk IO timeouts OmniOS has and where (or if) we can control them
>> so that really long IOs get turned into forced errors well before 16
>> minutes go by. Unfortunately our disk topology is a bit complicated;
>> we have scsi_vhci multipathing on top of iSCSI disks.
> 
> Do not assume the timeout reflects properly operating software or firmware.
> The original impetus for the deadman was to allow debugging of the underlying
> stack. Prior to adding the deadman, the I/O could be stuck forever.
> 
>> 
>> In some Internet searching I've found sd_io_time (60 seconds by
>> default) and the default SD retry count of 5 (I think, it may be
>> 3), which can be adjusted on a per-disk-type basis through the
>> retries-timeout parameter (per the sd manpage). Searching the kernel
>> code suggests that there are some hard-coded timeouts in the 30 to 90
>> second range, which also doesn't seem excessive.
> 
> For sd-level, most commands follow the sd_io_time and retries. scsi_vhci adds
> significant complexity above sd and below zfs.
> — richard
> 
>> 
>> (I have a crash dump from this panic, so I can in theory use mdb
>> to look through it to see just what level an IO appears stuck at
>> if I know what to look for and how.)
>> 
>> Based on 'fmdump -eV' output, it looks like our server was
>> retrying IO repeatedly.
>> 
>> Does anyone know what I should be looking at to find and adjust
>> timeouts, retry counts, and so on? Is there any documentation that
>> I'm overlooking?

I find the "zio_state" mdb macro helpful in these cases. It shows all of the 
I/Os in the zio pipeline
and the timeout values relative to the deadman.
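A rough sketch of poking at the crash dump you mention (paths and names are the
usual defaults, adjust to taste):

  cd /var/crash/`hostname`
  mdb unix.0 vmcore.0           # or just "mdb -k" on the live system

  > ::zio_state                 # outstanding zios and how long they have been around
  > zfs_deadman_synctime_ms/E   # the 1000-second deadman threshold, in ms
  > sd_io_time/D                # the 60-second sd-level timeout

The same names can be pinned in /etc/system once you know what you actually want,
e.g. "set sd:sd_io_time = 60" or "set zfs:zfs_deadman_synctime_ms = 1000000"
(those are the current defaults), but measure before you tune.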
 — richard

>> 
>> Thanks in advance.
>> 
>>  - cks
>> PS: some links I've dug up in searches:
>>   
>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>>   https://smartos.org/bugview/OS-2415
>>   https://www.illumos.org/issues/1553
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Understanding OmniOS disk IO timeouts and options to control them

2017-01-04 Thread Richard Elling

> On Jan 4, 2017, at 10:04 AM, Chris Siebenmann  wrote:
> 
> We recently had a server reboot due to the ZFS vdev_deadman/spa_deadman
> timeout timer activating and panicing the system. If you haven't heard
> of this timer before, that's not surprising; triggering it requires an
> IO to a vdev to take more than 1000 seconds (by default; it's controlled
> by the value of zfs_deadman_synctime_ms, in spa_misc.c).
> 
> Before this happened, I would not have expected that our OmniOS system
> allowed an IO to run that long before timing it out and returning an
> error to ZFS. Clearly I'm wrong, which means that I'd like to understand
> what disk IO timeouts OmniOS has and where (or if) we can control them
> so that really long IOs get turned into forced errors well before 16
> minutes go by. Unfortunately our disk topology is a bit complicated;
> we have scsi_vhci multipathing on top of iSCSI disks.

Do not assume the timeout reflects properly operating software or firmware.
The original impetus for the deadman was to allow debugging of the underlying
stack. Prior to adding the deadman, the I/O could be stuck forever.

> 
> In some Internet searching I've found sd_io_time (60 seconds by
> default) and the default SD retry count of 5 (I think, it may be
> 3), which can be adjusted on a per-disk-type basis through the
> retries-timeout parameter (per the sd manpage). Searching the kernel
> code suggests that there are some hard-coded timeouts in the 30 to 90
> second range, which also doesn't seem excessive.

For sd-level, most commands follow the sd_io_time and retries. scsi_vhci adds
significant complexity above sd and below zfs.
 — richard

> 
> (I have a crash dump from this panic, so I can in theory use mdb
> to look through it to see just what level an IO appears stuck at
> if I know what to look for and how.)
> 
> Based on 'fmdump -eV' output, it looks like our server was
> retrying IO repeatedly.
> 
> Does anyone know what I should be looking at to find and adjust
> timeouts, retry counts, and so on? Is there any documentation that
> I'm overlooking?
> 
> Thanks in advance.
> 
>   - cks
> PS: some links I've dug up in searches:
>
> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
>https://smartos.org/bugview/OS-2415
>https://www.illumos.org/issues/1553
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Smartmontools - JBOD chassis

2016-12-15 Thread Richard Elling


> On Dec 15, 2016, at 11:47 AM, Dale Ghent  wrote:
> 
> 
>> On Dec 15, 2016, at 12:02 PM, Machine Man  wrote:
>> 
>> Can the smartmontools be used on OmniOS to collect JBOD enclosure status, 
>> FANs voltages etc. ?
> 
> SMARTmontools is a disk device monitoring tool that pulls SMART telemetry 
> from the actual disks themselves. What I think you're referring to is 
> something different, called SCSI Enclosure Services (SES), which to my 
> knowledge SMARTmontools doesn't really concern itself with.

fmtopo and sestopo allow you to interface with SES devices, but the UI is butt-ugly.
I'd recommend the sg3_utils and smp_utils combo instead, since they have a marginally
better UI and are available on multiple OSes.
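
A rough sketch of what that looks like in practice (the device paths here are
examples; your SES nodes will enumerate differently):

  # enclosure nodes known to FMA
  /usr/lib/fm/fmd/fmtopo | grep -i ses

  # sg3_utils: enclosure status page (fans, PSUs, temperature, slots)
  sg_ses --page=2 /dev/es/ses0

  # element descriptor page, to map element numbers to slot/fan names
  sg_ses --page=7 /dev/es/ses0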


  -- richard

> 
> There is a ses driver in OmniOS, however it's fairly dated and basic; and 
> there are no utilities which directly consume it in userland. I might be 
> mistaken, but the ses driver was mainly used back in the day by unbunbled Sun 
> storage products, such as software that interfaced with the StorEDGE line of 
> JBODs and RAID arrays.
> 
> Nexenta reportedly has a much improved ses driver with userland utility to 
> query enclosure status and manipulate service lights on drive slots, but they 
> haven't upstreamed this.
> 
> /dale
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Large infolog_hival file

2016-12-06 Thread Richard Elling

> On Dec 6, 2016, at 8:16 PM, Lawrence Giam  wrote:
> 
> Hi All,
> 
> Also to note that I have another running OpenIndiana 151a7 and this is also a 
> SuperMicro server, this has the same behaviour which is the system keep 
> generating the resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast message 
> and getting logged into infolog_hival but one thing different is that the 
> logadm is doing it's job of rotating the log.
> 
> On the system running OmniOS, I have compare the logadm.conf on both 
> OpenIndiana and OmniOS and there are identical but OmniOS is not rotating 
> this particular log. Is there any way I can check what is wrong? Why logadm 
> is not rotating the infolog_hival when the filesize reach greater than 10m ?

Under heavy writes, logadm won’t rotate the logs.

You might need to disable the port. We’ve seen cases where miswired SAS 
connections
that are not disabled by default in the firmware cause spurious interrupts. 
This can be done
with various tools, depending on which port is complaining. For example, in 
smp_utils, 
smp_phy_control allows you to send link control commands to expanders. However, 
the
only real cure is hardware/firmware.
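
For the rotation side you can at least force it by hand and watch what logadm does
with the entry you already have:

  logadm -n -p now /var/fm/fmd/infolog_hival    # dry run, show what would happen
  logadm -v -p now /var/fm/fmd/infolog_hival    # rotate it now

For quieting a phy, smp_phy_control from smp_utils is the usual hammer, something
along the lines of the following (expander device path and phy number are examples,
check how yours enumerates first):

  smp_phy_control --phy=2 --op=dis /dev/smp/expd0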
 — richard


> 
> This is the setting of logadm.conf from OpenIndiana 151a7:
> /var/fm/fmd/infolog_hival -N -A 2y -S 50m -s 10m -M '/usr/sbin/fmadm -q 
> rotate infolog_hival && mv /var/fm/fmd/infolog_hival.0- $nfile'
> 
> Looking into /var/logadm/timestamps
> # This file holds internal data for logadm(1M).
> # Do not edit.
> /var/log/syslog -P 'Sat Dec  3 19:10:00 2016'
> /var/adm/messages -P 'Wed Oct 26 19:10:00 2016'
> /var/cron/log -P 'Sat Jun 25 19:10:00 2016'
> /var/fm/fmd/infolog_hival -P 'Mon Dec  5 19:10:00 2016'
> /var/adm/wtmpx -P 'Wed Oct  5 19:10:00 2016'
> 
> 
> This is the setting of logadm.conf from OmniOS R151014:
> /var/fm/fmd/infolog_hival -N -A 2y -S 50m -s 10m -M '/usr/sbin/fmadm -q 
> rotate infolog_hival && mv /var/fm/fmd/infolog_hival.0- $nfile'
> 
> Looking into /var/logadm/timestamps
> # This file holds internal data for logadm(1M).
> # Do not edit.
> /var/adm/messages -P 'Tue Dec  6 19:10:00 2016'
> /var/cron/log -P 'Thu Nov 24 19:10:00 2016'
> /var/svc/log/system-idmap:default.log -P 'Tue Dec  8 19:10:00 2015'
> 
> Cron is set to run logadm everyday at 03:10 am.
> 
> Regards.
> 
> 
> On Tue, Nov 29, 2016 at 12:16 PM, Lawrence Giam  > wrote:
> Hi All,
> 
> I have a Supermicro box running OmniOS R151014 build 170cea2. 
> I am getting a lot of entries in infolog_hival about 
> resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast
> 
> lawrence@sgsan1n2:/var/fm/fmd$ fmdump -I
> Nov 29 08:45:33.0893 resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast
> Nov 29 08:45:33.0893 resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast
> Nov 29 08:45:33.0893 resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast
> 
> lawrence@sgsan1n2:/var/fm/fmd$ fmdump -IV
> Nov 29 2016 03:12:20.907712302 (absent)
> nvlist version: 0
> driver_instance = 0
> port_address = w500304800bfc5c02
> devfs_path = /pci@0,0/pci8086,340c@5/pci15d9,400@0
> PhyIdentifier = 0x2
> event_type = port_broadcast_ses
> class = resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast
> version = 0x0
> __ttl = 0x1
> __tod = 0x583c8194 0x361a972e
> 
> Nov 29 2016 03:12:20.907718074 (absent)
> nvlist version: 0
> driver_instance = 0
> port_address = w500304800bfc5c03
> devfs_path = /pci@0,0/pci8086,340c@5/pci15d9,400@0
> PhyIdentifier = 0x3
> event_type = port_broadcast_ses
> class = resource.sysevent.EC_hba.ESC_sas_hba_port_broadcast
> version = 0x0
> __ttl = 0x1
> __tod = 0x583c8194 0x361aadba
> 
> 
> This is a chassic with 2 motherboard sharing a single backbone, each 
> motherboard accessing it's own set of disks and we have a HA solution to 
> auto-mount the disks to the other motherboard should there be a problem with 
> the motherboard.
> 
> IS it possible to disable them?
> 
> Thanks & Regards.
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Increase default maximum NFS server threads?

2016-12-06 Thread Richard Elling

> On Dec 6, 2016, at 7:30 AM, Dan McDonald  wrote:
> 
> I got a link to this commit from the Delphix illumos repo a while back:
> 
>   https://github.com/openzfs/openzfs/pull/186/
> 
> I was curious if NFS-using people in the audience here would like to see this 
> one Just Land (TM) in illumos-omnios or not?

just land it, no real downside
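
For anyone who doesn't want to wait on the commit, the per-host ceiling can already
be raised with sharectl, e.g.

  sharectl get -p servers nfs
  sharectl set -p servers=1024 nfs

which takes effect the next time nfs/server is restarted.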
 — richard


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slices & zfs

2016-06-08 Thread Richard Elling

> On Jun 8, 2016, at 12:24 AM, Jim Klimov  wrote:
> 
> On 8 June 2016 at 7:00:50 CEST, Martijn Fennis wrote:
>> Hi,
>> 
>> 
>> Does someone have some info about setting up ZFS on top of slices ?
>> 
>> Having 10 drives which i would like to have the outer ring configured @
>> raid 10 for about 1 TB with dedup, the rest of the drives in Raid 50.
>> 
>> 
>> What i find on google is to use complete devs and difficult to
>> interpret slicing info..
>> 
>> 
>> 
>> Thanks,
>> Martijn
>> 
>> Sent from Mail for Windows 10
>> 
>> 
>> 
>> 
>> 
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> You just set up slices (with format utility on one disk, replicate setup with 
> prtvtoc to others if they are identical). Then you use slice numbers as vdevs 
> to zpool command. Note that slices s0 and s2 are reserved, so you have s1 and 
> s3-s7 to play with. (X86 may expose s8 and s9 - also reserved).

This is not correct. For SMI labels, s2, by convention, defaults to the whole
disk, aka the "backup" partition (slice). Despite the name, I know of no backup
software since around 1980 that actually expects s2 to cover the whole disk. It
was also an overlapping slice. For modern disks and EFI labels, overlapping
slices are not permitted, which prevents many of the common mistakes that led to
corrupted data on the old SMI-labeled drives.
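
If you do go down the slice route, the label cloning Jim mentions above is just the
classic SMI-label idiom (device names are examples):

  # copy the partition table from one disk to a second, identical disk
  prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2

With EFI labels you'd lay the slices out with "format -e" instead.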
 — richard


> 
> Also note that dedup may require some RAM and maybe a separate fast SSD to 
> cache reads and writes, to be efficient. Otherwise it can bog you down below 
> single mb/s of i/o.
> 
> Jim
> --
> Typos courtesy of K-9 Mail on my Samsung Android
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slices & zfs

2016-06-08 Thread Richard Elling

> On Jun 8, 2016, at 7:37 AM, Martijn Fennis  wrote:
> 
> Hi Dale and Jim,
>  
> Thanks for your time.
>  
> The reason is to store VMs on the Quick-pool and downloads on the Slow-pool.
>  
> I would personally not assign the Slow-pool any of my memory, perhaps 
> meta-data.
>  
> That’s why i would like to assign the inner part to the slow-pool.
>  
>  
>  
> Also i’ve read about a maximum of 2 TB per slice, anyone?

The 2 TB limit applies to SMI labels; use EFI labels for modern disks.
 
>  
>  
> But maybe i should not do the slicing at all?

You cannot not slice. The only question is whether ZFS creates an EFI label and
slice 0 for you, or you do that yourself using "format -e". This is purely a
convenience feature; there is no other science or strategy behind it.
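
A minimal sketch of the two-pool layout described above, assuming "format -e" has
already been used to put an outer slice 0 and an inner slice 1 on each disk (device
names are placeholders, and the usual caveat about two pools competing for the same
spindles applies):

  # fast pool on the outer slices, mirrored pairs
  zpool create quick mirror c1t0d0s0 c1t1d0s0 mirror c1t2d0s0 c1t3d0s0 \
                     mirror c1t4d0s0 c1t5d0s0 mirror c1t6d0s0 c1t7d0s0 \
                     mirror c1t8d0s0 c1t9d0s0

  # capacity pool on the inner slices, striped raidz vdevs (the "raid 50" analogue)
  zpool create slow  raidz c1t0d0s1 c1t1d0s1 c1t2d0s1 c1t3d0s1 c1t4d0s1 \
                     raidz c1t5d0s1 c1t6d0s1 c1t7d0s1 c1t8d0s1 c1t9d0s1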
 — richard

>  
>  
> Thanks in advance
>  
>  
> From: Dale Ghent 
> Sent: Wednesday, 8 June 2016 08:21
> To: Martijn Fennis 
> CC: omnios-discuss@lists.omniti.com 
> Subject: Re: [OmniOS-discuss] slices & zfs
>  
> 
> > On Jun 8, 2016, at 1:00 AM, Martijn Fennis  > > wrote:
> > 
> > Hi,
> > 
> > 
> > Does someone have some info about setting up ZFS on top of slices ?
> > 
> > Having 10 drives which i would like to have the outer ring configured @ 
> > raid 10 for about 1 TB with dedup, the rest of the drives in Raid 50.
> > 
> > 
> > What i find on google is to use complete devs and difficult to interpret 
> > slicing info..
> 
> ZFS is designed to operate with full control of the drive - this means no 
> slicing. Yes, one can use ZFS with slices, but you must first understand why 
> that is not optimal, and what you give up when such a configuration is used.
> 
> When consuming a partition on a disk, ZFS can no-longer assume that it has 
> complete control over the entire disk and cannot enable (and proactively 
> manage) the disk's own write caching capabilities. This will incur a 
> performance penalty, the magnitude of which depends on your specific IO 
> patterns.
> 
> I am curious why you think you need to slice up your drives like that in such 
> a scheme... the mixing of RAID levels across the same devices seem unwieldy 
> given that ZFS was designed to avoid this kind of scenario, and ZFS will 
> compete with itself because it now is operating 2 pools across the same 
> physical devices, which is absolutely terrible for IO scheduling.
> 
> /dale
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] fan control

2016-05-24 Thread Richard Elling

> On May 24, 2016, at 4:37 PM, Robert Fantini  wrote:
> 
> I'll check that the next time I'm at the server room, which will not be for a 
> week. 
> Will check then.

yeah, there is a weird interaction whereby power-on-boot goes to full fans, but 
a 
reboot goes to normal fans. Surely there is software somewhere that controls 
this,
but I haven't found the time to track it down.
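
If ipmitool is available (either on the box or pointed at the BMC from elsewhere),
you can at least watch what the fans and sensors are doing:

  ipmitool sdr type Fan
  ipmitool sensor | grep -i fan

The fan policy itself on Supermicro boards generally lives in the BMC ("fan mode")
rather than in the OS.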
 — richard

> 
> Thanks for the fast response!
> 
> On Tue, May 24, 2016 at 7:33 PM, Richard Elling 
> mailto:richard.ell...@richardelling.com>> 
> wrote:
> 
>> On May 24, 2016, at 4:28 PM, Robert Fantini > <mailto:robertfant...@gmail.com>> wrote:
>> 
>> we've installed napp-it to a couple of systems .   they run a lot louder - 
>> fans do not seem to be controlled from software. 
>> 
>> is there a software package that that can be installed to handle sensors and 
>> fans?
>> 
>> Or is this something to deal with in Supermicro bios?
> 
> Do the fans slow after a warm reboot?
>  — richard
> 
>> 
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com <mailto:OmniOS-discuss@lists.omniti.com>
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
>> <http://lists.omniti.com/mailman/listinfo/omnios-discuss>
> 
> --
> 
> richard.ell...@richardelling.com <mailto:richard.ell...@richardelling.com>
> +1-760-896-4422 
> 
> 
> 
> 

--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] fan control

2016-05-24 Thread Richard Elling

> On May 24, 2016, at 4:28 PM, Robert Fantini  wrote:
> 
> we've installed napp-it to a couple of systems .   they run a lot louder - 
> fans do not seem to be controlled from software. 
> 
> is there a software package that that can be installed to handle sensors and 
> fans?
> 
> Or is this something to deal with in Supermicro bios?

Do the fans slow after a warm reboot?
 — richard

> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] HP Gen9, H241 HBA

2016-05-21 Thread Richard Elling
We've been working on an H241 JBOD driver. We have it talking to drives in a
variety of enclosures and have put several weeks of constant load on it. The
current cpqary3 driver is not suitable, even if it attaches.

Let me know if you want to become involved and have enough time in your 
schedule to wait for it to be fully baked. Otherwise, HP does sell a rebranded 
LSI 23xx-based HBA that works fine OOB today.

  -- richard



> On May 21, 2016, at 5:29 AM, Sebastian Gabler  wrote:
> 
> Hi,
> I am pondering to set up a new server. I am looking into a Gen9 HP machine 
> (Options include inexpensive DL20 to DL360 using more RAM), using a H241 HBA 
> connecting to a SuperMicro Jbod (dual 12g expanders).
> Any opinions? Anything I should be aware of besides the Broadcom 1G NIC 
> issues?
> 
> TIA,
> 
> Sebastian
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Similar tools like flowadm

2016-05-05 Thread Richard Elling

> On May 3, 2016, at 7:57 PM, Ergi Thanasko  wrote:
> 
> Hi Dan,
> Yes is it a zfs pool shared over NFS. Yup going through the rabbit whole, but 
> I can wait for a while I have patience. Any help is appreciated thank you 

The most direct approach is to use multiple IP addresses: one per pool. Then 
you have a destination 
address for the flowadm tuple. 
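
A rough sketch of that, with made-up names and addresses -- one extra address for
the pool, then a flow keyed on it:

  # give pool1's NFS traffic its own address on the existing link
  ipadm create-addr -T static -a 192.168.10.11/24 ixgbe0/pool1

  # match traffic to/from that address and cap it
  flowadm add-flow -l ixgbe0 -a local_ip=192.168.10.11 pool1-flow
  flowadm set-flowprop -p maxbw=2G pool1-flow

Clients then mount that pool's shares via 192.168.10.11 and the cap applies to that
pool only.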
 — richard

> 
> Sent from my iPhone
> 
>> On May 3, 2016, at 7:10 PM, Dan McDonald  wrote:
>> 
>> 
>>> On May 3, 2016, at 7:54 PM, Ergi Thanasko  wrote:
>>> 
>>> Hi guys,
>>> Is there any tools like flowadm that can control   bandwidth usage on a per 
>>> pool basis? instead of host ip. Also flowadm will use both in/out summary 
>>> to  limit bandwidth. We are looking for something deeper that we can 
>>> seperate control on incoming or outgoing or traffic at a pool level .
>> 
>> flowadm only controls network abstractions.  When you say "pool", do you 
>> mean zfs pool?  If so, does that mean with NFS, or iSCSI, or SMB, or some 
>> other file-sharer TBD?!?  You're going down a rabbit hole you can't get back 
>> out of quickly.
>> 
>> Sorry,
>> Dan
>> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow scrub on SSD-only pool

2016-04-22 Thread Richard Elling

> On Apr 22, 2016, at 10:28 AM, Dan McDonald  wrote:
> 
> 
>> On Apr 22, 2016, at 1:13 PM, Richard Elling 
>>  wrote:
>> 
>> If you're running Solaris 11 or pre-2015 OmniOS, then the old write throttle 
>> is impossible
>> to control and you'll chase your tail trying to balance scrubs/resilvers 
>> against any other
>> workload. From a control theory perspective, it is unstable.
> 
> pre-2015 can be clarified a bit:  r151014 and later has the modern ZFS write 
> throttle.  Now I know that Stephen is running later versions of OmniOS, so 
> you can be guaranteed it's the modern write-throttle.
> 
> Furthermore, anyone running any OmniOS EARLIER than r151014 is not 
> supportable, and any pre-014 release is not supported.

Thanks for the clarification Dan!
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow scrub on SSD-only pool

2016-04-22 Thread Richard Elling

> On Apr 22, 2016, at 5:00 AM, Stephan Budach  wrote:
> 
> Am 21.04.16 um 18:36 schrieb Richard Elling:
>>> On Apr 21, 2016, at 7:47 AM, Chris Siebenmann  wrote:
>>> 
>>> [About ZFS scrub tunables:]
>>>> Interesting read - and it surely works. If you set the tunable before
>>>> you start the scrub you can immediately see the thoughput being much
>>>> higher than with the standard setting. [...]
>>> It's perhaps worth noting here that the scrub rate shown in 'zpool
>>> status' is a cumulative one, ie the average scrub rate since the scrub
>>> started. As far as I know the only way to get the current scrub rate is
>>> run 'zpool status' twice with some time in between and then look at how
>>> much progress the scrub's made during that time.
>> Scrub rate measured in IOPS or bandwidth is not useful. Neither is a 
>> reflection
>> of the work being performed in ZFS nor the drives.
>> 
>>> As such, increasing the scrub speed in the middle of what had been a
>>> slow scrub up to that point probably won't make a massive or immediate
>>> difference in the reported scrub rate. You should see it rising over
>>> time, especially if you drastically speeded it up, but it's not any sort
>>> of instant jump.
>>> 
>>> (You can always monitor iostat, but that mixes in other pool IO. There's
>>> probably something clever that can be done with DTrace.)
>> I've got some dtrace that will show progress. However, it is only marginally
>> useful when you've got multiple datasets.
>> 
>>> This may already be obvious and well known to people, but I figured
>>> I'd mention it just in case.
>> People fret about scrubs and resilvers, when they really shouldn't. In ZFS
>> accessing data also checks and does recovery, so anything they regularly
>> access will be unaffected by the subsequent scan. Over the years, I've tried
>> several ways to approach teaching people about failures and scrubs/resilvers,
>> but with limited success: some people just like to be afraid... Hollywood 
>> makes
>> a lot of money on them :-)
>>  -- richard
>> 
>> 
> No… not afraid, but I actually do think, that I can judge whether or not I 
> want to speed scrubs up and trade in some performance for that. As long as I 
> can do that, I am fine with it. And the same applies for resilvers, I guess.

For current OmniOS the priority scheduler can be adjusted using mdb to change
the priority for scrubs vs other types of I/O. There is no userland interface. 
See Adam's
blog for more details.
http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/ 
<http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/>
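
Concretely, the knobs Adam describes are plain kernel variables, so something like
this (the value is illustrative; /etc/system can make it persistent):

  # current scrub I/O scheduler limits
  echo "zfs_vdev_scrub_min_active/D" | mdb -k
  echo "zfs_vdev_scrub_max_active/D" | mdb -k

  # let scrub issue more concurrent I/Os per vdev
  echo "zfs_vdev_scrub_max_active/W0t10" | mdb -kw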

If you're running Solaris 11 or pre-2015 OmniOS, then the old write throttle is 
impossible
to control and you'll chase your tail trying to balance scrubs/resilvers 
against any other
workload. From a control theory perspective, it is unstable.

> If you need to resilver one half of a mirrored zpool, most people will want 
> that to run as fast as feasible, don't they?

It depends. I've had customers on both sides of the fence and one customer for 
whom we
cron'ed the priority changes to match their peak. Suffice to say, nobody seems 
to want 
resilvers to dominate real work.
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow scrub on SSD-only pool

2016-04-21 Thread Richard Elling

> On Apr 21, 2016, at 7:47 AM, Chris Siebenmann  wrote:
> 
> [About ZFS scrub tunables:]
>> Interesting read - and it surely works. If you set the tunable before
>> you start the scrub you can immediately see the thoughput being much
>> higher than with the standard setting. [...]
> 
> It's perhaps worth noting here that the scrub rate shown in 'zpool
> status' is a cumulative one, ie the average scrub rate since the scrub
> started. As far as I know the only way to get the current scrub rate is
> run 'zpool status' twice with some time in between and then look at how
> much progress the scrub's made during that time.

Scrub rate measured in IOPS or bandwidth is not useful. Neither is a reflection
of the work being performed in ZFS nor the drives.

> 
> As such, increasing the scrub speed in the middle of what had been a
> slow scrub up to that point probably won't make a massive or immediate
> difference in the reported scrub rate. You should see it rising over
> time, especially if you drastically speeded it up, but it's not any sort
> of instant jump.
> 
> (You can always monitor iostat, but that mixes in other pool IO. There's
> probably something clever that can be done with DTrace.)

I've got some dtrace that will show progress. However, it is only marginally
useful when you've got multiple datasets.

> 
> This may already be obvious and well known to people, but I figured
> I'd mention it just in case.

People fret about scrubs and resilvers, when they really shouldn't. In ZFS
accessing data also checks and does recovery, so anything they regularly
access will be unaffected by the subsequent scan. Over the years, I've tried
several ways to approach teaching people about failures and scrubs/resilvers,
but with limited success: some people just like to be afraid... Hollywood makes
a lot of money on them :-)
 -- richard


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [developer] NVMe Performance

2016-04-16 Thread Richard Elling

> On Apr 15, 2016, at 7:49 PM, Richard Yao  wrote:
> 
> On 04/15/2016 10:24 PM, Josh Coombs wrote:
>> On Fri, Apr 15, 2016 at 9:26 PM, Richard Yao  wrote:
>> 
>>> 
>>> The first is to make sure that ZFS uses proper alignment on the device.
>>> According to what I learned via Google searches, the Intel DC P3600
>>> supports both 512-byte sectors and 4096-byte sectors, but is low leveled
>>> formatted to 512-byte sectors by default. You could run fio to see how the
>>> random IO performance differs on 512-byte IOs at 512-byte formatting vs 4KB
>>> IOs at 4KB formatting, but I expect that you will find it performs best in
>>> the 4KB case like Intel's enterprise SATA SSDs do. If the 512-byte random
>>> IO performance was notable, Intel would have advertised it, but they did
>>> not do that:
>>> 
>>> 
>>> http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-p3600-spec.pdf
>>> 
>>> http://www.cadalyst.com/%5Blevel-1-with-primary-path%5D/how-configure-oracle-redo-intel-pcie-ssd-dc-p3700-23534
>>> 
>> So, I played around with this.  Intel's isdct tool will let you secure
>> erase the P3600 and set it up as a 4k sector device, or a 512, with a few
>> other options as well.  I have to re-look but it might support 8k sectors
>> too.  Unfortunately the NVMe driver doesn't play well with the SSD
>> formatted for anything other than 512 byte sectors.  I noted my findings in
>> Illumos bug #6912.
> 
> The documentation does not say that it will do 8192-byte sectors,
> although ZFS in theory should be okay with them. My tests on the Intel
> DC S3700 suggested that 4KB vs 8KB was too close to tell. I recall
> deciding that Intel did a good enough job at 4KB that it should go into
> ZoL's quirks list as a 4KB drive.

ZIL traffic is all 4K, unless phys blocksize is larger. There are a number of 
Flash SSDs
that prefer 8k, and you can tell by the “optimal transfer size.” Since the bulk 
of the market
driving SSD sales is running NTFS, 4K is the market sweet spot.
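
If you want to see where a candidate slog device sits, the workload to mimic is
small synchronous queue-depth-1 writes. A hedged fio sketch (the device path is an
example and this will destroy data on it; on illumos you may need to adjust the
ioengine, e.g. solarisaio, or drop --direct):

  fio --name=slog --filename=/dev/rdsk/c2t1d0p0 --rw=randwrite --bs=4k \
      --iodepth=1 --sync=1 --direct=1 --runtime=60 --time_based

Run it after the secure-erase/over-provisioning step discussed in this thread so
you're measuring the state you'll actually deploy.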

> 
> The P3600 is probably similar because its NAND flash controller "is an
> evolution of the design used in the S3700/S3500":
> 
> http://www.anandtech.com/show/8104/intel-ssd-dc-p3700-review-the-pcie-ssd-transition-begins-with-nvme
>  
> 
> 
>> I need to look at how Illumos partitions the devices if you just feed zpool
>> the device rather than a partition, I didn't look to see if it was aligning
>> things correctly or not on it's own.
> 
> It will put the first partition at a 1MB boundary and set an internal
> alignment shift consistent with what the hardware reports.
> 
>> The second is that it is possible to increase IOPS beyond Intel's
>>> specifications by doing a secure erase, giving SLOG a tiny 4KB aligned
>>> partition and leaving the rest of the device unused. Intel's numbers are
>>> for steady state performance where almost every flash page is dirty. If you
>>> leave a significant number of pages clean (i.e. unused following a secure
>>> erase), the drive should perform better than what Intel claims by virtue of
>>> the internal book keeping and garbage collection having to do less. 
>>> Anandtech
>>> has benchmarks numbers showing this effect on older consumer SSDs on
>>> Windows in a comparison with the Intel DC S3700:
>>> 
>> 
>> Using isdct I have mine set to 50% over-provisioning, so they show up as
>> 200GB devices now.  As noted in bug 6912 you have to secure erase after
>> changing that setting or the NVMe driver REALLY gets unhappy.
> 
> If you are using it as a SLOG, you would probably want something like
> 98% overprovisioning to match the ZeusRAM, which was designed for use as
> a ZFS SLOG device and was very well regarded until it was discontinued:
> 
> https://www.hgst.com/sites/default/files/resources/[FAQ]_ZeusRAM_FQ008-EN-US.pdf
>  
> 

ZeusRAM was great for its time, but the 12G replacements perform similarly. The
biggest difference between ZeusRAMs and Flash SSDs seems to be in the garbage
collection. In my testing, low DWPD drives have less consistent performance as 
the
garbage collection is less optimized. For the 3 DWPD drives we’ve tested, the 
performance
for slog workloads is more consistent than it is for the 1 DWPD drives.

> 
> ZFS generally does not need much more from a SLOG device. The way to
> ensure that you do not overprovision more/less than ZFS is willing to
> use on your system would be to look at zfs_dirty_data_max
> 
> That being said, you likely will want to run fio random IO benchmarks at
> different overprovisioning levels after a secure erase and a dd
> if=/dev/urandom of=/path/to/device so you can see the difference in
> performance yourself. Happy benchmarking. :)

/dev/urandom is too (intentionally) slow. You’ll bottleneck there.

Richard’s advice is good: test with random workloads. Contrary to p

Re: [OmniOS-discuss] Optimal Server Configuration or Disk Controller Recommendations

2016-04-10 Thread Richard Elling

> On Apr 10, 2016, at 3:05 AM, Guenther Alka  wrote:
> 
> Check this prebuild machine as a reference
> http://www.supermicro.com/products/system/2U/2028/SSG-2028R-ACR24L.cfm
> 
> it comes with 3 x Avago 3008 HBAs in IT mode and 2 X 10 GbE Ethernet

FYI, Avago bought Broadcom, then changed their name to Broadcom.
Qlogic and Marvell are seeking acquirers. The number of suppliers of chipsets 
will continue to dwindle.
 — richard

> This allows using Sata disks instead of SAS that you should use with Expander 
> based solutions.
> 
> For database use, I would insist on SSDs like Intel S3610/S3710 or Samsung 
> SM/PM 863 due their reliability with powerloss protection. Decide based on 
> needs of write iops under load.Due the high iops (up to 50k per SSD under 
> constant load compared to 100 iops of a spindle), you do not need mirrors but 
> can use a more cost effective n x Raid-Z2 config with 6 SSD per vdev.Add as 
> much RAM as needed. (ex 128 GB up, the board can hold up to 2TB ECC what is 
> more than your database). Use any Xeons, prefer frequency over number of 
> cores.
> 
> With very fast SSDs in a pool, you can skip the extra Slog for sync write 
> unless you use an NVMe like an Intel P3700 that is much faster than the Sata 
> SSD.
> 
> 
> Gea
> 
> 
> Am 06.04.2016 um 19:23 schrieb Josh Barton:
>> I am hoping to build a server with 15+ drives managed in ZFS that will run a 
>> 1TB+ postgres database.
>> 
>> 
>> [NEEDS]
>> 
>> 15-20 drive bays (ideally SFF/2.5")
>> 
>> JBOD disk controller ideal for running ZFS
>> 
>> 
>> In the past I purchased an HP Proliant DL380P Gen 8 but it had a "Smart 
>> Array Controller" that prevented us from hot swapping drives. It also seemed 
>> to haveless IO channels than might be good when running a box with 16 drives.
>> 
>> 
>> What would you recommend for  either a complete server configuration or at 
>> the very least  a disk controller for running this environment?
>> 
>> 
>> Thank you,
>> 
>> 
>> Josh Barton
>> 
>> Systems Admin/Developer
>> 
>> USU Research Foundation
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Good way to debug DTrace invalid address errors?

2016-03-24 Thread Richard Elling
a few pointers...

> On Mar 23, 2016, at 12:48 PM, Chris Siebenmann  wrote:
> 
> I have a relatively complicated chunk of dtrace code that reads kernel
> data structures and chases pointers through them. Some of the time it
> spits out 'invalid address' errors during execution, for example:
> 
>   dtrace: error on enabled probe ID 8 (ID 75313: 
> fbt:nfssrv:nfs3_fhtovp:return): invalid address (0x2e8) in action #6 at DIF 
> offset 40
>   dtrace: error on enabled probe ID 8 (ID 75313: 
> fbt:nfssrv:nfs3_fhtovp:return): invalid address (0xbaddcafebaddcc7e) in 
> action #6 at DIF offset 60
> 
> I'd like to find out exactly what pointer dereferences or other
> operations are failing here, so that I can figure out how to work
> around the issue. However, I have no solid idea how to map things
> like 'probe ID 8' and 'DIF offset 60' to particular lines in my
> DTrace source code.
> 
> I assume that the answer to this involves reading DIF (the DTrace
> intermediate form). I've looked at 'dtrace -Se' output from this
> DTrace script, but I can't identify the spot I need to look at.
> In particular, as far as I can see nothing in the output has
> instructions with an offset as high as 40 or 60.
> 
> I can flail around sticking guards in and varying how I do stuff
> to make the errors go away, but I'd like to understand how to debug
> this sort of stuff so I can have more confidence in my changes.
> 
> Thanks in advance for any suggestions, and if people want to see
> the actual code involved it is this DTrace script:
> 
>   https://github.com/siebenmann/cks-dtrace/blob/master/nfs3-long.d

We do something similar; however, we differ in how we arrive at the pool and
dataset. Without knowing your conventions, it is not really possible to make
concrete recommendations, but here is how we do it:
+ each share is unique
+ each share has a UUID
+ mount point contains the share UUID
+ each UUID can be mapped to a pool and dataset
+ all permutations of (operation, client, share) are collected, including "all"
+ today, all of this data is pumped into collectd --> influxdb for queries

With newer illumos distros, there are Riemann-sum kstats for each NFS version and
operation, broken down by share. This might or might not help. We find the averages
to be less insightful than the distribution.

Also, for client IP address, here's what we do:
self->remote = (xlate  (args[3])).ci_remote;
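
On the original "invalid address" errors: the usual defensive pattern when chasing
pointers out of fbt probes is to park each pointer in a clause-local variable and
predicate the next dereference on it, so a NULL or recycled pointer just drops that
record instead of aborting the action. A generic sketch of the shape (adapt the
types and members to the chain you actually walk):

  fbt:nfssrv:nfs3_fhtovp:return
  {
          /* arg1 is the returned pointer; stash it before touching it */
          this->vp = (vnode_t *)arg1;
  }

  fbt:nfssrv:nfs3_fhtovp:return
  /this->vp != NULL/
  {
          /* only dereference once we know it is non-NULL */
          this->vfsp = this->vp->v_vfsp;
  }

The 0xbaddcafe... value in your output looks like the kmem "uninitialized" debug
pattern, i.e. one of the structures is being read outside its useful lifetime
rather than your offsets being wrong.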
 -- richard

> 
> (Look at line 144 for the specific dtrace probe that is probably
> failing, since it's the only probe on fbt:nfssrv:nfs3_fhtovp:return.)
> 
>   - cks
> PS: it's entirely possible that there's a better way to do what I'm
>trying here, too. These DTrace scripts were originally written
>on Solaris 10 update 8 and haven't been drastically revised
>since.
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] 4kn or 512e with ashift=12

2016-03-24 Thread Richard Elling

> On Mar 23, 2016, at 6:37 PM, Bob Friesenhahn  
> wrote:
> 
> On Wed, 23 Mar 2016, Richard Elling wrote:
> 
>> 
>>> On Mar 23, 2016, at 7:49 AM, Richard Jahnel  wrote:
>>> 
>>> It should be noted that using a 512e disk as a 512n disk subjects you to a 
>>> significant risk of silent corruption in the event of power loss. Because 
>>> 512e disks does a read>modify>write operation to modify 512byte chunk of a 
>>> 4k sector, zfs won't know about the other 7 corrupted 512e sectors in the 
>>> event of a power loss during a write operation. So when discards the 
>>> incomplete txg on reboot, it won't do anything about the other 7 512e 
>>> sectors it doesn't know were affected.
>> 
>> Disagree. The risk is no greater than HDDs today with their volatile write 
>> caches.
> 
> If the data unrelated to the current transaction group is read and then 
> partially modifed (possibly with data corruption due to loss of power during 
> write), this would seem to be worse than loss due to a volatile write cache 
> (assuming drives which observe cache sync requests) since data unrelated to 
> the current transaction group may have been modified.  The end result would 
> be checksum errors during a scrub.

The old data is not modified. This is not read-destroy-modify-write.
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] 4kn or 512e with ashift=12

2016-03-23 Thread Richard Elling

> On Mar 23, 2016, at 7:36 AM, Chris Siebenmann  wrote:
> 
>>> The sd.conf whitelist also requires a reboot to activate if you need
>>> to add a new entry, as far as I know.
>>> 
>>>(Nor do I know what happens if you have some 512n disks and
>>>some 512e disks, both correctly recognized and in different
>>>pools, and now you need to replace a 512n disk with a spare 512e
>>>disk so you change sd.conf to claim that all of the 512e disks
>>>are 512n. I'd like to think that ZFS will carry on as normal,
>>>but I'm not sure.  This makes it somewhat dangerous to change
>>>sd.conf on a live system.)
>> 
>> There are two cases if we don't use the remedy (whitelist in illumos
>> or -o ashift in ZoL) here:
>> a): 512n <---> 512e. This replacement should work *in theory* if the
>> lie works *correctly*.
> 
> This will not work without the sd.conf workaround in Illumos.
> 
> All 512e disks that I know of correctly report their actual physical
> disk size to Illumos (and to Linux/ZoL). When a disk reports a 4K
> physical sector size, ZFS will refuse to allow it into an ashift=9
> vdev *regardless* of the fact that it is 512e and will accept reads
> and writes in 512-byte sectors.
> 
> In Illumos, you can use sd.conf to lie to the system and claim that
> this is not a 512e but a 512n disk (ie, it has a 512 byte physical
> sector size). I don't believe there's an equivalent on ZoL, but I
> haven't looked.
> 
> This absolute insistence on ZFS's part is what makes ashift=9 vdevs so
> dangerous today, because you cannot replace existing 512n disks in them
> with 512e disks without (significant) hackery.
> 
> (Perhaps I'm misunderstanding what people mean by '512e' here; I've
> been assuming it means a disk which reports 512 byte logical sectors and
> 4k physical sectors. Such disks are what you commonly get today.)

Yes. 512e means:
 un_phy_blocksize = 4096 (or 8192)
 un_tgt_blocksize = 512
for disks that don't lie. Lying disks claim un_phy_blocksize = 512 when it 
isn't.
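
A quick way to see what sd decided for the attached devices (one long mdb pipeline,
filtering out the empty soft-state slots):

  echo "::walk sd_state | ::grep '.!=0' | ::print -t struct sd_lun un_tgt_blocksize un_phy_blocksize" | mdb -k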

At this point, before the discussion degenerates further, remember that George
covered this in detail at the OpenZFS conference and in his blog.
http://blog.delphix.com/gwilson/2012/11/15/4k-sectors-and-zfs/ 


http://www.youtube.com/watch?v=TmH3iRLhZ-A&feature=youtu.be


 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] 4kn or 512e with ashift=12

2016-03-23 Thread Richard Elling

> On Mar 23, 2016, at 7:49 AM, Richard Jahnel  wrote:
> 
> It should be noted that using a 512e disk as a 512n disk subjects you to a 
> significant risk of silent corruption in the event of power loss. Because 
> 512e disks does a read>modify>write operation to modify 512byte chunk of a 4k 
> sector, zfs won't know about the other 7 corrupted 512e sectors in the event 
> of a power loss during a write operation. So when discards the incomplete txg 
> on reboot, it won't do anything about the other 7 512e sectors it doesn't 
> know were affected.

Disagree. The risk is no greater than HDDs today with their volatile write 
caches.
 -- richard

> 
> Richard Jahnel
> Network Engineer
> On-Site.com | Ellipse Design
> 866-266-7483 ext. 4408
> Direct: 669-800-6270
> 
> 
> -Original Message-
> From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On 
> Behalf Of Chris Siebenmann
> Sent: Wednesday, March 23, 2016 9:36 AM
> To: Fred Liu 
> Cc: omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] 4kn or 512e with ashift=12
> 
>>> The sd.conf whitelist also requires a reboot to activate if you need
>>> to add a new entry, as far as I know.
>>> 
>>>(Nor do I know what happens if you have some 512n disks and
>>>some 512e disks, both correctly recognized and in different
>>>pools, and now you need to replace a 512n disk with a spare 512e
>>>disk so you change sd.conf to claim that all of the 512e disks
>>>are 512n. I'd like to think that ZFS will carry on as normal,
>>>but I'm not sure.  This makes it somewhat dangerous to change
>>>sd.conf on a live system.)
>> 
>> There are two cases if we don't use the remedy (whitelist in illumos 
>> or -o ashift in ZoL) here:
>> a): 512n <---> 512e. This replacement should work *in theory* if the 
>> lie works *correctly*.
> 
> This will not work without the sd.conf workaround in Illumos.
> 
> All 512e disks that I know of correctly report their actual physical disk 
> size to Illumos (and to Linux/ZoL). When a disk reports a 4K physical sector 
> size, ZFS will refuse to allow it into an ashift=9 vdev *regardless* of the 
> fact that it is 512e and will accept reads and writes in 512-byte sectors.
> 
> In Illumos, you can use sd.conf to lie to the system and claim that this is 
> not a 512e but a 512n disk (ie, it has a 512 byte physical sector size). I 
> don't believe there's an equivalent on ZoL, but I haven't looked.
> 
> This absolute insistence on ZFS's part is what makes ashift=9 vdevs so 
> dangerous today, because you cannot replace existing 512n disks in them with 
> 512e disks without (significant) hackery.
> 
> (Perhaps I'm misunderstanding what people mean by '512e' here; I've been 
> assuming it means a disk which reports 512 byte logical sectors and 4k 
> physical sectors. Such disks are what you commonly get today.)
> 
>   - cks
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] 4kn or 512e with ashift=12

2016-03-22 Thread Richard Elling

> On Mar 22, 2016, at 7:41 AM, Chris Siebenmann  wrote:
> 
>>> This implicitly assumes that the only reason to set ashift=12 is
>>> if you are currently using one or more drives that require it. I
>>> strongly disagree with this view. Since ZFS cannot currently replace
>>> a 512n drive with a 512e one, I feel [...]
>> 
>> *In theory* this replacement should work well if the lie works *correctly*.
>> In ZoL, for the "-o ashift" is supported in "zpool replace", the
>> replacement should also work in mixed sector sizes.
>> And in illumos the whitelist will do the same.
>> What errors have you ever seen?
> 
> We have seen devices that changed between (claimed) 512n and
> (claimed) 512e/4k *within the same model number*; the only thing that
> distinguished the two was firmware version (which is not something that
> you can match in sd.conf). This came as a complete surprise to us the
> first time we needed to replace an old (512n) one of these with a new
> (512e) one.
> 
> The sd.conf whitelist also requires a reboot to activate if you need
> to add a new entry, as far as I know.
> 
> (Nor do I know what happens if you have some 512n disks and some
> 512e disks, both correctly recognized and in different pools, and
> now you need to replace a 512n disk with a spare 512e disk so you
> change sd.conf to claim that all of the 512e disks are 512n. I'd
> like to think that ZFS will carry on as normal, but I'm not sure.
> This makes it somewhat dangerous to change sd.conf on a live system.)

What is missing from
http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks 

is:

1. how to change the un_phy_blocksize for any or all uns
2. how to set a default setting for all drives in sd.conf by setting attributes 
to
the "" of ""  (see sd(7d))

I am aware of no new HDDs with 512n, so this problem will go away for HDDs.
However, there are many SSDs that work better with un_phy_blocksize = 8192
and some vendors set sd.conf or source appropriately.
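
For reference, the per-model override that the wiki does document goes in sd.conf
(/etc/driver/drv/sd.conf on current releases) and looks like this, with the inquiry
vendor ID padded to 8 characters and VENDOR/PRODUCT as placeholders:

  sd-config-list = "VENDOR  PRODUCT", "physical-block-size:4096";

As Chris notes above, picking up a new entry reliably still takes a reboot.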
 -- richard

> 
>>> For many usage cases, somewhat more space usage and perhaps
>>> somewhat slower pools are vastly preferable to a loss of pool
>>> redundancy over time. I feel that OmniOS should at least give you
>>> the option here (in a less crude way than simply telling it that
>>> absolutely all of your drives are 4k drives, partly because such
>>> general lies are problematic in various situations).
>> 
>> The whitelist (sd.conf) should fit into this consideration. But not
>> sure how mixed sector sizes impact the performance.
> 
> Oh, 512e disks in a 512n pool will probably have not great performance.
> ZFS does a lot of unaligned reads and writes, unlike other filesystems;
> if you say your disks are 512n, it really believes you and behaves
> accordingly.
> 
>   - cks

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] 4kn or 512e with ashift=12

2016-03-21 Thread Richard Elling

> On Mar 21, 2016, at 12:00 PM, Richard Jahnel  wrote:
> 
> Both approaches have their error points.
> 
> FWIW I would very very much like to be able to force my new pools into 
> ashift=12. It would make drive purchasing and replacement a lot more straight 
> forward and future resistant.

The fundamental problem is that ashift is a vdev-level setting, not a pool-level
setting. Today, it is very common for pools to have multiple ashifts active.
Recently, per-vdev ZAP objects have been proposed and that code is working its
way through the review and integration process.

Meanwhile, use one or more of the dozens of methods documented for doing this.

FWIW, most people with HDDs want space efficiency, because HDDs lost the 
performance race to
SSDs long ago. In general, forcing everything to 4k reduces space efficiency, 
so it is unlikely to be
a good default.
 -- richard
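
To see what each top-level vdev actually ended up with, something like the
following works (pool and device names are examples):

    zdb -C tank | grep ashift                 # cached config, one ashift per top-level vdev
    zdb -l /dev/rdsk/c0t0d0s0 | grep ashift   # or read it straight off a member disk's label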

> 
> Regards
> 
> Richard Jahnel
> Network Engineer
> On-Site.com | Ellipse Design
> 866-266-7483 ext. 4408
> Direct: 669-800-6270
> 
> -Original Message-
> From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On 
> Behalf Of Richard Elling
> Sent: Monday, March 21, 2016 1:54 PM
> To: Jim Klimov 
> Cc: omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] 4kn or 512e with ashift=12
> 
> 
>> On Mar 21, 2016, at 11:11 AM, Jim Klimov  wrote:
>> 
>> 21 марта 2016 г. 10:02:03 CET, Hanno Hirschberger 
>>  пишет:
>>> On 21.03.2016 08:00, Fred Liu wrote:
>>>> So that means illumos can handle 512n and 4kn automatically and
>>> properly?
>>> 
>>> Not necessarily as far as I know. Sometime drives are emulating 512 
>>> blocks and don't properly tell the OS about that and Illumos ZFS is 
>>> aligning the drives with ashift=9 which leads to enormous performance 
>>> issues. Also forcing the system to handle drives with a specific 
>>> sector
>>> 
>>> size with the sd.conf doesn't turn out to be reliable in some cases 
>>> (at
>>> 
>>> least on my workstations). Here's what I do to ensure ashift=12 values:
>>> 
>>> Reboot the system with a Linux live disk of your choice and install 
>>> ZoL
>>> 
>>> in the live session. Then create the ZFS pool, export it and reboot 
>>> the
>>> 
>>> machine. OmniOS / Illumos can import the new pool without problems 
>>> and the ashift value is correctly set. There was a fixed zpool binary 
>>> (Solaris 11 binary) flying around the internet which can handle the 
>>> "-o
>>> 
>>> shift=12" parameter and works with OmniOS but unfortunately I can't 
>>> find it again right now. This would make the reboot into a live 
>>> session obsolete.
>>> 
>>> Does anyone know if the "ashift" parameter will be implemented in the 
>>> OmniOS / Illumos zpool binary in the near future?
>>> 
>>> Best regards
>>> 
>>> Hanno
>>> ___
>>> OmniOS-discuss mailing list
>>> OmniOS-discuss@lists.omniti.com
>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>> 
>> Adding the ashift argument to zpool was discussed every few years and so far 
>> was always deemed not enterprisey enough for the Solaris heritage, so the 
>> setup to tweak sd driver reports and properly rely on that layer was pushed 
>> instead.
> 
> The issue is that once a drive model lies, then the Solaris approach is to 
> encode the lie into a whitelist, so that the lie is always handled properly. 
> The whitelist is in the sd.conf file.
> 
> By contrast, the ZFSonLinux approach requires that the sysadmin knows there 
> is a lie and manually corrects for every invocation. This is unfortunately a 
> very error-prone approach.
> -- richard
> 
>> 
>> That said, the old tweaked binary came with a blog post detailing the 
>> source changes; you're welcome to try a d port and rti it (I'd say 
>> there is enough user demand to back the non-enterprisey fix to be on 
>> par with other OpenZFS siblings). At worst, you can publish the 
>> modernized binary as the original blogger did ;)
>> 
>> Jim
>> --
>> Typos courtesy of K-9 Mail on my Samsung Android 
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] 4kn or 512e with ashift=12

2016-03-21 Thread Richard Elling

> On Mar 21, 2016, at 12:19 PM, Bob Friesenhahn  
> wrote:
> 
> On Mon, 21 Mar 2016, Richard Elling wrote:
>>> 
>>> Adding the ashift argument to zpool was discussed every few years and so 
>>> far was always deemed not enterprisey enough for the Solaris heritage, so 
>>> the setup to tweak sd driver reports and properly rely on that layer was 
>>> pushed instead.
>> 
>> The issue is that once a drive model lies, then the Solaris approach is to 
>> encode
>> the lie into a whitelist, so that the lie is always handled properly. The 
>> whitelist is in the
>> sd.conf file.
> 
> Does this approach require that Illumos users only use drive hardware much 
> older than the version of Illumos they happen running since outdated 
> whitelist won't know about the new lies?

Fortunately, lies are becoming less common. But this raises a good point: if
your drive doesn't lie, then you don't need a workaround.

> 
> What if a user is using classic drives but wants to be prepared to install 
> newer drives which require ashift=12?

See the bazillion other posts on this topic.
 -- richard

> 
> Bob
> -- 
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] 4kn or 512e with ashift=12

2016-03-21 Thread Richard Elling

> On Mar 21, 2016, at 11:11 AM, Jim Klimov  wrote:
> 
> 21 марта 2016 г. 10:02:03 CET, Hanno Hirschberger 
>  пишет:
>> On 21.03.2016 08:00, Fred Liu wrote:
>>> So that means illumos can handle 512n and 4kn automatically and
>> properly?
>> 
>> Not necessarily as far as I know. Sometime drives are emulating 512 
>> blocks and don't properly tell the OS about that and Illumos ZFS is 
>> aligning the drives with ashift=9 which leads to enormous performance 
>> issues. Also forcing the system to handle drives with a specific sector
>> 
>> size with the sd.conf doesn't turn out to be reliable in some cases (at
>> 
>> least on my workstations). Here's what I do to ensure ashift=12 values:
>> 
>> Reboot the system with a Linux live disk of your choice and install ZoL
>> 
>> in the live session. Then create the ZFS pool, export it and reboot the
>> 
>> machine. OmniOS / Illumos can import the new pool without problems and 
>> the ashift value is correctly set. There was a fixed zpool binary 
>> (Solaris 11 binary) flying around the internet which can handle the "-o
>> 
>> shift=12" parameter and works with OmniOS but unfortunately I can't
>> find 
>> it again right now. This would make the reboot into a live session
>> obsolete.
>> 
>> Does anyone know if the "ashift" parameter will be implemented in the 
>> OmniOS / Illumos zpool binary in the near future?
>> 
>> Best regards
>> 
>> Hanno
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> Adding the ashift argument to zpool was discussed every few years and so far 
> was always deemed not enterprisey enough for the Solaris heritage, so the 
> setup to tweak sd driver reports and properly rely on that layer was pushed 
> instead.

The issue is that once a drive model lies, then the Solaris approach is to encode
the lie into a whitelist, so that the lie is always handled properly. The whitelist
is in the sd.conf file.

By contrast, the ZFSonLinux approach requires that the sysadmin knows there is a
lie and manually corrects for every invocation. This is unfortunately a very
error-prone approach.
 -- richard
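
In concrete terms (disk names are placeholders), the ZFSonLinux route is an
override remembered on every invocation, while the illumos route is a one-time
whitelist entry:

    # ZFSonLinux: repeat on each create/add/replace
    zpool create -o ashift=12 tank mirror sda sdb
    zpool add -o ashift=12 tank mirror sdc sdd

    # illumos: fix the reported physical block size once in /kernel/drv/sd.conf,
    # then plain zpool commands choose the right ashift automatically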

> 
> That said, the old tweaked binary came with a blog post detailing the source 
> changes; you're welcome to try a d port and rti it (I'd say there is enough 
> user demand to back the non-enterprisey fix to be on par with other OpenZFS 
> siblings). At worst, you can publish the modernized binary as the original 
> blogger did ;)
> 
> Jim
> --
> Typos courtesy of K-9 Mail on my Samsung Android
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] supermicro 3U all-in one storage system

2016-03-19 Thread Richard Elling

> On Mar 18, 2016, at 4:12 PM, Geoff Nordli  wrote:
> 
> Hi.
> 
> I have had good luck with the SuperStorage 6037R-E1R16L chassis with the LSI 
> 2308 IT mode HBA.

We have several similar servers. The X10DRH is fine. For a non-HA system, 
single expander backplane
is ok (BPN-SAS3-836EL1). 

> 
> Thoughts on the 
> http://www.supermicro.com/products/system/3U/6038/SSG-6038R-E1CR16L.cfm. It 
> has the LSI 3008 HBA and Intel x540 network cards.

We have many variants of these parts. All work fine.

> 
> I want to get at least 4TB 3.5" SAS drives.  Any suggestions on those?

HGST or Seagate 4TB both seem to work fine. For Seagate, you'll want firmware 
0004 or later,
but that should be all that is in the supply chain for the past year. I avoid 
WD.
 -- richard


> 
> I will be installing the latest version of Ominos.
> 
> thanks,
> 
> Geoff
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] recovering deleted files from ZFS?

2016-03-14 Thread Richard Elling


> On Mar 14, 2016, at 11:42 AM, CJ Keist  wrote:
> 
> Dan,
>   You know if this is a just one shot attempt?  Meaning if I choose the wrong 
> TXG to import, can I export and try again with a different TXG?
> 

pro tip: try retro import with readonly option


  -- richard
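
A sketch of that approach, with placeholder pool and txg values (export the pool
first, per Dan's note):

    zdb -e -u tank                                    # note the current uberblock txg
    zpool import -o readonly=on -T 123456 -R /a tank
    # if the chosen txg looks wrong, zpool export tank and retry with an earlier one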

> 
>> On 3/14/16 12:28 PM, Dan McDonald wrote:
>> The very long-shot chance is to export the pool (after backing it recent 
>> changes), use zdb to see what its current TXG number is, and re-import the 
>> pool with an earlier TXG number using -T.  You'll lose files or changes 
>> created after the old TXG, AND if it's been a long enough time or a full 
>> enough pool, you won't get back to that TXG.
>> 
>> It's an extraordinary measure, one I wouldn't take unless I was truly 
>> desperate.
>> 
>> Dan
>> 
>> Sent from my iPhone (typos, autocorrect, and all)
>> 
>>> On Mar 14, 2016, at 1:41 PM, CJ Keist  wrote:
>>> 
>>> All,
>>>   Thought I try asking this question on this forum.  In light of no 
>>> snapshots, is there a way in ZFS to recover a recently deleted file?  We do 
>>> nightly backups, but this would be a file that was deleted before the 
>>> coming daily backup.  Does anyone know if there is a service that can do 
>>> file recoveries from ZFS?
>>> 
>>> 
>>> -- 
>>> C. J. Keist Email: cj.ke...@colostate.edu
>>> Systems Group Manager   Solaris 10 OS (SAI)
>>> Engineering Network ServicesPhone: 970-491-0630
>>> College of Engineering, CSU Fax:   970-491-5569
>>> Ft. Collins, CO 80523-1301
>>> 
>>> All I want is a chance to prove 'Money can't buy happiness'
>>> 
>>> ___
>>> OmniOS-discuss mailing list
>>> OmniOS-discuss@lists.omniti.com
>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> -- 
> C. J. Keist Email: cj.ke...@colostate.edu
> Systems Group Manager   Solaris 10 OS (SAI)
> Engineering Network ServicesPhone: 970-491-0630
> College of Engineering, CSU Fax:   970-491-5569
> Ft. Collins, CO 80523-1301
> 
> All I want is a chance to prove 'Money can't buy happiness'
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] weird disk behavior

2016-03-09 Thread Richard Elling
comment below...


> On Mar 9, 2016, at 11:05 AM, Michael Rasmussen  wrote:
> 
> Hi all,
> 
> I suddenly noticed one of the disk bays in my storage server going red
> with this logged in dmesg:
> Mar  9 19:19:47 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0 (mpt0):
> Mar  9 19:19:47 nas Disconnected command timeout for Target 1
> Mar  9 19:19:51 nas genunix: [ID 365881 kern.info] 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0 (mpt0):
> Mar  9 19:19:51 nas Log info 0x3114 received for target 1.
> Mar  9 19:19:51 nas scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
> Mar  9 19:19:51 nas genunix: [ID 365881 kern.info] 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0 (mpt0):
> Mar  9 19:19:51 nas Log info 0x3113 received for target 1.
> Mar  9 19:19:51 nas scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
> Mar  9 19:19:51 nas genunix: [ID 365881 kern.info] 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0 (mpt0):
> Mar  9 19:19:51 nas Log info 0x3113 received for target 1.
> Mar  9 19:19:51 nas scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
> Mar  9 19:20:21 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0/sd@1,0 (sd3):
> Mar  9 19:20:21 nas Command failed to complete...Device is gone
> Mar  9 19:20:24 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0/sd@1,0 (sd3):
> Mar  9 19:20:24 nas Command failed to complete...Device is gone
> Mar  9 19:20:24 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0/sd@1,0 (sd3):
> Mar  9 19:20:24 nas SYNCHRONIZE CACHE command failed (5)
> Mar  9 19:20:24 nas genunix: [ID 107833 kern.warning] WARNING: 
> /pci@0,0/pci1022,1708@3/pci1028,1f0e@0/sd@1,0 (sd3):
> Mar  9 19:20:24 nas drive offline
> 
> zpool online and smartctl could not talk to the disk.
> 
> Pulling the disk and reinserting it and the status showed green in
> which case both smartctl and zpool online could talk to the disk.
> 
> Resilvering is now taking place.
> 
> Any idea what has went wrong or should I worry for a disk imminently
> failing?

these are symptoms that the drive is not responding and resets are being sent 
to try (often in vain) to bring the disk online. Since this is mpt, it is 
likely 3Gbps and if the drive is SATA your tears will flow. Now that the drive 
is back AND the symptoms cleared after reinstalling the drive, it is very 
likely that drive is the source of the errors. smartctl might give more info. 
IMHO you should plan for replacement of that drive.

NB, for that SAS fabric generation, it is possible that the problem drive is
not the only drive showing the same errors, but your drive pull test is a 
reasonable approach. 

Do not be surprised if smartctl doesn't correctly identify the issue, smart 
isn't very smart sometimes.

  -- richard
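
If smartmontools is installed, something like this is a reasonable first look
(the device path is an example; SATA disks behind a SAS HBA may need a -d type
hint):

    smartctl -a /dev/rdsk/c2t1d0s0
    # watch the error logs, reallocated/grown defects, and self-test results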


> 
> 
> -- 
> Hilsen/Regards
> Michael Rasmussen
> 
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --
> /usr/games/fortune -es says:
> No group of professionals meets except to conspire against the public
> at large. -- Mark Twain
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Configuration sanity checking

2016-02-19 Thread Richard Elling

> On Feb 19, 2016, at 7:14 AM, Trey Palmer  wrote:
> 
> I haven't checked the Supermicro models lately, but the HP DL380Gen9 has a 
> model with 24 direct-wired 2.5" slots in 3 modular 8-disk bays.  The SAS 
> expander is an add-in card which you don't have to buy.
> 
> It's really excellent for Intel DC s3X10's.   Of course it's also more 
> expensive than a Supermicro. 
> 
> Hopefully Supermicro has something similar now. Wiring anything SATA through 
> a SAS expander is a risky practice.

yes, Supermicro has almost every conceivable option for drive backplanes/centerplanes,
including straight-through wiring (SATA and SAS use the same connector)
  -- richard

> 
> -- Trey
> 
> 
>> On Friday, February 19, 2016, Bob Friesenhahn  
>> wrote:
>>> On Fri, 19 Feb 2016, Peter Tribble wrote:
>>> 
>>> So that was an interesting question, and I had to go away and investigate 
>>> further.
>>> So the way this works is that the system has a 24-slot disk backplane. The 
>>> disks plug
>>> into the backplane; you have to connect the backplane as a unit to 
>>> something, which
>>> is where the HBA comes in. It doesn't look like there's a way to wire an 
>>> individual drive
>>> to an onboard SATA port.
>> 
>> These really dense 2U systems are often a pre-packaged (or close to it) 
>> system from SuperMicro and you get what they (SuperMicro) are able to build. 
>>  The product page usually makes note of that. Your integrator/builder is 
>> only able to make relatively small changes. Direct wiring 24 drives in a 2U 
>> system would be crazy.
>> 
>> With SAS driving SATA, it seems like you would still need/want one SAS 
>> channel per drive and 24 channels is a lot.  It sounds like you are getting 
>> only 8 SAS channels.
>> 
>> Bob
>> -- 
>> Bob Friesenhahn
>> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
>> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Configuration sanity checking

2016-02-18 Thread Richard Elling
Hi Peter!

> On Feb 18, 2016, at 2:15 PM, Peter Tribble  wrote:
> 
> We're looking at some new boxes, and I would appreciate any comments on the
> components proposed. This would be in a 2U chassis with 24 2.5" disk slots at
> the front, although we're leaving a lot of those free for future growth.
> 
> Motherboard - Supermicro X10DRi family
> CPUs - E5-2620 v3
> HBA - LSI 9300-8i
> Network - Intel i350 or 540, depending on precise motherboard variant
> Disks (SSD) - Intel S3510 (boot), S3610 (application + data)

The drives are SATA and there are quite a few SATA ports on the mobo, do you
need the SAS HBA?

The rest looks fairly common.
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-18 Thread Richard Elling
comments below...

> On Feb 18, 2016, at 12:57 PM, Schweiss, Chip  wrote:
> 
> 
> 
> On Thu, Feb 18, 2016 at 5:14 AM, Michael Rasmussen  > wrote:
> On Thu, 18 Feb 2016 07:13:36 +0100
> Stephan Budach mailto:stephan.bud...@jvm.de>> wrote:
> 
> >
> > So, when I issue a simple ls -l on the folder of the vdisks, while the 
> > switchover is happening, the command somtimes comcludes in 18 to 20 
> > seconds, but sometime ls will just sit there for minutes.
> >
> This is a known limitation in NFS. NFS was never intended to be
> clustered so what you experience is the NFS process on the client side
> keeps kernel locks for the now unavailable NFS server and any request
> to the process hangs waiting for these locks to be resolved. This can
> be compared to a situation where you hot-swap a drive in the pool
> without notifying the pool.
> 
> Only way to resolve this is to forcefully kill all NFS client processes
> and the restart the NFS client.

ugh. No, something else is wrong. I've been running such clusters for almost 20
years; it isn't a problem with the NFS server code.

> 
> 
> I've been running RSF-1 on OmniOS since about r151008.  All my clients have 
> always been NFSv3 and NFSv4.   
> 
> My memory is a bit fuzzy, but when I first started testing RSF-1, OmniOS 
> still had the Sun lock manager which was later replaced with the BSD lock 
> manager.   This has had many difficulties.
> 
> I do remember that fail overs when I first started with RSF-1 never had these 
> stalls, I believe this was because the lock state was stored in the pool and 
> the server taking over the pool would inherit that state too.   That state is 
> now lost when a pool is imported with the BSD lock manager.   
> 
> When I did testing I would do both full speed reading and writing to the pool 
> and force fail overs, both by command line and by killing power on the active 
> server.Never did I have a fail over take more than about 30 seconds for 
> NFS to fully resume data flow.   

Clients will back off, but the client's algorithm is not universal, so we do expect
to see different client retry intervals for different clients. For example, the
retries can exceed 30 seconds for Solaris clients after a minute or two (alas, I
don't have the detailed data at my fingertips anymore :-(. Hence we work hard to
make sure failovers occur as fast as feasible.

> 
> Others who know more about the BSD lock manager vs the old Sun lock manager 
> may be able to tell us more.  I'd also be curious if Nexenta has addressed 
> this.

Lock manager itself is an issue and though we're currently testing the BSD lock
manager in anger, we haven't seen this behaviour.

Related to lock manager is name lookup. If you use name services, you add a latency
dependency to failover for name lookups, which is why we often disable DNS or other
network name services on high-availability services as a best practice.
 -- richard
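
A sketch of that precaution on the HA heads: keep host lookups local so lock and
NFS recovery never waits on DNS.

    # /etc/nsswitch.conf
    #   hosts: files
    # then list the cluster nodes and key NFS clients in /etc/hosts and verify:
    getent hosts client1      # hypothetical client name; should resolve instantly from files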

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zpool fragmentation question

2016-02-15 Thread Richard Elling

> On Feb 15, 2016, at 4:40 AM, Dominik Hassler  wrote:
> 
> Hi there,
> 
> on my server at home (OmniOS r16, patched to the latest version) I added a 
> brand new zpool (simple 2 HDD mirror).
> 
> zpool list shows a fragmentation of 14% on my main pool. I did a recursive 
> snapshot on a dataset on the main pool. transferred the dataset via 
> replication stream to the new pool (zfs send -R mainpool/dataset@backup | zfs 
> recv -F newpool/dataset).
> 
> now zpool list shows a fragmentation of 27% on the *newpool* (no other data 
> have ever been written to that pool).
> 
> How can this be? Was my assumption wrong that send/recv acts like defrag on 
> the receiving end?

The pool’s fragmentation is a roll-up of the metaslab fragmentation. A 
metaslab’s fragmentation metric is a weighted
estimate of the number of small unallocated spaces in the metaslab. As such, a 
100% free metaslab has no
fragmentation. Similarly, a metaslab with a lot of 512-byte spaces free has a 
higher fragmentation metric.

To get a better idea of the layout, free space, and computed fragmentation 
metric, use “zdb -mm poolname”

It is not clear how useful the metric is in practice, particularly when 
comparing pools of different size and 
metaslab counts. IMHO, the zdb -mm output is much more useful than the 
aggregate metric.
 — richard
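
For example, comparing the aggregate metric with the per-metaslab view (the pool
name is a placeholder):

    zpool list -o name,size,capacity,fragmentation tank
    zdb -mm tank | less       # per-metaslab allocations, free space, and fragmentation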

--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] How to get NFS read & write latency in OmniOS r151016

2015-12-24 Thread Richard Elling

> On Dec 23, 2015, at 12:58 AM, Dan Vatca  wrote:
> 
> If you need latency, you will most likely need a latency distribution 
> histogram, and not an average latency.
> With averages you will lose latency outliers that are very important. Here's 
> a good read with lots of references on this topic: 
> https://www.vividcortex.com/blog/why-percentiles-dont-work-the-way-you-think 
> 
> To currently do this on OmniOS, you need to use dtrace to aggregate 
> (quantize) time differences between nfsv3:::op-read-start and 
> nfsv3:::op-read-done (same for write).

Indeed, distributions are much more enlightening than averages.

Unfortunately, the new kstats added for NFS server operations on a 
per-mountpoint basis
are implemented using the Riemann sums (KSTAT_TYPE_IO) and it is not possible 
to obtain
per-operation information needed for min/max or distribution. These are the 
same type of 
kstat used for the iostat command.

Shameless plug, nfssvrtop has proven to be useful in watching NFS traffic and 
uses the 
op-read-start/op-read-done method.
https://github.com/richardelling/tools 

 — richard
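
A minimal sketch of that start/done pairing for a latency distribution (NFSv3
reads, keyed by XID so start and done match up; adjust the probe names for v4):

    dtrace -n '
      nfsv3:::op-read-start { ts[args[1]->noi_xid] = timestamp; }
      nfsv3:::op-read-done /ts[args[1]->noi_xid]/ {
        @["NFSv3 read latency (ns)"] = quantize(timestamp - ts[args[1]->noi_xid]);
        ts[args[1]->noi_xid] = 0;
      }'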

> 
> 
> Dan Vâtca
> CTO at Syneto
> Tel: +40723604357, Skype: dan_vatca
>  
> On Wed, Dec 23, 2015 at 2:44 AM, 張峻宇  > wrote:
> Hi all,
> 
>   According to the release note of OmniOS r151016, we could get “IOPS, 
> bandwidth, and latency kstats for NFS server”
> 
>  
> 
>   there is lots of information showing when I use enter command #kstat,
> 
>   I want to get the “nfs read & write latency for NFS server”
> 
>  
> 
>   Q1 : Is the ‘nfs:0:rfsprocio_v4_write:wtime’ & 
> ‘nfs:0:rfsprocio_v4_read:wtime’ meant write & read latency ?
> 
>   Q2 : I mounted the nfs share directory, and write lots file to it, the 
> number of ‘nfs:0:rfsprocio_v4_write:wtime’ & ‘nfs:0:rfsprocio_v4_read:wtime’ 
> still zero. Why ?
> 
>  
> 
>   #kstat –p –m nfs –n rfsprocio_v4_write
> 
> nfs:0:rfsprocio_v4_write:classrfsprocio_v4
> 
> nfs:0:rfsprocio_v4_write:crtime 50.833043074
> 
> nfs:0:rfsprocio_v4_write:nread  3932160
> 
> nfs:0:rfsprocio_v4_write:nwritten  5374607360
> 
> nfs:0:rfsprocio_v4_write:rcnt 0
> 
> nfs:0:rfsprocio_v4_write:reads   163840
> 
> nfs:0:rfsprocio_v4_write:rlastupdate 12048225488385
> 
> nfs:0:rfsprocio_v4_write:rlentime  33429565743
> 
> nfs:0:rfsprocio_v4_write:rtime   23992279289
> 
> nfs:0:rfsprocio_v4_write:snaptime 269635.483575440
> 
> nfs:0:rfsprocio_v4_write:wcnt0
> 
> nfs:0:rfsprocio_v4_write:wlastupdate0
> 
> nfs:0:rfsprocio_v4_write:wlentime 0
> 
> nfs:0:rfsprocio_v4_write:writes  163840/ number of writes /
> 
> nfs:0:rfsprocio_v4_write:wtime 0  / wait queue - time 
> spent waiting /
> 
>  
> 
> #kstat –p –m nfs –n rfsprocio_v4_read
> 
> nfs:0:rfsprocio_v4_read:class rfsprocio_v4
> 
> nfs:0:rfsprocio_v4_read:crtime  50.833003263
> 
> nfs:0:rfsprocio_v4_read:nread   0
> 
> nfs:0:rfsprocio_v4_read:nwritten   0
> 
> nfs:0:rfsprocio_v4_read:rcnt  0
> 
> nfs:0:rfsprocio_v4_read:reads0
> 
> nfs:0:rfsprocio_v4_read:rlastupdate  0
> 
> nfs:0:rfsprocio_v4_read:rlentime   0
> 
> nfs:0:rfsprocio_v4_read:rtime0
> 
> nfs:0:rfsprocio_v4_read:snaptime  269635.483080962
> 
> nfs:0:rfsprocio_v4_read:wcnt 0
> 
> nfs:0:rfsprocio_v4_read:wlastupdate 0
> 
> nfs:0:rfsprocio_v4_read:wlentime 0
> 
> nfs:0:rfsprocio_v4_read:writes   0
> 
> nfs:0:rfsprocio_v4_read:wtime  0
> 
>
> 
>  
> 
>  
> 
> Best regards,
> 
> -
> 
> 張峻宇
> 
> 中華電信研究院雲端運算研究所
> 
> TEL: 03-4245663
> 
>  
> 
> 
> 
>  
> Please be advised that this email message (including any attachments) 
> contains confidential information and may be legally privileged. If you are 
> not the intended recipient, please destroy this message and all attachments 
> from your system and do not further collect, process, or use them. Chunghwa 
> Telecom and all its subsidiaries and associated companies shall not be liable 
> for the improper or incomplete transmission of the information contained in 
> this email nor for any delay in its receipt or damage to your system. If you 
> are the intended recipient, please protect the confidential and/or personal 
> information contained in this email with due care. Any unauthorized use, 
> disclosure or distribution of this message in whole or in part is strictly 
> prohibited. Also, please self-inspect attachments and hyperlinks contained in 
> this email to ensure the information security and to protect personal 
> information.
> 
> ___
> OmniOS-discuss mailing list
> Omni

Re: [OmniOS-discuss] How to get NFS read & write latency in OmniOS r151016

2015-12-22 Thread Richard Elling

> On Dec 22, 2015, at 4:44 PM, 張峻宇  wrote:
> 
> Hi all,
>   According to the release note of OmniOS r151016, we could get “IOPS, 
> bandwidth, and latency kstats for NFS server”
>  
>   there is lots of information showing when I use enter command #kstat,
>   I want to get the “nfs read & write latency for NFS server”
>  
>   Q1 : Is the ‘nfs:0:rfsprocio_v4_write:wtime’ & 
> ‘nfs:0:rfsprocio_v4_read:wtime’ meant write & read latency ?

No, wtime is the wait queue occupancy (%wait in iostat -x)
A good reference is the man page for kstat(3kstat)
man -s 3kstat kstat

Hopefully, the information there will answer your Q2.
 -- richard
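
For what it's worth, the iostat-style averages fall out of these KSTAT_TYPE_IO
counters by differencing two samples a few seconds apart (a sketch; field names
as shown in the output above):

    #   avg latency ~= delta(rlentime) / delta(reads + writes)   (rlentime is in ns)
    #   utilization ~= delta(rtime) / elapsed wall-clock time
    kstat -p nfs:0:rfsprocio_v4_write:rlentime nfs:0:rfsprocio_v4_write:writes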

>   Q2 : I mounted the nfs share directory, and write lots file to it, the 
> number of ‘nfs:0:rfsprocio_v4_write:wtime’ & ‘nfs:0:rfsprocio_v4_read:wtime’ 
> still zero. Why ? 
>  
>   #kstat –p –m nfs –n rfsprocio_v4_write
> nfs:0:rfsprocio_v4_write:classrfsprocio_v4
> nfs:0:rfsprocio_v4_write:crtime 50.833043074
> nfs:0:rfsprocio_v4_write:nread  3932160
> nfs:0:rfsprocio_v4_write:nwritten  5374607360
> nfs:0:rfsprocio_v4_write:rcnt 0
> nfs:0:rfsprocio_v4_write:reads   163840
> nfs:0:rfsprocio_v4_write:rlastupdate 12048225488385
> nfs:0:rfsprocio_v4_write:rlentime  33429565743
> nfs:0:rfsprocio_v4_write:rtime   23992279289
> nfs:0:rfsprocio_v4_write:snaptime 269635.483575440
> nfs:0:rfsprocio_v4_write:wcnt0
> nfs:0:rfsprocio_v4_write:wlastupdate0
> nfs:0:rfsprocio_v4_write:wlentime 0
> nfs:0:rfsprocio_v4_write:writes  163840/ number of writes /
> nfs:0:rfsprocio_v4_write:wtime 0  / wait queue - time 
> spent waiting /
>  
> #kstat –p –m nfs –n rfsprocio_v4_read
> nfs:0:rfsprocio_v4_read:class rfsprocio_v4
> nfs:0:rfsprocio_v4_read:crtime  50.833003263
> nfs:0:rfsprocio_v4_read:nread   0
> nfs:0:rfsprocio_v4_read:nwritten   0
> nfs:0:rfsprocio_v4_read:rcnt  0
> nfs:0:rfsprocio_v4_read:reads0
> nfs:0:rfsprocio_v4_read:rlastupdate  0
> nfs:0:rfsprocio_v4_read:rlentime   0
> nfs:0:rfsprocio_v4_read:rtime0
> nfs:0:rfsprocio_v4_read:snaptime  269635.483080962
> nfs:0:rfsprocio_v4_read:wcnt 0
> nfs:0:rfsprocio_v4_read:wlastupdate 0
> nfs:0:rfsprocio_v4_read:wlentime 0
> nfs:0:rfsprocio_v4_read:writes   0
> nfs:0:rfsprocio_v4_read:wtime  0
> 
>  
>  
> Best regards,
> -
> 張峻宇
> 中華電信研究院雲端運算研究所
> TEL: 03-4245663
>  
> 
> 
>  
> Please be advised that this email message (including any attachments) 
> contains confidential information and may be legally privileged. If you are 
> not the intended recipient, please destroy this message and all attachments 
> from your system and do not further collect, process, or use them. Chunghwa 
> Telecom and all its subsidiaries and associated companies shall not be liable 
> for the improper or incomplete transmission of the information contained in 
> this email nor for any delay in its receipt or damage to your system. If you 
> are the intended recipient, please protect the confidential and/or personal 
> information contained in this email with due care. Any unauthorized use, 
> disclosure or distribution of this message in whole or in part is strictly 
> prohibited. Also, please self-inspect attachments and hyperlinks contained in 
> this email to ensure the information security and to protect personal 
> information.___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Samsung SM863

2015-12-10 Thread Richard Elling

> On Dec 10, 2015, at 12:02 PM, Dave Pooser  wrote:
> 
> On 12/10/15, 12:13 PM, "OmniOS-discuss on behalf of Richard Elling"
>  richard.ell...@richardelling.com> wrote:
> 
>> 
>>> On Dec 10, 2015, at 4:58 AM, Tobias Oetiker  wrote:
>>> 
>>> Just found that samsung now has an ssd with  power loss protection
>>> 
>>> http://www.storagereview.com/samsung_sm863_ssd_review
>>> 
>>> what do you think ?
>> 
>> Power-loss protection is not required (ZFS works on HDDs :-) but it is a
>> nice feature.
> 
> On a device that will likely be used for ZIL, I'd call power-loss
> protection required. HDDs don't lie to the OS about when data has been
> flushed from cache to disk the way SSDs do, right?

You do need a device that honors cache flush commands. But that goes for
any device or RAID array, not just SSDs.
 -- richard

> 
> On a device that's going to be L2ARC I care a lot less, obviously. ;-)
> -- 
> Dave Pooser
> Cat-Herder-in-Chief, Pooserville.com
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Samsung SM863

2015-12-10 Thread Richard Elling

> On Dec 10, 2015, at 4:58 AM, Tobias Oetiker  wrote:
> 
> Just found that samsung now has an ssd with  power loss protection
> 
> http://www.storagereview.com/samsung_sm863_ssd_review
> 
> what do you think ?

Power-loss protection is not required (ZFS works on HDDs :-) but it is a nice 
feature.
Overall, this looks like a very nice SSD. I expect more enterprise-grade SSDs 
from
Samsung in the future.
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow performance with ZeusRAM?

2015-10-23 Thread Richard Elling
additional insight below...

> On Oct 22, 2015, at 12:02 PM, Matej Zerovnik  wrote:
> 
> Hello,
> 
> I'm building a new system and I'm having a bit of a performance problem. 
> Well, its either that or I'm not getting the whole ZIL idea:)
> 
> My system is following:
> - IBM xServer 3550 M4 server (dual CPU with 160GB memory)
> - LSI 9207 HBA (P19 firmware)
> - Supermicro JBOD with SAS expander
> - 4TB SAS3 drives
> - ZeusRAM for ZIL
> - LTS Omnios (all patches applied)
> 
> If I benchmark ZeusRAM on its own with random 4k sync writes, I can get 48k 
> IOPS out of it, no problem there.

Do not assume writes to the slog for 4k random write workload are only 4k in 
size.
You'll want to measure to be sure, but the worst case here is 8k written to 
slog:
   4k data + 4k chain pointer = 8k physical write

There are cases where multiple 4k data gets coalesced, so the above is worst 
case.
Measure to be sure. A quick back-of-the-napkin measurement can be done from
iostat -x output. More detailed measurements can be done with zilstat or other
specific dtracing.
 -- richard
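
A back-of-the-napkin version of that measurement, run while the benchmark is
active (the slog device name is a made-up example):

    iostat -xn 1 | grep c3t5000A72A30078111d0
    # average physical write size = kw/s divided by w/s; compare against the 4k the
    # benchmark thinks it is writing
    zilstat 1                 # per-interval ZIL bytes and ops, if zilstat is installed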

> 
> If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for ZIL 
> and set sync=always, I can only squeeze 14k IOPS out of the system.
> Is that normal or should I be getting 48k IOPS on the 2nd pool as well, since 
> this is the performance ZeusRAM can deliver?
> 
> I'm testing with fio:
> fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers 
> --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k --iodepth=16 
> --numjobs=16 --runtime=60 --group_reporting --name=4ktest
> 
> thanks, Matej___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] big zfs storage?

2015-10-07 Thread Richard Elling

> On Oct 7, 2015, at 1:59 PM, Mick Burns  wrote:
> 
> So... how does Nexenta copes with hot spares and all kinds of disk failures ?
> Adding hot spares is part of their administration manuals so can we
> assume things are almost always handled smoothly ?  I'd like to hear
> from tangible experiences in production.

I do not speak for Nexenta.

Hot spares are a bigger issue when you have single parity protection.
With double parity and large pools, warm spares are a better approach.
The reasons are:

1. Hot spares exist solely to eliminate the time between disk failure and human
   intervention for corrective action. There is no other reason to have hot spares.
   The exposure for a single disk failure under single parity protection is too
   risky for most folks, but with double parity (eg raidz2 or RAID-6) the few
   hours you save have little impact on overall data availability vs warm spares.

2. Under some transient failure conditions (eg isolated power failure, IOM reboot,
   or fabric partition), all available hot spares can be kicked into action. This
   can leave you with a big mess for large pools with many drives and spares. You
   can avoid this by having a human involved in the decision process, rather than
   just *locally isolated,* automated decision making.

 -- richard
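
To make the distinction concrete (device names are hypothetical): a hot spare is
attached to the pool and the zfs-retire agent swaps it in automatically, while a
warm spare sits installed but unused until a person decides.

    # hot spare: automated replacement
    zpool add tank spare c0t9d0

    # warm spare: keep c0t9d0 out of the pool and run the replacement by hand
    zpool replace tank c0t3d0 c0t9d0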

> 
> 
> thanks
> 
> On Mon, Jul 13, 2015 at 7:58 AM, Schweiss, Chip  wrote:
>> Liam,
>> 
>> This report is encouraging.  Please share some details of your
>> configuration.   What disk failure parameters are have you set?   Which
>> JBODs and disks are you running?
>> 
>> I have mostly DataON JBODs and a some Supermicro.   DataON has PMC SAS
>> expanders and Supermicro has LSI, both setups have pretty much the same
>> behavior with disk failures.   All my servers are Supermicro with LSI HBAs.
>> 
>> If there's a magic combination of hardware and OS config out there that
>> solves the disk failure panic problem, I will certainly change my builds
>> going forward.
>> 
>> -Chip
>> 
>> On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser  wrote:
>>> 
>>> I have two 800T ZFS systems on OmniOS and a bunch of smaller <50T systems.
>>> Things generally work very well.  We loose a disk here and there but its
>>> never resulted in downtime.  They're all on Dell hardware with LSI or Dell
>>> PERC controllers.
>>> 
>>> Putting in smaller disk failure parameters, so disks fail quicker, was a
>>> big help when something does go wrong with a disk.
>>> 
>>> thanks,
>>> liam
>>> 
>>> 
>>> On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip 
>>> wrote:
 
 Unfortunately for the past couple years panics on disk failure has been
 the norm.   All my production systems are HA with RSF-1, so at least things
 come back online relatively quick.  There are quite a few open tickets in
 the Illumos bug tracker related to mpt_sas related panics.
 
 Most of the work to fix these problems has been committed in the past
 year, though problems still exist.  For example, my systems are dual path
 SAS, however, mpt_sas will panic if you pull a cable instead of dropping a
 path to the disks.  Dan McDonald is actively working to resolve this.   He
 is also pushing a bug fix in genunix from Nexenta that appears to fix a lot
 of the panic problems.   I'll know for sure in a few months after I see a
 disk or two drop if it truly fixes things.  Hans Rosenfeld at Nexenta is
 responsible for most of the updates to mpt_sas including support for 3008
 (12G SAS).
 
 I haven't run any 12G SAS yet, but plan to on my next build in a couple
 months.   This will be about 300TB using an 84 disk JBOD.  All the code 
 from
 Nexenta to support the 3008 appears to be in Illumos now, and they fully
 support it so I suspect it's pretty stable now.  From what I understand
 there may be some 12G performance fixes coming sometime.
 
 The fault manager is nice when the system doesn't panic.  When it panics,
 the fault manger never gets a chance to take action.  It is still the
 consensus that is is better to run pools without hot spares because there
 are situations the fault manager will do bad things.   I witnessed this
 myself when building a system and the fault manger replaced 5 disks in a
 raidz2 vdev inside 1 minute, trashing the pool.   I haven't completely 
 yield
 to the "best practice".  I now run one hot spare per pool.  I figure with
 raidz2, the odds of the fault manager causing something catastrophic is 
 much
 less possible.
 
 -Chip
 
 
 
 On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley 
 wrote:
> 
> I have to build and maintain my own system. I usually help others
> build(i teach zfs and freenas classes/consulting). I really love fault
> management in solaris and miss it. Just thought since it's my system and I
> get to choose I would use omni. I have 20+ years using solaris and only 2 

Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 and L2ARC

2015-09-11 Thread Richard Elling

> On Sep 11, 2015, at 11:37 AM, Dan McDonald  wrote:
> 
> 
>> On Sep 11, 2015, at 2:33 PM, Michael Rasmussen  wrote:
>> 
>> What should one look for from the zdb output to identify any errors?
> 
> Look for assertion failures, or other non-0 exits.
> 
> Basically, zdb has the kernel zfs implementation in userspace.  If a kernel 
> would panic, it will also "panic" zdb, leaving a 'core' around.

Also recall that zdb reads from disks. Therefore if a pool is imported, it can 
get
out of sync with current reality. That said, it should be reasonably ok for 
older
data that is not being overwritten (deleted in the current pool with no 
snapshot)
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] dell 730xd with md3060e jbod

2015-08-13 Thread Richard Elling

> On Aug 13, 2015, at 8:47 AM, Randy S  wrote:
> 
> Hi,
> 
> A while ago I had a moment to test a 730XD with a md3060e jbod. 
> I have read the other threads  regarding the 730 usability.
> I had no problems with it using omnios R12. However the use  of the JBOD did 
> raise an issue regarding blinking disk leds.
> 
> I noticed that the signals send with sas2ircu were not doing their job 
> (nothing blinks). After some calls to dell technicians, I heard
> that dell has disabled these signals in their firmware and only allows 
> signalling through their own tool, which only works with windows
> and some linux flavours.
> 
> At that time I hear about santools which "might propably" be used for this 
> blinking functionality (and more), but you have to buy it to test it.
> Bit expensive for a test.
> 
> After this long intro, my question is:
> Does anybody know of a another way (script, tools etc)  to get this blinking 
> functionality going in this hardware combination (ofcourse NOT using dd) ?

Try fmtopo first. There are 3 indicators defined: fail, ident, ok2rm. Many SES
vendors only implement fail and ident. Here is an example:

/usr/lib/fm/fmd/fmtopo -P facility.mode=uint32:1 
hc://:chassis-mfg=XYZ:chassis-name=XYZ-ABC:chassis-part=unknown:chassis-serial=500093d0016cc76/ses-enclosure=0/bay=1?indicator=fail

to de-assert the indicator, facility.mode=uint32:0

to find the FMRI string, use fmtopo to observe what your hardware reports.
 -- richard
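
For example, to pull the exact bay FMRIs out of your topology before building
the -P command above:

    /usr/lib/fm/fmd/fmtopo | grep 'bay='
    # append ?indicator=fail (or ident/ok2rm) to the chosen bay FMRI, as shown above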

> 
> As I understood, this same combination is used, as a standard, by a another 
> (big) illumos kernel user and their systems also do not blink disks. I 
> however,
> would like to be able to find a disk easilly to e.g. replace and not only 
> depend on the internal disk tests which the JBOD seems to do by itself 
> regardless of the OS used. 
> 
> (Dell told me theJBOD detects defect disks by itself by performing some 
> periodic tests. How it does this I do not know).
> 
> Regards,
> 
> R
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow Drive Detection and boot-archive

2015-07-21 Thread Richard Elling

> On Jul 20, 2015, at 7:56 PM, Michael Talbott  wrote:
> 
> Thanks for the reply. The bios for the card is disabled already. The 8 second 
> per drive scan happens after the kernel has already loaded and it is scanning 
> for devices. I wonder if it's due to running newer firmware. I did update the 
> cards to fw v.20.something before I moved to omnios. Is there a particular 
> firmware version on the cards I should run to match OmniOS's drivers?

Google "LSI P20 firmware" for many tales of woe for many different OSes.
Be aware that getting the latest version of firmware from Avago might not be 
obvious...
the latest version is 20.00.04.00 for Windows.
 -- richard
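
One way to confirm which P-level each HBA is actually running, assuming the LSI
sas2ircu utility is installed (the controller index is an example):

    sas2ircu LIST
    sas2ircu 0 DISPLAY | grep -i firmware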

> 
> 
> 
> Michael Talbott
> Systems Administrator
> La Jolla Institute
> 
>> On Jul 20, 2015, at 6:06 PM, Marion Hakanson  wrote:
>> 
>> Michael,
>> 
>> I've not seen this;  I do have one system with 120 drives and it
>> definitely does not have this problem.  A couple with 80+ drives
>> are also free of this issue, though they are still running OpenIndiana.
>> 
>> One thing I pretty much always do here, is to disable the boot option
>> in the LSI HBA's config utility (accessible from the during boot after
>> the BIOS has started up).  I do this because I don't want the BIOS
>> thinking it can boot from any of the external JBOD disks;  And also
>> because I've had some system BIOS crashes when they tried to enumerate
>> too many drives.  But, this all happens at the BIOS level, before the
>> OS has even started up, so in theory it should not affect what
>> you are seeing.
>> 
>> Regards,
>> 
>> Marion
>> 
>> 
>> 
>> Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
>> From: Michael Talbott 
>> Date: Fri, 17 Jul 2015 16:15:47 -0700
>> To: omnios-discuss 
>> 
>> Just realized my typo. I'm using this on my 90 and 180 drive systems:
>> 
>> # svccfg -s boot-archive setprop start/timeout_seconds=720
>> # svccfg -s boot-archive setprop start/timeout_seconds=1440
>> 
>> Seems like 8 seconds to detect each drive is pretty excessive.
>> 
>> Any ideas on how to speed that up?
>> 
>> 
>> 
>> Michael Talbott
>> Systems Administrator
>> La Jolla Institute
>> 
>>> On Jul 17, 2015, at 4:07 PM, Michael Talbott  wrote:
>>> 
>>> I have multiple NAS servers I've moved to OmniOS and each of them have 
>>> 90-180 4T disks. Everything has worked out pretty well for the most part. 
>>> But I've come into an issue where when I reboot any of them, I'm getting 
>>> boot-archive service timeouts happening. I found a workaround of increasing 
>>> the timeout value which brings me to the following. As you can see below in 
>>> a dmesg output, it's taking the kernel about 8 seconds to detect each of 
>>> the drives. They're connected via a couple SAS2008 based LSI cards.
>>> 
>>> Is this normal?
>>> Is there a way to speed that up?
>>> 
>>> I've fixed my frustrating boot-archive timeout problem by adjusting the 
>>> timeout value from the default of 60 seconds (I guess that'll work ok on 
>>> systems with less than 8 drives?) to 8 seconds * 90 drives + a little extra 
>>> time = 280 seconds (for the 90 drive systems). Which means it takes between 
>>> 12-24 minutes to boot those machines up.
>>> 
>>> # svccfg -s boot-archive setprop start/timeout_seconds=280
>>> 
>>> I figure I can't be the only one. A little googling also revealed: 
>>> https://www.illumos.org/issues/4614 
>>> 
>>> Jul 17 15:40:15 store2 genunix: [ID 583861 kern.info] sd29 at mpt_sas3: 
>>> unit-address w5c0f0401bd43,0: w5c0f0401bd43,0
>>> Jul 17 15:40:15 store2 genunix: [ID 936769 kern.info] sd29 is 
>>> /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w5c0f0401bd43,0
>>> Jul 17 15:40:16 store2 genunix: [ID 408114 kern.info] 
>>> /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w5c0f0401bd43,0 
>>> (sd29) online
>>> Jul 17 15:40:24 store2 genunix: [ID 583861 kern.info] sd30 at mpt_sas3: 
>>> unit-address w5c0f045679c3,0: w5c0f045679c3,0
>>> Jul 17 15:40:24 store2 genunix: [ID 936769 kern.info] sd30 is 
>>> /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w5c0f045679c3,0
>>> Jul 17 15:40:24 store2 genunix: [ID 408114 kern.info] 
>>> /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w5c0f045679c3,0 
>>> (sd30) online
>>> Jul 17 15:40:33 store2 genunix: [ID 583861 kern.info] sd31 at mpt_sas3: 
>>> unit-address w5c0f045712b3,0: w5c0f045712b3,0
>>> Jul 17 15:40:33 store2 genunix: [ID 936769 kern.info] sd31 is 
>>> /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w5c0f045712b3,0
>>> Jul 17 15:40:33 store2 genunix: [ID 408114 kern.info] 
>>> /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport@f/disk@w5c0f045712b3,0 
>>> (sd31) online
>>> Jul 17 15:40:42 store2 genunix: [ID 583861 kern.info] sd32 at mpt_sas3: 
>>> unit-address w5c0f04571497,0: w5c0f04571497,0
>>> Jul 17 15:40:42 st

Re: [OmniOS-discuss] Zil Device

2015-07-16 Thread Richard Elling

> On Jul 16, 2015, at 11:30 AM, Schweiss, Chip  wrote:
> 
> The 850 Pro should never be used as a log device.  It does not have power 
> fail protection of its ram cache.   You might as well set sync=disabled and 
> skip using a log device entirely because the 850 Pro is not protecting your 
> last transactions in case of power failure.

Chip, are you asserting that the 850 Pro does not honor the cache flush command?
In the bad old days, there were SSDs that were broken for cache flushes, but 
some of
the most important (to Samsung) OSes, like Windows, rely on cache flushes to 
work.
ZFS does as well.
 -- richard

> 
> Only SSDs with power failure protection should be considered for log devices. 
>   
> 
> That being said, unless your running application that need transaction 
> consistency such as databases, don't bother with using a log device and set 
> sync=disabled.   
> 
> -Chip
> 
> On Thu, Jul 16, 2015 at 11:55 AM, Doug Hughes  > wrote:
> 8GB zil on very active server and 100+GB ssd lasts many years. We have yet, 
> after years of use of various SSDs, to have one fail from wear usage, and 
> that's with fairly active NFS use.
> They usually fail for other reasons.
> We started with with Intel X series, which are only 32GB in size, and some of 
> them are still active, though less active use now. With Samsung 850 pro, you 
> practically don't have to worry about it, and the price is really good.
> 
> 
> On Thu, Jul 16, 2015 at 12:36 PM, Brogyányi József  > wrote:
> Hi Doug
> 
> Can you write its life time? I don't trust any SSD but I've thinking for a 
> while to use as a ZIL+L2ARC.
> Could you share with us your experiences? I would be interested in server 
> usage. Thanks.
> 
> 
> 
> 2015.07.15. 22:42 keltezéssel, Doug Hughes írta:
>> We have been preferring commodity SSD like Intel 320 (older), intel 710, or 
>> currently, Samsung 850 pro. We also use it as boot drive and reserve an 8GB 
>> slide for ZIL so that massive synchronous NFS IOPS are manageable.
>> 
>> Sent from my android device.
>> 
>> -Original Message-
>> From: Matthew Lagoe  
>> 
>> To: omnios-discuss@lists.omniti.com 
>> Sent: Wed, 15 Jul 2015 16:29
>> Subject: [OmniOS-discuss] Zil Device
>> 
>> Is the zeusram SSD still the big zil device out there or are there other 
>> high performance reliable options that anyone knows of on the market now? I 
>> can't go with like the DDRdrive as its pcie. 
>> 
>> Thanks 
>> 
>> 
>> ___ 
>> OmniOS-discuss mailing list 
>> OmniOS-discuss@lists.omniti.com  
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
>>  
>> 
>> 
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com 
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
>> 
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
> 
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Clues for tracking down why kernel memory isn't being released?

2015-07-16 Thread Richard Elling

> On Jul 16, 2015, at 9:48 AM, Chris Siebenmann  wrote:
> 
> I wrote:
>> We have one ZFS-based NFS fileserver that persistently runs at a very
>> high level of non-ARC kernel memory usage that never seems to shrink.
>> On a 128 GB machine, mdb's ::memstat reports 95% memory usage by just
>> 'Kernel' while the ZFS ARC is only at about 21 GB (as reported by
>> 'kstat -m') although c_max should allow it to grow much bigger.
>> 
>> According to ::kmastat, a *huge* amount of this memory appears to be
>> vanishing into allocated but not used kmem_alloc_131072 slab buffers:
>> 
>>> ::kmastat
>> cachebuf   buf   buf memory  alloc 
>> alloc
>> namesizein use total in usesucceed  
>> fail
>> -- - - - -- -- 
>> -
>> [...]
>> kmem_alloc_131072   128K 6613033  74.8G  196862991   
>>   0
> 
> It turns out that the explanation for this is relatively simple, as
> is the work around. Put simply: the OmniOS kernel does not actually
> free up these deallocated cache objects until the system is put under
> relatively strong memory pressure. Crucially, *the ZFS ARC does not
> create this memory pressure*; I think that you pretty much need a user
> level program allocating enough memory in order to trigger it, and I
> think the memory growth needs to happen relatively rapidly fast so that
> the kernel doesn't reclaim enough memory through lesser means (such as
> shrinking the ZFS ARC).

I don't think we will get much traction for ZFS pushing applications out of RAM.
There is a nuance here that can be difficult to resolve.

> 
> (Specifically, you need to force kmem_reap() to be called. The primary
> path for this is if 'freemem' drops under 'lotsfree', which is only a few
> hundred MB on many systems. See usr/src/uts/common/os/vm_pageout.c in
> the OmniOS source repo.)
> 
> Since our fileservers are purely NFS fileservers and have a basically
> static level of user memory usage, they rarely or never rapidly use up
> enough memory to trigger this 'allocated but unused' reclaim[*].
> 
> The good news is that it's easy enough these days to eat memory at the
> user level (you can do it with modern 64-bit scripting languages like
> Python, even at an interactive prompt). The bad news is that when we did
> this on the server in question we provoked a significant system stall at
> both the NFS server level and even the level of ssh logins and shells;
> this is clearly not something that we'd want to automate.
> 
> It's my personal opinion that there should be something in the kernel
> that automatically reaps drastically outsized kmem caches after a
> while. It's absurd that we've run for weeks with more than 70 GB of RAM
> sitting unused and an undersized ZFS ARC because of this.

kmem reaps can be very painful

> 
>   - cks
> [*: interested parties can see how often cache reaping has been triggered
>with the following 'mdb -k' command:
>   ::walk kmem_cache | ::printf "%4d %s\n" kmem_cache_t cache_reap 
> cache_name

ugh. How about:
kstat -p :::reap

 -- richard

> 
>Even on this heavily used fileserver, up for 45 days, the reap count
>was *8*.  Many of our other fileservers, with less usage, have reap
>counts of 0.
> ]
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS crash/reboot loop

2015-07-12 Thread Richard Elling

> On Jul 12, 2015, at 5:26 PM, Derek Yarnell  wrote:
> 
> On 7/12/15 3:21 PM, Günther Alka wrote:
>> First action:
>> If you can mount the pool read-only, update your backup
> 
> We are securing all the non-scratch data currently before messing with
> the pool any more.  We had backups as recent as the night before but it
> is still going to be faster to pull the current data from the readonly
> pool than from backups.
> 
>> Then
>> I would expect that a single bad disk is the reason of the problem on a
>> write command. I would first check the system and fault log or
>> smartvalues for hints about a bad disk. If there is a suspicious disk,
>> remove that and retry a regular import.
> 
> We have pulled all disks individually yesterday to test this exact
> theory.  We have hit the mpt_sas disk failure panics before so we had
> already tried this.

I don't believe this is a bad disk.

Some additional block pointer verification code was added in changeset
f63ab3d5a84a12b474655fc7e700db3efba6c4c9 and likely is the cause
of this assertion. In general, assertion failures are almost always software
problems -- the programmer didn't see what they expected.

Dan, if you're listening, Matt would be the best person to weigh-in on this.
 -- richard
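
For anyone following along at home, a read-only import is the safe way to pull the data off
while this gets root-caused; a minimal sketch, with the pool name and altroot as placeholders:

# import read-only under an alternate root, copy the data off, then export
zpool import -o readonly=on -R /a tank
zpool export tank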

> 
>> If there is no hint
>> Next what I would try is a pool export. Then create a script that
>> imports the pool followed by a scrub cancel. (Hope that the cancel is
>> faster than the crash). Then check logs during some pool activity.
> 
> If I have not imported the pool RW can I export the pool?  I thought we
> have tried this but I will have to confer.
> 
>> If this does not help, I would remove all data disks and bootup.
>> Then hot-plug disk by disk and check if its detected properly and check
>> logs. Your pool remains offline until enough disks come back.
>> Adding disk by disk and checking logs should help to find a bad disk
>> that initiates a crash
> 
> This is interesting and we will try this once we secure the data.
> 
>> Next option is, try a pool import where always one or next disk is
>> missing. Until there is no write, missing disks are not a problem with
>> ZFS (you may need to clear errors).
> 
> Wouldn't this be the same as above hot-plugging disk by disk?
> 
>> Last option:
>> use another server where you try to import (mainboard, power,  hba or
>> backplane problem) remove all disks and do a nondestructive or smart
>> test on another machine
> 
> Sadly we do not have a spare chassis with 40 slots around to test this.
> I am so far unconvinced that this is a hardware problem though.
> 
> We will most likely boot up into linux live CD to run smartctl and see
> if it has any information on the disks.
> 
> -- 
> Derek T. Yarnell
> University of Maryland
> Institute for Advanced Computer Studies
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] allocation throttle

2015-06-28 Thread Richard Elling

> On Jun 27, 2015, at 4:03 PM, Dan McDonald  wrote:
> 
> On Jun 27, 2015, at 2:06 PM, Richard Elling  <mailto:richard.ell...@richardelling.com>> wrote:
> 
>> it has been in for a year or so
>> 
>>  -- richard
> 
> Eesh - which commit?  I can tell you which OmniOS release it first appeared in

time flies…
69962b56 Matt Ahrens, 2013-08-26
 — richard

> 
> Dan
> 
> Sent from my iPhone (typos, autocorrect, and all)

--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS considered in-memory OS ?

2015-06-28 Thread Richard Elling

> On Jun 26, 2015, at 8:47 AM, John Barfield  wrote:
> 
> I’ve been interested in configuring omnios to run in memory off of a ram 
> disk myself. 
> 
> Does anyone know where you could find a good guide for booting Solaris(h) 
> kernel into memory with a ramdisk?

:-)
the funny thing is that SunOS has been bootable into ramdisk since... forever.
Back in the day, before CDs were invented, we booted from tape or (if you really
hated yourself) floppies :-). Net-net, this probably isn't documented anywhere, per se,
outside of the shell scripts that build the image :-(

The way it works is to produce an image of a file system. Add that to a boot loader
and instruct the boot loader to put it in memory and run. The actual work is to produce
the image, and this is where the build scripts come into play. You can certainly take
an OmniOS build and produce an in-memory root; however, the missing management
part is handling the persistence of identity. For that, it is useful to see how SmartOS
handles /var and files normally modified in /etc.
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] allocation throttle

2015-06-27 Thread Richard Elling
it has been in for a year or so

  -- richard



> On Jun 27, 2015, at 8:13 AM, Dan McDonald  wrote:
> 
> 
>> On Jun 27, 2015, at 7:24 AM, Tobias Oetiker  wrote:
>> 
>> I am just watching OpenZFS Conference Videos. George Wilson just
>> showed off his allocation throttle work ... is this in omnios
>> already ?
> 
> If it's in illumos-gate, it's in at least OmniOS bloody.  I believe this work 
> hasn't been upstreamed yet out of DelphixOS.
> 
> Dan
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Zpool export while resilvering?

2015-06-09 Thread Richard Elling

> On Jun 9, 2015, at 12:00 PM, Narayan Desai  wrote:
> 
> You might also crank up the priority on your resilver, particularly if it is 
> getting tripped all of the time:
> http://broken.net/uncategorized/zfs-performance-tuning-for-scrubs-and-resilvers/
>  
> 
>  -nld

In general, yes, this is a very good post. However, for more recent ZFS, and certainly
the latest OmniOS something-14 release, the write throttle has been completely
rewritten, positively impacting resilvers. And, with that rewrite, there are a few more
tunables at your disposal, while the old ones fade to the bucket of bad memories :-)

In most cases, resilver is capped by the time to write to the resilvering device.
You can see this in iostat "-x" as the device that is 100% busy with write workload.
That said, for this specific case, the drives are not actually failed, just taken offline,
so you could have a short resilver session, once they are brought back online.
 -- richard

> 
> On Tue, Jun 9, 2015 at 1:31 PM, Dave Pooser  > wrote:
> >This is probably a silly question, but I¹ve honestly never tried this and
> >don¹t have a test machine handy at the moment ­ can a pool be safely
> >exported and re-imported later if it is currently resilvering?
> >
> >In the way of a bit of background, I have a pool made up with thirty or
> >so 4TB Seagate disks with a firmware issue that results in their max temp
> >being set at 40C as opposed to 60C. This particular pool is
> > in an office building in Texas, in an air-conditioned server room. The
> >condenser for this unit is in the building¹s plenum and when the building
> >a/c goes off over weekends in the summer my server room a/c struggles and
> >temps run up to about 85F or so. This
> > is causing my pool to drop random disks lately (fmadm reports high temp
> >and they get marked as removed from the pool), and I¹ve only just
> >narrowed it down to this firmware issue. Seagate firmware update utility
> >is apparently Windows only, so the disks must
> > come out for the firmware update, but the pool is resilvering several
> >disks with days remaining, hence my original query.
> 
> Not an answer to your question, but the approach I'd take is renting a
> portable 110V or 220V A/C unit from somebody like spot-coolers.com 
>  to get
> you through the resilver, then apply the firmware update. (And then I'd
> start trying to convince management that it's worth adding a unit
> permanently -- our Office Pro 24 24000BTU/hr cost us under $4k back in
> 2011.)
> --
> Dave Pooser
> Cat-Herder-in-Chief, Pooserville.com
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Zpool export while resilvering?

2015-06-09 Thread Richard Elling

> On Jun 9, 2015, at 8:05 AM, Robert A. Brock  
> wrote:
> 
> List,
>  
> This is probably a silly question, but I’ve honestly never tried this and 
> don’t have a test machine handy at the moment – can a pool be safely exported 
> and re-imported later if it is currently resilvering?

yes.

>  
> In the way of a bit of background, I have a pool made up with thirty or so 
> 4TB Seagate disks with a firmware issue that results in their max temp being 
> set at 40C as opposed to 60C.

yep, this is the broken 003 firmware from Seagate, know it well :-P

> This particular pool is in an office building in Texas, in an air-conditioned 
> server room. The condenser for this unit is in the building’s plenum and when 
> the building a/c goes off over weekends in the summer my server room a/c 
> struggles and temps run up to about 85F or so. This is causing my pool to 
> drop random disks lately (fmadm reports high temp and they get marked as 
> removed from the pool), and I’ve only just narrowed it down to this firmware 
> issue. Seagate firmware update utility is apparently Windows only, so the 
> disks must come out for the firmware update, but the pool is resilvering 
> several disks with days remaining, hence my original query.

fwflash might work, but it is unlikely Seagate knows anything about it. In any case,
firmware upgrades on production systems are not a best practice.

You can also disable the FMA agent, disk-transport, which is the agent responsible
for watching to ensure the temperature does not exceed the "temperature at which
the drive vendor says the drive should not be operated." The impact to you is that
the same agent detects other failures, such as predicted failures, that probably do
need to be noticed. For a short window, this option might work for you.

Useful commands:
fmstat - shows the current FMA modules, and should include 
disk-transport
fmadm unload disk-transport
fmadm load disk-transport

The temp checks (and PFA) are done once per hour, by default.
 -- richard


>  
> Regards,
> Rob
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] disk failure causing reboot?

2015-05-21 Thread Richard Elling

> On May 18, 2015, at 11:25 AM, Jeff Stockett  wrote:
> 
> A drive failed in one of our supermicro 5048R-E1CR36L servers running omnios 
> r151012 last night, and somewhat unexpectedly, the whole system seems to have 
> panicked.
>  
> May 18 04:43:08 zfs01 scsi: [ID 365881 kern.info] 
> /pci@0,0/pci8086,2f02@1/pci15d9,808@0 (mpt_sas0):
> May 18 04:43:08 zfs01 Log info 0x3114 received for target 29 
> w5c0f01f1bf06.
> May 18 04:43:08 zfs01 scsi_status=0x0, ioc_status=0x8048, 
> scsi_state=0xc

[forward reference]

> May 18 04:44:36 zfs01 genunix: [ID 843051 kern.info] NOTICE: SUNW-MSG-ID: 
> SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
> May 18 04:44:36 zfs01 unix: [ID 836849 kern.notice]
> May 18 04:44:36 zfs01 ^Mpanic[cpu0]/thread=ff00f3ecbc40:
> May 18 04:44:36 zfs01 genunix: [ID 918906 kern.notice] I/O to pool 'dpool' 
> appears to be hung.
> May 18 04:44:36 zfs01 unix: [ID 10 kern.notice]
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3ecba20 
> zfs:vdev_deadman+10b ()

Bugs notwithstanding, the ZFS deadman timer fires when a ZFS I/O does not
complete within 10,000 seconds (by default). The problem likely lies below ZFS. For this
reason, the deadman timer was invented -- don't blame ZFS for a problem below ZFS.

> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3ecba70 
> zfs:vdev_deadman+4a ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3ecbac0 
> zfs:vdev_deadman+4a ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3ecbaf0 
> zfs:spa_deadman+ad ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3ecbb90 
> genunix:cyclic_softint+fd ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3ecbba0 
> unix:cbe_low_level+14 ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3ecbbf0 
> unix:av_dispatch_softvect+78 ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3ecbc20 
> apix:apix_dispatch_softint+35 ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e05990 
> unix:switch_sp_and_call+13 ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e059e0 
> apix:apix_do_softint+6c ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e05a40 
> apix:apix_do_interrupt+34a ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e05a50 
> unix:cmnint+ba ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e05bc0 
> unix:acpi_cpu_cstate+11b ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e05bf0 
> unix:cpu_acpi_idle+8d ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e05c00 
> unix:cpu_idle_adaptive+13 ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e05c20 
> unix:idle+a7 ()
> May 18 04:44:36 zfs01 genunix: [ID 655072 kern.notice] ff00f3e05c30 
> unix:thread_start+8 ()
> May 18 04:44:36 zfs01 unix: [ID 10 kern.notice]
> May 18 04:44:36 zfs01 genunix: [ID 672855 kern.notice] syncing file systems...
> May 18 04:44:38 zfs01 genunix: [ID 904073 kern.notice]  done
> May 18 04:44:39 zfs01 genunix: [ID 111219 kern.notice] dumping to 
> /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> May 18 04:44:39 zfs01 ahci: [ID 405573 kern.info] NOTICE: ahci0: 
> ahci_tran_reset_dport port 1 reset port
> May 18 05:17:56 zfs01 genunix: [ID 10 kern.notice]
> May 18 05:17:56 zfs01 genunix: [ID 665016 kern.notice] ^M100% done: 8607621 
> pages dumped,
> May 18 05:17:56 zfs01 genunix: [ID 851671 kern.notice] dump succeeded
>  
> The disks are all 4TB WD40001FYYG enterprise SAS drives.
> 

I've had such bad luck with that model, IMNSHO I recommend replacing with 
anything else :-(

That said, I don't think it is a root cause for this panic. To get the trail of tears, you'll need to
look at the FMA ereports for the 10,000 seconds prior to the panic. fmdump has a -t option you'll
find useful. The [forward reference] is the result of a SCSI reset of the target, LUN, or HBA.
These occur when the sd driver has not had a reply and issues one of those types of resets *or*
the device or something in the data path resets.

HTH,
 -- richard
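
A rough sketch of pulling that window, with placeholder timestamps bracketing the log
excerpt above (see fmdump(1M) for the accepted time formats):

# verbose error reports between the first mpt_sas complaint and the panic
fmdump -eV -t '05/18/15 01:50:00' -T '05/18/15 04:45:00'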

>   Googling seems to indicate it is a known problem with the way the various 
> subsystems sometimes interact. Is there any way to fix/workaround this issue?
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] nfs client in a zone won't start

2015-05-15 Thread Richard Elling

> On May 15, 2015, at 2:25 PM, Jim Klimov  wrote:
> 
> On May 15, 2015, at 21:56:09 CEST, "mar...@waldenvik.se"  wrote:
> пишет:
>> Hi
>> 
>> I created a zone per omnios wiki for a mysql-server (omnios r151014).
>> But i can’t seem to start the nfs/client service. It just says
>> offline*.There are no clue in any of the logs. If i do a svcadm enable
>> -r nfs/client it says svcadm: svc:/milestone/network depends on
>> svc:/network/physical, which has multiple instances.
>> 
>> Any help would be appreciated. I wish to mount a nfs share for backing
>> up mysql-databases
>> 
>> Regards
>> Martin
>> Sent with Airmail
>> 
>> 
>> 
>> 
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 
> The state offline* (with asterisk) means transition from offline (is in 
> process of onlining). You might want to look into 
> /var/svc/log/*nfs-client*log for possible more details, and/or to manually 
> rerun (or instrument with 'sh -x' and the likes) the scripts and bits of the 
> service to trace into the problem.

pro tip:
cat $(svcs -L nfs/client)
 -- richard
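
and, when a service sits in offline*, a couple of other quick checks (a sketch):

svcs -xv nfs/client    # explain why it is not running and what it is waiting on
svcs -d nfs/client     # show its dependencies and their current states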

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Writeback Cache Auto disabled

2015-05-08 Thread Richard Elling

> On May 6, 2015, at 12:49 AM, d...@xmweixun.com wrote:
> 
> Hi Richard
>  I use stmfadm modify-lu -p wcd=false LU Name to change the write cache to
> enabled, but when the client reads or writes I/O from the LU, the LU status (writeback cache)
> changes to disabled again.

This is correct. Initiators can override the target's default.
 -- richard

>  
>  
>  
> Best Regards,
> Deng Wei Quan / 邓伟权
> Mob: +86 13906055059
> Mail: d...@xmweixun.com <mailto:d...@xmweixun.com>
> 厦门维讯信息科技有限公司
>  
> From: dwq+auto_=dengweiquan=139@xmweixun.com on behalf of Richard Elling
> Sent: May 5, 2015, 23:17
> To: d...@xmweixun.com
> Cc: omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] Writeback Cache Auto disabled
>  
>  
>> On May 5, 2015, at 12:54 AM, d...@xmweixun.com wrote:
>>  
>> Hi All,
>>  When I present lu to hpux or aix, lu writeback cache auto 
>> disabled,why?
>  
> In SCSI, initiators can change the write cache policy.
>  — richard
> 
> 
>>  
>> LU Name: 600144F05548DC360005
>> Operational Status: Online
>> Provider Name : sbd
>> Alias : /dev/zvol/rdsk/wxnas/hpuxtest03
>> View Entry Count  : 1
>> Data File : /dev/zvol/rdsk/wxnas/hpuxtest03
>> Meta File : not set
>> Size  : 21474836480
>> Block Size: 512
>> Management URL: not set
>> Vendor ID : SUN 
>> Product ID: COMSTAR 
>> Serial Num: not set
>> Write Protect : Disabled
>> Writeback Cache   : Disabled
>> Access State  : Active
>>  
>>  
>> Thanks.
>>  
>> Version:
>> SunOS wxos1 5.11 omnios-b281e50 i86pc i386 i86pc
>> Deng
>>  
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com <mailto:OmniOS-discuss@lists.omniti.com>
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
>> <http://lists.omniti.com/mailman/listinfo/omnios-discuss>
>  
> --
>  
> richard.ell...@richardelling.com <mailto:richard.ell...@richardelling.com>
> +1-760-896-4422
> 
> 
>  

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Writeback Cache Auto disabled

2015-05-05 Thread Richard Elling

> On May 5, 2015, at 12:54 AM,   wrote:
> 
> Hi All,
>  When I present lu to hpux or aix, lu writeback cache auto 
> disabled,why?

In SCSI, initiators can change the write cache policy.
 — richard

>  
> LU Name: 600144F05548DC360005
> Operational Status: Online
> Provider Name : sbd
> Alias : /dev/zvol/rdsk/wxnas/hpuxtest03
> View Entry Count  : 1
> Data File : /dev/zvol/rdsk/wxnas/hpuxtest03
> Meta File : not set
> Size  : 21474836480
> Block Size: 512
> Management URL: not set
> Vendor ID : SUN 
> Product ID: COMSTAR 
> Serial Num: not set
> Write Protect : Disabled
> Writeback Cache   : Disabled
> Access State  : Active
>  
>  
> Thanks.
>  
> Version:
> SunOS wxos1 5.11 omnios-b281e50 i86pc i386 i86pc
> Deng
>  
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] What do people use for basic system monitoring?

2015-04-21 Thread Richard Elling

> On Apr 21, 2015, at 2:41 PM, Theo Schlossnagle  wrote:
> 
> Given that several of the original core OmniOS team work for Circonus, I'd 
> say the answer from this side would be pretty biased.
> 
> Collectd works okay, but certainly isn't my preference as the polling 
> interval can't easily modified on-demand during troubleshooting.

We've done a bunch of work on collectd collectors. It has the benefit of being lightweight and
low-impact, but isn't as inherently flexible as nad.
https://github.com/Coraid/coraid-collectd

> 
> We use nad everywhere: https://github.com/circonus-labs/nad 
>   It exposes systems telemetry in JSON 
> over HTTP and has some really nice features like exposing histograms of 
> syscall latencies and/or disk I/O latencies allowing you to track the latency 
> of every individual I/O against every spindle -- nice for understanding 
> workload changes and disk behavior issues.
> 
> As it is JSON data, it should trivial to pump it into just about any metrics 
> systems... Circonus is free for up to 500 metrics:
> 
> http://www.circonus.com/free-account/ 
> 
> 
> On Tue, Apr 21, 2015 at 4:51 PM, Chris Siebenmann  > wrote:
>  Out of curiosity: I suspect that plenty of people are gathering basic
> system activity stats for their OmniOS systems and pushing them into
> modern metrics systems such as graphite (to pick perhaps the most well
> known package for this). For those that are doing this, what is your
> preferred collection agent?

graphite is at the end of its life, though we can still feed it from collectd.
There are many things I like about Circonus, but for various reasons we've
been going with influxdb as an interesting target.

> 
> (My ideal collection agent would be able to gather stats for ZFS,
> network and disk IO, and general kernel stats analogous to vmstat
> and mpstat.)

The upstream (collectd.org) collectd collectors are pretty generic and lowest-common
denominator. To get details like mpstat/vmstat we added new collectors, see above link.
 -- richard

> 
>  Thanks in advance.
> 
> - cks
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
> 
> 
> 
> -- 
> Theo Schlossnagle
> 
> http://omniti.com/is/theo-schlossnagle 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Internal pkg error during a test r151010 to r151014 upgrade

2015-04-08 Thread Richard Elling

> On Apr 7, 2015, at 9:40 AM, Chris Siebenmann  wrote:
> 
>> History lesson: until people could afford to purchase more than one
>> disk and before Sun invented the diskless workstation (with shared
>> /usr), everything was under /.
> 
> As Richard knows but other people may not, this is ahistorical on
> Unix. From almost the beginning[*] Unix had a split between the root
> filesystem and the /usr filesystem, based (as far as I understand
> it) on the physical disks involved at Bell Labs CSRG on their Unix
> machine. This is part of why the split of commands between /bin and
> /usr/bin existed for years. Sun's diskless machines did not invent a
> split /usr, they just took advantage of existing practice and made it
> read-only and shared.

Splitting some hairs... originally the OS was in / and user programs were in /usr
(hence the name). It was later that the thing we now call the "OS" moved
to /usr. Now the "OS" is moving elsewhere, invading as it goes, as Volker
described rather well.

The key point is that trying to use filesystem(5) as written only works if
everyone uses it, and they don't :-( with /opt being the perfect example of
organizational dysfunction. The packaging system doesn't matter as it just
sweeps the dust under the rug.

IMHO the companies that solve this take the reductionist path: one file
system. When done well, upgrades and installation are painless and my
grandmother has no problem upgrading her phone without assistance.

My approach to creating filesystems is based on policies to be applied,
where those policies are fundamental to filesystems. Harkening back to
the diskless workstation example or the more modern SmartOS model,
there can be a readonly policy for the fixed OS bits that are replaced en
masse. Other common policy knobs include: quota, reservation, backup,
dedup, compression.

Back to the problems at hand:
1. BE managing /opt as if it was its own, exclusive, waterfront resort.
   IMHO trying to assert an upgrade en masse policy to /opt is futile.
   Oracle's hack in Solaris 11.2 just kicks the can down the street.

2. Saving space for dumps.
    Don't waste time dumping to ZFS; set up a dump device on a raw partition
   somewhere. No need to mirror it or back it up.

 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Internal pkg error during a test r151010 to r151014 upgrade

2015-04-07 Thread Richard Elling

> On Apr 7, 2015, at 8:49 AM, Chris Siebenmann  wrote:
> 
>> Short story is that /opt is part of a namespace managed by the Solaris
>> packaging and as such is part of a BE fs tree. If you have privately
>> managed packages under certain subdirs, turn those sub-dirs into
>> separate datasets instead.
> 
> If this is the case for OmniOS, I believe that it should be strongly
> and visibly documented in the OmniOS wiki as part of the install
> instructions and so on. It is not intuitive to me at all, and in fact
> I would strongly expect that it would not be the case as /opt is where
> people have traditionally put *third party* software.

See filesystem(5)

History lesson: until people could afford to purchase more than one disk
and before Sun invented the diskless workstation (with shared /usr), everything
was under /.  Indeed, many other wildly successful OSes (MS-DOS, MS-Windows,
OSX, Android) do likewise. As such, having separate filesystems is actually 
detrimental to systems management and only arrived at the marketplace when the
OS outgrew the drives available at the time *and* people could start to afford
buying more than one drive. RAID-0 came later.

In other words, it is a bad idea to manage many filesystems under the pretense
that there is some magical contract regarding their stability. Today, we have
more options available: pooled storage, snapshots, etc. that make managing
multiple filesystems almost as easy as managing one.
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks

2015-03-30 Thread Richard Elling

> On Mar 30, 2015, at 1:16 PM, wuffers  wrote:
> 
> 
>> On Mar 30, 2015, at 4:10 PM, Richard Elling 
>>  wrote:
>> 
>> 
>> is compression enabled?
>> 
>> 
>>  -- richard
>>> 
> 
> Yes, LZ4. Dedupe off.

Ironically, WRITE_SAME is the perfect workload for dedup :-)
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks

2015-03-30 Thread Richard Elling

On Mar 26, 2015, at 11:24 PM, wuffers  wrote:

>> 
>> So here's what I will attempt to test:
>> - Create thin vmdk @ 10TB with vSphere fat client: PASS 
>> - Create lazy zeroed vmdk @ 10 TB with vSphere fat client: PASS
>> - Create eager zeroed vmdk @ 10 TB with vSphere web client: PASS! (took 1 
>> hour)
>> - Create thin vmdk @ 10TB with vSphere web client: PASS
>> - Create lazy zeroed vmdk @ 10 TB with vSphere web client: PASS
> 
> Additionally, I tried:
> - Create fixed vhdx @ 10TB with SCVMM (Hyper-V): PASS (most likely no 
> primitives in use here - this took slightly over 3 hours)

is compression enabled?


  -- richard

>  
> Everything passed (which I didn't expect, especially the 10TB eager zero).. 
> then I tried again on the vSphere web client for a 20TB eager zero disk, and 
> I got another kernel panic altogether (no kmem_flags 0xf set, unfortunately).
> 
> Mar 27 2015 01:09:33.66406 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd 
> SUNOS-8000-KL
> 
>   TIME CLASS ENA
>   Mar 27 01:09:33.6307 ireport.os.sunos.panic.dump_available 
> 0x
>   Mar 27 01:08:30.6688 ireport.os.sunos.panic.dump_pending_on_device 
> 0x
> 
> nvlist version: 0
> version = 0x0
> class = list.suspect
> uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
> code = SUNOS-8000-KL
> diag-time = 1427432973 633746
> de = fmd:///module/software-diagnosis
> fault-list-sz = 0x1
> fault-list = (array of embedded nvlists)
> (start fault-list[0])
> nvlist version: 0
> version = 0x0
> class = defect.sunos.kernel.panic
> certainty = 0x64
> asru = 
> sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
> resource = 
> sw:///:path=/var/crash/unknown/.2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
> savecore-succcess = 1
> dump-dir = /var/crash/unknown
> dump-files = vmdump.2
> os-instance-uuid = 2e2305c2-54b5-c1f4-aafd-fb1eccc982dd
> panicstr = BAD TRAP: type=d (#gp General protection) 
> rp=ff01eb72ea70 addr=0
> panicstack = unix:real_mode_stop_cpu_stage2_end+9e23 () | 
> unix:trap+a30 () | unix:cmntrap+e6 () | genunix:anon_decref+35 () | 
> genunix:anon_free+74 () | genunix:segvn_free+242 () | genunix:seg_free+30 () 
> | genunix:segvn_unmap+cde () | genunix:as_free+e7 () | genunix:relvm+220 () | 
> genunix:proc_exit+454 () | genunix:exit+15 () | genunix:rexit+18 () | 
> unix:brand_sys_sysenter+1c9 () |
> crashtime = 1427431421
> panic-time = Fri Mar 27 00:43:41 2015 EDT
> (end fault-list[0])
> 
> fault-status = 0x1
> severity = Major
> __ttl = 0x1
> __tod = 0x5514e60d 0x2794c060
> 
> Crash file:
> https://drive.google.com/file/d/0B7mCJnZUzJPKT0lpTW9GZFJCLTg/view?usp=sharing
> 
> It appears I can do thin and lazy zero disks of those sizes, so I will have 
> to be satisfied to use those options as a workaround (plus disabling 
> WRITE_SAME from the hosts if I really wanted the eager zeroed disk) until 
> some of that Nexenta COMSTAR love is upstreamed. For comparison sake, 
> provisioning a 10TB fixed vhdx took approximately 3 hours in Hyper-V, while 
> the same provisioning in VMware took about 1 hour. So we can say that 
> WRITE_SAME accelerated the same job by 3x.
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] best or preferred 10g card for OmniOS

2015-03-28 Thread Richard Elling

> On Mar 26, 2015, at 9:24 AM, Doug Hughes  wrote:
> 
> any recommendations? We're having some pretty big problems with the 
> Solarflare card and driver dropping network under high load. We eliminated 
> LACP as a culprit, and the switch.
> 
> Intel? Chelsio? other?

I've been running exclusively Intel for several years now. It gets the most
attention in the illumos community.

 -- richard


> 
> - Doug
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] How to check if you have enough NFS server threads?

2015-03-20 Thread Richard Elling

> On Mar 20, 2015, at 1:09 PM, Chris Siebenmann  wrote:
> 
> We're running into a situation with one of our NFS ZFS fileservers[*]
> where we're wondering if we have enough NFS server threads to handle
> our load. Per 'sharectl get nfs', we have 'servers=512' configured,
> but we're not sure we know how to check how many are actually in use
> and active at any given time and whether or not we're running into
> this limit.
> 
> Does anyone know how to tell either?

Yes, these are dynamically sized and you can track the number of current threads
as shown by ps, or something sneaky like "ls /proc/$(pgrep nfsd)/lwp | wc -l"

Some distros, including Solaris 11.1, have kstats for this information. So when we track
them over time, they can and do change dynamically and quickly.

> 
> We've looked at mdb -k's '::svc_pool nfs' but I've concluded that I
> don't know enough about OmniOS kernel internals to know for sure what
> it's telling us (partly because it seems to be giving us implausibly
> high numbers). Is the number we're looking for 'Non detached threads'
> minus 'Asleep threads'? (Or that plus detached threads?)

In general, the number of threads is an indication of the load of the clients and
the service ability of the server (in queuing theory terms). Too much load gives
the same result as too slow of a back-end. In NFS, clients limit the number of
concurrent requests, which is the best way to deal with too much load.
 -- richard
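
A crude sketch for sampling that over time, assuming the server-side daemon is the usual
nfsd process:

# log the nfsd LWP count every 10 seconds; compare against the servers=512 ceiling
while :; do
    printf '%s %s\n' "$(date +%T)" "$(ls /proc/$(pgrep -x nfsd)/lwp | wc -l)"
    sleep 10
done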

> 
> Thanks in advance.
> 
>   - cks
> [*: our server setup and configuration is:
>   http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFileserverSetupII
> ]
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] QLE2652 I/O Disconnect. Heat Sinks?

2015-03-06 Thread Richard Elling

> On Mar 5, 2015, at 6:00 AM, Nate Smith  wrote:
> 
> I’ve had this problem for a while, and I have no way to diagnose what is 
> going on, but occasionally when system IO gets high (I’ve seen it happen 
> especially on backups), I will lose connectivity with my Fibre Channel cards 
> which serve up fibre channel LUNS to a VM cluster. All hell breaks loose, and 
> then connectivity gets restored. I don’t get an error that it’s dropped, at 
> least not on the Omnios system, but I get notice when it’s restored (which 
> makes no sense). I’m wondering if the cards are just overheating, and if heat 
> sinks with a fan would help on the io chip.

Is there a PCI bridge in the data path? These can often be found on mezzanine 
or riser cards.
 -- richard
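
A rough way to eyeball that from the running system (a sketch; PCIe bridge nodes normally
bind to the pcieb driver, older PCI-PCI bridges to ppb):

# device tree with bound drivers; look for pcieb/ppb nodes between the root
# complex and the qlt instances
prtconf -D | egrep -i 'pcieb|ppb|qlt'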

>  
> Mar  5 01:55:01 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
> portid 2, topology Fabric Pt-to-Pt,speed 8G
> Mar  5 01:56:26 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt0,0 LINK UP, 
> portid 20100, topology Fabric Pt-to-Pt,speed 8G
> Mar  5 02:00:13 newstorm last message repeated 1 time
> Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt3,0 LINK UP, 
> portid 1, topology Fabric Pt-to-Pt,speed 8G
> Mar  5 02:00:15 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt2,0 LINK UP, 
> portid 2, topology Fabric Pt-to-Pt,speed 8G
> Mar  5 02:00:18 newstorm fct: [ID 132490 kern.notice] NOTICE: qlt1,0 LINK UP, 
> portid 10100, topology Fabric Pt-to-Pt,speed 8G
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] HGST 7K6000 for a RAIDZ2 pool

2015-02-25 Thread Richard Elling

> On Feb 25, 2015, at 3:17 PM, Tobias Oetiker  wrote:
> 
> experts!
> 
> If you were to buy 6TB disks for a RAIDZ2 Pool, would you go for
> 512n like in the olden days, or use the new 4Kn.
> 
> I know ZFS can deal with both ...
> 
> So what would be your choice, and WHY?

Better yet, what would be your requirements, then your choice, then why?
 -- richard

> 
> cheers
> tobi
> 
> -- 
> Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
> www.oetiker.ch t...@oetiker.ch +41 62 775 9902
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS Slog - force all writes to go to Slog

2015-02-18 Thread Richard Elling

> On Feb 18, 2015, at 12:04 PM, Rune Tipsmark  wrote:
> 
> hi all,
>  
> I found an entry about zil_slog_limit here: 
> http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSWritesAndZILII 
> 
> it basically explains how writes larger than 1MB per default hits the main 
> pool rather than my Slog device - I could not find much further information 
> nor the equivalent setting in OmniOS. I also read 
> http://nex7.blogspot.ca/2013/04/zfs-intent-log.html 
>  but it didn't truly 
> help me understand just how I can force every written byte to my ZFS box to 
> go the ZIL regardless of size, I never ever want anything to go directly to 
> my man pool ever.
> 

"never ever want anything to go to main pool" is not feasible. The ZIL is a ZFS 
Intent Log
http://en.wikipedia.org/wiki/Intent_log 
 and, unless you overwrite prior to 
txg commit, everything
ends up in the main pool.

>  
> I have sync=always and disabled write back cache on my volume based LU's.
>  
> Testing with zfs_txg_timeout set to 30 or 60 seconds seems to make no 
> difference if I write large files to my LU's - I don't seem the write speed 
> being consistent with the performance of the Slog devices. It looks as if it 
> goes straight to disk and hence the performance is less than great to say the 
> least.
> 

Ultimately, the pool must be able to sustain the workload, or it will have to 
throttle.

The comment for zil_slog_limit is concise:
/*
 * Use the slog as long as the logbias is 'latency' and the current commit size
 * doesn't exceed the limit or the total list size doesn't exceed its limit.
 * Limit checking is disabled by setting zil_slog_limit to UINT64_MAX.
 */
uint64_t zil_slog_limit = (1024 * 1024);
uint64_t zil_slog_list_limit = (1024 * 1024 * 200);

and you can change this on the fly using mdb to experiment.
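
For example, a sketch of lifting the limit on a live system to experiment (per the comment
above, UINT64_MAX disables the check; this writes kernel memory, so test on a lab box first):

echo 'zil_slog_limit/Z 0xffffffffffffffff' | mdb -kw
echo 'zil_slog_limit/Z 0x100000' | mdb -kw     # put the 1MB default back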

>  
> How do I ensure 100% that all writes always goes to my Slog devices - no 
> exceptions.
> 

The question really isn't "how"; the question is "why". Now that you know what an
Intent Log is, and how the performance of the pool is your ultimate limit, perhaps you
can explain what you are really trying to accomplish?
 -- richard

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Mildly confusing ZFS iostat output

2015-01-26 Thread Richard Elling

> On Jan 26, 2015, at 5:16 PM, W Verb  wrote:
> 
> Hello All,
> 
> I am mildly confused by something iostat does when displaying statistics for 
> a zpool. Before I begin rooting through the iostat source, does anyone have 
> an idea of why I am seeing high "wait" and "wsvc_t" values for "ppool" when 
> my devices apparently are not busy? I would have assumed that the stats for 
> the pool would be the sum of the stats for the zdevs

welcome to queuing theory! ;-)

First, iostat knows nothing about the devices being measured. It is really just a processor
for kstats of type KSTAT_TYPE_IO (see the kstat(3kstat) man page for discussion). For that
type, you get a 2-queue set. For many cases, 2 queues is a fine model, but when there is
only one interesting queue, sometimes developers choose to put less interesting info in the
"wait" queue.

Second, it is the responsibility of the developer to define the queues. In the case of pools,
the queues are defined as:
wait = vdev_queue_io_add() until vdev_queue_io_remove()
run = vdev_queue_pending_add() until vdev_queue_pending_remove()

The run queue is closer to the actual measured I/O to the vdev (the juicy performance bits).
The wait queue is closer to the transaction engine and includes time for aggregation.
Thus we expect the wait queue to be higher, especially for async workloads. But since I/Os
can and do get aggregated prior to being sent to the vdev, it is not a very useful measure of
overall performance. In other words, optimizing this away could actually hurt performance.

In general, worry about the run queues and don't worry so much about the wait queues.
NB, iostat calls "run" queues "active" queues. You say Tomato, I say 'mater.
 -- richard
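
For day-to-day watching, a small sketch that keeps just the run-queue columns for the pool
row (column positions as in the iostat -xn output quoted below):

# actv, asvc_t and %b for the ppool line only
iostat -xn 10 | awk '$11 == "ppool" {print $6, $8, $10}'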


> 
> extended device statistics
> r/sw/s   kr/s kw/s  wait actv wsvc_t asvc_t  %w  %b device
>10.0 9183.0   40.5 344942.0   0.0  1.80.00.2   0 178 c4
> 1.0  187.04.0  19684.0   0.0  0.10.00.5   0   8 
> c4t5000C5006A597B93d0
> 2.0  199.0   12.0  20908.0   0.0  0.10.00.6   0  12 
> c4t5000C500653DE049d0
> 2.0  197.08.0  20788.0   0.0  0.20.00.8   0  15 
> c4t5000C5003607D87Bd0
> 0.0  202.00.0  20908.0   0.0  0.10.00.6   0  11 
> c4t5000C5006A5903A2d0
> 0.0  189.00.0  19684.0   0.0  0.10.00.5   0  10 
> c4t5000C500653DEE58d0
> 5.0  957.0   16.5   1966.5   0.0  0.10.00.1   0   7 
> c4t50026B723A07AC78d0
> 0.0  201.00.0  20787.9   0.0  0.10.00.7   0  14 
> c4t5000C5003604ED37d0
> 0.00.00.0  0.0   0.0  0.00.00.0   0   0 
> c4t5000C500653E447Ad0
> 0.0 3525.00.0 110107.7   0.0  0.50.00.2   0  51 
> c4t500253887000690Dd0
> 0.0 3526.00.0 110107.7   0.0  0.50.00.1   1  50 
> c4t5002538870006917d0
>10.0 6046.0   40.5 344941.5 837.4  1.9  138.30.3  23  67 ppool
> 
> 
> For those following the VAAI thread, this is the system I will be using as my 
> testbed.
> 
> Here is the structure of ppool (taken at a different time than above):
> 
> root@sanbox:/root# zpool iostat -v ppool
>   capacity operationsbandwidth
> pool   alloc   free   read  write   read  write
> -  -  -  -  -  -  -
> ppool   191G  7.97T 23637   140K  15.0M
>   mirror   63.5G  2.66T  7133  46.3K   840K
> c4t5000C5006A597B93d0  -  -  1 13  24.3K   844K
> c4t5000C500653DEE58d0  -  -  1 13  24.1K   844K
>   mirror   63.6G  2.66T  7133  46.5K   839K
> c4t5000C5006A5903A2d0  -  -  1 13  24.0K   844K
> c4t5000C500653DE049d0  -  -  1 13  24.6K   844K
>   mirror   63.5G  2.66T  7133  46.8K   839K
> c4t5000C5003607D87Bd0  -  -  1 13  24.5K   843K
> c4t5000C5003604ED37d0  -  -  1 13  24.4K   843K
> logs   -  -  -  -  -  -
>   mirror301M   222G  0236  0  12.5M
> c4t5002538870006917d0  -  -  0236  5  12.5M
> c4t500253887000690Dd0  -  -  0236  5  12.5M
> cache  -  -  -  -  -  -
>   c4t50026B723A07AC78d062.3G  11.4G 19113  83.0K  1.07M
> -  -  -  -  -  -  -
> 
> root@sanbox:/root# zfs get all ppool
> NAME   PROPERTY  VALUE  SOURCE
> ppool  type  filesystem -
> ppool  creation  Sat Jan 24 18:37 2015  -
> ppool  used  5.16T  -
> ppool  available 2.74T  -
> ppool  referenced96K-
> ppool  compressratio 1.51x  -
> ppool  

Re: [OmniOS-discuss] iostat skip first output

2015-01-24 Thread Richard Elling

> On Jan 24, 2015, at 9:25 AM, Rune Tipsmark  wrote:
> 
> hi all, I am just writing some scripts to gather performance data from 
> iostat... or at least trying... I would like to completely skip the first 
> output since boot from iostat output and just get right to the period I 
> specified with the data current from that period. Is this possible at all?
> 

iostat -xn 10 2 | awk '$1 == "extended" && NR > 2 {show=1} show == 1'

NB, this is just a derivative of a sample period. A better approach is to store
long-term trends in a database intended for such use. If that is too much work,
then you should consider storing the raw data that iostat uses for this:
kstat -p 'sd::/sd[0-9]+$/'

or in JSON:
kstat -jp 'sd::/sd[0-9]+$/'

insert shameless plug for Circonus here :-)
 -- richard
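
If the raw-kstat route is taken, a sketch of a collector loop that sidesteps the since-boot
sample problem entirely (output path is a placeholder):

# dump the sd kstats as JSON once a minute; compute deltas later in whatever
# consumes the files
while :; do
    kstat -jp 'sd::/sd[0-9]+$/' > /var/tmp/sdstats.$(date +%s).json
    sleep 60
done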

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running?

2015-01-20 Thread Richard Elling

> On Jan 20, 2015, at 11:30 AM, Stephan Budach  wrote:
> 
> On 20.01.15 at 16:42, Dan McDonald wrote:
>> Check the firmware revisions on both mpt_sas controllers.  It's possible one 
>> need up-or-down grading.
>> 
>> There are known good and known bad revisions of the mpt_sas firmware.  Other 
>> on this list are more cognizant of what those revisions are.
>> 
>> Dan
>> 
> Hi Dan,
> 
> thanks - I do have all of my boxes equipped with the same hardware - LSI 
> 2907-8i. I am downloading the MRM software from LSI and will install it on my 
> backup host to check the firmware revisions of both HBAs on that box first.

avoid P20 like the plague (google it for the blow-by-blow accounts of pain)
 -- richard
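
A sketch for checking what is currently flashed, assuming the LSI sas2flash utility is
installed (package name and path vary):

sas2flash -listall      # firmware/BIOS versions for every attached controller
sas2flash -c 0 -list    # details for controller 0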

> 
> Cheers,
> budy
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed

2015-01-19 Thread Richard Elling
Thanks Rune, more below...

> On Jan 19, 2015, at 5:23 AM, Rune Tipsmark  wrote:
> 
> From: Richard Elling 
> Sent: Monday, January 19, 2015 1:57 PM
> To: Rune Tipsmark
> Cc: omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion 
> Speed
>  
> 
>> On Jan 19, 2015, at 3:55 AM, Rune Tipsmark <r...@steait.net> wrote:
>> 
>> hi all,
>>  
>> just in case there are other people out there using their ZFS box against 
>> vSphere 5.1 or later... I found my storage vmotion were slow... really 
>> slow... not much info available and so after a while of trial and error I 
>> found a nice combo that works very well in terms of performance, latency as 
>> well as throughput and storage vMotion.
>>  
>> - Use ZFS volumes instead of thin provisioned LU's - Volumes support two of 
>> the VAAI features
>> 
> 
> AFAIK, ZFS is not available in VMware. Do you mean run iSCSI to connect the 
> ESX box to
> the server running ZFS? If so...
> >> I run 8G Fibre Channel

ok, still it is COMSTAR, so the backend is the same

>> - Use thick provisioning disks, lazy zeroed disks in my case reduced storage 
>> vMotion by 90% or so - machine 1 dropped from 8½ minutes to 23 seconds and 
>> machine 2 dropped from ~7 minutes to 54 seconds... a rather nice improvement 
>> simply by changing from thin to thick provisioning.
>> 
> 
> This makes no difference in ZFS. The "thick provisioned" volume is simply a 
> volume with a reservation.
> All allocations are copy-on-write. So the only difference between a "thick" 
> and "thin" volume occurs when
> you run out of space in the pool.
> >> I am talking thick provisioning in VMware, that's where it makes a huge 
> >> difference

yes, you should always let VMware think it is thick provisioned, even if it isn't. VMware is too
ignorant of copy-on-write file systems to be able to make good decisions.

>> - I dropped my Qlogic HBA max queue depth from default 64 to 16 on all ESXi 
>> hosts and now I see an average latency of less than 1ms per data store (on 
>> 8G fibre channel).  Of course there are spikes when doing storage vMotion at 
>> these speeds but its well worth it.
>> 
> 
> I usually see storage vmotion running at wire speed for well configured 
> systems. When you get 
> into the 2GByte/sec range this can get tricky, because maintaining that flow 
> through the RAM
> and disks requires nontrivial amounts of hardware.
> >> I don't even get close to wire speed unfortunately my SLOGs can only do 
> >> around 5-600 MBbyte/sec with sync=always.

Indeed, the systems we make fast have enough hardware to be fast.

> More likely, you're seeing the effects of caching, which is very useful for 
> storage vmotion and
> allows you to hit line rate.
> 
> >> Not sure this is the case with using sync=always?

Caching will make a big difference. You should also see effective use of the 
ZFS prefetcher.

Thanks for sharing your experience.
 -- richard

>>  
>> I am getting to the point where I am almost happy with my ZFS backend for 
>> vSphere.
>> 
> 
> excellent!
>  -- richard
> 
> 

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed

2015-01-19 Thread Richard Elling

> On Jan 19, 2015, at 3:55 AM, Rune Tipsmark  wrote:
> 
> hi all,
>  
> just in case there are other people out there using their ZFS box against 
> vSphere 5.1 or later... I found my storage vmotion were slow... really 
> slow... not much info available and so after a while of trial and error I 
> found a nice combo that works very well in terms of performance, latency as 
> well as throughput and storage vMotion.
>  
> - Use ZFS volumes instead of thin provisioned LU's - Volumes support two of 
> the VAAI features
> 

AFAIK, ZFS is not available in VMware. Do you mean run iSCSI to connect the ESX box to
the server running ZFS? If so...

> - Use thick provisioning disks, lazy zeroed disks in my case reduced storage 
> vMotion by 90% or so - machine 1 dropped from 8½ minutes to 23 seconds and 
> machine 2 dropped from ~7 minutes to 54 seconds... a rather nice improvement 
> simply by changing from thin to thick provisioning.
> 

This makes no difference in ZFS. The "thick provisioned" volume is simply a volume with a reservation.
All allocations are copy-on-write. So the only difference between a "thick" and "thin" volume occurs when
you run out of space in the pool.
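
To make that concrete, a sketch of the two flavors (pool and sizes are placeholders); the
only difference is the reservation:

zfs create -s -V 2T tank/thinlun     # "thin": sparse volume, no reservation
zfs create -V 2T tank/thicklun       # "thick": same volume plus a refreservation
zfs get refreservation,volsize tank/thinlun tank/thicklun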

> - I dropped my Qlogic HBA max queue depth from default 64 to 16 on all ESXi 
> hosts and now I see an average latency of less than 1ms per data store (on 8G 
> fibre channel).  Of course there are spikes when doing storage vMotion at 
> these speeds but its well worth it.
> 

I usually see storage vmotion running at wire speed for well-configured systems. When you get
into the 2GByte/sec range this can get tricky, because maintaining that flow through the RAM
and disks requires nontrivial amounts of hardware.

More likely, you're seeing the effects of caching, which is very useful for 
storage vmotion and
allows you to hit line rate.

>  
> I am getting to the point where I am almost happy with my ZFS backend for 
> vSphere.
> 

excellent!
 -- richard


___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] assertion failed for thread / omnios r12

2015-01-09 Thread Richard Elling

> On Jan 9, 2015, at 1:33 PM, Randy S  wrote:
> 
> Hi all, 
> 
> Maybe this has been covered already (I saw a bug about this so I thought this 
> occurence should not be present in omnios r12) but when I do a zdb -d rpool 
> after having upgraded the rpool to the latest version, I get a :
> assertion failed for thread 0xfd7fff162a40, thread-id 1: 
> spa_writeable(vd->vdev_spa), file ../../../uts/common/fs/zfs/vdev.c, line 1566
> 
> What can have caused this. 

It's a bug; zdb doesn't open the pool for writing, so it can't be writable.

> 
> zpool upgrade rpool
> This system supports ZFS pool feature flags.
> 
> Enabled the following features on 'rpool':
>   lz4_compress
>   multi_vdev_crash_dump
>   spacemap_histogram
>   enabled_txg
>   hole_birth
>   extensible_dataset
>   embedded_data
>   bookmarks
>   filesystem_limits
> 
> Is there a way I can disable this spacemap feature after having done the 
> upgrade?
> It seems that Bug #5165 (https://www.illumos.org/issues/5165) is still in 
> there. 

yep

zdb is intended for debugging and isn't guaranteed to run successfully on imported
pools. There is likely some other way to get the info you're looking for... so what are
you looking for?
 -- richard


> 
> Regards,
> 
> R
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow NFS speeds at rsize > 128k

2015-01-07 Thread Richard Elling

> On Jan 7, 2015, at 1:21 PM, Stephan Budach  wrote:
> 
> On 07.01.15 at 21:48, Richard Elling wrote:
>> 
>>> On Jan 7, 2015, at 12:11 PM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>> 
>>> On 07.01.15 at 18:00, Richard Elling wrote:
>>>> 
>>>>> On Jan 7, 2015, at 2:28 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>>>> 
>>>>> Hello everyone,
>>>>> 
>>>>> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed really 
>>>>> bad NFS read performance, when rsize goes beyond 128k, whereas the 
>>>>> performance is just fine at 32k. The issue is, that the ovs-agent, which 
>>>>> is performing the actual mount, doesn't accept or pass any NFS mount 
>>>>> options to the NFS server.
>>>> 
>>>> The other issue is that illumos/Solaris on x86 tuning of server-side size 
>>>> settings does
>>>> not work because the compiler optimizes away the tunables. There is a 
>>>> trivial fix, but it
>>>> requires a rebuild.
>>>> 
>>>>> To give some numbers, a rsize of 1mb results in a read throughput of 
>>>>> approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS 
>>>>> export from a OEL 6u4 box has no issues with this, as the read speeds 
>>>>> from this export are 108+MB/s regardles of the rsize of the NFS mount.
>>>> 
>>>> Brendan wrote about a similar issue in the Dtrace book as a case study. 
>>>> See chapter 5
>>>> case study on ZFS 8KB mirror reads.
>>>> 
>>>>> 
>>>>> The OmniOS box is currently connected to a 10GbE port at our core 6509, 
>>>>> but the NFS client is connected through a 1GbE port only. MTU is at 1500 
>>>>> and can currently not be upped.
>>>>> Anyone having a tip, why a rsize of 64k+ will result in such a 
>>>>> performance drop?
>>>> 
>>>> It is entirely due to optimizations for small I/O going way back to the 
>>>> 1980s.
>>>>  -- richard
>>> But, doesn't that mean, that Oracle Solaris will have the same issue or has 
>>> Oracle addressed that in recent Solaris versions? Not, that I am intending 
>>> to switch over, but that would be something I'd like to give my SR engineer 
>>> to chew on…
>> 
>> Look for yourself :-)
>> In "broken" systems, such as this Solaris 11.1 system:
>> # echo nfs3_tsize::dis | mdb -k
>> nfs3_tsize: pushq  %rbp
>> nfs3_tsize+1:   movq   %rsp,%rbp
>> nfs3_tsize+4:   subq   $0x8,%rsp
>> nfs3_tsize+8:   movq   %rdi,-0x8(%rbp)
>> nfs3_tsize+0xc: movl   (%rdi),%eax
>> nfs3_tsize+0xe: leal   -0x2(%rax),%ecx
>> nfs3_tsize+0x11:cmpl   $0x1,%ecx
>> nfs3_tsize+0x14:jbe+0x12
>> nfs3_tsize+0x16:cmpl   $0x5,%eax
>> nfs3_tsize+0x19:movl   $0x10,%eax
>> nfs3_tsize+0x1e:movl   $0x8000,%ecx
>> nfs3_tsize+0x23:cmovl.ne %ecx,%eax
>> nfs3_tsize+0x26:jmp+0x5 
>> nfs3_tsize+0x28:movl   $0x10,%eax
>> nfs3_tsize+0x2d:leave  
>> nfs3_tsize+0x2e:ret
>> 
>> at +0x19 you'll notice hardwired 1MB
> Ouch! Is that from a NFS client or server?

server

> Or rather, I know that the NFS server negotiates the options with the client 
> and if no options are passed from the client to the server, the server sets 
> up the connection with it's defaults.

the server and client negotiate, so both can have defaults

> So, this S11.1 output - is that from the NFS server? If yes, it would mean 
> that the NFS server would go with the 1mb rsize/wsize since the OracleVM 
> Server has not provided any options to it.

You are not mistaken. AFAIK, this has been broken in Solaris x86 for more than 10 years.
Fortunately, most people can adjust on the client side, unless you're running ESX or something
that is difficult to adjust... like you seem to be.

>> 
>> by contrast, on a proper system
>> # echo nfs3_tsize::dis | mdb -k
>> nfs3_tsize: pushq  %rbp
>> nfs3_tsize+1:   movq   %rsp,%rbp
>> nfs3_tsize+4:   subq   $0x10,%rsp
>> nfs3_tsize+8:   movq   %rdi,-0x8(%rbp)
>> nfs3_tsize+0xc:  

Re: [OmniOS-discuss] Slow NFS speeds at rsize > 128k

2015-01-07 Thread Richard Elling

> On Jan 7, 2015, at 12:11 PM, Stephan Budach  wrote:
> 
> On 07.01.15 at 18:00, Richard Elling wrote:
>> 
>>> On Jan 7, 2015, at 2:28 AM, Stephan Budach <stephan.bud...@jvm.de> wrote:
>>> 
>>> Hello everyone,
>>> 
>>> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed really bad 
>>> NFS read performance, when rsize goes beyond 128k, whereas the performance 
>>> is just fine at 32k. The issue is, that the ovs-agent, which is performing 
>>> the actual mount, doesn't accept or pass any NFS mount options to the NFS 
>>> server.
>> 
>> The other issue is that illumos/Solaris on x86 tuning of server-side size 
>> settings does
>> not work because the compiler optimizes away the tunables. There is a 
>> trivial fix, but it
>> requires a rebuild.
>> 
>>> To give some numbers, a rsize of 1mb results in a read throughput of 
>>> approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS 
>>> export from a OEL 6u4 box has no issues with this, as the read speeds from 
>>> this export are 108+MB/s regardles of the rsize of the NFS mount.
>> 
>> Brendan wrote about a similar issue in the Dtrace book as a case study. See 
>> chapter 5
>> case study on ZFS 8KB mirror reads.
>> 
>>> 
>>> The OmniOS box is currently connected to a 10GbE port at our core 6509, but 
>>> the NFS client is connected through a 1GbE port only. MTU is at 1500 and 
>>> can currently not be upped.
>>> Anyone having a tip, why a rsize of 64k+ will result in such a performance 
>>> drop?
>> 
>> It is entirely due to optimizations for small I/O going way back to the 
>> 1980s.
>>  -- richard
> But, doesn't that mean, that Oracle Solaris will have the same issue or has 
> Oracle addressed that in recent Solaris versions? Not, that I am intending to 
> switch over, but that would be something I'd like to give my SR engineer to 
> chew on…

Look for yourself :-)
In "broken" systems, such as this Solaris 11.1 system:
# echo nfs3_tsize::dis | mdb -k
nfs3_tsize: pushq  %rbp
nfs3_tsize+1:   movq   %rsp,%rbp
nfs3_tsize+4:   subq   $0x8,%rsp
nfs3_tsize+8:   movq   %rdi,-0x8(%rbp)
nfs3_tsize+0xc: movl   (%rdi),%eax
nfs3_tsize+0xe: leal   -0x2(%rax),%ecx
nfs3_tsize+0x11:cmpl   $0x1,%ecx
nfs3_tsize+0x14:jbe+0x12
nfs3_tsize+0x16:cmpl   $0x5,%eax
nfs3_tsize+0x19:movl   $0x10,%eax
nfs3_tsize+0x1e:movl   $0x8000,%ecx
nfs3_tsize+0x23:cmovl.ne %ecx,%eax
nfs3_tsize+0x26:jmp+0x5 
nfs3_tsize+0x28:movl   $0x10,%eax
nfs3_tsize+0x2d:leave  
nfs3_tsize+0x2e:ret

at +0x19 you'll notice hardwired 1MB

by contrast, on a proper system
# echo nfs3_tsize::dis | mdb -k
nfs3_tsize: pushq  %rbp
nfs3_tsize+1:   movq   %rsp,%rbp
nfs3_tsize+4:   subq   $0x10,%rsp
nfs3_tsize+8:   movq   %rdi,-0x8(%rbp)
nfs3_tsize+0xc: movl   (%rdi),%edx
nfs3_tsize+0xe: leal   -0x2(%rdx),%eax
nfs3_tsize+0x11:cmpl   $0x1,%eax
nfs3_tsize+0x14:jbe+0x12
nfs3_tsize+0x16:
movl   -0x37f8ea60(%rip),%eax   
nfs3_tsize+0x1c:cmpl   $0x5,%edx
nfs3_tsize+0x1f:
cmovl.ne -0x37f8ea72(%rip),%eax 
nfs3_tsize+0x26:leave  
nfs3_tsize+0x27:ret
nfs3_tsize+0x28:
movl   -0x37f8ea76(%rip),%eax   
nfs3_tsize+0x2e:leave  
nfs3_tsize+0x2f:ret

where you can actually tune it according to the Solaris Tunable Parameters 
guide.
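
On such a build the adjustment is a one-liner. A sketch, assuming the tunable
names match the illumos sources and the tunables guide; verify on your build
before poking:

# echo 'nfs3_max_transfer_size_cots/D' | mdb -k

confirms the symbol exists and shows the current value; then cap it at runtime,
e.g. to 128k:

# echo 'nfs3_max_transfer_size_cots/W 0t131072' | mdb -kw

or persistently via /etc/system:

set nfs:nfs3_max_transfer_size_cots=131072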

NB, we fixed this years ago at Nexenta and I'm certain it has not been 
upstreamed. There are
a number of other related fixes, all of the same nature. If someone is inclined
to upstream them, contact me directly.

Once fixed, you'll be able to change the server's settings for negotiating the 
rsize/wsize with
the clients. Many NAS vendors use smaller limits, and IMHO it is a good idea 
anyway. For 
example, see 
http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html 
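
Once that's done, it's easy to confirm what actually got negotiated from the
client side, e.g.:

# nfsstat -m /mnt

and check the rsize/wsize reported in the mount flags (/mnt is just a
placeholder for your mount point).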
 -- richard


> 
> In any way, the first bummer is, that Oracle chose to not have it's ovs-agent 
> be capable of accepting and passing the NFS mount options…
> 
> Cheers,
> budy

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow NFS speeds at rsize > 128k

2015-01-07 Thread Richard Elling

> On Jan 7, 2015, at 2:28 AM, Stephan Budach  wrote:
> 
> Hello everyone,
> 
> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed really bad 
> NFS read performance, when rsize goes beyond 128k, whereas the performance is 
> just fine at 32k. The issue is, that the ovs-agent, which is performing the 
> actual mount, doesn't accept or pass any NFS mount options to the NFS server.

The other issue is that illumos/Solaris on x86 tuning of server-side size 
settings does
not work because the compiler optimizes away the tunables. There is a trivial 
fix, but it
requires a rebuild.

> To give some numbers, a rsize of 1mb results in a read throughput of approx. 
> 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS export from a 
> OEL 6u4 box has no issues with this, as the read speeds from this export are 
> 108+MB/s regardless of the rsize of the NFS mount.

Brendan wrote about a similar issue in the Dtrace book as a case study. See 
chapter 5
case study on ZFS 8KB mirror reads.
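
If the nfsv3 DTrace provider is available on your build, a quick look at the
server-side read latency distribution is something like:

# dtrace -n 'nfsv3:::op-read-start { self->t = timestamp; } nfsv3:::op-read-done /self->t/ { @["NFSv3 read latency (ns)"] = quantize(timestamp - self->t); self->t = 0; }'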

> 
> The OmniOS box is currently connected to a 10GbE port at our core 6509, but 
> the NFS client is connected through a 1GbE port only. MTU is at 1500 and can 
> currently not be upped.
> Anyone having a tip, why a rsize of 64k+ will result in such a performance 
> drop?

It is entirely due to optimizations for small I/O going way back to the 1980s.
 -- richard

> 
> Thanks,
> budy
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slow drive response times

2015-01-06 Thread Richard Elling

> On Jan 6, 2015, at 3:25 PM, Kevin Swab  wrote:
> 
> Thanks!  This has been very educational.  Let me see if I have this
> straight:  The zero error counts for the HBA and the expander ports
> eliminate either of those as the source of the errors seen in the
> sg_logs output - is that right?

Not quite. Zero error counts for HBA, expander, and disk ports eliminate 
cabling as the source of latency issues.

> 
> So back to my original question:  If I see long service times on a
> drive, and it shows errors in the drive counters you mentioned, but not
> on the expander ports or HBAs, then is it safe to conclude the fault
> lies with the drive?

With high probability.
 -- richard

> 
> Kevin
> 
> On 01/06/2015 02:23 PM, Richard Elling wrote:
>> 
>>> On Jan 6, 2015, at 12:18 PM, Kevin Swab  wrote:
>>> 
>>> SAS expanders are involved in my systems, so I installed 'sasinfo' and
>>> 'smp_utils'.  After a bit of poking around in the dark, I came up with
>>> the following commands which I think get at the error counters you
>>> mentioned.
>> 
>> Yes, this data looks fine
>> 
>>> 
>>> Unfortunately, I had to remove the "wounded soldier" from this system
>>> since it was causing problems.  This output is from the same slot, but
>>> with a healthy replacement drive:
>>> 
>>> # sasinfo hba-port -a SUNW-mpt_sas-1 -l
>>> HBA Name: SUNW-mpt_sas-1
>>> HBA Port Name: /dev/cfg/c7
>>>   Phy Information:
>>> Identifier: 0
>>>   Link Error Statistics:
>>>   Invalid Dword: 0
>>>   Running Disparity Error: 0
>>>   Loss of Dword Sync: 0
>>>   Reset Problem: 0
>>> Identifier: 1
>>>   Link Error Statistics:
>>>   Invalid Dword: 0
>>>   Running Disparity Error: 0
>>>   Loss of Dword Sync: 0
>>>   Reset Problem: 0
>>> Identifier: 2
>>>   Link Error Statistics:
>>>   Invalid Dword: 0
>>>   Running Disparity Error: 0
>>>   Loss of Dword Sync: 0
>>>   Reset Problem: 0
>>> Identifier: 3
>>>   Link Error Statistics:
>>>   Invalid Dword: 0
>>>   Running Disparity Error: 0
>>>   Loss of Dword Sync: 0
>>>   Reset Problem: 0
>> 
>> perfect!
>> 
>>> HBA Port Name: /dev/cfg/c8
>>>   Phy Information:
>>> Identifier: 4
>>>   Link Error Statistics:
>>>   Invalid Dword: 0
>>>   Running Disparity Error: 0
>>>   Loss of Dword Sync: 0
>>>   Reset Problem: 0
>>> Identifier: 5
>>>   Link Error Statistics:
>>>   Invalid Dword: 0
>>>   Running Disparity Error: 0
>>>   Loss of Dword Sync: 0
>>>   Reset Problem: 0
>>> Identifier: 6
>>>   Link Error Statistics:
>>>   Invalid Dword: 0
>>>   Running Disparity Error: 0
>>>   Loss of Dword Sync: 0
>>>   Reset Problem: 0
>>> Identifier: 7
>>>   Link Error Statistics:
>>>   Invalid Dword: 0
>>>   Running Disparity Error: 0
>>>   Loss of Dword Sync: 0
>> 
>> perfect!
>> 
>>> 
>>> 
>>> 
>>> # ./smp_discover /dev/smp/expd9 | egrep '(c982|c983)'
>>> phy  26:U:attached:[5394a8cbc982:00  t(SSP)]  6 Gbps
>>> # ./smp_discover /dev/smp/expd11 | egrep '(c982|c983)'
>>> phy  26:U:attached:[5394a8cbc983:01  t(SSP)]  6 Gbps
>>> # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd9
>>> Report phy error log response:
>>> Expander change count: 228
>>> phy identifier: 26
>>> invalid dword count: 0
>>> running disparity error count: 0
>>> loss of dword synchronization count: 0
>>> phy reset problem count: 0
>>> # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd11
>>> Report phy error log response:
>>> Expander change count: 228
>>> phy identifier: 26
>>> invalid dword count: 0
>>> running disparity error count: 0
>>> loss of dword synchronization count: 0
>>> phy reset problem count: 0
>>> #
>>> 
>>> "disparity error count" and "loss of dword sync count" are 0 in all of
>>> this output, in contrast with the non-zero values seen in the sg_logs

Re: [OmniOS-discuss] slow drive response times

2015-01-06 Thread Richard Elling

> On Jan 6, 2015, at 12:18 PM, Kevin Swab  wrote:
> 
> SAS expanders are involved in my systems, so I installed 'sasinfo' and
> 'smp_utils'.  After a bit of poking around in the dark, I came up with
> the following commands which I think get at the error counters you
> mentioned.

Yes, this data looks fine

> 
> Unfortunately, I had to remove the "wounded soldier" from this system
> since it was causing problems.  This output is from the same slot, but
> with a healthy replacement drive:
> 
> # sasinfo hba-port -a SUNW-mpt_sas-1 -l
> HBA Name: SUNW-mpt_sas-1
>  HBA Port Name: /dev/cfg/c7
>Phy Information:
>  Identifier: 0
>Link Error Statistics:
>Invalid Dword: 0
>Running Disparity Error: 0
>Loss of Dword Sync: 0
>Reset Problem: 0
>  Identifier: 1
>Link Error Statistics:
>Invalid Dword: 0
>Running Disparity Error: 0
>Loss of Dword Sync: 0
>Reset Problem: 0
>  Identifier: 2
>Link Error Statistics:
>Invalid Dword: 0
>Running Disparity Error: 0
>Loss of Dword Sync: 0
>Reset Problem: 0
>  Identifier: 3
>Link Error Statistics:
>Invalid Dword: 0
>Running Disparity Error: 0
>Loss of Dword Sync: 0
>Reset Problem: 0

perfect!

>  HBA Port Name: /dev/cfg/c8
>Phy Information:
>  Identifier: 4
>Link Error Statistics:
>Invalid Dword: 0
>Running Disparity Error: 0
>Loss of Dword Sync: 0
>Reset Problem: 0
>  Identifier: 5
>Link Error Statistics:
>Invalid Dword: 0
>Running Disparity Error: 0
>Loss of Dword Sync: 0
>Reset Problem: 0
>  Identifier: 6
>Link Error Statistics:
>Invalid Dword: 0
>Running Disparity Error: 0
>Loss of Dword Sync: 0
>Reset Problem: 0
>  Identifier: 7
>Link Error Statistics:
>Invalid Dword: 0
>Running Disparity Error: 0
>Loss of Dword Sync: 0

perfect!

> 
> 
> 
> # ./smp_discover /dev/smp/expd9 | egrep '(c982|c983)'
>  phy  26:U:attached:[5394a8cbc982:00  t(SSP)]  6 Gbps
> # ./smp_discover /dev/smp/expd11 | egrep '(c982|c983)'
>  phy  26:U:attached:[5394a8cbc983:01  t(SSP)]  6 Gbps
> # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd9
> Report phy error log response:
>  Expander change count: 228
>  phy identifier: 26
>  invalid dword count: 0
>  running disparity error count: 0
>  loss of dword synchronization count: 0
>  phy reset problem count: 0
> # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd11
> Report phy error log response:
>  Expander change count: 228
>  phy identifier: 26
>  invalid dword count: 0
>  running disparity error count: 0
>  loss of dword synchronization count: 0
>  phy reset problem count: 0
> #
> 
> "disparity error count" and "loss of dword sync count" are 0 in all of
> this output, in contrast with the non-zero values seen in the sg_logs
> output for the "wounded soldier".

perfect!

> 
> Am I looking at the right output?

Yes, this is not showing any errors, which is a good thing.

>  Does "phy" in the above commands
> refer to the HDD itself or the port on the expander it's connected to?

Expander port. The HDD's view is in the sg_logs --page=0x18 /dev/rdsk/...

> Had I been able to run the above commands with the "wounded soldier"
> still installed, what should I have been looking for?

The process is to rule out errors. You have succeeded.
 -- richard

> 
> Thanks again for your help,
> Kevin
> 
> 
> On 01/02/2015 03:45 PM, Richard Elling wrote:
>> 
>>> On Jan 2, 2015, at 1:50 PM, Kevin Swab >> <mailto:kevin.s...@colostate.edu>> wrote:
>>> 
>>> I've run 'sg_logs' on the drive I pulled last week.  There were a lot of
>>> errors in the backgroud scan section of the output, which made it very
>>> large, so I put it here:
>>> 
>>> http://pastebin.com/jx5BvSep
>>> 
>>> When I pulled this drive, the SMART health status was OK.  
>> 
>> SMART isn’t smart :-P
>> 
>>> However, when
>>> I put it in a test system to run 'sg_logs', the status changed to
>>> "impending failure...".  Had the SMART status changed before pulling the
>>> drive, I'm sure 'fmd' would have alerted me to the problem

Re: [OmniOS-discuss] High Availability storage with ZFS

2015-01-06 Thread Richard Elling

> On Jan 6, 2015, at 9:28 AM, Schweiss, Chip  wrote:
> 
> On Tue, Jan 6, 2015 at 5:16 AM, Filip Marvan  > wrote:
> Hi
> 
>  
> 
> as a few guys before, I'm thinking again about High Availability storage with 
> ZFS. I know, that there is great commercial RSF-1, but that's quite expensive 
> for my needs.
> 
> I know, that Sašo did a great job about that on his blog 
> http://zfs-create.blogspot.cz  but I never 
> found the way, how to successfully configure that on current OmniOS versions.
> 
>  
> 
> So I'm thinking about something more simple. Arrange two LUNs from two OmniOS 
> ZFS storages in one software mirror through fibrechannel. Arrange that mirror 
> in client, for example mdadm in Linux. I know, that it will have performance 
> affect and I will lost some ZFS advantages, but I still can use snapshots, 
> backups with send/receive and some other interesting ZFS things, so it could 
> be usable for some projects.
> 
> Is there anyone, who tried that before? Any eperience with that?
> 
> 
> While this sounds technically possible, it is not HA.  Your client is the 
> single point of failure.   I would wager that mdadm would create more 
> availability issues than it would be solving.   
> 
> I run RSF-1 and HA is still hard to achieve.  

HA: 98% perspiration, 2% good fortune :-)

But seriously, providing HA services is much, much more than just running 
software.
 -- richard

> I don't think I have gained any additional up-time overcoming failures, but 
> it definitely helps with planned maintenance.   Unfortunately, there are 
> still too many ways a zfs pool can fail that having a second server connected 
> does not help.   
> 
> -Chip
>  
> 
>  
> 
> Thank you,
> 
>  
> 
> Filip Marvan
> 
>  
> 
>  
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com 
> http://lists.omniti.com/mailman/listinfo/omnios-discuss 
> 
> 
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slow drive response times

2015-01-02 Thread Richard Elling

> On Jan 2, 2015, at 1:50 PM, Kevin Swab  wrote:
> 
> I've run 'sg_logs' on the drive I pulled last week.  There were a lot of
> errors in the backgroud scan section of the output, which made it very
> large, so I put it here:
> 
> http://pastebin.com/jx5BvSep
> 
> When I pulled this drive, the SMART health status was OK.  

SMART isn’t smart :-P

> However, when
> I put it in a test system to run 'sg_logs', the status changed to
> "impending failure...".  Had the SMART status changed before pulling the
> drive, I'm sure 'fmd' would have alerted me to the problem…

By default, fmd looks for the predictive failure (PFA) and self-test every hour 
using the disk_transport 
agent. fmstat should show activity there. When a PFA is seen, then there will 
be an ereport generated
and, for most cases, a syslog message. However, this will not cause a 
zfs-retire event.
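
A quick sanity check that the poller is making its rounds (the module shows up
as disk-transport on my systems):

# fmstat | grep disk-transport
# fmadm config | grep disk-transport
# fmdump -e

fmstat shows the agent's event counters, fmadm config confirms the module is
loaded, and fmdump -e lists any ereports that have been logged.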

Vendors have significant leeway in how they implement SMART. In my experience 
the only thing
you can say for sure is if the vendor thinks the drive’s death is imminent, 
then you should replace
it. I suspect these policies are financially motivated rather than scientific… 
some amount of truthiness
is to be expected.

In the logs, clearly the one disk has lots of errors that have been corrected 
and the rate is increasing.
The rate of change for "Errors corrected with possible delays” may correlate to 
your performance issues,
but the interpretation is left up to the vendors.

In the case of this naughty drive, yep it needs replacing.

> 
> Since that drive had other indications of trouble, I ran 'sg_logs' on
> another drive I pulled recently that has a SMART health status of OK,
> but exibits similar slow service time behavior:
> 
> http://pastebin.com/Q0t8Jnug

This one looks mostly healthy.

Another place to look for latency issues is the phy logs. In the sg_logs 
output, this is the
Protocol Specific port log page for SAS SSP. Key values are running disparity 
error 
count and loss of dword sync count. The trick here is that you need to look at 
both ends
of the wire for each wire. For a simple case, this means looking at both the 
HBA’s phys error
counts and the drive's. If you have expanders in the mix, it is more work. 
You’ll want to look at 
all of the HBA, expander, and drive phys health counters for all phys.
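
Concretely, that means pulling the counters from each end, something along the
lines of (device paths and the phy number are placeholders):

# sg_logs --page=0x18 /dev/rdsk/c0tXXXXXXXXd0s0
# smp_rep_phy_err_log --phy=26 /dev/smp/expd9

The first shows the drive's view of its phys, the second the expander's view of
a single phy; repeat across all phys in the path.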

This can get tricky because wide ports are mostly dumb. For example, if an HBA 
has a 4-link
wide port (common) and one of the links is acting up (all too common) the 
latency impacts 
will be random.

To see HBA and expander link health, you can use sg3_utils, its companion 
smp_utils, or
sasinfo (installed as a separate package from OmniOS, IIRC). For example, 
sasinfo hba-port -l

HTH
 — richard


> 
> Thanks for taking the time to look at these, please let me know what you
> find...
> 
> Kevin
> 
> 
> 
> 
> On 12/31/2014 06:13 PM, Richard Elling wrote:
>> 
>>> On Dec 31, 2014, at 4:30 PM, Kevin Swab  wrote:
>>> 
>>> Hello Richard and group, thanks for your reply!
>>> 
>>> I'll look into sg_logs for one of these devices once I have a chance to
>>> track that progam down...
>>> 
>>> Thanks for the tip on the 500 ms latency, I wasn't aware that could
>>> happen in normal cases.  However, I don't believe what I'm seeing
>>> constitutes normal behavior.
>>> 
>>> First, some anecdotal evidence:  If I pull and replace the suspect
>>> drive, my downstream systems stop complaining, and the high service time
>>> numbers go away.
>> 
>> We call these "wounded soldiers"  -- it takes more resources to manage a
>> wounded soldier than a dead soldier, so one strategy of war is to wound your
>> enemy causing them to consume resources tending the wounded. The sg_logs
>> should be enlightening.
>> 
>> NB, consider a 4TB disk with 5 platters: if a head or surface starts to go, 
>> then
>> you  have a 1/10 chance that the data you request is under the damaged head
>> and will need to be recovered by the drive. So it is not uncommon to see
>> 90+% of the I/Os to the drive completing quickly. It is also not unusual to 
>> see
>> only a small number of sectors or tracks affected.
>> 
>> Detecting these becomes tricky, especially as you reduce the timeout/retry
>> interval, since the problem is rarely seen in the average latency -- that 
>> which
>> iostat and sar record. This is an area where we can and are improving.
>> -- richard
>> 
>>> 
>>> I threw out 500 ms as a guess to the point at which I start seeing
>>> problems.  However, I see service times 

Re: [OmniOS-discuss] Ang: Ang: Ang: Re: LU read only and r/w for different hosts?

2015-01-02 Thread Richard Elling

> On Jan 2, 2015, at 9:52 AM, Johan Kragsterman  
> wrote:
> 
> Hmmm again, a lot of hmmm's here today...
> 
> 
> Been reading some more, and it looks like it is possible to reserve at LU 
> level.

You are correct. The spec is for targets as managed by the initiator. If you’d 
like to propose an
enhancement to the spec… :-)

> I found a guy that uses an sg-persist command on solaris 11.1, but I don't 
> find it in OmniOS. I did a pkg search, but perhaps I don't know what to 
> search for…?

I’m not sure if OmniTI packages it, but source is readily available at 
http://sg.danny.cz/sg/sg3_utils.html 
 — richard

--

richard.ell...@richardelling.com
+1-760-896-4422



___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] LU read only and r/w for different hosts?

2015-01-02 Thread Richard Elling
It has been a while since using comstar, but the SCSI protocol has
WERO group reservations. Would that suffice?
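
For reference, with sg3_utils on an initiator, a WERO reservation looks roughly
like this (device path and keys are placeholders; prout-type 5 is Write
Exclusive, Registrants Only):

# sg_persist --out --register --param-sark=0x1 /dev/rdsk/c0tXXXXXXXXd0s2
# sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/rdsk/c0tXXXXXXXXd0s2
# sg_persist --in --read-reservation /dev/rdsk/c0tXXXXXXXXd0s2

Any initiator can still read, but only registered initiators can write.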

 -- richard


> On Jan 2, 2015, at 12:18 AM, Johan Kragsterman  
> wrote:
> 
> Hi!
> 
> 
> I've been thinking about this for a while, and haven't figured out a solution.
> 
> I'd like to have a possibility to set LU read only for some hosts, but r/w to 
> others, for the same LU. There are possibilities to set read only, or r/w, on 
> a LU, but that property is valid for all hosts, it is not (afaik) possible to 
> choose which hosts are going to get read only, and which are going to get r/w.
> 
> This is an access control operation, and as such, imho, should be controlled 
> by comstar. It is the responsibility of the view to handle this, but I 
> haven't seen this anywhere in the comstar/stmf configuration posibilities.
> 
> Is there someone on this list who can shed some light on this?
> 
> 
> Best regards from/Med vänliga hälsningar från
> 
> Johan Kragsterman
> 
> Capvert
> 
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

