Re: [OmniOS-discuss] NVMe JBOF

2018-12-14 Thread Schweiss, Chip
On Fri, Dec 14, 2018 at 10:20 AM Richard Elling <
richard.ell...@richardelling.com> wrote:

>
> I can't speak to the Supermicro, but I can talk in detail about
> https://www.vikingenterprisesolutions.com/products-2/nds-2244/
>
> >
> > While I do not run HA because of too many issues, I still build
> everything with two server nodes.  This makes updates and reboots possible
> by moving a pool to the sister host and greatly minimizing downtime.   This
> is essential when the NFS target is hosting 300+ vSphere VMs.
>
> The NDS-2244 is a 24-slot u.2 NVMe chassis with programmable PCIe switches.
> To the host, the devices look like locally attached NVMe and there is no
> software
> changes required. Multiple hosts can connect, up to the PCIe port limits.
> If you use
> dual-port NVMe drives, then you can share the drives between any two hosts
> concurrently.
> Programming the switches is accomplished out-of-band by an HTTP-based
> interface
> that also monitors the enclosure.
>
> In other words, if you want an NVMe equivalent to a dual-hosted SAS JBOD,
> the NDS-2244
> is very capable, but more configurable.
>  -- richard
>
>
This is excellent.   I like the idea of only one host seeing the SSDs at a
time, with a programmatic way to flip them to the other host.   This solves
the fencing problem in ZFS nicely.
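
To make that workflow concrete, here is a rough sketch of how such a flip
could be scripted.  The HTTP endpoint, payload, host names, and pool name
are purely hypothetical placeholders -- the real NDS-2244 management
interface will differ -- but the ZFS side is the ordinary export/import
sequence:

# release the pool on the old head, if it is still reachable
ssh head-a 'zpool export tank'

# re-map the drive slots to the other head via the enclosure's
# out-of-band management interface (hypothetical URL and payload)
curl -u admin:secret -X POST http://jbof-mgmt/api/slots/assign \
     -d '{"slots": "0-23", "host_port": 2}'

# take over the pool on the new head
ssh head-b 'zpool import -f tank'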

Thanks for the product reference.   The Viking JBOF looks like what I need.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] NVMe JBOF

2018-12-14 Thread Schweiss, Chip
Has the NVMe support in Illumos come far enough along to properly support
two servers connected to NVMe JBOF storage such as the Supermicro
SSG-136R-N32JBF?

While I do not run HA because of too many issues, I still build everything
with two server nodes.  This makes updates and reboots possible by moving a
pool to the sister host, greatly minimizing downtime.   This is essential
when the NFS target is hosting 300+ vSphere VMs.
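
For reference, the pool move between the two heads is just the standard ZFS
export/import pair; the pool name below is a placeholder, and datasets with
sharenfs set are re-shared automatically on import:

# on the head being taken down for maintenance
zpool export nfspool01

# on the sister head, which is cabled to the same JBOD/JBOF
zpool import nfspool01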

Thanks!
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Panic on OmniOS CE r151022ay

2018-08-31 Thread Schweiss, Chip
Looks like the fix is missing:

# mdb -ke xattr_dir_inactive::dis | grep mutex
xattr_dir_inactive+0x1f:call   -0x304cf4
xattr_dir_inactive+0x3c:call   -0x304bf1
xattr_dir_inactive+0x73:call   -0x304c28

Looking closer, I thought I had updated this system after the first crash,
but had not.  However, I had explicitly put that patch in place back in
January, but it may not have made it into the later OmniOS CE releases that
the system was upgraded to.

I just ran the test on an r151022bk system and it passes.   I'll get this
system updated ASAP.

Thanks!
-Chip

On Thu, Aug 30, 2018 at 5:08 PM, Andy Fiddaman  wrote:

>
> On Thu, 30 Aug 2018, Schweiss, Chip wrote:
>
> ; > panicstack = unix:real_mode_stop_cpu_stage2_end+b203
> () |
> ; > unix:trap+a70 () | unix:cmntrap+e6 () | zfs:zfs_getattr+1a0 () |
> ; > genunix:fop_getattr+a8 () | genunix:xattr_dir_getattr+16c () |
> ; > genunix:fop_getattr+a8 () | nfssrv:rfs4_delegated_getattr+20 () |
> ; > nfssrv:acl3_getxattrdir+102 () | nfssrv:common_dispatch+5ab () |
> ; > nfssrv:acl_dispatch+2d () | rpcmod:svc_getreq+1c1 () |
> rpcmod:svc_run+e0 ()
> ; > | rpcmod:svc_do_run+8e () | nfs:nfssys+111 () |
> unix:brand_sys_sysenter+1d3
>
> That does look quite similar to issue 8806 that was fixed earlier in the
> year. Can you check that the fix is in place on your box, since you're
> running a version of OmniOS from May?
>
> If this produces any output, then the fix is missing, otherwise it's
> something
> else.
>
> mdb -ke xattr_dir_inactive::dis | grep mutex
>
> Please can you open an issue for this at
> https://github.com/omniosorg/illumos-omnios/issues/new
> in the first instance as it may be OmniOS-specific?
>
> Andy
>
> --
> Citrus IT Limited | +44 (0)333 0124 007 | enquir...@citrus-it.co.uk
> Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
> Registered in England and Wales | Company number 4899123
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Panic on OmniOS CE r151022ay

2018-08-30 Thread Schweiss, Chip
Here's the dump from the panic:

ftp://ftp.nrg.wustl.edu/pub/zfs/mirpool03-xattr-20180830-vmdump.1



On Thu, Aug 30, 2018 at 9:29 AM, Schweiss, Chip  wrote:

> I've seen this panic twice now in the past couple weeks.   Does anyone
> know if there is a patch already that fixes this?  Looks like another xattr
> problem.
>
> # fmdump -Vp -u b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> TIME   UUID
>  SUNW-MSG-ID
> Aug 30 2018 08:29:32.089419000 b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> SUNOS-8000-KL
>
>   TIME CLASS ENA
>   Aug 30 08:27:50.8299 ireport.os.sunos.panic.dump_pending_on_device
> 0x
>
> nvlist version: 0
> version = 0x0
> class = list.suspect
> uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> code = SUNOS-8000-KL
> diag-time = 1535635766 223254
> de = fmd:///module/software-diagnosis
> fault-list-sz = 0x1
> fault-list = (array of embedded nvlists)
> (start fault-list[0])
> nvlist version: 0
> version = 0x0
> class = defect.sunos.kernel.panic
> certainty = 0x64
> asru = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> resource = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> savecore-succcess = 0
> os-instance-uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
> panicstr = BAD TRAP: type=d (#gp General protection)
> rp=d001e9855360 addr=d063784ee8d0
> panicstack = unix:real_mode_stop_cpu_stage2_end+b203 () |
> unix:trap+a70 () | unix:cmntrap+e6 () | zfs:zfs_getattr+1a0 () |
> genunix:fop_getattr+a8 () | genunix:xattr_dir_getattr+16c () |
> genunix:fop_getattr+a8 () | nfssrv:rfs4_delegated_getattr+20 () |
> nfssrv:acl3_getxattrdir+102 () | nfssrv:common_dispatch+5ab () |
> nfssrv:acl_dispatch+2d () | rpcmod:svc_getreq+1c1 () | rpcmod:svc_run+e0 ()
> | rpcmod:svc_do_run+8e () | nfs:nfssys+111 () | unix:brand_sys_sysenter+1d3
> () |
> crashtime = 1535633923
> panic-time = Thu Aug 30 07:58:43 2018 CDT
> (end fault-list[0])
>
> fault-status = 0x1
> severity = Major
> __ttl = 0x1
> __tod = 0x5b87f13c 0x5546cf8
>
> Let me know what other information I can provide here.
>
> -Chip
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Panic on OmniOS CE r151022ay

2018-08-30 Thread Schweiss, Chip
I've seen this panic twice now in the past couple weeks.   Does anyone know
if there is a patch already that fixes this?  Looks like another xattr
problem.

# fmdump -Vp -u b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
TIME   UUID
 SUNW-MSG-ID
Aug 30 2018 08:29:32.089419000 b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
SUNOS-8000-KL

  TIME CLASS ENA
  Aug 30 08:27:50.8299 ireport.os.sunos.panic.dump_pending_on_device
0x

nvlist version: 0
version = 0x0
class = list.suspect
uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
code = SUNOS-8000-KL
diag-time = 1535635766 223254
de = fmd:///module/software-diagnosis
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = defect.sunos.kernel.panic
certainty = 0x64
asru = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
resource = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
savecore-succcess = 0
os-instance-uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
panicstr = BAD TRAP: type=d (#gp General protection)
rp=d001e9855360 addr=d063784ee8d0
panicstack = unix:real_mode_stop_cpu_stage2_end+b203 () |
unix:trap+a70 () | unix:cmntrap+e6 () | zfs:zfs_getattr+1a0 () |
genunix:fop_getattr+a8 () | genunix:xattr_dir_getattr+16c () |
genunix:fop_getattr+a8 () | nfssrv:rfs4_delegated_getattr+20 () |
nfssrv:acl3_getxattrdir+102 () | nfssrv:common_dispatch+5ab () |
nfssrv:acl_dispatch+2d () | rpcmod:svc_getreq+1c1 () | rpcmod:svc_run+e0 ()
| rpcmod:svc_do_run+8e () | nfs:nfssys+111 () | unix:brand_sys_sysenter+1d3
() |
crashtime = 1535633923
panic-time = Thu Aug 30 07:58:43 2018 CDT
(end fault-list[0])

fault-status = 0x1
severity = Major
__ttl = 0x1
__tod = 0x5b87f13c 0x5546cf8

Let me know what other information I can provide here.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Big Data

2018-06-19 Thread Schweiss, Chip
I've used BeeGFS on Ubuntu 16.04 for about a year now.   I like your idea
of putting it in lx zones on OmniOS for scratch space.

I have found it to scale very well with millions of files.   It's running
on a 4-node cluster, where each node acts as a client, metadata, and data
node.   These are very big GPU boxes with 9 Tesla GPUs, 40 CPU cores, and
256 GB of RAM.

The metadata is mirrored on 2 Samsung Pro SSDs on each node.   It sustains
about 33k metadata ops with never more than one queued.

This is my third iteration of setting it up.  Metadata performance was our
bottleneck each time previously.   What I have found is that latency and
horizontal scaling are king with BeeGFS metadata.   It doesn't take a lot
of CPU, but keep it as close as possible to the clients on the network and
keep latency low with a fast network and SSDs.

My main complaint about BeeGFS is the lack of snapshots, so backups are
limited to rsync of a live file system.   For this reason it's only used
for this very high-read-demand cluster.  I still use ZFS on OmniOS for our
PBs of data, where snapshots and replication are priceless.
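
For what it's worth, that backup path is nothing more exotic than an rsync
of the live BeeGFS mount into a ZFS dataset that then gets snapshotted; the
mount point and dataset names here are placeholders:

# copy the live BeeGFS namespace into a ZFS-backed dataset
rsync -aH --delete /mnt/beegfs/ /backup/beegfs-copy/

# snapshot the copy so history is kept on the ZFS side
zfs snapshot backup/beegfs-copy@$(date +%Y%m%d)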


-Chip



On Sat, Jun 16, 2018 at 12:45 PM, Michael Talbott  wrote:

> We've been using OmniOS happily for years now for our storage server
> needs. But we're rapidly increasing our data footprint and growing so much
> (multiple PBs per year) that ideally I'd like to move to a cluster based
> object store based system ontop of OmniOS. I successfully use BeeGFS inside
> lxzones in OmniOS which seems to work nicely for our HPC scratch volume,
> but, it doesn't sound like it scales to hundreds of millions of files very
> well.
>
> I am hoping that someone has some ideas for me. Ideally I'd like something
> that's cluster capable and has erasure coding like Ceph and have cluster
> aware snapshots (not local zfs snaps) and an s3 compatibility/access layer.
>
> Any thoughts on the topic are greatly appreciated.
>
> Thanks,
>
> Michael
> Sent from my iPhone
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] rpcgen

2018-05-09 Thread Schweiss, Chip
I need rpcgen on OmniOS.  Any suggestions on where I can get it, or do I
need to build it myself?

Thanks,
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] rpcgen

2018-05-09 Thread Schweiss, Chip
I just found it in 'developer/object-file'.

Sorry for the bother.

-Chip

On Wed, May 9, 2018 at 7:54 AM, Schweiss, Chip  wrote:

> I need the rpcgen on OmniOS.  Any suggestions on where I can get this, or
> do I need to build it myself?
>
> Thanks,
> -Chip
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-19 Thread Schweiss, Chip
The fault manager is starting now.  However, the disks still show as
UNAVAIL when running zpool import.

# zpool import
   pool: hcp-arc01
     id: 11579406004081253836
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
         devices and try again.
    see: http://illumos.org/msg/ZFS-8000-3C
 config:

        hcp-arc01                    UNAVAIL  insufficient replicas
          raidz3-0                   UNAVAIL  insufficient replicas
            c0t5000C50093E3BE87d0p0  UNAVAIL  cannot open
            c0t5000C50086B52EABd0p0  UNAVAIL  cannot open
            c0t5000C50093F046A7d0p0  UNAVAIL  cannot open
            c0t5000C50093E3086Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E85C07d0p0  UNAVAIL  cannot open
            c0t5000C50093E3BED3d0p0  UNAVAIL  cannot open
            c0t5000C50093E39267d0p0  UNAVAIL  cannot open
            c0t5000C50093E309DBd0p0  UNAVAIL  cannot open
            c0t5000C50093E31407d0p0  UNAVAIL  cannot open
            c0t5000C50093E3885Bd0p0  UNAVAIL  cannot open
            c0t5000C50093E344D7d0p0  UNAVAIL  cannot open
            c0t5000C50093E332AFd0p0  UNAVAIL  cannot open
            c0t5000C50093F04A2Fd0p0  UNAVAIL  cannot open
            c0t5000C50093F04763d0p0  UNAVAIL  cannot open
            c0t5000C50086B5DCE3d0p0  UNAVAIL  cannot open
            c0t5000C50086B5CD37d0p0  UNAVAIL  cannot open
            c0t5000C50086B5E263d0p0  UNAVAIL  cannot open
            c0t5000C50086B5CD07d0p0  UNAVAIL  cannot open
            c0t5000C50086B5DB3Bd0p0  UNAVAIL  cannot open
            c0t5000C50086B5D95Fd0p0  UNAVAIL  cannot open
            c0t5000C50086B566BBd0p0  UNAVAIL  cannot open
            c0t5000C50086B5F38Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E37C97d0p0  UNAVAIL  cannot open
            c0t5000C50093E3909Bd0p0  UNAVAIL  cannot open
          raidz3-1                   UNAVAIL  insufficient replicas
            c0t5000C50093E85C1Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E3A29Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E342BFd0p0  UNAVAIL  cannot open
            c0t5000C50093E359DFd0p0  UNAVAIL  cannot open
            c0t5000C50086B5281Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E331F7d0p0  UNAVAIL  cannot open
            c0t5000C50093E35A93d0p0  UNAVAIL  cannot open
            c0t5000C50093E38347d0p0  UNAVAIL  cannot open
            c0t5000C50093E8532Bd0p0  UNAVAIL  cannot open
            c0t5000C50093E3422Fd0p0  UNAVAIL  cannot open
            c0t5000C50093CFA493d0p0  UNAVAIL  cannot open
            c0t5000C50093E29DB3d0p0  UNAVAIL  cannot open
            c0t5000C50093E3B70Bd0p0  UNAVAIL  cannot open
            c0t5000C50093E3946Fd0p0  UNAVAIL  cannot open
            c0t5000C50086B5319Bd0p0  UNAVAIL  cannot open
            c0t5000C50086B5608Bd0p0  UNAVAIL  cannot open
            c0t5000C50086B5D9B7d0p0  UNAVAIL  cannot open
            c0t5000C50086B5E1ABd0p0  UNAVAIL  cannot open
            c0t5000C50093E85D93d0p0  UNAVAIL  cannot open
            c0t5000C50093E85C73d0p0  UNAVAIL  cannot open
            c0t5000C50086B5D7CBd0p0  UNAVAIL  cannot open
            c0t5000C50093E33F23d0p0  UNAVAIL  cannot open
            c0t5000C50093E36A8Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E30193d0p0  UNAVAIL  cannot open
          raidz3-2                   UNAVAIL  insufficient replicas
            c0t5000C50093E34E3Fd0p0  UNAVAIL  cannot open
            c0t5000C50093E36DB7d0p0  UNAVAIL  cannot open
            c0t5000C50093E2C467d0p0  UNAVAIL  cannot open
            c0t5000C50093E3A213d0p0  UNAVAIL  cannot open
            c0t5000C50093E387E3d0p0

Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-19 Thread Schweiss, Chip
On Mon, Mar 19, 2018 at 9:33 AM, Andy Fiddaman  wrote:

>
> On Mon, 19 Mar 2018, Schweiss, Chip wrote:
>
> ; On Mon, Mar 19, 2018 at 9:19 AM, Andy Fiddaman 
> wrote:
> ;
> ; >
> ; > I'll have a look at this for you and get a hot-fix built. I have the
> core
> ; > file that you made available so just need to go through and work out
> why
> ; > it thinks there are 0 phys somewhere.
> ; >
> ; >
> ; Many thanks!
> ;
> ; In discussion with JBOD vendor support, this JBOD has two SAS
> ; expanders, which are linked together.  One is likely incorrectly
> ; reporting 0 and should be ignored.
>
> The device is identifying as ESC_ELECTRONICS rather than a SAS_EXPANDER but
> I'll do some more digging.
>
>
You might be on to something.  I was suspicious of Element 96 when
examining via sg_ses.  This JBOD has 96 slots and no display panel.  The
vendor suspected other issues.

sg_ses -p ed /dev/es/ses1
  RAIDINC   96BAY 1715
  Primary enclosure logical identifier (hex): 500093d230938000
Element Descriptor In diagnostic page:
  generation code: 0x1
  element descriptor list (grouped by type):
Element type: Array device slot, subenclosure id: 0 [ti=0]
  Overall descriptor: Array Dev Slot
  Element 0 descriptor: SLOT 01 11
  Element 1 descriptor: SLOT 02 12
  Element 2 descriptor: SLOT 03 13
  Element 3 descriptor: SLOT 04 14
  Element 4 descriptor: SLOT 05 15
  Element 5 descriptor: SLOT 06 16
  Element 6 descriptor: SLOT 07 17
  Element 7 descriptor: SLOT 08 18
  Element 8 descriptor: SLOT 09 19
  Element 9 descriptor: SLOT 10 1A
  Element 10 descriptor: SLOT 11 1B
  Element 11 descriptor: SLOT 12 1C
  Element 12 descriptor: SLOT 13 1D
  Element 13 descriptor: SLOT 14 1E
  Element 14 descriptor: SLOT 15 21
  Element 15 descriptor: SLOT 16 22
  Element 16 descriptor: SLOT 17 23
  Element 17 descriptor: SLOT 18 24
  Element 18 descriptor: SLOT 19 25
  Element 19 descriptor: SLOT 20 26
  Element 20 descriptor: SLOT 21 27
  Element 21 descriptor: SLOT 22 28
  Element 22 descriptor: SLOT 23 29
  Element 23 descriptor: SLOT 24 2A
  Element 24 descriptor: SLOT 25 2B
  Element 25 descriptor: SLOT 26 2C
  Element 26 descriptor: SLOT 27 2D
  Element 27 descriptor: SLOT 28 2E
  Element 28 descriptor: SLOT 29 31
  Element 29 descriptor: SLOT 30 32
  Element 30 descriptor: SLOT 31 33
  Element 31 descriptor: SLOT 32 34
  Element 32 descriptor: SLOT 33 35
  Element 33 descriptor: SLOT 34 36
  Element 34 descriptor: SLOT 35 37
  Element 35 descriptor: SLOT 36 38
  Element 36 descriptor: SLOT 37 39
  Element 37 descriptor: SLOT 38 3A
  Element 38 descriptor: SLOT 39 3B
  Element 39 descriptor: SLOT 40 3C
  Element 40 descriptor: SLOT 41 3D
  Element 41 descriptor: SLOT 42 3E
  Element 42 descriptor: SLOT 43 41
  Element 43 descriptor: SLOT 44 42
  Element 44 descriptor: SLOT 45 43
  Element 45 descriptor: SLOT 46 44
  Element 46 descriptor: SLOT 47 45
  Element 47 descriptor: SLOT 48 46
  Element 48 descriptor: SLOT 49 47
  Element 49 descriptor: SLOT 50 49
  Element 50 descriptor: SLOT 51 4A
  Element 51 descriptor: SLOT 52 4B
  Element 52 descriptor: SLOT 53 4C
  Element 53 descriptor: SLOT 54 4D
  Element 54 descriptor: SLOT 55 4E
  Element 55 descriptor: SLOT 56 51
  Element 56 descriptor: SLOT 57 52
  Element 57 descriptor: SLOT 58 53
  Element 58 descriptor: SLOT 59 54
  Element 59 descriptor: SLOT 60 55
  Element 60 descriptor: SLOT 61 56
  Element 61 descriptor: SLOT 62 57
  Element 62 descriptor: SLOT 63 59
  Element 63 descriptor: SLOT 64 5A
  Element 64 descriptor: SLOT 65 5B
  Element 65 descriptor: SLOT 66 5C
  Element 66 descriptor: SLOT 67 5D
  Element 67 descriptor: SLOT 68 5E
  Element 68 descriptor: SLOT 69 61
  Element 69 descriptor: SLOT 70 62
  Element 70 descriptor: SLOT 71 63
  Element 71 descriptor: SLOT 72 64
  Element 72 descriptor: SLOT 73 65
  Element 73 descriptor: SLOT 74 66
  Element 74 descriptor: SLOT 75 67
  Element 75 descriptor: SLOT 76 68
  Element 76 descriptor: SLOT 77 69
  Element 77 descriptor: SLOT 78 6A
  Element 78 descriptor: SLOT 79 6B
  Element 79 descriptor: SLOT 80 6C
  Element 80 descriptor: SLOT 81 6D
  Element 81 descriptor: SLOT 82 6E
  Element 82 descriptor: SLOT 83 71
  Element 83 descriptor: SLOT 84 72
  Element 84 descriptor: SLOT 85 73
  Element 85 descriptor: SLOT 86 74
  Element 86 descriptor: SLOT 87 75
  Element 87 descriptor: SLOT 88 76
  Element 88 descriptor: SLOT 89 77
  Element 89 descriptor: SLOT 90 78
  Element 90 descriptor: SLOT 91 79
  Element 91 descriptor: SLOT 92 7A

Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-19 Thread Schweiss, Chip
On Mon, Mar 19, 2018 at 9:19 AM, Andy Fiddaman  wrote:

>
> I'll have a look at this for you and get a hot-fix built. I have the core
> file that you made available so just need to go through and work out why
> it thinks there are 0 phys somewhere.
>
>
Many thanks!

In discussion with JBOD vendor support, this JBOD has two SAS expanders,
which are linked together.  One is likely incorrectly reporting 0 and
should be ignored.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-19 Thread Schweiss, Chip
Even after unloading all the modules except 'fmd-self-diagnosis', which
will not unload, fmd still dies as soon as I plug in the JBOD.
# fmadm config
MODULE   VERSION STATUS  DESCRIPTION
fmd-self-diagnosis   1.0 active  Fault Manager Self-Diagnosis

# fmadm unload fmd-self-diagnosis
fmadm: failed to unload fmd-self-diagnosis: module is in use and cannot be
unloaded

Looks like I'm dead in the water trying to make this work with illumos
until this bug is fixed.

-Chip

On Fri, Mar 16, 2018 at 3:42 PM, Richard Elling <
richard.ell...@richardelling.com> wrote:

> fmadm allows you to load/unload modules.
>  -- richard
>
> On Mar 16, 2018, at 8:24 AM, Schweiss, Chip  wrote:
>
> I need to get this JBOD working with OmniOS.  Is there a way to get FMD to
> ignore this SES device until this issue is fixed?
>
> It is a RAID, Inc. 4U 96-Bay
> http://www.raidinc.com/products/object-storage/ability-4u-96-bay
>
> -Chip
>
> On Fri, Mar 16, 2018 at 9:18 AM, Schweiss, Chip 
> wrote:
>
>> While this problem was originally ruled out as an artifact of running as
>> a virtual machine, I've now installed the same HBA and JBOD to a physical
>> server.   The problem is exactly the same.
>>
>> This is on OmniOS CE r151024r
>>
>> -Chip
>>
>> # /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
>> fmd: [ loading modules ... ABORT: attempted zero-length allocation:
>> Operation not supported
>> Abort (core dumped)
>>
>> > $C
>> 080462a8 libc.so.1`_lwp_kill+0x15(1, 6, 80462f8, fef42000, fef42000,
>> 8046330)
>> 080462c8 libc.so.1`raise+0x2b(6, 0, 80462e0, feec1b59, 0, 0)
>> 08046318 libc.so.1`abort+0x10e(fead51f0, 0, fede2a40, 30, 524f4241,
>> 61203a54)
>> 08046748 libses.so.1`ses_panic(fdde6758, 8046774, 80467e8, fdb6b67a,
>> 83eb0a8, fdb6c398)
>> 08046768 libses.so.1`ses_realloc(fdde6758, 0, 83f01b8, fdde6130,
>> fddf7000, fdb6658f)
>> 08046788 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 8111627)
>> 080467b8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f0190, 8)
>> 08046838 ses2.so`elem_parse_aes_misc+0x91(81114f4, 83eb0a8, 8, fdb65d85)
>> 08046888 ses2.so`elem_parse_aes+0xfc(82f1ac8, 83f0288, 80468f8, fdb80eae)
>> 080468a8 ses2.so`ses2_fill_element_node+0x37(82f1ac8, 83f0288, 832e930,
>> 4)
>> 080468d8 ses2.so`ses2_node_parse+0x53(82f1ac8, 83f0288, e, fddf7000)
>> 080468f8 libses.so.1`ses_fill_node+0x22(83f0288, 83f0348, fdde38ae,
>> fdde394c)
>> 08046918 libses.so.1`ses_fill_tree+0x21(83f0288, 82f1c88, 83e4cc8,
>> fdde394c)
>> 08046938 libses.so.1`ses_fill_tree+0x33(82f1d88, 82f1b88, 8046968,
>> fdde394c)
>> 08046958 libses.so.1`ses_fill_tree+0x33(82f1c88, 82ef758, 8046998,
>> fdde394c)
>> 08046978 libses.so.1`ses_fill_tree+0x33(82f1b88, 0, 18, fddf7000)
>> 08046998 libses.so.1`ses_fill_snap+0x22(82f08a0, 80, 0, fdde56eb)
>> 080469e8 libses.so.1`ses_snap_new+0x325(82f1b48, 0, 8046a18, fdde3006)
>> 08046a18 libses.so.1`ses_open_scsi+0xc4(1, 82ef688, 8046aa0, fed71c1b,
>> 81053f8, fede4042)
>> 08046a68 libses.so.1`ses_open+0x98(1, 8046aa0, 0, feecedd3, 43, fde1fc58)
>> 08046eb8 ses.so`ses_process_dir+0x133(fde20159, 83cc348, 0, fed77e40)
>> 08046ee8 ses.so`ses_enum+0xc1(81053f8, 82f21a0, 8386608, 0, 400, 0)
>> 08046f38 libtopo.so.1`topo_mod_enumerate+0xc4(81053f8, 82f21a0, 82fb1c8,
>> 8386608, 0, 400)
>> 08046f88 libtopo.so.1`enum_run+0xe9(8105a18, 83d6f78, a, fed7b1dd)
>> 08046fd8 libtopo.so.1`topo_xml_range_process+0x13e(8105a18, 82eb5b0,
>> 83d6f78, 8047008)
>> 08047028 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82eb5b0,
>> 82f21a0)
>> 08047088 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82ebd30,
>> 82f21a0, 8105a18, 83cbac0)
>> 080470e8 libtopo.so.1`topo_xml_walk+0x1b2(8105a18, 82dfde0, 82de080,
>> 82f21a0)
>> 08047128 libtopo.so.1`dependent_create+0x127(8105a18, 82dfde0, 83d3aa0,
>> 82de080, 82f21a0, fed7b1f9)
>> 08047168 libtopo.so.1`dependents_create+0x64(8105a18, 82dfde0, 83d3aa0,
>> 82de300, 82f21a0, 81eb0d8)
>> 08047218 libtopo.so.1`pad_process+0x51e(8105a18, 83ce100, 82de300,
>> 82f21a0, 83ce128, 81d8638)
>> 08047278 libtopo.so.1`topo_xml_range_process+0x31f(8105a18, 82de300,
>> 83ce100, 80472a8)
>> 080472c8 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82de300,
>> 81eb198)
>> 08047328 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82d1ca0,
>> 81eb198, 8103f40, fed8c000)
>> 08047358 libtopo.so.1`topo_xml_enum+0x67(8105a18, 82dfde0, 81eb198,
>> feac2000)
>> 08047488 libtopo.so.1`topo_file_load+0x139(8105a18, 81eb198, fe20c127,
>> fe20bda2, 0, 82d2000)
>> 080

Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-16 Thread Schweiss, Chip
I need to get this JBOD working with OmniOS.  Is there a way to get FMD to
ignore this SES device until this issue is fixed?

It is a RAID, Inc. 4U 96-Bay
http://www.raidinc.com/products/object-storage/ability-4u-96-bay

-Chip

On Fri, Mar 16, 2018 at 9:18 AM, Schweiss, Chip  wrote:

> While this problem was originally ruled out as an artifact of running as a
> virtual machine, I've now installed the same HBA and JBOD to a physical
> server.   The problem is exactly the same.
>
> This is on OmniOS CE r151024r
>
> -Chip
>
> # /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
> fmd: [ loading modules ... ABORT: attempted zero-length allocation:
> Operation not supported
> Abort (core dumped)
>
> > $C
> 080462a8 libc.so.1`_lwp_kill+0x15(1, 6, 80462f8, fef42000, fef42000,
> 8046330)
> 080462c8 libc.so.1`raise+0x2b(6, 0, 80462e0, feec1b59, 0, 0)
> 08046318 libc.so.1`abort+0x10e(fead51f0, 0, fede2a40, 30, 524f4241,
> 61203a54)
> 08046748 libses.so.1`ses_panic(fdde6758, 8046774, 80467e8, fdb6b67a,
> 83eb0a8, fdb6c398)
> 08046768 libses.so.1`ses_realloc(fdde6758, 0, 83f01b8, fdde6130,
> fddf7000, fdb6658f)
> 08046788 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 8111627)
> 080467b8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f0190, 8)
> 08046838 ses2.so`elem_parse_aes_misc+0x91(81114f4, 83eb0a8, 8, fdb65d85)
> 08046888 ses2.so`elem_parse_aes+0xfc(82f1ac8, 83f0288, 80468f8, fdb80eae)
> 080468a8 ses2.so`ses2_fill_element_node+0x37(82f1ac8, 83f0288, 832e930, 4)
> 080468d8 ses2.so`ses2_node_parse+0x53(82f1ac8, 83f0288, e, fddf7000)
> 080468f8 libses.so.1`ses_fill_node+0x22(83f0288, 83f0348, fdde38ae,
> fdde394c)
> 08046918 libses.so.1`ses_fill_tree+0x21(83f0288, 82f1c88, 83e4cc8,
> fdde394c)
> 08046938 libses.so.1`ses_fill_tree+0x33(82f1d88, 82f1b88, 8046968,
> fdde394c)
> 08046958 libses.so.1`ses_fill_tree+0x33(82f1c88, 82ef758, 8046998,
> fdde394c)
> 08046978 libses.so.1`ses_fill_tree+0x33(82f1b88, 0, 18, fddf7000)
> 08046998 libses.so.1`ses_fill_snap+0x22(82f08a0, 80, 0, fdde56eb)
> 080469e8 libses.so.1`ses_snap_new+0x325(82f1b48, 0, 8046a18, fdde3006)
> 08046a18 libses.so.1`ses_open_scsi+0xc4(1, 82ef688, 8046aa0, fed71c1b,
> 81053f8, fede4042)
> 08046a68 libses.so.1`ses_open+0x98(1, 8046aa0, 0, feecedd3, 43, fde1fc58)
> 08046eb8 ses.so`ses_process_dir+0x133(fde20159, 83cc348, 0, fed77e40)
> 08046ee8 ses.so`ses_enum+0xc1(81053f8, 82f21a0, 8386608, 0, 400, 0)
> 08046f38 libtopo.so.1`topo_mod_enumerate+0xc4(81053f8, 82f21a0, 82fb1c8,
> 8386608, 0, 400)
> 08046f88 libtopo.so.1`enum_run+0xe9(8105a18, 83d6f78, a, fed7b1dd)
> 08046fd8 libtopo.so.1`topo_xml_range_process+0x13e(8105a18, 82eb5b0,
> 83d6f78, 8047008)
> 08047028 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82eb5b0,
> 82f21a0)
> 08047088 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82ebd30,
> 82f21a0, 8105a18, 83cbac0)
> 080470e8 libtopo.so.1`topo_xml_walk+0x1b2(8105a18, 82dfde0, 82de080,
> 82f21a0)
> 08047128 libtopo.so.1`dependent_create+0x127(8105a18, 82dfde0, 83d3aa0,
> 82de080, 82f21a0, fed7b1f9)
> 08047168 libtopo.so.1`dependents_create+0x64(8105a18, 82dfde0, 83d3aa0,
> 82de300, 82f21a0, 81eb0d8)
> 08047218 libtopo.so.1`pad_process+0x51e(8105a18, 83ce100, 82de300,
> 82f21a0, 83ce128, 81d8638)
> 08047278 libtopo.so.1`topo_xml_range_process+0x31f(8105a18, 82de300,
> 83ce100, 80472a8)
> 080472c8 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82de300,
> 81eb198)
> 08047328 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82d1ca0,
> 81eb198, 8103f40, fed8c000)
> 08047358 libtopo.so.1`topo_xml_enum+0x67(8105a18, 82dfde0, 81eb198,
> feac2000)
> 08047488 libtopo.so.1`topo_file_load+0x139(8105a18, 81eb198, fe20c127,
> fe20bda2, 0, 82d2000)
> 080474b8 libtopo.so.1`topo_mod_enummap+0x26(8105a18, 81eb198, fe20c127,
> fe20bda2, 8105a18, fe20b11c)
> 08047508 x86pi.so`x86pi_enum_start+0xc5(8105a18, 8047530, 8047538,
> fe205580, 8105a18, 8105a18)
> 08047558 x86pi.so`x86pi_enum+0x55(8105a18, 81eb198, 81d8a90, 0, 0, 0)
> 080475a8 libtopo.so.1`topo_mod_enumerate+0xc4(8105a18, 81eb198, 80ebf38,
> 81d8a90, 0, 0)
> 080475f8 libtopo.so.1`enum_run+0xe9(8105b68, 81f1070, a, fed7b1dd)
> 08047648 libtopo.so.1`topo_xml_range_process+0x13e(8105b68, 81f94c8,
> 81f1070, 8047678)
> 08047698 libtopo.so.1`tf_rdata_new+0x135(8105b68, 81f4240, 81f94c8,
> 81eb198)
> 080476f8 libtopo.so.1`topo_xml_walk+0x246(8105b68, 81f4240, 81f9608,
> 81eb198, 8103f40, fed8c000)
> 08047728 libtopo.so.1`topo_xml_enum+0x67(8105b68, 81f4240, 81eb198,
> 81d8ad0)
> 08047858 libtopo.so.1`topo_file_load+0x139(8105b68, 81eb198, 80f3f38,
> 81d8aa0, 0, 2c)
> 08047898 libtopo.so.1`topo_tree_enum+0x89(8103f40, 81f51c8, 80478c8,
> fe70e6f8, 81f7f78, 8103f40)
> 080478b8 libtopo.so.1`topo_tree_enum_all+0x20(

Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-03-16 Thread Schweiss, Chip
While this problem was originally ruled out as an artifact of running as a
virtual machine, I've now installed the same HBA and JBOD to a physical
server.   The problem is exactly the same.

This is on OmniOS CE r151024r

-Chip

# /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
fmd: [ loading modules ... ABORT: attempted zero-length allocation:
Operation not supported
Abort (core dumped)

> $C
080462a8 libc.so.1`_lwp_kill+0x15(1, 6, 80462f8, fef42000, fef42000,
8046330)
080462c8 libc.so.1`raise+0x2b(6, 0, 80462e0, feec1b59, 0, 0)
08046318 libc.so.1`abort+0x10e(fead51f0, 0, fede2a40, 30, 524f4241,
61203a54)
08046748 libses.so.1`ses_panic(fdde6758, 8046774, 80467e8, fdb6b67a,
83eb0a8, fdb6c398)
08046768 libses.so.1`ses_realloc(fdde6758, 0, 83f01b8, fdde6130, fddf7000,
fdb6658f)
08046788 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 8111627)
080467b8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f0190, 8)
08046838 ses2.so`elem_parse_aes_misc+0x91(81114f4, 83eb0a8, 8, fdb65d85)
08046888 ses2.so`elem_parse_aes+0xfc(82f1ac8, 83f0288, 80468f8, fdb80eae)
080468a8 ses2.so`ses2_fill_element_node+0x37(82f1ac8, 83f0288, 832e930, 4)
080468d8 ses2.so`ses2_node_parse+0x53(82f1ac8, 83f0288, e, fddf7000)
080468f8 libses.so.1`ses_fill_node+0x22(83f0288, 83f0348, fdde38ae,
fdde394c)
08046918 libses.so.1`ses_fill_tree+0x21(83f0288, 82f1c88, 83e4cc8, fdde394c)
08046938 libses.so.1`ses_fill_tree+0x33(82f1d88, 82f1b88, 8046968, fdde394c)
08046958 libses.so.1`ses_fill_tree+0x33(82f1c88, 82ef758, 8046998, fdde394c)
08046978 libses.so.1`ses_fill_tree+0x33(82f1b88, 0, 18, fddf7000)
08046998 libses.so.1`ses_fill_snap+0x22(82f08a0, 80, 0, fdde56eb)
080469e8 libses.so.1`ses_snap_new+0x325(82f1b48, 0, 8046a18, fdde3006)
08046a18 libses.so.1`ses_open_scsi+0xc4(1, 82ef688, 8046aa0, fed71c1b,
81053f8, fede4042)
08046a68 libses.so.1`ses_open+0x98(1, 8046aa0, 0, feecedd3, 43, fde1fc58)
08046eb8 ses.so`ses_process_dir+0x133(fde20159, 83cc348, 0, fed77e40)
08046ee8 ses.so`ses_enum+0xc1(81053f8, 82f21a0, 8386608, 0, 400, 0)
08046f38 libtopo.so.1`topo_mod_enumerate+0xc4(81053f8, 82f21a0, 82fb1c8,
8386608, 0, 400)
08046f88 libtopo.so.1`enum_run+0xe9(8105a18, 83d6f78, a, fed7b1dd)
08046fd8 libtopo.so.1`topo_xml_range_process+0x13e(8105a18, 82eb5b0,
83d6f78, 8047008)
08047028 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82eb5b0, 82f21a0)
08047088 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82ebd30,
82f21a0, 8105a18, 83cbac0)
080470e8 libtopo.so.1`topo_xml_walk+0x1b2(8105a18, 82dfde0, 82de080,
82f21a0)
08047128 libtopo.so.1`dependent_create+0x127(8105a18, 82dfde0, 83d3aa0,
82de080, 82f21a0, fed7b1f9)
08047168 libtopo.so.1`dependents_create+0x64(8105a18, 82dfde0, 83d3aa0,
82de300, 82f21a0, 81eb0d8)
08047218 libtopo.so.1`pad_process+0x51e(8105a18, 83ce100, 82de300, 82f21a0,
83ce128, 81d8638)
08047278 libtopo.so.1`topo_xml_range_process+0x31f(8105a18, 82de300,
83ce100, 80472a8)
080472c8 libtopo.so.1`tf_rdata_new+0x135(8105a18, 82dfde0, 82de300, 81eb198)
08047328 libtopo.so.1`topo_xml_walk+0x246(8105a18, 82dfde0, 82d1ca0,
81eb198, 8103f40, fed8c000)
08047358 libtopo.so.1`topo_xml_enum+0x67(8105a18, 82dfde0, 81eb198,
feac2000)
08047488 libtopo.so.1`topo_file_load+0x139(8105a18, 81eb198, fe20c127,
fe20bda2, 0, 82d2000)
080474b8 libtopo.so.1`topo_mod_enummap+0x26(8105a18, 81eb198, fe20c127,
fe20bda2, 8105a18, fe20b11c)
08047508 x86pi.so`x86pi_enum_start+0xc5(8105a18, 8047530, 8047538,
fe205580, 8105a18, 8105a18)
08047558 x86pi.so`x86pi_enum+0x55(8105a18, 81eb198, 81d8a90, 0, 0, 0)
080475a8 libtopo.so.1`topo_mod_enumerate+0xc4(8105a18, 81eb198, 80ebf38,
81d8a90, 0, 0)
080475f8 libtopo.so.1`enum_run+0xe9(8105b68, 81f1070, a, fed7b1dd)
08047648 libtopo.so.1`topo_xml_range_process+0x13e(8105b68, 81f94c8,
81f1070, 8047678)
08047698 libtopo.so.1`tf_rdata_new+0x135(8105b68, 81f4240, 81f94c8, 81eb198)
080476f8 libtopo.so.1`topo_xml_walk+0x246(8105b68, 81f4240, 81f9608,
81eb198, 8103f40, fed8c000)
08047728 libtopo.so.1`topo_xml_enum+0x67(8105b68, 81f4240, 81eb198, 81d8ad0)
08047858 libtopo.so.1`topo_file_load+0x139(8105b68, 81eb198, 80f3f38,
81d8aa0, 0, 2c)
08047898 libtopo.so.1`topo_tree_enum+0x89(8103f40, 81f51c8, 80478c8,
fe70e6f8, 81f7f78, 8103f40)
080478b8 libtopo.so.1`topo_tree_enum_all+0x20(8103f40, 81f7f78, 80478f8,
fed71087)
080478f8 libtopo.so.1`topo_snap_create+0x13d(8103f40, 804794c, 0, fed7118d,
807c010, 4d5)
08047928 libtopo.so.1`topo_snap_hold+0x56(8103f40, 0, 804794c, 80e7f08, 0,
8047ac8)
08047968 fmd_topo_update+0x9f(80e7f08, 8085dfa, 8047a68, 80601f7, 0, 0)
08047978 fmd_topo_init+0xb(0, 0, 0, 0, 2, 80992f8)
08047a68 fmd_run+0x118(809a8c0, , 0, 2)
08047ae8 main+0x344(8047adc, fef4f348, 8047b18, 805fdd3, 5, 8047b24)
08047b18 _start+0x83(5, 8047c38, 8047c4c, 8047c4f, 8047c57, 8047c5a)


On Fri, Feb 16, 2018 at 10:57 AM, Schweiss, Chip  wrote:

> On Fri, Feb 16, 2018 at 10:47 AM, Robert Mustacchi  wrote:
>
>> We're getting a zero length allocation here. It appears that the number
>> 

Re: [OmniOS-discuss] 8806 back port

2018-03-15 Thread Schweiss, Chip
On Tue, Mar 13, 2018 at 5:50 PM, Andy Fiddaman  wrote:

>
> This error message actually looks like you still have the publisher set
> from the last time you applied the fix by hand. Can you check the output
> of 'pkg publisher'?
>
>
That was it.   It was still there because I had rolled back to the BE from
before it was previously installed.

Thanks!

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] 8806 back port

2018-03-13 Thread Schweiss, Chip
The 8806 backport will no longer apply to r151022ap.

# pkg apply-hot-fix --be-name=omniosce-r151022ap-8806 8806-backport_r22.p5p
No updates available for this image.
pkg set-publisher: Could not refresh the catalog for omnios

file protocol error: code: E_FAILED_INIT (2) reason: Package archive
/root/8806-backport_r22.p5p does not contain the requested package file(s):
publisher/omnios/catalog/catalog.attrs.
Repository URL: 'file:///root/8806-backport_r22.p5p'. (happened 4 times)


Some things are not clear about hot-fixes.

Do they need to be reapplied after each update?

Are they only compatible with the release they are built against?

Cheers!
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-02-16 Thread Schweiss, Chip
On Fri, Feb 16, 2018 at 10:47 AM, Robert Mustacchi  wrote:

> We're getting a zero length allocation here. It appears that the number
> of phys that we're detecting in one of the elements is probably zero. Is
> it possible to upload the core so we can confirm the data and fix the
> ses module to handle this, potentially odd, case?
>
>
Sure, where would you like me to upload the core?

I've put it here if you'd like to grab it:
ftp://ftp.nrg.wustl.edu/pub/zfs/fmd.core

-Chip



> Thanks,
> Robert
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] FMD fails to run

2018-02-16 Thread Schweiss, Chip
Here's what I'm seeing:

# /usr/lib/fm/fmd/fmd -o fg=true -o client.debug=true
fmd: [ loading modules ... ABORT: attempted zero-length allocation: No such
device or address
Abort (core dumped)

# mdb core
Loading modules: [ fmd libumem.so.1 libc.so.1 libnvpair.so.1 libtopo.so.1
libuutil.so.1 libavl.so.1 libcmdutils.so.1 libsysevent.so.1 ld.so.1 ]
> $C
08046298 libc.so.1`_lwp_kill+0x15(1, 6, 80462e8, fef42000, fef42000,
8046320)
080462b8 libc.so.1`raise+0x2b(6, 0, 80462d0, feec1b59, 0, 0)
08046308 libc.so.1`abort+0x10e(fede2a40, fef44cb8, 8046348, 6, 524f4241,
61203a54)
08046738 libses.so.1`ses_panic(fdde6758, 8046764, 80467d8, fdb6b67a,
83f1048, fdb6c398)
08046758 libses.so.1`ses_realloc(fdde6758, 0, 83f5078, fdde6130, fddf7000,
fdb6658f)
08046778 libses.so.1`ses_alloc+0x27(0, feb8, 6, 10, ee0, 80f4627)
080467a8 libses.so.1`ses_zalloc+0x1e(0, 0, 73, fdb6659d, 83f5050, 8)
08046828 ses2.so`elem_parse_aes_misc+0x91(80f44f4, 83f1048, 8, fdb65d85)
08046878 ses2.so`elem_parse_aes+0xfc(82bd388, 83f5148, 80468e8, fdb80eae)
08046898 ses2.so`ses2_fill_element_node+0x37(82bd388, 83f5148, 8303ed8, 4)
080468c8 ses2.so`ses2_node_parse+0x53(82bd388, 83f5148, e, fddf7000)
080468e8 libses.so.1`ses_fill_node+0x22(83f5148, 83f5208, fdde38ae,
fdde394c)
08046908 libses.so.1`ses_fill_tree+0x21(83f5148, 82bd548, 83e9cc8, fdde394c)
08046928 libses.so.1`ses_fill_tree+0x33(82bd648, 82bd448, 8046958, fdde394c)
08046948 libses.so.1`ses_fill_tree+0x33(82bd548, 82a5270, 8046988, fdde394c)
08046968 libses.so.1`ses_fill_tree+0x33(82bd448, 0, 18, fddf7000)
08046988 libses.so.1`ses_fill_snap+0x22(82c08d0, 80, 0, fdde56eb)
080469d8 libses.so.1`ses_snap_new+0x325(82bd408, 0, 8046a08, fdde3006)
08046a08 libses.so.1`ses_open_scsi+0xc4(1, 82a51a0, 8046a90, fed71c1b,
80e9468, fede4042)
08046a58 libses.so.1`ses_open+0x98(1, 8046a90, 0, feecedd3, 43, fde1fc58)
08046ea8 ses.so`ses_process_dir+0x133(fde20159, 83d8ed8, 0, fed77e40)
08046ed8 ses.so`ses_enum+0xc1(80e9468, 83aeb58, 8356570, 0, 400, 0)
08046f28 libtopo.so.1`topo_mod_enumerate+0xc4(80e9468, 83aeb58, 82d4a88,
8356570, 0, 400)
08046f78 libtopo.so.1`enum_run+0xe9(80e9a18, 83d77c8, a, fed7b1dd)
08046fc8 libtopo.so.1`topo_xml_range_process+0x13e(80e9a18, 82bb0b0,
83d77c8, 8046ff8)
08047018 libtopo.so.1`tf_rdata_new+0x135(80e9a18, 81c8790, 82bb0b0, 83aeb58)
08047078 libtopo.so.1`topo_xml_walk+0x246(80e9a18, 81c8790, 82bb830,
83aeb58, 80e9a18, 83d5bc0)
080470d8 libtopo.so.1`topo_xml_walk+0x1b2(80e9a18, 81c8790, 82b0b28,
83aeb58)
08047118 libtopo.so.1`dependent_create+0x127(80e9a18, 81c8790, 83d6ab0,
82b0b28, 83aeb58, fed7b1f9)
08047158 libtopo.so.1`dependents_create+0x64(80e9a18, 81c8790, 83d6ab0,
82b0da8, 83aeb58, 81bd0d8)
08047208 libtopo.so.1`pad_process+0x51e(80e9a18, 83d79a8, 82b0da8, 83aeb58,
83d79d0, 8356340)
08047268 libtopo.so.1`topo_xml_range_process+0x31f(80e9a18, 82b0da8,
83d79a8, 8047298)
080472b8 libtopo.so.1`tf_rdata_new+0x135(80e9a18, 81c8790, 82b0da8, 81bd258)
08047318 libtopo.so.1`topo_xml_walk+0x246(80e9a18, 81c8790, 82a37a0,
81bd258, 80e5f40, fed8c000)
08047348 libtopo.so.1`topo_xml_enum+0x67(80e9a18, 81c8790, 81bd258,
feac2000)
08047478 libtopo.so.1`topo_file_load+0x139(80e9a18, 81bd258, fe20c127,
fe20bda2, 0, 82a6000)
080474a8 libtopo.so.1`topo_mod_enummap+0x26(80e9a18, 81bd258, fe20c127,
fe20bda2, 80e9a18, fe20b11c)
080474f8 x86pi.so`x86pi_enum_start+0xc5(80e9a18, 8047520, 8047528,
fe205580, 80e9a18, 80e9a18)
08047548 x86pi.so`x86pi_enum+0x55(80e9a18, 81bd258, 81a6a70, 0, 0, 0)
08047598 libtopo.so.1`topo_mod_enumerate+0xc4(80e9a18, 81bd258, 80cdf38,
81a6a70, 0, 0)
080475e8 libtopo.so.1`enum_run+0xe9(80e9b68, 82a5fa8, a, fed7b1dd)
08047638 libtopo.so.1`topo_xml_range_process+0x13e(80e9b68, 82a3f70,
82a5fa8, 8047668)
08047688 libtopo.so.1`tf_rdata_new+0x135(80e9b68, 81c8bd0, 82a3f70, 81bd258)
080476e8 libtopo.so.1`topo_xml_walk+0x246(80e9b68, 81c8bd0, 81c7108,
81bd258, 80e5f40, fed8c000)
08047718 libtopo.so.1`topo_xml_enum+0x67(80e9b68, 81c8bd0, 81bd258, 81a6ab0)
08047848 libtopo.so.1`topo_file_load+0x139(80e9b68, 81bd258, 80d4f38,
81a6a80, 0, 2c)
08047888 libtopo.so.1`topo_tree_enum+0x89(80e5f40, 81c5318, 80478b8,
fe70e6f8, 81b5310, 80e5f40)
080478a8 libtopo.so.1`topo_tree_enum_all+0x20(80e5f40, 81b5310, 80478e8,
fed71087)
080478e8 libtopo.so.1`topo_snap_create+0x13d(80e5f40, 804793c, 0, fed7118d,
807c010, 21)
08047918 libtopo.so.1`topo_snap_hold+0x56(80e5f40, 0, 804793c, 80c9f08, 0,
8047ab8)
08047958 fmd_topo_update+0x9f(80c9f08, 8085dfa, 8047a58, 80601f7, 0, 0)
08047968 fmd_topo_init+0xb(0, 0, 0, 0, 2, 80992f8)
08047a58 fmd_run+0x118(809a8c0, , 0, 0)
08047ad8 main+0x344(8047acc, fef4f348, 8047b0c, 805fdd3, 5, 8047b18)
08047b0c _start+0x83(5, 8047c2c, 8047c40, 8047c43, 8047c4b, 8047c4e)



On Fri, Feb 16, 2018 at 10:29 AM, Yuri Pankov  wrote:

> Schweiss, Chip wrote:
>
>> This is on OmniOS CE r151024l running in a VMware virtual machine under
>> ESXi 6.5 with PCI pass-thru to a SAS3008 HBA.
>>
>> The problem is relate

[OmniOS-discuss] FMD fails to run

2018-02-16 Thread Schweiss, Chip
This is on OmniOS CE r151024l running in a VMware virtual machine under
ESXi 6.5 with PCI pass-thru to a SAS3008 HBA.

The problem is related to the HBA on pass-thru.  If I disconnect it,
everything starts fine, but I am not clear why or how to fix this.   I have
done similar VM passthrough setups with older versions of OmniOS and SAS2
HBAs without any problems.

The same HBA was being used successfully in the same configuration with
CentOS 7 in the VM so I know this can function.

I can see all the disks, but cannot import the pool because the fault
manager is not running.

The logs show:

[ Feb 16 10:02:14 Method "start" exited with status 1. ]
[ Feb 16 10:02:14 Executing start method ("/usr/lib/fm/fmd/fmd"). ]
ABORT: attempted zero-length allocation: No such device or address
[ Feb 16 10:02:14 Method "start" exited with status 1. ]
[ Feb 16 10:02:14 Executing start method ("/usr/lib/fm/fmd/fmd"). ]
ABORT: attempted zero-length allocation: No such device or address
[ Feb 16 10:02:14 Method "start" exited with status 1. ]
[ Feb 16 10:05:09 Leaving maintenance because clear requested. ]
[ Feb 16 10:05:09 Enabled. ]
[ Feb 16 10:05:09 Executing start method ("/usr/lib/fm/fmd/fmd"). ]
ABORT: attempted zero-length allocation: No such device or address
[ Feb 16 10:05:10 Method "start" exited with status 1. ]

Any hope of making this work?

Thanks!
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] Re: rpcbind: t_bind failed

2018-01-17 Thread Schweiss, Chip
I haven't seen this bug filed yet.

Please submit this.  For anyone using the automounter, this bug is a
ticking time bomb.

I've been able to stretch the time between reboots by about a week with

ndd -set /dev/tcp tcp_smallest_anon_port 1024

However, until this is fixed, I'm forced to reboot every couple of weeks.

Thank you,

-Chip

On Mon, Jan 8, 2018 at 10:46 AM, Dan McDonald  wrote:

> OH PHEW!
>
> > On Jan 8, 2018, at 11:43 AM, Youzhong Yang  wrote:
> >
> > This is our patch. It was applied 3 years ago so the line number could
> be different for the latest version of the file.
> > diff --git a/usr/src/uts/common/rpc/clnt_cots.c
> b/usr/src/uts/common/rpc/clnt_cots.c
> > index 4466e93..0a0951d 100644
> > --- a/usr/src/uts/common/rpc/clnt_cots.c
> > +++ b/usr/src/uts/common/rpc/clnt_cots.c
> > @@ -2285,6 +2285,7 @@ start_retry_loop:
> >   if (rpcerr->re_status == RPC_SUCCESS)
> >   rpcerr->re_status = RPC_XPRTFAILED;
> >   cm_entry->x_connected = FALSE;
> > + cm_entry->x_dead = TRUE;
> >   } else
> >   cm_entry->x_connected = connected;
> >
> > @@ -2403,6 +2404,7 @@ connmgr_wrapconnect(
> >   if (rpcerr->re_status == RPC_SUCCESS)
> >   rpcerr->re_status = RPC_XPRTFAILED;
> >   cm_entry->x_connected = FALSE;
> > + cm_entry->x_dead = TRUE;
> >   } else
> >   cm_entry->x_connected = connected;
>
> This makes TONS more sense, and alleviates/obviates my concerns previously.
>
> If there isn't a bug already, please file one.  Once filed or found,
> please add me as a code reviewer for this.
>
> Thanks,
> Dan
>
>
> --
> illumos-zfs
> Archives: https://illumos.topicbox.com/groups/zfs/discussions/
> T8f10bde64dc0d5c5-M889b6aaf7cbeb0b32617f321
> Powered by Topicbox: https://topicbox.com
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] OmniOSce installer rpool slicing

2018-01-05 Thread Schweiss, Chip
In the previous Solaris-style installer we had the option of using only a
portion of the disk that the rpool went on.   This was very good for SSDs,
which perform better and last longer if they have some additional slack
space that never has data written to it.
Is there a way to achieve this with the new installer?

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOSce installer rpool slicing

2018-01-05 Thread Schweiss, Chip
I didn't think about that.  Thanks!
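
For the archives, the rough shape of what Volker suggests below -- drop to
a shell from the installer menu, carve out a slice smaller than the disk,
build the pool by hand, then pick "use existing pool" -- looks something
like this.  The device name and sizing are placeholders, and the exact
labeling/partitioning steps depend on the disk and on how the system boots:

# from the installer shell: partition the SSD, leaving ~20% of it unused
format -e c2t0d0          # create a slice 0 smaller than the whole disk

# build the root pool on that slice only
zpool create -f rpool c2t0d0s0

# exit the shell and choose "use existing pool" in the installer menu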

On Fri, Jan 5, 2018 at 9:11 AM, Volker A. Brandt  wrote:

> Hi Chip!
>
>
> > In the previous Solaris style installer we had the option of only using a
> > portion of the disk that the rpool went on.   This was very good for
> SSDs that
> > perform better and last longer if they have some additional slack space
> that
> > never has data written to it.
> >
> > Is there a way to achieve this with the new installer?
>
> Yes.  Just drop to the shell from the installation menu and create your
> rpool using fdisk, format, and zpool create.  Exit the shell and select
> "use existing pool".
>
>
> Regards -- Volker
> --
> 
> Volker A. Brandt   Consulting and Support for Oracle Solaris
> Brandt & Brandt Computer GmbH   WWW: http://www.bb-c.de/
> Am Wiesenpfad 6, 53340 Meckenheim, GERMANYEmail: v...@bb-c.de
> Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 46
> Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt
>
> "When logic and proportion have fallen sloppy dead"
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] Re: rpcbind: t_bind failed

2018-01-03 Thread Schweiss, Chip
Hopefully the patch Marcel is talking about fixes this.  I've at least
figured out enough to predict when the problem is imminent.

We have been migrating to using the automounter instead of hard mounts,
which could be related to this problem growing over time.
Just an FYI:  I've kept the server running in this state, but moved its
storage pool to a sister server.   The port binding problem remains with NO
NFS clients connected, but neither pfiles or lsof shows rpcbind as the
culprit:

# netstat -an|grep BOUND|wc -l
32739

# /opt/ozmt/bin/SunOS/lsof -i:41155

{nothing returned}

# pfiles `pgrep rpcbind`
449:/usr/sbin/rpcbind
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:527,0 ino:7077 uid:0 gid:3 rdev:135,2
  O_RDWR
  /devices/pseudo/mm@0:null
  offset:0
   1: S_IFCHR mode:0666 dev:527,0 ino:7077 uid:0 gid:3 rdev:135,2
  O_RDWR
  /devices/pseudo/mm@0:null
  offset:0
   2: S_IFCHR mode:0666 dev:527,0 ino:7077 uid:0 gid:3 rdev:135,2
  O_RDWR
  /devices/pseudo/mm@0:null
  offset:0
   3: S_IFCHR mode: dev:527,0 ino:61271 uid:0 gid:0 rdev:231,64
  O_RDWR
sockname: AF_INET6 ::  port: 111
  /devices/pseudo/udp6@0:udp6
  offset:0
   4: S_IFCHR mode: dev:527,0 ino:50998 uid:0 gid:0 rdev:231,59
  O_RDWR
sockname: AF_INET6 ::  port: 0
  /devices/pseudo/udp6@0:udp6
  offset:0
   5: S_IFCHR mode: dev:527,0 ino:61264 uid:0 gid:0 rdev:231,58
  O_RDWR
sockname: AF_INET6 ::  port: 60955
  /devices/pseudo/udp6@0:udp6
  offset:0
   6: S_IFCHR mode: dev:527,0 ino:64334 uid:0 gid:0 rdev:224,57
  O_RDWR
sockname: AF_INET6 ::  port: 111
  /devices/pseudo/tcp6@0:tcp6
  offset:0
   7: S_IFCHR mode: dev:527,0 ino:64333 uid:0 gid:0 rdev:224,56
  O_RDWR
sockname: AF_INET6 ::  port: 0
  /devices/pseudo/tcp6@0:tcp6
  offset:0
   8: S_IFCHR mode: dev:527,0 ino:64332 uid:0 gid:0 rdev:230,55
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 111
  /devices/pseudo/udp@0:udp
  offset:0
   9: S_IFCHR mode: dev:527,0 ino:64330 uid:0 gid:0 rdev:230,54
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 0
  /devices/pseudo/udp@0:udp
  offset:0
  10: S_IFCHR mode: dev:527,0 ino:64331 uid:0 gid:0 rdev:230,53
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 60994
  /devices/pseudo/udp@0:udp
  offset:0
  11: S_IFCHR mode: dev:527,0 ino:64327 uid:0 gid:0 rdev:223,52
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 111
  /devices/pseudo/tcp@0:tcp
  offset:0
  12: S_IFCHR mode: dev:527,0 ino:64326 uid:0 gid:0 rdev:223,51
  O_RDWR
sockname: AF_INET 0.0.0.0  port: 0
  /devices/pseudo/tcp@0:tcp
  offset:0
  13: S_IFCHR mode: dev:527,0 ino:64324 uid:0 gid:0 rdev:226,32
  O_RDWR
  /devices/pseudo/tl@0:ticlts
  offset:0
  14: S_IFCHR mode: dev:527,0 ino:64328 uid:0 gid:0 rdev:226,33
  O_RDWR
  /devices/pseudo/tl@0:ticlts
  offset:0
  15: S_IFCHR mode: dev:527,0 ino:64324 uid:0 gid:0 rdev:226,35
  O_RDWR
  /devices/pseudo/tl@0:ticlts
  offset:0
  16: S_IFCHR mode: dev:527,0 ino:64322 uid:0 gid:0 rdev:226,36
  O_RDWR
  /devices/pseudo/tl@0:ticotsord
  offset:0
  17: S_IFCHR mode: dev:527,0 ino:64321 uid:0 gid:0 rdev:226,37
  O_RDWR
  /devices/pseudo/tl@0:ticotsord
  offset:0
  18: S_IFCHR mode: dev:527,0 ino:64030 uid:0 gid:0 rdev:226,39
  O_RDWR
  /devices/pseudo/tl@0:ticots
  offset:0
  19: S_IFCHR mode: dev:527,0 ino:64029 uid:0 gid:0 rdev:226,40
  O_RDWR
  /devices/pseudo/tl@0:ticots
  offset:0
  20: S_IFIFO mode: dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
  O_RDWR|O_NONBLOCK
  21: S_IFIFO mode: dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
  O_RDWR|O_NONBLOCK
  23: S_IFCHR mode: dev:527,0 ino:33089 uid:0 gid:0 rdev:129,21273
  O_WRONLY FD_CLOEXEC
  /devices/pseudo/log@0:conslog
  offset:0

Restarting rpcbind doesn't affect it either:

# svcadm restart svc:/network/rpc/bind:default

# netstat -an|grep BOUND|wc -l
32739

In the interim, until this patch gets integrated, I'll monitor the number
of bound ports to know when I should fail my pool over again.
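
A trivial cron job around the same netstat check is enough for that kind of
early warning; the threshold and mail recipient below are guesses, not
anything calibrated:

#!/bin/sh
# warn when the number of ports stuck in BOUND state creeps toward exhaustion
BOUND=$(netstat -an | grep BOUND | wc -l)
if [ "$BOUND" -gt 30000 ]; then
    echo "$(hostname): $BOUND ports in BOUND state" | \
        mailx -s "rpcmod port leak warning" root
fi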


On Wed, Jan 3, 2018 at 10:32 AM, Marcel Telka  wrote:

> On Wed, Jan 03, 2018 at 10:02:43AM -0600, Schweiss, Chip wrote:
> > The problem occurred again starting last night.  I have another clue,
> but I
> > still don't know how it is occurring or how to fix it.
> >
> > It looks like all the TCP ports are in "bound" state, but not being
> > released.
> >
> > How can I isolate the cause of this?
>
> This is a bug in rpcmod, very likely related to
> https://www.illumos.org/issues/1616
>
> I discussed this few weeks back with some guy who faced the same issue.  It
> looks like he found the cause a

Re: [OmniOS-discuss] rpcbind: t_bind failed

2018-01-03 Thread Schweiss, Chip
0 private/defer
d063568fd7e0 stream-ord 000 000
d063568fdb90 stream-ord 000 000
d06356840078 stream-ord d0635685a700 000 private/bounce
d06356840428 stream-ord 000 000
d063568407d8 stream-ord 000 000
d06356840b88 stream-ord d0635685a800 000 private/rewrite
d06356843070 stream-ord d06356810380 000 private/tlsmgr
d06356843420 stream-ord 000 000
d063568437d0 stream-ord 000 000
d06356849068 stream-ord 000 000
d06356849418 stream-ord 000 000
d063568497c8 stream-ord d0635685a000 000 public/qmgr
d06356849b78 stream-ord d0635685a100 000 public/cleanup
d0635684d060 stream-ord 000 000
d0635684d410 stream-ord 000 000
d0635684db70 stream-ord 000 000
d06355646058 stream-ord 000 000
d06355646b68 stream-ord d0635685a300 000 public/pickup
d063551bf3f8 stream-ord d063193fe900 000 /var/run/.inetd.uds
d063550e7b50 dgram  d063550eb380 000 /var/run/in.rdisc_mib
d06355031798 dgram  d063536c8800 000 /var/run/in.ndpd_mib
d06355031b48 stream-ord d063536c8c00 000 /var/run/in.ndpd_ipadm
d0635265a028 stream-ord 000 d0634e4acd00
/var/run/dbus/system_bus_socket
d0635265a788 stream-ord 000 d063500ffc80
/var/run/hald/dbus-y1Me9kLIpf
d0635265ab38 stream-ord 000 000 /var/run/hald/dbus-y1Me9kLIpf
d06351d553d0 stream-ord 000 000 /var/run/hald/dbus-y1Me9kLIpf
d06351d55780 stream-ord 000 000 /var/run/hald/dbus-y1Me9kLIpf
d06351d55b30 stream-ord 000 d063500ffc80
/var/run/hald/dbus-y1Me9kLIpf
d06351996018 stream-ord 000 d063500ffc80
/var/run/hald/dbus-y1Me9kLIpf
d063519963c8 stream-ord 000 000 /var/run/hald/dbus-y1Me9kLIpf
d06351996778 stream-ord 000 d063500ffc80
/var/run/hald/dbus-y1Me9kLIpf
d063500fe010 stream-ord 000 000 /var/run/hald/dbus-5Qrha0Wmu3
d063500fe3c0 stream-ord 000 d063500ffa80
/var/run/hald/dbus-5Qrha0Wmu3
d063500fe770 stream-ord d063500ffa80 000
/var/run/hald/dbus-5Qrha0Wmu3
d063500feb20 stream-ord d063500ffc80 000
/var/run/hald/dbus-y1Me9kLIpf
d0634e4ad008 stream-ord 000 000
d0634e4ad3b8 stream-ord 000 000
d0634e4ad768 stream-ord 000 000 /var/run/dbus/system_bus_socket
d0634e4adb18 stream-ord d0634e4acd00 000
/var/run/dbus/system_bus_socket


A sorted output shows nearly all 64K ports in bound state.

On Tue, Jan 2, 2018 at 8:40 AM, Schweiss, Chip  wrote:

> About once every week or two I'm having NFS connections start to collapse
> to one of my servers.   Clients will lose their connections over the
> course of several hours. The logs fill with these messages:
>
> Dec 25 16:21:14 mir-zfs03 rpcbind: [ID 452059 daemon.error]  do_accept :
> t_bind failed : Couldn't allocate address
> Dec 25 16:21:14 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
> daemon.error] t_bind(file descriptor 188/transport tcp) TLI error 5
> Dec 25 16:21:31 mir-zfs03 last message repeated 85 times
> Dec 25 16:21:31 mir-zfs03 rpcbind: [ID 452059 daemon.error]  do_accept :
> t_bind failed : Couldn't allocate address
> Dec 25 16:21:32 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
> daemon.error] t_bind(file descriptor 188/transport tcp) TLI error 5
> Dec 25 16:21:34 mir-zfs03 last message repeated 19 times
> Dec 25 16:21:37 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
> daemon.error] t_bind(file descriptor 200/transport tcp) TLI error 5
> Dec 25 16:22:17 mir-zfs03 last message repeated 116 times
> Dec 25 16:22:21 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
> daemon.error] t_bind(file descriptor 206/transport tcp) TLI error 5
> Dec 25 16:23:04 mir-zfs03 last message repeated 81 times
>
> This is a fully updated OmniOS CE r151022.
>
> I've tried restarting NFS services, but the only thing that has been
> successful in restoring services has been rebooting.
>
> I'm not finding anything useful via Google except the source code that
> spits out this message.   HP-UX appears to have had the same issue that
> they patched years ago.   I'm guessing shared NFS/RPC code.
>
> Any clue as to the cause of this and how to fix?
>
> -Chip
>
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] rpcbind: t_bind failed

2018-01-02 Thread Schweiss, Chip
About once every week or two, NFS connections to one of my servers start to
collapse.   Clients lose their connections over the course of several
hours.  The logs fill with these messages:

Dec 25 16:21:14 mir-zfs03 rpcbind: [ID 452059 daemon.error]  do_accept :
t_bind failed : Couldn't allocate address
Dec 25 16:21:14 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
daemon.error] t_bind(file descriptor 188/transport tcp) TLI error 5
Dec 25 16:21:31 mir-zfs03 last message repeated 85 times
Dec 25 16:21:31 mir-zfs03 rpcbind: [ID 452059 daemon.error]  do_accept :
t_bind failed : Couldn't allocate address
Dec 25 16:21:32 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
daemon.error] t_bind(file descriptor 188/transport tcp) TLI error 5
Dec 25 16:21:34 mir-zfs03 last message repeated 19 times
Dec 25 16:21:37 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
daemon.error] t_bind(file descriptor 200/transport tcp) TLI error 5
Dec 25 16:22:17 mir-zfs03 last message repeated 116 times
Dec 25 16:22:21 mir-zfs03 /usr/lib/nfs/nfsd[27689]: [ID 396295
daemon.error] t_bind(file descriptor 206/transport tcp) TLI error 5
Dec 25 16:23:04 mir-zfs03 last message repeated 81 times

This is a fully updated OmniOS CE r151022.

I've tried restarting NFS services, but the only thing that has been
successful in restoring services has been rebooting.

I'm not finding anything useful via Google except the source code that
spits out this message.   HP-UX appears to have had the same issue, which
they patched years ago.   I'm guessing it's shared NFS/RPC code.

Any clue as to the cause of this and how to fix?

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Editing kernel command line with BSD Loader

2017-10-30 Thread Schweiss, Chip
Forgive me if there is a FAQ somewhere on this, but I could not locate one.

How do I edit the command line now that my OmniOS is using the BSD loader?

I'd like to disable a driver at boot time such as:

-B disable-mpt_sas=true
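
My best guess so far (unverified, so treat it as an assumption rather than
the answer) is to pick "Escape to loader prompt" from the boot menu and set
the property there, since loader environment variables appear to be handed
to the kernel as boot properties, the same as -B:

set disable-mpt_sas=true
boot

If there is also a supported way to make that persistent across reboots,
I'd like to know that too.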

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-09-08 Thread Schweiss, Chip
Robert,

That is awesome.   I'd definitely be interested in testing this.

I'll get my feet wet with building OmniOS CE with it.

Thanks!
-Chip

On Fri, Sep 8, 2017 at 10:07 AM, Robert Mustacchi  wrote:

> On 9/8/17 6:43 , Schweiss, Chip wrote:
> > Now that I'm back to working on this.   The only way I could get the
> > firmware updated was booting into the UEFI shell.   A bit of a pain but
> it
> > worked.
> >
> > Unfortunately, it has not changed the behavior of the HBA.
> >
> > Where do I go from here? Any hope of getting this working on OmniOS?
>
> Hi Chip,
>
> I'm just catching up on this. So, I do have some good news and bad news.
> First, the bad news. This is based on the SAS3224 chipset, which it
> appears the 16e is also describing itself as. Of note, this uses a
> slightly newer version of the MPI specification and the driver as it is
> written doesn't quite notice that it requires slightly different
> behavior and a simple PCI ID update isn't sufficient.
>
> The good news is that I just finished doing this work for the LSI
> 9305-24i and was going to send that up to illumos shortly. If you want,
> I can send those changes your way if you're comfortable building illumos
> and want to test that.
>
> Robert
>
> > On Thu, Aug 31, 2017 at 9:53 AM, Schweiss, Chip 
> wrote:
> >
> >> This server will be serving NFS for vSphere.  It is running OmniOS CE,
> >> nothing VMware.
> >>
> >> I'm working on flashing firmware now and will report back any changes.
> >>
> >> -Chip
> >>
> >> On Thu, Aug 31, 2017 at 9:42 AM, Dale Ghent 
> wrote:
> >>
> >>>> On Aug 31, 2017, at 9:29 AM, Schweiss, Chip 
> wrote:
> >>>>
> >>>> I've added mpt_sas "pciex1000,c9" to /etc/driver_aliases and rebooted.
> >>>>
> >>>> Looks like it's partially working, but it's not fully functional.
> >>> Service are timing out:
> >>>>
> >>>> Here's what I see in /var/adm/messages:
> >>>>
> >>>>
> >>>> Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
> >>> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
> >>>> Aug 31 08:15:49 vsphere-zfs01   MPT Firmware Fault, code: 2667
> >>>> Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
> >>> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
> >>>
> >>> The driver is reporting that the MPT IOC (IO Controller) is reporting a
> >>> fault. It's just reading this condition off the controller chip
> itself, and
> >>> unfortunately there doesn't seem to be a handy reference published by
> >>> LSI/Avago regarding what 2667h actually means.
> >>>
> >>> However I note from your machine's hostname that this is perhaps a ESI
> >>> guest that is being given the HBA in passthrough mode? It would seem
> that
> >>> someone else has encountered a similar issue as yourself in this case,
> with
> >>> the same MPT fault code, but on Linux running Proxmox. According to
> this
> >>> forum thread, they ended up flashing the firmware on the card to
> something
> >>> newer and the problem went away:
> >>>
> >>> https://forum.proxmox.com/threads/pci-passthrough.16483/
> >>>
> >>> I would suggest Tim's approach and flashing your card up to the newest
> IT
> >>> (not IR) firmware.
> >>>
> >>> /dale
> >>>
> >
> > --
> > illumos-zfs
> > Archives: https://illumos.topicbox.com/groups/zfs/discussions/
> T372d7ddd75316296-M4bd824d5e1881e2772ee518a
> > Powered by Topicbox: https://topicbox.com
> >
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-09-08 Thread Schweiss, Chip
Now that I'm back to working on this, the only way I could get the
firmware updated was booting into the UEFI shell.   A bit of a pain, but it
worked.

Unfortunately, it has not changed the behavior of the HBA.

Where do I go from here? Any hope of getting this working on OmniOS?

-Chip

On Thu, Aug 31, 2017 at 9:53 AM, Schweiss, Chip  wrote:

> This server will be serving NFS for vSphere.  It is running OmniOS CE,
> nothing VMware.
>
> I'm working on flashing firmware now and will report back any changes.
>
> -Chip
>
> On Thu, Aug 31, 2017 at 9:42 AM, Dale Ghent  wrote:
>
>>
>> > On Aug 31, 2017, at 9:29 AM, Schweiss, Chip  wrote:
>> >
>> > I've added mpt_sas "pciex1000,c9" to /etc/driver_aliases and rebooted.
>> >
>> > Looks like it's partially working, but it's not fully functional.
>> Service are timing out:
>> >
>> > Here's what I see in /var/adm/messages:
>> >
>> >
>> > Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
>> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
>> > Aug 31 08:15:49 vsphere-zfs01   MPT Firmware Fault, code: 2667
>> > Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
>> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
>>
>> The driver is reporting that the MPT IOC (IO Controller) is reporting a
>> fault. It's just reading this condition off the controller chip itself, and
>> unfortunately there doesn't seem to be a handy reference published by
>> LSI/Avago regarding what 2667h actually means.
>>
>> However I note from your machine's hostname that this is perhaps a ESI
>> guest that is being given the HBA in passthrough mode? It would seem that
>> someone else has encountered a similar issue as yourself in this case, with
>> the same MPT fault code, but on Linux running Proxmox. According to this
>> forum thread, they ended up flashing the firmware on the card to something
>> newer and the problem went away:
>>
>> https://forum.proxmox.com/threads/pci-passthrough.16483/
>>
>> I would suggest Tim's approach and flashing your card up to the newest IT
>> (not IR) firmware.
>>
>> /dale
>>
>>
>> --
>> illumos-zfs
>> Archives: https://illumos.topicbox.com/groups/zfs/discussions/T372d7dd
>> d75316296-Mb0dd6c92e5393440a8b0c8fb
>> Powered by Topicbox: https://topicbox.com
>>
>>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-08-31 Thread Schweiss, Chip
This server will be serving NFS for vSphere.  It is running OmniOS CE,
nothing VMware.

I'm working on flashing firmware now and will report back any changes.

-Chip

On Thu, Aug 31, 2017 at 9:42 AM, Dale Ghent  wrote:

>
> > On Aug 31, 2017, at 9:29 AM, Schweiss, Chip  wrote:
> >
> > I've added mpt_sas "pciex1000,c9" to /etc/driver_aliases and rebooted.
> >
> > Looks like it's partially working, but it's not fully functional.
> Service are timing out:
> >
> > Here's what I see in /var/adm/messages:
> >
> >
> > Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
> > Aug 31 08:15:49 vsphere-zfs01   MPT Firmware Fault, code: 2667
> > Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING:
> /pci@0,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
>
> The driver is reporting that the MPT IOC (IO Controller) is reporting a
> fault. It's just reading this condition off the controller chip itself, and
> unfortunately there doesn't seem to be a handy reference published by
> LSI/Avago regarding what 2667h actually means.
>
> However I note from your machine's hostname that this is perhaps a ESI
> guest that is being given the HBA in passthrough mode? It would seem that
> someone else has encountered a similar issue as yourself in this case, with
> the same MPT fault code, but on Linux running Proxmox. According to this
> forum thread, they ended up flashing the firmware on the card to something
> newer and the problem went away:
>
> https://forum.proxmox.com/threads/pci-passthrough.16483/
>
> I would suggest Tim's approach and flashing your card up to the newest IT
> (not IR) firmware.
>
> /dale
>
>
> --
> illumos-zfs
> Archives: https://illumos.topicbox.com/groups/zfs/discussions/
> T372d7ddd75316296-Mb0dd6c92e5393440a8b0c8fb
> Powered by Topicbox: https://topicbox.com
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-08-31 Thread Schweiss, Chip
s01 scsi: [ID 243001 kern.info]
w5000c5002c6d6512 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d7192 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d36ee FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d64f2 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4b3c16 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c50070a5c08a FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d3662 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d60d6 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d35a2 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d1fbe FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d27c6 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d66a2 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c500056fb256 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c40e3be FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d1846 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c6d5ff6 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d1e52 FastPath Capable and Enabled
Aug 31 08:15:48 vsphere-zfs01 scsi: [ID 243001 kern.info]
w5000c5002c4d2276 FastPath Capable and Enabled
Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING: /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:49 vsphere-zfs01   MPT Firmware Fault, code: 2667
Aug 31 08:15:49 vsphere-zfs01 scsi: [ID 107833 kern.warning] WARNING: /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:49 vsphere-zfs01   ioc reset abort passthru
Aug 31 08:15:49 vsphere-zfs01 mpt_sas: [ID 201859 kern.warning] WARNING:
smp_start do passthru error 11
Aug 31 08:15:51 vsphere-zfs01 scsi: [ID 365881 kern.info] /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:51 vsphere-zfs01   MPT Firmware version v9.0.100.0 (?)
Aug 31 08:15:51 vsphere-zfs01 scsi: [ID 365881 kern.info] /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:51 vsphere-zfs01   mpt_sas0 MPI Version 0x206
Aug 31 08:15:51 vsphere-zfs01 scsi: [ID 365881 kern.info] /pci@0
,0/pci8086,1905@1,1/pci1000,3180@0 (mpt_sas0):
Aug 31 08:15:51 vsphere-zfs01   mpt0: IOC Operational.

Where do I go from here?

-Chip


On Thu, Aug 31, 2017 at 7:30 AM, Schweiss, Chip  wrote:

> On Wed, Aug 30, 2017 at 3:12 PM, Dan McDonald  wrote:
>
>> > On Aug 30, 2017, at 4:11 PM, Dale Ghent  wrote:
>> >
>> > Or rather:
>> >
>> > # update_drv -a -i '"pciex1000,c9"' mpt_sas
>>
>> It MIGHT fail because mpt_sas checks PCI IDs explicitly itself.  :(
>>
>>
> Yes, the update_drv command just hangs indefinitely.
>
> -Chip
>
>
>
>> FYI,
>> Dan
>>
>>
>> --
>> illumos-zfs
>> Archives: https://illumos.topicbox.com/groups/zfs/discussions/T372d7dd
>> d75316296-M31c90a92ac6d9d9dbd977114
>> Powered by Topicbox: https://topicbox.com
>>
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] [zfs] SAS 9305-16e HBA support in Illumos

2017-08-31 Thread Schweiss, Chip
On Wed, Aug 30, 2017 at 3:12 PM, Dan McDonald  wrote:

> > On Aug 30, 2017, at 4:11 PM, Dale Ghent  wrote:
> >
> > Or rather:
> >
> > # update_drv -a -i '"pciex1000,c9"' mpt_sas
>
> It MIGHT fail because mpt_sas checks PCI IDs explicitly itself.  :(
>
>
Yes, the update_drv command just hangs indefinitely.

-Chip



> FYI,
> Dan
>
>
> --
> illumos-zfs
> Archives: https://illumos.topicbox.com/groups/zfs/discussions/
> T372d7ddd75316296-M31c90a92ac6d9d9dbd977114
> Powered by Topicbox: https://topicbox.com
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] SAS 9305-16e HBA support in Illumos

2017-08-30 Thread Schweiss, Chip
I made the assumption that a Broadcom/LSI HBA would be supported already in
OmniOS CE r151022o.

This HBA is not loading.

Here's the 'lspci -vv' output:

02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3216
PCI-Express Fusion-MPT SAS-3 (rev 01)
Subsystem: LSI Logic / Symbios Logic Device 3180
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR-
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Upgrade to 151022m from 014 - horrible NFS performance

2017-08-24 Thread Schweiss, Chip
I switched back to 014 for now; it was too bad to inflict on my users.

I have some new systems coming in soon that I'll test on r151022 before
making them live.   I will start with the NFS defaults.

-Chip

On Thu, Aug 24, 2017 at 8:35 AM, Dan McDonald  wrote:

>
> > On Aug 24, 2017, at 8:41 AM, Schweiss, Chip  wrote:
> >
> > I just moved one of my production systems to OmniOS CE 151022m from
> 151014 and my NFS performance has tanked.
> >
> > Here's a snapshot of nfssvrtop:
> >
> > 2017 Aug 24 07:34:39, load: 1.54, read: 5427 KB, swrite: 104
> KB, awrite: 9634 KB
> > Ver Client   NFSOPS   Reads SWrites AWrites Commits   Rd_bw
> SWr_bw  AWr_bwRd_t   SWr_t   AWr_t   Com_t  Align%
> > 3   10.28.17.10   0   0   0   0   0
>  0   0   0   0   0   0   0
> > 3   all   0   0   0   0   0   0
>  0   0   0   0   0   0   0
> > 4   10.28.17.19   0   0   0   0   0
>  0   0   0   0   0   0   0
> > 4   10.28.16.160 17   0   0   0   0   0
>  0   0   0   0   0   0   0
> > 4   10.28.16.127 20   0   0   0   0   0
>  0   0   0   0   0   0   0
> > 4   10.28.16.113 74   6   6   0   0  48
> 56   01366   20824   0   0 100
> > 4   10.28.16.64 338  16   0  36   3 476
>  01065 120   0 130  117390 100
> > 4   10.28.16.54 696  68   0  91   52173
>  02916  52   0  93  142083 100
> > 4   all1185  90   6 127   82697
> 563996 151   20824 104  133979 100
> >
> > The pool is not doing anything but serving NFS.   Before the upgrade,
> the pool would sustain 20k NFS ops.
> >
> > Is there some significant change in NFS that I need to adjust its tuning?
>
> Oh my.
>
> I'd start pinging the illumos list on this.  Also, are there any special
> tweaks you made in the 014 configuration?  IF you did, I'd start back
> removing them and seeing what a default system does, just in case.
>
> I know Delphix and Nexenta still care about NFS quite a bit, so I can't
> believe something would be that bad.
>
> Maintainers:  Check for NFS changes RIGHT AFTER 022 closed for blanket
> upstream pull-ins.  Maybe it closed during a poor-performance window?
>
> Dan
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Upgrade to 151022m from 014 - horrible NFS performance

2017-08-24 Thread Schweiss, Chip
I just moved one of my production systems to OmniOS CE 151022m from 151014
and my NFS performance has tanked.

Here's a snapshot of nfssvrtop:

2017 Aug 24 07:34:39, load: 1.54, read: 5427 KB, swrite: 104  KB,
awrite: 9634 KB
Ver Client   NFSOPS   Reads SWrites AWrites Commits   Rd_bw
 SWr_bw  AWr_bwRd_t   SWr_t   AWr_t   Com_t  Align%
3   10.28.17.10   0   0   0   0   0
  0   0   0   0   0   0   0
3   all   0   0   0   0   0   0
  0   0   0   0   0   0   0
4   10.28.17.19   0   0   0   0   0
  0   0   0   0   0   0   0
4   10.28.16.160 17   0   0   0   0   0
  0   0   0   0   0   0   0
4   10.28.16.127 20   0   0   0   0   0
  0   0   0   0   0   0   0
4   10.28.16.113 74   6   6   0   0  48
 56   01366   20824   0   0 100
4   10.28.16.64 338  16   0  36   3 476
  01065 120   0 130  117390 100
4   10.28.16.54 696  68   0  91   52173
  02916  52   0  93  142083 100
4   all1185  90   6 127   82697
 563996 151   20824 104  133979 100

The pool is not doing anything but serving NFS.   Before the upgrade, the
pool would sustain 20k NFS ops.

Is there some significant change in NFS that I need to adjust its tuning?

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] scsi command timeouts

2017-06-22 Thread Schweiss, Chip
I'm talking about an offline pool.   I started this thread after rebooting
a server that is part of an HA pair. The other server has the pools
online.  It's been over 4 hours now and it still hasn't completed its disk
scan.

Every tool I have that helps me locate disks suffers the same insane
command timeout, repeated many times, before moving on.   Operations that
typically take seconds blow up to hours really fast because of a few dead
disks.

-Chip



On Thu, Jun 22, 2017 at 3:12 PM, Dale Ghent  wrote:

>
> Have you able to and have tried offlining it in the zpool?
>
> zpool offline thepool 
>
> I'm assuming the pool has some redundancy which would allow for this.
>
> /dale
>
> > On Jun 22, 2017, at 11:54 AM, Schweiss, Chip  wrote:
> >
> > Whenever a disk goes south, several disk-related tasks become painfully
> slow.  Boot-up times can jump into the hours to complete the disk scans.
> >
> > The logs slowly get these type messages:
> >
> > genunix: WARNING /pci@0,0/pci8086,340c@5/pci15d9,400@0 (mpt_sas0):
> > Timeout of 60 seconds expired with 1 commands on target 16 lun 0
> >
> > I thought this /etc/system setting would reduce the timeout to 5 seconds:
> > set sd:sd_io_time = 5
> >
> > But this doesn't seem to change anything.
> >
> > Is there any way to make this a more reasonable timeout, besides pulling
> the disk that's causing it?   Just locating the defective disk is also
> painfully slow because of this problem.
> >
> > -Chip
> > ___
> > OmniOS-discuss mailing list
> > OmniOS-discuss@lists.omniti.com
> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] scsi command timeouts

2017-06-22 Thread Schweiss, Chip
On Thu, Jun 22, 2017 at 11:05 AM, Michael Rasmussen  wrote:

>
> > I thought this /etc/system setting would reduce the timeout to 5 seconds:
> > set sd:sd_io_time = 5
> >
> I think it expects a hex value so try 0x5 instead.
>
>
Unfortunately, no, I've tried that too.

-Chip


> --
> Hilsen/Regards
> Michael Rasmussen
>
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --
> /usr/games/fortune -es says:
> Look, we play the Star Spangled Banner before every game.  You want us
> to pay income taxes, too?
> -- Bill Veeck, Chicago White Sox
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] scsi command timeouts

2017-06-22 Thread Schweiss, Chip
Whenever a disk goes south, several disk-related tasks become painfully
slow.  Boot-up times can jump into the hours to complete the disk scans.

The logs slowly get these type messages:

genunix: WARNING /pci@0,0/pci8086,340c@5/pci15d9,400@0 (mpt_sas0):
Timeout of 60 seconds expired with 1 commands on target 16 lun 0

I thought this /etc/system setting would reduce the timeout to 5 seconds:
set sd:sd_io_time = 5

But this doesn't seem to change anything.

Is there any way to make this a more reasonable timeout, besides pulling the
disk that's causing it?   Just locating the defective disk is also
painfully slow because of this problem.
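
For anyone else fighting this, a sanity check I plan to try is reading the
tunable back with mdb to see whether the /etc/system setting even took
effect, and poking it live as an experiment (my assumption is that sd only
picks the value up when a target attaches, so a live change may not help
already-attached disks):

echo "sd_io_time/D" | mdb -k        # show the current value in decimal
echo "sd_io_time/W 0t5" | mdb -kw   # experimentally set it to 5 seconds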

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] To the OmniOS Community

2017-05-15 Thread Schweiss, Chip
I just added my potential paid support to the spreadsheet for keeping OmniOS
maintained professionally.   The spreadsheet does reveal the realities OmniTI was
facing with OmniOS.   My guess is they were never close to turning a profit
on it.

It saddens me to see such a well-developed and maintained distribution go
unsupported.   I hope there is enough commercial potential that OmniOS gets
picked up to offer a supported distribution, but I don't expect there will
be.   OmniOS is, unfortunately, a niche product that doesn't have enough
paying-customer potential. Just as many others did, I initially started
using it as a platform to run ZFS storage.  It made much
more economic sense than buying into Nexenta in order to get a supported
platform.  ZFS is now much more mature on Linux, and I suspect this
drastically reduces demand for OmniOS.

Dan McDonald and OmniTI put forward an excellent run at it.
Unfortunately, OmniOS was, for the most part, a one-man show that, without
Dan, will by my best prediction die a long, slow death.   I'm sure there
will be some advancement by the community for several years to come, but
not at a pace and quality that can be tolerated by users like myself.

I give the credit to Dan, for the development of such a great distribution
of Illumos, and relentless effort to support it.  I do have to criticize
OmniTI for poor execution.   We were paid supporters of OmniOS, however,
things like sending out renewal invoices were never executed in a timely
manner and requests for quotes for additional server support were difficult
to get a response on.   I think in the three years we were paid supporters,
we probably only paid for two.

Should a new company offer a paid support contract for OmniOS with the
talent to back it up, I will very quickly get Washington University on
board.  If that doesn't happen in the next few months, I will start our
plan of moving or ZFS storage to another platform, that has long-term
viability and hopefully some sort of paid support.   That may mean getting
an independent contractor or consulting firm on retainer for emergency help
or dedicated patch development/maintenance.

-Chip

On Mon, May 15, 2017 at 12:37 AM, Michael Rasmussen  wrote:

> On Sun, 14 May 2017 19:10:32 -0400
> Theo Schlossnagle  wrote:
>
> >
> > Communities are built on collective need and contributed time. It comes
> > down to volunteering. I have yet to see anyone pledge time to do work,
> and
> > from my perspective that is what is truly needed here.
> >
> This is not entirely true since I some weeks ago did exactly that
> without specifying what I would do precisely.
>
> > I was hoping not to be the first, but I will attempt to lead by example.
> > I'll pledge my time to do required security package publications on 014
> and
> > 022. As security issues with packages in core arise, I will update the
> > build system, re-roll the packages and publish them.
> >
> To follow example I will commit myself to maintain the wiki and any
> other web based infrastructure the project may need and choose to have.
>
> --
> Hilsen/Regards
> Michael Rasmussen
>
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --
> /usr/games/fortune -es says:
> Don't patch bad code - rewrite it.
> - The Elements of Programming Style (Kernighan & Plaugher)
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Resilver zero progress

2017-05-10 Thread Schweiss, Chip
I have a pool that has had a resilver running for about an hour but the
progress status is a bit alarming.  I'm concerned for some reason it will
not resilver.   Resilvers are tuned to be faster in /etc/system.   This is
on OmniOS r151014, currently fully updated.   Any suggestions?

-Chip

from /etc/system:

set zfs:zfs_resilver_delay = 0
set zfs:zfs_scrub_delay = 0
set zfs:zfs_top_maxinflight = 64
set zfs:zfs_resilver_min_time_ms = 5000
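
For what it's worth, I believe the current values can be read back from the
running kernel to confirm the settings took effect (they print in decimal):

echo "zfs_resilver_delay/D" | mdb -k
echo "zfs_scrub_delay/D" | mdb -k
echo "zfs_top_maxinflight/D" | mdb -k
echo "zfs_resilver_min_time_ms/D" | mdb -k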


# zpool status hcp03
  pool: hcp03
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed May 10 09:22:15 2017
1 scanned out of 545T at 1/s, (scan is slow, no estimated time)
0 resilvered, 0.00% done
config:

NAME STATE READ WRITE CKSUM
hcp03DEGRADED 0 0 0
  raidz2-0   DEGRADED 0 0 0
c0t5000C500846F161Fd0ONLINE   0 0 0
spare-1  UNAVAIL  0 0 0
  5676922542927845170UNAVAIL  0 0 0  was
/dev/dsk/c0t5000C5008473DBF3d0s0
  c0t5000C500846F1823d0  ONLINE   0 0 0
c0t5000C500846F134Fd0ONLINE   0 0 0
c0t5000C500846F139Fd0ONLINE   0 0 0
c0t5000C5008473B89Fd0ONLINE   0 0 0
c0t5000C500846F145Bd0ONLINE   0 0 0
c0t5000C5008473B6BBd0ONLINE   0 0 0
c0t5000C500846F131Fd0ONLINE   0 0 0
  raidz2-1   ONLINE   0 0 0
c0t5000C5008473BB63d0ONLINE   0 0 0
c0t5000C5008473C9C7d0ONLINE   0 0 0
c0t5000C500846F1A17d0ONLINE   0 0 0
c0t5000C5008473A0A3d0ONLINE   0 0 0
c0t5000C5008473D047d0ONLINE   0 0 0
c0t5000C5008473BF63d0ONLINE   0 0 0
c0t5000C5008473BC83d0ONLINE   0 0 0
c0t5000C5008473E35Bd0ONLINE   0 0 0
  raidz2-2   ONLINE   0 0 0
c0t5000C5008473ABAFd0ONLINE   0 0 0
c0t5000C5008473ADF3d0ONLINE   0 0 0
c0t5000C5008473AE77d0ONLINE   0 0 0
c0t5000C5008473A23Bd0ONLINE   0 0 0
c0t5000C5008473C907d0ONLINE   0 0 0
c0t5000C5008473CCABd0ONLINE   0 0 0
c0t5000C5008473C77Fd0ONLINE   0 0 0
c0t5000C5008473B6D3d0ONLINE   0 0 0
  raidz2-3   ONLINE   0 0 0
c0t5000C5008473E4FFd0ONLINE   0 0 0
c0t5000C5008473ECFFd0ONLINE   0 0 0
c0t5000C5008473F4C3d0ONLINE   0 0 0
c0t5000C5008473F8CFd0ONLINE   0 0 0
c0t5000C500846F1897d0ONLINE   0 0 0
c0t5000C500846F14B7d0ONLINE   0 0 0
c0t5000C500846F1353d0ONLINE   0 0 0
c0t5000C5008473EEDFd0ONLINE   0 0 0
  raidz2-4   ONLINE   0 0 0
c0t5000C500846F144Bd0ONLINE   0 0 0
c0t5000C5008473F10Fd0ONLINE   0 0 0
c0t5000C500846F15CBd0ONLINE   0 0 0
c0t5000C500846F1493d0ONLINE   0 0 0
c0t5000C5008473E26Fd0ONLINE   0 0 0
c0t5000C500846F1A0Bd0ONLINE   0 0 0
c0t5000C5008473EE07d0ONLINE   0 0 0
c0t5000C500846F1453d0ONLINE   0 0 0
  raidz2-5   ONLINE   0 0 0
c0t5000C500846F153Bd0ONLINE   0 0 0
c0t5000C5008473F9EBd0ONLINE   0 0 0
c0t5000C500846F14EFd0ONLINE   0 0 0
c0t5000C5008473AB0Bd0ONLINE   0 0 0
c0t5000C500846F140Bd0ONLINE   0 0 0
c0t5000C5008473FC0Fd0ONLINE   0 0 0
c0t5000C5008473DFA3d0ONLINE   0 0 0
c0t5000C5008473F89Bd0ONLINE   0 0 0
  raidz2-6   ONLINE   0 0 0
c0t5000C500846F19BFd0ONLINE   0 0 0
c0t5000C5008473D1ABd0ONLINE   0 0 0
c0t5000C50084739FD3d0ONLINE   0 0 0
c0t5000C5008473FFB7d0ONLINE   0 0 0
c0t5000C5008473E72Fd0ONLINE   0 0 0
c0t5000C50084

Re: [OmniOS-discuss] OmniOS DOS'd my entire network

2017-05-09 Thread Schweiss, Chip
Here's the screen shot:

Given the rarity of this, I wouldn't be surprised if it never happens again to
me.   The major difficulty was locating the offending system.   All we were
finding was very poor TCP connections everywhere on our network, even on VLANs
that were not active on the server but trunked on its switch ports.

That mac address corresponds to another OmniOS server, that is not part of
the same HA cluster as this one.   It has not been shut down since the
incident.

-Chip





On Tue, May 9, 2017 at 3:22 PM, Dan McDonald  wrote:

>
> > On May 9, 2017, at 3:32 PM, Schweiss, Chip  wrote:
> >
> > This was a first for me and extremely painful to locate.
> >
> > In the middle of the night between last Friday and Saturday, I started
> getting down alerts from most of my network.   It took 4 engineers
> including myself 9 hours to pinpoint the source of the problem.
> >
> > The problem turned out to be one of my OmniOS boxes sending out pure
> garbage constantly on layer 2 out the 10G network ports.   This disrupted
> ARP caches on every machine on every VLAN that was trunked on these ports,
> not just the VLANs that were configured on the server.   The switches
> reported every port healthy and without error.   The traffic on the bad
> port was not high either, just severely disruptive.
>
> Whoa!  On L2 (like non-TCP/IP ethernet frames)?
>
> > The affected OmniOS box appear to be healthy, as it was still serving
> the VM data stores for over 350 virtual machines.   However, it like every
> other service on the network appeared to be up and down repeatedly, but NFS
> kept on recovering gracefully.
> >
> > The only thing that finally identified this server was when one of us
> plug a monitor to the console and saw "WARNING: proxy ARP problem?"
> happening so fast that it took taking a cellphone picture of it a high
> frame rate to read it.   Powering off this server, cleared the problem for
> the entire network, and its pools were taken over by its HA sister.
>
> If it's easy to do so, unplug or "ifconfig down" the interface next time
> this happens.
>
> > Googling for that warning brings up nothing useful.
> >
> > Has anyone ever seen a problem like this?   How did you locate it?
>
> Should search src.illumos.org, you'll find this:
>
> http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/
> common/inet/ip/ip_arp.c#1449
>
> We appear to be freaking out over another node having our IP.  The only
> caller with AR_CN_BOGON is after ip_nce_resolve_all() returns AR_BOGON.
>
> I wonder if some other entity had the same IP, and they
> fed-back-upon-each-other negatively?
>
> The message you cite should show an IP address with it:
>
> "proxy ARP problem?  Node '%s' is using %s on %s",
>
> where the %s-es are MAC-address, IP-address, and interface-name
> respectively.  You didn't get examples with your digital camera, did you?
>
> Dan
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] OmniOS DOS'd my entire network

2017-05-09 Thread Schweiss, Chip
This was a first for me and extremely painful to locate.

In the middle of the night between last Friday and Saturday, I started
getting down alerts from most of my network.   It took 4 engineers
including myself 9 hours to pinpoint the source of the problem.

The problem turned out to be one of my OmniOS boxes sending out pure
garbage constantly on layer 2 out the 10G network ports.   This disrupted
ARP caches on every machine on every VLAN that was trunked on these ports,
not just the VLANs that were configured on the server.   The switches
reported every port healthy and without error.   The traffic on the bad
port was not high either, just severely disruptive.

The affected OmniOS box appeared to be healthy, as it was still serving the
VM data stores for over 350 virtual machines.   However, it, like every
other service on the network, appeared to be up and down repeatedly, but NFS
kept on recovering gracefully.

The only thing that finally identified this server was when one of us plugged
a monitor into the console and saw "WARNING: proxy ARP problem?"  scrolling
so fast that it took a cellphone picture at a high frame rate to
read it.   Powering off this server cleared the problem for the entire
network, and its pools were taken over by its HA sister.

Googling for that warning brings up nothing useful.

Has anyone ever seen a problem like this?   How did you locate it?

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] new supermicro server

2017-03-08 Thread Schweiss, Chip
On Wed, Mar 8, 2017 at 8:36 AM, Bob Friesenhahn <
bfrie...@simple.dallas.tx.us> wrote:

>
>
> Perhaps there is a way to tell the HBA BIOS to not advertize the SAS
> drives which are not needed for booting?


In the HBA BIOS configuration, set the HBA to disabled.  OmniOS will still
see the HBA and disks, but the BIOS will no longer try to list all the disks.
You only need it enabled if you boot from a disk attached to the HBA.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Fwd: NFS server unresponsive

2017-02-01 Thread Schweiss, Chip
-- Forwarded message --
From: Schweiss, Chip 
Date: Wed, Feb 1, 2017 at 8:58 PM
Subject: NFS server unresponsive
To: zfs 


I have a storage pool with 100+ NFS clients.   It is part of an HA cluster.
  No matter which host I put the pool on, NFS comes to a halt for all pools.
  Everything else is fine.  Just no NFS.

Most of the clients are NFSv4.  I've rebooted the system and the problem
repeats itself in just a few seconds of being responsive after the pool
come online.

I suspected NFS resources, and tried doubling most everything:

# sharectl get nfs
servers=1024
lockd_listen_backlog=512
lockd_servers=4096
lockd_retransmit_timeout=5
grace_period=90
server_versmin=3
server_versmax=4
client_versmin=3
client_versmax=4
server_delegation=on
nfsmapid_domain=
max_connections=-1
protocol=ALL
listen_backlog=64
device=
mountd_listen_backlog=64
mountd_max_threads=16

Didn't change anything.

Nothing in logs.   Pool is fine for everything but NFS.

nfssvrtop shows no I/O about 99% of the time, but periodically shows a
couple of small I/Os.


Possibly a client is knocking it over, but I cannot isolate the issue any
tighter than the entire pool.
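
One thing I plan to try next, in case it points anywhere, is counting which
server-side NFSv4 code paths are actually being hit while the hang is
happening. This is only a rough sketch (fbt probes, so function names may
vary by build):

dtrace -n 'fbt:nfssrv:rfs4_*:entry { @[probefunc] = count(); }'

Run it for a few seconds and Ctrl-C; if nothing shows up at all, the
requests presumably are not even reaching the NFSv4 server code.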

What else should I be looking at?

Any help greatly appreciated.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Corrupted file recovery and restoring pool to ONLINE

2017-01-23 Thread Schweiss, Chip
To get back to an online state you need to detach the offline disk:

zpool detach B-034 c10t5C0F0132772Ed0s0

If the corrupted file is in any snapshots, those snapshots will have to be
destroyed to stop it from being flagged as corruption during a scrub.
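
Roughly, and assuming the snapshot named in the error report is the only
one still holding the bad blocks, the sequence would be something like:

zfs destroy B-034@nov_5_2016
zpool clear B-034
zpool scrub B-034

The scrub at the end is just to confirm nothing else turns up.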

-Chip

On Mon, Jan 23, 2017 at 9:53 AM,  wrote:

>
>   Howdy!
>
>  I had a corrupted file during resilvering after a drive
> failure/replacment, I replaced the file from a backup and the
> pool started to resilver again. If finished and is still in a
> state
> that shows DEGRADED with file errors. I can read the file fine and md5sum
> checks out.
>
>   What do I need to do to put this pool into an ONLINE state and
> remove the error?
> Or is this pool still problematic??
>
> thanx - steve
>
>   pool: B-034
>  state: DEGRADED
> status: One or more devices has experienced an error resulting in data
> corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
> entire pool from backup.
>see: http://illumos.org/msg/ZFS-8000-8A
>   scan: resilvered 3.46T in 28h19m with 3 errors on Sun Jan 22 17:43:53
> 2017
> config:
>
> NAMESTATE READ WRITE CKSUM
> B-034   DEGRADED 0 0 3
>   raidz1-0  DEGRADED 0 0 6
> c0t5000C500571D5D9Fd0s0 ONLINE   0 0 0
> c0t5000C500571D69D3d0s0 ONLINE   0 0 0
> c10t5C0F01F82C82d0s0ONLINE   0 0 0
> c10t5C0F01F84B6Ad0s0ONLINE   0 0 0
> replacing-4 DEGRADED 0 0 0
>   c10t5C0F0132772Ed0s0  OFFLINE  0 0 0
>   c10t539578C8A83Ed0s0  ONLINE   0 0 0
> c10t5C0F0136989Ad0s0ONLINE   0 0 0
> c10t5C0F01327226d0s0ONLINE   0 0 0
> c10t5C0F01327316d0s0ONLINE   0 0 0
>
> errors: Permanent errors have been detected in the following files:
>
> B-034@nov_5_2016:/51/17/1000621751.bkt
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Network >10Gb/s

2017-01-10 Thread Schweiss, Chip
On Tue, Jan 10, 2017 at 9:58 AM, Dan McDonald  wrote:

>
> > On Jan 10, 2017, at 8:41 AM, Schweiss, Chip  wrote:
> >
> > It appears that my options for 40Gb/s Ethernet are Intel, Chelsio and
> SolarFlare.
> >
> > Can anyone comment on which of these is the most stable solution when
> running under OmniOS?   What's the fastest NFS throughput you've been able
> to achieve?
>
> The Intel i40e driver is nascent, but it will receive more attention as
> time passes.  Doug's point about SolarFlare is a good one.
>
>
I'm a bit concerned about the Intel because of posts like this:
https://news.ycombinator.com/item?id=11373848  and the fact that they seem
to have shifted their focus to Omni-Path, which from my understanding is
incompatible with the existing 40G gear.

SolarFlare seems promising, but I'd like to know of at least one success
story.

-Chip



> You may wish to ping the larger illumos community about this as well.
>


> Dan
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Network >10Gb/s

2017-01-10 Thread Schweiss, Chip
It appears that my options for 40Gb/s Ethernet are Intel, Chelsio and
SolarFlare.

Can anyone comment on which of these is the most stable solution when
running under OmniOS?   What's the fastest NFS throughput you've been able
to achieve?

Also is there any work being done by anyone to bring an Omni-Path
compatible NIC to Illumos/OmniOS?

Thanks!
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Multiple faulty SSD's ?

2016-07-26 Thread Schweiss, Chip
I don't have a lot of experience with the 850 Pro, but a lot with the 840
Pro under OmniOS.

With a 4K block size set in sd.conf and the drives sliced to use only 80% of
their capacity, a pool of 72 of them has been under near-constant heavy
read/write workload for over 3 years without a single checksum error.
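
The 4K override is an sd-config-list entry in sd.conf, along these lines.
Note the exact vendor/product string here is from memory and has to match
what the drive actually reports (vendor padded to 8 characters), so verify
it against the inquiry data before trusting this:

sd-config-list = "ATA     Samsung SSD 840 ", "physical-block-size:4096";

After that, 'update_drv -vf sd' or a reboot, and keep in mind it only
affects vdevs created afterwards, since ashift is fixed at vdev creation.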

-Chip

On Tue, Jul 26, 2016 at 1:30 PM, Piotr Jasiukajtis  wrote:

> I don’t know a root cause, but it’s better to have a workaround than a
> corrupted pools.
>
> --
> Piotr Jasiukajtis
>
> > On 26 Jul 2016, at 20:06, Dan McDonald  wrote:
> >
> > I wonder if those sd.conf changes should be upstreamed or not?
> >
> > Dan
> >
> > Sent from my iPhone (typos, autocorrect, and all)
> >
> >> On Jul 26, 2016, at 1:28 PM, Piotr Jasiukajtis  wrote:
> >>
> >> You may want to force the driver to use 4k instead of 512b for those
> drivers and create a new pool:
> >>
> >>
> https://github.com/joyent/smartos-live/commit/dd25937d2f9725def16f5e8dbb16a8bcbc2213d5
> >>
> >> --
> >> Piotr Jasiukajtis
> >>
> >>> On 26 Jul 2016, at 02:24, Shaun McGuane  wrote:
> >>>
> >>> Hi List,
> >>>
> >>> I want to report very strange SSD behaviour on a new pool I setup.
> >>>
> >>> The hardware is a HP DL180 G6 Server with the LSI 9207-8i Card
> >>> And 8x 1TB Samsung SSD Pro drives. Running omnios-10b9c79
> >>>
> >>> All the drives are brand spanking new setup in a raidz2 array.
> >>>
> >>> Within 2 months the below has happened and there has been very
> >>> Little use on this array.
> >>>
> >>> pool: SSD-TANK
> >>> state: DEGRADED
> >>> status: One or more devices are faulted in response to persistent
> errors.
> >>>   Sufficient replicas exist for the pool to continue functioning
> in a
> >>>   degraded state.
> >>> action: Replace the faulted device, or use 'zpool clear' to mark the
> device
> >>>   repaired.
> >>> scan: scrub repaired 23K in 1h12m with 0 errors on Mon Jul 25 20:13:04
> 2016
> >>> config:
> >>>
> >>>   NAME   STATE READ WRITE CKSUM
> >>>   SSD-TANK   DEGRADED 16735
> >>> raidz2-0 DEGRADED 472   113
> >>>   c5t500253884014D0D3d0  ONLINE   0 0 2
> >>>   c5t50025388401F767Ad0  DEGRADED 0 019  too many
> errors
> >>>   c5t50025388401F767Bd0  FAULTED  0 0 0  too many
> errors
> >>>   c5t50025388401F767Dd0  ONLINE   0 0 0
> >>>   c5t50025388401F767Fd0  ONLINE   0 0 1
> >>>   c5t50025388401F7679d0  ONLINE   0 0 2
> >>>   c5t50025388401F7680d0  REMOVED  0 0 0
> >>>   c5t50025388401F7682d0  ONLINE   0 0 1
> >>>
> >>> Can anyone suggest why I would have this problem where I am seeing
> CKSUM errors
> >>> On most disks and while only one has faulted others have been degraded
> or removed.
> >>>
> >>> Thanks
> >>> Shaun
> >>> ___
> >>> OmniOS-discuss mailing list
> >>> OmniOS-discuss@lists.omniti.com
> >>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> >>
> >> ___
> >> OmniOS-discuss mailing list
> >> OmniOS-discuss@lists.omniti.com
> >> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] LSI3108

2016-05-23 Thread Schweiss, Chip
On Mon, May 23, 2016 at 3:18 PM, Dan McDonald  wrote:

>
> > On May 23, 2016, at 4:10 PM, Fábio Rabelo 
> wrote:
> >
> > Hi to all ...
> >
> > You may need to flash the IT firmware :
> >
> >
> http://www.avagotech.com/products/server-storage/host-bus-adapters/sas-9305-24i#downloads
>
> And if you do, make sure it's version 19 or lower.  The IT firmware > 19
> is known to be flaky.  Check the illumos list archives for details on why.
>

That's only on the 6G HBA.   On the 12G HBA, version 12 is the newest.   I
haven't heard of any preferred versions on these.  I've been running v10 for
several months now without issue.

-Chip


>
> Dan
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Supermicro X9DR3-F PCI Bus reported fault.

2016-04-27 Thread Schweiss, Chip
I've run many Supermicro servers in the X9 and X10 series.   It sounds like
you have a bad board, or at the minimum a bad slot.   If it's under
warranty, get it exchanged.

-Chip

On Tue, Apr 26, 2016 at 9:47 PM, Shaun McGuane 
wrote:

> Hi OmniOS list,
>
> I am wondering if anyone has had any experience with the Supermicro boards
> for OmniOS in particular Model X9DR3-F
> I have it setup with 2x Intel E5-2670 processors (so I can use all the
> pci-e slots) and 256GB DDR3 ECC Ram as a base.
>
> I have then tried to run this with LSI 9207-8i Cards x3 for the complete
> setup, started off with none to get a base OmniOS
> Install on the server to ensure all is working OK before adding cards and
> drives.
>
> The OmniOS version I am running is r151014 – I have also tried the latest
> current build from 2016 and I get the same result
>
> I am getting the following error when performing : fmadm faulty
>
> Fault class: fault.io.pciex.device-interr
> Affects: dev:pci@78,0/pci8086,3c08@3/pci8086,a21f@0 faulted and taken
> out of service
> FRU: “CPU2_SLOT6”
>
> This slot being reported is the slot closest to the cpu.
>
> The problem that I have is that I have 2 of these boards are showing the
> same error and I have tested these boards running
> Ubuntu 14.04 and Windows and I do not have any errors or issues using this
> slot. I am new to using super micro boards for
> my ZFS arrays and are used to using HP Servers (DL180 G6, etc)
>
> I don’t necessarily need to use this slot, but I am seeing strange issues
> with removing and re-inserting drives where drives
> Show up when running "iostat –En” but not when I run format to label them.
>
> I thought the 2 issues maybe connected.
>
> Kind Regards
> Shaun McGuane
>
>
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Routing challlenges

2016-04-07 Thread Schweiss, Chip
On Thu, Apr 7, 2016 at 12:51 PM, Michael Talbott  wrote:

> Oh, I see. Sorry about that, reading it on my phone didn't render your
> diagram properly ;)
>
> The reason this is happening is because the omnios box has knowledge of
> both subnets in its routing table and it always takes the shortest path to
> reach an ip destination.
>

That's definitely the reason, but not correct when stateful firewalls are
involved.

>
> So you will need to put the "clients" in a unique subnet that always
> passes through the firewall in both directions (in a subnet that's not
> shared by the omnios machines). Any attempt to add/modify a static route to
> the omnios box to resolve this will likely fail (it'll just move the
> problem from one network to the other one and cause your "services" network
> to route improperly).
>

The problem is each person who manages these (there are 4) is also a client
of the services (SMB, NFS).

For management, going through the firewall is fine since it is low volume, but
the services need to be on the same VLAN, or else the 1Gb firewall will
choke on the 10Gb services.


> Either that, or remove the firewall as a hop, set sshd to listen only on
> the management IP, and add a management vlan interface to the clients
> allowed to connect.
>
>
I've considered this too, but I have some floating IPs attached to ZFS pools
in an HA cluster that SSH needs to bind to.

Unless I can get the network stack on the management vlan to act
independently of the other interfaces, it may come to modifying the
sshd_config and restarting ssh each time a pool is started or stopped on a
host.
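
Something like this is what I have in mind for the pool start hook, purely
illustrative (the address and FMRI are just examples):

echo "ListenAddress 10.28.0.44" >> /etc/ssh/sshd_config
svcadm restart svc:/network/ssh:default

with the reverse (strip the line, restart again) when the pool moves away.
Clunky, which is why I'd rather avoid it.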

-Chip



> Michael
>
>
> On Apr 7, 2016, at 10:25 AM, Michael Talbott  wrote:
>
> It sounds like you're using the same subnet for management and service
> traffic, that would be the problem causing the split route. Give each vlan
> a unique subnet and traffic should flow correctly.
>
> Michael
> Sent from my iPhone
>
> On Apr 7, 2016, at 8:52 AM, Schweiss, Chip  wrote:
>
> On several of my OmniOS hosts I have set up a management interface for
> SSH access on an independent VLAN.   There are service vlans attached to
> other nics.
>
> The problem I am having is that when on privileged machine on one of the
> vlans also on the service side that has access to the management SSH port
> the TCP SYN comes in the management VLAN but the SYNACK goes out the
> service VLAN instead of routing back out its connecting port.   This causes
> a split route and the firewall blocks the connection because the connection
> never appears complete.
>
> Traffic is flowing like this:
> client   firewall omnnios
> 10.28.0.106 ->   10.28.0.254->10.28.125.254  -> 10.28.125.44
>
> 10.28.0.106  <- 10.28.0.44
>
> How can I cause connections to only communicate on the vlan that the
> connection is initiated from?
>
> I don't want to use the 10.28.0.44 interface because that is a virtual IP
> and will not always be on the same host.
>
> -Chip
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Routing challlenges

2016-04-07 Thread Schweiss, Chip
On several of my OmniOS hosts I have set up a management interface for SSH
access on an independent VLAN.   There are service VLANs attached to other
NICs.

The problem I am having is that when a privileged machine on one of the
service-side VLANs that has access to the management SSH port connects,
the TCP SYN comes in on the management VLAN but the SYN-ACK goes out the
service VLAN instead of routing back out the port it came in on.   This causes
a split route, and the firewall blocks the connection because the connection
never appears complete.

Traffic is flowing like this:
client   firewall omnnios
10.28.0.106 ->   10.28.0.254->10.28.125.254  -> 10.28.125.44

10.28.0.106  <- 10.28.0.44

How can I cause connections to only communicate on the vlan that the
connection is initiated from?

I don't want to use the 10.28.0.44 interface because that is a virtual IP
and will not always be on the same host.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] sshd logging

2016-03-31 Thread Schweiss, Chip
I'm trying to get sshd logging to work on OmniOS with OpenSSH installed.
Nothing I try seems to produce any logging.

In sshd_config I have:

# Syslog facility and level
SyslogFacility AUTH
LogLevel VERBOSE

In /etc/syslog.conf:

*.err;kern.notice;auth.notice   /dev/sysmsg
*.err;kern.debug;daemon.notice;mail.crit/var/adm/messages
*.alert;kern.err;daemon.err operator
*.alert root
*.emerg *

# if a non-loghost machine chooses to have authentication messages
# sent to the loghost machine, un-comment out the following line:
auth.notice ifdef(`LOGHOST', /var/log/authlog, @loghost)
mail.debug  ifdef(`LOGHOST', /var/log/syslog, @loghost)

#
# non-loghost machines will use the following lines to cause "user"
# log messages to be logged locally.
#
ifdef(`LOGHOST', ,
user.err/dev/sysmsg
user.err/var/adm/messages
user.alert  `root, operator'
user.emerg  *
)

I've tried many combinations, in both sshd_config and syslog.conf.
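
One sanity check I can do, assuming the stock syslogd, is to take sshd out
of the picture and inject an auth message by hand:

svcadm refresh system-log
logger -p auth.notice "sshd syslog test"
grep "sshd syslog test" /var/log/authlog /var/adm/messages

If that shows up, the syslog routing is fine and the problem is on the
OpenSSH side.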

Can someone clue me in on the magic formula?

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-19 Thread Schweiss, Chip
On Thu, Feb 18, 2016 at 3:56 PM, Richard Elling <
richard.ell...@richardelling.com> wrote:

>
>
> Related to lock manager is name lookup. If you use name services, you add
> a latency
> dependency to failover for name lookups, which is why we often disable DNS
> or other
> network name services on high-availability services as a best practice.
>  -- richard
>
>
Interesting approach.  Something I will definitely test in our environment.
  The biggest challenge I see is that I run Samba on a couple of hosts that
need DNS.   Hopefully I can find a workaround for it.

It would be nice if DNS could be disabled just for NFS.

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Question on puppet agent for OmniOS

2016-02-18 Thread Schweiss, Chip
The version in OpenCSW is quite old.  If you're starting fresh with Puppet,
start with the newest version possible.   Upgrades can be very painful.

I run Puppet 4.3.2 installed from a gem.   I use Ruby from OpenCSW to make
this possible.

It's still in early development, but is working very well.   I've been
running Puppet for 5 years on Linux, just recently on OmniOS.

Soon the provision script will be on our public repo.  If anyone would like to
look at it now, email me and I'll send you a copy.

I have to update ruby gems before installing puppet.  The important
snippets from my provision script:

RUBYGEMS_VERSION="2.0.15"
RUBYGEMS_UPDATE="rubygems-update-${RUBYGEMS_VERSION}.gem"
RUBYGEMS_UPDATE_SOURCE="https://github.com/rubygems/rubygems/releases/download/v${RUBYGEMS_VERSION}"

current_gems_version=`/opt/csw/bin/gem --version`
if [ "$current_gems_version" != "${RUBYGEMS_VERSION}" ]; then
    wget ${RUBYGEMS_UPDATE_SOURCE}/${RUBYGEMS_UPDATE}
    /opt/csw/bin/gem install --local ${RUBYGEMS_UPDATE} && rm ${RUBYGEMS_UPDATE}
fi

/opt/csw/bin/gem install --no-rdoc --no-ri puppet

-Chip


On Wed, Feb 17, 2016 at 6:01 PM, Trey Palmer  wrote:

> I should add, I'd use the niksula package over CSW if you can.   I really
> appreciate their repo (and OmniOS).
>
> We only use the CSW package because at the time the available version
> happened to line up with what we were running everywhere else.   As a
> general rule, the agents shouldn't be an earlier version than your masters.
>
>
> Now niksula has 3.8.5 which I might actually be able to change to.   Using
> CSW packages intended for a diverging closed source Solaris is obviously
> gonna bite me sooner or later.
>
> As far as the manifest/method, if you run the daemon you can have puppet
> install them on the first run, and puppet's standard service resource has a
> "manifest" parameter that will "svccfg import" for you.
>
> But Lauri is right that there's no real reason to run the daemon vice from
> cron except to standardize with the rest of your org.
>
>-- Trey
>
>
>
> On Wed, Feb 17, 2016 at 4:59 PM, Lauri Tirkkonen  wrote:
>
>> On Wed, Feb 17 2016 15:52:00 -0600, Paul Jochum wrote:
>> > Thanks for responding.  I am new to puppet, and curious, why do you
>> run
>> > it from cron, instead of in daemon mode?  Is it more secure, or is there
>> > something else I am missing?
>>
>> It used to be that the agent was leaking memory in version 2.something
>> when we first started using it. 'puppet kick' also existed then, to
>> trigger an agent run from the master, but it doesn't anymore; we don't
>> think there's any reason to run the agent as a daemon.
>>
>> --
>> Lauri Tirkkonen | lotheac @ IRCnet
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

2016-02-18 Thread Schweiss, Chip
On Thu, Feb 18, 2016 at 5:14 AM, Michael Rasmussen  wrote:

> On Thu, 18 Feb 2016 07:13:36 +0100
> Stephan Budach  wrote:
>
> >
> > So, when I issue a simple ls -l on the folder of the vdisks, while the
> switchover is happening, the command somtimes comcludes in 18 to 20
> seconds, but sometime ls will just sit there for minutes.
> >
> This is a known limitation in NFS. NFS was never intended to be
> clustered so what you experience is the NFS process on the client side
> keeps kernel locks for the now unavailable NFS server and any request
> to the process hangs waiting for these locks to be resolved. This can
> be compared to a situation where you hot-swap a drive in the pool
> without notifying the pool.
>
> Only way to resolve this is to forcefully kill all NFS client processes
> and the restart the NFS client.
>
>
I've been running RSF-1 on OmniOS since about r151008.  All my clients have
always been NFSv3 and NFSv4.

My memory is a bit fuzzy, but when I first started testing RSF-1, OmniOS
still had the Sun lock manager, which was later replaced with the BSD lock
manager.   This has had many difficulties.

I do remember that failovers when I first started with RSF-1 never had
these stalls; I believe this was because the lock state was stored in the
pool and the server taking over the pool would inherit that state too.
That state is now lost when a pool is imported with the BSD lock manager.

When I did testing I would do both full-speed reading and writing to the
pool and force failovers, both by command line and by killing power on the
active server.   Never did I have a failover take more than about 30
seconds for NFS to fully resume data flow.

Others who know more about the BSD lock manager vs the old Sun lock manager
may be able to tell us more.  I'd also be curious if Nexenta has addressed
this.

-Chip


> --
> Hilsen/Regards
> Michael Rasmussen
>
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --
> /usr/games/fortune -es says:
> The founding fathers tried to set up a judicial system where the accused
> received a fair trial, not a system to insure an acquittal on
> technicalities.
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Updating to r15016

2016-02-10 Thread Schweiss, Chip
On Wed, Feb 10, 2016 at 1:11 PM, Dan McDonald  wrote:

>
> > On Feb 10, 2016, at 11:26 AM, Schweiss, Chip  wrote:
> >
> > /usr/bin/pkg update --be-name=omnios-r151016 entire@11,5.11-0.151016
>
> Lose the "5."...
>
> r151016(~)[0]% pkg list -v entire
> FMRI
>IFO
> pkg://omnios/entire@11-0.151016:20151202T161203Z
>i--
> r151016(~)[0]%
>
> Do we need to update a wiki page about that?
>

Possibly.   I may be the odd user who doesn't understand what 'entire' in
the update command does, or how to correct it when my syntax is wrong.




>
> Also, you could just specify "entire" if the publisher's set right.
>
> Dan
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Updating to r15016

2016-02-10 Thread Schweiss, Chip
On Wed, Feb 10, 2016 at 11:19 AM, Dale Ghent  wrote:

>
> > On Feb 10, 2016, at 11:26 AM, Schweiss, Chip  wrote:
> >
> > I'm updating one of my systems to r151016.   When I use:
> >
> > /usr/bin/pkg update --be-name=omnios-r151016 entire@11,5.11-0.151016
> >
> > I get:
> > pkg update: 'entire@11,5.11-0.151016' matches no installed packages
> >
> > I'm ignorant of what the entire@ portion does as I've been script
> kidding my way through upgrades.   Can someone explain what this is
> supposed to be?
>
> Did you change your omnios repo to the one for r151016? Different versions
> of OmniOS reside in their own repos now, so first you must switch the
> omnios publisher, then you can just run 'pkg upgrade'
>

Yes.  That was my first step.

Been through this many times on OmniOS, but this time the entire@ seems to
be causing a problem and I'm not clear why.

-Chip

>
> pkg set-publisher -G http://pkg.omniti.com/omnios/r151014/ -g
> http://pkg.omniti.com/omnios/r151016/ omnios
> pkg update -v
>
> /dale
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Updating to r15016

2016-02-10 Thread Schweiss, Chip
I'm updating one of my systems to r151016.   When I use:

/usr/bin/pkg update --be-name=omnios-r151016 entire@11,5.11-0.151016

I get:
pkg update: 'entire@11,5.11-0.151016' matches no installed packages

I'm ignorant of what the entire@ portion does as I've been script kidding
my way through upgrades.   Can someone explain what this is supposed to be?

Thanks!
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zlib/zlib-devel packages

2016-01-20 Thread Schweiss, Chip
I'll definitely be trying your build config.

Is that joined to an Active Directory domain?   If so, I'm confused about
how that works with the '--without-ad-dc' flag.

Thanks!
-Chip



On Wed, Jan 20, 2016 at 2:29 PM, Michael Talbott  wrote:

> I have samba 4.2.3 working with all the bells and whistles (including
> winbind). Also I have netatalk working alongside it ;)
>
> Hope this helps:
>
> Here's what I installed as prereqs before compiling them:
>
> pkg install \
>   library/security/openssl \
>   naming/ldap \
>   system/library/iconv/unicode \
>   system/library/dbus \
>   system/library/libdbus \
>   system/library/libdbus-glib \
>   developer/gnu-binutils \
>   developer/build/libtool \
>   developer/build/autoconf \
>   system/library/math/header-math \
>   /system/library/dbus \
>   /system/library/libdbus-glib \
>   /omniti/database/bdb \
>   /text/gnu-gettext \
>   /service/network/dns/mdns \
>   /developer/build/gnu-make \
>   /developer/build/automake \
>   /developer/build/libtool \
>   /developer/macro/gnu-m4 \
>   /developer/build/gnu-make \
>   /developer/gnu-binutils \
>   developer/build/autoconf \
>   developer/build/automake \
>   developer/lexer/flex \
>   developer/parser/bison \
>   developer/object-file \
>   developer/linker \
>   developer/library/lint \
>   developer/build/gnu-make \
>   library/idnkit \
>   library/idnkit/header-idnkit \
>   system/header \
>   system/library/math/header-math \
>   gcc44 \
>   gcc48
>
> pkg install /omniti/perl/dbd-mysql \
> /omniti/database/mysql-55/library
>
> pkg install libgcrypt
>
> And, this is what I use for building samba (I force 32 bit so winbind
> plays nicely with other 32 bit only tools like the "id" command):
>
> export ISALIST=i386
> CFLAGS=-m32 CXXFLAGS=-m32 CPPFLAGS=-m32 LDFLAGS=-m32 \
> ./configure \
>   --prefix=/usr/local \
>   --bindir=/usr/local/bin \
>   --sbindir=/usr/local/sbin \
>   --libdir=/usr/local/lib/ \
>   --mandir=/usr/local/man \
>   --infodir=/usr/local/info \
>   --sysconfdir=/etc/samba \
>   --with-configdir=/etc/samba \
>   --with-privatedir=/etc/samba/private \
>   --localstatedir=/var \
>   --sharedstatedir=/var \
>   --bundled-libraries=ALL \
>   --with-winbind \
>   --with-ads \
>   --with-ldap \
>   --with-pam \
>   --with-iconv \
>   --with-acl-support \
>   --with-syslog \
>   --with-aio-support \
>   --enable-fhs \
>   --without-ad-dc \
>
> --with-shared-modules=idmap_ad,vfs_zfsacl,vfs_audit,vfs_catia,vfs_full_audit,vfs_readahead,vfs_streams_xattr,time_audit,vfs_fruit
> \
>   --enable-gnutls
>
> gmake
> gmake install
>
> Good luck!
>
> Michael
>
>
> On Jan 20, 2016, at 11:48 AM, Schweiss, Chip  wrote:
>
> I ended up downloading and building zlib separately and got it to build.
>
> The problem only occurs when selecting --with-ads during configure.  It
> fails on checking gnutls, which needs zlib-devel.
>
> My build will not join the domain, but that's out of the scope of this
> list..
>
> Thanks!
> -Chip
>
>
>
> On Wed, Jan 20, 2016 at 1:02 PM, Peter Tribble 
> wrote:
>
>> On Wed, Jan 20, 2016 at 6:40 PM, Schweiss, Chip 
>> wrote:
>>
>>> Is anyone aware of zlib and zlib-devel packages available anywhere for
>>> OmniOS?
>>>
>>
>> Installed in OmniOS by default, and cannot be uninstalled.
>>
>>
>>> These are needed for building any Samba version 4.2.0 or greater.
>>>
>>
>> We build samba (4.3.x) on OmniOS without any issues. We haven't done
>> anything
>> beyond install the basic build tools. What sort of error are you getting?
>>
>> --
>> -Peter Tribble
>> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zlib/zlib-devel packages

2016-01-20 Thread Schweiss, Chip
On Wed, Jan 20, 2016 at 2:03 PM, Dan McDonald  wrote:

> Probably not useful now for you on LTS, but Nexenta's SMB2 is available
> for r151016 and later.
>

My biggest challenge is I have to support multiple domains on one server.
That's forcing me to build from source because several paths get compiled
in and break things.

-Chip

>
> Dan
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] zlib/zlib-devel packages

2016-01-20 Thread Schweiss, Chip
I ended up downloading and building zlib separately and got it to build.

The problem only occurs when selecting --with-ads during configure.  It
fails on checking gnutls, which needs zlib-devel.

My build will not join the domain, but that's out of scope for this
list.

Thanks!
-Chip



On Wed, Jan 20, 2016 at 1:02 PM, Peter Tribble 
wrote:

> On Wed, Jan 20, 2016 at 6:40 PM, Schweiss, Chip 
> wrote:
>
>> Is anyone aware of zlib and zlib-devel packages available anywhere for
>> OmniOS?
>>
>
> Installed in OmniOS by default, and cannot be uninstalled.
>
>
>> These are needed for building any Samba version 4.2.0 or greater.
>>
>
> We build samba (4.3.x) on OmniOS without any issues. We haven't done
> anything
> beyond install the basic build tools. What sort of error are you getting?
>
> --
> -Peter Tribble
> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] zlib/zlib-devel packages

2016-01-20 Thread Schweiss, Chip
Is anyone aware of zlib and zlib-devel packages available anywhere for
OmniOS?

These are needed for building any Samba version 4.2.0 or greater.

Thanks!
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] NFS Server restart

2015-12-08 Thread Schweiss, Chip
I had an NFS server become unresponsive on one of my production systems.
The NFS server service would not restart; out of desperation I rebooted,
which fixed the problem.

Before rebooting I tried restarting all the NFS-related services, to no
avail.  The reboot probably wasn't necessary, but knowing the correct list
and order of services to restart is.

Can someone fill me in on which services in what order should be
stopped/started to get NFS fully reset?

Thanks!
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Updates for OmniOS r151014 & r151016

2015-11-13 Thread Schweiss, Chip
On Fri, Nov 13, 2015 at 2:13 PM, Dan McDonald  wrote:

>
> 014:
> --
>
> - OpenSSH 7.1p1, including the r151016 method(s) of changing between
> SunSSH and OpenSSH
>
>
Thank you for this!!

-Chip


>
> Happy updating!
> Dan
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow performance with ZeusRAM?

2015-10-22 Thread Schweiss, Chip
The ZIL on log devices suffers a bit from not filling queues well.   In
order to get the queues to fill more, try running your test against several
ZFS datasets on the pool simultaneously and measure your total I/O.

As I understand it, if you're writing to only one ZFS dataset, your queue
depth will stay at 1 on the log device and you become latency-bound.
-Chip

On Thu, Oct 22, 2015 at 2:02 PM, Matej Zerovnik  wrote:

> Hello,
>
> I'm building a new system and I'm having a bit of a performance problem.
> Well, its either that or I'm not getting the whole ZIL idea:)
>
> My system is following:
> - IBM xServer 3550 M4 server (dual CPU with 160GB memory)
> - LSI 9207 HBA (P19 firmware)
> - Supermicro JBOD with SAS expander
> - 4TB SAS3 drives
> - ZeusRAM for ZIL
> - LTS Omnios (all patches applied)
>
> If I benchmark ZeusRAM on its own with random 4k sync writes, I can get
> 48k IOPS out of it, no problem there.
>
> If I create a new raidz2 pool with 10 hard drives, mirrored ZeusRAMs for
> ZIL and set sync=always, I can only squeeze 14k IOPS out of the system.
> Is that normal or should I be getting 48k IOPS on the 2nd pool as well,
> since this is the performance ZeusRAM can deliver?
>
> I'm testing with fio:
> fio --filename=/pool0/test01 --size=5g --rw=randwrite --refill_buffers
> --norandommap --randrepeat=0 --ioengine=solarisaio --bs=4k --iodepth=16
> --numjobs=16 --runtime=60 --group_reporting --name=4ktest
>
> thanks, Matej
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] ZIL TXG commits happen very frequently - why?

2015-10-14 Thread Schweiss, Chip
It all has to do with the write throttle and buffers filling.   Here are a
couple of great blog posts on how it works and how it's tuned:

http://dtrace.org/blogs/ahl/2014/02/10/the-openzfs-write-throttle/

http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/
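
If I remember right, the early commits are usually the dirty-data sync
threshold being hit (zfs_dirty_data_sync, 64MB by default) rather than the
5-second timer.  The knobs those posts discuss live in /etc/system; the
values below are purely illustrative, not recommendations:

set zfs:zfs_txg_timeout = 5
set zfs:zfs_dirty_data_max = 0x100000000
set zfs:zfs_dirty_data_sync = 0x10000000
set zfs:zfs_delay_min_dirty_percent = 60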

-Chip


On Wed, Oct 14, 2015 at 12:45 AM, Rune Tipsmark  wrote:

> Hi all.
>
>
>
> Wondering if anyone could shed some light on why my ZFS pool would perform
> TXG commits up to 5 times per second. It’s set to the default 5 second
> interval and occasionally it does wait 5 seconds between commits, but only
> when nearly idle.
>
>
>
> I’m not sure if this impacts my performance but I would suspect it doesn’t
> improve it. I force sync on all data.
>
>
>
> I got 11 mirrors (7200rpm sas disks) two SLOG devices and two L2 ARC
> devices and a pair of spare disks.
>
>
>
> Each log device can hold 150GB of data so plenty for 2 TXG commits. The
> system has 384GB memory.
>
>
>
>
> Below is a bit of output from zilstat during a near idle time this morning
> so you wont see 4-5 commits per second, but during load later today it will
> happen..
>
>
>
> root@zfs10:/tmp# ./zilstat.ksh -M -t -p pool01 txg
>
> waiting for txg commit...
>
> TIMEtxg   N-MB N-MB/s N-Max-Rate
> B-MB B-MB/s B-Max-Rateops  <=4kB 4-32kB >=32kB
>
> 2015 Oct 14 06:21:19   10872771  3  3  0
> 21 21  2234 14 19201
>
> 2015 Oct 14 06:21:22   10872772 10  3  3
>  70 23 24806  0 84725
>
> 2015 Oct 14 06:21:24   10872773 12  6  5
> 56 28 26682 17107558
>
> 2015 Oct 14 06:21:25   10872774 13 13  2
>  75 75 14651  0 10641
>
> 2015 Oct 14 06:21:25   10872775  0  0  0
> 0  0  0  1  0  0  1
>
> 2015 Oct 14 06:21:26   10872776 11 11  6
> 53 53 29645  2136507
>
> 2015 Oct 14 06:21:30   10872777 11  2  4
> 81 20 32873 11 60804
>
> 2015 Oct 14 06:21:30   10872778  0  0  0
> 0  0  0  1  0  1  0
>
> 2015 Oct 14 06:21:31   10872779 12 12 11
> 56 56 52631  0  8623
>
> 2015 Oct 14 06:21:33   10872780 11  5  4
> 74 37 27858  0 44814
>
> 2015 Oct 14 06:21:36   10872781 14  4  6
> 79 26 30977 12 82883
>
> 2015 Oct 14 06:21:39   10872782 11  3  4
> 78 26 25957 18 55884
>
> 2015 Oct 14 06:21:43   10872783 13  3  4
> 80 20 24930  0135795
>
> 2015 Oct 14 06:21:46   10872784 13  4  4
> 81 27 29965 13 95857
>
> 2015 Oct 14 06:21:49   10872785 11  3  6
> 80 26 41   1077 12215850
>
> 2015 Oct 14 06:21:53   10872786  9  3  2
> 67 22 18870  1 74796
>
> 2015 Oct 14 06:21:56   10872787 12  3  5
> 72 18 26909 17163729
>
> 2015 Oct 14 06:21:58   10872788 12  6  3
> 53 26 21530  0 33497
>
> 2015 Oct 14 06:21:59   10872789 26 26 24
> 72 72 62882 12 60810
>
> 2015 Oct 14 06:22:02   10872790  9  3  5
> 57 19 28777  0 70708
>
> 2015 Oct 14 06:22:07   10872791 11  2  3
> 96 24 22   1044 12 46986
>
> 2015 Oct 14 06:22:10   10872792 13  3  4
> 78 19 22911 12 38862
>
> 2015 Oct 14 06:22:14   10872793 11  2  4
> 79 19 26930 10 94826
>
> 2015 Oct 14 06:22:17   10872794 11  3  5
> 73 24 26   1054 17151886
>
> 2015 Oct 14 06:22:17   10872795  0  0  0
> 0  0  0  2  0  0  2
>
> 2015 Oct 14 06:22:18   10872796 40 40 38
> 78 78 60707  0 28680
>
> 2015 Oct 14 06:22:22   10872797 10  3  3
> 66 22 21937 14164759
>
> 2015 Oct 14 06:22:25   10872798  9  2  2
> 66 16 21821 11 92718
>
> 2015 Oct 14 06:22:28   10872799 24 12 14
> 80 40 43750  0 23727
>
> 2015 Oct 14 06:22:28   10872800  0  0  0
> 0  0  

Re: [OmniOS-discuss] big zfs storage?

2015-10-07 Thread Schweiss, Chip
I completely concur with Richard on this.  Let me give a real example
that emphasizes this point, as it's a critical design decision.

I never fully understood this until I saw in action the problem automated
hot spares can cause.   I had all 5 hot spares get put into action
on one raidz2 vdev of a 300TB pool.  This was triggered by an HA event that
was taking SCSI reservations in a split-brain situation that was supposed
to trigger a panic on one system.  This caused a highly corrupted pool.
Fortunately this was not a production pool and I simply trashed it and
started reloading data.

Now I only run one hot spare per pool.  Most of my pools are raidz2 or
raidz3.   This way any event like this can not take out more than one disk
and data parity will never be lost.

There are other causes that can trigger multiple disk replacements. I have
not encountered them.  If I do, they won't hurt my data with the limit of
one hot spare.

-Chip




On Wed, Oct 7, 2015 at 5:38 PM, Richard Elling <
richard.ell...@richardelling.com> wrote:

>
> > On Oct 7, 2015, at 1:59 PM, Mick Burns  wrote:
> >
> > So... how does Nexenta copes with hot spares and all kinds of disk
> failures ?
> > Adding hot spares is part of their administration manuals so can we
> > assume things are almost always handled smoothly ?  I'd like to hear
> > from tangible experiences in production.
>
> I do not speak for Nexenta.
>
> Hot spares are a bigger issue when you have single parity protection.
> With double parity and large pools, warm spares is a better approach.
> The reasons are:
>
> 1. Hot spares exist solely to eliminate the time between disk failure and
> human
>intervention for corrective action. There is no other reason to have
> hot spares.
>The exposure for a single disk failure under single parity protection
> is too risky
>for most folks, but with double parity (eg raidz2 or RAID-6) the few
> hours you
>save has little impact on overall data availabilty vs warm spares.
>
> 2. Under some transient failure conditions (eg isolated power failure, IOM
> reboot, or fabric
>partition), all available hot spares can be kicked into action. This
> can leave you with a
>big mess for large pools with many drives and spares. You can avoid
> this by making a
>human be involved in the decision process, rather than just *locally
> isolated,* automated
>decision making.
>
>  -- richard
>
> >
> >
> > thanks
> >
> > On Mon, Jul 13, 2015 at 7:58 AM, Schweiss, Chip 
> wrote:
> >> Liam,
> >>
> >> This report is encouraging.  Please share some details of your
> >> configuration.   What disk failure parameters are have you set?   Which
> >> JBODs and disks are you running?
> >>
> >> I have mostly DataON JBODs and a some Supermicro.   DataON has PMC SAS
> >> expanders and Supermicro has LSI, both setups have pretty much the same
> >> behavior with disk failures.   All my servers are Supermicro with LSI
> HBAs.
> >>
> >> If there's a magic combination of hardware and OS config out there that
> >> solves the disk failure panic problem, I will certainly change my builds
> >> going forward.
> >>
> >> -Chip
> >>
> >> On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser 
> wrote:
> >>>
> >>> I have two 800T ZFS systems on OmniOS and a bunch of smaller <50T
> systems.
> >>> Things generally work very well.  We loose a disk here and there but
> its
> >>> never resulted in downtime.  They're all on Dell hardware with LSI or
> Dell
> >>> PERC controllers.
> >>>
> >>> Putting in smaller disk failure parameters, so disks fail quicker, was
> a
> >>> big help when something does go wrong with a disk.
> >>>
> >>> thanks,
> >>> liam
> >>>
> >>>
> >>> On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip 
> >>> wrote:
> >>>>
> >>>> Unfortunately for the past couple years panics on disk failure has
> been
> >>>> the norm.   All my production systems are HA with RSF-1, so at least
> things
> >>>> come back online relatively quick.  There are quite a few open
> tickets in
> >>>> the Illumos bug tracker related to mpt_sas related panics.
> >>>>
> >>>> Most of the work to fix these problems has been committed in the past
> >>>> year, though problems still exist.  For example, my systems are dual
> path
> >>>> SAS, however, mpt_sas will panic if you pull a cable instead of

Re: [OmniOS-discuss] zfs send/receive corruption?

2015-10-05 Thread Schweiss, Chip
This smells like a problem reported as fixed on FreeBSD and ZoL:
http://permalink.gmane.org/gmane.comp.file-systems.openzfs.devel/1545

On the Illumos ZFS list the question was posed whether those fixes have
been incorporated, but it went unanswered:
http://www.listbox.com/member/archive/182191/2015/09/sort/time_rev/page/1/entry/23:71/20150916025648:1487D326-5C40-11E5-A45A-20B0EF10038B/

I'd be curious to confirm whether this has been fixed in Illumos or not, as
I now have systems with lots of CIFS and ACLs that are potentially
vulnerable to the same sort of problem.  Thus far I cannot find a reference
to it, but I could be looking in the wrong place, or for the wrong keywords.

-Chip

On Mon, Oct 5, 2015 at 12:45 PM, Michael Rasmussen  wrote:

> On Mon, 5 Oct 2015 11:30:04 -0600
> Aaron Curry  wrote:
>
> > # zfs get sync pool/fs
> > NAMEPROPERTY  VALUE SOURCE
> > pool/fs  sync  standard  default
> >
> > Is that what you mean?
> >
> Yes. Default means honor sync requests.
>
> --
> Hilsen/Regards
> Michael Rasmussen
>
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --
> /usr/games/fortune -es says:
> Love isn't only blind, it's also deaf, dumb, and stupid.
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] possible bug

2015-09-29 Thread Schweiss, Chip
I've seen issues like this when you run out of NFS locks.   NFSv3 in
Illumos is really slow at releasing locks.

On all my NFS servers I do:

sharectl set -p lockd_listen_backlog=256 nfs
sharectl set -p lockd_servers=2048 nfs
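
A quick way to verify the new values took effect:

sharectl get nfs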

Everywhere I can, I use NFSv4 instead of v3.   It handles locks much better.

-Chip

On Tue, Sep 29, 2015 at 1:22 PM, Hildebrandt, Bill 
wrote:

> Over the past few weeks, I have had 3 separate occurrences where my
> OmniOS/Napp-it NAS stops responding to NFS and CIFS.  The first time was
> during the week of the ZFS corruption bug announcement.  The system and
> it’s replicated storage were both scrubbed and zdb analyzed, and nothing
> looked wrong.  I rebuilt the NAS from scratch with updated patches and
> imported the pool.  Same thing happened three days later, and now today,
> eight days later.  Each time, a reboot is performed to bring it back.  All
> services appear to be running.  The odd thing is that an "ls -l" hangs on
> every mountpoint.  Has anyone heard of this issue?  Since I am not OmniOS
> savvy, is there anything I can capture while in that state that could help
> debug it?
>
>
>
> Thanks,
>
> Bill
>
> --
>
> This e-mail and any documents accompanying it may contain legally
> privileged and/or confidential information belonging to Exegy, Inc. Such
> information may be protected from disclosure by law. The information is
> intended for use by only the addressee. If you are not the intended
> recipient, you are hereby notified that any disclosure or use of the
> information is strictly prohibited. If you have received this e-mail in
> error, please immediately contact the sender by e-mail or phone regarding
> instructions for return or destruction and do not use or disclose the
> content to others.
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS Bloody update

2015-09-15 Thread Schweiss, Chip
On Mon, Sep 14, 2015 at 10:46 PM, Dan McDonald  wrote:

> Lauri "lotheac" Tirkkonen can provide more details.  Also note - there is
> an effort to replace sunssh with OpenSSh altogether.
>

I'll second that request to port OpenSSH into r151014 when it's ready.
The SunSSH keeps giving me fits.

I understand Joyent has made some great headway in the effort to get
OpenSSH into Illumos.   Looking forward to the day I don't have to script
around SunSSH problems.

-Chip

Sep 14, 2015, at 9:38 PM, Paul B. Henson  wrote:

>> From: Dan McDonald
>> Sent: Monday, September 14, 2015 2:58 PM
>>
>> - OpenSSH is now at version 7.1p1.
>
> Has the packaging been fixed in bloody so you can actually install this
now
> :)? If so, any thoughts on potentially back porting that to the current
LTS
> :)?
>
>> - An additional pair of ZFS fixes from Delphix not yet upstreamed in
> illumos-gate.
>
> That would be DLPX-36997 and DLPX-35372? Do you happen to know if Delphix
> has their issue tracker accessible to the Internet if somebody wanted to
> take a look in more detail at these? Google didn't provide anything of any
> obvious use.
>
> Thanks!
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 and L2ARC

2015-09-10 Thread Schweiss, Chip
On Thu, Sep 10, 2015 at 11:43 AM, Dan McDonald  wrote:

>
> > On Sep 10, 2015, at 12:15 PM, Schweiss, Chip  wrote:
> >
> > Is this limited to r151014 and bloody?
> >
> > I was under the impression this bug went back to the introduction of
> L2ARC compression.
>
> Did you read the analysis of 6214?  It calls out this commit as the cause:
>
> Author: Chris Williamson 
> Date:   Mon Dec 29 19:12:23 2014 -0800
>
> 5408 managing ZFS cache devices requires lots of RAM
> Reviewed by: Christopher Siden 
> Reviewed by: George Wilson 
> Reviewed by: Matthew Ahrens 
> Reviewed by: Don Brady 
> Reviewed by: Josef 'Jeff' Sipek 
> Approved by: Garrett D'Amore 
>
> That wasn't in '012, just '014 and later.
>

Sorry, I missed that.   I was going off assumptions from other
communications.

-Chip

>
> Dan
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] r151014 users - beware of illumos 6214 and L2ARC

2015-09-10 Thread Schweiss, Chip
Is this limited to r151014 and bloody?

I was under the impression this bug went back to the introduction of L2ARC
compression.

-Chip


On Thu, Sep 10, 2015 at 6:53 AM, Dan McDonald  wrote:

>
> > On Sep 10, 2015, at 7:53 AM, Dan McDonald  wrote:
> >
> > If you are using a zpool with r151014 and you have an L2ARC ("cache")
> vdev, I recommend at this time disabling it.  You may disable it by
> uttering:
>
> This also affects bloody as well.
>
> Dan
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] Periodic SSH connect failures

2015-09-10 Thread Schweiss, Chip
On OmniOS r151014 I use ssh with RSA keys to allow my storage systems to
communicate and launch things like 'zfs receive'.

Periodically the connection fails with "ssh_exchange_identification:
Connection closed by remote host".   When this happens, about half the
connection attempts will fail this way for about 10-20 minutes, then things
return to normal.

root@mir-dr-zfs01:/root# ssh -v mirpool02
OpenSSH_6.6, OpenSSL 1.0.1p 9 Jul 2015
debug1: Reading configuration data /etc/opt/csw/ssh/ssh_config
debug1: Connecting to mirpool02 [10.28.125.130] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type 2
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6
ssh_exchange_identification: Connection closed by remote host
root@mir-dr-zfs01:/root# echo $?
255

I've not been able to get logs out of the SunSSH server; turning things on
in /etc/syslog.conf doesn't seem to work.   What am I missing in trying
to get more information out of the ssh server?
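
In case it helps, here is roughly what I expect should work but doesn't seem
to (the facility and level are assumptions on my part):

# /etc/ssh/sshd_config
SyslogFacility AUTH
LogLevel VERBOSE

# /etc/syslog.conf (fields must be tab-separated)
auth.debug      /var/log/authlog

# then
touch /var/log/authlog
svcadm restart system-log ssh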

I use the OpenSSH client from OpenCSW; with the SunSSH client the problem
happens nearly twice as often.

Any suggestions on how to make these connections robust?

Thanks!
-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Slow Drive Detection and boot-archive

2015-07-29 Thread Schweiss, Chip
The only other thing that comes to mind is that you mentioned you have only
a single SAS path to these disks.   Have you disabled multipath?  (stmsboot
-d)


-Chip

On Wed, Jul 29, 2015 at 5:02 PM, Michael Talbott  wrote:

> Gave that a shot. No dice. Still getting the 8 second lag. It reminds me
> of raid cards that do staggered spinups that sequentially spin up 1 drive
> at a time. Only, this is happening after the kernel loads and of course,
> the LSI 9200s are flashed in IT mode with v.19 firmware and bios disabled.
>
>
> Jul 29 14:57:12 store2 genunix: [ID 583861 kern.info] sd10 at mpt_sas2:
> unit-address w5c0f0401c20f,0: w5c0f0401c20f,0
> Jul 29 14:57:12 store2 genunix: [ID 936769 kern.info] sd10 is /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f0401c20f,0
> Jul 29 14:57:12 store2 genunix: [ID 408114 kern.info] /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f0401c20f,0 (sd10)
> online
> Jul 29 14:57:20 store2 genunix: [ID 583861 kern.info] sd11 at mpt_sas2:
> unit-address w5c0f040075db,0: w5c0f040075db,0
> Jul 29 14:57:20 store2 genunix: [ID 936769 kern.info] sd11 is /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f040075db,0
> Jul 29 14:57:21 store2 genunix: [ID 408114 kern.info] /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f040075db,0 (sd11)
> online
> Jul 29 14:57:29 store2 genunix: [ID 583861 kern.info] sd12 at mpt_sas2:
> unit-address w5c0f042c684b,0: w5c0f042c684b,0
> Jul 29 14:57:29 store2 genunix: [ID 936769 kern.info] sd12 is /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f042c684b,0
> Jul 29 14:57:29 store2 genunix: [ID 408114 kern.info] /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f042c684b,0 (sd12)
> online
> Jul 29 14:57:38 store2 genunix: [ID 583861 kern.info] sd13 at mpt_sas2:
> unit-address w5c0f0457149f,0: w5c0f0457149f,0
> Jul 29 14:57:38 store2 genunix: [ID 936769 kern.info] sd13 is /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f0457149f,0
> Jul 29 14:57:38 store2 genunix: [ID 408114 kern.info] /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f0457149f,0 (sd13)
> online
> Jul 29 14:57:47 store2 genunix: [ID 583861 kern.info] sd14 at mpt_sas2:
> unit-address w5c0f042b1c6f,0: w5c0f042b1c6f,0
> Jul 29 14:57:47 store2 genunix: [ID 936769 kern.info] sd14 is /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f042b1c6f,0
> Jul 29 14:57:47 store2 genunix: [ID 408114 kern.info] /pci@0
> ,0/pci8086,e04@2/pci1000,3080@0/iport@f0/disk@w5c0f042b1c6f,0 (sd14)
> online
>
>
> ____
> Michael Talbott
> Systems Administrator
> La Jolla Institute
>
> On Jul 29, 2015, at 1:50 PM, Schweiss, Chip  wrote:
>
> I have an OmniOS box with all the same hardware except the server and hard
> disks.  I would wager this something to do with the WD disks and something
> different happening in the init.
>
> This is a stab in the dark, but try adding "power-condition:false" in
> /kernel/drv/sd.conf for the WD disks.
>
> -Chip
>
>
>
> On Wed, Jul 29, 2015 at 12:48 PM, Michael Talbott 
> wrote:
>
>> Here's the specs of that server.
>>
>> Fujitsu RX300S8
>>  -
>> http://www.fujitsu.com/fts/products/computing/servers/primergy/rack/rx300/
>> 128G ECC DDR3 1600 RAM
>> 2 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
>> 2 x LSI 9200-8e
>> 2 x 10Gb Intel NICs
>> 2 x SuperMicro 847E26-RJBOD1 45 bay JBOD enclosures
>>  - http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
>>
>> The enclosures are not currently set up for multipathing. The front and
>> rear backplane each have a single independent SAS connection to one of the
>> LSI 9200s.
>>
>> The two enclosures are fully loaded with 45 x 4TB WD4001FYYG-01SL3 drives
>> each (90 total).
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16822236353
>>
>> Booting the server up in Ubuntu or CentOS does not have that 8 second
>> delay. Each drive is found in a fraction of a second (activity LEDs on the
>> enclosure flash on and off really quick as the drives are scanned). On
>> OmniOS, the drives seem to be scanned in the same order, but, instead of it
>> spending a fraction of a second on each drive, it spends 8 seconds on 1
>> drive (led of only one drive rapidly flashing during that process) before
>> moving on to the next x 90 drives.
>>
>> Is there anything I can do to get more verbosity in the boot messages
>> that might just reveal the root issue?
>>
>> Any suggestions appreciated.
>>
>> Thanks
>>
>> 

Re: [OmniOS-discuss] Slow Drive Detection and boot-archive

2015-07-29 Thread Schweiss, Chip
I have an OmniOS box with all the same hardware except the server and hard
disks.  I would wager this has something to do with the WD disks and
something different happening during init.

This is a stab in the dark, but try adding "power-condition:false" in
/kernel/drv/sd.conf for the WD disks.
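
For what it's worth, the syntax I have in mind looks like the line below --
the VID/PID string is only a guess from your drive model (the vendor field
is padded to 8 characters; check the actual inquiry strings first):

sd-config-list = "WD      WD4001FYYG", "power-condition:false";

Then reload it with 'update_drv -vf sd' or a reboot.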

-Chip



On Wed, Jul 29, 2015 at 12:48 PM, Michael Talbott  wrote:

> Here's the specs of that server.
>
> Fujitsu RX300S8
>  -
> http://www.fujitsu.com/fts/products/computing/servers/primergy/rack/rx300/
> 128G ECC DDR3 1600 RAM
> 2 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
> 2 x LSI 9200-8e
> 2 x 10Gb Intel NICs
> 2 x SuperMicro 847E26-RJBOD1 45 bay JBOD enclosures
>  - http://www.supermicro.com/products/chassis/4U/847/SC847E26-RJBOD1.cfm
>
> The enclosures are not currently set up for multipathing. The front and
> rear backplane each have a single independent SAS connection to one of the
> LSI 9200s.
>
> The two enclosures are fully loaded with 45 x 4TB WD4001FYYG-01SL3 drives
> each (90 total).
> http://www.newegg.com/Product/Product.aspx?Item=N82E16822236353
>
> Booting the server up in Ubuntu or CentOS does not have that 8 second
> delay. Each drive is found in a fraction of a second (activity LEDs on the
> enclosure flash on and off really quick as the drives are scanned). On
> OmniOS, the drives seem to be scanned in the same order, but, instead of it
> spending a fraction of a second on each drive, it spends 8 seconds on 1
> drive (led of only one drive rapidly flashing during that process) before
> moving on to the next x 90 drives.
>
> Is there anything I can do to get more verbosity in the boot messages that
> might just reveal the root issue?
>
> Any suggestions appreciated.
>
> Thanks
>
> ____
> Michael Talbott
> Systems Administrator
> La Jolla Institute
>
> On Jul 29, 2015, at 7:51 AM, Schweiss, Chip  wrote:
>
>
>
> On Fri, Jul 24, 2015 at 5:03 PM, Michael Talbott  wrote:
>
>> Hi,
>>
>> I've downgraded the cards (LSI 9211-8e) to v.19 and disabled their boot
>> bios. But I'm still getting the 8 second per drive delay after the kernel
>> loads. Any other ideas?
>>
>>
> 8 seconds is way too long.   What JBODs and disks are you using?   Could
> it be they are powered off and the delay in waiting for the power on
> command to complete?   This could be accelerated by using lsiutils to send
> them all power on commands first.
>
> While I still consider it slow, however, my OmniOS systems with  LSI HBAs
> discover about 2 disks per second.   With systems with LOTS of disk all
> multipathed it still stacks up to a long time to discover them all.
>
> -Chip
>
>
>>
>> 
>> Michael Talbott
>> Systems Administrator
>> La Jolla Institute
>>
>> > On Jul 20, 2015, at 11:27 PM, Floris van Essen ..:: House of Ancients
>> Amstafs ::..  wrote:
>> >
>> > Michael,
>> >
>> > I know v20 does cause lots of issue's.
>> > V19 , to the best of my knowledge doesn't contain any, so I would
>> downgrade to v19
>> >
>> >
>> > Kr,
>> >
>> >
>> > Floris
>> > -Oorspronkelijk bericht-
>> > Van: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com]
>> Namens Michael Talbott
>> > Verzonden: dinsdag 21 juli 2015 4:57
>> > Aan: Marion Hakanson 
>> > CC: omnios-discuss 
>> > Onderwerp: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
>> >
>> > Thanks for the reply. The bios for the card is disabled already. The 8
>> second per drive scan happens after the kernel has already loaded and it is
>> scanning for devices. I wonder if it's due to running newer firmware. I did
>> update the cards to fw v.20.something before I moved to omnios. Is there a
>> particular firmware version on the cards I should run to match OmniOS's
>> drivers?
>> >
>> >
>> > 
>> > Michael Talbott
>> > Systems Administrator
>> > La Jolla Institute
>> >
>> >> On Jul 20, 2015, at 6:06 PM, Marion Hakanson 
>> wrote:
>> >>
>> >> Michael,
>> >>
>> >> I've not seen this;  I do have one system with 120 drives and it
>> >> definitely does not have this problem.  A couple with 80+ drives are
>> >> also free of this issue, though they are still running OpenIndiana.
>> >>
>> >> One thing I pretty much always do here, is to disable the boot option
>> >> in the LSI HBA's config 

Re: [OmniOS-discuss] Slow Drive Detection and boot-archive

2015-07-29 Thread Schweiss, Chip
On Fri, Jul 24, 2015 at 5:03 PM, Michael Talbott  wrote:

> Hi,
>
> I've downgraded the cards (LSI 9211-8e) to v.19 and disabled their boot
> bios. But I'm still getting the 8 second per drive delay after the kernel
> loads. Any other ideas?
>
>
8 seconds is way too long.   What JBODs and disks are you using?   Could it
be they are powered off and the delay is in waiting for the power-on command
to complete?   This could be accelerated by using lsiutils to send them all
power-on commands first.

I still consider it slow, but my OmniOS systems with LSI HBAs discover
about 2 disks per second.   On systems with lots of disks, all multipathed,
it still adds up to a long time to discover them all.

-Chip


>
> 
> Michael Talbott
> Systems Administrator
> La Jolla Institute
>
> > On Jul 20, 2015, at 11:27 PM, Floris van Essen ..:: House of Ancients
> Amstafs ::..  wrote:
> >
> > Michael,
> >
> > I know v20 does cause lots of issue's.
> > V19 , to the best of my knowledge doesn't contain any, so I would
> downgrade to v19
> >
> >
> > Kr,
> >
> >
> > Floris
> > -Oorspronkelijk bericht-
> > Van: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com]
> Namens Michael Talbott
> > Verzonden: dinsdag 21 juli 2015 4:57
> > Aan: Marion Hakanson 
> > CC: omnios-discuss 
> > Onderwerp: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
> >
> > Thanks for the reply. The bios for the card is disabled already. The 8
> second per drive scan happens after the kernel has already loaded and it is
> scanning for devices. I wonder if it's due to running newer firmware. I did
> update the cards to fw v.20.something before I moved to omnios. Is there a
> particular firmware version on the cards I should run to match OmniOS's
> drivers?
> >
> >
> > 
> > Michael Talbott
> > Systems Administrator
> > La Jolla Institute
> >
> >> On Jul 20, 2015, at 6:06 PM, Marion Hakanson  wrote:
> >>
> >> Michael,
> >>
> >> I've not seen this;  I do have one system with 120 drives and it
> >> definitely does not have this problem.  A couple with 80+ drives are
> >> also free of this issue, though they are still running OpenIndiana.
> >>
> >> One thing I pretty much always do here, is to disable the boot option
> >> in the LSI HBA's config utility (accessible from the during boot after
> >> the BIOS has started up).  I do this because I don't want the BIOS
> >> thinking it can boot from any of the external JBOD disks;  And also
> >> because I've had some system BIOS crashes when they tried to enumerate
> >> too many drives.  But, this all happens at the BIOS level, before the
> >> OS has even started up, so in theory it should not affect what you are
> >> seeing.
> >>
> >> Regards,
> >>
> >> Marion
> >>
> >>
> >> 
> >> Subject: Re: [OmniOS-discuss] Slow Drive Detection and boot-archive
> >> From: Michael Talbott 
> >> Date: Fri, 17 Jul 2015 16:15:47 -0700
> >> To: omnios-discuss 
> >>
> >> Just realized my typo. I'm using this on my 90 and 180 drive systems:
> >>
> >> # svccfg -s boot-archive setprop start/timeout_seconds=720 # svccfg -s
> >> boot-archive setprop start/timeout_seconds=1440
> >>
> >> Seems like 8 seconds to detect each drive is pretty excessive.
> >>
> >> Any ideas on how to speed that up?
> >>
> >>
> >> 
> >> Michael Talbott
> >> Systems Administrator
> >> La Jolla Institute
> >>
> >>> On Jul 17, 2015, at 4:07 PM, Michael Talbott  wrote:
> >>>
> >>> I have multiple NAS servers I've moved to OmniOS and each of them have
> 90-180 4T disks. Everything has worked out pretty well for the most part.
> But I've come into an issue where when I reboot any of them, I'm getting
> boot-archive service timeouts happening. I found a workaround of increasing
> the timeout value which brings me to the following. As you can see below in
> a dmesg output, it's taking the kernel about 8 seconds to detect each of
> the drives. They're connected via a couple SAS2008 based LSI cards.
> >>>
> >>> Is this normal?
> >>> Is there a way to speed that up?
> >>>
> >>> I've fixed my frustrating boot-archive timeout problem by adjusting
> the timeout value from the default of 60 seconds (I guess that'll work ok
> on systems with less than 8 drives?) to 8 seconds * 90 drives + a little
> extra time = 280 seconds (for the 90 drive systems). Which means it takes
> between 12-24 minutes to boot those machines up.
> >>>
> >>> # svccfg -s boot-archive setprop start/timeout_seconds=280
> >>>
> >>> I figure I can't be the only one. A little googling also revealed:
> >>> https://www.illumos.org/issues/4614
> >>> 
> >>>
> >>> Jul 17 15:40:15 store2 genunix: [ID 583861 kern.info] sd29 at
> >>> mpt_sas3: unit-address w5c0f0401bd43,0: w5c0f0401bd43,0 Jul
> >>> 17 15:40:15 store2 genunix: [ID 936769 kern.info] sd29 is
> >>> /pci@0,0/pci8086,e06@2,2/pci1000,3080@0/iport

Re: [OmniOS-discuss] Zil Device

2015-07-16 Thread Schweiss, Chip
The 850 Pro should never be used as a log device.  It does not have
power-fail protection for its RAM cache.   You might as well set
sync=disabled and skip using a log device entirely, because the 850 Pro is
not protecting your last transactions in case of power failure.

Only SSDs with power failure protection should be considered for log
devices.

That being said, unless you're running applications that need transaction
consistency, such as databases, don't bother with a log device and set
sync=disabled.
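
That is, something like this (pool name made up), with no log device in the
pool at all:

zfs set sync=disabled tank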

-Chip

On Thu, Jul 16, 2015 at 11:55 AM, Doug Hughes  wrote:

> 8GB zil on very active server and 100+GB ssd lasts many years. We have
> yet, after years of use of various SSDs, to have one fail from wear usage,
> and that's with fairly active NFS use.
> They usually fail for other reasons.
> We started with with Intel X series, which are only 32GB in size, and some
> of them are still active, though less active use now. With Samsung 850 pro,
> you practically don't have to worry about it, and the price is really good.
>
>
> On Thu, Jul 16, 2015 at 12:36 PM, Brogyányi József 
> wrote:
>
>>  Hi Doug
>>
>> Can you write its life time? I don't trust any SSD but I've thinking for
>> a while to use as a ZIL+L2ARC.
>> Could you share with us your experiences? I would be interested in server
>> usage. Thanks.
>>
>>
>>
>> 2015.07.15. 22:42 keltezéssel, Doug Hughes írta:
>>
>> We have been preferring commodity SSD like Intel 320 (older), intel 710,
>> or currently, Samsung 850 pro. We also use it as boot drive and reserve an
>> 8GB slide for ZIL so that massive synchronous NFS IOPS are manageable.
>>
>> Sent from my android device.
>>
>> -Original Message-
>> From: Matthew Lagoe 
>> 
>> To: omnios-discuss@lists.omniti.com
>> Sent: Wed, 15 Jul 2015 16:29
>> Subject: [OmniOS-discuss] Zil Device
>>
>>  Is the zeusram SSD still the big zil device out there or are there other
>> high performance reliable options that anyone knows of on the market now?
>> I
>> can't go with like the DDRdrive as its pcie.
>>
>> Thanks
>>
>>
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>>
>> ___
>> OmniOS-discuss mailing 
>> listOmniOS-discuss@lists.omniti.comhttp://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>>
>>
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] re-tune round-robin reading from a mirror

2015-07-15 Thread Schweiss, Chip
This is a very interesting idea.  It could allow for the creation of a
scratch pool with great ROI.   I have a particular need for an extremely
high read rate pool for data analysis that will leverage the fattest read
optimized SSDs I can get for the dollar.  I was considering this to be
raidz1, but if mirroring with disks would work this way, it could be even
better bang for the buck.

Did you ever do any actual testing with this type of setup?  I'd love to
see some real world performance data.

-Chip

On Wed, Jul 15, 2015 at 10:09 AM, Jim Klimov  wrote:

> 15 июля 2015 г. 14:10:15 CEST, Michael Mounteney 
> пишет:
> >Hello list;  is it possible with OmniOS to have a multi-way mirror with
> >one disk being an SSD and the rest magnetic;  then to tune ZFS to
> >perform all reads from the SSD?  for the sake of performance.  The
> >default case is round-robin reading, which is the best if all disks are
> >of approximately equal performance, especially if they're on separate
> >controllers.  But SSD changes that.
> >
> >__
> >Michael Mounteney
> >___
> >OmniOS-discuss mailing list
> >OmniOS-discuss@lists.omniti.com
> >http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
> When I last asked a few years ago (but IIRC for a mirror of local+iSCSI
> vdevs), the answer was along the lines that round-robin first considers the
> available devices. If the faster (ssd, local) device has no queue, it gets
> the load while the slower device still struggles with the task it has, so
> on average the faster device serves more io's - but not 100%. Queue depth
> tuning can also help here.
>
> Jim
>
> --
> Typos courtesy of K-9 Mail on my Samsung Android
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] big zfs storage?

2015-07-13 Thread Schweiss, Chip
Liam,

This report is encouraging.  Please share some details of your
configuration.   What disk failure parameters have you set?   Which
JBODs and disks are you running?
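
By "disk failure parameters" I'm assuming you mean the sd retry tunables in
/kernel/drv/sd.conf -- something along the lines of the entry below, with
your real VID/PID (vendor padded to 8 characters) and values picked to
taste:

sd-config-list = "VENDOR  PRODUCT", "retries-timeout:2,retries-busy:2";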

I have mostly DataON JBODs and some Supermicro.   DataON has PMC SAS
expanders and Supermicro has LSI; both setups have pretty much the same
behavior with disk failures.   All my servers are Supermicro with LSI HBAs.

If there's a magic combination of hardware and OS config out there that
solves the disk failure panic problem, I will certainly change my builds
going forward.

-Chip

On Fri, Jul 10, 2015 at 1:04 PM, Liam Slusser  wrote:

> I have two 800T ZFS systems on OmniOS and a bunch of smaller <50T
> systems.  Things generally work very well.  We loose a disk here and there
> but its never resulted in downtime.  They're all on Dell hardware with LSI
> or Dell PERC controllers.
>
> Putting in smaller disk failure parameters, so disks fail quicker, was a
> big help when something does go wrong with a disk.
>
> thanks,
> liam
>
>
> On Fri, Jul 10, 2015 at 10:31 AM, Schweiss, Chip 
> wrote:
>
>> Unfortunately for the past couple years panics on disk failure has been
>> the norm.   All my production systems are HA with RSF-1, so at least things
>> come back online relatively quick.  There are quite a few open tickets in
>> the Illumos bug tracker related to mpt_sas related panics.
>>
>> Most of the work to fix these problems has been committed in the past
>> year, though problems still exist.  For example, my systems are dual path
>> SAS, however, mpt_sas will panic if you pull a cable instead of dropping a
>> path to the disks.  Dan McDonald is actively working to resolve this.   He
>> is also pushing a bug fix in genunix from Nexenta that appears to fix a lot
>> of the panic problems.   I'll know for sure in a few months after I see a
>> disk or two drop if it truly fixes things.  Hans Rosenfeld at Nexenta is
>> responsible for most of the updates to mpt_sas including support for 3008
>> (12G SAS).
>>
>> I haven't run any 12G SAS yet, but plan to on my next build in a couple
>> months.   This will be about 300TB using an 84 disk JBOD.  All the code
>> from Nexenta to support the 3008 appears to be in Illumos now, and they
>> fully support it so I suspect it's pretty stable now.  From what I
>> understand there may be some 12G performance fixes coming sometime.
>>
>> The fault manager is nice when the system doesn't panic.  When it panics,
>> the fault manger never gets a chance to take action.  It is still the
>> consensus that is is better to run pools without hot spares because there
>> are situations the fault manager will do bad things.   I witnessed this
>> myself when building a system and the fault manger replaced 5 disks in a
>> raidz2 vdev inside 1 minute, trashing the pool.   I haven't completely
>> yield to the "best practice".  I now run one hot spare per pool.  I figure
>> with raidz2, the odds of the fault manager causing something catastrophic
>> is much less possible.
>>
>> -Chip
>>
>>
>>
>> On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley 
>> wrote:
>>
>>>  I have to build and maintain my own system. I usually help others
>>> build(i teach zfs and freenas classes/consulting). I really love fault
>>> management in solaris and miss it. Just thought since it's my system and I
>>> get to choose I would use omni. I have 20+ years using solaris and only 2
>>> on freebsd.
>>>
>>> I like freebsd for how well tuned for zfs oob. I miss the network, v12n
>>> and resource controls in solaris.
>>>
>>> Concerned about panics on disk failure. Is that common?
>>>
>>>
>> linda
>>>
>>>
>>> On 7/9/15 9:30 PM, Schweiss, Chip wrote:
>>>
>>>   Linda,
>>>
>>>  I have 3.5 PB running under OmniOS.  All my systems have LSI 2108 HBAs
>>> which is considered the best choice for HBAs.
>>>
>>> Illumos leaves a bit to be desired with handling faults from disks or
>>> SAS problems, but things under OmniOS have been improving, much thanks to
>>> Dan McDonald and OmniTI.   We have a paid support on all of our production
>>> systems with OmniTI.  Their response and dedication has been very good.
>>> Other than the occasional panic and restart from a disk failure, OmniOS has
>>> been solid.   ZFS of course never has lost a single bit of information.
>>>
>>>  I'd be curious why you're looking to move, have there been specific
>>> problems unde

Re: [OmniOS-discuss] big zfs storage?

2015-07-10 Thread Schweiss, Chip
Unfortunately, for the past couple of years panics on disk failure have
been the norm.   All my production systems are HA with RSF-1, so at least
things come back online relatively quickly.  There are quite a few open
tickets in the Illumos bug tracker related to mpt_sas panics.

Most of the work to fix these problems has been committed in the past year,
though problems still exist.  For example, my systems are dual-path SAS;
however, mpt_sas will panic if you pull a cable instead of dropping a path
to the disks.  Dan McDonald is actively working to resolve this.   He is
also pushing a bug fix in genunix from Nexenta that appears to fix a lot of
the panic problems.   I'll know for sure in a few months after I see a disk
or two drop if it truly fixes things.  Hans Rosenfeld at Nexenta is
responsible for most of the updates to mpt_sas including support for 3008
(12G SAS).

I haven't run any 12G SAS yet, but plan to on my next build in a couple
months.   This will be about 300TB using an 84 disk JBOD.  All the code
from Nexenta to support the 3008 appears to be in Illumos now, and they
fully support it so I suspect it's pretty stable now.  From what I
understand there may be some 12G performance fixes coming sometime.

The fault manager is nice when the system doesn't panic.  When it panics,
the fault manager never gets a chance to take action.  It is still the
consensus that it is better to run pools without hot spares, because there
are situations where the fault manager will do bad things.   I witnessed
this myself when building a system and the fault manager replaced 5 disks
in a raidz2 vdev inside 1 minute, trashing the pool.   I haven't completely
yielded to the "best practice".  I now run one hot spare per pool.  I figure
with raidz2, the odds of the fault manager causing something catastrophic
are much lower.

-Chip



On Fri, Jul 10, 2015 at 11:37 AM, Linda Kateley 
wrote:

>  I have to build and maintain my own system. I usually help others build(i
> teach zfs and freenas classes/consulting). I really love fault management
> in solaris and miss it. Just thought since it's my system and I get to
> choose I would use omni. I have 20+ years using solaris and only 2 on
> freebsd.
>
> I like freebsd for how well tuned for zfs oob. I miss the network, v12n
> and resource controls in solaris.
>
> Concerned about panics on disk failure. Is that common?
>
>
linda
>
>
> On 7/9/15 9:30 PM, Schweiss, Chip wrote:
>
>   Linda,
>
>  I have 3.5 PB running under OmniOS.  All my systems have LSI 2108 HBAs
> which is considered the best choice for HBAs.
>
> Illumos leaves a bit to be desired with handling faults from disks or SAS
> problems, but things under OmniOS have been improving, much thanks to Dan
> McDonald and OmniTI.   We have a paid support on all of our production
> systems with OmniTI.  Their response and dedication has been very good.
> Other than the occasional panic and restart from a disk failure, OmniOS has
> been solid.   ZFS of course never has lost a single bit of information.
>
>  I'd be curious why you're looking to move, have there been specific
> problems under BSD or ZoL?  I've been slowly evaluating FreeBSD ZFS, but of
> course the skeletons in the closet never seem to come out until you do
> something big.
>
>  -Chip
>
> On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley 
> wrote:
>
>> Hey is there anyone out there running big zfs on omni?
>>
>> I have been doing mostly zol and freebsd for the last year but have to
>> build a 300+TB box and i want to come back home to roots(solaris). Feeling
>> kind of hesitant :) Also, if you had to do over, is there anything you
>> would do different.
>>
>> Also, what is the go to HBA these days? Seems like i saw stable code for
>> lsi 3008?
>>
>> TIA
>>
>> linda
>>
>>
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>
>
> --
> Linda Kateley
> Kateley Company
> Skype ID-kateleycohttp://kateleyco.com
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] big zfs storage?

2015-07-09 Thread Schweiss, Chip
Linda,

I have 3.5 PB running under OmniOS.  All my systems have LSI 2108 HBAs,
which are considered the best choice for HBAs.

Illumos leaves a bit to be desired with handling faults from disks or SAS
problems, but things under OmniOS have been improving, much thanks to Dan
McDonald and OmniTI.   We have a paid support on all of our production
systems with OmniTI.  Their response and dedication has been very good.
Other than the occasional panic and restart from a disk failure, OmniOS has
been solid.   ZFS of course never has lost a single bit of information.

I'd be curious why you're looking to move; have there been specific
problems under BSD or ZoL?  I've been slowly evaluating FreeBSD ZFS, but of
course the skeletons in the closet never seem to come out until you do
something big.

-Chip

On Thu, Jul 9, 2015 at 4:21 PM, Linda Kateley  wrote:

> Hey is there anyone out there running big zfs on omni?
>
> I have been doing mostly zol and freebsd for the last year but have to
> build a 300+TB box and i want to come back home to roots(solaris). Feeling
> kind of hesitant :) Also, if you had to do over, is there anything you
> would do different.
>
> Also, what is the go to HBA these days? Seems like i saw stable code for
> lsi 3008?
>
> TIA
>
> linda
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Highly Available ZFS

2015-06-29 Thread Schweiss, Chip
On Mon, Jun 29, 2015 at 1:52 PM, Michael Rasmussen  wrote:

> Does anybody have an idea of how Nexenta does their HA-setup?
>
> My guess is that it must involve something with a constant snapshot of
> the pool using zfs send combined with forced import.
>

Nexenta uses RSF-1 from HighAvailability.com.  It is two servers connected
to the same SAS devices.  Pools are exported either gracefully or by force on
one host and imported on the other.   A floating IP address allows the clients
to maintain connectivity.
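
For a sense of what RSF-1 is doing under the hood, a manual failover is
roughly the following (a sketch only; pool, NIC, and address names here are
placeholders):

  old head:  zpool export -f tank
  new head:  zpool import -f tank
  new head:  ipadm create-addr -T static -a 192.168.10.50/24 ixgbe0/nfsvip

RSF-1 adds the heartbeats, reservations, and fencing that make it reasonably
safe to do this automatically.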

I use RSF-1 with OmniOS.   It works well, but HA in general has a steep
learning curve and A LOT of gotchas that are not well documented anywhere.

It took me about a year of learning until HA started actually increasing my
storage availability.  The price of RSF-1 is well justified if you don't
have a lot of experience with HA on ZFS.   Eventually I will attempt HA
without it, but in the mean time it is serving me well.

-Chip


>
> --
> Hilsen/Regards
> Michael Rasmussen
>
> Get my public GnuPG keys:
> michael  rasmussen  cc
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
> mir  datanom  net
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
> mir  miras  org
> http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
> --
> /usr/games/fortune -es says:
> Ego sum ens omnipotens.
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Zpool export while resilvering?

2015-06-10 Thread Schweiss, Chip
On Wed, Jun 10, 2015 at 4:09 AM, Robert A. Brock <
robert.br...@2hoffshore.com> wrote:

>  What did you use to flash them? Fwflash just gives an error about
> firmware file being too large.
>

Santools.

-Chip

>
>
> *From:* Schweiss, Chip [mailto:c...@innovates.com]
> *Sent:* 09 June 2015 21:25
> *To:* Robert A. Brock
> *Cc:* omnios-discuss
> *Subject:* Re: [OmniOS-discuss] Zpool export while resilvering?
>
>
>
> I went through this problem a while back.  There are some gotchas in
> getting them back online and firmware upgraded.   The OS will not talk to
> the drive until it has its firmware upgraded or is cleared from the fault
> database.
>
> These drives will not flash with multipath enabled either.
>
> I ended up clearing the fault manager's database, disabling it and
> disconnecting half the SAS cables to get them flashed.
>
> -Chip
>
>
>
> 
>  2H Offshore Engineering Ltd | Registered in England No. 02790139 |
> Registered office: Ferryside, Ferry Road, Norwich NR1 1SW.
>
>
>  2H Offshore is an Acteon company specializing in the design, monitoring
> and integrity management of offshore riser and conductor systems. Acteon is
> a group of specialist international engineering companies serving the
> offshore oil and gas industry. Its focus is on subsea services spanning the
> entire life of field. For more information, visit www.acteon.com
>
>
>  The information in and/or accompanying this email is intended for the
> use of the stated recipient only and may be confidential and/or privileged.
> It should not be forwarded or copied nor should its contents be disclosed
> in any manner without the express consent of the sender/author. Any views
> or opinions presented are solely those of the author and do not necessarily
> represent those of 2H.
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Zpool export while resilvering?

2015-06-09 Thread Schweiss, Chip
I went through this problem a while back.  There are some gotchas in
getting them back online and firmware upgraded.   The OS will not talk to
the drive until it has its firmware upgraded or is cleared from the fault
database.

These drives will not flash with multipath enabled either.

I ended up clearing the fault manager's database, disabling it and
disconnecting half the SAS cables to get them flashed.
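
For reference, the clearing part was roughly this sequence (from memory; the
/var/fm/fmd paths may differ slightly by release):

  fmadm faulty                                  # list the faulted drives
  svcadm disable -s svc:/system/fmd:default     # stop the fault manager
  rm -f /var/fm/fmd/errlog /var/fm/fmd/fltlog   # wipe its state
  rm -rf /var/fm/fmd/ckpt/* /var/fm/fmd/rsrc/*
  svcadm enable svc:/system/fmd:default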

-Chip
On Jun 9, 2015 2:32 PM, "Robert A. Brock" 
wrote:

>  They are failed as far as OmniOS is concerned, from what I can tell:
>
>
>
> Jun 08 01:08:54 710768e8-2f2b-4b3d-9d4b-a85ef5617219  DISK-8000-12   Major
>
>
>
> Host: 2hus291
>
> Platform: S5500BC   Chassis_id  : 
>
> Product_sn  :
>
>
>
> Fault class : fault.io.disk.over-temperature
>
> Affects : dev:///:devid=id1,sd@n5000c5007242271f
> //scsi_vhci/disk@g5000c5007242271f
>
>   faulted and taken out of service
>
> FRU : "Slot 21"
> (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=500304800033213f:serial=S1Z02A8MK4361NF4:part=SEAGATE-ST4000NM0023:revision=0003/ses-enclosure=1/bay=20/disk=0)
>
>   faulty
>
>
>
> Description : A disk's temperature exceeded the limits established by
>
>   its manufacturer.
>
>   Refer to http://illumos.org/msg/DISK-8000-12 for more
>
>   information.
>
>
>
> root@2hus291:/root# zpool status pool0
>
>   pool: pool0
>
> state: DEGRADED
>
> status: One or more devices is currently being resilvered.  The pool will
>
> continue to function, possibly in a degraded state.
>
> action: Wait for the resilver to complete.
>
>   scan: resilver in progress since Tue Jun  9 11:11:16 2015
>
> 18.8T scanned out of 91.7T at 667M/s, 31h48m to go
>
> 591G resilvered, 20.55% done
>
> config:
>
>
>
> NAME STATE READ WRITE CKSUM
>
> pool0DEGRADED 0 0 0
>
>   raidz2-0   DEGRADED 0 0 0
>
> c0t5000C50055ECA49Bd0ONLINE   0 0 0
>
> c0t5000C50055ECA4B3d0ONLINE   0 0 0
>
> c0t5000C50055ECA587d0ONLINE   0 0 0
>
> c0t5000C50055ECA6CFd0ONLINE   0 0 0
>
> c0t5000C50055ECA7F3d0ONLINE   0 0 0
>
> spare-5  REMOVED  0 0 0
>
>   c0t5000C5007242271Fd0  REMOVED  0 0 0
>
>   c0t5000C50055EF8A6Fd0  ONLINE   0 0 0
> (resilvering)
>
> c0t5000C50055ECAB23d0ONLINE   0 0 0
>
> c0t5000C50055ECABABd0ONLINE   0 0 0
>
>   raidz2-1   ONLINE   0 0 0
>
> c0t5000C50055EE9D87d0ONLINE   0 0 0
>
> c0t5000C50055EE9E43d0ONLINE   0 0 0
>
> c0t5000C50055EEA5ABd0ONLINE   0 0 0
>
> c0t5000C50055EEBA5Fd0ONLINE   0 0 0
>
> c0t5000C50055EEC1E3d0ONLINE   0 0 0
>
> c0t5000C500636670BFd0ONLINE   0 0 0
>
> c0t5000C50055EF8CBBd0ONLINE   0 0 0
>
> c0t5000C50055EF8D33d0ONLINE   0 0 0
>
>   raidz2-2   ONLINE   0 0 0
>
> c0t5000C50055F7942Fd0ONLINE   0 0 0
>
> c0t5000C50055F79E03d0ONLINE   0 0 0
>
> c0t5000C50055F7A8DFd0ONLINE   0 0 0
>
> c0t5000C50055F81C1Bd0ONLINE   0 0 0
>
> c0t5000C5005604A42Bd0ONLINE   0 0 0
>
> c0t5000C5005604A487d0ONLINE   0 0 0
>
> c0t5000C5005604A74Bd0ONLINE   0 0 0
>
> c0t5000C5005604A91Bd0ONLINE   0 0 0
>
>   raidz2-4   DEGRADED 0 0 0
>
> c0t5000C500562ED6A3d0ONLINE   0 0 0
>
> c0t5000C500562F8DEFd0ONLINE   0 0 0
>
> c0t5000C500562F92D7d0ONLINE   0 0 0
>
> c0t5000C500562FA0DFd0ONLINE   0 0 0
>
> c0t5000C500636679EBd0ONLINE   0 0 0
>
> spare-5  DEGRADED 0 014
>
>   c0t5000C50057FBB127d0  REMOVED  0 0 0
>
>   c0t5000C5006366906Bd0  ONLINE   0 0 0
>
> c0t5000C5006366808Fd0ONLINE   0 0 0
>
> spare-7  REMOVED  0 0 0
>
>   c0t5000C50057FC84F3d0  REMOVED  0 0 0
>
>   c0t5000C50063669937d0  ONLINE   0 0 0
>
> logs
>
>   mirror-3   ONLINE   0 0 0
>
> c13t5003048000308398d0   ONLINE   0 0 0
>
> c13t

Re: [OmniOS-discuss] Backing up HUGE zfs volumes

2015-05-21 Thread Schweiss, Chip
I would caution against anything using 'zfs diff'.  It has been perpetually
broken, either not working at all or returning incomplete information.

Avoiding crawling the directory is pretty much impossible unless you use
'zfs send'.   However, as long as there is enough cache on the system,
directory crawls can be very efficient.   I have daily rsync jobs that
crawl over 200 million files.   The impact of the crawl is not noticeable
to other users.
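
For what it's worth, the nightly jobs are nothing exotic, roughly this (source
and destination paths are placeholders):

  rsync -aH --delete /tank/projects/ /backup/projects/

With enough RAM the metadata stays hot in the ARC, so the stat() storm mostly
hits cache instead of disk.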

I have also used ZFS send to AWS Glacier.   This worked well until the data
change rate got high enough that I needed to start over too often to keep
the storage size reasonable on Glacier.
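
The Glacier feed was just periodic incremental streams staged to files and
uploaded, along these lines (snapshot names and paths are made up for the
example):

  zfs snapshot tank/data@2015-05-21
  zfs send -i tank/data@2015-05-20 tank/data@2015-05-21 | \
      gzip > /staging/data-2015-05-21.zfs.gz

The catch is that once enough data has churned you have to reseed with a full
send, which is what made it impractical for us.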

I also use CrashPlan on my home OmniOS server to back up about 5TB.  It
works very nicely.

-Chip

On Wed, May 20, 2015 at 6:51 PM, Michael Talbott  wrote:

> I'm trying to find ways of efficiently archiving up some huge (120TB and
> growing) zfs volumes with millions maybe billions of files of all sizes. I
> use zfs send/recv for replication to another box for tier 1/2 recovery.
> But, I'm trying to find a good open source solution that runs on Omni for
> archival purposes that doesn't have to crawl the filesystem or rely on any
> proprietary formats.
>
> I was thinking I could use zfs diff to get a list of changed data, parse
> that into a usable format, create a tar and par of the data, and an
> accompanying plain text index file. From there, upload that set of data to
> a cloud provider. While I could probably script it all out myself to
> accomplish this, I'm hoping someone knows of an existing solution that can
> produce somewhat similar results.
>
> Ideas anyone?
>
> Thanks,
>
> Michael
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] disk failure causing reboot?

2015-05-18 Thread Schweiss, Chip
I had the exact same failure mode last week.  With over 1000 spindles I see
this about once a month.

I can publish my dump also if anyone actually wants to try to fix this
problem, but I think there are several of the same thing already linked to
tickets in Illumos-gate.

Pools for the most part should be set to failmode=panic or wait, but a
failed disk should not cause a panic.   On the system where this happened to
me, failmode was set to wait.  It is also on r151012, waiting on a window to
upgrade to r151014.  My pool is raidz3, so there is no reason not to kick a bad disk.
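
For anyone wanting to check or change this on their own pools (pool name is a
placeholder):

  zpool get failmode tank
  zpool set failmode=wait tank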

All my disks are SAS in DataON JBODs, dual-connected across two LSI
HBAs.   BTW, pull a SAS cable and you get a panic too, not degraded
multipath.   Illumos seems to panic on just about any SAS event these days,
regardless of redundancy.
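
When a path does drop without a panic, mpathadm is the quickest way to confirm
what the multipath state actually is (the device name below is just an
example):

  mpathadm list lu
  mpathadm show lu /dev/rdsk/c0t5000C50055ECA49Bd0s2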

-Chip











On Mon, May 18, 2015 at 3:08 PM, Paul B. Henson  wrote:

> On Mon, May 18, 2015 at 06:25:34PM +, Jeff Stockett wrote:
> > A drive failed in one of our supermicro 5048R-E1CR36L servers running
> > omnios r151012 last night, and somewhat unexpectedly, the whole system
> > seems to have panicked.
>
> You don't happen to have failmode set to panic on the pool?
>
> From the zpool manpage:
>
>failmode=wait | continue | panic
>Controls the system behavior in the event of catastrophic pool
>failure. This condition is typically a result of a loss of
>connectivity to the underlying storage device(s) or a failure of
>all devices within the pool. The behavior of such an event is
>determined as follows:
>
>wait
>Blocks all I/O access until the device connectivity
> is
>recovered and the errors are cleared. This is the
>default behavior.
>
>continue
>Returns EIO to any new write I/O requests but allows
>reads to any of the remaining healthy devices. Any
>write requests that have yet to be committed to disk
>would be blocked.
>
>panic
>Prints out a message to the console and generates a
>system crash dump.
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] High density 2.5" chassis

2015-05-09 Thread Schweiss, Chip
I have an SSD server in one of those chassis.  Here's a write-up about it
on my blog; there are three posts about it.

http://www.bigdatajunkie.com/index.php/9-solaris/zfs/10-short-stroking-consumer-ssds

Not necessarily a build for everyone, but it has been absolutely awesome
for our use. After a few bumps at the beginning and giving up on HA on this
server, it has been rock solid.  Many will swear against the interposers,
but combined with Samsung SSDs they have worked very well.

-Chip


On Sat, May 9, 2015 at 1:06 PM, Chris Nagele  wrote:

> Hi all. Continuing on my all SSD discussion, I am looking for some
> recommendations on a new Supermicro
> chassis for our file servers. So far I have been looking at this
> thing:
>
> http://www.supermicro.com/products/chassis/4U/417/SC417E16-R1400LP.cfm
>
> Does anyone have experience with this? If so, what would you recommend
> for a motherboard and HBA to support all of the disks? We've
> traditionally used the X9DRD-7LN4F-JBOD or the X9DRi-F with a LSI
> 9211-8i HBA.
>
> Thanks,
> Chris
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] What repos do people use to build a *AMP server?

2015-05-08 Thread Schweiss, Chip
I've done really well with the OpenCSW packages on OmniOS.

-Chip
On May 8, 2015 11:50 AM, "Saso Kiselkov"  wrote:

> I've decided to try and update my r151006 box to something newer, seeing
> as r151014 just came out and it's supposed to be LTS. Trouble is, I'm
> trying to build a *AMP box and I can't find any prebuilt packages for it
> in any of these repos:
> http://omnios.omniti.com/wiki.php/Packaging
> What do you guys use for getting pre-built software? Do all people here
> just roll their own?
>
> Also, allow me to say, I *hate* consolidations and the way they lock
> accessible package versions. Where are the days when OSes used to be
> backwards-compatible?
>
> Cheers,
> --
> Saso
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] slow ssh login, maybee to many locales?

2015-04-20 Thread Schweiss, Chip
Slow SSH login is almost always caused by reverse DNS problems on the server side.
Adding the client to the server's /etc/hosts will usually resolve the
problem.
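
Something as simple as this on the server side usually does it (name and
address are made up for the example):

  # /etc/hosts on the SSH server
  192.168.1.25   client1.example.com   client1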

-Chip

On Sat, Apr 18, 2015 at 10:37 PM, PÁSZTOR György <
pasz...@sagv5.gyakg.u-szeged.hu> wrote:

> Hi,
>
> I faced with that, the login onto my new omnios zone is slow.
> I tried to debug.
> Many of the symptoms seemed pretty the same as this:
>
> http://broken.net/uncategorized/resolving-slow-ssh-login-performance-problems-on-openindiana/
>
> Also in my case it stopped at the same point: after the kexinit sent.
>
> However, on my omnios, the cryptadm list showed this:
> pasztor@omni:~$ cryptoadm list
>
> User-level providers:
> Provider: /usr/lib/security/$ISA/pkcs11_kernel.so
> Provider: /usr/lib/security/$ISA/pkcs11_softtoken.so
>
> Kernel software providers:
> des
> aes
> arcfour
> blowfish
> ecc
> sha1
> sha2
> md4
> md5
> rsa
> swrand
>
> Kernel hardware providers:
> [---end of output]
>
> So, in my case it did not contained the tpm module.
> I tried the opposite: enabling the tpm module, but nothing changed.
> (Maybe it become even slower. I did not count the seconds)
> So, I reverted it back and ran the same truss command, which revealed this:
> There are tons's of file openings in the /usr/lib/locale dir at that point:
>
> 24560:   8.8232 stat("/usr/lib/locale/is_IS.UTF-8", 0x08047D48) = 0
> 24560:   8.8234 open("/usr/lib/locale//is_IS.UTF-8/LC_CTYPE/LCL_DATA",
> O_RDONLY) = 7
> 24560:   8.8236 fstat(7, 0x08047658)= 0
> 24560:   8.8237 mmap(0x, 94904, PROT_READ, MAP_PRIVATE, 7, 0) =
> 0xFEDE7000
> 24560:   8.8238 close(7)= 0
> ...
> 24560:  14.5883
> open("/usr/lib/locale//el_GR.ISO8859-7/LC_MESSAGES/LCL_DATA", O_RDONLY) = 7
> 24560:  14.5884 fstat(7, 0x08047678)= 0
> 24560:  14.6061 read(7, " ^ ( ( [EDCD ] ( [E1C1 ]".., 82)   = 82
> 24560:  14.6063 close(7)= 0
> 24560:  14.6065 getdents64(5, 0xFEE04000, 8192) = 0
> 24560:  14.6069 ioctl(1, TCGETA, 0x08046DBE)Err#22
> EINVAL
> 24560:  14.6069 fstat64(1, 0x08046E00)  = 0
> 24560:  14.6070 brk(0x080689D0) = 0
> 24560:  14.6071 brk(0x0806A9D0) = 0
> 24560:  14.6072 fstat64(1, 0x08046D00)  = 0
> 24560:  14.6074 close(5)= 0
> 24560:  14.6075 write(1, " C\n P O S I X\n a f _ Z".., 2891)= 2891
> 24556:  14.6076 read(3, " C\n P O S I X\n a f _ Z".., 5120) = 2891
> 24560:  14.6077 _exit(0)
> 24556:  14.6080 brk(0x080D0488) = 0
> 24556:  14.6082 brk(0x080D2488) = 0
> 24556:  14.6083 read(3, 0x080CD544, 5120)   = 0
> 24556:  14.6084 llseek(3, 0, SEEK_CUR)  Err#29
> ESPIPE
> 24556:  14.6085 close(3)= 0
> 24556:  14.6296 waitid(P_PID, 24560, 0x080473F0, WEXITED|WTRAPPED) = 0
>
> So, does somebody knows what is happening at that point,
> why,
> and how can I "fine-tune" it?
>
> Kind regards,
> György Pásztor
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] esxi 5.5 to omnios r151014 nfs server issue

2015-04-06 Thread Schweiss, Chip
On Mon, Apr 6, 2015 at 9:30 AM, Dan McDonald  wrote:

>
> > On Apr 6, 2015, at 5:50 AM, Hafiz Rafiyev  wrote:
> >
> >
> > only log I see from omnios side is:
> >
> > nfs4cbd[468]: [ID 867284 daemon.notice] nfsv4 cannot determine local
> hostname binding for transport tcp6 - delegations will not be available on
> this transport
>
> Are you having DNS problems?
>
> This error is in an unchanged subsystem, the NFSv4 callback daemon.  The
> error looks like something caused by a naming-services failure.
>

I'd say this is a red herring.  ESXi 5.5 will only use NFSv3.  However, DNS
resolution is critical for ESXi NFS mounts even when mounting via IP
address.

I always put host entries in /etc/hosts on each ESXi host for all other
hosts, vCenter and NFS servers.   The same on the NFS server side.  I
learned this years ago on a 3 AM call to VMware support. :)

-Chip



>
> I'll forward your note along to an illumos-community NFS expert, I may
> find out more.
>
> Dan
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Fwd: All SSD pool advice

2015-04-06 Thread Schweiss, Chip
On Mon, Apr 6, 2015 at 8:53 AM, Fábio Rabelo 
wrote:

> Sorry, forget to forward to the list ...
>
>
> -- Forwarded message --
> From: Fábio Rabelo 
> Date: 2015-04-06 10:51 GMT-03:00
> Subject: Re: [OmniOS-discuss] All SSD pool advice
> To: Chris Nagele 
>
>
> I never get my hands at that 4U model ...
>
> I have 2 of this babys in a customer of mine :
>
> http://www.supermicro.com/products/chassis/2U/216/SC216BA-R1K28LP.cfm
>
> Each one with 24 1TB Samsung 850PRO for a litle over an year,
> OminOS+Napp-it , no issue whatsoever ...
>
> Expanded Chassis brings me lots and lots of headaches  ...
>

The system I've built with interposers has SAS expanders and gives me no
problems.  Samsung SSDs are the only SSDs I've found that work well with
the interposers.

-Chip

>
>
> Fábio Rabelo
>
> 2015-04-06 10:41 GMT-03:00 Chris Nagele :
>
> Thanks everyone. Regarding the expanders, our 4U servers are on the
>> following chassis:
>>
>> http://www.supermicro.com/products/chassis/4U/846/SC846E16-R1200.cfm
>>
>> We are using all SAS disks, except for the SSDs. How big is the risk
>> here when it comes to SAS -> SATA conversion? Our newer servers have
>> direct connections on each lane to the disk.
>>
>> Chris
>>
>> Chris Nagele
>> Co-founder, Wildbit
>> Beanstalk, Postmark, dploy.io
>>
>>
>> On Sat, Apr 4, 2015 at 7:18 PM, Doug Hughes  wrote:
>> >
>> > We have a couple of machines with all SSD pool (~6-10 Samsung 850 pro
>> is the
>> > current favorite). They work great for IOPS. Here's my take.
>> > 1) you don't need a dedicated zil. Just let the zpool intersperse it
>> amongst
>> > the existing zpool devices. They are plenty fast enough.
>> > 2) you don't need an L2arc for the same reason. a smaller number of
>> > dedicated devices would likely cause more of a bottleneck than serving
>> off
>> > the existing pool devices (unless you were to put it on one of those
>> giant
>> > RDRAM things or similar, but that adds a lot of expense)
>> >
>> >
>> >
>> >
>> >
>> > On 4/4/2015 3:07 PM, Chris Nagele wrote:
>> >
>> > We've been running a few 4U Supermicro servers using ZeusRAM for zil and
>> > SSDs for L2. The main disks are regular 1TB SAS.
>> >
>> > I'm considering moving to all SSD since the pricing has dropped so much.
>> > What things should I know or do when moving to all SSD pools? I'm
>> assuming I
>> > don't need L2 and that I should keep the ZeusRAM. Should I only use
>> certain
>> > types of SSDs?
>> >
>> > Thanks,
>> > Chris
>> >
>> >
>> > --
>> >
>> > Chris Nagele
>> > Co-founder, Wildbit
>> > Beanstalk, Postmark, dploy.io
>> >
>> >
>> >
>> > ___
>> > OmniOS-discuss mailing list
>> > OmniOS-discuss@lists.omniti.com
>> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
>> >
>> >
>> >
>> > ___
>> > OmniOS-discuss mailing list
>> > OmniOS-discuss@lists.omniti.com
>> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
>> >
>> ___
>> OmniOS-discuss mailing list
>> OmniOS-discuss@lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] All SSD pool advice

2015-04-06 Thread Schweiss, Chip
On Mon, Apr 6, 2015 at 8:41 AM, Chris Nagele  wrote:

> Thanks everyone. Regarding the expanders, our 4U servers are on the
> following chassis:
>
> http://www.supermicro.com/products/chassis/4U/846/SC846E16-R1200.cfm
>
> We are using all SAS disks, except for the SSDs. How big is the risk
> here when it comes to SAS -> SATA conversion? Our newer servers have
> direct connections on each lane to the disk.
>

There are A LOT of opinions on this.  What I have done that has worked
extremely well was using 70 Samsung 840 Pro SSDs with LSI interposers in
Supermicro chassis.  There were a couple of early failures of the interposers,
but it has been rock solid ever since.   mpt_sas blew chunks and panicked
the system on one.  On another one I caught it in action, and doing a 'zpool
offline {device}' kept everything running without a hitch.
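
For reference, that was just the usual offline/replace dance (pool and device
names here are examples):

  zpool offline ssdpool c5t50025388A0D31F22d0
  # swap the SSD/interposer, then
  zpool replace ssdpool c5t50025388A0D31F22d0 c5t50025388A0D31F99d0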

I run with ZIL off because this is used entirely for scratch data and
virtual machines that can be redeployed in minutes.   It would be sync safe
with the addition of some good log devices.
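
By "ZIL off" I just mean the sync property on the scratch datasets (the
dataset name below is a placeholder):

  zfs set sync=disabled ssdpool/scratch
  zfs get sync ssdpool/scratch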

I'm not sure if the interposers increased stability or it has simply been
the quality of the Samsung SSD.

-Chip



> Chris
>
> Chris Nagele
> Co-founder, Wildbit
> Beanstalk, Postmark, dploy.io
>
>
> On Sat, Apr 4, 2015 at 7:18 PM, Doug Hughes  wrote:
> >
> > We have a couple of machines with all SSD pool (~6-10 Samsung 850 pro is
> the
> > current favorite). They work great for IOPS. Here's my take.
> > 1) you don't need a dedicated zil. Just let the zpool intersperse it
> amongst
> > the existing zpool devices. They are plenty fast enough.
> > 2) you don't need an L2arc for the same reason. a smaller number of
> > dedicated devices would likely cause more of a bottleneck than serving
> off
> > the existing pool devices (unless you were to put it on one of those
> giant
> > RDRAM things or similar, but that adds a lot of expense)
> >
> >
> >
> >
> >
> > On 4/4/2015 3:07 PM, Chris Nagele wrote:
> >
> > We've been running a few 4U Supermicro servers using ZeusRAM for zil and
> > SSDs for L2. The main disks are regular 1TB SAS.
> >
> > I'm considering moving to all SSD since the pricing has dropped so much.
> > What things should I know or do when moving to all SSD pools? I'm
> assuming I
> > don't need L2 and that I should keep the ZeusRAM. Should I only use
> certain
> > types of SSDs?
> >
> > Thanks,
> > Chris
> >
> >
> > --
> >
> > Chris Nagele
> > Co-founder, Wildbit
> > Beanstalk, Postmark, dploy.io
> >
> >
> >
> > ___
> > OmniOS-discuss mailing list
> > OmniOS-discuss@lists.omniti.com
> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
> >
> >
> >
> > ___
> > OmniOS-discuss mailing list
> > OmniOS-discuss@lists.omniti.com
> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
> >
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] Best infrastructure for VSphere/Hyper-V

2015-04-02 Thread Schweiss, Chip
On Apr 2, 2015 11:50 AM, "Nate Smith"  wrote:
>
> So going over the forum over the last month, it appears more than a
couple people have had problem with Omnios as a storage backend for
virtualization platforms, both as iSCSI targets and as Fibre Channel
targets. Looking at a list of possible alternatives, what infrastructure
works well?
>
> Is it limited to NFS on VSphere, or is there some way I can get this
working with Hyper-V (which would be vastly preferable due to licensing
advantages for me)?
>
I run OmniOS with NFS for vSphere.  It works very well.

One bit of disclosure: all my VM storage is SSD; however, ZFS with
compression makes the SSD go a lot further.  I also use linked clones via
the vSphere API.

I have 250 VMs running on 5 TB of SSD.  Performance is awesome for every VM.
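
The compression piece is just the usual dataset property; roughly what the VM
datastore looks like (dataset name is a placeholder):

  zfs set compression=lz4 ssdpool/vmware
  zfs get compressratio ssdpool/vmware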

-Chip

> -Nate
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] best or preferred 10g card for OmniOS

2015-03-29 Thread Schweiss, Chip
On Sun, Mar 29, 2015 at 8:51 AM, Matthew Lagoe 
wrote:

> The intel cards are nice but they don't have any cx4 cards so we don't use
> them. Copper connections have less latency on short links then fiber as you
> don't have the electric to optical conversion (when done properly)
>

On short links (< 20 m), twin-ax copper SFP+ cables are much more economical
and lower latency than optics.   I would only use optics and fiber if I have
long runs.

-Chip

>
> -Original Message-
> From: OmniOS-discuss [mailto:omnios-discuss-boun...@lists.omniti.com] On
> Behalf Of Richard Elling
> Sent: Saturday, March 28, 2015 07:40 AM
> To: Doug Hughes
> Cc: omnios-discuss
> Subject: Re: [OmniOS-discuss] best or preferred 10g card for OmniOS
>
>
> > On Mar 26, 2015, at 9:24 AM, Doug Hughes  wrote:
> >
> > any recommendations? We're having some pretty big problems with the
> Solarflare card and driver dropping network under high load. We eliminated
> LACP as a culprit, and the switch.
> >
> > Intel? Chelsio? other?
>
> I've been running exclusively Intel for several years now. It gets the most
> attention in the illumos community.
>
>  -- richard
>
>
> >
> > - Doug
> > ___
> > OmniOS-discuss mailing list
> > OmniOS-discuss@lists.omniti.com
> > http://lists.omniti.com/mailman/listinfo/omnios-discuss
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] best or preferred 10g card for OmniOS

2015-03-26 Thread Schweiss, Chip
The Intel X520s and the Supermicro equivalents are rock solid.   The X540
probably is too; I just haven't used it.  I prefer the Supermicro-branded
Intel cards because the firmware is not as picky about the twin-ax cables
used.

-Chip

On Thu, Mar 26, 2015 at 11:24 AM, Doug Hughes  wrote:

> any recommendations? We're having some pretty big problems with the
> Solarflare card and driver dropping network under high load. We eliminated
> LACP as a culprit, and the switch.
>
> Intel? Chelsio? other?
>
> - Doug
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] About P19

2015-03-11 Thread Schweiss, Chip
I have P19 on 3 active servers.  No issues.

I consider it safe.

Also interesting, P20 was on them when I first purchased them.  It was
nearly a month of usage before I found out about P20 and then downgraded.
I didn't have any problems with P20 like others were seeing.

-Chip

On Wed, Mar 11, 2015 at 9:22 AM, Dan McDonald  wrote:

>
> > On Mar 11, 2015, at 4:20 AM, Tobias Oetiker  wrote:
> >
> > Dan,
> >
> > you mentioned in an earlier post that you had not heard anything
> > good about P19 ... this seems to prompt people to consider
> > downgreading to P18 ...
>
> I've heard little/nothing about P19.  I've only heard P18 is known to be
> good.
>
> Dan
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] smtp-notify dependency on sendmail

2015-03-10 Thread Schweiss, Chip
On Tue, Mar 10, 2015 at 10:36 AM, Dan McDonald  wrote:

>
> svccfg -s system/fm/smtp-notify setprop startup_req/entities = fmri:
> svc:/milestone/multi-user:default
> svccfg -s system/fm/smtp-notify addpropvalue startup_req/entities fmri:
> svc:/system/fmd:default



That's the trick.  Thanks!
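
For anyone finding this later, this is how to confirm the property took and
kick the service afterwards (plain SMF commands, nothing exotic):

  svcprop -p startup_req/entities svc:/system/fm/smtp-notify:default
  svcadm refresh svc:/system/fm/smtp-notify:default
  svcadm restart svc:/system/fm/smtp-notify:default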

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


[OmniOS-discuss] smtp-notify dependency on sendmail

2015-03-10 Thread Schweiss, Chip
I haven't used sendmail since the 1990s and don't intend to change.

I've figured out how to get smtp-notify to start with sendmail-client
disabled, but it was a manual process of using 'svccfg -s smtp-notify
editprop'.

What I can't figure out is how to do the same on the command line.  Everything
I try either gives a syntax error or 'svccfg: No such property group
"startup_req".'  I really don't want to have to add a manual step to my
system setup scripts.

What's the proper syntax for this setting?:

svccfg -s system/fm/smtp-notify:default setprop "startup_req/entities" =
fmri: \"svc:/milestone/multi-user:default svc:/system/fmd:default\"

-Chip
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] lsi sas 9211-8i it efi firmware 20.00.02.00

2015-03-10 Thread Schweiss, Chip
On Tue, Mar 10, 2015 at 5:48 AM, Stephan Budach 
wrote:

> Am 09.03.15 um 15:47 schrieb Dan McDonald:
>
>> On Mar 9, 2015, at 10:23 AM, Eric Sproul 
>>> wrote:
>>>
>>> On Sat, Mar 7, 2015 at 3:56 PM, Brogyányi József 
>>> wrote:
>>>
 Has anyone tested this firmware? Is it free from this error message
 "Parity
 Error on path"?
 Thanks any information.

>>> P20 firmware is known to be toxic; just google for "lsi p20 firmware"
>>> for the carnage.
>>>
>>> P19 and below are fine, as far as I know.
>>>
>> I've not heard good things about 19.  I HAVE heard that 18 is the best
>> level of FW to run for right now.
>>
>> Thanks!
>> Dan
>>
> Is there a known good way to flash a LSI back to P18 if it already came
> with P19? I happen to have two new LSIs running P19.
> Afaik, the readme explicitly warns about flashing back the fw…
>
>
Backwards is hard.  I went through that trying to get v20 reverted on some
new HBAs.

The only method I could find that worked was using the UEFI shell and UEFI
sas2flash utility to erase the firmware and install the old version.  On
older motherboards, the DOS method should work. Solaris/Illumos sas2flash
is incapable of erasing the firmware.
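
From my notes, the UEFI shell sequence was along these lines (the firmware and
BIOS file names depend on the exact card):

  sas2flash.efi -listall
  sas2flash.efi -o -e 6
  sas2flash.efi -o -f 2118it.bin -b mptsas2.rom

After the '-e 6' erase the card has no firmware at all, so don't reboot until
the older image has been written back.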

-Chip





> Cheers,
> budy
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] OmniOS on IBM DX360 UEFI Firmware

2015-03-04 Thread Schweiss, Chip
Sounds like the problem I had on a new Supermicro box.   I found by trial
and error that turning off x2APIC in the BIOS fixed the problem.

Also disable C sleep states.

-Chip

On Wed, Mar 4, 2015 at 10:27 AM, John Barfield 
wrote:

>  Greetings,
>
> I’m writing to see if anyone could point me in the direction of a
> document that would detail how to get OmniOS to boot on IBM’s newest UEFI
> firmware on system X machines.
>
>  I’m using a DX360 3U chassis as a storage appliance and I’m having a
> hard time booting the installer iso from USB.
>
>  The installer ISO simply does not work but I can boot another
> “installed” OmniOS appliance image off of a different USB stick.
>
>  However this image just crashes and reboots after the SunOS 5.11 screen
> and goes into an infinite reboot loop.
>
>  If anyone has any experience with this server I would be very grateful
> if you shared your knowledge.
>
>  I’ve tried disabling UEFI or enabling legacy mode but I just don’t think
> that it's working…after scanning through IBM’s docs from what I can tell…it
> should just work automatically.
>
>  Thanks in advance for any help!
>
>  John Barfield
>
>
>
>
>
> ___
> OmniOS-discuss mailing list
> OmniOS-discuss@lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>
___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss

