Hopefully the patch Marcel is talking about fixes this. I've at least figured out enough to predict when the problem is imminent.
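One way to automate that prediction is to watch the count of sockets that `netstat -an` reports in the BOUND state, the same number tracked by hand below. The sketch below is hypothetical and is not part of the original setup. It assumes the illumos default anonymous port range of 32768-65535, about 32768 ports, so check your own TCP tunables. It also picks an arbitrary 80% warning threshold. The names `count_bound`, `should_fail_over`, `check_host`, `ANON_PORTS` and `WARN_FRACTION` are illustrative only.

```python
import subprocess

# Assumption: default illumos anonymous (ephemeral) port range 32768-65535.
ANON_PORTS = 65535 - 32768 + 1
# Hypothetical warning threshold: fraction of the range that may be BOUND.
WARN_FRACTION = 0.8


def count_bound(netstat_output: str) -> int:
    """Count rows with a BOUND state field, like `grep BOUND | wc -l`,
    but matching whole whitespace-separated fields only."""
    count = 0
    for line in netstat_output.splitlines():
        # Scan every field instead of assuming a fixed column position.
        if "BOUND" in line.split():
            count += 1
    return count


def should_fail_over(bound: int, limit: int = ANON_PORTS,
                     fraction: float = WARN_FRACTION) -> bool:
    """True once the BOUND count reaches the warning fraction of the range."""
    return bound >= limit * fraction


def check_host() -> bool:
    """Run `netstat -an` locally, print the count, return True to fail over."""
    out = subprocess.run(["netstat", "-an"], capture_output=True,
                         text=True, check=True).stdout
    bound = count_bound(out)
    print(f"BOUND sockets: {bound} (anonymous range ~{ANON_PORTS} ports)")
    return should_fail_over(bound)
```

`check_host()` could be called from cron or a monitoring agent, with an alert raised when it returns True. With these assumed numbers, the 32739 BOUND sockets seen below are well past the 80% mark and only a few dozen short of the full range.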
We have been migrating to the automounter instead of hard mounts, which could be related to this problem growing over time.

Just an FYI: I've kept the server running in this state, but moved its storage pool to a sister server. The port binding problem remains with NO NFS clients connected, but neither pfiles nor lsof shows rpcbind as the culprit:

# netstat -an|grep BOUND|wc -l
32739
# /opt/ozmt/bin/SunOS/lsof -i:41155
{nothing returned}
# pfiles `pgrep rpcbind`
449:    /usr/sbin/rpcbind
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   1: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   2: S_IFCHR mode:0666 dev:527,0 ino:70778888 uid:0 gid:3 rdev:135,2
      O_RDWR
      /devices/pseudo/mm@0:null
      offset:0
   3: S_IFCHR mode:0000 dev:527,0 ino:61271 uid:0 gid:0 rdev:231,64
      O_RDWR
        sockname: AF_INET6 :: port: 111
      /devices/pseudo/udp6@0:udp6
      offset:0
   4: S_IFCHR mode:0000 dev:527,0 ino:50998 uid:0 gid:0 rdev:231,59
      O_RDWR
        sockname: AF_INET6 :: port: 0
      /devices/pseudo/udp6@0:udp6
      offset:0
   5: S_IFCHR mode:0000 dev:527,0 ino:61264 uid:0 gid:0 rdev:231,58
      O_RDWR
        sockname: AF_INET6 :: port: 60955
      /devices/pseudo/udp6@0:udp6
      offset:0
   6: S_IFCHR mode:0000 dev:527,0 ino:64334 uid:0 gid:0 rdev:224,57
      O_RDWR
        sockname: AF_INET6 :: port: 111
      /devices/pseudo/tcp6@0:tcp6
      offset:0
   7: S_IFCHR mode:0000 dev:527,0 ino:64333 uid:0 gid:0 rdev:224,56
      O_RDWR
        sockname: AF_INET6 :: port: 0
      /devices/pseudo/tcp6@0:tcp6
      offset:0
   8: S_IFCHR mode:0000 dev:527,0 ino:64332 uid:0 gid:0 rdev:230,55
      O_RDWR
        sockname: AF_INET 0.0.0.0 port: 111
      /devices/pseudo/udp@0:udp
      offset:0
   9: S_IFCHR mode:0000 dev:527,0 ino:64330 uid:0 gid:0 rdev:230,54
      O_RDWR
        sockname: AF_INET 0.0.0.0 port: 0
      /devices/pseudo/udp@0:udp
      offset:0
  10: S_IFCHR mode:0000 dev:527,0 ino:64331 uid:0 gid:0 rdev:230,53
      O_RDWR
        sockname: AF_INET 0.0.0.0 port: 60994
      /devices/pseudo/udp@0:udp
      offset:0
  11: S_IFCHR mode:0000 dev:527,0 ino:64327 uid:0 gid:0 rdev:223,52
      O_RDWR
        sockname: AF_INET 0.0.0.0 port: 111
      /devices/pseudo/tcp@0:tcp
      offset:0
  12: S_IFCHR mode:0000 dev:527,0 ino:64326 uid:0 gid:0 rdev:223,51
      O_RDWR
        sockname: AF_INET 0.0.0.0 port: 0
      /devices/pseudo/tcp@0:tcp
      offset:0
  13: S_IFCHR mode:0000 dev:527,0 ino:64324 uid:0 gid:0 rdev:226,32
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  14: S_IFCHR mode:0000 dev:527,0 ino:64328 uid:0 gid:0 rdev:226,33
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  15: S_IFCHR mode:0000 dev:527,0 ino:64324 uid:0 gid:0 rdev:226,35
      O_RDWR
      /devices/pseudo/tl@0:ticlts
      offset:0
  16: S_IFCHR mode:0000 dev:527,0 ino:64322 uid:0 gid:0 rdev:226,36
      O_RDWR
      /devices/pseudo/tl@0:ticotsord
      offset:0
  17: S_IFCHR mode:0000 dev:527,0 ino:64321 uid:0 gid:0 rdev:226,37
      O_RDWR
      /devices/pseudo/tl@0:ticotsord
      offset:0
  18: S_IFCHR mode:0000 dev:527,0 ino:64030 uid:0 gid:0 rdev:226,39
      O_RDWR
      /devices/pseudo/tl@0:ticots
      offset:0
  19: S_IFCHR mode:0000 dev:527,0 ino:64029 uid:0 gid:0 rdev:226,40
      O_RDWR
      /devices/pseudo/tl@0:ticots
      offset:0
  20: S_IFIFO mode:0000 dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
      O_RDWR|O_NONBLOCK
  21: S_IFIFO mode:0000 dev:525,0 ino:206 uid:1 gid:12 rdev:0,0
      O_RDWR|O_NONBLOCK
  23: S_IFCHR mode:0000 dev:527,0 ino:33089 uid:0 gid:0 rdev:129,21273
      O_WRONLY FD_CLOEXEC
      /devices/pseudo/log@0:conslog
      offset:0

Restarting rpcbind doesn't affect it either:

# svcadm restart svc:/network/rpc/bind:default
# netstat -an|grep BOUND|wc -l
32739

Until this patch is integrated, I'll monitor the number of bound ports so I know when I need to fail my pool over again.

On Wed, Jan 3, 2018 at 10:32 AM, Marcel Telka <mar...@telka.sk> wrote:

> On Wed, Jan 03, 2018 at 10:02:43AM -0600, Schweiss, Chip wrote:
> > The problem occurred again starting last night. I have another clue, but I
> > still don't know how it is occurring or how to fix it.
> >
> > It looks like all the TCP ports are in "bound" state, but not being
> > released.
> >
> > How can I isolate the cause of this?
>
> This is a bug in rpcmod, very likely related to
> https://www.illumos.org/issues/1616
>
> I discussed this few weeks back with some guy who faced the same issue. It
> looks like he found the cause and have a fix for it. I thought he will post a
> review request, but that didn't happened for some reason yet.
>
> I'll try to push this forward...
>
> > Thanks.
>
> --
> +-------------------------------------------+
> | Marcel Telka   e-mail:   mar...@telka.sk  |
> |                homepage: http://telka.sk/ |
> |                jabber:   mar...@jabber.sk |
> +-------------------------------------------+
>
> ------------------------------------------
> illumos-zfs
> Archives: https://illumos.topicbox.com/groups/zfs/discussions/T8f10bde64dc0d5c5-Mb17ca753ce6f6fbed5124147
> Powered by Topicbox: https://topicbox.com
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss