[ https://issues.apache.org/jira/browse/AMBARI-21614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Onischuk updated AMBARI-21614:
-------------------------------------
    Attachment: AMBARI-21614.patch

> Restart NFSGateway fails after ResourceManager move to another host
> -------------------------------------------------------------------
>
>     Key: AMBARI-21614
>     URL: https://issues.apache.org/jira/browse/AMBARI-21614
>     Project: Ambari
>     Issue Type: Bug
>     Reporter: Andrew Onischuk
>     Assignee: Andrew Onischuk
>     Fix For: 2.5.2
>
>     Attachments: AMBARI-21614.patch
>
>
> Test performed:
> 1. Move ResourceManager to a different host
> 2. Regenerate Keytabs
> 3. Restart required services
> In build #180, while performing Restart of required services, Restart of
> NFSGateway fails with the following error for **Administrator** and **Cluster
> Administrator** roles:
>
>
>
>     2017-07-26 04:47:17,828 INFO nfs3.Nfs3Base (Nfs3Base.java:<init>(45)) - NFS server port set to: 2049
>     2017-07-26 04:47:17,831 INFO oncrpc.RpcProgram (RpcProgram.java:<init>(99)) - Will accept client connections from unprivileged ports
>     2017-07-26 04:47:17,839 INFO security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(1101)) - Login successful for user nfs/ctr-e134-1499953498516-54517-01-000003.hwx.s...@example.com using keytab file /etc/security/keytabs/nfs.service.keytab
>     2017-07-26 04:47:18,785 INFO oncrpc.SimpleUdpServer (SimpleUdpServer.java:run(73)) - Started listening to UDP requests at port 4242 for Rpc program: mountd at localhost:4242 with workerCount 1
>     2017-07-26 04:47:18,805 FATAL mount.MountdBase (MountdBase.java:startTCPServer(85)) - Failed to start the TCP server.
>     org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:4242
>         at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>         at org.apache.hadoop.oncrpc.SimpleTcpServer.run(SimpleTcpServer.java:88)
>         at org.apache.hadoop.mount.MountdBase.startTCPServer(MountdBase.java:83)
>         at org.apache.hadoop.mount.MountdBase.start(MountdBase.java:98)
>         at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startServiceInternal(Nfs3.java:56)
>         at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startService(Nfs3.java:69)
>         at org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter.start(PrivilegedNfsGatewayStarter.java:71)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
>     Caused by: java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind0(Native Method)
>         at sun.nio.ch.Net.bind(Net.java:433)
>         at sun.nio.ch.Net.bind(Net.java:425)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>         at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
>         at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>     2017-07-26 04:47:18,828 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
>     2017-07-26 04:47:18,831 INFO nfs3.Nfs3Base (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
>     /************************************************************
>     SHUTDOWN_MSG: Shutting down Nfs3 at ctr-e134-1499953498516-54517-01-000003.hwx.site/172.27.10.140
>     ************************************************************/
>     ==> /grid/0/log/hdfs/root/SecurityAuth.audit <==
>     ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out.4 <==
>     ulimit -a for privileged nfs user cstm-hdfs
>     core file size (blocks, -c) unlimited
>     data seg size (kbytes, -d) unlimited
>     scheduling priority (-e) 0
>     file size (blocks, -f) unlimited
>     pending signals (-i) 1030387
>     max locked memory (kbytes, -l) unlimited
>     max memory size (kbytes, -m) unlimited
>     open files (-n) 65536
>     pipe size (512 bytes, -p) 8
>     POSIX message queues (bytes, -q) 819200
>     real-time priority (-r) 0
>     stack size (kbytes, -s) 8192
>     cpu time (seconds, -t) unlimited
>     max user processes (-u) unlimited
>     virtual memory (kbytes, -v) unlimited
>     file locks (-x) unlimited
>     ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out.3 <==
>     ulimit -a for privileged nfs user cstm-hdfs
>     core file size (blocks, -c) unlimited
>     data seg size (kbytes, -d) unlimited
>     scheduling priority (-e) 0
>     file size (blocks, -f) unlimited
>     pending signals (-i) 1030387
>     max locked memory (kbytes, -l) unlimited
>     max memory size (kbytes, -m) unlimited
>     open files (-n) 65536
>     pipe size (512 bytes, -p) 8
>     POSIX message queues (bytes, -q) 819200
>     real-time priority (-r) 0
>     stack size (kbytes, -s) 8192
>     cpu time (seconds, -t) unlimited
>     max user processes (-u) unlimited
>     virtual memory (kbytes, -v) unlimited
>     file locks (-x) unlimited
>     ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out.2 <==
>     ulimit -a for privileged nfs user cstm-hdfs
>     core file size (blocks, -c) unlimited
>     data seg size (kbytes, -d) unlimited
>     scheduling priority (-e) 0
>     file size (blocks, -f) unlimited
>     pending signals (-i) 1030387
>     max locked memory (kbytes, -l) unlimited
>     max memory size (kbytes, -m) unlimited
>     open files (-n) 65536
>     pipe size (512 bytes, -p) 8
>     POSIX message queues (bytes, -q) 819200
>     real-time priority (-r) 0
>     stack size (kbytes, -s) 8192
>     cpu time (seconds, -t) unlimited
>     max user processes (-u) unlimited
>     virtual memory (kbytes, -v) unlimited
>     file locks (-x) unlimited
>     ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out.1 <==
>     ulimit -a for privileged nfs user cstm-hdfs
>     core file size (blocks, -c) unlimited
>     data seg size (kbytes, -d) unlimited
>     scheduling priority (-e) 0
>     file size (blocks, -f) unlimited
>     pending signals (-i) 1030387
>     max locked memory (kbytes, -l) unlimited
>     max memory size (kbytes, -m) unlimited
>     open files (-n) 65536
>     pipe size (512 bytes, -p) 8
>     POSIX message queues (bytes, -q) 819200
>     real-time priority (-r) 0
>     stack size (kbytes, -s) 8192
>     cpu time (seconds, -t) unlimited
>     max user processes (-u) unlimited
>     virtual memory (kbytes, -v) unlimited
>     file locks (-x) unlimited
>     ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out <==
>     ulimit -a for privileged nfs user cstm-hdfs
>     core file size (blocks, -c) unlimited
>     data seg size (kbytes, -d) unlimited
>     scheduling priority (-e) 0
>     file size (blocks, -f) unlimited
>     pending signals (-i) 1030387
>     max locked memory (kbytes, -l) unlimited
>     max memory size (kbytes, -m) unlimited
>     open files (-n) 65536
>     pipe size (512 bytes, -p) 8
>     POSIX message queues (bytes, -q) 819200
>     real-time priority (-r) 0
>     stack size (kbytes, -s) 8192
>     cpu time (seconds, -t) unlimited
>     max user processes (-u) unlimited
>     virtual memory (kbytes, -v) unlimited
>     file locks (-x) unlimited
>
>     Command failed after 1 tries
>
> Live cluster env: <https://172.27.18.145:8443> extended life for 48 hours
>
>
>
>     172.27.18.145 ctr-e134-1499953498516-54516-01-000007.hwx.site ctr-e134-1499953498516-54516-01-000007
>     172.27.16.83 ctr-e134-1499953498516-54516-01-000006.hwx.site ctr-e134-1499953498516-54516-01-000006
>     172.27.53.131 ctr-e134-1499953498516-54516-01-000005.hwx.site ctr-e134-1499953498516-54516-01-000005
>     172.27.54.24 ctr-e134-1499953498516-54516-01-000004.hwx.site ctr-e134-1499953498516-54516-01-000004
>     172.27.20.195 ctr-e134-1499953498516-54516-01-000002.hwx.site ctr-e134-1499953498516-54516-01-000002
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
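
Note for anyone reproducing the failure: the `java.net.BindException: Address already in use` in the log means some other process was already holding TCP port 4242 (the mountd port) when the gateway tried to bind it. The sketch below is one way to check whether a TCP port can be bound on a host before retrying the restart. It is only an illustration: `port_is_free` is a hypothetical helper, not part of Ambari, Hadoop, or the attached patch, and it does not identify which process owns the port.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port.

    Hypothetical diagnostic helper (an assumption for illustration only).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # bind() raises OSError (EADDRINUSE) when another socket
            # already holds this address and port.
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # 4242 is the mountd port reported in the log above.
    print("port 4242 free:", port_is_free(4242))
```

If this reports the port as busy while NFSGateway is stopped, the port is most likely held by a leftover process from before the restart, which would be consistent with the bind failure in the log.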