[ https://issues.apache.org/jira/browse/AMBARI-22303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Onischuk updated AMBARI-22303:
-------------------------------------
    Fix Version/s:     (was: 2.5.2)
                       2.6.1

> Spark history server is stopped (with umask 027 and custom spark log/pid dir)
> -----------------------------------------------------------------------------
>
>                 Key: AMBARI-22303
>                 URL: https://issues.apache.org/jira/browse/AMBARI-22303
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>             Fix For: 2.6.1
>
>         Attachments: AMBARI-22303.patch
>
>
> STR:
> 1. Deploy HDP 2.4.3.0-227 on Ambari 2.5.2.0 without Spark
> 2. Enable NN HA
> 3. Upgrade Ambari to 2.6.0.0
> 4. Register install and perform RU to 2.6.3.0-220
> 5. Add Spark service
> 6. Wait for some time.
> Result: Spark History server is stopped.
> Cluster: 172.27.62.82:8080 - nat-yc-r6-pgos-ambari-hv-r-upg-1 - 48h.
> Artifacts: <http://logserver.eng.hortonworks.com/?prefix=qelogs/nat/70440/ambari-hv-r-upg/split-1/nat-yc-r6-pgos-ambari-hv-r-upg-1/artifacts/ctr-e134-1499953498516-250582-01-000014.hwx.site/artifacts/screenshots/com.hw.ambari.ui.tests.monitoring.admin_page.rolling_express_upgrade.TestRegisterAndInstallNewStackVersion/test130_AddService/_24_10_27_28_Component__Spark_History_Server__not_started_on_host_ctr_e134_1499953498516_250582_01_0/>
> Spark logs: <http://logserver.eng.hortonworks.com/?prefix=qelogs/nat/70440/ambari-hv-r-upg/split-1/nat-yc-r6-pgos-ambari-hv-r-upg-1/var-logs/spark/ctr-e134-1499953498516-250582-01-000002.hwx.site/>
> From Spark.out
>
>
>
> The reported blocks 1677 has reached the threshold 0.9900 of total blocks 1677. The number of live datanodes 5 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 18 seconds.
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1422)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2693)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2582)
>       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
>       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/spark-history/.342200c3-6e9e-485c-8f08-d998cd9d92aa. Name node is in safe mode.
> The reported blocks 1677 has reached the threshold 0.9900 of total blocks 1677. The number of live datanodes 5 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 18 seconds.
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1418)
>       ... 13 more
> , while invoking ClientNamenodeProtocolTranslatorPB.create over ctr-e134-1499953498516-250582-01-000002.hwx.site/172.27.56.143:8020. Retrying after sleeping for 22256ms.
>
> However, the NN is not in safemode state:
>
>
>
> [root@ctr-e134-1499953498516-250582-01-000002 ~]# hdfs dfsadmin -safemode get
> Safe mode is OFF in ctr-e134-1499953498516-250582-01-000013.hwx.site/172.27.69.83:8020
> Safe mode is OFF in ctr-e134-1499953498516-250582-01-000002.hwx.site/172.27.56.143:8020
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)