Re: Namenode Connection Refused

2023-10-24 Thread Harry Jamison
It is not an HA cluster, I gave up on that due to separate problems.
And I am doing this query from the same host as the namenode.

I am including the netstat -tulapn output, which shows the namenode is not exposing the port.
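A quick way to confirm this from the client side, independent of Hadoop (a generic sketch; `port_open` is an illustrative name, not a Hadoop API):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeout, host unreachable, ...
        return False

# Against this cluster you would probe the namenode RPC port, e.g.:
#   port_open("vmnode1", 8020)
```

If this returns False on the namenode host itself while `ps` shows the process running, the RPC server never bound the port, so the namenode log and the bind-address configuration are the places to look.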

On Tuesday, October 24, 2023 at 09:40:09 AM PDT, Wei-Chiu Chuang wrote:
If it's an HA cluster, is it possible the client doesn't have the proper HA 
configuration so it doesn't know what host name to connect to?

Otherwise, the usual suspect is the firewall configuration between the client 
and the NameNode.

On Tue, Oct 24, 2023 at 9:05 AM Harry Jamison wrote:
> I feel like I am doing something really dumb here, but my namenode is having 
> a connection refused on port 8020.
> 
> There is nothing in the logs that seems to indicate an error as far as I can 
> tell
> 
> ps aux shows the namenode is running
> 
> root   13169   10196  9 21:18 pts/100:00:02 
> /usr/lib/jvm/java-11-openjdk-amd64//bin/java -Dproc_namenode 
> -Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullAppender 
> -Dhadoop.security.logger=INFO,RFAS 
> -Dyarn.log.dir=/hadoop/hadoop/hadoop-3.3.6/logs -Dyarn.log.file=hadoop.log 
> -Dyarn.home.dir=/hadoop/hadoop/hadoop-3.3.6 -Dyarn.root.logger=INFO,console 
> -Djava.library.path=/hadoop/hadoop/hadoop-3.3.6/lib/native 
> -Dhadoop.log.dir=/hadoop/hadoop/hadoop-3.3.6/logs 
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/hadoop/hadoop/hadoop-3.3.6 
> -Dhadoop.id.str=root -Dhadoop.root.logger=TRACE,console 
> -Dhadoop.policy.file=hadoop-policy.xml 
> org.apache.hadoop.hdfs.server.namenode.NameNode
> 
> Netstat shows that this port is not open but others are
> root@vmnode1:/hadoop/hadoop/hadoop# netstat -tulapn|grep 802
> tcp        0      0 192.168.1.159:8023      0.0.0.0:*               LISTEN    
>   16347/java          
> tcp        0      0 192.168.1.159:8022      0.0.0.0:*               LISTEN    
>   16347/java          
> tcp        0      0 192.168.1.159:8022      192.168.1.159:56830     
> ESTABLISHED 16347/java          
> tcp        0      0 192.168.1.159:56830     192.168.1.159:8022      
> ESTABLISHED 13889/java          
> tcp        0      0 192.168.1.159:8022      192.168.1.104:58264     
> ESTABLISHED 16347/java          
> 
> 
> From the namenode logs I see that it has 8020 as the expected port
> [2023-10-23 21:18:21,739] INFO fs.defaultFS is hdfs://vmnode1:8020/ 
> (org.apache.hadoop.hdfs.server.namenode.NameNodeUtils)
> [2023-10-23 21:18:21,739] INFO Clients should use vmnode1:8020 to access this 
> namenode/service. (org.apache.hadoop.hdfs.server.namenode.NameNode)
> 
> My datanodes seem to be connecting, because I see information about 0 
> invalid blocks in the logs:
> [2023-10-24 09:03:21,255] INFO BLOCK* registerDatanode: from 
> DatanodeRegistration(192.168.1.159:9866, 
> datanodeUuid=fbefce35-15f7-43df-a666-ecc90f4bef0f, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-0b66d2f6-6c6a-4f3f-bdb1-b1ab0c947d00;nsid=2036303633;c=1697774550786)
>  storage fbefce35-15f7-43df-a666-ecc90f4bef0f 
> (org.apache.hadoop.hdfs.StateChange)
> [2023-10-24 09:03:21,255] INFO Removing a node: 
> /default-rack/192.168.1.159:9866 (org.apache.hadoop.net.NetworkTopology)
> [2023-10-24 09:03:21,255] INFO Adding a new node: 
> /default-rack/192.168.1.159:9866 (org.apache.hadoop.net.NetworkTopology)
> [2023-10-24 09:03:21,281] INFO BLOCK* processReport 0x746ca82e1993dcbb with 
> lease ID 0xa39c5071fd7ca21f: Processing first storage report for 
> DS-ab8f27ed-6129-492c-9b8a-3800c46703fb from datanode 
> DatanodeRegistration(192.168.1.159:9866, 
> datanodeUuid=fbefce35-15f7-43df-a666-ecc90f4bef0f, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-0b66d2f6-6c6a-4f3f-bdb1-b1ab0c947d00;nsid=2036303633;c=1697774550786)
>  (BlockStateChange)
> [2023-10-24 09:03:21,281] INFO BLOCK* processReport 0x746ca82e1993dcbb with 
> lease ID 0xa39c5071fd7ca21f: from storage 
> DS-ab8f27ed-6129-492c-9b8a-3800c46703fb node 
> DatanodeRegistration(192.168.1.159:9866, 
> datanodeUuid=fbefce35-15f7-43df-a666-ecc90f4bef0f, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-0b66d2f6-6c6a-4f3f-bdb1-b1ab0c947d00;nsid=2036303633;c=1697774550786),
>  blocks: 0, hasStaleStorage: false, processing time: 0 msecs, 
> invalidatedBlocks: 0 (BlockStateChange)
> 
> 
> Is there anything else that I should look at?
> I am not sure how to debug why it is not starting up on this port
> 
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: user-h...@hadoop.apache.org
> 
> 





Namenode Connection Refused

2023-10-24 Thread Harry Jamison
I feel like I am doing something really dumb here, but my namenode is having a 
connection refused on port 8020.

There is nothing in the logs that seems to indicate an error as far as I can 
tell

ps aux shows the namenode is running

root   13169   10196  9 21:18 pts/100:00:02 
/usr/lib/jvm/java-11-openjdk-amd64//bin/java -Dproc_namenode 
-Djava.net.preferIPv4Stack=true -Dhdfs.audit.logger=INFO,NullAppender 
-Dhadoop.security.logger=INFO,RFAS 
-Dyarn.log.dir=/hadoop/hadoop/hadoop-3.3.6/logs -Dyarn.log.file=hadoop.log 
-Dyarn.home.dir=/hadoop/hadoop/hadoop-3.3.6 -Dyarn.root.logger=INFO,console 
-Djava.library.path=/hadoop/hadoop/hadoop-3.3.6/lib/native 
-Dhadoop.log.dir=/hadoop/hadoop/hadoop-3.3.6/logs -Dhadoop.log.file=hadoop.log 
-Dhadoop.home.dir=/hadoop/hadoop/hadoop-3.3.6 -Dhadoop.id.str=root 
-Dhadoop.root.logger=TRACE,console -Dhadoop.policy.file=hadoop-policy.xml 
org.apache.hadoop.hdfs.server.namenode.NameNode

Netstat shows that this port is not open but others are
root@vmnode1:/hadoop/hadoop/hadoop# netstat -tulapn|grep 802
tcp        0      0 192.168.1.159:8023      0.0.0.0:*               LISTEN      
16347/java          
tcp        0      0 192.168.1.159:8022      0.0.0.0:*               LISTEN      
16347/java          
tcp        0      0 192.168.1.159:8022      192.168.1.159:56830     ESTABLISHED 
16347/java          
tcp        0      0 192.168.1.159:56830     192.168.1.159:8022      ESTABLISHED 
13889/java          
tcp        0      0 192.168.1.159:8022      192.168.1.104:58264     ESTABLISHED 
16347/java          


From the namenode logs I see that it has 8020 as the expected port
[2023-10-23 21:18:21,739] INFO fs.defaultFS is hdfs://vmnode1:8020/ 
(org.apache.hadoop.hdfs.server.namenode.NameNodeUtils)
[2023-10-23 21:18:21,739] INFO Clients should use vmnode1:8020 to access this 
namenode/service. (org.apache.hadoop.hdfs.server.namenode.NameNode)

My datanodes seem to be connecting, because I see information about 0 
invalid blocks in the logs:
[2023-10-24 09:03:21,255] INFO BLOCK* registerDatanode: from 
DatanodeRegistration(192.168.1.159:9866, 
datanodeUuid=fbefce35-15f7-43df-a666-ecc90f4bef0f, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-0b66d2f6-6c6a-4f3f-bdb1-b1ab0c947d00;nsid=2036303633;c=1697774550786)
 storage fbefce35-15f7-43df-a666-ecc90f4bef0f 
(org.apache.hadoop.hdfs.StateChange)
[2023-10-24 09:03:21,255] INFO Removing a node: 
/default-rack/192.168.1.159:9866 (org.apache.hadoop.net.NetworkTopology)
[2023-10-24 09:03:21,255] INFO Adding a new node: 
/default-rack/192.168.1.159:9866 (org.apache.hadoop.net.NetworkTopology)
[2023-10-24 09:03:21,281] INFO BLOCK* processReport 0x746ca82e1993dcbb with 
lease ID 0xa39c5071fd7ca21f: Processing first storage report for 
DS-ab8f27ed-6129-492c-9b8a-3800c46703fb from datanode 
DatanodeRegistration(192.168.1.159:9866, 
datanodeUuid=fbefce35-15f7-43df-a666-ecc90f4bef0f, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-0b66d2f6-6c6a-4f3f-bdb1-b1ab0c947d00;nsid=2036303633;c=1697774550786)
 (BlockStateChange)
[2023-10-24 09:03:21,281] INFO BLOCK* processReport 0x746ca82e1993dcbb with 
lease ID 0xa39c5071fd7ca21f: from storage 
DS-ab8f27ed-6129-492c-9b8a-3800c46703fb node 
DatanodeRegistration(192.168.1.159:9866, 
datanodeUuid=fbefce35-15f7-43df-a666-ecc90f4bef0f, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-0b66d2f6-6c6a-4f3f-bdb1-b1ab0c947d00;nsid=2036303633;c=1697774550786),
 blocks: 0, hasStaleStorage: false, processing time: 0 msecs, 
invalidatedBlocks: 0 (BlockStateChange)


Is there anything else that I should look at?
I am not sure how to debug why it is not starting up on this port






systemctl setup for HDFS

2023-10-22 Thread Harry Jamison
I was trying to set up systemd service files for the HDFS daemons.
I think I did something wrong.

Could someone send me their example service files?
I just want to see if I am missing something.
Or is there a better way to start HDFS on boot?


Thanks
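For what it's worth, a minimal sketch of a unit file, assuming Hadoop 3.3.6 installed under /hadoop/hadoop/hadoop-3.3.6 and run as root (both taken from the other threads in this archive). The unit name and paths are illustrative, not a tested recipe:

```ini
# /etc/systemd/system/hdfs-namenode.service  (hypothetical path/name)
[Unit]
Description=HDFS NameNode
After=network-online.target
Wants=network-online.target

[Service]
Type=forking
User=root
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Environment=HADOOP_HOME=/hadoop/hadoop/hadoop-3.3.6
ExecStart=/hadoop/hadoop/hadoop-3.3.6/bin/hdfs --daemon start namenode
ExecStop=/hadoop/hadoop/hadoop-3.3.6/bin/hdfs --daemon stop namenode
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After `systemctl daemon-reload`, enable it with `systemctl enable --now hdfs-namenode`; analogous units for the datanode and journalnode would just swap the daemon name passed to `hdfs --daemon`.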




HA Namenode unable to connect

2023-10-10 Thread Harry Jamison
I am trying to start my hadoop cluster manually.
I am having trouble figuring out what this error means.

I see this error repeatedly and eventually the namenode shuts down
[2023-10-10 21:03:37,179] INFO Retrying connect to server: 
vmnode1/192.168.1.159:8485. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 
(org.apache.hadoop.ipc.Client)


Does this mean that the journal node is having trouble?
Looking at vmnode1's journal node log, I do not see anything that looks bad to me:
[2023-10-10 21:11:24,583] INFO Using callQueue: class 
java.util.concurrent.LinkedBlockingQueue, queueCapacity: 500, scheduler: class 
org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false. 
(org.apache.hadoop.ipc.CallQueueManager)
[2023-10-10 21:11:24,603] INFO Listener at 0.0.0.0:8485 
(org.apache.hadoop.ipc.Server)
[2023-10-10 21:11:24,606] INFO Starting Socket Reader #1 for port 8485 
(org.apache.hadoop.ipc.Server)
[2023-10-10 21:11:24,914] INFO IPC Server listener on 8485: starting 
(org.apache.hadoop.ipc.Server)
[2023-10-10 21:11:24,917] INFO IPC Server Responder: starting 
(org.apache.hadoop.ipc.Server)
[2023-10-10 21:11:25,481] INFO Initializing journal in directory 
/hadoop/data/hdfs/journalnode/mycluster 
(org.apache.hadoop.hdfs.qjournal.server.JournalNode)
[2023-10-10 21:11:25,521] INFO Lock on 
/hadoop/data/hdfs/journalnode/mycluster/in_use.lock acquired by nodename 
10296@vmnode1 (org.apache.hadoop.hdfs.server.common.Storage)
[2023-10-10 21:11:25,562] INFO Scanning storage 
FileJournalManager(root=/hadoop/data/hdfs/journalnode/mycluster) 
(org.apache.hadoop.hdfs.qjournal.server.Journal)
[2023-10-10 21:11:25,643] INFO Latest log is 
EditLogFile(file=/hadoop/data/hdfs/journalnode/mycluster/current/edits_inprogress_017,first=017,last=017,inProgress=true,hasCorruptHeader=false)
 ; journal id: mycluster (org.apache.hadoop.hdfs.qjournal.server.Journal)
[2023-10-10 21:11:25,993] INFO Starting SyncJournal daemon for journal 
mycluster (org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer)
[2023-10-10 21:11:26,017] INFO 
/hadoop/data/hdfs/journalnode/mycluster/edits.sync directory already exists. 
(org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer)
[2023-10-10 21:11:26,018] INFO Syncing Journal /0:0:0:0:0:0:0:0:8485 with 
vmnode1/192.168.1.159:8485, journal id: mycluster 
(org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer)




This is my hdfs-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>vmnode1:2181,vmnode2:2181,vmnode3:2181</value>
  </property>

  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>vmnode1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>vmnode2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>vmnode3:8020</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>vmnode1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>vmnode2:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn3</name>
    <value>vmnode3:9870</value>
  </property>

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://vmnode1:8485;vmnode2:8485;vmnode3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>

  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoop/data/hdfs/journalnode</value>
  </property>

  <property>
    <name>dfs.ha.nn.not-become-active-in-safemode</name>
    <value>false</value>
  </property>
</configuration>

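One point worth noting: fs.defaultFS and ha.zookeeper.quorum normally live in core-site.xml rather than hdfs-site.xml — the October 3 thread in this archive circles around exactly this duplication. A minimal core-site.xml sketch with the same values (assumed, not taken from a posted file):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>vmnode1:2181,vmnode2:2181,vmnode3:2181</value>
  </property>
</configuration>
```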




Re: HDFS HA standby

2023-10-04 Thread Harry Jamison
@Kiyoshi Mizumaru
How would I do that? I tried changing /hadoop/etc/hadoop/hadoop-env.sh:

export HADOOP_ROOT_LOGGER=TRACE,console

But that did not seem to work; I still only get INFO.

On Tuesday, October 3, 2023 at 09:13:13 PM PDT, Harry Jamison wrote:
 
I am not sure exactly what the problem is now.
My namenode (and I think journal node) are getting shut down. Is there a way to 
tell why it is getting the shutdown signal?
Also the datanode seems to be getting this error:
End of File Exception between local host is


Here are the logs; I only see INFO logging, and then the shutdown:
[2023-10-03 20:53:00,873] INFO Initializing quota with 12 thread(s) 
(org.apache.hadoop.hdfs.server.namenode.FSDirectory)

[2023-10-03 20:53:00,876] INFO Quota initialization completed in 1 milliseconds

name space=2

storage space=0

storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 
(org.apache.hadoop.hdfs.server.namenode.FSDirectory)

[2023-10-03 20:53:00,882] INFO Total number of blocks            = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO Starting CacheReplicationMonitor with interval 
3 milliseconds 
(org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor)

[2023-10-03 20:53:00,884] INFO Number of invalid blocks          = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO Number of under-replicated blocks = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO Number of  over-replicated blocks = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO Number of blocks being written    = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO STATE* Replication Queue initialization scan for 
invalid, over- and under-replicated blocks completed in 67 msec 
(org.apache.hadoop.hdfs.StateChange)

[2023-10-03 20:54:16,453] ERROR RECEIVED SIGNAL 15: SIGTERM 
(org.apache.hadoop.hdfs.server.namenode.NameNode)

[2023-10-03 20:54:16,467] INFO SHUTDOWN_MSG: 

/

SHUTDOWN_MSG: Shutting down NameNode at vmnode1/192.168.1.159

/ 
(org.apache.hadoop.hdfs.server.namenode.NameNode)




When I start the data node I see this
[2023-10-03 20:53:00,882] INFO Namenode Block pool 
BP-1620264838-192.168.1.159-1696370857417 (Datanode Uuid 
66068658-b08b-49cd-aba0-56ac1f29e7d5) service to vmnode1/192.168.1.159:8020 
trying to claim ACTIVE state with txid=15 
(org.apache.hadoop.hdfs.server.datanode.DataNode)

[2023-10-03 20:53:00,882] INFO Acknowledging ACTIVE Namenode Block pool 
BP-1620264838-192.168.1.159-1696370857417 (Datanode Uuid 
66068658-b08b-49cd-aba0-56ac1f29e7d5) service to vmnode1/192.168.1.159:8020 
(org.apache.hadoop.hdfs.server.datanode.DataNode)

[2023-10-03 20:53:00,882] INFO After receiving heartbeat response, updating 
state of namenode vmnode1:8020 to active 
(org.apache.hadoop.hdfs.server.datanode.DataNode)

[2023-10-03 20:54:18,771] WARN IOException in offerService 
(org.apache.hadoop.hdfs.server.datanode.DataNode)

java.io.EOFException: End of File Exception between local host is: 
"vmnode1/192.168.1.159"; destination host is: "vmnode1":8020; : 
java.io.EOFException; For more details see:  
http://wiki.apache.org/hadoop/EOFException

 at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)

 at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

 at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

 at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)

 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:930)

 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:879)

 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1571)

 at org.apache.hadoop.ipc.Client.call(Client.java:1513)

 at org.apache.hadoop.ipc.Client.call(Client.java:1410)

 at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)

 at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)

 at com.sun.proxy.$Proxy19.sendHeartbeat(Unknown Source)

 at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:168)

 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:562)

 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:710)

 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)

 at java.base/java.lang.Thread.run(Thread.java:829)

Caused by: java.io.EOFException

 at java.base/java.io.Dat
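
On the TRACE question at the top of this message: in Hadoop 3.x the daemon log levels are normally driven by etc/hadoop/log4j.properties rather than by HADOOP_ROOT_LOGGER alone, which only affects commands launched after it is exported. A hedged sketch (the property names match stock Hadoop log4j.properties, but this is not verified against 3.3.6):

```properties
# etc/hadoop/log4j.properties -- raise the root level for the daemons
hadoop.root.logger=TRACE,RFA
log4j.rootLogger=${hadoop.root.logger}, EventCounter
```

Alternatively, `hadoop daemonlog -setlevel <host:httpport> <classname> TRACE` can change a single logger on a running daemon without a restart; picking the right class name is the harder part.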

HDFS HA standby

2023-10-03 Thread Harry Jamison
I am not sure exactly what the problem is now.
My namenode (and I think journal node) are getting shut down. Is there a way to 
tell why it is getting the shutdown signal?
Also the datanode seems to be getting this error:
End of File Exception between local host is


Here are the logs; I only see INFO logging, and then the shutdown:
[2023-10-03 20:53:00,873] INFO Initializing quota with 12 thread(s) 
(org.apache.hadoop.hdfs.server.namenode.FSDirectory)

[2023-10-03 20:53:00,876] INFO Quota initialization completed in 1 milliseconds

name space=2

storage space=0

storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 
(org.apache.hadoop.hdfs.server.namenode.FSDirectory)

[2023-10-03 20:53:00,882] INFO Total number of blocks            = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO Starting CacheReplicationMonitor with interval 
3 milliseconds 
(org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor)

[2023-10-03 20:53:00,884] INFO Number of invalid blocks          = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO Number of under-replicated blocks = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO Number of  over-replicated blocks = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO Number of blocks being written    = 0 
(org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)

[2023-10-03 20:53:00,884] INFO STATE* Replication Queue initialization scan for 
invalid, over- and under-replicated blocks completed in 67 msec 
(org.apache.hadoop.hdfs.StateChange)

[2023-10-03 20:54:16,453] ERROR RECEIVED SIGNAL 15: SIGTERM 
(org.apache.hadoop.hdfs.server.namenode.NameNode)

[2023-10-03 20:54:16,467] INFO SHUTDOWN_MSG: 

/

SHUTDOWN_MSG: Shutting down NameNode at vmnode1/192.168.1.159

/ 
(org.apache.hadoop.hdfs.server.namenode.NameNode)




When I start the data node I see this
[2023-10-03 20:53:00,882] INFO Namenode Block pool 
BP-1620264838-192.168.1.159-1696370857417 (Datanode Uuid 
66068658-b08b-49cd-aba0-56ac1f29e7d5) service to vmnode1/192.168.1.159:8020 
trying to claim ACTIVE state with txid=15 
(org.apache.hadoop.hdfs.server.datanode.DataNode)

[2023-10-03 20:53:00,882] INFO Acknowledging ACTIVE Namenode Block pool 
BP-1620264838-192.168.1.159-1696370857417 (Datanode Uuid 
66068658-b08b-49cd-aba0-56ac1f29e7d5) service to vmnode1/192.168.1.159:8020 
(org.apache.hadoop.hdfs.server.datanode.DataNode)

[2023-10-03 20:53:00,882] INFO After receiving heartbeat response, updating 
state of namenode vmnode1:8020 to active 
(org.apache.hadoop.hdfs.server.datanode.DataNode)

[2023-10-03 20:54:18,771] WARN IOException in offerService 
(org.apache.hadoop.hdfs.server.datanode.DataNode)

java.io.EOFException: End of File Exception between local host is: 
"vmnode1/192.168.1.159"; destination host is: "vmnode1":8020; : 
java.io.EOFException; For more details see:  
http://wiki.apache.org/hadoop/EOFException

 at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)

 at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

 at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

 at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)

 at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:930)

 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:879)

 at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1571)

 at org.apache.hadoop.ipc.Client.call(Client.java:1513)

 at org.apache.hadoop.ipc.Client.call(Client.java:1410)

 at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)

 at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)

 at com.sun.proxy.$Proxy19.sendHeartbeat(Unknown Source)

 at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:168)

 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:562)

 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:710)

 at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)

 at java.base/java.lang.Thread.run(Thread.java:829)

Caused by: java.io.EOFException

 at java.base/java.io.DataInputStream.readInt(DataInputStream.java:397)

 at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1906)

 at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1187)

 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1078)



Re: HDFS HA namenode issue

2023-10-03 Thread Harry Jamison
Thanks guys, I figured out what my issue was. I did not set up the ssh key 
correctly; it was for my user, but I started the service as root.
Now it is working, except none of the namenodes are transitioning to active on 
startup, and the datanodes are not starting automatically (I think because no 
namenode is active).
I can start everything manually though.
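
For anyone hitting the same sshfence issue, a sketch of the key-generation step. The demo below runs in a temp dir; in practice the key must live at /root/.ssh/id_rsa (the dfs.ha.fencing.ssh.private-key-files value from the config in this archive) and be authorized on each namenode host, e.g. with `ssh-copy-id -i /root/.ssh/id_rsa.pub root@vmnode1` (and vmnode2, vmnode3). Hosts and paths are taken from the threads above; the commands are illustrative, not a tested recipe:

```shell
# Generate a passphrase-less RSA key pair, as sshfence cannot answer prompts.
keydir=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$keydir/id_rsa" >/dev/null
ls "$keydir"
```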

On Tuesday, October 3, 2023 at 11:03:33 AM PDT, Susheel Kumar Gadalay wrote:
Why have you set this again in hdfs-site.xml at the end?

  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>nn1:8020</value>
  </property>

Remove this and start the namenode again.

Regards,
Susheel Kumar

On Tue, 3 Oct 2023, 10:09 pm Harry Jamison wrote:

OK, here is where I am at now.
When I start the namenodes, they work, but they are all in standby mode. When I 
start my first datanode, it seems to kill one of the namenodes (the active one, 
I assume).
I am getting 2 different warnings in the namenode
[2023-10-03 09:03:52,162] WARN Unable to initialize FileSignerSecretProvider, 
falling back to use random secrets. Reason: Could not read signature secret 
file: /root/hadoop-http-auth-signature-secret 
(org.apache.hadoop.security.authentication.server.AuthenticationFilter)

[2023-10-03 09:03:52,350] WARN Only one image storage directory 
(dfs.namenode.name.dir) configured. Beware of data loss due to lack of 
redundant storage directories! 
(org.apache.hadoop.hdfs.server.namenode.FSNamesystem)

I am using a journal node, so I am not clear if I am supposed to have multiple 
dfs.namenode.name.dir directories. I thought each namenode has one directory.

Susheel Kumar Gadalay said that my shared.edits.dir is wrong, but I am not 
clear how it is wrong. From here mine looks right:
https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

This is what is in the logs right before the namenode dies:

[2023-10-03 09:01:22,054] INFO Listener at vmnode3:8020 (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,054] INFO Starting Socket Reader #1 for port 8020 (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,097] INFO Registered FSNamesystemState, ReplicatedBlocksState and ECBlockGroupsState MBeans. (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
[2023-10-03 09:01:22,119] INFO Number of blocks under construction: 0 (org.apache.hadoop.hdfs.server.namenode.LeaseManager)
[2023-10-03 09:01:22,122] INFO Initialized the Default Decommission and Maintenance monitor (org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminDefaultMonitor)
[2023-10-03 09:01:22,131] INFO STATE* Leaving safe mode after 0 secs (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,131] INFO STATE* Network topology has 0 racks and 0 datanodes (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,131] INFO STATE* UnderReplicatedBlocks has 0 blocks (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,130] INFO Start MarkedDeleteBlockScrubber thread (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
[2023-10-03 09:01:22,158] INFO IPC Server Responder: starting (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,159] INFO IPC Server listener on 8020: starting (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,165] INFO NameNode RPC up at: vmnode3/192.168.1.103:8020 (org.apache.hadoop.hdfs.server.namenode.NameNode)
[2023-10-03 09:01:22,166] INFO Starting services required for standby state (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
[2023-10-03 09:01:22,168] INFO Will roll logs on active node every 120 seconds. (org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer)
[2023-10-03 09:01:22,171] INFO Starting standby checkpoint thread...
Checkpointing active NN to possible NNs: [http://vmnode1:9870, http://vmnode2:9870]
Serving checkpoints at http://vmnode3:9870 (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited






On Tuesday, October 3, 2023 at 03:54:23 AM PDT, Liming Cui wrote:

Harry,
Great question. I would say the same configurations in core-site.xml and 
hdfs-site.xml will be overwriting each other in some way.
Glad you found the root cause.
Keep going.
On Tue, Oct 3, 2023 at 10:27 AM Harry Jamison  wrote:

 Liming 
After looking at my config

Re: HDFS HA namenode issue

2023-10-03 Thread Harry Jamison
OK, here is where I am at now.
When I start the namenodes, they work, but they are all in standby mode. When I 
start my first datanode, it seems to kill one of the namenodes (the active one, 
I assume).
I am getting 2 different warnings in the namenode
[2023-10-03 09:03:52,162] WARN Unable to initialize FileSignerSecretProvider, 
falling back to use random secrets. Reason: Could not read signature secret 
file: /root/hadoop-http-auth-signature-secret 
(org.apache.hadoop.security.authentication.server.AuthenticationFilter)

[2023-10-03 09:03:52,350] WARN Only one image storage directory 
(dfs.namenode.name.dir) configured. Beware of data loss due to lack of 
redundant storage directories! 
(org.apache.hadoop.hdfs.server.namenode.FSNamesystem)

I am using a journal node, so I am not clear if I am supposed to have multiple 
dfs.namenode.name.dir directories. I thought each namenode has one directory.

Susheel Kumar Gadalay said that my shared.edits.dir is wrong, but I am not 
clear how it is wrong. From here mine looks right:
https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

This is what is in the logs right before the namenode dies:

[2023-10-03 09:01:22,054] INFO Listener at vmnode3:8020 (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,054] INFO Starting Socket Reader #1 for port 8020 (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,097] INFO Registered FSNamesystemState, ReplicatedBlocksState and ECBlockGroupsState MBeans. (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
[2023-10-03 09:01:22,119] INFO Number of blocks under construction: 0 (org.apache.hadoop.hdfs.server.namenode.LeaseManager)
[2023-10-03 09:01:22,122] INFO Initialized the Default Decommission and Maintenance monitor (org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminDefaultMonitor)
[2023-10-03 09:01:22,131] INFO STATE* Leaving safe mode after 0 secs (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,131] INFO STATE* Network topology has 0 racks and 0 datanodes (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,131] INFO STATE* UnderReplicatedBlocks has 0 blocks (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,130] INFO Start MarkedDeleteBlockScrubber thread (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
[2023-10-03 09:01:22,158] INFO IPC Server Responder: starting (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,159] INFO IPC Server listener on 8020: starting (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,165] INFO NameNode RPC up at: vmnode3/192.168.1.103:8020 (org.apache.hadoop.hdfs.server.namenode.NameNode)
[2023-10-03 09:01:22,166] INFO Starting services required for standby state (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
[2023-10-03 09:01:22,168] INFO Will roll logs on active node every 120 seconds. (org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer)
[2023-10-03 09:01:22,171] INFO Starting standby checkpoint thread...
Checkpointing active NN to possible NNs: [http://vmnode1:9870, http://vmnode2:9870]
Serving checkpoints at http://vmnode3:9870 (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited






On Tuesday, October 3, 2023 at 03:54:23 AM PDT, Liming Cui 
 wrote:  
 
 Harry,
Great question. I would say that when the same configuration key is set in both core-site.xml and hdfs-site.xml, one value will overwrite the other.
Glad you found the root cause.
Keep going.
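A small Python sketch of why one value "wins" when a key appears in two files. This is an illustrative model of Hadoop's documented resource-loading order, not actual Hadoop code; the file contents below are hypothetical:

```python
# Illustrative model of how Hadoop's Configuration resolves the same key
# defined in several resource files: resources are loaded in order, and a
# later file overrides an earlier one, unless the earlier entry was marked
# <final>true</final>.

def resolve(resources):
    """resources: list of dicts mapping key -> (value, is_final)."""
    merged = {}
    finals = set()
    for res in resources:
        for key, (value, is_final) in res.items():
            if key in finals:
                continue  # a final value cannot be overridden by later files
            merged[key] = value
            if is_final:
                finals.add(key)
    return merged

core_site = {"fs.defaultFS": ("hdfs://nn1:8020", False)}
hdfs_site = {"fs.defaultFS": ("hdfs://mycluster", False)}

# core-site.xml is loaded before hdfs-site.xml, so the hdfs-site value wins:
print(resolve([core_site, hdfs_site])["fs.defaultFS"])  # hdfs://mycluster
```

This is exactly why having fs.defaultFS in both files is risky: which value takes effect depends on load order, not on which file you edited last.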
On Tue, Oct 3, 2023 at 10:27 AM Harry Jamison  wrote:

Liming,
After looking at my config, I think my problem may be that fs.defaultFS is inconsistent between hdfs-site.xml and core-site.xml. What do hdfs-site.xml and core-site.xml each do, and why is the same setting in two different places? Or do I just have it there mistakenly?
This is what I have in hdfs-site.xml:

<property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
<property><name>ha.zookeeper.quorum</name><value>nn1:2181,nn2:2181,nn3:2181</value></property>
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2,nn3</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn3</name><value>nn3:8020</value></property>

Re: HDFS HA namenode issue

2023-10-03 Thread Harry Jamison
Liming,
After looking at my config, I think my problem may be that fs.defaultFS is inconsistent between hdfs-site.xml and core-site.xml. What do hdfs-site.xml and core-site.xml each do, and why is the same setting in two different places? Or do I just have it there mistakenly?
This is what I have in hdfs-site.xml:

<property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
<property><name>ha.zookeeper.quorum</name><value>nn1:2181,nn2:2181,nn3:2181</value></property>
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2,nn3</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn3</name><value>nn3:8020</value></property>
<property><name>dfs.namenode.http-address.mycluster.nn1</name><value>nn1:9870</value></property>
<property><name>dfs.namenode.http-address.mycluster.nn2</name><value>nn2:9870</value></property>
<property><name>dfs.namenode.http-address.mycluster.nn3</name><value>nn3:9870</value></property>
<property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://nn1:8485;nn2:8485;nn3:8485/mycluster</value></property>
<property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
<property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
<property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/harry/.ssh/id_rsa</value></property>
<property><name>dfs.namenode.name.dir</name><value>file:/hadoop/data/hdfs/namenode</value></property>
<property><name>dfs.datanode.data.dir</name><value>file:/hadoop/data/hdfs/datanode</value></property>
<property><name>dfs.journalnode.edits.dir</name><value>/hadoop/data/hdfs/journalnode</value></property>
<property><name>dfs.namenode.rpc-address</name><value>nn1:8020</value></property>
<property><name>dfs.ha.nn.not-become-active-in-safemode</name><value>true</value></property>

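One detail worth checking in a config like the one above is that it sets both the non-HA key dfs.namenode.rpc-address and the per-NameNode HA keys dfs.namenode.rpc-address.mycluster.nnX. A sanity-check sketch (a hypothetical helper, not a Hadoop tool) that parses an hdfs-site.xml and flags that combination:

```python
import xml.etree.ElementTree as ET

def conflicting_rpc_keys(xml_text):
    """Return (non_ha_keys, ha_keys) for NameNode RPC addresses found in an
    hdfs-site.xml document; having both is a likely misconfiguration."""
    root = ET.fromstring(xml_text)
    names = [p.findtext("name", "").strip() for p in root.iter("property")]
    non_ha = [n for n in names if n == "dfs.namenode.rpc-address"]
    ha = [n for n in names if n.startswith("dfs.namenode.rpc-address.")]
    return non_ha, ha

sample = """<configuration>
  <property><name>dfs.namenode.rpc-address</name><value>nn1:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1:8020</value></property>
</configuration>"""

non_ha, ha = conflicting_rpc_keys(sample)
if non_ha and ha:
    print("both non-HA and HA rpc-address keys are set")
```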

In core-site.xml I have this:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn1:8020</value>
  </property>

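For reference, the Hadoop HA guide has core-site.xml point fs.defaultFS at the logical nameservice rather than at a single NameNode host, which matches the inconsistency found in this thread. A minimal sketch, assuming the nameservice name mycluster from the hdfs-site.xml above:

```xml
<configuration>
  <!-- In an HA setup, refer to the nameservice, not one NameNode host:port -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
</configuration>
```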

On Tuesday, October 3, 2023 at 12:54:26 AM PDT, Liming Cui 
 wrote:  
 
 Can you show us the configuration files? Maybe I can help you with some 
suggestions.

On Tue, Oct 3, 2023 at 9:05 AM Harry Jamison  
wrote:

I am trying to set up an HA HDFS cluster, and I am running into a problem.
I am not sure what I am doing wrong; I thought I followed the HA namenode guide, but it is not working.

Apache Hadoop 3.3.6 – HDFS High Availability

I have 2 namenodes, 3 journal nodes, and 3 zookeeper nodes.
After some period of time I see the following, and my namenode and journal node die. I am not sure where the problem is, or how to diagnose what I am doing wrong here. And the logging here does not make sense to me.
Namenode:
Serving checkpoints at http://nn1:9870 (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
[2023-10-02 23:53:46,693] ERROR RECEIVED SIGNAL 15: SIGTERM (org.apache.hadoop.hdfs.server.namenode.NameNode)
[2023-10-02 23:53:46,701] INFO SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.1.159
/ (org.apache.hadoop.hdfs.server.namenode.NameNode)

JournalNode:
[2023-10-02 23:54:19,162] WARN Journal at nn1/192.168.1.159:8485 has no edit logs (org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer)
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited





-- 
Best
Liming  

HDFS HA namenode issue

2023-10-03 Thread Harry Jamison