[ 
https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303902#comment-14303902
 ] 

stack commented on HBASE-12958:
-------------------------------

Odd. We go from acknowledging that the host with meta is down (c2022) to of a 
sudden forgetting about it:

{code}
2015-02-02 22:32:15,574 INFO  [main-EventThread] zookeeper.RegionServerTracker: 
RegionServer ephemeral node deleted, processing expiration 
[c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,575 DEBUG [main-EventThread] master.AssignmentManager: 
based on AM, current region=hbase:meta,,1.1588230740 is on 
server=c2022.halxg.cloudera.com,16020,1422944918568 server being checked: 
c2022.halxg.cloudera.com,16020,1422944918568
2015-02-02 22:32:15,575 DEBUG [main-EventThread] master.ServerManager: 
Added=c2022.halxg.cloudera.com,16020,1422944918568 to dead servers, submitted 
shutdown handler to be executed meta=true
2015-02-02 22:32:15,576 INFO  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] 
handler.MetaServerShutdownHandler: Splitting hbase:meta logs for 
c2022.halxg.cloudera.com,16020,1422944918568
2015-02-02 22:32:15,577 DEBUG [main-EventThread] zookeeper.RegionServerTracker: 
Added tracking of RS /hbase/rs/c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:15,578 DEBUG [main-EventThread] zookeeper.RegionServerTracker: 
Added tracking of RS /hbase/rs/c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:15,578 DEBUG [main-EventThread] zookeeper.RegionServerTracker: 
Added tracking of RS /hbase/rs/c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:15,585 DEBUG [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] 
master.MasterFileSystem: Renamed region directory: 
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting
2015-02-02 22:32:15,585 INFO  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] 
master.SplitLogManager: dead splitlog workers 
[c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,587 DEBUG [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] 
master.SplitLogManager: Scheduling batch of logs to split
2015-02-02 22:32:15,587 INFO  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] 
master.SplitLogManager: started splitting 1 logs in 
[hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting]
 for [c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,591 DEBUG [main-EventThread] 
coordination.SplitLogManagerCoordination: put up splitlog task at znode 
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:15,591 DEBUG [main-EventThread] 
coordination.SplitLogManagerCoordination: task not yet acquired 
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
 ver = 0
2015-02-02 22:32:15,607 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: task 
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
 acquired by c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:15,929 INFO  
[c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor] 
coordination.SplitLogManagerCoordination: resubmitting task 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:15,941 INFO  
[c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor] 
master.SplitLogManager: resubmitted 1 out of 3 tasks
2015-02-02 22:32:15,941 DEBUG [main-EventThread] 
coordination.SplitLogManagerCoordination: task not yet acquired 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
 ver = 3
2015-02-02 22:32:15,949 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: task /hbase/splitWAL/RESCAN0000004442 
entered state: DONE c2020.halxg.cloudera.com,16020,1422944946802
2015-02-02 22:32:15,957 DEBUG [main-EventThread] 
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted 
/hbase/splitWAL/RESCAN0000004442
2015-02-02 22:32:15,957 DEBUG [main-EventThread] 
coordination.SplitLogManagerCoordination: deleted task without in memory state 
/hbase/splitWAL/RESCAN0000004442
2015-02-02 22:32:16,007 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: task 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
 acquired by c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:16,208 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: task 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
 entered state: DONE c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:16,219 INFO  [main-EventThread] wal.WALSplitter: Archived 
processed log 
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945090502
 to 
hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945090502
2015-02-02 22:32:16,220 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: Done splitting 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
2015-02-02 22:32:16,224 DEBUG [main-EventThread] 
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
2015-02-02 22:32:17,225 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: task 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
 entered state: DONE c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:17,235 INFO  [main-EventThread] wal.WALSplitter: Archived 
processed log 
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945068674
 to 
hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945068674
2015-02-02 22:32:17,236 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: Done splitting 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:17,241 DEBUG [main-EventThread] 
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted 
/hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:17,244 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
master.SplitLogManager: finished splitting (more than or equal to) 1955009811 
bytes in 16 log files in 
[hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting]
 in 7355ms
2015-02-02 22:32:17,244 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
master.RegionStates: Adding to processed servers 
c2021.halxg.cloudera.com,16020,1422944889403
2015-02-02 22:32:17,244 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
master.RegionStates: Offline splitting/merging region 
{7e0c50a2296895f66bee85df0eaba533 state=SPLITTING, ts=1422945115260, 
server=c2021.halxg.cloudera.com,16020,1422944889403}
2015-02-02 22:32:17,553 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
ipc.AbstractRpcClient: Use SIMPLE authentication for service ClientService, 
sasl=false
2015-02-02 22:32:17,553 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
ipc.AbstractRpcClient: Connecting to c2022.halxg.cloudera.com/10.20.84.28:16020
2015-02-02 22:32:17,554 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
client.RpcRetryingCaller: Call exception, tries=1, retries=350, started=305 ms 
ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c' on table 
'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
2015-02-02 22:32:18,055 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
ipc.AbstractRpcClient: Use SIMPLE authentication for service ClientService, 
sasl=false
2015-02-02 22:32:18,055 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
ipc.AbstractRpcClient: Not trying to connect to 
c2022.halxg.cloudera.com/10.20.84.28:16020 this server is in the failed servers 
list
2015-02-02 22:32:18,056 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
client.RpcRetryingCaller: Call exception, tries=2, retries=350, started=807 ms 
ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c' on table 
'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
2015-02-02 22:32:18,930 INFO  
[c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor] 
master.SplitLogManager: total tasks = 1 unassigned = 0 
tasks={/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta=last_update
 = 1422945135625 last_version = 2 cur_worker_name = 
c2025.halxg.cloudera.com,16020,1422935795768 status = in_progress incarnation = 
0 resubmits = 0 batch = installed = 1 done = 0 error = 0}
2015-02-02 22:32:19,059 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
ipc.AbstractRpcClient: Use SIMPLE authentication for service ClientService, 
sasl=false
2015-02-02 22:32:19,059 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
ipc.AbstractRpcClient: Not trying to connect to 
c2022.halxg.cloudera.com/10.20.84.28:16020 this server is in the failed servers 
list
2015-02-02 22:32:19,059 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
client.RpcRetryingCaller: Call exception, tries=3, retries=350, started=1810 ms 
ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c' on table 
'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
2015-02-02 22:32:20,258 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: task 
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
 entered state: DONE c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:20,269 INFO  [main-EventThread] wal.WALSplitter: Archived 
processed log 
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting/c2022.halxg.cloudera.com%2C16020%2C1422944918568..meta.1422945128892.meta
 to 
hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2022.halxg.cloudera.com%2C16020%2C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:20,269 INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: Done splitting 
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:20,273 WARN  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] 
master.SplitLogManager: returning success without actually splitting and 
deleting all the log files in path 
hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting
2015-02-02 22:32:20,273 INFO  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] 
master.SplitLogManager: finished splitting (more than or equal to) 83 bytes in 
1 log files in 
[hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting]
 in 4686ms
2015-02-02 22:32:20,274 DEBUG [main-EventThread] 
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted 
/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:21,051 DEBUG [main-EventThread] zookeeper.RegionServerTracker: 
Added tracking of RS /hbase/rs/c2021.halxg.cloudera.com,16020,1422945139174
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker: 
Added tracking of RS /hbase/rs/c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker: 
Added tracking of RS /hbase/rs/c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker: 
Added tracking of RS /hbase/rs/c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:21,071 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
ipc.AbstractRpcClient: Use SIMPLE authentication for service ClientService, 
sasl=false
2015-02-02 22:32:21,071 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
ipc.AbstractRpcClient: Connecting to c2022.halxg.cloudera.com/10.20.84.28:16020
2015-02-02 22:32:21,072 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] 
client.RpcRetryingCaller: Call exception, tries=4, retries=350, started=3823 ms 
ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c' on table 
'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
2015-02-02 22:32:21,111 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer: 
RpcServer.listener,port=16020: connection from 10.20.84.27:35695; # active 
connections: 6
{code}

[~jxiang] Any input boss?  Why would we split the meta log and then not assign 
hbase:meta? Thanks sir.

> SSH doing hbase:meta get but hbase:meta not assigned
> ----------------------------------------------------
>
>                 Key: HBASE-12958
>                 URL: https://issues.apache.org/jira/browse/HBASE-12958
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Assignee: stack
>
> All master threads are blocked waiting on this call to return:
> {code}
> "MASTER_SERVER_OPERATIONS-c2020:16020-2" #189 prio=5 os_prio=0 
> tid=0x00007f4b0408b000 nid=0x7821 in Object.wait() [0x00007f4ada24d000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
>         - locked <0x000000041c374f50> (a 
> java.util.concurrent.atomic.AtomicBoolean)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
>         at 
> org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208)
>         at 
> org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250)
>         at 
> org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225)
>         at 
> org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634)
>         - locked <0x000000041c1f0d80> (a 
> org.apache.hadoop.hbase.master.RegionStates)
>         at 
> org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298)
>         at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226)
>         at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> Master is stuck trying to find hbase:meta on the server that just crashed and 
> that we just recovered:
> Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: 
> callTimeout=60000, callDuration=68181: row '' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, 
> hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
> Will add more detail in a sec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to