[
https://issues.apache.org/jira/browse/HBASE-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611389#action_12611389
]
Izaak Rubin commented on HBASE-727:
-----------------------------------
Here's an excerpt from a log file that demonstrates the problem:
{code}
2008-07-03 15:28:03,890 INFO org.apache.hadoop.hbase.master.ServerManager:
Received MSG_REPORT_OPEN: -ROOT-,,0 from 127.0.0.1:56998
2008-07-03 15:28:03,891 DEBUG org.apache.hadoop.hbase.master.ServerManager:
Total Load: 1, Num Servers: 1, Avg Load: 1.0
2008-07-03 15:28:03,892 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0,
startKey: <>, server: 127.0.0.1:56998}
2008-07-03 15:28:03,951 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScannerREGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY
=> '', ENCODED => 1028785192, TABLE => {NAME => '.META.', FAMILIES => [{NAME =>
'historian', VERSIONS => 2147483647, COMPRESSION => 'NONE', IN_MEMORY => false,
BLOCKCACHE => false, LENGTH => 2147483647, TTL => FOREVER, BLOOMFILTER =>
NONE}, {NAME => 'info', VERSIONS => 1, COMPRESSION => 'NONE', IN_MEMORY =>
false, BLOCKCACHE => false, LENGTH => 2147483647, TTL => FOREVER, BLOOMFILTER
=> NONE}]}}, SERVER => '127.0.0.1:56544', STARTCODE => 1215123936723
2008-07-03 15:28:03,951 DEBUG org.apache.hadoop.hbase.master.BaseScanner:
Current assignment of .META.,,1 is not valid: serverInfo: null, passed
startCode: 1215123936723, storedInfo.startCode: -1, unassignedRegions: false,
pendingRegions: false
2008-07-03 15:28:03,953 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.rootScanner scan of meta region {regionname: -ROOT-,,0, startKey:
<>, server: 127.0.0.1:56998} complete
2008-07-03 15:28:04,815 DEBUG org.apache.hadoop.hbase.master.ServerManager:
Total Load: 1, Num Servers: 1, Avg Load: 1.0
2008-07-03 15:28:04,821 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT
REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED =>
70236052, TABLE => {NAME => '-ROOT-', FAMILIES => [{NAME => 'info', VERSIONS =>
1, COMPRESSION => 'NONE', IN_MEMORY => false, BLOCKCACHE => false, LENGTH =>
2147483647, TTL => FOREVER, BLOOMFILTER => NONE}]}
2008-07-03 15:28:04,834 INFO org.apache.hadoop.ipc.Client: Retrying connect to
server: /127.0.0.1:56544. Already tried 1 time(s).
2008-07-03 15:28:05,834 INFO org.apache.hadoop.ipc.Client: Retrying connect to
server: /127.0.0.1:56544. Already tried 2 time(s).
2008-07-03 15:28:06,835 INFO org.apache.hadoop.ipc.Client: Retrying connect to
server: /127.0.0.1:56544. Already tried 3 time(s).
2008-07-03 15:28:07,836 INFO org.apache.hadoop.ipc.Client: Retrying connect to
server: /127.0.0.1:56544. Already tried 4 time(s).
{code}
For reference, 127.0.0.1:56544 was the server address in use before the
restart, and 127.0.0.1:56998 is the one in use after it. The retry messages
continue indefinitely (only the first 4 are shown above). I'll attach a text
file with more of the surrounding log.
> Client caught in an infinite loop when trying to connect to cached server
> locations
> -----------------------------------------------------------------------------------
>
> Key: HBASE-727
> URL: https://issues.apache.org/jira/browse/HBASE-727
> Project: Hadoop HBase
> Issue Type: Bug
> Components: client, ipc
> Reporter: Izaak Rubin
> Assignee: Izaak Rubin
> Priority: Minor
> Attachments: hbase-727_logfile_sample.txt
>
>
> HbaseRPC, which (to my understanding) is used whenever there is a need to
> connect to a server, enters an infinite loop to continually retry the
> connection until it succeeds. This makes sense for server-to-server
> interaction, but it doesn't necessarily make sense for all client-to-server
> interaction.
> I first observed the problem when doing fast restarts of HBase. When I
> attempted to reload the UI after a restart, it would try indefinitely to
> re-contact the cached server location from before the restart. The correct
> behavior would be to break out of the loop as soon as possible in situations
> like the one above. I think that throwing a RetriesExhaustedException would
> be the best way to do this, although if anyone has other suggestions please
> let me know.
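
The bounded-retry behavior proposed in the description could be sketched
roughly as follows. This is a minimal, standalone Java illustration, not the
HbaseRPC code: BoundedRetry, Attempt, callWithRetries, maxRetries and
pauseMillis are hypothetical names, and the nested RetriesExhaustedException
is a local stand-in for the exception named above, not HBase's own class.

```java
import java.io.IOException;

final class BoundedRetry {

    /** Hypothetical stand-in for the exception proposed in the issue. */
    public static class RetriesExhaustedException extends IOException {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** One connection attempt; hypothetical interface for this sketch. */
    public interface Attempt<T> {
        T run() throws IOException;
    }

    /**
     * Runs the attempt up to maxRetries times, sleeping pauseMillis between
     * failures, and gives up with RetriesExhaustedException instead of
     * looping forever.
     */
    public static <T> T callWithRetries(Attempt<T> attempt, int maxRetries, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxRetries < 1) {
            throw new IllegalArgumentException("maxRetries must be at least 1");
        }
        IOException lastFailure = null;
        for (int tries = 1; tries <= maxRetries; tries++) {
            try {
                return attempt.run();
            } catch (IOException e) {
                // Remember the failure so it can be reported as the cause.
                lastFailure = e;
            }
            if (tries < maxRetries) {
                // Pause only when another attempt will follow.
                Thread.sleep(pauseMillis);
            }
        }
        throw new RetriesExhaustedException(
                "Gave up after " + maxRetries + " attempts", lastFailure);
    }
}
```

With something like this in place, a caller such as the UI could catch the
exception, discard the cached server location, and look the region up again
rather than retrying the stale address.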