[ 
https://issues.apache.org/jira/browse/HBASE-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611389#action_12611389
 ] 

Izaak Rubin commented on HBASE-727:
-----------------------------------

Here's an excerpt from a log file that demonstrates the problem:

{code}
2008-07-03 15:28:03,890 INFO org.apache.hadoop.hbase.master.ServerManager: 
Received MSG_REPORT_OPEN: -ROOT-,,0 from 127.0.0.1:56998
2008-07-03 15:28:03,891 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Total Load: 1, Num Servers: 1, Avg Load: 1.0
2008-07-03 15:28:03,892 INFO org.apache.hadoop.hbase.master.BaseScanner: 
RegionManager.rootScanner scanning meta region {regionname: -ROOT-,,0, 
startKey: <>, server: 127.0.0.1:56998}
2008-07-03 15:28:03,951 DEBUG org.apache.hadoop.hbase.master.BaseScanner: 
RegionManager.rootScannerREGION => {NAME => '.META.,,1', STARTKEY => '', ENDKEY 
=> '', ENCODED => 1028785192, TABLE => {NAME => '.META.', FAMILIES => [{NAME => 
'historian', VERSIONS => 2147483647, COMPRESSION => 'NONE', IN_MEMORY => false, 
BLOCKCACHE => false, LENGTH => 2147483647, TTL => FOREVER, BLOOMFILTER => 
NONE}, {NAME => 'info', VERSIONS => 1, COMPRESSION => 'NONE', IN_MEMORY => 
false, BLOCKCACHE => false, LENGTH => 2147483647, TTL => FOREVER, BLOOMFILTER 
=> NONE}]}}, SERVER => '127.0.0.1:56544', STARTCODE => 1215123936723
2008-07-03 15:28:03,951 DEBUG org.apache.hadoop.hbase.master.BaseScanner: 
Current assignment of .META.,,1 is not valid: serverInfo: null, passed 
startCode: 1215123936723, storedInfo.startCode: -1, unassignedRegions: false, 
pendingRegions: false
2008-07-03 15:28:03,953 INFO org.apache.hadoop.hbase.master.BaseScanner: 
RegionManager.rootScanner scan of meta region {regionname: -ROOT-,,0, startKey: 
<>, server: 127.0.0.1:56998} complete
2008-07-03 15:28:04,815 DEBUG org.apache.hadoop.hbase.master.ServerManager: 
Total Load: 1, Num Servers: 1, Avg Load: 1.0
2008-07-03 15:28:04,821 DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT 
REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 
70236052, TABLE => {NAME => '-ROOT-', FAMILIES => [{NAME => 'info', VERSIONS => 
1, COMPRESSION => 'NONE', IN_MEMORY => false, BLOCKCACHE => false, LENGTH => 
2147483647, TTL => FOREVER, BLOOMFILTER => NONE}]}
2008-07-03 15:28:04,834 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: /127.0.0.1:56544. Already tried 1 time(s).
2008-07-03 15:28:05,834 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: /127.0.0.1:56544. Already tried 2 time(s).
2008-07-03 15:28:06,835 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: /127.0.0.1:56544. Already tried 3 time(s).
2008-07-03 15:28:07,836 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: /127.0.0.1:56544. Already tried 4 time(s).
{code}

For reference, 127.0.0.1:56544 was the server address in use before the 
restart, and 127.0.0.1:56998 is the one in use after it.  The retry messages 
continue indefinitely (only the first 4 are shown above).  I'll attach a text 
file with more of the surrounding log.
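
As a rough illustration of the fix proposed in the issue description, here is 
a minimal sketch of a retry loop that gives up after a fixed number of 
attempts instead of looping forever.  This is not HbaseRPC code: BoundedRetry, 
Attempt, maxAttempts and pauseMillis are hypothetical names, and the 
RetriesExhaustedException defined here is a local stand-in for the exception 
named in the issue, not the real HBase class.

```java
import java.io.IOException;

// Hypothetical stand-in for the RetriesExhaustedException named in the issue;
// the real HBase class may have a different constructor.
class RetriesExhaustedException extends IOException {
    RetriesExhaustedException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical callback type: one connection attempt that may fail with I/O.
interface Attempt<T> {
    T run() throws IOException;
}

// Hypothetical helper, not part of HbaseRPC.
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Runs the attempt up to maxAttempts times, sleeping pauseMillis between
     * failed tries, and throws RetriesExhaustedException once the budget
     * is spent.
     */
    static <T> T call(Attempt<T> attempt, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int tries = 1; tries <= maxAttempts; tries++) {
            try {
                return attempt.run();
            } catch (IOException e) {
                last = e; // e.g. connection refused by a stale cached address
            }
            if (tries < maxAttempts) {
                Thread.sleep(pauseMillis);
            }
        }
        throw new RetriesExhaustedException(
            "Gave up after " + maxAttempts + " attempt(s)", last);
    }
}
```

With something along these lines, a client that catches the exception could 
discard the cached server location and look the region up again, rather than 
retrying the pre-restart address indefinitely.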

> Client caught in an infinite loop when trying to connect to cached server 
> locations
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-727
>                 URL: https://issues.apache.org/jira/browse/HBASE-727
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client, ipc
>            Reporter: Izaak Rubin
>            Assignee: Izaak Rubin
>            Priority: Minor
>         Attachments: hbase-727_logfile_sample.txt
>
>
> HbaseRPC, which (to my understanding) is used whenever there is a need to 
> connect to a server, enters an infinite loop to continually retry the 
> connection until it succeeds.  This makes sense for server-to-server 
> interaction, but it doesn't necessarily make sense for all client-to-server 
> interaction.
> I first observed the problem when doing fast restarts of HBase.  When I 
> attempted to reload the UI after a restart, the client would retry the 
> cached server location from before the restart indefinitely.  The correct 
> behavior would be to break out of the loop as soon as possible in situations 
> like the one above.  I think that throwing a RetriesExhaustedException would 
> be the best way to do this, although if anyone has other suggestions please 
> let me know.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
