[
https://issues.apache.org/jira/browse/HBASE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751205#action_12751205
]
stack commented on HBASE-1603:
------------------------------
Mikhail: I doubt very much its you. Its a pity you don't have a timestamp for
when you saw the RetriesExhaustedException issue. It seems like you are having
the issue originally described here and another, the missing .META. region.
I just saw this issue again myself with RC3.
First, the client-side up in the MR task. The task fails because of
TestTable,0254895141,1251965526935. It says retries exhausted but it doesn't
look like it was trying very had... we timed out too fast it would seem.
{code}
2009-09-03 09:04:05,509 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed
.META.,,1 for tableName=.META. from cache because of
TestTable,0256167115,99999999999999
2009-09-03 09:04:05,512 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location
address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1',
STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME =>
'.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME
=> 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL =>
'604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
{NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647',
BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2009-09-03 09:04:07,724 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 2000
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address
listed in .META. for region TestTable,0255035045,1251968451499
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:660)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:401)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1036)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1122)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
at
org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:390)
at
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:413)
at
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639)
at
org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-09-03 09:04:07,725 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed
.META.,,1 for tableName=.META. from cache because of
TestTable,0256167115,99999999999999
2009-09-03 09:04:07,728 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location
address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1',
STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME =>
'.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME
=> 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL =>
'604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
{NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647',
BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2009-09-03 09:04:09,994 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
locateRegionInMeta attempt 2 of 10 failed; retrying after sleep of 2000
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address
listed in .META. for region TestTable,0255035045,1251968451499
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:660)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:401)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocationForRowWithRetries(HConnectionManager.java:1036)
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1122)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
at
org.apache.hadoop.hbase.PerformanceEvaluation$Test.testTakedown(PerformanceEvaluation.java:390)
at
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:413)
at
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639)
at
org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-09-03 09:04:09,995 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed
.META.,,1 for tableName=.META. from cache because of
TestTable,0256167115,99999999999999
2009-09-03 09:04:09,997 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location
address: XX.XX.44.140:60020, regioninfo: REGION => {NAME => '.META.,,1',
STARTKEY => '', ENDKEY => '', ENCODED => 1028785192, TABLE => {{NAME =>
'.META.', IS_META => 'true', MEMSTORE_FLUSHSIZE => '16384', FAMILIES => [{NAME
=> 'historian', VERSIONS => '2147483647', COMPRESSION => 'NONE', TTL =>
'604800', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
{NAME => 'info', VERSIONS => '10', COMPRESSION => 'NONE', TTL => '2147483647',
BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
2009-09-03 09:04:12,050 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cached location
address: XX.XX.44.142:60020, regioninfo: REGION => {NAME =>
'TestTable,0255035045,1251968451499', STARTKEY => '0255035045', ENDKEY =>
'0257612642', ENCODED => 2088805037, TABLE => {{NAME => 'TestTable', FAMILIES
=> [{NAME => 'info', VERSIONS => '3', COMPRESSION => 'NONE', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE =>
'true'}]}}
2009-09-03 09:04:12,816 WARN org.apache.hadoop.mapred.TaskTracker: Error
running child
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server Some server, retryOnlyOne=true, index=0, islastrow=false,
tries=9, numtries=10, i=0, listsize=8643, location=address: XX.XX.44.142:60020,
regioninfo: REGION => {NAME => 'TestTable,0254895141,1251965526935', STARTKEY
=> '0254895141', ENDKEY => '0257612642', ENCODED => 1579891142, TABLE => {{NAME
=> 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3', COMPRESSION =>
'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}}, region=TestTable,0254895141,1251965526935 for region
TestTable,0254895141,1251965526935, row '0256167115', but failed after 10
attempts.
Exceptions:
at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1099)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:586)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:451)
at
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest.testRow(PerformanceEvaluation.java:571)
at
org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:406)
at
org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:639)
at
org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:194)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2009-09-03 09:04:12,819 INFO org.apache.hadoop.mapred.TaskRunner: Runnning
cleanup for the task
{code}
At the time, the region in question was being split so I'd expect some timeout
finding region:
{code}
2009-09-03 08:12:11,569 [IPC Server handler 15 on 60001] INFO
org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT:
TestTable,0254895141,1251963952531: Daughters;
TestTable,0254895141,1251965526935, TestTab
le,0257612642,1251965526935 from aa0-000-15.u.powerset.com,60020,1251958693334;
5 of 7
2009-09-03 08:12:11,653 [IPC Server handler 15 on 60001] INFO
org.apache.hadoop.hbase.master.RegionManager: Assigning region
TestTable,0254895141,1251965526935 to
aa0-000-15.u.powerset.com,60020,1251958693334
2009-09-03 08:12:14,691 [IPC Server handler 12 on 60001] INFO
org.apache.hadoop.hbase.master.ServerManager: Processing
MSG_REPORT_PROCESS_OPEN: TestTable,0254895141,1251965526935 from
aa0-000-15.u.powerset.com,60020,12519586933
34; 2 of 4
2009-09-03 08:12:14,691 [IPC Server handler 12 on 60001] INFO
org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_OPEN:
TestTable,0254895141,1251965526935 from
aa0-000-15.u.powerset.com,60020,1251958693334; 4 of
4
2009-09-03 08:12:14,697 [HMaster] INFO
org.apache.hadoop.hbase.master.RegionServerOperation:
TestTable,0254895141,1251965526935 open on XX.XX.44.142:60020
2009-09-03 08:12:14,698 [HMaster] INFO
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row
TestTable,0254895141,1251965526935 in region .META.,,1 with
startcode=1251958693334, server=XX.XX.44.142:60020
2009-09-03 08:23:03,610 [RegionManager.metaScanner] DEBUG
org.apache.hadoop.hbase.master.BaseScanner:
TestTable,0254895141,1251965526935/1579891142 no longer has references to
TestTable,0254895141,1251963952531
2009-09-03 09:04:01,712 [IPC Server handler 3 on 60001] INFO
org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT:
TestTable,0254895141,1251965526935: Daughters;
TestTable,0254895141,1251968451499, TestTabl
e,0255035045,1251968451499 from aa0-000-15.u.powerset.com,60020,1251958693334;
1 of 3
2009-09-03 09:20:03,376 [RegionManager.metaScanner] DEBUG
org.apache.hadoop.hbase.master.BaseScanner:
TestTable,0254895141,1251968451499/906753129 no longer has references to
TestTable,0254895141,1251965526935
2009-09-03 09:20:03,378 [RegionManager.metaScanner] DEBUG
org.apache.hadoop.hbase.master.BaseScanner:
TestTable,0255035045,1251968451499/2088805037 no longer has references to
TestTable,0254895141,1251965526935
2009-09-03 09:20:03,380 [RegionManager.metaScanner] INFO
org.apache.hadoop.hbase.master.BaseScanner: Deleting region
TestTable,0254895141,1251965526935 (encoded=1579891142) because daughter splits
no longer hold references
{code}
The daughters came on line promptly...
Why is the client giving up so quickly?
I forgot to check why the other 3 attempts falied on this task.
I can get rows from the daughter regions without issue.
> MR failed "RetriesExhaustedException: Trying to contact region server Some
> server for region TestTable..."
> ----------------------------------------------------------------------------------------------------------
>
> Key: HBASE-1603
> URL: https://issues.apache.org/jira/browse/HBASE-1603
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Attachments: debugging-v4.patch
>
>
> Here is the master. Region
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358 was split at
> 16:11:42,865. My MR job failed at 18:12:26,462 with this:
> {code}
> 2009-07-01 18:12:26,462 WARN org.apache.hadoop.mapred.TaskTracker: Error
> running child
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server Some server for region TestTable,�,1246464670313, row '��
> ', but failed after 10 attempts.
> Exceptions:
> ...
> {code}
> Why after ten attempts did the client not find the region?
> {code}
> 2009-07-01 16:11:42,865 [IPC Server handler 2 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT:
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358: Daughters;
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313,
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3
> 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO
> org.apache.hadoop.hbase.master.RegionManager: Assigning region
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 to
> aa0-000-15.u.powerset.com,60020,1246461673026
> 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO
> org.apache.hadoop.hbase.master.RegionManager: Assigning region
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 to
> aa0-000-15.u.powerset.com,60020,1246461673026
> 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received
> MSG_REPORT_PROCESS_OPEN:
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3
> 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 2 of 3
> 2009-07-01 16:11:45,906 [IPC Server handler 8 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 3 of 3
> 2009-07-01 16:11:45,906 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 open on
> 208.76.44.142:60020
> 2009-07-01 16:11:45,906 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: updating row
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 in region
> .META.,,1 with startcode 1246461673026 and server 208.76.44.142:60020
> 2009-07-01 16:11:45,908 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 open on
> 208.76.44.142:60020
> 2009-07-01 16:11:45,908 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: updating row
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 in region
> .META.,,1 with startcode 1246461673026 and server 208.76.44.142:60020
> 2009-07-01 17:46:42,670 [IPC Server handler 0 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT:
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313: Daughters;
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246470379467,
> TestTable,\x00\x00\x08\x04\x05\x07\x02\x05\x04\x08,1246470379467 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 5 of 7
> {code}
> Here is over on the regionserver:
> {code}
> 2009-07-01 16:11:42,865 [IPC Server handler 2 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT:
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246462685358: Daughters;
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313,
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3
> 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO
> org.apache.hadoop.hbase.master.RegionManager: Assigning region
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 to
> aa0-000-15.u.powerset.com,60020,1246461673026
> 2009-07-01 16:11:42,866 [IPC Server handler 2 on 60001] INFO
> org.apache.hadoop.hbase.master.RegionManager: Assigning region
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 to
> aa0-000-15.u.powerset.com,60020,1246461673026
> 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received
> MSG_REPORT_PROCESS_OPEN:
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 1 of 3
> 2009-07-01 16:11:45,905 [IPC Server handler 8 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 2 of 3
> 2009-07-01 16:11:45,906 [IPC Server handler 8 on 60001] INFO
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN:
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 from
> aa0-000-15.u.powerset.com,60020,1246461673026; 3 of 3
> 2009-07-01 16:11:45,906 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 open on
> X.X.X.142:60020
> 2009-07-01 16:11:45,906 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: updating row
> TestTable,\x00\x00\x06\x05\x01\x05\x07\x09\x08\x00,1246464670313 in region
> .META.,,1 with startcode 1246461673026 and server X.X.X4.142:60020
> 2009-07-01 16:11:45,908 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation:
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 open on
> X.X.X.142:60020
> 2009-07-01 16:11:45,908 [HMaster] INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: updating row
> TestTable,\x00\x01\x01\x04\x04\x07\x02\x08\x08\x03,1246464670313 in region
> .META.,,1 with startcode 1246461673026 and server X.X.X.142:60020
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.