[hbase] client side retries happen at the wrong level -----------------------------------------------------
Key: HADOOP-2333 URL: https://issues.apache.org/jira/browse/HADOOP-2333 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Currently client side retries are handled by HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions that have never been on-line, because they won't be found in the meta table. However, for regions that have been on-line and have gone off-line (for a region split for example) there are entries in the meta table for the table being found, but they are incorrect. In the latter case, the scan of the meta table succeeded, but the new regions are not yet on-line. If any retries are done, they are done without any wait period. For example: {code} 2007-12-03 05:57:30,433 INFO [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting mrtest,,1196661378142 because largest aggregate size is 815.3k and desired size is 256.0k 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): mrtest,,1196661378142 closing (Adding to retiringRegions) 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started memcache flush for region mrtest,,1196661378142. Size 35.9k 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): flushing region -ROOT-,,0 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that there was nothing to do 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): Finished memcache flush for region mrtest,,1196661378142 in 497ms, sequenceid=30021 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/contents 2007-12-03 05:57:30,934 INFO [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed mrtest,,1196661378142 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): mrtest,,1196661378142 closed 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/contents (no reconstruction log) 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/contents is 30021 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -730914122/text (no reconstruction log) 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -730914122/text is 30021 2007-12-03 05:57:32,856 INFO [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} 2007-12-03 05:57:33,479 WARN [Task Commit Thread] org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): Task Commit Thread exiting... 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true) org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id for region mrtest,,1196661450435 is 30022 2007-12-03 05:57:33,486 INFO [RegionServer:0.splitter] org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region mrtest,,1196661450435 available 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 2007-12-03 05:57:33,487 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: 1196661340435 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): Checking .META.,,1 is assigned 2007-12-03 05:57:33,490 INFO [HMaster.rootScanner] org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, server: 140.211.11.75:47137} complete 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/contents (no reconstruction log) 2007-12-03 05:57:33,496 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest 2007-12-03 05:57:33,497 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,[EMAIL PROTECTED]&*()-_+=:;',.<>/?[]{}|,1196661378142 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true) org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 2007-12-03 05:57:33,500 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 2007-12-03 05:57:33,501 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest 2007-12-03 05:57:33,525 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest 2007-12-03 05:57:33,526 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,[EMAIL PROTECTED]&*()-_+=:;',.<>/?[]{}|,1196661378142 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true) org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id for hstore -1599058941/contents is 30021 2007-12-03 05:57:33,528 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 2007-12-03 05:57:33,529 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest 2007-12-03 05:57:33,537 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest 2007-12-03 05:57:33,543 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,[EMAIL PROTECTED]&*()-_+=:;',.<>/?[]{}|,1196661378142 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true) org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 2007-12-03 05:57:33,545 DEBUG [main] org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): reloading table servers because: org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 2007-12-03 05:57:33,547 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): Wiping out all we know of mrtest 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting -1599058941/text (no reconstruction log) 2007-12-03 05:57:33,558 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): Found 2 servers for table mrtest 2007-12-03 05:57:33,559 DEBUG [main] org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): Result of findTable on mrtest: 0. address=140.211.11.75:47137, mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,[EMAIL PROTECTED]&*()-_+=:;',.<>/?[]{}|,1196661378142 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): Error opening scanner (fsOk: true) org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) at org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) at org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) {code} Note that the split starts at 05:57:30,433 and the last retry attempt is at 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce. Moving the retry logic to findServersForTable would handle both the cases described above. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.