[ https://issues.apache.org/jira/browse/HADOOP-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HADOOP-2333: ---------------------------------- Status: Patch Available (was: Open) > [hbase] client side retries happen at the wrong level > ----------------------------------------------------- > > Key: HADOOP-2333 > URL: https://issues.apache.org/jira/browse/HADOOP-2333 > Project: Hadoop > Issue Type: Bug > Components: contrib/hbase > Affects Versions: 0.16.0 > Reporter: Jim Kellerman > Assignee: Jim Kellerman > Fix For: 0.16.0 > > > Currently client side retries are handled by > HConnectionManager$TableServers.scanOneMetaRegion. This is ok for regions > that have never been on-line, because they won't be found in the meta table. > However, for regions that have been on-line and have gone off-line (for a > region split for example) there are entries in the meta table for the table > being found, but they are incorrect. > In the latter case, the scan of the meta table succeeded, but the new regions > are not yet on-line. If any retries are done, they are done without any wait > period. For example: > {code} > 2007-12-03 05:57:30,433 INFO [RegionServer:0.splitter] > org.apache.hadoop.hbase.HRegion.needsSplit(HRegion.java:657): Splitting > mrtest,,1196661378142 because largest aggregate size is 815.3k and desired > size is 256.0k > 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HRegionServer$Splitter.closing(HRegionServer.java:217): > mrtest,,1196661378142 closing (Adding to retiringRegions) > 2007-12-03 05:57:30,436 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HRegion.snapshotMemcaches(HRegion.java:828): Started > memcache flush for region mrtest,,1196661378142. Size 35.9k > 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] > org.apache.hadoop.hbase.HRegionServer$Flusher.run(HRegionServer.java:449): > flushing region -ROOT-,,0 > 2007-12-03 05:57:30,702 DEBUG [RegionServer:0.cacheFlusher] > org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:882): Not > flushing cache for region -ROOT-,,0: snapshotMemcaches() determined that > there was nothing to do > 2007-12-03 05:57:30,894 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added > -1097746468/text/7689205514340304602 with sequence id 30021 and size 42.9k > 2007-12-03 05:57:30,932 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.internalFlushCache(HStore.java:930): Added > -1097746468/contents/6256958480029690533 with sequence id 30021 and size 110.0 > 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HRegion.internalFlushcache(HRegion.java:939): > Finished memcache flush for region mrtest,,1196661378142 in 497ms, > sequenceid=30021 > 2007-12-03 05:57:30,933 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed -1097746468/text > 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.close(HStore.java:840): closed > -1097746468/contents > 2007-12-03 05:57:30,934 INFO [RegionServer:0.splitter] > org.apache.hadoop.hbase.HRegion.close(HRegion.java:428): closed > mrtest,,1196661378142 > 2007-12-03 05:57:30,934 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HRegionServer$Splitter.closed(HRegionServer.java:231): > mrtest,,1196661378142 closed > 2007-12-03 05:57:31,924 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting > -730914122/contents (no reconstruction log) > 2007-12-03 05:57:31,964 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id > for hstore -730914122/contents is 30021 > 2007-12-03 05:57:31,979 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting > -730914122/text (no reconstruction log) > 2007-12-03 05:57:31,996 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id > for hstore -730914122/text is 30021 > 2007-12-03 05:57:32,856 INFO [HMaster.rootScanner] > org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:212): > HMaster.rootScanner scanning meta region regionname: -ROOT-,,0, startKey: <>, > server: 140.211.11.75:47137} > 2007-12-03 05:57:33,479 WARN [Task Commit Thread] > org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2011): > Task Commit Thread exiting... > 2007-12-03 05:57:33,481 ERROR [IPC Server handler 3 on 47137] > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): > Error opening scanner (fsOk: true) > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > 2007-12-03 05:57:33,485 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:299): Next sequence id > for region mrtest,,1196661450435 is 30022 > 2007-12-03 05:57:33,486 INFO [RegionServer:0.splitter] > org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:327): region > mrtest,,1196661450435 available > 2007-12-03 05:57:33,487 DEBUG [main] > org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): > reloading table servers because: > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > 2007-12-03 05:57:33,487 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): > Wiping out all we know of mrtest > 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] > org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:248): > HMaster.rootScanner scanner: -5727235478161864863 regioninfo: {regionname: > .META.,,1, startKey: <>, tableDesc: {name: .META., families: {info:={name: > info, max versions: 1, compression: NONE, in memory: false, max length: > 2147483647, bloom filter: none}}}}, server: 140.211.11.75:47137, startCode: > 1196661340435 > 2007-12-03 05:57:33,488 DEBUG [HMaster.rootScanner] > org.apache.hadoop.hbase.HMaster$BaseScanner.checkAssigned(HMaster.java:451): > Checking .META.,,1 is assigned > 2007-12-03 05:57:33,490 INFO [HMaster.rootScanner] > org.apache.hadoop.hbase.HMaster$BaseScanner.scanRegion(HMaster.java:294): > HMaster.rootScanner scan of meta region regionname: -ROOT-,,0, startKey: <>, > server: 140.211.11.75:47137} complete > 2007-12-03 05:57:33,494 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting > -1599058941/contents (no reconstruction log) > 2007-12-03 05:57:33,496 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): > Found 2 servers for table mrtest > 2007-12-03 05:57:33,497 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): > Result of findTable on mrtest: 0. address=140.211.11.75:47137, > mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,[EMAIL > PROTECTED]&*()-_+=:;',.<>/?[]{}|,1196661378142 > 2007-12-03 05:57:33,498 ERROR [IPC Server handler 1 on 47137] > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): > Error opening scanner (fsOk: true) > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > 2007-12-03 05:57:33,500 DEBUG [main] > org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): > reloading table servers because: > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > 2007-12-03 05:57:33,501 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): > Wiping out all we know of mrtest > 2007-12-03 05:57:33,525 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): > Found 2 servers for table mrtest > 2007-12-03 05:57:33,526 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): > Result of findTable on mrtest: 0. address=140.211.11.75:47137, > mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,[EMAIL > PROTECTED]&*()-_+=:;',.<>/?[]{}|,1196661378142 > 2007-12-03 05:57:33,526 ERROR [IPC Server handler 4 on 47137] > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): > Error opening scanner (fsOk: true) > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > 2007-12-03 05:57:33,528 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.<init>(HStore.java:617): maximum sequence id > for hstore -1599058941/contents is 30021 > 2007-12-03 05:57:33,528 DEBUG [main] > org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): > reloading table servers because: > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > 2007-12-03 05:57:33,529 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): > Wiping out all we know of mrtest > 2007-12-03 05:57:33,537 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): > Found 2 servers for table mrtest > 2007-12-03 05:57:33,543 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): > Result of findTable on mrtest: 0. address=140.211.11.75:47137, > mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,[EMAIL > PROTECTED]&*()-_+=:;',.<>/?[]{}|,1196661378142 > 2007-12-03 05:57:33,544 ERROR [IPC Server handler 2 on 47137] > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): > Error opening scanner (fsOk: true) > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > 2007-12-03 05:57:33,545 DEBUG [main] > org.apache.hadoop.hbase.HTable$ClientScanner.nextScanner(HTable.java:944): > reloading table servers because: > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > 2007-12-03 05:57:33,547 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.findServersForTable(HConnectionManager.java:427): > Wiping out all we know of mrtest > 2007-12-03 05:57:33,548 DEBUG [RegionServer:0.splitter] > org.apache.hadoop.hbase.HStore.<init>(HStore.java:581): starting > -1599058941/text (no reconstruction log) > 2007-12-03 05:57:33,558 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.scanOneMetaRegion(HConnectionManager.java:698): > Found 2 servers for table mrtest > 2007-12-03 05:57:33,559 DEBUG [main] > org.apache.hadoop.hbase.HConnectionManager$TableServers.reloadTableServers(HConnectionManager.java:333): > Result of findTable on mrtest: 0. address=140.211.11.75:47137, > mrtest,,1196661378142 1. address=140.211.11.75:47137, mrtest,[EMAIL > PROTECTED]&*()-_+=:;',.<>/?[]{}|,1196661378142 > 2007-12-03 05:57:33,560 ERROR [IPC Server handler 0 on 47137] > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1378): > Error opening scanner (fsOk: true) > org.apache.hadoop.hbase.NotServingRegionException: mrtest,,1196661378142 > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1510) > at > org.apache.hadoop.hbase.HRegionServer.getRegion(HRegionServer.java:1482) > at > org.apache.hadoop.hbase.HRegionServer.openScanner(HRegionServer.java:1365) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) > {code} > Note that the split starts at 05:57:30,433 and the last retry attempt is at > 05:57:33,560, hardly the 5 * 10 seconds configured for TestTableMapReduce. > Moving the retry logic to findServersForTable would handle both the cases > described above. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.