[ https://issues.apache.org/jira/browse/HBASE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl updated HBASE-8778:
---------------------------------

    Attachment: 8778-dirmodtime.txt

Trivial patch that does the table dir modtime check before the table dir is enumerated.

Region assigments scan table directory making them slow for huge tables
-----------------------------------------------------------------------

                Key: HBASE-8778
                URL: https://issues.apache.org/jira/browse/HBASE-8778
            Project: HBase
         Issue Type: Improvement
           Reporter: Dave Latham
           Assignee: Dave Latham
            Fix For: 0.98.0, 0.95.2, 0.94.11
        Attachments: 8778-dirmodtime.txt, HBASE-8778-0.94.5.patch, HBASE-8778-0.94.5-v2.patch

On a table with 130k regions, it takes about 3 seconds for a region server to open a region once it has been assigned.

Watching the threads of a region server running 0.94.5 that is opening many such regions shows the thread that opens the region in code like this:

{noformat}
"PRI IPC Server handler 4 on 60020" daemon prio=10 tid=0x00002aaac07e9000 nid=0x6566 runnable [0x000000004c46d000]
   java.lang.Thread.State: RUNNABLE
    at java.lang.String.indexOf(String.java:1521)
    at java.net.URI$Parser.scan(URI.java:2912)
    at java.net.URI$Parser.parse(URI.java:3004)
    at java.net.URI.<init>(URI.java:736)
    at org.apache.hadoop.fs.Path.initialize(Path.java:145)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.fs.Path.<init>(Path.java:50)
    at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:215)
    at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:252)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:311)
    at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:159)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:842)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:867)
    at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1168)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:269)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:255)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoModtime(FSTableDescriptors.java:368)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:155)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:126)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2834)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2807)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
{noformat}

To open the region, the region server first loads the latest HTableDescriptor. Since HBASE-4553, HTableDescriptors are stored in the file system at "/hbase/<tableDir>/.tableinfo.<sequenceNum>", and the file with the largest sequenceNum is the current descriptor. This is done so that the current descriptor is updated atomically. However, since the file name is not known in advance, FSTableDescriptors has to do a FileSystem.listStatus operation, which lists every entry in the directory, to find it. The directory also contains all the region directories, so in our case it has to load 130k FileStatus objects. Even using globStatus with a matching pattern still transfers all the entries to the client before the pattern is applied.
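The lookup described above, together with the directory modtime check that the attached 8778-dirmodtime.txt is said to perform before enumerating the table dir, can be sketched in plain Java. This is a minimal, hypothetical model and not the HBase code: `TableInfoCache` and `DirLister` are invented names, `DirLister` stands in for the real file system calls, and the parser assumes a purely numeric suffix after `.tableinfo.`. It also assumes, as the patch note implies, that the table directory's modification time changes whenever a `.tableinfo` file is added, so an unchanged modtime lets the expensive listing be skipped.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not HBase code: find the current ".tableinfo.<sequenceNum>"
// file and avoid re-listing the table directory while its modtime is unchanged.
final class TableInfoCache {

    // Hypothetical stand-in for the file system calls; not a Hadoop or HBase API.
    interface DirLister {
        long dirModtime();        // cheap: status of the table directory itself
        List<String> listNames(); // expensive: every child, including all region dirs
    }

    // Assumed naming scheme: ".tableinfo." followed by a numeric sequence number.
    private static final Pattern TABLEINFO = Pattern.compile("\\.tableinfo\\.(\\d{1,18})");

    private final DirLister dir;
    private long cachedModtime;
    private String cachedName; // null until a listing has found a .tableinfo file
    private int listings;      // number of expensive listings performed

    TableInfoCache(DirLister dir) {
        this.dir = dir;
    }

    // Returns the .tableinfo name with the largest sequence number, if any.
    static Optional<String> latestTableInfo(List<String> names) {
        String best = null;
        long bestSeq = -1;
        for (String name : names) {
            Matcher m = TABLEINFO.matcher(name);
            if (m.matches()) {
                long seq = Long.parseLong(m.group(1));
                if (seq > bestSeq) {
                    bestSeq = seq;
                    best = name;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    // Checks the directory modtime first and lists the directory only if it changed.
    Optional<String> current() {
        long modtime = dir.dirModtime(); // read before listing (conservative order)
        if (cachedName != null && modtime == cachedModtime) {
            return Optional.of(cachedName);
        }
        listings++;
        Optional<String> latest = latestTableInfo(dir.listNames());
        if (latest.isPresent()) {
            cachedName = latest.get();
            cachedModtime = modtime;
        }
        return latest;
    }

    int listings() {
        return listings;
    }
}
```

Reading the modtime before listing is the safe order in this sketch: if the directory changes while it is being listed, the cache records the older modtime and simply lists again on the next call instead of serving a stale name.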
Furthermore, HDFS transfers 1000 directory entries per RPC call by default, so fetching all the directory entries takes 130 round trips to the namenode.

Consequently, reassigning all the regions of a table (or a constant fraction of them) takes time proportional to the square of the number of regions: each region open lists a directory whose size also grows with the number of regions. In our case, if a region server hosting 200 such regions fails, it takes 10+ minutes for them all to be reassigned, after the ZooKeeper session expiration and log splitting.
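The quadratic growth follows from simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: `ListingCost` is a hypothetical helper, it uses the 1000-entries-per-RPC figure quoted in the issue, and it treats the table directory as holding one entry per region (ignoring the few `.tableinfo` and other files).

```java
// Hypothetical cost model for the listing described in the issue (illustrative only).
final class ListingCost {

    // Directory entries returned per listing RPC; the issue cites 1000 as the HDFS default.
    static final long ENTRIES_PER_RPC = 1000;

    // Namenode round trips needed to list a directory with `entries` children.
    static long rpcsPerListing(long entries) {
        return (entries + ENTRIES_PER_RPC - 1) / ENTRIES_PER_RPC; // ceiling division
    }

    // Round trips to open `opens` regions when every open lists the whole table dir.
    static long rpcsToOpen(long opens, long tableDirEntries) {
        return opens * rpcsPerListing(tableDirEntries);
    }
}
```

With 130,000 entries, `rpcsPerListing` gives 130 round trips per region open. Reopening the 200 regions of one failed server then costs 200 × 130 = 26,000 namenode round trips (and 26 million FileStatus objects), while reopening every region of the table costs 130,000 × 130 = 16,900,000. Doubling the region count doubles both the number of opens and the cost of each, which is the quadratic behavior described above.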