[ 
https://issues.apache.org/jira/browse/HBASE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711745#comment-13711745
 ] 

Dave Latham commented on HBASE-8778:
------------------------------------

I would really like to see a resolution of this for 0.96 so that 0.96 can 
handle huge tables with reasonable performance.  For 0.94 I'm content to rely 
on the patch posted here.  Other users can grab it if they find it helpful.

I'm still most comfortable with simply moving the table descriptor files to a 
well-known subdirectory of the tabledir in HDFS.  If done as part of the 0.96 
migration, it can be a simple change, with no need to support readers and 
writers using the old and new locations simultaneously.  I'd like to write the 
trunk patch that does this migration and operates in the new location.
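
For concreteness, here is a rough sketch of what the lookup could look like 
under that layout.  This is not HBase code; the subdirectory name and helper 
names are placeholders.  The point is that listing a small, dedicated 
directory returns only descriptor files, never the region directories:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Optional;

// Sketch: descriptor files live in a small, well-known subdirectory of the
// table dir (directory name is a placeholder), so finding the current one
// no longer lists the 130k region directories.
public class TableInfoLookup {
    static final String TABLEINFO_PREFIX = ".tableinfo.";

    // The current descriptor is the entry with the largest sequence number.
    static Optional<String> currentTableInfo(String[] descriptorDirListing) {
        return Arrays.stream(descriptorDirListing)
                .filter(name -> name.startsWith(TABLEINFO_PREFIX))
                .max(Comparator.comparingLong(TableInfoLookup::sequenceOf));
    }

    static long sequenceOf(String name) {
        return Long.parseLong(name.substring(TABLEINFO_PREFIX.length()));
    }

    public static void main(String[] args) {
        // A listing of the dedicated subdirectory: a handful of entries,
        // not one per region.
        String[] listing = {".tableinfo.0000000001", ".tableinfo.0000000003",
                            ".tableinfo.0000000002"};
        System.out.println(currentTableInfo(listing).get()); // .tableinfo.0000000003
    }
}
```

The write side would stay as it is today (write a new file with the next 
sequence number, so updates remain atomic); only the directory being listed 
changes.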

I like Enis's proposal of using a system table (HBASE-7999) for table 
descriptors as well, but am not as comfortable trying to pull that off 
correctly in a short time frame as we're looking for a 0.96 release.  That 
presents a couple of possibilities.  One, we can first change the table dir 
location in HDFS but create a new JIRA to work on transitioning it into a 
system table.  Or, if someone who is more comfortable with that work wants to 
dive in and work on it together with me now, and thinks we can get it done for 
0.96, then I'll give it a shot with them.  Does this sound reasonable?  If I 
hear no objections, I'll work on a trunk patch later this week or next week.
                
> Region assignments scan table directory making them slow for huge tables
> -----------------------------------------------------------------------
>
>                 Key: HBASE-8778
>                 URL: https://issues.apache.org/jira/browse/HBASE-8778
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Dave Latham
>         Attachments: HBASE-8778-0.94.5.patch, HBASE-8778-0.94.5-v2.patch
>
>
> On a table with 130k regions it takes about 3 seconds for a region server to 
> open a region once it has been assigned.
> Watching the threads for a region server running 0.94.5 that is opening many 
> such regions shows the thread opening the region in code like this:
> {noformat}
> "PRI IPC Server handler 4 on 60020" daemon prio=10 tid=0x00002aaac07e9000 
> nid=0x6566 runnable [0x000000004c46d000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.String.indexOf(String.java:1521)
>         at java.net.URI$Parser.scan(URI.java:2912)
>         at java.net.URI$Parser.parse(URI.java:3004)
>         at java.net.URI.<init>(URI.java:736)
>         at org.apache.hadoop.fs.Path.initialize(Path.java:145)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:126)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:50)
>         at 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:215)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:252)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:311)
>         at 
> org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:159)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:842)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:867)
>         at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1168)
>         at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:269)
>         at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:255)
>         at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoModtime(FSTableDescriptors.java:368)
>         at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:155)
>         at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:126)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2834)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2807)
>         at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>         at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> {noformat}
> To open the region, the region server first loads the latest 
> HTableDescriptor.  Since HBASE-4553, HTableDescriptors are stored in the file 
> system at "/hbase/<tableDir>/.tableinfo.<sequenceNum>".  The file with the 
> largest sequenceNum is the current descriptor.  This is done so that the 
> current descriptor is updated atomically.  However, since the filename is not 
> known in advance, FSTableDescriptors has to do a FileSystem.listStatus 
> operation, which lists all files in the directory to find it.  The 
> directory also contains all the region directories, so in our case it has to 
> load 130k FileStatus objects.  Even using a globStatus matching function 
> still transfers all the objects to the client before performing the pattern 
> matching.  Furthermore, HDFS uses a default of transferring 1000 directory 
> entries in each RPC call, so it requires 130 roundtrips to the namenode to 
> fetch all the directory entries.
> Consequently, to reassign all the regions of a table (or a constant fraction 
> thereof) requires time proportional to the square of the number of regions.
> In our case, if a region server fails with 200 such regions, it takes 10+ 
> minutes for them all to be reassigned, after the zk expiration and log 
> splitting.
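
The quadratic cost described in the report can be made concrete with a quick 
back-of-the-envelope sketch.  This is arithmetic only, not HBase code; the 
numbers come from the report above, and the batch size is HDFS's default 
listing limit (`dfs.ls.limit`):

```java
// Rough cost model for the reported behavior: every region open lists the
// entire table directory to find the current .tableinfo file.
public class ReassignCost {
    // RPC roundtrips needed to list a directory of `entries` files when the
    // namenode returns at most `perRpc` entries per call.
    static long rpcsPerListing(long entries, long perRpc) {
        return (entries + perRpc - 1) / perRpc;
    }

    // Total FileStatus objects transferred when `reassigned` region opens
    // each list a directory of `entries` files.
    static long statusObjectsTransferred(long reassigned, long entries) {
        return reassigned * entries;
    }

    public static void main(String[] args) {
        long regions = 130_000;  // region dirs in the table dir
        long perRpc = 1_000;     // HDFS default listing batch (dfs.ls.limit)
        System.out.println(rpcsPerListing(regions, perRpc));        // 130
        System.out.println(statusObjectsTransferred(200, regions)); // 26000000
    }
}
```

So reassigning the 200 regions of one failed server means shipping some 26 
million FileStatus objects and 26,000 namenode roundtrips, which is consistent 
with the 10+ minutes observed.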

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
