Saad Mufti created HBASE-20218:
----------------------------------

             Summary: Proposed Performance Enhancements For 
TableSnapshotInputFormat
                 Key: HBASE-20218
                 URL: https://issues.apache.org/jira/browse/HBASE-20218
             Project: HBase
          Issue Type: Bug
          Components: mapreduce
    Affects Versions: 1.4.0
         Environment: HBase 1.4.0 running in AWS EMR 5.12.0 with the HBase 
rootdir set to a folder in S3

 
            Reporter: Saad Mufti


I have been testing a few Spark jobs we have at my company which use 
TableSnapshotInputFormat to read directly from the filesystem snapshots created 
on another EMR/HBase cluster and stored in S3. During performance testing I 
found various small changes that would greatly enhance performance. Right now 
we are running our jobs linked against a patched version of HBase 1.4.0 in 
which I made these changes, and I am hoping to submit my patch for review and 
eventual acceptance into the main codebase.
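
For context, here is a minimal sketch of how such a Spark job drives 
TableSnapshotInputFormat; the snapshot name, S3 paths, and app name are 
placeholders, not our actual setup:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SnapshotScanJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // The snapshots live under the HBase rootdir, which for us is in S3.
    conf.set("hbase.rootdir", "s3://my-bucket/hbase");  // placeholder
    // The snapshot record reader needs a serialized Scan in the config.
    conf.set(TableInputFormat.SCAN,
        TableMapReduceUtil.convertScanToString(new Scan()));
    // Restores the snapshot's file references into the restore dir and
    // configures the splits (this is where the changes below apply).
    TableSnapshotInputFormatImpl.setInput(conf, "my_snapshot",  // placeholder
        new Path("s3://my-bucket/restore"));                    // placeholder

    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("SnapshotScanJob"));
    JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
        conf, TableSnapshotInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
    System.out.println("rows: " + rows.count());
    sc.stop();
  }
}
{code}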

 

The list of changes is:

 
1. A flag to control whether the snapshot restore uses a UUID-based random temp 
dir under the specified restore directory. We use the flag to turn this off so 
that we can benefit from an AWS S3-specific bucket partitioning scheme we have 
provisioned. The way S3 partitioning works, you have to give a fixed path 
prefix and a pattern of files after that, and AWS can then partition the paths 
after the fixed prefix onto different resources to get more parallelism. We 
were advised by AWS that we could only get this good partitioning behavior if 
we didn't have that random directory in there.
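
A rough sketch of the idea; the config key and helper method are hypothetical 
names for illustration, but the UUID-appending behavior they guard is what 
TableSnapshotInputFormatImpl.setInput does today:

{code:java}
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class RestoreDirHelper {
  // Hypothetical config key, named here only for illustration.
  public static final String USE_UUID_RESTORE_SUBDIR =
      "hbase.TableSnapshotInputFormat.restoredir.use.uuid";

  // setInput unconditionally appends a random UUID subdirectory today;
  // guarding it behind a flag keeps the restore path prefix fixed so S3
  // bucket partitioning can spread the keys underneath it.
  public static Path computeRestoreDir(Configuration conf, Path restoreDir) {
    if (conf.getBoolean(USE_UUID_RESTORE_SUBDIR, true)) {
      return new Path(restoreDir, UUID.randomUUID().toString());
    }
    return restoreDir;
  }
}
{code}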
 
2. A flag to turn off the code that tries to compute locality information for 
the splits. This is useless when dealing with S3, since the files are not on 
the cluster and there is no locality to compute; worse yet, it uses a single 
thread in the driver to iterate over all the files in the restored snapshot. 
For a very large table this was taking hours and hours, iterating through S3 
objects just to list them (about 360,000 of them for our specific table).
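
Again a hedged sketch; the config key and wrapper are hypothetical, while 
computeHDFSBlocksDistribution and getBestLocations are the existing calls 
that do the expensive listing and locality math we want to be able to skip:

{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HDFSBlocksDistribution;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl;
import org.apache.hadoop.hbase.regionserver.HRegion;

public class LocalityHelper {
  // Hypothetical config key, named here only for illustration.
  public static final String LOCALITY_ENABLED_KEY =
      "hbase.TableSnapshotInputFormat.locality.enabled";

  // On S3 there is no data locality to discover, so skip both the
  // single-threaded file listing and the locality math entirely and
  // give the split no preferred hosts.
  public static List<String> computeLocations(Configuration conf,
      HTableDescriptor htd, HRegionInfo hri, Path tableDir)
      throws IOException {
    if (!conf.getBoolean(LOCALITY_ENABLED_KEY, true)) {
      return Collections.emptyList();
    }
    HDFSBlocksDistribution blocks =
        HRegion.computeHDFSBlocksDistribution(conf, htd, hri, tableDir);
    return TableSnapshotInputFormatImpl.getBestLocations(conf, blocks);
  }
}
{code}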
 
3. A flag to override the column family schema setting that prefetches blocks 
on open. This setting was causing the main executor thread running a Spark 
task, while reading through HFiles for its scan, to compete for a lock on the 
underlying EMRFS stream object with prefetch threads trying to read the same 
file. Most tasks in the Spark stage would finish, but the last few would 
linger for half an hour or more, alternately competing with the prefetch 
threads for a lock on an EMRFS stream object. This is the only change that had 
to go outside the mapreduce package, as it directly affects the prefetch 
behavior in CacheConfig.java.
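
A hedged sketch of the override; the key name and helper are hypothetical, 
while PREFETCH_BLOCKS_ON_OPEN_KEY and the family-schema check are the 
existing inputs CacheConfig combines today:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;

public class PrefetchHelper {
  // Hypothetical override key, named here only for illustration.
  public static final String PREFETCH_OVERRIDE_KEY =
      "hbase.rs.prefetchblocksonopen.override";

  // The column family schema can request prefetch-on-open; on the snapshot
  // read path we want a client-side switch that wins over the schema, so
  // scans don't fight the prefetch threads for the EMRFS stream lock.
  public static boolean shouldPrefetchOnOpen(Configuration conf,
      HColumnDescriptor family) {
    if (conf.get(PREFETCH_OVERRIDE_KEY) != null) {
      return conf.getBoolean(PREFETCH_OVERRIDE_KEY, false);
    }
    return family.isPrefetchBlocksOnOpen()
        || conf.getBoolean(CacheConfig.PREFETCH_BLOCKS_ON_OPEN_KEY, false);
  }
}
{code}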
 
4. A flag to turn off maintenance of scan metrics. This was also causing a 
major slowdown; getting rid of it sped things up 4-5 times. What I observed 
in the thread dumps was that every call to update scan metrics was trying to 
get some HBase counter object, and deep underneath was trying to access some 
Java resource bundle and throwing an exception that it wasn't found. The 
exception was never visible at the application level and was swallowed 
underneath, but whatever it was doing was causing a major slowdown. So we use 
this flag to avoid collecting those metrics, because we never used them.
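
Once more a hedged sketch; the config key is a hypothetical name, while 
setScanMetricsEnabled is the existing Scan API that the snapshot record 
reader currently calls unconditionally with true:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;

public class ScanMetricsHelper {
  // Hypothetical config key, named here only for illustration.
  public static final String SCAN_METRICS_ENABLED_KEY =
      "hbase.TableSnapshotInputFormat.scan.metrics.enabled";

  // The snapshot record reader enables scan metrics unconditionally today;
  // making it conditional avoids the per-update counter lookups (and the
  // swallowed ResourceBundle misses underneath them) when nobody reads them.
  public static void configureScanMetrics(Configuration conf, Scan scan) {
    scan.setScanMetricsEnabled(
        conf.getBoolean(SCAN_METRICS_ENABLED_KEY, true));
  }
}
{code}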
 
I am polishing my patch a bit more and hope to attach it tomorrow. One 
caveat: I tried but struggled to write useful unit/component tests for these 
changes, as they are invisible behaviors that do not affect the final result 
at all. I am also not that familiar with HBase testing standards, so for now 
I am looking for guidance on what to test.
 
I would appreciate any feedback, plus guidance on writing tests, provided of 
course there is interest in incorporating my patch into the main codebase.
 
Cheers.
 
----Saad
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
