[ https://issues.apache.org/jira/browse/HBASE-23679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser updated HBASE-23679:
-------------------------------
    Description: 
Spent the better part of a week chasing an issue on HBase 2.x where the number 
of DistributedFileSystem instances on the heap of a RegionServer would grow 
unbounded. Looking at multiple heap dumps, it was obvious that we had an 
immense number of DFS instances cached (in FileSystem$Cache) for the same user, 
each distinguished only by the unique set of Tokens held in that DFS instance's 
UGI member (one HBase delegation token and two HDFS delegation tokens; we only 
add these for bulk loads). On the user's clusters, this eventually caused a 10x 
performance degradation as RegionServers spent all of their time in JVM GC 
(they were unlucky enough that no RegionServer crashed outright, as a crash 
would have, albeit temporarily, fixed the issue).
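
As an illustration of the mechanism, here is a minimal, hypothetical repro 
sketch (not HBase code; the class name and proxy-user setup are assumptions): 
the FileSystem$Cache key includes the UGI, and UGI equality is identity-based 
on the underlying Subject, so every freshly-constructed UGI (such as one 
carrying newly-obtained delegation tokens) maps to a brand-new cached instance.

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class FsCacheGrowthRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem previous = null;
    for (int i = 0; i < 3; i++) {
      // UGI equality is identity-based on the underlying Subject
      // (HADOOP-6670), so two UGIs for the same user never compare equal.
      UserGroupInformation ugi = UserGroupInformation.createProxyUser(
          "bulkload-user", UserGroupInformation.getLoginUser());
      // FileSystem.get(conf) keys its cache on (scheme, authority, ugi),
      // so each new UGI produces and caches a distinct instance.
      FileSystem fs = ugi.doAs(
          (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
      if (previous != null) {
        System.out.println("cache hit? " + (fs == previous)); // always false
      }
      previous = fs;
    }
  }
}
{code}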

The problem seems to be two-fold, with the changes from HBASE-15291 being 
largely the cause. That issue tried to close the FileSystem instances which 
were being leaked; however, it did so by instrumenting the method 
{{SecureBulkLoadManager.cleanupBulkLoad(..)}}. There are two big issues with 
this approach (see the sketch after this list):
 # It relies on clients to call this method (a client that hangs up will leak 
resources in the RegionServer).
 # This method is only called on the RegionServer hosting the first Region of 
the table that was bulk-loaded into; all other RegionServers involved are left 
to leak resources.
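
To make the fragile lifecycle concrete, a hedged sketch of the pattern 
(hypothetical names and shapes, not the actual SecureBulkLoadManager 
internals): the FileSystem is stashed when the load starts but is only closed 
by a separate, client-initiated cleanup RPC.

{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.fs.FileSystem;

class BulkLoadCleanupSketch {
  private final Map<String, FileSystem> openFileSystems = new ConcurrentHashMap<>();

  void secureBulkLoadHFiles(String bulkToken, FileSystem fs) {
    openFileSystems.put(bulkToken, fs); // acquired here...
  }

  void cleanupBulkLoad(String bulkToken) throws IOException {
    FileSystem fs = openFileSystems.remove(bulkToken);
    if (fs != null) {
      fs.close(); // ...but released only if the client remembers to call
    }             // this, and only on the RegionServer that receives the RPC
  }
}
{code}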

HBASE-21342 later fixed a case where FS objects were being closed prematurely, 
by adding reference-counting (which appears to work fine), but it does not 
address the two issues above. Point #2 makes debugging this issue harder than 
normal because it doesn't manifest on a single-node instance :)

Through all of this, I (re)learned the dirty history of UGI and how poorly its 
caching behaves (HADOOP-6670). I see continuing to lean on the FileSystem$Cache 
as a potentially dangerous thing (we've been back here multiple times already). 
My opinion at this point is that we should cleanly create a new FileSystem 
instance during the call to 
{{SecureBulkLoadManager#secureBulkLoadHFiles(..)}} and close it in a finally 
block in that same method. This both simplifies the lifecycle of a FileSystem 
instance in the bulk-load codepath and helps us avoid future problems with 
UGI and FS caching. The one downside is that we pay the penalty of creating a 
new FileSystem instance per bulk load, but I'm of the opinion that we cross 
that bridge when we get to it.
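
A minimal sketch of that proposed lifecycle, with an assumed, simplified 
method shape (the real {{secureBulkLoadHFiles(..)}} takes the bulk-load 
request and region): {{FileSystem.newInstance(conf)}} always returns a fresh, 
uncached instance, so closing it in the finally block cannot disturb 
FileSystem objects in use elsewhere.

{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

class SecureBulkLoadSketch {
  void secureBulkLoadHFiles(UserGroupInformation ugi, Configuration conf)
      throws IOException, InterruptedException {
    // newInstance(conf) never consults or populates FileSystem$Cache.
    FileSystem fs = ugi.doAs(
        (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.newInstance(conf));
    try {
      // ... stage and move the HFiles using fs ...
    } finally {
      fs.close(); // the lifecycle ends with the call, regardless of the client
    }
  }
}
{code}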

Thanks to [~jdcryans] and [~busbey] for their help along the way.



> FileSystem instance leaks due to bulk loads with Kerberos enabled
> -----------------------------------------------------------------
>
>                 Key: HBASE-23679
>                 URL: https://issues.apache.org/jira/browse/HBASE-23679
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
