[ https://issues.apache.org/jira/browse/HIVE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398711#comment-13398711 ]

Rohini Palaniswamy commented on HIVE-3098:
------------------------------------------

@Ashutosh,
   We would not hit the wrong-token issue mentioned by Daryn, because the Hive 
metastore proxies as the user requesting the operation. The actual UGI under 
which the operation is performed is "user via metastoreuser". The metastoreuser 
UGI is the RealUser, and its Kerberos TGT is what gets used, so there is never 
an issue of expired or wrong tokens. We have had the same fix in hdfsproxy and 
Oozie in production for 2 years now and have not had issues with it. The 
hdfsproxy/Oozie UGI cache is slightly more advanced: it closes unused 
FileSystem objects from the cache after 30 minutes or some configured interval. 
Closing the FileSystem object immediately and not taking advantage of the cache 
is a bad idea, because FileSystem initialization is costly. Not closing the 
FileSystem and retaining it in the cache with this fix will not cause a memory 
leak unless tens of thousands of users access the Hive metastore. Oozie did 
exactly that for some time before the cache-expiry logic was added.
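
For reference, here is a minimal sketch of the kind of proxy-UGI cache described 
above. The class and method names are hypothetical, and the Guava-based 
30-minute expiry is an assumption modeled on the Oozie/hdfsproxy behaviour 
rather than on the attached patch; it also assumes FileSystem.closeAllForUGI() 
is available in the Hadoop version in use.

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

/** Hypothetical proxy-UGI cache, keyed by the requesting user's short name. */
public class ProxyUgiCache {

  // Drop UGIs that have been idle for 30 minutes and close the FileSystem
  // instances cached against them (the Oozie/hdfsproxy behaviour described
  // above). Guava performs this cleanup lazily, piggybacked on cache access.
  private final Cache<String, UserGroupInformation> ugis = CacheBuilder.newBuilder()
      .expireAfterAccess(30, TimeUnit.MINUTES)
      .removalListener(new RemovalListener<String, UserGroupInformation>() {
        @Override
        public void onRemoval(RemovalNotification<String, UserGroupInformation> n) {
          try {
            FileSystem.closeAllForUGI(n.getValue());
          } catch (IOException e) {
            // Best effort; a real service would log this.
          }
        }
      })
      .build();

  /**
   * Returns a cached "user via metastoreuser" UGI. The kerberized login UGI is
   * the RealUser, so its TGT is what actually authenticates the RPCs.
   */
  public UserGroupInformation getProxyUgi(final String user) throws ExecutionException {
    return ugis.get(user, new Callable<UserGroupInformation>() {
      @Override
      public UserGroupInformation call() throws IOException {
        return UserGroupInformation.createProxyUser(
            user, UserGroupInformation.getLoginUser());
      }
    });
  }
}
{code}

Because every request for the same user gets back the same UGI object, 
FileSystem.get() inside doAs() keeps hitting the same FileSystem.CACHE entry 
instead of creating a new one.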
                
> Memory leak from large number of FileSystem instances in FileSystem.CACHE. 
> (Must cache UGIs.)
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-3098
>                 URL: https://issues.apache.org/jira/browse/HIVE-3098
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.9.0
>         Environment: Running with Hadoop 20.205.0.3+ / 1.0.x with security 
> turned on.
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-3098.patch
>
>
> The problem manifested from stress-testing HCatalog 0.4.1 (as part of testing 
> the Oracle backend).
> The HCatalog server ran out of memory (-Xmx2048m) in under 24 hours when 
> pounded by 60 threads. The heap dump indicates that hadoop::FileSystem.CACHE 
> held 1000000 instances of FileSystem, whose combined retained memory consumed 
> the entire heap.
> It boiled down to hadoop::UserGroupInformation::equals() being implemented 
> such that the "Subject" member is compared by reference ("==") rather than 
> for equivalence (".equals()"). This causes equivalent UGI instances to compare 
> as unequal, so a new FileSystem instance is created and cached for each one.
> UGI.equals() is, incidentally, implemented that way as a fix for another 
> problem (HADOOP-6670), so it is unlikely that that implementation can be 
> changed.
> The solution is to check for UGI equivalence in HCatalog (i.e. in the Hive 
> metastore), using a cache for UGI instances in the shims (a short illustration 
> follows after this quoted description).
> I have a patch to fix this and will upload it shortly. I just ran an overnight 
> test to confirm that the memory leak has been arrested.
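
To make the quoted description concrete, here is a minimal illustration (not the 
attached HIVE-3098.patch) of why two equivalent-but-distinct UGIs each pin their 
own FileSystem instance in FileSystem.CACHE, while reusing one cached UGI per 
user hits the cache. The class name and the "alice" user are hypothetical.

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

/** Hypothetical demo, not part of the attached patch. */
public class UgiCacheLeakDemo {

  static FileSystem fsFor(UserGroupInformation ugi, final Configuration conf)
      throws Exception {
    return ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      @Override
      public FileSystem run() throws Exception {
        // FileSystem.CACHE keys on the calling UGI, and UGI.equals() compares
        // the underlying Subject by reference ("=="), per HADOOP-6670.
        return FileSystem.get(conf);
      }
    });
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    UserGroupInformation real = UserGroupInformation.getLoginUser();

    // Two UGIs for the same proxied user: equivalent, but never equals(),
    // so each doAs()/get() pins a brand-new FileSystem in FileSystem.CACHE.
    UserGroupInformation a = UserGroupInformation.createProxyUser("alice", real);
    UserGroupInformation b = UserGroupInformation.createProxyUser("alice", real);
    System.out.println(fsFor(a, conf) == fsFor(b, conf)); // false -> cache grows

    // Reusing one cached UGI per user (what the shim-level UGI cache does)
    // returns the same cached FileSystem instance.
    System.out.println(fsFor(a, conf) == fsFor(a, conf)); // true -> no growth
  }
}
{code}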
