[
https://issues.apache.org/jira/browse/HIVE-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880136#comment-13880136
]
Sushanth Sowmyan commented on HIVE-6268:
----------------------------------------
To fix this, we can do two things:
a) Provide a jobconf parameter that lets users disable the cache altogether.
This is useful for massively multithreaded cases.
b) Where the cache is in use, spawn a separate maintenance thread that
periodically prunes the cache and closes expired clients.
Attaching a patch which does both of the above.
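For illustration only, here is a rough sketch of how (a) and (b) could fit together. It is not the code in the attached patch: the class name ExpiringClientCache, the enabled flag, the TTL, and the injected factory, closer, and clock are all hypothetical, and it uses only the JDK rather than the guava cache that HCat actually uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the HIVE-6268 patch: a client cache that can be
// switched off entirely and that is pruned by its own maintenance thread.
final class ExpiringClientCache<K, V> {
    private static final class Entry<V> {
        final V client;
        volatile long lastUsedMillis;
        Entry(V client, long now) { this.client = client; this.lastUsedMillis = now; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final boolean enabled;          // (a) would be driven by a jobconf flag
    private final long ttlMillis;           // idle time after which a client expires
    private final Function<K, V> factory;   // opens a new client
    private final Consumer<V> closer;       // closes a client and frees its port
    private final LongSupplier clock;       // injectable time source, in millis
    private ScheduledExecutorService maintenance;

    ExpiringClientCache(boolean enabled, long ttlMillis, Function<K, V> factory,
                        Consumer<V> closer, LongSupplier clock) {
        this.enabled = enabled;
        this.ttlMillis = ttlMillis;
        this.factory = factory;
        this.closer = closer;
        this.clock = clock;
    }

    // With the cache disabled, every call opens a fresh client that the
    // caller owns and must close itself.
    V get(K key) {
        if (!enabled) {
            return factory.apply(key);
        }
        long now = clock.getAsLong();
        Entry<V> e = entries.computeIfAbsent(key, k -> new Entry<>(factory.apply(k), now));
        e.lastUsedMillis = now;
        return e.client;
    }

    // Closes and removes every client idle for at least ttlMillis.
    // Returns the number of clients closed.
    int pruneExpired() {
        long now = clock.getAsLong();
        int closed = 0;
        for (Map.Entry<K, Entry<V>> me : entries.entrySet()) {
            Entry<V> e = me.getValue();
            if (now - e.lastUsedMillis >= ttlMillis && entries.remove(me.getKey(), e)) {
                closer.accept(e.client);
                closed++;
            }
        }
        return closed;
    }

    // (b) A daemon thread prunes periodically, so expiry no longer depends
    // on some other thread writing to the cache.
    synchronized void startMaintenance(long periodMillis) {
        if (!enabled || maintenance != null) {
            return;
        }
        maintenance = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "client-cache-cleaner");
            t.setDaemon(true);
            return t;
        });
        maintenance.scheduleWithFixedDelay(this::pruneExpired,
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stops the maintenance thread and closes all remaining clients.
    synchronized void shutdown() {
        if (maintenance != null) {
            maintenance.shutdownNow();
            maintenance = null;
        }
        for (K key : entries.keySet()) {
            Entry<V> e = entries.remove(key);
            if (e != null) {
                closer.accept(e.client);
            }
        }
    }
}
```

The sketch glosses over one race: a client handed out just before it expires can be closed while the caller is still using it, so a real implementation would need reference counting or a similar guard. With a guava cache, the maintenance thread would call the cache's cleanUp() method instead of walking a map by hand.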
> Network resource leak with HiveClientCache when using HCatInputFormat
> ---------------------------------------------------------------------
>
> Key: HIVE-6268
> URL: https://issues.apache.org/jira/browse/HIVE-6268
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 0.12.0
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
> Attachments: HIVE-6268.patch
>
>
> HCatInputFormat has a cache feature that allows HCat to cache hive client
> connections to the metastore, so as to not keep instantiating a new hive
> client every single time. This uses a guava cache of hive clients, which only
> evicts expired entries on the next write to the cache, or when the cache is
> cleaned up manually.
> So, in a single-threaded case, where we reuse the hive client, the cache
> works well. But in a massively multithreaded case, where each thread might
> perform one action and is then never used again, there are no more writes to
> the cache, so all the clients stay alive and keep their ports open.