[
https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225276#comment-14225276
]
Marcelo Vanzin commented on HIVE-8574:
--------------------------------------
Hey [~chengxiang li], I'd like to better understand how Hive will use these
metrics so I can come up with the proper fix here.
I see two approaches:
* Add an API to clean up the metrics. This keeps the current "collect all
metrics" approach, but adds APIs to delete the metrics for a job. This assumes
that Hive will always process the metrics of finished jobs, even if only to ask
for them to be deleted.
* Suggested by [~xuefuz]: add a timeout after a job finishes for cleaning up
its metrics. This means that Hive has a window after a job finishes during which
the data is still available; after that, it's gone (see the sketch after this list).
I could also add some internal checks so that the collection doesn't keep
accumulating data indefinitely if data is never deleted; e.g. track only the
last "x" finished jobs, evicting the oldest when a new job starts.
What do you think?
> Enhance metrics gathering in Spark Client [Spark Branch]
> --------------------------------------------------------
>
> Key: HIVE-8574
> URL: https://issues.apache.org/jira/browse/HIVE-8574
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
>
> The current implementation of metrics gathering in the Spark client is a
> little hacky. First, it's awkward to use (and the implementation is also
> pretty ugly). Second, it will just collect metrics indefinitely, so in the
> long term it turns into a huge memory leak.
> We need a simplified interface and some mechanism for disposing of old
> metrics.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)