[
https://issues.apache.org/jira/browse/HAWQ-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238374#comment-15238374
]
ASF GitHub Bot commented on HAWQ-665:
-
GitHub user foyzur opened a pull request:
https://github.com/apache/incubator-hawq/pull/599
HAWQ-665. Dumping memory usage during runaway query termination.
Previously when we ran out of memory, we logged the memory usage of
all the running queries on the segment where OOM happened. However,
if runaway termination successfully terminates the biggest offender,
before we hit OOM, we did not have any logging mechanism. This left
us with insufficient information for root cause analysis of memory leaks.
This PR is introducing logging of memory usage for only the runaway query.
Note, we do not log usage for all the other queries, like we do for out
of memory. Rather we restrict ourselves to only the biggest violator that
is getting terminated. Moreover, because of the sensitive nature of runaway
termination, we need to terminate as fast as possible, to prevent hitting
an out of memory in other processes. Therefore, we cannot log the full
memory context dump at the time of runaway cleanup. Rather, we attempt to
dump memory usage after the cleanup. This gives us the details of memory
accounting peak usage for all the operators. For memory context, however,
we only dump a partial memory context tree, as most of the memory contexts
will have been dropped
by then. This is still a valuable piece of information to do root cause
analysis as the memory accounting tree represents the execution plan closely
and helps us pinpoint the operator where we might have excessive memory
consumption.
Signed-off-by: Nikos Armenatzoglou
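
The ordering described above (clean up first, dump afterwards) can be sketched roughly as follows. This is only an illustration in Python, not the HAWQ implementation: every class and function name here (OperatorAccount, Query, cleanup, terminate_runaway) is hypothetical, and the context names are used purely as sample data. The point it shows is that per-operator peak values survive cleanup, while most memory contexts are gone by the time the dump runs.

```python
from dataclasses import dataclass, field


@dataclass
class OperatorAccount:
    """Hypothetical per-operator memory account (not HAWQ's real structure)."""
    name: str
    current: int = 0
    peak: int = 0

    def allocate(self, nbytes: int) -> None:
        self.current += nbytes
        self.peak = max(self.peak, self.current)

    def release_all(self) -> None:
        # Cleanup frees the memory, but the recorded peak is kept.
        self.current = 0


@dataclass
class Query:
    """Hypothetical stand-in for one running query on a segment."""
    accounts: list = field(default_factory=list)
    contexts: dict = field(default_factory=dict)  # context name -> bytes held

    def total_usage(self) -> int:
        return sum(a.current for a in self.accounts)


def cleanup(query: Query, long_lived=("TopMemoryContext",)) -> None:
    """Free memory as fast as possible; nothing is logged here."""
    for account in query.accounts:
        account.release_all()
    # Assumption: only a few long-lived contexts survive cleanup.
    query.contexts = {k: v for k, v in query.contexts.items() if k in long_lived}


def terminate_runaway(queries: list) -> list:
    """Terminate the biggest consumer, then dump its usage after cleanup."""
    victim = max(queries, key=Query.total_usage)
    cleanup(victim)  # step 1: release memory first
    # Step 2: dump afterwards; peaks survive, most contexts do not.
    lines = [f"account {a.name}: peak={a.peak}" for a in victim.accounts]
    lines += [f"context {n}: bytes={b}" for n, b in victim.contexts.items()]
    return lines


if __name__ == "__main__":
    q = Query(
        [OperatorAccount("HashJoin"), OperatorAccount("SeqScan")],
        {"TopMemoryContext": 1024, "ExecutorState": 4096},
    )
    q.accounts[0].allocate(5000)
    q.accounts[1].allocate(200)
    for line in terminate_runaway([q]):
        print(line)
```

In this sketch only the terminated query is dumped, and the dump runs after the memory has been released, so the termination path itself stays short.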
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/foyzur/incubator-hawq runaway
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hawq/pull/599.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #599
commit 8290db59b258446277a4e3e0db1c86d595fe
Author: Foyzur Rahman
Date: 2016-04-06T19:47:29Z
HAWQ-665. Dumping memory usage during runaway query termination.
Previously when we ran out of memory, we logged the memory usage of
all the running queries on the segment where OOM happened. However,
if runaway termination successfully terminates the biggest offender,
before we hit OOM, we did not have any logging mechanism. This left
us with insufficient information for root cause analysis of memory leaks.
This PR is introducing logging of memory usage for only the runaway query.
Note, we do not log usage for all the other queries, like we do for out
of memory. Rather we restrict ourselves to only the biggest violator that
is getting terminated. Moreover, because of the sensitive nature of runaway
termination, we need to terminate as fast as possible, to prevent hitting
an out of memory in other processes. Therefore, we cannot log the full
memory context dump at the time of runaway cleanup. Rather, we attempt to
dump memory usage after the cleanup. This gives us the details of memory
accounting peak usage for all the operators. For memory context, however,
we only dump a partial memory context tree, as most of the memory contexts
will have been dropped
by then. This is still a valuable piece of information to do root cause
analysis as the memory accounting tree represents the execution plan closely
and helps us pinpoint the operator where we might have excessive memory
consumption.
Signed-off-by: Nikos Armenatzoglou
> Dump memory usage information during runaway query termination
> --
>
> Key: HAWQ-665
> URL: https://issues.apache.org/jira/browse/HAWQ-665
> Project: Apache HAWQ
> Issue Type: New Feature
> Components: Query Execution
> Reporter: Foyzur Rahman
> Assignee: Foyzur Rahman
>
> Currently when we run out of memory, we log the memory usage of all the
> running queries on the segment where OOM happened. However, if runaway
> termination successfully terminates the biggest offender before we hit OOM,
> we do not have any logging mechanism. This leaves us with insufficient
> information for root cause analysis of memory leaks.