[
https://issues.apache.org/jira/browse/TRAFODION-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315550#comment-16315550
]
Selvaganesan Govindarajan commented on TRAFODION-2888:
------------------------------------------------------
It is true that every effort should be made to ensure the process continues to
run, but to retain the stability of the cluster it should be acceptable to
bring down a process or a node.
I think the out-of-memory (OOM) condition and memory allocation failure are
entirely orthogonal.
An OOM condition can happen when there is memory pressure or RAM exhaustion.
It could be due to:
a) There being more processes in the system than the system can handle
b) Some processes building up virtual memory due to a memory leak
c) Running out of swap space
Memory allocation failure rarely happens in a 64-bit addressing scheme unless
some limit, such as the number of PTEs (page table entries), is reached either
at the process or the system level. The process dump I analyzed had allocated a
huge amount of memory, of which only 1.6 GB was accounted SQL memory allocated
via Trafodion heap management.
An OOM condition can lead to memory allocation failure, but by then it is too
late: the OOM killer would already have kicked in and killed some process,
making the node unusable anyway.
If longjmp/setjmp needs to work correctly with the heap, it needs to be
associated with the top-level heap cli_globals::executorMemory (even for ESP
processes) because EsgynDB heap management is hierarchical: a lower-level heap
asks its parent heap to allocate a block whenever it can't satisfy a request
from an already allocated block, and this continues until the request reaches
the top-level heap. In multi-threaded ESPs this heap is used from multiple
threads by marking it thread safe. But setjmp/longjmp is thread-safe only if
the code never does setjmp in one thread and longjmp to that context from
another thread, and it is not possible to guarantee that in a multi-threaded
ESP.
NAMemory::setJmpBuf is supposed to assert when threadSafe is set to true.
In legacy Trafodion code, all memory allocations in the executor came from the
Heap infrastructure. But that isn't the case anymore: I have seen Trafodion
heap memory constitute only 10-20% of the total virtual memory of the process.
In some scenarios it could be much worse because of memory fragmentation, as
seen in the core dump. So a memory allocation failure (if it happens) is most
likely to occur in other parts of the code.
So, it is imperative that memory growth and leaks are managed proactively in
Trafodion processes. My suggestion would be to check for memory pressure in
the cluster, or virtual memory growth in the process, at logical points. For
example, it is possible to prevent new queries in mxosrvr if the virtual
memory of the mxosrvr process exceeds a certain value. This restriction makes
sense when the application needs to execute multiple statements
simultaneously. If only one user SQL statement is active at any point in time,
then the memory growth seen in the mxosrvr process is most likely due to a
memory leak. Currently the Trafodion code doesn't detect this memory leak and
recover mxosrvr from it before the next user SQL statement is submitted to
it. But it is possible to incorporate such self-healing concepts.
To ensure that the process continues to run, the setjmp/longjmp concepts are
retained in the compiler for all cases other than memory allocation failure
(which shouldn't happen at all).
> Streamline setjmp/longjmp concepts in Trafodion
> -----------------------------------------------
>
> Key: TRAFODION-2888
> URL: https://issues.apache.org/jira/browse/TRAFODION-2888
> Project: Apache Trafodion
> Issue Type: Improvement
> Components: sql-general
> Reporter: Selvaganesan Govindarajan
> Assignee: Selvaganesan Govindarajan
> Fix For: 2.3
>
>
> I happened to come across a core dump with a longjmp in the executor layer
> that brought down the node. Unfortunately, the core dump wasn't useful for
> figuring out the root cause of the longjmp. Hence:
> a) I wonder whether there is a way to figure out what caused the longjmp from
> the core?
> b) If not, why do the longjmp at all? It might be better to let the process
> dump naturally by accessing the invalid address or null pointer right at the
> point of failure. Was longjmp put in place in the legacy Trafodion code base
> to avoid the node being brought down when privileged code gets into a segment
> violation?
> If a) is not possible, I would want to remove the remnants of setjmp and
> longjmp from the code to enable us to debug such issues better.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)