[ https://issues.apache.org/jira/browse/HADOOP-17209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181071#comment-17181071 ]

Stephen O'Donnell edited comment on HADOOP-17209 at 8/20/20, 9:43 AM:
----------------------------------------------------------------------

From the tutorial posted here:

[http://www.iitk.ac.in/esc101/05Aug/tutorial/native1.1/implementing/array.html]

It does indeed seem that you must call ReleaseIntArrayElements for each call 
to GetIntArrayElements, so the change makes sense to me. However, I have 
never used JNI, so my knowledge in this area is limited.
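
To illustrate the pairing, here is a minimal self-contained sketch (the helper 
sumIntArray is hypothetical, not code from Hadoop): every GetIntArrayElements 
must be matched by a ReleaseIntArrayElements, otherwise the buffer the JVM may 
have copied for the native side is never freed.
{code:java}
#include <jni.h>

/* Hypothetical helper: sums a Java int[] from native code. */
static jlong sumIntArray(JNIEnv *env, jintArray arr) {
  jint *elems = (*env)->GetIntArrayElements(env, arr, NULL);
  if (elems == NULL) {
    return 0; /* OutOfMemoryError has already been thrown */
  }

  jsize len = (*env)->GetArrayLength(env, arr);
  jlong sum = 0;
  jsize i;
  for (i = 0; i < len; i++) {
    sum += elems[i];
  }

  /* Every Get must be paired with a Release, or the (possibly copied)
     buffer leaks. JNI_ABORT frees the buffer without copying changes
     back, which is fine here since we only read the elements. */
  (*env)->ReleaseIntArrayElements(env, arr, elems, JNI_ABORT);
  return sum;
}
{code}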

Grepping the code for GetIntArrayElements, I see there are currently 3 
occurrences:
{code:java}
$ pwd
/Users/sodonnell/source/upstream_hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/erasurecode
$ grep GetIntArrayElements *.c
jni_common.c:  tmpInputOffsets = (int*)(*env)->GetIntArrayElements(env,
jni_common.c:  tmpOutputOffsets = (int*)(*env)->GetIntArrayElements(env,
jni_rs_decoder.c:  int* tmpErasedIndexes = (int*)(*env)->GetIntArrayElements(env,
{code}
-This patch addresses 2 of them - in jni_common.c in the function getOutputs, do we need a call to ReleaseIntArrayElements there too?-

This patch seems to address all 3. Earlier I missed the 3rd change in the patch 
file.

[~seanlook] Have you been running with this patch in production for some time, 
and all EC operations are working fine with it?


was (Author: sodonnell):
From the tutorial posted here:

http://www.iitk.ac.in/esc101/05Aug/tutorial/native1.1/implementing/array.html

It does indeed seem that you must call ReleaseIntArrayElements for each call 
to GetIntArrayElements, so the change makes sense to me. However, I have 
never used JNI, so my knowledge in this area is limited.

Grepping the code for GetIntArrayElements, I see there are currently 3 
occurrences:

{code}
$ pwd
/Users/sodonnell/source/upstream_hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/erasurecode
$ grep GetIntArrayElements *.c
jni_common.c:  tmpInputOffsets = (int*)(*env)->GetIntArrayElements(env,
jni_common.c:  tmpOutputOffsets = (int*)(*env)->GetIntArrayElements(env,
jni_rs_decoder.c:  int* tmpErasedIndexes = (int*)(*env)->GetIntArrayElements(env,
{code}

This patch addresses 2 of them - in jni_common.c in the function getOutputs, do 
we need a call to ReleaseIntArrayElements there too?

{code}
void getOutputs(JNIEnv *env, jobjectArray outputs, jintArray outputOffsets,
                              unsigned char** destOutputs, int num) {
  int numOutputs = (*env)->GetArrayLength(env, outputs);
  int i, *tmpOutputOffsets;
  jobject byteBuffer;

  if (numOutputs != num) {
    THROW(env, "java/lang/InternalError", "Invalid outputs");
  }

  tmpOutputOffsets = (int*)(*env)->GetIntArrayElements(env,
                                                          outputOffsets, NULL);
  for (i = 0; i < numOutputs; i++) {
    byteBuffer = (*env)->GetObjectArrayElement(env, outputs, i);
    destOutputs[i] = (unsigned char *)((*env)->GetDirectBufferAddress(env,
                                                                  byteBuffer));
    destOutputs[i] += tmpOutputOffsets[i];
  }
}
{code}
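
If a release is needed there, it would presumably be a one-line addition after 
the loop, along these lines (a sketch of the idea, not the actual patch):
{code}
  /* Sketch only: pair the GetIntArrayElements above with a release.
     JNI_ABORT because tmpOutputOffsets is only read, so there are no
     changes to copy back into the Java array. */
  (*env)->ReleaseIntArrayElements(env, outputOffsets,
                                  (jint *) tmpOutputOffsets, JNI_ABORT);
{code}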

[~seanlook] Have you been running with this patch in production for some time, 
and all EC operations are working fine with it?

> Erasure Coding: Native library memory leak
> ------------------------------------------
>
>                 Key: HADOOP-17209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17209
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 3.3.0, 3.2.1, 3.1.3
>            Reporter: Sean Chow
>            Assignee: Sean Chow
>            Priority: Major
>         Attachments: HADOOP-17209.001.patch, 
> datanode.202137.detail_diff.5.txt, image-2020-08-15-18-26-44-744.png, 
> image-2020-08-20-12-35-39-906.png
>
>
> We use both {{apache-hadoop-3.1.3}} and {{CDH-6.1.1-1.cdh6.1.1.p0.875250}} 
> HDFS in production, and both of them show memory growing beyond the {{-Xmx}} 
> value. 
> !image-2020-08-15-18-26-44-744.png!
>  
> We use the EC strategy to save storage costs.
> These are the JVM options:
> {code:java}
> -Dproc_datanode -Dhdfs.audit.logger=INFO,RFAAUDIT 
> -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true 
> -Xms8589934592 -Xmx8589934592 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled 
> -XX:+HeapDumpOnOutOfMemoryError ...{code}
> The max JVM heap size is 8GB, but we can see the datanode RSS memory is 48g. 
> All the other datanodes in this hdfs cluster have the same issue.
> {code:java}
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
> 226044 hdfs 20 0 50.6g 48g 4780 S 90.5 77.0 14728:27 /usr/java/jdk1.8.0_162/bin/java -Dproc_datanode{code}
>  
> This excessive memory usage makes the machine unresponsive (if swap is 
> enabled), or triggers the oom-killer.
>  


