[
https://issues.apache.org/jira/browse/HADOOP-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647026#action_12647026
]
Pete Wyckoff commented on HADOOP-4635:
--------------------------------------
It's this code in getGroups that is causing the (a?) leak:
{code}
groupnames = (char**)malloc(sizeof(char*)* (*num_groups) + 1); /* allocating num_groups + 1 */
assert(groupnames);
int i;
for (i=0; i < *num_groups; i++) {
groupnames[i] = getGroup(grouplist[i]);
if (groupnames[i] == NULL) {
fprintf(stderr, "error could not lookup group %d\n",(int)grouplist[i]);
}
}
free(grouplist);
assert(user != NULL);
groupnames[i] = user; /* setting position beyond num_groups - never released */
{code}
The last entry, groupnames[i] = user, is never released because freeGroups only frees the first numgroups entries:
{code}
static void freeGroups(char **groups, int numgroups) {
if (groups == NULL) {
return;
}
int i ;
for (i = 0; i < numgroups; i++) {
free(groups[i]);
}
free(groups);
}
{code}
Also note that Hadoop never sees the last group either. I think we need a
{code}
*num_groups = *num_groups + 1;
{code}
in getGroups after setting groupnames[i] = user, to fix both the leak and Hadoop not
seeing the last group.
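A minimal sketch of what the corrected logic could look like, not the actual fuse-dfs patch. build_groups, free_groups and dup_string are hypothetical stand-ins for getGroups, freeGroups and getGroup, and the names are passed in rather than looked up. Note also that sizeof(char*) * (*num_groups) + 1 is evaluated as (sizeof(char*) * n) + 1 bytes, so the sketch parenthesizes the n + 1 to reserve a full extra pointer slot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of s (stands in for getGroup). */
static char *dup_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Hypothetical stand-in for getGroups: copies n group names, appends the
 * user name as the last entry, and reports the full count in *num_groups. */
static char **build_groups(const char **names, int n, const char *user,
                           int *num_groups) {
    assert(num_groups != NULL);
    /* (n + 1) is parenthesized so a whole extra pointer slot is reserved. */
    char **groupnames = malloc(sizeof(char *) * ((size_t)n + 1));
    if (groupnames == NULL) {
        return NULL;
    }
    int i;
    for (i = 0; i < n; i++) {
        groupnames[i] = dup_string(names[i]);
    }
    groupnames[i] = dup_string(user);
    /* Count the appended entry so the caller sees it and frees it. */
    *num_groups = n + 1;
    return groupnames;
}

/* Same shape as freeGroups: frees exactly numgroups entries, then the array. */
static void free_groups(char **groups, int numgroups) {
    if (groups == NULL) {
        return;
    }
    int i;
    for (i = 0; i < numgroups; i++) {
        free(groups[i]);
    }
    free(groups);
}
```

With the count incremented, the caller passes the same number to free_groups that build_groups reported, so the user entry is both visible to the caller and released with the rest.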
-- pete
> Memory leak ?
> -------------
>
> Key: HADOOP-4635
> URL: https://issues.apache.org/jira/browse/HADOOP-4635
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/fuse-dfs
> Affects Versions: 0.20.0
> Reporter: Marc-Olivier Fleury
>
> I am running a process that needs to crawl a tree structure containing ~10K
> images, copy the images to the local disk, process these images, and copy
> them back to HDFS.
> My problem is the following: after about 10h of processing, the processes
> crash, complaining about a std::bad_alloc exception (I use hadoop pipes to
> run existing software). When running fuse_dfs in debug mode, I get an
> OutOfMemoryError, reporting that there is no more room in the heap.
> While the process is running, using top or ps, I notice that fuse is using up
> an increasing amount of memory, until some limit is reached. At that point,
> the memory used oscillates. I suppose that this is due to the use of
> virtual memory.
> This leads me to the conclusion that there is some memory leak in fuse_dfs,
> since the only other programs running are Hadoop and the existing software,
> both thoroughly tested in the past.
> My problem is that my knowledge concerning memory leak tracking is rather
> limited, so I will need some instructions to get more insight concerning this
> issue.
> Thank you
