My plan is currently to:

* switch some of Hadoop's Yetus jobs over to my branch with the YETUS-561 patch to test it out
* if the tests work, work on getting YETUS-561 committed to Yetus master
* switch jobs back to ASF Yetus master, either post-YETUS-561 or without it if it doesn't work
* go back to working on something else, regardless of the outcome
> On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdoug...@apache.org> wrote:
>
> Sean/Junping-
>
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
>
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
>
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
>> In general, the "solid evidence" of a memory leak comes from analysis of
>> heap dumps, jstack output, GC logs, etc. In many cases, we can locate and
>> conclude which piece of code is leaking memory from that analysis.
>>
>> Unfortunately, I cannot find any conclusion in the previous comments, and
>> they don't even tell which daemons/components of HDFS consume unexpectedly
>> high memory. That doesn't sound like a solid bug report to me.
>>
>> Thanks,
>>
>> Junping
>>
>> ________________________________
>> From: Sean Busbey <bus...@cloudera.com>
>> Sent: Tuesday, October 24, 2017 2:20 PM
>> To: Junping Du
>> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev;
>> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>>
>> Just curious, Junping: what would "solid evidence" look like? Is the
>> supposition here that the memory leak is within HDFS test code rather than
>> library runtime code? How would such a distinction be shown?
>>
>> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
>> Allen,
>> Do we have any solid evidence to show that the HDFS unit tests going
>> through the roof are due to a serious memory leak in HDFS? Normally, I
>> don't expect memory leaks to be identified in our UTs - mostly, the test
>> JVM going away is just because of test or deployment issues.
>> Unless there is concrete evidence, my concern about a serious memory leak
>> in HDFS 2.8 is relatively low, given that some companies (Yahoo, Alibaba,
>> etc.) have deployed 2.8 in large production environments for months.
>> Non-serious memory leaks (like forgetting to close a stream in a
>> non-critical path, etc.) and other non-critical bugs always happen here
>> and there; we have to live with them.
>>
>> Thanks,
>>
>> Junping
>>
>> ________________________________________
>> From: Allen Wittenauer <a...@effectivemachines.com>
>> Sent: Tuesday, October 24, 2017 8:27 AM
>> To: Hadoop Common
>> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>>
>>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer
>>> <a...@effectivemachines.com> wrote:
>>>
>>> With no other information or access to go on, my current hunch is that
>>> one of the HDFS unit tests is ballooning in memory size. The easiest way
>>> to kill a Linux machine is to eat all of its RAM, thanks to overcommit,
>>> and that's what this "feels" like.
>>>
>>> Someone should verify whether 2.8.2 has the same issues before a release
>>> goes out ...
>>
>> FWIW, I ran 2.8.2 last night and it has the same problems.
>>
>> Also: the node didn't die!
>> Looking through the workspace (so the next run will destroy them), two
>> sets of logs stand out:
>>
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>>
>> and
>>
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>>
>> It looks like my hunch is correct: RAM usage in the HDFS unit tests is
>> going through the roof. It's also interesting how MANY log files there
>> are. Is surefire not picking up that jobs are dying? Maybe not, if memory
>> is getting tight.
>>
>> Anyway, at this point, branch-2.8 and higher are probably fubar'd.
>> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker
>> containers can have their RAM limits set in order to prevent more nodes
>> going catatonic.
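On Junping's point above about heap dumps being the useful evidence: if we can get into one of the test JVMs before it goes away, a dump can be triggered programmatically via the standard JDK HotSpotDiagnosticMXBean. A minimal sketch follows; the class name and output path are just illustrative, nothing here is wired into the build.

```java
import java.lang.management.ManagementFactory;

import com.sun.management.HotSpotDiagnosticMXBean;

public final class HeapDumper {
  private HeapDumper() {
  }

  /**
   * Write an hprof snapshot of the current JVM's heap. Passing live=true
   * records only reachable objects, which keeps the file smaller and is
   * usually enough to see what is accumulating.
   */
  public static void dump(String path) throws Exception {
    HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
        ManagementFactory.getPlatformMBeanServer(),
        "com.sun.management:type=HotSpotDiagnostic",
        HotSpotDiagnosticMXBean.class);
    diag.dumpHeap(path, true);
  }

  public static void main(String[] args) throws Exception {
    // e.g. java HeapDumper /tmp/hdfs-test-heap.hprof (path is illustrative)
    dump(args.length > 0 ? args[0] : "heap.hprof");
  }
}
```

Grabbing the same thing from outside with jmap against the forked JVM's pid would also work, assuming the node is still responsive enough to run it.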
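Since surefire doesn't seem to be catching the dying forks, it might also help to log heap usage from inside the tests so the ballooning test is identifiable from the captured output even after the JVM goes away. A rough sketch, assuming something like this gets called from a JUnit @After hook; the HeapWatch name and log format are made up for illustration:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Logs current heap numbers so a ballooning test shows up in the test output. */
public final class HeapWatch {
  private HeapWatch() {
  }

  public static void logHeap(String label) {
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    MemoryUsage heap = memory.getHeapMemoryUsage();
    // Shift by 20 to report MB; stderr so it ends up in whatever output surefire captures.
    System.err.printf("HEAP %s: used=%dMB committed=%dMB max=%dMB%n",
        label,
        heap.getUsed() >> 20,
        heap.getCommitted() >> 20,
        heap.getMax() >> 20);
  }
}
```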