My plan is currently to:

* switch some of Hadoop's Yetus jobs over to my branch with the YETUS-561 
patch to test it out
* if the tests work, work on getting YETUS-561 committed to Yetus master
* switch jobs back to ASF Yetus master, either post-YETUS-561 or without it 
if it doesn't work
* go back to working on something else, regardless of the outcome


> On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdoug...@apache.org> wrote:
> 
> Sean/Junping-
> 
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
> 
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
> 
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
>> In general, the "solid evidence" of a memory leak comes from analysis of 
>> heap dumps, jstack output, GC logs, etc. In many cases, we can locate/conclude 
>> which piece of code is leaking memory from that analysis.
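For reference, the artifacts described above can be gathered from a suspect test JVM with the standard JDK tools; a hedged sketch (the `surefirebooter` process-name filter is an assumption about how surefire forks show up in `ps`, and the output file names are illustrative):

```shell
# Find the suspect test JVM (the process-name filter is a guess; adjust to taste).
pid=$(pgrep -f 'surefirebooter' | head -1)

# Capture a thread dump, a live-object heap dump, and a heap summary.
jstack "$pid" > jstack.txt
jmap -dump:live,format=b,file=heap.hprof "$pid"
jmap -heap "$pid" > heap-summary.txt

# GC logs have to be enabled at JVM startup, e.g. with
#   -Xloggc:gc.log -XX:+PrintGCDetails
```

The resulting `heap.hprof` can then be opened in a heap analyzer to attribute the growth to a specific component.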
>> 
>> Unfortunately, I cannot find any conclusion from the previous comments, and 
>> they cannot even tell which daemons/components of HDFS consume unexpectedly 
>> high memory. Doesn't sound like a solid bug report to me.
>> 
>> 
>> 
>> Thanks,
>> 
>> 
>> Junping
>> 
>> 
>> ________________________________
>> From: Sean Busbey <bus...@cloudera.com>
>> Sent: Tuesday, October 24, 2017 2:20 PM
>> To: Junping Du
>> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; 
>> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>> Just curious, Junping: what would "solid evidence" look like? Is the 
>> supposition here that the memory leak is within HDFS test code rather than 
>> library runtime code? How would such a distinction be shown?
>> 
>> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du 
>> <j...@hortonworks.com> wrote:
>> Allen,
>>     Do we have any solid evidence to show that the HDFS unit tests going through 
>> the roof are due to a serious memory leak in HDFS? Normally, I don't expect 
>> memory leaks to be identified in our UTs - mostly, it (the test JVM going away) 
>> is just because of test or deployment issues.
>>     Unless there is concrete evidence, my concern about a serious memory leak 
>> in HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba, 
>> etc.) have deployed 2.8 in large production environments for months. 
>> Non-serious memory leaks (like forgetting to close a stream in a non-critical 
>> path, etc.) and other non-critical bugs always happen here and there; those 
>> we have to live with.
>> 
>> Thanks,
>> 
>> Junping
>> 
>> ________________________________________
>> From: Allen Wittenauer <a...@effectivemachines.com>
>> Sent: Tuesday, October 24, 2017 8:27 AM
>> To: Hadoop Common
>> Cc: Hdfs-dev; 
>> mapreduce-dev@hadoop.apache.org; 
>> yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer 
>>> <a...@effectivemachines.com> wrote:
>>> 
>>> 
>>> 
>>> With no other information or access to go on, my current hunch is that one 
>>> of the HDFS unit tests is ballooning in memory size.  The easiest way to 
>>> kill a Linux machine is to eat all of the RAM, thanks to overcommit, and 
>>> that's what this "feels" like.
>>> 
>>> Someone should verify if 2.8.2 has the same issues before a release goes 
>>> out ...
>> 
>> 
>>        FWIW, I ran 2.8.2 last night and it has the same problems.
>> 
>>        Also: the node didn't die!  Looking through the workspace (so the 
>> next run will destroy them), two sets of logs stand out:
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>> 
>>                                                        and
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>> 
>>        It looks like my hunch is correct: RAM usage in the HDFS unit tests is 
>> going through the roof.  It's also interesting how MANY log files there are. 
>> Is surefire not picking up that forked test JVMs are dying?  Maybe not, if 
>> memory is getting tight.
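One common way to keep forked surefire JVMs from ballooning is to cap their heap through surefire's `argLine`; a hypothetical invocation (the 2g value and dump path are illustrative, not Hadoop's actual settings):

```shell
# Cap each forked test JVM's heap and capture a heap dump if one does blow up
# (values are illustrative; tune -Xmx to the job's actual budget).
mvn test \
    -DargLine="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
```

With a hard cap, a leaking test fails with an OutOfMemoryError and a dump to analyze, instead of silently starving the whole node.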
>> 
>>        Anyway, at this point, branch-2.8 and higher are probably fubar'd. 
>> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker 
>> containers can have their RAM limits set, in order to prevent more nodes 
>> from going catatonic.
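The kind of limit YETUS-561 describes can be sketched with Docker's own flags; a hedged example (the `--memory`/`--memory-swap` flags are Docker's, but the 4g value and image name are illustrative, not what Yetus actually passes):

```shell
# Run the build inside a container with a hard RAM ceiling, so a leaking
# test JVM gets OOM-killed inside the container instead of taking down
# the whole node (image name is hypothetical).
docker run --rm --memory=4g --memory-swap=4g hypothetical-build-image \
    mvn test
```

Setting `--memory-swap` equal to `--memory` disables swap for the container, so the limit is a true ceiling rather than a trigger for thrashing.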
>> 
>> 
>> 
>> 
>> --
>> busbey
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
