[ https://issues.apache.org/jira/browse/HADOOP-17224?focusedWorklogId=525284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525284 ]
ASF GitHub Bot logged work on HADOOP-17224:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Dec/20 21:40
            Start Date: 16/Dec/20 21:40
    Worklog Time Spent: 10m
      Work Description: amahussein commented on pull request #2537:
URL: https://github.com/apache/hadoop/pull/2537#issuecomment-747057646

   > All OOMs are "unable to create new native thread" indicating ulimit or resource shortage to create LWP. The first OOM is in TestJvmMetrics in hadoop-common. If ISA-L is related, the cause should be in the code path of ErasureCodeNative#loadLibrary. I don't have clear insight yet. I think we have been familiar with test failures by "unable to create new native thread" for a long time.

   @iwasakims, I am not fully convinced that looking only at the `ErasureCodeNative#loadLibrary` code path is enough to conclude that ISA-L does not contribute to the OOM.
   ISA-L is a native library, so loading it means different memory allocations and possibly some background threads.
   We certainly do not want to blame ISA-L for those pre-existing failures. However, adding ISA-L could increase the number of failures, whether through the Hadoop code or the native code.

   I think there are two approaches:
   1. Profile the memory, then compare the two profiles with and without ISA-L. If there is no Yetus hookup to do that, it will have to be done on a local machine for a sample of unit tests.
   2. Add another commit that ignores the failures frequently reported in the QBT report.

   In addition, I suggest adding "ignore" to `TestDistributedShell#testDistributedShellWithResourcesWithLargeContainers` and `TestDistributedShell#testDistributedShellWithResources`. Those two tests leave two ApplicationMaster processes running in the background.
   After ignoring the "every-day" failures, we can look at the remaining failures as possible consequences of loading ISA-L.
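As a rough starting point for approach 1 above, the JVM's own management beans can report thread counts and JVM-managed memory before and after a piece of work. The following is only a minimal sketch under stated assumptions: `ResourceSnapshot`, `take`, and `describe` are hypothetical names that do not exist in Hadoop, and the workload in `main` is a placeholder. These beans see only Java threads and JVM-managed memory pools; memory allocated by native code, and native threads that never attach to the JVM, would have to be measured with OS-level tools instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper (not part of Hadoop): a point-in-time view of JVM threads and memory. */
final class ResourceSnapshot {
    final int liveThreads;   // live Java threads, daemon and non-daemon
    final int peakThreads;   // peak live thread count since JVM start (or last reset)
    final long heapUsed;     // bytes of heap currently in use
    final long nonHeapUsed;  // bytes of JVM-managed non-heap memory in use

    private ResourceSnapshot(int liveThreads, int peakThreads, long heapUsed, long nonHeapUsed) {
        this.liveThreads = liveThreads;
        this.peakThreads = peakThreads;
        this.heapUsed = heapUsed;
        this.nonHeapUsed = nonHeapUsed;
    }

    /** Reads the current values from the standard platform MXBeans. */
    static ResourceSnapshot take() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return new ResourceSnapshot(
            threads.getThreadCount(),
            threads.getPeakThreadCount(),
            memory.getHeapMemoryUsage().getUsed(),
            memory.getNonHeapMemoryUsage().getUsed());
    }

    /** Formats the signed change from one snapshot to another. */
    static String describe(ResourceSnapshot before, ResourceSnapshot after) {
        return String.format(
            "threads %+d, peak threads %+d, heap %+d bytes, non-heap %+d bytes",
            after.liveThreads - before.liveThreads,
            after.peakThreads - before.peakThreads,
            after.heapUsed - before.heapUsed,
            after.nonHeapUsed - before.nonHeapUsed);
    }

    public static void main(String[] args) throws InterruptedException {
        ResourceSnapshot before = take();

        // Placeholder workload: in a real comparison this is where the code
        // under suspicion (for example, a test that triggers native library
        // loading) would run. Here we only start and join one short thread.
        Thread worker = new Thread(() -> { /* no-op */ });
        worker.start();
        worker.join();

        ResourceSnapshot after = take();
        System.out.println(describe(before, after));
    }
}
```

Running the same measurement once with ISA-L installed and once without would give two sets of deltas to compare; heap numbers fluctuate with garbage collection, so they are only indicative for a single run.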
Issue Time Tracking
-------------------

    Worklog Id:     (was: 525284)
    Time Spent: 3h 10m  (was: 3h)

> Install Intel ISA-L library in Dockerfile
> -----------------------------------------
>
>                 Key: HADOOP-17224
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17224
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Takanobu Asanuma
>            Assignee: Takanobu Asanuma
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently, there is no isa-l library in the Docker container, and Jenkins
> skips the native tests, TestNativeRSRawCoder and TestNativeXORRawCoder.