[ 
https://issues.apache.org/jira/browse/HADOOP-17224?focusedWorklogId=525284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525284
 ]

ASF GitHub Bot logged work on HADOOP-17224:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Dec/20 21:40
            Start Date: 16/Dec/20 21:40
    Worklog Time Spent: 10m 
      Work Description: amahussein commented on pull request #2537:
URL: https://github.com/apache/hadoop/pull/2537#issuecomment-747057646


   > All OOMs are "unable to create new native thread" indicating ulimit or 
resource shortage to create LWP. The first OOM is in TestJvmMetrics in 
hadoop-common. If ISA-L is related, the cause should be in the code path of 
ErasureCodeNative#loadLibrary. I don't have clear insight yet. I think we have 
been familiar with test failures by "unable to create new native thread" for a 
long time..
   
   @iwasakims, I am not fully confident that `ErasureCodeNative#loadLibrary` 
is a strong indication that ISA-L does not contribute to the OOM.
   ISA-L is a native library; therefore loading this library means different 
memory allocations and possibly some background threads.
   
   For sure, we do not want to blame those pre-existing failures on ISA-L. 
However, adding ISA-L could increase failures, either because of the Hadoop 
code or because of the native code.
   
   I think there are two approaches:
   
   1. Profile the memory. Then compare the two profiles with and without ISA-L. 
If there is no Yetus hookup to do that, then it will have to be done on a local 
machine for a sample of unit tests.
   2. Add another commit that ignores the failures frequently reported in the 
QBT report. In addition, I suggest adding "ignore" to 
`TestDistributedShell#testDistributedShellWithResourcesWithLargeContainers` 
and `TestDistributedShell#testDistributedShellWithResources`. Those two tests 
leave two ApplicationMaster processes running in the background. After ignoring 
the "every-day" failures, we can look at the remaining failures as possible 
consequences of loading ISA-L.
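
   As a rough illustration of approach 1: since the reported OOMs are "unable 
to create new native thread", a cheap complement to a memory profile could be 
to record thread counts around a test workload and compare a run with ISA-L 
against a run without it. The sketch below is only an assumption of how that 
might look; `ThreadCountProbe` and `measure` are hypothetical names, not part 
of Hadoop or Yetus, and the workload in `main` is a placeholder.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper (not part of Hadoop): records JVM thread counts around a workload. */
public final class ThreadCountProbe {
    private ThreadCountProbe() {}

    /** Runs the workload and returns {liveBefore, liveAfter, peakDuring}. */
    public static int[] measure(Runnable workload) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Reset the peak to the current live count so the peak reflects only this workload.
        threads.resetPeakThreadCount();
        int before = threads.getThreadCount();
        workload.run();
        int after = threads.getThreadCount();
        int peak = threads.getPeakThreadCount();
        return new int[] {before, after, peak};
    }

    public static void main(String[] args) {
        // Placeholder workload: start one short-lived thread and wait for it.
        int[] counts = measure(() -> {
            Thread t = new Thread(() -> { });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("before=%d after=%d peak=%d%n", counts[0], counts[1], counts[2]);
    }
}
```

   Note that `ThreadMXBean` only reports threads known to the JVM. Threads 
created inside a native library and never attached to the JVM would not show 
up here, so an OS-level count would still be needed to cover that case.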
    
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 525284)
    Time Spent: 3h 10m  (was: 3h)

> Install Intel ISA-L library in Dockerfile
> -----------------------------------------
>
>                 Key: HADOOP-17224
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17224
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Takanobu Asanuma
>            Assignee: Takanobu Asanuma
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently, there is no isa-l library in the Docker container, and Jenkins 
> skips the native tests, TestNativeRSRawCoder and TestNativeXORRawCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
