[ https://issues.apache.org/jira/browse/HDFS-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049734#comment-14049734 ]

Mickael Olivier commented on HDFS-6515:
---------------------------------------

As noted on https://issues.apache.org/jira/browse/HDFS-6608, the bug might be 
related to the hard-coded limit maxBytes = 65536 bytes, assigned at the 
beginning of TestFsDatasetCache.java as follows:

// Most Linux installs allow a default of 64KB locked memory
private static final long CACHE_CAPACITY = 64 * 1024;
conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
        CACHE_CAPACITY);

Then in FsDatasetCache we have:

this.maxBytes = dataset.datanode.getDnConf().getMaxLockedMemory();

This call retrieves that value from the datanode configuration.
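
As a minimal sketch of that flow (assuming only the public Configuration API; 
the DNConf/DataNode plumbing is elided, and the class name here is just for 
illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class MaxLockedMemoryFlow {
  public static void main(String[] args) {
    // Mirror the test setup: limit locked memory to 64 KB.
    Configuration conf = new Configuration();
    conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, 64 * 1024);

    // DNConf.getMaxLockedMemory() ultimately reads this same key back,
    // so FsDatasetCache.maxBytes ends up equal to CACHE_CAPACITY.
    long maxBytes = conf.getLong(
        DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, 0L);
    System.out.println("maxBytes = " + maxBytes);  // prints 65536
  }
}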

So I tried changing it to something like:
private static final long CACHE_CAPACITY = 16 * 64 * 1024;

With that change, the forked test indeed logs maxBytes : 1048576!
But the count value is still capped at 4096, which is odd. So I finally get

verifyExpectedCacheUsage: have 20480/327680 bytes cached; 5/5 blocks cached. 
memlock limit = 1125899906842624.  Waiting...

each time the supplier checks whether the cache is being used as expected.
Although all 5 blocks appear to be cached, each reservation is still rounded to 
an osPageSize of 4096 (20480 = 5 * 4096), whereas the test expects 
327680 = 5 * 65536. That page size is what should be changed!

public long round(long count) {
  long newCount =
      (count + (osPageSize - 1)) / osPageSize;
  return newCount * osPageSize;
}

private final long osPageSize =
    NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
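
To double-check what that call actually returns on the PPC64 machine (where 
the kernel page size is typically 64 KB), a quick standalone check could look 
like the sketch below; it only reuses the call quoted above, and the class 
name is made up for illustration:

import org.apache.hadoop.io.nativeio.NativeIO;

public class PageSizeCheck {
  public static void main(String[] args) {
    // Ask the same cache manipulator FsDatasetCache uses for the OS page size.
    // On PPC64 Linux this should report 65536, not 4096.
    long pageSize =
        NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
    System.out.println("OS page size reported: " + pageSize);
  }
}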

On PPC64 that should give 65536, so when reserving 512 bytes we should get 
newCount = 1 and 65536 bytes reserved. Why is that not the case? (The first 
step logged is @@reserve:: count : 4096 | next : 4096 | maxBytes : 1048576.)
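
To make the arithmetic concrete, here is a small self-contained sketch of the 
same rounding with both page sizes (plain arithmetic, no Hadoop dependencies; 
class and method names are just for illustration):

public class RoundSketch {
  // Same round-up-to-a-page-multiple logic as FsDatasetCache.round().
  static long round(long count, long osPageSize) {
    long newCount = (count + (osPageSize - 1)) / osPageSize;
    return newCount * osPageSize;
  }

  public static void main(String[] args) {
    // With a 4 KB page size, reserving 512 bytes rounds to 4096,
    // matching the "count : 4096" seen in the logs.
    System.out.println(round(512, 4096));    // 4096
    // With the 64 KB page size expected on PPC64, it rounds to 65536.
    System.out.println(round(512, 65536));   // 65536
  }
}

So the observed behaviour matches getOperatingSystemPageSize() returning 4096 
during the test run instead of the expected 65536.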


> testPageRounder   (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-6515
>                 URL: https://issues.apache.org/jira/browse/HDFS-6515
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.4.0
>         Environment: Linux on PPC64
>            Reporter: Tony Reix
>            Priority: Blocker
>              Labels: test
>
> I have an issue with the test:
>    testPageRounder
>   (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
> on Linux/PowerPC.
> On Linux/Intel, test runs fine.
> On Linux/PowerPC, I have:
> testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)  
> Time elapsed: 64.037 sec  <<< ERROR!
> java.lang.Exception: test timed out after 60000 milliseconds
> Looking at details, I see that some "Failed to cache " messages appear in the 
> traces. Only 10 on Intel, but 186 on PPC64.
> On PPC64, it looks like some thread is waiting for something that never 
> happens, generating a TimeOut.
> I'm now using the IBM JVM; however, I've just checked that the issue also 
> appears with OpenJDK.
> I'm now using the latest Hadoop; however, the issue first appeared in Hadoop 2.4.0.
> I need help understanding what the test is doing and what traces are 
> expected, in order to find the root cause.


