[jira] [Created] (HDFS-8520) Patch for PPC64

2015-06-02 Thread Tony Reix (JIRA)
Tony Reix created HDFS-8520:
---

 Summary: Patch for PPC64
 Key: HDFS-8520
 URL: https://issues.apache.org/jira/browse/HDFS-8520
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
 Environment: RHEL 7.1 /PPC64
Reporter: Tony Reix
 Fix For: 2.7.1


The attached patch enables Hadoop to work on PPC64.
It deals with SystemPageSize and BlockSize, which are not 4096 on PPC64.

There are changes in 3 files:
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java

where 4096 is replaced by getOperatingSystemPageSize() or by using PAGE_SIZE.

The patch has been built on branch-2.7.
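
For illustration, here is a minimal, self-contained sketch of the kind of
substitution the description outlines (this is not the attached patch; the
class name PageSizeDemo is invented, while the NativeIO accessor is the one
quoted in HDFS-6608 below):

import org.apache.hadoop.io.nativeio.NativeIO;

// Sketch only: derive page/block size from the OS instead of hard-coding
// 4096, so the value is 4096 on x86_64 and 65536 on PPC64.
public class PageSizeDemo {
  // Before: private static final long PAGE_SIZE = 4096;
  private static final long PAGE_SIZE =
      NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
  private static final long BLOCK_SIZE = PAGE_SIZE;

  public static void main(String[] args) {
    System.out.println("OS page size: " + PAGE_SIZE
        + ", block size: " + BLOCK_SIZE);
  }
}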





[jira] [Created] (HDFS-8519) Patch for PPC64

2015-06-02 Thread Tony Reix (JIRA)
Tony Reix created HDFS-8519:
---

 Summary: Patch for PPC64
 Key: HDFS-8519
 URL: https://issues.apache.org/jira/browse/HDFS-8519
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
 Environment: RHEL 7.1 /PPC64
Reporter: Tony Reix
 Fix For: 2.7.1


The attached patch enables Hadoop to work on PPC64.
It deals with SystemPageSize and BlockSize, which are not 4096 on PPC64.

There are changes in 3 files:
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java

where 4096 is replaced by getOperatingSystemPageSize() or by using PAGE_SIZE.

The patch has been built on branch-2.7.





[jira] [Created] (HDFS-8518) Patch for PPC64

2015-06-02 Thread Tony Reix (JIRA)
Tony Reix created HDFS-8518:
---

 Summary: Patch for PPC64
 Key: HDFS-8518
 URL: https://issues.apache.org/jira/browse/HDFS-8518
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
 Environment: RHEL 7.1 /PPC64
Reporter: Tony Reix
 Fix For: 2.7.1


The attached patch enables Hadoop to work on PPC64.
It deals with SystemPageSize and BlockSize, which are not 4096 on PPC64.

There are changes in 3 files:
- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java

where 4096 is replaced by getOperatingSystemPageSize() or by using PAGE_SIZE.

The patch has been built on branch-2.7.





[jira] [Resolved] (HDFS-8518) Patch for PPC64

2015-06-02 Thread Tony Reix (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Reix resolved HDFS-8518.
-
Resolution: Invalid

 Patch for PPC64
 ---

 Key: HDFS-8518
 URL: https://issues.apache.org/jira/browse/HDFS-8518
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
 Environment: RHEL 7.1 /PPC64
Reporter: Tony Reix
 Fix For: 2.7.1


 The attached patch enables Hadoop to work on PPC64.
 It deals with SystemPageSize and BlockSize, which are not 4096 on PPC64.
 There are changes in 3 files:
 - hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
 - hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
 - hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
 where 4096 is replaced by getOperatingSystemPageSize() or by using PAGE_SIZE.
 The patch has been built on branch-2.7.





[jira] [Resolved] (HDFS-8519) Patch for PPC64

2015-06-02 Thread Tony Reix (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tony Reix resolved HDFS-8519.
-
Resolution: Invalid

 Patch for PPC64
 ---

 Key: HDFS-8519
 URL: https://issues.apache.org/jira/browse/HDFS-8519
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
 Environment: RHEL 7.1 /PPC64
Reporter: Tony Reix
 Fix For: 2.7.1


 The attached patch enables Hadoop to work on PPC64.
 It deals with SystemPageSize and BlockSize, which are not 4096 on PPC64.
 There are changes in 3 files:
 - hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java
 - hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java
 - hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
 where 4096 is replaced by getOperatingSystemPageSize() or by using PAGE_SIZE.
 The patch has been built on branch-2.7.





[jira] [Created] (HDFS-8506) List of 33 Unstable tests on branch-2.7

2015-06-01 Thread Tony Reix (JIRA)
Tony Reix created HDFS-8506:
---

 Summary: List of 33 Unstable tests on branch-2.7
 Key: HDFS-8506
 URL: https://issues.apache.org/jira/browse/HDFS-8506
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.1
 Environment: Ubuntu / x86_64
Hadoop branch-2.7 (source code of Monday May 26th)
Reporter: Tony Reix


On my Ubuntu / x86_64 machine, configured for Hadoop for months, I've run the 
Hadoop tests of branch-2.7 (source code of Monday May 26th) over several days. 
This produced 14 runs in the exact same environment, and it shows that several 
tests sometimes fail, randomly.
12 runs gave the exact same number of tests done and tests skipped:
- 10977 tests
- 254 skipped
One run gave only 10972 tests; another gave only 9760 tests.
I've used the 12 runs with 10977 tests to build the attached result file, 
which shows that 33 tests sometimes fail.

T: Tests
F: Failures
E: Errors
S: Skipped
NN/n : Number of times the issue appeared out of the 12 runs
m-M: minimum number of failures up to maximum number of failures.

Example:
                                 T  F    E    S |  NN/n
--------------------------------------------------------
cli.TestHDFSCLI                  1  0-1  0    0 | 11/12
hdfs.TestAppendSnapshotTruncate  1  0    0-1  0 |  1/12
...





[jira] [Created] (HDFS-6608) FsDatasetCache: hard-coded 4096 value in test is not appropriate for all HW

2014-06-30 Thread Tony Reix (JIRA)
Tony Reix created HDFS-6608:
---

 Summary: FsDatasetCache: hard-coded 4096 value in test is not 
appropriate for all HW
 Key: HDFS-6608
 URL: https://issues.apache.org/jira/browse/HDFS-6608
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
 Environment: PPC64 (LE & BE, OpenJDK & IBM JVM, Ubuntu, RHEL 7 & RHEL 6.5)
Reporter: Tony Reix


The value 4096 is hard-coded in the HDFS code (product and tests).
It appears 171 times, including 8 times in product (non-test) code:
hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs : 163
hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs : 4
hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http : 3
hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/wsrs : 1

This value deals with different subjects: files, block size, page size, etc.
4096 (as block size and page size) is appropriate for many systems, but not for 
PPC64, for which it is 65536.

Looking at the HDFS product (non-test) code, it seems (not 100% sure) that the 
code is OK (not using hard-coded page/block sizes); for example:

this.maxBytes = dataset.datanode.getDnConf().getMaxLockedMemory();

However, someone should check this in depth.

However, at the test level, the value 4096 is used in many places, and it is 
very hard to tell whether it depends on the HW architecture or not.

In the TestFsDatasetCache#testPageRounder test, the HW value is sometimes 
obtained from the system:

private static final long PAGE_SIZE =
    NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
private static final long BLOCK_SIZE = PAGE_SIZE;

but there are several places where 4096 is used even though the value should 
depend on the HW.

conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY);

with:

// Most Linux installs allow a default of 64KB locked memory
private static final long CACHE_CAPACITY = 64 * 1024;

However, for PPC64, this value should be much bigger.
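
A hypothetical way to address this (not in the source; the 16-page figure is 
an assumption for illustration) is to size the capacity in pages rather than 
bytes, so the same test scales across page sizes:

// Assumption: 16 pages of lockable memory, i.e. 64 KiB on x86_64
// (4 KiB pages) and 1 MiB on PPC64 (64 KiB pages).
private static final long PAGE_SIZE =
    NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
private static final long CACHE_CAPACITY = 16 * PAGE_SIZE;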

The TestFsDatasetCache#testPageRounder test aims to cache 5 blocks of size 
512, each rounded up to a full page. However, the page size is 65536 on PPC64 
and 4096 on x86_64. Thus, the method in charge of reserving blocks in the HDFS 
cache will reserve in 4096-byte steps on x86_64 and in 65536-byte steps on 
PPC64, with a hard-coded limit: maxBytes = 65536 bytes.

5 * 4096 = 20480 : OK
5 * 65536 = 327680 : KO : the test ends with a timeout, since the limit is 
exceeded at the very beginning and the test keeps waiting.
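
A small self-contained sketch of this arithmetic (the class and method names 
are invented for illustration; the real rounding lives in the FsDatasetCache 
code):

public class PageRounderMath {
  // Round a block size up to a whole number of OS pages.
  static long roundUp(long bytes, long pageSize) {
    return ((bytes + pageSize - 1) / pageSize) * pageSize;
  }

  public static void main(String[] args) {
    final long maxBytes = 64 * 1024;  // the hard-coded 64 KiB limit
    for (long pageSize : new long[] {4096L, 65536L}) {
      long reserved = 5 * roundUp(512, pageSize);  // 5 cached blocks of 512 B
      System.out.println("page size " + pageSize + ": reserved " + reserved
          + (reserved <= maxBytes ? " : OK" : " : KO, exceeds " + maxBytes));
    }
  }
}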

As a conclusion, there are several issues to fix (see the sketch after this 
list):
 - instead of using many hard-coded 4096 values, the (mainly test) code should 
use Java constants built from HW values (like 
NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize())
 - several distinct constants must be used, since 4096 covers different 
subjects, including some that do not depend on the HW
 - the test must be improved to handle cases where the limit is exceeded at 
the very beginning
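
A minimal sketch of the first two points (the constant names are invented for 
illustration):

import org.apache.hadoop.io.nativeio.NativeIO;

public final class Sizes {
  // HW-dependent: 4096 on x86_64, 65536 on PPC64 -- never hard-code it.
  public static final long OS_PAGE_SIZE =
      NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();

  // HW-independent: a buffer size that merely happens to be 4096 bytes and
  // therefore deserves its own name instead of sharing a literal.
  public static final int IO_BUFFER_SIZE = 4096;

  private Sizes() {}
}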





[jira] [Created] (HDFS-6515) testPageRounder (org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)

2014-06-11 Thread Tony Reix (JIRA)
Tony Reix created HDFS-6515:
---

 Summary: testPageRounder   
(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)
 Key: HDFS-6515
 URL: https://issues.apache.org/jira/browse/HDFS-6515
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
 Environment: Linux on PPC64
Reporter: Tony Reix
Priority: Blocker


I have an issue with the test testPageRounder 
(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache) on Linux/PowerPC.

On Linux/Intel, test runs fine.

On Linux/PowerPC, I have:
testPageRounder(org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache)  
Time elapsed: 64.037 sec  <<< ERROR!
java.lang.Exception: test timed out after 60000 milliseconds

Looking at the details, I see that some "Failed to cache" messages appear in 
the traces: only 10 on Intel, but 186 on PPC64.

On PPC64, it looks like some thread is waiting for something that never 
happens, generating a timeout.

I'm now using the IBM JVM; however, I've just checked that the issue also 
appears with OpenJDK.

I'm now using the latest Hadoop; however, the issue first appeared with 
Hadoop 2.4.0.

I need help understanding what the test is doing and what traces are 
expected, in order to find the root cause.






[jira] [Created] (HDFS-6311) TestLargeBlock#testLargeBlockSize : File /tmp/TestLargeBlock/2147484160.dat could only be replicated to 0 nodes instead of minReplication (=1)

2014-04-30 Thread Tony Reix (JIRA)
Tony Reix created HDFS-6311:
---

 Summary: TestLargeBlock#testLargeBlockSize : File 
/tmp/TestLargeBlock/2147484160.dat could only be replicated to 0 nodes instead 
of minReplication (=1)
 Key: HDFS-6311
 URL: https://issues.apache.org/jira/browse/HDFS-6311
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
 Environment: Virtual Box - Ubuntu 14.04 - x86_64
Reporter: Tony Reix


I'm testing HDFS 2.4.0.

Apache Hadoop HDFS: Tests run: 2650, Failures: 2, Errors: 2, Skipped: 99

I have the following error each time I launch my tests (3 tries).

Forking command line: /bin/sh -c cd 
/home/tony/HADOOP/hadoop-2.4.0-src/hadoop-hdfs-project/hadoop-hdfs && 
/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java -Xmx1024m 
-XX:+HeapDumpOnOutOfMemoryError -jar 
/home/tony/HADOOP/hadoop-2.4.0-src/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefirebooter2355654085353142996.jar 
/home/tony/HADOOP/hadoop-2.4.0-src/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire983005167523288650tmp 
/home/tony/HADOOP/hadoop-2.4.0-src/hadoop-hdfs-project/hadoop-hdfs/target/surefire/surefire_4328161716955453811297tmp

Running org.apache.hadoop.hdfs.TestLargeBlock

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.011 sec 
<<< FAILURE! - in org.apache.hadoop.hdfs.TestLargeBlock
testLargeBlockSize(org.apache.hadoop.hdfs.TestLargeBlock)  Time elapsed: 
15.549 sec  <<< ERROR!

org.apache.hadoop.ipc.RemoteException: File /tmp/TestLargeBlock/2147484160.dat 
could only be replicated to 0 nodes instead of minReplication (=1).  There are 
1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)

at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)



