[jira] [Created] (HADOOP-12926) lz4.c does not detect 64-bit mode properly

2016-03-15 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12926:
--

 Summary: lz4.c does not detect 64-bit mode properly
 Key: HADOOP-12926
 URL: https://issues.apache.org/jira/browse/HADOOP-12926
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: 2.7.2
 Environment: All
Reporter: Alan Burlison
Assignee: Alan Burlison


lz4.c should check whether the _LP64 macro is defined so that it correctly detects 
when it is being built in 64-bit mode.
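
As an illustration only, a minimal sketch of the kind of check intended - the LZ4_ARCH64 macro name and the existing CPU-specific tests are assumptions about the bundled lz4.c rather than its exact contents:

{code}
/* Hypothetical sketch: recognise any LP64 compilation environment (e.g.
 * 64-bit SPARC) as 64-bit, rather than relying only on CPU-specific macros.
 * LZ4_ARCH64 and the existing defined() tests are assumed names. */
#if defined(__x86_64__) || defined(__ppc64__) || defined(__LP64__) || defined(_LP64)
#  define LZ4_ARCH64 1
#else
#  define LZ4_ARCH64 0
#endif
{code}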



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12925) Checks for SPARC architecture need to include 64-bit SPARC

2016-03-15 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12925:
--

 Summary: Checks for SPARC architecture need to include 64-bit SPARC
 Key: HADOOP-12925
 URL: https://issues.apache.org/jira/browse/HADOOP-12925
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: conf
Affects Versions: 2.7.2
 Environment: 64-bit SPARC
Reporter: Alan Burlison
Assignee: Alan Burlison


FastByteComparisons.java and NativeCrc32.java check for the SPARC platform by 
comparing the os.arch property against "sparc". That doesn't detect 64-bit 
SPARC ("sparcv9"); the test should use "startsWith" rather than "equals".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12727) Minor cleanups needed for CMake 3.X

2016-01-21 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12727:
--

 Summary: Minor cleanups needed for CMake 3.X
 Key: HADOOP-12727
 URL: https://issues.apache.org/jira/browse/HADOOP-12727
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: 2.7.1
Reporter: Alan Burlison
Assignee: Alan Burlison
Priority: Minor


On switching from CMake 2.8.6 to 3.3.2 a couple of minor issues popped up:

* There's a syntax error in {{hadoop-common-project/hadoop-common/src/CMakeLists.txt}} that generates a 
warning in 3.X.

* {{CMAKE_SHARED_LINKER_FLAGS}} is being incorrectly set in 
{{hadoop-common-project/hadoop-common/HadoopCommon.cmake}} - despite the name, 
it contains the flags passed to {{ar}}, not to the linker. 2.8.6 ignores the 
incorrect flags; 3.3.2 doesn't, and building static libraries fails as a result. 
See http://public.kitware.com/pipermail/cmake/2016-January/062447.html

Patch to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12720) Misuse of sun.misc.Unsafe by org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo causes misaligned memory access coredumps

2016-01-18 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12720:
--

 Summary: Misuse of sun.misc.Unsafe by 
org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo
 causes misaligned memory access coredumps
 Key: HADOOP-12720
 URL: https://issues.apache.org/jira/browse/HADOOP-12720
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Affects Versions: 2.7.1
 Environment: Solaris SPARC
Reporter: Alan Burlison


Core dump details below:

{noformat}
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.85-b07 mixed mode 
solaris-sparc compressed oops)
# Problematic frame:
# J 86 C2 
org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo([BII[BII)I
 (273 bytes) @ 0x6fc9b150 [0x6fc9b0e0+0x70]

Stack: [0x7e20,0x7e30],  sp=0x7e2fce50,  free 
space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 86 C2 
org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo([BII[BII)I
 (273 bytes) @ 0x6fc9b150 [0x6fc9b0e0+0x70]
j  
org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo(Ljava/lang/Object;IILjava/lang/Object;II)I+16
j  org.apache.hadoop.io.FastByteComparisons.compareTo([BII[BII)I+11
j  org.apache.hadoop.io.WritableComparator.compareBytes([BII[BII)I+8
j  org.apache.hadoop.io.Text$Comparator.compare([BII[BII)I+39
j  org.apache.hadoop.io.TestText.testCompare()V+167
v  ~StubRoutines::call_stub
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12630) Misuse of sun.misc.Unsafe causes misaligned memory access coredumps

2015-12-09 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12630:
--

 Summary: Misuse of sun.misc.Unsafe causes misaligned memory access coredumps
 Key: HADOOP-12630
 URL: https://issues.apache.org/jira/browse/HADOOP-12630
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: 3.0.0
 Environment: Solaris SPARC
Reporter: Alan Burlison
Assignee: Alan Burlison


Misuse of sun.misc.Unsafe by {{org.apache.hadoop.io.FastByteComparisons}} 
causes misaligned memory accesses and results in coredumps. Stack traces below:

{code}
hadoop-tools/hadoop-gridmix/core
 --- called from signal handler with signal 10 (SIGBUS) ---
 7717fa40 Unsafe_GetLong (18c000, 7e2fd6d8, 0, 19, 
775d4be0, 10018c000) + 158
 70810dcc * sun/misc/Unsafe.getLong(Ljava/lang/Object;J)J+-30004
 70810d70 * sun/misc/Unsafe.getLong(Ljava/lang/Object;J)J+0
 70806d58 * 
org/apache/hadoop/io/FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo([BII[BII)I+91
 (line 405)
 70806cb4 * 
org/apache/hadoop/io/FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo(Ljava/lang/Object;IILjava/lang/Object;II)I+16
 (line 264)
 7080783c * 
org/apache/hadoop/io/FastByteComparisons.compareTo([BII[BII)I+11 (line 92)
 70806cb4 * 
org/apache/hadoop/io/WritableComparator.compareBytes([BII[BII)I+8 (line 376)
 70806cb4 * 
org/apache/hadoop/mapred/gridmix/GridmixRecord$Comparator.compare([BII[BII)I+61 
(line 522)
 70806cb4 * 
org/apache/hadoop/mapred/gridmix/TestGridmixRecord.binSortTest(Lorg/apache/hadoop/mapred/gridmix/GridmixRecord;Lorg/apache/hadoop/mapred/gridmix/GridmixRecord;IILorg/apache/hadoop/io/WritableComparator;)V+280
 (line 268)
 70806f44 * 
org/apache/hadoop/mapred/gridmix/TestGridmixRecord.testBaseRecord()V+57 (line 
482)
{code}

This misuse also produces a core dump at {{hadoop-mapreduce-project/hadoop-mapreduce-examples/core}}.
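
For illustration, a small C sketch (not Hadoop code) of why an 8-byte load from an arbitrary byte offset - which is effectively what Unsafe.getLong() does here - traps on SPARC, and the alignment-safe alternative:

{code}
#include <stdint.h>
#include <string.h>

/* Direct dereference: faults with SIGBUS on strict-alignment CPUs such as
 * SPARC whenever p is not 8-byte aligned. */
uint64_t load64_direct(const uint8_t *p) {
    return *(const uint64_t *)p;
}

/* Byte-wise copy: always safe; the compiler emits whatever loads the
 * platform allows. */
uint64_t load64_safe(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
{code}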



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12629) Misaligned memory accesses in CRC32 native code cause coredumps

2015-12-09 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12629:
--

 Summary: Misaligned memory accesses in CRC32 native code cause coredumps
 Key: HADOOP-12629
 URL: https://issues.apache.org/jira/browse/HADOOP-12629
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: 3.0.0
 Environment: Solaris SPARC
Reporter: Alan Burlison
Assignee: Alan Burlison


Testing on Solaris SPARC produces multiple SIGBUS core dumps, which are usually 
due to misaligned memory access. Some representative stack traces are below:

{code}
hadoop-hdfs-project/hadoop-hdfs/core
 --- called from signal handler with signal 10 (SIGBUS) ---
 5d3245ec crc32c_sb8 (, 7ada5f02f, 0, 40, ff, ff00) + 9c
 5d324954 pipelined_crc32c_sb8 (4a3fe0b4, 4a3fe0b8, 
4a3fe0bc, 7ada5f02f, 200, 2) + 24
 5d324c58 bulk_crc (7ada5f02f, 400, 7ada5f027, 2, 200, 
4a3fe1b0) + 1f8
 5d3242d0 
Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSumsByteArray 
(1029b99e8, 4a3fe4e8, 200, 10, 4a3fe4f8, 1f) + 1c0
 70810dcc * 
org/apache/hadoop/util/NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V+14336
 70810d70 * 
org/apache/hadoop/util/NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V+0
 70806f44 * 
org/apache/hadoop/util/NativeCrc32.verifyChunkedSumsByteArray(II[BI[BIILjava/lang/String;J)V+15
 (line 138)
 70806f44 * 
org/apache/hadoop/util/DataChecksum.verifyChunkedSums([BII[BILjava/lang/String;J)V+39
 (line 691)
 70806f44 * 
org/apache/hadoop/util/DataChecksum.verifyChunkedSums(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;Ljava/lang/String;J)V+59
 (line 585)
 70806f44 * 
org/apache/hadoop/hdfs/server/datanode/BlockReceiver.verifyChunks(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V+11
 (line 914)
 70806f44 * 
org/apache/hadoop/hdfs/server/datanode/BlockReceiver.receivePacket()I+658 (line 
1092)
 70806cb4 * 
org/apache/hadoop/hdfs/server/datanode/BlockReceiver.receiveBlock(Ljava/io/DataOutputStream;Ljava/io/DataInputStream;Ljava/io/DataOutputStream;Ljava/lang/String;Lorg/apache/hadoop/hdfs/util/DataTransferThrottler;[Lorg/apache/hadoop/hdfs/protocol/DatanodeInf+97
 (line 1786)
 70806f44 * 
org/apache/hadoop/hdfs/server/datanode/DataXceiver.writeBlock(Lorg/apache/hadoop/hdfs/protocol/ExtendedBlock;Lorg/apache/hadoop/fs/StorageType;Lorg/apache/hadoop/security/token/Token;Ljava/lang/String;[Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;[Lorg/apa+1428
 (line 1403)
 70806f44 * 
org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.opWriteBlock(Ljava/io/DataInputStream;)V+178
 (line 327)
 70806f44 * 
org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.processOp(Lorg/apache/hadoop/hdfs/protocol/datatransfer/Op;)V+72
 (line 196)
 70806f44 * 
org/apache/hadoop/hdfs/server/datanode/DataXceiver.run()V+539 (line 778)

hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/core
 --- called from signal handler with signal 10 (SIGBUS) ---
 5a324c8c bulk_crc (7b3ef0e53, 1, 7b3ef0e4f, 0, 0, 2d5fe230) + 
22c
 5a3242d0 
Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSumsByteArray 
(1048ec1e8, 2d5fe568, 200, 10, 2d5fe578, 1f) + 1c0
 70810dcc * 
org/apache/hadoop/util/NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V+14336
 70810d70 * 
org/apache/hadoop/util/NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V+0
 70806f44 * 
org/apache/hadoop/util/NativeCrc32.verifyChunkedSumsByteArray(II[BI[BIILjava/lang/String;J)V+15
 (line 138)
 70806f44 * 
org/apache/hadoop/util/DataChecksum.verifyChunkedSums([BII[BILjava/lang/String;J)V+39
 (line 691)
 70806f44 * 
org/apache/hadoop/util/DataChecksum.verifyChunkedSums(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;Ljava/lang/String;J)V+59
 (line 585)
 70806f44 * 
org/apache/hadoop/hdfs/server/datanode/BlockReceiver.verifyChunks(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V+11
 (line 914)
 70806f44 * 
org/apache/hadoop/hdfs/server/datanode/BlockReceiver.receivePacket()I+658 (line 
1092)
 70806cb4 * 
org/apache/hadoop/hdfs/server/datanode/BlockReceiver.receiveBlock(Ljava/io/DataOutputStream;Ljava/io/DataInputStream;Ljava/io/DataOutputStream;Ljava/lang/String;Lorg/apache/hadoop/hdfs/util/DataTransferThrottler;[Lorg/apache/hadoop/hdfs/protocol/DatanodeInf+97
 (line 1786)
 70806f44 * 
org/apache/hadoop/hdfs/server/datanode/DataXceiver.writeBlock(Lorg/apache/hadoop/hdfs/protocol/ExtendedBlock;Lorg/apache/hadoop/fs/StorageType;Lorg/apache/hadoop/security/token/Token;Ljava/lang/String;[Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;[Lorg/apa+1428
 (line 1403)
 70806f44 * 
org/apache/

[jira] [Created] (HADOOP-12583) Sundry symlink problems on Solaris

2015-11-18 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12583:
--

 Summary: Sundry symlink problems on Solaris
 Key: HADOOP-12583
 URL: https://issues.apache.org/jira/browse/HADOOP-12583
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.7.1
 Environment: Solaris
Reporter: Alan Burlison
Priority: Minor


There are various filesystem test failures on Solaris:

{code}
  TestSymlinkLocalFSFileContext>TestSymlinkLocalFS.testDanglingLink:156 
expected:<[alanbur]> but was:<[]>
  
TestSymlinkLocalFSFileContext>TestSymlinkLocalFS.testSetTimesSymlinkToDir:233->SymlinkBaseTest.testSetTimesSymlinkToDir:1391
 expected:<1447788288000> but was:<3000>
  
TestSymlinkLocalFSFileContext>TestSymlinkLocalFS.testSetTimesSymlinkToFile:227->SymlinkBaseTest.testSetTimesSymlinkToFile:1376
 expected:<144778829> but was:<3000>
  TestSymlinkLocalFSFileSystem>TestSymlinkLocalFS.testDanglingLink:156 
expected:<[alanbur]> but was:<[]>
  
TestSymlinkLocalFSFileSystem>TestSymlinkLocalFS.testSetTimesSymlinkToDir:233->SymlinkBaseTest.testSetTimesSymlinkToDir:1391
 expected:<1447788416000> but was:<3000>
  
TestSymlinkLocalFSFileSystem>TestSymlinkLocalFS.testSetTimesSymlinkToFile:227->SymlinkBaseTest.testSetTimesSymlinkToFile:1376
 expected:<1447788417000> but was:<3000>
{code}

I'm not sure what the root cause is; most likely Linux-specific assumptions 
about how symlinks behave. Further investigation is needed.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12581) ShellBasedIdMapping needs support for Solaris

2015-11-18 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12581:
--

 Summary: ShellBasedIdMapping needs support for Solaris
 Key: HADOOP-12581
 URL: https://issues.apache.org/jira/browse/HADOOP-12581
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: 2.7.1
 Environment: Solaris
Reporter: Alan Burlison


ShellBasedIdMapping only supports Linux and OS X; support for Solaris needs to 
be added.

From looking at the Linux support in ShellBasedIdMapping, the same sequences 
of shell commands should work for Solaris as well, so all that's probably 
needed is to change the implementation of checkSupportedPlatform() to treat 
Linux and Solaris the same way, plus possibly some renaming of other methods 
to make it more obvious they are not Linux-only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12580) Hadoop needs a SysInfo class for Solaris

2015-11-18 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12580:
--

 Summary: Hadoop needs a SysInfo class for Solaris
 Key: HADOOP-12580
 URL: https://issues.apache.org/jira/browse/HADOOP-12580
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: util
Affects Versions: 2.7.1
 Environment: Solaris
Reporter: Alan Burlison
Assignee: Alan Burlison


During testing, multiple failures of the following sort are reported:

{code}
java.lang.UnsupportedOperationException: Could not determine OS
at org.apache.hadoop.util.SysInfo.newInstance(SysInfo.java:43)
at 
org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.(ResourceCalculatorPlugin.java:41)
at 
org.apache.hadoop.mapred.gridmix.DummyResourceCalculatorPlugin.(DummyResourceCalculatorPlugin.java:32)
at 
org.apache.hadoop.mapred.gridmix.TestGridmixMemoryEmulation.testTotalHeapUsageEmulatorPlugin(TestGridmixMemoryEmulation.java:131)
{code}

This is because there is no SysInfo subclass for Solaris. From SysInfo.java:

{code}
  public static SysInfo newInstance() {
if (Shell.LINUX) {
  return new SysInfoLinux();
}
if (Shell.WINDOWS) {
  return new SysInfoWindows();
}
throw new UnsupportedOperationException("Could not determine OS");
  }
{code}

An implementation of SysInfoSolaris needs to be written and plumbed into 
SysInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HADOOP-12344) Improve validateSocketPathSecurity0 error message

2015-11-06 Thread Alan Burlison (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Burlison reopened HADOOP-12344:


This fix introduced multiple sprintf format warnings which need attention.

{code}
 [exec] 
op/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:488:10:
 warning: format ‘%ld’ expects argument of type ‘long int’, but argument 6 has 
type ‘long long int’ [-Wformat=]
 [exec]   check, path, mode, (long long)st.st_uid, (long 
long)st.st_gid, check);
 [exec]   ^
 [exec] 
/pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:488:10:
 warning: format ‘%ld’ expects argument of type ‘long int’, but argument 7 has 
type ‘long long int’ [-Wformat=]
 [exec] 
/pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:500:10:
 warning: format ‘%ld’ expects argument of type ‘long int’, but argument 6 has 
type ‘long long int’ [-Wformat=]
 [exec]   check, check);
 [exec]   ^
 [exec] 
/pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:500:10:
 warning: format ‘%ld’ expects argument of type ‘long int’, but argument 7 has 
type ‘long long int’ [-Wformat=]
 [exec] 
/pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:513:10:
 warning: format ‘%ld’ expects argument of type ‘long int’, but argument 6 has 
type ‘long long int’ [-Wformat=]
 [exec]   (long long)uid, check, (long long)uid, check);
 [exec]   ^
 [exec] 
/pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:513:10:
 warning: format ‘%ld’ expects argument of type ‘long int’, but argument 7 has 
type ‘long long int’ [-Wformat=]
 [exec] 
/pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:513:10:
 warning: format ‘%ld’ expects argument of type ‘long int’, but argument 8 has 
type ‘long long int’ [-Wformat=]
 [exec] 
/pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:513:10:
 warning: format ‘%ld’ expects argument of type ‘long int’, but argument 10 has 
type ‘long long int’ [-Wformat=]
{code}
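
The warnings mean the format strings still use %ld while the arguments are cast to long long; a hedged sketch of the kind of correction needed (the message text and variable names below are stand-ins, not the actual DomainSocket.c strings):

{code}
#include <stdio.h>
#include <sys/types.h>

/* Illustrative only: an argument cast to (long long) must be printed with
 * %lld; %ld only matches long, which differs in size on ILP32 platforms. */
void report_owner(const char *path, uid_t uid, gid_t gid) {
    /* wrong: triggers -Wformat, undefined behaviour where long != long long
    fprintf(stderr, "path %s owner %ld group %ld\n",
            path, (long long)uid, (long long)gid);
    */
    fprintf(stderr, "path %s owner %lld group %lld\n",
            path, (long long)uid, (long long)gid);
}
{code}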

> Improve validateSocketPathSecurity0 error message
> -
>
> Key: HADOOP-12344
> URL: https://issues.apache.org/jira/browse/HADOOP-12344
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: net
>Reporter: Casey Brotherton
>Assignee: Casey Brotherton
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: HADOOP-12344.001.patch, HADOOP-12344.002.patch, 
> HADOOP-12344.003.patch, HADOOP-12344.004.patch, HADOOP-12344.patch
>
>
> When a socket path does not have the correct permissions, an error is thrown.
> That error just has the failing component of the path and not the entire path 
> of the socket.
> The entire path of the socket could be printed out to allow for a direct 
> check of the permissions of the entire path.
> {code}
> java.io.IOException: the path component: '/' is world-writable.  Its 
> permissions are 0077.  Please fix this or select a different socket path.
>   at 
> org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native 
> Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:189)
> ...
> {code}
> The error message could also provide the socket path:
> {code}
> java.io.IOException: the path component: '/' is world-writable.  Its 
> permissions are 0077.  Please fix this or select a different socket path than 
> '/var/run/hdfs-sockets/dn'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12488) DomainSocket: Solaris does not support timeouts on AF_UNIX sockets

2015-10-16 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12488:
--

 Summary: DomainSocket: Solaris does not support timeouts on 
AF_UNIX sockets
 Key: HADOOP-12488
 URL: https://issues.apache.org/jira/browse/HADOOP-12488
 Project: Hadoop Common
  Issue Type: Bug
  Components: net
Affects Versions: 2.7.1
 Environment: Solaris
Reporter: Alan Burlison


From the hadoop-common-dev mailing list:

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201509.mbox/%3c560b99f6.6010...@oracle.com%3E
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201510.mbox/%3c560ea6bf.2070...@oracle.com%3E

{noformat}
Now that the Hadoop native code builds on Solaris I've been chipping 
away at all the test failures. About 50% of the failures involve 
DomainSocket, either directly or indirectly. That seems to be mainly 
because the tests use DomainSocket to do single-node testing, whereas in 
production it seems that DomainSocket is less commonly used 
(https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html).

The particular problem on Solaris is that socket read/write timeouts 
(the SO_SNDTIMEO and SO_RCVTIMEO socket options) are not supported for 
UNIX domain (PF_UNIX) sockets. Those options are however supported for 
PF_INET sockets. That's because the socket implementation on Solaris is 
split roughly into two parts, for inet sockets and for STREAMS sockets, 
and the STREAMS implementation lacks support for SO_SNDTIMEO and 
SO_RCVTIMEO. As an aside, performance of sockets that use loopback or 
the host's own IP is slightly better than that of UNIX domain sockets on 
Solaris.

I'm investigating getting timeouts supported for PF_UNIX sockets added 
to Solaris, but in the meantime I'm also looking how this might be 
worked around in Hadoop. One way would be to implement timeouts by 
wrapping all the read/write/send/recv etc calls in DomainSocket.c with 
either poll() or select().

The basic idea is to add two new fields to DomainSocket.c to hold the 
read/write timeouts. On platforms that support SO_SNDTIMEO and 
SO_RCVTIMEO these would be unused as setsockopt() would be used to set 
the socket timeouts. On platforms such as Solaris the JNI code would use 
the values to implement the timeouts appropriately.

To prevent the code in DomainSocket.c becoming a #ifdef hairball, the 
current socket IO function calls such as accept(), send(), read() etc 
would be replaced with a macros such as HD_ACCEPT. On platforms that 
provide timeouts these would just expand to the normal socket functions, 
on platforms that don't support timeouts it would expand to wrappers 
that implements timeouts for them.

The only caveats are that all code that does anything to a PF_UNIX 
socket would *always* have to do so via DomainSocket. As far as I can 
tell that's not an issue, but it would have to be borne in mind if any 
changes were made in this area.

Before I set about doing this, does the approach seem reasonable?
{noformat}

{noformat}
Unfortunately it's not a simple as I'd hoped. For some reason I don't 
really understand, nearly all the JNI methods are declared as static and 
therefore don't get a "this" pointer and as a consequence all the class 
data members that are needed by the JNI code have to be passed in as 
parameters. That also means it's not possible to store the timeouts in 
the DomainSocket fields from within the JNI code. Most of the JNI 
methods should be instance methods rather than static ones, but making 
that change would require some significant surgery to DomainSocket.
{noformat}
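
A minimal sketch, under the assumptions above, of what a poll()-based read wrapper could look like; the function name and explicit timeout parameter are illustrative, and in the proposal this would sit behind a macro such as HD_READ:

{code}
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Emulate SO_RCVTIMEO for PF_UNIX sockets on platforms that don't support
 * it: wait for readability with poll(), then read(). */
ssize_t hd_read_with_timeout(int fd, void *buf, size_t len, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);
    if (rc == 0) {              /* timed out */
        errno = EAGAIN;
        return -1;
    }
    if (rc < 0) {
        return -1;              /* poll() failed, errno already set */
    }
    return read(fd, buf, len);
}
{code}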



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12487) DomainSocket.close() assumes incorrect Linux behaviour

2015-10-16 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12487:
--

 Summary: DomainSocket.close() assumes incorrect Linux behaviour
 Key: HADOOP-12487
 URL: https://issues.apache.org/jira/browse/HADOOP-12487
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: net
Affects Versions: 2.7.1
 Environment: Linux, Solaris
Reporter: Alan Burlison


I'm getting a test failure in TestDomainSocket.java, in the 
testSocketAcceptAndClose test. That test creates a socket which one thread 
waits on in DomainSocket.accept() whilst a second thread sleeps for a short 
time before closing the same socket with DomainSocket.close().

DomainSocket.close() first calls shutdown0() on the socket before calling 
close0() - both of those are thin wrappers around the corresponding libc socket 
calls. DomainSocket.close() contains the following comment explaining the 
logic involved:

{code}
  // Calling shutdown on the socket will interrupt blocking system
  // calls like accept, write, and read that are going on in a
  // different thread.
{code}

Unfortunately that relies on non-standards-compliant Linux behaviour. I've 
written a simple C test case that replicates the scenario above:

# ThreadA opens, binds, listens and accepts on a socket, waiting for 
connections.
# Some time later ThreadB calls shutdown on the socket ThreadA is waiting in 
accept on.

Here is what happens:

On Linux, the shutdown call in ThreadB succeeds and the accept call in ThreadA 
returns with EINVAL.

On Solaris, the shutdown call in ThreadB fails and returns ENOTCONN. ThreadA 
continues to wait in accept.

Relevant POSIX manpages:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/accept.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/shutdown.html

The POSIX shutdown manpage says:

"The shutdown() function shall cause all or part of a full-duplex connection on 
the socket associated with the file descriptor socket to be shut down."
...
"\[ENOTCONN] The socket is not connected."

Page 229 & 303 of "UNIX System V Network Programming" say:

"shutdown can only be called on sockets that have been previously connected"

"The socket \[passed to accept that] fd refers to does not participate in the 
connection. It remains available to receive further connect indications"

That is pretty clear: sockets being waited on with accept are not connected by 
definition. Nor is the accept socket connected when a client connects to it; 
it is the socket returned by accept that is connected to the client. Therefore 
the Solaris behaviour of failing the shutdown call is correct.

In order to get the required behaviour of ThreadB causing ThreadA to exit the 
accept call with an error, the correct way is for ThreadB to call close on the 
socket that ThreadA is waiting on in accept.

On Solaris, calling close in ThreadB succeeds, and the accept call in ThreadA 
fails and returns EBADF.

On Linux, calling close in ThreadB succeeds but ThreadA continues to wait in 
accept until there is an incoming connection. That accept returns successfully. 
However subsequent accept calls on the same socket return EBADF.

The Linux behaviour is fundamentally broken in three places:

# Allowing shutdown to succeed on an unconnected socket is incorrect.  
# Returning a successful accept on a closed file descriptor is incorrect, 
especially as future accept calls on the same socket fail.
# Once shutdown has been called on the socket, calling close on the socket 
fails with EBADF. That is incorrect, shutdown should just prevent further IO on 
the socket, it should not close it.

The real issue, though, is that there is no single way of doing this that works on 
both Solaris and Linux; there will need to be platform-specific code in Hadoop 
to cater for the Linux brokenness.
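
A condensed sketch of the kind of two-thread C test described above (not the original test program; the socket path is made up and error handling is simplified):

{code}
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int listen_fd;

static void *thread_a(void *arg) {           /* blocks in accept() */
    (void)arg;
    int c = accept(listen_fd, NULL, NULL);
    printf("accept -> %d (%s)\n", c, c < 0 ? strerror(errno) : "ok");
    return NULL;
}

int main(void) {
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    strncpy(sa.sun_path, "/tmp/ds-test.sock", sizeof(sa.sun_path) - 1);
    unlink(sa.sun_path);
    listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    bind(listen_fd, (struct sockaddr *)&sa, sizeof(sa));
    listen(listen_fd, 1);

    pthread_t a;
    pthread_create(&a, NULL, thread_a, NULL);
    sleep(1);                                /* let ThreadA reach accept() */

    /* ThreadB: Linux reports success and accept() fails with EINVAL;
     * Solaris fails with ENOTCONN and accept() keeps waiting. */
    int rc = shutdown(listen_fd, SHUT_RDWR);
    printf("shutdown -> %d (%s)\n", rc, rc < 0 ? strerror(errno) : "ok");

    /* close() is the portable way to wake ThreadA on Solaris; Linux
     * behaviour diverges again, as described above. */
    close(listen_fd);
    pthread_join(a, NULL);
    return 0;
}
{code}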




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-12300) BUILDING.txt instructions for skipping tests incomplete/obsolete, build can fail.

2015-08-04 Thread Alan Burlison (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Burlison resolved HADOOP-12300.

Resolution: Duplicate

> BUILDING.txt instructions for skipping tests incomplete/obsolete, build can 
> fail.
> -
>
> Key: HADOOP-12300
> URL: https://issues.apache.org/jira/browse/HADOOP-12300
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.5.0, 2.6.0
> Environment: maven 3.2.3 +
>Reporter: Peter D Kirchner
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Instructions currently online, and within the hadoop source tree, appear to 
> be incomplete/obsolete for building the source.  I checked 2.5.0 and 2.6.0 .
> with "mvn install -DskipTests" my build fails on multiple hadoop src/test 
> sub-directories with:
>Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
> (default-testCompile)...
> This compile failure may be related to my jdk/jre, but why is my build ("mvn 
> install -DskipTests") compiling these tests?
> The pom.xml files are using the maven-compiler-plugin to compile some of the 
> tests instead of surefire.  The maven-compiler-plugin is unaffected by 
> -DskipTests and per maven documentation requires -Dmaven.test.skip instead, 
> which the surefire plugin also obeys.
> Building with
> mvn install -Dmaven.test.skip
> completes in my environment.
> I suggest a "major" rating because of the impact on users of the source 
> tarballs.  IMO the build instructions in the tarball and online (e.g. 
> https://wiki.apache.org/hadoop/EclipseEnvironment ) should work reliably.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12261) Surefire needs to make sure the JVMs it fires up are 64-bit

2015-07-23 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12261:
--

 Summary: Surefire needs to make sure the JVMs it fires up are 
64-bit
 Key: HADOOP-12261
 URL: https://issues.apache.org/jira/browse/HADOOP-12261
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.1
Reporter: Alan Burlison


hadoop-project/pom.xml sets maven-surefire-plugin.argLine to include -Xmx4096m. 
Allocating that amount of memory requires a 64-bit JVM, but on platforms with 
both 32- and 64-bit JVMs surefire runs the 32-bit version by default and tests 
fail to start as a result. "-d64" should be added to the command-line arguments 
to ensure a 64-bit JVM is always used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12170) hadoop-common's JNIFlags.cmake is redundant and can be removed

2015-07-01 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12170:
--

 Summary: hadoop-common's JNIFlags.cmake is redundant and can be 
removed
 Key: HADOOP-12170
 URL: https://issues.apache.org/jira/browse/HADOOP-12170
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Reporter: Alan Burlison
Assignee: Alan Burlison


With the integration of:

* HADOOP-12036 Consolidate all of the cmake extensions in one directory
* HADOOP-12104 Migrate Hadoop Pipes native build to new CMake framework
* HDFS-8635 Migrate HDFS native build to new CMake framework
* MAPREDUCE-6407 Migrate MAPREDUCE native build to new CMake
* YARN-3827 Migrate YARN native build to new CMake framework

hadoop-common-project/hadoop-common/src/JNIFlags.cmake is now redundant and can 
be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12114) Make hadoop-tools/hadoop-pipes Native code -Wall-clean

2015-06-23 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12114:
--

 Summary: Make hadoop-tools/hadoop-pipes Native code -Wall-clean
 Key: HADOOP-12114
 URL: https://issues.apache.org/jira/browse/HADOOP-12114
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: 2.7.0
Reporter: Alan Burlison
Assignee: Alan Burlison


As we specify -Wall as a default compilation flag, it would be helpful if the 
Native code were -Wall-clean.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12112) Make hadoop-common-project Native code -Wall-clean

2015-06-23 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12112:
--

 Summary: Make hadoop-common-project Native code -Wall-clean
 Key: HADOOP-12112
 URL: https://issues.apache.org/jira/browse/HADOOP-12112
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: 2.7.0
Reporter: Alan Burlison
Assignee: Alan Burlison


As we specify -Wall as a default compilation flag, it would be helpful if the 
Native code were -Wall-clean.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12104) Migrate Hadoop Pipes native build to new CMake framework

2015-06-18 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12104:
--

 Summary: Migrate Hadoop Pipes native build to new CMake framework
 Key: HADOOP-12104
 URL: https://issues.apache.org/jira/browse/HADOOP-12104
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: build
Affects Versions: 2.7.0
Reporter: Alan Burlison
Assignee: Alan Burlison


As per HADOOP-12036, the CMake infrastructure should be refactored and made 
common across all Hadoop components. This bug covers the migration of Hadoop 
Pipes to the new CMake infrastructure. This change will also add support for 
building Hadoop Pipes Native components on Solaris.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12012) Investigate JNI for improving byte array comparison performance

2015-05-21 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12012:
--

 Summary: Investigate JNI for improving byte array comparison 
performance
 Key: HADOOP-12012
 URL: https://issues.apache.org/jira/browse/HADOOP-12012
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: benchmarks, io, performance
Affects Versions: 2.7.0
 Environment: All
Reporter: Alan Burlison
Assignee: Alan Burlison
Priority: Minor


HADOOP-7761 added functionality to compare byte arrays by treating them as 
arrays of 64-bit longs for performance. However HADOOP-11466 reverted this 
change for the SPARC architecture as it causes misaligned traps, which make 
performance worse rather than better.

Most platforms have a highly-optimised memcmp() libc function that uses 
processor-specific functionality to perform byte array comparison as quickly as 
is possible for the platform.

We have done some preliminary benchmarking on Solaris that suggests that, for 
reasonably-sized byte arrays, JNI code using memcmp() outperforms both of the 
current Java byte-size and long-size implementations on both SPARC and x86. We 
are confirming the results and will repeat the same benchmark on Linux and 
report the results here for discussion.
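
As a rough sketch of the kind of JNI comparison being benchmarked (the package, class and method names here are made up for illustration, and error handling is omitted):

{code}
#include <jni.h>
#include <string.h>

/* Compare two byte[] slices with memcmp(); ties are broken by length,
 * matching the usual lexicographic contract. */
JNIEXPORT jint JNICALL
Java_org_example_NativeByteComparator_compareTo(JNIEnv *env, jclass clazz,
        jbyteArray b1, jint off1, jint len1,
        jbyteArray b2, jint off2, jint len2) {
    (void)clazz;
    jbyte *p1 = (*env)->GetPrimitiveArrayCritical(env, b1, NULL);
    jbyte *p2 = (*env)->GetPrimitiveArrayCritical(env, b2, NULL);
    jint minlen = len1 < len2 ? len1 : len2;
    int cmp = memcmp(p1 + off1, p2 + off2, (size_t)minlen);
    (*env)->ReleasePrimitiveArrayCritical(env, b2, p2, JNI_ABORT);
    (*env)->ReleasePrimitiveArrayCritical(env, b1, p1, JNI_ABORT);
    return cmp != 0 ? cmp : (len1 - len2);
}
{code}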



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12008) Investigate providing SPARC hardware-accelerated CRC32 code

2015-05-20 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-12008:
--

 Summary: Investigate providing SPARC hardware-accelerated CRC32 
code
 Key: HADOOP-12008
 URL: https://issues.apache.org/jira/browse/HADOOP-12008
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: performance
Affects Versions: 2.7.0
 Environment: Solaris SPARC
Reporter: Alan Burlison


hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util 
contains code for hardware-accelerated CRC32 on x86 platforms. There is no 
corresponding code for the SPARC architecture; the possibility of providing it 
should be investigated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11997) CMake CMAKE_C_FLAGS are non-portable

2015-05-19 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-11997:
--

 Summary: CMake CMAKE_C_FLAGS are non-portable
 Key: HADOOP-11997
 URL: https://issues.apache.org/jira/browse/HADOOP-11997
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: build
Affects Versions: 2.7.0
 Environment: All
Reporter: Alan Burlison
Assignee: Alan Burlison
Priority: Critical


hadoop-common-project/hadoop-common/src/CMakeLists.txt 
(https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt#L110)
 contains the following unconditional assignments to CMAKE_C_FLAGS:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -Wall -O2")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_REENTRANT -D_GNU_SOURCE")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64")

There are several issues here:

1. "-D_GNU_SOURCE" globally enables the use of all Linux-only extensions in 
hadoop-common native source. This is probably a major contributor to the poor 
cross-platform portability of Hadoop native code to non-Linux platforms as it 
makes it easy for developers to use non-portable Linux features without 
realising. Use of Linux-specific features should be correctly bracketed with 
conditional macro blocks that provide an alternative for non-Linux platforms.

2. "-g -Wall -O2" turns on debugging for all builds, I believe the correct 
mechanism is to set the CMAKE_BUILD_TYPE CMake variable. If it is still 
necessary to override CFLAGS it should probably be done conditionally dependent 
on the value of CMAKE_BUILD_TYPE.

3. "-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64" On Solaris these flags are only 
needed for largefile support in ILP32 applications, LP64 applications are 
largefile by default. I believe the same is true on Linux, so these flags are 
harmless but redundant for 64-bit compilation.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11987) JNI build should use default cmake FindJNI.cmake

2015-05-16 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-11987:
--

 Summary: JNI build should use default cmake FindJNI.cmake
 Key: HADOOP-11987
 URL: https://issues.apache.org/jira/browse/HADOOP-11987
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: 2.7.0
 Environment: All
Reporter: Alan Burlison
Assignee: Alan Burlison
Priority: Minor


From 
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201505.mbox/%3C55568DAC.1040303%40oracle.com%3E
--
Why does  hadoop-common-project/hadoop-common/src/CMakeLists.txt use 
JNIFlags.cmake in the same directory to set things up for JNI 
compilation rather than FindJNI.cmake, which comes as a standard cmake 
module? The checks in JNIFlags.cmake make several assumptions that I 
believe are only correct on Linux whereas I'd expect FindJNI.cmake to be 
more platform-independent.
--
Just checked the repo of cmake and it turns out that FindJNI.cmake is
available even before cmake 2.4. I think it makes sense to file a bug
to replace it to the standard cmake module. Can you please file a jira
for this?
--

This also applies to 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/JNIFlags.cmake



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11985) Improve Solaris support in Hadoop

2015-05-16 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-11985:
--

 Summary: Improve Solaris support in Hadoop
 Key: HADOOP-11985
 URL: https://issues.apache.org/jira/browse/HADOOP-11985
 Project: Hadoop Common
  Issue Type: New Feature
  Components: build, conf
Affects Versions: 2.7.0
 Environment: Solaris x86, Solaris sparc
Reporter: Alan Burlison
Assignee: Alan Burlison


At present the Hadoop native components aren't fully supported on Solaris 
primarily due to differences between Linux and Solaris. This top-level task 
will be used to group together both existing and new issues related to this 
work. A second goal is to improve Hadoop performance on Solaris wherever 
possible.

Steve Loughran suggested a top-level JIRA was the best way to manage the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11975) Native code needs to be built to match the 32/64 bitness of the JVM

2015-05-15 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-11975:
--

 Summary: Native code needs to be built to match the 32/64 bitness 
of the JVM
 Key: HADOOP-11975
 URL: https://issues.apache.org/jira/browse/HADOOP-11975
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 2.7.0
 Environment: Solaris
Reporter: Alan Burlison
Assignee: Alan Burlison


When building with a 64-bit JVM on Solaris the following error occurs at the 
link stage of building the native code:

 [exec] ld: fatal: file 
/usr/jdk/instances/jdk1.8.0/jre/lib/amd64/server/libjvm.so: wrong ELF class: 
ELFCLASS64
 [exec] collect2: error: ld returned 1 exit status
 [exec] make[2]: *** [target/usr/local/lib/libhadoop.so.1.0.0] Error 1
 [exec] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2

The compilation flags in the makefiles need to state explicitly whether 32-bit 
or 64-bit code is to be generated, to match the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11974) FIONREAD is not always in the same header

2015-05-15 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-11974:
--

 Summary: FIONREAD is not always in the same header
 Key: HADOOP-11974
 URL: https://issues.apache.org/jira/browse/HADOOP-11974
 Project: Hadoop Common
  Issue Type: Bug
  Components: net
Affects Versions: 2.7.0
 Environment: Solaris
Reporter: Alan Burlison
Assignee: Alan Burlison
Priority: Minor


The FIONREAD macro is found in <sys/ioctl.h> on Linux and <sys/filio.h> on 
Solaris. A conditional include block is required to make sure it is looked for 
in the right place.
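
A minimal sketch of the conditional include described above, assuming __sun is an adequate test for Solaris:

{code}
/* FIONREAD lives in different headers on different platforms. */
#if defined(__sun)
#include <sys/filio.h>    /* Solaris */
#else
#include <sys/ioctl.h>    /* Linux and most others */
#endif
{code}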



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11968) BUILDING.txt is still unclear about the need to build

2015-05-13 Thread Alan Burlison (JIRA)
Alan Burlison created HADOOP-11968:
--

 Summary: BUILDING.txt is still unclear about the need to build 
 Key: HADOOP-11968
 URL: https://issues.apache.org/jira/browse/HADOOP-11968
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 2.7.0
 Environment: All
Reporter: Alan Burlison
Assignee: Alan Burlison
Priority: Minor


HADOOP-9279 attempted to address the issue of having to make sure that 
hadoop-maven-plugins is built first by modifying BUILDING.txt, but it still 
isn't clear that this is a requirement. The "compile" target doesn't do this, so if 
the first build is a "mvn compile" it will fail.

BUILDING.txt should be modified to recommend either that an "mvn compile" is 
done first in hadoop-maven-plugins or that a "mvn clean install -Pnative 
-DskipTests" is done from the top-level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)