[jira] [Created] (HADOOP-12926) lz4.c does not detect 64-bit mode properly
Alan Burlison created HADOOP-12926: -- Summary: lz4.c does not detect 64-bit mode properly Key: HADOOP-12926 URL: https://issues.apache.org/jira/browse/HADOOP-12926 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: 2.7.2 Environment: All Reporter: Alan Burlison Assignee: Alan Burlison lz4.c should check to see if the _LP64 macro is defined to correctly detect when it is being built in 64-bit mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
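For context, a minimal sketch of the kind of 64-bit detection being asked for; the macro names below (including LZ4_ARCH64) are illustrative assumptions rather than the exact lz4.c logic:
{code}
/* Hypothetical 64-bit detection, not the actual lz4.c source. */
#if defined(_LP64) || defined(__LP64__) || defined(_WIN64) || \
    defined(__x86_64__) || defined(__sparcv9)
#define LZ4_ARCH64 1
#else
#define LZ4_ARCH64 0
#endif
{code}
_LP64 is the conventional predefine on Solaris and other LP64 Unix compilers, so testing it alongside the compiler-specific variants is more portable than checking CPU-specific macros alone.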
[jira] [Created] (HADOOP-12925) Checks for SPARC architecture need to include 64-bit SPARC
Alan Burlison created HADOOP-12925: -- Summary: Checks for SPARC architecture need to include 64-bit SPARC Key: HADOOP-12925 URL: https://issues.apache.org/jira/browse/HADOOP-12925 Project: Hadoop Common Issue Type: Sub-task Components: conf Affects Versions: 2.7.2 Environment: 64-bit SPARC Reporter: Alan Burlison Assignee: Alan Burlison FastByteComparisons.java and NativeCrc32.java check for the SPARC platform by comparing the os.arch property against "sparc". That doesn't detect 64-bit SPARC ("sparcv9"), the test should be "startsWith", not "equals" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12727) Minor cleanups needed for CMake 3.X
Alan Burlison created HADOOP-12727: -- Summary: Minor cleanups needed for CMake 3.X Key: HADOOP-12727 URL: https://issues.apache.org/jira/browse/HADOOP-12727 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: 2.7.1 Reporter: Alan Burlison Assignee: Alan Burlison Priority: Minor On switching from CMake 2.8.6 to 3.3.2 a couple of minor issues popped up: \\ \\ * There's a syntax error in {{hadoop-common-project/hadoop-common/src/CMakeLists.txt}} that generates a warning in 3.X * {{CMAKE_SHARED_LINKER_FLAGS}} is being incorrectly set in {{hadoop-common-project/hadoop-common/HadoopCommon.cmake}} - despite the name it contains the flags passed to {{ar}} not to the linker. 2.8.6 ignores the incorrect flags, 3.3.2 doesn't and building static libraries fails as a result. See http://public.kitware.com/pipermail/cmake/2016-January/062447.html Patch to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12720) Misuse of sun.misc.Unsafe by org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo causes misaligned memory access coredumps
Alan Burlison created HADOOP-12720: -- Summary: Misuse of sun.misc.Unsafe by org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo causes misaligned memory access coredumps Key: HADOOP-12720 URL: https://issues.apache.org/jira/browse/HADOOP-12720 Project: Hadoop Common Issue Type: Sub-task Components: io Affects Versions: 2.7.1 Environment: Solaris SPARC Reporter: Alan Burlison Core dump details below: {noformat} # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.85-b07 mixed mode solaris-sparc compressed oops) # Problematic frame: # J 86 C2 org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo([BII[BII)I (273 bytes) @ 0x6fc9b150 [0x6fc9b0e0+0x70] Stack: [0x7e20,0x7e30], sp=0x7e2fce50, free space=1011k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) J 86 C2 org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo([BII[BII)I (273 bytes) @ 0x6fc9b150 [0x6fc9b0e0+0x70] j org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo(Ljava/lang/Object;IILjava/lang/Object;II)I+16 j org.apache.hadoop.io.FastByteComparisons.compareTo([BII[BII)I+11 j org.apache.hadoop.io.WritableComparator.compareBytes([BII[BII)I+8 j org.apache.hadoop.io.Text$Comparator.compare([BII[BII)I+39 j org.apache.hadoop.io.TestText.testCompare()V+167 v ~StubRoutines::call_stub {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12630) Misuse of sun.misc.Unsafe causes misaligned memory access coredumps
Alan Burlison created HADOOP-12630: -- Summary: Misuse of sun.misc.Unsafe causes misaligned memory access coredumps Key: HADOOP-12630 URL: https://issues.apache.org/jira/browse/HADOOP-12630 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: 3.0.0 Environment: Solaris SPARC Reporter: Alan Burlison Assignee: Alan Burlison Misuse of sun.misc.Unsafe by {{org.apache.hadoop.io.FastByteComparisons}} causes misaligned memory accesses and results in coredumps. Stack traces below: {code} hadoop-tools/hadoop-gridmix/core --- called from signal handler with signal 10 (SIGBUS) --- 7717fa40 Unsafe_GetLong (18c000, 7e2fd6d8, 0, 19, 775d4be0, 10018c000) + 158 70810dcc * sun/misc/Unsafe.getLong(Ljava/lang/Object;J)J+-30004 70810d70 * sun/misc/Unsafe.getLong(Ljava/lang/Object;J)J+0 70806d58 * org/apache/hadoop/io/FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo([BII[BII)I+91 (line 405) 70806cb4 * org/apache/hadoop/io/FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo(Ljava/lang/Object;IILjava/lang/Object;II)I+16 (line 264) 7080783c * org/apache/hadoop/io/FastByteComparisons.compareTo([BII[BII)I+11 (line 92) 70806cb4 * org/apache/hadoop/io/WritableComparator.compareBytes([BII[BII)I+8 (line 376) 70806cb4 * org/apache/hadoop/mapred/gridmix/GridmixRecord$Comparator.compare([BII[BII)I+61 (line 522) 70806cb4 * org/apache/hadoop/mapred/gridmix/TestGridmixRecord.binSortTest(Lorg/apache/hadoop/mapred/gridmix/GridmixRecord;Lorg/apache/hadoop/mapred/gridmix/GridmixRecord;IILorg/apache/hadoop/io/WritableComparator;)V+280 (line 268) 70806f44 * org/apache/hadoop/mapred/gridmix/TestGridmixRecord.testBaseRecord()V+57 (line 482) {code} This also causes a coredump at {{hadoop-mapreduce-project/hadoop-mapreduce-examples/core}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12629) Misaligned memory accesses in CRC32 native code cause coredumps
Alan Burlison created HADOOP-12629: -- Summary: Misaligned memory accesses in CRC32 native code causes coredumps Key: HADOOP-12629 URL: https://issues.apache.org/jira/browse/HADOOP-12629 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: 3.0.0 Environment: Solaris SPARC Reporter: Alan Burlison Assignee: Alan Burlison Testing on Solaris SPARC produces multiple SIGBUS core dumps, which are usually due to misaligned memory access. Some representative stack traces are below: {code} hadoop-hdfs-project/hadoop-hdfs/core --- called from signal handler with signal 10 (SIGBUS) --- 5d3245ec crc32c_sb8 (, 7ada5f02f, 0, 40, ff, ff00) + 9c 5d324954 pipelined_crc32c_sb8 (4a3fe0b4, 4a3fe0b8, 4a3fe0bc, 7ada5f02f, 200, 2) + 24 5d324c58 bulk_crc (7ada5f02f, 400, 7ada5f027, 2, 200, 4a3fe1b0) + 1f8 5d3242d0 Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSumsByteArray (1029b99e8, 4a3fe4e8, 200, 10, 4a3fe4f8, 1f) + 1c0 70810dcc * org/apache/hadoop/util/NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V+14336 70810d70 * org/apache/hadoop/util/NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V+0 70806f44 * org/apache/hadoop/util/NativeCrc32.verifyChunkedSumsByteArray(II[BI[BIILjava/lang/String;J)V+15 (line 138) 70806f44 * org/apache/hadoop/util/DataChecksum.verifyChunkedSums([BII[BILjava/lang/String;J)V+39 (line 691) 70806f44 * org/apache/hadoop/util/DataChecksum.verifyChunkedSums(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;Ljava/lang/String;J)V+59 (line 585) 70806f44 * org/apache/hadoop/hdfs/server/datanode/BlockReceiver.verifyChunks(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V+11 (line 914) 70806f44 * org/apache/hadoop/hdfs/server/datanode/BlockReceiver.receivePacket()I+658 (line 1092) 70806cb4 * org/apache/hadoop/hdfs/server/datanode/BlockReceiver.receiveBlock(Ljava/io/DataOutputStream;Ljava/io/DataInputStream;Ljava/io/DataOutputStream;Ljava/lang/String;Lorg/apache/hadoop/hdfs/util/DataTransferThrottler;[Lorg/apache/hadoop/hdfs/protocol/DatanodeInf+97 (line 1786) 70806f44 * org/apache/hadoop/hdfs/server/datanode/DataXceiver.writeBlock(Lorg/apache/hadoop/hdfs/protocol/ExtendedBlock;Lorg/apache/hadoop/fs/StorageType;Lorg/apache/hadoop/security/token/Token;Ljava/lang/String;[Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;[Lorg/apa+1428 (line 1403) 70806f44 * org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.opWriteBlock(Ljava/io/DataInputStream;)V+178 (line 327) 70806f44 * org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.processOp(Lorg/apache/hadoop/hdfs/protocol/datatransfer/Op;)V+72 (line 196) 70806f44 * org/apache/hadoop/hdfs/server/datanode/DataXceiver.run()V+539 (line 778) hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/core --- called from signal handler with signal 10 (SIGBUS) --- 5a324c8c bulk_crc (7b3ef0e53, 1, 7b3ef0e4f, 0, 0, 2d5fe230) + 22c 5a3242d0 Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSumsByteArray (1048ec1e8, 2d5fe568, 200, 10, 2d5fe578, 1f) + 1c0 70810dcc * org/apache/hadoop/util/NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V+14336 70810d70 * org/apache/hadoop/util/NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V+0 70806f44 * org/apache/hadoop/util/NativeCrc32.verifyChunkedSumsByteArray(II[BI[BIILjava/lang/String;J)V+15 (line 138) 70806f44 * org/apache/hadoop/util/DataChecksum.verifyChunkedSums([BII[BILjava/lang/String;J)V+39 (line 691) 70806f44 * 
org/apache/hadoop/util/DataChecksum.verifyChunkedSums(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;Ljava/lang/String;J)V+59 (line 585) 70806f44 * org/apache/hadoop/hdfs/server/datanode/BlockReceiver.verifyChunks(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V+11 (line 914) 70806f44 * org/apache/hadoop/hdfs/server/datanode/BlockReceiver.receivePacket()I+658 (line 1092) 70806cb4 * org/apache/hadoop/hdfs/server/datanode/BlockReceiver.receiveBlock(Ljava/io/DataOutputStream;Ljava/io/DataInputStream;Ljava/io/DataOutputStream;Ljava/lang/String;Lorg/apache/hadoop/hdfs/util/DataTransferThrottler;[Lorg/apache/hadoop/hdfs/protocol/DatanodeInf+97 (line 1786) 70806f44 * org/apache/hadoop/hdfs/server/datanode/DataXceiver.writeBlock(Lorg/apache/hadoop/hdfs/protocol/ExtendedBlock;Lorg/apache/hadoop/fs/StorageType;Lorg/apache/hadoop/security/token/Token;Ljava/lang/String;[Lorg/apache/hadoop/hdfs/protocol/DatanodeInfo;[Lorg/apa+1428 (line 1403) 70806f44 * org/apache/
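The issue text above does not include a fix; one common technique for avoiding SIGBUS on strict-alignment CPUs such as SPARC is to read potentially unaligned words via memcpy() instead of casting the byte pointer. A hedged sketch of that general pattern (not the actual bulk_crc/crc32c_sb8 change) follows:
{code}
#include <stdint.h>
#include <string.h>

/* Alignment-safe 64-bit load: the compiler turns the memcpy into a plain
 * register load when the pointer happens to be aligned, and into byte
 * loads otherwise, so no misaligned access is ever issued on SPARC. */
static inline uint64_t load_u64(const uint8_t *p) {
  uint64_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}
{code}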
[jira] [Created] (HADOOP-12583) Sundry symlink problems on Solaris
Alan Burlison created HADOOP-12583: -- Summary: Sundry symlink problems on Solaris Key: HADOOP-12583 URL: https://issues.apache.org/jira/browse/HADOOP-12583 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 2.7.1 Environment: Solaris Reporter: Alan Burlison Priority: Minor There are various filesystem test failures on Solaris: {code} TestSymlinkLocalFSFileContext>TestSymlinkLocalFS.testDanglingLink:156 expected:<[alanbur]> but was:<[]> TestSymlinkLocalFSFileContext>TestSymlinkLocalFS.testSetTimesSymlinkToDir:233->SymlinkBaseTest.testSetTimesSymlinkToDir:1391 expected:<1447788288000> but was:<3000> TestSymlinkLocalFSFileContext>TestSymlinkLocalFS.testSetTimesSymlinkToFile:227->SymlinkBaseTest.testSetTimesSymlinkToFile:1376 expected:<144778829> but was:<3000> TestSymlinkLocalFSFileSystem>TestSymlinkLocalFS.testDanglingLink:156 expected:<[alanbur]> but was:<[]> TestSymlinkLocalFSFileSystem>TestSymlinkLocalFS.testSetTimesSymlinkToDir:233->SymlinkBaseTest.testSetTimesSymlinkToDir:1391 expected:<1447788416000> but was:<3000> TestSymlinkLocalFSFileSystem>TestSymlinkLocalFS.testSetTimesSymlinkToFile:227->SymlinkBaseTest.testSetTimesSymlinkToFile:1376 expected:<1447788417000> but was:<3000> {code} I'm not sure what the root cause is; most likely Linux-specific assumptions about how symlinks behave. Further investigation needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12581) ShellBasedIdMapping needs support for Solaris
Alan Burlison created HADOOP-12581: -- Summary: ShellBasedIdMapping needs support for Solaris Key: HADOOP-12581 URL: https://issues.apache.org/jira/browse/HADOOP-12581 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: 2.7.1 Environment: Solaris Reporter: Alan Burlison ShellBasedIdMapping only supports Linux and OSX; support for Solaris needs adding. From looking at the Linux support in ShellBasedIdMapping, the same sequences of shell commands should work for Solaris as well, so all that's probably needed is to change the implementation of checkSupportedPlatform() to treat Linux and Solaris the same way, plus possibly some renaming of other methods to make it more obvious they are not Linux-only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12580) Hadoop needs a SysInfo class for Solaris
Alan Burlison created HADOOP-12580: -- Summary: Hadoop needs a SysInfo class for Solaris Key: HADOOP-12580 URL: https://issues.apache.org/jira/browse/HADOOP-12580 Project: Hadoop Common Issue Type: Sub-task Components: util Affects Versions: 2.7.1 Environment: Solaris Reporter: Alan Burlison Assignee: Alan Burlison During testing multiple failures of the following sort are reported: {code} java.lang.UnsupportedOperationException: Could not determine OS at org.apache.hadoop.util.SysInfo.newInstance(SysInfo.java:43) at org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.<init>(ResourceCalculatorPlugin.java:41) at org.apache.hadoop.mapred.gridmix.DummyResourceCalculatorPlugin.<init>(DummyResourceCalculatorPlugin.java:32) at org.apache.hadoop.mapred.gridmix.TestGridmixMemoryEmulation.testTotalHeapUsageEmulatorPlugin(TestGridmixMemoryEmulation.java:131) {code} This is because there is no SysInfo subclass for Solaris, from SysInfo.java: {code} public static SysInfo newInstance() { if (Shell.LINUX) { return new SysInfoLinux(); } if (Shell.WINDOWS) { return new SysInfoWindows(); } throw new UnsupportedOperationException("Could not determine OS"); } {code} An implementation of SysInfoSolaris needs to be written and plumbed in to SysInfo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HADOOP-12344) Improve validateSocketPathSecurity0 error message
[ https://issues.apache.org/jira/browse/HADOOP-12344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison reopened HADOOP-12344: This fix introduced multiple sprintf format warnings which need attention. {code} [exec] op/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:488:10: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 6 has type ‘long long int’ [-Wformat=] [exec] check, path, mode, (long long)st.st_uid, (long long)st.st_gid, check); [exec] ^ [exec] /pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:488:10: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 7 has type ‘long long int’ [-Wformat=] [exec] /pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:500:10: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 6 has type ‘long long int’ [-Wformat=] [exec] check, check); [exec] ^ [exec] /pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:500:10: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 7 has type ‘long long int’ [-Wformat=] [exec] /pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:513:10: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 6 has type ‘long long int’ [-Wformat=] [exec] (long long)uid, check, (long long)uid, check); [exec] ^ [exec] /pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:513:10: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 7 has type ‘long long int’ [-Wformat=] [exec] /pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:513:10: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 8 has type ‘long long int’ [-Wformat=] [exec] /pool/home/alanbur/bigdata/hadoop/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c:513:10: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 10 has type ‘long long int’ [-Wformat=] {code} > Improve validateSocketPathSecurity0 error message > - > > Key: HADOOP-12344 > URL: https://issues.apache.org/jira/browse/HADOOP-12344 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Reporter: Casey Brotherton >Assignee: Casey Brotherton >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HADOOP-12344.001.patch, HADOOP-12344.002.patch, > HADOOP-12344.003.patch, HADOOP-12344.004.patch, HADOOP-12344.patch > > > When a socket path does not have the correct permissions, an error is thrown. > That error just has the failing component of the path and not the entire path > of the socket. > The entire path of the socket could be printed out to allow for a direct > check of the permissions of the entire path. > {code} > java.io.IOException: the path component: '/' is world-writable. Its > permissions are 0077. Please fix this or select a different socket path. > at > org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native > Method) > at > org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:189) > ... 
> {code} > The error message could also provide the socket path: > {code} > java.io.IOException: the path component: '/' is world-writable. Its > permissions are 0077. Please fix this or select a different socket path than > '/var/run/hdfs-sockets/dn' > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
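The warnings quoted above come from passing long long arguments to %ld conversions; matching the conversion specifier to the cast (%lld) silences them portably. A minimal, purely illustrative C fragment of the correct pairing (not the actual DomainSocket.c text) is:
{code}
#include <stdio.h>
#include <sys/stat.h>

/* Illustrative only: st_uid/st_gid widths vary by platform, so they are
 * cast to long long and printed with the matching %lld conversion. */
static void print_owner(const struct stat *st) {
  printf("uid=%lld gid=%lld\n",
         (long long)st->st_uid, (long long)st->st_gid);
}
{code}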
[jira] [Created] (HADOOP-12488) DomainSocket: Solaris does not support timeouts on AF_UNIX sockets
Alan Burlison created HADOOP-12488: -- Summary: DomainSocket: Solaris does not support timeouts on AF_UNIX sockets Key: HADOOP-12488 URL: https://issues.apache.org/jira/browse/HADOOP-12488 Project: Hadoop Common Issue Type: Bug Components: net Affects Versions: 2.7.1 Environment: Solaris Reporter: Alan Burlison From the hadoop-common-dev mailing list: http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201509.mbox/%3c560b99f6.6010...@oracle.com%3E http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201510.mbox/%3c560ea6bf.2070...@oracle.com%3E {noformat} Now that the Hadoop native code builds on Solaris I've been chipping away at all the test failures. About 50% of the failures involve DomainSocket, either directly or indirectly. That seems to be mainly because the tests use DomainSocket to do single-node testing, whereas in production it seems that DomainSocket is less commonly used (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html). The particular problem on Solaris is that socket read/write timeouts (the SO_SNDTIMEO and SO_RCVTIMEO socket options) are not supported for UNIX domain (PF_UNIX) sockets. Those options are however supported for PF_INET sockets. That's because the socket implementation on Solaris is split roughly into two parts, for inet sockets and for STREAMS sockets, and the STREAMS implementation lacks support for SO_SNDTIMEO and SO_RCVTIMEO. As an aside, performance of sockets that use loopback or the host's own IP is slightly better than that of UNIX domain sockets on Solaris. I'm investigating getting timeouts supported for PF_UNIX sockets added to Solaris, but in the meantime I'm also looking at how this might be worked around in Hadoop. One way would be to implement timeouts by wrapping all the read/write/send/recv etc calls in DomainSocket.c with either poll() or select(). The basic idea is to add two new fields to DomainSocket.c to hold the read/write timeouts. On platforms that support SO_SNDTIMEO and SO_RCVTIMEO these would be unused as setsockopt() would be used to set the socket timeouts. On platforms such as Solaris the JNI code would use the values to implement the timeouts appropriately. To prevent the code in DomainSocket.c becoming a #ifdef hairball, the current socket IO function calls such as accept(), send(), read() etc would be replaced with macros such as HD_ACCEPT. On platforms that provide timeouts these would just expand to the normal socket functions, on platforms that don't support timeouts they would expand to wrappers that implement timeouts for them. The only caveat is that all code that does anything to a PF_UNIX socket would *always* have to do so via DomainSocket. As far as I can tell that's not an issue, but it would have to be borne in mind if any changes were made in this area. Before I set about doing this, does the approach seem reasonable? {noformat} {noformat} Unfortunately it's not as simple as I'd hoped. For some reason I don't really understand, nearly all the JNI methods are declared as static and therefore don't get a "this" pointer and as a consequence all the class data members that are needed by the JNI code have to be passed in as parameters. That also means it's not possible to store the timeouts in the DomainSocket fields from within the JNI code. Most of the JNI methods should be instance methods rather than static ones, but making that change would require some significant surgery to DomainSocket. 
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
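A rough sketch of the poll()-wrapper approach proposed in the quoted mail; the function name and error-handling choices here are assumptions for illustration, not the eventual DomainSocket.c code:
{code}
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical wrapper: wait up to timeout_ms for readability, then recv().
 * On platforms with working SO_RCVTIMEO this would not be compiled in. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len,
                                 int timeout_ms) {
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int rc = poll(&pfd, 1, timeout_ms);
  if (rc == 0) {          /* timed out */
    errno = EAGAIN;
    return -1;
  }
  if (rc < 0) {
    return -1;            /* poll error, errno already set */
  }
  return recv(fd, buf, len, 0);
}
{code}
An HD_RECV-style macro, as described in the mail, could then expand to plain recv() on platforms with SO_RCVTIMEO and to a wrapper like this elsewhere.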
[jira] [Created] (HADOOP-12487) DomainSocket.close() assumes incorrect Linux behaviour
Alan Burlison created HADOOP-12487: -- Summary: DomainSocket.close() assumes incorrect Linux behaviour Key: HADOOP-12487 URL: https://issues.apache.org/jira/browse/HADOOP-12487 Project: Hadoop Common Issue Type: Sub-task Components: net Affects Versions: 2.7.1 Environment: Linux Solaris Reporter: Alan Burlison I'm getting a test failure in TestDomainSocket.java, in the testSocketAcceptAndClose test. That test creates a socket which one thread waits on in DomainSocket.accept() whilst a second thread sleeps for a short time before closing the same socket with DomainSocket.close(). DomainSocket.close() first calls shutdown0() on the socket before calling close0() - both those are thin wrappers around the corresponding libc socket calls. DomainSocket.close() contains the following comment, explaining the logic involved: {code} // Calling shutdown on the socket will interrupt blocking system // calls like accept, write, and read that are going on in a // different thread. {code} Unfortunately that relies on non-standards-compliant Linux behaviour. I've written a simple C test case that replicates the scenario above: # ThreadA opens, binds, listens and accepts on a socket, waiting for connections. # Some time later ThreadB calls shutdown on the socket ThreadA is waiting in accept on. Here is what happens: On Linux, the shutdown call in ThreadB succeeds and the accept call in ThreadA returns with EINVAL. On Solaris, the shutdown call in ThreadB fails and returns ENOTCONN. ThreadA continues to wait in accept. Relevant POSIX manpages: http://pubs.opengroup.org/onlinepubs/9699919799/functions/accept.html http://pubs.opengroup.org/onlinepubs/9699919799/functions/shutdown.html The POSIX shutdown manpage says: "The shutdown() function shall cause all or part of a full-duplex connection on the socket associated with the file descriptor socket to be shut down." ... "\[ENOTCONN] The socket is not connected." Pages 229 & 303 of "UNIX System V Network Programming" say: "shutdown can only be called on sockets that have been previously connected" "The socket \[passed to accept that] fd refers to does not participate in the connection. It remains available to receive further connect indications" That is pretty clear: sockets being waited on with accept are not connected by definition. Nor is the accepting socket connected when a client connects to it; it is the socket returned by accept that is connected to the client. Therefore the Solaris behaviour of failing the shutdown call is correct. In order to get the required behaviour of ThreadB causing ThreadA to exit the accept call with an error, the correct way is for ThreadB to call close on the socket that ThreadA is waiting on in accept. On Solaris, calling close in ThreadB succeeds, and the accept call in ThreadA fails and returns EBADF. On Linux, calling close in ThreadB succeeds but ThreadA continues to wait in accept until there is an incoming connection. That accept returns successfully. However subsequent accept calls on the same socket return EBADF. The Linux behaviour is fundamentally broken in three places: # Allowing shutdown to succeed on an unconnected socket is incorrect. # Returning a successful accept on a closed file descriptor is incorrect, especially as future accept calls on the same socket fail. # Once shutdown has been called on the socket, calling close on the socket fails with EBADF. That is incorrect; shutdown should just prevent further IO on the socket, it should not close it. 
The real issue though is that there's no single way of doing this that works on both Solaris and Linux, there will need to be platform-specific code in Hadoop to cater for the Linux brokenness. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
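The simple C test case mentioned in the issue is not reproduced there; the following is a hedged reconstruction of the scenario it describes (compile with -lpthread), not the original program:
{code}
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int listen_fd;

static void *acceptor(void *arg) {
  (void)arg;
  int c = accept(listen_fd, NULL, NULL);   /* ThreadA blocks here */
  printf("accept returned %d (%s)\n", c, c < 0 ? strerror(errno) : "ok");
  return NULL;
}

int main(void) {
  struct sockaddr_un sa = { .sun_family = AF_UNIX };
  strcpy(sa.sun_path, "/tmp/accept-shutdown-test.sock");
  unlink(sa.sun_path);
  listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);
  bind(listen_fd, (struct sockaddr *)&sa, sizeof(sa));
  listen(listen_fd, 5);

  pthread_t a;
  pthread_create(&a, NULL, acceptor, NULL);
  sleep(1);                                /* ThreadB acts after a delay */
  /* Per the issue: shutdown() succeeds on Linux (accept fails with EINVAL)
   * but fails with ENOTCONN on Solaris; close() is what unblocks Solaris. */
  if (shutdown(listen_fd, SHUT_RDWR) < 0)
    perror("shutdown");
  close(listen_fd);
  pthread_join(a, NULL);
  return 0;
}
{code}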
[jira] [Resolved] (HADOOP-12300) BUILDING.txt instructions for skipping tests incomplete/obsolete, build can fail.
[ https://issues.apache.org/jira/browse/HADOOP-12300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison resolved HADOOP-12300. Resolution: Duplicate > BUILDING.txt instructions for skipping tests incomplete/obsolete, build can > fail. > - > > Key: HADOOP-12300 > URL: https://issues.apache.org/jira/browse/HADOOP-12300 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.5.0, 2.6.0 > Environment: maven 3.2.3 + >Reporter: Peter D Kirchner > Original Estimate: 1h > Remaining Estimate: 1h > > Instructions currently online, and within the hadoop source tree, appear to > be incomplete/obsolete for building the source. I checked 2.5.0 and 2.6.0 . > with "mvn install -DskipTests" my build fails on multiple hadoop src/test > sub-directories with: >Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile > (default-testCompile)... > This compile failure may be related to my jdk/jre, but why is my build ("mvn > install -DskipTests") compiling these tests? > The pom.xml files are using the maven-compiler-plugin to compile some of the > tests instead of surefire. The maven-compiler-plugin is unaffected by > -DskipTests and per maven documentation requires -Dmaven.test.skip instead, > which the surefire plugin also obeys. > Building with > mvn install -Dmaven.test.skip > completes in my environment. > I suggest a "major" rating because of the impact on users of the source > tarballs. IMO the build instructions in the tarball and online (e.g. > https://wiki.apache.org/hadoop/EclipseEnvironment ) should work reliably. > Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12261) Surefire needs to make sure the JVMs it fires up are 64-bit
Alan Burlison created HADOOP-12261: -- Summary: Surefire needs to make sure the JVMs it fires up are 64-bit Key: HADOOP-12261 URL: https://issues.apache.org/jira/browse/HADOOP-12261 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Alan Burlison hadoop-project/pom.xml sets maven-surefire-plugin.argLine to include -Xmx4096m. Allocating that amount of memory requires a 64-bit JVM, but on platforms with both 32 and 64-bit JVMs surefire runs the 32 bit version by default and tests fail to start as a result. "-d64" should be added to the command-line arguments to ensure a 64-bit JVM is always used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12170) hadoop-common's JNIFlags.cmake is redundant and can be removed
Alan Burlison created HADOOP-12170: -- Summary: hadoop-common's JNIFlags.cmake is redundant and can be removed Key: HADOOP-12170 URL: https://issues.apache.org/jira/browse/HADOOP-12170 Project: Hadoop Common Issue Type: Sub-task Components: native Reporter: Alan Burlison Assignee: Alan Burlison With the integration of: * HADOOP-12036 Consolidate all of the cmake extensions in one directory * HADOOP-12104 Migrate Hadoop Pipes native build to new CMake framework * HDFS-8635 Migrate HDFS native build to new CMake framework * MAPREDUCE-6407 Migrate MAPREDUCE native build to new CMake framework * YARN-3827 Migrate YARN native build to new CMake framework hadoop-common-project/hadoop-common/src/JNIFlags.cmake is now redundant and can be removed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12114) Make hadoop-tools/hadoop-pipes Native code -Wall-clean
Alan Burlison created HADOOP-12114: -- Summary: Make hadoop-tools/hadoop-pipes Native code -Wall-clean Key: HADOOP-12114 URL: https://issues.apache.org/jira/browse/HADOOP-12114 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12112) Make hadoop-common-project Native code -Wall-clean
Alan Burlison created HADOOP-12112: -- Summary: Make hadoop-common-project Native code -Wall-clean Key: HADOOP-12112 URL: https://issues.apache.org/jira/browse/HADOOP-12112 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12104) Migrate Hadoop Pipes native build to new CMake framework
Alan Burlison created HADOOP-12104: -- Summary: Migrate Hadoop Pipes native build to new CMake framework Key: HADOOP-12104 URL: https://issues.apache.org/jira/browse/HADOOP-12104 Project: Hadoop Common Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Reporter: Alan Burlison Assignee: Alan Burlison As per HADOOP-12036, the CMake infrastructure should be refactored and made common across all Hadoop components. This bug covers the migration of Hadoop Pipes to the new CMake infrastructure. This change will also add support for building Hadoop Pipes Native components on Solaris. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12012) Investigate JNI for improving byte array comparison performance
Alan Burlison created HADOOP-12012: -- Summary: Investigate JNI for improving byte array comparison performance Key: HADOOP-12012 URL: https://issues.apache.org/jira/browse/HADOOP-12012 Project: Hadoop Common Issue Type: Sub-task Components: benchmarks, io, performance Affects Versions: 2.7.0 Environment: All Reporter: Alan Burlison Assignee: Alan Burlison Priority: Minor HADOOP-7761 added functionality to compare byte arrays by treating them as arrays of 64-bit longs for performance. However HADOOP-11466 reverted this change for the SPARC architecture as it causes misaligned traps which causes performance to be worse rather than better. Most platforms have a highly-optimised memcmp() libc function that uses processor-specific functionality to perform byte array comparison as quickly as is possible for the platform. We have done some preliminary benchmarking on Solaris that suggests that, for reasonably-sized byte arrays, JNI code using memcmp() outperforms both of the current Java byte-size and long-size implementations on both SPARC and x86 . We are confirming the results and will repeat the same benchmark on Linux and report the results here for discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
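A hedged sketch of the sort of JNI-plus-memcmp() comparator being benchmarked; the class and method names are illustrative and not an existing Hadoop API:
{code}
#include <jni.h>
#include <string.h>

/* Illustrative native comparator: compare len bytes of two Java byte[]s
 * with the platform's optimised memcmp(). Critical-array access avoids a
 * copy on most JVMs; bounds and error handling are omitted for brevity. */
JNIEXPORT jint JNICALL
Java_example_NativeCompare_compareBytes(JNIEnv *env, jclass clazz,
                                        jbyteArray a, jint aOff,
                                        jbyteArray b, jint bOff, jint len) {
  jbyte *pa = (*env)->GetPrimitiveArrayCritical(env, a, NULL);
  jbyte *pb = (*env)->GetPrimitiveArrayCritical(env, b, NULL);
  int r = memcmp(pa + aOff, pb + bOff, (size_t)len);
  (*env)->ReleasePrimitiveArrayCritical(env, b, pb, JNI_ABORT);
  (*env)->ReleasePrimitiveArrayCritical(env, a, pa, JNI_ABORT);
  return r;
}
{code}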
[jira] [Created] (HADOOP-12008) Investigate providing SPARC hardware-accelerated CRC32 code
Alan Burlison created HADOOP-12008: -- Summary: Investigate providing SPARC hardware-accelerated CRC32 code Key: HADOOP-12008 URL: https://issues.apache.org/jira/browse/HADOOP-12008 Project: Hadoop Common Issue Type: Sub-task Components: performance Affects Versions: 2.7.0 Environment: Solaris SPARC Reporter: Alan Burlison hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util contains code for hardware-accelerated CRC32 on x86 platforms. There is no corresponding code for the SPARC architecture, the possibility of providing it should be investigated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11997) CMake CMAKE_C_FLAGS are non-portable
Alan Burlison created HADOOP-11997: -- Summary: CMake CMAKE_C_FLAGS are non-portable Key: HADOOP-11997 URL: https://issues.apache.org/jira/browse/HADOOP-11997 Project: Hadoop Common Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: All Reporter: Alan Burlison Assignee: Alan Burlison Priority: Critical hadoop-common-project/hadoop-common/src/CMakeLists.txt (https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt#L110) contains the following unconditional assignments to CMAKE_C_FLAGS: set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -Wall -O2") set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_REENTRANT -D_GNU_SOURCE") set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64") There are several issues here: 1. "-D_GNU_SOURCE" globally enables the use of all Linux-only extensions in hadoop-common native source. This is probably a major contributor to the poor cross-platform portability of Hadoop native code to non-Linux platforms as it makes it easy for developers to use non-portable Linux features without realising. Use of Linux-specific features should be correctly bracketed with conditional macro blocks that provide an alternative for non-Linux platforms. 2. "-g -Wall -O2" turns on debugging for all builds, I believe the correct mechanism is to set the CMAKE_BUILD_TYPE CMake variable. If it is still necessary to override CFLAGS it should probably be done conditionally dependent on the value of CMAKE_BUILD_TYPE. 3. "-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64" On Solaris these flags are only needed for largefile support in ILP32 applications, LP64 applications are largefile by default. I believe the same is true on Linux, so these flags are harmless but redundant for 64-bit compilation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
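On point 1 above, the bracketing being asked for looks roughly like the following generic C pattern; pipe2() is just an example of a Linux-only extension, and the code is not taken from Hadoop's native sources:
{code}
#include <fcntl.h>
#include <unistd.h>

/* Illustrative pattern only: a Linux-only extension (pipe2) bracketed by a
 * conditional block, with a portable fallback for other platforms, instead
 * of relying on a global -D_GNU_SOURCE. */
static int make_cloexec_pipe(int fds[2]) {
#if defined(__linux__) && defined(_GNU_SOURCE)
  return pipe2(fds, O_CLOEXEC);
#else
  if (pipe(fds) != 0) {
    return -1;
  }
  fcntl(fds[0], F_SETFD, FD_CLOEXEC);
  fcntl(fds[1], F_SETFD, FD_CLOEXEC);
  return 0;
#endif
}
{code}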
[jira] [Created] (HADOOP-11987) JNI build should use default cmake FindJNI.cmake
Alan Burlison created HADOOP-11987: -- Summary: JNI build should use default cmake FindJNI.cmake Key: HADOOP-11987 URL: https://issues.apache.org/jira/browse/HADOOP-11987 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: 2.7.0 Environment: All Reporter: Alan Burlison Assignee: Alan Burlison Priority: Minor From http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201505.mbox/%3C55568DAC.1040303%40oracle.com%3E -- Why does hadoop-common-project/hadoop-common/src/CMakeLists.txt use JNIFlags.cmake in the same directory to set things up for JNI compilation rather than FindJNI.cmake, which comes as a standard cmake module? The checks in JNIFlags.cmake make several assumptions that I believe are only correct on Linux whereas I'd expect FindJNI.cmake to be more platform-independent. -- Just checked the repo of cmake and it turns out that FindJNI.cmake is available even before cmake 2.4. I think it makes sense to file a bug to replace it with the standard cmake module. Can you please file a jira for this? -- This also applies to hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/JNIFlags.cmake -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11985) Improve Solaris support in Hadoop
Alan Burlison created HADOOP-11985: -- Summary: Improve Solaris support in Hadoop Key: HADOOP-11985 URL: https://issues.apache.org/jira/browse/HADOOP-11985 Project: Hadoop Common Issue Type: New Feature Components: build, conf Affects Versions: 2.7.0 Environment: Solaris x86, Solaris sparc Reporter: Alan Burlison Assignee: Alan Burlison At present the Hadoop native components aren't fully supported on Solaris primarily due to differences between Linux and Solaris. This top-level task will be used to group together both existing and new issues related to this work. A second goal is to improve Hadoop performance on Solaris wherever possible. Steve Loughran suggested a top-level JIRA was the best way to manage the work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11975) Native code needs to be built to match the 32/64 bitness of the JVM
Alan Burlison created HADOOP-11975: -- Summary: Native code needs to be built to match the 32/64 bitness of the JVM Key: HADOOP-11975 URL: https://issues.apache.org/jira/browse/HADOOP-11975 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 2.7.0 Environment: Solaris Reporter: Alan Burlison Assignee: Alan Burlison When building with a 64-bit JVM on Solaris the following error occurs at the link stage of building the native code: [exec] ld: fatal: file /usr/jdk/instances/jdk1.8.0/jre/lib/amd64/server/libjvm.so: wrong ELF class: ELFCLASS64 [exec] collect2: error: ld returned 1 exit status [exec] make[2]: *** [target/usr/local/lib/libhadoop.so.1.0.0] Error 1 [exec] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2 The compilation flags in the makefiles need to explicitly state if 32 or 64 bit code is to be generated, to match the JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11974) FIONREAD is not always in the same header
Alan Burlison created HADOOP-11974: -- Summary: FIONREAD is not always in the same header Key: HADOOP-11974 URL: https://issues.apache.org/jira/browse/HADOOP-11974 Project: Hadoop Common Issue Type: Bug Components: net Affects Versions: 2.7.0 Environment: Solaris Reporter: Alan Burlison Assignee: Alan Burlison Priority: Minor The FIONREAD macro is found in <sys/ioctl.h> on Linux and in <sys/filio.h> on Solaris. A conditional include block is required to make sure it is looked for in the right place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
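A minimal sketch of the conditional include described above; testing __sun for Solaris is an assumption about how the patch might detect the platform:
{code}
/* FIONREAD lives in different headers on different platforms. */
#if defined(__sun)
#include <sys/filio.h>    /* Solaris */
#else
#include <sys/ioctl.h>    /* Linux and most others */
#endif
{code}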
[jira] [Created] (HADOOP-11968) BUILDING.txt is still unclear about the need to build
Alan Burlison created HADOOP-11968: -- Summary: BUILDING.txt is still unclear about the need to build Key: HADOOP-11968 URL: https://issues.apache.org/jira/browse/HADOOP-11968 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 2.7.0 Environment: All Reporter: Alan Burlison Assignee: Alan Burlison Priority: Minor HADOOP-9279 attempted to address the issue of having to make sure that hadoop-maven-plugins is built first by modifying BUILDING.txt but it still isn't clear this is a requirement. The "compile" target doesn't do this, so if the first build is a "mvn compile" it will fail. BUILDING.txt should be modified to recommend either that an "mvn compile" is done first in hadoop-maven-plugins or that a "mvn clean install -Pnative -DskipTests" is done from the top-level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)