[ https://issues.apache.org/jira/browse/MAPREDUCE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055444#comment-15055444 ]
Binglin Chang commented on MAPREDUCE-6417: ------------------------------------------ I think it's probably OK to remove them, most of those method are added because of old compilers(gcc3) or glibc, which have inefficient memcpy & memcmp, like which we used in 2011 in our production environment, I recall it's the main reason adding them. At least on macosx clang and gcc4.4+, I see memcpy are pretty fast. So if we are drop old compilers/oses support and need to support sparc/arm, use system library is better. > MapReduceClient's primitives.h is toxic and should be extirpated > ---------------------------------------------------------------- > > Key: MAPREDUCE-6417 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6417 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: client > Affects Versions: 3.0.0 > Reporter: Alan Burlison > Assignee: Alan Burlison > Priority: Blocker > Attachments: MAPREDUCE-6417.001.patch > > > MapReduceClient's primitives.h attempts to provide optimised versions of > standard library memory copy and comparison functions. It has been the > subject of several portability-related bugs: > * HADOOP-11505 hadoop-mapreduce-client-nativetask uses bswap where be32toh is > needed, doesn't work on non-x86 > * HADOOP-11665 Provide and unify cross platform byteorder support in native > code > * MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions > * HADOOP-11484 hadoop-mapreduce-client-nativetask fails to build on ARM > AARCH64 due to x86 asm statements > At present it only works on x86 and ARM64 as it lacks definitions for bswap > and bswap64 for any platforms other than those. > However it has even more serious problems on non-x86 architectures, for > example on SPARC simple_memcpy simply doesn't work at all: > {code} > $ cat bang.cc > #include <string.h> > #define SIMPLE_MEMCPY > #include "primitives.h" > int main(int argc, char **argv) > { > char b1[9]; > char b2[9]; > simple_memcpy(b2, b1, sizeof(b1)); > } > $ gcc -o bang bang.cc && ./bang > Bus Error (core dumped) > {code} > That's because simple_memcpy does pointer fiddling that results in misaligned > accesses, which are illegal on SPARC. > fmemcmp is also broken. Even if a definition of bswap is provided, on > big-endian architectures the result is simply wrong because of its > unconditional use of bswap: > {code} > $ cat thud.cc > #include <stdio.h> > #include <strings.h> > #include "primitives.h" > int main(int argc, char **argv) > { > char a[] = { 0,1,2,0 }; > char b[] = { 0,2,1,0 }; > printf("%lld %d\n", fmemcmp(a, b, sizeof(a), memcmp(a, b, sizeof(a)))); > } > $ g++ -o thud thud.cc && ./thud > 65280 -1 > {code} > And in addition fmemcmp suffers from the same misalignment issues as > simple_memcpy and coredumps on SPARC when asked to compare odd-sized buffers. > primitives.h provides the following functions: > * bswap - used in 12 files in MRC but as HADOOP-11505 points out, mostly > incorrectly as it takes no account of platform endianness > * bswap64 - used in 4 files in MRC, same comments as per bswap apply > * simple_memcpy - used in 3 files in MRC, should be replaced with the > standard memcpy > * fmemcmp - used in 1 file, should be replaced with the standard memcmp > * fmemeq - used in 1 file, should be replaced with the standard memcmp > * frmemeq - not used at all, should just be removed > *Summary*: primitives.h should simply be deleted and replaced with the > standard memory copy & compare functions, or with thin wrappers around them > where the APIs are different. -- This message was sent by Atlassian JIRA (v6.3.4#6332)