On 12 Jan 2016, at 10:49, Reynold Xin <r...@databricks.com> wrote:
> How big of a deal is this use case in a heterogeneous-endianness environment? If we do want to fix it, we should do it right before Spark shuffles data, to minimize the performance penalty, i.e. turn big-endian encoded data into little-endian encoded data before it goes on the wire. This is a pretty involved change, and given the other things that might break across heterogeneous-endianness environments, I am not sure it is high priority enough to even warrant review bandwidth right now.

This is a classic problem in distributed computing, which has two common strategies.

The SunOS RPC strategy: fixed wire order. For Sun, hence NFS, the order was that of the Motorola 68K, so cost-free on Sun workstations. SPARC used the same byte ordering, so again free. x86 parts wanting to play paid the swap cost at both sending and receiving. Protobuf also has a fixed order, but there it's little-endian: https://developers.google.com/protocol-buffers/docs/encoding

The Apollo RPC/DCE strategy: packets declare their byte order, and the recipient gets to deal with it. This is efficient in a homogeneous cluster of either endianness, as x86-to-x86 involves zero byteswapping. The Apollo design ended up in DCE, which is what Windows Distributed COM uses ( http://pubs.opengroup.org/onlinepubs/9629399/chap14.htm ). If you look at that spec, you can see it's the floating-point marshalling that's most trouble.

Recipient-makes-good is ideal for clusters where the systems all share the same endianness: the amount of marshalling is guaranteed to be zero if all CPU parts are the same. That's clearly the de facto strategy in Spark. In contrast, the one-network-format strategy is guaranteed to need zero byteswaps on CPUs whose endianness matches the wire format, and guaranteed to need two for the other parts (one at each end). For mixed-endian RPC there'll be one bswap, so the cost is the same as for Apollo/DCE.

Bits of Hadoop core do byteswap stuff; for performance this is in native code, code which has to use assembly and builtin functions for maximum efficiency.
It's a big patch, one that's designed for effective big-endian support, *ignoring heterogeneous clusters*: https://issues.apache.org/jira/secure/attachment/12776247/HADOOP-11505.007.patch

All that stuff cropped up when Alan Burlinson sat down to get Hadoop working properly on SPARC. That's a big enough project on its own that worrying about heterogeneous systems isn't on his roadmap, and nobody else appears to care. I'd suggest the same to IBM: focus effort and testing on Power + AIX rather than worrying about heterogeneous systems.

-Steve