[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...
Github user michaelkamprath commented on the issue: https://github.com/apache/spark/pull/16403 I am withdrawing this pull request and will instead publish a patch for those who are interested in using Spark on a platforms requiring byte alignment for double access. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath closed the pull request at: https://github.com/apache/spark/pull/16403 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...
Github user michaelkamprath commented on the issue: https://github.com/apache/spark/pull/16403 @srowen No one has any delusions that an ARM platform is the place to do heavy lifting for data. But if your goal is academic in nature (hands on learning how to set up and operate a cluster, learning the API, etc), it's a very economical platform. I do understand your concerns for the core target and totally get why you wouldn't want this fix in the main line, so I will instead pursue a patch approach for those who'd be interested. I'm not sure what that means here ... do you or I close this pull request? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94259189 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -22,10 +22,14 @@ import java.lang.reflect.Method; import java.nio.ByteBuffer; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + import sun.misc.Cleaner; import sun.misc.Unsafe; public final class Platform { --- End diff -- I missed that. I'll address that when we determine the final path here (per my comment below). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94257179 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -244,6 +251,18 @@ public static void throwException(Throwable t) { LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class); FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class); DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class); + + // determine whether double access should be aligned. + String arch = System.getProperty("os.arch", ""); + if (arch.matches("^(arm|arm32)")) { --- End diff -- @kiszk I have tested on ARM 64 (`aarch64`). [Any alignment works for double access there](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch08s02.html), though 8-byte aligned access looks to be about 10% faster than unaligned access. Using an intermediate long buffer (your idea) is about 5% slower than direct access regardless of being aligned or not. In both cases, I tested with an ODROID C2 using Oracle Java 8. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...
Github user michaelkamprath commented on the issue: https://github.com/apache/spark/pull/16403 @srowen To answer the use case question, it is primarily academic for learning and testing. Students and researchers build clusters of Raspberry PI, ODROID, or other SBCs to have a cost effective access to a multi-node hardware cluster. [Here](http://likemagicappears.com/projects/raspberry-pi-cluster/) [are](http://coen.boisestate.edu/ece/research-areas/raspberry-pi/) [some](https://www.raspberrypi.org/magpi/pi-spark-supercomputer/) [examples](http://hackaday.com/2016/05/09/designing-a-high-performance-parallel-personal-cluster/) [of](http://katie.atomicburn.com/2016/06/12/2016-school-gt-exhibition-raspberry-piodroid-c2-supercomputer/) [projects](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4803722/). There is even [a commercial vendor](https://www.picocluster.com/collections/) selling these SBC clusters. [In my own case](http://diybigdata.net/odroid-xu4-cluster/), its being used to economically learn how to deal problems of efficiency (it's easier to spot and work through patter ns of inefficiency on constrained systems than full powered systems). I am personally not aware if there are currently any server-class CPUs that requires double alignment. Double alignment SPARC processors used to be the bane of my existence in the early 2000's, but that was over a decade ago. My understanding is that today x86 supports unaligned double access with [a theoretical performance hit](https://developers.redhat.com/blog/2016/06/01/how-to-avoid-wasting-megabytes-of-memory-a-few-bytes-at-a-time/) that [in practice is rarely seen](http://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/). Typically you never concern yourself with alignment in Java because the JVM takes care of it for you, but here we are delving into the world of Unsafe, which bypasses the protections the JVM provides. Admittedly, it took me a long while to even figure out that my problem was related to alignment because as indicated, I haven't dealt with such issues in over a decade. With all that said, maybe a better approach here is to create a patch that users can use to create a spark build when they want to run Spark on a system that requires double alignment, which to the best of my knowledge are currently just ARM 32-bit CPUs. That would even let to be more concise, without needing to determine at runtime which method to use. And if ever should arise a server-class CPU with alignment requirements, we know what to do. Given that --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94054024 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java --- @@ -74,4 +76,71 @@ public void memoryDebugFillEnabledInTest() { Platform.getByte(offheap.getBaseObject(), offheap.getBaseOffset()), MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE); } + + /** + * This test ensures that double access is handled correctly for the host platform. + * On platforms that require double access to be aligned to 8-byte boundaries, this + * test should pass and not cause the JVM to segfault. + */ + @Test + public void unalignedDoublePutAndGet() { +MemoryBlock testBuffer = MemoryBlock.fromLongArray(new long[20]); + +// write a double to an unaligned location +long unalignedOffset = testBuffer.getBaseOffset() + 3; +// double check unalignment just to be sure: +if (unalignedOffset%8 == 0) { + ++unalignedOffset; +} + +Platform.putDouble(testBuffer.getBaseObject(), unalignedOffset, 3.14159); + +double foundDouble = Platform.getDouble(testBuffer.getBaseObject(), unalignedOffset); + +Assert.assertEquals(3.14159, foundDouble, 0.01); + } + + /** + * The goal of this test is to ensure that doubles written to a memory location + * using the long-buffered functor can be read back using the direct functor, + * and vice-versa. This is to ensure binary output such as parquet files written + * from one platform will be readable on another. + */ + @Test + public void mixedDoubleFunctors() { +sun.misc.Unsafe unsafe; +try { + Field unsafeField = Unsafe.class.getDeclaredField("theUnsafe"); + unsafeField.setAccessible(true); + unsafe = (sun.misc.Unsafe) unsafeField.get(null); + + MemoryBlock testBuffer = MemoryBlock.fromLongArray(new long[20]); + + // ensure offset is aligned so test can run on any host platform. + long alignedOffset = testBuffer.getBaseOffset(); + if (alignedOffset%8 != 0) { +alignedOffset += 8 - alignedOffset%8; + } + + DoubleAccessFunctor buffered = new DoubleAccessBuffered(); + DoubleAccessFunctor direct = new DoubleAccessDirect(); + + final double direct2bufferedValue = 3.14159; + final double buffered2directValue = 2.71828; + + buffered.putDouble(unsafe, testBuffer.getBaseObject(), alignedOffset, buffered2directValue); + Assert.assertEquals( + buffered2directValue, + direct.getDouble(unsafe, testBuffer.getBaseObject(), alignedOffset), + 0.01); + + direct.putDouble(unsafe, testBuffer.getBaseObject(), alignedOffset, direct2bufferedValue); + Assert.assertEquals( + direct2bufferedValue, + buffered.getDouble(unsafe, testBuffer.getBaseObject(), alignedOffset), + 0.01); +} catch (Throwable cause) { + // can't run test because Unsafe is unavailable. --- End diff -- refactored test to isolate the `Unsafe` object creation better. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94053751 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/DoubleAccessFunctor.java --- @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe; + +import sun.misc.Unsafe; + +interface DoubleAccessFunctor { --- End diff -- No reason other than there was no commonality between the concrete classes other than the interface. I will switch to an abstract class given your performance input. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94049781 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java --- @@ -74,4 +76,71 @@ public void memoryDebugFillEnabledInTest() { Platform.getByte(offheap.getBaseObject(), offheap.getBaseOffset()), MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE); } + + /** + * This test ensures that double access is handled correctly for the host platform. + * On platforms that require double access to be aligned to 8-byte boundaries, this + * test should pass and not cause the JVM to segfault. + */ + @Test + public void unalignedDoublePutAndGet() { +MemoryBlock testBuffer = MemoryBlock.fromLongArray(new long[20]); + +// write a double to an unaligned location +long unalignedOffset = testBuffer.getBaseOffset() + 3; +// double check unalignment just to be sure: +if (unalignedOffset%8 == 0) { + ++unalignedOffset; --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94049706 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java --- @@ -74,4 +76,71 @@ public void memoryDebugFillEnabledInTest() { Platform.getByte(offheap.getBaseObject(), offheap.getBaseOffset()), MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE); } + + /** + * This test ensures that double access is handled correctly for the host platform. + * On platforms that require double access to be aligned to 8-byte boundaries, this + * test should pass and not cause the JVM to segfault. + */ + @Test + public void unalignedDoublePutAndGet() { +MemoryBlock testBuffer = MemoryBlock.fromLongArray(new long[20]); + +// write a double to an unaligned location +long unalignedOffset = testBuffer.getBaseOffset() + 3; +// double check unalignment just to be sure: +if (unalignedOffset%8 == 0) { --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94049578 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -252,6 +272,7 @@ public static void throwException(Throwable t) { LONG_ARRAY_OFFSET = 0; FLOAT_ARRAY_OFFSET = 0; DOUBLE_ARRAY_OFFSET = 0; + _doubleAccessFunc = null; --- End diff -- I am setting it to `null` to keep in line with the behavior of the `_UNSAFE` class variable. In case, `_UNSAFE` is null. In that case, any calls to the `Platform.get...` or `Platform.put...` will result in a `NullPointerException`. I'm assuming it is a programming error elsewhere to call these methods without checking ( [example](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/MemoryManager.scala#L200) ). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94047852 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -244,6 +251,19 @@ public static void throwException(Throwable t) { LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class); FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class); DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class); + + // determine whether double access should be aligned. + String arch = System.getProperty("os.arch", ""); + if (arch.matches("^(arm|arm32)")) { +logger.info( +"Host platform '{}' requires aligned double access. "+ +"Using aligned double access method.", +arch); +_doubleAccessFunc = new DoubleAccessBuffered(); + } else { +// set the buffer address to null to indicate no buffer allocated +_doubleAccessFunc = new DoubleAccessDirect();; --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94047799 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -244,6 +251,19 @@ public static void throwException(Throwable t) { LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class); FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class); DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class); + + // determine whether double access should be aligned. + String arch = System.getProperty("os.arch", ""); + if (arch.matches("^(arm|arm32)")) { +logger.info( +"Host platform '{}' requires aligned double access. "+ +"Using aligned double access method.", +arch); +_doubleAccessFunc = new DoubleAccessBuffered(); + } else { +// set the buffer address to null to indicate no buffer allocated --- End diff -- removed (vestige from prior approach) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94047695 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -44,6 +48,9 @@ public static final int DOUBLE_ARRAY_OFFSET; private static final boolean unaligned; + + private static final DoubleAccessFunctor _doubleAccessFunc; --- End diff -- changed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94047557 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/DoubleAccessDirect.java --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe; + +import sun.misc.Unsafe; + +class DoubleAccessDirect implements DoubleAccessFunctor { --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r94047419 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/DoubleAccessBuffered.java --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.unsafe; + +import sun.misc.Unsafe; + +public class DoubleAccessBuffered implements DoubleAccessFunctor { --- End diff -- No reason but forgot to uncheck the Eclipse modifiers when creating a new class. Fixed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...
Github user michaelkamprath commented on the issue: https://github.com/apache/spark/pull/16403 I've refactored this issue's solution using the idea that @kiszk supplied above. This ended up being a better idea, thanks. It produced cleaner code than my original allocated buffer approach. Also, based on the input that this solution should have zero impact on the x86 platform, I've refactored to a functor design pattern which creates the equivalent performance behavior for x86 as before this fix. The performance impact of the using an intermediate long as a buffer when fetching a double from a memory block looks to be about 10%. I ran some performance tests writing and reading random doubles to random aligned locations in a buffer 100M times over. On my mac laptop and an ARM unit, the performance difference was the long-buffered approach was about 10% slower than the original direct fetch approach. However, since the this refactor isolates the fetching behavior into a functor, the 10% slowdown only affects the ARM platform. I also added some unit tests if only to convince myself I hadn't broken anything. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r93983319 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -244,6 +257,34 @@ public static void throwException(Throwable t) { LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class); FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class); DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class); + + // determine whether double access should be aligned. + String arch = System.getProperty("os.arch", ""); + if (arch.matches("^(arm|arm32)")) { +logger.info( +"Host platform '{}' requires aligned double access. "+ +"Creating an aligned buffer for unsafe double reads.", +arch); + +// allocate a 2x memory block to ensure buffer used is 8-byte aligned. Java +// objects are always aligned, so we just need to ensure the offset is aligned +// to an 8-byte boundary +byte[] heapObj = new byte[16]; +long offset = BYTE_ARRAY_OFFSET; +long bufferSize = 16; +for (long i = 0; i < 8; ++i ) { + if ((offset+i)%8 == 0) { +logger.debug("Found aligned buffer offset at {} + {}", offset, i); +offset += i; +bufferSize -= i; +break; + } --- End diff -- No, we can't, because we would still need the buffer to avoid doing the direct double read from an unaligned memory location on [line 131](https://github.com/michaelkamprath/spark/blob/30c6c997ca8e864a31d2f4dfa55d47b5aa629596/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L131). The reason for finding an aligned offset here is because on ARM7, the actual memory address for the `byte[0]` location is not the memory address of the `byte[]` object. That starting point for the 0-index item is indicated by `Platform.BYTE_ARRAY_OFFSET`, which is the Java overhead memory used for managing the `byte[]` object. So if we read from `byte[0]`, that could be unaligned because of the Java overhead, hence the reason to find the index in the byte buffer where overhead+index is aligned. I am depending on the Java behavior of aligning on objects here, the `byte[]` object) for needing to only check that the offset is also aligned. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r93982964 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -119,7 +127,12 @@ public static void putFloat(Object object, long offset, float value) { } public static double getDouble(Object object, long offset) { -return _UNSAFE.getDouble(object, offset); +if ( null == _doubleBuffer) { + return _UNSAFE.getDouble(object, offset); +} else { + copyMemory(object, offset, _doubleBuffer.getBaseObject(), _doubleBuffer.getBaseOffset(), 8); + return _UNSAFE.getDouble(_doubleBuffer.getBaseObject(), _doubleBuffer.getBaseOffset()); --- End diff -- I will test this idea and report back. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r93982937 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -244,6 +257,34 @@ public static void throwException(Throwable t) { LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class); FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class); DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class); + + // determine whether double access should be aligned. + String arch = System.getProperty("os.arch", ""); + if (arch.matches("^(arm|arm32)")) { +logger.info( +"Host platform '{}' requires aligned double access. "+ +"Creating an aligned buffer for unsafe double reads.", +arch); + +// allocate a 2x memory block to ensure buffer used is 8-byte aligned. Java +// objects are always aligned, so we just need to ensure the offset is aligned +// to an 8-byte boundary +byte[] heapObj = new byte[16]; +long offset = BYTE_ARRAY_OFFSET; +long bufferSize = 16; +for (long i = 0; i < 8; ++i ) { --- End diff -- Simply to match the `offset` variable, which needs to be a long. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
Github user michaelkamprath commented on a diff in the pull request: https://github.com/apache/spark/pull/16403#discussion_r93982849 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java --- @@ -44,6 +49,9 @@ public static final int DOUBLE_ARRAY_OFFSET; private static final boolean unaligned; + + private static final MemoryBlock _doubleBuffer; --- End diff -- Admittedly, I am not deeply familiar with how this class is being using concurrently within a single JVM. If multiple threads are using it concurrently, you are right and a different approach must be found. I was trying to avoid the cost of reallocating a buffer with ever call to `getDouble()`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...
GitHub user michaelkamprath opened a pull request: https://github.com/apache/spark/pull/16403 [SPARK-18819][CORE] Double byte alignment on ARM platforms ## What changes were proposed in this pull request? On 32-bit ARM platforms, doubles require 8-byte alignment. The use of Java's `sun.misc.Unsafe` library for memory block management exposes Spark to misaligned doubles when using `org.apache.spark.unsafe.Platform.getDouble()` to extract a double from a memory block with arbitrary offset locations. For the alignment sensitive platforms, this proposed change creates an properly aligned buffer that the bytes to be cast to a double are first copied to, then the double the double is extracted via the `sun.misc.Unsafe.getDouble()` method. On all other platforms the previous behavior is maintained, which was to cast the double value from the memory block regardless of offset alignment . ## How was this patch tested? All unit tests pass, plus tested on both x86 and ARM71 platforms. For the ARM71 platform, it is verified that this fix removes the problem as described in SPARK-18819. You can merge this pull request into a Git repository by running: $ git pull https://github.com/michaelkamprath/spark fix/SPARK-18819_double-alignment Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16403.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16403 commit 30c6c997ca8e864a31d2f4dfa55d47b5aa629596 Author: Michael Kamprath <mich...@claireware.com> Date: 2016-12-25T21:54:26Z [SPARK-18819] Double alignment on ARM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org