[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...

2017-01-21 Thread michaelkamprath
Github user michaelkamprath commented on the issue:

https://github.com/apache/spark/pull/16403
  
I am withdrawing this pull request and will instead publish a patch for 
those who are interested in using Spark on a platforms requiring byte alignment 
for double access.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2017-01-21 Thread michaelkamprath
Github user michaelkamprath closed the pull request at:

https://github.com/apache/spark/pull/16403


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...

2017-01-02 Thread michaelkamprath
Github user michaelkamprath commented on the issue:

https://github.com/apache/spark/pull/16403
  
@srowen No one has any delusions that an ARM platform is the place to do 
heavy lifting for data. But if your goal is academic in nature (hands on 
learning how to set up and operate a cluster,  learning the API, etc), it's a 
very economical platform. I do understand your concerns for the core target and 
totally get why you wouldn't want this fix in the main line, so I will instead 
pursue a patch approach for those who'd be interested. I'm not sure what that 
means here ... do you or I close this pull request?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-30 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94259189
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -22,10 +22,14 @@
 import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import sun.misc.Cleaner;
 import sun.misc.Unsafe;
 
 public final class Platform {
--- End diff --

I missed that. I'll address that when we determine the final path here (per 
my comment below).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-30 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94257179
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -244,6 +251,18 @@ public static void throwException(Throwable t) {
   LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
   FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
   DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+  // determine whether double access should be aligned.
+  String arch = System.getProperty("os.arch", "");
+  if (arch.matches("^(arm|arm32)")) {
--- End diff --

@kiszk I have tested on ARM 64 (`aarch64`). [Any alignment works for double 
access 
there](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch08s02.html),
 though 8-byte aligned access looks to be about 10% faster than unaligned 
access. Using an intermediate long buffer (your idea) is about 5% slower than 
direct access regardless of being aligned or not. In both cases, I tested with 
an ODROID C2 using Oracle Java 8.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...

2016-12-30 Thread michaelkamprath
Github user michaelkamprath commented on the issue:

https://github.com/apache/spark/pull/16403
  
@srowen To answer the use case question, it is primarily academic for 
learning and testing. Students and researchers build clusters of Raspberry PI, 
ODROID, or other SBCs to have a cost effective access to a multi-node hardware 
cluster. [Here](http://likemagicappears.com/projects/raspberry-pi-cluster/) 
[are](http://coen.boisestate.edu/ece/research-areas/raspberry-pi/) 
[some](https://www.raspberrypi.org/magpi/pi-spark-supercomputer/) 
[examples](http://hackaday.com/2016/05/09/designing-a-high-performance-parallel-personal-cluster/)
 
[of](http://katie.atomicburn.com/2016/06/12/2016-school-gt-exhibition-raspberry-piodroid-c2-supercomputer/)
 [projects](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4803722/). There is 
even [a commercial vendor](https://www.picocluster.com/collections/) selling 
these SBC clusters. [In my own 
case](http://diybigdata.net/odroid-xu4-cluster/), its being used to 
economically learn how to deal problems of efficiency (it's easier to spot and 
work through patter
 ns of inefficiency on constrained systems than full powered systems). 

I am personally not aware if there are currently any server-class CPUs that 
requires double alignment. Double alignment SPARC processors used to be the 
bane of my existence in the early 2000's, but that was over a decade ago. My 
understanding is that today x86 supports unaligned double access with [a 
theoretical performance 
hit](https://developers.redhat.com/blog/2016/06/01/how-to-avoid-wasting-megabytes-of-memory-a-few-bytes-at-a-time/)
 that [in practice is rarely 
seen](http://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/).
 Typically you never concern yourself with alignment in Java because the JVM 
takes care of it for you, but here we are delving into the world of Unsafe, 
which bypasses the protections the JVM provides. Admittedly, it took me a long 
while to even figure out that my problem was related to alignment because as 
indicated, I haven't dealt with such issues in over a decade.

With all that said, maybe a better approach here is to create a patch that 
users can use to create a spark build when they want to run Spark on a system 
that requires double alignment, which to the best of my knowledge are currently 
just ARM 32-bit CPUs. That would even let to be more concise, without needing 
to determine at runtime which method to use. And if ever should arise a 
server-class CPU with alignment requirements, we know what to do.

Given that


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94054024
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java ---
@@ -74,4 +76,71 @@ public void memoryDebugFillEnabledInTest() {
   Platform.getByte(offheap.getBaseObject(), offheap.getBaseOffset()),
   MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
   }
+
+  /**
+   * This test ensures that double access is handled correctly for the 
host platform.
+   * On platforms that require double access to be aligned to 8-byte 
boundaries, this
+   * test should pass and not cause the JVM to segfault.
+   */
+  @Test
+  public void unalignedDoublePutAndGet() {
+MemoryBlock testBuffer = MemoryBlock.fromLongArray(new long[20]);
+
+// write a double to an unaligned location
+long unalignedOffset = testBuffer.getBaseOffset() + 3;
+// double check unalignment just to be sure:
+if (unalignedOffset%8 == 0) {
+  ++unalignedOffset;
+}
+
+Platform.putDouble(testBuffer.getBaseObject(), unalignedOffset, 
3.14159);
+
+double foundDouble = Platform.getDouble(testBuffer.getBaseObject(), 
unalignedOffset);
+
+Assert.assertEquals(3.14159, foundDouble, 0.01);
+  }
+
+  /**
+   * The goal of this test is to ensure that doubles written to a memory 
location
+   * using the long-buffered functor can be read back using the direct 
functor,
+   * and vice-versa. This is to ensure binary output such as parquet files 
written
+   * from one platform will be readable on another.
+   */
+  @Test
+  public void mixedDoubleFunctors() {
+sun.misc.Unsafe unsafe;
+try {
+  Field unsafeField = Unsafe.class.getDeclaredField("theUnsafe");
+  unsafeField.setAccessible(true);
+  unsafe = (sun.misc.Unsafe) unsafeField.get(null);
+
+  MemoryBlock testBuffer = MemoryBlock.fromLongArray(new long[20]);
+
+  // ensure offset is aligned so test can run on any host platform.
+  long alignedOffset = testBuffer.getBaseOffset();
+  if (alignedOffset%8 != 0) {
+alignedOffset += 8 - alignedOffset%8;
+  }
+
+  DoubleAccessFunctor buffered = new DoubleAccessBuffered();
+  DoubleAccessFunctor direct = new DoubleAccessDirect();
+
+  final double direct2bufferedValue = 3.14159;
+  final double buffered2directValue = 2.71828;
+
+  buffered.putDouble(unsafe, testBuffer.getBaseObject(), 
alignedOffset, buffered2directValue);
+  Assert.assertEquals(
+  buffered2directValue,
+  direct.getDouble(unsafe, testBuffer.getBaseObject(), 
alignedOffset),
+  0.01);
+
+  direct.putDouble(unsafe, testBuffer.getBaseObject(), alignedOffset, 
direct2bufferedValue);
+  Assert.assertEquals(
+  direct2bufferedValue,
+  buffered.getDouble(unsafe, testBuffer.getBaseObject(), 
alignedOffset),
+  0.01);
+} catch (Throwable cause) {
+  // can't run test because Unsafe is unavailable.
--- End diff --

refactored test to isolate the `Unsafe` object creation better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94053751
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/DoubleAccessFunctor.java ---
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe;
+
+import sun.misc.Unsafe;
+
+interface DoubleAccessFunctor {
--- End diff --

No reason other than there was no commonality between the concrete classes 
other than the interface. I will switch to an abstract class given your 
performance input. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94049781
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java ---
@@ -74,4 +76,71 @@ public void memoryDebugFillEnabledInTest() {
   Platform.getByte(offheap.getBaseObject(), offheap.getBaseOffset()),
   MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
   }
+
+  /**
+   * This test ensures that double access is handled correctly for the 
host platform.
+   * On platforms that require double access to be aligned to 8-byte 
boundaries, this
+   * test should pass and not cause the JVM to segfault.
+   */
+  @Test
+  public void unalignedDoublePutAndGet() {
+MemoryBlock testBuffer = MemoryBlock.fromLongArray(new long[20]);
+
+// write a double to an unaligned location
+long unalignedOffset = testBuffer.getBaseOffset() + 3;
+// double check unalignment just to be sure:
+if (unalignedOffset%8 == 0) {
+  ++unalignedOffset;
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94049706
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java ---
@@ -74,4 +76,71 @@ public void memoryDebugFillEnabledInTest() {
   Platform.getByte(offheap.getBaseObject(), offheap.getBaseOffset()),
   MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
   }
+
+  /**
+   * This test ensures that double access is handled correctly for the 
host platform.
+   * On platforms that require double access to be aligned to 8-byte 
boundaries, this
+   * test should pass and not cause the JVM to segfault.
+   */
+  @Test
+  public void unalignedDoublePutAndGet() {
+MemoryBlock testBuffer = MemoryBlock.fromLongArray(new long[20]);
+
+// write a double to an unaligned location
+long unalignedOffset = testBuffer.getBaseOffset() + 3;
+// double check unalignment just to be sure:
+if (unalignedOffset%8 == 0) {
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94049578
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -252,6 +272,7 @@ public static void throwException(Throwable t) {
   LONG_ARRAY_OFFSET = 0;
   FLOAT_ARRAY_OFFSET = 0;
   DOUBLE_ARRAY_OFFSET = 0;
+  _doubleAccessFunc = null;
--- End diff --

I am setting it to `null` to keep in line with the behavior of the 
`_UNSAFE` class variable. In case, `_UNSAFE` is null. In that case, any calls 
to the `Platform.get...` or `Platform.put...` will result in a 
`NullPointerException`. I'm assuming it is a programming error elsewhere to 
call these methods without checking ( 
[example](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/MemoryManager.scala#L200)
 ).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94047852
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -244,6 +251,19 @@ public static void throwException(Throwable t) {
   LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
   FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
   DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+  // determine whether double access should be aligned.
+  String arch = System.getProperty("os.arch", "");
+  if (arch.matches("^(arm|arm32)")) {
+logger.info(
+"Host platform '{}' requires aligned double access. "+
+"Using aligned double access method.",
+arch);
+_doubleAccessFunc = new DoubleAccessBuffered();
+  } else {
+// set the buffer address to null to indicate no buffer allocated
+_doubleAccessFunc = new DoubleAccessDirect();;
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94047799
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -244,6 +251,19 @@ public static void throwException(Throwable t) {
   LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
   FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
   DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+  // determine whether double access should be aligned.
+  String arch = System.getProperty("os.arch", "");
+  if (arch.matches("^(arm|arm32)")) {
+logger.info(
+"Host platform '{}' requires aligned double access. "+
+"Using aligned double access method.",
+arch);
+_doubleAccessFunc = new DoubleAccessBuffered();
+  } else {
+// set the buffer address to null to indicate no buffer allocated
--- End diff --

removed (vestige from prior approach)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94047695
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -44,6 +48,9 @@
   public static final int DOUBLE_ARRAY_OFFSET;
 
   private static final boolean unaligned;
+
+  private static final DoubleAccessFunctor _doubleAccessFunc;
--- End diff --

changed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94047557
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/DoubleAccessDirect.java ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe;
+
+import sun.misc.Unsafe;
+
+class DoubleAccessDirect implements DoubleAccessFunctor {
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r94047419
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/DoubleAccessBuffered.java 
---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe;
+
+import sun.misc.Unsafe;
+
+public class DoubleAccessBuffered implements DoubleAccessFunctor {
--- End diff --

No reason but forgot to uncheck the Eclipse modifiers when creating a new 
class. Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16403: [SPARK-18819][CORE] Double byte alignment on ARM platfor...

2016-12-28 Thread michaelkamprath
Github user michaelkamprath commented on the issue:

https://github.com/apache/spark/pull/16403
  
I've refactored this issue's solution using the idea that @kiszk supplied 
above. This ended up being a better idea, thanks. It produced cleaner code than 
my original allocated buffer approach. Also, based on the input that this 
solution should have zero impact on the x86 platform, I've refactored to a 
functor design pattern which creates the equivalent performance behavior for 
x86 as before this fix. 

The performance impact of the using an intermediate long as a buffer when 
fetching a double from a memory block looks to be about 10%. I ran some 
performance tests writing and reading random doubles to random  aligned 
locations in a buffer 100M times over. On my mac laptop and an ARM unit, the 
performance difference was the long-buffered approach was about 10% slower than 
the original direct fetch approach. However, since the this refactor isolates 
the fetching behavior into a functor, the 10% slowdown only affects the ARM 
platform.

I also added some unit tests if only to convince myself I hadn't broken 
anything. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-27 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r93983319
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -244,6 +257,34 @@ public static void throwException(Throwable t) {
   LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
   FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
   DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+  // determine whether double access should be aligned.
+  String arch = System.getProperty("os.arch", "");
+  if (arch.matches("^(arm|arm32)")) {
+logger.info(
+"Host platform '{}' requires aligned double access. "+
+"Creating an aligned buffer for unsafe double reads.",
+arch);
+
+// allocate a 2x memory block to ensure buffer used is 8-byte 
aligned. Java
+// objects are always aligned, so we just need to ensure the 
offset is aligned
+// to an 8-byte boundary
+byte[] heapObj = new byte[16];
+long offset = BYTE_ARRAY_OFFSET;
+long bufferSize = 16;
+for (long i = 0; i < 8; ++i ) {
+  if ((offset+i)%8 == 0) {
+logger.debug("Found aligned buffer offset at {} + {}", offset, 
i);
+offset += i;
+bufferSize -= i;
+break;
+  }
--- End diff --

No, we can't, because we would still need the buffer to avoid doing the 
direct double read from  an unaligned memory location on [line 
131](https://github.com/michaelkamprath/spark/blob/30c6c997ca8e864a31d2f4dfa55d47b5aa629596/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L131).
 

The reason for finding an aligned offset here is because on ARM7, the 
actual memory address for the `byte[0]` location is not the memory address of 
the `byte[]` object. That starting point for the 0-index item is indicated by 
`Platform.BYTE_ARRAY_OFFSET`, which is the Java overhead memory used for 
managing the `byte[]` object. So if we read from `byte[0]`, that could be 
unaligned because of the Java overhead, hence the reason to find the index in 
the byte buffer where overhead+index is aligned. I am depending on the Java 
behavior of aligning on objects here, the `byte[]` object) for needing to only 
check that the offset is also aligned.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-27 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r93982964
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -119,7 +127,12 @@ public static void putFloat(Object object, long 
offset, float value) {
   }
 
   public static double getDouble(Object object, long offset) {
-return _UNSAFE.getDouble(object, offset);
+if ( null == _doubleBuffer) {
+  return _UNSAFE.getDouble(object, offset);
+} else {
+  copyMemory(object, offset, _doubleBuffer.getBaseObject(), 
_doubleBuffer.getBaseOffset(), 8);
+  return _UNSAFE.getDouble(_doubleBuffer.getBaseObject(), 
_doubleBuffer.getBaseOffset());
--- End diff --

I will test this idea and report back. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-27 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r93982937
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -244,6 +257,34 @@ public static void throwException(Throwable t) {
   LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
   FLOAT_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(float[].class);
   DOUBLE_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(double[].class);
+
+  // determine whether double access should be aligned.
+  String arch = System.getProperty("os.arch", "");
+  if (arch.matches("^(arm|arm32)")) {
+logger.info(
+"Host platform '{}' requires aligned double access. "+
+"Creating an aligned buffer for unsafe double reads.",
+arch);
+
+// allocate a 2x memory block to ensure buffer used is 8-byte 
aligned. Java
+// objects are always aligned, so we just need to ensure the 
offset is aligned
+// to an 8-byte boundary
+byte[] heapObj = new byte[16];
+long offset = BYTE_ARRAY_OFFSET;
+long bufferSize = 16;
+for (long i = 0; i < 8; ++i ) {
--- End diff --

Simply to match the `offset` variable, which needs to be a long.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-27 Thread michaelkamprath
Github user michaelkamprath commented on a diff in the pull request:

https://github.com/apache/spark/pull/16403#discussion_r93982849
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
---
@@ -44,6 +49,9 @@
   public static final int DOUBLE_ARRAY_OFFSET;
 
   private static final boolean unaligned;
+
+  private static final MemoryBlock _doubleBuffer;
--- End diff --

Admittedly, I am not deeply familiar with how this class is being using 
concurrently within a single JVM. If multiple threads are using it 
concurrently, you are right and a different approach must be found. I was 
trying to avoid the cost of reallocating a buffer with ever call to 
`getDouble()`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16403: [SPARK-18819][CORE] Double byte alignment on ARM ...

2016-12-26 Thread michaelkamprath
GitHub user michaelkamprath opened a pull request:

https://github.com/apache/spark/pull/16403

[SPARK-18819][CORE] Double byte alignment on ARM platforms

## What changes were proposed in this pull request?

On 32-bit ARM platforms, doubles require 8-byte alignment. The use of 
Java's `sun.misc.Unsafe` library for memory block management exposes Spark to 
misaligned doubles when using `org.apache.spark.unsafe.Platform.getDouble()` to 
extract a double from a memory block with arbitrary offset locations. For the 
alignment sensitive platforms, this proposed change creates an properly aligned 
buffer that the bytes to be cast to a double are first copied to, then the 
double the double is extracted via the `sun.misc.Unsafe.getDouble()` method. On 
all other platforms the previous behavior is maintained, which was to cast the 
double value from the memory block regardless of offset alignment . 

## How was this patch tested?

All unit tests pass, plus tested on both x86 and ARM71 platforms. For the 
ARM71 platform, it is verified that this fix removes the problem as described 
in SPARK-18819.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michaelkamprath/spark 
fix/SPARK-18819_double-alignment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16403


commit 30c6c997ca8e864a31d2f4dfa55d47b5aa629596
Author: Michael Kamprath <mich...@claireware.com>
Date:   2016-12-25T21:54:26Z

[SPARK-18819] Double alignment on ARM




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org