[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-06 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22338
  
Since the change looks safer to me and it does fix the regression, I'm 
merging it to unblock 2.4 release. Please continue to investigate the root 
cause, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95721/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22338
  
**[Test build #95721 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95721/testReport)**
 for PR 22338 at commit 
[`6cb94ed`](https://github.com/apache/spark/commit/6cb94ed5ed76e23e4fb775a2fd6f0e66e9c15abd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22338
  
Yeah, it's interesting... Seems both `checkedCast` and `Platform.getByte` 
are changed and performance gets gain. Any single change doesn't work.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22338
  
> While I say these performance differences, I do not understand why these 
difference occurs completely. That is why I said "I have not found the root 
cause".

Yes, I am in the same situation. I see the difference but I cannot explain 
it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22338
  
While I say these performance differences, I do not understand why these 
difference occurs completely. That is why I said "I have not found the root 
cause".
Let us narrow down the problem and find the root cause.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22338
  
**[Test build #95721 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95721/testReport)**
 for PR 22338 at commit 
[`6cb94ed`](https://github.com/apache/spark/commit/6cb94ed5ed76e23e4fb775a2fd6f0e66e9c15abd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2877/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22338
  
you're right @kiszk. Let me update the PR accordingly in order to narrow 
down the change/problem, thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22338
  
>This does fix the regression, will we also lose the perf speed up too? 
IIUC we did observe significant perf boost when introducing the memory block.

@cloud-fan I checked the original PR and I couldn't find any benchmark 
related to Murmur3 hash. The only benchmark I found there was using the 
`HiveHasher`. So I cannot really answer on this. Do you have a benchmark to 
test it? Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22338
  
In addition to [your 
commit](https://github.com/apache/spark/pull/22338/commits/91adce590461dda885d88319a700a775e63f9ce6),
 I applied the following change, basically use `MemoryBlock` in 
`hashBytesByIntBlock()`. I got 33us.

```
  private static int hashBytesByIntBlock(MemoryBlock base, int 
lengthInBytes, int seed) {
assert (lengthInBytes % 4 == 0);
int h1 = seed;
for (int i = 0; i < lengthInBytes; i += 4) {
  int halfWord = base.getInt(i);
  int k1 = mixK1(halfWord);
  h1 = mixH1(h1, k1);
}
return h1;
  }
```

But, furthermore, when I also used `MemoryBlock` in 
`hashUnsafeBytesBlock()`, I got 111us.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22338
  
@kiszk yes, I know. Moving `checkedCast` makes a big difference, but if you 
just move it, without the other changes, there is no perf gain (at least this 
is what I found in my experiments).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95704/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22338
  
@mgaido91 thanks, interestingly I did experiments with similar code in my 
box.
While I am using the linux box, I can confirm the performance improvement 
(or performance recover).

Regarding bytecode size, what I have seen in my experiments is the 
following. I have not found the root cause of this behavior.

Your original code: 38us 
```
  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
return hashUnsafeBytesBlock(base, Ints.checkedCast(base.size()), seed); 

   
  }

  private static int hashUnsafeBytesBlock(MemoryBlock base, int 
lengthInBytes, int seed) {
// This is not compatible with original and another implementations.

  
// But remain it for backward compatibility for the components existing 
before 2.3. 
  
assert (lengthInBytes >= 0): "lengthInBytes cannot be negative";
int lengthAligned = lengthInBytes - lengthInBytes % 4;

```

Your original code with moving `checkedCast`: 112us 
```
  public static int hashUnsafeBytesBlock(MemoryBlock base, int seed) {
// return hashUnsafeBytesBlock(base, Ints.checkedCast(base.size()), 
seed);  
  
return hashUnsafeBytesBlock(base, -1, seed);
  }

  private static int hashUnsafeBytesBlock(MemoryBlock base, int 
lengthInBytes, int seed) {
// This is not compatible with original and another implementations.

  
// But remain it for backward compatibility for the components existing 
before 2.3. 
  
lengthInBytes = Ints.checkedCast(base.size());
assert (lengthInBytes >= 0): "lengthInBytes cannot be negative";
int lengthAligned = lengthInBytes - lengthInBytes % 4;
...
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22338
  
**[Test build #95704 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95704/testReport)**
 for PR 22338 at commit 
[`91adce5`](https://github.com/apache/spark/commit/91adce590461dda885d88319a700a775e63f9ce6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22338
  
This basically reverts the memory block in the hash computing, now the 
memory block is just a holder of the base object and base offset. This does fix 
the regression, will we also lose the perf speed up too? IIUC we did observe 
significant perf boost when introducing the memory block.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22338
  
@cloud-fan I think I tried doing something like what you suggested but it 
didn't help. Moreover, the current code in `MemoryBlock` already leverages 
`Platform.get*` for most of its methods, so that wouldn't really change much I 
think.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22338
  
Thanks for working on it! Will it be helpful if we move these hash methods 
to `MemoryBlock`? e.g. the code can be `int halfWord =bytes[offset + i];`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22338
  
cc @cloud-fan  @kiszk 

I checked the bytecode generated and the size of the generated code seems 
not to be the issue either. FYI I am attaching here the disassembled code 
before and after:

[afterPatch.txt](https://github.com/apache/spark/files/2352699/afterPatch.txt)

[beforePatch.txt](https://github.com/apache/spark/files/2352700/beforePatch.txt)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22338
  
**[Test build #95704 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95704/testReport)**
 for PR 22338 at commit 
[`91adce5`](https://github.com/apache/spark/commit/91adce590461dda885d88319a700a775e63f9ce6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2865/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22338: [SPARK-25317][CORE] Avoid perf regression in Murmur3 Has...

2018-09-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22338
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org