[ 
https://issues.apache.org/jira/browse/FLINK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592292#comment-15592292
 ] 

Stefan Richter edited comment on FLINK-4867 at 10/20/16 4:32 PM:
-----------------------------------------------------------------

If sort performance is crucial to you, I wrote some inplace radix sort 
algorithm that was extremely fast for me in a precious project. On primitives 
types and serialized byte strings I found it typically factor 2-3x faster than 
JDK's Arrays.sort() for primitives or multikey quicksort for Strings). I was 
considering to port it onto Flink, but did not find the time yet. As it is 
radix based and not comparison based, it would require some way to expose 
partial sort keys instead of a compareTo method . If that is interesting to you 
let me know and I can share the original code.


was (Author: srichter):
If you sort performance is crucial to you, I wrote some inplace radix sort 
algorithm that was extremely fast for me in a precious project. On primitives 
types and serialized byte strings I found it typically factor 2-3x faster than 
JDK's Arrays.sort() for primitives or multikey quicksort for Strings). I was 
considering to port it onto Flink, but did not find the time yet. As it is 
radix based and not comparison based, it would require some way to expose 
partial sort keys instead of a compareTo method . If that is interesting to you 
let me know and I can share the original code.

> Investigate code generation for improving sort performance
> ----------------------------------------------------------
>
>                 Key: FLINK-4867
>                 URL: https://issues.apache.org/jira/browse/FLINK-4867
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Gabor Gevay
>            Priority: Minor
>              Labels: performance
>
> This issue is for investigating whether code generation could speed up 
> sorting. We should make some performance measurements on hand-written code 
> that is similar to what we could generate, to see whether investing more time 
> into this is worth it. If we find that it is worth it, we can open a second 
> Jira for the actual implementation of the code generation.
> I think we could generate one class at places where we currently instantiate 
> {{QuickSort}}. This generated class would include the functionality of 
> {{QuickSort}}, {{NormalizedKeySorter}} or {{FixedLengthRecordSorter}}, 
> {{MemorySegment.compare}}, and {{MemorySegment.swap}}.
> Btw. I'm planning to give this as a student project at a TU Berlin course in 
> the next few months.
> Some concrete ideas about how could a generated sorter be faster than the 
> current sorting code:
> * {{MemorySegment.compare}} could be specialized for
> ** Length: for small records, the loop could be unrolled
> ** Endiannes (currently it is optimized for big endian; and in the little 
> endian case (e.g. x86) it does a Long.reverseBytes for each long read)
> * {{MemorySegment.swapBytes}}
> ** In case of small records, using three {{UNSAFE.copyMemory}} is probably 
> not as efficient as a specialized swap, because
> *** We could use total loop unrolling in generated code (because we know the 
> exact record size)
> *** {{UNSAFE.copyMemory}} checks for alignment first \[6,9\]
> *** We will only need 2/3 the memory bandwidth, because the temporary storage 
> could be a register if we swap one byte (or one {{long}}) at a time
> ** several checks might be eliminated
> * Better inlining behaviour could be achieved 
> ** Virtual function calls to the methods of {{InMemorySorter}} could be 
> eliminated. (Note, that these are problematic to devirtualize by the JVM if 
> there are different derived classes used in a single Flink job (see \[8,7\]).)
> ** {{MemorySegment.swapBytes}} is probably not inlined currently, because the 
> excessive checks make it too large
> ** {{MemorySegment.compare}} is probably also not inlined currently, because 
> those two while loops are too large
> \[6\] http://www.docjar.com/docs/api/sun/misc/Unsafe.html#copyMemory(Object, 
> long, Object, long, long)
> \[7\] https://shipilev.net/blog/2015/black-magic-method-dispatch/
> \[8\] 
> http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/
> \[9\] 
> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/cpu/x86/vm/stubGenerator_x86_64.cpp#l2409



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to