[ 
https://issues.apache.org/jira/browse/FLINK-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591734#comment-15591734
 ] 

ASF GitHub Bot commented on FLINK-3722:
---------------------------------------

Github user ggevay commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2628#discussion_r84248344
  
    --- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/QuickSort.java
 ---
    @@ -49,20 +49,40 @@ protected static int getMaxDepth(int x) {
         * then switch to {@link HeapSort}.
         */
        public void sort(final IndexedSortable s, int p, int r) {
    -           sortInternal(s, p, r, getMaxDepth(r - p));
    +           int recordsPerSegment = s.recordsPerSegment();
    +           int recordSize = s.recordSize();
    +
    +           int maxOffset = recordSize * (recordsPerSegment - 1);
    +
    +           int size = s.size();
    +           int sizeN = size / recordsPerSegment;
    +           int sizeO = (size % recordsPerSegment) * recordSize;
    +
    +           sortInternal(s, recordsPerSegment, recordSize, maxOffset, 0, 0, 
0, size, sizeN, sizeO, getMaxDepth(r - p));
        }
     
        public void sort(IndexedSortable s) {
                sort(s, 0, s.size());
        }
     
    -   private static void sortInternal(final IndexedSortable s, int p, int r, 
int depth) {
    +   private static void sortInternal(final IndexedSortable s, int 
recordsPerSegment, int recordSize, int maxOffset,
    +                   int p, int pN, int pO, int r, int rN, int rO, int 
depth) {
    --- End diff --
    
    Could you please add a comment that explains all these parameters? (I 
understand them only because I know the original code and also what you are 
trying to achieve, but for someone who sees the code for the first time this 
will be quite scary.)


> The divisions in the InMemorySorters' swap/compare methods hurt performance
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-3722
>                 URL: https://issues.apache.org/jira/browse/FLINK-3722
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Gabor Gevay
>            Assignee: Greg Hogan
>            Priority: Minor
>              Labels: performance
>
> NormalizedKeySorter's and FixedLengthRecordSorter's swap and compare methods 
> use divisions (which take a lot of time \[1\]) to calculate the index of the 
> MemorySegment and the offset inside the segment. [~greghogan] reported on the 
> mailing list \[2\] measuring a ~12-14% performance effect in one case.
> A possibility to improve the situation is the following:
> The way that QuickSort mostly uses these compare and swap methods is that it 
> maintains two indices, and uses them to call compare and swap. The key 
> observation is that these indices are mostly stepped by one, and 
> _incrementally_ calculating the quotient and modulo is actually easy when the 
> index changes only by one: increment/decrement the modulo, and check whether 
> the modulo has reached 0 or the divisor, and if it did, then wrap-around the 
> modulo and increment/decrement the quotient.
> To implement this, InMemorySorter would have to expose an iterator that would 
> have the divisor and the current modulo and quotient as state, and have 
> increment/decrement methods. Compare and swap could then have overloads that 
> take these iterators as arguments.
> \[1\] http://www.agner.org/optimize/instruction_tables.pdf
> \[2\] 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Macro-benchmarking-for-performance-tuning-and-regression-detection-td11078.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to