[jira] [Updated] (FLINK-3722) The divisions in the InMemorySorters' swap/compare methods hurt performance
[ https://issues.apache.org/jira/browse/FLINK-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-3722: -- Fix Version/s: 1.3.0 > The divisions in the InMemorySorters' swap/compare methods hurt performance > --- > > Key: FLINK-3722 > URL: https://issues.apache.org/jira/browse/FLINK-3722 > Project: Flink > Issue Type: Sub-task > Components: Local Runtime >Reporter: Gabor Gevay >Assignee: Greg Hogan >Priority: Minor > Labels: performance > Fix For: 1.3.0 > > > NormalizedKeySorter's and FixedLengthRecordSorter's swap and compare methods > use divisions (which take a lot of time \[1\]) to calculate the index of the > MemorySegment and the offset inside the segment. [~greghogan] reported on the > mailing list \[2\] measuring a ~12-14% performance effect in one case. > A possibility to improve the situation is the following: > The way that QuickSort mostly uses these compare and swap methods is that it > maintains two indices, and uses them to call compare and swap. The key > observation is that these indices are mostly stepped by one, and > _incrementally_ calculating the quotient and modulo is actually easy when the > index changes only by one: increment/decrement the modulo, and check whether > the modulo has reached 0 or the divisor, and if it did, then wrap-around the > modulo and increment/decrement the quotient. > To implement this, InMemorySorter would have to expose an iterator that would > have the divisor and the current modulo and quotient as state, and have > increment/decrement methods. Compare and swap could then have overloads that > take these iterators as arguments. > \[1\] http://www.agner.org/optimize/instruction_tables.pdf > \[2\] > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Macro-benchmarking-for-performance-tuning-and-regression-detection-td11078.html -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (FLINK-3722) The divisions in the InMemorySorters' swap/compare methods hurt performance
[ https://issues.apache.org/jira/browse/FLINK-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-3722: -- Issue Type: Sub-task (was: Improvement) Parent: FLINK-4860 > The divisions in the InMemorySorters' swap/compare methods hurt performance > --- > > Key: FLINK-3722 > URL: https://issues.apache.org/jira/browse/FLINK-3722 > Project: Flink > Issue Type: Sub-task > Components: Local Runtime >Reporter: Gabor Gevay >Assignee: Greg Hogan >Priority: Minor > Labels: performance > > NormalizedKeySorter's and FixedLengthRecordSorter's swap and compare methods > use divisions (which take a lot of time \[1\]) to calculate the index of the > MemorySegment and the offset inside the segment. [~greghogan] reported on the > mailing list \[2\] measuring a ~12-14% performance effect in one case. > A possibility to improve the situation is the following: > The way that QuickSort mostly uses these compare and swap methods is that it > maintains two indices, and uses them to call compare and swap. The key > observation is that these indices are mostly stepped by one, and > _incrementally_ calculating the quotient and modulo is actually easy when the > index changes only by one: increment/decrement the modulo, and check whether > the modulo has reached 0 or the divisor, and if it did, then wrap-around the > modulo and increment/decrement the quotient. > To implement this, InMemorySorter would have to expose an iterator that would > have the divisor and the current modulo and quotient as state, and have > increment/decrement methods. Compare and swap could then have overloads that > take these iterators as arguments. > \[1\] http://www.agner.org/optimize/instruction_tables.pdf > \[2\] > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Macro-benchmarking-for-performance-tuning-and-regression-detection-td11078.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3722) The divisions in the InMemorySorters' swap/compare methods hurt performance
[ https://issues.apache.org/jira/browse/FLINK-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-3722: --- Assignee: (was: Gabor Gevay) > The divisions in the InMemorySorters' swap/compare methods hurt performance > --- > > Key: FLINK-3722 > URL: https://issues.apache.org/jira/browse/FLINK-3722 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Reporter: Gabor Gevay >Priority: Minor > Labels: performance > > NormalizedKeySorter's and FixedLengthRecordSorter's swap and compare methods > use divisions (which take a lot of time \[1\]) to calculate the index of the > MemorySegment and the offset inside the segment. [~greghogan] reported on the > mailing list \[2\] measuring a ~12-14% performance effect in one case. > A possibility to improve the situation is the following: > The way that QuickSort mostly uses these compare and swap methods is that it > maintains two indices, and uses them to call compare and swap. The key > observation is that these indices are mostly stepped by one, and > _incrementally_ calculating the quotient and modulo is actually easy when the > index changes only by one: increment/decrement the modulo, and check whether > the modulo has reached 0 or the divisor, and if it did, then wrap-around the > modulo and increment/decrement the quotient. > To implement this, InMemorySorter would have to expose an iterator that would > have the divisor and the current modulo and quotient as state, and have > increment/decrement methods. Compare and swap could then have overloads that > take these iterators as arguments. > \[1\] http://www.agner.org/optimize/instruction_tables.pdf > \[2\] > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Macro-benchmarking-for-performance-tuning-and-regression-detection-td11078.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3722) The divisions in the InMemorySorters' swap/compare methods hurt performance
[ https://issues.apache.org/jira/browse/FLINK-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay updated FLINK-3722: --- Component/s: (was: Distributed Coordination) Local Runtime > The divisions in the InMemorySorters' swap/compare methods hurt performance > --- > > Key: FLINK-3722 > URL: https://issues.apache.org/jira/browse/FLINK-3722 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Reporter: Gabor Gevay >Priority: Minor > Labels: performance > > NormalizedKeySorter's and FixedLengthRecordSorter's swap and compare methods > use divisions (which take a lot of time \[1\]) to calculate the index of the > MemorySegment and the offset inside the segment. [~greghogan] reported on the > mailing list \[2\] measuring a ~12-14% performance effect in one case. > A possibility to improve the situation is the following: > The way that QuickSort mostly uses these compare and swap methods is that it > maintains two indices, and uses them to call compare and swap. The key > observation is that these indices are mostly stepped by one, and > _incrementally_ calculating the quotient and modulo is actually easy when the > index changes only by one: increment/decrement the modulo, and check whether > the modulo has reached 0 or the divisor, and if it did, then wrap-around the > modulo and increment/decrement the quotient. > To implement this, InMemorySorter would have to expose an iterator that would > have the divisor and the current modulo and quotient as state, and have > increment/decrement methods. Compare and swap could then have overloads that > take these iterators as arguments. > \[1\] http://www.agner.org/optimize/instruction_tables.pdf > \[2\] > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Macro-benchmarking-for-performance-tuning-and-regression-detection-td11078.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)