GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/23171
[SPARK-26205][SQL] Optimize In for bytes, shorts, ints ## What changes were proposed in this pull request? This PR optimizes `In` expressions for byte, short, integer types. It is a follow-up on PR #21442 from @dbtsai. Currently, `In` expressions are compiled into a sequence of if-else statement, which results in O(n) time complexity. `InSet` is an optimized version of `In`, which is supposed to improve the performance if the number of elements is big enough. However, `InSet` actually degrades the performance in many cases due to various reasons (`InSet` will be handled in [SPARK-26203](https://issues.apache.org/jira/browse/SPARK-26203) and [SPARK-26204](https://issues.apache.org/jira/browse/SPARK-26204)). The main idea of this PR is to make use of `tableswitch` and `lookupswitch` bytecode instructions to speed up `In` expressions. In short, we can improve our time complexity from O(n) to O(1) or at least O(log n) by using Java `switch` statements. We will have O(1) time complexity if our case values are compact and `tableswitch` can be used. Otherwise, `lookupswitch` will give us O(log n). See [here](https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-3.html#jvms-3.10) and [here](https://stackoverflow.com/questions/10287700/difference-between-jvms-lookupswitch-and-tableswitch) for more information. An important benefit of the proposed approach is that we do not have to pay an extra cost for autoboxing as in case of `InSet`. As a consequence, we can substantially outperform `InSet` even on 250+ elements. This PR does not cover long values as Java `switch` statements cannot be used on them. However, we can have a follow-up PR with an approach similar to binary search. ## How was this patch tested? ### Correctness There is a new test that verifies the logic of the proposed optimization. ### Performance Benchmarks The performance impact was measured by the benchmarks below (an appropriate part of them will be included in [SPARK-26203](https://issues.apache.org/jira/browse/SPARK-26203)). ``` def compactBytesBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Benchmark = { val name = s"$numItems compact bytes" val values = (1 to numItems).map(v => s"CAST($v AS tinyint)") val df = spark.range(1, numRows).select($"id".cast(ByteType)) benchmark(name, df, values, numRows, minNumIters) } def nonCompactBytesBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Benchmark = { val step = (Byte.MaxValue.toInt - Byte.MinValue.toInt) / numItems val maxValue = Byte.MinValue + numItems * step val rangeSize = maxValue - Byte.MinValue require(isLookupSwitch(rangeSize, numItems)) val name = s"$numItems non-compact bytes" val values = (Byte.MinValue until maxValue by step).map(v => s"CAST($v AS tinyint)") val df = spark.range(1, numRows).select($"id".cast(ByteType)) benchmark(name, df, values, numRows, minNumIters) } def compactShortsBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Benchmark = { val name = s"$numItems compact shorts" val values = (1 to numItems).map(v => s"CAST($v AS smallint)") val df = spark.range(1, numRows).select($"id".cast(ShortType)) benchmark(name, df, values, numRows, minNumIters) } def nonCompactShortsBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Benchmark = { val step = (Short.MaxValue.toInt - Short.MinValue.toInt) / numItems val maxValue = Short.MinValue + numItems * step val rangeSize = maxValue - Short.MinValue require(isLookupSwitch(rangeSize, numItems)) val name = s"$numItems non-compact shorts" val values = (Short.MinValue until maxValue by step).map(v => s"CAST($v AS smallint)") val df = spark.range(1, numRows).select($"id".cast(ShortType)) benchmark(name, df, values, numRows, minNumIters) } def compactIntsBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Benchmark = { val name = s"$numItems compact ints" val values = (1 to numItems).map(v => 10000000 + v) val df = spark.range(1, numRows).select($"id".cast(IntegerType)) benchmark(name, df, values, numRows, minNumIters) } def nonCompactIntsBenchmark(numItems: Int, numRows: Long, minNumIters: Int): Benchmark = { val step = (Int.MaxValue.toLong - Int.MinValue.toLong) / numItems val maxValue = Int.MinValue + numItems * step val rangeSize = maxValue - Int.MinValue require(isLookupSwitch(rangeSize, numItems)) val name = s"$numItems non-compact ints" val values = Int.MinValue until maxValue.toInt by step.toInt val df = spark.range(1, numRows).select($"id".cast(IntegerType)) benchmark(name, df, values, numRows, minNumIters) } def benchmark( name: String, df: DataFrame, values: Seq[Any], numRows: Long, minNumIters: Int): Benchmark = { val benchmark = new Benchmark(name, numRows, minNumIters, output = output) df.createOrReplaceTempView("t") val testClosure = () => { val df = spark.sql(s"SELECT * FROM t WHERE id IN (${values mkString ","})") df.queryExecution.toRdd.foreach(_ => Unit) } benchmark.addCase("In expression") { _ => withSQLConf(SQLConf.OPTIMIZER_INSET_CONVERSION_THRESHOLD.key -> values.size.toString) { testClosure() } } benchmark.addCase("InSet expression") { _ => withSQLConf(SQLConf.OPTIMIZER_INSET_CONVERSION_THRESHOLD.key -> "1") { testClosure() } } benchmark } // this logic is derived from com.sun.tools.javac.jvm.Gen.java private def isLookupSwitch(rangeSize: Long, numLabels: Int): Boolean = { val tableSpaceCost = 4 + rangeSize val tableTimeCost = 3 val lookupSpaceCost = 3 + 2 * numLabels val lookupTimeCost = numLabels lookupSpaceCost + 3 * lookupTimeCost < tableSpaceCost + 3 * tableTimeCost } ``` The actual benchmarking code looks like: ``` val numRows = 100000000 val minNumIters = 10 compactBytesBenchmark(numItems = 10, numRows = numRows, minNumIters).run() nonCompactBytesBenchmark(numItems = 10, numRows = numRows, minNumIters).run() compactBytesBenchmark(numItems = 25, numRows = numRows, minNumIters).run() nonCompactBytesBenchmark(numItems = 25, numRows = numRows, minNumIters).run() compactShortsBenchmark(numItems = 10, numRows = numRows, minNumIters).run() nonCompactShortsBenchmark(numItems = 10, numRows = numRows, minNumIters).run() compactShortsBenchmark(numItems = 50, numRows = numRows, minNumIters).run() nonCompactShortsBenchmark(numItems = 50, numRows = numRows, minNumIters).run() compactShortsBenchmark(numItems = 100, numRows = numRows, minNumIters).run() nonCompactShortsBenchmark(numItems = 100, numRows = numRows, minNumIters).run() compactIntsBenchmark(numItems = 10, numRows = numRows, minNumIters).run() nonCompactIntsBenchmark(numItems = 10, numRows = numRows, minNumIters).run() compactIntsBenchmark(numItems = 100, numRows = numRows, minNumIters).run() nonCompactIntsBenchmark(numItems = 100, numRows = numRows, minNumIters).run() compactIntsBenchmark(numItems = 250, numRows = numRows, minNumIters).run() nonCompactIntsBenchmark(numItems = 250, numRows = numRows, minNumIters).run() ``` #### Byte **_Performance Summary_** Proposed Solution | # values | InSet | In with compact data | In with non-compact data | | ------------- |:-------------:| -----:| ---:| | 10 | 230 ms | 73 ms | 87 ms | | 25 | 290 ms | 110 ms | 117 ms | Current Solution | # values | InSet | In with compact data | In with non-compact data | | ------------- |:-------------:| -----:| ---:| | 10 | 220 ms | 91 ms | 91 ms | | 25 | 291 ms | 167 ms | 168 ms | **_Raw Results_** Proposed Solution ``` Running benchmark: 10 compact bytes Running case: In expression Stopped after 28 iterations, 2053 ms Running case: InSet expression Stopped after 15 iterations, 3277 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 compact bytes: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 68 / 73 1474.6 0.7 1.0X InSet expression 216 / 218 463.9 2.2 0.3X Running benchmark: 10 non-compact bytes Running case: In expression Stopped after 24 iterations, 2076 ms Running case: InSet expression Stopped after 15 iterations, 3340 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 non-compact bytes: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 84 / 87 1186.6 0.8 1.0X InSet expression 217 / 223 460.7 2.2 0.4X Running benchmark: 25 compact bytes Running case: In expression Stopped after 19 iterations, 2080 ms Running case: InSet expression Stopped after 15 iterations, 4285 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 25 compact bytes: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 105 / 110 955.0 1.0 1.0X InSet expression 273 / 286 365.8 2.7 0.4X Running benchmark: 25 non-compact bytes Running case: In expression Stopped after 18 iterations, 2104 ms Running case: InSet expression Stopped after 15 iterations, 4449 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 25 non-compact bytes: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 107 / 117 938.5 1.1 1.0X InSet expression 284 / 297 352.3 2.8 0.4X ``` Current Solution ``` Running benchmark: 10 compact bytes Running case: In expression Stopped after 22 iterations, 2002 ms Running case: InSet expression Stopped after 15 iterations, 3601 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 compact bytes: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 86 / 91 1168.9 0.9 1.0X InSet expression 226 / 240 442.5 2.3 0.4X Running benchmark: 10 non-compact bytes Running case: In expression Stopped after 23 iterations, 2085 ms Running case: InSet expression Stopped after 15 iterations, 3265 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 non-compact bytes: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 86 / 91 1169.1 0.9 1.0X InSet expression 213 / 218 469.5 2.1 0.4X Running benchmark: 25 compact bytes Running case: In expression Stopped after 15 iterations, 2499 ms Running case: InSet expression Stopped after 15 iterations, 4326 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 25 compact bytes: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 164 / 167 610.7 1.6 1.0X InSet expression 282 / 288 354.2 2.8 0.6X Running benchmark: 25 non-compact bytes Running case: In expression Stopped after 15 iterations, 2518 ms Running case: InSet expression Stopped after 15 iterations, 4404 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 25 non-compact bytes: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 164 / 168 611.5 1.6 1.0X InSet expression 282 / 294 354.4 2.8 0.6X ``` **_Conclusion_** As expected, we see that the proposed logic improves the performance on compact and non-compact sequences of byte values. While the performance on compact elements is better than on non-compact ones, we do not see that it is stable independently of the number of elements inside `In`. Hence, we cannot conclude O(1) time performance on compact byte values. Spark generates `(byte) 1`, `(byte) 100` to represent byte literals, so it can potentially be a reason for the non-linear time complexity. We also see that the performance of the old approach on bytes is the same independently of data compactness (as expected). #### Short **_Performance Summary_** Proposed Solution | # values | InSet | In with compact data | In with non-compact data | | ------------- |:-------------:| -----:| ---:| | 10 | 243 ms | 45 ms | 56 ms | | 50 | 294 ms | 45 ms | 71 ms | | 100 | 260 ms | 46 ms | 82 ms | Current Solution | # values | InSet | In with compact data | In with non-compact data | | ------------- |:-------------:| -----:| ---:| | 10 | 248 ms | 71 ms | 69 ms | | 50 | 295 ms | 169 ms | 173 ms | | 100 | 265 ms | 335 ms | 370 ms | **_Raw Results_** Proposed Solution ``` Running benchmark: 10 compact shorts Running case: In expression Stopped after 45 iterations, 2032 ms Running case: InSet expression Stopped after 15 iterations, 3633 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 41 / 45 2430.9 0.4 1.0X InSet expression 232 / 242 430.4 2.3 0.2X Running benchmark: 10 non-compact shorts Running case: In expression Stopped after 36 iterations, 2012 ms Running case: InSet expression Stopped after 15 iterations, 3645 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 non-compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 53 / 56 1880.2 0.5 1.0X InSet expression 233 / 243 428.4 2.3 0.2X Running benchmark: 50 compact shorts Running case: In expression Stopped after 45 iterations, 2007 ms Running case: InSet expression Stopped after 15 iterations, 4366 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 50 compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 42 / 45 2368.9 0.4 1.0X InSet expression 280 / 291 357.4 2.8 0.2X Running benchmark: 50 non-compact shorts Running case: In expression Stopped after 29 iterations, 2053 ms Running case: InSet expression Stopped after 15 iterations, 4447 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 50 non-compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 69 / 71 1451.0 0.7 1.0X InSet expression 292 / 297 342.3 2.9 0.2X Running benchmark: 100 compact shorts Running case: In expression Stopped after 44 iterations, 2032 ms Running case: InSet expression Stopped after 15 iterations, 3967 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 100 compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 44 / 46 2248.6 0.4 1.0X InSet expression 259 / 264 385.6 2.6 0.2X Running benchmark: 100 non-compact shorts Running case: In expression Stopped after 25 iterations, 2060 ms Running case: InSet expression Stopped after 15 iterations, 3832 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 100 non-compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 79 / 82 1267.6 0.8 1.0X InSet expression 250 / 256 399.4 2.5 0.3X ``` Current Solution ``` Running benchmark: 10 compact shorts Running case: In expression Stopped after 29 iterations, 2051 ms Running case: InSet expression Stopped after 15 iterations, 3788 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 64 / 71 1553.8 0.6 1.0X InSet expression 243 / 253 411.4 2.4 0.3X Running benchmark: 10 non-compact shorts Running case: In expression Stopped after 30 iterations, 2055 ms Running case: InSet expression Stopped after 15 iterations, 3658 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 non-compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 64 / 69 1572.3 0.6 1.0X InSet expression 238 / 244 420.0 2.4 0.3X Running benchmark: 50 compact shorts Running case: In expression Stopped after 15 iterations, 2532 ms Running case: InSet expression Stopped after 15 iterations, 4394 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 50 compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 163 / 169 614.5 1.6 1.0X InSet expression 288 / 293 346.8 2.9 0.6X Running benchmark: 50 non-compact shorts Running case: In expression Stopped after 15 iterations, 2588 ms Running case: InSet expression Stopped after 15 iterations, 4554 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 50 non-compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 168 / 173 594.0 1.7 1.0X InSet expression 290 / 304 345.0 2.9 0.6X Running benchmark: 100 compact shorts Running case: In expression Stopped after 15 iterations, 5021 ms Running case: InSet expression Stopped after 15 iterations, 4086 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 100 compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 330 / 335 303.3 3.3 1.0X InSet expression 259 / 272 386.8 2.6 1.3X Running benchmark: 100 non-compact shorts Running case: In expression Stopped after 15 iterations, 5542 ms Running case: InSet expression Stopped after 15 iterations, 3921 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 100 non-compact shorts: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 351 / 370 284.9 3.5 1.0X InSet expression 252 / 261 396.6 2.5 1.4X ``` **_Conclusion_** The proposed optimization gives us even more benefits on shorts. As opposed to bytes, we see that the performance on compact data stays the same independently of the number of values inside `In`. Therefore, we see O(1) time complexity for compact data and a significant improvement even on non-compact data. As on bytes, the performance of the old approach on shorts is the same independently of data compactness. #### Int **_Performance Summary_** Proposed Solution | # values | InSet | In with compact data | In with non-compact data | | ------------- |:-------------:| -----:| ---:| | 10 | 230 ms | 43 ms | 36 ms | | 100 | 240 ms | 45 ms | 99 ms | | 250 | 250 ms | 49 ms | 109 ms | Current Solution | # values | InSet | In with compact data | In with non-compact data | | ------------- |:-------------:| -----:| ---:| | 10 | 225 ms | 93 ms | 74 ms | | 100 | 240 ms | 624 ms | 608 ms | | 250 | 250 ms | 1131 ms | 1120 ms | **_Raw Results_** Proposed Solution ``` Running benchmark: 10 compact ints Running case: In expression Stopped after 47 iterations, 2018 ms Running case: InSet expression Stopped after 15 iterations, 3404 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 41 / 43 2457.4 0.4 1.0X InSet expression 224 / 227 446.5 2.2 0.2X Running benchmark: 10 non-compact ints Running case: In expression Stopped after 56 iterations, 2021 ms Running case: InSet expression Stopped after 15 iterations, 3470 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 non-compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 35 / 36 2853.6 0.4 1.0X InSet expression 226 / 231 442.9 2.3 0.2X Running benchmark: 100 compact ints Running case: In expression Stopped after 45 iterations, 2007 ms Running case: InSet expression Stopped after 15 iterations, 3766 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 100 compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 43 / 45 2344.0 0.4 1.0X InSet expression 239 / 251 417.8 2.4 0.2X Running benchmark: 100 non-compact ints Running case: In expression Stopped after 21 iterations, 2077 ms Running case: InSet expression Stopped after 15 iterations, 3776 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 100 non-compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 96 / 99 1040.2 1.0 1.0X InSet expression 248 / 252 403.6 2.5 0.4X Running benchmark: 250 compact ints Running case: In expression Stopped after 43 iterations, 2023 ms Running case: InSet expression Stopped after 15 iterations, 3673 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 250 compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 45 / 47 2201.2 0.5 1.0X InSet expression 241 / 245 414.7 2.4 0.2X Running benchmark: 250 non-compact ints Running case: In expression Stopped after 19 iterations, 2073 ms Running case: InSet expression Stopped after 15 iterations, 3802 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 250 non-compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 108 / 109 929.2 1.1 1.0X InSet expression 247 / 254 405.4 2.5 0.4X ``` Current Solution ``` Running benchmark: 10 compact ints Running case: In expression Stopped after 22 iterations, 2050 ms Running case: InSet expression Stopped after 15 iterations, 3387 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 89 / 93 1123.1 0.9 1.0X InSet expression 215 / 226 464.9 2.2 0.4X Running benchmark: 10 non-compact ints Running case: In expression Stopped after 28 iterations, 2059 ms Running case: InSet expression Stopped after 15 iterations, 3364 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 10 non-compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 72 / 74 1397.9 0.7 1.0X InSet expression 216 / 224 463.3 2.2 0.3X Running benchmark: 100 compact ints Running case: In expression Stopped after 15 iterations, 9360 ms Running case: InSet expression Stopped after 15 iterations, 3672 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 100 compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 621 / 624 161.2 6.2 1.0X InSet expression 234 / 245 428.2 2.3 2.7X Running benchmark: 100 non-compact ints Running case: In expression Stopped after 15 iterations, 9113 ms Running case: InSet expression Stopped after 15 iterations, 3563 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 100 non-compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 593 / 608 168.6 5.9 1.0X InSet expression 230 / 238 435.5 2.3 2.6X Running benchmark: 250 compact ints Running case: In expression Stopped after 15 iterations, 16970 ms Running case: InSet expression Stopped after 15 iterations, 3724 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 250 compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 1123 / 1131 89.0 11.2 1.0X InSet expression 236 / 248 423.6 2.4 4.8X Running benchmark: 250 non-compact ints Running case: In expression Stopped after 15 iterations, 16807 ms Running case: InSet expression Stopped after 15 iterations, 3788 ms Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.14 Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz 250 non-compact ints: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ In expression 1097 / 1120 91.1 11.0 1.0X InSet expression 246 / 253 407.2 2.5 4.5X ``` **_Conclusion_** Similarly to previous types, Spark performs better on ints with the proposed optimization (up to 23x in some cases). Interestingly, the new approach can perform 5 times faster than `InSet` even on 250 elements. The reason for that is autoboxing, which will be addressed in a separate PR. As on other types, the performance of the old approach on ints is the same independently of data compactness. ## Notes - The performance difference was confirmed on OpenJDK 64-Bit Server VM 1.8.0_181-b13 on Linux 4.9.93-linuxkit-aufs. - [This link] contains source code that decides between `tableswitch` and `lookupswitch`. The logic was re-used in the benchmarks. See the `isLookupSwitch` method. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aokolnychyi/spark spark-26205 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23171.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23171 ---- commit 1477f101951ed39b96f884ddd2d451e164c0cc43 Author: Anton Okolnychyi <aokolnychyi@...> Date: 2018-11-13T01:48:19Z [SPARK-26205][SQL] Optimize In for bytes, shorts, ints ---- --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org