[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22484 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221503808 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** Runs function `f` with whole stage codegen on and off. */ --- End diff -- Can we use `codegenBenchmark` instead? `runBenchmarkWithCodegen` looks like an extension of `runBenchmark`. It's more like `bitEncodingBenchmark` or `sortBenchmark`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
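For readers following the naming discussion, here is a minimal sketch of the helper with the suggested `codegenBenchmark` name applied. Only the `wholestage off` case is visible in the quoted diff; the `wholestage on` case, its iteration count, and the closing `benchmark.run()` are assumed to mirror it and are not taken from this PR.

```scala
// Sketch only: assumes it sits inside the SqlBasedBenchmark trait from the diff,
// which supplies `output` (via BenchmarkBase) and `withSQLConf` (via SQLHelper).
def codegenBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
  val benchmark = new Benchmark(name, cardinality, output = output)

  // Case shown in the quoted diff: whole-stage codegen disabled.
  benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
      f()
    }
  }

  // Assumed mirror case: whole-stage codegen enabled (not visible in the quoted diff).
  benchmark.addCase(s"$name wholestage on") { _ =>
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true") {
      f()
    }
  }

  benchmark.run()
}
```

A caller would then write, for example, `codegenBenchmark("agg w/o group", N) { spark.range(N).selectExpr("sum(id)").collect() }`, matching the `runBenchmarkWithCodegen` calls in the AggregateBenchmark diff further down the thread.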
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221501996 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** Runs function `f` with whole stage codegen on and off. */ + def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = { --- End diff -- This should be `final def runBenchmarkWithCodegen` instead of `def runBenchmarkWithCodegen`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221501890 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { + + val spark: SparkSession = getSparkSession --- End diff -- `val spark` -> `protected val spark` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
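Taken together with the `final def` suggestion in the previous comment, the members under review would read roughly as follows; apart from the two added modifiers, everything here is as shown in the quoted diff.

```scala
trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {

  // `protected` added per the comment above.
  protected val spark: SparkSession = getSparkSession

  /** Subclass can override this function to build their own SparkSession. */
  def getSparkSession: SparkSession = {
    SparkSession.builder()
      .master("local[1]")
      .appName(this.getClass.getCanonicalName)
      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
      .getOrCreate()
  }

  // `final` added per the previous comment; the body (two Benchmark cases toggling
  // SQLConf.WHOLESTAGE_CODEGEN_ENABLED) is unchanged from the diff and elided here.
  final def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
    // ...
  }
}
```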
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221501761 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- For the naming, let's keep the current one for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221434117 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- Thank you, @gengliangwang and @wangyum . Let me think about this again. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221417293 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- Then each function could live in a different trait... I don't think that `runBenchmarkWithCodegen` has much in common with `runBenchmarkWithParquetPushDown`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221416957 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- Maybe we can add more common functions in the future. e.g. `runBenchmarkWithCodegen`, `runBenchmarkWithParquetPushDown`, `runBenchmarkWithOrcPushDown`... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
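None of the extra helpers named above exist in this PR. As a purely hypothetical illustration of the extensibility argument, one of them could follow the same shape as `runBenchmarkWithCodegen`, toggling the Parquet filter-pushdown flag instead of whole-stage codegen (the helper itself and the use of `SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED` are assumptions, not code from this PR):

```scala
// Hypothetical helper, not part of this PR: same pattern as runBenchmarkWithCodegen,
// but benchmarking with Parquet filter pushdown off and on. Assumes it sits inside
// SqlBasedBenchmark, which provides `output` and `withSQLConf`.
def runBenchmarkWithParquetPushDown(name: String, cardinality: Long)(f: => Unit): Unit = {
  val benchmark = new Benchmark(name, cardinality, output = output)
  Seq(false, true).foreach { pushDown =>
    benchmark.addCase(s"$name parquet pushdown ${if (pushDown) "on" else "off"}") { _ =>
      withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushDown.toString) {
        f()
      }
    }
  }
  benchmark.run()
}
```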
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221416202 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- How about `CodegenBenchmarkBase` ? This is the best I can think of.. @wangyum @dongjoon-hyun @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221415701 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- @dongjoon-hyun in https://github.com/apache/spark/pull/22522 I feel that it would be better to have an example refactoring, so that we can see how the new trait is used. We can move back to https://github.com/apache/spark/pull/22522 . I am OK either way. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221415642 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- Actually, I don't think the name `SqlBasedBenchmark` is appropriate. From the naming we can't tell that it is about benchmarking with/without whole-stage codegen. I will try to come up with a better name, or we can discuss it in this thread. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221401306 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- So, if @gengliangwang agrees with that, `SqlBasedBenchmark` is another refactoring (renaming and improvement) like `[SPARK-25499][TEST] Refactor BenchmarkBase and Benchmark`. Could you do that in a separate PR in advance? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221397904 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- I think we can remove `BenchmarkWithCodegen` after all the refactoring is finished. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r221396456 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper { --- End diff -- @wangyum and @gengliangwang . What is the future plan for the usage of both `SqlBasedBenchmark` and `BenchmarkWithCodegen`? I'm wondering what is the criteria to choose each trait. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220959695 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap /** * Benchmark to measure performance for aggregate primitives. - * To run this: - * build/sbt "sql/test-only *benchmark.AggregateBenchmark" - * - * Benchmarks in this file are skipped in normal builds. + * To run this benchmark: + * {{{ + * 1. without sbt: bin/spark-submit --class + * 2. build/sbt "sql/test:runMain " + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/AggregateBenchmark-results.txt". + * }}} */ -class AggregateBenchmark extends BenchmarkWithCodegen { +object AggregateBenchmark extends SqlBasedBenchmark { - ignore("aggregate without grouping") { -val N = 500L << 22 -val benchmark = new Benchmark("agg without grouping", N) -runBenchmark("agg w/o group", N) { - sparkSession.range(N).selectExpr("sum(id)").collect() + override def benchmark(): Unit = { +runBenchmark("aggregate without grouping") { + val N = 500L << 22 + runBenchmarkWithCodegen("agg w/o group", N) { +spark.range(N).selectExpr("sum(id)").collect() + } } -/* -agg w/o group: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative - -agg w/o group wholestage off30136 / 31885 69.6 14.4 1.0X -agg w/o group wholestage on 1851 / 1860 1132.9 0.9 16.3X - */ - } - ignore("stat functions") { -val N = 100L << 20 +runBenchmark("stat functions") { + val N = 100L << 20 -runBenchmark("stddev", N) { - sparkSession.range(N).groupBy().agg("id" -> "stddev").collect() -} + runBenchmarkWithCodegen("stddev", N) { +spark.range(N).groupBy().agg("id" -> "stddev").collect() + } -runBenchmark("kurtosis", N) { - sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect() + runBenchmarkWithCodegen("kurtosis", N) { +spark.range(N).groupBy().agg("id" -> "kurtosis").collect() + } } -/* -Using ImperativeAggregate (as implemented in Spark 1.6): - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev:Avg Time(ms)Avg Rate(M/s) Relative Rate - --- - stddev w/o codegen 2019.0410.39 1.00 X - stddev w codegen2097.2910.00 0.96 X - kurtosis w/o codegen2108.99 9.94 0.96 X - kurtosis w codegen 2090.6910.03 0.97 X - - Using DeclarativeAggregate: - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - stddev codegen=false 5630 / 5776 18.0 55.6 1.0X - stddev codegen=true 1259 / 1314 83.0 12.0 4.5X - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - kurtosis: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - kurtosis codegen=false 14847 / 15084 7.0 142.9 1.0X - kurtosis codegen=true1652 / 2124 63.0 15.9 9.0X -*/ - } - - ignore("aggregate with linear keys") { -val N = 20 << 22 +runBenchmark("aggregate with linear keys") { + val N = 20 << 22 -val benchmark = new Benchmark("Aggregate w keys", N) -def f(): Unit = { - sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() -} + val benchmark = new Benchmark("Aggregate w keys", N, output = output) -benchmark.addCase(s"codegen = F", numIters = 2) { iter => - sparkSession.conf.set("spark.sql.codegen.wholeStage", "false") - f() -} +
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220800365 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap /** * Benchmark to measure performance for aggregate primitives. - * To run this: - * build/sbt "sql/test-only *benchmark.AggregateBenchmark" - * - * Benchmarks in this file are skipped in normal builds. + * To run this benchmark: + * {{{ + * 1. without sbt: bin/spark-submit --class + * 2. build/sbt "sql/test:runMain " + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/AggregateBenchmark-results.txt". + * }}} */ -class AggregateBenchmark extends BenchmarkWithCodegen { +object AggregateBenchmark extends SqlBasedBenchmark { - ignore("aggregate without grouping") { -val N = 500L << 22 -val benchmark = new Benchmark("agg without grouping", N) -runBenchmark("agg w/o group", N) { - sparkSession.range(N).selectExpr("sum(id)").collect() + override def benchmark(): Unit = { +runBenchmark("aggregate without grouping") { + val N = 500L << 22 + runBenchmarkWithCodegen("agg w/o group", N) { +spark.range(N).selectExpr("sum(id)").collect() + } } -/* -agg w/o group: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative - -agg w/o group wholestage off30136 / 31885 69.6 14.4 1.0X -agg w/o group wholestage on 1851 / 1860 1132.9 0.9 16.3X - */ - } - ignore("stat functions") { -val N = 100L << 20 +runBenchmark("stat functions") { + val N = 100L << 20 -runBenchmark("stddev", N) { - sparkSession.range(N).groupBy().agg("id" -> "stddev").collect() -} + runBenchmarkWithCodegen("stddev", N) { +spark.range(N).groupBy().agg("id" -> "stddev").collect() + } -runBenchmark("kurtosis", N) { - sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect() + runBenchmarkWithCodegen("kurtosis", N) { +spark.range(N).groupBy().agg("id" -> "kurtosis").collect() + } } -/* -Using ImperativeAggregate (as implemented in Spark 1.6): - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev:Avg Time(ms)Avg Rate(M/s) Relative Rate - --- - stddev w/o codegen 2019.0410.39 1.00 X - stddev w codegen2097.2910.00 0.96 X - kurtosis w/o codegen2108.99 9.94 0.96 X - kurtosis w codegen 2090.6910.03 0.97 X - - Using DeclarativeAggregate: - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - stddev codegen=false 5630 / 5776 18.0 55.6 1.0X - stddev codegen=true 1259 / 1314 83.0 12.0 4.5X - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - kurtosis: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - kurtosis codegen=false 14847 / 15084 7.0 142.9 1.0X - kurtosis codegen=true1652 / 2124 63.0 15.9 9.0X -*/ - } - - ignore("aggregate with linear keys") { -val N = 20 << 22 +runBenchmark("aggregate with linear keys") { + val N = 20 << 22 -val benchmark = new Benchmark("Aggregate w keys", N) -def f(): Unit = { - sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() -} + val benchmark = new Benchmark("Aggregate w keys", N, output = output) -benchmark.addCase(s"codegen = F", numIters = 2) { iter => - sparkSession.conf.set("spark.sql.codegen.wholeStage", "false") - f() -}
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220800434 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap /** * Benchmark to measure performance for aggregate primitives. - * To run this: - * build/sbt "sql/test-only *benchmark.AggregateBenchmark" - * - * Benchmarks in this file are skipped in normal builds. + * To run this benchmark: + * {{{ + * 1. without sbt: bin/spark-submit --class + * 2. build/sbt "sql/test:runMain " + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/AggregateBenchmark-results.txt". + * }}} */ -class AggregateBenchmark extends BenchmarkWithCodegen { +object AggregateBenchmark extends SqlBasedBenchmark { - ignore("aggregate without grouping") { -val N = 500L << 22 -val benchmark = new Benchmark("agg without grouping", N) -runBenchmark("agg w/o group", N) { - sparkSession.range(N).selectExpr("sum(id)").collect() + override def benchmark(): Unit = { +runBenchmark("aggregate without grouping") { + val N = 500L << 22 + runBenchmarkWithCodegen("agg w/o group", N) { +spark.range(N).selectExpr("sum(id)").collect() + } } -/* -agg w/o group: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative - -agg w/o group wholestage off30136 / 31885 69.6 14.4 1.0X -agg w/o group wholestage on 1851 / 1860 1132.9 0.9 16.3X - */ - } - ignore("stat functions") { -val N = 100L << 20 +runBenchmark("stat functions") { + val N = 100L << 20 -runBenchmark("stddev", N) { - sparkSession.range(N).groupBy().agg("id" -> "stddev").collect() -} + runBenchmarkWithCodegen("stddev", N) { +spark.range(N).groupBy().agg("id" -> "stddev").collect() + } -runBenchmark("kurtosis", N) { - sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect() + runBenchmarkWithCodegen("kurtosis", N) { +spark.range(N).groupBy().agg("id" -> "kurtosis").collect() + } } -/* -Using ImperativeAggregate (as implemented in Spark 1.6): - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev:Avg Time(ms)Avg Rate(M/s) Relative Rate - --- - stddev w/o codegen 2019.0410.39 1.00 X - stddev w codegen2097.2910.00 0.96 X - kurtosis w/o codegen2108.99 9.94 0.96 X - kurtosis w codegen 2090.6910.03 0.97 X - - Using DeclarativeAggregate: - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - stddev codegen=false 5630 / 5776 18.0 55.6 1.0X - stddev codegen=true 1259 / 1314 83.0 12.0 4.5X - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - kurtosis: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - kurtosis codegen=false 14847 / 15084 7.0 142.9 1.0X - kurtosis codegen=true1652 / 2124 63.0 15.9 9.0X -*/ - } - - ignore("aggregate with linear keys") { -val N = 20 << 22 +runBenchmark("aggregate with linear keys") { + val N = 20 << 22 -val benchmark = new Benchmark("Aggregate w keys", N) -def f(): Unit = { - sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() -} + val benchmark = new Benchmark("Aggregate w keys", N, output = output) -benchmark.addCase(s"codegen = F", numIters = 2) { iter => - sparkSession.conf.set("spark.sql.codegen.wholeStage", "false") - f() -}
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220799981 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap /** * Benchmark to measure performance for aggregate primitives. - * To run this: - * build/sbt "sql/test-only *benchmark.AggregateBenchmark" - * - * Benchmarks in this file are skipped in normal builds. + * To run this benchmark: + * {{{ + * 1. without sbt: bin/spark-submit --class + * 2. build/sbt "sql/test:runMain " + * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/AggregateBenchmark-results.txt". + * }}} */ -class AggregateBenchmark extends BenchmarkWithCodegen { +object AggregateBenchmark extends SqlBasedBenchmark { - ignore("aggregate without grouping") { -val N = 500L << 22 -val benchmark = new Benchmark("agg without grouping", N) -runBenchmark("agg w/o group", N) { - sparkSession.range(N).selectExpr("sum(id)").collect() + override def benchmark(): Unit = { +runBenchmark("aggregate without grouping") { + val N = 500L << 22 + runBenchmarkWithCodegen("agg w/o group", N) { +spark.range(N).selectExpr("sum(id)").collect() + } } -/* -agg w/o group: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative - -agg w/o group wholestage off30136 / 31885 69.6 14.4 1.0X -agg w/o group wholestage on 1851 / 1860 1132.9 0.9 16.3X - */ - } - ignore("stat functions") { -val N = 100L << 20 +runBenchmark("stat functions") { + val N = 100L << 20 -runBenchmark("stddev", N) { - sparkSession.range(N).groupBy().agg("id" -> "stddev").collect() -} + runBenchmarkWithCodegen("stddev", N) { +spark.range(N).groupBy().agg("id" -> "stddev").collect() + } -runBenchmark("kurtosis", N) { - sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect() + runBenchmarkWithCodegen("kurtosis", N) { +spark.range(N).groupBy().agg("id" -> "kurtosis").collect() + } } -/* -Using ImperativeAggregate (as implemented in Spark 1.6): - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev:Avg Time(ms)Avg Rate(M/s) Relative Rate - --- - stddev w/o codegen 2019.0410.39 1.00 X - stddev w codegen2097.2910.00 0.96 X - kurtosis w/o codegen2108.99 9.94 0.96 X - kurtosis w codegen 2090.6910.03 0.97 X - - Using DeclarativeAggregate: - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - stddev: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - stddev codegen=false 5630 / 5776 18.0 55.6 1.0X - stddev codegen=true 1259 / 1314 83.0 12.0 4.5X - - Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz - kurtosis: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative - --- - kurtosis codegen=false 14847 / 15084 7.0 142.9 1.0X - kurtosis codegen=true1652 / 2124 63.0 15.9 9.0X -*/ - } - - ignore("aggregate with linear keys") { -val N = 20 << 22 +runBenchmark("aggregate with linear keys") { + val N = 20 << 22 -val benchmark = new Benchmark("Aggregate w keys", N) -def f(): Unit = { - sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() -} + val benchmark = new Benchmark("Aggregate w keys", N, output = output) -benchmark.addCase(s"codegen = F", numIters = 2) { iter => - sparkSession.conf.set("spark.sql.codegen.wholeStage", "false") - f() -}
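The AggregateBenchmark diffs quoted above are truncated by the mail archive. Reassembled from the visible fragments, the refactored benchmark follows this structure (only the first two groups are shown):

```scala
package org.apache.spark.sql.execution.benchmark

// Excerpt pieced together from the truncated diff fragments above.
object AggregateBenchmark extends SqlBasedBenchmark {

  override def benchmark(): Unit = {
    runBenchmark("aggregate without grouping") {
      val N = 500L << 22
      runBenchmarkWithCodegen("agg w/o group", N) {
        spark.range(N).selectExpr("sum(id)").collect()
      }
    }

    runBenchmark("stat functions") {
      val N = 100L << 20
      runBenchmarkWithCodegen("stddev", N) {
        spark.range(N).groupBy().agg("id" -> "stddev").collect()
      }
      runBenchmarkWithCodegen("kurtosis", N) {
        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
      }
    }
  }
}
```

Moving from an `ignore(...)` test class to an object built on `BenchmarkBase` is what enables the `spark-submit`/`sql/test:runMain` invocations and the generated `benchmarks/AggregateBenchmark-results.txt` file described in the updated scaladoc.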
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220764252 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** + * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL + * configurations. + */ + protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { --- End diff -- @dongjoon-hyun I finished it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220438521 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** + * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL + * configurations. + */ + protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { --- End diff -- Yes, I will do it now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220435538 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** + * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL + * configurations. + */ + protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { --- End diff -- @wangyum . Thank you for waiting. Since SPARK-25534 is merged, could you use `SQLHelper.withSQLConf` instead? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
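The revision quoted at the top of this thread shows where this suggestion landed: the local `withSQLConf` copy is gone and the trait obtains it by mixing in `SQLHelper`. A minimal sketch of the resulting shape, with method bodies elided except for one call to show `withSQLConf` now resolving through the mixin:

```scala
import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
import org.apache.spark.sql.catalyst.plans.SQLHelper
import org.apache.spark.sql.internal.SQLConf

// No local withSQLConf anymore; SQLHelper supplies it, so existing call sites compile unchanged.
trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {

  def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
    val benchmark = new Benchmark(name, cardinality, output = output)
    benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { f() }  // from SQLHelper
    }
    // ... "wholestage on" case analogous ...
    benchmark.run()
  }
}
```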
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220304213 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** + * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL + * configurations. + */ + protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { +val conf = SQLConf.get +val (keys, values) = pairs.unzip +val currentValues = keys.map { key => + if (conf.contains(key)) { +Some(conf.getConfString(key)) + } else { +None + } +} +(keys, values).zipped.foreach { (k, v) => + if (SQLConf.staticConfKeys.contains(k)) { +throw new AnalysisException(s"Cannot modify the value of a static config: $k") + } + conf.setConfString(k, v) +} +try f finally { + keys.zip(currentValues).foreach { +case (key, Some(value)) => conf.setConfString(key, value) +case (key, None) => conf.unsetConf(key) + } +} + } + + /** Runs function `f` with whole stage codegen on and off. */ + def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = { +val benchmark = new Benchmark(name, cardinality, output = output) + +benchmark.addCase(s"$name wholestage off", numIters = 2) { _ => + withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { --- End diff -- Is it too much to introduce a trait for `withSQLConf`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220303975 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** + * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL + * configurations. + */ + protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { +val conf = SQLConf.get +val (keys, values) = pairs.unzip +val currentValues = keys.map { key => + if (conf.contains(key)) { +Some(conf.getConfString(key)) + } else { +None + } +} +(keys, values).zipped.foreach { (k, v) => + if (SQLConf.staticConfKeys.contains(k)) { +throw new AnalysisException(s"Cannot modify the value of a static config: $k") + } + conf.setConfString(k, v) +} +try f finally { + keys.zip(currentValues).foreach { +case (key, Some(value)) => conf.setConfString(key, value) +case (key, None) => conf.unsetConf(key) + } +} + } + + /** Runs function `f` with whole stage codegen on and off. */ + def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = { +val benchmark = new Benchmark(name, cardinality, output = output) + +benchmark.addCase(s"$name wholestage off", numIters = 2) { _ => + withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { --- End diff -- Right. It's the same. And, the previous issue was we don't call `unset`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220298090 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** + * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL + * configurations. + */ + protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { +val conf = SQLConf.get +val (keys, values) = pairs.unzip +val currentValues = keys.map { key => + if (conf.contains(key)) { +Some(conf.getConfString(key)) + } else { +None + } +} +(keys, values).zipped.foreach { (k, v) => + if (SQLConf.staticConfKeys.contains(k)) { +throw new AnalysisException(s"Cannot modify the value of a static config: $k") + } + conf.setConfString(k, v) +} +try f finally { + keys.zip(currentValues).foreach { +case (key, Some(value)) => conf.setConfString(key, value) +case (key, None) => conf.unsetConf(key) + } +} + } + + /** Runs function `f` with whole stage codegen on and off. */ + def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = { +val benchmark = new Benchmark(name, cardinality, output = output) + +benchmark.addCase(s"$name wholestage off", numIters = 2) { _ => + withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { --- End diff -- for this particular case, do we really need `withSQLConf`? I think we can do ``` val conf = SQLConf.get conf.set(WHOLESTAGE_CODEGEN_ENABLED, false) run benchmark... conf.set(WHOLESTAGE_CODEGEN_ENABLED, true) run benchmark... conf.unset(WHOLESTAGE_CODEGEN_ENABLED) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
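Spelled out with the `SQLConf` accessors that already appear in the quoted code (`SQLConf.get`, `setConfString`, `unsetConf`), the alternative sketched above would look roughly like this; as the previous comment notes, the caveat is that the session default is only restored if the final unset actually runs.

```scala
// Rough sketch of the reviewer's alternative; `benchmark`, `name`, and `f` are assumed
// to be the values from runBenchmarkWithCodegen. Toggles the flag directly instead of
// wrapping each case in withSQLConf.
val conf = SQLConf.get

benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
  conf.setConfString(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "false")
  f()
}

benchmark.addCase(s"$name wholestage on") { _ =>
  conf.setConfString(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
  f()
}

benchmark.run()

// Restore the session default; skipping this leaves the flag at its last value.
conf.unsetConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key)
```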
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220043531 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.benchmark.{Benchmark, BenchmarkBase} +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.internal.SQLConf + +/** + * Common base trait to run benchmark with the Dataset and DataFrame API. + */ +trait SqlBasedBenchmark extends BenchmarkBase { + + val spark: SparkSession = getSparkSession + + /** Subclass can override this function to build their own SparkSession */ + def getSparkSession: SparkSession = { +SparkSession.builder() + .master("local[1]") + .appName(this.getClass.getCanonicalName) + .config(SQLConf.SHUFFLE_PARTITIONS.key, 1) + .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1) + .getOrCreate() + } + + /** + * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL + * configurations. + */ + protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { --- End diff -- Shall we avoid duplicating the existing logic `withSQLConf`? Let me try to fix. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220034493 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark { spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() } - benchmark.addCase(s"codegen = F", numIters = 2) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "false") -f() + benchmark.addCase(s"codegen = F", numIters = 2) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) { + f() +} } - benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "true") -spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false") - spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false") -f() + benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString, + SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString, + "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) { --- End diff -- Fixed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220033439 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark { spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() } - benchmark.addCase(s"codegen = F", numIters = 2) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "false") -f() + benchmark.addCase(s"codegen = F", numIters = 2) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) { + f() +} } - benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "true") -spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false") - spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false") -f() + benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString, + SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString, + "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) { --- End diff -- Yes! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220028846 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark { spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() } - benchmark.addCase(s"codegen = F", numIters = 2) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "false") -f() + benchmark.addCase(s"codegen = F", numIters = 2) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) { + f() +} } - benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "true") -spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false") - spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false") -f() + benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString, + SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString, + "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) { --- End diff -- Do you mean change ```scala withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true", SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false", "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") { f() } ``` to ```scala withSQLConf( SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true", SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false", "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") { f() } ``` ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220027149 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark { spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() } - benchmark.addCase(s"codegen = F", numIters = 2) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "false") -f() + benchmark.addCase(s"codegen = F", numIters = 2) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) { + f() +} } - benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "true") -spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false") - spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false") -f() + benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString, + SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString, + "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) { --- End diff -- @wangyum. This one is also about indentation. Note that the first configuration currently sits on the same line as `withSQLConf(`; please move it onto its own line so all the configurations line up. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220026018 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala --- @@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark { spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect() } - benchmark.addCase(s"codegen = F", numIters = 2) { iter => -spark.conf.set("spark.sql.codegen.wholeStage", "false") -f() + benchmark.addCase(s"codegen = F", numIters = 2) { _ => +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) { --- End diff -- When we use `Seq(true, false).foreach { value =>`, we usually write `s"$value"`. But for this fixed case, I think the plain literal `"false"` is the simplest and best choice. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
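For context, a small illustration of the pattern referenced above. It assumes `withSQLConf` is in scope and that `f` stands for the benchmarked body; it is not code from the PR itself.

```scala
// When iterating over both settings, the value varies, so it is interpolated:
Seq(true, false).foreach { enabled =>
  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> s"$enabled") {
    f() // run the benchmarked query with whole-stage codegen toggled
  }
}

// A single fixed case can simply use the literal string:
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
  f()
}
```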