[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-10-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22484


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221503808
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.plans.SQLHelper
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
+
+  val spark: SparkSession = getSparkSession
+
+  /** Subclass can override this function to build their own SparkSession */
+  def getSparkSession: SparkSession = {
+    SparkSession.builder()
+      .master("local[1]")
+      .appName(this.getClass.getCanonicalName)
+      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
+      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
+      .getOrCreate()
+  }
+
+  /** Runs function `f` with whole stage codegen on and off. */
--- End diff --

Can we use `codegenBenchmark` instead? `runBenchmarkWithCodegen` looks like 
an extension of `runBenchmark`. It's more like `bitEncodingBenchmark` or 
`sortBenchmark`.
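To make the naming suggestion concrete, here is a sketch of what a helper named `codegenBenchmark` might look like. The name comes from the review comment; the body is inferred from the surrounding thread (it assumes `withSQLConf` from `SQLHelper` and the `output` stream from `BenchmarkBase` are in scope), not quoted from the PR:

```scala
// Hypothetical rename of runBenchmarkWithCodegen; the body is a sketch, not the PR's code.
def codegenBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
  val benchmark = new Benchmark(name, cardinality, output = output)

  // Measure the same workload with whole-stage codegen disabled and enabled.
  benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") { f }
  }
  benchmark.addCase(s"$name wholestage on", numIters = 5) { _ =>
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true") { f }
  }
  benchmark.run()
}
```

Under this naming, call sites would read `codegenBenchmark("agg w/o group", N) { ... }`, matching the `bitEncodingBenchmark`/`sortBenchmark` style the reviewer mentions.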


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221501996
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
+
+  val spark: SparkSession = getSparkSession
+
+  /** Subclass can override this function to build their own SparkSession */
+  def getSparkSession: SparkSession = {
+    SparkSession.builder()
+      .master("local[1]")
+      .appName(this.getClass.getCanonicalName)
+      .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
+      .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
+      .getOrCreate()
+  }
+
+  /** Runs function `f` with whole stage codegen on and off. */
+  def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
--- End diff --

This should be `final def runBenchmarkWithCodegen` instead of `def 
runBenchmarkWithCodegen`.
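A minimal plain-Scala illustration of why the reviewer asks for `final` here: a `final` template method in a trait cannot be overridden, so every benchmark subclass is guaranteed to exercise both settings the same way. The names below are hypothetical stand-ins, not the PR's code:

```scala
// Sketch: a final template method fixes the measurement protocol for all subclasses.
trait BenchmarkTemplate {
  // Subclasses supply only the workload; they cannot change how it is driven.
  final def runWithBothSettings(name: String)(f: Boolean => Unit): Unit = {
    Seq(false, true).foreach { enabled => f(enabled) }
  }
}

object Demo extends BenchmarkTemplate {
  def main(args: Array[String]): Unit =
    runWithBothSettings("demo") { on => println(s"demo codegen=$on") }
}
```

Attempting `override def runWithBothSettings` in `Demo` would be a compile error, which is exactly the guarantee the review comment is after.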


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221501890
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
+
+  val spark: SparkSession = getSparkSession
--- End diff --

`val spark` -> `protected val spark`


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221501761
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

For the naming, let's keep the current one for now.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221434117
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

Thank you, @gengliangwang and @wangyum . Let me think about this again.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221417293
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

Then each function could live in a different trait... I don't think that 
`runBenchmarkWithCodegen` has much in common with 
`runBenchmarkWithParquetPushDown`.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-28 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221416957
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

Maybe we can add more common functions in the future. e.g. 
`runBenchmarkWithCodegen`, `runBenchmarkWithParquetPushDown`, 
`runBenchmarkWithOrcPushDown`...
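A sketch of what one of these additional helpers could look like. The name `runBenchmarkWithParquetPushDown` comes from the comment above; the body is an assumption modeled on the codegen helper in this PR (it assumes `withSQLConf` and `output` are in scope from `SQLHelper` and `BenchmarkBase`):

```scala
// Hypothetical helper: measure a workload with Parquet filter pushdown off and on.
def runBenchmarkWithParquetPushDown(name: String, cardinality: Long)(f: => Unit): Unit = {
  val benchmark = new Benchmark(name, cardinality, output = output)
  Seq(false, true).foreach { pushDown =>
    benchmark.addCase(s"$name parquet pushdown = $pushDown") { _ =>
      withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushDown.toString) { f }
    }
  }
  benchmark.run()
}
```

An `runBenchmarkWithOrcPushDown` variant would be identical except for toggling the ORC pushdown configuration, which is part of why the next comment questions how much these helpers really share.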


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221416202
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

How about `CodegenBenchmarkBase`? This is the best I can think of. 
@wangyum @dongjoon-hyun @cloud-fan 


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221415701
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

@dongjoon-hyun in https://github.com/apache/spark/pull/22522 I feel that it 
would be better to have an example refactoring, so that we can see how the new 
trait is used. 
We can move back to https://github.com/apache/spark/pull/22522. I am OK 
either way.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221415642
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

Actually, I don't think the name `SqlBasedBenchmark` is appropriate. From the 
name we can't tell it is about benchmarking with/without whole-stage codegen. 
I will try to come up with a better name, or we can discuss it in this thread.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221401306
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

So, if @gengliangwang agrees with that, `SqlBasedBenchmark` is another 
refactoring (renaming and improvement) like `[SPARK-25499][TEST] Refactor 
BenchmarkBase and Benchmark`. Could you do that in a separate PR in advance?


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-28 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221397904
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

I think we can remove `BenchmarkWithCodegen` after all the refactoring is finished.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r221396456
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala ---
@@ -0,0 +1,60 @@
+/* ... Apache License header, package, and imports elided; identical to the first quoted diff above ... */
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
--- End diff --

@wangyum and @gengliangwang .

What is the future plan for the usage of both `SqlBasedBenchmark` and 
`BenchmarkWithCodegen`? I'm wondering what the criteria are for choosing between the two traits.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-27 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220959695
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
@@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class  
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *      Results will be written to "benchmarks/AggregateBenchmark-results.txt".
+ * }}}
  */
-class AggregateBenchmark extends BenchmarkWithCodegen {
+object AggregateBenchmark extends SqlBasedBenchmark {
 
-  ignore("aggregate without grouping") {
-    val N = 500L << 22
-    val benchmark = new Benchmark("agg without grouping", N)
-    runBenchmark("agg w/o group", N) {
-      sparkSession.range(N).selectExpr("sum(id)").collect()
+  override def benchmark(): Unit = {
+    runBenchmark("aggregate without grouping") {
+      val N = 500L << 22
+      runBenchmarkWithCodegen("agg w/o group", N) {
+        spark.range(N).selectExpr("sum(id)").collect()
+      }
     }
-    /*
-    agg w/o group:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    --------------------------------------------------------------------------------------
-    agg w/o group wholestage off       30136 / 31885         69.6          14.4       1.0X
-    agg w/o group wholestage on          1851 / 1860       1132.9           0.9      16.3X
-     */
-  }
 
-  ignore("stat functions") {
-    val N = 100L << 20
+    runBenchmark("stat functions") {
+      val N = 100L << 20
 
-    runBenchmark("stddev", N) {
-      sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
-    }
+      runBenchmarkWithCodegen("stddev", N) {
+        spark.range(N).groupBy().agg("id" -> "stddev").collect()
+      }
 
-    runBenchmark("kurtosis", N) {
-      sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
+      runBenchmarkWithCodegen("kurtosis", N) {
+        spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
+      }
     }
 
-    /*
-    Using ImperativeAggregate (as implemented in Spark 1.6):
-
-      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-      stddev:                Avg Time(ms)    Avg Rate(M/s)    Relative Rate
-      ---------------------------------------------------------------------
-      stddev w/o codegen          2019.04            10.39           1.00 X
-      stddev w codegen            2097.29            10.00           0.96 X
-      kurtosis w/o codegen        2108.99             9.94           0.96 X
-      kurtosis w codegen          2090.69            10.03           0.97 X
-
-      Using DeclarativeAggregate:
-
-      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-      stddev:               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-      -----------------------------------------------------------------------------
-      stddev codegen=false        5630 / 5776         18.0          55.6       1.0X
-      stddev codegen=true         1259 / 1314         83.0          12.0       4.5X
-
-      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-      kurtosis:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-      -----------------------------------------------------------------------------
-      kurtosis codegen=false    14847 / 15084          7.0         142.9       1.0X
-      kurtosis codegen=true       1652 / 2124         63.0          15.9       9.0X
-    */
-  }
-
-  ignore("aggregate with linear keys") {
-    val N = 20 << 22
+    runBenchmark("aggregate with linear keys") {
+      val N = 20 << 22
 
-    val benchmark = new Benchmark("Aggregate w keys", N)
-    def f(): Unit = {
-      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
-    }
+      val benchmark = new Benchmark("Aggregate w keys", N, output = output)
 
-    benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
-      sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
-      f()
-    }
[quoted diff truncated in the archive]
[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220800365
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -34,621 +34,539 @@ import org.apache.spark.unsafe.map.BytesToBytesMap
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.AggregateBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt: bin/spark-submit --class  
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *  Results will be written to 
"benchmarks/AggregateBenchmark-results.txt".
+ * }}}
  */
-class AggregateBenchmark extends BenchmarkWithCodegen {
+object AggregateBenchmark extends SqlBasedBenchmark {
 
-  ignore("aggregate without grouping") {
-val N = 500L << 22
-val benchmark = new Benchmark("agg without grouping", N)
-runBenchmark("agg w/o group", N) {
-  sparkSession.range(N).selectExpr("sum(id)").collect()
+  override def benchmark(): Unit = {
+runBenchmark("aggregate without grouping") {
+  val N = 500L << 22
+  runBenchmarkWithCodegen("agg w/o group", N) {
+spark.range(N).selectExpr("sum(id)").collect()
+  }
 }
-/*
-agg w/o group:   Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative
-

-agg w/o group wholestage off30136 / 31885 69.6 
 14.4   1.0X
-agg w/o group wholestage on   1851 / 1860   1132.9 
  0.9  16.3X
- */
-  }
 
-  ignore("stat functions") {
-val N = 100L << 20
+runBenchmark("stat functions") {
+  val N = 100L << 20
 
-runBenchmark("stddev", N) {
-  sparkSession.range(N).groupBy().agg("id" -> "stddev").collect()
-}
+  runBenchmarkWithCodegen("stddev", N) {
+spark.range(N).groupBy().agg("id" -> "stddev").collect()
+  }
 
-runBenchmark("kurtosis", N) {
-  sparkSession.range(N).groupBy().agg("id" -> "kurtosis").collect()
+  runBenchmarkWithCodegen("kurtosis", N) {
+spark.range(N).groupBy().agg("id" -> "kurtosis").collect()
+  }
 }
 
-    /*
-    Using ImperativeAggregate (as implemented in Spark 1.6):
-
-      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-      stddev:                   Avg Time(ms)    Avg Rate(M/s)    Relative Rate
-      ------------------------------------------------------------------------
-      stddev w/o codegen             2019.04            10.39           1.00 X
-      stddev w codegen               2097.29            10.00           0.96 X
-      kurtosis w/o codegen           2108.99             9.94           0.96 X
-      kurtosis w codegen             2090.69            10.03           0.97 X
-
-      Using DeclarativeAggregate:
-
-      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-      stddev:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-      ------------------------------------------------------------------------------
-      stddev codegen=false          5630 / 5776         18.0          55.6       1.0X
-      stddev codegen=true           1259 / 1314         83.0          12.0       4.5X
-
-      Intel(R) Core(TM) i7-4558U CPU @ 2.80GHz
-      kurtosis:               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-      ------------------------------------------------------------------------------
-      kurtosis codegen=false      14847 / 15084          7.0         142.9       1.0X
-      kurtosis codegen=true         1652 / 2124         63.0          15.9       9.0X
-    */
-  }
-
-  ignore("aggregate with linear keys") {
-val N = 20 << 22
+runBenchmark("aggregate with linear keys") {
+  val N = 20 << 22
 
-    val benchmark = new Benchmark("Aggregate w keys", N)
-    def f(): Unit = {
-      sparkSession.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
-    }
+  val benchmark = new Benchmark("Aggregate w keys", N, output = output)
 
-benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
-  sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
-  f()
-}
 

[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220800434
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
 

[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220799981
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
 

[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-26 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220764252
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
 ---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.{AnalysisException, SparkSession}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Common base trait to run benchmark with the Dataset and DataFrame API.
+ */
+trait SqlBasedBenchmark extends BenchmarkBase {
+
+  val spark: SparkSession = getSparkSession
+
+  /** Subclass can override this function to build their own SparkSession */
+  def getSparkSession: SparkSession = {
+SparkSession.builder()
+  .master("local[1]")
+  .appName(this.getClass.getCanonicalName)
+  .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
+  .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
+  .getOrCreate()
+  }
+
+  /**
+   * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL
+   * configurations.
+   */
+  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
--- End diff --

@dongjoon-hyun  I finished it.
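
[Editor's note] The quoted scaladoc says a subclass can override `getSparkSession` to build its own session. A minimal sketch of such an override — the object name and the specific settings below are hypothetical, not part of this PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.internal.SQLConf

// Hypothetical benchmark that tunes its own SparkSession instead of
// using the defaults provided by SqlBasedBenchmark.
object ExampleBenchmark extends SqlBasedBenchmark {

  override def getSparkSession: SparkSession = {
    SparkSession.builder()
      .master("local[4]")                        // assumption: more local threads
      .appName(this.getClass.getCanonicalName)
      .config(SQLConf.SHUFFLE_PARTITIONS.key, 4) // assumption: wider shuffle
      .getOrCreate()
  }

  override def benchmark(): Unit = {
    runBenchmark("example") {
      spark.range(1000000).selectExpr("sum(id)").collect()
    }
  }
}
```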


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220438521
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
 ---
@@ -0,0 +1,87 @@
+  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
--- End diff --

Yes, I will do it now.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220435538
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
 ---
@@ -0,0 +1,87 @@
+  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
--- End diff --

@wangyum . Thank you for waiting.
Since SPARK-25534 is merged, could you use `SQLHelper.withSQLConf` instead?
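
[Editor's note] The suggested change amounts to mixing in the shared helper instead of keeping a local copy. A sketch, consistent with the file as eventually merged (SPARK-25534 added `SQLHelper` under `org.apache.spark.sql.catalyst.plans`):

```scala
import org.apache.spark.benchmark.BenchmarkBase
import org.apache.spark.sql.catalyst.plans.SQLHelper

// withSQLConf now comes from SQLHelper, so the local re-implementation
// of it in this trait can be deleted.
trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
  // getSparkSession, runBenchmarkWithCodegen, ... as before
}
```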


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220304213
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
 ---
@@ -0,0 +1,87 @@
+  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
+val conf = SQLConf.get
+val (keys, values) = pairs.unzip
+val currentValues = keys.map { key =>
+  if (conf.contains(key)) {
+Some(conf.getConfString(key))
+  } else {
+None
+  }
+}
+(keys, values).zipped.foreach { (k, v) =>
+  if (SQLConf.staticConfKeys.contains(k)) {
+        throw new AnalysisException(s"Cannot modify the value of a static config: $k")
+  }
+  conf.setConfString(k, v)
+}
+try f finally {
+  keys.zip(currentValues).foreach {
+case (key, Some(value)) => conf.setConfString(key, value)
+case (key, None) => conf.unsetConf(key)
+  }
+}
+  }
+
+  /** Runs function `f` with whole stage codegen on and off. */
+  def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
+val benchmark = new Benchmark(name, cardinality, output = output)
+
+benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
+  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
--- End diff --

Is it too much to introduce a trait for `withSQLConf`?
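
[Editor's note] The quoted hunk shows only the "wholestage off" case of `runBenchmarkWithCodegen`. A sketch of the full on/off helper; the "on" case, its iteration count, and the trailing `benchmark.run()` are assumptions inferred from the pattern, not visible in the quoted diff:

```scala
// Runs f twice, once with whole-stage codegen disabled and once enabled,
// so the generated report compares the two cases side by side.
def runBenchmarkWithCodegen(name: String, cardinality: Long)(f: => Unit): Unit = {
  val benchmark = new Benchmark(name, cardinality, output = output)

  benchmark.addCase(s"$name wholestage off", numIters = 2) { _ =>
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
      f
    }
  }

  benchmark.addCase(s"$name wholestage on", numIters = 5) { _ =>
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true") {
      f
    }
  }

  benchmark.run()
}
```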


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220303975
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
 ---
@@ -0,0 +1,87 @@
+  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
--- End diff --

Right. It's the same. And, the previous issue was we don't call `unset`.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220298090
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
 ---
@@ -0,0 +1,87 @@
+  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
--- End diff --

for this particular case, do we really need `withSQLConf`? I think we can do
```
val conf = SQLConf.get
conf.set(WHOLESTAGE_CODEGEN_ENABLED, false)
run benchmark...
conf.set(WHOLESTAGE_CODEGEN_ENABLED, true)
run benchmark...
conf.unset(WHOLESTAGE_CODEGEN_ENABLED)
```
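
[Editor's note] One caveat with the manual set/unset sketch above: if a benchmark case throws, the flag is never unset. A hedged variant (an illustration, not a claim about the final implementation) that guarantees restoration with try/finally:

```scala
import org.apache.spark.sql.internal.SQLConf

// Same idea as the manual set/unset above, but the finally block
// restores the config even when a benchmark case fails part-way.
val conf = SQLConf.get
try {
  conf.setConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED, false)
  // run benchmark with whole-stage codegen off ...
  conf.setConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED, true)
  // run benchmark with whole-stage codegen on ...
} finally {
  conf.unsetConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED)
}
```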


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220043531
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
 ---
@@ -0,0 +1,87 @@
+  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
--- End diff --

Shall we avoid duplicating the existing logic `withSQLConf`? Let me try to 
fix.


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220034493
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
     spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
   }
 
-  benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
-    spark.conf.set("spark.sql.codegen.wholeStage", "false")
-    f()
+  benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
+    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
+      f()
+    }
   }
 
-  benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
-    spark.conf.set("spark.sql.codegen.wholeStage", "true")
-    spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
-    spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
-    f()
+  benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
+    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
+      SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
+      "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
--- End diff --

Fixed


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220033439
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
+      "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
--- End diff --

Yes!


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220028846
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
+  "spark.sql.codegen.aggregate.map.vectorized.enable" -> 
false.toString) {
--- End diff --

Do you mean change
```scala
withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
  SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
  "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
  f()
}
```
to 
```scala
withSQLConf(
  SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
  SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> "false",
  "spark.sql.codegen.aggregate.map.vectorized.enable" -> "false") {
  f()
}
```
?


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220027149
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
@@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
         spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
       }
 
-      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
-        spark.conf.set("spark.sql.codegen.wholeStage", "false")
-        f()
+      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
+        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
+          f()
+        }
       }
 
-      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { iter =>
-        spark.conf.set("spark.sql.codegen.wholeStage", "true")
-        spark.conf.set("spark.sql.codegen.aggregate.map.twolevel.enabled", "false")
-        spark.conf.set("spark.sql.codegen.aggregate.map.vectorized.enable", "false")
-        f()
+      benchmark.addCase(s"codegen = T hashmap = F", numIters = 3) { _ =>
+        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> true.toString,
+          SQLConf.ENABLE_TWOLEVEL_AGG_MAP.key -> false.toString,
+          "spark.sql.codegen.aggregate.map.vectorized.enable" -> false.toString) {
--- End diff --

@wangyum. This one is also about indentation. Please note that `withSQLConf(` extends beyond the first configuration.
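
For context, the point of replacing bare `spark.conf.set` with `withSQLConf` is that the latter restores the previous values once the body finishes, so one benchmark case cannot leak settings into the next. A minimal sketch of such a helper, assuming a `spark: SparkSession` in scope (the real implementation lives in `SQLHelper` and may differ in detail):

```scala
// Sketch only: set the given SQL confs, run f, then restore the old values.
def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
  val conf = spark.sessionState.conf
  // Remember the current value (null if unset) of every key we will touch.
  val previous = pairs.map { case (k, _) => k -> conf.getConfString(k, null) }
  pairs.foreach { case (k, v) => conf.setConfString(k, v) }
  try f
  finally previous.foreach {
    case (k, null) => conf.unsetConf(k)   // key was unset before; unset again
    case (k, v)    => conf.setConfString(k, v)
  }
}
```

The `try`/`finally` guarantees restoration even when the benchmarked body throws.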


---




[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r220026018
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala ---
@@ -73,23 +73,26 @@ object AggregateBenchmark extends SqlBasedBenchmark {
         spark.range(N).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
       }
 
-      benchmark.addCase(s"codegen = F", numIters = 2) { iter =>
-        spark.conf.set("spark.sql.codegen.wholeStage", "false")
-        f()
+      benchmark.addCase(s"codegen = F", numIters = 2) { _ =>
+        withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> false.toString) {
--- End diff --

When we use `Seq(true, false).foreach { value =>`, we usually write `s"$value"`. But for this one, I think the literal `"false"` is the simplest and best.
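
For reference, the `Seq(true, false).foreach` pattern mentioned above looks roughly like this (a sketch, assuming the surrounding benchmark body defines `benchmark` and `f()`; the case name is illustrative):

```scala
// Sweep both codegen settings; the interpolated s"$value" form is useful
// here because the same conf key takes both "true" and "false".
Seq(true, false).foreach { codegenEnabled =>
  benchmark.addCase(s"codegen = $codegenEnabled") { _ =>
    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> s"$codegenEnabled") {
      f()
    }
  }
}
```

When only a single fixed value is needed, as in this case, the plain string literal avoids the interpolation entirely.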


---
