[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-11-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r230060038
  
--- Diff: sql/core/benchmarks/JSONBenchmark-results.txt ---
@@ -0,0 +1,37 @@
+==============================================================================================
+Benchmark for performance of JSON parsing
+==============================================================================================
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+JSON schema inferring:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------------
+No encoding                                62946 / 63310          1.6         629.5       1.0X
+UTF-8 is set                             112814 / 112866          0.9        1128.1       0.6X
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+JSON per-line parsing:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------------
+No encoding                                16468 / 16553          6.1         164.7       1.0X
+UTF-8 is set                               16420 / 16441          6.1         164.2       1.0X
+
+Preparing data for benchmarking ...
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+JSON parsing of wide lines:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------------
+No encoding                                39789 / 40053          0.3        3978.9       1.0X
+UTF-8 is set                               39505 / 39584          0.3        3950.5       1.0X
--- End diff --

I commented on the PR. Please add new benchmark cases instead of
changing the existing numbers.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r230030467
  
--- Diff: sql/core/benchmarks/JSONBenchmark-results.txt ---
--- End diff --

The numbers for the currently used Jackson parser should be slightly different.
PR https://github.com/apache/spark/pull/22920 triggers creation of the Jackson
parser.





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22844





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229376327
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
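@@ -0,0 +1,33 @@
+==============================================================================================
+Benchmark for performance of JSON parsing
+==============================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON schema inferring:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------------
+No encoding                                48088 / 48180          2.1         480.9       1.0X
+UTF-8 is set                               71881 / 71992          1.4         718.8       0.7X
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+JSON per-line parsing:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+----------------------------------------------------------------------------------------------
+No encoding                                12107 / 12246          8.3         121.1       1.0X
+UTF-8 is set                               12375 / 12475          8.1         123.8       1.0X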
--- End diff --

Thank you for the confirmation!





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229243855
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
--- End diff --

Ah, I see. This is also because of the count optimization. The ratio looks weird,
but it is actually a performance improvement for both cases, so it shouldn't be a big deal.





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229214337
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
--- End diff --

Let me take a quick look within a few days. This is the basic per-line case,
which affects many users.





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229213742
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
--- End diff --

IIRC, this benchmark was added so that we can make sure setting the encoding
does not affect performance in the no-encoding case (right, @MaxGekk?). We should
fix this. @cloud-fan





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229213583
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
--- End diff --

I also ran this benchmark and got the same ratio, so it's a little weird.
- https://github.com/heary-cao/spark/pull/3/files#diff-7676fb48b895486092bea2fb491e6de4R18





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229212923
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
--- End diff --

Wait... this is almost 50% slower. This should have been around 8000ish.
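
As a rough check of the figures under discussion, the sketch below recomputes the `Rate(M/s)` and `Per Row(ns)` columns from the best time of the `No encoding` row of `JSON per-line parsing` (12107 ms). It assumes the 100 * 1000 * 1000 rows passed to `perlineParsing` in the removed `main` method, and it uses a round 8000 ms as a stand-in for the "8000ish" figure, which is an assumption about the unit. The object and method names are illustrative only, and the formulas are inferred from the quoted table rather than taken from Spark's `Benchmark` class.

```scala
// Illustrative helper; the names are hypothetical and not part of Spark.
object BenchmarkColumns {
  /** Nanoseconds spent per row, given the best wall-clock time in milliseconds. */
  def perRowNs(bestMs: Double, rows: Long): Double = bestMs * 1e6 / rows

  /** Throughput in millions of rows per second. */
  def rateMillionsPerSec(bestMs: Double, rows: Long): Double =
    rows / (bestMs / 1000.0) / 1e6

  /** How many times slower `actualMs` is than `expectedMs`. */
  def slowdown(actualMs: Double, expectedMs: Double): Double = actualMs / expectedMs

  def main(args: Array[String]): Unit = {
    val rows = 100L * 1000 * 1000 // row count assumed from the removed main()
    val bestMs = 12107.0          // "No encoding" best time from the quoted table
    val expectedMs = 8000.0       // assumption: round stand-in for "8000ish"

    println(f"Per Row(ns): ${perRowNs(bestMs, rows)}%.1f")
    println(f"Rate(M/s):   ${rateMillionsPerSec(bestMs, rows)}%.1f")
    println(f"Slowdown:    ${slowdown(bestMs, expectedMs)}%.2fx")
  }
}
```

Under these assumptions the first two values round to 121.1 and 8.3, matching the quoted row, and the slowdown comes out at about 1.51x, which is in line with "almost 50% slower".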





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229211569
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
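@@ -0,0 +1,33 @@
+==============================================================================================
+Benchmark for performance of JSON parsing
+==============================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel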
--- End diff --

@heary-cao, could you review and merge https://github.com/heary-cao/spark/pull/3 ?





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229210176
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
--- End diff --

Hi, @HyukjinKwon. According to the ratio, this seems to be a regression in the
`No encoding` case. What do you think about this change?
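
The ratio referred to here is the `Relative` column. Below is a minimal sketch of how that column appears to be derived, assuming each row's value is the first row's best time divided by that row's best time. This is an inference from the quoted numbers, not a statement about Spark's `Benchmark` implementation, and `BenchCase` and `relative` are hypothetical names.

```scala
// Hypothetical model of one benchmark row; not a Spark class.
final case class BenchCase(name: String, bestMs: Double)

object RelativeColumn {
  /** Relative speed of each case against the first (baseline) case: baselineBest / caseBest. */
  def relative(cases: Seq[BenchCase]): Seq[(String, Double)] = {
    require(cases.nonEmpty, "need at least a baseline case")
    val baseline = cases.head.bestMs
    cases.map(c => c.name -> baseline / c.bestMs)
  }

  def main(args: Array[String]): Unit = {
    // Best times (ms) copied from the quoted Windows results.
    val schemaInferring =
      Seq(BenchCase("No encoding", 48088.0), BenchCase("UTF-8 is set", 71881.0))
    val perLineParsing =
      Seq(BenchCase("No encoding", 12107.0), BenchCase("UTF-8 is set", 12375.0))

    for ((name, ratio) <- relative(schemaInferring)) println(f"$name%-14s $ratio%.1fX")
    for ((name, ratio) <- relative(perLineParsing)) println(f"$name%-14s $ratio%.1fX")
  }
}
```

Because the column is a ratio against the baseline row, a change in it alone does not show which row moved. The absolute best times from the two runs have to be compared as well before calling it a regression.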





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229205020
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
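@@ -0,0 +1,33 @@
+==============================================================================================
+Benchmark for performance of JSON parsing
+==============================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel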
--- End diff --

I'll make a PR to you.





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229202285
  
--- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
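@@ -0,0 +1,33 @@
+==============================================================================================
+Benchmark for performance of JSON parsing
+==============================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel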
--- End diff --

For the same reason
(https://github.com/apache/spark/pull/22845#discussion_r229199434), it's
difficult to figure out the CPU.





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229019783
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala ---
--- End diff --

+1 for @yucai's comment.





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-29 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r229010476
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala ---
@@ -195,23 +170,16 @@ object JSONBenchmarks extends SQLHelper {
         ds.count()
       }
 
-      /*
-      Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
-
-      Count a dataset with 10 columns:       Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-      ----------------------------------------------------------------------------------------------
-      Select 10 columns + count()                 9961 / 10006          1.0         996.1       1.0X
-      Select 1 column + count()                    8355 / 8470          1.2         835.5       1.2X
-      count()                                      2104 / 2156          4.8         210.4       4.7X
-      */
       benchmark.run()
     }
   }
 
-  def main(args: Array[String]): Unit = {
-    schemaInferring(100 * 1000 * 1000)
-    perlineParsing(100 * 1000 * 1000)
-    perlineParsingOfWideColumn(10 * 1000 * 1000)
-    countBenchmark(10 * 1000 * 1000)
+  override def runBenchmarkSuite(): Unit = {
--- End diff --

#22872 has updated `runBenchmarkSuite`'s signature.
```suggestion
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
```
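
To illustrate why the signature matters, here is a self-contained sketch. `BenchmarkBaseSketch` is a simplified, hypothetical stand-in for the benchmark base class that `JSONBenchmarks` extends after this refactoring; it only assumes, per the comment above, that `runBenchmarkSuite` now takes the main arguments. The recorded benchmark names and row counts mirror the calls in the removed `main` method and do no real work.

```scala
// Simplified, hypothetical stand-in for the benchmark base class; not Spark code.
abstract class BenchmarkBaseSketch {
  /** Subclasses register and run their benchmark cases here; receives the program arguments. */
  def runBenchmarkSuite(mainArgs: Array[String]): Unit

  /** Entry point: forwards the command-line arguments to the suite. */
  def main(args: Array[String]): Unit = runBenchmarkSuite(args)
}

object JsonBenchmarkSketch extends BenchmarkBaseSketch {
  // Records which (placeholder) benchmarks ran and with how many rows.
  val executed = scala.collection.mutable.ArrayBuffer.empty[(String, Int)]

  private def record(name: String, rows: Int): Unit = executed += (name -> rows)

  // The parameter list must match the abstract method above.
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    // Row counts mirror the calls in the removed main() method.
    record("schemaInferring", 100 * 1000 * 1000)
    record("perlineParsing", 100 * 1000 * 1000)
    record("perlineParsingOfWideColumn", 10 * 1000 * 1000)
    record("countBenchmark", 10 * 1000 * 1000)
  }
}
```

With a base class shaped like this, `override def runBenchmarkSuite(): Unit` would not compile, because it overrides nothing and leaves the one-argument abstract method unimplemented.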





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-29 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r228841276
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala ---
@@ -16,32 +16,33 @@
  */
 package org.apache.spark.sql.execution.datasources.json
 
-import java.io.File
-
-import org.apache.spark.SparkConf
 import org.apache.spark.benchmark.Benchmark
-import org.apache.spark.sql.{Row, SparkSession}
-import org.apache.spark.sql.catalyst.plans.SQLHelper
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
 import org.apache.spark.sql.functions.lit
 import org.apache.spark.sql.types._
 
 /**
  * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't.
- * To run this:
- *  spark-submit --class  --jars 
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  org.apache.spark.sql.execution.datasources.json.JSONBenchmarks
--- End diff --

Sorry @heary-cao, I meant to update this to:
```
bin/spark-submit --class  --jars , 
```
and update the [PR description](https://github.com/apache/spark/pull/22844#issue-225971331) to:
```
bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks \
  --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar \
  ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
```





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-29 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r228832524
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala ---
@@ -16,32 +16,30 @@
  */
 package org.apache.spark.sql.execution.datasources.json
 
-import java.io.File
-
-import org.apache.spark.SparkConf
 import org.apache.spark.benchmark.Benchmark
-import org.apache.spark.sql.{Row, SparkSession}
-import org.apache.spark.sql.catalyst.plans.SQLHelper
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark
 import org.apache.spark.sql.functions.lit
 import org.apache.spark.sql.types._
 
 /**
  * The benchmarks aims to measure performance of JSON parsing when encoding is set and isn't.
- * To run this:
- *  spark-submit --class  --jars 
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *  bin/spark-submit --class  --jars  
--- End diff --

Also update the usage in the PR description:
```console
bin/spark-submit --class org.apache.spark.sql.execution.datasources.csv.CSVBenchmarks \
  --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar,./sql/catalyst/target/spark-catalyst_2.11-3.0.0-SNAPSHOT-tests.jar \
  ./sql/core/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar
```





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-29 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22844#discussion_r228832419
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmarks.scala ---
--- End diff --

Please update the `without sbt` usage to:
```
bin/spark-submit --class  --jars , 
```





[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-10-25 Thread heary-cao
GitHub user heary-cao opened a pull request:

https://github.com/apache/spark/pull/22844

[SPARK-25847][SQL][TEST] Refactor JSONBenchmarks to use main method

## What changes were proposed in this pull request?

Refactor JSONBenchmarks to use the main method.

Run with spark-submit:
bin/spark-submit --class org.apache.spark.sql.execution.datasources.json.JSONBenchmarks --jars ./core/target/spark-core_2.11-3.0.0-SNAPSHOT-tests.jar ./sql/catalyst/target/spark-sql_2.11-3.0.0-SNAPSHOT-tests.jar

Generate the benchmark result:
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.datasources.json.JSONBenchmarks"
  

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/heary-cao/spark JSONBenchmarks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22844


commit 937111f7f53744c8fe1a6b4fd0559643743eefae
Author: caoxuewen 
Date:   2018-10-26T03:52:31Z

Refactor JSONBenchmarks to use main method



