Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22844#discussion_r229243855

    --- Diff: sql/core/benchmarks/JSONBenchmarks-results.txt ---
    @@ -0,0 +1,33 @@
    +================================================================================================
    +Benchmark for performance of JSON parsing
    +================================================================================================
    +
    +OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
    +Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
    +JSON schema inferring:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +------------------------------------------------------------------------------------------------
    +No encoding                                 48088 / 48180          2.1         480.9       1.0X
    +UTF-8 is set                                71881 / 71992          1.4         718.8       0.7X
    +
    +OpenJDK 64-Bit Server VM 1.8.0_163-b01 on Windows 7 6.1
    +Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
    +JSON per-line parsing:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    +------------------------------------------------------------------------------------------------
    +No encoding                                 12107 / 12246          8.3         121.1       1.0X
    +UTF-8 is set                                12375 / 12475          8.1         123.8       1.0X
    --- End diff --

    Ah, I see. This is also because of the count optimization. The ratio looks weird, but it's actually a performance improvement in both cases, so it shouldn't be a big deal.
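
To make the "ratio" remark concrete, here is a minimal sketch that recomputes the Relative column from the best times in the quoted results. It assumes Relative is the first row's best time divided by each row's best time, which matches the numbers shown; the names `best_ms` and `relative` are illustrative and not part of Spark.

```python
# Best times (ms) copied from the benchmark results quoted in the diff.
best_ms = {
    "JSON schema inferring": {"No encoding": 48088, "UTF-8 is set": 71881},
    "JSON per-line parsing": {"No encoding": 12107, "UTF-8 is set": 12375},
}


def relative(case_times):
    """Assumed formula: baseline (first row) best time / this row's best time."""
    baseline = next(iter(case_times.values()))
    return {name: baseline / t for name, t in case_times.items()}


ratios = {bench: relative(cases) for bench, cases in best_ms.items()}

for bench, cases in ratios.items():
    for name, r in cases.items():
        print(f"{bench:24s} {name:14s} {r:.1f}X")
```

Under that assumption, the ratio only compares the two rows within one run. A lower Relative value for "UTF-8 is set" therefore does not by itself indicate a regression against earlier results, since both rows may have become faster by different amounts.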