dongjoon-hyun commented on code in PR #42667:
URL: https://github.com/apache/spark/pull/42667#discussion_r1315206808
##########
sql/core/benchmarks/JsonBenchmark-results.txt:
##########
@@ -3,121 +3,125 @@ Benchmark for performance of JSON parsing
 ================================================================================================

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        3720           3843         121          1.3         743.9       1.0X
-UTF-8 is set                                       5412           5455          45          0.9        1082.4       0.7X
+No encoding                                        2084           2134          46          2.4         416.8       1.0X
+UTF-8 is set                                       3077           3093          14          1.6         615.3       0.7X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        3234           3254          33          1.5         646.7       1.0X
-UTF-8 is set                                       4847           4868          21          1.0         969.5       0.7X
+No encoding                                        2854           2863           8          1.8         570.8       1.0X
+UTF-8 is set                                       4066           4066           1          1.2         813.1       0.7X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                        5702           5794         101          0.2        5702.1       1.0X
-UTF-8 is set                                       9526           9607          73          0.1        9526.1       0.6X
+No encoding                                        3348           3368          26          0.3        3347.8       1.0X
+UTF-8 is set                                       5215           5239          22          0.2        5214.7       0.6X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       18318          18448         199          0.0      366367.7       1.0X
-UTF-8 is set                                      19791          19887          99          0.0      395817.1       0.9X
+No encoding                                       11046          11102          54          0.0      220928.4       1.0X
+UTF-8 is set                                      12135          12181          54          0.0      242697.4       0.9X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Select 10 columns                                  2531           2570          51          0.4        2531.3       1.0X
-Select 1 column                                    1867           1882          16          0.5        1867.0       1.4X
+Select 10 columns                                  2486           2488           2          0.4        2486.5       1.0X
+Select 1 column                                    1505           1506           2          0.7        1504.6       1.7X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Short column without encoding                       868            875           7          1.2         868.4       1.0X
-Short column with UTF-8                            1151           1163          11          0.9        1150.9       0.8X
-Wide column without encoding                      12063          12299         205          0.1       12063.0       0.1X
-Wide column with UTF-8                            16095          16136          51          0.1       16095.3       0.1X
+Short column without encoding                       888            889           3          1.1         887.6       1.0X
+Short column with UTF-8                            1134           1136           2          0.9        1134.3       0.8X
+Wide column without encoding                       8012           8056          51          0.1        8012.4       0.1X
+Wide column with UTF-8                             9830           9844          22          0.1        9829.7       0.1X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           165            170           4          6.1         164.7       1.0X
-from_json                                          2339           2386          77          0.4        2338.9       0.1X
-json_tuple                                         2667           2730          55          0.4        2667.3       0.1X
-get_json_object                                    2627           2659          32          0.4        2627.1       0.1X
+Text read                                            85             87           2         11.7          85.4       1.0X
+from_json                                          1706           1711           4          0.6        1706.4       0.1X
+json_tuple                                         1528           1534           7          0.7        1528.2       0.1X
+get_json_object                                    1275           1286          17          0.8        1275.0       0.1X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           700            715          20          7.1         140.1       1.0X
-schema inferring                                   3144           3166          20          1.6         628.7       0.2X
-parsing                                            3261           3271           9          1.5         652.1       0.2X
+Text read                                           369            370           1         13.6          73.8       1.0X
+schema inferring                                   1880           1883           4          2.7         376.0       0.2X
+parsing                                            3731           3737           8          1.3         746.1       0.1X

 Preparing data for benchmarking ...
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1037-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10 on Linux 5.4.0-1045-aws
+Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
 Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                          1096           1105          12          4.6         219.1       1.0X
-Schema inferring                                   3818           3830          16          1.3         763.6       0.3X
-Parsing without charset                            4107           4137          32          1.2         821.4       0.3X
-Parsing with UTF-8                                 5717           5763          41          0.9        1143.3       0.2X
+Text read                                           553            579          32          9.0         110.6       1.0X
+Schema inferring                                   2195           2196           2          2.3         439.0       0.3X
+Parsing without charset                            4272           4274           3          1.2         854.3       0.1X

Review Comment:
Given the ratio between `Text read` and `Parsing without charset`, is there a chance of a regression in this PR?
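For reference, the ratio in question can be recomputed from the `Best Time(ms)` values in the `Json files in the per-line mode` hunk. This is only a sketch: the numbers are copied from the diff, `relative` is a hypothetical helper name, and the two runs were taken on different machines (Azure Xeon E5-2673 v3 vs. AWS Xeon Platinum 8375C), so the comparison hints at a possible regression rather than proving one.

```python
# Best Time (ms) copied from the "Json files in the per-line mode" table.
old = {"Text read": 1096, "Parsing without charset": 4107}
new = {"Text read": 553, "Parsing without charset": 4272}


def relative(results, baseline="Text read", case="Parsing without charset"):
    """Mirror the 'Relative' column: baseline time divided by case time."""
    return results[baseline] / results[case]


old_rel = relative(old)
new_rel = relative(new)

# Speedup of each row between the two runs (old time / new time).
text_read_speedup = old["Text read"] / new["Text read"]
parsing_speedup = old["Parsing without charset"] / new["Parsing without charset"]

print(f"Relative (old): {old_rel:.2f}X, Relative (new): {new_rel:.2f}X")
print(f"Text read speedup: {text_read_speedup:.2f}x")
print(f"Parsing without charset speedup: {parsing_speedup:.2f}x")
```

`Text read` roughly halved (1096 ms to 553 ms) while `Parsing without charset` did not improve (4107 ms to 4272 ms), which is why the `Relative` column drops from 0.3X to 0.1X.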