[ https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700241#comment-17700241 ]
Apache Spark commented on SPARK-42789:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40419

> rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-42789
>                 URL: https://issues.apache.org/jira/browse/SPARK-42789
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> Benchmark result:
> {noformat}
> Running benchmark: Benchmark rewrite GetJsonObjects
> Running case: Default: 2
> Stopped after 2 iterations, 80787 ms
> Running case: Rewrite: 2
> Stopped after 2 iterations, 48900 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 2                                        39026          40394        1935          0.2        5397.8       1.0X
> Rewrite: 2                                        24354          24450         137          0.3        3368.4       1.6X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
> Running case: Default: 3
> Stopped after 2 iterations, 115055 ms
> Running case: Rewrite: 3
> Stopped after 2 iterations, 62297 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 3                                        54652          57528         NaN          0.1        7559.1       1.0X
> Rewrite: 3                                        30702          31149         631          0.2        4246.6       1.8X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
> Running case: Default: 4
> Stopped after 2 iterations, 155392 ms
> Running case: Rewrite: 4
> Stopped after 2 iterations, 54776 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 4                                        75503          77696         NaN          0.1       10443.1       1.0X
> Rewrite: 4                                        26962          27388         602          0.3        3729.3       2.8X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
> Running case: Default: 5
> Stopped after 2 iterations, 192836 ms
> Running case: Rewrite: 5
> Stopped after 2 iterations, 51967 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 5                                        94923          96418        2115          0.1       13129.1       1.0X
> Rewrite: 5                                        25362          25984         880          0.3        3507.8       3.7X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
> Running case: Default: 10
> Stopped after 2 iterations, 317246 ms
> Running case: Rewrite: 10
> Stopped after 2 iterations, 56734 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 10                                      157458         158623        1648          0.0       21778.6       1.0X
> Rewrite: 10                                       28296          28367         100          0.3        3913.8       5.6X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
> Running case: Default: 20
> Stopped after 2 iterations, 618089 ms
> Running case: Rewrite: 20
> Stopped after 2 iterations, 63576 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 20                                      285338         309045         NaN          0.0       39466.2       1.0X
> Rewrite: 20                                       31682          31788         151          0.2        4382.0       9.0X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
> Running case: Default: 30
> 07:25:58.851 WARN org.apache.spark.sql.catalyst.util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
> Stopped after 2 iterations, 1113910 ms
> Running case: Rewrite: 30
> Stopped after 2 iterations, 101468 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 30                                      481691         556955        1722          0.0       66624.5       1.0X
> Rewrite: 30                                       50497          50734         335          0.1        6984.5       9.5X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
> Running case: Default: 36
> Stopped after 2 iterations, 1272619 ms
> Running case: Rewrite: 36
> Stopped after 2 iterations, 81609 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 36                                      576500         636310         NaN          0.0       79737.8       1.0X
> Rewrite: 36                                       40461          40805         486          0.2        5596.4      14.2X
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
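The cost model behind the issue title can be sketched in plain Python. This is not Spark code: `CountingParser`, `get_json_object`, `json_tuple`, `run_default`, `run_rewrite` and `relative` are hypothetical stand-ins, and the sketch assumes the rewrite concerns only simple top-level paths of the form `$.name`, since those are the ones a `json_tuple` field list can express. N separate `get_json_object` calls on the same JSON string parse it N times per row, while one `json_tuple` call parses it once and returns the same values. `relative` recomputes the benchmark's Relative column as Default best time divided by Rewrite best time, using the numbers reported in the benchmark.

```python
import json


class CountingParser:
    """Wraps json.loads and counts how many times a JSON string is parsed."""

    def __init__(self):
        self.parses = 0

    def parse(self, doc):
        self.parses += 1
        return json.loads(doc)


def _as_text(value):
    # Missing fields become None; strings come back unquoted, other JSON
    # values are re-serialized (json.dumps spacing differs from Spark's output).
    if value is None:
        return None
    return value if isinstance(value, str) else json.dumps(value)


def get_json_object(parser, doc, path):
    """Hypothetical stand-in: parses the whole document on every call.

    Only simple top-level paths of the form '$.name' are handled (assumption).
    """
    if not path.startswith("$."):
        raise ValueError("only '$.name' paths are supported in this sketch")
    return _as_text(parser.parse(doc).get(path[2:]))


def json_tuple(parser, doc, *fields):
    """Hypothetical stand-in: parses once, then looks up every field."""
    obj = parser.parse(doc)
    return tuple(_as_text(obj.get(f)) for f in fields)


def run_default(rows, fields):
    # N separate get_json_object calls per row -> N parses per row.
    parser = CountingParser()
    out = [tuple(get_json_object(parser, r, "$." + f) for f in fields) for r in rows]
    return out, parser.parses


def run_rewrite(rows, fields):
    # One json_tuple call per row -> one parse per row.
    parser = CountingParser()
    out = [json_tuple(parser, r, *fields) for r in rows]
    return out, parser.parses


# Best Time(ms) pairs (Default, Rewrite) copied from the benchmark output, keyed
# by N from the case names (presumably the number of GetJsonObject expressions).
BEST_MS = {
    2: (39026, 24354), 3: (54652, 30702), 4: (75503, 26962), 5: (94923, 25362),
    10: (157458, 28296), 20: (285338, 31682), 30: (481691, 50497), 36: (576500, 40461),
}


def relative(n):
    # Relative column: Default best time / Rewrite best time, to one decimal.
    default_ms, rewrite_ms = BEST_MS[n]
    return round(default_ms / rewrite_ms, 1)


if __name__ == "__main__":
    rows = ['{"a": 1, "b": "x", "c": [1, 2]}', '{"a": 2, "b": "y"}']
    fields = ["a", "b", "c"]
    default_out, default_parses = run_default(rows, fields)
    rewrite_out, rewrite_parses = run_rewrite(rows, fields)
    print(default_out == rewrite_out, default_parses, rewrite_parses)  # True 6 2
    print({n: relative(n) for n in BEST_MS})
```

For the two sample rows both strategies return identical values, but the Default path performs 6 parses against 2 for the Rewrite path, and `relative` reproduces the ratios from 1.6 to 14.2 in the Relative column. Under this model the Default cost grows with the number of extracted fields while the Rewrite cost stays at about one parse per row, which is consistent with, though not proof of, the Per Row(ns) figures: roughly 5.4 to 79.7 µs per row for Default versus 3.4 to 7.0 µs for Rewrite.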