bersprockets edited a comment on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables
URL: https://github.com/apache/spark/pull/23336#issuecomment-458008758

@HyukjinKwon The last time CSVBenchmark-results.txt or JSONBenchmark-results.txt was committed to master was roughly 200 commits before the partial results change. Also, this PR contains some changes to CSVBenchmark.scala and JsonBenchmark.scala. Until this PR, JsonBenchmark.scala had no test for wide rows (which is where the issue really shows up).

Thought 1: I am thinking of creating a separate PR with only the CSVBenchmark.scala and JsonBenchmark.scala changes, plus the two results files. This would be the baseline, and it would include wide-row tests. After that PR is merged, I would run the benchmarks again, but against this PR, and include the new results files as part of this PR.

Or, thought 2: I could just update CSVBenchmark.scala and JsonBenchmark.scala in a local working copy of the baseline, run the benchmarks for both my local baseline and the PR, and simply verify that the PR fixes the issue. Then I would commit the two results files to this PR.

Lastly, thought 3: Maybe we don't need the changes to CSVBenchmark.scala and JsonBenchmark.scala. I would just commit the results files without the new cases.