bersprockets edited a comment on issue #23336: [SPARK-26378][SQL] Restore 
performance of queries against wide CSV/JSON tables
URL: https://github.com/apache/spark/pull/23336#issuecomment-458008758
 
 
   @HyukjinKwon 
   
   Neither CSVBenchmark-results.txt nor JSONBenchmark-results.txt has been 
committed to master since roughly 200 commits before the partial results change.
   
   Also, this PR contains some changes to CSVBenchmark.scala and 
JsonBenchmark.scala. Until this PR, JsonBenchmark.scala had no test for wide 
rows (which is where the issue really shows up).
   
   Thought 1:
   
   I am thinking of creating a separate PR with only the CSVBenchmark.scala and 
JsonBenchmark.scala changes, plus the two results files. This would be the 
baseline, and it would include wide-row tests.
   
   After that PR is merged, I would run the benchmarks again, but against this 
PR. I would include the new results files as part of this PR.
   
   Or, thought 2:
   
   I could just update CSVBenchmark.scala and JsonBenchmark.scala in a local 
working copy of the baseline, run benchmarks for both my local baseline and the 
PR, and simply verify that the PR fixes the issue. Then, I would commit the two 
results files to this PR.
   
   Lastly, thought 3:
   
   Maybe we don't need the changes to CSVBenchmark.scala and 
JsonBenchmark.scala. I would just commit the results files without the new 
cases.
   
   
