[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables URL: https://github.com/apache/spark/pull/23336#issuecomment-459021719 @HyukjinKwon Thanks for your help! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables URL: https://github.com/apache/spark/pull/23336#issuecomment-458568416 retest this please
[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables URL: https://github.com/apache/spark/pull/23336#issuecomment-458404648 @HyukjinKwon Please note that I additionally changed JsonBenchmark.scala to run inference benchmarks with inferTimestamp=false.
[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables URL: https://github.com/apache/spark/pull/23336#issuecomment-458008758

@HyukjinKwon The last time CSVBenchmark-results.txt or JSONBenchmark-results.txt was committed to master was roughly 200 commits before the partial results change. Also, this PR contains some changes to CSVBenchmark.scala and JsonBenchmark.scala. Until this PR, JsonBenchmark.scala had no test for wide rows (which is where the issue really shows up).

Thought #1: I am thinking of creating a separate PR with only the CSVBenchmark.scala and JsonBenchmark.scala changes, plus the two results files. This would be the baseline, and it would include wide-row tests. After that PR is merged, I would run the benchmarks again, but against this PR. I would include the new results files as part of this PR.

Or, thought #2: I could just update CSVBenchmark.scala and JsonBenchmark.scala in a local working copy of the baseline, run benchmarks for both my local baseline and the PR, and simply verify that the PR fixes the issue. Then, I would commit the two results files to this PR.
[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables URL: https://github.com/apache/spark/pull/23336#issuecomment-457205039 @MaxGekk @HyukjinKwon Yes, I should rerun the benchmarks since Spark itself has changed (there have been nearly 200 commits to Spark since I last ran the benchmarks). However, currently running the JSON benchmark twice would take at least 4-5 hours (due to SPARK-26711). So I am waiting until that is fixed.
[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables URL: https://github.com/apache/spark/pull/23336#issuecomment-457061380 retest this please