[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables

2019-01-30 Thread GitBox
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance 
of queries against wide CSV/JSON tables
URL: https://github.com/apache/spark/pull/23336#issuecomment-459021719
 
 
   @HyukjinKwon Thanks for your help!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables

2019-01-29 Thread GitBox
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance 
of queries against wide CSV/JSON tables
URL: https://github.com/apache/spark/pull/23336#issuecomment-458568416
 
 
   retest this please





[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables

2019-01-28 Thread GitBox
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance 
of queries against wide CSV/JSON tables
URL: https://github.com/apache/spark/pull/23336#issuecomment-458404648
 
 
   @HyukjinKwon Please note that I additionally changed JsonBenchmark.scala to 
run inference benchmarks with inferTimestamp=false.





[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables

2019-01-27 Thread GitBox
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance 
of queries against wide CSV/JSON tables
URL: https://github.com/apache/spark/pull/23336#issuecomment-458008758
 
 
   @HyukjinKwon 
   
   The last time CSVBenchmark-results.txt or JSONBenchmark-results.txt was 
committed to master was roughly 200 commits before the partial results change.
   
   Also, this PR contains some changes to CSVBenchmark.scala and 
JsonBenchmark.scala. Until this PR, JsonBenchmark.scala had no test for wide 
rows (which is where the issue really shows up).
   
   Thought #1:
   
   I am thinking of creating a separate PR with only the CSVBenchmark.scala and 
JsonBenchmark.scala changes, plus the two results files. This would be the 
baseline, and it would include wide-row tests.
   
   After that PR is merged, I would run the benchmarks again, but against this 
PR. I would include the new results files as part of this PR.
   
   Or, thought #2:
   
   I could just update CSVBenchmark.scala and JsonBenchmark.scala in a local 
working copy of the baseline, run benchmarks for both my local baseline and the 
PR, and simply verify that the PR fixes the issue. Then, I would commit the two 
results files to this PR.
   
   





[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables

2019-01-24 Thread GitBox
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance 
of queries against wide CSV/JSON tables
URL: https://github.com/apache/spark/pull/23336#issuecomment-457205039
 
 
   @MaxGekk @HyukjinKwon Yes, I should rerun the benchmarks since Spark itself 
has changed (there have been nearly 200 commits to Spark since I last ran the 
benchmarks). However, running the JSON benchmark twice would currently take at 
least 4-5 hours (due to SPARK-26711), so I am waiting until that is fixed.





[GitHub] bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance of queries against wide CSV/JSON tables

2019-01-23 Thread GitBox
bersprockets commented on issue #23336: [SPARK-26378][SQL] Restore performance 
of queries against wide CSV/JSON tables
URL: https://github.com/apache/spark/pull/23336#issuecomment-457061380
 
 
   retest this please

