GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21969
[SPARK-24945][SQL] Switching to uniVocity 2.7.3

## What changes were proposed in this pull request?

In this PR, I propose to upgrade the uniVocity parser from **2.6.3** to **2.7.3**. The recent version includes a fix for the SPARK-24645 issue and improves performance. On the wide-row benchmarks the best times drop by about 2.5% to 11%, while parsing of quoted values is essentially unchanged.

Before changes:

```
Parsing quoted values:                  Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)  Relative
------------------------------------------------------------------------------------------------
One quoted string                           33336 / 34122         0.0      666727.0      1.0X

Wide rows with 1000 columns:            Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)  Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                         90287 / 91713         0.0       90286.9      1.0X
Select 100 columns                          31826 / 36589         0.0       31826.4      2.8X
Select one column                           25738 / 25872         0.0       25737.9      3.5X
count()                                       6931 / 7269         0.1        6931.5     13.0X
```

After changes:

```
Parsing quoted values:                  Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)  Relative
------------------------------------------------------------------------------------------------
One quoted string                           33411 / 33510         0.0      668211.4      1.0X

Wide rows with 1000 columns:            Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)  Relative
------------------------------------------------------------------------------------------------
Select 1000 columns                         88028 / 89311         0.0       88028.1      1.0X
Select 100 columns                          29010 / 32755         0.0       29010.1      3.0X
Select one column                           22936 / 22953         0.0       22936.5      3.8X
count()                                       6657 / 6740         0.2        6656.6     13.5X
```

Closes #21892

## How was this patch tested?
It was tested by `CSVSuite` and `CSVBenchmarks`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 univocity-2_7_3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21969.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21969

----
commit 7b569ae1318316129d4b0d46969b02324b18b0aa
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T11:59:39Z

    Bumping version of uniVocity parser up to 2.7.2

commit b116987d9a0adb887201177d41c1b94e6f5aeb63
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-27T13:25:11Z

    Call uniVocity even the set of selected columns is empty

commit 3fb9cf76df65abe14dd39d233d18242e72e0a729
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-08-02T09:14:27Z

    Bumping version to 2.7.3

commit a053994bcc6027668f64c9e55d09dfaa45cb97cf
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-08-02T09:14:48Z

    Revert "Call uniVocity even the set of selected columns is empty"
    
    This reverts commit b116987d9a0adb887201177d41c1b94e6f5aeb63.

----
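One way to check the performance claim against the before/after benchmark tables in the PR description is to compare the best times directly. The Python sketch below copies the best times (ms) from those tables. The `before`/`after` dictionaries and the `speedup` helper are illustrative names only, not part of Spark or `CSVBenchmarks`.

```python
# Best times in milliseconds, copied from the "Before changes" and
# "After changes" benchmark tables (uniVocity 2.6.3 vs 2.7.3).
before = {
    "One quoted string": 33336,
    "Select 1000 columns": 90287,
    "Select 100 columns": 31826,
    "Select one column": 25738,
    "count()": 6931,
}
after = {
    "One quoted string": 33411,
    "Select 1000 columns": 88028,
    "Select 100 columns": 29010,
    "Select one column": 22936,
    "count()": 6657,
}


def speedup(old_ms: float, new_ms: float) -> float:
    """Ratio of old to new time; values above 1.0 mean the new version is faster."""
    return old_ms / new_ms


for case, old_ms in before.items():
    new_ms = after[case]
    # Relative change in best time; negative means the new version took less time.
    change_pct = (new_ms - old_ms) / old_ms * 100
    print(f"{case:<22} {speedup(old_ms, new_ms):.3f}x  ({change_pct:+.1f}% time)")
```

For example, the best time for "Select one column" falls from 25738 ms to 22936 ms, about 10.9% less. Best times from a single benchmark run are noisy. A difference of a fraction of a percent, as for "One quoted string", is likely within run-to-run variation.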