GitHub user bersprockets opened a pull request:

    https://github.com/apache/spark/pull/21043

    [SPARK-23963] [SQL] Properly handle large number of columns in query on text-based Hive table

    ## What changes were proposed in this pull request?
    
    TableReader would get disproportionately slower as the number of columns in the query increased: the cost of materializing each row grew roughly quadratically with the column count.
    
    I fixed the way TableReader looks up metadata for each column in the row. Previously, it kept this data in linked lists and accessed each list by index (column number); indexed access on a linked list is O(i) for position i, so materializing a single row of n columns cost O(n^2). Now it keeps this data in arrays, where indexing by column number is O(1) and a row costs O(n).
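    
    For illustration, here is a minimal sketch of the access-pattern change (a simplification, not Spark's actual TableReader code; `ColumnLookupSketch`, `FieldRef`, and both method names are hypothetical):
    
        // Hypothetical sketch of the before/after access pattern, not Spark internals.
        object ColumnLookupSketch {
          final case class FieldRef(ordinal: Int)
    
          // Before: metadata in a linked List. fieldRefs(i) walks i nodes,
          // so filling one row of n columns costs O(n^2) node traversals.
          def fillRowFromList(fieldRefs: List[FieldRef], row: Array[Any], rawValue: Int => Any): Unit = {
            var i = 0
            val n = fieldRefs.length                   // length is itself O(n) on a List
            while (i < n) {
              row(fieldRefs(i).ordinal) = rawValue(i)  // fieldRefs(i) is O(i) on a linked list
              i += 1
            }
          }
    
          // After: the same metadata copied into an Array once; fieldRefs(i)
          // is a constant-time indexed read, so one row costs O(n).
          def fillRowFromArray(fieldRefs: Array[FieldRef], row: Array[Any], rawValue: Int => Any): Unit = {
            var i = 0
            while (i < fieldRefs.length) {             // Array length is a stored field
              row(fieldRefs(i).ordinal) = rawValue(i)  // O(1) array indexing
              i += 1
            }
          }
        }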
    
    ## How was this patch tested?
    
    - All sbt unit tests
    - PySpark SQL tests
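    
    A sketch of how those suites can be invoked from the Spark source root (the exact flags and module names are an assumption; check the repo's developer documentation):
    
        $ ./build/sbt test
        $ python/run-tests --modules pyspark-sql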


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bersprockets/spark tabreadfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21043.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21043
    
----
commit 26715a8110be1d72f18604dfd4ae74a5a56d9878
Author: Bruce Robbins <bersprockets@...>
Date:   2018-04-11T05:05:12Z

    Initial commit for testing

----

