[ https://issues.apache.org/jira/browse/PARQUET-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084075#comment-17084075 ]
Gabor Szadovszky commented on PARQUET-1739:
-------------------------------------------

[~yumwang], have you succeeded in implementing the page skipping mechanism in Spark? Without it you may see only the overhead of the column indexes and not the benefit.

Even with page skipping implemented, there might be a small performance degradation if the data is not sorted at all (that is, the min/max values are very similar across the different pages). In that case reading the column/offset indexes is pure I/O overhead: we cannot drop any pages based on the min/max values, so we read the same amount of data as we would without column indexes.

From the column index point of view, there should not be much difference between the runs if no predicate pushdown (PPD) is used (no filter is set in the Parquet API).

> Make Spark SQL support Column indexes
> -------------------------------------
>
>                 Key: PARQUET-1739
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1739
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 1.11.1
>
>
> Make Spark SQL support Column indexes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)