[ https://issues.apache.org/jira/browse/PARQUET-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070830#comment-17070830 ]
Gabor Szadovszky commented on PARQUET-1830:
-------------------------------------------

[~FelixKJose], agreed. This Jira tracks the long-term effort to provide a vectorized API in parquet-mr, so that our clients don't have to use our internal API to get fast reading while still having our predicate pushdown (PPD) filtering, including column indexes and bloom filters, executed automatically under the hood.

> Vectorized API to support Column Index in Apache Spark
> ------------------------------------------------------
>
>                 Key: PARQUET-1830
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1830
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>    Affects Versions: 1.11.0
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
>
> As per the comment on https://issues.apache.org/jira/browse/SPARK-26345, it
> seems that Apache Spark doesn't support Column Index unless we disable the
> vectorizedReader in Spark, which will have other performance implications.
> As per [~zi], parquet-mr should implement a Vectorized API. Is it already
> implemented, or is there a pull request for it?