[
https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791061#comment-17791061
]
ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------
steveloughran commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1831828739
Code wise, no, other than reviews from others about what is the best place
for things, such as that awaitFuture stuff or any other suggestions which
people who know the parquet codebase think is best. Code works and we have been
testing this through Amazon S3 Express storage for extra speed up. To be
ruthless: there's no point paying the premium for that until you've embraced
the extra speed ups you get from this first
> Implement vectored IO in parquet file format
> --------------------------------------------
>
> Key: PARQUET-2171
> URL: https://issues.apache.org/jira/browse/PARQUET-2171
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Mukund Thakur
> Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop for improving
> read performance for seek heavy readers. Spark Jobs and others which uses
> parquet will greatly benefit from this api. Details can be found hereĀ
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867
--
This message was sent by Atlassian Jira
(v8.20.10#820010)