[
https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765395#comment-17765395
]
ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------
danielcweeks commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1720309506
@steveloughran This looks really great! I think my only comment would be
about whether we can expose the implementation in a way that might be more
pluggable. In Iceberg we have a similar parallel to the InputFile and
SeekableStream, but it's not apparent to me that we would be able to adapt our
IO implementation to leverage vectored reads.
Open to thoughts on how we might do that as well.
> Implement vectored IO in parquet file format
> --------------------------------------------
>
> Key: PARQUET-2171
> URL: https://issues.apache.org/jira/browse/PARQUET-2171
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Mukund Thakur
> Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop to improve
> read performance for seek-heavy readers. Spark jobs and other workloads that
> use Parquet will benefit greatly from this API. Details can be found here:
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867
--
This message was sent by Atlassian Jira
(v8.20.10#820010)