[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

ASF GitHub Bot (Jira) Thu, 14 Sep 2023 17:11:38 -0700


    [ https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765395#comment-17765395 ]


ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------

danielcweeks commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1720309506

   @steveloughran This looks really great! I think my only comment would be
about whether we can expose the implementation in a way that is more
pluggable. In Iceberg we have parallels to the InputFile and
SeekableStream abstractions, but it's not apparent to me that we would be
able to adapt our IO implementation to leverage vectored reads.
   
   Open to thoughts on how we might do that as well.




> Implement vectored IO in parquet file format
> --------------------------------------------
>
>                 Key: PARQUET-2171
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2171
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Mukund Thakur
>            Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop for improving 
> read performance for seek-heavy readers. Spark jobs and other applications 
> that use Parquet will greatly benefit from this API. Details can be found here:
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867
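For readers new to the feature: a vectored read hands the stream a list of (offset, length) ranges in a single call and gets back one future per range, so the filesystem can fetch or parallelize the reads instead of the caller seeking back and forth through the file (for Parquet, think column chunks spread across a row group). In Hadoop the entry point is `PositionedReadable.readVectored(List<? extends FileRange>, IntFunction<ByteBuffer>)`, with each `FileRange` delivering its bytes through `getData()`. The code below is only a hypothetical, self-contained model of that shape over a local `java.nio` `FileChannel`; it is not Hadoop's or Parquet's implementation, the names `VectoredReadSketch` and `Range` are made up for illustration, and it does no range merging.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

// Hypothetical sketch, NOT Hadoop's API: models the shape of a vectored read.
class VectoredReadSketch {

    /** A requested byte range; its bytes are delivered through a future. */
    static final class Range {
        final long offset;
        final int length;
        private final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

        Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        CompletableFuture<ByteBuffer> getData() {
            return data;
        }
    }

    /**
     * Submits every range read at once and returns immediately; each range's
     * future completes (or fails) when its bytes have been read.
     */
    static void readVectored(FileChannel channel, List<Range> ranges,
                             IntFunction<ByteBuffer> allocate) {
        for (Range r : ranges) {
            // Runs on the common ForkJoinPool; a real implementation would
            // likely use a bounded, dedicated executor.
            CompletableFuture.runAsync(() -> {
                try {
                    ByteBuffer buf = allocate.apply(r.length);
                    long pos = r.offset;
                    // Positional reads do not touch the channel's shared
                    // position, so concurrent range reads don't interfere.
                    while (buf.hasRemaining()) {
                        int n = channel.read(buf, pos);
                        if (n < 0) {
                            throw new IOException("EOF inside range at offset " + r.offset);
                        }
                        pos += n;
                    }
                    buf.flip();
                    r.data.complete(buf);
                } catch (IOException | RuntimeException e) {
                    r.data.completeExceptionally(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("vectored", ".bin");
        byte[] bytes = new byte[1024];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        Files.write(tmp, bytes);
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            List<Range> ranges = List.of(new Range(0, 4), new Range(512, 4), new Range(1000, 8));
            readVectored(ch, ranges, ByteBuffer::allocate);
            // Wait for every range before the channel is closed.
            for (Range r : ranges) {
                ByteBuffer b = r.getData().join();
                System.out.println(r.offset + " -> " + b.remaining() + " bytes");
            }
        }
        Files.delete(tmp);
    }
}
```

The caller blocks only when it calls `join()` on a specific range, which is what lets a reader start decoding whichever chunk arrives first. Filesystem implementations can go further than this sketch, for example by coalescing nearby ranges into one request, which matters most on high-latency object stores.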



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
