[ 
https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770167#comment-17770167
 ] 

ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------

parthchandra commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1739701945

   > @danielcweeks that's a good point about pluggability.
   > I don't know if that would be useful for Iceberg
   > https://github.com/apache/hadoop-api-shim
   
   Iceberg can use the base Parquet file reader out of the box, so it should
   be able to use vectored IO as is.
   
   > Getting Iceberg to pass down which stripes it wants to read is critical
   > for this to work best with S3, ABFS and GCS. How are you reading the
   > files at present?
   
   However, if the S3FileIO feature is enabled, Iceberg provides its own
   InputStream and InputFile implementations that use AWS SDK v2. Perhaps an
   option to provide your own input stream to vectored IO would work.
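
   A minimal sketch of what such an option could look like, under stated
   assumptions: every type below (ByteRange, PositionedInput, VectoredInput,
   InMemoryInput, RangeReader) is hypothetical and is not an existing Parquet,
   Hadoop or Iceberg API. The idea is that the reader accepts any stream,
   uses a vectored read when the stream offers one, and otherwise falls back
   to one positioned read per range.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// All types here are hypothetical and exist only for illustration.

/** A byte range the reader wants to fetch, for example one column chunk. */
final class ByteRange {
    final long offset;
    final int length;

    ByteRange(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

/** Minimal positioned-read contract that any custom stream could satisfy. */
interface PositionedInput {
    void readFully(long position, byte[] buffer) throws IOException;
}

/** Optional capability: a stream that can fetch many ranges in one call. */
interface VectoredInput extends PositionedInput {
    List<CompletableFuture<ByteBuffer>> readVectored(List<ByteRange> ranges);
}

/** Simple in-memory stream without vectored support, used to show the fallback. */
final class InMemoryInput implements PositionedInput {
    private final byte[] data;

    InMemoryInput(byte[] data) {
        this.data = data;
    }

    @Override
    public void readFully(long position, byte[] buffer) throws IOException {
        if (position < 0 || position + buffer.length > data.length) {
            throw new EOFException("range extends past end of data");
        }
        System.arraycopy(data, (int) position, buffer, 0, buffer.length);
    }
}

final class RangeReader {
    /** Returns one future per requested range, in the same order as the input. */
    static List<CompletableFuture<ByteBuffer>> readRanges(
            PositionedInput in, List<ByteRange> ranges) {
        if (in instanceof VectoredInput) {
            // The stream knows how to fetch ranges itself; delegate to it.
            return ((VectoredInput) in).readVectored(ranges);
        }
        // Fallback: one synchronous positioned read per range.
        List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
        for (ByteRange range : ranges) {
            CompletableFuture<ByteBuffer> future = new CompletableFuture<>();
            try {
                byte[] buffer = new byte[range.length];
                in.readFully(range.offset, buffer);
                future.complete(ByteBuffer.wrap(buffer));
            } catch (IOException e) {
                future.completeExceptionally(e);
            }
            results.add(future);
        }
        return results;
    }
}
```

   With a shape like this, a stream backed by an object-store client could
   implement the vectored interface itself, while streams that do not would
   still work through the fallback path.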




> Implement vectored IO in parquet file format
> --------------------------------------------
>
>                 Key: PARQUET-2171
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2171
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Mukund Thakur
>            Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop to improve
> read performance for seek-heavy readers. Spark jobs and other workloads
> that use Parquet will benefit greatly from this API. Details can be found here:
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
