[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

ASF GitHub Bot (Jira) Fri, 17 Nov 2023 12:01:07 -0800


    [ 
https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787375#comment-17787375
 ]


ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------

steveloughran commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1817012867

   OK, I've tried to address the requested changes and have merged with master.
   
   The one thing I've yet to do is the one suggested by @danielcweeks: have an 
interface for which the hadoop vector IO would be just one implementation.
   
   We effectively have that in SeekableInputStream, which gains two new default 
methods: one probes for API availability and the other invokes the vectored read.
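
To make the probe/invoke shape concrete, here is a minimal, self-contained
sketch. Every name in it (SketchRange, SketchSeekableStream,
InMemoryVectoredStream, and both method names and signatures) is an
illustrative assumption and not the actual API in the PR; it only shows how a
base class can ship "not supported" defaults that one implementation overrides.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

/** Hypothetical range descriptor: where to read, how much, and a future for the bytes. */
final class SketchRange {
  final long offset;
  final int length;
  CompletableFuture<ByteBuffer> data; // set by readVectored()

  SketchRange(long offset, int length) {
    this.offset = offset;
    this.length = length;
  }
}

/** Hypothetical stand-in for a seekable stream base class with two default methods. */
abstract class SketchSeekableStream {
  /** Probe: does this stream support vectored reads? The default answer is no. */
  public boolean readVectoredAvailable() {
    return false;
  }

  /** Invocation: the default rejects the call, so callers should probe first. */
  public void readVectored(List<SketchRange> ranges, IntFunction<ByteBuffer> allocate) {
    throw new UnsupportedOperationException("vectored IO is not supported by this stream");
  }
}

/** One possible implementation: serves every range from an in-memory byte array. */
final class InMemoryVectoredStream extends SketchSeekableStream {
  private final byte[] bytes;

  InMemoryVectoredStream(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override
  public boolean readVectoredAvailable() {
    return true;
  }

  @Override
  public void readVectored(List<SketchRange> ranges, IntFunction<ByteBuffer> allocate) {
    for (SketchRange range : ranges) {
      ByteBuffer buffer = allocate.apply(range.length);
      buffer.put(bytes, (int) range.offset, range.length);
      buffer.flip(); // make the copied bytes readable from position 0
      range.data = CompletableFuture.completedFuture(buffer);
    }
  }
}
```

A reader written against the base class would call readVectoredAvailable()
first and fall back to ordinary seek-and-read when it returns false, which is
what lets a non-hadoop stream plug in its own implementation.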
   
   Would you be able to wire up the iceberg reader to that? And if not, what 
changes are needed?
   
   One thing we need to get right is the awaitFuture code; it's a copy of 
what's in hadoop for handling async IO operations. There's also a hard-coded 
timeout of 300s when waiting for the results. I don't know/recall where that 
number came from, but it's potentially dubious as it won't recover from 
network problems.
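
For discussion, here is a rough sketch of a bounded wait on an async read. It
is not the code in the PR or in hadoop: the class FutureAwaitSketch and the
method awaitWithTimeout are hypothetical names, and taking the timeout as a
parameter instead of a fixed 300s is an assumption about one way the concern
could be addressed.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: wait for an async read with a caller-supplied timeout. */
final class FutureAwaitSketch {
  private FutureAwaitSketch() {
  }

  static <T> T awaitWithTimeout(Future<T> future, long timeout, TimeUnit unit)
      throws IOException {
    try {
      return future.get(timeout, unit);
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers further up can still observe it.
      Thread.currentThread().interrupt();
      InterruptedIOException ioe =
          new InterruptedIOException("interrupted while waiting for read");
      ioe.initCause(e);
      throw ioe;
    } catch (TimeoutException e) {
      // Give up on this read; a caller could choose to retry the range.
      future.cancel(true);
      throw new IOException("read did not complete within " + timeout + " " + unit, e);
    } catch (ExecutionException e) {
      // Unwrap so callers see the original failure where possible.
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      if (cause instanceof RuntimeException) {
        throw (RuntimeException) cause;
      }
      throw new IOException(cause);
    }
  }
}
```

With the timeout as an argument, the value could come from configuration, and
a timed-out range surfaces as an IOException that the caller can handle or
retry.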
   




> Implement vectored IO in parquet file format
> --------------------------------------------
>
>                 Key: PARQUET-2171
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2171
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Mukund Thakur
>            Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop to improve 
> read performance for seek-heavy readers. Spark jobs and other applications 
> that use parquet will benefit greatly from this API. Details can be found here:
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
