[jira] [Commented] (SPARK-44116) Utilize Hadoop vectorized APIs

Steve Loughran (Jira) Mon, 31 Jul 2023 09:33:58 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-44116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749266#comment-17749266
 ]


Steve Loughran commented on SPARK-44116:
----------------------------------------

If this gets into the libraries, you don't need explicit support in Spark 
unless you really want to do your own.

What could be good is replacing FileSystem.open() with the openFile() builder, 
passing in your read policy and any file status or file length you already 
have. That saves HEAD requests and tunes GET/prefetching based on the 
expected use.
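
A rough sketch of what that switch could look like, assuming Hadoop 3.3.5 or later on the classpath. The class and method names (OpenFileSketch, open, openWithLength) are made up for illustration and are not Spark code. The option keys are the standard "fs.option.openfile.*" ones, and the "vector, random" policy list is only a guess at what a columnar reader would want. Filesystems that do not understand a hint should ignore it, since opt() marks the option as optional.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.functional.FutureIO;

/** Hypothetical helper: not part of Spark or Hadoop. */
public final class OpenFileSketch {
  private OpenFileSketch() {}

  /**
   * Open a file when the caller already holds its FileStatus (for example
   * from a directory listing), so an object store connector can skip its
   * own HEAD request.
   */
  public static FSDataInputStream open(FileSystem fs, FileStatus status)
      throws IOException {
    return FutureIO.awaitFuture(
        fs.openFile(status.getPath())
            // Assumed policy list: first entry the filesystem recognizes wins.
            .opt("fs.option.openfile.read.policy", "vector, random")
            .withFileStatus(status)
            .build());
  }

  /** Variant for callers that only know the file length. */
  public static FSDataInputStream openWithLength(FileSystem fs, Path path, long length)
      throws IOException {
    return FutureIO.awaitFuture(
        fs.openFile(path)
            .opt("fs.option.openfile.read.policy", "vector, random")
            // Passed as a string so it works with the opt(String, String) overload.
            .opt("fs.option.openfile.length", Long.toString(length))
            .build());
  }
}
```

build() returns a CompletableFuture, so the open can also be started early and awaited only when the first read is needed.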

> Utilize Hadoop vectorized APIs
> ------------------------------
>
>                 Key: SPARK-44116
>                 URL: https://issues.apache.org/jira/browse/SPARK-44116
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Dongjoon Hyun
>            Priority: Major
>
> Apache Hadoop 3.3.5+ supports vectorized APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

