[
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048521#comment-18048521
]
ASF GitHub Bot commented on HADOOP-19767:
-----------------------------------------
ahmarsuhail commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3701812349
@anujmodi2021 I am trying to propose a single optimised implementation of an
input stream across cloud implementations, as I think we all need this kind of
logic. Ideally, I want to get to a place where about 80% of the logic is shared
in a common layer, and each cloud only implements its own client to make the
actual requests.
There is some consensus to move the shared logic into the parquet-java repo:
https://lists.apache.org/thread/nbksq32cs8h1ldj8762y6wh9zzp8gqx6 , and some
buy-in from the team at Google. I'll be following up on this in the new year.
It would be great to get your thoughts, and to hear whether your team would
also like to collaborate on this.
> ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns
> --------------------------------------------------------------------
>
> Key: HADOOP-19767
> URL: https://issues.apache.org/jira/browse/HADOOP-19767
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.4.2
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> Since the inception of the ABFS driver, there has been a single implementation
> of AbfsInputStream. Different kinds of workloads require different heuristics
> to perform at their best. For example:
> # Sequential read workloads such as DFSIO and DistCp gain performance
> from prefetching.
> # Random read workloads, on the other hand, do not need prefetching; enabling
> it for them adds overhead and is TPS-heavy.
> # Query workloads involving Parquet/ORC files benefit from optimizations such
> as footer reads and small-file reads.
> To accommodate this, we need to detect the read pattern and create an
> input stream implemented for that particular pattern.
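
As a rough illustration of the pattern detection described in the issue, the
following Java sketch classifies a stream's reads into the three workload types
listed above. It is only a sketch under stated assumptions: ReadPatternSketch,
ReadPolicy, Detector, and the footerWindow parameter are hypothetical names
invented here, not part of the ABFS driver or of the PR, and the heuristic
(first read near the end of the file means columnar; otherwise compare
contiguous reads against seeks) is a simplification chosen for brevity.

```java
/** Hypothetical sketch; none of these names come from the ABFS driver. */
final class ReadPatternSketch {

    /** Illustrative policies mirroring the three workload types in the issue. */
    enum ReadPolicy { SEQUENTIAL, RANDOM, COLUMNAR }

    /** Tracks positioned reads on one file and guesses the access pattern. */
    static final class Detector {
        private final long footerStart;      // reads at or past this offset count as footer reads
        private long nextExpectedOffset = -1;
        private int sequentialReads;
        private int randomReads;
        private int totalReads;
        private boolean firstReadInFooter;

        /** footerWindow is an assumed footer size in bytes, not a real ABFS setting. */
        Detector(long fileLength, long footerWindow) {
            this.footerStart = Math.max(0L, fileLength - footerWindow);
        }

        /** Records one read of {@code length} bytes starting at {@code offset}. */
        void record(long offset, int length) {
            if (totalReads == 0) {
                // Parquet/ORC readers typically open a file by reading its footer.
                // Files smaller than the window also land here, which stands in
                // for the small-file case in this simplified sketch.
                firstReadInFooter = offset >= footerStart;
            } else if (offset == nextExpectedOffset) {
                sequentialReads++;           // continues exactly where the last read ended
            } else {
                randomReads++;               // the caller seeked somewhere else
            }
            nextExpectedOffset = offset + length;
            totalReads++;
        }

        /** Returns the current guess; defaults to SEQUENTIAL on ties or no data. */
        ReadPolicy classify() {
            if (firstReadInFooter) {
                return ReadPolicy.COLUMNAR;
            }
            return randomReads > sequentialReads ? ReadPolicy.RANDOM : ReadPolicy.SEQUENTIAL;
        }
    }

    public static void main(String[] args) {
        Detector detector = new Detector(1_000_000L, 8_192L);
        detector.record(0L, 100);
        detector.record(100L, 100);
        detector.record(200L, 100);
        System.out.println(detector.classify());
    }
}
```

A caller could use the returned policy to pick a stream implementation, for
example one that prefetches for SEQUENTIAL and one that does not for RANDOM.
How the actual PR detects patterns and switches streams may differ.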