anujmodi2021 opened a new pull request, #8153:
URL: https://github.com/apache/hadoop/pull/8153

   ### Description of PR
   Since the ABFS driver was introduced, there has been a single implementation 
of AbfsInputStream. Different kinds of workloads need different heuristics to 
get the best performance for that workload. For example: 
   
   - Sequential read workloads like DFSIO and DistCP gain performance from 
prefetching.
   - Random read workloads, on the other hand, do not need prefetches; enabling 
prefetches for them adds overhead and consumes extra TPS.
   - Query workloads involving Parquet/ORC files benefit from optimizations 
like footer reads and small file reads.
   
   To accommodate this, we need to determine the read pattern and create an 
input stream implemented for that particular pattern.
   
   <img width="635" height="290" alt="image" 
src="https://github.com/user-attachments/assets/5b7a3db9-ab04-43cf-b44e-5e7a6582205f"
 />
   
   Going forward, more relevant policies and specialized implementations of 
AbfsInputStream can be added.
   
   This PR only refactors the way input streams are created. It introduces no 
logic change. As is the case today, the default remains AbfsAdaptiveInputStream, 
which can cater to all kinds of workloads.
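   
   A minimal sketch of the selection step described above, assuming the read 
pattern arrives as a string hint. Every name here (`ReadPattern`, 
`parsePattern`, the hint strings, and the two flags) is hypothetical and is not 
the API added by this PR. Only the idea of falling back to an adaptive default 
comes from the description.

```java
import java.util.Locale;

/** Hypothetical sketch: map a read-pattern hint to stream behaviour. Not the PR's code. */
public class InputStreamSelectionSketch {

  /** Illustrative patterns; names and flag values are assumptions based on the description. */
  public enum ReadPattern {
    SEQUENTIAL(true, false),  // sequential scans gain from prefetching
    RANDOM(false, false),     // prefetching is overhead for random reads
    COLUMNAR(false, true),    // Parquet/ORC gain from footer and small file reads
    ADAPTIVE(true, true);     // general-purpose default, standing in for AbfsAdaptiveInputStream

    public final boolean prefetch;
    public final boolean footerOptimization;

    ReadPattern(boolean prefetch, boolean footerOptimization) {
      this.prefetch = prefetch;
      this.footerOptimization = footerOptimization;
    }
  }

  /** Resolve a hint to a pattern; unknown or missing hints fall back to ADAPTIVE. */
  public static ReadPattern parsePattern(String hint) {
    if (hint == null) {
      return ReadPattern.ADAPTIVE;
    }
    switch (hint.trim().toLowerCase(Locale.ROOT)) {
      case "sequential":
        return ReadPattern.SEQUENTIAL;
      case "random":
        return ReadPattern.RANDOM;
      case "parquet":
      case "orc":
        return ReadPattern.COLUMNAR;
      default:
        return ReadPattern.ADAPTIVE;
    }
  }

  public static void main(String[] args) {
    // Print the behaviour each hint would select in this sketch.
    for (String hint : new String[] {"sequential", "random", "orc", null}) {
      ReadPattern p = parsePattern(hint);
      System.out.println(hint + " -> " + p
          + " (prefetch=" + p.prefetch
          + ", footerOptimization=" + p.footerOptimization + ")");
    }
  }
}
```

   Keeping the fallback in one place means an unrecognized hint never fails 
stream creation; it simply gets the general-purpose behaviour.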
   
   ### How was this patch tested?
   New tests were added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

