[ 
https://issues.apache.org/jira/browse/IMPALA-11064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yida Wu updated IMPALA-11064:
-----------------------------
    Description: 
IMPALA-10791 added batch reading of remote temporary files. Batch reading can 
improve query performance when the amount of data is relatively small, but it 
can cause regressions on large amounts of data. The main reason is that each 
temporary file mixes data from all partitions, so batch reading produces many 
random reads even for a sequential scan, and the recycling rate of the read 
buffer blocks is often quite low.

Therefore, temporary files need a partitioned structure to get better query 
performance when batch reading is used.
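
To illustrate the effect described above, here is a toy model in Python. It is 
not Impala code: the round-robin interleaving of pages, the block size, and all 
function names are assumptions made only for this sketch. It counts how many 
read buffer blocks must be fetched to scan one partition, and what fraction of 
the fetched pages actually belongs to that partition.

```python
def mixed_layout(num_partitions, pages_per_partition):
    """Owner of each page when pages from all partitions are interleaved.

    Assumption: pages arrive round-robin across partitions.
    """
    return [p for _ in range(pages_per_partition) for p in range(num_partitions)]


def partitioned_layout(num_partitions, pages_per_partition):
    """Owner of each page when every partition's pages are stored contiguously."""
    return [p for p in range(num_partitions) for _ in range(pages_per_partition)]


def block_fetches(layout, partition, pages_per_block):
    """Return (blocks fetched, useful fraction) for scanning one partition.

    A block is a run of `pages_per_block` consecutive pages in the file; a
    batch read fetches a whole block even if only one page in it is wanted.
    """
    wanted = [i for i, owner in enumerate(layout) if owner == partition]
    blocks = {i // pages_per_block for i in wanted}
    fetched_pages = len(blocks) * pages_per_block
    return len(blocks), len(wanted) / fetched_pages


# Hypothetical sizes: 16 partitions, 64 pages each, 8 pages per block.
mixed = mixed_layout(16, 64)
partitioned = partitioned_layout(16, 64)

print("mixed:      ", block_fetches(mixed, 3, 8))
print("partitioned:", block_fetches(partitioned, 3, 8))
```

In this model the mixed layout touches 64 blocks to read the 64 pages of one 
partition, so only one page in eight is useful, while the partitioned layout 
touches 8 fully used blocks. Real spill patterns will differ, but the sketch 
shows why block reuse drops when partitions share a file.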

  was:
Impala allows spilling to a remote filesystem, such as S3. Uploading the 
spilled data to the remote filesystem is fast, but reading it back is slow, 
because each read fetches only one page from the remote filesystem, whereas 
an upload transfers a whole file at a time.

The task aims to improve the read performance of spilling to a remote 
filesystem by using batch reading.


> Optimizing Temporary File Structure for Batching Reading
> --------------------------------------------------------
>
>                 Key: IMPALA-11064
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11064
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Yida Wu
>            Assignee: Yida Wu
>            Priority: Major


