[ 
https://issues.apache.org/jira/browse/IMPALA-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030093#comment-17030093
 ] 

Sahil Takiar commented on IMPALA-9316:
--------------------------------------

Talked with Gopal V about this. This change will be particularly helpful when 
scanning a lot of columns from a Parquet/ORC table with a lot of columns.

Since each column is stored separately in Parquet/ORC, all columns will be 
stored a part from each other. If 4 MB of data need to be read from the file, 
it would be better to read all of them at once rather than 1000 separate 4K 
reads (e.g. if reading 1000 columns). Even if the reads are slightly a part 
from each other, the extra data can just be skipped over.

> Consider coalescing S3 scans
> ----------------------------
>
>                 Key: IMPALA-9316
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9316
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Priority: Major
>
> We should consider coalescing S3 reads. IIUC the current {{DiskIoMgr}} code 
> for S3A does not do anything special for scheduling S3 scan ranges. It simply 
> round-robin assigns scans to IO threads.
> I think there might be a smarter algorithm we could employ when scheduling S3 
> reads. A few things to consider:
> * With the migration to {{hdfsPreadFully}}, each S3 scan range should 
> correspond to a single HTTP GET request (assuming the 8 MB limit is not hit, 
> see below)
> * {{read_size}} limits the size of a read to 8 MB (I believe if a scan range 
> exceeds this limit, the reads are just done on the same IO thread, but 
> sequentially - they are broken up into multiple HTTP GET requests)
> * S3A has a readahead option that defaults to 64 KB, however, it only applies 
> in certain situations
> ** If {{fs.s3a.experimental.input.fadvise=random}} (which is the recommended 
> value when reading Parquet / ORC data), the readahead applies if (1) it won't 
> cause the read to go past the end of the file, and (2) the request read 
> length is under 64 KB (it reads up to Math.max(requested-read-length, 64 KB)) 
> (so the readahead most likely applies for small reads)
> Coalescing reads would allow Impala to combine multiple, small HTTP GET 
> requests into fewer, larger HTTP GET requests. There may be some data that 
> needs to be skipped over, but the cost of reading that extra data might 
> outweigh the cost of issuing multiple HTTP requests. Since each HTTP request 
> requires a round-trip to S3, issuing a lot of GET requests can be costly, 
> especially if each only reads a small amount of data.
> Some implementation factors to consider:
> * There should probably be a limit on the maximum size of a read request (is 
> 8 MB the right value for S3?)
> * Since S3A uses a default of 64 KB for their readahead, we can probably use 
> a similar value
> * Should the number of disk IO threads be considered when coalescing reads? 
> e.g. by default there are 16 IO threads, if there are 16 small scan ranges, 
> does it make more sense to coalesce them into a single large scan range, or 
> would we get better throughput by issuing all 16 in parallel



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to