[ 
https://issues.apache.org/jira/browse/DRILL-6814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16666930#comment-16666930
 ] 

Arina Ielchiieva edited comment on DRILL-6814 at 10/29/18 10:45 AM:
--------------------------------------------------------------------

[~ashishkshukladb] since you are querying the same files from S3 and HDFS 
with the same Drill cluster layout, you can compare the query profiles to see 
where the bottleneck is and how many major fragments are created, i.e. whether 
Drill parallelizes the read operation on both S3 and HDFS. 

Also, which S3 storage class do you use 
(https://aws.amazon.com/s3/storage-classes/)?
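
As a rough illustration of the profile comparison suggested above, here is a minimal Python sketch. It assumes each query profile has been saved as JSON (for example from the Drill Web UI) and that the JSON contains a "fragmentProfile" list whose entries carry "majorFragmentId" and "minorFragmentProfile"; those field names are my understanding of the profile layout and should be checked against a real profile. The two sample profiles are made up for the example and are not measurements from this issue.

```python
def fragment_summary(profile):
    """Map each major fragment id to its number of minor fragments.

    Assumes the profile dict follows the layout described above:
    profile["fragmentProfile"] is a list of major fragments, each with
    "majorFragmentId" and a "minorFragmentProfile" list.
    """
    summary = {}
    for major in profile.get("fragmentProfile", []):
        minors = major.get("minorFragmentProfile", [])
        summary[major["majorFragmentId"]] = len(minors)
    return summary


def compare_profiles(s3_profile, hdfs_profile):
    """Return rows of (major fragment id, S3 minor count, HDFS minor count)."""
    s3 = fragment_summary(s3_profile)
    hdfs = fragment_summary(hdfs_profile)
    rows = []
    for major_id in sorted(set(s3) | set(hdfs)):
        rows.append((major_id, s3.get(major_id, 0), hdfs.get(major_id, 0)))
    return rows


# Hypothetical, hand-written profiles (NOT real data): the S3 scan runs in a
# single minor fragment, while the HDFS scan is split across four.
s3_profile = {
    "fragmentProfile": [
        {"majorFragmentId": 0, "minorFragmentProfile": [{"minorFragmentId": 0}]},
        {"majorFragmentId": 1, "minorFragmentProfile": [{"minorFragmentId": 0}]},
    ]
}
hdfs_profile = {
    "fragmentProfile": [
        {"majorFragmentId": 0, "minorFragmentProfile": [{"minorFragmentId": 0}]},
        {
            "majorFragmentId": 1,
            "minorFragmentProfile": [{"minorFragmentId": i} for i in range(4)],
        },
    ]
}

for major_id, s3_minors, hdfs_minors in compare_profiles(s3_profile, hdfs_profile):
    print(f"major {major_id}: S3 minor fragments={s3_minors}, HDFS minor fragments={hdfs_minors}")
```

If the scan fragment shows far fewer minor fragments for S3 than for HDFS, that would suggest the S3 read is not being parallelized the same way; to use real data, replace the sample dicts with profiles loaded via json.load.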


was (Author: arina):
[~ashishkshukladb] since you are querying the same files from S3 and HDFS 
with the same Drill cluster layout, you can compare the query profiles to see 
where the bottleneck is and how many major fragments are created, i.e. whether 
Drill parallelizes the read operation on both S3 and HDFS. 

> Query performance on S3 files
> -----------------------------
>
>                 Key: DRILL-6814
>                 URL: https://issues.apache.org/jira/browse/DRILL-6814
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.14.0
>         Environment: Amazon EC2 instances-
> 4 Linux Redhat machines -version 7.5
> RAM- 32GB
>            Reporter: Ashish Shukla
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.15.0
>
>
> I have installed a 4-node Drill cluster on Amazon EC2 and am trying to 
> execute a simple count on one Amazon S3 file. The file type is CSV and the 
> size is approx. 14 GB.
>  The query returns the expected count after approx. 30 minutes of execution.
>  If we keep the same file in HDFS or create a table in Postgres, the 
> execution time is much shorter (approx. 2-3 minutes).
>  Is this normal behavior, or can something be done for S3 files to make the 
> execution time comparable?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
