[ https://issues.apache.org/jira/browse/DRILL-6814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16666930#comment-16666930 ]
Arina Ielchiieva commented on DRILL-6814:
-----------------------------------------

[~ashishkshukladb] since you are querying the same files from S3 and HDFS with the same Drill cluster layout, you can compare the query profiles to see where the bottleneck is and how many major fragments are created, i.e. whether Drill parallelizes the read operation on both S3 and HDFS.

> Query performance on S3 files
> -----------------------------
>
>                 Key: DRILL-6814
>                 URL: https://issues.apache.org/jira/browse/DRILL-6814
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.14.0
>         Environment: Amazon EC2 instances-
> 4 Linux Redhat machines -version 7.5
> RAM- 32GB
>            Reporter: Ashish Shukla
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.15.0
>
>
> I have installed 4 Node drill cluster on Amazon EC2 and trying to execute a
> simple count on one Amazon S3 file. File type is CSV and size is approx- 14GB.
> The query returns expected count after the execution of approx 30 minutes.
> If we keep the same file in hdfs or create a table in postgres, execution
> time is relatively very less (approx 2-3 minutes).
> Is it normal behavior or something can be done for S3 files to make
> execution time comparable ?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)