Rahul Challapalli created DRILL-1072:
----------------------------------------

             Summary: Drill is very slow when we have a large number of text files
                 Key: DRILL-1072
                 URL: https://issues.apache.org/jira/browse/DRILL-1072
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization, Storage - Parquet, Storage - Text & CSV
            Reporter: Rahul Challapalli


git.commit.id.abbrev=efa3274
Build# 26178

As the total number of files under the directory below increases, Drill becomes 
very slow. The timings below are for the following query at different file 
counts.

Each file contains a single number and has a '.tbl' extension.
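
A comparable set of test files could be generated with a sketch like the 
following. The helper name, the file naming scheme, and the number written to 
each file are assumptions for illustration, not the files used in this report. 
The script writes to a local directory; copying the result to the file system 
that Drill queries is left out.

```python
from pathlib import Path


def make_tbl_files(directory, count):
    """Create `count` files named 0.tbl, 1.tbl, ... each holding one number.

    Hypothetical helper: the naming scheme and file contents are assumed.
    """
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # Each file contains a single number followed by a newline.
        (target / f"{i}.tbl").write_text(f"{i}\n")
    return sorted(target.glob("*.tbl"))


if __name__ == "__main__":
    # Example: build the 100-file case in a local directory.
    files = make_tbl_files("morefiles", 100)
    print(len(files), "files written")
```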

select count(*) from dfs.`/drill/testdata/morefiles`;

100 files --- 5.183 seconds
250 files --- 15.021 seconds
500 files --- 26.846 seconds
1000 files --- 69.835 seconds
5000 files --- 1573.589 seconds
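
To make the scaling easier to see, the timings above can be divided by the 
file count. This is only arithmetic on the numbers reported here; the variable 
and function names are made up for the sketch. The cost stays at roughly 0.05 
to 0.07 seconds per file up to 1000 files and rises to about 0.31 seconds per 
file at 5000 files, so the slowdown is worse than linear in the file count.

```python
# Timings reported above: file count -> seconds for the count(*) query.
timings = {
    100: 5.183,
    250: 15.021,
    500: 26.846,
    1000: 69.835,
    5000: 1573.589,
}


def seconds_per_file(results):
    """Return the average query time per file for each file count."""
    return {count: seconds / count for count, seconds in results.items()}


per_file = seconds_per_file(timings)

for count, cost in per_file.items():
    print(f"{count:>5} files: {cost:.4f} s/file")
```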

The logs contain these messages repeatedly when executing against 5000 files:

22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:22.819 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
22:02:22.840 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:22.864 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
22:02:23.035 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
22:02:23.060 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5



--
This message was sent by Atlassian JIRA
(v6.2#6252)
