[ 
https://issues.apache.org/jira/browse/DRILL-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093116#comment-14093116
 ] 

Steven Phillips commented on DRILL-1072:
----------------------------------------

Is the 1573 seconds shown in the bug description a typo? You report 
152.7 seconds for the same table in the first comment.
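
One way to compare the two figures is to look at the cost per file. The sketch below uses only the timings quoted in the description and the 152.7-second figure from the first comment; the helper name per_file is illustrative and not part of Drill.

```python
# Timings copied from the bug description: file count -> elapsed seconds.
timings = {100: 5.183, 250: 15.021, 500: 26.846, 1000: 69.835, 5000: 1573.589}

# Alternative 5000-file figure reported in the first comment.
alt_5000 = 152.7


def per_file(files, seconds):
    """Average elapsed seconds per file (hypothetical helper)."""
    return seconds / files


for files, seconds in timings.items():
    print(f"{files:>5} files: {per_file(files, seconds):.4f} s/file")

print(f" 5000 files (first comment): {per_file(5000, alt_5000):.4f} s/file")
```

Up to 1000 files the per-file cost stays between roughly 0.05 and 0.07 seconds. With the 1573.589 figure it jumps to about 0.31 seconds per file, and with 152.7 it drops to about 0.03, so neither value continues the earlier trend cleanly.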

> Drill is very slow when we have a large number of text files
> ------------------------------------------------------------
>
>                 Key: DRILL-1072
>                 URL: https://issues.apache.org/jira/browse/DRILL-1072
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet, 
> Storage - Text & CSV
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>             Fix For: 0.5.0
>
>
> git.commit.id.abbrev=efa3274
> Build# 26178
> As the total number of files under the directory below increases, Drill 
> becomes very slow. Compare the results for different file counts for the 
> query below.
> Each file contains just one number and has a '.tbl' extension.
> select count(*) from dfs.`/drill/testdata/morefiles`;
> 100 files --- 5.183 seconds
> 250 files --- 15.021 seconds
> 500 files --- 26.846 seconds
> 1000 files --- 69.835 seconds
> 5000 files --- 1573.589 seconds
> The logs contain these messages repeatedly when executing against 5000 files:
> 22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.819 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
> 22:02:22.840 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
> 22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.864 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
> 22:02:23.035 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
> 22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:23.060 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG 
> o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5



--
This message was sent by Atlassian JIRA
(v6.2#6252)
