Arshadh created DRILL-7719:
------------------------------

             Summary: Query Execution plan don't use MongoDB Index.
                 Key: DRILL-7719
                 URL: https://issues.apache.org/jira/browse/DRILL-7719
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.17.0
            Reporter: Arshadh


The query plan was showing a collection scan, going over all the rows in the 
mongo collection. Hence created an index on the predicate cols, expecting Drill 
to choose and index based access plan. But continues to use the full table scan 
path.

The actual query, generated query plan and the mongo index are given below.

SQL:
Select j.user as User, TO_DATE(j.created_at) as submitted_on
from mongo.example.jobs j
where j.user = 'j...@example.ai' and j.created_at BETWEEN timestamp '2020-03-25 
13:12:55' AND timestamp '2020-04-24 13:12:55'

Physical Plan
00-00 Screen : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 
121.2375, cumulative cost = \{6720.59875 rows, 23532.19875 cpu, 895541.0 io, 
0.0 network, 0.0 memory}, id = 10468
00-01 Project(User=[$0], submitted_on=[TO_DATE($1)]) : rowType = RecordType(ANY 
User, ANY submitted_on): rowcount = 121.2375, cumulative cost = \{6708.475 
rows, 23520.075 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10467
00-02 SelectionVectorRemover : rowType = RecordType(ANY user, ANY created_at): 
rowcount = 121.2375, cumulative cost = \{6587.2375 rows, 22913.8875 cpu, 
895541.0 io, 0.0 network, 0.0 memory}, id = 10466
00-03 Filter(condition=[AND(=($0, 'j...@example.ai'), >=($1, 2020-03-25 
13:12:55), <=($1, 2020-04-24 13:12:55))]) : rowType = RecordType(ANY user, ANY 
created_at): rowcount = 121.2375, cumulative cost = \{6466.0 rows, 22792.65 
cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10465
00-04 Scan(table=[[mongo, example, jobs]], groupscan=[MongoGroupScan 
[MongoScanSpec=MongoScanSpec [dbName=example, collectionName=jobs, 
filters=null], columns=[`user`, `created_at`]]]) : rowType = RecordType(ANY 
user, ANY created_at): rowcount = 3233.0, cumulative cost = \{3233.0 rows, 
6466.0 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10464

Index created in MongoDB
{
 "v" : 2,
 "key" : {
 "user" : 1,
 "created_at" : 1,
 "method_map_id" : 1
 },
 "name" : "user_1_created_at_1_method_map_id_1",
 "ns" : "example.jobs"
}

 

In a drill documentation, I see that drill supports indexes only for MapR DB. 
Does that mean Indexes of other data sources like mongo won't be used ?

[https://drill.apache.org/docs/querying-indexes-introduction/]

Kindly take a look at this issue.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to