Github user ppadma commented on the issue:
https://github.com/apache/drill/pull/1015
@paul-rogers Thanks a lot for the review. I updated the PR to address the
code review comments. Please take a look.
Overall, this change gives a good improvement. Here are the numbers (with
the change vs. without):
select count(*) from `/Users/ppenumarthy/MAPRTECH/padma/testdata` where
l_comment like '%a'
1.4 sec vs 7 sec
select count(*) from `/Users/ppenumarthy/MAPRTECH/padma/testdata` where
l_comment like '%a%'
6.5 sec vs 13.5 sec
select count(*) from `/Users/ppenumarthy/MAPRTECH/padma/testdata` where
l_comment like 'a%'
1.4 sec vs 5.8 sec
select count(*) from `/Users/ppenumarthy/MAPRTECH/padma/testdata` where
l_comment like 'a'
1.1.65 sec vs 5.8 sec
I think the improvement for "contains" ('%a%') is smaller than for the other
cases, probably because of the nested for loops. @sachouche's changes on top
of these changes can improve it further.
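
For illustration, here is a minimal sketch of why "contains" would cost more
than the other cases. It is not the code from this PR: the class and method
names are hypothetical, and it assumes the pattern is compared byte by byte
against UTF-8 data. The prefix and suffix checks need a single pass over the
pattern, while "contains" has to retry the comparison at every start offset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch for discussion only; not the matcher code in the PR.
class LikeSketch {

  // Helper (assumption): patterns and values are compared as UTF-8 bytes.
  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // LIKE 'a%': one loop over the first pattern.length bytes.
  static boolean startsWith(byte[] input, byte[] pattern) {
    if (pattern.length > input.length) {
      return false;
    }
    for (int i = 0; i < pattern.length; i++) {
      if (input[i] != pattern[i]) {
        return false;
      }
    }
    return true;
  }

  // LIKE '%a': one loop over the last pattern.length bytes.
  static boolean endsWith(byte[] input, byte[] pattern) {
    int offset = input.length - pattern.length;
    if (offset < 0) {
      return false;
    }
    for (int i = 0; i < pattern.length; i++) {
      if (input[offset + i] != pattern[i]) {
        return false;
      }
    }
    return true;
  }

  // LIKE '%a%': nested loops. The outer loop tries every start offset,
  // the inner loop compares the pattern at that offset.
  static boolean contains(byte[] input, byte[] pattern) {
    for (int start = 0; start + pattern.length <= input.length; start++) {
      int j = 0;
      while (j < pattern.length && input[start + j] == pattern[j]) {
        j++;
      }
      if (j == pattern.length) {
        return true;
      }
    }
    return false;
  }
}
```

In the worst case the "contains" check does on the order of
input length times pattern length comparisons, versus at most pattern length
comparisons for the prefix and suffix checks.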
---