Hi all,

I had an initial implementation for statistics support for Drill
[DRILL-1328] <https://issues.apache.org/jira/browse/DRILL-1328>. This JIRA
has links to the design spec as well as the PR. Unfortunately, because of
some regressions on performance benchmarks (TPCH/TPCDS) we decided to
temporarily shelve the implementation. I would like to resolve the pending
issues and get the changes in.

I hope it will be okay to merge it in as an experimental feature, since
resolving these issues may require changing the existing join ordering
algorithm in Drill, adding support for histograms, and addressing a few other
planning-related issues. Moreover, the community is adding a meta-store for
Drill [DRILL-6552] <https://issues.apache.org/jira/browse/DRILL-6552>.
Statistics should also be able to leverage the new meta-store instead of, or
in addition to, a custom store implementation.

My plan is to address the most critical review comments and get the initial
version in as an experimental feature. Some good-to-have aspects, like
handling schema changes during the statistics collection process, may be
deferred to the next iteration. Subsequently, I will work on these
good-to-have features and on additional performance improvements. It would be
great to get the initial implementation in to avoid the rebase issues and
allow other community members to use and contribute to the feature.

Please take a look at the design doc and the PR and provide suggestions and
feedback on the JIRA. Also, I will try to present the current state of
statistics and the feature in one of the bi-weekly Drill Community Hangouts.

Thanks,
Gautam
