+1
I'd say go for it.
If the option to use enhanced stats can be turned on per session, then users
can experiment and choose to turn it on for queries where they do not
experience performance degradation.


On Fri, Nov 2, 2018 at 3:25 PM Gautam Parai <gpa...@mapr.com> wrote:

> Hi all,
>
> I had an initial implementation for statistics support for Drill
> [DRILL-1328] <https://issues.apache.org/jira/browse/DRILL-1328>. This JIRA
> has links to the design spec as well as the PR. Unfortunately, because of
> some regressions on performance benchmarks (TPCH/TPCDS) we decided to
> temporarily shelve the implementation. I would like to resolve the pending
> issues and get the changes in.
>
> Hopefully, it will be okay to merge it in as an experimental feature since
> in order to resolve these issues we may need to change the existing join
> ordering algorithm in Drill, add support for histograms, and address a few
> other planning-related issues. Moreover, the community is adding a meta-store for
> Drill [DRILL-6552] <https://issues.apache.org/jira/browse/DRILL-6552>.
> Statistics should also be able to leverage the brand new meta-store instead
> of/in addition to having a custom store implementation.
>
> My plan is to address the most critical review comments and get the initial
> version in as an experimental feature. Some other good-to-have aspects like
> handling schema changes during the statistics collection process may be
> deferred to the next iteration. Subsequently, I will work on these
> good-to-have features and additional performance improvements. It would be
> great to get the initial implementation in to avoid the rebase issues and
> allow other community members to use and contribute to the feature.
>
> Please take a look at the design doc and the PR and provide suggestions and
> feedback on the JIRA. Also, I will try to present the current state of
> statistics and the feature in one of the bi-weekly Drill Community
> Hangouts.
>
> Thanks,
> Gautam
>
