+1. It will help to rely on that code while implementing the Drill Metastore, DRILL-6552.
@Gautam Please address all current commits and rebase onto the latest master, then Vova and I will do an additional review.
Just for clarification: am I right that the state of the changes is the same as in the last comment in DRILL-1328 [1] (it will not include histograms and will cause some regressions for the TPC-H and TPC-DS benchmarks)?

[1] https://issues.apache.org/jira/browse/DRILL-1328?focusedCommentId=16061374&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16061374

Kind regards
Vitalii

On Tue, Nov 6, 2018 at 1:47 AM Parth Chandra <par...@apache.org> wrote:

> +1
> I'd say go for it.
> If the option to use enhanced stats can be turned on per session, then users
> can experiment and choose to turn it on for queries where they do not
> experience performance degradation.
>
> On Fri, Nov 2, 2018 at 3:25 PM Gautam Parai <gpa...@mapr.com> wrote:
>
> > Hi all,
> >
> > I had an initial implementation for statistics support for Drill
> > [DRILL-1328] <https://issues.apache.org/jira/browse/DRILL-1328>. This JIRA
> > has links to the design spec as well as the PR. Unfortunately, because of
> > some regressions on performance benchmarks (TPCH/TPCDS) we decided to
> > temporarily shelve the implementation. I would like to resolve the pending
> > issues and get the changes in.
> >
> > Hopefully, it will be okay to merge it in as an experimental feature, since
> > in order to resolve these issues we may need to change the existing join
> > ordering algorithm in Drill, add support for Histograms and a few other
> > planning related issues. Moreover, the community is adding a meta-store for
> > Drill [DRILL-6552] <https://issues.apache.org/jira/browse/DRILL-6552>.
> > Statistics should also be able to leverage the brand new meta-store instead
> > of/in addition to having a custom store implementation.
> >
> > My plan is to address the most critical review comments and get the initial
> > version in as an experimental feature. Some other good-to-have aspects, like
> > handling schema changes during the statistics collection process, may be
> > deferred to the next iteration. Subsequently, I will work on these
> > good-to-have features and additional performance improvements. It would be
> > great to get the initial implementation in to avoid the rebase issues and
> > allow other community members to use and contribute to the feature.
> >
> > Please take a look at the design doc and the PR and provide suggestions and
> > feedback on the JIRA. Also I will try to present the current state of
> > statistics and the feature in one of the bi-weekly Drill Community
> > Hangouts.
> >
> > Thanks,
> > Gautam