Hi Vitalii, glad to hear that you are also looking at this part. Let's keep the discussion under that Jira.
On Fri, Jun 29, 2018 at 1:27 AM Vitalii Diravka wrote:
> Hi Weijie,
>
> Thanks for bringing this topic up!
>
> Basically you are right, Hive Metastore is one of the best candidates for
> storing Drill's metadata.
> It would also be good to make an abstraction that allows other kinds of
> tools to be implemented and used as the Metastore.
> Metastore performance can be an important question, especially for light
> Drill tables.
>
> Currently Vova and I are working on the proposal for the metastore.
> I have created Jira DRILL-6552 [1], where all the related discussions can be
> held.
>
> [1] https://issues.apache.org/jira/browse/DRILL-6552
>
> Kind regards
> Vitalii
>
>
> On Thu, Jun 28, 2018 at 6:49 PM Arina Yelchiyeva wrote:
>
> > Hi,
> >
> > Vitalii and Vova are also looking at this part, so you might want to sync
> > up with them. Or, even better, we can create a Jira for this and hold all
> > discussions there.
> > Vitalii, what do you think?
> >
> > Kind regards,
> > Arina
> >
> > On Thu, Jun 28, 2018 at 6:46 PM weijie tong wrote:
> >
> > > Hi all:
> > >
> > > As @aman once pointed out to me, the DRILL-2.0 roadmap includes the
> > > description of the metadata design (
> > > https://lists.apache.org/thread.html/74cf48dd78d323535dc942c969e72008884e51f8715f4a20f6f8fb66@%3Cdev.drill.apache.org%3E
> > > ), and I am interested in taking on the role of implementing the
> > > metadata part. I am starting this discussion thread to hear your ideas
> > > about this problem.
> > >
> > > I have investigated some open source projects that deal with metadata,
> > > such as Hive Metastore (
> > > https://cwiki.apache.org/confluence/display/Hive/Design#Design-Metastore
> > > ), Netflix Metacat, Apache Atlas, and LinkedIn WhereHows
> > > (https://github.com/linkedin/WhereHows). Except for Hive Metastore, these
> > > projects define a highly abstract model over the actual physical
> > > metadata, which makes it easier to extend them with new metadata
> > > properties. Hive Metastore's design targets the physical metadata and
> > > offers a Thrift interface for different languages, but it depends on a
> > > relational database, which limits its scalability and performance. In my
> > > opinion, we should use Hive Metastore as our design template or just
> > > reuse it, since we don't need a rich metadata management system. Maybe we
> > > should change the backend database to a KV store with high query
> > > performance, such as HBase.
> > >
> > > Besides the metadata interface design and the choice of backend storage,
> > > we should also provide the ability to run arbitrary queries, so users can
> > > calculate statistics like NDV and store them in the metadata. By the way,
> > > maybe we can go further and bring in VerdictDB
> > > (https://github.com/mozafari/verdictdb) to provide richer approximate
> > > query processing.
> > >
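The two ideas in this thread, an abstraction that lets different tools serve as the Metastore and a KV-style backend that stores statistics like NDV, can be illustrated with a minimal Java sketch. All names here (Metastore, InMemoryKvMetastore, NdvStats) are hypothetical and are not Drill, Hive, or HBase APIs. The in-memory map only stands in for a KV store such as HBase, and the NDV is computed exactly, whereas a real system would more likely use an approximate sketch for large tables.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage-agnostic metastore abstraction: callers depend only on
// this interface, so different backends could be plugged in behind it.
interface Metastore {
    void putColumnNdv(String table, String column, long ndv);

    Optional<Long> getColumnNdv(String table, String column);
}

// Hypothetical backend: a key-value map standing in for a KV store such as HBase.
// The key is the (table, column) pair; List.of gives value-based equals/hashCode.
class InMemoryKvMetastore implements Metastore {
    private final Map<List<String>, Long> store = new ConcurrentHashMap<>();

    @Override
    public void putColumnNdv(String table, String column, long ndv) {
        store.put(List.of(table, column), ndv);
    }

    @Override
    public Optional<Long> getColumnNdv(String table, String column) {
        return Optional.ofNullable(store.get(List.of(table, column)));
    }
}

final class NdvStats {
    private NdvStats() {}

    // Exact NDV (number of distinct values) of one column's values.
    // Memory grows with the number of distinct values.
    static long exactNdv(List<?> columnValues) {
        Set<Object> distinct = new HashSet<>(columnValues);
        return distinct.size();
    }
}

public class MetastoreSketch {
    public static void main(String[] args) {
        Metastore ms = new InMemoryKvMetastore();
        List<String> city = List.of("Kyiv", "Beijing", "Kyiv", "Paris");
        ms.putColumnNdv("dfs.tmp.orders", "city", NdvStats.exactNdv(city));
        System.out.println(ms.getColumnNdv("dfs.tmp.orders", "city").orElse(-1L)); // prints 3
    }
}
```

With this shape, switching the backend (for example, to one that delegates to Hive Metastore) would mean adding another implementation of the interface without changing the code that reads statistics.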
