Hi Gautam,

You touched on the key issue: storage. You mention that the Drill stats implementation learned from Oracle. Very wise: Oracle is the clear expert in this space.
There is a very important difference, however, between Drill and Oracle. Oracle is a complete database, including both a query engine and storage. Drill is a query engine only. This is the issue at the heart of our discussion.

Oracle has a tabular storage engine for relational data, and Oracle uses that storage engine for its metadata and stats. This ensures that metadata and stats benefit from concurrency control, transactions, crash recovery (i.e. roll forward/roll back), backup and so on. Drill's equivalents are... (crickets.)

Drill is a query engine that sits atop the storage engine of your choice. That is what sets Drill apart from Impala and Hive, which are tightly coupled to HDFS, HMS, Ranger/Sentry, etc. (Spark takes a similar position to Drill: Spark runs on anything and has no storage, other than shuffle files.)

As a query engine, Drill should compute stats, as you suggested. But when it comes to STORING stats, Drill has nothing to say, nor should it.

We currently use a broken implementation for Parquet metadata. We write files into the data directory (destroying directory update timestamps), across multiple files, with no concurrency control, no versioning, no crash recovery, no nothing. Run a query concurrently with Parquet metadata collection: things get corrupted. Run two Parquet metadata updates concurrently: things get really corrupted. Why? Storage is hard to get right under concurrent access and update. This is not a foundation on which to build! Oracle would not survive a day if it corrupted system tables when two or more users performed operations at the same time.

OK, Drill has a problem. The first step is to acknowledge it. The next is to look for solutions. Either Drill adds a storage engine, or it stays agnostic, leaves storage to an external system, and makes stats storage a plugin.

Drill already accesses data via a plugin. This is why Drill can read HDFS, S3, Alluxio, Kafka, JDBC, and on and on. This is a valuable, differentiating feature.
It is, in fact, why Drill has a place in a world dominated by Hive, Spark, and Impala.

For stats, this means that Drill does the query-engine part: gather stats on the one hand, and consume stats for planning on the other. But it means that Drill DOES NOT attempt to store the stats. Drill relies on an external system for that role.

Here is where the stats discussion aligns with the metadata (table schema) discussion. There are many ways to store metadata (including stats): in an RDBMS, in HMS, in files (with MVCC or other concurrency control), in a key/value store, and so on. All of these are more robust than the broken Parquet metadata file implementation.

So, if stats are to be stored by an external storage system, Drill's focus should be on APIs: how to obtain the stats from Drill in order to store them, and how to return them to Drill when requested during query planning. This is exactly the same model we take with data (Drill gives data to HDFS to store, and asks HDFS for the location of the data during planning.)

This is the reason I suggested gathering stats as a query: no new API is needed; just issue a query using the existing Drill client. As you point out, perhaps Drill is in a better position to decide which stats should be gathered. Point taken. So, instead of using a query, define a stats API with both "put" and "get" interfaces. Then, of course, you can certainly create a POC implementation of the storage engine based on the broken Parquet metadata file format. Since it is just a reference implementation, the fragility of the solution can be forgiven.

This is a very complex topic, and it touches on Drill's place in the open source query engine world. Thanks much for having the patience to discuss the issues here on the dev list. What do other people think about the storage question? Is the plugin approach the right one? Is there some other alternative the project should consider? Should Drill build its own?
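To make the "put"/"get" idea concrete, here is a rough Java sketch of what such a contract might look like. To be clear, every name here (TableStatsStore, TableStats, InMemoryStatsStore) is hypothetical, invented for illustration; none of this is existing Drill code. The in-memory map stands in for whatever external system (RDBMS, HMS, key/value store) a real plugin would delegate to.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical pluggable stats-store contract. Drill would call put()
 * after gathering stats and get() at planning time; the backing store
 * (RDBMS, HMS, key/value store, ...) owns concurrency and durability.
 */
interface TableStatsStore {
  void put(String tableName, TableStats stats);

  /** Empty if no stats have been gathered for the table yet. */
  Optional<TableStats> get(String tableName);
}

/** Minimal illustrative payload: row count plus per-column NDV estimates. */
final class TableStats {
  final long rowCount;
  final Map<String, Long> ndvByColumn;

  TableStats(long rowCount, Map<String, Long> ndvByColumn) {
    this.rowCount = rowCount;
    this.ndvByColumn = Map.copyOf(ndvByColumn);
  }
}

/**
 * Toy reference implementation backed by an in-memory concurrent map.
 * A real plugin would persist to an external store instead.
 */
final class InMemoryStatsStore implements TableStatsStore {
  private final ConcurrentHashMap<String, TableStats> stats = new ConcurrentHashMap<>();

  @Override
  public void put(String tableName, TableStats s) {
    stats.put(tableName, s);
  }

  @Override
  public Optional<TableStats> get(String tableName) {
    return Optional.ofNullable(stats.get(tableName));
  }
}
```

The point of the sketch is only that the engine side stays tiny: one interface, with the hard storage problems (concurrency, versioning, crash recovery) pushed behind it to the external system.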
Thanks,
- Paul

On Friday, November 9, 2018, 3:11:11 PM PST, Gautam Parai <gpa...@mapr.com> wrote:
Hi Paul, ...