Hi Gautam,

You touched on the key issue: storage. You mention that the Drill stats 
implementation learned from Oracle. Very wise: Oracle is the clear expert in 
this space.

There is a very important difference, however, between Drill and Oracle. Oracle 
is a complete database including both query engine and storage. Drill is a 
query engine only. This is the issue at the heart of our discussion.

Oracle has a tabular storage engine for relational data. Oracle uses that 
storage engine for metadata and stats. This ensures that metadata and stats 
benefit from concurrency control, transactions, crash recovery (i.e. roll 
forward/roll back), backup, and so on.

Drill's equivalents are. . . (crickets.)

Drill is a query engine that sits atop the storage engine of your choice. That 
is what sets Drill apart from Impala and Hive which are tightly coupled to 
HDFS, HMS, Ranger/Sentry, etc. (Spark takes a similar position to Drill: Spark 
runs on anything and has no storage, other than shuffle files.)

As a query engine, Drill should compute stats, as you suggested. But, when it 
comes to STORING stats, Drill has nothing to say, nor should it.

We currently use a broken implementation for Parquet metadata. We write files 
into the data directory (destroying directory update timestamps), across 
multiple files, with no concurrency control, no versioning, no crash recovery, 
no nothing. Run a query concurrently with Parquet metadata collection: things 
get corrupted. Run two Parquet metadata updates, things get really corrupted. 
Why? Storage is hard to get right when doing concurrent access and update.

This is not a foundation on which to build! Oracle would not survive a day if 
it corrupted system tables when two or more users did operations at the same 
time.

OK, Drill has a problem. The first step is to acknowledge it. The next is to 
look for solutions.

Either Drill adds a storage engine, or it stays agnostic, leaves storage to an 
external system, and makes stats storage a plugin. Drill already accesses data 
via a plugin. This is why Drill can read HDFS, S3, Alluxio, Kafka, JDBC, and on 
and on. This is a valuable, differentiating feature. It is, in fact, why Drill 
has a place in a world dominated by Hive, Spark and Impala.

For stats, this means that Drill does the query engine part: gathering stats on 
the one hand, and consuming stats for planning on the other. But it also means 
that Drill DOES NOT attempt to store the stats. Drill relies on an external 
system for that role.

Here is where the stats discussion aligns with the metadata (table schema) 
discussion. There are many ways to store metadata (including stats): in an 
RDBMS, in HMS, in files (done with MVCC or other concurrency control), in a 
key/value store, and so on. All of these are more robust than the broken Parquet 
metadata file implementation.
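
To make the "files done with concurrency control" option concrete, here is a 
minimal sketch in Python. Everything in it is an assumption for illustration: 
the function names, the JSON layout and the version counter are hypothetical 
and are not Drill's Parquet metadata format. It shows two ideas only: publish a 
new file with an atomic rename so readers never see a half-written file, and 
carry a version number so a stale writer can be detected. Note the limits: the 
version check and the rename are two separate steps, so two writers racing at 
the same instant can still overwrite one another. Closing that gap requires a 
lock or a compare-and-swap from the storage system, which is exactly why 
storage is hard. The atomic rename also assumes a local POSIX-style file 
system; object stores behave differently.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


class VersionConflict(Exception):
    """Raised when the caller's view of the stats file is stale."""


def read_stats(path: str) -> Tuple[int, Optional[dict]]:
    """Return (version, stats); a missing file is version 0 with no stats."""
    try:
        with open(path) as f:
            doc = json.load(f)
    except FileNotFoundError:
        return 0, None
    return doc["version"], doc["stats"]


def write_stats(path: str, stats: dict, expected_version: int) -> int:
    """Publish stats if the file is still at expected_version.

    NOTE: the check below and the rename are not one atomic step, so this
    detects stale writers but does not fully serialize racing writers.
    """
    current, _ = read_stats(path)
    if current != expected_version:
        raise VersionConflict(f"expected v{expected_version}, found v{current}")

    new_version = expected_version + 1
    directory = os.path.dirname(os.path.abspath(path))
    # Write the complete document to a temp file in the same directory...
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"version": new_version, "stats": stats}, f)
        f.flush()
        os.fsync(f.fileno())
    # ...then swap it into place; readers see the old file or the new one.
    os.replace(tmp, path)
    return new_version
```

A writer reads the current version, computes new stats, and passes that version 
back to write_stats; if someone else published in the meantime, the call fails 
instead of silently clobbering their update.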

So, if stats are to be stored by an external storage system, Drill's focus 
should be on APIs: how to hand stats from Drill to the store, and how to return 
them to Drill when requested during query planning. This is exactly the same 
model we take with data: Drill gives data to HDFS to store, then asks HDFS for 
the location of that data during planning.

This is the reason I suggested gathering stats as a query: you need not add any 
new API; you just issue a query using the existing Drill client. As you point out, 
perhaps Drill is in a better position to decide what stats should be gathered. 
Point taken. So, instead of using a query, define a stats API with both "put" 
and "get" interfaces.
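
As a rough illustration of what such a "put"/"get" API might look like, here is 
a sketch in Python (chosen for brevity; a real Drill interface would be Java). 
All names here (StatsStore, TableStats, InMemoryStatsStore, the row_count and 
column_ndv fields) are hypothetical and are not an existing Drill API. The 
point is only the split of responsibilities: the engine computes and consumes 
stats, and a plugin behind the interface decides where they live.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class TableStats:
    """Hypothetical stats record; the fields are illustrative only."""
    row_count: int
    # Estimated number of distinct values, keyed by column name.
    column_ndv: Dict[str, int] = field(default_factory=dict)


class StatsStore(ABC):
    """Hypothetical plugin interface for stats storage."""

    @abstractmethod
    def put(self, table: str, stats: TableStats) -> None:
        """Store stats that the engine gathered for a table."""

    @abstractmethod
    def get(self, table: str) -> Optional[TableStats]:
        """Return stored stats for planning, or None if there are none."""


class InMemoryStatsStore(StatsStore):
    """Toy reference implementation; an RDBMS, HMS or key/value store
    could sit behind the same two methods."""

    def __init__(self) -> None:
        self._stats: Dict[str, TableStats] = {}

    def put(self, table: str, stats: TableStats) -> None:
        self._stats[table] = stats

    def get(self, table: str) -> Optional[TableStats]:
        return self._stats.get(table)


# The engine gathers stats and hands them to the store...
store: StatsStore = InMemoryStatsStore()
store.put("dfs.orders", TableStats(row_count=1000, column_ndv={"cust_id": 120}))

# ...and the planner asks for them later, tolerating their absence.
planner_view = store.get("dfs.orders")
```

Because the planner only sees get() returning stats or nothing, swapping the 
in-memory store for a more robust backend would not change the engine side.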

Then, of course, you can certainly create a POC implementation of the storage 
engine based on the broken Parquet metadata file format. Since it is just a 
reference implementation, the fragility of the solution can be forgiven.

This is a very complex topic, and touches on Drill's place in the open source 
query engine world. Thanks much for having the patience to discuss the issues 
here on the dev list.

What do other people think about the storage question? Is the plugin approach 
the right one? Is there some other alternative the project should consider? 
Should Drill build its own?

Thanks,
- Paul

 

    On Friday, November 9, 2018, 3:11:11 PM PST, Gautam Parai <gpa...@mapr.com> 
wrote:  
 
 Hi Paul,

...  
