[ 
https://issues.apache.org/jira/browse/DRILL-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601352#comment-13601352
 ] 

Jacques Nadeau commented on DRILL-13:
-------------------------------------

- Storage engines have a local interface, accessible within the Drill daemon 
that is colocated with the underlying system's daemons, correct? 
   - Specifically, I mean that for NoSQL stores like Cassandra or HBase there 
will be a local daemon on each node that does inter-process communication with 
the underlying store and provides information on the local partitions so that 
the query planner can take that into account. 

>> Correct


- We will need a meta store for ad-hoc schema matching/schema caching, correct? 
  - While Cassandra has a schema that is easy to use and read when queried with 
CQL3, we would probably benefit from storing the schema of certain HBase tables 
so that their values can be returned in some form other than byte[]s. The user 
would be responsible for maintaining this. 

>> Yes(ish).  A couple of things differ from the traditional vision.  1) We 
>> want to support pure schemaless queries, e.g. point at an HBase zk address 
>> and do select * from tablename.  This means the entire infrastructure should 
>> work with minimal schema management.  (Especially true when you think of 
>> future sources like MongoDB.)  2) Our hope is that schema management comes 
>> out of views on top of schemaless queries (as opposed to something more like 
>> external table definitions) so that query-level data provenance is very 
>> clear for analysts.  3) When we do store schema data, hopefully we can 
>> leverage the Hive metastore (as well as support data sources that are 
>> already configured in the Hive metastore without much complexity).


>>> RE: Push down.  In the ideal world, I've been thinking that the storage 
>>> engine interface should be able to receive a logical plan and return an 
>>> updated logical plan with a more complex scan node and a simplified 
>>> plan.  Basically, the storage engine would use the opaque selection 
>>> property to encapsulate the removed logical nodes that it can take care 
>>> of internally.  The storage engine implementation would thus be internally 
>>> responsible for converting the portions of the logical plan that it took 
>>> ownership of into whatever its native formats are.  In the ultimate 
>>> case (a select into within a single system), the data would never actually 
>>> leave the datastore/storage engine.
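
To make the push-down idea concrete, here is a minimal Java sketch. It is 
not Drill's API: LogicalNode, LogicalPlan, PushDownEngine and 
FilterCapableEngine are hypothetical names, the plan is simplified to a linear 
list of operators starting at the scan, and the "opaque selection" is just a 
string that only the engine is assumed to understand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical logical operator: a kind ("scan", "filter", "project") plus an argument. */
final class LogicalNode {
    final String kind;
    final String arg; // for a scan, this plays the role of the opaque selection
    LogicalNode(String kind, String arg) { this.kind = kind; this.arg = arg; }
}

/** Hypothetical linear plan: operators listed from the scan upward. */
final class LogicalPlan {
    final List<LogicalNode> nodes;
    LogicalPlan(List<LogicalNode> nodes) {
        this.nodes = Collections.unmodifiableList(new ArrayList<>(nodes));
    }
}

/** Assumed shape of the push-down hook: logical plan in, rewritten logical plan out. */
interface PushDownEngine {
    LogicalPlan pushDown(LogicalPlan plan);
}

/** Toy engine that can evaluate filters natively and nothing else. */
final class FilterCapableEngine implements PushDownEngine {
    @Override
    public LogicalPlan pushDown(LogicalPlan plan) {
        // Only plans that start with a scan can be rewritten.
        if (plan.nodes.isEmpty() || !plan.nodes.get(0).kind.equals("scan")) {
            return plan;
        }
        StringBuilder selection = new StringBuilder(plan.nodes.get(0).arg);
        int i = 1;
        // Absorb every filter sitting directly above the scan into the selection.
        while (i < plan.nodes.size() && plan.nodes.get(i).kind.equals("filter")) {
            selection.append('[').append(plan.nodes.get(i).arg).append(']');
            i++;
        }
        List<LogicalNode> rewritten = new ArrayList<>();
        // The scan node becomes more complex ...
        rewritten.add(new LogicalNode("scan", selection.toString()));
        // ... and the rest of the plan is left untouched for the execution engine.
        rewritten.addAll(plan.nodes.subList(i, plan.nodes.size()));
        return new LogicalPlan(rewritten);
    }
}
```

With this toy engine, a scan/filter/project plan comes back as a two-node plan: 
a scan whose selection now carries the filter, followed by the project. A real 
plan would be a DAG rather than a list, and the engine would translate the 
absorbed nodes into its native query format.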

                
> Storage Engine: Define Java Interface
> -------------------------------------
>
>                 Key: DRILL-13
>                 URL: https://issues.apache.org/jira/browse/DRILL-13
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>
> We're going to need to define a storage engine API.  At a minimum, we'll need 
> to generate a Java one.  We will probably need to also create a CPP one.  
> This task is for the former.  Things that are likely to be included in the 
> Java interface are: reader (scanner), writer, capabilities interface, schema 
> interface, statistics interface, data layout and ordering.
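
As a rough illustration of how the pieces listed in the issue description 
might fit together, here is a hedged Java sketch. Every name in it 
(StorageEngineSketch, RecordReader, RecordWriter, Capability, InMemoryEngine) 
is hypothetical and is not taken from Drill. Records are raw byte[]s, and the 
toy implementation ignores the table name and keeps a single in-memory table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical capability flags an engine could advertise to the planner. */
enum Capability { READ, WRITE, FILTER_PUSH_DOWN }

/** Reader (scanner): iterates over raw records. */
interface RecordReader extends Iterator<byte[]> {}

/** Writer: accepts raw records. */
interface RecordWriter { void write(byte[] record); }

/** One possible grouping of the components named in the issue description. */
interface StorageEngineSketch {
    RecordReader reader(String table);
    RecordWriter writer(String table);
    Set<Capability> capabilities();
    Optional<List<String>> schema(String table); // empty for schemaless sources
    long estimatedRowCount(String table);        // statistics
    List<String> sortOrder(String table);        // data layout and ordering
}

/** Toy implementation: one in-memory table; the table name is ignored. */
final class InMemoryEngine implements StorageEngineSketch {
    private final List<byte[]> rows = new ArrayList<>();

    @Override
    public RecordReader reader(String table) {
        // Iterate over a snapshot so later writes do not invalidate the reader.
        final Iterator<byte[]> it = new ArrayList<>(rows).iterator();
        return new RecordReader() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public byte[] next() { return it.next(); }
        };
    }

    @Override
    public RecordWriter writer(String table) { return rows::add; }

    @Override
    public Set<Capability> capabilities() {
        return EnumSet.of(Capability.READ, Capability.WRITE);
    }

    @Override
    public Optional<List<String>> schema(String table) { return Optional.empty(); }

    @Override
    public long estimatedRowCount(String table) { return rows.size(); }

    @Override
    public List<String> sortOrder(String table) { return Collections.emptyList(); }
}
```

Returning an empty Optional from schema() is one way to model a schemaless 
source; the planner can check capabilities() before attempting push-down.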

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
