[ https://issues.apache.org/jira/browse/DRILL-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603016#comment-13603016 ]

David Alves commented on DRILL-13:
----------------------------------

Wrt the storage engine and push-down:

I don't know if pushing logical plan rewriting into the storage engine impl is
the best idea, but I definitely agree with the goal (i.e., "In the ultimate
case (select into in single system), the data would never actually leave the
datastore/storage engine.").
My main pain points are the following:
- engines would have to know more about logical plans than they need to
- logical plan processing logic would have to be replicated or invoked in
multiple places
- a simplification triggered by one engine could enable further
simplifications by other engines, forcing us to cycle through the participating
engines multiple times
- it's really a capability that can be described outside the SE itself (e.g., we
wouldn't want to call a live SE proxy, making a remote call just to simplify a
plan)
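To make the third point concrete: if every engine rewrites the plan on its own,
one engine's rewrite can open up an opportunity for another, so the driver has
to keep looping over all engines until a full pass changes nothing. A minimal
sketch, assuming a hypothetical PlanRewriter interface and a plan modelled as a
plain list of operator names (neither is an existing Drill API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-engine rewriter (not a Drill API): returns the rewritten plan,
// or a plan equal to its input if it had nothing to do.
interface PlanRewriter {
  List<String> rewrite(List<String> plan);
}

final class FixedPointDriver {
  // Runs every engine's rewriter repeatedly until one full pass leaves the plan
  // unchanged. Mutates 'plan' in place and returns the number of passes,
  // including the final pass that changed nothing.
  static int rewriteToFixedPoint(List<String> plan, List<PlanRewriter> engines) {
    int passes = 0;
    boolean changed = true;
    while (changed) {
      changed = false;
      passes++;
      for (PlanRewriter engine : engines) {
        // Hand each engine a copy so an in-place edit cannot hide a change.
        List<String> next = engine.rewrite(new ArrayList<>(plan));
        if (!next.equals(plan)) {
          plan.clear();
          plan.addAll(next);
          changed = true;
        }
      }
    }
    return passes;
  }
}
```

With one engine turning X into Y and another turning Y into Z, visiting them in
the order [Y->Z, X->Y] needs three passes instead of two, which is exactly the
kind of repeated cycling a single central pre-processor would avoid.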

Do you see an issue with generalizing this logic into a query pre-processor (I'm
not calling it an optimizer since there's really just the one rule)?
I mean something simple, like having SEs return a StorageEngineCapability
such as:

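import java.util.List;
import com.google.common.collect.ImmutableList;

public class HBaseStorageEngine {

  public Capabilities capabilities() {
    return new Capabilities() {

      // Logical operators this engine can evaluate itself, i.e. the ones that
      // may be pushed below its scanner.
      @Override
      public List<Class<? extends LogicalOperator>> internalProcessingAbility() {
        return ImmutableList.<Class<? extends LogicalOperator>>of(
            Filter.class, Project.class, PartialAggregation.class, Sink.class);
      }

      // ...
    };
  }

  // ...
}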

and then have the query pre-processor push the nodes that sit *above* the SE's
scanner in the DAG down below it, if they match the SE's capabilities.
If all the data sources (leaves) come from the SE, and the SE supports Sink and
all the operators in between, then we'd push the whole plan below the engine's
scanner.
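Roughly, the pre-processor rule above could look like the following. This is a
minimal sketch, assuming a single scan with a linear chain of operators above it
(a real plan is a DAG with possibly several leaves) and a hypothetical OpKind
enum standing in for the LogicalOperator classes:

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for logical operator classes (not Drill's actual types).
enum OpKind { FILTER, PROJECT, PARTIAL_AGGREGATION, JOIN, SINK }

final class PushDown {
  // Operators absorbed into the scan, and operators that stay above it.
  static final class Result {
    final List<OpKind> pushedIntoScan = new ArrayList<>();
    final List<OpKind> remainingAbove = new ArrayList<>();

    // The whole plan runs inside the engine only if nothing is left above the
    // scan and the topmost absorbed operator is the Sink.
    boolean wholePlanPushed() {
      return remainingAbove.isEmpty()
          && !pushedIntoScan.isEmpty()
          && pushedIntoScan.get(pushedIntoScan.size() - 1) == OpKind.SINK;
    }
  }

  // Walks upward from the scan (index 0 is the operator directly above it) and
  // moves operators below the scanner while the engine declares support for them.
  static Result pushDown(List<OpKind> chainAboveScan, Set<OpKind> engineCapabilities) {
    Result result = new Result();
    boolean blocked = false;
    for (OpKind op : chainAboveScan) {
      if (!blocked && engineCapabilities.contains(op)) {
        result.pushedIntoScan.add(op);
      } else {
        // An unsupported operator pins itself and everything above it in place.
        blocked = true;
        result.remainingAbove.add(op);
      }
    }
    return result;
  }
}
```

With an HBase-like capability set EnumSet.of(FILTER, PROJECT,
PARTIAL_AGGREGATION, SINK), the chain FILTER -> PROJECT -> SINK is absorbed
entirely, whereas FILTER -> JOIN -> PROJECT only absorbs the FILTER, because the
unsupported JOIN keeps PROJECT above the scan.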

Now, if I'm oversimplifying and there are cases I haven't thought of where this
doesn't work, then we might want to have the engine provide a
MyStorageEnginePreProcessor as the entity that modifies the LogicalPlan.
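That fallback could be a small hook that the generic pre-processor defers to
when an engine supplies one. A minimal sketch, assuming a hypothetical
StorageEnginePreProcessor interface (in the role of MyStorageEnginePreProcessor)
and a plan modelled as a list of operator names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical engine-supplied hook (not a Drill API) for rewrites that a
// declarative capability list cannot express.
interface StorageEnginePreProcessor {
  // Returns the rewritten plan, or the input unchanged if there is nothing to do.
  List<String> preProcess(List<String> plan);
}

final class QueryPreProcessor {
  // Use the engine's own pre-processor when it provides one; otherwise fall back
  // to the generic capability-driven rule.
  static List<String> run(List<String> plan,
                          Optional<StorageEnginePreProcessor> enginePreProcessor,
                          StorageEnginePreProcessor genericRule) {
    return enginePreProcessor.orElse(genericRule).preProcess(new ArrayList<>(plan));
  }
}
```

Keeping the generic rule as the default means most engines only declare
capabilities, and only the unusual ones pay for custom plan-rewriting code.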

wdyt?



                
> Storage Engine: Define Java Interface
> -------------------------------------
>
>                 Key: DRILL-13
>                 URL: https://issues.apache.org/jira/browse/DRILL-13
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>
> We're going to need to define a storage engine API.  At a minimum, we'll need 
> to generate a Java one.  We will probably need to also create a CPP one.  
> This task is for the former.  Things that are likely to be included in the
> Java interface are: reader (scanner), writer, capabilities interface, schema 
> interface, statistics interface, data layout and ordering

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
