[
https://issues.apache.org/jira/browse/DRILL-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603016#comment-13603016
]
David Alves commented on DRILL-13:
----------------------------------
wrt the storage engine and push-down:
I don't know if pushing logical plan rewriting into the storage engine (SE)
implementation is the best idea, but I definitely agree with the goal (i.e., "In
the ultimate case (select into in a single system), the data would never
actually leave the datastore/storage engine.").
My main pain points are the following:
- engines would have to know more about logical plans than they need to
- logical plan processing logic would have to be replicated or reused in
multiple places
- a simplification triggered by one engine could enable further
simplifications by other engines, forcing us to cycle through the engines
involved multiple times
- this is really a capability that should be exercised outside the SE itself
(e.g. we wouldn't want to call a live SE proxy and make a remote call just to
simplify a plan)
Do you see an issue with generalizing this logic into a query pre-processor (I'm
not calling it an optimizer since there's really only one rule)?
I mean something simple, like having SEs return a StorageEngineCapability
such as:
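import java.util.List;
import com.google.common.collect.ImmutableList;

public class HBaseStorageEngine {
  // Advertise which logical operators this engine can execute internally.
  public Capabilities capabilities() {
    return new Capabilities() {
      public List<Class<? extends LogicalOperator>> internalProcessingAbility() {
        return ImmutableList.of(Filter.class, Project.class,
            PartialAggregation.class, Sink.class);
      }
      // ...
    };
  }
  // ...
}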
and then have the query pre-processor push nodes that sit *above* the SE's
scanner in the DAG down below it, if they match the SE's abilities.
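A minimal sketch of that push-down rule, assuming a simplified plan model in which a single-input chain of operators is listed from the scanner upward. `Op`, `Scan`, `Plan`, `pushDown` and the operator records are hypothetical stand-ins for illustration, not Drill's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PushDownSketch {
  // Hypothetical stand-ins for logical operators (not Drill's actual classes).
  interface Op {}
  record Filter(String condition) implements Op {}
  record Project(List<String> columns) implements Op {}
  record Sort(String key) implements Op {}

  // Hypothetical scan node; `absorbed` lists operators the engine runs internally.
  record Scan(String engine, List<Op> absorbed) {}

  // Result of the rewrite: the enlarged scan plus the operators still above it.
  record Plan(Scan scan, List<Op> above) {}

  // Walk upward from the scanner, moving each operator into the scan while the
  // engine advertises its class. Stop at the first unsupported operator, because
  // everything above it consumes that operator's output.
  static Plan pushDown(Scan scan, List<Op> chainAboveScan,
                       Set<Class<? extends Op>> abilities) {
    List<Op> absorbed = new ArrayList<>(scan.absorbed());
    int i = 0;
    while (i < chainAboveScan.size()
        && abilities.contains(chainAboveScan.get(i).getClass())) {
      absorbed.add(chainAboveScan.get(i));
      i++;
    }
    List<Op> remaining = List.copyOf(chainAboveScan.subList(i, chainAboveScan.size()));
    return new Plan(new Scan(scan.engine(), List.copyOf(absorbed)), remaining);
  }

  public static void main(String[] args) {
    Set<Class<? extends Op>> hbase = Set.of(Filter.class, Project.class);
    List<Op> chain = List.of(new Filter("a > 1"), new Project(List.of("a")), new Sort("a"));
    Plan plan = pushDown(new Scan("hbase", List.of()), chain, hbase);
    System.out.println(plan.scan().absorbed().size() + " pushed, "
        + plan.above().size() + " left"); // 2 pushed, 1 left
  }
}
```

Matching on the operator's class mirrors the `internalProcessingAbility()` idea: the engine only declares what it can do, and the pre-processor owns the rewrite.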
If all the data sources (leaves) come from the SE, and the SE supports Sink and
all the operators in between, then we'd push the whole plan below the engine's
scanner.
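The whole-plan condition can be sketched as a recursive check over the DAG, assuming a toy node model where operators are identified by name and only leaves carry an engine. `Node`, `subtreeSupported` and `canPushWholePlan` are hypothetical names for illustration:

```java
import java.util.List;
import java.util.Set;

public class WholePlanPushDown {
  // Hypothetical DAG node: operator kind, source engine (leaves only), and inputs.
  record Node(String kind, String engine, List<Node> inputs) {
    static Node scan(String engine) { return new Node("Scan", engine, List.of()); }
    static Node op(String kind, Node... inputs) { return new Node(kind, null, List.of(inputs)); }
  }

  // True when every leaf under `node` is a scan of `engine` and every
  // non-leaf operator is one of the engine's advertised abilities.
  static boolean subtreeSupported(Node node, String engine, Set<String> abilities) {
    if (node.inputs().isEmpty()) {
      return engine.equals(node.engine());
    }
    if (!abilities.contains(node.kind())) {
      return false;
    }
    for (Node input : node.inputs()) {
      if (!subtreeSupported(input, engine, abilities)) {
        return false;
      }
    }
    return true;
  }

  // The whole plan can move into the engine only if its root is a Sink that the
  // engine supports and everything beneath it passes the check above.
  static boolean canPushWholePlan(Node root, String engine, Set<String> abilities) {
    return root.kind().equals("Sink") && subtreeSupported(root, engine, abilities);
  }

  public static void main(String[] args) {
    Set<String> hbase = Set.of("Filter", "Project", "PartialAggregation", "Sink");
    Node plan = Node.op("Sink", Node.op("Filter", Node.scan("hbase")));
    System.out.println(canPushWholePlan(plan, "hbase", hbase)); // true
  }
}
```

A single leaf from another engine, or one unsupported operator anywhere in the DAG, is enough to keep the plan from being pushed down as a whole.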
Now, if I'm oversimplifying and there are cases I haven't thought of where this
doesn't work, then I think we might want the engine to provide a
MyStorageEnginePreProcessor as the entity that modifies the LogicalPlan.
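One way the fallback could look, sketched under heavy assumptions: an engine optionally exposes its own pre-processor, and the query pre-processor uses it when present, otherwise applying the generic capability-based rule. `LogicalPlan` here is a toy list of operator names, and `StorageEngine`, `StorageEnginePreProcessor` and `preprocess` are hypothetical names, not an existing Drill API:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.UnaryOperator;

public class PreProcessorHook {
  // Hypothetical, heavily simplified plan: an ordered list of operator names.
  record LogicalPlan(List<String> operators) {}

  // Hypothetical engine-supplied hook that may rewrite the plan however it likes.
  interface StorageEnginePreProcessor extends UnaryOperator<LogicalPlan> {}

  interface StorageEngine {
    // Engines happy with the generic capability-based rule return empty.
    default Optional<StorageEnginePreProcessor> preProcessor() {
      return Optional.empty();
    }
  }

  // Prefer the engine's own pre-processor; otherwise fall back to the generic
  // rule, which is passed in so this sketch stays self-contained.
  static LogicalPlan preprocess(StorageEngine engine, LogicalPlan plan,
                                UnaryOperator<LogicalPlan> genericRule) {
    return engine.preProcessor()
        .map(p -> p.apply(plan))
        .orElseGet(() -> genericRule.apply(plan));
  }

  public static void main(String[] args) {
    UnaryOperator<LogicalPlan> generic = p -> p; // stand-in: leave the plan unchanged
    StorageEngine plain = new StorageEngine() {};
    StorageEngine custom = new StorageEngine() {
      @Override
      public Optional<StorageEnginePreProcessor> preProcessor() {
        // Stand-in rewrite: collapse the whole plan into one engine-side operator.
        return Optional.of(p -> new LogicalPlan(List.of("EngineScan")));
      }
    };
    LogicalPlan plan = new LogicalPlan(List.of("Scan", "Filter", "Sink"));
    System.out.println(preprocess(plain, plan, generic).operators());  // [Scan, Filter, Sink]
    System.out.println(preprocess(custom, plan, generic).operators()); // [EngineScan]
  }
}
```

Keeping the hook optional means most engines only declare capabilities, and only the unusual ones take on plan-rewriting logic.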
wdyt?
> Storage Engine: Define Java Interface
> -------------------------------------
>
> Key: DRILL-13
> URL: https://issues.apache.org/jira/browse/DRILL-13
> Project: Apache Drill
> Issue Type: Task
> Reporter: Jacques Nadeau
> Assignee: Jacques Nadeau
>
> We're going to need to define a storage engine API. At a minimum, we'll need
> to generate a Java one. We will probably need to also create a CPP one.
> This task is for the former. Things that are likely to be included in the
> Java interface are: reader (scanner), writer, capabilities interface, schema
> interface, statistics interface, data layout and ordering.