[
https://issues.apache.org/jira/browse/DRILL-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605889#comment-13605889
]
David Alves commented on DRILL-13:
----------------------------------
My interest in Bloom filters (BFs) is not so much in advertising that the
underlying engine supports them for generic purposes (even though that might be
useful for some obscure optimization choices); my interest is in using them in
large-scale joins.
My assumption is that large-scale joins will be composed of two parts: a local
part below the SE layer that handles node-local data, and a part above the SE
layer that coordinates the join across cluster nodes.
Now, of course, in an ideal world we could have portable-format BFs that could
be used in semi-joins across datasource formats, but that is much harder than
what I'm proposing.
I'm proposing to start with a portable BF definition, while the BF itself
would be opaque and could only be used for intra-datasource joins (across HBase
nodes or across Cassandra nodes, but not between HBase and Cassandra).
Now, I agree with your definition of what the real use cases are, but the join
coordination layer would still sit above the SE, which means that we could use
the same code for both HBase and Cassandra at this layer, since we don't care
about the BF format. The layer would, however, need access to the BF definition
and to the BF itself in opaque form.
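As a rough sketch of that split, something like the following Java could work.
Every name here (BloomFilterDefinition, OpaqueBloomFilter, BloomFilterSource,
JoinCoordinator and their methods) is hypothetical and made up for illustration;
none of it is an existing Drill, HBase or Cassandra API. The point is only that
the coordinator reads the portable definition and never interprets the payload.

```java
import java.util.Objects;

/** Portable, engine-independent description of a Bloom filter (hypothetical type). */
final class BloomFilterDefinition {
    final String engineId;       // assumed identifier, e.g. "hbase" or "cassandra"
    final long numBits;          // size of the filter's bit array
    final int numHashFunctions;  // number of hash functions applied per key

    BloomFilterDefinition(String engineId, long numBits, int numHashFunctions) {
        this.engineId = Objects.requireNonNull(engineId);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    /** Filters are interchangeable only within one engine and with equal parameters. */
    boolean isCompatibleWith(BloomFilterDefinition other) {
        return engineId.equals(other.engineId)
                && numBits == other.numBits
                && numHashFunctions == other.numHashFunctions;
    }
}

/** The filter itself: a definition plus bytes that only the owning engine understands. */
final class OpaqueBloomFilter {
    final BloomFilterDefinition definition;
    private final byte[] payload;

    OpaqueBloomFilter(BloomFilterDefinition definition, byte[] payload) {
        this.definition = Objects.requireNonNull(definition);
        this.payload = payload.clone();  // defensive copy, callers cannot mutate it
    }

    /** Returns a copy of the engine-specific bytes for shipping between nodes. */
    byte[] payload() {
        return payload.clone();
    }
}

/** Hypothetical hook a storage engine would implement below the SE layer. */
interface BloomFilterSource {
    BloomFilterDefinition bloomFilterDefinition();

    OpaqueBloomFilter buildFilter(String joinColumn);
}

/** Join coordination above the SE layer: looks at definitions, never at payload bytes. */
final class JoinCoordinator {
    static boolean canShip(OpaqueBloomFilter filter, BloomFilterDefinition target) {
        return filter.definition.isCompatibleWith(target);
    }
}
```

With this shape, a filter built on one HBase node could be shipped to another
HBase node with a matching definition, while the same check would reject an
attempt to hand it to a Cassandra node.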
Now, I know this is certainly not a design priority, but I do think BF
definition info would sit nicely with the partitioning info and would not
require much.
In any case I'll try and output some code that illustrates what I'm saying and
maybe you can take a look and tell me what you think then.
> Storage Engine: Define Java Interface
> -------------------------------------
>
> Key: DRILL-13
> URL: https://issues.apache.org/jira/browse/DRILL-13
> Project: Apache Drill
> Issue Type: Task
> Reporter: Jacques Nadeau
> Assignee: Jacques Nadeau
>
> We're going to need to define a storage engine API. At a minimum, we'll need
> to generate a Java one. We will probably need to also create a CPP one.
> This task is for the former. Things that are likely to be included in the
> Java interface are: reader (scanner), writer, capabilities interface, schema
> interface, statistics interface, data layout and ordering.