[ 
https://issues.apache.org/jira/browse/DRILL-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605889#comment-13605889
 ] 

David Alves commented on DRILL-13:
----------------------------------

My interest in BFs (Bloom filters) is not so much in advertising that the 
underlying engine supports them for generic purposes (even though that might 
be useful for some obscure optimization choices); my interest is in using them 
in large-scale joins.

My assumption is that large-scale joins will be composed of two parts: one 
local part below the SE layer that handles node-local data, and one part above 
the SE layer that coordinates the join across cluster nodes.

Now of course, in an ideal world we could have portable-format BFs that could 
be used in semi-joins across datasource formats, but that is much harder than 
what I'm proposing.

I'm proposing to start with a portable BF definition, while the BF itself 
would be opaque and could only be used for intra-datasource joins (across HBase 
nodes or across Cassandra nodes, but not between HBase and Cassandra).
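
A minimal Java sketch of that split, purely for illustration: a portable 
definition that any layer can read, and a filter whose bytes stay opaque. The 
names BloomFilterDefinition, OpaqueBloomFilter, engineId, numBits and 
numHashFunctions are all hypothetical and are not part of any existing Drill 
interface.

```java
import java.util.Objects;

/** Hypothetical portable description of a Bloom filter; all names are illustrative. */
final class BloomFilterDefinition {
    final String engineId;       // assumed identifier, e.g. "hbase" or "cassandra"
    final long numBits;          // size of the bit array
    final int numHashFunctions;  // number of hash functions applied per key

    BloomFilterDefinition(String engineId, long numBits, int numHashFunctions) {
        this.engineId = Objects.requireNonNull(engineId);
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
    }

    /** Filters are only interchangeable within one engine and one parameter set. */
    boolean isCompatibleWith(BloomFilterDefinition other) {
        return engineId.equals(other.engineId)
            && numBits == other.numBits
            && numHashFunctions == other.numHashFunctions;
    }
}

/** The filter itself: engine-specific bytes that upper layers never interpret. */
final class OpaqueBloomFilter {
    private final BloomFilterDefinition definition;
    private final byte[] payload;

    OpaqueBloomFilter(BloomFilterDefinition definition, byte[] payload) {
        this.definition = Objects.requireNonNull(definition);
        this.payload = payload.clone(); // defensive copy of the engine's bytes
    }

    BloomFilterDefinition definition() {
        return definition;
    }

    /** Returns a copy so callers can ship the bytes around but not mutate them. */
    byte[] payload() {
        return payload.clone();
    }
}
```

With something like this, the compatibility check is the only thing a caller 
needs to know in order to reject an HBase filter being handed to a Cassandra 
node.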

Now I agree with your definition of what the real use cases are, but the join 
coordination layer would still sit above the SE, which means that we could use 
the same code for both HBase and Cassandra at this layer, since we don't care 
about the BF format; the layer would only need access to the BF definition and 
to the BF itself in opaque form.
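
To make the layering concrete, here is a rough Java sketch of a coordinator 
that never looks inside the filter. FilterCapableNode, SemiJoinCoordinator and 
their methods are hypothetical names made up for this example, and rows are 
reduced to plain strings to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-node storage engine hook; not an actual Drill interface. */
interface FilterCapableNode {
    String engineId();                       // assumed identifier, e.g. "hbase"
    byte[] buildFilter(String joinColumn);   // engine-specific bytes, opaque to callers
    List<String> scanWithFilter(String joinColumn, byte[] opaqueFilter);
}

/** Engine-agnostic coordinator that sits above the storage engine layer. */
final class SemiJoinCoordinator {

    List<String> semiJoin(FilterCapableNode buildSide,
                          List<FilterCapableNode> probeSide,
                          String joinColumn) {
        // The filter is opaque, so it can only go to nodes of the same engine.
        for (FilterCapableNode node : probeSide) {
            if (!node.engineId().equals(buildSide.engineId())) {
                throw new IllegalArgumentException(
                    "cannot ship a " + buildSide.engineId()
                        + " filter to a " + node.engineId() + " node");
            }
        }

        // Build once on one side, then hand the same bytes to every probe node.
        byte[] filter = buildSide.buildFilter(joinColumn);
        List<String> rows = new ArrayList<>();
        for (FilterCapableNode node : probeSide) {
            rows.addAll(node.scanWithFilter(joinColumn, filter));
        }
        return rows;
    }
}
```

The coordinator only compares engine identifiers and forwards bytes, so the 
same class would serve HBase and Cassandra alike under these assumptions.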

Now I know this is certainly not a design priority, but I do think BF 
definition info would sit nicely with the partitioning info and would not 
require much.

In any case, I'll try to put together some code that illustrates what I'm 
saying, and maybe you can take a look and tell me what you think then.
                
> Storage Engine: Define Java Interface
> -------------------------------------
>
>                 Key: DRILL-13
>                 URL: https://issues.apache.org/jira/browse/DRILL-13
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>
> We're going to need to define a storage engine API.  At a minimum, we'll need 
> to generate a Java one.  We will probably need to also create a CPP one.  
> This task is for the former.  Things that are likely to be included in the 
> Java interface are: reader (scanner), writer, capabilities interface, schema 
> interface, statistics interface, data layout and ordering
