[ 
https://issues.apache.org/jira/browse/BLUR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475733#comment-13475733
 ] 

Aaron McCurry commented on BLUR-18:
-----------------------------------

1. The new Thrift API is incomplete, but assume we add in a few methods from 
the 0.1 API, such as shardLayout(tableName), which returns a map of shards to 
servers (shard servers serve 0 or more shards per table).  With that 
information, one input split could correspond to one shard.  For example, if 
there is a table with 1000 shards being served on 100 machines, there would be 
1000 splits.  So really you would only need to care about the number of shards 
per table.

2. No, a shard server serves potentially many shards of a given table.  And if 
more servers are added, or some of the servers fail, the indexes will 
logically move to a new server.

3. My first thought is the session gets created in the MR driver program and 
executes the query against one of the controller servers.  Then the splits read 
the results directly from the shard servers.

4. Yes, but I'm going to change it back to Record.  There was enough confusion 
on my project when I was chatting with people to realize that it was a bad 
name.  :)

5. At this point both are being created.  The idea is that the BlurTuple 
service is to be used by external clients, so simplicity/ease of use is the 
driver for this API.  The BlurShard is to be used by internal code such as the 
controllers and the MR system.  In the past the shard servers and controller 
servers presented the same API, but now, since we are in a state of change, 
I'm not sure if 
that's necessary going forward.  And for that matter I'm not totally sold on 
keeping the internal API Thrift based.  It would probably be easier to provide 
a more MR friendly API to the MR programs.
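
To make point 1 concrete, here is a minimal sketch of turning a shard layout 
into one split per shard.  It is not Blur code: ShardSplitSketch, ShardSplit 
and splitsFor are hypothetical names, and the only thing assumed about 
shardLayout(tableName) is that it yields a map of shard name to server, as 
described above.  The sketch avoids Hadoop types so it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardSplitSketch {

    /** Hypothetical split descriptor: one per shard, with the serving host as a hint. */
    public static final class ShardSplit {
        public final String table;
        public final String shard;
        public final String server;

        public ShardSplit(String table, String shard, String server) {
            this.table = table;
            this.shard = shard;
            this.server = server;
        }
    }

    /**
     * Builds one split per shard from a shard-to-server map (the assumed
     * shape of the shardLayout(tableName) result).
     */
    public static List<ShardSplit> splitsFor(String table, Map<String, String> shardLayout) {
        List<ShardSplit> splits = new ArrayList<>();
        // Copy into a TreeMap so the shard order, and therefore the split order, is stable.
        for (Map.Entry<String, String> e : new TreeMap<>(shardLayout).entrySet()) {
            splits.add(new ShardSplit(table, e.getKey(), e.getValue()));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 1000 shards spread over 100 servers, as in the example in point 1.
        Map<String, String> layout = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            layout.put(String.format("shard-%04d", i), "server-" + (i % 100));
        }
        List<ShardSplit> splits = splitsFor("table1", layout);
        System.out.println(splits.size() + " splits");
    }
}
```

In a Hadoop InputFormat this logic would presumably sit in getSplits(), with 
the server name returned from InputSplit.getLocations().  Because shards can 
move between servers (point 2), the server recorded in a split is best treated 
as a locality hint and not as a guarantee.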
                
> Rework the MapReduce Library to implement Input/OutputFormats
> -------------------------------------------------------------
>
>                 Key: BLUR-18
>                 URL: https://issues.apache.org/jira/browse/BLUR-18
>             Project: Apache Blur
>          Issue Type: Improvement
>            Reporter: Aaron McCurry
>
> Currently the only way to implement indexing is to use the BlurReducer.  A 
> better way to implement this would be to support Hadoop input/output formats 
> in both the new and old APIs.  This would allow easier integration with 
> other Hadoop projects such as Hive and Pig.

