[
https://issues.apache.org/jira/browse/BLUR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475733#comment-13475733
]
Aaron McCurry commented on BLUR-18:
-----------------------------------
1. The new Thrift API is incomplete, but assume we add in a few methods
from the 0.1 API such as shardLayout(tableName), which returns a map of shards to
servers (shard servers serve 0 or more shards per table). With that
information, one input split could correspond to one shard. For example, if
there is a table with 1000 shards being served on 100 machines, there would be 1000
splits. So really you would only need to care about the number of shards per table.
2. No, a shard server potentially serves many shards of a given table. And if
more servers are added, or some of the servers fail, the indexes will logically
move to a new server.
3. My first thought is the session gets created in the MR driver program and
executes the query against one of the controller servers. Then the splits read
the results directly from the shard servers.
4. Yes, but I'm going to change it back to Record. There was enough confusion
when I was chatting with people on my project for me to realize that it was a
bad name. :)
5. At this point both are being created. The idea is that the BlurTuple service
is to be used by external clients, so simplicity/ease of use is the driver for
that API. The BlurShard service is to be used by internal code such as the
controllers and the MR system. In the past the shard and controller servers
presented the same API, but now, since we're in a state of change, I'm not sure
if that's necessary going forward. And for that matter, I'm not totally sold on
keeping the internal API Thrift based. It would probably be easier to provide
a more MR friendly API to the MR programs.
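To illustrate point 1, here is a minimal sketch of how an InputFormat's split calculation could be driven by a shardLayout(tableName)-style map. The ShardSplit class and splitsFromLayout method are hypothetical stand-ins, not the actual Blur or Hadoop API; the point is only that one split is emitted per shard, however many shards each server hosts.

```java
import java.util.*;

// Hypothetical sketch: derive MapReduce input splits from a
// shard -> server layout map, as returned by a shardLayout(tableName)
// style call. One split per shard, so 1000 shards on 100 machines
// yields 1000 splits.
public class ShardSplitSketch {

    // Minimal stand-in for an input split: a shard and the server
    // currently hosting it (so splits can read directly from shard servers).
    static final class ShardSplit {
        final String shard;
        final String server;
        ShardSplit(String shard, String server) {
            this.shard = shard;
            this.server = server;
        }
    }

    // Emit one split per shard, regardless of how many shards
    // a given server happens to serve.
    static List<ShardSplit> splitsFromLayout(Map<String, String> layout) {
        List<ShardSplit> splits = new ArrayList<>();
        for (Map.Entry<String, String> e : layout.entrySet()) {
            splits.add(new ShardSplit(e.getKey(), e.getValue()));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 4 shards on 2 servers -> 4 splits.
        Map<String, String> layout = new LinkedHashMap<>();
        layout.put("shard-0", "server-a");
        layout.put("shard-1", "server-a");
        layout.put("shard-2", "server-b");
        layout.put("shard-3", "server-b");
        System.out.println(splitsFromLayout(layout).size()); // prints 4
    }
}
```

A real implementation would fetch the layout over Thrift from a controller in the driver program, then have each split open its records against the named shard server.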
> Rework the MapReduce Library to implement Input/OutputFormats
> -------------------------------------------------------------
>
> Key: BLUR-18
> URL: https://issues.apache.org/jira/browse/BLUR-18
> Project: Apache Blur
> Issue Type: Improvement
> Reporter: Aaron McCurry
>
> Currently the only way to implement indexing is to use the BlurReducer. A
> better way to implement this would be to support Hadoop input/output formats
> in both the new and old APIs. This would allow an easier integration with
> other Hadoop projects such as Hive and Pig.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira