[
https://issues.apache.org/jira/browse/BLUR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475938#comment-13475938
]
Aaron McCurry commented on BLUR-18:
-----------------------------------
After thinking about it, we should probably just run one input split per server
instead of per shard. That way a single MR program won't overwhelm the shard
cluster. In the future we may want to allow this to be configurable.
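The "configurable in the future" part could be as simple as a splits-per-server multiplier. A minimal, hypothetical sketch (the method and any config key name are assumptions, not part of the current API):

```java
import java.util.*;

class SplitPlanner {
  // Hypothetical: expand the one-split-per-server plan by a configurable
  // factor (e.g. a future "blur.input.splits.per.server" setting). With a
  // factor of 1 this degenerates to exactly one split per shard server.
  static List<String> planSplits(List<String> shardServers, int splitsPerServer) {
    List<String> plan = new ArrayList<String>();
    for (String server : shardServers) {
      for (int i = 0; i < splitsPerServer; i++) {
        plan.add(server);
      }
    }
    return plan;
  }
}
```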
I think you are on the right track. I modified your code and added a little of
my own to describe what I was thinking.
Driver program:

  public static void main(String[] args) throws Exception {
    // This code will execute against the blur controllers.
    Configuration conf = new Configuration();
    // 'client' is the Blur controller client (creation omitted here).
    Session session = client.openReadSession();
    QuerySession querySession = client.executeQuery(session, "select * from table1");
    Job job = BlurInputFormat.configureJob(conf, querySession);
    // ... run the job ...
    client.closeReadSession(session);
  }
  public List<InputSplit> getSplits(JobContext context) throws IOException,
      InterruptedException {
    try {
      QuerySession querySession = BlurInputFormat.readQuerySession(context);
      List<InputSplit> splits = new ArrayList<InputSplit>();
      // One split per shard server, not per shard.
      List<String> shardServerConnections = getShardServers(querySession);
      for (String shardServerConnection : shardServerConnections) {
        splits.add(new BlurSplit(shardServerConnection, querySession));
      }
      return splits;
    } catch (...) {
      // throw exceptions
    }
  }
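The BlurSplit referenced in getSplits() above is not defined yet; here is a minimal sketch of what it might carry. This is an assumption on my part: in real code it would extend org.apache.hadoop.mapreduce.InputSplit and implement Writable, and it might hold the QuerySession object rather than just an id. The Hadoop types are omitted so the sketch stands alone.

```java
// Hypothetical sketch of BlurSplit. In the real library this would extend
// org.apache.hadoop.mapreduce.InputSplit and implement
// org.apache.hadoop.io.Writable; only the data-carrying and serialization
// shape is shown here.
class BlurSplit {
  private String shardServerConnection; // e.g. "host1:40020"
  private String querySessionId;        // identifies the server-side query session (assumed field)

  BlurSplit() {
    // No-arg constructor so the framework can deserialize the split.
  }

  BlurSplit(String shardServerConnection, String querySessionId) {
    this.shardServerConnection = shardServerConnection;
    this.querySessionId = querySessionId;
  }

  // Corresponds to InputSplit.getLocations(): hint the scheduler to run
  // the map task on (or near) the shard server itself.
  String[] getLocations() {
    int sep = shardServerConnection.indexOf(':');
    String host = sep < 0 ? shardServerConnection : shardServerConnection.substring(0, sep);
    return new String[] { host };
  }

  // Corresponds to InputSplit.getLength(): the size is unknown up front.
  long getLength() {
    return 0L;
  }

  // Corresponds to Writable.write(DataOutput).
  void write(java.io.DataOutput out) throws java.io.IOException {
    out.writeUTF(shardServerConnection);
    out.writeUTF(querySessionId);
  }

  // Corresponds to Writable.readFields(DataInput).
  void readFields(java.io.DataInput in) throws java.io.IOException {
    shardServerConnection = in.readUTF();
    querySessionId = in.readUTF();
  }

  String getShardServerConnection() { return shardServerConnection; }
  String getQuerySessionId() { return querySessionId; }
}
```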
  private List<String> getShardServers(QuerySession querySession) {
    // Add to the QuerySession object which shard cluster the query is
    // executing against (we will need to add this into the thrift API),
    // then look up the shard servers from the blur controller.
  }
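Since the plan is one split per shard server rather than per shard, the core of getShardServers() is deduplicating the shard layout the controller returns. A minimal sketch, assuming the layout arrives as a shard-to-server map (the actual thrift call is still to be added, so the input shape here is an assumption):

```java
import java.util.*;

class ShardServerDedup {
  // Collapse a shard -> shard-server layout into the distinct list of
  // shard server connection strings, one entry per server.
  static List<String> getShardServers(Map<String, String> shardLayout) {
    // LinkedHashSet removes duplicate servers while keeping a stable order.
    return new ArrayList<String>(new LinkedHashSet<String>(shardLayout.values()));
  }
}
```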
> Rework the MapReduce Library to implement Input/OutputFormats
> -------------------------------------------------------------
>
> Key: BLUR-18
> URL: https://issues.apache.org/jira/browse/BLUR-18
> Project: Apache Blur
> Issue Type: Improvement
> Reporter: Aaron McCurry
>
> Currently the only way to implement indexing is to use the BlurReducer. A
> better way would be to support Hadoop input/output formats in both the new
> and old APIs. This would allow easier integration with other Hadoop
> projects such as Hive and Pig.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira