[ https://issues.apache.org/jira/browse/BLUR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490135#comment-13490135 ]

Aaron McCurry commented on BLUR-18:
-----------------------------------

I have created a remote branch named 0.2-dev-mr-formats.  I also think that we
need to create some new Writable types for the InputFormat.  I'm thinking of
DocLocation (containing the shard index and the document id) as the key, and a
Document Writable object carrying the Thrift Document data as the value; from
there we can work on the InputSplits.  I know I have been back and forth on
this, but I think that we need to make the splits per shard rather than per
server.  My reasoning is that if a shard server fails during a MapReduce job,
it will be easier to rerun each shard than to rerun each server, because the
shards from the failed shard server will be spread evenly across the
remaining shard servers.
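To make the proposed types a little more concrete, here is a minimal sketch of a DocLocation key and a per-shard split. To stay self-contained it does not depend on Hadoop; instead it declares a local WritableLike interface that mirrors the write/readFields contract of Hadoop's Writable. The ShardSplit class, its fields, the perShard helper and the WritableBytes helper are hypothetical names for illustration, not Blur's actual API. The Document value Writable is left out because the fields of the Thrift Document are not spelled out here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local stand-in mirroring the two methods of Hadoop's org.apache.hadoop.io.Writable,
// so this sketch compiles without Hadoop on the classpath.
interface WritableLike {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical key type: locates one document by shard index and document id.
class DocLocation implements WritableLike {
  private int shardIndex;
  private int docId;
  // No-arg constructor: Hadoop creates Writables reflectively, then calls readFields.
  DocLocation() {}
  DocLocation(int shardIndex, int docId) { this.shardIndex = shardIndex; this.docId = docId; }

  int getShardIndex() { return shardIndex; }
  int getDocId() { return docId; }

  @Override public void write(DataOutput out) throws IOException {
    out.writeInt(shardIndex);
    out.writeInt(docId);
  }

  @Override public void readFields(DataInput in) throws IOException {
    shardIndex = in.readInt();
    docId = in.readInt();
  }
}

// Hypothetical split type: one split per shard rather than one per shard server.
class ShardSplit implements WritableLike {
  private String shardName = "";
  private String host = ""; // server currently serving the shard (locality hint only)
  ShardSplit() {}
  ShardSplit(String shardName, String host) { this.shardName = shardName; this.host = host; }

  String getShardName() { return shardName; }
  String getHost() { return host; }

  @Override public void write(DataOutput out) throws IOException {
    out.writeUTF(shardName);
    out.writeUTF(host);
  }

  @Override public void readFields(DataInput in) throws IOException {
    shardName = in.readUTF();
    host = in.readUTF();
  }

  // Builds one split per shard, ordered by shard name, from a shard -> host layout.
  static List<ShardSplit> perShard(Map<String, String> shardToHost) {
    List<ShardSplit> splits = new ArrayList<>();
    for (Map.Entry<String, String> e : new TreeMap<>(shardToHost).entrySet()) {
      splits.add(new ShardSplit(e.getKey(), e.getValue()));
    }
    return splits;
  }
}

// Helpers that round-trip any WritableLike through a byte array.
final class WritableBytes {
  static byte[] toBytes(WritableLike w) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      w.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray(); // read after close, so the stream has been flushed
  }

  static <T extends WritableLike> T fromBytes(byte[] data, T target) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      target.readFields(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return target;
  }
}
```

Because each split in this sketch names exactly one shard, a task that fails along with its shard server could be retried for that single shard once the shard is reopened on another server; the host field would only serve as a locality hint rather than fixing the split to one server.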

I should have some more time tomorrow to discuss and
rework/implement/review.  Thanks for the good start!
                
> Rework the MapReduce Library to implement Input/OutputFormats
> -------------------------------------------------------------
>
>                 Key: BLUR-18
>                 URL: https://issues.apache.org/jira/browse/BLUR-18
>             Project: Apache Blur
>          Issue Type: Improvement
>            Reporter: Aaron McCurry
>             Fix For: 0.2.0
>
>         Attachments: 0001-BLUR-ID-18-Created-New-Version-of-Files.patch
>
>
> Currently the only way to implement indexing is to use the BlurReducer.  A 
> better way to implement this would be to support Hadoop Input/OutputFormats 
> in both the new and old APIs.  This would allow easier integration with 
> other Hadoop projects such as Hive and Pig.

