Does Keith's input format apply the necessary Accumulo iterators to provide a 
sane view of the data to MapReduce?

And is what you're proposing an input format that works over RFiles only in 
cases where multiple versions of the same row/column aren't spread across 
multiple files, and where there are no delete markers, etc.?
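
To make the concern concrete, here is a minimal sketch of what a "sane view" 
means when the same row/column shows up in more than one file. It does not use 
any Accumulo classes: Entry, compare, and mergedView are hypothetical names 
made up for illustration, and the concatenate-and-sort approach is a 
simplification of the streaming merge a real reader would do. It assumes only 
the newest version of each cell should be visible and that a delete marker 
hides every entry at or below its timestamp.

```java
import java.util.ArrayList;
import java.util.List;

public class MergedViewSketch {
    // Hypothetical stand-in for a key/value entry; NOT the real Accumulo Key/Value classes.
    public static final class Entry {
        public final String row, column, value;
        public final long timestamp;
        public final boolean deleted;

        public Entry(String row, String column, long timestamp, boolean deleted, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.deleted = deleted;
            this.value = value;
        }
    }

    // Order by row, then column, then newest timestamp first;
    // at equal timestamps a delete marker sorts before a regular entry.
    public static int compare(Entry a, Entry b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        return Boolean.compare(b.deleted, a.deleted);
    }

    // Combine entries from several files, keep only the newest version of each
    // row/column, and hide the cell entirely if that newest entry is a delete.
    public static List<Entry> mergedView(List<List<Entry>> files) {
        List<Entry> all = new ArrayList<>();
        for (List<Entry> file : files) {
            all.addAll(file);
        }
        all.sort(MergedViewSketch::compare);

        List<Entry> visible = new ArrayList<>();
        String lastRow = null;
        String lastColumn = null;
        for (Entry e : all) {
            boolean sameCell = e.row.equals(lastRow) && e.column.equals(lastColumn);
            if (sameCell) {
                continue; // older version, or masked by a newer delete marker
            }
            lastRow = e.row;
            lastColumn = e.column;
            if (!e.deleted) {
                visible.add(e);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Entry> fileA = List.of(
                new Entry("r1", "cf:a", 1L, false, "old"),
                new Entry("r2", "cf:a", 1L, false, "stale"));
        List<Entry> fileB = List.of(
                new Entry("r1", "cf:a", 2L, false, "new"),
                new Entry("r2", "cf:a", 3L, true, ""),
                new Entry("r3", "cf:a", 1L, false, "only"));
        for (Entry e : mergedView(List.of(fileA, fileB))) {
            System.out.println(e.row + " " + e.column + " " + e.value);
        }
    }
}
```

A mapper handed fileA on its own would see the old r1 value and the deleted r2 
entry, which is the situation I'm asking about.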

On Mar 9, 2012, at 12:10 PM, John Vines (Created) (JIRA) wrote:

> RFile Input Format
> ------------------
> 
>                 Key: ACCUMULO-454
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-454
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client
>            Reporter: John Vines
>            Assignee: Billie Rinaldi
>             Fix For: 1.4.1
> 
> 
> We currently provide InputFormats for reading from Accumulo and output 
> formats for both direct input as well as outputting RFiles. But we provide no 
> mechanism for doing a mapreduce over existing RFiles, which may be useful for 
> optimizing data flow. We already have input formats which use RFiles directly 
> for input (The offline input format Keith just finished), but that still 
> relies on the Accumulo structure. We should go ahead and also create an input 
> format that just hits RFiles like the other standard file input formats.
> 
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA 
> administrators: 
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
> 
> 
