Does Keith's input format apply the necessary Accumulo iterators to provide a sane view of the data to MapReduce?
And what you're proposing is an input format that works over RFiles where perhaps multiple versions of the same row/column don't exist in multiple files and where there are no delete markers, etc.?

On Mar 9, 2012, at 12:10 PM, John Vines (Created) (JIRA) wrote:

> RFile Input Format
> ------------------
>
>          Key: ACCUMULO-454
>          URL: https://issues.apache.org/jira/browse/ACCUMULO-454
>      Project: Accumulo
>   Issue Type: New Feature
>   Components: client
>     Reporter: John Vines
>     Assignee: Billie Rinaldi
>      Fix For: 1.4.1
>
>
> We currently provide InputFormats for reading from Accumulo and output
> formats for both direct input as well as outputting RFiles. But we provide no
> mechanism for doing a mapreduce over existing RFiles, which may be useful for
> optimizing data flow. We already have input formats which use RFiles directly
> for input (The offline input format Keith just finished), but that still
> relies on the Accumulo structure. We should go ahead and also create an input
> format that just hits RFiles like the other standard file input formats.
