[ https://issues.apache.org/jira/browse/ACCUMULO-942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-942.
----------------------------------

    Resolution: Won't Fix

See ACCUMULO-919

> accumulo should be more resilient in the face of NN failures
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-942
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-942
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Critical
>
> We experienced a NN failure on a large cluster. The edit log was written to
> a RAIDed file system, but data sent to the edit log was still lost. We
> suspect the drivers made promises they did not keep.
> This left Accumulo in a slightly corrupt state: a few references to files
> that were missing.
> Also, we have attempted to archive backup images of HDFS for disaster
> recovery. This has not been helpful because Accumulo needs a highly
> consistent set of metadata, and a slightly older version of the file system
> confuses it.
> One defense is to use snapshots. However, this works at the table level, and
> it is hard to coordinate with the HDFS snapshot.
> Another approach is to leave a short history of the files in the !METADATA
> table. The Google paper hints at keeping historical information:
> {quote}
> We also store secondary information in the
> METADATA table, including a log of all events
> pertaining to each tablet (such as when a server begins
> serving it). This information is helpful for debugging
> and performance analysis.
> {quote}
> I think it would also be helpful for disaster recovery. It may require the
> GC to be more sensitive to historical information about compactions.
> Alternatively, we should start looking into high-availability NNs and
> BookKeeper high-performance logging.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira